Updated: 7/6/2005; 10:03:13 PM.
Kevin Schofield's Weblog
Musings on life, kids, work, the Internet, Microsoft, politics, orcas, etc.
        

Tuesday, May 31, 2005

OK, one more Star Wars parody, this one from Burger King: play 20 Questions with Darth Vader. He makes all sorts of snarky comments as you go along.

It's actually pretty good: I asked about some fairly obscure things, and it got them right.

But the best part: keep your eye on the background behind Vader when it guesses right.


3:08:48 PM

There's an interesting thread on Slashdot this morning about Google's machine translation (MT) efforts, based on some comments made at a Google open house last week.

My favorite comment: "Anyone care to make a bet that Microsoft will announce a new revolutionary language translation service sometime in the next two weeks or so?"

Um... while we don't have a translation web service, we do have what we think is some of the best MT technology out there, and that's already public information. We've had a group working on MT here in Redmond for several years, building on technology that came out of our Natural Language research group. We also have a second team in our Beijing lab, working on translation between Asian and Western languages. The MSR MT research teams have published tons of papers on their work (which I would assume Google's MT folks have all read; shame they don't publish papers on their "research" to give back to the community), including some describing a successful tech-transfer project that used the MT system to translate articles from Microsoft's Product Support Knowledgebase into other languages.

This raises an interesting issue, well known to the MT community and hinted at in the Slashdot thread: MT quality is directly tied to the quality of the training corpus, and is very domain-dependent. Google is apparently using United Nations transcripts and documents, which means they will build a system that is potentially very good at translating United Nations speeches and documents. Since reporters, corporate marketing writers, and bloggers rarely write in that style, it's likely to have real trouble with general website translation.

You can embrace that limitation, however, as we did with the Product Support Knowledgebase project: we trained the system on knowledgebase articles that had already been translated by hand, and then used it to translate more. It worked very well.

It's unfortunate that what will most likely hold back faster progress in machine translation is the limited availability of corpora of translated material. They're hard to come by, and expensive to create from scratch.


1:28:38 PM

© Copyright 2005 Kevin Schofield.
 