28 January 2004

Machine Translation

Another interesting blog post from Tim Oren, this time on Machine Translation. My interest in this area was piqued about two years ago when we met three or so teams, based in either Moscow or St. Petersburg, who had similar takes on the same problem. Russia has an inherent advantage in this area because (a) its programmers are smart and (b) there is a surfeit of highly qualified linguists. As Tim points out, the requirement for "a large stock of training data" makes this a human-resource-intensive project. Furthermore, it's not just data entry: it requires highly skilled linguists with reasonable computer skills to create the stock of data that is, in effect, the meta language.

The second reason this area is so interesting is the marketing approach. MT will have to be at least as good as human translators before value transfers from services to software. Much of the $10bn market is in highly specialized subjects that require very accurate translation, so it seems humans will continue to dominate the market for some time. Yet if the product comes close to achieving human translation standards, then once a document is analyzed the result is a reasonably accurate "semantic" analysis, which can equally be used for (say) document management.

So do you swing for the fence and try to grab the EU's tower-of-babel translation contract? Do you convince Asian electronic goods manufacturers that their user manuals might even be helpful if translated into a language the end user recognised as his or her own? Or do you take an easier route to the first, less demanding $?