The latest in machine translation

Seeing as translation is a pretty big deal in this community, I thought you might be interested in this video. The timestamped URL (starting at 8:18; the segment ends around 10:00, so it's very short) should take you to the exact moment where the presenter begins talking about their recent upgrades to auto-translation.

This is a pretty big deal. Remember, the human baseline in their quality comparison is a bilingual professional translator, who tends to have not only a better grasp of both languages but also more willingness and dedication to spend time than the hobbyist translators we see in the community. I can say that a large number of translators (at least back when I did translations) stick to phrase-based translation with very little sentence restructuring. This is extremely tiresome and time-consuming, and unless the translator has the language mastery of an interpreter (who can accurately translate in real time), it forces them to stop read-translating and switch to 'editing mode' after every line. It is one of the key reasons why fan translators often spit out very awkward, easy-to-misunderstand sentences that would give your English teacher a heart attack: they still mirror the original JP/CN sentence structure.

It is apparent that Google is rapidly overcoming this hurdle, with their auto-translator increasingly able to restructure a sentence and even reformat multiple sentences.
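
To make the difference concrete, here's a toy sketch in Python. It takes the sentence 我把手机放在桌子上了 ("I put my cell phone on the desk") and translates it two ways: phrase by phrase in source order, and with one restructuring rule applied. Everything in it is an assumption for illustration: the pre-segmented word list, the tiny `PHRASE_TABLE`, the hand-written rule, and the inserted "my" are all made up, and none of it reflects how Google's system actually works (theirs learns from data rather than from hand-written rules).

```python
# Toy illustration only: a hand-written phrase table plus one hand-written
# reordering rule. Nothing here reflects any real translation system.

# Pre-segmented source sentence (segmentation is assumed, not computed).
SOURCE = ["我", "把", "手机", "放", "在", "桌子", "上", "了"]

# Hypothetical word-for-word glosses.
PHRASE_TABLE = {
    "我": "I",
    "把": "take",
    "手机": "cell phone",
    "放": "put",
    "在": "at",
    "桌子": "desk",
    "上": "on top",
    "了": "(done)",
}


def phrase_by_phrase(tokens):
    """Translate each token on its own, keeping the source word order."""
    return " ".join(PHRASE_TABLE[t] for t in tokens)


def restructured(tokens):
    """Apply one hard-coded rule for this exact pattern:
    subject 把 object verb 在 place 上 了 -> subject verb my object on the place.
    """
    if len(tokens) != 8:
        raise ValueError("this toy rule only handles the 8-token pattern")
    subject, ba, obj, verb, zai, place, shang, le = tokens
    if (ba, zai, shang, le) != ("把", "在", "上", "了"):
        raise ValueError("sentence does not match the toy pattern")
    return (
        f"{PHRASE_TABLE[subject]} {PHRASE_TABLE[verb]} "
        f"my {PHRASE_TABLE[obj]} on the {PHRASE_TABLE[place]}"
    )


if __name__ == "__main__":
    print(phrase_by_phrase(SOURCE))  # mirrors the Chinese word order
    print(restructured(SOURCE))      # reads like ordinary English
```

The first function is roughly what a line-by-line fan translation ends up doing; the second is the step a translator normally has to drop into 'editing mode' for.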

The fact that Google has integrated machine learning into their web-based translators (although this is not new) also means that their translation software will keep growing and learning faster than any human translator is capable of. It's likely why their CN-EN translator has been doing so well, seeing as this is easily the most commonly encountered language barrier in the world (sorry JP, you're just not as important when it comes to international business).

It won't be long now before hobbyist translators are obsolete. Professional translators will be around for longer, because humans prefer to have another human do fact-checking (because for some reason, our hormone-addled, emotional brains are more trustworthy than logical electronic circuits? Yeah right). But in the long run? When you look at it in terms of decades instead of just years? Yeah... robots are totally taking over our jobs.

 

P.S. Remember this post?

25 thoughts on “The latest in machine translation”

  1. Truth

    I call bullshit on this translation. I've never seen Google translate something so cleanly. Anything more than a line and Google starts speaking gibberish.

  2. needhydra

    I'm personally fine with MTL atm, as I use it more for vocab than for actual translation. I'm very bad at remembering vocab, but oddly enough, grammar and understanding, which are the hard part, are not an issue for me.

    I chalk this up to being well traveled as a child and having taken an effective year or two of a couple of different languages, plus repeated exposure to a few others. That was mostly in high school, which had a lot of different languages to choose from (compared to other schools in the same state), mostly taught by native speakers who had immigrated to the US.

    You could also grab a distance learning option and do any language, really: a web course, taken in a room with others doing web courses in the same or different languages. While fun and interesting, no one really finished the complete courses; people did at most 75% of them. But no one cared, not even the teacher, as long as you had a basic understanding of the language you took and did not expect to use it as a language credit.

  3. GURPy

    I can't wait to see this improve and spread. I heard something a while back about there being an earpiece that can almost translate in real time. Couple that with a few other recent advances and we're primed for some kick-ass international parties.

  4. David

    This is my favorite American English sentence. Wonder what the translator would do with it?

    Buffalo buffalo Buffalo buffalo buffalo buffalo Buffalo buffalo.

    (Yes, that's a real sentence, look it up if you don't believe me.)

  5. Taabraiz

    Machine translation just gives the bare minimum grammatical quality. I've been doing a bachelor's in Chinese and English literature for a year and still have a hard time sometimes. There are areas where two words have a lot of overlap in their respective Chinese and English definitions, but nonetheless are not used in quite the same way or have a mismatch in certain contexts. Other words might have an implied value judgment in one language but not in another.

    For instance, if I'm translating an article in economics, 弹性 means "elastic", as in 弹性需求 (elastic demand), but if I'm translating an article in psychology, it means "resilience". Machines don't have the logical sense to actually know which terminology would be the most appropriate sometimes.

    There are areas where either the source or the target language has much more precise words to describe something.

    For instance, used as a verb, the Chinese word 博弈 means "compete", more or less, but it is most appropriately used when the competition is both asymmetrical and zero-sum (e.g. bartering, where one party is trying to increase the price at the other's expense, while the other party is trying to decrease the price at the other's expense). I cannot think of such a precise word in English.

    Sometimes this happens because Chinese makes use of lots of colorful metaphors. An article I was helping a friend translate earlier read (literal translation): "his understanding of the 脉络 (literally 'vascular system', used figuratively to mean 'underlying fabric') of communication studies is more and more clear". I suppose an English reader would understand either of these, but I can't say that either would read smoothly. Unable to think of a close metaphor that wouldn't draw too much attention to itself at the expense of the meaning of the sentence, I settled for "his understanding of the various aspects of communication studies and how they fit together is increasingly clear". The problem is, this translation may capture some of the geometric vividness of the source language, but it loses the sense that these "various aspects" have their own vitality and work together to perform a vital function, not to mention the sharpness of using one word instead of nine.

    Of course, some of these potent metaphors have by now become fully integrated into the English language, to the point we forget they came from Chinese. "To lose face" is one such example.

    You simply can't expect a machine to know all this, you know.

    There are also idioms that describe a very specific phenomenon or sensation.

    Chinese makes use of a lot of set phrases called "chengyu/成语", many of which make allusions to historical events or scenes from canonical works of fiction. Some have rough equivalents in English, like the line from The Romance of the Three Kingdoms "说曹操,曹操就到/mention Cao Cao and all of a sudden he shows up", which is used the same way as "speak of the devil" in English. However, some really can't be translated. One of my favorites, also involving the character Cao Cao, is "望梅止渴/to quench one's thirst by admiring a plum forest". It describes a scene where Cao Cao leads his army to a plum forest before battle because they are thirsty, and just by looking at the forest, their thirst becomes quenched. The point, as I understood it, is not to describe something magical, but to describe that temporary and partial sense of relief we get from running our finger around the edge of a mosquito bite. Although everyone can understand this feeling, I cannot think of any phrase in English that would do the original phrase justice in translation.

    Grammar:

    1) Phrasal Verbs. Because English is a hybrid mostly of Anglo-Saxon, a Germanic language, and Middle French, a Latin language, it has certain inconsistencies that can become a serious headache. One of the biggest ones, how to translate phrasal verbs, is something you'd probably never think about unless you were translating to or from English or studying linguistics. English has inherited from German the ability to make some verbs phrasal, or add a preposition to indicate direction or completion (e.g. set up, put down, carry over, etc.). However, this doesn't work for many verbs with Latin roots, like many of those ending in "ize". You can "wrap up" your argument, but you can't "synthesize up" your main points. Chinese can also make a lot of verbs phrasal. A lot more, in fact, and it's complicated by the fact that some Chinese verbs that are composed of two characters inherently have a complement indicating direction or completion.

    2) Direct and Prepositional Objects. A common Chinese sentence structure is to use the character 把, which functions as something of an auxiliary verb, to relocate the direct object of a verb before the verb. To say "I put my cell phone on the desk", the most colloquial expression would be something along the lines of "我把手机放在桌子上了", which literally reads (roughly): I take (no possessive pronoun needed) cell phone put at desk on top + modal particle indicating this action was completed (verbs aren't conjugated in Chinese). This formula is easy enough, but once it gets into abstract ideas, it gets confusing, often because one needs to add another directional verb at the end of the whole phrase where one would make the object prepositional in English. For instance, the sentence "I caught up with him (his position)" would likely be said in conversation as "我把他的地位赶上去了", which literally reads: "I take he (possessive particle) position catch up + go + particle indicating completion."

    3) Prefixes and Suffixes. Many English adjectives can be changed into nouns by adding "ity" or "ness", sometimes with modification to the spelling of the root word. Likewise, many English adjectives and nouns can be changed into verbs by adding either "ize" or "ate". Just about any Chinese adjective can be changed into a noun by adding "性", which means gender, or essence, and just about any Chinese noun or adjective can be changed into a verb by adding "化". A lot of times there is simply no way to communicate this as sharply and succinctly in English without outright making up a word.

  6. nyururin

    Well, doesn't that only apply to Japanese light novels? Chinese light novel translators are quite extreme, you know; they release like 1 chapter a day for most novels. Actually, this doesn't really benefit me much, as I live in Hong Kong and learned Chinese for 12 years. I can also purchase the Chinese version of the Japanese light novels, but it just ain't the same as reading it in English. Chinese is quite the poetic language, and the way you write Chinese is totally different from how you speak it. If you regularly read Chinese novels like me and then read a Japanese-to-Chinese converted novel, the grammar in it would feel dull as hell. At least that's the feeling I get. So I kinda question the claim of machine translation actually aiding in translating Chinese to English, when even the closest language to Chinese can't produce perfect results when translated by humans.
    I believe that the main reason for the success of Chinese-to-English translation is education. In Hong Kong, in the public exam we take (the exam you take after graduating from your secondary education/high school, with 50k+ candidates each year), you must know how to translate English to Chinese for one of the Chinese exams. So basically all the university students in Hong Kong are qualified to be translators, as they are the ones who scored among the highest against 50k other people :/
    It simply means Japan doesn't give as much of a f*ck about students learning English, due to their deficiency.

    PS: The Chinese Hanyucidian (汉语辞典) has 370,000 words; alternative dictionaries have as many as 56,000 - 90,000 characters. An average literate Chinese person knows 7,000-9,000.

  7. krytyk

    Honestly speaking, this will make stuff worse. Japanese is such a peculiar language that you simply cannot get a correct translation without a human translator in charge, mainly because of the way Japanese works: the person reading or translating has to be in-the-know about the current situational circumstances to get the meaning right.

    Moreover, this display shows not that the translation itself improves, but that the sentences are made to make more sense "out of the box" instead. Sure, this will make more people carefreely go copy-pastey and call that translation.

    PS: Kana is helluva pain in the ass for MTL. Not in a thousand years bruh.

    1. Aorii Post author

      I've heard every language's linguists say that about their own language, just like every specialist says that about their field. "You can never get a machine to do it as well as I can!"... until it happens.
      I mean, in the end, humans are just machines written in ATCG code rather than binary, and any method of pattern recognition we employ will eventually be duplicated by machine learning.

      1. krytyk

        Hmm, actually it's a simple case to prove. Japanese, like Chinese, doesn't use spaces in writing. If you add kana on top of that... you have a problem a machine simply can't get over without understanding the whole conversation and context. If I go and write "あいつをいかされた", a machine can only give you one possible meaning, while depending on the situation it can have like 3 or 4 meanings. So, is it "let him/her go"? Or is it "spared his/her life"? Or is it "made him/her c*m"? I think I could come up with one or two more possible translations for that, depending on the situation in which it was said. And Japanese is full of this bullshit. So until translation algorithms come unbelievably close to understanding stuff like humans do, for languages such as these it'll remain necessary for human translators to be there.

        As for the lack of spaces in Japanese writing and kana, the problem is that the algorithm doesn't understand where one word starts and the other ends, and also doesn't understand the interruptions that are so popular in Japanese. Ex. な、なつ… The character could theoretically have stumbled and said "s-seven..." with a small interruption between syllables, but might just as well have stumbled and repeated the first syllable, saying "S-summer...". How would a machine translator understand this? The machine would have to be equipped with damn advanced algorithms to understand the conversation topic, as well as one that perfectly understands speech patterns.

        Honestly speaking, there are many more peculiarities in Japanese that would easily get in the way of machine translation (let's not even try explaining onomatopoeia). I'm not saying MTL will never get there, but what Google is changing now isn't going to improve it by that much.

        1. Aorii Post author

          And... that entire example is just another reason why they've moved on from phrase-based translation to machine translators that look at entire sentences or multiple sentences. For most writing, if one cannot grasp the context within a few lines, something is wrong (of course, LN is a bit of a special case, since it ignores many 'universal standards for good writing').

          1. Owl

            Human beings can grasp the concept. Hard coded computers have difficulty in it. And yes, I do speak multiple languages along with the rest of my family. I personally know how messed up sentence structures can be if you change it from one language to another. It's more of an art form at times than a hard science, one that I have severe doubts that an inflexible computer can handle. And that is simply just considering languages within the same region. Translate cross continent and you get very strange things. Like Greek to Chinese or Chinese to German or Hebrew to English.

            I'm with Kry on this, it's more about making news than anything revolutionary.

            Come to think of it, why do you think there are so many versions of "The Bible"? Most are correct, just phrasing differences but enough to change the whole tone depending on which one you are reading.

        2. Sophie

          "If you tell me precisely what it is a machine cannot do, then I can always make a machine which will do just that." - John von Neumann

          The point is, if the problem is understanding context ("to be in-the-know of the current situational circumstances", as you put it), it's possible to simulate it. It's arguable that it's not strong enough yet, but it can also be argued that it will get stronger as more people refine it.
          https://www.technologyreview.com/s/601396/ai-gets-more-real-thanks-to-contextual-deep-learning/

          To generalize further: if you can tell me what the problem is, there's a good chance I will be able to design something that handles it, or someone later will.

          For example, I can use a probabilistic method to handle cases where there are several possible meanings and/or contexts: assign a different probability to each possible context/meaning, based on previously translated sentences or the text as a whole. As for stumbling, I can think of several ways of handling it (and I feel like there are more ways than I can think of). One specific way is noticing that a stumble is usually marked with "、", or "-" in English, and incorporating that into your database. As for the ambiguity that arises from doing it that way, in general we can always use probabilistic assignment to handle it (ambiguity can be handled with probabilistic methods most of the time). The same goes for the lack of spaces: since all of these are practically problems of the same ambiguity class, their handling should be similar.
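
          A minimal sketch of that probabilistic idea in Python, using two of the readings of あいつをいかされた mentioned above. The cue words, the add-one smoothing, and the name `rank_readings` are all assumptions made up for illustration; a real system would learn its weights from data instead of having them written by hand.

          ```python
          from collections import Counter

          # Hypothetical cue words for each candidate reading (hand-written,
          # purely for illustration).
          CANDIDATES = {
              "let him go": {"escape", "release", "prisoner"},
              "spared his life": {"sword", "battle", "mercy"},
          }


          def rank_readings(context_words, candidates=CANDIDATES):
              """Return (reading, probability) pairs, most likely first.

              Each reading scores 1 (add-one smoothing, so nothing is ever
              ruled out completely) plus one point per cue word found in the
              surrounding context. Scores are normalized to sum to 1.
              """
              seen = Counter(word.lower() for word in context_words)
              scores = {
                  reading: 1 + sum(seen[cue] for cue in cues)
                  for reading, cues in candidates.items()
              }
              total = sum(scores.values())
              ranked = [(reading, score / total) for reading, score in scores.items()]
              return sorted(ranked, key=lambda pair: pair[1], reverse=True)


          if __name__ == "__main__":
              context = "he lowered his sword after the battle".split()
              for reading, prob in rank_readings(context):
                  print(f"{reading}: {prob:.2f}")
          ```

          With no cue words in the context, every reading gets the same probability, which is roughly the "machine can only guess" situation; the more surrounding text is fed in, the more the ranking can shift.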

          How to assign the probabilities in practice is indeed a headache. One way is to use deep neural nets, or some syntactic and semantic analysis methods, but some kind of neural net hybridized with syntactic/semantic/structural methods, i.e. a neural net with a particular structure, might be best (in principle, neural nets already perform syntactic, semantic, and structural analysis by themselves, but there's always the problem of training efficiency).

          But the real problem is that we don't know what we don't know. For example, sometimes there's just a new expression that's not already in the "expert system database", that is, a database of sorts that keeps tabs on vocab, syntax, and semantics. Because language is organic, sometimes a new form of expression appears with a structure very far from what is commonly encountered. But that shouldn't happen too often. If it does happen often in a language, it's better to treat that language as the prototype of a new language, one still too young to have a "foundational/fundamental structure" yet, hence the frequent appearance of new contexts. That's not the case for Japanese, though it might be the case for any kind of onomatopoeia.

          For onomatopoeia, a specialized system to handle it is possible, but you'd have to tell me how onomatopoeia is usually formed. As long as onomatopoeia has certain common characteristics and structures within a given language, a system equipped with a database should be able to handle it to some extent.

          Practically, it may be that we haven't grasped what "context" is, or that we haven't found good methods/algorithms for it. But with time both will be overcome. And even if "sentences are made to make more sense 'out of the box'", as long as it passes the Turing test, that is effectively the same as "sentences are REALLY made to make more sense".

          But a truly real problem, one that might never be overcome in principle, is the different realities represented by different languages. That is, a context in one language might not have an equivalent in another language. We're practically at a loss here. I can think of several workarounds, but it might be better to just do a literal translation in such cases.

    1. Aorii Post author

      Machines can learn idioms and other methods of expression the same ways humans can, you know =P

      1. Jessigurl

        It is true, such as intuitive GPS that learns your preferred routes, etc. However, by nature they will also usually take the simplest, most direct outcome. If they really wanted to make these machine translators more useful, they should probably make them give several interpretations/readings of the requested phrase... though that would probably become tiring for the person feeding the raw into the machine translator if they are trying to translate more than a sentence or two.

  8. Abedeus

    Makes sense. On one hand, this will mean (I mean, eventually, when they make JP translation better) more and easier access to light novel translations, as well as other books. On the other hand, as someone who can speak more than just my native language, it makes my skills a bit less useful. Unless, of course, we're talking about conversations or real-time translations, something a machine probably won't do accurately or as nicely as a human can.

  9. TamaSaga

    The problem is the context. Japanese is rather context heavy. I don't think Chinese is as context heavy as Japanese. While it's true that making readable sentences is an important first hurdle, the translator also needs the ability to read the very first sentence or somehow know about the scene taking place. Without which, you tend to get random translations.

    I've had multiple machine translations where it's actually first person speaking but the translator was treating it as third person with the accompanying viewpoints to match.

    1. Aorii Post author

      Chinese is actually more context heavy than Japanese, due to that language's lack of tenses, free-form grammar, and love of idioms/adages. I bet the auto-translator is still going to have trouble with those for a while though.

      1. RandomDude

        Actually, while Chinese is context-reliant for grammar, Japanese is context-reliant for verbs and nouns, both because they can be left out and because they can be read or heard multiple ways, so Japanese is generally considered more contextual than Chinese. Personally, I think MTLs will still be stuck on terminology. I hear Chinese fiction makes up nearly as many words as Japanese fiction.

    1. Aorii Post author

      Huh... looks like they shortened the video and took out the 25m of intro. Thanks.

  10. Sylvia

    Makes sense.

    Pity it's only Chinese to English; however, that does mean I can start to read better rough copies of Chinese wuxia and novels that haven't yet been picked up, or that got dropped in the past. Can't wait to see this come to Japanese translations.

    My past issue with machine translations is that most of the people who try to do them don't really have a clue what they are doing, thus making a nasty mess. I have always been of the opinion that a machine translator can put out a decent translation (not amazing or high quality, but one that can accurately get the story across) as long as they put some effort in.

    I typically see: original -> machine -> minor edits -> posted. What should have been done is to put the original through the machine, then rewrite it in a localised way, a way that sounds natural, before going on to make edits and posting. If I had the time I would potentially machine translate a few chapters as a teaser for something, but it would be slow because I would put in the human effort needed to make it make sense xD

    That said, past Google translations used to scare the crap outta me when I tried to put in Asian-character languages and ask for a translation; now I may just be able to browse their websites a little bit more easily than before.

    1. Caudyr

      Are you sure it's just Chinese to English...and not that that was simply the example that they were using there?

      Either way, it's probably an "improvement"...but still...I'll trust human-done translations A LOT more, anyway. ^^

