Google Translate for Yiddish? It Ain’t Worth Bupkis.
This article originally appeared in the Yiddish Forverts.
Several years ago I wrote an article for the Forverts describing the bizarre phenomenon by which a Google search for the innocent Yiddish word “meydlekh” (girls) yielded results for pornographic websites, listings for Canadian escorts and so forth. Were pornographic websites actually advertising in Yiddish? No, it was an error on Google’s end. Back then the search engine would automatically translate the words being “googled” into English and provide the relevant results. Since “girls” in English yielded pornography, and pornographic websites in English are more popular than any websites in Yiddish, such sites dominated the Yiddish search results. This made searching for anything in Yiddish nearly impossible because actual Yiddish-language sites rarely appeared in the search results, which were overrun by websites in other languages. Fortunately Google soon fixed the problem by favoring results in the language being searched in.
Back then Google Translate used first-generation technology that relied upon the statistical analysis of translated texts. If a particular word was found in its database, it was translated. If not, it was simply transliterated letter by letter. Unsurprisingly, the syntax of the resulting translations was often bizarre. Despite some weird turns of phrase, however, Google Translate’s results were usually comprehensible.
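For the technically curious, the fallback described above (translate a word if it is in the database, otherwise spell it out letter by letter) can be sketched in a few lines of Python. This is only a toy illustration under loud assumptions: the one-entry lexicon, the small letter table and the function names are all invented for this example and have nothing to do with Google’s actual data or code.

```python
# Toy data, assumed purely for illustration (not Google's lexicon or rules).
LEXICON = {"מיידלעך": "girls"}  # "meydlekh"
LETTERS = {
    "ב": "b", "ד": "d", "ו": "u", "י": "y",
    "ך": "kh", "ל": "l", "מ": "m", "ע": "e",
}


def transliterate(word):
    """Spell a word out letter by letter; unknown characters pass through."""
    return "".join(LETTERS.get(ch, ch) for ch in word)


def translate(text):
    """Translate each word found in the lexicon; transliterate the rest."""
    return " ".join(LEXICON.get(w, transliterate(w)) for w in text.split())


print(translate("מיידלעך"))  # found in the lexicon, so it is translated
print(translate("בוך"))      # "bukh" (book) is missing, so it is spelled out
```

A known word comes back in English, while an unknown one is merely respelled in Latin letters, which is roughly why the old output could look odd and still be decipherable.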
In 2016 Google implemented a new technology based on a massive artificial neural network. Instead of ignoring words it can’t find in its database Google Neural Machine Translation (GNMT) now tries to guess what the text means and construct comprehensible and natural sentences based on context.
Every nerve-cell in humans and animals receives and transmits electrical signals through synapses, which function as a sort of natural wire that records these information exchanges. An ant’s neural network has around 250,000 cells; a cat’s has around 250 million, and a person’s – around 86 billion.
The development of modern medicine in the 20th century has allowed us to understand, more or less, the functions of living nerve cells. On the basis of then cutting-edge research in neuroscience the American-Jewish psychologist Frank Rosenblatt created a simple computer model of a neuron, the Perceptron, in 1957. Rosenblatt made sensational claims to the New York Times that the near future would see the rise of an entire civilization of sentient robots enlightened by artificial neurons.
Later on it would become apparent that machines like those Rosenblatt helped to create were at best only capable of mimicking the brain of a small worm. Interest in artificial neural networks remained limited until about 20 years ago when the size of computer networks grew large enough to theoretically support such endeavors. At the same time significant advances in neuropsychology have shed light on how information is processed by the human brain. Today computer models of neurons are used successfully in a wide range of fields. Thanks to such models specialized computer programs can now recognize faces, letters and other visual objects as well as predict economic processes, regulate traffic and help doctors to better diagnose their patients.
In recent years Google created the computer program AlphaGo which plays Go, an ancient Chinese game that is more than 2,500 years old. While it appears simple at first glance Go is actually phenomenally complicated, arguably more intricate than chess. Until recently professional Go players believed that, unlike chess grandmasters, they would never be defeated by a computer. In May 2017, however, Google’s AlphaGo did just that, besting world-champion Ke Jie and demonstrating that the program’s neural networks could replicate human abilities.
Just like our eyes, however, neural networks are far from perfect. When you read or write you often fail to notice typos and even entire missing words because your brain relies on the words and linguistic models you already know to fill in the gaps made by the nonsensical part of the text.
Although Google Translate’s new approach sounds like a giant leap forward it creates all sorts of problems. To begin with every text must first be translated into English before it can be rendered into another language. Secondly, instead of warning that it doesn’t recognize a certain word the new system automatically inserts its own “creative” meanings in a second-rate imitation of human neural processing.
The number of artificial cells in the new network may be very large, perhaps even larger than in the human brain. But it’s important to remember that an elephant has roughly three times as many neurons as a person. No animals, however, use anything akin to human language. Many processes performed by the human mind remain a mystery that can’t be cracked by any computer models, no matter how large they may be.
It’s possible that the primary goal of the new system isn’t to produce the highest possible quality of translations but rather to launch a global experiment examining human languages through neurological principles. You can easily participate in this experiment through your computer or smartphone. Keep in mind that the bizarre translations provided by Google Translate frequently change when you add an article (a, an, the), a preposition (on, in, after), or punctuation, even a comma. Yiddish texts written according to Hasidic and even Soviet spelling standards are understood more readily than those written in the academic YIVO standard.
Despite the bizarre incident recounted above with the older version of Google Translate, the word “shandhoyz” (brothel) is translated quite innocently (and inaccurately) as “shamrock” (in Yiddish that would be “klever-bletl”). On the other hand, the English word “whoredom” elicits the Yiddish word “verterbukh” (dictionary).
Interestingly enough if you enter “shandhoyz” written in the Hebrew alphabet into Google’s search engine it will recognize the word and bring you to the Forverts’ website and my article about the writer Sholem Asch (who set his famous play “God of Vengeance” in one). As I noted above Google’s search engine thankfully did away with automatically translated results so searching for “brothel” in Yiddish only yields a few pages of results. Reading through my article on Sholem Asch in Google Translate’s computer-generated pseudo-English I note that “Sfri-Koydesh” (holy books) is rendered into English as “Sri Krishna.”
Perhaps with time Google Translate’s new “elephant-mind” will prove to be intelligent. For the time being, however, its “dictionary” truly appears to be a “shandhoyz” (literally, a “house of shame”).