Jews have always been prolific writers. Has AI wound up with too much of their work?

‘What we are seeing is going to be bad for everyone. It just might be especially bad for Jews.’

Sarah Silverman, seen at the Golden Globes in January 2024, has sued OpenAI over its use of her content in its database. (Photo by Tommaso Boddi/Golden Globes 2024/Golden Globes 2024 via Getty Images)

By Asaf Elia-Shalev March 25, 2024

(JTA) — Growing up Jewish in New York City, Heila Precel absorbed the lesson that education can set you on a path toward personal success and protect against the forces that have marginalized Jews throughout history.

“I was told by my family and by my culture versions of ‘They can’t take away your education.’ Investing in education has been a tremendously successful strategy for American Jews,” Precel said.

Precel heeded her childhood lesson and made her way to Boston University, where today she is working on a doctorate in computing and data sciences. But a research paper she just published, in partnership with other scholars, suggests that the formula for success that countless American Jews like herself have banked on could be in peril.

The threat comes from the rise of artificial intelligence systems powering the kind of chatbots that communicate like humans — ChatGPT, for example. Those systems are trained on books, articles and other texts that have been fed into the machine largely without the permission of their authors.

That means anyone who produces intellectual property can wind up seeing their work used without license. Those creators face potential copyright infringement and, in the longer term, possible job displacement as A.I. tools may come to replace many white-collar workers.

Precel discovered through research that Jews are overrepresented among authors whose intellectual property is being used for A.I. training purposes. Compared to their numbers in the overall U.S. population, Jewish authors are overrepresented by a factor of two to six-and-a-half based on an analysis of available data. Among those authors are comedian Sarah Silverman and novelist Michael Chabon, both of whom have sued OpenAI, the company behind ChatGPT, for alleged copyright infringement.
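The "factor" in that finding is a ratio of two shares: a group's share of the authors in a dataset divided by its share of the general population. Below is a minimal sketch of that arithmetic. The function name and both input values are hypothetical, chosen only to illustrate the calculation, and are not figures from the paper.

```python
def overrepresentation_factor(share_in_dataset: float,
                              share_in_population: float) -> float:
    """Return a group's share among authors divided by its population share."""
    if not 0 <= share_in_dataset <= 1:
        raise ValueError("dataset share must be between 0 and 1")
    if not 0 < share_in_population <= 1:
        raise ValueError("population share must be greater than 0 and at most 1")
    return share_in_dataset / share_in_population


# Hypothetical inputs for illustration only (not taken from the paper).
example_factor = overrepresentation_factor(0.10, 0.025)
print(f"Overrepresented by a factor of {example_factor:.1f}")
```

A factor of 1 would mean the group appears among authors in proportion to its share of the population; anything above 1 indicates overrepresentation.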

Developers of A.I. systems are likely glad to hoover up all the content they get without regard for the identity of its authors, and no one is alleging that antisemitism is at play in the overrepresentation of Jewish authors. In fact, Precel acknowledges that the premise of her research can sound like a bit of a humblebrag: Jews make up a tiny portion of the population but have produced so much knowledge that, to a worrying degree, the future of A.I. research relies on them.

But she said a narrow interpretation like that would miss the point of her paper.

For one thing, the paper emphasizes that further research would likely confirm that other groups, such as Hindu Americans and Asian Americans, are also overrepresented. Precel also says exposing biases that harm Jews often reveals broader issues. That idea is reflected in an analogy in the title of the paper, “A Canary in the AI Coal Mine: American Jews May Be Disproportionately Harmed by Intellectual Property Dispossession in Large Language Model Training.”

“We are not saying that all of the lawyers are Jews, and therefore replacing lawyers is going to be bad for the Jews,” Precel said. “There are many lawyers who are not Jewish, and what we are seeing is going to be bad for everyone. It just might be especially bad for Jews, because Jews have historically put a lot of our eggs into this basket of educational attainment. In other words, we are shining a light on this overall problem with the canary-in-the-coal-mine analogy — while making sure to remember that canary itself does not fare too well in this story.”

Heila Precel’s website outlines her research interests in artificial intelligence and her Jewish identity. (Screenshot)

Precel grew up in a Conservative Jewish household and attended Jewish day school as a child. As an adult she has become more observant and attends synagogue weekly. The label she gives herself is traditional egalitarian. That is all to say that Precel has had many chances to discuss her research with other Jews whose texts may be found in databases used for A.I. training without permission.

In fact, her new paper is published in such a database. She says she’s encountered people with concerns, but many others don’t understand where the training data comes from or how it’s used.

“I get a lot of surprised reactions and some anxieties but also optimism,” Precel said.

Her paper belongs to a larger genre of research into the impacts and implications of technological advancements in the areas of artificial intelligence and machine learning. But Precel’s co-author, Nicholas Vincent, said the issue is often examined “through the lens of underrepresentation” rather than overrepresentation.

“The most famous example is models that performed really poorly on people with dark skin,” said Vincent, a computer science professor at Simon Fraser University in Burnaby, Canada, referring to the problem of image analysis software mislabeling Black people as gorillas. In the realm of text-based systems, he said, “if you’re not from the predominant cultural background, you’re more likely to sort of receive poor outcomes with models used for hiring or credit scoring.”

A new paper released this month tested how A.I. relates to people speaking an African-American dialect of English as opposed to using what’s known as Standard American English. The study found that the A.I. makes racist assumptions based on the difference. One chatbot, for example, was more likely to recommend the death penalty for defendants when they spoke African-American English.

One of the limitations of all these studies is that many artificial intelligence systems operate as black boxes. With ChatGPT, for example, it’s not possible to know what content developers used to train the system, because its owner, OpenAI, considers that information proprietary.

For the Jewish authorship paper, what researchers tried to do, then, is study not the systems but the data that is fed into them. They looked at what data the open-source systems use and at digital repositories of knowledge that are likely being used by the proprietary systems. These repositories contain massive amounts of scientific literature, published books, legal opinions and other kinds of texts.

AI models incorporate large quantities of written words, of which Jewish authors have contributed many. (Getty Images)

But since authorship information typically doesn’t indicate that someone is Jewish, the researchers searched for a way to identify and classify authors en masse. For that task they turned to the field of Jewish demographic studies.

Many different techniques exist to identify and count Jews; each has its own strengths and weaknesses. Using surveys to study Jews, for example, can help answer granular questions but is very costly because Jews are a small minority scattered across a wide geography.

“You end up spending a huge amount of money reaching out to people who are not Jewish,” Precel said. “There have been a lot of methods developed in the Jewish demographic literature to try to solve this problem.”

The team settled on a method that infers Jewish identity from a set of distinctive Jewish last names. Many Jews have last names that are not distinctively Jewish, but demographers of American Jewry have repeatedly found in recent decades that distinctive Jewish names can serve as a statistical proxy for the overall Jewish population. The method is not helpful for research about Jewish diversity, but it can be used in certain scenarios, such as estimating the number of Jews in a long list of authors of A.I. training texts.
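One way a distinctive-name proxy can work in principle is to count the authors whose surname is on the list, then scale that count up by the fraction of the group assumed to carry such a name. The sketch below shows only that general idea. The scaling step, the `coverage` parameter and the three-name placeholder set are assumptions made for illustration, not the list or procedure used in the paper.

```python
# Placeholder surname set: illustrative only, NOT the list used in the paper.
PLACEHOLDER_DISTINCTIVE_NAMES = {"cohen", "goldberg", "kaplan"}


def estimate_group_share(author_surnames, distinctive_names, coverage):
    """Estimate a group's share of authors from a distinctive-surname count.

    coverage is the assumed fraction of the group's members who carry one of
    the distinctive names (a hypothetical calibration value).
    """
    if not author_surnames:
        raise ValueError("need at least one author")
    if not 0 < coverage <= 1:
        raise ValueError("coverage must be greater than 0 and at most 1")
    # Count authors whose normalized surname appears in the distinctive set.
    hits = sum(1 for name in author_surnames
               if name.strip().lower() in distinctive_names)
    # Scale up to account for group members without a distinctive name.
    estimated_members = hits / coverage
    return min(estimated_members / len(author_surnames), 1.0)


# Hypothetical author list and coverage value, for illustration only.
authors = ["Cohen", "Smith", "Nguyen", "Goldberg", "Patel",
           "Garcia", "Lee", "Kim", "Brown", "Jones"]
example_share = estimate_group_share(authors, PLACEHOLDER_DISTINCTIVE_NAMES, 0.5)
print(f"Estimated share of authors: {example_share:.0%}")
```

An estimate like this describes a list in aggregate. It says nothing reliable about any individual author, which is consistent with the article's point that the method does not suit research on Jewish diversity.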

Much of the paper is spent on what might be done to address the concerns raised by the findings. The researchers imagine a future in which A.I. is used to augment human work rather than replace it, and in which large-scale economic disruption is avoided.

One possibility for achieving that scenario is using the findings to help inform policymakers and A.I. developers concerned with the ethical dimension of the technology. But the researchers also suggest another route.

“If people organize collectively around their intellectual property, there can be a more level playing field to negotiate with operators of A.I. technologies,” Vincent said. “Individually, your data is of really low value, but when we get enough people together, we have a lot of leverage.”

The Jewish community might already be organized enough to make collective advocacy possible. While there isn’t a union of Jewish writers, for example, informal coalitions of creative professionals have responded to anti-Israel sentiment in the literary world and in Hollywood.

In a hypothetical scenario, a group representing Jewish writers could come together and agree to adopt measures on their websites blocking bots from collecting content.

“So going forward, that group is particularly hard to get data for, and then all of a sudden there’s a big gap in the data,” Vincent said.
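For readers curious what blocking bots can look like in practice, one widely used mechanism is a robots.txt file that asks specific crawlers to stay away. The sketch below uses Python's standard-library parser to check such a file. It assumes the site wants to turn away OpenAI's GPTBot crawler and allow everyone else; example.com and the page path are placeholders. Compliance with robots.txt is voluntary on the crawler's side, so this is a request rather than an enforcement mechanism.

```python
from urllib.robotparser import RobotFileParser

# A sample robots.txt: ask GPTBot to stay out, allow all other crawlers.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Allow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# Placeholder URL; any page on the site would be evaluated the same way.
page = "https://example.com/essays/latest"
gptbot_allowed = parser.can_fetch("GPTBot", page)
other_allowed = parser.can_fetch("SomeOtherBot", page)

print("GPTBot allowed:", gptbot_allowed)
print("Other crawler allowed:", other_allowed)
```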

This article originally appeared on JTA.org.

Asaf Elia-Shalev is a senior reporter with JTA and is based in Los Angeles. He is also the author of “Israel’s Black Panthers: The Radicals Who Punctured a Nation’s Founding Myth,” which was published by UC Press in 2024.
