How I Built My Jewish Charity Database
Last summer, I hit upon a way to measure Jewish institutional power: Using data from the IRS, I would gather financial information on every single Jewish organization that files a tax return.
My first story based on that project is printed in this issue of the Forward. This addendum, which is only for readers with a high tolerance for boring data stuff, describes how I built my database.
I started with two files, posted online by the IRS last spring, that contained data from all tax returns filed by tax-exempt groups in the 2012 calendar year. The IRS posted one set of data extracted from Form 990s, which are filed by larger tax-exempt groups, and another from Form 990-EZs, which are filed by smaller groups. I chose not to include a third dataset extracted from Form 990-PFs, which are filed by private foundations, as those groups are generally harder to classify as Jewish or not-Jewish.
The Form 990s and Form 990-EZs ask different questions, and the datasets provided by the IRS were structured differently. I mapped the two datasets onto each other so that they could be considered together. Then, using another dataset from the IRS — the Exempt Organizations Business Master File Extract, which includes information like names and addresses on all tax-exempt groups registered with the IRS — I searched for Jewish organizations.
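A minimal sketch of what mapping the two datasets onto each other could look like, assuming each extract has been read into a list of dictionaries. The column names here (`TOTAL_REVENUE`, `REVENUE_EZ` and so on) are hypothetical placeholders, not the headers the IRS actually uses, and the four shared fields are an illustrative choice.

```python
# Hypothetical source-column -> shared-column maps. The real IRS extracts
# use their own headers; these names are placeholders for illustration.
FORM_990_COLUMNS = {
    "EIN": "ein",
    "TAX_PERIOD": "tax_period",
    "TOTAL_REVENUE": "total_revenue",
    "TOTAL_EXPENSES": "total_expenses",
}
FORM_990EZ_COLUMNS = {
    "EIN": "ein",
    "TAX_PERIOD": "tax_period",
    "REVENUE_EZ": "total_revenue",
    "EXPENSES_EZ": "total_expenses",
}


def to_common(row, column_map, form_type):
    """Rename one row's columns to the shared schema and record its form type."""
    common = {target: row.get(source) for source, target in column_map.items()}
    common["form_type"] = form_type
    return common


def combine(rows_990, rows_990ez):
    """Return both datasets as one list of rows with identical keys."""
    combined = [to_common(r, FORM_990_COLUMNS, "990") for r in rows_990]
    combined += [to_common(r, FORM_990EZ_COLUMNS, "990-EZ") for r in rows_990ez]
    return combined
```

Keeping the form type on each row makes it possible to trace any figure back to the dataset it came from.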
I used several methods to find these Jewish groups. I ran 151 search terms against the names of all tax-exempt groups, then searched for organizations that shared addresses with 50 large Jewish federations and national Jewish organizations. I also searched for groups that identified themselves as Jewish using the IRS’s own classification system, the National Taxonomy of Exempt Entities. That classification system was only marginally useful, as the vast majority of groups that it identifies as Jewish are religious organizations, which don’t file Form 990s.
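The name and address searches can be sketched roughly as follows. This is an illustration under stated assumptions, not the actual procedure: the three search terms stand in for the 151 (only "Hebron" is known to have been one of them), the record fields `ein`, `name`, `street` and `zip` are hypothetical, and the address normalization is a simple guess at how shared addresses might be matched.

```python
import re

# Stand-ins for the real list of 151 terms; "jewish" and "hebrew" are guesses.
SEARCH_TERMS = ["jewish", "hebrew", "hebron"]


def normalize_address(street, zip_code):
    """Lowercase, strip punctuation, collapse spaces, keep the five-digit ZIP."""
    cleaned = re.sub(r"[^a-z0-9 ]", "", street.lower())
    return (" ".join(cleaned.split()), zip_code[:5])


def find_candidates(orgs, terms, anchor_addresses):
    """Return (ein, reasons) for groups matching a name term or an anchor address."""
    pattern = re.compile("|".join(re.escape(t) for t in terms), re.IGNORECASE)
    candidates = []
    for org in orgs:
        reasons = []
        if pattern.search(org["name"]):  # substring match anywhere in the name
            reasons.append("name")
        if normalize_address(org["street"], org["zip"]) in anchor_addresses:
            reasons.append("address")
        if reasons:
            candidates.append((org["ein"], reasons))
    return candidates
```

Recording why each group matched makes the later hand review easier, since address-only and name-only hits tend to fail in different ways.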
Next I weeded out the false positives. My search terms returned many groups that didn’t belong — for instance, the search term “Hebron” returned not-for-profits located in the village of Hebron in Licking County, Ohio. I also excluded from my results hospitals with Jewish names and histories, such as Mount Sinai Hospital in New York, under the rationale that they were too tenuously connected to the Jewish community at this stage to be claimed as Jewish. I included Jewish nursing homes and health care networks with Jewish names in the final database.
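One way to support this kind of weeding is to flag likely false positives for hand review and to keep an explicit exclusion list. The sketch below is an assumption about how that could be done, not a description of what was done: the `city` and `ein` fields, the set of ambiguous terms and the rule itself are all hypothetical.

```python
# Search terms that are also U.S. place names (illustrative; only "hebron"
# comes from the example in the text).
AMBIGUOUS_PLACE_TERMS = {"hebron"}


def needs_review(org, matched_terms):
    """Flag a match when every matching term is a place name equal to the group's city."""
    city = org["city"].strip().lower()
    return bool(matched_terms) and all(
        term in AMBIGUOUS_PLACE_TERMS and term == city for term in matched_terms
    )


def apply_exclusions(orgs, excluded_eins):
    """Drop groups that a human reviewer has ruled out, identified by EIN."""
    return [org for org in orgs if org["ein"] not in excluded_eins]
```

A rule like this only narrows the pile; the final call on each group, such as the decision to leave out hospitals, still has to be made by hand.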
Many groups had filed more than one Form 990 or Form 990-EZ in the 2012 calendar year. I included only the latest filing by each group in my final database.
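Keeping only the latest filing per group might look like the sketch below. It assumes each filing carries a hypothetical `ein` field and a `tax_period` string in `YYYYMM` form, and it treats "latest" as the most recent tax period; the original could equally have meant the most recently submitted return.

```python
def latest_filings(filings):
    """Keep one filing per EIN: the one with the most recent tax period."""
    latest = {}
    for filing in filings:
        current = latest.get(filing["ein"])
        # YYYYMM strings sort chronologically when compared as text.
        if current is None or filing["tax_period"] > current["tax_period"]:
            latest[filing["ein"]] = filing
    return list(latest.values())
```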
I then sorted the groups into categories, using a mix of keywords and hand-sorting. Categorization decisions were based on the group’s name, on prior knowledge of the group and, in a few cases, on examinations of the group’s website or Form 990.
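The mix of keywords and hand-sorting can be sketched as a keyword pass with manual overrides that always win. The category names and keywords below are invented for illustration and are not the categories used in the database.

```python
# Invented categories and keywords; checked in order, first match wins.
CATEGORY_KEYWORDS = [
    ("education", ("school", "academy", "yeshiva")),
    ("health and aging", ("nursing", "home for the aged")),
    ("federation", ("federation",)),
]


def categorize(name, ein, overrides):
    """Return a hand-assigned category if one exists, else the first keyword match."""
    if ein in overrides:
        return overrides[ein]
    lowered = name.lower()
    for category, keywords in CATEGORY_KEYWORDS:
        if any(keyword in lowered for keyword in keywords):
            return category
    return "uncategorized"
```

Anything that falls through to "uncategorized" would then go to the hand-sorting pile.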
Finally, the Forward purchased from the charity information service GuideStar a number of data points that the IRS had not included in its free datasets.
The final database that I have used to write these stories has a number of inherent problems. The largest, which is discussed at length in my first story, is that religious groups aren’t required to file Form 990s. My database consequently misses thousands of Jewish organizations. I also have no way of knowing how many actual filers my search technique missed. Another issue arises from the fact that the IRS database included all filers in a given calendar year, while the groups were actually filing for fiscal years. That means that my database reflects a mix of fiscal years, many of which don’t overlap.
The grouping of organizations into broad categories, while done as carefully as possible, is necessarily a rough estimate, due to the complexity of some groups and to a lack of information about others.
And finally, many groups in the network likely make grants to each other. That means that some dollars are counted twice when we calculate figures like total revenue and total expenditures for the entire network. I have attempted to account for that in my analysis by discounting from the total revenue, in places, the amount that the communal grant-making groups in the network gave to other organizations within the United States.
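The adjustment described above amounts to subtracting the grant-makers' domestic grants from the network's total revenue. A minimal sketch, assuming hypothetical fields `total_revenue`, `domestic_grants` and `is_grant_maker` on each record:

```python
def adjusted_network_revenue(orgs):
    """Total revenue minus domestic grants made by the network's grant-making groups."""
    total = sum(org["total_revenue"] for org in orgs)
    regranted = sum(
        org["domestic_grants"] for org in orgs if org["is_grant_maker"]
    )
    return total - regranted
```

For example, if a federation reports 100 in revenue and 40 in domestic grants, and a school reports 50 in revenue, the unadjusted total is 150 and the adjusted total is 110. The correction is approximate: it likely removes too much where domestic grants went to groups outside the network, and too little where groups not flagged as grant-makers passed money to one another.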