Redistricting to test new Census method for protecting user data

New algorithm has created thousands of statistical improbabilities

A sign urging people to participate in the census stands in a planter on Main Street in Roswell, N.M., in 2020. (Bill Clark/CQ Roll Call file photo)
Posted August 26, 2021 at 4:21pm

Sixteen children lived alone in a wooded stretch of New York’s Jefferson County last year, at least according to the 2020 census results released earlier this month.

That’s because the Census Bureau applied a new algorithm to census results meant to protect respondents’ privacy — but it also created thousands of improbabilities like that block of “Boxcar Children” in upstate New York. 

Privacy experts and the agency have argued that in an era of big data, the Census Bureau should do more to protect respondents’ privacy. Hence the new algorithm, a method called differential privacy that adds statistical “noise” to data.
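To make the idea of statistical “noise” concrete, here is a minimal Python sketch of the textbook Laplace mechanism applied to a population count. It is an illustration only, not the Census Bureau's implementation, which is considerably more elaborate. The function name `noisy_count`, the `epsilon` value and the block populations are all hypothetical.

```python
import random


def noisy_count(true_count, epsilon, rng):
    """Return true_count plus Laplace noise with scale 1/epsilon.

    Assumes adding or removing one person changes the count by at most 1,
    so the noise scale is 1/epsilon. Smaller epsilon means more noise.
    """
    scale = 1.0 / epsilon
    # The difference of two independent exponential draws with the same
    # mean follows a Laplace distribution centered on zero.
    noise = rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)
    return true_count + noise


rng = random.Random(0)  # fixed seed so the sketch is repeatable
block_populations = [0, 3, 16, 120]  # hypothetical true block counts
noisy = [noisy_count(n, epsilon=0.5, rng=rng) for n in block_populations]

for true, released in zip(block_populations, noisy):
    print(f"true={true:4d}  noisy={released:8.2f}")
```

In a sketch like this, the noise is small relative to a block of 120 people but can swamp a block of three, and a raw noisy value can even be negative. Forcing such values back into whole, non-negative numbers is one way oddities can creep into very small areas.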

But some demographers fear the protections went too far. At stake is more than $1.5 trillion in federal spending over the next decade, allocated using population-based formulas. The debate over the utility of differentially private data has been theoretical — until now, as months of upcoming redistricting efforts and court fights put it to the test.

Adam Kincaid, executive director of the National Republican Redistricting Trust, said the extra noise adds another level of complexity to redistricting and related litigation.

“The differential privacy stuff injects a level of uncertainty intentionally into the data that, as Democrats bring their lawsuits and as we bring ours, will be a question that comes up time and time again,” Kincaid told reporters recently.

As states work up new maps, they’ll have to deal with improbabilities like the “Boxcar Children” blocks to draw equal-population districts, despite what other records say about who actually lives there. Kincaid said his group has already identified several places where the official census results don’t match other records, particularly prisons.

On Aug. 12, the Census Bureau released highly detailed population data that local officials use for redistricting and other mapmaking purposes. But in some places the data show fewer people than prisons’ administrative records reflected last year, Kincaid said.

The Census Bureau first adopted differential privacy in 2019 and released several tests over the course of the past two years. The initial results contained some anomalies — graveyards populated with the living, as well as broad systemic biases that put more people in rural areas than there actually were, for example.

Earlier this year, a three-judge panel threw out a suit filed by Alabama and Rep. Robert B. Aderholt, R-Ala., which in part argued the differential privacy protections would make redistricting data unusable. The judges wrote the state would have to wait until the data’s release to prove any harm.

The Alabama Attorney General’s Office said it’s still looking at the data and may revive the suit. 

Census Bureau officials have said they made improvements in the process and the data released Aug. 12 will be good enough for drawing new legislative and congressional districts. Research the agency released on Aug. 5 showed that for the smallest levels of geography — census blocks — population totals varied by less than 5 percent.

Agency leaders have also pointed out that they have shuffled the data around for decades, swapping households and the like, without being particularly public about it. Using differential privacy simply allows the agency to publish the work behind it.

Cornell University demographer Jan Vink has conducted research on the latest census results, along with a series of tests the Census Bureau released earlier this year. Since the release earlier this month, Vink found more than 6 percent of blocks had some form of impossible result — a population greater than zero but no occupied households, no population but occupied households, or blocks with only children living in them. 
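The three kinds of impossible result described above amount to simple consistency checks on each block. The Python sketch below shows one way such checks could be written. It is not Vink's code: the `Block` fields, the `implausible_reason` function and the sample blocks are hypothetical, and the sketch ignores group quarters such as prisons and dormitories, where people legitimately live outside households.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Block:
    population: int           # total residents reported for the block
    occupied_households: int  # occupied housing units reported
    adults: int               # residents age 18 and over


def implausible_reason(block: Block) -> Optional[str]:
    """Return a label for the first failed check, or None if all pass."""
    if block.population > 0 and block.occupied_households == 0:
        return "population but no occupied households"
    if block.population == 0 and block.occupied_households > 0:
        return "occupied households but no population"
    if block.population > 0 and block.adults == 0:
        return "children only"
    return None


# Hypothetical blocks, for illustration only.
blocks = [
    Block(population=16, occupied_households=5, adults=0),
    Block(population=0, occupied_households=2, adults=0),
    Block(population=40, occupied_households=15, adults=30),
    Block(population=7, occupied_households=0, adults=7),
]

flagged = [b for b in blocks if implausible_reason(b) is not None]
share = len(flagged) / len(blocks)
print(f"{len(flagged)} of {len(blocks)} blocks flagged ({share:.0%})")
```

A real analysis would need to account for group quarters before treating a block with residents but no households as an error.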

The Census Bureau acknowledged the new algorithm would create such impossible or improbable results, such as “The Boxcar Children” blocks, before releasing the data.

However, Vink found that those problems occurred less with the August data release than in a May test. Earlier trials of the system had systemic biases toward putting extra population in rural areas, and against allowing a small area to remain one race.

“Until this latest release, this demonstration product, I didn’t feel comfortable about it at all,” Vink said. 

Still, the Census Bureau’s demonstration data has areas where the count differs by 5 percent or more from where it came out in 2010. Vink said for larger swathes of geography the effect disappears, but “for a particular place or city that could be a big problem.”

A research paper published by demographers from Penn State University and the University of Oklahoma found systemic shifts in rural and nonwhite populations. The researchers said that could present problems for uses of the data going forward, particularly for rural minority populations the Census Bureau does not publish other data for.

“It is imperative that data on these populations is accurate, as nonwhite populations in the rural reaches of the United States face significant levels of structural and interpersonal discrimination, resulting in worse health outcomes, higher rates of poverty, and lower educational outcomes, among other hardships, than their white neighbors and urban counterparts,” the paper said.