After centuries of putting pen or pencil to paper, the U.S. government is getting ready to rely on digital screens and the cloud for its first-ever primarily online census.
Starting March 12, households across the country are expected to be able to participate in the once-a-decade national head count by going to my2020census.gov to complete the online census questionnaire, which is set to be open to the public through July 31.
Under pressure to cut costs and keep up with these increasingly Internet-centric times, the Census Bureau is expecting, based on an earlier test run, about six out of 10 households that fill out a form on their own to do so online. For those who have limited Internet access or prefer to stay offline, the bureau is also collecting census responses over the phone and on paper forms, which are scheduled to arrive at some homes by mid-March and then in early April to every household that hasn't responded by then.
"People can reply almost anywhere, at any time," Census Bureau Director Steven Dillingham said last month in a written statement to lawmakers on the House Oversight and Reform Committee, while also noting the web form and call centers are available in 13 languages.
But the planned public debut for the online census form comes amidst heightened concerns about cybersecurity risks, disinformation campaigns and technical troubles.
NPR has learned that as a precaution, the bureau has increased its printing order to prepare enough paper forms for every home address in the country.
Still, a snafu with the upcoming digital rollout, some lawmakers and other census advocates worry, could derail the constitutionally mandated head count and undermine public trust in data that carry at least a decade's worth of implications across the U.S. The results of the count are used to redraw voting districts and redistribute congressional seats, Electoral College votes and an estimated $1.5 trillion a year in federal funding among the states.
"The Iowa primary debacle comes to mind"
A nightmare scenario played out less than four years ago with Australia's census. After a series of outages caused by distributed denial-of-service attacks inundating the website for that country's first-ever "digital-first" census in 2016, the Australian Bureau of Statistics decided to shut down the online form for nearly two days. On social media, it gave rise to the hashtag #CensusFail. The U.S. Census Bureau has been looking for lessons from Down Under since then.
But some census watchers in the U.S. say they don't have to look that far back for an incident that stokes their fears.
"I must tell you the Iowa primary debacle comes to mind when I think of the census going digital," Congressional Delegate Eleanor Holmes Norton, a Democrat who represents the District of Columbia, said with a furrowed brow before questioning Census Bureau officials during a House Oversight and Reform Committee hearing last month.
Unlike the smartphone app that delayed the reporting of results from this year's Iowa Democratic caucuses, the Census Bureau's IT systems are not expected to produce new census numbers in a matter of hours. The bureau has until the end of December to start releasing the results of the 2020 census, beginning with the latest state population counts.
Still, in the months leading up to that legally mandated deadline, the bureau faces an unforgiving timeline that involves not only launching an online census form, but also relying on new IT systems and other technology to hire, train and deploy as many as a half-million temporary workers to help complete the count. Beginning in mid-May, it's planning to send workers equipped with an iPhone app to collect and deliver information about people in households that have not self-responded to the census.
For those who do go online to fill out the census on their own, the experience will likely boil down to spending about 10 minutes on a government website. But for the federal government's largest statistical agency, it will be a culmination of years of largely under-the-radar planning, testing — including a practice run of the census in 2018 — and some last-minute changes that could turn this year's count into the most complicated yet.
"The Internet is here to stay"
For decades, the Census Bureau has tried to rely on the Internet to gather names, dates of birth, phone numbers and other personal information about every person living in the U.S. for the national count.
A little-known fact is that the first online U.S. census took place in 2000. The last-minute rollout was so quiet that the bureau did not push out any advertising or even issue a press release.
"Unless someone happened to stumble across the link to the on-line form, or had some connection to the Census Bureau, he or she would not likely have known about this response mode," according to the bureau's report on its unprecedented online efforts in 2000, when a sliver of a percent of the country's more than 281 million residents were counted that way.
Still, that 2002 report noted that a digital census was on the horizon.
"The Internet is here to stay. The exact form and function of Census Internet options, however, is largely undeveloped," the report said.
Bureau officials ultimately decided to pass on another opportunity to take the count online for the 2010 census, which turned out to be another paper-dependent count after a debacle involving handheld computers that census workers were supposed to use to collect information from non-responsive households. That count turned out to be the most expensive to date at $12.3 billion in 2020 dollars.
"Behind schedule and rushed to deploy its systems"
This year's census, estimated to cost $15.6 billion, could break that record.
Officials at the Census Bureau hope they can keep that number from ballooning in part by cutting back on paper and collecting most of the country's census information online, as well as automating other operations by going digital.
But last year, a report by the Office of Inspector General for the Commerce Department, which oversees the Census Bureau, said it found "fundamental security deficiencies" with the cloud-based IT systems for the 2020 census. Among the weaknesses, the internal watchdog group said, were incomplete plans for recovering lost census information stored in the cloud in case of a cybersecurity attack or disaster.
"Many of these deficiencies indicate that the Bureau was behind schedule and rushed to deploy its systems," the Commerce OIG wrote in its June 2019 report.
Questioned by Del. Norton last month during the House oversight committee hearing, Dillingham, the bureau's director, said the agency has "remedied" all of the issues highlighted by the OIG's report.
"We store the data in multiple areas in the cloud to ensure security, and we back that up regularly," explained Albert Fontenot, an associate director at the bureau in charge of the 2020 census, during the hearing. "We can recover data if we had a breach or a situation like that."
"A burden of proof"
The bureau says it has not identified any census data that were compromised during the 2018 test run of the census.
But in December, reporting by Reuters, which NPR has not been able to confirm, raised questions among some census watchers about the security of the bureau's IT systems. Citing anonymous sources, Reuters reported that while no data was stolen, the website created to collect census responses was "hacked from IP addresses in Russia during 2018 testing of census systems" and that "an intruder bypassed a 'firewall' and accessed parts of the system that should have been restricted to census developers."
In a rare written statement released hours after the report was published online last year, the bureau said the story "contains inaccuracies and outdated information," adding that it was "looking into every aspect" of the report and "taking all concerns seriously."
NPR has learned the bureau has since concluded "no 'hack,' breach, compromise, or cyber attack occurred" during the 2018 test run, according to a Feb. 3 letter that Dillingham wrote to Sen. Jack Reed (D-R.I.) and that the bureau's public information office provided to NPR.
Asked by NPR whether the incidents highlighted in the report involved IP addresses linked to Russia, Kevin Smith, the bureau's chief information officer, did not directly answer, saying instead there were no "nefarious" actions by "foreign entities" during the test run. "They basically were able to see the website until we further restricted the access to just U.S. territories and U.S. states," Smith added.
Reed, the top Democrat on the Senate Armed Services Committee, had asked Dillingham to provide an explanation for the incidents by Jan. 10, according to a letter NPR obtained from the senator's office. In a written statement, the bureau says it responds to letters from Congress "as soon as we are able to." But Reed tells NPR the bureau's delayed response "undermines confidence in their ability to function effectively."
The bureau says it's monitoring for cyber threats and reviewing its IT systems with help from the Department of Homeland Security, including its Cybersecurity and Infrastructure Security Agency, as well as federal intelligence agencies.
Reed, who has focused on cybersecurity issues when questioning bureau officials during Senate hearings, remains skeptical about whether the agency is ready to deal with ever-evolving threats.
"The Census Bureau has a burden of proof that they're building a census that is not going to be compromised either through a lack of manpower or through cyber intrusions, and they haven't done that to my satisfaction yet," Reed says.
The "iron door with five locks"
As a precaution for 2020, the bureau is blocking IP addresses based outside of the U.S. from accessing the online form. Smith, the bureau's CIO, acknowledges, though, that these geoblocking restrictions are a "very lightweight security measure," especially since bad actors can make themselves look like they're coming from within the U.S.
"It's like putting a screen door in front of your iron door with five locks," Smith says. "That screen door does a little bit, but not much."
Exactly how that "iron door with five locks" is set up, the bureau won't say. It's trying to avoid giving potential census disruptors an advantage. But that, in turn, has created what Smith referred to in a 2018 blog post as "an ongoing communications challenge" given the public scrutiny the technological changes for the 2020 count have drawn.
The bureau is on the lookout for phishing attempts that try to dupe people into entering personal information on fake census websites. If any are found, Smith says the bureau is planning to alert the public as soon as possible.
"But we're not necessarily in direct control as the census to shut down a website," Smith said during a 2018 public meeting at the Census Bureau's headquarters in Suitland, Md.
Preparing for the "Beyoncé tweet load"
Another unknown for the bureau is exactly how many people will try to use the online census form at the same time.
The bureau is trying to avoid repeating the botched 2013 launch of HealthCare.gov. Within two hours of its public debut, that federal government website began experiencing outages partly because it drew up to five times more than the expected number of concurrent users.
For the 2020 census online form, the bureau is trying to pace web traffic by staggering the mailing of letters that direct most U.S. households to my2020census.gov. While it's expecting about 120,000 users on the census website at the same time, the bureau has been trying to build a system at least five times as strong to handle 600,000 concurrent users. The system can be expanded if there are any higher spikes in traffic, according to Michael Thieme, an assistant director at the bureau who's in charge of systems and contracts for the 2020 census.
Internally, Thieme says, some bureau officials have been referring to a hypothetical scenario in which Beyoncé, whose Twitter account has more than 15 million followers, sends a tweet urging people to respond to the 2020 census, driving U.S.-based fans to the online form.
"That's what we call our viral load. Some people call that the Beyoncé tweet load," Thieme adds.
How much web traffic the online census form can actually take on, however, has come under question in recent weeks. The bureau says it recently discovered that the IT system it was planning to rely on as its primary system for collecting online responses could not handle 600,000 concurrent users "without experiencing performance issues," according to a Government Accountability Office report.
"It wasn't that we couldn't scale. It was that we noticed the slowdown in performance," says Lisa Pintchman, vice president of corporate communications for Pegasystems Inc., a Cambridge, Mass.-based software contractor that developed the system.
The bureau says it decided on Feb. 7 — just over a month before the online form's public debut — to switch to an in-house system that was intended to be the backup. The system built by Pegasystems, the bureau added, will still be used for other operations and is expected to be the new alternative option in case the in-house system goes down.
But Nick Marinos, a director at the GAO who leads IT and cybersecurity audits of other federal agencies, warns there is little time left for additional testing of the new setup.
"Late design changes, such as a shift from one system to another, can introduce new risk during a critical moment," Marinos told lawmakers during the recent House oversight committee hearing.
The all-paper backup
As part of a last-ditch backup, the bureau has made a potentially significant change to its census plans by ordering millions of extra paper forms from its Chicago-based printing contractor R.R. Donnelley & Sons, which was selected after a bungled bidding process. During the House hearing, Dillingham, the bureau's director, revealed that the extra printing was part of a contingency plan in case of "some type of catastrophe in which people could not reply online."
But the exact number of forms the bureau is preparing to distribute has been difficult to nail down. The bureau previously estimated it would need around 137 million paper questionnaires. But at a January press conference in Washington, D.C., Fontenot, the bureau's associate director for the 2020 census, said printing was "completed" and "over 120 million" questionnaires were ready to be stuffed into envelopes.
Speaking to NPR last month, however, Fontenot said a total of more than 150 million forms have been printed.
That would be enough to provide one to each of the 140 million-plus households in the country, giving the bureau what Fontenot considers "the maximum margin of safety" if what it hopes to be the first primarily online U.S. count has to fall back to all-paper.