Harvard Is Digitizing Nearly 40 Million Pages Of Case Law So You Can Access It Online And For Free

Harvard's Library Innovation Lab is working to digitize nearly 40 million pages of case law from the Harvard Law School library collection, so the public can access it online and for free. Here, a case law book that has been cut from its binding is ready to go through a high-speed scanner. (Brooks Kraft/Harvard Law School)

Not too long ago, a statement like this, spoken in the hushed, hallowed hallways of the Harvard Law School library, would have been considered heresy: "I think for court decisions, law books are becoming obsolete and even to some degree a hindrance."

That's Adam Ziegler, and he's no heretic. He's the managing director of the Library Innovation Lab at Harvard. Ziegler is leading a team of legal scholars and digital data workers in the lab's Caselaw Access Project.

"We want the law, as expressed in court decisions, to be as widely distributed and as available as possible online to promote access to justice by means of access to legal information," Ziegler said. "But also to spur innovation, to drive new insights from the law that we've never been able to do when the law was relegated to paper."

Historically, libraries have been collections — books, multimedia materials and artwork. But increasingly they're about connections, linking digital data in new and different ways. The Caselaw Access Project is a state-of-the-art example of that shift.

"So what's going to result from this project is a huge database of electronic, digital court decisions," Ziegler explained. "And the world of law has never seen that before."

'Unbinding The Law'

Harvard Law's collection, second only to the one kept by the Library of Congress, includes the civil and criminal case law decisions from every state and federal court.

Ziegler and his team estimate that the collection's 43,000 case law books average about 921 pages each. That works out to nearly 40 million pages (43,000 × 921 ≈ 39.6 million) that need to be digitized.

The law school has so many books that the majority are stored in a vast vault in a hidden hilltop repository in Southborough, out of sight and not very accessible to students and scholars.

Ziegler says the oldest decision in Harvard's case law collection dates back to Rhode Island's Court of Trials, circa 1647. He wants to make sure the collection endures forever.

"We're all bound by the law," he said. "We're all bound by the decisions that judges issue. We ought to be able to read them, and we ought not have to pay to read them."

The goal of the Caselaw Access Project is to liberate law books, making the contents available to anyone with an internet connection.

"We are literally, and sort of metaphorically, unbinding the law and making it available online for free, which is exciting," Ziegler says.

Before being scanned, a case law book must first be cut from its binding. (Brooks Kraft/Harvard Law School)

The books that are set to be digitized are first shipped from the Southborough storage facility to the law school library. The physical unbinding happens in a prep room, where Zach Bodnar, a digitization specialist, uses an X-Acto knife to carefully cut case law books from their bindings. Then a machine slices neatly through the spines of the books.

"The machine itself does chomp with more force than a great white shark," Bodnar said. "A fun tidbit."

From there, the books are sent to another room, where a high-speed scanner takes four different images of each page, processing 100,000 pages a day.

"They also apply metadata that give structure to the resulting file — so we know the name of the case, name of judges, the name of the court, the date on which the decision was issued," Ziegler explained.

One of Harvard's case law books goes through the high-speed scanner. (Brooks Kraft/Harvard Law School)

After being scanned, the unbound books are hermetically sealed in plastic along with their original bindings, using the same kind of sealing machine that meat packers use.

From there, the books are sent to a limestone cave in Kentucky.

After the case law books are scanned, they are sealed with their original bindings and shipped to Kentucky for storage. (Brooks Kraft/Harvard Law School)

"It's important to have it, just in case," Ziegler said. "If we need to reboot our democracy for some reason, then we'll have all these books in Louisville. But also, if we do our job right, that book will be a backup that's only needed if something goes wrong."

Mining The Data

The digital transformation turns the case law books into files that can be data-mined, and the information in them can be extracted for profit.

Harvard has granted Ravel Law an eight-year exclusive contract to use the case law information. The law school has an equity interest in the California-based company, which plans to use the data in new and innovative ways.

Daniel Lewis, CEO of Ravel, says the company has applications that can detect trends and patterns in the law, even tracking bias among judges. They also present data visually, revealing relationships in the law that have never been seen before.

"So you have this raw case that's now digital, and then what we can do is add machine learning on top of that. And by adding all these extra pieces of information we make it more possible to sift through millions and millions of documents to find exactly what you want," Lewis said. "And we do that by combining legal expertise with software engineering."

Ravel's applications turn case law data into legal narratives in a way that, according to the company, keyword-search databases such as LexisNexis and Westlaw do not. And while accessing the raw data is free, the analysis is going to cost you.

The Caselaw Access Project should be complete by next March, in time for Harvard Law School's bicentennial and the start of a new chapter in the law school's history.

This segment aired on August 30, 2016.

Bruce Gellerman, Senior Reporter
Bruce Gellerman was a journalist and senior correspondent, frequently covering science, business, technology and the environment.

More from WBUR
