The Internet Archive is in danger

FILE-On this Monday, Dec. 18, 2006, file photo, Internet Archive founder Brewster Kahle prepares a book for digital scanning in San Francisco.  Google and U.S. publishers settled a longstanding dispute over Google's book-scanning project Thursday, Oct. 4, 2012. Google already has scanned more than 20 million books. Publishers and authors sued, saying the project violated their copyrights. (AP Photo/Ben Margot)

More than 900 billion webpages are preserved on The Wayback Machine, a history of humanity online. Now, copyright lawsuits could wipe it out.

Guests

Brewster Kahle, founder and director of the Internet Archive. Digital librarian and computer engineer.

James Grimmelmann, professor of digital and information law at Cornell Tech and Cornell Law School. Studies how laws regulating software affect freedom, wealth, and power.

Transcript

Part I

MEGHNA CHAKRABARTI: Republican Congresswoman Elise Stefanik represents New York's 21st congressional district. She is one of President-elect Donald Trump's most loyal supporters. And that loyalty has been rewarded. Trump has picked Stefanik to be the next U.S. ambassador to the United Nations when his new administration takes office in 13 days.

Now, yesterday happens to be the fourth anniversary of the January 6, 2021 riots and attacks on the United States Congress. The certification of Trump's win went smoothly yesterday, his 2024 win. Unlike 2021, when Trump supporters violently attacked police officers, defecated in the halls of Congress and forced the halt of the peaceful certification of a free and fair election. On that day, January 6, 2021, Trump in his first presidency sat in the Oval Office and watched the entire attack unfold on television.

He did not lift a finger to protect the people on Capitol Hill, the nation's representatives, or this system of government. Which represents, of course, this nation itself. I'm going over this history because, after a time, people start forgetting. The forgetting is already beginning, both as a product of the natural passage of time, we just forget stuff that's happened a long time ago, but also as a product of the purposeful deletion and rewriting of history.

For example, since 2021, hundreds of people have been prosecuted and found guilty by juries of their peers for their various actions on the day they attacked the Capitol. Donald Trump says one of the first things he'll do after January 20th this year is pardon those people. So this brings me back to Representative Stefanik.

Last year, she was on NBC's Meet the Press. She called the January 6th rioters hostages.

ELISE STEFANIK: I have concerns about the treatment of January 6th hostages. I have concerns. We have a role in Congress of oversight, and I believe that we're seeing the weaponization of the federal government against not just President Trump, but we're seeing it against conservatives.

CHAKRABARTI: Her answer goes on to talk about Hunter Biden and Hillary Clinton. Now, this is quite different from what she said on the day of the attack four years ago. On January 6, 2021, when Congress was able to reconvene in the small hours of the night, Stefanik took to the House floor and made this statement.

STEFANIK: Americans will always have the freedom of speech and the constitutional right to protest, but violence in any form is absolutely unacceptable. It is anti-American and must be prosecuted to the fullest extent of the law.

CHAKRABARTI: Again, that's Representative Stefanik, January 6th, 2021. Now, we got that clip from C-SPAN and, bless C-SPAN, evidence of what happens in the floor of the House and the floor of the Senate remains there, for as long as C-SPAN's archive exists. Now, Stefanik also published a written statement on January 6th, 2021, and posted it on her congressional website. That statement reads in part, quote:

I fully condemn the dangerous violence and destruction that occurred today at the United States Capitol. Violence in any form is absolutely unacceptable and anti-American. Thank you to the United States Capitol Police, all law enforcement, the National Guard and the bipartisan professional staff of the United States Capitol for protecting the people's House, and the American people.

CHAKRABARTI: End quote. That was on her website, January 6th, 2021. But you know what? I'm going to pull up my computer here.

Here it is right here. If you look for that statement today, and if you happen to be near a computer, and you want to do this, I'm telling you, you will not find that statement. Here is the URL for where that statement once was, okay, and you can try it. It's, I'm going to do it right here. HTTPS, secure, right?

And then you go Stefanik, which is S T E F A N I K, S T E F A N I K, dot house, dot gov, slash 2021, slash one, slash Stefanik, again, dash statement, dash violence, dash united, dash states, dash capitol. Love those SEO URLs. Okay, so then hit go, enter. I wonder if you got what I got. I got a website that says, Error. The page you have requested does not exist or is undergoing routine maintenance.

It still says Elise Stefanik serving New York's 21st district in the upper left-hand corner. And in other words, that statement on Congresswoman Stefanik's website has been taken down. Now the problem here is that of course in the 21st century we use the internet as our history book, notebooks, bookmarks, primary source, our entire storage and filing system to document the story of ourselves and our nation.

So what happens when at the tap of a key that story can be so easily erased and then plausibly denied. What do we lose with the record of our lives and our action when we lose those very records? There is one place where that record remains preserved. It's called the Wayback Machine, and it's run by the non-profit Internet Archive.

That is how I found Stefanik's 2021 statement. Do this with me again. Just go to web.archive. Actually, let me go back here and copy the original one a little bit. There we go. Okay, then you go to web.archive.org. Alright, and then when that loads up, there's a URL you can enter for the old page you're looking for.

Gonna just paste it in there, and then you click the places where the archive has scraped it and preserved it, and you click there, and there it is. The original statement. As it appeared on the website, on Stefanik's website, the day it was first posted, January 6th. You see it right there. So what happens if the Internet Archive, the Wayback Machine itself, ceased to exist?
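The lookup described above can be sketched in code. This is a minimal sketch, assuming the Wayback Machine's publicly documented address patterns: a `web.archive.org/web/<timestamp>/<url>` replay address and an `archive.org/wayback/available` lookup endpoint. The function names are hypothetical, and the Stefanik address is reconstructed from the way it was read aloud on air, so it may not match the original address exactly. The code only builds addresses and makes no network requests.

```python
from urllib.parse import urlencode

# Assumed address patterns for the Wayback Machine (see lead-in).
WAYBACK_REPLAY_PREFIX = "https://web.archive.org/web"
WAYBACK_AVAILABILITY_ENDPOINT = "https://archive.org/wayback/available"


def wayback_snapshot_url(original_url: str, timestamp: str = "*") -> str:
    """Build a Wayback Machine address for an original page.

    timestamp is a YYYYMMDD (or longer) string; the default "*" asks for
    the overview of all captures instead of one specific capture.
    """
    return f"{WAYBACK_REPLAY_PREFIX}/{timestamp}/{original_url}"


def availability_query(original_url: str, timestamp: str) -> str:
    """Build a query asking which capture is closest to the timestamp."""
    query = urlencode({"url": original_url, "timestamp": timestamp})
    return f"{WAYBACK_AVAILABILITY_ENDPOINT}?{query}"


# Address reconstructed from the spoken version in the broadcast (an assumption).
statement_url = (
    "https://stefanik.house.gov/2021/1/"
    "stefanik-statement-violence-united-states-capitol"
)

print(wayback_snapshot_url(statement_url))              # overview of all captures
print(wayback_snapshot_url(statement_url, "20210106"))  # capture nearest Jan. 6, 2021
print(availability_query(statement_url, "20210106"))
```

Pasting either of the first two printed addresses into a browser is the same step as typing the old URL into the search box at web.archive.org.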

Do we take one more step towards the world of Winston Smith, the hero of George Orwell's classic 1984? Quote:

Every record has been destroyed or falsified, every book rewritten, every picture has been repainted, every statue and street and building has been renamed, every date has been altered, and the process is continuing day by day and minute by minute. History has stopped. Nothing exists except an endless present in which the Party is always right.

CHAKRABARTI: End quote. Due to some very important court cases, active right now, in reality, a world without the Internet Archive is not impossible to imagine. So joining me now is Brewster Kahle. He's the founder of the Internet Archive and a digital librarian and computer engineer. Brewster Kahle, welcome to On Point.

BREWSTER KAHLE: Oh, great to be here. Thank you, Meghna.

CHAKRABARTI: What do you imagine the world might be like if the Internet Archive or the Wayback Machine ceased to exist?

KAHLE: The Internet Archive is the only real public record of the broad world wide web, but also all sorts of other things like old television, old books, that are all a cooperative effort of thousands of libraries to build a record of our time and make it as publicly available as we can. That's what the Internet Archive is, Archive.org, is a free service. It's used by millions of people a day. It's about the 200th most popular website of all. The good news is people want old stuff.

CHAKRABARTI: Yeah, they want old stuff because it's part of what makes us who we are.

And do you dare imagine, what would we do if we didn't have access to that old stuff that has only ever existed in this digital format? Do you dare imagine what that would be like?

KAHLE: It would mean that people wouldn't be able to be held as accountable for what it is they said.

But I think more broadly, they just wouldn't be able to remember. Just we get emails all the time from people just being so delighted that their old websites are still around. They're available or their parents' websites or their memories from their youth, their old alma mater.

The World Wide Web is magic in the making. It's that everyone can be a publisher. But Tim Berners-Lee's system of the World Wide Web was too simple. It only comes from one place. And that one place can be changed or deleted at any time. The average life of a web page is 100 days before it's changed or deleted.

Sometimes on purpose, like what you're saying, they want to change history. And sometimes just because it just fades off. So we need a record. We need a vibrant library system and that's what's under threat.

CHAKRABARTI: Yeah, I definitely accept the argument that the first ever web page that I made back in my college days with the little like dancing gifs of Bart Simpson, that doesn't necessarily need preservation.

KAHLE: (LAUGHS)

CHAKRABARTI: So that URL is dead and gone. But what you're saying, though, more is that this is who we are. And if who we are only has a lifespan of a hundred days, I'll come back to this later, but it really brings to question of like our engagement with the past and our belief in what is true in the present.

Now we have only about a minute left in this first segment, Brewster. Can you remind me, when did you come up with the idea of the Internet Archive?

KAHLE: The Internet Archive started in 1996 in the early days of the web. It was to basically build the library we'd been dreaming of forever, and I'd been working on since 1980, but others had been working on for actually much longer.

The idea of having a library system that worked better than what we had growing up has been a lifelong dream for myself and many others. The library of Alexandria is a constantly renewed mythological goal of trying to make the published works of humankind available, standing on the shoulders of giants.

For me, it was a pretty obvious step that we needed to do this. And so we created the World Wide Web, a little bit in a crufty way, but now we have the Wayback Machine to help fill in some.

Part II

CHAKRABARTI: So Brewster, how does it work? How are you storing 900 billion webpages? How do you do it?

KAHLE: Oh, it's just miracles of current computers. So we own our own computers at the Internet Archive. It's not in some cloud someplace, which is somebody else's computers.

Libraries take preservation very seriously. And there are about 1,300 libraries, including the National Archives, the Library of Congress. And a thousand libraries that basically crawl these, at these frequencies, that we collect over 1 billion URLs every day, 1 billion. And those go and are stored in their full original form.

With the, on hard drives and then they're indexed to be the Wayback Machine. So if you go to the Wayback Machine at Archive.org, you can just type in a URL and see past versions and see the web as it was. So if you click on say a 2001 political website, you'll go and click around that world as it existed then by pulling it out of the archive. And you could see all of the changes that were made to every URL when they disappeared or whatever.

Yeah, it's used by millions of people a day.

CHAKRABARTI: That's what I did for this press release that Congresswoman Stefanik released in 2021 and I can see I still have the Wayback Machine page open here for, there it is, beginning, quote: This is truly a tragic day for America. I fully condemn the dangerous violence.

And then there's a little, there's like a timeline at the top of the page that shows all the times that page was scraped by the Wayback Machine, and does it show even the changes to the page every time?

KAHLE: Yes, you can, in the upper right, you can click to see if they've changed a word or phrase, but often it's just pages just completely disappear.

And you can see, so the evolution, past editions have always been very important. It's the memory hole problem. It's the nightmare of being able to go back and change recorded history, and libraries as being third party, nonprofit, public services have always played a role in making a record and making that publicly available, as well.
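The idea of comparing two captures of the same page to see whether a word or phrase changed can be sketched with a plain text diff. This is a minimal sketch using Python's standard `difflib` module, not the Wayback Machine's own comparison feature; the function name `changed_lines` is hypothetical, and the two sample texts are illustrative strings borrowed from earlier in this conversation rather than real captures.

```python
import difflib


def changed_lines(earlier_text: str, later_text: str) -> list[str]:
    """Return only the removed ('-') and added ('+') lines between two texts."""
    diff = list(
        difflib.unified_diff(
            earlier_text.splitlines(),
            later_text.splitlines(),
            fromfile="earlier capture",
            tofile="later capture",
            lineterm="",
        )
    )
    # The first two entries are the '---' / '+++' file headers; skip them,
    # then keep only real additions and removals (not '@@' or context lines).
    return [line for line in diff[2:] if line.startswith(("+", "-"))]


# Illustrative texts only (assumptions, not actual archived captures).
earlier = (
    "Elise Stefanik\n"
    "Violence in any form is absolutely unacceptable and anti-American."
)
later = (
    "Elise Stefanik\n"
    "The page you have requested does not exist or is undergoing routine maintenance."
)

for line in changed_lines(earlier, later):
    print(line)
```

Identical texts produce an empty list, which corresponds to a page that did not change between captures.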

CHAKRABARTI: So where do you get the funding for this? Because it seems like a very large undertaking.

KAHLE: It is and it isn't. But, yes, there are. We get about one third of our income from libraries paying us to collect web pages or digitize books and records for them. About one third from major donors and foundations and about one third from end users.

And the same kind of NPR begathon at the end of the year, of please, please. And we have over 150,000 people a year that go and say, I want to support access to history. And so it's about a $20 million, $25 million a year organization. Wikipedia is about 10 times that, but both of these are less than the San Francisco public library.

So even just San Francisco, that's not Alameda, we're tiny by comparison. So it is possible with these digital technologies to make copies of these materials, preserve them, and then even put them in other locations for long term storage.

CHAKRABARTI: So other copies of the servers, right?

Or the storage units essentially that you have?

KAHLE: Yes, absolutely.

CHAKRABARTI: You only do a begathon once a year, Brewster? You gotta catch up! We do it like five times a year! Come on! But, so it's not just websites though, right? Like here we get into the nitty gritty. There's more, it's basically, and correct me if I'm wrong, but it's everything that's digitally available through the web.

KAHLE: We try to basically collect everything that's digitally available through the web. Now we can't keep up with everything, say, in YouTube, for instance. But if it's linked to, or linked from a Tweet, then we try to get it. But also, we try to record television. Worldwide television in cooperation with libraries around the world.

And try to make that searchable so that those C-SPAN that you were referring to is also recorded. And you can only get clips from us, and to search and then you can get, we can send you a thumb drive and you can borrow that program. If you want to reuse it for your documentary. Then you have to go and license it or something.

But it's available as a library, as a record. And it's very important that it's not just from one place. Because those are too easily manipulated, and they go out of business all the time. The Internet Archive is the place that websites that are long since dead. Geocities, old people's blogs all these past the SoundClouds, the Bandcamps, the internet music archives that existed 20 years ago and are long since dead.

Those hold fantastic works, creative works of people that they love to be able to get back, and their old hard drives and phones are long gone.

CHAKRABARTI: So music, you mentioned television books. This is all stuff that intellectually also belongs to people, and as you said at the very beginning of the show, one of the goals of the Internet Archive is to not only preserve this material, but make it accessible to everyone.

Isn't there a conflict there, right? Because we hear all the time about artists thinking, having their stuff listened to or read, but not receiving a single penny of remuneration from that.

KAHLE: Ah, that's how publishing has always worked. They basically, in the old days, back when we were growing up, publishers would make copies, sell them to libraries and individuals.

The publishers then would pay some of it back upstream. And if there are lots of publishers, then authors and musicians had multiple to pick from. Fewer now, but that's a different issue. And then these libraries would preserve them, because they've paid for the works.

And the question is, how do we move into this digital era? And what libraries do is they make things not as available through a bookstore or record store or something like that. They're available to those, the researchers that want to have access to the old versions.

It's crufty versions, but they're very important to have a record of them. And not, we don't compete really with the, I don't think that the blog of this radio program is going and complaining that the Internet Archive has it buried someplace in the Internet Archives collections.

It's because people will go to WBUR's podcast to go and find it. That parallel path of libraries and publishing have existed for thousands of years.

CHAKRABARTI: But of course, the difference is that we are a nonprofit that we put our work out there for the public good, and we want as many people to have access to it at zero cost.

That's not necessarily how the publishing business works. So Brewster, hang on here for just a second, because, as promised, we need to talk about these court cases that have come up regarding what the Internet Archive does. And in order to get the legal view on that, I'm going to bring James Grimmelmann into the conversation.

He's a professor of digital and information law at Cornell Tech and Cornell Law School, and he studies how laws regulating software affect freedom, wealth, and power. Professor Grimmelmann, welcome to On Point.

JAMES GRIMMELMANN: Hi, it's great to be here.

CHAKRABARTI: Okay, so first of all, there's two primary cases, or really two cases we need to talk about.

The first one is Hachette v. the Internet Archive. Tell us about that case.

GRIMMELMANN: So this is a case about the Internet Archive's use of book scans. The Internet Archive, in collaboration with other libraries, and like many organizations, has been digitizing books. They get a physical copy of it, they put it in a book scanner, they take photographs of each of the pages, they recognize what the words are, and now they have a digital record of what used to be in a physical book.

So Brewster was talking before about preserving the web. Those are things that were accessible online at one point, but when you're talking about physical books, they've never been previously available digitally. And so this is an additional way of having archival copies of them that can be preserved.

So in addition to digitizing the books for preservation, the Internet Archive also made them available to people for reading, in metaphorically the same way that a library with physical books would. You log in with your account, you check out the book, and then it's available to you to read on your computer until the end of your borrowing period.

You return the book, and you can check out something else. The idea is that people circulate a copy of this book in the same way that a library would circulate a physical book from its shelves.
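The lending rule described here, where a digital copy circulates only while an owned physical copy sits unused, can be sketched as a small data structure. This is a minimal sketch under stated assumptions, not the Internet Archive's actual system: the class and method names are hypothetical, and real lending would also need borrowing-period expiry and access controls, which are omitted.

```python
from dataclasses import dataclass, field


@dataclass
class LendableTitle:
    """One scanned title, lent no more widely than the copies owned (hypothetical model)."""

    title: str
    owned_copies: int                      # physical copies set aside
    borrowers: set[str] = field(default_factory=set)

    def check_out(self, patron: str) -> bool:
        """Lend to a patron only if an owned copy is not already on loan."""
        if patron in self.borrowers:
            return True                    # patron already holds a loan
        if len(self.borrowers) >= self.owned_copies:
            return False                   # every owned copy is checked out
        self.borrowers.add(patron)
        return True

    def check_in(self, patron: str) -> None:
        """Return the loan so another patron can borrow the title."""
        self.borrowers.discard(patron)


book = LendableTitle("A mid-century history title", owned_copies=1)
print(book.check_out("reader_a"))  # True: the one owned copy is free
print(book.check_out("reader_b"))  # False: the copy is already on loan
book.check_in("reader_a")
print(book.check_out("reader_b"))  # True: the copy was returned
```

The cap on simultaneous loans is the point of the design: the number of readers at any moment never exceeds the number of copies owned.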

CHAKRABARTI: Wait, can I just jump in here? So just to be clear, that digital copy that you're just talking about is in reference to how it works at your local library.

GRIMMELMANN: So the local libraries have done something similar with licenses from publishers. The publisher gives them a digital file for an e-book, and they will let you use it. One person or some number of people read it at a time, and they pay the publisher for the e-book that they lend out. It's a kind of imitation of the model of physical books, where the number of copies that circulate at any one given time is limited.

CHAKRABARTI: But Brewster, is that what the Internet Archive is doing with its digital books? Because a court said, I'll read the court ruling here in a second, but a court found the Internet Archive basically in violation of copyright law with its book scanning program.

KAHLE: Yes, the Internet Archive, working with other libraries, basically has a physical copy, keeps that aside, and then lends the digitized photographs, as James put it, of these books to one reader at a time, so it's limited, on what's also a little different is what happens, from what I think James said, libraries do, is they actually don't even have copies of the digital books from those publishers.

They just pass the readers on to the publisher's database products, to their web page, if you will, and pay for the right to go and do that. So the libraries actually, in the e-book world, never get a copy. They pay and pay, but they've never bought a copy. Digital ownership is key here. So mostly the Internet Archive's collections are old 20th century materials.

We link them into Wikipedia. So that people can go and look at the Wikipedia links to go and see, does that support the statement that's there? They just get a snippet, and then if they want to see more, they have to borrow or buy the book.

CHAKRABARTI: Okay. But the Internet Archive, to be clear, lost this case. So let me provide a little bit more background.

I believe it was first filed in June of 2020, and in the Southern District of New York. It wasn't just Hachette, the publisher. It was also HarperCollins, Penguin Random House, and Wiley. They were organized by the AAP, or the publishers. It involved 127 works from these publishers and basically what was found by a lower court judge and then affirmed by the Second Circuit was that, I'll say, I'll quote the ruling here.

Is it fair use for a non-profit organization to scan copyright-protected print books in their entirety and distribute those digital copies online in full for free, subject to a one-to-one owned-to-loaned ratio between its print copies and the digital copies it makes available at any given time, all without authorization from the copyright-holding publishers or authors? The court applied the Copyright Act, and they said the answer is no.

CHAKRABARTI: Professor Grimmelmann, explain this ruling to me. What in the eyes of the Second Circuit because the Internet Archive declined to appeal up to the Supreme Court, what is the violation here of the Copyright Act?

GRIMMELMANN: The issue is that the Internet Archive's lending program is, it looks and works a lot like traditional library book lending, but technically there are a bunch of computer implementation details that are different. And the court thought that those details make it fundamentally unlike library lending and not protected.

So libraries have always relied on another copyright defense. First sale. Once you buy a copy of a book, it is yours to sell, give away or lend out as you see fit. So libraries would always buy books and then first sale would protect their right to lend them out to any of their patrons. First sale protects your right to work with that particular copy.

If you buy one copy at the bookstore, you can sell that one copy. If you buy 10, you can sell or lend out those 10. The issue is that when you go to the digital world. The Internet Archive isn't distributing a physical artifact, like a book with paper and ink, to its readers. It's giving them digital access.

And the way that computers work when you want to give digital access to a file, it involves making a copy of the bits on that file on a different computer. And so the publishers argued and successfully persuaded the court that this is making a separate copy from the original book, the original file on either the book, paper form or the file on the archive servers, and that additional copy triggers copyright law and isn't protected by first sale.

CHAKRABARTI: And that is different than what Brewster was mentioning earlier, which is that libraries, when they have their digital copies that they lend out, they're getting those digital copies from the publishers themselves.

GRIMMELMANN: Yeah, the libraries are getting permission from the publishers which they have to pay for. They license the right to get people to read those copies, but really, they're not even licensing anything to their customers. They're just paying the publishers for the publishers to give the library's patrons access.

They're basically points of sale for publishers, for reading a book for a couple of weeks.

CHAKRABARTI: Okay. Brewster Kahle, I have a statement here from the Association of American Publishers, which brought the suit, and they said that they're thrilled to see that the Second Circuit's interpretation, quote, 'leaves no room for arguments that controlled digital lending is anything more than infringement, whether performed by commercial or non-commercial actors.'

Now we've got to take a quick break here. But when we come back, I want to get your response to that, Brewster. And then we'll talk about how the music industry is also applying legal pressure to the Internet Archive. That's all in just a moment.

Part III

CHAKRABARTI: So Brewster, going back to that statement from the publishers, when they said, basically, and I'm going to paraphrase here, there's no difference between what the Internet Archive is doing by scanning these books and slapping a book down on a photocopier and pressing copy. And in every book, it says you cannot do that, you cannot make copies of this book. So do you have an argument against that?

KAHLE: Oh, yeah, the publishers lend out their electronic books, whether it's the Harry Potters or the like, using the same technologies to protect it from having multiple readers read it, that we use for the digitized, our dusty-musties, the mid 20th century books about World War II, those sorts of books are protected with the same technology.

So I'd say I think the bigger picture here is that, yes, there was a New York court that sided with the publishers, but other courts side with libraries. For instance, in Europe, when almost this exact same case came up, of going and lending digitized books or digital books from libraries, all of Europe affirmed it, both at the local level in Holland and at the European level. China has allowed digitizing and lending for 15 years, India also concerned with educating their public and supporting libraries has also been supportive of educational exemptions.

So the United States, 100 years ago, led in libraries, the Carnegie libraries. And you have to remember that the publishers, in general, sue libraries over and over again about things like lending and have forever. But the legislatures and the judiciary in the United States 100 years ago said it was important to have libraries and archives, and they supported libraries, and we made the Carnegie library system.

What will be this generation's, what countries are going to lead in libraries is really unknown, but that's the big question.

CHAKRABARTI: Philosophically, I agree with you. I am a giant proponent of making information easily accessible for the good of the general public. But James Grimmelmann, let me turn back to you here.

It doesn't seem to me that is what this case is about. As you said, it's about the existence of that one digital copy. Were the publishers' interest as narrow as that or did they actually have some sort of other strategy in play? Are they fearful that the Internet Archive, instead of doing those, like Brewster said, those dusty-musties about World War II history from 70 years ago, that they're going to move into digitizing Harry Potter.

GRIMMELMANN: Yeah, I think the publishers are concerned about what they see as a principle and a slippery slope. The Internet Archive is not going to put them out of business. But they're afraid that there will be lots of other libraries that have lower standards and take fewer safeguards, and work with books that are front list titles. And that if they don't sue everyone who is crossing their radar, that eventually they won't be able to enforce any restrictions, because everyone will just download free copies from the internet of everything.

CHAKRABARTI: I just want to make a note that we did reach out to representatives in the legal team from the Association of American Publishers and they did not respond, but we did have that statement that I read earlier. Okay. So the other case, so that, by the way, that case is complete. I just want to remind everyone. The Internet Archive declined to appeal to the Supreme Court.

So the Second Circuit's ruling stands. Brewster, you lost. I just want to, I'm sorry to put it so bluntly, but that is what happened. So we're going to ask, I'm going to ask you in a minute about the implications of that for the archive. But, Professor Grimmelmann, on the music side of things, there's another case, Universal Music Group.

And they are taking aim at the Internet Archive's Great 78 Project. What is this case about?

GRIMMELMANN: So this case is about another of the Internet Archive's efforts to digitize and make available another source of old media. In this case, early 78 records. So 78 RPMs. This is the first major generation of widespread commercial records. And it's an amazing history of the early sounds of recorded music. And so the archive took lots of these old ones, went to great effort to make digital versions of them, and put those on their websites so people can experience them without the risk of working with, finding and potentially destroying extremely old and fragile records.

CHAKRABARTI: And to be clear, these are not in the public domain.

GRIMMELMANN: There's a mixture, because some of these would be works that would now be public domain. Some of them were not part of the federal copyright system, but were added to it by the recent Music Modernization Act. It's a very diverse and in some ways legally complicated set of works.

CHAKRABARTI: Okay. So what's the issue here, though? Is it the same thing, that the Internet Archive is making this digital copy of these records, which, by the way, for those of you who are young enough not to know, we're talking about vinyl here. And that the existence of that copy in and of itself is the problem?

Or is it the fact that now many more people have access to it?

GRIMMELMANN: In some ways, they're very similar cases. Both are about digitizing these old works and then making them available. There's some legal differences between the two, due to the different status of music in the copyright system.

And there's some technical differences. But it's not fundamentally different in kind, it's an objection by copyright owners that other people are making digital archives and then giving the public access to those archives.

CHAKRABARTI: Okay, so we reached out to Universal Music Group, they did not respond, but we also reached out to the Recording Industry Association of America, RIAA, and their chief legal officer Ken Doroshow sent us back this statement, and he actually talked about some of the legal differences here. He says, quote:

Congress took decisive action to protect pre 1972 recordings in the Music Modernization Act.

And then he says, the Internet Archive's mass, quote-unquote:

Mass scale copying, streaming, and distribution of the thousands of pre 1972 recordings are blatant violations of those established laws.

CHAKRABARTI: How do you read that, Professor Grimmelmann?

GRIMMELMANN: I think he's saying that the Music Modernization Act singled out music as special for extra protection. I don't know if that's right. There are some ways in which music gets a little bit heightened protection in U.S. copyright law, and a lot of ways in which it gets less than other kinds of works.

The MMA reduced some of those disparities, but I wouldn't say that it elevated music and old recordings above everything else.

CHAKRABARTI: Brewster, the Recording Industry Association of America calls your Great 78 project, quote, yet another mass infringement scheme that has no basis in law. What's your response to that?

KAHLE: We're a library. So this project is a combination of over a hundred different libraries and collections that have participated in building this collection. And it's available in the same kind of way to the same kinds of users that their collections were when they were in their basements.

So one of the first collections that came in was from the Boston Public Library. This collection of 78 RPM records is actually before vinyl. These are the old shellac recordings, where we have to wind it up, the horn, the dog. They stopped being viable in 1950, and people don't even have the players for these.

So to understand what America sounded like, you actually had to either go and find these things and then some would destroy them by putting them on these old record players, winding them up and listening to them in a crufty way. And people just weren't doing it. So in general, the idea is you'd go and make this available to researchers, which are about the only ones that care about the old crackly things.

And this project has been going on for 10, 15 years and has been demonstrated at the music industry forums and conferences. They loved it. So why they are going and trying to put the Internet Archive out of business is a different sort of issue here. It's not a money issue.

Most of these things have only been listened to by researchers about a hundred times. If you were to pay full Spotify rates, all of the things they're complaining about would be in Spotify rates because people don't listen to crackly old books, records. It would be about $10.

Yet they're suing for $600 million. Why?

CHAKRABARTI: Professor Grimmelmann, that $600 million, is that what constitutes the threat of putting the Internet Archive out of business?

GRIMMELMANN: Yes, it is. Copyright has something called statutory damages, where the court is authorized to award up to potentially $150,000, even without proof that the defendant made that much money or the plaintiff lost that much money.

It's just meant to be a kind of deterrent. And when you multiply $150,000 by thousands of recordings, you get up into the high millions, hundreds of millions, very quickly.

CHAKRABARTI: Let me ask you, Professor Grimmelmann, the same question where I started with Brewster at the top of the hour. Imagine for a moment that, given the precedent of the publishers' case, that if this Universal Music Group case goes against the Internet Archive as well, and they are forced to pay hundreds of millions of dollars, which they can't, and they have to shut down, hopefully there's other alternatives, but if they had to shut down, what would we lose, in your mind?

GRIMMELMANN: The other alternatives are going to be what the publishers would call pirate sites. They're going to be people who make archives completely illegally. They're going to be people who do it without the standards of archivism, I don't remember the word, but the professional standards of actually trying to curate and organize these large masses of material.

We're going to have a huge morass of stuff out there, which will be polluted and overrun with advertising and malware and deep fakes, and it'll all just be this huge mishmash. I don't think they're actually going to stem the tide of anything. If anything, we're going to have a greater disproportion of stuff that's ephemeral rather than the enduring historical classics.

It's not like they're going to stop piracy. They're just going to make our past more confusing, messier and harder to access.

CHAKRABARTI: So Brewster, are you preparing for the possibility that you have to do something with these materials or that, do you have a plan for what you might do if the Internet Archive does not have a future?

KAHLE: Running a library. We always use the library, and you'll always come away with a book if you talk to a librarian. There's a wonderful book called The Library: A Fragile History. And it basically says, what happens to libraries? And it starts with the old Akkadian libraries, the Library of Alexandria, but there's all these libraries in between.

And what happens to libraries is they're destroyed, and they tend to be actively destroyed by the powerful. And it used to be church and government, and now it's corporations and government. And so that's what happens to libraries, so design for it. And what libraries have generally done is they try to work with each other, but a lot is lost.

So if you take the history of the Chinese libraries over the millennia, because they have a long history, the libraries would be built up and then there'd be a new dynasty in town that doesn't like the old stuff around. They don't want it available. And then they destroy the libraries.

People steal the books. They often punish those people if they're caught with the books with death, in the case of old Chinese. And then when the new dynasty comes around, they want them back and they start to build back the libraries again. Will that happen in this country? Probably. When? Don't know.

Other parts of the world go through different cycles, though. And going and having collections and materials that are archived in different places. As I said, the Europeans are taking a very positive view towards libraries in this time when the United States is not.

CHAKRABARTI: My mind is suddenly drawn to another great classic, Fahrenheit 451, right? In that world, human beings memorized the books that were burned. Essentially, the material was contained somewhere else. And we just have a few seconds left. James Grimmelmann, I'll let you have the last word. There are other major organizations. I'm thinking the Library of Congress.

Is there a way for that institution, whose duty is to document the history of this country, do they have a role here, to be a container for this information?

GRIMMELMANN: They could or should. There was an effort to create a Digital Public Library of America a few years ago.

There is an absolute need for this to happen. The Internet Archive has stepped up to fill some of our archival needs, but there's much more than they can do.

This program aired on January 7, 2025.

Claire Donnelly Producer, On Point

Claire Donnelly is a producer at On Point.

Meghna Chakrabarti Host, On Point

Meghna Chakrabarti is the host of On Point.
