Using Forensic Linguistics To Decode An Anonymous Writer's Identity
The highly anticipated tell-all book by an anonymous senior official in the Trump administration is set to be released to the public on Tuesday.
And while some may pore over what are expected to be alarming details, others will be fixated on decoding the author’s identity through the book’s use of vocabulary and language patterns, otherwise known as forensic linguistics.
“Forensic linguistics is rooted in this basic idea that you and I and everyone you know has a distinct way of using language, of speaking and writing,” says Kavita Pillay, co-host of the podcast “Subtitle,” which is about the science of languages and the people who speak them.
“A forensic linguist can look at those patterns of speech in a person and figure out things that you may not expect: your level of education, the kind of work you do, where you grew up,” she says. “And then apply those patterns and that understanding that also a linguist would have, and apply it to a legal setting.”
The anonymous writer of a 2018 op-ed about the Trump administration used the word “lodestar,” which led many to speculate that the author was Vice President Mike Pence.
“Mike Pence has used that word with some frequency, so in a 965-word op-ed, a shiny word like that is going to stand out,” Pillay says. “And it had a lot of people pointing at Mike Pence as the possible author, but a forensic linguist might look at something like that and say, is that a linguistic smokescreen?”
In the case of the upcoming tell-all book, forensic linguists will have a lot more material to analyze, Pillay says. The editor of an anonymous book such as this also plays an interesting role in concealing the author’s identity.
“Typically an editor's trying to preserve someone's voice and identity,” she says. “But the editor of a book like this, someone who's trying to remain anonymous, would have to kind of blur certain things that might point out who it is.”
On how forensic linguistics helped reveal the identity of the Unabomber
“[The Unabomber] was so careful not to leave fingerprints or any kind of DNA evidence, but he wrote this 35,000-word manifesto, and it became the strongest evidence against him. And it was the first time that a judge granted a search warrant based solely on linguistic evidence.
“He referred to women as broads. He referred to black people as Negroes. So it obviously put him in a sort of coming of age somewhere before the civil rights movement. He used particular phrases that linguists were able to say probably put him having grown up in Chicago. He also did use some linguistic smokescreens. Like this is a guy who went to Harvard and who had a Ph.D., and he used phrases like, I think there was once or twice where he was trying to portray himself as someone who did not have advanced degrees. And so linguists looking at that said, 'Does he not have an advanced degree or is it a smokescreen?' ”
On the methods forensic linguists use and how technology has advanced in the field
“Some forensic linguists rely on the tools of traditional linguists and some are more reliant on computational methods and most are probably using a combination, you know, maybe a bit like translation. If you use something like Google Translate or other translation software, it's a great tool. It's getting better all the time. But if you're trying to translate something of any length or depth or meaning, you're going to need some human involvement as well. So the tools of forensic linguistics are ever-expanding because of things like [artificial intelligence], but forensic linguistics as a field is still heavily dependent on humans to guide it.”
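As a rough illustration of what a "computational method" can look like, the sketch below compares two texts by how often each uses a handful of common function words. This is an assumption for illustration only: the interview does not name any specific technique, the word list is hypothetical and far too small for real work, and the names `FUNCTION_WORDS`, `profile` and `cosine_similarity` are made up here.

```python
import math
import re
from collections import Counter

# Hypothetical, tiny list of function words; real analyses use much larger lists.
FUNCTION_WORDS = ["the", "of", "and", "to", "a", "in", "that", "is", "it", "but"]


def profile(text):
    """Return the relative frequency of each function word in the text."""
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(words)
    total = len(words) or 1  # avoid dividing by zero on empty text
    return [counts[w] / total for w in FUNCTION_WORDS]


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)
```

A score near 1 means the two texts use these words in similar proportions. On its own that is weak evidence, which fits Pillay's point that a human still has to guide the analysis and judge what the numbers mean.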
On how forensic linguistics plays a role in cracking down on hate speech and cyberbullying
“I think most people probably don't think that a word like forensic or linguistics applies to their life, but if you work in a company and they're using a software to keep track of your emails and to read your emails and such, that's an everyday form of forensic linguistics, right? It's being used on messaging services. Think about how Facebook or Twitter are using it to scan for certain types of messages and sometimes getting false positives. There was a case recently where a woman on Twitter in German made a reference to the phrase, 'OK, boomer,' but she used the word, 'die,' which is pronounced 'dee' as in … ‘die boomers.’ But in English it's D-I-E. It's die. So for that Twitter algorithm, it looked like she was making a threat against boomers, and they kicked her off Twitter for half a day or so.
“But I think probably the most timely example is legislation that Sen. John Cornyn of Texas has proposed. It's called the Response Act and it's in response to the mass shootings that took place in Texas. And that would require certain schools receiving federal funds to use software to look at things like students' emails and assignments and social media. So if they're using, quote, trigger words that might indicate that they're planning a mass shooting. Of course, there's all sorts of privacy issues and you might get a lot of false positives. I mean, how many times do kids say things like, 'I'm going to kill her,' or 'Oh, my God, I'm going to die.' ”
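The false positives Pillay describes are easy to reproduce with a deliberately naive keyword filter. The sketch below is not how Twitter or any school-monitoring software actually works, which the interview does not describe; the `TRIGGER_WORDS` list and the `flag_message` function are hypothetical and exist only to show why matching words without language or context goes wrong.

```python
import re

# Hypothetical watch list; not taken from any real platform or product.
TRIGGER_WORDS = {"die", "kill"}


def flag_message(text):
    """Return the trigger words found in the text, ignoring language and context."""
    words = set(re.findall(r"[a-z]+", text.lower()))
    return sorted(words & TRIGGER_WORDS)


# The German article "die" ("the") is indistinguishable from the English verb.
print(flag_message("OK, die Boomer"))               # ['die']
# Everyday exaggeration is flagged the same way as a real threat would be.
print(flag_message("Oh my God, I'm going to die"))  # ['die']
print(flag_message("I'm going to kill her"))        # ['kill']
print(flag_message("See you at lunch"))             # []
```

Every flagged message here is harmless, so a system built this way would need language detection, context and human review before acting on a match.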
On how forensic linguistics might be applied in the future
“I think whether or not we're aware of it, it's certainly something that's entering our lives, whether it's work email being monitored or social media. But one situation that I hadn't thought of it being applied in is asylum cases. So if someone is saying that they're from a particular part of the world, it might be applied for something like that to show, are the terms that they're using actually indicative of them being from this particular part of the world? So there's a lot of different applications.”
Ciku Theuri produced and edited this interview for broadcast with Todd Mundt. Samantha Raphelson adapted it for the web.
This segment aired on November 18, 2019.