Study: To Read Accurately How Someone Is Feeling, Voice May Be Best

In this file photo, actress Ingrid Bergman tells newsmen in London on Nov. 13, 1957 about her new picture in which she will co-star with Cary Grant, who listens to her comments. (AP)

"If you want to know how someone is feeling, it might be better to close your eyes and use your ears," began the news alert from the American Psychological Association.

My ears perked up, so to speak. I've long been struck by what an emotional medium radio is for listeners, and how powerfully it can convey speakers' feelings — sometimes voluntarily, sometimes maybe not so much. Now, a new study in the journal American Psychologist was suggesting that voice can be such a telling vehicle for emotion that it might even be a better indicator than facial expression?

I spoke with the paper's author, Michael Kraus, an assistant professor at Yale University's School of Management. Our conversation, lightly edited:

How would you sum up what you found?

In the study, we compared different channels of communication, and how well people are able to read emotions based on those different channels. In particular, we compared emotion recognition while you're interacting with someone regularly — so, using all channels of communication, the voice, the face, non-verbals — and we took out some channels. We compared that to perceiving emotions in interactions where you just hear the voice.

And what we found, across our studies, is that the voice is best, relative to having all channels of communication, and relative to just looking at the face, for perceiving emotions accurately in interactions.

It's so surprising. We're so used to thinking that visual cues in particular are hugely important.

They are still important. But maybe we spend a little bit too much time on vision, in part because a lot of the emotion research started there — for good reason. We use our facial muscles to convey emotion. But we're missing a piece that actually comes online a bit sooner than vision, which is the voice. When you're a newborn, for instance, you're listening, and you're smelling, and you're touching for emotion. We thought maybe it was a missed area of emotion research.

So on a mundane level, do your findings suggest that next time I'm in, say, a difficult conversation with my husband, it might be a good idea to close our eyes or darken the room?

I don't know if we can go that far from just these studies. But there are at least a couple of good reasons why that might be a good strategy. The first is that if you are trying to really read somebody, and you are doing it by trying to listen to what they're saying, then also trying to see their facial expressions and parse those nuances — that's a lot of work. And it might be easier, it might be more efficient, and it also conveys a lot of the rich emotion, if you just really focus on the voice.

So certainly there needs to be more work done before I start dancing out there and saying everybody needs to close their eyes when they're talking. But I think our work starts that conversation.

And of course, it's wonderful for radio people because the suggestion is that listeners can parse our emotions extremely accurately when they're hearing us.

Right. And I was a relatively new parent when I started this. You know how, talking to your kid, you become really intentional about how you convey your emotions in the voice. You might be really frustrated, but you don't want that to leak out. So there are a lot of reasons why you pay attention to the voice, and you realize how much of the emotional content of an interaction is carried there.

On the research side, the field of studying emotion has been pretty dominated by faces, and a lot of the work on "emotional intelligence" has also been very visual. What might your findings do to that dominant flavor? 

I hope it'll shake things up a little bit. On emotional intelligence, it's really important to be emotionally intelligent, but if you are studying emotional intelligence and operationalizing it, and you're just defining it based on how well people recognize facial cues, you might be missing a really key component of emotional intelligence when you're studying it. That's a problem for researchers that it's going to take some new innovations to really solve.

I wonder if the field focused so much on visual cues because it's easier to publish pictures of faces than audio files....

That could be part of it. I also think that there's something about momentum. You had work in the '80s and '90s by Paul Ekman that was amazing work, and it came out first and defined the field in many ways. So we know a lot about what the face does. Now, pointing the light on different modes of emotion expression — like through touch, like through the voice — is a way to enrich our understanding of emotion more broadly.

Also, so much of the brain is devoted to vision that you would think it would be primary, or best, at discerning emotions. But these findings suggest it's not necessarily best.

I think part of it is that we are really intentional sometimes about how our non-verbals are communicated and in some ways, we can conceal better. The face is a great tool for communicating emotion, but it's also a place where people can mask their expressions the best. People can work on their non-verbal body language. There's a whole cottage industry of trade books about using your body language to your advantage, and that might obscure the message in the face and the body.

But aren't we also good at dissembling with our voices?

It's harder. There's a lot of research on how much of you 'leaks out' in your voice. For most people, it's much easier to control our body language, to go into an interview and have a game plan for how you will look. But then you get thrown a question that reveals something in your voice.

Can you elaborate a bit on how we 'leak out' in our voices? 

This kind of 'leaks' into my other work: We look at thin-slicing and voice, and you have a ton of information there about your characteristics. People can tell how old you are, what your education level is, your race, ethnicity. A lot of it leaks out.

And you have a paper in press about being able to tell a lot by how people say just seven words, right?

Right, we had a large dataset of people speaking across the U.S., just speaking freely. We took seven words in all of the different recordings, and we just presented them to a group of naive observers who had never met these people before. And those seven words were enough to perceive that person's social class — education, occupation status — above chance accuracy. So people were accurate, at least minimally, with just hearing seven words people speak from all across the U.S.

And in this current study on voice-only being best, voice-only was just moderately better than other modes? 

Right, a small to medium effect. There's a small advantage in perceiving emotion through the voice relative to across all channels.

I imagine video conferencing companies are not going to like this study. How does this work translate into the workplace?

We think about this a lot with our students here at the School of Management. We do recommend face-to-face interaction and this work might question that.

But I have a big caveat that we can't really study in these data: Part of what's good about a video conference is that you can hold people more accountable to the interaction. So if I'm on video, I'm less likely to be on my email, I'm less likely to be making my grocery list. And if you are taking away the video, you take away some of the accountability, and that's not something that we studied. But if you can make everybody equally accountable, then I think the voice call might be better for perceiving emotion.

Are there any implications here for long-distance relationships? I've always thought of phone as a really hard way to keep things going...

You shouldn't take any single study and run with it too far. So don't throw out the face, don't throw out all the other channels of communication. I think maybe in this narrow domain of accuracy, you see some better performance in the voice, but the richness of communication is still cross-channel.

So what's your most hoped-for take-home message?

For me, it all comes back to listening. We think about listening as something that everybody can do. But in some ways, in a culture that is fast-paced, with a lot of information being thrown at us, you may get distracted from time to time. And that's going to be a problem for reading emotions.

And what we're finding here is how important it is to really stop and listen to what people are saying for accurately understanding their intentions. So that might be the biggest takeaway from this work. It's still the same grade-school message: Listen well.

Carey Goldberg Editor, CommonHealth
Carey Goldberg is the editor of WBUR's CommonHealth section.


