Quirks and Quarks

A writer and sound engineer investigates the science of the human voice

His book is 'Now You're Talking: The Story of Human Conversation from the Neanderthals to Artificial Intelligence'


Community volunteers pose with speech bubbles. The human voice is a much more convenient way to communicate (Pexels)

Speech is a fundamental form of human communication that's allowed us to ask questions, tell stories, and exchange information in ways that made modern life possible.

It's this fundamental importance of human speech that inspired Dr. Trevor Cox to explore the science of it in his latest book, Now You're Talking: The Story of Human Conversation from the Neanderthals to Artificial Intelligence.

In an interview with Quirks & Quarks host Bob McDonald, Cox, an award-winning science communicator and a professor of acoustic engineering at the University of Salford, shared his thoughts on the past, present and future of human speech.


This interview has been edited for length and clarity.

Bob McDonald: What do we know about when we first acquired the ability to speak?

Trevor Cox: The interesting thing about this sort of evolution is it's a really controversial area, and no one can quite agree on the answer to when it appeared. But I guess I might date it to about half a million years ago. And the interesting thing about it is we kind of think that maybe Homo sapiens, you know, modern man, is unique in speaking. But more and more evidence is coming about that Neanderthals and other species of humans probably had proto-language as well.

BM: What was it that enabled us to begin speaking?

TC: Well, one theory is that we got our big brain first and then that enabled us to do this really complicated process, because speaking and using languages are really complicated things our brain has to do. The theory is that actually we needed a big brain to allow us to use our hands. Because as soon as we stand up on two feet, then we have our hands free to do stuff and then we start doing very sophisticated things with our hands and that caused lots of neurological control that gives us a bigger brain that then allows us to have language and speech.

Trevor Cox's new book is called "Now You're Talking: The Story of Human Conversation from the Neanderthals to Artificial Intelligence." (Trevor Cox)

BM: Just how much brain power is involved in our voices and speech?

TC: If you look at, say, listening to speech, it's using large amounts of different parts of the brain to process it. We have two parts, one of which is actually just the words we're trying to interpret, but there's also 'am I saying it in an angry voice?' or 'am I happy?' These are the kinds of things that we pick up on, and other things. Like, you would have noticed I've got a British accent. You pick up on where people are from and start working out: Are they educated? How old are they? All these sorts of things. So your brain is working on lots of different things.

BM: Could this lead you to develop a prejudice, if you've never met the person?

TC: Yes, definitely. I mean one thing I did find is sadly, voice, like anything else, you get people stereotyping. So they hear a voice and then make assumptions about it and then that sort of leads to a whole sort of prejudice on the back of it.

BM: So what should someone do if they're told that they have a thick accent?

TC: Well, I think they should celebrate and enjoy that thick accent. You will get people making assumptions based on it. One of the assumptions is that people with a strong accent are less educated, which is entirely unfair, because why can't someone have a strong accent and be highly educated? And there's lots of people like that. But your brain makes these sort of heuristic decisions and ends up with this prejudice. So we have to be quite careful that we don't let this cloud our thinking.

Smart speakers from Amazon, Google and Apple. (Amazon/Google/Apple)

BM: Towards the end of your book you go beyond human speech. You start looking at artificial voices and speech. Tell me about that.

TC: Yes, so the idea of trying to create an artificial talker has been around for centuries actually. I mean you go back to really old devices that were just really mechanical devices so you could actually make a really simple speaker if you just blow air from a sort of tube and move it around and go 'ma ma ma ma,' so a bit like a cow sound that's relatively easy to do. But of course, once we get into the 20th century, people start trying to do this electronically. But the idea of trying to create something that can speak and converse has fascinated humans for centuries.

BM: Do you think we'll come to a point where artificial intelligence might become better at speaking than humans, so that we could replace actors or singers or, heaven forbid, radio hosts?

TC: There's huge excitement about it at the moment. If we look at some of the tech firms bringing out things like Siri and Google Home and Alexa, you'd think the speech on that is really good. But it's a long way from acted speech and it's a long way from what we're doing currently, just even having a conversation. What these devices can do is read things back to people in a sort of vaguely intelligible, vaguely natural way. That's a long way from containing all the things that we portray in our voices, whether we're happy or sad, how we're feeling today, all these secret things we give away that also make the voice sound special and interesting.

BM: So how far do you think artificial voices can go?

TC: I think we'll get to the point where it'll mimic human voices pretty well. We're some way off from doing that, but human listeners are quite accommodating actually. We're used to listening to garbled speech quite a lot of the time, if you think of a mobile phone call. It's often not that clear, but we still accept that it's a human speaking, but through some technology that's garbling it. So I think it won't be long before we'll have those sort of situations where the speech will be quite natural. But to get completely natural in all cases, there's huge problems. Not least understanding the language.