Behind the Mic: The Science of Talking with Computers

We come into this world with the innate ability to learn to interact with other sentient beings. Suppose you had to interact with other people by writing little messages to them. It’d be a real pain. And that’s how we interact with computers. It’s much easier just to talk to them. It’d be just so much easier if computers could understand what we’re saying. And for that, you need really good speech recognition.

NARRATOR: The first speech recognition system was developed by Bell Laboratories in 1952. It could only recognize numbers spoken by one person. In the 1970s, Carnegie Mellon came out with the Harpy system, which was able to recognize over 1,000 words and could recognize different pronunciations of the same word.

MALE COMPUTER VOICE: Tomato.

FEMALE COMPUTER VOICE: Tomato.

NARRATOR: Speech
recognition continued in the ’80s with the introduction of the hidden Markov model, which used a more mathematical approach to analyzing sound waves and led to many of the breakthroughs we have today.

JEFF DEAN: You’re taking in very raw audio waveforms.

MALE SPEAKER: Like you get from a microphone on your phone or whatever. You chop it into small pieces, and it tries to identify which phoneme was spoken in that last piece of speech.

GEOFFREY HINTON: So a phoneme is a kind of primitive unit for expressing words.

JEFF DEAN: You want to stitch those together into likely words, like Palo Alto.

RAY KURZWEIL: Speech
recognition today is quite good at transcribing what you’ve said.

MALE SPEAKER: What’s the weather like in Topeka?

ROBERTO PIERACCINI: You can talk about travel. You can talk about your contacts.

RAY KURZWEIL: Like, where can I get pizza?

PHONE: Here are the listings for pizza.

RAY KURZWEIL: How tall is the Eiffel Tower?

PHONE: The Eiffel Tower is–

FRANCOISE BEAUFAYS: We’ve made tremendous improvements very quickly.

MALE SPEAKER: Who is the 21st President of the United States?

PHONE: Chester A. Arthur was the 21st–

MALE SPEAKER: OK, Google. Where’s he from?

RAY KURZWEIL: Years ago,
you had to be an engineer to interact with computers. I mean, today, everybody can interact.

ROBERTO PIERACCINI: One thing, though, that is still in its infancy is the understanding.

GEOFFREY HINTON: We need a far more sophisticated language understanding model that understands what the sentence means. And we’re still a very long way from having that.

ALISON GOPNIK: Our ability to use language is one of the things that helps us have culture. It’s one of the things that helps us pass on traditions from one generation to another. Figuring out how the system of language works, even though that seems like a really easy problem, turns out to be one that’s really hard, but one that every baby has cracked by the time they’re two years old.

FEMALE CHILD: There’s two L’s.

FEMALE SPEAKER: There’s two L’s. Yeah. E-L-L-I and then–

FEMALE CHILD: E.

FEMALE SPEAKER: E.

ROBERTO PIERACCINI:
Language is extremely complex and sophisticated.

BILL BYRNE: From the semantics–

RAY KURZWEIL: Irony–

FRANCOISE BEAUFAYS: Strong accents–

MALE SPEAKER: Facial expressions–

RAY KURZWEIL: Human emotion, because that’s part of how we communicate.

BILL BYRNE: Humor.

RAY KURZWEIL: Do I have to be careful not to offend the dinosaur?

BILL BYRNE: Language has so many different layers, and that’s why it’s such a difficult problem.

GEOFFREY HINTON: At present, the human brain, and the learning algorithms in the human brain, are far, far better at things like language understanding. And they’re still a lot better at pattern recognition.

BILL BYRNE: So whether or not we replicate exactly what the brain does to understand language and to understand speech is still a question.

GEOFFREY HINTON: For many, many years, we believed that neural networks should work better than the dumb existing technology that’s basically just table lookup. And then in 2009,
two of my students, with a little input from me, got it working better. At first it just worked a little bit better, but then it was obvious that this could be developed into something that worked much better. The brain has these gazillions of neurons all computing in parallel. And all of the knowledge in the brain is in the strengths of the connections between neurons. What I mean by a neural net is something that’s simulated on a conventional computer but is designed to work in very, very roughly the same way as the brain. Until quite recently, people got features by hand engineering. They looked at sound waves, and they did Fourier analysis. And they tried to figure out what features we should feed to the pattern recognition system. And the thing about neural networks is that they learn their own features. And in particular, they can learn features, and then they can learn features of features, and then they can learn features of features of features. And that’s led to a huge improvement in speech recognition.

JEFF DEAN: But you
can also use them for language understanding tasks. And the way you do this is you represent words in very high dimensional spaces.

GEOFFREY HINTON: We can now deal with analogies, where a word is represented as a list of numbers. So for example, if I take the list of 100 numbers that represents Paris, and I subtract from it France, and I add to it Italy, and I look at the numbers I’ve got, the closest thing is the list of numbers that represents Rome. So by first converting words into these numbers using a neural net, you can actually do this analogical reasoning. I predict that in the next five years, it will become clear that these big deep neural networks with the new learning algorithms are going to give us much better language understanding.

ALISON GOPNIK: When
we started out, we thought that things like chess or mathematics or logic were going to be the things that were really hard. They’re not that hard. I mean, we can end up with a machine that can actually play chess as well as a grandmaster. The things that we thought were going to be easy for a computer system, like understanding language, have turned out to be incredibly hard.

BILL BYRNE: I can’t even imagine the “we’ve done it” moment quite yet, just because there are so many pieces of this puzzle that are unsolved, both from a science point of view and from a technical implementation point of view. There are a lot of unknowns.

ALISON GOPNIK: Those are the great revolutions. They’re not just when we fiddle a little with what we already know, but when we discover something completely new and unexpected.

JEFF DEAN: I think once you’re in the ballpark of human-level performance, that will be pretty remarkable.

[MUSIC PLAYING]
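Hinton’s Paris − France + Italy ≈ Rome example can be sketched in a few lines. The vectors below are tiny hand-made stand-ins for the lists of 100 numbers a neural net would actually learn, chosen so the arithmetic works out, and `nearest` is a hypothetical helper, not part of any real embedding library:

```python
from math import sqrt

# Toy word vectors. Real embeddings are learned from text and have
# hundreds of dimensions; these 3-d vectors are invented for illustration.
VECTORS = {
    "Paris":  (1.0, 1.0, 0.0),
    "France": (1.0, 0.0, 0.0),
    "Italy":  (0.0, 0.0, 1.0),
    "Rome":   (0.0, 1.0, 1.0),
    "Berlin": (0.5, 1.0, 0.2),
}

def cosine(a, b):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b)))

def nearest(query, exclude=()):
    """Return the known word whose vector is most similar to `query`."""
    candidates = {w: v for w, v in VECTORS.items() if w not in exclude}
    return max(candidates, key=lambda w: cosine(candidates[w], query))

# Paris - France + Italy should land closest to Rome.
query = tuple(p - f + i for p, f, i in
              zip(VECTORS["Paris"], VECTORS["France"], VECTORS["Italy"]))
print(nearest(query, exclude={"Paris", "France", "Italy"}))  # -> Rome
```

With real learned vectors the match is never exact, so systems rank every candidate word by cosine similarity, just as `nearest` does here, and take the top hit.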
