Bryan Pardo

Associate Professor, Electrical Engineering & Computer Science in the McCormick School of Engineering; head of the Interactive Audio Lab

Interviewed By

Josh Shi

Published On

May 2017

Originally Published

NURJ 2016-17

Leslie Zhang | Photo

A bookshelf rests on a wall adjacent to an electric keyboard. Beatles: The Complete Score and A Dictionary of Musical Scores sit on the top shelf, above C++ Primer and Numerical Recipes in C. Below those are books on artificial intelligence (lots of them), pattern recognition, and the psychology of sound. This bookshelf gives a fair preview of the work of Prof. Bryan Pardo, who started his undergraduate career as a physics major and ended with a major in jazz composition and a minor in computer science. An Associate Professor of EECS in the McCormick School of Engineering, Prof. Pardo teaches Machine Perception of Music and Audio and runs the Interactive Audio Lab, which focuses on helping computers solve audio-related problems for people.

When did you realize that you could combine computer science and music?

When I was a junior, I got into an argument with someone who said they thought jazz was all inspiration from heaven, and I said most jazz players are hacks who are just playing a bunch of memorized turns of phrase that they stick together in various orders to make their solos. Right then a professor was walking by, and he said, “Do you believe that?” And because I was unwilling to back down from my statement, I said of course. I spent the next year discovering that it is way harder to program a jazz bot than I ever thought.

What made it really interesting to me was that I learned two deep lessons that year. One was that it’s much easier to just talk than it is to have a meaningful conversation. Making software that listens to what someone’s doing, thinks about it, and then plays something appropriate that goes with it is way harder, in the same way that it’s much harder to have a good conversation between two people, or between a computer and a person, than it is to have a computer just generate text. So I became much more interested in understanding the sounds people were making than in just making the sounds myself with a machine. The other thing I realized was, hang on, why am I trying to automate a process that people actually want to do? I switched my thinking to: how do I find the parts of making music or working with sound that annoy people, and automate those, rather than the stuff that they really want to do? And so my work became understanding sound and doing something with that understanding that somehow makes your life easier.

We’re trying to make machines understand sound. That might mean recognizing a melody, or it might mean identifying that this is a dog barking versus a police siren. It’s about understanding the sonic world, but not what people are saying: everything you do with your ears when you’re in a foreign country where you don’t speak the language.

On the website for Balkano, your klezmer band, it says you play avant-jazz, circus-punk, klezmer, Latin, Middle Eastern music, and gypsy-jazz. Do you find you have a difficult time talking about the music you play to people?

The trouble is, when you’re playing with a Colombian band, a Middle Eastern band, a gypsy jazz band, and sometimes a klezmer band, and someone asks, “What do you do?” Basically, I do the things that the clarinet or the saxophone gets involved in outside the US.

Computer science, just like music, likes to put people into bins. In music, it’s, are you a funk band or are you a jazz band? You can’t be a jazz-funk band; you have to pick which bin you go in, which radio stations play you. And in computer science, it’s, are you human-computer interaction, are you machine learning, are you signal processing? We do signal processing. We have to, because we’re dealing with sound. We do machine learning. We have to, because we’re having machines automatically learn to recognize sounds. And we do human-computer interaction because we don’t want to do it just to do it, we want to make things to help people. Depending on who I’m talking to, I’m either a signal processor, a machine learner, or an HCI person.

If you were introduced to someone who has no idea what you do, how would you describe your work?

I make machines that hear, and they use what they hear to help you do something with audio that you didn’t know could be done. That might be making a hearing aid that amplifies the speech but not the background music. I’m about human-centered computing. That’s a catchphrase these days, but everything I do is motivated by some particular problem people are having that I think we can help with.

Can you describe how the field has changed since you’ve entered it?

If we define our field as computer scientists working with non-speech audio, computers and music, when I got into it people thought of it as a weird way to make bleep-bloop noises, or maybe to help you make a synthesizer. Everything got transformed with the iPod, because no one had understood why someone would want a music search engine when they only had ten audio files on their computer. Then one day they discovered they had ten thousand audio files on their iPod, and later their iPhone, and they couldn’t remember the names of all of them. So something where you could sing some part of the song and get the thing you were looking for started to make sense to them. Then, when everything moved to streaming over the web, all of a sudden music recommendation became a big thing and a company like Pandora came out of nowhere. When I got into it, which was in the early 2000s, no one understood why anyone would want to search for music, why anyone would want to recommend music, why anyone would want to do any of that stuff, and now there are big companies centered around it. So this music information retrieval field is suddenly a major thing with corporate sponsorship. There was this transformation: when all the music went online, what’s interesting or possible to do changed.