
“Yeah, I see what you are saying” ( And take the quiz ) March 17, 2007

Posted by Sharath Rao in CMU, science.

We are familiar with waveforms: time along the x-axis and intensity along the y-axis. What I have down here are four spectrograms. These have time in seconds along the x-axis and frequency in Hz along the y-axis. The energy of the signal is given by the darkness/lightness. So, at any given time x, you can see the strength (energy) of the frequency component y. It's really a different way of seeing much the same information.
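For readers who want to see how such a picture is built from a waveform, here is a minimal sketch. It is not the tool used to produce the figures in this post; the function name `spectrogram`, the frame length, the hop size, and the synthetic 1000 Hz test tone are all assumptions made for illustration. It cuts the signal into short overlapping frames, applies a Hann window, and takes the squared magnitude of a plain discrete Fourier transform of each frame, so that `energy[t][f]` is the value a spectrogram would draw as darkness at time `t` and frequency `f`.

```python
import cmath
import math


def spectrogram(samples, sample_rate, frame_len=128, hop=64):
    """Return (times, freqs, energy) for a list of waveform samples.

    energy[t][f] is the squared magnitude of frequency bin f in frame t.
    frame_len and hop are illustrative defaults, not values from the post.
    """
    # Hann window: tapers each frame so its edges do not smear the spectrum.
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / (frame_len - 1))
              for n in range(frame_len)]
    n_bins = frame_len // 2 + 1  # bins from 0 Hz up to sample_rate / 2

    times, energy = [], []
    for start in range(0, len(samples) - frame_len + 1, hop):
        frame = [samples[start + n] * window[n] for n in range(frame_len)]
        row = []
        for k in range(n_bins):
            # Naive DFT of one frame; slow, but fine for a short demo.
            acc = sum(frame[n] * cmath.exp(-2j * math.pi * k * n / frame_len)
                      for n in range(frame_len))
            row.append(abs(acc) ** 2)
        energy.append(row)
        times.append((start + frame_len / 2) / sample_rate)  # frame centre, s

    freqs = [k * sample_rate / frame_len for k in range(n_bins)]  # Hz
    return times, freqs, energy


# Synthetic stand-in for a recording: 0.1 s of a pure 1000 Hz tone.
sample_rate = 8000
tone = [math.sin(2 * math.pi * 1000 * n / sample_rate) for n in range(800)]

times, freqs, energy = spectrogram(tone, sample_rate)

# For each time frame, the frequency that carries the most energy.
peak_freqs = [freqs[max(range(len(row)), key=row.__getitem__)]
              for row in energy]
```

On a pure tone every frame peaks at the same frequency, so the picture would be a single horizontal dark band. Real speech gives the shifting bands seen in the figures in this post. For actual recordings a library routine built on the fast Fourier transform would be the practical choice; the naive transform here is only meant to keep the sketch dependency-free.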

In this particular case, I have four people who have recorded the same set of words, and each recorded utterance is represented as a spectrogram. Obviously, it requires a lot of practice to just look at such an image and say what was actually said by the person. But there are people out there who can do it. (Prof. Zue at MIT is perhaps the best known among them.) Prof. Reddy here at CMU apparently can look (NOT listen, which is what we all do) at the waveform and identify the utterance with reasonable accuracy.

Anyway, here are 4 spectrograms from 4 different people, each labeled with (Male/Female, Age). The utterance common to all is the Konkani phrase “Kasan Thondre”, meaning “What is the trouble about?” (or as some might argue, “What's your f*in problem” 😉).

( Those yellow lines are not significant )

Fig 1.0 : Speaker : M, 28




Fig 2.0 : Speaker : M, 25





Fig 3.0 : Speaker : F, 24



Fig 4.0 : Speaker : F, 24



Two things now :

a) Now imagine the challenge facing computer algorithms that automatically convert speech to text: they have to use information that looks this different from speaker to speaker and still arrive at the same conclusion.

b) We see identical twins/triplets who look so similar and marvel at the wonders of nature. Fig 1.0 is my brother Bharath Rao and Fig 2.0 is me; just see (yes, actually see) how similar our voices are. It's a revelation to me too, since I have never had recorded samples of our voices before, let alone for the same piece of text.

Now try listening to these two files and see if you can tell who is who (if you have listened to at least one of us):

a) this one and b) this one.

Identify the speakers.

My mom, then, is to be forgiven for asking me “to buy something on the way back from Manipal!!” when I called from Boston. Talking about voices over the telephone, read this mail I wrote long ago to my school friends.


