Seeing Voices

Image credit: Screenshot of research video courtesy of MIT researcher Abe Davis


High-speed HD cameras are great for watching fish catch prey in slow motion, but it turns out they can also help researchers, and who knows who else, recreate sound. The implications of this new research from MIT are awe-inspiring and a little scary at the same time.



By analyzing high-speed video of everyday objects like houseplants and potato chip bags placed near a source of sound, MIT graduate student Abe Davis has been able to reconstruct music and even human voices.


Slow-motion cameras capture thousands of frames per second (instead of the usual 24 fps of most film cameras), so they can record the vibrations that sound sets up in the objects it passes through. Because the camera's frame rate is well above the frequency of the sound (in sampling terms, more than twice as high), the camera is able to pick up oscillations that would normally just appear as a blur. With video enhancement and processing algorithms, Davis is able to turn these purely visual inputs into audio, reconstructing music and speech from nothing but the image of a vibrating surface.
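To make the frame-rate requirement concrete, here is a minimal sketch of the standard sampling rule of thumb: a tone is only captured faithfully when the frame rate is more than twice its frequency, and below that it shows up as a slower, false vibration. The function names are made up for illustration, and this is not the researchers' processing code.

```python
def min_frame_rate(tone_hz):
    """Frame rate (fps) that must be exceeded to sample a tone without aliasing."""
    return 2 * tone_hz


def apparent_frequency(tone_hz, fps):
    """Frequency (Hz) a pure tone appears to have when sampled at `fps`."""
    # Sampling folds every frequency into the range 0 .. fps/2.
    folded = tone_hz % fps
    return min(folded, fps - folded)


if __name__ == "__main__":
    tone = 440  # concert A, in Hz
    print("frame rate to exceed:", min_frame_rate(tone))
    print("seen at 24 fps:", apparent_frequency(tone, 24))
    print("seen at 2000 fps:", apparent_frequency(tone, 2000))
```

Under these assumptions, a 440 Hz tone filmed at 24 fps would masquerade as an 8 Hz wobble, while at 2,000 fps it comes through as 440 Hz.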


When light hits the sensor array at the back of most digital cameras, the image isn’t captured all at once. Instead, it is recorded in successive slices, one row of pixels at a time. This can cause distortions when the subject moves quickly, a problem if you want to photograph a spinning propeller, but an advantage if you’re looking for tiny vibrations. This perceived flaw became an asset to the researchers.


The work builds on research by co-author Michael Rubinstein, who developed a video enhancement technique that amplifies subtle and normally invisible changes, revealing the pulse of blood vessels beneath the skin and the breathing of a sleeping baby. To recover sound, the researchers took the slices captured by the sensor and treated them as individual frames, resulting in an effective frame rate fast enough to capture the frequencies of audible sounds even from regular-speed video.
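The arithmetic behind treating slices as frames can be sketched roughly as follows. This assumes each slice is one sensor row read out at evenly spaced instants, and it ignores the idle gap between frames that a real camera has. The frame rate, row count, and function names are illustrative assumptions, not figures or code from the MIT work.

```python
def effective_sample_rate(fps, rows_per_frame):
    """Row samples per second if every row counts as its own tiny frame.

    Assumes rows are read out evenly with no pause between frames.
    """
    return fps * rows_per_frame


def rows_to_signal(frames):
    """Flatten a video into a 1-D signal: one brightness average per row.

    `frames` is a list of frames; each frame is a list of rows;
    each row is a list of pixel brightness values.
    """
    signal = []
    for frame in frames:
        for row in frame:
            signal.append(sum(row) / len(row))
    return signal


if __name__ == "__main__":
    # Hypothetical consumer camera: 60 fps, 1080 rows per frame.
    print(effective_sample_rate(60, 1080), "row samples per second")
    tiny_video = [[[1, 3], [5, 7]], [[2, 2], [4, 6]]]
    print(rows_to_signal(tiny_video))
```

With those hypothetical numbers, a 60 fps camera would yield 64,800 row samples per second, comfortably above what speech needs. An actual reconstruction would also have to cope with the gaps between frames and with noise.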



The idea that any normal modern camera, pointed through a window at your begonias, could be used to record conversations is a bit unsettling, but Davis doesn’t think this technology will be used only by spies. As he told the Washington Post, “This is a new dimension to how you can image objects. It tells you something about how they respond physically to pressure, but instead of poking and prodding at them, all you need is to play sound at them.”


What materials scientists, Big Brother, or the world at large will do with this technology is anyone’s guess, but one thing is certain: the team at MIT has revealed a vast source of information nobody had previously considered, and it has captured our imaginations.