Free Software: Translate Your Screen to Sound

Recently I've been experimenting with translating my computer screen into sound; the idea being that if you're blind, you could use a piece of software to translate your computer screen, or even the output of a webcam, into sound.

My attempts so far are available for free, with source code, here, as a runnable jar file. The software, which I call "InnerEye", translates the 20x20 square of pixels under your cursor into sound. If your computer is a bit slow, it might sound a bit choppy. Best used with headphones!

I've been inspired by thoughts of walking boldly around the streets blindfolded, using this system to read signposts, labels and so on. But that's a hell of a long way off right now!

I'm not the first person to have this idea; in fact you can already find other systems that probably work a lot better than this one! They generally involve sweeping across the field of vision, whereas my software translates a small square all at once.

To be honest, I doubt whether you could ever learn to, say, read your computer screen using my initial attempt as presented here. The software basically translates each pixel into a sound: the vertical axis maps to pitch, while the left-right position maps to stereo panning and a phase difference between the left and right ears. But of course it's extraordinarily difficult to make out any difference between multiple sounds of the same pitch along the left-right axis.

This is why systems that are more successful than my own employ a sweep across the field of vision.
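To make that mapping concrete, here's a rough sketch (in Java, since the program ships as a jar) of how a 20x20 patch of brightness values might be turned into a stereo buffer. It's an illustration of the idea rather than the actual InnerEye code: the class and method names are made up, and the frequency range, the equal-power panning and the size of the phase offset are all assumptions I've picked for the example.

```java
// Hypothetical sketch (not the actual InnerEye source) of the mapping
// described above: each pixel in a 20x20 patch contributes a sinusoid whose
// pitch comes from its row, whose left/right balance comes from its column,
// and whose phase differs slightly between the two ears.
public class PixelSonifier {
    static final int GRID = 20;                  // 20x20 square under the cursor
    static final float SAMPLE_RATE = 44100f;
    static final double MIN_FREQ = 200.0;        // assumed frequency range
    static final double MAX_FREQ = 3200.0;

    /** brightness[row][col] in 0..1; returns interleaved stereo samples. */
    static float[] sonify(double[][] brightness, double seconds) {
        int n = (int) (seconds * SAMPLE_RATE);
        float[] stereo = new float[2 * n];
        for (int row = 0; row < GRID; row++) {
            // top row -> highest pitch, spaced logarithmically
            double frac = 1.0 - row / (double) (GRID - 1);
            double freq = MIN_FREQ * Math.pow(MAX_FREQ / MIN_FREQ, frac);
            for (int col = 0; col < GRID; col++) {
                double amp = brightness[row][col] / (GRID * GRID);
                if (amp <= 0) continue;
                double pan = col / (double) (GRID - 1);      // 0 = far left, 1 = far right
                double gainL = Math.cos(pan * Math.PI / 2);  // equal-power panning
                double gainR = Math.sin(pan * Math.PI / 2);
                double phaseR = (pan - 0.5) * Math.PI / 4;   // small interaural phase offset
                for (int i = 0; i < n; i++) {
                    double t = 2 * Math.PI * freq * i / SAMPLE_RATE;
                    stereo[2 * i]     += amp * gainL * Math.sin(t);
                    stereo[2 * i + 1] += amp * gainR * Math.sin(t + phaseR);
                }
            }
        }
        return stereo;
    }
}
```

In a real program you'd then stream that buffer out through something like javax.sound.sampled; the interesting part here is just the mapping.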

What might render my system workable (short of a child genius using it with massive enthusiasm from the age of about one and developing extraordinary hearing along the way) is if the same shape produced, in some way, the same sound regardless of where it was located on the screen, or how big it happened to be. I'm not sure if that's really possible. There might be some mileage in a system that subdivided the screen into, say, four quarters, produced a sound representing the average brightness of each square, then subdivided each square again, and so on. You'd probably need to measure the average intensity of each square using a sort of blurred Gaussian sampling, so that the squares effectively overlapped, avoiding hard edges in the input image. How the output sound would be produced from this, I'm not completely certain, but it would be nice if small details (i.e. the smallest subdivisions) corresponded to the highest frequencies.
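To pin that down a little, here's a sketch of what the analysis half might look like, again with the details the paragraph above leaves open filled in by assumption: the class and method names are made up, the width of the Gaussian blur is a guess, and the frequency mapping in the final comment is exactly the part I'm unsure about.

```java
// Sketch of the subdivision idea, with the open questions filled in by
// assumption: level d splits the image into 2^d x 2^d cells, and each cell's
// value is a Gaussian-weighted average centred on the cell, so that
// neighbouring cells overlap softly rather than having hard edges.
public class PyramidSonifier {

    /** Gaussian-weighted average of the image around (cx, cy). */
    static double blurredAverage(double[][] img, double cx, double cy, double sigma) {
        double sum = 0, wSum = 0;
        for (int y = 0; y < img.length; y++) {
            for (int x = 0; x < img[0].length; x++) {
                double dx = x - cx, dy = y - cy;
                double w = Math.exp(-(dx * dx + dy * dy) / (2 * sigma * sigma));
                sum += w * img[y][x];
                wSum += w;
            }
        }
        return sum / wSum;
    }

    /** levels[d][row][col]: blurred average brightness at each subdivision level. */
    static double[][][] analyse(double[][] img, int maxDepth) {
        int h = img.length, w = img[0].length;
        double[][][] levels = new double[maxDepth + 1][][];
        for (int d = 0; d <= maxDepth; d++) {
            int cells = 1 << d;                              // 1, 2, 4, 8... per side
            levels[d] = new double[cells][cells];
            double cw = (double) w / cells, ch = (double) h / cells;
            for (int r = 0; r < cells; r++) {
                for (int c = 0; c < cells; c++) {
                    double cx = (c + 0.5) * cw, cy = (r + 0.5) * ch;
                    levels[d][r][c] = blurredAverage(img, cx, cy, 0.5 * Math.max(cw, ch));
                }
            }
        }
        // One possible (unproven) sound mapping: give level d a base frequency
        // of, say, 200 * 2^d Hz, so the smallest subdivisions land in the
        // highest band, and let each cell's average drive an amplitude there.
        return levels;
    }
}
```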

Whether this would achieve the desired effect (similar shapes producing similar sounds), I'm not completely certain. I think the main problem is that two sounds aren't even perceived as similar unless their frequencies are multiples of each other. Octaves, in other words. But you quickly run out of octaves with this kind of thing: the whole audible range only spans about ten of them.

On the other hand I'm not quite ready to drop the idea yet.

If you have any ideas about it, let me know!