I find myself alone inside the highly fortified research center with the knowledge that I must find my way into the innermost sanctum and stop the evil scientist from completing the experiment that went wrong. Along my way through the complex maze, I encounter mutants that try to kill me, crazed scientists who try to steal my gear, tools and weapons that can help me in my quest, and all sorts of other surprises that are thrown in my path. To make matters more difficult, I must perform all my actions—from shooting at monsters to navigating the rooms and hallways—entirely blind.
Shades of Doom
This is the scenario one finds oneself in when playing Shades of Doom from GMA Games, a leader in the development of complex virtual environments for audio game players. In 1998, David Greenwood, founder of GMA Games, began exploring the possibility of creating a three-dimensional (3-D) auditory environment, opening up this exciting new world of audio interfaces for games.
According to Justin Daubenmire, president of BSC Games (the gaming division of Blindsoftware.com), a major breakthrough in the audio gaming industry was the release of a prototype version of Shades of Doom in November 1999, almost a year after Greenwood began gathering ideas about designing a revolutionary real-time, audio-interface game. "The game stood out from previous audio games because of its emphasis on action. Older audio games tended to be organized around text menus, with small puzzles or arcade games woven into the game play. While these are still popular, Shades of Doom is a 3-D world that allows the audio gamer to explore an interactive realm. It made the audio-game industry much more ambitious."
Shades of Doom was released to the public on May 31, 2001, and immediately changed the landscape of audio games by delivering a real-time, first-person-shooter audio game featuring joystick support, a cheat code, nine levels, original music, lots of weapons and monsters, realistic sounds, braille printer-ready maps, and easy-to-use keyboard commands.
Those of us who work or have worked in the assistive technology field tend to believe that our products (such as JAWS, ZoomText, and Kurzweil 1000) demonstrate the greatest level of innovation and solve the hardest user-interface problems that a person who is blind or has low vision may encounter. Sorry, everybody, but the game hackers have us beat hands down. Developers of assistive technology should not feel too bad, though—video game developers have led the way to many of the most interesting discoveries in mainstream computing, including high-speed graphics adapters, multichannel audio devices, and inexpensive haptics or force-feedback products.
The description of Shades of Doom, provided by GMA, explains that you will hear up to 32 distinct sounds simultaneously. You need to navigate by listening to the direction of the wind and by the sounds of footsteps and breath echoing off walls. Meanwhile, you need to listen for monsters, treasures, crazed scientists, and lots of other goodies that you may encounter.
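To give a sense of how such a game conveys direction purely through sound, here is a minimal sketch in Python, my own illustration rather than anything from GMA's code, of constant-power stereo panning: each channel's gain is computed from the bearing of a sound source relative to the player, so a monster off to the right is simply louder in the right ear, and a distant one is fainter.

```python
import math

def pan_gains(source_x, source_y, listener_x, listener_y, heading_deg):
    """Return (left_gain, right_gain, distance) for one sound source.

    Constant-power panning based on the source's bearing relative to the
    listener's heading, with a simple distance roll-off. Purely
    illustrative; a real audio engine does far more.
    """
    dx, dy = source_x - listener_x, source_y - listener_y
    distance = math.hypot(dx, dy)
    # Bearing of the source relative to the direction the listener faces.
    bearing = math.degrees(math.atan2(dx, dy)) - heading_deg
    # Clamp to the frontal arc and map -90..+90 degrees onto 0 (hard left) to 1 (hard right).
    pan = (max(-90.0, min(90.0, bearing)) + 90.0) / 180.0
    left = math.cos(pan * math.pi / 2)
    right = math.sin(pan * math.pi / 2)
    rolloff = 1.0 / (1.0 + distance)  # farther sources are quieter
    return left * rolloff, right * rolloff, distance

# A monster ahead and to the right: louder in the right ear, attenuated by distance.
print(pan_gains(source_x=3, source_y=4, listener_x=0, listener_y=0, heading_deg=0))
```

A real engine layers dozens of such sources, adds echoes off walls, and updates the mix continuously as the player turns and moves.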
Less Speech Saves Time
Although 3-D audio games like those from GMA can deliver up to 32 sounds at once, a screen reader or other speech product can deliver only a single syllable at a time. When I first went to work at Henter-Joyce, in 1997, Ted Henter would frequently remind us to use as few syllables as possible to deliver information to the user. Thus, JAWS says "star" instead of the more descriptive "asterisk" when it encounters that character. When you hear "star," you know what JAWS means and need to wait for only a single syllable, rather than the three syllables in the word asterisk.
If you consider each syllable spoken by a speech product to be a unit of time (the exact duration varies with the synthesizer's speech rate), then fewer syllables spoken means less time spent reading the text. Any person who is blind or has low vision who needs to collaborate with sighted colleagues on a large document will undoubtedly notice how much longer it takes to perform tasks like finding and integrating changes with a screen reader than with vision and a mouse.
With each new release, JAWS and Window-Eyes increase the amount of text that is generated by the screen reader to augment the information on the visible screen. Using Internet Explorer as an example, you hear the word link before you hear the name of the link, you hear combo box when you land on one, you hear edit when you land in a text-entry field, and so forth. In all these cases, the screen reader has added one to three syllables to the information that you are actually interested in hearing, and, as a result, up to three units of time have been added to your web-browsing session. If you are reading a Yahoo! or Google page with hundreds of links and controls, your session grows by the time it takes to deliver all the extra syllables.
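To make the arithmetic concrete, here is a back-of-the-envelope sketch; the per-syllable duration and page statistics are assumptions chosen for illustration, not measurements of any particular synthesizer or site.

```python
# Back-of-the-envelope estimate of the time added by extra spoken syllables.
# Every number below is an assumption chosen for illustration only.
SECONDS_PER_SYLLABLE = 0.15        # varies with the synthesizer's speech rate
EXTRA_SYLLABLES_PER_CONTROL = 1    # e.g., the word "link" before each link
CONTROLS_ON_PAGE = 300             # a busy portal page full of links and controls

extra_seconds = SECONDS_PER_SYLLABLE * EXTRA_SYLLABLES_PER_CONTROL * CONTROLS_ON_PAGE
print(f"Extra listening time for one pass through the page: {extra_seconds:.0f} seconds")
```

Under these assumptions, a single extra syllable per control adds roughly 45 seconds to one read-through of the page, before the user has heard a word of the content itself.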
Will Pearson, an expert in computer interfaces for people who are blind at the University of Bristol in England, suggested that "Through the use of multiple sounds, audio games have found the ability to deliver a lot of information to a player in an efficient and understandable manner. In terms of implementing this in a screen reader, the challenge is to work out how the human brain processes these multiple sounds to give us a better understanding of how the brain deals with multiple streams of information. Once we know this, it should be possible to create screen readers that offer significantly higher efficiency for a user than today's screen-reader offerings, in addition to access to activities that are currently considered inaccessible."
In Home Page Reader (HPR) from IBM, a user can replace some words with a sound that plays at the same time that HPR is speaking something, a big help in improving efficiency. The Freedom Scientific software team took the concept of decreasing the number of syllables to improve efficiency a step further with the introduction of its Speech and Sounds Manager in JAWS 5.xx. This utility lets a user replace a wide array of things that JAWS would otherwise say with a sound that plays simultaneously with the text being spoken. For instance, I have JAWS set to play a little "ding" when I encounter a link on the Internet. So, when I read a newspaper, I hear the ding at the same time as I hear "News," "Weather," and so on. Thus, I save one unit of time for each link that I encounter. Unlike HPR, though, the JAWS Speech and Sounds facility is available in all applications, not just on the Internet.
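The idea can be sketched in a few lines of Python. The speak and play_earcon helpers below are hypothetical stand-ins for whatever text-to-speech and sound-playback calls a screen reader actually makes; the point is simply that the sound fires in parallel with the link text instead of being spoken before it.

```python
import threading

def speak(text):
    """Hypothetical stand-in for a text-to-speech call."""
    print(f"speaking: {text}")

def play_earcon(name):
    """Hypothetical stand-in for playing a short sound."""
    print(f"playing sound: {name}")

def announce_link(link_text, use_earcon=True):
    if use_earcon:
        # The "ding" plays in parallel, so the user hears the sound and the
        # link text at the same time: no extra syllables, no extra time.
        threading.Thread(target=play_earcon, args=("link-ding",)).start()
        speak(link_text)
    else:
        # Traditional behavior: one extra spoken word per link.
        speak(f"link {link_text}")

announce_link("News")
announce_link("Weather")
```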
Why should manufacturers of screen readers care? The less time that people who are blind or have low vision spend reading their e-mail, web page, word-processing document, or spreadsheet, the more time they can devote to their jobs, studies, and other tasks on a computer. Thus, they can gain some ground in the constant battle to become more competitive with their sighted counterparts. Presumably, a screen reader that can reduce the amount of effort that is required to complete tasks, limit fatigue, and provide people who are blind or have low vision with a more efficient tool will win out in this highly competitive market.
According to Pearson, some preliminary research results demonstrate that when listening to a speech synthesizer read text, people who are blind use more of their brains than do sighted people reading the same text visually. Although these data are very preliminary, they raise some questions: Do blind people suffer greater fatigue from hearing their text read by a synthesizer than sighted people do when reading text visually? Is there a theoretical maximum number of syllables that an individual can hear and comprehend in a single session, and will this number represent less information than a sighted person can read and understand in the same amount of time?
Today, two companies are working to push audio interfaces forward. Coincidentally, they both develop products for delivering mathematical information to users. Henter Math, founded by Ted Henter, developed and sells Virtual Pencil, and ViewPlus, a company founded by Oregon State University professor John Gardner, offers the Accessible Graphing Calculator (AGC). Gardner and Henter both push the innovation envelope for delivering information in an audio format, and both products offer a major step forward by providing access to previously inaccessible information. Henter can now, for the first time in his life as a father, help his daughter do her arithmetic homework, and many children who are blind or have low vision can now use Virtual Pencil to perform basic arithmetic and algebra in the same manner as their sighted classmates. When I need to do math that involves a graph, I use AGC, as do many students and professionals who are visually impaired around the world.
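One common way to present a graph in audio, and a reasonable mental model for tools in this space (though not necessarily how AGC itself is implemented), is to sweep across the x-axis and map each y-value to a pitch, so that a rising curve is heard as a rising tone. A minimal sketch:

```python
# Sweep across the x-axis and map each y-value to a pitch, so the shape of
# the curve becomes the shape of the melody. Illustrative only.
def sonify(f, x_min, x_max, steps=20, low_hz=220.0, high_hz=880.0):
    xs = [x_min + (x_max - x_min) * i / (steps - 1) for i in range(steps)]
    ys = [f(x) for x in xs]
    y_lo, y_hi = min(ys), max(ys)
    span = (y_hi - y_lo) or 1.0
    # Linearly map each y-value into the chosen pitch range.
    return [low_hz + (y - y_lo) / span * (high_hz - low_hz) for y in ys]

# A parabola is heard as a tone that falls and then rises again.
print([round(hz) for hz in sonify(lambda x: x * x, -2.0, 2.0)])
```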
If the technology that is available today can power the complex environment of Shades of Doom and other similar games, how can the people who make speech products for users who are blind or have low vision exploit the same elements of the operating system to provide a profoundly more efficient system for routine tasks? I commend Freedom Scientific for entering uncharted areas with its Speech and Sounds Manager, but, in most cases, Speech and Sounds Manager can deliver only two pieces of information simultaneously, while the games can deliver 32.
Clearly, the gamers have an edge because they know the narrative of the adventure and can predict with certainty the finite number of actions that a player may take at any given moment. Conversely, it is impossible for a screen reader to know what a user will type next, what part of a word-processing document the user will find interesting, or the value in a random cell in a spreadsheet. This does not, however, prevent screen readers from becoming more efficient by moving from a one-dimensional interface to one more like those found in the games.
The audio game developers seem isolated not only from the assistive technology companies, but also from the research community. While there have been a number of studies on the efficiency of screen readers, the psychology of 2-D and 3-D audio environments, how humans distinguish important sounds from background noise, and the cognitive processes involved in processing textual information, no one has studied how these highly usable game interfaces might be applied to improving productivity.
A Sound Suggestion
I propose an effort that brings together the screen-reader experts at the leading assistive technology companies, some of the game hackers, and members of the research community to develop a set of concrete user-interface paradigms that can take the assistive technology field to the next level. Perhaps a summit meeting on the next generation of user interfaces for people who are blind or have low vision, one that includes all stakeholders, would be a good place to start.