Every day, over 2 billion photos are shared across Facebook, Instagram, Messenger, and WhatsApp. And that number is growing ever larger as mobile phones, selfie sticks, consumer drones, and other devices make taking and sharing photos and videos easier and more fun. Unfortunately, participating in more image-based social interactions poses obvious challenges for people who are blind or severely visually impaired, leading many to feel frustrated and excluded because they cannot fully participate in conversations sparked by photos others have posted.

In the April 2015 issue of AccessWorld, we spoke with Facebook Accessibility Team founder Jeff Wieland and accessibility engineer Ramya Sethuraman about the social network's ongoing commitment to accessibility. At the time, one of Facebook's latest accessibility features was the Dynamic Alt Text Generator, which offered some accessibility to photos and videos by gathering all the metadata a user supplies and combining it to generate a caption that tells a more complete story. Recently, Facebook took an interesting new tack, replacing the Dynamic Alt Text Generator with a more powerful feature called Automatic Alternative Text, which is currently available on both their iOS and Android apps.

"Automatic Alternative Text (AAT) is a major step towards creating equal access to information, demonstrating just how much we care about our commitment to connecting everyone," says Wieland, who arranged a question and answer session with one of the newest members of the Facebook Accessibility Team, information scientist and project lead Shaomei Wu.

AW: Can you begin by describing a bit of the groundwork behind your project?

Wu: Certainly. We've spent the last few years diving into how people use screen readers on Facebook. In fact, we did a study on it. One of our most fascinating findings is that people [using screen readers] post, comment on, and like photos as much as people who use Facebook without screen readers. In a second study we conducted, we gathered more insights about the specific challenges blind people face, and the strategies they use to interact with visual content. One thing we heard again and again during these interviews is that people often don't describe their photos, which makes it very hard for those without vision to participate in the conversations around them. Hearing these frustrations inspired my team to spend a year trying to solve this problem. Like most product teams at Facebook, our team is very small, and a lot of employees volunteered their time and expertise to build a better product.

AW: What is Automatic Alternative Text?

Wu: Facebook's Automatic Alt Text technology processes images uploaded to Facebook. The technology is based on a neural network that has billions of parameters and is trained with millions of examples. Each advancement in object recognition technology means that the Facebook Accessibility team will be able to make technology even more accessible for more people.

AW: How does Automatic Alternative Text work?

Wu: To generate a description for a photo, Automatic Alternative Text uses object recognition to get a list of candidate tags (such as "pizza," "dog," or "child") and filters them by confidence. Here are some of the items our system can identify:

  • Transportation: car, boat, airplane, bicycle, train, road, motorcycle, bus
  • Nature: outdoor, mountain, tree, snow, sky, ocean, water, beach, wave, sun, grass
  • Sports: tennis, swimming, stadium, basketball, baseball, golf
  • Food: ice cream, sushi, pizza, dessert, coffee
  • A person's appearance: baby, eyeglasses, beard, smiling, jewelry, shoes
  • And, of course, selfie!

While this technology is still nascent, tapping its current capabilities to describe photos is a huge step toward providing our visually impaired community the same benefits and enjoyment that everyone else gets from photos.
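
The filtering step Wu describes can be illustrated with a minimal Python sketch. It is not Facebook's implementation: the `Tag` type, the `describe` function, the 0.8 threshold, and the exact wording of the sentence are all assumptions made for illustration. The sketch keeps only the candidate tags whose confidence meets the threshold and joins the survivors into a short description.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tag:
    """A candidate label from an object recognizer (hypothetical type)."""
    label: str
    confidence: float  # assumed to lie between 0.0 and 1.0


def describe(tags, threshold=0.8, prefix="This picture may include"):
    """Keep tags at or above the threshold and build one sentence.

    The threshold value and the sentence wording are assumptions.
    Returns None when no tag is confident enough to report.
    """
    kept = sorted(
        (t for t in tags if t.confidence >= threshold),
        key=lambda t: t.confidence,
        reverse=True,  # most confident tag first
    )
    if not kept:
        return None
    return f"{prefix}: {', '.join(t.label for t in kept)}."


candidates = [Tag("pizza", 0.95), Tag("dog", 0.40), Tag("child", 0.85)]
print(describe(candidates))
```

Returning `None` rather than an empty sentence lets a caller fall back to a generic label when nothing is recognized with enough confidence.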

AW: Do you recognize every uploaded image or only those that have been accessed by someone using a screen reader?

Wu: Facebook's object recognition technology processes all images uploaded to Facebook, but currently we only generate automatic alt text for photos that are viewed by screen reader users in the Facebook iOS and Android apps. We will be extending this feature to the web, so screen reader users there will also be able to read the automatic alt text. It will be placed within the standard alt text, so [people who don't use] screen readers won't see it unless they check out the HTML source of the page.
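
On the web, "standard alt text" means the `alt` attribute of an HTML `img` element, which screen readers announce and which is not displayed when the image loads normally. The following Python sketch shows how a generated description could be placed there. The `img_tag` helper and the sample file name are hypothetical, and nothing here reflects how Facebook actually renders its pages.

```python
from html import escape


def img_tag(src, alt_text):
    """Build an <img> element whose alt attribute carries the description.

    escape(..., quote=True) converts quotes and angle brackets so the
    description cannot break out of the attribute value.
    """
    return (
        f'<img src="{escape(src, quote=True)}" '
        f'alt="{escape(alt_text, quote=True)}">'
    )


print(img_tag("photo.jpg", "This picture may include: pizza, child."))
```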

AW: Are these recognized images available on all of Facebook's various platforms and services?

Wu: Currently they are only available on Facebook, and only for those using either the iOS or Android mobile app. We are planning to implement AAT on Instagram, Messenger, and WhatsApp sometime in the near future.

AW: Speaking of the future, what are the future plans for Automatic Alternative Text? Where do you hope to be in one to three years?

Wu: We are working toward allowing touch recognition of an image. Say, for example, the AAT announces: "This picture may include three desks and a window." A Facebook user could slide a finger around the screen to get a more precise layout, discovering that the three desks are beside one another on the left and the window is on the right. We also hope to train the AI to provide a natural description of the photo, just as a sighted person might offer a blind friend. Eventually, we'd love to get to a place where people who are blind can get a more complete sense of what's in their viewfinder before they snap a photo to share, and can ask questions about a posted image, or even a video (questions like "What kind of car is in the picture?"), and have the AI answer them.
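
One way to picture the touch idea Wu describes is a hit test: each recognized object has a rectangle on the photo, and the app announces the label of whichever rectangle is under the finger. The Python sketch below is purely illustrative. The `Region` type, the `label_at` function, the normalized 0-to-1 coordinates, and the sample rectangles are all assumptions, and the feature is described in the interview only as a future plan.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Region:
    """A labeled rectangle in normalized image coordinates (hypothetical)."""
    label: str
    left: float
    top: float
    right: float
    bottom: float

    def contains(self, x, y):
        return self.left <= x <= self.right and self.top <= y <= self.bottom

    def area(self):
        return (self.right - self.left) * (self.bottom - self.top)


def label_at(regions, x, y):
    """Return the label under the point (x, y), or None if nothing is there.

    When rectangles overlap, the smallest one wins, on the assumption
    that it is the most specific object at that spot.
    """
    hits = [r for r in regions if r.contains(x, y)]
    if not hits:
        return None
    return min(hits, key=Region.area).label


# Three desks side by side on the left, a window on the right.
layout = [
    Region("desk", 0.00, 0.5, 0.18, 0.9),
    Region("desk", 0.20, 0.5, 0.38, 0.9),
    Region("desk", 0.40, 0.5, 0.58, 0.9),
    Region("window", 0.70, 0.1, 0.95, 0.6),
]
print(label_at(layout, 0.1, 0.7))
print(label_at(layout, 0.8, 0.3))
```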

Author
Bill Holton
Article Topic
Access to Entertainment and Social Media