Aaron Preece
The rapid progress in generative AI over the past two or three years shows serious promise for drastically increasing accessibility in many aspects of life for people who are blind or have low vision. Previously, we've discussed the benefits of using services like Be My Eyes and the recognition features in iOS. However, as AI has increased in complexity and understanding, it has also become valuable for improving accessibility on desktop computers.
In this article, we will specifically focus on GPT-4o, though other highly capable AI models such as Google's Gemini, Anthropic's Claude, and Meta's Llama may achieve similar results.
Improving PDF Accessibility
Even at their most accessible, PDF documents can be challenging to navigate and manage when using a screen reader. While services exist to convert PDFs into more screen reader-friendly formats, such as Word or HTML, these conversions often struggle to accurately follow the tag structure of the original PDF. Additionally, many PDF documents lack proper tagging altogether, requiring the use of the "Infer Reading Order from Document" option in Adobe Reader, which is far from perfect.
I’ve discovered that generative AI can take the contents of an unstructured PDF and turn it into an accessible form with proper formatting. If you have an image-only PDF, you can either feed that PDF directly to the AI or use a service to break it down into individual images and then feed those one by one into the AI for greater accuracy. Simply uploading the PDF and asking the AI to convert it into your preferred format (I personally always choose HTML), with specific instructions about how you want the output structured, can yield surprisingly good results.
However, it's important to be aware of the tendency for AI to hallucinate—that is, to return incorrect data as if it were correct. For example, when I uploaded a PDF to the AI, it started guessing at the content after a certain point, generating text that didn’t accurately reflect the PDF. A better approach was to select all the text from the PDF and paste it into the AI, which resulted in a much more accurate outcome.
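If you would rather script the copy-and-paste approach than do it by hand, it might look something like the sketch below. It is only an illustration under a few assumptions of mine: the function names, the wording of the instructions, and the 6,000-character limit are all hypothetical, and splitting the text into parts is my own extension of the idea that smaller inputs leave less room for guessing. The code only prepares the prompts; sending them to the AI is left to whichever tool you use.

```python
# Sketch: prepare prompts that ask an AI to turn text copied from a PDF into HTML.
# All names and the character limit below are illustrative assumptions.

INSTRUCTIONS = (
    "Convert the following text, copied from a PDF, into clean HTML. "
    "Use headings, lists, and tables where the text implies them. "
    "Do not add, remove, or reword any content."
)


def split_into_chunks(text: str, max_chars: int = 6000) -> list[str]:
    """Group paragraphs (separated by blank lines) into chunks.

    Each chunk stays within max_chars where possible. A single paragraph
    longer than max_chars is kept whole rather than cut mid-sentence.
    """
    paragraphs = [p.strip() for p in text.split("\n\n") if p.strip()]
    chunks: list[str] = []
    current = ""
    for paragraph in paragraphs:
        candidate = paragraph if not current else current + "\n\n" + paragraph
        if len(candidate) <= max_chars or not current:
            current = candidate
        else:
            chunks.append(current)
            current = paragraph
    if current:
        chunks.append(current)
    return chunks


def build_prompts(text: str, max_chars: int = 6000) -> list[str]:
    """Return one prompt per chunk, each labeled with its position."""
    chunks = split_into_chunks(text, max_chars)
    total = len(chunks)
    return [
        f"{INSTRUCTIONS}\n\nPart {number} of {total}:\n\n{chunk}"
        for number, chunk in enumerate(chunks, start=1)
    ]
```

Labeling each part with its position makes it easier to spot a missing or repeated section when you check the AI's output against the original text.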
Flexibility and Adaptability of Modern AI
Modern AI's ability to keep following instructions without requiring overly detailed prompts is particularly useful. While working on this process, I sometimes found that the AI wasn't outputting the formats I wanted or was structuring data incorrectly. In these cases, because the AI generates its output a portion at a time, I could interrupt it partway through, tell it what to change, and ask it to start again from the top. The AI's capability to understand plain language and work with you as if it were a person offers a lot of flexibility in how you want the information presented and what you can ask the AI to do.
This functionality isn’t limited to PDFs. If you have any text data that is unformatted or difficult to read (think tables that aren’t properly constructed), you can ask the AI to format it properly for much easier reading.
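To make "properly formatted" more concrete, here is a small sketch of the kind of HTML table you might ask for: one with a caption and with header cells marked up so a screen reader can announce the column and row for each value. The function name and the sample figures are hypothetical, and the code assumes a tidy input in which columns are separated by two or more spaces. An AI is useful precisely because it can cope with input much messier than this.

```python
import html
import re


def text_table_to_html(raw: str, caption: str) -> str:
    """Convert space-aligned text into an HTML table with header cells.

    Assumes the first line holds the column headings, the first cell of
    every other line is a row heading, and columns are separated by two
    or more spaces.
    """
    rows = [
        re.split(r"\s{2,}", line.strip())
        for line in raw.splitlines()
        if line.strip()
    ]
    if not rows:
        raise ValueError("no table rows found")

    header, body = rows[0], rows[1:]
    lines = ["<table>", f"  <caption>{html.escape(caption)}</caption>", "  <tr>"]
    # Column headings use scope="col" so each data cell is tied to its column.
    lines += [f'    <th scope="col">{html.escape(cell)}</th>' for cell in header]
    lines.append("  </tr>")
    for row in body:
        lines.append("  <tr>")
        # The first cell of each row acts as that row's heading.
        lines.append(f'    <th scope="row">{html.escape(row[0])}</th>')
        lines += [f"    <td>{html.escape(cell)}</td>" for cell in row[1:]]
        lines.append("  </tr>")
    lines.append("</table>")
    return "\n".join(lines)


# Hypothetical sample data, for illustration only.
SAMPLE = """
Month       Revenue
January     1200
February    1350
"""

if __name__ == "__main__":
    print(text_table_to_html(SAMPLE, "Monthly revenue"))
```

Seeing the target markup also gives you something specific to ask for in your prompt, such as a caption and row and column headers.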
Accessing and Analyzing Data in Images
Another useful ability of AI is the extraction and analysis of data from images. Whether an image has no description at all or has a proper description that still doesn't give you access to the details, AI can help. For instance, you can feed an unlabeled image of a chart or diagram to the AI and get answers to your questions about the data it contains. The AI can convert the data into a text table or present it in another format that’s easier to work with.
You can also ask AI to analyze the data. For example, if you have a chart tracking two different statistics and you want to know when those two statistics converge, you can ask the AI. A quick example would be a chart that lists months with restaurant revenue and holidays. You could ask the AI whether revenue for the restaurant went up or down during holidays and which holidays saw increases or decreases. This kind of analysis, which might be simple to do visually, can be challenging if you’re working with raw data, but AI can make it accessible.
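Once the AI has turned a chart like that into a table, you can also check its answer yourself. The sketch below shows one way to do that for the restaurant example. The figures are invented for illustration, the function names are hypothetical, and comparing each holiday month against the average non-holiday month is just one simple way to frame the question.

```python
from statistics import mean

# Hypothetical data: (month, revenue, holiday in that month or None).
MONTHLY = [
    ("January", 8800, "New Year's Day"),
    ("February", 12000, "Valentine's Day"),
    ("March", 9000, None),
    ("April", 9500, None),
    ("May", 11000, "Mother's Day"),
    ("June", 10000, None),
]


def holiday_effect(records):
    """Map each holiday to its month's revenue minus the non-holiday average.

    Needs at least one month without a holiday to form the baseline;
    statistics.mean raises StatisticsError otherwise.
    """
    baseline = mean(revenue for _, revenue, holiday in records if holiday is None)
    return {
        holiday: revenue - baseline
        for _, revenue, holiday in records
        if holiday is not None
    }


def summarize(effects):
    """Turn the differences into plain sentences that read well aloud."""
    lines = []
    for holiday, difference in effects.items():
        if difference == 0:
            lines.append(f"{holiday}: about the same as a typical non-holiday month")
            continue
        direction = "up" if difference > 0 else "down"
        lines.append(
            f"{holiday}: {direction} by {abs(difference):,.0f} "
            "versus a typical non-holiday month"
        )
    return lines


if __name__ == "__main__":
    for line in summarize(holiday_effect(MONTHLY)):
        print(line)
```

A difference like this only shows that revenue was higher or lower in a holiday month. It does not show that the holiday was the reason.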
The Bottom Line
Outside of these use cases, there are likely many other situations where AI could be beneficial, depending on your needs. Essentially, any task where you could hand someone visual information and have them produce a text or similar equivalent can be done with AI. There may be limitations when dealing with niche subjects or specialized knowledge, but so far, I’ve found few limitations in what I’ve tried.
As AI continues to improve, it’s exciting to see where the technology will take us next. New input formats, such as audio, are always being added, and the AI itself keeps growing in complexity and capability. We're living in an exciting time for AI-assisted accessibility.