Any regular reader of this blog (both of you) would already know that I enjoy using a bunch of different products from Apple. I use an iPhone 5S running iOS/7 and a MacBook Air running OSX/Mavericks with all of its updates; we have an AppleTV set-top box and we use an Apple TimeCapsule router. The first thing one notices on getting any of these devices is that their interfaces are 100% accessible in the iOS case and nearly 100% accessible on OSX right out-of-the-box. For this reason alone, Apple is by far the leader among mainstream companies trying to solve the problems of accessibility for people with vision and other print impairments. Apple continues to make its accessibility better with each release but, while it may be #1, Apple still has a lot of work ahead to be truly competitive with third-party screen readers on the Internet.
Any user of a popular Windows screen reader (JAWS and NVDA) or even those with less popularity (Window-Eyes, SystemAccess, ChromeVox and Orca) will, for a variety of reasons, be entirely underwhelmed with the functionality of VoiceOver on a Macintosh with the Safari web browser.
This piece started as a bug report I wrote up for some contacts I have at Apple. For all intents and purposes, I have changed very little between the email I sent to friends there and this article. I’ve removed the names of some individuals who are not public figures, added a bunch of links and done a bit of other clean-up, removing some personal comments and such. This article is specifically about how VoiceOver works with Safari on OSX and may not be applicable in any way to iOS/7 or any other Apple products. Internet support, in my mind, is the single aspect of using a Macintosh with a screen reader that remains substandard which, as Apple is setting the standard in so many other areas, makes me sad.
My Specific Use Cases
It’s possible that each user has his or her own set of cases that are important to them. Like everyone else, I use the Internet for a lot of different things but, most importantly, I write a blog. My blog tends to use other Internet sites as source materials. Therefore, being able to copy and paste from sites is really important to me. Sometimes, when I go to a site and hit VO+ENTER to start selecting text, I hear the “scratching” sound and it actually selects text; sometimes, I just hear a ding and it refuses to select text using this method. On some occasions when the VO web site text selection facility doesn’t work, I can just use SHIFT+navigation keys and the text will be selected; on other occasions, the only way I can select text on a web site is by doing a “select all,” copying and pasting the entire page into a text editor and finding the piece I’m looking for there. This is, in my mind, one of the worst problems with VO on OSX.
The Overall User Experience
Most other popular screen readers (JAWS and NVDA) and some less popular ones (Window-Eyes, ChromeVox, SystemAccess and Orca) allow the user to navigate around the page using only cursor keys as if in a word processing document. Originally, Orca’s Firefox support, also designed by the person who is now the lead UI developer for VoiceOver, functioned similarly to the VoiceOver design where arrow keys are virtually meaningless except when combined with a modifier key. Orca, not known for its tendency to be terribly competitive with other screen readers nor for its unpleasant user experience, took a step back and changed its UI design to be like JAWS, the screen reader that set the standard for Internet accessibility (if you disagree, I can provide a pile of links to actual testing scorecards that, quite objectively, demonstrate JAWS’s superiority in all of these areas including the WAI user agent guidelines). Apple, quite obviously, has infinitely more resources than does the Orca project (as far as I can tell, Orca has exactly one developer, Joanie Diggs, working part time on the effort) and can certainly make this happen.
Navigating on a Web Site
As far as I can tell, all other screen readers on general purpose computers (desktops, laptops) allow for single character navigation of a web page. In fact, all but Window-Eyes use the same standard set of keystrokes (h for next heading, t for next table, etc.) and, with all other screen readers, navigating a web page is profoundly more efficient. With NVDA (I don’t use JAWS), I go to a web page and hit “h” and I’m brought to the first heading, I hit “h” again and I’m at the next one and, if I follow that by typing a “t,” I then go to the next table and so on. With VO, I load a web page and, if I want to go to the next heading, I need to hit VO+u first to make sure it’s set for heading navigation and then either find the heading I’m looking for in the list box or, after setting the utility dialogue to headings, use VO+down arrow to find the next one and, then, when I want to find the table, I need to go back into the utility dialogue, change to tables and start over. Hence, finding the object I’m seeking requires far more keystrokes and far more cognitive processing but, worse, it makes switching from any other screen reader to VO much more difficult. I need to use OSX, iOS, Windows and GNU/Linux nearly every day, so anything that improves the similarity of screen readers is important, based entirely on the HCI concept called “discoverability.”
On a personal use case note, I cannot tell you how many times I’ve been using VoiceOver, used Command+TAB to switch to another application, returned to Safari and found that I’m hitting the keystroke to go to the next object only to find that I had forgotten to set the granularity back to headings and hear something entirely useless like, “No more tables” which could have been avoided entirely if Apple would just implement the same sort of system as exists in the more popular Windows screen readers. Maybe I’m a bit of a stoner and, therefore, forget which granularity I had VoiceOver set to but I’m willing to bet that lots of other users make this mistake frequently as well. The rotor for granularity changes works reasonably well on iOS but changing granularity on OSX is unnecessarily cumbersome.
Correction: When I wrote the two prior paragraphs this morning, I did so in the absence of any awareness of the QuickNav Commander now available in VoiceOver. For all intents and purposes, if you go into the VoiceOver Utility (VO+F8 if you’re running VO), go to Commanders and select the QuickNav tab, you can turn on “Single character navigation” there and have an experience similar to that available in Windows screen readers. Back when I worked on JAWS, we had something of an unwritten rule: if we added a cool new feature, we made sure it was turned on by default in the next release of the screen reader so that users would find it right away. I don’t tend to read a lot of release notes and, until my friend and accessibility jock, Donal Fitzpatrick (@fitzpatrickd on Twitter), pointed this feature out to me, I didn’t know it was there. So, for all intents and purposes, you can ignore the two paragraphs preceding this one as, given this feature, they are just not true.
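For readers who have never used it, the single character navigation described above boils down to something like the sketch below. This is only an illustration of the idea, not how NVDA, JAWS or VoiceOver actually implement it; the `Element` and `PageCursor` names, the key table and the sample page are all made up for this example.

```python
from dataclasses import dataclass


@dataclass
class Element:
    kind: str   # e.g. "heading", "table", "text" (hypothetical categories)
    label: str


# Hypothetical key bindings in the style described above:
# "h" for next heading, "t" for next table.
QUICK_KEYS = {"h": "heading", "t": "table"}


class PageCursor:
    """Tracks the user's position in a flat, document-order list of elements."""

    def __init__(self, elements):
        self.elements = elements
        self.position = -1  # start before the first element

    def press(self, key):
        """Jump to the next element of the kind bound to `key`.

        Returns the element, or None when the key is unbound or nothing
        of that kind remains after the current position.
        """
        kind = QUICK_KEYS.get(key)
        if kind is None:
            return None
        for index in range(self.position + 1, len(self.elements)):
            if self.elements[index].kind == kind:
                self.position = index
                return self.elements[index]
        return None  # a screen reader would announce something here


# A made-up page, in reading order.
page = [
    Element("heading", "Performance and Time"),
    Element("text", "Some paragraph"),
    Element("heading", "The Broken Item Chooser"),
    Element("table", "Timing results"),
]

cursor = PageCursor(page)
first = cursor.press("h")    # first heading
second = cursor.press("h")   # next heading
table = cursor.press("t")    # next table after that heading
```

The point of the design is that the kind of object is chosen by the key itself, so there is no separate “granularity” setting to remember or forget.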
Performance and Time
VoiceOver is ridiculously slow on “noisy” web pages (those with lots of objects). Go to this site about harmonica playing, search on a popular artist (Bob Dylan has a lot of stuff up there), bring up the Item Chooser (VO+I) and count the seconds it takes for the item chooser list box to appear. If you’re using the same 2012 model MacBook Air as me, you’ll see that this takes a little more than 7 seconds. Now, using NVDA on a cheap Windows laptop, hit NVDA+F7 to bring up its analogue of Item Chooser and you will find that its list box is on the screen and talking in less than a single second. NVDA is also using cross application communication via an API to gather its data but, using caching and other performance enhancing techniques, it actually responds in a functional amount of time; in 2014, waiting 7 seconds for a computer to do anything other than downloading something big from an online source is simply absurd.
When, in September 1999, we at FS released JAWS 3.31, we used Jamal Mazrui’s EmpowermentZone web site as our favorite reference page. Jamal has something like 1700 links on the home page and, according to VO, it has 3789 objects in all. Back then, we were running on 60 MHz Pentium processors with megabytes of RAM and, then, JAWS 3.31 could load its object list dialogue on this page in about 20 seconds (compared with about 25 minutes using Window-Eyes). Just now, when I went into Safari to test this page, it took about 30 seconds for VO to load its item chooser on hardware more than a decade newer, using a quad-core system whose speed is measured in GHz, having thousands of times more RAM and so on. We solved this problem on Windows 98, effectively a 16-bit system; certainly, Apple can solve this problem now that much faster hardware is available.
The Broken Item Chooser
If a user hits VO+i to bring up the item chooser before a page has finished loading, it will bring up the list box but, when one hits ENTER on an item, it will just ding and not bring the user to the point he had requested. VO seems to load all of its data much more slowly than any other screen reader (if I bring up the NVDA analogue of this dialogue by hitting the keystroke immediately after requesting a page, it appears immediately and is never out of sync with the rest of NVDA). I’m going to guess that this is a threading issue, and those are hard to fix, but this bug has been present for years now; I have reported it and I’m certain that others have reported it to Apple as well.
For no reason apparent to anyone outside of Apple, it seems that the Item Chooser information isn’t cached anywhere. Hence, when one hits VO+i on the same page twice, VO takes as much time to build the list the second time as the first. If the page hasn’t changed, the Item Chooser information should all be present either in memory or cached on a disk and should, even given the other VO constraints, load virtually instantly the second time through.
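To make the caching idea concrete, here is a minimal sketch of what I mean. I have no knowledge of how VoiceOver builds its list internally, so everything here is an assumption: `ItemChooserCache`, `slow_build` and the example URL are hypothetical names, and the “page” is just a string. The cache is keyed on the URL plus a hash of the page content, so an unchanged page is served from memory and a changed page is rebuilt.

```python
import hashlib


class ItemChooserCache:
    """Remembers the item list for a page until the page content changes."""

    def __init__(self, build_items):
        self._build_items = build_items  # the slow function that walks the page
        self._cache = {}
        self.builds = 0  # how many times the slow path actually ran

    def items_for(self, url, page_source):
        # Key on the URL plus a digest of the content, so a changed page
        # never gets a stale list.
        digest = hashlib.sha256(page_source.encode("utf-8")).hexdigest()
        key = (url, digest)
        if key not in self._cache:
            self.builds += 1
            self._cache[key] = self._build_items(page_source)
        return self._cache[key]


def slow_build(page_source):
    # Stand-in for the expensive walk over the page: here every
    # non-empty line simply counts as one item.
    return [line.strip() for line in page_source.splitlines() if line.strip()]


cache = ItemChooserCache(slow_build)
source = "Heading one\nA link\n\nAnother link\n"

first = cache.items_for("http://example.com/", source)    # slow path
second = cache.items_for("http://example.com/", source)   # served from memory
changed = cache.items_for("http://example.com/", source + "New item\n")  # rebuilt
```

This sketch never evicts anything, so a real implementation would need a size limit, but it shows why the second VO+i on an unchanged page has no reason to take as long as the first.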
What About “Clickable” Items?
When VO describes an item as a link, using VO+SPACEBAR will always open it. When, however, VO reports an item as being “clickable,” more often than not, VO+SPACEBAR does absolutely nothing and hitting VO+SHIFT+SPACEBAR to send a mouse click to the object works infrequently. I have my VO configurations set to have the mouse cursor follow focus and I also try hitting VO+SHIFT+F5 to route the mouse cursor to the object I’m on but that rarely seems to work either. [Note: while here in TextEdit, routing the mouse cursor works properly but, while using Apple Mail, with my mouse cursor set to follow VO, I hit VO+SHIFT+F5 and wasn’t brought to the word I had just typed nor was I even brought to the edit area where I’m typing this but, rather, I clicked something on the Dock, a seriously bad outcome.]
Compared to JAWS
People who have read articles on this blog like “I Give Up” or “An Open Letter to Mark Riccobono” will know that I’m not just a user of Apple products but, based entirely on accessibility, I’m something of an advocate often recommending their hardware to other blind users. Now that Apple seems to have made iWork, their office suite, mostly accessible, the Internet is the only aspect of VoiceOver that I still don’t like much. Readers of this blog would have heard me say, “Apple has set the gold standard for out-of-the-box accessibility” which is true, for almost everything, except for the Internet. Online, JAWS remains the king with NVDA a close second. This is the one area where Apple really needs to do some massive improvements.
If I actually picked up the user agent guidelines and tested each item separately, I would find a ton more bugs. I would find a really big list of defects that one would not encounter if they used JAWS or NVDA. It’s pretty much the Internet access that causes me to use Windows to do most of my research and a lot of my writing these days. On all scorecards regarding screen reader functionality published online (I’m working on an article about these reports coming soon to this blog), JAWS remains the gold standard for using a speech interface to read the web. Apple may have set the bar for nearly everything else but, if Apple wants to be the best, they have a lot of work ahead of them.
The Object Model
VoiceOver arranges its web information by object but doesn’t also include a simpler navigation metaphor. Hence, as I wrote above, it uses a different system for moving from object to object so a separate keystroke is needed for each separate object. If a web site contains the sentence (including the links):
an NVDA or JAWS user might read the entire sentence by issuing a single keystroke (a down arrow perhaps) but, with VoiceOver reading each object separately, one needs to issue a keystroke for each link plus each chunk of text separating such for a total of eight keystrokes to read a single sentence. Also, while doing a “read all” of an entire web page, the user will hear pauses caused by VoiceOver trying to add a tone for each link making the entire reading experience sound really choppy. This is massively inefficient for the users and should be corrected immediately.
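The arithmetic behind that keystroke count can be sketched as follows. The sentence below is made up purely for illustration (it is not the one from the original article), and the two counting functions are my own simplification of the two reading models, not code from any screen reader: one keystroke per object in the VoiceOver style, one keystroke per line in the virtual buffer style.

```python
import textwrap

# A hypothetical sentence split into the objects an object-based reader
# would expose: plain text chunks alternating with links.
sentence = [
    ("text", "I use "),
    ("link", "JAWS"),
    ("text", ", "),
    ("link", "NVDA"),
    ("text", ", "),
    ("link", "Orca"),
    ("text", " and "),
    ("link", "VoiceOver"),
]


def object_keystrokes(chunks):
    """One keystroke per object: every link and every piece of text between links."""
    return len(chunks)


def line_keystrokes(chunks, width=80):
    """One keystroke per displayed line, assuming an 80 column line width."""
    text = "".join(piece for _, piece in chunks)
    return len(textwrap.wrap(text, width))


by_object = object_keystrokes(sentence)  # 8
by_line = line_keystrokes(sentence)      # 1
```

Four links with text between them already cost eight keystrokes in the object model, while the same sentence fits on one line and costs a single down arrow in the other.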
The Interaction Model
When a user accesses a web site with JAWS or NVDA, the information is pretty much organized like a word processing document with all of the same keystrokes for navigating through such. With VoiceOver, Apple introduced a model that attempts to group chunks of a web site into larger blocks that a user can navigate between and, when they are in a place they want more detail, the users can “interact” with that portion of the web page. In theory, this system should allow for greater efficiency as it permits the user to easily jump past information in these groups.
Very unfortunately, though, the interaction model only seems to improve efficiency on tremendously well organized web sites and, more often than not, actually requires the user to issue more keystrokes than in the “virtual buffer” model presented by most other screen readers out there. For a quick example of this, if you’re using a Macintosh running VoiceOver to read this page, find the place where one can follow me on Twitter and you’ll notice that you need to interact (and, therefore, stop interacting when you’re done) with an item that contains very little actual information. With JAWS or NVDA, however, simply moving from line to line with arrow keys gets you everything you want.
In my experience, the interaction model causes far more efficiency problems than it solves.
This isn’t my most well organized piece and it gets repetitive in places. I wanted to show, however, how even Apple, the world’s leader in out-of-the-box accessibility, still needs to continue improving. I’m certain that this item will gather me a bunch of comments (either public or privately through the contact form on this site) about other problems blind users encounter with VoiceOver on Macintosh. If you have other bugs to report, sending them to me may make a future version of this article more comprehensive but, at the same time, I urge you to report any problems you encounter to Apple’s Accessibility email address so they may not only know about the bugs but also understand just how many people are affected badly by defects and design flaws in their accessibility software.