On July 12 of this year, Microsoft released a remarkable new app for Apple’s iOS devices. The app, called Seeing AI: Talking Camera For The Blind, can perform OCR, recognize products, do facial recognition and (in a beta state) describe a scene from a picture.
I downloaded and installed Seeing AI on the day it dropped and did some preliminary testing; this article contains my first impressions, most of which are highly positive.
Impressions Of Seeing AI
As this article is being written less than 24 hours after I first launched Seeing AI, readers must understand that the testing I’ve done personally, along with the feedback I’ve incorporated from others, is at best superficial. Neither I nor any of the people with whom I’ve discussed Seeing AI has had enough time with it, tested it in enough different situations or tried enough use cases to write a fully informed review.
These are my first impressions of the app after trying each of its five different modes (called channels in the app) over one night and a couple of hours the following morning.
The Two OCR Channels
Seeing AI has two different OCR modes: one designed for short bits of text and the other for full-page documents. Prior to the release of Microsoft’s Seeing AI, KNFB Reader was the best mobile OCR I’d ever used, but it carries a pretty expensive price tag. On July 12, Microsoft killed any need to use the KNFB product, as Seeing AI does the OCR even better and the app is gratis to download and use.
I am thoroughly impressed with both of the OCR modes. The Short Text channel does a great job on things like soup cans, prescription bottles and other oddly shaped items containing text that I’ve tested it with. The Document channel handles entire pages of text and is the best mobile OCR I’ve seen to date (based, again, on very limited testing).
If you balked at KNFB Reader’s price, get Seeing AI and try out its OCR features. The software is gratis, easy to use and its OCR is, I think, pretty incredible.
The Product Channel
The Product channel works by recognizing bar codes. Bar codes are designed to be read by a computer, so this doesn’t seem like an especially innovative feature, but in Seeing AI it works better than any other talking UPC tool I’ve tried. Of the roughly dozen products I tested, only one bar code wasn’t recognized, and it was printed in “blue screen” blue rather than the typical high-contrast black and white.
Pointing the camera at a product and taking a picture with the Short Text channel described above also works pretty well; I’d say an 80% success rate across the bunch of things I tested. It’s easier to use the OCR than it is to get the bar code lined up with the camera, so try the OCR first, as it may be simpler.
With bar codes, the product information can include cooking instructions, ingredients and all sorts of other useful details. This is a very worthwhile feature and, while it may take longer to find the bar code than with one of the proprietary scanners designed for blind users, at no cost to the consumer it is an excellent alternative.
The People Channel
The People channel provides two different types of information: a generic description of a person, and recognition of faces the user has trained it to know.
The generic descriptions are, in my testing and that of others to whom I’ve spoken, highly unreliable. The first thing I tried was taking a picture of a woman who will be 64 next month; Seeing AI said she was a 90-year-old man. It identified me as a 74-year-old man, and I’m 57, lived pretty hard for many of my years and probably did not age gracefully. The worst of these was a woman friend who is about to turn 50; Seeing AI described her as a 94-year-old man. Two younger friends reported that Seeing AI was within a year of their actual ages. In my testing, it did get hair color and whether the person was wearing glasses or a hat correct. With an 88% age error in one of our test cases and roughly 50:50 odds of getting a woman’s gender correct (I’ve not heard of a case in which a man was recognized as a woman, but we are working with a tiny sample at this point), I find no practical application for this feature at this time. That does not mean it isn’t a really impressive step forward in technology for blind users; it just feels far more like a demo than a useful application to me.
Then I tried the feature where you train the software on specific faces. To train it, you take three photos of the person’s face and provide a text label. I used my wife and myself for the test, and this is where things get cool. Once you’ve trained it to recognize people you know, you switch to the People channel and point the camera around the room; it will announce the name of any person whose face it finds in your personal database. My test involved only two people in our living room. I was told the room was dimly lit, though the TV was on, which probably added light. When the camera was facing empty space, it said, “No faces found”; when it was pointed at my wife or me, it accurately said, “Gonz” or “Susan.”
This needs to be tested in more stressful situations. I would like to try it in an actual crowd to see how many faces it recognizes, but first I need to grow my personal database of faces, which now stands at a grand total of two, and then get some of those people into the same room with others whom I haven’t trained Seeing AI to recognize. I promise I’m going to do this someday relatively soon, but as I’ve had the software for less than 24 hours and, other than my wife, the only people I’ve been around today are three at my dentist’s office, it’s far too soon to even start trying.
In its current state, I think it’s incredibly impressive. I’d like to hear your feedback on how it’s working in your test cases so please post comments with your results.
The Scenes Channel
The Scenes channel is described by the app as a beta, and it instructs the user not to rely on its results. I found this mode interesting but not exactly useful in its current state. When I pointed it at our dining room table, it said, “Table with a lot of things on it,” and I said, “no shit.” But as I played with it more, it recognized Jackson, my guide dog, as “large black dog on sofa.” It took more effort with our little dog, who was identified as “probably a cat” on my first three attempts (a mistake a human might make based on sighted analysis of the photos), but when I got her face in the picture it said, “small dog on pillow,” and I thought that was pretty cool. One amusing result came when I took a picture of our TV: Seeing AI reported, “Flat screen television watching television,” and I didn’t know that televisions watch themselves. I’ve been playing with it around the house and having some fun with it.
The Scenes channel isn’t ready for prime time but it’s interesting to mess around with.
Problems I’ve Observed
The biggest downside to Seeing AI is that it is a battery-draining nightmare on your phone. My battery level dropped from 88% to 34% in about a half hour of usage. This is because the app does a lot of the recognition in real time on the device: the OCR is instant and the trained facial recognition is live, all of which makes the CPU and GPU work incredibly hard. Over a half hour, the phone heats up as these processors scream. While the battery drain is annoying, I cannot imagine it can be remedied easily, and it’s not a showstopper; just turn off announcements in the UI and it’ll put the camera and processing to sleep until you need it again.
I also ran into some bugs. Sometimes the app knows to turn on the flash; sometimes it doesn’t. I’ve crashed it three times, but it restarts cleanly and gets right back to work.
I also found that it seems harder to get a face into the picture using this app’s spoken hints. With Apple’s built-in guided photography features in its Camera app, I find it easier to get a face recognized and centered in the frame than with Seeing AI. I don’t know why this is, but slight movements while trying to tap the button on the phone screen can knock the face out of the frame, which doesn’t seem to happen as much with the Apple app.
Here is where I must thank my good friends Austin “Camlorn” Hicks, Sina Bahram and Derek Riemer for taking me to task for my very first impression tweets. Those were based entirely on the generic face recognition and the Scenes result that told me the exceedingly obvious, “table with things on it.” I argued with Cam and Sina online more or less just to argue (isn’t that what Twitter is for?). But I respect these guys so highly that, instead of just tossing Seeing AI on the scrap heap, I started testing it further, and this mostly positive review is the result.
These guys all said, “It will be better in the future,” and I responded with something more along the lines of, “then call me when it’s done.” I’ve seen so many projects offer great promise only to fade away over the years that my cynicism became a barrier to seeing the potential Seeing AI demonstrates.
The killer application of this software may still be unknown. To me, the most incredible thing I can imagine it doing would be running on a pair of smart sunglasses and performing instant facial recognition of people I’ve trained it to know. I could walk into my favorite pub (Toronado on Haight St. in San Francisco) and know which of my friends are there and roughly which direction they are in relative to my glasses; that would be amazing. Add the OCR, and I could actually read their entire printed menu of hundreds of beers from all over the globe at the bar. Pub culture would be far more fun with this kind of technology.
It’s obvious that my much younger friends saw all of this potential immediately, and I’m happy they yelled at me; without their counsel, I would have an entirely different opinion of this technology.
Seeing AI is the most interesting and useful piece of software in the blindness space that I’ve tested in years. Be My Eyes is pretty cool, KNFB Reader is pretty cool, but I’ll predict that Seeing AI will be remembered as the product of the year for 2017. It’s gratis, you probably have an iOS device, so go to the App Store and give it a try.
Also, you should follow Sina Bahram, Austin “Camlorn” Hicks and Derek Riemer on Twitter. They are super smart, insightful people in the 30-and-under category, the generation that needs to invent the future for our community.