On July 12 of this year, Microsoft released a pretty remarkable new app for Apple’s iOS line of products. The app, called Seeing AI: Talking Camera for the Blind, can be used to perform OCR, recognize products, do facial recognition and (in a beta state) have a scene described to you from a picture.
I downloaded and installed Seeing AI on the day it dropped and did some preliminary testing; this article contains my first impressions, most of which are highly positive.
Impressions Of Seeing AI
As this article is being written less than 24 hours since I first launched Seeing AI, readers must understand that the testing I’ve done personally, and the feedback I’ve incorporated from others, is at best superficial. Neither I nor any of the people with whom I’ve discussed Seeing AI has had enough time to test it in enough different situations, or to try enough use cases, to write a fully informed review.
These are my first impressions of the app after trying each of its five different modes (called channels in the app) over one night and a couple of hours the following morning.
The Two OCR Channels
Seeing AI has two different OCR modes: one designed for short bits of text and the other for full-page documents. Prior to the release of Microsoft’s Seeing AI, KNFB Reader was the best mobile OCR I’d ever used, but it comes with a pretty expensive price tag; on July 12, Microsoft killed any need for the KNFB product, as Seeing AI does the OCR even better and the app is gratis to download and use.
I am thoroughly impressed with both of the OCR modes. The Short Text channel does a great job on things like soup cans, prescription bottles and other oddly shaped items containing text. The Document channel handles entire pages of text and is the best mobile OCR I’ve seen to date (based, again, on very limited testing).
If you balked at KNFB Reader’s price, get Seeing AI and try out its OCR features. The software is gratis, easy to use and I think its OCR is pretty incredible.
The Product Channel
The Product channel works by doing bar code recognition. Bar codes are designed to be read by a computer, so this doesn’t seem to be an especially innovative feature, but in Seeing AI it works better than any other talking UPC reader I’ve ever tried. In my testing of about a dozen products, only one bar code wasn’t recognized, and it was printed in “blue screen” blue rather than the typical high-contrast black and white.
Using the Short Text channel described above to point the camera and take a picture of a product also works pretty well; I’d say an 80% success rate across the bunch of things I tested. It’s easier to use the OCR than to get the bar code lined up with the camera, so try the OCR before you go for the UPC.
With bar codes, the product information can contain cooking instructions, ingredients and all sorts of other useful details. This is a very worthwhile feature and, while it may take longer to find the bar code than with one of the proprietary scanners designed for blind users, we need to remember the price: at no cost to the consumer, this is an excellent alternative.
The People Channel
The People channel provides two different types of information: a generic description of a person, and recognition of faces the user has trained it to know.
The generic descriptions are, in my testing and that of others to whom I’ve spoken, highly unreliable. The first thing I tried was taking a picture of a woman who will be 64 next month; Seeing AI said she was a 90 year old man. It identified me as a 74 year old man, and while I’m 57, I’ve lived pretty hard for many of my years and probably did not age gracefully. The worst of these was a woman friend who is about to turn 50; Seeing AI described her as a 94 year old man. Two younger friends reported that Seeing AI was within a year of their actual ages. In my testing it did get hair color, and whether the person was wearing glasses or a hat, correct. With an 88% age error in one of our test cases and roughly 50:50 odds of getting a woman’s gender correct (I’ve not heard of a case in which a man has been recognized as a woman, but we are working with a tiny sample at this point), I find no practical application for this feature at this time. That does not mean it isn’t a really impressive step forward in technology for blind users; it just feels far more like a demo than a useful application to me.
Then I tried the feature where you train the software on specific faces. To train it, you take three photos of the person’s face and provide a text label. I used my wife and myself for the test, and this is where things get cool. Once you’ve trained it to recognize people you know, you turn it to the People channel and point the camera at different places in the room; it will tell you the name of a person if it sees them and finds their face in your personal database. My test was only two people in our living room. I was told the room was dimly lit, but the TV was on, which probably added light. When the camera was facing empty space, it said, “No faces found”; when it was pointed at my wife or me, it said “Gonz” or “Susan” accurately.
This needs to be tested in more stressful situations. I would like to try it in an actual crowd to see how many faces it recognizes. Before I can, though, I need to get a bunch of people into my personal database of faces, which now stands at a grand total of two, and then get some of those people into a room with others I haven’t trained Seeing AI to recognize. I promise I’m going to do this someday relatively soon, but as I’ve had the software for less than 24 hours and, other than my wife, have only been in the company of three people at my dentist’s office today, it’s far too soon to even start.
In its current state, I think it’s incredibly impressive. I’d like to hear your feedback on how it’s working in your test cases so please post comments with your results.
The Scenes Channel
The Scenes channel is described by the app as a beta and instructs the user not to rely on its results. I found this mode interesting but not exactly useful in its current state. When I pointed it at our dining room table, it said, “Table with a lot of things on it,” and I said, “no shit.” But as I played with it more, it recognized Jackson, my guide dog, as “large black dog on sofa.” It took more effort with our little dog, who was identified as “probably a cat” on my first three attempts (a mistake a human might make based on sighted analysis of the photos), but when I got her face in the picture it said, “small dog on pillow,” and I thought that was pretty cool. One amusing result came when I took a picture of our TV: Seeing AI reported, “Flat screen television watching television,” and I didn’t know that televisions watch themselves. I’ve been playing with it around the house and having some fun with it.
The Scenes channel isn’t ready for prime time but it’s interesting to mess around with.
Problems I’ve Observed
The biggest downside to Seeing AI is that it is a battery-draining nightmare on your phone. My battery level dropped from 88% to 34% in about a half hour of usage. This is because a lot of the recognition happens in real time on the device: the OCR is instant, the trained facial recognition runs live, and all of this makes the CPU and GPU work incredibly hard. Over a half hour, the phone heats up as these processors scream. While the battery drain is annoying, I cannot imagine it can be remedied easily, and it’s not a showstopper; just turn off announcements in the UI and it’ll put the camera and processing to sleep until you need them again.
I also ran into some bugs. Sometimes it knows to turn on the flash, sometimes it doesn’t. I’ve crashed it three times but it restarts cleanly and gets right back to work.
I also found that it seems harder to get a face into the picture using the spoken hints from this app. With the built-in guided photography features of Apple’s Camera app, I find it easier to get a face recognized and centered in the frame than with Seeing AI. I don’t know why, but slight movements when trying to tap the button on the phone screen can knock the face out of the frame, which doesn’t seem to happen as much with the Apple app.
Here is where I must thank my good friends Austin “Camlorn” Hicks, Sina Bahram and Derek Riemer for taking me to task for my very first impression tweets. These were based entirely on the generic face recognition and the Scenes result that told me the exceedingly obvious, “table with things on it.” I argued with Cam and Sina online more or less just to argue (isn’t that what Twitter is for?), but I respect these guys so highly that instead of just tossing Seeing AI on the scrap heap, I started testing it further, and this mostly positive review is the result.
These guys all said, “It will be better in the future” and I responded with something more along the lines of, “then call me when it’s done.” I’ve seen so many projects offer great promise only to see them fade away over the years that my cynicism became a barrier to the potential that Seeing AI demonstrates.
The killer application of this software may still be unknown. To me, the most incredible thing I can imagine it doing would be running on a pair of smart sunglasses and doing instant facial recognition of people I’ve trained it to know. I could walk into my favorite pub (Toronado on Haight St. in San Francisco) and know which of my friends were there, and roughly which direction they were in relative to my glasses; that would be amazing. Adding the OCR, I could actually read their entire menu of hundreds of beers from all over the globe in print at the bar. Pub culture would be far more fun with this kind of technology.
It’s obvious that my much younger friends saw all of this potential immediately, and I’m happy they yelled at me; without their counsel, I would have an entirely different opinion of this technology.
Seeing AI is the most interesting and useful piece of software in the blindness space that I’ve tested in years. Be My Eyes is pretty cool, KNFB Reader is pretty cool, but I’ll predict that Seeing AI will be remembered as the product of the year for 2017. It’s gratis, and you probably have an iOS device, so go to the App Store and give it a ride.
Also, you should follow Sina Bahram, Austin “Camlorn” Hicks and Derek Riemer on Twitter. They are super smart and insightful people in the 30 and under category, the generation that needs to invent the future for our community.
Joe Orozco says
I spent time with the app yesterday evening and concur with your initial impressions. I can’t compare the app to anything other than Tap Tap See, as I have generally been skeptical of apps that claim to redefine the way we view the world around us. For a skeptic, I was pleasantly surprised and hooked.
The biggest problem I experienced was my own incompetence with aiming the camera. I want to be able to grab a random piece of paper and use the phone to start reading, but so much depends on how you hold the paper, how far away, etc. I had a much better experience with short text than document recognition, but again, that was user error on my part.
I thought the Product channel was especially intriguing. Finding the bar code was not intuitive for me, but again, user ignorance here. When it did recognize the bar code, it provided great information. I wonder if it will work with that directions database thing that’s online somewhere. Again, I have never used Digit-Eyes and so cannot compare how well it does side by side.
Not to divert the discussion, but can anyone speak to AIRA? I have not been interested enough to check into pricing, but it would be great to read a comparison in light of this latest addition to the Apple App Store.
Thank you for another great review.
I have been preferring to use the People channel to take selfies instead of Apple’s actual camera. I felt as though it did a better job of not only telling me if a face was centered, but also giving me an estimation of distance. A useful feature. I’ve also heard it recognizes well-known people in imported pictures, so I would be curious to see if that works with the facial recognition part as well; I don’t know anyone famous to try it on. As an aside, I went out to breakfast today and used the Short Text feature to read a place mat and a hash brown box. Maybe underwhelming, but I certainly wasn’t able to quickly point a camera at a place mat and get near-instant results two days ago. So it’s a step forward. I feel as though all the other apps we have seen have culminated in a truly useful app. I’ve also told others not to dismiss apps like AIpoly and similar; they may be inaccurate, but they prove that technology can be made to do this. It’s nice to see Microsoft put all that together to make something that really can, and I suspect will, make a huge difference.
Lynn Schneider says
I love this app. For a first version, it is amazingly useful and I can’t wait to see what Microsoft does with it in the future. What I especially like is that if I can’t get the information I need using one channel, I can switch to another. I just hope that Microsoft makes a true commitment to this app, as we have seen similar projects from other companies sort of fizzle out in the past. I do think there will also need to be a way to monetize projects like this to keep them viable from the company’s perspective, as the necessary research and development costs will have to be borne somehow. What I would love would be for all the best features of the other existing projects to be bundled into one app, but maybe it is great that so many people are trying to make apps like this, as competition only benefits us more in the end.
Vivian Cullipher says
Thank you for such a detailed review. I tried out the app this morning on random objects as I was getting ready for work. I’m sighted, and a graphic designer, so I was interested in how it interpreted images. I noticed that vertical text (on a shampoo bottle) or highly stylized text was completely missed, though it was able to read a neatly hand-written note. It didn’t recognize my husband as being present at all (perhaps due to dim room lighting), but could differentiate between my bathroom and living room. I’m not sure how that’s practical (or maybe that’s the most practical thing ever), but I found it kinda neat. The distance indicators were interesting and informative, I thought. I’m also intrigued as I consider design choices. Brand names are often highly stylized, while their generic description (“moisturizing shampoo”) may not be. Perhaps that’s where the UPC product scans might be more useful? I’d love to hear more about how each of these features is used and how useful they are in different circumstances. Great review.
Kevin Ackley says
Chris, thank you for writing an article so quickly after the launch of Seeing AI.
After a short time of experimenting with it, I have similar conclusions to yours, although your assessment is much more articulate and detailed than mine could ever be.
My biggest complaint is that it thought my 15-year-old daughter was a 22-year-old man. Needless to say, she was not happy about that. I shared your article with her tonight and I think it put her mind at ease a bit that she’s not the only person who has been incorrectly identified.
I scrolled through several photos that I already had stored on my iPhone and chose “Recognize with Seeing AI” to see how accurately they would be described. I was very impressed. I took a picture of my wife at a fireworks display this year and Seeing AI described the scene as “probably person standing in front of fireworks.” She was described as a “32 year old woman with brown hair looking happy.” She was certainly happy with that description, considering she just turned 46.
In another picture, it identified my wife as “person sitting at a table with a birthday cake”. She had a shirt on with lettering and it even identified the text on her shirt.
If this is the beginning, I can see this becoming a game-changer. I hope Microsoft continues to improve this product over the coming months/years.
Bram Duvigneau says
This app is unfortunately only available in some English-speaking regions and was not offered in my local (Dutch) App Store. However, since it’s free, it’s easy to get it from the US store by creating an account there; no payment info is required. I guess Microsoft wants to properly localize the app before rolling it out in other regions. I’ve had no problems with the English interface so far, except that distances are measured in imperial units. I’m not sure, but I guess even in some of the regions where it is officially available now, metric units would be welcome.
Microsoft already offered much of the technology backing this app in other forms. For example, the document OCR appears in Microsoft Lens, scene/object recognition was earlier demonstrated at captionbot.ai, and Microsoft has an API for developers to identify faces in photos and give them a gender/age/emotion score. What makes this app really interesting is the combination of all these different techniques in easily switchable channels, and the stuff that runs locally on the phone.
I’ve had good results with the short text channel. It’s the fastest and best method of reading texts on bottles and other odd shapes I’ve seen so far. It also did well on a few receipts I was processing while filing expenses. This seems to run locally on the phone and therefore has a quick response time. This is a world of difference comparing it to older things like Goggles, which streams video to Google for processing and is way slower and less reliable. Most of the scanned texts are in Dutch and were recognized quite well.
I haven’t been able to get the Document mode to work. The guiding of the camera, another function that runs locally on the phone, seems to do pretty well, even with odd paper shapes like receipts. However, I got a timeout after the photo had been taken; I’ll try this again later. If the OCR is OK, this is a KNFB Reader killer for many, especially given the price. Beware, though, that your photo gets sent to Microsoft’s servers for processing, which might not be desirable for sensitive documents.
The Product channel works really well so far. Even with local products, the database was able to identify some of them. I would like to see an option to add my own label if a product is not in the database. There is also a “More info” button, but it was grayed out for the products I tried. The matching of bar codes to products is done online; the code detection runs locally. I don’t have any experience with bar code readers designed for the blind, but the beeping to indicate where the code is works pretty well as guidance and is the best solution I’ve seen so far. Sometimes it’s tricky to get the code to scan even if it’s in the camera’s view; the best solution seems to be to change your camera angle a bit, but this might just need some more practice.
The face detection is pretty good. It’s nice that it gives the distance to the face. I tested the feature with two people so far and the software was able to identify their names correctly afterwards. I didn’t play that much with the option that indicates gender/age. The face detection runs locally; only gender/age detection is handled online. I suppose this means the photos you take to add people to the list of recognized faces stay on your phone, but I didn’t confirm this.
Then the Scenes channel. As Chris already mentioned, it’s not that useful in practice, but technologically it’s a pretty neat feature. I can see myself using this more on photos I receive, to get a description of them, than for a description of the scene I’m in right now. The few times I tried it, the results were quite good: it recognized a dog on a bed, some parked cars with a building in the background and a table with “various objects.” This might become more useful if the software were able to answer specific questions I might have about a scene.
Overall, I’m pretty impressed with this app and think the text and product channels will be the most useful for me in the short term. I hope Microsoft will continue development of this and, if not, at least open source the bits that run locally on the phone, because those contain part of the features that make this such a great app. (I tested in airplane mode to determine when an internet connection was required.)
Tika Ram Mahto says
Very useful for us
Marx Melencio says
I’m completely blind. I’ve been testing this rather heavily on my iPhone 6S for the past 3 days, and here’s what I think: I’m sure Microsoft will be improving this in the next few months or so. Also:
Just wanted to share a proof-of-concept prototype that I created, which I’m currently working on through a zero-equity inventor’s grant from our national government’s Department of Science & Technology for an AI software development and hardware device manufacturing project: https://www.youtube.com/watch?v=MXgW7folvps?rel=0
John J Herzog says
This app is fantastic for a 1.0 release. I downloaded Seeing AI last week and tried sorting through my unopened mail with it. I’m completely blind, and reading mail is one of my least favorite things to do because of how precise I need to be when photographing envelopes in KNFB Reader. The other alternative was opening all letters and placing the documents on a flatbed scanner, which is tedious given the high amount of junk. This app recognized text accurately on envelopes using the Short Text channel almost instantaneously. I was able to hold my phone above envelopes and hear enough feedback in most cases to know within 5 seconds whether I needed to open or discard the contents. I’ve not tried the app’s other features, but that positive time-saving experience alone makes it worth keeping on my device. I think it would be really cool if the Microsoft artificial intelligence in this app could be integrated into the Aira platform. That would enable a blind person to use glasses and AI for many tasks and augment that feedback with a call to an agent when necessary. We live in exciting times!
I’m a bit late to the commenting party here but wanted to speak my mind regarding this wonderful app. I installed it at the end of last year and tried scanning some things with mixed results. Then, just this past weekend, I scanned 2 pieces of mail that I brought up from my mailbox downstairs. One was a piece of junk mail; the other read the same way, but I eventually found out from a sighted neighbor that it was my tax form for this year. My father had texted me and my mother last week and told me to be on the lookout for a tax form, so I’m glad my neighbor helped me out. Apparently the buffer(s) in Seeing AI have to be cleared somehow between scans? Having said all that, this is a great app and I can see many uses for it. I have both this and Microsoft’s Soundscape app on my iPhone and love them.