Implement OCR Feature in UWP for Windows 10 and Hololens 2

I have been in the logistics and port industry for more than 3 years. I have also been asked by different business owners and managers about implementing OCR in their business solutions for more than 3 years. This is because it’s not only a challenging topic, but also a very crucial feature in their daily jobs.

For example, currently the truck drivers need to manually key in the container numbers into their systems. Sometimes, there will be human errors. Hence, they always have this question about whether there is a feature in their mobile app, for example, that can extract the container number directly from merely a photo of the container.

In 2019, I gave a talk about implementing OCR technology in the logistics industry during a tech meetup in Microsoft Tokyo. At that point of time, I demoed using Microsoft Cognitive Services. Since then many things have changed. Thus, it’s now a good time to revisit this topic.

Pen and paper is still playing an important role in the logistics industry. So, can OCR help in digitalising the industry? (Image Source: Singapore .NET Developers Community YouTube Channel)

Performing OCR Locally with Tesseract

Tesseract is an open-source OCR engine currently developed and led by Ray Smith from Google. The reason why I choose Tesseract is because there is no Internet connection needed. Hence, OCR can be done quickly without the need to upload images to the cloud to process.

In 2016, Hewlett Packard Enterprise senior developer, Yoisel Melis, created a project which enables developers to use Tesseract on Windows Store Apps. However, it’s just a POC and it has not been updated for about 5 years. Fortunately, there is also a .NET wrapper for Tesseract, done by Charles Weld, available on NuGet. With that package, we now can easily implement OCR feature in our UWP apps.

Currently, I have tried out the following two features offered by Tesseract OCR engine.

  1. Reading text from the image with confidence level returned;
  2. Getting the coordinates of the image.

The following screenshot shows that Tesseract is able to retrieve the container number out from a photo of a container.

Only the “45G1” is misread as “4561”m as highlighted by the orange rectangle. The main container number is correctly retrieved from the photo.

Generally, Tesseract is also good at recognizing multiple fonts. However, sometimes we do need to train it based on certain font to improve the accuracy of text recognition. To do so, Bogusław Zaręba has written a very detailed tutorial on how to do it, so I won’t repeat the steps here.

Tesseract can also work with multiple languages. To recognise different languages, we simply need to download the corresponding language data files for Tesseract 4 and add them to our UWP project. The following screenshot shows the Chinese text that Tesseract can extract from a screenshot of a Chinese game. The OCR engine performs better on images with lesser noise, so in this case, some Chinese words are not recognised.

Many Chinese words are still not recognised.

So, how about doing OCR with Azure Cognitive Services? Will it perform better than Tesseract?

Performing OCR on Microsoft Azure

On Azure, Computer Vision is able to analyse content in images and video. Similar as Tesseract, Azure Computer Vision can also extract printed text written in different languages and styles from images. It currently also offers free instance which allows us to have 5,000 free transactions per month. Hence, if you would like to try out the Computer Vision APIs, you can start with the free tier.

So, let’s see how well Azure OCR engine can recognise the container number shown on the container image above.

Our UWP app can run on the Hololens 2 Emulator.

As shown in the screenshot above, not only the container number, but also the text “45G1” is correctly retrieved by the Computer Vision OCR API. The only downside of the API is that we need to upload the photo to the cloud first and it will then take one to two minutes to process the image.

Computer Vision OCR also can recognise non-English words, such as Korean characters, as shown in the screenshot below. So next time we can travel the world without worry with Hololens translating the local languages to us.

With Hololens, now I can know what I’m ordering in a Korean restaurant. I want 돼지갈비 (BBQ Pork)~

Conclusion

That’s all for my small little experiment on the two OCR engines, i.e. Tesseract and Azure Computer Vision. Depends on your use cases, you can further update the engine and the UWP app above to make the app works smarter in your business.

Currently I am still having problem of using Tesseract on Hololens 2 Emulator. If you know how to solve this problem, please let me know. Thanks in advance!

I have uploaded the project source code of the UWP app to GitHub, feel free to contribute to the project.

Together, we learn better.

References

Leave a comment