Form Recognizer is a Microsoft Azure web service that can read documents and extract information from them. While it comes with browser-based design tools to set up and test the extraction, it is ultimately just a web service that you can send images to. It returns the data in the form of a JSON string.
If you are not a developer, you may already be tuning out here. JSON? My point is that integrating Microsoft Form Recognizer into your business ultimately requires a developer. Someone needs to write code that sends images to Microsoft, makes sense of the returned data, and integrates it with your business systems. If you want a human operator to review the data, or want to apply business rules to validate the data before it goes into your systems, you need even more custom-developed software, or developers who integrate it with review software you already have.
Bottom line: While easy to configure, the Form Recognizer web service ultimately requires software programming to make use of.
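To give a feel for what that programming involves, here is a minimal sketch in Python of submitting one image and waiting for the JSON result. The URL path, the api-version value, and the header and status names are assumptions based on the v3.0 REST API and should be checked against Microsoft's current documentation; the function names are made up for illustration.

```python
import json
import time
import urllib.request

# Assumption: path and api-version follow the v3.0 REST API; verify both
# against the current Microsoft documentation before relying on them.
API_VERSION = "2022-08-31"


def build_analyze_request(endpoint, key, model_id, image_bytes):
    """Build (but do not send) the POST request that submits one document."""
    url = (f"{endpoint.rstrip('/')}/formrecognizer/documentModels/"
           f"{model_id}:analyze?api-version={API_VERSION}")
    headers = {
        "Ocp-Apim-Subscription-Key": key,
        "Content-Type": "application/octet-stream",
    }
    return urllib.request.Request(url, data=image_bytes, headers=headers, method="POST")


def analyze(endpoint, key, model_id, image_bytes, poll_seconds=2.0):
    """Submit a document, then poll until the service reports a final status."""
    request = build_analyze_request(endpoint, key, model_id, image_bytes)
    with urllib.request.urlopen(request) as resp:
        # Assumption: the service answers with the URL to poll in this header.
        result_url = resp.headers["Operation-Location"]
    poll = urllib.request.Request(result_url, headers={"Ocp-Apim-Subscription-Key": key})
    while True:
        with urllib.request.urlopen(poll) as resp:
            body = json.loads(resp.read())
        if body.get("status") in ("succeeded", "failed"):
            return body
        time.sleep(poll_seconds)
```

Even this small sketch leaves out error handling, retries, and everything that happens after the data comes back, which is where most of the integration effort goes.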
Form Recognizer has 4 levels of abilities:
Raw data – If you send any kind of document as an image, Form Recognizer will perform OCR on it. It then returns the OCR data as structured items such as words and lines. It also finds tables and checkboxes. You can analyze this raw data further in your own systems (for full-text search, for example).
Pre-built document types – Out of the box, Form Recognizer can recognize common business documents such as invoices, ID documents, receipts, and business cards. It returns predefined and known data for such document types. For invoices, it looks for invoice numbers, customer data, line items, totals, etc. It doesn’t matter how the vendor designs the invoice or what the vendor calls the data. Whether it is “Invoice Number”, “InvNo”, or “InvoiceNo”, Form Recognizer identifies it as “Invoice Number”.
Generic document types – Form Recognizer can also pull data from unknown document types, such as your own specific forms. It can find such things as names, dates, and amounts. However, since it doesn’t know the meaning of these elements, it labels them with whatever is next to them. If you were to use this approach on an invoice, it would not know that “Invoice No” or “Involve Nurnbar” (a typical mistake OCR makes) are actually the keywords used for the Invoice Number field. It would just call the field whatever it thinks is printed next to it. This requires post-processing to understand the randomly labeled data. The technology is also limited to data that has a clear keyword either to the left of it or above it. However, the configuration is simple: There is none. You just send it a document and it returns the “data”.
Custom document types – Custom document types are a better approach than generic document types. They are not configuration-free. Instead, you have to train Form Recognizer to pull known data from your custom document type. This requires 5 samples of the document type, and on each of them, you need to label the fields you want it to extract. That’s still easy and quick to do and does not require a programmer or data scientist. The advantage is that it can then extract an “Invoice Number” or whatever field you teach it to extract, regardless of the field’s name or whether the OCR reads the label correctly. Of course, the more variable the document type is in terms of layout and text, the more samples it will need to learn.
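The post-processing that the generic document types call for could look roughly like the following Python sketch, which maps whatever label was found next to a value onto a known field name by fuzzy matching. The alias table, the function name, and the cutoff value are all hypothetical and only serve as an illustration; this is not something Form Recognizer provides.

```python
import difflib
import re

# Hypothetical alias table: canonical field name -> label spellings seen on documents.
ALIASES = {
    "Invoice Number": ["invoice number", "invoice no", "invno", "invoiceno"],
    "Total": ["total", "amount due"],
}


def normalize_label(raw_label, cutoff=0.6):
    """Map a raw key label to a canonical field name, or None if nothing is close."""
    # Lowercase and drop punctuation so "Invoice No." and "invoice no" compare equal.
    cleaned = re.sub(r"[^a-z0-9 ]", "", raw_label.lower()).strip()
    lookup = {alias: field for field, aliases in ALIASES.items() for alias in aliases}
    # Pick the single closest alias whose similarity is at least the cutoff.
    match = difflib.get_close_matches(cleaned, list(lookup), n=1, cutoff=cutoff)
    return lookup[match[0]] if match else None


print(normalize_label("Invoice No."))
print(normalize_label("Ship To"))
```

An OCR misread such as “Involve Nurnbar” only matches if the cutoff is loose enough, and a loose cutoff invites false matches, so the threshold needs tuning against real documents. That tuning effort is exactly what a trained custom document type spares you.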
In v3.0 Microsoft added a browser-based design application called Studio. This tool lets you test and configure the above-mentioned abilities. You can use pre-loaded sample documents or upload your own. It shows you the results, visualizes them in the image, and shows the corresponding JSON output. For the developers, it also shows the sample code they need to write to call the service. This tool is currently in preview but already publicly available. It will be useful to design the service faster, especially for the custom document types, as that is the tool you will be using for labeling your samples and teaching the system where your data is located on them. It looks like this will replace the v2.0 data labeling tool in the future.
Several aspects limit Form Recognizer:
OCR supports a lot of languages, which Microsoft lists in its documentation: 7 languages for handwritten text and what looks like 100+ languages for machine-printed text.
Receipts and Business cards only support English-speaking countries like the USA, Canada, the UK, and others.
Form Recognizer only supports US Invoices.
With regard to ID documents, only US driver licenses and the biographical page from international passports are supported.
Generic document types support only the English language.
As input, Form Recognizer supports JPEG, PNG, BMP, TIFF, and PDF. Multi-page document formats such as PDF and TIFF can have a maximum of 2000 pages.
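A developer might check the input limits before sending anything to the service. The following Python sketch does that for the file type and the page count; the mapping of formats to file extensions and the function name are assumptions for illustration, and the page count has to come from whatever tool you use to read the file.

```python
from pathlib import Path

# Assumption: these extensions correspond to JPEG, PNG, BMP, TIFF, and PDF.
SUPPORTED_EXTENSIONS = {".jpg", ".jpeg", ".png", ".bmp", ".tif", ".tiff", ".pdf"}
MAX_PAGES = 2000


def check_input(filename, page_count=1):
    """Return a list of problems; an empty list means the file passes these checks."""
    problems = []
    if Path(filename).suffix.lower() not in SUPPORTED_EXTENSIONS:
        problems.append("unsupported file type")
    if page_count > MAX_PAGES:
        problems.append("too many pages")
    return problems


print(check_input("scan.PNG"))
print(check_input("archive.pdf", page_count=2001))
```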
For custom documents and invoices Microsoft charges 50 USD per 1000 pages.
Generic documents, business cards, IDs, and receipts cost you 10 USD per 1000 pages.
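To see what these prices mean for a given volume, here is a small Python sketch of the arithmetic. It assumes plain linear pricing with no free allowance or volume tiers, uses the prices quoted above (which may have changed since), and the document kind names are made up for illustration.

```python
# Prices in USD per 1000 pages, as quoted above.
PRICE_PER_1000 = {
    "custom": 50,
    "invoice": 50,
    "generic": 10,
    "business_card": 10,
    "id": 10,
    "receipt": 10,
}


def estimate_cost(document_kind, pages):
    """Estimate the cost in USD, assuming simple linear pricing."""
    return pages * PRICE_PER_1000[document_kind] / 1000


print(estimate_cost("invoice", 2500))
print(estimate_cost("receipt", 2500))
```

At 2,500 pages, that works out to 125 USD for invoices versus 25 USD for receipts.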