Text matching

To identify pages by using text matching, you must first run full-page recognition. You can then search the recognition results for a string that is unique to each page type.

In the TravelDocs application, page identification runs as a chain of three functions. The first function attempts full-page recognition and searches the current page for the string Pickup. If it finds Pickup, it assigns the page type Rental_Agreement. If it does not find Pickup, the first function fails, and the second function searches for the string Flight. If the second function finds Flight, it assigns the page type Air_Ticket. If it does not find Flight, the second function fails, and the third function searches for the string Room. If the third function finds Room, it assigns the page type Room_Receipt. If it does not find Room, the page keeps the page type Other.
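The fall-through logic described above can be sketched in a few lines of Python. This is only an illustration of the decision order, not the application's actual rules or actions: the names RULES and identify_page are hypothetical, and the sketch assumes that the full-page recognition results are already available as a single string and that the search is case-sensitive.

```python
from typing import List, Tuple

# Hypothetical rule table: (search string, page type), in the order
# the TravelDocs functions are described as running.
RULES: List[Tuple[str, str]] = [
    ("Pickup", "Rental_Agreement"),
    ("Flight", "Air_Ticket"),
    ("Room", "Room_Receipt"),
]


def identify_page(recognized_text: str, default_type: str = "Other") -> str:
    """Return the page type of the first rule whose string occurs in the text.

    recognized_text is assumed to hold the full-page recognition results.
    If no rule matches, the page keeps the default page type.
    """
    for search_string, page_type in RULES:
        # Assumption: a plain, case-sensitive substring search.
        if search_string in recognized_text:
            return page_type
    return default_type
```

For example, identify_page("Pickup location: Airport") returns Rental_Agreement, and a page whose text contains none of the three strings keeps the type Other. Because the rules run in order, a page that contains both Pickup and Flight is identified as Rental_Agreement, which is why each search string should be unique to its page type.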

As with the structure-based techniques, when you identify a page by using text matching, the page is not matched to a fingerprint. Therefore, although recognition zones are available for your application to locate data during recognition, the zones are not aligned to the scanned image. After you identify a page with text-matching methods, you can customize the application to call CreateFields. This call places the recognition zones where they were defined on the original fingerprint image for that page type. Unlike fingerprint matching, it does not adjust the zone locations to compensate for shifting of the scanned image. You can work around this limitation in either of two ways: crop and deskew the image during an image-processing step, or use pattern-match anchors to align the zones.
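The idea behind anchor-based alignment can be illustrated with a small sketch. It is not the product's implementation: the Zone class and the shift_zones function are hypothetical, and the sketch assumes that an anchor's position is known both on the original fingerprint image and on the scanned image, and that the scan is shifted but not rotated or scaled.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass(frozen=True)
class Zone:
    """Hypothetical recognition zone, in pixel coordinates."""
    left: int
    top: int
    right: int
    bottom: int


def shift_zones(
    zones: Dict[str, Zone],
    anchor_on_fingerprint: Tuple[int, int],
    anchor_on_scan: Tuple[int, int],
) -> Dict[str, Zone]:
    """Move every zone by the offset between the two anchor positions.

    Assumption: the scan differs from the fingerprint image only by a
    translation, so one (dx, dy) offset applies to all zones.
    """
    dx = anchor_on_scan[0] - anchor_on_fingerprint[0]
    dy = anchor_on_scan[1] - anchor_on_fingerprint[1]
    return {
        name: Zone(z.left + dx, z.top + dy, z.right + dx, z.bottom + dy)
        for name, z in zones.items()
    }
```

If the anchor was defined at (50, 50) on the fingerprint image and is found at (62, 45) on the scan, every zone moves 12 pixels to the right and 5 pixels up. A skewed scan would still need the deskew step, because a single offset cannot correct rotation.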