Upstage


Technology for Digital Assetization 101

2023/11/14 | Written By: Lucy Park

(This blog is Part 2 of the Digitize Anything! series.)

As mentioned in the previous blog, digital assetization must precede digital transformation. However, the technology used will vary depending on the purpose of this assetization. So which technology should you use? In this article, we explain which technologies to adopt based on your purpose and data.



OCR: A Technology That Reads All Letters of a Document

If our organization’s data is in the form of image or document files, and we want to find and read every letter in a file, we can use OCR (Optical Character Recognition).

  • What input does OCR take? Document files such as PNG, JPG, PDF, etc.

  • What output does OCR return? Characters and character location information.

  • How does OCR work? OCR consists of two models: a detector that finds the letters in a given file, and a recognizer that deciphers which letters were found.

Detector → Recognizer

The detector expresses the position of each letter as a quad (a quadrilateral defined by 4 points), a polygon (a contour expressed as 2N points), a center point (one center point), etc. Upstage Document OCR detects text using a four-point rectangular representation, as shown in the photo above.
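To make these position formats concrete, here is a minimal Python sketch. The names `Quad` and `polygon_to_quad` are hypothetical and chosen only for illustration; they are not part of any Upstage API. The sketch shows how a four-point quad can be reduced to a center point, and how a polygon can be reduced to an enclosing axis-aligned quad.

```python
from dataclasses import dataclass
from typing import List, Tuple

Point = Tuple[float, float]  # (x, y) in image pixel coordinates


@dataclass
class Quad:
    """Four corner points: top-left, top-right, bottom-right, bottom-left."""
    points: Tuple[Point, Point, Point, Point]

    def center(self) -> Point:
        """Collapse the quad into the single center-point representation."""
        xs = [x for x, _ in self.points]
        ys = [y for _, y in self.points]
        return (sum(xs) / 4, sum(ys) / 4)


def polygon_to_quad(polygon: List[Point]) -> Quad:
    """Reduce a polygon contour to the axis-aligned quad that encloses it."""
    xs = [x for x, _ in polygon]
    ys = [y for _, y in polygon]
    left, right, top, bottom = min(xs), max(xs), min(ys), max(ys)
    return Quad(((left, top), (right, top), (right, bottom), (left, bottom)))


# Hypothetical detection result for one word.
word = Quad(((10, 20), (110, 20), (110, 50), (10, 50)))
print(word.center())  # (60.0, 35.0)
```

A quad is compact and sufficient for mostly straight text, while a polygon can follow curved text more closely at the cost of more points per detection.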

The recognizer, in turn, performs character recognition based on a predefined set of target characters. Characters outside this set are usually recognized as unknown symbols such as “�”. Currently, the target characters defined by Upstage Document OCR include:

(1) Korean
(2) English
(3) Numbers
(4) Chinese characters
(5) Special characters

We are continuously updating this list according to customer requests.
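The effect of a predefined target character set can be imitated with a short Python sketch. This is only an illustration under stated assumptions: a real recognizer is a trained model whose outputs are limited to its character set, whereas this sketch filters text after the fact. The Unicode ranges below are a simplification chosen for the example and are not the actual character list used by Upstage Document OCR.

```python
import string

UNKNOWN = "\uFFFD"  # the replacement character, rendered as "�"


def in_target_charset(ch: str) -> bool:
    """Return True if a single character belongs to the assumed target set."""
    return (
        "\uAC00" <= ch <= "\uD7A3"      # Korean (precomposed Hangul syllables)
        or ch in string.ascii_letters   # English
        or ch in string.digits          # Numbers
        or "\u4E00" <= ch <= "\u9FFF"   # Chinese characters (basic CJK block)
        or ch in string.punctuation     # Special characters (ASCII only here)
        or ch == " "
    )


def apply_charset(text: str) -> str:
    """Replace every character outside the target set with the unknown symbol."""
    return "".join(ch if in_target_charset(ch) else UNKNOWN for ch in text)


print(apply_charset("Total: 3,500원"))  # unchanged: every character is in the set
print(apply_charset("Цена 100"))        # the Cyrillic letters become unknown symbols
```

This is why checking the supported character list matters before adoption: text in an unsupported script is not merely misread, it is lost.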

Nowadays, both detectors and recognizers are developed as deep-learning-based models. Depending on the scenario in which they are used, a single end-to-end model can replace the separately developed detector and recognizer.

What Constitutes a Good OCR Model?

To adopt the best OCR model for your company, you must be able to properly evaluate models. Below are four criteria that can aid in your assessment:

  • Accuracy: For many customers, accuracy is the most important metric for an OCR model. A more precise measure to consider is the F1-score (the harmonic mean of precision and recall), and as of 2023, it is common for commercial-level models to score 95 or higher (out of 100) on typical test sets.

  • Inference Speed: Inference speed matters when results must be returned in real time. Although it depends on the number of characters in the image, an inference time of less than 2 seconds per image is appropriate. Inference speed usually trades off against accuracy, so if speed is less important, you can choose a slower model with correspondingly higher accuracy.

  • Recognition Range: The range of characters an OCR model recognizes is usually described by the script codes defined in ISO 15924 or the language codes defined in ISO 639-1. Thus, before adopting any OCR model, it is essential to check which scripts or languages your company’s use case requires. Furthermore, check whether you need to handle signatures, checkboxes, and stamps, which are not defined in Unicode.

  • Robustness: Due to the nature of AI models, accuracy scores can vary greatly depending on the test set and metric, so it is not surprising for an OCR model that scores 95 in one case to score 80 in another. If a model scores well only in specific cases and performs poorly on a company’s data, its generalization performance can be considered poor. A good OCR model maintains excellent quality across a variety of real-world data, including edge cases. Therefore, to verify whether a particular OCR model is suitable, check that it is robust enough to handle the variety of cases in your data.
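As a rough illustration of the accuracy and robustness criteria, the following Python sketch computes a word-level F1-score and reports it per test set. It rests on simplifying assumptions: words are matched as a bag of words and their locations are ignored, which real OCR benchmarks usually do not do. The function names and the sample data are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Tuple


def f1_score(predicted: List[str], reference: List[str]) -> float:
    """Word-level F1: harmonic mean of precision and recall (0.0 to 1.0)."""
    # Counter intersection keeps, per word, the smaller of the two counts.
    matched = sum((Counter(predicted) & Counter(reference)).values())
    if matched == 0:
        return 0.0
    precision = matched / len(predicted)  # share of predicted words that are right
    recall = matched / len(reference)     # share of reference words that were found
    return 2 * precision * recall / (precision + recall)


def score_per_test_set(
    test_sets: Dict[str, Tuple[List[str], List[str]]]
) -> Dict[str, float]:
    """Score each named test set separately so weak spots are not averaged away."""
    return {name: f1_score(pred, ref) for name, (pred, ref) in test_sets.items()}


# Hypothetical predictions and ground truth for two kinds of documents.
scores = score_per_test_set({
    "clean_scans": (["Egg", "Tart", "3500"], ["Egg", "Tart", "3500"]),
    "crumpled_receipts": (["Egg", "Tart", "3500", "1", "3800"],
                          ["Egg", "Tart", "3500", "1", "3500"]),
})
print(scores)
print("worst case:", min(scores, key=scores.get))
```

Reporting the worst-scoring test set alongside the average is one simple way to surface the robustness problems described above.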

What is OCR Used For?

Representative uses of OCR are as follows:

  • Image searching: OCR can be used to search images by indexing the letters they contain. This is useful for finding related images based on words or sentences that appear in a specific document. The search system matches the text a user enters against the indexed text and presents related images from the Internet or a database.

  • Manga translation: OCR can also extract the text contained in comics so that it can be translated into different languages. This makes comics easily accessible to global readers in a variety of languages.
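The image search use case can be sketched as a small inverted index in Python. This is a minimal illustration, assuming the OCR step has already produced plain text for each image; the file names and texts are hypothetical, and a production system would add tokenization, ranking, and persistent storage.

```python
from collections import defaultdict
from typing import Dict, Set


def build_index(ocr_results: Dict[str, str]) -> Dict[str, Set[str]]:
    """Map each lowercased word to the set of images whose OCR text contains it."""
    index: Dict[str, Set[str]] = defaultdict(set)
    for image_id, text in ocr_results.items():
        for word in text.lower().split():
            index[word].add(image_id)
    return index


def search(index: Dict[str, Set[str]], query: str) -> Set[str]:
    """Return the images that contain every word of the query."""
    words = query.lower().split()
    if not words:
        return set()
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return result


# Hypothetical OCR output keyed by image file name.
index = build_index({
    "receipt.png": "Egg Tart 3500",
    "menu.jpg": "Egg Sandwich 4200",
    "sign.png": "Open 24 hours",
})
print(search(index, "egg tart"))  # {'receipt.png'}
```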

Information Extraction: A Technology Evolved From OCR, Selecting Key Information From Documents

When you want to pick out key information contained in a document rather than simply reading text in its entirety, you can take one step up from OCR and use information extraction technology.

  • What input does information extraction receive? Document files such as PNG, JPG, PDF, etc., together with a list of the key information you wish to extract.

  • What output does information extraction return? The requested data as structured information.

  • How does information extraction work? As in OCR, the detector and recognizer run first. Then a parser extracts only the necessary information from all of the recognized characters.

Detector → Recognizer → Parser

The list of key information you wish to extract is called an “ontology.” If, for example, the ontology we want to extract from documents consists of a patient registration number, a treatment period, and a receipt number, we annotate the data to include these three pieces of information. The information extractor trained on this data returns the final values in the form of key-value pairs.

Example of information extraction results
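The shape of the parser step (recognized text in, key-value pairs out) can be sketched in Python as follows. Note the substitution: the article describes a parser trained on annotated data, while this sketch uses regular expressions as a stand-in purely to show the input and output. The field names, label wording, and sample text are all hypothetical.

```python
import re
from typing import Dict, Optional

# Hypothetical ontology: each key maps to a pattern that captures its value.
ONTOLOGY = {
    "patient_registration_number": r"Patient No:\s*(\S+)",
    "treatment_period": r"Treatment Period:\s*(.+)",
    "receipt_number": r"Receipt No:\s*(\S+)",
}


def parse(recognized_text: str) -> Dict[str, Optional[str]]:
    """Return one key-value pair per ontology entry (None if not found)."""
    result: Dict[str, Optional[str]] = {}
    for key, pattern in ONTOLOGY.items():
        match = re.search(pattern, recognized_text)
        result[key] = match.group(1).strip() if match else None
    return result


# Hypothetical text as it might come out of the detector and recognizer.
text = (
    "Patient No: 12345678\n"
    "Treatment Period: 2023-11-01 ~ 2023-11-07\n"
    "Receipt No: R-2023-0042\n"
    "Thank you for visiting"
)
print(parse(text))
```

Everything outside the ontology, such as the closing "Thank you for visiting" line, is simply dropped, which is the difference between information extraction and plain OCR.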

A Suitable Information Extraction Model

Below are four criteria you can use to evaluate information extraction models:

  • Accuracy, Inference Speed: As with OCR models, accuracy and inference speed are important, and there is a trade-off between the two.

  • Adaptability to Various Templates: Conventional template-based models cannot extract key information at all if the document template changes. Models developed with AI technology, however, have the advantage of being able to extract information even without a predefined template.

  • Support for Our Organization's Data Formats: Sometimes the key information in a document cannot be expressed as simple key-value pairs. For example, you may require extracted information in the form of a table with rows and columns, so check that the model can return results in the format you need.

Receipt example where [Egg Tart, 3500, 1, 3500] forms a group

  • Providing Reliability Scores: In many cases, information extraction is used to automate entering information from documents. A reliability score (also called a confidence score) is useful for deciding whether extracted information requires human verification. When a reliability score is provided, items above a certain threshold are processed automatically, while items below it are inspected by humans or go through a separate procedure. Each organization should set the threshold to suit its own needs.
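The threshold-based routing described in the last criterion can be sketched in a few lines of Python. The `ExtractedField` type, the sample values, and the threshold of 0.9 are hypothetical. Treating a score exactly equal to the threshold as automatic is also a choice made for this example; an organization may decide otherwise.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class ExtractedField:
    key: str
    value: str
    confidence: float  # assumed to be a score between 0.0 and 1.0


def route_by_confidence(
    fields: List[ExtractedField], threshold: float
) -> Tuple[List[ExtractedField], List[ExtractedField]]:
    """Split fields into (processed automatically, sent to human review)."""
    automatic = [f for f in fields if f.confidence >= threshold]
    review = [f for f in fields if f.confidence < threshold]
    return automatic, review


# Hypothetical extraction result for one document.
fields = [
    ExtractedField("receipt_number", "R-2023-0042", 0.98),
    ExtractedField("treatment_period", "2023-11-01 ~ 2023-11-07", 0.62),
    ExtractedField("patient_registration_number", "12345678", 0.90),
]
automatic, review = route_by_confidence(fields, threshold=0.9)
print([f.key for f in automatic])  # ['receipt_number', 'patient_registration_number']
print([f.key for f in review])     # ['treatment_period']
```

Raising the threshold sends more items to human review and lowers the risk of wrong values being saved; lowering it does the opposite.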

What is Information Extraction Used For?

  • Loading Various Types of Documents Into a Relational DB: Insurance companies use information extraction technology to automatically pull important data (e.g. drug name, amount, etc.) from medical bill receipts and detailed medical bill statements. If the extracted information is automatically saved in a database, it can be reused to easily generate statistics such as drug usage.

  • Personal Information Masking: Personal information such as names, resident registration numbers, and addresses can be automatically identified and masked (hidden) in a document. This helps your documents comply with privacy regulations.

  • Work automation: Logistics and shipping companies certify and track the delivery of cargo using B/L (bill of lading) documents, which contain key information such as cargo details, origin, destination, and transportation conditions. Automatically extracting this data can automate processes such as cargo management, transport route optimization, and delivery status tracking, greatly improving logistics efficiency.
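As a sketch of the relational DB use case, the following Python example loads grouped line items into SQLite and aggregates them. It reuses the receipt group [Egg Tart, 3500, 1, 3500] from earlier in this article and assumes its columns mean name, unit price, quantity, and amount. The table name, receipt IDs, and the "Americano" row are hypothetical.

```python
import sqlite3
from typing import List, Tuple

LineItem = Tuple[str, int, int, int]  # (name, unit_price, quantity, amount)


def load_line_items(
    conn: sqlite3.Connection, receipt_id: str, items: List[LineItem]
) -> None:
    """Store the line-item groups extracted from one receipt as table rows."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS line_items ("
        "receipt_id TEXT, name TEXT, unit_price INTEGER, "
        "quantity INTEGER, amount INTEGER)"
    )
    conn.executemany(
        "INSERT INTO line_items VALUES (?, ?, ?, ?, ?)",
        [(receipt_id, *item) for item in items],
    )
    conn.commit()


conn = sqlite3.connect(":memory:")  # in-memory database, for the example only
load_line_items(conn, "receipt-1", [("Egg Tart", 3500, 1, 3500),
                                    ("Americano", 4000, 2, 8000)])
load_line_items(conn, "receipt-2", [("Egg Tart", 3500, 3, 10500)])

# Once the data is in a table, statistics become a single query.
totals = conn.execute(
    "SELECT name, SUM(quantity), SUM(amount) "
    "FROM line_items GROUP BY name ORDER BY name"
).fetchall()
print(totals)  # [('Americano', 2, 8000), ('Egg Tart', 4, 14000)]
```

The same pattern applies to the insurance example: replacing the item name with a drug name would yield drug usage statistics from the same kind of query.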

Going Forward

OCR and information extraction technology go beyond simple data processing, enabling effective digital assetization as well as fundamental innovation in business processes. By automating the quick and accurate extraction of important information, they significantly improve work efficiency. These technologies, already employed in a variety of industries such as insurance, manufacturing, banking, hospitals, and retail, can be expanded to even more fields. By referring to the cases introduced in this article, organizations can discover and apply the solution best suited to their work environment, building a faster, more accurate, and more efficient work process.