ABBYY FlexiCapture 8.0 Professional Thursday, 09 October 2008
Functionality Overview
ABBYY FlexiCapture 8.0 Professional is a cost-effective, easy-to-deploy yet powerful data capture system ideal for small-to-medium organizations and departmental paper-processing tasks. It is designed as a standalone solution, with all the processing operations – from the import of documents to the export of data – executed on one workstation. However, FlexiCapture 8.0 Professional is a scalable desktop system: when it is used in a local network its data capture/document processing Projects can be shared by several operators in a workgroup to increase the overall efficiency and processing speed.
The data capture/document processing procedure consists of two functional parts: the Set Up Stage and the Processing Stage, which are usually executed by different users – an administrator and operators. ABBYY FlexiCapture 8.0 offers two dedicated user modes:
- Administrator mode is intended for setting up the whole capture process, preparing document templates and testing the preparation results. This mode is used for the Set Up Stage and provides full access to all product functionality, including processing options, test batches and design tools.
- Operator mode is intended for the capture/processing operations only, such as the import/scanning of documents, recognition, verification of recognition results and export of data. It offers a simple, intuitive interface suited to everyday work with the program.
Set Up Stage
This stage includes setting up the system, defining processing rules and preparing document/form templates. The preparation is usually carried out by the administrator, who creates templates for all types of documents and forms to be processed within one processing Project. For the set-up stage, FlexiCapture provides easy-to-use yet powerful design and template creation tools:
Form Designer
A design module for creating and printing blank fixed forms intended for completion by hand. Form Designer can create a wide range of fixed forms: black-and-white or color; linear, raster or dropout; single- or multi-page. The created forms meet machine-readability requirements and provide superior recognition quality when they are processed in FlexiCapture 8.0. The Document Templates required for processing the forms are created automatically from these blanks.
FlexiLayout Studio
A powerful design tool for creating FlexiLayouts, i.e. flexible descriptions of semi-structured documents (documents with similar types of data but variable layout). A FlexiLayout allows the system to locate specific data such as text strings, barcodes, dates, currency amounts, numbers, separators and tables in semi-structured documents.
For example, a single FlexiLayout is enough to find all required fields on invoices coming from different suppliers and with any layout variations, even for multi-page documents. FlexiLayout Studio provides convenient tools for testing and adjusting FlexiLayouts on a set of document images: a hypotheses tree, reference layouts and the FlexiLayout language.
Document Template Editor
A module for equipping Document Templates for any type of document with processing settings and rules: how to identify a document type in a stream of mixed documents and pages, and what to do with documents of that type during processing. With the Document Template Editor, an administrator can specify which data should be extracted and how the data should be located, recognized, checked, verified and exported.
Project Creation
At the end of the preparation stage, administrators create Projects: complete sets of Document Templates and processing settings (such as import and export profiles) that define how documents will be processed by operators. Administrators can also define project settings, such as which document template, library database and import criteria are to be used during processing.
Processing Stage
Once set-up and preparation have been completed by the administrator, the recognition and processing of forms and documents are automated and carried out by the operator. Key processing operations include:
Document Import
Paper documents can be imported into the program from any scanning device (scanner or MFP) that supports the TWAIN or ISIS scanning protocol. Document images can be added manually from a folder or imported automatically by using Hot Folder settings.
The supported input formats of images include PDF, BMP, PCX, PNG, JPEG, JPEG 2000, TIFF, DjVu and DCX.
Import operations can be simplified and automated by using a set of Import Profiles with pre-defined settings.
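The idea of a Hot Folder can be illustrated with a short sketch. The Python code below is not part of FlexiCapture and does not use any ABBYY API; it only shows, under stated assumptions, how a watched folder might be scanned for new images in the input formats listed above. The function name `find_new_images`, the `already_seen` set and the mapping from format names to file extensions are all hypothetical.

```python
from pathlib import Path

# File extensions for the supported input formats. The mapping from
# format name to extension is an assumption made for this sketch.
SUPPORTED_EXTENSIONS = {
    ".pdf", ".bmp", ".pcx", ".png", ".jpg", ".jpeg",
    ".jp2", ".tif", ".tiff", ".djvu", ".dcx",
}


def find_new_images(hot_folder, already_seen):
    """Return supported image files in hot_folder that were not seen before.

    already_seen is a set of file names; it is updated in place so that a
    later call skips files returned by an earlier one.
    """
    new_files = []
    for path in sorted(Path(hot_folder).iterdir()):
        if not path.is_file():
            continue  # skip subfolders
        if path.suffix.lower() not in SUPPORTED_EXTENSIONS:
            continue  # skip files that are not in a supported format
        if path.name in already_seen:
            continue  # skip files imported on a previous scan
        new_files.append(path)
        already_seen.add(path.name)
    return new_files
```

Calling such a function periodically with the same `already_seen` set would pick up each new image once, which is the behavior an automatic import from a folder implies.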
Document Classification and Recognition
The recognition stage includes image pre-processing, document classification, data and text extraction, and automated validation of data. All of these operations are executed automatically and can be run as a background process.
- Image Pre-processing
Imported document images can be pre-processed to provide the highest possible recognition quality. Image pre-processing is a series of operations that correct page orientation, invert images, and remove skew and noise.
- Automatic Document Classification
ABBYY FlexiCapture matches existing Document Templates against imported images and assembles the incoming mix of separate pages into the appropriate documents. Intelligent document classification allows the program to identify various document types in the incoming stream by using ABBYY’s intelligent document recognition (IDR) and award-winning FlexiCapture technologies. ABBYY FlexiCapture automatically classifies documents with variable layouts of any complexity, including multi-page documents with a variable number of pages, multi-page tables, and documents with image or text attachments.
- Data and Text Extraction
Once Document Templates are assigned and appropriate data fields are located, the data from the fields are automatically extracted by using accurate ABBYY multi-language recognition technologies. Unstructured documents are recognized by using full-text recognition to get searchable PDF files.
- Automatic Validation of Data
All validation rules set up by the administrator during the template design stage are applied automatically during recognition. The most commonly used types of rules include checking format, checking data against a database, checking sums, replacing values from a list, and normalizing dates and prices. Many data types have associated dictionaries of permitted words for validation purposes. Custom rules written in scripting languages can be applied in addition to the set of pre-defined rules.
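Three of the rule types named above (format check, sum check and date normalization) can be sketched as follows. This is an illustration in Python only; it does not reproduce FlexiCapture's built-in rules or its scripting interface, and the function names, the invoice-number pattern and the list of accepted date formats are assumptions chosen for the example.

```python
import re
from datetime import datetime
from decimal import Decimal


def check_format(value, pattern):
    """Format check: the whole value must match the regular expression."""
    return re.fullmatch(pattern, value) is not None


def check_sum(line_amounts, total):
    """Sum check: the line amounts must add up exactly to the stated total.

    Amounts are passed as strings and converted to Decimal to avoid
    floating-point rounding errors.
    """
    return sum(Decimal(a) for a in line_amounts) == Decimal(total)


def normalize_date(value, input_formats=("%d.%m.%Y", "%d/%m/%Y", "%Y-%m-%d")):
    """Date normalization: return YYYY-MM-DD, or None if no format matches."""
    for fmt in input_formats:
        try:
            return datetime.strptime(value, fmt).date().isoformat()
        except ValueError:
            continue  # try the next accepted input format
    return None


# Hypothetical field values, as they might come out of recognition.
print(check_format("INV-00123", r"INV-\d{5}"))
print(check_sum(["10.10", "20.20"], "30.30"))
print(normalize_date("09.10.2008"))
```

A field that fails such a check would typically be flagged for the operator rather than rejected outright, which matches the verification step that follows recognition.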
Data Verification
Once recognition is over, each character is assigned a status that reflects the degree of certainty with which it has been recognized: reliably recognized, uncertain, or unrecognized. In the latter two cases, the operator can accept or correct the character. This stage requires more manual involvement than any other.
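The three character statuses can be modeled with a small sketch. The Python code below is a hypothetical illustration, not FlexiCapture's internal logic: the numeric confidence score, the threshold value and all names are assumptions, since the actual criteria the product uses are not described here.

```python
from enum import Enum


class CharStatus(Enum):
    RELIABLE = "reliable"
    UNCERTAIN = "uncertain"
    UNRECOGNIZED = "unrecognized"


# Hypothetical threshold; not a documented FlexiCapture setting.
RELIABLE_THRESHOLD = 0.9


def classify_character(char, confidence):
    """Map one recognition result to a status.

    char is None when the engine produced no candidate character at all.
    """
    if char is None:
        return CharStatus.UNRECOGNIZED
    if confidence >= RELIABLE_THRESHOLD:
        return CharStatus.RELIABLE
    return CharStatus.UNCERTAIN


def needs_verification(results):
    """Return the positions of characters an operator should review."""
    return [
        index
        for index, (char, confidence) in enumerate(results)
        if classify_character(char, confidence) is not CharStatus.RELIABLE
    ]
```

With this model, only the uncertain and unrecognized positions are routed to the operator, so the manual effort scales with the number of doubtful characters rather than with the size of the document.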
To speed up and simplify the verification process, ABBYY FlexiCapture 8.0 offers an easy-to-use verification interface. There are three different verification modes: group verification (for checkmarks and digits mostly); field verification (for text field contents); and verification in the Document window.
The verification process also includes correction of assembling rules by using the Thumbnail view, correction of validation rules, and manual indexing by using KFI.
[Screenshots: group verification, field verification, and verification by documents]
Data Export/File Saving and Document Archiving
Ultimately, the extracted data can be exported to:
- files
- Microsoft SharePoint
- external databases (via ODBC)
- any business applications or document management systems by using custom modules
The data can be exported as is or together with image and text attachments. Documents can be saved as searchable PDF or PDF/A files for archiving purposes.
ABBYY FlexiCapture 8.0 supports a wide range of formats to save data (TXT, XLS, DBF, CSV, and XML) and images (TIFF, JPEG, JPEG 2000, PDF, PDF/A, PCX, BMP, PNG, and DCX).
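To make the data formats concrete, the sketch below writes the same extracted records as CSV and as XML using only the Python standard library. It does not reproduce FlexiCapture's actual export layout; the field names, the XML element names and the function names are assumptions chosen for the example.

```python
import csv
import io
import xml.etree.ElementTree as ET


def export_csv(records, field_names):
    """Serialize a list of dict records to CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=field_names, lineterminator="\n")
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()


def export_xml(records, root_tag="documents", record_tag="document"):
    """Serialize records to XML text, one child element per field.

    Field names are used as element names, so they must be valid XML names.
    """
    root = ET.Element(root_tag)
    for record in records:
        node = ET.SubElement(root, record_tag)
        for name, value in record.items():
            ET.SubElement(node, name).text = str(value)
    return ET.tostring(root, encoding="unicode")


# Hypothetical extracted data for one invoice.
records = [{"invoice_number": "INV-00123", "total": "30.30"}]
print(export_csv(records, ["invoice_number", "total"]))
print(export_xml(records))
```

CSV suits flat, tabular field sets, while XML can carry nested structures such as invoice line items if the records are extended accordingly.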