Using ReportMiner to Extract Business Information from Printed Documents – Part 1: Creating the Header

August 12th, 2014

Information drives today’s businesses, and manually transferring printed and written data into electronic formats is no longer a cost-effective option.

Astera’s ReportMiner enables users to electronically extract data from printed documents so that it can be integrated into a company’s database and used by applications for operations and business intelligence. This saves significant staff time and money, minimizes the risk of poor-quality data caused by manual mistakes, and frees people up for more important work.

Build an Extraction Model in Minutes

To mine a report, you first need to create an extraction model containing the definition of the report’s structure. This can be accomplished quickly and easily within ReportMiner. If you don’t already have the software and want to follow along with this tutorial, you can download a trial here. ReportMiner supports reading flat text reports, PRN reports and PDF reports.

A report model normally has several regions, each with fields belonging to it. Examples of regions are the header, the footer, the data region, and any additional ‘append’ regions. Examples of fields within a region are CompanyName, AccountNo and Quantity. A region may have child regions located within it. A field can belong to only one region at a time, and fields cannot overlap.
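The structure described above can be pictured as a small tree of regions and fields. The following is only an illustrative Python sketch, not ReportMiner’s internal format or API: the class names Region and Field, and the line, start and length attributes, are assumptions made for the example. It shows regions holding fields and child regions, and rejects a field that overlaps an existing one in the same region.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Field:
    """A named slice of text inside a region (hypothetical representation)."""
    name: str
    line: int    # line offset inside the region, 0-based
    start: int   # first character column, 0-based
    length: int  # number of characters

    def overlaps(self, other: "Field") -> bool:
        # Two fields overlap only if they sit on the same line
        # and their column ranges intersect.
        if self.line != other.line:
            return False
        return (self.start < other.start + other.length
                and other.start < self.start + self.length)


@dataclass
class Region:
    """A region such as the header or data region; may contain child regions."""
    name: str
    fields: List[Field] = field(default_factory=list)
    children: List["Region"] = field(default_factory=list)

    def add_field(self, new_field: Field) -> None:
        # Enforce the "fields cannot overlap" rule within this region.
        for existing in self.fields:
            if existing.overlaps(new_field):
                raise ValueError(
                    f"{new_field.name} overlaps {existing.name}")
        self.fields.append(new_field)


# Example: a header with two non-overlapping fields on its first line.
header = Region("Header")
header.add_field(Field("CompanyName", line=0, start=0, length=20))
header.add_field(Field("AccountNo", line=0, start=25, length=8))
```

The sketch does not enforce that a field belongs to only one region; in a tree like this, that follows from adding each Field object to a single Region.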

To create a new report layout, go to File -> New and select Report Model. Select a sample report file in the Open dialog box. Using a sample of an actual report will allow you to ‘visualize’ the report, which shows the regions and fields, as well as their actual values from the sample.

In the screenshot above, a sample report file for Orders has been selected. The selected sample is loaded into the Report Definition Editor. If desired, a different sample file can be loaded into the Report Definition Editor later by clicking the icon on the toolbar and navigating to the file.

At the top of the sample report is general order information, such as Company Name, Order Date and Time, Customer Name, Account Number and others. Below it is the detailed order information, such as the items that make up the order.

The sample report has two logical regions, the Header and the Data Region. Unlike some other common reports, this report has no Footer.

The header is at the very top of the report, spanning three lines starting at the line with the order date.

The first step in creating the report model is to define the header for the report.

In the Report Definition Editor, select the top three lines. This is the area that covers the Header. Right-click the selection and choose one of the options shown in the context menu below:

Since the Header is being created, select Add Page Header Region.

The Report Browser on the left-hand side of ReportMiner now shows a new node, Header.

The header in the sample report always starts with a date, which appears on the very first line and in the very first character position of the header. The date can therefore be used as an identifying pattern for the header. Any time the pattern occurs in the report file, ReportMiner will treat it as the beginning of the Header.

Enter the wildcard characters denoting digits as shown below:

Notice that the Report Definition Editor now highlights the header in purple. The header spans three lines, as shown by the purple block in the editor. The height of the header or of any other region (i.e., the number of lines that the region spans) is controlled by the Line Count input below the Report toolbar.
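To make the idea of an identifying pattern plus a line count concrete, here is a rough Python sketch of the same logic. It is not how ReportMiner is implemented or scripted. The MM/DD/YYYY date format, the regular expression, the function name find_headers and the sample lines are all assumptions made for illustration; ReportMiner uses its own wildcard characters rather than regular expressions.

```python
import re
from typing import List, Pattern

# Assumed date layout (MM/DD/YYYY) anchored at the first character of a line.
DATE_AT_LINE_START = re.compile(r"^\d{2}/\d{2}/\d{4}")
LINE_COUNT = 3  # the header spans three lines in the sample report


def find_headers(lines: List[str],
                 pattern: Pattern[str] = DATE_AT_LINE_START,
                 line_count: int = LINE_COUNT) -> List[List[str]]:
    """Return every block of `line_count` lines that starts with the pattern."""
    headers = []
    i = 0
    while i < len(lines):
        if pattern.match(lines[i]):
            # The pattern marks the start of a header; the line count
            # decides how many lines belong to it.
            headers.append(lines[i:i + line_count])
            i += line_count
        else:
            i += 1
    return headers


# Made-up sample data with two orders.
sample = [
    "01/15/2014  ACME Corp",
    "Customer: Jane Doe",
    "Account: 12345",
    "  Widget      2",
    "02/20/2014  ACME Corp",
    "Customer: John Roe",
    "Account: 67890",
    "  Gadget      5",
]
headers = find_headers(sample)
```

With the sample above, headers holds two three-line blocks, one per order, and the item lines are left for the data region.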

The next step is to create fields making up the header.

There are two ways to create fields.

1. Highlight a field, right-click and select Add Field.

2. Right-click within the Header area and select Auto Create Fields.

ReportMiner will scan the sample report and identify any changing values within any occurrences of the Header. These changing values will be marked as fields.

In the example, the Auto Create Fields feature added five fields. They are now displayed in the Report Browser under the Header node. Notice that the new fields are also highlighted in darker purple in the Report Definition Editor.

The fields created this way are assigned unique names, such as FIELD_0, FIELD_1 and so on.
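The idea behind detecting changing values can be approximated in a few lines of Python. This is a deliberately naive sketch and not ReportMiner’s actual algorithm, which the article does not describe. The function name find_changing_spans and the sample strings are hypothetical, and each header occurrence is reduced to a single line for simplicity. The sketch compares the occurrences column by column and turns each run of differing columns into a field named FIELD_0, FIELD_1 and so on.

```python
from typing import List, Tuple


def find_changing_spans(occurrences: List[str]) -> List[Tuple[str, int, int]]:
    """Return (name, start, end) for each run of columns that differs
    between occurrences; `end` is exclusive."""
    width = max(len(text) for text in occurrences)
    padded = [text.ljust(width) for text in occurrences]

    # A column "changes" if the occurrences do not all share one character.
    changing = [len({row[col] for row in padded}) > 1
                for col in range(width)]

    spans = []
    start = None
    # The trailing False closes a run that reaches the end of the line.
    for col, differs in enumerate(changing + [False]):
        if differs and start is None:
            start = col
        elif not differs and start is not None:
            spans.append((f"FIELD_{len(spans)}", start, col))
            start = None
    return spans


# Two made-up occurrences of a one-line header.
occurrences = [
    "Customer: Alice   Acct: 12345",
    "Customer: Bobby   Acct: 67890",
]
spans = find_changing_spans(occurrences)
```

A purely column-based comparison like this splits a value whenever two occurrences happen to share a character in the same position, so a real tool needs a more robust strategy. The sketch only illustrates why constant labels are skipped and varying values become fields.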

Fields can be renamed if needed to make them more descriptive. The selected field is always highlighted in yellow in the Report Definition Editor. There are three ways to rename a field:

1. Select a field in the Report Browser, double-click it and enter the new name.

2. Select a field in the Report Browser, right-click it and select Rename.

3. Select a field in the Report Definition Editor, right-click it and select Rename from the context menu.

A field’s data type can also be changed if needed. In this example, ReportMiner was able to assign the fields’ data types correctly from the sample report:

This is how easy it is to create the header in a report model. Next week we’ll explore how to create the main or data region of the report.