EXAMPLES
Recognition errors
This is a sample text that was scanned and recognized by a leading OCR
program. The poor quality of the original document and an unusual font
resulted in many errors.
Did one of the beft
of Kings, furrounded
by his guarhs, aud
in his capifal citv?
And wiIl not
plots be fometimes
formed againft
them that are nOt
to be refittcd
by thelr couragc,
nor ayoided by thCir
wifdom? When mlsfortunes
happcn that reflCct
upon the honOr
of a nation, the weight almoft
singty lies upOn
the Prince 4, if thev
are fuch as hurt
a whole cOUntry
the point or intereft,
as fOr example,
by fire, pIaque,
or famlne, he bCars
share, as having the largeft
share in propCrty.
The adverfities
of the commonwcalth
affecl him in ohief,
and he has - but his bare prOPOrtiOn
in its propertIes:
add to this, that he can.
This text contains 116 words, 39 of which have recognition errors. It is practically incomprehensible. AfterScan processes this text in 5
seconds and produces the following output:
Did one of the best
of Kings, surrounded
by his guarhs,
and in
his capital city?
And will not
plots be sometimes
formed against
them that are not
to be refitted
by their courage,
nor avoided by
their wisdom?
When misfortunes
happen that reflect
upon the honor
of a nation, the weight almost
singly lies upon
the Prince 4, if they
are such as
hurt a whole country
the point or interest,
as for
example, by fire, plaque,
or famine, he bears
share, as having the largest
share in property.
The adversities
of the commonwealth
affect him in chief,
and he has — but
his bare proportion
in its properties:
add to this, that he can.
Only one error was not corrected. It will be
shown in the Journal of Modifications along with spell-checker suggestions
(see screenshots). It would have taken at least 4 minutes to fix this text
manually. Imagine proofreading a few megabytes of text. Now you can fire your
staff of proofreaders and have it done in a few hours instead of a few weeks.
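To give a feel for how this kind of correction can work, here is a minimal Python sketch of dictionary-based repair of OCR character confusions. It is not AfterScan's actual algorithm: the confusion table, the function names (candidates, correct) and the tiny dictionary are all assumptions made up for this illustration.

```python
from itertools import product

# Hypothetical confusion table (an assumption, not AfterScan's rules):
# each recognized character maps to the characters it may really stand for.
CONFUSIONS = {
    "f": "fs",  # long s misread as f
    "c": "ce",
    "I": "Il",
    "l": "li",
    "v": "vy",
    "y": "yv",
    "u": "un",
}

def candidates(word):
    """Yield every spelling reachable by swapping confusable characters."""
    options = [CONFUSIONS.get(ch, ch) for ch in word]
    for combo in product(*options):
        yield "".join(combo)

def correct(word, dictionary):
    """Return the corrected word, or the word unchanged if unsure."""
    if word.lower() in dictionary:
        return word
    matches = [c for c in candidates(word) if c.lower() in dictionary]
    # Replace only when exactly one dictionary word fits; otherwise
    # leave the word alone so it can be flagged for manual review.
    return matches[0] if len(matches) == 1 else word

# Tiny stand-in dictionary for the demonstration.
WORDS = {"best", "surrounded", "and", "city", "will", "courage", "avoided"}

for w in ["beft", "furrounded", "aud", "citv", "wiIl", "guarhs"]:
    print(w, "->", correct(w, WORDS))
```

With this table, "guarhs" stays unchanged because no substitution turns it into a dictionary word, which is the sort of case that would be left for manual review. The number of candidates doubles with every confusable character, so a real tool would need a smarter search than this brute-force one.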
Typing errors
Manual input (typing) errors are different from
OCR errors and require different correction rules and
algorithms.
Reformatting
In the old days, text editors used spaces for
indentation and justification of the text. Sometimes you can come across a
text that looks like this. It
looks fine in proportional font, but if you try to resize that window you
will see that the text is space-justified (double and triple spaces
between words), there are hard breaks after each line, and the first line is
indented with spaces. Also, the word "gentleman" is hyphenated
(carried over to the next line).
You cannot do much with this kind of text in a modern editor with
"floating" justification unless you remove all hard line breaks
and eliminate extra spaces and hyphens. AfterScan has the ability to do
just that. The reformatting function will produce the following
text. Try to resize that window and you will see the difference.
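The three cleanup steps described above (removing hard line breaks, eliminating extra spaces, and rejoining hyphenated words) can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not AfterScan's implementation: the function name reflow is hypothetical, and the sketch assumes that paragraphs are separated by blank lines.

```python
import re

def reflow(text):
    """Turn space-justified, hard-wrapped text into free-flowing paragraphs.

    Assumption: paragraphs are separated by one or more blank lines.
    """
    paragraphs = re.split(r"\n\s*\n", text.strip())
    result = []
    for para in paragraphs:
        # Rejoin a word carried over to the next line: "gentle-\nman" -> "gentleman".
        para = re.sub(r"(\w)-\s*\n\s*(\w)", r"\1\2", para)
        # Collapse hard breaks, indentation and justification spaces
        # into single spaces.
        para = re.sub(r"\s+", " ", para).strip()
        result.append(para)
    return "\n\n".join(result)

sample = (
    "    The  old   gentle-\n"
    "man  walked   slowly\n"
    "down  the   road.\n"
    "\n"
    "    He   stopped."
)
print(reflow(sample))
```

One limitation of this simple approach: a genuinely hyphenated compound that happens to break at the end of a line (for example "well-" followed by "known") is joined without its hyphen, so a careful tool would check the joined word against a dictionary first.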