FEB 6 1990
Consumer Report
Desktop Scanners
Report #5
January 1990
EPA 220/1990.16
PC Technology Assessment Program
EPA National Data Processing Division
information Centers Branch - RICII, MD-35
Research Triangle Park, NC 27711
Telephone: (919) 541-0568 (FTS) 629-0568
PC TAP CONSUMER REPORTS
From the Editor's Desk
Our study of desktop scanners is highlighted in this PC TAP Consumer Report. Like our project to
look at ways to perform graphics file transfers, this study grew along the way. Originally the objective
was to assess just the DEST scanner that was available on the EPA PC contract. As you will see in
the following pages, before the project was completed information was included about thirteen scanners.
We also looked at a number of scanning front-end options, including both software-only and
hardware/software-combination products for both the IBM PC and Apple Macintosh environments.
One of the reasons why our study kept expanding is the tremendous amount of attention scanning is
currently getting in the industry. It's not so much that scanning technology is new, but the really good
systems have been so expensive that they were out of reach for the typical desktop application. In
the past couple of years, however, user demand for good scanning equipment has intensified the
competition to provide such a capability while also bringing prices down to a more realistic range.
Sounds like a familiar scenario, doesn't it? In any case, throughout the study more products kept
coming to our attention, and we felt obliged to investigate as many as possible.
A second reason for the growth of our scanner study was that we at PC TAP just got so into it!
Scanning is a fascinating topic, and the more we dug into it the more it grabbed us. Also, as we
talked with other folks about scanning we learned about more users who have scanners, and everyone
was anxious to have their device represented in our report.
We think our scanning study grew for good reasons, and that our report is better for the increased
information it contains and the greater number of products it covers as a result of that growth.
Certainly the input provided by the various participants resulted in a more comprehensive report than
would otherwise have been possible. It has been an interesting report to research and write, and we
hope you enjoy it. Due to the length of the scanner report, Open Forum does not appear in this issue.
David A. Taylor
PC TAP Coordinator
DESKTOP SCANNING
Introduction
Although this study was interesting for the PC TAP staff, it has also been somewhat frustrating. The
frustration comes from the fact that scanning technology is getting so much attention in the industry
and is changing so fast that it's hard to keep up. The more you learn about the process and about
available products, the more you suspect there is that you haven't uncovered yet. New products keep
cropping up everywhere, and at least one of those we're reporting on has announced an upgrade to
the version we tested. But we're discovering that this is all part of the technology assessment
business: playing catch-up with the industry.
Scanning: What's It All About?
As happens when you dig into most aspects of technology application, your vocabulary must be
enhanced before you can explore the world of desktop scanning. It doesn't take long to find out that
scanning is what a scanner does. And a scanner is a device that scans. Sound like technospeak
doubletalk? It really isn't; it's just that the scanning process itself isn't all that complicated. A camera
provides a good analogy: you use a camera to take a picture. Everyone can understand that process
and what results from it. Well, a scanner takes a picture, too. But what happens after the scanner
captures the picture can get involved.
You point a camera and snap the shutter to capture photographic images of your choice on a roll of
film. And when you've exposed an entire roll of film, you take it to a photo processor to have it
developed. The result is a group of photographs. Scanners capture images too, but the camera
analogy breaks down immediately after the capture takes place. That's because the scanner is a
computer peripheral, while the camera is a stand-alone device. So rather than immediately recording
the image (like the camera does on the film), the scanner simply passes it along to the host computer
for further action. From that point on, the scanner is out of the loop and the material you've scanned
is in your computer's memory waiting for you to do something with it.
We don't mean to imply, however, that capturing the image in the first place is insignificant. The wide
range of capabilities and prices represented in the scanner marketplace gives some insight into the
potential sophistication of these devices. While a basic desktop scanner (which may or may not be
shipped with some front-end software) can be purchased for as little as $1,000, a realistic cost estimate
to equip yourself to scan text and graphics is roughly three times that, or about $3,000, assuming you
already have the computer to drive it all. One source lumps these document-scanning systems into
the 'low-end' category of systems that "are widely used for desktop publishing and typically sell for less than
$5,000" ("Scanner Application Primer," Information Center, August 1989, p. 12).
'Mid-range' systems generally are more powerful, more sophisticated versions of the low-end systems.
They offer faster processing and heavier-duty equipment for a wider range of office applications, and
can cost from $5,000 to $30,000. High-end scanners are designed for round-the-clock production use.
Such systems can scan, enhance, compress, and capture images at a rate of about one per second,
and they can accommodate a variety of physical document types. High-end systems cost
approximately $100,000. Then there's a 'super-high-end' category, in the $250,000 ballpark, that we
won't even go into.
If you're shopping for a system in the mid-, high-, or super-high-end category, don't waste your time
reading further. This report is confined to the 'low-end' category of scanning equipment. 'Low-end'
in this case doesn't mean inferior; it just signifies that the equipment in this group isn't as
sophisticated or as powerful as the more expensive gear in the higher categories. Low-end scanning
equipment is well suited for office use and desktop publishing, where a very high percentage of
scanning applications are found.
Text versus Images
We said earlier that the scanning device is out of the loop after the image has been captured. What
then? Like so many things in the world today, it depends; and what it depends upon is the type of
material you're processing. In the world of desktop scanning, you scan one of two things: text or
images. We'll get into all the nuances of each of these processes later, but in general terms it all really
boils down to whether you're dealing with words or pictures. (Of course, no techie worth his or her salt
would ever stoop to using such mundane terms.)
Word Processing
Let's talk about the processing of words (or, more properly, scanning text) first. This is a much more
complex application than is apparent at first glance. In scanning parlance, the process of transforming
a page of typed or printed text into a machine-readable form is called optical character recognition
(OCR). Obviously, software is required to perform this process, and scanners often, but not always,
are sold without such software. So, in addition to the cost of a scanner, you might have to buy an
OCR package if you want to scan text. Basic OCR software is programmed to recognize certain
character sets. The more capable a given package is in this regard, the more expensive it tends to
be. In practice, the scanned page is held in memory while every single character is compared with
those the software 'knows' (this process is called matrix matching) to build a file containing ASCII text
or, if your software has the capability, a file in the format of a word processing package. Matrix matching
is suitable for recognizing text produced on typewriters, line printers, letter-quality printers, and
(ostensibly) dot matrix printers.
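To make matrix matching a little more concrete, here is a minimal Python sketch. It assumes a tiny, hypothetical set of 5 x 5 character templates and a glyph that has already been cut out of the scanned page; the packages reviewed in this report use far larger templates and their own scoring rules, so the names TEMPLATES and match_glyph and the 0.9 threshold are illustrative only. Each template is compared dot by dot with the scanned glyph, and the best-scoring character wins unless its score falls below the threshold, in which case the character is flagged as unrecognized (the '~' marker is an assumption borrowed from OmniPage's 'suspect' flag, discussed later).

```python
# Hypothetical 5 x 5 templates: '#' = dot on (black), '.' = dot off (white).
TEMPLATES = {
    "I": ["..#..", "..#..", "..#..", "..#..", "..#.."],
    "L": ["#....", "#....", "#....", "#....", "#####"],
    "T": ["#####", "..#..", "..#..", "..#..", "..#.."],
    "O": [".###.", "#...#", "#...#", "#...#", ".###."],
}


def agreement(glyph: list[str], template: list[str]) -> float:
    """Fraction of dot positions where the scanned glyph and the template agree."""
    cells = [(g, t) for g_row, t_row in zip(glyph, template) for g, t in zip(g_row, t_row)]
    return sum(g == t for g, t in cells) / len(cells)


def match_glyph(glyph: list[str], threshold: float = 0.9, reject: str = "~") -> str:
    """Return the best-matching character, or `reject` if no template is close enough."""
    best_char, best_score = max(
        ((ch, agreement(glyph, tmpl)) for ch, tmpl in TEMPLATES.items()),
        key=lambda pair: pair[1],
    )
    return best_char if best_score >= threshold else reject


# A scanned 'L' with one stray dot still matches; a blank cell is rejected.
noisy_L = ["#....", "#....", "#..#.", "#....", "#####"]
print(match_glyph(noisy_L))         # L
print(match_glyph(["....."] * 5))   # ~
```

The same sketch hints at why matrix matching struggles with typeset material: a 'T' in a different typeface or point size no longer lines up dot for dot with its stored template.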
A step up from matrix matching technology is required when you want to scan typeset material like
books, magazines, and other professionally printed materials that usually contain a number of different
type styles and sizes. Tackling the problem of character recognition in this environment requires more
powerful software with more sophisticated capabilities. Using a process called feature extraction, which
is based on the principle that each character has distinctive physical characteristics, such software
packages examine the features of each scanned symbol and generate the appropriate character.
Sometimes this is referred to as 'ICR' (for intelligent character recognition), as opposed to the more
limited OCR process. Some of the more powerful text scanning packages include the capability to
output scanned files in the formats of various word processing packages, even to the extent of inserting
the word processor's own commands for things like italics, underscoring, bolding, centering, and
tabbing. Some also preserve multiple columns, or you may be offered the option of retaining or
ignoring the columnar format of source documents.
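Feature extraction can be illustrated the same way. The Python sketch below is not how any product in this report works (their methods are proprietary); it simply measures two size-independent features of a glyph, the number of enclosed holes and the width-to-height ratio of its inked area, and picks the nearest of a few hypothetical prototypes (PROTOTYPES, classify, and the weighting are assumptions for the example). Because neither feature depends on point size, a small and a large 'O' get the same answer, something a fixed-size template cannot manage.

```python
from collections import deque

# Hypothetical prototypes: (number of enclosed holes, width / height of the inked area).
PROTOTYPES = {"I": (0, 0.2), "O": (1, 0.8), "B": (2, 0.8)}


def count_holes(glyph: list[str]) -> int:
    """Count background ('.') regions that do not touch the glyph's border."""
    h, w = len(glyph), len(glyph[0])
    seen = [[False] * w for _ in range(h)]
    holes = 0
    for r0 in range(h):
        for c0 in range(w):
            if glyph[r0][c0] != "." or seen[r0][c0]:
                continue
            # Breadth-first flood fill of one connected background region.
            seen[r0][c0] = True
            queue, touches_border = deque([(r0, c0)]), False
            while queue:
                r, c = queue.popleft()
                if r in (0, h - 1) or c in (0, w - 1):
                    touches_border = True
                for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
                    if 0 <= nr < h and 0 <= nc < w and not seen[nr][nc] and glyph[nr][nc] == ".":
                        seen[nr][nc] = True
                        queue.append((nr, nc))
            if not touches_border:
                holes += 1
    return holes


def aspect_ratio(glyph: list[str]) -> float:
    """Width / height of the bounding box around the inked ('#') dots."""
    rows = [r for r, row in enumerate(glyph) if "#" in row]
    cols = [c for c in range(len(glyph[0])) if any(row[c] == "#" for row in glyph)]
    return (cols[-1] - cols[0] + 1) / (rows[-1] - rows[0] + 1)


def classify(glyph: list[str]) -> str:
    """Nearest prototype; hole count is weighted heavily as the more reliable feature."""
    holes, aspect = count_holes(glyph), aspect_ratio(glyph)
    return min(PROTOTYPES, key=lambda ch: 10 * abs(PROTOTYPES[ch][0] - holes)
               + abs(PROTOTYPES[ch][1] - aspect))


small_O = [".###.", "#...#", "#...#", "#...#", ".###."]
large_O = [".#####."] + ["#.....#"] * 7 + [".#####."]
print(classify(small_O), classify(large_O))  # O O
```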
To summarize this brief overview of text scanning, an OCR package is required to convert the scanned
symbols into ASCII characters or into the format of your word processing software. If you want to
exercise the latter option, before buying an OCR package be sure it supports your word processor.
It's also important to keep in mind the kind of documents you will be scanning. If your needs are
limited to typewritten or computer-generated source materials, you can save some money with an OCR
package that uses the matrix-matching system for character recognition. But if you have to process
typeset documents, be sure to get a product that performs feature extraction. Beginning on page 6,
we'll be revisiting these processes in our discussions of scanning software products.
Picture
When source materials consist of pictures or graphics, in scanner terminology we are dealing with
images. You don't need a character recognition capability to scan images; to go back to our earlier
analogy, image scanning software operates more like the camera: it makes a 'copy' of the scanned
page by creating a bit map of the page's contents. Remember, in bit mapping the file is made up of
dots that are turned on (black) or off (white). Just as dot-matrix text is made up of different
configurations of dot patterns, a bit-mapped graphic image is composed of millions of dots, each of
which is or is not filled in. The denser the dot pattern, the more variations in
shading can be achieved. You could think of a scanned, bit-mapped image as a 'snapshot' of the
original hard-copy image.
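The link between dot density and shading can be shown with ordered dithering, one common way to render gray tones using nothing but on/off dots. This Python sketch is illustrative only (the report doesn't say which method any scanner uses); it assumes gray levels from 0 (black) to 255 (white) and uses the standard 4 x 4 Bayer threshold matrix. Each 4 x 4 block of dots can show 17 different shades (0 through 16 dots on), so the more dots a device can place in a given area, the more shades it can represent there.

```python
# Standard 4 x 4 Bayer ordered-dither matrix (values 0..15).
BAYER_4 = [
    [0, 8, 2, 10],
    [12, 4, 14, 6],
    [3, 11, 1, 9],
    [15, 7, 13, 5],
]


def dither(gray: list[list[int]]) -> list[list[int]]:
    """Convert gray levels (0 = black, 255 = white) to dots (1 = on/black, 0 = off/white)."""
    dots = []
    for y, row in enumerate(gray):
        line = []
        for x, value in enumerate(row):
            # Spread 16 thresholds evenly across 0..255: 8, 24, ..., 248.
            threshold = (BAYER_4[y % 4][x % 4] + 0.5) * 16
            line.append(1 if value < threshold else 0)
        dots.append(line)
    return dots


def dots_on(gray_level: int) -> int:
    """How many of the 16 dots in one 4 x 4 block are turned on for a uniform gray."""
    return sum(map(sum, dither([[gray_level] * 4 for _ in range(4)])))


for level in (0, 64, 128, 192, 255):
    print(level, dots_on(level))
# prints: 0 16 / 64 12 / 128 8 / 192 4 / 255 0 (one pair per line)
```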
It's important to understand these differences between text files and image files if you are concerned
with the various purposes for which scanned files are used. For example, if you want to use a scanner
to input raw text that will later be edited and imported into other documents (such as in desktop
publishing applications), you should be aware that your source materials must be decent, but not
necessarily perfect, and you need good OCR capabilities. On the other hand, if you simply want to
use scanning to save documentation (that is, text that you won't ever need to edit again) in a more
compact and convenient medium, you can process the pages of text as images without worrying about
the quality of the source documents. The scanned images will capture the printed page like a picture,
with all its tears, handwritten notes, coffee smears, and photocopy smudges intact, and it will be quite
readable. Furthermore, there's no problem if the original document mixes text with photos, charts, and
graphs; the image processing software sees all the elements on the page as parts of a single image.
Scanner-Generated Files
There are a lot of variations in front-end software for scanning text. The most basic products perform
a simple matrix match on the scanned text and create an ASCII file, period. More sophisticated
products, which will be discussed in more detail later in this report, come with software and/or firmware
that speed up processing and have the capability to recognize a wide variety of fonts and prepare an
output file in the format of any one of a number of popular word processing packages. File sizes for
the ten test pages used in this study ranged from as little as 3,500 bytes for a 'normal' page of text to
as much as 9K bytes for columns of numbers.
The Tagged Image File Format (TIFF or .TIF) file apparently is becoming the de facto standard for
scanned image files. The most significant characteristic distinguishing TIFF files from text files is that
image files can't be 'edited' in the usual sense of the word. Often you can move a scanned image into
a paint program or a graphics package where you can move it around, alter its size, crop it, or rotate it.
But if the file contains any text, you can't edit that text. Think of it again as a photograph. Once
you've captured a photographic image on film, you can alter it in some ways in the darkroom: darken
or lighten it, remove parts of it, draw or write over portions of it. So you can modify the end
product, but you can't really go back and change the original image.
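Part of TIFF's appeal as an interchange format is that every file begins with the same small, documented header. The Python sketch below reads that header and the first image file directory (IFD) following the layout in the published TIFF specification: a byte-order mark ('II' or 'MM'), the number 42, and the offset of the first IFD, whose 12-byte entries hold numbered tags such as ImageWidth (256), ImageLength (257), and Compression (259). It decodes only single SHORT or LONG values, and build_demo_header is a hypothetical helper that produces a header-only fragment for testing, not a complete, viewable TIFF image.

```python
import struct

TAG_NAMES = {256: "ImageWidth", 257: "ImageLength", 258: "BitsPerSample",
             259: "Compression", 262: "PhotometricInterpretation"}
SHORT, LONG = 3, 4  # TIFF field-type codes


def read_tiff_tags(data: bytes) -> dict:
    """Parse the TIFF header and first IFD, returning single SHORT/LONG tag values."""
    if data[:2] == b"II":
        endian = "<"  # little-endian ("Intel") byte order
    elif data[:2] == b"MM":
        endian = ">"  # big-endian ("Motorola") byte order
    else:
        raise ValueError("not a TIFF file: bad byte-order mark")
    magic, ifd_offset = struct.unpack(endian + "HI", data[2:8])
    if magic != 42:
        raise ValueError("not a TIFF file: magic number is not 42")
    (count,) = struct.unpack(endian + "H", data[ifd_offset:ifd_offset + 2])
    tags = {}
    for i in range(count):
        start = ifd_offset + 2 + 12 * i  # each IFD entry is 12 bytes
        tag, field_type, n_values = struct.unpack(endian + "HHI", data[start:start + 8])
        value_field = data[start + 8:start + 12]
        if n_values == 1 and field_type == SHORT:
            (value,) = struct.unpack(endian + "H", value_field[:2])
        elif n_values == 1 and field_type == LONG:
            (value,) = struct.unpack(endian + "I", value_field)
        else:
            value = value_field  # offset to out-of-line data; not followed in this sketch
        tags[TAG_NAMES.get(tag, tag)] = value
    return tags


def build_demo_header(width: int, length: int, compression: int) -> bytes:
    """Hypothetical helper: little-endian header plus one IFD (not a full image)."""
    entries = [(256, LONG, width), (257, LONG, length), (259, SHORT, compression)]
    ifd = struct.pack("<H", len(entries))
    for tag, field_type, value in entries:
        if field_type == SHORT:
            ifd += struct.pack("<HHIHH", tag, field_type, 1, value, 0)  # SHORT, zero-padded
        else:
            ifd += struct.pack("<HHII", tag, field_type, 1, value)
    ifd += struct.pack("<I", 0)  # offset of next IFD: none
    return b"II" + struct.pack("<HI", 42, 8) + ifd


# A letter-size page at 300 dpi, PackBits-compressed (Compression = 32773).
print(read_tiff_tags(build_demo_header(2550, 3300, 32773)))
```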
A second, very significant, characteristic of TIFF files is their size: they are LARGE. A TIFF file
containing one 8.5 x 11-inch page easily can (and often does) exceed a megabyte. Files containing
complex graphs or pictures commonly are as large as 15 megabytes. The size of these files is a big
stumbling block for lots of folks; many of us simply don't have enough memory and/or disk space to
accommodate them. One solution, if the computer driving the scanner has enough memory to hold
the scanned image and enough hard disk space to save it temporarily, is to immediately convert the
TIFF file to another format before saving it. For example, we scanned a page, creating a TIFF file of
around a megabyte, then used the WordPerfect graphics conversion utility to create a WordPerfect
graphics (.WPG) file that's only 218,000 bytes. It's highly probable that any loss of detail in the
converted image will be noticeable only to the most critical observers.
Another thing to keep in mind that directly affects image file size is the resolution at which the image
is scanned. For example, the same 1-page image scanned three times, at 300 dpi, 150 dpi, and 75 dpi,
resulted in TIFF files of 65,754, 26,628, and 10,876 bytes, respectively. So if you can live with a lower
resolution, it can save a lot of disk space and speed up processing significantly.
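The arithmetic behind these sizes is easy to check. For an uncompressed bit map, size is dots across times dots down times bits per dot, so halving the resolution cuts the raw data to roughly a quarter. The Python sketch below assumes each row is padded to a whole number of bytes (an assumption; formats differ). Real files also carry headers and are often compressed, and a test image may not fill the whole page, so the measured file sizes quoted above are not expected to match these raw figures.

```python
import math


def raw_bitmap_bytes(width_in: float, height_in: float, dpi: int, bits_per_pixel: int = 1) -> int:
    """Uncompressed size of a scanned area, assuming each row is padded to a whole byte."""
    width_px = round(width_in * dpi)
    height_px = round(height_in * dpi)
    row_bytes = math.ceil(width_px * bits_per_pixel / 8)
    return row_bytes * height_px


# An 8.5 x 11-inch page at 300 dpi:
print(raw_bitmap_bytes(8.5, 11, 300))       # 1052700 (1 bit per dot: black or white)
print(raw_bitmap_bytes(8.5, 11, 300, 8))    # 8415000 (256 gray levels)
print(raw_bitmap_bytes(8.5, 11, 300, 24))   # 25245000 (24-bit color)

# Each halving of resolution cuts the raw size to roughly a quarter:
for dpi in (300, 150, 75):
    print(dpi, raw_bitmap_bytes(8.5, 11, dpi))
```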
Before we conclude our discussion of scanner files, it should be mentioned that disk files can be read
by most scanner front-end software and then processed like input from the scanner
itself. In other words, you can scan text or images today and save the scanned files on disk. Some
time later, you can have the scanner software read the file from disk and process the image just as if
it had come directly from the scanner. Text and images read from files created by facsimile (FAX)
software can be processed like scanned images too. The capabilities of optical character recognition
software can be particularly useful in this context. This will no doubt become more clear when you
read the discussion of scanner software later in this report.
Product Evaluation Methodology
In keeping with PC TAP practice, users were heavily involved in this project. In addition to the TAP staff
and our colleagues in the information centers at Research Triangle Park, participants from several other
RTP offices, the Washington Information Center, Regions IV and VIII, and NEIC were active in the
study. Thirteen scanners and eight software products were evaluated.
When we devised our evaluation materials, we didn't make it easy for the scanners. Folks who knew
about our scanner study and who are interested in exploring scanning technology brought materials
for us to use. 'See if you can scan this' was commonly heard. Often these source materials
represented a real challenge, because they definitely weren't 'crisp' copies. Apparently there are a
number of folks who have only hard copy (frequently mountains of it) of data they want to use, but for
which the original computer files have been lost. These people see scanning as the solution to their
dilemma: just scan the hard copies to restore the data files! Certainly it's a possibility, but the
condition of the available source documents is the key to the viability of the scanning solution. Some
of the scanners and software we've looked at are very good, but they aren't magic; even great
technology can't do a satisfactory job with 5-year-old 3rd or 4th generation photocopies of reduced
laser printer hardcopy output. But we tried.
Our evaluation packet included ten pages of source documents that we asked participants to scan on
their equipment: a typical image (the cover page from a training manual); mixed text and images (pages
from technical manuals containing text along with scientific notation, tables, and pictures); and text
pages containing typewriter-like type faces, typeset material (including multi-column pages and mixed
fonts on a page), computer-generated tables, and straight text in both a typewriter-like face and a non-
typewriter font from a PC word processing package. Study participants were asked to save the
scanned files on a floppy disk provided with the evaluation materials and return it to PC TAP. They
also completed a questionnaire on which information about their scanning hardware/software was
recorded along with their evaluation of its performance.
We have elected to discuss the various software products that were included in our study first. An
overview of each product is presented in the next section. Then in the hardware product reviews
beginning on page 12, we will discuss each scanner's performance in terms of the front-end software
that was used for the tests.
Product Reviews: Software
One should consider several key points when selecting an OCR product. The first is hardware
compatibility. It doesn't matter what the software will do; if you can't run it on your system, it's worthless
as far as you're concerned. Hardware compatibility turns out to be a bigger potential barrier than we
would have guessed. First you have to be sure the software will run on your computer (e.g., MS-DOS
vs. Mac). We discovered a lot more scanning products for the MS-DOS environment than for the
Macintosh user, but the gap seems to be closing. You also have to be very careful to ensure that your
scanner is supported by the software. Not all OCR products are compatible with all scanners. In
summary, there are three links in the scanning chain: (1) the scanner itself, (2) the computer to which
it's connected, and (3) the software for processing scanned text and images. When you're putting
together a system to do scanning, all three links must be mutually compatible.
Performance factors related to OCR software include speed, number and types of fonts supported, text
recognition accuracy, and supported file types. The text-recognition process is an involved one, and
it can take considerable time. Essentially the software has to look at each character in the file and
make a decision about what that character is. This process is usually accomplished by comparing the
characters in the scanned file to character tables that are part of the software. Some products are
more efficient at this process than others, resulting in measurable differences in the time it takes to
'recognize' a page of text. Reported scan/recognition times for devices in our study ranged from 30
seconds for straight text to as much as six minutes for complex pages (mixed text/graphics, multiple
columns, 'hard-to-read' copy).
We made reference earlier to two different methods of text recognition, matrix matching and feature
extraction, and pointed out the characteristics of each. OCR software may operate by either of these
methods; some products use both. The flexibility of the product is reflected in its text-recognition
capabilities, and it's important to remember that the font recognition capabilities of a package that uses
only matrix matching will be limited. You have to be careful, too, in interpreting accuracy claims of
software vendors. In their advertisements they often say their product averages "98 percent accuracy"
(or some other number approaching 100%) in tests of text recognition. This may mean that the
software was unable to even make a guess at two percent of the characters it encountered. It doesn't
necessarily mean that the software correctly identified the other 98%, just that it 'thought' it did.
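The gap between 'made a guess' and 'guessed right' is easy to demonstrate. The Python sketch below compares an OCR result with the known original text, character by character, and tallies correct, rejected (flagged as unrecognized), and substituted (wrongly recognized) characters. It assumes the two strings are already aligned one to one, which real OCR output often isn't, and it uses '~' as a hypothetical reject marker. A figure that counts everything except rejects can be noticeably higher than the share of characters actually recognized correctly.

```python
def accuracy_counts(truth: str, ocr: str, reject: str = "~") -> dict:
    """Tally OCR output against the original text; assumes 1:1 character alignment."""
    if len(truth) != len(ocr):
        raise ValueError("this sketch assumes aligned strings of equal length")
    counts = {"correct": 0, "rejected": 0, "substituted": 0}
    for expected, got in zip(truth, ocr):
        if got == expected:
            counts["correct"] += 1
        elif got == reject:
            counts["rejected"] += 1      # software declined to guess
        else:
            counts["substituted"] += 1   # software guessed, and guessed wrong
    return counts


truth, ocr = "desktop scanners", "desktop sc~nmers"
c = accuracy_counts(truth, ocr)
n = len(truth)
print(c)  # {'correct': 14, 'rejected': 1, 'substituted': 1}
print(f"guessed: {n - c['rejected']} of {n}; actually correct: {c['correct']} of {n}")
# guessed: 15 of 16; actually correct: 14 of 16
```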
Finally, the number and types of files supported by an OCR package are an important measure of its
performance. Some only output ASCII files. If you want to use those files with a word processor or
desktop publishing package you have to import them and edit them accordingly. The more
sophisticated products will produce files in the format of any of a number of word processing packages.
You simply indicate the package you want to use, and a file in the proper format, including formatting
codes, is generated.
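To see what 'formatting codes' look like in practice, the Python sketch below turns recognized text runs into a minimal Rich Text Format (RTF) document, one of the word-processor formats mentioned in this report. It uses only the basic RTF control words \b (bold), \i (italic), \ul (underline), and \par (paragraph break), and escapes the characters RTF reserves. The input structure (text paired with a set of style names) is a hypothetical stand-in for whatever an OCR package records internally; RTF written by a real word processor also includes font and color tables.

```python
def escape_rtf(text: str) -> str:
    """Escape RTF's reserved characters and turn newlines into paragraph breaks."""
    text = text.replace("\\", "\\\\").replace("{", "\\{").replace("}", "\\}")
    return text.replace("\n", "\\par ")


def runs_to_rtf(runs: list[tuple[str, set[str]]]) -> str:
    """runs: (text, styles) pairs, where styles is a subset of {'b', 'i', 'ul'}."""
    parts = []
    for text, styles in runs:
        body = escape_rtf(text)
        if styles:
            codes = "".join("\\" + s for s in sorted(styles))
            parts.append("{" + codes + " " + body + "}")  # braces limit styling to this run
        else:
            parts.append(body)
    return "{\\rtf1\\ansi " + "".join(parts) + "}"


print(runs_to_rtf([("Desktop ", set()), ("Scanners", {"b", "i"})]))
# {\rtf1\ansi Desktop {\b\i Scanners}}
```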
In the following paragraphs software and firmware products are presented in alphabetic order by
product name. No quality ranking should be inferred by the order in which these products are
discussed. To refresh your memory, the term firmware is applied to processing instructions or
programs that are contained on a microchip, rather than in memory or in a disk file. PC scanning
products often come with boards on which the OCR software resides on a microchip, along with
memory chips that help speed up processing.
AccuText
AccuText is an intelligent character recognition package from Xerox Imaging Systems. It processes
both images and text. According to the AccuText literature, it is capable of recognizing "thousands of
type styles in sizes ranging from 8- to 24-point on both portrait and landscape pages." The product
is advertised to recognize typeset, laser printed, impact printed, typewritten, and letter-quality dot matrix
printed pages. It also has a built-in 50,000-word dictionary and context rules, so it checks the spelling
and structure of the source materials during the character-recognition process. In addition, a user
dictionary can be created with up to 10,000 special terms that also will be checked. Text in
multi-column format can be read successfully. Output files can be in Microsoft Word RTF, Microsoft Excel,
Claris MacWrite, or text-only format.
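A dictionary check of this kind can be roughly imitated with Python's standard difflib module. In this sketch, any recognized word that is not in the main or user dictionary is replaced by its closest dictionary entry, provided one is similar enough. It is only an analogy: AccuText's actual dictionary and context rules are not documented here, and the word lists (MAIN_WORDS, USER_WORDS), the check_words helper, and the 0.8 similarity cutoff are assumptions chosen for the example.

```python
import difflib

# Hypothetical main dictionary plus a user dictionary of special terms.
MAIN_WORDS = {"the", "scanner", "reads", "typeset", "text", "letter", "quality"}
USER_WORDS = {"omnipage", "accutext"}


def check_words(recognized: str, cutoff: float = 0.8) -> str:
    """Replace unknown words with their closest dictionary match, if one is close enough."""
    dictionary = sorted(MAIN_WORDS | USER_WORDS)
    corrected = []
    for word in recognized.split():
        if word.lower() in dictionary:
            corrected.append(word)
            continue
        matches = difflib.get_close_matches(word.lower(), dictionary, n=1, cutoff=cutoff)
        corrected.append(matches[0] if matches else word)
    return " ".join(corrected)


print(check_words("the scannor reads tetter qua1ity text"))
# the scanner reads letter quality text
```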
AccuText supports image scanning at resolutions from 60 to 450 dots per inch, depending on the
scanner in use. Scanned images can be output in these formats: TIFF Uncompressed, TIFF PackBits,
TIFF CCITT-3, PICT, and MacPaint. A 'Preview' command allows you to preview a scanned page,
identify text and image areas, and specify the order in which they are to be processed. Areas that are
not to be scanned may also be identified. You also can choose whether to process text and images
separately or in one step.
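Of the image formats listed above, TIFF PackBits uses one of the simplest compression schemes: a byte-oriented run-length encoding that suits black-and-white scans, where long runs of identical bytes (blank paper, for instance) are common. The Python sketch below follows the published PackBits rules: a header byte n from 0 to 127 means 'copy the next n + 1 bytes literally', a header from -1 to -127 means 'repeat the next byte 1 - n times', and -128 is skipped. It is an illustration of the format, not the code AccuText uses.

```python
def packbits_encode(data: bytes) -> bytes:
    """Compress bytes with PackBits run-length encoding."""
    out = bytearray()
    i, n = 0, len(data)
    while i < n:
        # Length of the run of identical bytes starting at i (at most 128).
        run = 1
        while i + run < n and run < 128 and data[i + run] == data[i]:
            run += 1
        if run >= 2:
            out.append(257 - run)      # header -(run - 1), stored as an unsigned byte
            out.append(data[i])
            i += run
        else:
            # Literal run: gather bytes until a repeat begins or 128 bytes are collected.
            start = i
            i += 1
            while i < n and i - start < 128:
                if i + 1 < n and data[i] == data[i + 1]:
                    break
                i += 1
            out.append(i - start - 1)  # header n means "copy the next n + 1 bytes"
            out.extend(data[start:i])
    return bytes(out)


def packbits_decode(data: bytes) -> bytes:
    """Expand PackBits-compressed bytes."""
    out = bytearray()
    i = 0
    while i < len(data):
        header = data[i]
        i += 1
        if header < 128:               # 0..127: copy the next header + 1 bytes
            out.extend(data[i:i + header + 1])
            i += header + 1
        elif header > 128:             # 129..255, i.e. -127..-1: repeat the next byte
            out.extend(data[i:i + 1] * (257 - header))
            i += 1
        # header == 128 (-128) is a no-op
    return bytes(out)


# One scan line: mostly blank bytes with a short burst of detail in the middle.
row = bytes(100) + b"\xff\x81\xff" + bytes(50)
packed = packbits_encode(row)
print(len(row), len(packed))  # 153 8
```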
We weren't able to test a production version of AccuText, but we did obtain a demonstration version
for one of our study participants who's in the market for a Macintosh OCR package. Our evaluator
didn't think the software lived up to its press, but the demo package was severely restrictive and did
not permit all AccuText's features to be tested. With regard to text recognition, results from scanning
our ten test pages were encouraging. Several did very well, but others were totally unsatisfactory.
Macintosh users who are looking for a character recognition package would probably be well advised
to explore a production version of AccuText more carefully.
Discover 7320
This software was bundled with an older Kurzweil Discover 7320 Scanner. It's a text-recognition
package that uses ICR technology to recognize typewritten, laser printer, and typeset materials. Dot
matrix hard copy is not supported. Compared to the other software products in our study this one is
older, and it has one capability that the newer ICR products no longer need: it's trainable. This means
you can literally sit down at the computer and, by describing the characteristics of the characters,
'teach' the software to recognize a font. Although we've never tried this task, everything we've read
or heard indicates that it's a long, painstaking, tedious process. More recent products like AccuText,
OmniPage, and TrueScan have the built-in capability to 'learn' fonts without human intervention. The
Discover software will process scanned pages in either landscape or portrait orientation, and the original
document format is preserved. ASCII is the only supported output file format.
Although our evaluator reported reliable text recognition performance at acceptable speeds, newer and
more sophisticated products are currently available. Users interested in Kurzweil scanners and software
should be aware that Kurzweil has become part of Xerox Imaging Systems.
OCR Plus
OCR Plus is a third party product that's shipped with several manufacturers' scanners. Input we
received relative to use of OCR Plus was in conjunction with Datacopy Model 200 and 320A scanners
in the MS-DOS environment.
For character recognition, this product uses matrix matching "supplemented by a topological technique."
Like the Discover software described above, it's trainable when you need to scan fonts that aren't built
into its character-recognizing repertoire. When using OCR Plus in conjunction with tests of the
Datacopy 730GS scanner, PC Magazine reported performance "on a par with other scanners" in tests
limited to 10-point Courier type. However, less success was achieved with proportional fonts and mixed
type sizes.
Our evaluator's comments support PC Magazine's findings. While recognition accuracy was acceptable with the
10 or 15 fonts OCR Plus 'knows,' the best that was achieved with typeset material was "probably 75
percent accuracy." Overall, the best text-scanning results were achieved with documents printed on
laser printers and from a 24-pin dot matrix printer with a new ribbon. Our study participant 'taught'
OCR Plus a font, and reported that the process took a great deal of time. During the 'teaching'
process, letters had to be typed in with no errors. There was no way to edit a character after it was
entered, so if a mistake was made it was necessary to recreate the file and start over.
OmniPage
Caere Corporation's OmniPage is a first-class product. We tested version 2.0 on both a Macintosh II
and an Epson Equity III+. The MS-DOS version, which comes with software and a companion board
that takes up a full slot in the PC, is designed to run under MS Windows. In case you don't have
Windows on your computer a run-time version is bundled with OmniPage. The Mac version needs no
board or Windows interface. Just load the software; it looks and acts like the typical mouse-driven
Macintosh application.
When you install OmniPage you are given the opportunity to set a number of default options for output
files, including selection of the format for text files from a list of supported word processing packages.
However, each time you scan a document you have the option of overriding one or more defaults, so
there's plenty of flexibility built into the product.
OmniPage gives the user a lot of visual feedback, along with meaningful messages about what's going
on during the sometimes lengthy (30-120 seconds, depending on page complexity and scanner options
selected) scanning/text-recognition process. In addition, while text recognition is going on, a small
window is opened on the screen in which characters are shown 'as the software sees them,' giving the
user some feedback about how well the source document scanned, and whether using the 'lighten' or
'darken' options might improve recognition. Visitors to our information center really liked these features.
There is a quick scan option that reads a page into a temporary file that you can then look at to see
whether you want to make any adjustments to contrast or other mode settings before proceeding.
Once you're satisfied, you can select the normal scanning mode to process the current page and any
more that follow. Settings established for the first page in a multi-page operation are retained
throughout the session unless you change them.
OmniPage is an omnifont product: it can read a wide variety of fonts, and handles type sizes from
8 to 72 points. Multiple columns are accommodated, as are source documents in both portrait and
landscape orientations. A partial page option allows you to define a specific area of the page to be
recognized, while the rest of the page is ignored. We found we could narrow this area down to a
single word with no trouble. Character recognition speed is advertised as from 40 to 115 characters
per second. Unrecognized characters can, at the user's option, be flagged: the tilde symbol (~) is
placed above questionable characters in the text file when the 'show suspects' option is turned on.
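As a rough cross-check (our own arithmetic, not a vendor figure), assume a 'normal' page holds about 3,500 characters, based on the 3,500-byte text files reported earlier with one byte per ASCII character. At the advertised 40 to 115 characters per second, recognition alone would then take

$$
\frac{3500\ \text{characters}}{115\ \text{characters/s}} \approx 30\ \text{s}
\qquad\text{to}\qquad
\frac{3500\ \text{characters}}{40\ \text{characters/s}} \approx 88\ \text{s},
$$

which is consistent with the 30 to 120 seconds per page quoted above once scanning time and more complex layouts are added.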
Although OmniPage supports a number of scanners, some are not included in its list of supported
devices. However, there's a way around this problem too. Simply scan a page of text into a TIFF file
('take a picture' of the page), then read the resultant file with OmniPage's 'Recognize' command. The
text in the TIFF file is 'read' by the intelligent character recognition software, and a text file in the format
of the selected word processing package is created.
Release 2.1 of OmniPage, for Macintosh IIs and 386 and 486 PCs, was announced by Caere
Corporation in November. It will read and write both compressed and uncompressed TIFF files (version
2.0 only handles uncompressed TIFF files), and has the capability to interface with a number of
companion products like Omnispell (a spell checker) and Omnidraft (recognizes dot-matrix fonts).
Although we haven't had an opportunity to try release 2.1, we were very pleased with OmniPage 2.0
and can recommend it highly. More discussion of OmniPage can be found in the section describing
our tests of the Hewlett Packard ScanJet Plus scanner.
Publish Pac
Publish Pac is a desktop publishing package designed for use with IBM XT, AT, and PS/2 computers
(and compatibles) and any of the DEST PC Scan series scanners. It runs under Microsoft Windows,
and a run-time version is included with the Publish Pac software. A graphics adapter card and a
mouse are required. The documentation that's provided with the software was judged 'better than
average' by our evaluator.
This product has a good user interface, with pull-down menus and easy-to-understand messages. Our
evaluator particularly liked Publish Pac for scanning images, as opposed to text. When you don't need
the entire contents of a source document, it's easy to identify a particular part of the image to be
processed. After the scanned image is displayed on the screen, you just use the mouse to 'draw a
box' around the selected area, and click OK when you're satisfied. The portion of the image inside the
box is all that will be placed into the file created by Publish Pac. Image files can be saved in any of
four formats: TIFF (.TIF), PC Paintbrush (.PCX), uncompressed (.IMG), and Encapsulated PostScript
(.EPS).
The text processing capabilities of Publish Pac are somewhat limited. Only typewriter-like characters
and a few fonts from laser printers are recognized, and unrecognizable characters will be represented
in the scanned file by the pound symbol (#). In addition to standard alphanumeric characters, only
a limited number of special characters (*$#©/()&- + •=£) will be recognized. This means
Publish Pac will not be a satisfactory product for people who anticipate a requirement for scanning
typeset source materials. Text files may be saved only in ASCII format.
On the plus side, Publish Pac has the capability to scan images and text together. After the scan
operation is complete, you can create an ASCII file into which the text portion is saved, and an image
file containing the graphic portion of the page. The image file can be in any of the supported file types
listed above. Publish Pac was used in conjunction with our evaluation of the DEST PC Scan 2000 and
DEST PC Scan Plus scanners.
ReadRight
ReadRight is an OCR product that's bundled with the Hewlett Packard ScanJet Plus and several other
manufacturers' scanners. Our copy says it's designed to be used exclusively with the ScanJet; an HP
ScanJet Interface card is required. It is compatible only with version 3.0 or higher of MS-DOS.
The documentation, which is excellent, says it's "the first low-cost high-performance topological OCR
system." "Topological" is another way of saying "feature extraction." This sounds great until you find out
that the only fonts that ReadRight recognizes with this technique are the typewriter-like character sets.
The result is very good character recognition accuracy, but with a limited number of fonts. Specifically,
nine "monospaced" (all characters, including spaces, take up the same amount of horizontal space in
the line) and ten "proportionally spaced" (characters take up unequal linear space) fonts are listed. In
the ReadRight manual, under "Limitations," it says the product "can't yet read typeset documents,
documents printed by a loose dot-matrix printer, and poor photocopies."
ReadRight has the usual options for controlling contrast (they call it print intensity), scanning resolution,
and paper size of the source document (6.5-inch width, 11- to 14-inch length). There's also an option to
have the text written directly to a disk file without displaying it on the screen. This option speeds
up processing, but obviously you can't monitor what's going on or check on the accuracy of text
recognition. Output files can be in any of three formats: ASCII, WordStar, or WordPerfect. In addition,
there are three versions of ASCII. The first, called ASCII WP, puts only one space after each word (even
if the original had two), inserts a carriage return at the end of each line, and inserts two spaces after
a period. The second, ASCII DTP, puts a single space after every word (even if the original had two or
more) and puts carriage returns only at the end of paragraphs, not at the end of each line. Finally, ASCII
WYSIWYG reproduces the document in its original form using only spaces and carriage returns, but no
tabs.
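To make the difference between the two reflowing ASCII modes concrete, here is a minimal Python sketch of what "ASCII WP" and "ASCII DTP" might do to recognized text, based only on the descriptions above. The function names ascii_wp and ascii_dtp are hypothetical; ReadRight's actual rules are not documented here. We also assume DOS-style carriage-return/line-feed line endings and a blank line between paragraphs in the recognized text.

```python
import re


def _normalize_spaces(line: str) -> str:
    # Collapse any run of spaces or tabs to a single space (hypothetical helper).
    return " ".join(line.split())


def ascii_wp(lines):
    """Sketch of "ASCII WP": one space between words, two after a period,
    and a line break at the end of every scanned line."""
    out = []
    for line in lines:
        text = _normalize_spaces(line)
        # A period followed by one space and more text gets a second space.
        text = re.sub(r"\. (?=\S)", ".  ", text)
        out.append(text)
    return "\r\n".join(out) + "\r\n"  # assumed DOS CR/LF line endings


def ascii_dtp(lines):
    """Sketch of "ASCII DTP": single spaces, with line breaks only between
    paragraphs (assumed to be separated by blank lines)."""
    paragraphs, current = [], []
    for line in lines:
        text = _normalize_spaces(line)
        if text:
            current.append(text)
        elif current:
            paragraphs.append(" ".join(current))
            current = []
    if current:
        paragraphs.append(" ".join(current))
    return "\r\n".join(paragraphs) + "\r\n"


if __name__ == "__main__":
    page = ["The quick  brown fox.  It ran", "away.", "", "New para."]
    print(repr(ascii_wp(page)))   # 'The quick brown fox.  It ran\r\naway.\r\n\r\nNew para.\r\n'
    print(repr(ascii_dtp(page)))  # 'The quick brown fox. It ran away.\r\nNew para.\r\n'
```

The sketch is deliberately naive: an abbreviation such as "Dr." would also get two spaces in WP mode, and a word hyphenated at the end of a line would not be rejoined in DTP mode.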
In our tests of ReadRight with our HP ScanJet Plus, we found it to be very accurate in scanning the
fonts it "knows." However, nothing usable resulted from scanning anything but typewriter fonts during
our evaluation.
Scanning Gallery Plus
Hewlett Packard bundled this image-scanning product with the HP ScanJet Plus scanner. It runs under
Microsoft Windows, and a mouse is required. When Scanning Gallery Plus is started, two windows are
presented on the screen. The Scanner window is where the user engages in a dialog about the
scanning operation. Here you can specify the type of scanning operation you want to perform, adjust
the contrast, ask for a "preview" scan, indicate that just a partial area of the source document is to be
processed, set the dimensions of the image to be saved in the TIFF file that will be created, and name
and save those files. The second window, the Image Editor, is where you view the scanned image and
select partial areas to be processed if you wish.
Scanning Gallery Plus comes with excellent user documentation that gives detailed instructions about
the use of the various options offered on the scanning menu. Gray scales are supported, and the user
can select from among four dithering patterns for photographs. A utility is provided to convert Scanning
Gallery Plus' standard TIFF files to MSPaint, PC PaintBrush, GEM, or Encapsulated PostScript files. An
editing feature allows cutting, pasting, and cropping of all or part of an image.
We found this product easy to learn and use. Compared to some other products that offer scanning
of partial images, it's easy in Scanning Gallery Plus to indicate the portion of the image you want to
process: you just use the mouse to draw a box around it. Repositioning and cropping of image
elements is equally quick and easy with the cut-and-paste function. For image scanning, this software
is all most users of Hewlett Packard scanners should need.
TrueScan
TrueScan was honored by Byte magazine with a 1989 "BYTE Award of Excellence." These awards are
given to products deemed to be the year's most significant new offerings, and that are the personal
favorites of Byte editors and columnists. Additionally, PC Magazine called TrueScan "a powerhouse"
product. A shortcoming in the minds of Macintosh users, however, is that it's only available for MS-DOS
machines.
Like OmniPage, which we discussed earlier, Calera Recognition Systems' TrueScan comes with both
software and a board. One unique feature of TrueScan, however, is an optional "daughtercard" that
can piggyback onto the controller boards of some (but not all) scanners, thus saving a slot in the PC.
Performance is said to be "about ten percent better" if you choose the daughtercard rather than a full
Calera board, which is also available.
Calera offers a whole range of scanning products. TrueScan is available in two models for PC/ATs and
PS/2s and compatibles, Model S at $2795 list and Model E at $3995 list. Model S scans at speeds
of up to 75 characters per second (cps) and reads only in portrait orientation. Model E operates at speeds
of up to 100 cps, and handles portrait, landscape, and rotated pages (FAX images). We tested the
Model E, and found its performance lives up to its publicity in most cases.
TrueScan's list of supported scanners and word processing packages is impressive, and much too long
to list here. Suffice it to say that chances are excellent that your word processor will be supported; that
is, files in the word processor's format can be generated from scanned pages. The list of supported
scanners isn't quite so comprehensive, but most of the front-runners are included. A wide variety of
output formats for images is supported too, and scanned tabular information can be plugged into Excel,
Lotus, and Quattro spreadsheets.
We tested the full-board (no daughtercard) version of TrueScan Model E with our HP ScanJet Plus.
Results were excellent. Our only negative criticism relates to the user interface. We didn't find this
product as user friendly as OmniPage. There is very little visual feedback, and some of the status
messages are cryptic and not totally accurate. For example, the scanning and text-recognition
processes are two separate steps in the overall operation. TrueScan presents a "Scanning" message
when the light comes on in the scanner and the process begins. That initial message remains on the
screen with no changes or status updates while the scanner light goes off and the PC goes to work
on the text-recognition process. If you understand what's going on, it's not so bad; but when we first
started using the product we were baffled by the "Scanning" status message that remained on the
screen long after the scanner obviously had finished doing its job.
Overall, it's hard to fault TrueScan's performance. According to Calera, it can recognize over 16,000
fonts (some of which must be variants of the same basic typeface); character recognition accuracy with
good source materials is said to be as high as 99.9%; both text and graphics are captured in one pass
through the scanner, with text going into the user-specified word processor file and graphics into an
image file; multiple fonts and/or type sizes on the same page are handled with ease; and a built-in spell
checker flags misspelled words as well as doubtful or unreadable characters. In the low-end class,
TrueScan is the most powerful product of its kind that we've seen, but it's the most expensive too.
Summary
As is usually the case when you look at a lot of different software that is designed for the same
application, there are a lot of similarities among the products in our study. Just about all image
scanning and OCR packages currently on the market live up to their manufacturers' claims pretty well.
Certainly the ones we looked at did. The key, then, is to look at what's claimed for a given package,
and make sure it's suited to your purposes.
First and foremost, the software must be compatible with your scanner/computer configuration. Be sure
also to check the OCR/ICR capabilities if you're planning to do a lot of text scanning, and verify that
the product will produce an output file your word processing package will handle with ease. The format
of scanned files is also important with respect to your image scanning needs, so check for compatibility
of those files with software you intend to use for modifying and printing scanned images.
The ultimate criterion for many of us when it comes to selecting software for any application is cost.
Just as the products in our study have diverse capabilities, they also represent a wide price range.
Some basic, software-only OCR products start in the $500-$600 range; the TrueScan Model E we
tested lists for $3995. So look at your potential scanning needs to get a handle on what functions the
software must support, find products that will run with your hardware configuration, and choose the best
you can afford from among the packages you've identified.
Product Reviews: Hardware
Each of the scanners evaluated in our study is discussed in the following paragraphs. No ranking is
intended by the order in which they are discussed; the devices are presented in alphabetical order by
product name. A table summarizing the features of all the devices we tested appears on page 20.
Scanner Devices
Before discussing the particulars of each individual scanner, it will be helpful to briefly review the
capabilities and features of scanners in general. Fundamentally, they all work on the same principle:
light is bounced off the source document, and the scanner measures how much is reflected back. The
reflected light generates a variable amount of voltage in a sensor: the more light that comes back, the
higher the voltage. Zero voltage translates to black, and increasing voltage generates ever lighter
shades until the highest voltage yields white. One aspect in which scanners are judged is the number
of shades of gray they are capable of producing. Some are capable of only 2 levels (black and white),
while the better low-end devices can distinguish 256 shades of gray. Since the reflected light patterns
are used to create the bit maps we discussed earlier (see Picture Processing, p. 4), the greater the
device's capability for gray-scale recognition, the finer the bit-maps (and the larger the files) it will
produce.
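The voltage-to-gray mapping and its effect on file size can be sketched in a few lines of Python. This is an idealized model under stated assumptions: the 5-volt full-scale value, the linear mapping, and the helper names are ours (hypothetical), not any scanner's actual electronics, and the size figure is the raw uncompressed bit map before any file-format header or compression.

```python
def voltage_to_gray(voltage: float, v_max: float, levels: int) -> int:
    """Quantize a sensor voltage into one of `levels` gray values.

    0 volts maps to 0 (black); v_max maps to levels - 1 (white).
    """
    if not 0.0 <= voltage <= v_max:
        raise ValueError("voltage outside sensor range")
    # Linear scaling; the top of the range is clamped to the lightest level.
    return min(int(voltage / v_max * levels), levels - 1)


def bits_per_pixel(levels: int) -> int:
    """Bits needed per pixel: 1 for black/white, 4 for 16 grays, 8 for 256."""
    return (levels - 1).bit_length()


def raw_page_bytes(width_in: float, height_in: float, dpi: int, levels: int) -> int:
    """Uncompressed bit-map size of a page scanned at `dpi` with `levels` grays."""
    pixels = round(width_in * dpi) * round(height_in * dpi)
    return pixels * bits_per_pixel(levels) // 8


if __name__ == "__main__":
    V_MAX = 5.0  # hypothetical full-scale sensor voltage
    print(voltage_to_gray(0.0, V_MAX, 256))  # 0 (black)
    print(voltage_to_gray(5.0, V_MAX, 256))  # 255 (white)
    for levels in (2, 16, 256):
        # An 8.5 x 11-inch page at 300 dpi
        print(levels, raw_page_bytes(8.5, 11, 300, levels))
```

Under these assumptions, a letter-size page at 300 dpi comes to about 1 MB in black and white (2 levels), about 4.2 MB at 16 grays, and about 8.4 MB at 256 grays, which is why gray-scale capability and file size rise together.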
When it's time to produce a hard copy of a scanned image, it doesn't matter how good the scanning
software is if the resolution of the output device isn't compatible with that of the image. Resolution is a
product of the density of the bit-mapped dot patterns discussed earlier; denser patterns accommodate
more shades of gray, yielding higher resolution. Excellent results can be achieved with a scanner
capable of 300-dot-per-inch (dpi) resolution and 256 shades of gray, and a 300-dpi PostScript laser
printer. It's worth mentioning again, however, that very large files are required to accommodate images
with these characteristics.
Two methods are employed in software to achieve gray-scaling in scanned
images. The first is dithering, a process by which the density of the bit map is altered before the
scanned file is saved. The dithering, then, is stored with the image. The second, more recently
developed technique is called gray scaling. In gray scaling, values representing the gray tones (rather
than bit patterns) are stored with the image. Creation of the pattern occurs when the image is sent
to the output device, so the software tailors the output to the capabilities of the printer. The TIFF files
mentioned earlier are the most common format in which gray-scale images are saved.
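The difference between the two approaches is where the black-and-white dot pattern gets made. As a rough illustration, the Python sketch below uses ordered dithering with a 4 x 4 Bayer threshold matrix, one common textbook technique; we don't know which patterns any product in this study actually uses, and the names here are hypothetical. Under dithering, the 0/1 output is what gets saved; under gray scaling, the gray values themselves are saved, and a routine like this would run only when the image is sent to a particular printer.

```python
# 4 x 4 Bayer matrix: each cell's rank (0-15) sets the order in which dots turn white.
BAYER_4 = [
    [0, 8, 2, 10],
    [12, 4, 14, 6],
    [3, 11, 1, 9],
    [15, 7, 13, 5],
]


def ordered_dither(gray, levels=256):
    """Turn a 2-D list of gray values (0 = black .. levels-1 = white)
    into 1-bit pixels (1 = white, 0 = black) by comparing each pixel
    with a position-dependent threshold taken from BAYER_4."""
    result = []
    for y, row in enumerate(gray):
        out_row = []
        for x, value in enumerate(row):
            # Spread the 16 thresholds evenly across the gray range.
            threshold = (BAYER_4[y % 4][x % 4] + 0.5) / 16 * (levels - 1)
            out_row.append(1 if value > threshold else 0)
        result.append(out_row)
    return result


if __name__ == "__main__":
    mid_gray = [[128] * 4 for _ in range(4)]
    dots = ordered_dither(mid_gray)
    for row in dots:
        print(row)
    print(sum(map(sum, dots)))  # 8: half of the 16 dots come out white
```

Storing only the gray values (the second method) keeps this decision open, so the same file can be dithered differently for a 300-dpi laser printer than for a lower-resolution device.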
There are two basic physical configurations for scanners, flatbed and sheetfed. Flatbed scanners
resemble photocopy machines (except that they're usually a lot smaller). You lift a cover from the glass
surface, place the source document face down on the glass, close the cover, and start the scanning
operation. The light source inside the device passes beneath the source document and does its
light-bouncing job, the image is captured, and that's that. With sheetfed scanners, the source document
usually is fed between rollers that "grab" the paper and feed it through the inside of the device where
the scanning operation takes place. The source document is then returned to the operator through
an opening at the end of the device's "paper path." In both cases, you give the machine one page
at a time, unless you purchase an optional document feeder (available with some scanners) that
accepts a stack of documents that are automatically fed to the device one at a time. One disadvantage
of the sheetfed scanner is that you can't lay an open book on the glass to copy a page; nor will it
accept thick materials. As the name implies, sheetfed scanners accommodate one sheet of paper at
a time. Period. Sheetfed scanners also have a reputation for jamming source pages in the paper path.
Flatbed scanners, on the other hand, will handle both the open book and other heavier-than-paper
source materials.
Handhelds
We've said there are two basic scanner types, but a third type deserves mention here: hand-held
scanners. We didn't include any hand-held devices in our study. Our task was defined as "evaluating
desktop scanners." Nevertheless, during our research we came across some information about hand-held
scanners, and we considered trying to find some we could test. However, the negative feedback
we got from people who already had looked at them led us to dismiss the idea. Many people feel that
good handheld scanners will be available some time, but they aren't here yet.
For our readers who are interested in hand-held devices, here's what we know in a nutshell. The
Mitsubishi Handheld Image Scanner (no text recognition capabilities at present) is currently available
at a list price of $995. An optional sheet-feed attachment, to which the scanning device quickly
attaches to make a flatbed desktop unit, costs another $260. In hand-held operation, this device is said
to do an acceptable image-scanning job, but lack of a text-scanning capability puts it out of contention
for most scanning applications we've been confronted with by EPA users.
Another hand-held image-only scanner we read about is "ScanMan" from Lotus Selects (PC version
$339 list; PS/2 version $399). ScanMan has a 4-inch scanning window that allows you to scan images
up to 4 inches wide and 11 inches long. Images can be scanned into TIFF or PC Paintbrush format,
and can be saved into TIFF, PC Paintbrush, or Microsoft Paint format.
When we were researching the literature in preparation for our scanner project, we found a somewhat
dated review (PC Magazine, Jan. 26, 1988) of the Complete Hand Scanner from Complete PC Inc. The
device offers 200-dpi resolution and a 2.5 x 10-inch scan path for $249. It was said to be "very good"
for black-and-white line drawings, while photographs were "more challenging." The front-end software
converts images to Dr. Halo, PC Paintbrush, and Windows formats. A "bad manual" was pointed out
as the primary shortcoming of the product. Like most other handhelds, it offers no text scanning.
Along with the input provided by one of our study participants was an account of one site's local
assessment of handheld scanners from Logitech. The device is limited to a 4.5 x 6-inch scan, and
getting it properly aligned for text scanning was said to be a problem. (Text alignment in even the
better flatbed devices is critical; the text on the printed page needs to be perpendicular to the path of
the scanning wand, except, of course, in the case of landscape orientation.) Scan speeds were said
to be slow. Our evaluator summed up this device as "an OK toy."
Now that you've had a quick primer on scanners, let's look at the individual devices. Evaluation data
for these narratives was provided by the participants in our scanner assessment project. For some
devices general evaluation material and user comments were received, but data on scanning the test
documents were not included. In those cases, only the available general information is summarized.
When detailed test data is included in the discussion of a particular scanner, that information was
provided by the participants who actually ran the tests on their respective equipment.
Apple Scanner
As sometimes happens with PC TAP studies, the person from whom we expected an assessment of
the Apple Scanner was unable to complete the study. However, we feel this product deserves mention
in our report, so we're including a summary here of some general information that appeared in several
trade journals.
The Apple Scanner is a flatbed model offering resolution of up to 300 dots per inch when processing
line art, photographs, and gray-scale images. One shortcoming, however, is a limitation to only 16 shades
of gray. The scanner is a SCSI device, so it works with any Mac Plus, SE, or Mac II that has
System Version 6.0 or later.
Both AppleScan and HyperScan software come with the Apple Scanner. These packages provide
for scanning (directly into HyperCard stacks if you choose), cropping, sizing, and fine-tuning images.
Source documents in both landscape and portrait orientations are accepted. For text scanning,
OmniPage supports the Apple Scanner, and is reportedly a popular ICR product among Macintosh
users. We have seen the retail price for the Apple Scanner reported at both $1609 and $1799.
Chinon Desktop Scanner
The Chinon Desktop used in our evaluation was an older model. It's a serial device, and is slow in
operation. Scanned image files were moved into Chinon graphics software for further processing.
These images had good resolution (although images with lots of arcs and diagonal lines were avoided),
and it was possible to size the image within the graphics package.
A recent Chinon scanner, the DS-3000, was favorably reviewed in the March 28, 1989 issue of PC
Magazine. This device, classified as a "portable" scanner, is intended for the desktop publishing market.
At $745 it comes with bundled image-processing software. For $995 you can buy the DS-3000 with
an image-scanning utility and ReadRight bundled in (see page 9 for more about ReadRight).
The DS-3000 has a unique characteristic: it's an overhead scanner. It looks a lot like a portable
overhead projector. You lay the source document on a flat bed, and the light source is housed directly
over it atop an arm extending from the back of the scanner. In the PC Magazine review of this product, they said
that because the source document is virtually unprotected from external lighting effects, all their tests
yielded images in which shadowing effects were present. They placed heavy emphasis on portability
and desktop publishing applications, but this scanner's suitability for general office use was left open
to question.
Datacopy Models 200 and 320A
We didn't receive any detailed evaluation data about the Datacopy Models 200 and 320A. These
devices were used in some local scanner tests at one of our participating locations, and the results of
those tests were forwarded to us. However, our ten standard test documents weren't included in the
local tests, and no assessment of how our tests fared on these devices was included in the information
we received.
Document scanning done on these devices was accomplished with the aid of OCR Plus, which was
discussed on page 7. Scan speed was characterized as "slow." Reasonable text recognition accuracy
was reported when source documents were of good quality ("not a copy of a copy of a ...") and the
font was one the OCR software could "read." In some cases, the success rate of character recognition
was improved by enlarging or reducing source documents on a photocopier in an attempt to
approximate a recognizable font. It was reported that "almost anything that was (typeset)... could not
be satisfactorily scanned."
Datacopy Model 830
Our evaluator with the Datacopy Model 830 scanner is a Macintosh user. Although this is an excellent
scanner (it was rated "best for Macintosh users" in a 1988 review by Publish! magazine), our study
participant has had difficulty finding suitable front-end software to use with the device. Although a lot
of hardware still bears the Datacopy name, the company is now a subsidiary of Xerox Imaging Systems.
For purposes of completing our scanner evaluation, this participant used a demonstration copy of
AccuText, a Xerox Imaging Systems product for the Mac. Given the limitations imposed by the demo
package, this software performed quite credibly. Some formatting problems were encountered, but this
is common in scanned documents. A lot depends on how the scanner was set up before the operation
was begun, for example, whether multiple columns or landscape-oriented material was specified. Despite
the sometimes strange appearance of the scanned files, a careful reading of the text reveals a very high
level of character recognition accuracy.
The Datacopy Model 830/AccuText rendering of one particular page is very good. That page, a rather
poor photocopy of many columns of numbers in a small typeface, was the "acid test" that most of the
OCR software in our study failed. The result would probably be acceptable for production work as an
alternative to re-creating the source material from scratch. As we said in our software review of
AccuText, this combination looks like a viable option. However, we recommend a more careful
evaluation with the production software before making a decision to purchase.
DEST PC Scan 2000
This device is compatible with both IBM PCs (and compatibles) and Apple Macintosh computers. Our
evaluation device was attached to an IBM PC/AT, requiring installation of a scanner interface board in
the computer. Scanning of both images and text is supported, the latter with the bundled Publish Pac
software. An automatic document feeder (ADF) is available as an option, but the device used in our
evaluation didn't have this attachment. However, with the installation of a FAX board in the computer
the scanning station has been used quite successfully as a FAX terminal as well.
The PC Scan 2000 is a sheetfed scanner, and the biggest physical complaint about the device is its
inclination toward crooked paper feeding and jams in the paper path. Frequent users claim the odds
of an improper feed are greater than those for success. Additionally, the availability of more
sophisticated text-recognition software has been accompanied by a sharp decrease in demand for this
device as a text scanner. Our tests were conducted with Publish Pac as the recognition software (see
discussion under "Product Evaluations: Software"). Nevertheless, our evaluator did give the PC Scan
2000 high marks as an image scanner (with a caveat for the troublesome paper-feed characteristics).
DEST PC Scan Plus
The DEST PC Scan Plus came bundled with Publish Pac software by Silicon Beach. This product
doesn't read dot matrix source materials, but it does handle output from typewriters and laser printers,
along with typeset documents. Only source documents in portrait orientation are accommodated.
Our evaluator, who uses the PC Scan Plus with a Macintosh, reported better results with scanned
images than with text. Accuracy of text recognition seemed to be fairly font-specific; clear copies of
some type families were scanned with low recognition accuracy. The documentation for both the
hardware and bundled software was rated "average." Speed of operation was said to be unacceptable.
In processing our test pages, the PC Scan Plus performed about as expected with the configuration
described above. The typewriter fonts were read fairly accurately, with the Prestige Elite coming out
better than the Courier. The typeset pages were worthless. Image processing was quite good, and
zeroing in on one field on a travel voucher was excellent.
Commenting on the most-liked features of the DEST PC Scan Plus, our evaluator listed "easy-to-use
front end." Things liked least included "sheet feed limits paper size; no magazines, books, etc.; pulls
paper crooked frequently." It was noted that this device is several years old, and better products have
become available more recently. With this in mind, readers who are looking for a scanner to purchase
are advised to look at other products.
DEST Workless Station Model 202
The DEST Workless Station is a standalone text scanner with built-in firmware that produces an ASCII
file. Typewritten character sets and output from laser printers in typewriter fonts can be read, but no
dot matrix or typeset material is recognized. The device has no graphics scanning capability, and reads
only in portrait orientation. This is an "older" scanner; it cost around $10,000 in 1985.
The biggest objection to this scanner is that, rather than connecting directly to the computer, it requires
an ASCII communications connection to the serial port in the PC. Robert Root, an IC consultant at the
Washington Information Center, reported to us on the DEST Model 202. His concise description of the
device is so comprehensive that we reproduce it here:
The DEST Workless Station Model 202 is the most reliable, mechanically and electronically,
of the four scanners we have. It is also the simplest to use because of its reliable
document feeder and its two controls: a button to "run" and a button to "clear" if the
operator wishes to cancel scanning on the current page. The only complexity results from
having to know how to tell the PC software, Crosstalk XVI in our setup, how to capture and
save a specific ASCII file on disk. Scanned ASCII text is transferred to the PC via a serial
port at 1200 bits per second during and after the page scan, so large stacks of pages
process quickly and efficiently.
The red illumination at the scanning window permits use of black type to fill in preprinted
orange or red ink forms so that only the filled-in contents of the form are read. This
feature could be a real time and error saver for certain data entry applications, but to my
knowledge has not been exploited during the 5 years we have offered this scanner to EPA
headquarters users. It is a real shame that our more modern and capable scanners don't
have as simple a user interface. I see little reason why they couldn't.
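Root's point about throughput over the serial link can be checked with simple arithmetic. The Python sketch below assumes the common asynchronous framing of one start bit, eight data bits, and one stop bit (10 bits per character), which the report does not specify, and a hypothetical 3,000-character page of typed text; it compares the 1200-bps link with the recognition rates quoted earlier for TrueScan Models S and E. The function names are ours.

```python
def serial_chars_per_second(bits_per_second: int, bits_per_char: int = 10) -> float:
    """Characters per second over an asynchronous serial link.

    Assumes 10 bits per character (1 start, 8 data, 1 stop bit).
    """
    return bits_per_second / bits_per_char


def seconds_per_page(chars_per_page: int, chars_per_second: float) -> float:
    """Time to move one page of text at a given character rate."""
    return chars_per_page / chars_per_second


if __name__ == "__main__":
    PAGE_CHARS = 3000  # hypothetical full page of typed text
    link = serial_chars_per_second(1200)
    print(f"1200-bps link: {link:.0f} cps, {seconds_per_page(PAGE_CHARS, link):.0f} s/page")
    for name, cps in (("TrueScan Model S", 75), ("TrueScan Model E", 100)):
        print(f"{name}: {seconds_per_page(PAGE_CHARS, cps):.0f} s/page")
```

Under these assumptions the 1200-bps link carries about 120 characters per second, roughly 25 seconds for such a page, which is in the same range as the 75 to 100 cps recognition rates quoted for TrueScan; and since the Workless Station sends text during the scan, part of that transfer overlaps with scanning.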
Our ten test documents were scanned on the Workless Station with mixed results. Understandably,
images and symbols were not properly recognized. Text recognition accuracy for pages containing text
in typewriter fonts ranged from good to excellent, and photocopying the "originals" (which were in fact
photocopies in the first place) to darken the text and thicken the characters resulted in improved
scanning accuracy in some cases. (It was noted on the evaluation form that "copies must be high
quality for good scanning accuracy.") We must point out, however, that tests with today's ICR software
yielded equal or greater accuracy with no "enhancement" of source documents.
Hewlett Packard ScanJet Plus
The PC TAP staff have access to a new HP ScanJet Plus in the Information Center at RTP. We did
extensive testing with this device on both an Epson Equity III+ and a Macintosh II. In the MS-DOS
environment we used HP Scanning Gallery Plus, ReadRight, TrueScan, and OmniPage software to
process scanned files; overviews of these products are in the section of this report dealing with
software.
The ScanJet Plus is a flatbed scanner. It comes with a board that must be installed in the PC before
you can use the scanner; a board is not required for the Macintosh. For MS-DOS machines, the
scanner is shipped with two software products: HP Scanning Gallery Plus for image scanning, and
ReadRight, an OCR product. Scanning Gallery Plus, which runs under Microsoft Windows, handles
source images in both portrait and landscape formats. If your machine doesn't have Windows, a run-time
version comes with the HP software. Both Scanning Gallery Plus and ReadRight are mouse-driven
and easy to use. If you're anti-mouse, you can still use the keyboard to run the software.
Details of our experiences using TrueScan and OmniPage with the ScanJet Plus may be found in the
discussions of scanning software. Retail list price for the ScanJet Plus is around $2,000.
We have been very pleased with the performance of our scanner. It's easy to operate, has no
confusing or cumbersome knobs or switches, and has been trouble-free in both the PC and Macintosh
environments. Clients in our information center have little trouble using it, and they invariably are
pleased with the results when they know how to use the scanning software properly. We can give an
unqualified endorsement to this device.
On the Macintosh II we used OmniPage to scan our ten test documents on the ScanJet Plus.
OmniPage, ReadRight, and TrueScan were all tried on an Epson Equity III+. An advantage of the Mac
over the standard AT-class PC for scanning is that there's no need for adding a board to the computer.
Once the image has been captured, though, it's more a matter of user preference for the working
environment. We didn't notice any appreciable difference in the quality of text or images that we could
identify as CPU-specific.
Kurzweil Model 4000
Like the DEST Workless Station, the Kurzweil Model 4000 is a "standalone" scanner that must be
accessed through a communications interface. Reflecting another similarity to the DEST, our study
participant used Crosstalk to address the scanner. The Model 4000 is a text-only scanner with no
capability to process images. All scanned text is saved in ASCII files. This configuration was
characterized as "old," and since more direct connectivity is available with newer products, the Model
4000 is not recommended for individuals currently looking for a scanner.
The success of this device in reading our test files is a testimony to Kurzweil's reputation as a leader
in the scanning industry. Even its "old" technology demonstrated excellent character recognition
capabilities. Although it did have trouble with a couple of pages, for the most part a very high reliability
was demonstrated. This product did an outstanding job with the "hard to read" columns of numbers.
Kurzweil Model 7320
The Kurzweil Model 7320 with OCR software and coprocessor board was a $10,000 investment when
it was purchased in 1987. A subsequent upgrade for the OCR software in April 1989 cost an additional
$400.
The study participant who reported on this product cited no problems installing or using any part of
this configuration. However, the document feeder has been a chronic irritant after the first 25-50 hours
of service. It requires constant monitoring because of a tendency to "grab" several pages at a time.
Another disliked feature is the "complex, menu-driven user interface that can't be bypassed or
streamlined for simple production scanning of multi-page text documents unless the pages feed
reliably."
In a more positive light, the 7320 was reported to have a very flexible font-recognition capability. In
addition, the capability of fine-tuning scanner and OCR settings from on-screen menus was seen as
a significant advantage. Although the performance of this scanner was rated highly, because of its
troublesome document feeder and cumbersome user interface, our evaluator did not recommend that
others consider acquiring a similar device for their office use.
This scanner turned in a top-notch character-recognition performance in processing our test documents.
It rates among the top of the group. Regardless of font, text pages were reproduced with few or no
errors. Sometimes formatting was not completely maintained, but it wouldn't require a major effort to remedy
the discrepancies. Like the Kurzweil 4000 discussed above, this scanner did an excellent job on the
columns of numbers that were troublesome to many of the other devices.
Microtek MSF 300A
The Microtek 300A is a flatbed scanner which, according to reports in the literature, is a first-class
device. However, the report from our evaluator didn't include a recommendation that other users
consider acquiring one. Although some hardware incompatibilities were encountered when the scanner
was acquired, no significant operational problems were reported with the device. But our evaluator's
experiences have not generated much enthusiasm for using it. Scanning performance was said to
be "fine, but slow," and the scanner itself was rated "okay."
This was a field-tested scanner, and we have no first-hand experience with either the device or the
front-end software that was used during the testing. The image-scanning software is a product called
Eyestar Plus; SmartStait was used for text. Neither was rated satisfactory by our study participant. The
text-recognition software was said to work fine with simple text but to be "not very flexible." This sounds
like what you would expect from a matrix-matching product: with fonts it "knows," it does an acceptable
job, but otherwise performance is limited. The image-scanning product was summarized in this way:
"works for scanning pictures as long as they are very sharp."
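To make the "fonts it knows" point concrete, here is a toy sketch of the general matrix-matching idea in Python. It is our own simplified illustration, not the method used by EyeStar Plus, the text software, or any other product in this study. The 5 x 3 character bitmaps, the three-letter template set, and the 90% match threshold are all assumptions chosen for the example.

```python
# Toy matrix-matching recognizer (illustrative only; not any product's method).
# Assumptions: 5-row x 3-column bitmaps ("#" = black, "." = white),
# a three-letter template set, and a 90% pixel-agreement threshold.
Bitmap = tuple[str, ...]

TEMPLATES: dict[str, Bitmap] = {
    "I": ("###", ".#.", ".#.", ".#.", "###"),  # serif-style I
    "L": ("#..", "#..", "#..", "#..", "###"),
    "T": ("###", ".#.", ".#.", ".#.", ".#."),
}


def agreement(a: Bitmap, b: Bitmap) -> float:
    """Fraction of pixel positions at which two same-sized bitmaps agree."""
    total = sum(len(row) for row in a)
    same = sum(pa == pb for ra, rb in zip(a, b) for pa, pb in zip(ra, rb))
    return same / total


def recognize(glyph: Bitmap, threshold: float = 0.9) -> tuple[str, float]:
    """Match a glyph against every template; return "?" if the best match is too weak."""
    best_char, best_score = max(
        ((ch, agreement(glyph, tpl)) for ch, tpl in TEMPLATES.items()),
        key=lambda item: item[1],
    )
    return (best_char if best_score >= threshold else "?"), best_score


# A sans-serif "I" drawn in a "font" the templates don't cover.
plain_i: Bitmap = (".#.",) * 5

print(recognize(TEMPLATES["I"]))  # ('I', 1.0)
print(recognize(plain_i))         # ('?', 0.8666666666666667)
```

In this sketch the plain "I" agrees best with the stored "T" (13 of 15 pixels) but still falls below the threshold, so it comes back as "?". A matrix-matching product runs into the same problem whenever a document uses a typeface that isn't in its template library.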
When our test pages were scanned on the 300A, the results were for the most part unusable. Although
some pages (not surprisingly the typewriter-like source materials) scanned better than others, even the
best weren't suitable for production work. A good typist could re-enter the text in less time than it
would take to edit the recognition errors out of the scanned files. In some cases, practically nothing
of the source text was recognizable.
The image file that was to have contained the picture of the factory held only the title line from the page
on which the picture appeared in the original document. We suspect a memory or file-storage
limitation caused this. However, when the software failed to produce a file from two of the text pages,
our study participant scanned those pages as images. This resulted in quite readable (but uneditable)
images of the original text.
Overall, our test results support the evaluator's less-than-enthusiastic endorsement of the Microtek
300A. Based on our experience to date, however, we suspect the lackluster performance may be
attributable more to the image- and text-processing software than to the scanner itself.
Microtek MSF 300G
This device was evaluated in the Macintosh environment using the Microtek DA (desk accessory) image-scanning
interface and OmniPage for text scanning. The 300G is a flatbed scanner requiring a SCSI terminator when
connected to the Mac. The fact that no terminator was supplied with the device was listed as a major
shortcoming by our evaluator. Another shortcoming was insufficient memory on the Mac for
OmniPage to operate efficiently. (Although this isn't the scanner's fault, it is a consideration when
you're putting the device to practical use: a minimum of 4MB is required.)
Features noted as "best liked" include ease of use, low maintenance, better-than-average results for
scanned graphics, and the ability of the flatbed design to accommodate source documents with a variety
of physical characteristics (e.g. books, charts, maps, etc.). Our study participant said he would
recommend this configuration, with appropriate cautions with respect to memory and SCSI terminator
requirements.
To overcome the problem of insufficient memory to process our test pages, the evaluator used a
technique recommended by OmniPage. Text pages were saved as 300-dpi TIFF files (which,
interestingly, were all 1 megabyte in size), then the ICR software was executed against those disk files.
With this technique, the software "feeds" the text from disk, rather than having it passed directly from
the scanner. The resultant test files were saved in MS Word format, which we subsequently converted
to WordPerfect.
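The uniform 1-megabyte file size is about what you would expect from an uncompressed black-and-white page image. Here is a minimal sketch of the arithmetic. It assumes an 8.5 x 11-inch page stored at 1 bit per pixel with no compression, each row padded to a whole byte, and ignores the small TIFF header. The report doesn't record the page size or TIFF options actually used, so these are our assumptions.

```python
def uncompressed_bilevel_bytes(width_in: float, height_in: float, dpi: int) -> int:
    """Bytes for an uncompressed 1-bit-per-pixel image, each row padded to a whole byte.

    Assumes no compression and ignores any file-format header overhead.
    """
    width_px = round(width_in * dpi)     # 8.5 in x 300 dpi = 2,550 pixels
    height_px = round(height_in * dpi)   # 11 in x 300 dpi = 3,300 pixels
    bytes_per_row = (width_px + 7) // 8  # 8 pixels per byte, rounded up
    return bytes_per_row * height_px


size = uncompressed_bilevel_bytes(8.5, 11, 300)
print(size)                      # 1052700
print(f"{size / 2**20:.2f} MB")  # 1.00 MB
```

Storing the same page with 8 bits per pixel (gray scale) would take roughly eight times as much space, which helps explain why memory was such a concern on the Mac.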
These results clearly demonstrated the suspect nature of manufacturers' claims for text-recognition
accuracy. With an option turned on to record recognition accuracy during the scanning process,
OmniPage reported 98-99.7% accuracy on several documents that were practically useless. As we
discussed earlier in this report (third paragraph on page 6), these percentages are based only on the
characters the software flagged as "suspect"; they don't take into account characters it recognized
incorrectly without flagging them. Nevertheless, several pages had few errors, either real or imagined.
The Prestige Elite text and the Helvetica from a PC TAP Consumer Report page were particularly well done.
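A short worked example shows how a high reported figure can coexist with a nearly useless page. The numbers below are hypothetical, not taken from our test pages. We also assume, as a worst case, that every flagged character was in fact misread; the function names are ours.

```python
def reported_accuracy(total_chars: int, flagged: int) -> float:
    """Percentage of characters the software did NOT flag as suspect."""
    return 100.0 * (total_chars - flagged) / total_chars


def true_accuracy(total_chars: int, flagged: int, unflagged_errors: int) -> float:
    """Percentage of characters actually recognized correctly.

    Assumes every flagged character was misread, and that unflagged_errors
    counts characters misread with no warning (the software can't see these).
    """
    return 100.0 * (total_chars - flagged - unflagged_errors) / total_chars


# Hypothetical page: 2,000 characters, 20 flagged, 1,200 silently misread.
print(reported_accuracy(2000, 20))   # 99.0
print(true_accuracy(2000, 20, 1200))  # 39.0
```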
Summary
In conclusion, we'd like to add our own brief assessment of desktop scanning, gleaned through our
experiences in this study. It appears there are a number of viable scanners on the market, and from
what we've seen most of them do a reasonably good job at what they're designed for. After all,
scanning technology has been around for a while; it just hasn't been in the desktop market until fairly
recently. So you can probably find a low-end scanner that suits your needs for a list price in the
$2,000-$4,000 range, and you can expect to get a reliable piece of equipment. However, the key to
the utility of that piece of equipment is in the software you obtain to process the text or images the
scanner can capture.
A number of good software products are available, each of which has its own capabilities and
limitations. Many, but not all, scanners are sold with bundled image-processing software, and
reasonably priced products are available for those that aren't. With OCR products, though, the choices
are wider and more varied. The better ones use intelligent character recognition (ICR) techniques; these
often come with a board that has its own software and additional memory, so the ICR processing can be
sped up without a lot of I/O to your computer. They have the power to deliver accurate text recognition
at acceptable speeds, provided your source documents are reasonably clear and sharp. These products
presently list in the $2,000-$4,000 range. If your needs are more modest, there are some excellent
performers for under $1,000, but you must be prepared to accept their limitations in terms of text
recognition and processing power.
This report has included a lot of descriptive text, and rather than concluding with more narrative we
prepared a brief table. In deciding what to include in the table, we asked ourselves what a prospective
scanner buyer would be asking him- or herself. These questions came to mind:
1. What type of scanner is it?
2. Will it work with my computer?
3. What is required to connect it to my computer?
4. Does any software come with it?
5. How much does it cost?
The table on the next page summarizes the answers to these five questions. If you want more details
about a particular scanner or software product, refer back to the text in the body of the report.
Happy scanning!
Desktop Scanners
Summary of Features

Scanner             Type               Platform     Bundled Software  Available Interface     Price*
Apple               Flatbed            Macintosh    Image             SCSI                    $1,700
Chinon DS-3000      Portable, Overhead PC           Image             ½-board                 $995
Datacopy Model 830  Flatbed            Mac, PC      Image             SCSI, ½-board           $2,900
DEST PC Scan        Sheetfed           Mac, PC      Text, Image       SCSI, ½-board           $2,250
DEST PC Scan Plus   Sheetfed           Mac, PC      Text, Image       SCSI, ½-board           $2,500
DEST Model 202      Sheetfed           Stand-alone  Text-only Device  Serial Port             $10,000
HP ScanJet Plus     Flatbed            Mac, PC      Text, Image       SCSI, Comm, Full board  $2,000
Kurzweil 4000       Flatbed            Stand-alone  Text-only Device  Comm Interface          Not Avail.
Kurzweil 7320       Flatbed            Mac, PC      None              SCSI, Full board        $4,995
Microtek MSF300A    Flatbed            Mac, PC      None              SCSI, ½-board           $3,000
Microtek MSF300G    Flatbed            Mac, PC      None              SCSI, ½-board           $3,495

*Figures are from available sources and may not reflect current prices. They are included to aid in
product comparisons.
List of Study Contributors
Earl Beam
EPA National Enforcement Investigations Center
Denver Federal Center
Denver, CO 80225
(303) 236-5122 (FTS) 776-5122
Denise Cheatum
EPA National Enforcement Investigations Center
Denver Federal Center
Denver, CO 80225
(303) 236-5122 (FTS) 776-5122
Angela Edwards
Health Effects Research Laboratory
EPA Environmental Research Center
Research Triangle Park, NC 27711
(919) 541-4911 (FTS) 629-4911
Don Gorton
Information Center Consultant
EPA Region VIII
999 18th Street
Denver, CO 80202
(303) 293-7546 (FTS) 330-7546
Sophia Jeffries
UNC Graduate Assistant/IC Consultant
Information Centers Branch, MD-35
EPA National Computer Center
Research Triangle Park, NC 27711
(919) 541-3661 (FTS) 629-3661
David Levesque
Information Center Consultant
EPA Washington Information Center
401 M Street SW
Washington, DC 20460
(202) 475-7413 (FTS) 475-7413
Theresa Rhyne
Information Center Consultant
Information Centers Branch, MD-35
EPA National Computer Center
Research Triangle Park, NC 27711
(919) 541-0207 (FTS) 629-0207
Robert Root
Information Center Consultant
EPA Washington Information Center
401 M Street SW
Washington, DC 20460
(202) 475-7413 (FTS) 475-7413
Diana Smith
Information Center Consultant
EPA Region IV
345 Courtland Street
Atlanta, GA 30365
(404) 347-0509 (FTS) 257-0509
David Taylor
PC TAP Coordinator
Information Centers Branch, MD-35
EPA Environmental Research Center
Research Triangle Park, NC 27711
(919) 541-0568 (FTS) 629-0568
Dr. Bellina Veronesi
Health Effects Research Laboratory, MD-74B
EPA Environmental Research Center
Research Triangle Park, NC 27711
(919) 541-2795 (FTS) 629-2795
How to Submit Items for Open Forum
In keeping with the PC Technology Assessment Program's objective to have the user community
actively involved in TAP projects, users are encouraged to submit items for inclusion in future PC TAP
Consumer Reports. If you have independently investigated the capabilities of a software product or a
hardware component, we would like to hear from you. We'd also like you to share with others your
solutions to any problems you may have encountered with a particular application or device, as well as
tricks, shortcuts, or unique applications you have devised. Although we can't promise to publish every
contribution, we will evaluate them all in terms of their potential interest to our readers and their
conformance to the spirit and intent of PC TAP.
There are no additional rules for Open Forum contributions, but here are some guidelines:
1. Contributions must be typed. Our first preference is that they
be submitted on a floppy disk in WordPerfect format. If that
isn't possible, the next best method is to EMAIL the text to
DAVE.TAYLOR, EPA3099. The least preferable method, but still
acceptable, is to mail a typewritten article to TAP at the
address on the cover of this publication.
2. The length of your contribution will be determined somewhat by
its complexity. However, keep in mind that we're primarily
interested in the purpose of your study project and how pleased
you were with the results, not in the nitty-gritty details of
how you did it. We will publish your name, address, and phone
number for those who want more details. Two to three pages
is probably a reasonable maximum length. On the other hand,
a paragraph containing a nugget that may be useful to others
would be equally welcome.
3. All material submitted by users is subject to our editing, and
you will not be given an opportunity to review the final
manuscript before publication. Sorry, you'll just have to
trust us. If we have questions or don't understand any part
of your text, we'll contact you for clarification.
We hope you enjoy PC TAP Consumer Reports, and we look forward to hearing from individuals who
have insights or discoveries to share with others. Thanks for your interest and your participation
in the PC Technology Assessment Program.