        Consumer Report
                    Desktop Scanners

                       Report #5
                      January 1990

  EPA 220/1990.16
PC Technology Assessment Program
EPA National Data Processing Division
Information Centers Branch - RICII, MD-35
Research Triangle Park, NC 27711
Telephone: (919) 541-0568 (FTS) 629-0568

-------
                                  PC TAP CONSUMER REPORTS
        From the Editor's Desk
        Our study of desktop scanners is highlighted in this PC TAP Consumer Report. Like our project to
        look at ways to perform graphics file transfers, this study grew along the way. Originally the objective
        was to assess just the DEST scanner that was available on the EPA PC contract. As you will see in
        the following pages, before the project was completed information was included about thirteen scanners.
        We also looked at a number of scanning front-end options, including both software-only and
        hardware/software-combination products for both the IBM PC and Apple Macintosh environments.

        One of the reasons why our study kept expanding is the tremendous amount of attention scanning is
        currently getting in the industry. It's not so much that scanning technology is new, but the really good
        systems have been so expensive that they were out of reach for the typical desktop application.  In
        the past couple of years, however,  user demand for good  scanning equipment has intensified the
        competition to provide such a capability while also bringing prices down to a more realistic range.
        Sounds  like a familiar scenario, doesn't it?   In any case, throughout the study more products kept
        coming to our attention, and we felt obliged to investigate as many as possible.

        A second reason for the growth of our scanner study was that we at PC TAP just got so into it!
        Scanning is a fascinating topic, and the more we dug into it the more it grabbed us. Also, as we
        talked with other folks about scanning we learned about more users who have scanners, and everyone
        was anxious to have their device represented in our report.

        We think our scanning study grew for good  reasons,  and that our report is better for the increased
        information it contains and the greater number  of products  it covers  as a result  of that growth.
        Certainly the input provided by the various participants resulted in a more comprehensive  report than
        would otherwise have been possible. It has been an interesting report to research and write, and we
        hope you enjoy it.  Due to the length of the scanner report, Open Forum does not appear in this issue.


                                                                             David A. Taylor
                                                                             PC TAP Coordinator
                                             HEADQUARTERS LIBRARY
                                             ENVIRONMENTAL PROTECTION AGENCY
                                             WASHINGTON, D.C. 20460

-------
                               DESKTOP SCANNING
 Introduction
Although this study was interesting for the PC TAP staff, it has also been somewhat frustrating. The
frustration comes from the fact that scanning technology is getting so much attention in the industry
and is changing so fast that it's hard to keep up. The more you learn about the process and about
available products, the more you suspect there is that you haven't uncovered yet. New products keep
cropping up everywhere, and at least one of those we're reporting on has announced an upgrade to
the version we tested. But we're discovering that this is all part of the technology assessment
business: playing catch-up with the industry.

                               Scanning: What's It All About?

As happens when you dig into most aspects of technology application, your vocabulary must be
enhanced before you can explore the world of desktop scanning. It doesn't take long to find out that
scanning is what a scanner does. And a scanner is a device that scans. Sound like technospeak
doubletalk? It really isn't; it's just that the scanning process itself isn't all that complicated. A camera
provides a good analogy: you use a camera to take a picture. Everyone can understand that process
and what results from it. Well, a scanner takes a "picture" too. But what happens after the scanner
captures the picture can get involved.

You point a camera and snap the shutter to capture photographic images of your choice on a roll of
film. And when you've exposed an entire roll of film, you take it to a photo processor to have it
developed. The result is a group of photographs. Scanners capture images too, but the camera
analogy breaks down immediately after the capture takes place. That's because the scanner is a
computer peripheral, while the camera is a stand-alone device. So rather than immediately recording
the image (as the camera does on the film), the scanner simply passes it along to the host computer
for further action. From that point on, the scanner is out of the loop and the material you've scanned
is in your computer's memory waiting for you to do something with it.

We don't mean to imply, however, that capturing the image in the first place is insignificant. The wide
range of capabilities and prices represented in the scanner marketplace gives some insight into the
potential sophistication of these devices. While a basic desktop scanner (which may or may not be
shipped with some front-end software) can be purchased for as little as $1,000, a realistic cost estimate
to equip yourself to scan text and graphics is roughly three times that, or about $3,000, assuming you
already have the computer to drive it all. One source lumps these document-scanning systems into
the "low end" category of systems "that are widely used for desktop publishing and typically sell for
less than $5,000" ("Scanner Application Primer," Information Center, August 1989, p. 12).

"Mid-range systems" generally are more powerful, more sophisticated versions of the low-end systems.
They offer faster processing and heavier-duty equipment for a wider range of office applications, and
can cost from $5,000 to $30,000. High-end scanners are designed for round-the-clock production use.
Such systems can scan, enhance, compress, and capture images at a rate of about one per second,
and they can accommodate a variety of physical document types. High-end systems cost
approximately $100,000. Then there's a "super high-end" category, in the $250,000 ballpark, that we
won't even go into.

If you're shopping for a system in the mid-, high-, or super-high-end category, don't waste your time
reading further. This report is confined to the "low-end" category of scanning equipment. "Low-end"
in this case doesn't mean inferior; it just signifies that the equipment in this group isn't as
sophisticated or as powerful as the more expensive gear in the higher categories. Low-end scanning
equipment is well suited for office use and desktop publishing, where a very high percentage of
scanning applications are found.


                                     Text versus Images

We said earlier that the scanning device is out of the loop after the image has been captured. What
then? Like so many things in the world today, it depends; and what it depends upon is the type of
material you're processing. In the world of desktop scanning, you scan one of two things: text or
images. We'll get into all the nuances of each of these processes later, but in general terms it all really
boils down to whether you're dealing with words or pictures. (Of course, no techie worth his or her salt
would ever stoop to using such mundane terms.)


                                       Word Processing

Let's talk about the processing of words (or, more properly, scanning text) first. This is a much more
complex application than is apparent at first glance. In scanning parlance, the process of transforming
a page of typed or printed text into a machine-readable form is called optical character recognition
(OCR). Obviously, software is required to perform this process, and scanners often (but not always)
are sold without such software. So, in addition to the cost of a scanner, you might have to buy an
OCR package if you want to scan text. Basic OCR software is programmed to recognize certain
character sets. The more capable a given package is in this regard, the more expensive it tends to
be. In practice, the scanned page is held in memory while every single character is compared with
those the software "knows" (this process is called matrix matching) to build a file containing ASCII text
or, if your software has the capability, a file in the format of a word processing package. Matrix matching
is suitable for recognizing text produced on typewriters, line printers, letter-quality printers, and
(ostensibly) dot matrix printers.
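
To make the idea of matrix matching a little more concrete, here is a minimal sketch in Python. It is not how any product in this report works internally. The 5 x 5 dot templates, the function names, and the mismatch threshold are all made up for illustration. Each scanned character is compared dot by dot with every stored template, and the closest template wins unless too many dots disagree.

```python
# Illustrative sketch of matrix matching. The templates and threshold
# below are hypothetical and not taken from any real OCR package.
TEMPLATES = {
    "I": ["..#..", "..#..", "..#..", "..#..", "..#.."],
    "L": ["#....", "#....", "#....", "#....", "#####"],
    "T": ["#####", "..#..", "..#..", "..#..", "..#.."],
}


def mismatches(glyph, template):
    """Count the dots that differ between two equally sized bitmaps."""
    return sum(
        g != t
        for glyph_row, template_row in zip(glyph, template)
        for g, t in zip(glyph_row, template_row)
    )


def recognize(glyph, max_mismatches=3):
    """Return the best-matching character, or None if nothing is close."""
    best_char, best_score = None, None
    for char, template in TEMPLATES.items():
        score = mismatches(glyph, template)
        if best_score is None or score < best_score:
            best_char, best_score = char, score
    return best_char if best_score <= max_mismatches else None


# A scanned "T" with one stray speck and one missing dot.
noisy_t = ["#####", "..#..", "..#.#", "..#..", "....."]
print(recognize(noisy_t))
```

Because the comparison is dot for dot, a template only helps for the type style and size it was built from, which is why this approach suits typewriter-like output better than typeset material.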

A step up from matrix matching technology is required when you want to scan typeset material like
books, magazines, and other professionally printed materials that usually contain a number of different
type styles and sizes. Tackling the problem of character recognition in this environment requires more
powerful software with more sophisticated capabilities. Using a process called feature extraction, which
is based on the principle that each character has distinctive physical characteristics, such software
packages examine the features of each scanned symbol and generate the appropriate character.
Sometimes this is referred to as "ICR" (for intelligent character recognition), as opposed to the more
limited OCR process. Some of the more powerful text scanning packages include the capability to
output scanned files in the formats of various word processing packages, even to the extent of inserting
the word processor's own commands for things like italics, underscoring, bolding, centering, and
tabbing. Some also preserve multiple columns, or you may be offered the option of retaining or
ignoring the columnar format of source documents.
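
The principle behind feature extraction can be sketched with a single, very simple feature: the number of enclosed white areas ("holes") in a character. This is only an illustration under our own assumptions. Real packages examine many more features, and the function name and sample bitmaps below are hypothetical. The point is that the feature stays the same when the type size changes, which a dot-for-dot template comparison cannot offer.

```python
def count_holes(glyph):
    """Count enclosed white regions in a bitmap of '#' (on) and '.' (off)."""
    rows, cols = len(glyph), len(glyph[0])
    # Pad with a white border so the outside background is one region.
    grid = [["."] * (cols + 2)]
    grid += [["."] + list(row) + ["."] for row in glyph]
    grid += [["."] * (cols + 2)]

    def fill(start_r, start_c):
        """Mark every white dot reachable from the starting dot."""
        stack = [(start_r, start_c)]
        while stack:
            r, c = stack.pop()
            if 0 <= r < rows + 2 and 0 <= c < cols + 2 and grid[r][c] == ".":
                grid[r][c] = "x"  # visited
                stack.extend([(r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)])

    fill(0, 0)  # flood the outside background first
    holes = 0
    for r in range(rows + 2):
        for c in range(cols + 2):
            if grid[r][c] == ".":  # white dot the outside could not reach
                holes += 1
                fill(r, c)
    return holes


small_o = ["###", "#.#", "###"]
large_o = ["#####", "#...#", "#...#", "#...#", "#####"]
letter_b = ["###", "#.#", "###", "#.#", "###"]
letter_l = ["#..", "#..", "###"]

for name, glyph in [("small O", small_o), ("large O", large_o),
                    ("B", letter_b), ("L", letter_l)]:
    print(name, count_holes(glyph))
```

Both sizes of "O" yield the same hole count, while "B" and "L" yield different counts, so even this one feature separates some characters regardless of type size.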

To summarize this brief overview of text scanning,  an OCR package is required to convert the scanned
symbols into ASCII characters or into  the format of your word processing software.  If you want  to
exercise the latter option, before buying an OCR package be sure it supports your word processor.
It's also important to keep  in mind the kind  of documents you will  be scanning.  If your needs are
limited to typewritten or computer-generated source materials, you can save some money with an OCR
package that uses the matrix-matching system for character recognition.  But if you have to process
typeset documents, be sure to get a product that performs feature extraction. Beginning on page  6,
we'll be revisiting these processes in our discussions of scanning software products.

-------
                                     Picture

When source materials consist of pictures or graphics, in scanner terminology we are dealing with
images. You don't need a character recognition capability to scan images; to go back to our earlier
analogy, image scanning software operates more like the camera. It makes a "copy" of the scanned
page by creating a bit map of the page's contents. Remember, in bit mapping the file is made up of
dots that are turned on (black) or off (white). Just as dot-matrix text is made up of different
configurations of dot patterns, a bit-mapped graphic image is composed of millions of dots, each of
which is or is not filled in. The more dense the dot pattern, the more numerous are the variations in
shading that can be achieved. You could think of a scanned, bit-mapped image as a "snapshot" of the
original hard-copy image.
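
As a rough illustration of what "dots that are turned on or off" means inside a file, the sketch below packs a tiny picture into bytes, eight dots to the byte. The function name and the three-row sample picture are our own inventions, and real image file formats add headers and often compression on top of this.

```python
def pack_bitmap(rows):
    """Pack rows of '#' (on) and '.' (off) dots into bytes, 8 dots per byte."""
    packed = bytearray()
    for row in rows:
        bits = [1 if dot == "#" else 0 for dot in row]
        bits += [0] * (-len(bits) % 8)  # pad each row to a whole byte
        for i in range(0, len(bits), 8):
            byte = 0
            for bit in bits[i:i + 8]:
                byte = (byte << 1) | bit  # leftmost dot is the high bit
            packed.append(byte)
    return bytes(packed)


image = ["#.#.#.#.", "########", "........"]
data = pack_bitmap(image)
print(len(data), data.hex())
```

Each row of eight dots takes exactly one byte, so the storage needed grows directly with the number of dots on the page.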

It's important to understand these differences between text files and image files if you are concerned
with the various purposes for which scanned files are used. For example, if you want to use a scanner
to input raw text that will later be edited and imported into other documents (such as in desktop
publishing applications), you should be aware that your source materials must be decent, but not
necessarily perfect, and you need good OCR capabilities. On the other hand, if you simply want to
use scanning to save documentation (that is, text that you won't ever need to edit again) in a more
compact and convenient medium, you can process the pages of text as images without worrying about
the quality of the source documents. The scanned images will capture the printed page like a picture,
with all its tears, handwritten notes, coffee smears, and photocopy smudges intact, and it will be quite
readable. Furthermore, there's no problem if the original document mixes text with photos, charts, and
graphs; the image processing software sees all the elements on the page as parts of a single image.


                                  Scanner-Generated Files

There are a lot of variations in front-end software for scanning text. The most basic products perform
a simple matrix match on the scanned text and create an ASCII file, period. More sophisticated
products, which will be discussed in more detail later in this report, come with software and/or firmware
that speed up processing and have the capability to recognize a wide variety of fonts and prepare an
output file in the format of any one of a number of popular word processing packages. File sizes for
the ten test pages used in this study ranged from as little as 3,500 bytes for a "normal" page of text to
as much as 9K bytes for columns of numbers.

The Tagged Image File Format (TIFF or .TIF) file apparently is becoming the de facto standard for
scanned image files. The most significant characteristic distinguishing TIFF files from text files is that
image files can't be "edited" in the usual sense of the word. Often you can move a scanned image into
a paint program or a graphics package where you can move it around, alter its size, crop it, or rotate it.
But if the file contains any text, you can't edit that text. Think of it again as a photograph. Once
you've captured a photographic image on film you can alter it in some ways in the darkroom (darken
or lighten it, remove parts of it, draw or write over portions of it). So you can modify the end
product, but you can't really go back and change the original image.

A second, very significant, characteristic of TIFF files is their size: they are LARGE. A TIFF file
containing one 8.5 x 11-inch page easily can (and often does) exceed a megabyte. Files containing
complex graphs or pictures commonly are as large as 15 megabytes. The size of these files is a big
stumbling block for lots of folks; many of us simply don't have enough memory and/or disk space to
accommodate them. One solution, if the computer driving the scanner has enough memory to hold
the scanned image and enough hard disk space to save it temporarily, is to immediately convert the
TIFF file to another format before saving it. For example, we scanned a page, creating a TIFF file of
around a megabyte, then used the WordPerfect graphics conversion utility to create a WordPerfect
graphics (.WPG) file that's only 218,000 bytes. It's highly probable that any loss of detail in the
converted image will be noticeable only to the most critical observers.

Another thing to keep in mind that directly affects image file size is the resolution at which the image
is scanned. For example, the same 1-page image scanned three times at 300 dpi, 150 dpi, and 75 dpi
resulted in TIFF files of 65,754, 26,628, and 10,876 bytes, respectively. So if you can live with a lower
resolution it can save a lot of disk space and speed up processing significantly.
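
The arithmetic behind these size differences is easy to sketch. The Python below estimates the uncompressed size of a black-and-white (one bit per dot) scan of a full 8.5 x 11-inch page. The function name and the assumption that each row is padded to a whole byte are ours, and file headers are ignored.

```python
import math


def raw_bitmap_bytes(width_in, height_in, dpi, bits_per_dot=1):
    """Estimate the uncompressed size in bytes of a scanned page."""
    dots_wide = math.ceil(width_in * dpi)
    dots_high = math.ceil(height_in * dpi)
    # Assumption: each row of dots is padded out to a whole number of bytes.
    bytes_per_row = (dots_wide * bits_per_dot + 7) // 8
    return bytes_per_row * dots_high


for dpi in (300, 150, 75):
    print(dpi, "dpi:", raw_bitmap_bytes(8.5, 11, dpi), "bytes")
```

Under these assumptions a full page at 300 dpi comes to a little over a megabyte, and each halving of the resolution cuts the estimate to roughly a quarter. The test files reported above are far smaller than these estimates and do not shrink by a factor of four at each step, so they presumably were compressed or covered less than a full page. That is our guess, not something the test data confirms.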

Before we conclude our discussion of scanner files, it should be mentioned that disk files can be read
by most scanner front-end software and then be processed like input from the scanner
itself. In other words, you can scan text or images today and save the scanned files on disk. Some
time later, you can have the scanner software read the file from disk and process the image just as if
it had come directly from the scanner. Text and images read from files created by facsimile (FAX)
software can be processed like scanned images too. The capabilities of optical character recognition
software can be particularly useful in this context. This will no doubt become more clear when you
read the discussion of scanner software later in this report.

                               Product Evaluation Methodology

In keeping with PC TAP practice, users were heavily involved in this project. In addition to the TAP staff
and our colleagues in the information centers at Research Triangle Park, participants from several other
RTP offices, the Washington Information Center, Regions IV and VIII, and NEIC were active in the
study. Thirteen scanners and eight software products were evaluated.

When we devised our evaluation materials, we didn't make it easy for the scanners. Folks who knew
about our scanner study and who are interested in exploring scanning technology brought materials
for us to use. "See if you can scan this" was commonly heard. Often these source materials
represented a real challenge, because they definitely weren't "crisp" copies. Apparently there are a
number of folks who have only hard copy (frequently mountains of it) of data they want to use, but for
which the original computer files have been lost. These people see scanning as the solution to their
dilemma: just scan the hard copies to restore the data files! Certainly it's a possibility, but the
condition of the available source documents is the key to the viability of the scanning solution. Some
of the scanners and software we've looked at are very good, but they aren't magic; even great
technology can't do a satisfactory job with 5-year-old 3rd or 4th generation photocopies of reduced
laser printer hardcopy output. But we tried.

Our evaluation packet included ten pages of source documents that we asked participants to scan on
their equipment: a typical image (the cover page from a training manual); mixed text and images (pages
from technical manuals containing text along with scientific notation, tables, and pictures); and text
pages containing typewriter-like type faces, typeset material (including multi-column pages and mixed
fonts on a page), computer-generated tables, and straight text in both a typewriter-like face and a
non-typewriter font from a PC word processing package. Study participants were asked to save the
scanned files on a floppy disk provided with the evaluation materials and return it to PC TAP. They
also completed a questionnaire on which information about their scanning hardware/software was
recorded along with their evaluation of its performance.

We have elected to discuss the various software products that were included in our study first. An
overview of each product is presented in the next section. Then in the hardware product reviews
beginning on page 12, we will discuss each scanner's performance in terms of the front-end software
that was used for the tests.

-------
 Product Reviews:  Software
One should consider several key points when selecting an OCR product. The first is hardware
compatibility. It doesn't matter what the software will do; if you can't run it on your system it's worthless
as far as you're concerned. Hardware compatibility turns out to be a bigger potential barrier than we
would have guessed. First you have to be sure the software will run on your computer (e.g., MS-DOS
vs. Mac). We discovered a lot more scanning products for the MS-DOS environment than for the
Macintosh user, but the gap seems to be closing. You also have to be very careful to ensure that your
scanner is supported by the software. Not all OCR products are compatible with all scanners. In
summary, there are three links in the scanning chain: (1) the scanner itself, (2) the computer to which
it's connected, and (3) the software for processing scanned text and images. When you're putting
together a system to do scanning, all three links must be mutually compatible.

Performance factors related to OCR software include speed, number and types of fonts supported, text
recognition accuracy, and supported file types. The text-recognition process is an involved one, and
it can take considerable time. Essentially the software has to look at each character in the file and
make a decision about what that character is. This process is usually accomplished by comparing the
characters in the scanned file to character tables that are part of the software. Some products are
more efficient at this process than others, resulting in measurable differences in the time it takes to
"recognize" a page of text. Reported scan/recognition times for devices in our study ranged from 30
seconds for straight text to as much as six minutes for complex pages (mixed text/graphics,
multi-columns, "hard-to-read" copy).

We made reference earlier to two different methods of text recognition, matrix matching and feature
extraction, and pointed out the characteristics of each. OCR software may operate by either of these
methods; some products use both. The flexibility of the product is reflected in its text-recognition
capabilities, and it's important to remember that the font recognition capabilities of a package that uses
only matrix matching will be limited. You have to be careful, too, in interpreting accuracy claims of
software vendors. In their advertisements they often say their product averages "98 percent accuracy"
(or some other number approaching 100%) in tests of text recognition. This may mean that the
software was unable to even make a guess at two percent of the characters it encountered. It doesn't
necessarily mean that the software correctly identified the other 98%, just that it "thought" it did.
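
A little arithmetic shows why this distinction matters. The sketch below takes the 3,500-character "normal" page mentioned earlier and a "98 percent" claim, then adds a purely hypothetical rate at which the software confidently picks the wrong character. The 1 percent figure and the function name are assumptions for illustration, not measurements from our study.

```python
def recognition_breakdown(total_chars, claimed_accuracy, silent_error_rate):
    """Split a page into flagged, silently wrong, and correct characters."""
    # Characters the software could not even guess at.
    flagged = round(total_chars * (1 - claimed_accuracy))
    attempted = total_chars - flagged
    # Hypothetical: characters it "thought" it recognized but got wrong.
    silently_wrong = round(attempted * silent_error_rate)
    correct = attempted - silently_wrong
    return flagged, silently_wrong, correct


flagged, wrong, correct = recognition_breakdown(3500, 0.98, 0.01)
print("flagged:", flagged)
print("silently wrong:", wrong)
print("true accuracy: {:.1%}".format(correct / 3500))
```

Even with a modest silent error rate, the page ends up with more than a hundred characters needing attention, and only the flagged ones are easy to find.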

Finally, the number and types of files supported by an OCR package are an important measure of its
performance. Some only output ASCII files. If you want to use those files with a word processor or
desktop publishing package you have to import them and edit them accordingly. The more
sophisticated products will produce files in the format of any of a number of word processing packages.
You simply indicate the package you want to use, and a file in the proper format, including formatting
codes, is generated.

In the following paragraphs software and firmware products are presented in alphabetic order by
product name. No quality ranking should be inferred from the order in which these products are
discussed. To refresh your memory, the term firmware is applied to processing instructions or
programs that are contained on a microchip, rather than in memory or in a disk file. PC scanning
products often come with boards on which the OCR software resides on a microchip, along with
memory chips that help speed up processing.

                                          AccuText

AccuText is an intelligent character recognition package from Xerox Imaging Systems. It processes
both images and text. According to the AccuText literature, it is "capable of recognizing thousands of
type styles in sizes ranging from 8- to 24-point on both portrait and landscape pages." The product
is advertised to recognize typeset, laser printed, impact printed, typewritten, and letter-quality dot matrix
printed pages. It also has a built-in 50,000-word dictionary and context rules, so it checks the spelling
and structure of the source materials during the character-recognition process. In addition, a user
dictionary can be created with up to 10,000 special terms that also will be checked. Text in
multi-column format can be read successfully. Output files can be in Microsoft Word RTF, Microsoft Excel,
Claris MacWrite, or text-only format.

AccuText supports image scanning in resolutions of from 60 to 450 dots per inch, depending on the
scanner in use. Scanned images can be output in these formats: TIFF Uncompressed, TIFF PackBits,
TIFF CCITT-3, PICT, and MacPaint. A "Preview" command allows you to preview a scanned page and
identify text and image areas and specify the order in which they are to be processed. Areas that are
not to be scanned may also be identified. You also can choose whether to process text and images
separately or in one step.

We weren't able to test a production version of AccuText, but we did obtain a demonstration version
for one of our study participants who's in the market for a Macintosh OCR package. Our evaluator
didn't think the software lived up to its press, but the demo package was severely restrictive and did
not permit all of AccuText's features to be tested. With regard to text recognition, results from scanning
our ten test pages were encouraging. Several did very well, but others were totally unsatisfactory.
Macintosh users who are looking for a character recognition package would probably be well advised
to explore a production version of AccuText more carefully.

                                        Discover 7320

This software was bundled with an older Kurzweil Discover 7320 Scanner. It's a text-recognition
package that uses ICR technology to recognize typewritten, laser printer, and typeset materials. Dot
matrix hard copy is not supported. Compared to the other software products in our study this one is
older, and it has one capability that the newer ICR products no longer need: it's trainable. This means
you can literally sit down at the computer and, by describing the characteristics of the characters,
"teach" the software to recognize a font. Although we've never tried this task, everything we've read
or heard indicates that it's a long, painstaking, tedious process. More recent products like AccuText,
OmniPage, and TrueScan have the built-in capability to "learn" fonts without human intervention. The
Discover software will process scanned pages in either landscape or portrait orientation, and the
original document format is preserved. ASCII is the only supported output file format.

Although our evaluator reported reliable text recognition performance at acceptable speeds, newer and
more sophisticated products are currently available. Users interested in Kurzweil scanners and software
should be aware that Kurzweil  has become part of Xerox Imaging Systems.

                                         OCR Plus

OCR Plus is a third-party product that's shipped with several manufacturers' scanners. The input we
received on OCR Plus came from its use with Datacopy Model 200 and 320A scanners
in the MS-DOS environment.

For character recognition, this product uses matrix matching "supplemented by a topological technique."
Like the Discover software described above, it's trainable when you need to scan fonts that aren't built
in to its character-recognizing repertoire. When using OCR Plus in conjunction with tests of the
Datacopy 730GS scanner, PC Magazine reported performance "on a par with other scanners" in tests
limited to 10-point Courier type. However, less success was achieved with proportional fonts and mixed
type sizes.

-------
Our evaluator's comments support PC Magazine's findings. While recognition accuracy was acceptable with the
10 or 15 fonts OCR Plus "knows," the best that was achieved with typeset material was "probably 75
percent accuracy." Overall, the best text-scanning results were achieved with documents printed on
laser printers and from a 24-pin dot matrix printer with a new ribbon. Our study participant "taught"
OCR Plus a font, and reported that the process took a great deal of time. During the "teaching"
process, letters had to be typed in with no errors. There was no way to edit a character after it was
entered, so if a mistake was made it was necessary to recreate the file and start over.

                                         OmniPage

Caere Corporation's OmniPage is a first-class product. We tested version 2.0 on both a Macintosh II
and an Epson Equity III+. The MS-DOS version, which comes with software and a companion board
that takes up a full slot in the PC, is designed to run under MS Windows. In case you don't have
Windows on your computer, a run-time version is bundled with OmniPage. The Mac version needs no
board or Windows interface. Just load the software; it looks and acts like the typical mouse-driven
Macintosh application.

When you install OmniPage you are given the opportunity to set a number of default options for output
files, including selection of the format for text files from a list of supported word processing packages.
However, each time you scan a document you have the option of overriding one or more defaults, so
there's plenty of flexibility built in to the product.

OmniPage gives the user a lot of visual feedback, along with meaningful messages about what's going
on during the sometimes lengthy (30-120 seconds, depending on page complexity and scanner options
selected) scanning/text-recognition process. In addition, while text recognition is going on, a small
window is opened on the screen in which characters are shown "as the software sees them," giving the
user some feedback about how well the source document scanned, and whether using the "lighten" or
"darken" options might improve recognition. Visitors to our information center really liked these features.
There is a quick scan option that reads a page into a temporary file that you can then look at to see
whether you want to make any adjustments to contrast or other mode settings before proceeding.
Once you're satisfied, you can select the normal scanning mode to process the current page and any
more that follow. Settings established for the first page in a multi-page operation are retained
throughout the session unless you change them.

OmniPage is an omnifont product: it can read a wide variety of fonts, and handles type sizes of from
8 to 72 points. Multiple columns are accommodated, as are source documents in both portrait and
landscape orientations. A partial page option allows you to define a specific area of the page to be
recognized, while the rest of the page is ignored. We found we could narrow this area down to a
single word with no trouble. Character recognition speed is advertised as from 40 to 115 characters
per second. Unrecognized characters can, at the user's option, be "flagged." The tilde symbol (~) is
placed above questionable characters in the text file when the "show suspects" option is turned on.
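
To put the advertised recognition speeds in perspective, here is a back-of-the-envelope calculation in Python. It assumes a page of 3,500 characters (the size of the "normal" text page reported earlier, taking one byte per character) and covers recognition only, not the scan itself. The function name is ours.

```python
def recognition_seconds(char_count, chars_per_second):
    """Time to recognize a page at a steady characters-per-second rate."""
    return char_count / chars_per_second


page_chars = 3500  # assumed: one byte per character on a "normal" text page
slowest = recognition_seconds(page_chars, 40)
fastest = recognition_seconds(page_chars, 115)
print("slowest: {:.1f} s, fastest: {:.1f} s".format(slowest, fastest))
```

Under those assumptions a page takes somewhere between roughly half a minute and a minute and a half, which is in line with the 30-120 seconds we saw in practice.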

Although OmniPage supports a number of scanners, some are not included in its list of supported
devices. However, there's a way around this problem too. Simply scan a page of text into a TIFF file
(take a "picture" of the page), then read the resultant file with OmniPage's "Recognize" command. The
text in the TIFF file is "read" by the intelligent character recognition software, and a text file in the format
of the selected word processing package is created.

Release 2.1 of OmniPage, for Macintosh IIs and 386 and 486 PCs, was announced by Caere
Corporation in November. It will read and write both compressed and uncompressed TIFF files (version
2.0 only handles uncompressed TIFF files), and has the capability to interface with a number of
companion products like Omnispell (a spell checker) and Omnidraft (recognizes dot-matrix fonts).

-------
Although we haven't had an opportunity to try release 2.1, we were very pleased with OmniPage 2.0
and can recommend it highly. More discussion of OmniPage can be found in the section describing
our tests of the Hewlett Packard ScanJet Plus scanner.

                                        Publish Pac

Publish Pac is a desktop publishing package designed for use with IBM XT, AT, and PS/2 computers
(and compatibles) and any of the DEST PC Scan series scanners.  It runs under Microsoft Windows,
and a run-time version is included with the Publish Pac software.  A graphics adapter card and a
mouse are required.  The documentation that's provided with the software was judged 'better than
average' by our evaluator.

This product has a good user interface, with pull-down menus and easy-to-understand messages.  Our
evaluator particularly liked Publish Pac for scanning images, as opposed to text.  When you don't need
the entire contents of  a source document, it's easy to  identify a particular part of the image to be
processed.  After the scanned image is displayed on the screen, you just use the mouse to 'draw a
box' around the selected area, and click OK when you're satisfied.  The portion of the image inside the
box is all that will be placed into the file created by Publish Pac. Image files can be saved in any of
four formats: TIFF (.TIF), PC Paintbrush (.PCX), uncompressed (.IMG), and Encapsulated PostScript
(.EPS).

The text processing capabilities of Publish Pac are somewhat limited.  Only typewriter-like characters
and a few fonts from laser printers are recognized, and unrecognizable characters will be represented
in the scanned file by the pound  symbol (#).  In addition to standard alphanumeric characters, only
a limited number of special characters (*$#©/()&- +  •=£)  will be recognized.   This means
Publish Pac will not be a satisfactory  product for people who anticipate a requirement for scanning
typeset source materials. Text files may be saved only in ASCII format.

On the plus side, Publish Pac has the capability to scan images and text together.  After the scan
operation is complete, you can create an ASCII file into which the text portion is saved, and an image
file containing the graphic portion of the page.  The image file can be in any of the supported file types
listed above. Publish Pac was used in  conjunction with our evaluation of the DEST PC Scan 2000 and
DEST PC Scan Plus scanners.

                                         ReadRight

ReadRight is an OCR product that's bundled with the Hewlett Packard ScanJet Plus and several other
manufacturers' scanners. Our copy says it's designed to be used exclusively with the ScanJet; an HP
ScanJet Interface card is required. It is compatible only with version 3.0 or higher of MS-DOS.

The documentation, which is excellent, says it's 'the first low-cost high-performance topological OCR
system.' 'Topological' is another way of saying 'feature extraction.' This sounds great until you find out
that the only fonts that ReadRight recognizes with this technique are the typewriter-like character sets.
The result is very good character recognition accuracy, but with a limited number of fonts.  Specifically,
nine 'monospaced' (all characters, including spaces, take up the same amount of horizontal space in
the line) and ten 'proportionally spaced' (characters take up unequal linear space) fonts are listed. In
the ReadRight manual, under 'limitations,' it says the product 'can't yet read typeset documents,
documents printed by a loose dot-matrix printer, and poor photocopies.'

ReadRight has the usual options for controlling contrast (they call it print intensity), scanning resolution,
and paper size of the source document (6.5 width, 11-14 inches length).  There's also an option to
have the text file written directly to a disk file without displaying it on the screen. This option speeds
up processing, but obviously you can't monitor what's going on or check on the accuracy of
text-recognition.

-------
Output files can be in any of three formats: ASCII, WordStar, or WordPerfect.  In addition,
there are three versions of ASCII. The first, called ASCII WP, puts only one space after each word (even
if the original had two), inserts a carriage return at the end of each line, and inserts two spaces after
a period. The second, ASCII DTP, puts a space after every word (even if the original had two or more),
and puts carriage returns only at the end of paragraphs, not at the end of each line.  Finally, ASCII
WYSIWYG reproduces the document in its original form using only spaces and carriage returns, but no
tabs.
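
For readers who want to see what the first two ASCII variants amount to, here is a minimal Python sketch of the spacing rules as we read them from the manual. It is our own illustration, not ReadRight's code: the function names are hypothetical, a newline stands in for the carriage return, and we assume paragraphs in the input are separated by blank lines.

```python
import re

def to_ascii_wp(text):
    """ASCII WP as described: one space between words, two spaces after
    a period inside a line, and a line break kept at the end of each line."""
    lines = []
    for line in text.splitlines():
        line = " ".join(line.split())             # collapse runs of spaces
        line = re.sub(r"\. (?=\S)", ".  ", line)  # two spaces after a period
        lines.append(line)
    return "\n".join(lines)

def to_ascii_dtp(text):
    """ASCII DTP as described: one space between words, line breaks only
    at the end of paragraphs (assumed here to be blank-line separated)."""
    paragraphs = re.split(r"\n\s*\n", text.strip())
    return "\n".join(" ".join(p.split()) for p in paragraphs)

sample = "The  quick brown\nfox. Jumps.\n\nNext  paragraph\nhere."
print(to_ascii_wp("The  quick brown\nfox. Jumps."))
print(to_ascii_dtp(sample))
```

The DTP form is the one to pick when the text will be reflowed by a page layout program, since line breaks inside paragraphs would otherwise have to be removed by hand.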

In our tests of ReadRight with our HP ScanJet Plus, we found it to be very accurate in scanning the
fonts it "knows."  However, nothing usable resulted from scanning anything but typewriter fonts during
 our evaluation.

                                    Scanning Gallery Plus

Hewlett Packard bundled this image-scanning product with the HP ScanJet Plus scanner. It runs under
Microsoft Windows, and a mouse is required. When Scanning Gallery Plus is started, two windows are
presented on the screen.  The Scanner window is where the user engages in a dialog about the
scanning operation.  Here you  can specify the type of scanning operation you want to perform, adjust
the contrast, ask for a 'preview' scan, indicate that just a partial area of the source document is to be
processed, set the dimensions  of the image to be  saved in the TIFF file that will be created, and name
and save those files.  The second window, the Image Editor, is where you view the scanned image and
select partial areas to be processed if you wish.

Scanning Gallery Plus comes with excellent user documentation that gives detailed instructions about
the use of the various options offered on the scanning menu.  Gray  scales are supported, and the user
can select from among four dithering patterns for photographs. A utility is provided to convert Scanning
Gallery Plus' standard TIFF files to MSPaint, PC PaintBrush, GEM, or Encapsulated PostScript files. An
editing feature allows cutting, pasting, and cropping of all or part of an image.

We found this product easy to  learn and use.  Compared to  some other products that offer scanning
of partial images, it's easy in Scanning Gallery Plus to indicate the portion of the image you want to
process: you just use the mouse to draw a box around it.  Repositioning and cropping of  image
elements is equally quick and easy with the cut-and-paste function.   For image scanning, this software
is all most users of Hewlett Packard scanners should need.

                                         TrueScan

TrueScan was honored by Byte magazine with a 1989 'BYTE Award of Excellence.'  These awards are
given to products deemed to be the year's most  significant new offerings, and that are the personal
favorites of Byte editors and columnists.  Additionally, PC magazine called TrueScan "a powerhouse"
product.  A shortcoming in the minds of Macintosh users, however, is that it's only available for
MS-DOS machines.

Like OmniPage, which we discussed earlier, Calera Recognition Systems' TrueScan comes with both
software and a board. One unique feature of TrueScan, however, is an optional 'daughtercard' that
can piggy-back onto the controller boards of some (but not all) scanners, thus saving a slot on the PC.
Performance is said to be 'about ten percent better' if you choose the daughtercard rather than a full
Calera board, which is also available.

Calera offers a whole range of scanning products. TrueScan is available in two models for PC/ATs and
PS/2's and compatibles, Model S at $2795 list and Model E at $3995 list.  Model S scans at speeds
of up  to 75 characters per second and reads only in portrait orientation.  Model E operates at speeds
                                             10

-------
of up to 100 cps, and handles portrait, landscape, and rotated pages (FAX images).  We tested the
Model E, and found its performance lives up to its publicity in most cases.

TrueScan's list of supported scanners and word processing packages is impressive, and much too long
to list here. Suffice it to say that chances are excellent that your word processor will be supported; that
is, files in the word processor's format can be generated from scanned pages.  The list of supported
scanners isn't quite so comprehensive, but most of the front-runners are included. A wide variety of
output formats for images is supported too, and scanned tabular information can be plugged into Excel,
Lotus, and Quattro spreadsheets.

We tested the full-board (no daughtercard) version of TrueScan Model E with our HP ScanJet Plus.
Results were excellent.  Our only negative criticism relates to the user interface.  We didn't find this
product as user friendly as OmniPage.  There is very little visual feedback, and some of the status
messages are cryptic and not totally accurate.  For example, the scanning and text-recognition
processes are two separate steps in the overall process.  TrueScan presents a "Scanning" message
when the light comes on in the scanner and the process begins.  That initial message remains on the
screen with no changes or status updates while the scanner light goes off and the PC goes to work
on the text-recognition process.  If you understand what's going on, it's not so bad; but when we first
started using the product we were baffled by the "Scanning" status message that remained on the
screen long after the scanner obviously had finished doing its job.

Overall, it's hard to fault TrueScan's performance.  According to Calera, it can recognize over 16,000
fonts (some of which must be variants of the same basic type face); character recognition accuracy with
good source materials is said to be as high as 99.9%; both text and graphics are captured in one pass
through the scanner (text goes into the user-specified word processor file, graphics into an image file);
multiple fonts and/or type sizes on the same page are handled with ease; and a built-in spell checker
flags misspelled words as well as doubtful or unreadable characters.  In the low-end class, TrueScan
is the most powerful product of its kind that we've seen, but it's the most expensive too.

                                          Summary

As is usually the case when you look at a lot of different software  that is  designed for the same
application, there are  a lot of similarities among the products in our study. Just about all  image
scanning and OCR packages currently on the market live up to their manufacturers' claims pretty well.
Certainly the ones we looked at did.  The key, then, is to look at what's claimed for a given package,
and make sure it's suited to your purposes.

First and foremost, the software must be compatible with your scanner/computer configuration.  Be sure
also to check the OCR/ICR capabilities if you're planning to do a lot of text scanning, and verify that
the product will produce an output file your word processing package will handle with ease.  The format
of scanned files is also important with respect to your image scanning needs, so check for compatibility
of those files with software you intend to use for modifying and printing scanned images.

The ultimate criterion for many of us when it comes to selecting software for any application is cost.
Just as the products in our study have diverse capabilities, they also represent a wide price range.
Some basic, software-only OCR products start in the $500-$600 range; the TrueScan Model E we
tested lists for $3995.  So look at your potential scanning needs to get a handle on what functions the
software must support, find products that will run with your hardware configuration, and choose the best
you can afford from among the packages you've identified.
                                             11

-------
 Product Reviews:  Hardware
Each of the scanners evaluated in our study is discussed in the following paragraphs. No ranking is
intended by the order in which they are discussed; the devices are presented in alphabetical order by
product name.  A table summarizing the features of all the devices we tested appears on page 20.

                                      Scanner Devices

Before discussing the particulars of each individual scanner, it will be helpful to briefly review the
capabilities and features of scanners in general.  Fundamentally, they all work on the same principle:
light is bounced off the source document, and the scanner measures how much is reflected back. The
reflected light generates a variable amount of voltage in a sensor: the more light that comes back, the
higher the voltage.  Zero voltage translates to black, and increasing voltage generates ever lighter
shades until the highest voltage yields white. One aspect in which scanners are judged is the number
of shades of gray they are capable of producing. Some are capable of only 2 levels (black and white),
while the better low-end devices can distinguish 256 shades of gray. Since the reflected light patterns
are used to create the bit maps we discussed earlier (see Picture Processing, p. 4), the greater the
device's capability for gray-scale recognition,  the finer the bit-maps (and the larger the files) it will
produce.
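
To put rough numbers on the file-size point, here is a small Python sketch. The figures are our own back-of-the-envelope arithmetic, not vendor data: we assume a full 8.5 x 11-inch page, an uncompressed bit map with no file header, and just enough bits per dot to hold the stated number of gray levels. The function name is ours.

```python
def bitmap_bytes(width_in, height_in, dpi, gray_levels):
    """Approximate size in bytes of an uncompressed scanned bit map."""
    # Bits needed per dot: 1 for black and white, 4 for 16 grays, 8 for 256.
    bits_per_dot = max(1, (gray_levels - 1).bit_length())
    dots = round(width_in * dpi) * round(height_in * dpi)
    return dots * bits_per_dot // 8

for levels in (2, 16, 256):
    size = bitmap_bytes(8.5, 11, 300, levels)
    print(f"{levels:3d} gray levels at 300 dpi: {size:,} bytes")
```

Under these assumptions a black-and-white page at 300 dpi needs about 1 million bytes, and the same page at 256 shades of gray needs about 8.4 million, which is why partial-page scanning and compressed file formats matter so much in practice.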

When it's time to produce a hard copy of a scanned image, it doesn't matter how good the scanning
software is if resolution of the output device isn't compatible with that of the image.  Resolution is a
product of the density of the bit-mapped dot patterns discussed earlier; denser patterns accommodate
more shades of gray, yielding higher resolution.  Excellent results can be achieved with  a scanner
capable of 300-dot-per-inch (DPI) resolution and 256 shades of gray, and a 300-dpi PostScript laser
printer. It's worth mentioning again, however, that very large files are required to accommodate images
with these characteristics. Two methods are employed in software to achieve gray-scaling in scanned
images.  The first is dithering, a process by which the  density of the bit map is altered before the
scanned  file is  saved.  The  dithering, then, is stored with the image.  The second, more recently-
developed technique is called gray scaling.  In  gray scaling, values representing the gray tones (rather
than bit patterns) are stored  with the image.  Creation of the pattern occurs when the image is sent
to the output device, so the software tailors the output to the capabilities of the printer. The TIFF files
mentioned earlier are the most common format in which gray scale images are saved.
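
The difference between the two methods can be illustrated with a short Python sketch. This is only an illustration under our own assumptions: none of the products we tested document their dithering patterns, so we use a standard 4 x 4 ordered (Bayer) threshold pattern as a stand-in, and the function name is ours. A gray-scale file would store the 0-255 values themselves; a dithered file stores only the black or white dots computed from them.

```python
# Standard 4 x 4 ordered-dither (Bayer) matrix, used here as a stand-in pattern.
BAYER_4X4 = [
    [0, 8, 2, 10],
    [12, 4, 14, 6],
    [3, 11, 1, 9],
    [15, 7, 13, 5],
]

def ordered_dither(gray_rows):
    """Convert rows of 0-255 gray values into rows of dots (1 = white, 0 = black)."""
    dots = []
    for y, row in enumerate(gray_rows):
        dot_row = []
        for x, value in enumerate(row):
            # Each position in the repeating 4 x 4 tile has its own threshold.
            threshold = (BAYER_4X4[y % 4][x % 4] + 0.5) * 16
            dot_row.append(1 if value > threshold else 0)
        dots.append(dot_row)
    return dots

mid_gray = [[128] * 4 for _ in range(4)]
for dot_row in ordered_dither(mid_gray):
    print(dot_row)
```

A patch of medium gray comes out as a tile with half its dots white and half black. Once the image is saved this way the original gray values are gone, which is why gray-scale storage gives the output software more room to match the printer.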

There are two basic physical configurations for scanners, flatbed and sheetfed.  Flatbed scanners
resemble photocopy machines (except that they're usually a lot smaller). You lift a cover from the glass
surface, place the source document face down on the glass, close the cover and start the scanning
operation. The light source inside the device passes beneath the source document and does its
light-bouncing job, the image is captured, and that's that.  With sheetfed scanners, the source document
usually is fed between rollers that 'grab' the paper and feed it through the inside of the device where
the scanning operation takes place. The source document is then returned to the operator through
an opening at the end of the device's 'paper path.'  In both cases, you give the machine one page
at a time, unless you purchase an optional document  feeder  (available with some scanners) that
accepts a stack of documents that are automatically fed to the device one at a time. One disadvantage
of the sheetfed scanner is that you can't lay an open book on the glass to copy a page; nor will it
accept thick materials. As the name implies, sheetfed scanners accommodate one sheet of paper at
a time. Period.  Sheetfed scanners also have a  reputation for jamming source pages in the paper path.
Flatbed scanners, on the other  hand, will handle both the open book and other heavier-than-paper
source materials.
                                             12

-------
                                         Handhelds

We've said there are two basic scanner types, but a third type deserves mention here:  hand-held
scanners. We didn't include any hand-held devices in our study. Our task was defined as 'evaluating
desktop scanners.'  Nevertheless, during our research we came across some information about
hand-held scanners, and we considered trying to find some we could test.  However, the negative feedback
we got from people who already had looked at them led us to dismiss the idea. Many people feel that
good handheld scanners will be available some time, but they aren't here yet.

For our readers who are interested in hand-held devices, here's what we know in a nutshell.  The
Mitsubishi Handheld Image Scanner (no text recognition capabilities at present) is currently available
at a list price of $995.   An optional sheet-feed attachment, to which the  scanning device quickly
attaches to make a flatbed desktop unit, costs another $260. In hand-held operation, this device is said
to do an acceptable image-scanning job, but lack of a text-scanning capability puts it out of contention
for most scanning applications we've been confronted with by EPA users.

Another hand-held image-only scanner we read about is 'ScanMan' from Lotus Selects (PC version
$339 list; PS/2 version $399).  ScanMan has a 4-inch scanning window that allows you to scan images
up to 4 inches wide and 11 inches long.  Images can be scanned into TIFF  or PC Paintbrush format,
and can be saved into TIFF, PC Paintbrush, or Microsoft Paint format.

When we were researching the literature  in preparation for our scanner project, we found a somewhat
dated review (PC Magazine, Jan. 26, 1988), of the Complete Hand Scanner from Complete PC Inc. The
device offers 200-dpi resolution and a 2.5 x 10-inch scan path for $249. It was said to be "very good"
for black-and-white line drawings, while photographs were 'more challenging.' The front-end software
converts images to Dr. Halo, PC Paintbrush, and Windows formats.  A 'bad manual' was pointed out
as the primary shortcoming of the product.  Like most other hand-helds, no text scanning is supported.

Along  with the  input provided by one of our  study  participants was an account of one site's local
assessment of handheld scanners from Logitech.  The device is limited to a 4.5 x 6-inch scan, and
getting it properly aligned for text scanning was said to be a problem. (Text alignment in even the
better flatbed devices is critical; the text on the printed page needs to be perpendicular to the path of
the scanning wand, except, of course, in the case of landscape orientation.)  Scan speeds were said
to be slow.  Our evaluator summed up this device as "an OK toy."

Now that you've had a quick primer on scanners, let's look at the individual devices.  Evaluation data
for  these narratives was provided by the participants in  our scanner assessment project.  For some
devices general evaluation material and user comments were received, but data on scanning the test
documents were not included. In those cases, only the  available general information is summarized.
When  detailed test data is included in the  discussion of a particular scanner, that information was
provided by the participants who actually ran the tests on their respective equipment.

                                       Apple Scanner

As sometimes happens with PC TAP studies, the person from whom we expected an assessment of
the Apple Scanner was unable to complete the study.  However, we feel this product deserves mention
in our report, so we're including a summary here of some general information that appeared in several
trade journals.

The Apple Scanner is a flatbed model offering resolution  of up to 300 dots per inch when processing
line art, photographs, and gray-scale images.  One shortcoming is a limitation to only 16 shades of

                                            13

-------
gray, however.  The scanner is a SCSI device, so it works with any Mac Plus, SE, or Mac II that has
System Version 6.0 or later.

Both AppleScan and HyperScan software come with the Apple Scanner.  These packages provide
for scanning (directly into HyperCard stacks if you choose), cropping, sizing, and fine-tuning images.
Source documents in both landscape  and portrait orientations are accepted.  For text  scanning,
OmniPage supports the Apple Scanner, and is reportedly a  popular ICR product among Macintosh
users. We have seen the retail price for the Apple Scanner reported at both $1609 and $1799.

                                  Chinon Desktop Scanner

The  Chinon Desktop used in our evaluation was an older model.  It's a serial device, and is slow in
operation.  Scanned image files were moved into Chinon graphics software for further processing.
These images had good resolution (although images with lots of arcs and diagonal lines were avoided),
and it was possible to size the image within the graphics package.

A recent Chinon scanner, the DS-3000, was favorably reviewed in the March 28, 1989 issue of PC
magazine. This device, classified as a 'portable' scanner, is intended for the desktop publishing market.
At $745 it comes with bundled image-processing software. For $995 you can buy the DS-3000 with
an image-scanning utility and ReadRight bundled in (see page 9 for more about ReadRight).

The  DS-3000 has a unique characteristic:  it's  an overhead  scanner.  It looks a lot like a portable
overhead projector. You lay the source document on a flat bed, and the light source is housed directly
over it atop an arm extending from the back of the scanner. In the PC review of this product, they said
that  because the source document  is virtually unprotected from external lighting effects, all their tests
yielded images in which shadowing effects were present.  They placed heavy emphasis on portability
and  desktop publishing applications, but this scanner's suitability for general office use was left open
to question.

                              Datacopy Models 200 and  320A

We didn't receive any detailed evaluation data about the Datacopy Models 200 and 320A.  These
devices were used in some local scanner tests at one of our participating locations, and the results of
those tests were forwarded to us. However, our ten standard test documents weren't included in the
local tests, and no assessment of how our tests fared on these devices was included in the information
we received.

Document scanning done on these devices was accomplished with the aid of OCR Plus, which was
discussed on page 7.  Scan speed was characterized as 'slow.' Reasonable text recognition accuracy
was reported when source documents were of good quality ("not a copy of a copy of a ...") and the
font was one the OCR software could 'read.'  In some cases, the success rate of character recognition
was improved by enlarging or reducing source documents on a photocopier in an attempt to
approximate a recognizable font. It was reported that 'almost anything that was (typeset)... could not
be satisfactorily scanned.'

                                    Datacopy Model 830

Our evaluator with the Datacopy Model 830 scanner is a Macintosh user. Although this is an excellent
scanner (it was rated 'best for Macintosh users' in a 1988 review by Publish! magazine), our study
participant has had difficulty finding  suitable front-end software to use with the device. Although a lot
of hardware still bears the Datacopy  name, the company is now a subsidiary of Xerox Imaging Systems.
                                            14

-------
For purposes  of completing our scanner evaluation, this participant used a demonstration copy of
AccuText, a Xerox Imaging Systems product for the Mac.  Given the limitations imposed by the demo
package, this software performed quite credibly.  Some formatting problems were encountered, but this
is common in scanned documents.  A lot depends on how the scanner was set up, for example
specifying multiple columns or landscape oriented material, before the operation was begun.  Despite
the sometimes strange appearance of the scanned files, a careful reading of the text reveals a very high
level of character recognition accuracy.

The Datacopy Model 830/AccuText rendering of one particular page (a rather poor photocopy of many
columns of numbers in a small typeface), the 'acid test' that most of the OCR software in our study
failed, is very good.  It would probably be acceptable for production work as a viable
alternative  to re-creating the source  material from scratch.  As we said in  our software review of
AccuText, this combination looks like a viable option.   However, we recommend a more careful
evaluation with the production software before making a decision to purchase.


                                    DEST PC Scan 2000

This device is compatible with both IBM PCs (and compatibles) and Apple Macintosh computers.  Our
evaluation device was attached to an IBM PC/AT, requiring installation of a scanner interface board in
the computer.  Scanning of both images and text is supported, the latter with the bundled Publish Pac
software.  An automatic document feeder (ADF) is available as an option, but the device used in our
evaluation didn't have this attachment. However, with the installation  of a FAX board in the computer
the scanning station has been used quite successfully as a FAX terminal as well.

The PC Scan 2000 is a sheetfed scanner, and the biggest physical complaint about the device is its
inclination toward crooked paper feeding and jams in the paper path.  Frequent users claim the odds
of an improper feed are  greater than  those  for success.   Additionally, the availability of more
sophisticated text-recognition software has been accompanied by a sharp decrease in demand for this
device as a text scanner. Our tests were conducted with Publish Pac  as the recognition software (see
discussion under 'Product Evaluations: Software'). Nevertheless, our evaluator did give the PC Scan
2000  high marks as an image scanner (with a caveat for the troublesome paper-feed characteristics).

                                    DEST PC Scan Plus

The DEST  PC  Scan  Plus came bundled with Publish Pac software by Silicon Beach. This product
doesn't read dot matrix source materials, but it does handle output from typewriters and laser printers,
along with typeset documents.  Only source documents in portrait orientation are accommodated.

Our evaluator,  who  uses the PC Scan Plus with a Macintosh, reported better results with scanned
images than with text.  Accuracy of text recognition seemed to be fairly font-specific; clear copies of
some type families were scanned with low recognition accuracy.  The documentation for both the
hardware and bundled software was rated 'average.' Speed of operation was said to be unacceptable.

In processing our test pages, the PC  Scan Plus performed about as expected with the configuration
described above.  The typewriter fonts were read fairly accurately, with the Prestige Elite coming out
better than the Courier.  The typeset pages were worthless. Image processing was quite good, and
zeroing in on one field on a travel voucher was excellent.

Commenting on the most-liked features of the DEST PC Scan Plus, our evaluator listed 'easy-to-use
front-end.'  Things liked least included 'sheet feed limits paper size; no magazines, books, etc.; pulls
                                            15

-------
paper crooked frequently.' It was noted that this device is several years old, and better products have
become available more recently.  With this in mind, readers who are looking for a scanner to purchase
are advised to look at other products.

                               DEST Workless Station Model 202

The DEST Workless Station is a standalone text scanner with built-in firmware that produces an ASCII
file.  Typewritten character sets and output from laser printers in typewriter fonts can be read, but no
dot matrix or typeset material is recognized. The device has no graphics scanning capability, and reads
only in portrait orientation. This is an 'older' scanner; it cost around $10,000 in 1985.

The biggest objection to this scanner is that, rather than connecting directly to the computer, it requires
an ASCII communications connection to the serial port in the PC.  Robert Root, an IC consultant at the
Washington Information Center, reported to us on the DEST Model 202. His concise description of the
device is so comprehensive that we reproduce it here:

               The DEST Workless Station Model 202 is the most reliable, mechanically and electronically,
               of the four scanners we have.  It is also the simplest to use because of its reliable
               document feeder and its two controls: a button to 'run' and a button to 'clear' if the
               operator wishes to cancel scanning on the current page. The only complexity results from
               having to know how to tell the PC software, Crosstalk XVI in our setup, how to capture and
               save a specific ASCII file on disk. Scanned ASCII text is transferred to the PC via a serial
               port at 1200 bits per second during and after the page scan, so large stacks of pages
               process quickly and efficiently.

               The red illumination at the scanning window permits use of black type to fill in preprinted
               orange or red ink forms so that only the filled-in contents of the form are read. This
               feature could be a real time and error saver for certain data entry applications, but to my
               knowledge has not been exploited during the 5 years we have offered this scanner to EPA
               headquarters users. It is a real shame that our more modern and capable scanners don't
               have as simple a user interface.  I see little reason why they couldn't.
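
As a rough check on what a 1200-bit-per-second serial link means in practice, here is a small Python sketch. The numbers are our own assumptions, not measurements: we assume 10 bits on the wire per character (one start bit, eight data bits, one stop bit, a common serial setting) and a typewritten page of about 3,000 characters. The function name is ours.

```python
def transfer_seconds(characters, bits_per_second=1200, bits_per_char=10):
    """Estimated time to send a page of ASCII text over a serial line."""
    return characters * bits_per_char / bits_per_second

page_chars = 3000  # assumed size of one typewritten page
print(f"{transfer_seconds(page_chars):.0f} seconds per page")
```

Under these assumptions the link carries about 120 characters per second, so a full page takes roughly 25 seconds to arrive at the PC.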

Our ten test documents were scanned on the Workless Station with mixed results. Understandably,
images and symbols were not properly recognized. Text recognition accuracy for pages containing text
in typewriter fonts ranged from good to excellent, and photocopying the 'originals' (which were in fact
photocopies in the first place) to darken the text and thicken the characters resulted in improved
scanning accuracy in some cases.  (It was noted on the evaluation form that 'copies must be high
quality for good scanning accuracy.') We must point out, however, that tests with today's ICR software
yielded equal or greater accuracy with no 'enhancement' of source documents.

                                 Hewlett Packard ScanJet Plus

The PC TAP staff have access to a new HP ScanJet Plus in the information center at RTP.  We did
extensive testing with this device on both an Epson Equity III+ and a Macintosh II.  In the MS-DOS
environment we used HP Scanning Gallery Plus, ReadRight, TrueScan, and OmniPage software to
process scanned files; overviews of these products are  in the section of this report dealing with
software.

The ScanJet Plus is a flatbed scanner.  It comes with a board that must be installed in the PC before
you can use the scanner; a board is not required for the Macintosh.  For MS-DOS machines, the
scanner is shipped with two software products: the HP Scanning  Gallery for image  scanning, and
ReadRight, an OCR product.   Scanning Gallery Plus, which runs under Microsoft  Windows, handles
source images in both portrait and landscape formats. If your machine doesn't have Windows, a
run-time version comes with the HP software.  Both Scanning Gallery Plus and ReadRight are
mouse-driven and easy to use.  If you're anti-mouse, you can still use the keyboard to run the software.
                                               16

-------
 Details of our experiences using TrueScan and OmniPage with the ScanJet Plus may be found in the
 discussions of scanning software.  Retail list price for the ScanJet Plus is around $2,000.

 We have been very pleased with the performance of our scanner.   It's easy to operate, has  no
 confusing or cumbersome knobs or switches, and has been trouble-free in both the PC and Macintosh
 environments.  Clients in our information center have little trouble using it,  and they invariably are
 pleased with the results when they know how to use the scanning software properly. We can give an
 unqualified endorsement to this device.

 On  the Macintosh  II  we  used OmniPage to scan our ten test documents on the ScanJet Plus.
 OmniPage, ReadRight, and TrueScan were all tried on an Epson Equity III+. An advantage of the Mac
 over the standard AT-class PC for scanning is that there's no need for adding a board to the computer.
 Once the image has been captured, though, it's more a matter of user preference for the working
 environment. We didn't notice any appreciable difference in the quality of text or images that we could
 identify as CPU-specific.

                                    Kurzweil Model 4000

 Like the  DEST Workless  Station, the Kurzweil Model 4000 is a 'stand alone' scanner that must  be
 accessed through a communications interface.  Reflecting  another similarity to the DEST, our study
 participant used Crosstalk to address the scanner.  The Model 4000 is a 'text-only' scanner with  no
 capability to process images.  All scanned text is saved in ASCII files.   This  configuration was
 characterized as 'old,' and since more direct connectivity is available with newer products, the Model
 4000 is not  recommended for individuals currently looking for a scanner.

 The success of this device in reading our test files is a testimony to Kurzweil's reputation as a leader
 in the scanning industry.  Even its 'old' technology  demonstrated  excellent character  recognition
 capabilities.  Although it did have trouble with a couple of pages, for the most part a very high reliability
 was demonstrated.  This product did an outstanding job with the 'hard to read' columns of numbers.

                                    Kurzweil Model 7320

 The Kurzweil Model 7320 with OCR software and coprocessor board was a $10,000 investment when
 it was purchased in  1987. A subsequent upgrade for the OCR software in April 1989 cost an additional
 $400.

The study participant who reported on this product  cited no problems installing or using any part of
this configuration. However, the document feeder has been  a chronic irritant after the first 25-50 hours
 of service.  It requires constant monitoring because of  a tendency to 'grab' several pages at a time.
 Another disliked feature  is  the 'complex, menu-driven user  interface that  can't be bypassed  or
streamlined  for simple production scanning of multi-page text documents unless the pages feed
 reliably.'

In a more positive light, the 7320 was  reported to have a very flexible font-recognition capability.  In
 addition, the capability of fine-tuning scanner and OCR settings from on-screen menus was seen  as
a significant advantage. Although the performance of  this  scanner was rated highly,  because of  its
troublesome document feeder and cumbersome user interface,  our evaluator did not recommend that
others  consider acquiring a similar device  for their office use.

This scanner turned in a top-notch character-recognition performance in processing our test documents.
It rates among the top of the group. Regardless of  font, text pages were reproduced  with few or no
errors.  Sometimes formatting was not totally maintained, but it wouldn't require a major effort to remedy
 the discrepancies.  Like the Kurzweil 4000 discussed above, this scanner did an excellent job on the
 columns of numbers that were troublesome to many of the other devices.

                                     Microtek MSF 300A



 The Microtek 300A is a flatbed scanner which,  according to reports in the literature,  is a first-class
 device.  However, the report from our evaluator didn't include  a recommendation that other users
 consider acquiring one. Although some hardware incompatibilities were encountered when the scanner
 was acquired, no significant operational problems were reported  with the device.  But our evaluator's
 experiences have not generated much enthusiasm for using it. Scanning performance was said to
 be 'fine,' but slow, and the scanner itself was rated 'okay.'

 This was a field-tested scanner, and we have no first-hand experience with either the device or the
 front-end software that was used during the testing. The image-scanning software is a product called
 Eyestar Plus; SmartStait was used for text.  Neither was rated satisfactory by our study participant.  The
 text-recognition software was said to 'work fine with simple text, but is not very flexible.' This sounds
 like what you would expect from a matrix-matching product; with fonts it 'knows' it does an acceptable
 job, but otherwise performance is limited.  The image-scanning product was summarized in this way:
 'works for scanning pictures as long as they are very sharp.'

 When our test pages were scanned on the 300A,  the results were for the most part unusable. Although
 some pages (not surprisingly the typewriter-like source materials) scanned better than others, even the
 best weren't  suitable for production work.  A good typist  could  re-enter the text in less time than it
 would take to edit the recognition errors out of the scanned files. In some cases, practically nothing
 of the source text was recognizable.

 The image file that was to have contained the picture of the factory only held the title line from the page
 on which the picture appeared on the original document.  We suspect a memory  or file-storage
 limitation caused this. However, when the software failed to produce a file from two of the text pages,
 our study participant scanned those pages as images.  This resulted in quite readable (but un-editable)
 images of the original text.

 Overall, our test results support the evaluator's less-than-enthusiastic  endorsement of the Microtek
 300A. Based on our experience to  date, however, we suspect the lackluster performance may be
 attributable more to the image- and text-processing software than to the scanner itself.

                                     Microtek MSF 300G

 This device was evaluated in the Macintosh environment using Microtek DA image scanning interface
 and OmniPage for text scanning.  The 300G is  a flatbed scanner requiring a SCSI terminator when
 connected to the Mac.  The fact that no terminator was supplied with the device was listed as a major
 shortcoming  by our evaluator.  Another  shortcoming is the insufficient memory on the Mac for
 OmniPage  to operate efficiently.  (Although this isn't  the scanner's fault, it is a  consideration when
 you're putting the device to practical use: a minimum  of 4MB is required.)

 Features noted as 'best liked' include ease of use, low maintenance, better-than-average results for
 scanned graphics, and ability of the flatbed design to  accommodate source documents with a variety
 of physical characteristics (e.g.  books, charts,  maps, etc.).  Our study participant  said he  would
 recommend this configuration, with appropriate cautions with respect to memory and SCSI terminator
 requirements.

-------
To overcome the problem of insufficient memory to process our test pages, the evaluator used a
technique recommended by OmniPage.   Text pages  were saved  as 300-dpi TIFF files (which,
interestingly, all were 1 megabyte in size), then the ICR software was executed against those disk files.
With this technique, the software 'reads' the text from disk, rather than having it passed directly from
the scanner. The resultant test files were saved in MS Word format, which we subsequently converted
to WordPerfect.
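
The uniform 1-megabyte file size is about what one would expect if each page was stored as an
uncompressed, one-bit-per-pixel image of a letter-size sheet. The short Python sketch below works
through that arithmetic. The page dimensions, bit depth, and absence of compression are assumptions
made for illustration; they were not recorded on the evaluation form.

```python
# Rough size of an uncompressed bilevel (1 bit per pixel) page image.
# Assumptions (not taken from the evaluator's notes): letter-size page,
# 1 bit per pixel, no compression, each row padded to a whole byte.

def bilevel_page_bytes(width_in: float, height_in: float, dpi: int) -> int:
    """Return the raw image size in bytes for a 1-bit scan."""
    width_px = round(width_in * dpi)
    height_px = round(height_in * dpi)
    bytes_per_row = (width_px + 7) // 8  # round each row up to a whole byte
    return bytes_per_row * height_px

if __name__ == "__main__":
    size = bilevel_page_bytes(8.5, 11, 300)
    print(f"{size:,} bytes, about {size / 2**20:.2f} MB")
```

Under these assumptions the size depends only on the page dimensions and resolution, not on what is
printed on the page, which would explain why every file came out the same size.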

This material clearly demonstrated the suspect nature of manufacturers' claims for text recognition
accuracy.  With an option turned on to record recognition accuracy during the scanning  process,
OmniPage reported 98-99.7% accuracy on several documents that were practically useless. As we
discussed earlier in this report (third paragraph  on page 6), these percentages represent the number
of characters the  software flagged as 'suspect,' but don't take into  account those it incorrectly
recognized.  Nevertheless, several pages had few errors, either real or imagined. The Prestige Elite text
and the Helvetica from a PC TAP Consumer Report page were particularly well done.
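
To see how a high reported figure can coexist with an unusable page, consider the following Python
sketch. It contrasts an accuracy figure based only on flagged characters with one that also counts
characters misread without any flag. The function names and all of the counts are hypothetical; they
are not measurements from our tests, and we do not know exactly how OmniPage computes its figure.

```python
# Contrast a "reported" accuracy based only on flagged characters with
# the accuracy actually achieved. All counts used here are hypothetical.

def reported_accuracy(total_chars: int, flagged: int) -> float:
    """Percentage of characters the software did NOT flag as suspect."""
    return 100.0 * (total_chars - flagged) / total_chars

def actual_accuracy(total_chars: int, flagged: int, silent_errors: int) -> float:
    """Percentage remaining after unflagged misreads are also counted.

    Flagged characters are treated as errors here, which is a
    simplifying assumption: some flagged characters may be correct.
    """
    return 100.0 * (total_chars - flagged - silent_errors) / total_chars

if __name__ == "__main__":
    total, flagged, silent = 2000, 10, 400  # hypothetical page
    print(f"reported: {reported_accuracy(total, flagged):.1f}%")
    print(f"actual:   {actual_accuracy(total, flagged, silent):.1f}%")
```

With these made-up counts the software would report 99.5% while only 79.5% of the characters are
right, which is roughly one error in every five characters.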

Summary

In conclusion, we'd like to add our own brief assessment of desktop scanning,  gleaned through our
experiences in this study.  It appears there are a number of viable scanners on the market, and from
what we've seen most of them do a reasonably good job at what they're designed for.   After all,
scanning technology has been around for a while; it just hasn't been in the desktop market until fairly
recently.  So you probably can find a low-end scanner that suits your needs for a list price in the
$2,000-$4,000 range, and you can expect to get a reliable piece of equipment.  However, the key to
the utility of that piece of equipment is in the software you obtain to process the text or images the
scanner can capture.

A number of good software products are available, each of which has its own capabilities and
limitations.  Many (but not all) scanners  are sold with bundled  image-processing  software, and
reasonably-priced products are available for those that aren't. With OCR products, though, the choices
are wider and more varied.  The better ones use intelligent character recognition techniques;  these
often come with a board that has software  and additional memory where the  ICR processing can be
sped up without a lot of I/O to your computer. They have the power to deliver accurate text recognition
at acceptable speeds, given your source documents are reasonably clear and sharp. These products
presently list in the $2,000-$4,000 range.  If your needs are more modest, there are some excellent
performers for under $1,000, but you  must be prepared to accept their limitations in terms of text
recognition and processing power.

This report has included a lot of descriptive text, and rather than concluding  with more narrative we
prepared a brief table. In deciding what to include in the table, we asked ourselves what a prospective
scanner buyer would be asking him- or herself.   These questions came  to mind:

        1. What type of scanner is it?
        2. Will it work with my computer?
        3. What is required to connect it to my computer?
        4. Does any software come with it?
        5. How much does it cost?

The table on the next page summarizes the answers to these five questions. If you want more details
about a particular scanner or software product,  refer back to the text in  the body of the report.

Happy scanning!



-------
                                     Desktop Scanners
                                    Summary of Features

Scanner             Type                Platform     Bundled Software  Available Interface     Price*

Apple               Flatbed             Macintosh    Image             SCSI                    $1,700
Chinon DS-3000      Portable, Overhead  PC           Image             1/2-board               $995
Datacopy Model 830  Flatbed             Mac, PC      Image             SCSI, 1/2-board         $2,900
DEST PC Scan        Sheetfed            Mac, PC      Text, Image       SCSI, 1/2-board         $2,250
DEST PC Scan Plus   Sheetfed            Mac, PC      Text, Image       SCSI, 1/2-board         $2,500
DEST Model 202      Sheetfed            Stand-alone  Text-only Device  Serial Port             $10,000
HP ScanJet Plus     Flatbed             Mac, PC      Text, Image       SCSI, Comm, Full board  $2,000
Kurzweil 4000       Flatbed             Stand-alone  Text-only Device  Comm Interface          Not Avail.
Kurzweil 7320       Flatbed             Mac, PC      None              SCSI, Full board        $4,995
Microtek MSF300A    Flatbed             Mac, PC      None              SCSI, 1/2-board         $3,000
Microtek MSF300G    Flatbed             Mac, PC      None              SCSI, 1/2-board         $3,495

*Figures are from available sources and may not reflect current prices.  They are included here
only as a guideline to aid in product comparisons.


-------
List of Study Contributors
    Earl Beam
    EPA National Enforcement Investigations Center
    Denver Federal Center
    Denver, CO 80225
    (303) 236-5122  (FTS) 776-5122

    Denise Cheatum
    EPA National Enforcement Investigations Center
    Denver Federal Center
    Denver, CO 80225
    (303) 236-5122  (FTS) 776-5122

    Angela Edwards
    Health Effects Research Laboratory
    EPA Environmental Research Center
    Research Triangle Park, NC  27711
    (919) 541-4911  (FTS) 629-4911

    Don Gorton
    Information Center Consultant
    EPA Region VIII
    999 18th Street
    Denver, CO 80202
    (303) 293-7546  (FTS) 330-7546

    Sophia Jeffries
    UNC Graduate Assistant/IC Consultant
    Information Centers Branch, MD-35
    EPA National Computer Center
    Research Triangle Park, NC  27711
    (919) 541-3661  (FTS) 629-3661

    David Levesque
    Information Center Consultant
    EPA Washington Information Center
    401 M Street SW
    Washington, DC 20460
    (202) 475-7413  (FTS) 475-7413

    Theresa Rhyne
    Information Center Consultant
    Information Centers Branch, MD-35
    EPA National Computer Center
    Research Triangle Park, NC  27711
    (919) 541-0207  (FTS) 629-0207

    Robert Root
    Information Center Consultant
    EPA Washington Information Center
    401 M Street SW
    Washington, DC 20460
    (202) 475-7413 (FTS) 475-7413

    Diana Smith
    Information Center Consultant
    EPA Region IV
    345 Courtland Street
    Atlanta, GA  30365
    (404) 347-0509 (FTS) 257-0509

    David Taylor
    PC TAP Coordinator
    Information Centers Branch, MD-35
    EPA Environmental Research Center
    Research Triangle  Park, NC  27711
    (919) 541-0568 (FTS) 629-0568

    Dr. Bellina Veronesi
    Health Effects Research Laboratory, MD-74B
    EPA Environmental Research Center
    Research Triangle  Park, NC  27711
    (919) 541-2795 (FTS) 629-2795

-------
How to Submit Items for Open Forum
In keeping with the PC Technology Assessment Program's objective to  have the user  community
actively involved in TAP projects, users are encouraged to submit items for inclusion in future PC TAP
Consumer Reports. If you have independently investigated the capabilities of a software product or a
hardware component, we would like to hear from you. We'd also like you to share  with others your
solutions to any problems you may have encountered with a particular application or device, and about
tricks, shortcuts, or unique applications you have devised. Although we can't promise to publish every
contribution, we  will evaluate them all  in terms of  their potential interest to our readers and their
conformance to the spirit and intent of PC TAP.

There are no additional rules for Open Forum contributions, but here are some guidelines:

                 1.  Contributions must be typed.  Our first preference is that they
                     be submitted on a floppy disk in WordPerfect format. If that
                     isn't possible, the next best method is to EMAIL the text to
                     DAVE.TAYLOR, EPA3099. The least preferable method, but still
                     acceptable, is to mail a typewritten article to TAP at the
                     address on the cover of  this  publication.

                 2.  The length of your contribution will be determined somewhat by
                     its complexity.  However, keep in  mind that we're primarily
                     interested in the purpose  of your study project and how pleased
                     you were with the results, not in the nitty-gritty details of
                     how you did it. We will publish your name, address, and phone
                     number for those who  want more details. Two to three pages
                     is probably a reasonable  maximum length. On the other hand,
                     a paragraph containing a nugget  that may be useful to others
                     would be equally welcome.

                 3.  All material submitted by users is subject to  our editing, and
                     you will not be given an  opportunity to review the final
                     manuscript before publication.  Sorry, you'll just have to
                     trust us.  If we have questions or  don't understand any part
                     of your text, we'll contact you for clarification.


   We hope you enjoy PC TAP Consumer Reports, and we look forward to hearing from individuals who
   have insights or discoveries to share with others. Thanks for your interest and your participation
   in the PC Technology Assessment Program.
