Cartographic
                              Methods
                     Alan Brenner
The U.S. Environmental Protection Agency
Office of Information Resources Management
National Geographic Information Systems Program

-------
      The ARC Macro Language and C programs discussed in this guideline are
available by anonymous ftp from sdcdg01.sdc.epa.gov and are in the files:
      /pub/readme.cart
      /pub/map_amls.tar.Z
      /pub/map_design.tar.Z
      /pub/map_post.Z
      /pub/color_post.tar.Z

      For users not on the Internet, the programs can be obtained on hardcopy,
3.5 inch disks or QIC 150 by contacting the National GIS Program at 703-235-5600,
or:
      401 M St. SW, MS 3405R
      Washington, DC 20460.
      AML, ARC/GRID, ARC/TIN and ARC Macro Language are trademarks
and ARC/INFO is a registered trademark of Environmental Systems Research
Institute.  AViiON is a trademark of Data General Corporation.  Sun and
SPARCstation are trademarks of Sun Microsystems, Inc. PostScript is a trademark
of Adobe Systems, Inc.  Use of these trademarks  does  not constitute an
endorsement by the United States Government.
      In no event shall the United States Government have any responsibility or
liability for any consequences of any use, misuse, inability to use, or reliance upon
the information contained herein, nor warrant or otherwise represent in any way
the accuracy, adequacy, efficacy, or applicability of the contents hereof.

-------
                          Table of Contents


Chapter One
The Utility of Maps and Graphics 1
 The Principles of Data 1
 Meta-Data and Uncertainty 4
   Uncertainties of the Physical World 6
   Uncertainties of the Computer World 7
   Uncertainties of the Human World 10
 The Idea of Visual Communication 11
 The Means of Visual Communication 13
 Methods for Visually Communicating Data and Meta-Data 17


Chapter Two
Producing Displays 24
 Classifying Space 24
   Projections 25
   Scale and Generalization 27
   Space and Time 27
 Classifying Data 28
   Manual Classification 28
   Natural Breakpoints 29
   Eyton's Equiprobability Ellipse Bivariate Classification 30
   Symbol Value Update 31
   Unclassed Maps 31
 Page Layout 32
   Titles and Type 34
   Insets and Legends 35
 ARC/INFO Hints 36


Chapter Three
Point Symbolization in ARC/INFO 38
 Monovariate Symbolization 38
   Nominal Data-Hue 38
   Nominal Data-Orientation 38
   Nominal Data-Shape 38
   Ordinal to Ratio Data—Orientation 40
   Ordinal to Ratio Data-Value 40
   Ordinal to Ratio Data—Size 40
   Ordinal to Ratio Data-Graduated Circle Size 42
 Bivariate, Monochrome Symbolization 42
   Two Nominal Data Sets—Shape and Orientation 42
   Nominal Data, and Ordinal Data—Shape and Value 44
   Nominal Data, and Ratio Data—Shape and Size 44
   Ratio Data, and Ordinal Data-Size and Value 44
 Bivariate, Color Symbolization 45
   Two Nominal Data Sets—Shape and Hue 45

-------
   Two Nominal Data Sets—Dual Hue Ranges 45
   Two Ordinal Data Sets—Complementary Colors 47
   Point Legend Creation 47
   Nominal Data, and Ordinal Data-Hue and Intensity 47
   Two Ratio Data Sets-Equiprobability Ellipse 49
   Ratio Data, and Nominal Data—Size and Hue 49
  Multivariate Symbolization 49
   Three Ordinal to Ratio Data Sets-Red, Green and Blue Symbolization 49
   Nominal Data, Ratio Data and Ordinal Data—Shape, Size and Value 51
   Ratio Data, Nominal Data, and Ordinal Data—Size, Hue and Intensity 51
   Ratio Data-Point Pie Graphs 51


Chapter Four
Line Symbolization in ARC/INFO  53
  Monovariate Symbolization 53
   Nominal Data—Hue 53
   Nominal Data—Shape or Texture 53
   Ordinal Data-Value 55
   Ratio Data-Size 55
  Bivariate, Monochrome Symbolization 56
   Two Nominal Data Sets—Shape and Texture 56
   Nominal Data, and Ordinal Data—Shape and Value 56
   Nominal Data, and Ratio Data—Shape and Size 58
   Ratio Data, and Ordinal Data—Size and Value 58
  Bivariate, Color Symbolization 58
   Two Nominal Data Sets—Texture or Shape, and Hue 58
   Two Nominal Data Sets-Dual Hue Ranges 60
   Two Ordinal Data Sets—Complementary Colors 60
   Bivariate Color Legends 60
   Two Ratio Data Sets-Eyton's Ellipse for Lines 62
   Ratio Data, and Nominal Data-Size and Hue 62
   Nominal Data and Ordinal Meta-Data—Hue and Intensity 62
  Multivariate Symbolization 62
   Three Ordinal to Ratio Data Sets—Red, Green and Blue Symbolization 62
   Nominal Data, Ratio Data, and Nominal Data-Shape, Size and Hue 64
   Nominal Data, Ratio Data, and Ordinal Data-Shape, Size and Value 64
   Ratio Data, Nominal Data, and Ordinal Data-Size, Hue and Intensity 64


Chapter Five
Choropleth Symbolization in  ARC/INFO 66
  Monovariate Symbolization 66
   Nominal Data-Hue 66
   Monovariate Legends—Filled Polygons 66
   Nominal Data—Orientation 68
   Nominal Data—Shape 68
   Ordinal Data-Value 68
   Ordinal to Ratio Data—Orientation 68
   Monovariate Legends—Orientation 70

-------
  Bivariate, Monochrome Symbolization 70
   Two Nominal Data Sets-Texture and Orientation 70
   Nominal Data, and Ordinal Data—Texture Distinguished by Value 72
   Nominal Data-1 and 2—Texture as Intersecting Lines 72
   Two Ordinal to Ratio Data Sets-Texture as Intersecting Lines 72
   Bivariate Legends—Unclassed Texture 74
  Bivariate, Color Symbolization 74
   Two Nominal Data Sets—Texture Distinguished by Hue 74
   Bivariate Legends—Lookup Table Based Displays 74
   Two Nominal Data Sets—Dual Hue Ranges 76
   Nominal Data, and Ordinal Data—Hue and Intensity 76
   Two Ordinal Data Sets—Complementary Colors 76
   Two Ratio Data Sets-Eyton's Ellipse 78
   Bivariate Legends—Eyton's Ellipse 78
  Multivariate Symbolization 80
   Two Nominal or Ordinal Data Sets, Ordinal Data Sets-Texture with Value 80
   Three Ordinal to Ratio Data Sets-Red, Green and Blue Symbolization 80
   Multivariate Legends—RGB Space 80


Chapter Six
Graduated Symbol Symbolization  in  ARC/INFO 83
  Monovariate Maps 83
   Ordinal to Ratio Data—Graduated Circles 83
   Ordinal to Ratio Data—Cartograms 83
  Monochrome, Bivariate Symbolization 83
   Ordinal to Ratio Data, and Ordinal Data-Graduated Circles and Value 85
   Ordinal to Ratio Data, and Ordinal Data-Cartograms Shaded by Value 85
  Color, Bivariate Symbolization 85
   Ordinal to Ratio Data, and Nominal Data-Graduated Circles and Hue 85
   Graduated Circle Legends 85
   Ordinal to Ratio Data, and Nominal Data—Cartograms with Hue 87
   Cartogram Legends 87
  Multivariate Symbolization 87
   Ratio Data, Nominal Data, and Ordinal Data—Graduated Circles, Hue and Intensity 87
   Ratio Data, Nominal Data, and Ordinal Data—Cartograms, Hue and Intensity 89
   Ratio Data-Polygon Pie Graphs 89
   Graduated Pie Legends 89


Chapter Seven
Grid-Cell Symbolization in ARC/INFO 90
   Conversion of Polygons to Grids 90
   Lookup Tables for Ratio Grids 90
  Monovariate Symbolization 90
   Nominal Data-Hue 92
   Ordinal to Ratio Data-Value 92
  Bivariate and Multivariate Symbolization 92
   Nominal Data, and Ordinal Data-Hue and Intensity 92

-------
   Three Ordinal to Ratio Data Sets-Red, Green and Blue Symbolization 92


Chapter Eight
Dot Density Symbolization in ARC/INFO 93
   Conversion of Polygons to Dot Density 93
  Monovariate Symbolization 93
   Ordinal to Ratio Data-Texture 93
   Monovariate Dot Density Legends 93
  Monochrome, Bivariate Symbolization 95
   Ratio Data, and Ordinal Data-Texture and Value 95
   Bivariate Dot Density Legends 95
  Color, Bivariate, and Multivariate Symbolization 95
   Ratio Data, and Nominal Data—Texture and Hue 97
   Ratio Data, Nominal Data and Ordinal Data—Texture, Hue and Intensity 97
   Three Ordinal to Ratio Data Sets—Red, Green and Blue Symbolization 97


Chapter Nine
Isopleth and Fishnet Symbolization in ARC/INFO 98
   Polygon to Isoline Conversion 98
   Polygon to Surface Conversion 98
  Monovariate Symbolization 98
   Ratio Data—Isoline Location 98
   Ratio Data-Fishnet Height 100
   Isoline and Fishnet Legends 100
  Bivariate Symbolization 100
   Ratio Data and Ordinal Data-Surface Shaded with Value 100
   Ratio Data and Nominal Data—Surface Shaded with Hue 102
  Multivariate Symbolization 102
   Ratio Data, Nominal Data and Ordinal Data-Surface with Hue and Intensity 102
   Four Ratio Data Sets—Surface Shaded with Red, Green and Blue 103


Chapter Ten
Thematic Mapping in  ARC/INFO 104
  Summary of Presented Methods 104
  Areas of Possible Continued Research 104
  Acknowledgments 104


References  105

-------
                    Table of Figures
Chapter One
  Figure 1.1            2
  Table 1.1             3
  Figure 1.2            6
  Figure 1.3            7
  Figure 1.4            8
  Figure 1.5           10
  Figure 1.6           11
  Figure 1.7           14
  Figure 1.8           15
  Figure 1.9           18
  Figure 1.10          20

Chapter Two
  Figure 2.1           24
  Figure 2.2           25
  Figure 2.3           25
  Figure 2.4           26
  Figure 2.5           26
  Figure 2.6           27
  Figure 2.7           28
  Figure 2.8           32
  Figure 2.9           33
  Figure 2.10          34
  Figure 2.11          35

Chapter Three
  Figure 3.1           39
  Figure 3.2           41
  Figure 3.3           43
  Figure 3.4           46
  Figure 3.5           48
  Figure 3.6           50

Chapter Four
  Figure 4.1           54
  Figure 4.2           57
  Figure 4.3           59
  Figure 4.4           61
  Figure 4.5           63

Chapter Five
  Figure 5.1           67
  Figure 5.2           69
  Figure 5.3           71
  Figure 5.4           73
  Figure 5.5           75
  Figure 5.6           77
  Figure 5.7           79
  Figure 5.8           81

Chapter Six
  Figure 6.1           84
  Figure 6.2           86
  Figure 6.3           88

Chapter Seven
  Figure 7.1           91

Chapter Eight
  Figure 8.1           94
  Figure 8.2           96

Chapter Nine
  Figure 9.1           99
  Figure 9.2          101

-------
                             Chapter One
                 The Utility of Maps and Graphics

      Maps  have a  wide variety of uses,  ranging from  the recording of
information (such as cadastral maps) to aiding navigation (such as road maps and
naval charts). Two of the main uses of maps (and graphics) in environmental
analysis, particularly  as assisted by  geographic  information systems, are the
analysis of data and the presentation of data. Neither of these uses precludes the
other, and both can be aided by good graphic design techniques. Graphic design
requires an understanding of the data (and the information represented by the
data), an understanding of the methods of visual communication, and the ability
to make use of the available means of communication. Although each of these
components is described in other places (such as statistics books for data analysis,
cartographic texts for visualization of spatial data, and software manuals for the
actual production  of maps), this guideline integrates  these three areas in the
context of the Environmental Systems Research Institute's geographic information
system, ARC/INFO revision 6, as the available means of communication.
      As such, chapter one contains a brief discussion of data and meta-data
issues, and visualization and communication theory; chapter two consists of a
discussion of issues involved in the design of maps, such as projections and data
classification, as accomplished in ARC/INFO; and chapters three through nine
present ARC Macro Language programs for specific symbolization techniques for
point, line and area data. Chapter ten provides a brief conclusion.

The Principles of Data
      For geographic phenomena, there are two  types of data that comprise
information about a phenomenon: attribute data (the measured characteristic of a
location), and location data (the measured location of a  characteristic).  This
pairing of information types is reflected in the two major approaches of geographic
information systems: vector systems, which emphasize the attribute; and raster
systems, which emphasize the location. ARC/INFO now combines both of these,
allowing a wide variety of analysis and mapping, but the appropriate use of this
information is still dependent on the analyst/cartographer.
      Attribute data can be grouped into several categories (empirical levels)
that influence the design requirements for representing information.  Each
empirical level contains all of the information of lower levels (and thus can be
simplified) while adding additional information (see Figure 1.1 on page 2 for a
comparison of attribute data levels with spatial characteristics and 'visual
variables' for area data). The lowest empirical level is nominal data; this data only
indicates that something is different from something else. Names of states are an
example of nominal categorization. The next level of empirical data is ordinal
data; this data indicates that something is more (or less) than something else, but
no quantity can be given to that difference. The terms high, medium and low reflect
ordinal differences. The third level of empirical data is interval  data; a linear
measurement of distance can be used to gauge the differences between instances.
The highest level of empirical data is ratio data; this data indicates that something
is more (or less) than something else (and thus different), that a linear distance can

-------
[Figure 1.1: diagram arranging map types by data level (nominal, ordinal,
interval/ratio), by the character of intra-regional change (discrete to
continuous) and by the character of inter-regional change (abrupt to smooth).
The map types named are graduated symbol, classed graduated symbol, circle/dot
combination, dot density, unit-vector density, dasymetric, unclassed
choropleth, classed choropleth, isoline (fishnet) and stepped surface. A key
rates the visual variables location, value, intensity, orientation, shape,
size, hue, texture, arrangement and focus for visual order, visual isolation
(differences) and visual levels (standing out), and for the representation of
data variables (good/primary, good as an additional variable in multivariate
maps, fair, fair as an additional variable in multivariate maps, poor).]
Figure 1.1   Spatial data models for area data and their cartographic representations;
developed from MacEachren and DiBiase (1991) and DiBiase, Krygier,
Reeves, MacEachren and Brenner (1991).  In a thematic map, the information to be
displayed can be categorized by its empirical level (nominal to ratio), how the data
changes within any region it is aggregated into, and how the data changes between
adjacent regions. Once this categorization is done, visual variables can be selected
on the basis of the ability to represent the information so categorized.

-------
be used to gauge differences and that ratio comparisons (such as: this is twice that)
can also be done. The essential difference between ratio and interval data is that
for ratio data, a non-arbitrary  zero point is intrinsic in the measurement; this
difference  is  small enough that, for visualization,  the methods  used  for
representing both types of data are the same.  An example of the difference
between interval and ratio measurement is the difference between temperature
measured in degrees Celsius and in kelvins; the Celsius scale has an
arbitrary zero (the freezing point of water) but the Kelvin scale's zero is not
arbitrary (absolute zero, the point at which the particle motion that is
'temperature' stops).
      Because of these differences in attribute data, care  must be taken when
comparisons between different levels of data are made (see Table 1.1 for the types
of comparisons that can be performed on data of given levels).  For example,
ordinal data may be stored as integers that represent the order, but the values of
these integers do not indicate any measurement of the variability of the data. This
lack of measurement does not, however, preclude a software system from acting
on the data as if it were ratio data and thereby calculating essentially meaningless
statistics (or spatial patterns). If comparisons of data of different levels must be
done, either the higher level  of  data must  be  reduced to  the  lower, or a
transformation (an addition of information) must be done to raise the lower level
to the higher. For nominal or ordinal data, the procedures of psychometrics can be
used to recast ordinal information into interval or  ratio data.  Essentially  this
involves assigning utility values (such as money) to data levels that do not
normally have this type of information associated with them (such as aesthetic values).
This can be accomplished by conducting a survey to get individual assignments of
value, and then using the collected data to assign overall values to the data.

                              Nominal    Ordinal    Interval   Ratio

    Rank Comparison           Invalid    Valid      Valid      Valid
    Addition, Subtraction     Invalid    Invalid    Valid      Valid
    Multiplication, Division  Invalid    Invalid    Invalid    Valid
    Statistics:
      Parametric              Invalid    Invalid    Valid      Valid
      Nonparametric           Invalid    Valid      Valid      Valid

Table 1.1    Valid data comparisons. Because of the varying degree of numerical
precision associated with different data levels (nominal through ratio) only certain
operations can be applied to  comparisons between  two data sets; comparisons
between data of two different levels must occur at the lower of the two classes
(ESRI Grid Class Notes 1992, 11-22).
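
The rules in Table 1.1 can be sketched as a small lookup, which may help when checking whether an operation is meaningful before it is run on stored attribute values. This is only an illustration in Python, not part of the ARC/INFO programs discussed in this guideline; the names LEVELS, MIN_LEVEL and is_valid are hypothetical.

```python
# Empirical levels from lowest to highest, in the order used in the text.
LEVELS = ["nominal", "ordinal", "interval", "ratio"]

# Lowest level at which each operation is valid, read from Table 1.1.
MIN_LEVEL = {
    "rank comparison": "ordinal",
    "addition, subtraction": "interval",
    "multiplication, division": "ratio",
    "parametric statistics": "interval",
    "nonparametric statistics": "ordinal",
}


def is_valid(operation, level_a, level_b=None):
    """Return True if the operation is valid for data at the given level(s).

    When two levels are given, the comparison is made at the lower of the two.
    """
    if level_b is None:
        level_b = level_a
    working_level = min(LEVELS.index(level_a), LEVELS.index(level_b))
    return working_level >= LEVELS.index(MIN_LEVEL[operation])


# Ordinal codes stored as integers can be ranked, but not divided.
print(is_valid("rank comparison", "ordinal"))
print(is_valid("multiplication, division", "ordinal"))
# Mixing ratio and interval data limits the work to interval operations.
print(is_valid("multiplication, division", "ratio", "interval"))
```

A check of this kind does not stop software from computing statistics on ordinal codes; it only makes the limits in the table explicit.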

      Although all information is  classified  due to  the nature of measurement
uncertainty in the recording of data (MacEachren 1992, 48), attribute data can be
further classed into groups for both presentation and analysis (see Classifying
Data  on page 28  for a  discussion  of  data classification in  ARC/INFO).
Classification involves the creation of range categories that individual instances fit
into and thus take on that value range. This results in a loss of information and can

-------
be a reduction in the empirical level of a measurement (interval/ratio data is
reduced to ordinal, for example).  This  loss of precision  is offset  by the
simplification of the presentation of information.   These classifications are
accomplished by representing ranges of data as categories on the display medium,
through the use of symbolization that is appropriate to the level of measurement
(see Figure 1.1 on page 2).
      Spatial data can be grouped into four categories.  The  first  type of data
represents point specific information. The second type of data represents linear
information. The third type of data represents area information. The fourth type
of data represents volume information.  Each of these categories is scale specific;
an area feature such as a stream may require mapping as a linear feature if the total
area under consideration is large enough that displaying the stream as having
both length and breadth becomes too tedious or difficult to represent in the
available media, for the additional information retained. Volumetric information
is also dependent on the means of display: the appearance of three dimensions can
only be approximated on a two-dimensional surface.  Although the visual
variables that best represent different levels of information do not change for each
of the types of spatial data, the ARC/INFO methods for accomplishing those
representations change.
      For environmental data, Mark Monmonier and Branden Johnson (1990, 5-7)
have characterized spatial data into: single location; single location  and affected
area; and multiple  locations and the pattern of distribution.  Single  location
answers not only 'what' but 'where'; this can be applied to all of the basic types of
spatial data and allows the map user to relate environmental information to his or
her own experience  (this  is what Edward Tufte (1990) calls micro/macro
readings).  Building on single location, single location and affected area adds
information  concerning how  a 'what' influences its surroundings (because
'influence' may be more subject to interpretation than a measurement, presentation
of meta-data can become very important  in the presentation of data).  Finally,
multiple locations and  the  pattern of  distribution integrates more than one
instance of single location and affected area into one map.
      Area phenomena can also be categorized on the basis of their spatial grouping
and how they change over space (MacEachren and DiBiase 1991) (see Figure 1.1).
Grouping ranges from continuous (no grouping) to discrete (complete grouping).
This reflects the degree of spatial autocorrelation within areas. Changes over space
can be smooth to abrupt.  This reflects the degree of spatial autocorrelation
between areas. These characteristics call for symbolization choices that
accurately reflect the nature of the data.  The possible symbolization
choices include, but are not limited to: graduated symbols for abrupt, discrete data;
dot density for smooth, discrete data; isopleth (or the '3-D' equivalent fishnet) for
smooth, continuous data; and choropleth for abrupt, continuous data. This
contradicts the all-too-common practice of making choropleth maps for all
types of area data.
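
The four pairings listed above can be written as a lookup table. The following Python sketch is an illustration only: the text notes that the choices are not limited to these four, and the names SYMBOLIZATION and suggest_symbolization are hypothetical.

```python
# (change between areas, grouping within areas) -> suggested symbolization,
# taken from the four pairings named in the text.
SYMBOLIZATION = {
    ("abrupt", "discrete"): "graduated symbol",
    ("smooth", "discrete"): "dot density",
    ("smooth", "continuous"): "isopleth (or fishnet)",
    ("abrupt", "continuous"): "choropleth",
}


def suggest_symbolization(change_between_areas, grouping_within_areas):
    """Return the symbolization the text pairs with this kind of area data."""
    key = (change_between_areas, grouping_within_areas)
    if key not in SYMBOLIZATION:
        raise ValueError("expected abrupt/smooth and discrete/continuous, got %r" % (key,))
    return SYMBOLIZATION[key]


# Choropleth is suggested for only one of the four combinations.
for key in sorted(SYMBOLIZATION):
    print(key, "->", suggest_symbolization(*key))
```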

Meta-Data and Uncertainty
      The uncertainty of information is becoming an important topic with the
increased use of computers for data processing, presentation and analysis. This is
being addressed with position statements  such as the Environmental Protection

-------
Agency's  Locational Data Policy (1991) and the National Center for Geographic
Information and Analysis' Visualization of Data Quality initiative (MacEachren
1992, 47).  Yet, these cannot eliminate uncertainty, which exists at  the most basic
levels of measurement according to Heisenberg's Uncertainty Principle (Capra
1983).  Because of this, David Rejeski (1991) suggests that uncertainties should be
addressed openly  in order  to ensure that decisions have  both utility and
believability. He, as well as Granger Morgan and Max Henrion, recognizes that
uncertainty can be valuable information. Morgan and Henrion (1990, 3) give three
specific reasons for the inclusion of uncertainty in policy-oriented research:
     1. A central purpose of policy research and policy analysis is to help
          identify important factors and the sources of disagreement in a
          problem,  and to  help anticipate the  unexpected.  An explicit
          treatment of uncertainty forces us to think more carefully about
          such matters, helps us identify which factors are most and least
          important, and helps us plan for contingencies or hedge our bets.
     2. Increasingly we must rely on experts when we make decisions. It is
          often hard to be sure we understand exactly what they are telling
          us.  It is harder still to know what to  do when different experts
          appear to be  telling us different things.  If we insist they tell us
          about the uncertainty of their judgments, we will be clearer about
          how much  they think they know  and  whether  they really
          disagree.
     3. Rarely is any problem solved once and for all. Problems have a way
          of resurfacing.  The details may change but the basic problems
          keep coming back again and again. Sometimes we would like to
          be able to use, or adapt, policy analyses that have been done in the
          past to help with the problems of the moment. This is much easier
          to do when the uncertainties of the past work have been  carefully
          described, because then we can have greater confidence that we
          are using the earlier work in an appropriate way.
      Uncertainty has dictionary definitions such as, "uncertain in respect of
duration, continuance, occurrence, etc.," "liability  to chance," "indeterminate as to
magnitude or value" (Simpson and Weiner, Oxford English Dictionary, 1989, 899).1
A more useful interpretation for use in environmental risk analysis would be that
uncertainty is the information contained in the data about data (that is, meta-data).
By defining uncertainty this way, meta-data can be used as  another piece of
information in the analysis and presentation of data, including risk-based policy
making.
      Uncertainty has a taxonomy that should be useful in delimiting the origins
of uncertainty and the reliability of data at any  given point in an analysis (see
Figure 1.2 on page 6). Although Morgan and Henrion (1990) discuss "The Nature
and Sources of Uncertainty" in detail as chapter 4 of Uncertainty: A Guide to Dealing with
Uncertainty in Quantitative Risk and Policy Analysis, this presentation is a
distillation based on several sources, including Morgan and Henrion and
other sources, as noted. The taxonomy can be broken down into three groups:
uncertainties of the physical world;  uncertainties of the computer world; and
   1 Uncertainty begins as the vagueness of duration, etc., with Wyclif in 1382. By 1982, Oxford
   (1989, 900) reports: "What the uncertainty principle asserts is that for no state of any system
   can all dynamical variables be arbitrarily well-determined."

-------
uncertainties of the human world.  Problems of the measurement of natural
phenomena constitute the first category; Alan MacEachren (1992, 48) reports that
the National Center for Geographic Information and Analysis calls this "data
quality".   The uncertainty of the physical world  can  be  further split into
measurement uncertainty and parameter uncertainty.
Uncertainty
  Physical World
    Measurement
      Location
      Attribute
    Parameter
      Aggregation
      Generalization
      Time
      Consistency
  Computer World
    Descriptive
      Numeric
      Spatial Delineation
    Computational
      Rounding
      Significant Digit Shift
      Over/Underflow
    Propagational
      Compounded Computational
      Polygon Overlay
    Modeling
      Robustness
      Validity
  Human World
    Sending Meanings
      Open Presentation of Data and Meta-Data
    Receiving Meaning
      Ignoring Meta-Data
      Misunderstanding Meta-Data

Figure 1.2   A taxonomy of uncertainty.


Uncertainties of the Physical World
      Measurement uncertainty for geographic data includes both location and
attribute uncertainties.  Location uncertainties (Rejeski and Kapuscinski 1990, 10)
can be considered the accuracy (closeness to a 'true' value) of the instruments and
the reliability, or precision, (repeatability of a measurement) of the methods used
to calculate a site's position (see Figure 1.3 on page 7). For example, the location of a
phenomenon can be determined by use of many methods, such as a professional
surveyor's analysis, use of a global positioning system, or terrain analysis
estimation.  Each of these methods is of varying accuracy and precision, and the
meta-data that should be recorded includes the manner in which a site location
was first defined, the method used to derive its location, the time it was derived,
and an estimate of its accuracy.
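
The distinction between accuracy and reliability can be made concrete with repeated position fixes for a site whose 'true' location is known. The Python sketch below is an illustration under stated assumptions: the coordinates are invented, accuracy is summarized as the distance from the mean fix to the true position, and reliability as the root-mean-square spread of the fixes about their own mean. Neither summary is prescribed by the sources cited above, and the function name is hypothetical.

```python
import math
import statistics


def accuracy_and_reliability(fixes, true_position):
    """Summarize repeated (x, y) fixes of one site.

    Returns (offset, spread): offset is the distance from the mean fix to the
    true position (small offset = accurate); spread is the RMS distance of the
    fixes from their own mean (small spread = reliable).
    """
    mean_x = statistics.fmean(x for x, _ in fixes)
    mean_y = statistics.fmean(y for _, y in fixes)
    offset = math.hypot(mean_x - true_position[0], mean_y - true_position[1])
    spread = math.sqrt(
        statistics.fmean((x - mean_x) ** 2 + (y - mean_y) ** 2 for x, y in fixes)
    )
    return offset, spread


true_site = (0.0, 0.0)  # invented 'true' location, arbitrary map units

# Tightly clustered but displaced fixes: reliable, but not accurate.
clustered = [(10.1, 10.0), (9.9, 10.0), (10.0, 10.1), (10.0, 9.9)]
# Widely scattered fixes centered on the site: accurate, but not reliable.
scattered = [(-5.0, 0.0), (5.0, 0.0), (0.0, -5.0), (0.0, 5.0)]

print(accuracy_and_reliability(clustered, true_site))
print(accuracy_and_reliability(scattered, true_site))
```

Both numbers, together with the method and time of measurement, are the kind of meta-data the paragraph above recommends recording.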
      Attribute uncertainty can be considered the accuracy and reliability of the
instruments used to take a measurement  of the environment. For example, an
instrument may be set up to measure atmospheric concentrations of carbon
monoxide, measuring levels in parts per million with an accuracy of plus or
minus one part per million.  The data in this situation is the parts-per-million
measurement and the meta-data includes the plus or minus one part per million
accuracy of the instrument.
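
One way to keep the meta-data from being separated from the data is to store the two together. The Python sketch below assumes a simple record type; the class name Measurement, its fields, and the sample values are hypothetical and are not drawn from any EPA data standard.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Measurement:
    """A reading (the data) stored with its instrument accuracy (the meta-data)."""

    value: float     # measured value
    accuracy: float  # plus-or-minus accuracy of the instrument, same units
    units: str

    def interval(self):
        """Range of values consistent with the stated accuracy."""
        return (self.value - self.accuracy, self.value + self.accuracy)

    def could_reach(self, level):
        """True if the stated accuracy allows the real value to be at or above level."""
        return self.value + self.accuracy >= level


# Hypothetical carbon monoxide reading of 9 ppm, plus or minus 1 ppm.
reading = Measurement(value=9.0, accuracy=1.0, units="ppm")
print(reading.interval())
# 9.5 ppm is an arbitrary level chosen for illustration, not a standard.
print(reading.could_reach(9.5))
```

Carrying the accuracy with the value lets later analysis report a range rather than a single number.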
      Parameter uncertainty (Rejeski and Kapuscinski 1990, 11; MacEachren 1992,
47-8) involves the problem of the aggregation and generalization of point samples

-------
to areas for spatial data, or trend lines for linear data; the temporal
variability between data items and between measurement times and data usage;
and the logical consistency and completeness of data.  This is the question of
whether the measurements that are recorded and then used for analysis are
adequate measurements of what was intended to be measured; it entails the
consequences of the assumption of autocorrelation. Before a model is
constructed to provide an explanation or a projection of environmental
phenomena, it must be recognized that there are few (if any) phenomena that
can be precisely defined and measured for all possible occurrences. Because
of this, interpolation and extrapolation, whether linear (as in a
measurement), spatial (as in an area generalization), or temporal (as in data
from one time as estimates for some other time), must be done even though the
process introduces uncertainty into the analysis.

[Figure 1.3: diagram with the labels "True Value", "Reliable, but not
Accurate" and "Accurate, but not reliable".]
Figure 1.3   An example of the difference between accuracy and reliability in
spatial location measurements.

      There  have  been suggestions made  for  reducing  and  representing
parameter uncertainty.  For generalizing spatial data, Cort, Rowe and Philpot
(1985; in MacEachren 1992, 44) suggest that interpolation in spherical, rather than
planar, coordinates will introduce less uncertainty in the creation of area
information from point samples (particularly for large areas).  MacEachren and
Davidson (1987; in MacEachren 1992, 46) demonstrate that increasing sampling
frequency will also reduce (but not eliminate) the uncertainty of interpolation; this
should be true for linear and temporal data as well.  For representing
parameter uncertainty, Rejeski and Kapuscinski (1990, 10) suggest the use of
transitional buffer zones to represent "fuzzy" boundaries, rather than one line
demarcating "hard edges" (see Figure 1.4 on page 8).
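
The effect of sampling frequency on interpolation uncertainty can be illustrated in one dimension. The Python sketch below is not the method of MacEachren and Davidson; it assumes a known smooth 'surface' (a sine curve), samples it at evenly spaced points, interpolates linearly between samples, and reports the largest difference between the interpolated and the real values. The function name and all parameters are hypothetical.

```python
import math


def max_interpolation_error(n_samples, f=math.sin, start=0.0, stop=math.pi,
                            n_check=1000):
    """Largest error of linear interpolation between n_samples even samples of f."""
    step = (stop - start) / (n_samples - 1)
    xs = [start + step * i for i in range(n_samples)]
    ys = [f(x) for x in xs]
    worst = 0.0
    for k in range(n_check + 1):
        x = start + (stop - start) * k / n_check
        # Index of the sample interval containing x (last interval at the end).
        i = min(int((x - start) / step), n_samples - 2)
        t = (x - xs[i]) / (xs[i + 1] - xs[i])
        estimate = ys[i] + t * (ys[i + 1] - ys[i])
        worst = max(worst, abs(estimate - f(x)))
    return worst


# More samples reduce the error, but it does not reach zero.
for n in (3, 5, 9, 17):
    print(n, max_interpolation_error(n))
```

In this sketch the error shrinks as samples are added but stays above zero, which matches the statement that denser sampling reduces, but does not eliminate, the uncertainty.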

Uncertainties of the Computer World
      Problems of the use of computers for storing  and manipulating  data
constitute  the second category of uncertainty.  This can be  subdivided into four
groups: descriptive  uncertainty,  computational  uncertainty,  propagational
uncertainty  and  modeling uncertainty.   Thoughtful use of  programs  and
programming techniques can reduce the amount of uncertainty introduced to data
by computer manipulation.  Herman Knoble  (1990,  2)  states that correct and
accurate computer programs must be an ethical responsibility when dealing with
numerical algorithms, all of which, "at the bottom line...affect people."

Figure 1.4  An example of a change from a hard to a 'fuzzy' boundary.  [Figure
labels: Mean Sea Level; Mean Spring High Tide; Mean Neap High Tide; Mean Neap
Low Tide; Mean Spring Low Tide.]

      Descriptive uncertainty deals with the representation of data in computers,
including both numeric and spatial problems.  Numeric uncertainty arises from
the method computers use to store data. For example, the binary number system
does not have an exact representation of some common decimal numbers, such as
1/100.  Another type of numeric uncertainty arising from number storage is the
shifting of significant digits.  If a number is input into a computer that has the
ability to store a longer number than is input, the computer will 'pad' the extra
space with zeros.  These digits will be available for computation in numeric
modeling even though they add no information and are meaningless.  Significant
digit shift can occur in the other direction as well. If a number is input into a
computer that stores fewer digits than are input, the computer will round or
truncate the number to fit its numeric scheme. These problems can be controlled,
but not eliminated, by specifically programming the computer for the required
operations (Knoble 1990), but for generally available software this is not possible.
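      The numeric storage problems described above can be reproduced in a few
lines.  The following is a minimal sketch in Python (not one of the programs
distributed with this guideline); the particular values 12.5 and 2.675 are
arbitrary illustrations.

```python
from decimal import Decimal, ROUND_DOWN, ROUND_HALF_UP
from fractions import Fraction

# 1/100 has no exact binary representation: converting the stored binary
# value back to decimal exposes the digits that were actually kept.
stored = Decimal(0.01)
print(stored)
print(Fraction(0.01) == Fraction(1, 100))   # False

# Padding: printing 12.5 with six decimals adds zeros that carry no information.
padded = f"{12.5:.6f}"

# Fitting a value into fewer digits: the result depends on whether the
# system truncates or rounds.
value = Decimal("2.675")
truncated = value.quantize(Decimal("0.01"), rounding=ROUND_DOWN)
rounded = value.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)
print(padded, truncated, rounded)
```

      Using an exact decimal type, as in the last three lines, is one way of
"specifically programming the computer for the required operations"; it is
shown here only as an assumption about how that control might look in practice.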
      Spatially, data are generally stored either as a table of vectors that define
sharp boundaries between regions, or in regular tessellations (square, 'raster'
grids) that force a predefined spatial pattern on an area (Rejeski and Kapuscinski
1990, 10).  Both of these methods introduce uncertainty: the vector representation
forces boundary lines where transition zones may be, and tessellations assume the
entire grid cell is homogeneous.  These problems can be reduced, but again not
eliminated, by using smaller polygons or grid cells.  This forces a trade-off
between decreased uncertainty on one side and larger data files and longer
computation times on the other, which may not be pragmatically feasible.
      Computational uncertainty deals with the problems of numeric modeling
by using computers (Knoble 1990; Rejeski and Kapuscinski 1990, 13).  Once data
are stored in a digital format, any further processing can introduce uncertainty.
Type shifting (such as from an integer format to a real number format, or from
reals to integers) can introduce uncertainty by forcing rounding to occur. More
commonly, rounding occurs within real numbers when numbers that are not
similar in value are combined arithmetically. Knoble (1990, 4-5) gives an example
of the results of this type of rounding: a VS FORTRAN 2.4 program, run on an IBM
3090 Model 600, that for the formula:

         P=((A+X)**2-A**2-2.*A*X)/X**2

generates an answer of -4999.99609 when A is equal to 1000.0 and X is equal to 0.01,
despite the fact that the formula simplifies to P equalling one for all values of X
not equal to zero.
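      The behaviour of this formula can be explored without a FORTRAN compiler.
The sketch below is an illustration, not Knoble's program: it assumes IEEE-754
single precision (emulated by the hypothetical helper f32, which rounds after
every operation), so it should not be expected to reproduce the IBM 3090
figures digit for digit.  It compares the single-precision result with a
double-precision result and with exact rational arithmetic.

```python
import struct
from fractions import Fraction

def f32(v):
    """Round a Python float to the nearest IEEE-754 single-precision value."""
    return struct.unpack("f", struct.pack("f", v))[0]

def p_double(a, x):
    """The formula evaluated in ordinary double precision."""
    return ((a + x) ** 2 - a ** 2 - 2.0 * a * x) / x ** 2

def p_single(a, x):
    """The same formula with every intermediate result rounded to single."""
    a, x = f32(a), f32(x)
    s = f32(a + x)
    num = f32(f32(f32(s * s) - f32(a * a)) - f32(f32(2.0 * a) * x))
    return f32(num / f32(x * x))

def p_exact(a, x):
    """The formula in exact rational arithmetic (a and x given as strings)."""
    a, x = Fraction(a), Fraction(x)
    return ((a + x) ** 2 - a ** 2 - 2 * a * x) / x ** 2

print(p_single(1000.0, 0.01), p_double(1000.0, 0.01), p_exact("1000.0", "0.01"))
```

      Only the exact version returns one; the double-precision result is close
to one, and the single-precision result is much further from it.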
      Significant digit shift can also occur in arithmetic operations, particularly in
operations involving subtraction of similar values. Subtracting 1.23456787 * 10^7
from 1.23456789 * 10^7 yields 0.2, but a computer can store this value as
2.00000000 * 10^-1, and as with data input, will allow computation on all of the
digits to the right of the 2, which is the only meaningful digit in the number.
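      The subtraction above can be tried directly.  This is a minimal sketch in
double precision; the exact trailing digits printed depend on the number format,
so none are quoted here.

```python
a = 1.23456789e7
b = 1.23456787e7
diff = a - b

# The stored difference carries many digits, but only the leading one is
# supported by the nine-digit inputs.
print(f"{diff:.9e}")

# Reporting the result to one significant digit keeps only the meaningful part.
meaningful = float(f"{diff:.1g}")
print(meaningful)
```

      The first line printed is close to, but not exactly, 0.2; the extra digits
are artifacts of storage rather than information about the inputs.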
      Two additional, similar problems that may occur are overflow and
underflow. These result when a number is incremented or decremented beyond
the storage type's ability to represent numbers.  Depending on the operating
system, this may cause an error, or it may be ignored, with the value either
remaining the same or changing drastically.  Borland's Turbo C++ 2.0, running
under MS-DOS 5.0 on an IBM AT, compiles programs that allow adding one to the
thirty-two bit 'unsigned long' integer 4294967295 (which is equal to 2^32 - 1, and
is the largest integer that can be represented in thirty-two bits), changing the
value of the variable to 0. Turbo C++ compiled programs will give an overflow
error when thirty-two bit real numbers (type 'float') are incremented outside of
type float's range.  When these computational uncertainty errors occur in a
program, and are not handled well, the program could continue and generate an
apparently correct answer purely by chance (Knoble 1990, 2).
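      Silent wrap-around and silent overflow can both be demonstrated, along with
one way of handling them explicitly.  The sketch below uses Python's ctypes
module to imitate a thirty-two bit unsigned integer; checked_increment is a
hypothetical helper, not part of any library.

```python
import ctypes
import math

UINT32_MAX = 2**32 - 1   # 4294967295

# ctypes performs no overflow checking, so the value wraps around silently.
wrapped = ctypes.c_uint32(UINT32_MAX + 1).value

def checked_increment(value, limit=UINT32_MAX):
    """Hypothetical helper: raise an error instead of wrapping silently."""
    if value >= limit:
        raise OverflowError("increment would exceed the 32-bit unsigned range")
    return value + 1

# Floating-point multiplication and division overflow or underflow quietly.
big = 1e308 * 10.0       # becomes infinity
tiny = 1e-308 / 1e100    # becomes zero

print(wrapped, math.isinf(big), tiny)
```

      Whether a program should stop (as checked_increment does) or continue with
a flagged value is a design decision; the point of the sketch is only that the
choice should be made deliberately.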
      Propagational uncertainty deals with the problem of how uncertainty from
a physical world measurement or another computer-related uncertainty moves
through successive iterations of a model. It can be tested by varying the input to a
numeric model in small steps to see if small changes can make large differences in
the output of a computation, which in principle should not occur. For example, by
making the value of A equal to 100.0 and X equal to 0.01, Knoble's (1990,  5)
program generates a value for P equal to -39.0624847; by changing X to 0.0078125,
the program generates a value of P equal to 0.0000000.  Propagational uncertainty
can also be tested if the computer program can be rewritten in an algebraically
equivalent, but computationally different, form, which would allow comparison
between programs that should generate the same output.  By simplifying the
formula in Knoble's example to a command line such as:

         if X not equal to 0 then P = 1 else print "DIVISION BY ZERO"

the problems of floating point arithmetic can be avoided.
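      Both tests described above (varying the input in small steps, and comparing
against an algebraically equivalent form) can be sketched as follows.  The code
assumes IEEE-754 single precision, emulated by the hypothetical helper f32, so
the numbers it produces are not expected to match those from the IBM 3090;
output_spread is likewise an illustrative name.

```python
import struct

def f32(v):
    """Round a Python float to the nearest IEEE-754 single-precision value."""
    return struct.unpack("f", struct.pack("f", v))[0]

def p_naive(a, x):
    """Knoble's formula with every intermediate result rounded to single."""
    a, x = f32(a), f32(x)
    s = f32(a + x)
    num = f32(f32(f32(s * s) - f32(a * a)) - f32(f32(2.0 * a) * x))
    return f32(num / f32(x * x))

def p_rewritten(a, x):
    """The algebraically simplified form of the same formula."""
    if x == 0:
        raise ZeroDivisionError("DIVISION BY ZERO")
    return 1.0

def output_spread(model, a, xs):
    """Range of model outputs over a set of nearby inputs."""
    results = [model(a, x) for x in xs]
    return max(results) - min(results)

xs = [0.0078125, 0.01]
print(output_spread(p_naive, 100.0, xs), output_spread(p_rewritten, 100.0, xs))
```

      A large spread for the naive form and a zero spread for the rewritten form
indicate that the differences come from the arithmetic, not from the model.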
      Propagational uncertainty not only involves the accuracy of real number
computations for numeric data, but also the generation of polygon overlay slivers
within vector spatial data, such as that used by ARC/INFO (see Figure 1.5 on page
10).  The slivers may be handled by "fuzzy tolerances" (ARC Command
References, Commands J-Z, 1991, UNION 1), which allow the shifting of close
lines so that they merge in the output. This could cause the shifting of data
from one layer of a known high accuracy (such as a surveyor's cadastral data) to
correspond with a layer of lower or unknown accuracy (such as information
digitized from a medium or small scale paper map). All future use of the merged
data layer would include the uncertainty created by the overlaying of two data
layers of varying accuracy, and the meta-data that should be attached to the new
data layer would have to reflect an estimate of how much shifting occurred. This
type of shifting can be eliminated by avoiding fuzzy overlays, although this can
generate sliver regions along boundaries.  These slivers increase the storage and
processing time for the new layer, and may not hold any useful information other
than an indication of the difference between the two layers used to create it.
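      The idea of recording how much shifting a tolerance-based merge introduces
can be sketched in a few lines.  This is not ARC/INFO's overlay algorithm; it is
a simplified, hypothetical vertex-snapping routine (snap_to_reference) that
treats one layer as the reference, moves nearby vertices of the other layer onto
it, and reports the largest shift so that it can be stored as meta-data.

```python
import math

def snap_to_reference(vertices, reference, tolerance):
    """Move each vertex onto the nearest reference vertex within tolerance.

    Returns the adjusted vertices and the largest shift that was applied.
    """
    snapped, max_shift = [], 0.0
    for vx, vy in vertices:
        nearest = min(reference, key=lambda r: math.hypot(r[0] - vx, r[1] - vy))
        dist = math.hypot(nearest[0] - vx, nearest[1] - vy)
        if dist <= tolerance:
            snapped.append(nearest)            # vertex is merged with the reference
            max_shift = max(max_shift, dist)
        else:
            snapped.append((vx, vy))           # too far away: left unchanged
    return snapped, max_shift

# Assumed example data: a surveyed boundary and a digitized copy of it.
surveyed = [(0.0, 0.0), (10.0, 0.0), (10.0, 10.0)]
digitized = [(0.3, -0.4), (10.0, 0.2), (14.0, 10.0)]
merged, shift = snap_to_reference(digitized, surveyed, tolerance=1.0)
print(merged, shift)
```

      Choosing the higher-accuracy layer as the reference, as assumed here, keeps
the surveyed coordinates fixed; the reported shift is then an estimate of the
uncertainty added to the lower-accuracy layer.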
      Modeling uncertainty is the last of the types of uncertainty associated with
computers. Although there are several types of modeling (verbal, graphic,
physical, and mathematical), each of which is subject to questions of robustness
and validity versus the goal of the model (such as description or prediction), it is
only mathematical models that tend to be dependent on computers for execution
and are thus subject to the other computer uncertainties.  The robustness of a
model is reflected in its ability to handle all appropriate input and produce a
reasonable output. It is thus tied to propagational uncertainty, but robustness
also entails that not only do small changes in input not cause inappropriate
changes in output, but that the model's output, in practice, is also within the
expected range of the mathematical model, in principle. The validity of a model is
the question of whether a model, in practice, actually represents the phenomena
being modeled, in principle.  This questions the methods used to operationalize a
numerical model, such as the validity of using 'if...then' statements to ensure the
apparent robustness of a procedure that would otherwise produce non-robust
output, when no such statements are apparent in the 'real world' phenomena.

Figure 1.5  Polygon overlay can cause shifting in boundary lines, or can create
boundary slivers.  [Figure labels: Boundary 1; Boundary 2; Fuzzy Tolerance;
Boundary Shift; Polygon Merge; Sliver.]

Uncertainties of the Human World
      Problems of the human communication of information and meanings
constitute the third area of uncertainty (Rejeski and Kapuscinski 1990, 12).  In
attempting to communicate information, meta-data can be lost in two ways: the
sender of information can give data without the meta-data, or the receiver of
information does not understand that part of the message.  The first way, not
giving meta-data with data, can be the result of restrictions of space within
publication materials, the desire for the appearance of greater accuracy
(Star 1985), and until recently a lack of awareness of the potential importance of
uncertainty information, particularly in the areas of human and environmental
risk analysis (Rejeski and Kapuscinski 1990). The second way that meta-data can
be lost is when the receiver of information does not receive that part of a message.
This can result from the ignoring of meta-data or the inability to interpret the
meta-data through lack of experience in dealing with the way it is conveyed. This
type of failed communication can be the result of the variable interpretations of
both words and images.  The meaning of words constitutes one of the major
problems the U. S. Environmental Protection Agency has had to deal with in risk
analysis (Rejeski and Kapuscinski 1990, 12).  Effective communication of
meta-data has been studied in a strictly human realm (body language, etc.), but
little has been done in the area of communication of uncertainty within the realm
of scientific communication, particularly with spatial data (Rejeski and
Kapuscinski 1990; MacEachren 1992).
      This taxonomy should prove  useful,  particularly for reducing human
communication  uncertainties.    By  recognizing  that  uncertainty  exists in
measurement  and is propagated in  computer manipulation  of  data,  these
uncertainties can be dealt with openly and honestly, as Rejeski (1991) suggests.
This,  aided by  visual communication  techniques, should reduce  human
uncertainty by increasing  the amount  of  information  (by communicating
meta-data) given in the presentation of data.

The Idea of Visual Communication
      The effective communication of environmental data and uncertainty as
meta-data requires an understanding of the principles of visual communication
and map design.  David DiBiase (1990) has developed a model of information
display, in a research setting, as a means of communication in a continuum from
communication to self through communication to others (see Figure 1.6).
Communication to self can be thought of as "visual thinking", and includes data
exploration, hypothesis generation and confirmation.  At this level of data
visualization, maps and other graphics are used to "prompt insight, reveal patterns
in data, and highlight anomalies" (MacEachren 1992, 1). Images created at this
level should be designed to serve these purposes. Given these purposes, it is with
this type of visual communication that the possibility of visualization error is
greatest.  MacEachren and Ganter (1990) describe these errors as seeing wrong
(similar to the type I error in hypothesis testing: identifying a pattern that is not
there) and not seeing (similar to the type II error: not identifying a pattern that is
there).  But, since these graphics are generally rough and intended only for
viewing by the researcher(s), searching for one 'optimal' display is less important;
more views of the same data may be more helpful and reduce the chances of not
seeing or seeing wrong.

Figure 1.6  DiBiase's (1990) model of the function of graphics in research.
Visualization begins with exploring an idea.  The idea then moves through
confirmation and synthesis in a larger group of both ideas and people, and ends
with presentation (although each stage can spawn new ideas).  [Figure labels:
Visual Communication; Synthesis; Presentation.]

      Communication to others can be thought of as "visual communication" and
is the realm of presentation graphics.  At this level of data visualization, maps
and other graphics are used to synthesize data into an "abstract statement
concerning patterns and relationships" (MacEachren 1992, 6) and finally to
present the information to others to persuade them of the accuracy of the data
assessment (MacEachren 1992, 7).  Edward Tufte's (1983, 77) principles of
graphic excellence can help ensure the most information is presented in a
minimum of space, and that this information is conveyed ethically:
     The representation of numbers, as physically measured on the surface
          of  the graphic  itself,  should be directly proportional to the
          numerical quantities represented.
     Clear, detailed, and  thorough  labeling should be  used to defeat
          graphical distortion and ambiguity. Write out explanations of the
          data on the graphic itself. Label important events in the data.
     Show data variation, not design variation.
     In time-series displays of money, deflated and standardized units of
          monetary measurement are nearly always better than nominal
          units.
     The number of information-carrying (variable)  dimensions depicted
          should not exceed the number of dimensions in the data.
     Graphics must not quote data out of context.
Tufte also discusses the concept of data-carrying ink—that is, don't put more ink on
the page than is necessary to convey the information. These principles are similar
to those that Morgan and Henrion (1990) propose for designing graphics for the
presentation of uncertainty information.  Judy Olson (1981) also suggests the need
for clear and  accurate legends in her guidelines for the production of bivariate
maps.
      Monmonier and Johnson (1990, 77) have proposed a guideline for the
communication of environmental risk, which can aid in the making of maps for
visual communication.  Their multistep process is not a waterfall type model
(when one level is completed it cannot be returned to), but rather a guideline for
iterative refinement for the presentation of data.  The steps they include are:
setting up the design team; identifying the communication goal; the issue profile;
the audience; the messages of environmental maps;  methods; and  evaluation.
Setting up the design team acknowledges that one person may not have all of the
knowledge necessary to adequately design a graphic, and that, as necessary, each
of the following steps should involve each person who has or can contribute to the
map. Identifying the communication goal is simply that a focus should be selected
for the map; this will facilitate inclusion of important information and removal of
extraneous data. The issue profile deals with the history of the problem  to be
mapped and the constraints on producing the map; that is, the 'environment'  of
the map and map design process.
      The audience must also be considered when designing a graphic.  This
includes consideration of who will be viewing the map (politicians, scientists, the
public, etc.).  These different audiences will generally have varying degrees  of
map-interpretation skills; this influences the amount of explanatory information
that should be included, the appropriateness of bi- or multivariate maps, and the
appropriateness of 'eye catchers' such as bright  colors.  Consideration of the
audience should also include an acknowledgment of how  the map will be
presented; this is part of the issue profile and influences the choices of methods.
      Monmonier and Johnson (1990) present a list of several categories of
messages that environmental maps generally present.  The first of these is "What
we found/What we know/What we think we know" (p. 12).  This is the
presentation of information at its most basic level, but even at this level meta-data
can be presented: 'what we think we know.' The second of these messages is
"What you can do/What you should do" (p. 13). This is very dependent on the
choice of audience; a lawmaker's set of choices of what to do can be quite distant
from a concerned citizen's, for example.  The third message is "What we're doing/
What we want to do" (p. 13). For this type of message Monmonier and Johnson
suggest that an overview map with several smaller detail maps may aid the
presentation. The last category is "Why we're doing what we're doing/Why
you should do what we're asking you to do" (p. 13).  This type of map should
present the reasoning behind a choice or plan of action; this can include the
presentation of the history side of the issue profile.
      The final two steps of Monmonier and Johnson's strategy are methods and
evaluation. Methods need to address questions such as the need for one or several
maps, how these maps will be presented (large size color maps, 8.5x11 black and
white maps, slides, video, etc.) and whether or not additional information such as
non-map graphics  should be included.  Although evaluation is presented as the
last step in map design, it is a part of each of the earlier steps. It is the last step as
a review of the possible final design.  Evaluation can be formal (a survey) or
informal (a telephone call to someone who has seen the map and can suggest any
possible improvements).

The Means of Visual Communication
      Jacques Bertin proposed a group of 'visual variables' that has since been
extended by other cartographers (such as Morrison (1974), McCleary (1983),
and Woodward (1991)). This group of variables constitutes those representational
techniques that a cartographer or illustration designer has in the creation of an
image.  The list includes: location, size, value, hue, intensity, orientation, shape,
texture, arrangement, and focus (see Figure 1.7 on page 14). This list is not
necessarily all inclusive, but the list is useful in the design of maps and graphics.
Each of the variables can be used to establish visual isolation (difference from
surroundings) and visual levels (greater noticeability) (see Figure 1.1 on page 2),
and according to Bertin, these visual variables have certain levels of measurement
that are commonly associated with them, and thus allow representations  that
convey the character of the data.
      In a static map, the use of location is limited to the signification of the spatial
place of an item, although in orthogonal displays, location is a flexible tool because
of the possibility of specifying the viewpoint on a simulated three-dimensional
surface. In multiple maps, or dynamic mapping, change in position can be used to
show movement of a feature. It is inherently an interval/ratio variable, but can be
used to depict all levels of measurement.
      Size is most often used to depict an ordinal variable, although interval/ratio
variables can also be depicted using size. Because a larger symbol is almost always
associated with 'more,' this is the context that it should generally be used in.
      Of the 'color' variables, Bertin (1983, 42)  only identified value and  hue.
Value, like size, has a distinct range from more to less and is therefore good for
mapping ordinal data and can also be used for  interval/ratio data. ARC/INFO
refers to value as lightness. Hue can be used to depict ordinal or interval/ratio
data because the frequencies that constitute hue are ordered, but these orders are
not always readily remembered and used;  hue (at a constant value and intensity)
is therefore best used for nominal data. Intensity, which is also called saturation,
has a distinct range from more to less and is therefore good for ordinal data.

Figure 1.7  The visual variables as presented by DiBiase, Krygier, Reeves,
MacEachren, and Brenner (1991). All have been done in ARC/INFO 6.0, but some
(such as texture for point symbols, orientation for line symbols, and arrangement
for area symbols) are more difficult and, thus, less useful than others.  [Figure
layout: columns Point, Line, Area; rows Location, Size, Value, Hue, Intensity,
Orientation, Shape, Texture, Arrangement, Focus.]

      As the means of creating high resolution color displays become more
available, use of the color visual variables will become an even greater part of
cartography. Color can be specified in several ways (Dent 1985). One common
and useful way is the Munsell color system (see Figure 1.8); it is based on the
human perception of color (and is similar to the Tektronix color cone, and a color
specification system in ARC/INFO, Hue-Lightness-Saturation).  This system
divides color into the three visual variable categories value, hue and intensity.
      Value is the measure of the lightness or darkness of a surface; it is the total
amount of light that is reflecting or emitting from a surface measured relative to
the human ability to discriminate black from gray up to white.  It is the  only
measure of light along the white-gray-black continuum, because all frequencies of
light should be equally present.

Figure 1.8  The Munsell color system; for example, an intense yellow could be
specified as 5Y7/14.  [Figure labels: Value (White to Black); Hue; Intensity.]

      Hue is the term most often meant when the word 'color' is used.  The
Munsell color system records hue as an angular measure around a color space. It
represents the modal value of the frequency of light that is reflecting or being
emitted from a surface.  When a light has no modal frequency, the perceived light
is along the white-gray-black continuum, which constitutes the central axis of the
Munsell color space.
      Intensity is a measure of the purity of the spectral frequency of light. It
represents the variance of the frequency of light; greater variance means that a
larger range of frequencies of light are being emitted or reflected from a surface;
that is, the surface has more gray in it. A feature of intense colors is that they tend
to be more noticeable (stand out more) than less intense colors, even when the
total amount of light reflected or emitted may be the same, which makes intensity
good for establishing visual hierarchies.  Because of the human visual system, the
value of the most intense color of different hues varies, with yellow having the
highest value for its maximum intensity.  This change is accounted for in the
Munsell color system; the Tektronix color cone assumes that maximum intensity
for any color occurs at the midpoint of the value scale.
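      The difference between a model that places every fully intense hue at the
midpoint of the value scale and the way such hues actually appear can be
illustrated numerically.  The sketch below uses the hue-lightness-saturation
conversion in Python's standard colorsys module as a stand-in for that kind of
model, and Rec. 709 relative luminance as a rough proxy for perceived value.
Neither is the Munsell system, and the RGB triplets are assumed to be linear.

```python
import colorsys

def relative_luminance(r, g, b):
    """Rec. 709 luminance of linear RGB components in the range 0 to 1."""
    return 0.2126 * r + 0.7152 * g + 0.0722 * b

colors = {
    "yellow": (1.0, 1.0, 0.0),
    "red": (1.0, 0.0, 0.0),
    "blue": (0.0, 0.0, 1.0),
}

for name, rgb in colors.items():
    hue, lightness, saturation = colorsys.rgb_to_hls(*rgb)
    # All three colors get the same lightness and saturation in this model,
    # yet their luminances differ widely.
    print(name, round(hue * 360), lightness, saturation, relative_luminance(*rgb))
```

      Under these assumptions yellow has by far the highest luminance of the
three, which is consistent with the observation that yellow reaches its maximum
intensity at a higher value than other hues.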
      For cartographic use in displaying ordered data, color hues are generally
arranged to allow ease of interpretation.  There are many possible color schemes,
many having been suggested for terrain shading.  For other data, the spectral
ordering of hues may be the most obvious for use in mapping ordered data, but
because colors can shift in intensity and value, this may lead to misleading
maps: yellow will stand out in the spectral pattern.  Two suggestions are of note
for remedying this problem. The first is the use of a part-spectral scheme: yellow
through orange to red; or, yellow through green to blue.  This can allow
redundancy in the visual variables of intensity and value, which reinforces the hue
progression. The second is the use of hues ordered on the basis of value. This may
be a viable alternative, but could prove confusing to those who know the spectral
sequence of hues.
      The ability to discriminate colors has received some study, particularly in
the discrimination of values.  The Munsell color system divides the range from
black to white  into eleven steps, as  does the system  for  black and white
photography developed by Ansel Adams (Upton and Upton 1989, 313). For most
cartographic use, eleven steps would be both difficult to create and difficult to
interpret; most cartographic research indicates that the use of half that range (five
or six) is better for map design. If value gives an indication of the amount of
discriminability that can be expected for intensities, four intensity levels are
probably the most that should be used. Discrimination of hues for cartographic
use is affected by the apparent changes in intensity and value that occur as hue is
changed,  but  for univariate  symbolization,  five  steps  should be easily
discriminable in one of the part-spectral sequences.
      Colors are produced by two methods: color addition and color subtraction.
The color addition process is most commonly seen in color monitors for computer
displays and television. It involves the mixture of red, green and blue to create
colors on a black surface, with all three being used for white. The color subtractive
process is used for the production of printed material, such as paper maps.  It
involves the use of cyan, yellow and magenta (and black) to subtract colors that
would reflect from the surface of white paper. Because the overprinting  of cyan,
yellow,  and magenta generally leaves muddy brown, black is often used as a
fourth color in the printing process. Colors are obtained by overprinting  the four
separates, with each offset slightly in order to allow the apparent mixing of colors
in a dither pattern.
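      The relationship between the additive and subtractive primaries can be
sketched with the common device-independent conversion below.  It is a
simplification: rgb_to_cmyk is a hypothetical helper, components are assumed to
lie between 0 and 1, and real printers need tested color tables because inks and
papers do not behave ideally.

```python
def rgb_to_cmyk(r, g, b):
    """Naive conversion from additive RGB to subtractive CMYK (all 0 to 1)."""
    k = 1.0 - max(r, g, b)            # black replaces the part shared by all inks
    if k == 1.0:
        return 0.0, 0.0, 0.0, 1.0     # pure black: no colored ink needed
    c = (1.0 - r - k) / (1.0 - k)
    m = (1.0 - g - k) / (1.0 - k)
    y = (1.0 - b - k) / (1.0 - k)
    return c, m, y, k

print(rgb_to_cmyk(1.0, 0.0, 0.0))     # screen red
print(rgb_to_cmyk(0.5, 0.5, 0.5))     # mid gray
```

      In this idealized conversion, red on a monitor becomes full magenta plus
full yellow on paper, and neutral grays are printed with black ink alone.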
      The production  processes must  be kept in  mind  when choosing color
schemes on a color monitor that will be printed on a paper output device. Colors
on a computer display may not appear the same on a printed sheet, because of the
change in creation method. In addition, on a computer monitor, a point can take
on any of the colors that the graphics system can generate, and resolution is
independent of color. On paper output, the dither patterns that are used to create
the appearance of many colors may cause a change in apparent color and, more
significantly, output resolution.  ARC/INFO allows the use of color lookup tables
in commands such as HPGL2 (ESRI ARC Command References, 1991) to enable
redefining screen colors to pretested printer colors to help compensate for changes
in color; changes in resolution must be accounted for by the cartographer.
      A final consideration for color (particularly hue) is its social interpretation.
Colors can have meanings that must be taken into consideration when designing
maps and graphics, for example: red as stop; yellow as caution; green as go. For
environmental maps, use of red (particularly, intense red) can indicate imminent
danger, and with dark shades of other colors (blues, grays and browns, for
example) can evoke a sense of foreboding or futility (this is demonstrated in The
Nuclear War Atlas and movies such as Blade Runner).  Other color schemes can
evoke other reactions: pastels (colors of low intensity and high value) can indicate
serenity (unclassified maps by the Central Intelligence Agency often make use of
pastels); intense colors such as yellow and cyan grab attention and can indicate
happiness, and they are often used in maps for children.
      The final three variables that Bertin identified are orientation, shape and
texture. Orientation is readily discriminated by the human visual system and is
therefore good for indication of nominal data, although by using common symbols
such  as  clock  faces, ordinal and  interval/ratio data can be  displayed  with
orientation. Shape should be used for nominal data categories, because shapes in
general do not have an apparent order and are not as readily distinguished as
other visual variables. Texture is the relative coarseness of an area fill, and because
rough textures appear closer than fine textures, texture is  good for establishing
visual hierarchies.  Texture should generally be used for nominal data, but it can
also be used for ordinal and interval/ratio data.
      Unlike  orientation  and texture,  shape has  a continuum of possible
representations: from mimetic to abstract (see Classifying Space on page 24).
Mimetic  symbols convey the appearance of what they represent (an animal's
outline represents sightings of that animal). Abstract symbols must be defined (a
legend indicates that triangles represent sightings). ARC/INFO provides many
abstract symbols, a few that are less abstract (for example, an anchor that might be
used to represent a marina), and the ability to define new symbols (see Map Display
and Query). Selection of appropriate levels of abstractness must be considered in
designing a display—use of mimetic symbols may allow less dependence on the
legend, but too many mimetic symbols may give the appearance of a map oriented
toward children.
      Two additional visual variables that have been identified since Bertin
proposed his list are arrangement and focus. Arrangement is the order of symbols
in an area fill (from regular to random to clustered), and nominal and ordinal data
can be represented by changes in arrangement. Focus is the last visual variable; it
is the crispness of the edges of symbols. Because this has an apparent order, focus
can be used to display ordinal data and could be used for interval/ratio data,
although too much variability of focus may make a map hard to read.
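      The pairings of visual variables and levels of measurement described in
this section can be collected into a small lookup table.  The sketch below is
only a summary of the preceding paragraphs; the names VISUAL_VARIABLES and
variables_for are illustrative, and the split into "best" and "possible" uses
is an interpretation of the text rather than a rule from Bertin.

```python
# Each entry: (levels the variable is best suited to, levels it can also show).
VISUAL_VARIABLES = {
    "location":    ({"interval/ratio"}, {"nominal", "ordinal"}),
    "size":        ({"ordinal"}, {"interval/ratio"}),
    "value":       ({"ordinal"}, {"interval/ratio"}),
    "hue":         ({"nominal"}, {"ordinal", "interval/ratio"}),
    "intensity":   ({"ordinal"}, set()),
    "orientation": ({"nominal"}, {"ordinal", "interval/ratio"}),
    "shape":       ({"nominal"}, set()),
    "texture":     ({"nominal"}, {"ordinal", "interval/ratio"}),
    "arrangement": ({"nominal", "ordinal"}, set()),
    "focus":       ({"ordinal"}, {"interval/ratio"}),
}

def variables_for(level, include_possible=False):
    """List the visual variables suited to a level of measurement."""
    chosen = []
    for name, (best, possible) in VISUAL_VARIABLES.items():
        if level in best or (include_possible and level in possible):
            chosen.append(name)
    return sorted(chosen)

print(variables_for("nominal"))
print(variables_for("interval/ratio", include_possible=True))
```

      A table of this kind might be used as a first check when choosing symbols:
the variables listed for a level are candidates, not a substitute for judging
the audience and the purpose of the map.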

Methods for Visually Communicating Data and Meta-Data
      The  communication  of uncertainty has been  studied by Morgan and
Henrion  (1990)  with   Harold  Ibrekk,   although  their  concern   was   the
communication of the uncertainty of linear data. Because of this, the methods they
present cannot be readily used in the communication of mapped geographic data,
except for point symbolization (even this is difficult in ARC/INFO), but their
study highlights some of the difficulties that can be encountered in the visual
communication of uncertainty and their conclusions are useful.
      Their study analyzed nine methods for displaying uncertainty in linear
data: a point estimate with an error bar; a discrete density function; a pie chart; a
probability density function; a half-height probability density function mirrored
on the x-axis; a dot density horizontal bar; a vertical line density horizontal bar; a
modified Tukey box plot (minimum and maximum points are not included, and a
mean point is included); and a cumulative density function (see Figure 1.9 on page
18).  Their  recommendations include  using displays that specifically show the
information that is to be extracted (such as a point for a mean value, if mean values
are important), and using multiple displays (particularly, display the  cumulative
density function and  the probability density function one over the other, with a

-------
                                                               18
[Figure 1.9: recovered panel titles are Point Estimate with Error Bar, Mirrored Probability Density, and Discrete Density Function.]
-------
                                                                        19

common horizontal axis).
      For two- and multidimensional uncertainty in non-spatial displays, Morgan
and Henrion give several examples of methods that can be used for graphing (see
Figure 1.10 on page 20). These include: multiple lines in a probability density/
cumulative density pair; multiple Tukey box plots; linear graphs with error bars;
orthogonal displays; and triangle plots alone and in multiples.
      They conclude with a list of factors that should go into the design of
uncertainty displays (Morgan and Henrion 1990, 252):
     finding a clear, uncluttered  graphic style and an easily understood
          format,
     making decisions about what information to display,
     making decisions about what information to treat in a deterministic
          form and what to treat in a probabilistic form,
     making decisions about what kind of parametric sensitivities will
          provide key insights.
They also suggest that display design often involves the reduction of a multi-
dimensional model into the two dimensions of a paper or monitor display (as does
Tufte 1991), and that the intended audience's experience in interpreting graphs
must be considered when creating  a display.
      For the  representation of uncertainty for spatial variables, there are two
possible cartographic routes.  The first of these is the creation of two maps, one for
displaying the data,  and one for  displaying the meta-data.  Olson (1981) and
Laurence Carstensen (1986) have tested this choropleth map arrangement against
two bivariate mapping  techniques (Olson:  spectral  encoding; Carstensen:
intersecting lines) in the representation of statistical correlation.
      Olson finds that map readers can initially interpret value-shaded,
monovariate map pairs more readily than bivariate maps. On the other hand, she
reports that over half of those who could interpret bivariate maps at a level
significantly better than guessing did better with a bivariate map than with two
separate maps.
She then suggests that the bivariate maps may be more readily interpreted once the
"cognitive hurdle" (Olson 1981,  269) of the bivariate  mapping technique  is
overcome.
      Carstensen also finds that  map readers find interpreting value shaded,
monovariate map pairs easier than bivariate maps. Because his test compared a
classed map pair and unclassed bivariate maps, the problem he notes with the use
of map pairs (poorer statistical residual scores) would not be as likely to occur if
map pairs are compared with classed bivariate maps.
      The map pair technique should be a useful tool for communicating two
variables when the data must be classed (as is done for pragmatic reasons in
ARC/INFO). Since meta-data values may not be correlated with data values, two
monovariate maps should communicate uncertainty more effectively than
techniques that were designed to highlight spatial correlation.
      The second cartographic route for the visual representation of uncertainty
is bivariate mapping (the mapping of two variables onto the same map).  This
would ensure that meta-data is presented to the map reader with the data, but this
mapping method generally  increases the difficulty  with which data  can be
extracted from a map.  Because of the variety of visual variables, there are several
possible methods of  bivariate mapping, some of which have been tested for
communication effectiveness;  most  of  these have been oriented toward the

-------
                                                                    20
[Figure 1.10: recovered panel titles are Probability/Cumulative Density, Orthogonal Display, and Multiple Tukey Box Plots; one axis is labeled "X1 (first uncertain variable)".]
-------
                                                                       21

representation of spatial correlation. These include the intersecting lines
method (which uses texture), color maps made by the U. S. Census Bureau
(which use a full hue range to achieve bivariate representations, that is,
spectral encoding), color maps that rely on two complementary hues, and the
equiprobability ellipse (which also uses complementary hues).
      In a technique that allows for bivariate mapping in monochrome displays,
Carstensen (1982) has tested the communication effectiveness of the intersecting
lines method of bivariate mapping (see Figure 5.4b on page 73). These maps use
horizontal and vertical lines for representing two variables. Carstensen suggests
that this scheme  be used in an unclassed map, but this scheme can be used for
classed representations of data and meta-data.  Although this technique can be
used in monochrome displays, colored lines can be used to distinguish the two
variables (this has not been tested for communication effectiveness, though).
      This technique has one disadvantage: the method of producing the area
symbolization may lead to a conflict between two visual variables. Both texture
and value change with the mixing of the lines; both establish visual hierarchies
with coarse textures and dark values standing out in the display.  This is a problem
because in the representation scheme these two points (coarseness and darkness)
are at opposite ends of the data ranges.  This can cause value to be used for
identifying relationships even though squareness in the texture is the  intended
visual variable (Carstensen 1986).
      The U. S. Census products, originally published for 1970 census data, that
show two variables were studied for communication effectiveness by Olson (1981)
(see Figure  5.5c  and d on page 75).  These maps use two hue patterns for
representing two variables, with the part-spectral ranges of yellow to blue and
yellow to red being used on the x and y axes, respectively. Olson reports that some
of these maps (such as education and income) convey information well, especially
for  homogeneous regions. She also indicates that these maps are thought to be
more authoritative and more innovative than two separate  maps showing the
same information. In concluding, she suggests that prominent and clear legends
are necessary for accurate interpretation of bivariate maps; that both the
monovariate map pair and the bivariate map be shown, with the monovariate map
pair in a monochrome format; and that explanatory notes be included on the
types of information presented. These guidelines are in keeping with both Morgan
and Henrion, and Tufte, and are feasible with ARC/INFO's ability to rapidly
generate small, monovariate maps.
      As a response to the problems of interpreting spectrally encoded bivariate
maps, Steiner (1979, in Eyton 1984) has proposed a complementary color scheme
(see Figure 5.6 on page 77).  This color system makes use of the mixing of
complementary colors (such as red and cyan) to produce a central gray region that
highlights the diagonal that represents correlation in a bivariate map. This can be
done for unclassed or classed representations of data, and by flipping the order of
one tint, negative correlations can be shown as well as positive correlations.
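
The mixing behind the complementary scheme can be sketched in a few lines. This is an illustration only: the function names and the simple subtractive RGB model are assumptions of this sketch, not something taken from Steiner or Eyton. Equal amounts of the red and cyan tints give a neutral gray, which is what makes the diagonal of the bivariate legend stand out.

```python
def complementary_mix(red_amount, cyan_amount):
    """Return an (R, G, B) tuple in 0..1 for one cell of a red/cyan legend.

    Assumes a simple subtractive model: red ink absorbs green and blue,
    cyan ink absorbs red. Both amounts are fractions from 0 (none) to 1.
    """
    for amount in (red_amount, cyan_amount):
        if not 0.0 <= amount <= 1.0:
            raise ValueError("amounts must be between 0 and 1")
    r = 1.0 - cyan_amount   # cyan removes red light
    g = 1.0 - red_amount    # red removes green light
    b = 1.0 - red_amount    # red removes blue light
    return (r, g, b)


def flipped(amount):
    """Reverse one tint's order so a negative correlation falls on the gray diagonal."""
    return 1.0 - amount
```

With this model the four corners of the legend come out white, red, cyan and black, and any cell with equal tint amounts is a gray.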
      Building on Steiner's proposal, J. Ronald Eyton (1984) developed the
complementary color bivariate map with an equiprobability ellipse (see Figure 5.7a
and b on page 79). These maps use a modified, 2x2 complementary color range,
with an additional class that occurs in the middle of the matrix and represents the
central cluster of data.  By plotting the two variables to be mapped on a scatter

-------
                                                                        22

diagram, linearly correlated data will have an ellipse-shaped central cluster.  By
selecting a percentage of the total number of observations to be included in the
central ellipse, a category for the central cluster can be created with the formula
(Eyton 1984, 488):


\[
\chi^2 = \frac{\dfrac{(X - X_m)^2}{\sigma_x^2} + \dfrac{(Y - Y_m)^2}{\sigma_y^2}
         - \dfrac{2r(X - X_m)(Y - Y_m)}{\sigma_x \sigma_y}}{1 - r^2}
\]

      \chi^2              = chi-square (1.386 for 50%)
      X, Y                = the X and Y observations
      X_m, Y_m            = the means of X and Y
      \sigma_x, \sigma_y  = the standard deviations of X and Y (their squares are the variances)
      r                   = coefficient of correlation (Pearson's r)

When this ellipse is displayed in gray, surrounded by the four corners of the data
set in white, black, red and cyan, a bivariate map that portrays the central data
cluster explicitly (without having a staircase effect) can be created.
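
The ellipse test can be illustrated with a short sketch. It is not Eyton's program or the eyton.c program distributed with this guideline; the function names are hypothetical, and the sigma terms are treated as standard deviations (an assumption made so that their squares are the variances in the formula). For two variables, the chi-square value that encloses a given fraction p of a bivariate normal distribution is -2 ln(1 - p), which gives 1.386 for 50%.

```python
import math


def chi_square_threshold(fraction):
    """Chi-square value (2 degrees of freedom) enclosing `fraction` of the data."""
    if not 0.0 < fraction < 1.0:
        raise ValueError("fraction must be between 0 and 1")
    return -2.0 * math.log(1.0 - fraction)


def ellipse_chi_square(x, y, x_mean, y_mean, x_sd, y_sd, r):
    """Evaluate the ellipse formula for one (x, y) observation."""
    zx = (x - x_mean) / x_sd   # standardized X deviation
    zy = (y - y_mean) / y_sd   # standardized Y deviation
    return (zx * zx + zy * zy - 2.0 * r * zx * zy) / (1.0 - r * r)


def in_central_ellipse(x, y, x_mean, y_mean, x_sd, y_sd, r, fraction=0.5):
    """True when the observation belongs to the central (gray) class."""
    value = ellipse_chi_square(x, y, x_mean, y_mean, x_sd, y_sd, r)
    return value <= chi_square_threshold(fraction)
```

Observations that fall outside the ellipse would then be assigned to one of the four corner classes according to which side of each mean they lie on.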
      There are  other possible methods of bivariate mapping that have been
postulated as effective communicators of (un)certainty. Two of these, which have
not been tested, are color intensity and focus. Neither of these methods places
emphasis on a correlation of the data and meta-data variables, but rather use visual
variables to highlight portions of the data set, as determined by the meta-data
information.
      Use of color intensity may prove useful for the cartographic representation
of uncertainty, because it allows the highlighting of certain (or uncertain) areas by
specifying intense shades for those areas, and less intense for others (see Figure
5.5b on page 75). Because intense colors stand out in an image, this technique
would allow the creation of a distinct visual hierarchy that emphasizes certain (or
uncertain) values. Generally, for maps that will be used for data communication
(as in the DiBiase model), intense colors should represent certain areas and less
intense (that is, more gray) colors should represent uncertain areas.
      Focus  is  potentially  a  useful means  of  representing  uncertainty.
MacEachren (1992) presents  several  possible variations on focus: edge crispness
(for  external boundaries of points, lines and areas),  fill clarity (for internal
boundaries within  point, line or  area  symbols), fog  (by imposition of  an
interposing, translucent layer over another  symbol),  and  resolution (point
thinning for vector databases or aggregation into larger area units for raster
databases). All of these involve the blending of a symbol with the surrounding
parts of the image, thereby eliminating clearly defined regions. In ARC/INFO
edge crispness can be accomplished  by buffering regions and careful assignment
of color in  the buffer areas;  resolution  can be accomplished by thinning points
manually or with the ARCEDIT command GENERALIZE.

-------
                                                                        23

      Another method of bivariate mapping is the use of a fishnet, orthogonal,
view to display one data set, with another data set used to color the display (see
Figure 9.2 on page 101). This is commonly used as a technique for displaying
terrain elevation (with land use/land cover draped over the net by specifying the
net's color with the  land use symbolization scheme), although there is no
restriction on  its use for  other types of data.  When a data layer represents
information that is known to be continuous and smoothly changing (such as
elevation, air   temperature, or air pressure), this type  of representation is
appropriate. If the uncertainty of a spatial data layer can be shown or assumed to
be continuous and smoothly changing, a fishnet representation of that statistical
surface (with data values used to specify the color of the net) should be an effective
method of conveying meta-data.
      Finally, another representation of continuous and smoothly changing data
that could be used to display data and meta-data is the use of isolines (see Figure
9.1a on page 99).  With this technique, data, meta-data, or both can be represented;
if only one of them is drawn with isolines, another display technique is used for
the other. Like fishnets, isolines are often used for displaying terrain information (as
in topographic maps), but isolines can also be used to show data and meta-data.
This could be accomplished by using different hues, values, intensities, sizes, or
textures to indicate which lines represent data and which represent meta-data.

-------
                                                                     24

                            Chapter Two
                        Producing Displays

      As Monmonier and Johnson indicate, the process of designing and creating
maps and graphics is a multistep process.  There are many things that must be
considered in the design process, including the classification of both space and
data, and issues such as page layout. This chapter addresses these issues in the
context of ARC/INFO and leads into the methods of representing data that are
presented in later chapters.

Classifying Space
      A fundamental process in the classification of space is the abstraction of
data.  MacEachren and Ganter (1990) have described this as a shifting along a
continuum ranging from images to  graphics.  At one end of this continuum is
information in its rawest form: for  spatial data, aerial photographs and other
remote sensing products (see Figure 2.1, left). Further along the continuum, maps
provide an abstraction of images (Figure 2.1, middle). Some of the information
that  is present in  an image is  dropped in order to allow symbolization of
information that may not be directly perceivable in the image.  For example,
a dark line through a green area can be replaced with symbols for a road and a
forested area; the road can then be given a label, as well as an indication of the
number of lanes and access. At the other end of the continuum, graphics allow the
use of position in the display to symbolize any variable (Figure 2.1, right).  For
example, a distance decay graph uses the X axis to indicate distance from a point
in any direction, and the point may not be tied to any one geographic location.
      A similar continuum is the range of symbols that can be used in maps and
graphics (see Figure 2.2 on page 25).  At the end of the continuum most similar to
images are mimetic symbols.  At the end of the continuum most similar to graphics
are abstract symbols. Except for a few symbols that are toward the mimetic end of
this continuum, most of the symbols that are provided with  ARC/INFO are
[Figure 2.1 graphic; recovered axis label: "Distance from site."]
Figure 2.1   The Image to Graphic continuum. Images approximate what we see,
graphics provide abstract representations of, possibly, invisible relations
(MacEachren and Ganter, 1990).

-------
                                                                      25
abstract; these tend not to have commonly accepted meanings, and thus can be
defined by the cartographer as needed.
Figure 2.2   The Image to Graphic continuum applied to symbolization. Symbols
can vary from mimetic to abstract. All of these symbols could be used to represent
a marina; the audience must be considered in selecting which to use.
      The degree of abstraction that should be used is dependent on several
factors. Most importantly, the data that needs to be mapped must be effectively
shown, with a degree of abstraction that is appropriate to the data.
For example, if a distance decay model is developed from a set of sample sites,
showing the values measured at individual sites  may not convey the distance
decay as clearly as a graph. If the site locations  are important, a map can be
generated with the data and an inset showing the graph can be made using
ARCPLOT's graphing tools. As mentioned in Chapter One, the audience must also
be considered when determining symbolization abstractness. Highly mimetic
symbols can convey a sense of simplicity, which may need to be avoided in order
to project an image of authority. A third consideration is the means of display;
mimetic symbols can be used to minimize the need for a legend. This can
be helpful for slides and overhead displays.

Projections
      There are several classes of projections, each with strong points. Equal area
projections show the same amount of space on the earth's surface for any given
area of the map. Equidistant projections show true distance from a point or along
given lines. Conformal projections have a constant scale in every direction from
any given point, and because of this latitude and longitude lines meet at right
angles. Some projections, such as Robinson's, are not mathematical
transformations, but rather, tabular. These projections have been designed for
specific purposes (Arthur Robinson developed his projection for world maps that
are more visually appealing than others, like the Mercator projection). Other
projections include the Mercator and gnomonic projections; the Mercator
projection shows lines of constant compass direction as straight, and the
gnomonic projection shows all great circles as straight lines. These two
projections are best used for navigation and not for general reference or
environmental maps.
Figure 2.3   EPA Region Six shown in Albers conic equal area projection.

-------
                                                                          26
       For environmental maps, in general, equal area projections should be used.
This ensures that symbols, especially those based on size, are not distorted on
the basis of the base cartographic data (such as county or state outlines). For
the 48 contiguous states, Albers conic equal area projection should be used;
Albers conic should also be used for smaller regions that are east-west oriented,
such as EPA Region Six (see Figure 2.3). The sinusoidal equal area projection
should be used for regions that are north-south in extent, such as EPA Region One
(see Figure 2.4). Lambert's azimuthal equal area should be used for areas that
have the same extent in all directions from a center point (see Figure 2.5).
       Because the distortions of any given projection are scale dependent, the
use of any specific projection becomes less critical as the map's scale increases
(and thus the area shown becomes smaller). For example, a map of a hazardous
waste site may need the Universal Transverse Mercator grid for locations within
the site; at this large a scale (1:50,000 or larger) use of the UTM projection is
more appropriate than an equal-area projection. There will be little, if any,
perceivable distortion of areas regardless of the projection used, as long as the
projection is centered on the site.
       The ARC command PROJECT allows transformation of coverages between
projections, as well as realignment of projections. If data is obtained from a
national database, the projection (if not included with the database) may be:

          Project: projection albers
          Project: units meters
          Project: parameters
          1st standard parallel: 29 30 0
          2nd standard parallel: 45 30 0
          central meridian: -96 0 0
          latitude of projection's origin: 23 0 0
          false easting (meters): 0
          false northing (meters): 0
Figure 2.4   EPA Region One shown in the sinusoidal equal area projection.
Figure 2.5   EPA Region Four shown in Lambert's azimuthal equal area projection.
Because this is designed for the contiguous 48 states, any subset of this data should
be reprojected for the subset, even if the output projection is Albers.

-------
                                                                       27
In particular, the standard parallels should be chosen so that they divide the
region into three equal east-west bands, and the central meridian should bisect
the region. This minimizes distortion away from the parallels and prevents the
whole map from appearing to lean in one direction, which results from an
off-center central meridian.
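
The rule for re-projecting a subset can be written as a small calculation. The sketch below is an illustration under stated assumptions: the function name is hypothetical, coordinates are decimal degrees with west longitudes negative, and the region does not cross the 180 degree meridian. The results would still have to be converted to degrees, minutes and seconds before being entered at the PROJECT prompts.

```python
def albers_parameters(south, north, west, east):
    """Return (1st standard parallel, 2nd standard parallel, central meridian).

    The parallels split the region's latitude range into three equal
    east-west bands, and the central meridian bisects its longitude range.
    """
    if north <= south or east <= west:
        raise ValueError("expected south < north and west < east")
    band = (north - south) / 3.0
    first_parallel = south + band
    second_parallel = north - band
    central_meridian = (west + east) / 2.0
    return (first_parallel, second_parallel, central_meridian)
```

For a made-up region spanning 24 to 48 degrees north and 120 to 80 degrees west, this gives standard parallels of 32 and 40 degrees and a central meridian of -100 degrees.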

Scale and Generalization
      As noted above, selection of scale is critical in the selection of an
appropriate projection. Scale is also important in the selection of the total area
to be mapped, and the relation of the area to the data to be mapped. By using a
smaller scale, and thus showing more area, the apparent seriousness of a problem
will be reduced. This is because the apparent extent of a problem is reduced by
showing more of the surrounding area. Zooming in on a site has the opposite
effect: the appearance of symbolization over a large part of the display conveys
the idea that the problem is everywhere. Use of zoomed in areas, which allow
presentation of detail, with a locator map can combine these two extremes (see
Figure 2.6 on page 27). The locator map allows a wide perspective, which, when
combined with the large-scale, detailed map, conveys both a micro and a macro
reading, as Tufte (1990) suggests is a key component of good presentation graphics.
Figure 2.6   Insets allow focusing on detail and give a 'big picture.' When
possible, place them in otherwise unused space.

Space and Time
      With the increased processing speed of single user workstations, and the
increased  flexibility of ARC/INFO, animation of  data—both for analysis and
presentation—is becoming feasible. Animation has  several possible approaches.
Change in time can be used to depict existence, attributes, or change in existence
or attributes.  Change can also be broken into: looking at different  parts of a data
set, one after the other; looking at a data set that shows variation in time, in time
sequence;  or use of progression in time to show another data variable (this is
comparable to use  of an axis of a graph to indicate change in time rather than
change in space).
      There are two primary methods of creating animations in ARC/INFO, and
both require AML  programming. The most flexible method is the use of AML
driven interactive map composition.  This allows the display of  an  object (for
example a point location, such as the population center of the United States) that
changes over time. The object can be drawn at one location, erased with the
MDELETE command, and redrawn at a  new  location. The other method of
animation is the use of GRID to display changes in area data. The cliche 'Raster is
Faster' is still true enough to make a difference. Separate grid  layers can be
generated, each of which shows the change in areal extent of a phenomenon. Each

-------
                                                                        28

layer can then be drawn with successive calls to a display routine.

Classifying Data
      Data to be mapped for presentation should generally be classified (into five
or six groups) to ease interpretation. When drawing features in ARCPLOT, the
commands ARCLINES, LABELMARKERS, POINTMARKERS, and POLYGONSHADES, as
well as AML-driven RESELECTs, allow the use of lookup tables to define a
data-to-symbol relationship for feature display (the CLASS command also allows
grouping of data).  These tables allow user-defined
ranges for  cartographic products, rather than the default of directly relating the
data item identifications with the symbol set identifications. These ranges must be
determined by the  cartographer and range types  include: quantiles, equal
intervals, geometric progressions, mean and standard deviation intervals, and
natural breaks (the Jenks' Optimal classification) (see Figure 2.7). Certain data sets
may need to be classified on the basis of predetermined breaks (such as maximum
allowable concentration of a pollutant). This can be accomplished by manually
specifying  all breakpoints, or by use of another classification  scheme, with  the
externally defined break added in. For example: run the Jenks' optimal AML for
five classes, note the breakpoints, then run the manual specification AML on a new
lookup table and specify the Jenks' breakpoints plus the predefined point.
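
Two of the range types listed above, and the merging of a predetermined break, can be sketched as follows. This is not one of the AMLs distributed with this guideline; the function names are hypothetical, and each breakpoint is assumed to be the upper bound of its class.

```python
def equal_interval_breaks(values, n_classes):
    """Return n_classes - 1 breakpoints that split the data range evenly."""
    if n_classes < 1:
        raise ValueError("n_classes must be at least 1")
    low, high = min(values), max(values)
    width = (high - low) / n_classes
    return [low + width * i for i in range(1, n_classes)]


def quantile_breaks(values, n_classes):
    """Return n_classes - 1 breakpoints with roughly equal counts per class."""
    ordered = sorted(values)
    n = len(ordered)
    if not 1 <= n_classes <= n:
        raise ValueError("n_classes must be between 1 and the number of values")
    # The break for class i is the last value that falls in that class.
    return [ordered[(n * i) // n_classes - 1] for i in range(1, n_classes)]


def add_fixed_break(breaks, fixed_break):
    """Merge an externally defined breakpoint into an existing set of breaks."""
    return sorted(set(breaks) | {fixed_break})
```

For example, breaks of 2, 4, 6 and 8 combined with a predetermined limit of 5 give the five breaks 2, 4, 5, 6 and 8, and therefore six classes.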

Manual Classification
      For  both the CLASS command and  lookup tables, break points can be
chosen manually, by exporting the data to a statistical package, or with  the
assistance of the ARCPLOT command STATISTICS. The command syntax is:

STATISTICS   
       - contains the items to be classed;
       - includes POINTS, ARCS and POLYS.
[Figure 2.7 panels: Jenks', Equal Interval, Quantile, Mean and Standard Deviation, Exponential]
Figure 2.7   Breakpoints set by different classification schemes. Equal Interval is
close to the statistically optimal (Jenks'), for this example. Quantile separates
relatively similar values at the extremes of the distribution. Mean and Standard
Deviation fails because the distribution is bimodal, not normal; Exponential fails
by grouping all of the items greater than 256 into one category.

-------
                                                                        29


STATISTICS has several subcommands:

         SUM 
         MEAN 
         MINIMUM 
         MAXIMUM 
         STANDARDDEVIATION 
             is a field in the attribute table of .
         END - indicates all subcommands have been entered.

These values can be used to calculate mean and standard deviation break points,
as well as quantiles and geometric progressions.
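
As an illustration of how the summary values might be turned into break points, the sketch below derives mean and standard deviation breaks and a geometric progression. The function names and the default multiples of the standard deviation are assumptions of this sketch, not part of ARCPLOT.

```python
def mean_sd_breaks(mean, sd, multiples=(-1.0, 0.0, 1.0)):
    """Breakpoints placed at the mean plus the given multiples of the standard deviation."""
    return [mean + m * sd for m in multiples]


def geometric_breaks(minimum, maximum, n_classes):
    """Return n_classes - 1 breakpoints whose successive ratios are constant.

    Requires 0 < minimum < maximum, because a ratio is undefined otherwise.
    """
    if minimum <= 0 or maximum <= minimum:
        raise ValueError("expected 0 < minimum < maximum")
    if n_classes < 1:
        raise ValueError("n_classes must be at least 1")
    ratio = (maximum / minimum) ** (1.0 / n_classes)
    return [minimum * ratio ** i for i in range(1, n_classes)]
```

With a minimum of 1, a maximum of 16 and four classes, the geometric breaks fall at 2, 4 and 8.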
      There are several methods of defining classes within ARC/INFO, as well as
several  methods of determining class breaks.   As MacEachren (1992) notes,
interval and quantile classification systems,  which are automated parts of the
CLASS  command, generally  are not the best methods for grouping data and
therefore will not be discussed here. The syntax for manually specifying intervals
with the ARCPLOT command CLASS is:

CLASS MANUAL <#classes> 
      <#classes> - the number of classes that will be generated;
       - the numeric class breaks. There must be <#classes> -1 values.

CLASS NONE - turns off the classification scheme.

Until the CLASS NONE command is given the classification remains in effect, and
will cause all subsequent uses of ARCLINES, LABELMARKERS, POINTMARKERS,
and POLYGONSHADES to be classified.
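
The relationship between the number of classes and the number of breaks can be illustrated outside ARCPLOT. In the sketch below the function names are hypothetical, and placing a value that equals a break in the lower class is an assumption, not a statement about how CLASS treats boundary values.

```python
from bisect import bisect_left


def check_breaks(n_classes, breaks):
    """Confirm that there is one fewer break than classes and that breaks ascend."""
    if len(breaks) != n_classes - 1:
        raise ValueError("expected %d breaks" % (n_classes - 1))
    if list(breaks) != sorted(breaks):
        raise ValueError("breaks must be in ascending order")


def class_index(value, breaks):
    """Return the 0-based class of `value` for ascending class breaks."""
    return bisect_left(breaks, value)
```

Four breaks at 10, 20, 30 and 40 therefore define five classes, numbered 0 through 4.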
      Another method of classifying data is to use lookup tables. These are INFO
files that perform a function similar to that of the CLASS MANUAL command. A manual
procedure from ARC for the creation of a lookup table is:

         1) Use PULLITEM to extract the attribute table item that holds the data values;
         2) Use ADDITEM to add a numeric field called SYMBOL;
         3) Enter INFO, SELECT the table and PURGE the old data;
         4) ADD records to the lookup table;
            a) Specify the breakpoint value for numeric data, or the alphanumeric for text data;
            b) Specify the symbol number;
         5) SORT on the data field (not the SYMBOL field).

      It is possible to automate the creation and specification of lookup tables
within ARCPLOT.  SETMAN.aml allows the manual update of symbol values
within a lookup table, as well as creation of a new table.  The AML handles both
nominal and numeric data. The syntax is:

SETMAN 
       will be created if it does not exist, existing tables will be selected for update.

Natural Breakpoints
      Because the Jenks' Optimal classification scheme is generally considered to
provide the best classification of  numeric data  (MacEachren 1992), it should be
included in ARC/INFO (and with any mapping program). Unfortunately it is not,

-------
                                                                        30
but ARC/INFO does provide a macro language, commands to export data, and
a method for calling system programs.  These can be combined to allow the
automated generation of the breakpoints for Jenks' optimal classification.  The
AML, JENKS.aml, exports the data to be classified, calls the C program, jenks
(which must be in the executable search path), that calculates the breakpoints, and
constructs a lookup table.  The cartographer must still specify symbol matches
though.  Please note that the routine generates a lookup table that contains more
information than is required for ARCPLOT's use (although ARCPLOT can still use
it); this additional information is included for use by the AMLs presented in later
chapters. The additional information is: an initial line that is one smaller than the
smallest data value (this is included only in non-nominal lookup tables, including
the output from Jenks'); a column that contains the number of values in the
associated coverage in the first record, and the number of values in each category
in the following records; and a column that contains the coverage name in the first
record, an 'n' (for nominal data), an 'o' (for ordinal data) or an 'r' (for interval/ratio
data) in the second record, the coverage type in the third record (point, arc, etc.),
and for route and section lookup tables, the route name in the fourth record. This
additional information allows these AMLs to select the coverage the table is based
on and, for legends, provide category totals and appropriate symbolization.
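
For readers who want to see what an optimal classification computes, the sketch below finds the class limits that minimize the total within-class sum of squared deviations, using a generic dynamic programming formulation. It is not the jenks.c program distributed with this guideline, the function name is hypothetical, and it returns only the upper bound of each class rather than the lookup table described above.

```python
def jenks_breaks(values, n_classes):
    """Return the upper bound of each class for an optimal (natural breaks) grouping."""
    data = sorted(values)
    n = len(data)
    if not 1 <= n_classes <= n:
        raise ValueError("n_classes must be between 1 and the number of values")

    # Prefix sums give the squared deviation of any sorted slice in constant time.
    prefix = [0.0]
    prefix_sq = [0.0]
    for v in data:
        prefix.append(prefix[-1] + v)
        prefix_sq.append(prefix_sq[-1] + v * v)

    def ssd(i, j):
        """Sum of squared deviations from the mean of data[i:j]."""
        total = prefix[j] - prefix[i]
        return (prefix_sq[j] - prefix_sq[i]) - total * total / (j - i)

    inf = float("inf")
    # cost[k][j]: smallest total for the first j values split into k classes.
    cost = [[inf] * (n + 1) for _ in range(n_classes + 1)]
    start = [[0] * (n + 1) for _ in range(n_classes + 1)]
    cost[0][0] = 0.0
    for k in range(1, n_classes + 1):
        for j in range(k, n + 1):
            for i in range(k - 1, j):
                candidate = cost[k - 1][i] + ssd(i, j)
                if candidate < cost[k][j]:
                    cost[k][j] = candidate
                    start[k][j] = i   # last class covers data[i:j]

    # Walk back through the stored split points to recover the class limits.
    bounds = []
    j = n
    for k in range(n_classes, 0, -1):
        bounds.append(data[j - 1])
        j = start[k][j]
    return bounds[::-1]
```

The triple loop makes this practical only for modest numbers of observations, which is one reason a compiled program is a reasonable choice for larger coverages.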
      The C program jenks.c can be compiled on a workstation with an ANSI C
compiler (including Data General's DG/UX 5.4) by using the command line:

         cc -ansi -o jenks jenks.c -lm

      The command syntax for using JENKS.aml is:

JENKS      
       - the coverage that the lookup table will be created for;
       - the feature type (point, line, poly, etc) of ;
       - an interval or ratio level  data field in the attribute table of ;
       - the name of the lookup table to be generated;
       - the number of data classes in the output lookup table.

Eyton's Equiprobability Ellipse Bivariate Classification
      The uniqueness of Eyton's ellipse as a bivariate mapping scheme requires
that a column be added to the attribute table of the data to be mapped. As with the
calculation of Jenks' optimal classifications, this is best done with a combination of
macro and C program.  EYTON.aml requires two ratio data items, because the
classification system is based on the parametric statistic, Pearson's r.  It also
requires that the system program, eyton, be in the executable search path.  Like
jenks.c, the eyton.c program can be compiled on a workstation with an ANSI C
compiler by using the command line:

         cc -ansi -o eyton eyton.c -lm

-------
                                                                          31
The command line syntax for using EYTON.aml is:

EYTON     
          {classes} {chi_square_value}
        - the coverage that the lookup table will be created for;
        - the feature type (point, line, poly, etc.) of <cover>;
         - interval or ratio data fields in the attribute table of <cover>;
        - the output lookup table;
       {classes} - the number of classes on each axis; valid codes are 2 (default) or 3;
       {chi_square_value} - a value for selecting the number of points in the central ellipse;
          defaults to 1.386, which encloses 50% of the observations.
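
The relation between the default of 1.386 and 50% of the observations can be checked
with a short Python sketch. It assumes that {chi_square_value} is a chi-square
statistic with two degrees of freedom and that the two data items are roughly
bivariate normal; both are assumptions that agree with the stated default, not
details taken from eyton.c. The function names are hypothetical.

```python
import math


def ellipse_fraction(chi_square_value):
    """Fraction of bivariate-normal observations expected inside the
    ellipse for a chi-square value with two degrees of freedom."""
    return 1.0 - math.exp(-chi_square_value / 2.0)


def chi_square_for_fraction(fraction):
    """Chi-square value (two degrees of freedom) enclosing `fraction`."""
    if not 0.0 <= fraction < 1.0:
        raise ValueError("fraction must be in [0, 1)")
    return -2.0 * math.log(1.0 - fraction)


print(round(chi_square_for_fraction(0.5), 3))   # 1.386
print(round(ellipse_fraction(1.386), 3))        # 0.5
```

Under the same assumptions, a central ellipse holding a different share of the
observations can be requested by passing the matching chi-square value.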

Symbol Value Update
       The  lookup  tables  generated by  these commands  may  require  the
modification  of the symbol numbers.   There  are  several ways  this can be
accomplished: use of '&SYS ARC INFO' to enter INFO from ARCPLOT; manually
declaring a  cursor and using it to update the lookup table; using the AML given
above for manually changing values; or use of the AML, SETAUTO.aml.  This
AML updates a lookup table by replacing the SYMBOL values with an ordered set
of numbers. It requires a starting value, a step value, and can optionally be given
two other values.  Nonlinear progressions  can be specified by including a 'scale'
value,  and  decreasing   numbers  can  be  obtained   by   specifying  a
'subtraction_value.'  These non-linear progressions should be used when a
SYMBOL value specifies something other than a predefined symbol, such as
color value. In this case value should range from black to very light grey,
with a greater change in the black/dark-gray values (value differences are easier
to discriminate for lighter values). For five classes, a starting value of 0, a step
value of 50 and a scale of 0.8 (with, if necessary, 69 as the subtraction value) will
set up an appropriate value progression.

SETAUTO     {scale} {subtraction_value}
        - the lookup table to be updated;
        - the beginning value of the progression;
        - the value used to change the beginning value;
       {scale} - an exponent that is applied to --defaults to 1;
       {subtraction_value} - value that a symbol value will be subtracted from in order to generate
          decreasing progressions
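
A progression of this kind can be sketched in Python. The formula used here,
start + (class_index * step) ^ scale, is an assumption and is not taken from
SETAUTO.aml; it was chosen because it reproduces the five-class example in the text,
where 0, 50 and 0.8 give a largest value of about 69, the suggested subtraction value.
The function name setauto_values is hypothetical.

```python
def setauto_values(start, step, n_classes, scale=1.0, subtraction_value=None):
    """Ordered symbol values under the ASSUMED formula
    start + (class_index * step) ** scale, optionally reversed by
    subtracting each value from subtraction_value."""
    values = []
    for i in range(n_classes):
        v = start + (i * step) ** scale
        if subtraction_value is not None:
            v = subtraction_value - v   # produces a decreasing progression
        values.append(round(v))
    return values


print(setauto_values(0, 50, 5, 0.8))        # [0, 23, 40, 55, 69]
print(setauto_values(0, 50, 5, 0.8, 69))    # [69, 46, 29, 14, 0]
```

With the scale left at 1 the same function returns an evenly stepped series, which is
the linear case described above.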

Unclassed  Maps
       Unclassified maps (those that are continuously symbolized) can be created
in ARC/INFO, but the symbolization schemes available can be limiting and the
potential improvement in the ability to depict data (as suggested by Monmonier,
1976) may not justify the  effort for presentation of data.  For exploration and
analysis however, unclassified  maps are an approach that  may be worthwhile,
although  software/hardware limitations  can  impinge  on truly  unclassified
displays.  Eight  bit plane graphic  displays can only display 256 colors at once,
thereby preventing unclassed display of data sets with more than 256 values. This
can be partially circumvented by using the  Jenks' optimal classification system to
create a lookup table with 256 classes. For data sets with fewer than 256 data values
or for systems with 'true color' (24 bit plane) displays, use the manual lookup table
definition AML (SETMAN.aml) to define a  nominal lookup table. A limitation of

-------
                                                                          32

'true color' displays is that 24 bit graphics systems can only generate 256 shades of
gray. So, in general, for 'unclassed' maps, create a lookup table with 256 categories
(or fewer if there are not 256 data values); once a lookup table is created, symbol
values must be assigned on the basis of the lookup table's item value. SETUNCL.aml
accomplishes this. The command syntax is:

SETUNCL  {scale_factor} {subtraction_value}
        - the table with symbol values to be updated;
       {scale_factor} - an exponent for generating non-linear scaling, defaults to 1;
       {subtraction_value} - allows creation of descending scales.

Page Layout
       Once  the  information to  be displayed is classified, the data must be
displayed.  This is the process of page layout, which can be broken into: layout
within parts and layout within the whole.  Layout within parts of a  display
involves the selected use of visual variables to highlight a part or parts of a data set
or display.  This is the  establishment of graphic hierarchies and involves two
techniques: visual isolation, and visual levels (MacEachren 1992, 27) (see Figures
1.1 on page 2 and 1.7 on page 14).
       Visual isolation is the use of visual variables to give the appearance of the
separateness of part of a display.  The visual  variables that can accomplish this
include location,  size,  focus, value, hue,  saturation, texture and  orientation.
Location is the most obvious control: display items that are not close together will
not generally be associated (see Figure 2.8).  Size can influence isolation because
any feature that is drawn smaller than the area allocated to it will appear separate
from the surroundings (this is done in cartograms).  Focus influences isolation
because unfocused, fuzzy symbols blend into the surroundings, reducing isolation.
Value, hue, saturation, texture and orientation influence isolation by enabling one
item to be displayed in a manner that is noticeably different from its surroundings
(dark blue surrounded by yellow, for example), thus isolating it.

Figure 2.8  Visual isolation. The letters in each word 'go together,' as do the lines
and their labels, but not the lines and the display title. (The figure shows the display
title 'Isolation as Separateness' and two lines labeled '1 mile' and '1 kilometer.')

       Like visual isolation, visual levels are used to separate an item from its
surroundings, but whereas visual isolation seeks to accomplish that in the x-y
space of the page, visual levels give the appearance of separation in the third
dimension. The visual variables that control visual levels are: size, value,
saturation, texture, focus and hue.  These visual variables all play a part in depth
perception.  Size controls visual levels because things that appear larger appear to
be closer.  Value controls visual levels by control of contrast from the display
background: objects with high contrast will appear above the background (see
Figure 2.9 on page 32).  Saturation controls contrast because intense colors appear
to be closer than less-intense colors.

-------
                                                                        33
Texture influences visual levels because coarse textures appear to be closer to the
observer.  Focus influences visual levels because sharply focused objects appear to be
closer than fuzzy objects.  Hue influences visual levels because, at a constant value
and saturation, yellow is more noticeable than other colors, but the influence of hue
can be hard to predict and control.

Figure 2.9  Visual levels. Darker value usually is a foreground (at a higher level)
than lighter values, particularly on white backgrounds.

      In designing a page, the layout within the whole page involves the use of space
(and for animation, time) to present different parts of a message. The type of display
has an influence on what can, and should, be said on an image.
Paper displays can be of any size, but two are of more importance: 8.5 by 11 inch
paper, and full size electrostatic plotters. 8.5 by 11 inch paper is probably the most
common format for graphic publication.  Its small size restricts the amount of
information that can be shown on one page, but generally high resolution and the
ability of the  reader to study  the  graphic at length permit  a great deal  of
information communication.  The portrait orientation, which is normally used
with text, limits the left-right extent of a display, so graphics that have a large left-
right extent should be  displayed as landscape, when the graphic cannot be
redesigned to take advantage of the more common reading orientation. Generally
the primary part of the display should be shown with a maximum left-right extent.
Titles can then be placed above, and additional  information such as legends and
locator diagrams can be placed below the primary part of the display.
      Large size paper products, such as posters, can  be  created with  less
emphasis on the orientation of the page, and more emphasis on the size and shape
of the area of interest.  Orientation of the page should follow the larger axis of the
data set.  Additional information should then be placed around the main part of
the graphic, in order to take advantage of 'dead space.'
      Slides and overheads limit the amount of data that can be displayed because
of the distance that they force on the viewer (see Figure 2.10 on page 34). The
farther a viewer is from the display, the smaller the display will  appear, limiting
the visibility of detail. This implies that the page layout decisions for slides and
overheads require greater generalization in both space and data. Only the most
important information needed to support a point should be included on the
display.  As such, additional information such as legends and locator diagrams
should not be included.  This allows the maximization of the available space for
data display, and like posters, the orientation of the display should be aligned with
the larger axis of the data area.  This loss of data on the graphic is offset by the
explanations of the presenter of the slide, overhead or video.
      Video involves both high resolution computer monitors, and lower
resolution National Television System Committee (NTSC) displays.  Video displays

-------
                                                                        34

must, like slides, take into account the distance from the viewer, as well as lower
resolutions and display flicker. Minimize the amount of ancillary information, in
order to maximize the amount of space available for the primary data. Like slides,
this loss of visual information must be offset by audio explanations.

Titles and Type
      The text that is used to label a map or graphic should be selected carefully.
This involves the determination of what should be labeled,  how it should be
labeled, where the labeling should be placed, and what type of lettering to use.
Which items must be labeled should be identified during the planning phases of
the map design, particularly in identification of the communication goal—label
those things that are necessary to accomplish the goal. Additional labeling may be
necessary for locational reference, etc., but this ancillary labeling should be kept
to a minimum in order to emphasize the new information.
      As with the number of features that get labeled, the amount of text in each
label should be kept to a minimum. Often leading 'a,' 'an' and 'the' can be dropped
from labels. Phrases such as 'the city of,' 'plot of' or 'Legend' can also be dropped
without loss of information and result in an improvement in communication by
getting rid of the clutter.
      Labels should be placed on or near the object they label. Locations for titles
can include prominent positions such as top center, but may also include making
use of otherwise empty space (such as using the Gulf of Mexico's space in a map
of Florida). Labels of point features should, when possible, be put at the upper left
of the feature.  Labels of linear features should be placed on straight sections of the
feature, or if necessary fit along a smooth curve, and be oriented for maximum
readability (top up, whenever possible). Area features should be labeled inside
their boundaries, when possible.  Although it is not entirely avoidable, text
should not be broken up by linear features or area boundaries; it may be preferable
to have a break in the line, although this can be difficult to accomplish in
ARC/INFO.
      There are  two general  classes of fonts that are used for cartographic
displays, roman and sans-serif, both of which can be italicized, underlined, etc.
Roman fonts, such as the font used for this text, have 'serifs': the small extensions
at the end of the strokes that make up the character.  These fonts usually are more
legible than fonts without (sans) serifs, particularly for small text. Sans-serif fonts
are simpler, but are generally best used only for titles and other large text. For
either type of font that may be used to label a body of water, it is cartographic
tradition to use an italic font. This may connote a sense of flowing.

Figure 2.10  The apparent size of 16 point text from 8 feet away (top) is the same as
6 point lettering from 3 feet (bottom) (MacEachren 1992, 71).

      The size of text influences legibility, particularly when the speed of a
presentation is controlled by the presenter, rather than the reader.  For paper
displays that the reader will be able to study at length, text as small as 5 points
(0.07 inches) can be used.  Differences in sizes should be about 35% (0.07 inches/5 pt,
..., 0.13 inches/9 pt, etc.) (Keates 1982). For paper displays which are

-------
                                                                       35

not available for in-depth study (such as flip charts and posters), the following
suggestions for slides and overheads are more appropriate, although detailed
information could be included in posters.
      For slides and overheads, not only must there be less text, but the text
should be larger (see Figure 2.10 on page 34). As a general rule, text should be at
least 18 point, with larger size differences than on paper (50%) (MacEachren 1992, 71).
Video displays also need to make use of large point sizes, because of lower
resolution and, with some systems, flicker.
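
The effect of viewing distance on text size can be estimated with simple proportions.
The following Python sketch assumes that apparent size depends only on the ratio of
lettering size to viewing distance (similar triangles), which is the relationship
illustrated by Figure 2.10; the function names are hypothetical.

```python
def equivalent_point_size(point_size, viewing_distance, new_distance):
    """Point size that looks the same from new_distance as point_size
    does from viewing_distance (both distances in the same units)."""
    if viewing_distance <= 0 or new_distance <= 0:
        raise ValueError("distances must be positive")
    return point_size * new_distance / viewing_distance


def distance_for_same_apparent_size(point_size, viewing_distance, new_point_size):
    """Distance at which new_point_size matches point_size seen from
    viewing_distance."""
    if point_size <= 0 or new_point_size <= 0:
        raise ValueError("point sizes must be positive")
    return viewing_distance * new_point_size / point_size


# 16 point text seen from 8 feet compared with text read at 3 feet.
print(equivalent_point_size(16, 8, 3))            # 6.0
print(distance_for_same_apparent_size(16, 8, 6))  # 3.0
```

The same proportion can be used to check whether lettering planned for a slide will
be legible from the back of a room.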

Insets and Legends
      There are two main types of insets: locator diagrams and additional
graphics.  Locator diagrams show the location of a smaller area, in relation to a
larger and presumably more well known area (see Figure 2.11). When multiple graphics
are generated for the same region, only one locator diagram should be needed, and it
may be advisable simply to make the locator diagram a separate graphic.  For displays
which need integrated location diagrams, they should be visually isolated: the
diagram should be secondary to the main display. Therefore avoid bright colors, if not
colors entirely, but ensure the area of the main display is readily discernible.

Figure 2.11  A simple locator diagram.

      Other graphics include legends, north  arrows,  scales and, for bi- or
multivariate displays, monovariate maps.  Legends must convey the information
necessary to extract information from the main display.  North  arrows assist in
orientation when north is not at the top, which it does not have to be. For maps to
be presented to an international  audience, include a north arrow, because south
may be expected at the top (as is the Chinese custom). For maps where a large and
hopefully well-known area is displayed (the United States or a subset of them, for
example),  a north arrow is not needed.  Map scales, like north arrows, are not
necessary for maps of well known areas and generally are not needed, unless the
map may  be used  for measurement of extent.   As suggested by Olson (1980),
monovariate maps may be helpful in assisting readers who are not familiar with
bi- or multivariate maps. These single variable maps should be designed to be
both isolated and at a lower level in the visual hierarchy than the main display.
Monochrome displays may be the most appropriate technique for all but nominal
data.
      Because of the requirements of different media, the amount of information
that should be carried in a legend varies, as does the need for a legend. Because of

-------
                                                                       36
the generalization that goes into slides and video displays, the need for legends
should  be minimal, particularly  when the presenter  is available  to answer
questions. A slide for a legend may be a waste, because it cannot be referred back
to, and handing  out a legend before a presentation may only  serve to focus
attention on the legend, not on the data that is being presented.
      Paper displays, particularly those where the author is not available to
answer questions, require the most thoughtful  use of additional graphics.
Explanations of data must be written out on the graphic.  For bi- and multivariate
displays, the mapping technique  should be explained,  and monovariate maps
should be included. Legends should present the character of the data (data level,
continuous or discontinuous, classification system used).  This can, in part, be
accomplished by defining discrete variable symbols separately with a label on
each  symbol, and  by defining  continuous  variables  as a  continuum  with
breakpoints, rather than midpoints, labeled (see the legends in Figures 5.1 on page
67 and 5.2 on page 69, for example).
      An approach to legend design that may be most helpful to inexperienced
map readers is natural legends (see Figures 6.2 on page 86 and 9.1 on page 99).
Natural legend design is particularly helpful with abstract symbolization, such as
isolines, which can be difficult to interpret by a novice (especially when only given
a contour interval).  By providing a legend that shows isolines on a surface and
spot values, the concept of a continuous surface and how data values are shown
can be communicated. Isoline data, particularly when several  data items are
shown together, can be displayed as a single-variable, fishnet  inset; this  will
convey the concept of a continuous surface and allow analysis of individual data
sets, in addition to the presentation of the interplay of multiple data sets in the
main display.

ARC/INFO Hints
      There are two lines that should be added to the  $HOME/app-defaults/
Mwm file. These lines allow ARC to automatically place popup windows:

         *clientAutoPlace: FALSE
         *interactivePlacement: FALSE

      There are a few general tools available in ARCPLOT that may assist in the
design or production of maps and  graphics.  When  using 'DISPLAY 9999,'
ARCPLOT defaults to a black background; this is a problem with the 'true color'
display of value: black objects will be visible on paper, but invisible on the display,
and vice-versa for white.  The remedy for this is to include the following line in
either a .cshrc or .login file:

         setenv CANVASCOLOR WHITE

      When using nineteen  inch  monitors, it is  possible to specify a display
canvas size that has (as far as ARC is concerned) the same dimensions as an 8.5 by
11 inch sheet of paper.  The dimensions for portrait layout are 691 by 930.  The
dimensions for landscape  layout are 896 by 727.
      All AMLs  can be  executed in two ways: specifying an AML path and
running with the &RUN directive; or by placing the AMLs in an  atool directory

-------
                                                                         37

and running them as a standard command. The $ARCHOME/atool directory can
be used, or the AMLs can be placed in another directory and linked into
$ARCHOME/atool subdirectories.  The AMLs can also be placed in any other
directory, with subdirectories for each module (arc and arcplot), using the
&ATOOL directive to specify the directory path (see the &ATOOL page in the
AML User's Guide).
      The AMLs discussed in this document, for the most part, follow a naming
convention. For the first character:
      P = point or node displays.
      L = line, route or section displays.
      C = choropleth area displays.
      G = graduated symbol area displays.
      D = dot density area displays.
      R = raster (grid) area displays.
      S = surface displays.

For each additional character (one for each variable displayed):
      H =hue.
      V = value.
      I  = intensity.
      O = orientation
      S = shape.
      T = texture.
      Z = size.
      B = box (used with points only).
      C = circle (used with points and graduated symbols only).
      G = cartoGram (used with graduated symbols only).
      D = density (used with dot density only; all are DD).
      F = fishnet (used with surfaces only).
      AO    = angle (for single variable displays only: cao, pao)
      ISO   = isoline (used with surfaces only).

Other codes include:
      P = pie (graduated circles with subdivisions).
      L = legend (gcl, ggl, gpl, sfl; al - single variable; ol - orientation)
      BL = bivariate legend.
      CC= complementary colors.
      DH= dual hue colors.
      EE = equiprobability ellipse (and eel - legends).
      IL = choropleth area intersecting lines with PAT data (and cill - legends).
      TL = choropleth area intersecting lines with lookup tables.
      RGB  = red, green and blue (and rgbl - legends).
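
The naming convention can be expressed as a small lookup, which may help when
reading the AML names used in the following chapters. This Python sketch is an
illustration only: it covers the first-character and additional-character codes listed
above, does not handle the 'other codes' (legends, pies, etc.), and the function name
decode_aml_name is hypothetical.

```python
FEATURE_CODES = {
    "P": "point or node", "L": "line, route or section",
    "C": "choropleth area", "G": "graduated symbol area",
    "D": "dot density area", "R": "raster (grid) area", "S": "surface",
}

VARIABLE_CODES = {
    "ISO": "isoline", "AO": "angle",
    "H": "hue", "V": "value", "I": "intensity", "O": "orientation",
    "S": "shape", "T": "texture", "Z": "size", "B": "box", "C": "circle",
    "G": "cartogram", "D": "density", "F": "fishnet",
}


def decode_aml_name(name):
    """Split an AML name such as 'PSZ.aml' into its display type and the
    list of visual variables, trying the longest codes first."""
    stem = name.upper()
    if stem.endswith(".AML"):
        stem = stem[:-4]
    if not stem or stem[0] not in FEATURE_CODES:
        raise ValueError("unknown display type in %r" % name)
    variables, pos = [], 1
    codes = sorted(VARIABLE_CODES, key=len, reverse=True)
    while pos < len(stem):
        for code in codes:
            if stem.startswith(code, pos):
                variables.append(VARIABLE_CODES[code])
                pos += len(code)
                break
        else:
            raise ValueError("unknown code at position %d in %r" % (pos, name))
    return FEATURE_CODES[stem[0]], variables


print(decode_aml_name("PSZ.aml"))   # ('point or node', ['shape', 'size'])
print(decode_aml_name("PAO.aml"))   # ('point or node', ['angle'])
```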

-------
                                                                     38

                           Chapter Three
                Point Symbolization in ARC/INFO

      For point symbolization, the visual variables of hue, orientation and shape
can be  changed by the use of an INFO lookup  table that relates nominal
distinctions to different symbols; the visual variables of size and value can be used
to symbolize ordinal and interval/ratio data either with a lookup table or with
data from the file attribute table.  For lookup table use, one of the methods
discussed in Chapter Two should be used to generate a lookup table. The AMLs
discussed in this chapter generally require some of the additional information that
the lookup table generating AMLs in Chapter Two provide.

Monovariate Symbolization
      The macros in this chapter demonstrate the use of AML to accomplish map
design.  The AMLs become progressively more complex in order to set the stage
for bi- and multivariate mapping, which, for all intents and purposes, must be done
with AMLs (most of which also require the use of cursors).
      Each of the examples makes use of  the markers that are provided with
ARC/INFO: a  default set of markers in the markerset file, PLOTTER.MRK; and
several  other  available markersets: BW.MRK, COLOR.MRK, MINERAL.MRK,
MUNICIPAL.MRK, OILGAS.MRK, USGS.MRK, and WATER.MRK. Most of these
are displayed in the Map Display and Query guide's Appendix B, and the ability to
create new symbols sets is discussed in Chapter Three of Map Display and Query.

Nominal Data-Hue
      Hue is  best  used for  nominal  data, and the COLOR.MRK markerset
provides fifteen symbols of the same shape, but each with a different color. All of
the other markersets, except BW.MRK, have the same symbol in black, red, green
and blue.  An appropriate markerset can  be selected  with the MARKERSET
command, and used with a lookup table that relates changes in data values to
changes in hue.  POINTMARKERS can then be used in conjunction  with  the
lookup table to plot the points. This is the method used in PH.aml; see Figure 3.1a.

PH    
       - the coverage to be displayed;
       - an item in the point attribute table of <cover>;
       - an info table that relates  values to markers;
       - a markerset file.

Nominal Data-Orientation
      Orientation is  best used for nominal  data, and the  MINERAL.MRK
markerset provides some ready-to-use symbols that vary in orientation (such as
markers 143 through 149). PH.aml (above) uses command line arguments to select
this markerset and draw these symbols; see Figure 3.1b.

Nominal Data-Shape
      Shape should only be used for nominal data. All of the markersets, except
COLOR.MRK, have symbols that change in shape. The macro PS.aml uses cursors
to extract the coverage name and type (point or node) from the lookup table (which

-------
                                                                      39

Figure 3.1a: PH.aml  Nominal data displayed with hue using the color.mrk markerset.
Figure 3.1b: PH.aml  Nominal data displayed with orientation by using symbols from
mineral.mrk.
Figure 3.1c: PS.aml  Nominal data displayed with shape.
Figure 3.1d: PAO.aml  Ratio data displayed with symbol orientation.

-------
                                                                         40
must be in the format generated by the lookup table AMLs discussed in Chapter
Two), and then draws the points with the lookup table; see Figure 3.1c.

PS  {markerset}
       - a lookup table generated by SETMAN;
      {markerset} - a markerset file; the current markerset is the default.

Ordinal to Ratio Data-Orientation
      Orientation is  best used for nominal  data, but ARC/INFO allows the
specification of  orientation that can be used with higher data levels. For point
symbols,  the  MARKERANGLE command allows specification of the  drawing
angle of the current symbol.  The macro, PAO.aml, uses cursors to determine the
minimum and maximum data values to be displayed, and then uses these values
to adjust the drawing angle of the selected marker; see Figure 3.1d.

PAO   {markerset} {markersymbol} {markersize}
          {max_angle} {point|node}
       - the coverage to be displayed;
       - a numeric data field in the attribute table of <cover>;
      {markerset} {markersymbol} - specify a marker; the current marker is the default;
      {markersize} - drawing size of the marker; the default is 0.15 inches;
      {max_angle} - angle of the maximum data value; the default is 179 degrees;
      {point|node} - symbolize point (the default) or node feature.
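
One way to turn a data value into a drawing angle is a linear scaling between the
minimum and maximum data values. The following Python sketch assumes that the
minimum value is drawn at 0 degrees and the maximum at {max_angle}; the text states
only that the minimum and maximum are used to adjust the angle, so the linear form
and the function name marker_angle are assumptions.

```python
def marker_angle(value, minimum, maximum, max_angle=179.0):
    """Drawing angle in degrees for a data value, ASSUMING a linear
    scale from 0 degrees (minimum) to max_angle (maximum)."""
    if maximum == minimum:
        return 0.0   # all values identical; nothing to scale
    return (value - minimum) / (maximum - minimum) * max_angle


for v in (10, 20, 30):
    print(v, marker_angle(v, 10, 30))   # 0.0, 89.5, 179.0
```

A maximum of less than 180 degrees keeps the largest and smallest values from being
drawn with the same apparent orientation when the symbol is symmetric.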

Ordinal to Ratio Data-Value
      Color value is best used  for  ordinal data.   For  point symbols, the
MARKERCOLOR  command allows specification of  the  drawing color of the
current symbol.  ARC/INFO has a  Munsell-like color specification: the HLS color
model; the parameters for the HLS model are hue, lightness and saturation.
Hue is an integer from 0 to 360 (red = 0, green = 120, blue = 240). Lightness is an
integer from 0 to 100 (black = 0, white = 100).  Saturation is an integer from 0 to 100
(gray = 0, fully saturated = 100).  Changes in value are specified by setting hue to
any valid number, adjusting lightness to control the grayness (for output on white
paper, generally use a percentage less than 90), and setting saturation to 0. The
macro, PV.aml, uses MARKERCOLOR to control
value; see Figure 3.2a.
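
The HLS ranges described above can be explored outside ARC/INFO with Python's
standard colorsys module. This sketch assumes that the ARC/INFO HLS model behaves
like the standard HLS model in colorsys (the stated anchor points for red, green, blue,
black, white and gray agree with it); the function names are hypothetical.

```python
import colorsys


def arc_hls_to_rgb(hue, lightness, saturation):
    """Convert hue (0-360), lightness (0-100) and saturation (0-100)
    to an (r, g, b) tuple of integers from 0 to 255."""
    r, g, b = colorsys.hls_to_rgb((hue % 360) / 360.0,
                                  lightness / 100.0,
                                  saturation / 100.0)
    return tuple(round(c * 255) for c in (r, g, b))


def gray_ramp(n_classes, darkest=0, lightest=90):
    """Evenly spaced gray levels (saturation 0) from darkest to lightest."""
    if n_classes < 2:
        raise ValueError("at least two classes are needed for a ramp")
    step = (lightest - darkest) / (n_classes - 1)
    return [arc_hls_to_rgb(0, darkest + i * step, 0) for i in range(n_classes)]


print(arc_hls_to_rgb(0, 50, 100))     # (255, 0, 0), red
print(arc_hls_to_rgb(120, 50, 100))   # (0, 255, 0), green
print(gray_ramp(5))
```

With saturation at 0 the hue has no effect, which is why any valid hue number can be
used when only value is being varied.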

PV  {markerset} {markersymbol} {markersize} {hue} {intensity}
       - a lookup table generated by SETMAN or JENKS;
      {markerset} {markersymbol} - specify a marker; the current marker is the default;
      {hue} {intensity} - specify a color that will be value shaded.

Ordinal to Ratio Data-Size
      Size can be used in point symbols for ordinal and interval/ratio data, and
within ARC/INFO changes in size can be achieved in two ways: for circles and
boxes, ARC/INFO provides the  SPOTSIZE/POINTSPOT command  pair to
generate circle or box point symbols; for other  symbol shapes, size is changed by
changing the drawing size of point symbols. The MARKERSIZE command allows
the specification of the size of the current point symbol. The macro, PZ.aml, uses
MARKERSIZE to display a ratio data set; see Figure 3.2b.

-------
                                                                         41
Figure 3.2a: PV.aml  Classed ratio (ordinal) data displayed with color value.
Figure 3.2b: PZ.aml  Ratio data displayed with symbol size.
Figure 3.2c: PC.aml  Ratio data displayed with graduated circles.
Figure 3.2d: PB.aml  Ratio data displayed with graduated boxes.

-------
                                                                         42
PZ   {size_exponent} {size_factor} {minimum_size}
          {markerset} {markersymbol} {point|node}
        - the coverage to be displayed;
        - a numeric data field in the attribute table of <cover>;
       {size_exponent} {size_factor} {minimum_size} - define the scaling of sizes;
          defaults: size_exponent = 1; size_factor = 0.15; minimum_size = 0.02 inches.

All of the size symbolization AMLs use the formula:

          size = size_factor * (normalized_data_value ^ size_exponent) + minimum_size

       This is the formula for a line in x (normalized_data_value) and y (symbol size)
space when size_exponent is set to one; a size_exponent not equal to one creates a
non-linear progression.  The slope of the line is determined by size_factor, and the
y-intercept of the line is minimum_size. The data are normalized prior to calculating
the size so that the minimum data value is drawn at minimum_size; data values are
normalized by:

          normalized_data_value = (actual - minimum) / (maximum - minimum)
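
The two formulas above can be combined in a few lines of Python. This is a sketch of
the arithmetic only, using the default values given for PZ.aml; the function name
symbol_size is hypothetical, and returning minimum_size when all data values are
equal is an assumption made to avoid division by zero.

```python
def symbol_size(actual, minimum, maximum,
                size_exponent=1.0, size_factor=0.15, minimum_size=0.02):
    """Symbol size in inches:
    size_factor * normalized ** size_exponent + minimum_size."""
    if maximum == minimum:
        return minimum_size   # assumption: no range to scale over
    normalized = (actual - minimum) / (maximum - minimum)
    return size_factor * normalized ** size_exponent + minimum_size


# Data ranging from 5 to 25, linear scaling and a square-root progression.
for value in (5, 15, 25):
    print(value,
          round(symbol_size(value, 5, 25), 3),
          round(symbol_size(value, 5, 25, size_exponent=0.5), 3))
```

With the defaults, the smallest value is drawn at 0.02 inches and the largest at 0.17
inches; an exponent below one enlarges the mid-range symbols relative to the linear
case.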

Ordinal to Ratio Data-Graduated Circle Size
       Symbol size is good for either ordinal or interval/ratio data.   For the
generation of graduated  circle (or box) symbols,  ARC/INFO  provides two
commands  that allow  rapid  generation  of  these maps:  SPOTSIZE  and
POINTSPOT.  SPOTSIZE must be  given  before POINTSPOT  can be  used.
SPOTSIZE allows  the  creation of  point  symbols  that  can be linearly  or
exponentially scaled; of the two, exponential  scaling is  generally preferred,
although the command line syntax is more complicated. Once SPOTSIZE has been
given, POINTSPOT can be used to create graduated symbol maps, with either
circle or box symbols. These two macros generate circle (PC.aml) and box (PB.aml)
symbols (see  Figures 3.2c and d).

PC   {minimum_size} {maximum_size} {point|node}

PB   {minimum_size} {maximum_size} {point|node}
        - the coverage to be displayed;
        - an interval or ratio data field in the attribute table of <cover>;
       {minimum_size} - size of the smallest symbol; defaults to 0.05 inches;
       {maximum_size} - size of the largest symbol; defaults to 0.5 inches;
       {point|node} - the feature type to be displayed; defaults to point.

Bivariate,  Monochrome Symbolization
       Most of the AMLs presented in this section make use of shape to indicate a
nominal distinction.  An appropriate symbol set should be selected (or created)
and one of the bivariate methods should be selected on the basis of the second
variable's type.

Two Nominal Data Sets-Shape and Orientation
       For nominal data and meta-data, shape and orientation can be combined to
create  bivariate  point symbolization. Shape should generally be used for the
primary data, and orientation for the less important data, or meta-data  (varying

-------
                                                                        43
Figure 3.3a: PSO.aml  Two nominal data sets displayed with shape (primary data) and
orientation (secondary data).
Figure 3.3b: PSV.aml  A nominal data set displayed with shape and an ordinal data set
displayed with value.  This is better for meta-data than a second data set.
Figure 3.3c: PSZ.aml  A nominal data set displayed with shape and a ratio data set
displayed with size. This is more effective than value for a second data set.
Figure 3.3d: PCV.aml  Interval/Ratio data displayed with graduated circles that are
value shaded on the basis of an ordinal data set.

-------
                                                                           44

orientations of the same shape seem to 'go  together' more than the same
orientation of varying shapes).  PSO.aml uses lookup tables for both shape and
orientation to generate a bivariate display; see Figure 3.3a. Note that this AML (as
with all that follow) requires that any necessary lookup tables be set up in the
format generated by SETMAN.aml and JENKS.aml prior to running this routine.

PSO   {markerset} {markersize}
        - specifies marker symbol numbers;
        - specifies angles in degrees;
       {markerset} - a markerset for shapes-defaults to the current markerset;
       {markersize} - a size for the markers-defaults to 0.15 inches.
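
       The way two lookup tables combine into a single point symbol can be
sketched outside ARC/INFO.  The Python fragment below is an illustration only,
not the PSO.aml source: the class names, symbol numbers, angles and the function
name symbol_for are all hypothetical.

```python
# Hypothetical lookup tables (not taken from SETMAN.aml output):
# class code -> marker symbol number, and class code -> angle in degrees.
SHAPE_LOOKUP = {"well": 101, "spring": 102, "outfall": 103}
ANGLE_LOOKUP = {"sampled": 0, "estimated": 45, "unknown": 90}


def symbol_for(primary, secondary):
    """Return (marker_symbol, angle) for one point.

    Shape carries the primary nominal class; orientation carries the
    secondary class or meta-data.
    """
    return SHAPE_LOOKUP[primary], ANGLE_LOOKUP[secondary]


points = [("well", "sampled"), ("well", "unknown"), ("spring", "estimated")]
symbols = [symbol_for(p, s) for p, s in points]
```

Points that share a primary class keep the same shape and differ only in
rotation, which is why shape is suggested for the primary data.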

Nominal Data, and Ordinal  Data-Shape and Value
       As with the next AML, this routine uses nominal primary data and ordinal
secondary data, but the visual  hierarchy established by value makes  it more
appropriate for display of meta-data than size.  With value, uncertain values of the
primary data can be faded into the background; see Figure 3.3b and PSV.aml.

PSV   {markerset} {markersize} {hue} {intensity}
        - specifies marker symbol numbers;
        - specifies HLS lightness data (0 to 100);
       {markerset} - a markerset for shapes-defaults to the current markerset;
       {markersize} - a size for the markers-defaults to 0.15 inches;
       {hue} {intensity} - specify a color that will be value shaded.
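
       Value shading can be illustrated with a short Python sketch that holds
hue constant and varies only HLS lightness.  It assumes that the {intensity}
argument corresponds to HLS saturation; the lookup contents and the function
name value_shade are hypothetical and are not part of PSV.aml.

```python
import colorsys


def value_shade(hue_deg, lightness_pct, saturation_pct=100.0):
    """Convert an HLS color (hue 0-360, lightness and saturation 0-100)
    to an (r, g, b) tuple of integers in 0-255."""
    r, g, b = colorsys.hls_to_rgb(hue_deg / 360.0,
                                  lightness_pct / 100.0,
                                  saturation_pct / 100.0)
    return tuple(int(round(c * 255)) for c in (r, g, b))


# Hypothetical lookup: certainty class -> HLS lightness (0 to 100).
# Less certain classes get lighter values so they fade toward white paper.
LIGHTNESS_LOOKUP = {"certain": 30, "probable": 60, "doubtful": 85}

shades = {name: value_shade(240, light)
          for name, light in LIGHTNESS_LOOKUP.items()}
```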

Nominal Data, and Ratio Data-Shape and Size
       For nominal primary  data, shape can be used in conjunction with size to
create bivariate point symbols. This seems to work better for two data sets rather
than data and meta-data, which should be shown with shape and value; see Figure
3.3c and PSZ.aml (and the size discussion on page 39).

PSZ   {markerset} {size_exponent} {size_factor} {minimum_size}
        - specifies marker symbol numbers;
        - an interval or ratio data item  in the coverage named by ;
       {markerset} - a markerset for shapes-defaults to the current markerset;
       {size_exponent} {size_factor} {minimum_size} - define the scaling of sizes,
          defaults: size_exponent = 1; size_factor = 0.15; minimum_size = 0.02 inches.
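
       How three parameters of this kind can interact is sketched below in
Python.  The power-function form is an assumption made for illustration; the
formula actually used by PSZ.aml is the one given in the size discussion on
page 39, and the function name marker_size is hypothetical.

```python
def marker_size(value, size_exponent=1.0, size_factor=0.15, minimum_size=0.02):
    """Assumed scaling: size_factor * value ** size_exponent, in inches,
    never smaller than minimum_size."""
    if value < 0:
        raise ValueError("size scaling expects non-negative data")
    return max(minimum_size, size_factor * value ** size_exponent)


# An exponent below 1 compresses large values; 0.5 scales symbol
# area, rather than symbol width, in proportion to the data.
sizes = [marker_size(v, size_exponent=0.5) for v in (0.0, 1.0, 4.0)]
```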


Ratio Data, and Ordinal Data-Size and Value
       Symbol size and value are good for either ordinal or interval/ratio data.
The AML  PCV.aml combines the two; see Figure 3.3d.  Because symbol size is
calculated directly from the Point Attribute Table, only one lookup table is needed,
but rather than containing point symbol numbers, it should contain color value
numbers (0 to 100). This system can be used to represent a ratio data value with size,
and a meta-data estimate of accuracy with  value.

-------
                                                                        45
PCV   {minimum_size} {maximum_size} {hue} {intensity}
       - a numeric item in the coverage referenced by ;
       - specifies HLS lightness data;
      {minimum_size} - graduated circle size for the smallest data value-default = 0.05 inches;
      {maximum_size} - graduated circle size for the largest data value-default = 0.5 inches;
      {hue} {intensity} - specify a color that will be value shaded.
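
       The role of the two size arguments can be shown with a small Python
sketch.  It assumes that circle size is interpolated linearly between the
minimum and maximum sizes across the data range; PCV.aml may scale differently,
and the function name graduated_size is hypothetical.

```python
def graduated_size(value, data_min, data_max, min_size=0.05, max_size=0.5):
    """Interpolate a circle size (inches) linearly between min_size and
    max_size according to where value falls in the data range."""
    if data_max == data_min:
        # A constant data item gives every circle the smallest size.
        return min_size
    fraction = (value - data_min) / (data_max - data_min)
    return min_size + fraction * (max_size - min_size)


data = [12.0, 40.0, 96.0]
circle_sizes = [graduated_size(v, min(data), max(data)) for v in data]
```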

Bivariate, Color Symbolization
      For color (hue or hue/intensity) based bivariate mapping, a point symbol
should be selected from the available marker sets (PLOTTER, COLOR, MINERAL,
MUNICIPAL,  OILGAS, TEMPLATE, USGS, or WATER);  these markersets are
displayed in Appendix B  of  the Map Display and Query  guide.   The color
specification of that point symbol will then be changed to show variations in data
and meta-data. For size and hue based maps, only a set of hues must be chosen;
symbol size and shape are calculated by ARC/INFO.

Two Nominal  Data Sets-Shape and Hue
      This macro (PSH.aml) uses two lookup tables for symbolizing two nominal
data sets. Unlike shape and orientation, neither shape nor hue is, in general, the
dominant visual variable.  Visual hierarchies can be established by selecting
intense color and similar shapes—this will make color hue the more prominent of
the two nominal visual variables. See Figure 3.4a.

PSH   {markerset} {markersize} {value} {saturation}
       - specifies marker symbol numbers;
       - specifies HLS hue data (values of 0 to 360);
      {markerset} - a markerset for shapes-defaults to the current markerset;
      {markersize} - a size for the markers-defaults to 0.15 inches;
      {value} {saturation} -defaults of 50 and 100 (maximum intensity).

Two Nominal Data Sets-Dual Hue Ranges
      Color hue is best used for nominal data, although with well selected colors,
hue can be used with ordinal data. Use of the spectral encoding bivariate mapping
scheme requires careful selection of colors. For the specification of colors for dual-
hue range mapping, the CMY (Cyan, Magenta, Yellow)  color scheme is most
useful. A color chart for this color specification system is in Appendix J of the Map
Display and Query manual. For a four by four color matrix, color specifications like
the following can be used:

      Cyan:     100   Cyan:     100   Cyan:     100   Cyan:     100
      Magenta:    0   Magenta:   33   Magenta:   67   Magenta:  100

      Cyan:      67   Cyan:      67   Cyan:      67   Cyan:      67
      Magenta:    0   Magenta:   33   Magenta:   67   Magenta:  100

      Cyan:      33   Cyan:      33   Cyan:      33   Cyan:      33
      Magenta:    0   Magenta:   33   Magenta:   67   Magenta:  100

      Cyan:       0   Cyan:       0   Cyan:       0   Cyan:       0
      Magenta:    0   Magenta:   33   Magenta:   67   Magenta:  100

-------
                                                                        46
Figure 3.4a: PSH.aml
   Two nominal data sets displayed with shape and hue.

Figure 3.4b: PDH.aml
   Two nominal data sets displayed with the dual-hue ranges.  Note the lack of
discernible order in color changes; this necessitates a legend.

Figure 3.4c: PCC.aml
   Two ordinal data sets displayed with complementary colors.  This technique
highlights correlation better than dual-hue range maps; the central gray
diagonal could indicate a linear relation.

Figure 3.4d: PBL.aml
   The AML that generates these legends automatically places the labelling text
and the total number of occurrences in each column or row.

-------
                                                                            47

This pattern can be generated with SETAUTO.aml with the command line 0 33.
Yellow should be held constant for each of the sixteen positions—generally, 100
should be good.  Even with this control of changes in color, dual-hue range maps
should generally be used only for nominal data—the complementary-color system
tends to present ordinal data better. See Figure 3.4b and PDH.aml.

PDH   {markerset} {markersymbol} {markersize}
        - specifies changes in cyan (values of 0 to 100);
        - specifies changes in magenta (values of 0 to 100);
       {markerset} {markersymbol} - specify which symbol will be drawn with-defaults to current;
       {markersize} - specifies the size of the marker-defaults to 0.15 inches.
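
       The four by four color matrix described for dual-hue range mapping can
be generated programmatically.  The Python sketch below is illustrative and is
not SETAUTO.aml or PDH.aml; the function names are hypothetical, and the CMY to
RGB conversion uses the simple relation R = 1 - C, G = 1 - M, B = 1 - Y, which
ignores the behavior of real inks.

```python
def dual_hue_matrix(steps=(0, 33, 67, 100), yellow=100):
    """Build the CMY legend matrix: cyan falls from the top row to the
    bottom row, magenta rises from left to right, yellow is constant."""
    return [[(cyan, magenta, yellow) for magenta in steps]
            for cyan in reversed(steps)]


def cmy_to_rgb(cyan, magenta, yellow):
    """Convert CMY percentages (0-100) to 8-bit RGB (idealized inks)."""
    return tuple(round(255 * (100 - ink) / 100)
                 for ink in (cyan, magenta, yellow))


matrix = dual_hue_matrix()
rgb_matrix = [[cmy_to_rgb(*cell) for cell in row] for row in matrix]
```

With yellow held at 100, the corners run from green (full cyan, no magenta) to
red (no cyan, full magenta), with no obvious order in between, which is why a
legend is needed for this scheme.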

Two Ordinal Data Sets-Complementary  Colors
       This AML is a variation on the previous macro.  The change is in the colors
used to symbolize data;  complementary colors are hues that are on opposite sides
of the Munsell or Tektronix color spaces and mix to form grey. This mixing allows
highlighting of data that is not highly correlated, because these areas will appear
in color, whereas the central axis of correlated data will  appear in grey.  This
method allows the representation of  both positive and  negative  correlations;
negative correlations should be represented by reversing the values in one of the
lookup tables—this changes the direction of the slope of the  central axis. Note that
white should be avoided because the entire symbol will disappear on a white sheet
of paper; use a range from 5 to 100 for percent area inked.  See Figure  3.4c and
PCC.aml.

PCC   {markerset} {markersymbol} {markersize}
        - specifies changes in cyan  (values from 0 to 100);
        - specifies changes in red (values from 0 to 100);
       {markerset} {markersymbol} - specify which symbol will be drawn with-defaults to current;
       {markersize} - specifies the size of the marker-defaults to 0.15 inches.
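
       Why correlated values fall on a grey diagonal can be seen in a short
Python sketch.  It assumes that red is printed as equal parts magenta and
yellow, so that equal amounts of cyan and red give three equal inks; the lookup
values and function names are hypothetical and are not taken from PCC.aml.

```python
def complementary_cmy(cyan_pct, red_pct):
    """Overprint cyan with red, expressing red as equal parts magenta and
    yellow, and return the resulting CMY percentages."""
    return (cyan_pct, red_pct, red_pct)


def is_grey(cmy, tolerance=0):
    """True when all three inks are (nearly) equal, so the mix is neutral."""
    return max(cmy) - min(cmy) <= tolerance


# Hypothetical lookup values from 5 to 100 percent area inked (0 is
# avoided so that no symbol disappears on white paper).
levels = (5, 37, 68, 100)
legend = [[complementary_cmy(c, r) for r in levels] for c in levels]
grey_cells = [(i, j) for i in range(4) for j in range(4)
              if is_grey(legend[i][j])]
```

Only the cells where both variables fall in the same class are neutral;
reversing one lookup table moves the grey cells to the other diagonal.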

Point Legend Creation
       Although the usage of this AML (PBL.aml) is lengthy, it allows one macro
to generate a bivariate legend for three different types of point symbolization
schemes: dual hue, complementary colors, and hue and intensity. See Figure 3.4d
for both dual-hue and complementary-color legends and Figure 3.5b for a hue and
intensity legend.

PBL       
          {markerset} {markersymbol} {markersize} {textset} {font} {point} {decimal_precision}
        - the first lookup given in one of the bivariate AMLs;
        - the second lookup given in one of the bivariate AMLs;
         - the lower left corner of the legend matrix,  in PAGEUNITS;
         - the separation of symbols on the x and y axes, in PAGEUNITS;
       {markerset} {markersymbol} - specify which symbol will be drawn with-defaults to current;
       {markersize} - specifies the size of the marker-defaults to 0.15 inches;
       {textset} {font} {point} - specify a textset for legend labels-defaults to a roman, 10 point;
       {decimal_precision} - number of decimal places shown for ratio data labels-defaults to 2.
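
       The geometry of such a legend is simple enough to sketch in Python: a
lower left corner and a separation fix every symbol position, and the totals
placed beside each row and column are marginal counts.  This is an illustration
under those assumptions, not the PBL.aml source, and both function names are
hypothetical.

```python
def legend_positions(n_cols, n_rows, x0, y0, dx, dy):
    """Return {(col, row): (x, y)} page positions for a legend matrix whose
    lower left symbol sits at (x0, y0) with spacing dx and dy."""
    return {(c, r): (x0 + c * dx, y0 + r * dy)
            for r in range(n_rows) for c in range(n_cols)}


def marginal_counts(class_pairs, n_cols, n_rows):
    """Count occurrences per column and per row from (col, row) pairs,
    one pair per mapped point."""
    col_totals = [0] * n_cols
    row_totals = [0] * n_rows
    for c, r in class_pairs:
        col_totals[c] += 1
        row_totals[r] += 1
    return col_totals, row_totals


positions = legend_positions(4, 4, x0=1.0, y0=1.0, dx=0.25, dy=0.25)
totals = marginal_counts([(0, 0), (0, 1), (3, 1)], 4, 4)
```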

Nominal Data, and Ordinal  Data-Hue and Intensity
       Color hue is best used for nominal data; color intensity, on the other hand,
is best used for ordinal data (and generally only for meta-data, not a second data

-------
                                                                       48
Figure 3.5a: PHI.aml
   A nominal data set displayed with hue and an ordinal data set used to
display meta-data.  Intense (bright) colors tend to be more noticeable and are
used to present more certain values.

Figure 3.5b: PBL.aml
   A hue and intensity legend.  Note the lack of a diagonal, as in the
complementary color system.  This makes hue and intensity better suited to
display of meta-data than two correlated data sets.

Figure 3.5c: PEE.aml
   Two ratio data sets displayed with Eyton's equiprobability ellipse system.
This should be used to highlight correlated data.

Figure 3.5d: PCH.aml
   Interval/Ratio data displayed with graduated circles that are hue shaded,
displaying nominal data.

-------
                                                                          49
variable).  This color scheme represents meta-data better than the dual-hue and
complementary-color  bivariate systems, because the data variable is  clearly
displayed in a constant hue, unlike the other color bivariate methods. See Figure
3.5a and PHI.aml.

PHI   {markerset} {markersymbol} {markersize}
        - specifies changes in HLS hue (from 0 to 360)
        - specifies changes in HLS saturation (from 0 to 100)
       {markerset} {markersymbol} - specify which symbol will be drawn with-defaults to current;
       {markersize} - specifies the size of the marker-defaults to 0.15 inches.
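
       The hue and intensity scheme can be illustrated in Python by letting one
lookup choose the hue and a second choose the saturation.  The sketch assumes a
fixed HLS lightness of 50; the class names, lookup values and the function name
hue_intensity_color are hypothetical and do not come from PHI.aml.

```python
import colorsys

# Hypothetical lookups: data class -> HLS hue (0 to 360), and
# certainty class -> HLS saturation (0 to 100).
HUE_LOOKUP = {"urban": 0, "forest": 120, "water": 240}
SATURATION_LOOKUP = {"high": 100, "medium": 60, "low": 25}


def hue_intensity_color(data_class, certainty, lightness_pct=50):
    """Return 8-bit RGB for a hue chosen by the data class and a
    saturation chosen by the certainty class."""
    h = HUE_LOOKUP[data_class] / 360.0
    s = SATURATION_LOOKUP[certainty] / 100.0
    r, g, b = colorsys.hls_to_rgb(h, lightness_pct / 100.0, s)
    return tuple(int(round(c * 255)) for c in (r, g, b))


certain = hue_intensity_color("urban", "high")
uncertain = hue_intensity_color("urban", "low")
```

The hue stays recognizable at every saturation, so the data class remains
readable while less certain points look duller.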

Two Ratio Data Sets-Equiprobability Ellipse
       Eyton's ellipse is a variation on the complementary color system. The colors
that are used are the same, but the linear correlation between the two variables is
used to determine a central category, which specifically highlights  correlation.
This AML requires that the EYTON.aml, presented in chapter two, be run first. See
Figures 3.5c and 5.7b on page 79 for a legend display, and PEE.aml.

PEE  {markerset} {markersymbol} {markersize}
        - a lookup table generated by EYTON.aml;
       {markerset} {markersymbol} - specify which symbol will be drawn with-defaults to current;
       {markersize} - specifies the size of the marker-defaults to 0.15 inches.

Ratio Data, and Nominal Data-Size and Hue
       This macro is the color equivalent of the one that generated Figure  3.3d.
Unlike that AML though, this should be used for one nominal  data variable and
one ratio data variable.  See Figure 3.5d and  PCH.aml.

PCH   {minimum_size} {maximum_size} {value} {intensity}
        - a numeric data item of the coverage referenced by ;
        - specifies changes in HLS hue (from 0 to 360);
       {minimum_size} - graduated circle size for the smallest data value-default = 0.05 inches;
       {maximum_size} - graduated circle size for the largest data value-default = 0.5 inches;
       {value} {intensity} -default to 50 and 100 (maximum intensity).

Multivariate Symbolization
       Multivariate point symbolization can be accomplished by several means,
each of which is suited to various combinations of data levels.  Because of the
increased complexity involved in multivariate mapping, care  must be taken to
ensure legends are well designed and convey the methods that should be used in
interpreting map symbols.

Three Ordinal to Ratio Data Sets-Red, Green and Blue Symbolization
       The 'false color' images that are often  generated with satellite derived data
use red, green and blue to symbolize data values from three spectral bands.  This
technique is applied here to allow display of three data values.  See Figures 3.6a
and 5.8b on  page  82 for a legend. Note that the AML (PRGB.aml)  performs a
linear-stretch on the items in the Point Attribute Table. This AML also allows color
specification as cyan, magenta, and yellow. This can be helpful because this color
scheme tends  to  allow numbers in the  low  end  of the data  range to  be
distinguished more readily than the RGB scheme.

-------
                                                                        50
Figure 3.6a: PRGB.aml
   Three ratio data sets displayed by linearly stretching the data sets from 0 to
255 and using data set one for red, two for green and three for blue.

Figure 3.6b: PSZV.aml
   A nominal data set displayed with shape, a ratio data set displayed with
size, and an ordinal data set (size meta-data) displayed with color value.

Figure 3.6c: PHIZ.aml
   A nominal data set displayed with color hue, ratio data displayed with size,
and an ordinal data set (size meta-data) displayed with color intensity.

Figure 3.6d: PP.aml
   Ratio data displayed with size; each pie slice is a nominal difference within
the ratio whole, and pie slice values are percentages of the whole.

-------
                                                                             51

PRGB    
          {markerset} {markersymbol} {markersize} {point|node} {r|c}
        - the coverage to be displayed;
          - numeric items in the attribute table of ;
       {markerset} {markersymbol} - specify which symbol will be drawn with-defaults to current;
       {markersize} - specifies the size of the marker-defaults to 0.15 inches;
       {point|node} - symbolize point (the default) or node features;
       {r|c} - display with the RGB (default) or CMY color system.
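
       The linear stretch applied to the three items can be sketched in Python.
This is an illustration rather than the PRGB.aml source: the function names are
hypothetical, and the CMY branch assumes that the stretched values are used as
ink amounts and converts them to RGB with the idealized relation R = 255 - C.

```python
def linear_stretch(values, out_max=255):
    """Rescale values linearly so the minimum maps to 0 and the maximum
    to out_max; a constant series maps to 0."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0 for _ in values]
    return [round((v - lo) * out_max / (hi - lo)) for v in values]


def rgb_symbols(item1, item2, item3, use_cmy=False):
    """Combine three stretched items into one color triple per point."""
    bands = [linear_stretch(item) for item in (item1, item2, item3)]
    triples = list(zip(*bands))
    if use_cmy:
        # Treat stretched values as ink amounts: more ink, darker color.
        triples = [tuple(255 - band for band in t) for t in triples]
    return triples


colors = rgb_symbols([0, 20, 100], [5, 5, 5], [0, 100, 20])
```

Under this assumption low data values print pale in the CMY system and nearly
black in RGB, which is consistent with the remark that CMY makes the low end of
the data range easier to distinguish.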

Nominal Data,  Ratio Data and Ordinal Data-Shape, Size and Value
       Shape can be used to display nominal data, and size can be used to display
ratio data. This AML (PSZV.aml) adds to this bivariate representation by allowing
color value to be used to display an ordinal data set.  Color value can be used to
display uncertainty on monochrome output devices, such as laser printers. The
calculation of symbol size is discussed on page 39; see Figure 3.6b.

PSZV    {markerset}
          {size_exponent} {size_factor} {minimum_size} {hue} {intensity}
        - specifies marker symbol numbers;
        - a numeric data item in the coverage referred to by ;
        - specifies HLS lightness data (from 0 to 100);
       {markerset} - a markerset for shapes-defaults to the current markerset;
       {size_exponent} {size_factor} {minimum_size} - define the scaling of sizes
          defaults: size_exponent = 1; size_factor = 0.15; minimum_size = 0.02 inches;
       {hue} {intensity} - define a color that will be value shaded.

Ratio Data, Nominal Data, and Ordinal Data-Size, Hue and Intensity
       Like the previous AML, size, hue and intensity should be used to display a
nominal  data  variable,  a ratio  data  variable, and an ordinal  data  variable
(meta-data for the ratio data variable would be appropriate). This AML however
uses color hue and intensity, and therefore requires full color displays. See Figure
3.6c and PHIZ.aml.

PHIZ   
          {markerset} {markersymbol} {size_exponent} {size_factor} {minimum_size}
        - specifies HLS hue data (from 0 to 360);
        - specifies HLS saturation data (from 0 to 100);
        - a numeric data item in the coverage referred to by the lookup tables;
       {markerset} {markersymbol} - specify a marker-the current marker is the default;
       {size_exponent} {size_factor} {minimum_size} - define the scaling of sizes
          defaults: size_exponent = 1; size_factor = 0.15; minimum_size = 0.02 inches.

Ratio Data-Point  Pie Graphs
       This  AML generates pie symbols for point data by use of the POINTSPOT
command. The type of data that should be used in the creation of this type of map
is a size value that represents a sum of several other values; these other  values
should have nominal distinctions. POINTSPOT uses the sum value to calculate the
size of the circle, and it calculates the pie slice size that will be drawn as a function
of the ratio of a sub-value to the whole.  Colors for each slice are calculated by the
AML.  See Figures 3.6d and 6.3d on page 88 for a legend, and PP.aml.

-------
                                                                                 52

PP     
           
        - the coverage to be displayed;
        - symbolize points or nodes;
        - a numeric item in  that specifies the size of the circle;
         - the smallest and largest circle sizes;
        - the number of pie slice data items-the value of n, next;
        - names of numeric items in  that specify the size of pie slices.
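
       The slice arithmetic described for pie symbols, in which each slice is
the ratio of a sub-value to the whole, can be written out in a few lines of
Python.  This is a sketch of the arithmetic only, not of POINTSPOT or PP.aml,
and the function name pie_slices is hypothetical.

```python
def pie_slices(sub_values):
    """Return (start_angle, end_angle) in degrees for each slice, sized as
    the ratio of each sub-value to the whole."""
    whole = sum(sub_values)
    if whole <= 0:
        raise ValueError("sub-values must sum to a positive number")
    slices, start = [], 0.0
    for value in sub_values:
        sweep = 360.0 * value / whole
        slices.append((start, start + sweep))
        start += sweep
    return slices


# Three nominal categories whose sum is the size item for the point.
angles = pie_slices([1, 1, 2])
```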

-------
                                                                     53

                            Chapter Four
                 Line Symbolization in ARC/INFO

      For line symbolization, the visual variables of hue, shape, and texture can
be controlled by the use of lookup tables, which relate nominal difference to
symbolization. Value and size can be used either by means of lookup tables, or by
referencing Arc Attribute Table values.  Lookup tables should be generated with
one of the lookup table AMLs presented in Chapter Two.

Monovariate Symbolization
      As with point symbolization, the AMLs discussed in this chapter start out
relatively simply,  and the output could be accomplished  almost as readily by
hand. Like the monovariate point AMLs, these AMLs are primarily background
for the bi- and multivariate AMLs of later sections.
      ARC/INFO provides a default set of lines in the lineset file, PLOTTER.LIN;
there are several  other  available linesets: 50.LIN, BW.LIN, CALCOMP2.LIN,
CARTO.LIN, COLOR.LIN, HP.LIN, and OILGAS.LIN. These are displayed in the
Map Display and Query guide's Appendix A, and the ability to create new symbol
sets is discussed in chapter three of Map Display and Query.

Nominal Data-Hue
      Hue is best used for nominal data, and the COLOR.LIN lineset provides
fifteen symbols of the same shape, but each with a different color. All of the other
linesets, except BW.LIN, have the same symbol in black, red, green and blue. This
AML (LH.aml) uses the COLOR.LIN lineset for generating a display for arcs-see
the next section for display of routes, and Figure 4.1a.

LH    
       - the arc coverage to be displayed;
       - the data item in  referenced by ;
       - an info table that relates  to ;
       - a lineset.

Nominal Data-Shape or Texture
      Shape should be used for nominal data, and a line's shape can be changed
in several ways in ARC/INFO. The linesets PLOTTER.LIN, TEMPLATE.LIN and
OILGAS.LIN have symbols that change in shape. LINESET can be used to select
one of these, and LINESYMBOL can be used to change the shape of the current line
type, as well as change the line color and width (although the selection is limited).
This  allows the selecting of several different shapes; the LINETYPE command
allows the generation of nine other types of shapes.
      Texture is best used for nominal data, and the CARTO.LIN lineset provides
many ready-to-use symbols that vary in texture (such as lines 106, 110, 114 and
118). These symbols can be selected by using LINESET to select this symbol set,
and LINESYMBOL to select the individual symbols. A line's texture can also be
directly  varied; LINEINTERVAL and LINETEMPLATE can be used to control
texture.  LINEINTERVAL determines the space between successive parts of a line
symbol; it defaults  to 0 (no space). LINETEMPLATE requires that a lineinterval be

-------
                                                                        54
Figure 4.1a: LH.aml
   Nominal data in an arc coverage distinguished by hue with the color.lin
lineset.

Figure 4.1b: LS.aml
   A nominal data set in a route system distinguished by shape using symbols
from plotter.lin.

Figure 4.1c: LV.aml
   Classed ratio (ordinal) data displayed with color value by using a lookup
table.

Figure 4.1d: LZ.aml
   Ratio data displayed with size; sizes are calculated by scaling the data in
the coverage Arc Attribute Table.

-------
                                                                          55

set.  The LINETEMPLATE command then allows control over both the length of
the mark and the length of the space between marks.  The