Cartographic Symbolization and Design: ARC/INFO Methods


                     Alan Brenner
The U.S. Environmental Protection Agency
Office of Information Resources Management
National Geographic Information Systems Program

-------
      The ARC Macro Language and C programs discussed in this guideline are
available by anonymous ftp from sdcdg01.sdc.epa.gov and are in the files:
      /pub/readme.cart
      /pub/map_amls.tar.Z
      /pub/map_design.tar.Z
      /pub/map_post.Z
      /pub/color_post.tar.Z

      For users not on the Internet, the programs can be obtained on hardcopy,
3.5 inch disks or QIC 150 by contacting the National GIS Program at 703-235-5600,
or:
      401 M St. SW, MS 3405R
      Washington, DC 20460.
      AML, ARC/GRID, ARC/TIN and ARC Macro Language are trademarks
and ARC/INFO is a registered trademark of Environmental Systems Research
Institute. AViiON is a trademark of Data General Corporation. Sun and
SPARCstation are trademarks of Sun Microsystems, Inc. PostScript is a trademark
of Adobe Systems, Inc. Use of these trademarks does not constitute an
endorsement by the United States Government.
      In no event shall the United States Government have any responsibility or
liability for any consequences of any use, misuse, inability to use, or reliance upon
the information contained herein, nor warrant or otherwise represent in any way
the accuracy, adequacy, efficacy, or applicability of the contents hereof.

-------
                          Table of Contents


Chapter One
The Utility of Maps and Graphics 1
 The Principles of Data 1
 Meta-Data and Uncertainty 4
   Uncertainties of the Physical World 6
   Uncertainties of the Computer World 7
   Uncertainties of the Human World 10
 The Idea of Visual Communication 11
 The Means of Visual Communication 13
 Methods for Visually Communicating Data and Meta-Data 17


Chapter Two
Producing Displays 24
 Classifying Space 24
   Projections 25
   Scale and Generalization 27
   Space and Time 27
 Classifying Data 28
   Manual Classification 28
   Natural Breakpoints 29
   Eyton's Equiprobability Ellipse Bivariate Classification 30
   Symbol Value Update 31
   Unclassed Maps 31
 Page Layout 32
   Titles and Type 34
   Insets and Legends 35
 ARC/INFO Hints 36


Chapter Three
Point Symbolization in ARC/INFO 38
 Monovariate Symbolization 38
   Nominal Data-Hue 38
   Nominal Data-Orientation 38
   Nominal Data-Shape 38
   Ordinal to Ratio Data—Orientation 40
   Ordinal to Ratio Data-Value 40
   Ordinal to Ratio Data—Size 40
   Ordinal to Ratio Data-Graduated Circle Size 42
 Bivariate, Monochrome Symbolization 42
   Two Nominal Data Sets—Shape and Orientation 42
   Nominal Data, and Ordinal Data—Shape and Value 44
   Nominal Data, and Ratio Data—Shape and Size 44
   Ratio Data, and Ordinal Data-Size and Value 44
 Bivariate, Color Symbolization 45
   Two Nominal Data Sets-Shape and Hue 45
   Two Nominal Data Sets—Dual Hue Ranges 45
   Two Ordinal Data Sets—Complementary Colors 47
   Point Legend Creation 47
   Nominal Data, and Ordinal Data-Hue and Intensity 47
   Two Ratio Data Sets-Equiprobability Ellipse 49
   Ratio Data, and Nominal Data—Size and Hue 49
  Multivariate Symbolization 49
   Three Ordinal to Ratio Data Sets-Red, Green and Blue Symbolization 49
   Nominal Data, Ratio Data and Ordinal Data—Shape, Size and Value 51
   Ratio Data, Nominal Data, and Ordinal Data—Size, Hue and Intensity 51
   Ratio Data-Point Pie Graphs 51


Chapter Four
Line Symbolization in ARC/INFO  53
  Monovariate Symbolization 53
   Nominal Data—Hue 53
   Nominal Data—Shape or Texture 53
   Ordinal Data-Value 55
   Ratio Data-Size 55
  Bivariate, Monochrome Symbolization 56
   Two Nominal Data Sets—Shape and Texture 56
   Nominal Data, and Ordinal Data—Shape and Value 56
   Nominal Data, and Ratio Data—Shape and Size 58
   Ratio Data, and Ordinal Data—Size and Value 58
  Bivariate, Color Symbolization 58
   Two Nominal Data Sets—Texture or Shape, and Hue 58
   Two Nominal Data Sets-Dual Hue Ranges 60
   Two Ordinal Data Sets—Complementary Colors 60
   Bivariate Color Legends 60
   Two Ratio Data Sets-Eyton's Ellipse for Lines 62
   Ratio Data, and Nominal Data-Size and Hue 62
   Nominal Data and Ordinal Meta-Data—Hue and Intensity 62
  Multivariate Symbolization 62
   Three Ordinal to Ratio Data Sets—Red, Green and Blue Symbolization 62
   Nominal Data, Ratio Data, and Nominal Data-Shape, Size and Hue 64
   Nominal Data, Ratio Data, and Ordinal Data-Shape, Size and Value 64
   Ratio Data, Nominal Data, and Ordinal Data-Size, Hue and Intensity 64


Chapter Five
Choropleth Symbolization in  ARC/INFO 66
  Monovariate Symbolization 66
   Nominal Data-Hue 66
   Monovariate Legends—Filled Polygons 66
   Nominal Data—Orientation 68
   Nominal Data—Shape 68
   Ordinal Data-Value 68
   Ordinal to Ratio Data—Orientation 68
   Monovariate Legends-Orientation 70
  Bivariate, Monochrome Symbolization 70
   Two Nominal Data Sets-Texture and Orientation 70
   Nominal Data, and Ordinal Data—Texture Distinguished by Value 72
   Nominal Data-1 and 2—Texture as Intersecting Lines 72
   Two Ordinal to Ratio Data Sets-Texture as Intersecting Lines 72
   Bivariate Legends—Unclassed Texture 74
  Bivariate, Color Symbolization 74
   Two Nominal Data Sets—Texture Distinguished by Hue 74
   Bivariate Legends—Lookup Table Based Displays 74
   Two Nominal Data Sets—Dual Hue Ranges 76
   Nominal Data, and Ordinal Data—Hue and Intensity 76
   Two Ordinal Data Sets—Complementary Colors 76
   Two Ratio Data Sets-Eyton's Ellipse 78
   Bivariate Legends—Eyton's Ellipse 78
  Multivariate Symbolization 80
   Two Nominal or Ordinal Data Sets, Ordinal Data Sets-Texture with Value 80
   Three Ordinal to Ratio Data Sets-Red, Green and Blue Symbolization 80
   Multivariate Legends—RGB Space 80


Chapter Six
Graduated Symbol Symbolization  in  ARC/INFO 83
  Monovariate Maps 83
   Ordinal to Ratio Data—Graduated Circles 83
   Ordinal to Ratio Data—Cartograms 83
  Monochrome, Bivariate Symbolization 83
   Ordinal to Ratio Data, and Ordinal Data-Graduated Circles and Value 85
   Ordinal to Ratio Data, and Ordinal Data-Cartograms Shaded by Value 85
  Color, Bivariate Symbolization 85
   Ordinal to Ratio Data, and Nominal Data-Graduated Circles and Hue 85
   Graduated Circle Legends 85
   Ordinal to Ratio Data, and Nominal Data—Cartograms with Hue 87
   Cartogram Legends 87
  Multivariate Symbolization 87
   Ratio Data, Nominal Data, and Ordinal Data—Graduated Circles, Hue and Intensity 87
   Ratio Data, Nominal Data, and Ordinal Data—Cartograms, Hue and Intensity 89
   Ratio Data-Polygon Pie Graphs 89
   Graduated Pie Legends 89


Chapter Seven
Grid-Cell Symbolization in ARC/INFO 90
   Conversion of Polygons to Grids 90
   Lookup Tables for Ratio Grids 90
  Monovariate Symbolization 90
   Nominal Data-Hue 92
   Ordinal to Ratio Data-Value 92
  Bivariate and Multivariate Symbolization 92
   Nominal Data, and Ordinal Data-Hue and Intensity 92
   Three Ordinal to Ratio Data Sets-Red, Green and Blue Symbolization 92


Chapter Eight
Dot Density Symbolization in ARC/INFO 93
   Conversion of Polygons to Dot Density 93
  Monovariate Symbolization 93
   Ordinal to Ratio Data-Texture 93
   Monovariate Dot Density Legends 93
  Monochrome, Bivariate Symbolization 95
   Ratio Data, and Ordinal Data-Texture and Value 95
   Bivariate Dot Density Legends 95
  Color, Bivariate, and Multivariate Symbolization 95
   Ratio Data, and Nominal Data—Texture and Hue 97
   Ratio Data, Nominal Data and Ordinal Data—Texture, Hue and Intensity 97
   Three Ordinal to Ratio Data Sets—Red, Green and Blue Symbolization 97


Chapter Nine
Isopleth and Fishnet Symbolization in ARC/INFO 98
   Polygon to Isoline Conversion 98
   Polygon to Surface Conversion 98
  Monovariate Symbolization 98
   Ratio Data—Isoline Location 98
   Ratio Data-Fishnet Height 100
   Isoline and Fishnet Legends 100
  Bivariate Symbolization 100
   Ratio Data and Ordinal Data-Surface Shaded with Value 100
   Ratio Data and Nominal Data—Surface Shaded with Hue 102
  Multivariate Symbolization 102
   Ratio Data, Nominal Data and Ordinal Data-Surface with Hue and Intensity 102
   Four Ratio Data Sets—Surface Shaded with Red, Green and Blue 103


Chapter Ten
Thematic Mapping in  ARC/INFO 104
  Summary of Presented Methods 104
  Areas of Possible Continued Research 104
  Acknowledgments 104


References  105

-------
                    Table of Figures
Chapter One
  Figure 1.1            2
  Table 1.1             3
  Figure 1.2            6
  Figure 1.3            7
  Figure 1.4            8
  Figure 1.5           10
  Figure 1.6           11
  Figure 1.7           14
  Figure 1.8           15
  Figure 1.9           18
  Figure 1.10          20

Chapter Two
  Figure 2.1           24
  Figure 2.2           25
  Figure 2.3           25
  Figure 2.4           26
  Figure 2.5           26
  Figure 2.6           27
  Figure 2.7           28
  Figure 2.8           32
  Figure 2.9           33
  Figure 2.10          34
  Figure 2.11          35

Chapter Three
  Figure 3.1           39
  Figure 3.2           41
  Figure 3.3           43
  Figure 3.4           46
  Figure 3.5           48
  Figure 3.6           50

Chapter Four
  Figure 4.1           54
  Figure 4.2           57
  Figure 4.3           59
  Figure 4.4           61
  Figure 4.5           63

Chapter Five
  Figure 5.1           67
  Figure 5.2           69
  Figure 5.3           71
  Figure 5.4           73
  Figure 5.5           75
  Figure 5.6           77
  Figure 5.7           79
  Figure 5.8           81

Chapter Six
  Figure 6.1           84
  Figure 6.2           86
  Figure 6.3           88

Chapter Seven
  Figure 7.1           91

Chapter Eight
  Figure 8.1           94
  Figure 8.2           96

Chapter Nine
  Figure 9.1           99
  Figure 9.2          101

-------
                             Chapter One
                 The Utility of Maps and Graphics

      Maps have a wide variety of uses, ranging from the recording of
information (such as cadastral maps) to aiding navigation (such as road maps and
naval charts). Two of the main uses of maps (and graphics) in environmental
analysis, particularly as assisted by geographic information systems, are the
analysis of data and the presentation of data. Neither of these uses precludes the
other, and both can be aided by good graphic design techniques. Graphic design
requires an understanding of the data (and the information represented by the
data), an understanding of the methods of visual communication, and the ability
to make use of the available means of communication. Although each of these
components is described in other places (such as statistics books for data analysis,
cartographic texts for visualization of spatial data, and software manuals for the
actual production of maps), this guideline integrates these three areas in the
context of the Environmental Systems Research Institute's geographic information
system, ARC/INFO revision 6, as the available means of communication.
      As such, chapter one contains a brief discussion of data and meta-data
issues, and visualization and communication theory; chapter two consists of a
discussion of issues involved in the design of maps, such as projections and data
classification, as accomplished in ARC/INFO; and chapters three through nine
present ARC Macro Language programs for specific symbolization techniques for
point, line and area data. Chapter ten provides a brief conclusion.

The Principles of Data
      For geographic phenomena, two types of data comprise the
information about a phenomenon: attribute data (the measured characteristic of a
location), and location data (the measured location of a characteristic). This
pairing of information types is reflected in the two major approaches of geographic
information systems: vector systems, which emphasize the attribute; and raster
systems, which emphasize the location. ARC/INFO now combines both of these,
allowing a wide variety of analysis and mapping, but the appropriate use of this
information is still dependent on the analyst/cartographer.
      Attribute data can be grouped into several categories, or empirical levels,
which influence the design requirements for representing information. Each
empirical level contains all of the information of lower levels (and thus can be
simplified) while adding additional information (see Figure 1.1 on page 2 for a
comparison of attribute data levels with spatial characteristics and 'visual
variables' for area data). The lowest empirical level is nominal data; this data only
indicates that something is different from something else. Names of states are an
example of nominal categorization. The next level of empirical data is ordinal
data; this data indicates that something is more (or less) than something else, but
no quantity can be given to that difference. The terms high, medium and low reflect
ordinal differences. The third level of empirical data is interval data; a linear
measurement of distance can be used to gauge the differences between instances.
The highest level of empirical data is ratio data; this data indicates that something
is more (or less) than something else (and thus different), that a linear distance can

-------
[Figure 1.1 graphic not reproduced. The diagram arranges map types for area data
by the character of inter-regional change (abrupt to smooth) and the character of
intra-regional change (discrete to continuous), for nominal, ordinal and
interval/ratio data. The map types labeled are graduated symbol, classed
graduated symbol, circle/dot combination, dot density, unit-vector density,
dasymetric, unclassed choropleth, classed choropleth, isoline (fishnet) and
stepped surface. A side table rates the visual variables (location, value,
intensity, orientation, shape, size, hue, texture, arrangement and focus) for the
visual order they establish (visual isolation, or differences; visual levels, or
standing out; poor order control) and for their representation of data variables
(good/primary representation; good as an additional variable in multivariate
maps; fair representation; fair as an additional variable in multivariate maps;
poor representational method).]

Figure 1.1   Spatial data models for area data and their cartographic
representations; developed from MacEachren and DiBiase (1991) and DiBiase, Krygier,
Reeves, MacEachren and Brenner (1991). In a thematic map, the information to be
displayed can be categorized by its empirical level (nominal to ratio), how the data
changes within any region it is aggregated into, and how the data changes between
adjacent regions. Once this categorization is done, visual variables can be selected
on the basis of the ability to represent the information so categorized.

-------
be used to gauge differences and that ratio comparisons (such as: this is twice that)
can also be done. The essential difference between ratio and interval data is that
for ratio data, a non-arbitrary zero point is intrinsic in the measurement; this
difference is small enough that, for visualization, the methods used for
representing both types of data are the same. An example of the difference
between interval and ratio measurement is the difference between temperature
measured in degrees Celsius and in kelvins; the Celsius scale has an
arbitrary zero (the freezing point of water) but the Kelvin scale's zero is not
arbitrary (absolute zero, the point at which the particle motion that is 'temperature'
stops).
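
The Celsius and Kelvin comparison can be worked through numerically. The following Python sketch is an illustration only and is not one of the programs distributed with this guideline; the two temperatures and the function name celsius_to_kelvin are assumptions chosen for the example. It shows that differences agree on both scales, while a ratio is only meaningful on the scale with a non-arbitrary zero.

```python
# Illustrative values only: two temperatures recorded on the Celsius
# (interval) scale. They are assumptions, not data from the guideline.
def celsius_to_kelvin(t_c):
    """Shift a Celsius reading onto the Kelvin scale (zero = absolute zero)."""
    return t_c + 273.15

warm_c, cool_c = 20.0, 10.0

# Differences are meaningful on both scales (the interval property).
diff_c = warm_c - cool_c
diff_k = celsius_to_kelvin(warm_c) - celsius_to_kelvin(cool_c)

# Ratios are meaningful only where zero is non-arbitrary (the ratio property).
naive_ratio = warm_c / cool_c                                        # suggests "twice as hot"
true_ratio = celsius_to_kelvin(warm_c) / celsius_to_kelvin(cool_c)   # only slightly above 1

print("difference:", diff_c, "C,", round(diff_k, 6), "K")
print("ratio on Celsius:", naive_ratio, " ratio on Kelvin:", round(true_ratio, 3))
```

The "twice as hot" reading from the Celsius values is an artifact of where the Celsius zero happens to sit, which is why ratio comparisons should be reserved for ratio data.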
      Because of these differences in attribute data, care must be taken when
comparisons between different levels of data are made (see Table 1.1 for the types
of comparisons that can be performed on data of given levels). For example,
ordinal data may be stored as integers that represent the order, but the values of
these integers do not indicate any measurement of the variability of the data. This
lack of measurement does not, however, preclude a software system from acting
on the data as if it were ratio data and thereby calculating essentially meaningless
statistics (or spatial patterns). If comparisons of data of different levels must be
done, either the higher level of data must be reduced to the lower, or a
transformation (an addition of information) must be done to raise the lower level
to the higher. For nominal or ordinal data, the procedures of psychometrics can be
used to recast the information into interval or ratio data. Essentially this
involves assigning utility values (such as money) to data levels that do not
normally have this type of information associated with them (such as aesthetic values).
This can be accomplished by conducting a survey to get individual assignments of
value, and then using the collected data to assign overall values to the data.

                                Nominal     Ordinal     Interval    Ratio

    Rank Comparison             Invalid     Valid       Valid       Valid
    Addition, Subtraction       Invalid     Invalid     Valid       Valid
    Multiplication, Division    Invalid     Invalid     Invalid     Valid
    Statistics:
      Parametric                Invalid     Invalid     Valid       Valid
      Nonparametric             Invalid     Valid       Valid       Valid

Table 1.1    Valid data comparisons. Because of the varying degree of numerical
precision associated with different data levels (nominal through ratio), only certain
operations can be applied to comparisons between two data sets; comparisons
between data of two different levels must occur at the lower of the two levels
(ESRI Grid Class Notes 1992, 11-22).
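
The rule in Table 1.1 can be expressed as a small lookup. The Python sketch below is an illustration only; the names LEVELS, MIN_LEVEL, common_level and is_valid are hypothetical, and the table entries are transcribed from Table 1.1. It reduces a pair of data sets to the lower of their two levels and then checks whether an operation is valid there.

```python
# Measurement levels in increasing order of information, as in Table 1.1.
LEVELS = ["nominal", "ordinal", "interval", "ratio"]

# Lowest level at which each operation is marked valid in Table 1.1.
MIN_LEVEL = {
    "rank comparison": "ordinal",
    "addition/subtraction": "interval",
    "multiplication/division": "ratio",
    "parametric statistics": "interval",
    "nonparametric statistics": "ordinal",
}

def common_level(level_a, level_b):
    """Return the lower of two levels; comparisons must occur at that level."""
    return min(level_a, level_b, key=LEVELS.index)

def is_valid(operation, level_a, level_b):
    """True if the operation is valid when comparing data of the two levels."""
    level = common_level(level_a, level_b)
    return LEVELS.index(level) >= LEVELS.index(MIN_LEVEL[operation])

# Ordinal classes compared with ratio measurements: ranking is allowed,
# but arithmetic on the pair is not.
print(is_valid("rank comparison", "ordinal", "ratio"))
print(is_valid("addition/subtraction", "ordinal", "ratio"))
```

A check of this kind could guard against the situation described above, in which software treats ordinal integers as if they were ratio data.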

      Although all information is classified due to the nature of measurement
uncertainty in the recording of data (MacEachren 1992, 48), attribute data can be
further classed into groups for both presentation and analysis (see Classifying
Data on page 28 for a discussion of data classification in ARC/INFO).
Classification involves the creation of range categories that individual instances fit
into and thus take on that value range. This results in a loss of information and can
-------
be a reduction in the empirical level of a measurement (interval/ratio data is
reduced to ordinal, for example). This loss of precision is offset by the
simplification of the presentation of information. These classifications are
accomplished by representing ranges of data as categories on the display medium,
through the use of symbolization that is appropriate to the level of measurement
(see Figure 1.1 on page 2).
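
The reduction from ratio to ordinal data that classification causes can be sketched in a few lines. This Python example is an illustration only, not an ARC/INFO method; the break points, labels, readings and the function name classify are all hypothetical. Once the readings are classed, only the order of the classes survives, not the distance between the original values.

```python
import bisect

def classify(value, breaks):
    """Return the 0-based class index of value for ascending class breaks.

    A value equal to a break falls in the higher class (an assumption of
    this sketch; other conventions are equally possible).
    """
    return bisect.bisect_right(breaks, value)

breaks = [10.0, 50.0]                  # hypothetical break points
labels = ["low", "medium", "high"]     # ordinal class names
readings = [3.2, 10.0, 49.9, 120.5]    # hypothetical ratio measurements

classes = [labels[classify(v, breaks)] for v in readings]
print(classes)
```

Note that 10.0 and 49.9 land in the same class even though they differ by almost a factor of five, which is the loss of information the text describes.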
      Spatial data can be grouped into four categories. The first type of data
represents point-specific information. The second type of data represents linear
information. The third type of data represents area information. The fourth type
of data represents volume information. Each of these categories is scale specific;
an area feature such as a stream may require mapping as a linear feature if the
scale of the map is small enough (that is, the total area under consideration is
large enough) that displaying the stream as having both length and breadth
becomes too tedious or difficult to represent in the available media for the
additional information retained. Volumetric information
is also dependent on the means of display: the appearance of three dimensions can
only be approximated on a two-dimensional surface. Although the visual
variables that best represent different levels of information do not change for each
of the types of spatial data, the ARC/INFO methods for accomplishing those
representations change.
      For environmental data, Mark Monmonier and Branden Johnson (1990, 5-7)
have categorized spatial data into: single location; single location and affected
area; and multiple locations and the pattern of distribution. Single location
answers not only 'what' but 'where'; this can be applied to all of the basic types of
spatial data and allows the map user to relate environmental information to his or
her own experience (this is what Edward Tufte (1990) calls micro/macro
readings). Building on single location, single location and affected area adds
information concerning how a 'what' influences its surroundings (because
'influence' may be more subject to interpretation than a measurement, presentation
of meta-data can become very important in the presentation of data). Finally,
multiple locations and the pattern of distribution integrates more than one
instance of single location and affected area into one map.
      Area phenomena can also be categorized on the basis of their spatial grouping
and how they change over space (MacEachren and DiBiase 1991) (see Figure 1.1).
Grouping ranges from continuous (no grouping) to discrete (complete grouping).
This reflects the degree of spatial autocorrelation within areas. Changes over space
can range from smooth to abrupt. This reflects the degree of spatial autocorrelation
between areas. These characteristics suggest that symbolization choices should
accurately reflect the nature of the data. The possible symbolization
choices include, but are not limited to: graduated symbols for abrupt, discrete data;
dot density for smooth, discrete data; isopleth (or the '3-D' equivalent fishnet) for
smooth, continuous data; and choropleth for abrupt, continuous data. This is in
contradiction with the all-too-common practice of making choropleth maps for all
types of area data.
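
The four pairings listed above amount to a two-way lookup. The Python sketch below is an illustration only; the dictionary SYMBOLIZATION and the function suggest_map_type are hypothetical names, and the entries simply restate the pairings in the preceding paragraph (which are described there as possible choices, not the only ones).

```python
# Keys are (change between regions, grouping within regions); values are the
# map types named in the text. The layout of this table is hypothetical.
SYMBOLIZATION = {
    ("abrupt", "discrete"): "graduated symbol",
    ("smooth", "discrete"): "dot density",
    ("smooth", "continuous"): "isopleth (or fishnet)",
    ("abrupt", "continuous"): "choropleth",
}

def suggest_map_type(change, grouping):
    """Look up a suggested map type for area data with the given character."""
    try:
        return SYMBOLIZATION[(change, grouping)]
    except KeyError:
        raise ValueError(
            "change must be 'abrupt' or 'smooth'; "
            "grouping must be 'discrete' or 'continuous'"
        ) from None

print(suggest_map_type("smooth", "continuous"))
print(suggest_map_type("abrupt", "continuous"))
```

Under this scheme a choropleth is only one of four outcomes, rather than the default for every area data set.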

Meta-Data and Uncertainty
      The uncertainty of information is becoming an important topic with the
increased use of computers for data processing, presentation and analysis. This is
being addressed with position statements  such as the Environmental Protection

-------
Agency's Locational Data Policy (1991) and the National Center for Geographic
Information and Analysis' Visualization of Data Quality initiative (MacEachren
1992, 47). Yet, these cannot eliminate uncertainty, which exists at the most basic
levels of measurement according to Heisenberg's Uncertainty Principle (Capra
1983). Because of this, David Rejeski (1991) suggests that uncertainties should be
addressed openly in order to ensure that decisions have both utility and
believability. He, as well as Granger Morgan and Max Henrion, recognizes that
uncertainty can be valuable information. Morgan and Henrion (1990, 3) give three
specific reasons for the inclusion of uncertainty in policy-oriented research:
      1. A central purpose of policy research and policy analysis is to help
         identify important factors and the sources of disagreement in a
         problem, and to help anticipate the unexpected. An explicit
         treatment of uncertainty forces us to think more carefully about
         such matters, helps us identify which factors are most and least
         important, and helps us plan for contingencies or hedge our bets.
      2. Increasingly we must rely on experts when we make decisions. It is
         often hard to be sure we understand exactly what they are telling
         us. It is harder still to know what to do when different experts
         appear to be telling us different things. If we insist they tell us
         about the uncertainty of their judgments, we will be clearer about
         how much they think they know and whether they really
         disagree.
      3. Rarely is any problem solved once and for all. Problems have a way
         of resurfacing. The details may change but the basic problems
         keep coming back again and again. Sometimes we would like to
         be able to use, or adapt, policy analyses that have been done in the
         past to help with the problems of the moment. This is much easier
         to do when the uncertainties of the past work have been carefully
         described, because then we can have greater confidence that we
         are using the earlier work in an appropriate way.
      Uncertainty has dictionary definitions such as, "uncertain in respect of
duration, continuance, occurrence, etc.," "liability to chance," "indeterminate as to
magnitude or value" (Simpson and Weiner, Oxford English Dictionary, 1989, 899).[1]
A more useful interpretation for use in environmental risk analysis would be that
uncertainty is the information contained in the data about data (that is, meta-data).
By defining uncertainty this way, meta-data can be used as another piece of
information in the analysis and presentation of data, including risk-based policy
making.
      Uncertainty has a taxonomy that should be useful in delimiting the origins
of uncertainty and the reliability of data at any given point in an analysis (see
Figure 1.2 on page 6). Although Morgan and Henrion (1990) discuss "The Nature
and Sources of Uncertainty" in detail as chapter 4 of Uncertainty: A Guide to
Dealing with Uncertainty in Quantitative Risk and Policy Analysis, this presentation
is a distillation based on several sources, which include Morgan and Henrion and
other sources, as noted. The taxonomy can be broken down into three groups:
uncertainties of the physical world; uncertainties of the computer world; and
[1] Uncertainty begins as the vagueness of duration, etc. with Wyclif in 1382. By 1982, Oxford
(1989, 900) reports: "What the uncertainty principle asserts is that for no state of any system
can all dynamical variables be arbitrarily well-determined."

-------
uncertainties of the human world. Problems of the measurement of natural
phenomena constitute the first category; Alan MacEachren (1992, 48) reports that
the National Center for Geographic Information and Analysis calls this "data
quality". The uncertainty of the physical world can be further split into
measurement uncertainty and parameter uncertainty.

Uncertainty
  Physical World
    Measurement
      Location
      Attribute
    Parameter
      Aggregation
      Generalization
      Time
      Consistency
  Computer World
    Descriptive
      Numeric
      Spatial Delineation
    Computational
      Rounding
      Significant Digit Shift
      Over/Underflow
    Propagational
      Compounded Computational
      Polygon Overlay
    Modeling
      Robustness
      Validity
  Human World
    Sending Meanings
      Open Presentation of Data and Meta-Data
    Receiving Meaning
      Ignoring Meta-Data
      Misunderstanding Meta-Data

Figure 1.2   A taxonomy of uncertainty.


Uncertainties of the Physical World
      Measurement uncertainty for geographic data includes both location and
attribute uncertainties. Location uncertainties (Rejeski and Kapuscinski 1990, 10)
can be considered the accuracy (closeness to a 'true' value) of the instruments and
the reliability, or precision (repeatability of a measurement), of the methods used
to calculate a site's position (see Figure 1.3 on page 7). For example, the location of a
phenomenon can be determined by many methods, such as a professional
surveyor's analysis, use of a global positioning system, or terrain analysis
estimation. Each of these methods is of varying accuracy and precision, and the
meta-data that should be recorded includes the manner in which a site location
was first defined, the method used to derive its location, the time it was derived,
and an estimate of its accuracy.
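
The distinction between accuracy and reliability can be made concrete with repeated position fixes. The Python sketch below is an illustration only; the function name accuracy_and_reliability, the choice of bias and root-mean-square spread as summary measures, and the coordinate values are all assumptions of the example rather than a method prescribed by this guideline.

```python
import math
import statistics

def accuracy_and_reliability(fixes, true_position):
    """Summarize repeated (x, y) position fixes against a known location.

    Returns (bias, spread). Bias is the distance from the mean fix to the
    true position (accuracy). Spread is the root-mean-square distance of
    the fixes from their own mean (reliability, or precision).
    """
    xs = [x for x, _ in fixes]
    ys = [y for _, y in fixes]
    mean_x = statistics.fmean(xs)
    mean_y = statistics.fmean(ys)
    bias = math.hypot(mean_x - true_position[0], mean_y - true_position[1])
    squared = [(x - mean_x) ** 2 + (y - mean_y) ** 2 for x, y in fixes]
    spread = math.sqrt(statistics.fmean(squared))
    return bias, spread

true_site = (0.0, 0.0)  # hypothetical surveyed location

# Tightly clustered but offset: reliable, but not accurate.
clustered = [(10.0, 10.1), (10.1, 10.0), (9.9, 10.0), (10.0, 9.9)]
# Scattered but centered on the truth: accurate, but not reliable.
scattered = [(5.0, 0.0), (-5.0, 0.0), (0.0, 5.0), (0.0, -5.0)]

print(accuracy_and_reliability(clustered, true_site))
print(accuracy_and_reliability(scattered, true_site))
```

Both numbers are candidates for the accuracy estimate that the text recommends recording as meta-data alongside the derived location.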
      Attribute uncertainty can be considered the accuracy and reliability of the
instruments used to take a measurement of the environment. For example, an
instrument may be set up to measure atmospheric concentrations of carbon
monoxide, measuring levels in parts per million and with an accuracy of plus or
minus one part per million. The data in this situation is the parts-per-million
measurement and the meta-data includes the plus or minus one part per million
accuracy of the instrument.
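
One way to keep the data and its meta-data together is to store them in a single record. The Python sketch below is an illustration only; the class Measurement, its field names and the readings are hypothetical. It pairs a carbon monoxide reading with the instrument's stated accuracy and uses that accuracy to decide whether two readings can be told apart.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Measurement:
    value: float      # the data, e.g. carbon monoxide in parts per million
    tolerance: float  # the meta-data: instrument accuracy, plus or minus

    def interval(self):
        """Range of values consistent with the reading and its accuracy."""
        return (self.value - self.tolerance, self.value + self.tolerance)

    def distinguishable_from(self, other):
        """True if the two readings' accuracy intervals do not overlap."""
        lo_a, hi_a = self.interval()
        lo_b, hi_b = other.interval()
        return hi_a < lo_b or hi_b < lo_a

# Hypothetical readings from an instrument accurate to +/- 1 ppm.
site_a = Measurement(9.0, 1.0)
site_b = Measurement(10.5, 1.0)
site_c = Measurement(12.5, 1.0)

print(site_a.distinguishable_from(site_b))
print(site_a.distinguishable_from(site_c))
```

Treating non-overlapping intervals as "distinguishable" is a deliberately simple rule chosen for the sketch; a statistical test would be more appropriate for real analysis.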
      Parameter uncertainty (Rejeski and Kapuscinski 1990, 11; MacEachren 1992,
47-8) involves the problem of the aggregation and generalization of point samples
-------
to areas for spatial data, or trend lines for linear data; the temporal variability
between data items and between measurement times and data usage; and the
logical consistency and completeness of data. This is the question of whether the
measurements that are recorded and then used for analysis are adequate
measurements of what was intended to be measured; it entails the consequences of
the assumption of autocorrelation. Before a model is constructed to provide an
explanation or a projection of environmental phenomena, it must be recognized
that there are few (if any) phenomena that can be precisely defined and measured
for all possible occurrences. Because of this, interpolation and extrapolation,
whether linear (as in a measurement), spatial (as in an area generalization), or
temporal (as in data from one time as estimates for some other time), must be done
even though the process introduces uncertainty into the analysis.

[Figure 1.3 graphic not reproduced. Its labels mark a 'True Value' and two groups
of measurements: one reliable, but not accurate, and one accurate, but not
reliable.]

Figure 1.3   An example of the difference between accuracy and reliability in
spatial location measurements.

      There have been suggestions made for reducing and representing
parameter uncertainty. For generalizing spatial data, Cort, Rowe and Philpot
(1985; in MacEachren 1992, 44) suggest that interpolation in spherical, rather than
planar, coordinates will introduce less uncertainty in the creation of area
information from point samples (particularly for large areas). MacEachren and
Davidson (1987; in MacEachren 1992, 46) demonstrate that increasing sampling
frequency will also reduce (but not eliminate) the uncertainty of interpolation; this
should be true for linear and temporal data as well. For representing
parameter uncertainty, Rejeski and Kapuscinski (1990, 10) suggest the use of
transitional buffer zones to represent "fuzzy" boundaries, rather than one line
demarcating "hard edges" (see Figure 1.4 on page 8).
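
The effect of sampling frequency on interpolation uncertainty can be demonstrated on a known curve. The Python sketch below is an illustration only and does not reproduce the cited studies; it assumes a one-dimensional sine curve as the 'true' phenomenon, evenly spaced samples, and piecewise-linear interpolation, and the function name max_interpolation_error is hypothetical.

```python
import math

def max_interpolation_error(func, n_samples, start=0.0, stop=math.pi, n_checks=1000):
    """Largest gap between func and its piecewise-linear interpolant.

    The interpolant is built from n_samples evenly spaced samples and is
    compared with func at n_checks + 1 evenly spaced check points.
    """
    xs = [start + (stop - start) * i / (n_samples - 1) for i in range(n_samples)]
    ys = [func(x) for x in xs]
    worst = 0.0
    for k in range(n_checks + 1):
        x = start + (stop - start) * k / n_checks
        # Locate the sample interval that contains x.
        i = min(int((x - start) / (stop - start) * (n_samples - 1)), n_samples - 2)
        t = (x - xs[i]) / (xs[i + 1] - xs[i])
        estimate = ys[i] + t * (ys[i + 1] - ys[i])
        worst = max(worst, abs(func(x) - estimate))
    return worst

# Doubling the sampling frequency repeatedly shrinks the error,
# but never brings it to zero.
errors = {n: max_interpolation_error(math.sin, n) for n in (3, 5, 9, 17)}
for n, err in errors.items():
    print(n, "samples -> maximum error", round(err, 4))
```

In this idealized case the error falls steadily as samples are added, consistent with the "reduce (but not eliminate)" wording above; real spatial data would also carry measurement error at each sample.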

Uncertainties of the Computer World
      Problems of the use of computers for storing and manipulating data
constitute the second category of uncertainty. This can be subdivided into four
groups: descriptive uncertainty, computational uncertainty, propagational
uncertainty and modeling uncertainty. Thoughtful use of programs and
programming techniques can reduce the amount of uncertainty introduced to data
by computer manipulation. Herman Knoble (1990, 2) states that correct and
accurate computer programs must be an ethical responsibility when dealing with
numerical algorithms, all of which, "at the bottom line...affect people."
      Descriptive uncertainty deals with the representation  of data in computers,
including both numeric and spatial  problems.  Numeric uncertainty arises from
the method computers use to store data. For example, the binary number system
does not have an exact representation of some common decimal numbers, such as
1/100.  Another type of numeric uncertainty arising from number storage is the
shifting of significant digits.  If a number is  input into a computer that has the
ability to store a longer number than is input,  the computer will 'pad' the extra
space with zeros.  These digits will be available for computation in numeric
modeling even though they add no information and are meaningless.  Significant
digit shift can occur in the other direction as well. If a number is input into a
computer that stores fewer digits than is input, the computer will round or
truncate the number to fit its numeric scheme. These problems can be controlled,
but not eliminated, by specifically programming the computer for the required
operations (Knoble 1990), but for generally available software this is not possible.

Figure 1.4  An example of a change from a hard to a 'fuzzy' boundary.
[Diagram labels: Mean Sea Level; Mean Spring High Tide; Mean Neap High Tide;
Mean Sea Level; Mean Neap Low Tide; Mean Spring Low Tide]

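
      Numeric descriptive uncertainty is easy to observe directly. The sketch
below uses Python (an assumption; the guideline's own programs are in AML and
C) to show the value actually stored for 1/100 and the effect of forcing that
value into a shorter, 32-bit storage format.

```python
from decimal import Decimal
import struct

# The double-precision value actually stored for the literal 0.01.
stored = Decimal(0.01)
print(stored)                      # a long binary-derived expansion
print(stored == Decimal("0.01"))   # False: 1/100 has no exact binary form

# Storing the value in a shorter (32-bit) format forces rounding.
as_single = struct.unpack("f", struct.pack("f", 0.01))[0]
print(as_single == 0.01)           # False

# Printing with many digits exposes trailing digits that carry no
# information about the original two-digit input.
print(f"{as_single:.20f}")
```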
      Spatially, data is generally stored as either a table  of vectors that define
sharp boundaries between regions, or in regular tessellations (square, 'raster'
grids) that force a predefined spatial pattern on an area (Rejeski and Kapuscinski
1990,10).  Both of these methods introduce uncertainty: the vector representation
forces boundary lines where transition zones may be; and tessellations assume the
entire grid cell is homogeneous.  These problems can be reduced, but again not
eliminated, by using smaller polygons or grid cells, but this forces a trade-off:
uncertainty decreases only at the cost of larger data files and longer
computation times, which may not be practical.
      Computational uncertainty deals with  the problems of numeric modelling
by using computers (Knoble 1990; Rejeski and Kapuscinski 1990,13).  Once data
are stored in a digital format, any further processing can introduce uncertainty.
Type shifting (such as from an integer format to a real number format, or from
reals to integers) can introduce uncertainty by forcing rounding to occur. More
commonly, rounding occurs within real numbers when numbers that are not
similar in value are arithmetically joined. Knoble (1990,4-5) gives an example of
the results of this type of rounding: an IBM 3090 Model 600, VS FORTRAN 2.4
program that for the formula:

         P=((A+X)**2-A**2-2.*A*X)/X**2

generates an answer of -4999.99609 when A is equal to 1000.0 and X is equal to 0.01,
despite the fact that the formula simplifies to P equalling one for all X's not equal
to zero.
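
      The behavior Knoble describes can be approximated on current hardware.
The sketch below is an assumption-laden stand-in: it emulates IEEE 754 single
precision by rounding every intermediate result through a 32-bit float,
whereas the IBM 3090 used a different (hexadecimal) floating point format, so
the number printed will not match Knoble's reported value. The helper names
f32 and knoble_p are hypothetical.

```python
import struct

def f32(x):
    """Round a Python float (IEEE double) to the nearest IEEE single."""
    return struct.unpack("f", struct.pack("f", x))[0]

def knoble_p(a, x, rnd):
    """Evaluate P = ((A+X)**2 - A**2 - 2*A*X) / X**2, rounding every
    intermediate result with rnd to mimic a given storage precision."""
    a, x = rnd(a), rnd(x)
    s = rnd(a + x)
    sq = rnd(s * s)
    t = rnd(sq - rnd(a * a))
    t = rnd(t - rnd(rnd(2.0 * a) * x))
    return rnd(t / rnd(x * x))

p_single = knoble_p(1000.0, 0.01, f32)    # emulated 32-bit arithmetic
p_double = knoble_p(1000.0, 0.01, float)  # native 64-bit arithmetic
print("single precision:", p_single)
print("double precision:", p_double)
```

In this emulation the single precision result is far from the algebraic
answer of one, while the double precision result is close to it; greater
precision postpones the problem rather than removing it.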
      Significant digit shift can also occur in arithmetic operations, particularly in
operations involving subtraction of similar values. Subtracting 1.23456787 * 10^7
from 1.23456789 * 10^7 yields 0.2, but a computer can store this value as
2.00000000 * 10^-1 and, as with data input, will allow computation on all of the
digits to the right of the 2, which is the only meaningful digit in the number.
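
      The subtraction above can be tried at several precisions. This is a
minimal sketch, assuming IEEE 754 arithmetic and using a hypothetical helper
(f32) to emulate 32-bit storage; exact decimal arithmetic is included for
comparison.

```python
import struct
from decimal import Decimal

def f32(x):
    """Round a Python float (IEEE double) to the nearest IEEE single."""
    return struct.unpack("f", struct.pack("f", x))[0]

exact = Decimal("12345678.9") - Decimal("12345678.7")
diff_double = 12345678.9 - 12345678.7
diff_single = f32(f32(12345678.9) - f32(12345678.7))

print("decimal arithmetic:", exact)            # 0.2
print("double precision  :", repr(diff_double))
print("single precision  :", diff_single)      # 0.0
```

In single precision both inputs round to the same stored number, so the
difference vanishes entirely; in double precision the leading digit is right
but the trailing digits are artifacts of storage.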
      Two additional,  similar  problems that may occur are  overflow  and
underflow. These result when a number is incremented or decremented beyond
the storage type's ability to represent numbers.  Depending on the operating
system, this may cause an error, or it may be ignored, with the value either
remaining the same or changing drastically.  Borland's Turbo C++ 2.0, running
under MS-DOS 5.0 on an IBM AT, compiles programs that allow adding one to the
thirty-two bit, 'unsigned long' integer 4294967295 (which is equal to 2^32 - 1,
and is the largest
integer that can be represented in  thirty-two bits), changing the value of the
variable to 0. Turbo C++ compiled programs will give an overflow error when
thirty-two bit real numbers (type 'float') are  incremented  outside of type float's
range.  When these computational uncertainty errors occur in a program, and are
not handled well, the program could continue and generate an apparently correct
answer purely by chance (Knoble 1990,2).
      Propagational uncertainty deals with the problem of how uncertainty from
a physical world measurement or another computer-related uncertainty moves
through successive iterations of a model. It can be tested by varying the input to a
numeric model in small steps to see if small changes can make large differences in
the output of a computation, which in principle should not occur. For example, by
making the value of A equal to 100.0 and X equal to 0.01, Knoble's (1990,  5)
program generates a value for P equal to -39.0624847; by changing X to 0.0078125,
the program generates a value of P equal to 0.0000000.  Propagational uncertainty
can also be tested if the computer program can be rewritten in an algebraically
equivalent but computationally different form, which allows comparison
between programs that should generate the same output.  By simplifying the
formula in Knoble's example to a command line such as:

         if X not equal to 0 then P = 1 else print "DIVISION BY ZERO"

the problems of floating point arithmetic can be avoided.
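
      Both tests described above, varying the input in small steps and
comparing against an algebraically equivalent form, can be combined in one
short sketch. It assumes emulated IEEE 754 single precision (not the IBM
format Knoble used), so the individual values differ from his; the step size
and the names p_computed and p_simplified are hypothetical.

```python
import struct

def f32(x):
    """Round a Python float (IEEE double) to the nearest IEEE single."""
    return struct.unpack("f", struct.pack("f", x))[0]

def p_computed(a, x):
    """Knoble's formula with every intermediate rounded to single precision."""
    a, x = f32(a), f32(x)
    s = f32(a + x)
    t = f32(f32(s * s) - f32(a * a))
    t = f32(t - f32(f32(2.0 * a) * x))
    return f32(t / f32(x * x))

def p_simplified(a, x):
    """Algebraically equivalent form: P is 1 whenever X is nonzero."""
    if x == 0:
        raise ZeroDivisionError("DIVISION BY ZERO")
    return 1.0

a = 100.0
steps = [0.0078125 + 0.0005 * k for k in range(6)]   # small changes in X
results = [(x, p_computed(a, x), p_simplified(a, x)) for x in steps]
for x, got, expected in results:
    print(f"X={x:.7f}  computed P={got:12.6f}  expected P={expected}")

spread = max(r[1] for r in results) - min(r[1] for r in results)
print("spread of computed P:", spread)
```

A large spread in the computed values, set against a constant expected value,
is the signature of propagational uncertainty.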
      Propagational uncertainty not only involves the accuracy of real number
computations for numeric data, but also the generation of polygon overlay slivers
within vector spatial data, such as that used by ARC/INFO (see Figure 1.5 on page
10).  These slivers may be handled by "fuzzy tolerances"  (ARC  Command
References, Commands J-Z, 1991, UNION 1), which allow the shifting of
close lines so that they merge in the output. This could cause the shifting of data
from one layer of a known high accuracy  (such as a surveyor's  cadastral data)  to
correspond with a layer of lower or  unknown accuracy (such as information
digitized from a medium or small scale paper map). All future use of the merged
data layer would include the uncertainty created by the overlaying of two data
layers of varying accuracy, and the meta-data that should be attached to the new
data layer would have to reflect an estimate of how much shifting occurred. This
type of shifting can be eliminated by avoiding fuzzy overlays, although this can
cause the generation of sliver regions along boundaries that will cause increased
storage and processing time for use of the new layer, and may not hold any useful
information other than the indication of the difference between the two layers used
to create the new layer.
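
      The shifting described above, and the estimate of shift that the
meta-data should record, can be illustrated with a toy snapping routine. This
is not ARC/INFO's overlay algorithm; it is a minimal sketch that assumes
boundaries are lists of vertices and simply moves any vertex lying within a
tolerance of a vertex in the other layer. All coordinates are hypothetical.

```python
import math

def snap_to_reference(vertices, reference, tolerance):
    """Move each vertex onto the nearest reference vertex when the two lie
    within `tolerance` of each other; report how far each vertex moved."""
    snapped, shifts = [], []
    for vx, vy in vertices:
        nearest = min(reference, key=lambda r: math.hypot(r[0] - vx, r[1] - vy))
        dist = math.hypot(nearest[0] - vx, nearest[1] - vy)
        if dist <= tolerance:
            snapped.append(nearest)
            shifts.append(dist)
        else:
            snapped.append((vx, vy))
            shifts.append(0.0)
    return snapped, shifts

# Hypothetical coordinates in map units.
survey_boundary = [(0.0, 0.0), (10.0, 0.3), (20.0, -0.2)]     # high accuracy
digitized_boundary = [(0.0, 0.4), (10.0, 0.0), (20.0, 3.0)]   # lower accuracy

merged, moved = snap_to_reference(survey_boundary, digitized_boundary,
                                  tolerance=0.5)
print(merged)
print("largest shift:", max(moved))
```

Here two surveyed vertices are pulled onto the less accurate digitized line;
the largest shift is the kind of figure the new layer's meta-data would need
to carry.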
Figure 1.5  Polygon overlay can cause shifting in boundary lines, or can
create boundary slivers.
[Diagram labels: Boundary 1; Boundary 2; Fuzzy Tolerance; Boundary Shift;
Polygon Merge; Sliver]

      Modeling uncertainty culminates the types of uncertainty associated with
computers. Although there are several
types of modelling  (verbal,  graphic,
physical, and mathematical), each of
which  is  subject  to   questions  of
robustness and validity versus the goal
of the model  (such as description or
prediction),  it is only  mathematical
models that tend to be dependent on
computers for execution and are thus
subject   to    the   other   computer
uncertainties.   The  robustness of a
model is reflected in a model's ability to
handle  all  appropriate  input  and
produce a reasonable output. It is thus
tied to propagational uncertainty, but
robustness also entails that not only do
small  changes  in  input  not  cause
inappropriate  changes in output,  but
that the model's  output, in practice, is
also within the expected range of the
mathematical model, in principle. The
validity of a model is the question of whether a model, in practice, actually
represents the phenomena being  modeled, in principle.  This questions the
methods used to operationalize a numerical model, such as the validity of using
'if...then' statements to ensure the apparent robustness of a procedure that would
otherwise  produce non-robust output, when no such statements are apparent in
the 'real world' phenomena.

Uncertainties of the Human World
      Problems of the human communication of information and meanings
constitute the third area of uncertainty (Rejeski and Kapuscinski 1990, 12).  In
attempting to communicate information, meta-data can be lost in two ways: the
sender of information can  give data without the meta-data information, or the
receiver of information does not understand that part of the message.  The first
way, not giving  meta-data with data, can be the result of restrictions of space
within publication materials, the desire for the appearance of greater accuracy
(Star 1985), and until recently a lack of awareness of the potential importance of
uncertainty information, particularly in the areas of human and environmental
risk analysis (Rejeski and Kapuscinski 1990). The second way that meta-data can
be lost is when the receiver of information does not receive that part of a message.
This can result from the ignoring of meta-data or the  inability to interpret the
meta-data  through lack of experience in dealing with the way it is conveyed. This
type of failed effective communication  can  be  the result of the  variable
interpretations of both words and images.  The meaning of words constitutes one
of the major problems the U. S. Environmental Protection Agency has had to deal
with in risk analysis (Rejeski and Kapuscinski 1990,12).  Effective communication
of meta-data has been studied in a strictly human realm (body language, etc.), but
little has been done in the area of communication of uncertainty within the realm
of scientific communication, particularly with spatial data (Rejeski and
Kapuscinski 1990, MacEachren 1992).
      This taxonomy should prove  useful,  particularly for reducing human
communication  uncertainties.    By  recognizing  that  uncertainty  exists in
measurement  and is propagated in  computer manipulation  of  data,  these
uncertainties can be dealt with openly and honestly, as Rejeski (1991) suggests.
This,  aided by  visual communication  techniques, should reduce  human
uncertainty by increasing  the amount  of  information  (by communicating
meta-data) given in the presentation of data.

The Idea of Visual Communication
      The effective communication of environmental data and uncertainty as
meta-data requires an understanding  of the principles of visual communication
and map design.  David DiBiase (1990) has  developed a  model of information
display, in a research setting, as a means of communication in a continuum from
communication to self through communication  to  others  (see Figure  1.6).
Communication to self can be thought of as "visual thinking", and includes data
exploration,  hypothesis generation and confirmation.  At  this level of  data
visualization, maps and other graphics are used to "prompt insight, reveal patterns
in data, and highlight anomalies" (MacEachren 1992, 1). These images should be
created to serve those purposes. Because of this exploratory role, it is with
this type of visual communication that the possibility of visualization error is
greatest.  MacEachren and Ganter (1990) describe these errors as seeing wrong
(similar to the type I error in hypothesis testing—identifying a pattern that is not
there) and not seeing (similar to the type II error—not identifying a pattern that is
there).  But, since these graphics are generally rough and  intended only for
viewing by the researcher(s), searching for one 'optimal' display is less important;
more views on the same data may be more helpful and reduce the chances of not
seeing or seeing wrong.

Figure 1.6  DiBiase's (1990) model of the function of graphics in research.
Visualization begins with exploring an idea.  The idea then moves through
confirmation and synthesis in a larger group of both ideas and people, and ends
with presentation (although each stage can spawn new ideas).
[Diagram labels: Visual Communication; Synthesis; Presentation]

      Communication to others can be thought of as "visual communication" and
is the realm of presentation graphics.  At this level of data visualization, maps
and other graphics are used to synthesize data into an "abstract statement
concerning patterns and relationships" (MacEachren 1992, 6) and finally to
present the information to others to persuade them of the accuracy of the data
assessment (MacEachren 1992, 7).  Edward Tufte's (1983, 77) principles of graphic
excellence can help ensure the most information is presented in a minimum of
space, and that this information is conveyed ethically:
     The representation of numbers, as physically measured on the surface
          of  the graphic  itself,  should be directly proportional to the
          numerical quantities represented.
     Clear, detailed, and  thorough  labeling should be  used to defeat
          graphical distortion and ambiguity. Write out explanations of the
          data on the graphic itself. Label important events in the data.
     Show data variation, not design variation.
     In time-series displays of money, deflated and standardized units of
          monetary measurement are nearly always better than nominal
          units.
     The number of information-carrying (variable)  dimensions depicted
          should not exceed the number of dimensions in the data.
     Graphics must not quote data out of context.
Tufte also discusses the concept of data-carrying ink—that is, don't put more ink on
the page than is necessary to convey the information. These principles are similar
to those that Morgan and Henrion (1990) propose for designing graphics for the
presentation of uncertainty information.  Judy Olson (1981) also suggests the need
for clear and  accurate legends in her guidelines for the production of bivariate
maps.
      Monmonier and Johnson (1990, 7) have proposed a guideline for the
communication of environmental risk, which can aid in the making of maps for
visual communication.  Their multistep process is not a waterfall type model
(when one level is completed it cannot be returned to), but rather a guideline for
iterative refinement for the presentation of data.  The steps they include are:
setting up the design team; identifying the communication goal; the issue profile;
the audience; the messages of environmental maps;  methods; and  evaluation.
Setting up the design team acknowledges that one person may not have all of the
knowledge necessary to adequately design a graphic, and that, as necessary, each
of the following steps should involve each person who has or can contribute to the
map. Identifying the communication goal is simply that a focus should be selected
for the map; this will facilitate inclusion of important information and removal of
extraneous data. The issue profile deals with the history of the problem  to be
mapped and the constraints on producing the map; that is, the 'environment'  of
the map and map design process.
      The audience must also be considered when designing a graphic.  This
includes consideration of who will be viewing the map (politicians, scientists, the
public, etc.).  These different audiences will generally have varying degrees  of
map-interpretation skills; this influences the amount of explanatory information
that should be included, the appropriateness of bi- or multivariate maps, and the
appropriateness of 'eye catchers' such as bright  colors.  Consideration of the
audience should also include an acknowledgment of how  the map will be
presented; this is part of the issue profile and influences the choices of methods.
      Monmonier and Johnson (1990) present a list of several categories of
messages that environmental maps generally present.  The first of these is "What
we found/What we know/What we think we know" (p. 12).  This is the
presentation of information at its most basic level, but even at this level meta-data
can be presented: 'what we think we know.'  The second of these messages is
"What you can do/What you should do" (p. 13).  This is very dependent on the
choice of audience; a lawmaker's set of choices of what to do can be quite distant
from a concerned citizen's, for example.  The third message is "What we're doing/
What we want to do" (p. 13).  For this type of message Monmonier and Johnson
suggest that an overview map with several smaller detail maps may aid
the presentation.  The last category is "Why we're doing what we're doing/Why
you should do what we're asking you to do" (p. 13).  This type of map should
present the reasoning behind a choice or plan of action; this can include the
presentation of the history side of the issue profile.
The final two steps of Monmonier and Johnson's strategy are methods and
evaluation. Methods need to address questions such as the need for one or several
maps, how these maps will be presented (large size color maps, 8.5x11 black and
white maps, slides, video, etc.) and whether or not additional information such as
non-map graphics should be included. Although evaluation is presented as the
last step in map design, it is a part of each of the earlier steps. It comes last as
a review of what may be the final design. Evaluation can be formal (a survey) or
informal (a telephone call to someone who has seen the map and can suggest any
possible improvements).

The Means of Visual Communication
Jacques Bertin proposed a group of 'visual variables' that has since been
extended by other cartographers (such as Morrison (1974), McCleary (1983),
and Woodward (1991)). This group of variables constitutes those representational
techniques that a cartographer or illustration designer has in the creation of an
image. The list includes: location, size, value, hue, intensity, orientation, shape,
texture, arrangement, and focus (see Figure 1.7 on page 14). This list is not
necessarily all inclusive, but the list is useful in the design of maps and graphics.
Each of the variables can be used to establish visual isolation (difference from
surroundings) and visual levels (greater noticeability) (see Figure 1.1 on page 2),
and according to Bertin, these visual variables have certain levels of measurement
that are commonly associated with them, and thus allow representations that
convey the character of the data.
In a static map, the use of location is limited to the signification of the spatial
place of an item, although in orthogonal displays, location is a flexible tool because
of the possibility of specifying the viewpoint on a simulated three-dimensional
surface. In multiple maps, or dynamic mapping, change in position can be used to
show movement of a feature. It is inherently an interval/ratio variable, but can be
used to depict all levels of measurement.
Size is most often used to depict an ordinal variable, although interval/ratio
variables can also be depicted using size. Because a larger symbol is almost always
associated with 'more,' this is the context that it should generally be used in.
Of the 'color' variables, Bertin (1983, 42) only identified value and hue.
Value, like size, has a distinct range from more to less and is therefore good for
mapping ordinal data and can also be used for interval/ratio data. ARC/INFO
refers to value as lightness. Hue can be used to depict ordinal or interval/ratio
data because the frequencies that constitute hue are ordered, but these orders are
not always readily remembered and used; hue (at a constant value and intensity)
is therefore best used for nominal data. Intensity, which is also called saturation,
has a distinct range from more to less and is therefore good for ordinal data.

-------
Figure 1.7  The visual variables as presented by DiBiase, Krygier, Reeves,
MacEachren, and Brenner (1991). All have been done in ARC/INFO 6.0, but some
(such as texture for point symbols, orientation for line symbols, and arrangement
for area symbols) are more difficult and, thus, less useful than others.
[Chart: columns Point, Line, Area; rows Location, Size, Value, Hue, Intensity,
Orientation, Shape, Texture, Arrangement, Focus]

-------
                                                                        15
      As the means of creating high-resolution color displays become more widely
available, use of the color visual variables will become an even greater part of
cartography. Color can be specified in several ways (Dent 1985). One common
and useful way is the Munsell color system (see Figure 1.8); it is based on the
human perception of color (and is similar to the Tektronix color cone, and a color
specification system in ARC/INFO, Hue-Lightness-Saturation).  This system
divides color into the three visual variable categories value, hue and intensity.
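
      The three-way division of color can be explored with any
hue-lightness-saturation model. The sketch below uses Python's standard
colorsys module as a stand-in; it is not the Munsell system, the Tektronix
cone, or ARC/INFO's own implementation, and the helper name hls_to_rgb255 is
hypothetical. Lightness plays the role of value and saturation the role of
intensity.

```python
import colorsys

def hls_to_rgb255(hue_degrees, lightness, saturation):
    """Convert hue (0-360 degrees), lightness (0-1) and saturation (0-1)
    to 8-bit red, green and blue components."""
    r, g, b = colorsys.hls_to_rgb(hue_degrees / 360.0, lightness, saturation)
    return tuple(round(c * 255) for c in (r, g, b))

print(hls_to_rgb255(0, 0.5, 1.0))    # (255, 0, 0): an intense red
print(hls_to_rgb255(0, 0.5, 0.0))    # (128, 128, 128): same value, no intensity
print(hls_to_rgb255(60, 0.5, 1.0))   # (255, 255, 0): hue changed to yellow
```

Changing one component at a time shows how each variable can be held
constant while another carries the data.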
      Value is the measure of the lightness or darkness of a surface; it is the total
amount of light that is reflecting or emitting from a surface measured relative to
the human ability to discriminate black from gray up to white.  It is the  only
measure of light along the white-gray-black continuum, because all frequencies of
light should be equally present.
      Hue is the term most often meant when the word 'color' is used.  The
Munsell color system records hue as an angular measure around a color space. It
represents the modal value of the frequency of light that is reflecting or being
emitted from a surface.  When a light has no modal frequency, the perceived light
is along the white-gray-black continuum, which constitutes the central axis of the
Munsell color space.
      Intensity is a measure of the purity of the spectral frequency of light. It
represents the variance of the frequency of light; greater variance means that a
larger range of frequencies of light are being emitted or reflected from a surface,
that is, that the surface has more gray in it. A feature of intense colors is that they
tend to be more noticeable (stand out more) than less intense colors, even when
the total amount of light reflected or emitted may be the same, which makes
intensity good for establishing visual hierarchies.  Because of the human visual
system, the value of the most intense color of different hues varies, with yellow
having the highest value for its maximum intensity.  This change is accounted for
in the Munsell color system; the Tektronix color cone assumes that maximum
intensity for any color occurs at the midpoint of the value scale.

Figure 1.8  The Munsell color system; for example, an intense yellow could be
specified as 5Y7/14.
[Diagram labels: Value; White; Black; Hue; Intensity]

      For cartographic use in displaying ordered data, color hues are generally
arranged to allow ease of interpretation.  There are many possible color schemes,
many having been suggested for terrain shading.  For other data,  the spectral
ordering of hues may be the most obvious for use in mapping ordered data, but
because colors  can  shift in intensity and value, this may  lead to misleading
maps—yellow will stand out in the spectral pattern.  Two suggestions are of note
for remedying this problem. The first is the use of a part-spectral scheme: yellow
through orange to red; or, yellow through green to blue.  This can allow
redundancy in the visual variables of intensity and value, which reinforces the hue
progression. The second is the use of hues ordered on the basis of value. This may
be a viable alternative, but could prove confusing to those who know the spectral
sequence of hues.
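
      A part-spectral scheme with redundant value can be generated rather than
picked by eye. The following is a minimal sketch, assuming Python's colorsys
hue-lightness-saturation model and hypothetical start and end lightness
values (0.80 and 0.40); the resulting colors would still need testing on the
intended output device.

```python
import colorsys

def part_spectral_ramp(n_classes, hue_start=60.0, hue_end=0.0,
                       light_start=0.80, light_end=0.40):
    """Return n_classes RGB triples running from hue_start to hue_end
    (degrees) while lightness falls, so value reinforces the hue order."""
    ramp = []
    for i in range(n_classes):
        t = i / (n_classes - 1)
        hue = hue_start + t * (hue_end - hue_start)
        light = light_start + t * (light_end - light_start)
        r, g, b = colorsys.hls_to_rgb(hue / 360.0, light, 1.0)
        ramp.append((round(r * 255), round(g * 255), round(b * 255)))
    return ramp

ramp = part_spectral_ramp(5)   # light yellow through orange to dark red
for rgb in ramp:
    print(rgb)
```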
      The ability to discriminate colors has received some study, particularly in
the discrimination of values.  The Munsell color system divides the range from
black to white  into eleven steps, as  does the system  for  black and white
photography developed by Ansel Adams (Upton and Upton 1989,313). For most
cartographic use, eleven steps would be both difficult to  create and difficult to
interpret; most cartographic research indicates that the use of half that range (five
or six) is better for map design. If value gives an indication of the amount of
discriminability that can be expected for intensities, four intensity levels are
probably the most that should be used. Discrimination of hues for cartographic
use is affected by the apparent changes in intensity and value that occur as hue is
changed,  but  for univariate  symbolization,  five  steps  should be easily
discriminable in one of the part-spectral sequences.
      Colors are produced by two methods: color addition and color subtraction.
The color addition process is most commonly seen in color monitors for computer
displays and television. It involves the mixture of red, green and blue to create
colors on a black surface with all three being used for white. The color subtractive
process is used for the production of printed material, such as paper maps.  It
involves the use of cyan, yellow and magenta (and black) to subtract colors that
would reflect from the surface of white paper. Because the overprinting  of cyan,
yellow,  and magenta generally leaves muddy brown, black is often used as a
fourth color in the printing process. Colors are obtained by overprinting  the four
separates, with each offset slightly in order to allow the apparent mixing of colors
in a dither pattern.
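
      The arithmetic relationship between the additive and subtractive
processes can be written down in a few lines. The conversion below is the
common textbook approximation, offered as an assumption rather than as the
method of any particular printer or of ARC/INFO; real devices require
calibrated, pretested colors.

```python
def rgb_to_cmyk(r, g, b):
    """Naive conversion of additive RGB (0-1) to subtractive CMYK (0-1).
    It shows only the arithmetic relationship between the two processes."""
    c, m, y = 1.0 - r, 1.0 - g, 1.0 - b
    k = min(c, m, y)            # shared darkness is printed with black ink
    if k == 1.0:                # pure black: no colored ink is needed
        return 0.0, 0.0, 0.0, 1.0
    scale = 1.0 - k
    return (c - k) / scale, (m - k) / scale, (y - k) / scale, k

print(rgb_to_cmyk(1.0, 0.0, 0.0))   # red:   (0.0, 1.0, 1.0, 0.0)
print(rgb_to_cmyk(1.0, 1.0, 1.0))   # white: (0.0, 0.0, 0.0, 0.0)
print(rgb_to_cmyk(0.0, 0.0, 0.0))   # black: (0.0, 0.0, 0.0, 1.0)
```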
      The production  processes must  be kept in  mind  when choosing color
schemes on a color monitor that will be printed on a paper output device. Colors
on a computer display may not appear the same on a printed sheet, because of the
change in creation method. In addition, on a computer monitor, a point can take
on any of the possible colors generatable by the graphics system, and resolution is
independent of color. On paper output, the dither patterns that are used to create
the appearance of many colors may cause a change in apparent color and, more
significantly, output resolution.  ARC/INFO allows the use of color lookup tables
in commands such as  HPGL2 (ESRI ARC Command References, 1991) to enable
redefining screen colors to pretested printer colors to help compensate for changes
in color; changes in resolution must be accounted for by the cartographer.
      A final consideration for color (particularly hue) is its social interpretation.
Colors can have meanings that must be taken into consideration when designing
maps and  graphics, for example: red as stop; yellow as caution; green as  go. For
environmental maps, use of red (particularly, intense red) can indicate imminent
danger, and with  dark shades of other colors  (blues, grays and browns, for
example) can evoke a sense of foreboding or futility (this is demonstrated in The
Nuclear  War Atlas and  movies  such as Blade Runner).  Other color schemes can
evoke other reactions: pastels (colors of low intensity and high value) can indicate
serenity (unclassified maps by the Central Intelligence Agency often make use of
pastels); intense colors such as yellow and cyan grab attention, and can indicate
happiness; they are often used in maps for children.
      The final three variables that Bertin identified are orientation, shape and
texture. Orientation is readily discriminated by the human visual system and is
therefore good for indication of nominal data, although by using common symbols
such  as  clock  faces, ordinal and  interval/ratio data can be  displayed  with
orientation. Shape should be used for nominal data categories, because shapes in
general do not have an apparent order and are not as readily distinguished as
other visual variables. Texture is the relative coarseness of an area fill, and because
rough textures appear closer than fine textures, texture is  good for establishing
visual hierarchies.  Texture should generally be used for nominal data, but it can
also be used for ordinal and interval/ratio data.
      Unlike  orientation  and texture,  shape has  a continuum of possible
representations: from mimetic to abstract (see Classifying Space on page 24).
Mimetic  symbols convey the appearance of what they represent (an animal's
outline represents sightings of that animal). Abstract symbols must be defined (a
legend indicates that triangles represent sightings). ARC/INFO provides many
abstract symbols, a few that are less abstract (for example, an anchor that might be
used to represent a marina), and the ability to define new symbols (see Map Display
and Query). Selection of appropriate levels of abstractness must be considered in
designing a display—use of mimetic symbols may allow less dependence on the
legend, but too many mimetic symbols may give the appearance of a map oriented
toward children.
      Two additional visual variables that have been identified since Bertin
proposed his list are arrangement and focus. Arrangement is the order of symbols
in an area fill (from regular to random to clustered), and nominal and ordinal data
can be represented by changes in arrangement. Focus is the last visual variable; it
is the crispness of the edges of symbols. Because this has an apparent order, focus
can be used to display ordinal data and could be used for interval/ratio data,
although too much variability of focus may make a map hard to read.
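
      The pairings of visual variables with levels of measurement discussed in
this section can be collected into a small lookup table. The table below is a
simplification made for illustration: for hue and orientation it records only
the recommended (nominal) use, and it omits the qualifications given in the
text. The names VISUAL_VARIABLES and candidates are hypothetical.

```python
VISUAL_VARIABLES = {
    # variable:    levels of measurement the text associates with it
    "location":    {"nominal", "ordinal", "interval/ratio"},
    "size":        {"ordinal", "interval/ratio"},
    "value":       {"ordinal", "interval/ratio"},
    "hue":         {"nominal"},
    "intensity":   {"ordinal"},
    "orientation": {"nominal"},
    "shape":       {"nominal"},
    "texture":     {"nominal", "ordinal", "interval/ratio"},
    "arrangement": {"nominal", "ordinal"},
    "focus":       {"ordinal", "interval/ratio"},
}

def candidates(level):
    """List the visual variables recorded as suitable for a data level."""
    return sorted(v for v, levels in VISUAL_VARIABLES.items() if level in levels)

print(candidates("nominal"))
print(candidates("ordinal"))
```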

Methods for Visually Communicating Data and Meta-Data
      The  communication  of uncertainty has been  studied by Morgan and
Henrion  (1990)  with   Harold  Ibrekk,   although  their  concern   was   the
communication of the uncertainty of linear data. Because of this, the methods they
present cannot be readily used in the communication of mapped geographic data,
except for point symbolization (even this is difficult in ARC/INFO), but their
study highlights some of the difficulties that can be encountered in the visual
communication of uncertainty and their conclusions are useful.
      Their study analyzed nine methods for displaying  uncertainty in linear
data: a point estimate with an error bar; a discrete density function; a pie chart; a
probability density function; a half-height probability density function mirrored
on the x-axis; a dot density horizontal bar; a vertical line density horizontal bar; a
modified Tukey box plot (minimum and maximum points are not included, and a
mean point is included); and a cumulative density function (see Figure 1.9 on page
18).  Their  recommendations include  using displays that specifically show the
information that is to be extracted (such as a point for a mean value, if mean values
are important), and using multiple displays (particularly, display the  cumulative
density function and  the probability density function one over the other, with a

-------
18
Point Estimate with Error Bar Mirrored Probability Density
I • 1
0 2 4 6 8 1012141618

Discrete Density Function
0.20-i
0.16-
0.12-
0.08-
0.04-
0

-------
19

common horizontal axis).
For two- and multidimensional uncertainty in non-spatial displays, Morgan
and Henrion give several examples of methods that can be used for graphing (see
Figure 1.10 on page 20). These include: multiple lines in a probability density/
cumulative density pair; multiple Tukey box plots; linear graphs with error bars;
orthogonal displays; and triangle plots alone and in multiples.
They conclude with a list of factors that should go into the design of
uncertainty displays (Morgan and Henrion 1990, 252):
      - finding a clear, uncluttered graphic style and an easily understood format,
      - making decisions about what information to display,
      - making decisions about what information to treat in a deterministic form
        and what to treat in a probabilistic form,
      - making decisions about what kind of parametric sensitivities will provide
        key insights.
They also suggest that display design often involves the reduction of a multi-
dimensional model into the two dimensions of a paper or monitor display (as does
Tufte 1991), and that the intended audience's experience in interpreting graphs
must be considered when creating a display.
For the representation of uncertainty for spatial variables, there are two
possible cartographic routes. The first of these is the creation of two maps, one for
displaying the data, and one for displaying the meta-data. Olson (1981) and
Laurence Carstensen (1986) have tested this choropleth map arrangement against
two bivariate mapping techniques (Olson: spectral encoding; Carstensen:
intersecting lines) in the representation of statistical correlation.
Olson finds that map readers can initially interpret value-shaded,
monovariate map pairs more readily than bivariate maps. On the other hand, she
reports that over half of those who could interpret bivariate maps at a level
significantly better than guessing did better with a bivariate map than with two
separate maps. She then suggests that the bivariate maps may be more readily
interpreted once the "cognitive hurdle" (Olson 1981, 269) of the bivariate mapping
technique is overcome.
Carstensen also finds that map readers find interpreting value shaded,
monovariate map pairs easier than bivariate maps. Because his test compared a
classed map pair and unclassed bivariate maps, the problem he notes with the use
of map pairs (poorer statistical residual scores) would not be as likely to occur if
map pairs are compared with classed bivariate maps.
The map pair technique should be a useful tool for communicating two
variables when the data must be classed (as is done for pragmatic reasons in
ARC/INFO). Since meta-data values may not be correlated with data values, two
monovariate maps should communicate uncertainty more effectively than
techniques that were designed to highlight spatial correlation.
The second cartographic route for the visual representation of uncertainty
is bivariate mapping (the mapping of two variables onto the same map). This
would ensure that meta-data is presented to the map reader with the data, but this
mapping method generally increases the difficulty with which data can be
extracted from a map. Because of the variety of visual variables, there are several
possible methods of bivariate mapping, some of which have been tested for
communication effectiveness; most of these have been oriented toward the
representation of spatial correlation. This includes the testing of the intersecting
lines method (which uses texture), color maps made by the U. S. Census Bureau
(which use a full hue range to achieve bivariate representations, known as spectral
encoding), color maps that rely on two complementary hues, and the
equiprobability ellipse (which also uses complementary hues).

[Figure 1.10: panels include Probability/Cumulative Density, Orthogonal Display
(axis: X1, first uncertain variable), and Multiple Tukey Box Plots.]

In a technique that allows for bivariate mapping in monochrome displays,
Carstensen (1982) has tested the communication effectiveness of the intersecting
lines method of bivariate mapping (see Figure 5.4b on page 73). These maps use
horizontal and vertical lines for representing two variables. Carstensen suggests
that this scheme be used in an unclassed map, but this scheme can be used for
classed representations of data and meta-data. Although this technique can be
used in monochrome displays, colored lines can be used to distinguish the two
variables (this has not been tested for communication effectiveness, though).
This technique has one disadvantage: the method of producing the area
symbolization may lead to a conflict between two visual variables. Both texture
and value change with the mixing of the lines; both establish visual hierarchies
with coarse textures and dark values standing out in the display. This is a problem
because in the representation scheme these two points (coarseness and darkness)
are at opposite ends of the data ranges. This can cause value to be used for
identifying relationships even though squareness in the texture is the intended
visual variable (Carstensen 1986).
The U. S. Census products, originally published for 1970 census data, that
show two variables were studied for communication effectiveness by Olson (1981)
(see Figure 5.5c and d on page 75). These maps use two hue patterns for
representing two variables, with the part-spectral ranges of yellow to blue and
yellow to red being used on the x and y axes, respectively. Olson reports that some
of these maps (such as education and income) convey information well, especially
for homogeneous regions. She also indicates that these maps are thought to be
more authoritative and more innovative than two separate maps showing the
same information. In concluding, she suggests that prominent and clear legends
are necessary for accurate interpretation of bivariate maps; that both the
monovariate map pair and the bivariate map should be shown, with the monovariate
map pair in a monochrome format; and that explanatory notes should be included on
the types of information presented. These guidelines are in keeping with both Morgan
and Henrion, and Tufte, and are feasible with ARC/INFO's ability to rapidly
generate small, monovariate maps.
As a response to the problems of interpreting spectrally encoded bivariate
maps, Steiner (1979, in Eyton 1984) has proposed a complementary color scheme
(see Figure 5.6 on page 77). This color system makes use of the mixing of
complementary colors (such as red and cyan) to produce a central gray region that
highlights the diagonal that represents correlation in a bivariate map. This can be
done for unclassed or classed representations of data, and by flipping the order of
one tint, negative correlations can be shown as well as positive correlations.
Building on Steiner's proposal, J. Ronald Eyton (1984) developed the
complementary color bivariate map with an equiprobability ellipse (see Figure 5.7a
and b on page 79). These maps use a modified, 2x2 complementary color range,
with an additional class that occurs in the middle of the matrix and represents the
central cluster of data. When the two variables to be mapped are plotted on a scatter
diagram, linearly correlated data will have an ellipse-shaped central cluster. By
selecting a percentage of the total number of observations to be included in the
central ellipse, a category for the central cluster can be created with the formula
(Eyton 1984, 488):

\[
\chi^2 = \frac{1}{1 - r^2}\left[\frac{(X - X_m)^2}{\sigma_x^2} + \frac{(Y - Y_m)^2}{\sigma_y^2} - \frac{2r\,(X - X_m)(Y - Y_m)}{\sigma_x \sigma_y}\right]
\]

where:
$\chi^2$ = chi-square (1.386 for 50%)
$X, Y$ = the X and Y observations
$X_m, Y_m$ = the means of X and Y
$\sigma_x^2, \sigma_y^2$ = the variances of X and Y
$r$ = coefficient of correlation (Pearson's r)

When this ellipse is displayed in gray, surrounded by the four corners of the data
set in white, black, red and cyan, a bivariate map that portrays the central data
cluster explicitly (without having a staircase effect) can be created.
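
The ellipse test can be illustrated with a short Python sketch. This is not the eyton.c program distributed with this guideline; it is a minimal illustration that assumes population (divide by n) variances and assumes that observations outside the ellipse are split into four corner classes at the means of X and Y. The function names are hypothetical.

```python
import math

CHI_SQUARE_50 = 1.386  # cutoff the text gives for 50% of the observations


def summary(xs, ys):
    """Return (mean_x, mean_y, sd_x, sd_y, r) using population formulas (assumption)."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sd_x = math.sqrt(sum((x - mx) ** 2 for x in xs) / n)
    sd_y = math.sqrt(sum((y - my) ** 2 for y in ys) / n)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / n
    return mx, my, sd_x, sd_y, cov / (sd_x * sd_y)


def ellipse_chi_square(x, y, mx, my, sd_x, sd_y, r):
    """Chi-square value of one observation for the equiprobability ellipse."""
    zx = (x - mx) / sd_x
    zy = (y - my) / sd_y
    return (zx * zx + zy * zy - 2.0 * r * zx * zy) / (1.0 - r * r)


def classify(xs, ys, cutoff=CHI_SQUARE_50):
    """Label each observation 'center' or one of four corner classes."""
    mx, my, sd_x, sd_y, r = summary(xs, ys)
    labels = []
    for x, y in zip(xs, ys):
        if ellipse_chi_square(x, y, mx, my, sd_x, sd_y, r) <= cutoff:
            labels.append("center")
        else:
            # Splitting the corners at the means is an assumption of this sketch.
            x_part = "high" if x >= mx else "low"
            y_part = "high" if y >= my else "low"
            labels.append(x_part + "-" + y_part)
    return labels
```

The sketch fails with a division by zero when the two variables are perfectly correlated (r = 1 or r = -1) or when either variable has no variance; a production routine would need to handle those cases.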
There are other possible methods of bivariate mapping that have been
postulated as effective communicators of (un)certainty. Two of these, which have
not been tested, are color intensity and focus. Neither of these methods places
emphasis on a correlation of the data and meta-data variables; rather, each uses
visual variables to highlight portions of the data set, as determined by the
meta-data information.
Use of color intensity may prove useful for the cartographic representation
of uncertainty, because it allows the highlighting of certain (or uncertain) areas by
specifying intense shades for those areas, and less intense for others (see Figure
5.5b on page 75). Because intense colors stand out in an image, this technique
would allow the creation of a distinct visual hierarchy that emphasizes certain (or
uncertain) values. Generally, for maps that will be used for data communication
(as in the DiBiase model), intense colors should represent certain areas and less
intense (that is, more gray) colors should represent uncertain areas.
Focus is potentially a useful means of representing uncertainty.
MacEachren (1992) presents several possible variations on focus: edge crispness
(for external boundaries of points, lines and areas), fill clarity (for internal
boundaries within point, line or area symbols), fog (by imposition of an
interposing, translucent layer over another symbol), and resolution (point
thinning for vector databases or aggregation into larger area units for raster
databases). All of these involve the blending of a symbol with the surrounding
parts of the image, thereby eliminating clearly defined regions. In ARC/INFO,
edge crispness can be accomplished by buffering regions and careful assignment
of color in the buffer areas; resolution can be accomplished by thinning points
manually or with the ARCEDIT command GENERALIZE.

Another method of bivariate mapping is the use of a fishnet, orthogonal,
view to display one data set, with another data set used to color the display (see
Figure 9.2 on page 101). This is commonly used as a technique for displaying
terrain elevation (with land use/land cover draped over the net by specifying the
net's color with the land use symbolization scheme), although there is no
restriction on its use for other types of data. When a data layer represents
information that is known to be continuous and smoothly changing (such as
elevation, air temperature, or air pressure), this type of representation is
appropriate. If the uncertainty of a spatial data layer can be shown or assumed to
be continuous and smoothly changing, a fishnet representation of that statistical
surface (with data values used to specify the color of the net) should be an effective
method of conveying meta-data.
Finally, another representation of continuous and smoothly changing data
that could be used to display data and meta-data is the use of isolines (see Figure
9.1a on page 99). With this technique data and/or meta-data can be represented,
with the use of another display technique if only one is to be displayed with
isolines. Like fishnets, isolines are often used for displaying terrain information (as
in topographic maps), but isolines can also be used to show data and meta-data.
This could be accomplished by using different hues, values, intensities, sizes, or
textures to indicate which lines represent data and which represent meta-data.

Chapter Two
Producing Displays

As Monmonier and Johnson indicate, the process of designing and creating
maps and graphics is a multistep process. There are many things that must be
considered in the design process, including the classification of both space and
data, and issues such as page layout. This chapter addresses these issues in the
context of ARC/INFO and leads into the methods of representing data that are
presented in later chapters.

Classifying Space
A fundamental process in the classification of space is the abstraction of
data. MacEachren and Ganter (1990) have described this as a shifting along a
continuum ranging from images to graphics. At one end of this continuum is
information in its rawest form: for spatial data, aerial photographs and other
remote sensing products (see Figure 2.1, left). Further along the continuum, maps
provide an abstraction of images (Figure 2.1, middle). Some of the information
that is present in an image is dropped in order to allow symbolization of
information that may not be directly perceivable in the image. For example,
a dark line through a green area can be replaced with symbols for a road and a
forested area; the road can then be given a label, as well as an indication of the
number of lanes and access. At the other end of the continuum, graphics allow the
use of position in the display to symbolize any variable (Figure 2.1, right). For
example, a distance decay graph uses the X axis to indicate distance from a point
in any direction, and the point may not be tied to any one geographic location.
A similar continuum is the range of symbols that can be used in maps and
graphics (see Figure 2.2 on page 25). At the end of the continuum most similar to
images are mimetic symbols. At the end of the continuum most similar to graphics
are abstract symbols. Except for a few symbols that are toward the mimetic end of
this continuum, most of the symbols that are provided with ARC/INFO are
abstract; these tend not to have commonly accepted meanings, and thus can be
defined by the cartographer as needed.

Figure 2.1 The Image to Graphic continuum. Images approximate what we see,
graphics provide abstract representations of, possibly, invisible relations
(MacEachren and Ganter, 1990). [Axis label: Distance from site.]

The degree of abstraction that should be used is dependent on several factors. Most
importantly, the data that needs to be mapped must be effectively shown, with a
degree of abstraction that is appropriate to the data. For example, if a distance
decay model is developed from a set of sample sites, showing the values measured
at individual sites may not convey the distance decay as clearly as a graph. If the
site locations are important, a map can be generated with the data and an inset
showing the graph can be made using ARCPLOT's graphing tools. As mentioned in
Chapter One, the audience must also be considered when determining
symbolization abstractness. Highly mimetic symbols can convey a sense of
simplicity, which may need to be avoided in order to project an image of authority.
A third consideration is the means of display; mimetic symbols can be used to
minimize the need for a legend. This can be helpful for slides and overhead
displays.

Figure 2.2 The Image to Graphic continuum applied to symbolization. Symbols can
vary from mimetic to abstract. All of these symbols could be used to represent a
marina; the audience must be considered in selecting which to use.

Projections
There are several classes of projections, each with strong points. Equal area
projections show the same amount of space on the earth's surface for any given
area of the map. Equidistant projections show true distance from a point or along
given lines. Conformal projections have a constant scale in every direction from
any given point, and because of this latitude and longitude lines meet at right
angles. Some projections, such as Robinson's, are not mathematical
transformations, but rather, tabular. These projections have been designed for
specific purposes (Arthur Robinson developed his projection for world maps that
are more visually appealing than others, like the Mercator projection). Other
projections include the Mercator and gnomonic projections; the Mercator
projection shows lines of constant compass direction as straight, and the gnomonic
projection shows all great circles as straight lines. These two projections are best
used for navigation and not for general reference or environmental maps.

Figure 2.3 EPA Region Six shown in Albers conic equal area projection.

For environmental maps, in general, equal area projections should be used. This
ensures that symbols, especially those based on size, are not distorted on the basis
of the base cartographic data (such as county or state outlines). For the 48
contiguous states, the Albers conic equal area projection should be used; Albers
conic should also be used for smaller regions that are east-west oriented, such as
EPA Region Six (see Figure 2.3). The sinusoidal equal area projection should be
used for regions that are north-south in extent, such as EPA Region One (see
Figure 2.4). Lambert's azimuthal equal area should be used for areas that have the
same extent in all directions from a center point (see Figure 2.5).
Because the distortions of any given projection are scale dependent, the use of any
specific projection becomes less critical as the map's scale increases (and thus the
area shown becomes smaller). For example, a map of a hazardous waste site may
need the Universal Transverse Mercator grid for locations within the site; at this
large a scale (1:50,000 or larger) use of the UTM projection is more appropriate
than an equal-area projection, because there will be little, if any, perceivable
distortion of areas regardless of the projection used, as long as the projection is
centered on the site.
The ARC command PROJECT allows
transformation of coverages between
projections, as well as realignment of
projections. If data is obtained from a national
database, the projection (if not included with
the database) may be:

Project: projection albers
Project: units meters
Project: parameters
1st standard parallel: 29 30 0
2nd standard parallel: 45 30 0
central meridian: -96 0 0
latitude of projections origin: 23 0 0
false easting (meters): 0
false northing (meters): 0

Because this is designed for the contiguous 48 states, any subset of this data should
be reprojected for the subset, even if the output projection is Albers. In particular,
the standard parallels should be chosen such that the new parallels divide the
region into three equal east-west bands, and the central meridian should be selected
so that it bisects the region. This minimizes distortion away from the parallels and
prevents an appearance of the whole map leaning in one direction, which results
from an off-center central meridian.

Figure 2.4 EPA Region One shown in the sinusoidal equal area projection.
Figure 2.5 EPA Region Four shown in Lambert's azimuthal equal area projection.
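
The rule for choosing the parallels and the central meridian can be written out as a small Python sketch. The function name and the bounding box used in the usage note are hypothetical; the sketch assumes the region is described by its extreme latitudes and longitudes in decimal degrees (west longitudes negative) and does not cross the 180 degree meridian. It does not call PROJECT; the values it returns would still be entered at the PROJECT prompts.

```python
def albers_parameters(south, north, west, east):
    """Suggest standard parallels and a central meridian for a region.

    The two parallels split the latitude range into three equal east-west
    bands, and the central meridian bisects the longitude range.
    """
    band = (north - south) / 3.0  # height of each of the three bands
    return {
        "standard_parallel_1": south + band,
        "standard_parallel_2": north - band,
        "central_meridian": (west + east) / 2.0,
    }
```

For a hypothetical region extending from 26 to 38 degrees north and from 110 to 88 degrees west, the sketch suggests standard parallels of 30 and 34 degrees and a central meridian of -99 degrees.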

Scale and Generalization
As noted above, selection of scale is critical in the selection of an appropriate
projection. Scale is also important in the selection of total area to be mapped, and
the relation of the area to the data to be mapped. By using a smaller scale, and thus
showing more area, the apparent seriousness of a problem will be reduced. This is
because the apparent extent of a problem is reduced by showing more of the
surrounding area. Zooming in on a site has the opposite effect: the appearance of
symbolization over a large part of the display conveys the idea that the problem is
everywhere. Use of zoomed in areas, which allow presentation of detail, with a
locator map can combine these two extremes (see Figure 2.6 on page 27). The
locator map allows a wide perspective, which, when combined with the large-scale,
detailed map, conveys both a micro and a macro reading, as Tufte (1990)
suggests is a key component of good presentation graphics.

Figure 2.6 Insets allow focusing on detail and give a 'big picture.' When
possible, place them in otherwise unused space.

Space and Time
With the increased processing speed of single user workstations, and the
increased flexibility of ARC/INFO, animation of data—both for analysis and
presentation—is becoming feasible. Animation has several possible approaches.
Change in time can be used to depict existence, attributes, or change in existence
or attributes. Change can also be broken into: looking at different parts of a data
set, one after the other; looking at a data set that shows variation in time, in time
sequence; or use of progression in time to show another data variable (this is
comparable to use of an axis of a graph to indicate change in time rather than
change in space).
There are two primary methods of creating animations in ARC/INFO, and
both require AML programming. The most flexible method is the use of AML
driven interactive map composition. This allows the display of an object (for
example a point location, such as the population center of the United States) that
changes over time. The object can be drawn at one location, erased with the
MDELETE command, and redrawn at a new location. The other method of
animation is the use of GRID to display changes in area data. The cliche 'Raster is
Faster' is still true enough to make a difference. Separate grid layers can be
generated, each of which shows the change in areal extent of a phenomenon. Each
layer can then be drawn with successive calls to a display routine.

Classifying Data
Data to be mapped for presentation should generally be classified (into five
or six groups) in order to aid ease of interpretation. When drawing features in
ARCPLOT, the commands: ARCLINES, LABELMARKERS, POINTMARKERS,
and POLYGONSHADES, as well as AML driven RESELECT's, allow the use of
lookup tables to define a data to symbol relationship for feature display (the
CLASS command also allows grouping of data). These tables allow user defined
ranges for cartographic products, rather than the default of directly relating the
data item identifications with the symbol set identifications. These ranges must be
determined by the cartographer and range types include: quantiles, equal
intervals, geometric progressions, mean and standard deviation intervals, and
natural breaks (the Jenks' Optimal classification) (see Figure 2.7). Certain data sets
may need to be classified on the basis of predetermined breaks (such as maximum
allowable concentration of a pollutant). This can be accomplished by manually
specifying all breakpoints, or by use of another classification scheme, with the
externally defined break added in. For example: run the Jenks' optimal AML for
five classes, note the breakpoints, then run the manual specification AML on a new
lookup table and specify the Jenks' breakpoints plus the predefined point.
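
The last step of that example, combining computed breakpoints with an externally defined one, amounts to merging and sorting two sets of numbers. A minimal Python sketch follows; the function name and the numbers in the usage note are hypothetical, and the AMLs themselves are not invoked.

```python
def add_required_break(breaks, required):
    """Merge a predetermined breakpoint into a list of computed breakpoints.

    Duplicates are dropped, so a required break that already matches a
    computed break does not create an empty class.
    """
    return sorted(set(breaks) | {required})
```

For example, if the optimal classification returned breakpoints of 12.0, 31.5, 48.0 and 77.0 and the maximum allowable concentration were 50.0, the merged list would be 12.0, 31.5, 48.0, 50.0, 77.0, which yields one more class than the optimal classification alone. A required break that falls very close to a computed one (as 48.0 and 50.0 do here) produces a narrow class, and the cartographer may prefer to drop the computed neighbor.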

Manual Classification
For both the CLASS command and lookup tables, break points can be
chosen manually, by exporting the data to a statistical package, or with the
assistance of the ARCPLOT command STATISTICS. The command syntax is:

STATISTICS
- contains the items to be classed;
- includes POINTS, ARCS and POLYS.

Figure 2.7 Breakpoints set by different classification schemes (panels: Jenks',
Equal Interval, Quantile, Mean and Standard Deviation, Exponential). Equal Interval
is close to the statistically optimal (Jenks'), for this example. Quantile separates
relatively similar values at the extremes of the distribution. Mean and Standard
Deviations fail because the distribution is bimodal, not normal; Exponential fails
by grouping all of the items greater than 256 into one category.

STATISTICS has several subcommands:

SUM
MEAN
MINIMUM
MAXIMUM
STANDARDDEVIATION
is a field in the attribute table of .
END - indicates all subcommands have been entered.

These values can be used to calculate mean and standard deviation break points,
as well as quantiles and geometric progressions.
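
As an illustration, two of these calculations can be sketched in Python from the values that STATISTICS reports. The function names are hypothetical, and the conventions are assumptions of the sketch: breaks are placed at whole standard deviations from the mean, and the geometric progression uses a constant ratio between the minimum and maximum (which requires a minimum greater than zero).

```python
def mean_sd_breaks(mean, sd, minimum, maximum, max_steps=3):
    """Breaks at whole standard deviations from the mean, kept inside the data range."""
    candidates = [mean + k * sd for k in range(-max_steps, max_steps + 1)]
    return [b for b in candidates if minimum < b < maximum]


def geometric_breaks(minimum, maximum, n_classes):
    """Interior breaks of a geometric progression from minimum to maximum."""
    if minimum <= 0 or maximum <= minimum:
        raise ValueError("geometric progressions need 0 < minimum < maximum")
    ratio = (maximum / minimum) ** (1.0 / n_classes)
    return [minimum * ratio ** i for i in range(1, n_classes)]
```

With a mean of 10, a standard deviation of 2, and a data range of 5 to 13, the first function returns breaks at 6, 8, 10 and 12. With a range of 1 to 16 and four classes, the second returns breaks at approximately 2, 4 and 8.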
There are several methods of defining classes within ARC/INFO, as well as
several methods of determining class breaks. As MacEachren (1992) notes,
interval and quantile classification systems, which are automated parts of the
CLASS command, generally are not the best methods for grouping data and
therefore will not be discussed here. The syntax for manually specifying intervals
with the ARCPLOT command CLASS is:

CLASS MANUAL <#classes>
<#classes> - the number of classes that will be generated;
- the numeric class breaks. There must be <#classes> -1 values.

CLASS NONE - turns off the classification scheme.

Until the CLASS NONE command is given the classification remains in effect, and
will cause all subsequent uses of ARCLINES, LABELMARKERS, POINTMARKERS,
and POLYGONSHADES to be classified.
Another method of classifying data is to use lookup tables. These are INFO
files that perform a function similar to that of the CLASS MANUAL command. A
manual procedure from ARC for the creation of a lookup table is:

1) Use PULLITEM to extract the attribute table item that holds the data values;
2) Use ADDITEM to add a numeric field called SYMBOL;
3) Enter INFO, SELECT the table and PURGE the old data;
4) ADD records to the lookup table;
a) Specify the breakpoint value for numeric data, or the alphanumeric for text data;
b) Specify the symbol number;
5) SORT on the data field (not the SYMBOL field).
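
The role of the sorted lookup table can be pictured with a short Python sketch. It is only a model of the idea, not of INFO or ARCPLOT internals: it assumes that each breakpoint acts as the inclusive upper bound of its class and that values above the last breakpoint fall in the last class. Both assumptions, and the function names, belong to the sketch and should be checked against the ARCPLOT documentation before being relied on.

```python
import bisect


def build_lookup(rows):
    """Sort (breakpoint, symbol) rows on the breakpoint, as in step 5."""
    return sorted(rows, key=lambda row: row[0])


def symbol_for(value, table):
    """Return the symbol of the first class whose breakpoint is >= value."""
    keys = [breakpoint for breakpoint, _ in table]
    index = bisect.bisect_left(keys, value)
    if index == len(keys):  # above the last breakpoint (assumed to join the last class)
        index = len(keys) - 1
    return table[index][1]
```

With rows of (30, 3), (10, 1) and (20, 2), the sorted table assigns symbol 1 to values up to 10, symbol 2 to values above 10 and up to 20, and symbol 3 to everything higher. The sort matters because the search assumes ascending breakpoints, which is why the procedure sorts on the data field and not on SYMBOL.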

It is possible to automate the creation and specification of lookup tables
within ARCPLOT. SETMAN.aml allows the manual update of symbol values
within a lookup table, as well as creation of a new table. The AML handles both
nominal and numeric data. The syntax is:

SETMAN
will be created if it does not exist, existing tables will be selected for update.

Natural Breakpoints
Because the Jenks' Optimal classification scheme is generally considered to
provide the best classification of numeric data (MacEachren 1992), it should be
included in ARC/INFO (and with any mapping program). Unfortunately it is not,
but ARC/INFO does provide a macro language, commands to export data, and
a method for calling system programs. These can be combined to allow the
automated generation of the breakpoints for Jenks' optimal classification. The
AML, JENKS.aml, exports the data to be classified, calls the C program, jenks
(which must be in the executable search path), that calculates the breakpoints, and
constructs a lookup table. The cartographer must still specify symbol matches,
though. Please note that the routine generates a lookup table that contains more
information than is required for ARCPLOT's use (although ARCPLOT can still use
it); this additional information is included for use by the AMLs presented in later
chapters. The additional information is: an initial line that is one smaller than the
smallest data value (this is included only in non-nominal lookup tables, including
the output from Jenks'); a column that contains the number of values in the
associated coverage in the first record, and the number of values in each category
in the following records; and a column that contains the coverage name in the first
record, an 'n' (for nominal data), an 'o' (for ordinal data) or an 'r' (for interval/ratio
data) in the second record, the coverage type in the third record (point, arc, etc.),
and for route and section lookup tables, the route name in the fourth record. This
additional information allows these AMLs to select the coverage the table is based
on and, for legends, provide category totals and appropriate symbolization.
The C program jenks.c can be compiled on a workstation with an ANSI C
compiler (including Data General's DG/UX 5.4) by using the command line:

cc -ansi -o jenks jenks.c -lm

The command syntax for using JENKS.aml is:

JENKS
- the coverage that the lookup table will be created for;
- the feature type (point, line, poly, etc.) of <cover>;
- an interval or ratio level data field in the attribute table of <cover>;
- the name of the lookup table to be generated;
- the number of data classes in the output lookup table.
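
For readers who want to see what an optimal classification computes, the following Python sketch finds the breakpoints that minimize the total within-class sum of squared deviations, using a straightforward dynamic program. It is not the jenks.c program distributed with this guideline and makes no claim to match its output format; the function name is hypothetical, and the sketch returns the largest value of each class.

```python
def jenks_breaks(values, n_classes):
    """Return the upper bound of each class for an optimal classification."""
    data = sorted(values)
    n = len(data)
    if not 1 <= n_classes <= n:
        raise ValueError("n_classes must be between 1 and the number of values")

    # Prefix sums let the squared deviation of any run be found in constant time.
    prefix = [0.0]
    prefix_sq = [0.0]
    for v in data:
        prefix.append(prefix[-1] + v)
        prefix_sq.append(prefix_sq[-1] + v * v)

    def ssd(i, j):
        """Sum of squared deviations from the mean for data[i:j]."""
        total = prefix[j] - prefix[i]
        return (prefix_sq[j] - prefix_sq[i]) - total * total / (j - i)

    inf = float("inf")
    # cost[k][j]: smallest total deviation when data[:j] is split into k classes.
    cost = [[inf] * (n + 1) for _ in range(n_classes + 1)]
    split = [[0] * (n + 1) for _ in range(n_classes + 1)]
    cost[0][0] = 0.0
    for k in range(1, n_classes + 1):
        for j in range(k, n + 1):
            for i in range(k - 1, j):
                candidate = cost[k - 1][i] + ssd(i, j)
                if candidate < cost[k][j]:
                    cost[k][j] = candidate
                    split[k][j] = i

    # Walk the split table backwards to recover the class upper bounds.
    bounds = []
    j = n
    for k in range(n_classes, 0, -1):
        bounds.append(data[j - 1])
        j = split[k][j]
    return bounds[::-1]
```

The running time grows with the number of classes times the square of the number of values, so the sketch suits a few thousand values at most; this is one reason a compiled program is a sensible choice for large coverages.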

Eyton's Equiprobability Ellipse Bivariate Classification
The uniqueness of Eyton's ellipse as a bivariate mapping scheme requires
that a column be added to the attribute table of the data to be mapped. As with the
calculation of Jenks' optimal classifications, this is best done with a combination of
macro and C program. EYTON.aml requires two ratio data items, because the
classification system is based on the parametric statistic, Pearson's r. It also
requires that the system program, eyton, be in the executable search path. Like
jenks.c, the eyton.c program can be compiled on a workstation with an ANSI C
compiler by using the command line:

cc -ansi -o eyton eyton.c -lm

The command line syntax for using EYTON.aml is:

EYTON
{classes} {chi_square_value}
- the coverage that the lookup table will be created for;
- the feature type (point, line, poly, etc.) of <cover>;
- interval or ratio data fields in the attribute table of <cover>;
- the output lookup table;
{classes} - the number of classes on each axis, valid codes are 2 (default) or 3;
{chi_square_value} - a value for selecting the number of points in the central ellipse,
defaults to 1.386, which is 50% of the observations.

Symbol Value Update
The lookup tables generated by these commands may require the
modification of the symbol numbers. There are several ways this can be
accomplished: use of '&SYS ARC INFO' to enter INFO from ARCPLOT; manually
declaring a cursor and using it to update the lookup table; using the AML given
above for manually changing values; or use of the AML, SETAUTO.aml. This
AML updates a lookup table by replacing the SYMBOL values with an ordered set
of numbers. It requires a starting value, a step value, and can optionally be given
two other values. Nonlinear progressions can be specified by including a 'scale'
value, and decreasing numbers can be obtained by specifying a
'subtraction_value.' These non-linear progressions should be used when a
SYMBOL value is used for specifying something other than a predefined symbol,
such as symbol value. In this case value should range from black to very light gray
with a greater change in the black/dark-gray values (value differences are easier
to discriminate for lighter values). For five classes, the use of 0 50 0.8 (start, step
and scale) and, if necessary, 69 as the subtraction value, will set up an appropriate
value progression.

SETAUTO {scale} {subtraction_value}
- the lookup table to be updated;
- the beginning value of the progression;
- the value used to change the beginning value;
{scale} - an exponent that is applied to --defaults to 1;
{subtraction_value} - value that a symbol value will be subtracted from in order to generate
decreasing progressions
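
The arithmetic behind such a progression can be sketched in Python. The formula is an assumption inferred from the argument descriptions, not taken from SETAUTO.aml itself: each value is taken to be (start + i * step) raised to the scale exponent, optionally subtracted from the subtraction value. The assumption is at least consistent with the suggested arguments, since 200 raised to the power 0.8 is about 69. The function name is hypothetical.

```python
def setauto_values(n_classes, start, step, scale=1.0, subtraction_value=None):
    """Generate an ordered set of symbol values (assumed SETAUTO-style formula).

    start and step should be non-negative when scale is fractional.
    """
    values = []
    for i in range(n_classes):
        value = (start + i * step) ** scale  # assumed form of the progression
        if subtraction_value is not None:
            value = subtraction_value - value  # reverses the order of the values
        values.append(round(value))
    return values
```

Under that assumption, five classes with 0, 50 and 0.8 give 0, 23, 40, 55 and 69; adding a subtraction value of 69 gives 69, 46, 29, 14 and 0. In both cases the largest change occurs nearest zero.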

Unclassed Maps
Unclassified maps (those that are continuously symbolized) can be created
in ARC/INFO, but the symbolization schemes available can be limiting and the
potential improvement in the ability to depict data (as suggested by Monmonier,
1976) may not justify the effort for presentation of data. For exploration and
analysis however, unclassified maps are an approach that may be worthwhile,
although software/hardware limitations can impinge on truly unclassified
displays. Eight bit plane graphic displays can only display 256 colors at once,
thereby preventing unclassed display of data sets with more than 256 values. This
can be partially circumvented by using the Jenks' optimal classification system to
create a lookup table with 256 classes. For data sets with fewer than 256 data values
or for systems with 'true color' (24 bit plane) displays, use the manual lookup table
definition AML (SETMAN.aml) to define a nominal lookup table. A limitation of
'true color' displays is that 24 bit graphics systems can only generate 256 shades of
gray. So, in general, for 'unclassed' maps, create a lookup table with 256 categories
(or fewer if there are not 256 data values); once a lookup table is created, symbol
values must be assigned on the basis of the lookup table's item value. SETUNCL.aml
accomplishes this. The command syntax is:

SETUNCL {scale_factor} {subtraction_value}
- the table with symbol values to be updated;
{scale_factor} - an exponent for generating non-linear scaling, defaults to 1;
{subtraction_value} - allows creation of descending scales.
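
The choice between the two lookup table approaches described for unclassed maps can be sketched as a simple test on the number of distinct data values. The function name and the returned strings are hypothetical; the 256-color limit and the two approaches come from the discussion above.

```python
MAX_DISPLAY_COLORS = 256  # limit of an eight bit plane display, per the text


def lookup_strategy(data_values, true_color=False):
    """Suggest how to build the lookup table for an 'unclassed' display."""
    distinct = len(set(data_values))
    if true_color or distinct <= MAX_DISPLAY_COLORS:
        # Every data value can receive its own symbol.
        return "nominal lookup table (SETMAN.aml)"
    # Too many values for the display: fall back to 256 classes.
    return "256-class lookup table (Jenks' optimal classification)"


print(lookup_strategy(range(100)))
print(lookup_strategy(range(1000)))
```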

Page Layout
Once the information to be displayed is classified, the data must be
displayed. This is the process of page layout, which can be broken into: layout
within parts and layout within the whole. Layout within parts of a display
involves the selected use of visual variables to highlight a part or parts of a data set
or display. This is the establishment of graphic hierarchies and involves two
techniques: visual isolation, and visual levels (MacEachren 1992, 27) (see Figures
1.1 on page 2 and 1.7 on page 14).
Visual isolation is the use of visual variables to give the appearance of the
separateness of part of a display. The visual variables that can accomplish this
include location, size, focus, value, hue, saturation, texture and orientation.
Location is the most obvious control: display items that are not close together will
not generally be associated (see Figure 2.8). Size can influence isolation because
any feature that is drawn smaller than the area allocated to it will appear separate
from its surroundings (this is done in cartograms). Focus influences isolation
because unfocused, fuzzy symbols blend into their surroundings, reducing isolation.
Value, hue, saturation, texture and orientation influence isolation by enabling one
item to be displayed in a manner that is noticeably different from its surroundings
(dark blue surrounded by yellow, for example), thus isolating it.
Like visual isolation, visual levels are used to separate an item from its
surroundings, but whereas visual isolation seeks to accomplish that in the x-y
space of the page, visual levels give the appearance of separation in the third
dimension. The visual variables that control visual levels are: size, value,
saturation, texture, focus and hue. These visual variables all play a part in depth
perception. Size controls visual levels because things that appear larger appear to
be closer. Value controls visual levels by control of contrast from the display
background: objects with high contrast will appear above the background (see
Figure 2.9 on page 32). Saturation controls contrast because intense colors appear
to be closer than less-intense colors. Texture influences visual levels because
coarse textures appear to be closer to the observer. Focus influences visual levels
because sharply focused objects appear to be closer than fuzzy objects. Hue
influences visual levels because, at a constant value and saturation, yellow is
more noticeable than other colors, but the influence of hue can be hard to predict
and control.
Figure 2.8 Visual isolation. The letters in each word 'go together,' as do the lines
and their labels, but not the lines and the display title.
-------
33
Figure 2.9 Visual levels. Darker value usually is a foreground (at a higher level)
than lighter values, particularly on white backgrounds.
In designing a page, the layout within the whole page involves the use of space
(and for animation, time) to present different parts of a message. The
type of display has an influence on what can, and should, be said on an image.
Paper displays can be of any size, but two are of more importance: 8.5 by 11 inch
paper, and full size electrostatic plotters. 8.5 by 11 inch paper is probably the most
common format for graphic publication. Its small size restricts the amount of
information that can be shown on one page, but generally high resolution and the
ability of the reader to study the graphic at length permit a great deal of
information communication. The portrait orientation, which is normally used
with text, limits the left-right extent of a display, so graphics that have a large left-
right extent should be displayed as landscape, when the graphic cannot be
redesigned to take advantage of the more common reading orientation. Generally
the primary part of the display should be shown with a maximum left-right extent.
Titles can then be placed above, and additional information such as legends and
locator diagrams can be placed below the primary part of the display.
Large size paper products, such as posters, can be created with less
emphasis on the orientation of the page, and more emphasis on the size and shape
of the area of interest. Orientation of the page should follow the larger axis of the
data set. Additional information should then be placed around the main part of
the graphic, in order to take advantage of 'dead space.'
Slides and overheads limit the amount of data that can be displayed because
of the distance that they force on the viewer (see Figure 2.10 on page 34). The
farther a viewer is from the display, the smaller the display will appear, limiting
the visibility of detail. This implies that the page layout decisions for slides and
overheads require greater generalization in both space and data. Only the most
important information needed to support a point should be included on the
display. As such, additional information such as legends and locator diagrams
should not be included. This allows the maximization of the available space for
data display, and like posters, the orientation of the display should be aligned with
the larger axis of the data area. This loss of data on the graphic is offset by the
explanations of the presenter of the slide, overhead or video.
Video involves both high resolution computer monitors, and lower
resolution National Television System Committee (NTSC) displays. Video displays
-------
34

must, like slides, take into account the distance from the viewer, as well as lower
resolutions and display flicker. Minimize the amount of ancillary information, in
order to maximize the amount of space available for the primary data. Like slides,
this loss of visual information must be offset by audio explanations.

Titles and Type
The text that is used to label a map or graphic should be selected carefully.
This involves the determination of what should be labeled, how it should be
labeled, where the labeling should be placed, and what type of lettering to use.
Which items must be labeled should be identified during the planning phases of
the map design, particularly in identification of the communication goal—label
those things that are necessary to accomplish the goal. Additional labeling may be
necessary for locational reference, etc., but this ancillary labeling should be kept
to a minimum in order to emphasize the new information.
As with the number of features that get labeled, the amount of text in each
label should be kept to a minimum. Often a leading 'a,' 'an' or 'the' can be dropped
from labels. Phrases such as 'the city of,' 'plot of' or 'Legend' can also be dropped
without loss of information, improving communication by removing clutter.
Labels should be placed on or near the object they label. Locations for titles
can include prominent positions such as top center, but may also include making
use of otherwise empty space (such as using the Gulf of Mexico's space in a map
of Florida). Labels of point features should, when possible, be put at the upper left
of the feature. Labels of linear features should be placed on straight sections of the
feature, or if necessary fit along a smooth curve, and be oriented for maximum
readability (top up, whenever possible). Area features should be labeled inside
their boundaries, when possible. Although it is not entirely avoidable, text
should not be broken up by linear features or area boundaries; it may be preferable
to have a break in the line, although this can be difficult to accomplish in
ARC/INFO.
There are two general classes of fonts that are used for cartographic
displays, roman (serif) and sans-serif, both of which can be italicized, underlined, etc.
Roman fonts, such as the font used for this text, have 'serifs,' the small extensions
at the ends of the strokes that make up each character. These fonts usually are more
legible than fonts without (sans) serifs, particularly for small text. Sans-serif fonts
are simpler, but are generally best used only for titles and other large text. With
either type of font, it is cartographic tradition to label a body of water in
italics. This may connote a sense of flowing.
Figure 2.10 The apparent size of 16 point text from 8 feet away (top) is the same as 6
point lettering from 3 feet (bottom) (MacEachren 1992, 71).
The size of text influences legibility, particularly when the speed of a
presentation is controlled by the presenter, rather than the reader. For paper
displays that the reader will be able to study at length, text as small as 5 points
(0.07 inches) can be used. Different sizes should differ by 35% (0.13 inches is 9 pt,
etc.) (Keates 1982). For paper displays which are
-------
35

not available for in-depth study (such as flip charts and posters), the following
suggestions for slides and overheads are more appropriate, although detailed
information could be included in posters.
For slides and overheads, not only must there be less text, but the text
should be larger (see Figure 2.10 on page 34). As a general rule text should be at
least 18 point with larger differences than paper (50%) (MacEachren 1992, 71).
Video displays also need to make use of large point sizes, because of lower
resolution and, with some systems, flicker.
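
The arithmetic behind these type-size guidelines can be sketched as follows. The function names are hypothetical, and the sketch assumes the common convention of 72 points per inch and that apparent size scales linearly with viewing distance; neither assumption is stated in this guideline.

```python
POINTS_PER_INCH = 72.0  # assumed convention, not stated in the guideline


def points_to_inches(points):
    """Convert a type size in points to inches."""
    return points / POINTS_PER_INCH


def equivalent_point_size(points, distance, new_distance):
    """Point size that looks as large at new_distance as `points` does at distance."""
    return points * new_distance / distance


# The smallest suggested paper text size, in inches.
print(round(points_to_inches(5), 2))
# 16 point text viewed from 8 feet, expressed as a size viewed from 3 feet.
print(equivalent_point_size(16, 8, 3))
```

The second call reproduces the comparison attributed to Figure 2.10: text read from a distance must be proportionally larger to appear the same size.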

Insets and Legends
There are two main types of insets: locator diagrams and additional
graphics. Locator diagrams show the location of a smaller area, in relation to a
larger and presumably more well known area (see Figure 2.11). When multiple
graphics are generated for the same region, only one locator diagram should be
needed, and it may be advisable simply to make the locator diagram a separate
graphic. For displays which need integrated location diagrams, they should be
visually isolated; the diagram should be secondary to the main display. Therefore
avoid bright colors, if not colors entirely, but ensure the area of the main display
is readily discernible.
Figure 2.11 A simple locator diagram.
Other graphics include legends, north arrows, scales and, for bi- or
multivariate displays, monovariate maps. Legends must convey the information
necessary to extract information from the main display. North arrows assist in
orientation when north is not at the top, which it does not have to be. For maps to
be presented to an international audience, include a north arrow, because south
may be expected at the top (as is the Chinese custom). For maps where a large and
hopefully well-known area is displayed (the United States or a subset of them, for
example), a north arrow is not needed. Map scales, like north arrows, are not
necessary for maps of well known areas and generally are not needed, unless the
map may be used for measurement of extent. As suggested by Olson (1980),
monovariate maps may be helpful in assisting readers who are not familiar with
bi- or multivariate maps. These single variable maps should be designed to be
both isolated and at a lower level in the visual hierarchy than the main display.
Monochrome displays may be the most appropriate technique for all but nominal
data.
Because of the requirements of different media, the amount of information
that should be carried in a legend varies, as does the need for a legend. Because of
-------
36
the generalization that goes into slides and video displays, the need for legends
should be minimal, particularly when the presenter is available to answer
questions. A slide for a legend may be a waste, because it cannot be referred back
to, and handing out a legend before a presentation may only serve to focus
attention on the legend, not on the data that is being presented.
Paper displays, particularly those where the author is not available to
answer questions, require the most thoughtful use of additional graphics.
Explanations of data must be written out on the graphic. For bi- and multivariate
displays, the mapping technique should be explained, and monovariate maps
should be included. Legends should present the character of the data (data level,
continuous or discontinuous, classification system used). This can, in part, be
accomplished by defining discrete variable symbols separately with a label on
each symbol, and by defining continuous variables as a continuum with
breakpoints, rather than midpoints, labeled (see the legends in Figures 5.1 on page
67 and 5.2 on page 69, for example).
An approach to legend design that may be most helpful to inexperienced
map readers is natural legends (see Figures 6.2 on page 86 and 9.1 on page 99).
Natural legend design is particularly helpful with abstract symbolization, such as
isolines, which can be difficult to interpret by a novice (especially when only given
a contour interval). By providing a legend that shows isolines on a surface and
spot values, the concept of a continuous surface and how data values are shown
can be communicated. Isoline data, particularly when several data items are
shown together, can be displayed as a single-variable, fishnet inset; this will
convey the concept of a continuous surface and allow analysis of individual data
sets, in addition to the presentation of the interplay of multiple data sets in the
main display.

ARC/INFO Hints
There are two lines that should be added to the $HOME/app-defaults/
Mwm file. These lines allow ARC to automatically place popup windows:

*clientAutoPlace: FALSE
*interactivePlacement: FALSE

There are a few general tools available in ARCPLOT that may assist in the
design or production of maps and graphics. When using 'DISPLAY 9999,'
ARCPLOT defaults to a black background; this is a problem with the 'true color'
display of value: black objects will be visible on paper, but invisible on the display,
and vice versa for white. The remedy for this is to include the following line in
either a .cshrc or .login file:

setenv CANVASCOLOR WHITE

When using nineteen inch monitors, it is possible to specify a display
canvas size that has (as far as ARC is concerned) the same dimensions as an 8.5 by
11 sheet of paper. The dimensions for portrait layout are 691 by 930. The
dimensions for landscape layout are 896 by 727.
All AMLs can be executed in two ways: specifying an AML path and
running with the &RUN directive; or by placing the AMLs in an atool directory
-------
37

and running them as a standard command. The $ARCHOME/atool directory can
be used, or the AMLs can be placed in another directory and linked into
$ARCHOME/atool subdirectories. The AMLs can also be placed in any other
directory, with subdirectories for each module (arc and arcplot), using the
&ATOOL directive to specify the directory path (see the &ATOOL page in the
AML User's Guide).
The AMLs discussed in this document, for the most part, follow a naming
convention. For the first character:
P = point or node displays.
L = line, route or section displays.
C = choropleth area displays.
G = graduated symbol area displays.
D = dot density area displays.
R = raster (grid) area displays.
S = surface displays.

For each additional character (one for each variable displayed):
H = hue.
V = value.
I = intensity.
O = orientation.
S = shape.
T = texture.
Z = size.
B = box (used with points only).
C = circle (used with points and graduated symbols only).
G = cartoGram (used with graduated symbols only).
D = density (used with dot density only-all are DD).
F = fishnet (used with surfaces only).
AO = angle (for single variable displays only: cao, pao)
ISO = isoline (used with surfaces only).

Other codes include:
P = pie (graduated circles with subdivisions).
L = legend (gel, ggl, gpl, sfl; al - single variable; ol - orientation)
BL = bivariate legend.
CC = complementary colors.
DH = dual hue colors.
EE = equiprobability ellipse (and eel - legends).
IL = choropleth area intersecting lines with PAT data (and cill - legends).
TL = choropleth area intersecting lines with lookup tables.
RGB = red, green and blue (and rgbl - legends).
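
As an illustration of how the convention reads, the following Python sketch decodes an AML name into its feature type and visual variables. The function and dictionary names are hypothetical, and matching the longest code first is a design choice of this sketch; because the convention is followed only "for the most part," some names (those beginning with EE, for example) will not decode.

```python
FEATURE_CODES = {
    "P": "point or node", "L": "line, route or section", "C": "choropleth area",
    "G": "graduated symbol area", "D": "dot density area",
    "R": "raster (grid) area", "S": "surface",
}
VARIABLE_CODES = {
    "ISO": "isoline", "RGB": "red, green and blue", "AO": "angle",
    "BL": "bivariate legend", "CC": "complementary colors",
    "DH": "dual hue colors", "EE": "equiprobability ellipse",
    "IL": "intersecting lines with PAT data",
    "TL": "intersecting lines with lookup tables",
    "H": "hue", "V": "value", "I": "intensity", "O": "orientation",
    "S": "shape", "T": "texture", "Z": "size", "B": "box", "C": "circle",
    "G": "cartogram", "D": "density", "F": "fishnet", "P": "pie", "L": "legend",
}


def decode_aml_name(name):
    """Split an AML name such as 'PSV.aml' into (feature type, [codes])."""
    stem = name.upper()
    if stem.endswith(".AML"):
        stem = stem[:-4]
    if not stem or stem[0] not in FEATURE_CODES:
        raise ValueError("unrecognized first character in " + name)
    feature, rest, parts = FEATURE_CODES[stem[0]], stem[1:], []
    while rest:
        # Try the longest codes first so that 'CC' is not read as two circles.
        for length in (3, 2, 1):
            code = rest[:length]
            if len(code) == length and code in VARIABLE_CODES:
                parts.append(VARIABLE_CODES[code])
                rest = rest[length:]
                break
        else:
            raise ValueError("unrecognized code in " + name)
    return feature, parts


print(decode_aml_name("PSV.aml"))
print(decode_aml_name("PCC.aml"))
```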
-------
38

Chapter Three
Point Symbolization in ARC/INFO

For point symbolization, the visual variables of hue, orientation and shape
can be changed by the use of an INFO lookup table that relates nominal
distinctions to different symbols; the visual variables of size and value can be used
to symbolize ordinal and interval/ratio data either with a lookup table or with
data from the file attribute table. For lookup table use, one of the methods
discussed in Chapter Two should be used to generate a lookup table. The AMLs
discussed in this chapter generally require some of the additional information that
the lookup table generating AMLs in Chapter Two provide.

Monovariate Symbolization
The macros in this chapter demonstrate the use of AML to accomplish map
design. The AMLs become progressively more complex in order to set the stage
for bi- and multivariate mapping, which, for all intents and purposes, must be done
with AMLs (most of which also require the use of cursors).
Each of the examples makes use of the markers that are provided with
ARC/INFO: a default set of markers in the markerset file, PLOTTER.MRK; and
several other available markersets: BW.MRK, COLOR.MRK, MINERAL.MRK,
MUNICIPAL.MRK, OILGAS.MRK, USGS.MRK, and WATER.MRK. Most of these
are displayed in the Map Display and Query guide's Appendix B, and the ability to
create new symbol sets is discussed in Chapter Three of Map Display and Query.

Nominal Data-Hue
Hue is best used for nominal data, and the COLOR.MRK markerset
provides fifteen symbols of the same shape, but each with a different color. All of
the other markersets, except BW.MRK, have the same symbol in black, red, green
and blue. An appropriate markerset can be selected with the MARKERSET
command, and used with a lookup table that relates changes in data values to
changes in hue. POINTMARKERS can then be used in conjunction with the
lookup table to plot the points. This is the method used in PH.aml; see Figure 3.1a.

PH
- the coverage to be displayed;
- an item in the point attribute table of ;
- an info table that relates values to markers;
- a markerset file.

Nominal Data-Orientation
Orientation is best used for nominal data, and the MINERAL.MRK
markerset provides some ready-to-use symbols that vary in orientation (such as
markers 143 through 149). PH.aml (above) uses command line arguments to select
this markerset and draw these symbols; see Figure 3.1b.

Nominal Data-Shape
Shape should only be used for nominal data. All of the markersets, except
COLOR.MRK, have symbols that change in shape. The macro PS.aml uses cursors
to extract the coverage name and type (point or node) from the lookup table (which
-------
39

Figure 3.1a: PH.aml
Nominal data displayed with hue using the color.mrk markerset.
Figure 3.1b: PH.aml
Nominal data displayed with orientation by using symbols from mineral.mrk.
Figure 3.1c: PS.aml
Nominal data displayed with shape.
Figure 3.1d: PAO.aml
Ratio data displayed with symbol orientation.
-------
40
must be in the format generated by the lookup table AMLs discussed in Chapter
Two), and then draws the points with the lookup table; see Figure 3.1c.

PS {markerset}
- a lookup table generated by SETMAN;
{markerset} - a markerset file-the current markerset is the default.

Ordinal to Ratio Data-Orientation
Orientation is best used for nominal data, but ARC/INFO allows the
specification of orientation that can be used with higher data levels. For point
symbols, the MARKERANGLE command allows specification of the drawing
angle of the current symbol. The macro, PAO.aml, uses cursors to determine the
minimum and maximum data values to be displayed, and then uses these values
to adjust the drawing angle of the selected marker; see Figure 3.1d.

PAO {markerset} {markersymbol} {markersize}
{max_angle} {point|node}
- the coverage to be displayed;
- a numeric data field in the attribute table of ;
{markerset} {markersymbol} - specify a marker-the current marker is the default;
{markersize} - drawing size of the marker-the default is 0.15 inches;
{max_angle} - angle of the maximum data value-the default is 179 degrees;
{point|node} - symbolize point (the default) or node feature.
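
One way to picture the angle calculation is the following Python sketch. It is not the code of PAO.aml: the function name is hypothetical, and it assumes that the minimum data value is drawn at 0 degrees and that angles increase linearly up to {max_angle}.

```python
def marker_angle(value, minimum, maximum, max_angle=179.0):
    """Map a data value onto a drawing angle in degrees.

    Assumption: the minimum data value is drawn at 0 degrees and the
    maximum at max_angle, with linear interpolation in between.
    """
    if maximum == minimum:
        return 0.0  # no spread in the data, so no rotation
    return (value - minimum) / (maximum - minimum) * max_angle


for data_value in (0, 5, 10):
    print(data_value, marker_angle(data_value, 0, 10))
```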

Ordinal to Ratio Data-Value
Color value is best used for ordinal data. For point symbols, the
MARKERCOLOR command allows specification of the drawing color of the
current symbol. ARC/INFO has a Munsell-like color specification: the HLS color
model; the parameters for the HLS model are hue, lightness and saturation.
Hue is an integer from 0 to 360 (red = 0, green = 120, blue = 240). Lightness is an
integer from 0 to 100 (black = 0, white = 100). Saturation is an integer from 0 to 100
(gray = 0, fully saturated = 100). Changes in value are specified by setting hue to
any valid number, adjusting lightness to control the grayness (for output on white
paper, generally use a percentage less than 90), and setting saturation to 0. The
macro PV.aml uses MARKERCOLOR to control value; see Figure 3.2a.

PV {markerset} {markersymbol} {markersize} {hue} {intensity}
- a lookup table generated by SETMAN or JENKS;
{markerset} {markersymbol} - specify a marker-the current marker is the default;
{hue} {intensity} - specify a color that will be value shaded.
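
The behavior of the HLS parameters can be explored outside ARC/INFO with Python's standard colorsys module, which also places red at hue 0, green at 120 degrees and blue at 240 degrees. The wrapper below is a hypothetical helper that rescales the ranges described above (0 to 360, 0 to 100, 0 to 100); it is assumed, not confirmed, that ARC/INFO's HLS model yields the same colors.

```python
import colorsys


def hls_to_rgb255(hue, lightness, saturation):
    """Convert HLS (0-360, 0-100, 0-100) to an (R, G, B) tuple of 0-255 integers."""
    r, g, b = colorsys.hls_to_rgb(
        (hue % 360) / 360.0, lightness / 100.0, saturation / 100.0
    )
    return tuple(round(c * 255) for c in (r, g, b))


# Fully saturated primaries at mid lightness.
print(hls_to_rgb255(0, 50, 100), hls_to_rgb255(120, 50, 100), hls_to_rgb255(240, 50, 100))

# With saturation at 0 the hue is irrelevant and only lightness matters,
# which is why value shading sets saturation to 0.
for lightness in (0, 30, 60, 90):
    print(lightness, hls_to_rgb255(0, lightness, 0))
```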

Ordinal to Ratio Data-Size
Size can be used in point symbols for ordinal and interval/ratio data, and
within ARC/INFO changes in size can be achieved in two ways: for circles and
boxes, ARC/INFO provides the SPOTSIZE/POINTSPOT command pair to
generate circle or box point symbols; for other symbol shapes, size is changed by
changing the drawing size of point symbols. The MARKERSIZE command allows
the specification of the size of the current point symbol. The macro PZ.aml uses
MARKERSIZE to display a ratio data set; see Figure 3.2b.
-------
41
Figure 3.2a: PV.aml
Classed ratio (ordinal) data displayed with color value.
Figure 3.2b: PZ.aml
Ratio data displayed with symbol size.
Figure 3.2c: PC.aml
Ratio data displayed with graduated circles.
Figure 3.2d: PB.aml
Ratio data displayed with graduated boxes.
-------
42
PZ {size_exponent} {size_factor} {minimum_size}
{markerset} {markersymbol} {point|node}
- the coverage to be displayed;
- a numeric data field in the attribute table of ;
{size_exponent} {size_factor} {minimum_size} - define the scaling of sizes
defaults: size_exponent = 1; size_factor = 0.15; minimum_size = 0.02 inches.

All of the size symbolization AMLs use the formula:

size = size_factor * (normalized_data_value ^ size_exponent) + minimum_size

This is just a formula for a line in x (normalized_data) and y (symbol size)
space (with size_exponent set to one), with the ability to create non-linear
progressions (with size_exponent not equal to one). The slope of the line is
determined by size_factor, and the y-intercept of the line is minimum_size. The
data is normalized prior to calculating the size in order to set the minimum data
value to minimum_size; data values are normalized by:

normalized_data_value = (actual - minimum) / (maximum - minimum)
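
The two formulas can be combined in a short Python sketch. The function names are hypothetical; the default values are those listed for PZ.aml.

```python
def normalize(actual, minimum, maximum):
    """Rescale a data value so the minimum maps to 0 and the maximum to 1."""
    return (actual - minimum) / (maximum - minimum)


def symbol_size(actual, minimum, maximum,
                size_exponent=1.0, size_factor=0.15, minimum_size=0.02):
    """Symbol size in inches for one data value."""
    normalized = normalize(actual, minimum, maximum)
    return size_factor * normalized ** size_exponent + minimum_size


# Linear scaling (size_exponent = 1) versus a square-root progression.
for data_value in (0, 25, 100):
    print(data_value,
          round(symbol_size(data_value, 0, 100), 3),
          round(symbol_size(data_value, 0, 100, size_exponent=0.5), 3))
```

The smallest data value always receives minimum_size, and the largest receives size_factor + minimum_size, whatever the exponent.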

Ordinal to Ratio Data-Graduated Circle Size
Symbol size is good for either ordinal or interval/ratio data. For the
generation of graduated circle (or box) symbols, ARC/INFO provides two
commands that allow rapid generation of these maps: SPOTSIZE and
POINTSPOT. SPOTSIZE must be given before POINTSPOT can be used.
SPOTSIZE allows the creation of point symbols that can be linearly or
exponentially scaled; of the two, exponential scaling is generally preferred,
although the command line syntax is more complicated. Once SPOTSIZE has been
given, POINTSPOT can be used to create graduated symbol maps, with either
circle or box symbols. These two macros generate circle (PC.aml) and box (PB.aml)
symbols (see Figures 3.2c and d).

PC {minimum_size} {maximum_size} {point|node}

PB {minimum_size} {maximum_size} {point|node}
- the coverage to be displayed;
- an interval or ratio data field in the attribute table of ;
{minimum_size} - size of the smallest symbol-defaults to 0.05 inches;
{maximum_size} - size of the largest symbol-defaults to 0.5 inches;
{point|node} - the type of to be displayed-defaults to point.

Bivariate, Monochrome Symbolization
Most of the AMLs presented in this section make use of shape to indicate a
nominal distinction. An appropriate symbol set should be selected (or created)
and one of the bivariate methods should be selected on the basis of the second
variable's type.

Two Nominal Data Sets-Shape and Orientation
For nominal data and meta-data, shape and orientation can be combined to
create bivariate point symbolization. Shape should generally be used for the
primary data, and orientation for the less important data, or meta-data (varying
-------
43
Figure 3.3a: PSO.aml
Two nominal data sets displayed with shape (primary data) and orientation
(secondary data).
Figure 3.3b: PSV.aml
A nominal data set displayed with shape and an ordinal data set displayed with
value. This is better for meta-data than a second data set.
Figure 3.3c: PSZ.aml
A nominal data set displayed with shape and a ratio data set displayed with size.
This is more effective than value for a second data set.
Figure 3.3d: PCV.aml
Interval/Ratio data displayed with graduated circles that are value shaded on the
basis of an ordinal data set.
-------
44

orientations of the same shape seem to 'go together' more than the same
orientation of varying shapes). PSO.aml uses lookup tables for both shape and
orientation to generate a bivariate display; see Figure 3.3a. Note that this AML (as
with all that follow) requires that any necessary lookup tables be set up in the
format generated by SETMAN.aml and JENKS.aml prior to running this routine.

PSO {markerset} {markersize}
- specifies marker symbol numbers;
- specifies angles in degrees;
{markerset} - a markerset for shapes-defaults to the current markerset;
{markersize} - a size for the markers-defaults to 0.15 inches.

Nominal Data, and Ordinal Data-Shape and Value
As with the next AML, this routine uses nominal primary data and ordinal
secondary data, but the visual hierarchy established by value makes it more
appropriate for display of meta-data than size. With value, uncertain values of the
primary data can be faded into the background; see Figure 3.3b and PSV.aml.

PSV {markerset} {markersize} {hue} {intensity}
- specifies marker symbol numbers;
- specifies HLS lightness data (0 to 100);
{markerset} - a markerset for shapes-defaults to the current markerset;
{markersize} - a size for the markers-defaults to 0.15 inches;
{hue} {intensity} - specify a color that will be value shaded.

Nominal Data, and Ratio Data-Shape and Size
For nominal primary data, shape can be used in conjunction with size to
create bivariate point symbols. This seems to work better for two data sets rather
than data and meta-data, which should be shown with shape and value; see Figure
3.3c and PSZ.aml (and the size discussion on page 39).

PSZ {markerset} {size_exponent} {size_factor} {minimum_size}
- specifies marker symbol numbers;
- an interval or ratio data item in the coverage named by ;
{markerset} - a markerset for shapes-defaults to the current markerset;
{size_exponent} {size_factor} {minimum_size} - define the scaling of sizes,
defaults: size_exponent = 1; size_factor = 0.15; minimum_size = 0.02 inches.

Ratio Data, and Ordinal Data-Size and Value
Symbol size and value are good for either ordinal or interval/ratio data.
The AML PCV.aml combines the two; see Figure 3.3d. Because symbol size is
calculated directly from the Point Attribute Table, only one lookup table is needed,
but rather than containing point symbol numbers, it should contain color value
(0-100) numbers. This system can be used to represent a ratio data value with size,
and a meta-data estimate of accuracy with value.
-------
45
PCV {minimum_size} {maximum_size} {hue} {intensity}
- a numeric item in the coverage referenced by ;
- specifies HLS lightness data;
{minimum_size} - graduated circle size for the smallest data value-default = 0.05 inches;
{maximum_size} - graduated circle size for the largest data value-default = 0.5 inches;
{hue} {intensity} - specify a color that will be value shaded.

Bivariate, Color Symbolization
For color (hue or hue/intensity) based bivariate mapping, a point symbol
should be selected from the available marker sets (PLOTTER, COLOR, MINERAL,
MUNICIPAL, OILGAS, TEMPLATE, USGS, or WATER); these markersets are
displayed in Appendix B of the Map Display and Query guide. The color
specification of that point symbol will then be changed to show variations in data
and meta-data. For size and hue based maps, only a set of hues must be chosen;
symbol size and shape are calculated by ARC/INFO.

Two Nominal Data Sets-Shape and Hue
This macro (PSH.aml) uses two lookup tables for symbolizing two nominal
data sets. Unlike shape and orientation, neither shape nor hue is, in general, the
dominant visual variable. Visual hierarchies can be established by selecting
intense colors and similar shapes; this will make color hue the more prominent of
the two nominal visual variables. See Figure 3.4a.

PSH {markerset} {markersize} {value} {saturation}
- specifies marker symbol numbers;
- specifies HLS hue data (values of 0 to 360);
{markerset} - a markerset for shapes-defaults to the current markerset;
{markersize} - a size for the markers-defaults to 0.15 inches;
{value} {saturation} - defaults of 50 and 100 (maximum intensity).

Two Nominal Data Sets-Dual Hue Ranges
Color hue is best used for nominal data, although with well selected colors,
hue can be used with ordinal data. Use of the spectral encoding bivariate mapping
scheme requires careful selection of colors. For the specification of colors for dual-
hue range mapping, the CMY (Cyan, Magenta, Yellow) color scheme is most
useful. A color chart for this color specification system is in Appendix J of the Map
Display and Query manual. For a four by four color matrix, color specifications like
the following can be used:

Cyan: 100 Cyan: 100 Cyan: 100 Cyan: 100
Magenta: 0 Magenta: 33 Magenta: 67 Magenta: 100

Cyan: 67 Cyan: 67 Cyan: 67 Cyan: 67
Magenta: 0 Magenta: 33 Magenta: 67 Magenta: 100

Cyan: 33 Cyan: 33 Cyan: 33 Cyan: 33
Magenta: 0 Magenta: 33 Magenta: 67 Magenta: 100

Cyan: 0 Cyan: 0 Cyan: 0 Cyan: 0
Magenta: 0 Magenta: 33 Magenta: 67 Magenta: 100
-------
46
Figure 3.4a: PSH.aml
Two nominal data sets displayed with shape and hue.
Figure 3.4b: PDH.aml
Two nominal data sets displayed with the dual-hue ranges. Note the lack of
discernible order in color changes; this necessitates a legend.
Figure 3.4c: PCC.aml
Two ordinal data sets displayed with complementary colors. This technique
highlights correlation better than dual-hue range maps; the central gray diagonal
could indicate a linear relation.
Figure 3.4d: PBL.aml
The AML that generates these legends automatically places the labelling text and
the total number of occurrences in each column or row.
-------
47

This pattern can be generated with SETAUTO.aml with the command line 0 33.
Yellow should be held constant for each of the sixteen positions—generally, 100
should be good. Even with this control of changes in color, dual-hue range maps
should generally be used only for nominal data—the complementary-color system
tends to present ordinal data better. See Figure 3.4b and PDH.aml.

PDH {markerset} {markersymbol} {markersize}
- specifies changes in cyan (values of 0 to 100);
- specifies changes in magenta (values of 0 to 100);
{markerset} {markersymbol} - specify which symbol will be drawn with; defaults to current;
{markersize} - specifies the size of the marker; defaults to 0.15 inches.
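
The four by four matrix above can be reproduced with a few lines of code. The
following is a minimal Python sketch, not the logic of SETAUTO.aml or PDH.aml; the
function name and its parameters are assumptions made for illustration. It steps
cyan and magenta through 0, 33, 67 and 100 percent while holding yellow constant.

```python
# Sketch: build the 4 x 4 dual-hue (cyan/magenta) matrix described in the text.
# The names used here are illustrative and are not taken from the AMLs.

def dual_hue_matrix(classes=4, yellow=100):
    """Return rows of (cyan, magenta, yellow) percentages, top row first."""
    # Evenly spaced ink percentages: 0, 33, 67, 100 for four classes.
    levels = [round(100 * i / (classes - 1)) for i in range(classes)]
    matrix = []
    for cyan in reversed(levels):  # cyan falls from 100 to 0 down the rows
        matrix.append([(cyan, magenta, yellow) for magenta in levels])
    return matrix


if __name__ == "__main__":
    for row in dual_hue_matrix():
        print("  ".join(f"C{c:3d} M{m:3d} Y{y:3d}" for c, m, y in row))
```

Each row holds cyan constant and each column holds magenta constant, which matches
the layout of the color specifications listed earlier.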

Two Ordinal Data Sets-Complementary Colors
This AML is a variation on the previous macro. The change is in the colors
used to symbolize data; complementary colors are hues that are on opposite sides
of the Munsell or Tektronix color spaces and mix to form grey. This mixing allows
highlighting of data that is not highly correlated, because these areas will appear
in color, whereas the central axis of correlated data will appear in grey. This
method allows the representation of both positive and negative correlations;
negative correlations should be represented by reversing the values in one of the
lookup tables—this changes the direction of the slope of the central axis. Note that
white should be avoided because the entire symbol will disappear on a white sheet
of paper; use a range from 5 to 100 for percent area inked. See Figure 3.4c and
PCC.aml.

PCC {markerset} {markersymbol} {markersize}
- specifies changes in cyan (values from 0 to 100);
- specifies changes in red (values from 0 to 100);
{markerset} {markersymbol} - specify which symbol will be drawn with; defaults to current;
{markersize} - specifies the size of the marker; defaults to 0.15 inches.
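
The grey central axis can be illustrated numerically. The Python sketch below is an
illustration under stated assumptions, not the contents of PCC.aml: red is modelled
as equal amounts of magenta and yellow ink, four classes are assumed, and all
function names are hypothetical. Equal amounts of cyan and red then give equal C, M
and Y, which is a neutral grey, and reversing one lookup moves the grey cells to the
opposite diagonal.

```python
# Sketch only: shows why the correlated diagonal appears grey.
# Function names and the four-class levels are assumptions for illustration.

def ink_levels(classes=4, low=5, high=100):
    """Evenly spaced percent-area-inked values, starting at 5 to avoid white."""
    step = (high - low) / (classes - 1)
    return [round(low + i * step) for i in range(classes)]


def complementary_cmy(cyan_pct, red_pct):
    """Combine a cyan amount with a red amount as one CMY triple.

    Red is modelled as equal magenta and yellow, so equal cyan and red
    amounts give equal C, M and Y (a neutral grey).
    """
    return (cyan_pct, red_pct, red_pct)


def is_grey(cmy):
    """True when all three inks are equal."""
    c, m, y = cmy
    return c == m == y


levels = ink_levels()
# Positive correlation: both variables use the same ordering of levels.
positive = [[complementary_cmy(c, r) for r in levels] for c in levels]
# Negative correlation: reverse one lookup so the grey axis changes slope.
negative = [[complementary_cmy(c, r) for r in reversed(levels)] for c in levels]
```

In the positive matrix the grey cells lie where both class indices match; in the
negative matrix they lie on the other diagonal.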

Point Legend Creation
Although the usage of this AML (PBL.aml) is lengthy, it allows one macro
to generate a bivariate legend for three different types of point symbolization
schemes: dual hue, complementary colors, and hue and intensity. See Figure 3.4d
for both dual-hue and complementary-color legends and Figure 3.5b for a hue and
intensity legend.

PBL
{markerset} {markersymbol} {markersize} {textset} {font} {point} {decimal_precision}
- the first lookup given in one of the bivariate AMLs;
- the second lookup given in one of the bivariate AMLs;
- the lower left corner of the legend matrix, in PAGEUNITS;
- the separation of symbols on the x and y axes, in PAGEUNITS;
{markerset} {markersymbol} - specify which symbol will be drawn with; defaults to current;
{markersize} - specifies the size of the marker; defaults to 0.15 inches;
{textset} {font} {point} - specify a textset for legend labels; defaults to roman, 10 point;
{decimal_precision} - number of decimal places shown for ratio data labels; defaults to 2.
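
The placement of a legend matrix from a lower left corner and a symbol separation
can be sketched as follows. This Python fragment is an assumption about the general
approach, not a transcription of PBL.aml; the function names and the sample counts
are hypothetical. It computes one page position per legend cell and the totals that
would be written beside each row and column.

```python
# Sketch: positions for a bivariate legend matrix and its row/column totals.
# Names and sample values are illustrative only.

def legend_positions(origin, spacing, rows, cols):
    """Return a rows x cols grid of (x, y) page positions.

    origin  - lower left corner of the matrix, in page units
    spacing - (x, y) separation between neighbouring symbols
    """
    x0, y0 = origin
    dx, dy = spacing
    return [[(x0 + c * dx, y0 + r * dy) for c in range(cols)]
            for r in range(rows)]


def marginal_totals(counts):
    """Return (row_totals, column_totals) for a grid of occurrence counts."""
    row_totals = [sum(row) for row in counts]
    col_totals = [sum(col) for col in zip(*counts)]
    return row_totals, col_totals


if __name__ == "__main__":
    grid = legend_positions((1.0, 1.0), (0.5, 0.5), rows=2, cols=3)
    print(grid[1][2])                          # upper right cell
    print(marginal_totals([[1, 2, 3], [4, 5, 6]]))
```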

Nominal Data, and Ordinal Data-Hue and Intensity
Color hue is best used for nominal data; color intensity, on the other hand,
is best used for ordinal data (and generally only for meta-data, not a second data
-------
48
Figure 3.5a: PHI.aml
A nominal data set displayed with hue and an ordinal data set used to display
meta-data. Intense (bright) colors tend to be more noticeable and are used to
present more certain values.

Figure 3.5b: PBL.aml
A hue and intensity legend. Note the lack of a diagonal, as in the complementary
color system. This makes hue and intensity better suited to display of meta-data
than two correlated data sets.

Figure 3.5c: PEE.aml
Two ratio data sets displayed with the Eyton's equiprobability ellipse system. This
should be used to highlight correlated data.

Figure 3.5d: PCH.aml
Interval/Ratio data displayed with graduated circles that are hue shaded,
displaying nominal data.
-------
49
variable). This color scheme represents meta-data better than the dual-hue and
complementary-color bivariate systems, because the data variable is clearly
displayed in a constant hue, unlike the other color bivariate methods. See Figure
3.5a and PHI.aml.

PHI {markerset} {markersymbol} {markersize}
- specifies changes in HLS hue (from 0 to 360)
- specifies changes in HLS saturation (from 0 to 100)
{markerset} {markersymbol} - specify which symbol will be drawn with; defaults to current;
{markersize} - specifies the size of the marker; defaults to 0.15 inches.
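
The relation between hue, saturation and the displayed color can be sketched with
Python's standard colorsys module. This is an illustration only; the function name
is hypothetical, and holding lightness at 50 percent is an assumption rather than a
documented setting of PHI.aml. Hue carries the nominal class, and saturation falls
toward grey as certainty decreases.

```python
# Sketch: convert an HLS hue (0-360) and saturation (0-100) to 8-bit RGB.
# Lightness is held at 50 percent here as an assumption for illustration.
import colorsys


def hls_marker_color(hue_deg, saturation_pct, lightness_pct=50):
    """Return an (r, g, b) tuple of integers in the range 0 to 255."""
    r, g, b = colorsys.hls_to_rgb(hue_deg / 360.0,
                                  lightness_pct / 100.0,
                                  saturation_pct / 100.0)
    return tuple(round(channel * 255) for channel in (r, g, b))


if __name__ == "__main__":
    # Same hue (same nominal class), decreasing saturation (less certain).
    for saturation in (100, 60, 20):
        print(saturation, hls_marker_color(0, saturation))
```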

Two Ratio Data Sets-Equiprobability Ellipse
Eyton's ellipse is a variation on the complementary color system. The colors
that are used are the same, but the linear correlation between the two variables is
used to determine a central category, which specifically highlights correlation.
This AML requires that EYTON.aml, presented in Chapter Two, be run first. See
Figures 3.5c and 5.7b on page 79 for a legend display, and PEE.aml.

PEE {markerset} {markersymbol} {markersize}
- a lookup table generated by EYTON.aml;
{markerset} {markersymbol} - specify which symbol will be drawn with; defaults to current;
{markersize} - specifies the size of the marker; defaults to 0.15 inches.
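
The idea of a central category defined by an equiprobability ellipse can be
sketched as follows. This Python fragment is not EYTON.aml and may differ from it in
detail; it assumes a bivariate normal model, a hypothetical function name, and a
probability of 0.5 chosen only for illustration. A point falls in the central
category when its squared Mahalanobis distance from the mean is within the ellipse
that encloses the given probability.

```python
# Sketch: flag points that fall inside an equiprobability ellipse.
# Assumes a bivariate normal model; names and defaults are illustrative.
import math


def inside_ellipse(xs, ys, probability=0.5):
    """Return a list of booleans, True where a point is in the central class."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sample variances and covariance.
    sxx = sum((x - mean_x) ** 2 for x in xs) / (n - 1)
    syy = sum((y - mean_y) ** 2 for y in ys) / (n - 1)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n - 1)
    det = sxx * syy - sxy ** 2
    if det <= 0:
        raise ValueError("variables are perfectly correlated or constant")
    # For two dimensions the squared distance follows a chi-square
    # distribution with 2 degrees of freedom, whose quantile is -2 ln(1 - p).
    threshold = -2.0 * math.log(1.0 - probability)
    flags = []
    for x, y in zip(xs, ys):
        dx, dy = x - mean_x, y - mean_y
        d2 = (syy * dx * dx - 2.0 * sxy * dx * dy + sxx * dy * dy) / det
        flags.append(d2 <= threshold)
    return flags
```

A larger probability widens the ellipse and so enlarges the central category.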

Ratio Data, and Nominal Data-Size and Hue
This macro is the color equivalent of the one that generated Figure 3.3d.
Unlike that AML though, this should be used for one nominal data variable and
one ratio data variable. See Figure 3.5d and PCH.aml.

PCH {minimum_size} {maximum_size} {value} {intensity}
- a numeric data item of the coverage referenced by ;
- specifies changes in HLS hue (from 0 to 360);
{minimum_size} - graduated circle size for the smallest data value; default = 0.05 inches;
{maximum_size} - graduated circle size for the largest data value; default = 0.5 inches;
{value} {intensity} - default to 50 and 100 (maximum intensity).
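
One way to map data values onto the minimum and maximum circle sizes is a linear
interpolation. The Python sketch below assumes linear scaling, which may not be the
scaling PCH.aml actually applies; the function name is hypothetical, and only the
default sizes of 0.05 and 0.5 inches come from the usage above.

```python
# Sketch: linear scaling of a data value to a graduated circle size (inches).
# Linear scaling is an assumption; the AML may scale sizes differently.

def graduated_size(value, data_min, data_max, min_size=0.05, max_size=0.5):
    """Return the circle size for value, given the range of the data."""
    if data_max == data_min:
        return min_size  # no spread in the data, so use the smallest circle
    fraction = (value - data_min) / (data_max - data_min)
    return min_size + fraction * (max_size - min_size)


if __name__ == "__main__":
    for v in (0, 25, 50, 100):
        print(v, round(graduated_size(v, 0, 100), 3))
```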

Multivariate Symbolization
Multivariate point symbolization can be accomplished by several means,
each of which is suited to various combinations of data levels. Because of the
increased complexity involved in multivariate mapping, care must be taken to
ensure legends are well designed and convey the methods that should be used in
interpreting map symbols.

Three Ordinal to Ratio Data Sets-Red, Green and Blue Symbolization
The 'false color' images that are often generated with satellite-derived data
use red, green and blue to symbolize data values from three spectral bands. This
technique is applied here to allow display of three data values. See Figures 3.6a
and 5.8b on page 82 for a legend. Note that the AML (PRGB.aml) performs a
linear stretch on the items in the Point Attribute Table. This AML also allows color
specification as cyan, magenta, and yellow, which can be helpful because this color
scheme tends to allow numbers in the low end of the data range to be
distinguished more readily than the RGB scheme.
-------
50
Figure 3.6a: PRGB.aml
Three ratio data sets displayed by linearly stretching the data sets from 0 to 255
and using data set one for red, two for green and three for blue.

Figure 3.6b: PSZV.aml
A nominal data set displayed with shape, a ratio data set displayed with size, and
an ordinal data set (size meta-data) displayed with color value.

Figure 3.6c: PCHLaml
A nominal data set displayed with color hue, ratio data displayed with size, and an
ordinal data set (size meta-data) displayed with color intensity.

Figure 3.6d: PP.aml
Ratio data displayed with size; each pie slice is a nominal difference within the
ratio whole. Pie slice values are percentages of the whole.
-------
51

PRGB
{markerset} {markersymbol} {markersize} {point|node} {r|c}
- the coverage to be displayed;
- numeric items in the attribute table of ;
{markerset} {markersymbol} - specify which symbol will be drawn with; defaults to current;
{markersize} - specifies the size of the marker; defaults to 0.15 inches;
{point|node} - symbolize point (the default) or node features;
{r|c} - display with the RGB (default) or CMY color system.
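
The linear stretch can be sketched as follows. This Python fragment illustrates the
general technique of rescaling each data set to the range 0 to 255, as the caption
of Figure 3.6a describes; it is not the code of PRGB.aml, and the function names and
the complement-based CMY conversion are assumptions.

```python
# Sketch: linear stretch of three data sets to 0-255 and combination as RGB.
# Function names are illustrative; the CMY conversion is a simple complement.

def linear_stretch(values, out_max=255):
    """Rescale values so the minimum maps to 0 and the maximum to out_max."""
    low, high = min(values), max(values)
    if high == low:
        return [0 for _ in values]  # no spread, so every value maps to 0
    return [round((v - low) * out_max / (high - low)) for v in values]


def rgb_triples(item1, item2, item3):
    """Use data set one for red, two for green and three for blue."""
    return list(zip(linear_stretch(item1),
                    linear_stretch(item2),
                    linear_stretch(item3)))


def rgb_to_cmy(rgb):
    """Return the complementary cyan, magenta, yellow amounts (0-255)."""
    return tuple(255 - channel for channel in rgb)


if __name__ == "__main__":
    colors = rgb_triples([2.0, 4.0, 10.0], [5, 50, 500], [0.1, 0.2, 0.3])
    for color in colors:
        print(color, rgb_to_cmy(color))
```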

Nominal Data, Ratio Data and Ordinal Data-Shape, Size and Value
Shape can be used to display nominal data, and size can be used to display
ratio data. This AML (PSZV.aml) adds to this bivariate representation by allowing
color value to be used to display an ordinal data set. Color value can be used to
display uncertainty on monochrome output devices, such as laser printers. The
calculation of symbol size is discussed on page 39; see Figure 3.6b.

PSZV {markerset}
{size_exponent} {size_factor} {minimum_size} {hue} {intensity}
- specifies marker symbol numbers;
- a numeric data item in the coverage referred to by ;
- specifies HLS lightness data (from 0 to 100);
{markerset} - a markerset for shapes; defaults to the current markerset;
{size_exponent} {size_factor} {minimum_size} - define the scaling of sizes;
defaults: size_exponent = 1; size_factor = 0.15; minimum_size = 0.02 inches;
{hue} {intensity} - define a color that will be value shaded.

Ratio Data, Nominal Data, and Ordinal Data-Size, Hue and Intensity
Like the previous AML, this one displays a nominal data variable, a ratio data
variable, and an ordinal data variable (meta-data for the ratio data variable would
be appropriate), here with hue, size and intensity respectively. Because this AML
uses color hue and intensity, it requires full color displays. See Figure
3.6c and PHIZ.aml.

PHIZ
{markerset} {markersymbol} {size_exponent} {size_factor} {minimum_size}
- specifies HLS hue data (from 0 to 360);
- specifies HLS saturation data (from 0 to 100);
- a numeric data item in the coverage referred to by the lookup tables;
{markerset} {markersymbol} - specify a marker; the current marker is the default;
{size_exponent} {size_factor} {minimum_size} - define the scaling of sizes;
defaults: size_exponent = 1; size_factor = 0.15; minimum_size = 0.02 inches.

Ratio Data-Point Pie Graphs
This AML generates pie symbols for point data by use of the POINTSPOT
command. The type of data that should be used in the creation of this type of map
is a size value that represents a sum of several other values; these other values
should have nominal distinctions. POINTSPOT uses the sum value to calculate the
size of the circle, and it calculates the pie slice size that will be drawn as a function
of the ratio of a sub-value to the whole. Colors for each slice are calculated by the
AML. See Figures 3.6d and 6.3d on page 88 for a legend, and PP.aml.
-------
52

- the coverage to be displayed;
- symbolize points or nodes;
- a numeric item in that specifies the size of the circle;
- the smallest and largest circle sizes;
- the number of pie slice data items (the value of n, next);
- names of numeric items in that specify the size of pie slices.
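
The slice calculation described for pie symbols can be sketched as follows. This
Python fragment is not the POINTSPOT implementation; the function name is
hypothetical, and starting the first slice at zero degrees is an assumption. Each
slice sweeps an angle proportional to the ratio of its sub-value to the whole.

```python
# Sketch: pie slice angles as a function of each sub-value's share of the sum.
# Starting at 0 degrees is an assumption made for illustration.

def pie_slices(sub_values):
    """Return (total, [(start_angle, end_angle), ...]) in degrees."""
    total = sum(sub_values)
    if total <= 0:
        raise ValueError("the sum of the pie slice values must be positive")
    slices = []
    start = 0.0
    for value in sub_values:
        sweep = 360.0 * value / total  # share of the whole, in degrees
        slices.append((start, start + sweep))
        start += sweep
    return total, slices


if __name__ == "__main__":
    total, slices = pie_slices([1, 1, 2])
    print(total, slices)
```

The returned total is the value that would drive the size of the circle, and the
angle pairs give the extent of each slice.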
-------
53

Chapter Four
Line Symbolization in ARC/INFO

For line symbolization, the visual variables of hue, shape, and texture can
be controlled by the use of lookup tables, which relate nominal difference to
symbolization. Value and size can be used either by means of lookup tables, or by
referencing Arc Attribute Table values. Lookup tables should be generated with
one of the lookup table AMLs presented in Chapter Two.

Monovariate Symbolization
As with point symbolization, the AMLs discussed in this chapter start out
relatively simply, and the output could be accomplished almost as readily by
hand. Like the monovariate point AMLs, these AMLs are primarily background
for the bi- and multivariate AMLs of later sections.
ARC/INFO provides a default set of lines in the lineset file, PLOTTER.LIN;
there are several other available linesets: 50.LIN, BW.LIN, CALCOMP2.LIN,
CARTO.LIN, COLOR.LIN, HP.LIN, and OILGAS.LIN. These are displayed in the
Map Display and Query guide's Appendix A, and the ability to create new symbol
sets is discussed in chapter three of Map Display and Query.

Nominal Data-Hue
Hue is best used for nominal data, and the COLOR.LIN lineset provides
fifteen symbols of the same shape, but each with a different color. All of the other
linesets, except BW.LIN, have the same symbol in black, red, green and blue. This
AML (LH.aml) uses the COLOR.LIN lineset for generating a display for arcs; see
the next section for display of routes, and Figure 4.1a.

LH
- the arc coverage to be displayed;
- the data item in referenced by ;
- an info table that relates to ;
- a lineset.
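
The role of a lookup table in relating nominal values to line symbols can be
sketched as follows. This Python fragment is only an illustration of the lookup
idea, not the INFO table format or the contents of LH.aml; the road classes, symbol
numbers, item name and function name are all hypothetical.

```python
# Sketch: a lookup table that relates nominal item values to line symbols.
# The classes and symbol numbers below are hypothetical examples.

ROAD_CLASS_LOOKUP = {"interstate": 1, "state": 2, "county": 3}


def symbols_for_arcs(arcs, item, lookup, default_symbol=0):
    """Return the line symbol number for each arc record.

    arcs   - list of dictionaries, one per arc attribute record
    item   - name of the data item that holds the nominal value
    lookup - mapping from item value to line symbol number
    """
    return [lookup.get(arc[item], default_symbol) for arc in arcs]


if __name__ == "__main__":
    arcs = [{"class": "state"}, {"class": "interstate"}, {"class": "trail"}]
    print(symbols_for_arcs(arcs, "class", ROAD_CLASS_LOOKUP))
```

Values missing from the lookup fall back to a default symbol, which makes unmatched
records easy to spot on the display.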

Nominal Data-Shape or Texture
Shape should be used for nominal data, and a line's shape can be changed
in several ways in ARC/INFO. The linesets PLOTTER.LIN, TEMPLATE.LIN and
OILGAS.LIN have symbols that change in shape. LINESET can be used to select
one of these, and LINESYMBOL can be used to change the shape of the current line
type, as well as change the line color and width (although the selection is limited).
This allows the selecting of several different shapes; the LINETYPE command
allows the generation of nine other types of shapes.
Texture is best used for nominal data, and the CARTO.LIN lineset provides
many ready-to-use symbols that vary in texture (such as lines 106, 110, 114 and
118). These symbols can be selected by using LINESET to select this symbol set,
and LINESYMBOL to select the individual symbols. A line's texture can also be
directly varied; LINEINTERVAL and LINETEMPLATE can be used to control
texture. LINEINTERVAL determines the space between successive parts of a line
symbol; it defaults to 0 (no space). LINETEMPLATE requires that a lineinterval be
-------
54
Figure 4.1a: LH.aml
Nominal data in an arc coverage distinguished by hue with the color.lin lineset.

Figure 4.1b: LS.aml
A nominal data set in a route system distinguished by shape using symbols from
plotter.lin.

Figure 4.1c: LV.aml
Classed ratio (ordinal) data displayed with color value by using a lookup table.

Figure 4.1d: L/.aml
Ratio data displayed with size; sizes are calculated by scaling the data in the
coverage Arc Attribute Table.
-------
55

set. The LINETEMPLATE command then allows control over both the length of
the mark and the length of the space between marks. The