Interagency Task Force on Environmental Cancer and Heart and Lung Disease
EPA/600/9-90/054
January 1991

Evaluation and Effective Risk Communication Workshop Proceedings

Edited by
Ann Fisher
Maria Pavlova
Vincent Covello

EPA/600/9-90/054
January 1991

EVALUATION AND EFFECTIVE RISK COMMUNICATION WORKSHOP PROCEEDINGS

Editors: Ann Fisher, Maria Pavlova, Vincent Covello

Interagency Task Force on Environmental Cancer and Heart and Lung Disease
Committee on Public Education and Communication

Environmental Protection Agency • National Cancer Institute
National Heart, Lung, and Blood Institute • National Institute for Occupational Safety and Health
National Institute of Environmental Health Sciences • National Center for Health Studies
Centers for Disease Control • Food and Drug Administration
Department of Energy • Consumer Product Safety Commission
Occupational Safety and Health Administration • Department of Agriculture
Department of Defense • Department of Veterans Affairs
Agency for Toxic Substances and Disease Registry • National Library of Medicine

Printed on Recycled Paper

NOTICE

The information in this document has been funded wholly or in part by the Federal Task Force on Environmental Cancer and Heart and Lung Disease. It does not necessarily reflect the views of the Task Force or its individual member agencies and no official endorsement should be inferred. Mention of trade names or commercial products does not constitute endorsement or recommendation for use.

TABLE OF CONTENTS

FOREWORD  vii

EVALUATION AND EFFECTIVE RISK COMMUNICATION: INTRODUCTION  xi
Vincent Covello, Ann Fisher, Elaine Bratic Arkin

COMMISSIONED PAPERS

RISK COMMUNICATION: ON THE ROAD TO MATURITY  3
Milton Russell

EVALUATION FOR RISK COMMUNICATORS  11
Elaine Bratic Arkin

THE TWELVE LAWS OF EVALUATION RESEARCH  25
Highlights from A Guide to Evaluation Research Theory and Practice
Peter H. Rossi

PRESENTATIONS

INTEGRATING EVALUATION INTO THE DEVELOPMENT AND DESIGN OF RISK COMMUNICATION PROGRAMS  33
June A. Flora

MARKETING RESEARCH AND RISK COMMUNICATION: Corporate and Public Sector Roles  41
William D. Novelli

EVALUATING RISK COMMUNICATION PROGRAMS: A Catalogue of "Quick and Easy" Feedback Methods  45
Mark Kline, Caron Chess, Peter M. Sandman

COMMENTARIES ON EVALUATION ISSUES

DEVELOPING THE MESSAGE
Selecting Appropriate Strategies  65
Mildred Zeldes Solomon
Tailoring the Message to the Audience  73
James W. Swinehart
Focusing on the Audience  83
Marilyn Rice

TRACKING PROGRESS
Issues to Consider for Evaluation Design  89
Judy Shaw, Jeanne Herb
Tracking the Health Objectives for the Nation  91
James A. Harrell
The Purpose of Tracking Progress  93
James L. Regens
Benefits to Conducting Midcourse Reviews  97
Max Lum

DECIDING ON THE EXTENT OF EVALUATION  99
Elaine Bratic Arkin

MATCHING YOUR NEEDS WITH AN EVALUATOR'S CAPABILITIES  103
James W. Swinehart, Shelagh Smith, Vicki S. Freimuth, Charles Darby

MEASURING ACCOMPLISHMENTS
Considerations for Planning Risk Communication  111
Robert W. Denniston
Four Factors in Designing Evaluation Strategies  115
David McCallum
Integrating Evaluation: A Seven-Step Process  119
William H. Desvousges

UNDERSTANDING OMB PROCEDURES
OMB Survey Clearance Procedures  127
Richard Eisinger
OMB Regulatory and Approval Requirements  129
Susan E. Dudley

USING EVALUATION CASE STUDIES
Introduction  135
Elaine Bratic Arkin
The National Cancer Institute  137
Shelagh Smith
New Jersey Department of Environmental Protection  141
Jeanne Herb, Judy Shaw, Henry L. Garie
CIBA-GEIGY Corporation, Toms River (NJ) Plant  147
Thomas A. Chizmadia
National Heart, Lung, and Blood Institute  151
John C. McGrath
New York City Health Department  159
Robert W. Denniston
Environmental Protection Agency  163
Ann Fisher
Maryland Department of the Environment  165
Nancy Zahedi, Carol Deck
U.S. Council for Energy Awareness  169
Ann S. Bisconti
Food and Drug Administration  171
Louis A. Morris
Cancer Prevention Awareness Program  173
Shelagh Smith
EPA Office of Toxic Substances  175
Maria Pavlova
National Cholesterol Education Program  177
John C. McGrath
EPA Superfund Program  181
Maria Pavlova
Cancer Information Service  183
Roswell Park Memorial Institute
National High Blood Pressure Education Program  187
John C. McGrath

CONCLUSION

DOES RISK COMMUNICATION MAKE A DIFFERENCE?  191
John F. Ahearne

WHAT ELSE DO YOU NEED TO KNOW ABOUT EVALUATION?  195
Roger E. Kasperson

APPENDIX

A GUIDE TO EVALUATION RESEARCH THEORY AND PRACTICE  201
Peter H. Rossi and Richard A. Berk

PARTICIPANTS  257

INDEX  273

FOREWORD

Many agencies and other organizations communicate with the public about risk. How can these agencies and organizations learn whether they are communicating effectively? Are their messages appropriate and clear to the intended audience? Are their messages reaching that audience? Is the audience understanding and internalizing the message? To explore questions like these, the Workshop on Evaluation and Effective Risk Communication brought together experts from academia, government agencies, and the private sector under the auspices of the federal Task Force on Environmental Cancer and Heart and Lung Disease and its subcommittee, the Interagency Group on Public Education and Communication. These proceedings of the Workshop provide an overview of the principles and methods of evaluation and of their application to risk communication programs.

The Task Force on Environmental Cancer and Heart and Lung Disease

The Task Force on Environmental Cancer and Heart and Lung Disease was established because Congress believed that federal environmental and health agencies, i.e., the Environmental Protection Agency (EPA) and the Public Health Service agencies, should be cooperating on a regular, formal basis. Thus, the impetus for the Task Force from the beginning has been communication: communication among federal agencies about what is being done to elucidate the relationships between environmental factors and human disease and to prevent or reduce the incidence of environmental disease.

The Task Force has sponsored various activities to examine specific issues related to environmental disease:

The effects of exposure to toxic substances, especially how they are metabolized and their mechanisms of toxicity
Exposure assessment
Non-oncogenic lung disease related to the environment
Women's occupational health
Air pollutants and respiratory cancer
Environmental toxicity and the aging process
Health professionals' awareness of environmental diseases

The Task Force's current activities focus on how pesticides affect human health and on environmental and occupational asthma. Task Force reports on these activities have been widely disseminated among the scientific community, within Task Force member agencies, and to Congress.
Task Force recommendations are used by federal agencies in planning research and establishing regulatory priorities and by Congress in drafting legislation. But, while the Task Force has promoted communication between federal agencies and with the outside scientific community, it has not had great success in communicating with the public. In fact, government in general, although it has initiated research and formulated policies that respond to public concern about the environment, is often deficient when it comes to communicating with the public.

Interagency Group on Public Education and Communication

Recognizing this deficiency, the Task Force sponsored a Workshop on the Role of Government in Risk Communication and Public Education in January 1987. This Workshop recommended the establishment of an Interagency Group on Public Education and Communication, under the auspices of the Task Force, to enhance collaboration on public education and risk communication efforts. The Interagency Group includes representatives from all fifteen Task Force agencies as well as from other government agencies and from private organizations. Some of the members are scientists, some educators, some policymakers, but all have an interest in communicating with the public. The missions of Task Force member agencies, either formally prescribed in legislation or implicitly derived from evolving programs, require that the public receive information and be able to participate in decisions that affect overall health and welfare, as well as to make personal decisions concerning risk.

The Interagency Group first concentrated on identifying federal risk communication programs that already exist and defining the role that public and private groups could play in this area. It found that many federal agencies do have risk communication and public education programs of one kind or another. These exist for a variety of reasons: the requirements of new legislation, such as that requiring public disclosure of information on release of toxic substances; the increasing interest in disease prevention; and public demand for information on health and environmental risks. Having identified these programs, the Interagency Group realized that many agencies knew little about the actual effectiveness of their risk communication efforts. The Interagency Group also saw the need to share information about their risk communication activities and to move increasingly toward collaborative efforts.

The Workshop on Evaluation and Effective Risk Communication was one step in that direction. It was designed to share information about what had worked—and what had not worked—when communicating about risk to allow agencies to avoid pitfalls that already have been encountered by others, and to avoid costly re-invention of approaches already proven to be effective.

This record of the Workshop includes papers given in presentations, panels, and individual working sessions. Read the Introduction to gain an overview of the meeting's issues and conclusions and the commissioned papers for more detailed presentations of current knowledge in the field of risk communication and its evaluation. Subsequent sections address specific aspects of evaluation and provide case studies. In cases where sessions addressed similar topics, individual authors' papers are grouped together under one title. The appendix includes a background paper developed for the Workshop, A Guide to Evaluation Research Theory and Practice, by Peter Rossi and Richard Berk.
A list of Workshop participants is also appended.

Throughout this volume, readers will encounter the term "research" applied to evaluation. In this context, the term refers to investigation in some systematic way; it is not meant to imply that evaluation need be expensive, time consuming, or sophisticated. In reality, the importance of each evaluation determines the level of resources and sophistication needed.

The Workshop required many months of planning and hard work on the part of a few dedicated individuals. The Task Force extends its sincere gratitude to Co-Chairpersons Vincent Covello, Ann Fisher, and Rose Mary Romano and to Frederick Allen, Elaine Bratic Arkin, Jean French, and David McCallum, who were also instrumental in planning the Workshop. In addition, the Task Force appreciates the special contributions of the Environmental Protection Agency, the Agency for Toxic Substances and Disease Registry, the Food and Drug Administration, and the National Cancer Institute. We hope that the insights and ideas that were shared in these two days will be of use to communicators in many agencies as they plan, and plan to evaluate, health and environmental communication programs.

Maria Pavlova, M.D., Ph.D.
Chairperson
Interagency Group on Public Education and Communication

EVALUATION AND EFFECTIVE RISK COMMUNICATION: INTRODUCTION

Vincent Covello, Ann Fisher, and Elaine Bratic Arkin

The papers in this volume review and summarize much of what is known about evaluating risk communication activities. The papers were presented at the Workshop on Evaluation and Effective Risk Communication held in Washington, D.C., in June 1988. The purpose of the Workshop was to bring together experts from academia, government agencies, and the private sector to review the current state of knowledge in evaluation research and the ways this knowledge can be applied to risk communication. Specific objectives were to:

Improve understanding of evaluation problems and tasks;
Survey principles and methods of evaluation relevant to risk communication;
Illustrate the practice of evaluation through examples;
Provide guidance for organizations engaged in planning and coordinating the evaluation of risk communication;
Derive recommendations for improving risk communication; and
Identify future needs.

Definitions

For purposes of the Workshop, risk communication was defined as any purposeful exchange of information about health or environmental risks between interested parties. More specifically, risk communication was defined as the act of exchanging information about levels of health or environmental risks; about the significance or meaning of health or environmental risks; about the data and methods used in deriving estimates of risk; or about decisions, actions, or policies aimed at managing or controlling health or environmental risks.

Evaluation, in the context of risk communication, was defined as any purposeful effort to determine the effectiveness of risk communication programs. Evaluation, according to this definition, encompasses a wide range of activities, from diagnosing risk communication problems to measuring and analyzing program effects and outcomes.

Why Evaluate?

One fundamental question dominated initial workshop discussions: Why is it important to evaluate risk communication programs?
In response to this question, participants agreed that evaluation is critical to effective risk communication; without evaluation, there is no way to determine whether risk communication activities are achieving (or have achieved) their objectives. Evaluation should be an integral part of the risk communication process. When carried out at each stage of program development, evaluation provides information that is critical to program effectiveness. For example, it provides essential planning information, it provides program direction, and it can help demonstrate program accomplishments. Most fundamentally, evaluation can signal the need for timely modifications.

When viewed in this way, evaluation has much to offer organizations that have risk communication responsibilities. During the planning and pre-production phase, evaluation can provide data critical to effective program design, including information about health, environment, and lifestyle needs and concerns, information about risk management needs and concerns, and information about how to meet those needs and concerns. Through surveys, questionnaires, focus groups, and other research tools, evaluation can be used to identify stakeholders and other relevant audiences, to assess audience opinion or reaction, to find out what people see as important problems, to find out what issues and events people are aware of, and to find out how people react to different sources of information. Pretesting and pilot testing can be used to forecast the effectiveness and feasibility of alternative risk communication activities, to determine the kinds of information needed by target audiences to understand risk communication material, to examine how people process and interpret risk communication information, and to obtain feedback on draft materials. Estimates of the effectiveness of alternative risk communication activities can be combined with information about their costs to show which risk communication strategy will be most cost-effective.

Once the risk communication program is operational, evaluation can be used to address questions of accountability and performance. For example, evaluation studies can determine whether the risk communication program is reaching the intended audience, provide feedback on the performance of risk communicators, identify program strengths, suggest ways these strengths can be used to communicate more effectively, and determine whether the program is being implemented appropriately (for example, what material was produced, how much was produced, how long it took, what it cost, and what audiences received the material).

Once the risk communication program has been implemented, evaluation can provide information on program impact and outcome. For example, evaluation can determine what members of the audience actually received, what they learned, and whether change occurred in the way they feel, think, or behave. The results can be used to answer the most important question: Did the program achieve its goals?

One major reason for evaluating risk communication activities is the general lack of resources for development of comprehensive risk communication strategies and programs. Few organizations have the resources needed to launch state-of-the-art risk communication programs that address multiple audiences through multiple channels. As a result, managers need to be able to choose messages and channels that use their limited resources most effectively.
Problems and Difficulties

These advantages raise a second question: If evaluation is so valuable, why are so few risk communication activities formally evaluated? The answer to this question appears to lie in a variety of problems and difficulties that affect the conduct of evaluation. These include problems and difficulties stemming from conflicts and disagreements about values, goals, resources, and usefulness. Each is briefly discussed below.

Values. Many difficulties in evaluation arise from its nature as a normative, value-laden undertaking that carries important policy, ethical, and practical implications. The value-laden nature of evaluation derives in part from the many stakeholders interested in the conduct and effectiveness of any given risk communication activity or program. These include government agencies, corporations and industry groups, unions, the media, scientists, professional organizations, public interest groups, and individual citizens. Each of these groups has varying and often conflicting needs, interests, and perspectives.

Evaluators are often asked to respond to the needs and concerns of each of these constituencies. However, different audiences have different goals; different audiences need different types of information; and different risk communication activities require different types of evaluation studies. As a result, an initial difficulty in any evaluation study is determining the perspective from which the evaluation will be conducted. Once a perspective has been chosen, several reporting implications follow, including the evaluator's responsibility to be explicit about the chosen perspective and to acknowledge the existence of other perspectives. Several practical implications also follow, including limits on the relevance and role that evaluation can play in affecting risk communication programs, and an increased likelihood that evaluation results will be criticized, even by the sponsors of the evaluation.

Goals. A second problem affecting evaluation is the difficulty in identifying goals for risk communication. What goals are appropriate? For example, should the primary goal of risk communication be to help people become aware of an issue, make more informed decisions, take action, seek information, seek help, protect themselves, change their behavior, or participate more effectively in the decisionmaking process? For some, the goal of risk communication is narrowly defined as personal or organizational survival and damage control; for others, it is to overcome opposition to decisions; for still others, it is to achieve informed consent, enhanced public participation, constructive dialogue, and citizen empowerment.

Meaningful evaluation is possible only when the program's goals, intended audience, and expected effects can be specified clearly. However, for many risk communication programs, such specification is extremely difficult and sometimes impossible. In many cases, evaluators and those who commission the evaluation are not able to agree on what the goals of the risk communication program should be, let alone which goals should be assessed or what kinds of success should be measured (e.g., through measures of knowledge, attitudes, and perceptions; measures of message awareness, comprehension, and acceptance; measures of information demand; or measures of behavioral intentions or actual behavior).

One practical requirement for evaluation is thinking through communication goals at the beginning.
Program and evaluation activities should be based on a set of clear risk communication goals. Even the most basic risk communication activity, such as responding to a telephone inquiry from a concerned citizen, should have a specific goal. Without clear communication goals—be they informational, organizational, legally mandated, or process goals—it is impossible to know whether the interaction and exchange have been successful. Once risk communication goals have been determined, they should occupy a key role in the planning and implementation process. At each stage of the program, activities should be evaluated in light of these goals. If warranted, program goals should be reviewed and changed as the program develops.

Resources. Effective risk communication requires a determined effort to ascertain whether the program is working as intended. Ideally, this should be done while there is still time to change direction. Feedback is essential to ensure that the communication effort is achieving its goals; if done early enough, it can save time by identifying places where mid-course corrections may be effective.

In practice, however, evaluation is often neglected in favor of more urgent tasks—especially if evaluation has not been planned and budgeted in advance. In most cases, the amount of money spent on evaluation represents an extremely small percentage of the total amount spent on the risk communication effort.

There are several reasons for the reluctance of managers to evaluate. One reason is that many program managers believe that evaluation is prohibitively expensive and that only a few organizations have the resources and skills to carry out evaluation. Another reason is the tendency for program managers to exhaust all available resources producing and distributing more risk communication materials (in the hope of increasing effectiveness by reaching more people), rather than to conduct evaluation studies that ask whether the message has reached the target audience and whether the target audience has received and internalized the message.

There also is an understandable reluctance on the part of many program managers to support research that has the potential for showing that the time, resources, and effort they have invested in a risk communication activity or program have not produced the desired results. Program managers may not want to be told that their programs have shortcomings, because this may have implications for career advancement, for intra-organizational decisions about the allocation of resources, and for program survival. Whenever an evaluation is conducted, there is a chance that it will reveal (serious) shortcomings. Thus, not evaluating avoids the potential for evidence of failure. On the other hand, if a program manager is convinced that evaluation can demonstrate success, according to what he judges to be appropriate measures, then evaluation may be viewed very differently; it becomes a tool to justify promotions, bonuses, or increases in financial resources and staff.

Another factor that may affect the decision to evaluate is the limited success of previous risk communication programs aimed at changing risk-related attitudes and behaviors. These planned risk communication activities make up only a small share of the many factors that impinge on people's perceptions and behavior.
Most evaluation studies conducted to date suggest that even when the message is clearly communicated and appears to be in the audience's best interest, the goals and expectations for such programs should be realistic. For example, a successful risk communication program might change the behavior of only a small percentage of the population. Agencies that have a public health mandate may view a small percentage change as insignificant even if the number of individuals affected is large. However, from the perspective of competing for attention and recognizing the complexities of behavioral change, risk communication endeavors should be compared with marketing efforts. For example, a marketing effort that produced an increase of a few percentage points in market share would be judged a big success. Beyond this lack of understanding of what level of impact should be considered a success, program managers may prefer formative and process evaluation over outcome and impact evaluation because the former affords opportunities to make changes in response to findings.

All of these factors suggest that increased attention needs to be given to understanding organizational and other barriers to evaluating risk communication activities. Equally important is the need to develop strategies to overcome these barriers.

First among these strategies is planning risk communication efforts early in the program planning stage so that evaluation activities can be integrated into the effort from the beginning. Evaluation is less likely to be resisted when it is built into each stage of the risk communication process, when adequate resources are available for evaluation, and when changes implied by evaluative data can be made. Evaluation also is less likely to be resisted when funds for evaluation have been set aside and built into the risk communication budget in the beginning.

Second, greater attention needs to be given to the use of informal, quick, and simple evaluation methods, many of which can produce extremely valuable planning and program information. When more rigorous, systematic evaluations are required, these ideally should be carried out by parties other than those who control and conduct the risk communication activity or program.

Third, greater attention needs to be given to developing incentives for program managers to fund evaluations for the purpose of better understanding which risk communication activities are most effective, not solely for justifying what has been done.

Fourth, program managers should be encouraged to develop well-articulated evaluation plans with clear goals and clear explanations of what the evaluation is designed to achieve.

Finally, program managers should be encouraged to document and share risk communication successes, including cases in which community feedback was solicited and used to enhance the risk communication activity or program.

Usefulness. A common criticism of many evaluations is that the results are seldom used. Implicit in this criticism is the notion that use means direct and immediate changes in risk communication policies or programs. However, there are several different types of use, not all of them immediately apparent. For example, results may be used to confirm that changes in the risk communication program are not needed. In some cases, evaluation may indicate directions for risk communication that are inappropriate or not feasible.
Even when there is no immediate discernible use of the information derived from an evaluation, results may accumulate over time and be absorbed slowly, eventually leading to changes in risk communication concepts, perspectives, and programs.

In assessing the usefulness of evaluation research, an important consideration is that the forces and events impinging on risk communication programs are often more powerful than the results derived from evaluation studies. The environment in which risk communication programs are developed seldom permits swift and unilateral changes; new information may actually slow down the change process, because it may make decisions more complicated.

Summary Recommendations

Several recommendations can be derived from these observations and from those found in the papers in this volume. The recommendations are divided into those for the short term and those for the long term. Consistent with the goals of the Workshop, most of these recommendations are oriented toward policymakers in public sector agencies that have risk communication responsibilities. However, the recommendations apply equally well to risk communication efforts in private sector organizations, such as public interest groups and industrial corporations.

Short-term Recommendations

1. Agencies and organizations should be encouraged to use evaluation methods that are appropriate to the scale and importance of the risk communication effort. Small-scale efforts may require only quick and easy evaluation methods. In contrast, more resource-intensive, statistically reliable methods may be appropriate for large-scale efforts.

2. Agencies and organizations should be encouraged to integrate evaluation strategies and results into program planning and decisionmaking: evaluation should become a routine part of risk communication practice.

3. Mechanisms are needed to permit agencies to share evaluation methods and the results of evaluation research.

4. Agencies and organizations should develop guidelines to help managers choose the most suitable evaluation methods. Workshops or other training mechanisms are needed to build the skills required to design and implement evaluation strategies.

5. Agencies and organizations should be encouraged to evaluate risk communication programs so that mid-course corrections can be made and program impact can be assessed.

Long-term Recommendations

1. Agencies and organizations should support research aimed at measuring the effectiveness of risk communication activities as well as the cost-effectiveness of alternative approaches. Examples of research questions that need to be answered are: How can we evaluate the role of risk communication in changing behavior? Are risk management decisions better made as a result of more effective risk communication? Is it more cost-effective to extend the time period for existing risk communication activities, to intensify their use in the originally scheduled time period, or to combine multiple risk communication activities?

2. Agencies and organizations should sponsor forums for public and expert debate on issues related to the appropriateness of using different kinds of motivational and persuasive messages within risk communication programs. For example, what guidelines are needed on ethical issues related to using different types of motivational and persuasive messages to help foster a more informed public?
3. Agencies and organizations should support development of guidebooks and manuals for practitioners on how to apply evaluation techniques. Guidebooks and manuals should include information on how to tailor an evaluation program to the scope and importance of a risk communication activity, as well as how to recognize the limitations of alternative methods. Guidebooks and manuals should also include case studies demonstrating the value and importance of evaluation research in risk communication.

COMMISSIONED PAPERS

Risk Communication: On the Road to Maturity

Milton Russell

The focus on evaluation of risk communication efforts is striking evidence of the growing maturity and self-awareness of this field. The sponsors and practitioners of risk communication are beginning to take evaluation seriously. It is an extraordinary thing for practitioners to ask themselves and others such questions as:

• What is success, and have we measured up?
• What can we do better?
• Is there any evidence that some of our efforts are failures?

The following issues are important in evaluating risk communication programs:

• Opportunities for improving public health lie in changing lifestyles, and risk communication is necessary to achieve those improvements.
• A prerequisite for the evaluation process is to be clear about what risk communication goal is appropriate under what circumstances; i.e., is the goal to inform or to change behavior?
• Risk communication that is designed to change individual behavior imposes some serious value conflicts, specifically between the duties of the state and the rights of the individual; however, there are principled bases for resolving these conflicts.
• Risk communicators must develop the professional skills to perform as well as the confidence to insist on effective evaluation in order to make risk communication a useful tool for improving public health.

Both risk communication and the protection of public health have come a long way in this country. Risk communication was born when professional risk managers, whether they were environmental protectors, public health officials, physicians, or safety trainers, realized that the post-Vietnam generation would not respond automatically when told to jump. This generation wanted to know why and on whose authority. Risk communication began to grow up when there were requirements for public hearings and when aggressive special interest groups and an activist press started demanding that decisions be explained, so that those who made them could be judged. Risk communication made a leap to its present state of near-maturity when the "decide and announce" model, which had alienated a generation whose credo had become participatory democracy, started to be replaced with interactive consultation with the people whose interests were affected. On the threshold of adulthood, risk communication is now attempting to empower ordinary citizens to take control of the risk situations encountered in everyday life.

Risk communicators are still far from successful in informing people about the consequences of alternatives in collective decisions. Further, there are many examples of mismatches between what people believe and what experience has shown and science confidently asserts about risks.
Finally, some of those in positions of authority remain stuck in an earlier time warp where "doing good on the behalf of others" was sufficient.

Risk communication will not receive the respect or support that it deserves until its practitioners subject themselves to a rigorous standard of performance. This performance must be demonstrated in terms of the usefulness of the messages actually received by the intended audience. Therefore, developing and using evaluation techniques is of preeminent importance at this stage in risk communication.

Three Historical Stages in Public Health Protection

In terms of protecting public health, simplistic categorization suggests that public health protection has experienced two stages and is embarking on the third. The first stage was the province of the engineer, and victories were gained against infectious diseases and premature death by cleaning up the drinking water, disposing safely of garbage and sewage, and ridding the country of insect-carried scourges such as yellow fever. The reduction of conventional air pollutants that caused respiratory distress was accomplished as well. One also could place enforced inoculation programs such as those for smallpox and polio in the same category. These actions were carried out by government acting as an agent for its citizens.

The second stage came when physicians developed both the skills and the tools to intervene in the course of disease, rather than simply diagnose it. Insulin was one such breakthrough, and developments leading to safer childbirth were another. The advent of antibiotics also made it possible for physicians to do more for patients than offer them comfort and prognosis. In addition, psychoactive drugs, modern surgical techniques, and crisis intervention tools lengthen and improve the quality of life of many individuals. In this stage, physicians act as agents for their patients.

These two stages are now in a maintenance and marginal improvement phase in terms of public health gains. For example, the purity of drinking water cannot be allowed to deteriorate, but little improvement to overall health is likely to occur if it becomes cleaner. Improvements in the environment can continue as government acts for its citizens, but the large opportunities for improving public health now appear to be elsewhere.

In this third stage of risk communication, major improvements will come not from government in collective actions, nor from physicians treating individual patients, but from ordinary citizens acting on their own behalf as they carry on their daily lives. The skills of the physician are no match for lung cancer and heart disease caused by smoking. There is little that government regulation of industry can do to improve indoor air quality when radon seeps from the basement and chemical fumes leak from under the kitchen sink. Moreover, a lifetime of poor diet and inadequate exercise cannot be reversed by a pill; there is no way to reverse the course of AIDS once the virus strikes; and only so much can be done at the water plant to prevent lead from leaching into the drinking water from household pipes. Finally, hospitals have limits on what they can do for alcohol- or drug-damaged infants or for patients with cirrhosis or pancreatic cancer.

Ethical Issues

However, all of these health risks can be prevented by individual behavior. All that is required is having the appropriate information and choosing to act on it.
This is where risk communication can play a central role. It is a means by which society fulfills its obligation to protect public health by empowering individuals to make informed decisions about the hazards within their individual control. This philosophy implies that success is measured by whether individual decisions are informed, not by what the decisions are or by how much risk they may impose. Yet moving away from the informed decision standard can conflict with values regarding the rights of individuals and the limits of the state.

One source of difficulty with the informed decision standard is that the harm rarely is limited to the one making the decision. Self-inflicted illness is an economic burden to all of us, since we share in its cost in medical insurance and taxes as well as other losses that society bears when its members are not as healthy and productive as they might be. The risks are often imposed on others, such as innocent persons hurt by drunk drivers and children afflicted by the insults their development suffered as fetuses. In addition, lung cancer not only shortens the smoker's life, but also lessens the quality of life for anguished family members and friends when the victim suffers and dies. These external effects of individual behavior mean that others inevitably have a stake in whether the behavior is changed.

External effects also reach across time. What responsibility does the teenage boy who starts on a course of addictive smoking have to the middle-aged man he will become? That man may find himself not only with a wife and children but also with heart disease. Or what responsibility does the teenage girl with a fast-food diet deficient in calcium have to the osteoporosis-ridden grandmother she will become forty years later? These "others" also have a stake in whether harmful behavior is changed.

Beyond external effects, another difficulty is that we think we know what is best for someone else. This argument is simple: science has shown that smoking shortens life spans. Longer, healthier lives are better than shorter, sicker ones. Therefore, people should not smoke, and if they won't quit, even when fully informed, we should make them.

However, dictating individual decisions rather than assuring that they are informed has two problems: the practical and the philosophical. One practical consideration is that the ability to control individual behavior is limited. Would one monitor diet and exercise and enforce healthy habits? What about private, consensual sexual practices? The strict enforcement of laws may have reduced drunk driving but has failed so far to halt the spread of drug use.

Beyond these practical matters are philosophical questions. Where should the line be drawn between the power of the state and the rights of the individual? Each of us would probably draw it in a different place, but there are few who would draw no line at all. This matter has embroiled political philosophers and ethicists for millennia, and it is not likely to be resolved soon. Yet it is a central issue in modifying individual behavior, and its implications need to be clarified for an evolving set of risk communication ethics and guidelines.

Those with access to information about changes in individual behavior that may improve health have an implicit social duty to make it available to those who can use it.
Government agencies and researchers supported by public funds bear the burden of making an effort to disseminate the results of research as widely as possible so that it can be used most effectively. Communicating about risk reduction opportunities in ways that will inform effectively is therefore part of the social contract.

However, effective communication about individual risks absorbs resources that could be used elsewhere. No researcher ever has enough money or time, and diverting resources from research is asking some individuals to go against both incentives and personal proclivities. One answer may be to build a communication requirement into the support for the research, so that this element of the social contract is clear and enforceable. In addition, researchers are not usually trained or equipped to inform those who could use the information. Risk communication specialists could take on this task, which is integral to the research, not ancillary to it.

Another important fact is that little effort is placed on evaluating the effectiveness of messages about individual risk reduction. With few exceptions, public health professionals have failed to get messages about individual risk reduction behavior across to the public. Communication about major issues may be adequate for the reasonably intelligent, educated, medically alert portion of the public that reads newspapers and magazines, watches the news on television, has regular medical and dental check-ups, and attends PTA meetings. However, this is only part of the public. When numerous surveys indicate that many teenagers are uninformed about sex, what is the likelihood that they know enough to make informed decisions about the health risks of smoking, the effects of alcohol or drugs during pregnancy, or the long-term dangers of obesity or inadequate diet? It is a tremendous challenge to reach those persons who fall outside the usual information network, and it will not happen by putting more effort into the same techniques. Good evaluation procedures are likely to demonstrate that new communication strategies are needed.

Risk Communication Guidelines

The first set of risk communication guidelines should address the responsibilities of scientists and health professionals to inform the public, evaluate whether the messages are being received, and develop alternate tools when necessary. However, sometimes information alone is not enough to change behavior, and there may be good reasons to go further.

While different communication techniques actually form a continuum, rough distinctions are possible. Closest to informing is the use of persuasion. Persuasion goes beyond supplying the facts to conveying the information in ways designed to encourage the individual, through reason, to make the behavior change desired. In contrast, manipulation bypasses reason to work on the emotions. By presenting material in forms that tap unrelated emotions, behavior that would resist appeals to reason can be changed. Dr. Koop with all his charts and medical authority is a marvelous spokesman against smoking, but who among us would choose his recent press conference over a manipulative Michael Jackson video for changing teenage behavior? At the other end of the continuum is deception—lying, presenting partial truths, or omitting clearly relevant facts. Deception is the antithesis of communication because it rejects the values of the recipients and seeks to change actions by coercive means.
A free society depends on trust, especially between government officials and the public they serve. Deception, even with the best of motives, erodes trust at its core. Finally, deception rarely works, and when the deception comes to light, lost credibility is difficult to regain.

Further guidelines for the use of risk communication to protect public health are as follows. First, deception cannot be tolerated. Second, efforts to inform and even persuade those who are reachable by the usual form of messages represent a powerful tool to reduce public health risks and offer no cause for objection. Third, there is a difference between using manipulative devices to get a message across and using manipulation to change behavior. If today's youth tune out Dr. Koop and tune in Michael Jackson, having the latter try to persuade teenagers to protect themselves also seems unobjectionable.

The difficult choices start at the next level of communication with the use of manipulation to change behavior. When external effects are sufficiently large and where direct intervention is practical, such as with drunk driving, society does not hesitate to employ coercive sanctions. It may be acceptable to attempt to change risky behavior, which society would otherwise constrain, by manipulation as long as appropriate safeguards are in place. However, establishing those safeguards is not easy, nor is deciding where to draw the line with regard to the degree of external effect. Unlike cases in which coercive actions are taken and due process is clear, guidelines for acceptable manipulative behavior are difficult to define and enforce. By definition, manipulative risk communication is subtle. Moreover, the "watchers" in government are often those doing the manipulating in the first place, and they often believe that they have a high moral purpose. In these circumstances, the appeal of elitism and the belief that those in positions of authority know best, even about choices that informed individuals clearly are competent to make, is strong.

The major safeguard against the erosion of individual choice is to demand greater degrees of political legitimacy as government moves beyond simply informing its citizens about risk. This legitimacy can range from the responsibility implied by the existence of an agency, to clear statements of executive intent that Congress does not see fit to reject, to executive orders, to explicit legislation relating to members of the executive branch who attempt to bring about certain behavior. Thus, both expressed and implied legitimacy may be the best vehicle for justifying actions that are designed to change behavior in ways not expressly commanded by law.

However, this conclusion leaves an opening for those in government to manipulate and control those whom they are supposed to serve. When such power is wrongly used, it violates the basic premise of the consent of the governed. Therefore, the test of legitimacy should be a stringent one that is supported by the professionalism of those involved in the process as well as by the vigilance of those subjected to it. More professionalism among risk communication practitioners offers one avenue for developing guidelines and codes of ethics. Greater attention by the political system to the means as well as to the goals of public policy may offer another restraint. Also, the pluralism of this society should not be underestimated.
Those who value individual rights and restraints on the power of the state have ready access to publicity, political power, and the courts to hold egregious violations at bay.

As the behavior in question moves toward more truly individual impacts, the justification for manipulative intervention shrinks and then disappears. Other values take precedence in our society, and fully informed adults must have the right to make their own decisions. In the abstract, most individuals would agree that this value is an important building block for a free society, although they might abhor its practical implications in particular situations.

Summary

Risk communication is not merely a means of supporting the protection of public health. It has an important role of its own. When it is individual behavior that causes risks, which is ever more the case, then it must be individual changes in behavior that reduce them. Risk communication, aggressively and effectively pursued, can raise the quality of public health at this stage of our history in the same way that clean drinking water did at the turn of the century or antibiotics did forty years ago. However, it is critical that other values are not sacrificed in the process; therefore, much attention needs to be paid to how and for what purpose communication techniques are used.

Consider for a moment how drugs are tested for safety and efficacy. Cautious steps are taken leading from testing in animals, to rigorously monitored human trials, to larger double-blind tests on a few individuals, to broader trials, and only then to general availability. Even then, there are carefully articulated contra-indications and injunctions about side effects. At each step, careful protocols are followed so that the processes can be replicated and the results judged by peers. Also, consider how scientists and physicians are trained: their performance and judgment are monitored by experienced professionals who can intervene if necessary.

Then consider the usual risk communication effort by a government agency: haphazard would often be an apt description of the quality of the process. This is true, even when the message involves a public health hazard that may affect millions of people or where appropriate behavior change has the potential for saving thousands of lives.

In comparison with the money and effort spent on research, risk communication is frequently treated as an afterthought or as a sideline or diversion. As noted earlier, this is because the process is considered by many scientists to be beneath their attention or somehow suspect. Informing the public is not considered a professional scientific activity; also it is hard work and absorbs time and resources. Decisionmakers tend to lose interest in issues that have been resolved and turn to the next item on their agenda. Alternatively, decisionmakers may consider communicating with the public about scientific issues a simple process that their political experience equips them to do with no special help. There are many exceptions, but those who have the formal responsibility for dealing with the public on risk matters are often recruited from other fields, are unfamiliar with the science they are communicating, or have little expertise, training, or incentive for doing this part of their job well.
In short, this society has all sorts of controls and safeguards, tests for safety and efficacy, and professional standards and codes of ethics regarding who can do what to an individual's brain, but it pays scant attention to what goes into the collective mind. At a time when known changes in individual behavior could bring about the first significant improvements in the quality and length of life since antibiotics, the tools to communicate this information are rudimentary, the research is poorly supported, and many of the front line troops lack training and the support of those who send them into the field.

There is little reason to believe that this situation will change as matters now stand. If public officials are judged on the decisions they make rather than on the effectiveness of their messages, why should they devote a great deal of effort to informing the public? If applause comes simply because Dr. Koop appears on television, where is the incentive to develop community-based peer groups to persuade teenagers not to start smoking? If scientists working on public health issues are judged exclusively by the papers they publish, where is the incentive for them to transform that research into information on which people can act? There will be no reason to take the risk communication process seriously until evaluations are made about whether people are truly empowered to make important choices about the way they lead their lives or about the collective decisions that others are making for them.

In summary, it is the risk communication professionals who have the largest stake in both facilitating and demanding evaluation of their efforts. They have both the professional responsibility and the personal incentive to determine what has been successful and what further efforts will be required over time to fulfill the promise of risk communication as a major element in improving public health.

Evaluation for Risk Communicators

Elaine Bratic Arkin

"In the broadest sense, evaluations are concerned with whether or not a program or policy is achieving its goals and objectives." (Rossi and Berk, 1988)

While it is true that there are specialists who use sophisticated techniques to perform program evaluation, it is also true that evaluation is a natural process. We all assess actions by consciously or unconsciously reviewing the available facts, considering them in the light of the original intent, and drawing a conclusion. For example, you might find that the news media rarely report your agency's news as you think they should. A close look at the situation—the content of your news releases, how and when they are released, and the reactions of the reporters receiving them—might identify and help solve the problem.

The purpose of any evaluation is to learn from actions so that improvements can be made. Everyone reaches conclusions about the relative success or failure of programs and activities. Formal evaluation helps assure that those conclusions are based on objective data. Formal evaluation takes the natural process and makes it a conscious, orderly effort by using objective techniques for gathering and analyzing data and reaching conclusions. The purposes of evaluation are to improve current and future efforts, certify the degree of change that has occurred, and identify programs, or elements of programs, that are not working. Evaluation is one of many tools available to help risk communication professionals and other decisionmakers do their jobs well.
However, it is important to recognize that there are many kinds of evaluation, from the very informal and simple to the very formal and complex.

Evaluation should not be tacked onto the end of a program. Assessment and careful planning are interdependent, integral functions of program development and implementation. Just as each step of a program contributes to its effect, each step can be subjected to evaluation. Even before program development begins, evaluative discipline demands that the desired program outcome be described as specifically as possible. Once set, these goals and objectives direct how each aspect of the program will be developed.

It is important to note that evaluation is not a substitute for sound judgment, creativity, or decisionmaking. Once evaluation results are available, they must be interpreted and a determination made about how and to what extent they will be used.

Types of Evaluation

This chapter describes four basic types of evaluation. Some of these concepts and definitions conform to standard textbook terminology; others do not. These types of evaluation are designed to predict results of a program, measure results of a program, or help determine why certain results occur. Examining why specific results occur helps determine which strategies or tasks work well and provides direction for improving a program's functioning. Although there are many barriers to undertaking formal evaluation projects, it is important to consider using evaluation tools to assess work performed. The four types of evaluation discussed are: formative, process, outcome, and impact.

Formative Evaluation. Formative evaluation consists of determining the strengths and weaknesses of messages, materials, or program strategies before full production, distribution, or implementation. It permits revisions before the full effort goes forward and before the communications strategy is fully developed. Its basic purpose is to maximize a program's chance for success. Formative evaluation does not guarantee that a program will have a certain effect. However, it does minimize the possibility that a program will fail due to developmental flaws, such as a confusing message, inappropriate strategy, or ineffective educational materials.

Examples of evaluation strategies that are used during the planning and developmental stages of a program include needs assessments, pretesting, and field testing. A needs assessment may be undertaken to reveal the habits, needs, resources, and interests of the target audience, the community, or both. This kind of study takes the problem to be addressed and relates it to the existing situation, providing a basis for designing risk communication and other strategies that will positively affect the problem.

Pretesting ideas (concepts) helps ensure that messages or draft materials will have the intended effect and answers questions about whether they are understandable, relevant, attractive, attention-getting, credible, and acceptable to the target audience. These factors can determine whether messages and materials work with a particular (or target) audience. Pretesting should not be used to determine whether the message is accurate and complete—this requires expertise and professional judgment. Instead, pretesting assures that the target audience will interpret and accept the information as it was intended.
Most pretesting involves a few persons chosen as representatives of the target audience, and they do not constitute a statistically valid sample in number or selection method. Pretesting is generally considered qualitative research—research that can be interpreted somewhat loosely to provide clues about audience reactions, acceptance, and direction regarding materials production and use. This kind of informal evaluation is fast and affordable; therefore, it is easier to fit into risk communication program budgets and schedules.

There is no prescribed methodology for pretesting. Rather, a technique is chosen to fit each pretesting requirement according to the objectives of and available resources for ------- Evaluation for Risk Communicators 13 each project. The most frequently used methods include self-administered questionnaires, central location intercept interviews, focus group interviews, theater testing, and readability testing. Table 1 indicates which of these techniques is best suited to pretest specific risk communication products.

Table 1. Applicability of Pretesting Methods
[Matrix relating pretesting methods to products. Methods (grouped in the original as nonparticipatory, qualitative, and qualitative or quantitative): readability tests, self-administered questionnaires, focus groups, individual interviews, central location intercept interviews, mail questionnaires, and theater tests. Products: concept development, poster, pamphlet, booklet, notification letter, storyboard, radio PSA, television PSA, and videotape.]

------- 14 Evaluation for Risk Communicators

Field testing (pilot testing) focuses on the strategies for communicating risks, rather than the messages. For programs that will be implemented on a large scale or over a long period of time, or have a potentially vital impact, field testing can help assure that the message dissemination and other program activities will work by testing them on a smaller scale (e.g., within a limited geographic area) before full program implementation resources are committed. Also, a field test can offer a smaller, more controllable setting for conducting outcome evaluation.

Examples of information that might come from a formative evaluation include: comprehension and understanding of the message by the target audience; appeal or relevance of materials to a particular audience; and feasibility of a mode of distribution for reaching the target audience.

Process Evaluation. Process evaluation examines the procedures used to implement an activity. This type of evaluation monitors the administrative and organizational aspects of a program in progress, providing information about whether activities are on track; which strategies are most successful; which aspects of the program need more attention, alteration, or elimination; whether time schedules are being met; and whether resource expenditures are acceptable. Tracking the number of materials distributed, meetings attended, articles printed, or inquiries received will determine how the program is operating and whether the target audience is responding. These measures explain how a program works, but not whether it is having the intended effect. Although the effect or outcome is the reason for a program's existence, it is also important to document what is happening, which elements are working, and what needs to be changed or improved during the implementation period of a program to maximize its chances of success.
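The kinds of counts mentioned above (materials distributed, meetings attended, articles printed, inquiries received) can be tracked with very modest tooling. The following is a minimal sketch only; the activity names and figures are hypothetical and not drawn from any program described in these proceedings.

    from collections import Counter

    # Hypothetical process-evaluation log: one entry per tracked activity.
    events = [
        {"activity": "materials_distributed", "count": 250, "audience": "parents"},
        {"activity": "materials_distributed", "count": 400, "audience": "clinics"},
        {"activity": "inquiries_received", "count": 12, "audience": "general public"},
        {"activity": "articles_printed", "count": 3, "audience": "general public"},
        {"activity": "meetings_attended", "count": 2, "audience": "community groups"},
    ]

    # Tally each activity type so progress can be reviewed against the program plan.
    totals = Counter()
    for event in events:
        totals[event["activity"]] += event["count"]

    for activity, total in sorted(totals.items()):
        print(f"{activity}: {total}")

A running tally of this sort answers only the "how much is happening" questions of process evaluation; it says nothing about whether the program is having its intended effect.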
Data from routine record keeping and other tracking measures should be reviewed on a regular basis, so that program tasks and schedules can be modified as necessary to improve progress. Information from a process evaluation might include: number of educational materials distributed and to whom; number of events and how many attended; print media coverage and estimated readership; number of inquiries; number of organiza- tions, businesses, and media outlets participating; effectiveness of the working relation- ships among key personnel; and degree of adherence to budget and deadlines as well as reasons for deviations. Outcome Evaluation. Outcome evaluation is used to obtain descriptive data about the results of a project and to document short-term results. Sometimes, these measures may appear to overlap process measures, but they should provide more information about the value than about the quantity of the activity. Project-focused results describe the output of the activity (e.g., the number of organizations, businesses, or media outlets participating and what and how much they are doing). Short-term results describe the immediate effects of the project on the target audience (e.g., the percent of the target audience showing increased awareness of the subject or taking a simple action). An example of an outcome evaluation methodology is a comparison between the target audience's awareness, attitude, and behavior before and after the program. Unlike the qualitative methods used for pretesting, outcome evaluation generally calls for quantitative measures that are necessary to draw conclusions about the program effect. These measures may be self-reported (e.g., interviews with a statistically valid sample of ------- Evaluation for Risk Communicators 15 the target audience) or observational (e.g., a study of changes in public inquiries or town meeting attendance). Comparisons between a control group that did not receive the program and the target audience receiving the program are desirable. It is also useful to accumulate data relevant to the desired outcome from the target audience prior to the intervention (baseline data) and again following the intervention to study changes. However, one problem that must be addressed in comparing pre- and post- intervention data is the role that factors other than the intervention being evaluated (e.g., extensive media attention) may have played. The existence of a control group can lessen this problem. Information that can result from an outcome evaluation includes knowledge and attitude changes; expressed intentions or simple actions taken by the target audience; and policies initiated or other institutional changes made. Impact Evaluation. Impact evaluation is the most comprehensive and difficult to obtain of the four evaluation types. It is desirable for some long-term programs because it focuses on the long- range results of the program, such as changes or improvements in health status. It may also be problem focused, that is, the results of the evaluation relate directly to the problem being addressed. For example, a program designed to make local residents more aware of the risk of toxic chemicals, increase participation in local decisionmaking processes, and ultimately strengthen local governance of toxic waste could be evaluated in terms of changes in awareness of residents' own risk, changes in participation in town meetings (outcome evaluation), and changes in local governance (impact evaluation). 
Impact evaluations are rarely possible because they are often costly, involve an extended commitment, and the results are difficult to attribute to the effects of a single activity or program when compared with other influences on the target audience over extended periods of time. This is especially true for risk communication programs, because there may be more compelling influences on an individual's behavior. For this reason, impact studies are rarely initiated as part of a communication activity, except when communication is one aspect of a larger intervention. Information obtained from an impact study may include changes in morbidity and mortality; changes in absenteeism from work; long-term maintenance of desired behavior; and rates of recidivism. Exhibits 1 and 2 give further information on designing and evaluating risk communication programs. ------- 16 Evaluation for Risk Communicators Exhibit 1. Elements of an Evaluation Design Every formal evaluation design, whether formative, process, outcome, impact, or a combination of elements, must contain certain basic elements. These are briefly described below. 1. A Statement of Risk Communication Objectives—Unless there is an adequate definition of desired achievements, evaluation cannot measure them. Evaluators need clear and definite objectives in order to measure program effects. 2. Definition of Data to be Collected—The determination of what is to be measured in relation to the objectives. 3. Methodology—A study design is formulated to permit measurement in a valid and reliable manner. 4. Instrument—Data collection instruments are designed and pretested. These instruments range from simple tally sheets for counting public inquiries to complex survey and interview forms. 5. Data Collection—The actual process of gathering data. 6. Data Processing—Putting the data into usable form for analysis. 7. Data Analysis—The application of statistical techniques to discover significant relationships. 8. Reporting—Compiling and recording evaluation results. These results rarely categorize a program as a complete success or failure. To some extent all programs have good and bad elements. It is important to realize that lessons can be learned from both, if the results are properly analyzed. These lessons can be applied to either altering an existing program or as a guide to planning new efforts. ------- Evaluation for Risk Communicators 17 Exhibit 2. Risk Communication Assessment Questions How many people were reached? (process evaluation) • Amount of time on radio and television and estimated audience at those times • Print coverage and estimated readership • Number of educational materials distributed • Number of speeches/presentations and size of audience • Number of other organizational and personal contacts Did they respond? (process evaluation) • Number of in-person, telephone, and mail inquiries (location of inquirers, where they heard of the program, and what they asked) • Number of new organizations, businesses, and media outlets participating in the program • Response (e.g., filled-out evaluation forms) from presentations. Who responded? (outcome evaluation) • Demographics of responders (e.g., gender, education, and income) • Geographic residence of responders. Was there change? (outcome evaluation) • Changes in knowledge and/or attitudes • Changes in intentions (e.g., intentions to modify diet) • Actions taken (e.g., increased enrollment in smoking cessation clinics) • Policies initiated or other institutional changes made. 
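The "Was there change?" questions in Exhibit 2 ultimately reduce to comparing measurements taken before and after the program, ideally against a comparison group that did not receive it. A minimal sketch of that arithmetic follows; the community labels and awareness percentages are invented for illustration.

    # Hypothetical percent of survey respondents aware of the risk, before and after the program.
    baseline = {"program_community": 31.0, "comparison_community": 30.0}
    followup = {"program_community": 54.0, "comparison_community": 37.0}

    change_program = followup["program_community"] - baseline["program_community"]
    change_comparison = followup["comparison_community"] - baseline["comparison_community"]

    # Netting out the comparison community's change guards against crediting the program
    # for shifts caused by other influences (e.g., extensive media attention).
    net_change = change_program - change_comparison
    print(f"Change in program community: {change_program:.1f} percentage points")
    print(f"Change in comparison community: {change_comparison:.1f} percentage points")
    print(f"Net change: {net_change:.1f} percentage points")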
------- 18 Evaluation for Risk Communicators Constraints to Risk Communication Evaluation Every program manager faces constraints to undertaking optimal evaluation tasks, just as there are constraints to designing other aspects of a risk communication program. These constraints may include: • Limited funds • Limited staff time and capabilities • Length of time allotted to the program • Limited access to computer facilities • Agency restrictions to hiring consultants or contractors • Policies limiting the ability to gather information from the public • Management perceptions regarding the value of evaluation • Levels of management support for well designed evaluation activities • Difficulties in defining (or establishing agency consensus) regarding the objec- tives of the program • Difficulties in designing appropriate measures for risk communications pro- grams, and • Difficulties in separating the effects of program influences from other influences on the target audience in "real-world" situations. These constraints make it necessary to accommodate existing limitations as well as the requirements of a specific program. However, it is not always true that "something is better than nothing." If an evaluation design, data collection, or analysis must be compromised to fit limitations, the program must decide whether: • The required compromises will make the evaluation results invalid • An evaluation strategy is essential for the particular situation, compared with other compelling uses for existing resources Some questions for program managers to consider when deciding whether to evaluate a risk communication program include: • Is the program entirely new, or does it incorporate messages and methods that have been previously tested? • Have program strategies already been formally evaluated or well documented and accepted? • How long will the program last? Will the implementation phase be long enough to permit significant adjustment? • Will the program be repeated? • Are the objectives measurable in the foreseeable future? • Which program components are most critical? • What aspects of the program fit best with the agency's mission or goals? • Is there management support or public demand for program accountability? • Will an evaluation component help risk communication efforts to compete with other agency priorities for future funding? The answers to these questions should help identify what kind of evaluation should be included in the program. ------- Evaluation for Risk Communicators 19 Determining the Type of Evaluation Listed below are examples of how evaluation can fit into the seven consecutive stages of program development and implementation. 1. State Problem or Need—A well conceived statement of the problem or the need for risk communication is essential, regardless of the size or extent of the program to be developed. Conducting a formal or informal needs assessment at this point can provide objective verification of the need and added understanding of, or new dimensions to, the need to be addressed. 2. Formulate Goals and Objectives—All risk communication programs should be founded on carefully composed goals and objectives. The goals describe the overall change, such as a specific improvement in the health of the specified population; in most cases, activities beyond risk communication will be neces- sary to reach stated goals. The objectives describe the intermediate steps that must be reached to accomplish the broader goal. 
These objectives should be as specific as possible, obtainable through risk communication activities, and measurable. Once the goals and objectives are written and approved, these statements serve as a kind of agreement or contract regarding the program's purpose; all aspects of the risk communication program should relate specifically to them. Without clear, measurable goals and objectives, there is no clear direction for program development and no basis for evaluation. Plans for outcome or impact evaluation should be developed at this point also to permit the collection of baseline data, if possible, prior to program intervention. 3. Develop Risk Communication Strategies and Message Concepts—The com- munication strategy statement outlines the benefits and information to be com- municated to the target population; the message concepts are how the information will be communicated (e.g., the information, the appeal, and the spokesperson) as opposed to the fully composed messages. Concept testing at this stage will confirm whether the proposed benefits, appeals, spokesperson, and information are considered clear, understandable, culturally acceptable, and relevant by the target audience. 4. Draft Risk Communication Materials—Pretesting prior to the expenditure of production funds can help diagnose any problems and indicate whether the materials are likely to be effective. 5. Develop Distribution and Implementation Plans—These plans will indicate through what mechanisms, in what quantities, and when messages and materials will be directed to the target audience. For broad-scale or long-term programs, a field test (pilot test) of these mechanisms for a short period with a smaller audience segment (or smaller geographic area) can help identify any potential ------- 20 Evaluation for Risk Communicators problems before full implementation begins. A field test may be designed to test several options for risk communication delivery and assess their relative effec- tiveness to permit full- scale implementation of the most successful methods. 6. Implement Programs—Process evaluation measures used during program implementation can provide feedback in time for modifications to be made, if necessary. This accountability evaluation is designed to identify and correct problems, not necessarily to determine the extent to which the problems exist. These measures can also provide indications of intermediate progress and justify program expansion. 7. Assess Program Effects—Outcome evaluation, based on measurable goals and objectives, is designed to document what changes occurred. Often it is difficult or impossible to credit the risk communication activities as the direct cause of the effects. However, outcome evaluation can help program managers determine whether the specific program or some of its methods should be continued or replicated. An analysis of outcome measures combined with resource costs can yield some measure of the efficiency of the process (cost effectiveness) and the importance of the change relative to its cost (cost benefit). Rarely does anyone have access to adequate resources for an ideal risk commu- nication program, much less an ideal evaluation component. Nevertheless, there are practical benefits to including an evaluation component, such as to determine whether the program is on track and how well or why it worked. With a little creative thinking, some form of evaluation can be included in almost any budget. 
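Step 7 notes that outcome measures combined with resource costs yield a measure of cost effectiveness. A minimal, purely illustrative calculation is shown below; all figures are assumed, not taken from any program discussed here.

    # Assumed figures for illustration only.
    program_cost = 48_000.00   # total program expenditure, in dollars
    people_reached = 20_000    # estimated audience exposed to the messages
    people_acting = 1_200      # e.g., number enrolling in a smoking cessation clinic

    cost_per_person_reached = program_cost / people_reached
    cost_per_person_acting = program_cost / people_acting

    print(f"Cost per person reached: ${cost_per_person_reached:.2f}")
    print(f"Cost per person taking the desired action: ${cost_per_person_acting:.2f}")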
However, resources other than program funds, such as professional staff time and skills, computer time, and evaluation consultants, also should be considered when determining evaluation strategies. Table 2 includes examples of different evaluation tasks for programs with minimal, modest, or substantial resources. The matrix is additive from left to right. That is, each ascending program level could be expected to include the evaluation techniques described at lower levels in addition to those described at the higher level.

------- Evaluation for Risk Communicators 21

Table 2. Evaluation Options Based on Available Resources

Formative
• Minimal resources: Readability test
• Modest resources: Central location intercept interviews
• Substantial resources: Focus groups, individual in-depth interviews

Process
• Minimal resources: Recordkeeping (e.g., monitoring activity timetables)
• Modest resources: Program checklist (e.g., review of adherence to program plans)
• Substantial resources: Management audit (e.g., external management review of activities)

Outcome
• Minimal resources: Activity assessments (e.g., numbers of health screenings and outcomes or program attendance and audience response)
• Modest resources: Progress in attaining objectives (e.g., periodic calculation of percentage of target audience aware, referred, or participating)
• Substantial resources: Assessment of target audience knowledge (pretest and posttest to measure change in audience knowledge)

Impact
• Minimal resources: Print media monitoring (e.g., monitoring of content of articles appearing in the media)
• Modest resources: Public surveys (e.g., telephone surveys of self-reported knowledge or behavior)
• Substantial resources: Studies of public behavior/health change (e.g., data on physician visits or changes in the public's health status)

------- 22 Evaluation for Risk Communicators

SUGGESTED READINGS

Environmental Protection Agency. 1987. Evaluating and Improving EPA's Risk Advisory Programs. Washington, DC: The Agency, Program Evaluation Division, Office of Policy, Planning and Evaluation, May.

Fink, A., and J. Kosecoff. 1987. An Evaluation Primer and Workbook: Practical Exercises for Health Professionals. Beverly Hills, CA: Sage Publications.

Fitz-Gibbon, C.T., and L.L. Morris. 1978. How to Design a Program Evaluation. Beverly Hills, CA: Sage Publications.

French, J.F., C.C. Fisher, and S.J. Costa, Jr. Working with Evaluators: A Guide for Drug Abuse Prevention Program Managers. U.S. Department of Health and Human Services. Rockville, MD: Alcohol, Drug Abuse and Mental Health Administration, Publication No. (ADM) 83-1233.

Green, L.W., and F.M. Lewis. 1986. Measurement and Evaluation in Health Education and Health Promotion. Palo Alto, CA: Mayfield Publishing Co.

Morris, L.L., and C.T. Fitz-Gibbon. 1978. How to Measure Program Implementation. Beverly Hills, CA: Sage Publications.

National Cancer Institute. 1989. Making Health Communications Programs Work: A Planner's Guide. Bethesda, MD: The Institute, NIH Publication No. 89-1493.

National Heart, Lung, and Blood Institute. 1986. Measuring Progress in High Blood Pressure Control: An Evaluation Handbook. NIH Publication No. 86-2647. April.

Rossi, P.H., and H.E. Freeman. 1985. Evaluation. Beverly Hills, CA: Sage Publications.

------- Evaluation for Risk Communicators 23

GLOSSARY

Baseline study—collection and analysis of data regarding a target audience or situation prior to intervention.

Control group—a group randomly selected and matched to the target population according to characteristics identified in the study to permit a comparison of changes between those who receive the intervention and those who do not.
Formative evaluation—evaluative research conducted during program devel- opment (e.g., state- of-the-art reviews, pretesting messages and materials, and pilot testing a program on a small scale before full implementation). Goal—the overall improvement the program will strive to make; this usually requires efforts beyond risk communication. Impact evaluation—research designed to identify whether and to what extent a program contributed to accomplishing its stated goals (more global than outcome evaluation). Objective—a quantifiable statement of a desired program achievement necessary to reach a program goal; for risk communication programs, specific objectives can relate directly to desired outcomes of communication activities. Outcome evaluation—research designed to obtain data about the results of a program (short term or intermediate changes). Pretesting—a type of formative research that involves systematically gathering target audience reactions to messages and materials before they are produced in final form. Qualitative research—research that is subjective in that it involves obtaining information about feelings and impressions from small numbers of respondents. The information gathered usually should not be described in numerical terms, and generalizations about the target population should not be made. Quantitative research—research designed to gather objective information from representative, random samples of respondents. Results are expressed in numerical terms and are used to draw conclusions about the target audience. ------- The Twelve Laws of Evaluation Research Peter H. Rossi This paper highlights some of the major ideas in A Guide to Evaluation Research Theory and Practice (Rossi and Berk, 1988). The full text of this larger paper, reprinted in the Appendix, introduces the reader to the central substantive and technical issues in evaluation research and to the important literature in this field. Presented here are some principles derived from that paper. Major Evaluation Modes At first, evaluation research assessed whether or not programs were succeeding in reaching their stated goals, but it soon became clear that there was a strong need to use social research in designing the programs as well. As a result, there are now two major evaluation modes: • Formative evaluations, consisting of research to improve programs at the pro- gram design stage • Summative evaluations, consisting of efforts to assess the success of existing programs. Although formative and summative evaluations resemble each other in some ways, there are important differences. Operating agencies with the responsibility for implement- ing programs are usually more interested in formative than in summative research. Policymakers and oversight groups, such as the Congress and the executive branch agencies, are usually more interested in summative research. Most of the evaluation laws discussed below are applicable primarily to one mode or the other. A few are general laws that apply both to formative and summative evaluations. Some of these laws appear to embody simple common sense and therefore may seem hardly worth stating. Yet, a large proportion of failed programs and inconclusive findings are the result of not following these common-sense laws. 25 ------- 26 The Twelve Laws for Evaluation Research Three General Laws of Evaluation Practice LAW GI: There is no such thing as a free evaluation. This law states that there are costs to every evaluative effort. 
It implies that there is a rough proportionality between quality and price.

LAW GII: Evaluations should not cost more than the program being evaluated. This law emphasizes that evaluation is not an end in itself but is necessarily subservient to the programs to which it is applied. An implication of this law is that trivial programs do not merit elaborate evaluations and that important programs ought to be evaluated more elaborately.

LAW GIII: Evaluation starts at the very beginning of a program. Ex-post-facto evaluations can never attain the same degree of validity as evaluations that are planned at the outset of a program and conscientiously pursued throughout the planning, design, and implementation stages.

The Laws of Formative Evaluation

LAW FI: Proper design requires prior knowledge. This law states that a program cannot be designed properly without having some valid knowledge about the nature, extent, and location of the problem in question. It means that one of the first steps in the design of programs is to learn about the nature of the problem to which the program is addressed. Of course, the information needed is not simply the opinions and guesses that one can find in the op-ed pages of the national media or depicted in television documentaries. What is needed is valid data, firmly based on rigorous social research.

LAW FII: Proper evaluation design requires specific program goals. Or, if you don't know where you are going, you can't figure out how to get there. Stated this way it sounds obvious, but it is one of the most frequently ignored rules of program design. There are all too many examples of legislation that simply provides funds for programs without specifying what the programs are to accomplish, a sure invitation to the design of frivolous programs.

LAW FIII: Response to dosage is usually curvilinear. Another way of stating this law is that a reduction in the amount of a treatment of some sort does not usually produce a proportional reduction in response. For example, if an eight-page educational pamphlet produces a certain amount of knowledge change, a four-page version does not necessarily produce half that amount (but usually considerably less).

LAW FIV: Pilot programs usually work better than production programs. This law means that it is easy for program designers to produce and run a program that is effective when they run it but not so easy to fashion a program so that YOAA—"Your Ordinary American Agency"—can carry it out. A critical design issue is the need to create a program that, when turned over to an operating agency, will perform as well as when under the control of designers.

The Laws of Summative Evaluation

The main purpose of a summative evaluation is to estimate a program's impact; that is, those effects that are over and above what would have occurred naturally, or net effects. A homely illustration: an effective remedy for the common cold should produce recovery ------- The Twelve Laws for Evaluation Research 27 in patients in time periods appreciably shorter on the average than the typical two weeks that it takes for untreated patients. Summative evaluations, which are quite tricky and difficult to carry out well, require high levels of technical skill. A summative evaluation is usually commissioned by an agency with oversight responsibility; in the case of the federal government, this is Congress, an agency in the executive branch, or the central policymaking unit of an agency.
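To put a number on the "net effects" idea, the common-cold illustration above can be reduced to a few lines of arithmetic. The recovery times below are invented for the example and are not drawn from any study cited here.

    # Assumed average recovery times, in days, for illustration only.
    untreated_recovery_days = 14.0  # roughly what would have occurred naturally
    treated_recovery_days = 9.5     # hypothetical average among patients given the remedy

    # The net effect is the improvement over and above natural recovery.
    net_effect_days = untreated_recovery_days - treated_recovery_days
    print(f"Estimated net effect of the remedy: {net_effect_days:.1f} days shorter recovery")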
Summative evaluators often find themselves regarded as antagonists by program managers. In contrast, formative evaluators typically work closely with program designers and managers.

LAW SI: Impact assessments are not substitutes for the political process. The first law of impact assessment states that policymaking is a political not a technical function. The fact that a program has been found effective or ineffective usually does not dominate decisionmaking about that program, nor should it. There are many reasons for establishing and continuing a program, among which effectiveness may be only one of the major criteria. Correspondingly, there are many examples of programs that have been shown to be ineffective or weakly effective that are nevertheless continued; prime examples are job training programs.

LAW SII: The impact of a program can be assessed only comparatively. This law states that in order to estimate the impact of a program it must be compared to the absence of that program. This is the law that mandates the use of comparison groups. It also implies that randomized controlled experiments are the preferred means to make such comparisons, although they are frequently impractical. Most of the art of impact assessment lies in defining and using the best possible and most practicable comparison groups or situations. The full paper provides a charted inventory of the nine most commonly used approaches to the construction and utilization of comparison groups, ranked in rough order of credibility of the resulting impact assessments. This chart is the most important item of information in the long section on impact assessment and should be given serious consideration.

LAW SIII: Programs that do not have clear and consistent goals cannot be evaluated. This third law of impact assessment is a restatement of the second law of formative evaluations. In other words, if you don't know where you are going, not only can you not figure out how to get there, but if you do get there, you don't know where you are. Designing an impact assessment requires specifying in advance what are to be indicators of success, a process that involves translating program goals into concrete measures of success. A program, such as a community block grant, that has only the vague goal of improving the quality of living in urban areas, simply cannot be assessed. Evaluators can avoid a lot of aggravation by simply refusing to undertake evaluations of such programs.

LAW SIV: The expected outcome of an impact assessment is an estimate of zero impact. This fourth law of impact evaluation often is called the "iron law of evaluation" and is misunderstood as an argument against having any programs. The law is based on the fact that most evaluations find programs to be, at best, only marginally effective in reaching their goals. In part, these findings reflect the fact that designing effective programs is not an easy task. Ours is a society that has moved a long way toward improving the level of living of most members; making additional improvements is usually increasingly more difficult. ------- 28 The Twelve Laws for Evaluation Research For example, it is relatively easy to move a society rapidly from 10 percent literacy to 60 percent literacy, and indeed, there are many examples of nations that have accomplished such changes in the space of one or two generations. In contrast, it is much more difficult to move from 80 percent literacy to 90 percent.
To be illiterate in a society in which most people are literate is quite a different matter from being in that condition when a majority is illiterate. Similarly, most of the decline in mortality experienced in our society was accomplished relatively easily and inexpensively by such public health measures as sanitary sewers and supplying reasonably good drinking water. Today's problems in further reducing mortality require more resources at a higher level of effort with poorer prospects of success.

The somewhat discouraging message of the fourth law reflects the fact that we usually evaluate only those programs whose success is problematic. There are no evaluations of our social security old age pension system or of our public schools, because there are very few doubts that mass education is effective compared to no education at all or that retired persons are better off under the social security benefit system compared to no benefit system at all. In other words, we set about to estimate the effects of those programs whose effectiveness we believe is problematic. It is no surprise that when we do so, our findings are that they are indeed problematic.

LAW SV: There are three main reasons for the failure of programs.
• The problem was not correctly understood and that misunderstanding was built into the structure of the program.
• The program was improperly designed.
• YOAA could not deliver the program with sufficient fidelity and at the correct dosage level.

This law states that the interpretation of an impact assessment is a complicated matter. In the first place, it has to take into account the fit between a program and the existing valid knowledge concerning the problem in question. Clearly, if there is no reasonable correspondence, that is reason enough for the program's failure. The example that comes most easily to mind is the assumption in the design of the housing voucher experiment that families that lived in substandard housing, as defined in the experiment, would agree that they lived in such housing and would welcome change to better housing. In fact, many of the standards used by the program were irrelevant to participants.

The second main reason for ineffective programs is design defects in the programs themselves. For example, a famous California program providing group therapy for prisoners was designed to utilize prison guards as therapists, a design feature that ensured that an atmosphere of trust between therapy group members and therapists would not be attained. This example can be quite misleading, however; most program design defects are not as obvious.

The third main reason for program failure lies in program implementation. It is all too often the case that an agency is given a mission for which it is unsuited. Police departments have been given the mission of counseling in domestic disputes when called ------- The Twelve Laws for Evaluation Research 29 to quell a family quarrel. Schools in inner cities have been given the task of providing for recreation for drop-outs. The military has been asked to release unused facilities to house the homeless. The U.S. Department of Agriculture's Extension Service was asked to set up urban extension programs to teach proper nutrition practices to inner city mothers. An additional failure in implementation can occur because the agency assigned to implement a program is not given enough resources to accomplish that end. The result is a fatally weakened version of the program.
Conclusion

These are only a few central rules for the proper design and conduct of evaluation research. Additional laws might be formulated and could, indeed, make up a fat compendium. What makes these twelve laws important is that they link the technical skills of evaluation research with substantive knowledge about problems and programs. Evaluations, whether formative or summative, are not just technical exercises; they need to be informed by substantive knowledge.

REFERENCE

Rossi, P.H., and R.A. Berk. 1988. A Guide to Evaluation Research Theory and Practice. Paper prepared for the Workshop.

------- PRESENTATIONS -------

Integrating Evaluation into the Development and Design of Risk Communication Programs

June A. Flora

Risk communication researchers and professionals have long acknowledged the importance of evaluation in message development, intervention implementation, and program dissemination. Interventions can suffer from a lack of planned, systematic, and comprehensive evaluation that incorporates preproduction research, intervention and dissemination monitoring, and measuring program effectiveness (Flay, 1987). In addition, evaluation results often are not incorporated into message development, intervention implementation, and program revision. This lack of integration of results into program plans can be due to lack of time to fully incorporate the resulting feedback. In other cases evaluations are initiated late in the intervention planning process (e.g., intervention monitoring or program outcome evaluation). Other problems include evaluations that are limited to superficial objectives (e.g., liking, reading, listening) and exclude more in-depth evaluation (e.g., audience segmentation, needs analysis) and measurement of objectives closer to the desired outcomes (e.g., behavior change).

This paper describes the components of a framework for comprehensive risk communication evaluation and provides suggestions for integrating evaluation results into the intervention planning process. The framework includes evaluation during the design and development phases of an intervention. This preproduction phase includes planning research, concept testing, pretesting of messages, and pilot studies. We call this first phase formative evaluation. The second phase, which roughly corresponds with the second phase of program development, is called process evaluation. Process evaluation includes monitoring message dissemination, implementation quality, and participant utilization and satisfaction. The final, most often discussed phase of evaluation, outcome evaluation, will be reviewed only briefly. The emphasis here will be on the underutilized and often neglected area of research in the preproduction and dissemination phases of a risk communication program. The first two phases of evaluation research will be illustrated with examples from the Stanford Five City Project (FCP). Finally, a set of principles for increasing the utility of results will be presented. 33

------- 34 Integrating Evaluation Into Risk Communication Programs

Framework for Comprehensive Evaluation

Table 1 presents each of the three phases of a comprehensive evaluation plan, which correspond roughly to the stages of intervention development. The first phase of intervention development is called preproduction planning (identification of target audiences, concept development, audience analysis, specification of intervention outcomes) and production (message design, refinement, and final production).
The second stage of program development encompasses implementation and dissemination (intervention delivery). The final stage is intervention refinement and revision (understanding what worked and what failed). The three corresponding stages of evaluation are discussed in more detail in the following sections.

Table 1. Phases of a Comprehensive Evaluation

Evaluation Sequence (Phase of Evaluation):
1. Audience segmentation (Formative)
2. Asset and needs analysis (Formative)
3. Concept testing (Formative)
4. Message pretesting (Formative)
5. Pilot studies (optional) (Formative)
6. Dissemination (Process)
7. Utilization (Process)
8. Implementation effectiveness (Process)
9. Intervention effectiveness (Summative)

Intervention Sequence:
1. Identify audiences
2. Specify objectives
3. Develop concepts
4. Construct messages
5. Refine messages
6. Implement programs
7. Disseminate products/programs
8. Follow-up programs

Formative Evaluation

Formative evaluation is defined as the sum of evaluation activities that occur prior to the final production of a risk communication intervention. Formative evaluation encompasses activities that serve three functions relevant to intervention design: planning research, concept testing, and message pretesting, respectively.

Planning research is one of the most important evaluation activities in this sequence. Planning research activities set the stage for intervention conceptualization. Prior to these planning evaluation activities, risk communication planners have determined their theoretical orientation, broadly identified their target group (e.g., smokers, sexually active ------- Integrating Evaluation Into Risk Communication Programs 35 adults, or overweight men), and specified target outcomes (e.g., changes in morbidity, mortality, behavior). These planning requirements set the stage for planning research.

Planning research can be categorized into three separate sets of activities; however, in reality all may be carried out by one survey or other research activity. The first planning research activity is audience segmentation. The goal of this step is to identify target audiences that differ by variables that are relevant for intervention design. These relevant variables include demographic factors that may be indicative of differences in message format (e.g., easy to read), access to information (e.g., cost of programs, membership, mobility), or message appeal (e.g., messages embedded in cultural context, gender differences, experience with outcome behavior). Other relevant segmentation variables are differences in extent of involvement in the outcome behavior (e.g., the sedentary, low level exercisers, and vigorous exercisers), lifestyle factors (e.g., cognitive, social, and behavioral factors), and information processing (e.g., high information seekers). Whatever the segmentation variables, subgroups must differ by factors important to intervention designers, i.e., channel of communication and message.

The second planning research activity is audience needs analysis. Once audience segments are identified, their needs and assets must be determined. This step is traditionally labeled "needs analysis," but this unfortunate title masks the fact that needs as well as resources (e.g., skills, networks, motivation, regulations) must be identified. Needs analysis combined with the third and final planning activity, channel analysis, set the stage for creative and effective message/intervention design.
Risk communication interventions often require a variety of channels through which messages, products, and services can be delivered to target groups. These channels range from the mass media (e.g., television, radio, and newspapers) to more narrowcast media (e.g., mail, newsletters, small audience radio and newspapers, specialized magazines) to interpersonal communication (e.g., community leaders, social opinion leaders, professionals) (Lefebvre and Flora, 1988). Audiences differ greatly in their use of channels, the communication functions of channels (e.g., information, entertainment, socialization, and surveillance), and the extent to which they select within channels (e.g., reading only news or sports in the newspaper). These audience communication pattern data and practical information on feasibility of channel use, cost, time, and access are necessary for the risk communication intervention designer (Flora, Maibach, and Slater, 1989).

Concept assessment and testing is often a more qualitative followup of audience segmentation and needs analysis. Researchers may observe members of the target audience in natural settings (e.g., ethnographic research), conduct intensive interviews with audience members about issues germane to the targeted outcome, or determine intervention preferences by examining behavior patterns in related areas (e.g., self change, health, social skills). Concepts also can be assessed quantitatively. In the Stanford Five City Project (FCP), we assessed the appeal of a range of intervention possibilities such as media programs, self-help print kits, correspondence courses, groups, and classes. This information was used to set product development priorities for the next year of intervention.

Once target groups are identified and concepts refined, further research is necessary to ensure that the final products will achieve interventionists' objectives. This next, more detailed step requires that samples of the target audience be exposed to rough forms of the ------- 36 Integrating Evaluation Into Risk Communication Programs final product. Message pretesting incorporates all evaluation activities that assess the extent to which risk communication products meet their informational objectives. Message pretesting also can determine the likelihood of success of products through evaluation of acceptability, comprehension, familiarity, memorability, and credibility.

Two methods of data collection are commonly used for message pretesting research: audience response analysis and focus group discussions. It is always useful when these two methods are conducted with the same samples of audience members. Audience response techniques, such as those used by the Health Message Testing Service (Office of Cancer Communications, 1984) at the National Cancer Institute and the National Heart, Lung, and Blood Institute, utilize structured approaches to message presentation, audience response, and analysis. Focus group discussions can vary in their purpose and process (Basch, 1988) but typically incorporate an unstructured discussion of the draft health product. More extensive discussions of formative evaluation are available in publications by Palmer (1980), LaRose (1980), and Atkin and Freimuth (1989).

A final practical issue created by experience is the relationship of the degree of finality of the product to the relevance of the message testing results.
For example, a television PSA is first a script; then a script with description of visual and auditory elements; a storyboard (cartoon-like pictorial sequences that show the message from beginning to end); a draft production (a roughly edited version of the message, often using different actors, convenient locations, and music); a rough cut (a draft form of scenes from the final product, perhaps without final music, transitions, and edits); and finally, a finished product. Message testing conducted early in this message development process is important for determining if the message concept is promising. However, information gathered early in the development will be less accurate in regard to production characteristics that have not yet been finalized (e.g., actors, music, editing, setting). Yet, changes later in the sequence usually cost more. Interventionists and evaluators must often make tradeoffs between the quality of evaluation results and the expense (dollars and time) of incorporating changes. A similar analogy is possible with print media, although desktop publishing has reduced considerably the cost of near-final printed products.

Audience segmentation: An example from the FCP. Smokers identified in the FCP baseline survey were segmented into three motivational groups (highly committed to quit, moderately committed, not at all committed). Comparisons of the three groups showed that those not committed to quit: a) were more likely to be male and less likely to be high school graduates; b) more often had a heavier smoking history—they smoked more cigarettes a day, quit fewer times, and had increased the number of cigarettes smoked over the past two years; c) had poorer health habits and were less interested in changing habits to avoid coronary heart disease (CHD); d) used smoking to cope with life stresses and held attitudes that reflected a poor sense of control over their smoking; and e) perceived fewer pressures to quit.

In general, less committed smokers were early in the change process (e.g., less aware, informed, motivated, and skilled). Increasing knowledge about the effects of smoking, increasing perceptions of the benefits of change, and improving self-efficacy about quitting are important requisites to quitting. More highly motivated quitters fell later in the change process, needing cessation skills rather than motivation. Further, constructing a social ------- Integrating Evaluation Into Risk Communication Programs 37 environment supportive of staying quit (e.g., having nonsmoking friends, family support for quitting, a smoke-free environment) is crucial to the continued success of motivated quitters.

FCP campaign designers initially focused on the more motivated smokers, reasoning that they could be helpful to those who were less motivated. Television programs supplemented with classes and self-help quit kits constituted some of the first efforts in smoking cessation (Sallis, Flora, Fortmann, et al., 1985). Later in the campaign, quit smoking contests that offered incentives (a trip to Hawaii) for quitting and staying quit were implemented on an annual basis (King, Flora, Fortmann, et al., 1987). Efforts to reach poorly motivated smokers included PSAs utilizing fear appeals to persuade smokers about the need to quit, combined with the promotion of telephone numbers to call for more intensive instruction.
A program teaching physicians to prescribe Nicorette gum (to cope with the addiction to nicotine) and counsel patients about staying quit was also an important component of a comprehensive smoking effort. That segmentation analysis combined with needs analysis of the identified segments shaped the sequence and construction of programs for the length of the FCP. Other formative research (i.e., message testing) supplemented the segmentation analysis and further shaped and refined individual program products.

Process Evaluation

Process evaluation is distinguished from formative evaluation by its concern with the processes of dissemination and implementation of the intervention. These processes of implementation can be described by three broad concepts: (1) identification and definition of the "do's" of intervention, (2) determination of the integrity of intervention delivery, and (3) detection and description of direct and indirect (intended and unintended) program effects.

The first concept concerns evaluation from a message sender or program delivery perspective. The goal of this first aspect of process evaluation is to identify program components, their intended outcomes, their intensity, repetition, and potency. The second aspect of process evaluation, integrity of intervention, also is concerned with the extent to which the actual implementation of the intervention meets the expectations of program designers. The third aspect of process evaluation is concerned with the participants' (and non-participants') responses to the program.

The objectives of this level of evaluation are to: (1) investigate the qualitative aspects of the program, (2) determine the amount of intervention, (3) provide explanatory links in cause and effect relationships, (4) determine any unintended or indirect effects of the intervention, and (5) provide supplemental data that may augment the interpretation of outcome evaluations. In addition, program monitoring data are useful for effective administration and management of programs through increased information for intervention goal setting, intervention development, establishment of priorities, and refinement of programs. This is accomplished through provision of feedback on the intervention to program staff. Both intervention integrity and program development are enhanced by process evaluation. Finally, intervention monitoring can yield archives of program efforts that become the database for cost and cost-effectiveness analysis. Thus, program management, planning, ------- 38 Integrating Evaluation Into Risk Communication Programs program description, and cost analyses are important additional outcomes of intervention monitoring.

Process evaluations most often require the capability to: (1) collect similar types of information over time, (2) collect data about participants as well as non-participants, (3) allow for collecting information so that the amount of intervention exposure per recipient can be calculated, and (4) monitor progress of interventions.

The Stanford FCP developed an education monitoring system that accounts for the number, type (e.g., print, television, face-to-face), and objective (awareness, information, skills) of messages sent to a target audience. This time-based system provided feedback to program planners on progress towards implementation goals, amount of effort devoted to promotion and behavior change, number and type of channels of communication used, and estimates of individuals reached by programs (Flora, Goode, and Farquhar, 1985).
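An education monitoring system of the kind just described is essentially a time-stamped log of messages classified by channel and objective. The sketch below is only a guess at the general shape of such a system—it is not a description of the Stanford FCP's actual database, and every field name and figure is hypothetical.

    from collections import defaultdict

    # Hypothetical message log: one record per message or program delivered.
    message_log = [
        {"month": "1986-01", "channel": "television",   "objective": "awareness",   "est_reach": 40_000},
        {"month": "1986-01", "channel": "print",        "objective": "information", "est_reach": 12_000},
        {"month": "1986-02", "channel": "face-to-face", "objective": "skills",      "est_reach": 300},
        {"month": "1986-02", "channel": "television",   "objective": "awareness",   "est_reach": 35_000},
    ]

    # Summarize estimated reach by month and channel to track progress toward implementation goals.
    reach = defaultdict(int)
    for record in message_log:
        reach[(record["month"], record["channel"])] += record["est_reach"]

    for (month, channel), total in sorted(reach.items()):
        print(f"{month} {channel}: estimated reach {total}")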
Supplemented with auxiliary studies of audience response and implementation integrity, FCP staff were able to detect problems in implementation and to make revisions both during the ongoing campaign and in future campaigns.

Summative Evaluation

Summative, or outcome, evaluation describes the impacts (direct, indirect, intended, unintended) of programs. Its main objectives are to determine whether the program goals are achieved and whether there are alternative explanations for results. There are several excellent discussions of summative evaluations of communication campaigns (Cook and Flay, 1981; Cook and Flay, 1989). These references review summative evaluations that simply monitor programs along with more costly causal designs.

Using Evaluation Results

Literature on the history of the use of evaluations reveals a generally pessimistic picture (Windsor, Baranowski, Clark, and Cutter, 1984). Evaluation results are often not incorporated into program planning and implementation. Four general kinds of reasons are given for the lack of utilization of evaluation results (Wholey et al., 1970; Windsor et al., 1984):

(1) Organizational inertia. Organizations are typically slow to incorporate recommendations for change. Organizations are much better at maintaining the status quo than changing. Thus, evaluations suggesting changes in planning and program development are likely to be given less attention than those reinforcing current efforts.

(2) Methodological weakness. Poorly conducted studies are subject to internal as well as external criticism. This criticism weakens the credibility of results and interpretations. Thus, decisionmakers are likely to use their own judgments under conditions of poor research.

(3) Design irrelevance. Often evaluations are conducted without the input of program planners and decisionmakers. This lack of input from those who are the consumers of the research does not facilitate the active use of results.

(4) Lack of active dissemination. Lack of dissemination includes two issues. First, evaluation results are at times simply not disseminated to a wide range of ------- Integrating Evaluation Into Risk Communication Programs 39 individuals within an organization. Perhaps a more common dissemination problem with the types of research discussed here is the lack of tailoring of research reports to the skills and needs of program staff.

These four concerns are most commonly offered in the context of summative evaluation. However, the last three reasons—methodological weakness, design irrelevance, and lack of dissemination—are relevant for formative and process research. Evaluation can be made more useful if a few fairly simple suggestions are followed:

(1) Organizational prioritization. Decisionmakers within organizations planning risk communication programs must be committed to evaluation research throughout the lifespan of a program, including conceptualization, implementation, and refinement. This commitment has to be accompanied by allocation of time to conduct research, resources to carry out evaluation plans, funds for staff, the necessary equipment and other logistical needs, as well as evaluation expertise to guide planning and implementation of the evaluation.

(2) Evaluation planning. This second suggestion is a logical consequence of organizational prioritization. Yet, it is so important that it deserves special attention. Planning for formative, process, and summative evaluation has many components.
These planning components include input from staff concerned with program development; input based on review of relevant research literature; input derived from the theories that guide program devel- opment; and input based on a consideration of evaluation objectives in each phase (e.g., formative evaluation should include considerations of target outcomes to be assessed in summative evaluation). (3) Staff training and education. In order to facilitate input in evaluation planning from a range of program staff (e.g., evaluation and risk communication program planners, media professionals, content experts and evaluators), all staff members should be fluent with the basic tenets of evaluation. Training should be supplemented by staff discussions about planning, data analysis, and interpretation. This group discussion process is invaluable in facilitating the integration of research and program design and implementation. (4) Reporting of evaluation research. Program planners often require simple, direct answers to complex questions. Put differently, they need usable data from evaluations. Thus, while the evaluation design, methods, and analysis may be complex, interpretation and presentation of the data generally need to be straightforward to be useful in the intervention planning context. Summary A comprehensive risk communication evaluation framework is one that includes: 1. Formative evaluation for the design and production of risk communication messages/products/programs; 2. Process evaluation to monitor implementation and dissemination of interven- tions; and 3. Summative evaluations, preferably those that are able to determine causality. ------- 40 Integrating Evaluation Into Risk Communication Programs An often neglected area of concern in evaluation is linking evaluation results to program development, implementation, and revision. Organizational prioritization, adequate evaluation planning, enhanced communication between program and evaluation staff, and evaluation reports tailored to intervention staff needs will increase the likelihood of building risk communication campaigns based on evaluation research data. REFERENCES Cook, T.D. and Reichardt, C.S. (1992). Qualitative and Quantitative Methods in Evaluation Research, Beverly Hills, CA: Sage. Flay, B.R. (1987). Evaluation of the development, dissemination, and effectiveness of mass media health programming. Health Education Research. 123-129. Flay, B. and Cook T.D. (198)1. Evaluation of mass media prevention campaigns. In Rice R.E., Paisley W. (Eds.), Public Communication Campaigns. Beverly Hills: Sage. Flora, J.A., Maccoby, N.M, Farquhar, J.W. (1985). A Prototype Education Monitoring System. A paper presented at the American Public Health Association meeting, Wash- ington, D.C. Flora,J.A.,Maccoby,N.,Farquhar,J.W. (1989). Cardiovascular disease prevention: The Stanford Studies. In Rice, R. and Atkin, C., Public Communication Campaigns. Beverly Hills, CA: Sage. King, A.C., Flora, J.A., Fortmann, S.P. and Taylor, C.B. (1987). Smokers challenge: Immediate and long term findings of a community smoking cessation contest. American Journal of Public Health. 77. 1340-1341. La Rose, R. (1980). Formative evaluation of children's television as mass communication research. In Dervin, B. and Voigt, T. (Eds.), Progress in Communication Science. Vol. II, 275-297. Palmer, E. (1981). Shaping persuasive messages with formative research. In R.E. Rice and W.J. Paisley (Eds.), Public Communication Campaigns. Beverly Hills, CA: Sage. Reicharts, C.S. 
and Cook, T.D. (1979). Beyond qualitative vs. quantitative methods. In Qualitative and Quantitative Methods in Evaluation Research. Beverly Hills, CA: Sage.

Wholey, J., Scanlon, J., Duffy, H., Fukumoto, J., and Vogt, L. (1970). Federal Evaluation Policy: Analyzing the Effects of Public Programs. Washington, D.C.: Urban Institute.

Windsor, R.A., Baranowski, T., Clark, N., and Cutter, G. (1984). Evaluation of Health Promotion and Education Programs. Palo Alto, CA: Mayfield.

-------

Marketing Research and Risk Communication
Corporate and Public Sector Roles

William D. Novelli

Elements of marketing research, for purposes of planning, tracking, and evaluation, have found their way into many of today's health communication programs. In one sense, this is to be expected, since marketing research and evaluation research stem from common social science antecedents. In addition, some of the research methods in health communication, such as the use of focus groups for qualitative research, are borrowed directly from marketing.

The evaluation research described by Rossi and Berk (1988), however, is more precise, more comprehensive, and usually more expensive than most marketing studies. Nonetheless, their concept of "the best possible strategy" is certainly common to marketing, since questions of cost, timeliness, political feasibility, and other pragmatic considerations are essential to decisionmaking. Marketing research, at least in the commercial world, is supposed to have an impact on the bottom line.

There was a time when corporations could rely almost solely on technical expertise in research and development or production to carve out successful businesses. Most of the great corporations became great because they excelled in some technology: DuPont in polymer chemistry, PPG in glass, General Electric in electrical products, and IBM in computers. But today, superior technology is virtually ubiquitous. It is no longer so much a competitive edge as it is the price of being able to compete at all. If one company moves out ahead in a technological area, its technologically adept competitors are apt to copy it quickly.

The competitive edge, therefore, frequently lies in marketing. The key to success in business today is often the ability to direct technical efforts into producing and delivering products and services for which certain targeted markets will pay and with which they will be satisfied. That, of course, is what marketing is all about. And that is why marketing research is so essential to playing the game as well as to keeping score.

While technology is changing, so is marketing research. For example, split cable technology now enables marketers to compare the perceptions of households receiving one set of television commercials with next-door neighbors who are viewing a different set of TV messages. This split cable research technology is being combined with universal product code scanners, which read and register items and their prices as they are checked out of supermarkets. Using them together, marketers can measure the purchase patterns of household members who are viewing the TV commercials being tested. Also, pricing strategies, coupon use, in-store promotions, and other techniques can be assessed, all combined with demographic, purchase, and media patterns.

What does all this have to do with risk communication? Marketing research is relevant to the evaluation of risk communication because their essential purposes are the same.
Both aim to plan well integrated programs, to assess progress in interventions, to measure intended and unintended effects, to track and engage in surveillance for making program adjustments, and to respond to market place change. Also, both marketing managers and health communicators must develop long-term and annual plans and must measure performance regularly to assess effectiveness in reaching objectives and effi- ciency regarding budget expenditures. The necessary ingredient for all this is relevant, quality information, regularly available in a form that is useful for management decisionmaking. Not all data gathering is expensive, although some information can be quite costly to collect. Robert Waterman, a co-author of In Search of Excellence, was asked how the companies he studies anticipate marketplace change. His answer was deceptively simple: They have close ties with their customers. Waterman explained that successful companies have a wealth of "listening devices" to keep tabs on consumers and suppliers, as well as on the competition. This concept of keeping tabs is directly applicable to health communications research. Numerous listening devices can be put in place to track at-risk consumers and other targets. Some of these devices can be quite inexpensive, and no one or two would be sufficient; but collectively they can be an affordable, effective means for tracking change. Years ago, in the Stanford Program's original community studies, investigators employed what they called informal "snoops" to assess what was going on in their test communities. Research on Social Issues While marketing and its research techniques have much to offer health commu- nicators, corporate America appears to have a particular blind spot. This blind spot presents a need and an opportunity for public health professionals to lead the way. As expert as they may be in studying the marketplace, corporate marketers seem to be content to do little or no research on social issues as long as their current marketing strategies appear to be working well. As a result, the corporate marketing research system all too often is insensitive to unfolding social needs. Two New York University marketing professors, Larry Rosenberg and Robert Shoemaker, expressed some thoughtful views on this problem some years ago in the Sloan Management Review (Rosenberg and Shoemaker, 1980). There appear to be a variety of reasons for myopia when it comes to emerging social issues. One is the widespread corporate practice, especially at the product manager level, to rotate managers every two to three years. Under these conditions, managers feel compelled to fund research that will be completed quickly, at the lowest possible cost, with ------- Marketing Research and Risk Evaluation 43 direct impact on short-term sales. Their brief tenure reduces any incentive to be innovative in conducting marketing research. A second reason is that, in most cases, managers are judged by annual sales, profit, and return on investment. Actually, in today's marketing world, "annual" means "long- term," and quarterly assessments are typical. Senior level managers tend not to reward social issue concerns, since they may not be related to immediate profit and loss. Third, higher-level executives also may be loath to study social issues if the findings may adversely affect their companies. 
For instance, Rosenberg and Shoemaker cite the case of senior managers of a major cosmetics manufacturer who did not research consumer attitudes on ingredient labeling because they anticipated some level of demand for improved labeling. Fourth, large companies usually market a broad product line. Problems involving potentially deceptive advertising, harmful ingredients, or inadequate safety warnings may involve many products, in several divisions. The average manager may ignore the issues, seeing them as larger, company- wide problems beyond his or her control. Fifth, the cost of research is an important consideration in corporate marketing, and this, too, contributes to a lack of surveillance of social needs. For example, convenience samples may be used, which cut costs and speed results, but which may not uncover social discontent. Finally, high- and low-socioeconomic segments often are inadequately represented in corporate research. This means that the early warnings of a social issue may be underestimated or missed completely. On one hand, studies show that consumers who complain about company practices and products tend to be atypical and often in higher socioeconomic strata. These types are often underrepresented in marketing research samples. On the other hand, disadvantaged, inner city people also are usually under- represented in marketing studies. Much marketing research is done in suburban locations, often by interviewing people in malls or through mail surveys with sizeable non-response levels. Such methods can limit detection of social issues. For all these reasons, studies of social issues are given low priority and often are the first to be dropped when research funds are tight. Thus, an awareness of health and safety issues usually comes to corporate managers from sources other than their own—consumer advocates, the media, social researchers, and government officials. Yet social issues, such as health and safety risks, may be of importance to the company. Awareness of such risks among corporate executives is a necessary first step in their understanding of what they must do to be good corporate citizens as well as to protect their business from social and political pressures. To summarize, marketing research has a great deal to offer public health practitio- ners in planning, development, implementation, and assessment of programs. In turn, public health professionals can use similar research techniques to sensitize, inform, and educate American industry. This can help begin and accelerate the process of corporate change in areas related to health and safety, before problems reach high-risk proportions. ------- 44 Marketing Research and Risk Evaluation REFERENCES Rosenberg, L.J., and R.W. Shoemaker. 1980. SMR Forum: Is Marketing Research Sensitive to Social Issues? Sloan Management Review. Winter. Rossi, P.H., and R. A. Berk. 1988. A Guide to Evaluation Research Theory and Practice. Paper prepared for the Workshop. ------- Evaluating Risk Communication Programs1 A Catalogue of "Quick and Easy" Feedback Methods Mark Kline, Caron Chess, and Peter M. Sandman Agencies that deal with environmental health issues are paying greater attention to how they can communicate with the public more effectively. There is also an increasing body of literature directed to agency practitioners, suggesting how risk communication principles might be translated meaningfully into reality. As these principles are integrated into practice, agencies should also be evaluating their efforts. 
Communication efforts, like technical ones, can improve with feedback. The lack of such feedback may lead the agency to repeat the same communication mistakes and fail to duplicate successes. Unfortunately, it may be difficult for agencies to identify evaluation strategies that are practical, useful, and affordable. The term "evaluation" has multiple meanings, including making critical judgments about the worth of a program. Therefore, evaluation activities may seem threatening to agencies already immersed in "crisis" communication efforts, usually with limited resources. In addition, some forms of evaluation may seem too elaborate and difficult to implement in this context.

The goal of this catalogue, which was funded by a contract from the Division of Science and Research of the New Jersey Department of Environmental Protection, is to identify and recommend specific evaluation methodologies with the greatest potential for agency use in small-scale communication efforts where a full-scale evaluation may not be feasible. These tools are also likely to have application in risk communication efforts by industry and advocacy groups.

1 Submitted to the Division of Science and Research, New Jersey Department of Environmental Protection, September 22, 1989, by the Environmental Communication Research Program, New Jersey Agricultural Experiment Station, Cook College, Rutgers University, 122 Ryders Lane, New Brunswick, New Jersey 08903; this paper summarizes the full report.

Strengths and Limitations of Quick and Easy Evaluation

In its most general sense, the term "evaluation" refers to a process of interpreting and judging events, a process that human beings engage in much of the time. Evaluation ranges along a continuum, from informal, subjective impressions at one end, to formal, scientifically conducted and controlled evaluation research at the other (Rossi and Berk, 1988). In the middle of this continuum are assessment and feedback methods that are more structured and systematic than subjective impressions, but less rigorous than evaluation research. Because these intermediate methods require much less time, resources, and expertise than evaluation research, we call them "quick and easy" methods.

In our view, when most people think of evaluation they tend to think of approaches that give an overall assessment of a program's worth. Such approaches, including "summative evaluation" (Rossi and Berk, 1988) and "impact evaluation," lie at one end of the previously mentioned continuum. Many programs go without any evaluation whatsoever because impact evaluation is seen as the only form of evaluation and these efforts are beyond agency capabilities and resources. Practitioners may be left with only their own impressions of how they fared in a communication effort, with no basis beyond intuition and guesswork for correcting communication errors and repeating communication successes.

Evaluation experts have generally accepted this state of affairs because of their conviction that data from poorly designed evaluation research studies can be misleading. Rossi (1988) has noted that a bad evaluation can be worse than not doing one at all. Proponents of rigor have seen less rigorous research badly abused, leading them to conclude that agencies are better off knowing nothing than obtaining questionable feedback. We believe that partial feedback can be better than none at all if the strengths and limitations of this feedback are fully understood.
Agencies should not, for example, rely on feedback from "quick and easy" approaches for impact evaluation. Drawing reliable causal inferences about the effects of a communication effort requires scientific evaluation research. This catalogue focuses on approaches that we feel are useful when practitioners face limitations on time, expertise, and other resources. These approaches can be practical for less resource- intensive communication efforts, where impact evaluation is not appropriate or possible. In lieu of formal impact evaluation, agencies can rely on feedback from quick and easy approaches to guide the development of their risk communication programs. This is called "process evaluation," and it examines the ongoing processes and procedures of a risk communication effort. "Formative evaluation" techniques, which assess the strengths and weaknesses of materials before full implementation of a program, can also be adapted to suit less resource-intensive communication efforts. Some techniques used in "outcome evaluation," which explores the reactions of audiences after a phase of a communication effort, can also be adapted for quick and easy use. Since the use of "quick and easy" methods generates feedback which is more systematic and disciplined than that found in typical practice, the use of these methods creates programs that may be ultimately more amenable to rigorous impact evaluation, should resources become available. ------- Evaluating Risk Communication Programs 47 If "quick and easy" approaches are viewed as a means of obtaining a snapshot— rather than a full picture—they can provide useful input to agency risk communication efforts. Practitioners can use quick and easy strategies to gather some information that will inform their practice in the absence of a full study. In particular, quick and easy strategies can yield information that can lead to mid-course corrections and bring new ideas into the process. This feedback can be even more critical to agency efforts than retrospective analyses. (It may be ultimately more useful for practitioners to know they are about to light communications fires than to evaluate their firefighting efforts.) Information gathering of this type is common in the public relations field, where it is viewed as "developmental" input for generating hypotheses rather than as conclusive data that are reliable and generalizable. Feedback can be viewed as an opportunity to turn bad news into good. Agencies can use feedback suggesting that a program is off-course to put the program back on track. Even scathingly negative remarks can be fodder for making a program more effective. When viewing feedback as information to succeed rather than as justification, superficial praise about a meeting or brochure may be less useful than critical remarks that include suggestions for change. The latter provide the agency an opportunity for improving its materials and the added benefit of being responsive to the public. Agencies should not abandon rigor entirely when gathering information. Quick and easy methods can be more valuable if agencies attempt to be as rigorous as possible within the constraints of their resources. For example, keep in mind basic principles of objective data gathering, carefully defining target groups, choosing representatives typical of the target groups, and asking questions in a consistent and unbiased manner. More rigorous methods increase the strength of conclusions that can be drawn from feedback. 
Awareness of the need for rigor can also allow agencies to refrain from drawing sweeping and misleading conclusions from developmental feedback.

Barriers to the Use of Quick and Easy Evaluation

We believe these strategies can help communicators develop and maintain an open channel to those outside the agency. However, even the best feedback is of little value if it is not heeded. Audiences may already be skeptical about whether agencies will use their input and respond to their needs. If practitioners gather evaluative feedback, they must be open to using it. Furthermore, they should be prepared to assess how the feedback was used—what role it played in the decision that was ultimately made—and also to demonstrate any positive effects to the public. Agencies, in short, should be accountable not only for getting input from the public, but also for using it and showing that they used it. If audiences sense that their time and effort have gone to waste, they may be even more disenchanted with agencies than they would have been if no feedback had been solicited.

Agencies that operate as closed systems may have little organizational investment in this kind of feedback. In such an agency, decisions are made on the basis of an internal process. Staff are accountable to their supervisors, who are in turn accountable to higher-ups. Communication efforts may be designed to take into account this internal input and keep things running smoothly. Staff who attempt to bring in new ideas based on public input may not be supported. Agencies of this kind may attempt to lend an occasional ear, pass out an occasional survey, and make an occasional telephone call in an effort to solicit public input, but the system's incentives make it unlikely that such input will be used constructively. Even the best evaluation tool can be subverted by this sort of agency process.

For quick and easy tools to function well in maintaining an open channel, they must be supported by agency management and policy. Without this support, front-line practitioners may gather information only to have it ultimately ignored, leaving them with an even more irritated public than in the first place. Part of quick and easy evaluation involves agency management encouraging staff to be creative in opening the channel with the public—even when what emerges from the channel is critical of the agency staff members conducting the communication program. Agencies, therefore, must be prepared to turn bad news into good. Critical feedback provides an opportunity to improve a communication effort and a chance to be responsive. Agencies that are not willing to make mid-course corrections in response to feedback from the public will have little use for these tools.

Agencies may be tempted to use quick and easy strategies to justify what they did rather than to find out what they can do differently. Aside from being a tedious exercise, using these tools in this way defeats their very purpose—to introduce new ideas and feedback through an open channel. Risk communication and quick and easy evaluation are both value-laden processes. The values and climate of an agency can have great impact on whether these tools help open the door to the public or help keep it shut. We have attempted to identify tools that support commonly accepted risk communication principles, hopeful that agencies will use them in the spirit of an open, ongoing dialogue with the public.
Development of This Catalogue This investigation took the form of a scavenger hunt. Through telephone and personal interviews, literature reviews, networking, and a computer database literature search, we attempted to identify feedback approaches that we could recommend for agency practice. We looked for techniques that: • Are easy to use • Can be implemented inexpensively • Yield results quickly • Are relatively non-threatening to both the audience and the agency • Give feedback which translates to behavioral change • Reinforce commonly accepted risk communication principles Our search was intensive but by no means exhaustive. We talked to a large group of people, including risk communication practitioners, those with evaluation experience, consultants, public relations specialists, industry practitioners, and academics. We looked into their suggestions and reviewed literature they recommended in addition to literature we were unearthing. From this rich mix of sources, we identified the evaluation methods and instruments reviewed in this catalogue. We recognize that we may have missed some instruments, though our networking efforts did yield confirmation of many of the tools we describe from a variety of different ------- Evaluating Risk Communication Programs 49 sources. This catalogue is not intended to be the final word on quick and easy evaluation strategies. We encourage agencies to continue to look for and develop tools for this kind of feedback. How to Use This Catalogue Our review of quick and easy evaluation methods is not in the form of a quick and easy evaluation manual. After agencies have some experience with the instruments we recommend, development of a step-by-step guide may well be appropriate. We assume this catalogue will be of most interest to those who have a fair amount of commitment to and expertise in risk communication. We hope they will use the catalogue as a resource for assisting policy-makers and technical staff with evaluation. Nonetheless, we recognize that most agency staff may not have the time to read a full review of each tool before deciding which one will be useful to their risk communication efforts. The following summaries of twenty-two tools give a brief overview of each. Readers can use these summaries to decide which tools might prove useful to their communication effort. However, readers will want to review the detailed reports about instruments that interest them in order to get more in-depth information. (See the full report, as listed on page 45.) These reports include a) detailed descriptions, including examples of how the instruments have been used; b) discussion of strengths and limitations; and c) how to order the instruments. ------- 50 Evaluating Risk Communication Programs OVERVIEW OF EVALUATION METHODS I. Planning The key to effective risk communication is effective planning. Just as scientific research without planning can slow down an assessment due to the need to rethink and resample, it is ultimately more wasteful and time consuming to develop a brochure or presentation without planning. It is quite difficult, if not impossible, to evaluate a risk communication effort unless you have planned a program so that you know what you want to achieve and how you are going to achieve it. Because planning is so critical we have developed a separate document on planning entitled, "Improving Dialogue with Communities: A Risk Communication Workbook" (Hance et al., 1988). 
This workbook, available in 1989 from NJDEP's Division of Science and Research or the Rutgers Environmental Communication Research Program, includes checklists and worksheets to help those with little communication background to identify communication goals, audiences, audience concerns, methods of reaching people, key content points, and other components of successful planning. Our research for this evaluation catalogue did locate some comprehensive planning systems (Green, 1980; National Cancer Institute, 1989) that could have application in risk communication efforts, but they are not "quick and easy" tools appropriate for this catalogue. Other planning tools we located needed significant modification to be useful in agency settings.

2. Audience Analysis

One of the keys to successful communication is understanding your audiences in advance. Agencies need to identify the audiences involved in their communication efforts and get a sense of what groups already know, what they need and want to know, and what they expect from the agency. Audience analysis tools provide a means for practitioners to clarify their perceptions of audiences in organized ways or to solicit feedback from key audiences before, during, and after a communication program. Such feedback can help practitioners maintain an open channel between the audience and the agency throughout the communication effort. These strategies are common in public relations and advertising practice, where ongoing feedback from an audience is important to respond to changes rapidly.

2A. Conceptual/Organizing Techniques

These techniques do not involve any data collection from audiences. Rather, they are frameworks to help communicators systematically organize and analyze their impressions about different types of audiences.

2A-1. Policy Profiling Questionnaire

Purpose: To identify stakeholders in an issue and organize agency perceptions of them.
Lead Time: Low
Staff Time: Brief—might include a meeting of involved staff.
Budget: Low

This tool helps agencies assess their perception of the potential impact that important actors can have on a decision or course of action. Agency staff identify stakeholders and numerically rate each of them in three categories: issue position, power, and salience. These ratings allow a calculation to determine whether the stakeholder might oppose, support, or be neutral toward a decision. This tool guides the agency's internal assessment of relevant stakeholders and involves no formal data collection. It is a means for organizing and comparing perceptions of stakeholders to anticipate reactions to a decision or issue. However, the ratings are based solely on the perceptions of agency staff and are only as valuable as those perceptions. (A brief illustrative tally appears at the end of this subsection.)

2A-2. Audience Analysis Matrices

Purpose: To identify relevant audiences and organize agency perceptions of their reactions, involvement, or position in a communication effort.
Lead Time: Low
Staff Time: Brief
Budget: Low

Matrices are developed which identify relevant audiences and cross-reference the audience with another important variable—such as issue position, anticipated reactions, or issue importance. These matrices allow a graphic representation of groups in a communication effort while also encouraging greater awareness of the specific audiences and their qualities. These matrices are based only on the perceptions of agency staff—they involve no data collection. The instrument may be limited by the degree of knowledge, intuition, and sensitivity present within the agency.
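To make the internal tally described in item 2A-1 concrete, the short sketch below scores a few hypothetical stakeholders on issue position, power, and salience. The catalogue does not give the instrument's actual scoring rule, so the combination used here (position weighted by the average of power and salience), the stakeholder names, and the threshold are illustrative assumptions only, not a description of the Policy Profiling Questionnaire itself.

    # Illustrative only: a simple stakeholder tally in the spirit of item 2A-1.
    # The scoring rule is an assumption for demonstration, not the actual
    # Policy Profiling Questionnaire formula.

    def classify(position, power, salience, threshold=2.0):
        """Turn staff ratings into a rough oppose/neutral/support call.

        position: -5 (strongly opposed) to +5 (strongly supportive)
        power, salience: 1 (low) to 5 (high)
        """
        score = position * (power + salience) / 2.0
        if score > threshold:
            return "support"
        if score < -threshold:
            return "oppose"
        return "neutral"

    # Hypothetical ratings assigned by agency staff during a planning meeting.
    stakeholders = {
        "Local homeowners' association": (-3, 4, 5),
        "County health officer": (4, 3, 4),
        "Plant management": (2, 5, 2),
    }

    for name, (position, power, salience) in stakeholders.items():
        print(name, "- likely to", classify(position, power, salience))

The same kind of table, with stakeholders as rows and a second variable such as anticipated reaction as the columns, is essentially what the audience analysis matrices of item 2A-2 record on paper.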
2B. Preliminary Audience Feedback

These techniques involve collecting information about an audience in advance of communicating, to help anticipate the audience's needs and interests.

2B-1. Audience Information Needs Assessment

Purpose: To gather questions from relevant audiences in advance of public meetings so a response can be organized and presented.
Lead Time: Moderate to high—requires a number of weeks to mail out the inquiry, receive responses, and organize the information. Lead time may be decreased if telephone contacts are used instead of a mailed inquiry.
Staff Time: Moderate
Budget: Low to moderate

Questions from an audience are gathered in advance of a public meeting so agency staff can develop a meaningful response. The agency response may involve both written and verbal answers to the questions. This approach, which helps agencies meet community needs, establishes a precedent of listening to the audience and responding to its concerns. However, it may require too much lead time for a crisis situation, and the answers generated in advance may still meet with disagreement and dissatisfaction from the audience.

2B-2. Analysis of News Clippings

Purpose: To identify audiences and their concerns. To develop some historical knowledge of a community to help in planning future phases of a communication effort.
Lead Time: Variable, depending on how far back in time the analysis goes.
Staff Time: Variable, depending on the extensiveness of the review.
Budget: Low

Background information about ongoing issues is obtained by locating appropriate newspapers and clipping articles relevant to the issue in question. The clippings can be analyzed for a variety of factors, including perceptions of prior agency behavior, public concerns, principal actors, key events, and community mood. While a useful source of input and background information, news clippings may reflect media biases, journalistic sensationalizing, and the inaccuracies of the rush of daily reporting.

2B-3. Public Opinion Polling

Purpose: To assess audience opinion or reaction; to find out what people see as important problems, what issues and events they are aware of, and how they evaluate social and political institutions.
Lead Time: Moderate, depending on how formal a poll is required.
Staff Time: Moderate
Budget: Moderate to high—may involve contracting with a polling firm to obtain useful results. A low estimate for a very brief formal poll with a relatively small sample is about $2000. Informal telephone surveys may require fewer resources.

Polling can give agencies a sense of public attitudes and perceptions so the agency can better target its communications. Carefully constructed polls can help prevent surprises and provide a baseline for the later evaluation of the communication effort. Agencies may hire firms to design and conduct polls on specific issues. These polls benefit from careful development of the polling questionnaire and random sampling to increase the reliability of the data. They may also be quite expensive. Informal telephone surveys involve briefer questionnaires and smaller samples. Informal surveys may be more practical and less expensive, but also less reliable. Polls and surveys tend to consist of
Polls and surveys tend to consist of ------- Evaluating Risk Communication Programs 53 closed-ended questions that limit the richness of the data and can fail to convey the complexity of public perception. 2B-4. Public Opinion Polling/Pollstart Purpose: To organize and analyze polling data on personal computers available within agencies. Lead Time: Moderate to high, depending on extensiveness of the poll, expertise in polling design available, and knowledge of personal computers. Staff Time: Moderate—depends on previous expertise and skills. Budget: Moderate. Pollstart software costs $98.00; Public OpinionPolling.abook that guides useof the software, costs $19.95. Pollstart is apiece of computer software which allows agency staff to tabulate and analyze polling data on a typical office personal computer. The manual for Pollstart provides step-by-step guidance on how to encode the data within computer files and how to generate "frequency reports" and "cross-tabulations." Public Opinion Polling provides useful background on polling and a useful outline of the steps in planning and developing a poll. The book was written as a companion volume for the software. While this system provides an excellent review of polling issues, it does not make the reader a survey design expert, and less experienced readers may still have difficulty designing appropriate surveys. The software is also not capable of doing more complex data analysis. 25-5. Qualitative Questionnaires Purpose: To collect information from people whom agencies have involved in a communication effort. Lead Time: Low to high, depending on the complexity of the questionnaire and the time needed to develop it. May also require at least two weeks to receive responses to mailed questionnaires. Staff Time: Low to moderate—depends complexity of feedback to be tallied. Budget: Low to moderate Questionnaires are developed, usually in-house, to assess audience positions on issues or responses to agency process. Because they may involve a small sample, the feedback may not be statistically accurate or generalizable. These questionnaires can still provide early input about specific directions an agency might take, or reasonably rapid assessment of audience reactions. Questionnaire development, distribution, and tallying can take considerable effort. 3. Message Pretesting Agencies can obtain useful feedback on written materials by having them reviewed (pretested) in advance of production and distribution. This input can significantly ------- 54 Evaluating Risk Communication Programs improve materials so they are more easily understood and communicate the intended message more effectively. Message pretesting may involve surveys and questionnaires, discussion groups, and/or reviews of the language used in a document. Agencies can assess whether the document is too complicated for the intended audience, the amount of jargon, and other aspects of the writing style. We found the work of the National Cancer Institute (1984,1989) to be of great value in exploring and assessing these techniques. 3 A. Brief Approaches These techniques give feedback in a short amount of time. 3A-L Rightwriter Purpose: To review documents written on computer word- processing programs for errors in grammar, style, usage, and punctuation. Lead Time: Low Staff Time: Low Budget: Rightwriter software currently costs $95.00. 
3. Message Pretesting

Agencies can obtain useful feedback on written materials by having them reviewed (pretested) in advance of production and distribution. This input can significantly improve materials so they are more easily understood and communicate the intended message more effectively. Message pretesting may involve surveys and questionnaires, discussion groups, and/or reviews of the language used in a document. Agencies can assess whether the document is too complicated for the intended audience, the amount of jargon, and other aspects of the writing style. We found the work of the National Cancer Institute (1984, 1989) to be of great value in exploring and assessing these techniques.

3A. Brief Approaches

These techniques give feedback in a short amount of time.

3A-1. Rightwriter

Purpose: To review documents written on computer word-processing programs for errors in grammar, style, usage, and punctuation.
Lead Time: Low
Staff Time: Low
Budget: Rightwriter software currently costs $95.00.

Rightwriter reviews documents on computer and creates a "mark-up" copy, including feedback on grammar, style, usage, and punctuation in the text, as well as a summary of the analysis. This summary includes a readability quotient, a strength index, a descriptive index, a jargon index, and a sentence structure analysis. The summary also includes a list of words which readers might find difficult to understand. The program is easy to use and quite rapid. While it can provide a useful feedback mechanism for written materials, Rightwriter does not "understand" the content of the text and can give no feedback about tone or appropriateness. In addition, some Rightwriter feedback may be confusing, difficult to understand, or irrelevant.

3A-2. SMOG Readability Grading Formula

Purpose: To evaluate the level of reading comprehension a person must have to be able to understand a piece of written material.
Lead Time: Low
Staff Time: Low
Budget: Low

This approach involves reviewing a sample of text from a written piece and performing some simple mathematical calculations to obtain a SMOG grade, which represents the reading grade level a person must have reached in order to understand the text. The higher the grade level, the more sophistication is necessary to understand the material. Assessment of readability, along with a knowledge of the target audience's level of sophistication, can allow agency staff to produce materials that will be more accessible to their audiences. Readability quotients are useful as a "first cut" in reviewing drafts of materials for the public, but they give no feedback on style, format, tone, or content. In addition, frequent use of long terms that may be necessary in scientific reports may inflate the SMOG grade. (An illustrative calculation appears at the end of this subsection.)

3A-3. Signaled Stopping Technique

Purpose: To examine how readers process information as they read written materials and through this procedure to get feedback on those materials.
Lead Time: Low
Staff Time: Low
Budget: Low

In this approach, respondents read through a document and put slash marks where they stop. They are then provided with a coding scheme to notate why they stopped at each slash. These reasons for stopping provide feedback to the writer. Respondents may stop due to being confused, needing to re-read, having a question, wanting to think about the idea, or agreeing or disagreeing with the writer. This technique can help writers recognize confusing or controversial statements within a piece of text and consider revisions, but its value may be diminished if the reader is unmotivated or uninterested.
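As a concrete illustration of the arithmetic behind item 3A-2, the sketch below applies the commonly published SMOG procedure: count the words of three or more syllables in a sample of roughly thirty sentences, take the square root of that count, and add three. The syllable counter is a crude vowel-group heuristic, so the whole function should be read as an approximation for screening drafts, not as a validated readability instrument.

    # Rough sketch of the SMOG calculation described in item 3A-2.
    # The syllable count below is a simple heuristic, good enough to show the
    # idea but not a substitute for the published counting procedure.
    import math
    import re

    def count_syllables(word):
        """Approximate syllables as runs of vowels (a crude heuristic)."""
        return max(1, len(re.findall(r"[aeiouy]+", word.lower())))

    def smog_grade(text):
        """Estimate the SMOG grade for a sample of about thirty sentences.

        Formula assumed here: 3 plus the square root of the number of words
        with three or more syllables in the thirty-sentence sample.
        """
        words = re.findall(r"[A-Za-z]+", text)
        polysyllables = sum(1 for w in words if count_syllables(w) >= 3)
        return 3 + math.sqrt(polysyllables)

    sample = ("Residents should ventilate basements regularly. "
              "Elevated radon concentrations warrant professional mitigation. ") * 15
    print("Estimated SMOG grade: %.1f" % smog_grade(sample))

A draft that scores several grade levels above the intended audience's reading level is a candidate for shorter words and sentences before any further pretesting.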
3B. More Extensive Feedback Methods

These methods give richer feedback but also take more time to administer.

3B-1. Self-administered Pretest Questionnaires

Purpose: To get feedback on pretest materials.
Lead Time: Moderate—allow at least two weeks if the questionnaire is mailed.
Staff Time: Moderate
Budget: Low to moderate

Questionnaires about written material are developed to elicit both quantitative and qualitative feedback from readers representative of the intended audience. The questionnaire may include questions about format, comprehension, reaction, interest in the materials, and any other relevant opinions. Questionnaires may include open-ended or closed-ended questions, depending on the items being pretested and the type of feedback desired. The approach may be limited by low response rates to mailed questionnaires and the amount of follow-up time needed to ensure a meaningful response.

3B-2. Central Location Intercept Interviews

Purpose: To get feedback on pretest materials or to examine an audience's attitudes and opinions.
Lead Time: Moderate
Staff Time: Moderate to high
Budget: Low to moderate

Interviewers are stationed at a place frequented by a target audience. They recruit participants who review materials and then respond to a series of multiple-choice or closed-ended questions. The structured interviews provide feedback that can be summarized quantitatively. Careful planning when using this approach can increase the reliability and generalizability of the data, but central location interviews typically reflect a non-random sample weighted in favor of those who are able to get to the particular site. In addition, the necessity of using closed-ended questions may deprive the agency of richer feedback from a more extended discussion.

3B-3. Theater Testing

Purpose: To get feedback on visually presented pretest materials.
Lead Time: Moderate
Staff Time: Moderate
Budget: Moderate to high

Films, public-service announcements, slide shows, or other audio-visual materials are observed by a group of respondents in a theater or auditorium. After watching the film, participants fill out a pretest questionnaire to provide the agency with feedback. While very useful to improve visually presented messages, this approach may require a great deal of time and logistical arrangements, in addition to design of the message itself and the questionnaire.

3B-4. Focus Groups

Purpose: To get feedback on and generate ideas about pretest items. To get a "feel" for the attitudes and beliefs of a target audience.
Lead Time: Moderate to high
Staff Time: Moderate
Budget: Moderate to high

A focus group is a discussion session run by a trained moderator. It may include six to twelve participants, who discuss pretest materials or issues of importance to a communication effort. Areas covered in a focus group discussion are outlined in the moderator's guide, which is developed before the session. Focus group discussions generally yield qualitative feedback as summarized in a report by the moderator. These reports can give an in-depth sense of participants' language, their reactions to the materials, and suggestions for improvement. Formal focus groups require careful planning and moderation and may therefore be too resource-intensive for the average agency. "Target audience meetings," involving brief informal discussions with a neutral moderator, a group typical of the target audience, an agenda planned in advance, and some procedure for note-taking, can be useful and less expensive.

4. Assessment of Communicator Style

Although agency staff may traditionally focus on "facts" as opposed to relationships, conflict in styles can lead to tremendous frustration as well as impasses in a given communication. Armed with the facts alone, practitioners may be doomed to skirmish with audiences whose very style of perceiving the world and communicating about it differs from theirs. Tools in this category can help communicators examine what they bring to the communication process. Most of these tools are self-assessment surveys that are completed and then scored, providing a profile of the respondent's style, type, and/or motivational pattern.
This profile provides a model for understanding communication situations, which in turn can help practitioners gain flexibility within their own style, recognize their strengths and limitations, identify the communication styles of people in their audiences, and recognize and deal with communication impasses resulting from a clash in styles. 4-1. Myers-Briggs Type Indicator Purpose: To provide feedback on the communication styles of agency staff. Lead Time: Moderate to lengthy, due to time needed to secure services of consultant. Staff Time: Low Budget: Moderate The Myers-Briggs Type Indicator (MBTI) is a self-report inventory consisting of 126 questions. It provides feedback on respondents' communication styles in terms of four scales: Extraversion-Introversion, Sensing-Intuition, Thinking-Feeling, and Judging- Perceiving. The profiles generated in terms of these four scales include feedback about communication strengths and weaknesses. Communicators can become aware of their own strengths and weaknesses while learning to recognize differing communication styles in their audiences. The MBTI model has been used in consultation with risk communi- cators and has helped foster flexibility in communication style. However, the psychological theory of type underlying the tool may not fully capture the diversity of personality styles, and the feedback from this tool is of limited value without a consultation to set it in context. 4-2. Strength Deployment Inventory Purpose: To identify the strengths of agency staff and suggest ways these strengths can be used to communicate more productively with others. Lead Time: Moderate to lengthy, due to time needed to secure services of contractor. Staff Time: Low Budget: Moderate. Each Inventory form costs $3.45; con- sultation is additional. The Strength Deployment Inventory (SDI) consists of twenty questions, some of which refer to situations where things are going well, and some of which refer to situations where things are going wrong. The SDI is self-scoring, and respondents identify whether ------- 58 Evaluating Risk Communication Programs they are characterized by any of seven style patterns, each of which implies different strengths, weaknesses, and motivations which may be reflected in interpersonal com- munication. The inventory is easy to complete and provides quick feedback about an individual's style. The SDI model is one way of understanding differences in personal styles and their impact on communication. A consultation should accompany the tool for maximum benefit. 4-3. Conflict Management Survey Purpose: To provide feedback about a respondent's approach to conflict. Lead Time: Moderate to lengthy, due to time needed to secure services of consultant. Staff Time: Low Budget: Moderate. Each survey form costs $5.60 and con- sultation is additional. The Conflict Management Survey presents scenarios in each of the following areas: personal views of conflict, interpersonal conflicts, the handling of conflict in task groups, and conflict in relationships among groups. Respondents note how they would respond to each conflict scenario, and after a self-scoring exercise, a style preference is determined, which represents the respondent's preferred mode of managing conflict. Through consultation, respondents become able to understand the implications of their style preference and develop the flexibility to use other styles if situations dictate this. Feedback from this tool may seem threatening if not accompanied by a good consultation. 4-4. 
Communication Style Survey

Purpose: To provide feedback on the respondent's style of interpersonal communication.
Lead Time: Moderate to lengthy—surveys need to be mailed to Chicago for scoring, and a consultation should be arranged.
Staff Time: Low
Budget: Moderate—standard fee of $140 per person, which is negotiable.

The Communication Style Survey consists of a self-assessment form and "other-assessment" forms to be filled out by people who know the respondent well. The survey involves choosing, from a set of words, the term that most aptly describes the respondent. The data are processed to yield an assessment of communication style as some combination of Analyzing, Facilitating, Advocating, and Controlling. This Style Profile is accompanied by feedback on the respondent's oral communication competency and adaptability. Consultation is needed to help respondents understand the strengths and weaknesses of each communication style and develop flexibility.

5. Outcome Assessment

Agencies typically view evaluation as a means of finding out whether what they did worked or not. As suggested earlier, carefully designed scientific evaluation research is required to draw these kinds of conclusions. When agencies have little time and few resources, however, they may still need to find out how audiences have reacted to phases of the communication effort and to the effort as a whole. The outcome tools we recommend provide strategies for getting feedback on audience reaction and communicator performance.

5A. Audience Reaction

Audiences are asked what their reaction is to a presentation.

5A-1. Meeting Reaction Form

Purpose: To get feedback about participants' reactions to a public meeting.
Lead Time: Low to moderate, depending on whether the form developed by the Environmental Communication Research Program needs modification for specific agency use.
Staff Time: Moderate—includes preparation of the form, distribution, and data analysis.
Budget: Low

The Environmental Communication Research Program has developed a form for distribution at public meetings which examines whether information was understood, whether presenters were perceived as honest, whether people felt their concerns and issues were understood, whether people felt their input would be used in decision-making, etc. Other relevant issues can also be addressed. The particular form described in this catalogue was designed to get feedback from various constituencies involved in a public participation program run by the Bureau of Water Quality Standards and Analysis (BWQSA) of the New Jersey Department of Environmental Protection. While it provides a quick, easy, and inexpensive way to get feedback about a public meeting, the form is not standardized or scientifically validated, and some feedback could be difficult to interpret.

5A-2. Verbal Meeting Feedback

Purpose: To get direct feedback from participants at a meeting.
Lead Time: Low
Staff Time: Low
Budget: Low

Time for a structured feedback discussion is planned in a meeting agenda. The meeting chairperson actively solicits and may even record this feedback on a chart for everyone to see. Participants should feel free to comment on any aspect of the meeting, and conflicting statements are allowed. The goal is to generate as many ideas as possible rather than going into detail on any one idea.
This approach is highly dependent on the skill of the chairperson in creating a comfortable environment for feedback and inviting participation. Less verbal members may not be heard, and it is difficult to know whether this kind of feedback is in any way representative of the views of the group as a whole.

5B. Performance of Presentation

These techniques provide feedback more specific to how the communicator performs than to how the audience reacts.

5B-1. Speech Evaluation Checklist

Purpose: To get feedback on how a speech or presentation went.
Lead Time: Low to moderate—depending on design of the form.
Staff Time: Low
Budget: Low

The Speech Evaluation Checklist is a simple form to get feedback on a speech or presentation. It may include statements about the physical setting of the speech, the speaker's appearance, rapport, comprehensibility, and other important areas. The forms can be completed by one or a number of evaluators who observe the speech. Alternatively, a speech can be audio- or video-taped for later scoring by the presenter. The form is not intended as a "report card," but as a chance to get some input on a speech that will improve future presentations. This approach can provide immediate, relevant written feedback, but the perceptions of other agency staff may differ markedly from the perceptions of the audience.

5B-2. Observation and Debriefing

Purpose: To get feedback on speeches and presentations.
Lead Time: Low to moderate—time needed to develop an observer checklist.
Staff Time: Low
Budget: Low

One or a number of observers attend a presentation and take organized notes, using their perceptions of the event and some kind of observer checklist based on the goals of the presentation. An informal verbal debriefing session may be held after the presentation to review important strengths and weaknesses with regard to both the speaker's performance and the audience's reactions. The presenter can also use an audiotaped or videotaped version for self-assessment. While this is a quick and easy way to provide feedback on a speech, it should not substitute for finding out the audience's actual reactions, and it can be uncomfortable for the observers or the presenter depending on their roles within the agency.

REFERENCES

Briggs, K.C. and Myers, I.B. 1976. Myers-Briggs Type Indicator, Form G. Palo Alto, CA: Consulting Psychologists Press, Inc.

Green, L.W., Kreuter, M.W., Deeds, S.G., and Partridge, K.B. 1980. Health Education Planning: A Diagnostic Approach. Palo Alto, CA: Mayfield Publishing Company.

Hance, B.J., Chess, C., and Sandman, P.M. 1988. Improving Dialogue with Communities: A Risk Communication Manual for Government. Trenton, NJ: New Jersey Department of Environmental Protection, Division of Science and Research.

National Cancer Institute. 1984. Pretesting in Health Communications: Methods, Examples, and Resources for Improving Health Messages and Materials. Washington, DC: National Institutes of Health, NIH Publication #84-1493.

National Cancer Institute. 1989. Making Health Communication Programs Work: A Planning Guide. Bethesda, MD: National Institutes of Health, NIH Publication #89-1493.

Rossi, P.H. and Berk, R.A. 1988. A Guide to Evaluation Research Theory and Practice. Discussion Draft prepared for the Workshop.
-------

COMMENTARIES ON EVALUATION ISSUES

Developing the Message

-------

Selecting Appropriate Strategies

Mildred Zeldes Solomon

This paper describes some guiding principles for the development of risk reduction messages, and, like most recommendations on the design of effective messages, these are based on research in the health promotion field. Professionals in both public health and the environment face risk communication challenges that are similar in some ways and different in others. It is hoped that environmental professionals will be stimulated to test the usefulness of the following suggestions for their own work in environmental risk reduction.

Recent reviews of the effectiveness of several different kinds of risk reduction efforts in the public health field (Wallack and Corbett, 1987; Robertson, 1983) make it clear that environmental and legislative changes are powerful forces for change and that their effects often have been greater than those of education directed to individuals in isolation. Wallack and Corbett (1987), for example, point to the effective role that legislative changes have had on cigarette consumption: bans on cigarette advertising and increases in excise taxes have reduced demand. Similarly, Grossman, Coate, and Arluck (1984) found that both raising the drinking age and increasing alcohol prices lowered alcohol consumption among youth. In the area of injury control, state mandates for the use of infant restraint seats were much more successful than education programs, even ones that included the provision of free car seats (Robertson, 1983).

Examples of effective environmental changes in public areas include such diverse measures as highway redesign to cut down on automobile crashes; handgun control; point-of-purchase access to condoms for young people who might be too embarrassed to ask the clerk for condoms stored out of sight; and safety caps for electrical outlets that pose a threat to infants and toddlers. Researchers in the United Kingdom even found that they were able to influence the suicide rate dramatically in that country by redesigning gas stoves.

Clearly, there are at least three leverage points: education aimed at persuading individuals to change their behaviors; environmental redesign (sometimes called "passive measures" because they do not require that individuals take action); and legislation. Public health experience suggests that, whenever possible, program planners should eschew methods that require people to change their behaviors in favor of passive measures. To promote water conservation, for example, it would be wiser to install self-shutting faucets in public facilities than to post signs exhorting people to use less water. However, passive measures are not always an option. We will probably always need to communicate information about new hazards and to encourage the adoption of new recommendations.

Preliminary Research

What, then, does research say about designing effective educational or persuasional programs? The most salient feature of effective programs is that they are informed by rigorous preliminary research that is used actively to design the program. Figure 1 presents the most critical questions that this research should seek to answer. They are questions that help clarify whom we should be targeting, with what messages, in what medium, at what time, and in what ways. They help us come to a better understanding of the target audience's beliefs, values, and current behaviors.
They provide information about the social and physical context in which our audience lives, and they help us predict how best to incorporate and deliver our messages in those settings. The point that preliminary research is vital to the design and implementation of risk reduction programs is so simple and straightforward that it may appear facile. But the literature is full of examples of programs that failed precisely because appropriate preliminary research was not conducted. Early efforts to encourage the use of oral birth control pills among women in one developing country, for example, resulted in women inserting the pills vaginally! During the second World War, the U.S. Armed Forces mounted expensive and essentially ineffective campaigns against venereal disease that relied on excessive appeals to fear without providing the men with clear messages about what they could do to protect themselves. And until very recently, many health promotion advocates thought it would be sufficient simply to give adolescents information about the harmful effects of drugs, without taking into account the psychosocial dimensions of drug experimentation by youth. As painful as these failures have been, they also have been useful. We now know, for example, that fear arousal is effective only when coupled with concrete recommen- dations for what the individual can do to reduce the risk. People must feel that they have it within their power to eliminate or modify the potential source of harm. Without such assurances and the knowledge and skill to implement the changes, arousing fear is likely to engender defensive reactions that lead to denial (Berkanovice, 1976; Leventhal, 1970; Mewborn and Rogers, 1979). Furthermore, many current health and environmental problems are better served by reducing fear than by elevating it. For example, without more concerted efforts to reduce community fears about the siting of waste disposal facilities, the dreadful situation of unlimited and unregulated midnight dumping will persist. ------- Selecting Appropriate Strategies 67 Figure 1 Questions to Consider in Message Design 1. Have you determined the leverage points most likely to yield the best results, given your goals and resources: education aimed at behavior change; environ- mental redesign; legislation? 2. Who exactly is your audience? How can your audience be most usefully segmented? 3. What is it your audience needs to know? 4. What do you want them to do? 5. How do you want to make them feel? 6. What relevant existing values and beliefs do they have? Which values and beliefs are ones onto which you might "piggyback" your message? 7. Which beliefs are likely to run counter to your message? While providing counterclaims, have you acknowledged and respected the group's current beliefs? 8. How can you establish that the desired behavior is normative or valued by a respected elite? 9. What role can significant others play in promoting the behavior? 10. What obstacles exist to the adoption of the target behavior? 11. How can you acknowledge the obstacles without overstating their importance? 12. What support/incentives (both social and material) can you offer to overcome obstacles? 13. In what ways can existing social networks be used to convey the message? 14. How can existing social networks be enhanced to support your goals? 15. Have you focused on underlying attitudes, behavior change, and skill develop- ment rather than disease etiology or other facts for their own sake? 16. Is the message simple? 17. 
Have you found multiple ways to deliver the message and to repeat it over time? 18. What strategies have you employed to enhance the group's perception or its susceptibility to the risk and its ability to do something about it? 19. Are you satisfied that you have found the right use of fear, combining fear arousal with concrete recommendations people can carry out to eliminate or modify the risk? 20. How will you measure success? Will you be able to make use of evaluation information to revise your message and/or your implementation strategies? ------- 68 Developing the Message Strategies And there are other lessons. We know, for example, that people are more likely to adopt a recommended change if it is a simple one-time act, such as installing a water-saving shower head, than if it is a complex behavior or one that must be repeated and maintained over time (Robertson, 1983). If the target group is misinformed or holds strong misperceptions about the risk, it is important to acknowledge and respect its beliefs even while countering them (McGuire, 1968). If the new information is not presented in relation to existing beliefs, people are more likely to rely on their current understanding, dismissing the new information as no more authoritative than their own conceptions. Creating what social psychologists call "cognitive dissonance" is another important strategy for accomplishing shifts in perception. Cognitive dissonance refers to the sense of imbalance people experience when they perceive contradictions in a set of beliefs. Program planners can create dissonance by introducing new information that clashes with current beliefs. If that information is introduced in a way that allows it to be heard—that is, if existing beliefs are not ridiculed, if popular myths are acknowledged, if the target audience feels the program planners understand its perspective- -the resulting dissonance creates a demand for re-thinking that can lead to changes not only in perception but in behavior as well. Audience segmentation, a concept derived from social marketing (Kotler, 1971), is an important characteristic of well designed educational and persuasional programs. Audience segmentation refers to the process of dividing the target audience into subgroups on the basis of their beliefs, needs, and other salient features such as age, occupation, or ethnic identity. The anti-hypertension campaign, sponsored by the National Heart Lung and Blood Institute (NHLBI) of the National Institutes of Health over the last decade, is an excellent example of successful market segmentation. At first blush, one might assume the audience for a hypertension control program to be essentially homogenous, butNHLBI developed separate messages and separate delivery mechanisms for a number of different groups. For example, they distinguished between those who had risk factors for hyperten- sion but did not know it, those who knew they had hypertension but were not in therapy, and those who were in therapy but were not complying with medical recommendations. Another way to segment one's audience is to think not only about the ultimate target group but about appropriate "intermediaries" who can provide access to the audience, lend credibility to the campaign's messages, and help create the sense that the targeted behavior is accepted by one's social group (Solomon and Dejong, 1986). Indeed, successful health promotion programs recognize that the unit of focus is not the individual but the group (Berkanovice, 1976). 
If we want to effect and sustain behavior change, we have to mobilize and influence groups of peers and significant others. Who should be mobilized and how they should be involved will depend upon the target group: only preliminary research and/or experience with the group will reveal appropriate answers. In an Hispanic community, for example, it is likely that respected authority figures will have to agree with, endorse, and/or convey messages. A campaign with similar goals but targeted to, say, adolescent runaways, might focus less on authority figures and more on the peer group.

Successful campaigns also recognize obstacles that impede the adoption of the targeted behavior, and they attempt to build in incentives for those who make the recommended change (Green et al., 1980). For example, one obstacle confronting homeowners who would like to dispose of their household hazardous wastes safely is the lack of convenient alternatives to simply putting them out with the trash. Some communities acknowledge this obstacle and help overcome it by designating a "Hazardous Waste Disposal Week" when special home pick-ups are made. A creative approach might include an incentive for participating in the program, such as a rebate on town dump fees.

The Health Belief Model

A discussion of the principles of good message design would not be complete without reference to the Health Belief Model (Janz and Becker, 1984; Strecher et al., 1986), which attempts to predict health behavior change on the basis of five key variables:
• Perceived susceptibility to the risk
• Perceived severity of the harm associated with the risk
• Effectiveness (the perception that something can be done about the risk)
• Self-efficacy (the perception that it is within the person's power to do something about the risk)
• Perceived benefits of making the change outweighing the burdens of the status quo.

Let's consider this model and some of the recommendations made above in relation to a current environmental risk: household radon levels. Let's imagine that our goal is to encourage homeowners to install radon detection devices and, if elevated levels of radon are discovered, to make the necessary corrections. According to the Health Belief Model, before people would do either of these things they would have to believe that a) they (or their house) were susceptible—they would have to feel that it was likely that radon was present, b) the effect of indoor radon pollution was serious enough to warrant their attention, c) something could be done about it, if it were discovered, d) it was within their power to do something about it, and e) the inconvenience, psychological disturbance, cost, and other burdens of detecting and correcting the radon problem were outweighed by the benefit of eliminating (or modifying) the risk.

Shrewd program planners would want to assess what the homeowners' current perceptions of these issues are. Do homeowners at risk realize that they are? Are they aware of radon? Do they think it likely that radon would be discovered where they live? Do they consider it a significant health risk? Do they believe that anything (short of moving away) can help? If the answers to these questions are "no," the risk reduction program ought to begin by raising homeowners' awareness of their susceptibility and the severity of risk, and by providing information about the effectiveness of corrective measures.
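To make the five variables concrete, the following sketch, added for illustration and not part of the original paper, encodes a homeowner's radon-related perceptions as simple ratings and suggests where a program might place its emphasis; the class and function names, the 1-to-5 scale, and the cutoff are hypothetical assumptions rather than an established instrument.

# Illustrative sketch only: the five Health Belief Model variables for the radon
# example, expressed as survey-style ratings. Names, the 1-5 scale, and the
# cutoff are hypothetical assumptions, not an established instrument.
from dataclasses import dataclass

@dataclass
class HomeownerPerceptions:
    susceptibility: int   # "radon is likely to be present in my home" (1 = disagree, 5 = agree)
    severity: int         # "indoor radon is a serious health risk"
    effectiveness: int    # "something can be done about radon if it is found"
    self_efficacy: int    # "it is within my power to test for and fix the problem"
    net_benefit: int      # "the benefits of acting outweigh the cost and inconvenience"

def suggested_emphasis(p: HomeownerPerceptions, low: int = 3) -> list:
    """Return the program emphases suggested by low-scoring beliefs."""
    emphasis = []
    if p.susceptibility < low or p.severity < low:
        emphasis.append("raise awareness of susceptibility and severity")
    if p.effectiveness < low or p.self_efficacy < low:
        emphasis.append("show that corrective measures work and are within reach")
    if p.net_benefit < low:
        emphasis.append("address burdens (cost, inconvenience) and offer incentives")
    return emphasis or ["reinforce favorable beliefs and prompt action"]

# A homeowner who is aware of radon but doubts anything can be done:
print(suggested_emphasis(HomeownerPerceptions(4, 4, 2, 2, 3)))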
But program planners should recognize that simple awareness will not lead to the desired behavior changes. Later iterations of the program must consider the obstacles (in Health Belief Model terms, the burdens) and potential incentives (in Health Belief Model terms, the benefits): Where do I get a radon detection device? How much will it cost? How hard will it be to operate? What's in it for me, if I go to all this trouble? ------- 70 Developing the Message So far, we have focused only on the personal beliefs of the homeowner, as if she or he were not part of a larger community. But we also must recognize the increased leverage we have if we conceptualize our homeowner as a social creature. Instead of relying, say, on home mailings (which might be a very legitimate component of our campaign), we may wish to consider also reaching Mr. and Mrs. Homeowner through existing social networks of importance to them, such as church groups, schools, and civic associations. In addition, we always must ask whether an education or persuasion program is really the best choice. Is it the only choice? In the case of radon control, we also ought to ask if there is any way to redesign the environment to modify the risk. Should we, for example, be urging the use of different kinds of building materials to provide greater protection? Is there any other "passive measure" available to us? What about legislative or regulatory leverage points? For example, early efforts to promote the use of smoke detectors relied on direct promotion to homeowners. But the reason that smoke detectors are commonplace today is not because promotional efforts were effective, but because state and local laws now mandate their installation by landlords and house sellers. Furthermore, house inspections of smoke detectors are now a routine part of fire departments' responsibilities. Landlords, house sellers, and firemen may be inappropriate leverage points for encouraging radon control, but is it too farfetched to consider the builder? Some day will a satisfactory radon reading be as obligatory for contractors as a satisfactory water percolation test? These are examples of the types of interventions available to risk reduction program planners. Too often, we stop short of such brainstorming, assuming that only one kind of intervention is available to us. Instead we need to ask in as open-ended a way as possible and from the very start of our work: What kinds of changes in personal behavior, environmental redesign, and in law or regulation are most likely to accomplish our goals? Careful program planning will consider all the options and proceed with the best strategy or combination of strategies. When education or persuasion is to be part of the mix, we now have the benefit of considerable experience in health promotion to help design, based on preliminary audience research, messages that work. REFERENCES Berkanovice.E. 1976. Behavioral Science and Prevention. Preventive Medicine5:92-105. Green, L.W., et al. 1980. Health Education Planning: A Diagnostic Approach. Palo Alto, CA: Mayfield Press. Grossman, M., D. Coate, and G.M. Arluck. 1984. Price Sensitivity of Alcoholic Beverages in the United States. Paper presented at Control Issues in Alcohol Abuse Prevention II: Impacting Communities Conference, 7-10 October, Charleston, South Carolina. Janz, N., and M. Becker. 1984. The Health Belief Model: A Decade Later. Health Edu- cation Quarterly 11:403-418. ------- Selecting Appropriate Strategies 71 Kotler, P., and G. Zaltman. 1971. 
Social Marketing: An Approach to Planned Social Change. Journal of Marketing 35:3-12. Leventhal, H. 1970. Findings and Theory in the Study of Fear Communications in Advances in Experimental Social Psychology: ed. L. Berowitz, Vol. 5. [Location?] Academic Press. McGuire, W.J., 1968. The Nature of Attitude and Attitude Change. In Handbook of Social Psychology, ed. G. Lindzey and E. Aronson, Vol. 3. Reading, MA: Addison Wesley. Mewborn,C.R.,andR.W. Rogers. 1979.EffectsofThreateningandReassuringComponents of Fear Appeals on Physiological and Verbal Measures of Emotion and Attitudes. Journal of Experimental Social Psychology 15:242-253. Robertson, L.S. 1983. Control Strategies: Educating and Persuading Individuals. In Injuries: Causes. Control Strategies, and Public Policy. Lexington, MA:Lexington Books. Solomon, M.Z., and W. Belong. 1986. Recent Sexually Transmitted Disease Prevention Efforts and their Implications for AIDS Health Education. Health Education Quarterly. 13(4):301-316. Strecher, V.J., et al. Spring 1986. The Role of Self-efficacy in Achieving Health Behavior Change. Health Education Quarterly 13(1):73-91. Wallack, L., and K. Corbett. Summer 1987. Alcohol, Tobacco, and Marijuana Use Among Youth: A Overview of Epidemiological, Program, and Policy Trends. Health Education Quarterly 14(2):223-49. ------- Tailoring The Message to the Audience James W. Swinehart When planning a risk communication campaign, it is useful to bear in mind that public information is only one part of comprehensive efforts to improve health and safety. Information provided through the mass media and other means can improve people's knowledge, attitudes, or skills, and thus their risk-related behavior, which also is influ- enced by laws, regulatory actions, and technology. None of these should be expected to do the job alone, but appropriate combinations should produce lower risks of various kinds and lead to reductions in morbidity and mortality. A campaign plan should be based on answers to several questions: • What audiences are we trying to reach? • How large is each of these target audiences? • How many people in each audience are already taking the actions we recom- mend? • What barriers (e.g., ignorance, fear, cost) are keeping other people from taking these actions? • What do we want to communicate? • How will it be said? • Who will say it? • What combination of media available will reach people most efficiently and effectively? • How will the results be measured? 73 ------- 74 Developing the Message Setting Objectives Assuming that a campaign is not seeking only to inform people or to remind them of something, but also to persuade them to take a particular action, the intended result should be stated in behavioral terms. Stating the desired outcome helps to sharpen the message and increases the likelihood that any evaluation will be appropriate. Of course, some outcomes are harder to produce than others; for example, starting or stopping an activity is usually harder than changing it, and taking an action repeatedly is harder than doing it only once. Knowing the level of difficulty in advance makes it possible to set more realistic expectations. Table 1 provides some examples of audience segments and of objectives, but these are necessarily somewhat abstract. The objectives for an actual campaign should be practical as well as appropriate. If a message recommends an action that costs money, can people afford it? If the action requires access to facilities, are the facilities available? 
Do people believe it will do any good? Would doing it conflict with their personal values or self-image? Would their friends oppose their doing it? Is the action painful, boring, or inconvenient? Any such constraints should be known at an early stage of planning.

Table 1
Some Categories of Health Related Behavioral Messages
1. Start doing X (a particular action with health consequences)
2. Don't start doing X (or continue not doing it)
3. Continue doing X
4. Stop doing X
5. Do more of X
6. Do less of X
7. Do X differently (in such a way as to reduce risk)
8. Do X once
9. Find out about X
10. Get someone else to (start, stop, continue, etc.)

Examples of Topics and Target Audiences by Messages

Smoking
  nonsmokers: 2, 10
  former smokers: 2, 10
  current smokers: 4, 6, 7, 9
Breast self-examination
  those who have not done it: 1
  those who have done it: 3, 9, 10
  those who do it incorrectly: 7, 9
Safety belt use
  nonusers: 1, 9
  current occasional users: 5, 9
  current consistent users: 3, 10
  families or friends of nonusers: 10
Nutrition
  people with excessive intake of sugar, sodium, saturated fat, etc.: 4, 6, 7, 9
  people with insufficient vitamin A: 1, 5, 9, 10
Alcohol/drugs
  non-abusers: 2, 10
  current abusers: 4, 6, 9
  former abusers: 2, 10
Immunizations
  parents of preschool children: 8, 9, 10
Hypertension
  people with undetected HBP: 1, 8, 9
  people with detected HBP: 1, 3, 10
  families of people with detected HBP:
Prenatal care
  pregnant women (regarding nutrition, substance abuse, physical exams, etc.): 1, 2, 4, 6, 9
Exercise
  people who already do it: 3, 5
  people who don't do it: 1, 9
  people who overdo it: 6, 10
Radon, lead paints, etc.
  people who have not checked their homes for possible contamination: 8, 10

Designating Target Audiences

The target audiences listed as examples in Table 1 were chosen because they had some obvious connection with the topics, but greater differentiation should be used in planning an actual campaign. In the case of smoking, for instance, "current smokers" could be divided into several sub-groups depending on their desire to quit, previous efforts to quit, knowledge of risks related to smoking, social support for quitting, and other factors. Regardless of the topic, answers should be sought to the following kinds of questions:
• What specific population groups are most affected by the problem?
• Are they also the ones in the best position to do something about it, or should the campaign be addressed mostly (or at least in part) to others?
• How accessible are they, and how susceptible to influence?
• What proportion of the people affected have tried previously, and unsuccessfully, to do something about the problem?
• How much do they know about it?
• How many people hold incorrect beliefs about its seriousness, its causes, or intervention methods?
• How many are afraid of it, or apathetic about it, or merely resigned?
• How much public interest is there in the problem, and what is its perceived importance in relation to other problems?
• How many people feel that the importance of the problem, coupled with the prospects for successful intervention, can justify individual or collective actions as control measures?
• How many people, in what population groups, will be receptive to information about the problem? Will they have the opportunity and the ability to influence others?
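As a minimal sketch, added for illustration and not part of the original paper, the pairings in Table 1 can be carried into campaign planning as a lookup from topic and audience segment to the recommended message categories; the dictionary names, the helper function, and the example entries are hypothetical.

# Illustrative sketch only: encoding Table 1 so each audience segment is tied to
# its recommended message categories. Structure and names are hypothetical.
MESSAGE_CATEGORIES = {
    1: "Start doing X", 2: "Don't start doing X", 3: "Continue doing X",
    4: "Stop doing X", 5: "Do more of X", 6: "Do less of X",
    7: "Do X differently", 8: "Do X once", 9: "Find out about X",
    10: "Get someone else to act",
}

CAMPAIGN_PLAN = {
    ("Smoking", "current smokers"): [4, 6, 7, 9],
    ("Smoking", "former smokers"): [2, 10],
    ("Hypertension", "people with undetected HBP"): [1, 8, 9],
    # ... remaining topic/audience pairs from Table 1
}

def messages_for(topic: str, audience: str) -> list:
    """Look up the message categories recommended for one audience segment."""
    return [MESSAGE_CATEGORIES[n] for n in CAMPAIGN_PLAN.get((topic, audience), [])]

print(messages_for("Smoking", "current smokers"))
# ['Stop doing X', 'Do less of X', 'Do X differently', 'Find out about X']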
For each target audience identified, a summary should be prepared which gives the following information: • Description of audience • Objectives • Barriers to recommended action(s) • Communications strategies/themes/appeals • Spokespersons • Media/channels/vehicles • Methods of measuring results Some of the information for this summary can be derived from three worksheets, each with a matrix showing all of the target audiences and various choices made regarding them. One worksheet should indicate the media/channels/vehicles through which the campaign will reach each audience; another should show the themes or appeals chosen as likely to influence each group; and the third should show the kinds of spokespersons thought to be effective with each group. Preparing these worksheets can be difficult and time-consuming, since it involves making several hundred decisions. Moreover, many of these decisions will have to be guesses if the needed background information (e.g., on audience beliefs, media usage) is not available from such sources as the Roper Center for Public Opinion Research. The worksheet preparation is worth the effort, however,because it imposes focus and some degree of rationality on the process of campaign planning. Designing Messages The fifteen recommendations that follow, concerning the content and style of messages, are necessarily somewhat general, because they are intended to apply to a wide ------- Tailoring the Message to the Audience 77 variety of topics—health risks related to personal habits, environmental hazards, occu- pational safety conditions, and so on. The suggestions given should be used or adapted as desired to suit particular circumstances. Be careful when using fear as an appeal. Communications aboutrisks typically arouse some amount of fear or anxiety, and this emotion may lead people to avoid or distort a message. Reactions will depend upon the situation, the audience's initial level of concern about the topic, the number and seriousness of threats posed, the perceived effectiveness of actions that can be taken, and several other factors. It is agreed generally that some amount of fear arousal makes people more likely to act, but specifying (and inducing) the optimum amount is very difficult; in some instances, it is better to offer reassurance than to emphasize danger, to allay fear rather than arouse it. Strong fear appeals seem to work best when they pose a threat to the audience's loved ones (rather than a direct personal threat), come from a highly credible source, deal with a topic that the audience knows little about, and are directed to people with relatively low income and education, high self- esteem, and low perceived vulnerability to danger. When it is necessary to emphasize risks, the audience should be in a position to act at once on the recommendations and should be given specific advice to help them do so. There are no firm rules about the choice of information to convey in campaign messages, and it is often hard to decide which points of information are most likely to lead people to take the recommended action. In some cases there is a risk that by emphasizing a point regarded as important, a message may actually result in a decrease in the number of people taking a recommended action. For example, by mentioning that a problem has its greatest impact on certain population groups, a message may lead people in other groups to feel that it does not concern them. 
Care should be taken to minimize the chance that any information points or appeals will produce a negative reaction in some people while producing a positive reaction in others. Rather than trying to use a single message for everyone, it is better to use a series of specialized messages for different audiences. Ideally, each person in your intended audience should feel that the message applies to him or her personally. This is especially true for people in high-risk categories, who may tend to deny the personal relevance of the message.

Emphasize the usefulness of the information to the person receiving it. Make the recommended actions as specific as possible and explain why or show how the action can help a member of the target audience.

Be sure the information is current and technically accurate. When feasible, have it checked independently.

As appropriate, try to identify a particular problem and offer a specific way to handle it, but don't try to convince people that this is the best or only solution. Rather, seek to convey an understanding of the problem and the reasons for taking the kind of action suggested.

When people are initially hostile to a position, or are likely to hear conflicting views, present both sides of the issue rather than only one. Doing this has two benefits: it increases credibility, and it prepares the audience to resist arguments that may be presented later by the other side.

------- 78 Developing the Message

Avoid exaggeration and moralizing, because either can make people reject the message. In general, the same applies to exhortation, although this is often a function of how the message is given. Most people are willing to accept advice but resent being told what to do.

Distinguish between established facts and guesses or assumptions. People may react negatively to an entire message if they recognize an assumption stated as a fact.

Make the language and style appropriate to the intended audience. Whenever possible, avoid using technical terms. When such terms have to be used, explain them clearly and briefly. (This is one of the areas in which showing an item to a few lay people and asking for their reaction—the simplest kind of pretest—can tell communicators whether their message is getting across.)

In general, make the tone of messages serious rather than flip or frivolous. A humorous approach can attract attention and be entertaining, but special care should be taken to ensure that the tone is consistent with the topic, the information presented, and the public image of the sponsoring agency.

Use the power of group pressure to reinforce a message. Ascertain the normative beliefs and actions of people with whom the target audience identifies, and use them as appropriate. The fact that "everybody's doing it" can prompt certain people to take a protective action that they would not have undertaken on their own.

If presenting a series of messages, let each one seek to convey very limited information. A pamphlet or other printed piece that people can read and review at their own pace can carry several points, but as a rule only one point should be made in a poster or public service announcement.

Identify the intended audience in materials whenever it is feasible to do so. This makes it more likely that the "right" people will pay attention and perceive the message as personally relevant.

Find ways to elicit the active participation of the audience, such as writing a slogan, taking notes, role-playing, voting on issues discussed, or taking a self-test.
Active involvement facilitates both learning and recall of message content.

Choosing Media, Channels, and Vehicles

Since no medium will reach everyone in an intended audience, it is important to use multiple channels or media and tailor the choices to the habits and preferences of the kinds of people the program aims to reach. Look for answers to these questions about each medium considered:
1. How do people rate the credibility of this medium versus others?
2. Will getting access to this medium be relatively hard or easy?
3. How much will it cost to produce materials for this medium?
4. How much will it cost to distribute or place materials?
5. How much staff time will be needed for production and placement?
6. Do we have, or can we get, the production capabilities required?
7. How much control will we have over the final product?
8. How much flexibility will we have about the timing of placements?
9. What tie-in possibilities exist with regard to other media?
10. How effective is this medium versus others in conveying a message that people will notice, recall, and act upon?
11. How much repetition (frequency of exposure) will our message have?
12. How efficient is this medium in reaching the kind(s) of people we are addressing in the campaign? (reach + selectivity)
13. In what context will messages appear? What other material will surround or accompany them?
14. What is the probable "set" or frame of mind of the audience when using this medium?
15. How does this medium compare with others in its ability to convey complex information?

General answers to these questions can be found in such sources as The Media Book and current textbooks on advertising planning. For answers pertaining to a particular campaign, it may be necessary to consult media planners in an advertising agency or organizations that specialize in placement of public service materials.

Any media plan should specify the particular vehicles to be used within the general categories of broadcast, print, and out-of-home kinds of media. Vehicles in each of these categories are listed below.

TV and Radio: PSAs; paid commercials; talk/interview shows; news items; news program inserts; editorials; documentaries; specials; station break tags/slides; call-in shows; entertainment programs

Newspapers, Magazines: public service ads; paid ads; feature articles; interviews; editorials; news items; cartoons; letters to editor; health/advice columns

Out-of-Home: billboards; transit cards; posters

It is also important to differentiate among specialized magazines. For example, the content and style of submissions should differ greatly across these 13 categories of magazines:
• automotive
• news
• business/financial
• Sunday
• outdoor
• shelter
• sports
• general
• men's
• women's
• national weeklies
• fashion/beauty
• special appeal (e.g., Esquire, National Geographic, New Yorker,
Psychology Today)

The same is also true of submissions to radio stations, which normally use one of these formats:
• album-oriented
• rock, jazz
• agriculture and farm
• middle of the road,
• all news
• adult contemporary
• soul and blues,
• news, weather, Afro-American information
• music, instrumental
• oldies, popular classics, nostalgia
• country and western
• public or community affairs
• classical, concert, fine arts
• religious, gospel, inspirational
• disco
• rock and roll, folk, progressive
• educational, cultural
• discussion, interview, personality
• ethnic music and topics
• hit parade
• foreign language
• variety, diversified

Important considerations in selecting channels and vehicles are not only how many members of the target audience will have the opportunity for exposure to the message, but also how many are actually likely to be exposed, will pay attention, will learn from it, and so on. Table 2 indicates the factors that help determine channels' and messages' effectiveness. Measuring these variables is an important part of program evaluation.

Summary

The goals of any campaign should be explicit and realistic. Detailed data about target audiences—their beliefs, feelings, actions, habits, perception of risks, use of mass media, and so on—should be considered when choosing the content and style of messages and the media through which they will be distributed. Use mass media in combination with interpersonal communications and efforts to obtain organizational support. Finally, use appropriate research, from pretesting of materials in the developmental stage to evaluating the results of the campaign as implemented.

-------

[Table 2, presented in the original as a figure, is not reproduced here. It shows the estimated percentage of the population meeting each condition for a given message (hypothetical data): opportunity for exposure, actual exposure, attention, motivation, recall, learning, and opportunity for action.]

-------

Focusing on the Audience

Marilyn Rice

Although the topic of this paper is developing the message, 50 percent of what risk communicators actually develop is a description of the audience. Their role is to stimulate an interaction between the audience and the message in order to achieve some outcome, perhaps a behavior change. In this light, the initial task of a risk communicator is not to identify materials to develop, but rather to ascertain what needs to be achieved.

Four factors motivate people to listen, learn, and take action:
• Perception of need—unless the audience perceives that it needs the program benefits, motivation will be difficult.
• Foreseeable risk or benefit—there are three levels of motivation:
  - Individual: A risk or benefit to the individual from taking or failing to take a given action.
  - Peers and role models: The persons who influence the individual to take an action.
  - Broader social context: The cultural norms and traditions that might influence an individual's actions.
• Previous experience and habits—Communicators need to be cognizant of individuals' habits and routines. When some new desired action is introduced, the motivation to change existing habits must be strong.
• Attitudes and values—What do the people whom risk communicators are attempting to reach hold important? The values of the audience, or program recipients, may be different from those of the communicator.

Channels

Two major channels of communication are mass media and interpersonal communication.
Mass media (such as television, radio, and newspapers) attempts to reach many people in a short time. The benefits of this approach must be balanced against the cost. 83 ------- 84 Developing the Message Another consideration is whether the message can be tailored effectively to different audiences on such a large scale. Conversely, interpersonal channels (counseling, group sessions, question and an- swer sessions, and so on) are valuable in clarifying and reinforcing information, but limited in their ability to reach large numbers of people. Designing Messages Key points to consider in designing messages are these: • Who is the audience, and what are their differentiating characteristics (e.g., educational level, cultural background, age, sex, and socioeconomic background)? • What is the purpose of the message? Is it expected to stand alone or to serve as part of a broader program? • What options does the message present? It is important to give the target population options to avoid the perception that they are being controlled. Specify the consequences of these options, which need to be simple and accessible. For example, responding to a survey or going to a meeting are relatively simple and straightforward options. Risk communicators also need to spell out the incentives for each option. Developing Materials There are two types of evaluation of educational materials: 1) evaluation of materials currently being developed, and 2) evaluation of materials previously developed. Five principles guide the development of materials: • Develop educational materials from the community perspective. This relates to development of materials for a given audience. It may be beneficial to sample the community to ascertain its perspective. • Ensure that materials arean integral part of ahealth education program. Note that the materials should be part of a program, not the entire program. Materials by themselves are not a program. How will the materials reinforce each other and contribute to the objectives of the program? Conflicting information should never be disseminated; the audience's receptivity to further materials will be damaged. • Relate materials to health service delivery. If the risk communication materials are informing of the availability of a service, ensure that it is in fact available when and how it is advertised, or credibility will be damaged. Be aware of the potential response, and plan to have adequate supplies of whatever services are being advertised. • Pretest all materials. A formal pretest is important to ensure the usefulness of the materials. This entails exposing a sample of the actual audience for the materials and incorporating any revisions suggested by the feedback. The pretest should yield feedback on the following: ------- Focusing on the Audience 85 - Attractiveness: do the materials gain and keep interest? - Comprehension: is the message understandable? - Acceptability: are materials in concurrence with the beliefs and norms of the audience? - Ownership: does the audience identify with the message? - Persuasiveness: does the message convince the audience to make an attitudi- nal and behavioral change? Include instructions for use when distributing materials. This point seems simple and straightforward, yet instructions are often understated or not included. Much time and effort will be wasted if materials are not used properly or not used at all. Exhibit 1 includes some questions to help evaluate printed materials. 
------- 86 Developing the Message Exhibit 1 CRITERIA TO EVALUATE PRINTED MATERIAL On a scale of 1 to 5, indicate the extent the criteria are met, with 5 being totally and 1 not met at all. SPECIFIC CRITERIA 12345 1. Does it fully present one specific theme? 2. Is the content or message easily understood? 3. Do the illustrations clarify or complement the written parts? 4. Is the size of the letters easy to read? 5. Does it provide a synopsis of the message or content? 6. Does it have aspects that emphasize important ideas, such as type size, style or color of certain parts? 7. Are the writing style, grammar, and punctuation appropriate for the audience? 8. Does it avoid information overload or too much writing in one place? 9. Does it use language easily understood by the target audience? ------- TRACKING PROGRESS ------- Issues to Consider for Evaluation Design Judy Shaw and Jeanne Herb Tracking progress in risk communication means tracking changes in public involve- ment and control: Has the public become part of the decision process? An example of change in public control can be observed in the doctor-patient relationship. Today, patients ask the doctor questions pertaining to an upcoming operation, go to another doctor for a second opinion, or even refuse the operation. For the public to become involved in the decisionmaking process, a dialogue is needed. Dialogue entails: • Education about risk • Institutional mechanisms • An understanding of what citizens think With these three components in a dialogue, evaluation can occur throughout the process of decisionmaking. Unfortunately, many organizations are afraid of evaluation because it may reveal flaws in policies and planning. Evaluation provides the following benefits: • Awareness of the public's response (e.g., are they understanding the message?) • Awareness of the behavioral change in the public and what caused the change • The option of improving a communication strategy (and sometimes policies within the organization). When a communication fails to change public behavior, it may be for a number of reasons: • The message was not conveyed appropriately. • The public does not trust the person/organization delivering the message. • The public feels the organization is dealing with a small problem (such as oven gas wastes) instead of tackling what it perceives as the major problem (such as the incinerator about to be constructed down the block). 89 ------- 90 Tracking Progress However, the specific reasons for success or failure cannot be determined without evaluation. The evaluation design determines the type of information that results from the evaluation tasks. Some kinds of evaluation will result in information about changes or outcomes. Or evaluation can be designed to look at the process, i.e., the factors that produced the results. In either case, the analysis needs to consider both desired outcomes and unexpected outcomes in determining whether the program was "successful." Knowing an organization's goals is the first step towards message design and delivery. Indeed, a careful review of an organization's goals may reveal that a risk communication program is not timely or appropriate. If a risk communication program is desirable in the context of the goals, then both its evaluation and expected outcomes should relate back to the goal. The goal should remain constant, but objectives or intermediate steps to reach the goal may change as risk communication effects change. A tracking system to monitor any such changes is essential. 
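One minimal sketch of what such a tracking system might record is given below; it is added for illustration and is not drawn from the workshop, and the class names, field names, and example entries are hypothetical. Each entry is tied back to the fixed goal so that changes in objectives, audience, or message reception are visible over time.

# Illustrative sketch only: a simple log for tracking a risk communication
# program against a fixed goal. Names and the example entries are hypothetical.
from dataclasses import dataclass, field
from datetime import date

@dataclass
class TrackingEntry:
    when: date
    objective: str            # intermediate step currently being pursued
    audience: str             # audience segment observed
    message_received: bool    # feedback evidence that the message is understood
    notes: str = ""

@dataclass
class ProgramTracker:
    goal: str                                   # stays constant over the program
    entries: list = field(default_factory=list)

    def log(self, entry: TrackingEntry) -> None:
        self.entries.append(entry)

    def objectives_over_time(self) -> list:
        """Show how intermediate objectives shift while the goal stays fixed."""
        return [(e.when, e.objective) for e in self.entries]

tracker = ProgramTracker(goal="Residents take part in siting decisions")
tracker.log(TrackingEntry(date(1990, 3, 1), "Explain the permit process",
                          "neighborhood groups", True))
tracker.log(TrackingEntry(date(1990, 6, 1), "Recruit citizen advisory panel",
                          "neighborhood groups", False, "fact sheet not reaching renters"))
print(tracker.objectives_over_time())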
When evaluating a risk communication, one should know whether goals are changing (and, perhaps, incorporate new goals accordingly); the audience is changing; and the message is being received as intended. Pretesting (e.g., asking people if they think an important question or issue has been missed or left unanswered) ensures that an important aspect of the project is not overlooked. If an evaluated risk communication does not have measured success, it does not mean the communication effort was not successful. It could mean that there were other overriding effects that ran counter to the objectives of the effort. Under other circum- stances, the same communications effort may have worked. In some cases the public will not listen to any communications, spoken or written. One such example was a lake infested with arsenic; some people did not care what was in the lake and wanted to use it regardless of the arsenic. Furthermore, communications vary with each risk; what works in one instance may not work in another. ------- Tracking the Health Objectives for the Nation James A. Harrell In general, objectives are used for evaluation, planning, or management purposes, and they help focus, structure, and mobilize a program or activity. Objectives are a valuable planning tool because they are both measurable and specific. Objectives should translate abstract ideas into something concrete; specifically, they are used to: • Establish priorities (e.g., reach a consensus on which issues to address) • Manage programs by answering questions such as —When? (By a certain date or year) —How much? (What percentage) —Who? (Target audience) —What? (Topic) • Identify concrete signs of progress • Indicate challenges (strengths and weaknesses of a program initiative) Management by objectives (MBO) is a decisionmaking process that assists in the planning, implementing, and evaluating of a program or activity. The MBO process has five classes of objectives: outcome, strategy, productivity, marketing, and innovation. Disease prevention and health promotion activities use outcome (e.g., morbidity and mortality reduction) and strategy (e.g., controllable risk factors) objectives. The 1990 Health Objectives for the Nation: A Midcourse Review, coordinated by the U.S. Department of Health and Human Services, Office of Disease Prevention and Health Promotion, examined the status of 226 health objectives. These objectives were issued in 1980 asaresultof the 1979 publication. Healthy People: The Surgeon General's Report on Health Promotion and Disease Prevention. The 226 objectives were the result of a consensus by a multitude of groups. These groups chose objectives that followed trends but also posed challenges. In addition, the objectives were required to address a problem that was preventable or controllable. These objectives addressed problems at the national, state, and community levels and were used to build disease prevention and health promotion programs by many agencies. 91 ------- 92 Tracking Progress Healthy People: The Surgeon General's Report on Health Promotion and Disease Prevention announced five national health goals for enhancing the health of the U.S. population among five major age groups: infants, children, adolescents/young adults, adults, and older adults. The 1990 Health Objectives for the Nation: A Midcourse Review assessed the status of 226 health objectives developed as a result of Healthy People, and found the following: • 34.5 percent of the objectives were on track. 
• 26.5 percent of the objectives were unlikely to be achieved.
• 13 percent of the objectives had been achieved.
• 26 percent of the objectives could not be assessed because of lack of data.

The negative findings are important for judging progress and identifying remaining needs. We learn most from those objectives for which we are not on track because the assessment indicates what still needs to be done as well as what has not worked.

Besides setting specific and measurable goals, the national health objectives have provided a blueprint or frame of reference for state and local disease prevention and health promotion activities. Their use at state and local levels has been idiosyncratic; in some places it is very structured, and in others a more grassroots approach is used. What is important is that the national objectives have been used at all levels.

The national health objectives are now being revised to serve as a guide for designing intervention and evaluation strategies between now and the year 2000. The twenty priority areas for preventive interventions for the year 2000 objectives are:
1. Reduce tobacco use
2. Reduce alcohol and other drug abuse
3. Improve nutrition
4. Increase physical activity
5. Improve mental health and prevent mental illness
6. Reduce environmental health hazards
7. Improve occupational safety and health
8. Prevent unintentional injuries
9. Reduce abusive and violent behavior
10. Improve oral health
11. Improve maternal and child health
12. Immunize against and prevent infectious diseases
13. Prevent and control HIV infection and AIDS
14. Prevent and control sexually transmitted diseases
15. Reduce teenage pregnancy and improve reproductive health
16. Prevent, detect, and control high blood pressure and high blood cholesterol
17. Prevent, detect, and control cancer
18. Prevent, detect, and control other chronic diseases

------- The Purpose of Tracking Progress James L. Regens

Increasingly, public officials are using program evaluation techniques in an effort to monitor the effectiveness of risk communication strategies. Because program evaluation involves systematic attempts to measure consequences (i.e., outcome or impact evaluation) or operations (i.e., process evaluation), it offers an attractive option for producing information about how decisionmakers can produce deliberative changes in risk factors as part of an overall plan for managing environmental hazards. For example, process evaluation encompasses a variety of considerations with respect to program operations. Were the pamphlets distributed? What problems were discussed? How many people attended the meeting? Answers to such questions are helpful in assessing the needs of the target population and ascertaining the most effective means of distributing materials to that audience. Tracking progress during the implementation phase of risk communication activities draws attention to significant structural elements—program components, outputs, objectives, and effects—so that the program can be modified as needed. Other informational objectives of tracking exercises include answering outcome or impact evaluation questions: Can the program be repeated and/or did the program make a difference? A serious examination of the actual content of the risk communication program being evaluated can prevent or reduce the occurrence of some of the obvious but often repeated failures of prior programs. There are a number of reasons for tracking progress. First, evaluation helps explain choices.
The products or endpoint of evaluation can be used to clarify responses to risk. In addition, a well designed monitoring system gives better ongoing evidence about program accomplishments. For example, if information about the program components—such as the message being communicated, mechanisms for evaluating target audiences, and impact/outcome—is incorporated into a tracking program, evaluators can obtain informa- tion about how program accomplishments are achieved and about the program's impact for purposes of modification. That is, both program intention and content can be evaluated. 93 ------- 94 Tracking Progress Second, tracking can demonstrate the kinds of problems that may arise as a risk communication program is implemented. Tracking allows practitioners to maintain relevance throughout the life of the program. Moreover, tracking directs attention to data needs. Finally, progress or lack of it can be tracked and necessary changes or adjustments made. Information about progress can be used to obtain several kinds of information about a program: • Better evidence about the program's usefulness and its context • Better information about the kinds of problems that arise during program implementation • Information on the nature of outcomes • Ideas for alternative strategies for dealing with the situation In designing a plan to evaluate risk communication efforts, the following series of key questions can help focus attempts to track progress: • Why conduct a risk communication program? • What am I trying to obtain from my risk communication program? What kinds of information do I need? What do I want to know? What kinds of questions will lead to the information needed? • Is the information to be used to help inform the decisionmaking process and clarify goals or objectives? For whom and through what mechanism? • Is the design appropriate for the study? Careful attention to framing responses to each question before initiating risk communication activities can increase the likelihood of program success. Moreover, ambiguous results and lack of understanding between the messenger and audience can occur unless there is clarity of presentation and timeliness. This underscores the need for evaluators to recognize that effective risk communication is a continuum, not a dichotomy. Clearly, evaluation can be an important part of a management information system. There are several objectives in tracking that illustrate this point. First, a well designed tracking system makes it possible to detect important events and interactions among events. Second, it can generate information during the course of the program in order to identify the nature and significance of such event. Third, tracking systems provide continuous awareness and evaluation of trends to guide choices of action. Moreover, as part of a comprehensive management information system, the risk communication program's tracking activities direct attention to data needs which might otherwise be overlooked. Finally, the insights obtained from such ongoing, systematic appraisal can inform decisionmakers of the need for anticipatory action and stimulate proactive instead of reactive management. 
In summary, tracking progress: • Helps decisionmakers identify what they are giving up for the sake of accommo- dating organizational and political pressures • Maintains study relevance • Provides early warning about things that are going wrong • Aids in making mid-course corrections ------- Tracking the Health Objectives for the Nation 95 Planning on Evaluation For tracking, the kinds of resources potentially available to agencies include: internal staff specialists, the general pool of employees in the organization, an internal ad hoc team, internal management personnel, or outside contractors. For example, obvious sources for obtaining resource materials include experts in the Environmental Protection Agency or the General Accounting Office. Such resources can provide insights into the following considerations for planning and evaluating a risk communication program: • What criteria are to be used to judge a program? What can be proved or disputed? • What kinds of outcomes are likely to emerge? • What may be alternative strategies in dealing with a situation? • What is most easily evaluated? • In monitoring a program for ecological effects, what kind of data are needed, how can the data be manipulated, and what do the data tell? • To inform decisions, what minimum kinds of data are needed? • Is there internal validity of program design? (This can help sidestep a major problem: uncertainty about whether or not the perceived change is a result of the program.) • Will results be quantifiable? If evaluators are to keep their work relevant to tracking progress, they need ongoing information about a variety of program elements. For instance, it is important to monitor what is happening in day-to-day activities. Continuous monitoring helps to identify new questions that the evaluation will be expected to answer. It also can aid in pinpointing changing conditions, which can create a situation in which the program should try to achieve different goals from those originally set. Finally, continuous monitoring makes it somewhat easier to detect unexpected developments and changes that are significant from a scientific standpoint. Making the best, direct use of evaluation results requires the following: • Clear decision points illuminated by specific questions • An evaluation design appropriate to the purpose and a completed study supplying evidence on questions identified for study • Unambiguous results • Clarity and timeliness of presentation to appropriate audience • Congruence of values • Relevance of results to contemporary situation • Lack of external pressures that constrain choices made by decisionmakers • Sufficient resources to apply findings in the context of the risk communication program • Authority to change or to modify the program as indicated. ------- 96 Tracking Progress In summary, tracking makes sense if only to monitor a program and even if it consists only of process evaluation. It is a good management tool, which tells the practitioner whether a program is working as planned. In addition, tracking is helpful for informed error correction. The monitoring system should be crafted to match the communication program. Tracking can provide the information needed to make decisions about whether or not continued allocation of resources is justified, by answering these fundamental questions: Is the program working? Is it something we should continue to do? 
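As a minimal sketch, added here for illustration and not part of the original commentary, tracking data might feed those two questions through a simple decision rule; the indicator names and thresholds below are hypothetical assumptions, not criteria from the workshop.

# Illustrative sketch only: turning simple process and outcome indicators into a
# continue/adjust/stop suggestion. Indicator names and thresholds are hypothetical.
def resource_decision(pct_activities_completed: float,
                      pct_audience_reached: float,
                      observed_change: float,
                      target_change: float) -> str:
    """Suggest a resource decision from process and outcome tracking indicators."""
    if pct_activities_completed < 0.5:
        return "fix implementation before judging the program"   # process problem
    if observed_change >= target_change:
        return "continue: program appears to be working"
    if pct_audience_reached < 0.5:
        return "adjust delivery: message is not reaching the audience"
    return "reconsider the strategy or reallocate resources"

print(resource_decision(0.9, 0.4, 0.02, 0.10))
# 'adjust delivery: message is not reaching the audience'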
------- Benefits to Conducting Midcourse Reviews Max Lum An obvious but often overlooked point with regard to evaluation is that a program must be implemented before it can be judged. The realities of implementation are: • The program may not actually exist in the community where it has been "implemented" in the form originally intended. • Final acceptance by the community is never certain, even if program imple- mentation has occurred. • Implementation always contains unknowns that may change the original objectives and the intent of the evaluation design. Studies of federal programs have concluded that implementation problems are the reasons that programs are most often unsuccessful. The main problem in program implementation is that managers often make the assumption that the process is rational and quantitative. However, in reality, implementation usually does not fit the clear research and development model; it lacks specificity and there may be little active "user"—or public—involvement in the model. The types of information one can obtain about and during implementation are routine management information (e.g., costs, numbers of people involved); process information (e.g., what happens during implementation); and treatment information (e.g., how it is being implemented, what treatments are being used, and what effects have occurred). Barriers to effective risk communication program implementation include: • Personnel turnover; understaffing • People refusing to give up their own ideas and conform with the planned program strategies • Emotional outbursts; conflicts between staff and the public • Muddled communication about what implementation entails • Lack of anticipation of problems and plans for handling them • Poorly composed objectives 97 ------- 98 Tracking Progress • Undue haste in implementing program without sufficient planning, training, or agenda setting • Compulsion to spend money before fiscal year ends without attention to planning or realistic expectations for implementation • Management conflicts, differing points of view and goals • Insufficient or unskilled planning Although midcourse review cannot, of course, prevent these problems from occur- ring, such a review can identify problems, both anticipated or unanticipated, that can prevent a program from reaching its objectives. A major benefit of midcourse review is identifying problems at a time when corrections can be made, to try to assure that program objectives can be met. Whether the problems stem from incorrect execution of plans, faulty judgment in planning, or unexpected circumstances, the purpose of this kind of evaluation is problem identification. The design of a midcourse evaluation must consider the transition of the program from the design stage, which may have reflected a logical, or "ideal," situation into the "real" world, where influences within and outside of the program manager's control will affect the program outcomes. Therefore, in designing and implementing a midcourse review, program managers should consider factors such as these: • Is the process of implementation formal or informal? • Is control centralized or decentralized? • Is management authoritarian or participatory? • Is the program structure hierarchical or egalitarian? • Is the community divisive or cohesive? • Is the program isolated or community oriented? • Are the methods of communication standardized or individualized? • Is response and interaction controlled or expressive? • Are strategies partitioned or integrated? 
Finally, it must be recognized that midcourse review is but one of many useful evaluation strategies. If the program is a one-time effort with sufficient depth, length, and resource to make corrections, this might be the most important strategy choice to assure that program objectives are met. ------- Deciding on the Extent of Evaluation Elaine Bratic Arkin There is no one answer to what kind of—or how much—evaluation should be included in a risk communication program. A number of factors contribute to the decision about what evaluation tasks to undertake. It is essential that evaluation considerations be included in the planning phase of a program to assure that adequate time and resources are allocated and that any preintervention tasks, such as collecting baseline data, can be accommodated. Why Evaluate? Evaluation offers a number of benefits to risk communication managers. Formative evaluation (such as pretesting program message strategies or draft materials) promotes effectiveness, indicates potential problems, and permits revisions prior to expending final production budgets or moving a program into the field. Formative research or evaluation also can provide a more complete understanding of the problem and the population affected, building a stronger rationale for the interventions that will follow. Process evaluation, such as tracking the effects of a program underway, can alert the manager to the strongest and weakest program components, allowing mid-course ad- justments. Therefore, both formative and process evaluation help managers determine whether an activity or program can be improved. These kinds of measures can provide some predictors of program effects as well. Adding outcome evaluation tactics to a risk communication program can provide evidence of whether the program or activity works. Outcome measures also can uncover other program consequences (unexpected effects); provide the basis for deciding whether additional interventions— and what kind—are needed to reach program goals; justify expenditures for similar activities; and generate ideas for new interventions and programs. Outcome and impact evaluation data can provide a strong response to the need for institutional, public, or political accountability, and can help the program manager, agency policymakers, or others determine the cost versus benefit of the program. 99 ------- 100 Deciding on the Extent of Evaluation Sometimes, evaluation components are included in a program because of agency requirements, public demand, or political or interest group pressure. No matter what the incentive or requirements are, there can be strong benefits to evaluation. Risk program managers should be aware of these benefits, carefully assess which kinds of evaluation will be of greatest value, and incorporate the most appropriate evaluation tasks into their risk communication efforts. How Can Barriers to Evaluation Be Overcome? A number of barriers to conducting appropriate evaluation exist. Some can be overcome; some cannot. Managers can develop strategies to avoid or deal with such barriers, but also must be prepared to assess whether an obstacle or problem will so undermine the integrity of a planned evaluation that the evaluation should be reconsidered, restructured, or abandoned. Some obstacles may be integral to the risk communication problem to be addressed or the activities planned. 
For example, the need for an emergency response will probably prevent conducting formative evaluation, or any evaluation requiring the collection of baseline data. Emergencies also may tax an agency's ability to respond, and time and resources may not be allocated to evaluation tasks. Nevertheless, an analysis of evaluative data after the fact (such as reviewing the effect of agency actions, the quality and extent of media coverage, or public response) can help the agency determine how well it responded to the emergency and help plan for similar situations in the future. Although many emergencies appear to be unique, there are few instances where lessons cannot be learned about staff capabilities to respond, intra-agency coordination, and logistical procedures that work or need to be rethought.

Although not deemed emergencies, some other situations may prevent or handicap evaluation. Sometimes a program manager faces a short deadline, and optimal evaluation must be sacrificed rather than shortchanging implementation tasks. In this case, a sound communication program plan with evaluation tasks intertwined may provide justification of the need for more time. Similarly, a lack of trained staff or sufficient resources can hamper evaluation attempts. Staff knowledge sometimes can be supplemented with advice from other agency offices, other agencies, institutions, or universities. Even if an evaluation position cannot be supported on staff, staff can be trained to conduct some evaluation tasks, perhaps with the guidance of evaluation experts. Program managers must be able to judge whether simple evaluation steps can be undertaken by their staff, when outside help is needed to plan or carry out an evaluation, and when a lack of expertise or resources should lead to a decision not to evaluate. A poorly designed, administered, or analyzed evaluation can result in a waste of resources, faulty conclusions, and an agency bias against evaluation. Staff or agency interest, political needs, or investment in an intervention also may make self-evaluation unwise, even if the staff has the necessary skills. In this case, the program manager may need to decide whether funds can be secured to underwrite evaluation by a more neutral party.

The risk communication program design itself may preclude certain types of evaluation. For example, the intervention may be too shallow or the time frame too short to expect a measurable impact. On the other hand, the intervention may be based on well-established, previously evaluated strategies; in this case, resources may be allocated more usefully to other aspects of the program. Program schedules or resources may not allow collection of baseline data, eliminating the possibility of some kinds of evaluative measures. And even the most careful evaluation plans can be laid waste by unexpected events affecting the intervention or the sponsoring agency. However, process or formative measures still may be valuable.

Institutional resistance is one of the most frequently cited reasons for not evaluating. Agency decisionmakers may not appreciate the value of evaluation, may have internal policies that make conducting an evaluation difficult, may have other spending priorities, may disagree about a program's objectives, or may not want to find out the effects of an intervention (and be held accountable). These barriers usually can be overcome, although perhaps not in a short time.
Presenting sound, clear, and understandable justification, including program accountability, for evaluation tasks, and showing examples of how other evaluation findings have been useful and applied can overcome agency resistance. What Kind of Evaluation? Decisions regarding the kinds of evaluation to conduct are based on agency support, understanding, and needs; resources; the program design; desired outcome; and related future agency activities addressing the same or similar problems. An ideal risk communication program would be designed to accommodate a balance of formative, process, and outcome evaluation measures, because each kind of measure contributes differently to the quality of the program and an understanding of its effects. However, few programs are structured to accommodate all of these kinds of evaluation, and few agencies enjoy the resources and commitment to support elegant schemes. Each type of evaluation serves a different purpose. Formative evaluation helps identify potential problems and refine program elements before full-scale implementation. Formative evaluation techniques may be more useful than other kinds of evaluation when there is sufficient time allocated for program pretesting and revision, and when a program is a one-time effort, with no opportunity for refinement in the field. Process evaluation strategies provide some indications of program effects and are particularly useful for program management. Process evaluation can identify logistical and other program problems in time for correction while a program is underway. Such indicators also supply some evidence of success and failure when other kinds of evaluation are not feasible or affordable. Outcome evaluation assesses the effect of a program or strategy after it has been implemented. Outcome measures are important because they go beyond how a program worked to address what changes occurred in the target population as a result. Such measures help a program manager decide whether a particular program or strategy was sufficient to resolve the problem, to decide whether and what kind of additional efforts will be needed, and to justify support for using such strategies with similar situations in the future. ------- 102 Deciding on the Extent of Evaluation How Much? Deciding how much effort and resources to devote to evaluation is frequently the most difficult question to answer. Sometimes a manager must forfeit desired evaluation tasks to the urgency of an intervention or lack of resources to support both implementation and evaluation (although with a little creative thinking, some kind of evaluation is affordable for almost every budget). At the other extreme, a risk communication program may be designed to test a specific strategy, or series of intervention strategies, with the expectation that a successful model could be widely used. In this case, it is not unusual for the evaluation costs to exceed the communication development and intervention costs, with these expenditures regarded as an investment in future risk communication program efforts. Thus one major determinant of how much evaluation to include in a program design revolves around the program purpose. If the program elements are on trial for replication and application to similar situations, a more elaborate evaluation may be justifiable as a wise investment. 
Summary Deciding upon the type and extent of program evaluation should be an integral part of risk communication program planning, when there is an opportunity to consider resource allocation for evaluation. Some kinds of evaluation can serve as a powerful management tool for a program manager; frequently, these evaluative tasks are conducted prior to or during program implementation and cannot be relegated to last minute decisionmaking. Other evaluation tasks are designed to measure the extent of program success and failure, and why efforts did, or did not, work. Evidence of success is important to justify current and future risk communication efforts. Perhaps even more valuable is the identification of failures, so that strategies can be altered, mistakes corrected, and future failure avoided. Considering these questions can help the risk communication program manager make the difficult and important evaluation choices: • How urgently must the risk communication problem be addressed? • What would be the consequences of failure? • Is there management support or public demand for program accountability? • How long will the program be and how much will the total effort cost? • Are the program objectives measurable in the foreseeable future? • How else might the problem be addressed in the future? Will an analysis of program effects be used for planning additional efforts? • What aspects of the program best fit with agency priorities? • Will an evaluation report help communication efforts compete with other agency priorities for future funding? ------- Matching Your Needs with an Evaluator's Capabilities James W. Swinehart, Shelagh Smith, Vicki S. Freimuth, Charles Darby Several types of services are available to the evaluator from academia as well as small and large consulting firms. It is valuable to know how to choose the best services for optimum effectiveness. Academic Services The partnership between the risk communication practitioner and the academic can work to the advantage of both, but certain constraints, such as time and cost, do apply. The advantages to using the services of academics include assistance in overcoming these constraints and access to expertise not always present in the practitioner's own agency. Academics are conversant with the latest literature on a subject and can help translate theory into action. Furthermore, students can be a source of low-cost or free help for labor intensive projects. For the academic, these evaluation projects can be a source of valuable data and are useful instructional tools. When students are employed on projects outside the university, they gain realistic experience and begin to forge their own networks. An example of an effective evaluation service provided by an academic is a recent analysis of the Cancer Information System database. This proved to be a good match between the practitioner's needs and the evaluator's special expertise. The evaluation's objectives should always be included in the planning process for a risk communication program. The evaluation strategies and the type of help needed from academia should be based on those objectives. Consultant Services Evaluation contractors can provide assistance with the technical development of an evaluation plan as well as the implementation of the plan. Many companies have the staff and flexibility to assist with data collection as well as design and analysis for qualitative and quantitative evaluation methods. 
Design and analysis services can include setting objectives, sampling, developing the instrument, choosing methodology, securing Office of Management and Budget clearance, statistical and qualitative analysis, and interpretation and reporting. Qualitative data collection is used in concept or materials testing. The most commonly employed techniques are focus groups, central location intercepts, small-scale executive interviewing, gatekeeper reviews, and needs assessments. Quantitative data collection involves analysis of existing data or large-scale surveys conducted in person, by telephone, or by mail.

Some suggestions for making the relationship between practitioner and contractor work include:
• Recognizing the advantage of a contract
• Setting objectives jointly and with clarity
• Making the contractor a technical partner
• Tracking costs and progress and assuring accountability
• Allowing the contractor the latitude necessary to do the job

Selecting Assistance

The choice of evaluation assistance can be made from among academia, full-service providers, or firms that specialize in certain facets of evaluation, e.g., focus groups, central location intercepts, market research, analysis of program issues, special populations, and executive interviewing. The appropriate selection will depend, to some extent, on the risk communicator's determination of the type and extent of evaluation services needed. To make such a determination, the risk communicator first poses the question(s) to be addressed through evaluation and then considers the options for answering the question (and their respective costs). Here are some examples:

1. QUESTION: WERE THE CAMPAIGN MATERIALS REGARDED AS APPEALING AND UNDERSTANDABLE BY THE TARGET AUDIENCES?
METHOD: Testing with persons representative of designated audiences
COST: Estimated range of $10,000 to $40,000, depending upon number of items and number of audiences

2. QUESTION: WERE THE CAMPAIGN MATERIALS REGARDED AS APPEALING AND APPROPRIATE BY MEDIA GATEKEEPERS?
METHOD: Interviews and/or questionnaires with appropriate persons at TV networks and stations, cable systems, radio networks and stations; magazine and newspaper editors; others as appropriate
COST: Estimated range of $5,000 to $20,000, depending upon number and location of persons interviewed

3. QUESTION: DID THE CAMPAIGN MATERIALS ACHIEVE A LEVEL OF DISTRIBUTION AND PLACEMENT THAT GAVE THE TARGET AUDIENCES ADEQUATE OPPORTUNITIES FOR EXPOSURE TO THEM?
METHOD:
TV spots: reports from persons in agencies distributing these items, plus monitoring of on-air placements by a tracking firm such as Broadcast Advertisers Reports
Radio spots: interviews/reports from persons in agencies distributing spots, and radio station personnel (reports on placement to include, where feasible, printouts from stations regarding dates and times of airings)
Print ads and articles: interviews/reports from persons in agencies distributing ads and articles (reports to include copies of ads or tear sheets of articles as published)
Other materials: interviews/reports from persons in agencies distributing these items, and, as appropriate, from other intermediaries
COST: Estimated range of $25,000 to $75,000, depending upon number/type/exclusivity of monitoring reports purchased, use of clipping services, number and location of persons interviewed as distributors or users of materials

4. QUESTION: HOW SATISFACTORY WERE THE ARRANGEMENTS AND PROCEDURES THAT WERE USED IN MAKING CAMPAIGN DECISIONS (SETTING OBJECTIVES, CHOOSING TARGET AUDIENCES AND COMMUNICATION STRATEGIES, DEVELOPING MATERIALS, ETC.)? WHAT CHANGES, IF ANY, SHOULD BE MADE IN PLANNING FUTURE CAMPAIGNS?
METHOD: Using a set of specific questions, interviews and invited commentary from participants in the process (with assurances of anonymity)
COST: An estimated $5,000 to $10,000

5. QUESTION: HOW SATISFACTORY WERE THE PROCEDURES USED TO OBTAIN COOPERATION FROM OTHER ORGANIZATIONS (IF ANY WERE USED), AND TO COORDINATE ACTIVITIES WITH THEM? WHAT CHANGES, IF ANY, SHOULD BE MADE IN THESE PROCEDURES FOR FUTURE CAMPAIGNS?
METHOD: Using a set of specific questions, interviews and invited commentary from staff, coordinators for the campaign, and from representatives of organizations involved in this campaign or parallel ones (with assurances of anonymity)
COST: An estimated $5,000 to $10,000

6. QUESTION: TO WHAT EXTENT DID THE TARGET AUDIENCES NOTICE THE CAMPAIGN MATERIALS AND PAY ATTENTION TO THEM?
METHOD 1: Three surveys conducted at appropriate intervals among samples representative of the designated target audiences, assessing both unaided and aided recall of campaign materials
COST: Estimated range of $15,000 to $50,000, depending upon the samples used, data collection procedures, and the possible opportunity to share costs with others (for example, through participating in multi-sponsor surveys)
METHOD 2: Analysis of sources of inquiries prompted by the campaign (made possible by use of keyed box numbers or other identification on campaign materials — e.g., Box A on radio spots, Box B on TV spots, etc.)
COST: Estimated range of $6,000 to $15,000
METHOD 3: A continuing survey of persons submitting inquiries or requests prompted by the campaign (data to be obtained by enclosing a reply card or brief questionnaire when fulfilling information requests—perhaps in a tenth or fewer of the packages sent out, depending on the total number of requests received)
COST: An estimated $8,000 to cover reply postage and some analysis time on the part of staff

7. QUESTION: TO WHAT EXTENT DID THE CAMPAIGN ALTER THE BELIEFS, ATTITUDES, AND BEHAVIORAL INTENTIONS OF THE TARGET AUDIENCES IN THE DIRECTION INTENDED? DID ANY CHANGES OCCUR THAT WERE OPPOSITE TO THE ONES INTENDED?
METHOD: A series of three surveys (conducted one month before the campaign begins, at its midpoint, and one month after it ends) with panels representative of the designated target audiences. Each of the panels should include in the second and third surveys two identifiable subsets of people: (1) those who are able to recall seeing or hearing campaign materials pertaining to them, and (2) those who have not seen any campaign materials pertaining to them and/or are unable to recall these materials even with prompting.
COST: An estimated range from $15,000 to $50,000, depending upon the samples used, data collection procedures, and the possible opportunity to share costs with others

8. QUESTION: TO WHAT EXTENT DID THE CAMPAIGN ALTER THE ACTIONS OF THE TARGET AUDIENCES IN THE DIRECTION INTENDED? DID ANY CHANGES OCCUR THAT WERE OPPOSITE TO THE ONES INTENDED?
METHOD 1: A series of surveys with samples representative of the designated target audiences, scheduled appropriately for each one (e.g., one month before the campaign, at its midpoint, one month after it ends, and one year later), with exposure to the campaign either controlled or assessed
COST: Local or regional surveys may run from $5,000 to $25,000; national surveys $50,000 to $150,000 (factors affecting cost: size and nature of samples, data collection methods, number of open-end questions, etc.)
METHOD 2: Final analysis and summary of volunteered comments from persons noticing or using campaign materials (Note: Such anecdotal evidence of campaign impact is weak in comparison with the kinds of data produced by the surveys indicated above, but it can be extremely useful in illustrating the points made in more representative studies.)
COST: An estimated $3,000 to $5,000, depending upon the number of items reviewed

9. QUESTION: TO WHAT EXTENT DID THE CAMPAIGN STIMULATE INQUIRIES FOR INFORMATION ABOUT THE TOPIC(S) COVERED?
METHOD: A month-by-month tally of inquiries, classified by topic, source, and other relevant characteristics; such tracking to begin at least a month prior to the start of the campaign and continuing for a year after it ends
COST: No additional cost, assuming that tasks of this kind are already being performed (and will continue to be performed) by staff on a regular basis

Once the evaluation needs have been outlined, criteria for selecting contractor support should be determined. The criteria should be based on the kinds of experience, skills required for the methods chosen, cost, and other considerations. These include:
• Academic or other training
• Experience with chosen methodology
• Age, sex, ethnicity, other characteristics if important for working with a selected target audience
• Samples of reports and/or references from previous clients
• Population characteristics and recruiting procedures
• Possible client conflicts
• Contractual matters: schedule, confidentiality of findings (and topic, if desired), fees, review and possible revision of report, etc.

Resources

The following guides and directories are valuable to anyone involved in risk communication evaluation.

The Green Book—directory of market research firms and services; new edition available in April each year; cost: $50.00
Available from: American Marketing Association 310 Madison Avenue New York, NY 10017

The Blue Book—directory of some agencies and organizations represented in the membership of the American Association for Public Opinion Research (AAPOR); new edition published each year; free
Available from: AAPOR P.O.
Box 17 Princeton, NJ 08542 Directory of Focus Group Facilities and Moderators: new edition published each year; cost: $25.00 Available from: National Focus Group Network 33 Junction Road Brookfield Center, CT 06805 What is a Survey?—provides flow chart of process and checklist of items to budget; free Available from: American Statistical Association 806 15th Street, NW, Suite Washington, DC 20005 Newsroom Guide to Polls and Surveys: cost $10.00 Available from: American Newspaper Publishers Association 11600 Sunrise Valley Drive Reston, VA 22091 ------- MEASURING ACCOMPLISHMENTS ------- Considerations for Planning Risk Communication Robert W. Denniston Measuring accomplishments begins at the beginning: with a thorough understanding of the risk communication program objectives and a program foundation based on realistic expectations for risk communication messages within a broader risk abatement program. Without adequate consideration in the program planning stage, appropriate evaluation cannot be designed, and measurement may later reveal basic, avoidable faults in program design. Planning risk communication messages necessitates a review of the problem, public knowledge, attitudes and behavior related to the problem, and an understanding of how the public views risk messages. Public Perceptions of Risk Messages The following are some of the general obstacles to public understanding of risk messages. • Risk is an intangible concept, and requires effort on the part of the public to understand. • The public does not understand relative risk and may underestimate or overesti- mate their personal vulnerability. • People seek absolute answers, and risk messages address intangible, invisible hazards without concrete outcomes. • The public reacts unfavorably to fear, and fearful messages may result in uncalled for outrage or, conversely, denial of an important risk. • The public has a strong tendency to underestimate personal susceptibility. • Individuals have contradictory beliefs that interfere with their understanding of risk messages; they can, at the same time, believe that "it can't happen to me" and "everything causes cancer." • Most people lack a future orientation, and threats that may materialize far in the future are easy to put aside. • The public does not understand science. Technical data, risk models and the variables involved in calculating risk, and the fact that scientific knowledge is not 111 ------- 112 Measuring Accomplishments static but evolving over time, all add obstacles to public understanding of risk messages. In addition, environmental risk messages are more difficult for the public to accept than personal health messages. For example: • People grasp easy solutions, and easy solutions to environmental risk problems may conceal many complexities and obstacles. • Individuals desire personal control over their well-being, and most environmental risks are more amenable to governmental or institutional control than individual control. • Individuals seek guidance adapted to the personal level, and most environmental risk messages look at the problem on a community or societal level. • Most people have more pressing priorities, including more immediate threats to their health. Assessment of Available Data Risk communication planning begins with a review of available data about the risk problem. The decision to develop messages for the public should be based on the answers to questions like these: • Is sufficient information available to explain the problem to the public? 
• Will more data be forthcoming within a reasonable amount of time?
• Are there compelling reasons (e.g., issues of right-to-know, public or political pressure, need for public action) for informing the public even if insufficient information is available?
• What is the purpose of developing messages for the public?
• Are these expectations realistic?
• Is the government or other responsible body prepared to handle public reaction or response?

A realistic review of the answers to these questions will help shape risk communication messages.

What Risk Communication Can—and Cannot—Do

Risk communication by itself is not the answer but one component of the response to environmental and other risk situations. Well-designed communications can increase awareness of a problem as well as the options for resolving it; can empower citizens to make changes in their own behavior, or work towards community change; and can provide support for effecting policy or institutional changes to resolve a risk situation. For risk situations in which there is no opportunity for personal action, particular care in message design is needed to reduce frustration and fatalism. In the long term, using risk communication to inform and motivate the affected public is crucial to ameliorating risky situations and assuring the public's health.

Too often, risk messages are judgmental, rather than neutral or supportive. The public may hear contradictory messages from different sources about a situation. Although there may be different points of view about a risk, and different audiences need different types of information, clear, direct messages can reduce unnecessary conflict, misunderstanding, and mistrust in all cases.

The design of risk messages must include consideration of what the public knows and understands about the issue, and what interests and concerns it has. The complexity of most risk communication messages, juxtaposed with the public's desire for answers and solutions, poses a challenge for designing risk messages that will further the program's purpose and not merely enrage the recipients. Formative evaluation, including message pretesting, is essential to assuring that messages are appropriate and effective prior to their dissemination.

Conclusions

Risk communication should be an integral part of larger risk programs, but will not produce a positive effect without careful planning and development. Careful planning includes a thorough understanding of the intended audience, including its knowledge, attitudes, and interest in the subject; a clear statement of the intent of the communication (e.g., to inform, persuade, and/or influence attitudes or behavior); and formative evaluation prior to public dissemination to reduce the chances of unintended effects.

-------

Four Factors in Designing Evaluation Strategies

David McCallum

In planning environmental risk communication programs, it is often difficult to define intended program outcomes, a difficulty that becomes evident in the developmental stages of these programs. Outcomes are more definable in some other kinds of health programs, albeit hard to measure (e.g., increasing the percentage of the population that has controlled blood pressure or eats certain foods). The difficulty lies in determining what kinds of responses and attitudes risk communicators want people to adopt.
Frequently agencies set goals, such as "the public should make more informed decisions," but translating goals into specific, directive messages—as well as measurable objectives—is difficult if not impossible. This paper presents four planning considerations that can help overcome these difficulties.

Defining the Problem

First, communicators should begin with a definition of the problem and a statement of desired outcomes. It is helpful to differentiate between goals and objectives. Overall goals can then remain constant regardless of the vested interests of program objectives. For example, with Superfund, the goal is to clean up waste sites and reduce exposure; the objectives will consist of intermediate steps toward this goal. Some of these steps may include risk communication strategies to produce specific outcomes, such as public awareness and participation. Most infectious diseases cause adverse effects sooner than do exposures to chemicals; because of the long latency period after exposure to toxic chemicals before the manifestation of adverse effects, articulating goals can be very difficult. Risk communication program goals might include behavioral changes that people should make, with objectives addressing the intermediate steps toward behavior change (e.g., awareness and attitude change). If people decide not to follow certain risk-reducing recommendations, is it due to a change or difference in their values or a lack of access to information and other supporting services?

Frequently the public is characterized as irrational; that usually means that the public does not agree with the risk manager's explanation. Program planners need to be cognizant of and track attitudes, because these precede behavior change. Furthermore, they should determine the definition of risk communication program success (e.g., an informed public, informed decisionmaking, public support, or changed behavior).

Initially, do not focus too heavily on deriving measurable objectives. Although these are important and should be included where possible, the emphasis in program design should be on what information is needed to understand whether and how well the program is working. Program designers should understand the kinds of questions other people are likely to ask. For example, the framers of the Superfund Amendments and Reauthorization Act (SARA) Title III intended the overall goal to be a reduction in toxic emissions; but the communicators' objectives are to provide incentives for people to work together.

Then, specify a numerical or percentage change in a measure as a result of program intervention, and be realistic. A 3- to 4-percent increase in the number of homes that are owner-tested for radon may sound low; restating this as a "statistically significant change" (which it is) may improve the perception of the value of the program. The 3- to 4-percent increase can represent, in some cases, a large absolute number.
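To make the point about statistical significance concrete, the following minimal sketch (written in Python, with entirely hypothetical survey counts rather than figures from any program discussed here) applies a standard two-proportion z-test to a baseline and follow-up survey:

from math import sqrt, erf

def two_proportion_z(x1, n1, x2, n2):
    # z statistic and two-sided p-value (normal approximation) for the
    # difference between two independent sample proportions
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p2 - p1) / se
    phi = 0.5 * (1 + erf(abs(z) / sqrt(2)))   # standard normal CDF at |z|
    return z, 2 * (1 - phi)

# Hypothetical figures: 6 percent of 1,500 baseline respondents had tested
# their homes for radon; 10 percent of 1,500 follow-up respondents had.
z, p = two_proportion_z(90, 1500, 150, 1500)
print("z = %.2f, two-sided p = %.4f" % (z, p))   # roughly z = 4.0, p < 0.001

With these hypothetical counts, a four-point rise is highly significant; with samples of only a few hundred respondents, the same change would not reach conventional significance levels, which is one reason the sample size decision belongs in the planning stage.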
The risk communicator must provide a link between the communication objectives and the program goals. This link, the program rationale, answers the questions of policymakers and others about the purpose of the risk communication program.

Risk communication is often undertaken after a population has been exposed to a risk factor, such as after exposure of workers to a toxic chemical. In these cases, initial intervention is not possible. In the short term, there may be an increased reporting of adverse effects from the exposure. Therefore, criteria for measuring objectives must be established in light of the natural history of the population so that, if there is an increase in adverse effects in the short term, the risk communication program is not criticized undeservedly.

In program design, it helps to preserve the greatest number of options when measuring outcomes. Design "triggers" to tell whether unforeseen outcomes or results are taking place. This is important because in setting up objectives, an unforeseen outcome that is just as important as the stated objectives may be missed. For example, a program might be designed to increase people's knowledge about a health issue. While this objective is very hard to quantify and observe, another unforeseen program outcome could take place: People exposed to the risk communication program might visit a health clinic as a result of increased awareness. This is, in fact, a more desired outcome than increased knowledge alone. Yet, it could be missed if the evaluation design did not include ways to track this type of behavior change.

Program planners should identify early in the program design stage what the confounding variables could be. That is, there may not be the ability to control some factors that might influence the outcome. Program administrators need to know not only whether the objectives were achieved, but why (or why not). "What happened" should be identified, regardless of whether everything that happened was intended or not. If what went right—or wrong—cannot be identified, then the public next asks, "who went wrong." Evaluation is a useful tool to identify program dysfunctions.

Planning How to Use Results

Program administrators should know how they will use either a positive or a negative result. In government programs, the value of a "success" may not be weighted as highly (in terms of desirability) as problems that could arise if the risk communication were unsuccessful. Therefore, situations that could be created as a result of the communication (e.g., public outrage, new legislation, a change in resource commitments) should be considered for inclusion in the evaluation design.

It is also important to understand what program administrators should do with the results of the measurement of outcome or accomplishment. The results can be used to answer questions like these:
• Should the strategy of the program be changed?
• Should the program be marketed to increase the use of its effective components?
• Can positive evaluation data be used to leverage additional resources from the community or other program supporters?

Establishing Timeframes

A third factor to consider in designing programs so that accomplishments can be measured is the timeframe. It is important to know the decision timeframe of, for example, a remedial investigation, so that evaluation of the outcomes of a risk communication program can be built into the plan. The timeframe for political, technical, and social processes may not coincide with the evaluation timeframe. For example, legislation often imposes timetables on a program; the effect is that program goals must be established to meet the specified timetable, rather than addressing a desired outcome of the program or being consistent with the natural timing of communication programs. Interim objectives that can be accomplished within the legislative timeframe are essential in this case.
When designing the evaluation, it is very important to identify the timeframes of these processes. Failure to do so can lead to a situation in which, for example, the required time to achieve a change is longer than the timeframe for the evaluation, so results cannot be shown. Because of the inflexible nature of some of the political, technical, or social timeframes, it may be necessary to establish interim measures that can take place within the established timeframe. Using Resources A fourth factor to consider in the design of risk communication programs is the effective use of evaluation resources. What level of validity can be achieved with the resources available? This needs to be made explicit during the planning program. The greater the level of social controversy surrounding an environmental risk, the more resources will need to be allocated for evaluation. In such cases, data produced must be able to withstand intense scrutiny. "Cost saving" is a particularly tricky evaluative measure. For example, the cost- effectiveness of disease prevention programs became an issue with the emphasis on health care cost containment in the late 1970s and early 1980s. Results of studies showed that live people cost more than dead people; unless the "social good" of people staying alive was considered, it appeared to be more costly to prevent deaths. The programs, however, were cost-effective using broader measures. ------- 118 Measuring Accomplishments In summary, a number of factors must be considered in planning program evaluation and incorporated into the design of the program to assure that the needed measurements will be possible. These factors include: 1) a careful definition of the problem so that an appropriate intervention and outcome can be planned; 2) a clear purpose for the evaluation results; 3) a consideration of the varying timeframes involved; and 4) the best use of available evaluation resources. ------- Integrating Evaluation: A Seven-Step Process William H. Desvousges An Environmental Protection Agency (EPA) study has shown that the Agency has given too little attention to evaluating the effectiveness of its risk communication activities (EPA, 1987). Recently, its Risk Communication Program has taken some strides to address the lack of evaluations, but clearly, more needs to be done. Evaluating the effectiveness of a risk communication effort involves subtle consid- erations. For example, former EPA Deputy Administrator Milton Russell persuasively argues that the main challenge facing risk communicators is empowering individuals to make informed choices about the hazards that are under their control (Russell 1988). Yet, he acknowledges that there are legal, institutional, philosophical, and even cognitive limits to influencing how individuals make decisions involving risks, especially environmental risks. Recent studies for EPA have found that two factors in evaluating risk communica- tion effectiveness are often overlooked. First, the benchmarks used to measure effective- ness can have an important effect on the final assessment of a risk communication activity. For example, Johnson et al. (1988) have shown that an "informed consent" or "informed choice" criterion—one in which individuals have access to the best information but make their own choices—yields a different measure of effectiveness than an assessment based on individuals' following an agency's recommendations. 
Second, perceptual, cognitive, and behavioral measures of effectiveness are more reliable than simply asking people for their evaluations of risk communication materials. In one study, for example, more than 85 percent of homeowners gave a fact sheet high ratings, while the learning and risk perception measures showed much lower levels of effectiveness (Smith et al. 1987).

Even so, evaluations of risk communication programs must be practical. Common sense suggests that little can be gained from spending more on the evaluation than on the entire risk communication effort. This paper presents a comprehensive framework and a seven-step approach for evaluating risk communication. It argues that integrating evaluation with risk communication will increase substantially the overall effectiveness of risk communication activities.

Risk Communication Framework

The most important step in developing a framework for evaluating risk communication is to develop a clear definition of risk communication itself. Several experts expressed the need for such a definition at the first major conference on risk communication sponsored by the Conservation Foundation (Davies et al., 1987). However, presentations at the 1988 Society for Risk Analysis meetings, which devoted several sessions to risk communication topics, showed that many participants used strikingly different definitions. A comprehensive definition for risk communication would include three important aspects of the communication process:
• Perceptions: How do people perceive environmental risks?
• Practices: What messages about risk are developed and how are they communicated?
• Process: To what extent are various groups involved in the communications process?

The communications process may affect how risks are perceived as well. Creighton (1988) suggests that involving key groups in the process early and providing several means of resolving conflicts can improve the chances for successful communications.

Success with communications can be measured in several ways and from several perspectives. The following criteria commonly are used in evaluations:
• Information delivery: Did the target audience receive the message?
• Information processing: Did the target audience process the message "correctly"?
• Information impacts: Did the target audience take the recommended actions? And, did they make informed choices?

While these criteria are not exhaustive, they illustrate the subtle considerations that arise in evaluating risk communications. Information delivery simply asks whether people were exposed to the message. Under this criterion, the larger the exposure, the more successful the program. The second criterion adds an additional consideration to the evaluation. Is "correctly" mainly a cognitive criterion? That is, does it require only that the target audience understand the risk communication message? Or, does it also imply that the audience should interpret the message in the way that the communicator considered correct? The same distinctions arise in the assessment of information impacts, except that behavioral changes are the main focus. The "informed consent" evaluation criterion may be the most appropriate (Johnson et al., 1988). Under this criterion, people are assumed to make their decisions on the basis of the best information available. A radon risk communication program may be judged successful even when homeowners, using sound information, choose not to test their homes for radon.
Nevertheless, implementing such a criterion can be complex (Desvousges et al., 1988). ------- Integrating Evaluation: A Seven-Step Process 121 Despite the subtleties involved in developing a risk communication framework, the need for such a framework is critical. It is necessary to provide a clear definition of risk communication and a sound basis for evaluating its effectiveness. Seven Steps of Evaluation Two separate campaigns concerning radon used seven steps to evaluate risk communication effectiveness. One study, which took place in New York State, aimed to communicate with 2,300 homeowners throughout the state who had already tested for radon (Johnson et al., 1988). The other study was carried out in Maryland, and its objective was to inform homeowners in two communities about radon tests (Desvousges et al., 1988). EPA cooperated with the states in both studies. The seven steps used to evaluate risk communication effectiveness are as follows: 1. Define risk communication objectives 2. Design communication program 3. Determine measures of program effectiveness 4. Design effectiveness evaluation 5. Develop implementation plan 6. Evaluate 7. Determine communication effectiveness To define the risk communication objectives (Step 1), the purpose, significance, and constituency must first be defined. For example, what is the risk communication expected to accomplish? Why is the risk communication important, and how can importance be shown? Who will benefit from the risk communication, and why is it important to reach those people? Once these questions have been answered, clearly defined objectives can be stated. Designing the risk communication program (Step 2) demands attention to integra- tion and workability. Features include such activities as mailing informational brochures, setting up toll-free numbers for questions, or providing diagnostic assistance. Integration involves defining a message (e.g., radon is a serious health risk; you may be at risk), publicizing that message through the media (print, radio, or television), and targeting a specific audience for the message. Workability involves pretesting to see whether the program works. This can be done through focus groups, expert review, or one-on-one evaluations. Step 2 can be envisioned as a funnel. Many ideas and alternatives are narrowed down, and the best ones are chosen for use in the risk communication program. To determine how to measure program effectiveness (Step 3), participant evaluation or perceptual/behavior indicators are useful. However, participant evaluation is often misleading because, although participants might respond that a program was effective, the actual measured perceptual and behavioral changes might be small. Participant evaluation provides good qualitative but not good quantitative information. Step 4, designing effectiveness evaluation, involves defining the target population(s), identifying the experimental design, and setting controls or limits. Developing the implementation plan (Step 5) involves choosing a plan such as a sampling plan (e.g., stratified random sampling or random digit dialing) or a survey plan (e.g., baseline and follow-up telephone survey; mail follow-up survey). ------- 122 Measuring Accomplishments To do the evaluation correctly (Step 6), the program activities and the evaluation must be integrated; evaluating only after completing the activity does not allow valuable feedback to be incorporated into the activity to improve its results. 
Evaluation activities can include developing and using a questionnaire, training interviewers, and establishing quality assurance.

Analyzing the data and summarizing the findings are necessary to determine communication effectiveness (Step 7). Simple or complex measures can be used to analyze the data. The most informative measures involve giving a pre- and post-survey questionnaire and then estimating changes in behaviors and intentions. Simple measures usually involve changes in means or proportions. For example, in the Maryland study, EPA evaluated changes in the proportions of people aware of radon, changes in the mean number of correct answers on a radon quiz, and changes in the proportions of people testing their homes for radon. More complex measures involved developing models to describe changes in knowledge, attitudes, and behavior (Desvousges et al., 1988). The same basic data were used for both types of analysis, indicating the importance of carefully planning the overall evaluation process.
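As an editorial illustration of these "simple measures," the short sketch below (in Python, with hypothetical tallies rather than the actual Maryland survey results) computes the three quantities named above from baseline and follow-up survey summaries:

# Minimal sketch of Step 7 "simple measures" from hypothetical baseline
# and follow-up survey tallies (not the actual Maryland data).
baseline = {"n": 1200, "aware": 660, "quiz_correct_total": 4800, "tested": 72}
followup = {"n": 1150, "aware": 920, "quiz_correct_total": 6900, "tested": 115}

def rate(survey, item):
    # proportion of respondents with the given characteristic
    return survey[item] / survey["n"]

changes = {
    "proportion aware of radon": rate(followup, "aware") - rate(baseline, "aware"),
    "mean correct answers on radon quiz":
        followup["quiz_correct_total"] / followup["n"]
        - baseline["quiz_correct_total"] / baseline["n"],
    "proportion testing their homes": rate(followup, "tested") - rate(baseline, "tested"),
}

for measure, change in changes.items():
    print("change in %s: %+.3f" % (measure, change))

The "more complex measures" mentioned above would instead fit models to the individual responses, for example relating testing behavior to campaign exposure and household characteristics, but they would draw on the same underlying survey data, which is why the overall evaluation must be planned as a whole.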
Implications for Risk Communication Evaluation

The radon experiences suggest four benefits that an agency gains from evaluating its risk communication activities:
• Determining what works and what does not
• Providing ideas for program changes
• Establishing credibility
• Enhancing program effectiveness

The following recommendations are based on experiences in several evaluations of radon risk communication:
• Make objectives explicit
• Use attitudinal, perceptual, and behavioral indicators
• Establish experimental controls
• Pretest program materials and evaluation materials
• Integrate evaluation design and analysis

This paper has drawn primarily from experiences gained with evaluating risk communication for radon. Whether its conclusions apply to other risk communication experiences is an important issue that needs to be addressed in future studies. Another important need is an evaluation guidebook based on a comprehensive evaluation framework.

REFERENCES

Creighton, J.L. 1988. A Comparison of Successful and Unsuccessful Public Involvement: A Practitioner's Viewpoint. Paper presented at the Annual Convention of the Society for Risk Analysis, October 31-November 2, Washington, D.C.

Davies, J.C., V.T. Covello, and F.W. Allen. 1987. Risk Communication. Washington, D.C.: The Conservation Foundation.

Desvousges, W.H., V.K. Smith, and H.H. Rink. 1988. Communicating Radon Risk Effectively: Radon Testing in Maryland. Overview and Summary of Survey Results. Final report prepared for the Office of Policy, Planning and Evaluation, U.S. Environmental Protection Agency. Washington, D.C., and Research Triangle Park, North Carolina: Research Triangle Institute, October.

Johnson, F.R., et al. 1988. Informed choice or regulated risk? Lessons from a social experiment in risk communication. Environment 30:12-15, 30-35.

Russell, M. 1988. Risk Communication: On the Road to Maturity. Paper presented at the Workshop.

Smith, V.K., et al. 1987. Communicating Radon Risk Effectively: A Mid-Course Evaluation. Prepared for the Office of Policy Analysis, U.S. Environmental Protection Agency, under Cooperative Agreement No. CR-811075, by Vanderbilt University, Nashville, Tennessee, and Research Triangle Institute, Research Triangle Park, North Carolina.

U.S. Environmental Protection Agency. 1987. Unfinished Business: A Comparative Assessment of Environmental Problems. Washington, D.C.: The Agency, February.

-------

UNDERSTANDING OMB PROCEDURES

-------

OMB Survey Clearance Procedures

Richard Eisinger

The federal government's Office of Management and Budget (OMB) grants clearances for surveys, including those performed as part of a program evaluation. Any federally sponsored survey of ten or more individuals must go through the OMB clearance process.

The Office of Information and Regulatory Affairs (OIRA) in OMB is responsible for granting survey clearances. The OMB is given the authority for survey clearance under three separate legal authorities, one of which is the Paperwork Reduction Act of 1980. The purposes of this legislation are to reduce the burdens placed on individuals and to coordinate the government's surveying activities. The primary authority for survey clearance comes from the President's Executive Order on Regulations, requiring OMB to approve all federal rules, many of which have reporting requirements. The OIRA is legally required to follow the President's orders. However, most OMB decisions are affected more by questions of survey design than by political policy. The OIRA staff are mostly economists, lawyers, and public policy experts, with a few social scientists. Therefore, there is a small staff with expertise in surveys and data collection.

The factors that OMB staff consider for survey clearance are:
• Duplication: Are there other sources of the same data?
• Burden on the public: Will the survey require an unfair or unnecessary effort on the part of the public?
• Cost: Can the survey be done for less money?
• Practical utility: How will the results be used? This can be called the "So What?" factor. The OMB must be convinced that the results of an evaluation will be used to change something. This is a primary criterion for risk communication evaluations.

Clearly, these are important questions to be answered in designing any survey, whether or not OMB clearance is required.

Another important consideration is cost versus potential benefit. This was important, for example, in an evaluation done by the Food and Drug Administration (FDA) on Patient Package Inserts (PPIs), information sheets for consumers on the effects and possible adverse reactions of specific drugs, which the FDA Commissioner wanted all pharmacists to distribute at the point of prescription drug sale. The national cost would have been $20 to $100 million. The Rand Corporation studied the effects of PPIs and found there was an increase in knowledge but no change in behavior. Therefore, other less costly alternatives had to be considered as a result of the study. In this case, investment in evaluation prevented a requirement that would have cost society millions of dollars with unproven benefits to the consumer.

-------

OMB Regulatory and Approval Requirements

Susan E. Dudley

The objective of OMB's Office of Information and Regulatory Affairs (OIRA) is to ensure that government activities do more good than harm. Two of the primary governmental activities that affect the public are 1) regulations, and 2) paperwork and reporting requirements. The OIRA is responsible for weighing the effects of these activities against their intended results to ensure net benefits to society.
Regulations With respect to regulations, the OMB operates according to procedures outlined in the following two Executive Orders: Executive Order 12291—Federal Regulation (Federal Register 2/17/81); and Executive Order 12498—Regulatory Planning Process (Federal Register 1/4/85) The "general requirements" of Executive Order 12291 are itemized in Section 2, which stresses that net benefits should be maximized whenever any agency promulgates new regulations, reviews existing regulations, or develops legislative proposals concerning regulations. This Executive Order requires agencies to prepare Regulatory Impact Analyses (RIA) for all major rules. Executive Order 12498 builds upon the previous Executive Order by requiring each agency, subject to the provisions of 12291 (which includes EPA), to submit to the OMB an annual statement of its "regulatory policies, goals, and objectives for the... year and information concerning all significant regulatory actions underway or planned" (Section 1). When a regulation is proposed by EPA, the agency is required to send four copies of the proposed regulation to OMB. Within OMB, these copies are distributed to: 1) the desk officer; 2) the budget examiner; 3) the regulatory analyst; and 4) the public file. The copy in the public file is not actually accessible to the public until the rule is published in the Federal Register. The OIRA's review is based solely on the regulatory agency' s record and comments from other agencies in the Executive Branch. The OIRA staff do not communicate with anyone outside the government regarding regulations and thus can focus on consumer welfare without being influenced by special interests. When the rule is approved, OMB notifies the Agency. 129 ------- 130 Understanding OMB Procedures The Executive Order anticipates the following review time requirements within OMB: • For proposed major rules, 60 days • For final major rules, 30 days • For nonmajor rules, 10 days However, the review period may be extended. The secret to ensuring a smooth review process is to demonstrate that the draft rule would make society better off; that demonstration includes an examination of alternative approaches to addressing the problem and a good RIA. Risk communication programs are not covered by the Executive Orders per se. However, because risk communication can be a good substitute for less efficient command-and-control approaches to regulation, an RIA may determine that such regulation is unnecessary. For example, with radon, perhaps individual homeowners can help ameliorate the problem, minimizing the need for federal regulation. The requirements of Executive Order 12291 (and 12498) are not affected by the recent court decision on vinyl chloride, according to which the Agency's determination of what is "safe" cannot consider cost, because the requirements of the Executive Orders do not override statutory requirements. Nevertheless, under the Executive Order, agencies are required to choose the least costly approach to meeting a statutory goal. Therefore, once a decision to regulate is made, then cost is considered. This is consistent with the court decision. 
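The net-benefit comparison that a Regulatory Impact Analysis is meant to support can be illustrated with a minimal sketch (in Python; the alternatives and all dollar figures are hypothetical, and an actual RIA involves far more than a single subtraction). Each option for addressing a problem, including an information or risk communication program offered as a substitute for a command-and-control rule, is compared on the benefits and costs it would produce:

# Minimal sketch of a net-benefit comparison across hypothetical alternatives
# (figures in millions of dollars per year; a real RIA would also treat
# uncertainty, distributional effects, and statutory constraints).
alternatives = {
    "command-and-control rule":   {"benefits": 120.0, "costs": 95.0},
    "performance standard":       {"benefits": 110.0, "costs": 60.0},
    "risk communication program": {"benefits":  70.0, "costs": 15.0},
}

for option, figures in alternatives.items():
    figures["net_benefit"] = figures["benefits"] - figures["costs"]
    print("%-28s net benefit = %6.1f" % (option, figures["net_benefit"]))

preferred = max(alternatives, key=lambda o: alternatives[o]["net_benefit"])
print("largest net benefit:", preferred)

Under this logic, the option with the largest net benefit that still meets the statutory requirement is preferred; and once a decision to regulate has been made, cost enters in choosing the least burdensome way of meeting the statutory goal, as noted above.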
Paperwork and Reporting Requirements Three federal documents are especially pertinent to paperwork and reporting requirements: the Paperwork Reduction Act of 1980; the Code of Federal Regulations (Volume 5, Part 1320): Control of Paperwork Burdens on the Public; Regulatory Changes Reflecting Amendments to the Paperwork Reduction Act; Final Rule (5/10/88); and "Information Collection Requests (ICRs)" (Fact sheet published by the EPA Office of Policy, Planning and Evaluation, dated 11/87). The ICR fact sheet summarizes the Paperwork Reduction Act of 1980 and the rule development process for EPA. The topic of paperwork and reporting requirements is more applicable than regulatory requirements to evaluation and risk communication. Paperwork and reporting require- ments pertain to notifications, surveys, questionnaires, and other types of information collection. The Paperwork Reduction Act directs the OMB to review and approve all collections of information from the public based on the following criteria: • The collection has practical utility. • It is not duplicative. • It is the least burdensome method to the government and the public of obtaining the information. The requirements for OMB approval pertain to any survey or questionnaire requesting information from ten or more persons. This includes mandatory and voluntary requests for information (see 5 CFR 1320, Subpart 1320.7: "Definitions," Paragraphs C(l), C(2), and ------- OMB Regulatory and Approval Requirements 131 Agencies are required to list the estimate of burden hours in the ICR. If the actual respondent burden of the approved survey is excessive, respondents may comment to the agency or to OMB. This mitigates somewhat against an agency underestimating the burden hours. One method of more accurately predicting burden hours is to pretest the survey instrument and observe how long it takes respondents to complete the survey. Pretest instruments need to be submitted for approval only if they will be used to collect data from ten or more respondents. The pretest instrument should be submitted as part of the ICR, which contains the draft instrument to be fielded. The OMB desk officer may request revisions in the pretest instrument even though it is going to fewer than ten respondents and does not require formal OMB clearance. When the ICR is sent from EPA to OMB, a copy goes to the desk officer, who may request review by a budget examiner. Normally, the OMB is required to respond within 60 days of its receipt of the agency ICR, but this can be extended to 90 days. If OMB takes no action within 60 days, and there has been no extension, EPA can request an OMB control number, which OMB is obligated to provide. At the time of submission of the ICR, the EPA also places a notification in the Federal Register, informing interested persons to contact the OMB desk officer or the Agency for further information or comments. This is unlike the regulatory review process, where public comments are not solicited for pending rules. Usually OMB does not receive any public comments on ICRs. As a general rule, the OMB does not complete its review of ICRs for 45 days to allow time for the interested public to comment. However, there are expedited review procedures (fully explained under 5 CFR, Part 1320, Subpart 1320.18: "Emergency and Expedited Processing"). The procedures for approval are somewhat flexible. 
As an example of this flexibility, in one recent case OMB approved one segment of a survey, a technical portion, while disapproving another segment that required further refinement. This enabled the regulation development process, which was dependent on the results of the first segment, to commence sooner than would otherwise have been the case. The ICRs in EPA are processed through the Information Policy Branch of the Office of Policy, Planning and Evaluation (OPPE) prior to their transmittal to OMB. In addition to the three criteria noted earlier, those cited in "Evaluation for Risk Communicators" (Arkin, 1988) are good guidelines to help ensure prompt review and approval. The EPA does have the expertise to provide assistance to agencies in designing surveys to meet the required OMB criteria. The Statistical Policy Branch of OSR in EPA serves as an information resource for the Agency in the design of instruments, and EPA's OPPE also has statistical expertise. There is an apparent tension between the regulatory review and the information collection review functions of OMB; the goals appear to conflict. That is, with regulatory review, choosing the best alternative for society requires an agency to collect information to analyze the situation, while the information collection review focuses on minimizing the burden of information collection on the public. However, providing good information to the public is important and, if a survey is well designed, it can meet the overarching requirement of the Executive Order of doing more good than harm. As a result, OMB is supportive of risk communication programs and evaluations of their success. The OMB publishes compendia of the regulatory programs and agendas of federal agencies. The Regulatory Agenda is a compilation of all regulations that agencies plan to promulgate within six months; the Regulatory Program is an annual compilation of forthcoming significant regulatory actions.
REFERENCE
Arkin, Elaine Bratic. 1988. Evaluation for Risk Communicators. Paper prepared for the Workshop on Evaluation and Effective Risk Communication.
------- USING EVALUATION CASE STUDIES ------- Introduction Elaine Bratic Arkin Whether the issue is smoking or a Superfund site or pesticides in the drinking water, one challenge for risk communicators is that these risk messages, as warnings, come laden with extra burdens. What we tell people may enlighten them, but it may have an equal chance of confusing them; it may be frightening or reassuring; it could cause denial or alarm or anger. Messages about risk may motivate people to action or to frustration. At a previous conference, one panelist from a government agency said, "Why complicate the issue? Why talk about public frustration and anger and all of those things?" He said, "My job at the federal level is just to get the word out. Why is this such a big deal?" Although some people may think of the communication challenge as "just getting the word out," in reality we know that there is always a purpose for the communication. The purpose may be to encourage someone to seek information or help or protection, to change behavior, or to participate in policymaking or in the enforcement of existing laws and policies. So we need to examine the reasons for communicating about risks prior to designing risk messages, and then to look at the results. We need answers to questions such as: Did anyone hear what was said? Who listened? Did they understand? Did they agree? What happened as a result?
Beyond getting the risk information out, there is a compelling need to answer questions of evaluation to justify risk communication efforts to taxpayers or to the sponsoring agency or company. Risk communication frequently has been considered an auxiliary activity to risk assessment and risk abatement. Answering questions about the results and value of risk communication is necessary to prove that this is a separate discipline, and that it requires professional knowledge and skills. Evaluation efforts are necessary to decide whether risk communication works as intended, to make sure that it does no harm, to know how it works, with whom, and how it should be altered in the future. In addressing evaluative questions, we must consider not only how to find the answers, but the obstacles and barriers to evaluation and how to overcome them. Such obstacles include agency restrictions, resource limitations, and the fact that the discipline of risk communication is still under development. 135 ------- 136 Case Studies This group of papers explores experiences with risk communication evaluation. Some are descriptions of entire programs, while others discuss experiences with particular methods or aspects of evaluation. Together they provide an illustrative range of perspectives from federal, state, and municipal agencies and from the private sector. ------- The National Cancer Institute Shelagh Smith Frequently, a risk communicator also must wear the hat of an evaluator. That can be both an advantage and a disadvantage. Many people think that evaluation is difficult, or a burden, and they panic. At the National Cancer Institute (NCI), there is a staff person in charge of evaluation, and this can lead to one of the primary problems in evaluation: Often, program staff are asked, at the end of a program, to determine whether it was successful. A prime task of the evaluator is to educate program managers about what kind of evaluation is feasible—and what is not—and about the need to build evaluation into a program from the beginning. Evaluation cannot be tacked onto a program once it is completed. In the federal government, there is a certain amount of commitment, funding, and support for evaluation, and for that reason, some evaluations conducted at the National Cancer Institute (NCI) may not be feasible under other circumstances. NCI staff have the expertise to plan and to conduct surveys and to obtain clearance from the Office of Management and Budget (OMB). These tasks may be obstacles for risk communication staff at other agencies. On the other hand, the disadvantage of having greater evaluation resources is that there is a temptation to design evaluation strategies that are more elaborate than necessary. It is helpful to remember that evaluation exists only to support a program. Without the program, evaluation is not necessary. Therefore, the evaluation design should fit the context and scale of the program it supports. Also, not all risk communication results are measurable, and not everything can or should be evaluated. For example, there are some programs for which pretesting (formative evaluation) is more applicable than evaluation of the program results. In order to make the best decisions about what and how to evaluate, it is important to review the reasons for evaluation. 137 ------- 138 Case Studies Why Evaluate? First, evaluation tasks can provide information for future planning. Evaluation provides program direction. 
Evaluation can demonstrate accomplishments and help to answer the questions of program managers, policymakers, and others. Evaluation can answer questions at the pre-production, production, and results stages. In pre-production, questions might include: Who is the audience? What are their needs (needs assessment)? What revisions are needed in draft messages (pretesting)? In the production and program implementation stages, questions that process evaluation can help answer include: What was produced? How many were produced and distributed? How long did it take? What did it cost? Who was the audience? Were they exposed to the message? At the final stage, questions to be answered include: Did members of the audience receive the message? Did they learn? Did they change the way they think or behave (outcome evaluation)? Evaluation can be used to apply successful methods to new programs, to revise current programs, or to plan. Evaluation and planning go together. Measurable objectives and defined goals are essential if one is to be realistic and objective about what kind of program—and evaluation—is feasible. Formative and Process Evaluation The first evaluative step in the program planning process is needs assessment. For example, at NCI, we conducted a survey of needs for educational materials among 100 hospital-based patient educators. Another evaluative activity is testing message concepts to identify the best way to communicate about a risk. For example, NCI conducted six focus groups to classify profiles of people to help shape "Eat for Health," a joint NCI/Giant Food consumer nutrition education program designed to change people's behaviors in buying, consuming, and preparing high-fiber, lower-fat food. The focus groups helped formulate ideas to kick off and shape this program. A third kind of activity is pretesting. One pretest at NCI showed that we were using illustrations and a title for discouraging use of chewing tobacco that were not appealing to adolescents. As a result, the booklet, "Chew or Snuff is Real Bad Stuff," was revised to be more appropriate for the intended audience. Another important activity is process evaluation. At NCI, we collect data routinely, but it was not being organized regularly into summary reports for staff to use in assessing questions such as "Where are we? How many phone calls are we receiving? How many brochures are being requested? Are we reaching our target audience?" This process data is useful to make sure that a program is on course and to permit any adjustments necessary while the program is still underway. NCI recently has prepared a plan for analyzing this process data. These are the types of evaluation most traditionally used by programs. They are inexpensive and use existing, accessible data and resources. They can be undertaken on a small scale, without large population-based surveys, which are very expensive and require considerable expertise. Sometimes a combination of methods will provide the best evaluative picture. For example, pretesting draft materials using focus groups and intercept interviews can provide two kinds of data to analyze for a more complete assessment. A new undertaking at NCI is an audience segmentation survey. Nationally representative, it is called a "psychographic survey." The NCI will send seventy-five value statements on separate cards to approximately 2,000 randomly chosen people.
Respondents will be asked to sort the cards in order of priority, i.e., by how much a respondent agrees with the value statements. There will be value statements on issues related to cancer and health, but also unrelated subjects such as "I like to watch television on Sunday nights." NCI hopes to use the results to segment the national population into subgroups using factors other than simple demographics (e.g., age, sex, education and income). The intent is to target different types of people based on their different values. Although commercial marketers and political pollsters target according to psychographic characteristics, this is innovative for health communications. Outcome Evaluation The next type of evaluation—outcome evaluation—is more difficult. It takes more time, and is more expensive. It may be something imposed upon a program by a policymaker or through public demand. At NCI, a true experimental design is being used to evaluate one communication program. The difference between the true experimental design and the quasi-experimental design is randomization. In the true experimental design there are specified criteria for subjects who may volunteer. These subjects are then randomly assigned to an intervention or control group. Quasi-experimental design is used when randomization is not possible. For example, NCI's "Eat for Health" program with Giant Food is being tested in Washington, D.C., using Baltimore as the control site. Obviously, NCI could not randomize people going to supermarkets in one city, so a different city was chosen as the control site. Although not randomized, this test is a quasi-experimental design because there is the comparison group in Baltimore. A more feasible variation of outcome evaluation is field testing, or pilot testing. A few years ago NCI conducted a breast cancer education pilot test on a small scale with AT&T in New Jersey. The results showed that the program resulted in changes in both knowledge and practice of breast self-examination 5 months after intervention. As a result, the program was implemented on a larger scale. Obstacles to Evaluation There are a number of obstacles to evaluating risk communication programs. For federal agencies, one is OMB clearance. Also, evaluation can be expensive, although qualitative methods are often affordable. In addition, evaluation is time-consuming, and policies governing evaluation may be predetermined by the agency involved. Further, not all risk communicators have sufficient skills to design and conduct evaluations. If this is the case, agencies can contract for assistance or consultation or tap into university-based talent. There are evaluation methods that are not as difficult or complicated, including those used for pretesting and process evaluation. However, it is necessary for risk communication program managers to become familiar with the options that are available. One source of help is NCI's new publication Making Health Communications Work. What is realistic? An impact evaluation may not be. It may not be possible to say that a program was successful and resulted in changes in behavior. But some type of evaluation usually is feasible. Many people have unrealistic expectations of what evaluation can do, especially if they are not well versed in risk communication or if they are not familiar with evaluation. And, of course, evaluation is not something performed as a program's last step.
So, one obstacle may be the difficulty of deciding how to measure a program's effects if evaluation was not anticipated and planned early in the program. In conclusion, some recommendations include: Keep it simple. Concentrate on qualitative methods, if quantitative methods are not practical. Use secondary data from existing sources if possible to simplify the work. And finally, consider the many technical resources and sources of data that exist to help make some type of evaluation possible.
------- New Jersey Department of Environmental Protection J. Herb, J.A. Shaw, H.L. Garie Government environmental regulators make management decisions using a variety of tools and mechanisms. Within the past ten years, the scientific community has recognized the value that risk assessment can contribute when used as a tool in environmental decisionmaking. Applied appropriately and with protective assumptions, risk assessment provides a logical, scientific basis for protecting public health through environmental management. However, while the utilitarian and scientific aspects of risk assessment are clear, uncertainties and assumptions inherent in the process often cause skepticism in the affected community. This makes environmental decisions based on risk assessment particularly difficult to explain to the general public. These uncertainties prompt many citizens to consider risk assessment with suspicion and disbelief and, as a result, they may not accept or understand regulators' decisions. The case study outlined in this paper specifically concerns communicating with the public about an environmental health risk that was determined through the use of risk assessment. All too often, risk communication is tagged on as a final piece to an overall risk management strategy and, when put into practice, the risk communication effort is more a risk "telling" strategy; that is, the agency informs the public of the decision regarding an environmental health risk rather than actually "communicating" with the public about the situation. It is the belief of the regulators in the case described below that risk assessment is best communicated to the public when a proactive communication effort is integrated early in the risk management process. In addition, the communication effort should be designed to allow two-way communication with the public throughout the process. Background Union Lake in Millville, Cumberland County, is the second largest freshwater lake in New Jersey. With a statewide reputation as a sailing and fishing lake, it is a popular recreation area. The lake is south of the Vineland Chemical Company (Vi-Chem), now a Superfund site, located along the Blackwater Branch of the Maurice River in the city of Vineland. The water and lakebed sediments are known to be contaminated with arsenic, believed by the New Jersey Department of Environmental Protection (NJDEP) to have originated at the Vi-Chem site. Up until the spring of 1987, analyses of the lake indicated that recreational exposure to the water itself did not pose a health threat. During the spring of 1987, the state began to undertake plans to reconstruct the lake's 119-year-old dam, which posed imminent hazards to life and property. At this time, NJDEP learned that it was necessary to lower the lake level considerably in order to perform the reconstruction work.
It was the pending lowering of the lake that prompted the NJDEP Division of Science and Research (DSR), in conjunction with the New Jersey Department of Health (NJDOH), to conduct a risk assessment of the potential health risks posed by recreational use of Union Lake. Lowering the lake waters would increase the amount of arsenic-contaminated sediment exposed. The health risk assessment concluded that the arsenic contamination in exposed lake bottom sediments would result in an unacceptable level of risk. As a result of the risk assessment, NJDEP, as well as the Cumberland County Health Department, decided to close Union Lake for all uses. The branch of DSR that undertook the risk assessment and risk communication program at Union Lake is known as the Office of Environmental Health Assessment. This Office is the central component of an overall environmental assessment program initiated by New Jersey Governor Kean in spring 1986. The overriding philosophy of the office is that integration of risk assessment, risk communication, and risk reduction allows the public to be included in the decisionmaking process, which significantly increases an agency's ability to protect public health. Recognizing the Need for Communication As a result of the preliminary data from the Union Lake risk assessment, it was clear to the regulators that the contaminated sediments posed a potential health risk. They immediately consulted with the Risk Communication Unit (RCU) staff and integrated them into the NJDEP-NJDOH team that was assessing the situation. From that point, the role of communications was an integral part of the overall strategy at Union Lake. Matters of what to communicate to the public and how to communicate it were weighed equally with technical aspects of the risk assessment and long-term policy issues. This strategy was manifested in several ways: RCU staff were directly involved in the NJDEP-NJDOH planning meetings; technical staff were directed to devote as much time and energy as necessary to respond to requests for assistance from RCU staff; resources needed to implement communication strategies were made readily available; and RCU concerns were addressed in the development of policy. It is particularly important to note that it is often the communications staff that the public confronts with policy questions regarding coordination within the agency, long-term plans for the site in question, and the process under which environmental health issues are addressed. For example, in the case of Union Lake, it was difficult for staff to explain the health risk assessment to the public without addressing concerns about why the dam reconstruction could not be accomplished without lowering the lake. This demonstrates the necessity of having the communications staff identify technical and policy issues that concern the public and bring these to the attention of the agency. In other words, the communications staff can serve as the voice of the public within the agency. In the Union Lake case, the communications staff were given the freedom and authority to raise concerns about policy issues, such as dam reconstruction, enforcement, and future research. However, this can work only if the communications staff are considered equal and important contributors to the overall agency effort.
It also should be noted that a deliberate effort was made to communicate with the public about the risk assessment process as well as the outcome of the Union Lake risk assessment. Strategy The following considerations guided RCU's planning.
• What is the purpose, or goal, for communicating an issue to the general public? In the Union Lake case, the purpose for communicating with the general public was to generate public support for and adherence to the ban on use of the lake.
• Who are the audiences affected and who are the audiences that we want to reach? In order to assess the Union Lake audience, RCU staff immediately moved to establish contacts in the community. First, the staff met with the Director of Health and Human Services for Cumberland County who, in addition to being a local government leader, is an active and well-known leader in the community. Second, RCU staff visited the site and surrounding neighborhoods. Civic, recreational, educational, and religious organizations and leaders were subsequently identified and contacted. Third, RCU staff reviewed past documents and newspaper articles regarding the site to further identify persons or groups that might be affected. In addition to identifying affected or interested audiences, RCU targeted specific audiences that needed to be informed. These audiences included local fishing and boating enthusiasts and the community schools. Two primary audiences were identified: a private sailing club located on the lake and a lakeside housing development that provided homeowners with direct access to the lake shores.
• What is the message that must be conveyed to each audience in order to reach it? In some communications efforts, there is a need to convey different messages to different audiences. However, the message in the Union Lake case was single and clear: contact with the sediments may pose a significant health risk. The RCU decided that this same message should be conveyed to all audiences, including people boating, swimming, and fishing, schoolchildren, and curiosity-seekers.
• What strategies should be used to get the message across? The RCU determined from the outset that the most effective way to reach the public and generate trust in the decision to ban use of the lake was to use the county health department as the contact agency for local citizens. This allowed the public to acknowledge that the local government supported the actions of the state and increased public acceptance of the ban. The RCU also strived to make interpersonal communications the preferred form of interaction between the agency and the public.
The following table summarizes the specific strategies used to communicate the potential risks posed by use of Union Lake to the general public. Several strategies are particularly important to note. First, the RCU staff planned and implemented a proactive two-way communications strategy with the community and also acted to raise the community's concerns within the agency. Second, through discussions with local leaders, the RCU recognized the importance of local newspapers in conveying information to the general public and developed background papers for the media on technical topics.
Steps Used to Deliver Union Lake Message
• Established contact with local officials
• Prepared factual briefing materials for press, officials, and others
• Arranged a press conference to announce ban on lake use
• Assisted regulatory personnel in developing language for signs posted around Union Lake
• Coordinated distribution of written materials to local audiences
• Arranged follow-up meetings with key local interest groups
• Contacted local schools, churches, and hospitals to offer educational assistance about the lake ban
Summary: Elements of Success The Union Lake case points out several key factors that must be included in efforts to integrate risk communication and risk assessment into a comprehensive risk management approach: First, the communications staff were involved as early as possible and were encouraged to raise concerns of the local community with technical staff and policymakers. Second, communication was considered an integral part of the overall agency program pertaining to Union Lake. Communications staff were considered part of a team, which also included policymakers and technical staff. Third, communications staff were given time and resources not only to plan a proactive communications strategy, but also to respond to public concerns. The communications staff's persistence in spreading the word and responding to the public in a timely and responsible manner increased the credibility of the agency and, in turn, led to increased adherence to the ban. Fourth, communications and technical staff conveyed not only the results of the risk assessment, but the process of risk assessment as well. Fifth, communications staff did some of the actual communicating but also acted as a liaison between the community and technical staff and also established mechanisms and forums for the technical staff to communicate directly with the local community. Sixth, the communications staff worked within existing networks in the community and relied on local groups and government agencies to funnel information to individuals. This strategy allowed the community to understand that local agencies endorsed the state's actions and, by bringing the communications strategy down to a local scale, it increased the opportunities for local citizens to voice concerns, to which either local or state agencies eventually could respond. Seventh, the communications staff were committed to developing a communications strategy that was interactive or two-way, rather than simply telling the local community of its findings and decisions. The RCU is able to identify specific factors that allowed the communications strategy to successfully influence the overall risk management program at Union Lake. However, it recognizes that the Union Lake case is not an ideal model for two-way, up-front risk communication. Specifically, there are two aspects of an ideal communications approach that the RCU would have liked to integrate into the Union Lake case. First, although the RCU staff were brought into the Union Lake case as soon as a risk assessment began to show the presence of potential public health risks, this timing cannot be considered "up-front" in the ideal sense. The RCU intends to explore the potential for integrating affected communities into the process before the risk assessment is conducted so that communities actually are involved before the problem is identified.
Second, although the RCU has evaluated the effectiveness of its communications efforts at Union Lake informally, no formal evaluation mechanisms were built into this communications strategy. Without formal evaluation, an assessment of the effectiveness of communications strategies is not fully reliable. The primary responsibility of the RCU is to conduct research and case studies to identify effective strategies for communicating environmental health risks to the public and for integrating the public into decisionmaking. Both communications evaluation and up-front integration of the public are the subjects of research investigations currently underway by the Risk Communication Unit.
READINGS
Faust, S.D., A. Winka, T.J. Belton, and R. Tucker. 1983. Assessment of the chemical and biological significance of arsenical compounds in a heavily contaminated watershed: Part II, The distribution of several arsenical species in a watershed. Journal of Environmental Science and Health A18(3): 389-411.
Faust, S., A. Winka, and T. Belton. 1987a. An assessment of chemical and biological significance of arsenical species in the Maurice River drainage basin (N.J.). Part I: Distribution in water and rivers and lake sediments. Journal of Environmental Science and Health 22(3) [need page numbers].
Faust, S., A. Winka, and T. Belton. 1987b. An assessment of chemical and biological significance of arsenical species in Maurice River drainage basin (N.J.). Part II: Partitioning of arsenic into bottom sediments. Journal of Environmental Science and Health 22(3) [need page numbers].
Hazen, R., L. Jowa, and J. Savrin. 1987. Risk Assessment for Recreational Use of Union Lake. New Jersey Department of Environmental Protection, Division of Science and Research.
------- CIBA-GEIGY Corporation, Toms River (NJ) Plant Thomas A. Chizmadia The Toms River Plant began operating in 1952 as the Toms River division of CIBA States Limited. It was later consolidated with Cincinnati Chemical Works Inc., which was owned by CIBA Limited, J.R. Geigy, S.A., and Sandoz Limited (all of Basel, Switzerland). By 1960, after the consolidation with Cincinnati Chemical, the site was known as Toms River Chemical Corporation. CIBA and Geigy merged in 1970. After Sandoz sold its remaining interest to CIBA-GEIGY in 1981, the site became the Toms River Plant of CIBA-GEIGY Corporation. Over the 35 years of the site's existence, the plant has manufactured a wide variety of dyes, additives, and adhesives in largely batch processes. Environmental facilities and procedures have evolved over this time period, as has the entire chemical industry, and consequently a variety of waste treatment and disposal techniques have been used at the facility—all of which are no longer considered advisable. Due to the nature of some of these techniques, contamination of parts of the site and some underlying aquifers has occurred. CIBA-GEIGY already has undertaken remediation of a portion of the contaminated area through the use of recovery wells, and full remediation through the Record of Decision under Superfund is anticipated shortly. To put the Superfund risk communication plan in its proper perspective, one must realize that the site was concurrently addressing many other environmental and waste disposal issues.
These issues included use of an ocean discharge pipeline that carried treated waste water from the plant's waste water treatment plant to the Atlantic Ocean (the heightened awareness of which was triggered by a leak in April 1984); a planned site expansion that involved reduction of the current synthesis of dyes and plastics and the construction of a pharmaceutical active ingredient manufacturing facility; and 1985 regulatory and legal issues that involved a $1.45 million fine by the New Jersey Department of Environmental Protection and an indictment issued by the State Attorney General in October of that year. Consequently, since the pipeline leak in 1984, the plant has been the subject of virtually daily coverage in the news media, the focal point of many environmental and citizens' groups, and the subject of various state legislative initiatives that have attempted to end its discharge of treated effluent into the ocean. This high level of public concern and interest is likely to continue into the foreseeable future. In regard to Superfund, the site was placed on the national priorities list in December 1982. The Remedial Investigation began in 1985 and the draft Feasibility Study was released in June 1988. Communications Goals Because of the many issues facing the plant, its overall communications plan addressed both short-term and long-term objectives, as well as segregating the issues and developing communications plans for each. This paper will describe the plan implemented for Superfund-related activity. The short-term goal of the overall communications plan was to improve the public understanding, image, and credibility of the Toms River Plant and CIBA-GEIGY by better educating the public about the plant's products, environmental activities, and improvements, and about its many contributions to the local community/economy. The plan's long-term goal was to restore the public's confidence in the plant's environmental protection efforts and technology by communicating that the plant could be a viable production site without creating or causing any adverse impact on the environment. The two key messages from the plant to the public at large about Superfund were that there was no health risk as a result of the groundwater contamination that had been identified both under and off the site, and that CIBA-GEIGY was committed to cleaning up the contamination without the use of public funds. The first message was critical for a successful risk communications program and was based upon an independent quantitative risk assessment conducted for CIBA-GEIGY by ENVIRON Corporation. ENVIRON concluded that no significant risk existed as a result of the Superfund site, and recognized that direct exposure to the contaminated groundwater was very limited since it was not used for domestic purposes. Activities To implement the plan, the plant undertook an aggressive informational campaign based on scientific data without ignoring the general human concern of whether health was affected. One of the biggest challenges of implementing the plan was to distribute technical data effectively in language easily understood by the lay public, and in a manner that clearly represented CIBA-GEIGY's concern for residents' interests and its ability to handle such a complex issue. Audiences were identified in the risk communication plan; priorities were given to regulatory and elected officials, the media, employees, retirees, and residents who lived adjacent to the site.
With messages and audiences identified, vehicles were established for communicating the messages. These included briefings for public officials (on an almost weekly basis, both formal and informal); editorial board meetings; plant briefings for the media; neighborhood meetings on the plant site; participation on a citizens' advisory committee established by the county governing body to discuss issues related to the plant site; and a door-to-door campaign conducted in August 1986, in Oak Ridge, the neighborhood adjacent to the eastern boundary of the plant. In addition, the plant periodically mailed information directly to the residents in Oak Ridge related to the status of the Remedial Investigation and what CIBA-GEIGY was doing to address the contamination. These items included the company's own material as well as information prepared by the EPA and independent consultants. Door-to-door campaign. The most effective outreach efforts were those that allowed face-to-face contact with residents. The two-day, door-to-door campaign in the Oak Ridge neighborhood represents one of the best examples of a coordinated effort between the communications staff and other corporate staff departments, and participation by the neighbors themselves. Specifically, this program set out to communicate with those neighbors who, through geography alone, were perceived to be at the highest potential risk related to the CIBA-GEIGY Superfund site. One of the questions that the door-to-door campaign set out to address was how many of these residents had functioning irrigation wells and how these wells were being used. (All the residents in the affected portion of this neighborhood were on municipal water service for domestic use. The wells servicing the municipal water supply were not affected by the plant.) Prior to this effort, various critics and opponents of the plant had stated that irrigation wells were in abundance in this neighborhood and that neighbors were coming in direct contact with the contaminated groundwater through the use of these wells for filling pools, watering lawns, washing cars, and other activities. After identifying the portion of the neighborhood under which the contaminated plume of groundwater flowed, a four- to five-block buffer zone was added, and this total area encompassed the contact points for the door-to-door campaign. Approximately 220 homes were in this area. Residents were contacted by two CIBA-GEIGY employees who explained that they were conducting the campaign to share information about Superfund and to inquire whether a homeowner had an irrigation well. In addition, print material related to both Superfund and drinking water standards was distributed. If the residents indicated they had an irrigation well, they were told that there would be follow-up contact regarding a well testing and/or closure program in which the company would be involved. After the two-day period, fourteen homeowners were identified as having irrigation wells. The subsequent program on testing and closure was managed by the plant communications staff who maintained direct contact with the residents. Anyone with a well was offered an opportunity to have his or her well sampled on a periodic basis by an independent certified laboratory (paid for by CIBA-GEIGY) or to have the well permanently closed according to a state-certified procedure (paid for by CIBA-GEIGY) with additional compensation for the inconvenience of losing the use of the well.
The objective of the program was to address directly any concerns that neighbors had about the purity or quality of the water they were using for non-domestic purposes. Approximately half of the well owners chose to close their wells (some of which, through sampling, proved to have no contaminants present) and half chose to have their wells monitored three times per year at the company's expense during the period of peak water use. Coordinated by the communications department, this program not only provided more data related to the extent of off-site contamination, but also met the objective of addressing concerns the homeowners had about the relative risk of continuing to use their wells for non-domestic purposes. Technical assistance grant. A further effort to address citizen interest in the CIBA-GEIGY Superfund issue was undertaken by the plant and the Ocean County Citizens for Clean Water (OCCCW), a local environmental organization formed in 1984. The two organizations set a national precedent when the plant contributed and the OCCCW accepted a $50,000 grant to retain professional consultants to study the Toms River Plant Superfund site. CIBA-GEIGY independently awarded the grant to the OCCCW to meet the intent of the Superfund Amendments and Reauthorization Act (SARA) of 1986, which provides for such grants from the EPA to environmental organizations. The plant offered to contribute the funds when it became apparent that the regulations allowing the EPA to provide such grants to citizens' organizations would not be promulgated prior to the resolution of the plant's Superfund issues. The awarding of the grant was made at a public ceremony on August 8, 1987 at the office of the late Congressman James Howard in Toms River. To date, the process has allowed CIBA-GEIGY, the OCCCW, consultants, and the EPA a forum for continuing an active and productive dialogue on Superfund. Evaluation Public opinion polling commissioned by the plant since 1984 has indicated a great deal of success in the outreach efforts related to Superfund. Of the five surveys conducted between December 1984 and January 1988, the most significant positive change attributed to communications efforts by the plant was seen in the Oak Ridge area. According to the results, residents here not only exhibit more knowledge about the company and more confidence in the environmental protection efforts of the plant, but they want to receive much more information. CIBA-GEIGY is continuing its efforts to meet the neighbors' desire for more information by continuing its Superfund communications efforts with the Oak Ridge community as a primary audience. Realistically, the company also understands that the issues will not be resolved overnight. As more information becomes available through environmental studies, the final Feasibility Study, and the Record of Decision to be issued by the EPA, CIBA-GEIGY will continue to recognize the social, political, and technical aspects of risk communication under Superfund.
------- National Heart, Lung, and Blood Institute John C. McGrath The primary function of the National Heart, Lung, and Blood Institute (NHLBI) is to support biomedical research in the area of heart, lung, and blood diseases. But in addition to supporting research, a high priority of this Institute is to transfer and translate the results of that research for the benefit of the American public. One important means of transferring research results is through National Risk Factor Education Programs.
The Office of Prevention, Education and Control coordinates three cardiovascular disease risk factor education programs: the National High Blood Pressure Education Program, the National Cholesterol Education Program, and the NHLBI Smoking Education Program. These three risk factor education programs were established because it is known that high blood pressure, high blood cholesterol, and smoking are the major modifiable risk factors for cardiovascular disease. This paper describes how NHLBI uses research data to evaluate the feasibility, progress, and results of its national education programs. It will pose and then answer four questions:
1. How do we know that high blood pressure, high cholesterol, and smoking are the three major modifiable risk factors for cardiovascular disease?
2. How do we know that modifying these risk factors will reduce the risk of cardiovascular disease?
3. How can we intervene so that those at risk modify their behavior in a way that reduces their risk?
4. How do we know that the intervention is effective?
Before addressing these questions, it is important to understand the theoretical framework for risk reduction education programs and some of the main sources of data used in the development, implementation, and assessment of the Institute's national education programs. Theoretical Framework for Risk Reduction Programs The science base is the firm foundation upon which risk reduction programs must be built. Basic research and applied research are conducted to investigate the cause and nature of disease. Knowledge validation supplies much of the basis for risk factor education programs, first through clinical investigations and then through clinical trials. When the results of clinical trials provide firm evidence of the benefits of controlling a risk factor, the Institute transfers that knowledge, first through demonstration and education research, and then through national education programs such as the National High Blood Pressure Education Program and the National Cholesterol Education Program. Sources of Data While several sources of data are used in the development, monitoring, and assessment of risk reduction education programs, the Institute relies most heavily on four sources.
• The Framingham Study. This study, begun in 1949, gathers data that describe and quantify the relative risk associated with specific risk factors. In this longitudinal study of residents of Framingham, Massachusetts, participants have been followed for forty years, their risk measured, and cardiovascular disease outcomes monitored.
• Data From Clinical Trials. Clinical trials are an important component of the biomedical research spectrum because their results can have immediate applicability to medical practice. Many of these large-scale clinical experiments are used to validate the efficacy of risk reduction.
• The National Center for Health Statistics (NCHS). This is the federal agency responsible for collecting much of the nation's health data. NCHS conducts several large-scale cross-sectional health surveys, including the National Health and Nutrition Examination Survey (NHANES), which gathers information on the health and nutritional status of Americans. The NHANES study is unique because it not only uses an interview in which medical history is obtained, but also an extensive physical examination including measurements of blood pressure, blood cholesterol, height, and weight.
NCHS also sponsors an annual National Health Interview Survey to determine self-reported health status and public knowledge and behaviors related to certain diseases and/or risk factors.
• NHLBI Surveys. The Institute has conducted several large-scale, national surveys of public and professional knowledge, attitudes, and practices related to high blood pressure and high blood cholesterol. Surveys of the public and high blood pressure were conducted in 1973, 1979, and 1982. A survey of physicians and high blood pressure was conducted in 1977, and an update has just been completed. Surveys of public and physician knowledge, attitudes, and practices related to high blood cholesterol were conducted in 1983 and 1986 and another will be conducted in 1989. In addition, the Institute soon will conduct a survey of nurses and dietitians.
Using the Data How can we use the above data to answer the four questions posed above? First, how do we know that high blood pressure, high blood cholesterol and smoking are the three major modifiable risk factors for cardiovascular disease? Data from the Framingham Study show that persons with high blood pressure have three to four times the risk of developing coronary heart disease and as much as seven times the risk of developing a stroke as do individuals with controlled or normal blood pressure. Likewise, the Framingham Study shows that the risk of developing coronary heart disease increases as blood cholesterol levels rise. The Framingham Study also shows that the more cigarettes one smokes, the greater the risk of developing cardiovascular disease, including stroke, atherogenesis, and vascular disease. The Framingham data provide a clear and causal link between three risk factors—high blood pressure, high blood cholesterol, and smoking—and cardiovascular disease. However, these data do not demonstrate that lowering these risk factors will reduce the disease risk. This prompts the second question: How do we know that lowering these risk factors will reduce the risk of cardiovascular disease? Data from several clinical trials have demonstrated that reducing high blood pressure, high blood cholesterol, and smoking will reduce morbidity and mortality caused by these conditions. For instance, the Veterans Administration Cooperative Study was initiated in 1963 to determine whether treating and controlling high blood pressure with antihypertensive medication would reduce resulting morbidity and mortality. The results were so positive that the trial was halted before its scheduled conclusion so those in the control group could benefit from the treatment. This trial was followed by the Hypertension Detection and Follow-up Program, which determined that controlling high blood pressure, particularly mild hypertension, through a vigorous treatment program would reduce morbidity and mortality. Much of the evidence concerning cholesterol comes from the ten-year Lipid Research Clinics Coronary Prevention Trial. Data from this Trial indicated that in individuals at high risk due to high blood cholesterol levels, a one-percent reduction in risk of coronary heart disease resulted from every two-percent reduction in cholesterol levels. The evidence of the benefit of smoking cessation comes from a variety of studies over the last twenty years. Many of those that deal with cardiovascular disease are reported in the 1983 Surgeon General's report on smoking: The Health Consequences of Smoking: Cardiovascular Disease.
These studies show that ex-smokers have a risk of death from cardiovascular disease substantially less than that of continuing smokers. In addition, the risk of developing chronic obstructive pulmonary disease is significantly reduced after smoking cessation. Having established that high blood pressure, high blood cholesterol, and smoking are significant risk factors for cardiovascular disease, and that reducing the risk factor reduces the risk, we come to the third question: How can we intervene so that those at risk change their behavior in a way that reduces their risk? As mentioned earlier, the Institute coordinates three education programs: the National High Blood Pressure Education Program, the National Cholesterol Education Program, and the NHLBI Smoking Education Program. These programs comprise a network of several federal agencies, more than 150 national organizations, fifty states, and several thousand community programs. At the core of the programs is a leadership entity called the coordinating committee. It consists of representatives of organizations with a diversity of interests including professional associations, voluntary organizations, hospitals, and citizens' groups. These education programs coordinate a wide variety of programs directed towards public, patient, and professional audiences in an effort to intervene and assist those at risk to lower their risk behaviors. Key components of the high blood pressure program and the cholesterol program are mass media campaigns. In developing the campaigns, the Institute follows a social marketing process described in detail in Making Health Communications Work, from the National Cancer Institute. An important first step in these campaigns is a strategy statement that provides the broad guidelines concerning target audiences and messages. In developing these strategy statements, data from NCHS, from NHLBI surveys, and from other sources are used extensively. For instance, data from the National Health and Nutrition Examination Survey reveal that prevalence rates of high blood pressure increase with age and are greater among blacks than whites. In addition, compliance with medication regimens is lower in men than in women and lower in younger than in older persons. Furthermore, data from the NCHS National Health Interview Survey showed that knowledge of high blood pressure is extremely high and that 92 percent of those surveyed had their blood pressure checked within the last 24 months. Based on these data showing high awareness combined with low compliance, particularly among men, the Institute identified a communication goal as compliance with therapy and the target audience as those whose risks are high and whose compliance with therapy is low, particularly younger men, both black and white. The strategy statement for the National Cholesterol Education Program identifies a different target audience and a different message strategy. Data from the 1983 and 1986 consumer awareness surveys sponsored jointly by NHLBI and the Food and Drug Administration showed that:
• Recognition of high blood cholesterol among adults was high; 81 percent of respondents had heard of the condition in 1986 compared to 77 percent in 1983.
• More people in 1986 believed that reducing elevated cholesterol levels would have a large effect on heart disease. The figure was 72 percent in 1986, up from 64 percent in 1983.
• But less than half of adult Americans reported having their blood cholesterol checked and only about seven percent knew their blood cholesterol level.
Based on these data, as well as data from NCHS indicating that approximately 50 percent of adult Americans have cholesterol levels above the desirable range, the cholesterol campaign identified the general public as the target audience. The message of the campaign urged people to get their cholesterol levels checked and to ask their doctor what the results mean. This is all part of an extensive process that eventually leads to products: radio and television public service announcements (PSAs), print ads, posters, and collateral material, all with specific risk reduction messages targeted to specific audiences. But before developing these messages, the Institute tests concepts in focus groups. Focus group participants are selected from among the target audiences (i.e., the general public, aware hypertensives), and can be selected to reflect the socioeconomic status of that audience. For example, in developing the cholesterol mass media campaign, NHLBI examined three different, potentially motivating concepts:
• Curiosity, a time-tested method of motivating people
• Getting on the health bandwagon—everyone else is doing it, so should I
• Taking control of one's health—taking responsibility
The focus groups revealed that curiosity was a much stronger motivator than either getting on the health bandwagon or taking responsibility for one's health. Participants were extremely curious about the fact that they might have high blood cholesterol and not know about it. On the other hand, taking responsibility for one's health was not particularly motivating because these participants generally thought of themselves as being in control of their health. In the next step of this process, the Institute developed a series of scripts based on the results of the focus groups. Before going into final production, however, the scripts were tested. In this phase of testing, it is common to use an animatic, a detailed storyboard placed on videotape with voice-over. Scripts can be tested in two ways: through focus groups and through central location intercept interviews. Finally, the Institute circulated the proposed scripts, along with a report documenting the process, to approximately 150 key constituents for field review. Reviewers include the liaisons at State departments of health, members of the coordinating committees, members of the Institute's ad hoc minority committee, as well as other interested individuals. When a reviewer makes a particularly salient comment, the script is modified accordingly. The final question is this: How do we know the intervention is effective? One way is to use national data on the status of awareness, treatment, and control of high blood pressure and high blood cholesterol. By comparing baseline data to subsequent survey data, the effectiveness of an intervention can be measured. For instance, adjusted death rates for coronary heart disease, stroke, and noncardiovascular causes declined beginning in 1972 when the high blood pressure program was initiated. Other survey data show an increase in awareness that high blood pressure can cause stroke: from 29 percent in 1973 to 38 percent in 1979 to 66 percent in 1982. While the causal link cannot be proved, most experts agree that the National High Blood Pressure Education Program had a significant impact on these developments.
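The kind of baseline-to-follow-up comparison described above can be sketched with the published stroke-awareness figures. The percentages are those given in the text; the per-wave sample size is a hypothetical placeholder, since the actual NHLBI survey sizes are not reported here.

    # Sketch of comparing awareness across survey waves (percentages from the text;
    # the sample size per wave is a hypothetical placeholder).
    from math import sqrt

    awareness = {1973: 0.29, 1979: 0.38, 1982: 0.66}   # "high blood pressure can cause stroke"
    n = 1500                                            # assumed respondents per wave

    years = sorted(awareness)
    for earlier, later in zip(years, years[1:]):
        p1, p2 = awareness[earlier], awareness[later]
        change = p2 - p1
        # standard error of the difference between two independent proportions
        se = sqrt(p1 * (1 - p1) / n + p2 * (1 - p2) / n)
        print(f"{earlier} -> {later}: {change:+.0%} (margin of about +/-{1.96 * se:.1%})")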
To assess the impact of mass media campaigns, the Institute relies on several indicators. First, it includes a bounce-back card with each radio and television PSA it distributes. Typically, about ten percent of the cards are returned, and in a sense they are self-selective. The people who like the PSAs the most and the least tend to answer. While the comments are not generalizable to all stations, they provide an indication of what gatekeepers, i.e., the public service directors, think of the PSAs. Another indicator is provided by a monitoring service, Broadcast Advertisers Reports (BAR). This company monitors the Institute's PSAs (as well as those of several other Public Health Service agencies) in 75 of the country's largest markets. A monthly report includes the number of times the PSA was aired in each city, the times, and the value of that time had it been purchased. It is particularly useful to monitor this information from month to month. The Institute intervenes in some way if the numbers go down and congratulates itself if the numbers go up. Summary In summary, the NHLBI uses various sources of research data in the development, implementation, and assessment of its national education programs. These sources include data from the Framingham Heart Study, from the National Center for Health Statistics, from clinical trials, and from NHLBI surveys. It also uses all of these data in developing and assessing mass media campaigns. In addition, it supplements the existing quantitative data with qualitative data. The latter, frequently obtained in focus groups, tell how well specific concepts communicate and how well various messages communicate. Finally, the Institute assesses the impact of programs through national survey data and commercial monitoring services. Exhibit A summarizes these steps.
------- Exhibit A Using Evaluation in Developing PSAs The National Heart, Lung, and Blood Institute follows a twelve-month campaign development process consisting of six major steps:
1. Data Review—The Institute staff looks at the data on the prevalence, awareness, treatment, and control of high blood pressure and high blood cholesterol. Several sources of data are used including the second National Health and Nutrition Examination Survey (NHANES II) and the National Health Interview Survey. The purpose is to identify target audiences, and to identify the most effective information and education strategies to reach the target audience.
2. Concept Development—The Institute holds a one-day meeting with representatives from constituent groups such as State health departments, members of its coordinating committees, public health practitioners, and health professionals dealing with members of the target audience on a regular basis. The purpose of the meeting is to develop a series of concepts that can be used to reach the target audience. These concepts are then tested with members of the target audience.
3. Draft Scripts/Test Messages—Based on the most successful concepts, a series of scripts is developed and tested with members of the target audience. Testing procedures include focus groups and central location intercept interviews.
4. Field Review and Clearance—Scripts and storyboards for the PSAs are sent to approximately 300 key constituents along with a description of the message development and testing process. These people are asked to review and comment on the scripts.
If a pattern of comments emerges on some aspect of the script, the issue is resolved before production. 5. Production—Production for all of the high blood pressure and high blood cholesterol television PSAs distributed during the year takes place during a concentrated 2- to 4- day period. 6. Distribution—The Institute sends its television PSAs to a designated media coordinator in each state health department who then distributes the PSAs, often through personal delivery, to television stations throughout the state. Bounce- back cards and commercial monitoring services help evaluate stations' use of the PSAs. ------- New York City Health Department Robert W. Denniston Most people are aware that the consumption of alcohol can involve some risk, whether it is drinking associated with driving, high blood pressure, or alcoholism. For most healthy adults who drink, these are preventable problems. But for some people, in particular those in certain high-risk groups, such problems can be especially severe. The discovery of fetal alcohol syndrome (FAS) and the resulting risk communication messages is one example of how risk communication can affect the public's health. It was only about fourteen years ago that scientists established that alcohol consumption during pregnancy presents a high risk for birth defects. In fact, it is the number one preventable cause of birth defects; about 2 percent of all births (about 50,000 each year) involve fetal alcohol syndrome. This syndrome is lifelong, irreversible, and costly to both families and society. Following the publication of research identifying FAS, there was scientific consen- sus resulting in a Surgeon General's Advisory Statement in 1981 that "the safest choice is not to drink during pregnancy." The Advisory was supported by the American Academy of Pediatrics, the March of Dimes, and many other health related organizations. In addition, the 1990 Health Objectives for the Nation identified the need to increase public awareness about the risks of alcohol consumption during pregnancy. The under- lying assumption was that increased public awareness is necessary, but probably not sufficient, to assure behavior change. That is, improved knowledge is a logical antecedent of behavior change, empowering individuals to make informed decisions about matters within their individual control. The first step in designing a risk communication program is to identify the prevalence of the problem: in this case, a survey of high school seniors nationwide revealed that about two-thirds identify themselves as "current drinkers." About two-thirds of the high school senior girls said that they drink five or more drinks on an occasion about every two weeks. From other data, we know that certain populations, particularly women of child-bearing age, are generally heavier drinkers than older women. The next step in developing the risk message about FAS was to examine the knowledge of the target audience. About two-thirds of women are aware that alcohol 159 ------- 160 Case Studies consumption during pregnancy can cause birth defects. However, upon closer examina- tion, it appears that this awareness is very superficial. In fact, about one-third of respondents to a recent survey said that alcohol is a hazard that can lead to fetal alcohol syndrome, but only at the level of three drinks or more a day. There are other myths and misconceptions as well. For example, many people believe that distilled spirits are more hazardous than beer or wine. 
This belief is especially prevalent among younger audiences. Although the public health community's response to the need for risk communica- tion messages about FAS has been multifaceted, including mass media campaigns and information provided to health care professionals, one specific case study will be presented here. The City of New York responded with a nontraditional public policy decision—to require warning posters in establishments that sell or serve alcoholic beverages. New York's program was somewhat unusual also because it included an outcome evaluation; polls conducted before and after the campaign allowed planners to gauge the program's effect. The warning poster is at point-of-purchase, within some 8,000 commercial estab- lishments in New York City, and reads: "Warning: Drinking alcoholic beverages during pregnancy can cause birth defects." This policy is very controversial; it required 15 months to be approved by the city council. In fact, it was probably the controversy reported in the media that raised public awareness of FAS more than the warning posters. This is one positive outcome of good public policy discussion and open public debate. In order to evaluate the effects of this new policy, a Gallup poll was commissioned before the warning signs were posted; however, the poll was conducted after the publicity had begun. A second poll was conducted after the posting of the warning. In the first survey, 54 percent of respondents spontaneously mentioned alcohol as a risk factor for birth defects. This increased to 68 percent in less than a year as a result of the warning poster policy. Even more important was the increase in knowledge among those at risk. Seventy- six percent of women of childbearing age mentioned alcohol as a risk factor for birth defects, and 74 percent of those women who said that they had consumed alcohol in the last 30 days identified it as a risk factor. This public policy did result in increased public awareness. Drinkers became far more aware of FAS than nondrinkers. Also, there were big gains in refuting some myths and misconceptions, particularly those related to differing risks from consumption of different types of alcoholic beverages. For example, because beer is the beverage of choice in the United States, it was important to make the public aware through the warning signs, and through risk messages in other media, that beer is as likely as wine or distilled spirits to cause FAS. Awareness of this fact increased from 60 percent in the first year to 71 percent. For wine, awareness increased from 60 to 66 percent, and for hard liquor, from 90 to 92 percent. The number of women who said that wine is an unlikely cause of birth defects decreased from 25 to 11 percent. This is important because it shows progress in breaking the strongly held conviction that wine is less harmful than distilled beverages. In summary, the program was effective in the sense that it made good use of the available research evidence on public opinion, responded to myths and misconceptions, developed persuasive risk messages, and after a year had a positive effect in increasing ------- New York City Health Department 161 public awareness as the logical basis for behavior change. There were positive results not only due to the warning labels, but also to the programming efforts, including the discussion in the media of the controversy. 
Of course, all of these positive changes cannot be ascribed to the campaign, the public discussion, or the warning signs alone, but rather to the synergistic effects of all components. As a result, other cities including Columbus, Ohio, Philadelphia, Pennsylvania, and Washington, D.C., have adopted this measure. In California, it stimulated Proposition 65 and other new policies. The message has continued to be controversial, causing responses from the beverage industry in particular. ------- Environmental Protection Agency Ann Fisher This paper describes two risk communication evaluations undertaken by the Environmental Protection Agency (EPA). The first illustrates the process of formative evaluation and the second, the process of summative evaluation. Formative Evaluation: Lead in Drinking Water Lead in drinking water became a news item before the EPA had decided what to do about this issue. With time at a premium, the Agency's Water Office drafted a booklet on lead in drinking water, assisted by the Office of Public Affairs, and then asked the Risk Communication Program to help evaluate the booklet. The Risk Communication Program contacted half a dozen experts, some of whom knew about lead and some of whom knew about drinking water. All knew something about risk communication. They were asked to take a quick look at the draft booklet and provide comments within two weeks. These experts' comments helped identify some definite problems. For example, it was not clear what the draft booklet's objectives were. It also was not clear who the target audience was: operators of water companies or households? This first version of the booklet also failed to give readers an action to take after they had become aware of their risk. The Water Office responded to these points and revised the draft. The resulting booklet, although it did not go through a formal evaluation, is certainly much closer to what most risk communication experts would consider sensible. Summative Evaluation: Radon Information The EPA estimates that radon causes between 5,000 and 20,000 lung cancer deaths per year in the United States, posing the greatest environmental risk that the Agency now deals with. A major concern was how to raise awareness of this risk without creating unnecessary panic. Research, however, indicates that it is hard to scare people about something that they can't see, smell, or taste and that is in their own homes (Weinstein et al., 1986). These findings suggested that the EPA needed to concentrate on raising awareness. The Agency has some research underway to evaluate alternative ways of doing this (Desvousges, Smith, and Rink, 1988). There was another basic question with respect to radon communications: Once the program had raised awareness, how could it get the right people to take action? Of course, some judgments have to be made about who the right people are, but after selecting criteria for determining that, it is possible to set up an appropriate research design. This the Agency did with the cooperation of New York State. The State, to find out how serious the radon problem was in New York, had put canisters in about 2,000 homes. It had not given much thought, however, to how it would tell people what the readings meant. EPA worked with New York to set up an experimental design that included six different approaches to providing information. Everyone in the study was assigned to one of the six approaches.
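A skeleton of this kind of design, in which each participating household is assigned to one of the six information approaches and outcomes are later compared across approaches, is sketched below. The approach labels and the knowledge-score measure are hypothetical placeholders; the study's actual materials and instruments are not described here.

import random
from statistics import mean

# Skeleton of the experimental design described above. The six approach labels
# and the follow-up knowledge scores are hypothetical placeholders only.
APPROACHES = ["booklet A", "booklet B", "fact sheet", "booklet A + phone line",
              "booklet B + phone line", "basic letter"]

def assign(households):
    # Balanced random assignment of households to the six approaches
    households = list(households)
    random.shuffle(households)
    return {h: APPROACHES[i % len(APPROACHES)] for i, h in enumerate(households)}

def mean_outcome_by_approach(assignments, followup_scores):
    # Average follow-up knowledge score for each approach
    groups = {a: [] for a in APPROACHES}
    for household, approach in assignments.items():
        groups[approach].append(followup_scores[household])
    return {a: mean(scores) for a, scores in groups.items() if scores}

Balanced random assignment is what allows later differences in knowledge, risk perception, or mitigation across approaches to be attributed to the information materials rather than to preexisting differences among the households.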
The study initially gathered data at three points: at baseline; after homeowners got their first readings; and after they received annual readings. These data were examined to determine which approaches were most effective with respect to a) satisfying respondents' need for information, b) increasing their knowledge about radon and how its risk can be reduced, and c) helping respondents form risk perceptions consistent with the radon levels in their homes. Later, a fourth data set was gathered to learn which approaches were most effective in encouraging mitigation for those with high radon levels. In summary, this evaluation compared the impact of a variety of communication methods on a particular target audience. Although not as comprehensive or expensive as the ideal might be, it illustrates many of the principles of proper design and conduct of evaluation research. REFERENCES Desvousges, W.H., V.K. Smith, and H.H. Rink, III. 1988. Communicating Radon Risk Effectively: Radon Testing in Maryland. Final Report to Office of Policy Planning and Evaluation, U.S. EPA, October. Weinstein, N.D., P.M. Sandman, and M.L. Klotz. 1987. "Public Response to the Risk From Radon, 1986." Final Report to New Jersey Department of Environmental Protection, January 1987. ------- Maryland Department of the Environment Nancy Zahedi and Carol Deck Federal and state agencies often work with limited resources in their efforts to communicate to the public about risk, with little extra time or money available to devote to evaluating the impacts of those communications. The Environmental Protection Agency (EPA), in an effort to fill the gap in evaluation, recently carried out a pilot radon risk communication effort, and an integral component of the effort was an evaluation of its effectiveness. This campaign took place from January to March 1988 in conjunction with the Maryland Department of the Environment. The goal was to evaluate the effectiveness of a number of innovative and cost-effective radon risk communication methods and materials. Based on insights gained through past risk communication efforts and focus group sessions, the following messages were emphasized: • Radon is a serious health risk. • You may be at risk, and the only way to find out is to test. • Testing is easy and inexpensive. • If your home has a radon problem, it can be fixed. • The State of Maryland will provide information and a list of testing companies through its toll-free radon hotline. These messages were communicated through a combination of methods in two different Maryland communities. The communication methods used were: • Radio public service announcements • Newspaper print advertisements • Newspaper articles • Utility bill inserts • Community radon presentations • Other community events publicizing radon Defining Success In designing the evaluation of the communications efforts, it was first necessary to define the purpose of the evaluation. Was the purpose to identify whether testing (the outcome encouraged by the communications) had increased; to learn where people obtained their radon information; to determine which methods were most effective; or to measure the impact of specific messages? Discussions on this subject indicated that none of these measures would adequately define success for the communications efforts.
Measuring changes in testing alone, for example, might understate the impact of the communications, since the project had a short time frame, and radon poses long-term rather than short-term risks, thereby making it less imperative that people test immediately. However, the communications may have educated people about radon and resulted in their taking action to test at some later time. Thus, awareness, knowledge, and attitudes about radon, as well as testing behavior, were considered to be important measures of success. Where people obtained their information as well as socioeconomic characteristics that might influence how they responded to risk communications were also considered to be useful in understanding the impact of the communications. Evaluation Methodology Having defined the purpose of the evaluation and how success would be measured, different options were considered for evaluating the radon communication efforts. Based on the kinds of information needed to assess the effects of the radon communications, the main evaluation method selected was the collection of data through a pre-outreach and post-outreach survey. A survey questionnaire was developed and used to establish first the existing levels of awareness, knowledge, attitudes, and testing, and then measure changes that took place following the communication activities. An additional means of assessing which communications activities and materials were most successful was also built into the project design. This consisted of keeping a log of all calls to the State of Maryland's radon hotline during the project's three months of outreach and recording where callers had heard about radon. Limitations of the Evaluation Design The survey questionnaire relied on respondents to recall their sources of radon information. However, it can be difficult for people to remember where they heard about a given subject. Thus the survey did not accurately reflect where people heard about radon, relying as it did on imperfect recall. The hotline data, however, were likely to be a somewhat more accurate measure of where people had heard about radon than the surveys, since those calling the hotline were motivated by a specific communication or combination of communications to seek additional information. The disadvantage of the hotline data was that they included only those who called the hotline number and not those who may have received the project's communications but did not call. Another limitation of the evaluation design was that it assumed that the changes measured would be attributable primarily to the project's communications. However, during the same time period, a local television station carried out a major radon public awareness campaign that also reached the communities included in the study. As a result, while the surveys were useful in identifying changes in the effectiveness measures—awareness, knowledge, attitudes, and testing—they were less useful in attributing the causes of the observed changes to a specific information source—in this case, the EPA's communications versus the local television station's communications. Thus, although interesting changes were observed between the two surveys, which showed that respondents had increased their awareness, knowledge, and testing for radon, it was difficult to determine how much of the observed increases was due to which communication activities.
Finally, the value of survey data depends on applying the appropriate statistical tools in analyzing the data. However, low response rates in both the pre- and post-outreach surveys made it difficult to use rigorous statistical techniques as a means of generalizing from the survey population to the population at large. Survey findings could be used only to describe the respondent population, and not to predict how other individuals would react. Use of Evaluation Data Despite the problems encountered in evaluating the EPA/State of Maryland radon risk communication efforts, this evaluation provided much information of value to risk communicators. As a result of the evaluation, it was possible to offer recommendations to EPA regional and state radon offices based on both the process of communicating about radon and the outcome. This information can be adapted by these radon risk communicators in their communications efforts. The survey also provided data that can be used to understand how people process risk information and where they turn for such information. The record of calls to the Maryland radon hotline helped supplement data from the survey to provide a more accurate picture of where people heard about radon during the project period. The data further allow for the quantification of the changes observed during the project period. It is possible not only to say that changes occurred, but also to indicate the magnitude and direction of changes. By evaluating risk communications, important factors that might not be otherwise obvious can be understood and incorporated into future communications. For example, an interesting finding from this survey was that speaking to someone else about radon (e.g., a friend, relative, or co-worker) was as important as exposure to other communication sources in explaining why some people were more concerned or knowledgeable about radon than others. Also, while more respondents were aware and knowledgeable about radon as a result of the communications during this period, many of them did not test because they were able to avoid personalizing the risk. They acknowledged the risks posed by radon but did not perceive being at risk themselves (EPA, 1988). Thus, it is not adequate just to educate people about radon risks; efforts also must be made to convince people that they are personally at risk. Final Thoughts on Evaluation Evaluating the effectiveness of risk communication efforts is an important aspect of communications that is often overlooked, but it can provide important lessons for improving communications. The cost of conducting an evaluation can be a major drawback, particularly when resources are scarce. However, there is also a cost involved in producing an ineffective risk communication campaign, in terms of wasted resources and lives affected. A number of evaluation options are available and can be selected according to resource availability and evaluation needs. The information gathered from such evaluations allows risk communicators to build on successes and avoid repeating failures. REFERENCE Environmental Protection Agency (EPA). 1988. Region 3/OPPE/State of Maryland Radon Risk Communication Project: An Evaluation of Radon Risk Communication Approaches. Washington, DC: The Agency, November. ------- U.S. Council for Energy Awareness Ann S.
Bisconti If communications are aimed at changing behaviors, improving attitudes, or just informing, program managers may be able to achieve their objectives without continuous evaluation and change. But the odds are against it. Think of the difficulties in communicating effectively. Even if the sponsor's name is Coca Cola, the program will not have unlimited resources. Planners must select the audiences with whom the program's limited resources will do the most good, and that means knowing about the potential audiences, getting their attention, and communicating messages that are meaningful and believable, and do not raise undue concern. Most important, program managers must be alert to the need for adapting to change, lest the program stagnate or die. The old dichotomy of dividing evaluation into two stages, formative and summative, is certainly inadequate for long-term communications projects and probably inadequate for most short- term projects as well. It is not enough to ask: "What should we do?" and then "How well did we do?" Conceptually, that sequence of questions treats evaluation as an add-on to the program, a response to the requirement for accountability. Instead, when evaluation is integrated into the program, the additional relevant questions are "How well are we doing and why?" and "How can we do it better?" When summative research and formative research become part of the program process, the program is more likely to start strong and then evolve and improve. A Case Study in Using Evaluation Results Evaluation research at the U.S. Council for Energy Awareness (USCEA) has helped its communications evolve. For example, in 1983, USCEA launched a national advertising campaign as part of a multifaceted communication program on the need for energy security and the prominent role of electric power from nuclear energy and coal in achieving that goal. The advertising agency, Ogilvy and Mather, one of the best in the business, began admirably but without the benefit of solid research. The initial television advertising, which did derive from research on audience attitudes, was not tested until later. It was attractive, appealing, and included the memorable song "Tomorrow" from "Annie." But 169 ------- 170 Case Studies early in the program the Council began to test the advertisements and found that this series of commercials was not getting its intended message across. The other half of the campaign, two-page magazine advertisements aimed at information seekers, derived from the body of facts about the electrification of America. For instance, the advertisements described how the use of electric energy had grown substantially since the oil embargo in 1973, while nonelectric energy use had declined. The new electricity, largely from coal and nuclear energy, replaced oil in many uses and helped reduce U.S. dependency on foreign oil. Because it was more efficient, it also helped reduce overall energy consumption. This informational series proved to be far more effective than the television advertising. It was attention-getting, clear, and favorably received. Based on follow-up research, the Council made significant changes. The television and magazine advertising were changed to match, with the same visuals, the same basic message, and approximately the same audience. Subsequent research consistently has shown the wisdom of these changes, as the television and magazine advertising are synergistic, i.e., they reinforce each other. 
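The kind of copy-test tabulation implied by that finding can be sketched in a few lines; the coded responses, execution names, and the 50 percent benchmark below are hypothetical illustrations, not USCEA's actual procedure.

from collections import Counter

# Minimal sketch of a main-message playback tabulation: coders classify each
# viewer's open-ended "main idea" response, and executions whose intended-message
# playback falls below a chosen benchmark are flagged for revision.
def playback_rate(coded_responses, intended="intended message"):
    counts = Counter(coded_responses)
    return counts[intended] / len(coded_responses)

tv_spot = ["intended message", "other", "liked the song", "other"]
magazine_ad = ["intended message", "intended message", "partial", "intended message"]

for name, responses in [("TV spot", tv_spot), ("magazine ad", magazine_ad)]:
    rate = playback_rate(responses)
    flag = "revise or drop" if rate < 0.5 else "acceptable"
    print(f"{name}: {rate:.0%} played back the intended message ({flag})")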
Since those initial program changes, many other research-based decisions and improvements have been made. For instance, each advertisement is tested before it is placed. Based on this testing, the Council may decide to use the advertisement as is, revise it, or drop it altogether. USCEA has a continuous, experimental-design study of overall advertising impact using large national panels. This research has shown that those who see the advertisements improve their knowledge and attitudes significantly more than those who do not see the advertisements. The research also helps identify population segments that the communications should be reaching more effectively, and this information is considered in both media placement decisions and creative approaches. The People Part Good integrated evaluation requires a communications team that appreciates research and knows how to apply the findings intelligently. Many professional communicators feel threatened by research until they learn how helpful it can be; therefore, launching an integrated evaluation program requires strong direction from the top. Once the evaluation component becomes familiar, it can be seen as an aid in improving the effectiveness of communications and in demonstrating that decisions were made scientifically and not by the seat of the pants. ------- Food and Drug Administration Louis A. Morris Reye's syndrome (RS) is a rare but severe disease associated with influenza and other viral diseases. It affects primarily children under the age of 18 years. Although its pathogenesis is unknown, the mortality rate is estimated at 20 to 30 percent, and permanent brain damage may also occur. For the past decade, evidence has been accumulating supporting the association between RS occurrence and the use of salicylates, such as aspirin. In the early 1980s, the Surgeon General recommended that doctors advise parents to "use caution" when administering salicylates to treat children with viral diseases such as chicken pox and influenza. In 1985, hearings were held before Congress to support a bill that would require the makers of aspirin to include a distinctive section on the product's label warning consumers not to administer aspirin to children with flu or chicken pox. Under mounting pressure, some manufacturers had voluntarily included the warning information while others had not. Some aspirin manufacturers feared that the warning would cause a substantial drop in their sales and increase the sales of competitors who make acetaminophen products. Surveys of parents undertaken in Houston, Texas, in 1981 and 1983 indicated a growing trend among parents to avoid administering aspirin to children with flu or chicken pox. In the 1983 survey, 60 percent of the parents surveyed had heard of RS and 42 percent knew of the association between RS and aspirin. Of the 103 children who had the flu, 14 percent received aspirin, 42 percent received acetaminophen, and 20 percent received both. The issue addressed in this case was the need for a study to determine if the Food and Drug Administration (FDA) should require a warning label on aspirin to tell parents not to administer it to children with flu or chicken pox. The first challenge is to define the question or objective accurately. In the case of the warning label, two issues needed to be addressed: the consequence of a communication (e.g., behavior change) and the communication itself. What would be the public impact of an aspirin warning label? What would be the best label format and design?
Was a label the appropriate risk communication response? Are multiple communications—beyond labels—necessary? In defining communication objectives, the following questions can be formulated: • Is a warning label effective at changing behavior? • What would be an effective communication mechanism (or a combination of mechanisms)? • How do people decide to take aspirin? When is the decision made? Will a label affect that? • How do parents learn about medicines, and how do people use medicines? Methods to find answers to these questions include field studies, focus groups, panel studies, and national surveys. Field studies in most cases are not appropriate because they require too much time. Focus groups, panel studies, and national surveys each have strengths and weaknesses. Probably the best way to gather the most information, considering cost and time constraints, is to use a combination of these methods. As information becomes available, the communications objectives might need to be redefined. The FDA frequently chooses national telephone surveys to explore, quickly and relatively inexpensively, public awareness and response to issues and problems. When structuring a questionnaire for a national survey, areas to be considered for questioning include what people know, what people do, and what people intend to do. The question sequence as well as the question structure should be carefully constructed. For example, how the questionnaire begins is important because it can influence how a respondent will answer the rest of the questionnaire. In this case, Reye's Syndrome should not be mentioned in the beginning of the interview to prevent a bias in the responses. One obstacle to conducting surveys in the federal government is the requirement for Office of Management and Budget approval. Time and cost considerations also make many large-scale surveys unrealistic. However, a telephone survey can produce the needed results in about a month with a fairly good response rate. In summary, the fears, threats, and obstacles to evaluating effective risk communication seem to be 1) defining the question or objective accurately, 2) time, and 3) costs. Effective risk communication is an evolving process as public awareness, opinion, and action change. The program or activity must develop mechanisms to reassess the situation readily as it evolves. ------- Cancer Prevention Awareness Program Shelagh Smith The National Cancer Institute's (NCI) Cancer Prevention Awareness Program (CPAP) is a national public information and education program that was launched in 1984. Its purposes are to increase public awareness of cancer risks and promote changes in lifestyle to help people reduce their own risks. The program was planned following the scientific quantification of the potential for cancer prevention. NCI determined that • About 80 percent of cancers are potentially preventable. The concept of prevention flowed naturally from a decade of positive life-style trends in the United States that support and sustain NCI's prevention messages. • Survey research consistently showed that the public was confused and skeptical about cancer and cancer prevention. NCI selected risk factors for inclusion in the CPAP based on three criteria: • The risk factor affects a significant percentage of Americans. • It poses a substantial threat by itself or in combination with other factors. • Its identification presents an opportunity to reduce or control exposure through individual effort.
NCI conducted the following activities to select the risk factors to be addressed and establish a sound foundation for program development: • Prepared research summaries on each of seven selected cancer risk factors • Reviewed the existing literature on public knowledge, attitudes, and practices (KAP) related to cancer • Reviewed recent federal and state health promotion campaigns (e.g., Healthy Mothers, Healthy Babies; NIA Prevention Campaign; High Blood Pressure Education Program; Healthy Older Americans) to identify elements of successful programs • Analyzed communications, social marketing, and health education models to provide a conceptual and practical foundation for the program 173 ------- 174 Case Studies • Conducted thirteen focus groups to explore consumer perceptions of cancer risk and prevention and to test alternative messages and formats • Convened sixteen working groups to provide guidance on messages related to specific cancer risk factors and on communication strategies for various channels and audiences • Conducted a national survey of public KAP related to cancer prevention and risk Based on this research, the following objectives were set: 1. Improve public knowledge and attitudes regarding cancer prevention, incidence, and treatment. 2. Increase public awareness and knowledge of cancer risks that can be modified. 3. Increase public awareness and knowledge of healthful behaviors that afford a measure of personal control over cancer risk. 4. Promote changes in behaviors and practices that will help individuals to reduce their cancer risks. These objectives were established specifically for the prevention awareness pro- gram, designed as one contributor to the overall NCI goal for the year 2000—to reduce the cancer death rate by up to half from the 1980 level. This goal cannot be achieved without major gains in prevention. The following key evaluation questions were developed: • Are the program messages and materials effective (believable, interesting, persuasive, understandable, memorable, and personally relevant)? • Are the program networks (media and intermediaries) functioning effectively (amount of coverage, level of activity)? • Are people's knowledge and attitudes improving or changing with regard to cancer and cancer prevention? • Are people seeking information about cancer prevention from the appropriate sources? • Are people changing their life-styles on the basis of NCI's cancer prevention messages? • Is progress being made toward NCI's year 2000 goals? The following evaluation approaches and activities were designed to address the evaluation questions: • Formative evaluation: concept and message testing; pretesting of products. • Process evaluation: tracking calls to the toll-free telephone service and publica- tions distributed; case studies; tracking PSA use; tracking news coverage (news- clip content analyses); and tracking of secondary data. • Outcome evaluation: national knowledge, attitudes, and practices surveys (1983, 1985); tracking secondary data (especially other surveys, e.g., NHANES, NHIS); and ongoing surveillance of communications literature. ------- EPA Office of Toxic Substances Maria Pavlova Thousands of facilities in the United States are required to report environmental releases of over 300 toxic chemicals annually to the U.S. Environmental Protection Agency (EPA) and to the states as of July 1988. These new data augment existing information about the presence and effects of toxic chemicals. 
This information is available to the public and allows for more informed participation by the public on related issues. However, this information will be helpful only if the public also is provided with a context for understanding and using these data as a result of education efforts sponsored by EPA and others. To design messages and materials that are responsive to the public's questions and concerns regarding the presence of toxic substances, EPA commissioned a needs assessment through a cooperative agreement with the Institute for Health Policy Analysis, Georgetown University. The needs assessment was designed to: • Identify current awareness, knowledge, perceptions, concerns, needs, and wants of various publics (e.g., affected citizens, employees, environmentalists, com- munity leaders, local government staff, health and media professionals, educa- tors, and students) about toxic chemicals • Identify credible sources of information and potential delivery channels (e.g., League of Women Voters chapters, homeowners associations) to guide the design of communications activities • Identify and evaluate existing educational materials for use in EPA's program to prevent duplication of effort and assure optimal use of EPA resources • Test messages used to explain the meaning and implications of toxic emissions (e.g., public understanding of terms such as emission, risk, toxicity, dose, exposure, and health effects) Beyond this needs assessment, the project was designed to: • Produce guidelines for effective educational messages and materials on toxic exposure and health effects 175 ------- 176 Case Studies • Develop criteria to evaluate the subsequent risk communication activities The needs assessment revealed how specific segments of the public and opinion leaders think about issues related to environmental risks, specifically about the presence of toxic substances in their communities, through the following activities: Analyzing public perception data available from related projects. The EPA has sponsored related activities (e.g., the Toms River, New Jersey, Superfimd community education program, focus groups on radon, and the National Pesticide Survey), which have produced information about public awareness and perceptions of environmental risks. Analyzing national polling data. An analysis of related questions asked in national public opinion polls over the previous three years was conducted to provide a quantitative perspective on public knowledge and attitudes. Gathering information about the perceptions of environmental professionals. Telephone and in-person interviews were being conducted with environmental leaders, EPA headquarters and regional staff, and state and local officials to assess their perspectives regarding what the public needs and wants to know, as well as what assistance these professionals need to communicate effectively with their constituents. Identifying and evaluating existing educational materials. Letters to 1,200 envi- ronmentally related agencies and businesses solicited copies of related public education materials; existing inventories, libraries, clearinghouses, and databases were also checked. Potentially relevant, useful materials were reviewed for readability, accuracy, appropri- ateness, and availability. Conducting focus groups in potentially affected communities. Fifteen focus groups were conducted in communities where the presence of business and industry provides for the potential release of toxic chemicals. 
Focus groups were being held with citizens who live near affected industries, employees of those industries, environmental group members, local officials and potential intermediaries with public credibility, and business representatives. The results of all components of the needs assessment were analyzed in a report to EPA to: • Recommend priority messages, risk communication strategies, and target audiences. • Identify existing or modifiable educational materials or the criteria for developing new educational materials. • Provide guidance for developing assistance and training programs for local emergency preparedness committees and for use in the community by interested organizations. • Recommend how to develop communication networks within a concerned or affected community. • Suggest criteria and methodologies for providing feedback to EPA. ------- National Cholesterol Education Program John C. McGrath The National Cholesterol Education Program (NCEP) of the National Heart, Lung, and Blood Institute (NHLBI) pretested materials prepared for a public education campaign. Persons with high blood cholesterol were asked to comment on two brochures in two focus groups in the Washington, D.C., area and two groups in Providence, Rhode Island. Persons who were aware that they had a high blood cholesterol level were recruited through the cooperation of medical facilities conducting blood cholesterol screenings. One group with male respondents and another group with female respondents were conducted in each of the two locations. All of the respondents had at least a high school education, worked in either a nonprofessional or professional job, and had never worked in the health or medical field. In addition, those selected had never had a heart attack or stroke and did not consider themselves very knowledgeable about cholesterol. Focus groups are a form of qualitative research, so the findings cannot be projected statistically to a larger population. However, the groups provide reactions to educational materials as well as insights regarding potential confusion in meaning, difficulty with the level of language used, and the need to highlight information to communicate critical points. Participants' needs for additional information (i.e., the questions left unanswered after reading the material) can be explored in the focus group session. To provide consistency across groups, a moderator's topic guide was developed. This guide was designed to examine participants' reactions toward: • Learning that they have high blood cholesterol, • The general content and format of two public education brochures ("So You Have High Blood Cholesterol" and "How To Lower Your High Blood Cholesterol"), and • The specific contents of each brochure For ease of discussion, the booklet "So You Have High Blood Cholesterol" will be referred to as "the red booklet" and "How To Lower Your High Blood Cholesterol" will be called "the green booklet," based on the color of their covers. All participants were given the two booklets in advance and asked to read them prior to attending the focus groups. At the beginning of the discussion, participants were asked to discuss their reactions to learning that they had elevated blood cholesterol.
This discussion served two purposes: • To explore target audience feelings, perceptions, knowledge, attitudes, and misconceptions to help plan appropriate messages • To help the individual group members become comfortable with the moderator and the other participants Following this discussion, the participants were asked to discuss their reactions to the two booklets. To facilitate recall and discussion, the specific sections of each booklet being discussed were shown on overheads. Participants were asked a series of questions to assess their reactions to the amount of information provided, its level of complexity, attitudes regarding the tone and style of the publications, and whether this information would make a difference in their subsequent behaviors. Responses included the following: • Despite their interest in receiving the information, a number of participants felt that the level of information made it difficult to understand the brochures. Many participants described having read and re-read sections in an attempt to comprehend the information. • Although participants found portions of the booklets to be too densely written, they felt that the booklets were appropriate in tone. They described the general tone of both booklets as "serious," "straightforward," and "generally informative." • Although the booklets were considered similar in tone, several differences were apparent. The red booklet was perceived as being "more clinical" in tone, "hard- hitting," and "concise," while the green booklet was described as "less technical" and "more hands-on." • Despite their sense of having received a great deal of information, several participants raised questions that they felt had not been addressed. For the red booklet, these questions were more "clinical" in nature, while those directed at the green booklet were on a more "nitty gritty" level. • In general, participants felt that the material had given them enough information to know how to lower blood cholesterol. Several described the influence that this information had already had on their shopping and eating behaviors. However, others felt that the material was not as "accessible" as it could be. A few described feeling "inhibited" by their lack of complete understanding of blood cholesterol information. When asked if they would be able to describe this information to a friend, several participants mentioned their own "lack of comfort" with the material. Participants also responded to a number of questions developed to assess their reactions to the placement of information on the page, use of graphics, print size, and structure of appendices. They were also asked to comment on the way in which they perceived the two booklets as fitting together. Responses included the following: ------- National Cholesterol Education Program 179 • Respondents liked the placement of the material, with boldface questions or labels in the left-hand margin serving to provide a framework for the text. In the red booklet, especially, participants appreciated being able "to look at the questions and then move in to find the answers." • The red booklet was perceived as "too densely formatted." A number of participants suggested that more graphics be included in the final version of the brochures. Graphics were described as a means of "catching attention" and "breaking up the flow." Participants liked the idea of highlighting important portions of the text. They felt that the use of color would add excitement to the materials. 
Several also remarked that a picture can often be an aid to memory, if it is relevant to the information. Although participants thought that additional charts, graphs, and pictures would be quite useful, they did not want to include images that would not "add meaning" to the text. • The print size was acceptable to all participants. • As a means of facilitating access to information, respondents favored the idea of adding a glossary to each booklet. • Women, more than men, were interested in a format that would allow them to pull out various sections from the green booklet to carry around as references. In addition, specific comments about the content, wording, and format of each booklet were discussed and recorded on a page-by-page basis. As a result of focus group findings, both booklets were revised prior to publication. The changes included the following: • Adding more illustrations and simplifying the format to make the booklets less dense • Substituting summary charts for large blocks of text • Adding a "pull-out" summary suitable for posting on the refrigerator • Adding a glossary to one booklet ------- EPA Superfund Program Maria Pavlova In 1985, EPA Region II developed a pilot public education program to run concurrently with the CIBA-GEIGY Superfund Remedial Investigation/Feasibility Study (RI/FS) in Toms River, New Jersey. The Toms River community was selected because of public concern over contamination from the waste disposal areas of the CIBA-GEIGY plant, a designated Superfund site. The program was designed to assess the levels of awareness and concern among local citizens and to provide accurate information about health risks associated with potential exposure to environmental contaminants. Following completion of a community needs assessment for risk information, a series of fact sheets was developed, pretested, and modified to respond to citizen interests and concerns. The fact sheets were produced, and a field test was conducted to assess the best methods for reaching the public and ascertain public response to the fact sheets. Components of the field test included: • Asking community leaders who were members of a network established for the program to distribute a sample fact sheet and assessment questionnaire to their constituents • Reviewing requests for the fact sheets received by EPA's toll-free telephone number • Reviewing answers to the question "where did you get the fact sheet you just read" from readers requesting additional fact sheets via an order form • Analyzing responses to the assessment questionnaire Because of the limited number of participants in the informal field test, the results of the study were not reviewed for the purposes of quantifying demand or projecting responses of the community as a whole. Rather, the results of the field test were intended to indicate the most promising routes of distribution and to verify that the fact sheets were responsive to identified public interests in risk information. Following two months of the study, the findings regarding the routes of distribution included: • The community leaders' network was very willing to distribute the fact sheets. • Most of the organizations included in the network did not meet during the summer months (the time of the field test), so there was no way for community leaders to distribute the fact sheets to their constituents. • Although a teacher was willing to help, schools were out of session.
• Community leaders were willing to mail to their constituents, but only if EPA would pay for postage. • Nineteen requests for additional fact sheets were received by the EPA toll-free number (advertised in the media), and eleven requests were received through mail order forms. These requests indicated that fact sheets had been picked up at: —a community leaders' network meeting —the county public information office —the N. J. Department of Environmental Protection —a local college —the county fair —the county library • About 1500 sets of ten fact sheets were distributed at the county fair. Two conclusions were reached about routes of distribution: • The fact sheets needed to be placed in more popular locations in the community (e.g., grocery and convenience stores, shopping malls) rather than the library and county information office. • The summer months are not the optimal time to release information to the public, because schools are on recess, families are on vacation, and most community groups do not hold regular meetings. Regarding the utility of the sample fact sheet distributed: • Most of the participants responded very favorably, indicating that some of the information was new, it was appropriate for the public, and they would recom- mend it to a friend. • They said that the question-and-answer format made it easier to read, and it was very interesting, informative, useful, and clear. • They said that the information was understandable, although not easy to read, and somewhat complete. As a result of the field test, the sample fact sheets were determined to be appropriate for broad-scale distribution. ------- Cancer Information Service Roswell Park Memorial Institute In May 1985, Dr. Frank Field broadcast a series of reports on the relationship between dietary practices and cancer risk on four consecutive week-night segments of the WCBS-TV evening news. During each segment, Dr. Field promoted the National Cancer Institute's (NCI's) booklet "Diet, Nutrition and Cancer Prevention" and provided the toll- free Cancer Information Service (CIS) number (1-800-4-CANCER) so that viewers could order it by telephone. After the second night, he also provided an address at NCI so that viewers could write for a free copy of the booklet. The WCBS- TV evening news is one of the most frequently watched news programs in the nation's largest television market, encompassing New York City, Long Island, Southern New York State, Northern New Jersey, and Connecticut. This promotion resulted in the largest response in CIS's ten-year history; a total of 75,000 booklet requests were received. Approximately 15,000 phone requests were handled by the New York City, New York State, and Connecticut CIS offices, and 60,000 mail requests were handled by NCI. The large response to the booklet promotion provided a good opportunity to characterize the population who requested the booklet, determine its usefulness to readers, and assess its impact on dietary behavior and knowledge. In November 1985, a survey was undertaken of persons who had called the Roswell Park Memorial Institute CIS (covering New York State callers) to request the diet booklet. The purpose of the survey was to (1) describe the characteristics and cancer-related attitudes of those requesting the booklet; (2) assess the callers' perceptions and uses of the booklet; and (3) determine the impact of the booklet on the callers' dietary practices and knowledge about diet and cancer risk. 
Between May 2 and May 31, the Roswell Park CIS office received 3,725 orders for the diet booklet. The name, address, age, sex, educational status, and race were recorded for each caller. From the 3,725 callers to the Roswell Park CIS, a random sample of 1,842 callers was selected to participate in the survey. This sample was demographically representative of callers to the Roswell Park CIS and similar to callers requesting the diet booklet from the Connecticut CIS office. (New York City data were not available.) Based on previous experience with CIS surveys, a 70 percent return rate of questionnaires was anticipated, which would have yielded approximately 1,300 completed questionnaires. The questionnaires were mailed with a cover letter signed by Dr. Field and a prepaid return envelope; two weeks after the initial mailing, a reminder postcard was sent to those who had not returned the questionnaire. One week later, nonrespondents were sent a second questionnaire and cover letter. Finally, in January 1986, a third mailing consisting of a questionnaire and cover letter was sent to nonrespondents. A total of 1,106 usable questionnaires were returned. The four-page questionnaire included questions about the booklet, the respondent's dietary practices, changes in dietary habits and food preparation methods as a result of reading the booklet, and beliefs about the relationship between specific dietary practices and cancer. In addition, respondents were asked to respond to several attitudinal items on cancer prevention and treatment and to indicate their usual sources of information about cancer topics. A pretest version of the questionnaire was mailed to fifty individuals; based on their responses to the pretest questionnaire, a few changes were made and a final version of the questionnaire was constructed. Three indices for analysis were constructed to: • Provide a measure of the respondent's knowledge about the relationship between dietary practices and cancer risk • Provide a measure of positive changes made by respondents in food consumption practices since receiving the booklet • Provide a measure of positive changes in food preparation practices The findings were as follows: • Sixty-five percent said they read all of the booklet; 32 percent reported reading some of it. • More than 90 percent said the booklet motivated them to try to change their diet, was easy to understand, and provided useful diet suggestions. • Seventy-one percent indicated that they made changes in their eating and/or food preparation habits after receiving the diet booklet. • Men and women did not differ with regard to reported changes in food consumption habits except for consumption of skim milk and low-fat dairy products, where women were more likely than men to report increased consumption. • In general, more positive changes in food consumption habits were reported by older respondents, whites, and those with more education. • Those who reported reading all or some of the booklet were significantly more likely to report positive changes in food consumption habits than those who did not read the booklet. • A substantial percentage of respondents incorrectly reported that consumption of salt, coffee, eggs, and food additives are associated with cancer risk.
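The return-rate arithmetic reported above can be checked with a few lines of Python; all of the figures below come from the text.

# Return-rate figures for the Roswell Park CIS mail survey, as reported above.
callers = 3725            # booklet orders received May 2-31
sample_size = 1842        # callers randomly selected for the mail survey
anticipated_rate = 0.70   # return rate expected from previous CIS surveys
usable_returns = 1106     # usable questionnaires actually returned

print(f"Anticipated returns: about {round(sample_size * anticipated_rate)}")
print(f"Actual return rate: {usable_returns / sample_size:.0%}")
print(f"Sampling fraction: {sample_size / callers:.0%} of callers surveyed")

The actual return rate of about 60 percent fell short of the anticipated 70 percent, which is worth keeping in mind when generalizing the findings to all callers.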
In summary, the results of the survey suggest that the diet booklet was associated with positive changes in food consumption and food preparation practices and higher levels of knowledge about the relationship between diet and cancer risk. However, the lack of a control or comparison group makes it difficult to attribute the reported changes in dietary habits to the booklet. The overwhelmingly positive response to the television promotion of the diet booklet may be tempered somewhat by the fact that the characteristics and health habits ------- Cancer Information Service 185 of those who received the booklet suggest that they had access to the same information from other sources, and they were more interested and better informed about the issue than the general public. While mass media promotions may be useful in triggering a response from those most ready to make dietary changes, the majority of the population (and probably those most in need of altering their dietary habits) is not likely to respond to this type of promotion. For the majority of the population, efforts to persuade them about the importance of diet as a factor in cancer risk are necessary prior to attempts to educate them about specific dietary changes. ------- National High Blood Pressure Education Program John C. McGrath The National High Blood Pressure Education Program (NHBPEP) was estab- lished in 1972. Administered by the National Heart, Lung, and Blood Institute (NHLBI), the program was conceived as a cooperative effort of various federal agencies and major national health care organizations. The goal of the NHBPEP is to reduce death and disability related to high blood pressure through professional, patient, and public education. Strategies to achieve this goal include health promotion activities and the dissemination of information on the latest and most effective modes of treatment. Throughout its history, the NHBPEP has employed a comprehensive strategy of mobilizing, educating, and coordinating the resources of all interested groups in govern- ment and the private sector. The NHBPEP has developed a network of approximately fifteen federal agencies, 150 national organizations, all state health departments, and more than 2000 community-based programs. At the core of the program is the NHBPEP coordinating committee, a body composed of representatives from thirty-two national organizations. The NHBPEP's examination of major blood pressure control issues encompasses: • Appropriate roles of health care professionals • Most effective treatment practices in medical management • High blood pressure control at the worksite • Health needs of rural communities • High blood pressure in the elderly • The relationship between diet and high blood pressure • Special problems in controlling high blood pressure in minority populations Several factors have contributed to permitting an evaluation of the impact of the NHBPEP: • The longevity of the program • The level of resource commitment 187 ------- 188 Case Studies • The breadth of commitment by many organizations • The range of strategies to address the problem including but extending beyond risk communication • The existence of tracking measures Three measures of the NHBPEP's success—hypertensive patients' awareness, treatment, and control rates—indicate that substantial progress has been made toward its goal. By 1980, almost three out of four persons with hypertension were aware of their condition—a 50 percent increase since 1971- 72. 
During this same period, rates of hypertensive persons undergoing treatment (on medication) were one and one half times greater, and rates of hypertension control more than doubled (National Center for Health Statistics data). Although current national estimates will not be available until 1990, preliminary analyses of the data collected by the seven states that participated in the NHLBI Demonstration Grant study suggest that the rates of awareness, treatment, and control continue to improve among the hypertensive population. Because uncontrolled high blood pressure is the major risk factor for stroke, the mortality rate for cerebrovascular disease is another indication of NHBPEP progress. Age- adjusted death rates for cardiovascular disease in general have been on a downward trend since the 1950s. However, mortality rates for coronary heart disease and stroke sharply declined in the early 1970s. The NHBPEP was instituted in 1972; by 1985, age-adjusted mortality rates had declined 34.6 percent for coronary heart disease and 50.2 percent for stroke (National Center for Health Statistics data). The improvements made in public knowledge and patient behavior are highly encouraging; however, studies of these and other survey results indicate new challenges for the NHBPEP. Thus, the program continues to use both survey data and analysis methodologies to evaluate its multifaceted strategies and to refine its educational empha- sis. For example, in light of recent findings demonstrating a higher mortality rate from stroke in the southeastern United States, the NHBPEP has launched a "Stroke Belt Initiative," a major effort to target the population of this region. ------- CONCLUSION ------- Does Risk Communication Make a Difference? John F. Ahearne The purpose of this paper is to examine some mistakes in risk communication, describe the role of risk communicators, and pose some challenges that are facing risk communicators. Examples of bad risk communication exist in both the Love Canal and Three Mile Island (TMI) incidents. Within the scientific community, Love Canal is considered an example of government incompetence. After the problem erupted, the Environmental Protection Agency (EPA) decided to conduct a study of chromosome damage among residents of the Love Canal area. However, the study did not include a control group, primarily because it was ordered by the legal office, which did not understand how to do a valid evaluation. Only thirty-six people were examined, and they were selected to maximize the likelihood of finding chromosome damage. After the study, five reviews of the study were conducted to determine whether any valid conclusions could be drawn. The study only added to the agency's problems. When the TMI reactor was destroyed, neither the operating company nor the Nuclear Regulatory Commission (NRC) was prepared for such an emergency. TheNRC had only a semblance of an emergency procedure, which quickly broke down in the face of a major emergency. The agency was unable to get accurate information about what was happening, and had happened, at the reactor. Consequently, even if an effective plan to deal with the media had existed, the NRC would not have had good material to use. But the agency was not prepared to deal with the media. It had no knowledgeable official spokesperson, and it was several days before the director of the NRC's reactor division went to Pennsylvania and became the federal government spokesperson. 
Several post-accident reviews evaluated plant management, the nuclear industry, and the NRC response. The NRC took many actions as a result, including three actions directly relating to communications. First, the NRC made a major revision of its emergency response organization, establishing who would be in charge; who would handle contacts with the plant, other governmental agencies, and the press; and how to form evaluation teams. The revised system is a substantial improvement and has been tested in many drills and in at least two accidents. 191 ------- 192 Does Risk Communication Make a Difference? Second, the NRC passed an emergency planning rule, which requires a coordinated plan between plant operating staff and local governments. These plans must include dissemination of information, installation of warning systems, and drills to check com- munications links and ensure that participants know their roles during an emergency. The rule has upgraded emergency planning in many locales, and the plans have been used successfully to respond to non-nuclear emergencies, such as chemical spills from railroad accidents. Third, the NRC recognized that many people in the media covering TMI wrote confusing stories because they were unfamiliar with nuclear plants. Therefore, the NRC established a program in which the five NRC regional offices hold day-long sessions annually for regional science reporters and others in the media located near nuclear power plants. At these meetings, speakers review important information about the plants, including how they work, what their hazards are, and how an emergency would be handled. Using engineering terms, risk communication can be seen as a smart circuit with feedback loops. The decisionmakers are on one side, separated from the communications channel by a barrier, which can be called a buffer. At the other end of the channel is another barrier, or buffer. The buffers can be other agency staff, media representatives, repre- sentatives of public interest groups or industry, congressional staff, and the like. On the far side of that buffer are the recipients, who are the media, the public, and Congress. Some decisionmakers and people in communications see the channel and the buffers as one-way transmission devices, to transfer information from the decisionmakers to the communi- cations channel, and from the communications channel to the recipients. However, the buffers should be seen as two-way, with the smart channel providing information back to the decisionmakers. This information feedback can improve decisions by letting decisionmakers know what recipients think about proposed actions, what they are angry about, what their concerns are, and what information they want Risk communication today often must address complex scientific and engineering issues. Unfortunately, to be a smart channel the communicator must understand the technology. If the channel is a dumb channel, a one-way transfer of anything put into it can create the following problems: • A smart buffer—a knowledgeable media person or a skilled public interest group—will reject the transmission. • The concerned public will attempt to communicate via the channel, but will become frustrated because a dumb channel cannot become a two-way smart channel. The actions at Love Canal demonstrated these problems. Of course, this model assumes that the decisionmakers understand the science or engineering involved, which is not always the case. 
However, a knowledgeable com- municator, a smart channel, may be able to force a lazy decisionmaker to work to understand the issues. To understand the proper role of a communicator, the concept of Thomas Jefferson is helpful: "I know of no safe depository of the ultimate powers of society but the people themselves; and if we think them not enlightened enough to exercise their control with a ------- Does Risk Communication Make a Difference? 193 wholesome discretion, the remedy is not to take it from them, but to inform their discretion." Jefferson did not endorse manipulating or even persuading people. He endorsed informing them. Using this Jeffersonian concept and the model of a smart channel, successful risk communication can be defined as raising the level of understand- ing of relevant issues among participants, including decisionmakers. Rossi andBerk (1988) explain the importance of evaluation. However, one can read more into their paper. They stress understanding the problem to be solved, the planned solution, and the goals of the program, and then objectively assessing progress toward meeting these goals. By requiring understanding, Rossi and Berk mean to push program planners to identify clearly the problem, the solution, and the program's goals. This analysis is necessary for developing sound programs. Therefore, if risk communicators use the Rossi and Berk approach and have access to the decisionmakers, the communicators will be checking on whether the programs are sound. This is in line with the appropriate role of a smart channel. The NRC changed its programs after evaluations resulting from the TMI accident. The EPA also may have changed as a result of Love Canal, but the Agency's actions during the event would have flunked the Rossi and Berk checklist. Two key concepts that stem from Jefferson' s "informed discretion" are accuracy and completeness. It should be remembered that risk communication is not a one-time affair, a brief skim of a pamphlet The rationale for evaluation is that programs continue and similar situations will arise. Evaluations enable the system to improve because risk communicators learn from mistakes and successes. Two groups in particular need to be addressed: • The technical professionals, who need to disseminate their knowledge and un- derstanding. They have a responsibility to see that decisionmakers and com- municators understand the technical issues and that messages are accurate and complete. A large problem for persons in government is that they lack credibility. Credibility is necessary for successful communication; it is easily lost and, once lost, is never restored completely. Maintaining credibility requires continuing efforts to be accurate and complete, which in turn requires understanding the technology involved. • The communicators, who are willing and able to be smart channels, using the Rossi and Berk approach to improve the entire risk communication process. Much of the U.S. public mistrusts its government, unfortunately with reason. This can be seen in the fact that the U.S. public does not participate in elections at the levels seen in other democracies. It is doubtful that this is because the public believes everyone in government is doing an excellent job. Rather, the public believes government is not influenced by single voters, a symptom of a growing gap between the government and its people. This is not healthy. 
Risk communicators can affect this problem for better or worse, because they are involved in situations where the public comes in contact with government, where there is high interest, strong emotions, and the potential for strengthening the public's positive or negative attitudes about its government.

-------

194 Does Risk Communication Make a Difference?

To be more than a dumb channel is a large responsibility.

REFERENCE

Rossi, P., and R. A. Berk. 1988. A Guide to Evaluation Research Theory and Practice. Paper prepared for the Workshop.

-------

What Else Do You Need to Know About Evaluation?

Roger E. Kasperson

This book is one outcome of the first specialized conference on risk communication in which evaluation was the central topic and was treated generically, a sign of the beginning maturity of discussions in a rapidly growing and changing field. There is a remarkable array of new initiatives and programs in risk communication being undertaken by government agencies and other organizations. Since 1987, many guides and manuals on risk communication have become available. Many of these, however, have preceded a sound base of research to support their prescriptions. Risk communication is still in its first generation of effort and encompasses diverse subjects and issues. The challenge now is to move beyond the notion of risk communication as simply the transfer of scientific information that has been amassed and move toward the creation of genuinely interactive communication processes and more comprehensive risk education.

Need for Better Theory and Understanding

Milton Russell (1988) has identified a central issue in risk communication: the extent to which society focuses on trust. People are concerned about the performance of social institutions and, in many instances, have lost their trust in them. It is important to focus on how the loss of trust occurred and to understand how it can be regained. The question of whether trust, once lost, can be regained in the short term remains to be answered. If it can, we need a better understanding of how to accomplish this. Likewise, it is important that we understand how to communicate with the public about risk under conditions of high social distrust.

The proper design of programs requires intelligence about the nature of basic problems. Requirements include:

• Careful definition of program objectives and goals
• A formal assessment of communications needs
• Baseline studies of public concerns and perspectives
• Conceptualization of the nature of communications problems

195

-------

196 What Else Do You Need to Know About Evaluation?

Contrary to much of the flavor of discussions at these meetings, theory must be integrated with practice. The quality of a risk communication program depends on the quality of the underlying theory. It is also important to study the reasons for any changes in public understanding and behavior that occur, because, to the extent that the causes for change can be identified, risk communication can improve. Conversely, to the extent that we do not really know whether it was the intervention or confounding forces that produced the change (whether desired or not), the substance of risk communication as well as its evaluation will not advance.

It is often stipulated that evaluation requires definition of clear goals and objectives, but goals and objectives, in reality, are never as clear as might be desired. This situation is likely to continue.
Planners also must consider whether their program goals and objectives are appropriate or whether they have been manipulated or revised to achieve the hidden objectives of institutions. In the latter case, when the program is evaluated solely according to the stated program objectives, the evaluation will be shallow and insufficient. It also is possible to evaluate risk communication programs without goals and objectives, using normative criteria. Indeed, it is necessary to go beyond the stated objectives because risk communication is fundamentally a value laden activity and a political act. Risk communication that does not include rigorous evaluation is unethical and should be avoided. A continuing problem is the large gap between what is known and what is practiced. It is essential to narrow that gap so that risk communication programs embody the best of current knowledge, theory, and experience. This requires that agencies and institutions commit themselves not only to risk communication but to practice that is anchored in state- of-the-art understanding and that strictly observes the limits of what is known. Evaluation Within a Social Context It is important to consider risk communication programs and their related evaluation within the realities of public acceptance and understanding. Some consider- ations are outlined here. • Public judgments about problems. Statistical risk is an abstraction that has little meaning for the public. Members of the public do not think about risk in the same way as scientists. The public makes judgments about the more tangible technologies, not about intangible risks. If a risk communication program is organized around risk, it may be out of tune with the needs and concerns of various publics. A key job for formative evaluation is to determine whether the communication program is actually adapted to the nature of public concerns and the breadth of relevant considerations. • Knowledge versus information. The public seeks and receives risk information from many sources and, in turn, sends information to diverse parties. All have an impact on how a problem is perceived and the way that the decision process is ------- What Else Do You Need to Know About Evaluation? 197 structured. These many confounding variables present methodologic problems in evaluating the effects and success of risk communication. People will have a better understanding of a risk problem if information is adapted to their mental models of the risk and what they feel they need to know. Correspondingly, risk managers will enlarge their understanding if they know the nature of the public experience of risk. • The unintended consequences of risk communication. Although communi- cators tend to focus primarily on the objectives they seek to realize, unintended consequences of risk communication also occur. These unintended consequences may be greater in impact than the intended consequences. They may also be harmful. The control of unintended consequences requires that they be identified and assessed. Risk communication programs themselves should undergo a risk assessment to determine potential harm as well as benefits. • Holism and integration. As the practice of risk communication continues to develop and to become more elaborate, there is a danger of specialization of expertise and division of labor. If risk communication becomes isolated from risk assessment and from other aspects of risk management, it will lose its quality and integrity. 
Examples of failures in risk communication due to its abstraction and isolation from the science of risk assessment already are apparent. • Risk communication as a humane activity. The conduct of risk communication is embedded in its mission as a humane enterprise. At base, the goals of risk communication are those of risk management more generally—to anticipate the harms that may be inflicted on people, to reduce these harms whenever that can be done reasonably, and to reduce overall human suffering. All individuals involved in risk communication need to work to ensure that risk communication remains a caring and humane activity and forms an integral part of a broader risk management process. REFERENCE Russell, M. 1989. Risk Communication: On the Road to Maturity. Paper prepared for the Workshop. ------- APPENDIX ------- A GUIDE TO EVALUATION RESEARCH THEORY AND PRACTICE Peter H. Rossi Stuart A. Rice Professor of Sociology and Acting Director Social and Demographic Research Institute University of Massachusetts at Amherst Richard A. Berk Department of Sociology University of California at Los Angeles ------- TABLE OF CONTENTS Introduction 205 Key Concepts in Evaluation Research 206 Policy Space 206 Effectiveness: Three Meanings 207 Validity 208 Measurement Error 208 Causality 208 Generalizability 209 Chance 210 The Best Possible Strategy 213 Policy Formation and Program Design Issues 213 Policy Issues and Evaluation Research 213 Fitting Strategy to Problem 213 The Policy Contexts of Evaluation 215 Policy Formation and Design Stage 215 Defining the Problem 216 Needs Assessment Where Is the Problem and How Big Is It? 217 Estimating Problem Parameters 218 Qualitative Needs Assessment Approaches 219 Forecasting Needs 220 Policy-Oriented Research: 221 Developing Promising Ideas into Workable Programs 222 The YOAA Problem 224 Will Some Particular Program Work? The Effectiveness Issue 225 Practical Developmental Evaluation Approaches 227 The Assessment of Ongoing Programs: Accountability Evaluation 227 Is the Program Reaching the Appropriate Beneficiaries? 228 Program Integrity Research: Are Benefits Being Delivered? 229 Are Funds Being Used Appropriately? Fiscal Accountability. 231 Program Assessment Evaluation 231 Can Effectiveness Be Estimated? The Evaluability Question. 232 Did the Program Work? The Effectiveness Question. 233 Designs Frequently Used For Estimating Effectiveness 237 Was the Program Worth It: The Economic Efficiency Question. 248 Evaluation in Evolution 249 REFERENCES 249 ------- Introduction Program evaluation is not something new, having been undertaken since the time when social policies and programs became recognized as secular matters. Judgments have always been made about whether prospective or ongoing programs are worth the effort and resources expended. However, evaluation research—the use of social science research methods to aid such judgments—is relatively new, becoming common only in the last two decades. Its use has grown because the assessments of policies and programs have become more complicated and because social science research methods have matured sufficiently to handle the technical issues involved. If one of the parents of evaluation research is policy makers' uncertainty about how best to determine the success of public policy and programs, then the other parent is the technical development that made evaluation research credible (at least in principle). To evaluate something means to make judgments about its value or worth. 
Evaluation research is research in support of judgments about public programs, usually social programs. It involves the application of a complex set of research procedures, mainly based on social science research methods, to the questions generated by the policy problems arising in the course of program development, implementation, and assessment. At its best, evaluation research can help policymakers make judgments about the relative success or failure of programs and policies, whether prospective or in operation. Evaluation research is not, however, a substitute for policymakers' judgments, and responsible evaluators have no interest in either circumventing the political process or becoming central players. Put another way, evaluation research is essentially about the provision of the most accurate information possible in an even-handed manner. Thus, an evaluation might determine the likely impact of a program providing information about sexually transmitted diseases to adolescent school children, but leave unaddressed the political question of whether the schools should make such programs mandatory.1 Likewise, an evaluation might estimate the degree to which charges imposed for the treatment of wastewater, proportionate to the degree of pollution, would deter manufacturers from polluting, but be silent on the fairness of such pricing policies. Or an evaluation might determine that bottle-ban initiatives really reduce litter, but take no position on whether such bans are an unreasonable interference with a free market.

This paper provides a detailed introduction to the variety of purposes for which evaluation research may be used and to the range of methods that are currently employed. Specific examples are given to provide concrete illustrations of both the goals of evaluation researchers and the methods they use. Although this paper is intended to be comprehensive in the sense of describing major uses of evaluation research, it cannot pretend to be encyclopedic. Citations to more detailed discussions are provided. In addition, there are several general references that survey the field of evaluation in a more detailed fashion (Suchman, 1967; Weiss, 1972; Cronbach and Associates, 1980; Rossi and Freeman, 1985; Cronbach, 1982; Guba and Lincoln, 1981; Guttentag and Struening, 1975; Cook and Campbell, 1979).

1 For purposes of this paper, the deeper issues surrounding the possibility, or even desirability, of true objectivity can be sidestepped. Suffice it to say that we do not hold to the conventional positivist position (see, for example, Berk et al., 1985; Berk, 1988).

205

-------

206 A Guide to Evaluation Research Theory and Practice

Key Concepts in Evaluation Research

One needs to know some of the specialized language of evaluation research to understand this paper. This section introduces the key concepts. The main intellectual roots of evaluation research are found in the social sciences. Social science concepts and research methods dominate the field and, correspondingly, most evaluation specialists have had some social science training. All social science fields have contributed to the development of evaluation research methods. The best evaluation research and the best evaluators are multidisciplinary, using an eclectic repertory of concepts and methods drawn from all of the constituent disciplines.

Policy Space. The substantive roots of evaluation research are deep in policy concerns.
Evaluations are almost entirely confined to issues that are encompassed in whatever may be the current "policy space." In other words, this means that evaluations are almost always concerned with making judgments about policies and programs that are on the current agenda of policymakers (broadly construed to include a wide variety of "players," not just public officials). Clearly, policy space is time-bound and does not encompass a permanently fixed set of policies and programs. It shifts and changes over time. It is the almost exclusive attention to matters that are included in policy space that distinguishes the evaluation researcher from the academic social scientist. A good evaluation researcher knows how to find out what is included in the policy space and what is not. Stakeholders. By virtue of its engagement in policy space matters, evaluation research is saturated with political concerns. The outcome of an evaluation can be expected to attract the attention of persons, groups, and agencies who hold stakes in the outcome. These "stakeholders" include policymakers at the executive and legislative levels, the agency officials who administer the policies or programs under scrutiny, the persons who deliver the services in question, groups representing the targets or beneficiaries of the programs, or the targets or beneficiaries themselves, taxpayers, and citizens generally. In almost all program issues, stakeholders may be aligned on opposing sides, some favoring the program and some opposed to it. Whatever the outcome of the evaluation may be, there usually are some who are pleased and some who are disappointed: it is not usually possible to please all of the stakeholders. For the most important political issues, all or nearly all of the groups listed above may appear among the vocal stakeholders. The vocal stakeholders, composed of those who make their views known, may be more narrowly restricted on typical issues. As a consequence, an evaluation report ordinarily is not regarded as a neutral document; rather, it is scrutinized, often minutely, by stakeholders who are quick to discern how its contents affect their activities. Even when an evaluation is conducted "in house"— by an agency concerned with its own activities—stakeholders may appear within the agency to appraise the report's impact on their activities. A clear implication is that evaluation research should not be undertaken by persons who prefer to avoid controversy, or who have difficulty facing criticism. A consequence of the ubiquitous presence of stakeholders is that much greater care needs to be taken in the conduct of evaluation research than in the conduct of its academic cousin, basic research. Loose procedures that border on the slipshod will surely come to ------- A Guide to Evaluation Research Theory and Practice 207 the attention of critical stakeholders and may render an evaluation report vulnerable. Another consequence is that the conduct of evaluation research often involves careful prior negotiations with stakeholders. For example, it can impede the evaluation of a school's educational program if a teachers' organization recommends that its members not cooperate with the evaluator. Effectiveness: Three Meanings. 
In the broadest sense, evaluations are concerned with whether or not a program or policy is achieving its goals; discerning these goals is an essential part of the evaluation process, and almost always its starting point This tends to be difficult because goals and purposes often are vaguely stated, typically in an attempt to garner as much political support as possible. Programs and policies that do not have clear and consistent goals cannot be evaluated. (A subspecialty of evaluation research, evaluability assessment, has developed to uncover the goals and purposes of policies and programs to judge whether or not they can be evaluated.) A key concept in evaluation is effectiveness—the extent to which a policy or program is achieving its goals and purposes. In practice, it should be emphasized that the concept of effectiveness must always address "compared to what." For marginal effectiveness the issue is dosage; the consequences of more or less of some intervention are assessed. For example, one might study whether a long-term program produces correspondingly more cancer screenings in comparison to a short-term educational campaign. For relative effectiveness, the contrast is between two or more program options.2 For example, one might compare the impact of public service announcements on cancer screenings versus that of mass pamphlet mailings, where both contain the same educational information. Finally, it is common to consider effectiveness in dollar terms: cost-effectiveness. Comparisons are made in units of outcome per dollar. For example, while mass mailings of pamphlets may increase cancer screenings more than public service announcements, the latter may be more cost- effective because it may cost less to produce an additional cancer screening using public service announcements. Validity. All research activities need to achieve validity—results that will stand up under the scrutiny of the harshest critics. Of course, validity is actually a bundle of goals; the four that follow are the most critical: Primarily, valid evaluation research uses valid measures. One must consider whether the measurement procedures used are likely to measure accurately what they are intended to, a topic that is sometimes considered under the rubric of "construct validity" (Cook and Campbell, 1979). For example, a study measuring the impact of the Center for Disease Control's pamphlet on AIDS, recently mailed to all U.S. households, must use measures that properly capture what CDC intended to affect in the way of behavioral, attitudinal, and cognitive responses among the public. 2 While "nothing" may be one of the options (serving as a comparison group, it cannot be overemphasized that nothing is not nothing (pardon our Zen). At the very least, "nothing" is likely to be the status quo. Moreover, subjects exposed to the status quo may react in a variety of ways (e.g., resentment, depression) if they know that others have been exposed to some innovative intervention. In this instance, the status quo becomes a treatment in the conventional sense; it does something new to its subjects. ------- 208 A Guide to Evaluation Research Theory and Practice It is important to stress that questions about measurement quality apply not only to program outcomes such as "learning," but also to measures of the program (intervention) itself and to other factors that may be at work (e.g., a child's motivation to learn). 
For example, an experiment on the effects of income support payments on criminal recidivism considered the payments to be the intervention, an incomplete description of the total caring support that the experimenters gave to the released prisoners along with the payments.3 Measurement Error. Space limitations prevent a thorough discussion of the mea- surement issues of evaluation research. At a minimum, evaluation researchers should be aware of the critical distinction between two kinds of measurement errors: those that are systematic and those that are random. Measurement may be subject to bias, consisting of systematic disparities between a measure used and the "true" attribute that is being measured. This is at the heart of the perennial controversy over whether standardized IQ tests really measure "general intelligence" without cultural bias. Measures also can be flawed because of random error or "noise." Whether approached as an "errors in variables" problem as in econometric literature (e.g., Kmenta, 1971:309-22), or as a "latent variable" problem as in psychometric literature (Lord, 1980), or as the "underadjustment" problem in the evaluation literature (e.g., Campbell and Erlebacher, 1970), random error can lead to decidedly nonrandom distortions in evaluation results. The role of random measurement error is sometimes addressed through the concept of "reliability." Systematic errors lead a measurement device to produce biased readings, by either over- or underestimating. In contrast, random errors lead to readings that are variable but unbiased, just as likely to over- as to underestimate. Causality. Many evaluation questions concern causal relations, e.g., whether or not a proposed program encouraging people not to use wood-burning stoves on high-air- pollution days will cause reductions in air pollution. The literature on causality and causal inference is large and currently fraught with controversy (e.g., Pratt and Schlaifer, 1984; Holland, 1986; Holland and Rubin, 1988; Berk, 1988). Suffice it to say that by a "causal effect" we mean a comparison between the outcome (following an intervention) compared to what the outcome would have been had the intervention not been introduced. For example, the causal effect of a ban on diesel-powered automobiles might be the amount of nitrogen based pollutants in the air after banning diesel automobile engines compared to the amount had the ban not been put in place. From the definition of a causal effect, it should be apparent that in practice, causal effects cannot be directly observed. One cannot observe the amount of nitrogen based pollutants in the air simultaneously with and without the ban in place. Rather, causal effects must be inferred. Thus, one might try to estimate the causal effect of the ban by comparing air quality before and after the ban. Or one might try to estimate the causal effect of the 3 For example, experimenters often escorted subjects to their employment, checked to see that they picked up their payment checks, and provided advice on how to retain employ- ment. Although these additional treatments may not have affected the results, valid measurement of the treatment should have included these measures in addition to the payment. (See Rossi, Berk, and Lenihan, 1980.) ------- A Guide to Evaluation Research Theory and Practice 209 ban by comparing air quality in an area with the ban to the air quality in an area without the ban. 
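To make the counterfactual definition concrete, the brief sketch below (written in Python; every number is hypothetical and is not drawn from any study cited here) contrasts the causal effect of such a ban (the with-ban outcome compared with the outcome that would have occurred without the ban) with a simple before-and-after estimate; the assumptions such comparisons require are taken up in the discussion that follows.

# Toy illustration of the counterfactual definition of a causal effect.
# All values are hypothetical; the units are an arbitrary index of nitrogen-based pollutants.

pollution_with_ban = 42.0      # outcome observed after the diesel ban is in place
pollution_without_ban = 55.0   # outcome that would have occurred without the ban (never observable)

causal_effect = pollution_with_ban - pollution_without_ban   # -13.0: the change the ban itself produced

# A before/after comparison estimates this effect only if nothing else changed in the interval:
pollution_before_ban = 57.0    # earlier reading, which also reflects, say, colder weather
before_after_estimate = pollution_with_ban - pollution_before_ban   # -15.0, not -13.0

print(f"true causal effect: {causal_effect}")
print(f"before/after estimate: {before_after_estimate}")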
However, in the first case one must assume that no other changes have occurred that could affect air quality in the interval between the earlier and later observational periods. In the second case, one must assume that the two areas are otherwise identical with regard to all factors that could influence air quality. In short, the need to infer causal effects opens the door to inferential errors. In practice, therefore, whenever a causal relationship is proposed, alternative explanations must be addressed and presumably discarded. If such alternatives are not considered, one may be led to make spurious causal inferences; the causal relationship being proposed may not in fact exist. Sometimes this concern with spurious causation is addressed under the heading of "internal validity" (Cook and Campbell, 1979) and, as in the case of construct validity, is relevant regardless of the stage in a program's life history (assuming causal relationships are at issue). For example, anyone who claims that an educational TV program improved the knowledge of those who viewed it must also consider the alternative explanation that viewers were self-selected persons interested in the topic who would have picked up the same amount of information in some other way, if the program were not available.

The consideration of alternative causal explanations for the success of programs is an extremely important research design consideration (Heckman and Robb, 1985). In the wood-burning example, an observed change in air pollution after the program went into effect could be the result of milder weather, improved wood-burning equipment, or a rise in cordwood prices that led people to shift to other fuels, rather than changes produced by the program. In addition, programs that deal with humans are all more or less subject to problems of self-selection; often the persons who are most likely to be helped or who are already on the road to recovery are those most likely to participate in a program. Thus, vocational training offered to unemployed adults is likely to attract those who would be most apt to improve their employment situation in any event. Also, program operators sometimes choose the best among target populations to participate in programs, thereby assuring that such programs appear to be successful. In other cases, events unconnected with the program produce improvements that appear to be the result of the program; an improvement in employment for parents, for instance, may make it more likely that their adolescent children will stay in and complete their high school training. In any case, we will have more to say about causal inference later.4

Generalizability. Whatever the empirical conclusions resulting from evaluation research, it is necessary to consider how broadly one can generalize the findings in question; that is, are the findings relevant to other times, other subjects, similar programs, and other program sites?

4 To anticipate a bit, the evaluator has two sorts of tools at his/her disposal. First, the data may be collected in a manner that greatly simplifies causal inference. Experiments in which subjects are assigned at random to experimental and control conditions are a good example. Second, the data may be analyzed in a fashion that explicitly addresses a set of specified, alternative causal explanations. Analysis of covariance is a common example. A good rule of thumb, however, is that a strong data analysis will almost never overcome a weak research design.
-------

210 A Guide to Evaluation Research Theory and Practice

Sometimes such concerns are raised under the rubric of "external validity" (Cook and Campbell, 1979), and again, the question is germane to all program stages regardless of the evaluation method. Thus, even if a quantitative assessment of high school cholesterol education programs indicates that they do not change the eating patterns of high school students, this does not mean that adult education programs would be ineffective. Similarly, a descriptive account of why the cholesterol education program did not work for teenagers may or may not be generalized to apply to adult education programs. The high school cholesterol education example used here obviously is limited in generality because health educators know that teenagers are motivated by different things than adults. However, for other topics, limitations on generalization may not be as obvious. Standard questions that can be raised about most evaluations are whether the findings are applicable to other age groups, ethnic groups, cities, regions, agencies, or school systems besides those in which they were found. Or are the results specific only to the organizations in which the program was tested?

Another issue that arises is whether a program's results would be applicable to persons who are different in abilities or in socioeconomic background. For example, Sesame Street was found to be effective with respect to preschool children from lower socioeconomic families, but was more effective with children from middle-class families (Cook et al., 1975). Similarly, curricula that work well in junior colleges may not be appropriate for students in senior colleges. Programs that worked well with adults in their middle years may not be effective for the aged.

There is also the problem of generalizing over time. For example, Maynard and Murnane (1979) found that transfer payments provided by the Gary Income Maintenance Experiment apparently increased the reading scores of children from the experimental families. One possible explanation is that with income subsidies, parents (especially in single-parent families) were able to work less and therefore spend more time with their children. Even if this is true, it raises the question of whether similar effects would be found at present, when inflation is taking a smaller bite out of the purchasing power of households.

Finally, it is impossible to introduce precisely the same treatment(s) when studies are replicated or when programs move from the development to the demonstration stage. Hence, one is always faced with trying to generalize across treatments that are rarely identical. In summary, external validity surfaces as a function of the subjects of an evaluation, the setting, the historical period, and the treatment itself. Another way of describing this issue of generalization is to consider that programs vary in their "robustness"; that is, in their ability to produce the same results under varying circumstances, with different operators, and at different historical times. Clearly a "robust" program is highly desirable.

Chance. It is always important, whatever one's empirical assessments, that the role of chance be properly taken into account. When formal, quantitative findings are considered, this is sometimes addressed under the heading of "statistical conclusion validity" (Cook and Campbell, 1979), and the problem is whether tests for "statistical significance" have been undertaken properly.
For example, perhaps people who have viewed television programs about the risks incurred by excessive exposure to the sun appear subsequently to lower their sun exposure, when compared to persons who have not seen the television program in question. But no two groups are ever identical: The observed ------- A Guide to Evaluation Research Theory and Practice 211 differences in sun exposure may have resulted from chance factors having nothing to do with the television program. Unless the role of these chance factors is formally assessed, it is impossible to determine if the apparent program effects are real or illusory. Similar issues concerning the role of chance appear in non-quantitative work as well, although formal assessments of the role of chance are difficult to undertake in such studies. Nevertheless, it is important to ask whether the reported findings rest on observed behavioral patterns that occurred with sufficient frequency and regularity to warrant the conclusions that they are not simply the result of chance. Three types of factors play a role in producing apparent (chance) effects that are not "real." The first reflects sampling error and occurs whenever one is trying to make statements about some population of interest from observations gathered on a subset of that population. For example, one might be studying a sample of students from the population attending a particular school, or a sample of teachers from the population of teachers in a particular school system, or even a sample of schools from a population of schools within a city, county, or state. Yet although it is typically more economical to work with samples, the process of sampling necessarily introduces the prospect that any conclusions based on the sample may differ from conclusions that might have been reached had the full population been studied instead. Indeed, one can well imagine obtaining different results from different subsets of the population. Although any subset that is selected from a larger population for study purposes may be called a sample, some subsets may be worse than having no observations at all. The act of sampling must be accomplished according to rational selection procedures that guard against the introduction of selection bias. A class of such sampling procedures that yield unbiased samples are called "probability samples," in which every element in a population has a known chance of being selected (Sudman, 1976; Kish, 1965). Probability samples are difficult to execute and are often quite expensive, especially when dealing with populations that are difficult to locate. Yet there are clear advantages to such samples, as opposed to haphazard and potentially biased methods of selecting subjects, that probability samples are almost always to be preferred over less rational methods. (See Sudman [1976] for examples of relatively simple and inexpensive probability sampling designs.) Fortunately, when samples are drawn with probability procedures, disparities between a sample and a given population can only result from the "luck of the draw." With the proper use of statistical inference, one can place "confidence intervals" around estimates from probability samples, or ask whether a sample estimate differs in a statistically significant manner from some assumed population value. In the case of confidence intervals, one can obtain a formal assessment of how much "wiggle" there is likely to be in one's sample estimates. 
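As a concrete illustration of this "wiggle," the sketch below (Python; the sample size and count are hypothetical) places an approximate 95 percent confidence interval around a proportion estimated from a probability sample, using the familiar normal approximation.

import math

# Hypothetical probability sample: 400 students drawn at random from a school system,
# of whom 136 report the behavior of interest.
n = 400
successes = 136
p_hat = successes / n                              # sample proportion (0.34)

se = math.sqrt(p_hat * (1 - p_hat) / n)            # standard error of the sample proportion
margin = 1.96 * se                                 # half-width of an approximate 95 percent interval
lower, upper = p_hat - margin, p_hat + margin

print(f"estimate {p_hat:.2f}, approximate 95 percent confidence interval ({lower:.2f}, {upper:.2f})")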
In the case of statistical significance tests, one can reach a decision about whether a sample statistic (e.g., a mean reading score) differs from some assumed value in the population. For example, if the mean reading score from a random sample of students differs from some national norm, one can determine if the disparities represent statistically significant differences.

A second kind of chance factor stems from the process by which experimental subjects may be assigned to experimental and control groups. For example, it may turn out that the assignment process yields an experimental group that on the average contains brighter students than the control group.

-------

212 A Guide to Evaluation Research Theory and Practice

As suggested earlier, this may confound any genuine treatment effects with a priori differences between experimentals and controls; here the impact of some positive treatment such as self-paced instruction will be artificially enhanced because the experimentals were already performing better than the controls. Much as in the case of random sampling, in controlled experiments in which the assignment to treatment group or control is undertaken with probability procedures, the role of chance factors can be taken into account. In particular, it is possible to determine the likelihood that outcome differences between experimentals and controls are statistically significant. If the disparities are statistically significant, chance (through the assignment process) is eliminated as an explanation, and the evaluator can begin making substantive sense of the results. It is also possible to place confidence intervals around estimates of the treatment effect(s) indicating the likely range of the effects, given that any estimate is subject to sampling variation.

A third kind of chance factor has nothing to do with research design interventions undertaken by the researcher (i.e., random sampling or random assignment). Rather, it surfaces even if the total population of interest is studied and no assignment process or sampling is undertaken. In brief, one proceeds with the assumption that, whatever the program processes at work, other forces are also at work that will have some impact, though not systematically, on the outcomes of interest. Typically, these are viewed as a large number of small, random perturbations that on the average cancel one another. For example, performance on a reading test may be affected by a child's mood, the amount of sleep the previous night, the content of the morning's breakfast, a recent quarrel with a sibling, distractions in the room where the test is taken, anxiety about the test's consequences, and the like. While these each introduce small amounts of variation in a child's performance, their aggregate impact is taken to be zero on the average (i.e., their expected value is zero). Yet since the aggregate impact is only zero on the average, the performance of particular students on particular days will be altered. Thus, there will be chance variation in performance that needs to be taken into account. As before, one can apply tests of statistical inference or confidence intervals. One can still ask, for example, if some observed difference between experimentals and controls is larger than might be expected from these chance factors, and/or estimate the "wiggle" in experimental-control disparities.
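The sketch below (Python, with simulated data and a hypothetical treatment effect) pulls these ideas together for a randomized experiment: subjects are assigned at random to treatment and control groups, and the estimated treatment effect is reported together with a large-sample test statistic and an approximate 95 percent confidence interval.

import math
import random
import statistics

random.seed(1)

# Hypothetical randomized experiment: 200 subjects assigned at random,
# half to a treatment (e.g., self-paced instruction) and half to a control condition.
subject_ids = list(range(200))
random.shuffle(subject_ids)                        # random assignment
treated_ids, control_ids = subject_ids[:100], subject_ids[100:]

def test_score(is_treated):
    # Scores vary by chance; the (hypothetical) treatment adds 3 points on average.
    return random.gauss(70, 10) + (3 if is_treated else 0)

treated_scores = [test_score(True) for _ in treated_ids]
control_scores = [test_score(False) for _ in control_ids]

effect = statistics.mean(treated_scores) - statistics.mean(control_scores)
se = math.sqrt(statistics.variance(treated_scores) / len(treated_scores) +
               statistics.variance(control_scores) / len(control_scores))
z = effect / se                                    # large-sample test statistic
ci = (effect - 1.96 * se, effect + 1.96 * se)      # approximate 95 percent confidence interval

print(f"estimated effect {effect:.1f} points, z = {z:.2f}, "
      f"95 percent CI ({ci[0]:.1f}, {ci[1]:.1f})")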
In case it is not clear, statistical conclusion validity speaks to the quality of inferential methods applied and not to whether some result is statistically significant. Statistical conclusion validity may be high or low, independent of judgments about statistical significance. (For a more thorough discussion of these and other issues of statistical inference in evaluation research, and statistical inference in general, see Berk and Brewer, 1978; Barnett, 1982; Pollard, 1986.)

It is important to understand that the critical issues outlined above apply to all varieties of evaluation research, whether highly quantitative in approach or highly qualitative. In summary, evaluation research involves a number of key concepts, each corresponding to critical questions linked to evaluation design issues. Evaluations are concerned with policies and programs that are on the public agenda, suffused with political concerns, relevant to policies and programs that have clearly formulated goals, often obsessed with effectiveness issues, and designed to enhance outcome validity.

-------

A Guide to Evaluation Research Theory and Practice 213

The Best Possible Strategy. In the next sections, the general issues just raised will be addressed in more depth. However, before proceeding it is important to note that practical constraints may intervene in the real world of evaluation research, even when an ideal marriage is made between the evaluation questions posed and the empirical techniques employed. Questions of cost, timeliness, political feasibility, and other difficulties may prevent the ideal from being realized. This in turn will require the development of a second-best evaluation package (or even a third-best), more attuned to what is possible in practice. On the other hand, practical constraints do not in any way justify a dismissal of technical concerns; if anything, technical concerns become even more salient when less desirable evaluation procedures are employed.

Policy Formation and Program Design Issues

Policy Issues and Evaluation Research. Virtually all evaluation research begins with one or more policy questions in search of answers. Evaluation research may be conducted to answer questions that arise during the identification of policy issues, the formulation of policy responses to such issues, the design of programs, the improvement of programs, and the testing of the efficiency and effectiveness of programs that are in place or are being considered. Specific policy questions may be concerned with how widespread a social problem is, whether any program can be enacted that will ameliorate a problem, whether programs are effective, whether a program is producing enough benefits to justify its cost, and so on.

Fitting Strategy to Problem. A given evaluation problem may be tackled at levels varying in intensity and thoroughness. When exquisite precision is needed and ample resources are available, state-of-the-art evaluation procedures may be employed. When the occasion demands approximate answers and when resources are in short supply, "rough-and-ready" (and, usually, speedier) procedures can be used. Correspondingly, the answers supplied by evaluations vary in quality: the findings of some evaluations are more credible than others, but all genuine evaluations produce findings that are better than haphazard guesses. This does not mean that any evaluation can use any means available.
The principle that should be upheld in selecting evaluation procedures is the principle of "best possible," given available resources and constraints.5 Given the diversity of policy questions to be answered and the wide variations in available resources, it should not be surprising that there is no single best way to proceed. Evaluation research must draw on a variety of perspectives and procedures. Thus, approaches that might be useful for determining what activities were actually undertaken under some educational program, for instance, might not be appropriate when the time comes to determine whether the program was worth the money spent. Similarly, techniques that may be effective in documenting how a program is functioning on a day- to-day basis may prove inadequate for the task of assessing the program's ultimate impact. 5 This principal requires far more than lip service. It is all too common to hear in response to criticism of a slipshod evaluation the lame excuse that "it was the best that could be done under the circumstances" when in fact technically superior (and sometimes less wasteful) procedures easily could have been employed. ------- 214 A Guide to Evaluation Research Theory and Practice The choice among evaluation methods depends in the first place on the particular question posed; appropriate evaluation techniques must be linked explicitly to each of the policy questions posed. While this point may seem simple enough, it has been overlooked far too often, resulting in force- fits between an evaluator's preferred method and the particular questions at hand. Another result is an evaluation research literature padded with empty, sectarian debates between warring camps of "true believers." For example, there has been a long and somewhat tedious controversy about whether assessments of the impact of social programs are best undertaken with research designs in which subjects are randomly assigned to experimental and control groups or through theoretically derived causal models of how the program works. In fact, the two approaches are complementary and can be effectively wedded (e.g., Rossi, Berk, and Lenihan 1980; Heckman and Robb, 1985). Secondly, the choice among evaluation methods is conditioned by the resources available and by the degree to which precision is needed. It is probably overkill to devote more resources to an evaluation than to the program being evaluated.6 Nor does it make sense to plan an evaluation that will take several years to complete when the answers it will supply are needed within a few weeks. The evaluation effort must be tailored to fit the circumstances; that is, the need for precision in information and the amount of resources and time that are available. Finally, evaluations must be tailored to the degree of importance of the issue under scrutiny. At the one extreme, routine issues concerning potentially low impact programs probably do not deserve to be evaluated with any degree of care. For example, it would make very little substantive difference whether soft-steel paper clips were superior or inferior to plastic paper clips: Hence, it is not worthwhile to invest many (if any) resources toward evaluating their comparative merits.7 Similarly, it would make little sense to evaluate a media campaign involving one 30-second television spot broadcast over a small local station; we know in advance that the campaign would not be sufficiently strong to leave any appreciable residual effect. 
In contrast, policies dealing with central issues and programs that are very expensive deserve the most careful evaluation possible. A program designed to reduce exposure to the AIDS virus by saturating the media with messages deserves careful evaluation both because the issue is a critical one and because the program in question would require a major allocation of resources. 6 However, one must carefully judge what is at stake. For example, while the costs of an evaluation may loom larger compared to the costs of the particular program under consideration, the evaluation findings may have vital implications for many more pro- grams and for a larger program. In the context of the universe of programs potentially affected, the evaluation budget may be relatively small. 7 On the other hand, while they might perform similarly, they may have different environmental implications. Much would depend on the ways the two kinds of clips are manufactured and on what happens to them when they are thrown away. And, of course, the issues might be extremely salient to the competitive needs of paper clip manufacturers. ------- A Guide to Evaluation Research Theory and Practice 215 The Policy Contexts of Evaluation. To obtain a better understanding of the fit between evaluation questions and the requisite evaluation procedures, it is useful to distinguish between two broad evaluation contexts: 1) policy and program formation contexts, in which policy questions are being raised about the nature and amount of some identified problem, whether appropriate policy actions can be taken, and whether programs that may be proposed are appropriate and effective; and 2) existing policy and program contexts, in which the issues are whether appropriate policies are being pursued and whether existing programs are achieving their intended effects. Although these two broad contexts may be regarded as stages in a process that starts with the recognition of policy needs and ends with the installation and testing of programs designed to meet those policy needs, the unfolding of the policy process may bypass some evaluation activities. There are many examples of major programs that have had truncated policy formation stages, going straight from the drawing boards of the executive or legislature to full-scale operation. For example, Head Start and school lunch programs were started with minimum amounts of program testing beforehand. The issue of whether Head Start was or was not effective did not surface until some years after the program had been in place. Similarly, many programs apparently never get beyond the testing stage, either by being shown to be ineffective or politically troublesome (e.g., contract learning, Gramlich and Koshel, 1975) or because the policy issues to which they were addressed shifted in the meantime (e.g., the case of negative income tax proposals, Rossi and Lyall 1974). We do not mean to imply—by the organization of this section—that policymakers always ask each of the questions raised in the order that we addressed them. The questions are arranged from general to specific, but that is an order we have imposed and it is not intended to be a description of typical sequences. For example, research that uncovers the extent and depth of a social problem may spark the need for policy change, rather than vice versa, as may appear to be implied in this section. Policy Formation and Design Stage. 
Proposals for policy changes and new programs presumably arise as the result of dissatisfaction with existing policy, existing programs, or out of the realization that a problem exists for which a new policy and program may be an appropriate remedy. Policymakers and administrators need information that would make the policy and accompanying programs relevant to the problem and efficacious in providing at least some relief from the burdens imposed by the problem. It is important that the previous paragraph not be misunderstood. For example, we do not presuppose that the solutions sought by policymakers will solve the social problems in question as seen in some objective sense, but only that the problem as experienced and understood by the policymaker is to be addressed. Thus, from the perspective of policymakers, eradicating poverty may not be the goal so much as lowering the level of expressed concern with the problem of poverty, as experienced by the decisionmakers. It is also important to stress that defining a social problem is ultimately a political process whose outcomes do not simply flow from an assessment of available information. While it would be difficult to argue against providing the best possible data for potential areas of need, there is no necessary correspondence between patterns in those data and what eventually surfaces as a subject of concern. For example, in an analysis of pending legislation designed to reduce adolescent pregnancy, the General Accounting Office ------- 216 A Guide to Evaluation Research Theory and Practice (GAO, 1987) found that none of the legislation defined the problem as involving the fathers of the children in question. Every proposal addressed adolescent pregnancy as if it were an issue involving only young women. Another example concerns varying definitions of the problem of water pollution, each with different emphases placed on sources of pollution, counteracting technical solutions, and end-user solutions. (See also Berk and Rossi [1976] for a more thorough discussion of problem definition issues.) In principle and in practice, no useful distinction can be made between the formation of new policies and programs and the improvement of existing policies and programs. A proposed improvement is nothing but a proposed change. Correspondingly, the same evaluation procedures applicable to entirely new policies and programs are suitable for proposed changes to existing policies and programs. Therefore, the discussion that follows does not distinguish between them. Defining the Problem . A political or program issue is a social construction. That is, a condition that is defined as problematic thereby becomes a problem. Consequently, the beginning of a political issue consists of defining the problem in question. The preambles to proposed legislative actions usually recognize this principle by defining the conditions for which the legislation is designed as a remedy. A legislation program designed to address a particular problem is necessarily based on some definition or understanding of the issue involved. For example, two contending legislative proposals may address the issue of homeless persons, one identifying the homeless as needy persons who have no kin upon whom to be dependent, and the other defining homelessness as the lack of access to conventional shelter. The first definition centers attention primarily on the social isolation of potential clients and the second focuses on their housing arrange- ments. 
It is likely that the ameliorative actions that follow will be different as well. The first might emphasize a program to reconcile alienated persons with their relatives, while the second might propose a subsidized housing program. The two definitions lead to quite different proposed programs. To pursue another example, the presence of hazardous substances in water supplies may be defined either as a user problem or as a production problem. In the first instance, appropriate programs might educate users about how to best avoid contaminated water sources or, alternatively, how best to purify water before consumption. The second definition might lead to devising surveillance programs of potential polluters and the setting of sanctions for allowing pollution to take place. Note that these two definitions are not contradictory: rather, each highlights an aspect of the problem. The explication of definitions is, of course, not a task for which evaluators are uniquely trained. Lawyers and judges, textual analysts, and others are trained in laying open the logical structure and probing the inclusiveness and exclusiveness of definitions. Yet there is a special role that evaluators can play in this process by analyzing the implications of definitions for substantive concerns. It is clear that the two definitions of water pollution given above focus on slightly different (albeit overlapping) phenomena, but they also contain clues to the underlying factors that are considered to be driving those processes. Thus, judgments about definition issues may require substantive knowledge that evaluators often have (or can get easily). Especially critical in the explication of problem definitions is the relationship between what is popularly considered to be the problem and the implicit or explicit ------- A Guide to Evaluation Research Theory and Practice 217 definitions in the legislation addressing the problem. In this connection, the evaluator ordinarily would refer to legislative proceedings, including committee hearings and floor debates, journals of opinion, newspaper and magazine editorials, and other sources in which discussions of the problem may appear. The purpose of this review of sources is to examine how the problem has been formulated and to delineate as clearly as possible the set of alternatives that define the policy space for the issue in question. Certainly an important role for evaluators to play at this stage is to provide for policymakers' critiques of problem definitions inherent in the proposed policies and accompanying programs, and to propose alternative definitions that may be more ap- propriate. For example, an evaluator might point out that defining the teenage pregnancy problem as primarily one of illegitimate births ignores the large number of births that occur among married teenagers. Needs Assessment: Where Is the Problem and How Big Is It? The proper design of a public program and the projection of its costs requires accurate information on the density, distribution, and overall size of the problem in question. For example, in providing financial support for emergency shelters for homeless persons, it would make a significant difference if the total population of homeless is of a magnitude of 3.5 million or 350,000 (both estimates have been advanced). Whether the problem of homeless persons is located primarily in large central cities or can be found in equal amounts in both small and large places also would make an important difference in program design and planning. 
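To make the point about magnitude concrete, a purely illustrative calculation is sketched below in Python; the per-person cost is an invented figure used only for exposition and does not describe any actual program.

    # Purely illustrative: how a budget projection scales with the two
    # homeless-population estimates cited above. The per-person cost is an
    # invented figure used only for exposition.
    LOW_ESTIMATE = 350_000       # persons
    HIGH_ESTIMATE = 3_500_000    # persons
    COST_PER_PERSON = 2_500      # hypothetical annual cost per person, in dollars

    for label, size in [("low estimate", LOW_ESTIMATE), ("high estimate", HIGH_ESTIMATE)]:
        print(f"{label}: {size:,} persons -> projected annual cost ${size * COST_PER_PERSON:,}")
    # The two projections differ by a factor of ten; uncertainty about the
    # magnitude of the problem translates directly into uncertainty about the
    # scale of the program budget.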
An identified problem often is a complex mix of related conditions; planning requires information on all related factors. For example, the proportions of the homeless suffering from chronic mental illness, chronic alcoholism, or physical disabilities need to be known in order to design an appropriate mix of programs. It is much easier to identify and define a problem than to develop valid estimates of its density and distribution. For example, a handful of battered children may be enough to establish that a problem of child abuse exists. However, to know how much of a problem exists and where it is located—geographically and socioeconomically—involves obtaining detailed information about the population of abused children and its distribution throughout the political jurisdiction in question. Ordinarily, such exact knowledge is much more difficult to obtain. Through knowledge of the existing literature (consisting of government reports, published and unpublished studies, and limited distribution reports) and an understanding of which designs and methods lead to conclusive results, evaluation researchers are able to collate and assess whatever information exists on the issues in question. Note that equal emphasis is given to both collating and assessing: unevaluated information often can be as bad as no information at all. For some issues, existing data sources may be of sufficient quality to be used with confidence. For example, data that are routinely collected either by the Current Population Survey or the decennial Census often are accurate and trustworthy information sources to use. Likewise, data available in many of the statistical series routinely collected by federal agencies are often trustworthy.8

8 Unfortunately, there are exceptions. For example, it is widely acknowledged that the U.S. Census undercounts the number of blacks and Hispanics. For the nation as a whole, the undercount is relatively small and for most purposes can be ignored. However, for some jurisdictions with large populations of blacks and Hispanics, the undercount translates into substantial losses of federal funds (since many programs are tied to the size of particular populations). This has led to a lawsuit by the State of New York in which statistical adjustments for the undercount have been proposed (Ericksen and Kadane, 1985). In short, how good the data must be always depends on how those data will be used.

------- 218 A Guide to Evaluation Research Theory and Practice

But when data from other sources are used, it is always necessary to carefully examine how the data were collected. The assessment of data quality is another task for which evaluators are eminently qualified. A good rule of thumb is that existing data sources will provide contradictory estimates on any issue. But even chaos sometimes can be reduced to some order. Seemingly contradictory data collected by opposing stakeholders can be especially useful for needs assessment purposes. For example, both the Coalition Against Handguns and the National Rifle Association have sponsored sample surveys of the American population concerning approval or disapproval of gun control legislation. Although the reports issued by the Coalition and the NRA differed widely in their conclusions—one finding much popular support for more stringent gun control measures and the other finding the opposite—a close inspection of the data showed that many of the specific findings were nearly identical in the two surveys (Wright et al., 1983). Those findings upon which both surveys agreed substantially can be regarded with greater credibility. In many instances, there may be no existing information that can provide estimates of the extent and distribution of a problem. For example, it is likely that there are no sources of information about how households use pesticides or about the level of popular knowledge concerning how such substances can be safely used.
Any instance of household pesticide misuse identifies a problem, but how serious the problem is—in households with children, for example—may be unclear. It may or may not be the case that the problem is a lack of knowledge concerning the toxic properties of certain pesticides, or a lack of knowledge about alternatives used to control household or garden pests. Ordinarily, there are no sources from which information on such issues can be obtained. Under these circumstances, an evaluator may wish to undertake a special preliminary study to estimate the amount and distribution of household pesticide use and the level of popular knowledge concerning the toxic properties of household pesticides.

Estimating Problem Parameters. There are several ways of estimating a problem's parameters. Perhaps the easiest to undertake, but also the least reliable, is to rely on expert testimony. Most of the larger estimates of the size of the homeless population are essentially compilations of local experts' guesses of the numbers of homeless in their localities (see U.S. Conference of Mayors, 1987). Another source of information that can be reliable—but is often unavailable—is the records from organizations that provide services to the population in question. For example, the extent of drug abuse may be extrapolated from the records of persons treated in drug abuse clinics. To the extent that the drug-using community is fully covered by these clinics, such data may be quite accurate.9

9 It is also the case that if drug-abuse clinics did not cover all or most of the drug-abusing population, drug-abuse treatment programs may not be an issue. Hence, to the extent that the problem is being adequately handled by existing programs, data from such programs may be useful, but that is not the situation in which data are usually needed.

------- A Guide to Evaluation Research Theory and Practice 219

In many cases, it may be necessary to conduct research to assess the extent and amount of a problem. To illustrate, the Robert Wood Johnson Foundation and the Pew Memorial Trust were trying to plan a program for making medical care more accessible to homeless persons. Although there was an ample amount of evidence that serious medical conditions existed among the homeless population in urban centers, there was virtually no precise information on either the size of the homeless population or the extent to which medical problems existed in that population. Hence, the foundations funded a research project to devise technical advances in sample survey methods, in order to collect the missing information. The result was a research project that influenced most of the subsequent research on homelessness and has led to changes in plans for the 1990 Census that will make it possible to arrive at reasonable estimates of the homeless population on a national basis (Rossi, Fisher, and Willis, 1986). Needs assessment research is usually not as elaborate as the pilot research described above. In many cases, straightforward sample surveys can provide most of the necessary information.
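A minimal sketch, in Python and with invented figures, of the kind of estimate such a survey yields for a single question under the assumption of a simple random sample; the planning example that follows shows where estimates of this sort enter.

    # Illustrative only: estimating the share of households aware that a class of
    # household pesticides is hazardous, with a conventional 95% margin of error.
    # The counts are invented; an actual survey would also apply design weights.
    import math

    n = 1_200        # hypothetical completed interviews
    aware = 780      # hypothetical respondents classified as knowledgeable

    p = aware / n                                  # estimated proportion
    margin = 1.96 * math.sqrt(p * (1 - p) / n)     # 95% margin of error (normal approximation)

    print(f"Estimated share knowledgeable: {p:.1%} plus or minus {margin:.1%}")
    # Subgroup estimates (for example, households with young children) use the
    # same arithmetic on smaller samples, so their margins of error are wider.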
For example, in planning for educational campaigns to increase public understanding of the risks associated with hazardous substances, it would be necessary to have a good understanding of what the current level of public knowledge is and which population subgroups pose special problems. A national sample survey would provide the necessary information.10 The number of local needs assessments covering single municipalities, towns, or counties done every year must now be in the thousands. For example, the 1974 Community Mental Health legislation called for community mental health needs assessments to be undertaken periodically. Last year's McKinney Act mandating aid to the homeless calls for states and local communities to undertake needs assessments as the basis for planning programs for the homeless. Also, social impact statements, to be prepared in advance of large-scale alterations to the environment, often call for estimates of the numbers of persons or households to be affected or to be served. The quality of such local needs assessments varies widely but is most likely quite poor on the average. Especially difficult obstacles lie in the need to devise valid measurements of relatively subtle social problems (e.g., distrust of food additives, or mental health). For such problems, unusually high-quality surveying methods are needed, the resources for which are often simply lacking on the local level.

Qualitative Needs Assessment Approaches. It should be noted that the research associated with needs assessments can be as inexpensive as copying the relevant information from printed volumes of the U.S. Census, or as costly as several years of effort in designing, fielding, and analyzing a large-scale sample survey. Moreover, needs assessments do not have to be undertaken solely with quantitative techniques.

10 There are many national survey organizations that have the capability to plan, carry out, and analyze such surveys under contract. In addition, it is often possible to add questions to an existing national survey, thereby (possibly) reducing costs. It should be noted that for surveys of a given sample size, national surveys are slightly more expensive than local surveys.

------- 220 A Guide to Evaluation Research Theory and Practice

Qualitative research—ranging in complexity from interviewing a few people in group discussion sessions, as in the focus group approach, to the more elaborate ethnographic research employed by anthropologists—may also be instructive, especially in getting detailed knowledge of the specific nature of the needs in question. For example, the development of educational campaigns may be considerably aided by qualitative data on the structure of popular beliefs. What, for instance, are the tradeoffs people believe exist between the pleasures of cigarette smoking and the resulting health risks? On the other hand, when the time comes to assess the extent of a problem, there is usually no substitute for formal quantitative procedures. Stated a bit starkly, qualitative procedures are likely to be especially effective in determining the nature of the need. Quantitative procedures are, however, essential to determine the extent of the need. An especially attractive feature of qualitative approaches is that they appear inexpensive.
Certainly conducting three or four focus group sessions is less costly than conducting a sample survey. However, the information obtained from focus groups usually cannot be generalized accurately beyond the highly self-selected focus group participants. Although needs assessment research is ordinarily undertaken for the descriptive purpose of developing accurate estimates of the amounts and distribution of a given problem, such research also can yield some understanding of the problem's underlying mechanisms. For example, a search for information on how many high school students study a non-English language may reveal that many schools do not offer such courses; therefore, part of the problem is that there are not enough opportunities to learn foreign languages. As another example, the fact that many primary school children of low socioeconomic backgrounds appear to be tired and listless in class may be explained by a finding that many eat no breakfast. Carefully, sensitively conducted qualitative studies are particularly important for uncovering process information of this sort. Thus, ethnographic studies of disciplinary problems within high schools may suggest why some schools have fewer disciplinary problems than others, in addition to providing some indication of how widespread such disciplinary problems are. The findings on why schools differ might suggest useful ways in which new programs could be designed. Another example concerns how qualitative research on household energy-consumption uncovered the fact that few households had any information on the energy consumption characteristics of their appliances. Without knowing how they consumed energy, households could not develop efficient strategies for reducing consumption. Indeed, the history of ups and downs in public concern for social problems provides many examples of how qualitative studies (e.g., Lewis, 1965; Liebow, 1967; Riis, 1890; Carson, 1955), and sometimes novels (e.g., Sinclair, 1906; Steinbeck, 1939) have raised public consciousness about particular social problems. Sometimes the works in question are skillful combinations of the qualitative and quantitative information, as in the case of Harrington (1962), whose Other America contained much publicly available data inter- laced with graphic descriptions of the living conditions endured by the poor. Forecasting Needs. For program planning purposes, it is often important to be able to project current circumstances into the future. A problem that is serious at present, for instance, may be more or less serious years later. Yet forecasting future trends can be quite risky, especially as the time horizon lengthens. There are a number of technical and ------- A Guide to Evaluation Research Theory and Practice 221 practical difficulties, which derive in part from the necessary assumption that the future will be much like the past. For example, a projection of the number of persons aged 18 to 30 a decade later at first seems easy to construct; the number of persons of that age ten years hence is almost completely determined by the current age structure of the population. However, had demographers in central Africa made such a forecast 10 years ago, they would have been substantially off the mark. They would have failed to anticipate the tragic impact of the AIDS epidemic, which is most prevalent among young adults. 
Projections with longer time horizons would have been even more problematic because trends in fertility as well as mortality would have to have been included.11 Note that we are not arguing against forecasting. Rather, we are concerned by the uncritical acceptance of forecasts—acceptance of the information without a thorough examination of how the forecasts were produced. For example, examining the forecasting assumptions is a task that can vary considerably in complexity. For simple extrapolations of existing trends, the assumptions may be relatively few and easily ascertained. However, even if the assumptions are known, it is often unclear how to determine if the assumptions are reasonably met. For projections developed from multiple-equation, computer-based models, examining the assumptions may require the skills of an advanced programmer and the insight of a sophisticated statistician. All forecasts should be reported both as a point and an interval estimate. The former is typically a single "best" guess, while the latter is a range of values in which the true (future) value is likely to lie. Yet for a large number of forecasting models, it is not apparent how proper confidence intervals may be constructed.

Policy-Oriented Research: Can We Do Anything About a Problem? Diagnosis may be the first step on the road to treatment. The second step is to understand enough about the problem and its setting to devise appropriate remedies. That is, knowing a considerable amount about the distribution and extent of a problem does not automatically lead to solutions. In order to design programs one must call on two sorts of knowledge. First, one needs valid knowledge on the leverage points and interventions that are useful for changing the distribution and extent of a problem. Second, one needs to know—from a variety of sources—something about the institutional arrangements that are implicated so that workable policies and programs can be designed.12

11 There are a number of other problems that forecasters face. For example, suppose that a utility company wanted to forecast the demand for electricity 10 years in the future. Since there is obviously a strong relationship between the number of residential, industrial, and agricultural customers and the demand for electricity, knowing the numbers of each kind of customer would provide a basis for instructive forecasts. However, those numbers would have to be forecasted themselves, since the number of customers affects demand contemporaneously. These and other problems are discussed in a broad social science context by Berk and Cooley (1987).

12 This conception of policy-oriented research apparently causes considerable misunderstanding about the relationships between basic and applied social research. Policy-oriented research tries to learn how changes in policy can affect the phenomenon in question. In contrast, knowledge about the phenomenon per se (the province of basic disciplinary concerns) may have no ready links to what can be done about it. For example, a convincing study finding that violent criminals often were abused as children does not by itself lead to rehabilitation programs for the violent criminals or to concrete interventions into the homes of abused children. However, such a study might stimulate ideas for the kinds of policy-oriented research necessary to develop sensible responses. That is, basic research may provide general clues about where and how to intervene.

------- 222 A Guide to Evaluation Research Theory and Practice

For example, applied research in microeconomics has shown repeatedly that consumers typically will respond to price. All else held constant, they will generally buy less of a commodity if its price increases. This lesson can be applied to conservation of all sorts.
Yet it has been virtually impossible in many states to institute marginal cost pricing for water because of political opposition from agricultural users who are being subsidized by residential and industrial users (Berk et al., 1981). To take another illustration from water conservation, applied research in social psychology indicates that people who are likely to conserve believe that others drawing on the same resource are conserving as well. Yet it is unclear how water consumers who believe that other consumers are not conserving can be convinced that they are not alone in their support for conservation efforts. The only consumption data they usually see are their own (on their bill). One strategy employed by some water districts in California has been to enclose in each consumer bill a short newsletter reporting aggregate trends in consumption by different segments of the community (Berk et al., 1981). It should be emphasized that to construct a program likely to be adopted by an organization, one needs to know how to introduce new procedures that would be undertaken at an appropriate level of effort. Large-scale organizations—schools, factories, social agencies, and the like—are resistant to change, especially when the changes do not involve accommodations in reward systems. For example, an educational program that is likely to work provides positive incentives for school systems, particular schools, and individual teachers. Inadequate attention to program organization is one of the more frequent causes of program implementation failure. Mandating that a particular program be delivered by an agency that is insufficiently motivated, poorly prepared, or without personnel with the appropriate skills to do so is a sure recipe for degraded and weakened interventions. Indeed, sometimes programs are not delivered at all under such circumstances.

Developing Promising Ideas into Workable Programs. The act of transforming a promising idea into a workable program is essentially the practice of art rather than of science. Evaluation research has little to say about how to be creative although it may be useful as an aid to program development. For example, during the severe energy crisis of the late 1970s it became clear from needs assessment research that consumers had little specific knowledge on how their use of electrical appliances affected their energy consumption levels. Of course, nearly every consumer knew that keeping their refrigerator doors closed saved electricity and turning off their electrical burners when not in use would lower electrical consumption, but few knew that there are wide variations in the energy consumption characteristics of different refrigerators and electrical stoves. Needs assessment research also showed that most consumers were quite concerned about energy costs. In short, there was a considerable reservoir of motivation to adopt energy conservation measures and notable gaps in popular knowledge about how best to conserve.

------- A Guide to Evaluation Research Theory and Practice 223
Given these circumstances, there are a variety of programs that could be constructed to remedy the situation; for example, price changes that would reward consumers for shifting consumption away from the high-demand periods of the day, and educational programs urging consumers to lower their thermostat settings during the winter. Furthermore, within each of these broad categories of programs there are a variety of specific measures. Pricing schemes, for instance, might be built on marginal price, average price, increasing block pricing, or other similar ideas. The point is that developing program ideas is not the outcome of evaluation research, but of artful innovation that links what is known about a problem (the outcome of evaluation research) to what is known about how to bring about change. In contrast, evaluators should feel right at home with pilot studies of how different pricing mechanisms might be instituted and whether there is any evidence that they might work. For example, if consumers are to pay at a higher per-unit rate as the amount consumed increases (e.g., under increasing block pricing), some means must be found to allow consumers to monitor their electricity use in real time. Moreover, the delivery of that information needs to be studied. These are the kinds of tasks that allow evaluators to earn their keep. Likewise, evaluation skills are not relevant to the design of educational television programs. However, evaluators can make contributions at the development stage. Pilot testing (pretesting) television programs is a standard procedure in program development. Educational programs must demonstrate that they can get the intended audiences' attention, be understood, and produce a predisposition to act in a desired fashion. Pilot versions of new programs are often tested on small audiences whose responses are carefully monitored. The elements of a program that repel audiences, lead to misunderstanding, or lead to undesired behavior can be changed. The program can be finely tuned until pretest-audience responses are acceptable. Pretesting can be informal, with pretest audiences selected haphazardly, or involve more formal research programs. An example of fairly extensive informal pretesting is that conducted by the Children's Television Workshop, producers of the highly popular educational television program, "Sesame Street." The producers monitor volunteer pretest audiences of preschool children to measure the attention-getting abilities of its episodes. The producers watch how closely pretest audiences follow the action of the episode being tested. In addition, the audience is interviewed after each showing to ascertain whether or not the message of the program was understood clearly. The program's deficiencies are then rectified and the process is repeated until the program is acceptable to the producers. At the other extreme, the Lodge Program developed by Fairweather and his associates employed a highly formalized development testing procedure. The goal of the program was to return mental patients to life outside the hospital in a way that would reduce their chances of being rehospitalized. Drawing upon social science findings about the importance of informal group supports, Fairweather and his colleagues took two decades to develop a technique that could be used by most mental hospitals and was effective in lowering the patient return rates.
The development process consisted of a series of ------- 224 A Guide to Evaluation Research Theory and Practice randomized field experiments in which version after version of the program was tested until an effective version was achieved. Thorough pretesting during the development phase can increase the chances of developing a worthwhile program. However, it is one thing to have a program that works well with a test audience and quite another to have a program that will work well in practice. A "Sesame Street" episode that does well in a studio atmosphere has none of the competition for attention that exists in an ordinary living room. Indeed, an adult-oriented health information program, "Feeling Good," that was developed by the Children's Television Workshop (the producers of "Sesame Street") did well with pretest audiences but failed to achieve significant audience shares when aired on public TV stations during prime viewing hours. The test audiences in the studios liked the episodes they viewed, but the unconstrained audience preferred programs on other channels that were competing with "Feeling Good." The YOAA Problem. Moving from the development phase to the operational phase usually means moving responsibility from a developing organization that is highly committed to the program to an operating agency whose commitment level may be much lower. This has been called the "Can YOAA Do It?" problem: Can Your Ordinary American Agency carry out the program with fidelity? Often the YOAA problem has been identified as a problem of dealing with large- scale bureaucracies, a diagnosis that obscures the problem as much as illuminates it. The issue is whether an operating agency has the appropriately trained personnel, a sufficiently motivating reward system, and the resources to carry out the program at the desired level of fidelity. Asking an agency to perform an additional task when its current tasks are straining its resources, or to undertake a task for which its personnel are not trained, are both recipes for failure. Therefore, it is vital to study program implementation, and descriptive accounts may be especially valuable. For example, just a few field visits to high schools—which were supposed to have in operation a widely publicized program designed to raise the academic motivation levels of poor black children—revealed that the programs existed mainly on paper and in the public relations releases of the main sponsor (Murray, 1980). Similarly, careful qualitative visits to the sites of the celebrated Cities in Schools Project brought to light the fact that the projects, as implemented, fell far short of original designs and intentions (Murray, 1981). It is at this point that it may make sense to start up demonstration programs in which operating agencies attempt to implement the program. Demonstration programs can be viewed as another step in development in which attention is centered on the problems that operating agencies encounter in carrying out a program. A prime example is the "administrative experiment" (a misnomer since these demonstrations were not truly experiments) carried out in connection with the proposed housing voucher program. Ten municipalities were selected to work out procedures for administering housing voucher programs in their localities and to carry them out for a period of years. 
The demonstrations were closely monitored by researchers who carefully noted all the difficulties each of the ten cities encountered in administering their versions of the program (Struyk and Bendick, 1981).

------- A Guide to Evaluation Research Theory and Practice 225

Will Some Particular Program Work? The Effectiveness Issue. Once a program has been fine-tuned and its operational kinks ironed out through demonstrations, the problem remains of whether it will be effective. To this point, all one has managed to do is document that the program in question can be implemented with sufficient fidelity as a "prototype." It is important to realize that effectiveness goes far beyond implementation and revolves around whether a program produces the changes anticipated.13 Effectiveness is rarely obvious for at least two reasons. First, it is often difficult to distinguish program effects from other major forces affecting the outcome. We addressed this earlier under internal validity. Second, it is often difficult to distinguish program effects from chance variation, which as "noise" may mask any program impact. (We addressed this earlier under statistical conclusion validity.) Furthermore, both problems are exacerbated by interventions that are typically weak and for that reason unlikely to produce strong effects. The reasons most interventions are weak raise issues beyond the scope of this paper (Rossi, 1987). Nevertheless, among the most important explanations is that the social processes in which interventions are likely to be introduced usually are shaped by a large number of forces, while the programs introduced rarely address more than one of these forces. Nutritional behavior, for example, is affected by upbringing, ethnic background, disposable income, local availability of food products, information about nutritional issues, subjective estimates of the risks to health and well-being of the nutritional behavior in question, household composition, the nutritional practices of family members and peers, chemical dependencies, and many other influences. Yet programs intended to improve nutrition rarely target more than one of the many possible influences. To make matters worse, there appears to be no single developmental stage that, if interrupted, will improve nutritional practices effectively.14 There are many handles controlling eating habits but each handle can control only a small part of this complex behavior. When a particular program has been identified that appears to be sensible according to current basic knowledge in the field, and a reasonable working version has been developed, the next step is to determine whether it is effective enough to become part of an agency's ongoing responsibilities. It is at this point that we recommend the use of randomized controlled experiments in which the candidate programs are tested. Randomized experiments are desirable (some would say mandatory) because randomly allocating persons or other units (e.g., classes) to an experimental group (to which the tested program is administered) or to a control group (from whom the program is withheld) assures that all the factors that ordinarily affect the educational process in question are, on the average, distributed identically among those who receive the program and those who do not.15

13 Keep in mind that effectiveness may be relative or marginal, and also may take cost into account.

14 In contrast, consider diseases transmitted by insects (e.g., typhus, malaria, bubonic plague). If the insect hosts are destroyed, human infection is prevented. An effective way of eliminating hosts is also an effective way of eliminating the disease.
15 Randomization also means that the assumptions for routine significance tests are likely to be met.

------- 226 A Guide to Evaluation Research Theory and Practice

Therefore, randomization, on the average, eliminates causal processes that may be confounded with the intervention and enormously enhances internal validity. That is, the problem of spurious interpretations can be addressed quite effectively. We advocate the use of randomized experiments at this stage in program development because of their scientific merit. (For other assets of randomized experiments see Berk et al., 1985.) However, this commitment in no way undermines the complementary potential of more qualitative approaches such as ethnographic studies,16 particularly to document why a particular intervention succeeds or fails. For example, in designing educational campaigns involving workshops, qualitative studies can uncover those organizations whose sponsorship can be most easily obtained. Workshops held after working hours under the sponsorship of employers may appear to be an efficient strategy except that interviews with employees may uncover the fact that few would remain after hours for any purpose. Indeed, a program of proposed workshops to teach better health habits to persons at risk of coronary heart disease that was based on this strategy attracted no more than a handful of participants instead of the hundreds that had been planned for. Ordinarily, developmental experiments should be conducted on a relatively modest scale, and are most useful to policy needs when they test a set of alternative programs that are intended to achieve the same effects. For example, it would be useful for an experiment to test several ways of motivating people to have their homes tested for radon since the end result would be to provide information on the relative effectiveness of several equally attractive (a priori) methods. There are many good examples of field testing through randomized experiments of promising programs. The five income-maintenance experiments were devised to test the impact of negative income tax plans, under varying conditions, as substitutes for existing welfare programs (Kershaw and Fair, 1976; Rossi and Lyall, 1976; Robins, 1982; Hausman and Wise, 1985). The Department of Labor tested the extension of unemployment benefit coverage to prisoners released from state prisons in a small, randomized experiment conducted in Baltimore (Lenihan, 1976). Randomized experiments have also been used to test national health insurance plans and direct cash subsidies for housing to poor families. At issue in most of the randomized experiments was whether the proposed programs would produce the intended effects and whether undesirable side effects could be kept to a minimum. Thus the Department of Labor's LIFE experiment (ibid.) was designed to see whether released felons would be aided in adjusting to life outside prison through increased employment and lowered arrest rates. The most extended series of developmental experiments is that reported by Fairweather and Tornatzky (1977). These involved more than two decades of consistent refinement and retesting, resulting in an efficacious and replicable treatment that can be implemented in a variety of conditions.
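The logic of random assignment can also be sketched with simulated data. The short Python illustration below uses entirely invented values and does not describe any of the experiments cited; because assignment is random, the difference between group means estimates the program effect and departs from it only by chance variation.

    # Illustrative simulation of a randomized experiment with invented numbers.
    # Units are assigned to program or control at random; the difference in mean
    # outcomes then estimates the program's effect.
    import random

    random.seed(1)
    TRUE_EFFECT = 2.0                      # effect built into the simulation

    units = list(range(200))
    random.shuffle(units)                  # random assignment
    program_group = set(units[:100])

    outcomes = {}
    for unit in units:
        baseline = random.gauss(50, 10)    # everything else that shapes the outcome
        outcomes[unit] = baseline + (TRUE_EFFECT if unit in program_group else 0.0)

    def mean(group):
        return sum(outcomes[u] for u in group) / len(group)

    control_group = [u for u in units if u not in program_group]
    estimate = mean(program_group) - mean(control_group)
    print(f"Estimated effect: {estimate:.2f} (true effect set to {TRUE_EFFECT})")
    # A significance test would ask whether an estimate this large could
    # plausibly arise from chance variation alone.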
16 An ethnographic study proceeds by careful observation of persons as they function naturally in their environment. Such observations might require detailed interviewing, or simply living in that environment. The art of ethnography, especially as practiced by anthropologists, is a highly disciplined approach including linguistic skills as well as training in precise and accurate recording of behavior and speech of the persons or groups being observed.

------- A Guide to Evaluation Research Theory and Practice 227

Currently underway in three cities are several extensive tests designed to evaluate alternative ways to lower the incidence of heart disease through improving nutrition. In the environmental area, six alternative approaches to communicating information about radon have been tested in New York (Smith et al., 1987).

Practical Developmental Evaluation Approaches. If all of the research activities described in the preceding pages were undertaken for each and every proposed program or policy shift, the pace of change in American public programs would be appreciably slowed. One is forced to admire the devotion, care, and diligence of Fairweather and his colleagues, who spent two decades designing and testing an effective treatment for released mental patients. But it is instructive to note that when the Lodge approach finally had been perfected, psychopharmacological developments and the community mental health movement had so drastically changed the treatment of mental health patients that the Lodge approach had become largely irrelevant.17 While Fairweather and his associates labored carefully and at great length to perfect the Lodge approach, the content of policy space had shifted to highlight other concerns about the treatment of the mentally ill. Clearly, practical approaches to program development must take into account all the constraints on time and resources that are ordinarily confronted. Decades-long development efforts may be the "right" way, but the practical way must deliver the best possible information in a timely fashion. There are no hard and fast guidelines about how best to proceed, although a few general principles may be stated. In general, the greater the potential impact of the proposed program, the more carefully it should be evaluated before being implemented. This means that programs that promise to be costly, that may affect targets adversely, or that deal with the central gnawing problems of society, deserve the best possible evaluation. Minor programs, in which the loss to society of implementing an ineffective program is slight, demand less thorough treatment. It is probably the case that most prospective programs up for evaluation deal with relatively minor changes to existing programs and therefore deserve lighter prospective evaluations. Evaluation in support of development has been described above as a chronological list of procedures. However, that need not be the case. A set of experiments conducted simultaneously on several alternative programs can telescope the total amount of time necessary to arrive at a conclusion. Demonstrations of programs can be used for fine-tuning purposes. In addition, randomized experiments may be foregone when there are strong indications of effectiveness from nonexperimental evidence.

The Assessment of Ongoing Programs: Accountability Evaluation Once a program has been enacted and is functioning, one of the main questions to ask is whether or not the program is appropriately in place.
Here the issues are not so much whether or not the program is producing its intended effects, but whether the program is simply running in ways that are appropriate, and whether or not problems have arisen in the field that need to be solved. Programs often have to be fine-tuned in the first few years or so of operation. (Therefore, estimates of effectiveness should be made only when any necessary "shakedown period" is over.)

17 Fairweather's efforts were not totally in vain. The basic understanding gained concerning what is needed to sustain chronically mentally ill persons outside institutions has made important contributions to the treatment of deinstitutionalized former patients.

------- 228 A Guide to Evaluation Research Theory and Practice

Is the Program Reaching the Appropriate Beneficiaries? Assuring that the appropriate beneficiaries are covered by a program is often difficult. Sometimes a program is so poorly designed that it simply does not reach significant portions of the intended beneficiary population. For example, an educational program designed to reach intravenous drug users through community institutions such as churches and schools may simply miss its target population because they do not use the community institutions. A program to provide food subsidies to children who spend their days in child care facilities may fail to reach a large proportion of such children if the subsidy regulations exclude child care facilities that are serving fewer than five children. A very large proportion of children who are cared for during the day outside their own households are cared for by women who take a few children into their homes (Abt Associates, 1979). A thorough needs assessment of child care problems would have brought to light the fact that so large a proportion of child care was furnished by small-scale vendors, a fact that should have been taken into account in drawing up administrative regulations. However, the needs assessment might not have been thorough enough. In addition, patterns of the problem might change over time, sometimes in response to the existence of a program. For example, it is quite likely that the existence of shelters for battered women increases the demand for such shelters because the existence of alternatives to remaining in an oppressive living arrangement lowers the tolerance threshold of battered women. These examples show the need to review from time to time how many of the intended beneficiaries are being covered by a program. Another example concerns the labelling of consumer products. Labels that are printed in extremely fine print or that use professional jargon may satisfy agency regulations but may be ignored by most consumers. The labelling program in its implementation simply does not reach many of its intended beneficiaries. Experience with social programs over the past two decades has shown that there are few, if any, programs that achieve full coverage or near full coverage of intended beneficiaries, especially where coverage depends on positive actions on the part of beneficiaries. Thus, not all persons who are eligible for Social Security payments actually apply for them; estimates indicate that up to 15% of all eligible beneficiaries never apply. AFDC programs only reach about one-half of the families who are eligible. Some intended beneficiaries may not be reached because the facilities delivering the services are not accessible to them.
A single job training program for the entire state of Iowa that is located only in Dubuque does not exist, for all practical purposes, for those who live 50 or more miles from that city. There is another side to the coverage problem. Some programs cover and extend benefits to persons or organizations that were not intended to be served. Such unintended coverage may be impossible to avoid because of the ways in which the program is delivered. For example, although "Sesame Street" was designed primarily to reach disadvantaged children, it also turned out to be attractive to advantaged children and to many adults. There is no way to keep anyone from viewing a television program once broadcast (nor is it entirely desirable to do so in this case); hence, a successful TV program ------- A Guide to Evaluation Research Theory and Practice 229 designed to reach some specific group of children may reach not only them but also many others (Cook et al., 1975). Although the unintended viewers of "Sesame Street" are reached at no additional costs, there are times when unintended coverage may severely drain program resources. For example, while Congress may have wished to provide educational experiences to returning veterans through the GI Bill and its successors, it was not clear whether Congress had in mind the subsidization of the many new proprietary educational enterprises that came into being primarily to supply "vocational" education to eligible veterans. Or, in the case of the bilingual education programs, many primarily English- speaking children were found to be program beneficiaries, as some school systems discovered that the special bilingual classes were an excellent place to tuck away their trouble-making English- speaking students. Studies designed to measure coverage are similar in principle to those discussed under "Needs Assessment" studies earlier. In addition, overcoverage may be studied as a problem through program administrative records. However, undercoverage often involves commissioning special surveys. Program Integrity Research: Are Benefits Being Delivered? When program ser- vices depend heavily on the agencies' ability to recruit and train appropriate personnel, retrain existing personnel, or to undertake significant changes in standard operating procedures, it sometimes affects whether a program will manage to deliver to its target population that which had been intended. For many reasons the issue of program integrity often becomes a critical one that may require additional fine- tuning of legislation or administrative regulations. Several examples highlight the importance of this issue. Although informational pamphlets can be provided to medical personnel, pharmacies, and hospitals, the distribu- tion of such literature to patients is always problematic. It is difficult to motivate medical personnel to add distribution of these pamphlets to their existing duties. When the educational program requires special equipment, such as video and audio cassettes, delivery of the program can be even more difficult. In other cases, the right services are being delivered—but at a level that is too low to make a significant impact on beneficiaries. Thus, a supplementary reading instruction program that, on the average, results in only a mere additional 40 minutes per week of reading instruction, is hardly being delivered at sufficient strength and quantity to make any difference in reading progress. 
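At their simplest, the coverage and delivery questions discussed above reduce to a pair of ratios. The following Python sketch uses wholly hypothetical administrative figures to show the two calculations side by side.

    # Hypothetical figures only: two accountability checks, coverage and dosage.
    eligible_population = 40_000          # estimated from a needs assessment
    enrolled = 17_500                     # from program administrative records
    intended_minutes_per_week = 150       # service level called for in the design
    delivered_minutes_per_week = 40       # average observed in the field

    coverage_rate = enrolled / eligible_population
    dosage_ratio = delivered_minutes_per_week / intended_minutes_per_week

    print(f"Coverage: {coverage_rate:.0%} of intended beneficiaries enrolled")
    print(f"Dosage:   {dosage_ratio:.0%} of the intended service level delivered")
    # A program reaching well under half of its targets at roughly a quarter of
    # the intended strength cannot be expected to show its intended effects.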
We have used medical services as an illustration because they involve loosely coupled organizations in which the lines of authority are clear but weak because of the autonomy given to the professional personnel. Similar situations exist in almost all human service organizations, such as police departments, courts, welfare departments, and schools. In all such organizations it is difficult to control what is occurring at the point of service delivery because of the discretion and autonomy given to service workers. For example, exhorting or even requiring doctors and nurses to educate their patients about the proper use of pharmaceuticals is difficult. It is much easier to regulate the pharmaceutical industry, a more tightly coupled institutional complex. Evaluation research designed to measure what is being delivered may be accomplished easily or may involve problems of considerable complexity.

------- 230 A Guide to Evaluation Research Theory and Practice

Thus, it may be very easy to learn from hospitals how many persons are served each week by their various outpatient services, but very difficult to learn precisely what goes on in those contacts between medical personnel and patients. If one is interested in the kinds of information provided by physicians and nurses in outpatient care, direct observation would be necessary, and could be very expensive to implement on a large scale. For example, the second author was recently involved in an evaluation of efforts to teach literacy as a part of vocational training. Although only six classes were being studied, two full-time observers were needed to conduct classroom observation. In addition to the cost problem, there is the possibility that the presence of observers may alter the behavior of teachers and students. One of the best examples of systematic studies in difficult-to-observe situations is Reiss's (1971) study of police-citizen encounters. Research assistants were assigned to ride with police on patrol in order to systematically record each encounter between these police and members of the public. Reiss's study provides basic descriptive accounts of how such encounters are generated, how the behavior of citizens affected police responses, and so on. A recent example of an excellent implementation study is one conducted on the mental hospitals that serve the Chicago metropolitan area (Lewis et al., 1987). The main issue of the study was to describe how the legislation and rules in place since the 1970s concerning involuntary commitment to the mental hospitals were working out in practice. The researchers discovered that fewer than 1% of the patients admitted over a year's time were involuntarily committed. Observing the court procedures, it was found that many persons brought to the attention of the police because of their bizarre or aggressive behavior were offered the choice between voluntarily committing themselves for periods of up to 30 days or being involuntarily committed for 60 days or more. The courts and prosecutors offered these alternatives because involuntary commitment involves lengthy procedures that appreciably slow down the transactions that the court processes. Given the choice, most persons brought in under complaint by the police chose the more lenient alternative. These practices averted what might have been a very heavy burden on the courts and prosecutors. To fine-tune a program, it may not be necessary to proceed on a large scale.
For instance, it may not matter whether a particular implementation problem occurs frequently or infrequently, because it is not desirable that it occur at all. Thus, small-scale, qualitative observational studies may be the most fruitful for program fine-tuning. For example, if qualitative interviews with welfare recipients reveal any instances in which a husband and wife separated solely to retain or increase their benefit eligibility, one might judge that this was sufficient evidence that the program rules should be altered to remove the incentive for separation. Programs that depend heavily on personnel for delivery, involve complicated programs, or that call for individualized treatments for beneficiaries are especially likely candidates for careful and sensitive fine-tuning research. Each of these characteristics— either alone or in combination—can produce difficulties during implementation. (See Fairweather and Tornatzky 1977, for an outstanding example of the problematic nature of complicated individualized human services.) ------- A Guide to Evaluation Research Theory and Practice 231 Are Funds Being Used Appropriately? Fiscal Accountability. The accounting profession has been around considerably longer than has program evaluation; therefore, the procedures for determining whether or not program funds have been used responsibly and as intended are well established and not problematic. However, fiscal accountability measurements cannot substitute for the studies mentioned above. The fact that funds appear to be used as intended may not mean that program services are being delivered as intended, but only that proper documentation existed for funds expended. The conven- tional accounting categories used in a fiscal audit are ordinarily sufficient to detect, say, fraudulent expenditure patterns, but may be insufficiently sensitive to detect whether services are being delivered at the requisite level of substantive integrity. Indeed, it is worthy of note that the General Accounting Office has set up a separate section called the Program Evaluation Methodology Division. One of this Division's major roles is to instruct GAO personnel in appropriate evaluation procedures and to undertake evaluations of programs upon request by Congress. It is also important to keep in mind that the definition of costs under accounting principles differs from the definition of costs used by economists. For accountants, a cost reflects conventional bookkeeping entries such as out-of-pocket expenses, historical costs (i.e., the purchase price of an item), depreciation, and the like. Accountants focus on the value of current stocks of capital goods and inventories of products, coupled with cash flow concerns. When the question is whether program funds are being appropriately spent, the accountant's definition will suffice. However, economists stress "opportunity costs" defined in terms of what is given up when resources are allocated to particular purposes. More specifically, opportunity costs reflect the next best use to which the resources could be put. For example, the opportunity cost of raising teachers' salaries by 10% may be the necessity of foregoing the purchase of a new set of textbooks. While opportunity costs may not be especially important from a cost-accounting point of view, they become critical when cost- effectiveness or benefit-cost analyses of programs are undertaken. We will have more to say about these issues later. 
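The distinction between accounting costs and opportunity costs matters most when two uses of the same resources are compared. The following Python sketch of such a cost-effectiveness comparison is wholly hypothetical; every program name and figure is invented for illustration.

    # Hypothetical comparison of two programs pursuing the same outcome.
    programs = {
        # name: (total annual cost in dollars, households reached with the message)
        "Program A (mass media)": (400_000, 16_000),
        "Program B (workshops)":  (250_000, 5_000),
    }

    for name, (cost, reached) in programs.items():
        print(f"{name}: ${cost / reached:,.0f} per household reached")
    # A fiscal audit could certify that both budgets were spent as documented;
    # only a comparison like this speaks to which use of the resources buys more
    # of the intended outcome, that is, to the opportunity cost of choosing one.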
Program Assessment Evaluation The evaluation tasks discussed under accountability studies are directed mainly toward how well a program is running. Whether or not a program is effective is a different question, one to which answers are not easily provided. Essentially, the question is whether or not a program achieves its goals over and above what would be expected to happen without the program. Many evaluators consider the effectiveness question to be quintessential evaluation. There is some justification for this position because effectiveness assessment is certainly more difficult to accomplish, requiring higher levels of skills and ingenuity than any of the previously discussed evaluation activities. However, there is no justification for interpreting every evaluation task as calling for an effectiveness assessment. Apparently, some evaluators have done this in the past, aided in their misinterpretation by imprecise requests for help from policymakers and administrators. Evaluative information on implementation and coverage can often suffice. Nevertheless, in the final analysis, a program that has been successfully placed might still be ineffective. Estimating a program's degree of effectiveness is the main task to be described in this section. Can Effectiveness Be Estimated? The Evaluability Question. A program that has gone through the stages described earlier in this chapter should provide few obstacles to evaluation for effectiveness in accomplishing its goals. However, many human-service programs present problems for effectiveness studies because one or more of several criteria for evaluation are absent. Perhaps the most important criterion, and one that is frequently not met, is the presence of well-formulated goals or objectives for the program. For example, a program that is designed to raise the level of learning among certain groups of school children through the provision of per capita payments to schools for that purpose cannot be evaluated for its effectiveness without further specification of its goals. "Raising the level of learning" as a goal must be defined to indicate what is meant by "levels" and the kinds of learning achievements that are deemed relevant. Goals can often be clarified by helping program personnel articulate them. This step must be accomplished before proceeding with an effectiveness evaluation. A second criterion is that the program in question be well specified. Thus, a program that is designed to make health education agencies more effective by encouraging innovations cannot be evaluated for effectiveness: the goals are not well specified, and neither are the means for reaching them. Innovation as a means of reaching a goal is not a method, but a way of proceeding. Anything new is an innovation; hence, such a program may encourage the temporary adoption of a wide variety of specific techniques and is likely to vary widely from site to site. Third, a program can be evaluated from an effectiveness point of view only if it is possible to estimate what the expected state of the targeted recipients would be in the absence of the program. As we will discuss below, the critical hurdle in effectiveness studies is to make comparisons between persons who experienced a program and those who did not. Hence, a program that is universal in its coverage and has been going on for a long period of time cannot be evaluated for effectiveness.
For example, we cannot evaluate the effectiveness of the public school systems in the United States because it is impossible to make observations about American cities, towns, counties, and states that do not have (or recently have not had) public school systems. Finally, effectiveness evaluations are the most difficult evaluation tasks undertaken by evaluators, requiring the most highly trained personnel and often considerable sums of money for data collection and analysis. Such evaluations should not be undertaken unless sufficient resources and appropriately trained professionals are available to undertake the evaluations at the appropriate level. Legislatures and administrators have often mistakenly requested effectiveness evaluations by agencies that are not prepared to undertake them, and assumed that the costs of the evaluations would be slight (Raizen and Rossi, 1981). Unfortunately, there are no hard and fast rules about how much an effectiveness evaluation should cost or about how much skill may be needed for a given task. Effectiveness evaluability is discussed here because we believe that evaluators are often asked to undertake tasks that are impossible or nearly impossible. For example, the second author was recently asked to design an evaluation of a prosecutorial effort in a particular county to increase the likelihood that serious drug offenders would be sanctioned severely and swiftly. One of the evaluation outcomes was citizens' fear of crime; presumably swift and severe sanctions would bring down the crime rate, at least for drug-related offenses. Unfortunately, the evaluation was being designed after the program began; therefore, no pretest of citizen attitudes was possible. Without a pretest measurement, it is simply impossible to tell whether the program made any difference. As emphasized earlier in this paper, there is no substitute for planning evaluations during the program design phase. Techniques have been developed (Wholey, 1977) to determine whether a program is evaluable in the senses discussed above. Decisionmakers are well advised to commission such studies as a first step rather than to assume that all programs can be evaluated. Evaluability assessments essentially determine whether program goals are sufficiently well articulated, whether the program is uniform enough throughout the agency in question to be treated as a single program, and whether the evaluation results are going to reach the attention of decisionmakers. Finally, it is worth mentioning that questions of evaluability have in the past been used to justify "goal-free" evaluation methods (e.g., Scriven, 1972; Deutscher, 1977). The goal-free advocates have contended that since many of a program's aims evolve over time, the "hypothetico-deductive" approach to impact assessment (Heilman, 1980) is at best incomplete and at worst misleading. In our view, impact assessment necessarily requires some set of program goals, although whether they are stated in advance and/or evolve over time does have important implications for one's research procedures (Chen and Rossi, 1980). In particular, evolving goals require far more flexible research designs (and researchers). In other words, there cannot be such a thing as a "goal-free" impact assessment. At the same time, we have stressed above that there are other important dimensions to the evaluation enterprise in which goals are far less central.
For example, a sensitive monitoring of program activities can proceed productively without any consideration of ultimate goals. Thus, goal-free evaluation approaches can be extremely useful as long as the questions they address are clearly understood. Did the Program Work? The Effectiveness Question. As discussed above, any assessment of whether or not a program was successful assumes that what the program was supposed to accomplish is known. For a variety of reasons, the legislation establishing programs often sets relatively vague objectives for the program, making it necessary to develop specific goals during the design phase. The goals for such general programs may be developed by program administrators through consideration of social science theory, past research, and/or studies of the problem that the program is supposed to ameliorate. However the goals are established, the important point is that it is not possible to determine whether a program was successful without developing a limited and specific set of criteria for establishing the condition of "having worked." For example, it would not have been possible to assess whether "Sesame Street" worked without having decided that its goals were to foster reading and number-handling skills. Whether these goals existed before the program was designed or whether they emerged after the program was in operation is less important for our purposes than the fact that such specific goals existed at the time of evaluation. By the same token, a public education campaign intended to raise public consciousness about environmental hazards—but also to have the contradictory goals of reassuring the public and making them worried about such hazards—probably should not be evaluated until these contradictions are resolved. Programs rarely succeed or fail in absolute terms. Success or failure is always relative to some benchmark. Hence, an answer to "Did the program work?" requires considering "compared to what?" Appropriate comparisons can be made in at least three dimensions: 1) comparisons with different subjects, 2) comparisons with different settings, and 3) comparisons with different times. In the first instance, one might compare different sets of persons, varying the setting and the times of comparison. In the second instance, one might compare the performance of the same set of persons in different settings—for example, at home and at work. In the third instance, one might compare the same students in the same setting, but at different points in time. Because everyone is familiar with the different levels of aggregation involved in school settings—individual students, classes, and schools—and the time structure of schools—class periods, terms, and academic years—we will use examples in which students, classes, and classroom periods figure strongly. However, it is important to keep in mind that the concepts being illustrated are generally applicable; for example, in the adult population, we can distinguish individuals, households, neighborhoods, and cities for different levels of aggregation and life cycle stages for the time periods of adult life.18 As Figure 1 indicates, it is also possible to mix these three fundamental dimensions to develop a wide variety of comparison groups.19 For example, comparison group C2 varies both the subjects and the setting although the time is the same. Comparison group C6 varies the subjects, the setting, and the time.
However, with each added dimension by which a comparison group differs from the experimental group, the validity of the resulting effectiveness estimates necessarily decreases.

Figure 1. A Typology of Comparison Groups

                          Same Subjects                  Different Subjects
                    Same Setting   Different Setting   Same Setting   Different Setting
  Same Time             xx*             xx*                C1              C2
  Different Time        C3              C4                 C5              C6

  * Although logically possible, these two boxes imply comparison groups that are not sensible with human subjects.

18 The convenience of using schools lies in the typical school organization, which assigns students to classes and classrooms, and instruction to periods. Adults are sometimes found outside of households, neighborhoods often do not have distinct boundaries, and human life cycle stages have only fuzzy boundaries.

19 We have used the term "comparison group" as a general term to be distinguished from the term "control group." Control groups are comparison groups that have been constructed by random assignment.

For example, the use of comparison group C4 (different setting and different time period) requires that the assessment of program effectiveness simultaneously take into account possible confounding factors such as differences in student background and motivation, or the "reactive" potential of different classroom environments. This in turn requires either an extensive data collection effort to obtain measures of these confounding factors coupled with the application of appropriate statistical adjustments (e.g., multiple regression analysis), or the use of randomization and, thus, true control groups. Of course, randomization will, on the average, eliminate confounding influences in the estimation of impact. For analytic simplicity alone, it is easy to see why so many expositions of impact assessment strongly favor research designs based on random assignment. In addition, it should be emphasized that appropriate statistical adjustments (in the absence of randomization) through multivariate statistical techniques require a number of assumptions that are almost impossible to meet fully in practice.20 For example, it is essential that measures of all confounding influences are included in a formal model of the program's impact, that their mathematical relationship to the outcome is properly specified (e.g., a linear additive form versus a multiplicative form), and that the confounding influences are measured without error. Should any of these requirements be violated, one risks serious bias in any estimates of program impact. At the same time, random assignment is often impractical or even impossible. Furthermore, even when random assignment is feasible, its advantages rest on randomly assigning a relatively large number of subjects. To randomly assign only two schools to the experimental group and two schools to the control group, for example, will not produce, on the average, equivalence between experimentals and controls.21 Consequently, one is often forced to attempt statistical adjustments for initial differences between experimental and comparison subjects. Whether or not such adjustments succeed in performing their function is always questionable. The use of multivariate statistical adjustments raises a host of questions that cannot be addressed in detail here.
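The adjustment problem just described can be made concrete with a small simulation. The sketch below (Python, with simulated data; the variable names and numbers are illustrative and not drawn from the text) lets prior achievement influence both program participation and the outcome, so the naive participant-versus-nonparticipant difference is biased, while a regression that includes the confounder recovers the effect, but only because the confounder is measured and the assumed linear additive form happens to be correct.

    import numpy as np

    rng = np.random.default_rng(0)
    n = 2000

    prior = rng.normal(0.0, 1.0, n)                  # confounder: prior achievement
    p_treat = 1.0 / (1.0 + np.exp(-1.5 * prior))     # higher prior -> more likely to participate
    treated = rng.random(n) < p_treat
    true_effect = 0.30
    outcome = 0.8 * prior + true_effect * treated + rng.normal(0.0, 1.0, n)

    # Naive comparison of participants and nonparticipants (biased upward here).
    naive = outcome[treated].mean() - outcome[~treated].mean()

    # Regression adjustment: outcome ~ intercept + treatment + prior achievement.
    X = np.column_stack([np.ones(n), treated.astype(float), prior])
    beta, *_ = np.linalg.lstsq(X, outcome, rcond=None)

    print(f"true effect:       {true_effect:.2f}")
    print(f"naive difference:  {naive:.2f}")
    print(f"adjusted estimate: {beta[1]:.2f}")

If the confounder were omitted or badly measured, the adjusted estimate would inherit the same bias as the naive one, which is the point the surrounding discussion makes about unmet assumptions.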
Suffice it to say that there is a growing consensus among statisticians that various social scientists have routinely pushed statistical procedures well beyond where they are designed to go.22 However, as a general rule, multivariate adjust- ments are justifiable to the extent that appropriate measures are used for the adjustment, and that such measures are highly reliable and valid—criteria that are not easily satisfied. To assess the usefulness of impact evaluations not resting on random assignment, consider a recent evaluation (Robertson, 1980) of the effectiveness of driver education programs in reducing accidents among 16 to 18-year-olds. The evaluator took advantage of the fact that the Connecticut legislature decided not to subsidize such programs within local school systems. In response to this, some school districts dropped driver education 20 There are a number of nonrandomized designs that yield effectiveness estimates of high validity without random assignment (Cook and Campbell, 1979). One of the strongest is the regression- discontinuity design, which, under very modest assumptions, guarantees unbiased estimates of treatment effects (Berk and Rauma, 1983). A discussion of such a "quasi-experimental" design is beyond the scope of this paper, but it is an important option when true experiments cannot be conducted. In general, the better randomized designs cannot be used because the conditions for their proper use are not often met. 21 Since classes are randomized, it is necessary to have relatively large numbers of classes randomly allocated to the experimental and control conditions to be assured that the two sets of classrooms are tending to equivalence. 22 See, for example, the Summer 1987 issue of The Journal of Education Statistics. Sta- tistical procedures have far too often been applied to data that are not even remotely appropriate, using models that have virtually no convincing justification. Coming under particular criticism is the use of structural equation models (especially with latent variables) which regularly outstrip social science data and theory. At this juncture, perhaps the best advice is to keep one's statistical analyses simple and as close to the data as possible. For example, multivariate matching, when feasible, may be superior to statistical adjustment (often based on techniques such as multiple regression) because matching assumes no functional form between the explanatory/control variable and the outcome. ------- A Guide to Evaluation Research Theory and Practice 237 from their high school curriculum and others retained it. Two sets of comparisons were possible: accident rates for persons of the appropriate age range in the districts that dropped the program, computed before and after the program was dropped; and accident rates for the same age groups in the districts that retained driver education were compared with the accident rates in districts that dropped the driver education program. It was found that the accident rates dropped significantly in those districts that dropped the program, a finding that might lead one to interpret that the program increased accidents because young people were enticed to obtain licenses earlier than otherwise. 
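As an illustration only, the sketch below works through the two comparisons just described with invented accident rates (these are not Robertson's figures). It also shows a simple difference-in-differences contrast, one common way of combining a before-and-after change with a comparison across districts.

    # Hypothetical accident rates per 1,000 licensed 16-18-year-olds.
    dropped = {"before": 42.0, "after": 33.0}    # districts that dropped driver education
    retained = {"before": 41.0, "after": 40.0}   # districts that retained it

    before_after_change = dropped["after"] - dropped["before"]      # comparison 1
    cross_district_gap = dropped["after"] - retained["after"]       # comparison 2
    diff_in_diff = (dropped["after"] - dropped["before"]) - (
        retained["after"] - retained["before"]
    )

    print(f"Change in dropping districts:           {before_after_change:+.1f}")
    print(f"Dropping vs. retaining districts after: {cross_district_gap:+.1f}")
    print(f"Difference-in-differences estimate:     {diff_in_diff:+.1f}")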
The use of non-randomized comparison groups is justified in this research because there was some knowledge about the selection process involved in some school boards dropping the program; in most cases school boards did so on the basis of financial considerations rather than because the program was successful or unsuccessful. It is sometimes possible to either solve or partially bypass comparison group problems by resorting to some set of external criteria as a baseline. For example, it is common in studies of desegregation or affirmative action programs to apply various measures of equity as a"comparison group" (Baldus and Cole, 1977). Thus, an assessment of whether schools in black neighborhoods are being funded at levels comparable to schools in white neighborhoods might apply the criterion that disparities in excess of plus or minus 5% in expenditures per pupil indicate inequality and, hence, failure (Berk and Hartman, 1972). However, the use of such external baselines by themselves still leaves open the question of causal inference. It may be difficult to determine if the program itself or some other set of factors produced the observed relationship between outcomes of interest and the external measurement. It is also important to understand that distinguishing between success and failure is not a clear-cut decision because there are usually degrees of success and failure. While decisionmakers may have to make binary decisions, for example, to fund or not to fund, the evidence provided on effectiveness usually consists of statements of degree that have to be translated into binary terms by the decisionmakers. Thus, a program that succeeds in raising the average level of reading by half a year more than one would ordinarily expect—not an inconsiderable gain—may be less successful than one that has effective- ness estimates of a full year. This quantitative difference has to be translated into a qualitative difference when the decision to fund one rather than the other program comes into question. At this point, other considerations may surface, including costs, potential negative effects, public acceptability, and so on. In short, passing a statistical significance test does not necessarily mean that a program's effects are substantively significant. Designs Frequently Used For Estimating Effectiveness. The preceding discussion of comparison group strategies has, of necessity, been couched in relatively abstract terms. The actual practice of choosing among such strategies leads to a large variety of practical research designs; a typology of research designs commonly used for assessing the effectiveness of programs is shown in Figure 2. There are two main bases for the typology: 1) how the comparison and treatment groups are constituted, and 2) whether or not the comparison groups are reflexive (i.e., involve comparisons of the subjects with them- selves). The data-collection strategies usually associated with each research design also are shown. The last column on the right indicates whether the research design is applicable to full-coverage programs or to partial-coverage programs. 
Figure 2. A Typology of Research Designs for Impact Assessment

I: "True" or randomized experiments
   Intervention assignment to targets: researcher-controlled random assignment
   Type of controls: randomization, often with statistical controls
   Outcome data collection points: minimum = after intervention; usually before and after, often many measures during intervention
   Applicability: ONLY partial-coverage programs

II: Regression discontinuity
   Intervention assignment to targets: controlled, biased, but known selection (b)
   Type of controls: statistical controls modeling the known selection bias
   Outcome data collection points: minimum = before and after intervention
   Applicability: partial-coverage programs

III: Time series
   Intervention assignment to targets: uncontrolled selection
   Type of controls: reflexive
   Outcome data collection points: many measures before and after intervention
   Applicability: partial- and full-coverage programs

IV: Quasi-experiments with non-random assignment
   Intervention assignment to targets: uncontrolled selection (a); non-random assignment
   Type of controls: constructed, statistical, and/or generic (c)
   Outcome data collection points: minimum = after intervention; usually before and after, often many measures during intervention
   Applicability: ONLY partial-coverage programs

V: Panel studies
   Intervention assignment to targets: uncontrolled selection
   Type of controls: reflexive
   Outcome data collection points: more than two measures during intervention
   Applicability: partial- and full-coverage programs

VI: Before-and-after studies
   Intervention assignment to targets: uncontrolled selection
   Type of controls: reflexive
   Outcome data collection points: minimum = before and after intervention
   Applicability: partial- and full-coverage programs

VII: Retrospective before-and-after studies
   Intervention assignment to targets: uncontrolled selection
   Type of controls: retrospective reflexive
   Outcome data collection points: after intervention, with retrospective measures of the before state
   Applicability: partial- and full-coverage programs

VIII: Cross-sectional surveys
   Intervention assignment to targets: uncontrolled selection
   Type of controls: statistical
   Outcome data collection points: after intervention only
   Applicability: partial-coverage programs

IX: Judgmental assessments
   Intervention assignment to targets: uncontrolled selection
   Type of controls: shadow controls
   Outcome data collection points: after intervention only
   Applicability: partial- and full-coverage programs

(a) In a few quasi-experiments, the control over who will receive the treatment is exercised by the researcher.
(b) The selection process must be clearly stated and faithfully carried through.
(c) Generic controls are known standards, such as average IQ or reading skills of the population.

As indicated earlier, ongoing programs intended to cover all of the targeted subjects present special difficulties; in general, only reflexive controls are applicable. It is not possible to go into detail here on how each of the designs can be implemented appropriately. They are ranked from top to bottom roughly in the order of their ability to produce unbiased effectiveness estimates. Although the more powerful research designs are generally more expensive, there are notable exceptions including time-series designs.23 The several common approaches to establishing comparison groups are sketched below:

Randomized Comparisons: Targets are randomly divided into an experimental group, to whom the intervention is administered, and randomized controls, from whom the intervention is withheld.

Constructed Comparisons: Targets to whom the intervention is given are matched with an equivalent group—constructed comparison groups—from whom the intervention is withheld.

Statistical Comparisons: Participant and nonparticipant targets are compared, holding differences between participants and nonparticipants statistically constant.

Reflexive Comparisons: Targets who receive the intervention are compared with themselves, as measured before the intervention.
Repeated Measures Reflexive Comparisons: A special case of reflexive controls in which targets are observed repeatedly over time. Also called panel studies.

Time-Series Reflexive Comparisons: A special case of reflexive controls in which rates of occurrence of some events are compared before and after the start of an intervention.

Generic Comparisons: Intervention effects among targets are compared with established norms of typical changes occurring in the target population.

Shadow Comparisons: Targets who receive the intervention are compared with the judgments of experts, program administrators, and/or participants about what changes are to be ordinarily expected for the target population.

The most severe restriction on strategy choice is whether or not the intervention in question is being delivered to all (or virtually all) members of a target population. For programs with total coverage, as in the case of long-standing, ongoing, fully funded programs, it is not usually possible to identify a group that is not receiving the intervention and that is essentially comparable with the subjects who are beneficiaries. In such circumstances, the main strategy available is the use of reflexive comparisons. In contrast, interventions that are to be tested on a demonstration basis ordinarily will not be delivered to all of the target population. Hence, in the start-up phase new programs are, by definition, programs with partial coverage. In all likelihood, no program has ever achieved total coverage of its intended target population. Even in the best programs, there are some persons who refuse to participate, others who are not aware that they can participate, and still others who are declared ineligible on technicalities. Nevertheless, many programs achieve almost full coverage. The Social Security Administration's retirement payments, for example, reach most of the eligible portions of the population. Fortunately for our purposes, there are enough programs with full coverage that are also not uniform over time or over localities. These differences, over time and across administrative subdivisions, provide the evaluator with some limited opportunities to assess the effects of variations in the program. Thus, one might not be able to assess what the net impact of elementary schooling is (as compared to no schooling at all) but one can assess the differential impact of various kinds of schools and of changes in schools over time. These variations in ongoing, established programs occur in a variety of ways; policies change over time along with their accompanying programs. A program's administrators may also institute modifications in order to meet some new condition or to make administration easier. Thus, from time to time, Social Security benefits have been increased to take into account new conditions or to add new services (e.g., Medicare). Similarly, sufficient local autonomy may be given to states and local governments so that a program (e.g., Aid to Families with Dependent Children) may vary somewhat from place to place.

23 Time-series designs are typically possible when some agency has collected time-series data over some lengthy period. Typical time series include stock market prices, unemployment measures, crime rates, and the like. If the full cost of collecting the series is included, time-series designs would be among the more expensive.
With proper precautions, such "natural variation" may provide a leverage point for the estimation of program effects. For partial-coverage programs, a larger variety of strategies are available. If the program is under the control of the evaluator (as may be the case in new or prospective programs), the ideal solution is to use randomized comparisons. A set of potential target subjects, representative of those who might be served if the program goes into effect, are selected in some unbiased way and randomly sorted into an experimental group and a comparison or control group. This process of randomization assures probabilistic equivalence of the beneficiaries receiving the intervention (the experimental group) to others who are not (the randomized controls). When an evaluator cannot employ randomization by forming experimental and control groups or conditions, adequate comparison groups may be formed by uncovered target subjects, if the proper precautions are taken. The simultaneous consideration of comparison group strategies, intervention features, and data collection strategies produces the schematic classification of impact assessment research designs shown in Figure 2. Each of the research designs shown in the table is discussed below: • DESIGN I: Randomized "True" Experiments. "True" experiments are applicable only to partial-coverage programs. The essential feature of true experiments is the random assignment of treatments to targets and the random withholding of treatment from targets, so that these constitute, respectively, an experimental and a control group. ------- 242 A Guide to Evaluation Research Theory and Practice The most elaborate true experiments are longitudinal studies consisting of a series of periodic observations of experimental and control groups. Most of the large-scale field experiments undertaken over the past two decades to test proposed programs have been longitudinal, randomized experiments in which data on participants were collected over periods of years. For example, the several negative income tax experiments have all employed the same basic longitudinal design, varying one from the other in the kinds of treatments tested and in the length of time over which the intervention treatments were given, ranging from three to ten years. However, preintervention measures often are simply indefinable. For example, prisoner rehabilitation experiments that are designed to affect recidivism can be based only on postintervention measures, since recidivism cannot be measured before release from prison. Similarly, intervention efforts designed to reduce the incidence of disease or accidents have undefined preintervention outcome measures. • DESIGN II: Regression-Discontinuity Studies Some programs are administered using a definite and precise set of rules for selecting participants. For example, some college fellowship programs allocate fellowships on the basis of scores received on standardized tests (e.g., The National Merit Scholarship Test) and food stamp eligibility is determined by income eligibility rules. If such rules are followed with reasonable fidelity, it is possible to derive fairly accurate estimates of the net effects of the program in question by statistical analyses that focus on persons who are at the cutting points used in selection. The analyses require that the rules of selection be administered uniformly and that valid and reliable measures of outcomes be employed. 
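The following sketch (Python, simulated data, illustrative parameter values) shows the logic of a regression-discontinuity analysis in miniature: eligibility is assigned strictly by a cutoff on a selection score, separate lines are fit to observations near the cutting point on each side, and the program effect is estimated as the jump between the fitted lines at the cutoff.

    import numpy as np

    rng = np.random.default_rng(1)
    n = 4000
    score = rng.uniform(0.0, 100.0, n)        # selection score (e.g., a standardized test)
    cutoff = 60.0
    eligible = score >= cutoff                # selection rule applied uniformly
    true_effect = 5.0
    outcome = 20.0 + 0.4 * score + true_effect * eligible + rng.normal(0.0, 3.0, n)

    # Use only observations near the cutting point, and fit a line on each side.
    bandwidth = 15.0
    near = np.abs(score - cutoff) <= bandwidth
    below = near & ~eligible
    above = near & eligible

    def fit_line(x, y):
        slope, intercept = np.polyfit(x, y, 1)
        return intercept, slope

    a_below, b_below = fit_line(score[below], outcome[below])
    a_above, b_above = fit_line(score[above], outcome[above])
    jump = (a_above + b_above * cutoff) - (a_below + b_below * cutoff)

    print(f"estimated jump at the cutoff: {jump:.2f} (true effect {true_effect})")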
Although this approach to studying impact is free of many of the problems associated with nonexperimental designs, it is of limited usefulness because few programs are administered by selecting participants in a clear and precise fashion. In addition, the required statistical analysis is considerably sophisticated and cannot be used by persons with only an elementary knowledge of statistics. A detailed discussion of the regression-discontinuity design may be found in Trochim (1984). DESIGN III: Time-Series Designs Time-series designs are based on the analysis of repeated measures taken on an aggregate unit (usually a political jurisdiction) with many data points surrounding a point in time when a new, full-coverage intervention was introduced or an old program was substantially modified. By aggregate statistical series we mean periodic measures taken on a relatively large population, such as vital statistical series (births, deaths, migrations), usually defined as rates for fairly large popula- tions.24 24 Whether the time-series data concern a city, state, or the nation as a whole, only one entity is under study. Indeed, the basic strategy of time series has been used to study single cases, as in clinical studies of individual persons. (See Kadzin, 1982.) ------- A Guide to Evaluation Research Theory and Practice 243 Time-series analyses are especially important for estimating the net impacts of full- coverage programs, which present especially difficult problems in impact assess- ment because they lack an uncovered target population that might serve as a control. However, if extensive, over-time, before-program-enactment observations on out- come measures exist, it is possible to use the powerful techniques of time-series analyses. Thus, it may be possible to study the effect of the enactment of a gun- control law in a particular jurisdiction, but only if the evaluator has access to a sufficiently long-term series consisting of crime statistics that track long-term trends in gun-related offenses. Of course, for many ongoing interventions, such long-term measures do not exist; for example, there are no long-term, detailed time series on the incidence of certain acute diseases, making it difficult to assess the impact of Medicare or Medicaid on them. Although the technical procedures of time-series analyses are quite complicated, the ideas underlying them are quite simple. The trend before a treatment was put into place is analyzed in order to obtain a projection of what would have happened without the intervention. The trend after the intervention is then compared to the resulting projections, and statistical tests are used to determine whether or not the observed postintervention trend is sufficiently different from the projection to infer that the treatment had an effect. For example, the effects of changing the pricing policies on household water consumption can be studied using time-series analysis by analyzing the consumption trends before the pricing policy changes, projecting water consumption trends on that basis, and comparing actual consumption with the projections (Berk et al., 1981). Perhaps the most serious limitation on time-series designs is that many prein- tervention observations are needed in order to model preintervention time trends accurately (more than 30 points in time are recommended). For this reason, time- series analyses are usually restricted to outcome concerns for which governmental or other groups routinely collect and publish statistics. 
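A minimal interrupted time-series sketch follows (Python, simulated monthly data, hypothetical numbers), in the spirit of the water-consumption example above: the preintervention trend is fit and projected forward, and the projection is compared with what is actually observed after the intervention. Real applications require long series and proper time-series models rather than the simple trend line used here.

    import numpy as np

    rng = np.random.default_rng(2)
    months = np.arange(60)                    # 36 pre- and 24 post-intervention points
    intervention_month = 36
    trend = 100.0 + 0.5 * months              # underlying preintervention trend
    effect = -8.0                             # hypothetical drop after the policy change
    observed = trend + effect * (months >= intervention_month) + rng.normal(0.0, 2.0, 60)

    # Fit the preintervention trend and project it over the whole period.
    pre = months < intervention_month
    slope, intercept = np.polyfit(months[pre], observed[pre], 1)
    projection = intercept + slope * months

    # Compare observed postintervention values with the projection.
    post = ~pre
    estimated_effect = (observed[post] - projection[post]).mean()
    print(f"estimated post-intervention shift: {estimated_effect:.1f} (true {effect})")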
• DESIGN IV: Quasi-Experiments with Constructed, Generic, and/or Statistical Controls A large class of impact assessment designs consists of nonrandomized "quasi-experiments," all of which have in common comparisons between experimental groups, created out of targets who have elected (in some fashion) to participate in a program (or have been selected administratively as participants), and constructed comparisons, groups of nonparticipants who are in some critical ways comparable with the participants. Such comparisons may be made through the assembly (or construction) of groups of nonparticipant targets who resemble closely the group of participants. It is critical that the comparison group be very similar to the intervention group. Comparison groups may be constructed by matching each unit in the intervention group with a similar unit or by matching aggregate features of the intervention group with another group with the same aggregate features. For example, cities in the intervention group may be matched each with another city similar in size, regional location, and economic base. Or individual persons may each be matched in age, gender, and ethnic background. An example of aggregate matching is to select school classes with the same averages in age, IQ scores, and ethnic proportions as the group of students used in the intervention. Key to the construction of appropriate comparison groups is some prior knowledge of the important factors on which the intervention group and the comparison group are to be matched. Closely related to constructed comparisons are those defined through statistical analysis. Persons who have not participated in a program are compared to those who have, using statistical techniques that hold constant known differences between participants and nonparticipants. Statistical controls are often used, along with pre-, ongoing, and postmeasures of outcomes in constructed comparison groups. Indeed, the combined use of constructed controls and statistical controls can often increase the power of a quasi-experiment considerably. If statistical controls are used with postmeasures only, then the design is really that of a cross-sectional survey. (See discussion of Design VIII.) In short, the line between nonrandomized experiments with constructed controls and one-shot surveys is often obscure. The important point is that the reasoning involved in both is much the same: both attempt to estimate net effects by creating control groups that presumably represent potential targets who were not exposed to the intervention. Another approach to the comparison group construction problem is to use generic controls, usually consisting of measurements purporting to represent the typical performance of targets or the population from which targets may be drawn. Thus, in judging the performance of school children enrolled in a new learning program, the participants' scores on a standardized achievement test may be compared with published general norms for school children of that age or grade. Generic controls are widely available for some subjects, such as IQ and achievement tests, but for most subjects are not easily at hand. In any event, generic controls are rarely suitable; targets are often selected because of the ways in which they differ from the general population.
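The sketch below (Python; the covariates, sample sizes, and effect size are all assumptions made for illustration) shows a constructed comparison group built by nearest-neighbor matching: each participant is paired with the most similar nonparticipant on measured background characteristics, and the effect is estimated from the matched differences. As the discussion above emphasizes, this works only to the extent that the factors that matter are known and measured.

    import numpy as np

    rng = np.random.default_rng(3)
    n_t, n_c = 200, 1000
    # Covariates: age and prior score; participants skew older and higher-scoring.
    treated_X = np.column_stack([rng.normal(40, 5, n_t), rng.normal(55, 8, n_t)])
    control_X = np.column_stack([rng.normal(35, 8, n_c), rng.normal(50, 10, n_c)])
    true_effect = 4.0
    treated_y = 0.2 * treated_X[:, 0] + 0.5 * treated_X[:, 1] + true_effect + rng.normal(0, 2, n_t)
    control_y = 0.2 * control_X[:, 0] + 0.5 * control_X[:, 1] + rng.normal(0, 2, n_c)

    # Standardize covariates, then match each treated unit to its nearest control.
    mu, sd = control_X.mean(axis=0), control_X.std(axis=0)
    tz, cz = (treated_X - mu) / sd, (control_X - mu) / sd
    matched_diffs = []
    for i in range(n_t):
        d = ((cz - tz[i]) ** 2).sum(axis=1)   # squared distance to every control
        j = int(np.argmin(d))                 # nearest control (controls may be reused)
        matched_diffs.append(treated_y[i] - control_y[j])

    print(f"matched estimate: {np.mean(matched_diffs):.2f} (true effect {true_effect})")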
DESIGN V: Panel Studies Panel studies are ones in which the same units are repeatedly measured over time, the period in question spanning the introduction of an intervention. For example, a panel may be established before the beginning of an educational campaign, and queried repeatedly before, during, and after the campaign is put in place. Panel studies are based on a reflexive control strategy in which the changes in individuals occurring during the intervention are attributed to be the effects of the intervention. Although panel studies appear to be a simple extension of before-and-af ter designs (see Design ------- A Guide to Evaluation Research Theory and Practice 245 VI) through the addition of more data collection points, these studies enjoy a considerably higher standing in the plausibility order of impact assessments. The additional time points, properly employed, allow the researcher to begin to specify the processes by which an intervention has impacts upon targets. This design is especially important in the study of full-coverage programs. A prominent but controversial example of how this design was used is a study of the impact of children's viewing of violence and aggression shown in television programs on their manifestations of aggression toward their classmates. Given the circumstances of almost universal television viewing among children and, hence, the virtual impossibility of establishing controls who do not watch television, the best approach was to study how varying amounts of watching violence and aggression affected the display of aggression at some subsequent point in time (Milavsky et al, 1982). In some circumstances, because subjects are repeatedly contacted, panel studies risk affecting the subjects through the research effort itself. Thus, in the New York State study of communication to households about the dangers of radon gas, subjects became more sensitive to the problem simply because they were repeatedly asked questions about the topic (Smith et al, 1987). DESIGN VI: Before-and-After Studies Although few designs have as much intuitive appeal as before-and-after studies, they are among the least valid of assessment approaches. The essential feature of abefore- and-after study is a comparison of the same targets at two points in time, separated by a period of participation in a program, the differences between the two measurements being taken as an estimate of the net effects of the intervention. The main deficiency of such designs is that ordinarily they cannot disentangle the effects of extraneous events occurring during that period from the effects of the intervention. For example, a mass educational campaign's effects cannot be easily separated from those caused by ordinary media coverage of the same topics. DESIGN VII: Retrospective Before-and-After Studies The principal feature of this design is that it is based on retrospective reconstructions of the state of targets before an intervention along with postintervention measures. Typically, people are selected who have participated in a program, and they are asked to reconstruct what their circumstances were before they participated in the program. For obvious reasons, this design yields estimates of the net effects of programs that are even less plausible than straight before-and-after studies based on Design VI. 
In addition to the ambiguities of interpretation caused by uncontrolled-for extraneous events, this design also suffers from the problems of using fallible reconstructions of the situation before the intervention, relying as it does on possibly faulty recall. For these reasons, this design is not recommended for use in any evaluation. ------- 246 A Guide to Evaluation Research Theory and Practice DESIGN VIII: Cross-Sectional Surveys Cross-sectional surveys are single censuses or sample surveys. They are cross- sectional in the sense of providing a set of measures—as of aparticular point or cross- section in time. The typical cross-sectional survey used to provide estimates of net effects is usually a sample survey of some target population, part of whom have received a treatment (or participated in a program) and part of whom have not. In some cases, the cross sectional survey is of target population members who have received differing amounts of a treatment or who have experienced several variations of the treatment. Those who received the treatment are compared with those who did not on postintervention outcome measures, using statistical techniques to hold constant differences between the two groups. Although cross-sectional designs are among the less expensive ways to estimate impact, they are also among the more difficult to carry out rigorously. The critical problems center on whether sufficient knowledge exists concerning which are the important factors to hold constant in making statistical comparisons between persons who have been exposed to an intervention and those who have not. Indeed, in most cases, an important case can be made that exposure itself, being selective and voluntary, is an indication of important intervention and comparison group differ- ences. By definition, this self-selection cannot be held constant in any comparison. When cross-sectional surveys are used with partial-coverage programs, they are to be considered a variant of constructed comparison groups. However, using them to gauge the effectiveness of full-coverage programs that vary from place to place constitutes a unique application. For instance, there are several studies that attempt to gauge the effectiveness of gun-control legislation by examining the levels of restrictions on licensing and gun usage (Krug, 1967; Geisel et al., 1969; Seitz, 1972). In this case, the states constitute the units, with the observations being rates for various sorts of crime in a particular year. These studies are not analyses of time series, but use rates at only one point in time. Note that such impact assessments lead to estimates of how much of a net effect one variation in the treatment has compared with others. In the case of gun-control legislation studies, the variations being assessed are degrees of stringency in state laws. If applied to the study of whether Medicaid plans of varying levels of generosity affect medical care usage, the same kind of state comparison assesses the effects of varying levels of generosity but will not be able to tell whether Medicaid per se has any effect on medical care consumption. A variant of the cross-sectional survey may be seen in a design that uses constructed controls with after-only measures. 
One of the best known of such studies is the controversial evaluation of Head Start (Cicirelli et al., 1969), which was based on a comparison of children in the first grade who had participated in Head Start at nursery-school-age with first-graders of comparable background in the same or nearby schools who had not participated. Whether or not a cross-sectional evaluation was carried out properly is a question that centers on the types of statistical controls that were employed, which is almost always a matter subject to disagreement. The ------- A Guide to Evaluation Research Theory and Practice 247 issues involved in the proper design and analysis of one-shot surveys of existing, full- coverage programs with treatment that varies by site are especially complicated. • DESIGN IX: Judgmental Assessments The final design considered in Table 2 is one in which the judgments of some presumed experts, program administrators, or participants play the largest role in estimating net impact. In connoisseurial impact assessments, an expert—or connois- seur—is employed to examine a program, usually through visits to the program site. The expert gathers data in an informal way and then makes a judgment. Such judgments may be aided by the use of generic controls- -that is, existing estimates of what the population as a whole usually experiences—or "shadow" controls, which are educated guesses about what normal progress is considered to be. Needless to say, such assessments are the shakiest of all impact assessments. Equally suspect are impact assessments that rely upon the judgments of program administrators. Because of their obvious interests in making their efforts appear successful, such judgments are far from impartial. In the assessment of some programs, participants' judgments of program success have been used. These judgments have some validity, especially for programs that seek to increase participant satisfaction. However, it is usually difficult, if not impossible, for participants to make judgments about net impact because they do not have appropriate knowledge to bring to bear on their judgment. We do not mean to argue against all judgmental assessments; there are circumstances in which the evaluator can use nothing else. Furthermore, although some might advise against undertaking any assessment, we believe that some assessment is better than none. Judgmental designs may be the only type that can be used when few funds are available; when no preintervention measures exist; when no reflexive controls can be used; and when everyone is covered by the program and the program is uniform in place and time—so that neither randomized nor constructed controls can be used. Choosing which design to use in an evaluation is difficult. As a general rule, one should employ the best design possible, given the time and resources available. In addition, programs that rely heavily on voluntary self-selection should not use designs that cannot adequately handle self-selection biases. Was the Program Worth It: The Economic Efficiency Question. Given a program of proven effectiveness, the next question one might reasonably raise is whether the opportunity costs of the program are justified by the gains achieved. The same question might be more narrowly raised in a comparative framework: Is Program A more "efficient" than Program B?—both being otherwise equally acceptable ways of achieving a particular goal? The main problem in answering such questions focuses on establishing a yardstick for such an assessment. 
For example, would it be useful to think in terms of dollars spent ------- 248 A Guide to Evaluation Research Theory and Practice for units of achievement gained, in terms of students covered, or in terms of classes or schools that come under the program? The simplest way of answering efficiency issues is to calculate cost-effectiveness measures, dollars spent per unit of output. Thus, in the case of the "Sesame Street" program, two cost- effectiveness measures were computed: 1) dollars spent per child-hour of viewing, a measure of the cost of running the program, and 2) dollars spent per each additional letter of the alphabet learned, a cost-effectiveness measure that takes into account the resulting increase in learning. Note that the second measure implies knowing the effectiveness of the program as established by an effectiveness evaluation. The most complicated way to answer the efficiency question is to conduct a full- fledged cost- benefit analysis in which all the costs and benefits are computed. Relatively few such analyses have been made of social programs because it is difficult to convert all the costs and all the benefits into the same yardstick terms. In principle, it is possible to convert into dollars all the costs and benefits of a program; in practice, it is rarely possible to do so without some disagreement on the value, say, of learning an additional letter of the alphabet. An additional problem with full-fledged cost-benefit analyses is that they must take into consideration the long-run consequences; notonly of the program,butalsoof the long- term consequences of the next best foregone alternative. This immediately raises the question of "discounting": the fact that resources invested in a social program today may produce, over a number of years, consequences that have to be compared with those that might have resulted from the next best alternative. For example, a vocational program in inner-city high schools should address (among other things) the program's long-term impact on students' earnings over their lifetimes. This in turn requires that the costs and benefits of the program and the next best alternative be phrased in terms of today' s dollars. Without going into the arcane art of discounting, the problem is to figure out what might be a reasonable rate of return over the long run for current program investments and competing alternatives. One can obtain widely varying assessments, depending on what rate of return is used (Thompson, 1980). Evaluation in Evolution The field of evaluation research is scarcely out of its infancy as a social science activity. The first large-scale field experiments were started in the mid- 1960s. The interest in large-scale national evaluations of programs had its origins in the War on Poverty. The art of designing large-scale implementation and monitoring studies is still evolving rapidly. Concern with the validity status of qualitative research has just begun. Neverthe- less, the demand for sound program evaluations is growing. In this context, perhaps the best overall message is to keep it as simple as possible. Typically, simple programs will be hard enough to design and implement. Simple research designs usually will be sufficiently demanding. And simple data analyses will likely tax even the best evaluators. In other words, there is no such thing as a routine evaluation. To add unnecessary complexity to the burden is to turn a promising opportunity into an almost certain failure. 
REFERENCES

Abt Associates. 1979. Child Care Food Program. Cambridge, Massachusetts: Abt Associates.

Baldus, D.C., and J.W.L. Cole. 1977. Quantitative proof of intentional discrimination. Evaluation Quarterly 1(1):53-86.

Barnett, V. 1982. Comparative Statistical Inference. New York: John Wiley.

Becker, H.S. 1958. Problems of inference and proof in participant studies. American Sociological Review 23(6):652-60.

Berk, R.A. 1988. The role of subjectivity in criminal justice classification and prediction methods. Criminal Justice Ethics 6(1), in press.

Berk, R.A. 1988. Causal inference for sociological data. In Handbook of Sociology. N. Smelser (ed.). Beverly Hills: Sage Publications.

Berk, R.A., R. Boruch, D. Chambers, P. Rossi, and A. Witte. 1985. Social policy experimentation: a position paper. Evaluation Review 9(4) (August):387-429.

Berk, R.A., and M. Brewer. 1978. Feet of clay in hobnailed boots: an assessment of statistical inference in applied research. In Evaluation Studies Review Annual, Vol. 3. T.D. Cook, ed. Pp. 90-214. Beverly Hills: Sage.

Berk, R.A., and T.F. Cooley. 1987. Errors in forecasting social phenomena. Climatic Change 11(2):247-265.

Berk, R.A., T.F. Cooley, C.J. LaCivita, and K. Sredl. 1981. Water Shortage: Lessons in Water Conservation Learned from the Great California Drought. Cambridge, Mass.: Abt Books.

Berk, R.A., and A. Hartman. 1972. Race and class differences in per-pupil staffing expenditures in Chicago elementary schools. Integrated Education 10(1):52-57.

Berk, R.A., and D. Rauma. 1983. Capitalizing on nonrandom assignment to treatments: a regression-discontinuity evaluation of a crime control program. Journal of the American Statistical Association.

Berk, R.A., and P.H. Rossi. 1976. Doing good or worse: evaluation research politically reexamined. Social Problems 23(4):337-49.

Campbell, D.T., and A. Erlebacher. 1970. How regression artifacts in quasi-experimental evaluations make compensatory education look harmful. In Compensatory Education: A National Debate. J. Hellmuth, ed. Pp. 185-210. New York: Brunner/Mazel.

Carson, R. 1955. The Silent Spring. New York: Bantam Books.

Chen, H., and P.H. Rossi. 1980. The multi-goal, theory-driven approach to evaluation: a model linking basic and applied social science. Social Forces 59(1):106-22.

Cicirelli, V.G., et al. 1969. The Impact of Head Start. Athens, Ohio: Westinghouse Learning Corporation and Ohio State University.

Coleman, J., et al. 1967. Equality of Educational Opportunity. Washington: GPO.

Conant, James B. 1959. The American High School Today. New York: McGraw-Hill.

Cook, T., et al. 1975. Sesame Street Revisited. New York: Russell Sage.

Cook, T., and D. Campbell. 1979. Quasi-Experimentation. Chicago: Rand McNally.

Cronbach, L.J. 1975. Five decades of controversy over mental testing. American Psychologist 30(1):1-14.

Cronbach, L.J., and Associates. 1980. Towards Reform of Program Evaluation. Menlo Park, California: Jossey-Bass.

Cronbach, L.J. 1982. Designing Evaluations of Educational and Social Programs. Menlo Park, California: Jossey-Bass.

Deutscher, I. 1977. Toward avoiding the goal trap in evaluation research. In Readings in Evaluation Research. F. Caro, ed. Pp. 221-38. New York: Russell Sage.

Ericksen, E.P., and J.B. Kadane. 1985. Estimating the population in a census year: 1980 and beyond. Journal of the American Statistical Association 80:98-131.
Fairweather, George, and Louis G. Tornatzky. 1977. Experimental Methods for Social Policy Research. New York: Pergamon.

Franke, R.H. 1979. The Hawthorne experiments: review. American Sociological Review 44(5):861-67.

Franke, R.H., and J.D. Kaul. 1978. The Hawthorne experiments: first statistical interpretation. American Sociological Review 43(5):623-43.

Geisel, M.S., R. Roll, and R.S. Wettick, Jr. 1969. The effectiveness of state and local regulation of handguns: a statistical analysis. Duke Law Journal (August):647-676.

Gramlich, E.M., and P. Koshel. 1975. Educational Performance Contracting. Washington: The Brookings Institution.

Guba, E., and Y. Lincoln. 1981. Effective Evaluation. Menlo Park, California: Jossey-Bass.

Guttentag, M., and E. Struening, eds. 1975. Handbook of Evaluation Research. 2 vols. Beverly Hills: Sage.

Harrington, Michael. 1962. The Other America. New York: Macmillan.

Hausman, J.A., and D.A. Wise. 1985. Social Experimentation. Chicago, Illinois: The University of Chicago Press.

Heckman, J., and R. Robb. 1985. Alternative methods for evaluating the impact of interventions. In J.J. Heckman and B. Singer, eds., Longitudinal Analysis of Labor Market Data. New York: Cambridge University Press.

Heilman, J.G. 1980. Paradigmatic choices in evaluation methodology. Evaluation Review 4(5):693-712.

Holland, P.W. 1986. Statistics and causal inference. Journal of the American Statistical Association 81:945-960.

Holland, P.W., and D.B. Rubin. 1988. Causal inference in retrospective studies. Evaluation Review: in press.

Kazdin, A.E. 1982. Single Case Research Designs. New York: Oxford University Press.

Kershaw, D., and J. Fair. 1976. The New Jersey Income Maintenance Experiment. New York: Academic Press.

Kish, L. 1965. Survey Sampling. New York: John Wiley.

Kmenta, J. 1971. Elements of Econometrics. New York: Macmillan.

Krug, A.S. 1967. The relationship between firearm licensing laws and crime rates. Congressional Record 113, Part 15 (July 25):200060-200064.

Lenihan, K. 1976. Opening the Second Gate. Washington: GPO.

Lewis, D.A., T. Pavkov, H. Rosenberg, S. Reed, A. Lurigio, Z. Kalifon, B. Johnson, and S. Riger. 1987. State Hospitalization Utilization in Chicago. Evanston, Illinois: Center for Urban Affairs and Policy Research.

Lewis, Oscar. 1965. La Vida. New York: Random House.

Liebow, Elliot. 1967. Tally's Corner. Boston: Little, Brown.

Lord, F.M. 1980. Applications of Item Response Theory to Practical Testing Problems. Hillsdale, NJ: Erlbaum.

Mathematica Policy Research. 1980. Job Corps Evaluated. Princeton: Mathematica.

Maynard, R.A., and R.J. Murnane. 1979. The effects of the negative income tax on school performance. Journal of Human Resources 14(4):463-76.

Mensh, I.N., and J. Henry. 1953. Direct observation and psychological tests in anthropological field work. American Anthropologist 55(4):461-80.

Milavsky, J.R., H.H. Stipp, R.C. Kessler, and W.S. Rubens. 1982. Television and Aggression: A Panel Study. New York: Academic Press.

Murray, Sandra A. 1980. The National Evaluation of the PUSH for Excellence Project. Manuscript. Washington: American Institutes for Research.

Murray, William A. 1981. Final Report: Evaluation of Cities in School Program. Manuscript. Washington: American Institutes for Research.

Nathan, R., F.C. Doolittle, and Associates. 1983. The Consequences of Cuts. Princeton, NJ: Princeton Urban and Regional Research Center.
Pollard, W.E. 1986. Bayesian Statistics for Evaluation Research. Beverly Hills, California: Sage Publications.
Pratt, J.W., and R. Schlaifer. 1984. On the nature and discovery of structure. Journal of the American Statistical Association 79(1):9-21.
Raizen, S., and P.H. Rossi. 1981. Program Evaluation in Education: When? How? To What Ends? Washington, DC: National Academy Press.
Reiss, Albert J. 1971. The Police and the Public. New Haven: Yale University Press.
Riis, Jacob A. 1890. How the Other Half Lives. New York: C. Scribner.
Robertson, L.S. 1980. Crash involvement of teenaged drivers when driver education is eliminated from high school. American Journal of Public Health 70(6):599-603.
Robins, P.K., et al. 1980. A Guaranteed Annual Income: Evidence from a Social Experiment. New York: Academic Press.
Rossi, P.H. 1978. Issues in the evaluation of human services delivery. Evaluation Quarterly 2(4):573-99.
Rossi, P.H. 1987. The iron law of evaluation and other metallic rules. In Research in Social Problems and Public Policy, Vol. 4. J. Miller and M. Lewis, eds. Pp. 3-20. Greenwich, CT: JAI Press.
Rossi, P.H., and B. Biddle. 1966. The New Media and Education. Chicago: Aldine.
Rossi, P.H., and Robert Dentler. 1961. The Politics of Urban Renewal: The Chicago Findings. New York: Free Press of Glencoe.
Rossi, P.H., G. Fisher, and G. Willis. 1986. The Condition of the Homeless of Chicago. Amherst, MA, and Chicago, IL: Social and Demographic Research Institute, University of Massachusetts, and NORC: A Social Science Research Institute, University of Chicago.
Rossi, P.H., and K. Lyall. 1976. Reforming Public Welfare. New York: Russell Sage.
Rossi, P.H., J.D. Wright, E. Weber-Burdin, and J. Pereira. 1983. Victims of the Environment: Loss from Natural Hazards in the United States, 1970-1980. New York: Academic Press.
Rossi, P.H., Richard A. Berk, and Bettye K. Eidson. 1974. The Roots of Urban Discontent. New York: John Wiley.
Rossi, P.H., R. Berk, and K. Lenihan. 1980. Money, Work, and Crime. New York: Academic Press.
Rossi, P.H., and H. Freeman. 1985. Evaluation: A Systematic Approach. Third Edition. Beverly Hills: Sage.
Scriven, M. 1972. Pros and cons about goal-free evaluation. Evaluation Comment 3(4):1-4.
Seitz, S.T. 1972. Firearms, homicide and gun control effectiveness. Law and Society Review 6 (May):595-613.
Sinclair, Upton. 1906. The Jungle. New York: Doubleday.
Smith, V.K., W.H. Desvousges, A. Fisher, and F.R. Johnson. 1987. Communicating Radon Risk Effectively: A Mid-course Evaluation. Washington, D.C.: Environmental Protection Agency. (Publication #EPA-230-07-87-029.)
Steinbeck, John. 1939. The Grapes of Wrath. New York: Viking.
Struyk, R.J., and M. Bendick. 1981. Housing Vouchers for the Poor: Lessons from a National Experiment. Washington, D.C.: The Urban Institute.
Suchman, E. 1967. Evaluation Research. New York: Russell Sage.
Sudman, S. 1976. Applied Sampling. New York: Academic Press.
Thompson, M. 1980. Cost-Benefit Analysis. Beverly Hills: Sage.
Trochim, W.M.K. 1984. Research Design for Program Evaluation: The Regression Discontinuity Approach. Beverly Hills, California: Sage Publications.
U.S. Conference of Mayors. 1987. The Continuing Growth of Hunger, Homelessness, and Poverty in U.S. Cities: 1987. Washington, D.C.: U.S. Conference of Mayors.
Wardwell, W.L. 1979. Comment on Kaul and Franke. American Sociological Review 44(5):858-61.
Weiss, C. 1972. Evaluation Research. Englewood Cliffs, New Jersey: Prentice Hall.
Wholey, J.S. 1977. Evaluability assessment. In Evaluation Research Methods. L. Rutman, ed. Pp. 49-56. Beverly Hills: Sage.
Wright, J.D., P.H. Rossi, and K. Daly. 1983. Under the Gun: Weapons, Crime and Violence in America. New York: Aldine Press.

-------

PARTICIPANTS

-------

JOHN AHEARNE
Vice President and Senior Fellow
Resources for the Future

Resources for the Future is an independent, nonprofit research organization specializing in natural resources, energy, and the environment. In addition, Dr. Ahearne is Chairman of the Department of Energy Advisory Committee on Nuclear Facility Safety; Chairman of the National Research Council Committee on Risk Perception and Communication; and is on the National Research Council Steering Committee for the Workshop on Chemical Processes and Products in Severe Reactor Accidents. Formerly, Dr. Ahearne served in varying posts as Deputy Assistant Secretary of Defense for General Purpose Programs; Principal Deputy Assistant Secretary of Defense for Manpower and Reserve Affairs, including Acting Assistant Secretary; Assistant to the Secretary of Energy; Deputy Assistant Secretary of Energy for Power Applications; and Commissioner and Chairman of the Nuclear Regulatory Commission. Recent publications include "Nuclear Power after Chernobyl," in Science; and "Three Mile Island and Bhopal: Lessons Learned and Not Learned," in Hazards: Technology and Fairness. Dr. Ahearne received his Ph.D. in physics from Princeton University.

FREDERICK W. ALLEN
Associate Director
Office of Policy Analysis
Office of Policy, Planning and Evaluation
U.S. Environmental Protection Agency

Mr. Allen is Associate Director of EPA's Office of Policy Analysis. In this capacity he helps to manage an office working on a variety of issues concerning environmental and health risk. In the past several years he has managed a number of projects designed to improve the manner in which the Agency communicates with the public about environmental and health risks. He was also the lead staff member on a major agency task force which published the widely discussed report, Unfinished Business: A Comparative Assessment of Environmental Problems. Mr. Allen has been at EPA since 1978. He has been Acting Director of the Energy Policy Division, Chief of the Energy Development Branch, and Staff Director of the Interagency Resource Conservation Committee. He has also worked on the staff of the Secretary of Labor, at the Federal Energy Administration, the Cost of Living Council, and VISTA. Mr. Allen earned his B.A. with Honors at Yale University and his M.B.A. at the Harvard Business School.

ELAINE BRATIC ARKIN
Health Communications Consultant

Following 16 years with the U.S. Public Health Service devoted to developing communications programs, Mrs. Arkin left her position as Deputy Director of Public Affairs to become an independent consultant. She provides marketing and communications assistance to Federal and other public sector clients, including EPA, the National Cancer Institute, the National Heart, Lung, and Blood Institute, and the Institute for Health Policy Analysis at Georgetown University.

ANN STOUFFER BISCONTI
Vice President, Research and Program Evaluation
U.S. Council for Energy Awareness (USCEA)

As Vice President, Research and Program Evaluation, Dr. Bisconti is responsible for public attitude tracking, advertising testing, and evaluation of program effectiveness.
She has over twenty years' experience directing projects ranging from advertising research to survey research on energy, higher education, human resource development, and health. She is author of five books and over 30 other publications. Dr. Bisconti received her bachelor's degree in sociology and anthropology from McGill University and her Ph.D. in social science research from Union Graduate School. Before joining USCEA, she held the positions of Director of the National Center for Allied Health Leadership, Director of the Washington Office of Higher Education Research Institute, and Vice President, Human Resources Policy Corporation.

CARON CHESS
Associate Director
Environmental Communication Research Program
Cook College, Rutgers University

Ms. Chess coauthored Improving Dialogues with Communities: A Risk Communication Manual for Government and has given a variety of presentations and workshops on the subject. Before moving to academia, she coordinated programs for both advocacy organizations and government agencies. She played a leadership role in the campaign for the country's first right-to-know law, which gave the public access to information about toxic hazards, and she has written a book and many articles about the development of such laws. She was founding Executive Director of the Delaware Valley Toxics Coalition, linking environmental and labor constituencies. As Right-to-Know Coordinator for the New Jersey Department of Environmental Protection, Ms. Chess led the effort to implement the state's new right-to-know law. She also laid the groundwork for New Jersey's innovative Risk Communication Unit.

THOMAS CHIZMADIA
Director of Corporate Communications and Public Policy
CIBA-GEIGY Corporation

Mr. Chizmadia has been with CIBA-GEIGY Corporation since 1979. In his present position, he directs internal and external communication for the Corporation. Addressing environmental issues constitutes a large part of his effort due to their importance to the chemical industry and CIBA-GEIGY throughout the United States. Prior to assuming his present position at corporate headquarters, Mr. Chizmadia was Director of Government Affairs and Communications for the CIBA-GEIGY plant in Toms River, New Jersey. His tenure there coincided with the EPA Region II pilot public education program on Superfund.

VINCENT T. COVELLO
Executive Director
Health Effects Institute
Boston, Massachusetts

In addition to his work at the Health Effects Institute, Dr. Covello is a professor at Columbia University and Director of the Center for Risk Communication. Prior to his current positions, Dr. Covello was Director of the Risk Assessment Program at the National Science Foundation and a senior scientist on detail to the White House Council on Environmental Quality. He has also been a study director at the National Academy of Sciences and a professor at Brown University. Dr. Covello received his Ph.D. from Columbia University and his B.A. with honors from Cambridge University in England. He has edited or authored over twenty books and numerous articles on various aspects of risk assessment, risk management, and risk communication. Dr. Covello is on the editorial board of several journals and is currently President of the Society for Risk Analysis.

CHARLES DARBY
Director
Survey Research, Evaluation, and Analysis
Prospect Associates

In his present position, Mr.
Darby provides design and data collection, analysis, and interpretation services in response to program and communications evaluation needs of government agencies and private-sector organizations. He has over twenty years experi- ence in survey and evaluation research focusing on health-related issues. Mr. Darby has directed a range of quantitative and qualitative research projects from large-scale national surveys to small-scale message testing and in-depth interviewing projects. ------- 260 Participants CAROL DECK Program Evaluation Division Office of Policy, Planning and Evaluation U.S. Environmental Protection Agency Ms. Deck has been on the staff of the Program Evaluation Division since October 1985. Before joining EPA, she was employed by ICF, Incorporated, and the National Science Foundation. Ms. Deck received her Bachelor's degree from Kalamazoo College and her Master's degree from Georgetown University. She is a recipient of a 1988/89 Fellowship from the Bosch Foundation to study environmental planning in the Federal Republic of Germany. ROBERT W. DENNISTON Director, Division of Communication Programs Office for Substance Abuse Prevention Alcohol, Drug Abuse, and Mental Health Administration U.S. Public Health Service Mr. Denniston received his M.A. degree in mass communications and is currently pursuing a Ph.D. in public communications at the University of Maryland. Prior to assuming his present position at the Alcohol, Drug Abuse, and Mental Health Administra- tion (ADAMHA), Mr. Denniston was Director, Division of Prevention and Research Dissemination, National Institute on Alcohol Abuse and Alcoholism. Before that, he held positions as Chief, Information Projects Branch, National Cancer Institute, and Com- munications Director, Mayo Comprehensive Cancer Center, Mayo Clinic. Mr. Denniston is Chair-elect of the American Public Health Association's Section on Alcohol and Drugs. His other professional affiliations are with the National Council on International Health and the International Communication Association. WILLIAM H. DESVOUSGES Senior Research Economist, Environmental Economics Department Center for Economics Research Research Triangle Institute As a Senior Research Economist at RTI for the past eight years, Dr. Desvousges' particular area of expertise has been risk communication. His recent projects include measuring the risk-related impacts of siting a high-level nuclear waste repository, measuring the effectiveness of alternative radon risk communication materials sent to 2,300 homeowners and supervising 7,000 interviews to measure radon risk perceptions and to track expenditures to reduce those risks, and measuring the effectiveness of a community-based radon risk communication effort. For these studies, he developed print ------- Participants 261 and radio public service announcements and designed survey questionnaires, coordinated data collection, and analyzed survey results. Dr. Desvousges is also an acknowledged expert in benefits analysis, having published a book and articles on the subject. He holds a B.A., M.A., and Ph.D. in economics and was a professor at the University of Missouri-Rolla for five years after receiving his Ph.D. from Florida State University. RICHARD A. EISINGER Assistant Branch Chief of the Human Resources and Housing Branch Office of Information and Regulatory Affairs Office of Management and Budget Possessing extensive knowledge and experience in survey research, program evalu- ation and regulatory issues, Mr. 
Eisinger heads a staff that is responsible for overseeing these functions for various government departments and agencies, including the Department of Health and Human Services, the Education Department, the Veterans Administration, and the National Science Foundation. Mr. Eisinger has a bachelor's degree in psychology from the University of Colorado and a master's in social psychology from the University of Missouri. He was a Ph.D. candidate in business and industrial psychology at the University of Maryland. ANN FISHER Manager, Risk Communication Program Office of Policy Analysis U.S. Environmental Protection Agency Dr. Fisher joined the U.S. Environmental Protection Agency's Benefits Staff in 1980. Her initial work at EPA was on methods for measuring the benefits from improved water quality and the role of risk assessment in the decision process. Later she switched emphases to valuing changes in morbidity and mortality and measuring the benefits of regulating hazardous wastes. In 1986, she set up EPA's Risk Communication Program for exploring the use of information programs as potential alternatives to—and complements of—regulation for reducing risk. Dr. Fisher is author of numerous articles on risk communication topics in a variety of publications. She has served as Associate Editor for the Journal of Environmental Economics and Management and is currently a member of that publication's Editorial Council. She taught for nine years at the State University of New York—College at Fredonia and continues to lecture extensively on environmental and risk communication issues. Dr. Fisher holds a B.A. in mathematics and an M.A. and Ph.D. in economics, all from the University of Connecticut. ------- 262 Participants JUNE FLORA Assistant Professor, Institute for Communication Research Department of Communication Stanford University In addition to her teaching responsibilities, Dr. Flora holds two positions in the Department of Medicine—Associate Director, Stanford Center for Research in Disease Prevention, and Director of the Educational Program, Stanford Heart Disease Prevention Program. She has been coauthor of numerous monographs, book chapters, and presenta- tions—most recently, "Indicators of Societal Action to Promote Physical Health" in Individual and Societal Actions for Health Promotion: Strategies and Indicators. Dr. Flora holds an M. A. and Ph .D. in educational psychology (with sub-specialization in child development) from Arizona State University, and a B.A. in psychology from Bridgewater College in Virginia. VICKI S. FREIMUTH Director of Health Communication Associate Professor, Department of Communication Arts and Theatre University of Maryland, College Park At the University of Maryland, Dr. Freimuth teaches courses in health communica- tion, diffusion of innovations, and research methods. Her research focuses on the dissemination of health information in this country and in developing countries. She is lead author of a forthcoming book from the University of Pennsylvania Press, Searching for Health Information: The Cancer Information Service Model. Her publications have ap- peared in Human Communication Research. Journal of Communication, American Journal of Public Health, Health Education Research: Theory and Research. In addition, Dr. Freimuth consults regularly for the National Cancer Institute, the National Heart, Lung and Blood Institute, the National Institute on Alcohol Abuse and Alcoholism, and the Agency for International Development. 
She is Chairperson of the Health Communication Division of the International Communication Association. Dr. Freimuth holds a Ph.D. from Florida State University.

HENRY L. GARIE
Assistant Director
Office of Science and Research
New Jersey Department of Environmental Protection

In his present position, Mr. Garie directs the activities of NJDEP's Demographic Information System and the Office of Environmental Health Assessment, which includes the Risk Assessment Unit, the Risk Communication Unit, and the Risk Reduction Unit. Prior positions in the Office of Science and Research included Acting Assistant Director, Research Scientist I and II, and Technical Assistant to the Director. Mr. Garie also served as Principal Biologist in the Office of Cancer and Toxic Substances Research, NJDEP. Mr. Garie has been author or coauthor of numerous publications, including a recent article, "Overview of the Implementation of a Statewide Worker and Community Right-to-Know Act," in Hazard Communication: Issues and Implementation, ASTM STP 932. He has a B.S. in biology and an M.S. in environmental science from Rutgers University.

JAMES A. HARRELL
Deputy Director
Office of Disease Prevention and Health Promotion
Office of the Assistant Secretary for Health
U.S. Department of Health and Human Services

Mr. Harrell has been with the Department of Health and Human Services since 1975, having held positions as Senior Science Analyst; Chief, Program, Policy and Planning; and Director, National Center on Child Abuse and Neglect. Immediately prior to assuming his present position, Mr. Harrell served as Director, Planning, Research and Evaluation Division, Administration for Children, Youth and Families. Not surprisingly, his publications focus on child abuse and day care issues as they relate to Federal initiatives and information dissemination. Mr. Harrell holds master's degrees from Yale University and the University of Maryland. He has done graduate work in public administration at George Washington University and participated in the Senior Executive Service Candidate Development Program at DHHS.

JEANNE HERB
Research Scientist
Division of Science and Research
New Jersey Department of Environmental Protection

Ms. Herb is currently manager of the Risk Reduction Unit, which focuses on studying technical and policy options for hazardous waste source reduction. She has been with the New Jersey Department of Environmental Protection for three years and during that time has participated in implementing a community Right-to-Know program, assisted in establishing an environmental health assessment program within the Division, and directed the initial activities of the Risk Communication Unit. Ms. Herb holds an M.A. in science and environmental journalism from New York University and a B.S. in environmental science from Rutgers University. Her previous professional experience includes magazine editing and school teaching.

ROGER KASPERSON
Member, Hazard Assessment Group
Center for Technology, Environment, and Development (CENTED)
Clark University

Dr. Kasperson, who holds his Ph.D. from the University of Chicago, is co-author of Participation, Decentralization and Advocacy Planning, and co-editor of The Structure of Political Geography, Water Re-Use and the Cities, Equity Issues in Radioactive Waste Management, and Nuclear Risk Analysis in Comparative Perspective (in press).
He has written widely on issues connected with risk assessment and risk management, nuclear energy policy, and radioactive wastes. For the past seven years, Dr. Kasperson has directed a series of research projects, funded by the National Science Foundation and the Russell Sage Foundation, dealing with technological risk management, industrial management of hazards, and ethical and policy issues involved in occupational safety and health management. His current research projects deal with emergency planning around nuclear power plants, the risk issues and social impacts associated with the siting of radioactive waste repositories, and risk communication. Dr. Kasperson has served as consultant to several public and private agencies on energy and environmental issues. He was a member of the National Research Council's Board on Radioactive Waste Management and chaired its panel on Social and Economic Issues in Siting Nuclear Waste Repositories. He has also been Visiting Senior Scientist at the Beijer Institute in Stockholm, Sweden. Currently, he is on the editorial boards of Environment and Risk Analysis.

MARK KLINE
Research Associate
Environmental Communication Research Program
New Jersey Department of Environmental Protection

Mr. Kline is in the final phases of the doctoral program in clinical psychology at the Graduate School of Applied and Professional Psychology at Rutgers University. He brings experience as an individual, marital, and family therapist in community mental health settings to the fields of risk communication and evaluation. For the past year, Mr. Kline has worked with Caron Chess and Peter Sandman at the Environmental Communication Research Program on a project for the New Jersey Department of Environmental Protection, which involves assessing and recommending "quick and easy" evaluation strategies. His clinical background has been most helpful in assessing attitudes, emotions, and motivations without the benefit of research tools. As a clinician, he is frequently called upon to deal with emotional reactions to difficult interpersonal situations in an empathic and productive manner. This perspective has been of great value in understanding the dilemmas of risk communicators. Mr. Kline will continue his doctoral program as a psychology intern at Dartmouth Medical School in 1988-89.

MAX LUM
Program Manager, Health Education Programs
Agency for Toxic Substances and Disease Registry
Centers for Disease Control

Dr. Lum entered the government as a White House Fellow with the Office of Economic Opportunity. He worked for ten years in the Office of Program Evaluation and Research of the Department of Labor and participated in various evaluation programs for AID and the World Health Organization. He is currently Program Manager for ATSDR's Health Education Programs. Dr. Lum holds a master's degree in public administration, with an emphasis on systems, and a doctorate in education, with a specialty in medical education.

DAVID McCALLUM
Senior Fellow
Institute for Health Policy Analysis
Georgetown University Medical Center

As senior fellow at the Institute for Health Policy Analysis, Dr. McCallum conducts and develops research on health policy and risk communications and directs the Institute's program on risk communication. Formerly, Dr. McCallum served as a senior analyst in the Office of Technology Assessment of the U.S. Congress, where he worked on a study of the impact of technology on aging in America.
He has served in a variety of other governmental and private agencies examining technology, disease prevention, and public health. Dr. McCallum received an M.S. in chemical engineering and a Ph.D. in biomedical engineering from the University of Virginia.

JOHN C. McGRATH
Chief, Communications and Marketing Section
Communication and Public Information Branch
National Heart, Lung, and Blood Institute
National Institutes of Health

Mr. McGrath is currently Chief of the Communications and Marketing Section at the National Heart, Lung, and Blood Institute. He serves as the co-project officer on the Institute's Cardiovascular Risk Factor Public Education Program. His area of emphasis is public communication campaigns. At the Institute he is also responsible for dealing with the press. He has worked with several consulting firms supporting federal public education programs. Mr. McGrath has a master's degree in communications.

LOUIS A. MORRIS
Psychologist and Acting Director
Division of Drug Advertising and Labeling
Food and Drug Administration

In addition to his work at the FDA, Dr. Morris is an Adjunct Professor of Marketing at the American University and teaches part time at the University of Maryland and Johns Hopkins University. Dr. Morris is a graduate of Tulane University and has authored over 75 articles, chapters of books, and reports on topics related to drug information for consumers and health professionals. Recently, Dr. Morris served as a scholar-in-residence at the Center for Marketing Policy Research at the American University.

WILLIAM D. NOVELLI
President
Porter/Novelli

Mr. Novelli is President and Co-founder of Porter/Novelli, lead agency of the Omnicom PR Network, which is part of the Omnicom Group (a global organization of marketing communications agencies). In addition, he is a member of the Communications Planning Board of the National Cancer Institute and a national board member of CARE. Mr. Novelli regularly teaches a marketing management course in the MBA program at the College of Business and Management, University of Maryland. He holds undergraduate and graduate (master's in communication) degrees from the University of Pennsylvania and did post-graduate work at New York University. Mr. Novelli's past experience in marketing and advertising includes positions as marketing manager for Lever Brothers Company and account manager for Wells, Rich & Greene advertising agency. He also served as director of advertising and creative services for both the Peace Corps and ACTION.

MARIA PAVLOVA
National Expert on Toxicology
Office of Toxic Substances
U.S. Environmental Protection Agency

Dr. Pavlova has both a medical degree and a Ph.D. in microbiology and public health. She practiced medicine in Bulgaria, specializing in disease prevention activities. Since coming to the United States in 1969, she has been involved in cancer research and the study of environmentally and occupationally related disease at the University of Massachusetts and the Medical Department of Brookhaven National Laboratory. At Brookhaven, Dr. Pavlova also conducted research on interactions between chemical carcinogens and tumor viruses. At EPA, Dr. Pavlova is Project Coordinator of the EPA Program on Community Needs Assessment and Educational Resource Development related to Emergency Preparedness and Community Right to Know (SARA, Title III).
She is also a member of the Working Group of the Task Force on Environmental Cancer and Heart and Lung Disease and serves as Chairperson of the Interagency Group on Public Education and Communi- cation. In addition to her many presentations and scientific publications, Dr. Pavlova was Program Coordinator of a pilot education and communication program, "Communicating Risks," in Toms River, New Jersey. JAMES L. REGENS Associate Professor of Political Science Associate Director, Institute of Natural Resources University of Georgia Dr. Regens specializes in policy analysis, environmental regulation, and energy policy. His professional activities and honors include: Science Advisory Board, Living Lakes, Inc.; Research Fellowship, North Atlantic Treaty Organization, Scientific Affairs Division, Committee on the Challenges of Modern Society; Acting Director, Center for Science and Public Policy, University of Georgia; Recipient, James E. Webb Award, American Society for Public Administration; Recipient, Bronze Medal for Commendable Service, U.S. Environmental Protection Agency; Chairman, Group on Energy and the Environment, Organization for Economic Cooperation and Development; Joint Chairman, Interagency Task Force on Acid Precipitation; EPA Representative, Committee on Materials of the Office of Science and Technology Policy, Executive Office of the President; Assistant Director for Science Policy, Office of International Activities, U.S. EPA; Senior Technical Advisor to the Deputy Administrator, U.S. EPA; Senior Policy Analyst, Office of Research and Development, U.S. EPA; Public Administration Fellow, National Association of Schools of Public Affairs and Administration; Faculty Research Fellowship, U.S. Department of Energy, Oak Ridge National Laboratory; Visiting Research Fellow, Energy Division, ORNL; Member, State and Local Government Subcommittee of the Committee on Science, Engineering and Public Policy of the American Association for the Advancement of Science. Dr. Regens has also served as consultant to various public and private groups and is author/coauthor of over 50 publications and 40 conference presentations. MARILYN RICE Regional Advisor in Health Promotion, Health Education and Community Development Pan American Health Organization/World Health Organization Ms. Rice has extensive experience in designing, initiating, and executing local, national and international public health, primary health care, and health promotion and ------- 268 Participants education programs. With fluency in Spanish, French, and Portuguese, Ms. Rice has provided political and technical leadership in the development of health promotion and education programs for the thirty-eight countries in the Americas including a plan of action for women, which she developed and promoted. In addition she has published guidebooks, newsletters, and technical health manuals for national and international distribution. Ms. Rice holds a master's degree in health education from Columbia University and a B.A. in English and sociology from the University of Wisconsin. ROSE MARY ROMANO Chief, Information Projects Branch Office of Cancer Communications National Cancer Institute Ms. Romano is a graduate of Manhattanville College, with a master's degree in community health education from New York University. As a Public Health Educator at the National Cancer Institute, she is responsible for designing, implementing, and evaluating programs to reach the public and professionals with cancer information. 
She has presented numerous workshops on market research and evaluation. She serves as a marketing and promotional resource, organizing national conferences and teleconferences and providing communication consultation within NCI and to outside agencies, organizations, and groups. Ms. Romano also serves on the American Red Cross Corporate Communication Advisory Committee.

PETER H. ROSSI
S.A. Rice Professor of Sociology and Research Associate
Social and Demographic Research Institute
University of Massachusetts at Amherst

Dr. Rossi has extensive experience as a social science researcher in addition to his responsibilities in the classroom as S.A. Rice Professor of Sociology. Selected recent publications include the Handbook of Survey Research; Evaluation: A Systematic Approach; "The Iron Law of Evaluation and Other Metallic Rules" in Research in Social Problems and Public Policy; Armed and Considered Dangerous: A Survey of Felons and Their Firearms; The Condition of the Homeless of Chicago; and "Homelessness: The Nature and Origin of the Problem" in Homelessness and Health. He has received the Common Wealth Award for contributions to sociology and awards from the Evaluation Research Society for technical contributions to evaluation research. Dr. Rossi has taught at Harvard, the University of Chicago, and Johns Hopkins and has been Director of the National Opinion Research Center at the University of Chicago. He has served as President of the American Sociological Association, as Editor of the American Journal of Sociology and of Social Science Research, and as a Fellow of the American Academy of Arts and Sciences.

MILTON RUSSELL
Professor of Economics and Senior Fellow
Energy, Environment and Resources Center
Waste Management Research and Education Institute
The University of Tennessee

In addition to his present work in academia, Dr. Russell is Senior Economist at the Oak Ridge National Laboratory. Prior to that, he was Assistant Administrator for Policy, Planning and Evaluation, U.S. Environmental Protection Agency. During his years at EPA, Dr. Russell wrote and presented extensively on issues relating to the economic implications of environmental protection and the role of risk management and assessment in environmental policy-making. Dr. Russell holds both an M.A. and a Ph.D. in economics and was Professor of Economics for many of the 18 years that he taught the subject. Dr. Russell has served on numerous energy and economics advisory committees over the years, as well as holding the positions of Senior Fellow and Director, Center for Energy Policy Research, Resources for the Future; Senior Staff Economist, Council of Economic Advisers; and Staff Economist, Federal Power Commission.

JUDITH A. SHAW
Research Scientist
Division of Science and Research
New Jersey Department of Environmental Protection

Ms. Shaw is in her second year with the New Jersey Department of Environmental Protection. She currently manages the Risk Communication Unit, which focuses on developing communication models and assisting in the integration of risk communication strategies into overall management and practice within the NJDEP. Her previous professional experience includes education, community organizing, and public relations. Ms. Shaw holds an M.A. in education and community development from the University of Michigan, a B.S. in elementary education from the University of North Dakota, and a B.A. in zoology/sociology from Indiana University.

SHELAGH A.
SMITH Public Health Educator/Evaluator Office of Cancer Communication National Cancer Institute National Institutes of Health In her present position at NCI, Ms. Smith is primarily responsible for designing and monitoring evaluation of the Office of Cancer Communication's mass media programs in cancer prevention and patient education. This includes developing guidelines for pretest- ing and interviewing, as well as evaluation of programs through surveys, case studies, pilot ------- 270 Participants programs, and focus groups. In addition to responding to public inquiries regarding NCI survey data, marketing and communications research, and tobacco education materials, Ms. Smith serves as a liaison with other government as well as non-government groups. Ms. Smith was previously employed at the Health Care Financing Administration in Baltimore, Maryland, as a social science research analyst in the Office of Research and Demonstrations, Division of Health Services and Special Studies. She has given numerous presentations on issues ranging from funding preventive services to public knowledge of such illnesses as cancer and sexually transmitted diseases. Ms. Smith received her B.S. in education from the University of Tennessee and her M.P.H. in health services adminis- tration from Johns Hopkins School of Hygiene and Public Health. MILDRED Z. SOLOMON President, Solomon Associates A specialist in the design and development of health communications, Ms. Solomon has over 12 years' experience in developing educational programs for use in diverse settings, including schools, community organizations, hospitals, and clinics, on subjects as diverse as nutrition, drug abuse, stress, occupational health, injury control, and sexually transmitted diseases. She is particularly committed to designing health education interventions that result in measurable behavior changes as well as changes in knowledge and attitudes. Ms. Solomon is currently a doctoral candidate in human development at Harvard University. She has also taken graduate courses in filmmaking and is a producer of award-winning health education audiovisual materials. JAMES W. SWINEHART President, Public Communication Resources, Inc. In his present position, Dr. Swinehart assists various organizations in planning, producing, and evaluating mass media programs or campaigns. Prior to that he was Director of Research for a 24-program television series on health broadcast nationally by PBS. His major professional interest is planning, production, and evaluation of public service communication programs (social psychological approaches to communication and influence and the use of audience research to develop and appraise media campaigns). At the University of Michigan, where he received a Ph.D. in social psychology, Dr. Swinehart held faculty appointments in the Survey Research Center, School of Public Health, and Highway Safety Research Institute. His publications pertaining to evaluation include titles such as "News about Science: Channels, Audiences, and Effects," "Creative Use of Mass Media to Affect Health Behavior," and the "Feeling Good" series. He has been involved in producing and evaluating TV and radio spots, TV programs, films, print ads, and supplementary materials for many campaigns. ------- Participants 271 NANCY ZAHEDI Program Analyst, Program Evaluation Division Office of Policy, Planning and Evaluation U.S. Environmental Protection Agency Ms. 
Zahedi received a bachelor's degree from Stanford University and a Master of Public Policy from the John F. Kennedy School of Government. Before assuming her current position in the Program Evaluation Division at EPA, she served as a Peace Corps volunteer for two years and worked for the Save the Children Federation as a Planning and Evaluation Coordinator for two years.

-------

INDEX

Audience analysis. See also Needs assessment; Surveys
   Overview of methods 50-53
   Using data 154
Audience Analysis Matrices 51
Audience Information Needs Assessment 51
Audience segmentation 35, 36, 68, 74, 139
Audience motivations 83
Bounce-back cards 155, 157
Broadcast Advertisers Reports (BAR) 155
Cancer Information Service (CIS) 183
Causality 208-209
Central location intercept interviews. See Intercept interviews
CIBA-GEIGY Corporation, Toms River (NJ) Plant 147, 181
Communication Style Survey 58
Communicator assessment
   Overview of methods 57-58
Comparisons. See also Study design; Impact evaluation
   Comparison groups 23, 234-235
   Overview of methods 240
Concept development 35; See also Message design
Concept testing 138, 154
Conflict Management Survey 58
Consultant services 103
   Academic partnerships 103
   Criteria for selecting 104
   Directories 108
Cost-effectiveness 247-248
Data. See also Needs assessment; Study design; Surveys
   Importance for planning 27
   Sources 152, 158, 217-218
   Use in message development 153
Eat for Health program 138
ENVIRON Corporation 148
Environmental changes
   Role in risk reduction 65
Environmental Protection Agency (EPA) 163, 165
   Office of Toxic Substances 175
   Region II 181
Ethical issues
   Code of ethics 9
   Duty to inform public 6
   Elitism 7
   Individual rights 5, 6
   Manipulation vs. deception 7
   Self-inflicted illness 5
Evaluability assessment 207, 231-233
   Goal-free evaluation 233
Evaluation
   Benefits of xii-xiii, 89, 93, 99, 135, 137
   Costs 104-107
   Criteria 206
   In relation to policy xiii-xiv, 206, 213, 215-216
   Interpreting findings 28, 117, 128, 155
   Levels of 21, 66, 102, 117, 137, 213, 227
   Obstacles xiii-xvi, 18, 38, 47, 100, 139
   Social context of 196
   Timeframes 117
   Using results xvi, 38
Fear in messages 66, 77
Fetal alcohol syndrome 159
Field review (by experts) 155, 163
Field testing 181, 226; See also Pilot testing
Focus groups 13, 56, 108, 138-139, 154, 157, 173-179
Food and Drug Administration (FDA) 171
Formative evaluation 12, 21, 25-26, 33-37, 99-101, 138, 163; See also Focus groups; Concept testing; Pretesting
Generalizability of findings 209
Goals and objectives xiv-xv, 26, 27, 115, 143
Health Belief Model 69
Health Objectives for the Nation 91
Impact evaluation 15, 27-28, 46, 232-237; See also Summative evaluation
Individual interviews 13
Intercept interviews 55-56
Intermediaries 68
   Local groups 145, 149
Interpersonal communication 35, 58, 59
   Door-to-door campaign 149
   Vs. mass media 83-84
Lead 163
Legislation
   Role in risk reduction 65
Love Canal 191
Marketing research 41
   Directories 108
Maryland Department of the Environment 165
Mass media. See also Public service announcements
   Selecting 78-79
   Types 79-80
   Vs. interpersonal communication 83
Materials development 84-86
   Stages of evaluation 157
Measurement error 208
Meeting evaluations
   Overview of methods 59-60
Meeting Reaction Form 59
Message design
   Cognitive dissonance 68
   Principles of 76-79, 84
   Use of fear 66, 77
   Using data 154
Midcourse reviews 97
Myers-Briggs Type Indicator 57
National Cancer Institute (NCI) 137
   Cancer Prevention Awareness Program 173
National Cholesterol Education Program 151, 177
National Heart, Lung, and Blood Institute (NHLBI) 151, 177, 187
National High Blood Pressure Education Program 151, 187
Needs assessment 12, 35, 138, 217-220; See also Audience analysis
   Forecasting 220
   Qualitative 219
   Sources of data 217-218
New Jersey Department of Environmental Protection (NJDEP) 141
News clippings (for audience analysis) 52
NHLBI Smoking Education Program 151
Observation and Debriefing 60
Ocean County Citizens for Clean Water 150
Office of Management and Budget (OMB) 127, 129
Outcome evaluation 14-15, 20, 99, 101, 139, 160; See also Summative evaluation
Pilot programs 26
Pilot testing 14, 140
Planning tools 50, 77
Policy Profiling Questionnaire 50
Pollstart 53
Pretesting 12-13, 223-224; See also Focus groups; Intercept interviews; Readability; Theater testing
   Overview of methods 13, 53-56
Process evaluation 14, 20, 37-38, 93-96, 99, 138
   Delivery of program 224, 227-230
   Fiscal accounting 231
   Using results 39
Psychographic survey 139
Public opinion poll. See Surveys
Public Opinion Polling (software) 52
Public service announcements 155-157
Qualitative research 23, 219-220; See also Focus groups; Pretesting
Quantitative research 23; See also Data; Surveys; Study design
Radon testing 121, 163, 165
Readability 13, 54
   SMOG formula 54
Recommendations of Workshop xvi-xvii
Reye's syndrome 171
Rightwriter (software) 54
Risk assessment 144
   Integral to risk communication
Sampling 211; See also Study design; Surveys
   Probability samples 211
   Random sampling 211, 235-237
Self tests 13
Signaled Stopping Technique 55
Speech Evaluation Checklist 60
Strength Deployment Inventory 57
Study design 237-247; See also Comparisons; Surveys
   Cross-sectional surveys 245-246
   Before-and-after studies 245
   Judgmental assessments 246-247
   Panel studies 244-245
   Quasi-experiments 243-244
   Randomized experiments 241-242
   Regression-discontinuity studies 242
   Time-series designs 242-243
Summative evaluation 26-29, 38, 46, 163; See also Outcome evaluation; Impact evaluation
Superfund 141, 147, 181
Surveys. See also Study design
   As outcome measure 155, 160, 166, 183-185
   Cross-sectional 246-247
   For audience analysis 152-156, 183-185, 218-219
   Limitations 166-167
   Questionnaire design 172
   Using survey data 104-107, 167, 176
Theater testing 13, 56
U.S. Council for Energy Awareness (USCEA) 169
Union Lake, NJ 141
Validity 212
   Construct validity 207
   External validity 210
   Internal validity 209
   Statistical conclusion validity 210-212
Verbal Meeting Feedback 59
Vineland Chemical Company 141

-------