THE EPA MULTIMEDIA INTEGRATED MODELING SYSTEM SOFTWARE SUITE

By Steven S. Fine, Program Manager, National Oceanic and Atmospheric Administration on Assignment to the U.S. Environmental Protection Agency, c/o U.S. EPA, MD E243-04, Research Triangle Park, NC, 27711, 919-541-0757, fine.steven@epa.gov; Steven C. Howard, Computer Specialist, National Oceanic and Atmospheric Administration on Assignment to the U.S. Environmental Protection Agency, Research Triangle Park, NC; Alison M. Eyth, Software Engineer, MCNC Environmental Modeling Center, Research Triangle Park, NC; Dean A. Herington, Research Assistant, Department of Computer Science, University of North Carolina, Chapel Hill, NC; Karl J. Castleton, Software Engineer, Pacific Northwest National Laboratory (formerly with U.S. EPA), WA

INTRODUCTION

Several trends in environmental modeling are driving a significant increase in the complexity of environmental modeling studies. These include the growing importance of:

• combining models from multiple physical media (e.g., air, water, soil) or disciplines to make predictions that include an increasingly complete set of processes and outcomes;
• performing sensitivity and uncertainty studies to understand factors that affect results and to estimate the confidence that should be associated with predictions; and
• comparing multiple models and data sets that are intended to represent similar processes or contain similar information to understand models' and data sets' biases and errors.

Those and related challenges affect a number of projects at the U.S. Environmental Protection Agency (EPA). Examples include studying cross-media pollution or nutrient transport and transformation, tracking pollutants from their source to human doses via multiple pathways, predicting the relationships between climate and air quality at regional and global scales, and understanding risks associated with hazardous waste.
If the approaches typically used for executing and evaluating individual models are applied to such complex problems, a prohibitive amount of effort could be required, and there is a significant probability that configuration and operational mistakes will corrupt the results. To help modelers manage such increasingly complex simulations, a number of groups, including the EPA, have developed software systems that support modeling (e.g., Rizzoli and Young, 1997; Dennis et al., 1996; Leavesley et al., 1996; Laniak, 1999). These systems typically provide tools, software libraries, and/or software environments that simplify or partially automate common operations, including composing, configuring, executing, and evaluating models, but they lack the combination of power and flexibility required to effectively support some types of complex simulations. Issues that require further attention include feedbacks between models; suitability for models with different spatial and temporal scales; a conceptual design that cleanly supports interchanging models and data sets and modeling physical, chemical, biological, and human systems; cross-platform portability; support for off-the-shelf models; and distributed computing.

MULTIMEDIA INTEGRATED MODELING SYSTEM SOFTWARE SUITE

We are developing the Multimedia Integrated Modeling System (MIMS) software suite to address the EPA's current and future interdisciplinary modeling needs. The MIMS software will allow modelers, including model developers and risk assessors, to conduct complex studies with less effort and with greater confidence that the results represent the modelers' intent. MIMS will not make scientific or management decisions for modeling studies, but it should allow modelers to focus more of their attention on those issues.
The MIMS software suite will support the following activities:

• Data management and manipulation
• Model coupling and swapping
• Repetitive work, such as modeling multiple locations and conducting sensitivity and uncertainty studies
• Computation management, such as managing model executions on remote computers
• Model evaluation

The software suite will support these activities by automating common actions, simplifying manual operations, checking consistency based on information provided by modelers, and providing tools for various activities. To effectively meet the needs of users, we have identified a number of desirable characteristics for the software suite. The suite should:

• Be applicable to a wide variety of environmental issues;
• Be compatible with multiple families of models and multiple sources of data (e.g., field observations, satellites, models);
• Support feedback between models;
• Be easy for modelers to use;
• Run on multiple operating systems (e.g., Windows, UNIX, Linux);
• Support distributed computing in a flexible manner;
• Be open source;
• Allow modelers to incorporate their models into the system without losing "ownership";
• Allow users to choose their level of investment (e.g., achieve some benefit with limited effort or achieve greater benefit with greater effort); and
• Make common operations easy to perform while also supporting less common and more complex operations.

The MIMS software suite will include several components, as shown in Figure 1. The "framework" provides a software infrastructure for composing, executing, and evaluating models. MIMS tools provide functionality required to prepare models or their inputs or to operate on models' results. In the future, MIMS software libraries will provide a standard implementation of common functions required by models.
For instance, MIMS might provide a standard protocol (preferably based on off-the-shelf software) for representing and interchanging environmental data that modelers could choose to use. There are currently no MIMS libraries. The MIMS framework and tools are described in the following sections.

[Figure 1: Components of the MIMS software suite. The MIMS framework provides infrastructure for composing, executing, and evaluating models (Models A, B, and C); MIMS tools include data manipulation, data analysis, visualization, and decision support tools.]

MIMS Framework

Composing and Interchanging Models and Data Sets: To provide a flexible and robust method of combining and interchanging models and data sets, MIMS uses a modified version of the modeling paradigm from Argonne National Laboratory's Dynamic Information Architecture System (DIAS) (Christiansen, 2000) and the supporting DIAS software library. In the DIAS paradigm, one or more modelers decompose a system to be modeled into "domain objects" that represent the important things or concepts in the simulation. Examples of domain objects include an aquifer, a pollutant source, the atmosphere, homes, and a fish population. Each domain object contains parameters and processes. Parameters are attributes that describe the domain object, and processes are behaviors the domain object exhibits. Models provide the implementation of the processes that are active in a simulation, as shown in Figure 2. Each model is defined to read data from and write data to domain objects' parameters, as shown in Figure 3. In essence, each domain object's parameters serve as a standard for any information about that domain object. For example, in Figure 3, Domain Object A could represent a lake. Its parameters might include depths, temperatures, and nitrogen concentrations. Domain Object B could represent an urban region with parameters of population, sewage treatment type, and economic activity.
Model B could implement a discharge process by computing the amount of nutrients the urban region contributes to the lake. Model A could implement an aquatic chemistry process by computing the lake's nutrient concentrations. Since each model is defined in terms of the data standards provided by the domain objects, models conceptually do not interact directly with other models. This allows a modeler to replace a model with another implementation of the associated process without affecting other models in a system. It also allows a modeler to easily remove a model from the simulation and instead incorporate data sets that contain the same type of information that the model would produce. Data analysis and model evaluation programs can also be included in scenarios as "models."

[Figure 2: Models provide the implementation of the domain object's processes that are active in a simulation.]

The primary difference between the MIMS and DIAS model coupling paradigms is that DIAS includes another layer of software between a model and a domain object, which translates between the model's and the domain object's parameters and assumptions. If that functionality is required in MIMS, a wrapper is placed around an existing model, and MIMS treats the combination of wrapper and existing model as a model. The MIMS approach eliminates a layer of code for models that do not require translation, with little or no loss of flexibility.

To support a wide range of modeling issues, MIMS uses a very general concept of model parameters. Parameters provide the attributes of domain objects and the inputs and outputs of models. MIMS parameters check their consistency with other parameters and can provide customized user interfaces for editing. A parameter's developer determines the scope of its consistency checks.
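To make the lake and urban region example concrete, the Java sketch below shows models that communicate only through domain object parameters, so that one implementation of a process, or a data set, can stand in for another. It is a minimal illustration under stated assumptions, not the MIMS or DIAS API: the class names, the parameter names, the per-capita loading factor, and the fully mixed lake formula are hypothetical placeholders chosen only so the example runs.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical domain object: named parameters acting as the shared data standard.
class DomainObject {
    private final String name;
    private final Map<String, Double> parameters = new HashMap<>();

    DomainObject(String name) { this.name = name; }

    double get(String parameter) {
        Double value = parameters.get(parameter);
        if (value == null) {
            throw new IllegalStateException(name + " has no value for " + parameter);
        }
        return value;
    }

    void set(String parameter, double value) { parameters.put(parameter, value); }
}

// A model implements one process by reading and writing domain object
// parameters; it never calls another model directly.
interface ProcessModel {
    void run();
}

// "Model B": discharge process of the urban region (placeholder formula).
class UrbanDischargeModel implements ProcessModel {
    private static final double LOAD_PER_PERSON = 0.5; // hypothetical factor
    private final DomainObject urban;
    private final DomainObject lake;

    UrbanDischargeModel(DomainObject urban, DomainObject lake) {
        this.urban = urban;
        this.lake = lake;
    }

    public void run() {
        double removed = urban.get("treatmentRemovalFraction");
        lake.set("nitrogenLoad", urban.get("population") * LOAD_PER_PERSON * (1.0 - removed));
    }
}

// A data set standing in for Model B: it writes the same lake parameter.
class ObservedLoadData implements ProcessModel {
    private final DomainObject lake;
    private final double observedLoad;

    ObservedLoadData(DomainObject lake, double observedLoad) {
        this.lake = lake;
        this.observedLoad = observedLoad;
    }

    public void run() { lake.set("nitrogenLoad", observedLoad); }
}

// "Model A": aquatic chemistry process of the lake (placeholder: fully mixed).
class LakeChemistryModel implements ProcessModel {
    private final DomainObject lake;

    LakeChemistryModel(DomainObject lake) { this.lake = lake; }

    public void run() {
        lake.set("nitrogenConcentration", lake.get("nitrogenLoad") / lake.get("volume"));
    }
}

class CouplingDemo {
    public static void main(String[] args) {
        DomainObject lake = new DomainObject("lake");
        lake.set("volume", 2500.0);
        DomainObject urban = new DomainObject("urban region");
        urban.set("population", 100000.0);
        urban.set("treatmentRemovalFraction", 0.75);

        // Couple the discharge and chemistry models through the lake's parameters.
        new UrbanDischargeModel(urban, lake).run();
        new LakeChemistryModel(lake).run();
        System.out.println(lake.get("nitrogenConcentration"));

        // Replace the discharge model with observed data; Model A is unchanged.
        new ObservedLoadData(lake, 10000.0).run();
        new LakeChemistryModel(lake).run();
        System.out.println(lake.get("nitrogenConcentration"));
    }
}
```

Because LakeChemistryModel reads only the lake's nitrogenLoad parameter, it runs unchanged whether that value came from a model or from a data set; running CouplingDemo prints 5.0 and then 4.0.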
The basic MIMS parameters primarily check for compatible types of information (e.g., they confirm that a floating-point number is provided when one is required). MIMS places few other requirements on parameters, which allows parameters to represent a broad range of information. MIMS already provides parameters for basic data types, such as floating-point numbers, strings, and integers, as well as for more complex data structures, such as files, sets of chemical reactions, and descriptions of regular grids. Java code for new parameters can be easily incorporated into MIMS. Some parameter types provide an abstract description of data that models can use without being tied to a specific data source. For instance, models can be defined to read a time series of values without making any assumptions about where the data are stored (e.g., in a file, in memory, or in a database). The MIMS approach for parameters is based on some concepts used in the computer framework developed for the Total Risk Integrated Methodology project (Palma et al., 1999), with significant extensions.

[Figure 3: Models are defined to read from and write to domain objects (reading/writing parameters).]

The effort required to use a new model in MIMS varies greatly. If the model is not consistent with the process and parameter standards set by the domain objects, then the model must be adapted or a wrapper must be placed around it. A wrapper's responsibilities might include converting units, computing derived parameters, and interpolating data. For all models, MIMS requires some basic information, including the type of domain object to which the model applies, the process the model implements, how the model should be invoked, and the model's input and output parameters. A user can define this information in a graphical user interface, a text file, or Java code. Modelers can define their own models or use definitions someone else prepared.
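The following Java sketch illustrates two of the parameter ideas above: a basic type check that rejects incompatible values, and an abstract time series that a model can read without knowing where the data are stored. The class and method names are hypothetical, and the rule that two parameters are compatible only if they share both value type and units is an assumption made for illustration, not a description of MIMS internals.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical typed parameter with a basic consistency check.
class Parameter<T> {
    private final String name;
    private final Class<T> type;
    private final String units;
    private T value;

    Parameter(String name, Class<T> type, String units) {
        this.name = name;
        this.type = type;
        this.units = units;
    }

    // Reject values whose runtime type does not match the declared type.
    void setValue(Object candidate) {
        if (!type.isInstance(candidate)) {
            String actual = candidate == null ? "null" : candidate.getClass().getSimpleName();
            throw new IllegalArgumentException(
                name + " expects " + type.getSimpleName() + " but got " + actual);
        }
        value = type.cast(candidate);
    }

    T getValue() { return value; }

    // Assumed compatibility rule: same value type and same units.
    boolean isCompatibleWith(Parameter<?> other) {
        return type.equals(other.type) && units.equals(other.units);
    }
}

// Abstract time series: a model reads values without knowing the storage.
interface TimeSeries {
    int size();
    double valueAt(int step);
}

// Values held in memory.
class InMemoryTimeSeries implements TimeSeries {
    private final double[] values;
    InMemoryTimeSeries(double... values) { this.values = values.clone(); }
    public int size() { return values.length; }
    public double valueAt(int step) { return values[step]; }
}

// Values parsed from text lines (for example, lines read from a file).
class TextLineTimeSeries implements TimeSeries {
    private final List<String> lines;
    TextLineTimeSeries(List<String> lines) { this.lines = lines; }
    public int size() { return lines.size(); }
    public double valueAt(int step) { return Double.parseDouble(lines.get(step).trim()); }
}

class ParameterDemo {
    // A model-like computation written only against the TimeSeries interface.
    static double mean(TimeSeries series) {
        double sum = 0.0;
        for (int i = 0; i < series.size(); i++) {
            sum += series.valueAt(i);
        }
        return sum / series.size();
    }

    public static void main(String[] args) {
        Parameter<Double> depth = new Parameter<>("depth", Double.class, "m");
        depth.setValue(12.5);            // accepted: a Double
        try {
            depth.setValue("deep");      // rejected: a String
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
        System.out.println(mean(new InMemoryTimeSeries(1.0, 2.0, 3.0)));
        System.out.println(mean(new TextLineTimeSeries(Arrays.asList("1.0", " 2.0", "3.0"))));
    }
}
```

The mean function never learns whether its values came from memory or from text, which is the property the abstract time series parameter is meant to provide; running ParameterDemo prints the rejection message "depth expects Double but got String" followed by 2.0 twice.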
MIMS provides several features that make it possible to use some existing ("legacy") models in MIMS without writing any new code. The person who defines a legacy model in MIMS can specify that input parameters should be passed to the model on the command line, via environment variables, in a textual control file, and/or as columnar time series data. MIMS will create the required control and time series files before invoking the model. MIMS can also read model outputs from columnar time series files and make those outputs available to other models in a MIMS scenario.

Users bring together, or compose, domain objects and models to create modeling scenarios. MIMS uses the information about domain objects and models to perform some basic consistency checks as the scenario is created. The consistency checks include verifying that all required input parameters are defined and that models' input and output parameter types and units are consistent with those of the domain objects.

MIMS also allows scenarios to be composed. A scenario, which might invoke a number of models, can be defined as a new type of model and attached to a domain object's process in another scenario. This is similar to the concept of wrapping Fortran statements in a subroutine and provides the same benefits of conceptual encapsulation and reuse of processing instructions. For example, a user could create a scenario that simulates aquatic biology via models of algae, fish, etc. That scenario and the multiple models it contains could then be used to implement an aquatic biology process in a scenario that simulates multimedia nutrient transport.

Executing Scenarios and Models: After a user defines a new modeling scenario or opens a predefined scenario, the user executes it. MIMS passes descriptions of the scenario's domain objects and models to the DIAS library, which invokes the models in the proper sequence.
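The legacy-model options described earlier in this section can be sketched in Java as follows: the framework writes a textual control file and a columnar time series file, checks that every required input is defined, and then prepares the command line and environment for an unmodified executable. This is a hypothetical sketch, not MIMS code; the file names control.txt and inputs.ts, the "name = value" control-file layout, passing the control file path as the last argument, and the executable ./lakemodel are all assumptions. The process is prepared but not started, so the example runs without the executable being present.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical definition of how a legacy executable receives its inputs.
class LegacyModelDefinition {
    final String executable;
    final List<String> commandLineInputs = new ArrayList<>();
    final List<String> environmentInputs = new ArrayList<>();
    final List<String> controlFileInputs = new ArrayList<>();

    LegacyModelDefinition(String executable) { this.executable = executable; }
}

class LegacyModelLauncher {
    // Consistency check: every required input must be defined.
    private static String require(Map<String, String> inputs, String name) {
        String value = inputs.get(name);
        if (value == null) {
            throw new IllegalArgumentException("Required input not defined: " + name);
        }
        return value;
    }

    // Writes the control and time series files, then prepares (without
    // starting) the process that would run the legacy model.
    static ProcessBuilder prepare(LegacyModelDefinition def, Map<String, String> inputs,
                                  double[][] timeSeriesRows, Path workDir) throws IOException {
        List<String> controlLines = new ArrayList<>();
        for (String name : def.controlFileInputs) {
            controlLines.add(name + " = " + require(inputs, name));
        }
        Path controlFile = workDir.resolve("control.txt");
        Files.write(controlFile, controlLines, StandardCharsets.UTF_8);

        // Columnar time series: one row per time step, columns separated by spaces.
        List<String> rows = new ArrayList<>();
        for (double[] row : timeSeriesRows) {
            StringBuilder line = new StringBuilder();
            for (int i = 0; i < row.length; i++) {
                if (i > 0) {
                    line.append(' ');
                }
                line.append(row[i]);
            }
            rows.add(line.toString());
        }
        Files.write(workDir.resolve("inputs.ts"), rows, StandardCharsets.UTF_8);

        List<String> command = new ArrayList<>();
        command.add(def.executable);
        for (String name : def.commandLineInputs) {
            command.add(require(inputs, name));
        }
        command.add(controlFile.toString());

        ProcessBuilder builder = new ProcessBuilder(command).directory(workDir.toFile());
        for (String name : def.environmentInputs) {
            builder.environment().put(name, require(inputs, name));
        }
        return builder;
    }
}

class LegacyDemo {
    public static void main(String[] args) throws IOException {
        LegacyModelDefinition def = new LegacyModelDefinition("./lakemodel");
        def.commandLineInputs.add("startYear");
        def.environmentInputs.add("SITE_ID");
        def.controlFileInputs.add("depth");

        Map<String, String> inputs = new HashMap<>();
        inputs.put("startYear", "1990");
        inputs.put("SITE_ID", "7");
        inputs.put("depth", "12.5");

        Path workDir = Files.createTempDirectory("legacy-model");
        ProcessBuilder builder = LegacyModelLauncher.prepare(
            def, inputs, new double[][] {{1.0, 2.5}, {2.0, 3.5}}, workDir);
        System.out.println(builder.command());
        // builder.start() would launch the model; reading its outputs is omitted here.
    }
}
```

Keeping the input-passing conventions in a declarative definition, rather than in model-specific code, mirrors the idea that a legacy model can be described to the framework without being modified.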
DIAS invokes a model when its input parameters are available and have changed since the model's last invocation. We are currently developing capabilities to distribute model executions to remote computers. This will allow users to easily use remote compute servers from their desktops to significantly decrease the turnaround times for some types of simulations. For instance, Monte Carlo uncertainty analysis can be performed much more quickly if multiple computers are used. Our design includes the use of multiple protocols for working with remote computers so that MIMS can be compatible with a variety of networking, computer architecture, software, and security environments.

Iteration: Repetitively running a set of models for different inputs is a very common activity. Examples of iterative studies include sensitivity and uncertainty studies, model calibration, optimization, and model execution for different sites, time periods, or management assumptions. MIMS includes a general iterator design that is flexible and easy to extend. The design allows iterator developers to easily take advantage of MIMS's user interfaces, parameter consistency checking, and execution management while specifying whether realizations should be executed sequentially (e.g., for optimization) or in parallel (e.g., for multiple sites) and what post-processing of results should be performed (e.g., computing the importance of factors during a sensitivity study). MIMS already includes a basic Monte Carlo iterator, and collaborators are developing an extensive uncertainty package that will plug into MIMS. MIMS iterators can be applied to any MIMS model, including complex scenarios that invoke multiple legacy models.

MIMS Tools: Typically, modelers invest significant effort preparing models and their inputs and analyzing and using model results. MIMS tools assist with the most common of those activities. One tool is currently available, and two others are under development.
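The iteration design described above separates how realizations are generated, whether they run sequentially or in parallel, and how results are post-processed. The Java sketch below shows a basic Monte Carlo iterator under simplifying assumptions: the model is represented as a function of a single uniformly distributed input (java.util.function.DoubleUnaryOperator), and the class name, sampling range, and mean and standard deviation post-processing are illustrative choices rather than the MIMS iterator API. All inputs are drawn up front from a seeded generator so that sequential and parallel runs give identical results.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Random;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.DoubleUnaryOperator;

// Hypothetical Monte Carlo iterator wrapped around a one-input model.
class MonteCarloIterator {
    private final DoubleUnaryOperator model;
    private final long seed;

    MonteCarloIterator(DoubleUnaryOperator model, long seed) {
        this.model = model;
        this.seed = seed;
    }

    // Draw every realization's input before running anything, so results
    // do not depend on execution order.
    double[] sampleUniform(int realizations, double low, double high) {
        Random random = new Random(seed);
        double[] inputs = new double[realizations];
        for (int i = 0; i < realizations; i++) {
            inputs[i] = low + (high - low) * random.nextDouble();
        }
        return inputs;
    }

    // Sequential execution, one realization after another.
    double[] runSequential(double[] inputs) {
        double[] outputs = new double[inputs.length];
        for (int i = 0; i < inputs.length; i++) {
            outputs[i] = model.applyAsDouble(inputs[i]);
        }
        return outputs;
    }

    // Parallel execution of independent realizations on a thread pool.
    double[] runParallel(double[] inputs, int threads)
            throws InterruptedException, ExecutionException {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<Double>> futures = new ArrayList<>();
            for (double input : inputs) {
                Callable<Double> task = () -> model.applyAsDouble(input);
                futures.add(pool.submit(task));
            }
            double[] outputs = new double[inputs.length];
            for (int i = 0; i < outputs.length; i++) {
                outputs[i] = futures.get(i).get();
            }
            return outputs;
        } finally {
            pool.shutdown();
        }
    }

    // Post-processing: mean and sample standard deviation of the outputs.
    static double[] meanAndStdDev(double[] values) {
        double mean = Arrays.stream(values).average().orElse(Double.NaN);
        double squares = 0.0;
        for (double v : values) {
            squares += (v - mean) * (v - mean);
        }
        return new double[] {mean, Math.sqrt(squares / (values.length - 1))};
    }
}

class MonteCarloDemo {
    public static void main(String[] args) throws Exception {
        // Toy model: output = 2 * input + 1.
        MonteCarloIterator iterator = new MonteCarloIterator(x -> 2.0 * x + 1.0, 42L);
        double[] inputs = iterator.sampleUniform(1000, 0.0, 1.0);
        double[] sequential = iterator.runSequential(inputs);
        double[] parallel = iterator.runParallel(inputs, 4);
        System.out.println(Arrays.equals(sequential, parallel));
        System.out.println(Arrays.toString(MonteCarloIterator.meanAndStdDev(parallel)));
    }
}
```

Sampling before execution is one way to make the choice between sequential and parallel execution invisible to post-processing; an optimization iterator, whose next input depends on earlier outputs, could not sample everything up front and would use only the sequential path.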
Modular Spatial Allocator: When dealing with spatially explicit data sets, a very common operation is to allocate attributes from one set of polygons, lines, or points to another set of polygons, lines, or points. For instance, attributes that are specified by county (e.g., atmospheric emissions of pollutants) might be allocated to model grid cells, or fluxes on a fine mesh might be aggregated to a coarse mesh. This is a standard operation in a geographic information system (GIS), but some communities that use MIMS would benefit from a stand-alone spatial allocator because they lack expertise with GIS packages, because they have little or no additional need for a GIS, or because it can be difficult or inefficient to invoke a GIS as part of an interface between two models. To address these concerns, MIMS includes a modular spatial allocator. This application reads a source set of polygons, lines, and points with an associated attribute and a destination set of shapes, computes the overlaps of the source and destination shapes, allocates the input attribute, and writes the results. The spatial allocator has been designed to make it relatively easy to add input and output data formats or a different spatial allocation algorithm.

Plotting Tool: Many users would like to plot model results and comparisons in scatterplots, time series plots, histograms, bar charts, and boxplots (box-and-whisker plots). Commercial packages can perform many of these functions but are often platform-specific, may not be amenable to operation in a batch mode where no graphical user interface is present, and cannot be distributed as part of a free modeling system. MIMS already allows users to include scatterplots, time series plots, and histograms in their scenarios.
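Returning to the Modular Spatial Allocator described above, its central step (computing overlaps between source and destination shapes and then allocating the attribute) can be sketched in Java. The sketch makes two assumptions that the text does not: shapes are axis-aligned rectangles rather than general polygons, lines, or points, and the attribute is allocated by area weighting, a common choice for extensive quantities such as county emission totals. The class names are hypothetical.

```java
import java.util.Arrays;
import java.util.List;

// Axis-aligned rectangle standing in for a polygon (simplifying assumption).
class Box {
    final double xMin, yMin, xMax, yMax;

    Box(double xMin, double yMin, double xMax, double yMax) {
        this.xMin = xMin;
        this.yMin = yMin;
        this.xMax = xMax;
        this.yMax = yMax;
    }

    double area() {
        return (xMax - xMin) * (yMax - yMin);
    }

    // Area of the intersection with another rectangle (0 if they do not overlap).
    double overlapArea(Box other) {
        double width = Math.min(xMax, other.xMax) - Math.max(xMin, other.xMin);
        double height = Math.min(yMax, other.yMax) - Math.max(yMin, other.yMin);
        return (width > 0.0 && height > 0.0) ? width * height : 0.0;
    }
}

class AreaWeightedAllocator {
    // Each source value is split among destinations in proportion to the
    // fraction of the source's area that each destination overlaps.
    static double[] allocate(List<Box> sources, double[] sourceValues, List<Box> destinations) {
        double[] allocated = new double[destinations.size()];
        for (int s = 0; s < sources.size(); s++) {
            Box source = sources.get(s);
            for (int d = 0; d < destinations.size(); d++) {
                double fraction = source.overlapArea(destinations.get(d)) / source.area();
                allocated[d] += fraction * sourceValues[s];
            }
        }
        return allocated;
    }
}

class AllocatorDemo {
    public static void main(String[] args) {
        // Two "counties" with emission totals of 80 and 40.
        List<Box> counties = Arrays.asList(new Box(0, 0, 2, 2), new Box(2, 0, 4, 2));
        double[] emissions = {80.0, 40.0};
        // Three grid cells of unequal width covering the same area.
        List<Box> cells = Arrays.asList(new Box(0, 0, 1, 2), new Box(1, 0, 3, 2), new Box(3, 0, 4, 2));
        System.out.println(Arrays.toString(AreaWeightedAllocator.allocate(counties, emissions, cells)));
    }
}
```

Running AllocatorDemo prints [40.0, 60.0, 20.0]; because the cells exactly cover the counties, the allocated total (120) equals the source total. Swapping in a different allocation rule, or a true polygon intersection routine, would change only these two small classes, which is the kind of modularity the text describes.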
We are also designing a more general and powerful plotting facility that will combine the best features of framework-embedded plotting facilities and a stand-alone, easy-to-use plotting application. The goal is to allow users to use the same data analysis tool to easily analyze their data outside the framework, to create templates containing plot compositions and formats for use in their scenarios, and to automatically create plots on-screen or in a publication-quality file format when they execute their MIMS scenarios. To achieve this, we are selecting an existing plotting application that is open source, portable across platforms, extensible, and easy to use and that produces a wide variety of plots on-screen and in publication-quality formats. We will then add features for bidirectional communication with the framework: the plotting application will create plot templates that the framework can use, and the framework will invoke the plotting application with the data the user would like to plot.

Thematic Mapper: We are pursuing a similar approach to providing a thematic mapping capability in MIMS. Such a tool could provide an easy way to quickly view model inputs or results in a geospatial context. We are evaluating open source, cross-platform mapping components or applications, such as OpenMap, that could be invoked from within MIMS as well as operate in a stand-alone manner.

Development Approach: To maximize the value and timeliness of our work for customers, we have adopted some approaches from the Extreme Programming software development methodology (Beck, 1999), while tempering those approaches to account for the special requirements of an extensible framework, our very small and scattered development team, and the inertia of a large organization. While we have a long-term vision for MIMS, we typically select short-term priorities based on customers' needs and schedules.
When considering one customer's needs, we also consider whether there is a more general issue that is relevant to multiple customers. We are also collaborating with other agencies that are developing modeling tools to identify and share standard subsystems.

APPLICATIONS OF MIMS

EPA's Council for Regulatory Environmental Modeling (CREM) encourages the use of common best practices in EPA's modeling groups. CREM expects MIMS to provide a platform for groups that require modeling frameworks to conduct complex studies. To help us achieve the generality required to meet CREM's expectations, we are working with groups that have very different types of models and applications. Working with these multiple groups helps ensure that the MIMS design is general, provides a broad evaluation of MIMS approaches, and helps guide growth in a variety of directions. The projects that currently plan to use MIMS include the following:

• Total Risk Integrated Methodology (TRIM). TRIM will support risk assessment for hazardous pollutants, such as mercury, that are emitted to the air and then transported to soil and water. MIMS will provide the TRIM project a platform for coupling the models required for its risk studies, as well as data analysis tools.
• Community Multiscale Air Quality (CMAQ) model. CMAQ is a state-of-the-art grid-based air quality model. MIMS will provide a graphical alternative for configuring CMAQ and for managing repetitive model executions.
• Clean Air Status and Trends Network (CASTNET). MIMS will provide a graphical user interface, simulations of multiple sites and years, and data analysis tools for the application of the Multilayer Model of dry deposition (Meyers et al., 1998) to CASTNET data.
• New Generation Compartmental Model. This project uses MIMS as a platform for exploring new approaches for constructing fully integrated compartmental models of multiple media, including biota.
• Urban Drainage Decision Support System. An external group supported by a cooperative agreement funded by EPA's Office of Water is using MIMS as the basis for a prototype decision support tool for urban drainage applications. This includes the development of uncertainty analysis and optimization tools within MIMS.

While those projects will be using the MIMS software suite, they are not currently designing their domain objects or models to be interoperable. A few of the projects have expressed interest in using another project's models with MIMS's support in the future, but they have not yet invested any effort in the conceptual design and model adaptations that will be required to achieve that interoperability.

FUTURE DIRECTIONS

During the next couple of years, new MIMS capabilities will most likely address two issues: very large computations and working with environmental data. More detailed observations and representations of environmental processes, increased interest in long-duration environmental simulations, and growing demand for sensitivity and uncertainty estimates are significantly increasing both the computational resources required to perform an individual simulation and the number of simulations. Several planned MIMS capabilities will support such computationally intensive work. As described above, we are designing distributed computing support for MIMS that will allow a user to easily use remote compute servers from a desktop computer. In the future, MIMS will be extended so that the computation management portion of MIMS can continue running on a server even after the user turns off the machine on which MIMS was started. This will enable MIMS to manage simulations that require weeks or more without being tied to a desktop machine. We are also considering adding a scripting language to MIMS to provide another avenue for automating computations.

We are starting to shift the emphasis of our development from computation management to all facets of working with environmental data.
We will provide common data reduction tools in MIMS, such as computing averages, subsets, and extrema. After we have developed the basic plotting and thematic mapping tools described above, we will investigate off-the-shelf applications that can provide an integrated view of three-dimensional, time-varying data sets that are not on the same grid or mesh, such as overlapping results from models at two different resolutions. We will also investigate data representation and interchange approaches that can foster sharing data among independently developed models and data manipulation and analysis tools. This may apply or build on existing approaches such as the Synthetic Environment Data Representation and Interchange Specification (Foley et al., 1998), the Distributed Oceanographic Data System (Unidata, 2002), and the Earth Science Markup Language (Ramachandran et al., 2001). Another issue we expect to address is how to track model results and find data sets that might be located at multiple institutions.

SUMMARY

The MIMS software suite will allow modelers to focus more on scientific and policy issues while conducting increasingly complex modeling studies. The suite supports composing, configuring, executing, and evaluating a wide range of models. We are working with diverse modeling projects to identify and support their common requirements and to evaluate our success toward that goal. The current status of MIMS is available at http://www.epa.gov/asmdnerl/mims.

REFERENCES

Beck, K., 1999, Extreme Programming Explained: Embrace Change, Addison-Wesley.
Christiansen, J. H., 2000, A Flexible Object-Based Software Framework for Modeling Complex Systems with Interacting Natural and Societal Processes. Proceedings, 4th International Conference on Integrating GIS and Environmental Modeling (GIS/EM4): Problems, Prospects and Research Needs. Banff, Alberta, Canada, September 2-8.
Dennis, R. L., Byun, D. W., Novak, J. H., Galluppi, K. J., Coats, C. J., Vouk, M. A., 1996, The Next Generation of Integrated Air Quality Modeling: EPA's Models-3. Atmos. Environ., 30, 1925-1938.
Foley, P. G., Mamaghani, F., Birkel, P. A., 1998, The Synthetic Environment Data Representation and Interchange Specification (SEDRIS) Development Project. http://www.sedris.org/prlltrpl.htm.
Laniak, G. F., 1999, Documentation for the FRAMES-HWIR Technology Software System, Volume 1: System Overview, http://www.epa.gov/epaoswer/hazwaste/id/hwirwste/pdf/risk/system/s0499.pdf.
Leavesley, G. H., Markstrom, S. L., Brewer, M. S., Viger, R. J., 1996, The Modular Modeling System (MMS) - The Physical Process Modeling Component of a Database-Centered Decision Support System for Water and Power Management. Water, Air, and Soil Poll., 90, 303-311.
Meyers, T. P., Finkelstein, P., Clarke, J., Ellestad, T. G., Sims, P. F., 1998, Description and Evaluation of a Multilayer Model for Inferring Dry Deposition Using Standard Meteorological Measurements. J. Geophys. Res., 103(D7), 22,645-22,661.
Palma, T., Vasu, A. B., Hetes, R. B., 1999, The Total Risk Integrated Methodology (TRIM). Environ. Manager, 5, March, 30-34.
Ramachandran, R., Alshayeb, R. M., Beaumont, B., Conover, H., Graves, S., Li, X., Mowa, S., McDowell, A., Smith, M., 2001, Earth Science Markup Language: A Solution for Generic Access to Heterogeneous Data Sets, http://esml.itsc.uah.edu/presentations2.html.
Rizzoli, A. E., Young, W. J., 1997, Delivering Environmental Decision Support Systems: Software Tools and Techniques. Environ. Modelling & Software, 12, 237-249.
Unidata, 2002, Distributed Oceanographic Data System Web Site, http://www.unidata.ucar.edu/packages/dods/index.html.

REPORT DOCUMENTATION PAGE (Form Approved OMB No. 0704-0188)

1. AGENCY USE ONLY (Leave Blank): PB2004-101304
2. REPORT DATE: 2003
3. REPORT TYPE AND DATES COVERED:
4. TITLE AND SUBTITLE: The EPA Multimedia Integrated Modeling System Software Suite
5. FUNDING NUMBERS: None
6. AUTHOR(S): S. Fine, S. Howard, A. Eyth, D. Herington, K. Castleton
7. PERFORMING ORGANIZATION NAME(S) AND ADDRESS(ES): U.S. EPA, ORD, Nat'l Exposure Research Lab., Research Triangle Park, N.C. 27711
8. PERFORMING ORGANIZATION REPORT NUMBER:
9. SPONSORING/MONITORING AGENCY NAME(S) AND ADDRESS(ES): National Exposure Research Laboratory - RTP, NC, ORD, U.S. EPA, Research Triangle Park, N.C. 27711
10. SPONSORING/MONITORING AGENCY REPORT NUMBER: EPA/600/A-03/044
11. SUPPLEMENTARY NOTES:
12a. DISTRIBUTION/AVAILABILITY STATEMENT: Release to the General Public
12b. DISTRIBUTION CODE: EPA/600/9
13. ABSTRACT (Maximum 200 words): Several trends in environmental modeling are driving a significant increase in the complexity of environmental modeling studies. These include the growing importance of: 1) combining models from multiple physical media (e.g., air, water, soil) or disciplines to make predictions that include an increasingly complete set of processes and outcomes; 2) performing sensitivity and uncertainty studies to understand factors that affect results and to estimate the confidence that should be associated with predictions; and 3) comparing multiple models and data sets that are intended to represent similar processes or contain similar information to understand models' and data sets' biases and errors.
14. SUBJECT TERMS:
15. NUMBER OF PAGES: 14
16. PRICE CODE: A03
17. SECURITY CLASSIFICATION OF REPORT: Unclassified
18. SECURITY CLASSIFICATION OF THIS PAGE: Unclassified
19. SECURITY CLASSIFICATION OF ABSTRACT: Unclassified
20. LIMITATION OF ABSTRACT: None

NSN 7540-01-280-5500. Standard Form 298 (Rev. 2-89), Prescribed by ANSI Std. Z39-18, 298-102.