EPA

United States

Environmental Protection Agency



Office of

Research and Development

National Human Exposure Assessment

Survey (NHEXAS)

Quality Systems and Implementation Plan
for Human Exposure Assessment

Title: Logsheet and Confidential Questionnaire Data Entry and
Preparation

Source: Harvard University/Johns Hopkins University

Notice: The U.S. Environmental Protection Agency (EPA), through its Office of Research and Development (ORD),
partially funded and collaborated in the research described here. This protocol is part of the Quality Systems
Implementation Plan (QSIP) that was reviewed by the EPA and approved for use in this demonstration/scoping
study. Mention of trade names or commercial products does not constitute endorsement or recommendation by
EPA for use.

Maryland

Emory University
Atlanta, GA 30322

Cooperative Agreement CR 822038

Standard Operating Procedure

NHX/SOP-D04

U.S. Environmental Protection Agency
Office of Research and Development
Human Exposure & Atmospheric Sciences Division
Human Exposure Research Branch


-------
D02 Questionnaire Data Entry and Preparation,
Rev. 1.0

page 1 of 7
September 25, 1995

1.	Title of Standard Operating Procedure

Harvard University/Johns Hopkins University Standard Operating Procedure:

D04 Logsheet and Confidential Questionnaire Data Entry and Preparation, Rev 1.1

2.	Overview and Purpose

The purpose of this SOP is to describe how data is transcribed from logsheets and confidential
questionnaire forms into the official database, also known as the Complete Dataset (CDS). This
SOP also describes the structure and purpose of the CDS and the Analysis-ready Dataset
(ADS).

3.	Discussion

Data will be entered into the CDS and ADS incrementally. To help ensure that all data is
accounted for during data entry, each participant's logsheets and confidential questions
will be assigned to a unit of data, and each unit will be handled and processed together.
A unit of questionnaire data is defined as all logsheets and confidential questions collected
from individuals who are participating during a specified interval of time. See SOP D02 for
more details.
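
As a rough illustration of the grouping described above, the sketch below assigns forms to units by collection date. The unit names, interval boundaries, field layout, and participant IDs are all hypothetical; SOP D02 defines the actual units.

```python
from datetime import date

# Hypothetical unit boundaries (inclusive start and end dates); SOP D02 defines the real ones.
UNITS = {
    "unit1": (date(1995, 9, 1), date(1995, 9, 30)),
    "unit2": (date(1995, 10, 1), date(1995, 10, 31)),
}

def assign_unit(collected_on):
    """Return the name of the unit whose interval contains the date, or None."""
    for name, (start, end) in UNITS.items():
        if start <= collected_on <= end:
            return name
    return None

def group_by_unit(forms):
    """Group (participant_id, collection_date) pairs so each unit can be processed together."""
    groups = {}
    for participant_id, collected_on in forms:
        groups.setdefault(assign_unit(collected_on), []).append(participant_id)
    return groups

# Hypothetical forms returned from the field.
forms = [
    ("P001", date(1995, 9, 12)),
    ("P002", date(1995, 10, 3)),
    ("P003", date(1995, 9, 28)),
]
units = group_by_unit(forms)
```

Forms whose date falls outside every interval are grouped under None, so they stand out rather than being silently dropped.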

Structure of the CDS.

The CDS is the official database. The CDS will be used to create any database used for analysis,
e.g. used to construct the ADS, and will be the database sent when data is requested from other
investigators. The confidential questionnaires, while part of the CDS, will be available to a
limited number of individuals.

Initial data sets will be Paradox tables, and each unit of data will be entered into a new
file which will be appended later to the CDS. The initial data set files will have names nearly
identical to the corresponding CDS files, except that the file names will start with the letter i.

Information from logsheets will be contained in two files. The file logt.db will contain the data
entered onto the logsheets that is required to translate the lab results into the typical units of
measurement used to analyze the data. The file logc.db will play a role similar to, but broader
than, that of qxcom.db: it will store comments which flag unusual samples, along with
correction information.

The confidential questions will be stored in five tables; see Appendix A for a complete list of
table names and their contents. These tables contain the names, addresses, and telephone numbers
of household members, and the health and income conditions of the participant.

The CDS will reflect exactly what was written on the confidential questionnaires. Information
recorded from logsheets in the CDS will include all data necessary to transform the data to the
appropriate units of measurements and comments, if any, on collection problems.

Structure of the ADS

Confidential questions will not be part of the ADS.

Information from the logsheets will be indirectly part of the ADS; see SOP D03 for details.


File naming conventions

File name conventions are the same as for the questionnaires. See SOP D02 for details. Names
and contents of the templates for the CDS tables are given in Appendix A of this SOP.

4. Personnel Responsibilities

4.1 Project Data Coordinator is responsible for
•assuring accuracy and consistency of the database
•modifying the data entry protocol if necessary, in writing
•tracking confidential questionnaires and logsheets from collection to entry into
database to storage
•notifying the Data Entry Supervisor of any modifications that affect working databases.

4.2 Data Entry Supervisor is responsible for
•training and supervising Data Entry Assistants
•tracking confidential questionnaires and logsheets through data entry and review
•resolving problems and ambiguities that Assistants are unable to handle.

4.3 Data Entry Assistants are responsible for
•logging in arriving data, checking for ID, storing confidential questionnaires and
logsheets
•data entry
•reviewing data entry.

The same Assistant may do all of these jobs, but no one may verify a stack s/he has entered.

5.	Required Equipment and Reagents

•set of confidential questionnaires and logsheets and pen for marking
•computer with Borland Paradox V5.0 for Windows and database directories
•data in hardcopy format
•initial data set disks and back-ups
•CDS backup disks
•data entry instruction sheet appropriate to the type of data being coded
•data entry log (forms with initials and dates for data entry; kept in Supervisor's logbook)

6.	Procedures

Marking the Questionnaires

The interviewer will use a black or blue ink pen to mark the questionnaire. The response will
be circled by the interviewer. If a change needs to be made in the marked response, the
incorrect mark will be crossed out and initialed. The correct response will then be marked. The
interviewer will give these instructions to the participants for questionnaires they mark.

Filling in the Logsheets

Discussion of logsheets and procedures for filling in the logsheets are given in the relevant Field
SOPs, SOP F01 through SOP F12.

Processing Questionnaires and Logsheets at the FCC

The confidential questions will be separated from the questionnaires by the FCC Clerk on return
from the field. They will be stored until a unit of data has been collected. See the discussion
section above for the definition of a unit of questionnaires and SOP D01 for details on storage
procedures. When a unit of questionnaires has been collected, the FCC Clerk will photocopy the
questionnaires minus confidential questions twice and the confidential questions once.

The FCC Clerk will store the logsheets in the designated location (see SOP D01) until a unit of
data has been collected.

The FCC Clerk will then send the originals of the questionnaires, confidential questions and the
original copy of the logsheet to the Project Data Coordinator and copy 1 of the logsheet to the
Principal Investigator. The photocopies of the confidential questions will be returned to the
secure location described in SOP D01. The FCC Clerk will notify the Project Data Coordinator
via e-mail that the confidential questionnaires and logsheets have been shipped.

Data Preparation of Logsheets at Emory

On receipt of the logsheets the Data Entry Assistant will store the logsheets in a labeled folder
in the location designated for logsheets which need to be entered into the CDS. The Data Entry
Assistant will note the arrival of the logsheets in the logbook and the CTDS.

The procedure will be the same for both tables, itlog.db and iclog.db; the table itlog.db will be
used as an example. The steps in preparing the initial data sets for logsheets are:

1)	The Data Entry Supervisor will assign the folder to an Assistant for entry into the
database.

2)	The Data Entry Assistant who will enter the data will:

i)	take the folder with the logsheets and the initial data set and backup disks

ii)	check whether a file with the appropriate filename exists; if not, rename a
copy of the template of the itlog.db table to identify it as the latest version of
the entered itlog.db table, e.g. itlog1E.db.

iii)	enter the data from the folder into the appropriate fields in the file, according
to the instruction sheet

iv)	if there are any problems or ambiguities, mark each one with a post-it with an
arrow and note pointing out the location of the problem; and continue to the
next field until all data has been entered

3)	The Data Entry Assistant will resolve any ambiguities found in the previous step,
return to the skipped fields, and enter the result. Clarification of the resolution will
be noted on the logsheets, with an explanation of the resolution in the comments
section of the logsheet. Ambiguities will be resolved with the assistance of the Data
Entry Supervisor and Project Data Coordinator.

4)	The Data Entry Assistant will then record completion of data entry in the data entry
logbook.

5)	The Data Entry Supervisor will assign the folder to a different Assistant to verify the
data entry.

6)	The Data Entry Assistant who will verify the data entry will:

i)	take the folder, data entry instruction sheet, and data and backup disks

ii)	verify the data entry according to the instructions

iii)	resolve any ambiguities found in the previous step, return to the skipped fields,
and enter the result. Clarification of the resolution will be noted on the logsheets,
with an explanation of the resolution in the comments section of the logsheet.
Ambiguities will be resolved with the assistance of the Data Entry Supervisor and
Project Data Coordinator.

iv)	record completion in the logbook

v)	rename the file by dropping the E from the file name

vi)	make a backup copy and submit it to the Data Entry Supervisor when data entry has
been completed and any ambiguities have been resolved

vii)	record completion of verification in the data entry logbook.
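
The E-suffix convention used in steps 2 ii) and 6 v) can be sketched as follows. This is only an illustration and not part of the Paradox workflow: the function names are hypothetical, and the example file name (a version digit followed by E) is an assumed reading of the convention detailed in SOP D02.

```python
from pathlib import PurePath

def entered_name(versioned_table):
    """Name for a table that has been entered but not yet verified: append 'E' to the stem."""
    p = PurePath(versioned_table)
    return p.stem + "E" + p.suffix

def verified_name(entered_table):
    """Name for a verified table: drop the trailing 'E' from the stem."""
    p = PurePath(entered_table)
    if not p.stem.endswith("E"):
        raise ValueError(f"{entered_table} is not marked as entered-but-unverified")
    return p.stem[:-1] + p.suffix

# Hypothetical first version of the initial logsheet table.
unverified = entered_name("itlog1.db")   # name used while verification is pending
verified = verified_name(unverified)     # name used once verification is complete
```

Refusing to rename a file that lacks the E marker guards against verifying the same table twice.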

Data Preparation of Confidential Questionnaires at Emory

It is anticipated that very little data will need to be entered for each cycle for the confidential
questionnaire data. This is because we expect only a small percentage of participants to have
health problems, be on medication, be pregnant or have large changes in income (so that they
change income categories). If this presumption turns out to be false, the data entry process will
be changed to the same method as described for the logsheets.

Two members of the Data Entry Staff will participate in entry simultaneously. They will follow
these steps in data entry.

1.	The Data Entry Supervisor will assign the confidential questions to be entered into the
tables to two Data Entry Staff (the Data Entry Supervisor may be one of the staff).

2.	A staff member will separate out those questionnaires which do not have a response to
any of the confidential questions (or have "no change" as a response).

3.	The second staff member will check to make sure that the separation was performed
correctly.

4.	The table will be renamed to reflect that the new table is the next version.

5.	One staff member will then enter the data into the appropriate table, while the second
member checks the data as it is entered.

6.	The new version will be copied to the appropriate back-up disks.

7.	A record of completion will be entered into the logbook for both staff members.

8.	The confidential questions will be returned to the Data Entry Supervisor, who will
return the confidential question forms and disks to the designated storage location.
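
The separation in step 2 is a manual task, but its logic can be sketched as below. The record layout, the exact wording of a "no change" response, and the sample answers are assumptions made for illustration only.

```python
NO_CHANGE = "no change"  # assumed wording of a "no change" response

def has_confidential_data(responses):
    """True if any confidential question has a response other than blank or 'no change'."""
    for answer in responses.values():
        text = (answer or "").strip().lower()
        if text and text != NO_CHANGE:
            return True
    return False

def separate(questionnaires):
    """Split questionnaire IDs into (to_enter, set_aside) by whether they need data entry."""
    to_enter, set_aside = [], []
    for q in questionnaires:
        target = to_enter if has_confidential_data(q["responses"]) else set_aside
        target.append(q["id"])
    return to_enter, set_aside

# Hypothetical questionnaires keyed by the question codes listed in Appendix B.
questionnaires = [
    {"id": "P001", "responses": {"B21": "", "F8": None}},
    {"id": "P002", "responses": {"B21": "yes", "F8": ""}},
    {"id": "P003", "responses": {"B44": "No change"}},
]
to_enter, set_aside = separate(questionnaires)
```

The check in step 3 amounts to a second person repeating the same split independently and confirming that both piles agree.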

Entering new data units to the CDS

New records are added to copies of the latest version of the tables by using the Add/Append
functions to put the results from the initial data files into the CDS. An introduction to these
methods can be found on p. 180 and p. 187 of the Paradox User's Guide.

The procedure will be the same for all tables. The Data Entry Assistant will take the following
steps in preparing the CDS tables for confidential questions and logsheets.

1.	Gather all new units of initial data files to be entered. Then for each new unit of data:

2.	Rename a copy of the last version of the CDS table to identify it as the latest version of
the CDS table.

3.	Take the initial data files and use the Tools|Utilities|Add command to add the initial
table to the latest version of the relevant CDS table created in Step 2.

4.	The new CDS will be copied to the CDS back-up disk. See SOP D01 for more details.

5.	The old version of the CDS will be removed from the CDS directory so only the current
version of the CDS is available.


6. A record of completion will be entered into the data entry logbook.
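
Steps 2 through 5 above can be sketched in outline as follows. The sketch uses CSV files as stand-ins for Paradox tables and ordinary file copies in place of the Tools|Utilities|Add command; the directory layout, file names, and the choice to move the old version to an archive folder are assumptions, not part of this SOP.

```python
import csv
import shutil
import tempfile
from pathlib import Path

def write_table(path, rows):
    """Write rows (header first) to a CSV file standing in for a Paradox table."""
    with open(path, "w", newline="") as f:
        csv.writer(f).writerows(rows)

def read_table(path):
    """Read all rows, including the header, from a CSV stand-in table."""
    with open(path, newline="") as f:
        return list(csv.reader(f))

def add_unit_to_cds(cds_dir, backup_dir, archive_dir, old_name, new_name, initial_path):
    """Carry out Steps 2-5 for one unit of data and return the path of the new CDS table."""
    old_path, new_path = cds_dir / old_name, cds_dir / new_name
    shutil.copyfile(old_path, new_path)                  # Step 2: renamed copy of the last version
    new_rows = read_table(initial_path)[1:]              # skip the initial file's header row
    with open(new_path, "a", newline="") as f:
        csv.writer(f).writerows(new_rows)                # Step 3: add the initial table
    shutil.copyfile(new_path, backup_dir / new_name)     # Step 4: copy to the back-up location
    shutil.move(str(old_path), str(archive_dir / old_name))  # Step 5: leave only the current version
    return new_path

# Demonstration in a temporary directory with hypothetical table names and values.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    cds_dir, backup_dir, archive_dir = root / "cds", root / "backup", root / "archive"
    for d in (cds_dir, backup_dir, archive_dir):
        d.mkdir()
    write_table(cds_dir / "logt1.csv", [["id", "volume"], ["P001", "12.5"]])
    write_table(root / "itlog1.csv", [["id", "volume"], ["P002", "11.0"]])
    new_path = add_unit_to_cds(cds_dir, backup_dir, archive_dir,
                               "logt1.csv", "logt2.csv", root / "itlog1.csv")
    cds_rows = read_table(new_path)
    cds_files = sorted(p.name for p in cds_dir.iterdir())
    backup_files = sorted(p.name for p in backup_dir.iterdir())
```

Working on a copy means the previous version stays intact until the new one has been written and backed up.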

Correcting the CDS

See Quality Assurance Procedures for methods used to detect errors in the CDS. If a possible
error is observed by the Data Entry Assistant, they will notify the Data Entry Supervisor, who
will notify the Project Data Coordinator. The Project Data Coordinator will decide whether the
correction will be made and will notify the Principal Investigator of the decision. If the CDS
needs to be corrected, a renamed copy of the CDS will be made as described in the subsection
File Naming Conventions in the discussion above. For example, if the CDS name is techw4.db,
the renamed copy will be techw4a.db. The Data Entry Assistant will make the correction
directly to the renamed CDS table, and the nature of the correction and the reason for the error,
if known, will be noted in the file conqx.db or logc.db. The updated file and the comment files
will be copied and stored in the designated official database location. The old version of the
CDS will then be archived and removed from the CDS directory. See SOP D01 for storage
details.
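
A minimal sketch of this correction procedure is given below. Tables are represented as lists of dictionaries rather than Paradox tables, and the function names, field names, and sample values are hypothetical; only the techw4.db to techw4a.db naming example comes from this SOP.

```python
import copy
from pathlib import PurePath

def corrected_copy_name(table_name, marker="a"):
    """Name of the renamed copy that will receive the correction, e.g. techw4.db -> techw4a.db."""
    p = PurePath(table_name)
    return p.stem + marker + p.suffix

def apply_correction(rows, comments, record_id, field, new_value, reason="unknown"):
    """Return a corrected copy of rows and log the change in comments.

    rows stands in for the CDS table; comments stands in for conqx.db or logc.db.
    The original rows are left untouched so the old version can be archived.
    """
    corrected = copy.deepcopy(rows)
    for row in corrected:
        if row["id"] == record_id:
            comments.append({"id": record_id, "field": field,
                             "old": row[field], "new": new_value, "reason": reason})
            row[field] = new_value
            return corrected
    raise KeyError(f"no record with id {record_id!r}")

# Hypothetical table contents and correction.
old_rows = [{"id": "P001", "volume": "125"}, {"id": "P002", "volume": "11.0"}]
comment_log = []
new_name = corrected_copy_name("techw4.db")
new_rows = apply_correction(old_rows, comment_log, "P001", "volume", "12.5",
                            reason="decimal point omitted at entry")
```

Recording the old value alongside the new one keeps the comment file sufficient to reconstruct what was changed and why.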

Creating the ADS

Confidential questions will not be part of the ADS.

Information from the logsheets will indirectly be part of the ADS; see SOP D03 for details.

7. Training of Data Entry Assistants

7.1	General

Training and testing materials will be prepared and approved by the Project Data
Coordinator and the Data Entry Supervisor. The Data Entry Supervisor will ensure that all
Assistants have been trained and checked for the tasks that they will perform. Assistants may
be trained individually or in groups.

Most Data Entry Assistants will be college students working part time. Depending on the
needs of the project and the schedule of each Assistant, an individual Assistant will not
necessarily be trained in all skills; for example, an Assistant might be trained in coding and
reviewing coding, but not in data entry or verification.

7.2	Data Entry and Verification

Training materials will include:

computer with data entry software having appropriately labeled fields

disks for data and backups

confidential questionnaires and logsheets

data entry and verification instruction sheet

data files ready for data entry, and data files that have been entered but not verified

During training:

the Supervisor will demonstrate how to open a file, create a new file, find the appropriate
fields for data entry, enter data, save a file, and rename a file

the Assistant will read the instruction sheet, practice data entry, and submit the completed
file to the Supervisor

the Supervisor will check the file (using software to compare it with a correct file) and
discuss any errors with the Assistant

if the Assistant has made any errors, s/he will practice until s/he demonstrates 100%
accuracy

to learn verification, the Assistant will read the instruction sheet, practice verifying a file,
and submit the completed file to the Supervisor

the Supervisor will check the file (using software to compare it with a correct file) and
discuss any errors with the Assistant

if the Assistant has made any errors, s/he will practice until s/he demonstrates 100%
accuracy

when the Assistant has demonstrated 100% accuracy in data entry and verification, the
Supervisor will log the Assistant into a list of Assistants authorized to do data entry and
verification.
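
The software comparison against a correct file mentioned above could look something like the following sketch. The table layout (records keyed by a participant ID), the field names, and the values are hypothetical; this SOP does not specify which comparison software is used.

```python
def find_mismatches(entered, reference):
    """Compare an entered table with a reference table, both keyed by record ID.

    Returns (record_id, field, entered_value, expected_value) for every reference
    field that is missing from, or different in, the entered table.
    """
    mismatches = []
    for record_id, expected_row in reference.items():
        entered_row = entered.get(record_id, {})
        for field, expected in expected_row.items():
            actual = entered_row.get(field)
            if actual != expected:
                mismatches.append((record_id, field, actual, expected))
    return mismatches

def passes_training(entered, reference):
    """An Assistant passes only with no mismatches at all, i.e. 100% accuracy."""
    return not find_mismatches(entered, reference)

# Hypothetical practice file checked against the correct file.
reference = {
    "P001": {"volume": "12.5", "start": "08:00"},
    "P002": {"volume": "11.0", "start": "09:30"},
}
practice = {
    "P001": {"volume": "12.5", "start": "08:00"},
    "P002": {"volume": "1.10", "start": "09:30"},
}
errors = find_mismatches(practice, reference)
```

The list of mismatches gives the Supervisor the specific fields to discuss with the Assistant before the next practice round.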

All personnel will have appropriate training.

8. Quality Assurance Procedures

Quality assurance procedures used by the Data Entry Company are the following:

1. Double entry techniques to ensure the correct entry of the questionnaires.

2. Use of range limits on field inputs to reduce opportunity for incorrect entry.

The ranges used by the Data Entry Company will be specified by the Project Data Coordinator
and will be used in the CDS tables. If an imported value is outside the stated ranges, the
Paradox software will automatically notify the Data Entry Assistant, who will investigate the
nature of the error and take appropriate action. See SOP G06 on Problem Management.
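
In Paradox the range limits are enforced by the table's own field validity checks. The sketch below only illustrates the idea; the field names and limits are hypothetical, since the actual ranges are specified by the Project Data Coordinator.

```python
# Hypothetical range limits (inclusive low and high) for two logsheet fields.
RANGES = {
    "flow_rate": (0.0, 20.0),
    "duration_min": (0, 1440),
}

def out_of_range(record, ranges=RANGES):
    """Return (field, value) pairs for fields that are missing or outside their limits."""
    problems = []
    for field, (low, high) in ranges.items():
        value = record.get(field)
        if value is None or not (low <= value <= high):
            problems.append((field, value))
    return problems

# Hypothetical imported record with an implausible duration.
flagged = out_of_range({"flow_rate": 4.0, "duration_min": 1500})
```

A flagged value is a prompt for investigation, not an automatic rejection; the resolution follows SOP G06.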

To guarantee no complete records have been missed, the Data Entry Assistant will check the
number of records in the temporary data file, which contains the new data, and the number of
records in the old version of the official database to make sure the sum equals the number of
records in the new version of the database. Discrepancies will be checked by looking directly at
the initial data file sent by the Data Entry Company to see which records may have been lost or
duplicated. If no error is observed the Data Entry Assistant will check to make sure that all old
records were correctly incorporated into the new file.
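
The record count check above can be sketched as follows, together with a comparison of record IDs that shows why matching totals alone do not rule out an error. The function name and participant IDs are hypothetical.

```python
from collections import Counter

def reconcile(old_ids, initial_ids, new_ids):
    """Check that old + initial records account for every record in the new version.

    Returns (counts_match, lost, duplicated), where lost lists IDs expected but absent
    from the new version and duplicated lists IDs present more often than expected.
    """
    counts_match = len(old_ids) + len(initial_ids) == len(new_ids)
    expected = Counter(old_ids) + Counter(initial_ids)
    actual = Counter(new_ids)
    lost = sorted((expected - actual).elements())
    duplicated = sorted((actual - expected).elements())
    return counts_match, lost, duplicated

# Hypothetical case: one record lost and another duplicated, so the totals still agree.
old_ids = ["P001", "P002"]
initial_ids = ["P003", "P004"]
new_ids = ["P001", "P002", "P003", "P003"]
counts_match, lost, duplicated = reconcile(old_ids, initial_ids, new_ids)
```

Here the sum of records is correct even though P004 is missing and P003 appears twice, which is why the procedure also calls for confirming that the old records were correctly incorporated.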

Spot checks of the questionnaires will also be carried out. See SOP D05 for details.

At the conclusion of the study, a comparison of previous versions of the official database with
the final version of the official database will be undertaken. Any discrepancies will be checked.

9. References

Borland Paradox Relational Database V5.0 User's Guide and Online Help.

Harvard University/Johns Hopkins University Standard Operating Procedure:
D01 Data Flow Procedures, Rev 1.0

Harvard University/Johns Hopkins University Standard Operating Procedure:
D02 Questionnaire Data Entry and Preparation, Rev 1.0

Harvard University/Johns Hopkins University Standard Operating Procedure:
D03 Lab Results Data Entry and Preparation, Rev 1.0

Harvard University/Johns Hopkins University Standard Operating Procedure:
D05 Exploratory Data Analysis and Summary Statistics, Rev 1.0


Appendix A - List of Confidential Questionnaire and Logsheet Data files and
Description of Contents

Initial Data files

ipdat.db - Confidential baseline questions: D1, D5a,b, D10, T5, B44.
iheal.db - Confidential health questions: B21, F8.
imed.db - Confidential medicine questions: F6, F7.
inam.db - Confidential Question D6a, names of individuals living in the household.
ivis.db - Names of technicians and sampling notes (includes name and address).
itlog.db - Data from logsheet for transformation of lab results.
iclog.db - Comments from logsheet.

The CDS files

pdata.db - Confidential baseline questions: D1, D5a,b, D10, T5, B44.
healt.db - Confidential health questions: B21, F8.
medic.db - Confidential medicine questions: F6, F7.
names.db - Names of individuals living in the household: D6a.
visit.db - Visit information (includes name and address).
conqx.db - Comments on confidential questionnaire.
logt.db - Data from logsheet for transformation of lab results.
logc.db - Comments from logsheet or on logsheet data entry.

Appendix B - Set of Confidential Questions

Each questionnaire has an initial page which contains the name, address and phone number of the
participant. This information will not be added to the data set. The questions are:

D1: Address

D5a,b: Names of Individuals in household

D10: Telephone number

B21: Health information.

B44: Income levels.

F6: Medications: prescription

F7: Medications: non-prescription.

F8: Pregnancy

T5: Indicate nearest major intersection.

