APPENDIX C-4
RECORD CONTENT ANALYSIS METHODOLOGY
Table of Contents
1.0. Introduction..............................................................................................................1
2.0. Method Overview....................................................................................................1
3.0. Objectives................................................................................................................1
4.0. Context Within The Evaluation Framework...............................................................2
5.0. Data Collection and Analysis....................................................................................2
5.1. Survey of GILS Universe..................................................................................2
5.2. Development of Analysis
Criteria......................................................................4
5.2.1.
Issues in Developing Record Content Aggregation Criteria.......................5
5.3. Content Analysis of Sampled
Records...............................................................6
6.0. Method Limitations and Recommendations to Future Researchers.............................8
7.0. Conclusion...............................................................................................................8
Table C4-1 Record Content Analysis Sample Population.................................................3
Table C4-2 Record Content Analysis Criteria..................................................................4
Table C4-3 Aggregation Semantics.................................................................................6
Table C4-4 Information Object Semantics.......................................................................7
1.0. INTRODUCTION
Moen and McClure, in The Government Information Locator Service (GILS): Expanding Research and Development on the ANSI/NISO Z39.50 Information Retrieval Standard: Final Report (1994, p. 30) noted "an important factor in the overall utility of the GILS will be the quality of the data in GILS records. Quality criteria will include accuracy, consistency, completeness, and currency. In order to encourage the creation of high quality information that will populate GILS servers, the development of written guidelines for creating GILS records is essential." This direction, The government information locator service: Guidelines for the Preparation of GILS Core Entries (National Archives and Record Administration, 1995a) is available electronically from the National Archives gopher at <gopher.nara.gov> under "Information for Archivists and Records Managers/GILS Guidance," or from <URL: http://www.nara.gov:70/1/managers/gils>. In addition, Federal information processing standards publication 192, Application Profile for the Government Information Locator Service (GILS) (National Institute for Standards and Technology, 1994) provides other quality-related direction such as preferred order of display for record elements as well as their definitions.
Content analysis of GILS records served three purposes: to assess records’ quality in terms of completeness and accuracy; to explore the relationship of selected characteristics of records and serviceability in networked information discovery and retrieval (NIDR); and to develop recommendations for future application or adaptation of the method.
More than 3500 instances of metadata were evaluated for incidence and/or content, and entered into a database for coding and analysis. In addition, the evaluators maintained a log of lessons learned and areas for further research (see Appendix E-2 Record Content Analysis Findings, Discussion, and Recommendations) that may be utilized by system developers, specification and procedures writers, and people with direct responsibility for GILS record quality.
2.0. METHOD OVERVIEW
The analysis comprised in two phases: Phase 1 involved examination of a pool of 83 records from 42 agencies’ GILS retrieved deliberately to represent a range of information resource types (e.g., databases, catalogs, records systems). These records served as the basis for developing and operationalizing a set of more than 50 qualitative and quantitative evaluative criteria that included records’ format, aggregation, media representation, and descriptiveness. Descriptiveness was defined as the incidence of utilization and content (value) attributes for all mandatory and selected optional elements and subelements as specified by FIPS Pub. 192 Annex E-GILS Core Elements and the NARA Guidelines. In Phase 2, these criteria were systematically applied to a set of 83 records randomly retrieved January 13 and 14, 1997, from 42 agencies’ GILS.
The following paragraphs present information concerning the record content analysis objectives, the context of the analysis within the overall evaluation framework, data collection and analysis, method limitations, lessons learned, and recommendations.
3.0. OBJECTIVES
This analysis attempted to describe the "quality" of GILS records in terms of character or attributes rather than strict conformance to specifications. The latter, which constitutes an audit, would require a greater level of operational detail than current policy and standards provide and is a technique better suited to a more mature information service. The following objectives guided the current examination of GILS records. Where adherence to published direction was relevant, FIPS Pub. 192 Annex E definitions, as reproduced and supplemented by usage guidelines and examples in the NARA Guidelines, served as the basis for evaluation:
1. To assess the accuracy of GILS records in terms of errors in format
and spelling
2. To gauge and compare the relative "completeness" or level
of description of GILS records
- Number of elements per record ("blank"
vs. populated)
- Utilization and values of both mandatory and
selected optional elements
3. To characterize a general profile of GILS product in terms of record
types, aggregation levels, and containers (dissemination media)
4. To evaluate records’ serviceability
- Factors affecting NIDR
- User convenience
- Aesthetics and readability
- Relevance judgment.
The quantitative and qualitative assessments, respectively, of the constitution and properties of sampled records provided data meeting these objectives.
4.0. CONTEXT WITHIN THE EVALUATION FRAMEWORK
As with the other methods comprising this user-oriented evaluation of GILS implementations, the record content analysis both was informed by and served to inform other data collection and instrument development activities in the study. Presentations and panel discussions at the 1996 GILS Conference and focus groups with various user communities highlighted recurring issues surrounding the content of GILS records, such as the level of resource aggregation, suitability of metadata elements, consistency, and quality of presentation. In turn, as discussed in Appendix E-2 Record Content Analysis Findings, Discussion, and Recommendations, the record content analysis proved invaluable in developing a user-assessment script that would both isolate GILS "quality" from that of the user interface or search engine and present realistic information retrieval encounters.
5.0. DATA COLLECTION AND ANALYSIS
Data collection and analysis were performed as described in the following paragraphs using the tool presented in Appendix D-4 Record Content Analysis Instrument as constructed in a Microsoft Accessã database and Microsoft Excelã spreadsheets. Two surveying activities were prerequisite to the analysis of record content: a determination of the GILS universe to optimize the breadth of the sample and a review of planned (i.e., per the NARA Guidelines) vs. actual record characteristics to inform development of analysis criteria.
5.1. Survey of GILS Universe
To provide the broadest possible base for record selection, the investigators first determined the universe of GILS implementations. This was accomplished through various means:
Results of this effort, completed on December 31, 1996, are shown in below in Table C4-1 Record Content Analysis Sample Population with two additional agencies identified for sampling in Phase 2 of the record content analysis.
Table C4-1
Record Content Analysis Sample Population
| Consumer Product Safety Commission |
| Department Of Agriculture |
| Department Of Commerce |
| Department Of Defense |
| Department Of Energy |
| Department Of Health And Human Services |
| Department Of Housing And Urban Development |
| Department Of Interior |
| Department Of Labor |
| Department Of State |
| Department Of Treasury |
| Environmental Protection Agency |
| Equal Employment Opportunity Commission |
| Farm Credit Administration |
| Federal Communications Commission |
| Federal Emergency Management Agency |
| Federal Energy Regulatory Commission |
| Federal Labor Relations Authority |
| Federal Maritime Commission |
| Federal Reserve Board |
| Federal Trade Commission |
| General Services Administration |
| Government Printing Office |
| International Trade Commission |
| Merit Systems Protection Board |
| National Aeronautics And Space Administration |
| National Archives And Records Administration |
| National Transportation Safety Board |
| Nuclear Regulatory Commission |
| Nuclear Waste Technical Review Board |
| Office Of Government Ethics |
| Office Of Management And Budget |
| Office Of Personnel Management |
| Overseas Private Investment Corporation |
| Pension Benefit Garanty Corporation |
| Railroad Retirement Board |
| Securities And Exchange Commission |
| Selective Service System |
| Small Business Administration |
| Social Security Administration |
| U.S. Commission On Civil Rights |
| U.S. Postal Service |
| Total=42 |
5.2. Development of Analysis Criteria
The second activity to prepare for a systematic analysis of GILS record content was the creation of criteria to satisfy the study objectives. This was accomplished by examining a set of two records retrieved from each identified agency GILS. These records—retrieved by use of search terms including "system," "database," "manual," the agency acronym, subject-oriented single words—were selected to represent a variety of file sizes, formats, and content types.
These records were studied and compared to produce the assessment categories shown in Table C4-2 Record Content Analysis Criteria. (Appendix D-4 Record Content Analysis Instrument presents a table of the database fields, possible values, and coding notes that was constructed to record data.)
Table C4-2
Record Content Analysis Criteria
Accuracy
Completeness
Profile
Serviceability
5.2.1. Issues in Developing Record Content Aggregation Criteria
The following definitions served as an initial starting point for operationalizing the phenomenon of aggregation:
AGGREGATION: the degree to which two or more separate parts have been brought together without changing their function or producing any result other than the sum of the operation of the parts.
GRANULATION: the degree to which two or more separate parts of a whole are distinguishable within that whole.
It became apparent during review of the Phase 1 sample that the above definitions are unsuitable for application to GILS records. For example, a record describing a publicly-accessible enterprise-wide AIS whose function is to track information output of four discrete, functionally dedicated, not publicly accessible micro-AISs could be labeled a "highly aggregated" record in that it "rolls up" other potential records. But, should the record include a description of each "grain" (microsystem) it embraces, one would be tempted to code it "low granularity" (subparts are distinguishable).
Another, more concrete, example of the problem of characterizing aggregation of information resources would be The Federal Register in digital (databased) or paper print format. This one record describes one "discrete" publication, but that publication aggregates myriad standalone information objects that, in print, are highly granular to the initiated user but in database form (digital format) are less distinguishable.
Another, more concrete, example of the problem of characterizing aggregation of information resources would be The Federal Register in digital (databased) or paper print format. This one record describes one "discrete" publication, but that publication aggregates myriad standalone information objects that, in print, are highly granular to the initiated user but in database form (digital format) are less distinguishable.
In short, the attribute of "aggregation" is discernible only to the degree that the GILS record presents an explicit enumeration of "granules" or aggregated parts—whether those parts are:
which some will argue is too granular, or they are:
which some will argue should be distinguishable.
Application of definitions of aggregation and granularity imply a knowledge of component-level and collective functionalities that the investigators, and, by proxy, a GILS user, lack and which may be gained only through examination of the object. In a physical library, users of a card catalog, subject bibliography, or other metadata-based tools are accustomed to retrieving and scanning resources’ object-peculiar "primary" metadata (e.g., tables of content, graphics, and back-of-the-book indexes) as required to determine whether "granules" might satisfy their information need; in GILS, where often information resources cannot be examined and thus their "operation" is unknown, the concept of simply "pointing" to an aggregated "locator" may be insufficient in that the aggregation "produces no result other than the sum of the operation of the parts."
Nonetheless, because record and resource aggregation was identified as a recurring theme during other data collection activities of the study, investigator’s adopted the operational definitions of aggregation coding scheme shown in Table C4-3 Aggregation Semantics to characterize the phenomenon. To supplement the limited value returned from assigning aggregation-level coding, investigators incorporated the criterion of "information object" as defined in Table C4-4 as well. Appendix E-2 Record Content Analysis Findings, Discussion, and Recommendations offers additional interpretation of the utility of these measures relative to aggregation and resource description.
Table C4-3
Aggregation Semantics
|
Code |
Operational Definition |
Examples |
| Record Aggregates Objects | GILS record, by virtue of its creation, collects discrete information resources that record content indicates would not have otherwise been collected or aggregated. Assigned in the absence of clues within the record that the represented objects were heretofore packaged as this collection to optimize information discovery and retrieval. |
|
| Aggregated Object Represented | GILS record represents an a priori or purposeful collection of information resources—e.g., woodpecker database or agency website. GILS record represents an object that collects, or comprises, two or more discrete information objects, and that represents a collection of standalone information files or products packaged together on the basis of a common theme or subject for functional convenience. |
|
| Discrete Object Represented | GILS record describes a standalone document-level entity that does not meet the criteria for "object aggregates metadata" below. |
|
| Object Aggregates Metadata | GILS record describes a pre-existing metadata collection, or "locator," as an information resource. |
|
5.3. Content Analysis of Sampled Records
As of early January 1997, 42 agencies’ GILS had been discovered by procedures identified in Section 5.2 Survey of the GILS Universe. The 83 sampled records, selected as described in the next paragraph, resided in three broad "host" categories: GPO (61% of the sample), record sources (34%), and FedWorld (5%). 93% of sampled records resided on a WAIS or Z39.50-compliant server, with the remaining on an HTTP server containing standalone HTML files of GILS records. (Note: since the time period of analysis, FedWorld and GPO have mounted record-source hosted GILS and those hosted by one another, and at least one HTTP-based GILS has migrated to WAIS).
The record content analysis per se first involved selection of GILS records from the known GILS universe (see Table C4-1 Record Content Analysis Sample Population) in one of two ways. For GILS featuring a search engine (i.e., residing on an information retrieval-based platform such as WAIS or Z39.50-compliant server or including a site-resident search engine), the investigator retrieved the first and last "hits" resulting from a "full-text" query of the agency acronym (using the default "number of records to return"). For GILS on which this was not possible (i.e., those mounted on a web server of HTML files that present only a picklist of record titles as if for known-item retrieval or browsing), the investigator retrieved the first and last items listed. In the event of multiple record formats per record, the HTML format was selected.
The resultant 83 records (one agency’s GILS featured only 1 record) were printed for ease of study and comparative reference. Their characteristics were assessed and recorded in a relational database for compilation and subsequently transferred to a spreadsheet for analysis using descriptive statistics. A subset of the total was created and subject to identical analysis by filtering the data for values of "US Federal GILS" or "U.S. Federal GILS" in the Controlled Vocabulary-Local Subject Index-Local Subject Term subelement—a state presumed to indicate record-creators’ intention of identifying the record as a "Core record" as delineated in the NARA Guidelines. No further operationalization of the "Federal Core" was achieved in this evaluation. The "Core subset" comprised 50% of the total sample.
Table C4-4
Information Object Semantics
|
Object |
Operational Definition |
Examples |
| Administrative Catalog | A locator listing of procedural actions related to the conduct of agency business | FERC’s "Directory Of External
Information Collection Requirements" PBGC’s "Log Of Benefit Termination Plans" USPS’s "Index Of Final Opinions And Orders" |
| Agency Homepage | Information mounted on an HTTP server | "Superintendent of Documents Home Page on the World Wide Web" |
| Bibliographic Database | An automated information system comprising metadata about bibliographic entities/publications | DOE’s "OpenNet" "HUD USER" |
| Form | A document designed to elicit and transmit specific information from the user to the supplier, respectively | "Request for Registration for
Political Risk Insurance "SSA-1710" |
| Job Line | A telephonic recording of employment opportunities | "DOI Employment Center" |
| Miscellaneous Documents In Ad Hoc Collection | Plurality of documents grouped by function or subject | bulletins and memoranda press releases public comments under-described "technical documents" and "reports" update notices letters speeches records |
| Organization | A set of human resources defined by an agency to provide specific products or services | information center/library research consortium NASA’s "Flight Dynamics Facility" |
| Program | A prescribed set of activities and functions performed to accomplish an objective | report management records management |
| Publication | Discrete monographic document published one-time or in serial mode to disseminate information | annual report user’s manual "The Federal Register" Regulations CD-ROM fact-sheet series procedures manual |
| Publications Catalog | A fixed, flat (non-machine-searchable) listing of selected or all agency publications | FEMA’s "Publications Catalog" |
| Subject Matter Database | Single, stand-alone automated information system comprising data, records, or multiple documents on technical or administrative subject(s) and/or definable reference themes | Privacy-Act records health risks aviation accidents red cockaded woodpecker |
| System Of Systems | Macro-AIS comprising or integrating multiple databases and/or single-AISs | DOD’s "Enterprise Information
System" EPA’s "Information Systems Inventory" |
6.0. METHOD LIMITATIONS AND RECOMMENDATIONS TO FUTURE RESEARCHERS
The primary limitation to the procedures described for analyzing GILS record content is generalizability—the extent to which results can be assumed valid for the entire population of GILS records. The sample was small, less than 2% of the estimated total of approximately 5,000, and the sampling technique was largely convenience-driven due to time constraints. In addition, the method as employed did not provide data concerning differences in record quality among or within agencies’ GILS, which might prove useful in estimating the scope of effort required in modifying elements or standardizing the characteristics of element values.
The record content analysis was extremely time-consuming, both in terms of defining mutually exclusive codes for content description and data collection. As noted above, even this small sample involved recognition of presence or absence of thousands of instances of metadata elements as well as examination and description of their values. Much of the labor burden of the current procedure could be alleviated by machine processing—e.g., for element counts, incidence of hypertext, etc. In addition, it is anticipated that the exploratory method described herein will be refined and adapted during subsequent applications, both for assessing the responsiveness of government-wide quality standards for GILS (vis a vis the NARA Guidelines) and, at the agency level, the quality of GILS record collections.
7.0. CONCLUSION
In summary, the method employed to analyze the content of GILS records
proved highly satisfactory in rendering the type of results that would
inform the overall evaluation. By providing a bird’s-eye view of the "product
on the shelf" at a given point in time, this method allows a comparison
of planned vs. actual outcomes for quality. Agencies’ continuous analysis
and reporting of record content will serve well in complementing evaluations
of the effectiveness of the NARA Guidelines, implementation maturity,
and user satisfaction.