Digital Curation

UNT Known for Making Information Accessible in the Digital Age

By Nancy Kolsti

During a visit to her small East Texas hometown, Cathy Hartman, associate dean for the University of North Texas Libraries, was surprised to learn from a fellow restaurant customer that he had seen the 1924 teaching contract of her aunt, Pearl Vinson. The document, preserved online on UNT’s Portal to Texas History, stated that Vinson would teach for seven months at Cross Roads Elementary in Cass County, receiving $85 per month.

From left, Mark Phillips, assistant dean of digital libraries, Cathy Hartman, associate dean of libraries, and Martin Halbert, dean of libraries, help make UNT a leader in digital curation. The library has the capacity to scan 100,000 pages of content a month, indexed to the word level.

Photo by: Michael Clements

The teaching contract is more than just history for Hartman’s family. It’s an example of the primary source materials from libraries, museums, archives and private donors that provide portal users with glimpses of past life in Texas.

“Unless you could travel all over the state, you wouldn’t be able to see all of these items, especially since some of the smaller libraries’ and museums’ collections aren’t widely publicized. Even if they had the content on their websites, chances are it wouldn’t show up in the first 100 pages of a search engine hit list,” Hartman says. “But the portal, which has more than 6 million visitors per year, is indexed by Google and all other search engines. Its content is generally at the top of the results page.”

Created by the UNT Libraries’ Digital Projects Unit in 2002, the portal is one example of how the digital age has changed the way people access information and the way information professionals make it accessible. During the past decade, UNT has received statewide, national and international recognition for its digital collections and leadership in digital curation — a term from the museum community, says Martin Halbert, dean of the UNT Libraries.

“Just as museums have objects that are displayed and cared for over the years, and the museums provide information about the objects to visitors, ‘digital curation’ refers to digital objects that are displayed,” Halbert says. “Our library has the capacity to scan 100,000 pages of content per month and index them down to the word level.”

Online Resources

The portal is the largest of UNT’s digital collections and one of the most used, with more than 500,000 visitors per month. It has received several recognitions, including the 2013 Wayne Williams Library Project of the Year Award from the Texas Library Association. It also has been recognized as one of the best online resources for education in the humanities by the National Endowment for the Humanities.

Mark Phillips, assistant dean for digital libraries, notes that while the portal is “outwardly focused,” the UNT Digital Library contains material from the university. Information available includes students’ theses and dissertations dating from the 1930s, scholarly and creative works by faculty members, and the Data Repository, a central archive for long-term access to faculty research datasets.

“Federal funding agencies now require those who receive grants to submit data management plans. With the repository, our faculty can allow the library staff to take care of it,” Phillips says.

The digital library also includes the CyberCemetery, which has archived inactive government websites, including those from past presidential administrations and defunct agencies, since 1997.

The CyberCemetery and the scanned items in the Government Documents collection led to the UNT Libraries being named one of 10 affiliated archives of the National Archives and Records Administration.

The libraries also were ranked among the top 30 institutional digital repositories in the world by the Cybermetrics Lab, a research group of the Spanish National Research Council.

Mapping Texts

Andrew J. Torget, assistant professor of history, helped develop a better way than simple keyword searches to explore the content of the historical Texas newspapers on UNT’s award-winning Portal to Texas History.

Photo by: Michael Clements

Hartman says digital curation is more than just making material available online. It includes preserving the material over time and creating metadata to provide user-friendly information about the items. Faculty members conducting research related to digital scholarship and access also are finding the data provided by UNT’s digital collections invaluable.

As both a 19th-century historian and a researcher in digital scholarship, Andrew J. Torget wanted to develop a better way than simple keyword searches to explore the content of the more than one million pages of historical Texas newspapers available on the Portal to Texas History. The newspapers date to 1829.

In 2007, the UNT Libraries began digitizing the newspaper pages after receiving National Endowment for the Humanities funds through the NEH National Digital Newspaper Program. UNT was one of eight U.S. universities and the only Texas university to receive the initial NDNP funding, for its proposal “Lone Star Ink: Exploring Texas Through Historic Newspapers.” UNT has received more than $2 million from the program to digitize newspapers.

In 2010, Torget, an assistant professor of history, and computer scientist Rada Mihalcea, now at the University of Michigan, began working with faculty members at Stanford University’s Bill Lane Center for the American West to create Mapping Texts — two interfaces for the language content of more than 250,000 pages of the historical newspapers.

Torget says the project “is about solving a big data problem.”

“When you can explore hundreds of millions of words, a basic text search simply isn’t enough,” he says, noting that when he searched the newspaper pages for “cotton,” he received more than 71,000 results.

The researchers’ goal was to develop methods for finding and analyzing meaningful content within the massive collection. The first interface on the Mapping Texts website allows users to assess the amount of information available and its digital quality — the number of recognizable words compared to the total number scanned — by geographic area, newspaper and time period. The second interface allows users to assess language patterns, browsing the most common words, names and topics by geographic area, newspaper and time period.
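The quality measure described above, recognizable words compared to the total number scanned, can be illustrated with a short sketch. This is not the Mapping Texts implementation: the function names, the tiny word list and the sample page text below are all hypothetical, and a real system would check scanned text against a full dictionary.

```python
from collections import Counter
import re


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def recognition_ratio(text, vocabulary):
    """Share of scanned tokens found in a known-word list (0.0 if no tokens)."""
    tokens = tokenize(text)
    if not tokens:
        return 0.0
    recognized = sum(1 for token in tokens if token in vocabulary)
    return recognized / len(tokens)


def most_common_words(text, vocabulary, n=3):
    """Return the n most frequent recognized words with their counts."""
    recognized = [token for token in tokenize(text) if token in vocabulary]
    return Counter(recognized).most_common(n)


# Hypothetical word list and scanned page text, for illustration only.
vocabulary = {"cotton", "prices", "rose", "in", "galveston", "the"}
page = "Cotton prices rose in Galveston; cotton prcies roes agian."

print(f"Recognized share: {recognition_ratio(page, vocabulary):.2f}")
print("Most common word:", most_common_words(page, vocabulary, 1))
```

In this sample, six of the nine scanned tokens match the word list, so about two-thirds of the page counts as recognizable. Grouping the same two measures by place, newspaper and year would give the kind of breakdown the two interfaces present.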

Mapping Texts has been featured in the Journal of Digital Humanities and is being used by researchers of digital scholarship, Torget says. He adds that the research team hopes to expand the project to integrate the interfaces directly into the portal and include more newspaper pages.

Multilingual Access

Jiangping Chen, associate professor of library and information sciences, researches multilingual information access. She is using human translations of metadata records to train a machine translation system.

Photo by: Michael Clements

While Torget studies better ways to explore digital content, Jiangping Chen, associate professor of library and information sciences, works on a different issue: multilingual information access. During the past four years, Chen has received two National Leadership grants from the Institute of Museum and Library Services.

In the first project, she used digital records from the Portal to Texas History and the UNT catalog to evaluate machine translation technologies, such as Google Translate and Bing Translator, and combined machine translation results to develop the most effective metadata translation strategies. In her latest IMLS research, Chen will use a machine translation system developed by her team to translate a collection of digital records from the UNT and Library of Congress catalogs into simplified Chinese and Spanish, the two most widely used languages on the Internet after English.

Chen’s team will use translations of metadata records by native speakers of Chinese and Spanish to train the machine translation system. She notes that several U.S. digital libraries have provided multilingual information access to their collections, with all records manually translated.

“We assume that humans can do a better job than a machine translation system, but human translation is costly and slow. It also leads to inconsistency. Of course, machine translation is also not perfect,” she says, noting that the word “food” has 10 translations in simplified Chinese.

“A system learned from verified human translations could do a fairly good job and perform translation much faster,” she says.

Chen has been collaborating on the project with researchers from Carnegie Mellon University, Wuhan University and Shenzhen Library in China, and the Autonomous University of the State of Mexico.

History for All

As researchers continue to work on issues of digital access, UNT continues to receive attention for its digital curation initiatives. In 2012, the Oklahoma Historical Society Research Division received funding for UNT’s Digital Projects Unit to create the Gateway to Oklahoma History. The gateway is similar to the Portal to Texas History and contains more than 600,000 pages.

Hartman says the libraries receive requests for use of the portal’s items from researchers in a multitude of fields, and she is particularly proud of the Resources for Educators portion of the portal, which consists of more than 60 lesson plans being used by elementary and secondary school teachers and their students.

The plans incorporate photos, newspaper articles, memoirs, letters and maps from the portal. More than 8,000 teachers visit the site each month to download the free lessons and find new ideas for their classroom curricula.

“We’re bringing history to scholars of all ages,” Hartman says.

GUIDING THE FIELD

UNT’s national and international reputation as a leader in digital curation — including a top 30 ranking among the world’s institutional digital repositories — also makes it a leader in guiding the development of the field.

In 2011, the Aligning National Approaches to Digital Preservation conference at the National Library of Estonia in Tallinn developed out of a series of conversations among representatives of the UNT Libraries, the U.S. Library of Congress and others. That conference and a second conference in 2013 in Barcelona, Spain, fostered international collaborations in digital preservation.

In the area of research data management (how to preserve research data and make it publicly accessible over the long term), UNT Libraries Dean Martin Halbert leads a team of researchers on the DataRes Project. Funded by the Institute of Museum and Library Services’ Laura Bush 21st Century Librarian Program, the project is a baseline study investigating universities’ data management practices and the role of libraries in the process.

Data management plans detailing how research results will be shared and disseminated are required by funding agencies such as the National Science Foundation and National Institutes of Health. The two-year DataRes Project was cited in a 2013 report from the Council on Library and Information Resources.
