Indiana University

Media Contacts

Ceci Jones Schrock
Media Relations Specialist
(812) 856-2337

New tools available to mine world's largest digital repository of books


April 22, 2013

BLOOMINGTON, Ind. and URBANA, Ill. -- This week the HathiTrust Research Center (HTRC) announced the availability of data mining and analytics tools for the HathiTrust Digital Library, a collection of digital texts from over 70 research libraries around the world. The new tools provide a much-needed entry point to large-scale analysis of HathiTrust's contents.

"All of us at Indiana University and the University of Illinois, who have been working toward this release for the last year, can be proud of enabling a first round of shared computation tools for the HathiTrust corpus," said Beth Plale, professor in the IU School of Informatics and Computing and co-director of the HTRC. "We are now ready to share this framework for analytical (non-consumptive) research."

Indiana University and the University of Illinois are the founding partners of the HTRC. The new infrastructure release follows an aggressive development path set forth by the HTRC Executive Management Team at the 2012 HTRC UnCamp, a gathering of HTRC developers, researchers and librarians. Users can now apply sophisticated computational research methodology across the large-scale collection, leveraging metadata crafted over time by libraries.

In phase two of the HTRC (September 2012-March 2013), the HTRC Technical Working Group created production versions of the beta services previewed at the 2012 UnCamp event. They are now working to open the resources to community testers who are part of the HTRC User Group Community. (For subscription details, see:

"This represents a major step forward in understanding how new knowledge can be derived from one of the largest digital library collections in the world," notes J. Stephen Downie, professor in the Graduate School of Library and Information Science at the University of Illinois and co-director of the HTRC.

The HTRC services stack, which provides the analytical entry point, is based on a completely new technical architecture. This framework leverages existing analytics tools such as SEASR, digital library software such as Blacklight, and a service-oriented architecture application interface. The current production phase includes an HTRC Sandbox that is open to scholars for evaluation of the HTRC services stack as part of their experiments.

"This is a significant step forward in making the HathiTrust digital collection a valued source for creating new scholarship," remarked Laine Farley, member of the HathiTrust Board of Governors and executive director at the California Digital Library.

About the HTRC and the HathiTrust

The HathiTrust Research Center (HTRC) is dedicated to providing computational access to the HathiTrust repository. HTRC's mission is to create a persistent and sustainable structure that enables original research and drives new discoveries on the text corpus of the HathiTrust repository. For more on the HTRC and its services stack, see

The HathiTrust Digital Library guarantees the long-term preservation of the materials it holds, while providing the expert curation and consistent access associated with research libraries. HathiTrust also enables members to provide enhanced discovery and access services to their communities. For more, see

About Indiana University

Grounded in the liberal arts and sciences, Indiana University is a major multi-campus public research institution with more than 130,000 students, faculty and staff. IU has a national reputation in the areas of information technology and advanced networking, and seeks opportunities to offer leadership in creative solutions for 21st century problems. For more, see

About the University of Illinois

The University of Illinois at Urbana-Champaign has long ranked among the nation's most distinguished teaching and research institutions. Its diverse, world-class programs reflect the mission of a land-grant university. The largest public university in Illinois, the U. of I. campus was chartered by the state in 1867 as the Illinois Industrial University and opened its doors to students in 1868. For more, see