University of Denver
Log Number: LG-86-18-0061-18
The University of Denver, in collaboration with Northeastern University, will perform a content-based study of text duplication and similarity in massive digital library collections such as the HathiTrust Digital Library. Content-based analysis of large digital libraries is an emerging research domain in the humanities, but its effectiveness is limited by duplicated and variant texts. The research will work to overcome the biases these texts introduce by developing tools to identify multiple levels of similarity. It will also produce a dataset of likenesses between books and authors to inform access and retrieval methods in libraries, making it easier for library catalog recommender systems to surface original works and authors.