February started off with a new employee at OCLC Research’s San Mateo, Calif., office. Marc Bron, a PhD candidate at the University of Amsterdam, will spend the next three months focusing on archival data and incorporating his findings into ArchiveGrid, before returning home to the Netherlands to finish his information retrieval (computer science) program.
In his own words, here is Marc’s description of his project this spring:
“Archival collections provide us with a window into history, and Encoded Archival Descriptions (EADs) support the discovery of these collections by carefully describing each one. Each EAD, however, describes an individual collection in isolation from other collections. That two collections have something in common, e.g., the history of California, remains hidden. In the next three months we will investigate whether it is possible to make meaningful connections between EADs and thus the collections they describe. As a starting point we focus on entities and perform Named Entity Recognition (NER) on the EADs. By analyzing the co-occurrences of entities across EADs we will gain an understanding of what type of co-occurrence constitutes a meaningful connection. We plan to incorporate these findings into the ArchiveGrid system as a new discovery model and to evaluate the model through A/B testing.
“My personal interest lies in investigating the type of connections between entities that can be automatically detected in contextually sparse documents such as EADs. Answering this question would further our knowledge about the opportunities for using automatic methods to supplement the annotation of archival collections.”
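To make the idea of entity co-occurrence across EADs a little more concrete, here is a minimal Python sketch. It is not Marc's method or anything from ArchiveGrid. It assumes the NER step has already produced a list of entity strings for each EAD, and the identifiers (`ead_a`, `ead_b`, `ead_c`), the entity names, and both function names are hypothetical, chosen only for illustration.

```python
from collections import Counter, defaultdict
from itertools import combinations


def entity_cooccurrences(doc_entities):
    """Count, for each unordered entity pair, how many documents mention both."""
    counts = Counter()
    for entities in doc_entities.values():
        # De-duplicate and sort so each pair is counted once per document
        # and always appears in the same order.
        for pair in combinations(sorted(set(entities)), 2):
            counts[pair] += 1
    return counts


def shared_entities(doc_entities):
    """Map each pair of documents to the set of entities they have in common."""
    # Inverted index: entity -> documents that mention it.
    index = defaultdict(set)
    for doc_id, entities in doc_entities.items():
        for entity in set(entities):
            index[entity].add(doc_id)

    links = defaultdict(set)
    for entity, docs in index.items():
        for doc_pair in combinations(sorted(docs), 2):
            links[doc_pair].add(entity)
    return dict(links)


# Hypothetical NER output: one list of entity strings per EAD.
docs = {
    "ead_a": ["California", "John Muir", "Sierra Club"],
    "ead_b": ["California", "Sierra Club", "Yosemite"],
    "ead_c": ["Amsterdam", "Yosemite"],
}

cooc = entity_cooccurrences(docs)
links = shared_entities(docs)
```

With this toy data, `cooc[("California", "Sierra Club")]` is 2, and `links[("ead_a", "ead_b")]` contains both "California" and "Sierra Club", which is the kind of hidden overlap the project aims to surface. Raw counts like these do not say which overlaps are meaningful; that is the open question the project investigates.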
Marc has been thoroughly enjoying the weather and the beautiful environment in the San Francisco Bay Area, and he is planning some weekend trips to the national parks.