Finding aids that we acquire directly from archives and special collections and load into the ArchiveGrid index come in many forms: EAD, HTML, PDF, and even some Word files. We send a program, called a crawler, to a website, which we call a crawler site, to collect, or harvest, these finding aid source files. However, not all crawler sites are created equal, and we often have to ask potential contributors who want their finding aids included in ArchiveGrid to make some modifications.
Here are ideas for what we seek in a crawler site:
Finding aids on a machine-readable web page. We prefer that the page contain only finding aids. It can be a directory, or just a page with source files. Here is an example. However, other crawler site options can also work: one page we crawl lives on Google Sites.
If there isn’t a machine-readable page, or the time or resources to make one, we can harvest finding aids from a front-end interface for users, such as a browse page. This works as long as: 1. Finding aid URLs are distinguishable from non-finding aid URLs by their word strings, and 2. An option exists to display all finding aids on one page. Our crawler picks up only the links we tell it to look for on the URL we give it; it doesn’t follow links to other pages where other finding aids may be.
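To illustrate why both conditions matter, here is a minimal sketch of a single-page harvest. It is not ArchiveGrid's actual crawler: the function name `finding_aid_links`, the `/findingaids/` word string, and the `example.org` URLs are all hypothetical, chosen only to show the idea of collecting links from one page by matching a string in each URL, without following any of them.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collect the href value of every <a> tag on a single page."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def finding_aid_links(page_html, page_url, marker):
    """Return absolute URLs of links on this one page that contain `marker`.

    Links are only listed, never fetched or followed, so finding aids
    that appear on other pages (for example "page 2") are not seen.
    """
    collector = LinkCollector()
    collector.feed(page_html)
    absolute = (urljoin(page_url, href) for href in collector.hrefs)
    return [url for url in absolute if marker in url]


# Hypothetical browse page: two finding aids, one site link, one pager link.
sample_html = """
<a href="/about">About</a>
<a href="/findingaids/ms001.xml">Smith family papers</a>
<a href="/findingaids/ms002.xml">Jones collection</a>
<a href="/browse?page=2">Next page</a>
"""

links = finding_aid_links(sample_html, "https://example.org/browse", "/findingaids/")
print(links)
```

In this sketch the "About" and "Next page" links are skipped because their URLs lack the word string, and anything listed only on the second browse page would be missed, which is why a view that displays all finding aids on one page is needed.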
Please write to us with any questions you may have. We have found ways to include finding aids from most contributors, so that no finding aid is left undiscoverable in ArchiveGrid.