ArchiveGrid is a collection of nearly two million archival material descriptions, including MARC records from WorldCat and finding aids harvested from the web.
It's supported by OCLC Research as the basis for our experimentation and testing in text mining, data analysis, and discovery system
applications and interfaces. Archival collections held by thousands of libraries, museums, historical societies, and archives are represented in ArchiveGrid.
ArchiveGrid provides access to detailed archival collection descriptions, making information available about historical documents, personal papers,
family histories, and other archival materials. It also provides contact information for the institutions where the collections are kept.
ArchiveGrid data is primarily focused on archival material descriptions for institutions in the United States. This reflects the contribution patterns for
descriptions of materials under archival control in WorldCat,
which make up the majority of descriptions in ArchiveGrid. We may extend ArchiveGrid beyond its current scope if it is necessary to support
OCLC Research experimental objectives.
ArchiveGrid illustrates OCLC's interest in advancing issues important to the archival community. Our work within ArchiveGrid gives OCLC Research
a foundation for collaboration and interactions with others in the archival community. We expect to share the results of MARC and EAD tag analysis,
provide discovery system analytics for contributors, document investigations of text mining and data visualization, participate in community
working groups pursuing improvements to description and discovery, and more. To support those interests and objectives, we'll continue to
build this extensive and current aggregation of archival material descriptions, within the
constraints of OCLC Research's committed and on-going support for this project.
OCLC offered ArchiveGrid as a subscription-based discovery service until 2012, when that subscription service was discontinued. While the new,
freely-available OCLC Research ArchiveGrid interface is not a full production service, it shares some of the same attributes. Researchers can
expect to use it for discovery of archival materials, and archives can work with OCLC Research to have their materials represented in the
aggregation in a reliable and persistent way.
If you have questions about your collection descriptions in ArchiveGrid, please get in touch with us.
Interested in contributing? Please let us know that as well.
How do we get our finding aids in ArchiveGrid?
Your role in getting your finding aids into ArchiveGrid is simply to give us permission to do so. We harvest your finding aids from a webpage you provide us, and load them into an index that end users can search. About once every six weeks, we update our index by removing all finding aid files and re-harvesting them, so any changes you make on your end, such as adding, editing, or removing finding aids, will be reflected in ArchiveGrid then. When you sign up to include your finding aids, we will let you know when our next index update is planned.
There is no time commitment from you, except to provide us with a single webpage of finding aids. If you are not able to provide us with a directory of finding aids, or a single webpage containing all of them for us to harvest, we may still be able to include them.
Can I contribute finding aids even if I don't belong to OCLC?
Yes, we can still harvest your finding aids. There is no cost to contribute your finding aids, and we accept them in EAD, HTML, PDF, and Word formats.
Where do you get your collection descriptions?
We index finding aids harvested directly from contributors, and we also pull MARC bibliographic records from WorldCat that we have identified as archival material.
How do you select WorldCat records for inclusion in ArchiveGrid?
There isn't a simple way to identify a MARC record that describes the types of materials held in archives, manuscript collections,
and special collections. We look at every one of the 280 million or more records in WorldCat, and exclude those that have any of these characteristics:
- Have more than one library holding symbol attached
- Have none of the following: the value b, d, f, p, r, or t in MARC Leader byte 6 (see list below); the value "a" (language material) in Leader byte 6 together with the value "c" (collection) in Leader byte 7; or the value "a" (archival) in Leader byte 8
- Have a value of any kind in MARC 260 subfield "a" or "b" (to filter out published works)
- Have a MARC subject heading with a subfield "a" or "v" beginning with the word "Bibliography"
- Have a MARC 502 field (Theses or dissertation note)
- Have the material type "book" or "serial" and any value in the MARC 008 or 006 "Nature of Contents" bytes (to eliminate theses, reference works, and other non-archival materials)
This filter isn't always successful. Especially for minimally-cataloged materials, we sometimes see descriptions of unpublished manuscripts of various kinds filter through. But we continue to evaluate and improve the filter as best we can.
MARC Leader byte 6 values:
- a Language material
- b Archival and manuscripts control (Note: value obsolete)
- c Printed music
- d Manuscript music
- e Cartographic material
- f Manuscript cartographic material
- g Projected medium
- i Non-musical sound recording
- j Musical sound recording
- k 2-dimensional non-projectable graphic
- m Computer file
- o Kit
- p Mixed material
- r 3-dimensional artifact or naturally occurring object
- t Manuscript language material
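As a rough illustration, the exclusion rules listed above could be written as a single predicate. This is only a sketch under stated assumptions, not the selection code OCLC Research actually runs: the Record class, its field layout, the make_leader helper, the reading of "MARC subject heading" as any 6XX field, and the pre-extracted material_type and nature_of_contents values are all hypothetical. The leader rule is read here as "exclude a record whose leader matches none of the three archival patterns".

```python
from dataclasses import dataclass, field

# Leader byte 6 codes the rules above treat as archival or manuscript material.
ARCHIVAL_RECORD_TYPES = set("bdfprt")


def make_leader(record_type, bib_level, control=" "):
    """Hypothetical helper: build a 24-character leader with bytes 6, 7, 8 set."""
    return "00000n" + record_type + bib_level + control + "a2200000   4500"


@dataclass
class Record:
    """Hypothetical, simplified stand-in for a WorldCat MARC record."""
    leader: str
    holding_symbols: list
    # tag -> list of field occurrences, each a list of (subfield code, value) pairs
    fields: dict = field(default_factory=dict)
    material_type: str = ""        # assumed precomputed, e.g. "book" or "serial"
    nature_of_contents: str = ""   # assumed pre-extracted 008/006 bytes


def subfield_values(rec, tag_prefix, codes):
    """Yield values of the given subfield codes from tags starting with tag_prefix."""
    for tag, occurrences in rec.fields.items():
        if tag.startswith(tag_prefix):
            for subfields in occurrences:
                for code, value in subfields:
                    if code in codes:
                        yield value


def leader_looks_archival(leader):
    """True if the leader matches any of the three archival patterns."""
    record_type, bib_level, control = leader[6], leader[7], leader[8]
    return (
        record_type in ARCHIVAL_RECORD_TYPES
        or (record_type == "a" and bib_level == "c")
        or control == "a"
    )


def is_excluded(rec):
    """Apply the exclusion rules; True means the record is left out."""
    if len(set(rec.holding_symbols)) > 1:
        return True  # held by more than one library
    if not leader_looks_archival(rec.leader):
        return True
    if any(subfield_values(rec, "260", {"a", "b"})):
        return True  # imprint data suggests a published work
    # Assumption: subject headings are the 6XX fields.
    if any(v.startswith("Bibliography")
           for v in subfield_values(rec, "6", {"a", "v"})):
        return True
    if rec.fields.get("502"):
        return True  # thesis or dissertation note present
    # Assumption: blanks and fill characters count as "no value".
    if (rec.material_type in {"book", "serial"}
            and rec.nature_of_contents.strip(" |")):
        return True
    return False


papers = Record(make_leader("p", "c"), ["ABC"],
                {"245": [[("a", "Smith family papers")]]})
book = Record(make_leader("a", "m"), ["ABC", "XYZ"],
              {"260": [[("a", "New York :"), ("b", "Example Press")]]})
print(is_excluded(papers))  # False: one holder, mixed material, no imprint
print(is_excluded(book))    # True: several holders and a 260 imprint
```

Writing the rules as one function that returns early keeps each exclusion independent, which makes it easier to adjust a single rule as the filter is evaluated and refined.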
Why does the same collection sometimes have two different entries in the ArchiveGrid index?
Duplicate entries result from harvesting finding aids and extracting WorldCat MARC records for the same collection. While we continue to work on a way to effectively cluster or de-duplicate these two forms of the collection description from the same contributor, we have hesitated to favor one over the other, as each includes access points not found in the other. We are trying to maximize access and discovery, so in this situation we've decided to favor recall over precision.
What about copyright?
OCLC does not claim copyright ownership of individual collection descriptions contributed to ArchiveGrid.
More information on rights and responsibilities and a legal statement are available on the web page where you can request to include your collections.
How many institutions contribute?
Collection descriptions from around 1,000 institutions - libraries, museums, historical societies, etc. - mostly in North America are included in ArchiveGrid. The number of institutions in Europe, Asia, Australia, and Central and South America is growing.
Why is ArchiveGrid a free service?
We transitioned from a subscription-based service to a free one last year, in order to make our index of more than 1.7 million archival collection descriptions freely available for everyone - family history researchers, students, working professionals ... you name it. ArchiveGrid remains an OCLC Research project, allowing us to improve archives and special collections research through studies of researchers, description, and discovery.
What do you know about your users?
When ArchiveGrid started in the late 1990s as a way to test whether EAD finding aids from different sources could work well together in one search system, we mostly had faculty, college students, and genealogists in mind as our researchers. We later gathered a lot of data about these user groups from studies, and we continued to make system design improvements based on their needs and behavior.
Earlier this year, we studied archives and special collections users again and learned that the future of primary source research still lies in faculty, students, and genealogists, but it must also accommodate people motivated by something in their personal and professional lives to conduct primary source research. This "fourth arm" of archives and special collections users includes filmmakers, writers, designers, hobbyists, and many more. We understand the need to continue to diversify our contributors and available resources, and improve our design for our researchers.
What are your site visit statistics like?
We use Google Analytics to track site visits and show selected current statistics here. Most visitors reach us from search engines, and others follow links from the lists of archival resources that institutions provide. Because we realize that most discovery happens on the web through search engines, we make sure resources in ArchiveGrid get noticed in Google search results.
Interpreting These Statistics
The growth of MARC records in the last year represents identification and inclusion of additional contributors of data to WorldCat, and the on-going
growth of WorldCat in general. While there has also been a concerted effort to increase the number of finding aid contributors, that form of
archival description is a relatively small percentage of the entire aggregation. And some MARC records may be removed from the current set, as we improve our
selection algorithms to identify archival collection descriptions.
[Chart: ArchiveGrid Weekly Visits, November 2011 - May 2013 (visits referred by search engines, visits via links from other websites, visits via direct links)]
[Chart: By State in the United States]
Interpreting These Statistics
ArchiveGrid contributing institutions are currently hard to count. In some cases, what might be considered as one institution may be represented in ArchiveGrid by several
different contributors, one for each archive or special collections department affiliated with the larger institution. But in other instances, one contributing institution
provides descriptions for a great many more individual archives (for example, the NUCMC records contributed by the Library of Congress). Until we can more accurately
identify and count these contributors, the current count of contributors will be much lower than the actual number of institutions represented.