How do we filter MARC records out of WorldCat, anyway?

After touting the system improvements we made over the weekend to the way ArchiveGrid looks and adapts to smartphones and tablets, we realized we had forgotten to add one feature we had worked on in tandem with them: a “frequently asked questions” section on our About ArchiveGrid page. It’s there now, addressing almost every question contributors and potential data suppliers ask.

Except one, which this post attempts to explain.

The question: How do we filter the MARC records out of WorldCat?

As shown in the statistics on our updated About ArchiveGrid page, MARC records extracted from WorldCat make up the bulk of ArchiveGrid’s content: about 90%. But there isn’t a simple way to identify a MARC record that describes the types of materials held in archives, manuscript collections, and special collections.

We look at every one of the 280 million or more records in WorldCat, and exclude those that have any of these characteristics:

  • Have more than one library holding symbol attached
  • Have none of the following: the value b, d, f, p, r, or t in MARC Leader byte 6 (see the list below); the value “a” (language material) in Leader byte 6 together with the value “c” (collection) in Leader byte 7; or the value “a” (archival) in Leader byte 8
  • Have a value of any kind in MARC 260 subfield “a” or “b” (to filter out published works)
  • Have a MARC subject heading with a subfield “a” or “v” beginning with the word “Bibliography”
  • Have a MARC 502 field (Dissertation note)
  • Have the material type “book” or “serial” and any value in the MARC 008 or 006 “Nature of Contents” bytes (to eliminate theses, reference works, and other non-archival materials)
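
As a rough illustration only, and not the code ArchiveGrid actually runs, the exclusion rules above might be sketched in Python like this. The `Record` class, its field layout, and every helper name are hypothetical, and the sample records are made up. Two further assumptions are marked in the comments: "book" and "serial" are derived from Leader bytes 6 and 7, and "Nature of Contents" is read from 008 bytes 24-27 (006 bytes 7-10), with blanks and the "|" fill character treated as no value.

```python
from dataclasses import dataclass, field

# Leader byte 6 values that qualify a record on their own.
ARCHIVAL_TYPES = set("bdfprt")


@dataclass
class Record:
    """Hypothetical stand-in for a parsed MARC record (not a real library class)."""
    leader: str    # 24-character MARC leader
    holdings: int  # number of library holding symbols attached
    # (tag, data) pairs: data is a str for control fields such as 008,
    # or a list of (subfield code, value) pairs for data fields.
    fields: list = field(default_factory=list)


def make_leader(rec_type, bib_level, control=" "):
    """Build a dummy leader with bytes 6, 7, and 8 set; the rest is filler."""
    return "00000n" + rec_type + bib_level + control + "a2200000 a 4500"


def subfields(record, tag_prefix, codes):
    """Yield values of the given subfield codes from tags starting with tag_prefix."""
    for tag, data in record.fields:
        if tag.startswith(tag_prefix) and not isinstance(data, str):
            for code, value in data:
                if code in codes:
                    yield value


def looks_archival(leader):
    rec_type, bib_level, control = leader[6], leader[7], leader[8]
    return (rec_type in ARCHIVAL_TYPES
            or (rec_type == "a" and bib_level == "c")
            or control == "a")


def is_book_or_serial(leader):
    # Assumption: material type is derived from Leader bytes 6 and 7.
    rec_type, bib_level = leader[6], leader[7]
    book = rec_type in "at" and bib_level in "acdm"
    serial = rec_type == "a" and bib_level in "bis"
    return book or serial


def has_nature_of_contents(record):
    # Assumption: 008 bytes 24-27, or 006 bytes 7-10 for book/serial 006 fields.
    for tag, data in record.fields:
        if not isinstance(data, str):
            continue
        if tag == "008":
            codes = data[24:28]
        elif tag == "006" and data[:1] in ("a", "t", "s"):
            codes = data[7:11]
        else:
            continue
        if codes.strip(" |"):
            return True
    return False


def is_excluded(record):
    """Return True if any one of the exclusion rules applies."""
    if record.holdings > 1:
        return True
    if not looks_archival(record.leader):
        return True
    if any(v.strip() for v in subfields(record, "260", "ab")):
        return True
    if any(v.startswith("Bibliography") for v in subfields(record, "6", "av")):
        return True
    if any(tag == "502" for tag, _ in record.fields):
        return True
    if is_book_or_serial(record.leader) and has_nature_of_contents(record):
        return True
    return False


# Made-up sample records: a mixed-material collection and a thesis.
papers = Record(make_leader("p", "c", "a"), holdings=1,
                fields=[("245", [("a", "Smith family papers")])])
thesis = Record(make_leader("t", "m"), holdings=1,
                fields=[("502", [("a", "Thesis (M.A.)")])])
print(is_excluded(papers), is_excluded(thesis))  # False True
```

Because the rules are independent, the sketch returns as soon as one matches. A record survives only if it passes every test.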

This filter isn’t always successful. Especially for minimally cataloged materials, we sometimes see descriptions of unpublished manuscripts of various kinds filter through. But we continue to evaluate and improve the filter as best we can.

MARC Leader byte 6 values:

  • a Language material
  • b Archival and manuscripts control (obsolete value)
  • c Printed music
  • d Manuscript music
  • e Cartographic material
  • f Manuscript cartographic material
  • g Projected medium
  • i Non-musical sound recording
  • j Musical sound recording
  • k 2-dimensional non-projectable graphic
  • m Computer file
  • o Kit
  • p Mixed material
  • r 3-dimensional artifact or naturally occurring object
  • t Manuscript language material