What two changes we made to ArchiveGrid mean for users

When Morristown National Historic Park in New Jersey was dedicated on July 4, 1933 as the first national historic park in the United States, there was a parade. Its theme was Spirit of ’76, referring to the Revolutionary War when a cluster of three historic military posts, Jockey Hollow, the Ford Mansion, and Fort Nonsense, served as George Washington’s army headquarters.

Imagine you’re in Google looking for photographs of the parade. You click on a result, land on a page in ArchiveGrid, and discover this digitized black-and-white photograph from the dedication:

If you have visited ArchiveGrid before, you’ll know you’re looking at an archival collection description. You’ll be able to find out who holds the collection (this one’s at the Morristown and Morris Township Public Library) and how to access it.

Google Analytics shows that around 80 percent of ArchiveGrid visitors arrive this way, by clicking on a collection description or item they found in Google. These descriptions act as a kind of “home page” for most new visitors to our site, and this might be our best opportunity to show them what ArchiveGrid is all about and provide them maximum value.

With our new “More like this” feature, we’re hoping to do exactly that. Located in a box on the right-hand side of an individual record display, “More like this” uses connections made in ArchiveGrid’s Solr index to offer extra contextual information and links to related materials – without disrupting the flow for those who just want contact information and to learn more about access to the resource.

Success of “More like this” depends on how rich the collection description is and the extent to which related people and topics can be found in other descriptions.

Right away we noticed this feature seems to work well for items from digitized collections, such as in the example above. It provides a way to view other items in the collection without searching:

For other collections, it can suggest closely-related resources at other institutions:

Now imagine you’re doing a search in ArchiveGrid. Your search results used to be generated by a relevance ranking algorithm that mostly paid attention to keyword matches, the number of times they occur in a description, and the description’s overall length.

We made some adjustments so now matches in certain metadata elements (title, author, scope and content) get emphasized over other fields in the keyword index. Behind the scenes, we’re grouping descriptions by their extent into small, medium, and large. This allows us to give greater weight to collection descriptions over sub-series and items.
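The two adjustments can be sketched in a few lines of plain Python. This is only an illustration, not ArchiveGrid's Solr configuration: the boost values, the extent thresholds, and the names `FIELD_BOOSTS`, `EXTENT_BOOSTS`, `Description`, `extent_group`, and `score` are all assumptions made up for the example.

```python
from dataclasses import dataclass

# Hypothetical boosts: the post names the emphasized fields, not their weights.
FIELD_BOOSTS = {"title": 3.0, "author": 2.0, "scope_content": 1.5, "keyword": 1.0}

# Hypothetical weights for the small / medium / large extent groups.
EXTENT_BOOSTS = {"small": 1.0, "medium": 1.2, "large": 1.5}


@dataclass
class Description:
    fields: dict   # field name -> text of that field
    extent: float  # size of the described material, in arbitrary units


def extent_group(extent, small_max=1.0, medium_max=10.0):
    """Bucket a description by extent (the thresholds are assumptions)."""
    if extent <= small_max:
        return "small"
    if extent <= medium_max:
        return "medium"
    return "large"


def score(desc, query):
    """Sum field-boosted term matches, then apply the extent-group weight."""
    terms = query.lower().split()
    total = 0.0
    for name, text in desc.fields.items():
        words = text.lower().split()
        hits = sum(words.count(term) for term in terms)
        total += FIELD_BOOSTS.get(name, 1.0) * hits
    return total * EXTENT_BOOSTS[extent_group(desc.extent)]
```

With these made-up numbers, a large collection that matches a query term in its title outranks a single item that matches the same term only in the general keyword index, which is the "key collections near the top" effect we were after.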

As a result, we’re doing a better job of making “key collections” appear near the top of a related search result in ArchiveGrid.

Collections from land down under now aboard ArchiveGrid

A century-old story of the British Antarctic expedition and the Australian photographer who documented it needs to be told before going into detail about our most recent ArchiveGrid index update.

One hundred years ago, against the backdrop of the early stages of World War One, the national heroes in Great Britain and Norway were men who had been on expeditions in Antarctica, the world’s final frontier at the time. Much as the United States and the Soviet Union raced to the moon in the 1960s, Britain wanted to beat Norway to the South Pole. Roald Amundsen took that honor for Norway in 1911, easily beating Britain’s Robert Falcon Scott with superiority in skiing and dog sledding.

As a sort of consolation prize, 40-year-old Sir Ernest Shackleton, who contracted scurvy while on Scott’s first polar mission 12 years before, was determined to re-capture England’s glory by being the first to cross Antarctica.

Shackleton made sure that members of his expedition had the best of everything. Outfitted in Burberry aboard the new Norwegian barquentine “Endurance,” the men, along with 69 sled dogs, left a South Atlantic whaling station on Dec. 5, 1914, bound for the Weddell Sea. Their plan after crossing the continent was to catch a relief ship, the Aurora, in the Ross Sea.

Strong tents, big and hearty dogs, a disease-preventing nutrition plan, and a ship built of Norwegian fir to withstand ice were supposed to help the Shackleton Expedition succeed. Instead, ice packs trapped the Endurance, crushed her, and marooned the crew. They didn’t touch land for 497 days and they turned to eating their dogs, penguins, and seals for survival. Their survival was a testament to Shackleton’s leadership.

Some funds the Irish-born Shackleton raised for the cross-Antarctic expedition came from news and film rights sales. That’s where Australian photographer Frank Hurley comes into the picture. He provided the images that would cement what went from an ambitious undertaking to one of the most compelling stories of human survival. Not only did they help pay off debt Shackleton owed after the expedition, they are still held in high regard for their beauty and storytelling power.

Photo-journalists today can learn from Hurley’s innovation and courage to push the limits for the best shots, get close to his subjects, and leave no angle unconsidered. Hurley’s preservation technique of soldering negatives in metal casing is how his work incurred little damage during the expedition.

Gone is discovery done the Shackleton way, marked by hunger, strain, animal deaths, and rugged perseverance. But now you can discover photographs from the Shackleton Expedition in ArchiveGrid. New Australian contributors and collections of Hurley’s work were brought aboard during our most recent index update. The National Library of Australia’s collection of digitized glass plate negatives of photos taken during the expedition is just part of what is newly included in ArchiveGrid.

We recently expanded the filter that selects WorldCat records for inclusion in ArchiveGrid to include more collections of images and both visual and audio recordings. After that expansion we needed to also broaden our horizons for institutions that we register for inclusion in ArchiveGrid. WorldCat contributors in Australia and New Zealand that weren’t previously registered as ArchiveGrid institutions have now been added, including:

We were excited to see how rich these collections from Down Under are, especially the materials related to the history of South Pole exploration. We are also excited that these collections are more “discoverable” for researchers. Isn’t excitement what drives people farther than they imagined? Who else on Dec. 5, 1914 felt more excitement than Sir Shackleton? Maybe, by seeing the uncharted future through Hurley’s lens, we can feel a bit of that excitement, too.

With the addition of these significant collection descriptions to ArchiveGrid, the Shackleton Expedition records are just the tip of the iceberg, in terms of collections you can explore. We invite you to share your discoveries with us!

With MARC filter-flexing, ArchiveGrid index exceeds 3 million records

While March came in like a lion weather-wise in many places, the saying meant good news index-wise for ArchiveGrid. This week we updated the index and once again, the number of collections and items represented in ArchiveGrid spiked. In January’s update, the index grew by around 600,000 records and reached 2.4 million. Now we’re over 3 million.

A snow-drizzled "Fortitude" guards one side of the New York Public Library entrance, while "Patience" (not pictured) guards the other. Image source: Flickr Creative Commons

Here’s why:

MARC records from WorldCat represent at least 90 percent of ArchiveGrid’s descriptions. Since no particular MARC record field tells us “Hey! Include me in ArchiveGrid,” we use a combination of elements. In the “recall vs. precision” performance metric, we’ve tended to err on the side of recall. Details on how we filter WorldCat records for inclusion in ArchiveGrid are here.

Twice this year, we tuned the filter we use to extract MARC records from WorldCat to include more record types based on the MARC Leader byte 6 value. January’s update brought in records with the value of “k” (two-dimensional, non-projecting graphics). This update includes records with the value of “g” (projected medium), “i” (nonmusical sound recording), or “j” (musical sound recording). We think these adjustments allow more descriptions of the types of materials ArchiveGrid searchers could expect to find, without overloading the index with records we’d prefer to filter out: irrelevant materials or published works, items held in multiple locations, etc. So we expect to continue adjusting the filter and see the total number of records change as we get more precise.
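For readers who haven't worked with MARC, here is a minimal sketch of the Leader check on its own. It covers only the record type codes mentioned in this post, not the full combination of elements our filter uses, and the names `ADDED_TYPES`, `SAMPLE_LEADER`, `record_type`, and `is_added_type` are hypothetical. The MARC Leader is a fixed 24-character string, and "byte 6" is the character at zero-based position 6.

```python
# Record type codes added in the two updates, with the labels used in this post.
ADDED_TYPES = {
    "k": "two-dimensional, non-projecting graphics",
    "g": "projected medium",
    "i": "nonmusical sound recording",
    "j": "musical sound recording",
}

# A made-up Leader for illustration; position 6 holds the record type ("k").
SAMPLE_LEADER = "00000nkm a2200000 a 4500"


def record_type(leader):
    """Return the type-of-record code at Leader position 6."""
    if len(leader) != 24:
        raise ValueError("a MARC Leader is always 24 characters long")
    return leader[6]


def is_added_type(leader):
    """True if the record type is one of the codes added this year."""
    return record_type(leader) in ADDED_TYPES
```

A record whose Leader byte 6 is "a" (language material) would not pass this particular check, though it may still qualify through the other elements our filter looks at.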

Here are some highlights of the valuable primary sources that the added “g,” “i,” and “j” records in ArchiveGrid have to offer:

  • Around 1,800 sound discs, tape reels, and cassettes of nearly all of Duke Ellington’s commercial and non-commercial recordings and also some radio broadcasts. The collection was assembled by Joseph Jeffers Dodge and acquired by Harvard University in March 1998.
  • A live 1963 recording in Germany of John F. Kennedy’s “Ich bin ein Berliner” speech, at Ball State University.
  • A May 2, 2003 VHS recording of the “Service of death and resurrection for Fred McFeely Rogers,” or Mister Rogers, at Pittsburgh Theological Seminary.

Another addition with this update is documentation for ArchiveGrid indexes. We use a number of “hidden” indexes in ArchiveGrid for testing and trouble-shooting, so in a new how to search page, we explain what these indexes are and how they can be used in a search. This should be considered a work in progress, so if you have suggestions for improvements or questions about how the indexes work, please let us know.

ArchiveGrid and NUCMC: What’s the relationship?

A scenario ArchiveGrid visitors have encountered before goes like this: An institution – in this example we’ll use Maine Historical Society – has records in ArchiveGrid describing their collections. This is a WorldCat MARC record view in ArchiveGrid for a collection at MHS of photographs taken more than 100 years ago:

Yet below the ArchiveGrid contributor location map on our homepage, MHS is not listed:


This is because MHS is represented in ArchiveGrid through the National Union Catalog of Manuscript Collections (NUCMC), a 55-year-old cooperative cataloging program operated by the Library of Congress. According to a recent blog post on “Off the Record” by Society of American Archivists President Danna C. Bell about NUCMC and its advantages for small repositories, “As of 2013, catalog records have been created describing approximately 130,000 collections in about 1,800 repositories.”

When Library of Congress catalogers create NUCMC records in WorldCat for their members, we in turn bring them into ArchiveGrid about every four to six weeks when we update the index. Right now WorldCat has 74,976 records with the NUCMC holding symbol attached.

NUCMC data in ArchiveGrid currently accounts for 50,113 records, or about 2.5 percent of the index, associated with hundreds of institutions including MHS. These records are freely available so users can search, learn what an institution holds, and contact the repository for help accessing materials. When a NUCMC contributor in ArchiveGrid doesn’t have its own listing in our contributor database, it won’t appear on our homepage as part of our discovery system.

Identifying all of our NUCMC institutions and making contributor records for each one is a worthwhile and feasible project, but, as with other organizations we work with, resources and time are the challenges. In the meantime, we will list any NUCMC member individually on the homepage if they ask us to.

New record types infuse ArchiveGrid in 2014

Vitamin K and potassium, symbolized by the chemical symbol K, improve the body’s nervous, circulatory, and muscular systems. So too, adding WorldCat MARC records coded with a record type of “k” (indicating the materials they describe are two-dimensional graphics) to ArchiveGrid has improved our “system” of finding aids and collection descriptions. With this addition, nearly 600,000 rich new records have been added to our index.

How was this done? By telling our filter to look at Leader byte 6 (indicating type of record) in WorldCat MARC records and select those with the value “k” (materials indicated by this value include prints, photographs, posters, etc.). More about how we filter WorldCat records into ArchiveGrid can be read on our about page.

For ArchiveGrid users, this means more records will show images the contributor has digitized, such as this one from the Denver Public Library:

Girl with cat two boys with dog woman watching them. Image courtesy of Denver Public Library Digital Collections.

It also means using descriptive keywords in a search, such as “photographs,” will retrieve more precise listings. Browsing can also be fun because of the unique types of materials described. Take, for example, design blueprints of Abraham Lincoln’s tomb. Or watercolors of Russian Orthodox churches in Alaska. Or rubbings from Buddhist cave temples.

This index update also includes finding aids from three new contributors: Wildlife Conservation Society in New York, Akron-Summit County Public Library in Ohio, and Haffenreffer Museum of Anthropology in Rhode Island.

A happy new year from the ArchiveGrid team to our contributors and users!

2013: The ArchiveGrid Year in Review

Rock and Roll Hall of Fame Library and Archives

Starting 2013 with nearly 1.8 million descriptions in ArchiveGrid inspired us to declare that our first database update of the year “rocked,” a playful nod to the Rock and Roll Hall of Fame for being one of the new contributors included.

What the small but mighty ArchiveGrid ensemble would find out as the year progressed is that we were going to get on a roll.

In February, Marc Bron – a scholar from the University of Amsterdam – joined OCLC Research’s San Mateo, Calif., office for a three-month internship. Marc’s speciality is in information retrieval and visualization, and he led several interesting projects which headlined some of our year’s main events. Marc carried out a thorough analysis of tag usage in the approximately 130,000 EAD XML finding aids in ArchiveGrid. You can read some interesting findings from the perspective of “discovery,” published in the October Code4Lib Journal article by Marc and Merrilee.

Marc also did innovative work to try and train a program to examine EAD documents to detect the names of – and find relationships between – people, groups, places and events, and to then find connections between related documents and collections. Marc and others in OCLC Research tested various Named Entity Recognition processes, and the ArchiveGrid team and some brave volunteers tried methods to annotate sample document sets.


What followed was a crescendo to new and novel ways of thinking about archival collections, connections, collaboration and annotation.

During one of Marc’s first few weeks in our office, we ambled over to the Stanford University campus to see a presentation by Amy Jo Kim about collaborative games. Her presentation got us thinking about how we might employ similar techniques in ArchiveGrid, and engage with domain experts to help identify relationships between collections. We developed a game of our own to test this approach. Called “TopicWeb,” it used the ArchiveGrid index and a dash of gamification to help experts assemble and sort the relationships of collections for a topic. We were able to get some great real-time reactions from archivists at the Society of California Archivists conference in April, and also from a more formal focus group in June.

Marc Bron, Brian Tingle, and Bruce Washburn at SCA

In May, TopicWeb starred in a well-attended webinar we held to update the archival community on recent developments.

Staying tuned to organizing collection discovery around topics, we thought these connections might be staged in the ArchiveGrid user interface. Nine hand-crafted topic pages got their big break in July, when we implemented them on the ArchiveGrid homepage. And during the summer we also published the results of a survey we conducted in 2012, asking archive users about Social Media and Archives.

SAA ArchiveGrid booth

Our summer ArchiveGrid demonstration tour hit both ends of the Mississippi River: Merrilee in Minneapolis for the Rare Books and Manuscripts Section preconference in June, and Bruce, Merrilee, and Ellen in New Orleans for the SAA Annual meeting in August. We met a great many of our colleagues, some familiar and some new, at the ArchiveGrid exhibit booth, and took part in a range of meetings and panel discussions.

We wrapped up our summer by sharing a testbed of finding aids from ArchiveGrid with the Technical Subcommittee for Encoded Archival Description (the group reports to SAA’s Standards Committee) to help them test a program to automatically convert EAD 2002 format documents to the new EAD 3 format.

October brought the launch of the new ArchiveGrid user interface. Based on version 3 of the popular Bootstrap front-end framework, it’s a “Mobile First” redesign that aims to give us a strong foundation on which to extend ArchiveGrid’s features. We also tested some heatmaps to give us a better idea of what ArchiveGrid interface features are used the most, which helped us ascertain ways in which the update improved the user experience. Based on other analytics we’ve been tracking, the interface changes along with improved sitemaps for crawling by search engines have increased ArchiveGrid’s visibility and utility, with use continuing to track upwards since October.

ArchiveGrid visits through November 2013

And we ended the year with some promising work on “localizing” the view of ArchiveGrid, getting lots of good advice on that from colleagues at a couple of ArchiveGrid contributing institutions. It’s still an experiment and a work in progress, but we may have more to say about it soon.

But until then, there’s shopping to do for the holidays. Not ending the year without an encore, the team put out the call to our archivist colleagues (who have been very good this year) for advice on gift ideas, and Ellen assembled a fun and practical guide on the ArchiveGrid blog (to date our most popular blog post, by a mile).

Index update brings in bountiful harvest of new finding aid contributors

By Unknown, Public domain, via Wikimedia Commons

We’ve just updated the ArchiveGrid index, and have benefited this time from the hard work of our colleagues Elizabeth and Todd at Arizona Archives Online, who prepared a sitemap to make the consortium’s finding aids available. In an earlier post we mentioned our affection for sitemaps – they are simple to crawl and follow a standard, widely-used protocol. We think archives will benefit in other ways by adopting this approach, as the sitemap can be used to make collections easier for search engines and others to find.
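To show why sitemaps are so easy to work with, here is a minimal sketch of reading one with Python's standard library. It is not our harvester: the function name `finding_aid_urls`, the `SAMPLE_SITEMAP` text, and the example.org URLs are placeholders. The only fixed piece is the namespace defined by the sitemap protocol.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemap protocol (sitemaps.org).
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}

# A tiny made-up sitemap; the example.org URLs are placeholders.
SAMPLE_SITEMAP = """
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>http://example.org/findingaids/collection-001.xml</loc></url>
  <url><loc>http://example.org/findingaids/collection-002.xml</loc></url>
</urlset>
"""


def finding_aid_urls(sitemap_xml):
    """Return the <loc> value of every <url> entry in a sitemap document."""
    root = ET.fromstring(sitemap_xml.strip())
    return [
        loc.text.strip()
        for loc in root.findall("sm:url/sm:loc", SITEMAP_NS)
        if loc.text
    ]


if __name__ == "__main__":
    for url in finding_aid_urls(SAMPLE_SITEMAP):
        print(url)
```

Each URL in the resulting list is a candidate finding aid to fetch, so a single central sitemap is enough to cover every contributor in a consortium.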

Five of our 11 new finding aid contributors are part of the AAO, thanks to access we gained to a central sitemap of its contributors’ finding aids for harvesting. They are:

Sharlot Hall Museum Archives and Library
Museum of Northern Arizona
Arizona State Museum Library and Archives
Arizona State Library, Archives & Public Records
Lowell Observatory Library

Our six other new finding aid contributors are:

Connecticut College – Charles E. Shain Library
Chicago State University – Douglas Library
Carroll University – Todd Wehr Memorial Library
Free Library of Philadelphia – Rare Book Department
Colorado State University Library
SUNY College at Plattsburgh – Special Collections

Welcome new contributors, and keep checking our blog for more news and updates about ArchiveGrid and other stories by the ArchiveGrid team.

Chrome overtakes Internet Explorer in ArchiveGrid

We go back a ways with web browsers. My first browser and still a nostalgic favorite was the alpha release of NCSA’s Mosaic browser 20 years ago, in 1993. Similar to the paint color options for Henry Ford’s Model T, you could have any background color you wanted, as long as it was gray. But even in those early days, there were browser skirmishes; sorry, fans of Cello.

Over time the browser skirmishes grew into wars, in which at various times we’d find ourselves to be on the front-lines (“I’ve got a JavaScript work-around for that”), or quickly shifting our partisan allegiances (“so long Netscape, and hello Firefox”), or hunkering down until the end of the siege (“can we stop supporting Internet Explorer 6 yet?”).

Now that OCLC Research’s ArchiveGrid system has been running for a while, we can take a look at the browsers that are being used to reach it, and how that’s changing. A year and a half ago, Internet Explorer was the dominant browser with Firefox a somewhat distant second. This month, Chrome is in the lead, nearly doubling its share of the ArchiveGrid market, with Internet Explorer in second but on a steady decline.

The new world order of browsers being used to visit ArchiveGrid matches other wider views of browser popularity, in sequence if not in volume: http://en.wikipedia.org/wiki/Usage_share_of_web_browsers#Summary_table.

With four different browsers divided roughly evenly across the ArchiveGrid user base, support could become an issue, as they all have their own special features and idiosyncrasies. We’re greatly assisted by the use of solid user interface and JavaScript frameworks (Bootstrap and jQuery), which help us avoid the worst of the cross-browser programming difficulties, leaving us with a nearly Swiss neutrality in the on-going browser wars.

You’re getting warmer: using click trail heat maps to evaluate ArchiveGrid

Click trail heat maps are visual overlays on a website that can help identify whether important features are being seen and used by visitors.

We recently updated ArchiveGrid with some user interface changes and wanted to see whether that update changed how the system is used.

In the previous system, the heat maps indicated that the only parts of the home page that were getting any significant use were the search box and the map with its list of archive locations. It didn’t surprise us that those were the most popular features, but we were surprised that other features appeared to be almost entirely ignored. Were they not being seen, or just not of interest or use?

The new interface didn’t make any significant changes to the content of the home page, but design and layout changes may have made some of these features a bit easier to recognize and use. In particular, we’re seeing a little more use of the topic browse feature.

The visualization of clicks and views can also help evaluate how this page appears and is used by visitors on mobile devices. In this view it appears that Search is the only thing that matters enough to “click” or “tap,” and mobile visitors tend not to scroll very far down the page.

We’re turning our attention to how the pages for individual archival collection descriptions are viewed and used, as those had more substantial changes and improvements (we hope!) in the most recent system update. We’ll report what we learn here on the ArchiveGrid blog.

October index update welcomes design changes and record growth

Happy American Archives Month. Although there is no connection between the observance each October of archives and the archival profession, and our most recent index update – which this blog post is about – both occurrences are meaningful. For us, two pieces of news distinguish our most recent index update from others: We launched a new design for ArchiveGrid, which Bruce wrote about, and our count of finding aids and collection descriptions passed the two million mark.

New contributors are:

State Historical Society of Missouri
African American Museum and Library at Oakland, Oakland Public Library
Bibliotheek Universiteit Leiden
California Judicial Center Library
Moravian Archives
Roger Williams University Library

You will also start seeing more content on this blog, as we are ironing out a more comprehensive and branding-oriented ArchiveGrid communications strategy. American Archives Month is a perfect time to make this happen.

Thank you for all your continued support of ArchiveGrid.