What two changes we made to ArchiveGrid mean for users

When Morristown National Historic Park in New Jersey was dedicated on July 4, 1933 as the first national historic park in the United States, there was a parade. Its theme was Spirit of ’76, referring to the Revolutionary War when a cluster of three historic military posts, Jockey Hollow, the Ford Mansion, and Fort Nonsense, served as George Washington’s army headquarters.

Imagine you’re in Google looking for photographs of the parade. You click on a result, land on a page in ArchiveGrid, and discover this digitized black-and-white photograph from the dedication:

If you have visited ArchiveGrid before, you’ll know you’re looking at an archival collection description. You’ll be able to find out who holds the collection (this one’s at the Morristown and Morris Township Public Library) and how to access it.

Google Analytics shows that around 80 percent of ArchiveGrid visitors arrive this way, by clicking on a collection description or item they found in Google. These descriptions act as a kind of “home page” for most new visitors to our site, and this might be our best opportunity to show them what ArchiveGrid is all about and provide them maximum value.

With our new “More like this” feature, we’re hoping to do exactly that. Located in a box on the right-hand side of an individual record display, “More like this” uses connections made in ArchiveGrid’s Solr index to offer extra contextual information and links to related materials – without disrupting the flow for those who just want contact information and to learn more about access to the resource.

Success of “More like this” depends on how rich the collection description is and the extent to which related people and topics can be found in other descriptions.
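To make that dependence concrete, here is a minimal Python sketch of the general idea. It is not ArchiveGrid’s actual Solr configuration. It assumes each description has already been reduced to a set of people and topic terms (the `terms` field and the sample records are hypothetical) and ranks other descriptions by how many terms they share with the one being viewed.

```python
def more_like_this(target, others, limit=5):
    """Rank other descriptions by the number of people/topic terms shared with target."""
    scored = []
    for desc in others:
        shared = target["terms"] & desc["terms"]
        if shared:  # descriptions with nothing in common are never suggested
            scored.append((len(shared), desc["title"]))
    # Most shared terms first; ties are broken alphabetically by title
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [title for _, title in scored[:limit]]


# Hypothetical sample data, for illustration only
target = {"title": "Dedication parade photograph",
          "terms": {"Morristown (N.J.)", "Parades", "Washington, George"}}
others = [
    {"title": "Jockey Hollow maps", "terms": {"Morristown (N.J.)", "Maps"}},
    {"title": "Parade scrapbook", "terms": {"Morristown (N.J.)", "Parades"}},
    {"title": "Whaling logbooks", "terms": {"Whaling"}},
]
suggestions = more_like_this(target, others)
```

A sparse description with few or no terms yields few or no suggestions, which is why the richness of the description matters so much.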

Right away we noticed this feature seems to work well for items from digitized collections, such as in the example above. It provides a way to view other items in the collection without searching:

For other collections, it can suggest closely-related resources at other institutions:

Now imagine you’re doing a search in ArchiveGrid. Your search results used to be generated by a relevance ranking algorithm that mostly paid attention to keyword matches, weighed against the number of times they occur in a description and the description’s overall length.

We made some adjustments so now matches in certain metadata elements (title, author, scope and content) get emphasized over other fields in the keyword index. Behind the scenes, we’re grouping descriptions by their extent into small, medium, and large. This allows us to give greater weight to collection descriptions over sub-series and items.

As a result, we’re doing a better job of making “key collections” appear near the top of a related search result in ArchiveGrid.
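The two adjustments described above (boosting certain fields, and weighting by the size group of a description) can be sketched roughly as follows. This is an illustration in Python rather than the ranking code ArchiveGrid runs; the boost values, the word-count thresholds, and the field names are all assumptions chosen for the example.

```python
# Hypothetical boosts: matches in these fields count for more than other fields
FIELD_BOOST = {"title": 3.0, "author": 2.0, "scope_content": 1.5, "other": 1.0}
# Hypothetical size weights: larger descriptions (collections) outrank items
SIZE_BOOST = {"large": 1.5, "medium": 1.2, "small": 1.0}


def size_group(extent_words):
    """Bucket a description by its length (thresholds are illustrative)."""
    if extent_words >= 2000:
        return "large"
    if extent_words >= 200:
        return "medium"
    return "small"


def score(query_terms, description):
    """Sum boosted keyword matches per field, then apply the size weight."""
    total = 0.0
    for field, text in description["fields"].items():
        words = text.lower().split()
        hits = sum(words.count(term.lower()) for term in query_terms)
        total += hits * FIELD_BOOST.get(field, FIELD_BOOST["other"])
    return total * SIZE_BOOST[size_group(description["extent_words"])]


collection = {"fields": {"title": "Golden Gate Bridge photographs",
                         "other": "bridge"},
              "extent_words": 2500}
item = {"fields": {"other": "golden gate bridge photographs"},
        "extent_words": 50}
```

With these made-up numbers, a collection-level description that mentions the search term in its title scores well above a single item that mentions it only in a general field.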

2013: The ArchiveGrid Year in Review

Rock and Roll Hall of Fame Library and Archives

Starting 2013 with nearly 1.8 million descriptions in ArchiveGrid inspired us to declare that our first database update of the year “rocked,” a playful nod to the Rock and Roll Hall of Fame for being one of the new contributors included.

What the small but mighty ArchiveGrid ensemble would find out as the year progressed is that we were going to get on a roll.

In February, Marc Bron – a scholar from the University of Amsterdam – joined OCLC Research’s San Mateo, Calif., office for a three-month internship. Marc’s speciality is in information retrieval and visualization, and he led several interesting projects which headlined some of our year’s main events. Marc carried out a thorough analysis of tag usage in the approximately 130,000 EAD XML finding aids in ArchiveGrid. You can read some interesting findings from the perspective of “discovery,” published in the October Code4Lib Journal article by Marc and Merrilee.

Marc also did innovative work on training a program to examine EAD documents, detect the names of – and find relationships between – people, groups, places and events, and then find connections between related documents and collections. Marc and others in OCLC Research tested various Named Entity Recognition processes, and the ArchiveGrid team and some brave volunteers tried methods to annotate sample document sets.

TopicWeb

What followed was a crescendo to new and novel ways of thinking about archival collections, connections, collaboration and annotation.

During one of Marc’s first few weeks in our office, we ambled over to the Stanford University campus to see a presentation by Amy Jo Kim about collaborative games. Her presentation got us thinking about how we might employ similar techniques in ArchiveGrid, and engage with domain experts to help identify relationships between collections. We developed a game of our own to test this approach. Called “TopicWeb,” it used the ArchiveGrid index and a dash of gamification to help experts assemble and sort the relationships of collections for a topic. We were able to get some great real-time reactions from archivists at the Society of California Archivists conference in April, and also from a more formal focus group in June.

Marc Bron, Brian Tingle, and Bruce Washburn at SCA

In May, TopicWeb starred in a well-attended webinar we held to update the archival community on recent developments.

Staying tuned to organizing collection discovery around topics, we thought these connections might be staged in the ArchiveGrid user interface. Nine hand-crafted topic pages got their big break in July, when we implemented them on the ArchiveGrid homepage. And during the summer we also published the results of a survey we conducted in 2012, asking archive users about Social Media and Archives.

SAA ArchiveGrid booth

Our summer ArchiveGrid demonstration tour hit both ends of the Mississippi River: Merrilee in Minneapolis for the Rare Books and Manuscripts Section preconference in June, and Bruce, Merrilee, and Ellen in New Orleans for the SAA Annual meeting in August. We met a great many of our colleagues, some familiar and some new, at the ArchiveGrid exhibit booth, and took part in a range of meetings and panel discussions.

We wrapped up our summer by sharing a testbed of finding aids from ArchiveGrid with the Technical Subcommittee for Encoded Archival Description (the group reports to SAA’s Standards Committee) to help them test a program to automatically convert EAD 2002 format documents to the new EAD 3 format.

October brought the launch of the new ArchiveGrid user interface. Based on version 3 of the popular Bootstrap front-end framework, it’s a “Mobile First” redesign that aims to give us a strong foundation on which to extend ArchiveGrid’s features. We also tested some heatmaps to give us a better idea of what ArchiveGrid interface features are used the most, which helped us ascertain ways in which the update improved the user experience. Based on other analytics we’ve been tracking, the interface changes along with improved sitemaps for crawling by search engines have increased ArchiveGrid’s visibility and utility, with use continuing to track upwards since October.

ArchiveGrid visits through November 2013

And we ended the year with some promising work on “localizing” the view of ArchiveGrid, getting lots of good advice on that from colleagues at a couple of ArchiveGrid contributing institutions. It’s still an experiment and a work in progress, but we may have more to say about it soon.

But until then, there’s shopping to do for the holidays. Not ending the year without an encore, the team put out the call to our archivist colleagues (who have been very good this year) for advice on gift ideas, and Ellen assembled a fun and practical guide on the ArchiveGrid blog (to date our most popular blog post, by a mile).

Chrome overtakes Internet Explorer in ArchiveGrid

We go back a ways with web browsers. My first browser and still a nostalgic favorite was the alpha release of NCSA’s Mosaic browser 20 years ago, in 1993. Similar to the paint color options for Henry Ford’s Model T, you could have any background color you wanted, as long as it was gray. But even in those early days, there were browser skirmishes; sorry, fans of Cello.

Over time the browser skirmishes grew into wars, in which at various times we’d find ourselves to be on the front-lines (“I’ve got a JavaScript work-around for that”), or quickly shifting our partisan allegiances (“so long Netscape, and hello Firefox”), or hunkering down until the end of the siege (“can we stop supporting Internet Explorer 6 yet?”).

Now that OCLC Research’s ArchiveGrid system has been running for a while, we can take a look at the browsers that are being used to reach it, and how that’s changing. A year and a half ago, Internet Explorer was the dominant browser with Firefox a somewhat distant second. This month, Chrome is in the lead, nearly doubling its share of the ArchiveGrid market, with Internet Explorer in second but on a steady decline.

The new world order of browsers being used to visit ArchiveGrid matches other wider views of browser popularity, in sequence if not in volume: http://en.wikipedia.org/wiki/Usage_share_of_web_browsers#Summary_table.

Having four different browsers divided roughly evenly across the ArchiveGrid user base could pose some issues for support, as they all have their own special features and idiosyncrasies. We’re greatly assisted by the use of solid user interface and JavaScript frameworks (Bootstrap and jQuery), which help us avoid the worst of the cross-browser programming difficulties, leaving us with a nearly Swiss neutrality in the on-going browser wars.

You’re getting warmer: using click trail heat maps to evaluate ArchiveGrid

Click trail heat maps are visual overlays on a website that can help identify whether important features are being seen and used by visitors.
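For readers curious about what sits behind such an overlay, here is a small Python sketch of the basic aggregation step: clicks recorded as pixel coordinates are counted per grid cell, and the busiest cells are drawn “hottest.” The cell size and the sample click positions are made up for illustration; the heat map tool we used does its own processing.

```python
from collections import Counter


def click_heat(clicks, cell=100):
    """Count clicks per grid cell; clicks are (x, y) pixel positions."""
    heat = Counter()
    for x, y in clicks:
        # Integer division maps a pixel position to its grid cell
        heat[(x // cell, y // cell)] += 1
    return heat


# Hypothetical clicks: three near a search box, one far down the page
clicks = [(120, 40), (130, 55), (180, 90), (640, 900)]
heat = click_heat(clicks)
hottest = heat.most_common(1)[0]
```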

We recently updated ArchiveGrid with some user interface changes and wanted to see whether that update changed how the system is used.

In the previous system, the heat maps indicated that the only parts of the home page that were getting any significant use were the search box and the map with its list of archive locations. It didn’t surprise us that those were the most popular features, but we were surprised that other features appeared to be almost entirely ignored. Were they not being seen, or just not of interest or use?

The new interface didn’t make any significant changes to the content of the home page, but design and layout changes may have made some of these features a bit easier to recognize and use. In particular, we’re seeing a little more use of the topic browse feature.

The visualization of clicks and views can also help evaluate how this page appears and is used by visitors on mobile devices. In this view it appears that Search is the only thing that matters enough to “click” or “tap,” and mobile visitors tend not to scroll very far down the page.

We’re turning our attention to how the pages for individual archival collection descriptions are viewed and used, as those had more substantial changes and improvements (we hope!) in the most recent system update. We’ll report what we learn here on the ArchiveGrid blog.

A fresh look for ArchiveGrid means there are more places to call home

We recently updated the ArchiveGrid interface, applying the popular Bootstrap framework to make the system more responsive on smartphones, tablets, and other mobile devices, and to take advantage of Bootstrap’s built-in features for handling common layout and design problems.

While the basic features of ArchiveGrid did not change in this update, this is more than a fresh coat of paint.  We’ve long recognized that, for many visitors to ArchiveGrid, their first experience of the system is on a page other than the system’s official “home page.”  About 76% of visitors arrive in ArchiveGrid, typically from a link in a Google, Bing, or Yahoo search result, on a page describing a single collection.

In the previous ArchiveGrid design, this page was fairly static.  Other than a search box to continue searching and a link to contact information for the archive, there wasn’t really much you could do from here:

 

In the redesign, we’ve made more access points available, added a map to help with finding your way to the archive, and changed the layout to move more information “above the fold”:

 

http://beta.worldcat.org/archivegrid/collection/data/47230217

We’ll be watching ArchiveGrid analytics closely to see how these changes affect the use and utility of the system.  If we’re making incremental progress, we could see a reduction in the “bounce rate” (visitors who arrive and leave without doing anything else).

Site Maps can point the way to your finding aids

Last month Ellen posted a note about some of the ways in which we routinely harvest finding aids from ArchiveGrid contributors’ websites.

This month we’re working with our first ArchiveGrid contributor to make their finding aids available with the Site Map protocol.  In a way it’s surprising that this is our first opportunity to harvest finding aids this way.  The Site Map protocol has been around for years, is a widely used method of making website content visible to search engines, and is relatively easy to set up.  At any rate, we’re very pleased to have a Site Map to guide our way.

In our experience supporting ArchiveGrid, institutions that want to go beyond having us simply follow links on their website have sometimes expressed interest in OAI-PMH. In these cases a Site Map may prove to be a more effective mechanism for sharing finding aids.  Site Maps can help search engines see the documents you want them to see (Google withdrew support for OAI-PMH in 2008), may already be supported as part of content management systems or web server platforms, and are familiar to a wide array of harvesters.  For valuable insights on the role Site Maps and metadata play for institutional repositories in Google Scholar, we recommend the Library Hi Tech article Invisible institutional repositories: addressing the low indexing ratios of IRs in Google Scholar by Kenning Arlitsch and Patrick O’Brien.
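As a rough illustration of why Site Maps are easy for a harvester to work with, here is a short Python sketch that reads a Site Map document and lists the URLs it advertises, along with any last-modified dates. The namespace is the one defined by the Site Map protocol; the function name and the example.org URLs are hypothetical, and this is not the harvesting code we run for ArchiveGrid.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the Site Map protocol (sitemaps.org)
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def finding_aid_urls(sitemap_xml):
    """Return (url, lastmod) pairs for every <url> entry in a Site Map."""
    root = ET.fromstring(sitemap_xml)
    entries = []
    for url in root.findall("sm:url", SITEMAP_NS):
        loc = url.findtext("sm:loc", default="", namespaces=SITEMAP_NS).strip()
        lastmod = url.findtext("sm:lastmod", default=None, namespaces=SITEMAP_NS)
        if loc:  # skip entries that have no location
            entries.append((loc, lastmod))
    return entries


# Hypothetical Site Map listing two finding aids
sample = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>http://archives.example.org/findingaids/ms001.xml</loc>
    <lastmod>2013-09-30</lastmod>
  </url>
  <url>
    <loc>http://archives.example.org/findingaids/ms002.xml</loc>
  </url>
</urlset>"""
entries = finding_aid_urls(sample)
```

The optional last-modified date is what lets a harvester skip documents that have not changed since its previous visit.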

If you have Site Maps in place that we could use to harvest your finding aids, and of which we’re not yet aware, please let us know.

 

Hurricane names: Why there will never be another Donna. Or an Andrew. Or a Katrina.

Archivists and librarians spend much of their time sorting the names of people, groups and places. Authority control systems are an integral part of processing archival materials and manuscripts, and an important area of innovation, as we’re seeing with work around the EAC-CPF.

Hurricane names represent an interesting alternate approach. As described on the National Weather Service website, there was once a practice of naming tropical storms and hurricanes in the West Indies after a particular saint’s feast day. Given that hurricane season in the North Atlantic (generally from June through November) would encompass the same limited set of saints’ days, the same name could be attributed to more than one storm system.

The first hurricane named this way was the Hurricane of San Bartolomé in 1568; earlier storms were named years later by historians. Two major hurricanes named after San Felipe occurred on exactly the same day, but 52 years apart, in 1876 and 1928.

As Wayne Neely writes in The Great Hurricane of 1780, “This system for naming them was haphazard and not really a system at all.”

The Great Hurricane of 1780 is also known as Hurricane San Calixto II. It’s thought to be the deadliest Atlantic hurricane on record, responsible for, among other things, the sinking of 40 French ships involved in the American Revolutionary War, with 4,000 souls lost. You may then be left wondering whether it’s named after Pope Callisto II, or if it’s the second hurricane named after the Feast of Pope Saint Callisto I. We’re not sure.

The practice of naming hurricanes after women began in 1953 in the United States, and in 1978-1979, male names were added to the storm lists. These six-year storm name lists for Atlantic hurricanes are developed by the World Meteorological Organization (WMO), and each year’s list includes 21 names. If a given year has more than 21 named storms, the Greek alphabet is used. Then each list repeats every seventh year. So, these names can recur, perhaps with some of the same ambiguity as Hurricane San Felipe. Remember Tropical Storm Alberto from May, notable as the earliest-forming tropical storm in the Atlantic in nearly 10 years? It was also the name of a tropical storm that caused considerable damage in Florida and Georgia in 1994.
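The arithmetic of the rotation is simple enough to sketch in Python. Assuming both seasons fall within the six-list era, two years draw on the same list whenever they are a multiple of six years apart. The function name is ours, the sketch ignores retired names (which are swapped out of a list), and reading the “May” storm above as the 2012 season is our assumption.

```python
def same_name_list(year_a, year_b, cycle=6):
    """True if two seasons use the same rotating name list (ignoring retirements)."""
    return (year_a - year_b) % cycle == 0


# Alberto: the 1994 season and (by assumption) the 2012 season
alberto_recurs = same_name_list(1994, 2012)
```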

For certain calamitous storms, the name is retired. There are currently 76 names on the “retired” list, including the notorious Andrew, Donna, and Katrina. The list of names is controlled by the WMO, and given recent events, we suspect they will retire Sandy too at their next annual meeting.

For an inside view of what’s involved in search and rescue operations following a major hurricane, take a look at this transcript of a 2005 interview after Hurricane Katrina with Commander Meredith Austin, provided through ArchiveGrid by the U.S. Coast Guard Historian’s Office.

“You know in an average hurricane we’ll fly out to the impacted area and get RV’s if we have to because it’s not that you want to be pampered or anything but when it’s really hot and it’s really humid and you want people to work in really harsh conditions for 12 to 14 or 18 hours a day, you’ve got to have a place for them to recover or they’re going to be no good to you the next day. So to have them sleeping out in tents we have to worry about fire ants and your stuff getting wet. You can do that for a couple of days, anyone can, but we’re here for the long term. There are going to be Strike Team folks down in these areas for probably a year.”

How do we filter MARC records out of WorldCat, anyway?

The inspiration for this blog post came from the realization that, after touting system improvements we made over the weekend to the way ArchiveGrid looks and adapts to smartphones and tablets, we forgot to add one feature we had worked on in tandem with said improvements: A “frequently asked questions” section to our About ArchiveGrid page. It’s there now, addressing almost every question contributors and potential data suppliers ask.

Except one, which this post attempts to explain.

The question: How do we filter the MARC records out of WorldCat?

As shown in the statistics on our updated About ArchiveGrid page, MARC records extracted from WorldCat make up the bulk of ArchiveGrid’s content … about 90%. But there isn’t a simple way to identify a MARC record that describes the types of materials held in archives, manuscript collections, and special collections.

We look at every one of the 280 million or more records in WorldCat, and exclude those that have any of these characteristics:

  • Have more than one library holding symbol attached
  • Have none of the following: the value b, d, f, p, r, or t in MARC Leader byte 6 (see list below); the value “a” (language material) in Leader byte 6 together with the value “c” (collection) in Leader byte 7; or the value “a” (archival) in Leader byte 8
  • Have a value of any kind in MARC 260 subfield “a” or “b” (to filter out published works)
  • Have a MARC subject heading with a subfield “a” or “v” beginning with the word “Bibliography”
  • Have a MARC 502 field (Theses or dissertation note)
  • Have the material type “book” or “serial” and any value in the MARC 008 or 006 “Nature of Contents” bytes (to eliminate theses, reference works, and other non-archival materials)

This filter isn’t always successful.  Especially for minimally-cataloged materials, we sometimes see descriptions of unpublished manuscripts of various kinds filter through.  But we continue to evaluate and improve the filter as best we can.

MARC Leader byte 6 values:

  • a Language material
  • b Archival and manuscripts control Note: Value obsolete
  • c Printed music
  • d Manuscript music
  • e Cartographic material
  • f Manuscript cartographic material
  • g Projected medium
  • i Non-musical sound recording
  • j Musical sound recording
  • k 2-dimensional non-projectable graphic
  • m Computer file
  • o Kit
  • p Mixed material
  • r 3-dimensional artifact or naturally occurring object
  • t Manuscript language material
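The exclusion rules listed earlier in this post can be sketched in Python as follows. This is a simplified illustration, not our production filter: the dictionary layout of a record (`leader`, `holdings`, `fields`, and the pre-decoded `material_type` and `nature_of_contents` values) is an assumption made for the example, and a real implementation would read those values from the MARC record itself.

```python
ARCHIVAL_TYPES = set("bdfprt")  # Leader byte 6 values treated as archival


def looks_archival(leader):
    """Check Leader bytes 6, 7, and 8 for any of the archival indicators."""
    rec_type, bib_level, control = leader[6], leader[7], leader[8]
    return (rec_type in ARCHIVAL_TYPES
            or (rec_type == "a" and bib_level == "c")
            or control == "a")


def should_exclude(record):
    """Return True if the record matches any of the exclusion rules."""
    if len(record["holdings"]) > 1:          # held by more than one library
        return True
    if not looks_archival(record["leader"]):
        return True
    for tag, subfields in record["fields"]:
        if tag == "260" and (subfields.get("a") or subfields.get("b")):
            return True                      # publication data present
        if tag == "502":                     # thesis or dissertation note
            return True
        if tag.startswith("6"):              # subject headings (6XX)
            for code in ("a", "v"):
                if subfields.get(code, "").startswith("Bibliography"):
                    return True
    # Assumed pre-decoded values standing in for the 008/006 checks
    if (record.get("material_type") in ("book", "serial")
            and record.get("nature_of_contents", "").strip()):
        return True
    return False


# Hypothetical record: mixed material, collection level, one holding
papers = {"leader": "00000npc a2200000   4500",
          "holdings": ["XYZ"],
          "fields": [("245", {"a": "Family papers"})]}
```

Each rule is checked independently, so a record is dropped as soon as any one of them applies.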

Preview the new ArchiveGrid Interface

Over the past few months we’ve been working on an update to the ArchiveGrid interface.

Along with some bug fixes and cosmetic changes, the new interface has two major new features:  A “result overview” display that summarizes important access points in a search result, and an “adaptive” (sometimes also called “responsive”) layout to improve how the system works on tablets and smartphones.

You can try a preview version of the new interface now.  We’re testing and updating it still, but expect to replace the current interface with the new one in the next month or two.

The Result Overview

Here’s an example of the Result Overview for a search.  The search began by looking for collections that match golden gate bridge photographs.  With 377 matching collections it would take a while to scroll through each brief record, but the Result Overview helps identify some key access points at a glance:

And as access points are selected, the search result is narrowed.  This can be a quick and effective way to reduce a large result to something that can be more easily checked for a deeper dive into the collection descriptions.  The Result List and Overview are different views of the same result: selecting their tabs makes it easy to switch from one to the other.
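For the technically curious, the idea behind the Result Overview resembles what search systems often call faceting: count how often each access point occurs across the matching records, show the most common values, and filter the result when one is selected. The Python sketch below illustrates that idea only; the field names and sample records are hypothetical and the real interface relies on the search index rather than code like this.

```python
from collections import Counter


def result_overview(results, access_points=("people", "places", "topics"), top=3):
    """Summarize the most frequent values of each access point in a result."""
    overview = {}
    for point in access_points:
        counts = Counter(value for rec in results for value in rec.get(point, []))
        overview[point] = counts.most_common(top)
    return overview


def narrow(results, point, value):
    """Keep only the records that carry the selected access point value."""
    return [rec for rec in results if value in rec.get(point, [])]


# Hypothetical search result with three brief records
results = [
    {"title": "A", "places": ["San Francisco (Calif.)"], "topics": ["Bridges"]},
    {"title": "B", "places": ["San Francisco (Calif.)"],
     "topics": ["Bridges", "Photographs"]},
    {"title": "C", "places": ["Marin County (Calif.)"], "topics": ["Photographs"]},
]
overview = result_overview(results)
narrowed = narrow(results, "places", "Marin County (Calif.)")
```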

The Adaptive Display

Though we aren’t yet seeing very much ArchiveGrid use by people with smartphones and tablet devices, we want to ensure that all users have an experience in ArchiveGrid that is best suited to the capabilities of their browser.  While still a work in progress, the new interface adapts its layout and features based on the device that has connected.  We’ve been testing this on a range of computers: an iPhone, a Nexus 7 tablet, a “One Laptop per Child” computer, a variety of notebook and desktop computers, and a large flat screen display connected to a Chromebox.

As with everything else about ArchiveGrid, we’d love to hear your comments and suggested improvements to this new version of the user interface.

Understanding Special Collections: A first look at the survey results

In April and May of 2012 we conducted a survey to update our understanding of how special collections research is carried out by faculty, graduate students, genealogists, and unaffiliated scholars. We’re currently analyzing the 695 survey responses, though one clear finding is the importance of librarians and archivists as a source for recommendations.  Over 80% of survey respondents identified librarians and archivists when answering the question “Is there a particular type of user whose comments, recommendations, etc. you find most valuable?”

The full set of survey questions and choices is listed below. It is also downloadable in two formats: instrument_PDF and instrument_DOCX.

Expect to hear more on the ArchiveGrid blog as our analysis of the survey responses continues.

  1. Have you used special collections materials?
    Special collections materials are defined as library and archival materials in any format, generally characterized by their value, physical format, uniqueness or rarity. For example: rare books, manuscripts, photographs, institutional archives including digital items.
  2. What kind of special collections materials did you use?
  3. What are the important attributes of these materials for you?
    Unique, Primary Source, Digital, Other
  4. In the last year or so, what have been the subjects of your research?
    Family History, Genealogy, History (unaffiliated/conducting personal research), History (conducting professional research), Academic Coursework, Instruction/Lesson Planning, Other
  5. What is the intended purpose of your research?
    For publication, for degree or coursework, for hire, for personal interest, other
  6. When using special collections what is your usual role?
    Faculty affiliated with a college or university, Post graduate/Graduate student, Undergraduate, Unaffiliated Scholar, Genealogist (professional), Genealogist (conducting personal research)
  7. Remembering your research in the past year or so, as you begin the research process where do you typically go for help in your initial investigations?
    Web search engines, Library catalogs/databases, Colleagues and friends, Email lists/discussion boards, Print materials, None of the above
  8. When you are in the middle of, or completing, your research, which resources are the most useful to you?
    Web search engines, Library catalogs/databases, Colleagues and friends, Email lists/discussion boards, Print materials, None of the above
  9. When you complete your research, do you need to make sure that all potential sources have been checked?
    Never, Sometimes, Always
  10. How do you discover new websites and other research resources?
    Colleagues and friends (via email, word of mouth, etc.), Professional/trade literature, Events and meetings, Email/posts from communities and groups (listservs, chat boards, etc.), Twitter, Facebook, None of the above
  11. When you want to share information about a new website or research resource, how do you usually identify it?
    Website name, Website URL, URL from a search engine, The resource’s institution name, The resource’s collection name, Finding aid or collection description, Library catalog reference, Other
  12. When you want to share information about a new website or other research resources what are your preferred ways to communicate?
    Word of mouth with colleagues and friends, Professional/trade literature, Email with colleagues and friends, Email/posts to communities and groups (listservs, chat boards, etc.), Twitter, Facebook, Other
  13. Which of these website features are valuable for your research?
    User comments, Tags, Reviews, Recommendations, Saving to a list, Connecting with others, None of these are relevant for me
  14. Comments, tags, reviews and recommendations can come from a variety of sources. Is there a particular type of user whose comments, recommendations, etc. you find most valuable?
    A scholar whose reputation I know, Faculty affiliated with any college or university, Faculty affiliated with a specific college or university, Library or archive staff, Undergraduate, Post graduate/graduate student, Genealogist (professional), Genealogist (conducting personal research), Colleagues and friends