Category Archives: Building ArchiveGrid
Don’t know what’s been up with ArchiveGrid since November 2011, when the ArchiveGrid team last gave a webinar about it?
This Thursday, from 3-4 p.m. EDT online via WebEx, is the chance to learn. All are invited to attend.
In ArchiveGrid and Related Work, the ArchiveGrid team – Bruce Washburn, Merrilee Proffitt, and Ellen Eckert – will:
- Review ArchiveGrid’s background, what it is, and its mission
- Discuss who uses ArchiveGrid and how
- Lead a brief tour of ArchiveGrid
- Describe how we bring collections in and how we promote ArchiveGrid
- Explain recent OCLC Research work done to better understand the ArchiveGrid aggregation, how information about special collections is used, and new methods for making connections between related archival collections
- Facilitate questions and answers from attendees
- Monitor Twitter #archivegrid usage
From the news announcement:
“This is the thirteenth webinar in the OCLC Research Technical Advances for Innovation in Cultural Heritage Institutions (TAI CHI) series, developed to highlight specific innovative applications, often locally developed, that libraries, museums and archives may find effective in their own environments, as well as to teach technical staff new technologies and skills.”
This index update differs from previous ones because it coincides with two new features we’re adding to ArchiveGrid and one event later this month that we’re preparing for now.
First: A widget (code is at the bottom of our about page) which Bruce Washburn created for anyone to embed on guide pages so users can start an ArchiveGrid search and get taken to search results in our system.
Second: Extent, or physical description data, in search result displays. These brief additions come from MARC record sources that include a 300 field subfield a, and from EAD sources that include an extent element value.
Third: A webinar from 3-4 p.m. EDT on Thursday, May 23. In this hour-long session we will review what ArchiveGrid is and how it works, share our latest developments, and answer questions. Please register here to attend. Advance registration is required.
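To make the extent feature above more concrete, here is a minimal sketch of how an extent statement could be pulled from each kind of source. It assumes the MARC records are serialized as MARCXML and that the EAD files use an `extent` element with or without a namespace. The function names and the sample records are hypothetical illustrations, not ArchiveGrid's actual code.

```python
import xml.etree.ElementTree as ET

# Namespace used by MARCXML ("MARC21 slim") records.
MARC_NS = "{http://www.loc.gov/MARC21/slim}"


def extent_from_marcxml(xml_text):
    """Return the text of every 300 $a subfield in a MARCXML record."""
    root = ET.fromstring(xml_text)
    extents = []
    for field in root.iter(MARC_NS + "datafield"):
        if field.get("tag") != "300":
            continue
        for sub in field.findall(MARC_NS + "subfield"):
            if sub.get("code") == "a" and sub.text:
                extents.append(sub.text.strip())
    return extents


def extent_from_ead(xml_text):
    """Return the text of every <extent> element, ignoring any namespace."""
    root = ET.fromstring(xml_text)
    extents = []
    for el in root.iter():
        # Drop an optional "{namespace}" prefix from the tag name.
        local_name = el.tag.rsplit("}", 1)[-1]
        if local_name == "extent":
            text = "".join(el.itertext()).strip()
            if text:
                extents.append(text)
    return extents


if __name__ == "__main__":
    # Hypothetical sample records, for illustration only.
    marc_sample = (
        '<record xmlns="http://www.loc.gov/MARC21/slim">'
        '<datafield tag="300" ind1=" " ind2=" ">'
        '<subfield code="a">12 linear feet</subfield>'
        "</datafield></record>"
    )
    ead_sample = (
        "<ead><archdesc><did><physdesc>"
        "<extent>3 boxes</extent>"
        "</physdesc></did></archdesc></ead>"
    )
    print(extent_from_marcxml(marc_sample))
    print(extent_from_ead(ead_sample))
```

Sources that lack a 300 $a subfield or an extent element simply yield an empty list, which matches the idea that the extent line only appears in results when the source record supplies one.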
Also we welcome five new finding aid contributors to ArchiveGrid:
University of California, Berkeley – Environmental Design Archives
Seton Hall University
University of California, Los Angeles – Ethnomusicology Archive
Santa Barbara Trust for Historic Preservation – Presidio Research Center
West Virginia University – West Virginia and Regional History Center
With the addition of data from these new contributors, our index count is now nearly two million finding aids. The number of visits to ArchiveGrid has also climbed and was most recently at its second-highest since we started keeping track toward the end of 2011. We are thankful for these achievements and we hope for the growth to continue!
Today is Queen’s Day in the Netherlands and its new king, Willem-Alexander, was welcomed. While today signified a new beginning for the country where our intern, Marc Bron, comes from, it also signified the end of Marc’s three-month internship at OCLC Research in San Mateo, Calif. Marc started today early by watching live coverage of his country’s coronation ceremonies and ended it by gathering with co-workers and friends in San Francisco. Marc returns to the Netherlands tomorrow to continue his doctoral studies at University of Amsterdam and complete a series of papers about his work at OCLC Research.
Among those who attended the farewell dinner for Marc was Bruce Washburn, who worked most closely with Marc on projects involving ArchiveGrid data and TopicWeb. TopicWeb, put simply, is a way to “play” with archival data in the context of a game, and we expect it will change the way archivists think about and interact with collections. It’s something that hasn’t been done before, and accomplishing it is a milestone in Marc’s early career in library and information science. It definitely won’t be his last.
Right now, however, is a time to talk about lasts. After a somber ArchiveGrid team meeting – the last one we would have with Marc – Bruce wrote about Marc’s time with us:
“Marc has been an outstanding addition to the small but mighty ArchiveGrid team … we miss him already. While here, Marc was instrumental in carrying out the first comprehensive tag analysis of the 130,000 or so EAD files that we’ve gathered together in ArchiveGrid, and is working with Merrilee to complete and publish a report of the results. He worked with our colleagues Jean Godby and Devon Smith in OCLC Research to test Named Entity Recognition tools with the EAD sources, with the results of that effort providing a way to better support faceted searching in ArchiveGrid.
“Marc led the team on a journey to find innovative ways to find related archival collection descriptions: the NER work was part of that effort, but it evolved into the development of an experimental collaborative game for searching ArchiveGrid and drawing connections between collections related to a topic. In a few short but very busy weeks we were able to assemble a system that we could demonstrate at the Society of California Archivists meeting in April, and we’re continuing to develop this idea into a system that we can share more widely. In all, an important and transformative period in ArchiveGrid’s history, primarily due to Marc’s intelligence, persistence, and deep and wide-ranging knowledge of information retrieval.”
One way to say “bye” in Dutch is “dag,” and we look forward to following Marc’s promising future.
(People pictured above: Merrilee Proffitt, Bruce Washburn)
Don’t let these frustratingly tiny mobile image uploads fool you. They tell the story of a big week we just wrapped up at OCLC Research. While eggs laid by Canada geese outside OCLC Research’s San Mateo, Calif., offices hatched, we geared up for the annual Society of California Archivists (SCA) conference held April 11-13 across the San Francisco Bay in Berkeley.
Big ideas and strategies about TopicWeb, a newly-hatched development of our own in connection with ArchiveGrid meant to improve how we understand key collections, filled an office whiteboard. But that scenario has already happened more than once before, so what’s more significant is what resulted from hours of intelligent and hard-working people perfecting TopicWeb for demonstration at SCA. This post cannot go further without crediting the natural teamwork between Bruce Washburn and Marc Bron to put together TopicWeb and promote it at SCA to interested archivists who visited the ArchiveGrid booth.
What is TopicWeb? Here is the short answer: TopicWeb is a game we developed and refer to endearingly as “Yelp for Archives,” yet it’s more. It’s designed to bring a team of experts on a particular topic together to evaluate EAD and MARC collection descriptions in the ArchiveGrid index for their relevance, or importance, to a topic that one person, the TopicWeb creator, chose. The team builds the TopicWeb when players search for and locate collections relevant to the topic, add them to the web, and leave comments explaining why. Then the team members review suggested collections and vote on whether they are relevant to the topic, again leaving comments explaining why. Each team has one week to complete its TopicWeb before it can get published, although we have not finalized that stage. Points accrue, players advance levels, and other incentives apply. It has real potential to capture the exchange of knowledge between archivists and collections in ways that are fun and that yield valuable research data.
Since TopicWeb is still in development mode, we welcome any feedback and any interest in helping us improve the game further.
One thing some archivists in New York City and at Yellowstone National Park have in common is that their organizations joined ArchiveGrid as new finding aid contributors. Trinity Wall Street archives and Carnegie Hall archives in Manhattan, Queens College in Queens, and Yellowstone Heritage and Research Center in Gardiner, Mont., are among the six new contributors included in our index update this week and we welcome them. Circus World Museum in Baraboo, Wisc., and New College of Florida in Sarasota are our other two new contributors.
Here are links to their collections:
Circus World Museum – Robert L. Parkinson Library and Research Center
New College of Florida – Jane Bancroft Cook Library
Trinity Wall Street – Archives
Carnegie Hall Archives
Yellowstone National Park – Yellowstone Heritage and Research Center
Queens College – Benjamin Rosenthal Library
Yellowstone’s presence in ArchiveGrid is also, we hope, a start to getting collection descriptions from more national parks archives represented in our system. We also hope the presence of collection descriptions from our other new contributors is a start to more small archives getting their finding aids online and available for machines to harvest.
Anyone who has camped at Yellowstone, driven through, or encountered wildlife there may appreciate a nearly 100-year-old photograph collection of sagebrushers (campers), who traveled by train to West Yellowstone, Mont., and entered the park in wagons because automobiles were not yet allowed inside. According to the finding aid, “Images of a black bear include several photographs depicting members of the party feeding the bear.”
However, this collection is also appropriate for anyone who likes historic photographs of views and everyday people. In addition to images of landscapes, geysers and thermal features, and buildings, “Images with people include one image of a Reverend Rice as well as views of hiking, camping, a picnic, and cooking outdoors. There are also views of the tents, a camp stove, the wagons, and a surrey.”
If Daylight Saving Time this weekend prompts people to start making summer travel plans, Yellowstone is a good destination choice. Just don’t feed any bears.
February started off with a new employee at OCLC Research’s San Mateo, Calif., office. Marc Bron, a PhD candidate at University of Amsterdam, will spend the next three months focusing on archival data and incorporating his findings into ArchiveGrid, before returning home to The Netherlands and finishing his information retrieval (computer science) program.
In his own words, here is a description of what Marc’s project this spring will be:
“Archival collections provide us with a window into history, and Encoded Archival Descriptions (EAD’s) support the discovery of these collections by carefully describing each collection. Each EAD, however, describes an individual collection in isolation of other collections. That both collections have something in common, i.e., the history of California, remains hidden. In the next three months we will investigate whether it is possible to make meaningful connections between EAD’s and thus the collections they describe. As a starting point we focus on entities and perform Named Entity Recognition (NER) on the EAD’s. By analyzing the co-occurrences of entities across EAD’s we will gain an understanding of what type of co-occurrence constitutes a meaningful connection. We plan to incorporate these findings into the ArchiveGrid system as a new discovery model and to evaluate the model through A/B testing.
“My personal interest lies in investigating the type of connections between entities that can be automatically detected in contextually sparse documents such as EAD’s. Answering this question would further our knowledge about the opportunities for using automatic methods to supplement the annotation of archival collections.”
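The idea of connecting EADs through co-occurring entities can be sketched in a few lines. The example below assumes a Named Entity Recognition step has already produced a list of entity strings for each EAD. The function name, the `min_shared` threshold, and the sample data are hypothetical, and this is only an illustration of the general approach, not the method Marc's project actually used.

```python
from itertools import combinations


def shared_entity_links(entities_by_doc, min_shared=1):
    """Link every pair of documents that share at least `min_shared` entities.

    `entities_by_doc` maps a document identifier to the entity strings
    that an (assumed) NER step found in that document.
    Returns a dict mapping (doc_a, doc_b) to the set of shared entities.
    """
    links = {}
    for doc_a, doc_b in combinations(sorted(entities_by_doc), 2):
        shared = set(entities_by_doc[doc_a]) & set(entities_by_doc[doc_b])
        if len(shared) >= min_shared:
            links[(doc_a, doc_b)] = shared
    return links


if __name__ == "__main__":
    # Hypothetical NER output for three EADs.
    sample = {
        "ead-001": ["California", "Gold Rush", "Sacramento"],
        "ead-002": ["California", "John Muir"],
        "ead-003": ["Ohio", "Cleveland"],
    }
    for pair, shared in shared_entity_links(sample).items():
        print(pair, sorted(shared))
```

A single shared entity such as a state name is a weak signal, which is why deciding what kind of co-occurrence counts as meaningful is the open question the project description raises. Raising `min_shared` or weighting rarer entities more heavily are two simple ways to experiment with that.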
Marc has been thoroughly enjoying the weather and the beautiful environment in the San Francisco Bay area and he is planning some weekend trips to the national parks.
Last month Ellen posted a note about some of the ways in which we routinely harvest finding aids from ArchiveGrid contributors’ websites.
This month we’re working with our first ArchiveGrid contributor to make their finding aids available with the Site Map protocol. In a way it’s surprising that this is our first opportunity to harvest finding aids this way. The Site Map protocol has been around for years, is a widely used method of making website content visible to search engines, and is relatively easy to set up. At any rate, we’re very pleased to have a Site Map to guide our way.
In our experience supporting ArchiveGrid, when an institution wants to use a protocol beyond simply letting us follow links on its website, it has in some cases expressed interest in OAI-PMH. In these cases a Site Map may prove to be a more effective mechanism for sharing finding aids. Site Maps can help search engines see the documents you want them to see (Google withdrew support for OAI-PMH in 2008), may already be supported as part of content management systems or web server platforms, and are familiar to a wide array of harvesters. For valuable insights on the role Site Maps and metadata play for institutional repositories in Google Scholar, we recommend the Library Hi Tech article Invisible institutional repositories: addressing the low indexing ratios of IRs in Google Scholar by Kenning Arlitsch and Patrick O’Brien.
If you have Site Maps in place that we could use to harvest your finding aids, and of which we’re not yet aware, please let us know.
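For readers curious what harvesting from a Site Map involves, here is a minimal sketch that reads a Site Map document and lists the URLs it advertises. It relies on the standard sitemaps.org XML namespace. The function name and the example.org URLs are hypothetical, and the sketch parses a string rather than fetching anything over the network.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def urls_from_sitemap(xml_text):
    """Return the <loc> value of every <url> entry in a Site Map."""
    root = ET.fromstring(xml_text)
    return [
        loc.text.strip()
        for loc in root.findall("sm:url/sm:loc", SITEMAP_NS)
        if loc.text
    ]


if __name__ == "__main__":
    # Hypothetical Site Map listing two finding aids.
    sample = (
        '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">'
        "<url><loc>http://example.org/findingaids/smith.xml</loc></url>"
        "<url><loc>http://example.org/findingaids/jones.xml</loc>"
        "<lastmod>2013-01-15</lastmod></url>"
        "</urlset>"
    )
    for url in urls_from_sitemap(sample):
        print(url)
```

Because each entry can also carry a `lastmod` date, a harvester could extend this to fetch only the finding aids that changed since its last visit.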
Striving to better serve the information needs of music fans and connect with scholars, the library and archives at the Rock and Roll Hall of Fame in Cleveland, Ohio, is one of ArchiveGrid’s new contributors included in our most recent index update this week. With nearly 300 finding aids describing collections related to the history of rock and roll and its role in society now discoverable in ArchiveGrid, this move advances the hall of fame’s mission to raise the museum’s visibility and recognition as a learning institution and tell the story of rock and roll through exhibits and programs.
One such program related to American Archives Month last October showcased archival materials related to local music and musicians from Ohio, many of whom led rock music genres: Marilyn Manson, Judas Priest, Devo, The Black Keys, and The Pretenders. Materials related to these musicians and more appear in ArchiveGrid, advancing our new year’s resolution we discussed this week to reach more types of researchers.
Our count of archival collection descriptions now nears 1.8 million.
We also welcome our other new contributors:
Bowdoin College – George J. Mitchell Department of Special Collections and Archives
Illinois Wesleyan University – Ames Library
Bowling Green State University – Browne Popular Culture Library
Finding aids we acquire directly from archives and special collections and load into the ArchiveGrid index come in many forms: EAD, HTML, PDF, and even some Word files. We send a program – a crawler – to a website – which we call a crawler site – and collect, or harvest, these finding aid source files. However, not all crawler sites are created equal, and we often have to ask potential contributors who want their finding aids included in ArchiveGrid to make some modifications.
Here are ideas for what we seek in a crawler site:
Finding aids on a machine-readable web page. We prefer that the page contain only finding aids. It can be a directory, or just a page with source files. Here is an example. However, there are other crawler site options that can work. One page we crawl lives on Google Sites.
If there isn’t a machine-readable page, or time or resources to make one, we can harvest finding aids from a front-end interface for users, such as a browse page. This works as long as: 1. Finding aid URLs are distinguishable from non-finding aid URLs by their word strings, and 2. An option exists to display all finding aids on one page. Our crawler only picks up the links we tell it to on the URL we give it, and it doesn’t follow links to other pages where other finding aids may be.
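The two conditions above can be illustrated with a short sketch: collect every link on one page, then keep only the links whose URL contains a distinguishing word string. The class and function names, the `marker` parameter, and the example.org URLs are hypothetical. This is an illustration of the idea, not the crawler ArchiveGrid runs, and it works on HTML that has already been downloaded.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collect the href value of every <a> tag on a page."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def finding_aid_links(page_url, html_text, marker):
    """Return absolute URLs from one page whose text contains `marker`.

    Links to other pages are never followed, so every finding aid
    must be listed on the single page that is passed in.
    """
    parser = LinkCollector()
    parser.feed(html_text)
    absolute = (urljoin(page_url, href) for href in parser.hrefs)
    return [url for url in absolute if marker in url]


if __name__ == "__main__":
    # Hypothetical browse page with two finding aids and one other link.
    page = (
        '<a href="/findingaids/smith.xml">Smith papers</a>'
        '<a href="/about.html">About us</a>'
        '<a href="/findingaids/jones.xml">Jones papers</a>'
    )
    for url in finding_aid_links("http://example.org/browse", page, "/findingaids/"):
        print(url)
```

If finding aid URLs and other URLs look alike, no single marker can separate them, which is why a distinguishing word string in the URL matters so much for this style of harvesting.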
Please write to us with any questions you may have. We have found solutions with most contributors for finding aid inclusion, so we leave no finding aid un-discoverable in ArchiveGrid.
This post is intended to introduce the people at OCLC Research who work on ArchiveGrid – our names and faces who get and send emails, attend meetings and conferences, give presentations and demonstrations, host webinars, post videos, participate in groups, write things, work behind the scenes, and do whatever else is necessary, big or small, to advance ArchiveGrid’s role in archives and special collections research.
To reveal more about ourselves in addition to our work at OCLC, we pulled from our own archives this past week photographs and memories of Halloweens past to share today, in addition to how we ended up in libraryland. Enjoy!
Other than a couple of summers in the mid-70s when I worked in a cannery, from age 18 on I’ve worked either in or for libraries. That path has taken me from shelving books, to cataloging, to providing technical support, to web design, and to software engineering in OCLC Research. My time and attention are divided across a range of systems and projects. Along with providing programming, design, strategic planning, and management support for ArchiveGrid, I support the WorldCat Search API and a number of data analysis projects being pursued in OCLC Research.
The photo suggests both my interest in technology and my habit of putting things together from what’s close at hand. In the case of my Halloween 1964 robot costume, I only needed a few cardboard boxes, a discarded TV antenna and radio parts, some wooden blocks, and silver spray paint.
We were big Kennedy fans at my house, so my Halloween costume a year earlier for 1963 was based on PT-109, the motor torpedo boat famously commanded by Lieutenant John F. Kennedy and sunk after a collision with a Japanese destroyer in the South Pacific, 1943. Again, I assembled it from what was lying around, including more cardboard and more silver spray paint. And a hat. No pictures survive of that effort. President Kennedy was assassinated the next month, and my mother was too heart-broken to see my boat costume stored away near the washing machine in our basement, so it was tossed. My brother likes to tell people that my 1964 costume was based on the Texas School Book Depository.
Halloween was always a HUGE night for kids back in the days when I was a trick-or-treater. We all felt a huge sense of kid community as we ran around the neighborhood (no parents in tow) shrieking and laughing. Now, when I see every year what a tiny handful of gremlins and munchkins come around in our perfectly safe, suburban neighborhood, I find myself hoping that they’re having a grand time somewhere else, as opposed to missing out on all the traditional fun of this goofy holiday.
I remember remarkably little about my costumes, except for the last one. I was 11, pushing the age envelope for trick-or-treating. Mom made me wear my older brother’s recycled Peter Pan costume, including the pants she had made from boys’ long underwear. Boo! No pictures available of me in that (thank god) or any other Halloween costume, so this one is of me in my baton twirler outfit, which suited my pre-pubescent self image far better.
As for my life in libraries and archives, it has been an absolute blast — 30 years and counting! I had no thought of specializing in special collections when I was in grad school at UCLA, but I was hooked forever when I fell into an opportunity to catalog original prints and photographs at the Library of Congress. After 25 years of great jobs in libraries, I’m now in a perpetual state of bliss as one of the lucky crew at OCLC Research. Among much else, being part of the ArchiveGrid gang is a great pleasure. Thanks to Bruce and Ellen, AG has grown incredibly in both content and functionality over the past couple years. We love helping to make the archival community happy and productive!
Here’s my Halloween photo, circa 1980. Despite being dressed as a baby, this is the year I was told I was too old for trick-or-treating and this was my last gasp at age 11 or 12. Pictured with me is my best friend from elementary and high school who has risen up the ranks from cowgirl to an illustrious career in newspaper journalism. Previous costumes were the Bride of Frankenstein (I think I used Desitin on my face since I didn’t have access to makeup) and Princess Leia from Star Wars. This year I’ll be dressed up as a speaker for the ARL Assessment Conference in Charlottesville, Va. (Editor’s note: Due to Hurricane Sandy and cancelled flights out of the Bay Area, this plan did not come to fruition.) I did make my daughter’s “lightning fairy” costume from scratch.
I first started working in a library as a “volunteen” in the early 1980s to help provide support for the summer reading program at the Chapman Branch of the Orange County Public Library – the fact that the library offered air conditioning and an unlimited supply of books was an added bonus. I took a break from the library world and worked as a cashier at Disneyland and as an in-home caregiver in high school and college, before coming back to the library as a student employee at the Regional Oral History Office at UC Berkeley’s Bancroft Library in 1988. I never looked back. My roots are planted firmly at the corner of digitization and special collections, and I love working on collaborative projects, which is why I made the move to work as a program officer at RLG in 2001. Since 2006, I’ve been working in OCLC Research on a range of projects, mostly in our “Mobilizing Unique Materials” strand, and am fortunate to be part of the small but mighty ArchiveGrid team.
My favorite Halloween costume involves swim goggles and my grandfather’s academic robe. I pin on a piece of red paper in the shape of an hourglass – the signature of a black widow spider. I use three sets of goggles for the spider’s multiple eyes. My grandfather’s robe has long sleeves that act as the third set of legs.
This photo is from the 1980s. I’m with my dad, who is dressed up as Zorro. We’re on our way to downtown Boulder for the annual rowdy stroll, a rival to the Castro’s former street party. I’m still not too old for trick-or-treat, especially in San Francisco.
I’m an accidental librarian and an accidental archivist. Right around the time of this photo, I got my first job in a library. I was broke, after a year as a ski bum. I feel lucky to have worked in all kinds of libraries all over North America: university, special, public, research, historical society, etc. Except for a brief stint as a cataloger, I’ve worked with researchers and been a researcher myself, especially in rare books and archives. Until I came to OCLC Research, my librarian regalia was another kind of costume: vintage gabardine blazers, 1940s shoes, and always – always – a string of pearls.
Cats were my favorite animal growing up and for several consecutive childhood Halloweens I wore some variation of a furry, full-body feline costume my mom sewed. These practical get-ups kept me warm on cold Oregon nights for trick-or-treating, and although it seems bizarre now, I enjoyed wearing them around the house. In this photo, I am playing the part as our cat Angus (named by my teenage sister at the time after AC/DC guitarist Angus Young) attacks my tail. I must have been believable, because his tail and back fur are fluffed in defense mode.
I’m somewhat new to the library profession, but I’m not new to the library world. As a kid my family took me to our local library at least once a week because I had a fierce curiosity about everything and read voraciously. My favorite memories in middle school were going to the Multnomah County Library Central Branch in downtown Portland with my older sister after school because the place was full of history and wonder and she could drive us there. In college I was captivated by the enormity of our library and everything it had to offer.
I joined OCLC Research as an intern in 2010 while working on my master’s in library and information management and have since moved up to full-time staff status. My work with ArchiveGrid has evolved from data clean-up tasks to helping prime our system for the future of archives and special collections research and it all happens because of the incredible work my colleagues at OCLC Research do. When I feel the old itch from my days as a newspaper reporter to write, I edit and create posts for the ArchiveGrid blog.