Discovering Harry Potter

Computers in Libraries [March 2011]

Copyright (c) 2011 Information Today


In recent years, libraries have turned away from the traditional online catalog to embrace a new genre of public interfaces that go by names such as next-generation library catalogs, discovery interfaces, or discovery services. These new products aim to revitalize the stodgy online catalogs of the past to deliver to library patrons an experience of the collections and services of the library more in tune with the expectations set by the mainstream web. With increasing overlap and even competition to serve the information needs of library patrons by commercial destinations, it seems essential for libraries to offer the most compelling approaches possible for providing access to their valuable resources.

Extended Scope of Library Search

Library content offerings today extend far beyond the books on library shelves. Most libraries also offer vast collections of scholarly articles and digital materials such as photographs, documents, or manuscripts. For many libraries, the items managed by their integrated library systems and available through their associated online catalogs represent an ever-diminishing proportion of their overall collection. This broader view of library collections presses the new generation of discovery services toward a corresponding breadth of scope. Instead of presenting a menu of different specialized search tools, it's becoming more common for libraries to offer a single search box that addresses all the many different repositories that represent the library's collections.

New Challenges for Discovery Services

In this new paradigm of expanded search scope, it's essential for discovery interfaces to return results in such a way that library users can easily select items of interest. A more expansive view of search makes it essential to build tools and techniques into the interface that guide users to meaningful results.

Facets have become accepted among library interfaces as a standard approach to guide a user through an overly broad set of results to a manageable number of items. With broad keyword-based searching, most queries will return results from across many different areas. By pulling out terms from different metadata fields, an interface can present facets that allow the user to quickly narrow the results.
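The mechanics of facets can be sketched in a few lines. The following is a minimal illustration, not the implementation of any particular product: the record structure, the field names, and the sample records are assumptions made for the example.

```python
from collections import Counter

# Hypothetical result set; field names and records are illustrative only.
RECORDS = [
    {"title": "Harry Potter and the Deathly Hallows", "author": "Rowling, J. K.", "format": "Book"},
    {"title": "Harry Potter and the Deathly Hallows", "author": "Rowling, J. K.", "format": "Audiobook CD"},
    {"title": "Harry Potter: Film Wizardry", "author": "Sibley, Brian", "format": "Book"},
    {"title": "Harry Potter and the Goblet of Fire", "author": "Rowling, J. K.", "format": "DVD"},
]


def facet_counts(records, field):
    """Count how many records carry each value of one metadata field."""
    return Counter(r[field] for r in records if field in r)


def apply_facet(records, field, value):
    """Keep only the records whose field equals the selected facet value."""
    return [r for r in records if r.get(field) == value]


# Build the facet a user would see, then narrow the results by one selection.
author_facet = facet_counts(RECORDS, "author")
narrowed = apply_facet(RECORDS, "author", "Rowling, J. K.")
print(author_facet.most_common())
print(len(narrowed))
```

The counts shown beside each facet value come from the current result set, so they shrink as the user narrows the search.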

Relevant Results

One of the major expectations in any search interface today involves returning results according to relevancy. Each of the general web search engines, Google especially, does an incredible job of placing the best results at the top of the list. Informed by their experiences on the web, library users likewise expect the items that most closely match their interest to appear first.

Google has established itself as the master of relevancy. Its ability to anticipate the items of most interest to a searcher seems almost magical. While Google does not publicize the details of how it performs relevancy, it does document some of the factors it uses to calculate the page rank that governs the placement of each page in search results. The way words occur on a page determines whether that page is a candidate for results to a given query, but in a body of content where many thousands of pages will match by keyword, other factors come into play. Google relies heavily on other measures of importance and interest, such as the number of other authoritative sites that link to a page and the frequency with which that page has been selected in prior occurrences of that query. When searching large bodies of content and calculating relevancy, it seems that dynamic factors other than the placement of words need to be taken into consideration.
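To make the idea concrete, here is a rough sketch of how a keyword match might be combined with a popularity signal. It is emphatically not Google's method, whose details are not public; the scoring formula, the `popularity` field, and the sample documents are all assumptions chosen for illustration.

```python
import math

# Hypothetical documents; "popularity" stands in for signals such as
# inbound links or prior selections. All values are invented.
DOCS = [
    {"id": "a", "text": "harry potter fan page", "popularity": 3},
    {"id": "b", "text": "harry potter official site", "popularity": 5000},
    {"id": "c", "text": "pottery classes", "popularity": 90000},
]


def keyword_match(query, text):
    """Fraction of query words that occur in the text (a crude text match)."""
    words = query.lower().split()
    text_words = set(text.lower().split())
    if not words:
        return 0.0
    return sum(1 for w in words if w in text_words) / len(words)


def score(query, doc):
    """Combine the keyword match with a log-dampened popularity signal."""
    match = keyword_match(query, doc["text"])
    if match == 0:
        return 0.0  # no keyword match, so the document is not a candidate
    return match * (1.0 + math.log1p(doc["popularity"]))


def rank(query, docs):
    """Return the matching documents, highest score first."""
    scored = [(score(query, d), d) for d in docs]
    ordered = sorted(scored, key=lambda pair: pair[0], reverse=True)
    return [d for s, d in ordered if s > 0]


ranked_ids = [d["id"] for d in rank("harry potter", DOCS)]
print(ranked_ids)
```

Words decide which documents are candidates at all, while the popularity signal separates documents that match equally well. The very popular but non-matching document never enters the list.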

As library discovery platforms implement relevancy-based search, it's important to order items according to common-sense expectations. It may not be possible to anticipate exactly what any given user expects when typing a query, yet we'll save patrons a great deal of frustration to the extent that the relevancy ordering reflects some defensible measure of importance and interest.

Common-Sense Relevancy

One of the searches that I frequently try as I evaluate an online catalog or discovery service associated with a public library is "Harry Potter." This search should return many results, since it's one of the most popular series of books ever written. The typical library has dozens of copies of each of these titles in each of its branches. According to my common-sense benchmark, search results should place entries representing the books written by J. K. Rowling at the top of the list, with the latest and most popular given precedence. Books and articles written about these books might be of interest as well, but definitely at a lower level. Translations into other languages and alternate formats such as large print, books on cassette, books on CD, and DVDs of the movies should also appear, but not necessarily above the entries for the regular print book, which in this case seems to be the format most in demand given the investments made by libraries in purchasing so many copies. Popular topics such as these provide a good opportunity to put a search engine through its paces to see how it executes its relevancy algorithms and prioritizes different media types. How well do the various discovery platforms perform with this kind of common-sense relevancy? Let's take a look at some of the products and see how well they do.

Amazon.com yields excellent results to my "Harry Potter" search. A boxed set of all seven books in the series tops the list, followed by each of the best-sellers in the series. Given the diverse inventory of goods now available through Amazon.com, we also see other items such as toys (a Harry Potter light wand) and clothing (a scarf). Selecting the facet for "books" cleans up the results to each of the titles in the series, including some derivative materials such as The Unofficial Harry Potter Cookbook. Clicking on any of the cover images of the books presents a page that lets you choose from formats such as hardcover, paperback, audio, CD, and audiobook versions. Clicking on any of these format options opens up listings of individual items.

Libraries have worked hard to develop a metadata structure that might eventually help them improve the organization of their catalogs. The Functional Requirements for Bibliographic Records, or FRBR, makes it possible to organize materials in a hierarchical way, according to work, expression, manifestation, and item. Amazon organizes its interface using similar concepts of hierarchies and groupings, with better results than almost any library interface I've encountered. While Amazon may not make use of FRBR explicitly, it provides an excellent model for what FRBR aims to accomplish in library interfaces.
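A much-simplified sketch of this kind of grouping follows. It collapses the four FRBR levels into two (a work and its formats) and assumes that a normalized author and title are enough to identify a work, which real catalog data often will not satisfy. The records and field names are hypothetical.

```python
from collections import defaultdict

# Hypothetical catalog records, one per format of a title.
RECORDS = [
    {"author": "Rowling, J. K.", "title": "Harry Potter and the Deathly Hallows", "format": "Hardcover"},
    {"author": "Rowling, J. K.", "title": "Harry Potter and the Half-Blood Prince", "format": "Hardcover"},
    {"author": "Rowling, J. K.", "title": "Harry Potter and the Deathly Hallows", "format": "Paperback"},
    {"author": "Rowling, J. K.", "title": "Harry Potter and the Deathly Hallows", "format": "Audiobook CD"},
]


def work_key(record):
    """Assumed work identifier: lowercased author plus lowercased title."""
    return (record["author"].lower(), record["title"].lower())


def group_by_work(records):
    """Collect the formats available for each work, in the order encountered."""
    groups = defaultdict(list)
    for record in records:
        groups[work_key(record)].append(record["format"])
    return dict(groups)


groups = group_by_work(RECORDS)
for (author, title), formats in groups.items():
    print(title, "->", ", ".join(formats))
```

A result list built from the groups shows one entry per work, with the format choices one click away, which is roughly the behavior described for the Amazon display.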

I wasn't nearly as happy with the results over at http://search.barnesandnoble.com. The first page included 10 results beginning with the derivative work Harry Potter: Film Wizardry, a book by Brian Sibley; a Harry Potter chess game; a Harry Potter pop-up book; the six-disc DVD set; and then the boxed hardcover and paperback book sets. The result ordering seems a lot messier, but clicking on an individual title does yield the same kind of display that leads to different format selections.

To get a sense of how library discovery systems offer common-sense relevancy, I tried the "Harry Potter" search in several different systems from different sites. To understand the strengths and weaknesses of library discovery services, I think that it's important to continually try out different searches in the products as they are implemented in different libraries. (See www.librarytechnology.org/discovery.pl for information on what libraries use each product.)
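One way to make this kind of spot check repeatable is to record, for each system, the position of the first result that meets the common-sense expectation. The sketch below assumes that each result can be reduced to an author and a format. The sample list is invented for illustration and does not reproduce any library's actual results.

```python
def first_rank(results, is_expected):
    """Return the 1-based position of the first expected result, or None."""
    for position, item in enumerate(results, start=1):
        if is_expected(item):
            return position
    return None


def reciprocal_rank(results, is_expected):
    """1/rank of the first expected result; 0.0 if it never appears."""
    rank = first_rank(results, is_expected)
    return 0.0 if rank is None else 1.0 / rank


def is_print_by_rowling(item):
    """The common-sense expectation: a print book written by J. K. Rowling."""
    return item["author"] == "Rowling, J. K." and item["format"] == "Book"


# Invented result list: nine other items ahead of the first print edition.
sample = [{"author": "Other", "format": "DVD"}] * 9
sample.append({"author": "Rowling, J. K.", "format": "Book"})

position = first_rank(sample, is_print_by_rowling)
print(position, reciprocal_rank(sample, is_print_by_rowling))
```

A single number per system makes it easier to compare products or to track one catalog over time, though it captures only one narrow aspect of result quality.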

Nashville Public Library, my local library, runs Encore from Innovative Interfaces and uses relevancy as the default result ordering. The results begin with two related titles in print, a computer game (two duplicate records), sound recordings, and a DVD movie ahead of the first print copy of one of the titles in the series, Harry Potter and the Deathly Hallows, in the 10th entry on the first page. In this result list, it appears that more recent derivative works take precedence over the original titles in the series. I would have appreciated a facet for Authors where I might have been able to narrow the results by Rowling. I did not see any implementation of FRBR or other grouping methods that organize all the different formats of a given work together.

In BiblioCommons, as implemented at The Seattle Public Library, my "Harry Potter" search yielded none of the print editions of the book series on the first page of results, though it included a couple of the movies on DVD.

The open source Evergreen catalog for the PINES consortium in Georgia did not place an entry for a print edition of one of the books in the series until the sixth page of results. Interestingly, many foreign translations appeared earlier, as well as DVD movie versions. I also noted that the interface offers a facet category for relevant authors, but J. K. Rowling was not offered as a selection.

Also in the open source vein, the Koha-based catalog of the Athens County Public Libraries (Ohio) did a nice job of presenting the titles in the series early in the result list but gave precedence to formats such as books on tape or DVD. I also found the format difficult to discern, as it was mentioned only in the textual description of the items and not through more noticeable icons as implemented in many of the other interfaces.

The AquaBrowser implementation at the Queens Library (New York) did a nice job of ordering the results to my "Harry Potter" query. The first nine entries in the results list represented the print versions of the books in the series, with a DVD movie at the end of the first page. Even on following pages, the books themselves (including alternate formats) came before derivative works about these books.

WorldCat Local, as implemented by the Lincoln Trails Libraries System, presented print editions in the series as the top four search results, followed by other formats and derivative works. Selecting one of the entries displays a page that shows copies available in each of the libraries and branches throughout the library system. Although OCLC has been quite active in work related to FRBR, I did not see any collation of the alternate formats together in the WorldCat Local interface. All in all, WorldCat Local outdid the other library discovery interfaces in ordering results according to common-sense relevancy.

I like the "Harry Potter" search for trying out catalogs and discovery services for public libraries: the series is so popular that all libraries will have all or most of the books in multiple formats and languages, and there will likely also be a body of related and derivative works. It's a lot harder to come up with simple searches that evaluate relevancy for discovery systems implemented for academic libraries. For these libraries, the key challenge involves achieving the right balance of items from different formats in response to a query. Discovery services that aim to provide access to scholarly articles as well as books and other materials from the ILS, such as Summon, Primo Central, Encore Synergy, and EBSCO Discovery Service, face quite a challenge in creating results that expose users to each of the material types within their scope. If the index of the discovery service, for example, includes several hundred million journal articles and only a few million books, ensuring that the book content is not completely overwhelmed by articles can be quite difficult. Tuning the relevancy algorithms to proportionately represent the different components of a library collection in the first page of a search result represents an even harder problem than returning the Harry Potter books in some kind of intuitive fashion.
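One simple way to keep books from being crowded out, offered purely as an illustration and not as a description of how any of the products named above work, is to rank each material type separately and then interleave the lists round-robin. The function name, the page size, and the sample identifiers are assumptions.

```python
from itertools import zip_longest


def interleave(ranked_by_format, page_size=10):
    """Take the top item of each format in turn until the page is full.

    ranked_by_format maps a format name to its own relevancy-ordered list.
    """
    page = []
    for tier in zip_longest(*ranked_by_format.values()):
        for item in tier:
            if item is not None:  # shorter lists are padded with None
                page.append(item)
                if len(page) == page_size:
                    return page
    return page


# Hypothetical per-format rankings; articles vastly outnumber books.
ranked = {
    "article": ["a1", "a2", "a3", "a4"],
    "book": ["b1"],
    "dvd": ["d1", "d2"],
}

first_page = interleave(ranked, page_size=5)
print(first_page)
```

Strict round-robin guarantees each format a place on the first page but ignores how relevant the items are relative to one another, which is exactly the tension that makes tuning these systems difficult.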

Each of these library discovery systems follows a different approach to relevancy and results groupings. I don't at all consider this little exploration a scientific study, but it's an informal benchmark that might be a starting point for how others might conduct more systematic surveys of how each product ranks search results. My common-sense relevancy is only the most basic beginning of the process of evaluating search results, and the ordering of results is only one aspect of a complex set of features embodied in these products. I think that it's important for those of us in libraries to continually probe at the capabilities of these discovery systems and engage in a dialogue with implementers and developers to help continually improve their performance. It may be possible to tune relevancy through local configuration options, or it may require more complex technical adjustments performed by developers. Although library discovery systems have improved dramatically from the dreaded OPACs of the not-so-distant past era of library automation, we still have a long way to go to achieve the elegance and power seen in commercial e-commerce platforms.

Publication Year: 2011
Type of Material: Article
Language: English
Published in: Computers in Libraries
Publication Info: Volume 31 Number 3
Issue: March 2011
Page(s): 21-25
Publisher: Information Today
Place of Publication: Medford, NJ
Notes: Systems Librarian Column
ISSN: 1041-7915
Record Number: 15596
Last Update: 2012-12-29 14:06:47
Date Created: 2011-04-09 16:31:25