WGC Minutes - April 12, 2006
Present: Zoe Stewart-Marshall (chair), David Banush, Adam Chandler, Matt Connolly, Keith Jenkins, Anna Korhonen, Jim LeBlanc, Joe McNamara, Lisa Maybury, Linda Miller, Liisa Mobley, Boaz Nadav-Manes, Margaret Nichols, Jean Pajerek, Lois Purcell, Nathan Rupp, Cecilia Sercan, Rick Silterra, Pam Stansbury, Barb Tarbox, Debra Warfield, Scott Wicks, Marijo Wilson (recorder).
David Banush led a discussion of Karen Calhoun's report for the Library of Congress on The Changing Nature of the Catalog and its Integration with Other Discovery Tools (available online at http://www.loc.gov/catdir/calhoun-report-final.pdf). Before the free-ranging discussion, David presented an overview of the report's highlights, including what he saw as salient tensions and possibly paradoxical recommendations. Karen had suggested that the group list its questions for her to review and respond to; Jim LeBlanc volunteered to write up the questions and send them to her. This list of questions was posted to WGC-L on April 12, 2006 and is appended below, with Karen's subsequent responses.
May WGC meeting:
Zoe Stewart-Marshall announced that the May meeting would be a joint effort with the Metadata Working Group. On May 8th Jennifer Bowen (University of Rochester) and Diane Hillmann will collaborate on a program on RDA (Resource Description and Access). (N.B. The Joint WGC-MWG meeting on RDA has been rescheduled: Tuesday May 16th from 2:30-4:00pm in Olin 106).
Discussion: LC's report on The Changing Nature of the Catalog and its Integration with Other Discovery Tools
David Banush provided a brief introduction to the report The Changing Nature of the Catalog and its Integration with Other Discovery Tools. LC's Bicentennial Conference on Bibliographic Control (Nov. 2000) resulted in an action plan based on the issues identified as affecting the future of library catalogs. The plan's Action Item 6.4, "support research and development on the changing nature of the catalog to include consideration of a framework for its integration with other discovery tools," led to the commissioning of Karen Calhoun as the principal investigator on the research project. David noted that the report had already been discussed on lists such as Autocat, and that Thomas Mann (LC) had written a review expressing an opposing viewpoint. He observed that, in his view, the report made a number of key assertions, including:
- The use of print collections is in decline; digital scholarly resources now eclipse print or soon will
- The library catalog is used with decreasing frequency and that decrease will continue in the future
- The catalog is expensive and time consuming to produce, involves duplication of effort, and is consequently not cost-effective
- Speed of processing should be viewed as the standard measure of quality rather than fullness of description
- The catalog as we know it is in decline and at the end of its useful life cycle
As a way to start the conversation, David noted one of the tensions contained in the report that seemed worthy of further exploration: that there is (on the one hand) an acute need for rich legacy data to support new services and functions; and (on the other) an equally urgent need to limit the amount of effort and time staff spend adding this data to catalog records. Automated processes may increase the need for controlled vocabularies, taxonomies, cooperative authority projects, and wider application of metadata, but might cuts in cataloging have the opposite effect? How would one reconcile these two points? After the introduction the floor was opened to a free-ranging discussion of the report and its implications; selected themes and questions raised are outlined below:
Utilization of a Business Administration Model for the Report:
- Analysis for the report relied heavily on a business model. Jim LeBlanc acknowledged the need to be cost-effective while distinguishing the library's prime directive of service from a business's mission of profit.
- Libraries must remain competitive by distinguishing our services while avoiding the replication of services that are already available for free (e.g. Web discovery tools)
- How do we measure product to justify expense? Need to determine the cost/benefit to justify what we do or propose to do. Blueprint item #6 "Make Good Decisions" should probably be #1 in the recommended planning process for a phased implementation of a revitalized research library catalog.
- Need to have a critical mass of users to justify a service; it's not enough simply to meet someone's needs.
- Research libraries' claim to excellence is based on having materials available that may only be used once in fifty years. Can we still afford to support this level of scholarship? Little-used resources are still important but don't need to be duplicated in multiple locations.
Web Discovery Tools
- What do we know about the cost/benefit analysis of Amazon, Google, etc. compared to the catalog?
- Catalog serves different roles: inventory control system vs. search and discovery tool.
- People search the Web for fun and the catalog for business; but tools like Amazon and Google provide discovery through more serendipity.
- Local catalog and ILS is the "last mile" to find the actual object or desired piece of information, rather than the primary point of discovery. However, that "last mile" is still critical in meeting user needs.
- The discovery path from Google to, e.g., WorldCat is not necessarily dependent on detailed catalog records but rather on an underlying process that uses "magic numbers" (ISBN, ISSN, etc.). Google relies on algorithms and brute-force number crunching of massive amounts of data to make connections; it uses computing power rather than the structuring of data. The data of the catalog record is not rich enough for Google Scholar.
- Google algorithms look at linking (citations, bibliographies, use of pages) and relevance. With access to many library catalogs, Google can do analysis across multiple collections.
- The report in some ways tells us to back away from traditional tools like LCSH, but doesn't offer a clear alternative.
- How much effort should we put into local systems vs. supporting OCLC in developing a user interface?
- Technological developments are transitory, but libraries cannot afford to do nothing while waiting for stability; the rate of change in systems is too great.
- Historical mission of Collect-Preserve-Organize serves as a guide in the decision-making process. Relying on Google may be attractive in the short term, but Google's development will be dictated by where the growth and money are.
- Level of access will always change with the technology; libraries still have a commitment as stewards of the data.
Role of the Catalog and Cataloging
- Catalog provides legacy data which is critical to the functioning of other systems
- The catalog may be dead as we know it but there is still a niche for it; need to ask what is the catalog used for and does it provide unique information or service?
- The report (p. 25) quotes Norm Medeiros: "more and more, users want, expect, and pursue full text. In increasing numbers they look past the catalog when searching for e-journals, databases and Web sites." An implication is that the use of the catalog is the same as previously. Are new technologies replacing the old, or are both used simultaneously? There is a need to revitalize the catalog to meet current needs and to determine what audience it serves and needs to serve.
- David Banush observed that a Ph.D. candidate can do research based solely in Google, but that it is unlikely the person would be granted a degree in such a case, at least today. This suggests an ongoing need for some alternative or complementary service to Google. Is the catalog that tool, or is it too obsolete for that purpose?
- From the viewpoint of a student, Scott Wicks noted that the catalog functions primarily as an inventory tool rather than a discovery tool to get at the latest articles and data.
- Zoe Stewart-Marshall called the catalog an extension of our collection and said that we need to be more visible in getting users from Google to the "good stuff" in our collection. Margaret Nichols added that requests for the primary resources in the CUL collections are increasing. MARC records are not adequate for the description of archival collections; finding aids have been developed to fill the gap. There have been a series of reports over the years on linking catalog records to Table of Contents files, but this enhancement has lacked institutional support. Do we concentrate our resources and efforts on enhancing the local interface or on collectively supporting the development of WorldCat?
- Unique materials call for a greater level of cataloging, i.e. lowest use materials require highest effort. CUL probably ranks high in the use of batch processing in order to devote more resources to unique materials.
- Shortcomings in the catalog/user interface are an impediment to optimal utilization of enriched records; the user never sees the full record.
- The report suggests that speed in processing materials should be a gold standard. Jean Pajerek noted that speed is a short-term goal; catalog records should last a long time. The need is for adequate metadata as well as speed of access. According to an article by Tina Gross and Arlene Taylor, at least one third of successful keyword subject searches depend on the presence of adequate subject headings. Question: How long will Cornell support the research mission of CUL?
Based on the WGC's discussion of the report The Changing Nature of the Catalog and its Integration with Other Discovery Tools, Jim LeBlanc compiled the following questions raised in the course of discussion to forward to Karen after the meeting. Karen's subsequent responses to the questions are embedded in the list. Additionally, Karen will schedule a brown bag lunch to provide an additional forum for discussion of the issues raised by the report and the reaction to it in the wider library community.
1. How will we continue to support uses for legacy data, if we reduce the amount of information we add to catalog records in the future?
I think you mean, if the metadata we obtain or create for books and serials and their electronic counterparts becomes simpler, how will we be able to use that data to support sophisticated searching and browsing for students and scholars in the future? I hope I have interpreted the question correctly. I agree the library community has a lot of work to do to assure we choose (or develop) the right data elements and methods to continue including in descriptive cataloging records, and also how to do subject analysis that supports "more like this" services and clustering by subject in online catalogs. In this regard I am watching the development of RDA (Resource Description and Access), urging vendor support for deployment of FRBR (Functional Requirements for Bibliographic Records) concepts in catalogs, and I am doing what I can to encourage research and development of new tools to make the most of (a) what catalogers will continue to do by hand; (b) related data sets that are available, such as cover art, reviews, tables of contents, and so on; and (c) the library world's controlled vocabularies for subject analysis.
A second aspect of my answer goes beyond the question that you asked. The question rests on the premise that catalog records will continue to have the primary role they do now--as online surrogates of (mainly) books or print serials that are available offline, usually by borrowing them or by going to the stacks to look at them. The world is changing in this respect, and already we are seeing successful access and retrieval for some library materials not as dependent on surrogates (catalog records). In the report, I suggest that we need to begin to imagine the role that catalog data will play when more library content is available online, and when full text indexing techniques (about which librarians tend to know not much, but about which others in the information organization business outside libraries know a good deal) can be applied to more library materials. We are already seeing a big change in how people think about catalog records for journals (I mention LC's work on "access level" records for serials in this regard -- see page 46).
2. How do we reconcile the expense of initiatives such as authority control with the idea that cataloging is too expensive?
Sorry, I'm not sure I completely understand this question. I think it means that I've said in the report that we need to make cataloging less expensive, but I haven't also said we should stop doing authority control, which drives a lot of the expense of cataloging. Is that right? I think in general, a starting point for me, in writing the report, was that we should not walk away too quickly from our cataloging traditions, but that we should very actively engage in looking for less costly ways to achieve the benefits that cataloging/authority control tools and practices bring to students and scholars. To drive down costs, we in CUL have looked at both designing workflows differently and using technology differently. For example, when we migrated to Voyager we changed how we do name authority control (loading all the LC files, etc.) In general we in CUL tech services have been creative and open to change. And, we have constantly been on the lookout for things that we can stop doing, when the cost of continuing to do them exceeds the value that students and scholars are getting from them. In CUL tech services over the past ten years, we have applied this thinking collectively to getting rid of an enormous cataloging backlog, finishing recon, learning how to provide metadata services, and providing title-level access to thousands and thousands of electronic resources. Many other libraries are very very far behind us. That is why we have so many visitors to tech services. They want to know how we--how you--did these things.
3. Are non-MARC metadata cataloging projects cost-effective? Where will metadata for automated projects come from without some kind of catalog record?
I should probably ask Marty or one of the members of the Metadata Services group to answer this question. I think I can say, on their behalf, that non-MARC metadata cataloging practices are effective (and cost-effective). In LTS, we have programmatically created MARC records for the titles in big aggregations of full text journals by manipulating data in Excel spreadsheets that content providers have given us. Big book digitization projects tend to repurpose MARC records. The projects that Marty's group gets involved in obtain metadata from many sources, usually outside the library. For example, for some of the Faculty Innovation grants, metadata came from a professor's own personal files (like an MS Access database, or Excel spreadsheets). Marty's group gets a lot of experience with transforming data sets, mapping the data to metadata formats they can use, then loading them into some online delivery system (sometimes the catalog, sometimes some other kind of system). For the National Science Digital Library (NSDL), John Saylor went out to find agencies that had valuable content to load. In some cases, Marty's group would take a look at the content and help the agency figure out how to create a usable metadata set from it, for loading into the NSDL union catalog. For the physics ArXiv, and for Cornell's DSpace repository, descriptive metadata and subject keywords come from the contributors. There are lots of sources of metadata. When there isn't any metadata that can be repurposed, and no automated way to collect the data elements that are needed, it is necessary to create records one at a time, by hand. For those kinds of metadata records, AACR2 and MARC are good for describing a lot of the materials that libraries collect. There are better metadata types for some kinds of objects, like images, for which Marty's group is now using VRA core (from the Visual Resources Association).
4. If we outsource cataloging to publishers and vendors, will their work be reliable and appropriate for a research library environment?
In my experience, sometimes the records for books and journals that come from publishers and vendors fit our requirements for description and access in a research library environment, and sometimes they don't. When they don't, sometimes LTS managers decide to use them anyway, then try to capture better bibliographic records later to overlay what we started with, using a tool like Marcadia. This is the same principle we use to manage acquisitions generally--we make an initial record then we build it up. At other times, when the vendor records aren't good enough, we work with the vendor to persuade them to give us better data. Of late I and other TS directors in big libraries have been working with LC and a set of vendors to persuade them to give us reliable record sets for the books they sell to us. You might have heard about the doings around the Casalini Libri records in this regard.
5. How will we automate subject analysis?
Not by working in isolation, but in collaboration with other libraries and the organizations that serve them. Marty and I have some ideas for automated subject analysis, using legacy LCSH and LCC data, that we've arranged to discuss next week in a conference call with Lois Mai Chan and two researchers from the OCLC Office of Research. We will do our best to spur movement in the profession toward automated subject analysis so that we can make the most of the tools we already have in LCSH and LCC. Accomplishing this won't happen by individual libraries working alone.
6. If the catalog is a product at the end of its life cycle, why are we trying to revitalize it?
Because the catalog (and the collections it represents) remains important to scholarship. It would be awful if students and scholars just gave up on catalogs, because they are so hard to use in comparison to search engines. I am fearful that could happen, if we don't make catalogs better, more vital and engaging tools for students and scholars, and better connected to the kind of tools they expect. As I say in the report in several places, the legacy of the world's library collections is tied to the future of catalogs. And, we will continue to need to catalog for library collections, because we don't now have full text or full text indexing or automatic cataloging methods for all the books and serials that research libraries continue to buy, and because we can't rely on Google to enable students and scholars to find out what is in library collections and get their hands on what they need. I think that CUL needs to make it as easy for students and faculty to borrow our books on the Web as it is for them to buy books on the Web.
A daunting goal. And, here is the part that readers of the report seem to have difficulty with--I am saying we need to make catalogs better AT THE SAME TIME we are reducing the cost of producing them. That might seem contradictory, but that's what we must do, and I believe we can, and without diminishing the scholar's experience using library collections, but even improving that experience. And here is another long term benefit of revitalizing catalogs: assuring the continued relevance and use of the world's great library collections, which are described by catalogs.
7. How do we measure the benefit or cost-effectiveness of what we do -- in other words, how do we measure scholarly value?
I think we need to study the people, and what they do, rather than the library systems we give them to use. So often our user studies focus on our systems rather than the students and scholars who have information needs, and how they satisfy those needs. I think the University of Rochester has hold of the right end of the stick in this regard, with their student and scholar "work practice" studies and usability methods for building or changing library systems. I quote Susan Gibbons' work in the report. Several colleagues -- Adam among them -- just went through Rochester's usability training class.
8. What audience does (or should) the catalog serve?
Cornell communities of practice, other students and scholars, other libraries, the citizens of New York State, the public. There might be others.
9. What audience does (or should) the collection serve?
Same as the catalog. In fact, it's best to think of the catalog not as an end in itself, but a tool to represent the collections, so they can be discovered and used.
10. What can we do to re-focus our attention more towards union catalog development (e.g. OCLC) and less on local catalog development?
CUL is doing a project called Catalyst, which features the development of a union catalog. Maybe the WGC would be interested in learning more about it? You could invite Xin Li to come talk about it with you; Jim LeBlanc has been serving as an advisor to this project.
11. How much control do we, as practitioners, have over where libraries are heading?
That's a great question. I think it's at the crux of why some people are really uncomfortable with where libraries are headed, because they feel less in control than they would like. I would like the chance to come and talk with you about this topic.
12. How is the library's mission within the university addressed in the paper?
Sorry, I'm running out of time and energy. The short answer is, it's addressed only by implication. The assumption I make is, that our and the world's library collections are vital to teaching, learning, research, and higher education--and these things are tied to the mission of the university, that is, to educate new leaders and create new knowledge. I have other papers/presentations that do a better job than the LC report of defining the library's mission within the university, and I could give you my take on it at a future WGC meeting if you would like.
13. When you talk about abandoning LCSH, do you mean abandoning controlled vocabulary or just the string structure?
The manual application of the string structure. I don't mean abandoning the controlled vocabulary. Here is a snippet of what I sent to one of the listservs in answer to a similar question:
My issue is with the current application of LCSH, which has been under fire for decades. We have failed to reengineer LCSH to deploy it within sophisticated online search systems and to match information seekers' behaviors and preferences. I am not calling for the abandonment of controlled vocabularies for subject access but for deploying these tools in more cost-effective, user-adaptive ways. In the report, I particularly urge the exploration of ways to use legacy LCSH and classification data to develop new tools to not only speed subject analysis but also improve the end-user's ability to search and browse.