Monday, July 25, 2011

AALL 2011 - Barbara Tillett and John Mark Ockerbloom on Authority Control Vocabularies and the Semantic Web

I am at the American Association of Law Libraries 2011 Conference in Philadelphia. These are notes from a talk by Dr. Barbara B. Tillett of the Library of Congress and John Mark Ockerbloom of the University of Pennsylvania. Note: these are my selected notes from this session; any inaccuracies or omissions are my own.

Dr. Barbara B. Tillett, Library of Congress

DBPedia - example of a linked data, open data project

  • Community effort to extract structured information from Wikipedia and to make this information available 
  • covers 3 million things that are interconnected
  • meant as proof of concept/prototype, but fully working now
  • linking Wikipedia to lots of other content on the web (videos, websites, etc.)
  • libraries got involved in the linked data network with University of Sweden getting involved first
  • Library of Congress Subject Headings now linked
  • Virtual International Authority File also linked
All our data can be freely accessible on the web, or available for a fee; now we can share it in the cloud via the Internet. Data can come from publishers, the data sources themselves, libraries, and anyone else who wants to help describe the data.

Bibliographic resources are available now, and vocabularies are being added.

Three projects the Library of Congress is involved in:

1.  VIAF (Virtual International Authority File)

  • facilitates exposure of authority data
  • reduces cataloguing costs
  • simplifies authority control (creation and maintenance) internationally
From the VIAF website:
VIAF, implemented and hosted by OCLC, is a joint project of several national libraries plus selected regional and trans-national library agencies. The project's goal is to lower the cost and increase the utility of library authority files by matching and linking widely-used authority files and making that information available on the Web.

e.g. if bibliographic data appears in Japanese script, VIAF could be used to show this to users in Latin script. 

Originally thought national bibliographic agencies in each country should be responsible for the authors in their own countries; however, this is problematic because different countries have different cultural needs.

VIAF now has 18 participants, with more joining. There are 21 different authority files, because some countries maintain files in more than one language.

All of the terms in the VIAF data are represented by URIs and are linked data. VIAF itself uses Unicode, so it can handle characters in any script. MARC 21, UNIMARC and RDF are all supported.

Usage of VIAF tripled last year.

They are mining data from bibliographic records to create a derived authority record. All of the data is normalized (diacritics and capitalization removed). Subjects are grouped; material types are turned into a code; the publication date is turned into a decade; co-authors are pulled out. They take the author record and attach the derived authority data to it to create an enhanced authority record.
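A minimal Python sketch of the steps described above may help make them concrete. It is my own illustration, not VIAF's actual code: the record layout, the function names, and the sample records are all assumptions, and it covers only normalization, the decade roll-up, and co-author extraction.

```python
import unicodedata
from collections import Counter

def normalize(text):
    """Strip diacritics and fold case (a stand-in for the normalization step)."""
    decomposed = unicodedata.normalize("NFKD", text)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return stripped.casefold()

def decade(year):
    """Collapse a publication year into its decade, e.g. 1987 becomes 1980."""
    return (year // 10) * 10

def derive_authority(author, bib_records):
    """Build a derived record from the bibliographic records that name this author."""
    key = normalize(author)
    decades = Counter()
    coauthors = set()
    for rec in bib_records:
        names = {normalize(n) for n in rec["authors"]}
        if key not in names:
            continue  # this record does not mention the author
        decades[decade(rec["year"])] += 1
        coauthors |= names - {key}
    return {"name": key, "decades": dict(decades), "coauthors": sorted(coauthors)}

# Hypothetical bibliographic records, for illustration only.
records = [
    {"authors": ["Émile Zola"], "year": 1885},
    {"authors": ["EMILE ZOLA", "Paul Alexis"], "year": 1880},
    {"authors": ["Victor Hugo"], "year": 1862},
]
derived = derive_authority("Emile Zola", records)
print(derived)
```

Because both spellings of the name normalize to the same key, the two records collapse onto one author, which is the point of removing diacritics and capitalization before matching.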

A lot of information can be derived from bibliographic records e.g. areas of interest of authors, for how long those people published, who they worked with, alternative names they published under, etc. 

Tillett encourages us to use VIAF - "It's fun!"  VIAF shows us how we can more creatively (and graphically) represent data from our MARC records. 

Next steps for VIAF
  • better searching
  • more "Linked data"
  • Participants beyond libraries
    • have Getty signed on
    • Rights management agencies, publishers
    • museums, archives
    • have been working with ISNI project to include their information
  • want to add more name types (beyond personal and corporate names)
    • geographic jurisdictions
    • family names
    • "uniform" work titles
2. SKOS (Simple Knowledge Organization System)

They have put the Library of Congress Subject Headings into SKOS. You can search them: e.g., "animated films" pulls back three entries. You can suggest subject headings to them (under the "terminology" tab), even if you are not a member.

You can go to the "aquabrowser" display, which shows the headings visually in a graphical interface (with circles).

3. RDA (Resource Description and Access)

RDA controlled vocabularies - currently free on the web at Open Metadata Registry (RDA element sets and RDA vocabularies available).

Metadata includes the URI for every one of the terms. 

Originally created in English; also available in German, with Spanish and French being added (French so that Canada could use it).

RDA Linked Data - all linked data can be displayed using linking URIs. Depending on the user's view, all of the linked data can be displayed in one particular language. 
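One way to picture the language switching: a record stores only term URIs, and the display layer swaps each URI for a label in the viewer's language. The sketch below is my own illustration in Python. The URIs, labels, and function names are hypothetical and are not taken from the Open Metadata Registry.

```python
# Hypothetical term URIs and labels, for illustration only.
LABELS = {
    "http://example.org/terms/map": {"en": "map", "de": "Karte", "fr": "carte"},
    "http://example.org/terms/volume": {"en": "volume", "de": "Band"},
}

def label_for(uri, lang, fallback="en"):
    """Return the label in the requested language, else the fallback language, else the URI."""
    labels = LABELS.get(uri, {})
    return labels.get(lang) or labels.get(fallback) or uri

def display(record, lang):
    """Swap every term URI in a record for a label in the viewer's language."""
    return {field: label_for(uri, lang) for field, uri in record.items()}

# The stored record holds only URIs; the language is chosen at display time.
record = {
    "content": "http://example.org/terms/map",
    "carrier": "http://example.org/terms/volume",
}
print(display(record, "de"))
print(display(record, "fr"))
```

The fallback to English when a translation is missing is a design choice of this sketch, not something stated in the talk.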

What is slowing them down: current ILSes (integrated library systems). "They are still working in 1970s technology mindsets. They do not take advantage of this."

John Mark Ockerbloom, University of Pennsylvania Libraries

Increased use of linked open data will improve discovery significantly

Some definitions:

linked data:  
  • data that you put on the web that has resolvable, persistent URIs.
  • creates a web of data that machines can use

open data:
  • data that welcomes reuse, with little or no restriction
  • may include linked data
  • people may reuse, remix, mash up data, and give results back to the community
  • if you open your data, make it easy to get in bulk

  • a coded format, easier for a machine to understand
  • once you have this information, you can do analysis

Penn Libraries were able to pull down the Library of Congress Subject Headings data and apply it to their catalog to improve the quality of their own data. Also, using linked data they can enhance the catalog so that researchers can find items they would otherwise miss: e.g., the movie An Inconvenient Truth was catalogued under "global warming" but not "climate change", so a search on the latter may not find it.
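To illustrate the "global warming" versus "climate change" problem, here is a rough Python sketch of expanding a subject search with related terms from a vocabulary. It is not Penn's implementation. The related-terms table, the second catalog entry, and the function names are assumptions made up for the example.

```python
# Hypothetical related-term data; a real system would load this from the
# published subject heading vocabulary rather than hard-code it.
RELATED = {
    "global warming": {"climate change", "greenhouse effect"},
}

def build_index(related):
    """Map every term to the full group of terms it is linked with."""
    index = {}
    for heading, others in related.items():
        group = {heading} | set(others)
        for term in group:
            index.setdefault(term, set()).update(group)
    return index

def search(catalog, query, index):
    """Return titles whose subjects match the query or any term linked to it."""
    q = query.lower()
    terms = index.get(q, {q})
    return [title for title, subjects in catalog.items()
            if terms & {s.lower() for s in subjects}]

# Hypothetical catalog: title mapped to its assigned subject headings.
catalog = {
    "An Inconvenient Truth": ["Global warming"],
    "Example Title B": ["Animated films"],
}
index = build_index(RELATED)
plain = search(catalog, "climate change", {})        # no vocabulary data
expanded = search(catalog, "climate change", index)  # with linked terms
print(plain, expanded)
```

Without the vocabulary the search comes back empty; with it, the query is widened to the linked terms and the film is found.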

The Online Books Page -

They have used these technologies to create listings of 1 million books freely available on the Internet, and to let people easily search the subject categories.

He talked about libraries pulling data from external sources and combining it with what we have in our own collections.  

Another example of a project using linked data: Cornell and others are building VIVO - a network showing university scholars and what they are doing (publishing, where their funding is coming from, who is collaborating with whom).

Getting started:
  • Don't jump in the deep end right away. "Make good data" and then make it available in one of these systems.  Adapt and improve your own data.
  • Consume and adapt others' data to create practical applications
  • collaborate with a growing community of collaborators
