Dr. Barbara B. Tillett, Library of Congress
DBpedia - an example of a linked open data project
- Community effort to extract structured information from Wikipedia and to make this information available
- covers 3 million things that are interconnected
- meant as proof of concept/prototype, but fully working now
- linking Wikipedia to lots of other content on the web (videos, websites, etc.)
- libraries got involved in the linked data network with University of Sweden getting involved first
- Library of Congress Subject Headings now linked
- Virtual International Authority File (VIAF) also linked
All our data can be made freely accessible on the web, or available for a fee; now we can share it in the cloud via the Internet. Data can come from publishers, the data sources themselves, libraries, and anyone else who wants to help describe the data.
Bibliographic resources are available now, and vocabularies are being added.
Three projects the Library of Congress is involved in:
1. VIAF (Virtual International Authority File)
- facilitate exposure of authority data
- reduces cataloguing costs
- simplifies authority control (creation and maintenance) internationally
From the VIAF website:
VIAF, implemented and hosted by OCLC, is a joint project of several national libraries plus selected regional and trans-national library agencies. The project's goal is to lower the cost and increase the utility of library authority files by matching and linking widely-used authority files and making that information available on the Web.
e.g. if bibliographic data appears in Japanese script, VIAF could be used to show this to users in Latin script.
Originally thought national bibliographic agencies in each country should be responsible for the authors in their own countries; however, this is problematic because different countries have different cultural needs.
VIAF now has 18 participants, with more joining. There are 21 different authority files because some countries maintain files in more than one language.
All of the terms in the VIAF data are represented by URIs and are linked data. VIAF itself uses Unicode, so it can handle characters in any script. MARC 21, UNIMARC and RDF are all supported.
Usage of VIAF tripled last year.
They are mining data from bibliographic records to create a derived authority record. All of the data is normalized (diacritics and capitalization removed). Subjects are grouped; material types are turned into a code; the publication date is turned into a decade; co-authors are pulled out. They then take the author record and attach the derived authority data to it to create an enhanced authority record.
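The normalization and derivation steps described above can be sketched roughly as follows. This is only an illustration, not VIAF's actual pipeline: the record layout (`author`, `year`, `coauthors`, `subjects`) and the function names are assumptions made for the example.

```python
import unicodedata
from collections import defaultdict


def normalize_heading(text):
    """Remove diacritics and capitalization so variant forms compare equal."""
    decomposed = unicodedata.normalize("NFKD", text)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # casefold() lowercases; split/join collapses runs of whitespace
    return " ".join(stripped.casefold().split())


def decade_of(year):
    """Turn a publication year into its decade, e.g. 1987 becomes 1980."""
    return year - year % 10


def derive_authority(bib_records):
    """Group bibliographic records by normalized author name (hypothetical layout)."""
    derived = defaultdict(
        lambda: {"decades": set(), "coauthors": set(), "subjects": set()}
    )
    for rec in bib_records:
        entry = derived[normalize_heading(rec["author"])]
        entry["decades"].add(decade_of(rec["year"]))
        entry["coauthors"].update(
            normalize_heading(name) for name in rec.get("coauthors", [])
        )
        entry["subjects"].update(
            normalize_heading(subject) for subject in rec.get("subjects", [])
        )
    return dict(derived)
```

Because the author name is normalized before grouping, records that differ only in accents or capitalization end up contributing to the same derived record.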
A lot of information can be derived from bibliographic records e.g. areas of interest of authors, for how long those people published, who they worked with, alternative names they published under, etc.
Tillett encourages us to use VIAF - "It's fun!" VIAF shows us how we can more creatively (and graphically) represent data from our MARC records.
Next steps for VIAF
- better searching
- more "Linked data"
- Participants beyond libraries
- have Getty signed on
- Rights management agencies, publishers
- museums, archives
- have been working with ISNI project to include their information
- want to add more name types (beyond personal and corporate names)
- geographic jurisdictions
- family names
- "uniform" work titles
2. SKOS (Simple Knowledge Organization System)
They have put the Library of Congress Subject Headings into SKOS. You can search them: e.g. "animated films" pulls back three entries. You can suggest subject headings to them (under the "terminology" tab), even if you are not a member.
You can go to the "aquabrowser" display, which shows the headings visually in a graphical interface (with circles).
3. RDA (Resource Description and Access)
RDA controlled vocabularies - currently free on the web at Open Metadata Registry (RDA element sets and RDA vocabularies available).
Metadata includes the URI for every one of the terms.
Originally created in English; also available in German, with Spanish and French being added (French so that Canada could use it).
RDA Linked Data - all linked data can be displayed using linking URIs. Depending on the user's view, all of the linked data can be displayed in one particular language.
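One way to picture this: a record stores only the URI of each term, and the label is chosen at display time from the user's language. The sketch below is illustrative only; the `example.org` URI, the labels, and the `Term` and `render` names are hypothetical and are not taken from the Open Metadata Registry.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class Term:
    """A vocabulary term identified by a URI, with one label per language."""
    uri: str
    labels: Dict[str, str] = field(default_factory=dict)

    def label_for(self, lang, fallback="en"):
        # Prefer the requested language, then the fallback, then the bare URI
        return self.labels.get(lang) or self.labels.get(fallback) or self.uri


def render(record, terms, lang):
    """Replace each URI in a record with its label in the user's language."""
    return {
        name: terms[uri].label_for(lang) if uri in terms else uri
        for name, uri in record.items()
    }


# Hypothetical URI and labels, for illustration only
VOLUME = "http://example.org/vocab/carrier/volume"
terms = {
    VOLUME: Term(VOLUME, {"en": "volume", "de": "Band", "es": "volumen"}),
}
record = {"carrier_type": VOLUME}
```

Calling `render(record, terms, "de")` would display the German label, while a language with no label yet falls back to English, so the stored record never has to change when a new translation is added.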
What is slowing them down: current ILSes (integrated library systems). "They are still working in 1970s technology mindsets. They do not take advantage of this."
John Mark Ockerbloom, University of Pennsylvania Libraries
Increased use of linked open data will improve discovery significantly
- data that you put on the web that has resolvable, persistent URIs.
- creates a web of data that machines can use
- data that welcomes reuse, with little or no restriction
- may include linked data
- people may reuse, remix, mash up data, and give results back to the community
- if you open your data, make it easy to get in bulk
E.g. showing us the LCSH record for Arbitration (International Law)
- a coded format, easier for a machine to understand
- once you have this information, you can do analysis
Penn Libraries were able to pull down the Library of Congress Subject Headings data and apply it to their catalog to improve the quality of their own data. Also, using linked data they can enhance the catalog so that researchers can find more, e.g. the movie An Inconvenient Truth was catalogued under "global warming" but not "climate change", so a search on the latter might not find it.
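A minimal sketch of that kind of enhancement: if the subject data says two headings are linked, a search on one can also match records catalogued under the other. The link table and catalog below are made up for illustration and do not reproduce the actual LCSH relationships or Penn's implementation; `expand` and `search` are hypothetical names.

```python
def expand(query, links):
    """Return the query plus any headings linked to it (case-insensitive)."""
    wanted = {query.casefold()}
    for heading, related in links.items():
        group = {heading.casefold()} | {term.casefold() for term in related}
        if wanted & group:
            wanted |= group
    return wanted


def search(catalog, links, query):
    """Find titles whose subject headings match the expanded query."""
    wanted = expand(query, links)
    return [
        title
        for title, subjects in catalog.items()
        if wanted & {subject.casefold() for subject in subjects}
    ]


# Illustrative data only: one assumed link and a two-item catalog
links = {"Global warming": ["Climate change"]}
catalog = {
    "An Inconvenient Truth": ["Global warming"],
    "A Treatise on Arbitration": ["Arbitration (International law)"],
}
```

With the link in place, `search(catalog, links, "climate change")` finds the film; with an empty link table the same search finds nothing, which is the gap described above.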
The Online Books Page - http://onlinebooks.library.upenn.edu/
They have used these technologies to create listings of 1 million books freely available on the Internet, and to let people easily search the subject categories.
He talked about libraries pulling data from external sources and combining it with what we have in our own collections.
Another example of a project using linked data: Cornell and others are building VIVO - a network showing university scholars and what they are doing (publishing, where their funding is coming from, who is collaborating with whom).
- Don't jump in the deep end right away. "Make good data" and then make it available in one of these systems. Adapt and improve your own data.
- Consume and adapt others' data to create practical applications
- Collaborate with a growing community