Linked Data and the Semantic Web

Uniform Resource Identifiers

How to create a URI

To some, the idea of creating a URI seems daunting. In fact, it is quite easy.

To begin with, you only need to create a URI for things that do not already have one. For example, bibliographic records in the Library of Congress catalog have a "Permalink" that looks like:

http://lccn.loc.gov/72614326

This is a URI for that LC record. Similarly, OCLC WorldCat records have a Permalink URI:

http://www.worldcat.org/oclc/527725

DOIs are also URIs:

http://dx.doi.org/10.1016/j.acalib.2006.08.002

In addition, many things have been given URIs by a wide assortment of communities. You can discover these by visiting the Web site of "Linked Open Vocabularies". These include many topic vocabularies (including ones from libraries), geographic names, and identified terms in the sciences and government data.

You should look for an authoritative identifier for any thing you wish to identify. An authoritative identifier often comes from the organization that creates and maintains the data. For example, the Library of Congress Subject Headings are given identifiers by LC using its domain "id.loc.gov". Ideally you should only need to create identifiers for data that you create and maintain. Where identifiers are not available for things that are not under your control, you may need to use text strings until URIs become available.

But for anything that you do create and maintain, you can "mint" identifiers for those things quite easily. You need a domain name that belongs to you (such as your institution's domain) and some way to keep track of your identifiers so that you do not assign the same identifier to more than one thing.

For example, I could create identifiers using my "kcoyle.net" domain simply by adding a string after the slash:

   http://kcoyle.net/1    
   http://kcoyle.net/2
   http://kcoyle.net/3

Both the Library of Congress and WorldCat use their record identifier as the string that follows their domain name. The string does not have to have any meaning, so my "1, 2, 3" is just as legitimate as their use of previously assigned record numbers.
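The minting approach described above can be sketched in a few lines of Python. This is only an illustration, not a recommended production design: the class name UriMinter and its methods are hypothetical, and the record of assigned identifiers is kept in memory, whereas a real service would need to store it somewhere permanent so that an identifier is never handed out twice.

```python
class UriMinter:
    """Mint sequential URIs under one base and record what each identifies.

    Hypothetical helper for illustration only; the registry lives in memory.
    """

    def __init__(self, base):
        # Normalise the base so exactly one slash precedes the local string.
        self.base = base.rstrip("/") + "/"
        self.assigned = {}      # URI -> description of the thing identified
        self.next_number = 1

    def mint(self, description):
        # The counter only ever increases, so no URI is assigned twice.
        uri = f"{self.base}{self.next_number}"
        self.next_number += 1
        self.assigned[uri] = description
        return uri


minter = UriMinter("http://kcoyle.net")
first = minter.mint("first thing I maintain")
second = minter.mint("second thing I maintain")
print(first)
print(second)
```

The counter stands in for "some way to keep track of your identifiers"; any scheme that guarantees a string is never reused would serve equally well.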

Name spaces and prefixes

The part of the URI that makes the identifier unique, usually a domain name but possibly a domain plus sub-directories, is called the "name space." Because writing out the entire long identifier takes up a lot of space (and, for programmers, is far too much to type each time an identifier is used), you will often see identifiers that look like this:

    dcterms:title
    foaf:name

This is a shortened form of an identifier that is commonly used in data and programs. It replaces the name space with a shorter form, and can only be used in the context of a data set or a program in which that short form has been declared. To use such a shortened form, your data must declare the long form of the identifier before using the shortened version. How this long form is declared depends on the data format that you are using, but here are some examples:

    xmlns:dcterms='http://purl.org/dc/terms/' (in XML)
    @prefix foaf: <http://xmlns.com/foaf/0.1/> . (in Turtle)

There are some common name spaces, like the two shown here, that you will see often, and by habit their shortened forms are used in many different environments. They do still have to be declared for machine use, but short examples in documentation sometimes leave off the formal declaration.
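To make the relationship between the short and long forms concrete, here is a minimal Python sketch that expands a shortened identifier back into its full URI. The function name expand and the PREFIXES table are assumptions made for this illustration; real RDF software handles the declarations for you, but the principle is the same: a short form is only usable once its prefix has been declared.

```python
# Declared prefixes: short form -> name space (the two shown in the text).
PREFIXES = {
    "dcterms": "http://purl.org/dc/terms/",
    "foaf": "http://xmlns.com/foaf/0.1/",
}


def expand(short_form, prefixes=PREFIXES):
    """Return the full URI for a 'prefix:name' string (hypothetical helper)."""
    prefix, colon, local_name = short_form.partition(":")
    # An undeclared prefix cannot be expanded, so fail loudly.
    if not colon or prefix not in prefixes:
        raise KeyError(f"undeclared prefix in {short_form!r}")
    return prefixes[prefix] + local_name


print(expand("dcterms:title"))
print(expand("foaf:name"))
```

Calling expand with a prefix that is not in the table raises an error, which mirrors what a parser does when data uses a short form without declaring it.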