Some great claims have been made about the semantic Web. With the recent arrival of C# class libraries for working with its various layers, it has become easy to explore this 'greatness' in practice and make up your own mind. One good way to do so is to show the effect the semantic Web can have on search results once data has been semantically structured. We are going to use music genre as a semantically rich information source that, when related to music files, improves search and opens up a whole new Web of potential applications.
I'm sure you have all read plenty of background on what the semantic Web is, along with a huge number of definitions, so I will try not to add too much to your load. The semantic Web is a big mesh of information linked in a way that makes it machine processable on a global scale. You can think of it either as a new way of representing data on top of the World Wide Web or as a globally linked database; your choice. With some recent class libraries developed in C#, it is not only possible to implement but ridiculously straightforward.
The first main difference between the Web and the semantic Web is the use of URIs, a more general form of URLs, the format in which Web addresses are written (http://www.codeproject.com/). URIs, unlike URLs, do not necessarily retrieve information; they are used to uniquely identify information on the semantic Web, much as URLs are used to uniquely retrieve information on the current Web. The second major difference is that the semantic Web is built on RDF rather than XHTML, although RDF, like XHTML, is usually written in XML. This means we are only ever talking about data representation and not presentation. Sorry if you have confused the semantic Web with Web 2.0, but we will not cover those things here.
RDF is used to create metadata about files and objects on the semantic or existing Web. It is much like the metadata you include at the top of a current Web page or in a music file today, except that each item is uniquely identified with a URI instead of being just a string of text characters you type in. For example, to say an audio file was performed by 'Chris', you break this metadata statement into three parts: song1 URI, performer URI, Chris URI. All metadata statements, no matter how complex, can be broken down into these three parts: subject, predicate and value, respectively (RDF itself calls the value the object).
However, to be able to do anything useful with these statements, you must be able to define relationships. For example, suppose there is another metadata statement: Chris URI (subject), CanOnlyPlay URI (predicate), Drums URI (value). Using these two statements, what Chris plays and the songs he plays on can be combined to attach the additional statement that Chris played drums on song1. This creation of new statements, done by relating multiple metadata statements, often from multiple sources, is known as inference. These relationships are defined at the ontology or RDF Schema level of the semantic Web in the diagram above.
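To make the three-part idea concrete, here is a minimal sketch in plain C# that does not use any semantic Web library. The namespace http://www.example.org/music# and the names performer, canOnlyPlay and featuresInstrument are hypothetical, made up purely for illustration, and the join in Derive is a hand-written rule rather than anything RDF defines for you.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class TripleDemo
{
    // Hypothetical namespace and names, used for illustration only.
    const string EX = "http://www.example.org/music#";
    public const string Song1 = EX + "song1";
    public const string Chris = EX + "Chris";
    public const string Drums = EX + "Drums";
    public const string Performer = EX + "performer";
    public const string CanOnlyPlay = EX + "canOnlyPlay";
    public const string FeaturesInstrument = EX + "featuresInstrument";

    // The two metadata statements from the text, each as subject, predicate, value.
    public static List<(string Subject, string Predicate, string Value)> Asserted()
    {
        return new List<(string Subject, string Predicate, string Value)>
        {
            (Song1, Performer, Chris),
            (Chris, CanOnlyPlay, Drums),
        };
    }

    // Joins "song performer X" with "X canOnlyPlay I" to derive "song featuresInstrument I".
    public static List<(string Subject, string Predicate, string Value)> Derive(
        List<(string Subject, string Predicate, string Value)> statements)
    {
        var derived = new List<(string Subject, string Predicate, string Value)>();
        foreach (var performed in statements.Where(s => s.Predicate == Performer))
            foreach (var plays in statements.Where(
                s => s.Predicate == CanOnlyPlay && s.Subject == performed.Value))
                derived.Add((performed.Subject, FeaturesInstrument, plays.Value));
        return derived;
    }

    public static void Main()
    {
        foreach (var s in Derive(Asserted()))
            Console.WriteLine(s.Subject + " " + s.Predicate + " " + s.Value);
    }
}
```

Because every part of a statement is a URI, the join only has to compare strings: the value of the first statement matches the subject of the second, and that is all the "common sense" needed to attach the new statement to song1.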
To illustrate this, we are going to build a music system that uses the semantic Web to put a small amount of common-sense inference into a music search application. The idea is to give the semantic Web enough information to relate music files through the statements made about their genres. I will keep this simple for the purpose of this article, although you could add some really powerful functionality by defining complex relationships between statements. We will be using RDF Schema (RDFS) to define the relationships between statements. This is done by relating the URI strings in their subject, predicate and value parts using basic relationships such as subClassOf. We will then use a C# class library to load the statements from multiple RDF documents that describe several music files, and to load the relationships defined in the RDF Schema document. The C# class library will then be used to create new statements (inference).
RDFS inference is not a complex task. It is a small, fixed set of rules (about a dozen) that are applied repeatedly to the list of statements already made, adding new statements until no more can be added. Each rule takes the simple form: if this subject, predicate and value are present, then you can add this additional subject, predicate and value. All you need to do is keep looping through the rules until you stop adding statements, at which point inference is complete. The statements added in this way are said to be entailed by the originals. Very simple!
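The loop described above can be sketched in a few lines of plain C#. This is only an illustration under some assumptions: it implements just two of the RDFS rules (a type is inherited through subClassOf, and subClassOf is transitive), it uses the short names rdf:type and rdfs:subClassOf in place of full URIs, and the ex: names are hypothetical. It is not how SemWeb or any other library implements inference.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class EntailmentDemo
{
    const string Type = "rdf:type";
    const string SubClassOf = "rdfs:subClassOf";

    // Applies two RDFS rules over and over until a full pass adds nothing new.
    public static HashSet<(string S, string P, string O)> Closure(
        IEnumerable<(string S, string P, string O)> asserted)
    {
        var facts = new HashSet<(string S, string P, string O)>(asserted);
        bool added = true;
        while (added)
        {
            added = false;
            var snapshot = facts.ToList(); // iterate a copy while adding to the set
            foreach (var a in snapshot)
                foreach (var b in snapshot)
                {
                    // Both rules need a statement "C subClassOf D" whose C is the value of a.
                    if (b.P != SubClassOf || a.O != b.S) continue;
                    // (x type C), (C subClassOf D)       => (x type D)
                    // (B subClassOf C), (C subClassOf D) => (B subClassOf D)
                    if (a.P == Type || a.P == SubClassOf)
                        added |= facts.Add((a.S, a.P, b.O));
                }
        }
        return facts;
    }

    public static void Main()
    {
        // Hypothetical data: one track and a two-level genre hierarchy.
        var closure = Closure(new[]
        {
            ("ex:track1", Type, "ex:jazzfunk"),
            ("ex:jazzfunk", SubClassOf, "ex:jazz"),
            ("ex:jazz", SubClassOf, "ex:music"),
        });
        foreach (var t in closure)
            Console.WriteLine(t.S + " " + t.P + " " + t.O);
    }
}
```

HashSet.Add returns false when a statement is already present, so the added flag only stays true while a pass produces something new, which is exactly the stopping condition described in the text.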
You can then query this newly created list of statements by matching on different subject-predicate-value combinations. For example, if I want to find out what Chris played, I might ask something like Chris (subject), played (predicate), null (value). It couldn't be easier: the query is interpreted as 'which URI value is connected to the URI representing the subject Chris by the predicate played?'
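A pattern query of this kind is easy to sketch in plain C#. In the sketch below, null acts as a wildcard in any of the three positions. The Select helper and the ex: names are hypothetical and exist only for this illustration; a real library will offer its own, richer query interface.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class QueryDemo
{
    // Returns every statement that matches the pattern; null matches anything.
    public static IEnumerable<(string S, string P, string O)> Select(
        IEnumerable<(string S, string P, string O)> statements,
        string s, string p, string o)
    {
        return statements.Where(t =>
            (s == null || t.S == s) &&
            (p == null || t.P == p) &&
            (o == null || t.O == o));
    }

    public static void Main()
    {
        // Hypothetical statements, for illustration only.
        var statements = new List<(string S, string P, string O)>
        {
            ("ex:Chris", "ex:played", "ex:Drums"),
            ("ex:Anna", "ex:played", "ex:Bass"),
            ("ex:song1", "ex:performer", "ex:Chris"),
        };

        // "What did Chris play?": subject and predicate fixed, value left open.
        foreach (var t in Select(statements, "ex:Chris", "ex:played", null))
            Console.WriteLine(t.O);
    }
}
```

Leaving a different position null asks a different question with the same code: Select(statements, null, "ex:played", "ex:Drums") would ask who played drums.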
SemWeb, a wonderfully easy-to-use semantic Web library for C#, has all the capabilities needed to illustrate the huge range of potential music applications of the semantic Web.
The RDFS document supplied in the zip file contains genre and sub-genre relationships. The ontology is simplistic, not containing huge amounts of musical genre relationships, but could be easily expanded if needed.
Working with a Semantic Web Class Library
The example code below is from the sample ASP.NET C# application. It was written in Visual Studio .NET 2005 using C# 2.0 but should work on previous versions. It requires the SemWeb class library produced by Joshua Tauberer, which is easy to use and simply does what it says on the tin. The code first loads two RDF documents, music.xml and tracks.xml. Both contain statements about music files, which could be located anywhere on the Internet. It then sets up the strings to be used for the URIs, loads the RDF Schema document (genre.xml) and creates an inference object, passing in the schema document and the statements from the two documents. This makes the new statements available, which are then queried to produce the results. Two queries are made: one against the store of statements after inference has taken place, the other against the store of original statements before any have been added through inference.
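// Requires a reference to the SemWeb library and "using SemWeb;".
Store store = new MemoryStore();   // statements asserted in the music documents
Store store2 = new MemoryStore();  // receives the results of the inferred query
// Load the two RDF documents described above. The Load and Import calls are reconstructed from that description; adjust the paths to suit your site.
System.Xml.XmlDocument doc = new System.Xml.XmlDocument();
doc.Load("music.xml");
store.Import(new RdfXmlReader(doc));
System.Xml.XmlDocument doc3 = new System.Xml.XmlDocument();
doc3.Load("tracks.xml");
store.Import(new RdfXmlReader(doc3));
const string RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#";
const string GEN = "http://www.example.org/genre.xml";
Entity rdftype = RDF + "type";
Entity GENjazz = GEN + "#jazz";
Entity GENjazzfunk = GEN + "#jazzfunk";
// Load the RDF Schema document and wrap the asserted statements in an RDFS inference object.
System.Xml.XmlDocument doc2 = new System.Xml.XmlDocument();
doc2.Load("genre.xml");
SemWeb.Inference.RDFS rdfs = new SemWeb.Inference.RDFS(new RdfXmlReader(doc2), store);
// Query 1: everything of type jazz, with inference applied; matches are written to store2.
rdfs.Select(new Statement(null, rdftype, GENjazz), store2);
foreach (Statement s in store2.Select(new Statement(null, null, null)))
    output1.Text += s.Subject.Uri + " " + s.Predicate.Uri + " " + s.Object.Uri + "\n";
output1.Text += "\n";
// Query 2: everything of type jazzfunk, straight from the asserted statements.
foreach (Statement s in store.Select(new Statement(null, rdftype, GENjazzfunk)))
    output2.Text += s.Subject.Uri + " " + s.Predicate.Uri + " " + s.Object.Uri + "\n";
output2.Text += "\n";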
The two outputs should list the same tracks. The first query asks for all music of type jazz, which, because of the new statements added by the inference steps, now includes every sub-genre of jazz as well. However, the only jazz pieces in the data are those asserted to have the type jazzfunk, which the inference steps have additionally labelled as a type of jazz, so the query returns just the jazz funk tracks. The second query runs against the original statements and returns all jazzfunk tracks, which are the same tracks found by the first query. Well done, you have just built a semantic Web music application.
More Complex Music Applications for the Semantic Web
This is an incredibly simple application of the semantic Web, and one which requires little imagination to see its advantages. However, let us discuss some more music applications for the semantic Web. With a little thought, you can see this is just a few stone's throws away from a recommender system and the many collaborative filtering systems that we see about now. However, it is the connection of this type of information source that really makes it interesting. MusicBrainz is probably the largest and most useful example, containing huge amounts of metadata as RDF statements about music that have many applications, including recommenders. Some of it is licensed under a Creative Commons public license, meaning that it is free for commercial and non-commercial work, whereas some of it is licensed under a Creative Commons license that restricts those parts to non-commercial work.
MusicBrainz gives you a collection of RDF classes that can be used as an impressive bootstrap onto an equally impressive repository of information. This, together with Web service access to the information, makes it the first semantic Web service that has a musical application. MusicBrainz also has the ability to relate the RDF assertions it holds to an audio CD, much like the services you have experienced in iTunes: place a CD in your computer and you are presented with album art, track names, etc. MusicBrainz, however, takes this to a whole new level thanks to the additional information it has.