Ontology, Notation 3, and SPARQL

Lutosław

18 Feb 2011, CPOL, 10 min read

Introduction to the concept of ontology, Notation 3, and SPARQL.

Download sample project (dotNetRdf 0.31 alpha included) - 1 MB

Ontology dimensions map

Introduction

In computer science and information science, an ontology is a formal representation of knowledge as a set of concepts within a domain and the relationships between those concepts. It is used to reason about the entities within that domain, and may be used to describe the domain (Wikipedia).

This article covers the following topics:

Storing ontologies using Notation 3 (N3) syntax.
Using SPARQL queries to retrieve data from an ontology.
Reading N3 and executing SPARQL with C# code.

Notation 3

The basics

As said in the introduction, an ontology is a set of relationships between concepts. Such a set can be mapped to a set of triples. Each triple is built of three parts: subject, predicate, and object.

For example, in the triple "(Peter) (suffers) (acrophobia)", (Peter) is the subject, (suffers) is the predicate, and (acrophobia) is the object. Using Notation 3 syntax, it can be written as:

my:Peter my:suffers my:acrophobia .

Note the my: prefix and the dot at the end of the triple. That dot is easy to forget, so check for it first when a syntax error is reported. The extra space placed between the object part and the dot eliminates ambiguity which may appear in some cases.
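
To make the subject/predicate/object structure concrete, here is a minimal C# sketch which models a triple as a plain tuple and prints it in N3 form. The TripleDemo class and the ToN3 helper are hypothetical names made up for this illustration; they are not part of dotNetRDF or any other library.

```csharp
using System;

public static class TripleDemo
{
    // Formats a (subject, predicate, object) tuple as a single N3 statement.
    // A space is written before the terminating dot, as recommended above.
    public static string ToN3((string Subject, string Predicate, string Object) t)
        => $"{t.Subject} {t.Predicate} {t.Object} .";

    public static void Main()
    {
        var triple = (Subject: "my:Peter", Predicate: "my:suffers", Object: "my:acrophobia");
        Console.WriteLine(ToN3(triple)); // my:Peter my:suffers my:acrophobia .
    }
}
```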

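For convenience, statements can be grouped: a semicolon separates predicate/object pairs which share the same subject, and a comma separates objects which share both the subject and the predicate.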

my:Peter a my:person, my:boy;
    my:suffers my:acrophobia, my:insomnia, my:xenophobia;
    my:name "Peter";
    my:likes my:Kate .

my:Mark a my:person, my:boy;
    my:suffers my:insomnia;
    my:name "Mark" .

my:Kate a my:person, my:girl;
    my:name "Kate" .

Peter is a boy who suffers a lot. His name is Peter and he likes Kate. Mark is a boy as well and he suffers from insomnia only. Kate doesn't suffer from anything but she is a girl so it doesn't count.
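
The grouped notation is only shorthand: each comma or semicolon still stands for a separate triple. The sketch below expands the statements about Mark into flat triples. It is an illustration under my own assumptions (the GroupingDemo class and Expand method are hypothetical), not the way an actual N3 parser is implemented.

```csharp
using System;
using System.Collections.Generic;

public static class GroupingDemo
{
    // Expands "subject p1 o1, o2; p2 o3 ." into one flat triple per object.
    public static List<(string S, string P, string O)> Expand(
        string subject, params (string Predicate, string[] Objects)[] groups)
    {
        var triples = new List<(string S, string P, string O)>();
        foreach (var group in groups)           // ';' separates predicate groups
            foreach (var obj in group.Objects)  // ',' separates objects
                triples.Add((subject, group.Predicate, obj));
        return triples;
    }

    public static void Main()
    {
        var triples = Expand("my:Mark",
            ("a", new[] { "my:person", "my:boy" }),
            ("my:suffers", new[] { "my:insomnia" }),
            ("my:name", new[] { "\"Mark\"" }));

        // Prints four statements, one per line.
        foreach (var t in triples)
            Console.WriteLine($"{t.S} {t.P} {t.O} .");
    }
}
```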

"Peter", "Mark" and "Kate" are literal nodes. Literal nodes store the values which are used in an application. They can be of different types: strings, integers, dates, and other XSD types [1]. The type of a literal can be specified explicitly with an XSD type name: "42"^^xsd:integer is the same as 42. Textual literal nodes can also specify a language: "Peter"@en.

There are four special predicates which have standardized URIs and are expanded automatically by a parser: a (rdf:type), = (owl:sameAs), => (log:implies), and <= (log:implies in the inverse direction). Additionally, any predicate may itself be used as the subject or the object of another triple.

my:suffers a my:uncomfortablePredicate .

Every identifier should be globally unique. This is achieved by simply using URIs (Uniform Resource Identifiers) as names. All previous examples used the QName notation, which requires a prefix to be defined. Prefixes are defined using the @prefix keyword.

@prefix my: <http://my.ontology#> .
@prefix : <n3_notation#> .

If the prefix name is left empty (as in the second line), the URI is used as the default namespace. However, inside the body of the N3 file, the colon must still be present.

# using default namespace
:Peter :suffers :acrophobia .

The above triple will be expanded to:

<n3_notation#Peter> 
    <n3_notation#suffers>
    <n3_notation#acrophobia> .
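
The expansion shown above is a simple string substitution. A rough sketch of it in C# could look as follows; the PrefixDemo class and its Expand method are hypothetical names of my own, and a real parser also has to handle relative URIs, escaping, and keywords such as a.

```csharp
using System;
using System.Collections.Generic;

public static class PrefixDemo
{
    // Expands a QName such as "my:Peter" or ":Peter" to a full URI in angle brackets.
    public static string Expand(string qname, IReadOnlyDictionary<string, string> prefixes)
    {
        int colon = qname.IndexOf(':');
        if (colon < 0)
            throw new ArgumentException("A QName must contain a colon.", nameof(qname));

        string prefix = qname.Substring(0, colon);      // "" selects the default namespace
        string localName = qname.Substring(colon + 1);

        if (!prefixes.TryGetValue(prefix, out string ns))
            throw new KeyNotFoundException($"Undefined prefix '{prefix}:'.");

        return $"<{ns}{localName}>";
    }

    public static void Main()
    {
        var prefixes = new Dictionary<string, string>
        {
            ["my"] = "http://my.ontology#",
            [""] = "n3_notation#",
        };
        Console.WriteLine(Expand("my:Peter", prefixes)); // <http://my.ontology#Peter>
        Console.WriteLine(Expand(":Peter", prefixes));   // <n3_notation#Peter>
    }
}
```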

Semantic web concept

The Semantic Web is a group of methods and technologies that allow machines to understand the meaning, or "semantics", of information on the World Wide Web. Sharing knowledge is the aim of the Semantic Web concept. The only way to achieve this is to standardize the vocabulary used in ontologies. There are several sources of predefined vocabularies (see Semantic publishing). One of them is the Dublin Core Metadata Initiative (DCMI Metadata Terms). All you have to do is add a prefix to your N3 file:

@prefix dc:  <http://purl.org/dc/elements/1.1/> .

Then, dc:title refers to a widely understood concept of a title.

Blank nodes

If you are not interested in naming a node, you can use a blank node (hurray!). A blank node is a node which has no name (URI) of its own. It describes something which satisfies the given criteria. In the following example, a blank node [a :girl; :name "Kate"] is used as an object in a triple.

:Peter :likes [a :girl;
                :name "Kate"] .

Peter likes a girl named Kate.

Note: the parser will not automatically look for an existing node which matches the blank node (such as :Kate). Instead, a new unnamed node will be created. Another example:

[a :geek;
    :suffers :insomnia] :refreshes :lounge .

A geek who suffers insomnia keeps refreshing the Lounge forum.
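
The fact that every [...] creates a brand new node can be sketched in a few lines of C#. The BlankNodeDemo class below is a hypothetical toy store of my own, and the _:b1 style labels are just an assumed naming scheme; a real parser generates its own internal identifiers.

```csharp
using System;
using System.Collections.Generic;

public sealed class BlankNodeDemo
{
    private int _counter;

    public List<(string S, string P, string O)> Triples { get; } =
        new List<(string S, string P, string O)>();

    // Creates a fresh label for every [...]; existing nodes are never looked up or reused.
    public string NewBlankNode(params (string P, string O)[] properties)
    {
        string label = $"_:b{++_counter}";
        foreach (var (p, o) in properties)
            Triples.Add((label, p, o));
        return label;
    }

    public static void Main()
    {
        var demo = new BlankNodeDemo();

        // :Kate a :girl; :name "Kate" .
        demo.Triples.Add((":Kate", "a", ":girl"));
        demo.Triples.Add((":Kate", ":name", "\"Kate\""));

        // :Peter :likes [a :girl; :name "Kate"] .
        string liked = demo.NewBlankNode(("a", ":girl"), (":name", "\"Kate\""));
        demo.Triples.Add((":Peter", ":likes", liked));

        Console.WriteLine(liked);              // _:b1 (not :Kate)
        Console.WriteLine(demo.Triples.Count); // 5
    }
}
```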

Additional resources

N3 resources and tutorials are listed in Notation 3 Resources. For beginners, I recommend this one: Primer: Getting into RDF & Semantic Web using N3 [2]. More advanced techniques are included in an article about Notation 3 [3].

Advantages of using ontologies

An ontology is a flexible way to store data. There is no schema in the relational-database sense: a vocabulary is defined, but there are no fixed columns which have to be filled with data. New properties can be added at any time without breaking compatibility. The downside is that there is no way to make a particular property "required". The table below gives a rough translation between the terms used to describe relational databases and ontologies.

Relational database	Ontology
row	subject
column	predicate
table data	literal nodes

Additionally, an ontology can define relations between high-level concepts. In relational databases, this is achieved with foreign keys. We have got used to them, but they are still not a natural solution. Consider a table of fruits. Each fruit has a colour stored in a Color field which is a foreign key to a Colors table. If we found an apple with a second colour, we would either have to denormalize the table and add new columns (Color2, Color3), or create a separate table, define foreign keys, modify the existing SQL queries, and mess everything up. Using N3, the additional colour can be expressed intuitively:

[ a :fruit;
    :name "apple";
    :color :red;
    :color :green ].

SPARQL

Selecting

SPARQL is a query language for ontologies. The best source of knowledge about SPARQL is the W3C Working Draft SPARQL 1.1 Query Language [4], which includes a very good tutorial. Therefore, consider this chapter a commentary on that tutorial. The first query returns the names of all people who suffer from insomnia. It uses a syntax similar to SQL and contains a blank node inside the WHERE clause.

PREFIX my: <http://my.ontology#>
SELECT ?name
WHERE
{
    [ a my:person;
      my:suffers my:insomnia;
      my:name ?name].
}
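
For readers who think in LINQ, the query above can be read as a self-join over the set of triples: each line of the blank node becomes one triple pattern, and the shared (unnamed) subject joins them. The sketch below is only an analogy under my own assumptions; the SelectDemo class and its in-memory data are hypothetical and this is not how a SPARQL engine actually evaluates queries.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class SelectDemo
{
    // The sample ontology from the Notation 3 chapter, reduced to the relevant triples.
    public static readonly (string S, string P, string O)[] Data =
    {
        ("my:Peter", "a", "my:person"), ("my:Peter", "my:suffers", "my:insomnia"),
        ("my:Peter", "my:name", "Peter"),
        ("my:Mark", "a", "my:person"), ("my:Mark", "my:suffers", "my:insomnia"),
        ("my:Mark", "my:name", "Mark"),
        ("my:Kate", "a", "my:person"), ("my:Kate", "my:name", "Kate"),
    };

    // One "from ... where" per triple pattern; comparing subjects joins the patterns.
    public static List<string> NamesOfInsomniacs() =>
        (from isPerson in Data
         where isPerson.P == "a" && isPerson.O == "my:person"
         from suffers in Data
         where suffers.S == isPerson.S && suffers.P == "my:suffers" && suffers.O == "my:insomnia"
         from name in Data
         where name.S == isPerson.S && name.P == "my:name"
         select name.O).ToList();

    public static void Main() =>
        Console.WriteLine(string.Join(", ", NamesOfInsomniacs())); // Peter, Mark
}
```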

The next query returns the names of people who suffer insomnia or acrophobia, but only those whose names start with a 'J' character.

PREFIX my: <http://my.ontology#>
SELECT ?name
WHERE {
    {
        [ a my:person;
          my:suffers my:insomnia;
          my:name ?name].
        FILTER regex(?name, "^J").
    }
    UNION
    {
        [ a my:person;
          my:suffers my:acrophobia;
          my:name ?name].
        FILTER regex(?name, "^J").
    }
}

SPARQL supports keywords like (NOT) EXISTS, UNION, OPTIONAL, DISTINCT, MINUS, GROUP BY, HAVING. Available aggregates are: COUNT, SUM, MIN, MAX, AVG, GROUP_CONCAT, and SAMPLE. Moreover, XPath functions [5] (e.g., string manipulation functions such as concat) can be used where the query engine supports them. They are conventionally referenced with the "fn:" prefix, while the "xsd:" prefix is used for XSD type casts such as xsd:integer(?x). Here is an example from the W3C Working Draft:

PREFIX : <http://books.example/>
SELECT (SUM(?lprice) AS ?totalPrice)
WHERE {
  ?org :affiliates ?auth .
  ?auth :writesBook ?book .
  ?book :price ?lprice .
}
GROUP BY ?org
HAVING (SUM(?lprice) > 10)
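
The GROUP BY and HAVING part of this query maps quite directly onto LINQ. In the sketch below, the Solutions array stands for the rows produced by the WHERE clause; the organisation names and prices are values I assumed for illustration, and the AggregateDemo class is hypothetical. Unlike the query, which selects only ?totalPrice, the sketch keeps the organisation next to its total to make the output easier to follow.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class AggregateDemo
{
    // Assumed sample rows, one per (?org, ?lprice) solution of the WHERE clause.
    public static readonly (string Org, decimal Price)[] Solutions =
    {
        ("org1", 9m), ("org1", 5m), ("org1", 7m),
        ("org2", 7m),
    };

    public static Dictionary<string, decimal> TotalsAboveTen() =>
        Solutions
            .GroupBy(s => s.Org)                                     // GROUP BY ?org
            .Select(g => (Org: g.Key, Total: g.Sum(s => s.Price)))   // SUM(?lprice)
            .Where(x => x.Total > 10m)                               // HAVING (SUM(?lprice) > 10)
            .ToDictionary(x => x.Org, x => x.Total);

    public static void Main()
    {
        foreach (var pair in TotalsAboveTen())
            Console.WriteLine($"{pair.Key}: {pair.Value}"); // org1: 21
    }
}
```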

A first-class object which may reside in a WHERE clause is a graph pattern. A graph pattern is a set (not a sequence!) of conditions which a queried graph must fulfill. The whole WHERE block is itself a graph pattern. A graph pattern can contain the following objects:

Triple patterns -- written similarly to N3;
Property paths -- a shorthand notation which makes it possible to skip intermediate variables, e.g., ?org :affiliates/:writesBook/:price ?lprice. (chapter 9);
Variables -- identifiers preceded with question marks (e.g., ?totalPrice);
Filter operators -- used to perform more advanced filtering than just triple patterns;
Graph pattern modifiers -- like OPTIONAL and MINUS;
Nested graph patterns.
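
The sequence property path from the list above (:affiliates/:writesBook/:price) is equivalent to a chain of triple patterns whose intermediate variables are thrown away. The following sketch follows such a path hop by hop over an in-memory array. The PathDemo class, its Follow method, and the three sample triples are my own assumptions, and only the simple "/" operator is covered.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class PathDemo
{
    public static readonly (string S, string P, string O)[] Data =
    {
        (":org1", ":affiliates", ":auth1"),
        (":auth1", ":writesBook", ":book1"),
        (":book1", ":price", "9"),
    };

    // Follows a sequence path p1/p2/.../pn, one hop per predicate,
    // and returns the (start, end) pairs connected by the whole path.
    public static List<(string Start, string End)> Follow(params string[] path)
    {
        // Start with every subject paired with itself, then advance hop by hop.
        IEnumerable<(string Start, string End)> frontier =
            Data.Select(t => (Start: t.S, End: t.S)).Distinct().ToList();

        foreach (string predicate in path)
        {
            frontier = (from f in frontier
                        from t in Data
                        where t.S == f.End && t.P == predicate
                        select (Start: f.Start, End: t.O)).ToList();
        }
        return frontier.Distinct().ToList();
    }

    public static void Main()
    {
        foreach (var (start, end) in Follow(":affiliates", ":writesBook", ":price"))
            Console.WriteLine($"{start} -> {end}"); // :org1 -> 9
    }
}
```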

Variables are used to link triple patterns. If a bound variable is listed just after the SELECT keyword, it becomes part of the result set.

The capabilities of a WHERE clause include most of Notation 3 plus some additional features, as shown in the diagram. N-Triples and Turtle are other formats not covered by this article. RDF/XML is a mainstream RDF data format; however, it is very hard to read or write for a "standard" human.

Filters

The FILTER keyword is used to narrow the result set. It takes a boolean expression and acts as another condition in a WHERE clause. Various types of expressions may be passed to FILTER, including arithmetic and string comparisons, filter operators, and any other expression which yields a sensible true/false value. Examples:

FILTER (?price > 10)
FILTER regex(?name, "^Smith", "i")
FILTER EXISTS { ?person my:email ?anyemail }

The (NOT) EXISTS { graph pattern } filter operator returns true or false depending on whether the pattern matches the current query solution.

Graph pattern modifiers

The MINUS { graph pattern } statement removes matching solutions from the current set of query solutions, which boils down to a standard set subtraction. To perform this operation, both graph patterns have to be evaluated, since the MINUS operator works only on matches which have already been found. Note that this behavior differs from that of the NOT EXISTS operator.

The OPTIONAL modifier enables working with incomplete datasets. Consider a query which was supposed to fetch a list of all people, taking their two attributes: name and a phone number.

SELECT ?name ?phone
WHERE {
    [ :name ?name;
      :phone ?phone ].
}

And data:

:A :name "John"; :phone 1234 .
:B :name "Mark" .

The result would be a set containing only one pair: { {name="John", phone=1234 } }. That's because _ :phone ?phone. requires every match to have both a name and a phone defined. If a complete list with incomplete entries is preferred over an incomplete list of complete entries, then the OPTIONAL modifier should be used.

SELECT ?name ?phone
WHERE {
    ?person :name ?name .
    OPTIONAL { ?person :phone ?phone . }
}
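
In SQL terms, OPTIONAL behaves like a LEFT OUTER JOIN. The sketch below expresses the same idea with LINQ over the two-person data set used above: every name is kept and a missing phone simply stays null (unbound). The OptionalDemo class is a hypothetical helper, and literals are stored as plain strings to keep the example short.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class OptionalDemo
{
    public static readonly (string S, string P, string O)[] Data =
    {
        (":A", ":name", "John"), (":A", ":phone", "1234"),
        (":B", ":name", "Mark"),
    };

    // Left outer join: every ?name row survives; ?phone is null when no triple matches.
    public static List<(string Name, string Phone)> NamesWithOptionalPhones() =>
        (from name in Data
         where name.P == ":name"
         join p in Data.Where(t => t.P == ":phone") on name.S equals p.S into phones
         from phone in phones.Select(t => t.O).DefaultIfEmpty()
         select (Name: name.O, Phone: phone)).ToList();

    public static void Main()
    {
        foreach (var (name, phone) in NamesWithOptionalPhones())
        {
            string shown = phone ?? "(unbound)";
            Console.WriteLine($"{name}: {shown}");
        }
        // John: 1234
        // Mark: (unbound)
    }
}
```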

Resultset flood

For a SQL developer, it is obvious that adding another condition to a WHERE block narrows the result set. SPARQL is different. If the added condition shares no variables with the earlier conditions, then a Cartesian product is returned. Think of it as adding another table to a SQL-ish FROM clause, which increases the number of results. Here is a very straightforward example. The pattern:

{ ?x ?y ?z }

gives N results, whereas:

{
    ?x ?y ?z .
    ?a ?b ?c .
}

gives N*M results, where M is the number of matches of the second triple pattern. In this particular case N=M, though, so N*N results are returned (thanks to Rob Vesse for explaining that).
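
The effect is easy to reproduce with two nested from clauses in LINQ. The sketch below counts the solutions of both patterns over three assumed sample triples; FloodDemo is a hypothetical class used only for this illustration.

```csharp
using System;
using System.Linq;

public static class FloodDemo
{
    public static readonly (string S, string P, string O)[] Data =
    {
        (":A", ":name", "John"), (":A", ":phone", "1234"), (":B", ":name", "Mark"),
    };

    // { ?x ?y ?z } matches every triple exactly once.
    public static int SinglePatternCount() => Data.Length;

    // { ?x ?y ?z . ?a ?b ?c . } shares no variables, so every pair of triples is a solution.
    public static int DisjointPatternsCount() =>
        (from first in Data
         from second in Data
         select (first, second)).Count();

    public static void Main()
    {
        Console.WriteLine(SinglePatternCount());    // 3
        Console.WriteLine(DisjointPatternsCount()); // 9
    }
}
```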

Updating

In SPARQL 1.1, update operations were introduced. They are nicely presented in the W3C document SPARQL 1.1 Update [6]. The INSERT clause adds new triples, and the DELETE clause removes existing ones. Each of them comes in two varieties. INSERT DATA and DELETE DATA add or remove triples which are given explicitly.

PREFIX my: <http://my.ontology#>
DELETE DATA
{ my:Mark my:likes my:Kate . }

This is convenient when the affected triples have already been chosen in the application layer. On the other hand, the INSERT and DELETE clauses take a WHERE clause which selects the affected triples. For example, the following statement ensures that nobody likes people who are xenophobic:

PREFIX my: <http://my.ontology#>

DELETE
 { ?x my:likes ?y . }
WHERE
 { ?x a my:person .
   ?y a my:person; my:suffers my:xenophobia .
 }
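
Conceptually, the statement first evaluates the WHERE clause and only then removes the triples built from the DELETE template. The sketch below mimics that two-step behaviour on an in-memory list. It is an illustration under my own assumptions (the DeleteDemo class and the sample data are hypothetical), not the algorithm of any particular SPARQL engine.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class DeleteDemo
{
    // Removes every "?x my:likes ?y" triple where ?x is a person
    // and ?y is a person who suffers from xenophobia. Returns the number removed.
    public static int DeleteLikesOfXenophobes(List<(string S, string P, string O)> data)
    {
        bool IsPerson(string node) => data.Contains((node, "a", "my:person"));

        // Step 1: evaluate the WHERE clause completely before touching the data.
        var matched = data
            .Where(t => t.P == "my:likes" && IsPerson(t.S) && IsPerson(t.O)
                        && data.Contains((t.O, "my:suffers", "my:xenophobia")))
            .ToList();

        // Step 2: remove the matched triples.
        foreach (var triple in matched)
            data.Remove(triple);
        return matched.Count;
    }

    public static void Main()
    {
        var data = new List<(string S, string P, string O)>
        {
            ("my:Peter", "a", "my:person"), ("my:Peter", "my:suffers", "my:xenophobia"),
            ("my:Kate", "a", "my:person"),
            ("my:Kate", "my:likes", "my:Peter"), ("my:Peter", "my:likes", "my:Kate"),
        };
        Console.WriteLine(DeleteLikesOfXenophobes(data)); // 1
        Console.WriteLine(data.Count);                    // 4
    }
}
```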

The INSERT "query" below asserts that everybody loves girls -- excluding the girls themselves...

PREFIX my: <http://my.ontology#>

INSERT
 { ?x my:loves ?y . }
WHERE
 {
   ?x a my:person .    
   MINUS {?x a my:girl}

   ?y a my:person, my:girl .
    
 }

The difference between the NOT EXISTS and MINUS keywords is explained well in the W3C Working Draft mentioned above [4].

Using the code

dotNetRDF [7] is a great and free .NET library for RDF (Resource Description Framework), developed and maintained by Rob Vesse. Using it is extremely simple. Reading an N3 file can be done in just three lines of code. The following snippet loads a file ontology.n3 to an in-memory structure Graph.

using VDS.RDF;
using VDS.RDF.Parsing;

(...)

var parser = new Notation3Parser();
var graph = new Graph();
parser.Load(graph, @"n3\ontology.n3");

SPARQL queries can be executed with the ExecuteQuery method.

string getInsomnia =
@"
PREFIX my: <n3_notation#>
SELECT ?name
WHERE {
    [ a my:person;
        my:suffers my:insomnia;
        my:name ?name] .
}";

(...)

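// SparqlResultSet and SparqlResult live in the VDS.RDF.Query namespace.
var resultSet = graph.ExecuteQuery(getInsomnia) as SparqlResultSet;
if (resultSet != null)
{
    foreach (SparqlResult result in resultSet)
    {
        // Format items are zero-based, so the only argument is {0}.
        Console.WriteLine("{0}", result["name"]);
    }
}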

The ExecuteQuery method runs the getInsomnia query against a loaded ontology. The returned SparqlResultSet consists of a number of SparqlResults. Each SparqlResult corresponds to a single fetched "row". Using an indexer, we can get a value assigned to a desired variable (result["name"]). A variable is any identifier listed after the SELECT keyword. The output is a list of names of people who suffer insomnia:

Peter
Mark

Besides SPARQL queries, dotNetRDF provides an object model accessible via the Triples and Nodes properties of the Graph class. dotNetRDF can also parse and serialize formats other than N3: NTriples, Turtle, RDF/XML, RDF/JSON, RDFa in HTML/XHTML, TriG (Turtle with Named Graphs), TriX, NQuads. Moreover, RDF can be stored using a variety of common RDF stores or in a SQL database; both Microsoft SQL and MySQL are supported. A complete list of the features can be found in the Project Outline.

Summary

Much has been left unsaid. If you are interested in using an ontology-related technology, I recommend following the links provided. Getting into RDF is quite easy after some initial warm-up.

Points of Interest

SPARQL 1.1 is still undergoing standardization and is subject to change. Many of the more SQL-like features (aggregates, GROUP BY, etc.) are new to SPARQL 1.1, so older implementations may not support them.
Maintaining a big ontology database by hand is ~~difficult~~ impossible. You have to think ahead and include creating an appropriate toolkit in a project's timeline.
Use existing vocabularies as much as you can. This helps when interchanging RDF data (see DCMI Metadata Terms, FOAF, and Swoogle).
If you got too excited, the article Ontology is Overrated should cool you down. When I read it, I started to wonder about something like "fuzzy ontology" based on a mass feedback from internet users.
Please use the message board below to ask questions or make suggestions.

References

History

Feb/12/2011 -- First version posted.

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)
