SPARQLqueries

The goal of SPARQLqueries is to show how to query Wikidata and other open data sources using SPARQL. SPARQL is a query language for Semantic Web data in the RDF format. RDF data consists of statements in the form of subject-predicate-object triples, and SPARQL queries are built from patterns over these triples.

For example, suppose our database stores three programming languages, C, Python, and C++, with the following identifiers:

    <https://example.com/C>
    <https://example.com/C++>
    <https://example.com/Python>

Each identifier is enclosed in angle brackets (< and >). We use the example domain https://example.com/, which can be replaced by any other domain.

Let us also assume that there is another entity, ProgrammingLanguage, with the following identifier:

    <https://example.com/ProgrammingLanguage>

We need to represent the relationship between our example programming languages and the entity ProgrammingLanguage. For this purpose, we introduce a relationship called IsA, with the following identifier:

    <https://example.com/IsA>

Our goal is to express the following statements:

    C is a programming language.
    C++ is a programming language.
    Python is a programming language.

Each statement becomes a subject-predicate-object triple. Written in the N-Triples notation, where every triple ends with a period, they look like this:

    <https://example.com/C> <https://example.com/IsA> <https://example.com/ProgrammingLanguage> .
    <https://example.com/C++> <https://example.com/IsA> <https://example.com/ProgrammingLanguage> .
    <https://example.com/Python> <https://example.com/IsA> <https://example.com/ProgrammingLanguage> .

Unlike the English sentences, these statements use unambiguous identifiers, so machines can process and query them.
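
As a minimal illustration, and not part of the original tutorial, the Python sketch below stores the three triples as tuples of IRI strings in a plain list (the names EX and triples are hypothetical). Because every statement uses exact identifiers, a program can check whether a statement was recorded with a simple lookup.

```python
# Hypothetical in-memory "database": each RDF triple is a (subject, predicate, object) tuple.
EX = "https://example.com/"

triples = [
    (EX + "C", EX + "IsA", EX + "ProgrammingLanguage"),
    (EX + "C++", EX + "IsA", EX + "ProgrammingLanguage"),
    (EX + "Python", EX + "IsA", EX + "ProgrammingLanguage"),
]

# "Is C a programming language?" becomes an exact membership test.
print((EX + "C", EX + "IsA", EX + "ProgrammingLanguage") in triples)  # True
# A statement that was never recorded is simply absent from the list.
print((EX + "Java", EX + "IsA", EX + "ProgrammingLanguage") in triples)  # False
```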

This brings us to SPARQL queries. Suppose we want to ask our database the following questions. Each of them can be expressed as a SPARQL query.

Give me all the programming languages.

    SELECT ?proglang {
      ?proglang <https://example.com/IsA> <https://example.com/ProgrammingLanguage>
    }

Here, we reuse the triple pattern from above but replace the subject with the variable ?proglang. The SPARQL query engine must then find all the values of this variable for which the pattern matches a triple in the database.
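
The following Python sketch illustrates this matching step under simplifying assumptions; it is not how a real SPARQL engine is implemented. It matches a single triple pattern against the toy triples, treats terms starting with ? as variables, and returns one dictionary of variable bindings per matching triple (match_pattern and the data are hypothetical).

```python
EX = "https://example.com/"

# Hypothetical toy data: the three triples from the example above.
triples = [
    (EX + "C", EX + "IsA", EX + "ProgrammingLanguage"),
    (EX + "C++", EX + "IsA", EX + "ProgrammingLanguage"),
    (EX + "Python", EX + "IsA", EX + "ProgrammingLanguage"),
]

def is_var(term):
    """In this sketch, a term is a variable if it starts with '?', as in SPARQL."""
    return term.startswith("?")

def match_pattern(pattern, data):
    """Return one binding dict for every triple in `data` that matches `pattern`."""
    solutions = []
    for triple in data:
        binding = {}
        for pat_term, value in zip(pattern, triple):
            if is_var(pat_term):
                # A variable used twice in one pattern must bind to the same value.
                if binding.setdefault(pat_term, value) != value:
                    break
            elif pat_term != value:
                # A fixed term (IRI) must match the triple exactly.
                break
        else:
            # No break: every position matched, so record the bindings.
            solutions.append(binding)
    return solutions

pattern = ("?proglang", EX + "IsA", EX + "ProgrammingLanguage")
for solution in match_pattern(pattern, triples):
    print(solution["?proglang"])
```

Placing the variable in another position, as in (EX + "C", EX + "IsA", "?type"), binds that position instead.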

How many programming languages are there in my database?

    SELECT (count(?proglang) as ?count) {
      ?proglang <https://example.com/IsA> <https://example.com/ProgrammingLanguage>
    }

This query reuses the previous pattern but applies the aggregate function count, which counts the solutions, that is, the different values of ?proglang that match the pattern.
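
A minimal Python sketch of what count computes, assuming the solutions have already been found and are represented as dictionaries (the solutions list and the sparql_count helper are hypothetical). Following the SPARQL 1.1 definition, count(?var) counts the solutions in which the variable is bound; the optional distinct flag mimics count(DISTINCT ?var), which counts each value only once.

```python
EX = "https://example.com/"

# Hypothetical solution sequence, shaped like the result of the previous query.
solutions = [
    {"proglang": EX + "C"},
    {"proglang": EX + "C++"},
    {"proglang": EX + "Python"},
]

def sparql_count(solutions, var, distinct=False):
    """Mimic count(?var): count solutions in which `var` is bound.
    With distinct=True, mimic count(DISTINCT ?var) instead."""
    values = [s[var] for s in solutions if var in s]
    return len(set(values)) if distinct else len(values)

print(sparql_count(solutions, "proglang"))  # 3
```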

What is C?

    SELECT ?type {
      <https://example.com/C> <https://example.com/IsA> ?type
    }

This time we place the variable, named ?type, in the object position to obtain the type of C.

Instead of repeating the example domain https://example.com/, we can declare a namespace prefix for it with the keyword PREFIX. The above query then becomes:

    PREFIX example: <https://example.com/>
    SELECT ?type {
      example:C example:IsA ?type
    }
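
Conceptually, a PREFIX declaration is a textual shortcut: the query engine expands every prefixed name back into a full IRI. The hypothetical helper below is a simplified Python sketch of that expansion. It splits at the first colon and ignores SPARQL's detailed rules for local names (for example, a + in a prefixed name has to be escaped as \+).

```python
def expand(prefixed_name, prefixes):
    """Expand a prefixed name such as 'example:C' into a full IRI in angle brackets.

    Simplified: splits at the first ':' and ignores SPARQL's escaping rules.
    """
    prefix, sep, local = prefixed_name.partition(":")
    if not sep or prefix not in prefixes:
        raise KeyError(f"undeclared prefix in {prefixed_name!r}")
    return "<" + prefixes[prefix] + local + ">"

# Corresponds to: PREFIX example: <https://example.com/>
prefixes = {"example": "https://example.com/"}
print(expand("example:C", prefixes))    # <https://example.com/C>
print(expand("example:IsA", prefixes))  # <https://example.com/IsA>
```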

SPARQL can handle much more complex queries. The goal of this work is to use SPARQL queries on real-world data. In real life, we may have a lot of information about programming languages, such as the date of the first release or the names of their creators and designers. Nor are our databases limited to programming languages: they may hold information on people, natural languages, rivers, mountains, and so on. Some of this information may be missing from our database, in which case we may need to query external databases. These use cases are discussed in detail in this work.

Wikidata

To demonstrate SPARQL on real-life data, we now use Wikidata, an open data store covering a large number of domains, not just programming languages. Wikidata also has a dedicated SPARQL endpoint, where you can run the queries given below and see their results.
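
Queries can also be sent to the endpoint from a program. The Python sketch below uses only the standard library and rests on a few assumptions: the endpoint URL https://query.wikidata.org/sparql, results requested in the SPARQL 1.1 JSON results format through the Accept header, and a hypothetical placeholder User-Agent (Wikimedia asks automated clients to identify themselves). The function only builds the request; actually sending it requires network access.

```python
import urllib.parse
import urllib.request

# Assumed public endpoint of the Wikidata Query Service.
WIKIDATA_ENDPOINT = "https://query.wikidata.org/sparql"

def build_wikidata_request(query):
    """Build (but do not send) an HTTP GET request carrying a SPARQL query."""
    url = WIKIDATA_ENDPOINT + "?" + urllib.parse.urlencode({"query": query})
    return urllib.request.Request(
        url,
        headers={
            # Ask for results in the SPARQL 1.1 JSON results format.
            "Accept": "application/sparql-results+json",
            # Hypothetical client name; replace it with your own tool and contact details.
            "User-Agent": "sparql-tutorial-example/0.1 (hypothetical)",
        },
    )

query = """
SELECT ?proglang {
  ?proglang wdt:P31 wd:Q9143
}
"""
request = build_wikidata_request(query)
print(request.full_url)

# To actually run the query (requires network access):
# with urllib.request.urlopen(request) as response:
#     print(response.read().decode("utf-8"))
```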

Basic SPARQL queries

Let's reuse some of the above examples.

Give me a list of the programming languages.

    PREFIX wdt: <http://www.wikidata.org/prop/direct/>
    PREFIX wd: <http://www.wikidata.org/entity/>
    SELECT ?proglang {
      ?proglang wdt:P31 wd:Q9143
    }

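Note the two prefixes used in this query: wdt: is used for properties (relationships) and wd: for entities. Here, wdt:P31 ("instance of") plays the role of the IsA relationship seen above, and wd:Q9143 is the entity for programming language. The Wikidata query service predefines both prefixes, so the remaining queries omit these PREFIX declarations.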

So far, each query has used only one triple pattern, but a query may contain several. For example, wdt:P571 (inception) gives the date when a programming language was first released. The query engine then needs to find values that match both patterns, as the next query shows.

Give me a list of the programming languages along with their inception dates.

    SELECT ?proglang ?year {
      ?proglang wdt:P31 wd:Q9143.
      ?proglang wdt:P571 ?year.
    }
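
Conceptually, the engine finds the solutions of each pattern and then joins them on their shared variable, here proglang. The Python sketch below shows a naive nested-loop join under that assumption. The solution lists use hypothetical placeholder identifiers, not real Wikidata data, and real engines use indexes and more efficient join algorithms.

```python
def compatible(a, b):
    """Two solutions are compatible if they agree on every variable they share."""
    return all(a[v] == b[v] for v in a.keys() & b.keys())

def join(left, right):
    """Naive nested-loop join: merge every compatible pair of solutions."""
    return [{**a, **b} for a in left for b in right if compatible(a, b)]

# Hypothetical solutions of "?proglang wdt:P31 wd:Q9143".
type_solutions = [{"proglang": "ex:LangA"}, {"proglang": "ex:LangB"}]
# Hypothetical solutions of "?proglang wdt:P571 ?year".
inception_solutions = [
    {"proglang": "ex:LangA", "year": "1990-01-01"},
    {"proglang": "ex:LangC", "year": "2005-01-01"},
]

# LangB has no inception date and LangC is not a programming language, so only LangA remains.
print(join(type_solutions, inception_solutions))
```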

However, we need not repeat ?proglang every time. Ending a triple pattern with a semicolon (;) instead of a period lets the next pattern reuse the same subject:

    SELECT ?proglang ?year {
      ?proglang wdt:P31 wd:Q9143;
                wdt:P571 ?year.
    }
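
The semicolon is purely a shorthand: the engine sees the same two triple patterns as in the longer query. The hypothetical Python helper below sketches that expansion for a subject followed by a list of predicate-object pairs.

```python
def expand_predicate_object_list(subject, pairs):
    """Turn `s p1 o1 ; p2 o2 .` into the patterns (s, p1, o1) and (s, p2, o2)."""
    return [(subject, predicate, obj) for predicate, obj in pairs]

patterns = expand_predicate_object_list(
    "?proglang", [("wdt:P31", "wd:Q9143"), ("wdt:P571", "?year")]
)
for s, p, o in patterns:
    print(s, p, o, ".")
# ?proglang wdt:P31 wd:Q9143 .
# ?proglang wdt:P571 ?year .
```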

SPARQL queries using expressions

In real-life situations, we may not want to list all the programming languages but only those that meet some condition. The SPARQL query language supports expressions that specify such conditions and filter the results. Consider the following query.

Give me a list of the programming languages released after the year 2000, along with their inception dates.

    SELECT ?proglang ?year {
      ?proglang wdt:P31 wd:Q9143;
                wdt:P571 ?year.
      FILTER (year(?year) > 2000).
    }

Now the SPARQL query engine not only matches the two triple patterns but also checks whether the inception year is greater than 2000. The keyword FILTER keeps only the solutions for which this condition holds, and the function year extracts the year from the inception date.
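
A minimal Python sketch of the FILTER step, assuming the dates arrive as strings of the form YYYY-MM-DDTHH:MM:SSZ (the year_of helper and the solutions are hypothetical placeholders, not real Wikidata data). This simple parser does not handle dates before year 1000 or negative years, which real SPARQL engines support.

```python
from datetime import datetime

def year_of(xsd_datetime):
    """Mimic SPARQL's year() for a simple 'YYYY-MM-DDTHH:MM:SSZ' dateTime string."""
    return datetime.strptime(xsd_datetime, "%Y-%m-%dT%H:%M:%SZ").year

# Hypothetical solutions after matching the two triple patterns.
solutions = [
    {"proglang": "ex:LangA", "year": "1995-01-01T00:00:00Z"},
    {"proglang": "ex:LangB", "year": "2009-11-10T00:00:00Z"},
]

# Like FILTER (year(?year) > 2000): keep solutions whose condition evaluates to true.
filtered = [s for s in solutions if year_of(s["year"]) > 2000]
print(filtered)
```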

Aggregate SPARQL queries

Sometimes we are not interested in listing the programming languages but want to count the available information instead. SPARQL provides several aggregate functions for this purpose, such as count, sum, min, max, and avg. The following two examples use count.

How many programming languages are there?

    SELECT (count(?proglang) as ?count) {
      ?proglang wdt:P31 wd:Q9143.
    }

The above query returns the number of programming languages stored in Wikidata. We can also combine count with a filter expression to count only the results that satisfy a condition.

How many programming languages were released after the year 2000?

    SELECT (count(DISTINCT ?proglang) as ?count) {
      ?proglang wdt:P31 wd:Q9143;
                wdt:P571 ?year.
      FILTER (year(?year) > 2000).
    }

This query returns the number of programming languages stored in Wikidata whose inception year is after 2000. The keyword DISTINCT ensures that a language with several inception dates is counted only once.

Advanced SPARQL queries

SPARQL also supports other kinds of queries. For example, an ASK query simply asks Wikidata whether any data match a pattern, such as whether it stores any programming languages at all.

Is there any programming language?

    ASK {
      ?proglang wdt:P31 wd:Q9143.
    }

The next query asks Wikidata whether it has any programming languages with an inception date.

Is there information about programming languages and their inception?

    ASK {
      ?proglang wdt:P31 wd:Q9143;
                wdt:P571 ?year.
    }

An ASK query returns only true or false, depending on whether the data contain a match for the pattern.
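
When an ASK query is sent to an endpoint with JSON results requested, the SPARQL 1.1 JSON results format carries the answer in a top-level "boolean" member. A minimal Python sketch of reading it, using a hypothetical response body rather than a real Wikidata response (parse_ask_result is a hypothetical helper):

```python
import json

def parse_ask_result(body):
    """Extract the boolean answer from a SPARQL 1.1 JSON response to an ASK query."""
    result = json.loads(body)
    if "boolean" not in result:
        raise ValueError("not an ASK result")
    return result["boolean"]

# Hypothetical response body in the shape defined by the SPARQL 1.1 JSON results format.
example_body = '{"head": {}, "boolean": true}'
print(parse_ask_result(example_body))  # True
```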

SPARQL queries using Federation

In real life, no single datastore holds all the information, and we may need to combine several datastores to get a more complete view of the different entities. In our final example, we also query another datastore, DBpedia, to see if we can obtain additional information. For example, DBpedia has a lot of information about the C programming language that may not be present in Wikidata.

Is there some additional information about programming languages on DBpedia?

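    PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
    PREFIX owl: <http://www.w3.org/2002/07/owl#>
    PREFIX foaf: <http://xmlns.com/foaf/0.1/>
    SELECT ?proglang ?resource ?homepage {
      ?proglang wdt:P31 wd:Q9143.

      SERVICE <http://dbpedia.org/sparql> {
        ?resource rdf:type wd:Q9143;
                  owl:sameAs ?proglang;
                  foaf:homepage ?homepage.
      }
    }
    LIMIT 10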

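The keyword SERVICE specifies the SPARQL endpoint of DBpedia, which answers the patterns inside its braces. Here, DBpedia looks for resources typed as programming languages (wd:Q9143) that are linked to a Wikidata item through owl:sameAs and returns their homepages; LIMIT 10 restricts the output to at most ten results. The interesting part of such queries is that they can be run on the Wikidata SPARQL endpoint, whose query engine calls other services like DBpedia to obtain the data. Such queries are called federated queries.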

This concludes our introduction to the major aspects of the SPARQL query language. More advanced concepts are discussed in detail in this work.
