Elipson - 1 month ago
Java Question

Retrieving blank node mapping

My group is currently developing a point-and-click interface for navigating and extracting information from RDF graphs. As part of this, we connect to various triple store endpoints using Jena's sparqlService method. To move the point the user is currently looking at, the user can select a node and make it the center. The program then fetches the neighbors of that node using the query below:

CONSTRUCT {
<URI> ?p ?o .
?s ?p <URI> .
} WHERE {
{<URI> ?p ?o .}
UNION
{?s ?p <URI> .}
} LIMIT N


Here, URI is the node the user has selected (we do something slightly different for literals). The query is then executed as follows:

Query myQuery = QueryFactory.create(_query);
QueryExecution qexe = QueryExecutionFactory.sparqlService(this.myURL, myQuery);
try {
    return qexe.execConstruct();
} finally {
    qexe.close(); // release the HTTP connection once the model has been read
}


The issue we are facing is with regard to blank nodes. When Jena gets a blank node from an endpoint, it is immediately assigned a Jena bNode ID. This ID will not be the same as the one presented by the endpoint, and if a user selects a blank node on the client side as the new center, this obviously causes issues.

My question is therefore: is there some way to retain the original endpoint ID within Jena? From browsing through the belly of Jena, I can see that several of the ResultSet classes use a class called LabelToNodeMap to handle the mapping between endpoint IDs and Jena IDs. Is there some way to retrieve this mapping? Alternatively, is there a way to prevent Jena from using its own ID scheme and have it use the endpoint's instead?

Answer Source

Essentially, no: you can't identify a blank node directly when talking to a remote SPARQL service.

For a start, the various SPARQL results specifications don't actually mandate that stores send their internal IDs as blank node labels. For example, the SPARQL Results XML specification has this to say:

Note: The blank node label I is scoped to the result set XML document and need not have any association to the blank node label for that RDF Term in the query graph.

The situation is similar even with CONSTRUCT queries: almost all RDF formats say that a blank node label is scoped only to the document. So if I receive _:id in two separate responses then, semantically speaking, I have two different blank nodes.
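One way to respect that scoping rule on the client side is to never compare bare labels. The following is only a minimal sketch: ScopedBlankNode is a hypothetical helper (not a Jena class), and responseId is assumed to be whatever identifier the client assigns to each response it receives.

```java
import java.util.Objects;

/** Hypothetical helper, not part of Jena: a blank node label qualified by the response it arrived in. */
final class ScopedBlankNode {
    private final String responseId; // assumption: an ID the client assigns to each response
    private final String label;      // the label as written in that response, e.g. "id"

    ScopedBlankNode(String responseId, String label) {
        this.responseId = Objects.requireNonNull(responseId);
        this.label = Objects.requireNonNull(label);
    }

    @Override
    public boolean equals(Object other) {
        if (!(other instanceof ScopedBlankNode)) {
            return false;
        }
        ScopedBlankNode that = (ScopedBlankNode) other;
        // Two references denote the same node only if scope AND label both match.
        return responseId.equals(that.responseId) && label.equals(that.label);
    }

    @Override
    public int hashCode() {
        return Objects.hash(responseId, label);
    }

    @Override
    public String toString() {
        return "_:" + label + "@" + responseId;
    }
}
```

With this, _:id from one response and _:id from another are different map keys, which matches what the formats say about them.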

Regardless of the format, you also have the issue that some syntaxes are quite restrictive about which characters can appear in a blank node label. So even if a store does use its internal identifiers (which is rare), it will often have to escape or encode them in some way to produce valid syntax. You would then need to know each endpoint's escaping/encoding scheme (if it exposes identifiers at all) and how to translate a label back into an actual ID.

The bottom line is that the endpoint isn't giving you its internal identifiers anyway, so making Jena preserve them (which, strictly speaking, is possible, though there is no easy extension point for it) wouldn't really help you.

Even if you could preserve them, you couldn't send them back to the remote endpoint, since blank nodes in a query act as anonymous variables, not identifiers. Some stores will accept the non-standard syntax <_:id> to refer to a blank node, but many will not, and because you are going beyond the SPARQL specification your application loses portability.
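For completeness, this is roughly what relying on the non-standard form would look like. It is a sketch only: BlankNodeQuery and neighbours are hypothetical names, the set of allowed label characters is a conservative assumption, and the resulting query will only behave as intended on stores that choose to interpret <_:label> this way.

```java
/** Hypothetical helper: builds the neighbour query for a blank node using the NON-STANDARD <_:label> form. */
final class BlankNodeQuery {
    private BlankNodeQuery() {}

    static String neighbours(String label, int limit) {
        // Conservative assumption about label characters; rejects anything that could escape the <...> term.
        if (!label.matches("[A-Za-z0-9_.:-]+")) {
            throw new IllegalArgumentException("unsupported blank node label: " + label);
        }
        if (limit < 1) {
            throw new IllegalArgumentException("limit must be positive");
        }
        // Not defined by the SPARQL specification; only some stores treat this as a blank node reference.
        String node = "<_:" + label + ">";
        return "CONSTRUCT { " + node + " ?p ?o . ?s ?p " + node + " . } "
             + "WHERE { { " + node + " ?p ?o . } UNION { ?s ?p " + node + " . } } "
             + "LIMIT " + limit;
    }
}
```

Even where this is accepted, the label has to be one the store itself understands, which, as noted above, is usually not the label that appeared in an earlier response.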

Workaround

The workaround is simply to extend your previous query. Your question implies that the user only sees this blank node because of a previous query. Since you can only identify blank nodes by association, you can modify that query to ask for additional details about the blank node.
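One possible shape for such an extended query, sketched under assumptions: ExtendedNeighbourQuery and build are hypothetical names, the variables ?p2/?o2/?p3/?o3 are introduced here for illustration, and the query fetches only one extra hop for neighbours that are blank nodes. It uses the standard SPARQL isBlank function and OPTIONAL, so it does not depend on any store-specific syntax.

```java
import java.util.regex.Pattern;

/** Hypothetical helper: the neighbour query, extended so blank-node neighbours arrive with their own triples. */
final class ExtendedNeighbourQuery {
    // Characters that may not appear inside a SPARQL <...> IRI reference.
    private static final Pattern ILLEGAL = Pattern.compile("[<>\"{}|^`\\\\\\s]");

    private ExtendedNeighbourQuery() {}

    static String build(String centerUri, int limit) {
        if (ILLEGAL.matcher(centerUri).find()) {
            throw new IllegalArgumentException("not usable as an IRI: " + centerUri);
        }
        String c = "<" + centerUri + ">";
        return String.join("\n",
            "CONSTRUCT {",
            "  " + c + " ?p ?o .",
            "  ?s ?p " + c + " .",
            "  ?o ?p2 ?o2 .",   // description of a blank-node object, if any
            "  ?s ?p3 ?o3 .",   // description of a blank-node subject, if any
            "} WHERE {",
            "  { " + c + " ?p ?o . OPTIONAL { ?o ?p2 ?o2 . FILTER(isBlank(?o)) } }",
            "  UNION",
            "  { ?s ?p " + c + " . OPTIONAL { ?s ?p3 ?o3 . FILTER(isBlank(?s)) } }",
            "} LIMIT " + limit);
    }
}
```

Note that LIMIT counts solutions, and each blank-node neighbour now contributes one solution per describing triple, so the limit may need to be raised compared with the original query.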

It may be that this returns details about multiple nodes, in which case you have to do some client-side processing to figure out which node the user actually wanted and how to associate the additional data with your existing visualisation, but that is all doable.
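As an illustration of that client-side step, here is a minimal sketch in plain Java. Everything in it is an assumption made for the example: BlankNodeResolver and its Triple holder are hypothetical, terms are plain strings with blank nodes written as "_:label", and the node the user clicked is identified by the center node and the predicate that led to it.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical helper: finds the blank nodes reached from a known node and gathers their descriptions. */
final class BlankNodeResolver {
    /** Minimal triple holder for the sketch; blank nodes are written as "_:label". */
    static final class Triple {
        final String s;
        final String p;
        final String o;

        Triple(String s, String p, String o) {
            this.s = s;
            this.p = p;
            this.o = o;
        }
    }

    private BlankNodeResolver() {}

    /**
     * Maps each blank node reached from center via predicate to the triples describing it.
     * All triples must come from the SAME response, since labels only have meaning there.
     */
    static Map<String, List<Triple>> candidates(List<Triple> response, String center, String predicate) {
        Map<String, List<Triple>> result = new LinkedHashMap<>();
        // First pass: which blank nodes hang off the center node through the chosen predicate?
        for (Triple t : response) {
            if (t.s.equals(center) && t.p.equals(predicate) && t.o.startsWith("_:")) {
                result.putIfAbsent(t.o, new ArrayList<>());
            }
        }
        // Second pass: attach every triple whose subject is one of those blank nodes.
        for (Triple t : response) {
            List<Triple> description = result.get(t.s);
            if (description != null) {
                description.add(t);
            }
        }
        return result;
    }
}
```

If more than one candidate comes back, the descriptions are what the client has to go on when deciding which one matches the node already on screen.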