RocketRML - A NodeJS implementation of a use-case specific RML mapper

Umutcan Şimşek, Elias Kärle, Dieter Fensel

KGB Workshop co-located with ESWC 2019, Portoroz, Slovenia, June 03, 2019

Twitter: @semantifyit | @umutsims

Acknowledgement

We would like to thank our developers Phillipp Häusle and Thibault Gerrier for their implementation support.

Introduction

We are building knowledge graphs, mainly based on tourism data

Tyrolean Tourism Knowledge Graph (27M+ facts per crawl from 7 different sources)

MindLab (in progress)

DACH-KG (in progress)

We get the data from various (mostly) proprietary sources, typically in JSON or XML format

Then, we map them to schema.org

The generated RDF data feeds the knowledge graphs and is published on websites as semantic annotations

We needed a scalable mapping solution

We looked around, ran some preliminary tests, and found RML well suited to this task

But we soon faced some challenges due to our use cases, so we decided to implement a new mapper

Motivation

We need to...
  • handle large files (>500 MB)
  • work with nested objects that have no fields to join on

    [{
      "name": "Gschwandtkopflifte",
      "type": "SkiResort",
      "contactDetails": [
        {
          "address": {
            "street": "Gschwandtkopf 700",
            "postcode": "6100",
            "city": "Seefeld",
            "type": "Office"
          }
        },
        {
          "address": {
            "street": "Gschwandtkopf 702",
            "postcode": "6100",
            "city": "Seefeld",
            "type": "Lifte"
          }
        }
      ]
    }]
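To make the nesting problem concrete, here is a minimal plain-JavaScript sketch (not RocketRML's actual iterator code) that links each nested address to its enclosing resort by position, because the address records contain no field that points back to the resort. The base IRI `http://example.com/resource/`, the `resort_i/address_j` naming scheme, and the choice of schema.org properties are assumptions made only for this illustration.

```javascript
const RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type";
const SDO = "https://schema.org/";
// Hypothetical base IRI, used only for this illustration.
const BASE = "http://example.com/resource/";

// Walk the nested JSON and emit [subject, predicate, object] triples.
// Each address is linked to its resort through the nesting (array positions),
// not through a shared join field.
function mapNested(resorts) {
  const triples = [];
  resorts.forEach((resort, i) => {
    const resortIri = `${BASE}resort_${i}`;
    triples.push([resortIri, RDF_TYPE, `${SDO}${resort.type}`]);
    triples.push([resortIri, `${SDO}name`, resort.name]);
    (resort.contactDetails ?? []).forEach((detail, j) => {
      const addr = detail.address;
      if (!addr) return; // skip entries without an address object
      // The child IRI is derived from the parent's IRI and the child's position.
      const addrIri = `${resortIri}/address_${j}`;
      triples.push([resortIri, `${SDO}address`, addrIri]);
      triples.push([addrIri, `${SDO}streetAddress`, addr.street]);
      triples.push([addrIri, `${SDO}postalCode`, addr.postcode]);
      triples.push([addrIri, `${SDO}addressLocality`, addr.city]);
    });
  });
  return triples;
}

const input = [{
  name: "Gschwandtkopflifte",
  type: "SkiResort",
  contactDetails: [
    { address: { street: "Gschwandtkopf 700", postcode: "6100", city: "Seefeld", type: "Office" } },
    { address: { street: "Gschwandtkopf 702", postcode: "6100", city: "Seefeld", type: "Lifte" } },
  ],
}];

const triples = mapNested(input);
console.log(triples.length); // 10
```

A value-based join has nothing to match on here, so the only reliable link between an address and its resort is the nesting itself.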
							

Tool

Limitations

Only XML and JSON as source formats (CSV support is coming)

No Named Graphs

No JOIN Support

Only JavaScript functions


Customizations

New Iterator Behaviour


Global Language Tags

A language tag can be passed to the mapper as an option; it is then attached to all mapped string literals

Particularly useful for data from the tourism sector
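A minimal sketch of how such a global tag could be applied to already-mapped triples. The term shape (`termType`, `value`, `language`, `datatype`), the function name `applyGlobalLanguageTag`, and the `liftCount` property are hypothetical; this does not reproduce RocketRML's option name or internals. Only plain string literals are tagged; IRIs, typed literals, and literals that already carry a tag are left untouched.

```javascript
const XSD_INTEGER = "http://www.w3.org/2001/XMLSchema#integer";

// Hypothetical term shape used only in this sketch:
//   IRI:     { termType: "NamedNode", value }
//   literal: { termType: "Literal", value, language?, datatype? }
function applyGlobalLanguageTag(triples, language) {
  return triples.map((triple) => {
    const o = triple.object;
    const isPlainString =
      o.termType === "Literal" &&
      typeof o.value === "string" &&
      !o.language &&
      !o.datatype;
    // Build new objects instead of mutating the mapped triples.
    return isPlainString ? { ...triple, object: { ...o, language } } : triple;
  });
}

const resort = "http://example.com/resource/resort_0";
const mapped = [
  { subject: resort, predicate: "https://schema.org/name",
    object: { termType: "Literal", value: "Gschwandtkopflifte" } },
  { subject: resort, predicate: "https://schema.org/address",
    object: { termType: "NamedNode", value: `${resort}/address_0` } },
  // Hypothetical typed literal: it keeps its datatype and gets no tag.
  { subject: resort, predicate: "http://example.com/ontology/liftCount",
    object: { termType: "Literal", value: "3", datatype: XSD_INTEGER } },
  // Already tagged: the existing tag wins.
  { subject: resort, predicate: "https://schema.org/description",
    object: { termType: "Literal", value: "Ski area", language: "en" } },
];

const tagged = applyGlobalLanguageTag(mapped, "de");
console.log(tagged[0].object.language); // de
```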


Performance Tests


RML-Mapper: v3.0.2 | RML-Mapper-Java: v4.3.1


RML Test Cases

All applicable tests passed.

New Implementation with Joins

Implementing joins degraded performance drastically.

So we made some optimizations...


  • Each TriplesMap is iterated once
  • Before mapping a TriplesMap, we check whether it is used as the parent TriplesMap in a join condition of another TriplesMap. If so, we evaluate the parent path of that join condition and cache the result as a path-value pair
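The parent-side caching step can be sketched as follows. This is a simplified illustration, not RocketRML's code: the TriplesMap object shape (`id`, `records`, `subject`, `joins`), the function names, and the example data are hypothetical, and paths are plain top-level field names rather than JSONPath or XPath expressions.

```javascript
// Hypothetical, simplified TriplesMap shape used only in this sketch:
//   { id, records, subject: (record) => IRI,
//     joins: [{ parentTriplesMap, child, parent }] }

// Collect the parent paths of `map` that other TriplesMaps use in join conditions.
function referencedParentPaths(map, allMaps) {
  const paths = new Set();
  for (const other of allMaps) {
    for (const join of other.joins ?? []) {
      if (join.parentTriplesMap === map.id) paths.add(join.parent);
    }
  }
  return paths;
}

// Before mapping `map`, evaluate every referenced parent path in a single pass
// over its records and cache the results as path -> [{ value, subject }].
function buildParentCache(map, allMaps) {
  const paths = [...referencedParentPaths(map, allMaps)];
  const cache = new Map(paths.map((p) => [p, []]));
  if (paths.length === 0) return cache; // not a parent of any join
  for (const record of map.records) {
    const subject = map.subject(record);
    for (const path of paths) {
      const value = record[path];
      if (value !== undefined && value !== null) cache.get(path).push({ value, subject });
    }
  }
  return cache;
}

// Hypothetical example data, loosely shaped like the RMLTC0009b join.
const sports = {
  id: "TriplesMap2",
  records: [{ ID: 100, Name: "Tennis" }, { ID: 110, Name: "Football" }],
  subject: (r) => `http://example.com/resource/sport_${r.ID}`,
  joins: [],
};
const students = {
  id: "TriplesMap1",
  records: [{ ID: 10, Sport: 100, Name: "Student A" }],
  subject: (r) => `http://example.com/resource/student_${r.ID}`,
  joins: [{ parentTriplesMap: "TriplesMap2", child: "Sport", parent: "ID" }],
};

const maps = [students, sports];
const sportsCache = buildParentCache(sports, maps);
console.log(sportsCache.get("ID").length); // 2
console.log(buildParentCache(students, maps).size); // 0
```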


  • Then we map the data with the TriplesMap as usual. If a join condition is encountered during mapping, the child value and the path to the parent are cached with the child
  • After everything is mapped, we go through the two caches and join the objects whose child and parent values match.
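The final step over the two caches can be sketched as a hash join: each parent cache is indexed by value once, then every cached child entry is looked up in that index. The cache shapes, the function name `resolveJoins`, and the example IRIs are hypothetical, and comparing values by their string form is an assumption of this sketch.

```javascript
// Hypothetical cache shapes used only in this sketch:
//   parentCaches: Map<triplesMapId, Map<parentPath, Array<{ value, subject }>>>
//   childCache:   Array<{ subject, predicate, childValue, parentTriplesMap, parentPath }>
function resolveJoins(parentCaches, childCache) {
  // Lazily built index: triplesMapId -> parentPath -> (value as string -> parent subjects).
  const indexes = new Map();
  function indexFor(mapId, path) {
    if (!indexes.has(mapId)) indexes.set(mapId, new Map());
    const perMap = indexes.get(mapId);
    if (!perMap.has(path)) {
      const byValue = new Map();
      for (const { value, subject } of parentCaches.get(mapId)?.get(path) ?? []) {
        const key = String(value); // assumption: values are compared as strings
        if (!byValue.has(key)) byValue.set(key, []);
        byValue.get(key).push(subject);
      }
      perMap.set(path, byValue);
    }
    return perMap.get(path);
  }

  const triples = [];
  for (const c of childCache) {
    const parents = indexFor(c.parentTriplesMap, c.parentPath).get(String(c.childValue)) ?? [];
    // Emit one triple per matching parent subject; unmatched children produce nothing.
    for (const parentSubject of parents) triples.push([c.subject, c.predicate, parentSubject]);
  }
  return triples;
}

const EX = "http://example.com/";
const parentCaches = new Map([
  ["TriplesMap2", new Map([["ID", [
    { value: 100, subject: `${EX}resource/sport_100` },
    { value: 110, subject: `${EX}resource/sport_110` },
  ]]])],
]);
const childCache = [
  { subject: `${EX}resource/student_10`, predicate: `${EX}ontology/practises`,
    childValue: 100, parentTriplesMap: "TriplesMap2", parentPath: "ID" },
  { subject: `${EX}resource/student_20`, predicate: `${EX}ontology/practises`,
    childValue: 120, parentTriplesMap: "TriplesMap2", parentPath: "ID" },
];

const joined = resolveJoins(parentCaches, childCache);
console.log(joined.length); // 1
```

Indexing each parent path once keeps the join roughly linear in the number of cached entries (plus the number of joined triples), instead of comparing every child with every parent.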


   
						
    rr:subjectMap [
      rr:template "http://example.com/resource/student_{ID}";
      rr:class <http://example.com/ontology/Student>;
    ];
    rr:predicateObjectMap [
      rr:predicate foaf:name ;
      rr:objectMap [ rml:reference "Name" ];
    ] ;
    rr:predicateObjectMap [
      rr:predicate <http://example.com/ontology/practises> ;
      rr:objectMap [
        a rr:RefObjectMap ;
        rr:parentTriplesMap <TriplesMap2>;
        rr:joinCondition [
          rr:child "Sport" ;
          rr:parent "ID" ;
        ]
      ];
    ];
An excerpt from the RMLTC0009b-JSON mapping file
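The `rr:template` in this excerpt, `http://example.com/resource/student_{ID}`, builds the subject IRI from the record's `ID`. A simplified sketch of template expansion follows; the function names are hypothetical, references are treated as top-level field names (an assumption), backslash escapes in templates are ignored, and every character outside the ASCII unreserved set is percent-encoded. The R2RML IRI-safe rule is slightly more permissive (it keeps non-ASCII `ucschar` characters), so this is an approximation, not RocketRML's implementation.

```javascript
const encoder = new TextEncoder();

// Simplified IRI-safe encoding: keep ASCII unreserved characters and
// percent-encode every other character as its UTF-8 bytes.
function iriSafe(value) {
  let out = "";
  for (const ch of String(value)) {
    if (/^[A-Za-z0-9\-._~]$/.test(ch)) {
      out += ch;
    } else {
      for (const byte of encoder.encode(ch)) {
        out += "%" + byte.toString(16).toUpperCase().padStart(2, "0");
      }
    }
  }
  return out;
}

// Expand a template such as "http://example.com/resource/student_{ID}".
// If any referenced value is missing, no IRI is generated and null is returned.
function expandTemplate(template, record) {
  let missing = false;
  const iri = template.replace(/\{([^{}]+)\}/g, (_, ref) => {
    const value = record[ref];
    if (value === undefined || value === null) {
      missing = true;
      return "";
    }
    return iriSafe(value);
  });
  return missing ? null : iri;
}

const template = "http://example.com/resource/student_{ID}";
console.log(expandTemplate(template, { ID: 10 }));    // http://example.com/resource/student_10
console.log(expandTemplate(template, { ID: "a b" })); // http://example.com/resource/student_a%20b
console.log(expandTemplate(template, {}));            // null
```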

Comparison with other RML Implementations

Carml: v0.2.3
RML-Mapper: v4.3.3

Comparison with Other RDF Generation Tools

Same setting as described in an article by Maxim Kolchin
Karma: v2.2
RML-Mapper: v4.3.3
SPARQLGenerate: v1.2

RML Test Cases

All applicable tests passed.*

* Currently, there is an issue with blank node naming due to the jsonld library.

Source Code on GitHub

Also available as NPM package and integrated with semantify.it

Thank you for your attention!

We look forward to new collaborations with the Knowledge Graph community!