In the previous post in this series I looked at the first building block in the standards stack, the metadata standard ISO 11179. In using ISO 11179, my one area of concern was how terminology and 11179 worked together (see the “So Many To Choose From” post).
I also had a gut feeling that semantic technology could do a better job than ISO 11179 could do for me. Of course, you can place 11179 into a semantic implementation, but I was aware of standards like SKOS (Simple Knowledge Organization System) and thought it worth looking around.
So, armed with Google, I went out and had a look and came across the ISO 25964 standard.
I’m not quite sure what ‘clicked’ when I was searching, but the appeal of ISO 25964 was:
- Simple model
- Allowed for hierarchy (e.g. higher-level and lower-level terms, which I have talked about in previous posts)
- Generic and allowed for loading different terminologies.
- I found a mapping document from ISO 25964 to SKOS that was interesting and helpful (see references below)
So I created a simple schema reflecting the ISO model and loaded several versions of the CDISC terminology into it, taking the CDISC terminology RDF files and running an XSLT to convert them into the ISO 25964 schema.
I remain unconvinced about the way I have organised the URIs for the versions. I took the December 2013 release and every release issued since, assigning a version number to each. For each version I took the SDTM terminology and the COA terminology (the old questionnaire and functional test terminology) and combined them into a single .ttl file for loading into TopBraid.
The following ‘manifest’ file lists the files I loaded and their version numbers. It may seem odd to start at 34, but my baseline was the first terminology release I could find on the NCI Thesaurus web site, April 2007, and I labelled that version 1.
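Since the URI layout is the part I am least sure of, here is one possible scheme, purely as a hypothetical sketch: the base URI `http://example.org/cdisc/terminology`, the `V<version>/<date>` path and the function names are my own assumptions, not what is in the files described here.

```python
from datetime import date

# Hypothetical base URI; NOT the one used in the actual terminology files.
BASE = "http://example.org/cdisc/terminology"


def release_uri(version: int, released: date) -> str:
    """Mint a URI for one terminology release (a Thesaurus)."""
    return f"{BASE}/V{version}/{released.isoformat()}"


def concept_uri(version: int, released: date, code: str) -> str:
    """Mint a URI for a code list or code list item within a release."""
    return f"{release_uri(version, released)}#{code}"
```

For example, `release_uri(34, date(2013, 12, 20))` gives `http://example.org/cdisc/terminology/V34/2013-12-20`. Putting the version in the path means the same C-code in two releases gets two distinct URIs, which avoids collisions but means the link between versions of "the same" item has to be stated explicitly.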
```xml
<?xml version="1.0"?>
<CDISCTerminology>
  <Update date="2013-12-20" version="34">
    <File filename="SDTM_2013-12-20.owl" />
    <File filename="QS_2013-12-20.owl" />
  </Update>
  <Update date="2014-03-28" version="35">
    <File filename="SDTM_2014-03-28.owl" />
    <File filename="QS_2014-03-28.owl" />
  </Update>
  <Update date="2014-06-27" version="36">
    <File filename="SDTM_2014-06-27.owl" />
    <File filename="QS-FT_2014-06-27.owl" />
  </Update>
  <Update date="2014-09-24" version="37">
    <File filename="SDTM_2014-09-24.owl" />
    <File filename="QS-FT_2014-09-24.owl" />
  </Update>
  <Update date="2014-10-06" version="38">
    <File filename="SDTM_2014-10-06.owl" />
    <File filename="QS-FT_2014-09-24.owl" />
  </Update>
  <Update date="2014-12-16" version="39">
    <File filename="SDTM_2014-12-15.owl" />
    <File filename="QS-FT_2014-12-16.owl" />
  </Update>
  <Update date="2015-03-27" version="40">
    <File filename="SDTM_2015-03-27.owl" />
    <File filename="QS-FT_2015-03-27.owl" />
  </Update>
</CDISCTerminology>
```
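A manifest like this is simple enough to drive a loading script. As a minimal sketch (my own illustration, not the tooling used for this post), Python's standard `xml.etree.ElementTree` can read it into a version-to-files map; `read_manifest` and `reused_files` are hypothetical names. Only two releases are included to keep it short, and they happen to show that version 38 reuses the QS-FT file from version 37.

```python
import xml.etree.ElementTree as ET

# A shortened copy of the manifest (versions 37 and 38 only).
MANIFEST = """<?xml version="1.0"?>
<CDISCTerminology>
  <Update date="2014-09-24" version="37">
    <File filename="SDTM_2014-09-24.owl" />
    <File filename="QS-FT_2014-09-24.owl" />
  </Update>
  <Update date="2014-10-06" version="38">
    <File filename="SDTM_2014-10-06.owl" />
    <File filename="QS-FT_2014-09-24.owl" />
  </Update>
</CDISCTerminology>"""


def read_manifest(text: str) -> dict[int, dict]:
    """Map each version number to its release date and list of files."""
    root = ET.fromstring(text)
    releases = {}
    for update in root.findall("Update"):
        version = int(update.get("version"))
        releases[version] = {
            "date": update.get("date"),
            "files": [f.get("filename") for f in update.findall("File")],
        }
    return releases


def reused_files(releases: dict[int, dict]) -> dict[str, list[int]]:
    """Return files that appear in more than one release, with their versions."""
    seen: dict[str, list[int]] = {}
    for version in sorted(releases):
        for name in releases[version]["files"]:
            seen.setdefault(name, []).append(version)
    return {name: vs for name, vs in seen.items() if len(vs) > 1}
```

Here `reused_files(read_manifest(MANIFEST))` returns `{"QS-FT_2014-09-24.owl": [37, 38]}`, which is a handy sanity check before combining files into per-version .ttl files.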
One of my drivers for loading the terminology was to look at changes and version history. I initially tried to do this using SPARQL but failed. That was down to my lack of knowledge, particularly about running queries across two or more namespaces, rather than any limitation of SPARQL. This is something I need to revisit.
So, wanting to get some results, I resorted to a simpler SPARQL query per version and then used a set of XSLTs to convert the results into the outputs I wanted.
You can see the results in the Terminology Difference and History tools.
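Once each version's query results are in hand, the difference and history logic is straightforward. The sketch below uses Python rather than XSLT (a swapped-in language, not what I actually used); each dictionary stands in for one version's SPARQL results as C-code to submission value, and the codes and values are made up.

```python
def diff_versions(old: dict[str, str], new: dict[str, str]) -> dict[str, list[str]]:
    """Compare two versions, each a map of C-code -> submission value."""
    old_codes, new_codes = set(old), set(new)
    return {
        "added": sorted(new_codes - old_codes),
        "deleted": sorted(old_codes - new_codes),
        "updated": sorted(c for c in old_codes & new_codes if old[c] != new[c]),
    }


def history(versions: dict[int, dict[str, str]], code: str) -> list[tuple[int, str]]:
    """List (version, change) events for one code, in version order."""
    events = []
    previous = None
    for v in sorted(versions):
        current = versions[v].get(code)
        if previous is None and current is not None:
            events.append((v, "added"))      # first appearance (or reappearance)
        elif previous is not None and current is None:
            events.append((v, "deleted"))
        elif previous != current:
            events.append((v, "updated"))    # present in both, value changed
        previous = current
    return events


# Hypothetical per-version results.
v34 = {"C1001": "A", "C1002": "B"}
v35 = {"C1001": "A", "C1002": "BB", "C1003": "C"}
v36 = {"C1002": "BB", "C1003": "C"}
```

With this data, `diff_versions(v34, v35)` reports `C1003` added and `C1002` updated, and `history({34: v34, 35: v35, 36: v36}, "C1001")` gives `[(34, "added"), (36, "deleted")]`.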
For the actual mapping of the CDISC terminology to ISO 25964, see the image. The green boxes indicate where the CDISC terminology items are mapped to; the blue boxes indicate where SKOS is used. The extensible attribute is the only place where I ‘extended’ the model, although I would also note that I simplified the definition part of the model and the Preferred and Synonym parts, just to get things working quickly.
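To make the simplified model a little more concrete, here is a hypothetical Python rendering of the two main classes. The field names are my own approximations, not the standard's wording: the definition, preferred term and synonyms are flattened into plain fields (mirroring my simplification), and `extensible` is the one extension.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ThesaurusConcept:
    identifier: str                  # e.g. a CDISC C-code
    preferred_term: str              # simplified: one plain string
    synonyms: list[str] = field(default_factory=list)
    definition: str = ""             # simplified: no separate definition class
    extensible: Optional[bool] = None  # the extension; only set for code lists
    narrower: list["ThesaurusConcept"] = field(default_factory=list)


@dataclass
class Thesaurus:
    identifier: str                  # e.g. one terminology release
    version: int
    concepts: list[ThesaurusConcept] = field(default_factory=list)


# Hypothetical example: one release containing one code list with two items.
code_list = ThesaurusConcept(
    "CL01", "Example Code List", extensible=False,
    narrower=[ThesaurusConcept("CL01-1", "Item One"),
              ThesaurusConcept("CL01-2", "Item Two", synonyms=["First Item"])],
)
release = Thesaurus("V40", 40, concepts=[code_list])
```

Leaving `extensible` as `None` on items keeps the extension confined to code lists, which is where CDISC actually uses it.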
Some other notes:
- Code List Items and Code Lists are related using skos:narrower
- Code Lists and the Terminology release are related using skos:inScheme
- Both the Thesaurus and ThesaurusConcept classes are subtypes of ISO11179:IdentifiedItem so that they can use the version management features of ISO 11179
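As a rough sketch of those relationships in RDF, the function below emits N-Triples (which is also valid Turtle) for one code list. The SKOS and RDF namespaces are the real W3C ones; the `example.org` namespaces for the CDISC items and the ISO 25964 schema, the function name and the codes are hypothetical placeholders.

```python
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
SKOS = "http://www.w3.org/2004/02/skos/core#"
EX = "http://example.org/cdisc/"       # hypothetical namespace for CDISC items
ISO = "http://example.org/iso25964#"   # hypothetical namespace for the schema


def triple(s: str, p: str, o: str) -> str:
    """Format one N-Triples statement with three IRIs."""
    return f"<{s}> <{p}> <{o}> ."


def code_list_triples(release: str, code_list: str, items: list[str]) -> list[str]:
    """Link a code list to its release (skos:inScheme) and items (skos:narrower)."""
    cl = EX + code_list
    lines = [
        triple(cl, RDF_TYPE, ISO + "ThesaurusConcept"),
        triple(cl, SKOS + "inScheme", EX + release),
    ]
    for item in items:
        lines.append(triple(cl, SKOS + "narrower", EX + item))
    return lines
```

Calling `code_list_triples("V40", "CL01", ["CL01-1", "CL01-2"])` yields four statements: a type, one `skos:inScheme` and two `skos:narrower` links.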
The standard itself is available on the ISO website, and I also found a dedicated ISO 25964 website.
I also came across a few good references relating ISO 25964 to SKOS:
- Word document looking at the ISO 25964 to SKOS mapping (very useful)
- Slides about the ISO 25964 to SKOS mapping above
- Background paper: “The ISO 25964 Data Model for the Structure of an Information Retrieval Thesaurus”
- More background material: “The ISO 25964 data model for the structure of an information retrieval thesaurus”
- More useful slides: “ISO 25964 – the new standard for thesauri and interoperability with other vocabularies”
As usual, a single file, ISO25964.ttl, can be found on GitHub. Its construction should be easy to follow given the diagram above.
If anyone wants to see the actual terminology files or the XSLT files I used to create them, I am quite happy to post those. Let me know via the comments.