Over the last few weeks I have been writing a few posts to set the scene around the work (see the panel to the right) I have been undertaking on MDRs, content and the semantic web. Finally, having set the scene, I can now get down to some technical aspects and describe, at a high level, the model I have been trying to construct.
The image below provides an overview of the model and tries to show the layered approach I have taken in building the model. As you can see there are several layers:
- Base Layer
- Core Metadata Handling
- Version Management
- ISO 21090 datatypes
- Research Concept Templates
- Research Concepts
- Operational Models
- Forms (CDISC CDASH)
- Tabulations (CDISC SDTM)
- Others TBD, but there to support things like protocol and ADaM in the longer term
A lot of thought went into arriving at this picture, though it may not seem it, and I endured a lot of going around in circles. Should I use ISO 11179 or not? BRIDG? I tried a few ideas and threw some away, but eventually came up with this. I tried to base the work on standards and, as the title of the blog suggests, the lovely thing about standards is there are so many to choose from. So what do we have?
I decided to use ISO 11179 (‘ISO 11179 Information technology — Metadata registries (MDR)’ to give it its full title) as it is the basis of the CDISC SHARE repository. I have to admit to not being a great fan of the standard, but I discovered the model had improved since I had last looked at it, and I had also seen other work on MDRs that had used it. The standard comes in six parts, with ‘Part 3: Registry metamodel and basic attributes’ being the important one. That is not to say the remaining parts are not useful, but Part 3 is the guts.
This is where it starts to get a little tricky. Part 3 is now in its third edition. Edition 2 was issued in 2003 and then replaced in 2013 by Edition 3; you can see the pace of ISO standards development by noting the dates. 11179 is hard to get into: the specification is difficult to read and the information contained within is sparse. However, the 2013 Edition 3 added some key constructs which allow Research Concepts to function; these are the improvements I mentioned earlier. So, after a lot of hair pulling and indecision, I decided to go with it.
Having got on the ISO 11179 train we immediately get version management, so it seemed sensible to go with the flow. One big downside is that the version management model in Edition 3 is different from the one in Edition 2, which is really not going to help us when repositories based on the two models try to communicate. Maybe I am wrong on this; time will tell.
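To make the idea of version management a little more concrete, here is a minimal Python sketch of a registry that keeps every version of an item and links each one to the version it supersedes. The class and field names (`ManagedItem`, `Registry`, `previous`) and the example identifier are purely illustrative assumptions; they are not taken from either edition of ISO 11179 and this is not the model described in this post.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class ManagedItem:
    """One immutable version of a registered item (illustrative names only)."""
    identifier: str                           # stays the same across versions
    version: int                              # goes up by one on every revision
    label: str
    previous: Optional["ManagedItem"] = None  # the version this one supersedes


class Registry:
    """Tracks the latest version of each item; older versions stay reachable."""

    def __init__(self) -> None:
        self._latest: Dict[str, ManagedItem] = {}

    def register(self, identifier: str, label: str) -> ManagedItem:
        if identifier in self._latest:
            raise ValueError(f"{identifier} is already registered")
        item = ManagedItem(identifier, 1, label)
        self._latest[identifier] = item
        return item

    def revise(self, identifier: str, label: str) -> ManagedItem:
        current = self._latest[identifier]
        item = ManagedItem(identifier, current.version + 1, label, previous=current)
        self._latest[identifier] = item
        return item

    def latest(self, identifier: str) -> ManagedItem:
        return self._latest[identifier]

    def history(self, identifier: str) -> List[ManagedItem]:
        """Return every version of an item, oldest first."""
        chain: List[ManagedItem] = []
        item: Optional[ManagedItem] = self._latest.get(identifier)
        while item is not None:
            chain.append(item)
            item = item.previous
        return list(reversed(chain))


# Hypothetical usage: the identifier and labels are made up.
registry = Registry()
registry.register("RC-WEIGHT", "Weight")
registry.revise("RC-WEIGHT", "Body Weight")
```

Keeping each version immutable and chaining it to its predecessor means nothing is ever overwritten, which is the property that matters when two repositories need to agree on exactly which version of an item they are talking about.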
The next building block is terminology. The work I had seen to date putting the CDISC standards into RDF had used the 11179 Edition 2 model to store the terminology. But, and I think this is a big but, I am not sure it will meet all of a sponsor’s needs. I was keen to allow not only the CDISC/NCIt terminology to be loaded but potentially others too, in particular hierarchical terminologies. So I did some searching and came across another ISO standard, 25964 (again, so many to choose from!). The standard is for implementing thesauri; it seemed to make sense, was relatively easy to understand and seemed to offer flexibility. I also found a document detailing a mapping between 25964 and SKOS. SKOS is “an area of work developing specifications and standards to support the use of knowledge organization systems (KOS) such as thesauri, classification schemes, subject heading lists and taxonomies within the framework of the Semantic Web” so the combination was attractive.
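As an illustration of why a hierarchical terminology needs more than a flat code list, the following Python sketch models a tiny thesaurus in the spirit of SKOS. The attribute names echo the SKOS properties `skos:prefLabel`, `skos:broader` and `skos:narrower`, but the classes themselves, the simplification to a single broader concept, and the example terms and notations are all assumptions made for this sketch; they are not real CDISC/NCIt codes.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Concept:
    notation: str                   # short code for the concept (made up here)
    pref_label: str                 # cf. skos:prefLabel
    broader: Optional[str] = None   # cf. skos:broader (one parent, for simplicity)
    narrower: List[str] = field(default_factory=list)  # cf. skos:narrower


class Thesaurus:
    def __init__(self) -> None:
        self.concepts: Dict[str, Concept] = {}

    def add(self, notation: str, pref_label: str,
            broader: Optional[str] = None) -> Concept:
        if broader is not None and broader not in self.concepts:
            raise KeyError(f"broader concept {broader} has not been added yet")
        concept = Concept(notation, pref_label, broader)
        self.concepts[notation] = concept
        if broader is not None:
            # Keep the inverse link in step with the broader link.
            self.concepts[broader].narrower.append(notation)
        return concept

    def descendants(self, notation: str) -> List[str]:
        """All concepts below the given one, depth first."""
        result: List[str] = []
        for child in self.concepts[notation].narrower:
            result.append(child)
            result.extend(self.descendants(child))
        return result

    def path_to_top(self, notation: str) -> List[str]:
        """Preferred labels from the concept up to its top-level ancestor."""
        path: List[str] = []
        current: Optional[str] = notation
        while current is not None:
            path.append(self.concepts[current].pref_label)
            current = self.concepts[current].broader
        return path


# Hypothetical terms, for illustration only.
thesaurus = Thesaurus()
thesaurus.add("VS", "Vital Signs")
thesaurus.add("BP", "Blood Pressure", broader="VS")
thesaurus.add("SYSBP", "Systolic Blood Pressure", broader="BP")
thesaurus.add("DIABP", "Diastolic Blood Pressure", broader="BP")
thesaurus.add("TEMP", "Temperature", broader="VS")
```

SKOS itself allows a concept to have more than one broader concept; the single-parent restriction here is only to keep the sketch short.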
Then on top of this foundation I implemented the ISO 21090 Healthcare Datatypes (these are complex datatypes), on which BRIDG is based, and then BRIDG itself with classes, attributes and relationships, though I have to say I have kept it very simple at the moment. For each of the 21090 items and the BRIDG classes I made use of the 11179 building blocks and the version management mechanisms, so that all of these items fit with 11179 and can be version managed. I should mention I did the same with the 25964 thesaurus.
Then on top of BRIDG I built the Research Concept Templates (RCTs), again with the RCTs being entities and managed items in the 11179 sense. At this point I could implement the Research Concepts themselves, again using 11179 constructs.
And finally I could get to some business objects such as forms and domains. I based the domains on the work done by PhUSE’s CDISC Foundational Standards in RDF project, which has modelled the current CDISC standards using ISO 11179. I put another layer on top, as these standards need application within a sponsor context; extra information is needed to provide everything required for sponsor processes. For the forms I created a simple form/group/question schema based on ODM. I feel I should replace this with something better, perhaps a proper ODM implementation, though I feel that too will need to be expanded to accommodate everything that a sponsor might need.
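The form/group/question shape mentioned above can be sketched as three nested Python classes. This is only an illustration of the nesting: the class names, the `datatype` field and the example form are assumptions for this sketch, not the actual schema and not ODM itself.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Tuple


@dataclass
class Question:
    label: str
    datatype: str   # hypothetical field, e.g. "float" or "text"


@dataclass
class Group:
    label: str
    questions: List[Question] = field(default_factory=list)


@dataclass
class Form:
    label: str
    groups: List[Group] = field(default_factory=list)

    def walk(self) -> Iterator[Tuple[str, str]]:
        """Yield (group label, question label) pairs in form order."""
        for group in self.groups:
            for question in group.questions:
                yield group.label, question.label


# Hypothetical form content, for illustration only.
vital_signs = Form(
    label="Vital Signs",
    groups=[
        Group("Blood Pressure", [
            Question("Systolic", "float"),
            Question("Diastolic", "float"),
        ]),
        Group("Body Measurements", [
            Question("Weight", "float"),
        ]),
    ],
)
```

A structure this small is easy to work with, but it has nowhere to hang the sponsor-specific information discussed above, which is part of why it probably needs replacing.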
That is the basic structure. I created a number of schemas for 11179 to reflect the construction within the specification, then one each for ISO 25964, ISO 21090, BRIDG, the RCTs and the RCs. I imported the CDISC2RDF files and based one schema on them for my domains, with a final file for the form side of things. As you can see, the schemas reflect the diagram, but I also did this so that, for example, the ISO 11179 schemas could be taken by someone without the rest and used independently, as I thought that might be useful.
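The layering of the schemas can be pictured as a dependency graph. The Python sketch below records which schema builds on which and works out both a safe load order and the set of schemas a given layer needs. The schema names are shorthand invented for this sketch, and several of the edges are assumptions: in particular, what the RC, domain and form schemas depend on is a guess, since the text above does not spell it out.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+
from typing import Dict, List, Set

# schema -> schemas it builds on directly (names and some edges are assumed)
DEPENDS_ON: Dict[str, Set[str]] = {
    "iso11179": set(),
    "iso25964": {"iso11179"},
    "iso21090": {"iso11179"},
    "bridg": {"iso11179", "iso21090"},
    "rct": {"iso11179", "bridg"},
    "rc": {"iso11179", "rct", "iso25964"},  # assumption: RCs reference terminology
    "domains": {"iso11179", "rc"},          # assumption
    "forms": {"rc"},                        # assumption
}


def load_order() -> List[str]:
    """One order in which every schema is loaded after those it builds on."""
    return list(TopologicalSorter(DEPENDS_ON).static_order())


def required_schemas(name: str) -> Set[str]:
    """Everything a schema needs, directly or indirectly (excluding itself)."""
    needed: Set[str] = set()
    stack = [name]
    while stack:
        current = stack.pop()
        for dependency in DEPENDS_ON[current]:
            if dependency not in needed:
                needed.add(dependency)
                stack.append(dependency)
    return needed
```

In this picture `required_schemas("iso11179")` is empty, which is the point made above: the 11179 schemas sit at the bottom and can be taken on their own.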
Next I will write up the ISO 11179 aspects, followed by the terminology part. I will also publish the schemas on GitHub and link to any references I found on my journey. I am a little behind on this but I will get there in the next few weeks.
27th March 2015: Name of the PhUSE / CDISC RDF project corrected in line with Geoff Low’s comment