I did promise to write every week, but that clearly didn’t go to plan. My defence is that I was a little reluctant to write about something I was having doubts about. I’ll explain below.
Research Concepts, Biomedical Concepts, Clinical Models, Archetypes … there are probably a few other names I could dig out of the darker parts of my memory. They are essentially the same thing: small packages of knowledge that relate to some measurement or event in the real world. They have meaning in their own right, but it is worth noting that they also need context to be really useful. An old favourite example is Age 26. Whose age? When were they 26, yesterday or three years ago? Are they in a study, and which one? The questions go on. BCs define all of the data we record.
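To make the context point concrete, here is a minimal sketch (all of the field names are hypothetical, invented for the example rather than taken from any standard): a bare value tells us almost nothing until the surrounding who/when/where is attached.

```python
# A bare value: "Age 26" on its own is almost meaningless.
bare_value = 26

# The same value wrapped with context (field names are hypothetical).
observation = {
    "concept": "Age",
    "value": 26,
    "unit": "years",
    "subject": "SUBJ-001",       # whose age?
    "collected": "2014-10-01",   # when were they 26?
    "study": "STUDY-XYZ",        # in which study?
}

def is_interpretable(obs):
    """An observation is only useful once the contextual fields are present."""
    required_context = {"subject", "collected", "study"}
    return required_context <= obs.keys()

print(is_interpretable(observation))            # True
print(is_interpretable({"concept": "Age", "value": 26}))  # False
```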
For further information about Biomedical Concepts (BC) have a read of the previous post in this series where there are links to the slides and paper from the PhUSE Conference in Vienna. This should give the background for what I am talking about in this post.
So in keeping with the last post I will split the words into two parts, the nerdy / techie bits and the user / business aspects.
The Nerdy Bit
As ever there is a model and an associated semantic implementation. As I have done in the past I have a single picture that shows my model such that I can refer to it when writing queries and remember what is going on. The model is to the right, click on it for a larger view (as with all the images posted).
Now we get to why I didn’t post for a few weeks. I have been beating myself up. It’s wrong. It’s not right. It, like me, is flawed. In simple terms it doesn’t fit well into ISO 11179, the metadata standard (see the previous posts for more on my friend and adversary). I say it does not fit well; it is probably fairer to say that it does not implement well.
The model is what I want to support Biomedical Concepts and meet the business need. I then have to fit that model into the ISO 11179 standard. When I do this every ‘class’ should be an ISO concept and every ‘attribute’ a property (see the screen shot from an NCI website on UML to caDSR mapping, which I found useful for aligning my own thoughts on ISO 11179). The catch is that ISO 11179 has a somewhat complicated structure for implementing relationships: Link and LinkEnd classes (see the earlier posts on ISO 11179 and the concepts section, plus the last post where I showed the model diagram) and a series of other classes to store the metadata about the model you want to hold. The problem is that the implementation explodes because of the meta-modelling of such entities.
To illustrate the point, have a look at the image, which I found in a presentation on the ISO 11179 Working Group site. This picture really brought home the impact of using a ‘pure’ ISO 11179 approach. I had already used this method in implementing the ISO 21090 datatypes, and the explosion in triples per object is large.
Seventeen triples for every one seemed to go against the spirit of using semantic technology (I’m still thinking about this), and so this is why I looked at my screen, read, thought, looked at my screen, read some more and went round in circles for a few days. My tea consumption went up but other than that I achieved nothing. As an aside, this all relates to the Object Management Group’s meta-model stack and its M1 and M2 layers: the metadata (the BC model) and the meta-model (ISO 11179). It comes down to how true to the standard I wanted to be.
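To get a feel for the explosion, here is a hedged sketch. The class names echo ISO 11179’s Link/LinkEnd style, but the exact triples are illustrative, not the standard’s normative mapping: a single relationship stated directly is one triple, while the reified version spawns several objects before you have recorded any further metadata at all.

```python
# One relationship stated directly: a single triple.
direct = [("bc:BC1", "bc:hasItem", "bc:Item1")]

# The same relationship reified ISO 11179 style through Link / LinkEnd
# objects (illustrative triples, not the standard's normative mapping).
reified = [
    ("ex:Link1", "rdf:type", "iso:Link"),
    ("ex:Link1", "iso:name", '"hasItem"'),
    ("ex:Link1", "iso:linkEnd", "ex:End1"),
    ("ex:Link1", "iso:linkEnd", "ex:End2"),
    ("ex:End1", "rdf:type", "iso:LinkEnd"),
    ("ex:End1", "iso:endConcept", "bc:BC1"),
    ("ex:End2", "rdf:type", "iso:LinkEnd"),
    ("ex:End2", "iso:endConcept", "bc:Item1"),
]

# 1 triple versus 8, and that is before datatypes and values are described.
print(len(direct), len(reified))
```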
After much thought I decided to take the pragmatic approach and implement it as it is to get it working, and then come back to this issue and refactor the design / code as necessary. Refactoring is a relatively new term (if you are the same age as a dinosaur) used by software engineers to describe the situation where we did it badly the first time and come back to tidy up the mess after. I’ve been doing this for 30 years or more; it’s just nice that we now have a soft and fluffy term for it which hides the embarrassment.
So, looking at the actual model, you can see a BC Template (BCT) or instance (I use BC to refer to the instances) at the top, both of which are ISO concepts so that they can be managed within the repository (version controlled, with unique identifiers and a state such as approved or draft). A BC instance is based on a template. A BCT (template) or BC (instance) is then built from the Item class, which links to the relevant BRIDG Class / Attribute pairs. The BRIDG attribute has a complex datatype handled by the Datatype class (with a link to the ISO 21090 definition), and each Datatype has one or more Property classes (the Datatype and Property classes reflect the ISO 21090 setup). However, a Property can itself be a complex datatype with further properties, so there are forward and back relationships around the Property and Datatype classes to allow for this ‘recursion’. At the bottom we hit simple datatypes which have PropertyValues. For coded data there may be several values to select from, so we can put these into a series using the ‘nextValue’ object property. The recursion around the complex datatypes is the fun bit; semantic technology rather helps here.
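The ‘recursion’ around complex datatypes can be sketched as a simple walk. The class and field names below are my own shorthand for the idea, not the model’s actual vocabulary: a datatype has properties, a property may itself carry a complex datatype, and we descend until we reach simple types that can hold values.

```python
# Hypothetical in-memory mirror of the Datatype / Property recursion.
# A property either is simple or wraps another complex datatype.
datatype = {
    "name": "PQ",  # ISO 21090 physical quantity, as an example
    "properties": [
        {"name": "value", "simple": True},
        {"name": "unit", "simple": False,
         "datatype": {
             "name": "CD",  # coded value for the unit
             "properties": [
                 {"name": "code", "simple": True},
                 {"name": "codeSystem", "simple": True},
             ]}},
    ],
}

def leaves(dt, path=""):
    """Descend through nested datatypes and yield the simple-typed leaves."""
    for prop in dt["properties"]:
        full = f"{path}.{prop['name']}" if path else prop["name"]
        if prop["simple"]:
            yield full
        else:
            yield from leaves(prop["datatype"], full)

print(list(leaves(datatype)))
# ['value', 'unit.code', 'unit.codeSystem']
```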
A template defines the required structure of the concepts based on BRIDG Classes and Attributes and the relationships between them; ISO 21090 datatypes drive the lower-level structure. A BC then takes such a template and adds values for any terminology that needs to be set (binds the terminology in). This includes what concept it is, potential method codes, result units and everything else we need. BRIDG provides a framework to ensure the BCs are well formed; after that I don’t really want to see it.
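Binding the terminology in can be pictured like this. It is a sketch under assumed names (the real system stores all of this as triples, not dictionaries): the template fixes the open slots, and the BC instance fills them with coded values.

```python
# A template's open slots (structure only, no terminology yet).
template = {
    "concept": None,        # to be bound: what is being observed
    "method_codes": None,   # to be bound: permitted methods, if any
    "result_units": None,   # to be bound: permitted units
}

def bind(template, **values):
    """Create a BC instance by filling the template's open slots only."""
    unknown = set(values) - set(template)
    if unknown:
        raise ValueError(f"not in template: {unknown}")
    return {**template, **values}

# A hypothetical height BC bound from the template.
height_bc = bind(
    template,
    concept="HEIGHT",          # e.g. a CDISC terminology code
    result_units=["cm", "in"],
)
print(height_bc["concept"], height_bc["result_units"])
```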
I used some simple concepts in a previous implementation and have these in an XML format, so I have used XML transforms (XSLT) to convert them from the XML into a semantic format for loading (they are converted to Turtle .ttl files, as I find that format the easiest to read). The screen shots below show (in order) my messy internal XML format, a snippet of the resulting Turtle file, and a screen shot from the system once that definition is loaded.
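The actual conversion is done with XSLT; purely to illustrate the shape of the transform, here is the same idea sketched in Python with the standard library. The XML element names and the output predicates are invented for the example, not my real internal format.

```python
import xml.etree.ElementTree as ET

# Invented stand-in for the internal XML format.
xml_source = """
<bc id="BC-HEIGHT">
  <item name="value" bridg="PerformedObservationResult.value"/>
  <item name="unit" bridg="PerformedObservationResult.value.unit"/>
</bc>
"""

def to_turtle(xml_text):
    """Emit one Turtle statement per item, keeping its BRIDG pairing."""
    root = ET.fromstring(xml_text)
    lines = [f":{root.get('id')} a :BiomedicalConcept ;"]
    items = root.findall("item")
    for i, item in enumerate(items):
        sep = " ;" if i < len(items) - 1 else " ."
        lines.append(
            f'  :hasItem [ :name "{item.get("name")}" ; '
            f':bridgRef "{item.get("bridg")}" ]{sep}'
        )
    return "\n".join(lines)

print(to_turtle(xml_source))
```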
The thing to compare is the XML and Turtle views, in which you can see BRIDG references; in the user screen shot there are none. The BC has also been flattened from its tree-like structure, and the alias mechanism (note that these are pure text labels) is used to identify the items (the leaves of the BC) in terms the user understands today.
As ever the model files are on GitHub.
The Users Bit
It all looks rather complicated. It’s not. From a user’s perspective the only thing that needs attention is the screen shot above of a BC in the system; all of the nerdy technical details are hidden.
A BC is a collection of variables, the collection providing the metadata about an observation or event. Each item on its own makes no sense or is of little use. Together they tell us all of the details: in this case that it is a height, the value, the units and when it was collected. Other such ‘concepts’ would then supply content such as subject and study to give the observation its context.
In the screen shot the green highlight is there to show what is enabled. Enabled simply means those items being used from the superset of items available in the template; for example, some BCs will need a method code and some will not. We also have a collect flag to indicate what is collected and what is preset; --TESTCD is never collected, the BC presets it so that the system knows what it is. We also see prompt text and question text (not filled in yet!) and format information. What we have is, in today’s terms, a combination of some CDASH and some SDTM, combined with new information, to form a building block that is independent of those standards.
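The enabled/collect distinction is easy to sketch (the field names are hypothetical): a BC enables a subset of the template’s superset of items, and preset items like --TESTCD carry their value without ever appearing on a form.

```python
# Items from a hypothetical template superset; this BC enables a subset.
items = [
    {"name": "TESTCD", "enabled": True,  "collect": False, "preset": "HEIGHT"},
    {"name": "VALUE",  "enabled": True,  "collect": True,  "preset": None},
    {"name": "UNIT",   "enabled": True,  "collect": True,  "preset": None},
    {"name": "METHOD", "enabled": False, "collect": False, "preset": None},
]

# What the form would actually ask for.
collected = [i["name"] for i in items if i["enabled"] and i["collect"]]

# What is known up front without being collected.
preset = {i["name"]: i["preset"]
          for i in items if i["enabled"] and not i["collect"]}

print(collected)  # ['VALUE', 'UNIT']
print(preset)     # {'TESTCD': 'HEIGHT'}
```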
There is no need to comment on the duplicate values in the terminology fields. I have left them in; this will be resolved in an update shortly and relates to providing a full reference (rather than a partial reference) to the term(s) desired. You will see both partial and full references in the screen shot.
The BCs can then be used, for example, in building CRFs (the subject of the next blog post), in the auto-creation of annotated CRFs (working on this at the moment, nearly there, some bugs to fix) and in other business objects. You will notice from the screen shot that all the value-level metadata is present; I don’t need to worry about it in subsequent steps, I just include the BC and the metadata arrives with it.
I hope a lot of the BCs will come from CDISC. There is some BC metadata in the recently released COPD TAUG and I want to get round to loading it to see how it works, but time seems to be in short supply at the moment. Templates should definitely come from CDISC, and the screen shot to the left is my first attempt at a BC editor, which I am currently testing (there are bugs here). As an aside, for entering BCs in volume you want a bulk entry facility, but there will still be a need to tweak individual BCs: add a code-list value for a new method, an additional unit and such like.
And Finally …
One aspect is worth pointing out. You will notice the value at the leaf of the BC tree. If you were to maintain name-value pairs of leaf references with captured values, or copy the BCs and fill in the captured values, then all of the clinical data would be immediately linked to the required metadata, and it would be in a semantic form. If some other data are in the same form, say from your previous study, a partner organisation’s study or some other study in the same therapeutic area, then the data can be readily combined. Also, SDTM datasets would be a query or two away (more on that in a post to come). My immediate vision is SDTM creation with little or no code.
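The name-value idea can be sketched as follows (the identifiers are invented for the example): captured values keyed by a leaf reference join straight back to the BC metadata, which is why every data point arrives complete with its definition.

```python
# BC leaf metadata, keyed by a hypothetical leaf reference.
leaf_metadata = {
    "bc:HEIGHT/value": {"concept": "HEIGHT", "role": "result value"},
    "bc:HEIGHT/unit":  {"concept": "HEIGHT", "role": "result unit"},
}

# Captured clinical data as simple name-value pairs against those leaves.
captured = [
    ("bc:HEIGHT/value", 172),
    ("bc:HEIGHT/unit", "cm"),
]

# Joining data to metadata is a lookup, i.e. "a query or two away".
linked = [{**leaf_metadata[leaf], "value": value} for leaf, value in captured]
print(linked[0])
# {'concept': 'HEIGHT', 'role': 'result value', 'value': 172}
```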
As I said above, I am currently working on forms based on BCs and also automated annotated CRF creation. The aCRF bit is close; the form bit is one of those areas where you can go to town and add lots of wonderful features. Hopefully by the end of next week I will have them working so as to be able to write the next post. I also have some basic SDTM domain functionality in place (linking domains to BCs) but I will leave that for a different post.