I have long been searching for that one picture, a simple one-pager that can be used to explain some of the issues we face in creating and using standards. A few months ago I drew the first version of the diagram below and then shared it with one or two people, discussed it a little and then updated it a little in light of the comments I got back.
It sits at a very high level and that is the intention. It does not contain all the items you might expect to see. It is not intended to. It is intended to be a one-pager and allow for the issues to be discussed; it is there to get the idea across and let others understand the discussion. It has enough to serve that purpose.
There are two sides, the left and the right.
The left represents our standards today in that our standards specify views of the actual data. It is not exhaustive but uses a couple of examples. A dataset is a means of reporting several ‘activities’. I use activity in a loose sense. So many of the words we use are loaded and people make assumptions based on the label we apply. An activity can be an assessment, an observation, a procedure or a drug administration, but it includes anything we do in relation to a subject. A CRF is a means of collecting one or more ‘activities’; it is yet another view of our data. The third item I included is define.xml, define being a means of describing one or more datasets from a study. These three examples are all views of the data: the CRF to collect it, the dataset to assemble and report the results, and define.xml to assemble it into a package for submission.
The right represents the actual data rather than a view of it and the natural relationships within the data. One key aspect is that an ‘activity’ can be considered a part of healthcare or part of a clinical study. Of course simply drawing two shapes and a connecting line between doesn’t solve the problem but if the two worlds of healthcare and research could be aligned in terms of the representation of the low-level data then integration of that data becomes a lot easier. This, hopefully, ties to the good work we see from HL7 with FHIR.
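As a rough illustration of the two sides, here is a minimal Python sketch in which the ‘activities’ are held once and a CRF and a dataset are derived from them as views. Everything in it is hypothetical and for discussion only: the `Activity` and `Context` names, the fields, and the `SUBJECT`, `TEST` and `RESULT` column labels are assumptions of mine, not taken from any standard.

```python
from dataclasses import dataclass
from enum import Enum


class Context(Enum):
    """Where the activity took place (hypothetical labels)."""
    HEALTHCARE = "healthcare"
    STUDY = "study"


@dataclass(frozen=True)
class Activity:
    """The right-hand side: something done in relation to a subject."""
    subject_id: str
    kind: str       # e.g. "observation", "procedure", "drug administration"
    name: str
    value: str
    context: Context


def crf_view(activities, form_name):
    """Left-hand side, collection view: the fields a form would ask for."""
    return {"form": form_name, "fields": [a.name for a in activities]}


def dataset_view(activities):
    """Left-hand side, reporting view: one row per activity.

    The column names are illustrative only, not real dataset variables.
    """
    return [
        {"SUBJECT": a.subject_id, "TEST": a.name, "RESULT": a.value}
        for a in activities
    ]


# The same underlying data, whether it came from healthcare or a study.
activities = [
    Activity("S001", "observation", "Systolic BP", "120", Context.STUDY),
    Activity("S001", "observation", "Diastolic BP", "80", Context.HEALTHCARE),
]

form = crf_view(activities, "Vital Signs")
rows = dataset_view(activities)
```

The point of the sketch is only that both views are produced from the same activities by a defined function, so the ‘map’ from right to left is explicit rather than rebuilt by hand each time.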
The relationships between subjects (parents, siblings and so on) are one area where I have long felt we could do a much better job. Often the relationship is just a text label on a CRF, which means we need to rebuild the relationships in the data afterwards. We could do much better in preserving these important relationships if we had better data structures in the first instance.
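To make the point about data structures concrete, here is a minimal Python sketch in which a relationship is held as data linking two subjects rather than as a text label. The `Relationship` class, the `"parent_of"` label and the subject identifiers are all hypothetical; this is one possible shape, not a proposal for a standard.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Relationship:
    """A directed link between two subjects (hypothetical structure)."""
    from_subject: str
    relation: str   # e.g. "parent_of"
    to_subject: str


def parents_of(subject_id, relationships):
    """Return the set of subjects recorded as a parent of subject_id."""
    return {
        r.from_subject
        for r in relationships
        if r.relation == "parent_of" and r.to_subject == subject_id
    }


def siblings_of(subject_id, relationships):
    """Derive siblings: other subjects sharing at least one parent."""
    parents = parents_of(subject_id, relationships)
    return {
        r.to_subject
        for r in relationships
        if r.relation == "parent_of"
        and r.from_subject in parents
        and r.to_subject != subject_id
    }


relationships = [
    Relationship("P1", "parent_of", "S001"),
    Relationship("P1", "parent_of", "S002"),
    Relationship("P2", "parent_of", "S003"),
]

siblings = siblings_of("S001", relationships)
```

Because the parent links are stored explicitly, the sibling relationship can be derived from the data instead of being re-entered or reconstructed from free text.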
Our problems stem from our understanding of the relationship between the right and the left worlds; effectively the ‘map’ between the left and the right. We don’t have a complete, foolproof method to bring the right into the left and we don’t have a complete understanding of the right-hand side. As a result we get overloaded terms, we get people mapping to SDTM in different ways and we don’t get high-quality data.
As an aside some will comment that the right-hand side is BRIDG. Yes, possibly, but BRIDG, in my experience, is difficult to consume and understand. We need something simpler.
One area where I see issues is in the therapeutic area (TA) developments. I see TA work on the right-hand side with concept maps, but then it jumps back to the left side with domain specifications etc. If the work focused on the right, with a standard mechanism to produce the left, I suspect we would have much more consistent TA products. Back in 2012 I wrote a couple of posts about what was then being referred to as 55 in 5 and some concerns about the programme.
This week it’s the PhUSE CSS meeting in Silver Spring. I am heading to that meeting with high hopes that ideas similar to those above will be discussed. I don’t suggest this is the complete answer but I do believe the ideas are well worth pursuing.