8 Responses to “Metadata and Layers”

Comments

  1. I think the picture is essentially right. You mentioned controlled terminology (codelists) as the “base”. However, I don’t think we treat codelists correctly in CDISC at this moment.
    First, let us look at your example “age units”. In CDISC we defined our own codes for that (YEARS, MONTHS, WEEKS) whereas the rest of the healthcare world (100 times bigger than ours) consistently uses UCUM units. Why did we reinvent the wheel?
    Now, let us look at the codelist for “laboratory test”. Wait a minute! We have two of them and they seem to be independent: one for “laboratory test code” and one for “laboratory test name”. With two independent lists, how can a system know that “APTT” (from CL.LBTESTCD) means “Activated Partial Thromboplastin Time” (from CL.LBTEST)? Well, it can’t.
    The reason? The SDTM tables.
    Now, tables are two-dimensional and are fine as VIEWS on data. But tables are not the reality; they are just “views”. In our diligence to create controlled terminology, we seem to have forgotten that.
    The world is not flat (famous statement of an FDA representative), but it is not cubic either. And those knowing a bit more about geography will know that it is not spherical either.
    It is good that we assign “allowable values” to our concepts. But we should stop thinking of these as simple values rather than as composite objects themselves (e.g. test code + test name).
    In other cases we even need to go a step further: composites like:
    test code – test name – test position (e.g. SYSBP, systolic blood pressure, sitting/standing/…). And for the units, these may be composite too (just like UCUM treats them): mm[Hg] (millimeter + Mercury) or m[H2O] (meter + water).
    For other tests, we may need to add “test body location” (for example for “body temperature”).

    So if we need a solid basis for our model, we might first need to completely rethink the way we treat codelists in CDISC.
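The point above about the two independent lab test codelists can be sketched in code. This is only an illustration under stated assumptions: the class `LabTestTerm` and the function `name_for_code` are hypothetical names, and the "ALB" entry is added purely as a second example row. The idea is that a composite term keeps code and name together, and the flat code list and name list become views derived from it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LabTestTerm:
    """One composite codelist entry: code and name travel together."""
    code: str  # the value a table would show in a test code column
    name: str  # the value a table would show in a test name column


# Hypothetical composite codelist. Only the APTT pair comes from the
# comment above; the second row is illustrative.
LAB_TESTS = [
    LabTestTerm("APTT", "Activated Partial Thromboplastin Time"),
    LabTestTerm("ALB", "Albumin"),
]

# The two "independent" lists are now just views on the composite terms.
TEST_CODES = [term.code for term in LAB_TESTS]
TEST_NAMES = [term.name for term in LAB_TESTS]

# Index built from the composite terms, so the link is never lost.
_NAME_BY_CODE = {term.code: term.name for term in LAB_TESTS}


def name_for_code(code: str) -> str:
    """Resolve a test code to its name via the composite term."""
    return _NAME_BY_CODE[code]
```

With two separate flat lists there is nothing to build `_NAME_BY_CODE` from, which is exactly the gap described above.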

    • Dave IH

      Jozef

      I would agree with your basic premise that code lists in the CDISC world need a good looking at. As you point out, Test Names and Test Codes and the “linking” thereof is certainly an issue. But I still believe that controlled terminology sits at the base of the metadata “stack”, in that, as a general principle, I think metadata should only make use of items in the lower levels and never use items from a higher level.
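The layering principle in the paragraph above can be expressed as a simple check. This is a minimal sketch, not a description of any real repository: the layer names and numbers, the identifiers `CL.AGEU` and `DEMOGRAPHY`, and the function `upward_references` are all assumptions made for illustration.

```python
# Hypothetical layer numbering: a lower number is closer to the base.
LAYER = {"codelist": 0, "item": 1, "concept": 2, "form": 3}


def upward_references(kinds, references):
    """Return the (user, used) pairs that break the layering rule.

    kinds maps an object identifier to its layer name.
    references is a list of (user, used) pairs meaning "user makes use of used".
    A pair is a violation when the used object sits in a higher layer
    than the object using it.
    """
    return [
        (user, used)
        for user, used in references
        if LAYER[kinds[used]] > LAYER[kinds[user]]
    ]


# Illustrative objects; all identifiers are made up for this sketch.
kinds = {"CL.AGEU": "codelist", "AGE": "item", "DEMOGRAPHY": "form"}
references = [
    ("AGE", "CL.AGEU"),      # item uses a codelist: allowed
    ("DEMOGRAPHY", "AGE"),   # form uses an item: allowed
    ("CL.AGEU", "AGE"),      # codelist uses an item: not allowed
]
problems = upward_references(kinds, references)
```

Whether references within the same layer should also be rejected is left open here; the sketch only flags use of a higher layer.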

      As for tables, yes they are just views or presentations on the data, I would agree. The view of consistent use of metadata across the life cycle does lead to the idea of a use-case neutral data storage format that supports all use cases.

      I am not so convinced about some of your ideas regarding some of the observational qualifiers such as position, location, method etc. I see those as data items with associated controlled terminology that qualify an observation. So they would be another atomic part of a “concept” rather than just part of a single coded value holding several items of information (i.e. as BRIDG has modelled them).

      Dave

  2. Hi Dave,

    fully agree – I did not want to suggest that “location” and “position” should be part of a single coded value. As you state, it is better to treat them as qualifiers for an observation.

    I am still puzzled why CDISC decided to create its own controlled terms for units though, ignoring UCUM.
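For the three age units mentioned earlier in this thread, the correspondence with UCUM is small enough to write down. The sketch below assumes the UCUM codes `a` (year), `mo` (month) and `wk` (week); the table name `CDISC_TO_UCUM` and the function `to_ucum` are hypothetical.

```python
# Mapping for the three age unit terms named in this thread only.
CDISC_TO_UCUM = {
    "YEARS": "a",    # UCUM code for year
    "MONTHS": "mo",  # UCUM code for month
    "WEEKS": "wk",   # UCUM code for week
}


def to_ucum(cdisc_unit: str) -> str:
    """Translate a CDISC age unit term into its UCUM code."""
    try:
        return CDISC_TO_UCUM[cdisc_unit.upper()]
    except KeyError:
        raise ValueError(f"no UCUM mapping recorded for {cdisc_unit!r}") from None
```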

    Best regards,

    Jozef

    • Dave IH

      Jozef

      With respect to UCUM, timing was probably an issue: the core work on SDTM was done from, say, 2000 to 2005 or so (difficult to remember). What was the state of UCUM at that time? The knowledge level of the team at that time was probably also an issue, as were the maturity of UCUM itself and its complexity.

      There is always the argument for “let’s get something done” versus the Rolls Royce solution. The problem sometimes – not always – with “let’s get something done” is that it can leave you vulnerable in the future. It is a judgement call and 20/20 hindsight is a wonderful thing! :)

      Dave

  3. Anthony Chow

    Speaking of timing, I was just presenting and preaching the same topic to my data management constituents. Someone from the audience asked how to tabulate the subject’s position (supine, sitting, etc.) when blood pressure was measured. I answered by asking whether the protocol specifies one way or another, and whether we can distinguish the position from the CRF metadata. When the protocol specifies a sitting systolic and diastolic blood pressure, I hope the CRF is set up in such a way that the position is discernible. For example, one can label the CRF question prompt accordingly; or name the collection variables SITTING_SYSBP_VSORRES and SITTING_DIABP_VSORRES; or add variables SYSBP_VSPOS and DIABP_VSPOS (hide them with a default when appropriate). This way, these essential attributes are apparent in the data transfer, and downstream processes such as ETL can take advantage of them.
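To show how a downstream ETL step might take advantage of the naming convention described above, here is a rough sketch. It assumes collection variable names are built as an optional position prefix, a test code and a variable suffix joined by underscores, as in the names above; the function `parse_collection_variable` and the `POSITIONS` set are hypothetical.

```python
# Position prefixes the parser recognises; an assumed, incomplete list.
POSITIONS = {"SITTING", "STANDING", "SUPINE"}


def parse_collection_variable(name: str) -> dict:
    """Split a collection variable name into its parts.

    Expected shapes (an assumption of this sketch):
        POSITION_TESTCODE_VARIABLE, e.g. SITTING_SYSBP_VSORRES
        TESTCODE_VARIABLE,          e.g. SYSBP_VSPOS
    Raises ValueError for any other shape.
    """
    parts = name.split("_")
    position = parts.pop(0) if parts[0] in POSITIONS else None
    test_code, variable = parts  # ValueError if not exactly two parts remain
    return {"position": position, "test_code": test_code, "variable": variable}
```

For example, `parse_collection_variable("SITTING_SYSBP_VSORRES")` yields the position, the test code and the target variable as separate values, so the position no longer has to be guessed from a CRF label.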

    If I may apply the above back to the “concept” concept you mentioned, I would think clarity is ensured when people are instructed and trained to collect sitting systolic and diastolic blood pressure and to be mindful of 1) position; 2) measurement unit; and 3) specific wording. So, the concept should be defined upfront. This is no different from SDLC, where user requirements should be well documented and understood before implementation.

    Lastly, it is good to see you writing again.

    • Dave IH

      Anthony

      I think your second para is the key.

      The idea of the “concept” is that it is defined upfront, once, with items such as position, location etc. being either collected or pre-defined: every study either collects them or they are fixed. That way the data from every study have the same structure, irrespective of whether the items are collected or not. As an aside, the “don’t care” values need some thought.

      The concepts are then stored in a metadata repository (MDR). In the third figure, the code lists and the AGE item are such pieces.

      If we have consistent data then, as you say, downstream processes such as ETL etc can greatly benefit.

      So for blood pressure, we have the common items such as test code(s), test name(s), position, method, location, time, date … and then value and units for both systolic and diastolic, giving us at least 11 pieces of information. Here we need to be slightly careful: normally we repeat the test code etc. for both sys and dia, so a little thought is necessary, but there are ways to achieve the answer. I would define this once, store it in my MDR and re-use it study after study.

      These “atomic” data elements, things I can split no further, combine into a “concept” of blood pressure that can be referenced in a protocol, expanded in the CRF, used to build a tabulation – as Jozef said, a view of the data – using ETL tools and onwards.
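The blood pressure concept described above could be sketched roughly as follows. This is an illustration under stated assumptions rather than an MDR design: the classes `Result` and `BloodPressure` and the method `tabulate` are hypothetical, the column names follow SDTM-style vital signs naming as an assumption, and the sample values are made up.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Result:
    """Atomic elements that differ between systolic and diastolic."""
    test_code: str
    test_name: str
    value: float
    unit: str


@dataclass(frozen=True)
class BloodPressure:
    """The concept: shared qualifiers defined once, plus two results."""
    position: Optional[str]  # None stands in for a "don't care" value
    method: Optional[str]
    location: Optional[str]
    date: str
    time: str
    systolic: Result
    diastolic: Result

    def tabulate(self) -> list:
        """Build a tabulation view: one row per result, qualifiers repeated."""
        shared = {
            "VSPOS": self.position,
            "VSMETHOD": self.method,
            "VSLOC": self.location,
            "VSDTC": f"{self.date}T{self.time}",
        }
        return [
            {"VSTESTCD": r.test_code, "VSTEST": r.test_name,
             "VSORRES": r.value, "VSORRESU": r.unit, **shared}
            for r in (self.systolic, self.diastolic)
        ]


# Made-up sample observation.
bp = BloodPressure(
    position="SITTING", method=None, location=None,
    date="2012-01-31", time="09:30",
    systolic=Result("SYSBP", "Systolic Blood Pressure", 120.0, "mm[Hg]"),
    diastolic=Result("DIABP", "Diastolic Blood Pressure", 80.0, "mm[Hg]"),
)
rows = bp.tabulate()
```

The concept holds the shared qualifiers once; the repetition of position, method and so on only appears in the tabulated view.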

      Dave

  4. Simon Bishop

    Thanks for starting the discussion, Dave.
    Layering of metadata is very important – the habit of defining objects as the combination of definitions and terminology has been hugely expensive to companies who find themselves with multiple objects with broadly (but not exactly) the same definition but with different names and different terminologies (and no cheap way to bring these together).

    I see 4 layers:
    1. Definitions of clinical concepts (e.g. systolic blood pressure) together with the identification of all component parts (e.g. method, body position). It should be possible for these to be used across the whole pharmaceutical industry.
    2. Terminology used for component parts (e.g. a set of valid values for the body position component of the systolic blood pressure). It is desirable that these be industry standard too, but this will be hard/impossible to achieve for all clinical concepts.
    3. Groupings of individual clinical concepts (e.g. those comprising the set of vital signs). It is possible to standardise the more common examples, but no more.
    4. Standard definition of operational objects (e.g. as eCRFs, SDTM datasets, company specific datasets). Of these, only CDASH modules and SDTM datasets can be standardised across the industry.

    Bullet 1 talks about defining clinical concepts together with their component parts. Including the component parts is very important: there are many ongoing initiatives to define Clinical Data Elements (CDEs) and these efforts only go partway to what industry needs.

    Current work to define Clinical Data Elements (CDEs) does not deliver all the data re-use capabilities needed. For example, the recent Parkinson’s disease standards developed by the National Institute of Neurological Disorders and Stroke (NINDS) and National Institutes of Health (NIH) have no recorded relationships between CDEs (other than through human interpretation) and no model for developing them. As a result, the CDEs are often very specific and inconsistent in approach, limiting the ability to automate processes and limiting the downstream benefit. Here are 4 examples:

    - CDE1 (CDE is very specific; instructions require reference to CRF page): “Has participant/subject ever regularly taken ibuprofen-based non-aspirin medications, that is, at least two pills per week for 6 months or longer”
      Instructions say “If No is answered, skip to question #2”
    - CDE2 (units are part of the CDE definition): “Record the pulse of the participant/subject in beats per minute”
    - CDE3 (2 separate CDEs for weight and weight unit): “Record the weight of the participant/subject. To be collected at the visit, not self-reported. Also, indicate whether weight was measured in pounds (lbs) or kilograms (kg)”
    - CDE4: “Weight unit of measure, choose either Pounds (lb) or Kilograms (kg)”
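The weight and weight unit examples above show what a recorded relationship between CDEs would make possible. The sketch below is hypothetical throughout: the CDE identifiers `Weight` and `WeightUnit`, the table `UNIT_CDE_FOR` and the function `weight_in_kg` are invented for illustration and do not come from the NINDS standards.

```python
# One pound is defined as exactly 0.45359237 kg.
LB_TO_KG = 0.45359237

# Hypothetical machine-readable relationship: which CDE holds the unit
# for which value CDE. Without it, only a human reader knows the link.
UNIT_CDE_FOR = {"Weight": "WeightUnit"}


def weight_in_kg(record: dict) -> float:
    """Normalise a collected weight to kilograms using the recorded link."""
    unit_cde = UNIT_CDE_FOR["Weight"]
    value, unit = record["Weight"], record[unit_cde]
    if unit == "kg":
        return value
    if unit == "lb":
        return value * LB_TO_KG
    raise ValueError(f"unexpected weight unit {unit!r}")
```

Once the link between the two elements is stored as data, the conversion can be automated instead of relying on someone reading the CRF instructions.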

  5. I would like to add a layer “0” to Simon’s nice list: what you may call the Real-World Phenomenon 1). Also, for layer 3 there is a need to categorize/classify [clinical/scientific/research/observation] concepts (e.g. for lab tests: hematology, urinalysis) and to relate concepts to each other, besides grouping concepts together.

    1) “On the one hand there is your blood pressure itself, the real-world phenomenon which obeys the laws described in a medical textbook (which will tell you about systemic arterial pressure, about systolic and diastolic phases, about fluid dynamics, etc., etc. complicated physics and physiology that will be of practical importance e.g. when designing an instrument that can accurately measure blood pressure or when dealing with a patient who has atrial fibrillation). On the other hand there is a blood pressure observation, another real-world phenomenon, but of an entirely different sort, involving factors such as:
    - the position of the patient at the time of measuring (sitting, lying, etc.),
    - the tilt of the surface on which the person is lying,
    - the variation in measured blood pressure with respiration,
    - the instrument used to measure the blood pressure,
    - the size of the cuff if a sphygmomanometer is used,
    …”
    From a blog post on the HL7 Watch blog: http://hl7-watch.blogspot.com/2006/02/is-there-difference-between-person-and.html

    See also http://ontology.buffalo.edu/smith/articles/Vital_Sign_Ontology.pdf
