Introduction
Today (Wednesday 6th May 2015) I gave a presentation at the EuroInterchange in Basel about Research Concepts based on work I have performed over the last five or more years. Last year at the CDISC Interchange I provided a write-up to accompany my presentation and, as some of you may know my slides tend to be rather pictorial, I decided to do the same this year. Hopefully the write-up will give those who attended something useful to refer to after the event, while those who could not attend can also access the talk.
The post consists of the submitted short abstract followed by a slide-by-slide commentary. The slides in PDF format can be found here.
Current slides available via the above link are version 2 as presented. If changes are made subsequent to the presentation then I will note them here.
Presentation Abstract
With the arrival of the FDA guidance on electronic submissions and the associated data conformance guide there is increasing pressure to provide higher quality submissions to the FDA. In the past, mention has been made of Research Concepts (RCs) to help sponsors create such higher quality data but what are they, how might they assist sponsors and how are they deployed?
This presentation, based on experience of three implementations that have used Research Concepts and of the CDISC SHARE project, will provide answers to the above questions using concrete examples of Research Concepts to illustrate the underlying principles, relate Research Concepts to the CDISC SHARE project and highlight where CDISC is already publishing Research Concept metadata.
This presentation will:
- Examine why we want Research Concepts and highlight the principles behind them
- Provide an understanding of what a Research Concept is
- Detail how Research Concepts can be deployed
- State the benefits Research Concepts bring to the business and support business artefacts such as annotated CRFs and define.xml
- List the existing sources of Research Concept metadata
The presentation is based on practical experience gained in both production and prototype environments.
Slides 1 & 2
The usual title slides.
Slide 3
This slide is a scene setter and the scene is one I am sure we have all encountered. We stare at the CRF and ponder where we are going to map the various elements. Some are easy but we all too often hit the tricky ones. We resort to the CDISC website, Google, LinkedIn forums, maybe papers from PhUSE or PharmaSUG conferences.
We eventually decide to map it to X. But the person in the sponsor company down the road encountering the same or a similar issue maps it to Y, or maybe X but in a different way. Industry standardization just failed. The FDA pick up the pieces.
Slide 4 – We Need Better
We need better standards. The slide lists a varied set of reasons under a set of headings. They are not all equal and they overlap. There is nothing new on the slide; we have all seen the reasons before and we have all heard the logic.
This is an interesting area and I am tempted to write a post just on this particular topic in the future.
Slides 5 & 6 – Variable-Based World
So consider a simple example. We all know the Vital Signs (VS) domain; it is straightforward but serves as an example that people can quickly understand. Within VS we have the test ‘temperature’ as one of a long list of test codes (variable VSTESTCD and the associated code list C66741) and we have a list of units (VSORRESU and the code list C66770) but what I do not get told is that, when using the temperature test, the only valid units are F and C.
I know that I know it, I know you know it. Unfortunately we have not told the machine, the computer.
Similarly, when dealing with temperature do I have to use it in conjunction with VSLOC and/or VSLAT, maybe VSMETHOD or other variables? With temperature it is easy, again we all know, but for more complex tests it is less clear. We need to move towards clear and complete definitions in all cases.
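As a minimal sketch of what "telling the machine" could look like, the lookup below ties a test code to its valid units. The structure and the function name are assumptions made for illustration; only the temperature units (F and C) come from the text above.

```python
# Hypothetical lookup: which units are valid for which vital signs test.
# Only the temperature entry is taken from the text; the layout is illustrative.
VALID_UNITS = {
    "TEMP": {"C", "F"},
}


def unit_is_valid(testcd: str, unit: str) -> bool:
    """Return True if the unit is allowed for the given test code."""
    return unit in VALID_UNITS.get(testcd, set())


print(unit_is_valid("TEMP", "C"))     # a temperature recorded in Celsius
print(unit_is_valid("TEMP", "mmHg"))  # a pressure unit against a temperature test
```

With the test and unit lists held separately, as they are today, nothing stops the second combination. Once the relationship is recorded, a machine can reject it.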
Slide 7 – Research Concepts
Research Concepts (RCs) are there to try and solve these issues. Research Concepts simply try and bring all of our knowledge about our data into a useful self-contained package that can be used across the life-cycle. We want to bring clarity to what goes with what, what is valid and what is not. We want complete definitions with ALL terminology defined with the appropriate subsets, all in a form that the human AND the machine can understand.
We can then use these definitions within a variety of scenarios such as the basis for creating all of our business objects, supporting the end-to-end process and, very importantly, providing traceability. They can also prove very valuable in impact analysis when, for example, new versions of terminology are produced.
Slide 8 – Simple VS Research Concepts
So what do these concepts look like? This slide is based upon a simple MS Excel demonstrator implementation I did recently to outline the key principles behind Research Concepts. The spreadsheet defines several VS Research Concepts using a simple ‘flat’ scheme but each can be represented as per the ‘tree’ diagram seen on the slide. Each Research Concept in this demonstration environment gives:
- Concept Name
- Structure: the tree and the relationships. For example Result is composed of a Value and the Units
- Terminology and subsets, for example Units are ‘cm’ and ‘in’
- What is not required: no method, no position etc.
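The four points above can be sketched as a small tree structure. This is not the Excel demonstrator itself; the `Node` class and the item names are assumptions for illustration, with the Height units taken from the list above.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    """One item in a concept tree, optionally carrying a terminology subset."""
    name: str
    terminology: List[str] = field(default_factory=list)
    children: List["Node"] = field(default_factory=list)


# A Height concept: Result is composed of a Value and the Units.
# There is no Method or Position node, which is how "not required" is expressed.
height = Node("Height", children=[
    Node("Test", terminology=["HEIGHT"]),
    Node("Result", children=[
        Node("Value"),
        Node("Units", terminology=["cm", "in"]),
    ]),
])


def item_names(node: Node) -> List[str]:
    """List every item name in the tree, parent first."""
    names = [node.name]
    for child in node.children:
        names.extend(item_names(child))
    return names


print(item_names(height))
print("Method" in item_names(height))
```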
Slides 9, 10 & 11 – Vital Signs – Additional Information
I said earlier that we don’t have ‘Research Concept’ information as yet. This was a small lie in that CDISC released spreadsheets in late 2014 that detailed some of the information that we need for the VS and EG domains. Slide 9 has an extract that provides the units for HEIGHT and WEIGHT.
Note that for HEIGHT ‘mm’ is listed as a possible unit whereas I did not include it within my simple Research Concept on the previous slide.
Slide 10 is the equivalent information for SYSBP and DIABP. Note here we have the additional information relating to position and the code list subset associated with it. Also note we only have a single possible unit, ‘mmHg’.
The ‘tree’ picture on slide 11 represents DIABP with position in the right place.
Note the structure is essentially the same as the previous Height Research Concept. This leads to identifying Research Concept Templates (RCTs) that set the pattern for the RCs in a particular area (here VS). The templates allow us to build Research Concepts that are consistent and of high quality.
Slide 12 – Define Once, Use Many
This is an update to an old slide of mine, one I have been using for a very long time, but it shows the linkage across the silos that Research Concepts can help bridge.
First the protocol: we often see elements like this within visit descriptions indicating the tests and procedures to be performed within a visit. Here we want Blood Pressure captured, which is composed of Systolic and Diastolic pressure. Is this one concept or two? I will ignore this since experience to date is showing that being able to group two Research Concepts into a higher level Research Concept may well be beneficial and thus I could have BP composed of DIABP and SYSBP. The use of the Research Concept name within the protocol immediately links the protocol to the Research Concept and can help with subsequent study build processes. You can see that collections of, say, Lab Tests (a panel) would be useful, as would being able to state individual tests, questionnaires that relate to several Research Concepts and so on. Using the Research Concept names could also help with writing protocols and aid clarity on what data are to be collected.
Obviously, also in the protocol is the Visit & Assessment Schedule (Table/Schedule of Events/Assessments/Procedures) which lays out the visit versus assessments for the study. This tends to be more at a CRF level (but is not always, it might be more a collection of RCs) but obviously it could have a direct link to the Research Concepts. It would be desirable to express this table in terms of Research Concepts, Forms built from Research Concepts or groups of Research Concepts.
As an aside two aspects that also are worth considering:
- Linking Research Concepts to inclusion/exclusion criteria to allow for structured protocol IEs that could be machine evaluated.
- The Statistical Analysis Plan and ‘analysis concepts’ and the use of Research Concepts therein is something that also needs to be investigated, but that is some way off.
The CRF built to capture the data also links to the Research Concept. From the protocol we obviously know which Research Concepts are to be captured and thus each field on the CRF references the corresponding element within the Research Concept defining the data to be captured along with question text, code lists etc. On this form obviously we need to refer to both the SYSBP and DIABP Research Concepts. But here we have a common field, the position. It will be defined within both concepts but we only want to collect it once so the form will refer to the same item within both concepts.
And then the tabulation, the SDTM domain. The VSPOS variable (column) references (points to) the position definitions within the Research Concepts (the other variables similarly pointing to the appropriate item within the Research Concept). This allows the VSTESTCD and VSTEST variables (as well as other fixed values) to be set based on the content within the Research Concept.
Now I have a chain: the protocol specified BP, which is composed of SYSBP and DIABP. These Research Concepts are used to define the CRF, the CRF fields referring to the items within the Research Concepts (the leaves of the trees seen earlier), and the variables within the VS domain also refer to the same items. I have linked my protocol to my CRF and to my tabulation. I can work forward. Equally, and very importantly, I can work back. I get traceability for free.
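The chain can be sketched with a few dictionaries, assuming each Research Concept item has an identifier that both CRF fields and SDTM variables point at. The identifiers, field labels and function names here are hypothetical; the shared Position field and the VSPOS link follow the description above.

```python
# Hypothetical item identifiers of the form "<concept>.<item>".
PROTOCOL = {"BP": ["SYSBP", "DIABP"]}

CRF_FIELDS = {
    "Systolic": ["SYSBP.result.value"],
    "Diastolic": ["DIABP.result.value"],
    # Position is defined in both concepts but collected once on the form.
    "Position": ["SYSBP.position", "DIABP.position"],
}

SDTM_VARIABLES = {
    "VSORRES": ["SYSBP.result.value", "DIABP.result.value"],
    "VSPOS": ["SYSBP.position", "DIABP.position"],
}


def variables_for_field(field_name):
    """Work forward: which SDTM variables share items with this CRF field?"""
    items = set(CRF_FIELDS[field_name])
    return sorted(v for v, refs in SDTM_VARIABLES.items() if items & set(refs))


def fields_for_variable(variable):
    """Work back: which CRF fields share items with this SDTM variable?"""
    items = set(SDTM_VARIABLES[variable])
    return sorted(f for f, refs in CRF_FIELDS.items() if items & set(refs))


print(variables_for_field("Position"))
print(fields_for_variable("VSORRES"))
```

Neither direction needs a separate mapping table. Both lookups fall out of the shared references to the concept items.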
Slide 13 – Silos
Silos bedevil the industry. This slide is here just to illustrate what we all see in our day jobs: we have standard stacks within each silo but jumping from one to the next is sometimes hard. A lot has been done, for example with mapping tools and define.xml for the CRF to SDTM ‘gap’, but the solutions all leave something to be desired.
Slide 14 – Decrease Need for Mapping & Gain Traceability
By using Research Concepts we can start to knock down these silos; we can link the CRF and Tabulation silos in both the forward (process) and reverse (traceability) directions. As seen earlier we can also link to the protocol.
As you will notice there are no links into the analysis world as yet but people are thinking about this. There is some work in CDISC and PhUSE but this does need to be attacked. I believe we will see ‘concepts’ for the analysis work but it is too early to say what form they will take.
Slides 15, 16 & 17 – Pictures
The next three slides I included just to emphasise the use of Research Concepts and the layered approach that can be taken, working down from Operational Objects such as CRFs and Domains that refer to Research Concepts that, in turn, refer to terminology. Note that reference is always down the stack and never up. This is particularly important with, for instance, the SDTM annotation not being set within the Research Concept but being defined at the domain level.
This means a Research Concept can appear in more than one domain (in theory) though this would be unusual. An example might be that CDISC place a certain test in Vital Signs but for a particular study the sponsor wishes to place it into a custom domain.
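A minimal sketch of the layering, assuming every object records only what it refers to one layer down. The object names and the `used_by` function are made up for illustration. Walking the downward references in reverse gives a "what uses what" view without any object ever pointing up the stack.

```python
# Each entry lists only downward references; nothing points up the stack.
REFERS_TO = {
    "VS domain": ["HEIGHT", "DIABP"],  # operational objects -> concepts
    "BP form": ["DIABP"],
    "HEIGHT": ["cm", "in"],            # concepts -> terminology
    "DIABP": ["mmHg"],
}


def used_by(target):
    """Return everything that depends on target, directly or indirectly."""
    direct = [name for name, refs in REFERS_TO.items() if target in refs]
    result = set(direct)
    for name in direct:
        result |= used_by(name)
    return result


print(sorted(used_by("mmHg")))
print(sorted(used_by("cm")))
```

Because the domain holds the reference to the concept, placing the same concept in a second, custom domain would only need another entry at the top layer.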
Slides 18, 19, 20, 21 & 22 – Deployment
Metadata is great but of no use until it is deployed in support of the business, i.e. used within a study. Slide 18 details some of the use that can be made of the Research Concept metadata based on my experience within production environments.
The next three slides give high level algorithms, based on the metadata hierarchy just seen, for the production of a study annotated CRF (the same algorithm can be used for standard library CRFs), define.xml and the metadata that can be used in the production of SDTM.
An important aspect to highlight is slide 21 on Value Level Metadata. Using Research Concepts we do not have to expend effort in building value level metadata. It is all present within the Research Concept; we have the complete picture readily available. Thus the creation of define files becomes a lot easier. It should be this way: Research Concepts give us complete metadata which is assembled into a study, and define is merely a listing of our study metadata.
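To illustrate the point, and not as the algorithm from the slides, the sketch below lists value level rows straight from the concepts used in a study. The dictionary layout, the row format and the position terms are assumptions; it does not produce real define.xml.

```python
# Illustrative study content. The position terms are placeholders, not a
# statement of the published CDISC subset.
STUDY_CONCEPTS = [
    {"testcd": "HEIGHT", "units": ["cm", "in"]},
    {"testcd": "DIABP", "units": ["mmHg"], "positions": ["SITTING", "SUPINE"]},
]


def value_level_metadata(concepts):
    """List one row per (test, variable) pair with its allowed terms."""
    rows = []
    for concept in concepts:
        where = f"VSTESTCD EQ {concept['testcd']}"
        rows.append({"where": where, "variable": "VSORRESU",
                     "allowed": concept["units"]})
        if "positions" in concept:
            rows.append({"where": where, "variable": "VSPOS",
                         "allowed": concept["positions"]})
    return rows


for row in value_level_metadata(STUDY_CONCEPTS):
    print(row["where"], row["variable"], row["allowed"])
```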
Slide 23 – Asthma Therapeutic Area Work
So what are the sources of this wonderful metadata?
We have seen earlier the spreadsheets published by CDISC containing partial VS and ECG (panel on right hand side titled “CT Mapping/Alignment Across Codelists”) metadata. That is one source. I noticed while writing this post that a small amount of cardiovascular metadata has also been published.
The next source is the Therapeutic Area work currently underway. This slide shows an extract from the Asthma Therapeutic Area User Guide (TAUG) and the Forced Vital Capacity (FVC) concept. More on this in a minute.
The final source is a sponsor’s own define.xml file. A lot of the work in determining the relationships we have spoken about has been documented within the numerous define files produced by sponsors. The issue is that the content has not been aligned across the industry as part of the TA work. This is the next big challenge for CDISC.
Slide 24 – Asthma Therapeutic Area Work
Some of the TAUGs, not all, provide very good metadata definitions. If we examine the FVC definition we can extract the metadata and present it as one of the familiar tree diagrams. Note here with this RC that we have two potential result units (“ml”, “L”) with fixed method and location attributes.
On this slide is the first mention of BRIDG. Research Concepts use BRIDG-based templates to ensure consistent construction. We need to have well structured Research Concepts with the same item of information (say the method) placed in the same place within all Research Concepts such that we know where to find it and to aid automation and to do that we need a framework. So we use Research Concept Templates based on BRIDG and then ‘fill’ these templates over and over with different content to build the RCs. So we might have a single VS RC Template to build all of the VS RCs.
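The "fill the template over and over" idea can be sketched as follows. The slot names are simple stand-ins and not BRIDG paths, and `build_concept` is a hypothetical helper. The point is only that the template fixes where each item lives and that a concept may leave slots unused.

```python
# A single hypothetical VS template: the slots every VS concept may use.
VS_TEMPLATE = ("test", "result.value", "result.units",
               "position", "method", "location")


def build_concept(name, template, content):
    """Fill a template with content, rejecting items the template lacks."""
    unknown = set(content) - set(template)
    if unknown:
        raise ValueError(f"{name}: items not in template: {sorted(unknown)}")
    # Keep template order so the same item always sits in the same place.
    items = {slot: content[slot] for slot in template if slot in content}
    return {"name": name, "items": items}


height = build_concept("HEIGHT", VS_TEMPLATE, {
    "result.units": ["cm", "in"],
    "test": ["HEIGHT"],
})
diabp = build_concept("DIABP", VS_TEMPLATE, {
    "test": ["DIABP"],
    "result.units": ["mmHg"],
    "position": [],  # the subset would be attached from the position code list
})

print(list(height["items"]))
print(list(diabp["items"]))
```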
Slides 25 & 26 – Research Concept Creation
This slide is here to show some features of Research Concepts using a screenshot from a prototype tool (it is actually a prototype web-based tool that I will publish shortly) that I am using to investigate Research Concept creation, management and use combined with semantic web technology.
The tool takes a Research Concept Template and allows the user to fill out the template and assign the correct terminology to the various items to create an RC. Here I have built the FVC Research Concept we just saw from the Asthma TAUG and entered the definition.
In the middle at the top is a target window, the Research Concept I am building. To the right is the template I am using to build from. I can drag and drop each piece from the Template into the target Research Concept (not all pieces of a template are needed in every Research Concept; I might not need a Method, for example, in every case). Below is the CDISC terminology to allow the code list items to be dragged and dropped (and thus attached) to the various parts of the Research Concept.
Note that the Research Concept Template is a tree structure while the Research Concept itself is a flat, two-column list; a name-value pair type structure. This is just to show we can look at Research Concepts in various ways while the machine definition behind can maintain the true, more complex structure.
The names in the first of the two columns are based on an alias mechanism to hide the complexity of the underlying BRIDG structure. We want well structured Research Concept Templates but once built from BRIDG we can hide the BRIDG details from users. Only those working with templates need to understand this BRIDG piece.
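A rough sketch of the two views, assuming the concept is held as a nested structure and shown to users as alias/value pairs. The paths and aliases are made up and far simpler than real BRIDG paths; this is not how the prototype tool is implemented.

```python
# Nested (tree) form of a concept; the values shown are illustrative.
TREE = {
    "Test": "FVC",
    "Result": {"Value": "", "Units": "ml, L"},
}

# Hypothetical aliases that hide the underlying path from the user.
ALIASES = {"Result.Value": "Result", "Result.Units": "Units"}


def flatten(tree, prefix=""):
    """Turn the tree into (path, value) pairs, one per leaf."""
    rows = []
    for key, value in tree.items():
        path = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            rows.extend(flatten(value, path))
        else:
            rows.append((path, value))
    return rows


def as_two_columns(tree):
    """The flat view: alias in the first column, value in the second."""
    return [(ALIASES.get(path, path), value) for path, value in flatten(tree)]


for name, value in as_two_columns(TREE):
    print(name, "|", value)
```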
Also I have placed on top of the screen shot the SDTM annotation. The alias names could reflect these but I am cautious as I do not want to see Research Concepts bound to particular domains so as to maintain flexibility. I want SDTM to refer to the Research Concepts not the other way around.
We will want to see Research Concepts expanded from what you see here to include more CDASH-like information such as question text, formats, edit checks and completion guidelines, though I still debate with myself whether some of this information should reside at the Form or Research Concept level, or maybe a combination with some information at the RC level and some at the Form level.
The next slide is a simple visualisation from the tool of the resulting Research Concept. Note that when I took the screen shots I had not attached the Method code.
For the tech nerds this is a D3.js visualisation and I did this to gain familiarity with the graphics library. I feel that visualisation will be very helpful in tools, in particular with impact analysis (what uses what type graphs, especially where the metadata is semantic-web based).
Slide 27 – Research Concept Import
It is pretty obvious that we cannot create every Research Concept that we need by hand; the task is simply too big. One area of work I have recently been looking at is importing MS Excel workbooks that contain Research Concept definitions along with the associated templates and then translating those definitions into a semantic form and a corresponding ODM representation.
The point here is not so much the tool, it is a prototype to look into issues of creating Research Concepts from source materials etc, but to illustrate that Research Concepts can be delivered in many forms. Here it is the Excel source materials translated into ODM and a Semantic Web representation.
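As a very rough sketch of the import idea, the snippet below reads tabular definitions and turns each row into a subject/predicate/object triple. It uses CSV text in place of an Excel workbook, and plain tuples in place of real RDF or ODM. The column names are assumptions, not the layout of the actual workbooks.

```python
import csv
import io

# Stand-in for a worksheet of concept definitions (hypothetical columns).
SHEET = """concept,item,value
HEIGHT,units,cm
HEIGHT,units,in
DIABP,units,mmHg
"""


def rows_to_triples(text):
    """Convert each row into a (subject, predicate, object) tuple."""
    reader = csv.DictReader(io.StringIO(text))
    return [(row["concept"], row["item"], row["value"]) for row in reader]


for triple in rows_to_triples(SHEET):
    print(triple)
```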
Slide 28 – ‘Traditional’ Use
Just because we have Research Concepts does not mean we have to change everything. This is really important to appreciate.
There are ways in which users can limit the impact on current process but gain benefit from the better definitions; Research Concepts are only the same information that we use today but are more complete, are consistent across the industry and remove a lot of the mapping heartache when we have domains based on them. We can extract some or all of the Research Concept definition into ODM, spreadsheet or other form that meets the needs of the current process. The process should hopefully not need changing but will result in better business outputs: CRFs, SDTM, aCRFs and defines that are consistent and of higher quality.
Slide 29 – Summary
And in summary: Research Concepts are good. We want well defined, well structured, complete and consistent definitions with terminology defined. We want to use these RCs to support better ways of running clinical studies but also want to be able to use the same definitions to support the work today.
And Finally …
Hopefully the above has proved useful. I want the post to provide an insight into what Research Concepts are. Research Concepts are simply a binding together of all the definitions we need into convenient packages that can be used as the basis of a better, more automated process, and that can be used within existing processes to bring, in my opinion, significant gains.
If you want further explanation or want to ask questions or simply comment feel free to do so using the comment facility.
Hi Dave,
I really liked your presentation at the European Interchange meeting and, although I’m not yet familiar with the RDF / Research Concept representation, I can see this is very powerful. Can I ask which editor you use? I would think that it is feasible to represent links between CDASH and SDTM, for example, and let the MDR know that those links should be based automatically. I can see a lot of automation can be done with that concept, am I right? Does that include standards governance or does this have to be defined more at the MDR level (trace changes to standards)? Thanks again for this interesting topic.
Melanie
Melanie
Thank you for the comments. Answering your points
1. First thing to note is that we don’t need the semantic web for Research Concepts. The ideas can be implemented in many technologies. The semantic web may be the best way but we can use spreadsheets, Java, even SAS! 🙂 It is the ideas that are important.
2. For semantic work I use the Topbraid tools, see http://www.topquadrant.com/products/
3. We can automate a lot by basing both CDASH and SDTM on the research concepts and sharing definitions across the standards.
4. Standards governance is more of an MDR & process function. One thing I felt might be worth writing about is impact analysis. The MDR can tell us what is impacted when something changes, so if a CDISC term changed the MDR can tell which Research Concepts are impacted and which forms and domains use those Concepts, to allow us to judge the impact of any change.