My desire for this post was to jump in and start writing about Research Concepts and describe some of the work that I have undertaken to date. Then I stopped myself and thought that it might be wise to set the scene as to why I am bothering to look into this idea of metadata. So I’m taking a step back and writing a scene setter, in preparation for subsequent posts.
When I came into the pharma industry, around 15 years ago, I came from a background where standards were the norm. I had worked in a variety of industries within which nothing worked unless you used standards. Coming into pharma, it was quite a shock to see so few standards, and that lack quickly led me to discover ODM and to get involved in CDISC. At that time, back in 2002, I needed something to transport study definitions from system to system to allow for quicker study set up.
So the first question is: why use standards? I will ignore the obvious answer I can now throw at you, ‘cos the FDA says so’. To me, using standards is blindingly obvious, a no-brainer. It will save you time if you can re-use content and use standards to build a repeatable process. As you repeat that process, you improve, saving more time. Note, however, that it will take some work to get to that happy state. It takes effort; it takes perseverance. If the same process is understood by a pool of people, you gain operational flexibility by being able to move those people around as demand dictates.
A very simple example. This week I was reviewing some SDTM annotations applied to a questionnaire. On the CRF, the category had been set to a value from the CDISC terminology. All good there. However, the test code chosen by the person setting up the form was not the correct CDISC code. A quick change to use the CDISC code results in better content that is now stable and good for some time to come. Had the invalid code progressed through the process, the danger is that we end up with one study using one test code, a second study repeating the process but using a second, different test code, and so on. If the test codes differ, I cannot align the data from these studies easily. For these questionnaires, we don’t have result codes defined (where the results are code lists), so, again, we are in danger of studies inventing their own and alignment of the data becoming problematic. Set them up once, reuse them, and later parts of the process become easier. Setting up future studies that use this questionnaire becomes easy. If the form is readily available to studies and easily accessible, the study teams know it is there. It can then be used by tools to build it into a study, and that information can be used by subsequent parts of the process as needed (e.g. SDTM creation).
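To make the alignment problem concrete, here is a minimal sketch in Python. The test codes (`ITM01`, `ITEM1`) and result values are entirely hypothetical, not real CDISC terminology; only the SDTM variable names (`STUDYID`, `QSTESTCD`, `QSORRES`) follow the standard. It shows how one study using a non-standard test code silently drops out when you try to pool data:

```python
# Hypothetical data: two studies collect the same questionnaire item,
# but Study B was set up with a non-standard test code.
study_a = [
    {"STUDYID": "A", "QSTESTCD": "ITM01", "QSORRES": "3"},
]
study_b = [
    {"STUDYID": "B", "QSTESTCD": "ITEM1", "QSORRES": "2"},  # non-standard code
]

def pool(records, testcd):
    """Collect results across studies for one standard test code."""
    return [r for r in records if r["QSTESTCD"] == testcd]

combined = study_a + study_b

# Pooling on the standard code silently misses Study B's data:
print(len(pool(combined, "ITM01")))  # 1 record; Study B is absent

# Only after remapping Study B to the standard code does alignment work:
for r in study_b:
    if r["QSTESTCD"] == "ITEM1":
        r["QSTESTCD"] = "ITM01"
print(len(pool(combined, "ITM01")))  # 2 records
```

Set the code up once, in a standard, and the remapping step disappears entirely.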
Consider the way you are reading this article. A range of standards have been used to transport and structure the content so that the browser can render the post in the way intended (you might disagree with the words, but it looks OK). If one part doesn’t obey the relevant standard (the HTML formatting, the HTTP protocol, etc.) then it won’t work. Obey the standard and the browser springs into life and there is the page. The content works across hardware and software platforms built by a whole range of independent teams using standard specifications. Without those standards and their correct implementation, there is no hope of the page being delivered correctly across the hardware and software working together in what we know today as the internet and the World Wide Web. As an aside, I suspect we are moving towards such a world with submissions at the FDA: don’t get the content and structure right and the submission won’t file.
OK, that was a simple justification, but standards are good. If you don’t agree, I would stop reading now. So why this talk of metadata?
I can give you a 400-page PDF once every six months and let you get on with it. You’ll have just got familiar with the last version when the new one drops onto your desk. The amount of content is growing and the pace of change is increasing; the 400 pages every six months could well turn into 600 every quarter. Tucked in the middle is some really important small change to some piece of content. You got tired and skipped over that page. We need automation to help us see all the details. If we stick with paper we won’t get to where we need to be; the content is already too big.
I want machine automation, where this mass of information (the 400, 600 or whatever the page count might be) is reduced to a download (or a set of downloads) that the machine understands, resulting in every piece of information being filed into the right place. Even if the content is only a human-readable rule, I want it placed in the context of the item it applies to, so that I can see there is a rule when I look at the item in question. I want to see the changes quickly, be able to assess them, and see if they impact me and my content. My systems should be able to respond to the new definition if something has been updated. If it’s new, I want to be able to use it to replace my existing internal definition with ease, assuming of course they are the same thing, which is another large issue in its own right. All of these definitions are metadata. I want metadata to make my life easy.
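The idea above can be sketched in a few lines of Python. Everything here is hypothetical: the item keys, labels and rules are made up, and no real standards body publishes metadata in this exact shape. The point is only that once definitions are machine-readable, with rules attached to the items they govern, comparing two releases becomes a trivial diff rather than a 600-page reading exercise:

```python
# Hypothetical machine-readable standard: items keyed by identifier,
# each carrying its label and any rules that apply to it.
old_release = {
    "QSTESTCD.ITM01": {"label": "Item 1 Score", "rules": ["Result must be 0-4"]},
    "QSTESTCD.ITM02": {"label": "Item 2 Score", "rules": []},
}
new_release = {
    "QSTESTCD.ITM01": {"label": "Item 1 Score", "rules": ["Result must be 0-5"]},
    "QSTESTCD.ITM02": {"label": "Item 2 Score", "rules": []},
    "QSTESTCD.ITM03": {"label": "Item 3 Score", "rules": []},
}

def diff(old, new):
    """Report items added and items whose definition changed."""
    added = sorted(set(new) - set(old))
    changed = sorted(k for k in old if k in new and old[k] != new[k])
    return added, changed

added, changed = diff(old_release, new_release)
print(added)    # ['QSTESTCD.ITM03']
print(changed)  # ['QSTESTCD.ITM01']
```

That changed rule on ITM01 is exactly the small but important change tucked into the middle of the PDF; here the machine surfaces it instantly, in the context of the item it applies to.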
I want standards, I want them electronically as metadata.
So that is my motivation: electronic standards, to make managing content easy and to allow that content to be used efficiently within the business.
That is the background. Now we can start talking about the business need and some technical details in the next few posts.