In the previous post, I discussed the changes in the latest release of the CDISC terminology, looking at the differences between two releases and using those results for a simple impact analysis. That post and a couple of earlier ones discussed the various drivers for change and the amount of released content that sponsors need to handle and version manage.
The posts are not intended to be a criticism. It is very easy to criticise, much harder to do. I appreciate that all of the restructuring, new content, and updated development processes are required if we are ever to get the well-structured content, with the required coverage, that we all desire and need. We need to be realistic about the challenges. My concerns are about getting the right structures for all of this content and having the right tools to manage it, while at the same time meeting the challenges of today’s studies.
I have always believed that, if you raise issues, you should also be putting forward ideas, solutions and/or suggestions. I have been using MS Excel based tools for a long time, tweaking them as needed, but I know full well that there is a better way: web-based tools and services. So my first step was to build the difference tool for the terminology. The issue with the terminology is its volume, and being able to focus in on the item that you are interested in at that time. Are you searching to see if the CDISC terminology contains the term you want, or looking at the impact of a code list item change? These are two different needs. I had a spreadsheet format that worked, but it was a bit clumsy and you could always inadvertently modify something.
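To make those two needs concrete, here is a minimal Python sketch. It is not the code behind the tool. It assumes a deliberately simplified model in which a release is a dictionary of code lists, each holding item codes and their submission values, and every identifier and value in it (`CL_UNIT`, `X1` and so on) is invented for illustration.

```python
# Hypothetical, simplified model of a terminology release:
# code list identifier -> {item code -> submission value}.
Release = dict[str, dict[str, str]]


def diff_releases(old: Release, new: Release) -> dict[str, list[tuple]]:
    """Report items added, removed or changed between two releases."""
    changes: dict[str, list[tuple]] = {"added": [], "removed": [], "changed": []}
    for code_list in sorted(set(old) | set(new)):
        old_items = old.get(code_list, {})
        new_items = new.get(code_list, {})
        for code in sorted(set(old_items) | set(new_items)):
            if code not in old_items:
                changes["added"].append((code_list, code, new_items[code]))
            elif code not in new_items:
                changes["removed"].append((code_list, code, old_items[code]))
            elif old_items[code] != new_items[code]:
                changes["changed"].append(
                    (code_list, code, old_items[code], new_items[code])
                )
    return changes


def find_term(release: Release, text: str) -> list[tuple[str, str]]:
    """Case-insensitive search for a term across every code list."""
    needle = text.lower()
    return sorted(
        (code_list, code)
        for code_list, items in release.items()
        for code, value in items.items()
        if needle in value.lower()
    )


# Invented example data, not real CDISC content.
earlier = {"CL_UNIT": {"X1": "mg", "X2": "kg"}, "CL_SEX": {"S1": "F", "S2": "M"}}
later = {"CL_UNIT": {"X1": "mg", "X2": "KG", "X3": "g"}, "CL_SEX": {"S1": "F"}}

impact = diff_releases(earlier, later)
hits = find_term(later, "kg")
```

The search answers "is the term there?", while the difference answers "what moved under my feet?". Keeping them as separate functions mirrors the point that they are separate needs.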
Against this background, while I undertook my annual pilgrimage to the Tour de France, I started looking seriously at the semantic world and what it could bring to the table, as well as the work done by others at the time. After going round in circles for a while and trying a few different approaches, I concluded that the semantic way combined with Research Concepts is the right approach. Mind you, I am always happy to learn, and if there is something else out there which improves the solution and the chance of success then I am not against incorporating it. That said, keep changing direction and you will never produce anything.
I have imported the terminology files, along with other content, into a semantic database (a triple store). This store has then been used to prototype other functionality such as Research Concept construction and the building of operational items such as forms and domains based on those Research Concepts. Another prototype is a simple tool for the auto generation of an annotated CRF based on a form specification.
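For readers new to triple stores, the idea can be sketched in a few lines of plain Python. This is a toy, in-memory illustration only, not the database used for the prototypes: a real store would hold RDF and be queried with SPARQL. The class, the predicate names (`memberOf`, `submissionValue`, `inRelease`) and the sample data are all assumptions made up for the example.

```python
from typing import Optional

Triple = tuple[str, str, str]  # (subject, predicate, object)


class TripleStore:
    """Toy in-memory store: a set of triples plus wildcard matching."""

    def __init__(self) -> None:
        self._triples: set[Triple] = set()

    def add(self, s: str, p: str, o: str) -> None:
        self._triples.add((s, p, o))

    def match(
        self,
        s: Optional[str] = None,
        p: Optional[str] = None,
        o: Optional[str] = None,
    ) -> list[Triple]:
        """Return triples matching the pattern; None acts as a wildcard."""
        return sorted(
            t
            for t in self._triples
            if (s is None or t[0] == s)
            and (p is None or t[1] == p)
            and (o is None or t[2] == o)
        )


def load_release(store: TripleStore, release_id: str, code_lists: dict) -> None:
    """Turn a simplified release (hypothetical shape) into triples."""
    for code_list, items in code_lists.items():
        store.add(code_list, "inRelease", release_id)
        for code, value in items.items():
            store.add(code, "memberOf", code_list)
            store.add(code, "submissionValue", value)


def code_lists_containing(store: TripleStore, value: str) -> list[str]:
    """Which code lists hold an item with this submission value?"""
    codes = [s for s, _, _ in store.match(p="submissionValue", o=value)]
    return sorted(
        {o for code in codes for _, _, o in store.match(s=code, p="memberOf")}
    )


# Invented example data, not real CDISC content.
store = TripleStore()
load_release(
    store,
    "R1",
    {"CL_UNIT": {"X1": "mg", "X2": "kg"}, "CL_DOSE": {"X2": "kg"}},
)
found = code_lists_containing(store, "kg")
```

The appeal of the structure is that new kinds of content, such as Research Concepts or form definitions, become more triples in the same store rather than new spreadsheet layouts.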
The terminology tool was one of the first steps. I have made it accessible via the web so that I could use it, but also so that others could as well. If others find it useful or suggest ways it can be improved, and that combined wisdom can be incorporated into tools such as CDISC SHARE and/or the NCI Thesaurus, then that would be great. I put the tool out there last week without advertising the link much, but you can now find it here. I promise not to move it!
The information presented goes back to the last release of 2013 (the Q4 release) and is based on the OWL files released by NCIt. They started releasing the OWL files back in Q2 2013, but I thought going back to the end of 2013 was far enough. It would not take long to do the others if people want them.
Once I get a little more organised and a little further forward, I will publish the materials under an appropriate open-source licence so that everyone can benefit from anything useful that emerges from the work. As I have probably repeated several times already in the post, we need better ways of creating, organising and managing content within sponsors if we are to succeed. My aim over the coming weeks and months is to prototype ideas and take them forward into tools and content that meet the business need, with solid foundations in the semantic web, available to whoever wants to use them. I will also be looking to share any lessons learnt with the community, as part of any appropriate CDISC or PhUSE team. If others wish to help (and one or two are already doing so) then so much the better.