At last I get the chance to write the first answer to the series of questions that Kerstin and I posed in the first blog post in this series.
The question was “What is the semantic web and linked data?” First of all, Kerstin suggested that I read an article, “An Intro To The Semantic Web: Why You Need To Know About It Sooner Than Later”, and watch “Linked Data (and the Web of Data)”, a 4-minute video from Ireland’s Digital Enterprise Research Institute (DERI).
I did not feel that either gave me the one-sentence answer I was seeking. I wanted something short and sweet. Kerstin and I had a chat via a web conference, she sitting in a café in Göteborg and I in the UK, the power of the internet allowing us to connect and discuss. Around the same time Kerstin posted a link on Twitter (you can follow her via @kerfors) to a presentation by Juan Sequeda on SlideShare. So I looked and read. Fortunately, this particular slide set and its explanation made far more sense to me.
I also did that thing we all do when faced with explaining a concept we do not really understand ourselves: we step back to first principles. We used to reach for the dictionary on the shelf; now we go to the technological wonder that is Wikipedia.
The Semantic Web is a “web of data” that enables machines to understand the semantics, or meaning, of information on the World Wide Web.
After digesting these two additional sources, in particular the presentation by Juan Sequeda, the light bulb in my brain started to glow, a little dimly maybe, but glow nonetheless. My conclusion?
While the web we know today is a series of documents designed for human consumption, the semantic web wants to permit access to all the data across the web and allow machines to process it.
Now Kerstin changed this in the draft of the article to:
While the web we know today is a series of documents designed for human consumption, the semantic web wants to make data ready for automation, where machines can directly process it and hence give humans new insights.
But there may be something missing. And it is that word semantic in “Semantic Web”. We also want to understand the data, get at the meaning of the data. So my final version of a description for the semantic web:
While the web we know today is a series of documents designed for human consumption, the semantic web wants to expose the data, allow the meaning of the data to be understood and permit the data to be consumed by machines so as to give humans new insights about the data.
To allow machines access, we need a mechanism by which data sources can describe and expose their data. This is not something I want to discuss in this post, but we will certainly return to the topic in subsequent ones. Now, if two sites publish their data and those data share a common component, then the two sets of data can be linked. And thus we have linked data.
The semantic web is a way of publishing data; linked data is the process of linking it together.
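To make the linking idea a little more concrete, here is a minimal sketch in Python. It is only an illustration under my own assumptions: the two "sites", their identifiers and their values are all made up, and plain Python tuples stand in for whatever format real sites would use to publish their data. Each fact is written as a (subject, predicate, object) statement, and the two sets of data can be linked because one identifier appears in both.

```python
# Two hypothetical sites each publish facts as (subject, predicate, object)
# statements. Every identifier and value below is invented for illustration.
HEADACHE = "http://example.org/condition/headache"
ASPIRIN = "http://example.org/drug/aspirin"

SITE_A = {
    (ASPIRIN, "name", "Aspirin"),
    (ASPIRIN, "treats", HEADACHE),
}
SITE_B = {
    (HEADACHE, "label", "Headache"),
    (HEADACHE, "bodySite", "Head"),
}


def identifiers(statements):
    """Return everything that appears as a subject or an object."""
    return {s for s, _, _ in statements} | {o for _, _, o in statements}


def shared_identifiers(a, b):
    """Return the components that both sets of data have in common."""
    return identifiers(a) & identifiers(b)


def describe(identifier, statements):
    """Collect every (predicate, object) pair known about one identifier."""
    return {(p, o) for s, p, o in statements if s == identifier}


# The common component is what lets the two data sets be linked.
common = shared_identifiers(SITE_A, SITE_B)

# Once merged, a machine can follow the link from one site's data to the other's.
linked = SITE_A | SITE_B
for statement in sorted(linked):
    print(statement)
```

Starting from the made-up aspirin entry on the first site, a program can follow the shared headache identifier and pick up what the second site says about it, which neither site could offer on its own.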
One final thought that crossed my mind as I was reading through the materials referenced above is about the definition of the data. The semantic web seems, and I stress this is an initial impression bred from ignorance, to be built on less than precise definitions. This is another question we will look into in subsequent posts.
So our set of questions can now be extended to:
- What is the semantic web and linked data? See this blog post
- What is an ontology?
- Where do RDF and OWL fit?
- How do I link my data?
- How precise are the definitions?
- What could these things, if they are relevant, bring to a metadata repository?
- What do they mean to clinical trial data, what is the benefit, what is in it for me?
- What can other industries teach us?
Hopefully we will reach the end of the questions rather than just generate new ones!