Contributed by: Dirk Saturday, December 11 2010 @ 08:00 am EST
IKS (Interactive Knowledge Stack) is an initiative[*1] to help Content Management Systems enrich their content with semantic information (think Semantic Web[*2]). The initiative, which is funded in part by the European Union, has now reached the point where a first usable version is available. The IKS Project is running an early adopters program and a series of workshops to help CMS vendors become familiar with the system and to collect their early feedback.
One such workshop was held on December 9 and 10 in Amsterdam[*3], Netherlands. I (Dirk) attended it on behalf of Geeklog to get a better idea of what this project is all about.
What the IKS Project has produced so far is a webservice with a REST[*4] API to which a CMS can send its content, e.g. an article. The webservice then sends back some RDFa[*5] that contains the semantic information for the article. So if the article contains the word "Paris", you get back information stating that Paris is a place, along with a list of possible places called Paris that the article may be referring to. Since there are several places[*6] in the world called Paris, the RDFa also includes a confidence level for each of the options. The quality of the results depends on the semantic engines behind the webservice.
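To make that round trip a little more concrete, here is a minimal sketch in Python (chosen for brevity; a Geeklog plugin would do the same in PHP). The endpoint URL, the `Accept` value, the dictionary layout of the candidates and the confidence numbers are all assumptions made for illustration, not something taken from the Stanbol documentation. The request is only prepared, not sent, and the RDFa parsing step is skipped entirely.

```python
from urllib.request import Request

# Hypothetical endpoint: adjust to wherever your own instance listens.
STANBOL_URL = "http://localhost:8080/engines"


def build_enhance_request(text, url=STANBOL_URL):
    """Prepare (but do not send) a POST carrying the article as plain text."""
    return Request(
        url,
        data=text.encode("utf-8"),
        headers={
            "Content-Type": "text/plain; charset=utf-8",
            # Assumed response format; check what your instance actually offers.
            "Accept": "application/rdf+xml",
        },
        method="POST",
    )


def best_candidate(candidates):
    """Return the candidate with the highest confidence, or None if empty."""
    return max(candidates, key=lambda c: c["confidence"], default=None)


# Made-up candidates for the mention "Paris"; the numbers are illustrative only.
paris = [
    {"label": "Paris, France", "type": "Place", "confidence": 0.82},
    {"label": "Paris, Texas", "type": "Place", "confidence": 0.11},
]

request = build_enhance_request("I spent a week in Paris.")
choice = best_candidate(paris)
```

Passing `request` to `urllib.request.urlopen()` would actually send it. Picking the single most confident candidate is the simplest possible policy; a real site might prefer to show all candidates above some threshold and let the author decide.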
The webservice used to be called FISE but has since been accepted by the Apache Software Foundation as an incubating project and will be known as Apache Stanbol[*7] from now on.
Stanbol can be thought of as middleware with the REST API on one end and a collection of semantic engines on the other. As mentioned above, the quality of the results depends on these semantic engines. The idea is that you can select which engines to use for your CMS or website. This allows for specialized engines that can identify, say, medical terms. Or you may drop the engine that identifies places and use one that knows about Greek sagas, where "Paris" would most likely refer to a person[*8].
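The pluggable-engine idea can be illustrated with a toy model. This is not how Stanbol is implemented or configured; the engine names, the result format and the confidence values below are hypothetical and only show how swapping engines changes what "Paris" means.

```python
def place_engine(text):
    """Toy engine: treats any mention of 'Paris' as a place."""
    if "Paris" in text:
        return [{"mention": "Paris", "type": "Place", "confidence": 0.8}]
    return []


def greek_saga_engine(text):
    """Toy engine: treats any mention of 'Paris' as the Trojan prince."""
    if "Paris" in text:
        return [{"mention": "Paris", "type": "Person", "confidence": 0.9}]
    return []


# Hypothetical registry of available engines, keyed by name.
ENGINES = {
    "places": place_engine,
    "greek_sagas": greek_saga_engine,
}


def enhance(text, enabled):
    """Run only the enabled engines and collect everything they report."""
    results = []
    for name in enabled:
        results.extend(ENGINES[name](text))
    return results


story = "Paris abducted Helen."
as_travel_site = enhance(story, ["places"])
as_mythology_site = enhance(story, ["greek_sagas"])
```

With only `places` enabled the story is about a city; with only `greek_sagas` enabled the same text yields a person, which is the whole point of choosing engines per site.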
From a technical point of view, you would typically run your own instance of Stanbol (although it can also be shared between sites). Being written in Java, Stanbol is a bit of a heavyweight but simple enough to install. In preparation for the workshop, I hacked together a very primitive Geeklog plugin that would simply send every story to Stanbol when it is saved. The REST API makes this really easy. The trickier part is parsing and interpreting the RDFa and using it for something interesting. For my prototype I settled on making it highlight the places and persons it had identified. There will probably be PHP libraries developed as part of Stanbol, which would also offer a way for Geeklog to contribute to the project.
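The highlighting step could look roughly like the following sketch. It is written in Python for illustration and is not the prototype's actual code. It assumes the RDFa has already been parsed into a simple mapping from mention to entity type; that mapping, the function name `highlight` and the `entity-…` CSS class names are all hypothetical.

```python
import html
import re


def highlight(text, entities):
    """Wrap each known mention in a <span> whose class names its entity type.

    `entities` maps a mention (e.g. "Paris") to a type (e.g. "Place").
    Everything else in the text is HTML-escaped.
    """
    if not entities:
        return html.escape(text)
    # Try longer mentions first so "New York City" wins over "New York".
    mentions = sorted(entities, key=len, reverse=True)
    pattern = re.compile(
        r"\b(?:" + "|".join(re.escape(m) for m in mentions) + r")\b"
    )
    pieces = []
    pos = 0
    for match in pattern.finditer(text):
        # Escape the plain text between the previous match and this one.
        pieces.append(html.escape(text[pos:match.start()]))
        mention = match.group(0)
        css_class = "entity-" + entities[mention].lower()
        pieces.append(
            '<span class="{}">{}</span>'.format(css_class, html.escape(mention))
        )
        pos = match.end()
    pieces.append(html.escape(text[pos:]))
    return "".join(pieces)


marked_up = highlight(
    "Dirk flew to Paris & back.",
    {"Paris": "Place", "Dirk": "Person"},
)
```

The word-boundary match keeps "Parisian" from being highlighted as "Paris". Matching on the mention text alone is a simplification: it highlights every occurrence of a mention, whether or not the engine identified that particular one.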
Speaking of Geeklog: What does all this mean for Geeklog now? That's up to us, i.e. the Geeklog community, really. Is there enough interest in getting involved with what looks like a promising project to finally make the Semantic Web (proposed in 1999, after all) a reality? Can you think of use cases for adding semantic information to your Geeklog site? Please leave comments below.
To throw out just one idea: Google is already interpreting some semantic information embedded in pages (see Google Rich Snippets[*9]), so this may help with SEO, at least for some sites.