A quick update on my progress with LinguMatic.
To recap, the LinguMatic project is my effort to create an “open-source” Business Semantics Thesaurus (BST) to address the interoperability issues between eBiz interchange standards (e.g. UN/EDIFACT, UNTDED/ISO7372 and the Core Component Library (CCL)) and ERP backends (in-house databases).
At the start of this year I began researching which framework would be best for such a semantic dictionary/thesaurus. Under consideration were:
- XML Dictionary eXchange Format (XDXF), a project to unite all existing open dictionaries and provide both users and developers with a universal XML-based format; and
- Simple Knowledge Organization System (SKOS), a family of formal languages designed for representation of thesauri, classification schemes, taxonomies, subject-heading systems, or any other type of structured controlled vocabulary. SKOS is currently developed within the W3C framework.
After reviewing both, I concluded that SKOS, being part of the Semantic Web, is the most promising way to achieve my goal. For more information on SKOS click HERE.
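To give a feel for what a thesaurus entry looks like in SKOS, here is a minimal sketch that renders one concept as Turtle text using only the Python standard library. The `example.org` URI, the labels and the definition are made up for illustration and are not taken from the actual LinguMatic thesaurus; the helper names are hypothetical too. The SKOS namespace and the `skos:prefLabel`, `skos:broader` and `skos:definition` properties are the standard W3C ones.

```python
from typing import Dict, Optional

# Standard SKOS namespace declaration for the Turtle output.
SKOS_PREFIX = "@prefix skos: <http://www.w3.org/2004/02/skos/core#> ."


def escape(text: str) -> str:
    """Escape backslashes and double quotes for a Turtle string literal."""
    return text.replace("\\", "\\\\").replace('"', '\\"')


def concept_to_turtle(uri: str, pref_labels: Dict[str, str], definition: str,
                      broader: Optional[str] = None) -> str:
    """Return a Turtle description of a single skos:Concept.

    pref_labels maps a language tag to the preferred label in that language.
    The definition is tagged as English, which is an assumption of this sketch.
    """
    lines = [f"<{uri}> a skos:Concept ;"]
    # One preferred label per language tag.
    for lang, label in sorted(pref_labels.items()):
        lines.append(f'    skos:prefLabel "{escape(label)}"@{lang} ;')
    if broader is not None:
        lines.append(f"    skos:broader <{broader}> ;")
    lines.append(f'    skos:definition "{escape(definition)}"@en .')
    return SKOS_PREFIX + "\n\n" + "\n".join(lines) + "\n"


# Hypothetical example concept; URI and texts are placeholders.
example = concept_to_turtle(
    "http://example.org/bst/postcode",
    {"en": "postcode", "de": "Postleitzahl"},
    "Placeholder definition of a postcode.",
    broader="http://example.org/bst/postal-address",
)
print(example)
```

Keeping one preferred label per language tag is what makes a multilingual thesaurus possible in SKOS: the concept stays the same and only the labels differ.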
The next step was to find some free (open-source) software for managing such a thesaurus.
I found three promising tools:
- TemaTres, an open-source vocabulary server for managing controlled vocabularies, taxonomies and thesauri;
- SKOSEd, a plugin for Protege that allows the creation and editing of thesauri (or similar artefacts) represented in SKOS; and
- iQvoc, a vocabulary management tool that combines easy-to-use human interfaces with Semantic Web interoperability.
After spending some time evaluating all three, I ruled out SKOSEd because it is only an editor, and I was looking for an online tool and server. There were other tools as well, but some were commercial products and others were outdated or had not been developed for some time.
That left me with TemaTres and iQvoc. iQvoc is based on the open-source Ruby on Rails web framework, which allows increased productivity combined with unlimited interoperability of the product, while TemaTres is coded in PHP and requires an HTTP web server and a database server (such as MySQL).
After playing with both, I decided on iQvoc because it supports multiple languages not only within the thesaurus but also in the website user interface. Since the goal is to make this a global tool, multilingual support is an absolute must.
iQvoc is actively being developed by innoQ Deutschland GmbH and is used in a variety of projects; for example, the German Federal Environment Agency (Umweltbundesamt) employs iQvoc for its public thesaurus UMTHES. Since it is an open-source project, the code is available on GitHub; the current version is 3.5.6. I had little trouble deploying it on my server since it is based on Ruby on Rails.
Now that I had the system up and running, I needed to pick a semantic subject area that would not be too large, but large enough to demonstrate the issues the project is meant to address. I chose the postal address, since I had done some earlier work comparing the CCL address entries to the Universal Postal Union (UPU) data model.
As I started to enter some concepts I realized that I needed a copy of the UPU S42 standard to enter the definitions for all the concepts. Sadly, it is not publicly available, and it can only be bought as part of the complete set of UPU standards, for about €100. After some searching I finally found some related work based on S42 called AddressVocab, a vocabulary for adding postal addresses to FOAF profiles. The Friend of a Friend (FOAF) project is creating a Web of machine-readable pages describing people, the links between them and the things they create and do; it is a contribution to the linked information system known as the Web. FOAF is a small but shapely piece of the wider Semantic Web project, a term coined by Tim Berners-Lee.
Thanks to the FOAF work I was able to enter the concepts and collections making up the postal address. However (there is always a however), I quickly realized that the pointers/bridges to the “interchange standards” needed to be real URLs that point to, and open, a web page providing the details of the linked data. Unfortunately, the CCL is only available as an Excel spreadsheet, and the UN/EDIFACT online directory is not usable as a link target.
Therefore I decided to deploy a second instance of the iQvoc server to contain the semantic information required for the linkage. My plan is to use the second server for the relevant UN/EDIFACT, CCL and TDED semantics. Instead of transferring all the data from those three directories up front, I will add entries only as they are needed, that is, when they relate to concepts in the Business Semantics Thesaurus.
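The requirement that a bridge must be a real web URL can be illustrated with a small sketch. It builds a SKOS mapping statement between a thesaurus concept and a concept on the second server, and rejects targets that are not absolute `http`/`https` URLs (such as a reference into a spreadsheet). The mapping property names are the standard SKOS ones; the function names and all `example.org` URIs are hypothetical, and the check is purely syntactic, so it does not verify that the page actually exists.

```python
from urllib.parse import urlparse

# The five mapping properties defined by SKOS.
MAPPING_PROPERTIES = {
    "exactMatch", "closeMatch", "broadMatch", "narrowMatch", "relatedMatch",
}


def looks_like_web_url(uri: str) -> bool:
    """Syntactic check only: absolute http(s) URL with a host name."""
    parts = urlparse(uri)
    return parts.scheme in ("http", "https") and bool(parts.netloc)


def match_link(source: str, prop: str, target: str) -> str:
    """Return one Turtle triple linking source to target via a SKOS mapping.

    The caller is expected to declare the skos: prefix in the Turtle document.
    """
    if prop not in MAPPING_PROPERTIES:
        raise ValueError(f"not a SKOS mapping property: {prop}")
    for uri in (source, target):
        if not looks_like_web_url(uri):
            raise ValueError(f"not an absolute web URL: {uri}")
    return f"<{source}> skos:{prop} <{target}> ."


# Hypothetical URIs for the two iQvoc instances.
print(match_link(
    "http://example.org/bst/postcode",
    "closeMatch",
    "http://example.org/directories/ccl/postcode",
))
```

Which mapping property fits a given pair (exact, close, broader and so on) is an editorial decision for each concept; the sketch only shows the shape of the link.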
So what is the current status of those two systems?
On the LinguMatic – Business Semantics Thesaurus System, I have finished entering all the concepts and collections of the FOAF postal address model. Since the FOAF model uses its own Agent model for the Addressee and Mailee concept collections, I have not created those collections nor entered the underlying concepts. However, last night I found an ISO/IEC JTC1 SC32 WG2 document that contains the complete UPU data model with all the needed definitions. Therefore, I will update the LinguMatic side by adding the UPU definitions to the FOAF definitions for those concepts shared by both, and add the missing UPU concepts and collections. For now I have not included the match links to the EDIFACT, CCL and/or TDED concepts. This will be done when all the address concepts have been entered on that side.
On the eBiz – Semantic Directories (CCL/TDED/EDIFACT) system, I have added all CCL concepts with links to the LinguMatic side, and started on some UN/EDIFACT element concepts. However, now that I have the UPU address information, I will have to update those entries with any new relevant links once the UPU concepts have been entered.
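The planned update (keep the FOAF definition, add the UPU definition where a concept is shared, and add concepts that exist only in the UPU model) amounts to a simple merge. Below is a rough sketch, assuming each source is held as a mapping from concept name to definition text. The concept names and definitions are placeholders, not real entries from either model, and `merge_definitions` is a hypothetical helper rather than anything iQvoc provides.

```python
from typing import Dict


def merge_definitions(foaf: Dict[str, str],
                      upu: Dict[str, str]) -> Dict[str, Dict[str, str]]:
    """Combine two definition sets, keeping each definition under its source.

    Shared concepts end up with both a FOAF and a UPU definition;
    concepts found in only one source keep just that one.
    """
    merged: Dict[str, Dict[str, str]] = {}
    for concept in sorted(set(foaf) | set(upu)):
        entry: Dict[str, str] = {}
        if concept in foaf:
            entry["FOAF"] = foaf[concept]
        if concept in upu:
            entry["UPU"] = upu[concept]
        merged[concept] = entry
    return merged


# Placeholder data for illustration only.
foaf_defs = {"postcode": "FOAF-side definition", "locality": "FOAF-side definition"}
upu_defs = {"postcode": "UPU-side definition", "addressee": "UPU-side definition"}

for concept, sources in merge_definitions(foaf_defs, upu_defs).items():
    print(concept, "->", ", ".join(sources))
```

Keeping the source name next to each definition means a reader of the thesaurus can still see where a definition came from after the merge.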
The next steps:
- Update the LinguMatic side with the missing UPU concepts and collections, including any updates to the definitions shared by both the UPU and FOAF concepts and collections.
- Complete the eBiz – Semantic Directories side by adding all UN/EDIFACT and TDED concepts and collections.
- Update the LinguMatic side with the “match” links back to the EDIFACT side.
- Send invitations to those who have indicated that they want to be part of the open-source project, to get the project underway now that there is enough to build a full “proof-of-concept” system.
In closing, the preliminary work needed to get the LinguMatic project off the ground is almost complete. If you would like to participate and receive an invitation, drop me a line using the Contact Me widget.