It’s a good question.
In fact, it’s a really great question, as searching for similar advice online results in very few opinions on the subject.
But which ones are the best for novices? Which have the best learning curves? Which has the easiest install, or the shortest time between starting out and being able to query things?
I’ll try to pose as a newcomer as much as I can, which won’t be too hard 🙂 Some of the comments will be my own, and some will be comments from others, but I’ll try to be as honest as I can to reflect new-user expectations and experience and, most importantly, developer attention span. (See the end for some of my reasons for this approach.)
(Puts on newbie hat and enables PEBKAC mode.)
Installable (local) triplestores
Sesame – http://www.openrdf.org/
Simple menu on the left of the website, one called downloads. Great, I’ll give that a whirl. “Download the latest Sesame 2.x release” looks good to me. Hmm 5 differently named files… I’ll grab the ‘onejar’ file and try to run it. “Failed to load Main-Class manifest attribute from openrdf-sesame-2.2.1-onejar.jar”, okay… so back to the site to find out how to install this thing.
No links for installation guide… on the Documentation page, no link for installation instructions for the sesame 2.2.1 I downloaded, but there is Sesame 2 user documentation and Sesame 2 system documentation. Phew, after guessing that the user documentation might have the guide, I finally found the installation guide (system documentation was about the architecture, not how to administer the system as you might expect.)
(Developer losing interest…)
Ah, I see, I need the SDK. I wonder what that ‘onejar’ was then… “The deployment process is container-specific, please consult the
(Only Java-friendly developers continue on)
Right, got Tomcat, and put in the war file… right so, now I need to work out how to use a commandline console tool to set up a ‘repository’… does this use SVN or CVS then? Oh, it doesn’t do anything unless I end the line with a period. I thought it had hung trying to connect! “Triple indexes [spoc,posc]” Wha? Well, whatever that was, the test repository is created. Let’s see what’s at http://localhost:8080/openrdf-sesame then.
“You are currently accessing an OpenRDF Sesame server. This server is
intended to be accessed by dedicated clients, using a specialized
protocol. To access the information on this server through a browser,
we recommend using the OpenRDF Workbench software.”
Bugger. Google for “sesame clients” then.
- There is a Java client it seems, but it seems to need a lot to get going. Oh, and it’s only useful if my application is in Java or on a JVM (JRuby, Jython)
- http://jeenbroekstra.blogspot.com/2008/09/sesame-2-desktop-client.html .Net GUI… not so useful for programmatic stuff
I’ve pretty much given up at this point. If I knew I needed to use a triplestore then I might have persisted, but if I was just investigating it? I would’ve probably given up earlier.
Mulgara – http://www.mulgara.org/
Nice, they’ve given the frontpage some style, not too keen on orange, but the effort makes it look professional. “Mulgara is a scalable RDF database written entirely in Java.” -> Great, I found what I am looking for, and it warns me it needs Java. “DOWNLOAD NOW” – that’s pretty clear. *click*
Hmm, where’s the style gone? Lots of download options, but thankfully one is marked by “These released binaries are all that are required for most applications.” so I’ll grab those. 25MB? Wow…
Okay, it’s downloaded and unpacked now. Let’s see what we’ve got – a ‘dist/’ directory and two jars. Well, I guess I should try to run one (wonder what the licence is, where’s the README?)
Mulgara Semantic Store Version 2.0.6 (Build 2.0.6.local)
INFO [main] (EmbeddedMulgaraServer.java:715) – RMI Registry started automatically on port 1099
0 [main] INFO org.mulgara.server.EmbeddedMulgaraServer – RMI Registry started automatically on port 1099
INFO [main] (EmbeddedMulgaraServer.java:738) – java.security.policy set to jar:file:/home/ben/Desktop/apache-tomcat-6.0.18/mulgara-2.0.6/dist/mulgara-2.0.6.jar!/conf/mulgara-rmi.policy
3 [main] INFO org.mulgara.server.EmbeddedMulgaraServer – java.security.policy set to jar:file:/home/ben/Desktop/apache-tomcat-6.0.18/mulgara-2.0.6/dist/mulgara-2.0.6.jar!/conf/mulgara-rmi.policy
2008-11-14 14:06:39,899 INFO Database – Host name aliases for this server are: [billpardy, localhost, 127.0.0.1]
Well, I guess something has started… back to the site, where there is a documentation page and a wiki. A quick look at the official documentation has just confused me: is this an external site? There’s no easy link to something like ‘getting started’ or tutorials. I’ve heard of SPARQL, but what’s iTQL? Never mind, let’s see if the wiki is more helpful.
A default configuration for a standalone Mulgara server runs a set of
web services, including the Web User Interface. The standard
configuration puts uses port 8080, so the web services can be seen by
pointing a browser on the server running Mulgara to http://localhost:8080/.
Ooo cool. *click*
SPARQL, I’ve heard of that. *click*
HTTP ERROR: 400 Query must be supplied
I guess that’s the SPARQL API, good to know, but the frontpage could’ve warned me a little. Ah, the second link is to the User Interface.
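That 400 makes sense if the link is a SPARQL protocol endpoint, which expects the query text in a `query` parameter. Here is a minimal sketch of building such a request URL in Python. The endpoint path below is my assumption (the page I clicked never spelled it out), so swap in whatever the link actually points to.

```python
from urllib.parse import urlencode

# Assumed endpoint path; check the link on the Mulgara front page for the real one.
ENDPOINT = "http://localhost:8080/sparql/"


def build_sparql_get_url(endpoint, query):
    """Return a GET URL carrying the query text in the 'query' parameter."""
    return endpoint + "?" + urlencode({"query": query})


url = build_sparql_get_url(ENDPOINT, "SELECT * WHERE { ?s ?p ?o } LIMIT 10")
print(url)
```

Passing that URL to `urllib.request.urlopen` should then send the query, assuming the server is running and the path guess is right.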
Good, I can use a drop down to look at lots of example queries, nice. Don’t understand most of them at the moment, but it’s definitely comforting to have examples. They look nothing like SPARQL though… wonder what it is? I’m sure it does SPARQL… was I wrong?
Quick poke at the HTML shows that it is just POSTing the query text to webui/ExecuteQuery. Looks straightforward to start hacking against too, but I should probably password-protect this somehow! I wonder how that is done… the documentation mentions a ‘java.security.policy’ field:
string: URL: The URL for the security policy file to use.
Kinda stumped… will investigate that later, but at least there’s hope. Just by firing off the example queries, though, I can see stuff, so I’ve got something to work with at least.
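Since the web UI just POSTs the query text to webui/ExecuteQuery, hacking against it from a script might look something like the sketch below. Two loud assumptions: the full URL (I am guessing the path hangs off http://localhost:8080/), and the form field name, which is a placeholder here. Read the real name out of the form’s HTML before trusting any of this.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Assumed full URL, pieced together from the port and the form's target path.
EXECUTE_URL = "http://localhost:8080/webui/ExecuteQuery"
# Placeholder only: paste one of the example queries from the drop-down here.
PLACEHOLDER_QUERY = "<paste one of the example queries here>"


def build_query_post(url, field_name, query_text):
    """Build (but do not send) a form-encoded POST carrying the query text."""
    body = urlencode({field_name: query_text}).encode("utf-8")
    return Request(url, data=body, method="POST")


# "QueryText" is a hypothetical field name, not taken from the Mulgara docs.
req = build_query_post(EXECUTE_URL, "QueryText", PLACEHOLDER_QUERY)
print(req.get_method(), req.full_url)
```

Handing `req` to `urllib.request.urlopen` would actually fire it off. The function only builds the request, so it is easy to inspect first.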
Jena – http://jena.sourceforge.net/
Front page is pretty clear, even if I don’t understand what all those acronyms are. The downloads link takes me to a page with an obvious download link, good. (Oh, and Sourceforge, you suck. How many frikkin mirrors do I have to try to get this file?)
Have to put Jena on pause while Sourceforge sorts its life out.
ARC2 – http://arc.semsol.org/
Frontpage: “Easy RDF and SPARQL for LAMP systems” Nice, I know of LAMP and I particularly like the word Easy. Let’s see… Download is easy to find, and tells me straight away I need PHP 4.3+ and MySQL 4.0.4+ *check* Right, now how do I enable PHP for apache again?… Ah, it helps if I install it first… Okay, done. Dropping the folder into my web space… Hmm nothing does anything. From the documentation, it does look like it is geared to providing a PHP library framework for working with its triplestore and RDF. Hang on, SPARQL Endpoint Setup looks like what I want. It wants a database, okay… done, bit of a hassle though.
Hmm, all I get is “Fatal error: Call to undefined function mysql_connect() in /********/arc2/store/ARC2_Store.php on line 53”
Of course, install the PHP libraries to access MySQL (PEBKAC)… done. I also realise I need to set up the store, like the example in “Getting Started”… done (with this), and what does the index page now look like?
Yay! There’s like SPARQL and stuff… I guess ‘load’ and ‘insert’ will help me stick stuff in, and ‘select’ looks familiar… Well, it seems to be working at least.
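Once a ‘select’ works, the next newbie hurdle is reading the answer from a program. Assuming the endpoint can be asked for the standard SPARQL JSON results format (I have not checked how ARC2 is told to do that), turning a response into plain rows could be sketched like this. The sample document is made up for illustration, not real output from my store.

```python
import json

# Made-up sample in the SPARQL JSON results layout, not real ARC2 output.
SAMPLE = """
{
  "head": {"vars": ["s", "p", "o"]},
  "results": {"bindings": [
    {"s": {"type": "uri", "value": "http://example.org/book1"},
     "p": {"type": "uri", "value": "http://purl.org/dc/elements/1.1/title"},
     "o": {"type": "literal", "value": "A made-up title"}}
  ]}
}
"""


def rows_from_sparql_json(text):
    """Flatten SPARQL JSON results into a list of {variable: value} dicts."""
    doc = json.loads(text)
    names = doc["head"]["vars"]
    rows = []
    for binding in doc["results"]["bindings"]:
        # A variable may be unbound in a given row, so skip missing names.
        rows.append({n: binding[n]["value"] for n in names if n in binding})
    return rows


rows = rows_from_sparql_json(SAMPLE)
print(rows)
```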
Unfortunately, it looks like the Jena download from sourceforge is in a world of FAIL for now. Maybe I’ll look at it next time?
Triplestores in the cloud
Talis Platform – http://www.talis.com/platform/
From the frontpage – “Developers using the Platform can spend more of their time building
extraordinary applications and less of their time worrying about how
they will scale their data storage.” – pretty much what I wanted to hear, so how do I get to play with it?
Lots of links on the frontpage, takes a few seconds to spot: “Join – join the n² community to get free developer stores and online support” – free, nice word that. So, I just have to email someone? Okay, I can live with that.
Documentation seems good, lots of choices though, a little hard to spot a single thread to follow to get up to speed, but Guides and Tutorials looks right to get going with. The Kniblet tutorial (whatever a kniblet is) looks the most beginnerish, and it’s also very PHP focussed, which is either a good thing or a bad thing depending on the user 🙂
Openlink Virtuoso – http://virtuoso.openlinksw.com/
Okay, I tried the Download link, but I am pretty confused by what I’m greeted with:
Not sure which one to pick just to try it out. It’s late in the day, and my tolerance for all things installable has run out.
Why take the http/web-centric, newbie approach to looking at these?
Answer: In part, I am taking this approach because I have a deep belief that it was only after relational DBs became commoditised – “You want fries with your MySQL database?” – that the dynamic web kicked off. If we want the semantic web to kick off, we need to commoditise it, or at least make it very easy for developers to get started. And I mean EASY. A query that I want answered is: “Is there something that fits: ‘apt-get install triplestore; r = store(‘localhost’), r.add(rdf), r.query(blah)’?”
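To make that wish concrete, here is a toy sketch of the kind of interface I mean. It is purely hypothetical: an in-memory set of tuples standing in for a real store, with `store`, `add` and `query` named after the wish above rather than after any actual library. The `ex:` names are invented sample data.

```python
class ToyStore:
    """A pretend triplestore: just a set of (subject, predicate, object) tuples."""

    def __init__(self, host):
        self.host = host  # unused here; kept only to mirror store('localhost')
        self.triples = set()

    def add(self, triples):
        """Add an iterable of (s, p, o) tuples."""
        self.triples.update(triples)

    def query(self, s=None, p=None, o=None):
        """Return sorted triples matching the pattern; None acts as a wildcard."""
        pattern = (s, p, o)
        return sorted(
            t for t in self.triples
            if all(want is None or want == got for want, got in zip(pattern, t))
        )


def store(host):
    return ToyStore(host)


r = store("localhost")
r.add([
    ("ex:ben", "ex:likes", "ex:triplestores"),
    ("ex:ben", "ex:uses", "ex:python"),
])
matches = r.query(s="ex:ben", p="ex:likes")
print(matches)
```

Three lines to connect, add and query is the bar. None of the stores above cleared it on first contact.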
(NB: I’ve short-circuited the discovery of software homepages. Imagine I’ve seen projects stating that they use “XXXXX as a triplestore”. I know this will likely mean I’ve compared apples to oranges, but as a newbie, how would I be expected to know this? “Powered by the Talis Platform” and “Powered by Jena” seem pretty similar on the surface.)