Ditching the DB-based blog for a semantic one

Posted on April 17, 2008


Why Yet-another-blog-engine?

Well, blog engines tend to do the same things: their functionality is derived from simple views on a relational DB. To a large extent, I think this RDB reliance has shaped the scope of what you can do with a blog, and I really feel it has guided how blog (and related publishing) technology has developed.

Things like blog export, and the interlinking of blogs and the bloggers who power the system, are seen as extras: features added as plugins or as an additional service. This needs to change! Let’s see what naturally happens when we try to build a system with a more interesting backend.

So, what to replace the RDB with?

Simply put, the data which a blog needs to function normally can be modelled in an objectstore, by linking items together by RDF predicates. Specifically, there is a namespace created by the SIOC project, aimed at defining social networks and their inter-linking in a semantic way – http://www.sioc-project.org/. I have a strong hunch that by using this work, a whole load of extra possibilities will emerge.

(Aside from the obvious benefits of simple export and reuse of objects, being able to make a single comment on more than one blog post from more than one blog, and so on.)

Okay… what’s the plan?

So, my coding bias for objectstore and framework language is FedoraCommons and Python, so no surprise there. It also means that I’ll be reusing my code, so each object will have an OAI-ORE aggregation and good search integration (via Apache Solr).

Modelling the blog:

Luckily, the good folks at the SIOC project have done a good load of the work for me, and having read through their work, I can say that I see no problem with it for my purposes. This means that using their namespace (http://rdfs.org/sioc/ns#), I can adopt the structure of classes, helpfully illustrated here

The first class objects from or subclassed from SIOC therefore will be as follows (a first class object has a 1-to-1 parity with the underlying Fedora objects):

  • User [contains or links to FOAF record]
  • Post [text and zero or more attachments]
  • Forum (Blog) [Dublin Core]
  • Post (Comment) [text]
  • Site [Dublin Core]

Other first class objects:

Link [DC record] and Petition [Later 😉]

Link speaks for itself. If a post contains a link, the link is promoted to an object and the post will connect to the link object. If someone else uses that link, the previous object is reused.
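As a rough sketch of that promote-and-reuse behaviour: the snippet below keeps one Link object per distinct URL and connects posts to it. The `LinkRegistry` class, the `info:example/...` URIs and the choice of `sioc:links_to` as the connecting predicate are all my assumptions for illustration; a real version would sit on top of Fedora rather than a Python dict, and would probably want to normalise URLs first.

```python
import hashlib

# Assumption: sioc:links_to is used to connect a post to its Link object.
SIOC_LINKS_TO = "http://rdfs.org/sioc/ns#links_to"


class LinkRegistry:
    """Toy stand-in for the object store: one Link object per distinct URL."""

    def __init__(self):
        self.triples = set()      # (subject, predicate, object) tuples
        self.link_objects = {}    # URL -> URI of its Link object

    def link_for(self, url):
        """Return the Link object's URI, creating it on first use."""
        if url not in self.link_objects:
            digest = hashlib.sha1(url.encode("utf-8")).hexdigest()[:12]
            # Hypothetical URI scheme, purely for the example.
            self.link_objects[url] = "info:example/link/" + digest
        return self.link_objects[url]

    def connect(self, post_uri, url):
        """Record that a post points at the (possibly reused) Link object."""
        link_uri = self.link_for(url)
        self.triples.add((post_uri, SIOC_LINKS_TO, link_uri))
        return link_uri
```

Two posts that mention the same URL end up pointing at the same Link object, which is what makes "who else linked to this?" a cheap question to ask later.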

Petition is a social experiment which I will go into later 😉

Now for the behaviour of the blog – I am envisioning an academically focussed blog, so the social network, persistence, trust and discourse are important features to consider.

Users – accounts and blogs

Frankly, I am tired of writing authentication systems. So is everyone else. Users are tired of having an account per site too. Thank god for OpenID then 🙂

Right, so we have a working, live system for authenticating people, but what about authorising? Authenticate to comment, that’s self-explanatory. But this is where we can do some interesting things:

The blog engine is ‘seeded’ by a blog author or authors, most likely the same people that installed the engine. Borrowing a common idea, these seeds can invite other people to have a publishing account. This is done by an indication of trust – at a technical level <uri-inviter> <trust:trust10> <uri-invitee>, with the <uri-invitee> being the User object created to correspond to a given OpenID. The next time they log in, the ability to create a blog should be apparent.

(trust namespace: http://trust.mindswap.org/ont/trust.owl)

Why not <foaf:knows>? Because that is saved for later 🙂 <trust:trust10> is a predicate intended as a way for someone to fully vouch for someone else – you may know people, but that doesn’t mean they should automatically gain blogging rights just because you have those rights.
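To make the invitation mechanics a bit more concrete, here is a minimal sketch using plain tuples as triples. The function names are made up, the `#` fragment separator on the trust namespace is a guess on my part, and the rule that only a seed's vouching counts is just the simplest reading of the idea; whether invitees can invite others in turn is left open.

```python
# Assumption: the predicate URI is the trust namespace plus "#trust10".
TRUST10 = "http://trust.mindswap.org/ont/trust.owl#trust10"
FOAF_KNOWS = "http://xmlns.com/foaf/0.1/knows"


def invite(triples, inviter, invitee):
    """Record <inviter> <trust:trust10> <invitee> in the triple set."""
    triples.add((inviter, TRUST10, invitee))


def can_create_blog(triples, seeds, user):
    """Seeds can always publish; others need a trust10 from a seed.

    A foaf:knows statement is deliberately ignored here: knowing
    someone is not the same as vouching for them.
    """
    if user in seeds:
        return True
    return any(
        s in seeds and p == TRUST10 and o == user
        for s, p, o in triples
    )
```

The check would run at login time, once the OpenID has been mapped to its User object, to decide whether to show the "create a blog" option.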

Any post (post, comment or link) can be tagged as interesting to a User, via the <foaf:interest> predicate – declaring “I am interested in this thing”. The payoff to the user is that their page (every User is an object, remember) will display the things they have marked as interesting. If a User declares that another User’s blog is ‘interesting’ then all the latter User’s posts will be accessible as well from here.
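A quick sketch of how a User page could gather those items, assuming triples are held as plain tuples and that posts are attached to their blog with `sioc:has_container` (my assumption; the function name is made up too):

```python
FOAF_INTEREST = "http://xmlns.com/foaf/0.1/interest"
# Assumption: a post is attached to its blog via sioc:has_container.
SIOC_HAS_CONTAINER = "http://rdfs.org/sioc/ns#has_container"


def interesting_items(triples, user):
    """Everything a user's page should list as 'interesting'."""
    # Things the user marked directly (posts, comments, links, blogs).
    marked = {o for s, p, o in triples if s == user and p == FOAF_INTEREST}
    # Posts that live inside any blog the user marked.
    via_blogs = {
        s for s, p, o in triples
        if p == SIOC_HAS_CONTAINER and o in marked
    }
    return marked | via_blogs
```

In a real triplestore this would be a single query rather than two passes over a set, but the shape of the lookup is the same.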

There are two forms of free-text tags; a trusted tag and a normal tag. Trusted tags are those placed on a Post, Blog or User, by the author/owner of that object – a statement about what the author feels is the subject of the object. An author can also tag themselves, and this is to give extra indication about what they normally blog or comment on, in addition to the tags they’ve put on their own items.

A normal tag can be placed by any authenticated User on any Post or Blog. Expected functionality, really.
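Since I have no ontology for this yet, the distinction can at least be pinned down as a rule. The sketch below is only that: `tag_kind` and its arguments are hypothetical names, and for a User object I am treating the user as their own "owner".

```python
def tag_kind(target_type, owner, tagger):
    """Classify a free-text tag placed by `tagger` on an object.

    target_type: "Post", "Blog" or "User"
    owner:       the author/owner of the tagged object
                 (for a User object, the user themselves)
    tagger:      the authenticated User placing the tag
    """
    if tagger == owner:
        # The author's own statement about the subject of their object.
        return "trusted"
    if target_type in ("Post", "Blog"):
        # Anyone authenticated may tag someone else's Post or Blog.
        return "normal"
    # Only a User may tag their own User object.
    raise PermissionError("normal tags are not allowed on " + target_type)
```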

I haven’t tripped across a suitable ontology for these two, and I’d really like to use an existing one so feel free to add a comment if you have an idea of one.

Now, I mentioned a curious thing called a Petition – this is a second method to gain trust in the social network. A User can make a Petition, stating a brief summary of what they’d like to write about, and then tag the Petition with its subjects.

This is where it gets semantically fun – a Petition will be visible to existing blog posters with the following filters available:

  • Show Petitions that have tags in common with mine, from Users I trust
  • Show Petitions that have tags in common with mine, from Users I know
  • Show Petitions that have tags in common with mine, from Users my friends know
  • Show Petitions from Users who I’ve shown interest in
  • Show Petitions from Users who my friends show interest in
  • Show Petitions that have tags in common with mine
  • Show most recent Petitions
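A few of the filters above, sketched over plain tuples. The `Petition` class and the function names are invented for the example, and the trust predicate URI assumes a `#` separator on the trust namespace; the real thing would be queries against the triplestore.

```python
from dataclasses import dataclass

FOAF_KNOWS = "http://xmlns.com/foaf/0.1/knows"
# Assumption: predicate URI is the trust namespace plus "#trust10".
TRUST10 = "http://trust.mindswap.org/ont/trust.owl#trust10"


@dataclass(frozen=True)
class Petition:
    author: str
    summary: str
    tags: frozenset


def objects(triples, subject, predicate):
    """All o such that (subject, predicate, o) is in the store."""
    return {o for s, p, o in triples if s == subject and p == predicate}


def with_common_tags(petitions, my_tags):
    """Petitions sharing at least one tag with mine."""
    return [p for p in petitions if p.tags & my_tags]


def from_related_users(petitions, triples, me, predicate, my_tags):
    """Common-tag Petitions from Users I trust (TRUST10) or know (FOAF_KNOWS)."""
    people = objects(triples, me, predicate)
    return [p for p in with_common_tags(petitions, my_tags) if p.author in people]


def from_friends_of_friends(petitions, triples, me, my_tags):
    """Common-tag Petitions from Users my friends know."""
    known_by_friends = set()
    for friend in objects(triples, me, FOAF_KNOWS):
        known_by_friends |= objects(triples, friend, FOAF_KNOWS)
    return [
        p for p in with_common_tags(petitions, my_tags)
        if p.author in known_by_friends
    ]
```

The interest-based filters follow the same pattern with `foaf:interest` swapped in for the predicate.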

It doesn’t take much more time to see that the information already in the triplestore can be used to create some really interesting filters.

Now, a User can then decide to place a level of trust in a given Petitioner. The actual mechanics of what is required to elevate the trust placed in the Petitioner are up to the system installer, but a few interesting things can be used: for example, a requirement that the combined trust given to a User adds up to a certain level, perhaps with a tiered system (two trust9s are equivalent to one trust10, etc.).
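One way an installer might wire up such a tiered rule, purely as an illustration: the point values, the threshold and the "count only each truster's strongest statement" rule are all my assumptions, chosen so that two trust9s equal one trust10.

```python
# Assumed weights: two trust9s add up to one trust10.
POINTS = {"trust10": 2, "trust9": 1}
# Assumed threshold: the equivalent of one full trust10.
REQUIRED_POINTS = 2


def trust_points(statements, petitioner):
    """Sum the trust placed in a petitioner.

    statements: iterable of (truster, level, trustee) tuples,
    where level is a name such as "trust9" or "trust10".
    Each truster counts once, at their strongest statement.
    """
    best = {}
    for truster, level, trustee in statements:
        if trustee == petitioner:
            best[truster] = max(best.get(truster, 0), POINTS.get(level, 0))
    return sum(best.values())


def keeps_blog(statements, petitioner):
    """True while the combined trust stays at or above the threshold."""
    return trust_points(statements, petitioner) >= REQUIRED_POINTS
```

Because the check is recomputed from whatever trust statements currently exist, withdrawing a statement is enough to drop someone back below the line.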

The bottom line is that this trust system means that there is no requirement for a super User as the network should be self-regulating – take the trust away, and the Petitioner loses their Blog. (It also doesn’t mean that having a Super user is a bad idea!)

Posts and Posting

I hate the word blog… I’ve been using it as a crutch, as what I’d like this to be is a site to have a voice on. By using semantic relationships, it is quite possible to view it as you might a blog, but you might also view it like a forum, with posts and threaded comments. The underlying information and connections are the same, but the way you can view and present this becomes a lot more flexible.
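To illustrate the "same data, different views" point, here is a small sketch that builds a blog-style listing and a forum-style thread from one set of triples. It assumes comments point at their parent with `sioc:reply_of` and that creation times are available as a simple dict; the function names are made up.

```python
# Assumption: a comment points at its parent with sioc:reply_of.
SIOC_REPLY_OF = "http://rdfs.org/sioc/ns#reply_of"


def replies_to(triples, parent):
    """URIs of the posts that directly reply to `parent`."""
    return [s for s, p, o in triples if p == SIOC_REPLY_OF and o == parent]


def blog_view(created, triples):
    """Top-level posts, newest first, each with its direct comment count.

    created: dict mapping post URI -> creation timestamp.
    """
    replies = {s for s, p, o in triples if p == SIOC_REPLY_OF}
    top_level = [uri for uri in created if uri not in replies]
    newest_first = sorted(top_level, key=created.get, reverse=True)
    return [(uri, len(replies_to(triples, uri))) for uri in newest_first]


def forum_view(created, triples, root):
    """The same data as a nested thread: (uri, [child threads]), oldest first."""
    children = sorted(replies_to(triples, root), key=created.get)
    return (root, [forum_view(created, triples, child) for child in children])
```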

So when I say Post, I mean it in a twitter/blogger/thread-reply kind of way 😉

The Post objects have a summary (200 char limit) and an optional body (a blog ‘post’) – no title! They can be tagged, of course, and the post can also hold attachments for download, or embedding here or elsewhere. (All resources have dereferenceable URIs, and so can be linked to directly.)
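In Python terms, the shape of a Post might look something like the sketch below. The class layout and field names are my own guesses at this stage; only the 200-character summary limit, the optional body, the tags and the attachments come from the description above.

```python
from dataclasses import dataclass, field
from typing import List, Optional

SUMMARY_LIMIT = 200  # characters


@dataclass
class Post:
    summary: str                       # required, at most 200 characters
    body: Optional[str] = None         # optional long-form text; no title field
    tags: List[str] = field(default_factory=list)
    attachments: List[str] = field(default_factory=list)  # URIs of attached resources

    def __post_init__(self):
        if not self.summary:
            raise ValueError("a Post needs a summary")
        if len(self.summary) > SUMMARY_LIMIT:
            raise ValueError("summary is limited to 200 characters")

    @property
    def is_short(self):
        """True for a summary-only, 'tweet'-style Post."""
        return self.body is None
```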

So a Post can be a ‘tweet’, or a ‘blog’ post – it’s all the same thing. However, on the user’s jumpoff page, the summaries are listed.

I’ll have a go at putting something together, to see what works and what doesn’t, so watch this space.

Posted in: blog, fedora, idea, rdf