Using Python to play with a Fedora repository

Posted on December 14, 2007



Firstly, you’ll need some extra libraries:

(If you are using Windows, I’m afraid you are on your own with problems. I can’t help you there; it’s not a system I use.)

Get easy_install from here: http://peak.telecommunity.com/DevCenter/EasyInstall
(If that site has slowed to a trickle, just get the ez_setup.py bootstrap script from somewhere else that is trustworthy.)

Then, as root:

easy_install ZSI
easy_install uuid
easy_install 4Suite-xml
easy_install pyxml

(There may be more; I don’t have a spare system set aside to test this on.)

Then create a clean working directory and grab the libraries from here:

svn co https://orasupport.ouls.ox.ac.uk/archive/archive/lib

These are of questionable quality and are in a state of transition, from a jumbled proof-of-concept structure into a more refined and refactored set of libraries. The main failing is that they mix convenience methods, which might be fairly specific in use, with more fundamental methods that are much more generic.

(PS: if you want to try out the full archive interface, you’ll need to inject some objects into the repository first, specifically the resource objects that hold the XSL for the view transforms. If anyone wants them, I’ll wrap these up with a bow and add them when I have time.)

But, for now, they will at least provide the fundamentals to play with.

I will assume you have a Fedora repository set up somewhere, and that you know a working username and password that will let you create, edit, etc. objects inside it. I also assume the instance has an API like Fedora 2.2’s, especially for SOAP. I’ll post later about making FedoraClient work across Fedora versions with regard to SOAP.

For the rest of this post, I’ll assume that fedoraAdmin is both the username and the password for the repository, and that the repository lives at localhost:8080/fedora.

From the directory that holds lib/, start the Python interpreter:

~/temp$ python
Python 2.5.1c1 (release25-maint, Apr 12 2007, 21:00:25)
[GCC 4.1.2 (Ubuntu 4.1.2-0ubuntu4)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>>

Now let’s get a Fedora client and poke around the repository:

>>> from lib.fedoraClient import FedoraClient
(cue SOAP-related chugging of CPU while the SOAP libs load)
>>> help(FedoraClient) # This will show you all sorts about this class
>>> # But we are interested in the following:
>>> f = FedoraClient(serverurl='http://localhost:8080/fedora', username='fedoraAdmin', password='fedoraAdmin', version='2.2')

Now we have the client, let’s try out a few things:

>>> print f.getDescriptionXML()
(XML-related stuff in reply)

>>> f.doesObjectExist('namespace:pid')
True or False, depending on whether the object exists

>>> # For example, in my dev repo:
>>> f.getContentModel('person:1')
u'person'

>>> f.listDatastreams('ora:20')
[{'mimetype': u'image/png', 'checksumtype': u'DISABLED', 'controlgroup': 'M', 'checksum': u'none', 'createdate': u'2007-09-25T14:36:29.381Z', 'pid': 'ora:20', 'versionid': u'IMAGE.0', 'label': u'Downloadable stuff', 'formaturi': None, 'state': u'A', 'location': None, 'versionable': True, 'winname': u'ora_20-IMAGE.png', 'dsid': u'IMAGE', 'size': 0}, {'mimetype': u'text/xml', 'checksumtype': u'DISABLED', 'controlgroup': 'X', 'checksum': u'none', 'createdate': u'2007-09-25T14:37:02.882Z', 'pid': 'ora:20', 'versionid': u'DC.2', 'label': u'Dublin Core Metadata', 'formaturi': None, 'state': u'A', 'location': None, 'versionable': True, 'winname': u'ora_20-DC.xml', 'dsid': u'DC', 'size': 272}, {'mimetype': u'text/calendar', 'checksumtype': u'DISABLED', 'controlgroup': 'M', 'checksum': u'none', 'createdate': u'2007-09-25T14:37:03.391Z', 'pid': 'ora:20', 'versionid': u'EVENT.3', 'label': u'Events', 'formaturi': None, 'state': u'A', 'location': None, 'versionable': True, 'winname': u'ora_20-EVENT.ics', 'dsid': u'EVENT', 'size': 0}, {'mimetype': u'text/xml', 'checksumtype': u'DISABLED', 'controlgroup': 'X', 'checksum': u'none', 'createdate': u'2007-08-31T14:21:39.743Z', 'pid': 'ora:20', 'versionid': u'MODS.4', 'label': u'MODS Record', 'formaturi': None, 'state': u'A', 'location': None, 'versionable': True, 'winname': u'ora_20-MODS.xml', 'dsid': u'MODS', 'size': 1730}]
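The result is just a list of dicts, so ordinary Python is enough to slice it up. Here is a minimal sketch, written for modern Python 3 rather than the 2.5 session above, using a trimmed copy of the ora:20 output that keeps only the keys it needs. The helper names are my own and are not part of the lib/ code. In Fedora, control group 'X' marks inline XML datastreams and 'M' marks managed content stored by the repository.

```python
# Trimmed copy of the listDatastreams('ora:20') output above
# (only the keys this sketch uses are kept).
datastreams = [
    {'dsid': 'IMAGE', 'mimetype': 'image/png', 'controlgroup': 'M', 'size': 0},
    {'dsid': 'DC', 'mimetype': 'text/xml', 'controlgroup': 'X', 'size': 272},
    {'dsid': 'EVENT', 'mimetype': 'text/calendar', 'controlgroup': 'M', 'size': 0},
    {'dsid': 'MODS', 'mimetype': 'text/xml', 'controlgroup': 'X', 'size': 1730},
]

def mimetypes_by_dsid(dslist):
    """Map each datastream ID to its MIME type."""
    return {ds['dsid']: ds['mimetype'] for ds in dslist}

def dsids_in_control_group(dslist, group):
    """Return the IDs of datastreams in one control group, e.g. 'M' or 'X'."""
    return [ds['dsid'] for ds in dslist if ds['controlgroup'] == group]

print(mimetypes_by_dsid(datastreams))
print(dsids_in_control_group(datastreams, 'M'))  # ['IMAGE', 'EVENT']
print(dsids_in_control_group(datastreams, 'X'))  # ['DC', 'MODS']
```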

>>> f.doesDatastreamExist('ora:20', 'DC')
True
>>> f.doesDatastreamExist('ora:20', 'IMAGE')
True
>>> f.doesDatastreamExist('ora:20', 'IMAGE00123')
False
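If you have already fetched the listDatastreams() result, you can answer the same question locally and skip a round trip to the server. This hypothetical has_datastream helper is a Python 3 sketch, not part of the libraries, and I haven't checked how doesDatastreamExist works internally.

```python
# Trimmed listDatastreams('ora:20') result: only the 'dsid' key matters here.
datastreams = [{'dsid': 'IMAGE'}, {'dsid': 'DC'}, {'dsid': 'EVENT'}, {'dsid': 'MODS'}]

def has_datastream(dslist, dsid):
    """Hypothetical helper: True if dsid appears in a listDatastreams() result."""
    return any(ds['dsid'] == dsid for ds in dslist)

print(has_datastream(datastreams, 'DC'))          # True
print(has_datastream(datastreams, 'IMAGE00123'))  # False
```

The trade-off is freshness: the list is a snapshot, so a datastream that someone else adds after you fetched it only shows up through a fresh call to the server.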

Creating new items:

Creating a new item is as simple as creating a new blank FoXML object and ingesting it; datastreams are uploaded and added afterwards. The first example below is trivial, and the second is more detailed.
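To make the first step concrete, here is a minimal Python 3 sketch of a blank FoXML object built with ElementTree. It is not the createBlankItem method from FedoraClient, and it does not ingest anything. The namespace and property names (rdf:type FedoraObject, model#state, model#label, model#contentModel) are my recollection of the Fedora 2.x FOXML layout and should be treated as assumptions; check them against the FOXML schema your repository ships with before ingesting. blank_foxml is a hypothetical name.

```python
import xml.etree.ElementTree as ET

# Namespaces and property names as recalled from Fedora 2.x FOXML (assumption).
FOXML_NS = 'info:fedora/fedora-system:def/foxml#'
MODEL_NS = 'info:fedora/fedora-system:def/model#'
RDF_TYPE = 'http://www.w3.org/1999/02/22-rdf-syntax-ns#type'

# Serialise elements with a 'foxml:' prefix instead of the default 'ns0:'.
ET.register_namespace('foxml', FOXML_NS)

def blank_foxml(pid, label, content_model, state='Active'):
    """Return a FOXML digitalObject with object properties and no datastreams."""
    obj = ET.Element('{%s}digitalObject' % FOXML_NS, {'PID': pid})
    props = ET.SubElement(obj, '{%s}objectProperties' % FOXML_NS)
    for name, value in [
        (RDF_TYPE, 'FedoraObject'),
        (MODEL_NS + 'state', state),
        (MODEL_NS + 'label', label),
        (MODEL_NS + 'contentModel', content_model),
    ]:
        ET.SubElement(props, '{%s}property' % FOXML_NS,
                      {'NAME': name, 'VALUE': value})
    return ET.tostring(obj, encoding='unicode')

foxml = blank_foxml('demo:1', 'A test object', 'person')
print(foxml)
```

Building the XML with ElementTree rather than string formatting means labels containing characters such as & or < are escaped for you.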

First Demo:
http://pastebin.com/f7b1f21e7

Second Demo:
Look at the createBlankItem method in FedoraClient; there is plenty of scope there for creating complex objects on the fly.

Poking around the Triplestore:

Using the above libs:

>>> from lib.risearch import Risearch
>>> r = Risearch(server='http://localhost:8080/fedora') # This is the default, and equivalent to Risearch()

Then you can ask it fun things:

>>> # Retrieve a list of all the objects in the repository:
>>> pids = r.getTuples("select $object from <#ri> where $object <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <info:fedora/fedora-system:def/model#FedoraObject>", format='csv', limit='10000').split("\n")[1:-1]
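The .split("\n")[1:-1] at the end drops the first line of the CSV (the column header) and, assuming the output ends with a newline, the empty string left after it. If you would rather have bare PIDs, here is a Python 3 sketch using the standard csv module. It assumes a single header row and values of the form info:fedora/namespace:id; pids_from_csv and the sample text are hypothetical, not real risearch output.

```python
import csv
import io

def pids_from_csv(csv_text, prefix='info:fedora/'):
    """Turn risearch CSV tuple output into a list of bare PIDs.

    Skips the header row and any blank lines, and strips the
    info:fedora/ URI prefix from each value if present.
    """
    rows = csv.reader(io.StringIO(csv_text))
    next(rows, None)  # skip the header row, e.g. 'object'
    pids = []
    for row in rows:
        if not row:
            continue  # blank line
        value = row[0]
        pids.append(value[len(prefix):] if value.startswith(prefix) else value)
    return pids

sample = 'object\r\ninfo:fedora/ora:1\r\ninfo:fedora/ora:20\r\n'
print(pids_from_csv(sample))  # ['ora:1', 'ora:20']
```

The csv module also copes with a quoted header and with either \n or \r\n line endings, which the plain split does not.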

>>> # Get a list of the pids in a given bottom-up collection (ora:neeo):
>>> pids = r.getTuples("select $object from <#ri> where $object <fedora-rels-ext:isMemberOf> <info:fedora/ora:neeo>", format='csv', limit='10000').split("\n")[1:-1]
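If you run the collection query often, it is easy to build from the collection PID rather than retyping it. A small Python 3 sketch; member_query is a hypothetical helper, not part of risearch.py.

```python
def member_query(collection_pid, predicate='<fedora-rels-ext:isMemberOf>'):
    """Hypothetical helper: iTQL selecting objects linked to collection_pid by predicate.

    The PID is pasted in as-is with no escaping, so only pass PIDs you trust.
    """
    return ('select $object from <#ri> '
            'where $object %s <info:fedora/%s>' % (predicate, collection_pid))

print(member_query('ora:neeo'))
```

With the Risearch object from above, r.getTuples(member_query('ora:neeo'), format='csv', limit='10000') should send the same query as the hand-written one.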

>>> # Test to see if a certain relationship exists:
>>> # You may need to change the code in risearch.py to use the old method
>>> r.doesTripleExist('<info:fedora/ora:1> <person:hasStatus> <info:fedora/collection:open>')
False

In the next post, I’ll write about how Solr can be fed from objects in a Fedora repository.

Posted in: fedora, python, rdf, repository