Browsing All Posts filed under »repository«

My Swiss Army toolkit for distributed/multiprocessing systems

February 11, 2010


My first confession – I avoid ‘threading’ and shared memory. Avoid it like the plague, not because I cannot do it but because it can be a complete pain to build and maintain relative to the alternatives. I am very much pro multiprocessing versus multithreading – obviously, there are times when threading is by far […]

Usage stats and Redis

January 18, 2010


Redis has been such a massively useful tool to me. Recently, it has let me cut through access log munging like a hot knife through butter, all with multiprocessing goodness. Key things:

Using sets to manage botlists:

>>> from redis import Redis
>>> r = Redis()
>>> for bot in r.smembers("botlist"):
...     print bot
...
lycos.txt
non_engines.txt
inktomi.txt
misc.txt
askjeeves.txt
oucs_bots
wisenut.txt
altavista.txt
msn.txt
googlebotlist.txt
>>> total = 0
>>> for […]
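The set-based bot filtering and multiprocessing mentioned in this excerpt can be sketched roughly as follows. This is only an illustration under stated assumptions, not the code from the post: `BOT_AGENTS` is a plain Python set standing in for a Redis set, the "agent path" log line format is made up for the example, and `count_human_hits` and `tally` are hypothetical names.

```python
from collections import Counter
from multiprocessing import Pool

# Hypothetical stand-in for a Redis set of known bots. With a live Redis
# server the members would be fetched with Redis().smembers(...) instead.
BOT_AGENTS = frozenset({"googlebot", "msnbot"})


def count_human_hits(lines):
    """Count hits per path in one chunk of log lines, skipping known bots.

    Each line is assumed to look like "<agent> <path>" (an invented format).
    """
    counts = Counter()
    for line in lines:
        agent, path = line.split(" ", 1)
        if agent not in BOT_AGENTS:
            counts[path] += 1
    return counts


def tally(chunks, processes=2):
    """Fan the chunks out to worker processes and merge the partial counts."""
    with Pool(processes) as pool:
        partials = pool.map(count_human_hits, chunks)
    total = Counter()
    for partial in partials:
        total.update(partial)
    return total


if __name__ == "__main__":
    chunks = [
        ["firefox /a", "googlebot /a"],
        ["msnbot /b", "safari /a", "opera /b"],
    ]
    # Two non-bot hits on /a and one on /b.
    print(tally(chunks))
```

Each worker only needs a membership test against the bot set, which is why a set (in Redis or in memory) is a natural fit for this kind of filtering.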

The four rules of the web and compound documents

August 18, 2008


A real quirk that truly interests me is the difference in aims between the way documents are typically published and the way that the information within them is reused. A published document is normally in a single ‘format’ – a paginated layout, and this may comprise text, numerical charts, diagrams, tables of data and so […]

Trackbacks, and spammers, and DDoS, oh my!

August 18, 2008


The Idea

Before I give you all the dark news about this, let me set out my position: I really, really think that repositories communicating the papers that are cited and referenced to each other is a really good thing. If a paper was deposited in the Oxford archive, and it referenced a paper held […]

A method for flexibly using external services

May 29, 2008


aka “How I picture a simple REST mechanism for queuing tasks with external services, such as transformation of PDFs to images, Documents to text, or scanning, file format identification.”

Example: Document to .txt service utilisation

Step 1: send the URL for the document to the service (in this example, the request is automatically accepted – code […]

Creating a web application from scratch, backed by Fedora-Commons and Apache Solr (Part 1)

February 24, 2008


(Part 1 will detail the installation and setup of the basic system, services and libraries needed for a Fedora-Commons/Apache Solr backed web ‘service’. Subsequent parts will deal with configuring and feeding the search engine, and constructing a web interface to handle article/blog/comment posting and using OpenID for authentication.) Step 1 – Get a nice clean […]

Yahoo Pipes + Solr’s API = RSS feeds for repository submitters

February 2, 2008


Quick post: Part of the reason for trying to get as many open APIs onto the services I put in place for the repository is so that I don’t have to customise things for every department or use; the interested parties can do it for themselves. As a little proof, I have set up a […]