A method for flexibly using external services

Posted on May 29, 2008


aka “How I picture a simple REST mechanism for queuing tasks with external services, such as transformation of PDFs to images, Documents to text, or scanning, file format identification.”
Example: Document to .txt service utilisation

Step 1: send the URL for the document to the service (in this example, the request is automatically accepted – code 201 indicates that a new resource has been created)

{(( server h - 'in the cloud/resolvable' - /x.doc exists ))}

u | ----------------- POST /jobs (msg 'http://h/x.doc') ----------------> | Service (s)
| <---------------- HTTP resp code 201 (msg 'http://s/jobs/1') ---------- |

Step 2: Check the returned resource to find out if the job has completed (an error code 4XX would be suitable if there has been a problem such as unavailability of the doc resource)

u | ----------------- GET /jobs/1 (header: "Accept: text/rdf+n3") --------> | s

If job is in progress:

u | <---------------- HTTP resp code 202 ---------------------------------- | s

If job is complete (and accept format is supported):

u | <---------------- HTTP resp code 303 (location: /jobs/1.rdf) ---------- | s

u | ----------------- GET /jobs/1.rdf --------------> | s
| <---------------- HTTP 200 msg below ------------ |

@prefix s: <http://s/jobs/>.
@prefix store: <http://s/store/>.
@prefix dc: <http://purl.org/dc/elements/1.1/>.
@prefix ore: <http://www.openarchives.org/ore/terms/>.
@prefix dcterms: <http://purl.org/dc/terms/>.

"Antiword service - http://s";

"My Document"



Then, the user can get the aggregated parts as required, noting the TTL (the deleted date predicate, for which I still need to find a good real choice).

Also, as this is a transformation, the service has indicated this with the final triple, asserting that the created resource is a rendition of the original resource, but in a different format.

A report based on the item, such as something that would be output from JHOVE, Droid or a virus-scanning service, can be shown as an aggregate resource in the same way, or if the report can be rendered using RDF, can be included in the aggregation itself.

It should be straightforward to see that this response gives services the opportunity to return zero or more files, and for that reply to be self-describing. The re-use of the basic structure of the OAI-ORE profile means that the work going into the Atom format rendition can be replicated here, so an Atom report format could also work.
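
To make the two steps above concrete, here is a minimal client sketch in Python. It is only an illustration under assumptions: the names submit_job, wait_for_report, Response and the transport callable are hypothetical, and transport stands in for whatever HTTP library is actually used (it must not follow redirects by itself, so that the 303 is visible).

```python
from collections import namedtuple
from urllib.parse import urljoin
import time

# Hypothetical response shape; a real transport would fill it from its HTTP library.
Response = namedtuple("Response", "status headers body")


def submit_job(transport, service, doc_url):
    """Step 1: POST the document URL to /jobs and return the job URI (201 body)."""
    resp = transport("POST", service + "/jobs", {}, doc_url)
    if resp.status != 201:
        raise RuntimeError("job not accepted: HTTP %d" % resp.status)
    return resp.body.strip()


def wait_for_report(transport, job_uri, accept="text/rdf+n3", delay=1.0, max_polls=10):
    """Step 2: poll the job URI until a 303 arrives, then fetch the report."""
    for _ in range(max_polls):
        resp = transport("GET", job_uri, {"Accept": accept}, None)
        if resp.status == 202:
            # Job still in progress: wait and ask again.
            time.sleep(delay)
            continue
        if resp.status == 303:
            # Job complete: the Location header may be relative to the job URI.
            report_uri = urljoin(job_uri, resp.headers["Location"])
            report = transport("GET", report_uri, {}, None)
            if report.status != 200:
                raise RuntimeError("report not available: HTTP %d" % report.status)
            return report.body
        # Anything else (for example a 4XX) is treated as a failed job.
        raise RuntimeError("job failed: HTTP %d" % resp.status)
    raise TimeoutError("job still in progress after %d polls" % max_polls)
```

Passing the transport in as an argument keeps the polling logic independent of any particular HTTP library, and makes it easy to exercise against a fake service.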

General service description:

All requests take {[?pass=XXXXXXXXXXXXXXXXX]} as an optional parameter. The service has the choice whether to support it or not.
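
As a small illustration of the optional parameter, a client could append the passcode to any request URL like this. The helper name with_pass is hypothetical; only the parameter name pass comes from the description above.

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode


def with_pass(url, passcode=None):
    """Return url with ?pass=... appended; unchanged when no passcode is given."""
    if passcode is None:
        return url
    parts = urlsplit(url)
    # Keep any query parameters that are already present on the URL.
    query = parse_qsl(parts.query, keep_blank_values=True)
    query.append(("pass", passcode))
    return urlunsplit(parts._replace(query=urlencode(query)))
```

For example, with_pass("http://s/jobs", "abc") gives "http://s/jobs?pass=abc", while an anonymous request leaves the URL untouched.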

GET /jobs
Content-negotiation applies, but default response is Atom format
List of job URIs that the user can see (without a pass, the list contains only the anonymous jobs, if the service allows that)

POST /jobs
Body: “Resource URL”

HTTP Code 201 – Job accepted – Resp body == URI of job resource
HTTP Code 403 – Forbidden, due to bad credentials
HTTP Code 402 – Request is possible, but requires payment
– resp body => $ details and how to credit the account

DELETE /jobs/job-id
HTTP Code 200 – Job is removed from the queue, as are any created resources
HTTP Code 403 – Bad credentials/Not allowed

GET /jobs/job-id
Header (optional): “Accept: application/rdf+xml” to get rdf/xml rather than the default atom, should the service support it
HTTP Code 406 – Service cannot make a response to comply with the Accept header formats
HTTP Code 202 – Job is in progress – msg MAY include an expected time for completion
HTTP Code 303 – Job complete, redirect to formatted version of the report (typically /jobs/job-id.atom/rdf/etc)

GET /jobs/job-id.atom
HTTP Code 200 – Job is completed, and the msg body is the ORE map in Atom format
HTTP Code 404 – Job is not complete
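
Seen from the service side, the GET rules above boil down to a small decision function. The sketch below is one possible reading, not a reference implementation: the function names, the FORMATS table and its file extensions are assumptions, and Accept parsing is deliberately naive (q-values are ignored).

```python
# Hypothetical mapping from supported Accept media types to report extensions.
FORMATS = {
    "application/atom+xml": "atom",
    "application/rdf+xml": "rdf",
    "text/rdf+n3": "n3",
}
DEFAULT_FORMAT = "application/atom+xml"  # Atom is the default response format


def respond_to_job_get(job_id, complete, accept=None):
    """Return (status, headers) for GET /jobs/job-id."""
    if accept:
        wanted = [part.split(";")[0].strip() for part in accept.split(",")]
    else:
        wanted = [DEFAULT_FORMAT]
    chosen = None
    for media_type in wanted:
        if media_type == "*/*":
            media_type = DEFAULT_FORMAT
        if media_type in FORMATS:
            chosen = media_type
            break
    if chosen is None:
        return 406, {}  # no format in the Accept header can be produced
    if not complete:
        return 202, {}  # job still in progress
    # Job complete: redirect to the formatted version of the report.
    return 303, {"Location": "/jobs/%s.%s" % (job_id, FORMATS[chosen])}


def respond_to_report_get(complete):
    """Return the status for GET /jobs/job-id.atom (or another report format)."""
    return 200 if complete else 404
```

Checking the Accept header before the job state means a client learns about an unsupported format straight away, rather than only after the job has finished; a service could reasonably choose the opposite order.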

Authorisation and economics

The authorisation for use of the service is a separate consideration, but ultimately it depends on the service implementation: whether anonymous access is allowed, whether rate limits apply, whether authorisation is by invitation only, and so on.

I would suggest the use of SSL for those services that do require authorisation, but not HTTP Digest per se. HTTP Basic through an SSL connection should be good enough; the Digest standard is not pleasant to try to implement and get working (the standard is a little ropey).

Due to the possibility of a code 402 (payment required) on a job request, it is possible to start to add in some economic valuations. It is required that the server holding the resource can respond to a HEAD request sensibly and report information such as file-size and format.
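
As a rough sketch of how a service might use that HEAD request, the code below reads the size and format of the resource and turns the size into a price. Everything about the pricing is an assumption made for illustration (the function names, charging per started megabyte, and the rate itself); only the HEAD request and the file-size and format information come from the text above.

```python
import urllib.request

MEGABYTE = 1024 * 1024


def describe_resource(url, opener=urllib.request.urlopen):
    """Send a HEAD request and report the size and format the server declares."""
    request = urllib.request.Request(url, method="HEAD")
    with opener(request) as resp:
        length = resp.headers.get("Content-Length")
        return {
            "size": int(length) if length is not None else None,
            "format": resp.headers.get("Content-Type"),
        }


def price_for(size_bytes, units_per_megabyte):
    """Hypothetical tariff: charge per started megabyte of the source file."""
    if size_bytes is None:
        raise ValueError("the server did not report a file size")
    started_megabytes = -(-size_bytes // MEGABYTE)  # ceiling division
    return started_megabytes * units_per_megabyte
```

A service could run this before accepting a job and answer with a 402 if the passcode's account cannot cover the result.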

A particular passcode can be credited to allow it to make use of a service, the use of which debits the account as required. When an automated system hits upon a 402 (payment required) rather than a plain 403 (Forbidden), this could trigger mechanisms to get more credit, rather than a simple fail.
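
The difference between a 402 and a 403 could be handled by an automated client along these lines. This is a sketch under assumptions: submit and top_up are hypothetical callables supplied by the caller, where submit() returns a (status, body) pair for POST /jobs and top_up(details) tries to credit the account using the details from the 402 body.

```python
def submit_with_credit(submit, top_up, max_top_ups=1):
    """Submit a job, topping up the account when the service answers 402."""
    top_ups = 0
    while True:
        status, body = submit()
        if status == 201:
            return body  # URI of the new job resource
        if status == 402 and top_ups < max_top_ups and top_up(body):
            # Payment required and credit was added: try the request again.
            top_ups += 1
            continue
        # 403, a failed top-up, or repeated 402s: give up.
        raise PermissionError("HTTP %d: %s" % (status, body))
```

Capping the number of top-ups keeps a misbehaving service from draining the account in a loop.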

OAI-ORE spec – http://www.openarchives.org/ore/0.3/toc

HTTP status codes – http://www.w3.org/Protocols/rfc2616/rfc2616-sec10.html

