This will be running on a small cluster of machines. These machines will already be running CouchDB, so I have map/reduce set up to process the data as it comes into the database. This queue is for scheduling other tasks, such as crawling the web, calling third-party APIs and pulling down data, etc.
I'd like the queue to be transactional, so that if a worker dies before finishing a task, the task remains on the queue and another worker picks it up. It is OK with me if I am responsible for making sure the tasks are idempotent.
I'd like the queue to be replicated in the cluster so that if one node happens to go down, we don't lose a chunk of the tasks from the queue. I'd rather the occasional task be done twice than any task ever be forgotten.
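To make the semantics I'm after concrete, here is a rough in-memory sketch of "claim with a lease, remove only on acknowledgement". The `LeaseQueue` class and its `put`/`claim`/`ack` methods are names I made up for illustration, not the API of any existing product, and the sketch says nothing about replication across nodes.

```python
import time
import uuid


class LeaseQueue:
    """Hypothetical in-memory illustration of at-least-once delivery via leases."""

    def __init__(self, lease_seconds=30.0, clock=time.monotonic):
        self.lease_seconds = lease_seconds
        self.clock = clock  # injectable so the lease expiry can be simulated
        self.tasks = {}     # task_id -> {"payload": ..., "leased_until": float}

    def put(self, payload):
        """Add a task; it is immediately available to be claimed."""
        task_id = str(uuid.uuid4())
        self.tasks[task_id] = {"payload": payload, "leased_until": float("-inf")}
        return task_id

    def claim(self):
        """Lease the first task whose lease is unset or expired, else return None."""
        now = self.clock()
        for task_id, task in self.tasks.items():
            if task["leased_until"] <= now:
                task["leased_until"] = now + self.lease_seconds
                return task_id, task["payload"]
        return None

    def ack(self, task_id):
        """Remove a task only once the worker reports that it finished."""
        self.tasks.pop(task_id, None)
```

A worker that dies never calls `ack`, so its lease expires and `claim` hands the same task to another worker. A slow but still living worker can cause the task to run twice, which is why I accept that tasks must be idempotent.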
Each node and each process will be both adding things to the queue and taking things off to process.
If these things come set up to assume a specific language for the worker processes, then JavaScript or Python are the preferred ones.
I would prefer it to be relatively lightweight and to require nearly zero administration.
Part of the reason I'm asking is that I think I might be using the wrong terminology to search for this. I found DISCO (http://discoproject.org/). I could probably make DISCO do what I want, but I already have map and reduce covered in CouchDB. I need "go FTP this zip file, uncompress it, and then run it through this Python script" kinds of tasks to be scheduled.
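For illustration, one such task might look roughly like the sketch below. The function name `fetch_unzip_run` and the convention that the script receives the extracted directory as its only argument are my own assumptions, not something any queue requires.

```python
import subprocess
import sys
import tempfile
import urllib.request
import zipfile
from pathlib import Path


def fetch_unzip_run(url, script_path):
    """Download a zip archive, extract it, and run a Python script on the result.

    Hypothetical task body: the script is invoked with the extracted directory
    as its single command-line argument, and its stdout is returned.
    """
    with tempfile.TemporaryDirectory() as workdir:
        archive = Path(workdir) / "download.zip"
        extracted = Path(workdir) / "extracted"

        # urlretrieve accepts ftp://, http:// and file:// URLs.
        urllib.request.urlretrieve(url, archive)

        with zipfile.ZipFile(archive) as zf:
            zf.extractall(extracted)

        # check=True raises CalledProcessError if the script exits non-zero,
        # so a failed run is never mistaken for a finished task.
        result = subprocess.run(
            [sys.executable, str(script_path), str(extracted)],
            capture_output=True,
            text=True,
            check=True,
        )
        return result.stdout
```

Because everything happens in a fresh temporary directory, running the same task twice does no harm as long as the script itself is idempotent.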
Thanks in advance for advice!