Update CouchDB Documents With A JavaScript Map Function
As part of typical database administration duties, you may need to update a large number of documents in a CouchDB database. The new cdbmorph command included in the couchdb-dump npm package lets you supply a JavaScript map function to do this.
cdbmorph lets you write a simple JavaScript function and apply it to all documents in a CouchDB database, or just to those included in a view. The function details are as follows:
function(doc, callback)
Arguments
- doc - CouchDB document (json).
- callback(err, doc) - A callback function used to return the morphed document or an error. If an error is returned, the unchanged document is included in the output stream and the error is printed via console.error. Passing null as the doc argument causes the document to be excluded from the output.
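To make the callback contract concrete, here is a minimal sketch of a morph function that exercises all three outcomes. The `temp` type, the `migrated` field, and the `record` callback are hypothetical names chosen for illustration; `record` only captures what the function passes back and is not part of cdbmorph.

```javascript
// Hypothetical morph function showing the three ways to call back.
function morph(doc, callback) {
  if (doc.type === 'temp') {
    // Passing null as the document asks for it to be left out of the output.
    return callback(null, null);
  }
  if (typeof doc.type !== 'string') {
    // Passing an error: per the description above, the unchanged document
    // is kept in the output and the error is reported.
    return callback(new Error('missing type on ' + doc._id));
  }
  // Normal case: modify the document and hand it back.
  doc.migrated = true;
  callback(null, doc);
}

// Stand-in callback (an assumption for this demo) that records each result.
const results = [];
const record = (err, doc) => results.push({ error: err ? err.message : null, doc });

morph({ _id: 'a', type: 'note' }, record); // morphed document
morph({ _id: 'b', type: 'temp' }, record); // excluded (doc is null)
morph({ _id: 'c' }, record);               // error
console.log(results);
```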
An Example
Now, just to demonstrate how this works, let's say you have a database with just these four documents in it:
{
  docs: [
    {
      "_id": "5a278653b00f54b7dacc6ec33f02cd56",
      "_rev": "1-ffaaac8d7d7d52032103c480e5d46a47",
      "type": "jedi",
      "midichloriancount": 27700,
      "lostlightsabercount": 2,
      "name": "Anakin Skywalker"
    },
    {
      "_id": "5a278653b00f54b7dacc6ec33f02db23",
      "_rev": "1-ab17ef4ced73534d36ba920b6881855e",
      "type": "jedi",
      "midichloriancount": 14500,
      "lostlightsabercount": 2,
      "name": "Luke Skywalker"
    },
    {
      "_id": "5a278653b00f54b7dacc6ec33f02e604",
      "_rev": "1-7d53b826aa3358dc28bb2a6007dd2ccd",
      "type": "dark-side",
      "midichloriancount": 12000,
      "lostlightsabercount": 2,
      "name": "Darth Maul"
    },
    {
      "_id": "5a278653b00f54b7dacc6ec33f02f489",
      "_rev": "1-25825e5578d05fbd55289e339b3f475f",
      "type": "jedi",
      "midichloriancount": 17700,
      "lostlightsabercount": 1,
      "name": "Yoda"
    }
  ]
}
Now let's say you need to update these documents using this function:
function(doc, cb){
  if(doc.type && doc.type === 'dark-side'){
    doc._deleted = true;
    return cb(null, doc);
  }
  if(doc.name && doc.name === 'Yoda'){
    return cb(); //This document will be omitted from the output
  }
  if(doc.type && doc.type === 'jedi'){
    doc.jedirank = (doc.midichloriancount * 0.10) - (doc.lostlightsabercount * 5);
    return cb(null, doc);
  }
  //its some other kind of document like a design doc perhaps so just pass it through
  cb(null, doc);
}
Before we look at how to apply this function to all the documents in the CouchDB database, let's break the process down into three steps so it's easy to understand how it works.
Breaking down how it works
First, let's dump the documents in the database out to a file:
cdbdump -k -s 2 -d starwarsdb > starwars.json
If you open starwars.json with a text editor, it will look similar to the document list above.
Now let's apply our function, which we saved in a file called update.js, to these documents:
cat starwars.json | cdbmorph -s 2 -f ./update.js > edited-starwars.json
This reads the starwars.json file and feeds each document in it through our function. The output written to edited-starwars.json is similar to starwars.json but it contains the documents as modified by our function.
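To see what this step produces for the sample data, the sketch below imitates it in plain Node.js. `morphAll` is a hypothetical stand-in written from the behaviour described in this article, not cdbmorph's actual code, and it only handles functions that call back synchronously. The sample documents are trimmed to the fields the function reads.

```javascript
// The update function from this article.
function update(doc, cb) {
  if (doc.type && doc.type === 'dark-side') {
    doc._deleted = true;
    return cb(null, doc);
  }
  if (doc.name && doc.name === 'Yoda') {
    return cb(); // omitted from the output
  }
  if (doc.type && doc.type === 'jedi') {
    doc.jedirank = (doc.midichloriancount * 0.10) - (doc.lostlightsabercount * 5);
    return cb(null, doc);
  }
  cb(null, doc);
}

// Hypothetical stand-in for cdbmorph: applies fn to every entry of dump.docs.
function morphAll(dump, fn) {
  const docs = [];
  for (const original of dump.docs) {
    // Work on a copy so the unchanged document is still available on error.
    const copy = JSON.parse(JSON.stringify(original));
    fn(copy, (err, morphed) => {
      if (err) {
        console.error(err);
        docs.push(original); // error: keep the unchanged document
      } else if (morphed != null) {
        docs.push(morphed); // null or undefined: leave the document out
      }
    });
  }
  return { docs };
}

const dump = {
  docs: [
    { type: 'jedi', midichloriancount: 27700, lostlightsabercount: 2, name: 'Anakin Skywalker' },
    { type: 'jedi', midichloriancount: 14500, lostlightsabercount: 2, name: 'Luke Skywalker' },
    { type: 'dark-side', midichloriancount: 12000, lostlightsabercount: 2, name: 'Darth Maul' },
    { type: 'jedi', midichloriancount: 17700, lostlightsabercount: 1, name: 'Yoda' },
  ],
};

const edited = morphAll(dump, update);
console.log(JSON.stringify(edited, null, 2));
```

With the four sample documents, the result holds three: the two Skywalkers gain a jedirank field, Darth Maul is flagged with _deleted, and Yoda is absent.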
Now let's update the database with these modified documents:
cat edited-starwars.json | cdbload -d starwarsdb
Assuming that the documents in the database have not been modified by some other process while we have been working, they should be updated successfully with the changes applied by our function. Otherwise, we will see CouchDB update conflict messages.
Putting it all together
Now let's look at how to do it in a single step. Since each command reads from an inbound stream and writes its output to a stream, we can chain them all together like this:
cdbdump -k -d starwarsdb | cdbmorph -f ./update.js | cdbload -d starwarsdb
This string of commands reads the documents out of the starwarsdb database as a JSON object containing a docs array. That is piped into the cdbmorph command which applies our function to each document and writes the output again in the form of a JSON object containing a `docs` array. Finally, that output is piped back into the starwarsdb database via the cdbload command which updates the database with the new documents.
Keep in mind that this process respects rev values as normal for CouchDB document updates. If other processes are modifying the same documents, you will get conflict errors while running cdbload.
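The rev check behind those conflicts can be illustrated with a small sketch: CouchDB accepts an update only when the `_rev` you send matches the document's current revision. The helper `findStaleDocs`, the `currentRevs` map, and the `2-…` revision value below are all made up for this example; how you would obtain the current revisions is left open.

```javascript
// Hypothetical helper: list the _ids whose dumped _rev is no longer current.
// `currentRevs` maps _id -> latest _rev known to the database.
function findStaleDocs(dumpedDocs, currentRevs) {
  return dumpedDocs
    .filter((doc) => currentRevs.has(doc._id) && currentRevs.get(doc._id) !== doc._rev)
    .map((doc) => doc._id);
}

// Two of the sample documents as they were dumped.
const dumped = [
  { _id: '5a278653b00f54b7dacc6ec33f02db23', _rev: '1-ab17ef4ced73534d36ba920b6881855e' },
  { _id: '5a278653b00f54b7dacc6ec33f02f489', _rev: '1-25825e5578d05fbd55289e339b3f475f' },
];

// Pretend another process has updated the second document since the dump.
const currentRevs = new Map([
  ['5a278653b00f54b7dacc6ec33f02db23', '1-ab17ef4ced73534d36ba920b6881855e'],
  ['5a278653b00f54b7dacc6ec33f02f489', '2-00000000000000000000000000000000'], // made-up newer rev
]);

const stale = findStaleDocs(dumped, currentRevs);
console.log(stale); // the second document would be rejected with a conflict
```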
For more info, see the couchdb-dump webpage at http://raffi-minassian.github.io/couchdb-dump/.
For full usage information describing all of the options available on the cdbdump, cdbmorph, and cdbload commands, visit the README at the project page on GitHub and maybe give it a star while you are there.
Comments, corrections, improvements, and criticisms of this article via Twitter are appreciated.