Pre-allocating IDs for App Engine datastore
App Engine SDK 1.2.5 is out, and it includes an unheralded feature that makes datastore backups and restores much easier. The feature didn't warrant a bullet in the revision history or the Python SDK release notes.
I present to you the glorious
db.allocate_ids(model_instance_or_key, number_of_ids)
You may use this method to pre-allocate blocks of IDs. It returns a tuple giving you (start_id, end_id).
Why would you care? Before 1.2.5, you could use remote_api, loop through every entity in your datastore, and serialize the data to a backup file using pickle. It's a simple piece of code. The problem occurs when saving entities with numeric ids instead of a string key name you provide.
The id values are generated by the datastore. They increase over time using a counter tied to each parent entity. If you reloaded the pickled entities into a new datastore, there was a chance the new datastore could generate an id that clashed with one of your restored entities. With the new allocate_ids() method, you can manually inflate the counters to guarantee future entities won't be assigned a conflicting id.
To see allocate_ids() in action, try the following on some Foo(db.Model):
for i in xrange(0, 10):
    foo_key = Foo().put()
    logging.info("Foo put with id = %d", foo_key.id())
allocated_range = db.allocate_ids(foo_key, 2000)
for i in xrange(0, 10):
    foo_key = Foo().put()
    logging.info("Foo put with id = %d", foo_key.id())
For a pristine datastore, you'll probably see your returned ids go from 1 to 10, then from 2011 to 2020. All the ids from 11 to 2010 will be available for inserting entities.
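The arithmetic behind those numbers can be checked without a datastore. The following is a toy model, not the datastore's actual implementation: ToyIdCounter is a hypothetical class that assumes a single counter starting at 1 and handing out consecutive ids, which matches the pristine-datastore case described above.

```python
class ToyIdCounter(object):
    """Hypothetical model of a single per-parent id counter."""

    def __init__(self):
        self.next_id = 1

    def put(self):
        """Pretend to save a new entity and return the id it was assigned."""
        assigned = self.next_id
        self.next_id += 1
        return assigned

    def allocate_ids(self, count):
        """Reserve `count` ids and return the inclusive (start_id, end_id)."""
        start = self.next_id
        self.next_id += count
        return (start, self.next_id - 1)


counter = ToyIdCounter()
first_batch = [counter.put() for _ in range(10)]    # ids 1 through 10
allocated_range = counter.allocate_ids(2000)        # (11, 2010)
second_batch = [counter.put() for _ in range(10)]   # ids 2011 through 2020
```

In this model the reserved block of 2000 ids starts right after the tenth put, so the counter resumes at 2011.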
1 Comment
Re: Article by Ryan Barrett (2009-09-10)
thanks for the post! we're excited about this feature too, both in python as db.allocate_ids() and in java as DatastoreService.allocateIds().
in case anyone's curious, the only reason we didn't announce this in 1.2.5 is that it's missing one final change to be fully usable in python. specifically, you can't yet set an explicit id on a new entity in python. we plan to add a new 'key' constructor parameter to db.Model and db.Expando in 1.2.6 that lets you do exactly that, at which point we'll officially document the feature as a whole.
happily, that's the only missing piece. java already supports setting an explicit id on a new entity, as does the (undocumented) lower level datastore.py interface in python. pre-allocated ids can also be manually populated in CSV data that you plan to bulk upload, as you suggest.