Shelving Objects - Page 15
June 7, 2001
This is a somewhat advanced topic, but certainly not a difficult
one. This section is likely of most interest to people whose work
involves storing or accessing pieces of data in large files,
because the Python shelve module does exactly that—
it permits the reading or writing of pieces of data in large
files, without reading or writing the entire file. For
applications which perform many accesses of large files (such as
database applications), the savings in time can be spectacular.
Like the cPickle module (which it makes use of), the
shelve module is very simple.
Let's explore it through an address book. This is the sort of
thing that is usually small enough so that an entire address file
could be read in when the application is started, and written out
when the application is done. If you're an extremely friendly
sort of person, and your address book is too big for this, better
to use shelve and not worry about it.
We'll assume that each entry in our address book consists of a
tuple of three elements, giving the first name, phone number, and
address of a person. Each entry will be indexed by the last name
of the person the entry refers to. This is so simple that our
application will just be an interactive session with the Python
shell.
First, import the shelve module, and open the
address book. shelve.open will create the address
book file if it does not exist:
>>> import shelve
>>> book = shelve.open("addresses")
Now, add a couple of entries. Notice that we're treating the
object returned by shelve.open as a dictionary
(though it is a dictionary which can only use strings as keys):
>>> book['flintstone'] = ('fred', '555-1234', '1233 Bedrock Place')
>>> book['rubble'] = ('barney', '555-4321', '1235 Bedrock Place')
Finally, close the file and end the session:
>>> book. close()
So what? Well, in that same directory, start Python again, and open the same address
book:
>>> import shelve
>>> book = shelve.open("addresses")
But now, instead of entering something, let's see if what we put in before is still
around:
>>> book['flintstone']
('fred', '555-1234', '1233 Bedrock Place')
The "addresses" file created by shelve.open in the
first interactive session has acted just like a persistent
dictionary. The data we entered before was stored to disk, even
though we did no explicit disk writes. That's exactly what
shelve does.
More generally, shelve.open returns a shelf object
which permits basic dictionary operations, key assignment or
lookup, del, and the has_ key and
keys methods. However, unlike a normal dictionary,
shelf objects store their data on disk, not in memory.
Unfortunately, shelf objects do have one significant restriction
as compared to dictionaries. They can only use strings as keys,
versus the wide range of key types allowable in dictionaries.
It's important to understand the advantage shelf objects give you
over dictionaries when dealing with large data sets.
shelve. open makes the file accessible; it does not
read an entire shelf object file into memory. File accesses are
done only when needed, typically when an element is looked up,
and the file structure is maintained in such a manner that
lookups are very fast. Even if your data file is really large,
only a couple of disk accesses will be required to locate the
desired object in the file. This can improve your program in a
number of ways. It may start faster, since it does not need to
read a potentially large file into memory. It may execute faster,
since there is more memory available to the rest of the program,
and thus less code will need to be swapped out into virtual
memory. You can operate on data sets that are otherwise too large
to fit in memory at all.
There are a few restrictions when using the shelve
module. As previously mentioned, shelf object keys can only be
strings; however, any Python object that can be pickled can be
stored under a key in a shelf object. Also, shelf objects are not
really suitable for multiuser databases, because they provide no
control for concurrent access.
Finally, make sure to close a shelf object when you are
done— this is sometimes required in order for changes you've made
(entries or deletions) to be written back to disk.
As written, the cache example of the previous section would be an
excellent candidate to be handled using shelves. You would not,
for example, have to rely on the user to explic-itly save his
work to the disk. The only possible issue is that it would not
have the low-level control when you write back to the file.
Summary
File input and output in Python is a remarkably simple but
powerful feature of the lan-guage. You can use various built-in
functions to open, read, write, and close files. For very simple
uses, you'll probably want to stick with reading and writing
text, but the struct module does give you the
ability to read or write packed binary data. Even better, the
cPickle and shelve modules provide
simple, safe, and powerful ways of saving and accessing
arbitrarily complex Python data structures, which means you may
never again need to worry about defining file formats for your
programs.
Pickling the Cache - Page 14
The Quick Python Book
|