Climbing the BTree
December 18, 2000
A BTree does not grow bees. Thankfully. We mentioned earlier the
DB_File module, which is another DBM that supports the
BerkeleyDB version 1. Using the same syntax seen above for tieing a
hash to SDBM, we could use DB_File in its normal hash mode. But
DB_File also supports another data architecture for your hash -- the
BTree.
A crucial difference between hashes and lists in Perl is that
lists are ordered. You can predict the fourth item in a list, because
it is always the fourth item (unless you move it or change the list).
But the keys of a hash are not ordered -- you can't predict the
fourth key in a hash. You can only reference hash values by their
associated key, regardless of order to other keys.
Enter the BTree -- an ordered hash! When you tie a hash to a
DB_File BTree, the hash not only preserves keys and values, but order
as well. By default, when you add a new key to such a tied hash, it
is placed in lexical (alphabetical) order relative to the other keys.
You can define your own alternative sorting routing for the order of
the BTree, but we won't get into that here. The DB_File
documentation does, though.
So, let's create our %car hash in a DB_File Btree:
use DB_File;
use Fcntl;
my %car=();
tie (%car, "DB_File", "car_data",
O_CREAT|O_RDWR, 0666, $DB_File::DB_BTREE) ||
die "Could not open or create database.";
%car = ( 'make' => 'Nissan',
'model' => 'Maxima',
'year' => '1997',
'color' => 'evergreen'
);
untie (%car);
What does it mean now that %car is stored as a BTree? For
one thing, BTrees are very fast to search, so when you ask for
$car{year} DB_File will be able to pull that value very quickly.
This is especially useful when your hash is very large, with
thousands of keys. Not quite as dramatic when there are only four
keys!
Beyond that, this hash has order. Specifically, the key
color is going to be the first key in the hash, followed by
make, model, and year -- because that's the lexical order
of these keys. DB_File maintains a "cursor" at a given position in
the hash -- this is a hypothetical pointer that says "this is where I
am in the hash". Much like a cursor in a text document. You may never
need to use this cursor, but doing so allows for some funky actions
beyond the operations that normal Perl hashes are restricted to.
For example, let's say we want to find the first key that begins
with the letter 'm', and the next key after that. First, the code,
assuming the previous example has already run and the contents of
%hash are already stored on disk as a BTree:
use DB_File;
use Fcntl;
my %car=();
my $dbm = tie (%car, "DB_File", "car_data",
O_RDONLY, 0, $DB_File::DB_BTREE) ||
die "Could not open or create database.";
#Set cursor to first key that matches
my $matchKey="m";
my $matchValue=0;
my @matchedKeys=();
$dbm->seq ($matchKey, $matchValue, R_CURSOR);
push (@matchedKeys, $matchKey);
#Move cursor to next key
$dbm->seq ($matchKey, $matchValue, R_NEXT);
#Release tied hash
undef $dbm;
untie (%car);
Several new concepts are introduced in this example. Because we're
dealing with functions that normal Perl hashes don't support, we need
to rely on functions provided by DB_File itself. When the hash is
tied, we assign it to an object handle $dbm, through which
we'll call DB_File methods.
One DB_File method is illustrated here, seq(). Provide
seq() with a scalar variable for the key and the value, along
with a keyword that tells the method what to do. Once "it" is done,
the current key and value will be assigned to the scalar variables
provided.
In the first call to seq(), we specify the keyword
R_CURSOR. This tells DB_File to set the cursor at the first key that
matches $matchKey. Since we only provided a key of "m",
DB_File will set the cursor at the key "make", the first and closest
match. Next, the key "make" and its value, "Nissan", will be assigned
to $matchKey and $matchValue respectively.
With the cursor in position, we call seq() again, this time
with the R_NEXT keyword. Consequently, DB_File moves the cursor to
the next key in order ("model") and fills the two variables with the
relevant data ("model" and "Maxima"). Of course, we could have placed
the seq( ... R_NEXT) calls inside a loop to traverse many keys
of the hash.
Once complete, we must set the object handle $dbm to undef
along with untieing the hash.
Tie a Yellow Ribbon ...
The Perl You Need to Know
Getting Deep with Hashes
|