The Personalization Database
October 23, 2000
Long-term memory, courtesy of our personalization database, is really the
backbone of this architecture. We need to design a relational database that is
well suited to the type of information we want to store in each user's
"account". We speak of the word "account" to represent a
vague box of stuff that is assigned to a particular user. The "stuff"
may reside in one, two, or more database tables, depending on the nature and
needs of the system.
We'll divide the pieces of a user's account into semantically distinct tables.
That is, segregating data required for login from data about the user, data
representing user preferences, and so on. Let's imagine a small scale
personalization system containing two database tables. Remember that we're
using MySQL syntax in these examples, and column definition types may vary for
other database systems.
user_info
CREATE TABLE user_info (
userid mediumint(8) unsigned NOT NULL auto_increment,
login blob DEFAULT '' NOT NULL,
pwd blob DEFAULT '' NOT NULL,
name blob DEFAULT '' NOT NULL,
created timestamp(14),
PRIMARY KEY (userid)
);
The user_info table will contain the vital account information for the
user. Each user is assigned an integer-based unique ID, configured to simply
auto increment each time a user record is inserted into the table. The maximum
unsigned integer for this type is 16777215, so this table is limited to
supporting some 16.7 million potential users.
The login name (login), password (pwd), and display name (name) fields are all
binary objects, or blobs. Our system will recommend that users select their e-
mail address as their login name, while their on-screen name for display will
be stored in the name column. All three columns are blobs because we'll be
encrypting this information in such a way that yields binary data. More on this
in a moment.
Finally, a created column will contain a timestamp when the user record
is created, in case we ever want to know when a user joined the system. The
userid is indexed as a primary key, and will be a unique identifier with
which we relate to records in other tables within the personalization database.
Encryption of sensitive table data is especially important, in case a hacker
gains access to the database. With account information encrypted, a stolen
database table may not do much good to the spy. That said, encryption is a
sprawling topic, and there are many levels of encryption of varying degrees of
sophistication. Like an automobile, almost any type of security can be
compromised by a determined vandal. At the least, a reasonable level of
encryption will deter the "joy rider" who will move on to easier
targets.
MySQL in particular offers
several encryption functions worth looking into. In this case, we're using
MySQL's encode() and decode() functions, which will encrypt
plaintext data based on a known password. For example, consider the user's
password. One possible algorithm might be to encrypt their password using
encode() and supplying an encryption password based on a calculation of
the user's password plus a known value:
encrypt('mypassword','drowssapym#32-{sAP7!=}')
Above, we know that we can decrypt the user's password using a password based
partially on the user's own input (the attempted password in reverse) combined
with a known value (the string "#32-{sAP7!=}"). If using this
approach, it would be a good idea to devise a different sequence for each
column that you want to encrypt: in this case, the login, pwd,
and name columns.
So, to wrap up user_info, let's consider a scenario. A visitor attempts to
login to our site, submitting the login "usergal@planet.com" and the password
"farout". We've written Perl code which attempts to pull their user record from
the database, which might look like:
#!/usr/bin/perl
#Attempt user login via user_info table in database
use CGI;
my $cgiobj=new CGI;
my $login=$cgiobj->param(login);
my $password=$cgiobj->param(password);
my $passwordR=reverse(split //,$password);
my ($userid,$userlogin,$username)=&user_login($login,$password);
if ($userid) {
...login successful...
}
else { ...login failed...
}
sub user_login {
#assume this subroutine connects to the database #and returns a database handle
my $dbh=&connect_to_DB();
my $sqlquery=
qq /select userid,login,name from user_info where
decode(login,'clever_login_decryption_password')='$login' and
decode(pwd,'${passwordR}#32-{sAP7!=}')='$password'/;
my $sth=$dbh->prepare($sqlquery);
$sth->execute
|| die "Failed to access DB in search of account ".$dbh->errstr;
return $sth->fetchrow_array;
}
Accepting as CGI input the parameters login and password, the
above code constructs an SQL query that attempts to pull this user's record
from user_info. The key is the query, which compares the results of a
decode() function on the column values with the values submitted by the
user. If the query succeeds, an array of column values is returned, otherwise
nothing is returned. We test this by evaluating the presence of a value in
$userid, and from there we know if the login was successful.
If the login was successful, we have also acquired some important account
information for this user into $userid, $userlogin, and
$username.
user_prefs
Account information in hand, we can proceed. Proceed with what? The goal in
this example is to harvest the data that we want to store in our short-term
memory, the cookie. We may not need all information from the database in our
cookie, which is why, for instance, we only requested three fields in our
earlier query. But we're not yet done collecting data from the database. Now
that we've verified the user account, we want to grab some of the data from our
second table user_prefs. Here we store some of the preferences that
affect how the site appears or behaves for this user. The possibilities are
nearly endless, but let's imagine a user_prefs table with some realistic
preferences:
CREATE TABLE user_prefs ( userid mediumint(8) unsigned DEFAULT '0' NOT NULL, matchtype enum('simple','advanced') DEFAULT 'simple' NOT NULL, results_per_page tinyint(3) unsigned DEFAULT '10' NOT NULL, match_color varchar(6) DEFAULT 'FFFFCC' NOT NULL, PRIMARY KEY (userid) );
Perhaps our web site is or contains a search of some sort. The preferences
above would seem to fit into such a service. The first column, userid,
keys these records to the account records from user_info. The column
matchtype is an enumerated set -- meaning it can contain one of several
possible string values -- in this case our two fictional types of search. The
preferred number of results to display on one page is contained in
results_per_page, naturally, and we also specify a hexadecimal color
code for use in highlighting result matches. Again, these are hypothetical
preferences, and you can easily imagine an extensive set (see Raging Search's customization system for
an example).
Now that we've looked at the database behind the personalization, our long-term
memory as it were, it's time to see how we interact with client-side cookies to
implement a short-term memory.
Databases and Cookies
The Perl You Need to Know
Baking with Julia
|