
Some Considerations for
Assigning a Primary Key in a Database Table
Preamble
Database
developers must decide on how they will allocate primary keys in their database
tables.
There are
many developers who have strong beliefs about this or that way to do it. In particular,
concerning the usage of artificial or surrogate fields.
There simply is no consensus on
that question, and a definitive statement on it is therefore not possible. I will certainly reveal my own
preferences, but it is not my primary intention here to argue the relative
merits of natural versus surrogate keys.
Instead,
the focus here is on attempting to clarify the questions that the developer
needs to take into account, and to dispel some of the confusion that I often
see around this subject.
Bottom Line
In
general, each table in a relational database should have a primary key
assigned. This is a basic database
design principle.
The
primary key is made up of one or more fields in the table. The value of the
data entered into the primary key must be unique across all records. The primary key serves as the basis of
Relationships with other tables, often as the basis of joins in queries, and other
operational functions within the application.
Options
The developer is free to use whatever they want
as the primary key.
The purpose of a primary key in a table is to
ensure that each record is uniquely identified. If there is already a field, or
combination of fields, in the table whose data uniquely identifies each record,
then there is absolutely no reason why this should not serve as the primary key
for the table. This is called a
“natural key”. So, this
is one option.
Another option is
adding a field to the table, to contain meaningless record identification data,
and assign this as the primary key.
Access provides the AutoNumber data type, which lends itself to this
purpose, or otherwise there are various methods of using procedures within the
application to generate ID numbers etc.
In my opinion, both the above are valid and
useful approaches when used appropriately. Which approach is appropriate under
various circumstances is a judgement call that becomes easier with experience.
Normalisation
Simply stated, using an AutoNumber primary key
violates normalisation rules when there is a valid natural key available in the
data.
The
choice of an AutoNumber or other artificial data for use as primary key is
based on other considerations, and really has nothing to do with normalisation,
or data integrity, or any data modelling principles.
Characteristics of Candidate Data
It is
apparent that in almost all tables you would find a field or combination of
fields whose values are unique across all rows. It is also apparent that a primary key
should be natural wherever possible.
In practice, however, many developers resort to the use of surrogate
keys. Why is this?
It
generally comes down to a consideration of the characterisitcs of any natural
data which could be a candidate for primary key status. These are some of the factors:
These factors define a continuum, and most
developers eventually get themselves comfortable somewhere along that
continuum, at the point that defines where they draw the line between using
natural or surrogate primary keys.
Some take the view that only if
the data itself presents no genuine primary key, should one resort to a
surrogate key (either an AutoNumber or a derived sequence number).
At the other end of the spectrum are those who
routinely put an AutoNumber field in almost every table... though probably most
have a point at which they make an exception.
For example, in the case of simple lookup tables,
one field only, by definition unique values in that field, the purpose of which
is to simply provide the Row Source for comboboxes in order to expedite and
validate data entry into a core table, typical examples being Categories or
Colours – in those cases I expect most would make an exception.
I once saw a database that included a table of
US states. In this case, the usual
two fields – the name of the state, and the two-letter state abbreviation
code. The data in both of these
fields are unique, of course, as well as being simple, short, and stable. The state abbreviation, in other words,
meets all the criteria for making an ideal primary key. But no, in this example there was also
an AutoNumber field StateID defined as the primary key! Quite bizarre.
Convenience and Performance
In many
tables there is a single field that contains simple unique data. Membership
numbers, registration numbers, serial numbers, product codes, the list is
endless. This also applies to the
example of lookup tables previously mentioned.
In such
cases, there are many advantages to using these fields as the primary keys, and
surrogate AutoNumbers are completely unnecessary. Queries become simpler, sometimes hugely
so, and therefore more efficient, as well as reducing development time. Commonly used UI approaches such as
cascading comboboxes become simpler.
Data becomes more comprehensible.
And so on.
Developers
who use AutoNumber fields even in these circumstances have said that they
don’t have to stop and think about it on a table by table basis. Just whack in an AutoNumber and you are
assured you have a unique identifier.
I have seen it claimed that this makes the development process faster
because it is convenient. Well,
that depends how quickly you think, I suppose.
The only
real downside is that the database may be bigger, because stored foreign key
values are likely to be bigger with natural keys. Maybe in a very large database this may
need to be taken into account.
But the
balance starts to shift if the data is likely to be subject to frequent change,
for example. Access provides for
this to be easily handled, via enforcing Cascade Updates when defining
Relationships with Referential Integrity.
So the occasional renaming of a category or some such will be fine, but
some developers might feel that data instability is an indication for a
surrogate key.
Similarly,
it seems more manageable and efficient if the primary key data is short, and does
not contain special characters. The
full botanical names for plants, for example, might feel a bit complex to be
using as the basis of joins between tables, and some developers might feel that
this would be an indication for a surrogate key. The operative word here is “seems”
– the distinction is more psychological than structural.
The real
cut-off point for many developers, however, comes when the natural data will
only be unique on the basis of a combination of fields. In practice, this scenario often
occurs. A natural key would involve
a composite (multi-field) primary key.
Which means multiple joins between tables in queries, some data
manipulation code becomes more complex (sometimes considerably so), more
complex query criteria are sometimes required, etc. In other words, the balance of advantage
shifts towards the use of a surrogate key such as an AutoNumber field.
Logic Lapses
One can
often see a recommendation to put an AutoNumber field in a table,
that seems to be based on something like this type of reasoning:
An AutoNumber
is essentially meaningless data that therefore should not be exposed to the
user.
If we had
an AutoNumber, it would be the primary key.
Therefore
the primary key should not be exposed to the user.
There is
already a field in the table that uniquely identifies each record.
But this
data has meaning, and thus must be exposed to the user.
Therefore
it shouldn’t be the primary key.
Therefore
we need an AutoNumber field to be the primary key.
Suffice
it to say that this is not an example of sound logic!
Here’s
another one, somewhat related:
A primary
key’s purpose is to be a unique idenifier.
Therefore
its data can’t have any meaning.
An
AutoNumber has no meaning.
Therefore
a primary key must be an AutoNumber.
Again,
obviously the logic doesn’t hold up.
There is absolutely no reason why a field with meaningful data
can’t be the primary key.
Tricks and Traps
1. Of
course, in order to use natural data as primary key, you have to be certain
that the data will be unique and universal. That means stop and think, as
occasionally this is not as easy as it first appears. The classic example is developers in the
2. There
are a number of ways of providing a computer-generated number to uniquely
identify records. Using the
AutoNumber data type is one method, but there are other approaches that are
commonly employed. Sometimes these
numbers end up having a meaning within the business rules of the application. Invoice numbers and order numbers are
common examples. If the user cares
that there may sometimes be gaps in the numbering, AutoNumbers are not suitable
in such instances.
3. If an
AutoNumber or other artificial primary key field is introduced, because the data
requires a combination of natural fields to ensure uniqueness, there will still
be a requirement for the data in that combination of existing fields to be
unique. In many cases, therefore,
it may be advisable or necessary to set a unique index on that composite of
fields, and/or write code to validate this data.
Conclusion
There are
a number of factors to consider when assigning a primary key to a database
table.
There is
no absolute right or wrong, as evidenced by the fact that no matter which
approach you choose, you can find experienced, professional Access developers
who do the same.
Taking
into account all the considerations, the following is what I presently consider
to be the most rational and manageable approach:

Steve Schapel
![]()