So I've always been told that it's absolutely necessary to have a primary key specified with a table. I've been doing some work and ran into a situation where a primary key's unique constraint would stop data I need from being added.
Take an example table structured with these fields:
Age, First Name, Last Name, Country, Race, Gender
With a TON of data being entered, these fields don't necessarily identify a row uniquely, and I don't need an index across all columns anyway. Would the only solution here be to add an auto-incrementing ID field? Would it be okay NOT to have a primary key at all?
It's not always necessary to have a primary key; most DBMSs will allow you to construct a table without one (a).
But that doesn't necessarily mean it's a good idea. Have a think about the situation in which you want to use that data. Now think about if you have two twenty-year-old Australian men named Bob Smith, both from Perth.
Without a unique constraint, you can put both rows into the table, but here's the rub: how would you figure out which one you want to use in future? (b)
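As a rough illustration of that problem, here is a minimal sketch using Python's built-in sqlite3 module. The table name `person`, the column names and the sample values are assumptions made up for the example, not something from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical table with no primary key and no unique constraint.
conn.execute("""
    CREATE TABLE person (
        age INTEGER, first_name TEXT, last_name TEXT,
        country TEXT, race TEXT, gender TEXT
    )
""")

# Two different people who happen to share every attribute (made-up values).
bob = (20, "Bob", "Smith", "Australia", "Caucasian", "Male")
conn.execute("INSERT INTO person VALUES (?, ?, ?, ?, ?, ?)", bob)
conn.execute("INSERT INTO person VALUES (?, ?, ?, ?, ?, ?)", bob)

rows = conn.execute(
    "SELECT * FROM person WHERE first_name = 'Bob' AND last_name = 'Smith'"
).fetchall()

duplicate_count = len(rows)   # both rows were accepted
distinct_rows = set(rows)     # but no declared column tells them apart
```

Both inserts succeed, yet the two rows compare equal on every column you declared. (SQLite happens to keep a hidden rowid, but relying on that is an engine-specific detail rather than part of your design.)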
Now, if you just want to store the fact that there are one or more people meeting those criteria, you only need to store one row. But then, you'd probably have a composite primary key consisting of all columns.
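A small sketch of that "one row per combination" idea, again with Python's sqlite3 module. The table name `person_profile` and the sample values are hypothetical, and `INSERT OR IGNORE` is SQLite-specific syntax; other DBMSs spell this differently.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Composite primary key across every column: each combination is stored once.
conn.execute("""
    CREATE TABLE person_profile (
        age INTEGER, first_name TEXT, last_name TEXT,
        country TEXT, race TEXT, gender TEXT,
        PRIMARY KEY (age, first_name, last_name, country, race, gender)
    )
""")

bob = (20, "Bob", "Smith", "Australia", "Caucasian", "Male")

# Recording the same combination twice leaves a single row behind.
for _ in range(2):
    conn.execute(
        "INSERT OR IGNORE INTO person_profile VALUES (?, ?, ?, ?, ?, ?)", bob
    )

row_count = conn.execute("SELECT COUNT(*) FROM person_profile").fetchone()[0]
```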
If you have other information you want to store about the person (e.g., highest score in the "2048" game on their iPhone), then you don't want a primary key across the entire row, just across the columns you mention.
Unfortunately, that means there will undoubtedly come a time when both of those Bob Smiths try to write their high score to the database, only for one of them to lose their information.
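To make the collision concrete, here is a hedged sketch with Python's sqlite3 module. The `high_score` column, the table name and the values are assumptions for illustration only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Primary key across the descriptive columns only; high_score is extra data.
conn.execute("""
    CREATE TABLE person (
        age INTEGER, first_name TEXT, last_name TEXT,
        country TEXT, race TEXT, gender TEXT,
        high_score INTEGER,
        PRIMARY KEY (age, first_name, last_name, country, race, gender)
    )
""")

insert = "INSERT INTO person VALUES (?, ?, ?, ?, ?, ?, ?)"
bob = (20, "Bob", "Smith", "Australia", "Caucasian", "Male")

conn.execute(insert, bob + (2048,))      # first Bob Smith saves his score

try:
    conn.execute(insert, bob + (4096,))  # second Bob Smith, same key values
    second_insert_rejected = False
except sqlite3.IntegrityError:
    second_insert_rejected = True

stored_scores = [row[0] for row in conn.execute("SELECT high_score FROM person")]
```

Here the second insert is rejected outright. If the application used an overwrite-style statement instead, the first Bob's score would be the one to disappear; either way only one score survives.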
If you want them both in the table and still want to allow for the possibility outlined above (two people with identical attributes in the columns you mention), then the best bet is to introduce an artificial key, such as an auto-incrementing column, as the primary key. That will allow you to uniquely identify a row regardless of how identical the other columns are.
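A minimal sketch of the artificial-key approach, using Python's sqlite3 module. The `id` column name, the `high_score` column and the sample values are assumptions; `AUTOINCREMENT` is the SQLite spelling, and other DBMSs have their own equivalent.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Artificial (surrogate) key: the database hands out a fresh id per row.
conn.execute("""
    CREATE TABLE person (
        id INTEGER PRIMARY KEY AUTOINCREMENT,
        age INTEGER, first_name TEXT, last_name TEXT,
        country TEXT, race TEXT, gender TEXT,
        high_score INTEGER
    )
""")

insert = """
    INSERT INTO person
        (age, first_name, last_name, country, race, gender, high_score)
    VALUES (?, ?, ?, ?, ?, ?, ?)
"""
bob = (20, "Bob", "Smith", "Australia", "Caucasian", "Male")

# Both identical-looking people are accepted and get distinct ids.
first_id = conn.execute(insert, bob + (2048,)).lastrowid
second_id = conn.execute(insert, bob + (4096,)).lastrowid

# Updating one Bob by id leaves the other untouched.
conn.execute("UPDATE person SET high_score = ? WHERE id = ?", (8192, second_id))

scores = dict(conn.execute("SELECT id, high_score FROM person"))
```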
The other advantage of an artificial key is that, being arbitrary, it never needs to change for the thing being identified. In your example, if you use age, names, nationality or location (c) in your primary key, these are all subject to change, meaning that you will need to adjust any foreign keys referencing those rows. If the tables referencing these rows use the unchanging artificial key, that will never be a problem.
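The following sketch shows a referencing table surviving a change to the descriptive columns. It uses Python's sqlite3 module; the `score` table, its columns and all values are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when asked

conn.execute("""
    CREATE TABLE person (
        id INTEGER PRIMARY KEY AUTOINCREMENT,
        first_name TEXT, last_name TEXT, country TEXT
    )
""")
# The child table references the artificial key, not the descriptive columns.
conn.execute("""
    CREATE TABLE score (
        person_id INTEGER NOT NULL REFERENCES person(id),
        game TEXT,
        high_score INTEGER
    )
""")

person_id = conn.execute(
    "INSERT INTO person (first_name, last_name, country) VALUES (?, ?, ?)",
    ("Bob", "Smith", "Australia"),
).lastrowid
conn.execute("INSERT INTO score VALUES (?, ?, ?)", (person_id, "2048", 2048))

# Bob moves country: only the person row changes, the score row is untouched.
conn.execute("UPDATE person SET country = ? WHERE id = ?", ("New Zealand", person_id))

joined = conn.execute("""
    SELECT p.country, s.high_score
    FROM score AS s JOIN person AS p ON p.id = s.person_id
""").fetchone()
```

Had `score` referenced the name and country columns instead, the same move would have required updating every referencing row as well.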
(a) There are situations where a primary key doesn't really give you any performance benefit, such as when the table is particularly small (for example, one mapping the integers 1 through 12 to month names).
In other words, things where a full table scan isn't really any slower than indexing. But these situations are incredibly rare and I'd probably still use a key because it's more consistent (especially since the use of a key tends not to make a difference to the performance either way).
(b) Keep in mind that we're talking in terms of practice here rather than theory. While in practice you may create a table with no primary key, relational theory states that each row must be uniquely identifiable, otherwise relations are impossible to maintain.
C.J. Date, who, along with Codd, is one of the progenitors of relational database theory, states the rules of relational tables in "An Introduction to Database Systems", one of which is:
The records have a unique identifier field or field combination called the primary key.
So, in terms of relational theory, each table must have a primary key, even though it's not always required in practice.
(c) Particularly age, which is guaranteed to change annually until you're dead, so date of birth may be a better choice for that column.
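If you do store date of birth, the age can be derived whenever it's needed. A small sketch in Python; the function name `age_on` is made up for the example.

```python
from datetime import date


def age_on(date_of_birth: date, today: date) -> int:
    """Return the age in whole years on the given day."""
    years = today.year - date_of_birth.year
    # Subtract one year if this year's birthday hasn't happened yet.
    if (today.month, today.day) < (date_of_birth.month, date_of_birth.day):
        years -= 1
    return years


day_before_birthday = age_on(date(2000, 6, 15), date(2020, 6, 14))
on_birthday = age_on(date(2000, 6, 15), date(2020, 6, 15))
```

The stored value never goes stale; only the computed age changes as time passes.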