Proper relational database design requires that you follow the formal process called “Normalization”. Following normalization completely is the perfect way to design a “proper” database that will be well suited for data modification. A completely normalized database will suffer for queries however. This article isn’t a substitute for normalization. This is a simple checklist to help assess the design. These are some “must haves”. The reality is that many of us are given a database to work with and we wonder if the design is “good” or “bad”. These won’t quite get you to a normalized database, but they will lead you in that direction and get you closer.
- One topic per table. Employee and dependent information belong in two separate tables. In fact, employee and phone number information belong in two separate tables. Let’s look at an example:
Employee ID LastName FirstName Home Work Cell Fax
We would, of course, have some other columns related to the employee. We might also have some other phone numbers. This design is flawed. If we are in the position where we don’t know how many columns we need to support particular data, we have a problem. In this example, it’s conceivable that I could have an employee that has another cell phone number, a home fax number, a second home number, a main number at work, a main fax at work, etc. Then I would surely have employees that do not have all of those phone numbers. I would see blank values throughout the table for them and be in the situation to add just one more column with each new phone type. The column number is not endless, but it is undefined. Here’s what we should do instead. We will have two tables.
The first table will store the employee information only – EmployeeID, LastName, FirstName, and any other columns directly related to the employee. The second table will look like this.
Employee ID PhoneNumber PhoneType
We might also have a generated key for the Primary Key identifier. The value in the second design is that an employee will have a row for each phone number and we will be able to relate the contact information back to the Employee table. Our table gets deeper (more rows), but it doesn’t get wider (more columns). This gives us a big advantage in our queries because we don’t have to search through a large number of columns. We also don’t have to reserve storage space for empty values when an employee doesn’t have a particular phone type.
- One element per column. We shouldn’t have columns like Name — that include both the first and the last name. This causes problems when we query the database.
- Primary key in every table. Each record in the table should be uniquely identified.
- Foreign keys where appropriate (so you can relate them to the primary key). Foreign keys exist for the purpose of putting related data together. A foreign key is a primary key value from another table. An example is a CustomerID field in the order table. This allows us to see the customer information from the Customer table and the orders associated with that customer from the Order table.
- No derived data. In other words, if you have a Salary field and a Percent Bonus field, then you can calculate the Bonus Amount at the time of the report. This “properly” doesn’t belong in the database. However, if the report takes too long to process, you just might see derived columns in the database.
- No redundant data. If you have stored the Zip Code in a field in the table, you do not also need to store the City and State columns in that table. Consider this – we have an Employee table, Customer table, Vendor table, and Supplier table, and we have stored City, State, and Zip in each of those tables. Now the name of the city changes to a township. We must now update those values in every table. In addition, we have used unnecessary storage for each table. The solution is to have one lookup table with Zip, City, and State and to store Zip Code and not store City or State in the Employee, Customer, Vendor and Supplier tables. This reduces the amount of space needed and also makes our updates easier.
- Use appropriate data types. What’s appropriate? Choose a data type that supports the kind of information you are storing. If you are storing numeric data, choose a numeric data type. Choose data types that will support your current information as well as some future growth. Choose a data type that isn’t too large. If you only need to store the date and will never need the time, then you should choose a data type that supports just the date.