Normalization in databases is a systematic approach of organizing data to reduce redundancy and improve data integrity. The process typically involves dividing a database into two or more tables and defining relationships between them. This ensures that each table contains data related to one concept or theme, known as a “subject”. The main objective of normalization is to minimize duplication and to ensure data dependencies make sense, which simplifies the design and maintains the integrity of the database.
1. First Normal Form (1NF)
1NF is the basic form of database normalization and involves the following key rules:
- Each table cell should contain a single value.
- Each record needs to be unique, often achieved by using a primary key.
For example, if you have a table with multiple values in a single cell, like a list of phone numbers, it should be broken down into individual rows to comply with 1NF.
2. Second Normal Form (2NF)
A table in 2NF must first meet all the criteria of 1NF. In addition, it must meet these additional rules:
- All non-key attributes must be fully functionally dependent on the primary key. This means that each non-key attribute must be a fact about the key, the whole key, and nothing but the key.
If a table contains composite keys (keys composed of more than one column), and some columns are not dependent on the entire composite key but only on part of it, the table is not in 2NF. In such cases, the table is broken down into two or more tables to achieve 2NF.
3. Third Normal Form (3NF)
A table is in 3NF if it satisfies 2NF and the additional condition:
- There should be no transitive dependency for non-prime attributes.
In simpler terms, no non-primary key attribute should depend on another non-primary key attribute. This further eliminates redundant data and ensures that each piece of information is stored only once.
4. Boyce-Codd Normal Form (BCNF)
BCNF is a slight enhancement of 3NF. A table is in BCNF if it is in 3NF and for every functional dependency (X → Y), X is a super key. This means that X should either be a primary key or a part of it. BCNF addresses situations where 3NF does not resolve all redundancy.
A Real-World Example of Database Normalization
In this example, we’ll use a simple library management system to demonstrate database normalization.
Original Unnormalized Table:
Imagine we start with a single table recording book loans in a library:
Loan ID | Member Name | Member ID | Book Titles | Book IDs | Loan Date | Due Date |
---|---|---|---|---|---|---|
1 | John Doe | JD123 | [The Hobbit, 1984] | [B1, B2] | 2022-01-10 | 2022-01-24 |
2 | Jane Smith | JS456 | [To Kill a Mockingbird] | [B3] | 2022-01-12 | 2022-01-26 |
This table is not normalized and has several issues like repeating groups (Book Titles and Book IDs) and storing multiple pieces of information in one row.
First Normal Form (1NF):
To bring this table into 1NF, we need to eliminate repeating groups and ensure each cell contains atomic values:
Loan ID | Member Name | Member ID | Book Title | Book ID | Loan Date | Due Date |
---|---|---|---|---|---|---|
1 | John Doe | JD123 | The Hobbit | B1 | 2022-01-10 | 2022-01-24 |
1 | John Doe | JD123 | 1984 | B2 | 2022-01-10 | 2022-01-24 |
2 | Jane Smith | JS456 | To Kill a Mockingbird | B3 | 2022-01-12 | 2022-01-26 |
Second Normal Form (2NF):
To achieve 2NF, we remove partial dependencies (non-key attributes that don’t depend on the whole primary key). We can split the table into two: one for Loan Details and one for Member Details:
Loan Details Table:
Loan ID | Book Title | Book ID | Loan Date | Due Date |
---|---|---|---|---|
1 | The Hobbit | B1 | 2022-01-10 | 2022-01-24 |
1 | 1984 | B2 | 2022-01-10 | 2022-01-24 |
2 | To Kill a Mockingbird | B3 | 2022-01-12 | 2022-01-26 |
Member Details Table:
Member ID | Member Name |
---|---|
JD123 | John Doe |
JS456 | Jane Smith |
Third Normal Form (3NF):
To achieve 3NF, we remove transitive dependencies. Suppose the books have authors, and this information is included in the Loan Details table. We should move it to a separate Book Details table:
Loan Details Table (Revised):
Loan ID | Book ID | Loan Date | Due Date |
---|---|---|---|
1 | B1 | 2022-01-10 | 2022-01-24 |
1 | B2 | 2022-01-10 | 2022-01-24 |
2 | B3 | 2022-01-12 | 2022-01-26 |
Book Details Table:
Book ID | Book Title | Author |
---|---|---|
B1 | The Hobbit | J.R.R. Tolkien |
B2 | 1984 | George Orwell |
B3 | To Kill a Mockingbird | Harper Lee |
Boyce-Codd Normal Form (BCNF):
For BCNF, we need to ensure that for every functional dependency, the left side is a super key. If the Book ID uniquely identifies each book, then our current design already satisfies BCNF.
This process of normalization helps in reducing redundancy (like having the same member details repeated for every loan) and improving data integrity (by ensuring that each piece of information is stored only once and in one place). Each step from 1NF to BCNF ensures a more efficient and logically structured database, which is crucial in a real-world scenario like a library management system.