Database Normalization (1NF, 2NF, 3NF, BCNF)

Normalization in databases is a systematic approach of organizing data to reduce redundancy and improve data integrity. The process typically involves dividing a database into two or more tables and defining relationships between them. This ensures that each table contains data related to one concept or theme, known as a “subject”. The main objective of normalization is to minimize duplication and to ensure data dependencies make sense, which simplifies the design and maintains the integrity of the database.

1. First Normal Form (1NF)

1NF is the basic form of database normalization and involves the following key rules:

  • Each table cell should contain a single value.
  • Each record needs to be unique, often achieved by using a primary key.

For example, if you have a table with multiple values in a single cell, like a list of phone numbers, it should be broken down into individual rows to comply with 1NF.

2. Second Normal Form (2NF)

A table in 2NF must first meet all the criteria of 1NF. In addition, it must meet these additional rules:

  • All non-key attributes must be fully functionally dependent on the primary key. This means that each non-key attribute must be a fact about the key, the whole key, and nothing but the key.

If a table contains composite keys (keys composed of more than one column), and some columns are not dependent on the entire composite key but only on part of it, the table is not in 2NF. In such cases, the table is broken down into two or more tables to achieve 2NF.

3. Third Normal Form (3NF)

A table is in 3NF if it satisfies 2NF and the additional condition:

  • There should be no transitive dependency for non-prime attributes.

In simpler terms, no non-primary key attribute should depend on another non-primary key attribute. This further eliminates redundant data and ensures that each piece of information is stored only once.

4. Boyce-Codd Normal Form (BCNF)

BCNF is a slight enhancement of 3NF. A table is in BCNF if it is in 3NF and for every functional dependency (X → Y), X is a super key. This means that X should either be a primary key or a part of it. BCNF addresses situations where 3NF does not resolve all redundancy.

A Real-World Example of Database Normalization

In this example, we’ll use a simple library management system to demonstrate database normalization.

Original Unnormalized Table:

Imagine we start with a single table recording book loans in a library:

Loan IDMember NameMember IDBook TitlesBook IDsLoan DateDue Date
1John DoeJD123[The Hobbit, 1984][B1, B2]2022-01-102022-01-24
2Jane SmithJS456[To Kill a Mockingbird][B3]2022-01-122022-01-26

This table is not normalized and has several issues like repeating groups (Book Titles and Book IDs) and storing multiple pieces of information in one row.

First Normal Form (1NF):

To bring this table into 1NF, we need to eliminate repeating groups and ensure each cell contains atomic values:

Loan IDMember NameMember IDBook TitleBook IDLoan DateDue Date
1John DoeJD123The HobbitB12022-01-102022-01-24
1John DoeJD1231984B22022-01-102022-01-24
2Jane SmithJS456To Kill a MockingbirdB32022-01-122022-01-26

Second Normal Form (2NF):

To achieve 2NF, we remove partial dependencies (non-key attributes that don’t depend on the whole primary key). We can split the table into two: one for Loan Details and one for Member Details:

Loan Details Table:

Loan IDBook TitleBook IDLoan DateDue Date
1The HobbitB12022-01-102022-01-24
11984B22022-01-102022-01-24
2To Kill a MockingbirdB32022-01-122022-01-26

Member Details Table:

Member IDMember Name
JD123John Doe
JS456Jane Smith

Third Normal Form (3NF):

To achieve 3NF, we remove transitive dependencies. Suppose the books have authors, and this information is included in the Loan Details table. We should move it to a separate Book Details table:

Loan Details Table (Revised):

Loan IDBook IDLoan DateDue Date
1B12022-01-102022-01-24
1B22022-01-102022-01-24
2B32022-01-122022-01-26

Book Details Table:

Book IDBook TitleAuthor
B1The HobbitJ.R.R. Tolkien
B21984George Orwell
B3To Kill a MockingbirdHarper Lee

Boyce-Codd Normal Form (BCNF):

For BCNF, we need to ensure that for every functional dependency, the left side is a super key. If the Book ID uniquely identifies each book, then our current design already satisfies BCNF.

This process of normalization helps in reducing redundancy (like having the same member details repeated for every loan) and improving data integrity (by ensuring that each piece of information is stored only once and in one place). Each step from 1NF to BCNF ensures a more efficient and logically structured database, which is crucial in a real-world scenario like a library management system.