What is The Difference between 2NF and 3NF?

What is The Difference between 2NF and 3NF?

·

5 min read

Featured.jpeg

What is Normalization?

Normalization in a database is the process of organizing the data to reduce redundancy. The main idea is to segment a larger table into smaller ones and connect them through a relation.

But why should an end-user like you or me be concerned about Data Normalization?

To answer that question, we first need to understand what could happen if our data is not normalized.

Why is Database Normalization important?

Let’s assume a company stores all its data, such as employee details, personal information, etc., in a single table. This data is accessible to end-users such as developers, database admins (DBAs), etc. But what happens when multiple end-users interact with the database at the same time?

For instance, imagine a DBA is updating the database, and during the process, a developer, completely unaware, performs another operation on the database. These users now have a different view of the database altogether.

Such data mismatch when multiple users interact with the database simultaneously can be termed as an anomaly.

This is where Data Normalization steps in. It helps in avoiding data inconsistencies and provides a more organized way to store your data. So, whether you’re a database administrator or just an end-user, you need to be aware of how your data is getting stored and what it means for the company.

Now that we have a clear idea of what can happen without Data Normalization, let’s look at the various normal forms available for Database Management (DBMS).

Normal Forms

First Normal Form (1NF)

A relation is said to be in First Normal Form (1NF) if it does not contain any multi-valued or composite attribute. In other words, every single attribute in a table has to hold an atomic value; otherwise, it defies the rules of the First Normal Form.

Let’s look at an example:

First NameLast NameMovies rented
SamHollandThe Notebook, A walk to remember
JoeDunphyHarry Potter and the Goblet of Fire
HarryWilliamsInterstellar, Inception, Gravity

The last column in the table - ‘Movies rented’ is holding several values. The 1NF version of this table will look like this:

First NameLast NameMovies rented
SamHollandThe Notebook
SamHollandA walk to remember
JoeDunphyHarry Potter and the Goblet of Fire
HarryWilliamsInterstellar
HarryWilliamsInception
HarryWilliamsGravity

Now you may wonder, how does Data Normalization reduce redundancy if the above conversion doubled the number of existing rows?

This is because we have taken a trivial scenario where we considered only a single table. In the next section, we will discuss the second normal form, and it’ll make more sense as to how Data Normalization reduces the overall redundancy.

Second Normal Form (2NF)

A relation is in Second Normal Form (2NF) if:

  1. it is in First Normal Form (1NF) and,
  2. it has no partial dependency

If a non-key attribute can be determined from a proper subset of the candidate key, then the relation is said to have a partial dependency.

Let’s see an example to understand this better -

Subject TaughtTeacher IDTeacher Age
Mathematics18137
Social Sciences1129
English18137
Physics2745
Chemistry2745

The above table follows 1NF as each attribute holds a single value. However, ‘Teacher Age,’ which is a non-key attribute (as it cannot be used as an identifier - two people can have the same age), is dependent on ‘Teacher ID,’ which is a proper subset of the candidate key. Therefore, this table exhibits partial dependency and does not follow the Second Normal Form.

The 2NF conversion of the adobe table will look like this:

Table 1:

Teacher IDTeacher Age
18137
1129
2745

Table 2:

Subject TaughtTeacher ID
Mathematics181
Social Sciences11
English181
Physics27
Chemistry27

Now, we no longer need to store Teacher Age every time we add in a new course. This breakdown reduces the overall redundancy when you are dealing with a large number of rows.

Let’s have a look at the next normal form in the chain.

Third Normal Form (3NF)

A relation is in Third Normal Form (3NF) if -

  1. it is in 2NF and,
  2. it has no transitive-dependency for all the non-key attributes.

If A->B and B-> C are two functional dependencies, then A->C is a transitive dependency. If a table has such indirect dependencies, then it does not follow the Third Normal Form.

Alternatively, a relation with a functional dependency of A->B is in 3NF if one of these conditions is true -

  1. A is the superkey
  2. B is a prime attribute, i.e., B is a part of the candidate key

Consider the following table -

Employee IDEmployee NameEmployee StateEmployee CountryEmployee ZIP
1267Sam HollandCaliforniaUSA421005
4582Joe DunphyTexasUSA560051
2362Harry WilliamsFloridaUSA690087
1260Alexa StewartAlaskaUSA798423

Primary Key: Employee ID

Non-key attributes: Employee Name, Employee State, Employee Country, Employee ZIP

  1. ‘Employee ZIP’ is dependent on ‘Employee ID.’
  2. Employee State’ and ‘Employee Country’ are dependent on ‘Employee ZIP.’

Thus by the definition of transitive dependency, ‘Employee State’ and ‘Employee Country’ depend on ‘Employee ID.’ This table is, therefore, not in 3NF. We need to break down the table into two, and the final conversion looks like this -

Table 1:

Employee IDEmployee NameEmployee ZIP
1267Sam Holland421005
4582Joe Dunphy560051
2362Harry Williams690087
1260Alexa Stewart798423

Table 2:

Employee ZIPEmployee StateEmployee Country
421005CaliforniaUSA
560051TexasUSA
690087FloridaUSA
798423AlaskaUSA

The Difference between Second Normal Form (2NF) and Third Normal Form (3NF)

The overview of the three normal forms tells us one thing for sure - that each normal form is stricter than its predecessor. For instance, in 2NF, non-prime attributes are not dependent on prime (or key) attributes, but a non-prime attribute can depend on another non-prime attribute. 3NF eliminates this possibility as non-prime attributes are only dependent on the super key of the relation.

Moreover, 2NF tackles partial dependency, whereas 3NF focuses on avoiding transitive dependency. With 2NF, we saw that the repeating groups were eliminated from the table, whereas 3NF reduced the redundancy altogether. Thus, 3NF is a stronger normalization form.

A direct comparison between 2NF and 3NF is somewhat misleading as it is not an apples-to-apples comparison. 3NF is a more sophisticated case of 2NF, and thus, it wouldn’t be fair to compare these normal forms. The choice of normalization depends on your data and end goal. If you aim to reduce the main redundant data, choose 2NF. However, if you are looking to ensure referential integrity, 3NF is a better choice.