Normalization
is the process of efficiently organizing data in a database. There are two
goals of the normalization process:
·
Eliminating redundant data
(storing the same data in more than one table)
·
Ensuring data dependencies make
sense (only storing related data in a table)
Both
of these are worthy goals as they reduce the amount of space a database
consumes and ensure that data is logically stored.
In
relational database, SQL can be used to manipulate data in the following ways:
insert new data, delete unwanted data, and update existing data. Similarly, in
an un-normalized design, there are 3 problems that can occur when we work with
the data. To understand these problems I take an example of Student table:
Student_ID
|
Student_Name
|
Address
|
Subject_Opted
|
401
|
Arun
|
Delhi
|
Maths
|
402
|
Sulekha
|
Pune
|
Biology
|
403
|
Pinky
|
Kolkata
|
Maths
|
404
|
Ramesh
|
Delhi
|
Physics
|
Update anomaly:
The same data can be scattered on multiple rows therefore updates to the table
may result in logical inconsistencies. For example: To update address of a
student who occurs twice or more than twice in a table, we will have to update
“Student_Address” column in all the rows, else the table is left in an
inconsistent state.
Insertion anomaly:
We tried to insert data in a record that does not exist at all. Suppose for a
new admission, we have a Student_ID, Student_Name and Address of a student but
if student has not opted for any subjects yet then we have to insert NULL
there, leading to Insertion Anomaly.
Deletion anomaly:
If (Student_ID) 401 has only one subject and temporarily he drops it, when we
delete that row, entire student record will be deleted along with it.
Normalization
is a method to remove all these anomalies and bring the database to a
consistent state. There are several different levels of database normalization.
1. 1st Normal Form (1NF)
·
Eliminate repeating groups in
individual tables.
·
Create a separate table for each
set of related data.
·
Identify each set of related
data with a primary key.
2. 2nd Normal Form (2NF)
·
Meet all the requirements of the
first normal form.
·
The non key attributes should be
fully dependent on composite primary key, partial dependency is not allowed.
·
Relate the tables with a foreign
key.
3. 3rd Normal Form (3NF)
To achieve this normal form we
have to eliminate the non-key values which are not fully functionally dependent
on primary key of the table.
4. Other Normalization Forms
Fourth normal form, also called Boyce Codd Normal
Form (BCNF), and fifth normal form do exist, but are rarely considered in
practical design.
To
explain the levels I take an example of unnormalized “Student” table:
Student_ID
|
Student_Name
|
Address
|
Subject_Opted
|
401
|
Rajeev
|
Delhi
|
Maths,Physics
|
402
|
Sulekha
|
Pune
|
Biology,chemistry
|
403
|
Pinky
|
Kolkata
|
Arts
|
After
applying the rules of normalization To bring this table to first normal form,
we split the table into multiple tables and now we have the resulting tables:
Table: Student
Student_ID
|
Student_Name
|
Address
|
401
|
Rajeev
|
Delhi
|
402
|
Sulekha
|
Pune
|
403
|
Pinky
|
Kolkata
|
Table: Subjects
Subject_ID
|
Student_ID
|
Subject_Opted
|
801
|
401
|
Maths
|
802
|
401
|
Physics
|
803
|
402
|
Biology
|
804
|
402
|
Chemistry
|
805
|
403
|
Arts
|
Table:
Student_Subjects
Subject_ID
|
Student_ID
|
801
|
401
|
802
|
401
|
803
|
402
|
804
|
402
|
805
|
403
|
No comments:
Post a Comment