Saturday, 20 June 2015

NORMALIZATION

Normalization is the process of efficiently organizing data in a database. There are two goals of the normalization process:

·         Eliminating redundant data (storing the same data in more than one table)

·         Ensuring data dependencies make sense (only storing related data in a table)

Both of these are worthy goals as they reduce the amount of space a database consumes and ensure that data is logically stored.

In relational database, SQL can be used to manipulate data in the following ways: insert new data, delete unwanted data, and update existing data. Similarly, in an un-normalized design, there are 3 problems that can occur when we work with the data. To understand these problems I take an example of Student table:

Student_ID
Student_Name
Address
Subject_Opted
401
Arun
Delhi
Maths
402
Sulekha
Pune
Biology
403
Pinky
Kolkata
Maths
404
Ramesh
Delhi
Physics

Update anomaly: The same data can be scattered on multiple rows therefore updates to the table may result in logical inconsistencies. For example: To update address of a student who occurs twice or more than twice in a table, we will have to update “Student_Address” column in all the rows, else the table is left in an inconsistent state.
Insertion anomaly: We tried to insert data in a record that does not exist at all. Suppose for a new admission, we have a Student_ID, Student_Name and Address of a student but if student has not opted for any subjects yet then we have to insert NULL there, leading to Insertion Anomaly.
Deletion anomaly: If (Student_ID) 401 has only one subject and temporarily he drops it, when we delete that row, entire student record will be deleted along with it.

Normalization is a method to remove all these anomalies and bring the database to a consistent state. There are several different levels of database normalization.
1.       1st Normal Form (1NF)
·         Eliminate repeating groups in individual tables.
·         Create a separate table for each set of related data.
·         Identify each set of related data with a primary key.

2.       2nd Normal Form (2NF)
·         Meet all the requirements of the first normal form.
·         The non key attributes should be fully dependent on composite primary key, partial dependency is not allowed.
·         Relate the tables with a foreign key.

3.       3rd Normal Form (3NF)
To achieve this normal form we have to eliminate the non-key values which are not fully functionally dependent on primary key of the table.

4.       Other Normalization Forms
Fourth normal form, also called Boyce Codd Normal Form (BCNF), and fifth normal form do exist, but are rarely considered in practical design.

To explain the levels I take an example of unnormalized “Student” table:

Student_ID
Student_Name
Address
Subject_Opted
401
Rajeev
Delhi
Maths,Physics
402
Sulekha
Pune
Biology,chemistry
403
Pinky
Kolkata
Arts

After applying the rules of normalization To bring this table to first normal form, we split the table into multiple tables and now we have the resulting tables:

Table: Student
Student_ID
Student_Name
Address
401
Rajeev
Delhi
402
Sulekha
Pune
403
Pinky
Kolkata

Table: Subjects
Subject_ID
Student_ID
Subject_Opted
801
401
Maths
802
401
Physics
803
402
Biology
804
402
Chemistry
805
403
Arts

Table: Student_Subjects
Subject_ID
Student_ID
801
401
802
401
803
402
804
402
805
403


No comments: