Shambhu Jha: NORMALIZATION

Normalization is the process of efficiently organizing data in a database. There are two goals of the normalization process:

· Eliminating redundant data (storing the same data in more than one table)

· Ensuring data dependencies make sense (only storing related data in a table)

Both of these are worthy goals as they reduce the amount of space a database consumes and ensure that data is logically stored.

In relational database, SQL can be used to manipulate data in the following ways: insert new data, delete unwanted data, and update existing data. Similarly, in an un-normalized design, there are 3 problems that can occur when we work with the data. To understand these problems I take an example of Student table:

Student_ID	Student_Name	Address	Subject_Opted
401	Arun	Delhi	Maths
402	Sulekha	Pune	Biology
403	Pinky	Kolkata	Maths
404	Ramesh	Delhi	Physics

Update anomaly: The same data can be scattered on multiple rows therefore updates to the table may result in logical inconsistencies. For example: To update address of a student who occurs twice or more than twice in a table, we will have to update “Student_Address” column in all the rows, else the table is left in an inconsistent state.

Insertion anomaly: We tried to insert data in a record that does not exist at all. Suppose for a new admission, we have a Student_ID, Student_Name and Address of a student but if student has not opted for any subjects yet then we have to insert NULL there, leading to Insertion Anomaly.

Deletion anomaly: If (Student_ID) 401 has only one subject and temporarily he drops it, when we delete that row, entire student record will be deleted along with it.

Normalization is a method to remove all these anomalies and bring the database to a consistent state. There are several different levels of database normalization.

1. 1st Normal Form (1NF)

· Eliminate repeating groups in individual tables.

· Create a separate table for each set of related data.

· Identify each set of related data with a primary key.

2. 2nd Normal Form (2NF)

· Meet all the requirements of the first normal form.

· The non key attributes should be fully dependent on composite primary key, partial dependency is not allowed.

· Relate the tables with a foreign key.

3. 3rd Normal Form (3NF)

To achieve this normal form we have to eliminate the non-key values which are not fully functionally dependent on primary key of the table.

4. Other Normalization Forms

Fourth normal form, also called Boyce Codd Normal Form (BCNF), and fifth normal form do exist, but are rarely considered in practical design.

To explain the levels I take an example of unnormalized “Student” table:

Student_ID	Student_Name	Address	Subject_Opted
401	Rajeev	Delhi	Maths,Physics
402	Sulekha	Pune	Biology,chemistry
403	Pinky	Kolkata	Arts

After applying the rules of normalization To bring this table to first normal form, we split the table into multiple tables and now we have the resulting tables:

Table: Student

Student_ID	Student_Name	Address
401	Rajeev	Delhi
402	Sulekha	Pune
403	Pinky	Kolkata

Table: Subjects

Subject_ID	Student_ID	Subject_Opted
801	401	Maths
802	401	Physics
803	402	Biology
804	402	Chemistry
805	403	Arts

Table: Student_Subjects

Subject_ID	Student_ID
801	401
802	401
803	402
804	402
805	403

Shambhu Jha

Saturday, 20 June 2015

NORMALIZATION

No comments: