Relational Database 11 | Schema Refinement, Norm Forms

Series: Relational Database

Relational Database 11 | Schema Refinement, Norm Forms

Schema Refinement

(1) The Definition of the Schema

A schema is a collection of database objects including tables, views, triggers, stored procedures, indexes, etc. It helps to organize the database objects into logical groups to make them more manageable. Also, schema can be treated as a dictionary with all the information objects of the database.

To see the schema information of all the objects in the current database,

SELECT *
FROM information_schema.columns;

Or we can choose to see the schema for a specific table,

SELECT *
FROM information_schema.columns
WHERE table_name = 'boats';

(2) Data Redundancy Problems

Data redundancy occurs when the same piece of data exists in multiple places, whereas data inconsistency is when the same data exists in different formats in multiple tables. The consequences of the data redundancy are always not serious compared with the data inconsistency problem.

Commonly, there are four possible problems of the data redundancy,

Redundant Storage

This means the same information is stored by multiple times and they take up the storage repeatedly.

Update Anomalies

If one copy of repeated data is updated, an inconsistency is created unless all copies are updated.

Before: students
-------------------------------------------------
name       department       sid           gid    
-------------------------------------------------
Rick         Economy       12345          102
Rick         Economy       12345          102

Query:
UPDATE students SET gid = 101 WHERE name = 'Rick';

After: students
-------------------------------------------------
name       department       sid           gid    
-------------------------------------------------
Rick         Economy       12345          101
Rick         Economy       12345          102

Note that it must be updated at least 2 times or there will be inconsistent data in the database.

Insert anomalies

It may not be possible to store certain information unless some other, unrelated, information is stored as well.

Before: students (gid NOT NULL)
-------------------------------------------------
name       department       sid           gid    
-------------------------------------------------
Rick         Economy       12345          102

Query:
INSERT INTO students VALUES ('Nick', 'Economy', 12321, NULL);

After: ERROR, can not insert.

Note that this can be discovered, but however, in practice, we have handreds of records to insert at one time and this can lead to database inconsistencies due to omission.

Delete anomalies

It may not be possible to delete certain information without losing some other, unrelated. Information as well. For example, for the following table, if we don’t know the sid of the single student to delete but we can know that this student is from group 101 and department Economy. If we delete with these conditions, some other students can be deleted because of that.

Suppose we want to delete Wolski.

Before: students (gid NOT NULL)
-------------------------------------------------
name       department       sid           gid    
-------------------------------------------------
Rick         Economy       12345          102
Bruchs        Math         11235          101
Wolski       Economy       12234          101
Nina         Economy       15324          101

Query:
DELETE FROM students WHERE department = 'Economy' AND gid = 101;

After: students (gid NOT NULL)
-------------------------------------------------
name       department       sid           gid    
-------------------------------------------------
Rick         Economy       12345          102
Bruchs        Math         11235          101

2. Decomposition

(1) The Definition of Decomposition

The decomposition is a way through which we can deal with the data redundancy problems. It is defined as the process of breaking down in parts or elements. It replaces a relation with a collection of smaller relations.

(2) Properties of the Decomposition

Lossless Decomposition

Decomposition must be lossless. It means that the information should not get lost from the relation that is decomposed.

Dependency Preservation

Dependency is an important constraint on the database. Every dependency must be satisfied by at least one decomposed table. This decomposition property can only be done by maintaining the functional dependency.

Lack of Data Redundancy

The proper decomposition should not suffer from any data redundancy.

(3) Armstrong’s Axiom Introduction

The term Armstrong axioms refer to the sound and complete set of inference rules or axioms. When X,Y, and Z are sets of attributes over a relation schema R, then (we only have to know these three rules),

Reflectivity: If Y ⊆ X, then X ⇒ Y.
Augmentation: If X ⇒ Y, then XZ ⇒ YZ for any given Z.
Transitivity: If X ⇒ Y and Y ⇒ Z, then X ⇒ Z.

(4) First Normal Form (1NF)

A relation in the 1NF should satisfies the following properties,

A table has a primary key.
A table has no repeating attributes or groups attributes.

To conduct the 1NF to a relationship, we have to do the following actions:

Delete repeated values
Split columns with more than one values per record

Example 1. PRIMARY KEY = ‘sid, gid’

Without 1NF: students (sid NOT NULL)
-----------------------------------------
name           sid             gid       
-----------------------------------------
Rick          12345          101,102     
Bruchs        11235          101,102     
Bruchs        11235          101,102
Wolski        12234            101       
Nina          15324            101

Steps of 1NF:
- Delete repeated values: delete record #3
- Split records in rid

With 1NF: students (sid NOT NULL)
-----------------------------------------
name           sid             gid       
-----------------------------------------
Rick          12345            101   
Rick          12345            102
Bruchs        11235            101
Bruchs        11235            102     
Wolski        12234            101       
Nina          15324            101

Problems of 1NF:

Although there’s no Data Redundancy problems in 1NF, however, there can be insert/delete/update anomalies.

(5) Second Normal Form (2NF)

When there’re two or more entries in the PRIMARY KEY, and only a subset of some entries of the PRIMARY KEY can be represented as the PRIMARY KEY for some columns. Then there can be insert/delete/update anomalies.

Detection:

Subset of PRIMARY KEY ⇒ columns

To conduct the 2NF to a relationship, we have to do the following actions:

Make sure the relationship satisfies 1NF
Find a subset that can determine a column
Split the relationship into smaller relationships based on the PRIMARY KEY
Make sure the relationship satisfies 1NF

Example 2. PRIMARY KEY = ‘sid’

With 1NF: students (sid NOT NULL)
----------------------------------------
name           sid             gid      
----------------------------------------
Rick          12345            101      
Rick          12345            102      
Bruchs        11235            101      
Bruchs        11235            102      
Wolski        12234            101     
Nina          15324            101

Steps of 2NF:
- note that sid ⇒ name
- split the relationship
- maintain 1NF

With 2NF: studentid (sid NOT NULL)
-------------------
sid        name
-------------------
12345      Rick      
11235      Bruchs            
12234      Wolski           
15324      Nina

----------------
sid        gid
----------------
12345      101
12345      102
11235      101
11235      102
12234      101
15324      101

(6) Third Normal Form (3NF)

Even when there’s only one entry in the PRIMARY KEY, some of the columns in the relationship can have the functional dependency problem,

some columns ⇒ other columns

Thus, we can split the table so there’s won’t any update anomalies.

Example 3. PRIMARY KEY 1= ‘sid’, PRIMARY KEY 2= ‘level’

With 2NF: studentid (sid NOT NULL)
----------------------------------------------
sid        name      level       scholarship
----------------------------------------------
12345      Rick        l2           3,000
11235      Bruchs      l3           5,000
12234      Wolski      l2           3,000
15324      Nina        l4           8,000

With 3NF: studentid (sid NOT NULL)
-------------------------------
sid        name      level     
-------------------------------
12345      Rick        l2      
11235      Bruchs      l3      
12234      Wolski      l2       
15324      Nina        l4

---------------------------
level       scholarship
---------------------------
l2             3,000
l3             5,000
l4             8,000