Relational Database 11 | Schema Refinement, Norm Forms

Series: Relational Database

Relational Database 11 | Schema Refinement, Norm Forms

  1. Schema Refinement

(1) The Definition of the Schema

A schema is a collection of database objects including tables, views, triggers, stored procedures, indexes, etc. It helps to organize the database objects into logical groups to make them more manageable. Also, schema can be treated as a dictionary with all the information objects of the database.

To see the schema information of all the objects in the current database,

SELECT *
FROM information_schema.columns;

Or we can choose to see the schema for a specific table,

SELECT *
FROM information_schema.columns
WHERE table_name = 'boats';

(2) Data Redundancy Problems

Data redundancy occurs when the same piece of data exists in multiple places, whereas data inconsistency is when the same data exists in different formats in multiple tables. The consequences of the data redundancy are always not serious compared with the data inconsistency problem.

Commonly, there are four possible problems of the data redundancy,

  • Redundant Storage

This means the same information is stored by multiple times and they take up the storage repeatedly.

  • Update Anomalies

If one copy of repeated data is updated, an inconsistency is created unless all copies are updated.

Before: students
-------------------------------------------------
name department sid gid
-------------------------------------------------
Rick Economy 12345 102
Rick Economy 12345 102
Query:
UPDATE students SET gid = 101 WHERE name = 'Rick';
After: students
-------------------------------------------------
name department sid gid
-------------------------------------------------
Rick Economy 12345 101
Rick Economy 12345 102

Note that it must be updated at least 2 times or there will be inconsistent data in the database.

  • Insert anomalies

It may not be possible to store certain information unless some other, unrelated, information is stored as well.

Before: students (gid NOT NULL)
-------------------------------------------------
name department sid gid
-------------------------------------------------
Rick Economy 12345 102
Query:
INSERT INTO students VALUES ('Nick', 'Economy', 12321, NULL);
After: ERROR, can not insert. 

Note that this can be discovered, but however, in practice, we have handreds of records to insert at one time and this can lead to database inconsistencies due to omission.

  • Delete anomalies

It may not be possible to delete certain information without losing some other, unrelated. Information as well. For example, for the following table, if we don’t know the sid of the single student to delete but we can know that this student is from group 101 and department Economy. If we delete with these conditions, some other students can be deleted because of that.

Suppose we want to delete Wolski.
Before: students (gid NOT NULL)
-------------------------------------------------
name department sid gid
-------------------------------------------------
Rick Economy 12345 102
Bruchs Math 11235 101
Wolski Economy 12234 101
Nina Economy 15324 101
Query:
DELETE FROM students WHERE department = 'Economy' AND gid = 101;
After: students (gid NOT NULL)
-------------------------------------------------
name department sid gid
-------------------------------------------------
Rick Economy 12345 102
Bruchs Math 11235 101

2. Decomposition

(1) The Definition of Decomposition

The decomposition is a way through which we can deal with the data redundancy problems. It is defined as the process of breaking down in parts or elements. It replaces a relation with a collection of smaller relations.

(2) Properties of the Decomposition

  • Lossless Decomposition

Decomposition must be lossless. It means that the information should not get lost from the relation that is decomposed.

  • Dependency Preservation

Dependency is an important constraint on the database. Every dependency must be satisfied by at least one decomposed table. This decomposition property can only be done by maintaining the functional dependency.

  • Lack of Data Redundancy

The proper decomposition should not suffer from any data redundancy.

(3) Armstrong’s Axiom Introduction

The term Armstrong axioms refer to the sound and complete set of inference rules or axioms. When X,Y, and Z are sets of attributes over a relation schema R, then (we only have to know these three rules),

  • Reflectivity: If Y ⊆ X, then X ⇒ Y.
  • Augmentation: If X ⇒ Y, then XZ ⇒ YZ for any given Z.
  • Transitivity: If X ⇒ Y and Y ⇒ Z, then X ⇒ Z.

(4) First Normal Form (1NF)

A relation in the 1NF should satisfies the following properties,

  • A table has a primary key.
  • A table has no repeating attributes or groups attributes.

To conduct the 1NF to a relationship, we have to do the following actions:

  • Delete repeated values
  • Split columns with more than one values per record

Example 1. PRIMARY KEY = ‘sid, gid’

Without 1NF: students (sid NOT NULL)
-----------------------------------------
name sid gid
-----------------------------------------
Rick 12345 101,102
Bruchs 11235 101,102
Bruchs 11235 101,102
Wolski 12234 101
Nina 15324 101
Steps of 1NF:
- Delete repeated values: delete record #3
- Split records in rid
With 1NF: students (sid NOT NULL)
-----------------------------------------
name sid gid
-----------------------------------------
Rick 12345 101
Rick 12345 102
Bruchs 11235 101
Bruchs 11235 102
Wolski 12234 101
Nina 15324 101

Problems of 1NF:

Although there’s no Data Redundancy problems in 1NF, however, there can be insert/delete/update anomalies.

(5) Second Normal Form (2NF)

When there’re two or more entries in the PRIMARY KEY, and only a subset of some entries of the PRIMARY KEY can be represented as the PRIMARY KEY for some columns. Then there can be insert/delete/update anomalies.

Detection:

Subset of PRIMARY KEY ⇒ columns

To conduct the 2NF to a relationship, we have to do the following actions:

  • Make sure the relationship satisfies 1NF
  • Find a subset that can determine a column
  • Split the relationship into smaller relationships based on the PRIMARY KEY
  • Make sure the relationship satisfies 1NF

Example 2. PRIMARY KEY = ‘sid’

With 1NF: students (sid NOT NULL)
----------------------------------------
name sid gid
----------------------------------------
Rick 12345 101
Rick 12345 102
Bruchs 11235 101
Bruchs 11235 102
Wolski 12234 101
Nina 15324 101
Steps of 2NF:
- note that sid ⇒ name
- split the relationship
- maintain 1NF
With 2NF: studentid (sid NOT NULL)
-------------------
sid name
-------------------
12345 Rick
11235 Bruchs
12234 Wolski
15324 Nina
----------------
sid gid
----------------
12345 101
12345 102
11235 101
11235 102
12234 101
15324 101

(6) Third Normal Form (3NF)

Even when there’s only one entry in the PRIMARY KEY, some of the columns in the relationship can have the functional dependency problem,

some columns ⇒ other columns

Thus, we can split the table so there’s won’t any update anomalies.

Example 3. PRIMARY KEY 1= ‘sid’, PRIMARY KEY 2= ‘level’

With 2NF: studentid (sid NOT NULL)
----------------------------------------------
sid name level scholarship
----------------------------------------------
12345 Rick l2 3,000
11235 Bruchs l3 5,000
12234 Wolski l2 3,000
15324 Nina l4 8,000
With 3NF: studentid (sid NOT NULL)
-------------------------------
sid name level
-------------------------------
12345 Rick l2
11235 Bruchs l3
12234 Wolski l2
15324 Nina l4
---------------------------
level scholarship
---------------------------
l2 3,000
l3 5,000
l4 8,000