Series: High-Performance Computer Architecture

High-Performance Computer Architecture 27 | Introduction to Storage, Magnetic Disks, Optical Disks, Magnetic Tape, SSD, Flash Memory, and Storage I/O

1. Introduction to Storage

(1) The Role of the Storage

The storage holds all of our files (i.e., programs, data, and settings). It also backs virtual memory, because physical memory is not large enough to hold all of the virtual memory.

(2) The Performance of the Storage

When we talk about the performance of storage, what we really care about is throughput (i.e., how many bytes per second we can get out of the disk) and latency (i.e., how long we wait to get a page of data back after requesting it). Both throughput and latency are improving, but not as quickly as processor speed or DRAM speed (and DRAM is itself much slower than the processor).

We also care about reliability. If the processor fails, the system is temporarily unavailable until we get it booted up again. But if the disk fails, we will probably lose all of our programs, data, and settings, which is much worse than a processor failure.

(3) Types of the Storage

The types of storage that we can use are very diverse. These include,

  • Magnetic disks: the most traditional one
  • Optical disks: such as CDs or DVDs
  • Tape: for backup
  • Flash drives: much faster

2. Magnetic Disks

(1) Magnetic Disks (HDD)

Magnetic disks include hard disks (also called hard drives) and floppy disks; here we focus on the hard disk. A magnetic disk has a spindle to which the platters are attached. All of the platters are attached to the same spindle, so they rotate at the same speed in the same direction. A motor at the bottom drives the spindle.

If we look at a single platter, it is covered with magnetic material on both sides. Each side is called a surface of the platter, and the surfaces store the data bits. We access the data on a surface through a magnetic head, one per surface. All the magnetic heads are attached to the head assembly, which moves all the heads in unison.

Each magnetic head, while held in one position, can access a circle on its surface called a track. All of the tracks at the same distance from the spindle, across all surfaces, are together called a cylinder. We access different tracks by moving the magnetic head closer to or farther from the spindle.

We don’t store data as one continuous stream along a track, because a single track holds a lot of bits. Instead, the data along a track is divided into sectors. The sector is the smallest unit we can read. At the beginning of a sector there is a preamble that marks where the sector starts. Then come the data bits themselves. Finally, there is a checksum used to check for errors.
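
The sector layout (preamble, data, checksum) can be illustrated with a small sketch. Everything specific here is an assumption made for illustration: the two-byte `PREAMBLE` marker, the use of CRC-32 as the checksum, and the function names `write_sector` and `read_sector` are hypothetical, and real drives use their own on-disk formats and error-correcting codes.

```python
import zlib

# Hypothetical marker for the start of a sector (an assumption, not a real format).
PREAMBLE = b"\xAA\x55"
CHECKSUM_BYTES = 4  # CRC-32 is 32 bits wide


def write_sector(data: bytes) -> bytes:
    """Lay out one sector as: preamble, data, checksum."""
    checksum = zlib.crc32(data).to_bytes(CHECKSUM_BYTES, "big")
    return PREAMBLE + data + checksum


def read_sector(raw: bytes) -> bytes:
    """Return the data of a sector, or raise if the sector looks damaged."""
    if not raw.startswith(PREAMBLE):
        raise ValueError("preamble not found")
    data = raw[len(PREAMBLE):-CHECKSUM_BYTES]
    stored = raw[-CHECKSUM_BYTES:]
    if zlib.crc32(data).to_bytes(CHECKSUM_BYTES, "big") != stored:
        raise ValueError("checksum mismatch")
    return data


sector = write_sector(b"hello, disk")
print(read_sector(sector))
```

Flipping any bit in the data portion makes `read_sector` raise, which mirrors the role of the controller check that follows a read.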

(2) Magnetic Disk Capacity

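Based on the discussion above, we can derive the capacity of an HDD:

Capacity = (number of platters) × (2 surfaces per platter) × (tracks per surface) × (sectors per track) × (bytes per sector)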
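
A minimal sketch of this calculation, multiplying together the platters, surfaces, tracks, sectors, and bytes described above. It assumes every track holds the same number of sectors, which is a simplification. The function name and the example drive parameters are hypothetical.

```python
def disk_capacity_bytes(platters, tracks_per_surface, sectors_per_track,
                        bytes_per_sector, surfaces_per_platter=2):
    """Capacity = platters x surfaces x tracks x sectors x bytes per sector."""
    return (platters * surfaces_per_platter * tracks_per_surface
            * sectors_per_track * bytes_per_sector)


# Hypothetical drive: 4 platters, 100,000 tracks per surface,
# 1,000 sectors per track, 512 bytes per sector.
capacity = disk_capacity_bytes(4, 100_000, 1_000, 512)
print(capacity, "bytes")
```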

(3) Magnetic Disk Access Time

Now, let’s see how we access the magnetic disk and how long it takes to get our data. We have to account for the following times,

  • Spin-up time: if the disk is not already spinning, it takes several seconds to spin it up
  • Seek time: the time it takes to move the head assembly to the correct cylinder
  • Rotational latency: after we have moved the heads, we wait for the sector we are interested in to rotate under the head
  • Data read: we read until the head sees the end of the sector
  • Controller time: the controller needs to verify the checksum
  • I/O bus time: the time it takes to transfer the data to main memory
  • Queuing delay: before our read can start, we have to wait for previous requests to finish
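
The components above can be added up in a short sketch. It assumes the disk is already spinning (so spin-up time is excluded), that the rotational latency is half a rotation on average, and that reading one sector takes the time for that sector to pass under the head. The function name and the example numbers are hypothetical.

```python
def average_access_time_ms(seek_ms, rpm, sectors_per_track,
                           controller_ms=0.0, bus_ms=0.0, queue_ms=0.0):
    """Sum the access-time components for reading one sector."""
    rotation_ms = 60_000 / rpm                 # time for one full rotation
    rotational_latency_ms = rotation_ms / 2    # on average, half a rotation
    read_ms = rotation_ms / sectors_per_track  # one sector passes under the head
    return (queue_ms + seek_ms + rotational_latency_ms
            + read_ms + controller_ms + bus_ms)


# Hypothetical disk: 10 ms seek, 6,000 RPM, 10 sectors per track,
# 1 ms of controller time, no bus or queuing delay.
print(average_access_time_ms(10, 6000, 10, controller_ms=1.0), "ms")
```

With these numbers the seek time and rotational latency dominate the total, while the data read itself is a small part of it.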

(4) Trends for Magnetic Disks

The capacity of the magnetic disk improves exponentially, doubling every 1–2 years.

The seek time has consistently been 5–10 ms, with very slow improvement. The only ways to improve the seek time are a faster motor or a smaller platter. There has been some improvement from moving from the 5.25-inch diameter disk to the 3.5-inch diameter disk.

The speed of rotation has been improving. It went from 5,000 RPM (rotations per minute) to 15,000 RPM, and even beyond that. Faster rotation also requires improvements in the material the platter is built from. Another limiting factor is noise: a disk that rotates faster is louder and produces a higher-pitched sound. As a result, the improvement in rotation speed is also relatively slow.

The speed of the controller and the I/O buses has been improving at a reasonable rate, so the controller and the bus are becoming a smaller and smaller fraction of the overall access time.

Because improving the seek time and the rotation speed requires better mechanical components, disk access time does not follow Moore’s law the way processors do.

3. Optical Disks

(1) An Introduction to the Optical Disks

An optical disk is very similar to a hard disk in that it has a platter and we store the data on the surface of the disk. The difference is that, instead of using a magnetic head, we shine a laser at the surface and read the information from the light reflected by the material.

(2) Features of the Optical Disks

In a hard drive the magnetic head has to be very close to the surface, but the laser of an optical drive does not. As a result, smudges and dust are less of a problem, and we can carry CDs and DVDs around without worrying that they will stop working when they get dirty.

Because we want to read CDs and DVDs on whichever machine we like, optical disks need to be standardized so that they are portable. However, standardization limits the rate of technology improvement, because each improvement has to wait until it is standardized and the companies involved agree on it.

4. Magnetic Tape

Now, let’s briefly talk about magnetic tape. Magnetic tape is also called secondary storage and is usually used for backup. Tapes can have large capacity and they are usually replaceable. But access to a magnetic tape is sequential: we have to seek along the tape until we find the data we are interested in. This makes tape a poor fit for virtual memory, because we would spend a long time searching for the data we need.

Tapes are slowly dying out, mainly because of their low production volume. The cost of tapes is therefore not dropping as rapidly as the cost of disks, and hard drives keep getting cheaper over time. Nowadays, people tend to buy USB drives, which are actually hard drives packaged with a USB interface, as a way to make backups.

5. Using RAM for Storage

(1) Reasons for Using RAM for Storage

We have seen that hard drives do not really benefit from Moore’s law and that they can be very slow. DRAM, however, does benefit from Moore’s law in terms of both capacity and speed. So there is a trade-off between RAM and the hard disk: the disk is about 100 times cheaper than RAM per GB, but DRAM is about 100,000 times faster than the disk.

Because there are situations in which we care less about cost and more about I/O performance, we use the solid-state disk (SSD) for these cases.

(2) Solid-State Disk (SSD)

Although the solid-state disk (SSD) is called a disk, it is not a real disk at all. One way to build an SSD is to combine DRAM with a battery (because DRAM needs power to keep its contents). This is much faster and more reliable than a hard drive, but it is also much more expensive. In addition, this kind of SSD is not good for archiving, because the battery will eventually run out of power and we will start to lose our data.

(3) Flash Memory

Today, there is another technology called flash memory. Flash is fabricated with technology similar to that used for DRAM, and it also uses transistors to store data, so it benefits from Moore’s law. Most importantly, flash keeps its data without power, so we can store something in flash memory for a relatively long time.

(4) Properties of the Magnetic Disk

There are some properties of the magnetic disk that we have to keep in mind,

  • Low cost per GB
  • Huge capacity
  • Power hungry
  • Relatively slow (because of mechanical movement)
  • Sensitive to impacts while spinning

(5) Properties of the Flash Memory SSD

There are also some properties of the flash memory SSD, and these include,

  • Fast
  • Power efficient
  • Reliable, because there are no moving parts

(6) Hybrid Magnetic-Flash Storage

So now we have a simple question: can we get the advantages of both flash and the magnetic disk? The answer is that we can use the flash as a cache for the hard disk. This way, most of the data stays on the disk, but the data we access frequently is kept in flash memory, where it can be accessed efficiently.
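
A toy model of using flash as a cache in front of the disk is sketched below. The class name `HybridStore`, the least-recently-used (LRU) replacement policy, and the dictionary standing in for the disk are all assumptions made for illustration; the text does not say which policy a real hybrid drive uses.

```python
from collections import OrderedDict


class HybridStore:
    """Toy model: a small flash cache (LRU) in front of a large disk."""

    def __init__(self, disk, flash_blocks):
        self.disk = disk                  # dict: block number -> data (stands in for the HDD)
        self.flash = OrderedDict()        # cached blocks, least recently used first
        self.flash_blocks = flash_blocks  # how many blocks fit in flash
        self.hits = 0
        self.misses = 0

    def read(self, block):
        if block in self.flash:           # fast path: served from flash
            self.hits += 1
            self.flash.move_to_end(block)
            return self.flash[block]
        self.misses += 1                  # slow path: go to the disk
        data = self.disk[block]
        self.flash[block] = data
        if len(self.flash) > self.flash_blocks:
            self.flash.popitem(last=False)  # evict the least recently used block
        return data


disk = {n: f"block-{n}" for n in range(100)}
store = HybridStore(disk, flash_blocks=2)
for block in [1, 2, 1, 1, 3, 1]:
    store.read(block)
print(store.hits, store.misses)
```

Block 1 is read repeatedly, so after its first miss it keeps being served from flash, even though almost all of the data lives only on the disk.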

6. Storage I/O

(1) I/O Buses

Usually, the storage is connected to the system by a standardized I/O bus. The bus needs to be standardized because we want to be able to connect a storage device from one manufacturer to a computer built by another. This standardization process limits the rate of bus improvements.

(2) Mezzanine Bus

Instead of having just one type of bus that connects to all I/O devices, we typically have a hierarchy of buses. At the top of this hierarchy is a mezzanine bus such as PCI Express. A mezzanine bus can be fast and short, and it can connect directly to fast devices such as graphics cards.

Storage buses such as SATA and SCSI are then connected to the mezzanine bus. These buses are specialized for storage, and a SATA controller acts as a PCI Express device on the mezzanine bus. The reason we have SATA and SCSI, rather than connecting the hard disk directly to PCI Express, is that we would like to maintain a standard for storage.

We can also connect a USB hub or a USB bus to PCI Express. Although the USB bus can be really slow, its standard lasts longer.