Operating System 28 | Final Review for Outlines

Series: Operating System

Operating System 28 | Final Review for Outlines

OS Scheduling

(1) Metrics for OS Scheduling

throughput: task per second
the average time it took for tasks to complete
the average time that tasks spent waiting before they were scheduled
the overall CPU utilization

(2) Run-to-Completion Scheduling

As soon as a task is assigned to a CPU, it will run until it finishes or until it completes.

First-Come, First-Serve (FCFS)
Shortest-Job First (SJF)
Complex First

(3) Preemptive Scheduling

Tasks don’t hog (means exclusively occupy) the CPU until they’re completed. We will use the following policy,

Priority Scheduling: first execute the highest priority

But there can be some problems,

Starvation: stuck low priority task. The solution is priority aging
Priority Inverse

(4) Round-Robin Scheduling (RRS)

Round-robin scheduling means that we will looply access all the tasks until they’re completed. There can be two types of RRS,

Time Slicing
Without Time Slicing

(5) Run Queue

Active processes are placed in an array called a run queue. We can have different types of run queues,

Single runqueue: same run queue for I/O (slow) and CPU (fast) -bound tasks
Multi-runqueue: different run queues for I/O and CPU-bound tasks in the same layer. I/O-bound tasks will have short time slices and CPU-bound tasks have long time slices. This method needs classifications by history-based heuristics.

Multi-Level Feedback Queue (MLFQ): different run queues for I/O and CPU-bound tasks in different layers. We don’t have to do any classification.

(6) Linux Schedulers

Kernel 2.5: O(1) scheduler, not suitable for interactive tasks
Kernel 2.6.23 and later: CFS scheduler

(7) I/O-bound Tasks Vs. CPU-Bound Tasks

I/O-bound Tasks (Memory-bound Tasks): high CPI, low IPC
CPU-bound Tasks (Compute-bound Tasks): low CPI, high IPC

(8) Conclusion of Fedorova’s Paper

Mixed IPC workloads on each core results in a better performance.

Example 1-1. Run-to-Completion Scheduling and RR Scheduling

Suppose we have three tasks T1, T2, and T3 arrive in the order T1 -> T2 -> T3, which have the following execution time,

T1 =  1 s
T2 = 10 s
T3 =  1 s

For each of the three algorithms (i.e. FCFS, SJF, complex first) with run-to-completion scheduling or RR scheduling with ts = 1s. Calculate,

throughput
average completion time
average waiting time

Solution:

For run-to-completion scheduling with FCFS,

throughput = 3/12 = 0.25 task/s
ACT = (1+11+12)/3 = 8 s
AWT = (0+10+11)/3 = 4 s

For run-to-completion scheduling with SJF,

throughput = 3/12 = 0.25 task/s
ACT = (1+2+12)/3 = 5 s
AWT = (0+1+11)/3 = 1 s

For run-to-completion scheduling with complex first,

throughput = 3/12 = 0.25 task/s
ACT = (10+11+12)/3 = 11 s
AWT = (0+10+11)/3 = 7 s

For RR scheduling (ts = 1s) with FCFS,

throughput = 3/12 = 0.25 task/s
ACT = (1+12+3)/3 = 5.33 s
AWT = (0+1+2)/3 = 1 s

For RR scheduling (ts = 1s) with FCFS,

throughput = 3/12 = 0.25 task/s
ACT = (1+2+12)/3 = 5 s
AWT = (0+1+2)/3 = 1 s

For RR scheduling (ts = 1s) with complex first,

throughput = 3/12 = 0.25 task/s
ACT = (12+2+3)/3 = 5.67 s
AWT = (0+1+2)/3 = 1 s

2. Memory Management

(1) Memory Management Methods

There are two ways to manage memory,

Page-based: virtual memory is divided into fix-sized pages
Segment-based: virtual memory is divided into flexible segments

(2) Page Tables

OS generates a page table for every process that it runs, which is used for virtual to physical translation. Page table entry contains flags like,

Present (P): same as the valid bit
Dirty (D): if need to write back
Access (A): show read or write
Protection (R/W/X): show the permission

(3) Memory Management Unit

MMU is the hardware for supporting memory management. It is responsible for,

Virtual to physical address translation
Generate page faults: like the non-existing page (need to read from disk) and protection error

(4) Page Table Size

(5) Improving Page Tables

The following ways can be used to improve the page tables. These include,

Multi-level Page Tables (MLPT): reduce page table cost
Translation Look-aside Buffer (TLB): faster search to the table
Inverted Page Tables (IPT): another approach, need to do a linear search
Hashing Inverted Page Tables (hashing IPT): improving the linear search with hash tables

(6) Descriptor Table

With segments, the address space is divided into components of arbitrary granularity of arbitrary size. The descriptor table is used to translate the virtual address to the linear address.

(7) Segmentation Translation Procedure

Segmentation must be used with paging.

(8) Memory Allocator

The memory allocator is used to allocate memory and it can be implemented at either the user level or the kernel level.

Buddy memory allocator: The buddy allocator will continue subdividing consecutive memory region until it finds the smallest chunk that’s with a size of a power of 2ⁿ that can satisfy the request. This is fast but has internal fragmentation.

Slub memory allocator: The slub allocator builds custom object caches on top of slabs, and the slabs themselves represent contiguously allocated physical memory. This avoids internal and external fragmentation.

(9) Copy On Write (COW)

Forked processes continue using the virtual address space of their parent process until we need to perform a write operation.

(10) Demand Paging

Demand paging means that we we store some information on the disk and when we need them, we can load them back from the secondary storage.

(11) Page Replacement

When we memory usage is above the threshold or CPU usage is below the threshold, the pages should be swapped out. We can have some common replacement policies like,

LRU
Swap dirty bit = 0 because we don’t have to write back

(12) Failure and Recovery Management

There methods that we can handle the failures and recover from a failure,

Checkpointing
Rewind Replay: means to restart from an early point
Migration: means to restart

3. Inter-Process Communication

(1) Types of Inter-Process Communication (IPC)

Message-based (message pass) IPC: pipes, message queues, sockets
Memory-based (shared memory) IPC: shared memory, memory-mapped files

(2) Pipes

Pipes are characterized by only two processes can communicate at a time via the data stream.

(3) Message Queue

Message queue send and receive well-formatted messages between processes.

(4) Sockets

Socket is a way to communicate through different protocols by sending or receiving messages.

(5) Sys V Shared Memory IPC

generate unique key: ftok(pathname, proj_id)
create segments: shmget(shmid, size, flag)
attach process: shmaddr = shmat(shmid, addr, flags)
detach process: shmdt(shmaddr)
destroy segments: shmctl(shmid, cmd, buf)

(6) POSIX Shared Memory IPC

create files: shm_open()
attach process: mmap()
detach process: unmmap()
detech all processes: shm_close()
destroy allocated resources: shm_unlink()

4. Synchronization

(1) Spinlocks

Spinlock will be busy and spinning when locking, so it is different from the mutex.

(2) Semaphores

Semaphore is an integer number that will be block when the value equals 0. There are two operations on the semaphore,

wait: means to reduce 1
post: means to add 1

(3) Sys V Semaphore

create semaphore: sem_t sem;
initialize semaphore: sem_init(&sem, pshared, count);
semaphore wait: sem_wait(&sem);
semaphore post: sem_post(&sem);

(4) Semaphore-Mutex Implementation

Mutex is a specific kind of semaphore. It is equivalent to,

sem_t mutex;
sem_init(&mutex, 0, 1);

(5) Read/Write Lock (RWlock)

The read/write lock is a specific spinlock only for exclusive writes. This is simpler than using the condition variables.

(6) Linux RWlock

create rwlock: rwlock_t rwlock;
read lock: read_lock(&rwlock);
read unlock: read_unlock(&rwlock);
write lock: write_lock(&rwlock);
write unlock: write_unlock(&rwlock);

(7) Monitors

Monitor is a synchronization explicitly specify,

shared resources
entry procedures (read or write)
mutex locks
condition variables

(8) Other Synchronizations

serializer: make it easier to define priority, hide explicit use of condition variable
path expressions: programmer specify regex captures the correct synchronization behavior
barriers: a barrier will block threads until it figures all of them reach a certain condition
rendezvous points
optimistic wait-free synchronization (RCU)

(9) Atomic Instructions

Atomic instructions are exclusively executed instructions. There are three types of them,

test_and_set: if fits the condition, then set the value
read_and_increment: get the value, and then increase it
compare_and_swap: compare the values, and then swap them

(10) Shared Memory Processors (SMP)

The SMP is a system that all the processors will share 1 or more main memory.

(11) Cache Memory Interactions

Write allocate: means data writes in cache and memory
Non-write allocate: means data writes only in memory
Write through: directly write to the memory after write
Write back: delay write to the memory until being kicked out

(12) Cache Coherence Policies

Non-Cache-Coherent (NCC): cache coherence supported by software
Write Invalidate: simply invalidate copies of data after write by hardware
Write Update: update copies of data after write by hardware

(13) Spinlock Performance

Latency: low latency
Delay: low delay time
Contention: reduce traffic, but this conflict with the other two goals because we have to execute less atomic instructions

(14) Spinlock Implementation

Method #1. Test-then-test Implementation

spinlock_lock(lock):
    while(test_and_set(lock) == busy);

low latency and low delay
high contention (bad)

Method #2. Test-test-set Implementation

spinlock_lock(lock):
    while (1) {
        if (lock == free)
            while(test_and_set(lock) == busy);
        else
            while(lock == busy);      // wait for lock becomes free
    }

okay latency and delay
low contention for write update
high contention for write invalidate (bad)

Method #3. Delay Spinlock

spinlock_lock(lock):
    while (1) {
        if (lock == free)
            while(test_and_set(lock) == busy);
        else {
            while(lock == busy);      // wait for lock becomes free
            delay();
        }  
    }

We can use the following two

static delay: delay static value like CPUid
dynamic delay: higher contention, large range, randomly choose

The performance will be,

okay latency
static delay: unnecessary delay when low contention (bad)
dynamic delay: high contention and high delay when test_and_set fails (bad)

Method #4. Queuing Spinlock (Anderson’s lock)

init:
    flags[0] = has_lock;
    flags[1 ... N-1] = must_wait;
    queuelast = 0;                    // global

lock:
    myplace = read_and_increment(queuelast);
    // spin
    while (flags[myplace%N] == must_wait);   
     
          // now in critical section    
     
    // finally reset the entry in the queue for next thread
    flags[myplace%N] = must_wait;

unlock:
    flags[myplace%N + 1] = has_lock;

high latency (bad): because inefficient read_and_increment instructions
good delay: no wait before next signal
good contention: read_and_increment is only executed once

(15) Spinlock Implementations Performance

Basic settings are,

1~20 processes
one process per core (1~20 cores)
overheads is compared with the ideal state (less means better)
write invalidate

For more cores (higher load),

queuing lock is the best because good contention
test-test-set lock (i.e. spin on read) is the worst because of O(N²) high contention
static delay lock (i.e. static release/ref.) better than dynamic delay lock (i.e. backoff release/ref.) because there are less low contentions

For less cores (lower load),

test-test-set lock is the best because low latency
queuing lock is the worst because of high latency
dynamic delay lock is better than static delay lock because of lower delay

5. I/O Devices

(1) I/O Devices Components

control registers: includes status registers , command registers , data transfer registers
micro-controller
on-device memory
other logic

(2) Device to CPU Connnections

Peripheral Component Interconnect Bus (PCI bus)
Peripheral Component Interconnect Express Bus (PCIe bus)
Peripheral Component Interconnect Extend Bus (PCI-X bus)

(3) Interconnects

Interconnects are other connects relay on the PCI (main bus).

SCSI bus: for SCSI disks
Expansion/Perpherial Bus: keyboards, parallel port, serial port, etc.
Bridge Controller: connect different interconnects

(4) Device Drivers

Device drivers are device-specific software components responsible for,

device access
device management
device control

(5) Types of Devices

Block Devices: blocked data like disks supports directly access
Character Devices: get/put character data like keyboard
Network Devices: stream data with flexible blocks

(6) Device Files

OS maintain abstractions of devices as device files in the file system. Common device file names are,

tty : special devices representing terminal stations
sda / hda: disks like hard drives, SSDs, CD-ROMs, etc.
lp : for printers
mem : access to the physical memory
etc.

(7) Pseudo Device

Linux also supports what is called pseudo or virtual devices like,

blackhole: /dev/null
random string: /dev/random

(8) CPU-to-Device Interactions

Memory Mapped I/O: CPU write to device file and PCI send to file
I/O Port Model: special instructions for accessing a particular device
Programmed I/O: CPU directly write command and data to device registers
Direct Memory Access: CPU directly write command to device registers but data is written from the memory based on the DMA controller. We don’t have to send commands many times

(9) Device-to-CPU Interactions

Interrupt: generate an interrupt to CPU
Poll: send the device register values to CPU

(10) Device Access Procedure

perform a system call
run the in-kernel stack
invoke the appropriate driver
performs the configuration of the request to the device
perform the request
send back results

(11) OS Bypass

It’s not necessary to access a device in the kernel mode. Instead, we can used some library supported device accesses, and this is called the OS bypass.

(12) Synchronous Access and Asynchronous Access

Synchronous Access: Block the process when accessing a device
Asynchronous Access: Not block the process when accessing a device

(13) Block Device Access Procedure

User Process: Files
Kernel: File system
Driver: Generic block layer that is different types of disks
Device: Different types of disks

(14) Virtual File System (VFS)

We use virtual file systems because Linux wants to support different file systems.

(15) VFS interface

All the different file system details will be hidden by a virtual file system interface.

(16) Files

Files are the basic data elements on which VFS operates and the OS describes files as file descripters.

(17) Inode

Inodes are the true file alias that the Linux OS uses because the machine can not understand file names. Each inode can be mapped to a specific file.

(18) Directory

A directory in Linux is a file with its contents including file path — inode pairs called dentry (directory entry).

(19) Inode Data Structure

The inode contains some important information like,

list of all data blocks
permission
size of the file
whether lock or not
etc.

(20) Dentry Cache

This is used to speedup the dentry search.

(21) EXT2 File System

Boot Block: code to boot the computer
Block Groups: the storage is divided into block groups
Superblock: info about the overall block group (# of inodes, # of disk blocks, the start of the free blocks)
Group Descriptor: overall block group state (bitmaps, # of free nodes, total #of directories)
Bitmaps: used to quickly find a free block or a free inode
Inodes: each is a 128-byte long data structure that describes exactly 1 file (the owner of the file, accounting information, etc.)

(22) Inode for Data

Because the inode of a file contains a list of all data blocks of a file, we can read from these blocks to get the file data. However, the file size is limited by the inode list length.

(23) Inode with Indirect Pointers

Indirect pointers can be used for inodes to store a file with a much larger size.

(24) Disk Access Optimization

FS use several techniques to reduce the disk access and disk overheads,

caching: reduce access
prefetching: reduce access
journaling: reduce random access
I/O scheduling: reduce disk head movements

6. Virtualization

(1) Virtual Machine

Each operating system together with its applications as well as the virtual resources that it pings at us is called a virtual machine.

(2) Virtual Machine Monitor (VMM)

The virtual machine is taken to be an efficient, isolated, duplicate of the real machine, which is supported by a virtual machine monitor (VMM). The VMM is actually the layer that enables VMs to exist. It has the following characters,

essentially identical to the original machine
minor decrease in program speed
complete control of the system resources

(3) Virtualization Reasons

Consolidation
Migration
Security
Debugging
Legacy OS
etc.

(4) Bare-Metal Virtualization (BMV) Model

If all VMs rely only on the VMM, then it is called a bare-metal virtualization model.

(5) Bare-Metal Virtualization (BMV) with Service VM

If we have a privilege standard service OS control, manage, and configuring the VMM and other VMs, then it is called a BMV with Service VM. For example, Xen (service VM is dom0), and VMware ESX (service VM are remote APIs).

(6) Hosted Virtualization Model

If at the lowest level of the hosted model, there is a full-fledged host OS that manages all of the hardware resources, then this is called a hosted virtualization model. For example, KVM (hardware virtualized by QEMU).

(7) Protection Levels for Virtualization

We have two ways for virtualization protection. We can,

If all in the non-root mode,

If all in the root mode,

(8) VMexits and VMentry

Attempts by the guest OS to perform privileged operations are called VMexits. When the hypervisor completes its operation, it passes control back to the virtual machine by performing a VMentry.

(9) CPU Virtualization

Trap-and-Emulate Technique: trap to the supervisor, see illegal or legal before emulation. It has a problem because instructions at that time has no trap.

Binary Translation Technique: use software to detect the instructions that should be trapped

Full Virtualization: VM doesn’t know it’s virtual.
Paravirtualization: VM knows it’s virtual so that they can all hypercalls to access the hypervisor. This is popularized by Xen.

(10) Memory Virtualization

Direct Translation: Direct translation from VA (VMapplication) to PA(VMOS) to MA (hypervisor). Expensive.
Shadow Page Table: Create shadow page table mapping VA to MA.
Paravirtualization: No need for the ordinary page table, just a shadow page table is okay and it can use hypercalls to update the shadow page table.

(11) Device Virtualization

Passthrough Model: get to the device register directly. Exclusive and hard to share
Hypervisor-Direct Model: get to device register by VMM. Easy to share but complex and slow
Split Device Driver Model: get to device register by service VM (not through VMM). Reduce overheads and have better management

7. RPC, DFS, DSM, and Cloud Computing

(1) Remote Procedure Calls (RPC)

RPC is an IPC mechanism for cross-machine interactions. It has the complexity of sockets by providing high-level procedure call interfaces.

(2) Client Stub

It seems like an ordinary function in the client but what it does is integrating information in a buffer and send it to the server.

(3) Server Stub

It seems like a function in the server but what it does is unpacking messages from the server, conduct the operation and send the result back.

(4) RPC 9 Steps

Register: server construct with basic information
Bind: client finds the server by a database called registry
Call: client pass information to the client stub
Marshal: client stub integrate information into the buffer
Send: the client sends the message buffer to the server
Receive: server receives the message
Unmarshal: the server stub unpacks the message into information
Actual call: get arguments from the information and conduct an actual procedure call
Result: server stub sends the result back in a message

(5) Interface Definition Language (IDL)

IDLs serve as a protocol of how the agreement of client and server will be expressed. This is useful because the client and server may be written by different developers in different languages.

Language-agnostic IDL: XDR
Language-specific IDL: Java RMI

(6) IDL Content

IDL is an agreement between the server and client, which should contain information like,

procedure names
argument types
result types
version number

(7) RPC Pointers

Passing merely the pointers in a client is useless for the server, so a proper RPC will also pass the data structure related to that pointer called serializing.

(8) RPC Failures

Because RPC is unlikely to provide a detailed error, we will use the error notifications without claiming the details.

(9) Distributed File System

DFS is remote file systems supported by VFS over the network.

(10) 2 Types of DFS Models

Server Model: DFS is on the server, centralized and difficult to scale
Peer Model: DFS is on the other machines, easy to scale by adding machines

(11) Types of Remote File Services (RFS)

Upload-Download Model (UDM): means download, operate, upload. Quick operations but server loses control
True Remote File Access Model (TRFAM): means RPC, operate, download when necessary. Slow and easily overloaded
Remote File Service/Access Model (RFAM): a combination of UDM and TRFAM. If the client has data in local memory, operate, broadcast, upload when kicked out. If the client doesn’t have data in local memory, RPC, operate, download when necessary.

(12) Types of Servers

Stateless server: server doesn’t contain state information from the client. Need self-described messages from the client
Stated server: server contains state information from the client. Need Checkpointing

(13) DFS Cache Coherence

We have to maintain state information of caches on the client and server by file-sharing policies called semantics.

Session Semantics: write back when close, get the latest by open
Periodic Updates: write back periodically, get the latest by open
Immutable Files: simply only-read policy
Transaction: API supported policies

(14) DFS Replication

There are two ways for DFS replication because we need some fault tolerance,

Replication: extra servers for a copy of all the files
Partitioning: each server stores only a part of all the files

(15) DFS Examples

NFSv3: stateless
NFSv4: stated
Sprit DFS: periodic updates

(16) DSM

DSM is a service that manages the memory across multiple nodes.

Hardware Supported DSM: supported NIC
Software Supported DSM

(17) Memory Sharing Granularity

Variables: bad
Pages or Objects: better, but will have false sharing problems.

(18) Memory Access Algorithms

Single reader, single writer (SRSW): simplest but not commonly used
Multiple readers, single writer (MRSW) or Multiple readers, multiple writers (MRMW): should maintain write order and consistency

(19) DSM Performance Metrics

The main metric for measuring DSM performance is the latency.

(20) DSM Optimization

Migration: if access then move
Replication: if access then copy

(21) DSM Consistency

push invalidations when write
pull periodically on demand

(22) Memory Consistency Models

Strict Consistency: all accesses strictly in the give sequence
Sequential Consistency: different processes see accesses in the same order
Causal Consistency: one process sees accesses in order
Weak Consistency: no consistency guarantee, need synchronization

(23) Types of Internet Services

presentations: static
business logic: dynamic
database tier: data storage and management

(24) Cloud Computing Functions

shared resources
APIs for different services
billing and accounting

(25) Cloud Deployment Models

public model
private model
hybrid model
community model

(26) Cloud Service Models

Infrasture as a service (aka. IaaS) Model: Amazon EC2
Platform as a service (aka. PaaS) Model: google APPs
Software as a service (aka. SaaS) Model: Gmail

(27) Cloud Requirements

fungible resources
elasticity and flexibility
scalability
failure tolerance and exception handling
isolation for multi-tenancy
security

(30) Cloud Technique Implementations

Virtualization
Resource provisioning
Big Data
Software
Monitoring