Operating System 8 | A Tutorial for the PThread Programming

Series: Operating System

Operating System 8 | A Tutorial for the PThread Programming

  1. PThreads Basis

(1) The Definition of PThreads

Birrel’s paper talked about threads in a general way. However, PThreads is a very concrete multithreading system and it is the default standard in the UNIX system. PThreads stands for POSIX threads, and POSIX means Portable Operating System Interface, which is a kind of interface that the operating system needs to support. Within POSIX, PThreads describes the threading APIs the operating system needs to support.

(2) PThread Headers

In order to use the PThread interfaces, we have to include the header pthread.h at the beginning of the CPP script,

#include <pthread.h>

What’s more, we can also import some other headers for some other purposes,

  • iostream because then we can use the I/O stream
  • unistd.h because then we can use the sleep function
  • errno.h because then we can have detailed errors

(3) PThread Creation

From Birrel’s paper, we have known that we can create a thread by fork. So,

hThread = fork(proc, args)

where hTread is the handler of our created thread, proc is the program counter that we want to assign for the new thread, and args is some other arguments for our new thread.

For PThread, we should use the pthread_create call to create a thread. While the return value of this call is the error value (return 1 if the creation fails) instead of the handler, we have to manually create a handler by pthread_t type (this is actually defined by typedef unsigned long int pthread_t;),

pthread_t hThread;

And then transfer this handler to the pthread_create call, which has a synopsis of,

int pthread_create(pthread_t *thread,
const pthread_attr_t *attr,
void *(*start_routine) (void *),
void *arg);

where thread should be the address of our handler, pthread_attr_t is a data structure that we can specify something about our PThread, start_routine should be the program counter we would like to assign for this thread (similar to proc), and arg is for some other arguments (similar to args).

To create our new thread, we can directly call,

pthread_create(&hThread, NULL, thread, NULL);

Where hThread is the handler that we have initialized and thread is the function for the new thread to start with. We assign NULL (means to use default values) for both the arg and the attr because we don’t want to specify these values now.

(4) PThread Termination

For Birrel’s paper, we have also talked about the join function, which allows one thread to wait until another thread completes its execution. For example,

ret = join(hThread);

is used to suspend the current thread until the hThread thread finishes.

For PThread, we have pthread_join to do exactly the same thing. The synopsis of this call is,

int pthread_join(pthread_t thread, void **value_ptr);

To terminate a thread with a handler hThread, we can directly call,

pthread_join(hThread, NULL);

We assign NULL (means to use default values) for value_ptr because we don’t want to specify this value now.

(5) Example Code 1: PThread Creation and Termination

Now, let’s implement our first code example. The purpose of this CAT.cpp code is quite simple. We ought to create a new thread by the current thread, and then we suspend the current thread and wait for the complement of the newly created thread.

To test your code, you may need to run,

$ g++ -Wall -lpthread CAT.cpp -o CAT && ./CAT

Or you can use the Makefile file as follows,

The example code is,

The output should be,

Creating a PThread...
Successfully created!
Block the current thread...
PThead: This is a pthread.
End of execution.

Note that because we use sleep(1); for the thread, so you may notice that there will be a 1-second delay before the output End of execution.

2. PThread Attributes

(1) PThread Attributes

In the example above, we have created a thread by pthread_create, and for the second argument attr, we used NULL for the default parameters. NULL is equivalent to the attr if we only do the initialization,

pthread_attr_t attr;
pthread_attr_init(&attr); // initize attr with the default values

and then call,

pthread_create(&hThread, attr, thread, NULL);

Actually, the pthread_attr_t structure can be used to specify certain attributes in a flexible way. For a thread, we have the following attributes,

  • stack size
  • scheduling policy
  • priority
  • inheritance
  • joinable
  • system/process scope

(2) Joinable PThreads and Detaching PThreads

Now, let’s talk about the joinable attribute. In PThreads, the default behavior of thread creation is joinable. A joinable child thread means that this thread can join the parent thread at a later time. So parent thread can not complete unless the joinable child thread completes in the first place. If the parent thread exits early, the children thread will become a zombie thread

However, a detached child thread means that when this new thread is created, it has no relationship with the parent thread. So once the thread is detached, this new thread can not be joined to its parent thread.

If we want to create a detached child thread, we can use,

pthread_attr_setdetachstate(&attr, PTHREAD_CREATE_DETACHED);

Note that we can always use pthread_detech to detach a thread (i.e. hThread) whenever we want by,

pthread_detach(hThread);

If we want to create a joinable child thread, we can either keep the result settings or use,

pthread_attr_setdetachstate(&attr, PTHREAD_CREATE_JOINABLE);

(3) System and Process Scope

We have talked about the system scope and the process scope for any multithreading systems. For PThreads, if we want to specify a system scope, we can use,

pthread_attr_setscope(&attr, PTHREAD_SCOPE_SYSTEM);

For PThreads, if we want to specify a process scope, we can use,

pthread_attr_setscope(&attr, PTHREAD_SCOPE_PROCESS);

(4) Set the Stack Size

We can also set the stack size of a PThread by its attribute. Let’s say that we would like to set the stack size of our new thread to 512 bytes, then we should set,

pthread_attr_setstacksize(&attr,512);

(5) Example Code 2: PThread Attributes

In this case, we are going to continue with the first example code. In the previous example, the thread function is relatively simple with,

void *thread(void *ptr) {
cout << "PThead: This is a pthread." << endl;
sleep(1);
return 0;
}

Now, we would like to create a PThread within this function. For this newly created thread, we are going to set it to a detached thread with system scope and a 512-byte stack,

pthread_attr_t attr;
pthread_attr_init(&attr);
pthread_attr_setdetachstate(&attr, PTHREAD_CREATE_DETACHED);
pthread_attr_setscope(&attr, PTHREAD_SCOPE_SYSTEM);
pthread_attr_setstacksize(&attr, 512);

Then we have to create this new thread by these attribute rules,

pthread_create(&hThread, &attr, foo, NULL);

Note that we are going to call the foo function because if we call thread function at this time, we are going to create an infinite loop of creating child threads. The foo function can be defined by,

void *foo(void *ptr) {
cout << pthread_self() << ": This is a pthread." << endl;
sleep(1);
return 0;
}

You may notice that we use the function pthread_self() and this call obtains the ID of the current thread (as tread ID). Thus, we can identify which thread we are in from the output information. We can also compile the script by either g++ or Makefile.

The final script is,

The code above can also be found in my repo.

The output for this code will be (it’s okay if the thread IDs are different),

0x1158865c0: Creating a PThread... 
0x1158865c0: Successfully created!
0x1158865c0: Block the current thread...
0x700008115000: This is a pthread.
0x700008115000: Creating a PThread...
0x700008115000: Successfully created!
0x700008115000: Block the current thread...failed
0x700008198000: This is a pthread.
0x700008115000: End of execution.
0x1158865c0: End of execution.

Or,

0x1158865c0: Creating a PThread... 
0x1158865c0: Successfully created!
0x1158865c0: Block the current thread...
0x700008115000: This is a pthread.
0x700008115000: Creating a PThread...
0x700008115000: Successfully created!
0x700008115000: Block the current thread...failed
0x700008115000: End of execution.
0x700008198000: This is a pthread.
0x1158865c0: End of execution.

The sequence of the cout can be different because of the race condition, which means that a thread tries to read/write a value while another thread modifies it. More information can be found from here.

The diagram of this program is,

(6) Example Code 3: Multiple PThreads

In this example, we are going to create two child threads for the parent thread by thread function and then create 1 thread for each of these two child threads by foo function. All the attributes of the child threads should be set as DEFAULT by NULL.

The output for this code may be (it’s okay if the thread IDs are different),

0x1100985c0: Creating a PThread...
0x1100985c0: Successfully created!
0x1100985c0: Successfully created!
0x1100985c0: Block the current thread...
0x700006951000: This is a pthread.
0x7000069d4000: This is a pthread.
0x700006951000: Creating a PThread...
0x7000069d4000: Creating a PThread...
0x700006a57000: This is a pthread.
0x7000069d4000: Successfully created!
0x7000069d4000: Block the current thread...
0x700006951000: Successfully created!
0x700006951000: Block the current thread...
0x700006ada000: This is a pthread.
0x700006951000: End of execution.
0x7000069d4000: End of execution.
0x1100985c0: End of execution.
0x1100985c0: Block the current thread...
0x1100985c0: End of execution.

Sometimes, you may notice an out-of-order output as,

0x700006f7f0000x700006efc000: Block the current thread...
: This is a pthread.

This is also because of the race condition of the output stream. You can re-execute the program several times to get an in-order output stream.

The diagram of this program is,

(7) Quiz 1

Now, let’s see a quiz. Suppose we have the following program,

Then our problem is, how many times of “Hello Thread” are we going to capture in the output? You can first answer this question and then run the code to see if you are right.

3. PThread Arguments and Mutex

(1) PThread Arguments

In the previous cases, we have used the tread IDs (like 0x…) to identify different threads. However, can we have a much simple number for us to track the threads? For example, we may want to set a unique thread number for each of the threads we have created. We are going to use the PThread arguments arg of pthread_create to implement this feature,

int pthread_create(pthread_t *thread,
const pthread_attr_t *attr,
void *(*start_routine) (void *),
void *arg);

For example, if we want to pass a unique integer i for each of these PThreads, we can call the following pthread_create function with the address of i being transmitted,

pthread_create(&hThreads, NULL, thread, &i)

And in the thread function, we can achieve this integer value by the ptr parameter,

void *thread(void *ptr) {
int *p = (int *) ptr; // convert ptr to pointer, assign to p
int num = *p; // get the value of pointer p
}

Then we can output the number by,

cout << "Thread No." << num << " has ID: " << pthread_self() << endl;

In this case, we would like to create 4 child threads for the current thread and then determine them respectively. In terms of the output, we will remove all the previous couts like Creating a PThread… to avoid an over-complicated result.

#define NUM_THREADS 4

We would also like to test the whole program for 5 times to see the result, so

#define EXE_TIMES 5

So the overall code is,

What we expect for the result is that, for each of the trials, we expect to see something like,

======== i-th trial ========
Thread No.1 has ID: 0x70000c1a2000
Thread No.2 has ID: 0x70000c225000
Thread No.3 has ID: 0x70000c2a8000
Thread No.4 has ID: 0x70000c32b000

However, we may get something like,

======== i-th trial ========
Thread No.2 has ID: Thread No.0 has ID: 0x70000c1a2000
Thread No.0 has ID: 0x70000c32b000
0x70000c225000
Thread No.0 has ID: 0x70000c2a8000

There are two problems with this result. Firstly, the cout is not outputted in turn, so different threads can write to the output stream in an arbitrary order. Secondly, the number that we have seen is not in order, which means we expect to see 1,2,3,4... instead of 2,0,0,0.... Let’s deal with these problems one by one.

(2) Mutex

First, we have seen that the cout is not safe for a multithreading system because the outputs can be out-of-order. Thus, we have to use a mutex to lock the cout until it finishes. In order to do so, we should include the mutex.h header first,

#include <mutex>

And then, we can define a global mutex variable mu by,

mutex mu;

To use this lock, we can try mu.lock(); and mu.unlock(); before and after the code segment we want to lock,

mu.lock();
cout << "Thread No." << num << " has ID: " << pthread_self() << endl;
mu.unlock();

So generally the code should be,

Now, we can get a result in the order. For example,

======== 1-th trial ========
Thread No.3 has ID: 0x700009af4000
Thread No.0 has ID: 0x700009b77000
Thread No.1 has ID: 0x700009c7d000
Thread No.2 has ID: 0x700009bfa000
======== 2-th trial ========
Thread No.1 has ID: 0x700009af4000
Thread No.3 has ID: 0x700009b77000
Thread No.3 has ID: 0x700009bfa000
Thread No.1 has ID: 0x700009c7d000
======== 3-th trial ========
Thread No.3 has ID: 0x700009af4000
Thread No.0 has ID: 0x700009b77000
Thread No.1 has ID: 0x700009c7d000
Thread No.1 has ID: 0x700009bfa000
======== 4-th trial ========
Thread No.1 has ID: 0x700009af4000
Thread No.0 has ID: 0x700009bfa000
Thread No.0 has ID: 0x700009c7d000
Thread No.1 has ID: 0x700009b77000
======== 5-th trial ========
Thread No.2 has ID: 0x700009af4000
Thread No.3 has ID: 0x700009b77000
Thread No.1 has ID: 0x700009bfa000
Thread No.3 has ID: 0x700009c7d000

(3) Output Analysis

In the output above, we can find out that the outputted result has two features,

  • it doesn’t have to be in-order. For example, we may expect
Thread No.0 has ID: 0x700009af4000
Thread No.1 has ID: 0x700009b77000
Thread No.2 has ID: 0x700009c7d000
Thread No.3 has ID: 0x700009bfa000

However, in the 1-th trial, what we have is

Thread No.3 has ID: 0x700009af4000
Thread No.0 has ID: 0x700009b77000
Thread No.1 has ID: 0x700009c7d000
Thread No.2 has ID: 0x700009bfa000

This is because we don’t have any controls on how to schedule these newly created threads. So the actual order of how these threads are actually performed is unclear and can be different from the order they were created.

  • it doesn’t have to contain every number. If some numbers are lost, some other numbers can repeat. For example, in the 2-th trial, we have
Thread No.1 has ID: 0x700009af4000
Thread No.3 has ID: 0x700009b77000
Thread No.3 has ID: 0x700009bfa000
Thread No.1 has ID: 0x700009c7d000

This is because that the value of the integer variable i is a global variable, and it is passed to the thread function by its address. So when this variable i changes in one thread, all the other threads will see the new value.

So when we create a new thread with pthread_create(&hThreads[i], NULL, thread, &i), if the parent thread does i++ before we do int num = *p; in this new thread, we can not have the original value of this thread. Instead, we will have a value of i after the increment.

(4) Missing Number Problem

Based on the discussion above, let’s now deal with the missing number problem. Because we have known that the missing value is because that the i++ is executed before the thread reads i, so we can actually assign this value to another variable to avoid the change of variable i.

Let’s say we can create a new int variable called tmp_num (temporary number),

int tmp_num;

And then we assign the value i to it before we call the pthread_create function,

tmp_num = i;

Then we call pthread_create with respect to this argument,

pthread_create(&hThreads[i], NULL, thread, &tmp_num);

The output can be,

======== 1-th trial ========
Thread No.1 has ID: 0x70000221f000
Thread No.2 has ID: 0x7000022a2000
Thread No.3 has ID: 0x700002325000
Thread No.3 has ID: 0x7000023a8000
======== 2-th trial ========
Thread No.1 has ID: 0x70000221f000
Thread No.2 has ID: 0x7000022a2000
Thread No.3 has ID: 0x700002325000
Thread No.3 has ID: 0x7000023a8000
======== 3-th trial ========
Thread No.1 has ID: 0x70000221f000
Thread No.2 has ID: 0x7000022a2000
Thread No.3 has ID: 0x700002325000
Thread No.3 has ID: 0x7000023a8000
======== 4-th trial ========
Thread No.1 has ID: 0x70000221f000
Thread No.2 has ID: 0x7000022a2000
Thread No.3 has ID: 0x700002325000
Thread No.3 has ID: 0x7000023a8000
======== 5-th trial ========
Thread No.1 has ID: 0x70000221f000
Thread No.2 has ID: 0x7000022a2000
Thread No.3 has ID: 0x700002325000
Thread No.3 has ID: 0x7000023a8000

It is pretty because now we can still find some duplicated numbers (i.e. 3 for each trial). So what’s wrong with this? Well, this is because when we pass the &tmp_num , the tmp_num is still a global variable. So if the instruction of the next loop tmp_num = i; executed before the int num = *p; , we will still have the wrong number for this.

The method to deal with this problem is that we can assign different int variables for different numbers. So the newly changed value will not influence the previous values that we have stored. The simplest way to create several int variables and then assign values to them is by creating an int array. Thus, we can create an array by,

int nums[NUM_THREADS];

And then we assign the value i to each of its elements before we call the pthread_create function,

nums[i] = i;

Then we call pthread_create with respect to this element,

pthread_create(&hThreads[i], NULL, thread, &nums[i]);

So finally, our code should be as follows. You can also find this code from my repo.

And we will have an output of,

======== 1-th trial ========
Thread No.1 has ID: 0x70000e349000
Thread No.0 has ID: 0x70000e2c6000
Thread No.2 has ID: 0x70000e3cc000
Thread No.3 has ID: 0x70000e44f000
======== 2-th trial ========
Thread No.0 has ID: 0x70000e2c6000
Thread No.1 has ID: 0x70000e349000
Thread No.3 has ID: 0x70000e44f000
Thread No.2 has ID: 0x70000e3cc000
======== 3-th trial ========
Thread No.0 has ID: 0x70000e2c6000
Thread No.1 has ID: 0x70000e349000
Thread No.2 has ID: 0x70000e3cc000
Thread No.3 has ID: 0x70000e44f000
======== 4-th trial ========
Thread No.1 has ID: 0x70000e349000
Thread No.3 has ID: 0x70000e44f000
Thread No.2 has ID: 0x70000e3cc000
Thread No.0 has ID: 0x70000e2c6000
======== 5-th trial ========
Thread No.0 has ID: 0x70000e2c6000
Thread No.2 has ID: 0x70000e3cc000
Thread No.1 has ID: 0x70000e349000
Thread No.3 has ID: 0x70000e44f000

It is quite good to see that we have no duplicated values or missing values now. But there is a problem because we still have the out-of-order execution of the threads. Well, we are not going to talk about the scheduling problem in this section.

4. Producer-Consumer Model

(1) Producer-Consumer Model

The producer-consumer problem (also known as the bounded-buffer problem) is a classic example of a multi-process synchronization problem. It manages to solve the reading and writing problems to shared buffer in a multithreading system.

The producer thread generates some data and then puts those data into a shared buffer and it should stop putting the data and wait when the buffer is filled. While the consumer thread consumes the data in this buffer and it should stop consuming the data and wait when the buffer is empty.

Note that this buffer should be a first in first out (FIFO) buffer.

(2) Shared Buffer

Now, let’s see how we can implement a shared buffer. Because the shared buffer is a FIFO data structure, we can implement it by an array buffer and three index variables num , add , rem and all these three variables should be initialized with 0.

int buffer[BUFFER_SIZE];
int num = 0;
int add = 0;
int rem = 0;

The size of the array can be specified by the macro BUFFER_SIZE with the define. Suppose we want to define a shared buffer with 5 elements, then,

#define BUFFER_SIZE 5
  • the variable num is the number of elements in this buffer. We can use this value to see if this buffer is full (when num == BUFFER_SIZE) or empty (when num == 0).
  • the variable add is the index for inserting the next data. Whenever we need to insert data into the buffer, we can use the buffer[add] = value; instruction. After inserting, we have to add 1 to this variable ( by add++;) to trace the next inserting index. Remember, we should also increase num by 1 if we insert a unit of new data.
  • the variable rem is the index for removing the next data. Whenever we need to remove data from the buffer, we can directly add 1 to this variable (by rem++;). Remember, we should also decrease num by 1 if we remove a unit of the data.

The diagram of this buffer is,

(3) PThread Mutex

In the previous case, we have used the mutex header for the lock. Now, we would like to introduce the PThread mutex. In the previous case, the structure of a mutex lock is,

mutex m; /* global */
m.lock();
// Critical Section
m.unlock();

For the PThread mutex, similarly, we can have,

pthread_mutex_t m;  /* global */

But this is not enough for a PThread mutex. Each PThread mutex needs to be initialized by,

pthread_mutex_init(&m, NULL);

where NULL means the default initialization.

Or we can even initialize a mutex at the moment we define it,

pthread_mutex_t m = PTHREAD_MUTEX_INITIALIZER;  /* global */

Then, we can lock the critical section by,

pthread_mutex_lock(&m);
// Critical Section
pthread_mutex_unlock(&m);

The PThread mutex also provides some other operations which can be convenient for us to use. For example,

  • trylock call: this can be used to lock a mutex. However, if the mutex is locked, the current thread will not be blocked and this call will return EBUSY to show that the mutex was already locked. So we can do something else and maybe try again later.
pthread_mutex_trylock(&m);
  • destory call: we should remember that we need to destroy the mutex to release some memory it takes after the execution.
pthread_mutex_destory(&m);

(4) PThread Mutex Tips

We are going to meet hundreds of mutex bugs in the future, so it would be better if you keep in mind some tips when you use the mutex. Here are some common tips that we have to remember,

  • Shared data should always be accessed through a single mutex.
  • Mutex scope must be visible to all. This means that it has to be a global variable.
  • For all threads, maintain that there are global ordered locks. This means that we have to lock the mutexes in order for all threads in case of a deadlock.
  • Always unlock a mutex and also unlock the correct mutex! It is easy to forget the unlock in most cases.

(5) PThread Condition Variable

PThread also provides us a lot of operation for the condition variables. In Birrel’s mechanism, we have talked that we need to have a condition type Condition, which is provided by PThread as pthread_cond_t,

pthread_cond_t Cond;

We also talked that we have to wait for the signal by Wait(mutex, cond); and in PThread, this should be,

pthread_cond_wait(&Cond, &m);

Also, we have to use the Signal(cond); and the Broadcast(cond); calls for sending conditions, this is implemented by,

pthread_cond_signal(&Cond);

and,

pthread_cond_broadcast(&Cond);

There are also some other operation for the conditional variables, for example, we can initialize a condition by (where NULL means by default),

pthread_cond_init(&Cond, NULL);

or through its definition by,

pthread_cond_t Cond = PTHREAD_COND_INITIALIZER;

We can also destroy a condition variable by,

pthread_cond_destory(&Cond);

(6) PThread Condition Variable Tips

There are also some tips we should remember for using the condition variables,

  • Don’t forget to notify the waiting threads and make sure you are signaling or broadcasting the correct condition variable.
  • Always use broadcast if you are in doubt of using whether the signal or the broadcast. If you are wrong, you only lose some performance.
  • You don’t need a mutex for signaling or broadcasting.

(7) Producer-Consumer Model Implementation

Now, let’s implement the producer-consumer model with PThreads.

You can also find this example code from my repo. Note that you have to quit the process of this program manually because the consumer thread won’t terminate automatically (it will always listen to the new data available in the shared buffer).