Operating System 19 | Libcurl Library, Sys V IPC APIs, and Linux Signals

Series: Operating System

1. libcurl Library

(0) libcurl Installation

libcurl may not be installed on our system, so before using it we should check for it and install it if necessary. We can use the following command to test whether the libcurl development files are installed.

$ curl-config --libs

If the output is,

-bash: curl-config: command not found

this means that we must install it first. We can install the libcurl library with,

$ sudo apt install libcurl4-nss-dev

Then we can test it again,

$ curl-config --libs

If libcurl was installed successfully, we get the following output,

-lcurl

However, this check alone may not be enough for the project, because later we may meet the error Problem with the SSL CA cert (code 77). When we meet this error, we would suggest deleting the whole environment and re-installing it as follows,

  • a. Open Oracle VirtualBox
  • b. Power off your environment
  • c. Right-click on the environment and press Remove
  • d. Then delete all files

(1) Why Should We Use Libcurl?

You may already know the curl command in Linux, which we can use to send an HTTP request. For example,

$ curl https://stallman.org/internet-voting.html

Then we can get the content of this webpage. libcurl is the library behind the curl command, and we can use libcurl’s easy C interface to do the same thing from a program. Now, let’s see how it works in the C language.

(2) Initializing Libcurl

Before we use the easy interface of libcurl, we first have to know how to initialize it. libcurl keeps a global environment that must be set up before any other libcurl function is used and torn down when we are done. In practice this means calling curl_global_init once at the start of the program and curl_global_cleanup once at the end. The flag of curl_global_init should be set to CURL_GLOBAL_ALL so that every available subsystem is initialized.

curl_version can be called to get a string describing the installed libcurl, so that we can confirm that the libcurl environment is successfully set up.

The following code can be used to test how to do this initialization.

In the end, you will get a result of,

Libcurl Version: libcurl/7.54.0 LibreSSL/2.6.5 zlib/1.2.11 nghttp2/1.24.1

(3) Curl’s Easy Interface

Now, let’s do something more. Let’s see how we can mimic the command curl and print the content of a webpage as our output. In this case, the curl’s easy interface will be used. We will continue with the CURL environment that we have set up in the last section.

We can create an “easy handle” using the curl_easy_init call. The type of easy handle is a CURL type (which is defined by typedef void CURL; in the curl.h file).

We then set our desired options in that handle with the curl_easy_setopt function. In this case, we will only use the CURLOPT_URL option, which sets the URL that we are going to work on. The value of this option will be https://stallman.org/internet-voting.html in our case, because this webpage is simple enough to be used as an example.

The curl_easy_perform call performs the transfer. It fetches the content of the URL that we are requesting and, since we have not asked for anything else, prints it directly to the output.

It is good practice to call curl_easy_cleanup after the transfer is done. This function releases all the data structures related to the easy handle that we have set up.

The following code is an example that shows how the easy interface can be used by us.

The output should be,

Libcurl Version: libcurl/7.54.0 LibreSSL/2.6.5 zlib/1.2.11 nghttp2/1.24.1
<!DOCTYPE HTML>
<html>
<head>
...

(4) Write Callback

Now that we have mastered the basic easy interface of libcurl, let’s take it a step further. In the last example, the result was printed directly to the terminal, which can be inconvenient. For example, we may want to save the contents to a file, or to print only the part of the content that we need. In these cases, we can replace the write callback, the function libcurl calls to deliver the data.

If we do not set a callback, libcurl uses its default one, which writes the data to standard output. We can use curl_easy_setopt to replace it with a callback of our own for other specific cases (e.g. partially print, save as a file, etc.). The option used to set the callback function is CURLOPT_WRITEFUNCTION. For example,

curl_easy_setopt(easy_handle, CURLOPT_WRITEFUNCTION, writecb);

In order to reset the callback function properly, we should also maintain a memory data structure that can be used both in the callback function and the main function. This structure will help us store the data that we receive as a response from the HTTP server.

Two elements need to be maintained in this structure,

  • response is the pointer to a string that can be used to store the data received from the HTTP server
  • size is used to keep the length of the data we receive

So this structure can be defined as,

typedef struct memory {
    char *response;
    size_t size;
} datamemory;

The typedef gives this structure the shorter name datamemory. To make the structure visible to both the main function and the callback function, we will create an instance of it in the main function and then pass its pointer to the callback function, so the callback function also has access to this instance. We will call the pointer to this datamemory instance chunk.

Before we pass chunk to the callback function, we have to remember to allocate memory for our instance. We will use the malloc function in our case, and we also have to initialize the two members to NULL and 0,

datamemory *chunk = (datamemory *) malloc(sizeof(datamemory));
chunk->response = NULL;
chunk->size = 0;

Alternatively, we can zero the whole structure in one step with memset,

datamemory *chunk = (datamemory *) malloc(sizeof(datamemory));
memset(chunk, 0, sizeof(datamemory));

Then, the pointer to this datamemory instance can be passed to the callback function by using the CURLOPT_WRITEDATA option of curl_easy_setopt by,

curl_easy_setopt(easy_handle, CURLOPT_WRITEDATA, chunk);

The callback function size_t writecb(void *buffer, size_t size, size_t nmemb, void *user_p) takes the following arguments,

  • buffer is the data that we have received as a response from the server. It is not null-terminated.
  • size is the size of one data element in the buffer (not related to the chunk pointer that we have discussed). libcurl always passes 1 here.
  • nmemb is the number of elements in the buffer, so size * nmemb is the number of bytes delivered by this call. For most transfers, the callback gets called many times and each call delivers another piece of the data.
  • user_p is the chunk pointer that we have passed to this callback function by the curl_easy_setopt function

The callback must return the number of bytes it actually handled. If the return value differs from size * nmemb, libcurl treats it as an error and aborts the transfer.

Now, we are going to mimic the default behavior of printing the contents to the output. The difference is that we will also keep a copy of the content in the datamemory instance, which the main function can use later.

First, the real size of the content should be calculated by multiplying the size and nmemb. The whole size of the output will be assigned to a realsize variable,

size_t realsize = size * nmemb;

To use the user_p pointer as the chunk pointer, we have to convert its type from void * to datamemory *. The memory behind the response member has to be grown with realloc before we can append the data in the buffer to it. The new size should be chunk->size + realsize + 1, so that it can hold the data already stored, the newly received data, and one extra byte for a terminating '\0'.

datamemory *chunk = (datamemory *) user_p;
char *grown = realloc(chunk->response, chunk->size + realsize + 1);

Note that we have to check whether realloc succeeded. Storing its result in a temporary pointer first means the old buffer is not lost if the allocation fails,

if (grown == NULL) {
    fprintf(stderr, "Out of memory!\n");
    return 0;
}
chunk->response = grown;

After this procedure, we have a well-sized datamemory instance that can be used for us to store the data in the buffer. memcpy will then be called to copy the memory content in the buffer to the chunk.

memcpy(&(chunk->response[chunk->size]), buffer, realsize);

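Then we can update the size element in our datamemory instance with the realsize. Because the data from libcurl is not null-terminated, we also write a terminating '\0' into the extra byte we reserved. At the very end of the callback, we return realsize so that libcurl knows the whole buffer was handled.

chunk->size += realsize;
chunk->response[chunk->size] = '\0';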

To directly print the data now in the datamemory instance, we can use,

printf("%s", chunk->response);

Even after curl_global_cleanup has been called, we can still print the data, because it is saved in our own datamemory instance. This can be checked by adding the following code to the end,

printf("\nCheck the existence of the data =================== \n");
char data[256];
strncpy(data, chunk->response, 100);
data[100] = '\0';
printf("%s\n", data);
printf("end ================================================= \n");

Note that this code will print only the first 100 characters in the response from the HTTP server.

Putting all of the pieces we have discussed into one program, the output should be,

Libcurl Version: libcurl/7.54.0 LibreSSL/2.6.5 zlib/1.2.11 nghttp2/1.24.1
<!DOCTYPE HTML>
...
</html>
This is from the modified callback.
Check the existence of the data =================== 
<!DOCTYPE HTML>
<html>
<head>
<title>Internet Voting</title>
<meta http-equiv="Content-Type" con
end =================================================

(5) HTTP Requests Exceptions

When we request a wrong webpage address, the HTTP server will typically answer with 404 Not Found. What we expect is that, when we get a 404 Not Found response, we will not print the contents. Instead, we will print the error ERROR: Can’t find the resource. After curl_easy_perform returns, the curl_easy_getinfo function can be used to get the response code from the HTTP server. If the code is 200, we successfully found the resource we need. If the code is 404, the resource was not found. The response code can be stored in a status variable of type long.

long status;
curl_easy_getinfo(easy_handle, CURLINFO_RESPONSE_CODE, &status);

Then, if the status is 404, we print the error and return from the function.

if (status == 404) {
    fprintf(stderr, "ERROR: Can't find the resource.\n");
    return 0;
}

With this check added to the example, requesting an address that does not exist produces the output,

Libcurl Version: libcurl/7.54.0 LibreSSL/2.6.5 zlib/1.2.11 nghttp2/1.24.1
ERROR: Can't find the resource.

2. Inter-Process Communication Implementation

(0) Create a New Process

In this section, we would like to have multiple processes, so it is necessary for us to recall some of the functions for managing child processes. The fork function can be used to create a new process. The return value of this function in the child process will be 0, but in the parent process it will be the process ID of the new child, which is a positive value. If fork fails, it returns -1 and no child is created,

The output can be (note the value 56908 should be different),

56908: This is the parent process.
0: This is the child process.

(1) Pipe

Now, let’s see how we can use the pipe to communicate between two processes. In this case, we want to create a pipe so that the parent process can write to this pipe. After the parent writes to the pipe, the child can then read from the pipe and get the message it wants.

Before we call the function pipe to create a pipe for communication, we must maintain a 2-item array pipefd of the int type. The array pipefd is used to return two file descriptors referring to the two ends of the pipe. pipefd[0] refers to the read end of the pipe and pipefd[1] refers to the write end of the pipe. Then we can use the following code to create this pipe.

int pipefd[2];
if (pipe(pipefd) == -1) {
    printf("Unable to create pipe\n");
    return -1;
}

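Conceptually, the pipe is a one-way channel: bytes written to pipefd[1] are buffered by the kernel and come out of pipefd[0] in the same order.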

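Then, we can use fork to create a new process. In the parent process, we are going to use write to send data to the pipe. Because every message is written as a fixed block of STREAM_SIZE bytes, the text is first placed in a buffer of that size; passing a short string literal together with STREAM_SIZE would make write read past the end of the literal. For example,

char writemessage[STREAM_SIZE] = "Hi ";
write(pipefd[1], writemessage, sizeof(writemessage));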

Note that because the parent process only writes to the pipe, it never uses the pipefd[0] endpoint. Thus, it should close this end. Closing unused ends is also what allows a reader to see end-of-file once every write end has been closed.

close(pipefd[0]);

Similarly, in the child process, we will first close the unused write endpoint and then read from the read endpoint.

char readmessage[STREAM_SIZE];
close(pipefd[1]);
read(pipefd[0], readmessage, sizeof(readmessage));

In general, the following example code can be used to communicate between two processes via pipe,

The output should be,

Parent Process - Message writing to pipe
Parent Process - Message writing to pipe
Child Process - Reading from pipe – Message 1 is Hi
Child Process - Reading from pipe – Message 2 is Pipe!

(2) Named Pipe

Now, we have implemented the pipe, but there is a problem. The pipe can only be used by related processes (e.g. a parent process and its child), because the file descriptors have to be inherited. However, there are many situations where we want two unrelated processes (e.g. a client and a server) to communicate. In that case, the ordinary pipe is no longer useful. Instead, we will use the named pipe, which is also called a FIFO (first in, first out).

A named pipe behaves like an ordinary pipe, but it is visible as a special kind of file in the file system. The main idea is that two processes can open this file by its path, one for writing and the other for reading. The file follows the FIFO rule, which means that the data written first is read first.

To create a named pipe, we must choose a path for the named pipe file. Commonly, we put this file under the /tmp directory, which is usually cleaned up after rebooting. In our case, we will create the file at the path /tmp/myfifo,

#define FIFO_FILE "/tmp/myfifo"

Then the function mkfifo is called in each process to create the FIFO file. Whichever process calls it first creates the file; in the other process the call fails with EEXIST, which can be ignored. DEFFILEMODE is the permission we will use for this file, which simply means S_IRUSR | S_IWUSR | S_IRGRP | S_IWGRP | S_IROTH | S_IWOTH (read and write permission for all users). You can also pass 0666 for this argument, which means the same.

mkfifo(FIFO_FILE, DEFFILEMODE);

After the FIFO is created, the two processes can communicate with each other via this named pipe. Let’s see how it works. When one process wants to send a message to another, it will first open the file (in write-only mode) and then write the message to the named pipe. Note that open blocks until the other process has opened the FIFO for reading. After writing, the file should be closed by the close function.

int fd = open(FIFO_FILE, O_WRONLY);
write(fd, "Hello", strlen("Hello")+1);
close(fd);

Then, in the other process, if we want to read from the named pipe, we should first open the file (in read-only mode) and then read the message from the named pipe. After reading, the file should be closed by the close function.

char str[80];
fd = open(FIFO_FILE, O_RDONLY);
read(fd, str, 80);
close(fd);

Now, let’s see an example. It consists of two programs, fifoserver.c and fifoclient.c.

The output for the server will be,

Named pipe established! ==============
Server: send Connected!
Client: Hello Server

The output for the client will be,

Named pipe established! ==============
Server: Connected!
Client: send Hello Server

(3) Shared Memory

Now, let’s see how we can use the shared memory. We will see two examples in this section.

In the first example, we will have to communicate between two related processes (i.e. a parent process and its child). The shared memory will be attached to both these two processes and then they are able to communicate via this shared memory. In this case, we are going to use the Sys V APIs for sharing the memory.

The basic idea of the first program is as follows. First, the shared memory is created by the shmget function. In this case, we do not generate a unique key; instead, we pass IPC_PRIVATE (i.e. value 0) for this argument. This is because the segment is only used by a process and its child, so no other process needs a key to look it up.

Then, shmat is called to attach the current process to the shared memory we have created by shmget. The return value of shmat is a void *, so we have to convert it to a pointer to the data structure we will use to store the shared data.

Although we can use a simple string for communication, in this case, we are going to use a user-defined data structure called datamemory. Similarly, this data structure has two elements as we have discussed. One is the message variable that we are going to use for storing the shared data. The other is the size variable that represents the length of the shared data.

typedef struct {
    char message[SHARED_MEM_SIZE];
    int size;
} datamemory;

Then fork is called to create the child process. Because we have already called shmat, both processes are now attached to the same shared memory. Because they can read and write the shared memory at the same time, we need some synchronization to make sure that the child has written before the parent reads the final value. We are going to discuss proper synchronization later; for now we will simply use sleep to order the two processes.

First, the parent reads from the memory and gets nothing, because the memory is still empty. For this to happen, the child should sleep at the beginning. After the parent reads from the memory, it sleeps and waits for the child to write. In the end, after the child has written to the shared memory, the parent reads the shared memory again and gets the shared data.

After the child writes to the memory, it will wait for another second before it returns (Why? You can delete the sleep(1); to see what will happen). It should also detach from the shared memory because it has finished its task. However, it must not destroy the shared memory because the parent may still be reading from it.

When the parent process successfully read the data from the shared memory, it can be detached from it, and then we can call shmctl to destroy the shared memory because it will no longer be useful.

The example code should be,

And the output should be,

SHM Content:
SHM Content: Hello, shared memory.

Now, let’s see the second example. The second is much simpler than the first one, but this time the communication is between two unrelated processes. In this case, we will directly use a shared string instead of a user-defined data structure, because we do not need to keep the size of the data.

The writer process will first write to the shared memory and then it will be detached from the memory. The reader process will then read from the shared memory. After it prints the data in the shared memory, it will be detached from the shared memory. In the end, the shared memory will then be destroyed through shmctl by the reader.

What is different in this case is that, because this is not a private IPC anymore, we have to assign a key to this shared memory so that the other process can find it. Thus, the key is important in this case. We will use ftok to generate the key. Note that the first argument of this function must be the path of an existing file, but the file content is not relevant for us. The second argument is a project identifier. We will use 65, but you can change it to any other non-zero value, as long as both processes use the same path and the same identifier.

The example codes for this case are,

And the output for the reader should be,

Data read from memory: Hello, shared memory.

(4) Message Queue

The main difference between the message queue and the pipe is that the pipe can only be used for IPC between related processes. The main difference between the message queue and the named pipe is that the named pipe relies on a file, while the message queue does not. The main difference between the message queue and the shared memory is that the message queue doesn’t need explicit synchronization.

The Sys V API for creating a message queue is quite similar to the one for shared memory. msgget is called to create a message queue, and an identifier of this queue is returned. To create a message queue, we should get a unique key for it. The function msgsnd is used to send data to the message queue and msgrcv is used to read data from the queue. After the reader reads from the queue, msgctl is called to destroy the message queue.

Note that each read returns one whole message from the queue rather than an arbitrary number of bytes.

3. Linux Signals

(1) Signal Handler

Most Linux users use the key combination Ctrl+C to terminate processes in Linux. The reason behind this is that whenever Ctrl+C is pressed, a signal SIGINT is sent to the process. The default action of this signal is to terminate the process.

Now, let’s add a message. Before the process terminates, we want it to print Received SIGINT, quitting …, which means we must replace the default signal handler with a user-defined function. The handler can be replaced with the signal function: its first argument is the signal number, and its second argument is the callback function to run when that signal arrives. The example code of modifying the signal should be,

The output of the program should be,

^C
Received SIGINT, quitting ...

(2) Linux’s Kill Command

Now, let’s try another way to terminate a process. You may know that the kill command can be used to terminate a process in Linux. This is true but we must know the process id (i.e. pid) we would like to kill. Let’s run the program above again but we will not use Ctrl+C to terminate it this time. After executing the code above, we can use the following command to find its process id (in another terminal).

$ ps | grep signal

The output might be,

96001 ttys003    0:00.00 grep signal
96056 ttys005 0:20.09 /.../signal

The first line is not what we are looking for: it is the grep command itself, which matches because its own command line contains the keyword signal. The process in the second line is exactly the process we want, and its pid is 96056. You will probably get a different pid, which is fine. Then we can use the kill command to terminate the process,

$ kill 96056

By default, kill sends the signal SIGTERM, whose number is 15. We will find that the process is terminated, and the output will be,

Terminated: 15

(3) Modify SIGKILL handler

Now, let’s try to modify the SIGKILL handler just as we did for SIGINT, so that when we receive a SIGKILL, we can print Received SIGKILL, quitting … before the process is terminated. You can try the following program to see if it works,

Unfortunately, this does not work, and the output says,

Error: Can't catch SIGKILL

This is because SIGKILL is handled by the kernel, and we, as users, cannot change its handler. The same is true for the SIGSTOP signal. This is why the manual says,

The signals SIGKILL and SIGSTOP cannot be caught, blocked, or ignored.

You can also view this StackOverflow page for more insights.

(4) User Defined Signals

Even though we cannot modify the handling of SIGKILL, we can send user-defined signals to the process with the kill command. Linux provides two signals whose meaning is left to the application, SIGUSR1 and SIGUSR2. For example, when we receive a SIGUSR1, we can print Someone is knocking. and then carry on. Let’s see an example code,

The kill command can also be used to pass a specific signal to the process. For example, we can use the following command to send a SIGUSR1 to our process (remember to change the pid),

$ kill -USR1 96056

Then the process will print,

Someone is knocking.

Because the process does not quit after receiving this signal, you can send it several times. For example, we can use,

$ kill -USR1 96056
$ kill -USR1 96056
$ kill -USR1 96056
$ kill -USR1 96056

Then the output will be,

Someone is knocking.
Someone is knocking.
Someone is knocking.
Someone is knocking.