Operating System 9 | Socket Programming Experiment 2: Enable IPv4 and IPv6

Series: Operating System

  1. addrinfo Vs. sockaddr_in

(1) Recall: Structure sockaddr_in

In the previous socket programming experiment, we talked about the sockaddr_in structure, which stores IPv4 address information. This structure has the following members,

#include <netinet/in.h>

struct sockaddr_in {
short sin_family; // e.g. AF_INET
unsigned short sin_port; // e.g. htons(3490)
struct in_addr sin_addr; // see struct in_addr, below
char sin_zero[8]; // zero this if you want to
};

struct in_addr {
unsigned long s_addr; // load with inet_aton()
};

where,

  • sin_family is the address family for the transport address, which should be set to AF_INET for the IPv4 protocol.
  • sin_port specifies the transport port for the given address, stored in network byte order (hence the htons call).
  • sin_addr.s_addr stores the 32-bit IPv4 address, in network byte order, that the given hostname resolves to.

However, sockaddr_in can be used only for the IPv4 protocol. If we want to use the IPv6 protocol, we can use a similar structure named sockaddr_in6. More information on the IPv6 structure can be found here. But there is still a problem: sometimes we want one socket program to handle both IPv4 and IPv6 transport, and neither of these structures alone is enough for that.

(2) Structure addrinfo

To deal with this problem, let’s see a new structure called addrinfo. To use this structure, we have to include the following three header files,

#include <sys/types.h>
#include <sys/socket.h>
#include <netdb.h>

The addrinfo structure is defined as,

struct addrinfo {
int ai_flags; // AI_PASSIVE, AI_CANONNAME, etc.
int ai_family; // AF_INET, AF_INET6, AF_UNSPEC
int ai_socktype; // SOCK_STREAM, SOCK_DGRAM
int ai_protocol; // use 0 for "any"
size_t ai_addrlen; // size of ai_addr in bytes
struct sockaddr *ai_addr; // struct sockaddr_in or _in6
char *ai_canonname; // full canonical hostname
struct addrinfo *ai_next; // linked list, next node
};

where,

  • ai_family specifies the desired address family for the returned addresses. It can be set to AF_INET (for IPv4), AF_INET6 (for IPv6), or AF_UNSPEC (for either IPv4 or IPv6). The value AF_UNSPEC indicates that getaddrinfo() (we are going to talk about it later) should return socket addresses for any address family.
  • ai_socktype specifies the preferred socket type, such as SOCK_STREAM (for TCP) or SOCK_DGRAM (for UDP). Specifying 0 in this field indicates that socket addresses of any type can be returned by getaddrinfo().
  • ai_addr is a generic pointer that can point to either a sockaddr_in structure or a sockaddr_in6 structure.

(3) Initialize a Structure with Zeros by memset

When we create an instance of a structure, we are simply reserving a range of memory for it. If we do not initialize the members, they hold whatever garbage values were already in that memory. Let’s see an example. Suppose we define a structure Test with two integer members a and b.

struct Test {
int a, b;
};

Then if we create an instance of this structure and then print the value of the variable a and variable b,

struct Test test;
printf("%p: a = %d, b = %d\n", &test, test.a, test.b);

We are going to have some garbage values like,

0x7ffeed38b2b0: a = -315051312, b = 32766

To initialize the values in this structure, we can either manually assign each of the elements this structure has by,

test.a = 10001; 
test.b = 20002;

then print the result,

printf("%p: a = %d, b = %d\n", &test, test.a, test.b);

The result should be,

0x7ffeed38b2b0: a = 10001, b = 20002

Or we can set every member of this structure to zero with memset,

memset(&test, 0, sizeof test);

then print the result,

printf("%p: a = %d, b = %d\n", &test, test.a, test.b);

The result should be,

0x7ffeed38b2b0: a = 0, b = 0

You can run these snippets on your own computer to see why memset is a convenient way to initialize a structure.

(4) inet_ntop for Converting Raw Address

When we are given a hostname or domain name (e.g. localhost), what we want is a translation of this hostname to an IP address, so the computer knows where a message should actually be sent. For example, if we have localhost as our hostname, we would like it to translate to either 127.0.0.1 (for IPv4) or ::1 (for IPv6).

However, the computer does not work with the strings 127.0.0.1 or ::1 directly; these are notations that are easy for humans to read. The IPv4 address 127.0.0.1 is really the four bytes 0x7F, 0x00, 0x00, 0x01, which together form the 32-bit value 0x7F000001 (you can try it here).

Network byte order is big-endian, so these four bytes sit in memory in the order 7F 00 00 01. We usually have a little-endian computer, which treats the first byte in memory as the least significant one, so when it reads those four bytes as an integer it sees 0x0100007F (see a more rigorous explanation of little-endian here). If we convert this hexadecimal number to decimal, we get the value 16777343 (you can calculate this value here).

Suppose now we are given the raw decimal address 16777343. How can we convert this value to 127.0.0.1? The answer is that we can use the inet_ntop function to produce the string. Let’s see a code example here,

The result should be,

16777343 result: 127.0.0.1

You can change the value 16777343 to see how it changes the output IP string.

(5) gethostbyname for Resolving IPv4 Hostname

However, in practice, we are not given a raw address like 16777343. Instead, we are given a hostname or domain name such as localhost or google.com. Given localhost as our hostname, how do we find out that it maps to 127.0.0.1 or ::1? For the IPv4 mapping, we can use the function gethostbyname to resolve the hostname. It can be used as follows,

const char *hostname = "localhost";
struct hostent *pHostInfo;
int nHostAddress = 0;
pHostInfo = gethostbyname(hostname);
// gethostbyname returns NULL on failure, so check before dereferencing
if (pHostInfo != NULL) {
    // memcpy: copy the 4 raw address bytes of pHostInfo->h_addr into an int
    memcpy(&nHostAddress, pHostInfo->h_addr, sizeof nHostAddress);
    printf("%d\n", nHostAddress);
}

The output of the code is,

16777343

which is exactly the localhost’s real address that we have discussed above. We can then use the function inet_ntop to convert this value to the IP string 127.0.0.1.

But what happens if we use this function to resolve a hostname to an IPv6 address? The return value of gethostbyname is fixed for a given hostname, but the raw IPv4 and IPv6 addresses are not the same. Thus, we cannot obtain the real IPv6 address, because we are using the IPv4 rules to resolve the hostname.

Let’s now see an example,

The output of the code above is,

IPv6 test fail.
============ localhost: ===========
IPv4: 127.0.0.1
IPv6: 7f00:1:fe7f::b000:80d0:e67f:0
Real Address: 140728915198079

The raw value printed here is 140728915198079 rather than 16777343 because this time we stored the address in a long integer (in the previous case, we used the int datatype). The address itself is only 4 bytes, so the upper 4 bytes of the 8-byte long most likely still hold garbage. In hexadecimal the value is 0x7FFE0100007F, whose lower 32 bits are still 0x0100007F, so both values carry the same IPv4 address.

Even though the IPv4 address comes out right, we cannot turn this value into the IPv6 address ::1. Thus, we can conclude that the function gethostbyname only works for IPv4.

(6) getaddrinfo for Resolving Hostname

In the previous case, we used the gethostbyname function and saw that it can resolve localhost to 127.0.0.1. However, we still have no way to resolve the IPv6 address. What we really want is a function that, when asked for the address of localhost, returns not only 127.0.0.1 but also ::1. The function that does this is getaddrinfo. Let’s see how it works.

A hostname or domain name can actually resolve to several IP addresses. When a DNS server hands out multiple addresses for one name to spread the load, the technique is called round-robin DNS. Let’s try it. To fetch the IPv4 addresses of yahoo.com, we can run,

$ nslookup yahoo.com

The result will be (I use 8.8.8.8 as the DNS server),

Name: yahoo.com
Address: 98.137.11.164
Name: yahoo.com
Address: 98.137.11.163
Name: yahoo.com
Address: 74.6.143.25
Name: yahoo.com
Address: 74.6.231.20
Name: yahoo.com
Address: 74.6.143.26
Name: yahoo.com
Address: 74.6.231.21

Similarly, if we want to fetch some IPv6 addresses for yahoo.com, we can run,

$ nslookup -query=AAAA yahoo.com

The result will be (I also use 8.8.8.8 as the DNS server),

yahoo.com has AAAA address 2001:4998:124:1507::f000
yahoo.com has AAAA address 2001:4998:44:3507::8000
yahoo.com has AAAA address 2001:4998:24:120d::1:1
yahoo.com has AAAA address 2001:4998:24:120d::1:0
yahoo.com has AAAA address 2001:4998:44:3507::8001
yahoo.com has AAAA address 2001:4998:124:1507::f001

If we resolve localhost, it maps to two different addresses, 127.0.0.1 and ::1, so we need a data structure that can hold both of them. Recall the addrinfo structure we discussed: a list of addrinfo nodes can store both IPv4 and IPv6 addresses if we set the address family to AF_UNSPEC.

Now, let’s see how the function getaddrinfo works for us. To use this function, we usually create three variables hints, res, and p. The hints variable is an instance of the addrinfo structure, while res and p are two pointers to an addrinfo structure. They are defined by,

struct addrinfo hints, *res, *p;

Before we call getaddrinfo to resolve the hostname, we have to fill in the hints structure. We set ai_family to AF_UNSPEC because we want both the IPv4 and the IPv6 addresses of this hostname. We also want TCP as the transport, so we ask for stream sockets.

memset(&hints, 0, sizeof hints); // initialize hints with 0s
hints.ai_family = AF_UNSPEC; // AF_INET or AF_INET6 to force version
hints.ai_socktype = SOCK_STREAM; // TCP stream sockets

Then we can call this magic function getaddrinfo to resolve the hostname,

const char *hostname = "localhost";
int status = getaddrinfo(hostname, NULL, &hints, &res);

The return value status tells us how the resolution went. 0 means that the hostname was resolved successfully. If the returned value is not zero, we can use the function gai_strerror to print the detailed error information.

printf("%s", gai_strerror(status));
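To see the status check in action without depending on a DNS server, here is a small sketch. It adds the AI_NUMERICHOST flag (not used elsewhere in this article) so that getaddrinfo only accepts IP literals and fails right away on anything else. The helper name resolve_numeric is hypothetical.

```c
#include <netdb.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Hypothetical helper: parse a numeric address string (no DNS lookup).
 * Returns the getaddrinfo status: 0 on success, an EAI_* code otherwise. */
int resolve_numeric(const char *text)
{
    struct addrinfo hints, *res = NULL;
    int status;

    memset(&hints, 0, sizeof hints);
    hints.ai_family = AF_UNSPEC;
    hints.ai_socktype = SOCK_STREAM;
    hints.ai_flags = AI_NUMERICHOST;      /* assumption: IP literals only */

    status = getaddrinfo(text, NULL, &hints, &res);
    if (status == 0)
        freeaddrinfo(res);                /* only free on success */
    return status;
}

int main(void)
{
    const char *inputs[] = { "127.0.0.1", "::1", "not-an-address" };

    for (int i = 0; i < 3; i++) {
        int status = resolve_numeric(inputs[i]);

        if (status != 0)
            printf("%s: error: %s\n", inputs[i], gai_strerror(status));
        else
            printf("%s: ok\n", inputs[i]);
    }
    return 0;
}
```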

After resolution, the res pointer points to an addrinfo structure that stores the address information. Because this structure has a member ai_next, which points to the next addrinfo structure, the result is actually a linked list. If we walk this list and read each ai_addr until we reach a NULL pointer, we get all the address information for the given hostname.

So the looping structure should be,

for (p = res; p != NULL; p = p->ai_next) {
...
}

In each iteration, we check ai_family to determine whether the entry is an IPv4 or an IPv6 address, and then cast ai_addr to a pointer to either sockaddr_in or sockaddr_in6 accordingly. From the manual of the inet_ntop function, we know that its second argument is the address of the sin_addr or sin6_addr member. Thus, we can use a conditional to handle both the IPv4 and IPv6 addresses of the given hostname.

(7) Example Code for Showing IP By Hostname

Putting together everything we have covered, we can write a program that resolves the hostname localhost for us. For example,
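The listing below is a sketch of such a program rather than a definitive implementation. The function name show_ip and the optional command-line argument are choices made for this example; the calls themselves (getaddrinfo, inet_ntop, gai_strerror, freeaddrinfo) are the ones discussed above.

```c
#include <arpa/inet.h>
#include <netdb.h>
#include <netinet/in.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Print every address getaddrinfo returns for hostname.
 * Returns the number of addresses printed, or -1 on failure. */
int show_ip(const char *hostname)
{
    struct addrinfo hints, *res, *p;
    char ipstr[INET6_ADDRSTRLEN];         /* large enough for IPv4 and IPv6 */
    int count = 0;

    memset(&hints, 0, sizeof hints);
    hints.ai_family = AF_UNSPEC;          /* IPv4 or IPv6 */
    hints.ai_socktype = SOCK_STREAM;      /* TCP stream sockets */

    int status = getaddrinfo(hostname, NULL, &hints, &res);
    if (status != 0) {
        fprintf(stderr, "getaddrinfo: %s\n", gai_strerror(status));
        return -1;
    }

    for (p = res; p != NULL; p = p->ai_next) {
        const void *addr;
        const char *label;

        if (p->ai_family == AF_INET) {
            addr = &((struct sockaddr_in *)p->ai_addr)->sin_addr;
            label = "IPv4";
        } else if (p->ai_family == AF_INET6) {
            addr = &((struct sockaddr_in6 *)p->ai_addr)->sin6_addr;
            label = "IPv6";
        } else {
            continue;                     /* skip unknown families */
        }

        if (inet_ntop(p->ai_family, addr, ipstr, sizeof ipstr) != NULL) {
            printf("%s: %s\n", label, ipstr);
            count++;
        }
    }

    freeaddrinfo(res);                    /* release the whole linked list */
    return count;
}

int main(int argc, char *argv[])
{
    const char *hostname = argc > 1 ? argv[1] : "localhost";

    printf("IP addresses for %s:\n", hostname);
    return show_ip(hostname) < 0 ? 1 : 0;
}
```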

Remember that, at the end, we have to free the linked list by calling freeaddrinfo with the res variable. Note that getaddrinfo also accepts an IP address such as 127.0.0.1 or ::1 directly in place of a hostname.