Operating System 9 | Socket Programming Experiment 2: Enable IPv4 and IPv6

addrinfo vs. sockaddr_in
(1) Recall: Structure sockaddr_in
In the previous socket programming experiment, we talked about the sockaddr_in structure, which stores IPv4 address information. This structure has the following components,
#include <netinet/in.h>
struct sockaddr_in {
    short sin_family;        // e.g. AF_INET
    unsigned short sin_port; // e.g. htons(3490)
    struct in_addr sin_addr; // see struct in_addr, below
    char sin_zero[8];        // zero this if you want to
};
struct in_addr {
    uint32_t s_addr;         // 32-bit IPv4 address; load with inet_aton()
};
where,
sin_family is the address family of the transport address; it should be set to AF_INET for the IPv4 protocol.
sin_port specifies the transport port that goes with the given address.
sin_addr.s_addr stores the IPv4 address itself, for example the result of resolving a given hostname.
However, sockaddr_in can be used only for the IPv4 protocol. If we want to use the IPv6 protocol, we can use a similar structure named sockaddr_in6. More information on the IPv6 structure can be found here. But there is still a problem: sometimes we want one program to handle both IPv4 and IPv6, and neither of these structures alone is enough for that.
(2) Structure addrinfo
To deal with this problem, let’s look at a new structure called addrinfo. To use this structure, we have to include the following three header files,
#include <sys/types.h>
#include <sys/socket.h>
#include <netdb.h>
The addrinfo structure is defined as,
struct addrinfo {
    int ai_flags;             // AI_PASSIVE, AI_CANONNAME, etc.
    int ai_family;            // AF_INET, AF_INET6, AF_UNSPEC
    int ai_socktype;          // SOCK_STREAM, SOCK_DGRAM
    int ai_protocol;          // use 0 for "any"
    size_t ai_addrlen;        // size of ai_addr in bytes
    struct sockaddr *ai_addr; // struct sockaddr_in or _in6
    char *ai_canonname;       // full canonical hostname
    struct addrinfo *ai_next; // linked list, next node
};
where,
ai_family specifies the desired address family for the returned addresses. It can be set to AF_INET (for IPv4), AF_INET6 (for IPv6), or AF_UNSPEC (for either IPv4 or IPv6). The value AF_UNSPEC indicates that getaddrinfo() (we are going to talk about it later) should return socket addresses for any address family.
ai_socktype specifies the preferred socket type, such as SOCK_STREAM (for TCP) or SOCK_DGRAM (for UDP). Specifying 0 in this field indicates that socket addresses of any type can be returned by getaddrinfo().
ai_addr is a pointer that can point to either a sockaddr_in structure or a sockaddr_in6 structure.
(3) Initialize a Structure with Zeros by memset
When we create an instance of a structure, we only reserve a range of memory for it. If we don’t initialize the members of the structure, they hold whatever garbage values were already in that memory. Let’s see an example here. Suppose we define a structure Test with two integer variables a and b.
struct Test {
    int a, b;
};
Then if we create an instance of this structure and print the values of a and b,
struct Test test;
printf("%p: a = %d, b = %d\n", (void *)&test, test.a, test.b);
We are going to have some garbage values like,
0x7ffeed38b2b0: a = -315051312, b = 32766
To initialize the values in this structure, we can either manually assign each of its members by,
test.a = 10001;
test.b = 20002;
then print the result,
printf("%p: a = %d, b = %d\n", (void *)&test, test.a, test.b);
The result should be,
0x7ffeed38b2b0: a = 10001, b = 20002
Or we can set all the members of this structure to zero with memset,
memset(&test, 0, sizeof test);
then print the result,
printf("%p: a = %d, b = %d\n", (void *)&test, test.a, test.b);
The result should be,
0x7ffeed38b2b0: a = 0, b = 0
You can test the following code on your computer to see why we call memset for initialization.
(4) inet_ntop for Converting Raw Address
When we are given a hostname or domain name (e.g. localhost), what we want is a translation of this hostname to an IP address, so that the computer knows where to actually send a message. For example, if we have localhost as our hostname, we would like it to translate to either 127.0.0.1 (for IPv4) or ::1 (for IPv6).
However, the computer does not work with the strings 127.0.0.1 or ::1 directly; these are notations that are easy for humans to read. For example, the IPv4 address 127.0.0.1 consists of the four bytes 0x7F, 0x00, 0x00, 0x01 in hexadecimal, so its value as a 32-bit number is 0x7F000001 (you can try it here).
Also, because we usually have a little-endian computer, whose byte order differs from the network byte order (big-endian), the same four bytes read as a native integer appear reversed, as 0x01.00.00.7F (see a more rigorous explanation of little-endian here). Thus, the hexadecimal value of this address on our machine is 0x0100007F. If we convert this hexadecimal number to a decimal value, we get 16777343 (you can calculate this value from here).
Suppose now we are given the raw decimal address 16777343. How can we convert this value to 127.0.0.1? The answer is that we can use the inet_ntop function to produce the string. Let’s see a code example here,
The result should be,
16777343 result: 127.0.0.1
You can change the value 16777343 to see how it changes the output IP string.
(5) gethostbyname for Resolving IPv4 Hostname
However, in practice, we are not given a raw address like 16777343; instead, we are given a hostname or domain name like localhost or google.com. Suppose we are given localhost as our hostname: how can we know whether it maps to 127.0.0.1 or ::1? For the IPv4 mapping, we can use the function gethostbyname to resolve this hostname. It can be used as follows,
const char *hostname = "localhost";
struct hostent *pHostInfo;
int nHostAddress = 0;
pHostInfo = gethostbyname(hostname);
if (pHostInfo != NULL) { // NULL means the lookup failed
    // memcpy: copy the raw IPv4 address bytes from pHostInfo->h_addr into an int
    memcpy(&nHostAddress, pHostInfo->h_addr, pHostInfo->h_length);
    printf("%d", nHostAddress);
}
The output of the code is,
16777343
which is exactly the raw address of localhost that we discussed above. We can then use the function inet_ntop to convert this value to the IP string 127.0.0.1.
But what happens if we use this function to resolve a hostname to an IPv6 address? The value returned by gethostbyname for a given hostname is an IPv4 result, while the IPv4 and IPv6 addresses of a host are not the same. Thus, we cannot obtain the real IPv6 address, because we are using the IPv4 rules to resolve the hostname.
Let’s now see an example,
The output of the code above is,
IPv6 test fail.
============ localhost: ===========
IPv4: 127.0.0.1
IPv6: 7f00:1:fe7f::b000:80d0:e67f:0
Real Address: 140728915198079
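We can see that the real address is printed as 140728915198079 rather than 16777343. This is because the 4-byte address was copied into a long integer in this case (while in the previous case, we used the int datatype instead). Assuming an 8-byte long, only the low four bytes were written, and the remaining bytes keep whatever garbage was in memory: 140728915198079 is 0x7FFE0100007F in hexadecimal. Both values represent the same IPv4 address because both of them contain 0x0100007F in their low four bytes.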
Even though the IPv4 address is correct, formatting this value as an IPv6 address does not give us ::1. Thus, we know that the function gethostbyname only works for IPv4 here.
(6) getaddrinfo for Resolving Hostname
In the previous case, we used the gethostbyname function and saw that it can resolve localhost to 127.0.0.1. However, for IPv6 address resolution, we have no solution so far! What we really want is a function that, when asked for the address of localhost, returns not only 127.0.0.1 but also ::1. So what can we use to implement this feature? The answer is the getaddrinfo function. Let’s see how it works.
A hostname or a domain name can actually be resolved to several IP addresses. This technique is called round-robin DNS. Let’s have a try here. Suppose we want to fetch the IPv4 addresses for yahoo.com; we can run,
$ nslookup yahoo.com
The result will be (I use 8.8.8.8 as the DNS server),
Name: yahoo.com
Address: 98.137.11.164
Name: yahoo.com
Address: 98.137.11.163
Name: yahoo.com
Address: 74.6.143.25
Name: yahoo.com
Address: 74.6.231.20
Name: yahoo.com
Address: 74.6.143.26
Name: yahoo.com
Address: 74.6.231.21
Similarly, if we want to fetch the IPv6 addresses for yahoo.com, we can run,
$ nslookup -query=AAAA yahoo.com
The result will be (I also use 8.8.8.8 as the DNS server),
yahoo.com has AAAA address 2001:4998:124:1507::f000
yahoo.com has AAAA address 2001:4998:44:3507::8000
yahoo.com has AAAA address 2001:4998:24:120d::1:1
yahoo.com has AAAA address 2001:4998:24:120d::1:0
yahoo.com has AAAA address 2001:4998:44:3507::8001
yahoo.com has AAAA address 2001:4998:124:1507::f001
So if we want to resolve localhost, the answer is that this hostname can be resolved to two different addresses, 127.0.0.1 and ::1. Of course, we now need a data structure to store these two addresses. Remember that we have discussed the addrinfo structure, which can store both the IPv4 address and the IPv6 address if we set the address family to AF_UNSPEC.
Now, let’s see how the function getaddrinfo works for us. To use this function, we usually create three variables: hints, res, and p. The hints variable is an instance of the addrinfo structure, while res and p are two pointers to an addrinfo structure. They are defined by,
struct addrinfo hints, *res, *p;
Before we use getaddrinfo to resolve the hostname, we have to fill in the hints structure. We set ai_family to AF_UNSPEC because we want to resolve both the IPv4 and the IPv6 addresses of this hostname. Also, we want to use TCP, so we ask for stream sockets.
memset(&hints, 0, sizeof hints); // initialize hints with 0s
hints.ai_family = AF_UNSPEC; // AF_INET or AF_INET6 to force version
hints.ai_socktype = SOCK_STREAM; // TCP (stream sockets)
Then we can call this magic function getaddrinfo to resolve the hostname,
const char *hostname = "localhost";
int status = getaddrinfo(hostname, NULL, &hints, &res);
The return value status of this function reports the outcome of resolving the hostname. 0 means that the hostname was resolved successfully. If the returned value is not zero, we can use the function gai_strerror to print the detailed error information.
printf("%s", gai_strerror(status));
After resolution, the res pointer points to an addrinfo structure that stores the address information. Because this structure has a member ai_next, which is a pointer to the next addrinfo structure, the result is actually a linked list. If we walk this list and read each ai_addr until we meet a NULL pointer, we get all the address information for the given hostname.

So the looping structure should be,
for (p = res; p != NULL; p = p->ai_next) {
    ...
}
In each iteration, we check ai_family to tell whether this is an IPv4 address or an IPv6 address, and then cast ai_addr to a pointer to either sockaddr_in or sockaddr_in6 accordingly. From the manual of the inet_ntop function, we know that it accepts the address of the sin_addr or sin6_addr member as its second argument. Thus, we can use a conditional to handle both the IPv4 and IPv6 addresses of the given hostname.
(7) Example Code for Showing IP By Hostname
Wrapping up everything we have covered, we get the following program, which resolves the hostname localhost for us. For example,
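A minimal sketch of such a program, assembled from the steps in section (6). The helper name show_ip, its return convention, and the optional command-line argument are assumptions made for illustration; the output lines follow the "IPv4: ..." and "IPv6: ..." style used earlier.

```c
#include <arpa/inet.h>   // inet_ntop
#include <netdb.h>       // getaddrinfo, freeaddrinfo, gai_strerror
#include <netinet/in.h>  // struct sockaddr_in, struct sockaddr_in6
#include <stdio.h>
#include <string.h>      // memset
#include <sys/socket.h>
#include <sys/types.h>

// Hypothetical helper: print every address of `hostname`.
// Returns the number of addresses printed, or -1 if resolution fails.
int show_ip(const char *hostname) {
    struct addrinfo hints, *res, *p;
    char ipstr[INET6_ADDRSTRLEN];
    int count = 0;

    memset(&hints, 0, sizeof hints);  // initialize hints with 0s
    hints.ai_family = AF_UNSPEC;      // accept both IPv4 and IPv6
    hints.ai_socktype = SOCK_STREAM;  // TCP (stream sockets)

    int status = getaddrinfo(hostname, NULL, &hints, &res);
    if (status != 0) {
        fprintf(stderr, "getaddrinfo: %s\n", gai_strerror(status));
        return -1;
    }

    for (p = res; p != NULL; p = p->ai_next) {
        const void *addr;
        const char *ipver;
        if (p->ai_family == AF_INET) {          // IPv4 entry
            struct sockaddr_in *ipv4 = (struct sockaddr_in *)p->ai_addr;
            addr = &ipv4->sin_addr;
            ipver = "IPv4";
        } else if (p->ai_family == AF_INET6) {  // IPv6 entry
            struct sockaddr_in6 *ipv6 = (struct sockaddr_in6 *)p->ai_addr;
            addr = &ipv6->sin6_addr;
            ipver = "IPv6";
        } else {
            continue;                           // skip any other family
        }
        if (inet_ntop(p->ai_family, addr, ipstr, sizeof ipstr) != NULL) {
            printf("%s: %s\n", ipver, ipstr);
            count++;
        }
    }

    freeaddrinfo(res);  // release the whole linked list
    return count;
}

int main(int argc, char *argv[]) {
    const char *hostname = argc > 1 ? argv[1] : "localhost";
    return show_ip(hostname) < 0 ? 1 : 0;
}
```

Which addresses are printed for localhost, and in what order, depends on the resolver configuration of the machine.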
Remember, in the end, we have to free the linked list by calling the function freeaddrinfo with the res variable. Note that the code above can also be used to resolve an IP address like 127.0.0.1 or ::1 directly.