MC9241- Network programming notes

Tuesday, 27 September 2011

Process termination
command line arguments
Environment of a UNIX process
Process Times
Process relationships terminal logins
Message Queues
Shared Memory

Creation of a process
A unique pid is assigned to the new process
Space is allocated for all the elements of the process image.
The process control block is initialized. Inherit info from parent
The appropriate linkages are set: for scheduling, state queues
Create and initialize other data structures (file tables, I/O tables, etc.)
Process Interruption
Two kinds of process interruptions: interrupt and trap
Interrupt: Caused by some event external to, and asynchronous to, the currently running process
Trap : Error or exception condition generated within the currently running process. Ex: illegal access to a file, arithmetic exception
(supervisor call) : explicit interruption

UNIX System V process control

Every process, except process 0, is created by the fork() system call

fork() allocates entry in process table and assigns a unique PID to the child process

child gets a copy of process image of parent: both child and parent are executing the same code following fork()

but fork() returns the PID of the child to the parent process and returns 0 to the child process
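The two return values of fork() can be seen in a minimal sketch like the one below. The function name fork_and_reap is a hypothetical helper introduced only for illustration, and the sketch assumes a POSIX system.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: fork a child that exits with `code` and return the
 * exit status the parent collects, or -1 on error. */
int fork_and_reap(int code)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;              /* fork failed: no child was created */
    if (pid == 0)
        _exit(code);            /* child: fork() returned 0 */

    /* parent: fork() returned the PID of the child */
    int status;
    if (waitpid(pid, &status, 0) != pid)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Both processes continue from the line after fork(); only the return value tells them apart.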

fork and exec

Child process may choose to execute some other program than the parent by using exec call.

Exec overlays a new program on the existing process.

Child will not return to the old program unless exec fails. This is an important point to remember.
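A rough sketch of the fork-then-exec pattern follows. The helper name run_program and the exit code 127 for a failed exec are assumptions made for this example (127 follows the shell convention), not something the notes prescribe.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: run argv[0] in a child process and return its exit
 * status, 127 if the exec failed, or -1 on fork/wait errors. */
int run_program(char *const argv[])
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        execvp(argv[0], argv);  /* overlays the child with the new program */
        _exit(127);             /* reached only when the exec failed */
    }
    int status;
    if (waitpid(pid, &status, 0) != pid)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

The line after execvp() is the only place where the child can "return to the old program", which is why it reports the failure and exits immediately.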

Why does fork need to clone?

Why do we need to separate fork and exec?

Why can’t we have a single call that forks a new program?

Race conditions

race = shared data + outcome depends on order that processes run

e.g., parent or child runs first?

waiting for parent to terminate

generally, need some signaling mechanism


stream pipes


process: address space + single thread of control

sometimes want multiple threads of control (flow) in same address space


threads separate resource grouping & execution

thread: program counter, registers, stack

also called lightweight processes

multithreading: avoid blocking when waiting for resources

multiple services running in parallel

state: running, blocked, ready, terminated


POSIX standard for threads.

POSIX standard (IEEE 1003.1c) : The standard defines an API for creating and manipulating threads.

Pthreads defines a set of C programming language types and procedure calls.

It is implemented with a pthread.h header and a thread library

Data types

pthread_t: handle to a thread

pthread_attr_t: thread attributes
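As an illustration of the two data types, the sketch below creates one thread and waits for it. The names square and square_in_thread are hypothetical; passing NULL where a pthread_attr_t pointer is expected requests the default attributes. On some systems the program must be linked with the thread library (for example with -pthread).

```c
#include <pthread.h>
#include <stddef.h>

/* Thread start routine: squares the int that arg points to. */
static void *square(void *arg)
{
    int *n = arg;
    *n = *n * *n;
    return NULL;
}

/* Hypothetical helper: compute n*n in a separate thread. */
int square_in_thread(int n)
{
    pthread_t tid;                       /* handle to the new thread */
    if (pthread_create(&tid, NULL, square, &n) != 0)
        return -1;
    if (pthread_join(tid, NULL) != 0)    /* wait until the thread finishes */
        return -1;
    return n;
}
```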

System V IPC

The System V IPC consists of three types of IPC :

Message Queues,

Semaphores and

shared memory

Similarities between the three IPC

Identifiers and keys

Permission structure

Configuration Limits

Identifiers and keys

Each of the three structures is referred to in the kernel by a nonnegative integer identifier.

Whenever an IPC structure is being created, a key must be specified.

The data type of this key is key_t (defined in sys/types.h).

This key is converted to the identifier by the kernel.

Permission Structure.

Every IPC structure has an associated ipc_perm structure. This structure defines the permissions and the owner.

struct ipc_perm {
    uid_t uid; /* owner's effective user id */
    gid_t gid; /* owner's effective group id */
    uid_t cuid; /* creator's effective user id */
    gid_t cgid; /* creator's effective group id */
    mode_t mode; /* access mode */
    ulong seq; /* slot usage sequence number */
    key_t key; /* key */
};



IPC structures are system wide and do not have a reference count.

IPC structures are not known by names in the file system, so we can’t access them or modify their properties with the ordinary file functions.

It is hard to use more than one IPC structure at a time, because we can’t use multiplexed I/O functions.

Message Queues

A message queue is a linked list of messages stored within the kernel and identified by a message queue identifier called the queue ID.

msgget : is used to create a new queue or open an existing queue.

msgsnd : used to add messages to the end of the queue

msgrcv : messages are fetched from a queue
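The three calls can be sketched together as below. This is only an illustration: struct text_msg and msgq_roundtrip are hypothetical names, the queue is created with IPC_PRIVATE so that no key has to be invented, and the queue is removed with msgctl(IPC_RMID) at the end.

```c
#include <string.h>
#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/msg.h>

/* Hypothetical message layout: a type field followed by the payload. */
struct text_msg {
    long mtype;        /* message type, must be greater than 0 */
    char mtext[64];
};

/* Send `text` through a private queue and read it back into `out`.
 * Returns 0 on success, -1 on failure. */
int msgq_roundtrip(const char *text, char *out, size_t outlen)
{
    int qid = msgget(IPC_PRIVATE, IPC_CREAT | 0600);   /* create a queue */
    if (qid < 0)
        return -1;

    int rc = -1;
    struct text_msg m = { .mtype = 1 };                /* mtext zero-filled */
    strncpy(m.mtext, text, sizeof m.mtext - 1);

    /* the size passed to msgsnd counts only mtext, not mtype */
    if (msgsnd(qid, &m, strlen(m.mtext) + 1, 0) == 0) {
        struct text_msg r;
        ssize_t n = msgrcv(qid, &r, sizeof r.mtext, 1, 0);  /* fetch type 1 */
        if (n > 0 && outlen > 0) {
            strncpy(out, r.mtext, outlen - 1);
            out[outlen - 1] = '\0';
            rc = 0;
        }
    }
    msgctl(qid, IPC_RMID, NULL);    /* queues persist until removed */
    return rc;
}
```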

every queue has a msqid_ds structure associated with it.

This structure is used to define the current status of the queue.

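struct msqid_ds {
    struct ipc_perm msg_perm; /* permissions and owner */
    struct msg *msg_first; /* first message on the queue */
    struct msg *msg_last; /* last message on the queue */
    ulong msg_cbytes; /* current number of bytes on the queue */
    ulong msg_qbytes; /* maximum number of bytes allowed on the queue */
    pid_t msg_lspid; /* pid of the last msgsnd() */
    pid_t msg_lrpid; /* pid of the last msgrcv() */
    time_t msg_stime; /* time of the last msgsnd() */
};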


Semaphores are a programming construct designed by E. W. Dijkstra in the late 1960s

Semaphores let processes query or alter status information. They are often used to monitor and control the availability of system resources such as shared memory segments.

Initializing a Semaphore Set

The function semget() initializes or gains access to a semaphore.

It is prototyped by:

int semget(key_t key, int nsems, int semflg);

semctl() reads or changes the state of a semaphore set. Its cmd argument is one of the following control flags:

GETVAL -- Return the value of a single semaphore.

SETVAL -- Set the value of a single semaphore. In this case, arg is taken as arg.val, an int.

GETPID -- Return the PID of the process that performed the last operation on the semaphore or array.

GETNCNT -- Return the number of processes waiting for the value of a semaphore to increase.

GETZCNT -- Return the number of processes waiting for the value of a particular semaphore to reach zero.

GETALL -- Return the values for all semaphores in a set. In this case, arg is taken as arg.array, a pointer to an array of unsigned shorts (see below).

SETALL -- Set values for all semaphores in a set. In this case, arg is taken as arg.array, a pointer to an array of unsigned shorts.

IPC_STAT -- Return the status information from the control structure for the semaphore set and place it in the data structure pointed to by arg.buf, a pointer to a buffer of type semid_ds.

IPC_SET -- Set the effective user and group identification and permissions. In this case, arg is taken as arg.buf.

IPC_RMID -- Remove the specified semaphore set.

Semaphore Operations

semop() performs operations on a semaphore set. It is prototyped by:

int semop(int semid, struct sembuf *sops, size_t nsops);

The semid argument is the semaphore ID returned by a previous semget() call. The sops argument is a pointer to an array of structures, each containing the following information about a semaphore operation:

The semaphore number

The operation to be performed

Control flags, if any.

The sembuf structure specifies a semaphore operation, as defined in <sys/sem.h>.

struct sembuf {
    ushort_t sem_num; /* semaphore number */
    short sem_op; /* semaphore operation */
    short sem_flg; /* operation flags */
};


There are two control flags that can be used with semop():

IPC_NOWAIT Can be set for any operations in the array. Makes the function return without changing any semaphore value if any operation for which IPC_NOWAIT is set cannot be performed. The function fails if it tries to decrement a semaphore more than its current value, or tests a nonzero semaphore to be equal to zero.

SEM_UNDO Allows individual operations in the array to be undone when the process exits.
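The calls described above can be combined into a small sketch. It assumes a Linux-like system, where the program itself has to declare the union passed as the fourth argument of semctl(); the union is named sem_arg here (a hypothetical name) to avoid clashing with systems that already define union semun. semaphore_demo is likewise a hypothetical helper.

```c
#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/sem.h>

/* Fourth argument of semctl(); declared by the caller on Linux. */
union sem_arg {
    int val;                  /* value for SETVAL */
    struct semid_ds *buf;     /* buffer for IPC_STAT, IPC_SET */
    unsigned short *array;    /* array for GETALL, SETALL */
};

/* Returns 0 if the semaphore read 0 while held and 1 after release. */
int semaphore_demo(void)
{
    int semid = semget(IPC_PRIVATE, 1, IPC_CREAT | 0600);  /* set of one */
    if (semid < 0)
        return -1;

    int ok = -1;
    union sem_arg arg;
    arg.val = 1;                                           /* initially free */
    struct sembuf acquire = { .sem_num = 0, .sem_op = -1, .sem_flg = SEM_UNDO };
    struct sembuf release = { .sem_num = 0, .sem_op = 1, .sem_flg = SEM_UNDO };

    if (semctl(semid, 0, SETVAL, arg) == 0 && semop(semid, &acquire, 1) == 0) {
        int held = semctl(semid, 0, GETVAL);               /* 0 while held */
        if (semop(semid, &release, 1) == 0 &&
            held == 0 && semctl(semid, 0, GETVAL) == 1)
            ok = 0;
    }
    semctl(semid, 0, IPC_RMID);                            /* remove the set */
    return ok;
}
```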

Mutual Exclusion and Synchronization

Mutual exclusion

Guarantee that no two processes (or other types of agents) should access a critical region at the same time


Synchronization

Guarantee that one step should occur before/after another

Use lock, semaphore, or monitor to achieve mutual exclusion and/or synchronization

Critical Region and Locking

How to Implement the Lock?

How to use lock for mutual exclusion?

lock (lck);

< critical section >

unlock (lck);

How to use lock for synchronization?

E.g., Guarantee that S12 executes before S22

P1: S11, S12 P2: S21, S22

lock (lck);

P1: S11, S12, unlock (lck);

P2: S21, lock (lck); S22

Bounded Buffer Problem


full: indicates how far from full

full.count = 0 → n items in buffer → producer waits

⇒ producer decreases full.count

consumer increases full.count

empty: indicates how far from empty

empty.count = 0 → 0 items in buffer → consumer waits

⇒ producer increases empty.count

consumer decreases empty.count

Both have to wait on mutex before accessing buffer

If only one producer and one consumer, do we still have to use mutex?

No, each uses a different buffer slot

But we still need to use full and empty semaphores
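One possible rendering of the bounded buffer in C is sketched below, using POSIX unnamed semaphores and a pthread mutex rather than System V semaphores (an assumption made to keep the example short). The naming follows the notes: full counts the free slots left and empty counts the items present. SLOTS, ITEMS and run_bounded_buffer are hypothetical names.

```c
#include <pthread.h>
#include <semaphore.h>
#include <stddef.h>

#define SLOTS 4      /* buffer capacity n */
#define ITEMS 100    /* items produced in this demo */

static int buffer[SLOTS];
static int in_pos, out_pos;
static sem_t full;   /* how far from full: free slots remaining */
static sem_t empty;  /* how far from empty: items in the buffer */
static pthread_mutex_t mutex = PTHREAD_MUTEX_INITIALIZER;

static void *producer(void *arg)
{
    (void)arg;
    for (int i = 1; i <= ITEMS; i++) {
        sem_wait(&full);                 /* waits when the buffer is full */
        pthread_mutex_lock(&mutex);
        buffer[in_pos] = i;
        in_pos = (in_pos + 1) % SLOTS;
        pthread_mutex_unlock(&mutex);
        sem_post(&empty);                /* one more item available */
    }
    return NULL;
}

static void *consumer(void *arg)
{
    long *sum = arg;
    for (int i = 0; i < ITEMS; i++) {
        sem_wait(&empty);                /* waits when the buffer is empty */
        pthread_mutex_lock(&mutex);
        *sum += buffer[out_pos];
        out_pos = (out_pos + 1) % SLOTS;
        pthread_mutex_unlock(&mutex);
        sem_post(&full);                 /* one more free slot */
    }
    return NULL;
}

/* Runs one producer and one consumer; returns the sum of consumed items. */
long run_bounded_buffer(void)
{
    long sum = 0;
    pthread_t p, c;
    in_pos = out_pos = 0;
    if (sem_init(&full, 0, SLOTS) != 0 || sem_init(&empty, 0, 0) != 0)
        return -1;
    pthread_create(&p, NULL, producer, NULL);
    pthread_create(&c, NULL, consumer, &sum);
    pthread_join(p, NULL);
    pthread_join(c, NULL);
    sem_destroy(&full);
    sem_destroy(&empty);
    return sum;
}
```

With a single producer and a single consumer the mutex could be dropped, as noted above, but the two counting semaphores are still required.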

Reader-Writer Problem

Many readers can read at the same time without causing problems

When there is a writer, it has to write exclusively

First reader/writer problem (reader has priority)

If ≥ 1 reader in CS ⇒ allow more readers to get in

If a writer is in CS, no one else can enter

This way readers have a higher priority

Writers can starve

There are other forms of reader/writer problem


Monitor

Protect shared objects within an abstraction

Provide encapsulation

Accesses to a shared object are confined within a monitor

Easier to debug the code

Provide mutually exclusive access

No two processes can be active at the same time within a monitor

Monitor and Semaphore

Monitor provides encapsulation

What should be encapsulated?

Too much ⇒ reduced concurrency

Some part of the code that can be executed concurrently, if encapsulated in the monitor, can cause reduced concurrency

If such parts are not encapsulated, the encapsulation loses its meaning

Reader/writer example

If monitor is used to do what a semaphore would do

Monitor is more expensive

Overview of TCP/IP
Socket address Structures
Byte ordering functions
Byte Manipulation Functions
Address conversion functions
Elementary TCP Sockets program
Iterative Server
Concurrent Server

Transport Layer

OSI Layer Protocols

Some protocols used

User datagram protocol

Transmission Control Protocol

TCP Connection establishment and Termination

TCP State Transition

Port Number


Protocols are the standards that specify how data is represented when being transferred from one machine to another.

Protocols specify how the transfer occurs, how errors are detected, and how acknowledgements are passed.

To simplify protocol design and implementation, communication stacks are segregated into layers that can be solved independently.

Each layer is assigned a separate protocol.

Network Layer

The Network Layer is responsible for establishing paths for data transfer through the network






IP :

IP is a network layer protocol in the Internet protocol suite


Packet timeouts

Options : trace the route a packet takes (record route), label packets with security features.

IPV4 :

Uses 32 bit addresses


IPV6 :

Uses 128 bit addresses, e.g. 3ffe:ffff:101::230:6eff:fe04:d9ff


Internet Control Message Protocol

Handles error and control information between routers and hosts


Internet Group Management Protocol

Used with multicasting


the Address Resolution Protocol (ARP)

standard method for finding a host's hardware address when only its network layer address is known.


Reverse Address Resolution Protocol (RARP)

used to obtain an IP address for a given hardware address (such as an Ethernet address).

Transport Layer

Process-Level Addressing:

Segmentation, Packaging and Reassembly:

Multiplexing and Demultiplexing:

Connection Establishment, Management and Termination

Acknowledgments and Retransmissions

Flow Control:

User Datagram Protocol

UDP is not reliable

In case of checksum error or datagram dropped in the network

It is not retransmitted

UDP Datagram has length

The length of the data is passed along with the data so it has record boundaries unlike the TCP which does not have boundaries

Provides a connectionless service

No long term relationship between the client and the server

Transmission Control Protocol

Feature of Transmission Control Protocol

Connections :

Provides connection between clients and servers


Acknowledge required when data is sent over the network

If acknowledgement not received , TCP automatically retransmits the data

UDP not reliable


TCP contains algorithm to estimate the Round Trip Time between the client and server

Sequences Data :

TCP sequences the data by associating a sequence number with every byte that it sends

If a byte arrives out of order TCP can reorder it .

If duplicate data arrives, It discards the duplicate data

Flow Control

Advertised window : TCP tells the peer how many bytes of data it is willing to accept from the peer at any one time

As data is received from the sender, the window size decreases, but as the receiving application reads data from the buffer, the window size increases

UDP does not provide flow control

Full duplex connection

The application can send and receive data in both the directions on a given connection at any given time

UDP can be full duplex


A socket is an end to end communication link between a server and a client application. This allows applications to be network aware, and send and receive data via a network. Interface details vary from computer to computer.

Applications consist of a server portion and a client portion. An application program requests the operating system to create a socket connection. Each time a socket connection is used, the application program must specify the destination address, or alternatively, bind the IP address to the socket.

Sockets use a destination address and port number to communicate with another application. Each connection uses a specific port number, some of which are reserved (see /etc/services).

Types of sockets

Internet Sockets,

unix sockets,

X.25 sockets

Two Types of Internet Sockets

Stream Sockets (SOCK_STREAM)

Connection oriented, rely on TCP to provide reliable two-way connected communication

Datagram Sockets (SOCK_DGRAM)

Rely on UDP, Connection is unreliable

Datagram sockets

The datagram protocol, also known as UDP, is connectionless.

This means that each time a datagram (a packet of data to a destination) is sent, the socket and the destination computer's address must be included. There is a limit of 64KB for datagrams sent to a specific location.

UDP is also unreliable, as there is no guarantee that the datagrams sent will arrive, or that they will arrive in the same order at the destination.

The files to include in application programs that define sockets and the various calls associated with them are


Data types

a socket descriptor : int

struct sockaddr

struct sockaddr_in

Creating a socket

The socket() call creates a socket on demand. The format is

int s;
s = socket( AF_INET, SOCK_DGRAM, 0 );
/* specify TCP/IP and use datagrams */

If the socket was not created, -1 is returned to indicate an error. When a socket is created, it is in an unconnected state. An application program normally uses the system call connect() to bind a destination address to the socket and place it into a connected state.

Sockets can be used either as connectionless datagram sockets or as more reliable stream sockets. With connectionless datagrams (UDP), there is no guarantee of delivery. With TCP sockets, data delivery is guaranteed.

recvfrom(), sendto() and sendmsg() allow udp as they require the destination address to be specified as part of the call.

Setting up a destination address and port number

An application program creates a variable of type struct sockaddr_in, then assigns the destination address and port number to this variable. In sending or receiving data on the socket connection, this variable is passed as a parameter.

struct sockaddr_in server;

/* set up server name and port number */
server.sin_family = AF_INET; /* use TCP/IP */
server.sin_port = htons(800); /* specify port 800, in network byte order */
server.sin_addr.s_addr = inet_addr("");

Binding the destination address

Rather than specify the destination address in each call, the destination address can be bound to the socket.

/* set up the server connection side */
server.sin_family = AF_INET; /* use TCP/IP */
server.sin_port = 0; /* use first available port */
server.sin_addr.s_addr = INADDR_ANY;
if( bind( s, (struct sockaddr *) &server, sizeof(server) ) < 0 ) {
    perror("Error, socket not bound.");
    exit(3);
}

Sending data to the socket connection

There are five possible system calls that an application program can use to send data to a socket. They are send(), sendto(), sendmsg(), write() and writev().

The following code fragment sends data to the port.

char buf[32];
strcpy( buf, "Hello" );
/* send the string and its terminating NUL, not the whole buffer */
sendto( s, buf, strlen(buf)+1, 0, (struct sockaddr *) &server, sizeof(server));

Receiving data from the socket connection

The following code fragment receives data from the port.

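char buf[32];
int s;
socklen_t client_address_size;
struct sockaddr_in client, server;

/* recvfrom() reads this value on entry, so it must hold the size of client */
client_address_size = sizeof(client);
if( recvfrom( s, buf, sizeof(buf), 0, (struct sockaddr *) &client, &client_address_size) < 0 ) {
    perror("Error getting data from socket connection.");
    exit( 4 );
}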

Closing the socket connection

When the application program is finished, the socket connection should be closed.

close( s );

Byte Ordering functions

htons() - short integer from host byte order to network byte order.

ntohs() - short integer from network byte order to host byte order.

htonl() - long integer from host byte order to network byte order.

ntohl() - long integer from network byte order to host byte order.

S = 16 bit ( port number )

L = 32 bit (IPv4 addresses)
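A short sketch of where the conversions are applied: the port goes through htons() and the IPv4 address through htonl() before being stored in the socket address structure. The helper name fill_loopback_address and the choice of the loopback address are assumptions made for illustration.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>

/* Hypothetical helper: fill in 127.0.0.1:<port> in network byte order. */
void fill_loopback_address(struct sockaddr_in *addr, unsigned short port)
{
    memset(addr, 0, sizeof *addr);
    addr->sin_family = AF_INET;
    addr->sin_port = htons(port);                    /* 16 bit: port number */
    addr->sin_addr.s_addr = htonl(INADDR_LOOPBACK);  /* 32 bit: IPv4 address */
}
```

Reading the values back requires the reverse calls, ntohs() and ntohl(), regardless of whether the host is big-endian or little-endian.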

Stream protocol

The stream protocol, also known as TCP, is connection oriented.

This requires a connection to be established between the sender and receiver.

One of the sockets listens for a connection request (the server), the other socket asks for a connection (the client).

When the server accepts the connection request from the client, data can then be sent between the server and client.

In TCP there is no limit on the amount of data that can be transmitted. TCP is also a reliable protocol, in that data is received in the same order in which it was sent.

Now Let us write the TCP Program


The main aim of the client is

to create a socket

Get and store the server's address

Use the connect function to establish connection with the server

Send and receive messages

include the following header files





#include <netdb.h> /* because we use the hostent structure */


#include <netinet/in.h> /* for internet family address structures */


declare the two integer values



declare a char buff of size 100

declare a pointer variable hname of type struct hostent

declare a variable serveraddr of type sockaddr_in

use the gethostbyname() function to lookup the server name and check if it was successful or not . Note that the return value must be received in hname variable

call the socket function and check if it is successful or not and store the return value in sockfd

initialize the values of serveraddr to 0, using the bzero function

Initialize the values of sin_family and sin_port of the serveraddress

get the h_addr of the hname and type cast it to struct in_addr pointer and assign this value to the sin_addr as follows

serveraddr.sin_addr = *((struct in_addr *)hname->h_addr);

use the connect function to establish the connection with the server and check if it successful or not

Now you can send and receive the data

use recv() function to receive the data sent by the server and check if it was successfully received or not

Print the data that was received on the screen


The main aim of the server is

to create a socket

Bind with the local protocol

Listen for any incoming requests

Accept the requests

Send and receive messages

declare two integer values – sockfd and newsockfd

declare a variable myaddress of type structure sockaddr_in;

declare a variable clientaddress of type structure sockaddr_in;

using the socket function create a socket and check if it was successful or not

initialize the values of the myaddress to zero using bzero function

Initialize the values of sin_family, sin_port and sin_addr(INADDR_ANY)

Use the bind() function to get the local protocol address of the server and check if it is successful or not

Use the listen() function to wait for a connection and check if it is successful or not

use the accept () function to accept a client connection and store the return value in the newsockfd variable . Check if it is successful or not

use the send function to send some data to the client and check if it was successfully sent or not ( note that you must use the newsockfd to send the data and not the sockfd)
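The client and server steps listed above can be put together in one sketch that runs both ends on the loopback interface. Everything not named in the notes is an assumption: the helper names, the use of port 0 plus getsockname() to let the kernel pick a free port, fork() to run the server side in a child process, and memset() in place of bzero(). h_addr_list[0] is used where the notes use h_addr.

```c
#include <arpa/inet.h>
#include <netdb.h>
#include <netinet/in.h>
#include <signal.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Server: socket, bind, listen. The chosen port is returned via *port. */
int tcp_listen_loopback(unsigned short *port)
{
    struct sockaddr_in myaddress;
    socklen_t len = sizeof myaddress;
    int sockfd = socket(AF_INET, SOCK_STREAM, 0);
    if (sockfd < 0)
        return -1;
    memset(&myaddress, 0, sizeof myaddress);
    myaddress.sin_family = AF_INET;
    myaddress.sin_port = htons(0);                     /* any free port */
    myaddress.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    if (bind(sockfd, (struct sockaddr *)&myaddress, sizeof myaddress) < 0 ||
        listen(sockfd, 5) < 0 ||
        getsockname(sockfd, (struct sockaddr *)&myaddress, &len) < 0) {
        close(sockfd);
        return -1;
    }
    *port = ntohs(myaddress.sin_port);
    return sockfd;
}

/* Client: gethostbyname, socket, connect, recv. Returns bytes received. */
int tcp_client_recv(const char *host, unsigned short port,
                    char *buff, size_t size)
{
    struct sockaddr_in serveraddr;
    struct hostent *hname = gethostbyname(host);
    if (hname == NULL || size == 0)
        return -1;
    int sockfd = socket(AF_INET, SOCK_STREAM, 0);
    if (sockfd < 0)
        return -1;
    memset(&serveraddr, 0, sizeof serveraddr);
    serveraddr.sin_family = AF_INET;
    serveraddr.sin_port = htons(port);
    serveraddr.sin_addr = *((struct in_addr *)hname->h_addr_list[0]);
    ssize_t n = -1;
    if (connect(sockfd, (struct sockaddr *)&serveraddr, sizeof serveraddr) == 0)
        n = recv(sockfd, buff, size - 1, 0);
    close(sockfd);
    if (n < 0)
        return -1;
    buff[n] = '\0';
    return (int)n;
}

/* Child process serves one connection; the parent acts as the client. */
int tcp_loopback_demo(const char *msg, char *buff, size_t size)
{
    unsigned short port;
    int sockfd = tcp_listen_loopback(&port);
    if (sockfd < 0)
        return -1;
    pid_t pid = fork();
    if (pid < 0) {
        close(sockfd);
        return -1;
    }
    if (pid == 0) {
        int newsockfd = accept(sockfd, NULL, NULL);
        if (newsockfd < 0)
            _exit(1);
        send(newsockfd, msg, strlen(msg), 0);   /* newsockfd, not sockfd */
        close(newsockfd);
        _exit(0);
    }
    close(sockfd);                   /* the parent only needs the client side */
    int n = tcp_client_recv("127.0.0.1", port, buff, size);
    if (n < 0)
        kill(pid, SIGKILL);          /* do not leave the server blocked */
    waitpid(pid, NULL, 0);
    return n;
}
```

A single recv() is enough here only because the message is short; a real client would loop until it has the number of bytes it expects.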


Client does not establish a connection with the server

Instead, the client just sends a datagram to the server using sendto()

Similarly the server does not accept a connection from the client. Instead it just calls a recvfrom()

ssize_t recvfrom(int sockfd, void *buff, size_t nbytes, int flags, struct sockaddr *from, socklen_t *addrlen);

ssize_t sendto(int sockfd, const void *buff, size_t nbytes, int flags, const struct sockaddr *to, socklen_t addrlen);
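A minimal sketch of the UDP exchange, with both sockets in one process on the loopback interface. The helper name udp_loopback_demo and the use of port 0 with getsockname() to obtain a free port are assumptions for this illustration. No connect() or accept() appears anywhere.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Send msg from a "client" socket to a "server" socket and read it back.
 * Returns the number of bytes received, or -1 on failure. */
int udp_loopback_demo(const char *msg, char *buff, size_t size)
{
    struct sockaddr_in serveraddr, from;
    socklen_t addrlen = sizeof serveraddr;
    ssize_t n = -1;
    int server = socket(AF_INET, SOCK_DGRAM, 0);
    int client = socket(AF_INET, SOCK_DGRAM, 0);
    if (server < 0 || client < 0 || size == 0)
        goto done;

    memset(&serveraddr, 0, sizeof serveraddr);
    serveraddr.sin_family = AF_INET;
    serveraddr.sin_port = htons(0);                      /* any free port */
    serveraddr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    if (bind(server, (struct sockaddr *)&serveraddr, sizeof serveraddr) < 0 ||
        getsockname(server, (struct sockaddr *)&serveraddr, &addrlen) < 0)
        goto done;

    /* the destination address travels with every sendto() call */
    if (sendto(client, msg, strlen(msg), 0,
               (struct sockaddr *)&serveraddr, sizeof serveraddr) < 0)
        goto done;

    /* recvfrom() also reports who sent the datagram */
    addrlen = sizeof from;
    n = recvfrom(server, buff, size - 1, 0, (struct sockaddr *)&from, &addrlen);
    if (n >= 0)
        buff[n] = '\0';
done:
    if (server >= 0)
        close(server);
    if (client >= 0)
        close(client);
    return (int)n;
}
```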

inet_aton function
o converts the specified string, in the Internet standard dot notation, to a network address, and stores the address in the structure provided.
int inet_aton(const char *cp, struct in_addr *addr);
o Ex
inet_aton("", &serveraddress);
o returns 1 if the address is successfully converted, or 0 if the conversion failed.
inet_addr function
o converts the specified string, in the Internet standard dot notation, to an integer value suitable for use as an Internet address.
o The converted address is in network byte order (bytes ordered from left to right).
in_addr_t inet_addr(const char *cp);
o Ex:
struct in_addr a;
a.s_addr = inet_addr("");
o On success, inet_addr() returns the Internet address. Otherwise, it returns INADDR_NONE (usually -1).
inet_ntoa function
o converts the specified Internet host address to a string in the Internet standard dot notation.
char *inet_ntoa(struct in_addr in);
o returns a pointer to the network address in Internet standard dot notation
inet_net_pton function
int inet_net_pton(int af, const char *src, void *dst, size_t size);
o converts an Internet network number from presentation format (either Internet standard dot notation, or Classless Inter-Domain Routing (CIDR) format) to network format
o Used for IPV6
inet_ntop function
o converts an address from network format to presentation format.
o Used in IPV6
const char *inet_ntop(int af, const void *src, char *dst, socklen_t size);

Posix Signal handling
I/O multiplexing –I/O Models
select function
shutdown function -poll function

A signal is a limited form of inter-process communication used in Unix, Unix-like, and other POSIX-compliant operating systems. Essentially it is an asynchronous notification sent to a process in order to notify it of an event that occurred. When a signal is sent to a process, the operating system interrupts the process's normal flow of execution. Execution can be interrupted during any non-atomic instruction. If the process has previously registered a signal handler, that routine is executed. Otherwise the default signal handler is executed.
SIGABRT -- Process aborted
SIGALRM -- Signal raised by alarm
SIGBUS -- Bus error: "access to undefined portion of memory object"
SIGCHLD -- Child process terminated, stopped (or continued)
SIGCONT -- Continue if stopped
SIGFPE -- Floating point exception: "erroneous arithmetic operation"
SIGILL -- Illegal instruction
SIGKILL -- Kill (terminate immediately)
SIGPIPE -- Write to pipe with no one reading
SIGQUIT -- Quit and dump core
SIGSTOP -- Stop executing temporarily
SIGTERM -- Termination (request to terminate)
SIGTSTP -- Terminal stop signal
SIGTTIN -- Background process attempting to read from tty ("in")
SIGTTOU -- Background process attempting to write to tty ("out")
SIGUSR1 -- User-defined 1
SIGUSR2 -- User-defined 2
SIGPOLL -- Pollable event
SIGPROF -- Profiling timer expired
SIGTRAP -- Trace/breakpoint trap
SIGURG -- Urgent data available on socket
SIGVTALRM -- Signal raised by timer counting virtual time: "virtual timer expired"
SIGXCPU -- CPU time limit exceeded
SIGXFSZ -- File size limit exceeded
· Type of I/O Models
o Blocking I/O
o Non Blocking I/O
o I/O Multiplexing
o Signal driven I/O
o Asynchronous I/O
· Synchronous I/O Vs Asynchronous I/O
o Synchronous I/O
§ Causes the requesting process to be blocked until that I/O operation completes
§ Eg : Blocking, non blocking, I/O multiplexing, signal driven I/O
o Asynchronous I/O
§ Does not cause the requesting process to be blocked
· Non Blocking I/O
o Polling : continually checking the kernel to see if the datagram is ready
o It is a waste of CPU Time
· I/O Multiplexing
o The process blocks in a call to select, waiting for one of possibly many sockets to become readable
o Slight disadvantage : requires 2 system calls
o Advantage : can wait for more than one descriptor
· Signal driven I/O
o The process is not blocked while waiting for the datagram to arrive
· Signal driven I/O Vs Asynchronous I/O
o Signal driven I/O tells us when the I/O operation can be initiated
o Asynchronous I/O tells us when the I/O operation is complete
o Few systems support POSIX asynchronous I/O
· Select Function
o This function allows the process to instruct the kernel to wait for any one of multiple events to occur
o and to wake up the process when one or more of the event occurs
o Or when a specified amount of time has passed
o We can call select and tell the kernel to return only when
o Any of the descriptors in the set {1,4,5} are ready for reading
o Any of the descriptors in the set {2,7} are ready for writing
o Any of the descriptors in the set {1,4} have an exception condition pending
o 10 seconds have elapsed
int select( int nfds,
            fd_set* readfds,
            fd_set* writefds,
            fd_set* exceptfds,
            const struct timeval* timeout );

struct timeval {
    long tv_sec; /* seconds */
    long tv_usec; /* microseconds */
};
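A small sketch of select() with one descriptor in the read set and a timeout. The helper name wait_readable is hypothetical; nfds is the highest descriptor number plus one.

```c
#include <sys/select.h>
#include <sys/time.h>
#include <unistd.h>

/* Returns 1 if fd becomes readable within `seconds`, 0 on timeout,
 * -1 on error. */
int wait_readable(int fd, long seconds)
{
    fd_set readfds;
    struct timeval timeout;

    FD_ZERO(&readfds);              /* start with an empty set */
    FD_SET(fd, &readfds);           /* watch fd for reading */
    timeout.tv_sec = seconds;
    timeout.tv_usec = 0;

    int ready = select(fd + 1, &readfds, NULL, NULL, &timeout);
    if (ready < 0)
        return -1;
    return (ready > 0 && FD_ISSET(fd, &readfds)) ? 1 : 0;
}
```

The descriptor sets are overwritten by the call, so they have to be rebuilt before every select() when it is used in a loop.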

Socket options – getsockopt and setsockopt functions –
Generic socket options –
IP socket options –
ICMP socket options – TCP socket options –
Elementary UDP sockets
Domain name system –.
gethostbyname function – Ipv6 support in DNS
gethostbyaddr function
getservbyname and getservbyport functions

o SOL_SOCKET: Use this constant as the level argument to getsockopt or setsockopt to manipulate the socket-level options described in this section.
o SO_DEBUG: This option toggles recording of debugging information in the underlying protocol modules. The value has type int; a nonzero value means “yes”.
o SO_REUSEADDR: This option controls whether bind should permit reuse of local addresses for this socket. If you enable this option, you can actually have two sockets with the same Internet port number; but the system won't allow you to use the two identically-named sockets in a way that would confuse the Internet. The reason for this option is that some higher-level Internet protocols, including FTP, require you to keep reusing the same port number. The value has type int; a nonzero value means “yes”.
o SO_KEEPALIVE: This option controls whether the underlying protocol should periodically transmit messages on a connected socket. If the peer fails to respond to these messages, the connection is considered broken. The value has type int; a nonzero value means “yes”.
o SO_DONTROUTE: This option controls whether outgoing messages bypass the normal message routing facilities. If set, messages are sent directly to the network interface instead. The value has type int; a nonzero value means “yes”.
o SO_LINGER: This option specifies what should happen when a socket of a type that promises reliable delivery still has untransmitted messages when it is closed. The value has type struct linger.
o Data Type: struct linger
o This structure type has the following members:
o int l_onoff
o This field is interpreted as a boolean. If nonzero, close blocks until the data are transmitted or the timeout period has expired.
o int l_linger
o This specifies the timeout period, in seconds.
o SO_BROADCAST: This option controls whether datagrams may be broadcast from the socket. The value has type int; a nonzero value means “yes”.
o SO_OOBINLINE: If this option is set, out-of-band data received on the socket is placed in the normal input queue. This permits it to be read using read or recv without specifying the MSG_OOB flag. The value has type int; a nonzero value means “yes”.
o SO_SNDBUF: This option gets or sets the size of the output buffer. The value is a size_t, which is the size in bytes.
o SO_RCVBUF: This option gets or sets the size of the input buffer. The value is a size_t, which is the size in bytes.
o SO_STYLE / SO_TYPE: This option can be used with getsockopt only. It is used to get the socket's communication style. SO_TYPE is the historical name, and SO_STYLE is the preferred name in GNU. The value has type int and its value designates a communication style.
o SO_ERROR: This option can be used with getsockopt only. It is used to fetch and reset the error status of the socket. The value is an int, which represents the previous error status.
Here are the functions for examining and modifying socket options. They are declared in sys/socket.h.
— Function: int getsockopt (int socket, int level, int optname, void *optval, socklen_t *optlen-ptr)
The getsockopt function gets information about the value of option optname at level level for socket socket.
The option value is stored in a buffer that optval points to. Before the call, you should supply in *optlen-ptr the size of this buffer; on return, it contains the number of bytes of information actually stored in the buffer.
Most options interpret the optval buffer as a single int value.
The actual return value of getsockopt is 0 on success and -1 on failure. The following errno error conditions are defined:
EBADF -- The socket argument is not a valid file descriptor.
ENOTSOCK -- The descriptor socket is not a socket.
ENOPROTOOPT -- The optname doesn't make sense for the given level.
— Function: int setsockopt (int socket, int level, int optname, void *optval, socklen_t optlen)
This function is used to set the socket option optname at level level for socket socket. The value of the option is passed in the buffer optval of size optlen.
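The two functions can be tried with a short sketch that sets SO_REUSEADDR and reads it back. The helper name reuseaddr_roundtrip is hypothetical. The value read back is only guaranteed to be nonzero when the option is on, so the sketch does not assume it is exactly 1.

```c
#include <netinet/in.h>
#include <sys/socket.h>
#include <unistd.h>

/* Enable SO_REUSEADDR on a new TCP socket and return the value that
 * getsockopt reports for it (nonzero means "yes"), or -1 on error. */
int reuseaddr_roundtrip(void)
{
    int on = 1;                      /* most options take a single int */
    int value = 0;
    socklen_t len = sizeof value;    /* in: buffer size, out: bytes stored */

    int s = socket(AF_INET, SOCK_STREAM, 0);
    if (s < 0)
        return -1;
    if (setsockopt(s, SOL_SOCKET, SO_REUSEADDR, &on, sizeof on) < 0 ||
        getsockopt(s, SOL_SOCKET, SO_REUSEADDR, &value, &len) < 0)
        value = -1;
    close(s);
    return value;
}
```

A server normally sets this option between socket() and bind(), so that it can be restarted while old connections on the same port are still closing.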
Generic Socket Options
SO_BROADCAST: permit sending of broadcast datagram, only on broadcast links
SO_DEBUG: enable debug tracing of packets sent or received by TCP socket, trpt program to examine the kernel circular buffer
SO_DONTROUTE: bypass routing table lookup, used by routing daemons (routed and gated) in case the routing table is incorrect
SO_ERROR: get pending error and clear
SO_KEEPALIVE: test if TCP connection still alive periodically (2 hours, changed by TCP_KEEPALIVE)
SO_LINGER: linger on TCP close if data in socket send buffer, linger structure passed
SO_OOBINLINE: leave received out-of-band data inline in the normal input queue
SO_RCVBUF/SO_SNDBUF: socket receive / send buffer size, TCP default: 8192-61440, UDP default: 40000/9000
SO_RCVLOWAT/SO_SNDLOWAT: receive / send buffer low water mark for select to return
SO_RCVTIMEO/SO_SNDTIMEO: receive / send timeout for socket read/write
SO_REUSEADDR/SO_REUSEPORT: allow local address reuse for TCP server restart, IP alias, UDP duplicate binding for multicasting
SO_TYPE: get socket type, SOCK_STREAM or SOCK_DGRAM
SO_USELOOPBACK: routing socket gets copy of what it sends
IPv4 Socket Options
IP_HDRINCL: IP header included with data, e.g. traceroute builds own IP header on a raw socket
IP_OPTIONS: specify IP options like source route, timestamp, record route, etc.
IP_RECVDSTADDR: return destination IP address of a received UDP datagram by recvmsg
IP_RECVIF: return received interface index for a received UDP datagram by recvmsg
IP_TOS: set IP TOS field of outgoing packets for TCP/UDP socket, TOS: IPTOS_LOWDELAY / THROUGHPUT, RELIABILITY / LOWCOST
IP_TTL: set and fetch the default TTL for outgoing packets, 64 for TCP/UDP sockets, 255 for raw sockets, used in traceroute
IPv6 Socket Options
ICMP6_FILTER: fetch and set icmp6_filter structure specifying message types to pass
IPV6_ADDRFORM: change address format of socket between IPv4 and IPv6
IPV6_CHECKSUM: offset of checksum field for raw socket
IPV6_DSTOPTS: return destination options of received datagram by recvmsg
IPV6_HOPLIMIT: return hop limit of received datagrams by recvmsg
IPV6_HOPOPTS: return hop-by-hop options of received datagrams by recvmsg
IPV6_NEXTHOP: specify next hop address as a socket address structure for a datagram
IPV6_PKTINFO: return packet info, dest IPv6 and arriving interface, of received datagrams
IPV6_PKTOPTIONS: specify socket options of TCP socket
IPV6_RTHDR: receive source route
TCP Socket Options
TCP_KEEPALIVE: seconds between probes
TCP_MAXRT: TCP max retx time
TCP_MAXSEG: TCP max segment size
TCP_NODELAY: disable Nagle algorithm, to reduce the number of small packets
TCP_STDURG: interpretation of TCP’s urgent pointer, used with out-of-band data
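
TCP options are set at level IPPROTO_TCP rather than SOL_SOCKET. A minimal sketch of toggling TCP_NODELAY, assuming a POSIX system; set_nodelay and get_nodelay are illustrative helper names:

```c
#include <sys/socket.h>
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <unistd.h>

/* Enable (on = 1) or disable (on = 0) TCP_NODELAY; returns 0 or -1. */
int set_nodelay(int fd, int on)
{
    return setsockopt(fd, IPPROTO_TCP, TCP_NODELAY, &on, sizeof(on));
}

/* Returns 1 if TCP_NODELAY is set, 0 if not, -1 on error. */
int get_nodelay(int fd)
{
    int value = 0;
    socklen_t len = sizeof(value);
    if (getsockopt(fd, IPPROTO_TCP, TCP_NODELAY, &value, &len) < 0)
        return -1;
    return value != 0;
}
```

Setting the option means small writes are sent immediately instead of being held back while earlier data is unacknowledged, which suits interactive traffic at the cost of more small packets.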

condition variables
raw sockets – raw socket creation
raw socket input-raw socket output
ping program
trace route program.


Usually, sockets are used to build applications on top of a transport protocol

Stream sockets (TCP)

Datagram sockets (UDP)

Some applications need to access a lower-layer protocol:

Control protocols built on IP rather than UDP or TCP, such as ICMP and IGMP

Experimental transport protocols

A “raw” socket allows direct access to IP

Used to build applications on top of the network layer

Standard socket() call used to create a raw socket

Family is AF_INET, as for TCP or UDP

Socket type is SOCK_RAW instead of SOCK_STREAM or SOCK_DGRAM

Socket protocol needs to be specified, e.g. IPPROTO_ICMP


Features of Raw Socket

Read and write ICMP and IGMP packets (instead of putting more code into the kernel, these are handled entirely in the user process)

A process can read and write IPv4 datagrams whose IPv4 protocol field is not processed by the kernel (most kernels process only ICMP, IGMP, TCP, and UDP datagrams)

A process can build its own IPv4 header using the IP_HDRINCL socket option

Raw socket creation

Only the superuser can create a raw socket

Created as follows

sockfd = socket(AF_INET, SOCK_RAW, protocol);

Protocol: IPPROTO_xxx

IP_HDRINCL option can be set as follows

int on = 1;

setsockopt(sockfd, IPPROTO_IP, IP_HDRINCL, &on, sizeof(on));
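
The two creation steps can be combined with error handling. A minimal sketch, assuming a POSIX system; open_raw_socket is an illustrative helper name, and the call is expected to fail (typically with EPERM or EACCES) when the process lacks superuser privilege:

```c
#include <errno.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <unistd.h>

/*
 * Create a raw IPv4 socket for `protocol` (an IPPROTO_xxx constant).
 * If `hdrincl` is nonzero, also set IP_HDRINCL so the caller supplies
 * its own IPv4 header. Returns the descriptor, or -1 with errno set.
 */
int open_raw_socket(int protocol, int hdrincl)
{
    int fd = socket(AF_INET, SOCK_RAW, protocol);
    if (fd < 0)
        return -1;              /* usually a missing-privilege error */

    if (hdrincl) {
        int on = 1;
        if (setsockopt(fd, IPPROTO_IP, IP_HDRINCL, &on, sizeof(on)) < 0) {
            int saved = errno;  /* keep the setsockopt error across close */
            close(fd);
            errno = saved;
            return -1;
        }
    }
    return fd;
}
```

A ping-style program would call open_raw_socket(IPPROTO_ICMP, 0) and let the kernel build the IP header.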

Raw Socket Output

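Output uses the normal calls: sendto, or send and write if the socket has been connected

The starting address of the data passed to the kernel depends on IP_HDRINCL

If IP_HDRINCL is not set : it points to the first byte following the IP header, because the kernel builds the IP header

If IP_HDRINCL is set : it points to the first byte of the IP header, which the process has built itself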

Raw Socket Input

Received TCP and UDP packets are never passed to a raw socket. To read them, a process must read at the data link layer

Most ICMP packets are passed to the raw socket after the kernel has finished processing the ICMP message

Internet Control Message Protocol

ICMP messages

Query network node(s) for information

Report error conditions

ICMP messages are carried as IP datagrams

ICMP “uses” or is “above” IP

ICMP messages are usually processed by IP, UDP, or TCP

Two important applications of Raw Sockets

Ping program

Trace route Program

· Ping is a computer network tool used to test whether a particular host is reachable across an IP network;
· it is also used to self test the network interface card of the computer.
· It works by sending ICMP “echo request” packets to the target host and listening for ICMP “echo reply” packets in response.
· Ping estimates the round-trip time, generally in milliseconds, and records any packet loss, and prints a statistical summary when finished
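· ICMP ping packet (each row is 32 bits wide; the columns are bits 0-7, 8-15, 16-23 and 24-31)
IP header (160 bits or 20 bytes):
  Version/IHL | Type of service | Total length (16 bits)
  Identification (16 bits) | Flags and offset (16 bits)
  Time to live (TTL) | Protocol | Header checksum (16 bits)
  Source IP address (32 bits)
  Destination IP address (32 bits)
ICMP payload (64+ bits or 8+ bytes):
  Type of message | Code | Checksum (16 bits)
  Identifier (16 bits) | Sequence number (16 bits)
  Data (optional)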

· Composition of an ICMP Echo Reply packet
o IP header, with Protocol set to 1 (ICMP) and Type of Service set to 0.
o Type of ICMP message (8 bits)
o Code (8 bits)
o Checksum (16 bits), calculated over the ICMP part of the packet only (the IP header is not included)
o Data payload, whose content depends on the kind of message
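
The ICMP fields listed above can be filled in by hand before the packet is written to a raw socket. A minimal sketch, assuming the standard Internet checksum (one's-complement sum of 16-bit words) and the echo request layout of type 8, code 0, identifier and sequence number; inet_checksum and build_echo_request are illustrative helper names, and the buffer is built byte by byte so the code does not rely on any system ICMP structure:

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Internet checksum: one's-complement sum of big-endian 16-bit words. */
uint16_t inet_checksum(const void *data, size_t len)
{
    const uint8_t *p = data;
    uint32_t sum = 0;

    while (len > 1) {
        sum += (uint32_t)((p[0] << 8) | p[1]);
        p += 2;
        len -= 2;
    }
    if (len == 1)
        sum += (uint32_t)(p[0] << 8);       /* pad the odd byte with zero */
    while (sum >> 16)
        sum = (sum & 0xFFFF) + (sum >> 16); /* fold carries back in */
    return (uint16_t)~sum;
}

/*
 * Build an ICMP echo request in buf. Returns the packet length,
 * or 0 if buf is too small.
 */
size_t build_echo_request(uint8_t *buf, size_t buflen,
                          uint16_t id, uint16_t seq,
                          const void *payload, size_t paylen)
{
    size_t total = 8 + paylen;              /* 8-byte ICMP header + data */
    if (buflen < total)
        return 0;

    buf[0] = 8;                             /* type: echo request */
    buf[1] = 0;                             /* code */
    buf[2] = 0;                             /* checksum, zero while summing */
    buf[3] = 0;
    buf[4] = (uint8_t)(id >> 8);
    buf[5] = (uint8_t)(id & 0xFF);
    buf[6] = (uint8_t)(seq >> 8);
    buf[7] = (uint8_t)(seq & 0xFF);
    if (paylen > 0)
        memcpy(buf + 8, payload, paylen);

    uint16_t ck = inet_checksum(buf, total);
    buf[2] = (uint8_t)(ck >> 8);
    buf[3] = (uint8_t)(ck & 0xFF);
    return total;
}
```

A receiver can validate a packet by running inet_checksum over the whole ICMP message, checksum field included; a result of 0 means the checksum matches.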
· Traceroute is a computer network tool used to determine the route taken by packets across an IP network.
· Traceroute works by increasing the "time-to-live" value of each successive batch of packets sent.
· The first three packets sent have a time-to-live (TTL) value of one (implying that they are not forwarded by the next router and make only a single hop).
· The next three packets have a TTL value of 2, and so on.
· When a packet passes through a host, normally the host decrements the TTL value by one, and forwards the packet to the next host.
· When a packet with a TTL of one reaches a host, the host discards the packet and sends an ICMP time exceeded (type 11) packet to the sender.
· The traceroute utility uses these returning packets to produce a list of hosts that the packets have traversed en route to the destination.
· The three timestamp values returned for each host along the path are the delay (aka latency) values typically in milliseconds (ms) for each packet in the batch. If a packet does not return within the expected timeout window, a star (asterisk) is traditionally printed.
· Traceroute may not list the real hosts. It indicates that the first host is at one hop, the second host at two hops, etc. IP does not guarantee that all the packets take the same route.
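
The sending side of the TTL schedule described above can be sketched as follows. This is a minimal sketch under stated assumptions: probes are UDP datagrams sent to consecutive ports starting at 33434, as in the traditional UNIX traceroute, and the receiving side (reading the ICMP time exceeded replies on a raw socket) is left out. The names ttl_for_probe, set_ttl, get_ttl, send_probes, PROBES_PER_HOP and BASE_PORT are illustrative:

```c
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>
#include <unistd.h>

#define PROBES_PER_HOP 3        /* three packets per TTL value */
#define BASE_PORT      33434    /* assumed starting destination port */

/* TTL for the n-th probe (n counts from 0): 1,1,1,2,2,2,3,... */
int ttl_for_probe(int n)
{
    return 1 + n / PROBES_PER_HOP;
}

/* Set the TTL used for packets sent on fd; returns 0 or -1. */
int set_ttl(int fd, int ttl)
{
    return setsockopt(fd, IPPROTO_IP, IP_TTL, &ttl, sizeof(ttl));
}

/* Fetch the current TTL of fd, or -1 on error. */
int get_ttl(int fd)
{
    int ttl = 0;
    socklen_t len = sizeof(ttl);
    if (getsockopt(fd, IPPROTO_IP, IP_TTL, &ttl, &len) < 0)
        return -1;
    return ttl;
}

/*
 * Send PROBES_PER_HOP one-byte UDP probes for each TTL from 1 to max_ttl.
 * dest is passed by value so the port can be changed per probe.
 * Returns the number of probes sent, or -1 on the first error.
 */
int send_probes(int fd, struct sockaddr_in dest, int max_ttl)
{
    char payload = 0;
    int sent = 0;

    for (int n = 0; n < max_ttl * PROBES_PER_HOP; n++) {
        if (set_ttl(fd, ttl_for_probe(n)) < 0)
            return -1;
        dest.sin_port = htons((unsigned short)(BASE_PORT + n));
        if (sendto(fd, &payload, sizeof(payload), 0,
                   (struct sockaddr *)&dest, sizeof(dest)) < 0)
            return -1;
        sent++;
    }
    return sent;
}
```

A complete traceroute would timestamp each probe, wait for the matching ICMP reply, and print the reply's source address and the round-trip delay, or an asterisk on timeout.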