Design Issues of Concurrent and Iterative Servers I/O Multiplexing Socket Options
[Figure: overview of an echo client/server session.
Client (open_clientfd): socket → connect → loop { rio_writen request → rio_readlineb reply } → close on EOF.
Server (open_listenfd): socket → bind → listen → accept → loop { rio_readlineb request → rio_writen reply } → rio_readlineb returns EOF → close → await connection request from next client.]
Iterative Servers
Iterative servers process one request at a
time
[Figure: timeline of an iterative server with two clients.
Client 1: call connect → ret connect → call write → ret write → close.
Server: call accept → ret accept → read → close, then call accept again for the next client → ret accept → read → close.
Client 2: call connect, but the connect does not return until the server finishes with client 1 and accepts again → ret connect → call write → ret write → close.]
Fundamental Flaw of Iterative Servers
[Figure: two clients, one iterative server.
Client 1: call connect → ret connect → call fgets; the user goes out to lunch, so client 1 blocks waiting for the user to type in data.
Server: call accept → ret accept → call read; the server blocks waiting for data from client 1.
Client 2: call connect; client 2 blocks waiting to complete its connection, which cannot happen until after lunch!]
Blocking I/O Model
[Figure: the application issues a recvfrom system call; no datagram is ready, so the process blocks; when a datagram is ready the kernel copies it into the application buffer and recvfrom returns OK; the application then processes the datagram (copy complete).]
Non-Blocking I/O Model
[Figure: the application repeatedly issues recvfrom system calls; while no datagram is ready each call returns EWOULDBLOCK immediately; once a datagram is ready the kernel copies it and recvfrom returns OK; the application then processes the datagram (copy complete).]
Non-Blocking I/O Model
When an application sits in a loop calling
recvfrom on a nonblocking descriptor like
this, it is called polling
The application is continuously polling the
kernel to see if some operation is ready
This is often a waste of CPU time, but the
model is occasionally encountered, normally
on systems dedicated to one function
I/O Multiplexing Model
[Figure: the application issues a select system call and the process blocks; when a datagram is ready select returns "readable"; the application then issues recvfrom and blocks while the kernel copies the datagram; recvfrom returns OK and the application processes the datagram (copy complete).]
I/O Multiplexing Model
We call select or poll and block in one of
these two system calls instead of blocking in
the actual I/O system call
Signal Driven I/O Model
[Figure: the application issues a sigaction system call to establish a SIGIO handler, then continues executing; when a datagram is ready the kernel delivers SIGIO; the handler (or main loop) issues recvfrom, the process blocks while the kernel copies the datagram, recvfrom returns OK, and the application processes the datagram (copy complete).]
Signal Driven I/O Model
We first enable the socket for signal-driven
I/O and install a handler using sigaction
When the datagram is ready, the SIGIO signal
is generated for our process
We can either read the datagram from the
signal handler by calling recvfrom and then
notify the main loop, or notify the main loop
from the handler and let it read the data
Asynchronous I/O Model
Introduced in POSIX.1 (realtime extensions)
We tell the kernel to start the operation and
notify us when the entire operation (including
copying of data from kernel to our buffer) is
complete
The select() system call
The select() system call blocks until one
or more of a set of file descriptors become
ready
The select() system call
#include <sys/time.h>   // For portability
#include <sys/select.h>

int select(int nfds, fd_set *readfds, fd_set *writefds,
           fd_set *exceptfds, struct timeval *timeout);

struct timeval {
    time_t      tv_sec;   // Seconds
    suseconds_t tv_usec;  // Microseconds
};
The select() system call
When timeout is NULL, select() blocks until
At least one of the file descriptors specified in
readfds, writefds or exceptfds becomes ready, or
The call is interrupted by a signal handler
When timeout points to a structure with
non-zero fields, select() also returns once
the specified amount of time has passed
The select() system call:
Return values
-1 on error. Possible errors include EBADF
and EINTR
0 means the call timed out before any file
descriptor became ready. In this case the
returned file descriptor sets are empty
A positive value indicates the number of file
descriptors that are ready
Each set must be examined using FD_ISSET()
If a file descriptor is ready for more than one event,
it is counted multiple times
Under what conditions is a
descriptor read ready?
Number of bytes of data in the socket receive buffer ≥
current low-water mark for the socket receive
buffer. A read operation on that socket will not block
and will return a value greater than zero
Low-water mark set using the SO_RCVLOWAT socket option
Defaults to 1 for TCP/UDP sockets
The read half of the connection is closed (i.e., a
TCP connection that has received a FIN)
A read operation on that socket will not block and will return 0
(i.e., EOF)
Under what conditions is a
descriptor read ready?
The socket is a listening socket and the
number of completed connections is non-
zero. An accept on the listening socket will
normally not block
A socket error is pending. A read operation
on the socket will not block and will return an
error (-1) with errno set to specific condition
Under what conditions is a
descriptor write ready?
Number of bytes of available space in the
socket send buffer ≥ current low-water mark
for the socket send buffer
Set using SO_SNDLOWAT
Default is 2048 bytes for TCP/UDP sockets
The write half of the connection is closed. A
write operation on the socket will generate
SIGPIPE
A socket error is pending
Under what conditions is a
descriptor exception ready?
If there exists OOB data for the socket or the
socket is still at the out-of-band mark
The poll() function
#include <poll.h>

int poll(struct pollfd fds[], nfds_t nfds, int timeout);

getsockopt and setsockopt
optlen = sizeof(optval);
getsockopt(sfd, SOL_SOCKET, SO_TYPE,
&optval, &optlen);
Write a program to check whether most of the
options defined are supported, and if so, print
their default values.
SO_BROADCAST
Enables or disables the ability of the process
to send broadcast messages
SO_DEBUG
Supported only by TCP
If enabled for a TCP socket, the kernel keeps
track of detailed information about all the
packets sent or received by TCP for the
socket
These are kept in a circular buffer within the
kernel that can be examined with the trpt
program
SO_ERROR
When an error occurs on a socket, Berkeley-
derived kernels set a variable named
so_error for the socket to one of the standard
Unix Exxx values
This is called the pending error for the socket
The process can be immediately notified of
the error in two ways
If the process is blocked in a call to select, select
returns with the descriptor marked ready
If the process is using signal-driven I/O, SIGIO is
generated for either the process or the process
group
SO_KEEPALIVE
When the keepalive option is set for a TCP
socket and no data has been exchanged
across the socket in either direction for 2
hours, TCP automatically sends a keepalive
probe to the peer. Three scenarios can occur
Peer responds with ACK. Application is not
notified (since everything is ok)
Peer responds with RST (peer host has crashed
or rebooted). Socket’s pending error is set to
ECONNRESET and the socket is closed
SO_KEEPALIVE
No response from the peer. Berkeley-derived
kernels send eight additional probes, 75 secs
apart, and TCP gives up after 11 mins 15 secs:
the socket's pending error is set to
ETIMEDOUT and the socket is closed. However, if the
socket receives an ICMP error in response to one
of the keepalive probes, the corresponding error
is returned instead (the socket is still closed). A common
ICMP error is host unreachable, in which case the
pending error is set to EHOSTUNREACH
SO_LINGER
This option specifies how the close function
operates for a TCP socket
By default close returns immediately, but if
any data remains in the socket send
buffer, the system will try to deliver the data
to the peer
SO_LINGER
The following structure is passed between
the user process and the kernel
struct linger {
int l_onoff; // 0=off; non-zero=on
int l_linger; // linger time in secs
};
SO_LINGER
If l_onoff is 0, the option is turned off
If l_onoff is non-zero and l_linger is 0, TCP
aborts the connection when it is closed
TCP discards any remaining data in the socket
send buffer and sends an RST to the peer
If both l_onoff and l_linger are non-zero, the
kernel will linger when the socket is closed
If any data remains in the socket
send buffer, the process is put to sleep until either
All data is sent and acknowledged by the peer TCP
The linger time expires
SO_RCVBUF and SO_SNDBUF
TCP and UDP have receive buffers to hold
received data until read by application
With TCP, the available room in the socket
receive buffer is the window that TCP
advertises to the other end
Peer is not allowed to send data beyond the
advertised window (TCP flow control)
If peer still sends data beyond this window,
TCP discards it
With UDP if the incoming datagram does not
fit in the receive buffer, datagram is discarded
SO_RCVBUF and SO_SNDBUF
These two socket options let us change the
default sizes
Default value differs widely between
implementations
Older Berkeley-derived implementations would
default the send and receive buffers to 4096 bytes
Newer systems use anywhere between 8192 and
61440 bytes
UDP send buffer often defaults to a value around
9000 bytes and the receive buffer around 40000
bytes
SO_RCVBUF and SO_SNDBUF
For a client, must SO_RCVBUF be set before
or after connect(), and why?
For a server, when must SO_RCVBUF be set,
and why?
Before or after listen()?
Before or after connect()?
SO_REUSEADDR and
SO_REUSEPORT
SO_REUSEADDR allows a listening server
to start and bind its well known port even if
previously established connections exist that
use this port
The listening server is restarted
The listening server terminates but the child
continues to service the client on the existing
connection
SO_REUSEADDR and
SO_REUSEPORT
SO_REUSEADDR allows multiple instances of the
same server to be started on the same port, as long
as each instance binds a different local IP address
SO_REUSEADDR allows a single process to bind
the same port to multiple sockets, as long as each
bind specifies a different local IP address
SO_REUSEADDR allows completely duplicate
bindings: a bind of an IP address and port, when
that same IP address and port are already bound to
another socket
SO_REUSEADDR and
SO_REUSEPORT
SO_REUSEPORT allows completely duplicate
bindings, but only if each socket that wants to
bind the same IP address and port specifies
this socket option
SO_REUSEADDR is considered equivalent
to SO_REUSEPORT if the IP address being
bound is a ___________address (fill in the
blanks).
SO_TYPE
Returns the socket type
The integer value returned is a value such as
SOCK_STREAM or SOCK_DGRAM
Typically used by a process that inherits a
socket when it is started
SO_USELOOPBACK
Socket receives a copy of everything sent on
the socket
Only applies to socket in the routing domain
(AF_ROUTE)
fcntl revisited for sockets
Set the O_NONBLOCK file status flag using
F_SETFL to make a socket nonblocking
Set the O_ASYNC file status flag using F_SETFL,
which causes SIGIO to be generated when the
status of the socket changes
F_SETOWN allows the socket owner (the
process id or process group id) to receive the
SIGIO and SIGURG signals
F_GETOWN returns the current owner of the
socket
fcntl revisited for sockets
#include <fcntl.h>