include(`macros.m4')
\pagebreak
\pdfbookmark[0]{network programming}{site}
\begin{slide}
\sltitle{Contents}
\slidecontents{7}
\end{slide}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\hlabel{NETWORKING}
\begin{slide}
\sltitle{Network communication}
\begin{description}
\item[UUCP (UNIX-to-UNIX Copy Program)] -- first application for communication
between UNIX systems connected directly or via modems, in Version~7 UNIX (1978)
\item[sockets] -- introduced in 4.1 BSD (1982); socket is one end of a
bidirectional communication channel created between two processes either on the
same computer or across a network.
\item[TLI (Transport Layer Interface)] -- SVR3 (1987); API providing network
communication within the \nth{4} layer of ISO OSI. Counterpart to the BSD
sockets API.
\item[RPC (Remote Procedure Call)] -- SunOS (1984); provides access to services
running on a remote machine, data transferred in XDR format (External Data
Representation)
\end{description}
\end{slide}
\begin{itemize}
\item There are two main conceptual models for network communication: ISO
(International Organization for Standardization) OSI (Open Systems
Interconnection), and the Internet protocol suite (also called TCP/IP). Each
one defines several layers.
If we oversimplify, ISO OSI is very formal, quite complex, has 7 layers, and its
specification is not freely available, while TCP/IP is simpler, not that formal,
with 4 layers, and the specification is freely available via RFCs. While the
layers of the two models cannot be precisely mapped onto each other, the
corresponding layers are roughly as follows:
\vspace{1.5mm}
\renewcommand{\arraystretch}{1.2}
\begin{tabular}{|l|l|}\hline
ISO OSI & TCP/IP\\
\hline
\hline
application & application\\
presentation & \\
session & \\
\hline
transport & transport\\
\hline
network & internet\\
\hline
link & link\\
physical & \\
\hline
\end{tabular}
\renewcommand{\arraystretch}{1}
\vspace{1.5mm}
We will be working exclusively with the TCP/IP model.
\item UUCP is a historical thing, fully implemented in userland without any
kernel support. See the \texttt{uucp} man page or Wikipedia for more
information.
\item RPC is implemented as a library linked to applications, uses sockets
and works on top of TCP and UDP. RPC was developed as a communication protocol
for the \emph{Network Filesystem} (NFS). There are several mutually
incompatible RPC implementations.
\item TLI is designed from an OSI model-oriented viewpoint, and it corresponds
to the \nth{4} layer -- transport. TLI API looks similar to sockets.
\item Sockets for communication within the same host are in the
\texttt{AF\_UNIX} domain and their names correspond to special files that
represent the sockets in the filesystem. \texttt{ls -F} uses the equal sign
``='' to mark a Unix domain socket.
\item Sockets in \texttt{AF\_UNIX} are different from local TCP/IP communication
over the loopback interface \texttt{localhost} (\texttt{127.0.0.1}). See page
\pageref{SOCKET} for more information on \texttt{AF\_UNIX}.
\end{itemize}
%%%%%
\begin{slide}
\sltitle{TCP/IP basics}
\begin{itemize}
\item protocols
\begin{itemize}
\item \emsl{IP (Internet Protocol)} -- principal communications protocol,
not accessible for a non-privileged user
\item \emsl{TCP (Transmission Control Protocol)} -- reliable, ordered, and
error-checked delivery of a stream of bytes
\item \emsl{UDP (User Datagram Protocol)} -- datagram, connection-less,
unreliable
\end{itemize}
\item \emsl{IP address} -- 4 bytes (IPv4) / 16 bytes (IPv6), defines a network
interface, not a computer
\item \emsl{port} -- 2 bytes, application end-points on a host
\item \emsl{DNS (Domain Name System)} -- translates domain names to the
numerical IP addresses
\end{itemize}
\end{slide}
\begin{itemize}
\item Unix mostly uses protocols from the TCP/IP family. We will cover TCP and
UDP. In both protocols, one end of a communication channel is identified by an
IP address and a port. Those two pieces correspond to a socket. A TCP
connection is uniquely identified by a pair of sockets.
\item Ports below 1024 are reserved and additional privileges are needed to use
them. For example, \emph{root} can access those. See \texttt{/etc/services}
for the textual database of port numbers and corresponding service names.
\item To learn about networking and the Internet in general,
see the Computer Networks (NSWI090) course.
There are also on-line materials (\url{https://www.earchiv.cz/l226/index.php3})
by ifdef([[[NOSPELLCHECK]]], [[[Ji\v{r}\'{i} Peterka]]]).
\item \emph{IP} -- protocol of the internet layer within the TCP/IP network
stack, provides data transfer between two interfaces identified by an IP
address. It is unreliable. Provides routing and fragmentation. Defined in
RFC~791. The Internet Control Message Protocol (ICMP), defined in RFC~792, is
an inherent part of the IP protocol.
\item \emph{UDP} -- a thin layer on top of the IP protocol that essentially only
adds port numbers and a checksum. Still unreliable and datagram oriented.
Defined in RFC~768.
\item \emph{TCP} -- establishes connections between two points (sockets).
Provides a continuous stream of data, congestion control and reliable
delivery. To create a connection, a so called \emph{handshake} must be
performed. The protocol is defined in RFC~793 and other follow-up RFCs.
\item \emph{DNS} -- hierarchically organized database, its structure does not
have to follow the IP address structure.
\end{itemize}
%%%%%
\begin{slide}
\sltitle{Connection-oriented (TCP), sequential service}
\input{img/tex/tcp_seq.tex}
\end{slide}
\begin{itemize}
\item Note that the common \funnm{read}() and \funnm{write}() calls are used.
To keep the picture readable, all arguments aside from the descriptor \emph{fd}
were intentionally omitted.
\item The server serves one connection and does not accept a new one until the
previous connection has finished. That is why it is called a sequential
service.
\item System calls used:
\begin{itemize}
\item \funnm{socket}() -- creates a socket, returns its descriptor.
\item \funnm{bind}() -- binds an IP address and a port number to the socket.
In other words, it assigns a name to an unnamed socket. The address must be
either one of IP addresses assigned to one of the host network interfaces (the
host where the socket was created), in which case it will only accept connection
requests over that specific interface via that specific IP address, or it can be
a special ``any'' value (for so-called \emph{wildcard sockets}),
denoting that connections will be accepted on any IP address on any interface of
the host. See page \pageref{ANYADDR}.
\item \funnm{listen}() -- tells the kernel to start listening for connection
requests on the socket.
\item \funnm{accept}() -- blocks the process until there is a connection
request, then creates the connection and returns a \emsl{new} descriptor which
is used for communicating with the client. The original socket descriptor can
be used for another \funnm{accept}() call to serve a new connection request from
another client.
\item \funnm{close}() -- closes the connection.
\item \funnm{connect}() -- the client asks to create a connection. The IP
address and a port number are passed as arguments (omitted in the picture), the
communication is performed through an already existing socket descriptor
\emph{fd}. In contrast to \funnm{accept}(), a new file descriptor is not
created.
\end{itemize}
\end{itemize}
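
The server side of the picture can be sketched roughly as follows. This is a
minimal sketch, not the course example code: the helper names
\texttt{seq\_listen}() and \texttt{seq\_serve\_one}() are hypothetical, the
server binds to the loopback address only, and it merely counts the bytes a
client sends.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/*
 * Hypothetical helper: create a TCP socket listening on 127.0.0.1:port.
 * Port 0 asks the kernel to pick a free port. Returns -1 on error.
 */
int
seq_listen(unsigned short port)
{
    struct sockaddr_in sa;
    int fd = socket(AF_INET, SOCK_STREAM, 0);

    if (fd == -1)
        return (-1);
    memset(&sa, 0, sizeof (sa));
    sa.sin_family = AF_INET;
    sa.sin_port = htons(port);
    sa.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    if (bind(fd, (struct sockaddr *)&sa, sizeof (sa)) == -1 ||
        listen(fd, SOMAXCONN) == -1) {
        close(fd);
        return (-1);
    }
    return (fd);
}

/*
 * Hypothetical helper: serve exactly one client. Reads until the client
 * closes the connection; returns the number of bytes read, or -1.
 */
long
seq_serve_one(int lfd)
{
    char buf[512];
    long total = 0;
    ssize_t n;
    int cfd;

    assert(lfd >= 0);
    /* Blocks until a client connects; cfd is a new descriptor. */
    if ((cfd = accept(lfd, NULL, NULL)) == -1)
        return (-1);
    while ((n = read(cfd, buf, sizeof (buf))) > 0)
        total += n;
    close(cfd);    /* lfd stays open for the next accept() */
    return (n == -1 ? -1 : total);
}
```

A sequential server would call \texttt{seq\_serve\_one}() in a loop on the
same listening descriptor. A client that connects while another one is being
served waits in the listen queue until the server gets back to
\funnm{accept}().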
%%%%%
\pdfbookmark[1]{socket}{socket}
\begin{slide}
\sltitle{Creating a socket: \texttt{socket()}}
\setlength{\baselineskip}{0.9\baselineskip}
\texttt{int \funnm{socket}(int \emph{domain}, int \emph{type},
int \emph{protocol});}
\begin{itemize}
\item creates a socket, returns its descriptor
\item \emph{domain} -- ``where the communication will take place'':
\begin{itemize}
\item \texttt{AF\_UNIX} \dots{} local communication within a host, its
address is a file name. Also \texttt{AF\_LOCAL}.
\item \texttt{AF\_INET}, \texttt{AF\_INET6} \dots{} internet communication,
the address is an IP address and port pair
\end{itemize}
\item \emph{type}:
\begin{itemize}
\item \texttt{SOCK\_STREAM} \dots{} connection-oriented reliable service,
provides bidirectional data stream
\item \texttt{SOCK\_DGRAM} \dots{} connection-less unreliable service,
transmits datagrams
\end{itemize}
\item \emph{protocol}: \texttt{0} (default for a given \emph{type})
or a valid protocol number (e.g. \texttt{6}~=~TCP, \texttt{17}~=~UDP)
\end{itemize}
\end{slide}
\hlabel{SOCKET}
\begin{itemize}
\item The function is declared in \texttt{<sys/socket.h>} as well as other
network related functions from the previous slide.
\item Sockets use the same name space as file descriptors, i.e. the same
descriptor table. If you write a simple program that only calls
\funnm{socket}(), it will return the same file descriptor number as if
\funnm{open}() was called, i.e. the first available slot in the descriptor
table, usually 3.
\item Connection-oriented communication is always bidirectional and is similar
to pipes. However, note that pipes may not be bidirectional, see page
\pageref{TWO_WAY_PIPES}.
\item Sometimes you can see constants beginning with \verb#PF_# (meaning
\emph{protocol family}, e.g. \texttt{PF\_IN\-ET}, \verb#PF_UNIX#, or
\texttt{PF\_IN\-ET6}) used in a \funnm{socket}() call. Constants \verb#AF_#
(\emph{address family}) are then used only for naming the sockets. While that
might make more sense, the UNIX specification only defines \verb#AF_#
constants. And if \verb#PF_# constants exist on a system, they are defined via
the corresponding \verb#AF_# constants. We recommend using only \verb#AF_#
constants.
\item There are other domains, see the \texttt{socket(2)} manual page.
\item There are also other socket types, for example \texttt{SOCK\_RAW}, for
full protocol access. In order to use \texttt{SOCK\_RAW} additional privileges
are usually needed. That is a reason why the \texttt{ping} command, which
works with ICMP headers of sent packets, might need to be a SUID program:
\begin{verbatim}
$ ls -l /usr/sbin/ping
-r-sr-xr-x 1 root bin 55680 Nov 14 19:01 /usr/sbin/ping
\end{verbatim}
\end{itemize}
%%%%%
\pdfbookmark[1]{bind}{bind}
\begin{slide}
\sltitle{Naming the socket: \texttt{bind()}}
ifdef([[[NOSPELLCHECK]]], [[[
\funml{int \funnm{bind}(\=int \emph{socket},
const struct sockaddr *\emph{address}, \\\>socklen\_t \emph{address\_len});}
]]])
\begin{itemize}
\item binds a \emph{socket} with an address
\item \texttt{struct sockaddr} is generic and not used directly to fill in the
address
\begin{itemize}
\item ifdef([[[NOSPELLCHECK]]],
[[[\texttt{sa\_family\_t \emph{sa\_family}}]]]) \dots{} domain
\item ifdef([[[NOSPELLCHECK]]], [[[\texttt{char \emph{sa\_data}[]}]]])
\dots{} address
\end{itemize}
\item for \texttt{AF\_INET}, \texttt{struct sockaddr\_in} is used:
\begin{itemize}
\item ifdef([[[NOSPELLCHECK]]],
[[[\texttt{sa\_family\_t \emph{sin\_family}}]]]) \dots{} domain
(\texttt{AF\_INET})
\item ifdef([[[NOSPELLCHECK]]],
[[[\texttt{in\_port\_t \emph{sin\_port}}]]]) \dots{} port number (16 bits)
\item ifdef([[[NOSPELLCHECK]]],
[[[\texttt{struct in\_addr \emph{sin\_addr}}]]]) \dots{}
IP address (32 bits)
\item ifdef([[[NOSPELLCHECK]]],
[[[\texttt{unsigned char \emph{sin\_zero}[8]}]]]) \dots padding
\end{itemize}
\end{itemize}
\end{slide}
\begin{itemize}
\item \funnm{bind}() assigns the socket its local address: the source address
of packets sent to the other side of the connection, which is also the
destination address of data received. The remote address is set via
\funnm{connect}().
\item Structure \texttt{sockaddr} is a generic type used by the kernel. To set
an address in a specific domain, one has to use the concrete structure for that
domain, see the next slide. Such a structure then needs to be cast to the
generic structure as that is the type required by, for example, the
\funnm{bind}() function. Using those structures directly is not recommended as
the program will then only work for one specific address family; we do it here
just to show you how it works. Instead, you can use helper functions that
convert names to addresses with no need to work with those structures directly.
See \funnm{getaddrinfo}() on page \pageref{GETADDRINF} for more information.
\item You will also need other header files, see the example on page
\pageref{BIND_EXAMPLE}.
\item For domains \texttt{AF\_INET} and \texttt{AF\_INET6}, you can use a
special address that stands for any address on a given host. Such a socket can
be used to accept connections on any IP address assigned to any interface on the
host. That address is:
\begin{itemize}
\hlabel{ANYADDR}
\item For \texttt{AF\_INET}, use \texttt{INADDR\_ANY} (4 zero
bytes corresponding to \texttt{0.0.0.0})
\item With \texttt{AF\_INET6} the situation is more complicated. You can either
use a constant variable \texttt{in6addr\_any} or a constant
\texttt{IN6ADDR\_ANY\_INIT}. How\-ever, the constant can be only used to
initialize variables of type \texttt{struct in6\_addr}, not for any assignment.
Both values correspond to \texttt{::} (16 zero bytes).
\end{itemize}
\item In general, you cannot bind the same address and port to multiple sockets
of the same protocol.
\item If \funnm{bind}() is not called, the kernel will assign one of available
ports and the primary IP address on the interface through which the destination
can be accessed. In general, as a client, you do not need a specific outgoing
port so \funnm{bind}() is usually not needed and the call is typically used only
by servers as they need a specific port number to be contacted (e.g. 443 for
HTTPS). Note that some legacy services though, e.g. \texttt{rsh}, require the
client to connect from a privileged port (0-1023). Such a client must call
\funnm{bind}() to use such a port and also have enough privileges to do that.
\item \emsl{The address and port must always use network byte ordering.} See
page \pageref{BYTE_ORDERING} where different byte orderings were explained.
More information is on page \pageref{HTON}.
\end{itemize}
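
The difference between the two IPv6 wildcard forms mentioned above can be
sketched as follows. This is only an illustration under stated assumptions:
the helper name \texttt{fill\_any6}() is hypothetical, and the function only
fills in the structure; it does not create or bind any socket.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>

/*
 * Hypothetical helper: fill in an IPv6 wildcard address with the given port.
 * Returns 1 if both wildcard forms turned out to be identical, 0 otherwise.
 */
int
fill_any6(struct sockaddr_in6 *sa, unsigned short port)
{
    /* The macro may only be used as an initializer... */
    const struct in6_addr any = IN6ADDR_ANY_INIT;

    assert(sa != NULL);
    memset(sa, 0, sizeof (*sa));
    sa->sin6_family = AF_INET6;
    sa->sin6_port = htons(port);    /* network byte order */
    /* ...while the variable can be used in a plain assignment. */
    sa->sin6_addr = in6addr_any;
    /* Both denote the same 16 zero bytes (::). */
    return (memcmp(&any, &sa->sin6_addr, sizeof (any)) == 0);
}
```

The filled-in structure would then be cast to \texttt{struct sockaddr *} and
passed to \funnm{bind}() together with \texttt{sizeof (struct sockaddr\_in6)}.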
%%%%%
ifdef([[[NOSPELLCHECK]]], [[[
\pdfbookmark[1]{struct sockaddr\_in}{sockaddrin}
]]])
\begin{slide}
\sltitle{Structure for IPv4 addresses}
\begin{itemize}
\item each address family has its structure and a header file
\item in the \funnm{bind}() call, the structure is cast to
\texttt{struct sockaddr}
\end{itemize}
\begin{verbatim}
#include <netinet/in.h>
struct sockaddr_in in = { 0 }; /* IPv4 */
in.sin_family = AF_INET;
in.sin_port = htons(2222);
in.sin_addr.s_addr = htonl(INADDR_ANY);
if (bind(s, (struct sockaddr *)&in, sizeof (in)) == -1) ...
\end{verbatim}
\end{slide}
\hlabel{BIND_EXAMPLE}
\begin{itemize}
\item As already mentioned, such code is for demonstration purposes only.
Unless really needed, you should not write code like this but rather use generic
functions for converting names to addresses in which case you do not need to
worry about address families. Most networking applications are assumed to work
with both IPv4 and IPv6.
\item \texttt{sin\_addr} is a structure itself, of type \texttt{in\_addr}. That
structure must have at least one member, \texttt{s\_addr}, whose type must be
equivalent to a 4 byte integer. That comes directly from the UNIX specification
for \texttt{netinet/in.h}.
\item Working with \texttt{sockaddr\_in6} is a bit more complicated. It is in
\texttt{netinet/in.h}, either defined in there directly or the file includes
\texttt{netinet6/in6.h} with the structure definition.
\item We repeat again as it is important -- for both \texttt{AF\_INET} and
\texttt{AF\_INET6}, the port and address must be in the network byte ordering,
see page \pageref{HTON}.
\item As \texttt{INADDR\_ANY} is defined as \texttt{0}, you may see its use
without \funnm{htonl}() (see page \pageref{HTON} for more information on the
function). Never do it that way. Next time you put an IP address there, you
may forget to add \funnm{htonl}() and you are going to run into some issues
right away. And again, if you write code agnostic to specific address families,
you do not need to worry about network byte ordering for addresses and ports
whatsoever.
\item In the \texttt{AF\_UNIX} domain,
\texttt{struct sockaddr\_un} is used, defined in \texttt{<sys/un.h>}:
\begin{itemize}
\item ifdef([[[NOSPELLCHECK]]], [[[\texttt{sa\_family\_t \emph{sun\_family}}]]])
\dots{} domain
\item ifdef([[[NOSPELLCHECK]]], [[[\texttt{char \emph{sun\_path}[]}]]])
\dots{} socket name
\item The size of \emph{sun\_path} has intentionally been left undefined in the
UNIX specification. This is because different implementations use different
sizes. For example, 4.3~BSD uses a size of 108, and 4.4~BSD uses a size of 104.
Since most implementations originate from BSD versions, the size is typically in
the range 92 to 108.
\end{itemize}
\end{itemize}
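
Because the size of \emph{sun\_path} differs between systems, code that binds
a socket in the \texttt{AF\_UNIX} domain should check the path length instead
of assuming a particular limit. A minimal sketch follows; the helper name
\texttt{bind\_unix}() is hypothetical, and removing a stale socket file before
binding is a design choice of this example, not a requirement.

```c
#include <assert.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/un.h>
#include <unistd.h>

/*
 * Hypothetical helper: create an AF_UNIX stream socket and bind it to path.
 * Returns the socket descriptor, or -1 on error.
 */
int
bind_unix(const char *path)
{
    struct sockaddr_un sa;
    int fd;

    assert(path != NULL);
    /* The size of sun_path is system specific, so check, do not assume. */
    if (strlen(path) >= sizeof (sa.sun_path))
        return (-1);
    if ((fd = socket(AF_UNIX, SOCK_STREAM, 0)) == -1)
        return (-1);
    memset(&sa, 0, sizeof (sa));
    sa.sun_family = AF_UNIX;
    strcpy(sa.sun_path, path);    /* the length was checked above */
    (void) unlink(path);          /* remove a stale socket file, if any */
    if (bind(fd, (struct sockaddr *)&sa, sizeof (sa)) == -1) {
        close(fd);
        return (-1);
    }
    return (fd);
}
```

After a successful call, the socket is visible in the filesystem under the
given name. The file is not removed when the socket is closed, so the caller
should \funnm{unlink}() it when done.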
%%%%%
\pdfbookmark[1]{listen}{listen}
\begin{slide}
\sltitle{Waiting for connection: \texttt{listen()}}
\texttt{int \funnm{listen}(int \emph{socket}, int \emph{backlog});}
\begin{itemize}
\item specifies willingness to accept incoming connections on \emph{socket}, and
the system starts listening
\item maximum of \emph{backlog} connection requests may wait in the queue to
be served
\item connection requests that do not fit the queue are refused
(\funnm{connect}() returns an error on the other side).
\item the system waits for a connection on an address previously assigned by
\funnm{bind}()
\end{itemize}
\end{slide}
\begin{itemize}
\item The system may silently adjust \emph{backlog} if it is not in a supported
range.
\item Wildcard sockets are primarily used for servers. If you need to
distinguish between interfaces, you need one socket per IP address. Web servers
used to rely on this to distinguish virtual servers based on the IP address.
However, that is the distant past. To distinguish between virtual servers
running on the same host, an HTTP header \uv{\texttt{Host:}} is used. Similarly,
the TLS protocol uses the Server Name Indication (SNI) extension.
\item The fact that the system starts listening on a port means that the kernel
performs the TCP handshake and accepts data on its own, even before the
process calls \funnm{accept}(). The data is stored in a fixed length buffer and
once it fills up, the connection is still active but the TCP window is set to 0
which means the system has stopped accepting further data. The buffer size is
usually a few tens of kilobytes.
\hlabel{UP_TO_LISTEN_ONLY_C} Example: \example{tcp/up-to-listen-only.c}.
\item The example code uses the macro \texttt{SOMAXCONN}, required by the UNIX
specification to be in \texttt{sys/socket.h}. It specifies the maximum queue
length for \funnm{listen}(). As far as we know, Linux, FreeBSD, macOS and
Solaris use a value of 128.
\item This function is specific to connection-oriented protocols, so it does not
work with UDP.
\end{itemize}
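
That the kernel completes the handshake and buffers data without any
\funnm{accept}() call can be observed within a single process. The following
is a minimal sketch, not the course example: the function name
\texttt{write\_without\_accept}() is hypothetical, and the loopback address
with a kernel-chosen port is an assumption made to keep the code
self-contained.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/*
 * Hypothetical demo: a client connects and writes msg to a socket that is
 * listening but never accept()ed on. Returns the number of bytes written,
 * or -1 on error.
 */
long
write_without_accept(const char *msg)
{
    struct sockaddr_in sa;
    socklen_t len = sizeof (sa);
    long n = -1;
    int lfd, cfd;

    assert(msg != NULL);
    lfd = socket(AF_INET, SOCK_STREAM, 0);
    cfd = socket(AF_INET, SOCK_STREAM, 0);
    if (lfd == -1 || cfd == -1)
        goto out;
    memset(&sa, 0, sizeof (sa));
    sa.sin_family = AF_INET;
    sa.sin_port = htons(0);    /* let the kernel choose a free port */
    sa.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    if (bind(lfd, (struct sockaddr *)&sa, sizeof (sa)) == -1 ||
        listen(lfd, SOMAXCONN) == -1 ||
        getsockname(lfd, (struct sockaddr *)&sa, &len) == -1)
        goto out;
    /* Nobody calls accept(), yet both calls below are expected to succeed. */
    if (connect(cfd, (struct sockaddr *)&sa, len) == 0)
        n = (long)write(cfd, msg, strlen(msg));
out:
    if (cfd != -1)
        close(cfd);
    if (lfd != -1)
        close(lfd);
    return (n);
}
```

\funnm{getsockname}() is used here only to learn which port the kernel
assigned so that the client side knows where to connect.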
%%%%%
\pdfbookmark[1]{accept}{accept}
\begin{slide}
\sltitle{Accepting connection: \texttt{accept()}}
ifdef([[[NOSPELLCHECK]]], [[[
\funml{int \funnm{accept}(\=int \emph{socket}, struct sockaddr *\emph{address},
\\\>socklen\_t *\emph{address\_len});}
]]])
\begin{itemize}
\item creates a connection between the local, already listening \texttt{socket}
and a remote end
\item returns a new socket descriptor that can be used to communicate with the
remote process. Original socket can be used to accept another connection.
\item returns a remote IP address/port in \emph{address} unless NULL
\item \emph{\texttt{*address\_len}} is the size of the \emph{address} structure
and is updated with the real address size on return
\end{itemize}
\end{slide}
\hlabel{ACCEPT}
\begin{itemize}
\item The ``remote end'' is the socket on which a \funnm{connect}() is
called on a remote Unix host, or it could be possibly something else on other
systems. Remember that protocols and APIs are two different things.
\item The newly created socket has the same characteristics as \emph{socket}.
For example, if \emph{socket} was non-blocking, the new socket is non-blocking
as well.
\item If more clients running on the same host connect to the same server (i.e.
using the same IP address and port), individual connections are distinguished
only by the client side port number. Do remember that a TCP connection is
uniquely identified by two sockets, i.e.
ifdef([[[NOSPELLCHECK]]], [[[``((addr1, port1), (addr2, port2)).'']]])
\item The \emph{address} may be \texttt{NULL} which means the caller is not
interested in the remote end address. In that case,
\emph{\texttt{address\_len}} should be \texttt{NULL} as well.
\item If the code is written to be independent of an address family, it should
use \texttt{struct sockaddr\_storage} for \emph{address}. Any specific address
structure is guaranteed to fit in \texttt{struct sockaddr\_storage}, for
example \texttt{struct sockaddr\_in} or \texttt{struct sockaddr\_in6}. Also see
page \pageref{TCPCLNTEXAMPLE}.
\item \hlabel{TCP_SINK_SERVER_C} Example: \example{tcp/tcp-sink-server.c}
\end{itemize}
\pdfbookmark[1]{connect}{connect}
\begin{slide}
\sltitle{Initiating a connection: \texttt{connect()}}
ifdef([[[NOSPELLCHECK]]], [[[
\funml{int \funnm{connect}(\=int \emph{sock},
const struct sockaddr *\emph{address},
\\\>socklen\_t \emph{address\_len});}
]]])
\begin{itemize}
\item attempts to make a connection to a remote socket waiting on \emph{address}
(of \emph{\texttt{address\_len}} length)
\item if \emph{sock} was not bound before, the kernel will assign an available
local address based on the chosen address family
\item you should close \emph{sock} on a connection failure
\end{itemize}
\end{slide}
\hlabel{CONNECT}
\begin{itemize}
\item As the UNIX specification does not say anything about the socket state
after a connection failure, you should definitely close \emph{sock} before
moving on to create a new socket and trying again.
\item After the connection is created, both the server and client can use normal
\funnm{read}() and \funnm{write}() calls, or \funnm{send}(), \funnm{recv}(),
\funnm{sendmsg}(), and \funnm{recvmsg}(). The behavior of these functions is
similar to working with pipes.
\item Example: \hlabel{CONNECT_C} \example{tcp/connect.c}
\item \hlabel{CONNECT_FOR_UDP} For connection-less services (UDP),
\funnm{connect}() may be used as well. However, it will only set the remote
address so that \funnm{send}() and \funnm{recv}() which do not have a remote
address parameter can be used.
\item We may also call \funnm{connect}() multiple times for a connection-less
service in which case each call sets the remote address again. If we pass a
null address for the protocol (typically a \texttt{struct sockaddr} with
\texttt{sa\_family} set to \texttt{AF\_UNSPEC}), the present remote address is
reset.
\item If the socket is set as non-blocking, see page \pageref{FCNTL},
\funnm{connect}() will not block while waiting to connect. It will return
\texttt{-1} with \texttt{errno} set to \texttt{EINPROGRESS} (= ``not possible to
create a connection right away'') and the connection request is stored in the
system queue to be processed. Until the connection is ready, subsequent
\funnm{connect}()s return \texttt{-1} with \texttt{errno} set to
\texttt{EALREADY}. However, using this way to test the connection readiness is
not the right approach as if the background connection attempt fails, the next
\funnm{connect}() tries to create a new connection and we would end up in a
never ending loop. The right approach is to use \funnm{select}() or
\funnm{poll}(), see pages \pageref{SELECT} and \pageref{POLL}. You can also
find there an example using a non-blocking \funnm{connect}(), page
\pageref{NON_BLOCKING_CONNECT}.
\end{itemize}
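
The recommended approach for a non-blocking \funnm{connect}(), i.e. waiting
with \funnm{poll}() and then asking for the result, can be sketched as follows.
This is not the example referenced above, just a minimal illustration: the
names \texttt{connect\_timeout}() and \texttt{demo\_connect}() are
hypothetical, and reading the outcome via the \texttt{SO\_ERROR} socket option
is the usual technique assumed here.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <netinet/in.h>
#include <poll.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/*
 * Hypothetical helper: connect fd to addr, waiting at most timeout_ms.
 * Returns 0 on success, -1 on failure with errno set. The socket is left
 * in non-blocking mode.
 */
int
connect_timeout(int fd, const struct sockaddr *addr, socklen_t len,
    int timeout_ms)
{
    struct pollfd pfd;
    socklen_t elen;
    int err = 0, flags, n;

    assert(addr != NULL);
    if ((flags = fcntl(fd, F_GETFL)) == -1 ||
        fcntl(fd, F_SETFL, flags | O_NONBLOCK) == -1)
        return (-1);
    if (connect(fd, addr, len) == 0)
        return (0);    /* connected right away */
    if (errno != EINPROGRESS)
        return (-1);
    /* The socket becomes writable once the attempt succeeds or fails. */
    pfd.fd = fd;
    pfd.events = POLLOUT;
    if ((n = poll(&pfd, 1, timeout_ms)) == -1)
        return (-1);
    if (n == 0) {
        errno = ETIMEDOUT;
        return (-1);
    }
    /* SO_ERROR tells which of the two happened. */
    elen = sizeof (err);
    if (getsockopt(fd, SOL_SOCKET, SO_ERROR, &err, &elen) == -1)
        return (-1);
    if (err != 0) {
        errno = err;
        return (-1);
    }
    return (0);
}

/* Hypothetical demo: connect to a loopback port that may or may not listen. */
int
demo_connect(int listening)
{
    struct sockaddr_in sa;
    socklen_t len = sizeof (sa);
    int lfd, c, ret = -1;

    lfd = socket(AF_INET, SOCK_STREAM, 0);
    c = socket(AF_INET, SOCK_STREAM, 0);
    memset(&sa, 0, sizeof (sa));
    sa.sin_family = AF_INET;
    sa.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    if (lfd != -1 && c != -1 &&
        bind(lfd, (struct sockaddr *)&sa, sizeof (sa)) == 0 &&
        (!listening || listen(lfd, 1) == 0) &&
        getsockname(lfd, (struct sockaddr *)&sa, &len) == 0)
        ret = connect_timeout(c, (struct sockaddr *)&sa, len, 1000);
    if (c != -1)
        close(c);
    if (lfd != -1)
        close(lfd);
    return (ret);
}
```

On failure the caller should close the socket and create a new one before
trying again, in line with the advice above.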
%%%%%
\begin{slide}
\sltitle{Connection-oriented (TCP), parallel service}
\input{img/tex/tcp_par.tex}
\end{slide}
\begin{itemize}
\item For each client connection the server accepts, a new process is created to
process it. After the connection is finished, the child exits. The parent
process can accept new connections meanwhile. So, multiple connections may be
served in parallel.
\item After \funnm{fork}()ing but before performing I/O on the connection, the
child may \funnm{exec}() -- that is how \texttt{inetd} works.
\item As you already know, by calling \funnm{waitpid}() you get rid of
zombies. This only works because the \texttt{WNOHANG} flag is used,
otherwise the parent would be blocked and the next connection would only be
accepted after one of the children exited.
Another way is to ignore \texttt{SIGCHLD}, in which case you
completely avoid the grave danger of the living dead accumulating (see page
\pageref{IGNORE_SIG_CHLD}). You could also catch the signal and call
one of the \funnm{wait}() calls from the handler itself, which is fine as the
call is \emph{async-signal-safe} (see page \pageref{ASYNCSIGNALSAFE}).
\end{itemize}
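
The accept loop of such a parallel server can be sketched roughly as follows.
The names \texttt{echo\_client}(), \texttt{serve\_parallel}() and
\texttt{demo\_parallel}() are hypothetical, the per-connection work (echoing
data back) is just a placeholder, and the demo runs the client in the same
process over loopback only to keep the example self-contained.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical per-connection work: echo everything back until EOF. */
static void
echo_client(int cfd)
{
    char buf[512];
    ssize_t n;

    while ((n = read(cfd, buf, sizeof (buf))) > 0)
        if (write(cfd, buf, (size_t)n) != n)
            break;
}

/*
 * Accept nconn connections on lfd and serve each one in its own child.
 * Returns the number of children created, or -1 on error.
 */
int
serve_parallel(int lfd, int nconn)
{
    int i, cfd;

    assert(nconn >= 0);
    for (i = 0; i < nconn; i++) {
        if ((cfd = accept(lfd, NULL, NULL)) == -1)
            return (-1);
        switch (fork()) {
        case -1:
            close(cfd);
            return (-1);
        case 0:            /* child: serve the client and exit */
            close(lfd);
            echo_client(cfd);
            _exit(0);
        default:           /* parent: the child has its own copy */
            close(cfd);
        }
        /* Reap finished children without blocking the accept loop. */
        while (waitpid(-1, NULL, WNOHANG) > 0)
            ;
    }
    return (i);
}

/* Hypothetical demo: one loopback client served by one child. */
int
demo_parallel(void)
{
    struct sockaddr_in sa;
    socklen_t len = sizeof (sa);
    char buf[2];
    int lfd, c, ok = 0;

    lfd = socket(AF_INET, SOCK_STREAM, 0);
    c = socket(AF_INET, SOCK_STREAM, 0);
    memset(&sa, 0, sizeof (sa));
    sa.sin_family = AF_INET;
    sa.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    if (lfd != -1 && c != -1 &&
        bind(lfd, (struct sockaddr *)&sa, sizeof (sa)) == 0 &&
        listen(lfd, 1) == 0 &&
        getsockname(lfd, (struct sockaddr *)&sa, &len) == 0 &&
        connect(c, (struct sockaddr *)&sa, len) == 0 &&
        serve_parallel(lfd, 1) == 1 &&
        write(c, "hi", 2) == 2 && read(c, buf, 2) == 2)
        ok = (memcmp(buf, "hi", 2) == 0);
    if (c != -1) {
        /* The child inherited c as well, so only shutdown() signals EOF. */
        shutdown(c, SHUT_WR);
        close(c);
    }
    if (lfd != -1)
        close(lfd);
    (void) waitpid(-1, NULL, 0);    /* reap the child */
    return (ok ? 0 : -1);
}
```

In a real server the clients are separate processes, so the
\funnm{shutdown}() workaround in the demo is not needed there.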
%%%%%
\begin{slide}
\sltitle{Connection-oriented service, parallel \texttt{accept()}}
\input{img/tex/tcp_prefork.tex}
\end{slide}
\begin{itemize}
\item After calling \funnm{bind}() and \funnm{listen}(), the parent creates
several children, each of which serves connection requests sequentially. The
kernel distributes the connection requests between the child processes waiting
in \funnm{accept}() in a way that is non-deterministic from the user's point of
view. The parent itself does not serve any connection but possibly
\funnm{wait}()s and creates new processes as needed.
\item The parent should monitor the number of existing children serving
connections and spawn new processes as necessary.
It is a good idea for child processes to voluntarily exit after serving a
certain number of connections to keep issues like memory leaks from affecting
the host as a whole. Apache web server can be configured to work like this.
\item Any approach of handling the connections on the server side covered so far
works with the same client. The way a client works does not depend on which
approach the server chooses.
\end{itemize}
\begin{slide}
\sltitle{Datagram services (UDP)}
\input{img/tex/udp.tex}
\end{slide}
\begin{itemize}
\item Note that there is no \funnm{listen}() call.
\item Both the client and server use the same functions, the client is the one
that sends the first datagram.
\item As in TCP, a client does not need \funnm{bind}() unless it requires a
specific local address to bind to. The server gets the client address from the
received datagram.
\item In contrast to connection-oriented service, the connection-less one has
less overhead and one can use the same socket to communicate with multiple
remote processes.
\item You can use \funnm{connect}() for UDP as well, see page
\pageref{CONNECT_FOR_UDP} for more information.
\end{itemize}
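
The server side of the picture can be sketched as a single receive and reply
step. This is a minimal sketch under stated assumptions: the names
\texttt{udp\_loopback\_socket}() and \texttt{udp\_echo\_once}() are
hypothetical, the server simply echoes the datagram back, and it binds to the
loopback address with a kernel-chosen port.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/*
 * Hypothetical helper: create a UDP socket bound to 127.0.0.1 and a port
 * chosen by the kernel. The resulting address is stored in *sa.
 * Returns the descriptor, or -1 on error.
 */
int
udp_loopback_socket(struct sockaddr_in *sa)
{
    socklen_t len = sizeof (*sa);
    int fd;

    assert(sa != NULL);
    if ((fd = socket(AF_INET, SOCK_DGRAM, 0)) == -1)
        return (-1);
    memset(sa, 0, sizeof (*sa));
    sa->sin_family = AF_INET;
    sa->sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    /* Note: no listen(), a bound datagram socket is ready to receive. */
    if (bind(fd, (struct sockaddr *)sa, sizeof (*sa)) == -1 ||
        getsockname(fd, (struct sockaddr *)sa, &len) == -1) {
        close(fd);
        return (-1);
    }
    return (fd);
}

/*
 * Hypothetical helper: receive one datagram and send it back to whoever
 * sent it. Returns the datagram length, or -1 on error.
 */
ssize_t
udp_echo_once(int fd)
{
    char buf[1500];
    struct sockaddr_storage peer;
    socklen_t plen = sizeof (peer);    /* must be set before the call */
    ssize_t n;

    n = recvfrom(fd, buf, sizeof (buf), 0, (struct sockaddr *)&peer, &plen);
    if (n == -1)
        return (-1);
    /* The sender's address comes from the datagram itself. */
    if (sendto(fd, buf, (size_t)n, 0, (struct sockaddr *)&peer, plen) != n)
        return (-1);
    return (n);
}
```

A client does not need \funnm{bind}(): the kernel assigns it a local port when
it sends its first datagram, and the reply arrives on that port.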
%%%%%
ifdef([[[NOSPELLCHECK]]], [[[
\pdfbookmark[1]{recvfrom}{recvfrom}
]]])
\begin{slide}
\sltitle{Receiving message: \texttt{recvfrom()}}
\setlength{\baselineskip}{0.8\baselineskip}
ifdef([[[NOSPELLCHECK]]], [[[
\funml{ssize\_t \funnm{recvfrom}(\=int \emph{sock}, void *\emph{buf},
size\_t \emph{l{}en}, \\\>int \emph{flg}, struct sockaddr *\emph{address},
\\\>socklen\_t *\emph{address\_len});}
]]])
\begin{itemize}
\item receives a message from \emph{\texttt{sock}}, stores it to
\emph{\texttt{buf}}
of size \emph{\texttt{l{}en}}, puts the sender's address to \emph{address},
and address length to \emph{\texttt{address\_len}}. Returns message length.
If the message does not fit \emph{l{}en}, extra data is discarded
(\texttt{SOCK\_STREAM} does not divide data, nothing is discarded).
\item flags \emph{\texttt{flg}} can be:
\begin{itemize}
\item \texttt{MSG\_PEEK} \dots{} message is considered not read, next
\texttt{recvfrom} will return it again
\item \texttt{MSG\_OOB} \dots{} reads urgent (out-of-band) data
\item \texttt{MSG\_WAITALL} \dots{} waits for the buffer to fill up
i.e. \emph{l{}en} bytes
\end{itemize}
\end{itemize}
\end{slide}
\begin{itemize}
\item Mainly for \texttt{SOCK\_DGRAM} sockets. Waits for the whole message and
never returns just a part of a datagram. Again, it is possible to set the
socket as non-blocking.
\item There does not seem to be a portable way to get the size of a
UDP datagram before reading it out of the kernel buffer. On Linux, the
\texttt{MSG\_TRUNC|MSG\_PEEK} flags can be used with \texttt{recvfrom} to get
the size. On other systems, the more generic \texttt{recvmsg} call can be used
to at least detect the truncation (the \texttt{MSG\_TRUNC} flag in the
\texttt{msg\_flags} member of the \texttt{msghdr} structure).
Depending on the application, using a large buffer (based on lower network layer
constraints) might be the answer (at the cost of wasting memory).
\item \texttt{address\_len} \emsl{must} be initialized with the size of the
\texttt{address} buffer if \texttt{address} is not \texttt{NULL}. A
\texttt{NULL} address expresses that the caller is not interested in the
remote's address -- however that is usually not the case when working with
datagrams.
\item Instead of using \texttt{recvfrom} it is possible to use
\texttt{recvmsg} which is more generic.
\item If \texttt{connect} was used then \texttt{recv} can be used instead of
\texttt{recvfrom}.
\item After successful return from \texttt{recvfrom} it is possible to reuse
\texttt{address} and the value of \texttt{address\_len} unchanged for a
\texttt{sendto} call.
\item Like \texttt{sendto}, \texttt{recvfrom} can also be used with a
connection-oriented service (TCP). That said, the remote's address of a TCP
connection is better obtained via \texttt{getpeername}, see page
\pageref{GETPEERNAME}.
\item Example: \hlabel{UDP_SERVER_C} \example{udp/udp-server.c}
\end{itemize}
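
The notes above can be illustrated with a minimal sketch of one step of a UDP
echo server. The helper name \texttt{echo\_once} and the 1024 byte buffer are
assumptions made for this illustration only, they are not taken from
\texttt{udp/udp-server.c}. The sketch shows \texttt{address\_len} being
initialized before the call and the sender's address being reused for the
reply:
\begin{verbatim}
#include <sys/types.h>
#include <sys/socket.h>

/*
 * Receive one datagram on sock and send it back to its sender.
 * Returns the number of bytes echoed or -1 on error.
 * (Hypothetical helper, for illustration only.)
 */
ssize_t
echo_once(int sock)
{
    char buf[1024];    /* longer datagrams get truncated */
    struct sockaddr_storage addr;
    socklen_t alen = sizeof (addr);    /* must be set before the call */
    ssize_t n;

    n = recvfrom(sock, buf, sizeof (buf), 0,
        (struct sockaddr *)&addr, &alen);
    if (n == -1)
        return (-1);

    /* addr and alen now describe the sender, reuse them for the reply. */
    return (sendto(sock, buf, (size_t)n, 0,
        (struct sockaddr *)&addr, alen));
}
\end{verbatim}
Calling \texttt{echo\_once(fd)} in a loop on a bound \texttt{SOCK\_DGRAM}
socket would give a simple echo service; datagrams longer than the buffer would
be echoed truncated.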
%%%%%
ifdef([[[NOSPELLCHECK]]], [[[
\pdfbookmark[1]{sendto}{sendto}
]]])
\begin{slide}
\sltitle{Sending message: \texttt{sendto()}}
ifdef([[[NOSPELLCHECK]]], [[[
\funml{ssize\_t \funnm{sendto}(\=int \emph{socket}, const void *\emph{msg},
\\\>size\_t \emph{l{}en}, int \emph{flags},
\\\>const struct sockaddr *\emph{addr}, socklen\_t \emph{addr\_len});}
]]])
\begin{itemize}
\item sends a message \emph{\texttt{msg}} of \emph{l{}en} bytes via
\emph{socket} to address \emph{\texttt{addr}} (of \emph{\texttt{addr\_len}}
length).
\item \emph{flags} can carry:
\begin{itemize}
\item \texttt{MSG\_EOR} \dots{} finish a record (if supported by the
protocol)
\item \texttt{MSG\_OOB} \dots{} send urgent (out-of-band) data
\end{itemize}
\end{itemize}
\end{slide}
\begin{itemize}
\item Used mainly for \texttt{SOCK\_DGRAM} sockets, because in such a situation
we only have a socket representing our side of the connection; see the note
for \texttt{accept}. The remote address has to be specified, which cannot be
done with \texttt{write}. Moreover, for \emsl{datagram} service the data sent
is treated as a whole, i.e. either it is accepted completely or the call
will block -- there is no partial write. Like with file descriptors, it is
possible to set the socket as non-blocking, see page \pageref{FCNTL}.
\item Instead of \texttt{sendto} the more generic function \texttt{sendmsg}
can be used.
\item If \texttt{connect} was used then \texttt{send} can be used instead of
\texttt{sendto}.
\item Successful return from either \emsl{does not mean successful delivery of
the message to the remote side}, but only insertion of the data into a local
buffer from which it is yet to be sent out.
\item It is possible to use \texttt{sendto} for stream service, however the
address will be ignored. The only reason not to use \texttt{write} would be to
use flags. In that case though it would be simpler to use \texttt{send}.
\item Example:\hlabel{UDP_CLIENT_C} \example{udp/udp-client.c}.
\end{itemize}
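
Sending one datagram to a given IPv4 address and port could be sketched as
follows. The helper name \texttt{send\_text} and its parameters are assumptions
made for this illustration, the code is not taken from
\texttt{udp/udp-client.c}. The address conversion uses \texttt{inet\_pton},
see page \pageref{IPv4_IPv6_ADDRESSES}:
\begin{verbatim}
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>
#include <string.h>

/*
 * Send the string text as one datagram to IPv4 address ip (dotted
 * notation) and the given port. Returns what sendto() returned,
 * or -1 if ip is not a valid IPv4 address.
 * (Hypothetical helper, for illustration only.)
 */
ssize_t
send_text(int sock, const char *ip, unsigned short port, const char *text)
{
    struct sockaddr_in dst;

    memset(&dst, 0, sizeof (dst));
    dst.sin_family = AF_INET;
    dst.sin_port = htons(port);    /* port goes in network byte order */
    if (inet_pton(AF_INET, ip, &dst.sin_addr) != 1)
        return (-1);

    /* The datagram is handed over as a whole, no partial write. */
    return (sendto(sock, text, strlen(text), 0,
        (struct sockaddr *)&dst, sizeof (dst)));
}
\end{verbatim}
A successful return only means the datagram was accepted locally; whether
anybody receives it is not known to the sender.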
%%%%%
\pdfbookmark[1]{close}{closesocket}
\begin{slide}
\sltitle{Closing socket: \texttt{close()}}
\setlength{\baselineskip}{0.8\baselineskip}
\texttt{int \funnm{close}(int \emph{sock});}
\begin{itemize}
\item close descriptor \emph{sock}, after the last descriptor is closed, close
the socket in kernel
\item for the \texttt{SOCK\_STREAM} socket,
\texttt{SO\_LINGER} flag is important (default is \texttt{.l\_onoff~==~0}, use
\funnm{setsockopt}() with \texttt{struct linger} to change that).
\begin{itemize}
\item \texttt{.l\_onoff~==~0} \dots{} \funnm{close}() returns but system
tries to transfer rest of the data
\item \texttt{.l\_onoff~==~1~\&\&~.l\_linger~!=~0} \dots{} system tries to
transfer data until timeout \texttt{l\_linger} expires (in seconds), if it
fails, return error, otherwise return OK after transferring data.
\item \texttt{.l\_onoff~==~1~\&\&~.l\_linger~==~0} \dots{} reset the
connection
\end{itemize}
\end{itemize}
\end{slide}
\hlabel{CLOSESOCKET}
\begin{itemize}
\item Once closed, a TCP socket can remain in a transitory state which is
defined in the TCP protocol for connection closing. Before the socket is
completely destroyed, it is not possible to bind another socket to the same
port, unless this behavior was overridden with the \texttt{SO\_REUSEADDR}
option using the \texttt{setsockopt} function, see page \pageref{SETSOCKOPT}.
\item Connection reset is abnormal connection termination. In case of TCP
a packet with the \texttt{RST} flag is used for such termination. While the
remote side detects normal termination as end of file when reading, the reset
will lead to the \texttt{ECONNRESET} error. Example: \example{tcp/linger.c}.
\end{itemize}
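
The last case from the slide, connection reset on \funnm{close}(), could be
requested as in the following sketch. The helper name
\texttt{set\_reset\_on\_close} is made up for this illustration and
\texttt{tcp/linger.c} may do it differently:
\begin{verbatim}
#include <sys/socket.h>

/*
 * Arrange for close() on sock to reset the connection instead of
 * performing the normal termination. Returns 0 on success, -1 on error.
 * (Hypothetical helper, for illustration only.)
 */
int
set_reset_on_close(int sock)
{
    struct linger lg;

    lg.l_onoff = 1;     /* lingering is on ... */
    lg.l_linger = 0;    /* ... with zero timeout, which means reset */
    return (setsockopt(sock, SOL_SOCKET, SO_LINGER, &lg, sizeof (lg)));
}
\end{verbatim}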
%%%%%
\pdfbookmark[1]{shutdown}{shutdown}
\begin{slide}
\sltitle{Shut down part of a connection: \texttt{shutdown()}}
\texttt{int \funnm{shutdown}(int \emph{socket}, int \emph{how});}
\begin{itemize}
\item shuts down a socket but does not close the descriptor, \emph{how} can be:
\begin{itemize}
\item \texttt{SHUT\_RD} \dots{} shut it down for reading
\item \texttt{SHUT\_WR} \dots{} for writing
\item \texttt{SHUT\_RDWR} \dots{} for both
\end{itemize}
\end{itemize}
\end{slide}
\begin{itemize}
\item After using \texttt{shutdown} it is still necessary to close the
descriptor using \texttt{close}.
\item During normal TCP connection termination each side signals that no
subsequent writes will follow. This holds for either \texttt{close} or
\texttt{shutdown(fd, SHUT\_RDWR)}. When using
\texttt{shutdown(fd, SHUT\_WR)} it is still possible to read from the socket.
The remote side will get \texttt{EOF} while reading, however it can still write.
\item If \texttt{connect} is used on a datagram socket then calling
\texttt{shutdown} on the socket will make subsequent send and/or receive calls
on that socket fail.
\end{itemize}
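
The half-close described above can be sketched as follows: one side announces
that it is done writing and then collects whatever the remote side still sends.
The function name \texttt{finish\_and\_drain} is an assumption made for this
illustration:
\begin{verbatim}
#include <sys/types.h>
#include <sys/socket.h>
#include <unistd.h>

/*
 * Tell the peer we are done writing, then read everything it still
 * sends until it finishes as well (or buf is full). Returns the number
 * of bytes stored in buf or -1 on error. The descriptor stays open,
 * the caller still has to close() it.
 * (Hypothetical helper, for illustration only.)
 */
ssize_t
finish_and_drain(int sock, char *buf, size_t size)
{
    size_t total = 0;
    ssize_t n = 0;

    if (shutdown(sock, SHUT_WR) == -1)
        return (-1);

    /* read() returning 0 means the peer has finished writing too. */
    while (total < size &&
        (n = read(sock, buf + total, size - total)) > 0)
        total += (size_t)n;

    return (n == -1 ? -1 : (ssize_t)total);
}
\end{verbatim}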
%%%%%
ifdef([[[NOSPELLCHECK]]], [[[
\pdfbookmark[1]{inet\_pton, inet\_ntop}{ipaddrfncs}
]]])
\begin{slide}
\sltitle{Working with IPv4 and IPv6 addresses}
\begin{itemize}
\item binary representation of IP address is hard to read
\item string representation of IP address cannot be used when working with
\texttt{sockaddr} structures
\end{itemize}
ifdef([[[NOSPELLCHECK]]], [[[
\texttt{int \funnm{inet\_pton}(int \emph{af}, const char *\emph{src},
void *\emph{dst});}
]]])
\begin{itemize}
\item converts string to binary representation, i.e. something usable for
\texttt{in\_addr} or \texttt{in6\_addr} members of \texttt{sockaddr} structures
\item returns 1 (OK), 0 (wrong address) or -1 (and sets \texttt{errno})
\end{itemize}
ifdef([[[NOSPELLCHECK]]], [[[
\funml{const char *\funnm{inet\_ntop}(\=int \emph{af}, const void *\emph{src},
\\\>char *\emph{dst}, socklen\_t \emph{size});}
]]])
\begin{itemize}
\item counterpart to \texttt{inet\_pton}; returns \emph{\texttt{dst}} or
\texttt{NULL} (and sets \texttt{errno})
\end{itemize}
\begin{itemize}
\item for both functions \texttt{af} is either \texttt{AF\_INET} or
\texttt{AF\_INET6}
\end{itemize}
\end{slide}
\hlabel{IPv4_IPv6_ADDRESSES}
\begin{itemize}
\item The functions are declared in \texttt{arpa/inet.h}.
\item \texttt{inet\_pton} returns 1 if the conversion successfully happened,
0 if given string is not an address or -1 if \emph{\texttt{af}} is not supported
(\texttt{EAFNOSUPPORT}). \texttt{inet\_ntop} returns \texttt{dst} if everything
is OK otherwise returns \texttt{NULL} with \texttt{errno} set.
\item Addresses and ports in binary form are stored as big endian.
\item For \texttt{inet\_pton}, \texttt{dst} has to be sufficiently sized because
there is no parameter specifying the size. This is not a problem since according
to the value of \texttt{af} the appropriate address structure can be passed in.
For \texttt{inet\_ntop}, \texttt{dst} is a character array; for maximal lengths
of address strings, 2 macros can be used --
\texttt{INET\_ADDRSTR\-LEN} (16) or \texttt{INET6\_ADDRSTRLEN} (46).
These values include space for the terminating \texttt{\bs{}0}.
\item \texttt{size} is the size of the \texttt{dst} buffer of
\texttt{inet\_ntop}. If not sufficient, the call will fail and \texttt{ENOSPC}
will be set.
\item \texttt{n} stands for \texttt{network}, \texttt{p} stands for
\texttt{presentation}.
\item In the past \texttt{inet\_aton} and \texttt{inet\_ntoa} (\texttt{a} as
\texttt{ascii}) were used for IPv4 addresses. Thanks to the functions above
these are now legacy. All these calls are usually documented in the
\texttt{inet} man page.
\item \hlabel{ADDRESSES} Do realize that by using these functions it is only
possible to convert one address family per call, either IPv4 or IPv6. When the
input can be either of those, try one and if that fails, fall back to the other
one. Example: \example{resolving/addresses.c}.
\end{itemize}
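
The fallback between address families mentioned above might look like the
following sketch. The function name \texttt{parse\_ip} is hypothetical and the
code is not taken from \texttt{resolving/addresses.c}:
\begin{verbatim}
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>

/*
 * Convert str to binary form, trying IPv4 first and IPv6 second.
 * dst must have room for struct in6_addr. Returns AF_INET or AF_INET6
 * according to what matched, or -1 if str is not an address.
 * (Hypothetical helper, for illustration only.)
 */
int
parse_ip(const char *str, void *dst)
{
    if (inet_pton(AF_INET, str, dst) == 1)
        return (AF_INET);
    if (inet_pton(AF_INET6, str, dst) == 1)
        return (AF_INET6);
    return (-1);
}
\end{verbatim}
The returned family could then be passed to \texttt{inet\_ntop} together with
a buffer of \texttt{INET6\_ADDRSTRLEN} bytes to get the string form back.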
%%%%%
ifdef([[[NOSPELLCHECK]]], [[[
\pdfbookmark[1]{setsockopt, getsockopt, getsockname, getpeername}{socketfncs}
]]])
\begin{slide}
\sltitle{More socket functions}
ifdef([[[NOSPELLCHECK]]], [[[
\funml{int \funnm{setsockopt}(\=int \emph{socket}, int \emph{level},
int \emph{opt\_name}, \\\>const void *\emph{opt\_value},
socklen\_t \emph{option\_len});}
]]])
\begin{itemize}
\item sets socket parameters
\end{itemize}
ifdef([[[NOSPELLCHECK]]], [[[
\funml{int \funnm{getsockopt}(\=int \emph{socket}, int \emph{level},
int \emph{opt\_name},\\\>void *\emph{opt\_value},
socklen\_t *\emph{option\_len});}
]]])
\begin{itemize}
\item reads socket parameters
\end{itemize}
ifdef([[[NOSPELLCHECK]]], [[[
\funml{int \funnm{getsockname}(\=int \emph{socket},
struct sockaddr *\emph{address}, \\\>socklen\_t *\emph{address\_len});}
]]])
\begin{itemize}
\item get local socket address
\end{itemize}
ifdef([[[NOSPELLCHECK]]], [[[
\funml{int \funnm{getpeername}(\=int \emph{socket},
struct sockaddr *\emph{address}, \\\>socklen\_t *\emph{address\_len});}
]]])
\begin{itemize}
\item get address of the remote end
\end{itemize}
\end{slide}
\hlabel{SETSOCKOPT}
\hlabel{GETPEERNAME}
\hlabel{GETSOCKOPT}
\begin{itemize}
\item The \texttt{level} value for \texttt{getsockopt} and \texttt{setsockopt}
is usually \verb#SOL_SOCKET#. For \texttt{get\-sock\-opt},
the \texttt{option\_len} value \emsl{must} be initialized to
the size of \texttt{opt\_value}.
\item \texttt{getsockname} is used to determine the (local!) address assigned
to the socket by the kernel when the \texttt{bind} syscall was not used.
\item ifdef([[[NOSPELLCHECK]]],
[[[\verb#getsockopt(sock, SOL_SOCKET, SO_ERROR, &val, &len)#]]]) returns
(and erases) error indication for given socket. This is mainly useful to
get result of non-blocking \texttt{connect}, see page \pageref{CONNECT}.
\item When using ifdef([[[NOSPELLCHECK]]], [[[\verb#SO_REUSEADDR#]]]) it is
possible to immediately create a new server (i.e. to successfully call
\texttt{socket},
\texttt{bind}, \texttt{listen} and \texttt{accept}) listening on the address and
port previously used even though there are still TCP connections in their final
stage from the previous instance of the server:
\begin{verbatim}
int opt = 1;
setsockopt(fd, SOL_SOCKET, SO_REUSEADDR, &opt, sizeof(opt));
\end{verbatim}
\item See \verb#SO_REUSEADDR# being used in example \example{tcp/reuseaddr.c}.
For the demonstration it is necessary to establish at least one connection,
otherwise there is nothing to wait for and repeated \texttt{bind} will succeed
anyway.
\end{itemize}
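
Reading an option value can be sketched as follows, using the
\texttt{SO\_ERROR} case mentioned above. The helper name
\texttt{pending\_error} is an assumption made for this illustration; note the
length variable being initialized before the call:
\begin{verbatim}
#include <sys/socket.h>

/*
 * Return (and clear) the pending error of sock: 0 if there is none,
 * otherwise an errno value. Returns -1 if getsockopt() itself failed.
 * (Hypothetical helper, for illustration only.)
 */
int
pending_error(int sock)
{
    int val;
    socklen_t vlen = sizeof (val);    /* must be initialized */

    if (getsockopt(sock, SOL_SOCKET, SO_ERROR, &val, &vlen) == -1)
        return (-1);
    return (val);
}
\end{verbatim}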
%%%%%
ifdef([[[NOSPELLCHECK]]], [[[
\pdfbookmark[1]{htonl, ntohl, htons, ntohs}{byteorderfncs}
]]])
\begin{slide}
\sltitle{Byte order}
\begin{itemize}
\item network services can use byte ordering that differs from the native
one on the system. Conversion functions (macros):
\begin{itemize}
ifdef([[[NOSPELLCHECK]]], [[[
\item\texttt{uint32\_t \funnm{htonl}(uint32\_t \emph{hostlong});}\\
]]]) host $\rightarrow$ network, 32 bits
ifdef([[[NOSPELLCHECK]]], [[[
\item\texttt{uint16\_t \funnm{htons}(uint16\_t \emph{hostshort});}\\
]]]) host $\rightarrow$ network, 16 bits
ifdef([[[NOSPELLCHECK]]], [[[
\item \texttt{uint32\_t \funnm{ntohl}(uint32\_t \emph{netlong});}\\
]]]) network $\rightarrow$ host, 32 bits
ifdef([[[NOSPELLCHECK]]], [[[
\item \texttt{uint16\_t \funnm{ntohs}(uint16\_t \emph{netshort});}\\
]]]) network $\rightarrow$ host, 16 bits
\end{itemize}
\item network byte order is big-endian, i.e. most significant byte first.
Both network addresses and port numbers are multibyte values.
\end{itemize}
\end{slide}
\hlabel{HTON}
\begin{itemize}
\item If the local system uses the network byte order natively, the functions
become no-ops.
\item A simple and sufficient test is to run your program in 2 instances
against each other (if possible) on two systems with different endianness.
\end{itemize}
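
As an illustration, the following sketch stores a port number into a two byte
buffer in network byte order and reads it back. The function names
\texttt{put\_port} and \texttt{get\_port} are hypothetical:
\begin{verbatim}
#include <arpa/inet.h>
#include <stdint.h>
#include <string.h>

/* Store port into wire[0..1] in network byte order. */
void
put_port(unsigned char wire[2], uint16_t port)
{
    uint16_t net = htons(port);

    memcpy(wire, &net, sizeof (net));
}

/* Read a port stored in network byte order, return it in host order. */
uint16_t
get_port(const unsigned char wire[2])
{
    uint16_t net;

    memcpy(&net, wire, sizeof (net));
    return (ntohs(net));
}
\end{verbatim}
With this, \texttt{put\_port(w, 8080)} would store the bytes \texttt{0x1f} and
\texttt{0x90} in that order regardless of the byte order of the machine.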
%%%%%
ifdef([[[NOSPELLCHECK]]], [[[
\pdfbookmark[1]{getprotobyname, getservbyname}{protonumfncs}
]]])
\begin{slide}
\sltitle{Protocol and port numbers}
ifdef([[[NOSPELLCHECK]]], [[[
\texttt{struct protoent *\funnm{getprotobyname}(const char *\emph{name});}
]]])
\begin{itemize}
\item for the protocol \emph{name} returns its number in \texttt{p\_proto}
(e.g. for \texttt{"tcp"} returns 6).
\item protocol numbers are stored in the \texttt{/etc/protocols} file.
\end{itemize}
ifdef([[[NOSPELLCHECK]]], [[[
\funml{struct servent *\funnm{getservbyname}(\=const char *\emph{name},
\\\>const char *\emph{proto});}
]]])
\begin{itemize}
\item for service name \texttt{name} and protocol name \texttt{proto}
returns port number in \texttt{s\_port}.
\item port numbers are stored in the \texttt{/etc/services} file.
\end{itemize}
The functions return \texttt{NULL} if a matching entry cannot be found in the
database.
\end{slide}
\begin{itemize}
\item The result of \funnm{getprotobyname} is handy for calling \texttt{socket},
the result of \funnm{getservbyname} is for \texttt{bind}.
\item Next to \texttt{getservbyname} there is also \texttt{getservbyport}
that finds a service entry according to port number (in network byte order!)
and the function \texttt{getservent} and others for entry traversal.
\item All these functions search only "official" lists of services and
protocols, which are located in the relevant naming databases (see slide on
page \pageref{name_service_switch}). Note that the slide mentions the
\texttt{/etc} files for simplification.
\item These files define the mapping for names and numbers for standard
protocols and services.
\item Keep in mind that the protocol here is the upper layer protocol as
specified in the IP packet header (e.g. TCP, UDP, OSPF, GRE etc., see pages 11
and 14 in RFC~791), not HTTP, SSH, telnet or FTP -- these are \emph{services},
represented by port numbers.
\item \hlabel{GETBYNAME} Example: \example{resolving/getbyname.c}
\end{itemize}
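
Looking up a port number for a service name could be sketched as follows.
The helper name \texttt{service\_port} is an assumption made for this
illustration and the code is not taken from \texttt{resolving/getbyname.c}.
Note that \texttt{s\_port} holds the port in network byte order:
\begin{verbatim}
#include <netdb.h>
#include <arpa/inet.h>
#include <stdint.h>
#include <stddef.h>

/*
 * Look up the port of service name for protocol proto (e.g. "tcp").
 * Returns the port in host byte order or -1 if there is no such entry.
 * (Hypothetical helper, for illustration only.)
 */
int
service_port(const char *name, const char *proto)
{
    struct servent *se = getservbyname(name, proto);

    if (se == NULL)
        return (-1);
    /* s_port is in network byte order, convert it for the caller. */
    return (ntohs((uint16_t)se->s_port));
}
\end{verbatim}
On a system whose services database contains the usual entries,
\texttt{service\_port("http", "tcp")} would be expected to give 80. When the
value is to be stored in \texttt{sin\_port}, \texttt{s\_port} can be used
directly without conversion.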
%%%%%
ifdef([[[NOSPELLCHECK]]], [[[
\pdfbookmark[1]{getaddrinfo}{getaddrinfo}
]]])
\hlabel{GETADDRINF}