The Journey of A Packet Through The Linux Network Stack

Linux Network Fundamentals and Applications

Andrew Yongjoon Kong
Cell lead, Architect
Kakaocorp
Contents

 Linux networking fundamentals
 Linux networking applications for VM
 Linux networking applications for container
 Kakao’s applications

Krnet 2016 kakaocorp


Networking Data Structures

 The most important structures in the Linux kernel:
 sk_buff (defined in include/linux/skbuff.h)
 net_device (defined in include/linux/netdevice.h)

Linux Network Stack
User space:
  Network Applications
Kernel space:
  Socket Interface: BSD Sockets, INET Sockets
  Protocol Layers: TCP, UDP, IP, ARP
  Link Layers: PPP, SLIP, Ethernet

Real View of Network transfer

Simplifying Receiving a Packet

 Network card
 receives a frame
 issues an interrupt
 Driver
 handles the interrupt
•Frame → RAM
•Allocates sk_buff (called skb)
•Frame → skb

Network Fundamental 1:
sk_buff (skbuff.h)
 Generic buffer for all packets
 sk_buff represents data and headers
 sk_buff instances almost always appear as “skb” in the kernel code

sk_buff’s 3 unions:
Transport Header: Layer 4 (TCP/UDP/ICMP)
Network Header: Layer 3 (IPv4/v6/ARP)
MAC Header: Layer 2 (MAC)

sk_buff (cont.)

struct sk_buff *next
struct sk_buff *prev
struct sk_buff_head *list
struct sock *sk
union {tcphdr; udphdr; …} h;      Transport Header
union {iph; ipv6h; arph; …} nh;   Network Header
union {raw} mac;                  MAC Header
….                                DATA
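The three header fields can be pictured as positions inside one contiguous buffer. The following is a minimal userspace sketch in Python, not kernel code: it computes where the MAC, network and transport headers start in a raw Ethernet frame. The names `locate_headers` and `build_sample_frame`, and the addresses and ports in the sample frame, are made up for illustration, and only IPv4 over untagged Ethernet is handled.

```python
import struct

ETH_HLEN = 14  # dst MAC (6) + src MAC (6) + EtherType (2)
ETH_P_IP = 0x0800


def locate_headers(frame: bytes) -> dict:
    """Return byte offsets of the MAC, network and transport headers."""
    ethertype = struct.unpack_from("!H", frame, 12)[0]
    if ethertype != ETH_P_IP:
        raise ValueError("this sketch only handles IPv4 frames")
    ihl = frame[ETH_HLEN] & 0x0F  # IPv4 header length in 32-bit words
    return {
        "mac": 0,
        "network": ETH_HLEN,
        "transport": ETH_HLEN + ihl * 4,
    }


def build_sample_frame() -> bytes:
    """Hypothetical frame: Ethernet + 20-byte IPv4 header + 8-byte UDP header."""
    eth = bytes(6) + bytes(6) + struct.pack("!H", ETH_P_IP)
    ip = struct.pack("!BBHHHBBH4s4s", 0x45, 0, 28, 0, 0, 64, 17, 0,
                     bytes([10, 0, 0, 1]), bytes([10, 0, 0, 2]))
    udp = struct.pack("!HHHH", 1234, 53, 8, 0)
    return eth + ip + udp
```

Calling `locate_headers(build_sample_frame())` gives the three offsets that the h, nh and mac fields would point at for this frame.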

sk_buff (cont.)
 struct dst_entry *dst – the route for this sk_buff; this route is determined by the routing subsystem.
 It has 2 important function pointers:
 int (*input)(struct sk_buff*);
 int (*output)(struct sk_buff*);
 input() can be assigned to: ip_local_deliver, ip_forward, ip_mr_input, ip_error or dst_discard_in.
 output() can be assigned to: ip_output, ip_mc_output, ip_rt_bug, or dst_discard_out.
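The idea that the routing decision is stored as a pair of function pointers can be sketched in a few lines of Python. This is a simplified model under stated assumptions, not the kernel's logic: the `DstEntry` class, the `route` helper, and the dictionary-based skb are hypothetical, and the handler functions only return a label instead of processing a packet.

```python
from dataclasses import dataclass
from typing import Callable


def ip_local_deliver(skb: dict) -> str:
    return "local_deliver"


def ip_forward(skb: dict) -> str:
    return "forward"


def ip_output(skb: dict) -> str:
    return "output"


def dst_discard(skb: dict) -> str:
    return "discard"


@dataclass
class DstEntry:
    """Toy stand-in for struct dst_entry: two callables chosen by routing."""
    input: Callable[[dict], str]
    output: Callable[[dict], str]


def route(skb: dict, local_addrs: set, forwarding: bool) -> DstEntry:
    """Pick the input/output handlers once, when the route is resolved."""
    if skb["dst"] in local_addrs:
        return DstEntry(input=ip_local_deliver, output=ip_output)
    if forwarding:
        return DstEntry(input=ip_forward, output=ip_output)
    return DstEntry(input=dst_discard, output=dst_discard)
```

After `dst = route(skb, ...)`, later code calls `dst.input(skb)` without knowing which function was selected, which is the point of the indirection.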

sk_buff (cont.)

“Understanding Linux Network Internals”, Christian Benvenuti
Network Fundamental 2:
net_device
 net_device represents a network interface card.
 It does not always represent a physical device
 There are cases when we work with virtual devices.
 For example, bonding or VLAN
 Many times this is implemented using the private data of the device (the void *priv member of net_device);

net_device (cont.)
 unsigned int mtu – Maximum Transmission Unit: the maximum size of frame the device can handle.
 Each protocol has an MTU of its own; the default is 1500 for Ethernet.
 unsigned int flags (which you see or set using the ifconfig utility): for example, RUNNING or NOARP.
 unsigned char dev_addr[MAX_ADDR_LEN]: the MAC address of the device (6 bytes for Ethernet).
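The flags field is a bitmask, which is why ifconfig can print it as a list of names. A small sketch of that decoding follows. The bit values are the IFF_* constants from the Linux if.h header as commonly documented; only a subset is listed here, and the function name `decode_flags` is an illustrative choice.

```python
# Subset of the IFF_* interface flag bits (values as in linux/if.h).
IFF_FLAGS = {
    0x0001: "UP",
    0x0002: "BROADCAST",
    0x0008: "LOOPBACK",
    0x0040: "RUNNING",
    0x0080: "NOARP",
    0x0100: "PROMISC",
    0x1000: "MULTICAST",
}


def decode_flags(flags: int) -> list:
    """Return the names of the flag bits set in `flags`, lowest bit first."""
    return [name for bit, name in sorted(IFF_FLAGS.items()) if flags & bit]
```

For example, a flags value of 0x1043 decodes to UP, BROADCAST, RUNNING and MULTICAST, which matches what a typical running Ethernet interface shows.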

Receiving a Packet (Device)
 Driver (cont.)
 calls device-independent core/dev.c:netif_rx(skb)
•puts skb into CPU queue
•issues a “soft” interrupt
 CPU
 calls core/dev.c:net_rx_action()
•removes skb from CPU queue
•passes to network layer, e.g. ip/arp
•In this case: IPv4 ipv4/ip_input.c:ip_rcv()
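The hand-off between the driver and the softirq handler is a producer/consumer queue. The sketch below models it in Python under heavy simplification: `SoftnetSim` is a hypothetical class, the per-CPU queue is a plain deque, the "soft interrupt" is only a boolean, and protocol handlers are looked up by EtherType in a dictionary.

```python
from collections import deque


class SoftnetSim:
    """Toy model of the per-CPU backlog used by netif_rx/net_rx_action."""

    def __init__(self, handlers: dict):
        self.backlog = deque()
        self.softirq_pending = False
        self.handlers = handlers  # EtherType -> callable(skb)

    def netif_rx(self, skb: dict) -> None:
        # Driver side: queue the skb and flag the soft interrupt.
        self.backlog.append(skb)
        self.softirq_pending = True

    def net_rx_action(self) -> list:
        # Softirq side: drain the queue and hand each skb to its protocol.
        results = []
        while self.backlog:
            skb = self.backlog.popleft()
            handler = self.handlers.get(skb["protocol"])
            if handler is not None:
                results.append(handler(skb))
        self.softirq_pending = False
        return results
```

A caller would register something like `{0x0800: ip_rcv}` as the handler table, call `netif_rx` once per received frame, and later call `net_rx_action` to process everything queued so far.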

Receiving a Packet (IP)
 ip_input.c:ip_rcv()
checks
•Length >= IP Header (20 bytes)
•Version == 4
•Checksum
•Check length again

calls ip_rcv_finish()
calls route.c:ip_route_input()
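The sanity checks listed for ip_rcv() can be written out as a short userspace function. This is a sketch of the same four checks, not a port of the kernel code: `ip_rcv_checks`, `ip_checksum` and `make_header` are illustrative names, and the sample addresses are arbitrary.

```python
import struct


def ip_checksum(header: bytes) -> int:
    """One's-complement sum of 16-bit words, as used by the IPv4 header."""
    total = sum(struct.unpack("!%dH" % (len(header) // 2), header))
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def ip_rcv_checks(packet: bytes) -> bool:
    """Return True if the packet passes the basic IPv4 header checks."""
    if len(packet) < 20:                      # length >= minimal IP header
        return False
    version, ihl = packet[0] >> 4, packet[0] & 0x0F
    if version != 4 or ihl < 5:               # version == 4
        return False
    hdr_len = ihl * 4
    if len(packet) < hdr_len:
        return False
    if ip_checksum(packet[:hdr_len]) != 0:    # checksum over a valid header is 0
        return False
    total_len = struct.unpack_from("!H", packet, 2)[0]
    return hdr_len <= total_len <= len(packet)  # check length again


def make_header(ttl: int = 64) -> bytes:
    """Build a 20-byte IPv4 header with a correct checksum (sample values)."""
    hdr = bytearray(struct.pack("!BBHHHBBH4s4s", 0x45, 0, 20, 0, 0, ttl, 17, 0,
                                bytes([10, 0, 0, 1]), bytes([10, 0, 0, 2])))
    struct.pack_into("!H", hdr, 10, ip_checksum(bytes(hdr)))
    return bytes(hdr)
```

A header built by `make_header()` passes, while flipping any byte of it makes the checksum test fail.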

Receiving a Packet (routing)
 ipv4/route.c:ip_route_input()
Destination == local?
YES → ip_input.c:ip_local_deliver()
NO → Calls ip_route_input_slow()
 ipv4/route.c:ip_route_input_slow()
Can forward?
•Forwarding enabled?
•Know route?
NO → Sends ICMP

Forwarding a Packet

 Forwarding is handled on a per-device basis
 The receiving device usually does the forwarding
 Enable/Disable forwarding in Linux:
 /proc file system ↔ Kernel
 read/write normally (in most cases)
•/proc/sys/net/ipv4/conf/<device>/forwarding
•/proc/sys/net/ipv4/conf/default/forwarding
•/proc/sys/net/ipv4/ip_forward
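Because these switches are ordinary files, checking them needs nothing more than a file read. The sketch below assumes a Linux host where the global switch is exposed as /proc/sys/net/ipv4/ip_forward (the name used on current kernels); the helper names are illustrative, and the path is a parameter so the function can be pointed at any of the per-device files as well.

```python
from pathlib import Path

GLOBAL_SWITCH = "/proc/sys/net/ipv4/ip_forward"


def forwarding_enabled(path: str = GLOBAL_SWITCH) -> bool:
    """Return True if the given forwarding sysctl file contains 1."""
    return Path(path).read_text().strip() == "1"


def device_switch(device: str) -> str:
    """Build the per-device path, e.g. for 'eth0' or 'default'."""
    return "/proc/sys/net/ipv4/conf/%s/forwarding" % device
```

Enabling forwarding is the reverse operation: writing 1 to the same file, which requires root privileges.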

Forwarding a Packet (cont.)
 ipv4/ip_forward.c:ip_forward()
IP TTL > 1
YES → Decreases TTL
NO → Sends ICMP
 .... a few more calls
 core/dev.c:dev_queue_xmit()
 Default queue: priority FIFO sched/sch_generic.c:pfifo_fast_enqueue()
 Others: FIFO, Stochastic Fair Queuing, etc.
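Two steps of this path are easy to model: the TTL decision in ip_forward() and the banded priority FIFO behind dev_queue_xmit(). The sketch below is illustrative only. The `PfifoFast` class and `ip_forward_ttl` function are hypothetical names, and the priority-to-band table is the mapping recalled from the kernel's pfifo_fast implementation, so treat its exact values as an assumption.

```python
from collections import deque

# Priority-to-band map (assumed to match the kernel's prio2band table).
PRIO2BAND = [1, 2, 2, 2, 1, 2, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1]


def ip_forward_ttl(ttl: int):
    """Return ('forward', ttl - 1) or ('icmp_time_exceeded', None)."""
    if ttl <= 1:
        return ("icmp_time_exceeded", None)
    return ("forward", ttl - 1)


class PfifoFast:
    """Three FIFO bands; band 0 is always drained before band 1, then 2."""

    def __init__(self):
        self.bands = [deque(), deque(), deque()]

    def enqueue(self, skb, priority: int = 0) -> None:
        self.bands[PRIO2BAND[priority & 0xF]].append(skb)

    def dequeue(self):
        for band in self.bands:
            if band:
                return band.popleft()
        return None  # queue is empty
```

Within one band the order is strictly first in, first out; across bands a lower band number always wins.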
Skb life cycle

Linux Network for L3 (Routing)
Linux System
  User Daemon (Zebra): RIP, BGP, OSPF
    Routing Information Base
  Netlink
  Kernel Route
    Forwarding Information Base

Routing Lookup
ip_route_input() in net/ipv4/route.c: cache lookup
  Hit → deliver packet by ip_local_deliver() or ip_forward(), according to the result
  Miss → ip_route_input_slow() in net/ipv4/route.c
    fib_lookup() in ip_fib_local_table
      Hit → deliver packet
      Miss → fib_lookup() in ip_fib_main_table
        Hit → deliver packet
        Miss → drop packet
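The lookup order in this flow (cache, then local table, then main table, then drop) can be sketched as follows. This is a toy model under stated assumptions: tables are plain lists of (prefix, result) pairs, the longest matching prefix wins inside a table, and the cache is a dictionary keyed by destination only. The function names mirror the slide but the implementation is illustrative.

```python
import ipaddress


def fib_lookup(table: list, dst: str):
    """Longest-prefix match of dst against a list of (prefix, result)."""
    addr = ipaddress.ip_address(dst)
    best = None
    for prefix, result in table:
        net = ipaddress.ip_network(prefix)
        if addr in net and (best is None or net.prefixlen > best[0].prefixlen):
            best = (net, result)
    return None if best is None else best[1]


def route_lookup(dst: str, local_table: list, main_table: list, cache: dict) -> str:
    """Cache first, then the local table, then the main table, else drop."""
    if dst in cache:
        return cache[dst]
    result = fib_lookup(local_table, dst)
    if result is None:
        result = fib_lookup(main_table, dst)
    if result is None:
        return "drop"
    cache[dst] = result  # later packets to dst take the fast path
    return result
```

Only successful lookups are cached in this sketch, so a destination that was dropped is looked up again on the next packet.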

RIB decision by Dynamic Routing protocols: SDN in L3

FIB is decided by state and algorithm.
Isn’t it already Software Defined Something?

http://www.xorp.org/papers.html
Software forwarding plane: Linux kernels

Diagram: Control plane (routing daemons) ↔ /proc, ioctl(), netlink, routing socket ↔ Forwarding plane (Linux kernel)

 Interface between control and forwarding planes:
 Linux (old)
 /proc, sysctl, ioctl
 Linux (new)
 Netlink socket
 BSD
 Routing socket
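To give a feel for the netlink interface, the sketch below builds (but does not send) the bytes of a "dump the IPv4 routing table" request. It assumes the rtnetlink layout of a 16-byte nlmsghdr followed by a 12-byte rtmsg, and the constant values RTM_GETROUTE = 26, NLM_F_REQUEST = 0x1 and NLM_F_DUMP = 0x300 from the Linux headers; the function name is an illustrative choice.

```python
import socket
import struct

RTM_GETROUTE = 26       # assumed value from linux/rtnetlink.h
NLM_F_REQUEST = 0x001   # assumed value from linux/netlink.h
NLM_F_DUMP = 0x300      # NLM_F_ROOT | NLM_F_MATCH

NLMSGHDR_FMT = "=IHHII"     # len, type, flags, seq, pid
RTMSG_FMT = "=BBBBBBBBI"    # family, dst_len, src_len, tos, table,
                            # protocol, scope, type, flags


def build_getroute_request(seq: int = 1) -> bytes:
    """Return the raw bytes of an RTM_GETROUTE dump request for IPv4."""
    rtmsg = struct.pack(RTMSG_FMT, socket.AF_INET, 0, 0, 0, 0, 0, 0, 0, 0)
    length = struct.calcsize(NLMSGHDR_FMT) + len(rtmsg)
    header = struct.pack(NLMSGHDR_FMT, length, RTM_GETROUTE,
                         NLM_F_REQUEST | NLM_F_DUMP, seq, 0)
    return header + rtmsg
```

On Linux, a routing daemon would send such a message on a socket opened with the AF_NETLINK family and the NETLINK_ROUTE protocol, then parse the multi-part reply; that part is left out here.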

http://www.xorp.org/papers.html
OpenFlow: SDN for L2
 Physical separation of control and forwarding
 Forwarding plane in L2
 Flow table instead of FIB
 More general than IP
 Switch exposes flow table through simple OpenFlow protocol
 Keep it simple
 Vendor can keep platform closed
 Use outboard device for packet processing

Diagram: OpenFlow Controller ↔ OpenFlow Protocol (SSL) ↔ OpenFlow-enabled Layer-2 Switch (Flow table)

Matches subsets of packet header fields:
Switch Port | MAC src | MAC dst | Eth type | VLAN ID | IP Src | IP Dst | IP Prot | TCP sport | TCP dport
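Matching on a subset of header fields means that every field a flow entry does not mention is a wildcard. A minimal sketch of such a table follows. The field names, the `FlowTable` class, and the "send_to_controller" default are illustrative assumptions, not the OpenFlow wire format.

```python
FIELDS = ("in_port", "mac_src", "mac_dst", "eth_type", "vlan_id",
          "ip_src", "ip_dst", "ip_proto", "tcp_sport", "tcp_dport")


class FlowTable:
    """Toy flow table: highest-priority matching entry decides the action."""

    def __init__(self):
        self.flows = []  # list of (priority, match dict, action)

    def add_flow(self, match: dict, action: str, priority: int = 0) -> None:
        unknown = set(match) - set(FIELDS)
        if unknown:
            raise ValueError("unknown match fields: %s" % sorted(unknown))
        self.flows.append((priority, match, action))
        self.flows.sort(key=lambda flow: -flow[0])

    def lookup(self, packet: dict) -> str:
        for _, match, action in self.flows:
            # Fields absent from `match` are wildcards.
            if all(packet.get(name) == value for name, value in match.items()):
                return action
        return "send_to_controller"  # table miss
```

An entry such as `{"ip_dst": "10.0.0.2", "tcp_dport": 80}` matches on two of the ten fields and ignores the rest, which is what makes the table more general than an IP-only FIB.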
Linux networking for VM

 Basic networking
 Ethernet
 VLAN
 Subnet, ARP
 DHCP
 IP
 TCP/UDP/ICMP

Linux networking for VM, cont.

 Network Components
 Switch (packet switching vs. flow)
 Router (vs. Gateway)
 Firewalls (vs. iptables)
 Load balancers (vs. Routers)

Linux networking for VM, cont.

 Tunnel technologies
 Generally known as overlay
 GRE
 VXLAN
 Why not IPsec?
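An overlay works by wrapping the original frame in an extra header that names the virtual network. As a concrete case, the sketch below adds and removes the 8-byte VXLAN header described in RFC 7348 (a flags byte with the "VNI present" bit, a 24-bit VNI, and reserved bits). The outer UDP/IP/Ethernet headers are omitted, and the function names are illustrative.

```python
import struct

VXLAN_UDP_PORT = 4789     # IANA-assigned destination port for VXLAN
VXLAN_FLAG_VNI = 0x08     # "VNI present" bit in the first header byte


def vxlan_encap(vni: int, inner_frame: bytes) -> bytes:
    """Prepend the 8-byte VXLAN header to an inner Ethernet frame."""
    if not 0 <= vni < 2 ** 24:
        raise ValueError("VNI must fit in 24 bits")
    header = struct.pack("!II", VXLAN_FLAG_VNI << 24, vni << 8)
    return header + inner_frame


def vxlan_decap(packet: bytes):
    """Return (vni, inner_frame) from a VXLAN payload."""
    flags_word, vni_word = struct.unpack_from("!II", packet)
    if not flags_word & (VXLAN_FLAG_VNI << 24):
        raise ValueError("VNI flag not set")
    return vni_word >> 8, packet[8:]
```

The 24-bit VNI is what lets an overlay carry far more isolated segments than the 12-bit VLAN ID allows.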

Linux networking for VM, cont.

 Network namespaces
 A way (not the only one) of scoping networking functions and components.
 VRF: multiple gateways on the same router at the same time

Linux networking for VM, cont.

 SNAT: router modifies source IP in packet
 DNAT: router modifies destination IP in packet
 One-to-one NAT
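The three NAT variants differ only in which address is rewritten and in which direction. The sketch below models a packet as a dictionary with "src" and "dst" keys; the `OneToOneNat` class and all addresses used with it are hypothetical, and real NAT also has to fix checksums and track connections, which is left out.

```python
def snat(packet: dict, new_src: str) -> dict:
    """Source NAT: rewrite the source address, leave the rest untouched."""
    return {**packet, "src": new_src}


def dnat(packet: dict, new_dst: str) -> dict:
    """Destination NAT: rewrite the destination address."""
    return {**packet, "dst": new_dst}


class OneToOneNat:
    """Static mapping between one inside address and one outside address."""

    def __init__(self, mapping: dict):
        self.inside_to_outside = dict(mapping)
        self.outside_to_inside = {out: ins for ins, out in mapping.items()}

    def outbound(self, packet: dict) -> dict:
        # Leaving the network: SNAT if the source has a mapping.
        outside = self.inside_to_outside.get(packet["src"])
        return packet if outside is None else snat(packet, outside)

    def inbound(self, packet: dict) -> dict:
        # Entering the network: DNAT if the destination has a mapping.
        inside = self.outside_to_inside.get(packet["dst"])
        return packet if inside is None else dnat(packet, inside)
```

This pairing of a fixed inside address with a fixed outside address is the same pattern a cloud "floating IP" uses.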

Linux networking for VM, example

 OpenStack networking
 Adds more complexity
 veth, Open vSwitch, Linux bridge

Linux networking for VM, example

 Openstack networking, cont.

SDN: practical

 Google’s Jupiter

SDN:practical, Kakao’s case

 What we try to solve
 IP movement inter-rack, inter-zone, inter-dc(?)
 IP resource imbalance
 Fault resilience
 Dynamically check status of network
 Simple IP resource planning and management

SDN:practical, Kakao’s case cont.

 Use 32-bit (/32) subnets, BGP, and a switch namespace

Diagram labels (recovered from the figure):
Router
  Routing Table: 1 10.100.10.2/32 via 192.1.1.201
  192.1.1.202, eBGP
Compute node
  Routing Table: Default GW 192.168.1.1 eth1
  Switch namespace: dhcp-server, 192.1.1.201, iBGP
    Host Route: dest 10.10.100.2/32 to 10.10.100.1
    process, eth1, 10.10.100.1
  Global namespace: neutron-dhcp-agent, linux bridge, neutron-linuxbridge-agent, nova-compute, eth0
  vm: IP 10.10.100.2/32
    Routing Table: Default GW x.x.x.x eth0
    GW
Controller

What is container?

 A container comprises multiple namespaces
 Standardized resource
 Brick or Lego

Typical container orchestrator’s network

 Yes, it’s overlay again.

Flannel

Scalable container network: Kakao’s case

 Things you have to deal with when you try to use an overlay:
 Have to rethink performance
 Have to think about fault resiliency and migration issues.
 Still have to consider how to send packets out of the system.

Scalable container network: Kakao’s case, cont.

 It has history
 First approach was using Docker libnetwork
Using Docker libnet

blog.midonet.org

 BTW, Kubernetes gave up on it! OMG

Scalable container network: Kakao’s case, cont.
 Use node port and load balancer
 It’s very easy.
 Had a scalability issue: node port has a limited port range.
 Can only have a 5-digit number of containers
 Load balancer is expensive.

Scalable container network: Kakao’s case, cont.
 Use routable container bridge subnet and BGP injector
 Predefine a subnet for each container bridge router
 Have to provision before resources are depleted.

Diagram labels: BGP Injector, Router, Router, Cluster
Node1: subnet1, Container
Node2: subnet2, Container
Node3: subnet3, Container
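The "predefine a subnet per container bridge" step can be sketched as carving fixed-size subnets out of one pool and pairing each with the node that hosts it, which is the information a BGP injector would need to announce. Everything concrete below is hypothetical: the pool 10.244.0.0/16, the /24 size, the node addresses, and the function name. The sketch only computes (prefix, next hop) pairs and does not speak BGP.

```python
import ipaddress
from itertools import islice


def allocate_bridge_subnets(pool: str, nodes: list, new_prefix: int = 24) -> list:
    """Return [(subnet, next_hop)] with one subnet per (name, ip) node."""
    candidates = ipaddress.ip_network(pool).subnets(new_prefix=new_prefix)
    subnets = list(islice(candidates, len(nodes)))
    if len(subnets) < len(nodes):
        # The pool is depleted: this is the "provision before" problem.
        raise ValueError("pool too small for %d nodes" % len(nodes))
    return [(str(subnet), node_ip)
            for subnet, (_, node_ip) in zip(subnets, nodes)]
```

Each returned pair reads as "reach this container subnet via this node", and running out of pairs corresponds to the resource depletion the slide warns about.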
BTW
 It’s all about connecting/controlling fundamental network elements (we didn’t reinvent the wheel).
 But we try to find the secret composition
 Hope that OpenFlow/overlay-based solutions will become more popular, cheaper and simpler.
