Agughasi Victor Ikechukwu - Final Journal
of associated NEs.
Table 2: A subset of XSP primitives
The current XSP implementation includes a Sockets API-compatible client library, libxsp_client, that is being retooled to expand feature support. This provides familiar semantics such as open, close, connect, send, and receive, and extends these with session-specific calls that interact with the XSP-SH layer. Thus, an application may establish a session for a reliable, stream-oriented connection, as would be provided by TCP, while being able to configure a desired authorization scheme, dynamic network path, separate data channels, etc., all from a common interface.
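As an illustration of how such a sockets-like session interface can be surfaced to applications, the sketch below outlines one possible shape in C. The names and signatures are hypothetical and do not reproduce the actual libxsp client API; they only show the familiar open/connect/send/receive semantics extended with a session-option call.

    /* Hypothetical interface sketch; not the real libxsp client API. */
    #include <stddef.h>

    typedef struct xsp_sess xsp_sess_t;   /* opaque session handle (illustrative) */

    xsp_sess_t *xsp_sess_open(void);                                  /* create a session context       */
    int  xsp_sess_setopt(xsp_sess_t *s, int opt,
                         const void *val, size_t len);                /* e.g. auth scheme, path request */
    int  xsp_sess_connect(xsp_sess_t *s, const char *host, int port); /* signaling plus data channels   */
    long xsp_sess_send(xsp_sess_t *s, const void *buf, size_t len);
    long xsp_sess_recv(xsp_sess_t *s, void *buf, size_t len);
    int  xsp_sess_close(xsp_sess_t *s);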
The XSP implementation also provides a transparent wrapper, known as the Shim Library, indicated in Figure 1. Using library interposition (e.g. via the Linux LD_PRELOAD mechanism), the wrapper allows existing applications to take advantage of XSP without requiring any source code modifications.
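The interposition mechanism itself is standard Linux practice. The minimal sketch below is not the Shim Library's actual code; it only shows how a preloaded shared object can intercept connect() and hand control to a wrapper before falling through to the original call.

    /* shim.c: build with `gcc -shared -fPIC -o libshim.so shim.c -ldl`,
       then run an unmodified application as `LD_PRELOAD=./libshim.so app`. */
    #define _GNU_SOURCE
    #include <dlfcn.h>
    #include <sys/socket.h>

    int connect(int fd, const struct sockaddr *addr, socklen_t len)
    {
        static int (*real_connect)(int, const struct sockaddr *, socklen_t);
        if (!real_connect)
            real_connect = dlsym(RTLD_NEXT, "connect");
        /* An XSP shim could establish a session here before, or instead of,
           the direct connection; this sketch simply passes the call through. */
        return real_connect(fd, addr, len);
    }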
Achieving reliable, high-speed data transfer performance remains a “holy grail” for many in the research and education (R&E) and e-Science communities, and it is increasingly important for the commercial sector as well. While available link and backbone capacity have rapidly increased, the achievable throughput for typical end-to-end applications has failed to increase commensurately. In many cases, application throughput may be significantly less than what is theoretically achievable unless a considerable amount of effort is spent on host, application, and network “tuning” by users and network administrators alike. The growing WAN acceleration industry underscores this need.

Our earlier work with Phoebus [33, 34] is a direct response to this performance gap, particularly when bulk data movement is concerned. Phoebus is a middleware system that applies our XSP session layer, along with associated forwarding infrastructure, for improving throughput in today's networks. Phoebus is a descendant of the Logistical Session Layer (LSL) [10, 11], which used a similar protocol and approach. Using XSP, Phoebus is able to explicitly mitigate the heterogeneity in network environments by breaking the end-to-end connection into a series of connections, each spanning a different network segment. In this model, Phoebus Gateways (PGs) located at strategic locations in the network take responsibility for forwarding users' data to the next PG in the path, or to the destination host. The Phoebus network “inlay” of intelligent gateways allows data transfers to be adapted at application run time, based on available network resources and conditions.
In order to adapt between protocols along different network segments, Phoebus uses the Protocol Channel XSP-SH to implement a number of transfer backends. These backends can then be used interchangeably via the shared XSP API while the underlying XSP framework handles any differences in protocol semantics. The existing Phoebus implementation has developed and experimented with a number of protocol backends, including TCP, UDP, and MX [21], as well as userspace protocol implementations such as UDT [14].
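A common way to make such backends interchangeable is a table of operations selected at run time. The fragment below is only a sketch of that idea; the names are illustrative and are not the Phoebus or XSP source.

    /* Illustrative backend abstraction: one ops table per protocol backend. */
    #include <stddef.h>
    #include <sys/types.h>

    struct transfer_backend {
        const char *name;                                        /* "tcp", "udp", "udt", "rdma" */
        int     (*open)(void **ctx, const char *host, int port);
        ssize_t (*send)(void *ctx, const void *buf, size_t len);
        ssize_t (*recv)(void *ctx, void *buf, size_t len);
        int     (*close)(void *ctx);
    };

    /* The session layer would pick one table when the data channel is negotiated,
       so callers keep using identical calls regardless of the wire protocol. */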
One alternative is to decouple the data movement over the network from the involvement of the operating system itself. In this model, sending a resource, whether it be a set of files or data already within the page cache, involves letting the OS simply stage the data in memory on behalf of the requesting application while allowing the remote host to asynchronously “get” the prepared memory regions via some transport mechanism. We call this particular type of transfer scenario Bulk Asynchronous GET, or BAG.
The BAG approach entails having a producer make some resources available for a remote consumer to access, during some particular window of activity, which is exactly what our earlier definition of a session allows via XSP. The BAG operation is asynchronous in that the producer is only notified of a transfer completion if requested or required by the implementation, and typically via an out-of-band message. An analogy can be drawn between BAG and shipping packages via UPS or FedEx. The sender is not, generally, continuously involved with the delivery of their items from one location to another. Instead, UPS is notified that some number of packages are available at a particular address for pickup, the sender makes them available at their front door, and UPS “gets” the packages and delivers them to the desired destination. A sender may even request delivery confirmation, which can be received or checked “out-of-band” via a web site.

What BAG requires is the ability to decouple the sender (really the host operating system) from the task of pushing the actual data through the network, freeing up the system to perform other tasks, just as UPS frees a shipper of packages to go about their day. Fortunately, the rapidly evolving area of Remote Direct Memory Access (RDMA) technologies has provided exactly this capability.

The BAG approach draws inspiration from data center environments where switched-fabric interconnects like InfiniBand and similar RDMA protocols have played a significant role in enabling massive parallelization with improved throughput and reduced latency and overhead. Supporting zero-copy networking, RDMA operates on the principle of transferring data directly from the memory of one system to another, across a network, while bypassing the operating system and eliminating the need to copy data between user and kernel memory space. These direct memory operations are supported by enabling network adapters to register, or “pin”, memory and directly access these explicitly allocated regions without involvement or context switching from the host operating system.
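For concreteness, the fragment below sketches what “pinning” looks like with the ibverbs API: a buffer is registered against a protection domain and the returned memory region carries the keys a remote peer needs. It assumes a protection domain already exists from connection setup, and it is a sketch rather than this work's implementation.

    #include <stdlib.h>
    #include <infiniband/verbs.h>

    /* Register ("pin") a buffer so the rNIC can serve remote reads from it.
       The returned ibv_mr holds the lkey/rkey that are later advertised. */
    static struct ibv_mr *pin_region(struct ibv_pd *pd, size_t len)
    {
        void *buf = malloc(len);
        if (!buf)
            return NULL;
        return ibv_reg_mr(pd, buf, len,
                          IBV_ACCESS_LOCAL_WRITE | IBV_ACCESS_REMOTE_READ);
    }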
Recently, enhancements to Ethernet for “data center bridging” have led to RDMA implementations that run directly over existing layer-2 network infrastructure, using RDMA-enabled network adapters called rNICs. By allowing the network adapter to encapsulate memory-resident application data within layer-2 frames directly, the overhead of higher level protocols can be virtually eliminated. A number of RDMA over Ethernet (RoE) implementations are under development and a standard for high-performance Ethernet rNICs has been proposed and implemented within currently available hardware [3, 20].

Figure 3: System-level view of RDMA transfers with XSP-driven Bulk Asynchronous GET (XSP-BAG)

Figure 3 shows a system-level view of XSP providing the necessary signaling to maintain a BAG transfer. After the XSP session establishes the RDMA context, registers local and remote buffers, and exchanges pointers, the rNICs proceed to transfer data within the designated memory regions without further involvement from the OS. What is missing from this picture is the service that “stages” requested data in memory to be transferred.

VII. SLaBS with XSP-BAG
The first implementation of the BAG approach extends SLaBS with the ability to use RDMA over Ethernet to more efficiently transfer slabs across dedicated network paths. Here, the memory regions to GET are the slab SPDUs being buffered at the SLaBS gateway, and the main challenge involves extending the threaded buffer model within SLaBS to support efficient BAG transfers over high-latency WAN paths.
Using XSP, I analysed a conceptual model of a BAG service that uses an RDMA transport and have begun implementing the necessary components within the XSP session layer. This has involved two main tasks: (i) creating an RoE Protocol Channel XSP-SH, and (ii) adding XSP option types that allow the application to exchange the necessary metadata to perform the remote GET operations and signal transfer completion events. The RDMA protocol handler in XSP-SH uses the OpenFabrics [12] rdmacm and InfiniBand ibverbs libraries to establish the RDMA transport context and initiate the supported RDMA operations. The employed InfiniBand “RDMA READ” operation is equivalent to the GET described in the Bulk Asynchronous GET model and requires the exchange of memory region pointers to move data from one RDMA-connected host to another. The XSP option blocks defined for BAG transfers (option type XSP_OPT_BAG) encode and exchange the necessary local and remote addresses, keys, and size of each memory region to transfer.
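A plausible shape for that exchanged metadata, and for the consumer-side GET it enables, is sketched below. The struct layout is an assumption for illustration only (the actual XSP_OPT_BAG encoding is not shown in this text); the ibverbs calls are the standard way to post an RDMA READ.

    #include <stdint.h>
    #include <string.h>
    #include <infiniband/verbs.h>

    struct bag_region {          /* hypothetical option-block payload */
        uint64_t remote_addr;    /* producer's pinned buffer address  */
        uint32_t rkey;           /* remote key from ibv_reg_mr        */
        uint32_t length;         /* bytes to GET                      */
    };

    /* Consumer side: turn an advertised region into an RDMA READ (the GET). */
    static int post_bag_get(struct ibv_qp *qp, struct ibv_mr *local_mr,
                            const struct bag_region *r)
    {
        struct ibv_sge sge = {
            .addr   = (uintptr_t)local_mr->addr,
            .length = r->length,
            .lkey   = local_mr->lkey,
        };
        struct ibv_send_wr wr, *bad = NULL;

        memset(&wr, 0, sizeof(wr));
        wr.opcode              = IBV_WR_RDMA_READ;   /* the asynchronous "GET" */
        wr.send_flags          = IBV_SEND_SIGNALED;  /* completion via the CQ  */
        wr.sg_list             = &sge;
        wr.num_sge             = 1;
        wr.wr.rdma.remote_addr = r->remote_addr;
        wr.wr.rdma.rkey        = r->rkey;

        return ibv_post_send(qp, &wr, &bad);
    }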
Figure 4: SLaBS “triple buffering” for XSP-BAG

As network latency increases, it is well understood that pipelining the transmission of network buffers is required in order to continually keep data “in-flight” within the network. TCP solves this with a sliding window protocol clocked to the round-trip time (RTT) of the path, the effects of which become exaggerated over so-called “long fat networks”, leading to considerable performance issues. In contrast, SLaBS maintains an open loop model over the dedicated core network paths which allows us to determine ahead of time what resources are necessary and to pace slab transfers based on buffer and throughput capabilities at various points in the network. This amounts to ensuring that the buffering implementation has at least one bandwidth-delay product (BDP) sized buffer in transit at any given time. However, in the BAG model with RDMA, there is an inherent trade-off between the number of registered memory regions and total buffer size required to saturate the given network path.
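As a back-of-the-envelope illustration of that trade-off (the numbers below are assumptions for the example, not measurements from this work), a 10 Gb/s path with 100 ms of latency needs roughly one BDP of pinned memory in flight:

    #include <stdio.h>

    int main(void)
    {
        double rate_bps = 10e9;                     /* 10 Gb/s path              */
        double rtt_s    = 0.100;                    /* 100 ms round-trip time    */
        double bdp_B    = rate_bps * rtt_s / 8.0;   /* bytes that must be in flight */
        double region_B = 64.0 * 1024 * 1024;       /* e.g. 64 MB pinned regions */

        printf("BDP  = %.0f MB\n", bdp_B / (1024 * 1024));            /* ~119 MB   */
        printf("need >= %.0f regions in flight\n", bdp_B / region_B); /* about 2   */
        return 0;
    }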
Our current SLaBS buffer implementation, illustrated in Figure 4, uses an adjustable size ring buffer with configurable memory region partitions within the overall buffer. To simplify the amount of memory region metadata exchange required with XSP, we employ a simple “triple buffering” scheme where three memory regions are exchanged between SLaBS gateways in a round-robin fashion to keep the network saturated. While incoming SPDUs are written to one region, the second region is a “ready-to-send” slab whose associated metadata has already been sent to the remote side within an XSP slab option block. The remote side posts the GET operations as they are received over the XSP control session while a third region is continuously being retrieved.
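The round-robin rotation can be pictured with the small state sketch below. It is illustrative only, not the SLaBS buffer code, and it assumes rotation happens once the current region is full and the oldest outstanding GET has completed.

    #include <stddef.h>

    enum region_state { FILLING, READY, IN_FLIGHT };

    struct slab_region {
        void             *base;    /* pinned memory region         */
        size_t            used;    /* bytes of SPDUs written so far */
        enum region_state state;
    };

    struct triple_buf {
        struct slab_region r[3];
        int fill;                  /* index currently being filled */
    };

    /* The filled region becomes READY (its metadata goes out in an XSP slab
       option block), the region whose GET just completed is recycled for
       filling, and the previously READY region is now the one in flight. */
    static void rotate(struct triple_buf *tb)
    {
        tb->r[tb->fill].state = READY;
        tb->fill = (tb->fill + 1) % 3;
        tb->r[tb->fill].state = FILLING;
        tb->r[tb->fill].used  = 0;
        tb->r[(tb->fill + 1) % 3].state = IN_FLIGHT;
    }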
VIII. Performance Evaluation

This section presents initial performance results as evaluated using the XSP-BAG additions to the SLaBS gateway system. All results were collected from a testbed environment consisting of 7 nodes connected with 10Gb/s Myricom Ethernet NICs, each node having at least two 10Gb/s interfaces. The testbed forms a linear network topology with client and server nodes at the edges, two SLaBS gateways in the middle, and 3 delay nodes segmenting the network into representative LAN and WAN segments. The edge host and netem nodes were Sun X2200 servers with quad-core AMD Opteron CPUs and 4GB of RAM, while the gateway systems contained AMD Phenom II X4 processors and included 8GB of high-speed DDR2 RAM. The gateway systems were outfitted with two Mellanox rNICs in order to evaluate SLaBS performance with native, or “hard”, RDMA over Ethernet.

Figure 5 shows that at 10Gb/s, the SLaBS gateway systems are not able to successfully form slab SPDUs and simultaneously burst buffered slabs over either TCP or UDP data channels. Increasing the number of additional incoming streams does not significantly affect the buffering or backend performance. With the XSP-BAG extensions and the RDMA data channel, the slab bursting performance nearly matches that of only writing SPDUs into the slab buffer from the edge connections. Indeed, the RDMA data channel transmits at the maximum bandwidth achievable by the rNIC, approximately 9.71Gb/s. As the WAN latency is increased, direct TCP performance suffers whereas with SLaBS, and the RDMA data channel enabled, observable transfer performance remains consistent beyond 100ms, improving upon the direct TCP transfer by up to 18% in the highest latency case.

Figure 5: SLaBS data channel performance (Gb/s vs. number of connections; buffer only, RDMA send, TCP send, UDP send)
To date, we have implemented handlers for OSCARS, Terapaths, and OpenFlow and have begun implementations for NETCONF and end-host network configuration for Linux-based systems.
[16]. http://ipv6.com/articles/general/IPv6-The-Future-of-the-Internet.htm
[17]. http://www.nojitter.com/post/240145546/ipv6-and-the-future-internet-problems-created
[18]. http://www.computer.org/csdl/mags/ic/2012/06/mic2012060011.html
[19]. http://www.hit.bme.hu/~farkask/publications/ipv6_english.pdf
[20]. http://nes.aueb.gr/publications/FIAbook2011.pdf
[21]. http://en.wikipedia.org/wiki/Middlebox