Internet Architecture
The Internet is a BIG distributed system with a large dynamic range.
Design principles
- Federated design: no single entity controls the entire system
- Best effort: the network does not guarantee delivery, but it tries its best to deliver packets
- End-to-end principle: the network is kept as simple as possible; endpoints are responsible for reliability, security, etc.
The tradeoff is that the Internet is hard to manage, offers no performance guarantees, and can be slow relative to what its infrastructure allows.
Clark 88
The design philosophy of the DARPA internet protocols
TCP/IP was first proposed by the Defense Advanced Research Projects Agency (DARPA). Its main goal was to effectively multiplex across existing networks. Some other goals were:
- Survivability and fault-tolerance
- Supporting a variety of networks
- Distributed management of resources
- Cost-effectiveness
- Accountability
Some design decisions made were:
- Datagrams
- Packet switching instead of circuit switching
- Storing state at the endpoints instead of the network
Circuit switching involves setting up a dedicated path between the source and destination before data can be sent. By reserving resources along the path, the network can make performance guarantees. Ex. telephone networks.
Store-and-forward packet switching sends data in small packets that can take different paths to the destination. While this supports flexible topologies and benefits from statistical multiplexing, it does not allow for performance guarantees. Ex. the Internet.
E2E
End-to-end arguments in system design
The end-to-end argument is that only endpoints can provide certain functions correctly, and that implementing these functions in the network can be redundant and inefficient. Examples of these functions include:
- Reliable data transmission
- Acknowledgment of delivery
- Data security
- Duplicate message suppression
For example, consider the problem of reliable data transfer. A transfer may involve the applications on both hosts, the operating system, the disk, and the communication subsystem, any of which could be a point of failure. Thus, reliable data transfer can only be fully implemented at the application layer with an end-to-end check and retry, and implementing reliability in the network as well may be redundant and inefficient.
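A minimal sketch of end-to-end check and retry, where the sender's checksum is verified only at the final destination (the send/recv functions are hypothetical stand-ins for whatever transfer path sits in between):

```python
import hashlib

def transfer_with_retry(data: bytes, send, recv, max_retries: int = 3) -> bytes:
    """Sketch: application-level end-to-end check and retry."""
    digest = hashlib.sha256(data).hexdigest()
    for _ in range(max_retries):
        send(data, digest)
        received, received_digest = recv()
        # The end-to-end check: verify at the destination, regardless of what
        # guarantees the OS, disk, or network in between claim to provide.
        if hashlib.sha256(received).hexdigest() == received_digest == digest:
            return received
    raise IOError("end-to-end check failed after retries")
```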
Routing Schemes
Multicast is a routing scheme that sends data from one source to multiple destinations simultaneously. Packets are replicated only where paths to different destinations diverge, which conserves bandwidth.
Multihoming connects a host to multiple networks in order to increase reliability and performance.
Internet Infrastructure
B4 And After
Private Wide Area Networks (WANs) are used by large organizations to connect their offices and data centers. They often have a centralized control plane, which allows for more efficient traffic engineering and better performance than the broader Internet.
B4 is Google's private WAN. One of the main scalability issues in B4 was that increasing site count (1) complicated capacity planning, (2) slowed the TE algorithm, and (3) put pressure on switch forwarding tables.
One of the things Google did to solve that problem was add more hierarchy to the network topology. Each site now has multiple supernodes (leaf and spine architecture) connected in a full mesh.
Edge Caching as Differentiation
Edge caching leads to performance differences for end-users, similar to traffic differentiation. Furthermore, these differences do not explicitly come about as a result of service differentiation, but rather arise implicitly from the nature of shared caching.
CityMesh
Scalable Routing in a City-Scale Wi-Fi Network for Disaster Recovery
CityMesh uses static access points and mobile devices equipped with Wi-Fi to provide connectivity in cases where (1) the network is down but (2) the physical infrastructure is still intact.
It uses map data to determine the best path for routing packets between buildings, and it uses grid-based addressing to allow for scalable routing.
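A rough sketch of grid-based addressing with greedy forwarding (the cell size and coordinate scheme are invented for illustration, not CityMesh's actual parameters):

```python
CELL_METERS = 100  # hypothetical grid cell size

def grid_address(x_m: float, y_m: float) -> tuple[int, int]:
    """Map planar coordinates (meters) to a grid cell address."""
    return (int(x_m // CELL_METERS), int(y_m // CELL_METERS))

def next_hop_cell(current: tuple[int, int], dest: tuple[int, int]) -> tuple[int, int]:
    """Greedily step one cell toward the destination; no per-node routing
    state is needed beyond the grid address itself."""
    step = lambda c, d: c + (d > c) - (d < c)
    return (step(current[0], dest[0]), step(current[1], dest[1]))

print(grid_address(1250.0, 480.0))      # (12, 4)
print(next_hop_cell((12, 4), (15, 2)))  # (13, 3)
```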
Congestion Control
Dismantling a Religion
Flow rate fairness: dismantling a religion
Briscoe '08 argues that flow rate fairness is not a good measure of 'fairness'. For one, most flow rate fairness schemes can be taken advantage of by users who open multiple flows, and thus receive more bandwidth.
Instead, cost fairness, which considers the congestion caused by a user, is a better measure. "You get what you pay for."
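A toy calculation of the multi-flow exploit (numbers invented): under per-flow fairness, a user who opens nine flows gets nine times the bandwidth of a user who opens one.

```python
LINK_CAPACITY = 100.0  # Mbit/s, illustrative bottleneck
flows_per_user = {"alice": 1, "bob": 9}  # bob opens 9 parallel flows

total_flows = sum(flows_per_user.values())
per_flow_share = LINK_CAPACITY / total_flows  # "fair" per-flow allocation

for user, n in flows_per_user.items():
    print(f"{user}: {n * per_flow_share:.0f} Mbit/s")  # alice: 10, bob: 90
```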
Access
cISP
cISP: A Speed-of-Light Internet Service Provider
The best possible latency between two points on Earth is set by the speed of light in a vacuum, or c-latency. Most Internet fetches take 36-100x c-latency, largely due to protocol inefficiencies. Infrastructure alone accounts for around 3-4x: light in fiber travels at around two-thirds of c (a 1.5x penalty by itself), and fiber routes are often circuitous.
cISP is a service provider that uses microwave antennas for long-haul routing and uses fiber for the last mile. Microwave has a short range (~100 km) and limited bandwidth, but also a transmission speed essentially equal to c. However, it is very sensitive to weather and obstructions, and is currently only widely used in high-frequency trading (HFT).
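Back-of-the-envelope latency arithmetic for an illustrative 4,000 km path:

```python
C_KM_PER_MS = 299_792.458 / 1000   # speed of light: ~300 km per ms
FIBER_FRACTION_OF_C = 2 / 3        # light in fiber travels at ~2/3 c

distance_km = 4000                 # hypothetical great-circle distance
c_latency_ms = distance_km / C_KM_PER_MS
fiber_latency_ms = c_latency_ms / FIBER_FRACTION_OF_C

print(f"c-latency:      {c_latency_ms:.1f} ms")      # ~13.3 ms one-way
print(f"straight fiber: {fiber_latency_ms:.1f} ms")  # ~20.0 ms: already 1.5x
                                                     # before any circuitous routing
```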
Traffic Control
L4S
Low Latency, Low Loss, and Scalable Throughput (L4S) is an architecture for Internet congestion control. It uses Explicit Congestion Notification (ECN) to signal congestion early, before queues build up and packets are dropped.
Endpoints using L4S are given preferential treatment (a separate low-latency queue) in exchange for cooperating via improved, scalable CCAs. Remarkably, both L4S and non-L4S traffic see improved performance.
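A sketch contrasting a classic AIMD response with the DCTCP-style "scalable" response that L4S CCAs such as TCP Prague build on: shrink the window in proportion to the fraction of ECN-marked packets rather than halving on any signal. (Simplified; real stacks smooth the marking fraction with an EWMA.)

```python
def classic_response(cwnd: float, any_signal: bool) -> float:
    """AIMD: halve the window on any loss/mark, else grow additively."""
    return cwnd / 2 if any_signal else cwnd + 1

def scalable_response(cwnd: float, mark_fraction: float) -> float:
    """Scale the reduction to the congestion level: 5% of packets marked
    shrinks the window by only 2.5%, keeping throughput high and queues low."""
    return cwnd * (1 - mark_fraction / 2) if mark_fraction > 0 else cwnd + 1
```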
RCS
Principles for Internet Congestion Management
The Internet relies on host-based congestion control algorithms (CCAs) to prevent overloads. However, users have an incentive to deploy more aggressive CCAs to receive more bandwidth. To prevent this, the Internet informally requires all CCAs to be TCP-friendly (TCPF), which means
its arrival rate does not exceed the arrival rate of a conformant TCP connection in the same circumstances
There are multiple problems with TCPF:
- Difficult to enforce
- Limits CCAs' ability to achieve full efficiency
- In practice, non-TCPF CCAs like BBR are widely deployed
The authors' proposal is to have the network actively enable all reasonable CCAs to achieve the same bandwidth in the same static circumstances, or CCA independence (CCAI). They describe a Recursive Congestion Shares (RCS) framework which uses existing commercial agreements to determine packets' relative rights in a link.
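A minimal sketch of the recursive division idea (the customer tree and weights below are invented; RCS derives them from existing commercial agreements):

```python
def recursive_shares(node: dict, capacity: float, shares: dict) -> None:
    """Split a link's capacity among direct customers by weight, then
    recursively subdivide each customer's share among its own customers."""
    children = node.get("children", [])
    if not children:
        shares[node["name"]] = capacity
        return
    total = sum(c["weight"] for c in children)
    for child in children:
        recursive_shares(child, capacity * child["weight"] / total, shares)

tree = {"name": "link", "children": [
    {"name": "isp_a", "weight": 2, "children": [
        {"name": "host_1", "weight": 1}, {"name": "host_2", "weight": 1}]},
    {"name": "isp_b", "weight": 1}]}

shares = {}
recursive_shares(tree, 90.0, shares)
print(shares)  # {'host_1': 30.0, 'host_2': 30.0, 'isp_b': 30.0}
```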
Datacenters
ECMP leads to collisions but is easy to implement in hardware.
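A sketch of the hash-based path choice (hash function and flows are illustrative, not what any particular switch uses): every packet of a flow hashes to the same next hop, so there is no reordering, but two elephant flows can land on the same path and collide.

```python
import zlib

def ecmp_next_hop(five_tuple: tuple, next_hops: list[str]) -> str:
    """Pick among equal-cost next hops by hashing the flow's 5-tuple."""
    h = zlib.crc32(repr(five_tuple).encode())
    return next_hops[h % len(next_hops)]

paths = ["core1", "core2", "core3", "core4"]
flow_a = ("10.0.0.1", "10.0.1.1", 5001, 80, "tcp")
flow_b = ("10.0.0.2", "10.0.1.2", 5002, 80, "tcp")
print(ecmp_next_hop(flow_a, paths), ecmp_next_hop(flow_b, paths))
```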
Fat-Tree
A scalable, commodity data center network architecture
Let k be the number of ports on a switch. A fat-tree topology has (see the calculator sketch below):
- k pods, each with two layers of k/2 switches (edge and aggregation)
- (k/2)^2 core switches, each connected to one switch in each pod
- k^3/4 hosts
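These counts as a quick calculator (the formulas are the standard ones from the paper):

```python
def fat_tree_sizes(k: int) -> dict:
    """Sizes of a fat-tree built from k-port switches."""
    assert k % 2 == 0, "k must be even"
    return {
        "pods": k,
        "switches_per_pod": k,            # k/2 edge + k/2 aggregation
        "core_switches": (k // 2) ** 2,
        "total_switches": 5 * k * k // 4,
        "hosts": k ** 3 // 4,
    }

print(fat_tree_sizes(48))  # 48-port switches -> 27,648 hosts, 2,880 switches
```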
Benefits
- Don't need to buy more powerful switches for aggregation and core layers
- Fault-tolerance
- 1:1 oversubscription ratio
Issues
- No great solution for TOR redundancy
- Difficult to load-balance between the core and the aggregate switches
- Not amenable to incremental expansion. k is limited by the number of ports per switch. Hosts scale with k^3 and switches scale with k^2.
Jupiter Rising
Jupiter Rising: A Decade of Clos Topologies and Centralized Control in Google's Datacenter Network
Datacenter bandwidth demands have doubled every year for the past decade, and are expected to continue to do so. Jupiter Rising describes the evolution of Google's datacenter network over the past decade to support this growth.
Jellyfish
Jellyfish: Networking Data Centers Randomly
The idea behind Jellyfish is to use random graphs instead of structured topologies like hypercubes and fat-trees. Randomly connecting nodes allows for incremental expansion, shorter average path lengths, and better bandwidth. However, it also leads to more complex routing and load balancing.
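A sketch of the construction using networkx (the switch count and port count are illustrative):

```python
import networkx as nx

n, d = 100, 6  # 100 top-of-rack switches, 6 inter-switch ports each
G = nx.random_regular_graph(d, n, seed=0)

# d-regular random graphs are connected with high probability for d >= 3,
# and their short average paths are where the bandwidth benefits come from.
assert nx.is_connected(G)
print(nx.average_shortest_path_length(G))  # roughly 2.6 hops here
```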
Bundling and patch panels are two techniques for reducing cabling complexity in a datacenter network.
Management
SDN Talk
The Future of Networking, and the Past of Protocols
Road to SDN
The road to SDN: an intellectual history of programmable networks
Software-defined networking (SDN) separates the control plane (which decides how to handle traffic) from the data plane (which forwards traffic according to the control plane's instructions). It also consolidates the control plane so that a single piece of software can control multiple dataplane elements.
OpenFlow is an API that allows the control plane to configure packet-handling rules on the data plane.
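A toy model of the match-action abstraction (field names and actions are simplified; this is not the actual OpenFlow wire protocol):

```python
flow_table = []  # (priority, match, action), kept sorted by priority

def install_rule(priority: int, match: dict, action: str) -> None:
    """What the controller does over the OpenFlow channel, in spirit."""
    flow_table.append((priority, match, action))
    flow_table.sort(key=lambda rule: -rule[0])

def handle_packet(pkt: dict) -> str:
    """What the switch does: apply the highest-priority matching rule."""
    for _, match, action in flow_table:
        if all(pkt.get(k) == v for k, v in match.items()):
            return action
    return "send_to_controller"  # table miss: ask the control plane

install_rule(10, {"dst_ip": "10.0.0.2"}, "output:port2")
install_rule(20, {"dst_ip": "10.0.0.2", "tcp_dst": 22}, "drop")  # block SSH
print(handle_packet({"dst_ip": "10.0.0.2", "tcp_dst": 22}))  # drop
```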
Motivations for SDN
- Computer networks have many different components, including routers, switches, and middleboxes (firewalls, load balancers, NATs, intrusion detection systems). Network administrators have to use different closed and proprietary interfaces to configure each of these components.
- SDNs lower the barrier to innovation and experimentation.
History leading to SDN
- Active networks (mid-1990s to early 2000s)
- Control and data plane separation (early 2000s to mid-2000s)
- OpenFlow API and network operating systems (2007-2010)
The idea behind active networks was to allow users to inject code into the network to customize how packets are handled. There are two models:
- The capsule model, where the code is carried in the packet itself
- The programmable switch model, where the code is stored on the switch and packets can trigger the execution of the code
The separation of the control and data plane was motivated by demands for technology to manage routing within an ISP. Compared to active networking, this research focused more on
- Network administrators (rather than end-users)
- Programmability in the control plane (rather than the data plane)
- Control over the network (rather than individual devices)
Notably, control functionality was moved off of network equipment and into separate servers, which can store all of the routing state and compute all of the routing decisions.
dSDN
A Decentralized SDN Architecture for the WAN
In a typical SDN, the control plane is a centralized controller that runs a traffic engineering algorithm to compute paths (while accounting for capacity). However, running an SDN over a global WAN requires a large amount of "SDN control infrastructure" (controller sites and the management connectivity to reach every router). dSDN's proposal is to decentralize: each router runs its own local SDN control logic, removing the dependency on remote controllers.
Network Functions
APLOMB
Making middleboxes someone else's problem: network processing as a cloud service
Middleboxes sit at the edge of the network and intercept and modify traffic. Examples include firewalls, intrusion detection systems, proxies, load balancers, and WAN optimizers.
There are two main ideas in APLOMB:
- Move from hardware appliances to software appliances
- Take software appliances like middleboxes and run them in the cloud
The former was a very influential idea; the latter less so.
Click
Before Click, it was assumed that routers must be fixed-function devices. Click showed that a router can instead be composed from small, reusable packet-processing elements wired together into a graph.
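A Python sketch of the element-graph idea (Click itself is C++ elements wired together by a declarative configuration language; the elements below are invented miniatures):

```python
class Element:
    """Base element: pass the packet to the next element, if any."""
    def __init__(self):
        self.next = None

    def push(self, pkt):
        if self.next:
            self.next.push(pkt)

class Counter(Element):
    def __init__(self):
        super().__init__()
        self.count = 0

    def push(self, pkt):
        self.count += 1
        super().push(pkt)

class DecTTL(Element):
    def push(self, pkt):
        pkt["ttl"] -= 1
        if pkt["ttl"] > 0:       # drop expired packets
            super().push(pkt)

counter, dec = Counter(), DecTTL()
counter.next = dec               # wire: Counter -> DecTTL
counter.push({"ttl": 5, "dst": "10.0.0.1"})
print(counter.count)  # 1
```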
VFP
VFP: A Virtual Switch Platform for Host SDN in the Public Cloud
VFP is Microsoft's virtual switch platform used to implement network functions in software. Its features include:
- Multiple independent network controllers can program network applications
- Rules apply to stateful connections, not just packets
- Flow policy offloaded to programmable NICs (FPGAs)
Note: Google's philosophy in the 2010s was to run everything on the CPU (e.g. Andromeda).
RMT
Forwarding metamorphosis: fast programmable match-action processing in hardware for SDN
Switching chips are 100x faster at switching than CPUs and 10x faster than network processors. They take advantage of pipelining and parallelism.
Pipelining involves dividing the execution of an instruction into several steps, each of which could run in parallel with other steps. This allows for much higher throughput than a CPU, which typically executes instructions sequentially.
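The throughput arithmetic, with hypothetical stage counts:

```python
S, t_ns = 16, 1.0  # hypothetical: 16 pipeline stages, 1 ns per stage

sequential_rate = 1 / (S * t_ns)  # packets/ns if stages run one after another
pipelined_rate = 1 / t_ns         # packets/ns once the pipeline is full

print(pipelined_rate / sequential_rate)  # 16.0x: one packet completes per stage time
```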
Note: RMT was developed into a commercial product by Barefoot Networks, which was acquired by Intel in 2019. Their commercial product is the Tofino switches.
Hypergiants
Microservices
Microservices are an alternative to monolithic applications: the application is divided into multiple services that communicate over the network through remote procedure calls (RPCs).
Advantages of microservices:
- Independent development
- Independent deployment and scaling
- Fault isolation
Advantages of monoliths:
- Lower network and serialization overhead
- Tight coupling (no partial failures; easier to reason about end-to-end)
In addition to application logic, microservices also need to do:
- Authentication
- Tracing
- Service discovery
- Connection management
Because this code tends to be duplicated across microservices, it is often instead handled automatically by a service mesh.
There are three main styles of service mesh: (1) sidecar proxies, (2) remote proxies, and (3) libraries.
Advantages and disadvantages of sidecars:
- Low communication overhead (only between proxies)
- Don't need to change code
Advantages and disadvantages of remote proxies:
- Cross-region
- Need to change code
- High communication overhead (between service and proxy)
Advantages and disadvantages of libraries:
- Low communication overhead (only between services)
- Need to change code
Wireless
RFocus
RFocus: Beamforming Using Thousands of Passive Antennas
Problem: A transmitter can direct a more powerful signal beam to a receiver by beamforming with antenna arrays. However, there are physical limits on the number of antennas we can fit on a device.
The idea of RFocus is to increase effective antenna count by moving antennas into the environment. The RFocus surface consists of 3,200 passive elements, each of which is configured as either reflective or transparent. The receiver sends measurements of received signal strength to the RFocus controller, which then configures the surface.
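A sketch of a majority-vote style configuration search consistent with the paper's approach (measure_rss is a hypothetical stand-in for the receiver's reports):

```python
import random

N_ELEMENTS, N_PROBES = 3200, 1000

def optimize_surface(measure_rss) -> list[bool]:
    """Probe random on/off configurations, then keep each element in
    whichever state correlated with higher received signal strength."""
    on_rss = [[] for _ in range(N_ELEMENTS)]
    off_rss = [[] for _ in range(N_ELEMENTS)]
    for _ in range(N_PROBES):
        config = [random.random() < 0.5 for _ in range(N_ELEMENTS)]
        rss = measure_rss(config)  # receiver reports one scalar per probe
        for i, on in enumerate(config):
            (on_rss if on else off_rss)[i].append(rss)
    avg = lambda xs: sum(xs) / len(xs) if xs else 0.0
    return [avg(on_rss[i]) > avg(off_rss[i]) for i in range(N_ELEMENTS)]
```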
Cellular Networks
Cellular networks today are divided into:
- Radio Access Network (RAN), which consists of radio towers that connect to user equipment (UE)
- Mobile core, which connects the RAN to the rest of the Internet
When a UE wants to connect to a mobile network, it performs a cell selection procedure. It first tunes to the appropriate frequency. Then it chooses a tower to connect to based on its home operator's Public Land Mobile Network (PLMN) identifier, stored in its Subscriber Identity Module (SIM) card, as well as signal strength.
Once it chooses a tower, it synchronizes with the broadcast control channel and attaches to the network, which authenticates the UE and sets up its connections. Later, when the UE moves around in the physical world, it may need to be handed over to an adjacent tower.
When a UE enters an area where its home operator does not have infrastructure, it attempts to roam by attaching onto an available tower. However, this attachment only succeeds if the visited operator has a roaming agreement with the UE's home operator.
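A sketch of selection with roaming (PLMN IDs, towers, and signal values are invented for illustration):

```python
HOME_PLMN = "310-260"
ROAMING_PARTNERS = {"310-410", "311-480"}  # operators with roaming agreements

towers = [
    {"id": "A", "plmn": "310-260", "signal_dbm": -95},
    {"id": "B", "plmn": "310-410", "signal_dbm": -70},
]

def select_cell(towers: list[dict]) -> dict | None:
    """Prefer the home operator's towers; otherwise roam onto a partner."""
    home = [t for t in towers if t["plmn"] == HOME_PLMN]
    if home:
        return max(home, key=lambda t: t["signal_dbm"])
    visited = [t for t in towers if t["plmn"] in ROAMING_PARTNERS]
    return max(visited, key=lambda t: t["signal_dbm"]) if visited else None

print(select_cell(towers)["id"])  # "A": home network wins despite weaker signal
```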
There are multiple generations of cellular networks, notably 4G (LTE) and 5G.
RinP
Problem: Cellular bandwidth demand continues to grow exponentially. Cellular operators have traditionally met this demand by deploying more base stations in an area (densification). Users have typically enjoyed data rates that double every two years, a scaling trend called Cooper's law. However, denser deployments will lead to increased interference, eventually leading to the end of Cooper's law. The problem remains: how can we continue scaling bandwidth in high-demand areas?
The authors of this paper propose to have users connect to operators in overlapping coverage areas in order to dynamically share capacity across operators, known as roaming-in-place (RinP).