IPCALC

Section: User Contributed Perl Documentation (1)
Updated: 2019-04-21

NAME

ipcalc - iptables/network memory calculator

VERSION

$Id: bin/ipcalc, 0.2 2019/04/21 20:47:51 acorliss Exp $  

USAGE

    ipcalc [-hvV] {mode args}

 

DESCRIPTION

This is a simple and possibly very misleading memory calculator which can tell you the impact of network buffering and connection tracking under specific circumstances. It is meant to provide worst case scenarios with which one can tune the kernel or determine RAM requirements.  

BACKGROUND

WARNING: This information is likely to be outdated or flat-out wrong, especially with the rapid evolution of the Linux kernel. But, what the hell, live a little. :-)  

FORMULAS FOR CONNECTION TRACKING

Connection tracking is critical for stateful firewalling. In order for the kernel to quickly check for the existence of an established session, it uses a hash-and-bucket architecture. In this architecture an algorithm determines which of a predetermined number of buckets a given connection might be found in.

The result of the algorithm becomes the key by which the bucket's address can be retrieved. The ``bucket'' is in reality just a linked list; the hash table itself is an array of list heads, which is again of a predetermined size.

As might be evident from this description, Linux attempts to tread the middle road between fast retrieval of any given connection's state and minimizing the memory impact of fully hashing every individual connection. The practical reality of current desktops and servers is that most machines can easily handle fully hashing every connection, as we'll see from the following formulas.
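
In miniature, the architecture looks something like the following toy sketch (illustrative only; the kernel's actual hash function and connection tuple are far more involved):

  #!/usr/bin/perl
  # Toy hash-and-bucket session lookup -- illustrative only; the
  # kernel's real hash and tuple layout are more involved
  use strict;
  use warnings;

  my $hash_size = 16384;                      # predetermined bucket count
  my @buckets   = map { [] } 1 .. $hash_size;

  sub bucket_for {                            # toy hash of a connection tuple
      my $key = join ':', @_;
      my $h   = 0;
      $h = ($h * 33 + ord) % $hash_size for split //, $key;
      return $h;
  }

  # track a connection, then check for it on the next packet
  my @conn = ('10.0.0.1', 12345, '192.168.1.1', 80);
  push @{ $buckets[ bucket_for(@conn) ] }, join ':', @conn;
  my $hit = grep { $_ eq join ':', @conn }
                 @{ $buckets[ bucket_for(@conn) ] };
  print $hit ? "session found\n" : "no such session\n";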

To begin, the memory impact of connection tracking is calculated via:

  size_of_mem_used (in bytes) = 
        max_connections * sizeof(struct ip_conntrack) +
        hash_size * sizeof(struct list_head)

By default max_connections is calculated by:

  max_connections = ram_size(B) / (arch_pointer_size / 32)

The Linux kernel artificially caps this at 65536 (64K) on machines with >= 1GB of RAM.

The struct sizes are:

  sizeof(struct list_head) = 2 * arch_pointer_size
  sizeof(struct ip_conntrack) = ~ 350 bytes

Don't ask where I came up with the size of ip_conntrack. It has changed over kernel versions and architectures, so this is a hopefully future-proof, reasonable ballpark. But probably not.

Anyway, the current default hash_size is 16K w/1GB of RAM. That means on a 1GB RAM machine you have 16K buckets, each containing four entries, to track 64K concurrent sessions. Using the above formula on a 32-bit architecture requires roughly 22MB of RAM. Not really a big deal, eh?

Say, however, that you have a very busy server/router/NAT box. You might want to be able to support at least 128K concurrent connections, fully hash-indexed (i.e., one entry per bucket) for maximum performance. Even that would only require roughly 45MB.
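
Both of those figures fall straight out of the memory formula above. A quick Perl sketch (using this page's ballpark struct sizes on a 32-bit architecture):

  #!/usr/bin/perl
  # Conntrack memory estimates per the formula above; struct sizes
  # are this page's ballpark figures, not authoritative kernel values
  use strict;
  use warnings;

  my $ptr          = 4;           # pointer size in bytes (32-bit)
  my $list_head    = 2 * $ptr;    # sizeof(struct list_head)
  my $ip_conntrack = 350;         # ~sizeof(struct ip_conntrack)

  sub conntrack_mem {
      my ($max_connections, $hash_size) = @_;
      return $max_connections * $ip_conntrack + $hash_size * $list_head;
  }

  # kernel defaults w/1GB RAM: 64K sessions, 16K buckets
  printf "%.1f MB\n", conntrack_mem(65536, 16384) / 2**20;    # ~22.0

  # busy box: 128K sessions, fully hash-indexed (one per bucket)
  printf "%.1f MB\n", conntrack_mem(131072, 131072) / 2**20;  # ~44.8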

Once you've decided how many buckets and concurrent sessions to support, you can configure your kernel. Maximum connections can be adjusted dynamically via sysctl (in /etc/sysctl.conf) or in /proc/sys:

  # Handle 128K sessions
  net.ipv4.netfilter.ip_conntrack_max = 131072

The size of the hash is a different story, unfortunately. That number can only be set at code load time. If you have connection tracking compiled as a module you can put the following into /etc/modprobe.conf or /etc/modprobe.d/:

  options nf_conntrack hashsize=131072

 

FORMULAS FOR TCP BUFFERING

TCP buffer tuning is a much more nebulous subject than connection tracking. With the latter, at least, one knows there's a hard limit to the amount of memory that will be consumed, even if my numbers are provably false. If the maximum number of connections is exceeded, an older session is expunged to make room for the new connection using a standard LRU algorithm. In that way your memory requirements will never grow beyond that ceiling.

With TCP, however, it's an entirely different ball game. There's no hard limit on the number of TCP sessions you can support; all you can do is estimate the per-connection impact based on normal/expected latency and throughput. That, in essence, makes this calculation mode merely a modeling tool.

In a nutshell, your required buffer size is determined by:

  buffer_size (in bytes) = throughput (bytes/sec) * latency (RTT in sec)

Let's assume a best-case scenario for a server connected to its clients over a 1Gbps backbone with <= 10ms latency. You'd need 1.28MB of buffering to saturate your link with a single client.

Things completely change when you assume a number of concurrent connections over a slower, higher-latency link, as might be the case for a web server connected to the Internet over a T1. At a 200ms RTT, a single connection saturating the link would require 37.50KB of buffering.
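
Both numbers can be reproduced with the formula above (treating 1Gbps as 2**30 bits/sec, which is evidently how the 1.28MB figure was derived):

  #!/usr/bin/perl
  # Bandwidth-delay products for the two examples above
  use strict;
  use warnings;

  sub bdp {                           # buffer bytes = rate * RTT
      my ($bits_per_sec, $rtt_sec) = @_;
      return $bits_per_sec / 8 * $rtt_sec;
  }

  # 1Gbps (taken as 2**30 bits/sec) at 10ms RTT
  printf "%.2f MB\n", bdp(2**30, 0.010) / 2**20;      # 1.28 MB

  # T1 payload rate (1.536Mbps) at 200ms RTT
  printf "%.2f KB\n", bdp(1_536_000, 0.200) / 1024;   # 37.50 KB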

Of course, this doesn't mean that's all you need, since latency will likely vary from connection to connection, sometimes better, sometimes worse. The trick is to pick a likely average latency and use that to determine the amount of RAM required to buffer all of that traffic in transit. Assume full link saturation as a worst-case scenario (even if your local port is hooked up to a 1Gbps switch which is subsequently connected to a router with a T1 interface handling the bulk of your traffic).
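
For a rough aggregate, multiply the per-connection figure by the expected session count. For example (the connection count here is purely hypothetical, for illustration):

  #!/usr/bin/perl
  # Aggregate buffer RAM for N concurrent sessions; the session count
  # is a hypothetical load, not a figure from this page
  use strict;
  use warnings;

  my $sessions = 1000;       # assumed concurrent connections
  my $per_sess = 38_400;     # 37.5KB each (the T1/200ms figure above)
  printf "%.1f MB\n", $sessions * $per_sess / 2**20;   # ~36.6 MB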

The number you get from that should give you an idea of how much RAM you should have available in addition to any application load on the box. You can also tell the kernel to explicitly allow that amount of buffering via sysctl (/etc/sysctl.conf) or in /proc/sys:

  # rmem_max - maximum buffer size of read buffers
  net.core.rmem_max = 131072

  # wmem_max - maximum buffer size of write buffers
  net.core.wmem_max = 131072

  # tcp_rmem - vector of 3 INTEGERs: min, default, max
  # NOTE:  core.rmem_max overrides max here, if smaller
  net.ipv4.tcp_rmem =  4096 87040 131072

  # tcp_wmem - vector of 3 INTEGERs: min, default, max
  net.ipv4.tcp_wmem =  4096 87040 131072

 

MISCELLANEOUS NETWORK TUNING

The following are a few parameters that can be tuned via sysctl to improve network performance. For a complete list please see /usr/src/linux/Documentation/networking/ip-sysctl.txt.

  # Tolerate greater out-of-order packets (default: 3)
  net.ipv4.tcp_reordering = 5

  # Normal queue size for Fast Eth (set to 2500 for Gig Eth)
  net.core.netdev_max_backlog = 1000

  # Enable autotuning
  net.ipv4.tcp_moderate_rcvbuf = 1

  # Perform better RTT calculations for autotuning
  net.ipv4.tcp_timestamps = 1

  # Enable TCP window scaling
  net.ipv4.tcp_window_scaling = 1

  # Enable TCP selective acknowledgement
  net.ipv4.tcp_sack = 1
  net.ipv4.tcp_dsack = 0
  net.ipv4.tcp_fack = 0

  # Better congestion control
  net.ipv4.tcp_congestion_control = cubic

From the command line one can also adjust a few more options:

  # Disable some buggy TSO engines
  ethtool -K eth0 tso off

  # Expand the hardware txqueue length for Gig Eth
  ifconfig eth0 txqueuelen 2000

 

REQUIRED ARGUMENTS

At least one of the conntrack or tcp mode arguments is required.

OPTIONS

Options can be split into two calculation modes: connection tracking and TCP buffering.  

CONNECTION TRACKING

    --conntrack-current             Uses current settings
    --conntrack-default             Uses kernel default settings
    --conntrack-max {integer}       Max number of sessions to track
    --conntrack-hashsize {integer}  Number of buckets

Note: max/hashsize start with the current values unless specifically overridden or the kernel defaults are requested. Arguments are processed in order, so if the defaults are requested after any specific values have been given, the defaults are what get used.

TCP BUFFERING

    --tcp-current                   Uses current buffer settings
    --tcp-throughput {bps}          Max sustained throughput
    --tcp-latency {msec}            Latency expected for RTT
    --tcp-buffer {bytes}            Buffer size
    --tcp-connections {integer}     Number of concurrent sessions

Note: both buffer and throughput numbers can be specified with kb/mb/gb (or bps variants) suffixes.

Note: you can only specify either throughput (from which the required buffer size will be calculated) or buffer size (from which the attainable throughput will be calculated).
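
Both directions are the same bandwidth-delay identity solved for a different term; schematically (variable names are mine):

  # buffer (bytes)   = throughput (bps) / 8 * rtt (sec)
  # throughput (bps) = buffer (bytes) * 8 / rtt (sec)
  my $rtt    = 0.200;                  # assumed 200ms RTT
  my $buffer = 1_536_000 / 8 * $rtt;   # from throughput: 38400 bytes
  my $bps    = $buffer * 8 / $rtt;     # and back: 1536000 bps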

MISCELLANEOUS

    -D  --debug                     Verbosity level (for debugging)
    -h  --help                      Display help text
    -V  --version                   Display version

 

DIAGNOSTICS

None.  

EXIT STATUS

Returns 1 if there are any errors with either the execution environment or the requested operations. Returns a non-zero value if exiting due to a signal. Otherwise, returns 0.

CONFIGURATION

N/A  

BUGS AND LIMITATIONS

If you haven't been warned enough by the rest of this man page, you're not paying attention.  

AUTHOR

Arthur Corliss (corliss@digitalmages.com)  

LICENSE AND COPYRIGHT

This software is licensed under the same terms as Perl itself. Please see http://dev.perl.org/licenses/ for more information.

(c) 2007, Arthur Corliss (corliss@digitalmages.com)


 
