Show HN: Blitzping – A far faster nping/hping3 SYN-flood alternative with CIDR

44 points

2 years ago

I found hping3 and nmap's nping to be far too slow at sending individual, bare-minimum (40-byte) TCP SYN packets; besides their inefficient socket I/O, they also did far too much unnecessary processing in what should have been a tight execution loop. Furthermore, neither of them could handle CIDR notation (i.e., a range of IP addresses) as the source IP parameter. Being intended for embedded devices (e.g., low-power MIPS/Arm-based routers), Blitzping only depends on standard POSIX headers and a C11 libc (whether musl or GNU). Even while supporting CIDR prefixes, Blitzping is significantly faster than hping3, nping, and whatever else was hosted on GitHub.

Here are some of the performance optimizations specifically done on Blitzping:

* Pre-Generation : All the static parts of the packet buffer get generated once, outside of the sendto() tight loop;

* Asynchronous : Configuring raw sockets to be non-blocking by default;

* Multithreading : Polling the same socket in sendto() from multiple threads; and

* Compiler Flags : Compiling with -Ofast, -flto, and -march=native (though these actually had little effect; by this point, the bottleneck is the kernel's own sendto() routine).
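The pre-generation idea can be sketched roughly as follows. This is not Blitzping's actual code: the helper names (put16, sum16, fold, build_syn_template, patch_source), the TTL and window values, and the decision to recompute the whole TCP checksum per packet are all assumptions made for illustration. Socket setup is left out because raw sockets need root; the IP header checksum is left at zero on the assumption that the kernel fills it in, as Linux's raw(7) documents for IP_HDRINCL.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum { IP_HDR_LEN = 20, TCP_HDR_LEN = 20, PKT_LEN = IP_HDR_LEN + TCP_HDR_LEN };

/* Write 16- and 32-bit values in network (big-endian) byte order. */
static void put16(uint8_t *p, uint16_t v) { p[0] = (uint8_t)(v >> 8); p[1] = (uint8_t)(v & 0xFF); }
static void put32(uint8_t *p, uint32_t v) { put16(p, (uint16_t)(v >> 16)); put16(p + 2, (uint16_t)(v & 0xFFFF)); }

/* Add the big-endian 16-bit words of an even-length buffer to acc (RFC 1071). */
static uint32_t sum16(const uint8_t *p, size_t len, uint32_t acc)
{
    assert(len % 2 == 0);
    for (size_t i = 0; i < len; i += 2)
        acc += ((uint32_t)p[i] << 8) | p[i + 1];
    return acc;
}

/* Fold carries back into the low 16 bits (one's-complement addition). */
static uint16_t fold(uint32_t acc)
{
    while (acc >> 16)
        acc = (acc & 0xFFFF) + (acc >> 16);
    return (uint16_t)acc;
}

/* Done once, before the send loop: fill in every field that never changes. */
static void build_syn_template(uint8_t pkt[PKT_LEN], uint32_t dst_ip, uint16_t dst_port)
{
    uint8_t *tcp = pkt + IP_HDR_LEN;
    pkt[0] = 0x45;             /* IPv4, header length 5 x 4 bytes */
    pkt[1] = 0;                /* type of service */
    put16(pkt + 2, PKT_LEN);   /* total length */
    put16(pkt + 4, 0);         /* identification */
    put16(pkt + 6, 0);         /* flags and fragment offset */
    pkt[8] = 64;               /* TTL (arbitrary choice) */
    pkt[9] = 6;                /* protocol: TCP */
    put16(pkt + 10, 0);        /* IP checksum: left zero, assumed kernel-filled */
    put32(pkt + 12, 0);        /* source IP: patched per packet */
    put32(pkt + 16, dst_ip);
    put16(tcp + 0, 0);         /* source port: patched per packet */
    put16(tcp + 2, dst_port);
    put32(tcp + 4, 0);         /* sequence number */
    put32(tcp + 8, 0);         /* acknowledgment number */
    tcp[12] = 5 << 4;          /* data offset: 5 x 4 bytes, no options */
    tcp[13] = 0x02;            /* flags: SYN */
    put16(tcp + 14, 65535);    /* window (arbitrary choice) */
    put16(tcp + 16, 0);        /* TCP checksum: patched per packet */
    put16(tcp + 18, 0);        /* urgent pointer */
}

/* Done inside the loop: touch only the fields that vary, then redo the checksum. */
static void patch_source(uint8_t pkt[PKT_LEN], uint32_t src_ip, uint16_t src_port)
{
    uint8_t *tcp = pkt + IP_HDR_LEN;
    put32(pkt + 12, src_ip);
    put16(tcp + 0, src_port);
    put16(tcp + 16, 0);
    /* Pseudo-header: source and destination IP, protocol (6), TCP length (20). */
    uint32_t acc = sum16(pkt + 12, 8, 6 + TCP_HDR_LEN);
    acc = sum16(tcp, TCP_HDR_LEN, acc);
    put16(tcp + 16, (uint16_t)~fold(acc));
}
```

In a real sender, build_syn_template() would run once and patch_source() would run immediately before each sendto() call, so the loop body writes 8 bytes and sums 28 instead of rebuilding all 40. Addresses here are plain host-order integers (e.g., 0xC0000201 for 192.0.2.1).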

Shown below are comparisons between the three tools across two CPUs (more details at the GitHub repository):

  #      Quad-Core "Rockchip RK3328" CPU @ 1.3 GHz. (ARMv8-A)        #
  +--------------------+--------------+--------------+---------------+
  | ARM (4 x 1.3 GHz)  | nping        | hping3       | Blitzping     |
  +--------------------+--------------+--------------+---------------+
  | Num. Instances     | 4 (1 thread) | 4 (1 thread) | 1 (4 threads) |
  | Pkts. per Second   | ~65,000      | ~80,000      | ~275,000      |
  | Bandwidth (MiB/s)  | ~2.50        | ~3.00        | ~10.50        |
  +--------------------+--------------+--------------+---------------+

  # Single-Core "Qualcomm Atheros QCA9533" SoC @ 650 MHz. (MIPS32r2) #
  +--------------------+--------------+--------------+---------------+
  | MIPS (1 x 650 MHz) | nping        | hping3       | Blitzping     |
  +--------------------+--------------+--------------+---------------+
  | Num. Instances     | 1 (1 thread) | 1 (1 thread) | 1 (1 thread)  |
  | Pkts. per Second   | ~5,000       | ~10,000      | ~25,000       |
  | Bandwidth (MiB/s)  | ~0.20        | ~0.40        | ~1.00         |
  +--------------------+--------------+--------------+---------------+

I tested Blitzping against both hping3 and nping on two different routers, both running OpenWRT 23.05.03 (Linux Kernel v5.15.150) with the "masquerading" option (i.e., NAT) turned off in the firewall; one device was a single-core 32-bit MIPS SoC, and the other was a 64-bit quad-core ARMv8 CPU. On the quad-core CPU, because both hping3 and nping were designed without multithreading capabilities (unlike Blitzping), I made the competition "fairer" by launching each of them as four individual processes, as opposed to Blitzping's single process. Across all runs and on both devices, CPU usage remained at 100%, entirely dedicated to the currently running program. Finally, the connection speed itself was not a bottleneck: both devices were connected to an otherwise-unused 200 Mb/s (23.8419 MiB/s) download/upload line through a WAN Ethernet interface.
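As a rough sanity check, the bandwidth rows in the tables follow from the packet rates if every packet is counted as exactly 40 bytes (IP plus TCP header, ignoring Ethernet framing; that counting rule is my assumption), and the same arithmetic gives the line-rate conversion. The function names below are made up for this sketch.

```c
#include <assert.h>

/* Packets per second times bytes per packet, expressed in MiB/s (1 MiB = 1024 * 1024 bytes). */
static double pps_to_mib_per_s(double pps, double pkt_bytes)
{
    assert(pps >= 0.0 && pkt_bytes > 0.0);
    return pps * pkt_bytes / (1024.0 * 1024.0);
}

/* Line rate in megabits per second (10^6 bits) expressed in MiB/s. */
static double mbit_to_mib_per_s(double mbit_per_s)
{
    assert(mbit_per_s >= 0.0);
    return mbit_per_s * 1e6 / 8.0 / (1024.0 * 1024.0);
}
```

For instance, pps_to_mib_per_s(275000, 40) comes out just under 10.5 MiB/s, and mbit_to_mib_per_s(200) is about 23.84 MiB/s, so even the fastest run used less than half of the line.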

It is important to note that Blitzping was not doing any less than hping3 and nping; in fact, it was doing more. While hping3 and nping only randomized the source IP and port of each packet to a fixed address, Blitzping randomized not only the source port but also the IP within a CIDR range, a capability that is more computationally intensive and a feature that both hping3 and nping lacked in the first place. Lastly, hping3 and nping were both launched with the "best-case" command-line parameters so as to maximize their speed and disable runtime stdio logging.
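To illustrate what picking a source address inside a CIDR range involves, here is a minimal sketch. It is an assumption about one way to do it, not a description of Blitzping's internals: the xorshift32 generator, the function names, and the host-byte-order representation are all choices made for this example.

```c
#include <assert.h>
#include <stdint.h>

/* Small xorshift32 PRNG: cheap enough for a hot loop, not cryptographic.
   The state must be seeded with a nonzero value. */
static uint32_t xorshift32(uint32_t *state)
{
    uint32_t x = *state;
    x ^= x << 13;
    x ^= x >> 17;
    x ^= x << 5;
    return *state = x;
}

/* Mask of the host bits for a prefix length: /24 gives 0x000000FF, /32 gives 0. */
static uint32_t host_mask(unsigned prefix_len)
{
    assert(prefix_len <= 32);
    /* Shifting a 32-bit value by 32 is undefined in C, so handle /32 separately. */
    return prefix_len == 32 ? 0 : (UINT32_C(0xFFFFFFFF) >> prefix_len);
}

/* Keep the network bits of base and randomize the host bits.
   Addresses are in host byte order; convert with htonl() before use on the wire. */
static uint32_t random_ip_in_cidr(uint32_t base, unsigned prefix_len, uint32_t *rng)
{
    uint32_t mask = host_mask(prefix_len);
    return (base & ~mask) | (xorshift32(rng) & mask);
}
```

With base 0xC0A80000 (192.168.0.0) and prefix length 16, every result keeps the upper 16 bits 0xC0A8 and varies only in the lower 16; a /32 always returns the base address unchanged. The extra per-packet cost is one PRNG step plus two bitwise operations.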

11 comments
