jwbensley / etheratemt
A multi-threaded network load generator/sinker
License: MIT License
Finish adding the -x option (previously -t, but that might be confused with Tx mode).
Due to being a massive dunce, the error handling for the various Tx and Rx syscalls being used is wrong.
This was highlighted in closed issue #19.
For example, sendmsg() below dies after one second: as soon as EtherateMT hits line rate, ENOBUFS is encountered, which is fine, and there is no need to die/quit:
$ sudo taskset -c 3 ./build/etherate_mt -i enp1s0f0 -v -p2
Using inteface enp1s0f0 (6).
Verbose output enabled.
Setting interface promiscuous mode
Frame size set to 1514 bytes.
Using raw packet socket with sendmsg()/recvmsg().
Running in Tx mode.
Main thread pid is 11948.
Worker thread 11950 started
0. Rx: 0.00 Gbps (0 fps) 0 Drops 0 Q-Freeze Tx: 9.92 Gbps (818833 fps) Err: 0
11950:Socket Tx error (105: No buffer space available)
Worker thread 11950 returned 1
Removing interface promiscuous mode
This applies to all syscall methods being used.
Sockets aren't properly closed yet (that has to wait until threads are properly terminated), and the same applies to removing promisc mode from the interface.
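A minimal sketch of the fix, assuming a raw socket in blocking mode: ENOBUFS (and EAGAIN/EINTR) is treated as back-pressure to retry and count, and only other errors are fatal. The helper names tx_errno_is_transient() and send_retry() are hypothetical, not EtherateMT functions.

```c
#include <errno.h>
#include <stdbool.h>
#include <sys/socket.h>
#include <sys/types.h>

/* True if a Tx errno is transient and the send should simply be retried.
 * ENOBUFS is expected once the interface queue is full at line rate. */
bool tx_errno_is_transient(int err)
{
    switch (err) {
    case ENOBUFS:
    case EAGAIN:
#if EWOULDBLOCK != EAGAIN
    case EWOULDBLOCK:
#endif
    case EINTR:
        return true;
    default:
        return false;
    }
}

/* Send one message, retrying on transient errors and counting the retries.
 * Returns bytes sent, or -1 (errno preserved) on a fatal error. */
ssize_t send_retry(int sock, const struct msghdr *msg, unsigned long *retries)
{
    for (;;) {
        ssize_t rc = sendmsg(sock, msg, 0);
        if (rc >= 0)
            return rc;
        if (!tx_errno_is_transient(errno))
            return -1;      /* e.g. ENETDOWN, EBADF: genuinely fatal */
        (*retries)++;       /* back-pressure: just try again */
    }
}
```

The same classification could be reused for the send(), sendmmsg() and PACKET_MMAP Tx paths, since ENOBUFS there also signals a full queue rather than a broken socket.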
Move the thread setup out of main() into separate files.
When an invalid CLI argument is passed to EtherateMT, it quits without freeing its allocations, and Valgrind reports lost memory records.
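One way to avoid the leak is a single cleanup path that every error branch jumps to, instead of calling exit() from inside the argument parser. This is a sketch only: struct cli_opts, cli_parse(), cli_opts_free() and FRAME_SZ_MAX are hypothetical, and only the -i option is modelled.

```c
#include <stdlib.h>
#include <string.h>

#define FRAME_SZ_MAX 10000 /* hypothetical Tx buffer size */

/* Hypothetical option set; EtherateMT's real structures differ. */
struct cli_opts {
    const char *if_name;      /* points into argv, not owned */
    unsigned char *tx_buffer; /* heap allocated, owned */
};

void cli_opts_free(struct cli_opts *o)
{
    if (o == NULL)
        return;
    free(o->tx_buffer);
    free(o);
}

/* Accepts only "-i <name>". On any invalid argument every allocation made so
 * far is released before returning, so Valgrind reports no lost blocks. */
int cli_parse(int argc, char *argv[], struct cli_opts **out)
{
    struct cli_opts *o = calloc(1, sizeof(*o));
    if (o == NULL)
        return -1;

    o->tx_buffer = calloc(FRAME_SZ_MAX, 1);
    if (o->tx_buffer == NULL)
        goto fail;

    for (int i = 1; i < argc; i++) {
        if (strcmp(argv[i], "-i") == 0 && i + 1 < argc)
            o->if_name = argv[++i];
        else
            goto fail; /* invalid argument: unwind instead of calling exit() */
    }

    *out = o;
    return 0;

fail:
    cli_opts_free(o);
    return -1;
}
```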
The rand() function called on line 133 will always return the same sequence, as the pseudorandom generator is never seeded. You need to insert
srand((unsigned) time(NULL));
before line 131.
Alternatively, include <sys/random.h> and replace lines 131-134 with:
getrandom(eth.frm_opt.tx_buffer, eth.frm_opt.frame_sz, 0);
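getrandom() can return fewer bytes than requested (for example when a large request is interrupted by a signal), so a more defensive version of the replacement loops until the buffer is full. This sketch assumes glibc 2.25 or newer for <sys/random.h>; fill_random() is a hypothetical name.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/random.h>
#include <sys/types.h>

/* Fill buf with len random bytes. Returns 0 on success, -1 on error. */
int fill_random(unsigned char *buf, size_t len)
{
    size_t done = 0;

    while (done < len) {
        ssize_t rc = getrandom(buf + done, len - done, 0);
        if (rc < 0) {
            if (errno == EINTR)
                continue;       /* interrupted before any bytes: retry */
            return -1;
        }
        done += (size_t)rc;     /* short read: keep going */
    }
    return 0;
}
```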
Separate TPACKET v2 and v3 code so they can be used/tested separately depending on kernel version.
Implement the following compile options for the "strict" build and correct any errors thrown by these extra compile options:
-Wjump-misses-init
Warn if a "goto" statement or a "switch" statement jumps forward across the
initialization of a variable, or jumps backward to a label after the
variable has been initialized.
-Wlogical-op
Warn about suspicious uses of logical operators in expressions. This
includes using logical operators in contexts where a bit-wise operator is
likely to be expected.
-Wshadow
Warn whenever a local variable or type declaration shadows another variable,
parameter, type, class member (in C++), or instance variable (in
Objective-C) or whenever a built-in function is shadowed.
-Wformat=2
-Wformat=1 checks calls to "printf" and "scanf", etc., to make sure that the
arguments supplied have types appropriate to the format string specified,
and that the conversions specified in the format string make sense. This
includes standard functions, and others specified by format attributes, in
the "printf", "scanf", "strftime" and "strfmon" families. -Wformat=2 enables
-Wformat=1 plus additional format checks, currently equivalent to -Wformat
-Wformat-nonliteral -Wformat-security -Wformat-y2k.
-Wformat-signedness
If -Wformat is specified, also warn if the format string requires an
unsigned argument and the argument is signed and vice versa.
-Wextra
This enables some extra warning flags that are not enabled by -Wall.
-Wclobbered -Wempty-body -Wignored-qualifiers -Wmissing-field-initializers
-Wmissing-parameter-type (C only) -Wold-style-declaration (C only)
-Woverride-init -Wsign-compare -Wtype-limits -Wuninitialized
-Wunused-parameter (only with -Wunused or -Wall) -Wunused-but-set-parameter
(only with -Wunused or -Wall).
-Wdouble-promotion
Give a warning when a value of type "float" is implicitly promoted to
"double". CPUs with a 32-bit "single-precision" floating-point unit
implement "float" in hardware, but emulate "double" in software. On such a
machine, doing computations using "double" values is much more expensive
because of the overhead required for software emulation.
-Winit-self
Warn about uninitialized variables that are initialized with themselves.
Note this option can only be used with the -Wuninitialized option.
-Wuninitialized is included in -Wextra.
-Wtrampolines
Warn about trampolines generated for pointers to nested functions. A
trampoline is a small piece of data or code that is created at run time on
the stack when the address of a nested function is taken, and is used to
call the nested function indirectly. For some targets, it is made up of data
only and thus requires no special treatment. But, for most targets, it is
made up of code and thus requires the stack to be made executable in order
for the program to work properly.
-Wcast-qual
Warn whenever a pointer is cast so as to remove a type qualifier from the
target type. For example, warn if a "const char *" is cast to an ordinary
"char *". Also warn when making a cast that introduces a type qualifier in
an unsafe way. For example, casting "char **" to "const char **" is unsafe.
-Wcast-align
Warn whenever a pointer is cast such that the required alignment of the
target is increased. For example, warn if a "char *" is cast to an "int *"
on machines where integers can only be accessed at two- or four-byte
boundaries.
-Wwrite-strings
When compiling C, give string constants the type "const char[length]" so
that copying the address of one into a non-"const" "char *" pointer produces
a warning. These warnings help you find at compile time code that can try
to write into a string constant, but only if you have been very careful
about using "const" in declarations and prototypes. Otherwise, it is just a
nuisance. This is why we did not make -Wall request these warnings.
At startup EtherateMT displays
Trying to offload Rx timestamps to hardware...
There is no confirmation that this worked, even when it has.
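A sketch of reporting the outcome, assuming the Rx timestamps are requested on the packet socket with PACKET_TIMESTAMP and SOF_TIMESTAMPING_RAW_HARDWARE (EtherateMT's actual call may differ). The option is read back with getsockopt() and the result is printed; enable_hw_rx_timestamps() is a hypothetical name.

```c
#include <linux/if_packet.h>
#include <linux/net_tstamp.h>
#include <stdio.h>
#include <sys/socket.h>

/* Request raw hardware Rx timestamps, then read the option back so the
 * result can be reported instead of assumed. Returns 0 if confirmed. */
int enable_hw_rx_timestamps(int sock)
{
    int req = SOF_TIMESTAMPING_RAW_HARDWARE;
    int got = 0;
    socklen_t len = sizeof(got);

    printf("Trying to offload Rx timestamps to hardware...\n");
    if (setsockopt(sock, SOL_PACKET, PACKET_TIMESTAMP, &req, sizeof(req)) != 0) {
        perror("PACKET_TIMESTAMP set failed");
        return -1;
    }
    if (getsockopt(sock, SOL_PACKET, PACKET_TIMESTAMP, &got, &len) != 0 ||
        (got & SOF_TIMESTAMPING_RAW_HARDWARE) == 0) {
        printf("Rx hardware timestamping could not be confirmed\n");
        return -1;
    }
    printf("Rx hardware timestamping enabled on the socket\n");
    return 0;
}
```

Reading the option back only confirms the socket setting. Whether the NIC really stamps frames usually also depends on hardware timestamping being enabled on the device (SIOCSHWTSTAMP), so seeing TP_STATUS_TS_RAW_HARDWARE in the tp_status of received ring frames would be a stronger confirmation.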
Implement recvmsg/sendmsg as a separate test mode.
The stats thread starts before the worker threads so it starts printing zero stats before the workers are ready.
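A sketch of one fix using a POSIX barrier: the stats thread can still be created first, but it waits until every worker has finished its setup. worker(), stats(), run_threads() and NUM_WORKERS are hypothetical stand-ins for EtherateMT's real thread functions.

```c
#define _POSIX_C_SOURCE 200809L
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

#define NUM_WORKERS 4 /* hypothetical worker count */

static pthread_barrier_t ready_barrier;
static atomic_int workers_ready;
static int stats_saw_ready = -1;

static void *worker(void *arg)
{
    (void)arg;
    /* ... per-thread socket/ring setup would happen here ... */
    atomic_fetch_add(&workers_ready, 1);
    pthread_barrier_wait(&ready_barrier); /* block until every thread is set up */
    /* ... the Tx/Rx loop would start here ... */
    return NULL;
}

static void *stats(void *arg)
{
    (void)arg;
    pthread_barrier_wait(&ready_barrier); /* never print before workers are ready */
    stats_saw_ready = atomic_load(&workers_ready);
    /* ... the once-per-second print loop would start here ... */
    return NULL;
}

/* The barrier is sized for all workers plus the stats thread, so creation
 * order no longer matters. Error checks on pthread_create() omitted. */
int run_threads(void)
{
    pthread_t stats_thd, workers[NUM_WORKERS];

    if (pthread_barrier_init(&ready_barrier, NULL, NUM_WORKERS + 1) != 0)
        return -1;
    pthread_create(&stats_thd, NULL, stats, NULL);
    for (int i = 0; i < NUM_WORKERS; i++)
        pthread_create(&workers[i], NULL, worker, NULL);

    for (int i = 0; i < NUM_WORKERS; i++)
        pthread_join(workers[i], NULL);
    pthread_join(stats_thd, NULL);
    pthread_barrier_destroy(&ready_barrier);
    return stats_saw_ready; /* how many workers were ready when stats started */
}
```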
It seems to be only receiving packets in batches of 256 (the default ring size).
When printing that PACKET_MMAP is being used the version number should be included.
When loading a custom frame payload from a file, the return value of the fclose() call isn't being checked.
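A sketch of loading the payload with the fclose() result checked alongside ferror(); load_payload() and its parameters are hypothetical.

```c
#include <stdio.h>

/* Read up to cap bytes of a custom payload into buf and report the length.
 * Returns 0 on success, -1 on any error, including a failing fclose(). */
int load_payload(const char *path, unsigned char *buf, size_t cap, size_t *len)
{
    FILE *fp = fopen(path, "rb");
    if (fp == NULL) {
        perror("fopen");
        return -1;
    }

    *len = fread(buf, 1, cap, fp);
    int read_err = ferror(fp);

    if (fclose(fp) != 0) {  /* the currently unchecked return value */
        perror("fclose");
        return -1;
    }
    if (read_err) {
        fprintf(stderr, "Error reading payload file %s\n", path);
        return -1;
    }
    return 0;
}
```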
EtherateMT doesn't check whether the interface is physically up and will send traffic out of an interface which is down.
This is an interface that is disabled with sudo ip link set down dev eth2:
bensley@htpc-ubuntu:~/c/EtherateMT$ sudo ./etherate_mt -i eth2
Using inteface eth2 (4).
Setting interface promiscuous mode
Frame size set to 1514 bytes.
Using raw packet socket with send()/read().
Running in Tx mode.
17927:Socket Tx error (Network is down)
Removing interface promiscuous mode
This is an interface which is up (sudo ip link set up dev eth2) but the cable is disconnected:
bensley@htpc-ubuntu:~/c/EtherateMT$ sudo ./etherate_mt -i eth2
Using inteface eth2 (4).
Setting interface promiscuous mode
Frame size set to 1514 bytes.
Using raw packet socket with send()/read().
Running in Tx mode.
0. Rx: 0.00 Gbps (0 fps) Tx: 22.66 Gbps (1871223 fps)
1. Rx: 0.00 Gbps (0 fps) Tx: 22.90 Gbps (1890306 fps)
2. Rx: 0.00 Gbps (0 fps) Tx: 22.89 Gbps (1889481 fps)
^CQuitting...
Removing interface promiscuous mode
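The second run shows that an interface which is up but has no carrier still appears to accept traffic. A sketch of a pre-flight check using SIOCGIFFLAGS, assuming IFF_UP for the administrative state and IFF_RUNNING for the operational state (on Linux, IFF_RUNNING is normally cleared when there is no carrier); check_link() is a hypothetical name.

```c
#define _DEFAULT_SOURCE
#include <net/if.h>
#include <stdio.h>
#include <string.h>
#include <sys/ioctl.h>
#include <sys/socket.h>
#include <unistd.h>

/* Returns 1 if the interface is up and running, 0 if it is down or has no
 * carrier, and -1 if the interface cannot be queried. */
int check_link(const char *if_name)
{
    struct ifreq ifr;
    int sock = socket(AF_INET, SOCK_DGRAM, 0); /* any socket works for this ioctl */
    if (sock < 0)
        return -1;

    memset(&ifr, 0, sizeof(ifr));
    strncpy(ifr.ifr_name, if_name, IFNAMSIZ - 1);

    int rc = ioctl(sock, SIOCGIFFLAGS, &ifr);
    close(sock);
    if (rc < 0)
        return -1;

    if (!(ifr.ifr_flags & IFF_UP)) {
        fprintf(stderr, "Interface %s is administratively down\n", if_name);
        return 0;
    }
    if (!(ifr.ifr_flags & IFF_RUNNING)) {
        fprintf(stderr, "Interface %s is not running (no carrier?)\n", if_name);
        return 0;
    }
    return 1;
}
```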
Currently a mixture of %d and PRIu16/32/64 is used throughout. Standardise on the PRIuN option.
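A small sketch of the PRIuN style, where each fixed-width type is paired with its matching macro from <inttypes.h>; format_stats() and its fields are hypothetical.

```c
#include <inttypes.h>
#include <stdio.h>

/* Format a stats line; each argument uses the PRIuN macro for its exact type. */
int format_stats(char *out, size_t cap, uint16_t thread_id,
                 uint32_t frame_sz, uint64_t frames)
{
    return snprintf(out, cap,
                    "Thread %" PRIu16 ": frame size %" PRIu32 " bytes, %" PRIu64 " frames\n",
                    thread_id, frame_sz, frames);
}
```

For example, format_stats(buf, sizeof buf, 3, 1514, 818833) produces "Thread 3: frame size 1514 bytes, 818833 frames".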
bensley@ubuntu-htpc:~/c/EtherateMT$ sudo ./build/etherate_mt -I 6 -p1 -f 3000
Using inteface enp1s0f0 (6).
WARNING: Make sure your device supports baby giants or jumbo frames as required.
Setting interface promiscuous mode
Frame size set to 3000 bytes.
Using raw socket with PACKET_MMAP and TX/RX_RING v2.
Running in Tx mode.
16494:Trying to increase to 1048576 bytes...
16494:Write buffer size set to 425984 bytes
tx_bytes == 0
tx_bytes == 0
tx_bytes == 0
tx_bytes == 0
tx_bytes == 0
tx_bytes == 0
tx_bytes == 0
tx_bytes == 0
From: http://man7.org/linux/man-pages/man7/netdevice.7.html
SIOCGIFMTU, SIOCSIFMTU Get or set the MTU (Maximum Transfer Unit) of a device using ifr_mtu. Setting the MTU is a privileged operation. Setting the MTU to too small values may cause kernel crashes.
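One plausible reading of the log above (an assumption, not a confirmed diagnosis) is that a 3000-byte frame exceeds the interface MTU, so nothing is transmitted. A sketch of checking the MTU with SIOCGIFMTU before starting, assuming the -f frame size includes the 14-byte Ethernet header but no FCS; get_if_mtu() and frame_fits_mtu() are hypothetical names.

```c
#define _DEFAULT_SOURCE
#include <net/if.h>
#include <string.h>
#include <sys/ioctl.h>
#include <sys/socket.h>
#include <unistd.h>

/* Return the interface MTU via SIOCGIFMTU, or -1 on error. */
int get_if_mtu(const char *if_name)
{
    struct ifreq ifr;
    int sock = socket(AF_INET, SOCK_DGRAM, 0);
    if (sock < 0)
        return -1;

    memset(&ifr, 0, sizeof(ifr));
    strncpy(ifr.ifr_name, if_name, IFNAMSIZ - 1);
    int rc = ioctl(sock, SIOCGIFMTU, &ifr);
    close(sock);
    return rc < 0 ? -1 : ifr.ifr_mtu;
}

/* A frame of frame_sz bytes (14-byte Ethernet header included, no FCS) fits
 * if its payload does not exceed the MTU. */
int frame_fits_mtu(int frame_sz, int mtu)
{
    return frame_sz - 14 <= mtu;
}
```

Raising the MTU instead would use SIOCSIFMTU, which, as the excerpt notes, is a privileged operation.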
Any Etherate version.
Remove the -fstack-protector-all compile flag; it is having a negative performance impact (roughly 5818 Mbps steady-state Tx with it versus roughly 6045 Mbps without it, about 4% lower).
The first set of results is with the -fstack-protector-all compile flag present; the second set is without it.
With -fstack-protector-all:
Seconds  Mbps Tx  MBs Tx  FrmTx/s  Frames Tx
1        3671.73  437     303148   303148
2        5836.00  1133    481836   784984
3        5819.13  1827    480443   1265427
4        5818.16  2520    480363   1745790
5        5813.18  3213    479952   2225742
6        5820.41  3907    480549   2706291
7        5817.24  4600    480287   3186578
8        5816.74  5294    480246   3666824
9        5818.54  5988    480395   4147219
10       5816.81  6681    480252   4627471
Without -fstack-protector-all:
Seconds  Mbps Tx  MBs Tx  FrmTx/s  Frames Tx
1        4508.76  537     372256   372256
2        6045.29  1258    499116   871372
3        6046.76  1978    499237   1370609
4        6044.54  2699    499054   1869663
5        6043.55  3419    498972   2368635
6        6041.28  4140    498785   2867420
7        6046.71  4860    499233   3366653
8        6045.51  5581    499134   3865787
9        6047.50  6302    499298   4365085
10       6045.84  7023    499161   4864246
Hey James. I'm in the process of writing a port scanner and I am trying to take advantage of a PACKET_TX_RING for the transmission process. I first started reading a post you made on StackOverflow where you mentioned:
"When using PACKET_MMAP in Tx mode, only one frame can be put into each ring block (rather than multiple frames per ring block as with Rx mode)" <--- I can't find this stated in the documentation.
So I went ahead and started reading this repo, and I found this page: "EtherateMT PACKET_MMAP Mode", where the same constraint for the TX_RING is stated (only one frame per block), but then crossed out...
Now, let me explain why this is pertinent, and hopefully you can shed some light on the dark corners of my understanding.
Currently, my ring is composed of one block of page size (4096 bytes on my system) segmented into 32 frames of 128 bytes each.
Issue 1: Every other cycle (i.e. a send and a receive) my sending fails. I can log that my thread is filling a frame, but when it is sent, send() returns 0.
Issue 2: Let's say we are on our 4th cycle (4 sends and receives). When looking for a frame, I notice that I skip 4 frames which are still in status 1 (TP_STATUS_SEND_REQUEST), even though I've seen these frames leave my machine in a packet capture (Wireshark). I started casting frame->tp_status as a volatile uint32, but their status remains the same on subsequent calls. I've seen some people mention a "consumer pointer", but I have not seen this mentioned anywhere in the documentation. Should I be moving my TX_RING pointer forward past each frame that is used?
If you like you can take a look at my ring setup here:
https://github.com/Dauie/hermes/blob/master/src/thread.c (line 53)
and usage:
https://github.com/Dauie/hermes/blob/master/src/scan.c (line 184)
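On Issue 2, a minimal sketch assuming TPACKET_V2 and a ring of fixed-size frames as described above. The kernel does not keep a cursor for user space, so the application holds its own frame index: it checks that the frame at that index is TP_STATUS_AVAILABLE, fills it, marks it TP_STATUS_SEND_REQUEST and advances the index modulo the frame count. The kernel sets a frame back to TP_STATUS_AVAILABLE once it has been transmitted following a send() call. tx_ring_claim(), tx_ring_commit() and ring_frame() are hypothetical names.

```c
#include <linux/if_packet.h>
#include <stddef.h>
#include <stdint.h>

/* Address of frame idx in a TPACKET_V2 Tx ring of fixed-size frames. */
static struct tpacket2_hdr *ring_frame(void *ring, size_t frame_sz, unsigned idx)
{
    return (struct tpacket2_hdr *)((uint8_t *)ring + (size_t)idx * frame_sz);
}

/* Claim the frame at the application's own cursor and advance the cursor.
 * Returns NULL while that frame is still owned by the kernel (ring full). */
struct tpacket2_hdr *tx_ring_claim(void *ring, size_t frame_sz,
                                   unsigned frame_nr, unsigned *cursor)
{
    struct tpacket2_hdr *hdr = ring_frame(ring, frame_sz, *cursor);
    uint32_t status = __atomic_load_n(&hdr->tp_status, __ATOMIC_ACQUIRE);

    if (status != TP_STATUS_AVAILABLE)
        return NULL;
    *cursor = (*cursor + 1) % frame_nr;
    return hdr;
}

/* Hand a filled frame to the kernel; a later send(sock, NULL, 0, 0) flushes
 * every frame currently marked TP_STATUS_SEND_REQUEST. */
void tx_ring_commit(struct tpacket2_hdr *hdr, uint32_t len)
{
    hdr->tp_len = len;
    __atomic_store_n(&hdr->tp_status, TP_STATUS_SEND_REQUEST, __ATOMIC_RELEASE);
}
```

The atomic load/store pair plays the role of the volatile cast mentioned above and also orders the frame contents against the status change.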
Can we add some Helgrind checks, given the use of multiple threads?
bensley@htpc-ubuntu:~/c/EtherateMT$ sudo valgrind --tool=helgrind ./build/etherate_mt -I 2
==26222== Helgrind, a thread error detector
==26222== Copyright (C) 2007-2013, and GNU GPL'd, by OpenWorks LLP et al.
==26222== Using Valgrind-3.10.1 and LibVEX; rerun with -h for copyright info
==26222== Command: ./build/etherate_mt -I 2
==26222==
Using inteface eth1 (2).
Setting interface promiscuous mode
Frame size set to 1514 bytes.
Using raw packet socket with send()/read().
Running in Tx mode.
==26222==
==26222== Process terminating with default action of signal 11 (SIGSEGV)
==26222== Access not within mapped region at address 0x90
==26222== at 0x4051B6: print_stats (print_stats.c:60)
==26222== by 0x4C30FA6: ??? (in /usr/lib/valgrind/vgpreload_helgrind-amd64-linux.so)
==26222== by 0x4E45183: start_thread (pthread_create.c:312)
==26222== by 0x515903C: clone (clone.S:111)
==26222== If you believe this happened as a result of a stack
==26222== overflow in your program's main thread (unlikely but
==26222== possible), you can try to increase the size of the
==26222== main thread stack using the --main-stacksize= flag.
==26222== The main thread stack size used in this run was 8388608.
==26222==
==26222== For counts of detected and suppressed errors, rerun with: -v
==26222== Use --history-level=approx or =none to gain increased speed, at
==26222== the cost of reduced accuracy of conflicting-access information
==26222== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)
Add an option which uses io_uring (see https://unixism.net/loti/).
Add Travis-CI to perform build checks on commit.
AF_PACKET sockets require Linux 2.2.
PACKET_RX_RING requires Linux 2.6.5.
PACKET_VERSION with PACKET_RX_RING requires Linux 2.6.27.
TPACKET_V2 requires Linux 2.6.27.
PACKET_STATISTICS requires TPACKET_V2.
PACKET_TX_RING requires Linux 2.6.31.
PACKET_LOSS requires Linux 2.6.31.
PACKET_TIMESTAMP with PACKET_RX_RING requires Linux 2.6.36.
PACKET_FANOUT requires Linux 3.1.
PACKET_QDISC_BYPASS requires Linux 3.14.
PACKET_VERSION requires Linux 3.0.
PACKET_MMAP TPACKET_V2 requires Linux 2.6.17.
PACKET_MMAP TPACKET_V3 requires Linux 3.2.
TPACKET_V3 and PACKET_TX_RING together requires Linux 4.11.
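Because the headers present at build time may not match the kernel that actually runs the binary, a runtime check against the versions listed above could look like this sketch. release_code() and running_kernel_at_least() are hypothetical; the encoding mirrors KERNEL_VERSION(a, b, c), and, like recent kernel headers, clamps the patch level to 255.

```c
#include <stdio.h>
#include <sys/utsname.h>

/* Encode a release string such as "4.13.0-36-generic" or "5.4" like
 * KERNEL_VERSION(a, b, c). Returns -1 if it cannot be parsed. */
long release_code(const char *release)
{
    int a = 0, b = 0, c = 0;

    if (sscanf(release, "%d.%d.%d", &a, &b, &c) < 2)
        return -1;
    if (c > 255)
        c = 255;
    return ((long)a << 16) + ((long)b << 8) + c;
}

/* True if the running kernel (not the build headers) is at least a.b.c. */
int running_kernel_at_least(int a, int b, int c)
{
    struct utsname uts;

    if (uname(&uts) != 0)
        return 0;
    return release_code(uts.release) >= (((long)a << 16) + ((long)b << 8) + c);
}
```

For example, running_kernel_at_least(3, 14, 0) would gate PACKET_QDISC_BYPASS. Distribution kernels with backports can still make version numbers misleading, so attempting the setsockopt() and handling ENOPROTOOPT is a useful complement.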
Merge the sock_op cases of S_O_QLEN_* into one case; currently lots of code is duplicated.
In the thd_opt structure, sock is defined as an int32_t, whereas socket() is declared to return an int. (On 64-bit Linux, int is still 32 bits wide, so this works in practice, but the types should match.)
In functions rem_int_promisc(), set_int_promisc(), get_if_index_by_name() and get_if_name_by_index() the sock variable is defined as an int32_t instead of an int.
Add a bidirectional mode.
Have the main thread assign each worker thread to a different CPU core:
This could be a CLI arg of cores to use like a hex bitmap or automatic across available CPUs.
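A sketch of the hex bitmap variant: parse the mask into a list of core numbers, then pin each worker with pthread_setaffinity_np() (a GNU extension). parse_core_mask() and pin_thread() are hypothetical names.

```c
#define _GNU_SOURCE
#include <pthread.h>
#include <sched.h>
#include <stdlib.h>

/* Parse a hex CPU bitmap such as "0x38" into core numbers (3, 4, 5).
 * Returns the number of cores found, or -1 on malformed input. */
int parse_core_mask(const char *hex, int *cores, int max_cores)
{
    char *end;
    unsigned long long mask = strtoull(hex, &end, 16); /* accepts "0x" prefix */

    if (end == hex || *end != '\0')
        return -1;

    int n = 0;
    for (int bit = 0; bit < 64 && n < max_cores; bit++)
        if (mask & (1ULL << bit))
            cores[n++] = bit;
    return n;
}

/* Pin one worker thread to one core; returns 0 or an errno value. */
int pin_thread(pthread_t thd, int core)
{
    cpu_set_t set;

    CPU_ZERO(&set);
    CPU_SET(core, &set);
    return pthread_setaffinity_np(thd, sizeof(set), &set);
}
```

For example, a mask of 0x38 selects cores 3, 4 and 5, the same cores as taskset -c 3-5 in the log below; worker i could then be pinned with pin_thread(thd, cores[i % n]).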
Two Tx threads run out of buffer space almost instantly:
[user@ucpe_002 emt]$ sudo taskset -c 3-5 ./etherate_mt -i ens2f0 -c 2 -f 64
Using inteface ens2f0 (12)
Frame size set to 64 bytes
Using PACKET_MMAP.
Running in Tx mode.
0. 0.00 Rx Gbps (0 fps) 0.00 Tx Gbps (0 fps)
Write buffer size set to 425984 bytes
Write buffer size set to 425984 bytes
1. 0.00 Rx Gbps (0 fps) 1.46 Tx Gbps (2842624 fps)
2. 0.00 Rx Gbps (0 fps) 1.51 Tx Gbps (2957303 fps)
packet_mmap Tx error: No buffer space available
Even with one thread, after 206 seconds of TPACKET v2 no buffer space is available:
206. 0.00 Rx Gbps (0 fps) 0.75 Tx Gbps (1466368 fps)
packet_mmap Tx error: No buffer space available
Using #if LINUX_VERSION_CODE >= KERNEL_VERSION(3,14,0) from version.h doesn't work on certain distros, for example CentOS when using a non-standard kernel. CentOS 7 uses kernel 3.10, but installing a 4.13 kernel, for example, doesn't update LINUX_VERSION_CODE in /usr/include/linux/version.h, and the EtherateMT checks fail.
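#include <stdio.h>
#include <sys/utsname.h>

int main(void)
{
    struct utsname uts;
    int major = 0, minor = 0;

    /* Compare major/minor as integers: parsing "3.9" as a float would wrongly treat it as newer than 3.14. */
    if (uname(&uts) != 0 || sscanf(uts.release, "%d.%d", &major, &minor) != 2)
        return 1;
    if (major > 3 || (major == 3 && minor >= 14))
        printf("%s is >= 3.14\n", uts.release);
    return 0;
}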
Are these of any benefit?
From: https://gcc.gnu.org/onlinedocs/gcc/Instrumentation-Options.html
-fsanitize=alignment
- Enable checking of the alignment of pointers when they are dereferenced, or when a reference is bound to an insufficiently aligned target (one of the -fsanitize=undefined sub-options).
-fsanitize=thread
- Enable ThreadSanitizer, a fast data race detector. Memory access instructions are instrumented to detect data race bugs... The option cannot be combined with -fsanitize=address, -fsanitize=leak and/or -fcheck-pointer-bounds.
-fsanitize=pointer-compare
- Instrument comparison operation (<, <=, >, >=) with pointer operands. The option must be combined with either -fsanitize=kernel-address or -fsanitize=address. The option cannot be combined with -fsanitize=thread and/or -fcheck-pointer-bounds. Note: By default the check is disabled at run time. To enable it, add detect_invalid_pointer_pairs=2 to the environment variable ASAN_OPTIONS. Using detect_invalid_pointer_pairs=1 detects invalid operation only when both pointers are non-null.
-fsanitize=pointer-subtract
- Instrument subtraction with pointer operands. The option must be combined with either -fsanitize=kernel-address or -fsanitize=address. The option cannot be combined with -fsanitize=thread and/or -fcheck-pointer-bounds. Note: By default the check is disabled at run time. To enable it, add detect_invalid_pointer_pairs=2 to the environment variable ASAN_OPTIONS. Using detect_invalid_pointer_pairs=1 detects invalid operation only when both pointers are non-null.
-fsanitize-address-use-after-scope
- Enable sanitization of local variables to detect use-after-scope bugs. The option sets -fstack-reuse to 'none'.
Investigate the many sub-options available for -fsanitize=undefined
- Enable UndefinedBehaviorSanitizer, a fast undefined behavior detector. Various computations are instrumented to detect undefined behavior at runtime.
-fstack-clash-protection
- Generate code to prevent stack clash style attacks. When this option is enabled, the compiler will only allocate one page of stack space at a time and each page is accessed immediately after allocation. Thus, it prevents allocations from jumping over any stack guard page provided by the operating system.
This is already in use:
-fsanitize=address
- Enable AddressSanitizer, a fast memory error detector. Memory access instructions are instrumented to detect out-of-bounds and use-after-free bugs. The option enables -fsanitize-address-use-after-scope... The option cannot be combined with -fsanitize=thread and/or -fcheck-pointer-bounds.
Add echo mode which would reflect any packets received on an interface back out again.
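A sketch of the frame rewrite such a mode would probably need, assuming that "reflect" means swapping the destination and source MAC addresses so the frame returns to its sender (whether IP addresses or ports should also be swapped is left open). reflect_frame() is a hypothetical name.

```c
#include <net/ethernet.h>
#include <stddef.h>
#include <string.h>

/* Swap destination and source MAC addresses in place.
 * Returns -1 if the frame is too short to hold both addresses. */
int reflect_frame(unsigned char *frame, size_t len)
{
    unsigned char tmp[ETH_ALEN];

    if (len < 2 * ETH_ALEN)
        return -1;
    memcpy(tmp, frame, ETH_ALEN);                 /* save old destination */
    memcpy(frame, frame + ETH_ALEN, ETH_ALEN);    /* dst = old src */
    memcpy(frame + ETH_ALEN, tmp, ETH_ALEN);      /* src = old dst */
    return 0;
}
```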
Use LINUX_VERSION_CODE and KERNEL_VERSION() to check that only supported features are included at compile time; otherwise the code won't compile on older kernels.
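A sketch of both styles of compile-time guard. Keying a feature on the option macro itself (here PACKET_QDISC_BYPASS from <linux/if_packet.h>) also copes with the CentOS case described above, because an older running kernel then simply rejects the option with ENOPROTOOPT at run time. set_qdisc_bypass() and headers_support_tpacket_v3_tx() are hypothetical names, and the 4.11 threshold comes from the kernel feature list in this document.

```c
#include <errno.h>
#include <linux/if_packet.h>
#include <linux/version.h>
#include <sys/socket.h>

/* Compiled in whenever the build headers know the option; an older running
 * kernel reports ENOPROTOOPT, which the caller can treat as "unsupported". */
int set_qdisc_bypass(int sock)
{
#ifdef PACKET_QDISC_BYPASS
    int one = 1;
    return setsockopt(sock, SOL_PACKET, PACKET_QDISC_BYPASS, &one, sizeof(one));
#else
    (void)sock;
    errno = ENOPROTOOPT;
    return -1;
#endif
}

/* Version-based guard: only reflects the headers used for the build. */
int headers_support_tpacket_v3_tx(void)
{
#if LINUX_VERSION_CODE >= KERNEL_VERSION(4, 11, 0)
    return 1;
#else
    return 0;
#endif
}
```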
This section prints "Trying to offload Rx timestamps to hardware..." but does not state whether the operation was successful.
Move the print_pps() function into a separate file.
What is the limit for the -m option for the IOVEC array when using sendmmsg()/recvmmsg()?
Identify this and document it in the Wiki.
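As a starting point, the per-call iovec limit can be queried with sysconf(_SC_IOV_MAX), which on Linux reports UIO_MAXIOV (1024); POSIX only guarantees 16. For sendmmsg(), the kernel also silently caps the message count (vlen) at UIO_MAXIOV. A sketch of validating -m against that limit; iov_limit() and validate_batch() are hypothetical names.

```c
#include <unistd.h>

/* Maximum iovec entries per call; falls back to the POSIX minimum of 16. */
long iov_limit(void)
{
    long n = sysconf(_SC_IOV_MAX);
    return n > 0 ? n : 16;
}

/* Validate a -m value before allocating the iovec/mmsghdr arrays. */
int validate_batch(long m)
{
    return m >= 1 && m <= iov_limit();
}
```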
Implement TPACKET v4 as a separate test mode.
Thanks for this great tool!
In a Google Cloud Kubernetes environment, packets with "bad" IP addresses seem to be discarded directly.
Is there any way we could specify source and destination addresses on the command line?
Add a -v CLI arg to display verbose output (there is too much socket-related info to display on every run).
The -h option says that it is the "size on the wire", which is incorrect. Update the -h output and any other documentation references.
When using a frame size of 1302 or higher (-f 1302), the following output is displayed:
1. 0.00 Rx Gbps (0 fps) 4.57 Tx Gbps (438900 fps)
2. 0.00 Rx Gbps (0 fps) 5.93 Tx Gbps (569275 fps)
3. 0.00 Rx Gbps (0 fps) 5.96 Tx Gbps (572509 fps)
4. 0.00 Rx Gbps (0 fps) 5.94 Tx Gbps (569818 fps)
When using 1301 or smaller:
1. 0.00 Rx Gbps (0 fps) 0.15 Tx Gbps (5317086108870940738 fps)
2. 0.00 Rx Gbps (0 fps) 0.00 Tx Gbps (14675157660483763884 fps)
3. 0.00 Rx Gbps (0 fps) 18446744073.71 Tx Gbps (14831125519677311053 fps)
4. 0.00 Rx Gbps (0 fps) 18446744073.71 Tx Gbps (15001272275161180849 fps)
5. 0.00 Rx Gbps (0 fps) 18446744073.71 Tx Gbps (14774409934516021121 fps)
6. 0.00 Rx Gbps (0 fps) 18446744073.71 Tx Gbps (14788588830806343604 fps)
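The bogus rate gives a clue: 18446744073.71 is 2^64 / 10^9 (18446744073709551616 / 1000000000), which suggests an unsigned 64-bit value that wrapped below zero, such as a per-second counter delta where the new sample is smaller than the previous one. A sketch of a guarded delta, with hypothetical names counter_delta() and to_gbps():

```c
#include <stdint.h>

/* Difference between two samples of a counter. If the new sample is smaller
 * (counter reset, or a torn read by the stats thread), return 0 instead of
 * letting the unsigned subtraction wrap to nearly 2^64. */
uint64_t counter_delta(uint64_t now, uint64_t prev)
{
    return now >= prev ? now - prev : 0;
}

/* Convert bytes per second to Gbps for display. */
double to_gbps(uint64_t bytes_per_sec)
{
    return (double)bytes_per_sec * 8.0 / 1e9;
}
```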
https://www.kernel.org/doc/html/latest/networking/msg_zerocopy.html
The documentation uses send(); can this also be used with sendmsg()?
Lots of code is duplicated to set up the same socket options for each test method.
Implement the AF_XDP socket method (more details to follow).
https://blog.apnic.net/2020/04/30/how-to-build-an-xdp-based-bgp-peering-router/
The -f flag includes headers such as SRC MAC + DST MAC + ETYPE in EtherateMT, but it does not in Etherate. Align the two.
Have you had luck with https://stackoverflow.com/questions/46267495/linux-how-to-debug-spin-lock-source ?
Using either -i with an incorrect interface name or -I with an incorrect interface index causes EtherateMT to quit without producing an error (that the interface can't be found).
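A sketch of validating both options up front with if_nametoindex() and if_indextoname(), printing the reason before quitting; resolve_if_name() and resolve_if_index() are hypothetical names.

```c
#define _DEFAULT_SOURCE
#include <errno.h>
#include <net/if.h>
#include <stdio.h>
#include <string.h>

/* Resolve -i <name> to an index; prints an error and returns -1 if unknown. */
int resolve_if_name(const char *name)
{
    unsigned int idx = if_nametoindex(name);

    if (idx == 0) {
        fprintf(stderr, "Interface %s not found: %s\n", name, strerror(errno));
        return -1;
    }
    return (int)idx;
}

/* Resolve -I <index> to a name; buf must hold IF_NAMESIZE bytes. */
int resolve_if_index(unsigned int idx, char *buf)
{
    if (if_indextoname(idx, buf) == NULL) {
        fprintf(stderr, "Interface index %u not found: %s\n", idx, strerror(errno));
        return -1;
    }
    return 0;
}
```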