mami-project / pathspider
Tool for A/B testing of path transparency to certain features in the Internet
Home Page: https://pathspider.net/
License: GNU General Public License v2.0
Currently no way of detecting this easily
The Tor network provides a large number of exits that can be used to probe the core for transparency issues.
Tor exits often receive special treatment, and so this should be accounted for in the analysis of any test results.
meejah/txtorcon could be used for this and this is already packaged in Debian (by @irl).
One test that could use this is the H2 test in #44.
Flows may be observed by the Observer that Pathspider did not originate. The merger should detect these -- probably with a timeout -- and periodically flush them out of the flow table.
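A minimal sketch of such a timeout-based flush, assuming the merger keeps observed flows in a table keyed by a flow identifier together with a last-seen timestamp. The `FlowTable` class and its `touch`, `claim` and `flush_stale` methods are hypothetical, not Pathspider's API.

```python
import time
from typing import Dict, Hashable, List, Optional, Tuple


class FlowTable:
    """Hypothetical flow table that expires flows no Pathspider job claims."""

    def __init__(self, timeout: float = 30.0) -> None:
        self.timeout = timeout
        # flow key -> (last-seen timestamp, flow record)
        self._flows: Dict[Hashable, Tuple[float, dict]] = {}

    def touch(self, key: Hashable, record: dict, now: Optional[float] = None) -> None:
        """Insert or refresh an observed flow."""
        now = time.monotonic() if now is None else now
        self._flows[key] = (now, record)

    def claim(self, key: Hashable) -> Optional[dict]:
        """Remove and return a flow that matches a Pathspider job."""
        entry = self._flows.pop(key, None)
        return entry[1] if entry else None

    def flush_stale(self, now: Optional[float] = None) -> List[dict]:
        """Drop and return flows idle for longer than the timeout; these were
        not originated by Pathspider (or their job never reported)."""
        now = time.monotonic() if now is None else now
        stale = [k for k, (seen, _) in self._flows.items() if now - seen > self.timeout]
        return [self._flows.pop(k)[1] for k in stale]
```

Calling `flush_stale()` periodically (e.g. from the merger loop) keeps the table bounded; the flushed records could also be counted or logged rather than silently dropped.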
Need to wait a bit for QoF to start, and for QoFSpider to stop before killing QoF.
Pathspider evolved from ECNSpider, which tested kernel code switched on and off by a sysctl, leading to tight synchronization between the testing and configurator threads. Some A/B tests (e.g. injecting packets with Scapy) don't need this. Generalize Pathspider to make the configurator optional, used only when it's needed.
There are a few possible design patterns to use here, which we should discuss.
See pathspider/client/ecnclient.py:268:
If a site does not contribute any results, the corresponding columns are missing.
This happens only if the number of IPs is very low.
This should set up a Debian jessie machine, and set up PATHspider with development dependencies.
Using this should be documented.
Someone from the OONI community has asked if it would be possible to also store a PCAP trace of all the packets that were seen during a measurement run, for debugging or for deeper analysis of the data that the original observer functions may not have recorded enough detail for.
I don't think that this would be too difficult to add as an option.
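Writing such a trace needs only the classic libpcap file format: a 24-byte global header followed by a 16-byte record header before each packet. A minimal sketch, assuming the observer can hand over raw Ethernet frames with timestamps as float seconds; the `PcapWriter` class and how it would hook into the observer are hypothetical.

```python
import struct
from typing import BinaryIO

PCAP_MAGIC = 0xA1B2C3D4  # classic pcap, microsecond-resolution timestamps
LINKTYPE_ETHERNET = 1


class PcapWriter:
    """Append packets to a classic (non-pcapng) libpcap file."""

    def __init__(self, fh: BinaryIO, snaplen: int = 65535,
                 linktype: int = LINKTYPE_ETHERNET) -> None:
        self.fh = fh
        self.snaplen = snaplen
        # Global header: magic, version 2.4, thiszone, sigfigs, snaplen, linktype
        fh.write(struct.pack("<IHHiIII", PCAP_MAGIC, 2, 4, 0, 0, snaplen, linktype))

    def write(self, timestamp: float, frame: bytes) -> None:
        sec = int(timestamp)
        usec = int(round((timestamp - sec) * 1_000_000))
        if usec == 1_000_000:  # rounding carried into the next second
            sec, usec = sec + 1, 0
        data = frame[: self.snaplen]
        # Record header: ts_sec, ts_usec, captured length, original length
        self.fh.write(struct.pack("<IIII", sec, usec, len(data), len(frame)))
        self.fh.write(data)
```

Usage would be along the lines of `with open("run.pcap", "wb") as fh: writer = PcapWriter(fh)`, then `writer.write(ts, frame)` for every packet the observer sees; the resulting file opens in standard tools such as tcpdump or Wireshark.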
The TFO connection setup in connect() of tfospider has no timeout, since socket.settimeout() does not work on socket.sendto(). This can cause connect() to get stuck on unresponsive hosts.
Build a flowmeter that allows packet inspection in Python to annotate records that can be merged with Pathspider traffic generator information. Allow dynamic addition of packet inspection functions and flow properties.
This is in progress in the observer branch. It will be much slower than QoF but, given our requirements, probably fast enough.
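One way to allow dynamic addition of inspection functions is a chain of callables, each receiving the packet and the mutable flow record and free to add properties to it. A minimal sketch; the `Flowmeter` class, `register`, and the dict-based packet representation are assumptions, not the observer branch's interface.

```python
from typing import Any, Callable, Dict, Hashable, List

Packet = Dict[str, Any]  # hypothetical stand-in for a parsed packet
Flow = Dict[str, Any]
Inspector = Callable[[Packet, Flow], None]


class Flowmeter:
    """Hypothetical flowmeter with pluggable per-packet inspection functions."""

    def __init__(self) -> None:
        self._inspectors: List[Inspector] = []
        self.flows: Dict[Hashable, Flow] = {}

    def register(self, fn: Inspector) -> Inspector:
        """Add an inspection function at runtime; usable as a decorator."""
        self._inspectors.append(fn)
        return fn

    def process(self, pkt: Packet) -> None:
        # Unidirectional 4-tuple key, for brevity; a real flowmeter would
        # also map the reverse direction onto the same flow.
        key = (pkt["src"], pkt["dst"], pkt["sport"], pkt["dport"])
        flow = self.flows.setdefault(key, {"packets": 0})
        flow["packets"] += 1
        for fn in self._inspectors:
            fn(pkt, flow)  # each inspector may add or update flow properties


meter = Flowmeter()


@meter.register
def ecn_seen(pkt: Packet, flow: Flow) -> None:
    # "ecn" is a hypothetical field holding the IP ECN codepoint (0-3)
    if pkt.get("ecn", 0):
        flow["ecn_seen"] = True
```

Because inspectors only annotate the flow dict, the resulting records can be merged with the traffic generator's results by key, whatever properties the plugins chose to add.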
libtrace can accept a pcap trace as its input, and so integration testing of the Observer class can be performed without having a PATHspider plugin to generate traffic.
For each plugin, an example input (possibly with crafted packets to allow for edge cases to be tested) and example output should be provided.
Running standalone with webresolver with about 100 targets, in the current testing branch (new-mplane-final-demo), against the mPlane SDK in the master branch of mami-project/mplane-sdk, ecnspider runs, but the result_sink method in EcnAnalysis never gets called.
Why? I'm completely lost in tracking this one down.
We will need to fall back to the old demo by next Monday at the latest (and I'm out of cycles on this until then).
Pathspider should print periodic info level logging entries with basic diagnostics: queue lengths, records processed, drain rates, etc. This allows performance estimation during a run.
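A minimal sketch of such a reporter thread, assuming the spider's queues are `queue.Queue` objects and that a callable returns the running total of processed records. The `StatsReporter` class, its parameters and the `pathspider.stats` logger name are hypothetical.

```python
import logging
import queue
import threading
import time
from typing import Callable, Dict


class StatsReporter(threading.Thread):
    """Log queue lengths, records processed and drain rate every `interval` s."""

    def __init__(self, queues: Dict[str, "queue.Queue"],
                 processed: Callable[[], int], interval: float = 10.0) -> None:
        super().__init__(daemon=True)
        self.queues = queues
        self.processed = processed  # returns the running total of records
        self.interval = interval
        self.stop_event = threading.Event()
        self.log = logging.getLogger("pathspider.stats")

    def report(self, last_count: int, elapsed: float) -> int:
        """Log one diagnostics line and return the current processed count."""
        count = self.processed()
        rate = (count - last_count) / elapsed if elapsed > 0 else 0.0
        sizes = ", ".join(f"{name}={q.qsize()}" for name, q in self.queues.items())
        self.log.info("queues: %s; processed=%d; drain=%.1f rec/s", sizes, count, rate)
        return count

    def run(self) -> None:
        last_count, last_time = self.processed(), time.monotonic()
        # Event.wait() returns False on timeout and True once stop() is called
        while not self.stop_event.wait(self.interval):
            now = time.monotonic()
            last_count = self.report(last_count, now - last_time)
            last_time = now

    def stop(self) -> None:
        self.stop_event.set()
```

Since `qsize()` is only approximate for threaded queues, the numbers are suitable for estimating throughput during a run, not for exact accounting.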
it should
Drop the dependency on twisted. This may be reintroduced later but shouldn't be a dependency of the base PATHspider.
I suspect it is the observer thread or libtrace that stalls execution of the main thread.
This needs better documentation for plugin implementers; the Sphinx autointerface module provides useful features for this, and may also assist in testing.
Need to document how to write a plugin in the Sphinx docs.
mPlane results use the ipaddress module to represent IP addresses. Pandas and everything after it on the client side should use IP addresses as strings, since strings are much faster than Python objects in NumPy.
Currently we stringify in only about half the places that might matter. This is bad, since ipaddress.ip_address("1.0.0.0") != "1.0.0.0".
Fix this by stringifying addresses as soon as they come out of the mPlane results from the components.
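The pitfall and the fix fit in a few lines. A minimal sketch, assuming each mPlane result row arrives as a dict; `stringify_addresses` is a hypothetical helper meant to be applied once, where results leave the mPlane layer.

```python
import ipaddress
from typing import Any, Dict

# An address object never compares equal to its string form.
assert ipaddress.ip_address("1.0.0.0") != "1.0.0.0"
assert str(ipaddress.ip_address("1.0.0.0")) == "1.0.0.0"

_ADDR_TYPES = (ipaddress.IPv4Address, ipaddress.IPv6Address)


def stringify_addresses(row: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of a result row with every IP address object as a str."""
    return {k: str(v) if isinstance(v, _ADDR_TYPES) else v for k, v in row.items()}
```

`str()` on an `IPv6Address` yields the compressed form, so equal addresses always map to equal strings, which keeps later joins and deduplication in Pandas consistent.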
The observer is a relatively heavyweight thread and starves other Python threads from running (see #25). On multicore machines (or VMs), this can be fixed by running the observer in a separate process. Consider multiprocessing or concurrent.futures for this.
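A minimal sketch of moving the observer into its own process with multiprocessing, assuming the observer can be reduced to a function that pushes flow records onto a queue followed by a `None` sentinel. `observer_main` and `run_observer_in_process` are hypothetical; the first stands in for the libtrace loop. The explicit `fork` start method is an assumption that keeps the sketch Unix-only but avoids having to pickle the target.

```python
import multiprocessing as mp
from typing import List


def observer_main(flowqueue, n_flows: int) -> None:
    """Hypothetical stand-in for the libtrace observer loop."""
    for i in range(n_flows):
        flowqueue.put({"sport": 40000 + i, "observed": True})
    flowqueue.put(None)  # sentinel: no more flows


def run_observer_in_process(n_flows: int) -> List[dict]:
    ctx = mp.get_context("fork")  # assumption: Linux host
    flowqueue = ctx.Queue()
    proc = ctx.Process(target=observer_main, args=(flowqueue, n_flows), daemon=True)
    proc.start()
    flows = []
    # Drain the queue before join(): a child that still has buffered items
    # waits for them to be flushed to the pipe, so joining first can deadlock.
    while (flow := flowqueue.get()) is not None:
        flows.append(flow)
    proc.join()
    return flows
```

With the observer in a separate process it no longer competes for the GIL with the worker threads; the cost is that flow records must be picklable to cross the queue.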
Test whether ALPN works and whether HTTP/2 is supported.
Release steps:
single quote instead of double quote.
Add scripts for doing black box testing of:
(1) can we start it in server mode and send it an mPlane specification and have it give us an expected response
(2) can we start it in client mode, point it at a test server, and have it send a reasonable specification?
(3) can we start it in standalone mode at a given target and get expected results?
These scripts shouldn't need to change much as the internals change, since they exercise the interfaces to external systems.
ECNSpider currently generates one row of output per flow; instead, it should generate one row of output per path, with an additional key "condition". Conditions should include:
Actually, let's do it this way: an observation can have multiple conditions. The following potential conditions are exclusive.
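A minimal sketch of one output row per path, assuming each path has an A flow (ECN off) and a B flow (ECN on) with a boolean `connected` field. The condition strings, field names and the `path_row` function are hypothetical placeholders, since the list of conditions is not given here; the sketch only shows one exclusive connectivity condition plus optional non-exclusive ones.

```python
from typing import Any, Dict, List

FlowResult = Dict[str, Any]


def path_row(dip: str, a: FlowResult, b: FlowResult) -> Dict[str, Any]:
    """Combine the A (ECN off) and B (ECN on) flows for one path into one row."""
    conditions: List[str] = []
    # Exactly one connectivity condition applies (they are mutually exclusive).
    if a["connected"] and b["connected"]:
        conditions.append("ecn.connectivity.works")
    elif a["connected"]:
        conditions.append("ecn.connectivity.broken")
    elif b["connected"]:
        conditions.append("ecn.connectivity.transient")
    else:
        conditions.append("ecn.connectivity.offline")
    # Further, non-exclusive conditions may be added alongside it.
    if b.get("ecn_negotiated"):
        conditions.append("ecn.negotiated")
    return {"dip": dip, "conditions": conditions}
```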
We need to work out what it is that we want to look at here.
Some options:
NEAT may be able to inform this better.
Where a plugin does not require locking for global state, it should be possible to speed up the semaphores somehow.
Some options:
A post_connect() function instead of connect(), so that all that is locked is just pass. This would be relevant for the TFO plugin.
Document the core functionality of the pathspider.base.Spider class and add notes where functionality is not implemented, to alert implementers of plugins.
On overloaded machines, the observer might miss packets. The merger should use the records from the workers as ground truth, and flag such records without matching flows so that the merger (and later analysis) can estimate how many packets were missed.
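A minimal sketch of treating worker records as ground truth, assuming records and observed flows are matched by destination address and source port (as the observer matching is described further down) and that flows are held in a dict. The `merge` function and field names are hypothetical; the returned ratio estimates missed flows, which is only a proxy for missed packets.

```python
from typing import Dict, Iterable, List, Tuple

Key = Tuple[str, int]  # (destination address, source port)


def merge(worker_records: Iterable[dict],
          flows: Dict[Key, dict]) -> Tuple[List[dict], float]:
    """Merge observer flows into worker records, which are ground truth.

    Records without a matching flow are kept and flagged, so that the
    fraction of flows the observer missed can be estimated afterwards.
    """
    merged: List[dict] = []
    missed = 0
    for rec in worker_records:
        flow = flows.pop((rec["dip"], rec["sport"]), None)
        if flow is None:
            missed += 1
            merged.append({**rec, "observed": False})
        else:
            merged.append({**rec, **flow, "observed": True})
    miss_rate = missed / len(merged) if merged else 0.0
    return merged, miss_rate
```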
TFOSpider checks packets for the TFO option by parsing the TCP header. This is slow and causes problems with multiple workers (more than 5), leading to unobserved flows.
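A minimal sketch of a cheaper check, assuming the raw TCP header bytes are available: bail out unless the data offset shows that options are present, then walk the options looking for kind 34, the TCP Fast Open option assigned by RFC 7413. The function name is hypothetical, it ignores the experimental kind-254 encoding some older stacks used, and whether it is faster in practice than the current parser is untested.

```python
TCP_OPT_EOL = 0
TCP_OPT_NOP = 1
TCP_OPT_TFO = 34  # TCP Fast Open cookie option, RFC 7413


def has_tfo_option(tcp_header: bytes) -> bool:
    """Return True if the TCP header carries a Fast Open option (kind 34)."""
    if len(tcp_header) < 20:
        return False
    data_offset = (tcp_header[12] >> 4) * 4  # header length in bytes
    if data_offset <= 20:
        return False  # no options at all: the common case exits early
    opts = tcp_header[20:min(data_offset, len(tcp_header))]
    i = 0
    while i < len(opts):
        kind = opts[i]
        if kind == TCP_OPT_EOL:
            return False
        if kind == TCP_OPT_NOP:
            i += 1
            continue
        if kind == TCP_OPT_TFO:
            return True
        if i + 1 >= len(opts) or opts[i + 1] < 2:
            return False  # truncated or malformed option list
        i += opts[i + 1]  # skip over this option using its length byte
    return False
```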
Why do ControlBatch and ControlWeb call clientpool.update() while waiting for the resolver? The reload of capabilities would seem to be superfluous in this case.
vagrant@contrib-jessie:/data/pathspider$ sudo pathspider -I examples/webtest.csv -o /home/vagrant/out.fjson
...
DEBUG:pathspider:all workers joined
and then it doesn't exit (waited more than 5 minutes).
It does not affect 0.9.0 so far.
With large numbers of workers, the observer's timer queue becomes corrupted:
Traceback (most recent call last):
File "/home/gubser/pathspider/pathspider/base.py", line 329, in exception_wrapper
target(*args, **kwargs)
File "/home/gubser/pathspider/pathspider/observer.py", line 267, in run_flow_enqueuer
f = self._next_flow()
File "/home/gubser/pathspider/pathspider/observer.py", line 224, in _next_flow
if not self._next_packet():
File "/home/gubser/pathspider/pathspider/observer.py", line 99, in _next_packet
self._tick(self._pkt.seconds)
File "/home/gubser/pathspider/pathspider/observer.py", line 235, in _tick
heapq.heappop(self._tq).fn()
TypeError: unorderable types: function() < function()
This did not happen on a multiday 50-worker run on pto-de running d37468a, but did on multiple runs on pto-nl and pto-big-nl running f1daacb.
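This TypeError is what heapq raises when two entries share the same time: tuple (and namedtuple) comparison falls through to the callbacks, which Python 3 cannot order. More timers in flight plausibly make equal timestamps more likely. The heapq documentation's notes on priority queue implementation suggest inserting a monotonically increasing counter between the priority and the payload. A minimal sketch, assuming timer entries are (time, fn) records in `self._tq`; the `TimerQueue` class is hypothetical and only mirrors the names in the traceback.

```python
import heapq
import itertools
from collections import namedtuple
from typing import Callable, List

# The sequence number sits between the time and the callback, so entries with
# equal times are ordered by insertion and callbacks are never compared.
TimerEntry = namedtuple("TimerEntry", ["time", "seq", "fn"])


class TimerQueue:
    def __init__(self) -> None:
        self._tq: List[TimerEntry] = []
        self._seq = itertools.count()

    def add_timer(self, when: float, fn: Callable[[], None]) -> None:
        heapq.heappush(self._tq, TimerEntry(when, next(self._seq), fn))

    def tick(self, now: float) -> None:
        """Fire every timer due at or before `now`, in time order."""
        while self._tq and self._tq[0].time <= now:
            heapq.heappop(self._tq).fn()
```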
Pathspider was designed to run two connections as close to simultaneously as possible, to reduce transience. This means that the connections in each state use different source ports, which also helps to match flows from the observers with results by source port and destination address.
This is obviously bad for trying to isolate paths in an ECMP environment.
We should investigate (1) timestamp-based matching in the merger and (2) serial as opposed to parallel testing, to make it possible to run two tests using the same flow label.
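Timestamp-based matching would let both connections of a test share a 5-tuple, with the merger telling them apart by start time instead of by source port. A minimal sketch, assuming worker records carry a connection start time and observed flows carry a first-packet time on a comparable clock; `match_by_time`, the field names and the 0.5 s tolerance are hypothetical.

```python
import bisect
from typing import Dict, List, Optional, Tuple


def match_by_time(record: dict, flows: Dict[Tuple[str, int], List[dict]],
                  tolerance: float = 0.5) -> Optional[dict]:
    """Pick the observed flow on the same (dip, sport) whose first-packet
    time is closest to the worker's connection start time.

    `flows` maps (dip, sport) to lists of flows sorted by "first_seen".
    """
    candidates = flows.get((record["dip"], record["sport"]), [])
    if not candidates:
        return None
    times = [f["first_seen"] for f in candidates]
    i = bisect.bisect_left(times, record["start"])
    best = None
    # Only the neighbours of the insertion point can be closest in time.
    for j in (i - 1, i):
        if 0 <= j < len(candidates):
            delta = abs(candidates[j]["first_seen"] - record["start"])
            if delta <= tolerance and (best is None or delta < best[0]):
                best = (delta, j)
    if best is None:
        return None
    return candidates.pop(best[1])  # consume it so it cannot match twice
```

This only works if the two tests are spaced further apart than the tolerance, which fits the serial testing option in (2).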
pathspider (clean-mplane) doesn't quite yet run on mplane-sdk (master). Need to figure out where the nulls are coming from.
There's something wrong with IPv6 result merging:
DEBUG:pathspider:got a result (2a00:bdc0:3:103:1:0:403:805, 0)
DEBUG:pathspider:got a result (2a00:bdc0:3:103:1:0:403:805, 0)
DEBUG:pathspider:won't merge duplicate result
The second 0 in the debug output here is supposed to be the source port number. It's not clear what's breaking this.
This will allow the pathspider client to be used as a component itself (though we don't necessarily need to do the full wrapper yet)
When initialising an Observer instance with a libtrace URI that is an interface that is down, everything just hangs and doesn't even respond to Ctrl+C.
A plugin is required to perform UDP-Lite related tests:
Extensions to the base Spider class should be loadable using a dynamic plugin framework. twisted.plugin will be used for this.
Register simple metadata objects with twisted.plugin and then move the init logic back to the constructor.
In order to speed up development to a point where we can run new tests, mPlane support became a casualty. We should add it back.