jqk6 / tethering_detection
This project forked from yichao0319/tethering_detection
/***************
 * Tethering detection
 ***************/

/***************
 * Features
 ***************/
1. IP layer
   a) TTL
   b) ID field
   c) Inter-arrival time
2. TCP layer
   a) # of connections (# of <src IP, src port, dst IP, dst port>)
   b) round-trip time
   c) TCP WIN
3. UDP layer
   a) # of connections
4. Application layer
   a) User-Agent

/***************
 * Directories
 ***************/
1. dataset_narus, dataset_sprint, dataset_uscc
   Links to the datasets.
2. 172.31.7.162:/data/ychen
   Stores a copy of the dataset on local disk.
3. plot_summary_ttl
   Plots the results generated by tool1.pl:
   /export/home/ychen/sprint/output/detect.TTL.summary
4. tethered_clients
   IPs of tethered clients detected by each heuristic.
5. output_boot_time_freq
   Stores the frequency calculated by "detect_tethering_boot_time.pl".

/***************
 * Sub-Tasks
 ***************/
1. subtask_initial_ttl
   Some time ago, the initial TTL of Windows was 64. This is a quick check of
   whether any client has an initial TTL of 64.
   - input:  ../output/file.$file_id.ttl.txt
   - output: ./output
     <# client> <# normal client w/ TTL_X> <...> <# tethered client w/ TTL_X> <...>
2. subtask_ttl_distribution
   a) Check whether the TTLs behind a mobile station differ only by 1.
   b) Check the distribution of TTL values.
   Some useful info:
   - http://forums.macrumors.com/showthread.php?t=1140306
   - http://www.map.meteoswiss.ch/map-doc/ftp-probleme.htm
3. subtask_nontether_id
   Generate the IP ID field timeseries of non-tethered users, to see whether
   the IDs of tethered users differ from those of non-tethered users.
4. subtask_tcp_seq_ack
   Group TCP packets by <src IP, src port, dst IP, dst port> and manually
   check the sequence/ACK numbers.
5. subtask_http_agents
   List all the User-Agents from the HTTP headers of each user.
6. subtask_clock_skew
   Calculate the clock skew using TCP Timestamp.
   a) calculate_clock_skew.pl
      I followed the method in the paper "Remote physical device
      fingerprinting", page 5:
        t_i: the receiving time of packet i
        T_i: the TCP Timestamp of packet i
        Hz:  the clock frequency (so T_i / Hz ~ the tx time)
      Let x_i = t_i - t_1, the interval between packet i and packet 1 as
      observed by the receiver.
      Let w_i = (T_i - T_1) / Hz, the interval between packet i and packet 1
      at the sender.
      Let y_i = w_i - x_i, the observed offset of packet i.
      The slope of the offset-set (x_i, y_i) is then the skew of the clock.
   b) calculate_clock_skew_remove_rtt.pl
      Based on the code above. Tries to remove the one-way delay, estimated
      from the RTT, from the TCP Timestamp -- but this gives no improvement.
   c) calculate_clock_skew_sue.pl
      Based on "calculate_clock_skew.pl", but uses Sue Moon's algorithm to
      calculate the clock skew.
      ClockSkewMoon.pm
      My implementation of Sue Moon's clock skew estimation algorithm proposed
      in "Estimation and Removal of Clock Skew from Network Delay Measurements".
   d) calculate_clock_skew_remove_delay_intermediate_sue.pl
      Based on "calculate_clock_skew_sue.pl". Takes two trace files as input,
      one from the destination node and one from an intermediate node, and
      tries to remove the one-way delay by combining the two traces.
   e) calculate_clock_skew_sue_segment.pl
      Modified from calculate_clock_skew_sue.pl. Takes its output and
      calculates the clock skew of smaller segments.
      - batch run:
        i)  batch_40_machines_avg_std_skew.sh
        ii) batch_40_machines_seg_size.sh
7. subtask_boot_time
   Calculate the boot time using the TCP Timestamp option.
   a) group_by_tcp_timestamp.pl
      Reads the processed pcap file (which includes TCP Timestamp) and
      calculates the Timestamp at some time t. It also outputs the calculated
      Timestamp of each IP.
   b) group_by_tcp_timestamp.m
      Reads the output of "group_by_tcp_timestamp.pl" and uses K-Means to
      classify the calculated Timestamps. It then uses the "partition index"
      to determine which number of partitions yields the best clustering.
   c) group_by_tcp_timestamp.java
      Reads the output of "group_by_tcp_timestamp.pl" and uses "X-Means" and
      "DBScan" to see how many clusters there are. The clustering methods are
      implemented by weka.
   d) group_by_tcp_timestamp_elki.java
      Reads the output of "group_by_tcp_timestamp.pl" and uses "DBScan" to see
      how many clusters there are. The clustering method is implemented by ELKI.
   e) group_by_tcp_timestamp_elki.sh
      Reads the output of "group_by_tcp_timestamp.pl" and uses "DBScan" to see
      how many clusters there are. This script uses the command mode (GUI mode)
      of ELKI.
   f) group_by_tcp_timestamp_boottime.pl
      Based on "group_by_tcp_timestamp.pl", but instead of calculating the
      initial TCP Timestamp, this one calculates the boot time.
   g) - MyBootTime.pm
      - estimate_boot_time.pl
      - estimate_freqs.pl
      The code tries to implement recursive functions to better estimate the
      boot time and frequency -- but it needs some modification to work.
      The basic idea is to iterate over all possible frequencies and apply each
      of them to calculate the boot time from each packet. If the packets are
      from the same device, the differences between the calculated boot times
      should be small.
      MyBootTime.pm is called by "estimate_boot_time.pl" and "estimate_freqs.pl".
   h) - analyze_freq.pl
        Analyze how the frequency changes over packets.
        a) latest frequency
        b) avg frequency
        c) avg frequency in a time window
        d) EWMA
      - analyze_freq_per_flow.pl
        Same as above, but per flow.
      batch run: analyze_freq.sh
8. subtask_boot_time.stable
   time_to_stable_boottime_per_flow.pl
   Analyze the following things:
   a) How long the frequency per flow takes to become stable
   b) How long the flows are (in seconds)
   c) How many packets per flow
   d) How much traffic per flow
   e) How many flows per IP
9. subtask_boot_time.deep_with_user_agent
   - get_freq_of_ip_labeled_by_user_agent.pl
     Combine the TTL, boot time, and clock frequency heuristics to detect
     tethering usage. The result is compared with that of the User Agent
     heuristic.
   - calculate_flow_statistics.pl
     Analyze the following things:
     a) the ratio of flows/IPs with User Agent
     b) the ratio of flows/IPs with recognizable User Agent
        (i.e. User-Agent with keywords like "windows", "android", "mac os", ...)
     c) the ratio of flows/IPs with Timestamp
     d) the ratio of flows/IPs with Timestamp that are long enough to estimate
        an accurate boot time
   - calculate_boot_time_intervals.pl
     Get the boot time intervals from flows with different User Agents of an IP.
   - sanity_check_unstable_freq.sh
     Use tshark to output RTT, TCP Timestamp, and packet receiving time to
     manually calculate the frequency. This is used to check that the
     frequency is indeed unstable and whether that is related to RTT.
10. subtask_compare
    Compare the result of any heuristic with a more reliable one (e.g. User
    Agent) and evaluate the parameters/correctness of the heuristic.
11. subtask_tcp_flavor
    Identify the TCP flavor by congestion window size.
12. subtask_tcp_timestamp
    - timestamp_statistics.pl
      Gets the following TCP Timestamp statistics:
      a) # flows w/ and w/o TS
      b) # IPs w/ and w/o TS
      c) # flows w/ TS of various OS
      d) # IPs w/ TS of various OS
      e) # flows w/o TS of various OS
      f) # IPs w/o TS of various OS
13. subtask_iphone_freq
    Figure out why the clock frequency changes over time and, if possible,
    find the pattern.
    - analyze_freq_per_flow[| 2 | 3 | 4].pl
      Each script uses a different way to calculate the frequency of flows and
      evaluates it using the standard deviation of the estimated boot time.

/***************
 * Sprint tools
 ***************/
1. a) pcapParser.c
      only IP layer info
      - input: pcap_file
      - output: print
        format <time> <time usec> <src ip> <dest ip> <proto> <ttl> <id> <length>
      - batch_pcapParser.sh
        - run pcapParser.c in batch
        - input files: /data/ychen/sprint/pcap
        - output files: /data/ychen/sprint/text
      note. ignore IP-ENCAP, IP fragmentation
   b) pcapParser2.c
      IP and TCP info
      - input: pcap_file
      - output: print
        format <time> <time usec> <src ip> <dest ip> <proto> <ttl> <id> <length> <src port> <dst port> <seq> <ack seq> <flag fin> <flag syn> <flag rst> <flag push> <flag ack> <flag urg> <flag ece> <flag cwr> <win> <urp> <payload len>
      - batch_pcapParser2.sh
        - run pcapParser2.c in batch
        - input files: /data/ychen/sprint/pcap
        - output files: /data/ychen/sprint/text2
      note. ignore IP-ENCAP, IP fragmentation
   c) pcapParser3.c
      List all the User-Agents from the HTTP headers of each user
      - input: pcap_file
      - output:
        format <time> <time usec> <src ip> <dest ip> <proto> <ttl> <id> <length> <src port> <dst port> <seq> <ack seq> <flag fin> <flag syn> <flag rst> <flag push> <flag ack> <flag urg> <flag ece> <flag cwr> <win> <urp> <payload len> <http header> <new line>
      - batch_pcapParser3.sh
        - run pcapParser3.c in batch
        - input files: /data/ychen/sprint/pcap
        - output files: /data/ychen/sprint/text3
   d) pcapParser4.c
      reads in a pcap file and outputs "UDP" info
      - input: pcap_file
      - output:
        format <time> <src ip> <dest ip> <proto> <ttl> <id> <length> <src port> <dst port> <length>
      - batch_pcapParser4.sh
        - run pcapParser4.c in batch
        - input files: /data/ychen/sprint/pcap
        - output files: /data/ychen/sprint/text4
   e) pcapParser5.c
      reads in a pcap file and finds whether there is a timestamp option
      - input: pcap_file
      - output:
        format <time> <src ip> <dest ip> <proto> <ttl> <id> <length> <src port> <dst port> <seq> <ack seq> <flag fin> <flag syn> <flag rst> <flag push> <flag ack> <flag urg> <flag ece> <flag cwr> <win> <urp> <payload len> <timestamp> <timestamp reply>
      - batch_pcapParser5.sh
        - run pcapParser5.c in batch
        - input files: /data/ychen/sprint/pcap
        - output files: /data/ychen/sprint/text5
   f) pcapParser6.c
      reads in a pcap file and finds the Window Scale option
      - input: pcap_file
      - output:
        format <time> <time usec> <src ip> <dest ip> <proto> <ttl> <id> <length> <src port> <dst port> <seq> <ack seq> <flag fin> <flag syn> <flag rst> <flag push> <flag ack> <flag urg> <flag ece> <flag cwr> <win> <urp> <payload len> <window scale>
      - batch_pcapParser6.sh
        - run pcapParser6.c in batch
        - input files: /data/ychen/sprint/pcap
        - output files: /data/ychen/sprint/text6

2. a) analyze_sprint_text.pl
      Group packets in flows, and analyze TTL, tput, pkt number, packet length entropy.
      - input: parsed_pcap_text
        format <time> <time usec> <src ip> <dest ip> <proto> <ttl> <id> <length>
      - output
        ./output/
        a) file.<id>.tput.ts.txt: total throughput timeseries
        b) file.<id>.pkt.ts.txt total packet number timeseries
        c) file.<id>.ids.ts.txt IP ID of each packet of each flow
        d) file.<id>.ttl.txt TTL of each flow
        e) file.<id>.ttl.ts.txt timeseries of # of unique TTLs of each flow
        f) file.<id>.tput.ts.txt timeseries of tput of each flow
        g) file.<id>.pkt.ts.txt timeseries of # of packets of each flow
        i) file.$file_id.len_entropy.ts.txt timeseries of packet len entropy of each flow
      - batch_pcapParser.sh
        - run analyze_sprint_text.pl in batch
        - input files: /data/ychen/sprint/text
        - output files:
          a) output of analyze_sprint_text.pl
          b) log file: /export/home/ychen/sprint/output/<input file>.log
      note. a flow here means a unique <src IP, dst IP> pair
      XXX: should be changed to <src IP>??
   b) analyze_sprint_text_inter_arrival_time.pl
      Group packets in flows, and analyze inter-arrival time.
      - input: parsed_pcap_text
        format <time> <time usec> <src ip> <dest ip> <proto> <ttl> <id> <length>
      - output
        ./output/
        a) file.<id>.inter_arrival_time.ts.txt timeseries of inter-arrival time of each flow
   c) analyze_sprint_tcp_connections.pl
      Analyze # connections (i.e. <src ip> <src port> <dst ip> <dst port>) of different time bins
      - input: parsed_pcap_text
        format: <time> <time usec> <src ip> <dest ip> <proto> <ttl> <id> <length> <src port> <dst port> <seq> <ack seq> <flag fin> <flag syn> <flag rst> <flag push> <flag ack> <flag urg> <flag ece> <flag cwr> <win> <urp> <payload len>
      - output
        ./output/
        a) file.<id>.connections.bin<time bin size>.txt timeseries of # of connections
   d) analyze_sprint_tcp_rtt.pl
      Group packets in flows, and analyze TTL, tput, pkt number, packet length entropy.
   e) analyze_sprint_text_inter_arrival_time.pl
      Analyze # of TCP/UDP connections (i.e. <src ip> <src port> <dst ip> <dst port>) of different time bins
      @timebins = (1, 5, 10, 60, 600); ## the time bin size we want to analyze
   f) analyze_sprint_http_user_agents.pl
      Search the HTTP User-Agent for the OS and device using the following keywords:
      OS: Windows, Microsoft, Android, MAC, Ubuntu
      device: HTC, Samsung, LGE, NOKIA, Windows Phone, iPhone, iPad, MacBookAir

3. a) detect_tethering.pl
      Reads in results from "analyze_sprint_text.pl" and
      "analyze_sprint_text_inter_arrival_time.pl" to detect tethering usage.
      a) The detection is based on the number of different TTLs per second.
      b) After detecting tethered clients, calculate:
         i) how many tethered clients there are.
         ii) how much traffic is generated by tethered clients.
      c) Find possible metrics for the detection confidence, and do the
         inter-flow/intra-flow analysis
         i) tput
         ii) # pkts
         iii) pkt length entropy
      - input: file_id
        The file ID of the 3-hr Sprint Mobile Dataset. This program uses this ID
        to look up the output files from "analyze_sprint_text.pl", i.e.
        ./output/
        a) file.<id>.tput.ts.txt: total throughput timeseries
        b) file.<id>.pkt.ts.txt total packet number timeseries
        c) file.<id>.ids.ts.txt IP ID of each packet of each flow
        d) file.<id>.ttl.txt TTL of each flow
        e) file.<id>.ttl.ts.txt timeseries of # of unique TTLs of each flow
        f) file.<id>.tput.ts.txt timeseries of tput of each flow
        g) file.<id>.pkt.ts.txt timeseries of # of packets of each flow
        i) file.$file_id.len_entropy.ts.txt timeseries of packet len entropy of each flow
        j) file.$file_id.inter_arrival_time.ts.txt timeseries of inter-arrival time of each flow
      - output:
        a) Assume the TTL heuristic is perfect:
           i) how many tethered clients there are.
           ii) how much traffic is generated by tethered clients.
        b) intra-flow analysis: the ratio of non-tethered traffic to tethered traffic
           i) tput
           ii) # pkts
           iii) pkt length entropy
           iv) mean of inter-arrival time
           v) stdev of inter-arrival time
        c) inter-flow analysis: the ratio of non-tethered traffic to tethered traffic
           i) tput
           ii) # pkts
           iii) pkt length entropy
           iv) mean of inter-arrival time
           v) stdev of inter-arrival time
        d) fig: generate #TTL/tput/#pkts/pkt_len_entropy timeseries of tethered clients detected by TTL
           ./figures_ttl/tehtered.<file_id>.<IP>.ts.txt.eps
        e) fig: generate IDs timeseries of tethered clients detected by TTL
           ./figures_ttl/tehtered.<file_id>.<IP>.ids.txt.eps
      - batch_detect_tethering.sh
        - run detect_tethering.pl in batch
        - input files: /export/home/ychen/sprint/output/
        - output files: /export/home/ychen/sprint/output/detect.TTL.<input file ID>.log
   b) detect_tethering_TTL.pl
      Reads in results from "analyze_sprint_text.pl" and detects tethering usage by TTL.
      a) > 1 TTL across the whole trace
      b) > 1 TTL at any second
      - input: file_id
        The file ID of the 3-hr Sprint Mobile Dataset. This program uses this ID
        to look up the output files from "analyze_sprint_text.pl", i.e.
        ./output/
        a) file.<id>.ttl.txt TTLs of each flow
        b) file.<id>.ttl.ts.txt timeseries of # of unique TTLs of each flow
      - output: IP of tethered clients.
        a) ./tethered_clients/TTL_whole_trace.<file id>.txt
        b) ./tethered_clients/TTL_one_second.<file id>.txt
   c) detect_tethering_connections.pl
      Reads in results from "analyze_sprint_tcp_connections.pl" and detects
      tethering using the number of connections.
      e.g. > n connections at any time using time bin size b
      - input: file_id
        The file ID of the 3-hr Sprint Mobile Dataset. This program uses this ID
        to look up the output files from "analyze_sprint_tcp_connections.pl", i.e.
        ./output/
        a) file.<id>.connections.bin<time bin size>.txt timeseries of # of connections
      - output: IP of tethered clients.
        ./tethered_clients/Connections_timebin<time bin size>.threshold<threshold>.<file id>.txt
   d) detect_tethering_rtt.pl
      Reads in results from "analyze_sprint_tcp_rtt.pl" and detects tethering
      using the variance of RTT.
      e.g. when the variance of RTT to the same destination is larger than some threshold
      - input: file_id
        The file ID of the 3-hr Sprint Mobile Dataset. This program uses this ID
        to look up the output files from "analyze_sprint_tcp_rtt.pl", i.e.
        ./output/
        file.<id>.rtts.txt: the RTT to different destinations
        format: <src ip>, <dst ip>, <RTTs>
      - output: IP of tethered clients.
        ./tethered_clients/RTT_variance.threshold<threshold>.<file id>.txt
   e) detect_tethering_inter_arrival_time.pl
      Reads in results from "analyze_sprint_text_inter_arrival_time.pl" and
      detects tethering usage by inter-arrival time.
      a) The mean of inter-arrival time is smaller than some threshold
      b) The stdev of inter-arrival time is larger than some threshold
   f) detect_tethering_tput.pl
      Reads in results from "analyze_sprint_text.pl" and detects tethering usage by throughput.
      e.g. The tput is larger than some threshold
   g) detect_tethering_pkt_len_entropy.pl
      Reads in results from "analyze_sprint_text.pl" and detects tethering usage
      by the entropy of pkt length.
      e.g. The entropy is larger than some threshold
   i) detect_tethering_TTL_default_value.pl
      Reads in results from "analyze_sprint_text.pl" and detects tethering usage by TTL.
      e.g. TTL != 63, 127, or 254
   j) detect_tethering_TTL_diff.pl
      Reads in results from "analyze_sprint_text.pl" and detects tethering usage by TTL.
      e.g. if there are multiple TTLs and their difference is 1 (or < some small number)
   k) detect_tethering_udp_connections.pl
   l) detect_tethering_tcp_udp_connections.pl
   m) - detect_tethering_user_agent.pl
      - detect_tethering_user_agent_no_win.pl
      Reads in results from "analyze_sprint_http_user_agents.pl" and detects
      tethering using the number of OSs and devices.
      The only difference in "detect_tethering_user_agent_no_win.pl" is that it
      doesn't count "Windows" machines behind the mobile station. This is only
      used for the boot_time based method, because Windows disables the TCP
      Timestamp option by default.
   n) detect_tethering_boot_time.pl
      Detect tethering by calculating the boot time.

4. cross_validate_detected_ip.pl
   Reads the IPs of tethered clients detected by different methods, and
   cross-validates the overlap.
   - input: IP of tethered clients.
     Possible base:
     a) TTL (whole trace): ./tethered_clients/TTL_whole_trace.<file id>.txt
     b) TTL (one second) : ./tethered_clients/TTL_one_second.<file id>.txt
     c) TTL (default value) : TTL_default_value.<file id>.txt
     d) User Agent : User_agent.<file id>.txt
     e) TTL (diff) : TTL_diff.<file id>.txt
     Evaluation Methods:
     c) Connections : Connections_timebin<time bin size>.threshold<threshold>.<file id>.txt
        Time bins = (1, 5, 10, 60, 600)
        Thresholds = (2 .. 30)
     d) RTT (variance) : RTT_variance.threshold<threshold>.<file id>.txt
        Thresholds = (0.05, 0.1, 0.15, 0.2, 0.25, 0.3, .. , 0.8)
     e) Inter-arrival time (mean) : Inter_arrival_time_mean.threshold<threshold>.<file id>.txt
        Thresholds = (0.005, 0.01, 0.02, 0.03, 0.05, .. , 4)
     f) Inter-arrival time (stdev): Inter_arrival_time_stdev.threshold<threshold>.<file id>.txt
        Thresholds = (0.005, 0.01, 0.15, 0.2, 0.25, .. , 10)
     g) Throughput : Tput_whole_trace.threshold<threshold>.<file id>.txt
        Thresholds = (10, 15, 20, 25, 30, 40, 50, 60, .. , 10000)
     h) Pkt length Entropy : Pkt_len_entropy.timebin<time bin size>.threshold<threshold>.<file id>.txt
        Time bins = (1, 600)
        Thresholds = (0.01, 0.015, 0.02, 0.025, 0.03, .. , 2)
     g) UDP Connections : UDP_Connections_timebin<time bin size>.threshold<threshold>.<file id>.txt
        Time bins = (1, 5, 10, 60, 600)
        Thresholds = (2 .. 30)
     h) TCP/UDP Connections : TCP_UDP_Connections_timebin<time bin size>.threshold<threshold>.<file id>.txt
        Time bins = (1, 5, 10, 60, 600)
        Thresholds = (2 .. 30)
     i) Boot Time : boot_time.method_<methods>.<parameters>.DIFF_<time diff>.NUM_<num pkt>.<file id>.txt
        Frequency estimation methods: (1, 2, 3)
          1 = WINDOW based
          2 = EWMA based
          3 = last calculated freq
        Frequency estimation parameters:
          1: (10, 100)
          2: (0.5, 0.9)
          3: (1)
        THRESHOLD_EST_RX_DIFF = (1 5 30 120)
        OUT_RANGE_NUM = (1 5 10)
   - output:
     a) How many clients are detected by 1/2/3/4/5... methods
        ./tethered_clients/summary.<file id>.number_methods.txt
        format: <number of methods> <number of tethered clients>
     b) overlap between methods
        ./tethered_clients/summary.<file id>.cross_validation.txt
        format: <method1> <method2> <overlap> <only by former> <only by latter> <# total detected clients> <overlap ratio> <only by former ratio> <only by latter ratio>

5. evaluate_methods_based_on_TTL.pl
   There are many methods to detect tethering. This program uses the TTL
   heuristic as ground truth, and calculates the precision and recall of the
   other methods with various parameters (e.g. different thresholds).
   - input: IP of tethered clients.
     Possible base:
     a) TTL (whole trace): ./tethered_clients/TTL_whole_trace.<file id>.txt
     b) TTL (one second) : ./tethered_clients/TTL_one_second.<file id>.txt
     c) TTL (default value) : TTL_default_value.<file id>.txt
     d) User Agent : User_agent.<file id>.txt
     e) TTL (diff) : TTL_diff.<file id>.txt
     Evaluation Methods:
     c) Connections : Connections_timebin<time bin size>.threshold<threshold>.<file id>.txt
        Time bins = (1, 5, 10, 60, 600)
        Thresholds = (2 .. 30)
     d) RTT (variance) : RTT_variance.threshold<threshold>.<file id>.txt
        Thresholds = (0.05, 0.1, 0.15, 0.2, 0.25, 0.3, .. , 0.8)
     e) Inter-arrival time (mean) : Inter_arrival_time_mean.threshold<threshold>.<file id>.txt
        Thresholds = (0.005, 0.01, 0.02, 0.03, 0.05, .. , 4)
     f) Inter-arrival time (stdev): Inter_arrival_time_stdev.threshold<threshold>.<file id>.txt
        Thresholds = (0.005, 0.01, 0.15, 0.2, 0.25, .. , 10)
     g) Throughput : Tput_whole_trace.threshold<threshold>.<file id>.txt
        Thresholds = (10, 15, 20, 25, 30, 40, 50, 60, .. , 10000)
     h) Pkt length Entropy : Pkt_len_entropy.timebin<time bin size>.threshold<threshold>.<file id>.txt
        Time bins = (1, 600)
        Thresholds = (0.01, 0.015, 0.02, 0.025, 0.03, .. , 2)
     g) UDP Connections : UDP_Connections_timebin<time bin size>.threshold<threshold>.<file id>.txt
        Time bins = (1, 5, 10, 60, 600)
        Thresholds = (2 .. 30)
     h) TCP/UDP Connections : TCP_UDP_Connections_timebin<time bin size>.threshold<threshold>.<file id>.txt
        Time bins = (1, 5, 10, 60, 600)
        Thresholds = (2 .. 30)
     i) Boot Time : boot_time.method_<methods>.<parameters>.DIFF_<time diff>.NUM_<num pkt>.<file id>.txt
        Frequency estimation methods: (1, 2, 3)
          1 = WINDOW based
          2 = EWMA based
          3 = last calculated freq
        Frequency estimation parameters:
          1: (10, 100)
          2: (0.5, 0.9)
          3: (1)
        THRESHOLD_EST_RX_DIFF = (1 5 30 120)
        OUT_RANGE_NUM = (1 5 10)
   - output:
     a) ./tethered_clients_processed_data
        format: <threshold> <TP> <FN> <FP> <TN> <precision> <recall>
     b) figure: plot the PR curve (Precision-Recall) using plot_pr.plot.mother
        ./tethered_clients_figures/

/***************
 * Helpers:
 *   some tools to help me quickly generate readable results
 ***************/
1. tool1.pl
   Reads in the output of "detect_tethering.pl" and summarizes the results from
   all files into a single file.
   - input: /export/home/ychen/sprint/output/detect.TTL.<input file ID>.log
   - output: /export/home/ychen/sprint/output/detect.TTL.summary
     format:
     <1. # tethered clients>,
     <2. # clients>,
     <3. # tethered pkts>,
     <4. # pkts>,
     <5. tethered traffic (bytes)>,
     <6. traffic (bytes)>,
     <7. intra-flow tput ratio>,
     <8. intra-flow #pkt ratio>,
     <9. intra-flow pkt len entropy ratio>,
     <10. inter-flow tput ratio>,
     <11. inter-flow #pkt ratio>,
     <12. inter-flow pkt len entropy ratio>,
     <13. inter-flow inter arrival time mean ratio>,
     <14. inter-flow inter arrival time stdev ratio>
2. tool2.pl
   Summarizes how many different TTLs are seen behind the same source IP.
   - input: ./output/file.<id>.ttl.txt
   - output: ./output/files.ttl.summary
3. tool3.pl
   Summarizes the cross-validation results.
   - input:
     a) ./tethered_clients/summary.$file_id.number_methods.txt
     b) ./tethered_clients/summary.$file_id.cross_validation.txt
   - output
     a) ./tethered_clients/summary.number_methods.txt
        format:
        <# tethered clients detected by 1 method>, <ratio>,
        <# tethered clients detected by 2 methods>, <ratio>,
        <# tethered clients detected by 3 methods>, <ratio>, ...
     b) ./tethered_clients/summary.cross_validation.txt
        <method1>, <method2>, <# overlap>, <# former>, <# latter>, <# tethered clients>, <ratio overlap>, <ratio former>, <ratio latter>,
        <method1>, <method3>, <# overlap>, <# former>, <# latter>, <# tethered clients>, <ratio overlap>, <ratio former>, <ratio latter>,
        <method2>, <method3>, <# overlap>, <# former>, <# latter>, <# tethered clients>, <ratio overlap>, <ratio former>, <ratio latter>, ...

/***************
 * todo:
 ***************/
1. implement https://www.cs.columbia.edu/~smb/papers/fnat.pdf for ID heuristic
2. implement change point detection method
3. coefficient between confidence indicators
4. Detection codes
5. figure out the packets with abnormal TTL

/***************
 * jobs logs:
 ***************/
- pcapParser (fix a bug about fragments) ........................... done
    batch_analyze_sprint_text.sh ................................... done
    batch_analyze_sprint_text_inter_arrival_time.sh ................ done
        batch_detect_tethering.sh .................................. done
        batch_detect_tethering_TTL.sh .............................. done
        batch_detect_tethering_inter_arrival_time.sh ............... done
        batch_detect_tethering_pkt_len_entropy.sh .................. done
        batch_detect_tethering_tput.sh ............................. done
        batch_detect_tethering_TTL_default_value.sh ................ done
- pcapParser2 (fix a bug about fragments) .......................... done
    batch_analyze_sprint_tcp_connections.sh ........................ done
        batch_detect_tethering_connections.sh ...................... done
    batch_analyze_sprint_tcp_rtt.sh ................................ done
        batch_detect_tethering_rtt.sh .............................. done
    subtask_tcp_seq_ack/batch_analyze_sprint_tcp_seq.sh ............ done
- batch_pcapParser3.sh ............................................. done
    batch_analyze_sprint_http_user_agents.sh ....................... done
        batch_detect_tethering_user_agent.sh ....................... done
- batch_pcapParser4.sh ............................................. done
    batch_analyze_sprint_udp_connections.sh ........................ done
        batch_detect_tethering_udp_connections.sh .................. done
    batch_analyze_sprint_tcp_udp_connections.sh .................... done
        batch_detect_tethering_tcp_udp_connections.sh .............. done
- batch_pcapParser5.sh
        batch_detect_tethering_boot_time.sh
- batch_pcapParser6.sh
- batch_evaluate_methods_based_on_TTL.sh ........................... done
  batch_cross_validation.sh ........................................ done

/***************
 * note:
 ***************/
1. The tethering detection methods are applied to all IPs in the trace,
   including clients and servers. It is meaningless to check servers for
   tethering, and the methods may not make sense for them (e.g. TTL is
   collected on the client side and varies depending on the route). So,
   during the evaluation, I didn't count the IPs which are not from the
   cellular network (i.e. 28.XXX.XXX.XXX).
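
The offset-set construction described under subtask_clock_skew (x_i = t_i - t_1, w_i = (T_i - T_1) / Hz, y_i = w_i - x_i) can be sketched in a few lines. This is only an illustration, not the code in calculate_clock_skew.pl: the function name clock_skew and its arguments are made up here, ordinary least squares is swapped in for the line fit (the script may fit the slope differently), and 32-bit TCP Timestamp wraparound is not handled.

```python
def clock_skew(rx_times, tcp_timestamps, hz):
    """Slope of the offset-set (x_i, y_i), fitted by ordinary least squares.

    rx_times       -- t_i, receive time of each packet in seconds
    tcp_timestamps -- T_i, TCP Timestamp value of each packet (ticks)
    hz             -- assumed nominal tick rate of the sender's timestamp clock
    """
    if len(rx_times) != len(tcp_timestamps) or len(rx_times) < 2:
        raise ValueError("need at least two (t_i, T_i) pairs of equal length")

    t1, T1 = rx_times[0], tcp_timestamps[0]
    xs = [t - t1 for t in rx_times]                    # x_i = t_i - t_1
    ys = [(T - T1) / hz - x                            # y_i = w_i - x_i
          for T, x in zip(tcp_timestamps, xs)]

    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all packets share the same receive time")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx                                   # seconds of drift per second
```

A sender whose 1000 Hz timestamp clock runs 100 ppm fast yields a slope of about 1e-4; multiply by 1e6 to express the skew in ppm.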
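
The basic idea noted under subtask_boot_time (try every candidate frequency, compute a boot time from each packet, and prefer the frequency for which the boot times agree) might look roughly like the sketch below. The candidate list CANDIDATE_HZ, the use of the population standard deviation as the "difference" measure, and all function names are assumptions for illustration; MyBootTime.pm may do this differently.

```python
from statistics import pstdev

# Hypothetical candidate tick rates; the real scripts may search another set.
CANDIDATE_HZ = (100, 250, 1000)


def boot_times(rx_times, tcp_timestamps, hz):
    """Boot time implied by each packet: receive time minus uptime (T_i / Hz)."""
    return [t - T / hz for t, T in zip(rx_times, tcp_timestamps)]


def best_frequency(rx_times, tcp_timestamps, candidates=CANDIDATE_HZ):
    """Return (hz, mean boot time, spread) for the candidate with the smallest spread."""
    if not rx_times or len(rx_times) != len(tcp_timestamps):
        raise ValueError("need equally long, non-empty packet lists")

    best = None
    for hz in candidates:
        spread = pstdev(boot_times(rx_times, tcp_timestamps, hz))
        if best is None or spread < best[1]:
            best = (hz, spread)

    hz, spread = best
    estimates = boot_times(rx_times, tcp_timestamps, hz)
    return hz, sum(estimates) / len(estimates), spread
```

If packets behind one IP only produce a small spread when split into several groups, that is a hint that more than one device (boot time) sits behind the address, which is what the clustering scripts in this subtask look for.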
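
The frequency estimators listed for analyze_freq.pl and for the boot time method (WINDOW based, EWMA based, last calculated freq) can be illustrated as follows. Several details are guesses and are marked as such: the per-packet frequency is taken between consecutive packets, the window is counted in samples rather than seconds, and alpha weights the previous estimate. None of this is confirmed by the scripts.

```python
def instantaneous_freqs(rx_times, tcp_timestamps):
    """Tick rate between consecutive packets (assumed definition): dT / dt."""
    freqs = []
    pairs = list(zip(rx_times, tcp_timestamps))
    for (t0, T0), (t1, T1) in zip(pairs, pairs[1:]):
        if t1 > t0:                      # skip packets with identical receive times
            freqs.append((T1 - T0) / (t1 - t0))
    return freqs


def latest(freqs):
    """Method 3: the last calculated frequency."""
    if not freqs:
        raise ValueError("no frequency samples")
    return freqs[-1]


def window_average(freqs, size):
    """Method 1: mean of the last `size` samples (window in samples is an assumption)."""
    if not freqs or size < 1:
        raise ValueError("need samples and a positive window size")
    window = freqs[-size:]
    return sum(window) / len(window)


def ewma(freqs, alpha):
    """Method 2: est = alpha * est + (1 - alpha) * sample (weighting is an assumption)."""
    if not freqs:
        raise ValueError("no frequency samples")
    est = freqs[0]
    for sample in freqs[1:]:
        est = alpha * est + (1 - alpha) * sample
    return est
```

With the parameters listed above, this corresponds to window_average(freqs, 10) or window_average(freqs, 100), and ewma(freqs, 0.5) or ewma(freqs, 0.9).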
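
The three TTL heuristics (more than one TTL across the whole trace, TTL != 63, 127, or 254, and multiple TTLs that differ by 1) are simple set checks per source IP. The sketch below assumes the input is an iterable of (src_ip, ttl) pairs; that input shape, the function names, and the example addresses are hypothetical and do not reflect the file formats used by the detect_tethering_TTL*.pl scripts.

```python
from collections import defaultdict

# Expected TTLs at the capture point, as listed for detect_tethering_TTL_default_value.pl.
EXPECTED_TTLS = frozenset({63, 127, 254})


def ttls_by_source(packets):
    """Collect the set of TTLs seen for each source IP."""
    seen = defaultdict(set)
    for src_ip, ttl in packets:
        seen[src_ip].add(ttl)
    return seen


def detect_multiple_ttl(seen):
    """> 1 TTL across the whole trace."""
    return {ip for ip, ttls in seen.items() if len(ttls) > 1}


def detect_non_default_ttl(seen, expected=EXPECTED_TTLS):
    """At least one TTL that is not an expected default value."""
    return {ip for ip, ttls in seen.items() if ttls - expected}


def detect_ttl_diff(seen, max_diff=1):
    """Multiple TTLs where two neighbouring values differ by at most max_diff."""
    flagged = set()
    for ip, ttls in seen.items():
        ordered = sorted(ttls)
        if any(b - a <= max_diff for a, b in zip(ordered, ordered[1:])):
            flagged.add(ip)
    return flagged


# Usage with made-up packets: .2 shows 63 and 64 (one extra hop), .3 shows 63 and 127.
packets = [("28.0.0.1", 63), ("28.0.0.1", 63),
           ("28.0.0.2", 63), ("28.0.0.2", 64),
           ("28.0.0.3", 127), ("28.0.0.3", 63)]
seen = ttls_by_source(packets)
```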
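
The output row of evaluate_methods_based_on_TTL.pl (<threshold> <TP> <FN> <FP> <TN> <precision> <recall>) follows from plain set arithmetic once one method is taken as ground truth. A minimal sketch, assuming each method's result is available as a set of client IPs; the function names, the four-decimal formatting, and the choice to report 0.0 when a ratio is undefined are assumptions, not taken from the script.

```python
def confusion(ground_truth, detected, all_clients):
    """Return (TP, FN, FP, TN, precision, recall) for one method and threshold."""
    tp = len(detected & ground_truth)
    fp = len(detected - ground_truth)
    fn = len(ground_truth - detected)
    tn = len(all_clients - detected - ground_truth)
    # Assumed convention: an undefined ratio (empty denominator) is reported as 0.0.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return tp, fn, fp, tn, precision, recall


def format_row(threshold, stats):
    """One line in the <threshold> <TP> <FN> <FP> <TN> <precision> <recall> layout."""
    tp, fn, fp, tn, precision, recall = stats
    return f"{threshold} {tp} {fn} {fp} {tn} {precision:.4f} {recall:.4f}"
```

Sweeping the threshold and collecting the (recall, precision) pairs gives the points of the PR curve.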