
shadow / shadow


Shadow is a discrete-event network simulator that directly executes real application code, enabling you to simulate distributed systems with thousands of network-connected processes in realistic and scalable private network experiments using your laptop, desktop, or server running Linux.

Home Page: https://shadow.github.io

License: Other

CMake 1.51% Python 1.98% C 18.20% C++ 0.30% Shell 0.44% Rust 77.50% CSS 0.02% Go 0.05%
emulation simulation networking experimentation tor science-research scientific-computing scalability realism control

shadow's Introduction

The Shadow Simulator

Quickstart

After installing the dependencies, build, test, and install Shadow into ~/.local:

$ ./setup build --clean --test
$ ./setup test
$ ./setup install

Read the usage guide or get started with some example simulations.

What is Shadow?

Shadow is a discrete-event network simulator that directly executes real application code, enabling you to simulate distributed systems with thousands of network-connected processes in realistic and scalable private network experiments using your laptop, desktop, or server running Linux.

Shadow experiments can be scientifically controlled and deterministically replicated, making it easier for you to reproduce bugs and eliminate confounding factors in your experiments.

How Does Shadow Work?

Shadow directly executes real applications:

  • Shadow directly executes unmodified, real application code using native OS (Linux) processes.
  • Shadow co-opts the native processes into a discrete-event simulation by interposing at the system call API.
  • The necessary system calls are emulated such that the applications need not be aware that they are running in a Shadow simulation.

Shadow connects the applications in a simulated network:

  • Shadow constructs a private, virtual network through which the managed processes can communicate.
  • Shadow internally implements simulated versions of common network protocols (e.g., TCP and UDP).
  • Shadow internally models network routing characteristics (e.g., path latency and packet loss) using a configurable network graph.

Why is Shadow Needed?

Network emulators (e.g., mininet) run real application code on top of real OS kernels in real time, but are non-deterministic and have limited scalability: time distortion can occur if emulated processes exceed an unknown computational threshold, leading to undefined behavior.

Network simulators (e.g., ns-3) offer more experimental control and scalability, but have limited application-layer realism because they run application abstractions in place of real application code.

Shadow offers a novel, hybrid emulation/simulation architecture: it directly executes real applications as native OS processes in order to faithfully reproduce application-layer behavior while also co-opting the processes into a high-performance network simulation that can scale to large distributed systems with hundreds of thousands of processes.

Caveats

Shadow implements over 150 functions from the system call API, but does not yet fully support all API features. Although applications that make basic use of the supported system calls should work out of the box, those that use more complex features or functions may not yet function correctly when running in Shadow. Extending support for the API is a work-in-progress.

That being said, we are particularly motivated to run large-scale Tor Network simulations. This use-case is already fairly well-supported and we are eager to continue extending support for it.

More Information

Homepage: https://shadow.github.io

Documentation:

Community Support:

Bug Reports:

shadow's People

Contributors

ahf, amiller, apflux, caffeineshock, cauthu, cclauss, cohosh, congyu-liu, dependabot[bot], jaredthecoder, jdgeddes, jtracey, kloesing, ln5, lorenzo9uerra, mileb, mjptree, pastly, ppopth, robgjansen, rwails, salmanmunaf, sjmurdoch, sporksmith, stevenengler, tomstealinc, trinity-1686a, valdaarhun, vrask, zyansheep


shadow's Issues

Regression on repeatability

Experiments are not reproducible due to the CPU delay model. Running the same experiment twice will result in different physical CPU measurements and, in turn, potentially different ordering of event execution.

The regression started in #34, more specifically in 423b3ad

Scallion simulations with large topologies never finish

Simulation dies above 4000 ticks when running with 2600 clients, 40 relays, and 200 servers.

When it dies, memory usage is around 30GB and single-threaded CPU usage is at 100%. Well over 50% of the CPU usage is system calls.

The tail of the scallion output log looks normal, but is clearly truncated:

|t=4389.568|s=0|w=1|shadow|105.221.37.25|1178.proxy.shd| WARNING: vsocket_setsockopt: setsockopt not implemented
|t=4389.570|s=0|w=1|shadow|209.1.219.198|2499.proxy.shd| WARNING: vsocket_setsockopt: setsockopt not implemented
|t=4389.573|s=0|w=1|module|105.221.37.25|1178.proxy.shd| [tor-warn] log_unsafe_socks_warning() Your application (using socks5 to port 8080) is giving Tor only an IP address. Applications that do DNS resolves themselves may leak information. Consider using Socks4A (e.g. via privoxy or socat) instead. For more information, please see https://wiki.torproject.org/TheOnionRouter/TorFAQ#SOCKSAndDNS.
|t=4389.575|s=0|w=1|module|209.1.219.198|2499.proxy.shd| [tor-warn] log_unsafe_socks_warning() Your application (using socks5 to port 8080) is giving Tor only an IP address. Applications that do DNS resolves themselves may leak information. Consider using Socks4A (e.g. via privoxy or socat) instead. For more information, please see https://wiki.torproject.org/TheOnionRouter/TorFAQ#SOCKSAndDNS.
|t=4389.599|s=0|w=1|shadow|113.24.24.95|5.exit.shd| WARNING: vsocket_setsockopt: setsockopt not implemented
|t=4389.609|s=0|w=1|shadow|113.24.24.95|5.exit.shd| WARNING: vsocket_setsockopt: setsockopt not implemented
|t=4389.645|s=0|w=1|shadow|89.179.146.148|15.exit.shd| WARNING: vsocket_setsockopt: setsockopt not implemented
|t=4389.662|s=0|w=1|module|25.185.119.55|829.proxy.shd| MESSAGE: [fg-download-complete] got first bytes in 10.625 seconds and 327680 of 327680 bytes in 21.906 seconds (download 41 of 0)
|t=4389.687|s=0|w=1|shadow|225.57.90.11|303.proxy.shd| WARNING: vsocket_s

Re-running inside of gdb shows a number of "filegetter fatal error: server closed" messages, but does not otherwise halt or segfault. When I interrupt the process, it is nearly always in the same place. Stack trace:

#0  0x00007ffff713f295 in mq_timedreceive () from /lib/x86_64-linux-gnu/librt.so.1
#1  0x0000000000424034 in pipecloud_localize_reads (pipecloud=0x652200)
    at /raid/home/cwacek/shadow/shadow-shadow-v1.0.0/src/util/pipecloud.c:205
#2  0x00000000004240b8 in pipecloud_write (pipecloud=0x652200, dest=<value optimized out>, data=0x5a39cde80 "", data_size=211)
    at /raid/home/cwacek/shadow/shadow-shadow-v1.0.0/src/util/pipecloud.c:187
#3  0x000000000040e456 in dvn_packet_route (dest_type=8 '\b', dest_layer=1 '\001', dest_major=0, 
    frametype=<value optimized out>, frame=<value optimized out>)
    at /raid/home/cwacek/shadow/shadow-shadow-v1.0.0/src/core/routing.c:88
#4  0x000000000040bde5 in dlog_channel_write (channel=<value optimized out>, data=<value optimized out>, 
    length=<value optimized out>) at /raid/home/cwacek/shadow/shadow-shadow-v1.0.0/src/core/log.c:238
#5  0x000000000040c286 in dlogf_main (level=<value optimized out>, context=<value optimized out>, fmt=<value optimized out>, 
    vargs=<value optimized out>) at /raid/home/cwacek/shadow/shadow-shadow-v1.0.0/src/core/log.c:368
#6  0x0000000000412ae1 in snricall_log (va=<value optimized out>)
    at /raid/home/cwacek/shadow/shadow-shadow-v1.0.0/src/core/node/snricall.c:152
#7  0x0000000000413260 in snricall (call_code=<value optimized out>)
    at /raid/home/cwacek/shadow/shadow-shadow-v1.0.0/src/core/node/snricall.c:366
#8  0x00007ffff62f6123 in service_filegetter_log (sfg=0x7ffff667a518, level=SFG_NOTICE, format=<value optimized out>)
    at /raid/home/cwacek/shadow/shadow-shadow-v1.0.0/src/plug-ins/filetransfer/shd-service-filegetter.c:51
#9  0x00007ffff62f6e62 in service_filegetter_report (sfg=0x7ffff667a518, sockd=30046)
    at /raid/home/cwacek/shadow/shadow-shadow-v1.0.0/src/plug-ins/filetransfer/shd-service-filegetter.c:57
#10 service_filegetter_activate (sfg=0x7ffff667a518, sockd=30046)
    at /raid/home/cwacek/shadow/shadow-shadow-v1.0.0/src/plug-ins/filetransfer/shd-service-filegetter.c:438
#11 0x00007ffff62f2584 in _plugin_socket_readable (sockd=30046)
    at /raid/home/cwacek/shadow/shadow-scallion-v1.0.0/src/scallion.c:364
#12 0x0000000000411f99 in context_execute_socket (provider=0x6c58e0, sockd=30046, can_read=1 '\001', can_write=1 '\001', 
    do_read_first=1 '\001') at /raid/home/cwacek/shadow/shadow-shadow-v1.0.0/src/core/node/context.c:131
#13 0x00000000004136ec in vepoll_execute_notification (provider=<value optimized out>, vep=0x5d0ea7f90)
    at /raid/home/cwacek/shadow/shadow-shadow-v1.0.0/src/core/vevent/vepoll.c:180
#14 0x0000000000418719 in vci_exec_event (vci_mgr=0x658b00, vci_event=0x549fa5080)
    at /raid/home/cwacek/shadow/shadow-shadow-v1.0.0/src/core/vnetwork/vci.c:1016
#15 0x000000000041052c in sim_worker_heartbeat (worker=0x6567f0, num_event_worker_executed=0x7fffffffd410)
    at /raid/home/cwacek/shadow/shadow-shadow-v1.0.0/src/core/sim_worker.c:395
#16 0x000000000040d21b in dvn_worker_main (process_id=1, total_workers=1, pipecloud=0x652200)
    at /raid/home/cwacek/shadow/shadow-shadow-v1.0.0/src/core/process.c:119
#17 0x000000000040dd86 in dvn_create_slave (daemon=0, num_processes=1, slave_listen_port=6201, socketset=0x653b20)
    at /raid/home/cwacek/shadow/shadow-shadow-v1.0.0/src/core/process.c:529
#18 0x000000000040df10 in dvn_create_instance (config=0x7fffffffd510)
    at /raid/home/cwacek/shadow/shadow-shadow-v1.0.0/src/core/process.c:625
#19 0x000000000040e04f in dvn_main (config=0x7fffffffd510)
    at /raid/home/cwacek/shadow/shadow-shadow-v1.0.0/src/core/process.c:665
#20 0x000000000040ce41 in main (argc=<value optimized out>, argv=<value optimized out>)
    at /raid/home/cwacek/shadow/shadow-shadow-v1.0.0/src/core/shd-main.c:267

Stepping through from that point in the code, it appears to loop continuously on lines 185-190 in the function below. I'm not sure whether that's proper behavior.

174     size_t pipecloud_write(pipecloud_tp pipecloud, unsigned int dest, char * data, size_t data_size) {
175             struct timespec ts;
176             int rv;
177
178             assert(dest < pipecloud->num_pipes);
179
180             ts.tv_sec = PIPECLOUD_TIMEOUT_SEC;
181             ts.tv_nsec = PIPECLOUD_TIMEOUT_NSEC;
182
183             rv = mq_timedsend(pipecloud->mqs[dest], data, data_size, 0, &ts);
184
185             while(rv != 0) {
186                     /* first, try to pull in any waiting writes that we might have to avoid deadlocks */
187                     pipecloud_localize_reads(pipecloud);
188
189                     /* then, try to send again. */
190                     rv = mq_timedsend(pipecloud->mqs[dest], data, data_size, 0, &ts);
191             }
192
193             return data_size;
194     }

remove dependency on libevent internals

Because we use internal elements of libevent structures, we need to include some internal libevent header files that are not located in the include directory. They are currently copied to the vevent directory, but we should remove dependencies on those files and remove them from the repo.

Support epoll instead of libevent

epoll is a Linux kernel interface for scalable I/O event notification. The interface is much smaller than libevent's and is built into the kernel. We should implement the interface for Shadow, intercept and redirect the epoll functions to Shadow's version, and remove the dependency on the rather large libevent library.

Ubuntu 11.04 64bit Install Path Issue

When building following the default instructions with default path, I get the following error message.

Install the project...
-- Install configuration: "Release"
-- Installing: /raid/home/cwacek/.local/include/shadow/shd-config.h
-- Installing: /raid/home/cwacek/.local/share/shadow/shadow-externals.cmake
-- Installing: /raid/home/cwacek/.local/share/shadow/shadow-externals-release.cmake
CMake Error at cmake_install.cmake:59 (FILE):
  file INSTALL cannot find
  "/raid/home/cwacek/shadow/shadow/build/shadow/../shadow-resources".


make: *** [install] Error 1
[2011-07-20 14:35:53.117929] setup: make install returned 2
[2011-07-20 14:35:53.117996] setup: run 'shadow' from the install directory

There appears to be a shadow-resources.tar.gz file there, but it's not actually a gzipped file. Renaming it to shadow-resources fixes the issue.

ghash_table assertion errors

Running scallion with debug on gives the following errors:

(process:26201): GLib-CRITICAL **: g_hash_table_insert_internal: assertion `hash_table != NULL' failed

(process:26201): GLib-CRITICAL **: g_hash_table_lookup: assertion `hash_table != NULL' failed

(process:26201): GLib-CRITICAL **: g_hash_table_lookup: assertion `hash_table != NULL' failed
main.c:354 connection_start_reading: Assertion conn->read_event failed; aborting.

Support multiple workers

Multiple worker threads can improve Shadow's parallelism. We need to make sure that each plug-in has private copies of its libraries and a private errno, that the intercept library is stateless, and that we handle all the other issues that multiple threads will cause.

close release 1.0.0

When the release is ready, tag the release and reintegrate the branch into master.

Tor won't recognize static libraries

When trying to install Scallion from GitHub, I'm running into problems when Tor is compiled. At first autoconf complains that no linkable libevent library could be found. If --enable-static-libevent is passed to configure, that problem is solved. Autoconf then complains that no linkable libssl can be found. Unfortunately, --enable-static-openssl doesn't cut it this time. This seems to be caused by the bug in Tor described here:

https://trac.torproject.org/projects/tor/ticket/4692

A workaround is to build openssl with shared as opposed to no-shared and use --enable-static-openssl. Can you confirm that issue? I can reproduce it when I delete the .shadow directory and run installdeps.sh followed by python setup.py build. If the .shadow directory is not removed first, there might still be .so files in the lib directory, because the stable version used to build openssl with the shared flag.

I've submitted a pull request solving the issue by changing the no-shared flag to the shared flag in contrib/installdeps.sh and adding two options (--static-libevent, --static-openssl) to setup.py, which are both enabled by default. Those options then add --enable-static-libevent and --enable-static-openssl, respectively, in setup_tor.

echo-eth plug-in broken

When autorun-ing the echo-eth plug-in, the message "consistent echo" is not printed. For some reason the plug-in hits a wall before receiving the entire echo.

Create Packet Level Logging Class

Shadow should be able to turn on and off packet logging for certain nodes, so network level metrics can be calculated.

Create a generic template that will call a callback function for every incoming/outgoing packet received/sent, then create a specific pcap logger that utilizes this to log packets if it's turned on for certain nodes.

Fix randomization

The simulator should produce the EXACT same results when run multiple times with the same seed. This is currently broken. Go through and make sure that we only seed the RNG once, and that multiple runs produce identical output.

We may need to intercept some functions from the plug-ins to make sure the application randomness goes through our simulator. This will be required for scallion.

Libevent compatibility broken in newer Tor versions

Newer Tor versions (tested against 0.2.3.5-alpha) use new libevent config options that break Scallion. In src/common/compat_libevent.c:

cfg = event_config_new()
event_base_new_with_config(cfg)

These are not intercepted by Shadow, and lead to evsig_init calls that fail while trying to create socketpairs. We should be able to intercept event_base_new_with_config(), ignore the config hints, and redirect to event_base_new().

Add multiple download mode to browser

Currently, each client may only specify a single server for webpages for the entire experiment. Instead, each client should be able to specify a file that contains a list of servers and files to fetch from them (index.html). See the filetransfer client in 'multi' mode for code that does this.

remove dependency on python2.7

Figure out what is forcing us to use python2.7 in our setup.py and src/shadow scripts. See if we can re-write it in a version-agnostic way to remove the dependency on 2.7 (while keeping the dependency on Python in general).

Installdeps downloads old versions of openssl and libevent, unauthenticated

In the master branch, it looks as if the installdeps.sh script downloads openssl 1.0.1 (which has known security issues) and libevent 2.0.18 (the latest is 2.0.19). More worrisome, it doesn't check signatures or digests or anything. (It downloads openssl over an unauthenticated HTTP link.)

It would be much better to check the packages after downloading them, to avoid getting trojaned.

Per-node random streams

Each node should locally behave the same if its behavior is the same, regardless of what other nodes in the network are doing.

Currently, all random bytes are drawn from a simulator-wide pseudo-random stream. Instead, implement a per-node pseudo-random stream that is seeded from the simulator-wide stream during node creation (before the simulation starts). Then each node draws from its own stream during the experiment.

Create abstract class for all VCI events

Currently, when executing, depositing, and destroying a VCI event, a switch statement is used to determine which function should be called, all with similar parameters. To make VCI events easier to manage, create an abstract class that has a vtable of callback functions that will be called at the appropriate times.

Also, any refactoring of the code that can be done should be done; if possible, these VCI events should have their own files for implementation.

port to GLib

We currently use custom data structures throughout our code (e.g. lists, hashtables, trees ...). Those data structures are fragile and inefficient. Swap them out for GLib's version. The goal is to reduce our codebase by eliminating the dependence on our custom utilities.

This also means swapping other standard types like ints and pointers to gints and gpointers.

lib64 build problem

When building shadow from master I get the following error (full logs below):
/usr/include/glib-2.0/glib/gtypes.h:34:24: fatal error: glibconfig.h: No such file or directory

Building with:
$ python setup.py build -i /usr/lib64/glib-2.0/include/

makes it build correctly, but this should probably be found automatically by FindGLIB.cmake.


[rob@colossus sandbox]$ git clone git://github.com/shadow/shadow.git shadow_clone_temp
Cloning into shadow_clone_temp...
remote: Counting objects: 569, done.
remote: Compressing objects: 100% (528/528), done.
remote: Total 569 (delta 319), reused 174 (delta 39)
Receiving objects: 100% (569/569), 384.21 KiB, done.
Resolving deltas: 100% (319/319), done.
[rob@colossus sandbox]$ cd shadow_clone_temp/
[rob@colossus shadow_clone_temp]$ ll
total 80K
drwxrwxr-x. 2 rob rob 4.0K May 27 21:40 cmake
-rw-rw-r--. 1 rob rob 4.1K May 27 21:40 CMakeLists.txt
-rw-rw-r--. 1 rob rob 256 May 27 21:40 config.h.in
drwxrwxr-x. 2 rob rob 4.0K May 27 21:40 contrib
-rw-rw-r--. 1 rob rob 35K May 27 21:40 COPYING
drwxrwxr-x. 2 rob rob 4.0K May 27 21:40 doc
-rw-rw-r--. 1 rob rob 4.0K May 27 21:40 README
-rw-rw-r--. 1 rob rob 5.9K May 27 21:40 setup.py
drwxrwxr-x. 7 rob rob 4.0K May 27 21:40 src
drwxrwxr-x. 2 rob rob 4.0K May 27 21:40 test
[rob@colossus shadow_clone_temp]$ python setup.py build
[2011-05-27 21:40:52.793554] setup: running 'cmake /home/rob/sandbox/shadow_clone_temp -
DCMAKE_BUILD_PREFIX=/home/rob/sandbox/shadow_clone_temp/build -DCMAKE_INSTALL_PREFIX=/usr/local' from /home/rob/sandbox/shadow_clone_temp/build
-- The C compiler identification is GNU
-- Check for working C compiler: /usr/lib64/ccache/gcc
-- Check for working C compiler: /usr/lib64/ccache/gcc -- works
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
--
-- -------------------------------------------------------------------------------
-- Current settings: (change with '$ cmake -D=<ON|OFF>')
-- SHADOW_DEBUG=OFF
-- SHADOW_COVERAGE=OFF
-- SHADOW_TEST=OFF
-- SHADOW_DOC=OFF
-- -------------------------------------------------------------------------------
--
-- CMAKE_BUILD_TYPE Release enabled.
-- Check if the system is big endian
-- Searching 16 bit integer
-- Looking for sys/types.h
-- Looking for sys/types.h - found
-- Looking for stdint.h
-- Looking for stdint.h - found
-- Looking for stddef.h
-- Looking for stddef.h - found
-- Check size of unsigned short
-- Check size of unsigned short - done
-- Using unsigned short
-- Check if the system is big endian - little endian
-- Found FLEX: /usr/bin/flex (found version "2.5.35")
-- Found BISON: /usr/bin/bison (found version "2.4.3")
-- Looking for include files CMAKE_HAVE_PTHREAD_H
-- Looking for include files CMAKE_HAVE_PTHREAD_H - found
-- Looking for pthread_create in pthreads
-- Looking for pthread_create in pthreads - not found
-- Looking for pthread_create in pthread
-- Looking for pthread_create in pthread - found
-- Found Threads: TRUE
-- Found ZLIB: /usr/include (found version "1.2.5")
-- Found OpenSSL: /usr/lib64/libssl.so;/usr/lib64/libcrypto.so
-- Found components for RT
-- RT_INCLUDES = /usr/include
-- RT_LIBRARIES = /usr/lib64/librt.so
-- Found components for DL
-- DL_INCLUDES = /usr/include
-- DL_LIBRARIES = /usr/lib64/libdl.so
-- Found components for M
-- M_INCLUDES = /usr/include
-- M_LIBRARIES = /usr/lib64/libm.so
-- Found components for EVENT2
-- EVENT2_INCLUDES = /usr/local/include
-- EVENT2_LIBRARIES = /usr/local/lib/libevent.so
-- Found components for GLIB
-- GLIB_INCLUDES = /usr/include/glib-2.0
-- GLIB_LIBRARIES = /usr/lib64/libglib-2.0.so
-- Configuring done
-- Generating done
-- Build files have been written to: /home/rob/sandbox/shadow_clone_temp/build
[2011-05-27 21:40:54.024534] setup: cmake returned 0
[2011-05-27 21:40:54.024592] setup: calling 'make'
Scanning dependencies of target shadow-util
[ 1%] Building C object src/util/CMakeFiles/shadow-util.dir/btree.c.o
In file included from /usr/include/glib-2.0/glib/galloca.h:34:0,
from /usr/include/glib-2.0/glib.h:32,
from /home/rob/sandbox/shadow_clone_temp/src/util/utility.h:26,
from /home/rob/sandbox/shadow_clone_temp/src/util/global.h:28,
from /home/rob/sandbox/shadow_clone_temp/src/util/btree.c:24:
/usr/include/glib-2.0/glib/gtypes.h:34:24: fatal error: glibconfig.h: No such file or directory
compilation terminated.
make[2]: *** [src/util/CMakeFiles/shadow-util.dir/btree.c.o] Error 1
make[1]: *** [src/util/CMakeFiles/shadow-util.dir/all] Error 2
make: *** [all] Error 2
[2011-05-27 21:40:54.154923] setup: make returned 2
[2011-05-27 21:40:54.154979] setup: now run 'python setup.py install'

Per-node log-level overrides

Add config option to XML that enables nodes to choose their own log level. The simulation-wide configured log level should be the fallback log-level if none is specified for a given node.

Better setup using ncurses

Redesign the setup (dependencies, configure, build, install) process. Use ncurses for improved usability.

Dependency installs can be scripted to make it easier for the user, but the ncurses TUI should explain what it's doing and ask the user to confirm actions.

Defaults should always be to $HOME/.local. Assume the user never installs with root, but allow install paths to be configured on the fly during the ncurses setup process.

Better experiment management

We need to make it easier/more clear to get an experiment started.

Currently, we are distributing .xz topology bundles and we require the user to run scallion from the directories therein. This could be managed by:

  1. detecting that we are not in the correct directory when scallion is executed, and informing the user
  2. making the path to the .xz file mandatory and passing it in as an argument to scallion; the script will automatically decompress and run from the correct directory. Though, this may reduce flexibility when running several experiments at once?

I think 2 is better.

Make torrent plug-in thread-safe

Currently, the built-in example of the torrent plug-in does not seem to be thread-safe. Running

$ shadow --torrent -w 2

produces several errors like

0:0:0:044848 [thread-1] 0:0:20:129000002 [torrent-warning] [auth.torrent-5.0.0.0] [torrent_activate] error in client epoll_wait

message queue maintenance

mqueues (in src/util/pipecloud.c) currently have 2 problems:

  1. Message queues use a hard-coded name, preventing more than one instance of shadow from running at the same time on the same machine.
  2. Message queues are not unlinked properly
    • stale messages in the queues from a previous run could affect future experiments
    • users running shadow on the same machine will get "permission denied" problems because the queues are owned by whoever ran shadow first

Be careful when dynamically selecting queue IDs because if they are not properly unlinked, you will run into a system limit on the allowed number of message queues.

Add license for shadow-resources

shadow-resources.tar.gz will be distributed from the shadow homepage. We need a license for it. Find an appropriate one and include it in the distributed tarfile.

Min-heap for Event Queues

Analysis of GAsyncQueue reveals that it's implemented with two GList structures. This means inserts are O(n). Results from gprof confirm this: close to 90% of the time is spent managing the queue.

This is the single most important part of the simulator in terms of efficiency.

We should be using min-heap-backed event priority queues for O(log n) inserts.

Torrent plug-in - torrent client error

The torrent example is causing socket errors when building Shadow without symbols:

python setup.py build
python setup.py install
shadow --torrent

The error occurs on line 404 in shd-torrent-client.c:

torrent client fatal error: bad file descriptor

This only seems to be happening on Ubuntu. I confirmed it on 11.10 and 12.04 LTS. I did not see any errors when building without symbols on Fedora 15 or 17. Adding the "-g" flag to the build line before installing seems to prevent the error.

set minimum network latency

The minimum network latency should be floored at 10ms, no matter what is configured by the user. This is meant to avoid failures in parallel mode due to extremely small windows.

Per-node heartbeat statistics

Log a heartbeat every X seconds at log level Y with some aggregate statistics for each node:

  1. We currently take measurements whenever "processing" inside of a plug-in, and delay events for the associated node based on its CPU speed. We should also track the total measured processing time of each node, and log this information in the heartbeat message
  2. Similarly, track and report total bytes in/out of each node
  3. Approximate memory usage would be useful, which we could track as the sum of the registered memory as well as new heap allocations (by intercepting the malloc family of functions)

Add config options to XML that enables nodes to choose their own X and Y. Also include simulation-wide defaults for X and Y (editable on the command line) in case nodes do not choose to specify them.

Support running on Mac OS X

Currently there are problems with the rt library (namely, it is not implemented on Mac). This means clock_gettime() will not work. Our filegetter uses this for file download timings. We can switch to using the g_timer_* functions from glib, which should provide cross platform support for us.

GLib prioritizes time lookups as follows, depending on what's available on the system:

  1. clock_gettime(CLOCK_MONOTONIC)
  2. clock_gettime(CLOCK_REALTIME)
  3. gettimeofday(&<struct timeval>, NULL)

IP address generation and configurable addresses

Currently we use GQuarks to represent IDs of various objects, as well as IPs. We should:

  1. move to representing IPs with the built-in Address type
  2. allow users to assign IP addresses to nodes in the XML input file (done)
  3. generate valid IPs for nodes whose IP address is not specified (done)

symbol scanner, registration auto-generation

Currently, if we want to support a newer version of Tor, we must do a diff and look at the variable changes. This means that anyone wanting to use shadow must start with v 0.2.2.15-alpha (our currently supported version).

This feature is to support auto scanning object files or binaries and manipulating the symbols in such a way that modification of the source code is unnecessary. This means that symbols that were defined as static (nonglobal) in the object files must be changed so that they can be registered in shadow. After manipulating the object files, the script should generate a C file that contains shadow registration of all variables used in the binaries.

This involves a few steps:

  1. rename static variables
    Static variables are really just variables that get some extra information added to them in the symbol table. So a static variable named "foo" appears as "foo.1234" in the symbol table. "foo.1234" is not accessible in shadow, so we must rename it to something that is accessible, while ensuring variable names are not duplicated. This can be done with something like:

objcopy file.o --redefine-sym oldname_static.1234=newname_global

  2. globalize the static variables so they are accessible by shadow
    Static variables are local by default in the symbol table. This makes them inaccessible outside of the file in which they were defined. We can change them to globals so they can be accessed, with something like:

objcopy file.o --globalize-symbol=newname_global

  3. dynamically calculate sizes
    The shadow registration function requires a pointer to each variable and the size stored there. We can calculate sizes dynamically as follows:

nm --print-size --size-sort file.o

or

readelf -s file.o

The output of each of these commands will need to be parsed, and files containing, at a minimum, a registration function should be generated.

clean up and simplify preloading

When we preload functions, we use dlsym() to look up our intercepted functions, and to look up the system functions when we do not want to intercept a call.

This could be done a little differently to make things smoother:

  1. The preload lib calls into Shadow, e.g. shadow_getInterceptedFunctions(), to obtain a structure of pointers to all of our intercepted functions. This would require the preload library to do a lookup for "shadow_getInterceptedFunctions", but once it does, it never calls dlsym() again. It can save the pointer and use this structure to make future calls.
  2. For cases where we want to call the system version of a function, we can use syscall(), using the IDs from /usr/include/bits/syscall.h. This way we avoid storing pointers to everything.

This new approach should drastically simplify the process of finding the correct function to call, make the preload library cleaner, and could be extended to work with multiple threads.

redesign parallel model

There are two approaches to parallel computing: multiple threads or multiple processes. We inherited the multi-process approach from DVN. We want a redesign using multiple threads per machine, with the ability to later distribute processes to other machines if desired.

This means redesigning the message passing functionality using GLib's GAsyncQueue and designing a solid foundation for distributing load (node contexts) among the many threads.

Node distribution should not be one node per thread, nor should nodes be divided evenly among threads. Rather, they should perhaps be divided into X times as many groups as there are threads, where X is some small constant.

Test that all included topologies work

Some of the topology files we distribute with Shadow do not work because of incorrect configuration. For example, some of the relays have bandwidth under 20 KiB/s, which results in an assertion failure in Tor.

We should make sure each Tor network runs to completion.

CPU delay measurements and multipliers

Since Shadow controls execution of the application code, it can take precise measurements of how long the application takes to return. These measurements could be used as a "CPU delay". We could then apply a per-node multiplier to model the CPU speeds of actual machines rather than the CPU speed of the host server.

One thing to be careful about is that the application may call back into Shadow. Such calls would normally consume time in the kernel, so we must ask: is the time it takes Shadow to emulate kernel processing close enough to real kernel processing time?

Integrate new Shadow engine

The redesigned engine needs to be attached to the old shadow code. This is a general issue tracking changes that allow the integration to progress.

forked from issue #6

Finish Up Browser TODO List

There is some work left on the browser plug-in, in order of importance:

  1. Print usage
  2. Add multiple download mode
    • Currently, each client may only specify a single server from which to fetch webpages for the entire experiment. Each client should instead be able to specify a file that contains a list of servers and the files to fetch from them (index.html). See the filetransfer client in 'multi' mode for code that does this.
  3. Print useful statistics, hopefully in a single message for parsing reasons:
    • time to complete the first html page download
    • size of the first html page download
    • time to complete all embedded objects
    • cumulative size of all embedded objects
    • number of embedded objects (is this done already?)
    • aggregate downloaded and uploaded bytes
    • ... more stats?
  4. Write a README (the link below might be a good start ;-))
  5. Reuse the initial connection that was used to download the HTML document

Here is a link to the original work:
#44

Support 1 application per plug-in, multiple plug-ins per node

Currently, if a node wants to run multiple applications, it must write a plug-in that manages those applications and tracks which sockets belong to which application. Shadow should handle this instead, so that a plug-in that runs Tor is completely separate from a plug-in that runs a file server.

Shadow can track which application creates which sockets by caching a pointer to the application before calling into its code. Then, if any interceptions happen while in the application context, the sockets that are created, deleted, etc. can be linked to that application. Communication between applications on a single virtual node (localhost) should go through a high-bandwidth Shadow interface.
