
LITE

LITE is an auto-tuning system for various Spark applications on large-scale datasets.


Environment

The project is implemented in Python 3.6 and tested in a Linux environment. Our system environment and CUDA version are as follows:

Ubuntu 18.04
Hadoop 2.7.7
Spark 2.4.7
HiBench 7.0
Java 1.8
Scala 2.12.10
Python 3.6
Maven 4.15
CUDA 10.1

1. Data Generation

To generate training data for your own cluster environment, first install SparkBench (https://github.com/CODAIT/spark-bench).

1.1. Run Applications

Run SparkBench applications using the following command:

python scripts/bo_sample.py <path_to_spark_bench_folders>
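
bo_sample.py runs the spark-bench applications under sampled Spark parameter combinations so the training data covers the configuration space. As a rough illustration of the idea (the search space and submit interface below are hypothetical, not the script's actual ones):

import random
import subprocess

# Hypothetical search space; the real bo_sample.py derives its candidates
# from the spark-bench configuration files.
SEARCH_SPACE = {
    "spark.executor.cores": [1, 2, 4, 8],
    "spark.executor.memory": ["2g", "4g", "8g"],
    "spark.default.parallelism": [4, 8, 16, 32],
}

def sample_conf():
    # Draw one value per parameter to form a candidate configuration.
    return {k: random.choice(vals) for k, vals in SEARCH_SPACE.items()}

def run_once(submit_script):
    # Launch one workload run under a freshly sampled configuration.
    args = [submit_script]
    for k, v in sample_conf().items():
        args += ["--conf", f"{k}={v}"]
    subprocess.run(args, check=True)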

The log files are saved on your Spark history server; you can download them with the following command:

hdfs dfs -get <path_to_spark_history_server_folders_on_hdfs> <local_folders>

1.2. Process Logs (for baselines)

Parse the log files generated by spark-bench to obtain the per-stage status:

python scripts/history_by_stage.py <spark_bench_conf_path> <history_dir> <result_path>

Note that the log files do not contain the workload's data volume features; these are added from the configuration files in <spark_bench_conf_path>. The result looks like:

{
	"AppId": "application_1616731661908_1527",
	"AppName": "SVM Classifier Example",
	"Duration": 23656,
	"SparkParameters": ["spark.default.parallelism=6", "spark.driver.cores=6"......],
	"StageInfo": {
		"0": {
			"duration": 3234,
			"input": 134283264,
			"output": 0,
			"read": 0,
			"write": 0
		}......
	},
	"WorkloadConf": ["NUM_OF_EXAMPLES=1600000", "NUM_OF_FEATURES=100"]
}
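
For reference, such a result file can be inspected with a few lines of Python, e.g. to see which stages dominate the application's runtime (the file name below is hypothetical):

import json

# Load one application record produced by history_by_stage.py.
with open("application_1616731661908_1527.json") as f:  # hypothetical file name
    app = json.load(f)

for stage_id, stage in app["StageInfo"].items():
    share = stage["duration"] / app["Duration"]
    print(f"stage {stage_id}: duration={stage['duration']}, "
          f"input={stage['input']}, {share:.1%} of app duration")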

Save the results as a training data file:

python scripts/build_dataset.py <result_path> <dataset_path>

The dataset files are written as comma-separated values files with a single header row.

AppId	AppName	Duration	spark.default.parallelism	spark.driver.cores	spark.driver.memory	spark.driver.maxResultSize	spark.executor.instances	spark.executor.cores	spark.executor.memory	spark.executor.memoryOverhead	spark.files.maxPartitionBytes	spark.memory.fraction	spark.memory.storageFraction	spark.reducer.maxSizeInFlight	spark.shuffle.compress	spark.shuffle.file.buffer	spark.shuffle.spill.compress	rows	cols	itr	partitions	stage_id	duration	input	output	read	write	code	node_num	cpu_cores	cpu_freq	mem_size	mem_speed	net_width
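
Each row describes one stage of one run: the Spark parameters, the workload's data-volume features (rows, cols, itr, partitions), the per-stage metrics, and the hardware features. A quick sanity check of the resulting file (pandas assumed available; the path is a placeholder for your <dataset_path>):

import pandas as pd

df = pd.read_csv("dataset.csv")  # placeholder for your <dataset_path>
# The per-stage duration is the prediction target; Spark parameters plus
# data-volume and hardware features are the model inputs.
print(df[["AppName", "stage_id", "duration"]].head())
print(df["spark.executor.cores"].value_counts())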

1.3. Stage-based Code Organization (for LITE)

To obtain the stage code characteristics, enter the instrumentation folder and use Maven to build it into a JAR package named preMain-1.0.jar:

cd instrumentation 
mvn clean package

and add the agent package to the workload's spark-submit shell file as follows:

spark-submit --class <workload_class> --master yarn --conf "spark.executor.cores=4" --conf "spark.executor.memory=5g" --conf "spark.driver.extraJavaOptions=-javaagent:<path_to_your_instrumentation_jar>/preMain-1.0.jar" <path_to_spark_bench>/<workload>/target/spark-example-1.0-SNAPSHOT.jar
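
If you instrument many workloads, scripting the agent injection can be less error-prone than editing each shell file by hand; a minimal sketch, assuming plain spark-submit on YARN (all paths and class names are placeholders):

import subprocess

AGENT = "<path_to_your_instrumentation_jar>/preMain-1.0.jar"

def submit_with_agent(workload_class, app_jar, extra_conf=()):
    # Inject the instrumentation agent into the driver JVM via -javaagent.
    cmd = ["spark-submit", "--class", workload_class, "--master", "yarn",
           "--conf", f"spark.driver.extraJavaOptions=-javaagent:{AGENT}"]
    for kv in extra_conf:          # e.g. "spark.executor.cores=4"
        cmd += ["--conf", kv]
    cmd.append(app_jar)
    subprocess.run(cmd, check=True)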

The stage code is written to /inst_log by default; you can change this path yourself. The files produced by instrumentation should then be parsed by prediction_ml/spark_tuning/by_stage/instrumentation/all_code_by_stage/get_all_code.py:

python get_all_code.py <folder_of_instrumentation> <code_result_folder>

Our model is saved in prediction_nn.

Use data_process_text.py, dag2data.py, and dataset_process.py to build the dictionary information, process the edge and node information of the graph, and integrate all features:

python data_process_text.py <code_result_folder>
python dag2data.py <log_folder>
python dataset_process.py

Dictionary information and graph information are saved in dag_data; dataset_process.py integrates them, and the final dataset is written to the dataset folder.
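
The exact tensor layout is defined by dag2data.py; conceptually, each application becomes a graph sample whose nodes are stages carrying code-token features and whose edges follow stage dependencies. A schematic of such a sample (names and values are illustrative, not the repo's):

import numpy as np

# Three stages; stage 2 depends on stages 0 and 1.
node_tokens = [
    [12, 48, 7],   # dictionary ids of stage 0's instrumented code tokens
    [12, 93, 30],  # stage 1
    [77, 5, 61],   # stage 2
]
edge_index = np.array([[0, 1],    # source stages
                       [2, 2]])   # destination stages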

2. Model Training

Then use fast_train.py to train the model:

python fast_train.py

You can change the model configuration through config.py, and the trained model will be saved in the model_save folder.
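
The exact option names are defined in config.py; the snippet below only illustrates the kind of settings you would edit there (all field names and values are invented for illustration):

# Illustrative only -- consult config.py for the real option names.
class Config:
    batch_size = 64        # training batch size
    lr = 1e-3              # learning rate
    epochs = 100           # training epochs
    hidden_dim = 128       # embedding width
    model_save_dir = "model_save"  # where trained weights are written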

3. Model Update

You can also use trans_learn.py to fine-tune the model:

python trans_learn.py
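
Fine-tuning starts from the weights trained in step 2 and adapts them to a new environment with a small amount of new data. In PyTorch terms the pattern is roughly as follows (a generic sketch; the checkpoint path and the encoder attribute are assumptions, not the exact trans_learn.py code):

import torch

# Load the pretrained model from step 2 (hypothetical checkpoint path).
model = torch.load("model_save/model.pt")

# Optionally freeze the code/DAG encoder and adapt only the prediction
# head, which needs far fewer samples from the new cluster.
for p in model.encoder.parameters():   # "encoder" is an assumed attribute
    p.requires_grad = False

optimizer = torch.optim.Adam(
    (p for p in model.parameters() if p.requires_grad), lr=1e-4)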

4. Model Testing

The nn_pred scripts (such as nn_pred_1.py) test the model; we use predict_first_cold.py to predict the best combination of parameters and evaluate the performance.

python nn_pred_8.py
python predict_first_cold.py
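
The idea behind predict_first_cold.py is to score candidate parameter combinations with the trained model and keep the one with the smallest predicted runtime; in outline (helper names are hypothetical):

def pick_best(candidates, workload_features, predict_runtime):
    # predict_runtime(features, conf) is the trained model's inference
    # function (hypothetical name); smaller predicted runtime is better.
    return min(candidates,
               key=lambda conf: predict_runtime(workload_features, conf))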
