GithubHelp home page GithubHelp logo

diversity_tools's Introduction

diversity_tools --description in progress--

""" dataframe_analysis.py Analysis biological features from OrthoVenn2 (.txt file) DataFrame.

This script reads an input file in OrthoVenn2 (.txt file) format, performs various operations on it such as limiting the number of families, filtering columns by value occurrence, selecting families with the highest number of genes, and filtering rows by their names, and finally writes the processed DataFrame to a CSV file.

Usage: python main.py --File <input_file> --Operations --out <output_folder>

Arguments: --File (-f): Path to the input file in OrthoVenn2 (.txt file) format. --Operations (-o): Operations to perform on the DataFrame. Multiple operations should be separated by spaces. - limitation=<value,n>, where n is the maximum number of families to keep in the DataFrame. - filter=<value,v;ignore_zeros,y;threshold,t;mode,m>, where value is the value to filter on. Multiple filters should be separated by semicolons. Additionally, the following optional arguments can be used: - threshold=<threshold_value>: Specifies the minimum value (0-1) a column must have to be kept. - ignore_zeros=<True/False>: Specifies whether to ignore columns with all zero values. - mode=<greater_than/equal/less_than>: Specifies whether to compare the value with >= values, == values or <= values. - Highest_value=, where n is the number of families with the highest number of genes to keep in the DataFrame. - Cols=<column_name_list,[col_name_1,col_name_2,...];keep_columns=True/False>, where column_name_list is a list of column names to keep, and keep_columns specifies whether to keep the specified columns or remove them. - Rows=<row_name_list,[row_name_1,row_name_2,...];keep_row=True/False>, where row_name_list is a list of row names to keep, and keep_row specifies whether to keep the specified rows or remove them. --out (-u): Path to the output folder.

Output: - Analyzed_DataFrame.csv: Processed DataFrame in CSV format. - run.log.txt: Log file containing the command line arguments used to run the script.

Examples: python main.py --File input.txt --Operations limitation=value,5 filter=value,1;threshold,0.5;ignore_zeros,False;mode,equal --out output_folder

python main.py --File input.txt --Operations limitation=value,3 Cols=column_name_list,[COG0532,COG0052];keep_columns=True Rows=row_name_list,[Gene_1,Gene_2];keep_row,True --out output_folder

"""

diversity_tools's People

Contributors

agustinamata avatar aberdur avatar victorgcb1987 avatar

Watchers

Aureliano Bombarely avatar  avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.