joerivstrien / process_maxquant

Code to process MaxQuant output (protein-groups.txt)

Python 100.00%

process_maxquant's Issues

Sample names are not always fully detected (resulting in unexpected behaviour)

The file '20210128_BHM_DDM_XL-CP_all_proteinGroups' contains the sample columns "iBAQ BHM_DDM_DMTMM_01" and "iBAQ BHM_DDM_PhoX_01", but the program detects only "iBAQ BHM" as the sample name, which results in an error. Sample names should therefore be detected with a pattern such as "iBAQ {sample_name}_[0-9]{2}".
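A minimal sketch of the pattern proposed above, assuming every sample column looks like "iBAQ {sample_name}_{two-digit fraction}". The function name group_ibaq_columns and the example column list are hypothetical and are not taken from the tool.

```python
import re
from collections import defaultdict

# Pattern suggested in the issue: "iBAQ {sample_name}_[0-9]{2}".
# The greedy ".+" keeps underscores that belong to the sample name itself.
IBAQ_COLUMN = re.compile(r"^iBAQ (?P<sample>.+)_(?P<fraction>[0-9]{2})$")


def group_ibaq_columns(column_names):
    """Map each detected sample name to its iBAQ columns, in input order."""
    samples = defaultdict(list)
    for name in column_names:
        match = IBAQ_COLUMN.match(name)
        if match:
            samples[match.group("sample")].append(name)
    return dict(samples)


# Hypothetical column list built from the names mentioned in the issue.
columns = [
    "Protein IDs",
    "iBAQ",
    "iBAQ BHM_DDM_DMTMM_01",
    "iBAQ BHM_DDM_DMTMM_02",
    "iBAQ BHM_DDM_PhoX_01",
]
detected = group_ibaq_columns(columns)
```

With this pattern the two samples are kept apart as "BHM_DDM_DMTMM" and "BHM_DDM_PhoX" instead of collapsing into "BHM". If a file can contain more than 99 fractions, the "{2}" quantifier would need to be relaxed.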

Validate the settings file

Currently, I assume that the settings file, which the user can change, is always correct. Everyone makes mistakes, so I should write code that checks whether the settings file is valid.
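One possible shape for such a check, as a sketch only. It assumes the settings are loaded into a dictionary; the schema below (the key names other than "uniprot_step", and all expected types) is hypothetical and would have to be replaced by the tool's real settings.

```python
# Hypothetical schema: setting name -> expected Python type.
EXPECTED_SETTINGS = {
    "uniprot_step": bool,
    "mitocarta_step": bool,
    "column_order": list,
}


def validate_settings(settings, expected=EXPECTED_SETTINGS):
    """Return a list of human-readable problems; an empty list means valid."""
    errors = []
    for key, expected_type in expected.items():
        if key not in settings:
            errors.append(f"missing setting: {key}")
        elif not isinstance(settings[key], expected_type):
            errors.append(
                f"setting {key!r} should be {expected_type.__name__}, "
                f"got {type(settings[key]).__name__}"
            )
    for key in settings:
        if key not in expected:
            errors.append(f"unknown setting: {key}")
    return errors


# Example: one wrong type, one missing key, one unknown key.
problems = validate_settings(
    {"uniprot_step": "yes", "column_order": [], "colour": "red"}
)
```

Collecting all problems in a list, rather than raising on the first one, lets the user fix the whole settings file in one pass.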

Enable the user to dictate the column output order and sort the samples alphabetically.

In the settings file, the user should be able to dictate the order of the columns in the Excel file. This also means that some functions are needed to check whether the user has entered valid column names. Additionally, the samples should be inserted into the Excel file in alphabetical order instead of a random order.

The columns in the Excel file have the following order: first the original columns, in an order the user can dictate; then the samples, sorted alphabetically; and finally the new columns, in arbitrary order.
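The ordering described above could be sketched as follows. The function name, its parameters, and the example column names are hypothetical; the sketch also assumes that original columns the user does not mention keep their original relative order after the ones the user listed.

```python
def order_output_columns(original, samples, new, user_order):
    """Return the output column order: original columns, samples, new columns."""
    # Reject names that do not exist among the original columns.
    unknown = [name for name in user_order if name not in original]
    if unknown:
        raise ValueError(f"unknown column names in settings: {unknown}")
    # Original columns the user did not mention keep their relative order.
    remaining = [name for name in original if name not in user_order]
    return list(user_order) + remaining + sorted(samples) + list(new)


ordered = order_output_columns(
    original=["Protein IDs", "Fasta headers", "Sequence length"],
    samples=["iBAQ T2_01", "iBAQ T1_02", "iBAQ T1_01"],
    new=["gene_name"],
    user_order=["Sequence length", "Protein IDs"],
)
```

Note that sorted() compares plain strings, so "T10" would sort before "T2"; a natural sort key would be needed if that matters.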

Processing comments about how the script works

Change the order of the Excel sheet columns. The original order should be retained; then the iBAQ samples and clustered columns should appear, and lastly the 'new' columns from UniProt and MitoCarta.
Change the order of the Excel sheets: the applying protein sheet first, then the non-applying protein sheet.
Make the columns that are searched in the MitoCarta step configurable through user arguments.
Per sample, calculate the total iBAQ protein abundance value per protein and add a column with this value. Additionally, calculate the global protein abundance value and put it in a new column.
To increase user-friendliness, create a simple PyQt5 GUI in which the input file and settings file can be selected.
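The abundance totals mentioned above could be computed along these lines. This is a plain-Python sketch for one protein row; it assumes the per-sample total is the sum over that sample's fractions and the global value is the sum over all samples. The output column names ("... total iBAQ", "global iBAQ") are hypothetical.

```python
def add_abundance_totals(row, sample_columns):
    """Return a copy of row with per-sample and global iBAQ totals added.

    row: mapping of column name -> value for one protein.
    sample_columns: mapping of sample name -> list of its iBAQ column names.
    """
    totals = {
        f"{sample} total iBAQ": sum(float(row[column]) for column in columns)
        for sample, columns in sample_columns.items()
    }
    # Assumption: the global abundance is the sum of all per-sample totals.
    totals["global iBAQ"] = sum(totals.values())
    return {**row, **totals}


protein = {"iBAQ T1_01": 1.0, "iBAQ T1_02": 2.0, "iBAQ T2_01": 4.0}
with_totals = add_abundance_totals(
    protein,
    {"T1": ["iBAQ T1_01", "iBAQ T1_02"], "T2": ["iBAQ T2_01"]},
)
```

In a dataframe-based implementation the same idea would be a row-wise sum over each sample's columns.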

Enable the option to select which column in the mitocarta database is searched

The user should be able to specify whether the program searches the symbol column or the synonyms column.
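A sketch of such a switch, assuming the MitoCarta table is available as a list of row dictionaries with "Symbol" and "Synonyms" keys and that a cell may hold several names separated by "|". The function name, the row layout, the separator, and the example rows are all assumptions made for illustration.

```python
def search_mitocarta(gene_name, mitocarta_rows, search_column="Symbol"):
    """Return the rows whose chosen column contains gene_name (case-insensitive)."""
    if search_column not in ("Symbol", "Synonyms"):
        raise ValueError(f"unsupported search column: {search_column}")
    wanted = gene_name.strip().upper()
    hits = []
    for row in mitocarta_rows:
        # Assumption: several names in one cell are separated by "|".
        cell = row.get(search_column, "")
        names = [name.strip().upper() for name in cell.split("|")]
        if wanted in names:
            hits.append(row)
    return hits


# Made-up example rows, not real MitoCarta entries.
rows = [
    {"Symbol": "GENE1", "Synonyms": "ALIAS1A|ALIAS1B"},
    {"Symbol": "GENE2", "Synonyms": "ALIAS2A"},
]
by_symbol = search_mitocarta("gene2", rows, search_column="Symbol")
by_synonym = search_mitocarta("ALIAS1B", rows, search_column="Synonyms")
```

The search_column value would come from the settings file, so it is also a natural candidate for settings validation.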

Evaluate whether identifiers were found

When the program fetches data from UniProt, it uses (hopefully) UniProt identifiers. To obtain these identifiers, the FASTA headers are examined and the identifier is extracted from each header. A function is needed that evaluates whether the input identifiers are actually present. If not, fetching data from UniProt should be skipped.
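One way to evaluate the extracted identifiers is to test them against the UniProt accession number format before any request is made. The regular expression below is the accession pattern published in the UniProt documentation; the function name and the decision rule (skip the step when no identifier matches) are assumptions for this sketch.

```python
import re

# UniProt accession number format, as documented by UniProt.
UNIPROT_ACCESSION = re.compile(
    r"[OPQ][0-9][A-Z0-9]{3}[0-9]|[A-NR-Z][0-9](?:[A-Z][A-Z0-9]{2}[0-9]){1,2}"
)


def valid_uniprot_identifiers(identifiers):
    """Return only the identifiers that look like UniProt accessions."""
    return [
        identifier
        for identifier in identifiers
        if identifier and UNIPROT_ACCESSION.fullmatch(identifier)
    ]


# A mix of UniProt-style accessions, a numeric identifier, and empty values.
extracted = ["P12345", "A0A024R161", "10048432", "", None]
usable = valid_uniprot_identifiers(extracted)
should_fetch = bool(usable)  # skip the UniProt step when nothing is usable
```

Purely numeric identifiers never match this pattern, so a dataset without UniProt accessions would yield an empty list and the fetch could be skipped with a clear message.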

Program crashes instead of displaying error in GUI.

When running the tool on a dataset with gi numbers instead of UniProt IDs while leaving the option "uniprot_step" on, the tool and GUI crash instead of displaying the error in the GUI.

Traceback:
Traceback (most recent call last):
  File "/home/joeri/Documents/coding_projects/process_maxquant/gui_file_acceptor.py", line 149, in execute_process_maxquant_script
    protein_groups_dataframe = fetch_uniprot_annotation_step(self, protein_groups_dataframe, settings_dict)
  File "/home/joeri/Documents/coding_projects/process_maxquant/process_maxquant.py", line 948, in fetch_uniprot_annotation_step
    protein_data_dict = fetch_uniprot_annotation(gui_object, protein_groups_dataframe["identifier"], settings_dict["uniprot_step"])
  File "/home/joeri/Documents/coding_projects/process_maxquant/process_maxquant.py", line 305, in fetch_uniprot_annotation
    request.raise_for_status()
  File "/home/joeri/miniconda3/envs/py3new/lib/python3.9/site-packages/requests/models.py", line 943, in raise_for_status
    raise HTTPError(http_error_msg, response=self)
requests.exceptions.HTTPError: 400 Client Error: Bad Request for url: https://www.ebi.ac.uk/proteins/api/proteins?offset=0&size=100&accession=10048432,...110625963,110625975,110625979
Aborted (core dumped)
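The traceback shows the exception leaving the function that the GUI calls, which fits an unhandled exception taking the whole application down. A possible remedy, sketched here without any GUI code, is to run each processing step through a wrapper that catches the exception and hands a message to the GUI. The names run_step_safely and report_error, and the failing example step, are hypothetical.

```python
import logging


def run_step_safely(step, report_error, *args, **kwargs):
    """Run one processing step; report failures instead of letting them escape.

    Returns (True, result) on success and (False, None) on failure.
    """
    try:
        return True, step(*args, **kwargs)
    except Exception as error:
        # Keep the full traceback in the log and show a short message in the GUI.
        logging.exception("Step %s failed", step.__name__)
        report_error(f"{step.__name__} failed: {error}")
        return False, None


# Stand-in for a step that fails, and a list standing in for the GUI error box.
def fetch_annotation(identifiers):
    raise RuntimeError("400 Client Error: Bad Request")


shown_in_gui = []
ok, result = run_step_safely(fetch_annotation, shown_in_gui.append, ["10048432"])
```

In the real GUI, report_error would be whatever method writes to the error display, and the caller would stop or skip the remaining steps when ok is False.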

Fix the error message output

Whenever an error occurs, its message is printed in the GUI. When the program then resumes, the error message should disappear, but it currently stays visible.
This is a minor issue, but it should be fixed to prevent confusion.

Maximum recursion error during clustering step

When running the tool on a protein_groups file with 6 samples (each with 60 slices), an error occurs during the clustering step.

The GUI displays the following error:
An exception occurred while applying clustering on a sample
Maximum recursion depth exceeded in comparison.

Relevant lines in the log file before the error occurs:
INFO:root:Step 4, cluster the fractions per sample using hierarchical clustering.
INFO:root:Start hierarchical clustering for sample T5

(Not sure why it starts clustering with T5. Maybe something went wrong with parsing the other samples? The samples are T1 - T6.)
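The cause of the recursion error is not established in this report. If it turns out to come from a recursive traversal of a deep cluster tree, one possible workaround is to raise Python's recursion limit only around the clustering call and restore it afterwards. The helper below is a sketch under that assumption; the name recursion_limit and the value 10000 are arbitrary choices for illustration.

```python
import sys
from contextlib import contextmanager


@contextmanager
def recursion_limit(limit):
    """Temporarily raise the interpreter's recursion limit to at least `limit`."""
    previous = sys.getrecursionlimit()
    sys.setrecursionlimit(max(previous, limit))
    try:
        yield
    finally:
        # Always restore the original limit, even if the wrapped code fails.
        sys.setrecursionlimit(previous)


limit_before = sys.getrecursionlimit()
with recursion_limit(10000):
    limit_inside = sys.getrecursionlimit()
    # The clustering call for one sample would go here.
limit_after = sys.getrecursionlimit()
```

A very high limit can crash the interpreter instead of raising RecursionError, so this only treats the symptom; replacing the recursive step with an iterative one would be the more robust fix.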
