hadasvolk / complabngs
Computational Lab in Next Generation Sequencing and Genomics Data Analysis - TAU 0411358701
License: MIT License
I'm having trouble installing GATK. This is what I did:
I also tried to install GATK via conda; that didn't work either.
So I would appreciate some help on how to install GATK.
Hello Hadas,
I understood that we have to submit two jupyter_notebooks, correct?
Moodle has a limit of 1 file that can be submitted.
Can you update this, so we can submit two files? Or a different solution?
Cheers!
I think I did things right, but I can't find these genes in IGV (YDL083c YEL003W YKL180W YNL162W).
I uploaded the reference genome file - S288C_reference_sequence_R64-2-1_20150113.fasta
the annotation - S288C_reference_annotation_proc.gff
and the indexed (by samtools) STAR BAM file - star_mapAligned.sortedByCoord.out.bam
Where might I have gone wrong?
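One thing worth ruling out (an assumption, not a diagnosis): IGV matches tracks to the genome by sequence name, so if the chromosome names in the GFF differ from the FASTA headers, the annotation has nothing to attach to and a gene search can come up empty. The sketch below compares the two sets of names. `fasta_names` and `gff_seqids` are hypothetical helpers, and plain-text, uncompressed files are assumed.

```python
import os

def fasta_names(path):
    """Collect sequence names: the text after '>' up to the first whitespace."""
    names = set()
    with open(path) as handle:
        for line in handle:
            if line.startswith(">") and line[1:].strip():
                names.add(line[1:].split()[0])
    return names

def gff_seqids(path):
    """Collect the first column (seqid) of every feature line in a GFF file."""
    seqids = set()
    with open(path) as handle:
        for line in handle:
            if line.startswith("##FASTA"):
                break  # embedded sequence section; no feature lines follow
            if line.startswith("#") or not line.strip():
                continue  # skip comments, directives, and blank lines
            seqids.add(line.split("\t")[0])
    return seqids

if __name__ == "__main__":
    # File names taken from the post above; adjust the paths as needed.
    fasta = "S288C_reference_sequence_R64-2-1_20150113.fasta"
    gff = "S288C_reference_annotation_proc.gff"
    if os.path.exists(fasta) and os.path.exists(gff):
        print("In GFF but not in FASTA:", sorted(gff_seqids(gff) - fasta_names(fasta)))
```

An empty list means the names agree, and the cause lies elsewhere (for example, in how the gene names are stored in the GFF attributes).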
Hello,
Could you provide a zoom link to the meeting so people can join?
Thanks
Hello :)
After a successful installation of GATK, I tried to run this command:
$ gatk AddOrReplaceReadGroups -I "SRR1569760_vs_S288C.HQ.sort.bam" -O "out.bam" -LB READS -PL ILLUMINA -PU 1 -SM SRR1569760 --CREATE_INDEX
but I got this error -
Using GATK jar /home/nofar/gatk-4.5.0.0/gatk-package-4.5.0.0-local.jar
Running:
java -Dsamjdk.use_async_io_read_samtools=false -Dsamjdk.use_async_io_write_samtools=true -Dsamjdk.use_async_io_write_tribble=false -Dsamjdk.compression_level=2 -jar /home/nofar/gatk-4.5.0.0/gatk-package-4.5.0.0-local.jar AddOrReplaceReadGroups -I SRR1569760_vs_S288C.HQ.sort.bam -O out.bam -LB READS -PL ILLUMINA -PU 1 -SM SRR1569760 --CREATE_INDEX
Traceback (most recent call last):
File "/home/nofar/gatk-4.5.0.0/gatk", line 511, in <module>
main(sys.argv[1:])
File "/home/nofar/gatk-4.5.0.0/gatk", line 177, in main
runGATK(sparkRunner, sparkSubmitCommand, dryRun, gatkArgs, sparkArgs, javaOptions, debugPort, debugSuspend)
File "/home/nofar/gatk-4.5.0.0/gatk", line 360, in runGATK
runCommand(cmd, dryrun)
File "/home/nofar/gatk-4.5.0.0/gatk", line 416, in runCommand
check_call(cmd, env=gatk_env)
File "/home/nofar/miniconda3/envs/mamba/lib/python2.7/subprocess.py", line 185, in check_call
retcode = call(*popenargs, **kwargs)
File "/home/nofar/miniconda3/envs/mamba/lib/python2.7/subprocess.py", line 172, in call
return Popen(*popenargs, **kwargs).wait()
File "/home/nofar/miniconda3/envs/mamba/lib/python2.7/subprocess.py", line 394, in __init__
errread, errwrite)
File "/home/nofar/miniconda3/envs/mamba/lib/python2.7/subprocess.py", line 1047, in _execute_child
raise child_exception
OSError: [Errno 2] No such file or directory
The BAM file name in the command matches the file in my directory exactly. I don't understand what the problem is, and I would appreciate some help :)
Thanks in advance,
Nofar :-]
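One possible reading of the traceback above: the `OSError` is raised inside `subprocess.Popen`, that is, when the `gatk` wrapper script tries to start the `java` command, before the BAM file is ever opened. An `[Errno 2]` at that point usually means the executable itself could not be found on `PATH`. The sketch below is only a diagnostic under that assumption. The helper name `missing_executables` is made up for illustration, and it needs Python 3 (`shutil.which` does not exist in the Python 2.7 shown in the traceback).

```python
import shutil

def missing_executables(names):
    """Return the names from `names` that cannot be found on PATH."""
    return [name for name in names if shutil.which(name) is None]

if __name__ == "__main__":
    # "java" is what the gatk wrapper launches, per the "Running:" line in the log.
    missing = missing_executables(["java"])
    if missing:
        print("Not found on PATH:", ", ".join(missing))
    else:
        print("java resolves to:", shutil.which("java"))
```

If `java` turns out to be missing, installing a Java runtime into the active environment would be the next thing to try.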
I have an issue running Trimmomatic. It seems I am missing a file needed to run the command: it says it cannot find one of the files, but when I list everything in my home directory, the file appears.
I tried deleting ALL files in the environment and rerunning everything, but the same issue appears.
Can anyone help?
The command line I am running:
trimmomatic PE SRR1569760_sub_1.fastqc SRR1569760_sub_2.fastqc -baseout SRR1569760 ILLUMINACLIP:NexteraPE-PE.fa:2:30:10
This is the output:
TrimmomaticPE: Started with arguments:
SRR1569760_sub_1.fastqc SRR1569760_sub_2.fastqc -baseout SRR1569760 ILLUMINACLIP:NexteraPE-PE.fa:2:30:10
Multiple cores found: Using 4 threads
Using templated Output files: SRR1569760_1P SRR1569760_1U SRR1569760_2P SRR1569760_2U
Using PrefixPair: 'AGATGTGTATAAGAGACAG' and 'AGATGTGTATAAGAGACAG'
Using Long Clipping Sequence: 'GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAG'
Using Long Clipping Sequence: 'TCGTCGGCAGCGTCAGATGTGTATAAGAGACAG'
Using Long Clipping Sequence: 'CTGTCTCTTATACACATCTCCGAGCCCACGAGAC'
Using Long Clipping Sequence: 'CTGTCTCTTATACACATCTGACGCTGCCGACGA'
ILLUMINACLIP: Using 1 prefix pairs, 4 forward/reverse sequences, 0 forward only sequences, 0 reverse only sequences
Exception in thread "main" java.io.FileNotFoundException: SRR1569760_sub_1.fastqc (No such file or directory)
at java.io.FileInputStream.open0(Native Method)
at java.io.FileInputStream.open(FileInputStream.java:195)
at java.io.FileInputStream.<init>(FileInputStream.java:138)
at org.usadellab.trimmomatic.fastq.FastqParser.parse(FastqParser.java:135)
at org.usadellab.trimmomatic.TrimmomaticPE.process(TrimmomaticPE.java:265)
at org.usadellab.trimmomatic.TrimmomaticPE.run(TrimmomaticPE.java:555)
at org.usadellab.trimmomatic.Trimmomatic.main(Trimmomatic.java:80)
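The exception names `SRR1569760_sub_1.fastqc`, so Trimmomatic is looking for a file with exactly that name in the current working directory. A quick check, sketched below, is to list the files that share the same prefix. If only `.fastq` or `.fastq.gz` files show up, the `.fastqc` extension in the command may simply be a typo. This is a guess about the cause, and `similar_files` is a hypothetical helper.

```python
from pathlib import Path

def similar_files(missing_name, directory="."):
    """List files in `directory` whose names start with the stem of `missing_name`."""
    stem = Path(missing_name).name.split(".")[0]
    return sorted(
        p.name
        for p in Path(directory).iterdir()
        if p.is_file() and p.name.startswith(stem)
    )

if __name__ == "__main__":
    # Input names copied from the command in the post above.
    for name in ["SRR1569760_sub_1.fastqc", "SRR1569760_sub_2.fastqc"]:
        if not Path(name).exists():
            print(name, "not found; similar files:", similar_files(name))
```

Run it from the directory where the `trimmomatic` command is executed, since relative file names are resolved against that directory.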
Hello :)
I couldn't install Qualimap and I don't know why.
I tried to install it following this manual - http://qualimap.conesalab.org/doc_html/intro.html#installation, with no success.
I also tried to run the qualimap command line from inside the qualimap directory -
qualimap_v2.3 rnaseq -bam /home/nofar/CompLabNGS/8-RNA1/sortedAligned.sortedByCoord.out.bam -gtf /home/nofar/CompLabNGS/8-RNA1/S288C_reference_annotation_proc.gtf
with no success.
I would be happy to get help,
Thanks in advance :)
Hello :)
After dealing with a lot of errors, I finally got the expected table out of the DESeq script -
`Log2 fold change & Wald test p-value: strain S288C vs RM11
baseMean log2FoldChange lfcSE stat pvalue padj
0 0.000000 NaN NaN NaN NaN NaN
1 0.000000 NaN NaN NaN NaN NaN
2 0.000000 NaN NaN NaN NaN NaN
3 0.160284 0.879404 2.528475 0.347800 0.727990 NaN
4 2.424527 4.707459 2.135109 2.204786 0.027469 NaN
... ... ... ... ... ... ...
6692 0.000000 NaN NaN NaN NaN NaN
6693 1.950290 -0.904463 1.170501 -0.772715 0.439691 NaN
6694 0.641902 1.432367 1.793678 0.798564 0.424543 NaN
6695 2.444357 0.447478 1.025930 0.436168 0.662715 NaN
6696 1.110214 0.328981 1.382793 0.237910 0.811951 NaN
[6697 rows x 6 columns]`
I am now trying to move forward and filter the table by log2FoldChange > 1 and pvalue < 0.05, but the results DataFrame I get seems to be empty. This is my code -
import pandas as pd
from pydeseq2.default_inference import DefaultInference
from pydeseq2.dds import DeseqDataSet
from pydeseq2.ds import DeseqStats
# Read sample info into DataFrame with 'sample' column as index
sample_info_df = pd.read_csv('sample_info.tsv', sep='\t', index_col='sample')
# Read count data into DataFrame (assuming counts.tsv is in the same directory)
count_data_df = pd.read_csv('counts.tsv', sep='\t', skiprows=[0])
# Drop irrelevant columns
count_data_df = count_data_df.drop(columns=['Geneid', 'Chr', 'Start', 'End', 'Strand', 'Length'])
# Transpose the count data
transposed_df = count_data_df.transpose()
# Create an instance of DefaultInference
inference = DefaultInference(n_cpus=8)
# Create a DESeqDataSet object
dds = DeseqDataSet(
    counts=transposed_df,
    metadata=sample_info_df,
    design_factors=['batch', 'strain'],
    refit_cooks=True,
    inference=inference
)
# Run DESeq2 analysis
dds.deseq2()
results = DeseqStats(dds, inference=inference)
print(results.summary())
summary_df = pd.DataFrame(results.summary())
filtered_results = summary_df[summary_df['pvalue'] < 0.05]
and these are the errors I'm getting (also with log2FoldChange filtering). Ignore the padj I tried to filter by; that was my mistake.
AttributeError: 'NoneType' object has no attribute 'summary'
File "/home/nofar/new.py", line 40, in <module>
filtered_results = results.summary()[(results.summary()['log2FoldChange'] > 1) & (results.summary()['padj'] < 0.05)]
~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^
TypeError: 'NoneType' object is not subscriptable
I would be happy to get some help,
Thanks in advance, Nofar :-]
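A likely cause of both errors, as far as I can tell from the PyDESeq2 documentation: `DeseqStats.summary()` prints the table and returns `None`, while the table itself is stored on the object as `results_df`. The sketch below shows only the filtering step, on a small hand-made DataFrame with the same column names as the printout above (values copied from rows 3, 4 and 6693 to 6695 of that printout; nothing here is a new result). In the real script, `table` would be `results.results_df` after `results.summary()` has been called. `filter_significant` is a hypothetical helper.

```python
import numpy as np
import pandas as pd

def filter_significant(table, lfc_min=1.0, p_max=0.05, p_col="pvalue"):
    """Keep rows with log2FoldChange > lfc_min and p_col < p_max.

    Comparisons against NaN evaluate to False, so rows with missing
    values in either column drop out of the result.
    """
    mask = (table["log2FoldChange"] > lfc_min) & (table[p_col] < p_max)
    return table[mask]

if __name__ == "__main__":
    table = pd.DataFrame(
        {
            "log2FoldChange": [0.879404, 4.707459, -0.904463, 1.432367, 0.447478],
            "pvalue": [0.727990, 0.027469, 0.439691, 0.424543, 0.662715],
            "padj": [np.nan] * 5,
        },
        index=[3, 4, 6693, 6694, 6695],
    )
    print(filter_significant(table))                # filter on the raw p-value
    print(filter_significant(table, p_col="padj"))  # padj is NaN in these rows
```

Note that every `padj` value visible in the printout is NaN, so any filter on `padj` would come back empty for those rows regardless of how the table is accessed.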
I am having big issues completing the installation of seaborn. JupyterLab does not let me install it, and I have no other way of using Python...
ModuleNotFoundError Traceback (most recent call last)
Cell In[2], line 1
----> 1 import seaborn as sns
ModuleNotFoundError: No module named 'seaborn'
I am quite stuck with this issue and I do not know how else I can solve it.
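A common reason for this error is that the package was installed into a different Python environment than the one the notebook kernel runs in. One way around that, sketched here, is to call pip through the kernel's own interpreter (`sys.executable`). The helper name `pip_install_command` is hypothetical, and this assumes pip is available in that environment and the machine has network access.

```python
import sys

def pip_install_command(package):
    """Build a pip command bound to the interpreter running this code."""
    return [sys.executable, "-m", "pip", "install", package]

if __name__ == "__main__":
    # Show the command; pass the list to subprocess.check_call(...) to run it
    # inside the same environment as the current kernel or script.
    print(" ".join(pip_install_command("seaborn")))
```

In a notebook cell, `%pip install seaborn` should have the same effect in recent IPython versions; restarting the kernel afterwards may be needed before `import seaborn` works.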
I used the QUAST 5.2.0 manual to try to install QUAST, asked ChatGPT for help, and googled how to install it. I installed a bunch of packages, and I'm not sure they are the right ones. I also don't understand what you mean by running QUAST on assembly 4. I eventually managed to get some sort of QUAST command to work: I start the command with "python quast.py" and then give the file paths of the contigs and scaffolds. This produces a new directory containing nothing but a log file, which reports a number of errors about the utf-8 codec plus some other unclear errors.
Sorry if I'm oversharing, but I'm a little frustrated after a few hours of trying to figure this out. Is there a way to start from scratch, to understand how to install/access/use assembly4 (when did you talk about it?) and also QUAST?
I am unable to connect to the Jupyter notebook; the link in the presentation is incorrect.
Hi,
In ex8 the file S288C_reference_annotation_proc.gtf is missing from the data directory (and it's needed to run Qualimap)