akikuno / dajin2
🔬 Genotyping tool for genome-edited samples, utilizing nanopore sequencer target sequencing
License: MIT License
Current DAJIN2 aims to detect alleles greater than 0.5% to differentiate them from Nanopore sequencing errors. In other words, alleles less than or equal to 0.5% are considered sequencing errors.
However, the mutation info and HTML reports show extremely low-frequency alleles, such as 0.005%.
Extremely low-frequency alleles, such as 0.005%, should be classified under the most similar major allele.
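The proposed behavior can be sketched as follows. This is only an illustration, not DAJIN2's implementation: the function name `merge_minor_alleles`, the input layout, and the use of `difflib.SequenceMatcher` as the similarity measure are all assumptions made for this example.

```python
from difflib import SequenceMatcher

# Alleles at or below this frequency (in percent) are treated as sequencing errors.
THRESHOLD_PERCENT = 0.5


def merge_minor_alleles(alleles, threshold=THRESHOLD_PERCENT):
    """Fold minor alleles into the most similar major allele.

    alleles: dict mapping allele name -> (sequence, percent).
    Returns a dict mapping major allele name -> percent after merging.
    Hypothetical helper; string similarity stands in for a real allele distance.
    """
    major = {name: (seq, pct) for name, (seq, pct) in alleles.items() if pct > threshold}
    if not major:
        # Nothing exceeds the threshold, so there is nothing to merge into.
        return {name: pct for name, (_, pct) in alleles.items()}

    merged = {name: pct for name, (_, pct) in major.items()}
    for name, (seq, pct) in alleles.items():
        if name in major:
            continue
        # Pick the major allele whose sequence is most similar to the minor one.
        closest = max(major, key=lambda m: SequenceMatcher(None, seq, major[m][0]).ratio())
        merged[closest] += pct
    return merged
```

With this approach, a 0.005% allele would no longer appear in the report on its own; its reads would be counted under the closest major allele.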
Environment: WSL2, Python 3.10, DAJIN2 0.4.6
DAJIN2 v0.4.6 does not consider the detection sensitivity for unintended inversion alleles.
Currently, there is no script to extract unintended inversions, so a dedicated script for detecting inversions should be prepared, referring to insertions_to_fasta.py.
In the current DAJIN2, deletions caused by sequencing errors within large deletions are treated as true mutations.
While this behavior is partly accurate, since it reflects mutations within large deletion alleles, it has an undesirable outcome: clustering detects sequencing-error deletions inside large deletions and reports them as independent alleles.
Similar to insertions_to_fasta.py, we aim to classify deletions within large deletion alleles during the classification step, before clustering, by pre-separating the large deletion alleles from the control.
DAJIN2 version: 0.4.6
GUI implementation of DAJIN2.
Create a GUI that runs on localhost using Flask or Streamlit.
The GUI should automatically launch with the DAJIN2 gui command, allowing users to upload FASTQ files and perform other tasks through the GUI.
Hi, Dr. Akihiro,
I am Chen, a student who recently started benchmarking approaches for analyzing CRISPR editing outcomes. I am very interested in your previous work, DAJIN, which used NanoSim results to cluster reads for long-read sequencing genotyping. In my study, I am trying to compare the performance of different genotyping methods in complicated situations such as long deletion lesions.
However, I ran into some problems while trying to reproduce your methods on my own public dataset. I could not run DAJIN with its basic usage (listed in another issue), and then discovered this DAJIN2 repository. Would you allow me to try DAJIN2 on my own data, with a brief description and some guidelines? The dataset looks like this:
Input
Nanopore and PacBio sequencing results for several specific gRNAs; the sequencing depth and the designed gRNAs vary between samples.
Output
Major editing outcomes with read counts and percentages (like DAJIN)
I would be very grateful if you could help me with my study. Thank you for your time and help.
By the way, I found that the third Nanopore dataset is really noisy and the read lengths are quite variable (some reads are complete and some are missing pieces). How do you handle such low-quality reads in DAJIN? I do not fully understand how DAJIN works.
Thank you!
Hi, I ran into an error during execution.
I ran a paired analysis of a control and a sample with the following command:
DAJIN2 --control barcode01 --sample barcode02 --allele actc1L_cont_knockin.fa --name 02 --genome xenLae2
but it produced the following error:
2024-03-26 17:41:54, INFO, barcode01 is now processing...
2024-03-26 17:41:54, ERROR, Catch an Exception. Traceback:
Traceback (most recent call last):
File "/home/user/miniconda3/envs/dajin2/bin/DAJIN2", line 10, in <module>
sys.exit(execute())
File "/home/user/miniconda3/envs/dajin2/lib/python3.10/site-packages/DAJIN2/main.py", line 236, in execute
execute_single_mode(arguments)
File "/home/user/miniconda3/envs/dajin2/lib/python3.10/site-packages/DAJIN2/main.py", line 47, in execute_single_mode
core.execute_control(arguments)
File "/home/user/miniconda3/envs/dajin2/lib/python3.10/site-packages/DAJIN2/core/core.py", line 26, in execute_control
ARGS: FormattedInputs = preprocess.format_inputs(arguments)
File "/home/user/miniconda3/envs/dajin2/lib/python3.10/site-packages/DAJIN2/core/preprocess/input_formatter.py", line 90, in format_inputs
path_sample, path_control, path_allele = convert_input_paths_to_posix(path_sample, path_control, path_allele)
File "/home/user/miniconda3/envs/dajin2/lib/python3.10/site-packages/DAJIN2/core/preprocess/input_formatter.py", line 33, in convert_input_paths_to_posix
sample = io.convert_to_posix(sample)
File "/home/user/miniconda3/envs/dajin2/lib/python3.10/site-packages/DAJIN2/utils/io.py", line 147, in convert_to_posix
if wslPath.is_windows_path(path):
AttributeError: module 'wslPath' has no attribute 'is_windows_path'
I'm using Ubuntu 20.04 and installed the latest version of DAJIN2 via conda in a new environment without any errors.
Please let me know if anyone has an idea how to fix it.
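The traceback suggests that the installed wslPath module does not provide `is_windows_path`, which may point to a version mismatch between DAJIN2 and wslPath. As a rough illustration of what such a check and conversion might do, here is a standard-library sketch. The functions below are hypothetical stand-ins, not wslPath's actual API, and the `/mnt/<drive>` mapping assumes the default WSL mount point.

```python
import re
from pathlib import PureWindowsPath


def is_windows_path(path: str) -> bool:
    """Heuristic check: a drive-letter prefix or any backslash separator."""
    return bool(re.match(r"^[A-Za-z]:[\\/]", path)) or "\\" in path


def to_posix(path: str) -> str:
    """Convert a Windows-style path to a POSIX-style one; leave other paths unchanged."""
    if not is_windows_path(path):
        return path
    win = PureWindowsPath(path)
    if len(win.drive) == 2 and win.drive.endswith(":"):
        # Assumption: drives are mounted under /mnt/<lowercase letter>, as in default WSL.
        return "/".join([f"/mnt/{win.drive[0].lower()}", *win.parts[1:]])
    # Relative Windows paths only need their separators swapped.
    return win.as_posix()
```

Paths such as `barcode01` or `barcode02/` contain neither a drive letter nor a backslash, so this sketch returns them unchanged.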
Other errors occurred when I tried another PC.
I ran the same command after a successful install on the other PC:
DAJIN2 --control barcode01/ --sample barcode02/ --allele actc1L_cont_knockin.fa --name act1c --genome xenLae2 --threads 8
But a different error occurred:
2024-03-26 18:29:11, INFO, barcode01/ is now processing...
2024-03-26 18:29:14, INFO, Preprocess barcode01/...
2024-03-26 18:30:27, INFO, Output BAM files of barcode01/...
2024-03-26 18:30:28, INFO, 🍵 barcode01/ is finished!
2024-03-26 18:30:28, INFO, barcode02/ is now processing...
2024-03-26 18:30:31, INFO, Preprocess barcode02/...
2024-03-26 18:30:38, ERROR, Catch an Exception. Traceback:
Traceback (most recent call last):
File "/home/igawa/miniconda3/envs/dajin2/bin/DAJIN2", line 10, in <module>
sys.exit(execute())
File "/home/igawa/miniconda3/envs/dajin2/lib/python3.10/site-packages/DAJIN2/main.py", line 236, in execute
execute_single_mode(arguments)
File "/home/igawa/miniconda3/envs/dajin2/lib/python3.10/site-packages/DAJIN2/main.py", line 48, in execute_single_mode
core.execute_sample(arguments)
File "/home/igawa/miniconda3/envs/dajin2/lib/python3.10/site-packages/DAJIN2/core/core.py", line 120, in execute_sample
preprocess.cache_mutation_loci(ARGS, is_control=False)
File "/home/igawa/miniconda3/envs/dajin2/lib/python3.10/site-packages/DAJIN2/core/preprocess/mutation_extractor.py", line 322, in cache_mutation_loci
mutation_loci = extract_mutation_loci(
File "/home/igawa/miniconda3/envs/dajin2/lib/python3.10/site-packages/DAJIN2/core/preprocess/mutation_extractor.py", line 280, in extract_mutation_loci
anomal_loci = extract_anomal_loci(
File "/home/igawa/miniconda3/envs/dajin2/lib/python3.10/site-packages/DAJIN2/core/preprocess/mutation_extractor.py", line 128, in extract_anomal_loci
idx_outliers = detect_anomalies(values_sample, values_control, thresholds[mut], is_consensus)
File "/home/igawa/miniconda3/envs/dajin2/lib/python3.10/site-packages/DAJIN2/core/preprocess/mutation_extractor.py", line 111, in detect_anomalies
kmeans = MiniBatchKMeans(n_clusters=2, random_state=0, n_init="auto").fit(values_subtract_reshaped)
File "/home/igawa/miniconda3/envs/dajin2/lib/python3.10/site-packages/sklearn/cluster/_kmeans.py", line 1960, in fit
self._check_params(X)
File "/home/igawa/miniconda3/envs/dajin2/lib/python3.10/site-packages/sklearn/cluster/_kmeans.py", line 1792, in _check_params
super()._check_params(X)
File "/home/igawa/miniconda3/envs/dajin2/lib/python3.10/site-packages/sklearn/cluster/_kmeans.py", line 818, in _check_params
if self.n_init <= 0:
TypeError: '<=' not supported between instances of 'str' and 'int'
I was able to fix this by updating scikit-learn with pip:
pip install -U scikit-learn
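The `TypeError` above is consistent with an older scikit-learn: the `n_init="auto"` option was added in scikit-learn 1.2, so earlier releases compare the string against an integer and fail. A minimal sketch of a version check follows. The helper names are hypothetical and are not part of DAJIN2 or scikit-learn.

```python
import re
from importlib.metadata import PackageNotFoundError, version

# n_init="auto" is accepted by KMeans/MiniBatchKMeans from scikit-learn 1.2 onward.
MIN_SKLEARN = (1, 2)


def parse_major_minor(version_string):
    """Extract (major, minor) from strings such as '1.4.1.post1'."""
    match = re.match(r"(\d+)\.(\d+)", version_string)
    if match is None:
        raise ValueError(f"Unrecognized version string: {version_string!r}")
    return int(match.group(1)), int(match.group(2))


def supports_n_init_auto(version_string):
    """Return True if this scikit-learn version accepts n_init='auto'."""
    return parse_major_minor(version_string) >= MIN_SKLEARN


def installed_sklearn_supports_auto():
    """Check the scikit-learn installed in the current environment, if any."""
    try:
        return supports_n_init_auto(version("scikit-learn"))
    except PackageNotFoundError:
        return False
```

Running `installed_sklearn_supports_auto()` inside the conda environment shows whether the upgrade is needed before launching DAJIN2.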
However, another error occurred after updating scikit-learn:
2024-03-26 18:46:37, INFO, barcode01/ is now processing...
2024-03-26 18:46:39, INFO, Preprocess barcode01/...
2024-03-26 18:47:52, INFO, Output BAM files of barcode01/...
2024-03-26 18:47:53, INFO, 🍵 barcode01/ is finished!
2024-03-26 18:47:53, INFO, barcode02/ is now processing...
2024-03-26 18:47:56, INFO, Preprocess barcode02/...
2024-03-26 18:48:06, INFO, Classify barcode02/...
2024-03-26 18:48:07, INFO, Clustering barcode02/...
2024-03-26 18:48:28, INFO, Consensus calling of barcode02/...
2024-03-26 18:50:15, INFO, Output reports of barcode02/...
2024-03-26 18:50:15, ERROR, Catch an Exception. Traceback:
Traceback (most recent call last):
File "/home/igawa/miniconda3/envs/dajin2/bin/DAJIN2", line 10, in <module>
sys.exit(execute())
File "/home/igawa/miniconda3/envs/dajin2/lib/python3.10/site-packages/DAJIN2/main.py", line 236, in execute
execute_single_mode(arguments)
File "/home/igawa/miniconda3/envs/dajin2/lib/python3.10/site-packages/DAJIN2/main.py", line 48, in execute_single_mode
core.execute_sample(arguments)
File "/home/igawa/miniconda3/envs/dajin2/lib/python3.10/site-packages/DAJIN2/core/core.py", line 214, in execute_sample
report.report_bam.export_to_bam(
File "/home/igawa/miniconda3/envs/dajin2/lib/python3.10/site-packages/DAJIN2/core/report/report_bam.py", line 127, in export_to_bam
write_sam_to_bam(sam_headers + sam_content, path_sam_output, path_bam_output, THREADS)
File "/home/igawa/miniconda3/envs/dajin2/lib/python3.10/site-packages/DAJIN2/core/report/report_bam.py", line 86, in write_sam_to_bam
Path(path_sam).write_text(formatted_sam + "\n")
File "/home/igawa/miniconda3/envs/dajin2/lib/python3.10/pathlib.py", line 1154, in write_text
with self.open(mode='w', encoding=encoding, errors=errors, newline=newline) as f:
File "/home/igawa/miniconda3/envs/dajin2/lib/python3.10/pathlib.py", line 1119, in open
return self._accessor.open(self, mode, buffering, encoding, errors,
FileNotFoundError: [Errno 2] No such file or directory: 'DAJIN_Results/.tempdir/act1c/report/bam/tmp240891_allele1_control_indels_40.876%.sam'
Could this be an error in the DAJIN2 code, or specifically in pathlib.py?
I would be glad if this error could be fixed in a DAJIN2 update.
Thanks,
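Note that pathlib.py is part of the Python standard library; it only raises the error on behalf of the caller. When opening a file for writing, `FileNotFoundError` typically means that the parent directory does not exist. Whether that is the cause here has not been confirmed, but a defensive write could look like the sketch below, where `write_text_safely` is a hypothetical helper and not a DAJIN2 function.

```python
from pathlib import Path


def write_text_safely(path, text):
    """Write text to path, creating any missing parent directories first."""
    target = Path(path)
    # Without this, write_text raises FileNotFoundError if the directory is absent.
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_text(text)
    return target
```

A call such as `write_text_safely(path_sam, formatted_sam + "\n")` would then succeed even if the `report/bam` directory had not been created earlier in the run.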
After updating to 0.4.3, genome_fetcher.py reported an error:
DAJIN2 --control barcode01 --sample barcode02 --allele actc1L_cont_knockin.fa --name 02 --genome xenLae2 --threads 8
2024-03-29 11:48:17, INFO, barcode01 is now processing...
2024-03-29 11:48:19, ERROR, Catch an Exception. Traceback:
Traceback (most recent call last):
File "/home/igawa/miniconda3/envs/dajin2/bin/DAJIN2", line 8, in <module>
sys.exit(execute())
File "/home/igawa/miniconda3/envs/dajin2/lib/python3.10/site-packages/DAJIN2/main.py", line 236, in execute
execute_single_mode(arguments)
File "/home/igawa/miniconda3/envs/dajin2/lib/python3.10/site-packages/DAJIN2/main.py", line 47, in execute_single_mode
core.execute_control(arguments)
File "/home/igawa/miniconda3/envs/dajin2/lib/python3.10/site-packages/DAJIN2/core/core.py", line 26, in execute_control
ARGS: FormattedInputs = preprocess.format_inputs(arguments)
File "/home/igawa/miniconda3/envs/dajin2/lib/python3.10/site-packages/DAJIN2/core/preprocess/input_formatter.py", line 96, in format_inputs
genome_coordinates = get_genome_coordinates(genome_urls, fasta_alleles, is_cache_genome, tempdir)
File "/home/igawa/miniconda3/envs/dajin2/lib/python3.10/site-packages/DAJIN2/core/preprocess/input_formatter.py", line 67, in get_genome_coordinates
genome_coordinates = preprocess.fetch_coordinates(genome_coordinates, genome_urls, fasta_alleles["control"])
File "/home/igawa/miniconda3/envs/dajin2/lib/python3.10/site-packages/DAJIN2/core/preprocess/genome_fetcher.py", line 29, in fetch_coordinates
coordinate_start = fetch_seq_coordinates(genome, blat_url, seq_start)
File "/home/igawa/miniconda3/envs/dajin2/lib/python3.10/site-packages/DAJIN2/core/preprocess/genome_fetcher.py", line 18, in fetch_seq_coordinates
raise ValueError(f"{seq[:60]}... is not found in {genome}")
ValueError: TTATAATTCAGCATCTAGACAGCAGCAACAAGCATTACCCTGGAATGGTTCATAATATGC... is not found in xenLae2
I confirmed that the run completes successfully when I swap in the older genome_fetcher.py, so the problem may come from the updated version.
Thank you for your effort, anyway!
Consensus sequence is now represented as HTML and FASTA. Using my cstag package, it is possible to output in VCF format.
https://github.com/akikuno/cstag
Use the cstag.to_vcf function.
cat << EOF | sort >tmp1.csv
fooa,!!0
foob,!!0
fooc,!00
food,!00
EOF
cat << EOF | sort >tmp2.csv
fooa,M,M,M
foob,M,M,M
fooc,M,S,M
food,M,M,M
EOF
join -t, tmp1.csv tmp2.csv |
maskMS > tmp_maskMS.csv
cat tmp_maskMS.csv |
Rscript ./library/03-preprocess/maskMIDS.R
rm tmp1.csv tmp2.csv
Replace setup.py with pyproject.toml to consolidate setup.py, requirements.txt, and MANIFEST.in.
Create pyproject.toml.
DAJIN2 version: 0.4.6