akikuno / dajin2

🔬 Genotyping tool for genome-edited samples, utilizing nanopore sequencer target sequencing

License: MIT License

Python 98.69% HTML 1.30% CSS 0.01%
bioinformatics crispr-target genomics long-read-sequencing python3 nanopore

dajin2's Issues

DAJIN2 reports extremely low-frequency alleles (e.g. 0.005%)

Describe the Bug

DAJIN2 currently aims to detect alleles with a frequency above 0.5% in order to distinguish them from nanopore sequencing errors. In other words, alleles at or below 0.5% are treated as sequencing errors.

However, the mutation info and HTML reports show extremely low-frequency alleles, such as 0.005%.

Expected Behavior

Extremely low-frequency alleles, such as 0.005%, should be classified under the most similar major allele.
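
A minimal sketch of the expected behavior, assuming each allele is represented by a consensus sequence and a read percentage. The 0.5% threshold comes from the description above; the function name `merge_minor_alleles`, the dictionary layout, and the use of `difflib.SequenceMatcher` as the similarity measure are illustrative assumptions, not DAJIN2's actual implementation.

```python
from difflib import SequenceMatcher

THRESHOLD_PERCENT = 0.5  # alleles at or below this are treated as sequencing errors


def merge_minor_alleles(alleles, threshold=THRESHOLD_PERCENT):
    """Fold alleles at or below `threshold` percent into the most similar major allele.

    `alleles` maps a hypothetical allele name to (consensus_sequence, percent).
    """
    major = {name: [seq, pct] for name, (seq, pct) in alleles.items() if pct > threshold}
    if not major:
        # Nothing to merge into; return the input unchanged.
        return dict(alleles)
    for name, (seq, pct) in alleles.items():
        if pct > threshold:
            continue
        # Pick the major allele whose consensus is most similar to this minor one.
        closest = max(major, key=lambda m: SequenceMatcher(None, seq, major[m][0]).ratio())
        major[closest][1] += pct
    return {name: (seq, pct) for name, (seq, pct) in major.items()}
```

With this approach, a 0.005% allele would no longer appear as its own entry in the report; its reads would be counted toward the major allele it most resembles.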

Actual Behavior

None

Steps/Code to Reproduce

None

Operating System

WSL2

Python version

3.10

DAJIN2 version

0.4.6

Additional Context

None

Add a detection method for unintended inversion

Is your feature request related to a problem? Please describe.

DAJIN2 v0.4.6 does not consider the detection sensitivity for unintended inversion alleles.

Describe the solution you'd like

There is currently no script to extract unintended inversions, so a dedicated inversion-detection script should be written, modeled on insertions_to_fasta.py.

Describe alternatives you've considered if you have any

None

Additional context

None

Improve Handling of Sequence Error Deletions within Large Deletions

Describe the features you want.

DAJIN2 currently treats deletions caused by sequencing errors within large deletions as true mutations.

This behavior is partly accurate, since it reflects mutations within large deletion alleles. However, it has an undesirable side effect: clustering picks up the sequencing-error deletions within large deletions and reports them as independent alleles.

Describe the solution you'd like if you have any.

As in insertions_to_fasta.py, we aim to separate large deletion alleles from the control in advance, so that deletions within them are classified during the classification step, before clustering.

DAJIN2 version

0.4.6

Additional context

None

GUI implementation

Describe the features you want.

GUI implementation of DAJIN2.

Describe the solution you'd like if you have any.

Create a GUI that runs on localhost using Flask or streamlit.
The GUI should automatically launch with the DAJIN2 gui command, allowing users to upload FASTQ files and perform other tasks through the GUI.

some request for help

Hi, Dr. Akihiro,
I am Chen, a student who recently started benchmarking approaches for analyzing CRISPR editing outcomes. I am very interested in your previous work, DAJIN, which used NanoSim results to cluster reads for long-read sequencing genotyping. In my study, I am trying to compare the performance of different genotyping methods in complicated situations such as long deletions.
However, I ran into problems while trying to reproduce your method on my own public dataset. I could not run DAJIN with its basic usage (described in another issue), and then discovered this DAJIN2 repository. Would you allow me to try DAJIN2 on my own data, with a brief description and some guidelines? The dataset is as follows:
Input
Nanopore and PacBio sequencing results for several specific gRNAs; the sequencing depth and the designed gRNAs vary from sample to sample.
Output
Major editing outcomes with read counts and percentages (as in DAJIN).
I would be very grateful if you could help me with my study. Thank you for your time and help.
By the way, I found that the third nanopore data is really noisy and the read lengths are inconsistent (some reads are complete and some are missing pieces). How do you handle such low-quality reads in DAJIN? I do not fully understand how DAJIN works.
Thank you!

AttributeError: module 'wslPath' has no attribute 'is_windows_path'

Hi, I encountered an error during execution.

I ran a paired analysis of a control and a sample with the following command:

DAJIN2 --control barcode01 --sample barcode02 --allele actc1L_cont_knockin.fa --name 02 --genome xenLae2

but it resulted in:

2024-03-26 17:41:54, INFO, barcode01 is now processing...
2024-03-26 17:41:54, ERROR, Catch an Exception. Traceback:
Traceback (most recent call last):
  File "/home/user/miniconda3/envs/dajin2/bin/DAJIN2", line 10, in <module>
    sys.exit(execute())
  File "/home/user/miniconda3/envs/dajin2/lib/python3.10/site-packages/DAJIN2/main.py", line 236, in execute
    execute_single_mode(arguments)
  File "/home/user/miniconda3/envs/dajin2/lib/python3.10/site-packages/DAJIN2/main.py", line 47, in execute_single_mode
    core.execute_control(arguments)
  File "/home/user/miniconda3/envs/dajin2/lib/python3.10/site-packages/DAJIN2/core/core.py", line 26, in execute_control
    ARGS: FormattedInputs = preprocess.format_inputs(arguments)
  File "/home/user/miniconda3/envs/dajin2/lib/python3.10/site-packages/DAJIN2/core/preprocess/input_formatter.py", line 90, in format_inputs
    path_sample, path_control, path_allele = convert_input_paths_to_posix(path_sample, path_control, path_allele)
  File "/home/user/miniconda3/envs/dajin2/lib/python3.10/site-packages/DAJIN2/core/preprocess/input_formatter.py", line 33, in convert_input_paths_to_posix
    sample = io.convert_to_posix(sample)
  File "/home/user/miniconda3/envs/dajin2/lib/python3.10/site-packages/DAJIN2/utils/io.py", line 147, in convert_to_posix
    if wslPath.is_windows_path(path):
AttributeError: module 'wslPath' has no attribute 'is_windows_path'

I'm using Ubuntu 20.04 and installed the latest version of DAJIN2 via conda in a new environment without any errors.
Please let me know if anyone has an idea how to fix it.
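
The traceback indicates that the installed `wslPath` module does not provide `is_windows_path`, which may point to a version mismatch between DAJIN2 and wslPath; checking the installed wslPath version may be worth trying first. Purely as an illustration, a caller could guard against the missing attribute as sketched below. The fallback heuristic and the helper names are assumptions, not DAJIN2 or wslPath code, and `SimpleNamespace` stands in for the module.

```python
import re
from types import SimpleNamespace

# Drive-letter paths such as C:\data or C:/data, or any path containing a backslash.
_WINDOWS_PATH = re.compile(r"^[A-Za-z]:[\\/]|\\")


def fallback_is_windows_path(path) -> bool:
    """Heuristic stand-in used only when the library function is missing."""
    return bool(_WINDOWS_PATH.search(str(path)))


def resolve_is_windows_path(module):
    """Return module.is_windows_path if present, else the local heuristic."""
    return getattr(module, "is_windows_path", fallback_is_windows_path)


# Hypothetical stand-in for a wslPath install that lacks is_windows_path.
old_module = SimpleNamespace()
is_windows_path = resolve_is_windows_path(old_module)
```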

two more errors in execution (one fixed)

Another error occurred when I tried a different PC.

I ran the same command after a successful installation on the other PC:
DAJIN2 --control barcode01/ --sample barcode02/ --allele actc1L_cont_knockin.fa --name act1c --genome xenLae2 --threads 8

But a different error occurred:

2024-03-26 18:29:11, INFO, barcode01/ is now processing...
2024-03-26 18:29:14, INFO, Preprocess barcode01/...
2024-03-26 18:30:27, INFO, Output BAM files of barcode01/...
2024-03-26 18:30:28, INFO, 🍵 barcode01/ is finished!
2024-03-26 18:30:28, INFO, barcode02/ is now processing...
2024-03-26 18:30:31, INFO, Preprocess barcode02/...
2024-03-26 18:30:38, ERROR, Catch an Exception. Traceback:
Traceback (most recent call last):
  File "/home/igawa/miniconda3/envs/dajin2/bin/DAJIN2", line 10, in <module>
    sys.exit(execute())
  File "/home/igawa/miniconda3/envs/dajin2/lib/python3.10/site-packages/DAJIN2/main.py", line 236, in execute
    execute_single_mode(arguments)
  File "/home/igawa/miniconda3/envs/dajin2/lib/python3.10/site-packages/DAJIN2/main.py", line 48, in execute_single_mode
    core.execute_sample(arguments)
  File "/home/igawa/miniconda3/envs/dajin2/lib/python3.10/site-packages/DAJIN2/core/core.py", line 120, in execute_sample
    preprocess.cache_mutation_loci(ARGS, is_control=False)
  File "/home/igawa/miniconda3/envs/dajin2/lib/python3.10/site-packages/DAJIN2/core/preprocess/mutation_extractor.py", line 322, in cache_mutation_loci
    mutation_loci = extract_mutation_loci(
  File "/home/igawa/miniconda3/envs/dajin2/lib/python3.10/site-packages/DAJIN2/core/preprocess/mutation_extractor.py", line 280, in extract_mutation_loci
    anomal_loci = extract_anomal_loci(
  File "/home/igawa/miniconda3/envs/dajin2/lib/python3.10/site-packages/DAJIN2/core/preprocess/mutation_extractor.py", line 128, in extract_anomal_loci
    idx_outliers = detect_anomalies(values_sample, values_control, thresholds[mut], is_consensus)
  File "/home/igawa/miniconda3/envs/dajin2/lib/python3.10/site-packages/DAJIN2/core/preprocess/mutation_extractor.py", line 111, in detect_anomalies
    kmeans = MiniBatchKMeans(n_clusters=2, random_state=0, n_init="auto").fit(values_subtract_reshaped)
  File "/home/igawa/miniconda3/envs/dajin2/lib/python3.10/site-packages/sklearn/cluster/_kmeans.py", line 1960, in fit
    self._check_params(X)
  File "/home/igawa/miniconda3/envs/dajin2/lib/python3.10/site-packages/sklearn/cluster/_kmeans.py", line 1792, in _check_params
    super()._check_params(X)
  File "/home/igawa/miniconda3/envs/dajin2/lib/python3.10/site-packages/sklearn/cluster/_kmeans.py", line 818, in _check_params
    if self.n_init <= 0:
TypeError: '<=' not supported between instances of 'str' and 'int'

I was able to fix this by updating scikit-learn with pip:
pip install -U scikit-learn
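
The `TypeError` is consistent with a scikit-learn release that predates support for `n_init="auto"` (added in scikit-learn 1.2), which would explain why upgrading resolves it. A version-tolerant caller could choose the argument as sketched below. The helper names and the fallback value of 3 (the earlier `MiniBatchKMeans` default) are assumptions for illustration, not DAJIN2 code.

```python
import re


def parse_major_minor(version: str):
    """Extract (major, minor) from a version string such as '1.1.3' or '1.2rc1'."""
    match = re.match(r"(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"Unrecognized version string: {version!r}")
    return int(match.group(1)), int(match.group(2))


def choose_n_init(sklearn_version: str):
    """Return 'auto' where scikit-learn accepts it, else an explicit integer."""
    if parse_major_minor(sklearn_version) >= (1, 2):
        return "auto"
    return 3  # assumed fallback: the older MiniBatchKMeans default
```

With scikit-learn installed, the result would be passed as `MiniBatchKMeans(n_clusters=2, random_state=0, n_init=choose_n_init(sklearn.__version__))`.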

However, another error occurred after updating scikit-learn:

2024-03-26 18:46:37, INFO, barcode01/ is now processing...
2024-03-26 18:46:39, INFO, Preprocess barcode01/...
2024-03-26 18:47:52, INFO, Output BAM files of barcode01/...
2024-03-26 18:47:53, INFO, 🍵 barcode01/ is finished!
2024-03-26 18:47:53, INFO, barcode02/ is now processing...
2024-03-26 18:47:56, INFO, Preprocess barcode02/...
2024-03-26 18:48:06, INFO, Classify barcode02/...
2024-03-26 18:48:07, INFO, Clustering barcode02/...
2024-03-26 18:48:28, INFO, Consensus calling of barcode02/...
2024-03-26 18:50:15, INFO, Output reports of barcode02/...
2024-03-26 18:50:15, ERROR, Catch an Exception. Traceback:
Traceback (most recent call last):
  File "/home/igawa/miniconda3/envs/dajin2/bin/DAJIN2", line 10, in <module>
    sys.exit(execute())
  File "/home/igawa/miniconda3/envs/dajin2/lib/python3.10/site-packages/DAJIN2/main.py", line 236, in execute
    execute_single_mode(arguments)
  File "/home/igawa/miniconda3/envs/dajin2/lib/python3.10/site-packages/DAJIN2/main.py", line 48, in execute_single_mode
    core.execute_sample(arguments)
  File "/home/igawa/miniconda3/envs/dajin2/lib/python3.10/site-packages/DAJIN2/core/core.py", line 214, in execute_sample
    report.report_bam.export_to_bam(
  File "/home/igawa/miniconda3/envs/dajin2/lib/python3.10/site-packages/DAJIN2/core/report/report_bam.py", line 127, in export_to_bam
    write_sam_to_bam(sam_headers + sam_content, path_sam_output, path_bam_output, THREADS)
  File "/home/igawa/miniconda3/envs/dajin2/lib/python3.10/site-packages/DAJIN2/core/report/report_bam.py", line 86, in write_sam_to_bam
    Path(path_sam).write_text(formatted_sam + "\n")
  File "/home/igawa/miniconda3/envs/dajin2/lib/python3.10/pathlib.py", line 1154, in write_text
    with self.open(mode='w', encoding=encoding, errors=errors, newline=newline) as f:
  File "/home/igawa/miniconda3/envs/dajin2/lib/python3.10/pathlib.py", line 1119, in open
    return self._accessor.open(self, mode, buffering, encoding, errors,
FileNotFoundError: [Errno 2] No such file or directory: 'DAJIN_Results/.tempdir/act1c/report/bam/tmp240891_allele1_control_indels_40.876%.sam'

Could this be an error in the DAJIN2 code, specifically where it calls into pathlib.py?
I would be happy if this error could be fixed in a DAJIN2 update.
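
Note that `pathlib.py` is part of the Python standard library, so it appears in the traceback only because `write_text` is where the failure surfaced. One common cause of `FileNotFoundError` on write is a missing parent directory; whether that is what happened here cannot be determined from the traceback alone. Under that assumption, a defensive write could look like the following sketch (the helper name `write_text_safely` and the file name are hypothetical):

```python
import tempfile
from pathlib import Path


def write_text_safely(path, text: str) -> Path:
    """Create any missing parent directories, then write the text."""
    target = Path(path)
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_text(text)
    return target


# Demonstration in a throwaway directory; the nested path mimics the report layout.
with tempfile.TemporaryDirectory() as tmp:
    out = write_text_safely(Path(tmp) / "report" / "bam" / "allele1.sam", "@HD\tVN:1.6\n")
    written = out.read_text()
```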

Thanks,

genome_fetcher.py error

After updating to 0.4.3, genome_fetcher.py reported an error:

DAJIN2 --control barcode01 --sample barcode02 --allele actc1L_cont_knockin.fa --name 02 --genome xenLae2 --threads 8
2024-03-29 11:48:17, INFO, barcode01 is now processing...
2024-03-29 11:48:19, ERROR, Catch an Exception. Traceback:
Traceback (most recent call last):
  File "/home/igawa/miniconda3/envs/dajin2/bin/DAJIN2", line 8, in <module>
    sys.exit(execute())
  File "/home/igawa/miniconda3/envs/dajin2/lib/python3.10/site-packages/DAJIN2/main.py", line 236, in execute
    execute_single_mode(arguments)
  File "/home/igawa/miniconda3/envs/dajin2/lib/python3.10/site-packages/DAJIN2/main.py", line 47, in execute_single_mode
    core.execute_control(arguments)
  File "/home/igawa/miniconda3/envs/dajin2/lib/python3.10/site-packages/DAJIN2/core/core.py", line 26, in execute_control
    ARGS: FormattedInputs = preprocess.format_inputs(arguments)
  File "/home/igawa/miniconda3/envs/dajin2/lib/python3.10/site-packages/DAJIN2/core/preprocess/input_formatter.py", line 96, in format_inputs
    genome_coordinates = get_genome_coordinates(genome_urls, fasta_alleles, is_cache_genome, tempdir)
  File "/home/igawa/miniconda3/envs/dajin2/lib/python3.10/site-packages/DAJIN2/core/preprocess/input_formatter.py", line 67, in get_genome_coordinates
    genome_coordinates = preprocess.fetch_coordinates(genome_coordinates, genome_urls, fasta_alleles["control"])
  File "/home/igawa/miniconda3/envs/dajin2/lib/python3.10/site-packages/DAJIN2/core/preprocess/genome_fetcher.py", line 29, in fetch_coordinates
    coordinate_start = fetch_seq_coordinates(genome, blat_url, seq_start)
  File "/home/igawa/miniconda3/envs/dajin2/lib/python3.10/site-packages/DAJIN2/core/preprocess/genome_fetcher.py", line 18, in fetch_seq_coordinates
    raise ValueError(f"{seq[:60]}... is not found in {genome}")
ValueError: TTATAATTCAGCATCTAGACAGCAGCAACAAGCATTACCCTGGAATGGTTCATAATATGC... is not found in xenLae2

I confirmed that the run completes successfully when I swap in the older genome_fetcher.py, so the problem may come from the updated one.
Thank you for your effort, anyway!

Add VCF reports

Is your feature request related to a problem? Please describe.

The consensus sequence is currently output as HTML and FASTA. Using my cstag package, it could also be output in VCF format.

https://github.com/akikuno/cstag

Describe the solution you'd like

Use the cstag.to_vcf function.

Describe alternatives you've considered if you have any

None

Additional context

None

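midsmask: masking with the most frequent base turns heterozygous sites into homozygous ones

Problem

  • The midsmask method is flawed: masking with the most frequent base turns heterozygous sites into homozygous ones.

Proposed solution

  • Assign MIDS randomly, following the MIDS frequencies of the high-quality bases.

Expected results

  • MIDS are assigned according to the MIDS frequencies of the high-quality bases.
  • At the point-mutation site in midsmask (position 829), MIDS are assigned according to the MIDS frequencies of the high-quality bases.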
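
The proposed approach can be sketched in Python as follows. For a single position, low-quality calls are replaced by sampling from the MIDS distribution observed among the high-quality calls at that position, so a heterozygous site stays mixed instead of collapsing to the most frequent symbol. The function name, the boolean quality flags, and the use of `random.Random` are assumptions for illustration, not the project's actual implementation.

```python
import random
from collections import Counter


def mask_by_frequency(mids, is_high_quality, rng=None):
    """Replace low-quality MIDS calls by sampling from the high-quality distribution.

    `mids` holds one symbol per read at a single position (e.g. 'M', 'I', 'D', 'S');
    `is_high_quality` holds a matching boolean per read.
    """
    rng = rng or random.Random()
    counts = Counter(m for m, ok in zip(mids, is_high_quality) if ok)
    if not counts:
        # No trustworthy calls at this position; leave the column untouched.
        return list(mids)
    symbols = list(counts)
    weights = [counts[s] for s in symbols]
    return [
        m if ok else rng.choices(symbols, weights=weights, k=1)[0]
        for m, ok in zip(mids, is_high_quality)
    ]
```

Passing a seeded `random.Random` makes the masking reproducible between runs.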

Test data

cat << EOF | sort >tmp1.csv
fooa,!!0
foob,!!0
fooc,!00
food,!00
EOF

cat << EOF | sort >tmp2.csv
fooa,M,M,M
foob,M,M,M
fooc,M,S,M
food,M,M,M
EOF

join -t, tmp1.csv tmp2.csv |
  maskMS > tmp_maskMS.csv

cat tmp_maskMS.csv |
  Rscript ./library/03-preprocess/maskMIDS.R

rm tmp1.csv tmp2.csv

Replace setup.py with pyproject.toml

Describe the features you want.

Replace setup.py with pyproject.toml to consolidate setup.py, requirements.txt, and MANIFEST.in into a single file.

Describe the solution you'd like if you have any.

Create pyproject.toml.

DAJIN2 version

0.4.6

Additional context

None
