
Enhancement and refactoring (rain issue, open)

mboissel avatar mboissel commented on August 19, 2024
Enhancement and refactoring

from rain.

Comments (3)

mcanouil avatar mcanouil commented on August 19, 2024

Note: PLINK is used to convert from VCF to BED/BIM/FAM for flashpcaR to use directly.


mcanouil avatar mcanouil commented on August 19, 2024

Notes:

  • The VCFtools (and awk) calls should be replaced by BCFtools calls in:
    • rain/R/format_vcf.R

      Lines 41 to 48 in dbfa02c

      bin_path[["vcftools"]],
      "--gzvcf", ref1kg_vcfs,
      "--maf", ref1kg_maf,
      "--recode",
      "--stdout",
      "|", bin_path[["bgzip"]], "-c >", output_ref,
      "&&",
      bin_path[["tabix"]], "-p vcf", output_ref
    • rain/R/format_vcf.R

      Lines 72 to 79 in dbfa02c

      paste(bin_path[["vcftools"]],
      "--gzvcf", input_vcfs,
      "--get-INFO", quality_tag,
      "--out", gsub("filtered_", "excluded_", output_study_temp),
      "&&",
      'awk \'{if($5<', quality_threshold, ') print $1"\t"$2}\'',
      paste0(gsub("filtered_", "excluded_", output_study_temp), ".INFO"),
      ">", paste0(gsub("filtered_", "excluded_", output_study_temp), ".exclude"),
    • rain/R/format_vcf.R

      Lines 83 to 95 in dbfa02c

      bin_path[["vcftools"]],
      "--gzvcf", input_vcfs,
      if (!is.null(quality_tag)) {
      paste0(
      "--exclude-positions ", gsub("filtered_", "excluded_", output_study_temp), ".exclude"
      )
      },
      "--remove-indels",
      "--remove-filtered-all",
      "--max-missing-count 1",
      "--recode",
      "--stdout",
      "|", bin_path[["bgzip"]], "-c >", output_study_temp,
  • I think we should embed PLINK within the package to avoid having to download a binary. This would keep the dependency self-contained.
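
For the first point, a rough sketch of what the BCFtools equivalents could look like. The two helper functions are hypothetical (they are not in rain) and only build the command strings; the mapping of flags is my reading of the VCFtools options quoted above and should be checked against real data. One known difference: with `-i`, sites lacking the quality tag are dropped, whereas the awk step only excluded sites below the threshold.

```r
# Hypothetical helpers (not part of rain): build, but do not run, BCFtools
# command lines that roughly mirror the VCFtools calls in R/format_vcf.R.

# Reference panel: minor allele frequency filter, then bgzip output and index.
bcftools_maf_cmd <- function(bcftools, input_vcf, maf, output_vcf) {
  paste(
    bcftools, "view",
    "-q", paste0(maf, ":minor"),               # --min-af, in place of vcftools --maf
    "-O", "z",                                 # bgzip-compressed VCF, in place of "| bgzip -c"
    "-o", shQuote(output_vcf, type = "sh"),
    shQuote(input_vcf, type = "sh"),
    "&&",
    bcftools, "index", "--tbi", shQuote(output_vcf, type = "sh") # in place of tabix -p vcf
  )
}

# Study VCF: one call instead of --get-INFO + awk + --exclude-positions.
bcftools_filter_cmd <- function(bcftools, input_vcf, output_vcf,
                                quality_tag = NULL, quality_threshold = NULL) {
  # At most one missing genotype per site (vcftools --max-missing-count 1)
  keep <- "N_MISSING<=1"
  if (!is.null(quality_tag)) {
    # Assumption: the quality tag lives in the INFO column of the VCF
    keep <- paste0(keep, " && INFO/", quality_tag, ">=", quality_threshold)
  }
  paste(
    bcftools, "view",
    "-f", "PASS",                              # --apply-filters, for --remove-filtered-all
    "-V", "indels",                            # --exclude-types, for --remove-indels
    "-i", shQuote(keep, type = "sh"),          # --include expression
    "-O", "z",
    "-o", shQuote(output_vcf, type = "sh"),
    shQuote(input_vcf, type = "sh")
  )
}

# Example with made-up file names and thresholds
cat(bcftools_maf_cmd("bcftools", "ref.vcf.gz", 0.05, "ref_maf.vcf.gz"), "\n")
cat(bcftools_filter_cmd("bcftools", "study.vcf.gz", "filtered.vcf.gz", "R2", 0.8), "\n")
```

Keeping the command construction separate from `system()` makes the strings easy to inspect or unit test without BCFtools installed.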


mcanouil avatar mcanouil commented on August 19, 2024

Note: PLINK is used to convert from VCF to BED/BIM/FAM for flashpcaR to use directly.

In fact, we may want to remove PLINK completely, and should be able to, with something like the following:

geno_mat <- unique(data.table::setnames(
  x = data.table::fread(
    cmd = paste(
      bcftools, "query",
      "--format", "'%CHROM\t%POS\t%REF\t%ALT[\t%GT]\n'",
      "--print-header",
      file.path(output_directory, "all-samples-1kg.vcf.gz")
    ),
    sep = "\t"
  ),
  old = function(x) sub("^#", "", sub(":GT", "", sub(" *\\[[[:digit:]]+\\]", "", x)))
))
sample_columns <- setdiff(names(geno_mat), c("CHROM", "POS", "REF", "ALT"))
geno_mat <- geno_mat[
  j = (sample_columns) := lapply(
    X = .SD,
    FUN = function(x) c("00" = 0L, "10" = 1L, "01" = 1L, "11" = 2L)[sub("/|\\|", "", x)]
  ),
  .SDcols = sample_columns
][order(CHROM, POS)]

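# flashpca() takes a numeric matrix with samples in rows and variants in columns,
# so convert the data.table to a matrix and transpose it.
pca_res <- flashpcaR::flashpca(
  X = t(as.matrix(geno_mat[j = .SD, .SDcols = !c("CHROM", "POS", "REF", "ALT")])),
  ndim = n_comp
)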
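
To make the genotype recoding step easier to check in isolation, here is a small base R sketch on toy data. `gt_to_dosage()` and the toy table are hypothetical and only illustrate the lookup used above. Missing calls ("./.") and any genotype outside the lookup, such as multi-allelic calls, come out as `NA`. As far as I can tell, flashpca does not accept missing values, so such variants would have to be dropped or imputed first.

```r
# Hypothetical helper: convert VCF GT strings to alternate-allele counts.
gt_to_dosage <- function(gt) {
  lookup <- c("00" = 0L, "01" = 1L, "10" = 1L, "11" = 2L)
  # Drop the allele separator: "/" (unphased) or "|" (phased)
  unname(lookup[sub("[/|]", "", gt)])
}

gt_to_dosage(c("0/0", "0|1", "1|0", "1/1", "./."))
# 0 1 1 2 NA

# Toy genotype table: two variants (rows) by two samples (columns)
toy <- data.frame(
  S1 = c("0/0", "0|1"),
  S2 = c("1/1", "./."),
  stringsAsFactors = FALSE
)

# Variants x samples integer matrix
dosage <- vapply(toy, gt_to_dosage, integer(nrow(toy)))

# Drop variants with any missing call, then put samples in rows
complete <- dosage[stats::complete.cases(dosage), , drop = FALSE]
sample_by_variant <- t(complete)
```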

