
biofiles's People

Contributors

gschofl, kablag

biofiles's Issues

GbFeatureTable Extension

I have updated annotation data for my GenBank file. Is it possible to create new feature objects and extend the list of features of the given GenBank file?

Create a GenBank file from a data frame in R

Hello, I have an R data frame like the following and I was wondering if it is possible to create a GenBank file out of each one of the entries.

This is an example data frame:

              Barcode     TRAV   TRAJ               TRA_CDR3                 TRA_Full
12 AACxxxxxxxxxxxAT-1   TRAV21 TRAJ41 TGTGCxxxxxxxxxxxxxxTTT ACTxxxxxxxxxxxxxxGGAA...
13 AACxxxxxxxxxxxTC-1   TRAV21 TRAJ41 TGTGCxxxxxxxxxxxxxxTTT GATxxxxxxxxxxxxxxACAA...
27 AAGxxxxxxxxxxxTA-1   TRAV21 TRAJ41 AAAGGxxxxxxxxxxxxxxTTT TGGxxxxxxxxxxxxxxGAGC...
30 AAGxxxxxxxxxxxTT-1   TRAV11 TRAJ31 TGTGCxxxxxxxxxxxxxxTTT ACTxxxxxxxxxxxxxxGGAA...
37 AATxxxxxxxxxxxGG-1   TRAV11 TRAJ31 GCTACxxxxxxxxxxxxxxTTT GTGxxxxxxxxxxxxxxCTCC...
39 AATxxxxxxxxxxxTC-1   TRAV11 TRAJ31 GTGAGxxxxxxxxxxxxxxTTC CCGxxxxxxxxxxxxxxCAGT...

My goal is to make a different GenBank file for each Barcode, including the TRAV and TRAJ info (maybe in SOURCE or some other field), the CDR3 location in the FEATURES field, and finally TRA_Full as the ORIGIN sequence.

I was wondering whether biofiles would save me from doing this manually. Looking at the write.GenBank function, it accepts a gbRecord instance. It seems I can create such an instance with the gbRecord function, which accepts "A vector of paths to GenBank/Embl format records, an efetch object containing GenBank record(s), or a textConnection to a character vector that can be parsed as a Genbank or Embl record". I am hoping the latter option can serve my goal.

I think the question here is how to format my data frame as a textConnection to a character vector that can be parsed as a GenBank or Embl record, so I can eventually write it out as GenBank files. Is that even possible? I cannot find any example for this particular task.

Many thanks!
Daniel
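A minimal sketch of one possible route that skips gbRecord entirely: build each record as plain text in base R and write it with writeLines. Everything here is an illustrative assumption, not a biofiles API: the helpers make_genbank_text and format_origin are hypothetical, TRAV/TRAJ are placed in a /note qualifier on the source feature, CDR3 becomes a misc_feature located by exact string match inside TRA_Full, and the LOCUS values "DNA", "linear" and division "UNA" are placeholders. Strict parsers expect locus names of at most 16 characters, so long barcodes may need shortening. Whether biofiles::gbRecord() reads these records back has not been checked; an issue further down this page reports errors for files without a REFERENCE field, and these records have none.

```r
# Hypothetical helper (not part of biofiles): format a sequence as GenBank
# ORIGIN lines, 60 bases per line in blocks of 10, each line prefixed by the
# 1-based position of its first base.
format_origin <- function(x) {
  x <- tolower(x)
  n <- nchar(x)
  stopifnot(n > 0)
  starts <- seq.int(1L, n, by = 60L)
  vapply(starts, function(s) {
    line <- substr(x, s, min(s + 59L, n))
    g <- seq.int(1L, nchar(line), by = 10L)
    blocks <- substring(line, g, pmin(g + 9L, nchar(line)))
    sprintf("%9d %s", s, paste(blocks, collapse = " "))
  }, character(1))
}

# Hypothetical helper (not part of biofiles): return the lines of one minimal
# GenBank record. TRAV/TRAJ go into a /note on the source feature and CDR3
# becomes a misc_feature; "DNA", "linear" and division "UNA" are placeholders.
make_genbank_text <- function(locus, trav, traj, cdr3, full) {
  locus <- as.character(locus)
  trav <- as.character(trav)
  traj <- as.character(traj)
  cdr3 <- toupper(as.character(cdr3))
  full <- toupper(as.character(full))
  n <- nchar(full)
  d <- Sys.Date()
  # DD-MON-YYYY with English month abbreviations regardless of locale
  date <- sprintf("%s-%s-%s", format(d, "%d"),
                  toupper(month.abb[as.integer(format(d, "%m"))]),
                  format(d, "%Y"))
  # Feature keys start in column 6, locations and qualifiers in column 22
  feat <- function(key, loc) sprintf("     %-16s%s", key, loc)
  qual <- function(q) paste0(strrep(" ", 21), q)
  out <- c(
    sprintf("LOCUS       %-16s %11d bp    DNA     linear   UNA %s", locus, n, date),
    sprintf("DEFINITION  %s %s/%s.", locus, trav, traj),
    sprintf("ACCESSION   %s", locus),
    "KEYWORDS    .",
    "FEATURES             Location/Qualifiers",
    feat("source", sprintf("1..%d", n)),
    qual(sprintf("/note=\"%s; %s\"", trav, traj))
  )
  # Exact (fixed) match of CDR3 in the full sequence; no feature if absent
  pos <- as.integer(regexpr(cdr3, full, fixed = TRUE))
  if (pos > 0) {
    out <- c(out,
             feat("misc_feature", sprintf("%d..%d", pos, pos + nchar(cdr3) - 1L)),
             qual("/note=\"CDR3\""))
  }
  c(out, "ORIGIN", format_origin(full), "//")
}
```

Applied row by row to the data frame from the question (assumed here to be called df, with the column names shown above), writing one file per barcode:

```r
for (i in seq_len(nrow(df))) {
  rec <- make_genbank_text(df$Barcode[i], df$TRAV[i], df$TRAJ[i],
                           df$TRA_CDR3[i], df$TRA_Full[i])
  writeLines(rec, paste0(df$Barcode[i], ".gbk"))
}
```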

biofiles::gbRecord(gb_file) Error in { : task 1 failed - "'mc.cores' must be >= 1"

Hi,
I got an error. What should I do?

> gb_file <- reutils::efetch("CP000148", "nuccore", rettype = "gbwithparts", retmode = "text")
[2018-12-06 12:05:43] [error] handle_read_frame error: websocketpp.transport:7 (End of File)
> gb_file
Object of class ‘efetch’ 
LOCUS       CP000148             3997420 bp    DNA     circular BCT 28-JAN-2014
DEFINITION  Geobacter metallireducens GS-15, complete genome.
ACCESSION   CP000148 AAAS03000000 AAAS03000001 AAAS03000002 AAAS03000003
            AAAS03000004 AAAS03000005 AAAS03000006 AAAS03000007 AAAS03000008
            AAAS03000009 AAAS03000010 AAAS03000011
VERSION     CP000148.1
DBLINK      BioProject: PRJNA177
            BioSample: SAMN02598399
KEYWORDS    .
SOURCE      Geobacter metallireducens GS-15
  ORGANISM  Geobacter metallireducens GS-15
            Bacteria; Proteobacteria; Deltaproteobacteria; Desulfuromonadales;
...
EFetch query using the ‘nuccore’ database.
Query url: ‘https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?=efetch&db=nuccore&id=CP000148&retmode=text&rettype=...’
Retrieval type: ‘gbwithparts’, retrieval mode: ‘text’

> rec <- biofiles::gbRecord(gb_file)
Error in { : task 1 failed - "'mc.cores' must be >= 1"
> rec
Error: object 'rec' not found

Error in strsplit(seqs, "(?< = .{10})(? = .)", perl = TRUE)

I'm having this error when I try to write a genbank file:

write.GenBank(gb, 'genome1_g1.gbk')
Error in strsplit(seqs, "(?< = .{10})(? = .)", perl = TRUE) : 
  invalid split pattern '(?< = .{10})(? = .)' 
In addition: Warning message:
In strsplit(seqs, "(?< = .{10})(? = .)", perl = TRUE) :
  PCRE pattern compilation error 
	'unrecognized character after (?<'
	a ' = .{10})(? = .)'
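The pattern in the error output has spaces inside the group openers (`(?< =` and `(? =`). PCRE only recognizes `(?<=` and `(?=` written without spaces, which is why it reports an unrecognized character after `(?<`. A minimal base-R check of the space-free pattern, assuming the intent is to cut a sequence into 10-character blocks as in the ORIGIN section of a GenBank file (the input string is made up):

```r
# Made-up 22-base input sequence
seqs <- "ACGTACGTACGTACGTACGTAC"

# (?<=.{10}) matches a position preceded by 10 characters and (?=.)
# requires at least one character to follow; perl = TRUE is needed
# for lookaround. strsplit() restarts the search on the remaining text
# after each piece, so the string is cut every 10 characters.
blocks <- strsplit(seqs, "(?<=.{10})(?=.)", perl = TRUE)[[1]]
print(blocks)
```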

Filter 16S rRNA sequence from GenBank file

Dear biofiles developers / users,

I am trying to extract the 16S operon from a collection of GenBank flat files using R.

I am able to load the file successfully:

gbk <- biofiles::gbRecord("~/Projects/ETH/benoit/Bacteroides/200824/GCF_903181445.1_NZ_Bacteroides_fragilis_8E3_BL_hyb_genomic.gbff", 
                          progress = TRUE)

But when it comes to using the biofiles::filter function, I can't make it work:
biofiles::filter(gbk, Feature = "rRNA", product = "16S ribosomal RNA") -> rRNA

biofiles::summary(rRNA)

[[NZ_CAEUHN010000001]]
1857785 bp: Bacteroides fragilis isolate NZ_Bacteroides_fragilis_8E3_BL_hyb, whole genome shotgun sequence.
Error in if (is.atomic(x) || len == 1L && length(nm) <= 1L) { :
missing value where TRUE/FALSE needed

biofiles::getSequence(rRNA)

A DNAStringSet instance of length 24
width seq names
[1] 1857785 AAGTTCTGATAGAACTTAGAAGAGAATGCTCTTTTTACTATTGATTTTAATACTTTTCTCT...AATGACCGTCAATAAATTTTCGACATCCTGAACAGAGCTAATATTGTCCCTTATTGGGAT NZ_CAEUHN010000001
[2] 47339 GTCCGTTTTACCACTATAAATAGTTTCGGAAATACTTACGGTTTGAATGAGAAAAGATGTC...GAACGGATAATAAATTGGATATATTCATTTGTTTTCCAAATAGTTACTACTAAAATGCCT NZ_CAEUHN010000002
[3] 44645 AGAAACATTGATTATCAATGTTCTACAGGATAAACACCCAACTTTACGTCCAAACTGTAAA...GATACCTATTATTGGGCTACAAACATATACGTTATGTAGAATTTATAGAAAAAATAGGGG NZ_CAEUHN010000003
[4] 43518 ACATTCGTTCGTTCTCAACTTCTAAAAATGTTTCGTAAAATTTGACGGTTTGAAAGAGAAA...CAAAGCTTTTGAAGATAGAAATCATACTTTTTAAAGGTATTGATTTTCAGAAGGTTTATC NZ_CAEUHN010000004
[5] 5021 AATTATTGGATACAATTTCCAGAAAGAATAATTAGTTTGCTATTGGAAGATAAATATAAAA...GTACCGACCTCGACAACACTTGCTCCGATGACTGTTTCACCGGTAGCATCCGTAACCACA NZ_CAEUHN010000005
... ... ...
[20] 128200 TAATATTAAAGTGATATTAAAGACTGACTCTAAGGCACTTGATGAAGTGGTAGTAGTAGCT...TGTGTCAAAACGTTGGCACAACCTCCTTTTATATTTTACAGTTCTGCTATATTTTCTTTT NZ_CAEUHN010000020
[21] 109732 AAGTTCTGATAGGACTTGGCATTTCTGCCGGCCTACTCTCTCCGAACCATGTGTTCGCTAC...GCTGTTACTACGACTTCGTCAAGTGCCTTAGCATCAGTCTTTAATATCACTTTGATTATC NZ_CAEUHN010000021
[22] 88286 TCGCCTACCGTCCCGATAGAACTTAGTAAACAGTTTTAAAAACACATATAAACATCTTTAT...GGGAGAATCTTGAAGTGTAAGGATCTTGTTATTAGTTATTTATCTTAAGATATAGGTGTC NZ_CAEUHN010000022
[23] 80205 GTTTGTGTAACTATTGTATCCAACAGTAGCTGCTACCGTAAAGTCACCCTTTTTAGGAGTA...ATGATTGTTTGTCAGAGAAGGACTGCCAAAATGACTGATGACATTGGAAAAACAGGCGCT NZ_CAEUHN010000023
[24] 59327 AGATTCATTTATTTTATCTATATTTGCAAATGAGTATTTATATAAACGTTTAAAGCAAATA...CCTCTTACAGAGGCGTTATACGATAACGTAATAATAAAGATTAGGAAAAGACTTTTCTCA NZ_CAEUHN010000024

The full contigs are exported, not just the 16S rRNA features.
Any idea where I am going wrong?

Thanks a ton

Removed from CRAN?

You probably are aware of this, but biofiles has been removed from CRAN. I was still able to install it using devtools::install_github, but I hope you'll fix the issues and put it back on CRAN! I found it to work better than genbankr for the GB files I had to work with.

Thanks!

Consider this for bioconductor

That goes for your reutils package as well. I was about to start working on this myself, since this functionality was missing in Bioconductor, but then I saw your repos...

objects not exported by some bioconductor packages

After building the package with RStudio I get the following error:

installing to Z:/Documents/R/win-library/biofiles/libs/x64
** R
** inst
** preparing package for lazy loading
Error : object 'unlist' is not exported by 'namespace:Biostrings'
ERROR: lazy loading failed for package 'biofiles'

Checking the NAMESPACE file of the recently installed Biostrings package reveals that unlist is exported, but as a method. Testing the package (devtools::test()) produces three additional warnings:

Warning messages:
1: object 'unlist' is not exported by 'namespace:Biostrings' 
2: object 'queryHits' is not exported by 'namespace:IRanges' 
3: object 'split' is not exported by 'namespace:IRanges' 
4: object 'subjectHits' is not exported by 'namespace:IRanges'

I'm using R 3.3.0 64-bit and Bioconductor 3.3 with Biostrings 2.40.2. How can I solve this problem?

PS: Installing boost_regex was a nightmare.

Installation on Ubuntu 14.04

Error upon installation on Ubuntu 14.04, R 3.3.1, Bioconductor 3.3:

Error in dyn.load(file, DLLpath = DLLpath, ...) : 
  unable to load shared object '/home/rc16041/Software/Rpackages/biofiles/libs/biofiles.so':
  /home/rc16041/Software/Rpackages/biofiles/libs/biofiles.so: undefined symbol: _ZNSt20regex_token_iteratorIN9__gnu_cxx17__normal_iteratorIPKcSsEEcSt12regex_traitsIcEEneERKS7

REFERENCE field

Files without a REFERENCE field raise
Error in { : task 1 failed - "'to' cannot be NA, NaN or infinite"
Is it a required field?
