gschofl / biofiles

R interface to GenBank/GenPept files

License: Other

R 88.70% C++ 8.88% Gnuplot 2.42%

biofiles's Issues

GbFeatureTable Extension

I have updated annotation data for my GenBank file. Is it possible to create new feature objects and extend the list of features of the given GenBank file?

Create a GenBank file from a data frame in R

Hello, I have an R data frame like the following and I was wondering if it is possible to create a GenBank file out of each one of the entries.

This is an example data frame:

              Barcode     TRAV   TRAJ               TRA_CDR3                 TRA_Full
12 AACxxxxxxxxxxxAT-1   TRAV21 TRAJ41 TGTGCxxxxxxxxxxxxxxTTT ACTxxxxxxxxxxxxxxGGAA...
13 AACxxxxxxxxxxxTC-1   TRAV21 TRAJ41 TGTGCxxxxxxxxxxxxxxTTT GATxxxxxxxxxxxxxxACAA...
27 AAGxxxxxxxxxxxTA-1   TRAV21 TRAJ41 AAAGGxxxxxxxxxxxxxxTTT TGGxxxxxxxxxxxxxxGAGC...
30 AAGxxxxxxxxxxxTT-1   TRAV11 TRAJ31 TGTGCxxxxxxxxxxxxxxTTT ACTxxxxxxxxxxxxxxGGAA...
37 AATxxxxxxxxxxxGG-1   TRAV11 TRAJ31 GCTACxxxxxxxxxxxxxxTTT GTGxxxxxxxxxxxxxxCTCC...
39 AATxxxxxxxxxxxTC-1   TRAV11 TRAJ31 GTGAGxxxxxxxxxxxxxxTTC CCGxxxxxxxxxxxxxxCAGT...

My goal is to make a separate GenBank file for each Barcode, including the TRAV and TRAJ info (maybe in SOURCE or some other field), the CDR3 location in the FEATURES field, and finally TRA_Full as the ORIGIN sequence.

I was wondering whether biofiles would let me avoid doing this manually. Looking at the write.GenBank function, it accepts a gbRecord instance. It seems I can create such an instance with the gbRecord function, which accepts "A vector of paths to GenBank/Embl format records, an efetch object containing GenBank record(s), or a textConnection to a character vector that can be parsed as a Genbank or Embl record". I am hoping the latter option can serve my goal.

I think the question here is how to format my data frame as "a textConnection to a character vector that can be parsed as a Genbank or Embl record", so that I can eventually write it out as GenBank files. Is that even possible? I cannot find any example for this particular task.

Many thanks!
Daniel
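
One possible starting point, sketched in base R only: build the flat-file text for each row by hand and write it with writeLines. Everything here is an assumption rather than biofiles functionality: make_genbank_text is a hypothetical helper, the toy data frame is made up (only the column names come from the question), the CDR3 is stored as a misc_feature with a /note, and the LOCUS line spacing is approximate and carries no date. In principle the resulting character vector could also be handed to gbRecord through textConnection(txt), as the documentation quoted above suggests, but whether biofiles parses such a minimal record has not been verified here.

```r
# Hypothetical helper (not part of biofiles): build the lines of a minimal
# GenBank-style flat file from the fields of one data-frame row.
make_genbank_text <- function(barcode, trav, traj, cdr3, full_seq) {
  full_seq <- toupper(full_seq)
  cdr3 <- toupper(cdr3)
  n <- nchar(full_seq)
  stopifnot(n > 0, nchar(cdr3) > 0)

  # Locate the CDR3 within the full sequence (1-based); -1 means "not found".
  start <- regexpr(cdr3, full_seq, fixed = TRUE)[1]
  if (start < 0) stop("CDR3 not found in full sequence for ", barcode)
  end <- start + nchar(cdr3) - 1L

  # ORIGIN block: 60 bases per line in groups of 10, prefixed by the position.
  origin <- vapply(seq(1L, n, by = 60L), function(i) {
    chunk <- tolower(substr(full_seq, i, min(i + 59L, n)))
    from <- seq(1L, nchar(chunk), by = 10L)
    groups <- substring(chunk, from, pmin(from + 9L, nchar(chunk)))
    paste(formatC(i, width = 9), paste(groups, collapse = " "))
  }, character(1))

  pad <- strrep(" ", 21)  # qualifiers start in column 22
  c(
    # Assumption: approximate LOCUS spacing; strict parsers may need exact columns.
    sprintf("LOCUS       %-16s %d bp    DNA     linear   UNA", barcode, n),
    sprintf("DEFINITION  %s %s %s.", barcode, trav, traj),
    "FEATURES             Location/Qualifiers",
    sprintf("     %-16s1..%d", "source", n),
    sprintf("%s/note=\"%s %s\"", pad, trav, traj),
    sprintf("     %-16s%d..%d", "misc_feature", start, end),
    sprintf("%s/note=\"CDR3\"", pad),
    "ORIGIN",
    origin,
    "//"
  )
}

# Toy data (made up) using the column names from the data frame above.
df <- data.frame(
  Barcode = c("BC001-1", "BC002-1"),
  TRAV = c("TRAV21", "TRAV11"),
  TRAJ = c("TRAJ41", "TRAJ31"),
  TRA_CDR3 = c("TGTGCTTTT", "TGTGCAGGG"),
  TRA_Full = c("ACTGGTGTGCTTTTGGAA", "ACTTGTGCAGGGCCGGAA"),
  stringsAsFactors = FALSE
)

# Write one file per barcode into a temporary directory.
out_dir <- tempdir()
for (i in seq_len(nrow(df))) {
  txt <- make_genbank_text(df$Barcode[i], df$TRAV[i], df$TRAJ[i],
                           df$TRA_CDR3[i], df$TRA_Full[i])
  writeLines(txt, file.path(out_dir, paste0(df$Barcode[i], ".gb")))
}
```

The helper stops with an error when the CDR3 string does not occur in the full sequence, which seems safer than writing a feature with a made-up location.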

biofiles::gbRecord(gb_file) Error in { : task 1 failed - "'mc.cores' must be >= 1"

Hi,
I got an error. What should I do?

> gb_file <- reutils::efetch("CP000148", "nuccore", rettype = "gbwithparts", retmode = "text")
[2018-12-06 12:05:43] [error] handle_read_frame error: websocketpp.transport:7 (End of File)
> gb_file
Object of class ‘efetch’ 
LOCUS       CP000148             3997420 bp    DNA     circular BCT 28-JAN-2014
DEFINITION  Geobacter metallireducens GS-15, complete genome.
ACCESSION   CP000148 AAAS03000000 AAAS03000001 AAAS03000002 AAAS03000003
            AAAS03000004 AAAS03000005 AAAS03000006 AAAS03000007 AAAS03000008
            AAAS03000009 AAAS03000010 AAAS03000011
VERSION     CP000148.1
DBLINK      BioProject: PRJNA177
            BioSample: SAMN02598399
KEYWORDS    .
SOURCE      Geobacter metallireducens GS-15
  ORGANISM  Geobacter metallireducens GS-15
            Bacteria; Proteobacteria; Deltaproteobacteria; Desulfuromonadales;
...
EFetch query using the ‘nuccore’ database.
Query url: ‘https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?=efetch&db=nuccore&id=CP000148&retmode=text&rettype=...’
Retrieval type: ‘gbwithparts’, retrieval mode: ‘text’

> rec <- biofiles::gbRecord(gb_file)
Error in { : task 1 failed - "'mc.cores' must be >= 1"
> rec
Error: object 'rec' not found
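
A small diagnostic sketch that may help narrow this down. The message text matches the one parallel::mclapply raises when it is given a core count below 1, so it can be reproduced outside biofiles. How biofiles actually chooses its core count is not known here; that it relies on detectCores() or on the mc.cores option is only an assumption.

```r
library(parallel)

# How many cores does R detect, and is a core count set through options?
print(detectCores())
print(getOption("mc.cores"))

# Reproduce the message outside biofiles: mclapply() rejects a core count below 1.
msg <- tryCatch(
  mclapply(1:2, identity, mc.cores = 0),
  error = function(e) conditionMessage(e)
)
print(msg)
```

If detectCores() reports NA or the option holds a value below 1 on the affected machine, that would be worth including in the report.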

Error in strsplit(seqs, "(?< = .{10})(? = .)", perl = TRUE)

I'm having this error when I try to write a genbank file:

write.GenBank(gb, 'genome1_g1.gbk')
Error in strsplit(seqs, "(?< = .{10})(? = .)", perl = TRUE) : 
  invalid split pattern '(?< = .{10})(? = .)' 
In addition: Warning message:
In strsplit(seqs, "(?< = .{10})(? = .)", perl = TRUE) :
  PCRE pattern compilation error 
	'unrecognized character after (?<'
	a ' = .{10})(? = .)'
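
For comparison, a minimal sketch in plain R showing that the pattern compiles once the spaces inside the lookbehind and lookahead are removed. That the spaces got into the pattern unintentionally (for instance through an automatic code formatter) is an assumption; the toy sequence is made up.

```r
seqs <- "acgtacgtacgtacgtacgtacgta"  # toy 25-character sequence

# Pattern as it appears in the error message (with spaces): PCRE rejects it.
bad <- tryCatch(
  strsplit(seqs, "(?< = .{10})(? = .)", perl = TRUE),
  error = function(e) "failed",
  warning = function(w) "failed"
)
print(bad)

# Same pattern without spaces: a zero-width split after every 10 characters,
# as long as at least one character follows.
good <- strsplit(seqs, "(?<=.{10})(?=.)", perl = TRUE)[[1]]
print(good)
```

With the spaces removed, the 25-character string is cut into blocks of 10, 10 and 5 characters, which is the grouping used in the ORIGIN section of a GenBank file.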

filter 16S rRNA sequence from GenBank file

Dear biofiles developers / users,

I am trying to extract the 16S operon from a collection of GenBank flat files using R.

I am able to load the file successfully:

gbk <- biofiles::gbRecord("~/Projects/ETH/benoit/Bacteroides/200824/GCF_903181445.1_NZ_Bacteroides_fragilis_8E3_BL_hyb_genomic.gbff", 
                          progress = TRUE)

But when it comes to using the biofiles::filter function, I can't make it work:
biofiles::filter(gbk, Feature = "rRNA", product = "16S ribosomal RNA") -> rRNA

biofiles::summary(rRNA)

[[NZ_CAEUHN010000001]]
1857785 bp: Bacteroides fragilis isolate NZ_Bacteroides_fragilis_8E3_BL_hyb, whole genome shotgun sequence.
Error in if (is.atomic(x) || len == 1L && length(nm) <= 1L) { :
missing value where TRUE/FALSE needed

biofiles::getSequence(rRNA)

A DNAStringSet instance of length 24
       width seq                                                                                                                          names
 [1] 1857785 AAGTTCTGATAGAACTTAGAAGAGAATGCTCTTTTTACTATTGATTTTAATACTTTTCTCT...AATGACCGTCAATAAATTTTCGACATCCTGAACAGAGCTAATATTGTCCCTTATTGGGAT NZ_CAEUHN010000001
 [2]   47339 GTCCGTTTTACCACTATAAATAGTTTCGGAAATACTTACGGTTTGAATGAGAAAAGATGTC...GAACGGATAATAAATTGGATATATTCATTTGTTTTCCAAATAGTTACTACTAAAATGCCT NZ_CAEUHN010000002
 [3]   44645 AGAAACATTGATTATCAATGTTCTACAGGATAAACACCCAACTTTACGTCCAAACTGTAAA...GATACCTATTATTGGGCTACAAACATATACGTTATGTAGAATTTATAGAAAAAATAGGGG NZ_CAEUHN010000003
 [4]   43518 ACATTCGTTCGTTCTCAACTTCTAAAAATGTTTCGTAAAATTTGACGGTTTGAAAGAGAAA...CAAAGCTTTTGAAGATAGAAATCATACTTTTTAAAGGTATTGATTTTCAGAAGGTTTATC NZ_CAEUHN010000004
 [5]    5021 AATTATTGGATACAATTTCCAGAAAGAATAATTAGTTTGCTATTGGAAGATAAATATAAAA...GTACCGACCTCGACAACACTTGCTCCGATGACTGTTTCACCGGTAGCATCCGTAACCACA NZ_CAEUHN010000005
 ...     ... ...
[20]  128200 TAATATTAAAGTGATATTAAAGACTGACTCTAAGGCACTTGATGAAGTGGTAGTAGTAGCT...TGTGTCAAAACGTTGGCACAACCTCCTTTTATATTTTACAGTTCTGCTATATTTTCTTTT NZ_CAEUHN010000020
[21]  109732 AAGTTCTGATAGGACTTGGCATTTCTGCCGGCCTACTCTCTCCGAACCATGTGTTCGCTAC...GCTGTTACTACGACTTCGTCAAGTGCCTTAGCATCAGTCTTTAATATCACTTTGATTATC NZ_CAEUHN010000021
[22]   88286 TCGCCTACCGTCCCGATAGAACTTAGTAAACAGTTTTAAAAACACATATAAACATCTTTAT...GGGAGAATCTTGAAGTGTAAGGATCTTGTTATTAGTTATTTATCTTAAGATATAGGTGTC NZ_CAEUHN010000022
[23]   80205 GTTTGTGTAACTATTGTATCCAACAGTAGCTGCTACCGTAAAGTCACCCTTTTTAGGAGTA...ATGATTGTTTGTCAGAGAAGGACTGCCAAAATGACTGATGACATTGGAAAAACAGGCGCT NZ_CAEUHN010000023
[24]   59327 AGATTCATTTATTTTATCTATATTTGCAAATGAGTATTTATATAAACGTTTAAAGCAAATA...CCTCTTACAGAGGCGTTATACGATAACGTAATAATAAAGATTAGGAAAAGACTTTTCTCA NZ_CAEUHN010000024

The full contigs are exported.
Any idea where I am wrong?

Thanks a ton

Removed from CRAN?

You probably are aware of this, but biofiles has been removed from CRAN. I was still able to install it using devtools::install_github, but I hope you'll fix the issues and put it back on CRAN! I found it to work better than genbankr for the GB files I had to work with.

Thanks!

Consider this for bioconductor

The same goes for your reutils package. I was about to start working on this myself, since this functionality was missing in Bioconductor, but then I saw your repos...

objects not exported by some bioconductor packages

After building the package with RStudio I get the following error:

installing to Z:/Documents/R/win-library/biofiles/libs/x64
** R
** inst
** preparing package for lazy loading
Error : object 'unlist' is not exported by 'namespace:Biostrings'
ERROR: lazy loading failed for package 'biofiles'

Checking the NAMESPACE file for the recently installed Biostrings packages reveals that unlist is exported but as a method. Testing the package (devtools::test()) produces three additional warnings:

Warning messages:
1: object 'unlist' is not exported by 'namespace:Biostrings' 
2: object 'queryHits' is not exported by 'namespace:IRanges' 
3: object 'split' is not exported by 'namespace:IRanges' 
4: object 'subjectHits' is not exported by 'namespace:IRanges'

I'm using R 3.3.0 (64-bit) and Bioconductor 3.3 with Biostrings 2.40.2. How can I solve this problem?
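
A quick way to check the observation about unlist, sketched in base R: list a package's plain exports and test whether the symbols named in the messages are among them. exported_as_object is a hypothetical helper, not part of any package mentioned here. A symbol that is only available as an S4 method would be expected to show FALSE, and such methods are normally imported with an importMethodsFrom() directive in NAMESPACE rather than importFrom(); whether that change alone resolves the build error is an assumption.

```r
# Hypothetical helper: report which symbols a package exports as plain objects.
exported_as_object <- function(symbols, pkg) {
  stopifnot(requireNamespace(pkg, quietly = TRUE))
  setNames(symbols %in% getNamespaceExports(pkg), symbols)
}

# Works for any installed package, shown here with 'stats':
res <- exported_as_object(c("median", "not_a_real_function"), "stats")
print(res)

# For the report above one might run (requires the Bioconductor packages):
# exported_as_object("unlist", "Biostrings")
# exported_as_object(c("queryHits", "subjectHits", "split"), "IRanges")
```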

PS: Installing boost_regex was a nightmare.

Installation on Ubuntu 14.04

Error upon install in Ubuntu 14.04, R 3.3.1, Bioconductor 3.3:

Error in dyn.load(file, DLLpath = DLLpath, ...) : 
  unable to load shared object '/home/rc16041/Software/Rpackages/biofiles/libs/biofiles.so':
  /home/rc16041/Software/Rpackages/biofiles/libs/biofiles.so: undefined symbol: _ZNSt20regex_token_iteratorIN9__gnu_cxx17__normal_iteratorIPKcSsEEcSt12regex_traitsIcEEneERKS7

REFERENCE field

Files without a REFERENCE field raise:
Error in { : task 1 failed - "'to' cannot be NA, NaN or infinite"
Is it a required field?
