yodeng / fsplit

Splits per-sample fastq files out of mixed BCL or fastq data by barcode/index.

License: MIT License

Python 60.11% Shell 0.09% Go 39.79%
barcode bcl2fastq split

fsplit's Introduction

fsplit

fsplit is a tool that splits per-sample data out of mixed BCL or fastq data according to barcode information.

Requirements

  • python >=2.7.10, <=3.11
  • bcl2fastq
  • linux

Installation

pip install git+https://github.com/yodeng/fsplit.git

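Usage

fsplit can split both fastq and BCL data.

Splitting fastq data

Mixed fastq data that contains barcodes can be split into per-sample data according to the barcode information.

An index file for the fastq data must be created in advance to speed up the run.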

1) fsplit index

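Versions up to and including 1.0.3 use an index-based, multi-process implementation; versions 1.0.4 and later no longer need an index.

Creates a fai index file for a fastq file, writing test.fastq.fai. gzip-compressed input is detected automatically.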

fsplit index -i test.fastq.gz

The index can also be built with samtools; fsplit is compatible with the index format written by samtools fqidx.
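
As an illustration of the index layout mentioned above, the sketch below builds the six columns that samtools fqidx writes for each record (name, length, sequence offset, bases per line, bytes per line, quality offset). It is a minimal sketch, not fsplit's own indexer: `build_fqidx` is a hypothetical helper, it assumes strict 4-line records, and it reads uncompressed bytes only.

```python
import io


def build_fqidx(handle):
    """Yield (name, length, offset, linebases, linewidth, qualoffset) per record.

    Hypothetical helper: assumes 4-line FASTQ records read from a binary handle.
    """
    pos = 0  # byte offset of the current record's header line
    while True:
        header = handle.readline()
        if not header:
            break
        seq = handle.readline()
        plus = handle.readline()
        qual = handle.readline()
        name = header[1:].split()[0].decode()      # drop '@' and any description
        seq_offset = pos + len(header)             # first base of the sequence
        bases = len(seq.rstrip(b"\r\n"))
        qual_offset = seq_offset + len(seq) + len(plus)
        yield (name, bases, seq_offset, bases, len(seq), qual_offset)
        pos = qual_offset + len(qual)


data = b"@r1 desc\nACGT\n+\nIIII\n@r2\nGGCCA\n+\nIIIII\n"
index = list(build_fqidx(io.BytesIO(data)))
for row in index:
    print("\t".join(str(col) for col in row))
```

With the offsets in hand, a reader can seek straight to a record instead of scanning the file from the start, which is what makes an index useful for parallel workers.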

2) fsplit split

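Splits the fastq data belonging to each sample out of the fastq file according to the barcode sequences. If the fastq index file does not exist, it is created first and the split step runs afterwards.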

Versions 1.0.4 and later no longer need an index; the fastq file is read and processed directly.
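
Reading a fastq file directly, without an index, amounts to streaming it four lines at a time. The following is a rough sketch of that idea only. `open_maybe_gzip` and `read_fastq` are hypothetical names, and the sketch decompresses in-process with Python's gzip module for simplicity, whereas the changelog describes decompression in a child process.

```python
import gzip
import os
import tempfile


def open_maybe_gzip(path):
    """Open a FASTQ file as text, detecting gzip by its two magic bytes."""
    with open(path, "rb") as fh:
        magic = fh.read(2)
    if magic == b"\x1f\x8b":
        return gzip.open(path, "rt")
    return open(path, "rt")


def read_fastq(handle):
    """Yield (name, seq, qual) tuples, assuming 4-line FASTQ records."""
    while True:
        header = handle.readline()
        if not header:
            return
        seq = handle.readline().rstrip("\n")
        handle.readline()  # skip the '+' separator line
        qual = handle.readline().rstrip("\n")
        yield header[1:].rstrip("\n"), seq, qual


# Demo on a throwaway gzip file.
tmp = tempfile.NamedTemporaryFile(suffix=".fastq.gz", delete=False)
tmp.close()
with gzip.open(tmp.name, "wt") as out:
    out.write("@r1\nACGT\n+\nIIII\n@r2\nTTGG\n+\nJJJJ\n")
with open_maybe_gzip(tmp.name) as fh:
    records = list(read_fastq(fh))
os.remove(tmp.name)
print(records)
```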

Run fsplit split --help to see the help:

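Option            Description
-i/--input        Input fastq file
-I/--Input        Input paired fastq file (read 2)
-b/--barcode      Barcode file with two or three columns: sample name, barcode1 sequence, and barcode2 sequence
-m/--mismatch     Number of mismatched bases allowed when matching barcodes; default 0 (no mismatches)
-o/--output       Output directory; created automatically if it does not exist
-d/--drup         Remove the barcode sequence from the output reads; by default it is kept
-rc1/--rc-bc1     Search with the reverse complement of barcode1
-rc2/--rc-bc2     Search with the reverse complement of barcode2
--output-gzip     Write gzip-compressed fastq output; this uses the Python zlib interface and slows the run down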
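
To make the barcode file layout and the mismatch, reverse-complement, and barcode-removal options concrete, here is a small sketch. It is not fsplit's implementation: all function names are hypothetical, only barcode1 is matched, and it assumes the barcode sits at the start of the read sequence, which the README does not state.

```python
COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def parse_barcodes(text):
    """Parse 'sample barcode1 [barcode2]' lines into {sample: (bc1, bc2 or None)}."""
    table = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 2:
            continue  # skip blank or incomplete lines
        bc2 = fields[2] if len(fields) > 2 else None
        table[fields[0]] = (fields[1], bc2)
    return table


def mismatches(a, b):
    """Count differing positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def assign(seq, barcodes, max_mismatch=0, drop=False):
    """Return (sample, sequence) for the first barcode1 matching the read start.

    Assumption: the barcode is the read's prefix. Returns (None, seq) if no match.
    """
    for sample, (bc1, _bc2) in barcodes.items():
        prefix = seq[:len(bc1)]
        if len(prefix) == len(bc1) and mismatches(prefix, bc1) <= max_mismatch:
            return sample, (seq[len(bc1):] if drop else seq)
    return None, seq


barcodes = parse_barcodes("sampleA ACGTAC\nsampleB TTGCAA\n")
print(assign("ACGTACGGGG", barcodes))
print(assign("ACGAACGGGG", barcodes, max_mismatch=1, drop=True))
```

Taking the first match keeps the sketch short; with mismatches allowed, two barcodes can both fall within the threshold, so a real splitter would need a rule for ambiguous reads.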

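Splitting BCL data

Raw BCL sequencing data from the flowcell can also be split. fsplit wraps bcl2fastq and splits the data into per-sample fastq files according to barcode information; both single-index and dual-index splitting are supported.

Options

Use the fsplit bcl2fq command to split BCL data. The relevant options are: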

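Option              Description
-i/--input          Input BCL flowcell directory
-s/--sample         Sample sheet file with two or three whitespace-separated columns: sample name, index1 (i7) sequence, and index2 (i5) sequence
-m/--mismatch       Number of mismatched bases allowed when matching barcodes; default 1 (one mismatch allowed)
-t/--threads        Number of CPU cores to use
-o/--output         Output directory; created automatically if it does not exist
-rc1/--rc-index1    Reverse-complement the index1 (i7) sequences
-rc2/--rc-index2    Reverse-complement the index2 (i5) sequences
--bcl2fq            Path to the bcl2fastq executable; if not given, it is looked up in $PATH or sys.prefix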
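
The wrapping described above can be pictured roughly as follows. This is a sketch under assumptions, not the code fsplit runs: the helper names are hypothetical, the sample sheet is reduced to a bare [Data] section, the executable is assumed to live under sys.prefix/bin, and the mapping of -t onto bcl2fastq's --processing-threads is a guess. The bcl2fastq flags themselves (--runfolder-dir, --sample-sheet, --output-dir, --barcode-mismatches, --processing-threads) are standard bcl2fastq2 options.

```python
import os
import shutil
import sys

COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq):
    return seq.translate(COMPLEMENT)[::-1]


def find_bcl2fastq(explicit=None):
    """Resolve the executable: explicit path, then $PATH, then sys.prefix/bin."""
    if explicit:
        return explicit
    found = shutil.which("bcl2fastq")
    if found:
        return found
    candidate = os.path.join(sys.prefix, "bin", "bcl2fastq")
    return candidate if os.path.exists(candidate) else None


def sample_sheet_csv(text, rc1=False, rc2=False):
    """Turn 'sample index1 [index2]' lines into a minimal [Data] sample sheet."""
    rows = [line.split() for line in text.splitlines() if line.strip()]
    dual = any(len(r) > 2 for r in rows)
    lines = ["[Data]", "Sample_ID,index" + (",index2" if dual else "")]
    for r in rows:
        cols = [r[0], reverse_complement(r[1]) if rc1 else r[1]]
        if dual:
            idx2 = r[2] if len(r) > 2 else ""
            cols.append(reverse_complement(idx2) if rc2 else idx2)
        lines.append(",".join(cols))
    return "\n".join(lines) + "\n"


def bcl2fastq_command(exe, flowcell, sheet, outdir, mismatch=1, threads=4):
    """Build (but do not run) a bcl2fastq argument list."""
    return [exe,
            "--runfolder-dir", flowcell,
            "--sample-sheet", sheet,
            "--output-dir", outdir,
            "--barcode-mismatches", str(mismatch),
            "--processing-threads", str(threads)]


sheet = sample_sheet_csv("s1 ATCACG TTAGGC\ns2 CGATGT AGTCAA\n", rc2=True)
cmd = bcl2fastq_command("bcl2fastq", "flowcell_dir", "SampleSheet.csv", "out", threads=8)
print(sheet)
print(" ".join(cmd))
```

The command list could be handed to subprocess.run once the sheet has been written to disk; the sketch stops short of that so it runs without bcl2fastq installed.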

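Changelog

version 1.0.0

  • Designed multi-process concurrent reading and execution
  • Supports fastq data splitting only
  • Requires a fastq index

version 1.0.1

  • Added run-time recording
  • Optimized the shared process queue; output is processed in batches

version 1.0.2

  • Added single-index splitting of BCL data
  • Optimized fastq index reading

version 1.0.3

  • Added dual-index splitting of BCL data
  • Added logging output to the screen
  • Optimized the fastq index step with a sparse index, which shrinks the index file and speeds up reading
  • Replaced the shared process queue with a mutex lock

version 1.0.4

  • Reading is single-threaded and decompression runs in a child process; processed sequences are written straight to file. The indexing step, multi-process handling, and the file mutex lock were removed
  • Added gsplit, a Go implementation of the split step

version 1.0.5

  • Added the bcl2fq subcommand, which wraps bcl2fastq to split BCL data

version 1.0.6

  • The split subcommand now supports paired-end fastq input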

fsplit's People

Contributors

yodeng

fsplit's Issues

AttributeError: 'Namespace' object has no attribute 'log'

It seems like specifying the log parameter is necessary. How should this parameter be specified?

$ /SoftWare/Python-3.7.3/bin/fsplit split -i Undetermined_R1.fq.gz -I Undetermined_R2.fq.gz -b barcode.txt -m 1 -o ./
Traceback (most recent call last):
  File "/SoftWare/Python-3.7.3/bin/fsplit", line 8, in <module>
    sys.exit(main())
  File "/SoftWare/Python-3.7.3/lib/python3.7/site-packages/fsplit/utils.py", line 209, in wrapper
    value = func(*args, **kwargs)
  File "/SoftWare/Python-3.7.3/lib/python3.7/site-packages/fsplit/main.py", line 11, in main
    logs = log(logfile=args.log)
AttributeError: 'Namespace' object has no attribute 'log'

$ /SoftWare/Python-3.7.3/bin/fsplit split -i Undetermined_R1.fq.gz -I Undetermined_R2.fq.gz -b barcode.txt -m 1 -o ./ --log=log.txt
usage: fsplit [-h] [-v] command ...
fsplit: error: unrecognized arguments: --log=log.txt
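
One plausible reading of the two outputs above, not checked against the fsplit source, is that main() reads args.log while the split subparser never defines a --log option. The sketch below reproduces that pattern with a hypothetical argparse layout (the subcommand names are borrowed from the README, the rest is invented for illustration) and shows getattr with a default as a defensive read.

```python
import argparse


def build_parser():
    """Hypothetical layout: --log exists on one subcommand but not on 'split'."""
    parser = argparse.ArgumentParser(prog="demo")
    sub = parser.add_subparsers(dest="command")
    split = sub.add_parser("split")
    split.add_argument("-i", "--input")
    other = sub.add_parser("bcl2fq")
    other.add_argument("--log")
    return parser


args = build_parser().parse_args(["split", "-i", "reads.fq.gz"])

# Options that belong to a subparser that was not invoked never reach the
# namespace, so a direct args.log access raises AttributeError here.
has_log = hasattr(args, "log")

# Defensive read: fall back to None when the option is absent.
logfile = getattr(args, "log", None)
print(has_log, logfile)
```

If this reading is right, passing --log on the command line cannot help, because the parser for split rejects the option before main() ever runs.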
