GithubHelp home page GithubHelp logo

Comments (3)

aradahir avatar aradahir commented on August 20, 2024

package requirement: pandas, biopython, os
input ie.
subtype = 'H3N2'
passage_h = 'Egg'
GENE = 'HA'
filter_fasta = './sample.fasta' # output file name
filter_csv = './output.csv' # output metafiles
input_fasta = "./dump.fasta" # input file name
input_csv = './example.csv' # input csv from fuzee

def rename(input_fasta, input_csv, filter_fasta, passage_h, subtype, GENE):
    dump_seq = list(SeqIO.parse(input_fasta, "fasta"))
    data = pd.read_csv(input_csv)
    #param setting
    gene = dict({   '4': 'HA',
                    '6': 'NA',
                    '1': 'PB2',
                    '2': 'PB1',
                    '3': 'PA',
                    '5': 'NP',
                    '7': 'MP' })
    passage = dict({    'q':'QMC',
                        'e': 'Egg', 
                        'o': 'original_specimen', 
                        'cs': 'clinical_specimen'})

    #pandas operation
    #seperate data to new column for extracting the gene id
    data[['Seq','Gene_id']] = data['Seq No'].str.split('.',expand=True)
    #map gene id with name and write it to a new column
    data['Gene'] = data['Gene_id'].map(gene)
    #extract month and convert into the String (ie. jan, feb, etc.)
    data['Month'] = pd.to_datetime(data['Sample Date']).dt.strftime('%b').str.lower()
    #concatenate a designation and month to create the name using in a phylogenetic tree
    data['name'] = data['Designation'].str.replace(" ", "_") + '_' + data['Month']
    #find the pattern and map the passage history
    data['Passage type'] = data['Passage History'].astype(str).str[0].str.lower().map(passage)

    #filter only the row that contains information about Designation, HA clade, Sub Type, and Passage History
    processed_data = data.loc[(data['Designation'].isnull() + data['HA Clade'].isnull() + data['Sub Type'].isnull()+ data['Passage History'].isnull()) ==0 ]

    #filter only the row that match with our query
    processed_data = processed_data.loc[processed_data['Sub Type'] == subtype]
    processed_data = processed_data.loc[processed_data['Passage type'] == passage_h]
    processed_data = processed_data.loc[processed_data['Gene'] == GENE]

    # use the info from data to subset only sequences you want
    subset_fasta = [] 
    for seq in dump_seq:
        if seq.id in list(processed_data['Seq No']):
            subset_fasta.append(seq)

    #iterate through them to rename and write into new fasta file
    with open(filter_fasta, 'w') as f_fasta:
        for seq in subset_fasta:
            name = list(processed_data.loc[processed_data['Seq No'] == seq.id, 'name'])[0]
            seq.id = name
            seq.description = name
            SeqIO.write(seq, f_fasta, 'fasta')

    #write clean metafile
    colname = ['Seq No','Lab #','Lineage', 'Designation',' Passage History' , 'Sample Date',' Results',	'GISAID #','HA Clade', 'name']
    processed_data[colname].to_csv(filter_csv,index=False)

from flukit.

ammaraziz avatar ammaraziz commented on August 20, 2024

Thanks @aradahir that's great work. I'll work on integrating it.

from flukit.

ammaraziz avatar ammaraziz commented on August 20, 2024

This has been combined with #6

from flukit.

Related Issues (7)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.