canklot / orifinder

The origin of replication (also called the replication origin) is a particular sequence in a genome at which replication is initiated.

Jupyter Notebook 86.00% Python 14.00%

orifinder's Introduction

OriFinder

An origin-of-replication finder written in Python. Thanks to hash tables, it is very fast.

from collections import Counter
# Because str.count() is slow and we have a HUGE dataset, we need a faster way to count substrings.
# collections.Counter is a data type that counts items using a hash table, which is very fast.

def diveandconquer(s):
    maxcount = 0
    maxsubstring = ""
    for u in range(5, 20):
        allsubstrings = []
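        # There are len(s) - u + 1 full-length windows of size u in s
        for i in range(len(s) - u + 1):
            allsubstrings.append(s[i:i+u])
        c = Counter(allsubstrings).most_common(1)
        if not c:
            # s is shorter than u, so no longer window can exist either
            break
        if c[0][1] >= maxcount: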
            maxsubstring = c[0][0]
            maxcount = c[0][1]
    return maxsubstring + " " +str(maxcount)
# We keep two variables: the highest occurrence count seen so far and the substring that reached it. A list holds every substring of the current length.
# In the inner for loop we build the substrings by sliding a window of the current length (5 characters at first) across the original string.
# Then we create a Counter object from all of those substrings, retrieve the most common one and assign it to c.
# most_common(1) returns a list holding one (substring, count) tuple, in this format: [("theword", 20)].
# If the count is greater than or equal to the previous maxcount value, we assign the current substring and count to maxsubstring and maxcount.
# The outer loop repeats the same steps for every substring length from 5 to 19, because range(5, 20) stops before 20.
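
The counting step described in the comments above can be sketched for a single substring length. This is a minimal sketch, not the repository's code: the name `most_frequent_kmers` and the parameter `k` are assumptions introduced for illustration. It feeds the windows to `Counter` through a generator instead of building a list first, and it returns every substring tied for the top count rather than just one.

```python
from collections import Counter


def most_frequent_kmers(s, k):
    """Return (substrings, count) for the most frequent length-k substrings of s.

    Hypothetical helper for illustration; overlapping occurrences are counted.
    """
    if k <= 0 or len(s) < k:
        return [], 0
    # One hash-table update per window: len(s) - k + 1 windows in total.
    counts = Counter(s[i:i + k] for i in range(len(s) - k + 1))
    best = max(counts.values())
    # Keep every substring that reaches the top count, in sorted order.
    winners = sorted(sub for sub, n in counts.items() if n == best)
    return winners, best


print(most_frequent_kmers("ACGTTGCATGTCGCATGATGCATGAGAGCT", 4))
# → (['CATG', 'GCAT'], 3)
```

Note that the sliding window counts overlapping occurrences, while `str.count()` counts only non-overlapping ones: `"AAAA".count("AA")` is 2, but the window approach finds "AA" three times.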


#f = open("vibrio_cholerae_light.txt", "r")
#f = open("vibrio_cholerae_med.txt", "r")
#f = open("deneme.txt", "r")

f = open("vibrio_cholerae.txt", "r")
data = f.read()
f.close()
# We read our dataset from a file and store it in a variable named data.
# I created 3 smaller alternative datasets by deleting much of the original one, to prevent memory errors and shorten run times while prototyping.
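
If the genome file is split across several lines, the newline characters end up inside the substrings that get counted. Below is a minimal sketch of a loader that removes whitespace first. `read_genome` is a hypothetical helper, and whether `vibrio_cholerae.txt` actually contains line breaks is an assumption, not something the notebook states.

```python
def read_genome(path):
    """Read a sequence file and drop all whitespace (hypothetical helper)."""
    # The with-statement closes the file even if reading fails.
    with open(path, "r") as f:
        text = f.read()
    # str.split() with no arguments splits on any run of whitespace,
    # so joining the pieces removes spaces, tabs and line breaks.
    return "".join(text.split())
```

Under these assumptions, `data = read_genome("vibrio_cholerae.txt")` could stand in for the open/read/close lines above.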

sonuc = diveandconquer(data)
print(sonuc)
TTTTT 3193
And lastly, we call our function and print its result.
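
Every occurrence of a longer substring contains an occurrence of its shorter prefix, so the top count can only stay the same or drop as the length grows. A single overall winner will therefore tend to be a 5-character substring, as in the output above. A sketch that reports the most frequent substring for each length separately may be more informative. `top_by_length` and its parameters are assumptions for illustration, not part of the repository.

```python
from collections import Counter


def top_by_length(s, min_len=5, max_len=19):
    """Map each length to its most frequent (substring, count) pair.

    Hypothetical helper; lengths longer than s are skipped.
    """
    result = {}
    for k in range(min_len, max_len + 1):
        if len(s) < k:
            break
        counts = Counter(s[i:i + k] for i in range(len(s) - k + 1))
        # most_common(1) returns a one-element list: [(substring, count)]
        result[k] = counts.most_common(1)[0]
    return result


# Small demonstration on a toy string with lengths 2 to 4.
for k, (sub, n) in top_by_length("ATATATATCG", 2, 4).items():
    print(k, sub, n)
```

The per-length counts make it easier to spot a longer substring that repeats unusually often, which a single combined maximum would hide.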
