
drstef / machine-learning-and-digital-signal-processing-for-genome-classification


Supervised classification of DNA sequences from various species using FFT and Machine Learning.

License: MIT License

cross-correlation digital-signal-processing fasta fft genome-classification genomics genomics-analysis machine-learning numerical-representation-of-dna-sequences python scipy-signal sklearn


Machine Learning - Deep Learning Projects

Advanced projects

This section contains Research and Development projects in Machine Learning and Deep Learning that require original developments. They call on our expertise in Digital Signal Processing, Optimization, Calculus, and Linear Algebra.


        Automatic environmental sound classification (ESC) based on the ESC-50 dataset (and its ESC-10 subset) built by Karol Piczak and described in the following article:
        "ESC: Dataset for Environmental Sound Classification" by Karol J. Piczak. 2015. In Proceedings of the 23rd ACM international conference on Multimedia (MM '15). Association for Computing Machinery, New York, NY, USA, 1015–1018. https://doi.org/10.1145/2733373.2806390

        Multi-feature Convolutional Neural Networks (CNNs) achieve accuracy close to 99% with custom pre-processing and a fusion of mel-spectrograms and complex wavelet transforms.
        The last remaining confusion, between "sea waves" and "rain", is resolved by developing an original transform of the complex CWT. This transform, aT-CWT, replaces the phase of the CWT for stationary and pseudo-stationary sounds with Gaussian characteristics.
        With the aT-CWT transform, the multi-feature CNN model achieves 100% accuracy.

        In this project we develop effective methods for classifying mitochondrial genomes (DNA sequences) based on Digital Signal Processing, Machine Learning, and Deep Learning. This is ongoing research and results will be published on a regular basis. As a starting point we analyzed the following paper:
        "ML-DSP: Machine Learning with Digital Signal Processing for ultrafast, accurate, and scalable genome classification at all taxonomic levels" by Gurjit S. Randhawa, Kathleen A. Hill and Lila Kari. https://doi.org/10.1186/s12864-019-5571-y

        Their alignment-free DNA sequence classification approach, ML-DSP, is very effective. By introducing a simple alignment technique and short FFTs (ML-FFT + SoftAlign), we outperform ML-DSP on difficult datasets: Fungi, Insects.
      • Deep Learning and Digital Signal Processing: Voice Activity Detection

      • Machine Learning and Digital Signal Processing: Sound Source Localization
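
As a rough illustration of the DSP side of the genome classification project described above, the sketch below encodes a DNA sequence as a numeric signal, takes the magnitude of its discrete Fourier transform, and compares two spectra with a distance based on the Pearson correlation. It is only a sketch under stated assumptions: the purine/pyrimidine mapping `PP_MAP`, the function names, and the `(1 - r) / 2` normalization are chosen for illustration and are not the code of ML-DSP or of ML-FFT + SoftAlign. A direct DFT from the standard library keeps the example self-contained; in practice a library FFT such as `numpy.fft.fft` would be the natural choice, and sequences would first have to be brought to a common length.

```python
import cmath
import math

# Assumed purine/pyrimidine encoding, chosen for illustration only.
PP_MAP = {"A": -1.0, "G": -1.0, "C": 1.0, "T": 1.0}


def dna_to_signal(seq):
    """Map each nucleotide to a number; unknown symbols (e.g. N) become 0."""
    return [PP_MAP.get(base, 0.0) for base in seq.upper()]


def magnitude_spectrum(signal):
    """Magnitude of the discrete Fourier transform, computed directly in O(n^2)."""
    n = len(signal)
    return [
        abs(sum(x * cmath.exp(-2j * cmath.pi * k * t / n)
                for t, x in enumerate(signal)))
        for k in range(n)
    ]


def pearson_distance(u, v):
    """Return (1 - r) / 2, where r is the Pearson correlation of u and v."""
    if len(u) != len(v):
        raise ValueError("spectra must have the same length")
    mean_u = sum(u) / len(u)
    mean_v = sum(v) / len(v)
    cov = sum((a - mean_u) * (b - mean_v) for a, b in zip(u, v))
    norm_u = math.sqrt(sum((a - mean_u) ** 2 for a in u))
    norm_v = math.sqrt(sum((b - mean_v) ** 2 for b in v))
    if norm_u == 0 or norm_v == 0:
        raise ValueError("correlation is undefined for a constant spectrum")
    # Clamp to [-1, 1] to absorb floating-point rounding.
    r = max(-1.0, min(1.0, cov / (norm_u * norm_v)))
    return (1.0 - r) / 2.0


def spectral_distance(seq_a, seq_b):
    """Distance between two equal-length DNA sequences via their spectra."""
    return pearson_distance(
        magnitude_spectrum(dna_to_signal(seq_a)),
        magnitude_spectrum(dna_to_signal(seq_b)),
    )
```

Calling `spectral_distance` on every pair of sequences yields a distance matrix whose rows could serve as feature vectors for a supervised classifier. The distance lies between 0 and 1 and is close to 0 for sequences with near-identical spectra.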



    Standard projects

    This section is a portfolio of Machine Learning projects with Python and various visualization and analysis tools. Most of these projects were carried out within the framework of IBM certifications. They are presented with Jupyter Notebooks.
    Some projects have been improved by incorporating more in-depth data analysis, better graphs, advanced ML techniques.

        In this project, we predict whether the Falcon 9 first stage will land successfully. The project includes: SpaceX data collection, Data Wrangling, Web scraping, EDA with SQL Queries & Data visualization, SpaceX Launch Records Dashboard, Launch Sites Locations Analysis with Folium, Machine Learning classification with optimization of hyperparameters and selection of the best model: KNN, Decision Tree, SVM, Logistic Regression.
        A wide range of small projects with various ML techniques, prediction, supervised and unsupervised classification: Linear Regression, Polynomial Regression, Non-Linear Regression, Recommendation Systems, KNN, Customer Segmentation with K-Means, Hierarchical Clustering, Density-Based Clustering, Logistic Regression.
        The project consists of finding the best model for predicting home prices in King County, Washington State, USA, based on a dataset of homes sold between May 2014 and May 2015. Prediction accuracy was improved by implementing a spline regression model.
        One Jupyter Notebook includes interactive Folium maps (interactive maps will not display on GitHub).

        Loan Status Prediction using Supervised Classification Algorithms: KNN, Decision Tree, SVM, Logistic Regression.
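
To make the KNN classifier mentioned in several of these projects concrete, here is a minimal from-scratch sketch of the majority-vote rule. It is an illustration only: the function name `knn_predict`, the two-feature toy points, and the labels are hypothetical, and the notebooks themselves presumably rely on a library implementation rather than on code like this.

```python
import math
from collections import Counter


def knn_predict(train_points, train_labels, query, k=3):
    """Predict the label of `query` by majority vote among its k nearest neighbours."""
    # Pair every training point's Euclidean distance to the query with its label.
    neighbours = sorted(
        (math.dist(point, query), label)
        for point, label in zip(train_points, train_labels)
    )
    # Count the labels of the k closest points and return the most frequent one.
    votes = Counter(label for _, label in neighbours[:k])
    return votes.most_common(1)[0][0]


# Hypothetical toy data: two numeric features per sample, two classes.
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (5.0, 5.0), (5.0, 6.0), (6.0, 5.0)]
labels = ["PAIDOFF", "PAIDOFF", "PAIDOFF", "COLLECTION", "COLLECTION", "COLLECTION"]
```

With this toy data, `knn_predict(points, labels, (0.5, 0.5))` votes among the three points near the origin. On real data the features would normally be standardized first, and `k` would be chosen by validation.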

    Data Analysis - SQL, MySQL

        An older dataset on Boston housing prices, derived from U.S. Census Service data, used to present insights based on our experience in statistics. Analyses cover the median value of houses bounded by the Charles River, owner-occupied units built before 1940, the relationship between nitric oxide concentrations and the proportion of non-retail business acres per town, and the impact of the weighted distance to the five Boston employment centres on the median value of owner-occupied homes.

        Dataset: cars of various makes, with their specifications and prices.
        After cleaning the dataset, running statistics, and identifying the most relevant variables, we develop several models that predict the price of a car from a set of features.

        Word Cloud

        Folium with markers

        Choropleth

      • Databases and SQL for Data Science

      • Stock extraction & visualization - yFinance, web scraping



    Digital Signal Processing


    Modeling and Scientific Computing


        "Figure 8" toroid

        Gyroid

        Truncated cuboctahedron

        Helicoid-Catenoid

      • Linear Algebra problems




    🔭 I’m currently working on advanced projects in ML & DL
    👯 I’m looking to collaborate on Digital Signal Processing, Machine Learning, Deep Learning
    📫 How to reach me: [email protected]
