GSoC Final Report

Organization : International Neuroinformatics Coordinating Facility (INCF)
Project : Automated comparison of scientific methods for time series analysis.
Student : Salman Khan
Mentors : Ben Fulcher, Oliver Cliff, Joseph Lizier

Introduction

In this Google Summer of Code 2020 project, I developed a web-based system that helps users compare a new time-series analysis algorithm to a collection of over 7700 existing algorithms, implemented in the hctsa package. The website takes a new time-series analysis algorithm (as Python code) from the user and computes its outputs across a dataset of 1000 diverse time series. It then analyzes the correlation between the output of the user's algorithm and each feature in the hctsa feature library, and presents a range of intuitive visualizations that show the best-matching features. This output helps the user understand connections between their method and the existing interdisciplinary time-series analysis literature, and therefore assess whether their algorithm genuinely advances that literature.

Here is an example of the website functionality I developed from scratch in this GSoC project:

What was done

Because the project had to be developed from scratch, I broke the development process down into three phases:

  1. First phase - Backend / logic development

     I developed a series of functions to enable successful execution of the user's code, and to perform systematic comparison of its output to that of existing algorithms:

    • Read the user's code as a string to check for malicious code before execution.
    • Pass a diverse time-series dataset through the user's function to generate a long feature vector (one output per time series).
    • Compute the Spearman correlation coefficient between the computed feature vector and every individual hctsa feature, then sort and store all of the relevant information: (Feature name, Keywords, p-value, Correlation coefficient).
    • Structure the results for rendering in a dynamic table and interactive plotting.
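The comparison step above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the project's actual code: the function names (`average_ranks`, `spearman`, `rank_matches`), the dictionary layout of the feature library, and the choice to sort by absolute correlation are all hypothetical. It uses only the standard library and omits the p-value; in practice a function such as `scipy.stats.spearmanr` returns both the coefficient and the p-value.

```python
from math import sqrt


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation; NaN if either input has zero variance."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return float("nan")
    return cov / sqrt(vx * vy)


def spearman(x, y):
    """Spearman correlation = Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def rank_matches(user_vector, library):
    """Correlate user_vector with every library feature and sort the results.

    library is a hypothetical dict mapping feature name -> list of outputs,
    one per time series, in the same order as user_vector.
    """
    rows = [
        {"feature": name, "rho": spearman(user_vector, values)}
        for name, values in library.items()
    ]
    rows = [row for row in rows if row["rho"] == row["rho"]]  # drop NaN rows
    # Assumption: strongest relationships first, regardless of sign.
    rows.sort(key=lambda row: abs(row["rho"]), reverse=True)
    return rows
```

Each returned row is a plain dictionary, so the list can be passed directly to a table renderer or written out as CSV.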

  2. Second phase - Front-end development

     In this phase, I focused on the front end that users interact with. I implemented a range of functionality, including:

    • Development of the website's pages, including 'Home', 'How-it-works', 'Contact', 'Preloader', 'Result', 'Syntax error', 'Timeout Error', and '404 Not found'.
    • Interactive results table (functionality shown in the gif below), which allows users to:
      • Toggle between representations of the results.
      • Download all results in .csv format.
      • Toggle the table to full size.
      • Show or hide individual table columns.
    • Visualization of top 12 results as interactive scatter plots (as visualized in the gif below), which enables users to:
      • Hover to see data points.
      • Zoom each plot or all subplots simultaneously to more clearly visualize the relationships.
    • Visualization of pairwise relationships between each of the top 12 matches as a correlation heatmap reordered using linkage clustering.
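The heatmap reordering mentioned above can be illustrated with a small agglomerative clustering sketch. The report does not say which linkage method or distance was used, so the choices here are assumptions: average linkage and a distance of 1 - |correlation|. The function names are hypothetical, and a library routine such as `scipy.cluster.hierarchy.linkage` would normally be used instead of this hand-written loop.

```python
def linkage_order(dist):
    """Return a leaf order from average-linkage agglomerative clustering.

    dist is a symmetric matrix (list of lists) of pairwise distances.
    """
    clusters = [[i] for i in range(len(dist))]  # each cluster keeps its leaves in order

    def cluster_distance(a, b):
        # Average linkage: mean distance over all cross-cluster pairs.
        return sum(dist[i][j] for i in a for j in b) / (len(a) * len(b))

    while len(clusters) > 1:
        best = None
        for p in range(len(clusters)):
            for q in range(p + 1, len(clusters)):
                d = cluster_distance(clusters[p], clusters[q])
                if best is None or d < best[0]:
                    best = (d, p, q)
        _, p, q = best
        merged = clusters[p] + clusters[q]  # similar features end up adjacent
        clusters = [c for k, c in enumerate(clusters) if k not in (p, q)]
        clusters.append(merged)
    return clusters[0]


def reorder(matrix, order):
    """Apply the same permutation to the rows and columns of a square matrix."""
    return [[matrix[i][j] for j in order] for i in order]


# Toy correlation matrix: features 0 and 2 are similar, as are 1 and 3.
corr = [
    [1.0, 0.1, 0.9, 0.2],
    [0.1, 1.0, 0.2, 0.8],
    [0.9, 0.2, 1.0, 0.1],
    [0.2, 0.8, 0.1, 1.0],
]
dist = [[1 - abs(value) for value in row] for row in corr]
order = linkage_order(dist)
clustered = reorder(corr, order)
```

After reordering, strongly related features sit next to each other, so blocks of similar features appear as contiguous squares along the heatmap diagonal.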

  3. Third phase - Running the user's code securely, with error handling

    This was one of the major challenges, as executing custom user code on a server could compromise the system. To run the user's code safely, we:

    • Used RestrictedPython to run the user's code in a restricted environment.
    • Allowed the user to import only specific modules that are relevant to scientific data analysis, thereby disabling functionality related to accessing or modifying the system.
    • Restricted built-in functions such as exec and eval that could be used to harm the system.
    • Added a timeout limit so that the system is protected from algorithms falling into an infinite loop.
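The ideas behind these safeguards can be sketched with the standard library alone. This is an illustrative stand-in, not the project's RestrictedPython setup: the whitelist, the list of forbidden names, and both function names are hypothetical, and a static scan like this is not sufficient on its own to sandbox untrusted code.

```python
import ast
import subprocess
import sys

# Hypothetical whitelist and blacklist, chosen for illustration only.
ALLOWED_MODULES = {"math", "statistics", "numpy", "scipy"}
FORBIDDEN_NAMES = {"exec", "eval", "compile", "open", "__import__"}


def find_violations(source):
    """Statically scan source code and return a list of readable problems."""
    try:
        tree = ast.parse(source)
    except SyntaxError as err:
        return [f"syntax error: {err.msg} (line {err.lineno})"]
    problems = []
    for node in ast.walk(tree):
        if isinstance(node, ast.Import):
            modules = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom):
            modules = [node.module or ""]  # relative imports have no module name
        else:
            modules = []
        for module in modules:
            # Compare the top-level package, so "numpy.linalg" counts as "numpy".
            if module.split(".")[0] not in ALLOWED_MODULES:
                problems.append(f"import of '{module}' is not allowed")
        if isinstance(node, ast.Name) and node.id in FORBIDDEN_NAMES:
            problems.append(f"use of '{node.id}' is not allowed")
    return problems


def run_with_timeout(source, seconds):
    """Run source in a separate interpreter; return (timed_out, stdout)."""
    try:
        done = subprocess.run(
            [sys.executable, "-c", source],
            capture_output=True,
            text=True,
            timeout=seconds,  # the child process is killed when this expires
        )
    except subprocess.TimeoutExpired:
        return True, ""
    return False, done.stdout
```

In this sketch a submission would first pass through `find_violations`, and only code with no reported problems would be handed to `run_with_timeout`. A syntax error or an expired timeout maps naturally onto the 'Syntax error' and 'Timeout Error' pages listed earlier.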

    Link to work


    Weekly Reports

    These are the weekly reports that I submitted to INCF during the GSoC period:

    Week 1 & Week 2
    Week 3
    Week 4
    Week 5
    Week 6
    Week 7
    Week 8
    Week 9
    Week 10
    Week 11
    Week 12


    Future Work

    Although all the requirements of this project as outlined in the GSoC proposal have been completed, this project represents the important initial steps in the full development of CompEngine-Features. After the official GSoC period, I plan to contribute to this further development by:

    • Adding an explore mode in which the user can compare already existing features.
    • Adding nested result tables, so that clicking on any result opens another result table of similar features.
    • Implementing additional visualizations, including a network visualization.
