GithubHelp home page GithubHelp logo

ml-project-2-ml_pls's Introduction

Second Project of the EPFL Machine Learning Course (CS-433)

General Information

This repository contains the code for project 2 of the Machine Learning course 2021 (CS-433).

Team

The project is accomplished by team PLSteam with members:

Leandre Castagna: @Defteggg
Pascal Epple : @epplepascal 
Selima Jaoua : @salimajaoua

Project presentation and methods used

Presentation :

The aim of this project is to explore if the heterogeneity of schizophrenia can be tackled by dividing the patients into subgroups using unsupervised learning. A clinical data set where 227 schizophrenia patients performed a battery of perceptual and cognitive tasks has been provided. The purpose is to build clusters in order to construct different classifications and use them to describe differences between groups of patients. To validate the subgroups, polygenic risk scores and the symptoms subscales are provided in the data set.

Data set :

The data set is not available as it contains highly confidential information. The computed results and plots are nonetheless present in the provided Jupyter notebook.

Needed packages in order to run the code:

To run the code, one needs to download the latest version of Python. Moreover, the following external libraries are used:

  • NumPy
  • Sklearn
  • SciPy
  • Pandas

Clustering methods used:

Three clustering methods were used: K-Means with the L1 norm, the Expectation Maximization (EM) algorithm and DBSCAN.

Methods:

Our code is separated into 6 different .py files, with 3 .ipynb files which are consisting of the Main of our project and were all of our results can be seen, DBSCAN and Topological Clustering . As the data set cannot be shared, the results are not reproducible. Contact us in case you would like to get further informations about some results, we are glad to answer your questions.

We hereby briefly introduce the 6 .py files and explain their utility:

  • anova_ttest.py: Functions to perform ANOVA and Welch's t-test on our data.
  • Cross_validation.py: Implements cross-validation as described in the final report (search for the report.pdf file in the repository for further information)
  • data_import.py: Data management : In this .py, we just create a function which download the data with the symptom and the output of the function returns a list with all the symptom and an other list with the name corresponding to each symptom
  • map_label.py: Reassigns the label of one cluster with the labels of another one. This method is necessary in order to compare two clusters, as they might be differently labeled depending on the initial value we launched the algorithm with.
  • score.py: Computes the Davis Bouldin and silhouette scores of a given cluster.
  • Stability_function.py: Functions to determine how the labelling is affected with respect to the initial value the clustering algorithm is launched with.

ml-project-2-ml_pls's People

Contributors

salimajaoua avatar epplepascal avatar castagnaleandre avatar

Watchers

Matteo Pagliardini avatar Martin Jaggi avatar Roberto Castello avatar Maksym Andriushchenko avatar ztzthu avatar  avatar  avatar

Forkers

epplepascal

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.