
bigdataproject's People

Contributors

ahhhz975, asabaghpour, dasha-x, its-alann


bigdataproject's Issues

Deciding between "Clustering" and "Classification" methods

One way to decide is to explore the dataset further and understand its features in depth.

The other important point is that we should talk to Sofvie and determine exactly what their goal is. For example, they might say they only need a method that splits the data into 4 or 5 groups. In that case, we should implement one of the clustering methods mentioned in our project summary. In another scenario, they might already have pre-defined classes (groups), such as low, med, and high completion rates, and need the employees assigned to those classes. In that case, we should implement a classification method.

Either way, this decision has to be made before we start any other task.

Determining the categories of employees based on their completion rate.

To classify or cluster the employees based on their training completion rates, we need to determine two parameters:

1. The number of classes (or categories) that we expect our method to divide the data into.
2. The classes (categories) themselves, that is, what each class is and how it is defined.

If the team settles on clustering-based (unsupervised) methods, we need only the first parameter (the number of classes). If we settle on classification (supervised) methods instead, we need both parameters to implement the method.
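
To make the difference between the two cases concrete, here is a minimal sketch in plain Python. Everything in it is an assumption for illustration: the function names, the cut-off values 0.4 and 0.8, and the idea that a completion rate is a single number between 0 and 1. None of it comes from Sofvie or from the project summary. The first function needs both parameters (the classes and their definitions); the second needs only the number of groups.

```python
def assign_predefined_class(rate, boundaries=(0.4, 0.8), labels=("low", "med", "high")):
    """Classification-style labelling: the classes and their cut-offs are fixed in advance.

    The boundaries here are hypothetical placeholders, not values agreed with Sofvie.
    """
    for boundary, label in zip(boundaries, labels):
        if rate < boundary:
            return label
    return labels[-1]


def kmeans_1d(rates, k, iterations=20):
    """Clustering-style grouping: only the number of groups k is supplied.

    A simple 1-D k-means; returns one group index per input rate.
    """
    ordered = sorted(rates)
    # Spread the initial centres evenly across the sorted values.
    centres = [ordered[(2 * i + 1) * len(ordered) // (2 * k)] for i in range(k)]
    for _ in range(iterations):
        groups = [[] for _ in range(k)]
        for rate in rates:
            nearest = min(range(k), key=lambda i: abs(rate - centres[i]))
            groups[nearest].append(rate)
        # Move each centre to the mean of its group; keep it if the group is empty.
        centres = [sum(g) / len(g) if g else c for g, c in zip(groups, centres)]
    return [min(range(k), key=lambda i: abs(rate - centres[i])) for rate in rates]


rates = [0.1, 0.15, 0.2, 0.5, 0.55, 0.9, 0.95]
print([assign_predefined_class(r) for r in rates])
print(kmeans_1d(rates, k=3))
```

Note that the clustering output is just a set of group indices: the method finds the groups, but it is still up to us (or Sofvie) to decide what each group means.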

Saving dataset tables in several .csv files

The dataset shared by Sofvie is currently hosted online and needs to be saved in our project directory so that it is easy to work with. To do that, we need a simple "dataLoader" function that automatically loads the dataset from the Sofvie domain and stores it in the project directory.
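
A minimal sketch of such a function, using only the Python standard library, could look like the following. It assumes that each table is reachable at its own URL, that the URL returns CSV content directly, and that no authentication is required; none of this is confirmed for the Sofvie dataset. The name `data_loader`, the `table_urls` mapping, and the `data` directory are hypothetical choices.

```python
import urllib.request
from pathlib import Path


def data_loader(table_urls, out_dir="data"):
    """Download each table and save it as <out_dir>/<table_name>.csv.

    table_urls maps a table name to the URL it can be fetched from.
    Assumes the URLs serve CSV content and need no login (an assumption).
    Returns the list of paths that were written.
    """
    target = Path(out_dir)
    target.mkdir(parents=True, exist_ok=True)
    saved = []
    for name, url in table_urls.items():
        with urllib.request.urlopen(url) as response:
            payload = response.read()
        path = target / f"{name}.csv"
        path.write_bytes(payload)
        saved.append(path)
    return saved
```

Calling it with a dictionary of table names and their URLs writes one .csv file per table. If the Sofvie domain turns out to require a login or an API key, the download step would need to change accordingly.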

Design and implementation of a deep-learning-based method to classify employees based on their training completion rates

Whether we choose clustering (unsupervised) or classification (supervised) methods, we need to implement a deep neural framework that groups the employees based on their training completion rates.

Note: Personally, I am thinking of an auto-encoder (encoder-decoder) architecture as a potential design for our deep neural framework. In this architecture, the encoder first extracts distinctive features from the data. This part is common to both the clustering and the classification methods. The decoder then converts those features into classes, or into a given number of groups. The design of the decoder will change depending on the task we decide to work on (either clustering or classification).
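
As a rough illustration of this idea, the sketch below shows only the forward pass of a shared encoder with two interchangeable heads, written in plain Python with untrained random weights. It is a structural sketch under stated assumptions, not the project's design: the class name `SharedEncoderModel`, the layer sizes, and the choice of a single dense layer per part are all hypothetical. In a conventional auto-encoder the decoder reconstructs the input, so here the "decoder" role from the note is split into a reconstruction head (useful for the unsupervised case) and a class head (for the supervised case). A real implementation would use a deep-learning library and a training loop.

```python
import math
import random


def dense(inputs, weights, biases):
    """Fully connected layer: one output value per row of weights."""
    return [sum(w * x for w, x in zip(row, inputs)) + b
            for row, b in zip(weights, biases)]


def relu(values):
    return [max(0.0, v) for v in values]


def softmax(values):
    peak = max(values)  # subtract the maximum for numerical stability
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def init_layer(n_in, n_out, rng):
    """Random (untrained) weights and zero biases for one dense layer."""
    weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


class SharedEncoderModel:
    """One encoder shared by a reconstruction head and a class head."""

    def __init__(self, n_features, n_latent, n_classes, seed=0):
        rng = random.Random(seed)
        self.encoder = init_layer(n_features, n_latent, rng)
        self.reconstruction_head = init_layer(n_latent, n_features, rng)
        self.class_head = init_layer(n_latent, n_classes, rng)

    def encode(self, features):
        # Common part: compress the input into a small latent vector.
        return relu(dense(features, *self.encoder))

    def reconstruct(self, features):
        # Unsupervised case: rebuild the input from the latent vector.
        return dense(self.encode(features), *self.reconstruction_head)

    def class_probabilities(self, features):
        # Supervised case: map the latent vector to class probabilities.
        return softmax(dense(self.encode(features), *self.class_head))


model = SharedEncoderModel(n_features=4, n_latent=2, n_classes=3)
employee = [0.9, 0.1, 0.5, 0.3]  # made-up feature values for one employee
print(model.encode(employee))
print(model.class_probabilities(employee))
```

In the clustering case, one common option is to train the encoder through the reconstruction head and then cluster the latent vectors returned by `encode`; in the classification case, the class head is trained against the pre-defined labels.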
