ih_project7_ml_algorithms

project7

Ironhack project 7 - machine learning

Models implemented:

  • PassiveAggressiveClassifier
  • SGDClassifier
  • ComplementNB
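As a minimal sketch of how these three classifiers can be fitted side by side with scikit-learn: the synthetic data from `make_classification`, the `MinMaxScaler` step, and the `models` / `fitted` names are assumptions made for illustration, not the project's actual notebook code. Scaling to [0, 1] is used here because `ComplementNB` expects non-negative feature values.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import PassiveAggressiveClassifier, SGDClassifier
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import ComplementNB
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import MinMaxScaler

# Synthetic, imbalanced stand-in for the bankruptcy data (assumption).
X, y = make_classification(
    n_samples=1000, n_features=20, weights=[0.9, 0.1], random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# The three models listed above, with default hyperparameters.
models = {
    "PassiveAggressiveClassifier": PassiveAggressiveClassifier(random_state=0),
    "SGDClassifier": SGDClassifier(random_state=0),
    "ComplementNB": ComplementNB(),
}

fitted = {}
for name, model in models.items():
    # clip=True keeps unseen test values inside [0, 1] as well.
    pipe = make_pipeline(MinMaxScaler(clip=True), model)
    pipe.fit(X_train, y_train)
    fitted[name] = pipe
    print(name, "test accuracy:", round(pipe.score(X_test, y_test), 3))
```

On imbalanced data such as bankruptcy labels, accuracy alone can look good for a model that rarely predicts the minority class, which is why the process below also lists recall, precision and ROC AUC.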

Additional models (not tested):

  • LinearSVC
  • KNeighborsClassifier

Process

  1. Data discovery
  2. Data cleaning
  3. Data pre-processing / scaling
  4. Feature selection (if needed)
  5. Implement your models on your data
  6. Hyperparameter tuning
  7. Implement AutoML (TPOT) (optional)
  8. Compare the results using metrics:
       • accuracy
       • recall
       • precision
       • ROC AUC score
       • ROC curve plot
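Steps 3, 6 and 8 above can be sketched roughly as follows. This is an illustration under stated assumptions rather than the project's actual code: the data is synthetic, `SGDClassifier` stands in for any of the models, and the parameter grid values are arbitrary examples.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import SGDClassifier
from sklearn.metrics import (
    accuracy_score,
    precision_score,
    recall_score,
    roc_auc_score,
    roc_curve,
)
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic, imbalanced stand-in for the cleaned data (assumption).
X, y = make_classification(
    n_samples=1000, n_features=20, weights=[0.9, 0.1], random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Step 3: scaling lives inside the pipeline so it is refit on each CV fold.
pipe = Pipeline([
    ("scale", StandardScaler()),
    ("clf", SGDClassifier(random_state=0)),
])

# Step 6: a small, illustrative hyperparameter grid (values are assumptions).
param_grid = {
    "clf__alpha": [1e-4, 1e-3, 1e-2],
    "clf__loss": ["hinge", "modified_huber"],
}
search = GridSearchCV(pipe, param_grid, scoring="roc_auc", cv=5)
search.fit(X_train, y_train)

# Step 8: compare results with the listed metrics on held-out data.
y_pred = search.predict(X_test)
scores = search.decision_function(X_test)  # continuous scores for ROC

metrics = {
    "accuracy": accuracy_score(y_test, y_pred),
    "recall": recall_score(y_test, y_pred, zero_division=0),
    "precision": precision_score(y_test, y_pred, zero_division=0),
    "roc_auc": roc_auc_score(y_test, scores),
}
print(search.best_params_)
print({name: round(value, 3) for name, value in metrics.items()})

# Points of the ROC curve; fpr and tpr can be passed to any plotting library.
fpr, tpr, thresholds = roc_curve(y_test, scores)
```

The ROC AUC is computed from the continuous decision scores rather than the hard predictions, since thresholded labels would collapse the curve to a single operating point.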

Expectations

  • Clean, well-commented code
  • Clean data with EDA
  • Clear board in Trello with logged time for each task
  • Clear description of each model
  • Model implementation and comparison

Deliverables

  1. Clean and encoded data: two files, data_corr_dropped and data_low_var_dropped, to be used depending on which version is preferred
  2. .ipynb files with all code concerning EDA, data cleaning and modelling for each of the models
  3. Slides with a description of each model (how it works, what the parameters are, what exactly you used) and results (for each model, plus a final table comparing the models)
  4. A conclusion about the usability of each model
  5. Trello board with logged time
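The file names data_corr_dropped and data_low_var_dropped suggest two feature-reduction variants: one with highly correlated columns removed and one with low-variance columns removed. The sketch below shows one common way to build such variants. The thresholds, the helper names `drop_correlated` and `drop_low_variance`, and the toy DataFrame are all assumptions; the README does not state how the files were actually produced.

```python
import numpy as np
import pandas as pd
from sklearn.feature_selection import VarianceThreshold


def drop_correlated(df: pd.DataFrame, threshold: float = 0.95) -> pd.DataFrame:
    """Drop the later column of each pair whose |correlation| exceeds threshold."""
    corr = df.corr().abs()
    # Keep only the strict upper triangle so each pair is inspected once.
    upper = corr.where(np.triu(np.ones(corr.shape, dtype=bool), k=1))
    to_drop = [col for col in upper.columns if (upper[col] > threshold).any()]
    return df.drop(columns=to_drop)


def drop_low_variance(df: pd.DataFrame, threshold: float = 1e-4) -> pd.DataFrame:
    """Drop columns whose variance does not exceed threshold."""
    selector = VarianceThreshold(threshold=threshold)
    selector.fit(df)
    return df.loc[:, selector.get_support()]


# Toy data (hypothetical): one near-duplicate column and one constant column.
rng = np.random.default_rng(0)
a = rng.normal(size=200)
toy = pd.DataFrame({
    "a": a,
    "a_copy": 2.0 * a + 0.001 * rng.normal(size=200),
    "b": rng.normal(size=200),
    "constant": np.full(200, 0.5),
})

data_corr_dropped = drop_correlated(toy)
data_low_var_dropped = drop_low_variance(toy)
print(list(data_corr_dropped.columns))
print(list(data_low_var_dropped.columns))
```

In this toy case the correlation filter removes `a_copy` and the variance filter removes `constant`, so the two outputs keep different columns, which is why both versions can be worth comparing.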

Time spent on each task

  • Data cleaning - 2 hours
  • Data preprocessing (features, scaling) - 1 hour
  • Models investigation - 1 hour per person
  • Models implementation - 2 hours
  • Slides/dashboard/notebook - 2 hours
  • Finalization and "beautification" (GitHub, etc.) - 2 hours

Average time per person: 10 hours.

About the data

The data was collected from the Taiwan Economic Journal for the years 1999 to 2009. Company bankruptcy was defined based on the business regulations of the Taiwan Stock Exchange.

Below is a description of the variables:

  • Y - Bankrupt?: Class label
  • X1 - ROA(C) before interest and depreciation before interest: Return On Total Assets(C)
  • X2 - ROA(A) before interest and % after tax: Return On Total Assets(A)
  • X3 - ROA(B) before interest and depreciation after tax: Return On Total Assets(B)
  • X4 - Operating Gross Margin: Gross Profit/Net Sales
  • X5 - Realized Sales Gross Margin: Realized Gross Profit/Net Sales
  • X6 - Operating Profit Rate: Operating Income/Net Sales
  • X7 - Pre-tax net Interest Rate: Pre-Tax Income/Net Sales
  • X8 - After-tax net Interest Rate: Net Income/Net Sales
  • X9 - Non-industry income and expenditure/revenue: Net Non-operating Income Ratio
  • X10 - Continuous interest rate (after tax): Net Income-Exclude Disposal Gain or Loss/Net Sales
  • X11 - Operating Expense Rate: Operating Expenses/Net Sales
  • X12 - Research and development expense rate: (Research and Development Expenses)/Net Sales
  • X13 - Cash flow rate: Cash Flow from Operating/Current Liabilities
  • X14 - Interest-bearing debt interest rate: Interest-bearing Debt/Equity
  • X15 - Tax rate (A): Effective Tax Rate
  • X16 - Net Value Per Share (B): Book Value Per Share(B)
  • X17 - Net Value Per Share (A): Book Value Per Share(A)
  • X18 - Net Value Per Share (C): Book Value Per Share(C)
  • X19 - Persistent EPS in the Last Four Seasons: EPS-Net Income
  • X20 - Cash Flow Per Share
  • X21 - Revenue Per Share (Yuan ¥): Sales Per Share
  • X22 - Operating Profit Per Share (Yuan ¥): Operating Income Per Share
  • X23 - Per Share Net profit before tax (Yuan ¥): Pretax Income Per Share
  • X24 - Realized Sales Gross Profit Growth Rate
  • X25 - Operating Profit Growth Rate: Operating Income Growth
  • X26 - After-tax Net Profit Growth Rate: Net Income Growth
  • X27 - Regular Net Profit Growth Rate: Continuing Operating Income after Tax Growth
  • X28 - Continuous Net Profit Growth Rate: Net Income-Excluding Disposal Gain or Loss Growth
  • X29 - Total Asset Growth Rate: Total Asset Growth
  • X30 - Net Value Growth Rate: Total Equity Growth
  • X31 - Total Asset Return Growth Rate Ratio: Return on Total Asset Growth
  • X32 - Cash Reinvestment %: Cash Reinvestment Ratio
  • X33 - Current Ratio
  • X34 - Quick Ratio: Acid Test
  • X35 - Interest Expense Ratio: Interest Expenses/Total Revenue
  • X36 - Total debt/Total net worth: Total Liability/Equity Ratio
  • X37 - Debt ratio %: Liability/Total Assets
  • X38 - Net worth/Assets: Equity/Total Assets
  • X39 - Long-term fund suitability ratio (A): (Long-term Liability+Equity)/Fixed Assets
  • X40 - Borrowing dependency: Cost of Interest-bearing Debt
  • X41 - Contingent liabilities/Net worth: Contingent Liability/Equity
  • X42 - Operating profit/Paid-in capital: Operating Income/Capital
  • X43 - Net profit before tax/Paid-in capital: Pretax Income/Capital
  • X44 - Inventory and accounts receivable/Net value: (Inventory+Accounts Receivables)/Equity
  • X45 - Total Asset Turnover
  • X46 - Accounts Receivable Turnover
  • X47 - Average Collection Days: Days Receivable Outstanding
  • X48 - Inventory Turnover Rate (times)
  • X49 - Fixed Assets Turnover Frequency
  • X50 - Net Worth Turnover Rate (times): Equity Turnover
  • X51 - Revenue per person: Sales Per Employee
  • X52 - Operating profit per person: Operation Income Per Employee
  • X53 - Allocation rate per person: Fixed Assets Per Employee
  • X54 - Working Capital to Total Assets
  • X55 - Quick Assets/Total Assets
  • X56 - Current Assets/Total Assets
  • X57 - Cash/Total Assets
  • X58 - Quick Assets/Current Liability
  • X59 - Cash/Current Liability
  • X60 - Current Liability to Assets
  • X61 - Operating Funds to Liability
  • X62 - Inventory/Working Capital
  • X63 - Inventory/Current Liability
  • X64 - Current Liabilities/Liability
  • X65 - Working Capital/Equity
  • X66 - Current Liabilities/Equity
  • X67 - Long-term Liability to Current Assets
  • X68 - Retained Earnings to Total Assets
  • X69 - Total income/Total expense
  • X70 - Total expense/Assets
  • X71 - Current Asset Turnover Rate: Current Assets to Sales
  • X72 - Quick Asset Turnover Rate: Quick Assets to Sales
  • X73 - Working capitcal Turnover Rate: Working Capital to Sales
  • X74 - Cash Turnover Rate: Cash to Sales
  • X75 - Cash Flow to Sales
  • X76 - Fixed Assets to Assets
  • X77 - Current Liability to Liability
  • X78 - Current Liability to Equity
  • X79 - Equity to Long-term Liability
  • X80 - Cash Flow to Total Assets
  • X81 - Cash Flow to Liability
  • X82 - CFO to Assets
  • X83 - Cash Flow to Equity
  • X84 - Current Liability to Current Assets
  • X85 - Liability-Assets Flag: 1 if Total Liability exceeds Total Assets, 0 otherwise
  • X86 - Net Income to Total Assets
  • X87 - Total assets to GNP price
  • X88 - No-credit Interval
  • X89 - Gross Profit to Sales
  • X90 - Net Income to Stockholder's Equity
  • X91 - Liability to Equity
  • X92 - Degree of Financial Leverage (DFL)
  • X93 - Interest Coverage Ratio (Interest expense to EBIT)
  • X94 - Net Income Flag: 1 if Net Income is Negative for the last two years, 0 otherwise
  • X95 - Equity to Liability
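To make a few of these definitions concrete, here is a small sketch that derives X37, X38, X85 and X94 from raw balance-sheet figures. The numbers and column names are made up for illustration, and equity is taken as total assets minus total liabilities; the source does not say how the published dataset was computed or normalized, so this only illustrates the definitions as written in the list.

```python
import pandas as pd

# Hypothetical raw figures for three companies (not taken from the dataset).
raw = pd.DataFrame({
    "total_liability": [40.0, 120.0, 75.0],
    "total_assets": [100.0, 100.0, 150.0],
    "net_income_prev_year": [5.0, -3.0, -1.0],
    "net_income_this_year": [6.0, -2.0, 4.0],
})

# Accounting identity used here: equity = assets - liabilities.
equity = raw["total_assets"] - raw["total_liability"]

features = pd.DataFrame({
    # X37 - Debt ratio %: Liability/Total Assets
    "debt_ratio": raw["total_liability"] / raw["total_assets"],
    # X38 - Net worth/Assets: Equity/Total Assets
    "net_worth_to_assets": equity / raw["total_assets"],
    # X85 - 1 if Total Liability exceeds Total Assets, 0 otherwise
    "liability_assets_flag": (
        raw["total_liability"] > raw["total_assets"]
    ).astype(int),
    # X94 - 1 if Net Income is negative for the last two years, 0 otherwise
    "net_income_flag": (
        (raw["net_income_prev_year"] < 0) & (raw["net_income_this_year"] < 0)
    ).astype(int),
})
print(features)
```

Under these definitions X37 and X38 sum to 1 for every company, so the two columns carry the same information. Pairs like this are what a correlation-based feature filter tends to remove.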

Contributors

tatdef, axele-mahalin, katrinajmd
