mlf's Introduction

Machine Learning for Finance (FN 570) 2023-24 Module 3 (Spring 2024)

Announcements

  • The WeChat group will be created by the TA. (No 1-to-1 chat please.)
  • Email is the preferred method of communication. The class mailing list will be created as [email protected].

Course Resources

Lectures:

No Date Contents
01 2.20 Tue Course overview (Syllabus) | Required software (Python, Github, PyCharm) | Python crash course (Basic, Numpy (Notebook Shortcut Keys), Pandas. Also see Datacamp, CheatSheet)
02 2.23 Fri PML Ch. 1: Intro (Slides) | Notations, Regression, Weight update (Slides)
03 2.27 Tue PML Ch. 2: Perceptron, Adaline, Gradient descent, Stochastic Gradient Descent
04 3.01 Fri PML Ch. 3: Logistic Regression (LR) (Slides) and Support Vector Machine (SVM) (Slides)
05 3.05 Tue PML Ch. 3: KNN (Slides, Code), Decision Tree (Slides).
06 3.08 Fri PML Ch. 4: Data Preprocessing, PML Ch. 5: SVD/PCA (Slides)
07 3.12 Tue PML Ch. 5: LDA (Slides), PML Ch. 6: Bias-Variance, Cross-validation (Slides)
08 3.15 Fri PML Ch. 6: Hyperparameter tuning, Evaluation Metric, Class imbalance (Slides)
09 3.19 Tue PML Ch. 7: Ensemble Learning (Slides), Kernel Method (Slides, PML Ch 3, 5)
14 3.22 Fri HSBC Guest Lecture [1/4]: Overview and data introduction.
10 3.26 Tue PML Ch. 8: Sentiment Analysis (Slides)
11 3.29 Fri Topics in Finance ML: Recession prediction (Slides), ML in Finance Research (Slides), Collaborative Filtering (Slides)
12 4.02 Tue Neural Network, Deep Learning, CNN (Slides, PML Ch. 12-15)
13 4.07 Sun Midterm Exam (In Class)
15 4.09 Tue HSBC Guest Lecture [2/4]
16 4.12 Fri HSBC Guest Lecture [3/4]
17 4.16 Tue HSBC Guest Lecture [4/4]
18 4.19 Fri Course Project Presentation (may be scheduled later)

Homeworks:

  • Set 0: [Required Software] [Due by Friday]

    • Register on Github.com and let the TA know your ID. Make sure to use your full real name in your profile. Accept the TA's invitation to the PHBS organization.
      • Create a designated repository GITHUB_ID/PHBS_MLF_2023 for your HW and project. Tick Initialize this repository with a README and select python under .gitignore
      • Fork PML repository to your repository.
    • Install Github Desktop. Then clone the PML repository to your local storage.
    • Install Anaconda Python distribution (3.X version, not 2.X version). Anaconda distribution is core Python + useful scientific computation libraries (e.g., numpy, scipy, pandas) + package management system (pip or conda)
    • Install PyCharm Community version. (Or Professional version after applying for free student license)
    • Send the TA screenshots of (1) Github Desktop (showing the PML repository), (2) Jupyter Notebook (Anaconda), and (3) PyCharm. (See my examples: Github Desktop, Anaconda Spyder.)
  • Set 1: [Classifiers] [Due by 3.19 Tues]

    • The goal of this HW is to become familiar with the basic classifiers in PML Ch. 3.
    • For this HW, we will use Give Me Some Credit on Kaggle. You may download it from the Kaggle link or CMS.
    • Load cs-training.csv into a Pandas dataframe.
    • Fill in the missing values (NaN) with the column means. (Use DataFrame.fillna() or see Ch. 4 of PML.)
    • Select the 2 most important features using LogisticRegression with L1 penalty. (Adjust C until you see 2 features)
    • Using the 2 selected features, apply LR / SVM / decision tree. Try your own hyperparameters (C, gamma, tree depth, etc) to maximize the prediction accuracy. (Just try several values. You don't need to show your answer is the maximum.)
    • Visualize your classifiers using the plot_decision_regions function from PML Ch. 3
    • Put your result in YOUR_GITHUB_ID/Give-Me-Some-Credit/code/Classifiers.ipynb
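
The missing-value and feature-selection steps of Set 1 could be sketched roughly as below. This is an illustration only, not a reference solution: the helper names fill_with_column_means and select_features_l1, the grid of C values, the standardization step, and the synthetic data standing in for cs-training.csv are all assumptions introduced here.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler


def fill_with_column_means(df):
    """Return a copy of df with NaN replaced by each numeric column's mean."""
    return df.fillna(df.mean(numeric_only=True))


def select_features_l1(X, y, n_features=2, c_grid=None):
    """Decrease C (stronger L1 penalty) until at most n_features weights survive.

    Hypothetical helper: returns (surviving column names, the C that was used).
    """
    if c_grid is None:
        c_grid = np.logspace(1, -4, 26)  # assumed grid, weak -> strong penalty
    X_std = StandardScaler().fit_transform(X)  # L1 is sensitive to feature scale
    for C in c_grid:
        lr = LogisticRegression(penalty="l1", C=C, solver="liblinear")
        lr.fit(X_std, y)
        kept = np.flatnonzero(lr.coef_[0])  # indices of nonzero weights
        if 0 < len(kept) <= n_features:
            return list(X.columns[kept]), C
    raise ValueError("no C in the grid left the requested number of features")


# Synthetic stand-in for cs-training.csv: only x0 and x1 drive the label.
rng = np.random.default_rng(0)
X = pd.DataFrame(rng.normal(size=(400, 5)), columns=[f"x{i}" for i in range(5)])
y = (2 * X["x0"] - 2 * X["x1"] + 0.3 * rng.normal(size=400) > 0).astype(int)
X.iloc[::25, 2] = np.nan  # inject some missing values
X = fill_with_column_means(X)
names, C = select_features_l1(X, y)
print(names, C)
```

On the real data, one would presumably load the file with pd.read_csv("cs-training.csv") and split off the target column before calling the two helpers. The loop may stop at fewer than two features if the grid is too coarse, in which case a finer grid of C values is worth trying.
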
  • Set 2: [PCA/Hyperparameter/CV] [Due by 3.29 Fri]

    • The goal of this HW is to become familiar with PCA (feature extraction), grid search, pipelines, and k-fold CV.
    • For this HW, we continue to use Give Me Some Credit on Kaggle.
    • Extract a few (>2) features using PCA.
    • Using the selected features from above, we are going to apply LR / SVM / decision tree (or any other algorithm).
    • Implement the methods using pipeline. (PML p185)
    • Use grid search for finding optimal hyperparameters. (PML p199). In the search, apply 5-fold cross-validation.
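
One possible shape for the Set 2 workflow (PCA inside a pipeline, tuned by grid search with 5-fold cross-validation) is sketched below. It is a minimal example under stated assumptions, not the expected answer: the function name build_search, the choice of 3 principal components, the candidate values of C, and the synthetic data from make_classification are all placeholders chosen for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler


def build_search(n_components=3):
    """Hypothetical helper: scaler -> PCA -> LR pipeline wrapped in a grid search."""
    pipe = make_pipeline(
        StandardScaler(),                # PCA expects comparable feature scales
        PCA(n_components=n_components),  # feature extraction (>2 components)
        LogisticRegression(max_iter=1000),
    )
    # make_pipeline names each step after its lowercased class name.
    param_grid = {"logisticregression__C": [0.01, 0.1, 1.0, 10.0]}
    return GridSearchCV(pipe, param_grid, scoring="accuracy", cv=5)


# Synthetic data standing in for the Give Me Some Credit features.
X, y = make_classification(
    n_samples=300, n_features=10, n_informative=4, random_state=0
)
search = build_search().fit(X, y)
print(search.best_params_, round(search.best_score_, 3))
```

Because the scaler and PCA live inside the pipeline, they are refit on the training folds only in each cross-validation split, which avoids leaking information from the validation fold. Swapping LogisticRegression for an SVM or a decision tree only requires changing the last step and the matching keys in param_grid.
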

Syllabus

Classes:

  • Lectures: Tuesday & Friday 1:30 – 3:20 PM
  • Venue: PHBS Building, Room 313

Instructor: Jaehyuk Choi

  • Office: PHBS Building, Room 755
  • Phone: 86-755-2603-0568
  • Email: [email protected]
  • Office Hour: Monday 7-9 PM

Teaching Assistant: 苏南 (Nan SU)

Course overview

With advances in computing power and the growth of big data, machine learning (ML) has recently become one of the most prominent research fields in industry and academia. This course provides a broad introduction to ML from both theoretical and practical perspectives. Students will learn the intuition behind popular ML methods and how to implement them, and will gain hands-on experience with ML software packages such as scikit-learn and TensorFlow. The course will also explore applications of ML to finance and business. Each student is required to complete a final course project. This year, the compliance analytics team at HSBC (Guangzhou) will give 4 guest lectures to demonstrate how ML is developed and shared in the banking industry.

Prerequisites

This course assumes prior knowledge of probability/statistics and experience in Python. It is best suited to those who have taken an introductory ML/AI course in an undergraduate program.

Textbooks and Reading Materials

Primary textbook

  • PML (primary textbook): Python Machine Learning 3rd Ed. by Sebastian Raschka.

Other books and online courses

Assessment / Grading Details

  • Attendance 20%, Mid-term exam 30%, Assignments 20%, Course Project 30%
  • Attendance: Randomly checked. The score is calculated as 20 - 2 × (number of absences). Leave requests should be made at least 24 hours in advance with supporting documents, except for emergencies. Job interviews and internships are not valid reasons for leave.
  • Mid-term exam: 4.07 Sun (per the lecture schedule). In-class, open-book, without computer/phone/calculator.
  • Course project: Data Proposal and Presentation. Group of up to ?? people.
  • Grade in letters (e.g., A+, A-, ... ,D+, D, F). A- or above < 30% and B- or below > 10%.