cs777-term-paper's Introduction

1. Set up environment

1.1 python

pwd
python3 --version
python3 -m venv py_env
source py_env/bin/activate

1.2 airflow

1.2.1

pip install 'apache-airflow==2.8.3' \
 --constraint "https://raw.githubusercontent.com/apache/airflow/constraints-2.8.3/constraints-3.8.txt"
curl -LfO 'https://airflow.apache.org/docs/apache-airflow/2.8.3/docker-compose.yaml'
mkdir -p ./dags ./logs ./plugins ./config
docker compose up airflow-init

1.2.2

In docker-compose.yaml, replace the image: line with the custom image below, and add ports: 5432:5432 to the postgres service to map the container's internal port 5432 to port 5432 on the host machine.

# image: ${AIRFLOW_IMAGE_NAME:-apache/airflow:2.8.3}
image: ${AIRFLOW_IMAGE_NAME:-extending_airflow:latest}

  postgres:
    image: postgres:13
    environment:
      POSTGRES_USER: airflow
      POSTGRES_PASSWORD: airflow
      POSTGRES_DB: airflow
    volumes:
      - postgres-db-volume:/var/lib/postgresql/data
    ports:
      - 5432:5432
    healthcheck:
      test: [ "CMD", "pg_isready", "-U", "airflow" ]
      interval: 10s
      retries: 5
      start_period: 5s
    restart: always

Dockerfile

FROM apache/airflow:2.8.3
COPY requirements.txt /requirements.txt
RUN pip install --user --upgrade pip
RUN pip install --no-cache-dir --user -r /requirements.txt

requirements.txt

pandas==2.0.3
yfinance==0.2.31
psycopg2-binary
scikit-learn==1.3.2
numpy==1.23.5
keras==2.13.1
tensorflow==2.13.1

docker build . --tag extending_airflow:latest

This command builds a Docker image named extending_airflow and tags it latest. Docker reads the Dockerfile in the current directory and executes its instructions (FROM, COPY, RUN, and so on) in order, producing an image with the required environment and dependencies. The latest tag marks the image as the most recent version, and it matches the image name set in docker-compose.yaml.

1.2.3

docker compose up -d

Image 1

docker ps

Image 2

1.3 minio

mkdir -p ~/minio/data

docker run \
   -p 9000:9000 \
   -p 9001:9001 \
   --name minio \
   -v ~/minio/data:/data \
   -e "MINIO_ROOT_USER=ROOTNAME" \
   -e "MINIO_ROOT_PASSWORD=CHANGEME123" \
   quay.io/minio/minio server /data --console-address ":9001"

Image 3

1.4 create connection using Airflow UI

Image 6

1.4.1 postgres connection
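
The connection fields can be read off the postgres service in docker-compose.yaml (user airflow, password airflow, database airflow, port 5432). As a minimal sketch, the same values can be assembled into a connection URI. The helper name `build_postgres_uri` and the host value `postgres` (the compose service name) are assumptions for illustration, not taken from the project code.

```python
from urllib.parse import quote


def build_postgres_uri(user, password, host, port, database):
    """Assemble a postgres connection URI, percent-encoding the credentials."""
    return (
        f"postgres://{quote(user, safe='')}:{quote(password, safe='')}"
        f"@{host}:{port}/{quote(database, safe='')}"
    )


# Credentials, port and database come from docker-compose.yaml.
# The host name "postgres" is an assumption (the compose service name).
uri = build_postgres_uri("airflow", "airflow", "postgres", 5432, "airflow")
print(uri)
```

Airflow can also read a URI of this form from an environment variable named AIRFLOW_CONN_<CONN_ID>, which is an alternative to filling in the UI form.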

1.4.2 minio connection

2. Run

Open the Airflow UI at http://localhost:8080/ and go to the DAGs page.

Image 5

Then trigger the DAG to run it.

3. DAG

Image 4

task1 >> task2 >> model_sensor >> [task3, task4, task5] >> task6

task1 - query_stock_info:

Connect to the local database -> create the table if it does not exist -> fetch the stock data and save it in the local postgres database (table name: stock_info)

task2 - processed_stock_data:

Save the projection of the columns ['Stock_Name', 'Date', 'Open', 'Close'] to the table 'processed_stock_data'

model_sensor - check_models_in_minio:

Check whether the models are in minio. If they are not, wait until they have been saved there. The wait is controlled by the sensor's timeout and poke_interval settings.
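
The polling behaviour described above can be sketched in plain Python. This is not the Airflow sensor class used in the DAG, only an illustration of how poke_interval and timeout interact. The names `wait_for_models` and `fake_check` are hypothetical, and the real check would query minio instead of counting calls.

```python
import time


def wait_for_models(models_exist, poke_interval=1.0, timeout=5.0):
    """Call models_exist() every poke_interval seconds until it returns True.

    Raises TimeoutError if another wait would run past the timeout.
    """
    deadline = time.monotonic() + timeout
    while True:
        if models_exist():
            return True
        if time.monotonic() + poke_interval > deadline:
            raise TimeoutError("models were not found before the timeout")
        time.sleep(poke_interval)


# Stand-in for the minio check: succeeds on the third poke.
calls = {"count": 0}


def fake_check():
    calls["count"] += 1
    return calls["count"] >= 3


result = wait_for_models(fake_check, poke_interval=0.01, timeout=1.0)
print(result, calls["count"])
```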

task3, 4, 5 - 3 models:

Load models from minio

Each model outputs 1 or 0: 1 means a good stock, 0 means a bad one.

task6 - make_investment_decision:

Make investment decision based on Task 3,4,5

The final result is a good stock only if at least two of the three models predict a good stock.
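
The voting rule for task6 can be written as a small function. This is a minimal sketch of the rule as described, not the project's actual task code. The function reuses the task name, but its signature and the `min_votes` parameter are assumptions.

```python
def make_investment_decision(predictions, min_votes=2):
    """Return 1 (good stock) if at least min_votes predictions are 1, else 0."""
    good_votes = sum(1 for p in predictions if p == 1)
    return 1 if good_votes >= min_votes else 0


# Predictions from task3, task4 and task5, in that order.
print(make_investment_decision([1, 1, 0]))
print(make_investment_decision([1, 0, 0]))
```

With three models, a threshold of two is a simple majority, so one dissenting model cannot flip the decision.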

4. Dataset and Results

4.1 dataset

Image 9

Our program downloads the data for the specified stocks from Yahoo Finance. The screenshot above shows a sample of the dataset. The columns are:

stock_name: ticker symbol on Yahoo, e.g. Apple -> AAPL, Nvidia -> NVDA

date: date of the stock

open: open price is the price at which a particular stock starts trading when the stock market opens for the day

high: the highest price of the stock in a day

low: the lowest price of the stock in a day

close: close price is the final price at which a particular stock is traded on a given trading day

volume: the total number of shares or contracts traded

adj_close: adjusted close price is the closing price adjusted for corporate actions such as stock splits and dividends

short_ma: short-term moving average of a financial metric

long_ma: long-term moving average of a financial metric
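
The two moving-average columns can be illustrated with a short sketch. The source does not state which metric or window lengths are used, so this assumes a simple moving average over closing prices. The window sizes 2 and 4 and the sample prices are purely illustrative.

```python
def moving_average(values, window):
    """Simple moving average; positions without a full window are None."""
    if window <= 0:
        raise ValueError("window must be positive")
    averages = []
    running_sum = 0.0
    for i, value in enumerate(values):
        running_sum += value
        if i >= window:
            # Drop the value that just left the window.
            running_sum -= values[i - window]
        averages.append(running_sum / window if i >= window - 1 else None)
    return averages


# Made-up closing prices, not taken from the dataset.
closes = [10.0, 11.0, 12.0, 13.0, 14.0]
short_ma = moving_average(closes, window=2)
long_ma = moving_average(closes, window=4)
print(short_ma)
print(long_ma)
```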

4.2 Results

SELECT psd.date,
       psd.stock_name,
       psd.open,
       psd.close,
       sp.prediction,
       (psd.close - psd.open) AS earning
FROM processed_stock_data psd
JOIN stock_prediction sp ON psd.date = sp.date AND psd.stock_name = sp.stock_name
WHERE psd.stock_name = 'NVDA';

You can use this SQL query to see the results for a stock. Change the stock_name in the WHERE clause to query another stock.

Below are some examples of our results for NVDA, GOOGL, and AMZN.

4.2.1 NVDA

Image 10

3 correct, 6 wrong predictions

4.2.2 GOOGL

Image 11

6 correct, 3 wrong predictions

4.2.3 AMZN

Image 12

5 correct, 4 wrong predictions
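
The correct and wrong counts above can be tallied from the query output. The source does not define what counts as a correct prediction, so the sketch below assumes that prediction 1 is correct when earning (close - open) is positive, and prediction 0 is correct otherwise. The function name `count_correct` is hypothetical and the sample rows are made up, not the project's results.

```python
def count_correct(rows):
    """Count (correct, wrong) predictions from (prediction, earning) pairs.

    Assumption: prediction 1 is correct when earning > 0,
    and prediction 0 is correct when earning <= 0.
    """
    correct = sum(
        1 for prediction, earning in rows if (earning > 0) == (prediction == 1)
    )
    return correct, len(rows) - correct


# Made-up (prediction, earning) pairs for illustration only.
sample_rows = [(1, 2.5), (1, -1.0), (0, -0.3), (0, 4.0)]
print(count_correct(sample_rows))
```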

5. Run on Google Cloud

5.1 create composer(airflow on GC)

https://console.cloud.google.com/composer/

Image 13

Create a Composer 2 environment.

Image 14

Image 15

You can add Python packages from PyPI in the environment settings. Composer then installs them for you automatically.

5.2 create postgres database

Image 16

You need to add your Airflow IP address to the database's authorized networks here.

Image 17

Image 18

Then create a connection in the Airflow UI using the public IP address of the postgres database.

5.3 Google Cloud Storage

Image 19

Allow public access to the bucket here, and save the models in the bucket.

gcs_hook = GoogleCloudStorageHook(google_cloud_conn_id)
# Download the .h5 model file from Google Cloud Storage to /tmp, then load the model from that file
model_stream = gcs_hook.download(bucket_name=bucket_name, object_name=model_key, filename=f'/tmp/{model_key}')

The models must be saved in the bucket because the DAG uses GoogleCloudStorageHook to download them.

5.4 Composer Dashboard

Logs:

Image 20

Dags:

Image 21

Monitor:

Image 22
