code_copilot_web's Introduction

code_copilot_web

In this work, I use the Python code dataset from the Hugging Face course (https://huggingface.co/course/chapter7/6?fw=pt), which consists of code that uses "pandas", "sklearn", "matplotlib", and "seaborn".

Since the dataset is too large, I took only a small subset of it. Python code also contains spaces and \n (newlines), so we have two options.

First, we could write code that adds \n and spaces to the result whenever it finds ':', but that is still hard. Second, we could just tokenize \n and spaces, which spaCy seems able to do.

This time, I try using \n and space tokens to train the model (just to see if it can learn spacing and new lines too).
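
The second option can be sketched as below. This is only a minimal illustration, assuming a simple regex tokenizer from Python's standard library as a stand-in for spaCy; it is not the code used in this project, and `TOKEN_PATTERN`, `tokenize_with_whitespace`, and `detokenize` are hypothetical names.

```python
import re
from typing import List

# Hypothetical pattern (an assumption, not spaCy's rules). Alternatives, in order:
# a newline, a run of spaces, a run of word characters,
# a single punctuation character, any other whitespace (e.g. a tab).
TOKEN_PATTERN = re.compile(r"\n|[ ]+|\w+|[^\w\s]|\s")


def tokenize_with_whitespace(code: str) -> List[str]:
    """Split code into tokens, keeping spaces and newlines as tokens."""
    return TOKEN_PATTERN.findall(code)


def detokenize(tokens: List[str]) -> str:
    """Join tokens back together; whitespace is already part of the token list."""
    return "".join(tokens)


sample = "def f(x):\n    return x + 1\n"
tokens = tokenize_with_whitespace(sample)
print(tokens)
```

Because indentation and line breaks are ordinary tokens here, joining the tokens gives back the original text, so a model trained on such tokens has a chance to predict spacing and new lines as well.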

review

I wrote code so that when the user presses the spacebar or Enter, the user's text is sent to the model and a list of predicted outputs is shown on the right side.
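
The behavior described above could look roughly like the sketch below. Everything in it is an assumption for illustration: the key names in `TRIGGER_KEYS`, the helper names, and especially `fake_model`, which only returns fixed placeholder scores in place of the trained model.

```python
import heapq
from typing import Dict, List

# Assumed key names for the spacebar and the Enter key.
TRIGGER_KEYS = {" ", "Enter"}


def should_predict(key: str) -> bool:
    """Only ask the model for suggestions on spacebar or Enter."""
    return key in TRIGGER_KEYS


def fake_model(text: str) -> Dict[str, float]:
    """Placeholder for the trained model: maps candidate next tokens to scores."""
    if text.rstrip().endswith("import"):
        return {"pandas": 0.5, "numpy": 0.3, "matplotlib": 0.2}
    return {"import": 0.4, "def": 0.3, "\n": 0.1}


def top_k_suggestions(scores: Dict[str, float], k: int = 3) -> List[str]:
    """Return the k highest scoring tokens, best first."""
    return heapq.nlargest(k, scores, key=scores.get)


def on_key(key: str, text: str, k: int = 3) -> List[str]:
    """Return the suggestion list to display, or an empty list for other keys."""
    if not should_predict(key):
        return []
    return top_k_suggestions(fake_model(text), k)


print(on_key(" ", "import ", k=2))
```

Calling the model only on the spacebar or Enter avoids sending a request for every single keystroke.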

In summary

I think using spaCy en_core_web_sm may not be a good idea because the vocab size will be a lot bigger, or maybe we have to work more on preprocessing, such as cleaning the dataset.
We may clean the data first to improve the model.
Maybe I should save the model at the last epoch to compare it with the best weights.
I can't upload the model weights because the file is too large.
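
One possible preprocessing step for the vocab size problem is to drop rare tokens. The sketch below is only an assumption about how that might be done, not something tried in this project; `build_vocab`, `min_freq`, and the tiny `corpus` are hypothetical.

```python
from collections import Counter
from typing import Iterable, List, Sequence


def build_vocab(
    token_lists: Iterable[Sequence[str]],
    min_freq: int = 2,
    specials: Sequence[str] = ("<unk>", "<pad>"),
) -> List[str]:
    """Keep only tokens seen at least min_freq times; rarer ones fall back to <unk>."""
    counts = Counter(tok for toks in token_lists for tok in toks)
    vocab = list(specials)
    vocab += [tok for tok, count in counts.most_common() if count >= min_freq]
    return vocab


# Tiny made-up corpus, already tokenized with space and newline tokens.
corpus = [
    ["import", " ", "pandas", " ", "as", " ", "pd", "\n"],
    ["import", " ", "numpy", " ", "as", " ", "np", "\n"],
]

full_vocab = build_vocab(corpus, min_freq=1)
pruned_vocab = build_vocab(corpus, min_freq=2)
print(len(full_vocab), len(pruned_vocab))
```

Frequent tokens such as the space and "\n" survive the cut, while names that appear only once are mapped to `<unk>`, which keeps the vocab smaller.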

iforgeti / code_copilot_web Goto Github PK
