
comments-in-korean_dataset's Introduction

📜 Korean community comment Dataset

A dataset of roughly 15,000 comments parsed from DC Inside, a Korean online community.
Project Date 📆 2020-06-20

[Image: label_g]

About 18% of the comments in the dataset are malicious (measured on the 2-way labels).

1.Dataset_Class

Class            Description
Text             The original comment text.
Malignant index  The malignancy score, assigned a value from 0 to 2.

1-1.Malignant index

Malignant index  Description
0                A comment that causes no problem at all if posted.
1                A comment that uses no profanity but can still fairly be judged malicious.
2                A comment that uses profanity and is clearly malicious.

The labels are written for 3-way classification, but in training the dataset gave good accuracy when used for binary classification, so we recommend reprocessing the labels with the functions described below before use.

1-2.Rework malicious index(3-way -> 2-way)

There are two criteria for converting the malicious index to a binary classification label.

Low level (malicious index Value Change 1 -> 0, 2 -> 1)

def Row_rework_label(data):  # Binary classification (low level)
    # Relabels in place: 2 -> 1 (malicious), 1 -> 0 (not malicious).
    for idx, label in enumerate(data):
        if label == 2:
            data[idx] = 1
        elif label == 1:
            data[idx] = 0
    return data

This is the low-strictness conversion: comments with a malicious index of 1 are relabeled 0, so only index 2 counts as malicious.

High level (malicious index Value Change 1 -> 1, 2 -> 1)

def High_rework_label(data):  # Binary classification, merged (high level)
    # Relabels in place: 2 -> 1. Label 1 is kept, so 1 and 2 both count as malicious.
    for idx, label in enumerate(data):
        if label == 2:
            data[idx] = 1
    return data

This is the high-strictness conversion: comments with a malicious index of 1 keep the label 1 and index 2 is relabeled 1, so both count as malicious.
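The two conversions differ only in where the cut-off sits, so they can be sketched as one threshold check. The example below is a minimal illustration that follows what the two functions above do in code. The name `to_binary` and its `strict` parameter are hypothetical and not part of the original repository. Unlike the originals, it returns a new list and leaves its input untouched.

```python
def to_binary(labels, strict=False):
    """Map 0/1/2 malicious-index labels to 0/1 without modifying the input.

    strict=False (low level):  only label 2 counts as malicious.
    strict=True  (high level): labels 1 and 2 both count as malicious.
    """
    threshold = 1 if strict else 2
    return [1 if label >= threshold else 0 for label in labels]


sample = [0, 1, 2, 1, 0, 2]
low = to_binary(sample)                # low-level labels
high = to_binary(sample, strict=True)  # high-level labels
print(low)
print(high)
```

Because the input is not changed, both label sets can be derived from the same original array and compared side by side.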

Dataset Load and Rework malicious index

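import pandas as pd

dataset_csv = pd.read_csv('DCcomment.csv', names=['Text', 'label'])
X, Y = dataset_csv['Text'].values, dataset_csv['label'].values
# Uncomment exactly one of the following lines to convert the labels to binary:
#Y = High_rework_label(Y)
#Y = Row_rework_label(Y)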
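Since both rework functions modify the array they receive, it can help to copy the labels first and to check how many comments end up malicious under each scheme. The sketch below does this on a tiny in-memory stand-in for `DCcomment.csv`. The four sample rows are made up for illustration, and the two-column, header-less layout is an assumption based on the `names=` argument above.

```python
import io

import pandas as pd

# Hypothetical stand-in for DCcomment.csv: two columns, no header row.
csv_text = "comment a,0\ncomment b,1\ncomment c,2\ncomment d,0\n"
df = pd.read_csv(io.StringIO(csv_text), names=["Text", "label"])

X = df["Text"].to_numpy()
Y = df["label"].to_numpy(copy=True)  # copy, so relabeling never alters df

Y_low = (Y == 2).astype(int)   # low level: only label 2 is malicious
Y_high = (Y >= 1).astype(int)  # high level: labels 1 and 2 are malicious

low_share = Y_low.mean()    # fraction of malicious comments, low level
high_share = Y_high.mean()  # fraction of malicious comments, high level
print(f"low: {low_share:.2%}, high: {high_share:.2%}")
```

Running the same check on the real file should show which of the two schemes corresponds to the roughly 18% figure quoted for the 2-way labels.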
