LectureBank: a corpus for NLP Education and Prerequisite Chain Learning

This is the GitHub page for our paper "What Should I Learn First: Introducing LectureBank for NLP Education and Prerequisite Chain Learning", published in the proceedings of AAAI 2019.

Code for replicating the results will be uploaded soon. Stay tuned.

An example of prerequisite relations from lecture slides depicted as a directed graph

The files in this repository are described below:

LectureBank Dataset

The LectureBank Dataset is a manually collected dataset of lecture slides. We collected 1352 online lecture files from 60 courses covering 5 domains: Natural Language Processing (nlp), Machine Learning (ml), Artificial Intelligence (ai), Deep Learning (dl) and Information Retrieval (ir). In addition, we release annotations that map each slide file to the taxonomy described below. We also provide a vocabulary list of 1221 terms extracted from the corpus.

lecturebank.tsv

Each line identifies a lecture file. Format:

(ID, Title, URL, Topic_ID, Year, Author, Domain, Venue)

  • ID: ID of each line.
  • Title: File title.
  • URL: Online URL.
  • Topic_ID: Topic ID assigned from the taxonomy, referring to topics in taxonomy.tsv.
  • Year: Year of the course.
  • Author: The author name(s).
  • Domain: The domain (nlp, ir, dl, ml, ai).
  • Venue: Name of the university, or GitHub.
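As a minimal sketch of how the fields above could be read, the following parses tab-separated rows into dictionaries and counts files per Domain. The sample rows are made up for illustration, and the function name read_lecturebank is hypothetical. Whether the real file starts with a header row is an assumption, so the sketch simply skips a first row whose ID field is the literal text "ID".

```python
import csv
import io
from collections import Counter

# Column names taken from the format line above.
FIELDS = ["ID", "Title", "URL", "Topic_ID", "Year", "Author", "Domain", "Venue"]


def read_lecturebank(handle):
    """Parse tab-separated lines into dicts keyed by FIELDS (hypothetical helper)."""
    # QUOTE_NONE keeps quotation marks inside titles as ordinary characters.
    reader = csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    rows = []
    for raw in reader:
        # Skip blank lines and a header row, if the file has one (assumption).
        if not raw or raw[0] == "ID":
            continue
        rows.append(dict(zip(FIELDS, raw)))
    return rows


# Made-up sample rows, for illustration only.
SAMPLE = (
    "1\tIntro to Parsing\thttp://example.org/parsing.pdf\t12\t2017\tA. Author\tnlp\tExample University\n"
    "2\tRanking Basics\thttp://example.org/ranking.pdf\t40\t2016\tB. Author\tir\tGitHub\n"
)

rows = read_lecturebank(io.StringIO(SAMPLE))
per_domain = Counter(row["Domain"] for row in rows)
print(per_domain)

# For the real file, something like:
# with open("lecturebank.tsv", newline="", encoding="utf-8") as f:
#     rows = read_lecturebank(f)
```

All values are kept as strings; convert Year or Topic_ID to integers only after checking that every row has a numeric value there.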

download_all.py

This script downloads the resources from the URLs listed in lecturebank.tsv. After it runs, all resources are saved in the data_lecturebank/ folder (change base_path to use a different location), organized by Domain (for example, nlp, ir). The code is written in Python 3, and you will need to install wget to run it. Run it with: python3 download_all.py. Downloading all the resources may take up to an hour.

Some of the URLs may be broken because their owners have changed the links.
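The folder layout and the handling of broken links can be sketched as follows. This is not the code of download_all.py: it uses the standard library's urlretrieve rather than wget, and the names target_path and download_rows are hypothetical. It also assumes that the last path segment of each URL is a usable file name, which may not hold for every link.

```python
import os
from urllib.parse import urlparse
from urllib.request import urlretrieve


def target_path(base_path, domain, url):
    """Return base_path/<domain>/<file name taken from the URL> (assumed layout)."""
    name = os.path.basename(urlparse(url).path) or "index.html"
    return os.path.join(base_path, domain, name)


def download_rows(rows, base_path="data_lecturebank", fetch=urlretrieve):
    """Download (domain, url) pairs and return the ones that failed.

    fetch is any callable taking (url, destination); it is a parameter so
    the function can be tried without network access.
    """
    failed = []
    for domain, url in rows:
        dest = target_path(base_path, domain, url)
        os.makedirs(os.path.dirname(dest), exist_ok=True)
        try:
            fetch(url, dest)
        except OSError as err:
            # urllib's URLError and HTTPError are OSError subclasses, so a
            # broken link is recorded here and the loop continues.
            failed.append((url, str(err)))
    return failed
```

Returning the failed URLs instead of stopping at the first error makes it possible to list the broken links after a full run.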

taxonomy.tsv

Contains the taxonomy topics and their corresponding IDs, which are referenced by lecturebank.tsv.

208topics.csv

Contains the 208 topics that we annotated. Format:

(ID, Topic, Wiki_Page_URL)

prerequisite_annotation.csv

Contains the prerequisite chain annotation for each possible pair from the 208 topics. Format:

(Source_Topic_ID, Target_Topic_ID, If_prerequisite)
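The triples above can be turned into a directed graph, as a rough sketch. Two things are assumptions here and should be checked against the file: that a positive pair is marked with the value 1 in If_prerequisite, and that an edge should point from Source_Topic_ID to Target_Topic_ID. The function names and the sample rows are hypothetical.

```python
import csv
import io
from collections import defaultdict


def build_graph(handle, positive="1"):
    """Map each source topic ID to the set of target IDs labeled positive."""
    graph = defaultdict(set)
    for row in csv.reader(handle):
        if len(row) != 3:
            continue  # skip blank or malformed lines
        source, target, label = (cell.strip() for cell in row)
        # A header row, if present, has a non-matching label and is skipped.
        if label == positive:
            graph[source].add(target)
    return graph


def reachable(graph, start):
    """Return every topic ID reachable from start by following edges."""
    seen = set()
    stack = [start]
    while stack:
        node = stack.pop()
        for nxt in graph.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen


# Made-up sample rows, for illustration only.
SAMPLE = "1,2,1\n2,3,1\n1,3,0\n3,1,0\n"

graph = build_graph(io.StringIO(SAMPLE))
print(sorted(reachable(graph, "1")))
```

Following edges transitively gives a prerequisite chain even when a direct pair, such as (1, 3) in the sample, is not labeled positive.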

vocabulary.txt

Contains 1221 vocabulary terms combined from the taxonomy, the 208 topics, and terms extracted from LectureBank.
