
dl_code_completion_paper's People

Contributors

agfeather, masuhar


dl_code_completion_paper's Issues

typo

  • a. we hereafter call predictions -> we hereafter call them predictions
  • b. 4.3 NT2V -> TT2V

Differences from the POPL'19 paper (Reviewer B)

The following recent paper proposes a
similar technique for representing ASTs.
Uri Alon, Meital Zilberstein, Omer Levy, and Eran Yahav. 2019.
code2vec: learning distributed representations of code. Proc. ACM
Program. Lang. 3, POPL, Article 40 (January 2019), 29 pages. DOI:
https://doi.org/10.1145/3290353
Please add a discussion comparing ASTToken2Vec with this study:
differences, potential issues, and so on.

Improve the positioning and explanation of Figure 1

d. I do not think that Figure 1 shows the ASTToken2Vec–LSTM
integration model. Is the current token always given from the code
repository? Is a test set always needed in real usage? This figure
might be showing both the model and the experimental design.
I recommend revising it.

Add an explanation of Figure 4

c. An explanation of Figure 4 must be added. Which module is the
input layer and which is the output layer?

Discussion of token similarity; AST structural information appears in existing work

(Moreover, the use case shown in Figure 8 hinted at the
characteristics of the two fields “pageX” and “pageY”.
However, I cannot figure out the reasoning behind the idea of
treating these token types distinctly.)

  • Why do the authors consider that incorporating information on AST token types into the construction of the LSTM models helps predict the next AST token?
  • How does token similarity in the code-completion context differ between conventional LSTM models and the proposed one?
  • Is it quite hard to discuss this for deep-learning-based code completion?
  • Additionally, the paper should present the authors' motivation, together with a complementary example, in an early section (before describing the details of the models).
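One way to answer the question of why the two token kinds are treated distinctly is with a toy sketch. The snippet below is an illustration under our own naming (the vocabularies, table names, and `embed` helper are hypothetical, not the paper's implementation): terminal and non-terminal AST tokens live in separate vocabularies, so each kind gets its own embedding table before any sequence model sees them.

```python
# Illustrative sketch only: separate embedding tables for terminal and
# non-terminal AST tokens. Random vectors stand in for learned embeddings.
import random

random.seed(0)
EMBED_DIM = 4

def make_embedding(vocab):
    # One vector per token; in a real model this table would be learned.
    return {tok: [random.uniform(-1.0, 1.0) for _ in range(EMBED_DIM)]
            for tok in vocab}

# Hypothetical vocabularies: AST node kinds vs. identifier values.
non_terminals = ["Program", "CallExpression", "MemberExpression"]
terminals = ["pageX", "pageY", "console"]

nt_table = make_embedding(non_terminals)  # NT2V-style table
t_table = make_embedding(terminals)       # TT2V-style table

def embed(token, kind):
    # Pick the table that matches the token's AST type, so the same
    # spelling could receive different vectors in different roles.
    table = nt_table if kind == "non-terminal" else t_table
    return table[token]

vec = embed("pageX", "terminal")
print(len(vec))  # prints 4
```

Because the vocabularies are disjoint, similarity is only ever computed among tokens of the same kind, which may be what lets values such as “pageX” and “pageY” end up close to each other.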

Inquiry from the editor (1st round)

From: Daisaku Yokoyama [email protected]
Subject: Inquiry regarding manuscript 19-CONF-05 (1st round)
To: [email protected]
Cc: YOKOYAMADAISAKU [email protected]
Date: Tue, 12 Nov 2019 22:33:27 +0900

Dear Dongfang Li and Hidehiko Masuhara,

We hope this message finds you well.

Regarding the paper you submitted to the JSSST journal Computer
Software,

Paper number: 19-CONF-05
Authors: Dongfang Li, Hidehiko Masuhara
Title: ASTToken2Vec: an Embedding Method for Neural Code Completion

the reviewers have returned an interim report with a decision of
“inquiry”. Please send your responses to the attached inquiry items
to the handling editor below by

three months from now (February 11, 2020).

If you revise the manuscript, please send the latest revised version
(clearly indicating the changed parts in your response letter); if
you do not revise it, please send the originally submitted
manuscript. Please send the response letter in plain text or PDF
format, and the manuscript in PDF format.

Sincerely,

November 12, 2019

Editorial Committee, Japan Society for Software Science and Technology
Daisaku Yokoyama
Department of Computer Science, School of Science and Technology, Meiji University
[email protected]

========================================

Reviewer A

First-round review result
Decision: Inquiry

Summary:
This paper proposes ASTToken2Vec, which exploits information on the
types of AST tokens (terminal and non-terminal) to improve the
success rate of LSTM-based code completion. The experimental results
with 150,000 JavaScript files show that the implementation with the
ASTToken2Vec embedding is superior to the one without it.

Evaluation:
I found this paper interesting. It has a scientific contribution
that demonstrates the improvement of existing LSTM-based code
completion. The idea of introducing AST token types is feasible.

With respect to the following two points, solid answers, and
revisions based on those answers, are required for acceptance of
the paper.

  1. Although the paper devotes much space to describing the proposed
    ASTToken2Vec model, it is not easy for me (and many readers) to
    understand. I know that an AST consists of terminal and
    non-terminal tokens. Moreover, the use case shown in Figure 8
    hinted at the characteristics of the two fields “pageX” and
    “pageY”. However, I cannot figure out the reasoning behind the (→ #6)
    idea of treating these types distinctly. Why do the authors
    consider that incorporating information on AST token types into
    the construction of the LSTM models helps predict the next AST
    token? How does token similarity in the code-completion context
    differ between conventional LSTM models and the proposed one? Is
    it quite hard to discuss this for deep-learning-based code
    completion? At least, I think that the paper should provide
    further discussion of the experimental results. Additionally, the
    paper should present the authors' motivation, together with a
    complementary example, in an early section (before describing the
    details of the models).

  2. I wonder whether the experimental results truly show an (→ #7)
    improvement in prediction accuracy. Surely, the accuracy of the
    proposed method is 1.5 or 3.1 percentage points better, but does
    this claim always hold? Do the results depend on the values of
    the several parameters that the authors chose? The “Threats to
    Validity” section must provide enough information on the factors
    that affect the experimental results.

Minor comments:
a. we hereafter call predictions -> we hereafter call them predictions (#8)
b. 4.3 NT2V -> TT2V (#8)
c. An explanation of Figure 4 must be added. Which module is the
input layer and which is the output layer? (#9)
d. I do not think that Figure 1 shows the ASTToken2Vec–LSTM (#10)
integration model. Is the current token always given from the code
repository? Is a test set always needed in real usage? This figure
might be showing both the model and the experimental design.
I recommend revising it.
e. The Japanese title might not reflect the contents of the paper. (#11)

========================================

Reviewer B

First-round review result
Decision: Accept (with conditional-acceptance comments)

This paper proposes ASTToken2Vec, a technique for neural-network-based
code completion with a vector representation of program tokens in
abstract syntax trees. An experimental comparison with 150,000
JavaScript program files shows that ASTToken2Vec outperforms the
baseline of an LSTM model.

Strengths:

  • The proposal is technically reasonable and sound.
  • The experiment is conducted with a publicly available dataset.
  • The proposal is applied to large-scale data.
  • The paper is easy to read.

Weaknesses:

  • The performance improvement is not very big.
  • From the current evaluation, it is not clear how each structure, (#12)
    such as NT2V and TT2V, contributes to the improvement of the
    final performance.

Considering the above strengths, I consider that the paper offers
enough contributions to the community in terms of technical
originality, effectiveness, and readability.

《Comments as conditions for acceptance》
I have one comment for revision. The following recent paper proposes a (#13)
similar technique for representing ASTs.
Uri Alon, Meital Zilberstein, Omer Levy, and Eran Yahav. 2019.
code2vec: learning distributed representations of code. Proc. ACM
Program. Lang. 3, POPL, Article 40 (January 2019), 29 pages. DOI:
https://doi.org/10.1145/3290353
Please add a discussion comparing ASTToken2Vec with this study:
differences, potential issues, and so on.

《Other suggestions for improving the paper》
Minor: Figures 6 and 7 are too small and difficult to read. Please (#14)
enlarge these figures.

===============

Reliability of the experimental results (Reviewer A, point 2)

I wonder whether the experimental results truly show an improvement
in prediction accuracy. Surely, the accuracy of the proposed method
is 1.5 or 3.1 percentage points better, but does this claim always
hold? Do the results depend on the values of the several parameters
that the authors chose? The “Threats to Validity” section must
provide enough information on the factors that affect the
experimental results.

Comments by Paul

  • A roadmap for the paper would be welcome. For me, it was not easy to tell where the proposal of this paper starts.
  • A section without introductory text (e.g., “Experiments [NO TEXT] 6.1 Dataset”) confuses me (and I lose motivation) because I cannot tell why I am reading it.
  • Figure 3: some labels are vertical and others horizontal; it looks strange.
  • Grayscale for all figures would be welcome for “poor reviewers without a color printer”.
  • I would recommend changing the title “2 Background” to something more specific.
  • The title footnote (I guess) contains Japanese text.
  • “Tung et al. [9] extends” -> without the “s”?
  • Use top ([t]) placement for figures?
  • Should Section 3 be a subsection of Section 2?

change style file for journal submission

Currently the manuscript is formatted for the conference
proceedings; it should be reformatted in the journal style. This can
be done by just changing a class option.
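For illustration only, the switch would look something like the following in the preamble. The class name `somestyle` and the option names `conference`/`journal` are hypothetical stand-ins; use whatever the journal's actual style file defines.

```latex
% Hypothetical sketch: same document class, with the layout option
% changed from the conference-proceedings format to the journal format.
% \documentclass[conference]{somestyle}   % before (proceedings)
\documentclass[journal]{somestyle}        % after (journal)
```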
