josepablocam / changestructor
Demo repo
License: MIT License
Our question proposal is based on a ranking loss: we want the dialogue for a given change to be more similar to that change than to randomly sampled historical changes.
Can we show this empirically?
If we look at high-quality versus low-quality repositories, is it true that commit messages have this property? Can we compare these two repository quality groups? @citostyle
What are good question templates for commit logs? Can we mine them from existing high-quality repositories? If not systematically, then at least manually/qualitatively?
We currently have very simple templates (https://github.com/josepablocam/changestructor/blob/master/chg/annotator/template_annotator.py#L129)
so that we don't have to run `git add`
(we just look at the changes), make no commit, and write nothing to the DB. This is exclusively for ease of development.
I've implemented some basic embedding and question ranking functionality.
https://github.com/josepablocam/changestructor/blob/master/chg/embed/basic.py
uses Microsoft's CodeBERT model to embed code changes (`BasicEmbedder.embed_code`), stand-alone natural language questions (`BasicEmbedder.embed_nl`), and dialogue, i.e. a sequence of questions and answers (`BasicEmbedder.embed_dialogue`).
CodeBERT has a maximum input length (in terms of tokens), so to handle arbitrary length inputs we chunk up the input into maximum size parts, embed each of them separately, and then average those embeddings.
To produce an embedding for a chunk, we take the hidden state after each token in the sequence is consumed and average those hidden states dimension-wise.
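The chunking-and-averaging scheme above can be sketched as follows. This is a minimal illustration, not the repo's actual code: `embed_chunk`, `mean_pool`, and `embed_long_input` are hypothetical names, and `embed_chunk` stands in for a single CodeBERT forward pass.

```python
import numpy as np

def mean_pool(hidden_states):
    """Average per-token hidden states (seq_len, dim) -> a single (dim,) vector."""
    return np.mean(hidden_states, axis=0)

def embed_long_input(token_ids, embed_chunk, max_len=512):
    """Handle inputs longer than the model's token limit: split into
    max_len-sized chunks, embed each chunk separately, then average
    the per-chunk embeddings into one vector."""
    chunks = [token_ids[i:i + max_len] for i in range(0, len(token_ids), max_len)]
    chunk_vecs = [embed_chunk(chunk) for chunk in chunks]  # each is a (dim,) vector
    return np.mean(chunk_vecs, axis=0)
```

Averaging chunk embeddings loses ordering information across chunks, but it keeps the output dimension fixed regardless of input length.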
https://github.com/josepablocam/changestructor/blob/master/chg/ranker/model_based_ranking.py
Given a dialogue, code changes, and a set of randomly sampled code changes, we compute the following ranking loss
(https://github.com/josepablocam/changestructor/blob/master/chg/ranker/model_based_ranking.py#L73)
code_vec <- embed_code(code_changes)
diag_vec <- embed_dialogue(dialogue)
# negative code in that this dialogue is *not* associated with these changes
neg_code_vecs <- [embed_code(neg_code) for neg_code in random_sample_code_changes]
# compute for each neg_code_vec
max(0, delta - sim(code_vec, diag_vec) + sim(neg_code_vec, diag_vec))
# and then average those individual losses to get a mean loss
We will refer to this as our ranking loss. The intuition is that we want the dialogue associated with our current code change to be more similar to that change than to a set of randomly sampled negative changes.
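A minimal sketch of this hinge-style ranking loss, assuming `sim` is cosine similarity (the pseudocode above does not pin down the similarity function, so that choice is an assumption here):

```python
import numpy as np

def cosine_sim(a, b):
    """Cosine similarity between two vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def ranking_loss(code_vec, diag_vec, neg_code_vecs, delta=0.1):
    """Penalize each negative change that is not at least `delta` less
    similar to the dialogue than the true change, then average."""
    pos_sim = cosine_sim(code_vec, diag_vec)
    losses = [max(0.0, delta - pos_sim + cosine_sim(neg, diag_vec))
              for neg in neg_code_vecs]
    return float(np.mean(losses))
```

The loss is zero exactly when every negative change is at least `delta` less similar to the dialogue than the true change.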
We implement a simple model to predict this loss, given the code change, the dialogue history for that code change up to that point, and a set of randomly sampled code changes.
https://github.com/josepablocam/changestructor/blob/master/chg/ranker/model_based_ranking.py#L21
This model uses random forests and predicts the ranking loss as a function of the feature vector x defined below
code_vec <- embed_code(code_changes)
hist_vec <- embed_dialogue(dialogue_up_to_now)
neg_code_vecs <- [embed_code(neg_code) for neg_code in random_sample_code_changes]
https://github.com/josepablocam/changestructor/blob/master/chg/ranker/model_based_ranking.py#L118
context_vec <- concat(code_vec, hist_vec, neg_code_vecs)
question_vec <- embed_nl(proposed_question)
x <- concat(context_vec, question_vec)
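The feature construction above amounts to a pair of concatenations; a sketch, with `build_feature_vector` as a hypothetical helper name:

```python
import numpy as np

def build_feature_vector(code_vec, hist_vec, neg_code_vecs, question_vec):
    """Concatenate the context (code change, dialogue history so far, and
    negative samples) with the embedded candidate question to form the
    regression input x."""
    context_vec = np.concatenate([code_vec, hist_vec] + list(neg_code_vecs))
    return np.concatenate([context_vec, question_vec])
```

Note this fixes the number of negative samples, since the regressor expects a fixed-length input.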
To choose a question, we pick the question in our candidate collection that maximizes the expected improvement over the current ranking loss.
https://github.com/josepablocam/changestructor/blob/master/chg/ranker/model_based_ranking.py#L31
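The selection step can be sketched as an argmax over predicted improvement. `choose_question`, `predict_loss`, and `build_x` are hypothetical names standing in for the fitted regressor and the feature construction, not the repo's actual API:

```python
import numpy as np

def choose_question(candidates, current_loss, predict_loss, build_x):
    """Pick the candidate question whose predicted post-answer ranking loss
    gives the largest expected improvement over the current loss."""
    improvements = [current_loss - predict_loss(build_x(q)) for q in candidates]
    return candidates[int(np.argmax(improvements))]
```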
The model accumulates the user's dialogue and uses it to refit the regression model (https://github.com/josepablocam/changestructor/blob/master/chg/ranker/model_based_ranking.py#L194). We also fit an initial version offline using the existing git history (https://github.com/josepablocam/changestructor/blob/master/chg/ranker/model_based_ranking.py#L213)
This problem feels connected to both RL and Bayesian optimization. For the former, we have a stateful sequential process where the questions and answers affect an eventual final ranking loss. For the latter, we have an "expensive function" to evaluate (i.e., the user's time spent writing an answer to each question). I would love to see if there is a better setup than what I've proposed in this implementation, but I don't know enough about either area to see it myself, so I'm very open to reformulations/reimplementations.
Currently this can only be run using Docker, though it should work normally. I think AFS may be part of the issue.
To produce questions, we append "entities" to the templates we currently have. These entities are, e.g., variable names and function names drawn from relevant contexts in the code change or extracted from the user's prior dialogue answers.
We should extend/refine these.
We also need to filter out entities that don't make sense (e.g. simple pronouns).
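A minimal sketch of such a filter, assuming a simple stopword-style blocklist (the pronoun set and the `filter_entities` name are illustrative, not the repo's implementation):

```python
# Hypothetical blocklist: pronouns and similar tokens that make poor entities.
PRONOUNS = {"it", "this", "that", "they", "he", "she", "we", "you", "i"}

def filter_entities(entities):
    """Keep only entities that look meaningful: not pronouns and not
    single characters."""
    return [e for e in entities if e.lower() not in PRONOUNS and len(e) > 1]
```

A fuller version might also apply part-of-speech tagging or check that the entity actually appears in the diff.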
We proposed a user study:
Take a large/serious SE project, and a given code change in that project.
Group A1: annotates the change using changestructor's dialogue system
Group A2: annotates the change using changestructor but with randomized proposed question order
Separately we automatically generate a commit message using an existing generation system
First eval: qualitative experience of A1 vs A2, qualitative characterization of answers (e.g. length, time, etc.)
Group B1: review automated message vs changestructor dialogue, answer understandability/quality questions
Group B2: review automated message vs changestructor random order, answer understandability/quality questions
Group B3: changestructor vs changestructor random order, answer understandability/quality questions
There are a ton of questions to answer and properly structure here, but this is the gist of what we thought about. We also probably need to refine/limit scope and be realistic about time expectations for participants, etc. @citostyle