delihiros / pseudogen Goto Github PK

149.0 13.0 40.0 18 KB

A tool to automatically generate pseudo-code from source code.

License: Apache License 2.0

Shell 32.67% Python 63.65% Dockerfile 3.67%

pseudogen's Introduction

Pseudogen

A tool to automatically generate pseudo-code from source code.

Demo

Installation

Using Docker

Docker is all you need.

  docker attach `docker run -itd delihiros/pseudogen`
  /# cd pseudogen/data
  /# ../run-pseudogen.sh -f tune/travatar.ini

Requirements

Requires Python 3.5+

  apt install git libboost-all-dev autoconf automake autotools-dev libtool zlib1g-dev cmake build-essential python3 python3-pip wget -y
  pip3 install nltk

For Mac OS X users: GIZA++ is written for Linux, so you may need to modify it before it will build. See http://catherinegasnier.blogspot.jp/2014/04/install-giza-107-on-mac-osx-1092.html

  git clone https://github.com/delihiros/pseudogen.git
  cd pseudogen
  ./tool_setup.sh

Usage

Download and extract the corpus built from annotated Django source code, then train the model and run it.

  mkdir data
  cd data
  wget -O- http://ahclab.naist.jp/pseudogen/en-django.tar.gz | tar zxvf -
  mv en-django/all.* .

  ../train-pseudogen.sh -p all.code -e all.anno
  ../run-pseudogen.sh -f tune/travatar.ini
  # input Python code you want to translate
  # in some environments, you may need to press Ctrl+D a few times to start translating

How does Pseudogen work?

Papers

Tools Used

GIZA++ to compute word alignments
Travatar to train the tree-to-string machine translation model
mteval to evaluate the output

Contributors

pseudogen's Issues

Couldn't filter at /pseudogen/tools/travatar/script/mert/mert-travatar.pl line 90.

I get this error while building

Couldn't open travatar-model/model/travatar.ini
Exit code: 2
Couldn't filter at /pseudogen/tools/travatar/script/mert/mert-travatar.pl line 90.
The command '/bin/sh -c git clone https://github.com/delihiros/pseudogen.git && 	cd pseudogen && 	./tool_setup.sh && 	mkdir data && 	cd data && 	wget -O- http://ahclab.naist.jp/pseudogen/en-django.tar.gz | tar zxvf - && 	mv en-django/all.* . && 	../train-pseudogen.sh -p all.code -e all.anno' returned a non-zero code: 2

More of the log follows:

tokenizing python ... 
tokenizing english ... 
parsing python ... 
head insertion ... 
simplifying ... 
making data ... 
making alignment ... 
../train-pseudogen.sh: 52: ../train-pseudogen.sh: /pseudogen/tools/pialign/src/bin/pialign: not found
making language model ... 
=== 1/5 Counting and sorting n-grams ===
Reading /pseudogen/data/train.entok
----5---10---15---20---25---30---35---40---45---50---55---60---65---70---75---80---85---90---95--100
****************************************************************************************************
Unigram tokens 240407 types 6844
=== 2/5 Calculating and sorting adjusted counts ===
Chain sizes: 1:82128 2:980730240 3:1838869248 4:2942190592 5:4290694912
Statistics:
1 6844 D1=0.456042 D2=1.32836 D3+=1.95408
2 39124 D1=0.752714 D2=1.33396 D3+=1.60661
3 74152 D1=0.823118 D2=1.3625 D3+=1.55919
4 100578 D1=0.877647 D2=1.41071 D3+=1.59675
5 117034 D1=0.769353 D2=1.29688 D3+=1.40612
Memory estimate for binary LM:
type      kB
probing 7243 assuming -p 1.5
probing 8523 assuming -r models -p 1.5
trie    3216 without quantization
trie    1668 assuming -q 8 -b 8 quantization 
trie    2965 assuming -a 22 array pointer compression
trie    1416 assuming -a 22 -q 8 -b 8 array pointer compression and quantization
=== 3/5 Calculating and sorting initial probabilities ===
Chain sizes: 1:82128 2:625984 3:1483040 4:2413872 5:3276952
=== 4/5 Calculating and writing order-interpolated probabilities ===
Chain sizes: 1:82128 2:625984 3:1483040 4:2413872 5:3276952
Chain sizes: 1:82128 2:625984 3:1483040 4:2413872 5:3276952
=== 5/5 Writing ARPA model ===
Name:lt-lmplz	VmPeak:10002084 kB	VmRSS:8088 kB	RSSMax:1755116 kB	user:0.779529	sys:0.834626	CPU:1.61416	real:1.41535
Reading lm/lm.arpa
----5---10---15---20---25---30---35---40---45---50---55---60---65---70---75---80---85---90---95--100
****************************************************************************************************
SUCCESS
training travatar ... 

Executing: mkdir travatar-model
(1) Preparing data @ Mon May 11 10:32:46 UTC 2020
Executing: mkdir -p travatar-model/data
Executing: /pseudogen/tools/travatar/src/bin/tree-converter -input_format penn -output_format word < train.reducedtree > travatar-model/data/src.word
Main arguments:
Optional arguments:
 -input_format 	penn
 -output_format 	word
 -split 	
 -compoundsplit 	
 -compoundsplit_filler 	
 -compoundsplit_threshold 	0.01
 -compoundsplit_minchar 	3
 -binarize 	none
 -case 	none
 -flatten 	false
 -debug 	0
Transforming trees (.=10,000, !=100,000 sentences)
.
(2) Creating alignments @ Mon May 11 10:32:47 UTC 2020
Executing: mkdir -p travatar-model/align
Executing: /pseudogen/tools/giza-pp/mkcls -c50 -n2 -ptrain.entok -Vtravatar-model/align/trg.vcb.classes opt
Executing: /pseudogen/tools/giza-pp/mkcls -c50 -n2 -ptravatar-model/data/src.word -Vtravatar-model/align/src.vcb.classes opt

***** 2 runs. (algorithm:TA)*****
;KategProblem:cats: 50   words: 7026

start-costs: MEAN: 1.57369e+06 (1.57369e+06-1.57369e+06)  SIGMA:0.702256   
  end-costs: MEAN: 1.44237e+06 (1.44218e+06-1.44255e+06)  SIGMA:185.301   
   start-pp: MEAN: 99.2817 (99.2812-99.2822)  SIGMA:0.000498938   
     end-pp: MEAN: 38.8095 (38.7581-38.8609)  SIGMA:0.0514388   
 iterations: MEAN: 192496 (191162-193831)  SIGMA:1334.5   
       time: MEAN: 3.09004 (3.08413-3.09595)  SIGMA:0.005906   

***** 2 runs. (algorithm:TA)*****
;KategProblem:cats: 50   words: 6842

start-costs: MEAN: 3.0405e+06 (3.03629e+06-3.04472e+06)  SIGMA:4217.96   
  end-costs: MEAN: 2.77351e+06 (2.77251e+06-2.77451e+06)  SIGMA:995.689   
   start-pp: MEAN: 90.8114 (89.3224-92.3005)  SIGMA:1.48907   
     end-pp: MEAN: 32.1567 (32.0323-32.2812)  SIGMA:0.124481   
 iterations: MEAN: 204423 (202255-206591)  SIGMA:2168   
       time: MEAN: 4.4065 (4.36276-4.45023)  SIGMA:0.0437335   
Executing: /pseudogen/tools/giza-pp/snt2cooc.out travatar-model/align/src.vcb travatar-model/align/trg.vcb travatar-model/align/src-trg.snt > travatar-model/align/src-trg.cooc
line 1000
line 2000
line 3000
line 4000
line 5000
line 6000
line 7000
line 8000
line 9000
line 10000
line 11000
line 12000
line 13000
line 14000
line 15000
line 16000
END.
Executing: /pseudogen/tools/giza-pp/snt2cooc.out travatar-model/align/trg.vcb travatar-model/align/src.vcb travatar-model/align/trg-src.snt > travatar-model/align/trg-src.cooc
line 1000
line 2000
line 3000
line 4000
line 5000
line 6000
line 7000
line 8000
line 9000
line 10000
line 11000
line 12000
line 13000
line 14000
line 15000
line 16000
END.
Executing: /pseudogen/tools/giza-pp/GIZA++ -CoocurrenceFile travatar-model/align/trg-src.cooc -c travatar-model/align/trg-src.snt -m1 5 -m2 0 -m3 3 -m4 3 -model1dumpfrequency 1 -model4smoothfactor 0.4 -nodumps 1 -nsmooth 4 -o travatar-model/align/trg-src.giza -onlyaldumps 1 -p0 0.999 -s travatar-model/align/trg.vcb -t travatar-model/align/src.vcb
Executing: /pseudogen/tools/giza-pp/GIZA++ -CoocurrenceFile travatar-model/align/src-trg.cooc -c travatar-model/align/src-trg.snt -m1 5 -m2 0 -m3 3 -m4 3 -model1dumpfrequency 1 -model4smoothfactor 0.4 -nodumps 1 -nsmooth 4 -o travatar-model/align/src-trg.giza -onlyaldumps 1 -p0 0.999 -s travatar-model/align/src.vcb -t travatar-model/align/trg.vcb
ERROR: Execution of: /pseudogen/tools/giza-pp/GIZA++ -CoocurrenceFile travatar-model/align/src-trg.cooc -c travatar-model/align/src-trg.snt -m1 5 -m2 0 -m3 3 -m4 3 -model1dumpfrequency 1 -model4smoothfactor 0.4 -nodumps 1 -nsmooth 4 -o travatar-model/align/src-trg.giza -onlyaldumps 1 -p0 0.999 -s travatar-model/align/src.vcb -t travatar-model/align/trg.vcb
  died with signal 11, without coredump
ERROR: Execution of: /pseudogen/tools/giza-pp/GIZA++ -CoocurrenceFile travatar-model/align/trg-src.cooc -c travatar-model/align/trg-src.snt -m1 5 -m2 0 -m3 3 -m4 3 -model1dumpfrequency 1 -model4smoothfactor 0.4 -nodumps 1 -nsmooth 4 -o travatar-model/align/trg-src.giza -onlyaldumps 1 -p0 0.999 -s travatar-model/align/trg.vcb -t travatar-model/align/src.vcb
  died with signal 11, without coredump
tuning travatar ... 
Executing: mkdir tune
Executing: /pseudogen/tools/travatar/script/train/filter-model.pl travatar-model/model/travatar.ini tune/run1.ini tune/filtered "/pseudogen/tools/travatar/script/train/filter-rule-table.py dev.reducedtree"
Couldn't open travatar-model/model/travatar.ini
Exit code: 2
Couldn't filter at /pseudogen/tools/travatar/script/mert/mert-travatar.pl line 90.
The command '/bin/sh -c git clone https://github.com/delihiros/pseudogen.git && 	cd pseudogen && 	./tool_setup.sh && 	mkdir data && 	cd data && 	wget -O- http://ahclab.naist.jp/pseudogen/en-django.tar.gz | tar zxvf - && 	mv en-django/all.* . && 	../train-pseudogen.sh -p all.code -e all.anno' returned a non-zero code: 2
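
Reading the log above from the top, three distinct failures appear before the final message: `pialign` is not found, both GIZA++ runs die with signal 11, and `travatar-model/model/travatar.ini` cannot be opened. The last of these is plausibly just a consequence of the earlier ones, since a model file that was never written cannot be filtered. Below is a minimal Python sketch for pulling such lines out of a saved build log in order. The function name `first_failures` and the pattern list are illustrative assumptions, not part of Pseudogen.

```python
import re

# Patterns that mark a failing step in the log above (illustrative, not exhaustive).
FAILURE_PATTERNS = [
    r"not found$",
    r"^ERROR: Execution of",
    r"died with signal \d+",
    r"^Couldn't open ",
]

def first_failures(log_text, limit=5):
    """Return up to `limit` (line_number, line) pairs that look like failures, in log order."""
    hits = []
    for number, line in enumerate(log_text.splitlines(), start=1):
        stripped = line.strip()
        if any(re.search(pattern, stripped) for pattern in FAILURE_PATTERNS):
            hits.append((number, stripped))
            if len(hits) == limit:
                break
    return hits

# A few lines copied from the log above.
sample = """making alignment ...
../train-pseudogen.sh: 52: ../train-pseudogen.sh: /pseudogen/tools/pialign/src/bin/pialign: not found
making language model ...
  died with signal 11, without coredump
Couldn't open travatar-model/model/travatar.ini
"""

for number, line in first_failures(sample):
    print(number, line)
```

Starting the investigation at the earliest hit rather than the last line of output is usually the more productive order, because later steps keep running after earlier ones have failed.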

input string out of range

# user@droid:/git/pseudogen$ ./run-pseudogen.sh
# def f(x):
#    if x == 0:
#        return 1
#    a = f(x-1)
#    b = f(x-2)
#    return a + b
'''
Traceback (most recent call last):
  File "/git/pseudogen/scripts/head-insertion.py", line 30, in <module>
    main()
  File "/git/pseudogen/scripts/head-insertion.py", line 24, in main
    insert_head(t)
  File "/git/pseudogen/scripts/head-insertion.py", line 8, in insert_head
    if t.label()[0].isupper() and not isinstance(t[0], str):
IndexError: string index out of range
'''
#############################################################################
# user@droid:/git/pseudogen$ cat scripts/head-insertion.py
from nltk.tree import Tree
import sys

def insert_head(t):
    if isinstance(t, Tree):
        for ch in t:
            insert_head(ch)
        if t.label()[0].isupper() and not isinstance(t[0], str):
            t.insert(0, Tree('HEAD', [t.label()]))

def encode(t):
    if isinstance(t, Tree):
        ret = '(' + t.label()
        for ch in t:
            ret += ' ' + encode(ch)
        return ret + ')'
    else:
        return str(t)

def main():
    for l in sys.stdin:
        t = Tree.fromstring(l)
        insert_head(t)
        print(encode(t))
        sys.stdout.flush()

if __name__ == '__main__':
    main()
#############################################################################
# https://github.com/delihiros/pseudogen/
# http://ahclab.naist.jp/pseudogen/
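
The traceback points at `t.label()[0]`, which raises `IndexError: string index out of range` exactly when a node's label is the empty string. One possible guard, sketched below, is to slice instead of index: `''[:1]` is `''`, and `''.isupper()` is `False`, so such nodes are skipped rather than crashing. To stay runnable without NLTK, the sketch uses a tiny stand-in class `Node` (an assumption for illustration) that mimics only the parts of `nltk.tree.Tree` the script relies on: `label()`, list behavior, and `insert`. Why an empty label reaches this script in the first place is not shown in the report, so this only treats the symptom.

```python
class Node(list):
    """Hypothetical stand-in for nltk.tree.Tree: a list of children plus a label."""

    def __init__(self, label, children=()):
        super().__init__(children)
        self._label = label

    def label(self):
        return self._label


def insert_head(t):
    """Same traversal as head-insertion.py, but tolerant of empty labels and empty nodes."""
    if isinstance(t, Node):
        for ch in t:
            insert_head(ch)
        # Slicing never raises: ''[:1] == '' and ''.isupper() is False.
        starts_upper = t.label()[:1].isupper()
        has_subtree_first = len(t) > 0 and not isinstance(t[0], str)
        if starts_upper and has_subtree_first:
            t.insert(0, Node('HEAD', [t.label()]))


# An empty-label root: the shape that would make t.label()[0] fail.
tree = Node('', [Node('Expr', [Node('value', ['x'])])])
insert_head(tree)
print(tree[0][0].label(), tree[0][0][0])  # the inserted HEAD node and the label it carries
```

With the real `Tree` class the same two conditions could replace the single `if` in `insert_head`; that substitution is untested here.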

A question about the training files

Dear Delihiros,
Thanks for sharing this wonderful tool.
Pseudogen generates multiple files for training. Two of them are 'train.reducedtree' and 'train.reducedsurf'. Is 'reducedtree' the tree after pruning and 'reducedsurf' the tree after pruning & simplifying? Is 'reducedsurf' the one that is labeled in your paper as Reduced-T2SMT?
Sample from 'train.reducedtree':
(Expr (HEAD Expr) (value (Call (HEAD Call) (func (Attribute (HEAD Attribute) (value (Name os)) (attr (str remove)))) (args (list (Name fname))))))
Sample from 'train.reducedsurf':
Expr Call Attribute os remove fname
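
For the single pair quoted above, the `train.reducedsurf` line is exactly the sequence of leaves of the `train.reducedtree` line, read left to right (the `HEAD` nodes contribute `Expr`, `Call`, and `Attribute`). The sketch below checks this with a small standard-library scanner for bracketed trees. `leaves` is a hypothetical helper, and whether the relationship holds for every line of the two files, or how it maps onto the paper's terminology, is not established by this one sample.

```python
import re

def leaves(sexpr):
    """Return the leaf tokens of a bracketed (Penn-style) tree, left to right."""
    tokens = re.findall(r"\(|\)|[^\s()]+", sexpr)
    result = []
    previous = None
    for token in tokens:
        # A token right after '(' is a node label; any other non-bracket token is a leaf.
        if token not in ("(", ")") and previous != "(":
            result.append(token)
        previous = token
    return result

# The two samples quoted in the question.
reduced_tree = (
    "(Expr (HEAD Expr) (value (Call (HEAD Call) (func (Attribute (HEAD Attribute) "
    "(value (Name os)) (attr (str remove)))) (args (list (Name fname))))))"
)
reduced_surf = "Expr Call Attribute os remove fname"

print(" ".join(leaves(reduced_tree)) == reduced_surf)
```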

generating only 300+ lines of pseudo code

Need to investigate why. The BLEU score is quite low because of this.

Setup

Is someone available to walk me through setting up Pseudogen?

running problem

Does running this project require installing Docker?

Pseudogen online demo

Dear all,
This is great work.
Is there an online demo for Python code?
Thanks

Dataset

Dear,

Thanks for your great work.
How can we download the dataset?

Thanks in advance

Broken pipe

# user@droid:/git/pseudogen$ ./run-pseudogen.sh -f ./data/tune/travatar.ini
# def f(x):
#    if x == 0:
#        return 1
#    a = f(x-1)
#    b = f(x-2)
#    return a + b
'''
Traceback (most recent call last):
  File "/git/pseudogen/scripts/head-insertion.py", line 30, in <module>
    main()
  File "/git/pseudogen/scripts/head-insertion.py", line 24, in main
    insert_head(t)
  File "/git/pseudogen/scripts/head-insertion.py", line 8, in insert_head
    if t.label()[0].isupper() and not isinstance(t[0], str):
IndexError: string index out of range
Traceback (most recent call last):
  File "/git/pseudogen/scripts/simplify.py", line 94, in <module>
    main()
  File "/git/pseudogen/scripts/simplify.py", line 91, in main
    sys.stdout.flush()
IOError: [Errno 32] Broken pipe
'''
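
The second traceback is worth reading separately from the first. Errno 32 (`EPIPE`) is what a process gets when it writes to a pipe whose reader has already exited, so the `Broken pipe` in `simplify.py` is plausibly a side effect of another stage in the pipeline dying first rather than an independent bug. Which stage reads the output of `simplify.py` is not shown in the report. Below is a minimal sketch that reproduces the mechanism on a POSIX system, with a child process that exits immediately standing in for a crashed stage. The function name is an assumption for illustration.

```python
import errno
import subprocess
import sys

def write_to_exited_reader():
    """Write to the stdin of a child that has already exited; return the errno raised."""
    reader = subprocess.Popen(
        [sys.executable, "-c", "import sys; sys.exit(1)"],  # stands in for a crashed stage
        stdin=subprocess.PIPE,
        bufsize=0,  # unbuffered, so the write itself reports the error
    )
    reader.wait()  # make sure the reading end of the pipe is closed
    try:
        reader.stdin.write(b"(Module)\n")
    except BrokenPipeError as exc:  # OSError subclass that Python 3 raises for EPIPE
        return exc.errno
    finally:
        reader.stdin.close()
    return None

print(write_to_exited_reader() == errno.EPIPE)
```

If this reading is right, fixing the `IndexError` should make the `Broken pipe` message disappear as well.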

MTEval Build error

Cloning into 'mteval'...
remote: Enumerating objects: 504, done.
remote: Total 504 (delta 0), reused 0 (delta 0), pack-reused 504
Receiving objects: 100% (504/504), 175.58 KiB | 454.00 KiB/s, done.
Resolving deltas: 100% (290/290), done.
-- The CXX compiler identification is GNU 9.2.1
-- Check for working CXX compiler: /usr/bin/c++
-- Check for working CXX compiler: /usr/bin/c++ -- works
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- Boost version: 1.67.0
-- Found the following Boost libraries:
-- program_options
-- unit_test_framework
-- Boost include directory: /usr/include
-- Configuring done
-- Generating done
-- Build files have been written to: /home/vikrant/psgen_docker/try2/pseudogen/tools/mteval/build
Scanning dependencies of target mteval
[ 5%] Building CXX object mteval/CMakeFiles/mteval.dir/Dictionary.cc.o
[ 11%] Building CXX object mteval/CMakeFiles/mteval.dir/BLEUEvaluator.cc.o
[ 17%] Building CXX object mteval/CMakeFiles/mteval.dir/NISTEvaluator.cc.o
[ 23%] Building CXX object mteval/CMakeFiles/mteval.dir/EvaluatorFactory.cc.o
/home/vikrant/psgen_docker/try2/pseudogen/tools/mteval/mteval/BLEUEvaluator.cc: In member function ‘virtual MTEval::Statistics MTEval::BLEUEvaluator::map(const MTEval::Sample&) const’:
/home/vikrant/psgen_docker/try2/pseudogen/tools/mteval/mteval/BLEUEvaluator.cc:58:19: error: moving a local object in a return statement prevents copy elision [-Werror=pessimizing-move]
58 | return std::move(stats);
| ~~~~~~~~~^~~~~~~
/home/vikrant/psgen_docker/try2/pseudogen/tools/mteval/mteval/BLEUEvaluator.cc:58:19: note: remove ‘std::move’ call
/home/vikrant/psgen_docker/try2/pseudogen/tools/mteval/mteval/NISTEvaluator.cc: In member function ‘virtual MTEval::Statistics MTEval::NISTEvaluator::map(const MTEval::Sample&) const’:
/home/vikrant/psgen_docker/try2/pseudogen/tools/mteval/mteval/NISTEvaluator.cc:73:19: error: moving a local object in a return statement prevents copy elision [-Werror=pessimizing-move]
73 | return std::move(stats);
| ~~~~~~~~~^~~~~~~
/home/vikrant/psgen_docker/try2/pseudogen/tools/mteval/mteval/NISTEvaluator.cc:73:19: note: remove ‘std::move’ call
cc1plus: all warnings being treated as errors
make[2]: *** [mteval/CMakeFiles/mteval.dir/build.make:63: mteval/CMakeFiles/mteval.dir/BLEUEvaluator.cc.o] Error 1
make[2]: *** Waiting for unfinished jobs....
cc1plus: all warnings being treated as errors
make[2]: *** [mteval/CMakeFiles/mteval.dir/build.make:102: mteval/CMakeFiles/mteval.dir/NISTEvaluator.cc.o] Error 1
make[1]: *** [CMakeFiles/Makefile2:224: mteval/CMakeFiles/mteval.dir/all] Error 2
make: *** [Makefile:95: all] Error 2
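
Both errors come from `-Werror=pessimizing-move`, and the compiler's own note suggests the fix: remove the `std::move` call from the two `return` statements (`BLEUEvaluator.cc` line 58 and `NISTEvaluator.cc` line 73 in this log). One possible way to apply that note is sketched below in Python. `strip_return_move` is a hypothetical helper, and the regular expression only handles the simple `return std::move(name);` form seen above. Relaxing the build's warning flags would be an alternative, but this log does not show where they are configured.

```python
import re

# Matches only the simple form reported by the compiler: return std::move(<identifier>);
RETURN_MOVE = re.compile(r"\breturn\s+std::move\(\s*([A-Za-z_]\w*)\s*\)\s*;")

def strip_return_move(source):
    """Rewrite `return std::move(x);` to `return x;`, leaving other uses of std::move alone."""
    return RETURN_MOVE.sub(r"return \1;", source)

# Illustrative C++ fragment; only the last line mirrors the statement in the log.
before = """Statistics stats;
other = std::move(tmp);
  return std::move(stats);
"""
print(strip_return_move(before))
```

The function works on a string, so applying it to the two files means reading each one, passing its text through `strip_return_move`, and writing the result back before rerunning the build.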
