delihiros / pseudogen

A tool to automatically generate pseudo-code from source code.

License: Apache License 2.0

Shell 32.67% Python 63.65% Dockerfile 3.67%

pseudogen's Introduction

Pseudogen

Demo

Installation

Using Docker

Docker is all you need.

  docker attach `docker run -itd delihiros/pseudogen`
  /# cd pseudogen/data
  /# ../run-pseudogen.sh -f tune/travatar.ini

Requirements

Requires Python 3.5+

  apt install git libboost-all-dev autoconf automake autotools-dev libtool zlib1g-dev cmake build-essential python3 python3-pip wget -y
  pip3 install nltk

For Mac OS X users: GIZA++ is written for Linux, so you may need to modify it before it will install. See http://catherinegasnier.blogspot.jp/2014/04/install-giza-107-on-mac-osx-1092.html

  git clone https://github.com/delihiros/pseudogen.git
  cd pseudogen
  ./tool_setup.sh

Usage

Download and extract the corpus of annotated Django source code, then train and run Pseudogen.

  mkdir data
  cd data
  wget -O- http://ahclab.naist.jp/pseudogen/en-django.tar.gz | tar zxvf -
  mv en-django/all.* .
  ../train-pseudogen.sh -p all.code -e all.anno
  ../run-pseudogen.sh -f tune/travatar.ini
  # enter the Python code you want to translate
  # in some environments, you may need to press Ctrl+D a few times to start translating

How does Pseudogen work?

Papers

Tools Used

  • GIZA++ to compute alignments
  • Travatar to train the tree-to-string machine translation model
  • mteval for evaluation

Contributors

pseudogen's People

Contributors

delihiros, tjt263

pseudogen's Issues

Couldn't filter at /pseudogen/tools/travatar/script/mert/mert-travatar.pl line 90.

I get this error while building:

Couldn't open travatar-model/model/travatar.ini
Exit code: 2
Couldn't filter at /pseudogen/tools/travatar/script/mert/mert-travatar.pl line 90.
The command '/bin/sh -c git clone https://github.com/delihiros/pseudogen.git && 	cd pseudogen && 	./tool_setup.sh && 	mkdir data && 	cd data && 	wget -O- http://ahclab.naist.jp/pseudogen/en-django.tar.gz | tar zxvf - && 	mv en-django/all.* . && 	../train-pseudogen.sh -p all.code -e all.anno' returned a non-zero code: 2

More of the log follows:

tokenizing python ... 
tokenizing english ... 
parsing python ... 
head insertion ... 
simplifying ... 
making data ... 
making alignment ... 
../train-pseudogen.sh: 52: ../train-pseudogen.sh: /pseudogen/tools/pialign/src/bin/pialign: not found
making language model ... 
=== 1/5 Counting and sorting n-grams ===
Reading /pseudogen/data/train.entok
----5---10---15---20---25---30---35---40---45---50---55---60---65---70---75---80---85---90---95--100
****************************************************************************************************
Unigram tokens 240407 types 6844
=== 2/5 Calculating and sorting adjusted counts ===
Chain sizes: 1:82128 2:980730240 3:1838869248 4:2942190592 5:4290694912
Statistics:
1 6844 D1=0.456042 D2=1.32836 D3+=1.95408
2 39124 D1=0.752714 D2=1.33396 D3+=1.60661
3 74152 D1=0.823118 D2=1.3625 D3+=1.55919
4 100578 D1=0.877647 D2=1.41071 D3+=1.59675
5 117034 D1=0.769353 D2=1.29688 D3+=1.40612
Memory estimate for binary LM:
type      kB
probing 7243 assuming -p 1.5
probing 8523 assuming -r models -p 1.5
trie    3216 without quantization
trie    1668 assuming -q 8 -b 8 quantization 
trie    2965 assuming -a 22 array pointer compression
trie    1416 assuming -a 22 -q 8 -b 8 array pointer compression and quantization
=== 3/5 Calculating and sorting initial probabilities ===
Chain sizes: 1:82128 2:625984 3:1483040 4:2413872 5:3276952
=== 4/5 Calculating and writing order-interpolated probabilities ===
Chain sizes: 1:82128 2:625984 3:1483040 4:2413872 5:3276952
Chain sizes: 1:82128 2:625984 3:1483040 4:2413872 5:3276952
=== 5/5 Writing ARPA model ===
Name:lt-lmplz	VmPeak:10002084 kB	VmRSS:8088 kB	RSSMax:1755116 kB	user:0.779529	sys:0.834626	CPU:1.61416	real:1.41535
Reading lm/lm.arpa
----5---10---15---20---25---30---35---40---45---50---55---60---65---70---75---80---85---90---95--100
****************************************************************************************************
SUCCESS
training travatar ... 

Executing: mkdir travatar-model
(1) Preparing data @ Mon May 11 10:32:46 UTC 2020
Executing: mkdir -p travatar-model/data
Executing: /pseudogen/tools/travatar/src/bin/tree-converter -input_format penn -output_format word < train.reducedtree > travatar-model/data/src.word
Main arguments:
Optional arguments:
 -input_format 	penn
 -output_format 	word
 -split 	
 -compoundsplit 	
 -compoundsplit_filler 	
 -compoundsplit_threshold 	0.01
 -compoundsplit_minchar 	3
 -binarize 	none
 -case 	none
 -flatten 	false
 -debug 	0
Transforming trees (.=10,000, !=100,000 sentences)
.
(2) Creating alignments @ Mon May 11 10:32:47 UTC 2020
Executing: mkdir -p travatar-model/align
Executing: /pseudogen/tools/giza-pp/mkcls -c50 -n2 -ptrain.entok -Vtravatar-model/align/trg.vcb.classes opt
Executing: /pseudogen/tools/giza-pp/mkcls -c50 -n2 -ptravatar-model/data/src.word -Vtravatar-model/align/src.vcb.classes opt

***** 2 runs. (algorithm:TA)*****
;KategProblem:cats: 50   words: 7026

start-costs: MEAN: 1.57369e+06 (1.57369e+06-1.57369e+06)  SIGMA:0.702256   
  end-costs: MEAN: 1.44237e+06 (1.44218e+06-1.44255e+06)  SIGMA:185.301   
   start-pp: MEAN: 99.2817 (99.2812-99.2822)  SIGMA:0.000498938   
     end-pp: MEAN: 38.8095 (38.7581-38.8609)  SIGMA:0.0514388   
 iterations: MEAN: 192496 (191162-193831)  SIGMA:1334.5   
       time: MEAN: 3.09004 (3.08413-3.09595)  SIGMA:0.005906   

***** 2 runs. (algorithm:TA)*****
;KategProblem:cats: 50   words: 6842

start-costs: MEAN: 3.0405e+06 (3.03629e+06-3.04472e+06)  SIGMA:4217.96   
  end-costs: MEAN: 2.77351e+06 (2.77251e+06-2.77451e+06)  SIGMA:995.689   
   start-pp: MEAN: 90.8114 (89.3224-92.3005)  SIGMA:1.48907   
     end-pp: MEAN: 32.1567 (32.0323-32.2812)  SIGMA:0.124481   
 iterations: MEAN: 204423 (202255-206591)  SIGMA:2168   
       time: MEAN: 4.4065 (4.36276-4.45023)  SIGMA:0.0437335   
Executing: /pseudogen/tools/giza-pp/snt2cooc.out travatar-model/align/src.vcb travatar-model/align/trg.vcb travatar-model/align/src-trg.snt > travatar-model/align/src-trg.cooc
line 1000
line 2000
line 3000
line 4000
line 5000
line 6000
line 7000
line 8000
line 9000
line 10000
line 11000
line 12000
line 13000
line 14000
line 15000
line 16000
END.
Executing: /pseudogen/tools/giza-pp/snt2cooc.out travatar-model/align/trg.vcb travatar-model/align/src.vcb travatar-model/align/trg-src.snt > travatar-model/align/trg-src.cooc
line 1000
line 2000
line 3000
line 4000
line 5000
line 6000
line 7000
line 8000
line 9000
line 10000
line 11000
line 12000
line 13000
line 14000
line 15000
line 16000
END.
Executing: /pseudogen/tools/giza-pp/GIZA++ -CoocurrenceFile travatar-model/align/trg-src.cooc -c travatar-model/align/trg-src.snt -m1 5 -m2 0 -m3 3 -m4 3 -model1dumpfrequency 1 -model4smoothfactor 0.4 -nodumps 1 -nsmooth 4 -o travatar-model/align/trg-src.giza -onlyaldumps 1 -p0 0.999 -s travatar-model/align/trg.vcb -t travatar-model/align/src.vcb
Executing: /pseudogen/tools/giza-pp/GIZA++ -CoocurrenceFile travatar-model/align/src-trg.cooc -c travatar-model/align/src-trg.snt -m1 5 -m2 0 -m3 3 -m4 3 -model1dumpfrequency 1 -model4smoothfactor 0.4 -nodumps 1 -nsmooth 4 -o travatar-model/align/src-trg.giza -onlyaldumps 1 -p0 0.999 -s travatar-model/align/src.vcb -t travatar-model/align/trg.vcb
ERROR: Execution of: /pseudogen/tools/giza-pp/GIZA++ -CoocurrenceFile travatar-model/align/src-trg.cooc -c travatar-model/align/src-trg.snt -m1 5 -m2 0 -m3 3 -m4 3 -model1dumpfrequency 1 -model4smoothfactor 0.4 -nodumps 1 -nsmooth 4 -o travatar-model/align/src-trg.giza -onlyaldumps 1 -p0 0.999 -s travatar-model/align/src.vcb -t travatar-model/align/trg.vcb
  died with signal 11, without coredump
ERROR: Execution of: /pseudogen/tools/giza-pp/GIZA++ -CoocurrenceFile travatar-model/align/trg-src.cooc -c travatar-model/align/trg-src.snt -m1 5 -m2 0 -m3 3 -m4 3 -model1dumpfrequency 1 -model4smoothfactor 0.4 -nodumps 1 -nsmooth 4 -o travatar-model/align/trg-src.giza -onlyaldumps 1 -p0 0.999 -s travatar-model/align/trg.vcb -t travatar-model/align/src.vcb
  died with signal 11, without coredump
tuning travatar ... 
Executing: mkdir tune
Executing: /pseudogen/tools/travatar/script/train/filter-model.pl travatar-model/model/travatar.ini tune/run1.ini tune/filtered "/pseudogen/tools/travatar/script/train/filter-rule-table.py dev.reducedtree"
Couldn't open travatar-model/model/travatar.ini
Exit code: 2
Couldn't filter at /pseudogen/tools/travatar/script/mert/mert-travatar.pl line 90.
The command '/bin/sh -c git clone https://github.com/delihiros/pseudogen.git && 	cd pseudogen && 	./tool_setup.sh && 	mkdir data && 	cd data && 	wget -O- http://ahclab.naist.jp/pseudogen/en-django.tar.gz | tar zxvf - && 	mv en-django/all.* . && 	../train-pseudogen.sh -p all.code -e all.anno' returned a non-zero code: 2
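
The log above reports several failures, and the final one (the missing travatar.ini) appears to be a consequence of the earlier ones: pialign was not found, and both GIZA++ runs died with signal 11. A minimal Python sketch for listing such lines from a saved build log, in order of appearance, might look like this. The function name find_failures and the three patterns are assumptions chosen to match this particular log; they are not part of pseudogen or Travatar.

```python
import re

# Hypothetical patterns, picked from the failures visible in this log.
FAILURE_PATTERNS = [
    re.compile(r": not found$"),           # shell could not find a binary
    re.compile(r"died with signal \d+"),   # a child process crashed
    re.compile(r"^Couldn't open "),        # a required file is missing
]


def find_failures(log_text):
    """Return (line_number, line) pairs for lines that look like failures."""
    hits = []
    for number, line in enumerate(log_text.splitlines(), start=1):
        stripped = line.strip()
        if any(p.search(stripped) for p in FAILURE_PATTERNS):
            hits.append((number, stripped))
    return hits


sample = """making alignment ...
../train-pseudogen.sh: 52: ../train-pseudogen.sh: /pseudogen/tools/pialign/src/bin/pialign: not found
making language model ...
  died with signal 11, without coredump
Couldn't open travatar-model/model/travatar.ini
"""

for number, line in find_failures(sample):
    print(number, line)
```

Reading the hits from the top down points at the earliest failure, which is usually the one to investigate first.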

input string out of range

# user@droid:/git/pseudogen$ ./run-pseudogen.sh
# def f(x):
#    if x == 0:
#        return 1
#    a = f(x-1)
#    b = f(x-2)
#    return a + b
'''
Traceback (most recent call last):
  File "/git/pseudogen/scripts/head-insertion.py", line 30, in <module>
    main()
  File "/git/pseudogen/scripts/head-insertion.py", line 24, in main
    insert_head(t)
  File "/git/pseudogen/scripts/head-insertion.py", line 8, in insert_head
    if t.label()[0].isupper() and not isinstance(t[0], str):
IndexError: string index out of range
'''
#############################################################################
# user@droid:/git/pseudogen$ cat scripts/head-insertion.py
from nltk.tree import Tree
import sys

def insert_head(t):
    if isinstance(t, Tree):
        for ch in t:
            insert_head(ch)
        if t.label()[0].isupper() and not isinstance(t[0], str):
            t.insert(0, Tree('HEAD', [t.label()]))

def encode(t):
    if isinstance(t, Tree):
        ret = '(' + t.label()
        for ch in t:
            ret += ' ' + encode(ch)
        return ret + ')'
    else:
        return str(t)

def main():
    for l in sys.stdin:
        t = Tree.fromstring(l)
        insert_head(t)
        print(encode(t))
        sys.stdout.flush()

if __name__ == '__main__':
    main()
#############################################################################
# https://github.com/delihiros/pseudogen/
# http://ahclab.naist.jp/pseudogen/

A question about the training files

Dear Delihiros,
Thanks for sharing this wonderful tool.
Pseudogen generates multiple files for training. Two of them are 'train.reducedtree' and 'train.reducedsurf'. Is 'reducedtree' the tree after pruning and 'reducedsurf' the tree after pruning & simplifying? Is 'reducedsurf' the one that is labeled in your paper as Reduced-T2SMT?
Sample from 'train.reducedtree':
(Expr (HEAD Expr) (value (Call (HEAD Call) (func (Attribute (HEAD Attribute) (value (Name os)) (attr (str remove)))) (args (list (Name fname))))))
Sample from 'train.reducedsurf':
Expr Call Attribute os remove fname
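
For the sample pair above, the 'train.reducedsurf' line is exactly the sequence of leaves of the 'train.reducedtree' line, read left to right. The Python sketch below checks that for this one sample. Whether the same relation holds for every line, and whether this is how pseudogen actually produces the file, is an assumption; the function name leaves is hypothetical.

```python
def leaves(sexpr):
    """Return the leaf tokens of a bracketed tree string, left to right.

    A token is treated as a leaf when it is not a parenthesis and does not
    directly follow an opening parenthesis (those tokens are node labels).
    """
    tokens = sexpr.replace('(', ' ( ').replace(')', ' ) ').split()
    out = []
    prev = None
    for tok in tokens:
        if tok not in ('(', ')') and prev != '(':
            out.append(tok)
        prev = tok
    return out


tree = ("(Expr (HEAD Expr) (value (Call (HEAD Call) (func (Attribute "
        "(HEAD Attribute) (value (Name os)) (attr (str remove)))) "
        "(args (list (Name fname))))))")

print(' '.join(leaves(tree)))  # Expr Call Attribute os remove fname
```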

Setup

Is someone available to walk through setting Pseudogen up?

Pseudogen online demo

Dear all,
This is great work.
Is there an online demo for Python code?
Thanks

Dataset

Dear,

Thanks for your great work.
How can we download the dataset?

Thanks in advance

Broken pipe

# user@droid:/git/pseudogen$ ./run-pseudogen.sh -f ./data/tune/travatar.ini
# def f(x):
#    if x == 0:
#        return 1
#    a = f(x-1)
#    b = f(x-2)
#    return a + b
'''
Traceback (most recent call last):
  File "/git/pseudogen/scripts/head-insertion.py", line 30, in <module>
    main()
  File "/git/pseudogen/scripts/head-insertion.py", line 24, in main
    insert_head(t)
  File "/git/pseudogen/scripts/head-insertion.py", line 8, in insert_head
    if t.label()[0].isupper() and not isinstance(t[0], str):
IndexError: string index out of range
Traceback (most recent call last):
  File "/git/pseudogen/scripts/simplify.py", line 94, in <module>
    main()
  File "/git/pseudogen/scripts/simplify.py", line 91, in main
    sys.stdout.flush()
IOError: [Errno 32] Broken pipe
'''

MTEval Build error

Cloning into 'mteval'...
remote: Enumerating objects: 504, done.
remote: Total 504 (delta 0), reused 0 (delta 0), pack-reused 504
Receiving objects: 100% (504/504), 175.58 KiB | 454.00 KiB/s, done.
Resolving deltas: 100% (290/290), done.
-- The CXX compiler identification is GNU 9.2.1
-- Check for working CXX compiler: /usr/bin/c++
-- Check for working CXX compiler: /usr/bin/c++ -- works
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- Boost version: 1.67.0
-- Found the following Boost libraries:
-- program_options
-- unit_test_framework
-- Boost include directory: /usr/include
-- Configuring done
-- Generating done
-- Build files have been written to: /home/vikrant/psgen_docker/try2/pseudogen/tools/mteval/build
Scanning dependencies of target mteval
[ 5%] Building CXX object mteval/CMakeFiles/mteval.dir/Dictionary.cc.o
[ 11%] Building CXX object mteval/CMakeFiles/mteval.dir/BLEUEvaluator.cc.o
[ 17%] Building CXX object mteval/CMakeFiles/mteval.dir/NISTEvaluator.cc.o
[ 23%] Building CXX object mteval/CMakeFiles/mteval.dir/EvaluatorFactory.cc.o
/home/vikrant/psgen_docker/try2/pseudogen/tools/mteval/mteval/BLEUEvaluator.cc: In member function ‘virtual MTEval::Statistics MTEval::BLEUEvaluator::map(const MTEval::Sample&) const’:
/home/vikrant/psgen_docker/try2/pseudogen/tools/mteval/mteval/BLEUEvaluator.cc:58:19: error: moving a local object in a return statement prevents copy elision [-Werror=pessimizing-move]
58 | return std::move(stats);
| ~~~~~~~~~^~~~~~~
/home/vikrant/psgen_docker/try2/pseudogen/tools/mteval/mteval/BLEUEvaluator.cc:58:19: note: remove ‘std::move’ call
/home/vikrant/psgen_docker/try2/pseudogen/tools/mteval/mteval/NISTEvaluator.cc: In member function ‘virtual MTEval::Statistics MTEval::NISTEvaluator::map(const MTEval::Sample&) const’:
/home/vikrant/psgen_docker/try2/pseudogen/tools/mteval/mteval/NISTEvaluator.cc:73:19: error: moving a local object in a return statement prevents copy elision [-Werror=pessimizing-move]
73 | return std::move(stats);
| ~~~~~~~~~^~~~~~~
/home/vikrant/psgen_docker/try2/pseudogen/tools/mteval/mteval/NISTEvaluator.cc:73:19: note: remove ‘std::move’ call
cc1plus: all warnings being treated as errors
make[2]: *** [mteval/CMakeFiles/mteval.dir/build.make:63: mteval/CMakeFiles/mteval.dir/BLEUEvaluator.cc.o] Error 1
make[2]: *** Waiting for unfinished jobs....
cc1plus: all warnings being treated as errors
make[2]: *** [mteval/CMakeFiles/mteval.dir/build.make:102: mteval/CMakeFiles/mteval.dir/NISTEvaluator.cc.o] Error 1
make[1]: *** [CMakeFiles/Makefile2:224: mteval/CMakeFiles/mteval.dir/all] Error 2
make: *** [Makefile:95: all] Error 2
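
The compiler's own note in the log above says to remove the std::move call from the two return statements it flags. A minimal Python sketch for applying that change to a local checkout is shown below. The function names are hypothetical, and the pattern only handles the simple form seen in this log, a return of std::move applied to a plain identifier. It cannot tell a local variable from a member or a parameter, so it is best applied only to the files and lines the compiler actually flagged.

```python
import re
from pathlib import Path

# Matches `return std::move(name);` where name is a plain identifier.
PESSIMIZING_MOVE = re.compile(
    r'return\s+std::move\(\s*([A-Za-z_]\w*)\s*\)\s*;'
)


def strip_pessimizing_moves(source):
    """Rewrite `return std::move(x);` as `return x;` in C++ source text."""
    return PESSIMIZING_MOVE.sub(r'return \1;', source)


def patch_file(path):
    """Apply the rewrite in place to one file, e.g. BLEUEvaluator.cc."""
    p = Path(path)
    p.write_text(strip_pessimizing_moves(p.read_text()))


print(strip_pessimizing_moves('  return std::move(stats);'))
```

Another option, not shown here, is to stop treating warnings as errors in the mteval build; how that is configured depends on its build files.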
