clab / dynet

DyNet: The Dynamic Neural Network Toolkit

License: Apache License 2.0

CMake 1.25% C++ 65.51% Cuda 0.83% Python 3.52% Scala 4.01% Java 0.11% Shell 0.24% C 4.41% C# 0.96% Rust 6.37% SWIG 1.74% Cython 11.04%

dynet's People

Contributors

abasyoni, akoehn, alvations, chantera, danielhers, davidweichiang, dhgarrette, filippoc, hayounav, iamyoungjo, ikekonglp, jayantk, jcyk, joelgrus, mattr1, mfaruqui, mitchellstern, msperber, neubig, oneplus, oyvindtafjord, pmichel31415, redpony, rscctest, shuheik, tetsuok, trevorcohn, xunzhang, yoavg, zhisbug

dynet's Issues

DyNet needs documentation!

There is very little documentation or commenting for dynet beyond the examples. It would be really nice to have this. Perhaps we can add comments to the code in Doxygen format, etc., to facilitate automatic generation of documentation.

Switch linkage from static to dynamic

Currently, cnn compiles twice per backend device: once for a static library and once for a shared library. This wastes compile time and is not clean, so it should be fixed.

Better model saving/loading

Currently, cnn uses the Boost serialization library to perform model saving/loading. This is easy, but it only works if you use the same version of Boost to save and load files. Ideally, we would create a version of model saving that works across Boost versions, perhaps by writing a custom Boost serializer that reads/writes the parameters in a text format.
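
Not the proposed Boost serializer, but a small illustration of the "plain text format" idea from the Python side, using numpy and the npvalue() accessor that appears in other issues here (writing the loaded values back into a Model is left out, since that depends on the bindings):

import numpy as np
import pycnn as pc

m = pc.Model()
p = m.add_parameters((10, 10))

pc.renew_cg()
values = pc.parameter(p).npvalue()   # pull the weights out as a numpy array
np.savetxt("p.txt", values)          # version-independent plain-text dump
restored = np.loadtxt("p.txt")       # loading it back is plain numpy as well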

How to add dropout to a specific layer in python

I am using pyCNN:

self.hid2Layer = parameter(self.model["hidden2-layer"])
self.hidLayer = parameter(self.model["hidden-layer"])
self.outLayer = parameter(self.model["output-layer"])

How can I explicitly give one of these layers a pre-defined dropout probability?

Thanks,
Mohammad
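
One way to do this is to apply dropout to the output expression of the layer you care about, rather than to the parameters themselves. A minimal sketch, assuming pycnn exposes the C++ dropout(expr, p) expression and a vecInput helper (the layer sizes and input are placeholders):

import pycnn as pc

m = pc.Model()
pW_hid = m.add_parameters((100, 100))    # hidden layer (placeholder sizes)
pW_out = m.add_parameters((2, 100))      # output layer

pc.renew_cg()
W_hid = pc.parameter(pW_hid)
W_out = pc.parameter(pW_out)
x = pc.vecInput(100)                     # placeholder input vector
h = pc.tanh(W_hid * x)
h = pc.dropout(h, 0.5)                   # dropout with p=0.5 on this layer only
y = W_out * h                            # the output layer sees the dropped-out activations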

model saving and loading is broken

One of the latest changes broke the model loading mechanism: loading a saved model results in a boost input stream error exception.

(this happened at: a6d937d )

To reproduce: run

cd build/examples
./read-write

Question: how to get averaged parameters?

Hi,

Thanks for the great library.

I was wondering whether this library supports parameter averaging, i.e. maintaining an averaged copy of the parameters to be used at decoding time.

Thanks
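
There is no averaging switch in the trainers as far as these issues show; a minimal sketch of keeping a running (Polyak-style) average by hand in numpy, using only the npvalue() accessor that appears in other issues here (the training step itself and writing the average back into the model are omitted):

import numpy as np
import pycnn as pc

m = pc.Model()
p = m.add_parameters((10, 10))
trainer = pc.AdamTrainer(m)

avg, t = None, 0
for step in range(1000):
    # ... build the graph, compute the loss, call trainer.update() ...
    t += 1
    pc.renew_cg()
    value = pc.parameter(p).npvalue()    # snapshot of the current weights
    avg = value.copy() if avg is None else avg + (value - avg) / t
# at decoding time, use `avg` in place of the live weights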

reshape with batch size zeros out values

Following #72, I encountered the following unexpected behavior in reshape (CPU):
(not sure whether the bug is in reshape or in as_vector)

#include "dynet/dynet.h"
#include "dynet/expr.h"

#include <iostream>
#include <fstream>

using namespace std;
using namespace dynet;
using namespace dynet::expr;

int main(int argc, char** argv) {
  dynet::initialize(argc, argv);

  // parameters
  Model m;
  ComputationGraph cg;
  LookupParameter L;
  L =  m.add_lookup_parameters(1, {8});
  L.initialize(0,{1,2,3,4,5,6,7,8});
  Expression x = lookup(cg, L, (unsigned)0);
  auto v = as_vector(cg.forward(x));
  // this prints 1...8
  for (int i = 0; i < v.size(); ++i) {
      cout << v[i] << endl;
  }
  cout << endl;

  // this prints 0s
  Expression z = reshape(x, Dim({1}, 8));
  auto v2 = as_vector(cg.forward(z));
  for (int i = 0; i < v2.size(); ++i) {
      cout << v2[i] << endl;
  }
}

assertion failure during update if no lookup is performed (v2)

When a lookup parameter is added to a model, it must be queried at least once before a model update, or the code crashes with an assertion failure.

the error message:

Assertion failed: (it != non_zero_grads.end()), function g_squared_l2norm_dev, file /Users/yogo/V/playwith/cnn_v2/cnn/cnn/model.cc, line 343.

code:

import pycnn as pc

m = pc.Model()
lp = m.add_lookup_parameters((10, 10))
pb = m.add_parameters(10)

trainer = pc.AdamTrainer(m)

pc.renew_cg()
lp[0] # without this line the code crashes

b = pc.parameter(pb)
e = pc.dot_product(b,b)

e.value()
e.backward()
trainer.update()

CPU you selected does not support x86-64 instruction set

I have the following CPU configuration:
lscpu
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
CPU(s): 6
.
.
.

But when I try to make cnn, I get an error message saying "CPU you selected does not support x86-64 instruction set".

Here is the complete error log:
make VERBOSE=1
/usr/local/bin/cmake -H/home/riyaz/GIT_Projects/cNN/cnn -B/home/riyaz/GIT_Projects/cNN/cnn/build --check-build-system CMakeFiles/Makefile.cmake 0
/usr/local/bin/cmake -E cmake_progress_start /home/riyaz/GIT_Projects/cNN/cnn/build/CMakeFiles /home/riyaz/GIT_Projects/cNN/cnn/build/CMakeFiles/progress.marks
make -f CMakeFiles/Makefile2 all
make[1]: Entering directory `/home/riyaz/GIT_Projects/cNN/cnn/build'
make -f cnn/CMakeFiles/cnn.dir/build.make cnn/CMakeFiles/cnn.dir/depend
make[2]: Entering directory `/home/riyaz/GIT_Projects/cNN/cnn/build'
cd /home/riyaz/GIT_Projects/cNN/cnn/build && /usr/local/bin/cmake -E cmake_depends "Unix Makefiles" /home/riyaz/GIT_Projects/cNN/cnn /home/riyaz/GIT_Projects/cNN/cnn/cnn /home/riyaz/GIT_Projects/cNN/cnn/build /home/riyaz/GIT_Projects/cNN/cnn/build/cnn /home/riyaz/GIT_Projects/cNN/cnn/build/cnn/CMakeFiles/cnn.dir/DependInfo.cmake --color=
make[2]: Leaving directory `/home/riyaz/GIT_Projects/cNN/cnn/build'
make -f cnn/CMakeFiles/cnn.dir/build.make cnn/CMakeFiles/cnn.dir/build
make[2]: Entering directory `/home/riyaz/GIT_Projects/cNN/cnn/build'
/usr/local/bin/cmake -E cmake_progress_report /home/riyaz/GIT_Projects/cNN/cnn/build/CMakeFiles 1
[ 3%] Building CXX object cnn/CMakeFiles/cnn.dir/cfsm-builder.cc.o
cd /home/riyaz/GIT_Projects/cNN/cnn/build/cnn && /usr/bin/c++ -fPIC -funroll-loops -Wall -std=c++11 -Ofast -g -DEIGEN_FAST_MATH -march=native -I/home/riyaz/GIT_Projects/cNN/cnn -I/home/riyaz/GIT_Projects/cNN/cnn/external/easyloggingpp/src -I/usr/local/include -I/home/riyaz/GIT_Projects/cNN/cnn/../eigen -I/home/riyaz/GIT_Projects/cNN/cnn/build -o CMakeFiles/cnn.dir/cfsm-builder.cc.o -c /home/riyaz/GIT_Projects/cNN/cnn/cnn/cfsm-builder.cc
/home/riyaz/GIT_Projects/cNN/cnn/cnn/cfsm-builder.cc:1:0: error: CPU you selected does not support x86-64 instruction set
make[2]: *** [cnn/CMakeFiles/cnn.dir/cfsm-builder.cc.o] Error 1
make[2]: Leaving directory `/home/riyaz/GIT_Projects/cNN/cnn/build'
make[1]: *** [cnn/CMakeFiles/cnn.dir/all] Error 2
make[1]: Leaving directory `/home/riyaz/GIT_Projects/cNN/cnn/build'
make: *** [all] Error 2

Multi-device support

Currently, cnn supports running computation on either the CPU or the GPU, based on the setting of the -DHAVE_CUDA flag. It would be better if a single program could utilize both the CPU and one or more GPUs. The framework for this is mostly there, minus a few remaining issues:

  • There are a number of places, mainly in the parameter and tensor operations, that are still hard-coded based on the setting of HAVE_CUDA. These all need to be changed to be dynamic based on the device setting of the tensor.
  • Currently parameters are all declared on the default device. There needs to be an option to let parameters live on other devices.
  • We need to create a node that moves memory between devices.
  • Test code for multi-device support needs to be created.

Questions about L2 regularization on lookup parameters

Hi, all

Thanks for the great tool. I enjoyed both using it and reading the code.

While looking into the SGD training code, I ran into a question about cnn/training.cc. More specifically, it is about the L2 regularization on lookup parameters.

According to my understanding, cnn::LookupParameters stores a sparse representation of the training example. For each instance, only a few dimensions fire and are filled into cnn::LookupParameters::non_zero_grads. When using SGD to train a model, the original form of L2 regularization requires updates over all the parameters (not only the non_zero_grads), but the code in cnn/training.cc (lines 48 to 57) seems to update only the non_zero_grads. So I am a bit confused whether there is anything I missed.

According to section 5.1 in http://research.microsoft.com/pubs/192769/tricks-2012.pdf, there is a trick which guarantees correct L2 regularization updates on sparsely represented training examples. I am not sure whether this trick is employed. Could anyone kindly advise on this? Thanks so much.
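
For reference, a small numpy sketch of that lazy-regularization trick (an illustration of the idea from the cited tutorial, not of cnn's actual training code; names and sizes are made up): each row records when it was last touched, and the accumulated L2 decay is applied only when the row next fires.

import numpy as np

lr, lam = 0.1, 1e-4                  # learning rate, L2 strength
E = np.random.randn(1000, 50)        # embedding matrix (hypothetical sizes)
last_update = np.zeros(1000, dtype=int)
t = 0

def sparse_sgd_step(rows, grads):
    """SGD + L2 step applied only to the rows that fired at time t."""
    global t
    t += 1
    for r, g in zip(rows, grads):
        # catch up on the L2 decay this row missed since it last fired
        E[r] *= (1.0 - lr * lam) ** (t - last_update[r])
        E[r] -= lr * g               # ordinary gradient step
        last_update[r] = t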

Find pycnn.so, but wrong architecture

Hi,
It seems like I got pycnn built successfully, but an error of "wrong architecture" showed up when running the example files.
Do you have any idea how to fix this issue?
Thank you very much.

=== pycnn$ make install ===
...
...
39 warnings generated.
++ -bundle -undefined dynamic_lookup -arch i386 -arch x86_64 -Wl,-F. build/temp.macosx-10.11-intel-2.7/pycnn.o -L. -L$ORIGIN/./ -lcnn_shared -o build/lib.macosx-10.11-intel-2.7/pycnn.so
ld: warning: directory not found for option '-L$ORIGIN/./'
ld: warning: ignoring file ./libcnn_shared.dylib, file was built for x86_64 which is not the architecture being linked (i386): ./libcnn_shared.dylib
ld: warning: ignoring file /opt/local/lib/gcc48/libstdc++.dylib, file was built for x86_64 which is not the architecture being linked (i386): /opt/local/lib/gcc48/libstdc++.dylib
ld: warning: ignoring file /opt/local/lib/gcc48/libgcc_ext.10.5.dylib, missing required architecture i386 in file /opt/local/lib/gcc48/libgcc_ext.10.5.dylib (1 slices)
ld: warning: ignoring file /opt/local/lib/gcc48/gcc/x86_64-apple-darwin15/4.8.5/libgcc.a, file was built for archive which is not the architecture being linked (i386): /opt/local/lib/gcc48/gcc/x86_64-apple-darwin15/4.8.5/libgcc.a
creating build/bdist.macosx-10.11-intel/egg
copying build/lib.macosx-10.11-intel-2.7/pycnn.so -> build/bdist.macosx-10.11-intel/egg
creating stub loader for pycnn.so
byte-compiling build/bdist.macosx-10.11-intel/egg/pycnn.py to pycnn.pyc
creating build/bdist.macosx-10.11-intel/egg/EGG-INFO
copying pyCNN.egg-info/PKG-INFO -> build/bdist.macosx-10.11-intel/egg/EGG-INFO
copying pyCNN.egg-info/SOURCES.txt -> build/bdist.macosx-10.11-intel/egg/EGG-INFO
copying pyCNN.egg-info/dependency_links.txt -> build/bdist.macosx-10.11-intel/egg/EGG-INFO
copying pyCNN.egg-info/top_level.txt -> build/bdist.macosx-10.11-intel/egg/EGG-INFO
writing build/bdist.macosx-10.11-intel/egg/EGG-INFO/native_libs.txt
zip_safe flag not set; analyzing archive contents...
creating 'dist/pyCNN-0.0.0-py2.7-macosx-10.11-intel.egg' and adding 'build/bdist.macosx-10.11-intel/egg' to it
removing 'build/bdist.macosx-10.11-intel/egg' (and everything under it)
Processing pyCNN-0.0.0-py2.7-macosx-10.11-intel.egg
Removing /Users/dqnguyen/Library/Python/2.7/lib/python/site-packages/pyCNN-0.0.0-py2.7-macosx-10.11-intel.egg
Copying pyCNN-0.0.0-py2.7-macosx-10.11-intel.egg to /Users/dqnguyen/Library/Python/2.7/lib/python/site-packages
pyCNN 0.0.0 is already the active version in easy-install.pth

Installed /Users/dqnguyen/Library/Python/2.7/lib/python/site-packages/pyCNN-0.0.0-py2.7-macosx-10.11-intel.egg
Processing dependencies for pyCNN==0.0.0
Finished processing dependencies for pyCNN==0.0.0

=== pyexamples$ python xor.py ===

Traceback (most recent call last):
File "xor.py", line 1, in
from pycnn import *
File "build/bdist.macosx-10.11-intel/egg/pycnn.py", line 7, in
File "build/bdist.macosx-10.11-intel/egg/pycnn.py", line 6, in bootstrap
ImportError: dlopen(/Users/dqnguyen/.python-eggs/pyCNN-0.0.0-py2.7-macosx-10.11-intel.egg-tmp/pycnn.so, 2): no suitable image found. Did find:
/Users/dqnguyen/.python-eggs/pyCNN-0.0.0-py2.7-macosx-10.11-intel.egg-tmp/pycnn.so: mach-o, but wrong architecture

There is an error when building on 32-bit Ubuntu without GPU

I use CMake 2.8.7 to build. The build fails with an error at 48%; after I change unsigned long recvd_size to std::size_t recvd_size in mp.h, it continues, but then fails with [cnn/CMakeFiles/cnn_shared.dir/all] Error 2 at 70%.
Could you help me figure out the issue? Thanks!

Tensor contraction and inverse tests are failing

Currently, 3 tests in the test suite fail: contract3d_1d_gradient, contract3d_1d_1d_gradient, and inverse_gradient. These cause failures in Travis CI.

These tests should be examined, and it should be made clear whether the problem lies in the tests or in the code.

Problems when updating lookup parameters in the parent process

I try to use sgd->update() in the parent process to update lookup parameters, but it doesn't work. When I use sgd->update() in a child process, it works fine. The problem is that the variable non_zero_grads in the model is not shared across processes, so the parent process does not update the lookup parameters because this variable is empty there.

Compiling time is slow

Compile times for cnn are slow. We should try to move includes of the heavier libraries (Boost, etc.) from .h files to .cc files.

Segfault on CUDA when using GCC 4.9.3 and optimization -O2

Currently, when using cnn on CUDA and compiling with GCC 4.9.3, segfaults occur with the default optimization setting of -O2. This is not a problem on GCC 4.8.*, but it may be a problem on other versions of GCC.

We don't currently know the cause, but a patch fixing the bug would be welcome. If you want a quick fix, find the line containing APPEND CUDA_NVCC_FLAGS in cnn/CMakeLists.txt and change -O2 to -O1.

problem with boost while performing 'make'

A follow-up to a discussion on LinkedIn concerning the BIST parser. I'm aware this is not a pyCNN-specific issue, but searching the internet I was not able to find a solution.

When installing pyCNN (as specified in https://github.com/clab/cnn/blob/master/INSTALL.md), I run into a problem when I perform 'make' (after customizing setup.py). Even though the previous steps seem OK, when I run 'make', an error is issued:

make
cp ../build/cnn/libcnn_shared.so .
python setup.py build_ext --inplace
running build_ext
skipping 'pycnn.cpp' Cython extension (up-to-date)
building 'pycnn' extension
/usr/bin/gcc-4.9 -fno-strict-aliasing -DNDEBUG -g -fwrapv -O2 -Wall -Wstrict-prototypes -fPIC -I/home/lavelli/cnn -I/home/lavelli/eigen-eigen-774d70b529d6 -I/usr/include/python2.7 -c pycnn.cpp -o build/temp.linux-x86_64-2.7/pycnn.o -std=c++11
cc1plus: warning: command line option '-Wstrict-prototypes' is valid for C/ObjC but not for C++
In file included from pycnn.cpp:257:0:
/home/lavelli/cnn/cnn/tensor.h:16:41: fatal error: boost/serialization/array.hpp: No such file or directory

#include <boost/serialization/array.hpp>

^
compilation terminated.
error: command '/usr/bin/gcc-4.9' failed with exit status 1
make: *** [pycnn.so] Error 1

Note that I have installed boost_1_60_0 under my home directory (I also created a link named 'boost' to boost_1_60_0), and the file array.hpp exists both under $HOME/boost/boost/serialization/ and $HOME/boost/include/boost/serialization/.

Thanks in advance

CUDA/GPU Support is Incomplete

Currently, GPU/CUDA support in cnn is incomplete. The most common functions are implemented in CUDA, but some of the less frequently used ones are not. Up until now this led to segmentation faults, but now most unimplemented functions should throw an error when you try to use them.

We're planning on fixing this in a couple ways:

  1. In the long term, we're hoping to transition to the Eigen GPU functionality, which will mean that we won't have to re-implement every function on both CPUs and GPUs, and will make the two basically interchangeable. However, this requires some major changes/rethinking of the way some things are designed, so it may take a while.
  2. In the short term, it is possible to add individual GPU implementations of any functions that you want to use. This can generally be done by taking a look at nodes.cc to find a similar function, and modifying nodes.cc, gpu-ops.cc, gpu-ops.h, and functors.h to add the functionality. If you'd like to help with this, please do and send a pull request.

Also, if there are any additional segfaults when using the GPU, please report them here. It will help if you provide an example that reproduces the problem, including the exact command you ran and the data you ran it on.

No 'add_lookup_parameters' in LSTMBuilder

Some of the code in the RNN tutorial doesn't work:
enc_lstm.add_lookup_parameters("lookup", (VOCAB_SIZE, ENC_INPUT_DIM))
AttributeError: 'pycnn.LSTMBuilder' object has no attribute 'add_lookup_parameters'

Thanks
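
For reference, lookup parameters are added to the Model rather than to the builder in the current API; a minimal sketch using the add_lookup_parameters call that appears in other issues here (the LSTMBuilder arguments are assumed to follow the usual layers/input-dim/hidden-dim/model order, and the hidden size and word id are placeholders):

import pycnn as pc

VOCAB_SIZE, ENC_INPUT_DIM = 1000, 100                  # placeholder sizes
m = pc.Model()
enc_lstm = pc.LSTMBuilder(1, ENC_INPUT_DIM, 128, m)    # layers, input dim, hidden dim, model
lookup = m.add_lookup_parameters((VOCAB_SIZE, ENC_INPUT_DIM))

pc.renew_cg()
x = lookup[7]                      # embed an arbitrary word id, as in lp[0] elsewhere in these issues
s = enc_lstm.initial_state()       # assuming the usual initial_state()/add_input() state interface
s = s.add_input(x)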

Sparse update for embedding matrix technically incorrect for momentum (but probably better)

Hi,

I noticed this issue in my own implementation, and remembered that your solvers also use a sparse update implementation for the embedding matrix. I thought I'd drop you a note.

The issue is that most momentum-based solvers involve some sort of exponential decay on the momentum vector. In my implementation, this decay was being applied unevenly, because the solver would only be called for words present in the minibatch. If I'm reading your implementation right, I think the same is true for you.

The technically correct solution is pretty obvious --- we can just track a timestamp, and do the update lazily as momentum ** (current_time - time_of_last_update).
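
A small numpy sketch of that lazy decay, purely as an illustration of the idea (not of cnn's trainers; names and sizes are made up):

import numpy as np

mu, lr = 0.9, 0.1                        # momentum coefficient, learning rate
E = np.zeros((1000, 50))                 # embedding matrix
V = np.zeros_like(E)                     # per-row momentum
last_seen = np.zeros(1000, dtype=int)    # time of each row's last update
t = 0

def momentum_step(rows, grads):
    """Momentum update for only the rows present in this minibatch."""
    global t
    t += 1
    for r, g in zip(rows, grads):
        V[r] = (mu ** (t - last_seen[r])) * V[r] + g   # catch up on the missed decay
        E[r] -= lr * V[r]
        last_seen[r] = t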

However, in practice I think this solution might be worse than the "bug". It would mean rare words get no momentum, because we can expect them to have a long time between updates. Alternatively we could set the momentum decay really low, which would make the gradients of the rare words too sticky.

For the classical momentum, the "bug" version means that the momentum is simply the EMA of the active gradients for that word. I'm satisfied with this.

However, I think there might be some problem for other solvers, such as Adam, which also take the number of overall updates as a parameter. We might consider using the number of updates for the feature instead. In my implementation, I happen to have the timestamp of the last change to the feature, and I didn't want to track another bookkeeping variable for a minor thing, so I'm approximating the number of updates from the time since the last change, time / (time - last_upd).

I doubt that there's a huge difference in results from this issue, either way. But it might be good to note ways in which the implementation behaves differently from others. A similar situation occurs for regularization penalties. I saw from another issue that you're aware of that.

Best,
Matt

examples/encdec does not work on GPU

Hi all,

Thanks a lot for this great NN tool. I'm happy to see that it provides a GPU version. I have tested the examples using the GPU and found that encdec does not work. The details are as follows. I further found that the problem occurs at "loss += as_scalar(cg.forward());". If someone knows why, please tell me the reason. Thanks again.


[cnn] using GPU
[cnn] Device Number: 0
[cnn] Device name: Tesla K80
[cnn] Memory Clock Rate (KHz): 2505000
[cnn] Memory Bus Width (bits): 384
[cnn] Peak Memory Bandwidth (GB/s): 240.48

[cnn] Memory Free (MB): 1478.89/-805.765

[cnn] Device Number: 1
[cnn] Device name: Tesla K80
[cnn] Memory Clock Rate (KHz): 2505000
[cnn] Memory Bus Width (bits): 384
[cnn] Peak Memory Bandwidth (GB/s): 240.48

[cnn] Memory Free (MB): 1575.23/-805.765

[cnn] **USING DEVICE: 1
[cnn] random seed: 1801438780
[cnn] allocating memory: 1536MB
[cnn] memory allocation done.
Reading training data from data/en.40k...
636637 lines, 16006553 tokens, 40003 types
Reading dev data from data/test.en.40k...
919 lines, 25936 tokens
Parameters will be written to: bilm_3_500_500-pid16320.params
**SHUFFLE
before forward computation ...
Segmentation fault (core dumped)


Best,
Jiajun

examples/xor-batch-lookup broken

The project compiles correctly, but when I run ./xor-batch-lookup in examples, it breaks with the error "backward() called on non-scalar node" and exit code 6. The same problem occurs on both macOS and CentOS.

lookup parameters initialized to zeros (v2)

The lookup parameters seem to be initialized to zeros instead of random values.

import pycnn as pc
m = pc.Model()
print "lookups"
lp = m.add_lookup_parameters((10,10))
for i in xrange(10):
    print lp[i].npvalue()

# vs.
print "regular"
p = m.add_parameters((10,10))
print pc.parameter(p).npvalue()

Eigen

Hi,

When using a large hidden state size for the LSTM, it runs a bit slowly. Are there any recommended settings for Eigen, e.g. using multithreading or linking against Intel MKL?

Also, is there an option to use double instead of float for all Parameters?

Segmentation fault in save (v2)

Saving any non-empty model results in "Segmentation fault (core dumped)":

The following code produces this error:

from gpycnn import *
m = Model()
m.save('model')
bilstm_builder = BiRNNBuilder(1, 2, 100, m, LSTMBuilder)
m.save('model')

The first save operation works well, and the second causes a segmentation fault. It also happens when bilstm_builder = BiRNNBuilder(1, 2, 100, m, LSTMBuilder) is replaced with a lookup table initialization.

Thanks in advance!

CUDA compile often fails with error about -fPIC

When compiling with the CUDA backend, compile often fails with the following error:

Linking CXX shared library libcnncuda_shared.so
/usr/bin/ld: CMakeFiles/cnncuda_shared.dir/./cnncuda_shared_intermediate_link.o: relocation R_X86_64_32S against `__nv_module_id' can not be used when making a shared object; recompile with -fPIC
CMakeFiles/cnncuda_shared.dir/./cnncuda_shared_intermediate_link.o: error adding symbols: Bad value

Error in PickNegLogSoftmax

I'm trying to use pickneglogsoftmax_batch (using gpycnn, v2) and get the following error:

python: /home/vered/cnn/cnn/nodes.cc:1380: void cnn::PickNegLogSoftmax::forward_dev_impl(const MyDevice&, const std::vector<const cnn::Tensor*>&, cnn::Tensor&) const [with MyDevice = cnn::Device_GPU]: Assertion `pvals->size() == fx.d.batch_elems()' failed.

With the following code:

loss = pickneglogsoftmax_batch(batch_predictions, batch_labels)

where the shape of batch_predictions is (batch_size * output_dim, 1). Other shapes (i.e. a matrix of batch_size x output_dim or its transpose) return a dimension error.

Thanks in advance.

More comments and documentation

Hi, I'm interested in this project. Unfortunately, because of the lack of code comments, I have to spend a lot of time understanding the code. I hope the project can get more comments in the code files and more complete documentation to help others gain a general understanding of the project.
Thanks in advance.

Require specification of the variable that forward/backward computation should target

Currently, cnn calculates forward values and backward derivatives with respect to the last variable in the computation graph. However, sometimes we want to calculate values or derivatives with respect to a particular variable, and this might not be the last one in the graph. This change would require the user to explicitly choose the variable that is used as the target of the forward computation, or the start of the backward computation.

overly aggressive nonzero-component assertion

In model.h, LookupParameterStorage contains the following field:

// gradients are sparse, so track which components are nonzero
std::unordered_set<unsigned> non_zero_grads;

In model.cc, LookupParameterStorage::g_squared_l2norm_dev contains the following assertion:

auto it = non_zero_grads.begin();
assert(it != non_zero_grads.end());

However, this assertion is causing failures under two conditions that seem like they should be valid:

  1. Parameters that are created but not used. If this happens it is likely a programming error, but it should either be ignored because it doesn't matter, or a more appropriate error message should be displayed.
  2. Lookup parameters that are set not to update (using expr::const_lookup). This one definitely should not be throwing errors.

Note: I discovered and tested these using pycnn, but presumably the issues hold for the C++ code as well.

Weird EIGEN Error While Installing PyCnn V2.0

Hi, I'm trying to install pyCNN 2.0 on macOS.
This is the error I'm getting:

/Users/talbaumel/cnn/eigen/unsupported/Eigen/CXX11/src/Tensor/TensorChipping.h:152:5: error:
static_assert failed "YOU_MADE_A_PROGRAMMING_MISTAKE"
EIGEN_STATIC_ASSERT(NumInputDims >= 2, YOU_MADE_A_PROGRAMMING_MISTAKE);
^ ~~~~~~~~~~~~~~~~~
/Users/talbaumel/cnn/eigen/Eigen/src/Core/util/StaticAssert.h:32:40: note:
expanded from macro 'EIGEN_STATIC_ASSERT'
#define EIGEN_STATIC_ASSERT(X,MSG) static_assert(X,#MSG);
^ ~
/Users/talbaumel/cnn/eigen/unsupported/Eigen/CXX11/src/Tensor/TensorEvaluator.h:323:7: note:
in instantiation of member function 'Eigen::TensorEvaluator<const
Eigen::TensorChippingOp<0, const Eigen::TensorMap<Eigen::Tensor<float, 1,
0, long>, 0> >, Eigen::DefaultDevice>::TensorEvaluator' requested here
m_rightImpl(op.rhsExpression(), device)
^
/Users/talbaumel/cnn/eigen/unsupported/Eigen/CXX11/src/Tensor/TensorAssign.h:102:7: note:
in instantiation of member function 'Eigen::TensorEvaluator<const
Eigen::TensorCwiseBinaryOp<Eigen::internal::scalar_difference_op,
const Eigen::TensorMap<Eigen::Tensor<float, 0, 0, long>, 0>, const
Eigen::TensorChippingOp<0, const Eigen::TensorMap<Eigen::Tensor<float, 1,
0, long>, 0> > >, Eigen::DefaultDevice>::TensorEvaluator' requested here
m_rightImpl(op.rhsExpression(), device)
^
/Users/talbaumel/cnn/eigen/unsupported/Eigen/CXX11/src/Tensor/TensorExecutor.h:56:48: note:
in instantiation of member function 'Eigen::TensorEvaluator<const
Eigen::TensorAssignOp<Eigen::TensorMap<Eigen::Tensor<float, 0, 0, long>,
0>, const
Eigen::TensorCwiseBinaryOpEigen::internal::scalar_difference_op<float,
const Eigen::TensorMap<Eigen::Tensor<float, 0, 0, long>, 0>, const
Eigen::TensorChippingOp<0, const Eigen::TensorMap<Eigen::Tensor<float, 1,
0, long>, 0> > > >, Eigen::DefaultDevice>::TensorEvaluator' requested here
TensorEvaluator<Expression, DefaultDevice> evaluator(expr, device);
^
/Users/talbaumel/cnn/eigen/unsupported/Eigen/CXX11/src/Tensor/TensorDevice.h:35:59: note:
in instantiation of member function 'Eigen::internal::TensorExecutor<const
Eigen::TensorAssignOp<Eigen::TensorMap<Eigen::Tensor<float, 0, 0, long>,
0>, const
Eigen::TensorCwiseBinaryOpEigen::internal::scalar_difference_op<float,
const Eigen::TensorMap<Eigen::Tensor<float, 0, 0, long>, 0>, const
Eigen::TensorChippingOp<0, const Eigen::TensorMap<Eigen::Tensor<float, 1,
0, long>, 0> > > >, Eigen::DefaultDevice, true>::run' requested here
internal::TensorExecutor<const Assign, DeviceType>::run(assign, m_device);
^
/Users/talbaumel/cnn/cnn/nodes.cc:1377:38: note: in instantiation of function
template specialization
'Eigen::TensorDevice<Eigen::TensorMap<Eigen::Tensor<float, 0, 0, long>,
0>,
Eigen::DefaultDevice>::operator=Eigen::TensorCwiseBinaryOp<Eigen::internal::scalar_difference_op<float,
const Eigen::TensorMap<Eigen::Tensor<float, 0, 0, long>, 0>, const
Eigen::TensorChippingOp<0, const Eigen::TensorMap<Eigen::Tensor<float, 1,
0, long>, 0> > > >' requested here
fx.t<0>().device(_dev.edevice) = z.t<0>() - xs[0]->t<1>().chip<0>(_pval);

examples/nlm broken with CUDA backend

The nlm example seems to be broken when it is built with the CUDA backend. cnn is at commit e1f5f3c and eigen @ changeset 7948:417bfb9807ba.

Input (ngrams.txt):

a b c d
b c d e
e f g h
a b c d

Run:

gpu01% ./nlm ngrams.txt
[cnn] using GPU
[cnn] Device Number: 0
[cnn]   Device name: GeForce GTX TITAN X
[cnn]   Memory Clock Rate (KHz): 3505000
[cnn]   Memory Bus Width (bits): 384
[cnn]   Peak Memory Bandwidth (GB/s): 336.48

[cnn]   Memory Free (MB): -142.275/-0.196608

[cnn] **USING DEVICE: 0
[cnn] random seed: 3833634821
[cnn] allocating memory: 512MB
[cnn] memory allocation done.
digraph G {
  rankdir=LR;
  nodesep=.05;
  N0 [label="v0 = lookup_parameters(|x|=29 --> {100})"];
  N1 [label="v1 = lookup_parameters(|x|=29 --> {100})"];
  N2 [label="v2 = lookup_parameters(|x|=29 --> {100})"];
  N3 [label="v3 = parameters({100,300}, 0x5a79300)"];
  N4 [label="v4 = parameters({100}, 0x5a794a0)"];
  N5 [label="v5 = parameters({29,100}, 0x5a79650)"];
  N6 [label="v6 = parameters({29}, 0x5a797e0)"];
  N7 [label="v7 = concat(v0,v1,v2)"];
  N0 -> N7;
  N1 -> N7;
  N2 -> N7;
  N8 [label="v8 = v3 * v7"];
  N3 -> N8;
  N7 -> N8;
  N9 [label="v9 = v4 + v8"];
  N4 -> N9;
  N8 -> N9;
  N10 [label="v10 = ReLU(v9)"];
  N9 -> N10;
  N11 [label="v11 = v5 * v10"];
  N5 -> N11;
  N10 -> N11;
  N12 [label="v12 = v6 + v11"];
  N6 -> N12;
  N11 -> N12;
  N13 [label="v13 = log_softmax(v12)"];
  N12 -> N13;
  N14 [label="v14 = pick(v13,4201307680)"];
  N13 -> N14;
  N15 [label="v15 = -v14"];
  N14 -> N15;
}
zsh: segmentation fault (core dumped)  ./nlm ngrams.txt

"Memory Free" seems obviously odd, but it prints reasonable values when omitting the int casts in cnn/cuda.cc:32.

Just using the CPU works fine. Other examples work too, e.g. 'xor' or 'rnnlm'.

Technical info:

gpu01% uname -a
Linux gpu01 3.19.0-37-generic #42~14.04.1-Ubuntu SMP Mon Nov 23 15:13:51 UTC 2015 x86_64 x86_64 x86_64 GNU/Linux

GCC

gpu01% gcc -v
Using built-in specs.
COLLECT_GCC=gcc
COLLECT_LTO_WRAPPER=/usr/lib/gcc/x86_64-linux-gnu/4.8/lto-wrapper
Target: x86_64-linux-gnu
Configured with: ../src/configure -v --with-pkgversion='Ubuntu 4.8.4-2ubuntu1~14.04' --with-bugurl=file:///usr/share/doc/gcc-4.8/README.Bugs --enable-languages=c,c++,java,go,d,fortran,objc,obj-c++ --prefix=/usr --program-suffix=-4.8 --enable-shared --enable-linker-build-id --libexecdir=/usr/lib --without-included-gettext --enable-threads=posix --with-gxx-include-dir=/usr/include/c++/4.8 --libdir=/usr/lib --enable-nls --with-sysroot=/ --enable-clocale=gnu --enable-libstdcxx-debug --enable-libstdcxx-time=yes --enable-gnu-unique-object --disable-libmudflap --enable-plugin --with-system-zlib --disable-browser-plugin --enable-java-awt=gtk --enable-gtk-cairo --with-java-home=/usr/lib/jvm/java-1.5.0-gcj-4.8-amd64/jre --enable-java-home --with-jvm-root-dir=/usr/lib/jvm/java-1.5.0-gcj-4.8-amd64 --with-jvm-jar-dir=/usr/lib/jvm-exports/java-1.5.0-gcj-4.8-amd64 --with-arch-directory=amd64 --with-ecj-jar=/usr/share/java/eclipse-ecj.jar --enable-objc-gc --enable-multiarch --disable-werror --with-arch-32=i686 --with-abi=m64 --with-multilib-list=m32,m64,mx32 --with-tune=generic --enable-checking=release --build=x86_64-linux-gnu --host=x86_64-linux-gnu --target=x86_64-linux-gnu
Thread model: posix
gcc version 4.8.4 (Ubuntu 4.8.4-2ubuntu1~14.04)

CUDA

gpu01% /usr/local/cuda-7.5/bin/nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2015 NVIDIA Corporation
Built on Tue_Aug_11_14:27:32_CDT_2015
Cuda compilation tools, release 7.5, V7.5.17

DyNet init function is not easy to use

Currently, the DyNet "init" function takes command-line arguments as-is and is not straightforward to use programmatically. It would be better to have a structure containing the options that can be passed to the initializer (while maintaining compatibility with the current interface).
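
Purely as an illustration of what such an options structure could look like from the Python side (none of these names exist at the time of this issue; DynetParams, set_mem, set_random_seed, and init are hypothetical):

import dynet as dy              # hypothetical future bindings

params = dy.DynetParams()       # hypothetical options structure replacing raw argv
params.set_mem(512)             # pre-allocate 512 MB
params.set_random_seed(42)
params.init()                   # equivalent of initialize(argc, argv)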
