torch / threads

Threads for Lua and LuaJIT. Transparent exchange of data between threads is allowed thanks to torch serialization.

License: Other


threads's Introduction

Threads


A thread package for Lua and LuaJIT.


Why another threading package for Lua, you might wonder? Well, to my knowledge, existing packages are quite limited: they create a new thread for each given task, and end the thread when the task ends. The overhead of creating a new thread each time I want to parallelize a task does not suit my needs. In general, it is also very hard to pass data between threads.

The magic of the threads package lies in the following seven points:

  • Threads are created on demand (usually once in the program).
  • Jobs are submitted to the threading system in the form of a callback function. The job will be executed on the first free thread.
  • If provided, an ending callback will be executed in the main thread when a job finishes.
  • Job callbacks are fully serialized (including upvalues!), which allows transparent copying of data to any thread.
  • Values returned by a job callback will be passed to the ending callback (serialized transparently).
  • As ending callbacks stay on the main thread, they can directly "play" with upvalues of the main thread.
  • Synchronization between threads is easy.

threads relies on Torch7 for serialization. It uses pthreads on POSIX systems and native threads on Windows. One could easily adapt the package to one's own needs by taking inspiration from the Torch serialization system. Torch should be straightforward to install, so this dependency should be minor too.

At this time, if you have torch7 installed, the installation can easily be achieved with luarocks:

luarocks install threads

A simple example is better than convoluted explanations:

local threads = require 'threads'

local nthread = 4
local njob = 10
local msg = "hello from a satellite thread"


local pool = threads.Threads(
   nthread,
   function(threadid)
      print('starting a new thread/state number ' .. threadid)
      gmsg = msg -- get the msg upvalue and store it in the thread state
   end
)

local jobdone = 0
for i=1,njob do
   pool:addjob(
      function()
         print(string.format('%s -- thread ID is %x', gmsg, __threadid))
         return __threadid
      end,

      function(id)
         print(string.format("task %d finished (ran on thread ID %x)", i, id))
         jobdone = jobdone + 1
      end
   )
end

pool:synchronize()

print(string.format('%d jobs done', jobdone))

pool:terminate()

Typical output:

starting a new thread/state number 1
starting a new thread/state number 3
starting a new thread/state number 2
starting a new thread/state number 4
hello from a satellite thread -- thread ID is 1
hello from a satellite thread -- thread ID is 2
hello from a satellite thread -- thread ID is 1
hello from a satellite thread -- thread ID is 2
hello from a satellite thread -- thread ID is 4
hello from a satellite thread -- thread ID is 2
hello from a satellite thread -- thread ID is 1
hello from a satellite thread -- thread ID is 3
task 1 finished (ran on thread ID 1)
hello from a satellite thread -- thread ID is 4
task 2 finished (ran on thread ID 2)
hello from a satellite thread -- thread ID is 4
task 3 finished (ran on thread ID 1)
task 4 finished (ran on thread ID 2)
task 5 finished (ran on thread ID 4)
task 9 finished (ran on thread ID 4)
task 10 finished (ran on thread ID 4)
task 8 finished (ran on thread ID 3)
task 6 finished (ran on thread ID 2)
task 7 finished (ran on thread ID 1)
10 jobs done

Advanced Example

See a neural network threaded training example for a more advanced usage of threads.

The library provides different low-level and high-level threading capabilities.

Soon, more high-level features will be proposed, built on top of Threads.

The mid-level feature of the threads package is the threads.Threads() class, built upon the low-level features. This class can easily be leveraged to create higher-level abstractions.

This class is used to manage a set of queue threads:

local threads = require 'threads'
local t = threads.Threads(4) -- create a pool of 4 threads

Note that in the past the threads package provided only one class (Threads), and it was possible to do:

local Threads = require 'threads'
local t = Threads(4) -- create a pool of 4 threads

While this is still possible, the first (explicit) way is recommended for clarity, as more and more high-level classes will be added to threads.

Internally, a Threads instance uses several Queues, i.e. thread-safe task queues:

  • mainqueue is used by the queue threads to communicate serialized endcallback functions back to the main thread;
  • threadqueue is used by the main thread to communicate serialized callback functions to the queue threads; and
  • threadspecificqueues are used by the main thread to communicate serialized callback functions to a specific thread.

Internally, the queue threads run an infinite loop that waits for the next job to become available. The queue threads can be switched between "specific" mode (in which each thread i looks only at jobs put in its own threadspecificqueues[i] queue) and non-specific mode (in which threads look at available jobs in threadqueue). Specific and non-specific mode can be switched with Threads:specific(boolean).

When a job is available, one of the threads executes it and returns the results back to the main thread via the mainqueue queue. Upon receipt of the results, an optional endcallback is executed on the main thread (see Threads:addjob()).
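
For instance, here is a minimal sketch of specific mode (the pool and job bodies are illustrative, not part of the library):

local threads = require 'threads'
local pool = threads.Threads(2)

pool:specific(true) -- switch to specific mode (this synchronizes running jobs first)
for i = 1, 4 do
   local target = (i % 2) + 1 -- pick a target thread index: 1 or 2
   pool:addjob(
      target,                             -- in specific mode, the thread index comes first
      function() return __threadid end,   -- runs on thread `target`
      function(id) print('job ' .. i .. ' ran on thread ' .. id) end
   )
end
pool:synchronize()
pool:terminate()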

There is no guarantee that all jobs have been executed until Threads:synchronize() is called.

Each thread has its own lua_State. However, we provide a serialization scheme which allows automatic sharing of several Torch objects (storages, tensors and tds types). Sharing of vanilla Lua objects is not possible, but instances of classes that support serialization (e.g. classic objects using require 'classic.torch', or those created with torch.class) can be shared. Remember, however, that only the memory in tensor storages and tds objects is shared between the instances; other fields are copies. Also, if synchronization is required, it must be implemented by the user (e.g. with a mutex).

Threads(N,[f1,f2,...])

Argument N of this constructor specifies the number of queue threads that will be spawned. The optional arguments f1,f2,... can be a list of functions to execute in each queue thread. To be clear, all of these functions will be executed in each thread. However, each optional function f takes an argument threadid, a number between 1 and N identifying the thread, which can be used to give each thread a different behaviour.

Example:

threads.Threads(4,
   function(threadid)
      print("Initializing thread " .. threadid)
   end
)

Note that the id of each thread is also stored in the global variable __threadid (in each thread's Lua state).
Notice about upvalues:

When deserializing a callback, upvalues must be of known types. Since f1,f2,... in threads.Threads() are deserialized in order, we suggest that you make a separate f1 containing all the definitions and put the other code in f2,f3,.... E.g.:

require 'nn'
local threads = require 'threads'
local model = nn.Linear(5, 10)
threads.Threads(
    2,
    function(idx)                       -- This code will crash
        require 'nn'                    -- because the upvalue 'model'
        local myModel = model:clone()   -- is of unknown type before deserialization
    end
)
Instead:

require 'nn'
local threads = require 'threads'
local model = nn.Linear(5, 10)
threads.Threads(
    2,
    function(idx)                      -- This code is OK.
        require 'nn'
    end,                               -- child threads know nn.Linear when deserializing f2
    function(idx)
        local myModel = model:clone()  -- because f1 has already been executed
    end
)

Threads:specific(boolean)

Switches the Threads system into specific (true) or non-specific (false) mode. In specific mode, one must provide the index of the thread that is going to execute a given job (when calling addjob()). In non-specific mode, the first available thread will execute the first available job.

Switching from specific to non-specific, or vice versa, will first synchronize the currently running jobs.

Threads:addjob([id], callback, [endcallback], [...])

This method is used to queue jobs to be executed by the pool of queue threads.

The id is the index of the thread that will execute the given job. It must be passed in specific mode, and must be absent in non-specific mode. The callback is a function that will be executed in a queue thread with the optional ... arguments. The endcallback is a function that will be executed in the main thread (the one calling this method). It defaults to function() end.
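
A minimal sketch of passing the optional arguments (assuming a pool created as above): the arguments are serialized along with the callback, and the callback's return value becomes the endcallback's argument.

pool:addjob(
   function(a, b)          -- runs in a queue thread; a and b arrive serialized
      return a + b
   end,
   function(sum)           -- runs in the main thread with the callback's return value
      print('sum is ' .. sum)
   end,
   1, 2                    -- the optional ... arguments
)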

This method will return immediately, unless the Queue is full, in which case it will wait (i.e. block) until one of the queue threads retrieves a new job from the queue.

Before being executed in the queue thread, the callback and its optional ... arguments are serialized by the main thread and unserialized by the queue thread. Other than through the optional arguments, the main thread can also transfer data to the queue thread by using upvalues:

local upvalue = 10
pool:addjob(
   function()
      queuevalue = upvalue
      return 1
   end,
   function(inc)
      upvalue = upvalue + inc
   end
)

In the above example, each queue thread will have a global variable queuevalue containing a copy of the main thread's upvalue. Note that if the main thread's upvalue were global, as opposed to local, it would not be an upvalue, and therefore would not be serialized along with the callback. In that case, queuevalue would be nil.

In the same example, the queue thread also communicates a value back to the main thread. This is accomplished by having the callback return one or more values, which are serialized and unserialized as arguments to the endcallback function. In this case, a value of 1 is received by the main thread as the argument inc to the endcallback function, which then uses it to increment upvalue. This demonstrates how communication between threads is easily achieved using the addjob method.

Threads:dojob()

This method is used to tell the main thread to execute the next endcallback in the queue (see Threads:addjob()). If no such job is available, the main thread of execution will wait (i.e. block) until the mainqueue Queue is filled with a job.

In general, this method should not be called, except when one wants to use the asynchronous capabilities of the Threads class. Otherwise, synchronize() should be called to make sure all jobs are executed.

Threads:synchronize()

This method will call dojob until all callbacks and corresponding endcallbacks are executed on the queue and main threads, respectively. This method will also raise an error for any errors raised in the pool of queue threads.

Threads:terminate()

This method will call synchronize(), terminate each queue thread and free their memory.

Threads.serialization(pkgname)

Specifies which serialization scheme should be used. If you want a particular serialization, this function must be called before the threads.Threads() constructor.

A serialization package (pkgname) should return a table of serialization functions (save and load) when required. See the serialize specifications for more details.

By default the serialization system uses the 'threads.serialize' sub-package, which leverages torch serialization.

The 'threads.sharedserialize' sub-package is also provided, which transparently shares the storages, tensors and tds C data structures. This approach is great if one needs to pass large data structures between threads. See the shared example for more details.
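
For instance, a minimal sketch of selecting shared serialization before creating a pool (the tensor and job bodies are illustrative):

require 'torch'
local threads = require 'threads'
threads.Threads.serialization('threads.sharedserialize') -- must be called before the constructor

local t = torch.FloatTensor(100):zero() -- its storage will be shared, not copied
local pool = threads.Threads(2)
pool:addjob(
   function() t:fill(1) end,      -- the worker writes directly into the shared storage
   function() print(t:sum()) end  -- prints 100 on the main thread
)
pool:synchronize()
pool:terminate()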

Threads:acceptsjob([id])

In specific mode, id must be a number, and the function returns true if the corresponding thread queue is not full, false otherwise.

In non-specific mode, id should not be passed; the function returns true if the global thread queue is not full, false otherwise.

Threads:hasjob()

Returns true if there are still some unfinished jobs running, false otherwise.

The methods acceptsjob() and hasjob() allow you to use threads.Threads in an asynchronous manner, without calling synchronize(). See the asynchronous example for a typical test case.
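
As an illustrative sketch (distinct from the bundled asynchronous example), a pool can be driven without synchronize() by polling acceptsjob() and draining endcallbacks with dojob():

local njob, submitted, done = 100, 0, 0
while done < njob do
   -- submit work as long as the pool accepts new jobs
   while submitted < njob and pool:acceptsjob() do
      submitted = submitted + 1
      pool:addjob(
         function() return 1 end,
         function(inc) done = done + inc end
      )
   end
   -- run one pending endcallback on the main thread, if any
   if pool:hasjob() then
      pool:dojob()
   end
end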

Queue

This class is in effect a thread-safe task queue. The class is returned upon requiring the sub-package:

Queue = require 'threads.queue'

Queue(N)

The Queue constructor takes a single argument N which specifies the maximum size of the queue.

Queue:addjob(callback, [...])

This method is called by a thread to put a job in the queue. The job is specified in the form of a callback function taking arguments .... Both the callback function and the ... arguments are serialized before being put into the queue. If the queue is full, i.e. it already holds N jobs, the calling thread will wait (i.e. block) until a job is retrieved by another thread.

Queue:dojob()

This method is called by a thread to get, unserialize and execute a job inserted via addjob. The calling thread will wait (i.e. block) until a new job can be retrieved. It returns to the caller whatever the job function returns after execution.
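
For instance, a minimal sketch of using a Queue directly from a single thread, just to show the serialize/execute round trip:

local Queue = require 'threads.queue'
local q = Queue(4) -- holds at most 4 serialized jobs

q:addjob(function(a, b) return a + b end, 1, 2) -- callback and arguments are serialized
print(q:dojob()) -- unserializes and runs the job; prints 3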

Serialize

A table of serialization functions is returned upon requiring the sub-package:

serialize = require 'threads.serialize'

serialize.save(func)

This function serializes the function func. It returns a torch CharStorage.

serialize.load(code)

This function unserializes the output of serialize.save() (a CharStorage). The unserialized object obj is returned.
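
A small sketch of the save/load round trip:

local serialize = require 'threads.serialize'

local code = serialize.save(function(x) return x * 2 end) -- a torch CharStorage
local f = serialize.load(code)                            -- rebuilds the function
print(f(21)) -- 42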

threads.safe(func, [mutex])

This function returns a new thread-safe function which embeds func (call arguments and returned values are the same). A mutex is created and locked before the execution of func() and unlocked after. The mutex is destroyed at the garbage collection of func.

If needed, one can specify the mutex to use as a second optional argument to threads.safe(). It is then up to the user to free this mutex when needed.
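
For example, a sketch wrapping an existing function so that concurrent calls are serialized by a mutex:

local threads = require 'threads'

local safeprint = threads.safe(print) -- a mutex guards every call
safeprint('only one thread can be in here at a time')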

Dive into the low-level features with the provided example.

Thread

The threads.Thread class simply starts a thread and executes a given piece of Lua code in that thread. It is up to the user to manage the event loop (if one is needed) to communicate with the thread. The threads.Threads class is built upon this class.

threads.Thread(code)

Returns a new thread, executing the code given as a string. The thread must be freed with free().

thread:free()

Waits for the given thread to finish and frees its resources.
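
A minimal sketch (the code string is illustrative):

local threads = require 'threads'

local thread = threads.Thread([[
   print('hello from a raw thread')
]])
thread:free() -- wait for the thread to finish and release its resources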

Mutex

Standard mutex.

threads.Mutex([id])

Returns a new mutex. If id is given, it must be a number returned by another mutex's id() method, in which case the returned mutex is equivalent to the one uniquely referred to by id.

A mutex must be freed with free().

mutex:lock()

Locks the given mutex. If a thread has already locked the mutex, the call will block until it has been unlocked.

mutex:unlock()

Unlocks the given mutex. This method call must follow a lock() call.

mutex:id()

Returns a number unambiguously representing the given mutex.

mutex:free()

Frees the given mutex.
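
As a sketch, a mutex can be shared with worker threads by passing its id() (the workerMutex global is illustrative):

local threads = require 'threads'

local mutex = threads.Mutex()
local mutexid = mutex:id() -- a plain number, safe to capture as an upvalue

local pool = threads.Threads(2,
   function()
      -- rebuild an equivalent mutex inside each worker from the shared id
      workerMutex = require('threads').Mutex(mutexid)
   end
)

pool:addjob(function()
   workerMutex:lock()
   -- ... critical section ...
   workerMutex:unlock()
end)

pool:synchronize()
pool:terminate()
mutex:free()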

Condition

Standard condition variable.

threads.Condition([id])

Returns a new condition variable. If id is given, it must be a number returned by another condition variable's id() method, in which case the returned condition is equivalent to the one uniquely referred to by id.

A condition must be freed with free().

condition:id()

Returns a number unambiguously representing the given condition.

condition:wait(mutex)

This method must be preceded by a mutex:lock() call. Assuming the mutex is locked, this method unlocks it and waits until the condition signal has been raised.

condition:signal()

Raises the condition signal.

condition:free()

Frees the given condition.

tds.AtomicCounter has been implemented to be used with sharedserialize to provide fast and safe lockless counting of progress (steps) between threads. See the example for usage.
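
A minimal sketch, assuming tds exposes inc()/get() on the counter and that sharedserialize is selected so the counter is shared rather than copied:

local tds = require 'tds'
local threads = require 'threads'
threads.Threads.serialization('threads.sharedserialize')

local steps = tds.AtomicCounter()
local pool = threads.Threads(2,
   function() counter = steps end -- each worker gets the shared counter, not a copy
)

for i = 1, 10 do
   pool:addjob(function() counter:inc() end) -- lockless increment
end

pool:synchronize()
print(steps:get()) -- 10
pool:terminate()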

threads's People

Contributors

adamlerer, andresy, apaszke, atcold, btnc, colesbury, gchanan, jaspersnoek, jonathantompson, lake4790k, linusu, nicholas-leonard, rewanthtammana, soumith


threads's Issues

nn and thread-ffi

Hello,

I am currently implementing a program that uses a network inside thread-ffi.
It seems that the network output is quite random when using more than one thread.
I am loading the network in each thread from the same file, so the threads are not sharing the weights.
I checked the weights of the network in each thread, and they seem to be equal.

Here is the test code I am using:

local Thread = require('threads')
local sdl = require 'sdl2'

local nthread = 4
local i = 1

sdl.init(0)

torch.setdefaulttensortype('torch.FloatTensor')

local requirement = function()
   require 'nnx'
   require 'image'
   torch.setdefaulttensortype('torch.FloatTensor')
end

local initData = function()
   network = torch.load('../Models/model-85-float.net')
end

local threads = Thread(nthread, requirement, initData)

for job = 1, 10 do
   threads:addjob(function()
      local lena = image.lena()
      local output = network.modules[1]:forward(lena)
      local tmp = output[1]
      tmp:pow(2)
      return math.sqrt(tmp:sum())
   end,
   function(val)
      print(i, val)
      i = i + 1
   end)
end

threads:synchronize()

threads:terminate()

output with nthread = 4

1       28.120826367006  
2       6.0572596294124  
3       27.152286792546  
4       10.393384170717  
5       2.4370800787178  
6       8.1725043696065  
7       0.1692221143499  
8       4.1842784251104  
9       7.4543807982242  
10      11.971466749804  

and nthread = 1

1       76.516878440195
2       76.516878440195
3       76.516878440195
4       76.516878440195
5       76.516878440195
6       76.516878440195
7       76.516878440195
8       76.516878440195
9       76.516878440195
10      76.516878440195

Gregory

Workers duplicated

I launched a program using 4 ffi-threads, and I wanted to know the life cycle of the workers.
My understanding was that the 4 workers would be alive for the whole lifetime of the program, so I was expecting to see the main thread and 4 workers.

However in htop I get:

I guess the luajit processes are the torch threads. And we can also see 4 workers with more time than the others.

But what are the others? Why are there so many of them? And can it be a problem? I sometimes get a segmentation fault.

Thanks,

Greg

Common data

How can I access global (shared) data from a thread without making a separate copy for each thread on first access? And why can't I access global (i.e. the "very global", non-local) variables?

Segmentation fault while loading VGG-16 model

I am trying to load the VGG-16 layer caffe model on the Jetson Tk1 but I get a segmentation fault while loading the model in torch using loadcaffe_wrap. RAM is 4GB. Is there any work around to this?

Error messages are not shown when using Threads

Hi,

I am trying to create a separate thread to load data for my deep learning application. I observed that if an error occurs in either the main thread or the data thread, it is not printed at stderr, and moreover, the entire program seems to be stuck doing nothing (like running an infinite loop). This is very unfortunate, since it is hard to debug such an application. What is the proper way to deal with this ?

Thanks a lot

No longer listed on luarocks

Typing luarocks install threads doesn't return any results, and a search of luarocks.org does not show your project. This makes installing it very difficult, as you give no working installation guide.

Code issues when building w/ win32 threads

It appears there are quite a few problems with the latest head resulting in a broken build when compiling under Windows. Here are the issues I've encountered so far, in no particular order:

  • THThread.c uses USE_WIN32_THREADS but the CMake file actually defines USE_WIN_THREADS=1 causing incorrect preprocessing. This is easily fixed by changing one of them to match.
  • The restrict keyword is a c99 feature but THThread.c is being compiled without using -std=c99 resulting in errors.
  • In THThread.c, THREAD_FUNCTION isn't defined anywhere but pthread_create attempts to cast into it nonetheless resulting in compile errors. According to MSDN _beginthreadex needs unsigned ( __stdcall *start_address )( void * ) for the 3rd argument. Perhaps THREAD_FUNCTION should be defined to that?
  • threads.c #include <dlfcn.h> but that header doesn't exist under windows. Not sure what the fix should be here.

Please advise on how best to address the above issues.

Thanks

undefined symbol: THAtomicIncrementRef

Hi guys,
I have installed torch threads several times in the past, but now I get the following error:
"torch/install/lib/lua/5.1/libthreads.so: undefined symbol: THAtomicIncrementRef"
I have tried on an Ubuntu and a Centos machine.
Any idea how to solve this?

A few questions.

The first question is: would this threading solution be viable for realtime (16 ms–33.33 ms) calculations? I have been looking into using LuaJIT for the higher-level programming, mostly because it's easy to write and requires no recompilation. But I am concerned about the overhead of serializing jobs and the lack of a shared memory state. Is the serialization overhead negligible?

The second: when synchronizing information from jobs back to the main thread, does the system overwrite what already exists?

How to kill process, queue?

Is it possible to kill a queue?
I need something like Java's Thread.isInterrupted() or Objective-C's [NSThread isCancelled].

Threads picking up cached outputs?

Here is the script I used to reproduce this problem

require 'cudnn'

local model = (require 'loadcaffe').load('deploy.prototxt', 'VGG_ILSVRC_16_layers.caffemodel', 'cudnn')

local nThreads = 8
torch.setnumthreads(nThreads)
local Threads = require 'threads'
Threads.serialization('threads.sharedserialize')
local mutex_id = Threads.Mutex():id()
local threads = Threads(nThreads,
  function()
    require 'cudnn'
  end,
  function()
    _model = model
    _mutex = (require 'threads').Mutex(mutex_id)
  end
)

for t=1,100 do
  for i=2,10 do
    threads:addjob(
      function()
        _mutex:lock()
        local inputs = torch.rand(i, 3, 224, 224):cuda()
        local outputs = _model:forward(inputs)
        if i ~= outputs:size(1) then
          print("mismatch!", inputs:size(1), outputs:size(1))
        end
        _mutex:unlock()
      end
    )
  end
  threads:synchronize()
end

When I run this, I see "mismatch" being printed.

I tried replacing VGG with a simpler model (e.g. an nn.Sequential of a bunch of nn.Linear layers) but could not reproduce the issue that way. So maybe the model itself has something to do with the problem?

Threads - attempt to call method split a nil value

I am running some threads but I can't get the string split method to work; the method doesn't exist inside the threads, but it does work normally.
The code below prints "string" for each thread; however, if I add the line which is currently commented out, it throws the error:

donkeys.lua:16: attempt to call method 'split' (a nil value)
donkeys = Threads(
   params.nThreads,
   function(idx)
      require "torch"
      require "string"
      tid = idx -- Thread id
      print(string.format("Initialized thread with id : %d.", tid))
      eg = "x,y,z"
      print(type(eg))
      --print(eg:split(","))
   end
)

Serialization memory cost

I am just trying to get an idea of GPU memory usage: when CudaTensors are serialized to be passed to the threads, are they basically pointers, or is there some additional device memory usage?

threads-scm-1.rockspec

When building through luarocks, the compiled files are not arranged in a temporary build folder.

I suggest changing the build variable as follows:

build = {
   type = "command",
   build_command = [[
         cmake -E make_directory build;
         cd build;
         cmake .. -DCMAKE_BUILD_TYPE=Release -DCMAKE_PREFIX_PATH="$(LUA_BINDIR)/.." -DCMAKE_INSTALL_PREFIX="$(PREFIX)"
         $(MAKE)
   ]],
   install_command = "cd build && $(MAKE) install"
}

Threads in thread pool getting hung on ppc64le

Hi,

We have built torch on ppc64le. While testing it, we found a problem wherein threads hang if the thread pool contains a higher number of threads, say 16 (at times, even 8). I guess it is something similar to #62 or #16.

Here is our test case -

local Threads = require 'threads'
nthreads = 64
Threads.serialization('threads.sharedserialize')
thrds = Threads(nthreads,
         function()  print('Starting thread ') end,
         function()  require 'image' end
      );
thrds:synchronize()
print "Done"

Some of my observations -

  • After debugging with gdb, I saw that all of the threads are sitting in the pthread_cond_wait function.
  • Also, in the above example, I observed that if I remove "require 'image'", it works even with 64 threads. So I think it may also have something to do with memory, as requiring image loads a bunch of stuff and is probably eating up memory. But I'm really not sure.
  • The above test works well on x86, even with 64 threads.

Any help would be really appreciated.

Problem loading packages in local directory in threads

Hi,

I'm having a problem loading packages from a local directory. The problem can be reproduced by the following code:

local init = require 'empty.lua'
local Threads = require 'threads'
Threads.serialization('threads.sharedserialize')

function main()
    require 'empty.lua'
    local function init() 
        require 'empty.lua'
    end 
    t = Threads(1, init)
end 

main()

where empty.lua is a lua package saved in local directory.

The error is only reported for the last require (i.e. the one used in thread initialisation), and loading built-in packages does not generate an error.

Thanks

double free or corruption on creation of thread pool (race condition)

The following script:

local threads = require 'threads'

for i=1,1000 do
   print( tonumber(i) )
   local pool = threads.Threads( 12,
      function (threadid)
         require 'torch'
      end )
   pool:terminate()
end

Crashes with:

$ th thread_test.lua
1
2
*** Error in `/usr/local/bin/luajit': double free or corruption (fasttop): 0x00007f80b8003090 ***
zsh: abort (core dumped)  th thread_test.lua

It definitely looks like a race condition in the thread initialization code, as once the pool is started, I haven't seen any crash or issue.

Condition waiting buggy?

From the documentation, a call to condition:wait(mutex) should unlock the mutex when it returns, but it doesn't seem to. The following piece of code shows the problem:

threads = require 'threads'
m, c = threads.Mutex(), threads.Condition()
do
   local c_id = c:id()
   thread = threads.Threads(1,
      function() threads_t = require 'threads'; require 'os'; c_t = threads_t.Condition(c_id); end)
end
thread:addjob(function() os.execute('sleep 1'); c_t:signal(); end)
m:lock() -- locking the mutex
c:wait(m) -- waiting about 1s
print("wait has returned, the mutex should be unlocked") -- this message is correctly printed
m:lock() -- locking the mutex again
print("the mutex has been locked again") -- this message never displays

Unlocking the mutex manually doesn't quite solve the problem. It is unlocked and can be reused; however, it systematically stalls at the third wait on it. The following piece of code shows the problem:

threads = require 'threads'
m, c = threads.Mutex(), threads.Condition()
do
   local c_id = c:id()
   thread = threads.Threads(1,
      function() threads_t = require 'threads'; require 'os'; c_t = threads_t.Condition(c_id); end)
end
thread:addjob(function() os.execute('sleep 1'); c_t:signal(); end)
m:lock() -- locking the mutex
c:wait(m) -- waiting about 1s
print("wait has returned (1st call)") -- this message is correctly printed
m:unlock() -- manually unlocking (shouldn't be necessary according to the doc)
thread:addjob(function() os.execute('sleep 1'); c_t:signal(); end)
m:lock() -- locking the mutex
c:wait(m) -- waiting about 1s
print("wait has returned (2nd call)") -- this message is correctly printed
m:unlock() -- manually unlocking (shouldn't be necessary according to the doc)
thread:addjob(function() os.execute('sleep 1'); c_t:signal(); end)
m:lock() -- locking the mutex
c:wait(m) -- waiting about 1s
print("wait has returned (3rd call)") -- this message is never printed

Unless I'm not using the conditions correctly (but I'm mainly copying from the provided example), this effectively makes the conditions unusable.

Benchmark

Hi guys,

I am running a script that requires loading images using different threads. But it seems to get slower with more threads. I created a simple benchmark to illustrate this:

require 'os'
require 'paths'
require 'xlua'
require 'image'
_ = require 'moses'
require 'lfs'


torch.setdefaulttensortype('torch.FloatTensor')

--[[command line arguments]]--
cmd = torch.CmdLine()
cmd:text()
cmd:text("Threads Benchmark")
cmd:option("--nThread", 2, "No of threads")
cmd:option("--nTask", 100, "No of tasks")
cmd:option("--imageDir", "/home/nicholas14/Desktop", "path to images")
cmd:option("--cpuBound", false, "do only cpu stuff")
cmd:text()
opt = cmd:parse(arg or {})

-- the task to be run in a worker thread
if opt.cpuBound then 
   function task()
      buffer = buffer or torch.FloatTensor()
      buffer:resize(10000)
      for i=1,1000 do
         buffer:random(1,1000)
      end
   end
else
   function task(filename)
      image.load(paths.concat(opt.imageDir, filename))
      collectgarbage()
   end
end

local threads = require "threads"
-- tensors won't be serialized (pointers to data will)
threads.Threads.serialization('threads.sharedserialize')
-- upvalues (which can only be local) can be serialized, globals can't :
local options = opt
local taskf = task
pool = threads.Threads(
   opt.nThread,
   function()
      require 'os'
      require 'paths'
      require 'xlua'
      _ = require 'moses'
      require 'image'
      torch.setdefaulttensortype('torch.FloatTensor')
   end,
   function(idx)
      tid = idx         
      task = taskf
      opt = options
   end
)

local filenames = {}
for filename in lfs.dir(opt.imageDir) do
   if paths.filep(paths.concat(opt.imageDir, filename)) and filename:sub(1,1) ~= '.' and filename:sub(-3) == 'jpg' then
      table.insert(filenames, filename)
   end
end

-- warmup (allocate buffers)
if opt.nThread > 1 then
   for i=1,opt.nThread*2 do
      pool:addjob(
         -- the job callback (runs in data-worker thread)
         function(fn)
            task(fn)
         end,
         nil,
         filenames[1]
      )
   end
else
   task(filenames[1])
end
pool:synchronize()

-- actual test
local start = os.clock()
local k = 0
for i = 1,opt.nTask do
   k = k + 1
   if k > #filenames then
      k = 1
   end
   local filename = filenames[k] 
   if opt.nThread > 1 then
      pool:addjob(
         -- the job callback (runs in data-worker thread)
         function(fn)
            task(fn)
         end,
         nil,
         filename
      )
   else
      task(filename)
   end
end
pool:synchronize()
print("Throughput :"..opt.nTask/(os.clock()-start).." t/s; ".." Time : "..(os.clock()-start))

When I run to test the loading of images in parallel, I get this:

nicholas14@hermes:~/projects/hades/dpe/sandbox$ th threadsbenchmark.lua --nThread 4 --nTask 1000 --imageDir /media/nicholas14/Nick/images/elt-000082/l/
Throughput :6.1674654737111 t/s;  Time : 162.141165 
nicholas14@hermes:~/projects/hades/dpe/sandbox$ th threadsbenchmark.lua --nThread 4 --nTask 1000 --imageDir /media/nicholas14/Nick/images/elt-000082/l/
Throughput :6.1883189748965 t/s;  Time : 161.59478  
nicholas14@hermes:~/projects/hades/dpe/sandbox$ th threadsbenchmark.lua --nThread 2 --nTask 1000 --imageDir /media/nicholas14/Nick/images/elt-000082/l/
Throughput :6.7784964942294 t/s;  Time : 147.525343 
nicholas14@hermes:~/projects/hades/dpe/sandbox$ th threadsbenchmark.lua --nThread 2 --nTask 1000 --imageDir /media/nicholas14/Nick/images/elt-000082/l/
Throughput :6.8066203558285 t/s;  Time : 146.915793 
nicholas14@hermes:~/projects/hades/dpe/sandbox$ th threadsbenchmark.lua --nThread 1 --nTask 1000 --imageDir /media/nicholas14/Nick/images/elt-000082/l/
Throughput :6.1727224149631 t/s;  Time : 162.003077 
nicholas14@hermes:~/projects/hades/dpe/sandbox$ th threadsbenchmark.lua --nThread 1 --nTask 1000 --imageDir /media/nicholas14/Nick/images/elt-000082/l/
Throughput :6.1484461155101 t/s;  Time : 162.642725 

So why do 4 threads do worse than 2? I was thinking it could be IO bound.

But when I run with the --cpuBound switch, I get this:

nicholas14@hermes:~/projects/hades/dpe/sandbox$ th threadsbenchmark.lua --nThread 4 --cpuBound --nTask 1000
Throughput :7.5196618666735 t/s;  Time : 132.984705 
nicholas14@hermes:~/projects/hades/dpe/sandbox$ th threadsbenchmark.lua --nThread 4 --cpuBound --nTask 1000
Throughput :8.1207628245056 t/s;  Time : 123.141145 
nicholas14@hermes:~/projects/hades/dpe/sandbox$ th threadsbenchmark.lua --nThread 2 --cpuBound --nTask 1000
Throughput :8.6548213608088 t/s;  Time : 115.542538 
nicholas14@hermes:~/projects/hades/dpe/sandbox$ th threadsbenchmark.lua --nThread 2 --cpuBound --nTask 1000
Throughput :8.5866952248701 t/s;  Time : 116.459243 
nicholas14@hermes:~/projects/hades/dpe/sandbox$ th threadsbenchmark.lua --nThread 2 --cpuBound --nTask 1000
Throughput :8.6565602768195 t/s;  Time : 115.519328 
nicholas14@hermes:~/projects/hades/dpe/sandbox$ th threadsbenchmark.lua --nThread 2 --cpuBound --nTask 1000
Throughput :8.6657195728431 t/s;  Time : 115.397229 
nicholas14@hermes:~/projects/hades/dpe/sandbox$ th threadsbenchmark.lua --nThread 1 --cpuBound --nTask 1000
Throughput :8.7296370304219 t/s;  Time : 114.552301 
nicholas14@hermes:~/projects/hades/dpe/sandbox$ th threadsbenchmark.lua --nThread 1 --cpuBound --nTask 1000
Throughput :8.7231877123922 t/s;  Time : 114.636994 

So what is the overhead, or what am I doing wrong?

Shared tds.hash gives sigsegv

I am trying to figure out the proper way to update a shared tds.Hash from multiple threads concurrently. The following code, inspired by test-threads-shared.lua, gives 'Segmentation fault (double free or corruption)' every time I run it:

function f()
  local threads = require 'threads'
  threads.Threads.serialization('threads.sharedserialize')
  local njob = 10
  local tds = require 'tds'
  local dict = tds.Hash()
  collectgarbage()
  collectgarbage()

  local pool = threads.Threads(
    10,
    function(threadIdx)
    end
  )

  for k=1,njob do
    pool:addjob(
      function()
        for i = 1,100000 do
          dict[i * 11 + k] = math.random()
        end
        collectgarbage()
        collectgarbage()
        return __threadid
      end,
      function(id)
      end
    )
  end

  for k=1,njob do
    pool:addjob(
      function()
        collectgarbage()
        collectgarbage()
      end
    )
  end
  pool:synchronize()
  pool:terminate()
  return dict
end

a = f()
print(a[1])

Any ideas if the problem is on my side ? Thanks.

Shared read/write object that is not copied to each thread

Hi,

Does anyone know if it is possible to read from a big shared table, or to write into a shared tensor or table from each thread (assuming that each thread writes to a different part of the tensor or table at any moment)? I would like to know if a shared memory mechanism exists that doesn't copy the global table to each thread (if this shared variable is big, copying would be undesirable). Also, it seems that reading a global variable that is not declared 'local' is not working (I get nil when accessing it).

Thanks!

Race conditions in the benchmarks

When I started working on loltorch I used benchmark/threadedtrain.lua as an example of using the torch threads library for training. Unfortunately, after much trial and error, I realized that there are no concurrency guarantees when using the threads library. In the loltorch file train.lua:256-321 I addressed the issue by using concurrency primitives to ensure proper order of execution.

It seems like a disclaimer in benchmark/README.md might be useful, or potentially fixing the issues in the benchmark/threadedtrain.lua file. As it stands, others might run into similar issues without realizing it, since the first line of the README is: This is a benchmark of the threads package, as well as a good real use-case example, using Torch neural network packages.

I am not sure how many people would use torch threads for training, as I mostly saw it used for data loading in the open source projects I looked at. I did training on my MacBook Pro and the GPU was slower than the CPU (despite having a dedicated GPU), hence the need for the threading library.

Failed Running Test Program

When I run the test program test/test-threads.lua, I got the following error:

FATAL THREAD PANIC: (dojob) /usr/local/share/lua/5.1/torch/File.lua:254: unknown object FATAL THREAD PANIC: (dojob) /usr/local/share/lua/5.1/torch/File.lua:254: unknown object

I'm using Ubuntu 14.04.

Any idea how to figure out the problem? Thanks.

thread return value nil

Hi, after PR #25, the return value of a thread is nil. Test code:

local Threads = require 'threads'
Threads.serialization('threads.sharedserialize')

my_threads = Threads(1,
            function()
                -- nothing
            end)

function doinmain(a)
    print(a)
end

my_threads:addjob(function()
    return 10
 end,
 doinmain)
my_threads:synchronize()

Stability

Hey @andresy, I see you have been quite busy. Even the package name has changed. So is this thing stable yet?

thread-ffi utilisation

Hi,

I am currently working with Torch to train neural networks. Because of the size of the dataset, we use memory mapping to avoid loading everything into RAM. Since we use the GPU for training, one thread loads each batch and the main thread takes care of sending it to the GPU.
For now I was using one thread (llthread); it is fine if the computation time is high enough, but for smaller networks the batches aren't ready when the computation is over.
So I was looking for a new solution, because using 2 llthreads isn't so appealing.
But your library seems perfect. I am just a little bit concerned about the data transfer: if I want to send back a torch.Tensor, will I serialize the pointer or the whole tensor?

Thank you,

Gregory

Signal handler in threads

I found that it seems impossible to register a signal handler in thread pools; see the code below.

-- test.lua
local classic = require 'classic'
local threads = require 'threads'
threads.Threads.serialization('threads.sharedserialize')
local tds = require 'tds'
local signal = require 'posix.signal'

local threadsTest = classic.class('threadsTest')

function threadsTest:_init()
    self.ctrlPool = threads.Threads(1)
    self.ctrlPool:addjob(function ()
        __threadid = 0
        local signal = require 'posix.signal'

        signal.signal(signal.SIGINT, function(signum)
            print('\nSIGINT received')
            print('Ex(c)iting')
            atomic:set(-1)
        end)
    end)
    self.ctrlPool:synchronize()
    local atomic = tds.AtomicCounter()
    atomic:set(1)
    self.nThreads = 2
    self.game = threads.Threads(self.nThreads,
        function ()
            inner = atomic
        end)
end

function threadsTest:start()
    for i = 1, self.nThreads do
        self.game:addjob(function ()
            while true do
                if inner:get() < 0 then break end
            end
        end)
    end

    self.game:synchronize()
    self.game:terminate()
end

test = threadsTest()
test:start()

The running result just ignores the signal:

> th test.lua
^C^C^C^C^C^C^C^C

But I do see some one setting the signal handler in threadpool, see https://github.com/Kaixhin/Atari/blob/master/async/AsyncMaster.lua from line 117

And it's also weird if I call os.execute() in the thread pool:

-- threadTester.lua
local classic = require 'classic'
local threads = require 'threads'
threads.Threads.serialization('threads.sharedserialize')
local tds = require 'tds'
local threadTester = classic.class('threadTester')

function threadTester:_init(atomic)
    self.game = threads.Threads(1,
    function ()
        inner = atomic
    end)
    self.atomic = atomic
    classic.strict(self)
end

function threadTester:play()
    self.game:addjob(function ()
        os.execute("sleep ".. 10)
    end)
    -- do some stuff outside
    while true do
        if self.atomic:get() < 0 then break end
    end
end

return threadTester
-- test.lua
local threads = require 'threads'
threads.Threads.serialization('threads.sharedserialize')
local tds = require 'tds'
local signal = require 'posix.signal'

-- local ctrlpool = threads.Threads(1, function ()
--  local tds = require 'tds'
-- end)

local atomic = tds.AtomicCounter()
atomic:set(1)
nThreads = 4
-- local ctrlPool = threads.Threads(1)
-- ctrlPool:addjob(function ()
    local signal = require 'posix.signal'
    signal.signal(signal.SIGINT, function(signum)
        print('\nSIGINT received')
        print('Ex(c)iting')
        atomic:set(-1)
    end)
-- end)

-- ctrlPool:synchronize()

local gamePool = threads.Threads(nThreads, function ()
    threadTester = require 'threadTester'
    player = threadTester(atomic)
end)

for i = 1, nThreads do
    gamePool:addjob(function ()
        print(string.format("begin in thread %d", __threadid))
        local status, err = xpcall(player.play, debug.traceback, player)
        if not status then
            print(string.format('%s', err))
            os.exit(128)
        end
    end)
end

gamePool:synchronize()
gamePool:terminate()

here is the test result

> th test.lua
begin in thread 1
begin in thread 2
begin in thread 3
begin in thread 4
^C^C^C^C^C
SIGINT received
Ex(c)iting

I am now doing a project where I need to call a game via os.execute() in each thread while collecting info... But I found that the signal handler is not functioning well...

SIGSEGV when libthreads.so is unmapped before pthread finishes

In some circumstances, the main Lua thread can unmap libthreads.so while child threads are still running. I've seen this with Lua 5.2 on Mac OS X and with LuaJIT on Linux.

During lua_close(L), Lua unmaps shared libraries (e.g. from require 'libthreads'). This is a problem because the child thread may still be executing code from the shared library, such as: https://github.com/torch/threads/blob/master/lib/threads.c#L12-L38

This doesn't happen in the typical case because the finalizer on threads.Threads waits for the child pthreads to terminate:
https://github.com/torch/threads/blob/master/threads.lua#L294-L297

However, the parent Lua thread may finish first when there is an error in a thread callback or in some other circumstances.

I can think of three solutions to this problem:

  1. Make sure the parent thread doesn't finish lua_close() before all the child threads finish
  2. Load the thread code (i.e. newthread() and thread_closure()) separately from libthreads.so and only unload it when all threads finish.
  3. Don't call lua_close() (i.e. exit via os.exit() or some other means)

I think (2) is the best solution and I'm working on a patch.

Sample code:
https://gist.github.com/colesbury/a92cd2923a22819ccb03

Unwritable object <userdata> at <?>.callback.self.resnet.DataLoader.threads.__gc__

Hi, I modified the code fb.resnet.torch/dataloader.lua in order to read data triplet by triplet. But I encountered a confusing error:

FATAL THREAD PANIC: (write) /home/haha/torch/install/share/lua/5.1/torch/File.lua:141: 
Unwritable object <userdata> at <?>.callback.self.resnet.DataLoader.threads.__gc__  

Below is my code...

function DataLoader:run()
   local threads = self.threads
   local size, batchSize = self.__size, self.batchSize
   local perm = torch.randperm(size)

   local tripletList = self:genTriplet()

   local idx, sample = 1, nil
   local function enqueue()
      while idx <= size and threads:acceptsjob() do
         local indices = perm:narrow(1, idx, math.min(batchSize, size - idx + 1))
         threads:addjob(
            function(indices, nCrops, tripletList)
               local sz = indices:size(1) * 3 --should be 3 times as previous, since now it is triplet
               local batch, imageSize
               local target = torch.IntTensor(sz)
               for i, idx in ipairs(indices:totable()) do

                  local idx_anchor = tripletList[idx][1]
                  local idx_positive = tripletList[idx][2]
                  local idx_negative = tripletList[idx][3]

                  local sample_anchor = _G.dataset:get(idx_anchor)   --get images
                  local sample_positive = _G.dataset:get(idx_positive)
                  local sample_negative = _G.dataset:get(idx_negative)


                  local input_anchor = _G.preprocess(sample_anchor.input)
                  local input_positive = _G.preprocess(sample_positive.input)
                  local input_negative = _G.preprocess(sample_negative.input)

                  if not batch then
                     imageSize = input_anchor:size():totable()
                     if nCrops > 1 then table.remove(imageSize, 1) end
                     batch = torch.FloatTensor(sz, nCrops, table.unpack(imageSize))
                  end
                  batch[(i-1)*2 + 1]:copy(input_anchor)
                  batch[(i-1)*2 + 2]:copy(input_positive)
                  batch[self.samples*self.blocks + i]:copy(input_negative)

                  target[(i-1)*2 + 1] = sample_anchor.target
                  target[(i-1)*2 + 2] = sample_positive.target
                  target[self.samples*self.blocks + i] = sample_negative.target

               end
               collectgarbage()
               return {
                  input = batch:view(sz * nCrops, table.unpack(imageSize)),
                  target = target,
               }
            end,
            function(_sample_)
              -- print ('WHAT????')
               sample = _sample_
            end,
            indices,
            self.nCrops,
            tripletList
         )
         idx = idx + batchSize
      end
   end

   local n = 0
   local function loop()
      enqueue()
      if not threads:hasjob() then
         return nil
      end
      threads:dojob()
      if threads:haserror() then
         threads:synchronize()
      end
      enqueue()
      n = n + 1
      return n, sample
   end

   return loop
end

Below is the original code:

function DataLoader:run()
   local threads = self.threads
   local size, batchSize = self.__size, self.batchSize
   local perm = torch.randperm(size)

   local idx, sample = 1, nil
   local function enqueue()
      while idx <= size and threads:acceptsjob() do
         local indices = perm:narrow(1, idx, math.min(batchSize, size - idx + 1))
         threads:addjob(
            function(indices, nCrops)
               local sz = indices:size(1)
               local batch, imageSize
               local target = torch.IntTensor(sz)
               for i, idx in ipairs(indices:totable()) do
                  local sample = _G.dataset:get(idx)
                  local input = _G.preprocess(sample.input)
                  if not batch then
                     imageSize = input:size():totable()
                     if nCrops > 1 then table.remove(imageSize, 1) end
                     batch = torch.FloatTensor(sz, nCrops, table.unpack(imageSize))
                  end
                  batch[i]:copy(input)
                  target[i] = sample.target
               end
               collectgarbage()
               return {
                  input = batch:view(sz * nCrops, table.unpack(imageSize)),
                  target = target,
               }
            end,
            function(_sample_)
               sample = _sample_
            end,
            indices,
            self.nCrops
         )
         idx = idx + batchSize
      end
   end

   local n = 0
   local function loop()
      enqueue()
      if not threads:hasjob() then
         return nil
      end
      threads:dojob()
      if threads:haserror() then
         threads:synchronize()
      end
      enqueue()
      n = n + 1
      return n, sample
   end

   return loop
end

FATAL THREAD PANIC: (addjob) /usr/local/share/lua/5.1/torch/File.lua:107: Unwritable object <function>

I'm just testing running multiple queries against PostgreSQL in parallel.
But with this:

local pool = threads.Threads(
   8,
   function(threadid)
     pg = pgmoon.new({
       host = "127.0.0.1",
       port = "5432",
       database = "test",
       user = "test",
       password = "qwe123"
     })

     r = assert(pg:connect())
     print(r)

      print('starting a new thread/state number ' .. threadid)
      gmsg = msg -- get it the msg upvalue and store it in thread state
   end
)

i get:

FATAL THREAD PANIC: (addjob) /usr/local/share/lua/5.1/torch/File.lua:107: Unwritable object

Shared tds.Vec gets a segfault when cunn is used

I need a background thread that will share values with the main one. Apparently, this works as long as I don't require 'cunn' (it's ok with cutorch). A snippet that reproduces this behaviour:

--require 'cunn'          -- uncommenting this crashes the program
local tds = require 'tds'                                 
local threads = require 'threads'                         

threads.Threads.serialization('threads.sharedserialize')  

local THREAD = {                                      
   val = tds.Vec(),                                       
   mutex = threads.Mutex(),                               
}                                                         
function initThread()                                  
   local val = THREAD.val                             
   local id = THREAD.mutex:id()                       
   THREAD.pool = threads.Threads(                     
         1,                                               
         function()                                       
            _threads = require 'threads'                  
            require 'tds'                                 
         end,                                             
         function()                                       
            mut = _threads.Mutex(id)                      
            output = val                                  
         end                                              
      )                                                   
   THREAD.pool:addjob(function()                      
      while true do                                       
         mut:lock()                                       
         output[1] = 10                                   
         mut:unlock()                                     
         os.execute('sleep 1')                            
      end                                                 
   end)                                                   
end                                                       

initThread()                                           
while true do                                             
   THREAD.mutex:lock()                                
   print('===> ', THREAD.val[1])                      
   THREAD.mutex:unlock()                              
   os.execute('sleep 1')                                  
end                                                       

This is the backtrace I got from gdb:

(gdb) bt                                                                                 
#0  0xeed09280 in tds_vec_set () from /home/ubuntu/torch/install/lib/lua/5.1/libtds.so   
#1  0x00062b38 in lj_vm_ffi_call ()                                                      
#2  0x0004149e in lj_cf_ffi_meta___call ()                                               
#3  0x00060a78 in lj_BC_FUNCC ()                                                         
#4  0x00054d4c in lua_pcall ()                                                           
#5  0xe6109746 in THThread_main () from /home/ubuntu/torch/install/lib/libthreadsmain.so 
#6  0xf76f4fbc in start_thread (arg=0xe0635460) at pthread_create.c:314                  
#7  0xf767a20c in ?? () at ../ports/sysdeps/unix/sysv/linux/arm/nptl/../clone.S:92       
   from /lib/arm-linux-gnueabihf/libc.so.6                                               
Backtrace stopped: previous frame identical to this frame (corrupt stack?)               

Threads:synchronize() causes tensor copy to fail

I want to copy tensors in batches of 5. I created a thread for each tensor copy. I call Threads:synchronize() to complete the end-callbacks (i.e. the copying) before further processing. But calling Threads:synchronize() always causes the copy to fail. Is there any way to ensure the end-callbacks are completed without calling synchronize()?

require("nn")
local threads = require 'threads'
local nthread = 4
local njob = 10
local batch_size = 5
local contexts_batch = torch.FloatTensor(batch_size,6):zero()

local pool = threads.Threads(
nthread,
function(threadid)
print('starting a new thread/state number ' .. threadid)
end
)
local c = 1
local jobdone = 0
for j=1,20 do
if c<= batch_size then
pool:addjob(
function()
local contexts = torch.FloatTensor(6)
contexts[1] = 2
local i = 0
while i < 5 do
contexts[i+2] = 1
i = i + 1
end
print (string.format('Contexts size %d, from task %d, thread ID is %x', contexts:size()[1], i,__threadid))
return contexts
end,

     function(contexts)
        print(string.format("Ending callback : task %d finished ", j ))
        contexts_batch[c]:copy(contexts)
        jobdone = jobdone + 1
     end
  )
  c = c+1

else
--pool:synchronize() <<<<This causes error
c = 1
end
end

pool:synchronize() <<<<This does not
print(string.format('%d jobs done', jobdone))
pool:terminate()

Error:
Ending callback : task 1 finished
Ending callback : task 2 finished
Ending callback : task 3 finished
Ending callback : task 4 finished
Ending callback : task 5 finished
/opt/torch/install/bin/luajit: /opt/torch/install/share/lua/5.1/threads/threads.lua:255:
[thread 1 endcallback] bad argument #2 to '?' (out of range at /opt/torch/pkg/torch/generic/Tensor.c:853)
[thread 2 endcallback] bad argument #2 to '?' (out of range at /opt/torch/pkg/torch/generic/Tensor.c:853)
[thread 1 endcallback] bad argument #2 to '?' (out of range at /opt/torch/pkg/torch/generic/Tensor.c:853)
[thread 2 endcallback] bad argument #2 to '?' (out of range at /opt/torch/pkg/torch/generic/Tensor.c:853)
[thread 1 endcallback] bad argument #2 to '?' (out of range at /opt/torch/pkg/torch/generic/Tensor.c:853)
stack traceback:
[C]: in function 'error'
/opt/torch/install/share/lua/5.1/threads/threads.lua:255: in function 'synchronize'

Threads and Queue serialization

It was reported by @simopal6 via gitter that it is currently not possible to pass a Threads object between threads. Simopal created a simple example that indirectly creates a reference to a Threads object via a reference to an outer local variable (upvalue).

It is currently possible to serialize/deserialize a queue pointer as a number via id = queue:id() and queue.new(id)... but unfortunately the same is not possible for a thread object. The thread object has an id() function, but there is no way to create a new thread object from an existing id.

@andresy Could we make Queue and Thread real torch-classes with custom read/write functions that pass on their pointer similar to how sharedserialize operates (serializing pointers)?

(By the way, in THThread.c#L118 there is a minor issue: the if(!self) return NULL; should immediately follow the malloc; otherwise this will likely cause a segfault in case malloc returned 0...)

threads.Threads(...) constructor hangs in low-memory conditions

If one of the threads in the pool fails to reach the main loop (https://github.com/torch/threads/blob/master/threads.lua#L70) because of failed memory allocations, the thread pool does not notice and still tries to drain all endcallbacks from the queue in the synchronize(...) call (https://github.com/torch/threads/blob/master/threads.lua#L258), which leads to a hang.

As a solution, one could split the code executed on the threads into two parts, initialization and the main loop, and then check whether any thread failed to complete the initialization part before accepting jobs.

What's the correct way to build this under Windows w/ MinGW?

In particular, how does my build environment need to be set up so that cmake finds what it needs?

Here's what's set up so far:

  • CMake 2.8.12.2
  • Mingw-64 gcc 4.9.1
  • luajit 2.1.0 built from luajit's repo
  • luarocks is set up

The corresponding paths for each of the above have been added to the Path environment variable.

Problems encountered so far:

  • Under MinGW, ssize_t is already defined in one of its headers. I fixed it by wrapping that typedef in an #ifndef guard in lib/TH/THGeneral.h.in.

  • The torch7 cmake file isn't properly locating my LuaJIT installation. I had to manually add

    FIND_PACKAGE(Lua51 REQUIRED)
    SET(LUA_INCDIR ${LUA_INCLUDE_DIR})
    SET(LUALIB ${LUA_LIBRARIES})
    

Even then, cmake couldn't locate my Lua headers directly; I had to manually set LUA_INCLUDE_DIR to G:/LuaJIT-2.1.0/include/luajit-2.1.

Torch7 finally builds after making the above changes. It produced 3 targets: libtorch.dll, libTH.dll and libluaT.dll and they all reside in G:/OSS/torch7/build. It's unclear to me if the directory layout of those target outputs and intermediate files is even correct. What should a properly installed directory tree look like so that outside projects can correctly and reliably find it as a dependency?

Since I'm originally trying to get torch-threads working, let's use that as an example. Running cmake over torch-threads gives me the following errors:

CMake Error at CMakeLists.txt:7 (find_package):
  By not providing "FindTorch.cmake" in CMAKE_MODULE_PATH this project has
  asked CMake to find a package configuration file provided by "Torch", but
  CMake did not find one.

  Could not find a package configuration file provided by "Torch" with any of
  the following names:

    TorchConfig.cmake
    torch-config.cmake

  Add the installation prefix of "Torch" to CMAKE_PREFIX_PATH or set
  "Torch_DIR" to a directory containing one of the above files.  If "Torch"
  provides a separate development package or SDK, be sure it has been
  installed.

Now I can point Torch_DIR at wherever TorchConfig.cmake exists, but that doesn't seem to be the right way to resolve things. Indeed, I tried exactly that, setting Torch_DIR to G:/OSS/torch7/build/cmake-exports, but that only gives me more errors like:

CMake Error at G:/OSS/torch7/build/cmake-exports/TorchConfig.cmake:24 (INCLUDE):
  include could not find load file:

    TorchPathsInit

Likewise for TorchPackage, TorchWrap and others.

I'd really like to not have to play whack-a-mole with the build process. What's the right approach to ensure I can build torch and torch-threads painlessly with the environment I have?

Thanks

Deadlock with 'sys' module

I know this isn't directly related to the 'threads' module, but threads certainly make this a heck of a lot more likely and I'm probably not the only one to have been bitten by this issue.

torch/sys#8

On certain machines, loading the 'sys' library inside a thread deadlocks during thread creation, which then deadlocks the whole thread pool.

Essentially, the local fix has been to avoid using 'sys' anywhere in a thread, though this is tricky because other libraries often depend on 'sys'.

shared atomic counter

Currently it's not possible to implement a counter shared by multiple threads without locking, which is too much overhead for a simple counter.

Would it make sense for this package to support a shared atomic counter? Storages and tds structures can already be shared; this would just be a single counter variable that is updated atomically.
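
In the meantime, something close can be approximated if your tds version ships tds.AtomicCounter; the inc()/get() calls below are assumptions about that API, and shared serialization is enabled so the counter crosses threads by pointer rather than by copy. A sketch:

local threads = require 'threads'
threads.serialization('threads.sharedserialize') -- pass pointers, not copies
local tds = require 'tds'

local counter = tds.AtomicCounter() -- assumed constructor
local pool = threads.Threads(4, function() require 'tds' end)

for i = 1, 100 do
   pool:addjob(function()
      counter:inc() -- assumed atomic increment, no explicit lock
   end)
end

pool:synchronize()
print(counter:get()) -- expected: 100
pool:terminate()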

Confusing behavior when upvalues are involved

Hi,

I found that the code below causes threads to crash with error: torch/File.lua:263: unknown Torch class < nn.Linear >

require 'nn'
local threads = require 'threads'
local model = nn.Linear(5, 10)
threads.Threads(
    2,
    function(idx)
        require 'nn'
        local myModel = model:clone()
    end
)

While this code, which according to the documentation is equivalent to the previous one, works:

require 'nn'
local threads = require 'threads'
local model = nn.Linear(5, 10)
threads.Threads(
    2,
    function(idx)
        require 'nn'
    end,
    function(idx)
        local myModel = model:clone()
    end
)

I looked into the details of the threads library's implementation. The reason is that the upvalue 'model' in the first piece of code cannot be deserialized.

In queue.lua:50:

local callback = serialize.load(self:callback(self.head))

The child thread deserializes the entire callback as a whole. It knows nothing about nn.Linear because require 'nn' hasn't been executed yet.

Maybe it's good to have a clarification in the documentation suggesting that all packages be required in a separate function, before any other functions passed to Threads, so that others won't waste their time.

Thanks.

Is it possible to launch another pool inside a thread?

See this piece of code:

local threads = require 'threads'
local parser = require("parser")["discounts"]
local config = require("lapis.config").get()

-- Returns paginator instance
local paginator = function()
   local model = require("models")["discounts"]
   return model:paginated(model:query())
end

-- Returns elasticsearch client
local elasticsearch = function()
   local elasticsearch = require "elasticsearch"
   return elasticsearch.client{
      hosts = config.elasticsearch.hosts
   }
end

local nthread = config.nthread
local pages = paginator():num_pages()
local pool = threads.Threads(
   nthread,
   function(threadid)
      paginated = paginator()
      client = elasticsearch()
      from_json = require("lapis.util").from_json
      es_type = paginated.model:table_name()
      print('starting a new thread/state number ' .. threadid)
   end
)

local jobdone = 0
for i = 1, pages do
   pool:addjob(
      function()
         local items = paginated:get_page(i)
         for i = 1, #items do
            local data, err = client:index({
               index = config.elasticsearch.index,
               type = es_type,
               id = items[i].id,
               body = parser(items[i])
            })
         end
         return __threadid
      end,
      function(id)
         jobdone = jobdone + 1
      end
   )
end

pool:synchronize()
print(string.format('%d jobs done', jobdone))
pool:terminate()

So, can I create another pool inside pool:addjob()?
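
Not an authoritative answer, but since every worker runs a full, independent Lua state, an inner pool created inside a job is at least plausible as long as 'threads' is required inside that state. An untested sketch:

-- Untested sketch: an inner pool built inside a job. 'threads' is
-- re-required in the worker's own state, and the outer job function
-- captures no non-serializable upvalues.
local threads = require 'threads'
local outer = threads.Threads(2)

outer:addjob(
   function()
      local threads = require 'threads' -- re-require in this worker state
      local inner = threads.Threads(2)
      local n = 0
      for i = 1, 4 do
         inner:addjob(function() return 1 end, function(x) n = n + x end)
      end
      inner:synchronize()
      inner:terminate()
      return n
   end,
   function(n) print(n .. ' inner jobs completed') end
)

outer:synchronize()
outer:terminate()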

Huge memory usage

I'm not sure if it is a bug, but the following code eats up all my memory (12 gigabytes) and I don't understand why:

local threads = require 'threads'
th = threads.Threads(8)
while true do
    th:addjob(function ()
        collectgarbage()
        print("Thread: "..(collectgarbage("count")/1024).." Mbytes")
        local d = torch.rand(10000,10000) -- should be 762 Mbytes
        collectgarbage()
        return d
    end)
    print("Main: "..(collectgarbage("count")/1024).." Mbytes")
    th:dojob()
    collectgarbage()
end

The amount of memory used depends on the number of threads (about 1.6 GB/thread). The garbage collector reports ~1 MB of memory for both the thread and the main block, but if I check with htop, I can see that the process eats up all the memory and gets killed by the kernel. Shouldn't about 800 MB of memory be enough for this code to run? Shouldn't dojob() free the memory required by the thread?

This is only a toy example; I want to create a loader that fetches and preprocesses data in the background, but it uses an enormous amount of memory even though the useful data fits in a much smaller space.
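
One thing worth trying, as a guess rather than a confirmed diagnosis: with the default serializer, every returned tensor is copied in full through per-thread serialization buffers, while shared serialization transfers tensors by pointer. A sketch of the loop body above with that switch:

local threads = require 'threads'
threads.serialization('threads.sharedserialize') -- tensors cross by pointer

local th = threads.Threads(8)
th:addjob(
   function()
      return torch.rand(10000, 10000) -- no serialize-buffer copy of the data
   end,
   function(d)
      -- consume d here, then drop the reference so it can be collected
      d = nil
      collectgarbage()
   end
)
th:synchronize()
th:terminate()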

dpnn module required error

Hello

I'm trying to run a torch model through DIGITS. I'm getting the following error:
ERROR: /usr/share/lua/5.1/trepl/init.lua:384: .../share/digits/digits/jobs/20160919-102150-25c4/model.lua:1: dpnn module required: luarocks install dpnn
The error comes from the require function in the model.
I already installed dpnn with luarocks install dpnn and gave the Lua folder the permissions needed to access it.

This is the model
https://github.com/NVIDIA/DIGITS/blob/master/examples/text-classification/text-classification-model.lua

luarocks list
Warning: Failed loading manifest for /home/gonzalo/.luarocks/lib/luarocks/rocks: /home/gonzalo/.luarocks/lib/luarocks/rocks/manifest: No such file or directory
Installed rocks:

argcheck
scm-1 (installed) - /home/gonzalo/torch/install/lib/luarocks/rocks

cudnn
scm-1 (installed) - /home/gonzalo/torch/install/lib/luarocks/rocks

cunn
scm-1 (installed) - /home/gonzalo/torch/install/lib/luarocks/rocks

cutorch
scm-1 (installed) - /home/gonzalo/torch/install/lib/luarocks/rocks

cwrap
scm-1 (installed) - /home/gonzalo/torch/install/lib/luarocks/rocks

dok
scm-1 (installed) - /home/gonzalo/torch/install/lib/luarocks/rocks

dpnn
scm-1 (installed) - /home/gonzalo/torch/install/lib/luarocks/rocks

env
scm-1 (installed) - /home/gonzalo/torch/install/lib/luarocks/rocks

gnuplot
scm-1 (installed) - /home/gonzalo/torch/install/lib/luarocks/rocks

graph
scm-1 (installed) - /home/gonzalo/torch/install/lib/luarocks/rocks

hdf5
0-0 (installed) - /home/gonzalo/torch/install/lib/luarocks/rocks

image
1.1.alpha-0 (installed) - /home/gonzalo/torch/install/lib/luarocks/rocks

lightningmdb
0.9.18.2-1 (installed) - /home/gonzalo/torch/install/lib/luarocks/rocks

lpeg
1.0.0-1 (installed) - /home/gonzalo/torch/install/lib/luarocks/rocks

lua-cjson
2.1devel-1 (installed) - /home/gonzalo/torch/install/lib/luarocks/rocks

lua-pb
scm-0 (installed) - /home/gonzalo/torch/install/lib/luarocks/rocks

luaffi
scm-1 (installed) - /home/gonzalo/torch/install/lib/luarocks/rocks

luafilesystem
1.6.3-1 (installed) - /home/gonzalo/torch/install/lib/luarocks/rocks

moses
1.4.0-1 (installed) - /home/gonzalo/torch/install/lib/luarocks/rocks

nccl
scm-1 (installed) - /home/gonzalo/torch/install/lib/luarocks/rocks

nn
scm-1 (installed) - /home/gonzalo/torch/install/lib/luarocks/rocks

nngraph
scm-1 (installed) - /home/gonzalo/torch/install/lib/luarocks/rocks

nnx
0.1-1 (installed) - /home/gonzalo/torch/install/lib/luarocks/rocks

optim
1.0.5-0 (installed) - /home/gonzalo/torch/install/lib/luarocks/rocks

paths
scm-1 (installed) - /home/gonzalo/torch/install/lib/luarocks/rocks

penlight
scm-1 (installed) - /home/gonzalo/torch/install/lib/luarocks/rocks

qtlua
scm-1 (installed) - /home/gonzalo/torch/install/lib/luarocks/rocks

qttorch
scm-1 (installed) - /home/gonzalo/torch/install/lib/luarocks/rocks

struct
1.4-1 (installed) - /home/gonzalo/torch/install/lib/luarocks/rocks

sundown
scm-1 (installed) - /home/gonzalo/torch/install/lib/luarocks/rocks

sys
1.1-0 (installed) - /home/gonzalo/torch/install/lib/luarocks/rocks

tds
scm-1 (installed) - /home/gonzalo/torch/install/lib/luarocks/rocks

threads
scm-1 (installed) - /home/gonzalo/torch/install/lib/luarocks/rocks

torch
scm-1 (installed) - /home/gonzalo/torch/install/lib/luarocks/rocks

totem
0-0 (installed) - /home/gonzalo/torch/install/lib/luarocks/rocks

trepl
scm-1 (installed) - /home/gonzalo/torch/install/lib/luarocks/rocks

xlua
1.0-0 (installed) - /home/gonzalo/torch/install/lib/luarocks/rocks

How can I solve this?
In the terminal I've done:

th

  ______             __   |  Torch7
 /_  __/__  ________/ /   |  Scientific computing for Lua.
  / / / _ \/ __/ __/ _ \  |  Type ? for help
 /_/  \___/_/  \__/_//_/  |  https://github.com/torch
                          |  http://torch.ch

th> require('dpnn')
{
  bigtest : function: 0x410fce68
  test : function: 0x410fce20
  version : 2
}
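
Not a fix, but a way to narrow it down: DIGITS may run a different interpreter or environment than the terminal th, so a rock visible in one can be invisible in the other. A small diagnostic sketch to put at the top of model.lua, before the failing require:

-- Debug sketch: print which Lua and which search paths this process uses;
-- compare against the same prints inside `th`.
print(_VERSION)
print(package.path)
print(package.cpath)
local ok, res = pcall(require, 'dpnn')
print('dpnn loadable:', ok)
if not ok then print(res) end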

addjob() blocking even if the queue is not full?

I'm currently writing a small threaded application, and I ran into an issue using torch.threads. Basically, my code can be reduced to this:

tpool:addjob(threadFunc)
-- Do another long task here
-- ...
while currentPos <= shuffledIdx:storage():size() do
    att = sys.clock()
    tpool:synchronize()
    print("Time elapsed waiting for synchronize : " .. sys.clock() - att)

    local dataIn = threadBufferIn[{{1, bsize}, {}, {}, {}}]:clone()
    local dataOut = threadBufferOut[{{1, bsize}, {}, {}, {}}]:clone()

    currentPos = currentPos + bsize
    if currentPos <= shuffledIdx:storage():size() then
        att2 = sys.clock()
        print(tpool:acceptsjob())
        tpool:addjob(threadFunc)
        print("Time elapsed waiting for addjob : " .. sys.clock() - att2)
    end

    --  Do another long task in the main thread
end

where, of course, tpool is a well-defined thread pool; I don't use this pool anywhere else in my code. Now, I get that tpool:synchronize() should block until the "threadFunc" function is done, so the time elapsed in synchronize could be > 0. But here is the output I get:

Time elapsed waiting for synchronize : 9.9897384643555e-05
true
Time elapsed waiting for addjob : 2.5406899452209

I also print the return of acceptsjob() (which is true), and I tried printing the return value of hasjob() as well, which is always false at this point. According to the documentation, this should indicate that the corresponding queue is not full, and thus that addjob will not block. Then why does it hang there for more than 2 seconds? I understand why synchronize blocks (and it indeed does a few times), but addjob?
Is there a way for me to gather more debug information?
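
One hypothesis worth checking, based on how the package serializes jobs rather than on anything confirmed here: addjob serializes the callback together with all of its upvalues before queuing it, so if threadFunc captures large tensors such as threadBufferIn/threadBufferOut, that copy alone could account for the missing seconds. torch.serialize gives a rough, comparable measure of the cost:

-- Sketch: estimate what addjob pays to serialize the callback. threads
-- uses its own serializer, but torch.serialize is a comparable proxy.
local sys = require 'sys'

local t = sys.clock()
local blob = torch.serialize(threadFunc) -- the job callback from above
print(string.format('serialize: %.3f s, %.1f MB',
                    sys.clock() - t, #blob / 2^20))

If that turns out to be the bottleneck, enabling threads.serialization('threads.sharedserialize') before creating the pool would avoid copying the tensor data.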

Incompatible with Lua 5.3

Builds, but does not import. (luaL_checkint is a deprecated compatibility macro that Lua 5.3 removed, hence the missing symbol below.)

+ lua -e 'require '\''threads'\'''
lua: error loading module 'libthreads' from file '/Users/awiltschko/anaconda/envs/_test/lib/lua/5.3/libthreads.so':
    dlopen(/Users/awiltschko/anaconda/envs/_test/lib/lua/5.3/libthreads.so, 6): Symbol not found: _luaL_checkint
  Referenced from: /Users/awiltschko/anaconda/envs/_test/lib/lua/5.3/libthreads.so
  Expected in: flat namespace
 in /Users/awiltschko/anaconda/envs/_test/lib/lua/5.3/libthreads.so
stack traceback:
    [C]: in ?
    [C]: in function 'require'
    ...schko/anaconda/envs/_test/share/lua/5.3/threads/init.lua:3: in main chunk
    [C]: in function 'require'
    (command line):1: in main chunk
    [C]: in ?

FATAL THREAD PANIC with upvalues

Hello dear developers.

I ran into a strange problem with upvalues and got FATAL THREAD PANIC: (addjob).

I've tried to use upvalues in a callback registered via threads:addjob. It works for numbers and strings, but fails with the same error for tables (_G, or logger = logging.rolling_file() for example).
Transferring tables through the callback's arguments fails too, because of serialization I think.
Any require inside a callback results in the same error. For example:

local threads = require 'threads'

local pool = threads.Threads( 5, function()
    print( "Threadinit is initialized and started " .. __threadid )
end )

local port = 234
local worker2 = function ( ... )
    require( "logging.rolling_file" )
    local p = port

    print( "Thread is initialized and started. Port: " .. p )

    result = 0
    while result < 100000000 do
        result = result + 1
    end
    return result
end

pool:addjob( worker2, function ( res ) print( "result: " .. res ) end )

Running failed with error:

Threadinit is initialized and started 4 
Threadinit is initialized and started 1 
Threadinit is initialized and started 3 
Threadinit is initialized and started 2 
Threadinit is initialized and started 5 
FATAL THREAD PANIC: (addjob) /home/test/torch/install/share/lua/5.1/torch/File.lua:107: Unwritable object <function>    

Your example also doesn't work for me:

nn = require "nn"
local threads = require 'threads'
local model = nn.Linear(5, 10)

-- create pool of threads
--local pool = threads.Threads( npool, trheadinit2 )
threads.Threads(
    2,
    function(idx)                      -- This code is OK.
        require 'nn'
    end,                               -- child threads know nn.Linear when deserializing f2
    function(idx)
        local myModel = model:clone()  -- because f1 has already been executed
    end
)

Running failed with error:

FATAL THREAD PANIC: (addjob) /home/test/torch/install/share/lua/5.1/torch/File.lua:107: Unwritable object <function>    

Please help me find the reason for the failure.
I use an updated Torch (Nov 7 2015) with LuaJIT v2.0.4 on an Ubuntu 14.04 VM.
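
In case it helps: torch serialization can write pure-Lua functions and plain data, but not C functions or tables that contain them (loggers, _G), which matches the "Unwritable object <function>" message for the table cases above. A workaround sketch for those cases; the rolling_file constructor call and its arguments are placeholders, not the logging library's confirmed API: build non-serializable objects inside each thread, and capture only plain values as upvalues.

local threads = require 'threads'

-- Build the logger in each worker's own state instead of capturing it
-- from the parent as an upvalue.
local pool = threads.Threads(5, function()
   local logging = require 'logging.rolling_file'
   -- assuming the require returns the appender constructor; arguments
   -- below are placeholders
   logger = logging('worker.log', 1024 * 1024, 5)
end)

local port = 234 -- plain values serialize fine as upvalues
pool:addjob(
   function()
      logger:info('worker on port ' .. port) -- logger resolved in-thread
      return port
   end,
   function(res) print('result: ' .. res) end
)

pool:synchronize()
pool:terminate()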
