squadrick / shadesmar

Fast C++ IPC using shared memory

License: MIT License

CMake 0.13% C++ 99.27% Shell 0.08% Python 0.51%
interprocess-communication publish-subscribe shared-memory high-performance cpp rpc ipc

shadesmar's Introduction

Shadesmar

An IPC library that uses the system's shared memory to pass messages. Supports publish-subscribe and RPC.

Requires: Linux and x86. Caution: Alpha software.

Features

  • Multiple subscribers and publishers.

  • Uses a circular buffer to pass messages between processes.

  • Faster than using the network stack. High throughput, low latency for large messages.

  • Decentralized, without resource starvation.

  • Minimize or optimize data movement using custom copiers.

Usage

There's a single header file generated from the source code which can be found here.

If you want to generate the single header file yourself, clone the repo and run:

$ cd shadesmar
$ python3 simul/simul.py

This will generate the file in include/.


Publish-Subscribe

Publisher:

#include <shadesmar/pubsub/publisher.h>

#include <cstdint>
#include <cstdlib>

int main() {
    shm::pubsub::Publisher p("topic_name");
    const uint32_t data_size = 1024;
    void *data = malloc(data_size);

    for (int i = 0; i < 1000; ++i) {
        p.publish(data, data_size);
    }

    free(data);
}

Subscriber:

#include <shadesmar/pubsub/subscriber.h>

void callback(shm::memory::Memblock *msg) {
  // `msg->ptr` to access `data`
  // `msg->size` to access `data_size`

  // The memory will be free'd at the end of this callback.
  // Copy to another memory location if you want to persist the data.
  // Alternatively, if you want to avoid the copy, you can call
  // `msg->no_delete()` which prevents the memory from being deleted
  // at the end of the callback.
}

int main() {
    shm::pubsub::Subscriber sub("topic_name", callback);

    // Using `spin_once` with a manual loop
    while(true) {
        sub.spin_once();
    }
    // OR
    // Using `spin`
    sub.spin();
}

RPC

Client:

#include <shadesmar/rpc/client.h>

int main() {
  shm::rpc::Client client("channel_name");
  shm::memory::Memblock req, resp;
  // Populate req.
  client.call(req, &resp);
  // Use resp here.

  // resp needs to be explicitly free'd.
  client.free_resp(&resp);
}

Server:

#include <shadesmar/rpc/server.h>

bool callback(const shm::memory::Memblock &req,
              shm::memory::Memblock *resp) {
  // resp->ptr is a void ptr, resp->size is the size of the buffer.
  // You can allocate memory here, which can be free'd in the clean-up lambda.
  return true;
}

void clean_up(shm::memory::Memblock *resp) {
  // This function is called *after* the callback is finished. Any memory
  // allocated for the response can be free'd here. A different copy of the
  // buffer is sent to the client, this can be safely cleaned.
}

int main() {
  shm::rpc::Server server("channel_name", callback, clean_up);

  // Using `serve_once` with a manual loop
  while(true) {
    server.serve_once();
  }
  // OR
  // Using `serve`
  server.serve();
}

shadesmar's People

Contributors

shrijitsingh99, squadrick, sudo-panda, xiaopeifeng

shadesmar's Issues

[memory] Auto flush

With the merging of #40, we are one step closer to having a uniform circular buffer structure in shared memory. The only change from one topic/channel to another will be the allocation buffer.

Introduce a lockless set in the circular buffer to store the PIDs of all participating processes. During the initialization of memory, where we "map" the shared memory into the current process, check whether other processes exist.

This removes the need to flush the shared memory mappings after each run.
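
A minimal sketch of such a lockless PID set, assuming a fixed maximum number of participants and that the set lives at a known offset inside the shared segment (PidSet, insert, and any_alive are illustrative names, not part of the library):

#include <signal.h>
#include <sys/types.h>

#include <array>
#include <atomic>

// Fixed-capacity, lock-free set of participant PIDs, intended to live
// inside the shared-memory segment itself.
struct PidSet {
  static constexpr int kMaxProcs = 64;
  std::array<std::atomic<pid_t>, kMaxProcs> pids{};

  // Register the calling process; returns false if the set is full.
  bool insert(pid_t pid) {
    for (auto &slot : pids) {
      pid_t expected = 0;
      if (slot.compare_exchange_strong(expected, pid)) return true;
    }
    return false;
  }

  // True if any registered process other than `self` is still alive.
  bool any_alive(pid_t self) const {
    for (const auto &slot : pids) {
      pid_t pid = slot.load();
      // kill(pid, 0) checks for existence without sending a signal.
      if (pid != 0 && pid != self && kill(pid, 0) == 0) return true;
    }
    return false;
  }
};

During initialization, the mapping process would call insert(getpid()) and only flush/reset the buffer if any_alive(getpid()) returns false.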

just some questions

Every instance running shadesmar shares the same events base, correct? Is there no way to open a different channel, like with Unix sockets?
What about instances running inside containers (Docker): will they share the same events base too?

Thanks

[memory] Make Memory dynamic

Currently, Memory has the queue size (queue_size) as a template parameter, which must be specified at compile time. This forces all other classes (PublisherBase, SubscriberBase) that rely on Memory to be templated, and we need to include the implementation in the header file.

The solution would be to make queue_size a regular parameter that can be passed to the constructor of Memory, and refactor all the code into .h and .cpp files where possible.

The dynamic allocation of shared memory will be an issue. We need to do some pointer mangling to prevent multiple shared memory segments.
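
A rough sketch of the intended change (member names here are illustrative, not the actual layout):

#include <cstddef>
#include <string>

// Before: the queue size is a compile-time template parameter.
//   template <uint32_t queue_size> class Memory { ... };

// After: the queue size is a runtime constructor argument, so classes that
// depend on Memory no longer need to be templated on it, and their
// implementations can move into .cpp files.
class Memory {
 public:
  Memory(const std::string &name, size_t queue_size)
      : name_(name), queue_size_(queue_size) {}

  size_t queue_size() const { return queue_size_; }

 private:
  std::string name_;
  size_t queue_size_;
};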

use protobuf for big message

I tried to use protobuf to convey a big message, like a point cloud.
When the message being conveyed is bigger, the publish program can run, but the subscriber program gets stuck after running for a period of time.

Subscriber code:

#include <shadesmar/pubsub/subscriber.h>
#include <proto/cloud.pb.h>
#include <string.h>

#include <chrono>
#include <iostream>
#include <thread>

void callback(shm::memory::Memblock *msg) {
  // `msg->ptr` to access `data`
  // `msg->size` to access `data_size`
  std::cout << "start call back" << std::endl;
  shadesmar::TanwyProtoCloud tw_proto_point_cloud;

  auto start = std::chrono::high_resolution_clock::now();
  if (!tw_proto_point_cloud.ParseFromArray((char*)msg->ptr, msg->size)) {
    std::cerr << "Failed to deserialize message." << std::endl;
    return;
  }
  // Get the end time point
  auto end = std::chrono::high_resolution_clock::now();

  // Compute the elapsed time
  auto duration = std::chrono::duration_cast<std::chrono::microseconds>(end - start);
  // Print the timing result
  std::cout << "deserialize Time taken: " << duration.count() << " microseconds" << std::endl;

  std::cout << "msg->size " << msg->size << std::endl;
  // std::cout << "receive msg size " << serialized_data.size() << std::endl;
  std::cout << "name: " << tw_proto_point_cloud.name() << std::endl;
  std::cout << "frame_id: " << tw_proto_point_cloud.frame_id() << std::endl;
  

  float* points = (float*)tw_proto_point_cloud.points().c_str();
  unsigned int* points_time = (unsigned int*)tw_proto_point_cloud.points_time().c_str();

  int num_points = tw_proto_point_cloud.num_points();
  int point_num_fields = tw_proto_point_cloud.point_num_fields();
  int time_num_fields = tw_proto_point_cloud.time_num_fields();

  std::cout << "num_points: " << num_points << std::endl;
  std::cout << "point_num_fields: " << point_num_fields << std::endl;
  std::cout << "time_num_fields: " << time_num_fields << std::endl;

}

int main() {
  shm::pubsub::Subscriber sub("test", callback);
  
  sub.spin();
}

Publisher code:

#include <iostream>
#include <mutex>
#include <thread>
#include <vector>

#include <shadesmar/pubsub/publisher.h>
#include <proto/cloud.pb.h>
#include <chrono>

int main(){

  std::string topic = "test";
  shm::pubsub::Publisher pub(topic);

  shadesmar::TanwyProtoCloud tw_proto_point_cloud;
  std::string name = "tanway_cloud";
  int frame_id = 1;
  double timestamp = 3.1333333;
  int point_num_fields = 4;
  int num_points = 100000;
  int time_num_fields = 2;
  float* points = new float[num_points * point_num_fields];
  int* points_time = new int[num_points * time_num_fields];
  for (int i=0;i<num_points;i++){
    for (int j = 0; j < point_num_fields;j++){
      points[i * point_num_fields + j] = float(i) + ((float)j)/10;
    }
    for (int k = 0; k < 2; k++) {
      points_time[i * 2 + k] = i * 2 + k;
    }
  }

  tw_proto_point_cloud.set_name(name);
  tw_proto_point_cloud.set_frame_id(frame_id);
  tw_proto_point_cloud.set_timestamp(timestamp);
  tw_proto_point_cloud.set_point_num_fields(point_num_fields);
  tw_proto_point_cloud.set_num_points(num_points);
  tw_proto_point_cloud.set_time_num_fields(time_num_fields);
  tw_proto_point_cloud.set_points((void*)points, num_points * point_num_fields * sizeof(float));
  tw_proto_point_cloud.set_points_time((void*)points_time, num_points * time_num_fields * sizeof(unsigned int));
  

  for (int i=0;i<500;i++) {
    frame_id = i;
    tw_proto_point_cloud.set_frame_id(frame_id);
    int serialized_size = tw_proto_point_cloud.ByteSizeLong();
    char* buffer = new char[serialized_size];

    if (!tw_proto_point_cloud.SerializeToArray(buffer, serialized_size)) {
      std::cerr << "Failed to serialize message." << std::endl;
      // delete[] buffer;
      // return -1;
    }

    std::this_thread::sleep_for(std::chrono::milliseconds(50));
    pub.publish(reinterpret_cast<void*>(buffer), serialized_size);
    std::cerr << "publish : "<< i << std::endl;
    delete[] buffer;
  }
}

Proto message definition:

syntax = "proto2";

package shadesmar;

message TanwyProtoCloud{
    required string name = 1;
    required int32 frame_id = 2;
    required double timestamp = 3;
    required int32 point_num_fields = 4;
    required int32 num_points = 5;
    required bytes points = 6;
    required int32 time_num_fields = 7;
    required bytes points_time = 8;
}

a naive design for shared memory

I read your code; it's really a masterpiece.

But could you have a look at my design? It's naive and for a very simple purpose:

multi-producer, multi-consumer shared memory that avoids CPU busy-waiting (so it uses a condition_variable).

I have a header with a condition_variable and a pthread_mutex_t, created when the shared memory is allocated.

// header.hpp
  #pragma once
  #include <sys/ipc.h>
  #include <sys/shm.h>
  #include <sys/types.h>
  #include <semaphore.h>
  #include <pthread.h>
  #include <stdio.h>
  #include <string.h>
  #include <error.h>
  #include <errno.h>
  #include <fcntl.h>
  #include <unistd.h>
  #include <sys/sysinfo.h>
  #include <atomic>
  #include <memory>
  #include <mutex>
  #include <condition_variable>
  #include <iostream>
  using namespace std;
  
  struct Header {
    const size_t size_;
    std::atomic_int32_t tail_;
    std::mutex cv_mut_;
    std::condition_variable cv_;
    pthread_mutex_t mut_;
    Header(size_t size, int32_t tail) : size_(size), tail_(tail) {
      pthread_mutexattr_t mutexattr;
      pthread_mutexattr_init(&mutexattr);
      pthread_mutexattr_setpshared(&mutexattr, PTHREAD_PROCESS_SHARED);
      pthread_mutexattr_setrobust(&mutexattr, PTHREAD_MUTEX_ROBUST);
      pthread_mutex_init(&mut_, &mutexattr); 
    }
  };

I also have a ShmWorker which is used to init and allocate memory. The only thing it does is init: create the segment, or connect to an existing one.

  // shm_worker.hpp
 #pragma once
 #include "./header.hpp"
 
 class ShmWorker {
  public:
   ShmWorker() : is_init(false) {}
 
   virtual ~ShmWorker() {
     shmdt(m_data);
     if (create_new) shmctl(shmid, IPC_RMID, 0); 
   }
 
  protected:
   key_t get_keyid(const std::string & name) {
     const std::string & s = "./" + name;
     if (access(s.c_str(), F_OK) == -1) { const std::string & s1 = "touch " + s; system(s1.c_str()); }
     key_t semkey = ftok(name.c_str(), 1); 
     if (semkey == -1) { printf("shm_file:%s not existed\n", s.c_str()); exit(1); }
     return semkey;
   }
 
   template <typename T>
   void init(const std::string& name, int size) {  // one word to conclude: if not existed, create, else connect
     if (is_init) { printf("this shmworker has beed inited!\n"); return; }
     m_key = get_keyid(name);
     shmid = shmget(m_key, 0, 0); 
     if (shmid == -1) {
       if (errno == ENOENT || errno == EINVAL) {
         shmid = shmget(m_key, sizeof(Header) + sizeof(T) * size, 0666 | IPC_CREAT | O_EXCL);
         if (shmid == -1) { printf("both connet and create are failed for shm\n"); exit(1); }
         printf("creating new shm %s\n", name.c_str());
         create_new = true;
         m_data = (char*)shmat(shmid, NULL, 0); 
         Header* header = new Header(size, 0); 
         memcpy(m_data, header, sizeof(Header));
       } else {
         exit(1);
       }   
     } else {
       m_data = (char*)shmat(shmid, 0, 0); 
       // m_size = reinterpret_cast<std::atomic_int*>(m_data)->load();
     }   
 
     if (m_data == (char*)(-1)) { perror("shmat"); exit(1); }
     is_init = true;
   }
 
   int m_key;
   int shmid;
   char* m_data;
   bool is_init;
   bool create_new = false;
 };

The producer code is:

#pragma once
 #include <mutex>
 #include <fcntl.h>
 #include <fstream>
 #include "./shm_worker.hpp"
 
 template <typename T>
 class ShmSender: public ShmWorker {
  public:
   ShmSender(const std::string& key, int size = 4096) {
     init <T> (key, size);
     header_ = (Header*)m_data;
   }
 
   virtual ~ShmSender() = default;
 
   void Send(const T& shot) {
     // pthread_mutex_lock(&header_->mut_);
     {   
       std::lock_guard<std::mutex> lk(header_->cv_mut_);
       memcpy(m_data + sizeof(Header) + header_->tail_.load() % header_->size_ * sizeof(T), &shot, sizeof(T));
       header_->tail_.fetch_add(1);
     }   
     // pthread_mutex_unlock(&header_->mut_);
     header_->cv_.notify_all();
   }
 
  private:
   Header * header_;
 };

The consumer code is:

#pragma once
 #include <unistd.h>
 #include "./shm_worker.hpp"
 
 template <typename T>
 class ShmRecver : public ShmWorker {
  public:
   ShmRecver(const std::string & key, int size = 4096) {
     init <T> (key, size);
     header_ = (Header*)m_data;
     read_index = header_->tail_.load();
   }
 
   virtual ~ShmRecver() {}
 
   void Recv(T& t) {
     std::unique_lock<std::mutex> lk(header_->cv_mut_);
     header_->cv_.wait(lk, [&] { return read_index != header_->tail_; });  // wait tail changed by Sender
     t = *(T*)(m_data + sizeof(*header_) + (read_index ++));
   }
 
  private:
   int read_index;
   Header* header_;
 };

But when I call sender.Send(), the recver.Recv() does not respond.

It seems the condition_variable's notify_all didn't wake up the Recver process's wait.

Could you help with this?

It would be great if you could kindly give some advice about this design.
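
For reference, two common pitfalls with this kind of design are that (a) std::mutex and std::condition_variable are not guaranteed to work across processes even when placed in shared memory, and (b) constructing the Header on the heap and memcpy-ing it into the segment copies objects that are not trivially copyable. A hedged sketch of a process-shared header using pthread primitives constructed in place (illustrative only, not shadesmar's implementation):

#include <pthread.h>

#include <atomic>
#include <cstddef>
#include <cstdint>

// Header placed at the start of the shared segment. Both the mutex and the
// condition variable are marked PTHREAD_PROCESS_SHARED so that waits and
// notifies are visible across process boundaries.
struct SharedHeader {
  size_t size_;
  std::atomic_int32_t tail_;
  pthread_mutex_t mut_;
  pthread_cond_t cond_;

  SharedHeader(size_t size, int32_t tail) : size_(size), tail_(tail) {
    pthread_mutexattr_t mattr;
    pthread_mutexattr_init(&mattr);
    pthread_mutexattr_setpshared(&mattr, PTHREAD_PROCESS_SHARED);
    pthread_mutex_init(&mut_, &mattr);
    pthread_mutexattr_destroy(&mattr);

    pthread_condattr_t cattr;
    pthread_condattr_init(&cattr);
    pthread_condattr_setpshared(&cattr, PTHREAD_PROCESS_SHARED);
    pthread_cond_init(&cond_, &cattr);
    pthread_condattr_destroy(&cattr);
  }
};

// Construct in place inside the mapped segment instead of memcpy-ing:
//   auto *header = new (m_data) SharedHeader(size, 0);

// Waiting side: lock, re-check the predicate, block on the condvar.
inline void wait_for_new_tail(SharedHeader *hdr, int32_t last_seen) {
  pthread_mutex_lock(&hdr->mut_);
  while (hdr->tail_.load() == last_seen) {
    pthread_cond_wait(&hdr->cond_, &hdr->mut_);
  }
  pthread_mutex_unlock(&hdr->mut_);
}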

Add support for custom copy functions

Currently, we default to using std::memcpy for transfers between CPU processes. We should change this to allow users to pass custom copying semantics, for two reasons:

  1. Adding new device support: CPU->GPU or vice-versa. This should be easy in the case of CUDA since cudaMemcpy operates on regular C-style pointers. OpenCL support would be much tougher.
  2. Different copy mechanisms. See (include/memory/dragons.h).

We can define an abstract class called Copier with the following definition:

class Copier {
public:
  // copy from shared to process memory
  virtual void shm_to_user(void*, void*, size_t) = 0;

  // copy from process to shared memory
  virtual void user_to_shm(void*, void*, size_t) = 0;
};

An example of how to implement current behaviour:

class Memcpy : public Copier {
public:
  Memcpy() {}

  void shm_to_user(void* shm_ptr, void* ptr, size_t size) override {
    std::memcpy(ptr, shm_ptr, size);
  }

  void user_to_shm(void* ptr, void* shm_ptr, size_t size) override {
    std::memcpy(shm_ptr, ptr, size);
  }
};

Usage: We can pass a pointer to a copier object to the constructor of either pubsub::{Publisher,Subscriber}Bin.

auto memcpy = Memcpy();
shm::pubsub::PublisherBin<16> pub("topic_name", &memcpy);
shm::pubsub::SubscriberBin<16> sub("topic_name", callback, &memcpy);

Change: The constructor of {Publisher,Subscriber}Bin will optionally accept Copier*:

PublisherBin(std::string topic_name, Copier*);
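
As an illustration of point 1 above, a GPU-aware copier might look like the following sketch (CudaCopier is not part of shadesmar; error handling is omitted, and it assumes the user-side pointer is a cudaMalloc'd device pointer):

#include <cuda_runtime.h>

#include <cstddef>

// Copies between the shared-memory segment (host) and a CUDA device buffer.
class CudaCopier : public Copier {
public:
  void shm_to_user(void *shm_ptr, void *ptr, size_t size) override {
    cudaMemcpy(ptr, shm_ptr, size, cudaMemcpyHostToDevice);
  }

  void user_to_shm(void *ptr, void *shm_ptr, size_t size) override {
    cudaMemcpy(shm_ptr, ptr, size, cudaMemcpyDeviceToHost);
  }
};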

Refactor the binary and serialized versions

Right now, particularly for pubsub, the binary vs serialized versions are a convoluted mess. Ideally, all the logic would live in the binary version, and the serialized version would simply call its binary counterpart after calling the msgpack APIs.

Similarly for the RPC mechanism.
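
A sketch of the intended layering, assuming a binary publish(void *, size_t) to delegate to (publish_serialized is an illustrative wrapper name, not an existing API):

#include <msgpack.hpp>

// Serialize with msgpack, then hand the raw bytes to the binary code path.
// All msgpack-specific logic stays in this thin wrapper.
template <typename T, typename BinaryPublisher>
bool publish_serialized(BinaryPublisher *pub, const T &msg) {
  msgpack::sbuffer buf;
  msgpack::pack(buf, msg);
  return pub->publish(buf.data(), buf.size());
}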

Clarification about RobustLock::lock

void RobustLock::lock() {
  while (!mutex_.try_lock()) {
    if (exclusive_owner.load() != 0) {
      auto ex_proc = exclusive_owner.load();
      if (proc_dead(ex_proc)) {
        // Here, I thought, ex_proc is always equal to exclusive_owner.
        // Why not remove the compare_exchange_strong check?
        if (exclusive_owner.compare_exchange_strong(ex_proc, 0)) {
          mutex_.unlock();
          continue;
        }
      }
    } else {
      prune_readers();
    }

    std::this_thread::sleep_for(std::chrono::microseconds(1));
  }
  exclusive_owner = getpid();
}

Support zero-copy communication

Here's one way to achieve this:

Publisher p("topic");
void *ptr = p.get_msg(size);

ptr is allocated in the shared memory (using Allocator) and given to the user. We also assign an Element in the shared queue to ptr. We hold a writer lock on this element until ptr is published. We may need to update the base Element to add an extra field: is_zero_copied, so that the consumer can react accordingly.

auto obj = new (ptr) SomeClass( /* params */);
// update obj
p.zero_copy_publish(ptr); // releases the shared queue element lock

On the consumer side, we'll return a subclass of Memblock: ZeroCopyMemblock which will not have a no_delete() and will be deallocated at the end of the callback. We'll need to check the logic for locks as well.

Code path for copied-communication:

// element is the currently accessed shared queue position
Memblock memblock;
element.lock.acquire();
memcpy(element.ptr, memblock.ptr, element.size);
memblock.size = element.size;
element.lock.release();

callback(memblock);

if (memblock.should_free) {
   delete memblock;
}

New code path for zero-copy communication:

element.lock.acquire();
callback(ZeroCopyMemblock{element.ptr, element.size});
element.lock.release();

allocator.dealloc(element);

The above has been shown for pub-sub, but it can be extended to RPC too.


Here's a problem: we can't free each message pointer independently. A message pointer can only be freed after all preceding message allocations are released, due to the way Allocator works: it uses a strictly FIFO allocation strategy. For performance, we may want to consider moving to a more complex general-purpose allocator.

NOTE: Writing a general-purpose allocator to work on a single chunk of shared memory is very error-prone.

Cannot compile lib on Windows and Linux platforms with CMakeLists.txt

Here is the error log on Windows:

CMake Error at CMakeLists.txt:9 (find_package):
By not providing "Findbenchmark.cmake" in CMAKE_MODULE_PATH this project
has asked CMake to find a package configuration file provided by
"benchmark", but CMake did not find one.

Could not find a package configuration file provided by "benchmark" with
any of the following names:

benchmarkConfig.cmake
benchmark-config.cmake

Add the installation prefix of "benchmark" to CMAKE_PREFIX_PATH or set
"benchmark_DIR" to a directory containing one of the above files. If
"benchmark" provides a separate development package or SDK, be sure it has
been installed.

std::filesystem namespace conflict

When trying to build shadesmar tests this error pops up

../include/shadesmar/memory/tmp.h:41:48: error: ‘namespace std::filesystem = std::experimental::std::experimental::filesystem;’ conflicts with a previous declaration

The previous declaration is in chrono header.

Any example publishing a char *?

First, I'm new to C++, but I'm trying to port this library to Node.js.
This is a part of my code:

     Napi::Buffer<char> buff = info[1].As<Napi::Buffer<char>>();          //nodejs buffer
     const uint32_t data_size = buff.Length();
     char * word = buff.Data();        
     shm::memory::DefaultCopier cpy;
     shm::pubsub::Publisher pub = shm::pubsub::Publisher("topic_example", &cpy);
     pub.publish(reinterpret_cast<void *>(word), data_size);

But I'm getting this error on publish method: free(): invalid pointer

Performance (at least compared to nanomsg)

I wonder how this performs compared to nanomsg (zero-copy, very minimal, fully scalable, ...).

Could you do some measurements (at least preliminary with trade-offs to get a general sense) and publish them ideally right in the readme?

Rationale

I consider nanomsg as a baseline, so that's why I'm not asking about comparison to other messaging/IPC products (there are many good ones but nanomsg is easy to measure against due to its stability, easy setup, nice perf package, and leading performance).

"Increase buffer_size" and not sending message

I'm trying to create a server/client system with Shadesmar, using topics to communicate with specific clients.
After the server receives the message, it responds, and then I get this problem.
Both the server and the client are running on the same instance. Only two messages were exchanged, so I don't think it could be a memory problem.
Other specs:

  • I'm using multiple threads.
  • The copier and publisher are created in a global map.
  • I tried creating a new instance of the copier and publisher to send the message back; same problem.

Any ideas?

Other questions:

  • Can I use the same DefaultCopier with multiple publishers/topics?
  • Can I use the same DefaultCopier to publish and subscribe?

Add tests to CI

Having tests as part of the CI would ensure the repo is always in a functional state and fewer bugs creep in.
gtest would be a good fit for this.

[test] Rewrite benchmarks and tests

Currently, the different tests and benchmarks do not use any framework, and resort to using asserts and manually timing functions. Ideally, we would want to use Catch for testing and Google benchmark for the different benchmarks.
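
As a rough illustration, a Google Benchmark version of one of the timing loops could look like the following sketch (publish_once is a stand-in for the code under test, not an existing helper):

#include <benchmark/benchmark.h>

#include <cstdint>
#include <vector>

// Stand-in for the operation being measured, e.g. a single publish call.
static void publish_once(std::vector<uint8_t> &payload) {
  benchmark::DoNotOptimize(payload.data());
}

static void BM_Publish(benchmark::State &state) {
  std::vector<uint8_t> payload(state.range(0));
  for (auto _ : state) {
    publish_once(payload);
  }
  state.SetBytesProcessed(state.iterations() * state.range(0));
}
BENCHMARK(BM_Publish)->Arg(1 << 10)->Arg(1 << 20);

BENCHMARK_MAIN();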

[macos] msgpack.hpp not found

  1. install_deps.sh successfully installs msgpack from brew.
  2. cmake can successfully find the package.

During the compilation step it fails with (failed CI):

[1/18] Building CXX object CMakeFiles/micro_benchmark.dir/test/micro_benchmark.cpp.o
FAILED: CMakeFiles/micro_benchmark.dir/test/micro_benchmark.cpp.o 
/Applications/Xcode_11.5.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/c++   -I../include -O2 -DNDEBUG -isysroot /Applications/Xcode_11.5.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX10.15.sdk   -march=native -std=gnu++17 -MD -MT CMakeFiles/micro_benchmark.dir/test/micro_benchmark.cpp.o -MF CMakeFiles/micro_benchmark.dir/test/micro_benchmark.cpp.o.d -o CMakeFiles/micro_benchmark.dir/test/micro_benchmark.cpp.o -c ../test/micro_benchmark.cpp
../test/micro_benchmark.cpp:25:10: fatal error: 'msgpack.hpp' file not found
#include <msgpack.hpp>
         ^~~~~~~~~~~~~
1 error generated.
[2/18] Building CXX object CMakeFiles/pubsub_bin_test.dir/test/pubsub_bin_test.cpp.o
FAILED: CMakeFiles/pubsub_bin_test.dir/test/pubsub_bin_test.cpp.o 
/Applications/Xcode_11.5.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/c++   -I../include -O2 -DNDEBUG -isysroot /Applications/Xcode_11.5.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX10.15.sdk   -march=native -std=gnu++17 -MD -MT CMakeFiles/pubsub_bin_test.dir/test/pubsub_bin_test.cpp.o -MF CMakeFiles/pubsub_bin_test.dir/test/pubsub_bin_test.cpp.o.d -o CMakeFiles/pubsub_bin_test.dir/test/pubsub_bin_test.cpp.o -c ../test/pubsub_bin_test.cpp
In file included from ../test/pubsub_bin_test.cpp:32:
In file included from ../include/shadesmar/pubsub/publisher.h:35:
../include/shadesmar/message.h:32:10: fatal error: 'msgpack.hpp' file not found
#include <msgpack.hpp>
         ^~~~~~~~~~~~~
1 error generated.

Maybe it's been renamed from msgpack.hpp to msgpack.h?

[RPC] Replace poll with condvar

We can keep running stats of the time spent waiting under the condvar, so that each successive sleep time is in line with the actual waiting time. This will prevent sleeping for too long, or waking up too early and wasting CPU cycles doing polls.
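
A hedged sketch of what the adaptive wait could look like, using an exponentially-weighted average of observed wait times as the next timeout (AdaptiveWaiter and its members are illustrative names):

#include <chrono>
#include <condition_variable>
#include <mutex>

// Keeps a running estimate of how long waits actually take and uses it as
// the next timed-wait budget, so we neither oversleep nor spin.
class AdaptiveWaiter {
 public:
  template <typename Predicate>
  void wait(std::mutex &mut, std::condition_variable &cv, Predicate ready) {
    std::unique_lock<std::mutex> lk(mut);
    auto start = std::chrono::steady_clock::now();
    cv.wait_for(lk, estimate_, ready);
    auto waited = std::chrono::duration_cast<std::chrono::microseconds>(
        std::chrono::steady_clock::now() - start);
    // Blend the new observation into the running estimate (EWMA).
    estimate_ = (estimate_ * 7 + waited) / 8;
  }

 private:
  std::chrono::microseconds estimate_{100};
};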

Trying to compile

I'm kind of new to C++ and trying to create a Node.js addon.
But I'm failing to compile your project.
Here are my steps:

$ sudo apt-get install libboost-all-dev libmsgpack-dev
$ git clone --recursive https://github.com/Squadrick/shadesmar.git
$ cd ./vendors/shadesmar/
$ ./install_deps.sh
$ ./configure
$ ninja

But I'm getting:

FAILED: CMakeFiles/dragons_test.dir/test/dragons_test.cpp.o 
/usr/bin/c++  -DDEBUG_BUILD -Iinclude -O3 -DNDEBUG   -march=native -O2 -std=gnu++1z -MD -MT CMakeFiles/dragons_test.dir/test/dragons_test.cpp.o -MF CMakeFiles/dragons_test.dir/test/dragons_test.cpp.o.d -o CMakeFiles/dragons_test.dir/test/dragons_test.cpp.o -c test/dragons_test.cpp
In file included from /usr/lib/gcc/x86_64-linux-gnu/7/include/immintrin.h:43:0,
                 from include/shadesmar/memory/dragons.h:31,
                 from test/dragons_test.cpp:29:
/usr/lib/gcc/x86_64-linux-gnu/7/include/avx2intrin.h: In function ‘void shm::memory::dragons::_avx_async_cpy(void*, const void*, size_t)’:
/usr/lib/gcc/x86_64-linux-gnu/7/include/avx2intrin.h:920:1: error: inlining failed in call to always_inline ‘__m256i _mm256_stream_load_si256(const __m256i*)’: target specific option mismatch
 _mm256_stream_load_si256 (__m256i const *__X)
 ^~~~~~~~~~~~~~~~~~~~~~~~
In file included from test/dragons_test.cpp:29:0:
include/shadesmar/memory/dragons.h:111:55: note: called from here
     const __m256i temp = _mm256_stream_load_si256(sVec);
                                                       ^
[6/8] Building CXX object CMakeFiles/dragons_bench.dir/benchmark/dragons.cpp.o
FAILED: CMakeFiles/dragons_bench.dir/benchmark/dragons.cpp.o 
/usr/bin/c++  -DDEBUG_BUILD -Iinclude -isystem /usr/local/include -O3 -DNDEBUG   -march=native -O2 -std=gnu++1z -MD -MT CMakeFiles/dragons_bench.dir/benchmark/dragons.cpp.o -MF CMakeFiles/dragons_bench.dir/benchmark/dragons.cpp.o.d -o CMakeFiles/dragons_bench.dir/benchmark/dragons.cpp.o -c benchmark/dragons.cpp
In file included from /usr/lib/gcc/x86_64-linux-gnu/7/include/immintrin.h:43:0,
                 from include/shadesmar/memory/dragons.h:31,
                 from benchmark/dragons.cpp:23:
/usr/lib/gcc/x86_64-linux-gnu/7/include/avx2intrin.h: In function ‘void shm::memory::dragons::_avx_async_cpy(void*, const void*, size_t)’:
/usr/lib/gcc/x86_64-linux-gnu/7/include/avx2intrin.h:920:1: error: inlining failed in call to always_inline ‘__m256i _mm256_stream_load_si256(const __m256i*)’: target specific option mismatch
 _mm256_stream_load_si256 (__m256i const *__X)
 ^~~~~~~~~~~~~~~~~~~~~~~~
In file included from benchmark/dragons.cpp:23:0:
include/shadesmar/memory/dragons.h:111:55: note: called from here
     const __m256i temp = _mm256_stream_load_si256(sVec);
                                                       ^
ninja: build stopped: subcommand failed.

Any help?
Thanks in advance.

Replace boost's managed shared memory with custom allocator

In Memory, we currently use boost::interprocess::managed_shared_memory for allocating/deallocating the shared memory needed for each message at runtime. It uses a red-black tree for best-fit allocation. This is overkill for shadesmar since the number of allocations is fixed to the buffer size, and the size of each allocation (message) will be roughly equal. Profiling raw_benchmark shows that the largest number of function calls are to boost's red-black tree implementation.
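
Since the number of allocations is bounded by the buffer size and messages are roughly equal in size, a fixed-slot FIFO allocator could replace the general-purpose one. A minimal single-process sketch under those assumptions (SlotAllocator is illustrative, and cross-process synchronization is omitted):

#include <cstddef>
#include <cstdint>

// Splits one contiguous shared-memory region into `num_slots` equal slots
// and hands them out in FIFO order, matching the circular-buffer usage.
class SlotAllocator {
 public:
  SlotAllocator(uint8_t *base, size_t slot_size, size_t num_slots)
      : base_(base), slot_size_(slot_size), num_slots_(num_slots) {}

  // Returns the next free slot, or nullptr if all slots are in use.
  uint8_t *alloc() {
    if (head_ - tail_ == num_slots_) return nullptr;
    return base_ + (head_++ % num_slots_) * slot_size_;
  }

  // Frees the oldest outstanding slot (strictly FIFO, like the queue).
  void free_oldest() {
    if (tail_ < head_) ++tail_;
  }

 private:
  uint8_t *base_;
  size_t slot_size_;
  size_t num_slots_;
  size_t head_ = 0;  // next slot to hand out
  size_t tail_ = 0;  // oldest slot still in use
};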

Memory alignment

Hello, what were the considerations behind the implementation of memory alignment in the code? I hope the question is clear, thank you.

The support of multiple producers

If multiple producers are employed, it could lead to a blockage at the location of the following code:

https://github.com/Squadrick/shadesmar/blob/master/include/shadesmar/pubsub/topic.h#L136

Below is my test code:

#include <shadesmar/pubsub/publisher.h>

#include <chrono>
#include <cstring>
#include <iostream>
#include <memory>
#include <string>
#include <thread>

using namespace std;
using shm::pubsub::Publisher;

struct stu {
    int a = 0;
    char b[10] = {0};
};

int test() {
  auto sp_pub = make_shared<Publisher>("test_shared_topic");
  if (!sp_pub) {
      return -1;
  }
  int i = 0;
  while(true) {
      struct stu s;
      s.a = i++;
      string str = to_string(i);
      strncpy(s.b, str.c_str(), str.length());
      bool ret = sp_pub->publish(reinterpret_cast<void*>(&s), sizeof(stu));
      if (!ret) {
          cout << "publish error" << endl;
      }
      this_thread::sleep_for(chrono::milliseconds(1));
  }
  return 0;
}

I spawn multiple processes to execute the above code.

`pubsub_test` failed two cases

env

master@win:/home/user/shadesmar-master/build$ uname -a
Linux win 5.10.16.3-microsoft-standard-WSL2 #1 SMP Fri Apr 2 22:23:49 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux
master@win:/home/user/shadesmar-master/build$ g++ --version
g++ (Ubuntu 9.4.0-1ubuntu1~20.04.1) 9.4.0
Copyright (C) 2019 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

run test

./pubsub_test
Increase buffer_size

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
pubsub_test is a Catch v2.12.1 host application.
Run with -? for options

-------------------------------------------------------------------------------
single_message
-------------------------------------------------------------------------------
/home/user/shadesmar-master/test/pubsub_test.cpp:39
...............................................................................

/home/user/shadesmar-master/test/pubsub_test.cpp:54: FAILED:
  REQUIRE( answer == message )
with expansion:
  0 == 3

Publishing: 1
Increase buffer_size
Publishing: 2
Increase buffer_size
Publishing: 3
Increase buffer_size
Publishing: 4
Increase buffer_size
Publishing: 5
Increase buffer_size
-------------------------------------------------------------------------------
multiple_messages
-------------------------------------------------------------------------------
/home/user/shadesmar-master/test/pubsub_test.cpp:57
...............................................................................

/home/user/shadesmar-master/test/pubsub_test.cpp:78: FAILED:
  REQUIRE( answers == messages )
with expansion:
  {  } == { 1, 2, 3, 4, 5 }

Publishing: 1
Publishing: 2
Publishing: 3
Publishing: 4
Publishing: 5
Subscribe message: 0
Subscriber: 0
Subscriber: 1
Subscriber: 2
Subscribe message: 1
Subscriber: 0
Subscriber: 1
Subscriber: 2
Subscribe message: 2
Subscriber: 0
Subscriber: 1
Subscriber: 2
Subscribe message: 3
Subscriber: 0
Subscriber: 1
Subscriber: 2
Subscribe message: 4
Subscriber: 0
Subscriber: 1
Subscriber: 2
Publishing: 1
Publishing: 2
Publishing: 3
Publishing: 4
Publishing: 5
Subscriber: 0
Subscribe message: 0
Subscribe message: 1
Subscribe message: 2
Subscribe message: 3
Subscribe message: 4
Subscriber: 1
Subscribe message: 0
Subscribe message: 1
Subscribe message: 2
Subscribe message: 3
Subscribe message: 4
Subscriber: 2
Subscribe message: 0
Subscribe message: 1
Subscribe message: 2
Subscribe message: 3
Subscribe message: 4
Publishing: 1
Publishing: 2
Publishing: 3
Publishing: 4
Publishing: 5
Publishing: 1
Publishing: 2
Publishing: 3
Publishing: 4
Publishing: 5
===============================================================================
test cases: 10 |  8 passed | 2 failed
assertions: 33 | 31 passed | 2 failed

where is rpc code

Hello, I can't find the RPC code. Where is 'shadesmar/rpc/server.h'?

Probable memory leak

Hello @Squadrick, thanks for your helpful IPC project, it really saved my life. I tested this project for half an hour, and the memory it consumed grew from 0.2% to 0.5%, but all my program does is receive and publish, nothing else. I suspect there are some memory-leak-related bugs in the code. Thank you for replying :)

Question to relation with ros2

Hi there, I came across this code through reddit.

It is not really an issue, but I couldn't find a forum either. Please feel free to close.

I am wondering what advantages this library offers over ros2 for example.

Thanks and kind regards

Can it handle when one of the properties of the message passed is std::string?

I'm trying to send a std::string from the client to the server. I modified the code in rpc.cpp.

I modified the struct Message,

struct Message {
  uint64_t count;
  uint64_t timestamp;
  uint8_t *data;
  std::string code;
};

the function callback(),

bool callback(const shm::memory::Memblock &req, shm::memory::Memblock *resp) {
  auto *msg = reinterpret_cast<Message *>(req.ptr);
  auto lag = shm::current_time() - msg->timestamp;
  std::cout << current_count << ", " << msg->count <<"    "<<msg->code << "\n";
  ....
}

and the function client_loop().

void client_loop(int seconds, int vector_size) {
  ...
  auto *rawptr = malloc(vector_size);
  std::memset(rawptr, 255, vector_size);
  Message *msg = reinterpret_cast<Message *>(rawptr);
  msg->count = 0;
  msg->code = "hello world!";
  ...
}

But I get a segmentation fault when I std::cout the msg->code.
