
Comments (15)

sklam commented on May 18, 2024

Given that we want to leave room to explore how we do IPC, I don't think we want to commit to a lot of C++ code just yet. However, I think MapD (https://github.com/mapd/mapd-core) may already have that code. If that is the case, it would make things easier.

Btw, there will likely be other parts of pygdf that need to be in C++ soon. But I don't think we need everything to be in C++. In the end, the heavy lifting is done in Numba-compiled and other GPU code.
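For illustration, a minimal sketch of that kind of Numba-compiled GPU code (the kernel and names are illustrative, not PyGDF internals):

import numpy as np
from numba import cuda

# Illustrative element-wise kernel: Numba compiles this straight to
# GPU code, so no C++ library code is involved.
@cuda.jit
def scale(out, arr, factor):
    i = cuda.grid(1)
    if i < arr.size:
        out[i] = arr[i] * factor

arr = cuda.to_device(np.arange(1024, dtype=np.float64))
out = cuda.device_array_like(arr)
scale.forall(arr.size)(out, arr, 2.0)  # data never leaves the GPU
print(out.copy_to_host()[:4])          # -> [0. 2. 4. 6.]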


wesm commented on May 18, 2024

In the end, the heavy lifting is done in Numba-compiled and other GPU code.

This is a somewhat Python-centric view of things -- it would be interesting to define a core implementation that can be used from any program with a reasonable C/C++ FFI. This includes Rust, Julia, Go, Lua/LuaJIT, etc. Such a core library would be very small (< 10 KLOC, I would say) and focused on data structures, memory management / movement, and APIs for user-defined functions. Numba could then use this public API to plug in custom code written by the user.


m1mc commented on May 18, 2024

MapD creates the IPC memory but always hands its ownership over to pygdf for now. I am all for a data exchange layer (that does ref-counting, etc.) and maybe renaming pygdf to something else. As for UDFs, I'm not sure whether the use case is language-independent, in which case we might need an ABI, or whether we would link their LLVM bitcode.


sklam commented on May 18, 2024

Such a core library would be very small (< 10 KLOC I would say) and focused on data structures, memory management / movement, and APIs for user-defined functions.

I am open to the idea of a C++ core library, but it is too soon to commit to one. My approach to PyGDF right now is to keep it simple and loosely coupled so things can evolve quickly. There are many technical questions about what the GDF will look like and all the management it needs. There is a lot more to think about than with CPU code:

  • GPU device context ownership
    • Many applications tend to assume sole ownership of all GPU devices and contexts.
  • GPU memory management
    • Sharing: unified memory? IPC memory? Peer GPU memory access? Managed lifetime?
    • Special memory: page-locked host memory? Mapped host memory?
    • Working with custom allocators in other GPU software.
  • Asynchronous execution queues (CUDA streams)
    • Who owns them? Who manages them?
    • Async execution alone can already be a challenge.
  • GPU errors
    • An error may segfault or corrupt the GPU context, forcing the process to terminate and relaunch to recover (see the error descriptions in the CUDA runtime docs). This makes it problematic for multiple applications to share one process.

By focusing on the IPC front first, we avoid many of the technical questions above, since each process is isolated in its own address space. By using Arrow and the Flatbuffers toolset, we can already share data among multiple languages. Given that the requirements for the GDF are still evolving, we should focus on the application side first to get a sense of the use cases. Then we can come back with a design for the C++ core library if it becomes necessary.
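For illustration, sharing a device array between processes with Numba's CUDA IPC support looks roughly like this (a sketch following the Numba docs; the array and variable names are made up):

import numpy as np
from numba import cuda

# Producer process: allocate on the GPU and export an IPC handle.
darr = cuda.to_device(np.arange(10, dtype=np.float64))
ipch = darr.get_ipc_handle()  # picklable; send it to another process

# Consumer process (the handle must be opened in a *different* process):
with ipch as shared:                   # maps the same device memory
    host_copy = shared.copy_to_host()  # or keep computing on the GPU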


wesm commented on May 18, 2024

My approach to PyGDF right now is to keep it simple and loosely coupled so things can evolve quickly.

Would it make sense to start drafting some design documents to lay out the requirements and other constraints / implementation considerations and tradeoffs? I think it would be worth doing some up front design work to build consensus around these questions. Among all the people here, there's a huge amount of domain expertise in doing analytics on the GPU, so I think this would be a productive use of time.

With Apache Arrow we spent a lot of time iterating on the specification documents and the mechanics of zero-copy IPC / the Flatbuffer metadata, and I think resolving many of the design questions up front has been a big help.


m1mc commented on May 18, 2024

@sklam this is a good list of problems to think about, and it could be even longer, but I don't want to see it overcomplicated. Basically, we can isolate the management layer or service well regardless of what the use cases are, and have all memory allocation/deallocation go through it; probably only IPC memory works there. As for error handling, that is another topic that can be handled well too. We just need parallelism in tasking.


sklam commented on May 18, 2024

Would it make sense to start drafting some design documents to lay out the requirements and other constraints / implementation considerations and tradeoffs?

Yes, I think we should, as we want to be open about our design and implementation. Perhaps we should open a new repo under gpuopenanalytics to hold all the design docs and questions. At this stage, I guess the design note would just be "trying out Arrow on the GPU" =).

Also, I would like to get the core cross-language IPC code into its own repo. The Arrow flatbuffers code does not need to live in this repo. And I don't want folks to think that PyGDF is The GDF.

@wesm, is there a recommended way to ship the generated python flatbuffer code?


sklam commented on May 18, 2024

@m1mc, yes, we should focus on IPC memory first. It will make things a lot simpler.


wesm commented on May 18, 2024

@wesm, is there a recommended way to ship the generated python flatbuffer code?

We can start a separate thread about this. Why would you want to reimplement the metadata serialization and IPC loading/unloading in Python (vs. simply using libarrow)? If there's some aspect of the libarrow API that's inadequate for this use case, I will be happy to scope it out and do some development to help this project.
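For comparison, round-tripping a record batch through libarrow's stream format from Python is only a few lines (a sketch; the exact module paths have shifted across pyarrow releases):

import pyarrow as pa

batch = pa.RecordBatch.from_arrays(
    [pa.array([1, 2, 3]), pa.array(['a', 'b', 'c'])],
    names=['ints', 'strs'])

# Write the batch in the Arrow stream format...
sink = pa.BufferOutputStream()
with pa.ipc.new_stream(sink, batch.schema) as writer:
    writer.write_batch(batch)
buf = sink.getvalue()

# ...and read it back, zero-copy, from the buffer.
reader = pa.ipc.open_stream(buf)
for b in reader:
    print(b.num_rows)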


wesm commented on May 18, 2024

One thing I've thought about is defining some C structs to make interacting with raw Arrow memory from LLVM or C simpler. So you could have:

#include <stdint.h>

/* Common header shared by every Arrow array. */
typedef struct {
  int64_t length;             /* number of elements */
  int64_t null_count;         /* number of null elements */
  const uint8_t* valid_bits;  /* validity bitmap, one bit per element */
  int type;                   /* type id */
} arrow_base_t;

/* Fixed-width primitive array (ints, floats, ...). */
typedef struct {
  arrow_base_t base;
  const uint8_t* data;        /* raw value buffer */
} arrow_primitive_t;

/* Variable-length string array. */
typedef struct {
  arrow_base_t base;
  const int32_t* offsets;     /* length + 1 offsets into data */
  const uint8_t* data;        /* concatenated UTF-8 bytes */
} arrow_string_t;

I don't know all the requirements, so it would be great to start some design docs in Markdown or some other format.
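As a sketch of how such structs could be consumed from Python without extra glue, a ctypes mirror might look like this (hypothetical; none of these names exist in PyGDF or Arrow):

import ctypes

class ArrowBase(ctypes.Structure):
    # Mirrors arrow_base_t above.
    _fields_ = [('length', ctypes.c_int64),
                ('null_count', ctypes.c_int64),
                ('valid_bits', ctypes.POINTER(ctypes.c_uint8)),
                ('type', ctypes.c_int)]

class ArrowPrimitive(ctypes.Structure):
    # Mirrors arrow_primitive_t above.
    _fields_ = [('base', ArrowBase),
                ('data', ctypes.POINTER(ctypes.c_uint8))]

# Numba (via ctypes/cffi) or any FFI-capable language could pass a
# pointer to such a struct into user-defined functions.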


sklam commented on May 18, 2024

@wesm

Why would you want to reimplement the metadata serialization and IPC loading/unloading in Python (vs. simply using libarrow)? If there's some aspect of the libarrow API that's inadequate for this use case, I will be happy to scope it out and do some development to help this project.

I wouldn't want to reimplement anything that is already available. But there is a complication: the metadata and data live in a single CUDA IPC memory region, which cannot be accessed directly through a normal CPU pointer, and we need to keep it on the GPU to avoid transfers to host (CPU) memory. Is libarrow willing to include CUDA support?


sklam commented on May 18, 2024

(adding to my previous comment)

PyGDF only has the IPC reading part. The metadata is the only portion copied back to the host; the data is kept on the device. The reader tries to minimize device-to-host transfers.
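The pattern is roughly the following (an illustrative sketch; the real reader parses the Arrow flatbuffer metadata from the copied prefix, and the names here are made up):

from numba import cuda

def split_ipc_region(dview, metadata_nbytes):
    """Copy only the metadata prefix to the host; keep data on device.

    dview is a device byte array over the IPC region; metadata_nbytes
    is assumed known (e.g. from a length prefix). Both are illustrative.
    """
    header = dview[:metadata_nbytes].copy_to_host()  # small D->H copy
    body = dview[metadata_nbytes:]                   # stays on the GPU
    return header, body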

I am interested in how MapD implemented the IPC serialization part. (ping @m1mc)


m1mc commented on May 18, 2024

Since Arrow (up to 0.3.0) only has a CPU serializer, we are keeping and serializing results in system memory and uploading them to the device for now. Otherwise, we would have had to come up with our own GPU serializer in the short term, though that should be easy to do by generating an extra null bitmap in a kernel. BTW, PyGDF should have received a separate copy of the metadata through the Thrift API.
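Generating that bitmap in a kernel is indeed simple; a sketch in Numba, one thread per output byte, packing eight validity flags in Arrow's LSB bit order (the names are made up):

import numpy as np
from numba import cuda

@cuda.jit
def pack_null_bitmap(mask, bitmap):
    # mask: uint8 0/1 validity flag per element
    # bitmap: (mask.size + 7) // 8 output bytes
    byte_idx = cuda.grid(1)
    if byte_idx < bitmap.size:
        b = 0
        for bit in range(8):
            i = byte_idx * 8 + bit
            if i < mask.size and mask[i]:
                b |= 1 << bit  # Arrow bitmaps are LSB-ordered
        bitmap[byte_idx] = b

mask = cuda.to_device(np.array([1, 1, 0, 1, 0, 0, 1, 1, 1], dtype=np.uint8))
bitmap = cuda.device_array((mask.size + 7) // 8, dtype=np.uint8)
pack_null_bitmap.forall(bitmap.size)(mask, bitmap)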


sklam commented on May 18, 2024

BTW, PyGDF should have received a separate copy of the metadata through the Thrift API.

Yes. I just picked the one in the IPC memory. It doesn't have to be that way; the only reason for the current approach is that a single memory region feels more consistent. Probably not a strong reason.

Depending on how the serialization is done, we could keep just one copy of the metadata instead of duplicating it on both host and device.

FYI: besides the metadata for the Schema, there are small bits of metadata inside the RecordBatch headers that need to be parsed on the host.


wesm commented on May 18, 2024

I would like to start a libarrow_cuda add-on library in the Arrow codebase and try to get some simple IPC loading tests running on the host with the existing codebase, to see if there are any issues. If others would like to get involved, see https://issues.apache.org/jira/browse/ARROW-1055.
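A sketch of what such CUDA-aware IPC could look like from Python, using the pyarrow.cuda names that later releases adopted (they did not exist at the time of this thread and are shown only as an illustration):

import pyarrow as pa
from pyarrow import cuda  # requires a pyarrow build with CUDA support

ctx = cuda.Context(0)  # CUDA device 0
batch = pa.RecordBatch.from_arrays([pa.array([1, 2, 3])], names=['x'])

# Serialize the batch directly into device memory...
dbuf = cuda.serialize_record_batch(batch, ctx)

# ...and reconstruct a record batch from the device buffer.
roundtripped = cuda.read_record_batch(dbuf, batch.schema)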

