mercury-hpc / mercury


Mercury is a C library for implementing RPC, optimized for HPC.

Home Page: http://www.mcs.anl.gov/projects/mercury/

License: BSD 3-Clause "New" or "Revised" License

mercury's Introduction

Mercury

Mercury is an RPC framework specifically designed for use in HPC systems that allows asynchronous transfer of parameters and execution requests, as well as direct support of large data arguments. The network implementation is abstracted, allowing easy porting to future systems and efficient use of existing native transport mechanisms. Mercury's interface is generic and allows any function call to be serialized. Mercury is a core component of the Mochi ecosystem of microservices.

Please see the accompanying LICENSE.txt file for license details.

Contributions and patches are welcomed but require a Contributor License Agreement (CLA) to be filled out. Please contact us if you are interested in contributing to Mercury by subscribing to the mailing lists.

Architectures supported

Architectures supported by MPI implementations are generally supported by the network abstraction layer.

The OFI libfabric plugin as well as the shared-memory (SM) plugin are stable and provide the best performance in most workloads.

The UCX plugin is also available as an alternative transport on platforms for which libfabric is either not available or not recommended to use.

For both OFI and UCX plugins, please run the hg_info command for a list of available transports on the system.

MPI and BMI plugins are deprecated and no longer supported.

See the plugin requirements section for plugin requirement details.

Documentation

Please see the documentation available on the mercury website for a quick introduction to Mercury.

Software requirements

Compiling and running Mercury requires up-to-date versions of various software packages. Beware that using excessively old versions of these packages can cause indirect errors that are very difficult to track down.

Plugin requirements

To make use of the OFI libfabric plugin, please refer to the libfabric build instructions available on this page.

To make use of the UCX plugin, please refer to the UCX build instructions available on this page.

To make use of the native NA shared-memory (SM) plugin on Linux, the cross-memory attach (CMA) feature introduced in kernel v3.2 is required. The yama security module must also be configured to allow remote process memory to be accessed (see this page). On macOS, code signing with inclusion of the na_sm.plist file into the binary is currently required to allow process memory to be accessed.

Optional requirements

For the optional automatic code generation features (which generate serialization and deserialization routines), the preprocessor subset of the Boost library must be available (Boost v1.48 or higher is recommended). Only the headers are used, so the compiled library itself is not needed. Mercury bundles those headers for users who do not have Boost installed and want to make use of this feature.

Building

If you install the full sources, put the tarball in a directory where you have permissions (e.g., your home directory) and unpack it:

bzip2 -dc mercury-X.tar.bz2 | tar xvf -

Replace 'X' with the version number of the package.

(Optional) If you checked out the sources using git (without the --recursive option) and want to build the testing suite (which requires the kwsys submodule) or use checksums (which requires the mchecksum submodule), you need to issue from the root of the source directory the following command:

git submodule update --init

Mercury makes use of the CMake build-system and requires that you do an out-of-source build. In order to do that, you must create a new build directory and run the ccmake command from it:

cd mercury-X
mkdir build
cd build
ccmake .. (where ".." is the relative path to the mercury-X directory)

Type 'c' multiple times and choose suitable options. Recommended options are:

BUILD_SHARED_LIBS                ON (or OFF if the library you link
                                 against requires static libraries)
BUILD_TESTING                    ON/OFF
BUILD_TESTING_PERF               ON/OFF
BUILD_TESTING_UNIT               ON/OFF
Boost_INCLUDE_DIR                /path/to/include/directory
CMAKE_INSTALL_PREFIX             /path/to/install/directory
MERCURY_ENABLE_DEBUG             ON/OFF
MERCURY_TESTING_ENABLE_PARALLEL  ON/OFF
MERCURY_USE_BOOST_PP             ON/OFF
MERCURY_USE_CHECKSUMS            ON/OFF
MERCURY_USE_SYSTEM_BOOST         ON/OFF
MERCURY_USE_SYSTEM_MCHECKSUM     ON/OFF
MERCURY_USE_XDR                  ON/OFF
NA_USE_DYNAMIC_PLUGINS           ON/OFF
NA_USE_BMI                       ON/OFF
NA_USE_MPI                       ON/OFF
NA_USE_OFI                       ON/OFF
NA_USE_PSM                       ON/OFF
NA_USE_PSM2                      ON/OFF
NA_USE_SM                        ON/OFF
NA_USE_UCX                       ON/OFF

Setting include directory and library paths may require you to toggle to the advanced mode by typing 't'. Once you are done and do not see any errors, type 'g' to generate makefiles. Once you exit the CMake configuration screen and are ready to build the targets, do:

make

(Optional) Verbose compile/build output:

This is done by inserting VERBOSE=1 in the make command. E.g.:

make VERBOSE=1

Installing

Assuming that the CMAKE_INSTALL_PREFIX has been set (see previous step) and that you have write permissions to the destination directory, do from the build directory:

make install

If RPATH is not requested, ensure also that CMAKE_SKIP_INSTALL_RPATH has previously been set when configuring the project with CMake.

Testing

Tests can be run to check that basic RPC functionality (requests and bulk data transfers) is working properly. With BUILD_TESTING_UNIT set to ON, CTest is used to run the tests. Simply run from the build directory:

ctest .

(Optional) Verbose testing:

This is done by inserting -V in the ctest command. E.g.:

ctest -V .

Extra verbose information can be displayed by inserting -VV. E.g.:

ctest -VV .

Some tests run with one server process and X client processes. To change the number of client processes that are being used, the MPIEXEC_MAX_NUMPROCS variable may need to be modified (toggle to advanced mode if you do not see it). The default value is automatically detected by CMake based on the number of cores that are available. Note that you need to run make again after the makefile generation to use the new value.

FAQ

Below is a list of the most common questions.

Q: Why am I getting undefined references to libfabric symbols?

A: On rare occasions, multiple copies of the libfabric library are installed on the same system. To make sure that you are using the correct copy of the libfabric library, do:
```
ldconfig -p | grep libfabric
```
If the library returned is not the one that you would expect, make sure to either set LD_LIBRARY_PATH or add an entry in your /etc/ld.so.conf.d directory.

Q: Is there any logging mechanism?

A: To turn on error/warning/debug logs, the HG_LOG_LEVEL environment variable can be set to either error, warning or debug values. Note that for debugging output to be printed, the CMake variable MERCURY_ENABLE_DEBUG must also be set at compile time. Specific subsystems can be selected using the HG_LOG_SUBSYS environment variable.

mercury's Issues

Addr uninitialized

Reported by anonymous on 05/02/14
After calling NA_Addr_lookup_wait to fill an address, the function returns successfully (HG_SUCCESS), but on usage of the returned addr, it is uninitialized memory.

Addr lookup returns:

na_ssm_addr_lookup:695: Enter (in_name: tcp://localhost:10000).

na_ssm_addr_lookup:793: Exit (Addr: 0x59f6a10, Status: 0).

Further usage of that addr results in:
==2916== Use of uninitialised value of size 8
==2916== at 0x40C1A0: NA_Addr_is_self (na.c:582)
==2916== by 0x402CB1: HG_Forward (mercury.c:836)
==2916== by 0x4020D0: main (ping_server.c:173)
==2916==
==2916== Jump to the invalid address stated on the next line
==2916== at 0x0: ???
==2916== by 0x402CB1: HG_Forward (mercury.c:836)
==2916== by 0x4020D0: main (ping_server.c:173)
==2916== Address 0x0 is not stack'd, malloc'd or (recently) free'd
==2916==
==2916==
==2916== Process terminating with default action of signal 11 (SIGSEGV)
==2916== Bad permissions for mapped region at address 0x0
==2916== at 0x0: ???
==2916== by 0x402CB1: HG_Forward (mercury.c:836)
==2916== by 0x4020D0: main (ping_server.c:173)

Add callback version of HG API

Reported by jsoumagne on 05/09/14
Switch mercury to callback model.

Mercury uses HG_Register_data internally

HG_Register_data is used in HG_Register for attaching serialization info, meaning users can't use it. I tried using it earlier today and the code promptly blew up :), specifically in HG_Get_input, which calls the serialization functions. Seems easy enough to put the hg_proc_info struct being registered into the hg_rpc_info struct directly, at least.

Blocking get of HG_Forward arguments (case of overflow)

Reported by jsoumagne on 10/29/13
HG_Forward arguments that do not fit in an eager message are pulled from the client to the server using an HG_Bulk_read call. This call is followed by an HG_Bulk_wait, preventing the server from processing other calls until the bulk transfer is over.

Enable/disable checksums

Reported by jsoumagne on 09/20/13
Mercury needs to be able to enable or disable checksumming of metadata per operation.

getting NA class string

Adding a function to get the class name from an na_class_t will let me not worry about carrying around the class name through the code after initialization.

/** Returns the name of the NA class.
 * \param na_class [IN] pointer to NA class
 * \param class_name [OUT] pointer to NA class name
 *
 * \return NA_SUCCESS or corresponding NA error code
 */ 
na_return_t NA_Get_class_name(na_class_t *na_class, const char **class_name);

type mismatch in bulk interface

HG_Bulk_get_size returns an hg_uint64_t, while the remaining bulk functions use hg_size_t to represent the size abstracted by a bulk handle. get_size should probably be changed to use hg_size_t. Looks like the typedef for hg_size_t resolves to hg_uint64_t, so it can be safely changed at the moment without breakage.

Remove MERCURY_CREATE_SINGLE_LIB option

Reported by jsoumagne on 10/08/13
This option can be used to generate one single library called 'mercury' instead of 'na'/'util'/'mercury'. Removing it will make it easier to maintain the cmake files, especially since now libraries can be picked up by using either the pkg-config file or the cmake config files.

problems with function argument size in Mercury > 4K

Reported by chaarawi on 05/20/14
Mercury does not detect when the function arguments or the response sent are bigger than NA_MPI_UNEXPECTED_SIZE.

Right now this value is 4K. If something larger than that is sent, mercury should detect that and preferably handle it transparently or with some hints from the user. The least preferable solution is to return an error when that happens.

race condition in na_bmi put

Reported by carns on 04/20/14
The na_bmi put operation stores the BMI operation ID for two BMI operations that are posted back to back. Later on in na_bmi_progress_expected() it compares them to completed operations IDs to determine how to make progress on the put operation.

If another thread is being used to drive progress, however, then BMI may actually complete the first operation before the second one is issued. In that case they can both be assigned the same operation ID. This causes the put to hang because the operation is mis-identified in na_bmi_progress_expected().

I can easily reproduce this issue and will be happy to test possible fixes.

Support const parameters

Reported by jsoumagne on 08/16/13
Because hg_proc_xxx functions are used for both encoding and decoding, const parameters cannot be passed without an explicit cast. There should be a way of avoiding that, maybe by having hg_const_string_t etc ?

Add convenience routines to bulk data interface

Reported by jsoumagne on 01/22/14
When using Mercury locally (i.e., without shipping anything), we need to be able to get bulk data without doing any memory copy and access it through pointers. We should add convenience routines that allow users to directly get a pointer when doing a bulk_read (HG_Bulk_read_ptr?), or refactor the API for this case.

need mechanism to free memory allocated by decoder on client side

Reported by carns on 08/15/13
The output struct from HG_Forward() may contain memory implicitly allocated by the decoder. It isn't clear how to safely free this memory after inspecting the output struct.

This is only a problem on the client side. The server automatically frees any memory that was allocated as part of decoding the request.

na_bmi_wait() deadlock after communication failure

Reported by carns on 08/30/13
The na_bmi_wait() function (called in the HG_Wait() path) doesn't look like it releases mutexes correctly on all error paths. As a result, if you experience a communication failure and then try to issue another RPC, it will deadlock.

Observed in git revision 35b29eb.

HG_Handler_process() hang with na_bmi

Reported by carns on 02/12/14
This isn't 100% reproducible, but I seem to be hitting a case where HG_Handler_process() doesn't return, even though I'm specifying a timeout of 1.

The use case is that I have a dedicated thread calling HG_Handler_process() in a loop, and I need to break out of that call periodically to check if the daemon (and therefore this thread) is shutting down.

The stack trace looks like this when it is hung:

#0  na_bmi_progress_unexpected (na_class=na_class@entry=0x1249710, 
    context=context@entry=0x124b330, 
    progressed=progressed@entry=0x2b7cbf247d57 "", timeout=0)
    at /home/pcarns/working/mercury/src/na/na_bmi.c:1665
#1  0x0000000000480707 in na_bmi_progress (na_class=0x1249710, 
    context=0x124b330, timeout=<optimized out>)
    at /home/pcarns/working/mercury/src/na/na_bmi.c:1634
#2  0x000000000047f024 in NA_Progress (na_class=0x1249710, context=0x124b330, 
    timeout=timeout@entry=1) at /home/pcarns/working/mercury/src/na/na.c:859
#3  0x000000000047bfd0 in HG_Handler_process (timeout=1, 
    status=0x1 <Address 0x1 out of bounds>)
    at /home/pcarns/working/mercury/src/mercury_handler.c:882
#4  0x00000000004550a0 in svr_thread_fn (foo=0x0)
    at ../src/remote/mercury-engine.ae:251
#5  0x00002b7cb95c5f6e in start_thread (arg=0x2b7cbf248700)
    at pthread_create.c:311
#6  0x00002b7cba1549cd in clone ()
    at ../sysdeps/unix/sysv/linux/x86_64/clone.S:113

I have confirmed that na_bmi_progress_unexpected() is being called repeatedly, so BMI itself is not hanging.

I can't see the value of the timeout argument at the na_bmi_progress() level (I need to recompile with different flags), but I can see the value of the double "remaining" variable within na_bmi_progress(): its present value in my example is 4292707.9868252408. That's way too high since it is supposed to be in units of seconds. This is probably the reason why the function doesn't return.

I'll try to debug further later and see if I can observe why that variable value is getting so high in the first place, but I wanted to go ahead and post the symptoms here in trac.

get hg_request_init argument back on hg_request_finalize

If the "arg" argument needs cleaning up, it needs to be retrieved somehow before finalizing.

The workaround right now is to create a new request, get the pointer via hg_request_get_data, then immediately destroy the request.

Fix HG_Init() and HG_Bulk_Init() to not return error when called second time.

Reported by sumit on 10/23/13
Currently, HG_Init() and HG_Bulk_Init() return an error when called a second time. This doesn't work well in cases where the user is set up as a client and also as a server. It ends up calling HG_Init() and HG_Handler_init().

We haven't figured out yet if and how we are going to fix it; more discussion is needed. For now, in some places, we are going to drop the error returned by these initialization functions. Look for this trac number in the code.

Add callback version of HG_Bulk API

Reported by jsoumagne on 12/10/13
Switch mercury_bulk.h to callback model.

CMake build fails when TESTING enabled and changing NA plugin

Reported by sumit on 09/18/13

Steps to reproduce:

1. Configure Mercury with one NA plugin (e.g., BMI ON and the others OFF) and BUILD_TESTING ON.
2. Build. Succeeds.
3. Reconfigure Mercury with a different NA plugin (BMI OFF, MPI ON).
4. Build fails.

$ make clean
$ make

The build fails in the test code with an "undefined reference to NA_BMI_Init" error.

server unresponsive after multiple HG_Responds

// Copied from the mercury email list

I've been doing some Mercury benchmarking as of late and ran into the following problem: after calling multiple HG_Responds in a row, the RPC callee (the "server") doesn't process future RPCs. I've attached a test case [see mail in mailing list] that exhibits the problem. N client processes "check in" to a single server, and the server issues a batch of HG_Responds upon receiving the N'th checkin RPC.

error messages unhelpful

Reported by robl on 09/18/14
When the server has not registered a function, and the client asks the server for a function, we get the message

HG: Error in /home/robl/work/mercury/src/mercury.c:267
 # hg_set_input(): hg_hash_table_lookup failed
HG: Error in /home/robl/work/mercury/src/mercury.c:862
 # HG_Forward(): Could not set input

What that means is the client tried to HG_Forward a routine, but the server has not called MERCURY_REGISTER for that routine.

bad timeout conversions in HG_Wait, hg_request_wait

Reported by jenkins on 09/29/14
In the first line of the HG_Wait and hg_request_wait functions:

double remaining = timeout / 1000; /* Convert timeout in ms into seconds */

timeout is an unsigned int, so the resulting computation is truncated at a second granularity before being cast to double. In other words, all waits of < 1000ms are being truncated to no-ops (0s).

Using the double constant 1000.0 for both should do the trick w.r.t. casting.
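
The truncation is easy to reproduce outside of Mercury. The sketch below is not Mercury code: the two helper names are hypothetical and only contrast integer division with division by a double constant.

```c
/* Hypothetical helpers for illustration; not part of the Mercury API. */

/* The integer division happens first, so any value under 1000 ms
 * becomes 0 before it is converted to double. */
static double timeout_to_seconds_truncated(unsigned int timeout_ms)
{
    double remaining = timeout_ms / 1000;
    return remaining;
}

/* Dividing by a double constant converts timeout_ms to double before
 * the division, so sub-second values are preserved. */
static double timeout_to_seconds(unsigned int timeout_ms)
{
    double remaining = timeout_ms / 1000.0;
    return remaining;
}
```

With a 500 ms timeout, the first helper yields 0 seconds while the second yields 0.5 seconds.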

Mercury version export

Reported by dkimpe on 09/10/13
To make it simpler for mercury users to determine which version is being used, we should do the following:

Add a compile time version

Make mercury define HG_VERSION_MAJOR and HG_VERSION_MINOR
which can be used by the preprocessor.

Add a runtime version retrieval method.

Something like:

int hg_version_get (unsigned int * major, unsigned int * minor);
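
A minimal sketch of how the two pieces could fit together. The macro names and the function signature come from this ticket; the version values (0 and 8) are placeholders, and the return convention (0 on success, -1 on a NULL argument) is an assumption made for the example.

```c
#include <stddef.h>

/* Placeholder values for illustration; a real build would generate them. */
#define HG_VERSION_MAJOR 0
#define HG_VERSION_MINOR 8

/* Compile-time check that user code could perform with the preprocessor. */
#if HG_VERSION_MAJOR == 0 && HG_VERSION_MINOR < 8
#error "This sketch assumes version 0.8 or later"
#endif

/* Runtime retrieval, using the signature proposed in the ticket.
 * Returning 0 on success and -1 on a NULL argument is an assumption. */
int hg_version_get(unsigned int *major, unsigned int *minor)
{
    if (major == NULL || minor == NULL)
        return -1;
    *major = HG_VERSION_MAJOR;
    *minor = HG_VERSION_MINOR;
    return 0;
}
```

Having both lets a caller compare the version it was compiled against with the version of the library it is actually running with.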

Converting error codes into string

Reported by jsoumagne on 10/07/13
(both mercury and na, though more relevant for NA)

Package Config isn't correct.

Reported by harms on 10/03/13
This is related to ticket #5. It doesn't seem like the package config output is completely correct (or I'm using it incorrectly).

Here is the output of pkg-config:

harms@sirsteve:~/working/triton/mercury-test$ cat ~/working/triton/install/lib/pkgconfig/mercury.pc 
# This gives access to the mercury header files
prefix=/home/harms/working/triton/install
exec_prefix=/home/harms/working/triton/install
libdir=/home/harms/working/triton/install/lib
includedir=/home/harms/working/triton/install/include

Name: mercury
Description: The Function Shipper for I/O Forwarding
Version: 0.8.0
URL: http://trac.mcs.anl.gov/projects/mercury
Requires:
Libs: -L${libdir}  -lmercury -lna -lmercury_util
Libs.private:  /home/harms/working/triton/install/lib/libbmi.a -lpthread -lrt
Cflags: -I${includedir}  -I/usr/include -I/home/harms/working/triton/install/include

Here's what libs gives me:

harms@sirsteve:~/working/triton/mercury-test$ PKG_CONFIG_PATH=/home/harms/working/triton/install/lib/pkgconfig pkg-config mercury --libs
-L/home/harms/working/triton/install/lib -lmercury -lna -lmercury_util

Here's what cflags gives me:

harms@sirsteve:~/working/triton/mercury-test$ PKG_CONFIG_PATH=/home/harms/working/triton/install/lib/pkgconfig pkg-config mercury --cflags
-I/home/harms/working/triton/install/include

I would like the '--libs' flag to include everything in Libs.private otherwise all the mercury libraries fail to link.

const qualifier in hg_string_t is problematic

Reported by carns on 08/15/13
The hg_string_t type is typedefed to a "const char *", which means that hg_proc_hg_string_t() cannot be reused to handle encoding of an existing char * without typecasting.

The hg_proc_hg_string_t() encoding function itself also produces warnings in mercury_proc.h if -Wcast-qual is enabled because it is having to cast to non-const pointers internally to free allocated data.

The const qualifier might not be appropriate in that particular typedef?

NA method for debug

Reported by harms on 12/05/14
Perhaps NA should have a method to turn on debugging of the transport. I
could go for something simple that is just on/off although BMI has a fancy
debug system.

clang type annotations

Reported by robl on 09/22/14
Mercury uses a lot of void pointers. It would be nice if we could annotate the interface to catch problems.

see http://hpc-ua.org/hpc-ua-12/files/proceedings/3.pdf
and http://clang.llvm.org/docs/AttributeReference.html#type-safety-checking

Converting na addr into string (for debugging/log purposes)

Reported by jsoumagne on 10/07/13
None

Mercury does not allow Bulk register of same Buffer

Reported by chaarawi on 05/21/14
As an I/O library built on top of Mercury, we would have to force an application that needs to write the same data to different datasets/attributes/maps to either do it synchronously or keep multiple copies of the same data.
Furthermore, we can't track application buffers, so it won't be possible to enforce this limitation, which will result in errors from Mercury when deregistering the bulk handle.

NA_Initialize() failure for BMI clients

Reported by carns on 08/04/13
To reproduce, call:

NA_Initialize("bmi", NULL, 0);

This produces the following error:

[21:52:15.346942](E) BMI_initialize: Failed to find an appropriate listening address for the bmi method: bmi_tcp
Error in /home/pcarns/working/mercury/src/na/na_bmi.c:209 (NA_BMI_Init): BMI_initialize() failed.
Segmentation fault (core dumped)

BMI was built with default configuration options and no methods other than bmi_tcp.

Mercury was built with default configuration options except that BMI support was enabled and Boost preprocessor support was enabled.

Reproduced with Mercury git revision 0d8a195 (master as of 2013-08-04).

Excess stderr output on communication failure

Reported by carns on 08/30/13
I see the following messages in stderr when an RPC operation fails (for example, sending to a server that isn't up yet):

Error in /home/pcarns/working/mercury/src/na/na_bmi.c:374 (na_bmi_msg_send_unexpected): BMI_post_sendunexpected() failed.
Error in /home/pcarns/working/mercury/src/mercury.c:385 (HG_Forward): Could not send buffer.

This is problematic in cases where it is expected for communication to fail.

Observed in git 35b29eb.

HG_Handler_process() never returns unless a request arrives

Reported by carns on 07/18/13
To reproduce, do the following in a Mercury server:

hg_ret = HG_Handler_process(1, HG_STATUS_IGNORE);
fprintf(stderr, "HG_Handler_process() completed.\n");

... but do not issue any RPC operations to the server. The expected behavior is that HG_Handler_process() will honor the timeout value of 1 and return even though no requests have been received, but it appears to hang indefinitely instead.

Mercury is configured to use the BMI transport in this case.

This problem is present in git revision 167235d. It worked as expected a few weeks ago but I'm not sure what revision that was.

Mercury not reporting error on BMI_initialize() failure.

Reported by sumit on 09/17/13
In na_bmi.c, Mercury prints an error log when BMI_initialize() fails, but does not return a failure; instead continues.

address management woes

As discussed earlier, there are some issues with address management:

CCI seems to "do its own thing" with addresses. This happens for TCP (port not respected), IB, and SM (URI replaced with an entry in /tmp).
MPI configuration (dynamic processes in particular) is cumbersome, though that's not necessarily Mercury's fault.

Getting CCI to respect user options would aid usability significantly as that's been the focus of our benchmarking so far.

More generally, initialization could do more to indicate the "right" options to provide. For example, NA_Initialize could have separate args for class, transport, and "host", where the host type is class-dependent (MPI port string or communicator/rank for MPI, hostname/port pair for TCP, filesystem entry / PID for shmem, etc.). Class-specific parser functions can be used to generate the arguments. At the moment it's unclear how exactly the input is utilized. Additionally, clients don't need a "host" argument AFAICT, though the parsing function checks for one anyway.

Add DMAPP plugin for NA

Reported by jsoumagne on 05/09/14
DMAPP is a higher-level interface but is limited to MPMD execution from the same job (as DMAPP calls need a PE argument that only gets initialized when dmapp_init is called), so it seems that we cannot do dynamic connect/accept with this method.

Add protocol version number

Reported by jsoumagne on 09/25/13
Add version number of protocol used in mercury header when encoding/decoding (same for bulk data)

Boost version dependency

Reported by dkimpe on 05/29/13
Currently required 1.50;
Is that really needed?

Might be that the functionality we use is supported by older versions.

(Ubuntu 13.04 currently comes with version 1.49)

Size of buffer passed to NA_Addr_to_string

Reported by jsoumagne on 05/08/15
There is currently no way of telling what the buffer size that needs to be passed to NA_Addr_to_string should be. Unless there is a function to get the length of the stringified NA addr, should this function return the length needed rather than NA_SIZE_ERROR if it is too small?
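
One possible shape for this is a "query, then fill" convention, sketched below. Everything in the sketch is hypothetical (the `example_addr` type, the `example_addr_to_string` function, and its return codes); it illustrates the idea and is not the NA API.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for an address type; not the real na_addr_t. */
struct example_addr {
    const char *uri;
};

/* Sketch of a "query then fill" convention (an assumption, not the NA API):
 * - *buf_size is always set to the bytes required, including the '\0';
 * - with buf == NULL nothing is copied, so callers can query the size;
 * - returns 0 on success, -1 if the supplied buffer is too small. */
static int example_addr_to_string(char *buf, size_t *buf_size,
                                  const struct example_addr *addr)
{
    size_t needed = strlen(addr->uri) + 1;
    size_t available = *buf_size;

    *buf_size = needed;
    if (buf == NULL)
        return 0;
    if (available < needed)
        return -1;
    memcpy(buf, addr->uri, needed);
    return 0;
}
```

A caller would first pass a NULL buffer to learn the required size, allocate that many bytes, and call again. Because the required size is also reported on failure, a caller that guessed too small can retry without a separate length query.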

-Wundef warnings in mercury headers

Reported by carns on 08/15/13
If -Wundef is enabled for projects that use Mercury, then the Mercury headers will produce the following warnings in gcc:

/home/pcarns/working/mercury/install/include/mercury_error.h:25:5: warning: "__STDC_VERSION__" is not defined [-Wundef]
/home/pcarns/working/mercury/install/include/mercury_error.h:31:7: warning: "_WIN32" is not defined [-Wundef]
In file included from ../src/remote/core-rpc.ae:13:0:
/home/pcarns/working/mercury/install/include/mercury_proc.h:197:20: warning: "__GNUC_STDC_INLINE__" is not defined [-Wundef]

These can probably be trivially fixed by replacing the "#if X" blocks with "#if defined(X) && X" in the headers.
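
As a sketch of that guard pattern, using two of the macros named in the warnings above: the `EXAMPLE_*` names and the helper functions are hypothetical, and the `199901L` threshold is only an illustrative C99 check, not necessarily what the Mercury headers test for.

```c
/* Under -Wundef, "#if MACRO >= value" warns when MACRO is not defined.
 * Testing defined() first means the comparison is never evaluated for
 * an undefined macro. */
#if defined(__STDC_VERSION__) && (__STDC_VERSION__ >= 199901L)
#define EXAMPLE_HAS_C99 1
#else
#define EXAMPLE_HAS_C99 0
#endif

#if defined(_WIN32) && _WIN32
#define EXAMPLE_ON_WINDOWS 1
#else
#define EXAMPLE_ON_WINDOWS 0
#endif

/* EXAMPLE_* names are hypothetical and exist only for this sketch. */
static int example_has_c99(void) { return EXAMPLE_HAS_C99; }
static int example_on_windows(void) { return EXAMPLE_ON_WINDOWS; }
```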

Simplify Mercury context pointer management

(context as in na_class/context_t and hg_class/context_t)

Currently, all four are needed - the NA pointers for address lookup, and the HG pointers for RPC registration / management. In the case where there are many "services" attaching to mercury, this strains usability. There's a number of ways in which the situation can be improved, based on the observation that every HG/NA context is associated with a single HG/NA class, and every HG class is associated with a single NA class/context:

1. Eliminate redundant parameters. Based on the class/context relationship, any function using both of them can, say, drop the class pointer (HG_Init, HG_Bulk_init, HG_Create, HG_Progress, HG_Trigger, NA_Addr_lookup). This is also beneficial in the case where it would be an error to, say, mismatch the class and context. Alternatively, combined with 3., you could take just a class_t when the context_t is unnecessary, and vice versa.
2. NA need not be exposed to the user at all: HG classes/contexts correspond to exactly one NA context/class. You could, say, have HG versions of address management functions.
3. Simply provide "getters" for classes and contexts. That way, I can for example just pass in an hg_context_t to a service and it can retrieve the other metadata structures as needed to perform its work.

need -lrt for older toolchains

Reported by harms on 08/02/13
Mercury needs to add -lrt in the pkg-config flags when an older GLIBC is used. I'm running Ubuntu 12.04 and clock_gettime is not in GLIBC but is in librt. Here's an excerpt from a more recent clock_gettime manpage:

Link with -lrt (only for glibc versions before 2.17).

OSX type size warnings

Compiling on 64-bit OSX systems gives a number of conversion warnings, mostly around mixing size_t (of type "unsigned long"), hg_size_t and hg_uint64_t (both of type "uint64_t -> unsigned long long"). Fixing would require upcasting arguments where necessary and using PRIu64 for printfs.

Not a big deal since long == long long on most systems we'd ever care about, though...
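
A minimal sketch of the printf side of that fix, using the standard `<inttypes.h>` macros. The function name `example_format_sizes` is hypothetical, and plain `uint64_t` stands in for hg_uint64_t/hg_size_t.

```c
#include <inttypes.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Formats a size_t and a uint64_t without conversion warnings:
 * the size_t is widened explicitly, and PRIu64 expands to the correct
 * printf length modifier on each platform. */
static int example_format_sizes(char *out, size_t out_len,
                                size_t local_size, uint64_t bulk_size)
{
    return snprintf(out, out_len, "local=%" PRIu64 " bulk=%" PRIu64,
                    (uint64_t) local_size, bulk_size);
}
```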

ability to cancel pending operations

Reported by carns on 05/19/14
We will eventually need the ability to cancel pending hg_request_t (issued via HG_Forward()) and hg_bulk_request_t (issued via HG_Bulk_write() and HG_Bulk_read()) operations.

The completion of cancelled operations would ideally be handled through the normal completion mechanism for Mercury so that it isn't a special case for API users.

Resolve build warnings

Reported by sumit on 10/17/13
Task: Add -Wundef and -Wc++-compat compiler flags to build and fix the warnings, if any.

Support for batch calls

Reported by dkimpe on 10/30/13
It might be useful to batch up multiple calls to a single destination, to avoid sending many small packets.

Something like:

h = HG_Start_batch();

HG_Forward(h, func1);
HG_Forward(h, func2);

HG_Submit_batch(h);

...

and then offer a method to wait on the completion of the whole batch or individual completion.

On the wire, it would allow us to combine all of the initial RPC requests
(and possibly responses) into one larger packet.

If this is functionality we know we want, we could already implement the functions and not do the optimization.

Bulk data optimization

Reported by dkimpe on 10/22/13
For variable length arguments that could end up being big,
the user now always has to do the bulk data setup.

However, if the argument ends up being small (for example it would fit in the initial RPC request), a lot of extra work has been done.

At the same time, the user is not in a position to determine how much space is available in the eager RPC request (due to unknown encoding/header overhead etc.)

Proposed solution:

Have the bulk data module determine if a get/put should be used or if the data should be transmitted in eager mode.
If eager mode: the data gets encoded into the serialized handle transparently. The register/unregister/publish/... calls would do nothing in this case.
The origin would do the same, and put/get would complete immediately.

Optimization (not sure that it makes sense though):

If a function has multiple bulk data arguments, it would be good if eager mode is not used if it would trigger a put/get for the rest of the arguments.
To implement this, the encoder routines would need to get access to the number of eager bytes left in the header.
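
The decision described above could be sketched as follows. All names are hypothetical and nothing here is part of the Mercury API; the sketch assumes the encoder can be told how many eager bytes remain, which is exactly the open point raised in this ticket.

```c
#include <stddef.h>

/* Hypothetical names; nothing here is part of the Mercury API. */
enum example_xfer_mode { EXAMPLE_XFER_EAGER, EXAMPLE_XFER_BULK };

/* Pick eager mode only when the payload fits in what is left of the
 * eager message after headers and already-encoded arguments. */
static enum example_xfer_mode
example_choose_mode(size_t payload_size, size_t eager_bytes_left)
{
    return (payload_size <= eager_bytes_left) ? EXAMPLE_XFER_EAGER
                                              : EXAMPLE_XFER_BULK;
}

/* Variant for the multi-argument optimization: if any argument would
 * need a bulk transfer anyway, send all of them as bulk. */
static enum example_xfer_mode
example_choose_mode_all(const size_t *sizes, size_t count,
                        size_t eager_bytes_left)
{
    size_t i;

    for (i = 0; i < count; i++) {
        if (sizes[i] > eager_bytes_left)
            return EXAMPLE_XFER_BULK;
        eager_bytes_left -= sizes[i];
    }
    return EXAMPLE_XFER_EAGER;
}
```

With 4096 eager bytes left, arguments of 1000 and 2000 bytes would both be sent eagerly, while arguments of 1000 and 4000 bytes would both fall back to bulk transfers.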

test ticket email

Reported by dkimpe on 07/23/13
test

get HG_Registered_data directly from handle?

Some initial thoughts from using HG_Register_data and HG_Registered_data -

The main use case I'm encountering is within an RPC, where I'm translating from a "global" worldview (the rpc callback) into a "local" worldview (the object that this RPC type is working on). Caching the registered data in the handle, while increasing handle sizes by a pointer size, would prevent the need to consult the hash table on each and every RPC which wants to operate in this local view, as well as needing to grab the hg_class_t and hg_id_t on each RPC (less of an issue, of course). Maybe a call "HG_Registered_data_handle(hg_handle_t handle)"?

Another question is whether it makes sense to have more than one pointer registered with the RPC. I'm thinking, as the meeting earlier today brought up, of having e.g. multiple db instances on a node, which could potentially be spawned as threads in a process. Though this one is just food for thought - more than likely we'd be having multiple mercury endpoints and hence multiple processes for that use-case.

Is the first point something that can be easily addressed?

Support RPCs with no input arguments

Reported by carns on 08/02/13
It would be helpful to support RPC functions that have no input parameters (examples: "noop" operation, or a "shutdown" operation).

It is possible that Mercury supports this already but I just can't figure out how to do it properly. An example test case for this in the Mercury repo might be helpful if that is the case.

I tried two guesses: a) omitting the MERCURY_GEN_PROC call for the input struct and passing in either void or NULL as appropriate for the various register and forward function calls. b) calling MERCURY_GEN_PROC with only one argument to try to create an empty input struct.

a) produces a variety of compile errors, including "hg_proc_void undeclared", while b) produces a "error: macro "MERCURY_GEN_PROC" requires 2 arguments, but only 1 given" error message.