pmem / pmdk
Persistent Memory Development Kit
Home Page: https://pmem.io
License: Other
Hi!
I used the "manpage.c" example of librpmem. The client-side application executed successfully, but the memory pool it created has an invalid signature.
The client program:
#include <stdio.h>
#include <string.h>
#include <stdlib.h>
#include <unistd.h>
#include <errno.h>
#include <librpmem.h>

#define POOL_SIZE (32 * 1024 * 1024)
#define NLANES 4

//static unsigned char pool[POOL_SIZE];

int
main(int argc, char *argv[])
{
	int ret;
	unsigned nlanes = NLANES;
	void *pool;
	size_t align = (size_t)sysconf(_SC_PAGESIZE);

	errno = posix_memalign(&pool, align, POOL_SIZE);
	if (errno) {
		perror("posix_memalign");
		return -1;
	}

	/* fill pool_attributes */
	struct rpmem_pool_attr pool_attr;
	memset(&pool_attr, 0, sizeof(pool_attr));

	/* create a remote pool */
	RPMEMpool *rpp = rpmem_create("[email protected]", "pool.set",
			pool, POOL_SIZE, &nlanes, &pool_attr);
	if (!rpp) {
		fprintf(stderr, "rpmem_create: %s\n", rpmem_errormsg());
		return 1;
	}

	/* store data on local pool */
	memset(pool, 0, POOL_SIZE);

	/* make local data persistent on remote node */
	ret = rpmem_persist(rpp, 0, POOL_SIZE, 0);
	if (ret) {
		fprintf(stderr, "rpmem_persist: %s\n", rpmem_errormsg());
		return 1;
	}

	/* close the remote pool */
	ret = rpmem_close(rpp);
	if (ret) {
		fprintf(stderr, "rpmem_close: %s\n", rpmem_errormsg());
		return 1;
	}

	return 0;
}
The debug messages:
$ ./manpage
<librpmem>: <1> [out.c:283 out_init] pid 20017: program: /root/nvml/src/examples/librpmem/manpage
<librpmem>: <1> [out.c:285 out_init] librpmem version 1.1
<librpmem>: <1> [out.c:289 out_init] src version:
<librpmem>: <3> [librpmem.c:63 librpmem_init]
<librpmem>: <3> [librpmem.c:68 librpmem_init] Libfabric is fork safe
<librpmem>: <3> [rpmem.c:454 rpmem_create] target [email protected], pool_set_name pool.set, pool_addr 0x7f4c5adaa000, pool_size 33554432, nlanes 0x7ffc0e6e3948, create_attr 0x7ffc0e6e38d0
<librpmem>: <3> [rpmem.c:361 rpmem_log_args] req create, target [email protected], pool_set_name pool.set, pool_addr 0x7f4c5adaa000, pool_size 33554432, nlanes 4
<librpmem>: <3> [rpmem.c:363 rpmem_log_args] create request:
<librpmem>: <3> [rpmem.c:364 rpmem_log_args] target: [email protected]
<librpmem>: <3> [rpmem.c:365 rpmem_log_args] pool set: pool.set
<librpmem>: <4> [rpmem.c:366 rpmem_log_args] pool addr: 0x7f4c5adaa000
<librpmem>: <4> [rpmem.c:367 rpmem_log_args] pool size: 33554432
<librpmem>: <3> [rpmem.c:368 rpmem_log_args] nlanes: 4
<librpmem>: <3> [rpmem.c:394 rpmem_check_args] pool_addr 0x7f4c5adaa000, pool_size 33554432, nlanes 0x7ffc0e6e3948
<librpmem>: <3> [rpmem.c:196 rpmem_common_init] target [email protected]
<librpmem>: <3> [rpmem.c:133 rpmem_get_provider] node 10.1.0.48
<librpmem>: <3> [rpmem.c:104 env_get_bool] name RPMEM_ENABLE_SOCKETS, valp 0x7ffc0e6e37ac
<librpmem>: <3> [rpmem.c:104 env_get_bool] name RPMEM_ENABLE_VERBS, valp 0x7ffc0e6e37a8
<librpmem>: <3> [rpmem.c:219 rpmem_common_init] provider: verbs
<librpmem>: <4> [rpmem.c:233 rpmem_common_init] establishing out-of-band connection
<librpmem>: <4> [rpmem_cmd.c:147 rpmem_cmd_log] executing command 'ssh -T -oBatchMode=yes [email protected] rpmemd'
<librpmem>: <4> [rpmem_ssh.c:319 rpmem_ssh_open] received status: 0
<librpmem>: <3> [rpmem.c:241 rpmem_common_init] out-of-band connection established
<librpmem>: <4> [rpmem_obc.c:494 rpmem_obc_create] sending create request message
<librpmem>: <3> [rpmem_obc.c:502 rpmem_obc_create] create request message sent
<librpmem>: <4> [rpmem_obc.c:503 rpmem_obc_create] receiving create request response
<librpmem>: <3> [rpmem_obc.c:512 rpmem_obc_create] create request response received
<librpmem>: <3> [rpmem.c:377 rpmem_log_resp] req create, resp 0x7ffc0e6e3860
<librpmem>: <3> [rpmem.c:379 rpmem_log_resp] create request response:
<librpmem>: <3> [rpmem.c:380 rpmem_log_resp] nlanes: 4
<librpmem>: <3> [rpmem.c:381 rpmem_log_resp] port: 53899
<librpmem>: <3> [rpmem.c:383 rpmem_log_resp] persist method: General Purpose Server Persistency Method
<librpmem>: <3> [rpmem.c:384 rpmem_log_resp] remote addr: 0x7fa440001000
<librpmem>: <3> [rpmem.c:288 rpmem_common_fip_init] rpp 0xc3f6d0, req 0x7ffc0e6e3880, resp 0x7ffc0e6e3860, pool_addr 0x7f4c5adaa000, pool_size 33554432, nlanes 0x7ffc0e6e3948
....
<librpmem>: <3> [rpmem.c:318 rpmem_common_fip_init] final nlanes: 4
<librpmem>: <4> [rpmem.c:319 rpmem_common_fip_init] establishing in-band connection
<librpmem>: <3> [rpmem.c:327 rpmem_common_fip_init] in-band connection established
<librpmem>: <3> [rpmem.c:177 rpmem_monitor_thread] arg 0xc3f6d0
<librpmem>: <3> [rpmem.c:616 rpmem_persist] rpp 0xc3f6d0, offset 0, length 33554432, lane 0
<librpmem>: <3> [rpmem.c:584 rpmem_close] rpp 0xc3f6d0
<librpmem>: <4> [rpmem.c:586 rpmem_close] closing out-of-band connection
<librpmem>: <4> [rpmem_obc.c:677 rpmem_obc_close] sending close request message
<librpmem>: <3> [rpmem_obc.c:685 rpmem_obc_close] close request message sent
<librpmem>: <4> [rpmem_obc.c:686 rpmem_obc_close] receiving close request response
<librpmem>: <3> [rpmem_obc.c:695 rpmem_obc_close] close request response received
<librpmem>: <3> [rpmem.c:596 rpmem_close] out-of-band connection closed
<librpmem>: <3> [rpmem.c:343 rpmem_common_fip_fini] rpp 0xc3f6d0
<librpmem>: <4> [rpmem.c:345 rpmem_common_fip_fini] closing in-band connection
<librpmem>: <3> [rpmem.c:349 rpmem_common_fip_fini] in-band connection closed
<librpmem>: <3> [rpmem.c:261 rpmem_common_fini] rpp 0xc3f6d0, join 1
<librpmem>: <3> [librpmem.c:80 librpmem_fini]
On the server side:
$ cat pool.set
PMEMPOOLSET
2G /mnt/pmem0/pool.obj
Checking the memory pool created:
$pmempool check /mnt/pmem0/pool.obj
invalid signature
pool.obj: not consistent
I would like to know whether this check result is expected for this example, or whether there was an error in how I used it?
Thank you,
There's no way to programmatically get information about a pmemobj pool set. I'm mostly interested in the pool size, but information about the number of replicas, their types (local/remote), and the number/paths of parts could also be useful.
There is a need for a function that can check all kinds of pools on Device DAX.
The zone metadata mostly consists of a fixed-size array of chunk headers. The contained data forms a linked list of variably-sized chunks. To verify the consistency of the data, it is enough to check that this list is correct and spans the entire zone. What is impossible, however, is correcting any errors found. To solve this problem, the proposal is to introduce a new chunk type that would store the chunk headers array, so that all operations on this array would be performed in N+1 places (depending on how many metadata backups the user wants).
The new chunk would be allocated/deallocated on demand, at the user's request. By default there would be no backups. The location of the chunk would be stored in all of the chunk header arrays, but because that is exactly the structure this chunk is intended to protect, it would also contain a header with a magic field and a checksum that allow locating it just by traversing all possible chunks.
Modifications to the chunk headers array would be applied first to the backups.
All of the backup chunks will be identified at startup and compared against the master copy. If any differences are found, a correct header array is identified and all of the array copies are updated to match it.
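The verification step described above amounts to a walk over the chunk headers. A minimal sketch, assuming a simplified hypothetical header that carries only a size_idx field (the real on-media layout has more fields): consistency means consecutive entries sum exactly to the zone's chunk count.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* simplified stand-in for the on-media chunk header (hypothetical layout) */
struct chunk_hdr {
	uint32_t size_idx; /* number of chunks this entry spans */
};

/*
 * Returns 1 if the chunk list is well-formed: every list entry has a
 * non-zero size and the entries together span the entire zone; 0 otherwise.
 */
static int
zone_list_consistent(const struct chunk_hdr *hdrs, size_t nchunks)
{
	size_t i = 0;
	while (i < nchunks) {
		if (hdrs[i].size_idx == 0 || i + hdrs[i].size_idx > nchunks)
			return 0;
		i += hdrs[i].size_idx; /* jump to the next list entry */
	}
	return i == nchunks;
}
```

As the text notes, a check like this can detect a broken list but cannot say which header is wrong, which is what motivates the backup copies.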
New CTL API:
"heap.zone.metadata.ncopies"
Defines how many chunk header copies are required. Allocates or deallocates the backup chunks to match this number.
TBD
The pvector is the data structure used for undo logs. Its consistency can be verified by traversing its entire contents: there should be only one zero entry, at the end of the vector.
TBD
(should save the number of entries?)
(allocate 2x the vector slots and duplicate data? - would require API change and wouldn't be backward compatible)
(grab one extra lane?)
TBD
Most database software is designed from the ground up to work as a daemon with clients connecting to it to perform operations. This means that multiprocessing is effectively free. Our library is designed to be embedded into software, and it heavily uses memory mappings to provide no-copy (compared to traditional databases) interfaces to the user. That design choice makes it much more difficult to manage the state of the pool across multiple processes, because the library lacks a single controlling entity (the daemon) that could serialize access.
While keeping the above in mind, I believe it's a topic worth pursuing because it enables libpmemobj to be used in conjunction with MPI and many different areas of HPC.
There are several big ticket items that need to be addressed for multiprocess support, here are some I could think of:
PMEMobjpool is allocated from the transient heap and filled with relevant data. It might appear that this is a trivial change, but in reality we rely on PMEMobjpool pointing to the beginning of the memory pool in so many places that it might be a real difficulty. TBD
For certain pmempool transform operations to be feasible, and to facilitate other libpmempool sync/transform/check functionality, the proposal suggests adding the following fields in pool_hdr's unused space:
- unsigned char unused[3944]; /* must be zero */
+ /* fields utilizing space that was unused prior to version 1.3 */
+ uint64_t offset; /* offset of data part in the lot, as of 1.3 */
+ uint64_t size; /* size of data mapping in the lot, as of 1.3 */
+ uint64_t poolsize; /* size of a pool in the lot, as of 1.3 */
+ uint64_t alignment; /* data alignment, as of 1.3 */
+ unsigned char unused[3912]; /* must be zero */
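The arithmetic in the diff can be sanity-checked: the four new uint64_t fields consume 32 bytes, so the unused area shrinks from 3944 to 3912 bytes and the on-media header size is unchanged. A minimal sketch with illustrative struct names (only the tail of pool_hdr is modeled here):

```c
#include <stdint.h>

/* illustrative fragments of pool_hdr's tail, before and after the change */
struct hdr_tail_v12 {
	unsigned char unused[3944]; /* must be zero */
};

struct hdr_tail_v13 {
	uint64_t offset;            /* offset of data part, as of 1.3 */
	uint64_t size;              /* size of data mapping, as of 1.3 */
	uint64_t poolsize;          /* size of a pool, as of 1.3 */
	uint64_t alignment;         /* data alignment, as of 1.3 */
	unsigned char unused[3912]; /* must be zero */
};

/* the new fields must not change the on-media header size */
_Static_assert(sizeof(struct hdr_tail_v12) == sizeof(struct hdr_tail_v13),
		"pool_hdr layout size changed");
```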
Related issues:
pmem/issues#475
pmem/issues#476
In some cases it could be more convenient to specify the maximum time of benchmark execution instead of a fixed number of operations.
Let's add an option for pmembench to specify the maximum execution time of a single run (e.g. "-t 10s").
PMEMOBJ_CONF/pmemobj_ctl_[set|get] provides a nice, consistent interface for influencing pmemobj's behavior at runtime.
Please port it to the other libraries and use it instead of the multitude of environment variables.
See also: Glibc Tunables
Use the new memory protection keys feature of the CPU to prevent bugs from scribbling on large pmem pools. See: https://lwn.net/Articles/643797/
https://github.com/pmem/nvml/blob/master/src/common/util.c#L454
This if statement can only be true if (addr % 4) equals (csump % 4); there should be an assert for that condition.
Also, the function header says that the function assumes little-endian:
https://github.com/pmem/nvml/blob/master/src/common/util.c#L437
But later it uses the le32toh and htole64 functions, which convert between little-endian and the host's byte order. Doesn't this actually make the function support both endiannesses?
Interleaving can potentially improve performance.
If the user doesn't want to use hardware interleaving (because they want to interleave only some of the pools stored on pmem), or the hardware doesn't support it, pmemobj could implement it in software.
Example pool set file:
PMEMPOOLSET
INTERLEAVING 2MB
1G /mnt/pmem1/part1
1G /mnt/pmem2/part2
It could be mapped as:
[0, 2MB] of part1
[0, 2MB] of part2
[2MB, 4MB] of part1
[2MB, 4MB] of part2
[4MB, 6MB] of part1
[4MB, 6MB] of part2
...
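The mapping above is plain modular arithmetic. A minimal sketch, assuming equally-sized parts and a fixed stripe (interleaving) size, translating a byte offset in the interleaved pool into a part index and an offset within that part:

```c
#include <stddef.h>

/* Map a byte offset in the interleaved pool to (part, offset in part). */
static void
interleave_map(size_t pool_off, size_t stripe, size_t nparts,
		size_t *part, size_t *part_off)
{
	size_t stripe_idx = pool_off / stripe;      /* which stripe globally */
	*part = stripe_idx % nparts;                /* round-robin over parts */
	*part_off = (stripe_idx / nparts) * stripe  /* full stripes in part */
			+ pool_off % stripe;        /* offset inside stripe */
}
```

With stripe = 2MB and nparts = 2, pool offset 4MB lands in part1 at part offset 2MB, matching the third region in the example layout.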
https://github.com/pmem/pmdk/blob/dd14c39/src/common/pmemcommon.inc
assumes the build system is the deployment system, using uname to define the target OS.
Proposition: add an optional HOST parameter that allows passing the deployment system.
Issue reported by @raphaelcohn: https://groups.google.com/forum/#!msg/pmem/dSR5KOrli8I/rYWWrRVGEQAJ
It would be nice if printing info about multiple files were possible.
It would be especially useful when comparing information about multiple parts of the same pool.
E.g., as of now we have to call pmempool info X times to print info about X files:
$pmempool info pool.part1
$pmempool info pool.part2
The following call results in printing info about the first pool file only:
$pmempool info pool.part1 pool.part2
Consider concatenating the files before printing to stdout (as Unix's cat(1) does).
This is a super feature request for all the open issues imported from Trello. Not prioritized.
fchmod
Add support for resolving the $HOME environment variable in pool set files.
$cat pool.set
PMEMPOOLSET
20M $HOME/pool
$pmempool create obj pool.set
error: 'pool.set' -- pool.set [incorrect path (must be an absolute one):2]
error: creating pool file failed
Currently we build two sets of libraries from a single source tree: stripped libraries intended for production, and libraries with all debugging enabled.
We install release libraries to /usr/lib* and debug libraries to /usr/lib*/pmdk_debug.
Debug libraries are built with:
The existence of two sets of libraries simultaneously built from one tree is quite unusual.
The proposal is to build only one set of libraries, install them to /usr/lib*, expand release builds to include some of the more useful but still lightweight debugging features, and get rid of the pmdk_debug directory.
Release libraries would be built with all symbols (and distributions would move those symbols to dbginfo packages) and with some lightweight logging.
Debug builds would be developer-only, so distributions would provide only release packages.
The libpmemobj non-transactional API allows one to atomically allocate (and initialize), reallocate, change the type number of, and eventually free an object, but there is no simple way to atomically modify the content of an object without using transactions.
The proposal is to add a function that fills the gap in the existing set of atomic non-transactional operations by providing the ability to atomically modify/update the content of a single object without using transactions.
The new function may be implemented in a couple of ways:
int pmemobj_modify(PMEMobjpool *pop, PMEMoid *oidp,
void (*constructor)(PMEMobjpool *pop, void *ptr, void *arg), void *arg);
This would allow modifying the object's data, but not resizing it or changing its type. It could, however, also take a type_num argument if needed.
In practice, this would be more or less equivalent to the following code:
int
pmemobj_modify(PMEMobjpool *pop, PMEMoid *oidp,
	void (*constructor)(PMEMobjpool *pop, void *ptr, void *arg), void *arg)
{
	TX_BEGIN(pop) {
		/* only one object per tx */
		pmemobj_tx_add_range(*oidp, 0, pmemobj_alloc_usable_size(*oidp));
		void *ptr = pmemobj_direct(*oidp);
		constructor(pop, ptr, arg); /* constructor cannot be NULL */
	} TX_ONABORT {
		return -1;
	} TX_END

	return 0;
}
pmemobj_realloc():

int pmemobj_realloc(PMEMobjpool *pop, PMEMoid *oidp, size_t size, unsigned int type_num,
	void (*constructor)(PMEMobjpool *pop, void *ptr, void *arg), void *arg);

If the constructor is not NULL, the library would never attempt to reallocate in place, but would always allocate a new object, copy over the user data, and then invoke the user-defined constructor function, which can modify the old data and/or initialize the added memory.
Passing a size equal to the current allocation size, together with a non-NULL constructor pointer, would be equivalent to pmemobj_modify().
It would be useful to provide (or auto-generate) gnuplot scripts for NVML benchmarks. This would help to quickly visualize benchmark results.
Key requirements:
Sometimes, to investigate a build/test failure, it might be useful to have more info about the build/test environment.
build log:
test log (additional info may be optional / dumped only in case of test failure):
Acquiring a lock in a transaction currently requires iterating over all of the existing locks in order to ensure that a situation like this:
TX_BEGIN(pop) {
TX_BEGIN_LOCK(pop, &lock) {
} TX_END
TX_BEGIN_LOCK(pop, &lock) {
} TX_END
} TX_END
does not create a deadlock.
The container of the currently active locks must be changed in order to improve the performance of this operation.
Apart from the pvector undo logs, the only information stored in a transaction lane is the state field, which indicates whether the transaction is committed or aborted.
This proposal suggests adding another state field, located at the opposite side of the lane, roughly ~1 kilobyte away from the first one. Both of those fields would have to be updated to change the state of the transaction, where:
And this will have to be enforced by properly ordering changes to the transaction state fields.
"tx.lanes.state.duplicate"
Defines whether the transaction state field must be duplicated or not. The effects of the change are immediate and as such, this value should only be changed when there are no transactions running.
TBD
This is not an issue as of now, but it is something that can affect future discussions about portability.
The current implementations of PMEMmutex and PMEMrwlock:
https://github.com/pmem/nvml/blob/master/src/libpmemobj/sync.h#L50
assume that the platform-provided mutex/rwlock data structure fits in 56 bytes (64 bytes minus the runid).
This doesn't hold true with Apple's libpthread:
https://opensource.apple.com/source/libpthread/libpthread-218.1.3/sys/_pthread/_pthread_types.h.auto.html
In this case sizeof(pthread_mutex_t) is 64, and sizeof(pthread_rwlock_t) is 200.
One possible solution for such cases is to allocate space for the mutex/rwlock in volatile memory. The PMEMmutex struct would then contain a pointer instead of the actual mutex, looking somewhat like FreeBSD's pthread types:
https://github.com/freebsd/freebsd/blob/master/sys/sys/_pthreadtypes.h#L69
This might have drawbacks; e.g., such allocated mutexes can show up as memory leaks in the analysis of some tools.
We only have labels for OS:linux and OS:windows, so I didn't use any labels. We don't have a POSIX_portability label yet.
Instead of a simple check whether the required API version matches the actual library API version, the xxx_check_version() functions could provide more information about the library. E.g., they could be used to retrieve the source version (tag/commit), whether it is a debug or non-debug build, whether the library was compiled with Valgrind support enabled, etc.
All this data could be compiled into a static string that is returned when xxx_check_version() is called with the major/minor version numbers set to 0.
If one creates a poolset file (pool.set) like the following:
PMEMPOOLSET
AUTO /dev/dax0.0
AUTO /dev/dax1.0
REPLICA
AUTO /dev/dax2.0
AUTO /dev/dax3.0
OPTION SINGLEHDR
then:
Data read back from addresses (calculated using the saved offsets) of the resynced poolset is incorrect: not equal to the pattern written by pmemobj_memset_persist.
BUT if the data is written with an extra offset > 4k, everything works fine.
Found on: pmempool 1.4-rc1 & pmempool 1.4-rc1-71-g70eddc798
The only reliable information about the existence of an allocation is the bits in the run bitmap. Even the allocation header cannot be used to determine whether an object exists, because it isn't zeroed on free, and in some cases user data present in the location where an allocation header might have existed can look like a header.
This proposal suggests adding a chunk flag, valid only for runs, that indicates whether the bitmap is duplicated. The duplicate bitmap would be stored at the end of the run data and updated alongside the master copy in a redo log.
This would be disabled by default, and the flag would only be set for new chunks created after the option is set. Likewise, disabling the option wouldn't remove the flag from existing chunks, so they would still have their bitmaps duplicated.
"heap.zone.run.on_create.flags.duplicate"
Indicates whether the duplicate flag should be set on creation of new chunks.
Currently pmemobj_open fails immediately with errno set to EAGAIN when the pool file is locked by another process. An application that expects the pool file to be temporarily locked by another process must loop on pmemobj_open as long as it returns NULL with errno set to EAGAIN. This is not efficient.
I'd like to be able to tell pmemobj_open to wait on the lock instead of failing. It could be implemented as a new version of pmemobj_open that accepts flags, or as a new ctl.
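The busy-wait applications must write today looks roughly like the sketch below; try_open is a stub standing in for pmemobj_open (which returns NULL with errno == EAGAIN while the pool is locked), so the loop itself can be shown self-contained:

```c
#include <errno.h>
#include <stddef.h>

/* Stub standing in for pmemobj_open(): fails with EAGAIN a few times. */
static int attempts_left = 3;
static void *
try_open(const char *path)
{
	(void)path;
	if (attempts_left-- > 0) {
		errno = EAGAIN;
		return NULL;
	}
	static int pool; /* dummy handle */
	return &pool;
}

/* The retry loop every application currently has to implement itself. */
static void *
open_wait(const char *path)
{
	void *pop;
	while ((pop = try_open(path)) == NULL && errno == EAGAIN)
		; /* in real code: sleep or back off instead of spinning */
	return pop;
}
```

A blocking flag (or ctl) would move this loop, with proper waiting on the lock, inside the library.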
$cat pool.set
PMEMPOOLSET
10M /dev/shm/pool
REPLICA 192.168.0.182 rep.set
$cat rep.set
PMEMPOOLSET
10M /dev/shm/rep.1
10M /dev/shm/rep.2
$pmempool create obj pool.set
Local and remote pool files are created
Remove one part from remote replica:
$rm -f /dev/shm/rep.1
$pmempool rm pool.set
error: cannot remove 'rep.set' on '192.168.0.182': Invalid argument
error: removing 'pool.set' failed: Invalid argument
Part /dev/shm/rep.2 on the remote was removed, although the man page states that the -f switch is required to ignore nonexistent files and proceed:
$man pmempool-rm
-f, --force
Remove all specified files, ignore nonexistent files, never
prompt.
Found on 1.4-rc3-14-g53bdc41
I'd like to be able to tell pmemobj to fail opening the pool if the pool configuration includes remote replication.
I think there should be a pmemobj_open variant which accepts flags. Alternatively, it could be implemented as a ctl.
Why? In pmemfile we have a problem when we intercept a syscall from libc while libc is holding a lock, and something in the syscall handling calls back into libc, which requires taking the same lock. We can manage it for the things pmemfile, libpmemobj and libpmem do, but it's impossible to handle for libfabric, libibverbs and anything below them.
There is a need for more accurate documentation of the unit tests. The current documentation is missing:
· the number of DAX devices needed to run all tests
· the number of nodes needed to run all tests
· the order of DAX devices or nodes
· the requirements for HW (master and slaves) and SW
Introduce an API function which allows retrieving the replica size from a remote poolset file.
This functionality is required when one wants to get the size of a damaged remote replica: currently, we cannot get the size of a remote pool without opening it.
Related issues:
pmem/issues#360
The user currently has no control over the persistent allocations happening inside a transaction. They happen automatically and from the global pool, which might interfere with the user's allocation classes and induce additional fragmentation. The initial idea I had was to simply allow the user to remap the entire internal allocation class map, so that it would be possible to substitute custom classes for metadata allocations. That had two problems: a) the allocation class map is common to all allocations, meaning that having custom classes only for metadata might be difficult, and b) the user never really knows what the sizes of metadata allocations are, so this would force applications to fill the entire alloc class map, which might not be needed for anything else.
The new idea is to allow the user to substitute the metadata allocator for a transaction. Each transaction would take an optional argument: an instance of a transaction metadata suballocator. If provided, each allocation/deallocation would go through it. This includes pvector arrays, snapshot caches and huge snapshots.
The simplest suballocator would be a passthrough to pmalloc with a custom allocation class (and other flags), whereas a slightly more complicated and useful one would be a fixed-size linear allocator that resets when the transaction ends.
There would be a new structure that defines the suballocator:
struct pobj_tx_alloc {
	int (*alloc)(struct pobj_tx_alloc *alloc, uint64_t *offset, size_t size);
	void (*dealloc)(struct pobj_tx_alloc *alloc, uint64_t *offset);
	void (*on_tx_begin)(struct pobj_tx_alloc *alloc);
	void (*on_tx_end)(struct pobj_tx_alloc *alloc);
};
It could include a type of the thing being allocated and/or a constructor. TBD
The user would be responsible for instantiating the allocator, which can be then passed to pmemobj_tx_begin as one of the varargs, like so:
struct my_super_tx_allocator *mallocator = ...;
pmemobj_tx_begin(pop, NULL, POBJ_PARAM_ALLOCATOR, mallocator);
The user structure must include struct pobj_tx_alloc at offset 0:

struct my_super_tx_allocator {
	struct pobj_tx_alloc base;
	uint64_t *offset;
	char *data;
};
There's a problem with the fact that we internally do not use PMEMoids, which means that the user would not be able to simply implement the alloc function as pmemobj_xalloc(..., &oid, ...), but would need to use the reserve/publish API and the pmemobj_set_value() function to set the offset. Not sure if this is a big issue.
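The "fixed-size linear allocator" variant mentioned above can be sketched against the proposed interface. Note that struct pobj_tx_alloc is part of this proposal, not an existing libpmemobj API, and the offsets here index a plain buffer rather than a pool; this is a minimal sketch, not an implementation.

```c
#include <stddef.h>
#include <stdint.h>

/* the proposed interface, repeated here for self-containment */
struct pobj_tx_alloc {
	int (*alloc)(struct pobj_tx_alloc *alloc, uint64_t *offset, size_t size);
	void (*dealloc)(struct pobj_tx_alloc *alloc, uint64_t *offset);
	void (*on_tx_begin)(struct pobj_tx_alloc *alloc);
	void (*on_tx_end)(struct pobj_tx_alloc *alloc);
};

/* fixed-size linear (bump) allocator that resets when the tx ends */
struct linear_tx_alloc {
	struct pobj_tx_alloc base; /* must be at offset 0 */
	uint64_t used;
	uint64_t capacity;
};

static int
linear_alloc(struct pobj_tx_alloc *a, uint64_t *offset, size_t size)
{
	struct linear_tx_alloc *l = (struct linear_tx_alloc *)a;
	if (l->used + size > l->capacity)
		return -1; /* out of reserved space */
	*offset = l->used;
	l->used += size;
	return 0;
}

static void
linear_dealloc(struct pobj_tx_alloc *a, uint64_t *offset)
{
	(void)a; /* individual frees are no-ops in a linear allocator */
	*offset = 0;
}

static void
linear_on_tx_begin(struct pobj_tx_alloc *a) { (void)a; }

static void
linear_on_tx_end(struct pobj_tx_alloc *a)
{
	((struct linear_tx_alloc *)a)->used = 0; /* reset when tx ends */
}

static struct linear_tx_alloc
linear_tx_alloc_init(uint64_t capacity)
{
	struct linear_tx_alloc l = {
		{ linear_alloc, linear_dealloc,
		  linear_on_tx_begin, linear_on_tx_end },
		0, capacity
	};
	return l;
}
```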
RUNTEST has the nice feature of loading per-test-group configuration files (which was done for remote tests). However, in the odd case where you want to run a test not through RUNTEST (which you probably shouldn't), this functionality is absent. The question is: do we want it there for consistency? And there is still the issue of having this on Windows.
pmempool and libpmempool must provide support for checking consistency and repair of pmemobj pools.
Let's consider the following case: one creates an obj pool set with remote replica(s). Both the local (pool.set) and remote (remote.set) poolset files include the SINGLEHDR option.
If, after creating the poolsets, one wants to use pmempool info on remote.set, the command returns the following message:
error: opening poolset failed
remote.set: Invalid argument
To get any info about the remote poolset parts, one can run pmempool info directly on a single part. But since the SINGLEHDR option is active, it is possible to retrieve info about the first part only.
Shouldn't pmempool info run on a remote poolset file?
pmemblk_check returns 1 if the BTT Info header is corrupted. The problem is that pmemblk_check calls pmemblk_map_common, which writes a new layout if the BTT Info header consistency check fails. The new layout is not written to the file because of the rdonly flag. However, the runtime laidout variable is set to false after writing the new layout, and in consequence btt_check immediately reports that the BTT is consistent.
Below is simple backtrace when the write_layout is called.
#0 write_layout (bttp=0x6150a0, lane=0, write=0) at ../btt.c:714
#1 0x00007ffff7bd136f in read_layout (bttp=0x6150a0, lane=0) at ../btt.c:990
#2 0x00007ffff7bd17e7 in btt_init (rawsize=1073733632, lbasize=512, parent_uuid=0x10000000018 "\201\067\276\305\361\034C\327Q\f\305E\327X\310pGT", maxlane=8, ns=0x10000000000, ns_cbp=0x7ffff7dd9d80 <ns_cb>) at ../btt.c:1117
#3 0x00007ffff7bceaca in pmemblk_map_common (fd=13, bsize=512, rdonly=1) at ../blk.c:374
#4 0x00007ffff7bcf316 in pmemblk_check (path=0x7fffffffe0a3 "./blk.pool") at ../blk.c:590
E.g., there is no way to check the consistency of a pool set with remote replicas using the pmempool check CLI or the libpmempool API. An "Invalid argument" error is returned now.
Build output:
alpine-musl-:~/pmdk# make
make -C src all
make[1]: Entering directory '/root/pmdk/src'
make -C libpmem
make[2]: Entering directory '/root/pmdk/src/libpmem'
cc -MD -c -o ../nondebug/libpmem/os_linux.o -std=gnu99 -Wall -Werror -Wmissing-prototypes -Wpointer-arith -Wsign-conversion -Wsign-compare -Wconversion -Wunused-macros -Wmissing-field-initializers -O2 -U_FORTIFY_SOURCE -D_FORTIFY_SOURCE=2 -std=gnu99 -fno-common -pthread -DSRCVERSION=\"1.3+b2-504-g0124a9715\" -I../include -I../common/ -fPIC ../../src/../src/common/os_linux.c
../../src/../src/common/os_linux.c: In function 'os_getenv':
../../src/../src/common/os_linux.c:243:9: error: implicit declaration of function 'secure_getenv' [-Werror=implicit-function-declaration]
return secure_getenv(name);
^~~~~~~~~~~~~
../../src/../src/common/os_linux.c:243:9: error: return makes pointer from integer without a cast [-Werror=int-conversion]
return secure_getenv(name);
^~~~~~~~~~~~~~~~~~~
cc1: all warnings being treated as errors
make[2]: *** [../Makefile.inc:292: ../nondebug/libpmem/os_linux.o] Error 1
make[2]: Leaving directory '/root/pmdk/src/libpmem'
make[1]: *** [Makefile:166: libpmem] Error 2
make[1]: Leaving directory '/root/pmdk/src'
make: *** [Makefile:82: all] Error 2
Distribution: Alpine Linux 3.7.0
Found on 1.3+b2-504-g0124a9715
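The failure comes from musl not providing glibc's secure_getenv. One conventional workaround, sketched here with an illustrative helper name (not PMDK's actual fix), is to fall back to getenv only when the process is not running with elevated privileges, which is the property secure_getenv is meant to guarantee:

```c
#define _GNU_SOURCE /* for secure_getenv() on glibc */
#include <stdlib.h>
#include <unistd.h>

/*
 * Fallback for libcs without secure_getenv(): refuse to read the
 * environment when effective IDs differ from real IDs (setuid/setgid).
 */
static char *
os_secure_getenv(const char *name)
{
#ifdef __GLIBC__
	return secure_getenv(name);
#else
	if (getuid() != geteuid() || getgid() != getegid())
		return NULL;
	return getenv(name);
#endif
}
```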
When an application uses pmemobj on pool A and anywhere in a transaction calls another module or library that operates on pool B, all transactions on pool B abort immediately.
A prominent example is an application using pmemobj for pool A and logging something through pmemfile on pool B; in such a case logging will always fail.
pmemobj_tx_begin on a different pool should push the current transaction onto a stack and create a new transaction on the specified pool; pmemobj_tx_end should pop the last transaction from the stack.
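The proposed push/pop semantics can be sketched with a small per-thread stack of active pool contexts; pool handles here are opaque pointers standing in for PMEMobjpool, and this is only a model of the requested behavior, not libpmemobj code:

```c
#include <assert.h>
#include <stddef.h>

#define TX_STACK_MAX 16

/* per-thread stack of (pool, tx) contexts; pools are opaque handles here */
static const void *tx_stack[TX_STACK_MAX];
static size_t tx_depth;

/* tx_begin on another pool suspends the current tx and pushes a new one */
static int
tx_begin(const void *pool)
{
	if (tx_depth == TX_STACK_MAX)
		return -1;
	tx_stack[tx_depth++] = pool;
	return 0;
}

/* tx_end pops the innermost tx, resuming the enclosing one (or none) */
static const void *
tx_end(void)
{
	assert(tx_depth > 0);
	--tx_depth;
	return tx_depth ? tx_stack[tx_depth - 1] : NULL;
}
```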
Consider the following local pool set file:
PMEMPOOLSET
20M /root/part.0
20M /root/part.1
REPLICA 192.168.0.181 /root/remotePool.set
and remotePool.set
PMEMPOOLSET
20M /root/remotePart.0
20M /root/remotePart.1
Create the above poolsets, then remove the whole remote replica and change the layout of remotePool.set to:
PMEMPOOLSET
15M /root/remotePart.0
15M /root/remotePart.1
Then call pmempool sync with the -d flag on the pool set file. The return code is 0, which is not correct, because we cannot synchronize replicas when one of them is smaller than the other replicas.
Found on: 1.2-rc2
Consider the following text file "textfile.txt":
"some text"
Then calling
pmempool_rm("textfile.txt", 0)
will delete the file. pmempool_rm() should check whether the target is a pool file or a poolset before attempting any action.
Found on: 1.2+wtp1-227-gac3524e
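A minimal sketch of the kind of check pmempool_rm could perform before deleting anything: read the first bytes of the file and require either the "PMEMPOOLSET" header of a set file or a pool signature. The signature prefixes below are illustrative; the real pool_hdr signatures differ per library.

```c
#include <stdio.h>
#include <string.h>

/* Return 1 if the file starts with a poolset header or a pool signature. */
static int
looks_like_pool(const char *path)
{
	char buf[16] = { 0 };
	FILE *f = fopen(path, "rb");
	if (f == NULL)
		return 0;
	size_t n = fread(buf, 1, sizeof(buf) - 1, f);
	fclose(f);
	if (n == 0)
		return 0;
	/* illustrative checks; real code would validate the full pool_hdr */
	return strncmp(buf, "PMEMPOOLSET", 11) == 0 ||
		strncmp(buf, "PMEM", 4) == 0;
}
```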
A commit group operation, allowing individual transactions to be opened in multiple threads and then committed all at the same time (either all transactions commit or none of them do).
Potential variants:
Create a document with NVML glossary ("pool", "pool set", etc...). Could be a section in the man pages, or a separate man page.
Also, it looks like some terms are not used consistently in the code and documentation; we need to unify that.
rpmem_create and rpmem_open accept pool_addr, which can be an address obtained by mmapping a Device DAX. If the pointer is not aligned to the internal Device DAX alignment, rpmem_create and rpmem_open can fail.
Please investigate whether we can prevent the user from failing badly.
Presently, the libpmempool/obj/blk/log libraries operate on a pool either through a poolset file (a text file describing the structure of a pool) or directly on a file containing a pool. Some of the implications are:
The proposed change is to:
Some benefits:
Currently there's no way to check how much memory is available, what the fragmentation is, how much memory is wasted on metadata, etc. If there's a (persistent) memory leak, there's no way to quickly evaluate it.
Please add a tool which will help with these tasks, and an API which will let applications get pool statistics (leaks can already be debugged thanks to the pmemobj_first/pmemobj_next API).
Bonus points for interactive readline-based tool.
I'd like pmemobj to have an API for suspending its operation (giving up the file lock) and restoring it with as much runtime state retained as possible. From the caller's perspective it would basically be a fast pool close & open API. It doesn't have to be thread-safe (the caller would have to deal with that).
This feature can be used to implement application-level "process switching" (partial/simplified multi-process).
The non-transactional persistent atomic list API in libpmemobj is not entirely thread-safe. The mutex is only used by some of the API (insert, remove, move), but not by things like the POBJ_LIST_FOREACH macro. All list operations need to be made thread-safe. This would be a big win for programmers using this particular set of NVML functions. Re: https://groups.google.com/forum/#!topic/pmem/cWu9lMqGY6g