Comments (10)
For now the HAT-trie only supports a contiguous sequence of chars (this can be used to store binary data, since a '\0' char inside the sequence is supported).
I'm not that familiar with the composite keys of boost::multi_index_container. Do you have a simple example of what you want to achieve so I can wrap my head around it?
If you want to store something like:
struct employee {
std::string first_name;
std::string last_name;
};
You could store in the HAT-trie the concatenation of the first name and the last name and provide a string_view into this concatenation in your interface.
struct employee {
std::string_view first_name() const;
std::string_view last_name() const;
};
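A minimal sketch of that idea, using only the standard library. The `'\0'` separator, the `key()` accessor and the stored length are assumptions made for illustration; they are not part of the HAT-trie API. The class keeps one contiguous buffer that could be handed to a container keyed on raw char sequences, and computes the views on demand so that copies of the object stay valid.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <string_view>

// Hypothetical wrapper: stores "first_name\0last_name" as one contiguous key.
class employee {
public:
    employee(std::string_view first, std::string_view last)
        : first_len_(first.size()) {
        key_.reserve(first.size() + 1 + last.size());
        key_.append(first);
        key_.push_back('\0');  // separator; assumes names never contain '\0'
        key_.append(last);
    }

    // The whole key, including the embedded '\0' separator.
    std::string_view key() const { return key_; }

    // Views are recomputed on each call, so they always point into key_.
    std::string_view first_name() const {
        return std::string_view(key_).substr(0, first_len_);
    }
    std::string_view last_name() const {
        return std::string_view(key_).substr(first_len_ + 1);
    }

private:
    std::string key_;
    std::size_t first_len_;
};
```

With this layout, `employee("Ada", "Lovelace").key()` is a 12-byte sequence, and all employees sharing a first name share the prefix `first_name + '\0'`.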
from hat-trie.
I'm more thinking of binary types like:
struct order {
int64_t price;
int32_t id;
};
And then I'd need to find an order by its price + id combination and iterate over the orders at the same price but with a lower id. Do you reckon it could be done by putting both into a std::string, maybe in big-endian?
from hat-trie.
You can do something like this.
#include <cassert>
#include <cstdint>
#include <cstring>
#include <iostream>
#include <string>
#include <type_traits>
#include "htrie_set.h"
struct order {
int64_t price;
int32_t id;
};
int main() {
order o1{10, 1};
order o2{5, 2};
order o3{10, 3};
order o4{16, 4};
static_assert(std::is_pod<order>::value, "");
tsl::htrie_set<char> test;
test.insert_ks(reinterpret_cast<char*>(&o1), sizeof(o1));
test.insert_ks(reinterpret_cast<char*>(&o2), sizeof(o2));
test.insert_ks(reinterpret_cast<char*>(&o3), sizeof(o3));
test.insert_ks(reinterpret_cast<char*>(&o4), sizeof(o4));
int64_t price = 10;
auto its_prefix = test.equal_prefix_range_ks(
reinterpret_cast<char*>(&price), sizeof(price));
order o;
std::string key_buffer;
// Output, one "id: price" pair per line: "3: 10" then "1: 10"
for(auto it = its_prefix.first; it != its_prefix.second; ++it) {
it.key(key_buffer);
assert(key_buffer.size() == sizeof(o));
std::memcpy(&o, key_buffer.data(), sizeof(o));
std::cout << o.id << ": " << o.price << std::endl;
}
}
If I'm not mistaken, there should be no undefined behaviour if order is a POD and you don't transfer the trie between big- and little-endian machines, but it's quite "hackish".
You also can't really find objects with an id lower than 'x' (you could check for an id bigger than 'x' if you use an unsigned int, by knowing the binary representation, but well...).
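For reference, the big-endian idea from the question could look roughly like the sketch below. The helper names are hypothetical and nothing here is part of the HAT-trie API. Writing the bytes most-significant first and flipping the sign bit makes a byte-wise comparison of two keys agree with the numeric comparison of (price, id), and it avoids copying the padding bytes that `order` typically has on x86-64. This only fixes the encoding: whether a given container iterates in that order is a separate question (the sample output above lists id 3 before id 1).

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Append the low `bytes` bytes of `value`, most-significant byte first.
inline void append_big_endian(std::string& out, std::uint64_t value, int bytes) {
    for (int shift = (bytes - 1) * 8; shift >= 0; shift -= 8) {
        out.push_back(static_cast<char>((value >> shift) & 0xFF));
    }
}

// Hypothetical helper: 8-byte prefix shared by all orders at `price`.
// XOR with the sign bit maps signed order onto unsigned order.
inline std::string encode_price_prefix(std::int64_t price) {
    std::string key;
    append_big_endian(
        key, static_cast<std::uint64_t>(price) ^ (std::uint64_t{1} << 63), 8);
    return key;
}

// Hypothetical helper: 12-byte key, price (8 bytes) followed by id (4 bytes).
inline std::string encode_order_key(std::int64_t price, std::int32_t id) {
    std::string key = encode_price_prefix(price);
    append_big_endian(
        key, static_cast<std::uint32_t>(id) ^ (std::uint32_t{1} << 31), 4);
    return key;
}
```

`std::string` compares its chars as unsigned bytes, so `encode_order_key(5, 2) < encode_order_key(10, 1)` holds, and an ordered container keyed on these strings would sort by price and then by id.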
What is the problem with boost::multi_index_container for this use case? Is it using too much memory?
from hat-trie.
"hakish" and slight "undefined behavior" is perfectly fine as long as gcc compiled x64 is doing good.
boost::multi_index_container has a pretty good hash set, but I don't like relying on a red-black tree to maintain the order: it's not cache friendly, and once the tree grows deep a lookup can touch quite a few different cache lines.
The hash set is also heavily modified, i.e. there are a lot of deletes and inserts going on. I'll try to benchmark the trie vs Boost for my cases, thanks so much for your help!
If you also need to know the sum of all order quantities (an extra uint32_t qty field in the struct) for each price, is there a good way to store a bit of extra data in the trie node? Maybe I could just use some values in the order itself for extra storage, though.
from hat-trie.
Yes, I think it would be good to do a benchmark, as I'm not sure the HAT-trie offers a significant advantage over boost::multi_index_container for your use case, considering the complexity it adds.
For the sum, I'm not sure I understand. You can do:
std::size_t sum = 0;
for(auto it = its_prefix.first; it != its_prefix.second; ++it) {
it.key(key_buffer);
std::memcpy(&o, key_buffer.data(), sizeof(o));
sum += o.qty;
}
You want to have an aggregate field for each price to avoid this calculation?
from hat-trie.
Exactly, an aggregate and some way to get the N highest and lowest aggregates by price.
Right now I have a separate std::map collection of aggregates which is constantly updated, and each order has a pointer to its map node. But I'm not happy with the setup; there are kind of too many tree structures everywhere.
from hat-trie.
I think it's better to have a separate structure for the aggregates.
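As a rough sketch of such a separate structure (the class and method names are made up for illustration, and it assumes "highest and lowest by price" means the price levels at either end of the ordering), a `std::map` from price to total quantity can be updated on every insert and delete:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <utility>
#include <vector>

// Hypothetical side structure: total quantity per price level.
class price_aggregates {
public:
    using level = std::pair<std::int64_t, std::uint64_t>;  // (price, total qty)

    void add(std::int64_t price, std::uint32_t qty) { totals_[price] += qty; }

    // Undo a previous add(); drops the level once its total reaches zero.
    void remove(std::int64_t price, std::uint32_t qty) {
        auto it = totals_.find(price);
        assert(it != totals_.end() && it->second >= qty);
        it->second -= qty;
        if (it->second == 0) totals_.erase(it);
    }

    // Total for one price, or 0 if there is no such level.
    std::uint64_t total(std::int64_t price) const {
        auto it = totals_.find(price);
        return it == totals_.end() ? 0 : it->second;
    }

    // Up to n levels, lowest price first.
    std::vector<level> lowest(std::size_t n) const {
        std::vector<level> out;
        for (auto it = totals_.begin(); it != totals_.end() && out.size() < n; ++it)
            out.emplace_back(it->first, it->second);
        return out;
    }

    // Up to n levels, highest price first.
    std::vector<level> highest(std::size_t n) const {
        std::vector<level> out;
        for (auto it = totals_.rbegin(); it != totals_.rend() && out.size() < n; ++it)
            out.emplace_back(it->first, it->second);
        return out;
    }

private:
    std::map<std::int64_t, std::uint64_t> totals_;
};
```

The order container calls `add` and `remove` alongside its own inserts and erases, so the per-price sum never has to be recomputed by iterating over the orders.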
As the trie stores binary data, you could store multiple data types in the trie.
For example, you would have a char buffer that looks like this for an order:
| char (type) | int64_t (id) | int64_t (price) | uint32 (qty) |
And for an aggregate:
| char (type) | char (aggregate type min, max, avg, ...) | int64_t (price) | uint32 (total_qty) |
The first byte differentiates an order from an aggregate. When you retrieve your data you would do:
if(key_buffer[0] == ORDER_TYPE) {
std::memcpy(&ord, key_buffer.data() + 1, sizeof(order));
}
else {
assert(key_buffer[0] == AGGREGATE_TYPE);
std::memcpy(&aggr, key_buffer.data() + 1, sizeof(aggregate));
}
But I don't see any advantage compared to a separate map while it really increases the complexity of the code.
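To make the tagged-buffer layout concrete, here is a minimal sketch of the packing and unpacking side. The tag values, struct definitions and the `pack`/`unpack` helpers are assumptions for illustration only. Each struct is copied whole, as in the retrieval snippet above, so any padding bytes inside the struct end up in the buffer too; if the buffers are used as keys, the structs would need to be zeroed (for example with `std::memset`) before their fields are filled in.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <string>
#include <type_traits>

// Tag values are arbitrary; any two distinct bytes would do.
constexpr char ORDER_TYPE = 'O';
constexpr char AGGREGATE_TYPE = 'A';

struct order {
    std::int64_t id;
    std::int64_t price;
    std::uint32_t qty;
};

struct aggregate {
    char kind;  // min, max, avg, ...
    std::int64_t price;
    std::uint32_t total_qty;
};

// Build a buffer of the form | tag | raw bytes of value |.
template <class T>
std::string pack(char tag, const T& value) {
    static_assert(std::is_trivially_copyable<T>::value,
                  "memcpy requires a trivially copyable type");
    std::string buf(1 + sizeof(T), '\0');
    buf[0] = tag;
    std::memcpy(&buf[1], &value, sizeof(T));
    return buf;
}

// Copy the payload into `out` if the tag and size match; otherwise return false.
template <class T>
bool unpack(const std::string& buf, char tag, T& out) {
    static_assert(std::is_trivially_copyable<T>::value,
                  "memcpy requires a trivially copyable type");
    if (buf.size() != 1 + sizeof(T) || buf[0] != tag) return false;
    std::memcpy(&out, buf.data() + 1, sizeof(T));
    return true;
}
```

A reader would call `unpack(key_buffer, ORDER_TYPE, ord)` first and fall back to `unpack(key_buffer, AGGREGATE_TYPE, aggr)` when it returns false.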
from hat-trie.
#define NONIUS_RUNNER
#include <boost/multi_index/composite_key.hpp>
#include <boost/multi_index/ordered_index.hpp>
#include <boost/multi_index/hashed_index.hpp>
#include <boost/multi_index_container.hpp>
#include <boost/multi_index/mem_fun.hpp>
#include <boost/multi_index/member.hpp>
#include <boost/pool/pool.hpp>
#include "../../hat-trie/src/htrie_set.h"
#include "../../nonius/nonius.h++"
#include <cstdint>
#include <functional>
#include <unordered_set>
struct order
{
int64_t price;
int32_t id;
};
order ht_o1 {10, 1};
order us_o1 {10, 1};
order mi_o1 {10, 1};
struct order_hash
{
size_t operator()(const order &o) const
{
return std::hash<int64_t>()(o.price) + std::hash<int32_t>()(o.id);
}
};
struct order_equal
{
bool operator()(const order &x, const order &y) const
{
return x.price == y.price && x.id == y.id;
}
};
using namespace boost::multi_index;
using mindex_set = multi_index_container<
order,
indexed_by<
hashed_unique<
member<order, const int32_t, &order::id>>,
ordered_non_unique<
composite_key<
order,
member<order, int64_t, &order::price>,
member<order, int32_t, &order::id>>>>>;
tsl::htrie_set<char> ht_test;
std::unordered_set<order, order_hash, order_equal> us_test;
mindex_set mi_test;
NONIUS_BENCHMARK("htrie-basic", [](nonius::chronometer meter)
{
ht_test.clear();
meter.measure([&](int i)
{
++ht_o1.id;
if (ht_o1.id % 10 == 0)
++ht_o1.price;
ht_test.insert_ks(reinterpret_cast<char*>(&ht_o1), sizeof(ht_o1));
});
})
NONIUS_BENCHMARK("unordered_set-basic", [](nonius::chronometer meter)
{
us_test.clear();
meter.measure([&](int i)
{
++us_o1.id;
if (us_o1.id % 10 == 0)
++us_o1.price;
us_test.insert(us_o1);
});
})
NONIUS_BENCHMARK("multi_index-basic", [](nonius::chronometer meter)
{
mi_test.clear();
meter.measure([&](int i)
{
++mi_o1.id;
if (mi_o1.id % 10 == 0)
++mi_o1.price;
mi_test.insert(mi_o1);
});
})
clock resolution: mean is 20.725 ns (20480002 iterations)
benchmarking htrie-basic
collecting 100 samples, 155 iterations each, in estimated 2.077 ms
mean: 87.9874 ns, lb 86.7012 ns, ub 89.3417 ns, ci 0.95
std dev: 6.71555 ns, lb 5.84097 ns, ub 8.5427 ns, ci 0.95
found 1 outliers among 100 samples (1%)
variance is severely inflated by outliers
benchmarking unordered_set-basic
collecting 100 samples, 189 iterations each, in estimated 2.0601 ms
mean: 39.9197 ns, lb 39.7768 ns, ub 40.1902 ns, ci 0.95
std dev: 0.976376 ns, lb 0.597635 ns, ub 1.42363 ns, ci 0.95
found 15 outliers among 100 samples (15%)
variance is moderately inflated by outliers
benchmarking multi_index-basic
collecting 100 samples, 80 iterations each, in estimated 2.072 ms
mean: 62.8324 ns, lb 62.3842 ns, ub 64.4419 ns, ci 0.95
std dev: 3.71753 ns, lb 0.54332 ns, ub 8.42814 ns, ci 0.95
found 5 outliers among 100 samples (5%)
variance is severely inflated by outliers
from hat-trie.
The htrie does seem to be quite complicated, so basic inserts are slower (a mean of about 88 ns vs 63 ns for multi_index and 40 ns for unordered_set above). I'll probably stick to simple data structures for now.
from hat-trie.
Yes, I think the HAT-trie is not well suited to your use case. It could probably be sped up to take advantage of the fixed size of the entries (the HAT-trie here must support variable-length strings), but that's not the main purpose of the structure; there are probably better-suited data structures for this.
from hat-trie.