deztructor / marisa-trie
Automatically exported from code.google.com/p/marisa-trie
License: Other
I have about 400 marisa tries, each about 9 MB in size. I wish to merge them.
How can I do so?
Original issue reported on code.google.com by [email protected]
on 19 Jan 2013 at 5:10
Cython wrapper is unable to provide detailed exception info because
marisa::Exception is not a subclass of std::exception. Can you please inherit
it from std::exception? The attached patch works for me.
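For illustration, a minimal sketch of what the request amounts to. `SketchException`, `failing_operation`, and `run_and_describe` are hypothetical names; this is neither the real marisa::Exception definition nor the attached patch. Once the type derives from std::exception, a generic handler of the kind wrapper code usually relies on can reach the message through what():

```cpp
#include <cassert>
#include <exception>
#include <string>

// Hypothetical stand-in for a library exception type; not marisa::Exception.
class SketchException : public std::exception {
 public:
  SketchException(const char* filename, int line, const char* message)
      : filename_(filename), line_(line), message_(message) {
    assert(message != nullptr);  // what() must never return a null pointer
  }

  // Overriding what() lets handlers that only know std::exception
  // report the message.
  const char* what() const noexcept override { return message_; }

  const char* filename() const noexcept { return filename_; }
  int line() const noexcept { return line_; }

 private:
  const char* filename_;
  int line_;
  const char* message_;
};

// Hypothetical operation that always fails.
void failing_operation() {
  throw SketchException(__FILE__, __LINE__, "sketch: invalid parameter");
}

// Generic handler: it catches std::exception only, so it sees the message
// only if the thrown type derives from std::exception.
std::string run_and_describe(void (*operation)()) {
  try {
    operation();
  } catch (const std::exception& ex) {
    return ex.what();
  }
  return "no error";
}
```

Calling `run_and_describe(failing_operation)` returns the message text. If the exception type did not derive from std::exception, it would escape this handler entirely.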
Original issue reported on code.google.com by [email protected]
on 30 Aug 2012 at 6:19
I'm storing values in a trie with this encoding scheme:
<utf8-encoded unicode key> + chr(255) + <binary_value>
This works perfectly, but common prefix search is suboptimal with the current
marisa-trie API.
Ideally, this should be implemented like this:
* walk to a char in a key;
* if char is not walkable, exit loop;
* test if data separator (chr(255)) is walkable;
* if it is walkable, add the current key to a list of prefixes.
Instead of this, it can be implemented like this now (pseudocode):
    result = []
    ind = 1
    while ind <= key_len:
        prefix = key[:ind]
        ag.set_query(prefix + _VALUE_SEPARATOR)
        if trie.predictive_search(ag):
            result.append(prefix)
        ind += 1
    return result
This is suboptimal because:
* there is no fail-fast if the current prefix is not in a trie;
* trie is walked from the root for each prefix.
Stepwise API would be great for making this more efficient.
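To make the ideal walk concrete, here is a minimal sketch on a plain in-memory trie. It does not use marisa-trie at all: `Node`, `insert`, and `prefixes_with_values` are hypothetical names, and the toy trie only stands in for whatever a stepwise API would expose. The walk visits each character of the key once and stops at the first character that is not walkable:

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <memory>
#include <string>
#include <vector>

// Hypothetical toy trie; NOT the marisa-trie API.
struct Node {
  std::map<char, std::unique_ptr<Node>> children;
};

// Separator between the key and the binary value, as in the encoding above.
const char kValueSeparator = static_cast<char>(255);

// Inserts one stored entry, already encoded as key + separator + value.
void insert(Node& root, const std::string& entry) {
  assert(!entry.empty());
  Node* node = &root;
  for (char c : entry) {
    std::unique_ptr<Node>& child = node->children[c];
    if (!child) child.reset(new Node());
    node = child.get();
  }
}

// Returns every prefix of `key` that has a value stored,
// in a single pass from the root.
std::vector<std::string> prefixes_with_values(const Node& root,
                                              const std::string& key) {
  std::vector<std::string> result;
  const Node* node = &root;
  for (std::size_t i = 0; i < key.size(); ++i) {
    auto it = node->children.find(key[i]);
    if (it == node->children.end()) break;  // char not walkable: fail fast
    node = it->second.get();
    // The prefix is a stored key if the separator is walkable from here.
    if (node->children.count(kValueSeparator) != 0) {
      result.push_back(key.substr(0, i + 1));
    }
  }
  return result;
}
```

With entries for the keys "a" and "abc" stored, `prefixes_with_values(root, "abcd")` yields "a" and "abc" without restarting from the root for each prefix.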
Original issue reported on code.google.com by [email protected]
on 23 Aug 2012 at 1:36
I'm not a mingw user myself; a build error was reported here:
https://github.com/kmike/marisa-trie/issues/1
The stat.h header from mingw:
http://gitorious.org/mingw/mingw-runtime/blobs/6e654ca0ceb56a42ebaa23bd43b50d62c4e4c0c1/include/sys/stat.h
_stat64 is indeed defined only #if __MSVCRT_VERSION__ >= 0x0601
The MinGW default for this define is the following (for compatibility with
older Windows):
#define __MSVCRT_VERSION__ 0x0600
so _stat64 is not defined under MinGW and the marisa-trie build fails.
Similar issue:
http://www.mail-archive.com/[email protected]/msg00741.html
- the suggested fix was to manually define a less restrictive __MSVCRT_VERSION__.
Original issue reported on code.google.com by [email protected]
on 28 Aug 2012 at 12:55
Hi,
very nice job indeed.
I am wondering if version 0.2 is stable enough to be used in my projects
(academic research in Health care)?
Do you have a roadmap of future developments?
best regards, and once again thanks for this enormous job
Original issue reported on code.google.com by [email protected]
on 18 Oct 2012 at 2:29
I can see two featured versions of marisa-trie available:
1. marisa-0.1.5.tar.gz
2. marisa-0.2.0-beta7.tar.gz
Which one do you suggest I pick up?
Also, could you please let me know the difference between the two versions?
Regards,
Vinay
Original issue reported on code.google.com by [email protected]
on 7 Feb 2012 at 11:44
What steps will reproduce the problem?
1. This is not an issue, but I would like to ask about a possible improvement.
Does marisa trie support wildcard search?
If the user types "cat*r", does marisa trie return all the keys matching the
pattern, such as 'cater', 'caterpiller', 'catara', 'catira', etc.?
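One way such a search could be layered on top of a predictive (prefix) search is sketched below. The sketch assumes that `*` matches any run of characters and that the pattern must match the whole key; under that reading only keys ending in "r" match "cat*r". `literal_prefix`, `glob_match`, and `filter_matches` are hypothetical helpers, and the candidate list stands in for the results of a predictive search on the literal prefix; no marisa-trie call is used:

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Returns the literal part of the pattern before the first '*'.
// This is the prefix one would hand to a predictive search.
std::string literal_prefix(const std::string& pattern) {
  return pattern.substr(0, pattern.find('*'));
}

// Returns true if `text` matches `pattern`, where '*' matches any run of
// characters (including none) and every other character matches itself.
bool glob_match(const std::string& pattern, const std::string& text) {
  std::size_t p = 0, t = 0, mark = 0;
  std::size_t star = std::string::npos;  // position of the last '*' seen
  while (t < text.size()) {
    if (p < pattern.size() && pattern[p] == '*') {
      star = p++;  // remember the star and first try matching it with nothing
      mark = t;
    } else if (p < pattern.size() && pattern[p] == text[t]) {
      ++p;
      ++t;
    } else if (star != std::string::npos) {
      p = star + 1;  // backtrack: let the last star absorb one more character
      t = ++mark;
    } else {
      return false;
    }
  }
  assert(t == text.size());
  while (p < pattern.size() && pattern[p] == '*') ++p;  // trailing stars
  return p == pattern.size();
}

// Keeps only the candidates that match the full pattern.
std::vector<std::string> filter_matches(
    const std::string& pattern, const std::vector<std::string>& candidates) {
  std::vector<std::string> matches;
  for (const std::string& candidate : candidates) {
    if (glob_match(pattern, candidate)) matches.push_back(candidate);
  }
  return matches;
}
```

For "cat*r" the literal prefix is "cat", so the trie would only be asked for keys starting with "cat", and the filter then discards the candidates that do not end in "r".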
Original issue reported on code.google.com by [email protected]
on 19 Oct 2011 at 1:47
Please implement structure that can map string values to some objects. Thank
you.
Original issue reported on code.google.com by fsqcds
on 27 Apr 2014 at 9:01
What do id and weight mean in the union below, from keyset.h?
    union Union {
      UInt32 id;
      float weight;
    } union_;
Does id mean the ID I get when I query the marisa trie for a string?
If I set a weight, does the search behave differently?
If so, in what way? If there are 10 words starting with a given prefix and
weights are set, do I get all 10 words, and are they sorted by weight? Is my
interpretation right?
If I want to use weight, what does id convey?
Please respond as soon as possible.
Original issue reported on code.google.com by [email protected]
on 9 Dec 2011 at 2:37
1. License contains a reference to grnxx: "grnxx - An open-source fulltext
search engine and column store." Is this intentional?
2. The BSD part contains an "<ORGANIZATION>" placeholder, which should probably
be replaced with some organization name.
3. Is it OK that I'm referring to you personally and to marisa-trie C++ library
in a wrapper README (https://github.com/kmike/marisa-trie)? License says that
"Neither the name of the <ORGANIZATION> nor the names of its contributors may
be used to endorse or promote products derived from this software without
specific prior written permission."
Original issue reported on code.google.com by [email protected]
on 12 Apr 2013 at 6:47
What steps will reproduce the problem?
1. Compile tests/vector-test.cc like:
$ g++ vector-test.cc -o vector-test -I../lib/ -lmarisa
(g++ 3.4.6)
2. Run the test.
3. A segmentation fault occurs.
What is the expected output? What do you see instead?
No segmentation fault occurs.
What version of the product are you using? On what operating system?
marisa-trie-0.2.0-beta4
CentOS 4.8
Please provide any additional information below.
The error seems to occur at vector-test.cc line 114.
Original issue reported on code.google.com by [email protected]
on 10 May 2011 at 9:25
Marisa-trie compiles under win64 but fails in marisa-test (segfault in
TestTinyTrie).
I guess the cause is that size_t is a 64-bit value on win64 and marisa-trie
assumes it is a 32-bit value.
Original issue reported on code.google.com by [email protected]
on 5 Aug 2013 at 7:01
The Xcode compiler gets 'confused' and can't include <cstdio>, complaining that
FILE is not a member of global namespace std.
After digging around for hours, we suspected that the file 'stdio.h' in marisa
trie 'confuses' the compiler so that it stops looking for the 'real' stdio.h.
Marisa trie includes stdio.h in two different ways, sometimes with "stdio.h"
and sometimes with <stdio.h>, to distinguish between user and system includes,
but this still didn't resolve our problem.
We had to rename marisa trie's stdio.h to stdio_xx.h (example name) to
avoid this issue and change the sources accordingly.
What steps will reproduce the problem?
1. Create a project that uses marisa trie
2. Compile and 'install' the project into a directory
3. Include the marisa trie headers
What is the expected output? What do you see instead?
The expected output was a successful compilation. Instead, we saw an error
saying that <cstdio> in marisa's stdio.h couldn't be included.
What version of the product are you using? On what operating system?
0.2.4 on OSX/iOS
Please provide any additional information below.
We'd be very happy if marisa could use 'non standard' include file names ...
e.g. rename stdio.h, iostream, ... to something 'custom'
Thanks
Original issue reported on code.google.com by [email protected]
on 17 Aug 2013 at 9:25
Hello,
I've created an alternative Python binding for marisa-trie:
https://github.com/kmike/marisa-trie/
It is implemented in Cython and seems to be several times faster than the
included SWIG bindings. It is also possible to install these bindings with just
"pip install marisa-trie", without manually downloading and compiling the
library. The interface is closer to https://github.com/kmike/datrie than to the
original bindings; for example, there is no Agent class.
I'll be glad if you include a link to my bindings somewhere in wiki or docs.
Thanks for the marisa-trie, that's an impressive trie library!
Original issue reported on code.google.com by [email protected]
on 17 Aug 2012 at 10:31
predict_depth_first and predict_breadth_first, which are available in marisa
1.5, are not available in marisa 2.0.
In marisa 2.0 only one API, predictive_search, is available, and it seems to
behave like predict_depth_first in marisa 1.5.
How can I get the behavior of predict_breadth_first using marisa 2.0?
Original issue reported on code.google.com by [email protected]
on 4 Oct 2012 at 1:21
What steps will reproduce the problem?
1. save a marisa-dict using save function
2. open the dictfile and read all info into a buffer
3. call map function
What is the expected output? What do you see instead?
The map call succeeds. Instead, I see an exception.
What version of the product are you using? On what operating system?
marisa-0.1.4, Win7
Please provide any additional information below.
It seems something is wrong in the map function. I found this in the
constructor:
Mapper::Mapper(const void *ptr, std::size_t size)
    : ptr_(ptr), origin_(NULL), avail_(size), size_(0),
      file_(NULL), map_(NULL) {
  MARISA_THROW_IF((ptr != NULL) && (size != 0), MARISA_PARAM_ERROR);
}
The condition for throwing should test (ptr == NULL). I also checked the newest
beta version, and it is OK there.
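A self-contained sketch of the corrected check, assuming the intent is to reject a NULL pointer paired with a non-zero size. `check_buffer`, `buffer_is_accepted`, and the use of std::invalid_argument are hypothetical stand-ins for the constructor and MARISA_THROW_IF, not the library's code:

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>

// Hypothetical stand-in for the argument check in the Mapper constructor.
// Throws only when the buffer is missing but a non-zero size is claimed.
void check_buffer(const void* ptr, std::size_t size) {
  if (ptr == nullptr && size != 0) {
    throw std::invalid_argument("null buffer with non-zero size");
  }
}

// Returns true when check_buffer accepts the arguments.
bool buffer_is_accepted(const void* ptr, std::size_t size) {
  try {
    check_buffer(ptr, size);
  } catch (const std::invalid_argument&) {
    return false;
  }
  return true;
}

// Usage: a real buffer with its size is accepted; a null buffer with a
// non-zero size is rejected.
void example_usage() {
  const char data[] = "dictionary bytes";
  assert(buffer_is_accepted(data, sizeof(data)));
  assert(!buffer_is_accepted(nullptr, sizeof(data)));
}
```

With the condition as quoted in the report, `(ptr != NULL) && (size != 0)`, the first case in `example_usage` would be the one that throws, which matches the behavior described.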
Original issue reported on code.google.com by [email protected]
on 30 Sep 2011 at 9:21
Hi,
Is there any mechanism in marisa trie by which I can retrieve the search
candidates ordered by ID? For example, if I search for 'b', the search returns:
key     id
b       7
ban     12
bang    13
ben     56
beng    57
The IDs are not sequential. Is there a mechanism by which I can retrieve the
IDs in sequential order?
Note that the strings are returned in sequential order.
Original issue reported on code.google.com by [email protected]
on 27 Feb 2012 at 1:59
Is there any documentation to explain marisa implementation?
Original issue reported on code.google.com by [email protected]
on 16 Nov 2013 at 6:10
Hey,
I have quite a large input of about 2^32 keys that I would like to build a
marisa trie for. I have a server with lots of memory (400GB RAM), and I was
wondering whether that is possible at all.
Right now I am getting
marisa/grimoire/trie/../vector/bit-vector.h:52: MARISA_SIZE_ERROR: size_ == MARISA_UINT32_MAX
when trying to build. Is there any way to do this without having to make lots
of smaller tries?
Thanks!
Original issue reported on code.google.com by [email protected]
on 11 Dec 2013 at 9:54
Hi,
Have you tried compiling marisa trie on Android?
Please let me know as soon as possible.
Original issue reported on code.google.com by [email protected]
on 9 Feb 2012 at 12:23
This was reported in Debian.
https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=739126
Could you consider applying the patch or using the stdint.h method?
---from---
marisa fails to build from source on s390x due to testsuite failures. It
appears that it uses an hardcoded list of architectures to determine the
size of a size_t type, which doesn't include s390x. The patch below
fixes the issue:
--- marisa-0.2.4.orig/lib/marisa/base.h
+++ marisa-0.2.4/lib/marisa/base.h
@@ -30,7 +30,7 @@ typedef uint64_t marisa_uint64;
 #if defined(_WIN64) || defined(__amd64__) || defined(__x86_64__) || \
     defined(__ia64__) || defined(__ppc64__) || defined(__powerpc64__) || \
-    defined(__sparc64__) || defined(__mips64__) || defined(__aarch64__)
+    defined(__sparc64__) || defined(__mips64__) || defined(__aarch64__) || defined(__s390x__)
  #define MARISA_WORD_SIZE 64
 #else  // defined(_WIN64), etc.
  #define MARISA_WORD_SIZE 32
BTW, __sparc64__ doesn't exist and should be replaced by (__sparc__ &&
__arch64__)
That said, I don't really see the point of using an hardcoded
architecture list to determine the size of a size_t type. This can be
done the following way:
| #include <stdint.h>
|
| #if SIZE_MAX == UINT64_MAX
| #define MARISA_WORD_SIZE 64
| #else
| #define MARISA_WORD_SIZE 32
| #endif
However as marisa is using autotools, the best way to do that would be
to add a test in configure.
------
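The stdint.h approach from the quoted report can also be verified at compile time. A sketch, assuming C++11 and a platform where size_t is either 32 or 64 bits wide; `SKETCH_WORD_SIZE` and `word_size_bits` are hypothetical names used here in place of MARISA_WORD_SIZE:

```cpp
#include <climits>
#include <cstddef>
#include <cstdint>

// Derive the word size from the limits of size_t instead of an
// architecture list.
#if SIZE_MAX == UINT64_MAX
#define SKETCH_WORD_SIZE 64
#else
#define SKETCH_WORD_SIZE 32
#endif

// Fails the build, rather than the test suite, if the macro and the real
// width of size_t disagree.
static_assert(SKETCH_WORD_SIZE == sizeof(std::size_t) * CHAR_BIT,
              "SKETCH_WORD_SIZE does not match the width of size_t");

// Width of size_t in bits, computed without any architecture list.
constexpr std::size_t word_size_bits() {
  return sizeof(std::size_t) * CHAR_BIT;
}
```

On an architecture missing from a hardcoded list, a check like the static_assert above would turn a silent wrong word size into a build error.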
Original issue reported on code.google.com by [email protected]
on 22 Feb 2014 at 5:28
What steps will reproduce the problem?
1. Put 98000 dict words in a list and loaded them into a trie.
2. Find all prefixes of a given key.
What is the expected output? What do you see instead?
I expected to see the prefixes. I saw single letters returned (the first
letter of the prefix).
Filtering them by prefix did work; I just used a list comprehension to remove
the index.
What version of the product are you using? On what operating system?
I installed today through pip on Linux Mint 13 LTS Cinnamon.
Please provide any additional information below.
Original issue reported on code.google.com by [email protected]
on 25 Apr 2014 at 7:35
What steps will reproduce the problem?
1. ./configure --prefix=/e/SDK/env-gcc-4.8-64bit --enable-sse2 --enable-sse3
--enable-ssse3 --enable-sse4 --enable-sse4.1 --enable-sse4.2
2. make
3. make install
What is the expected output? What do you see instead?
I expected both shared and static libraries to be built, but only the static
library gets built.
What version of the product are you using? On what operating system?
Marisa 0.2.4, MSys, MinGW64, Windows 8.1 64bit.
Please provide any additional information below.
Configuring with --enable-static=no produces a makefile that does nothing.
Are the rules for the shared library missing?
Original issue reported on code.google.com by [email protected]
on 27 Nov 2014 at 12:56
It is reported that saving doesn't work under mingw32.
Issue #10 is a related issue.
http://code.google.com/p/marisa-trie/issues/detail?id=10
See also the following.
https://github.com/kmike/marisa-trie/issues/1#issuecomment-8135066
Original issue reported on code.google.com by [email protected]
on 31 Aug 2012 at 1:21
"The biggest advantage of libmarisa is that its **dictioanry** size is
considerably more compact than others. See below for the dictionary size of
other implementations.
Original issue reported on code.google.com by [email protected]
on 28 Aug 2012 at 6:32