patrickfrey / strus

Library implementing the storage and the query evaluation for a text search engine. It relies on a key/value store database interface to store its data. Currently there is one implementation, based on the Google LevelDB library.

Home Page: http://www.project-strus.net

License: Mozilla Public License 2.0

C++ 95.84% CMake 3.08% C 1.08%
search-engine c-plus-plus

strus's Introduction

Library for building a competitive, scalable search engine for information retrieval.
A solution for small projects as well as larger scale applications.

Licensed as MPLv2 (Mozilla Public License, Version 2 - https://www.mozilla.org/en-US/MPL/2.0)
For 3rdParty licenses see LICENSE.3rdParty

The Strus project implements a set of libraries and tools for building a competitive,
scalable search engine for text retrieval.
It is a solution for small projects as well as larger-scale applications.
The project homepage at http://project-strus.net has articles, links, and documentation.

For installation see description files INSTALL.<platform> in the top level directory of the project.

The project is built regularly with Travis (https://travis-ci.org/patrickfrey/strus)
and on the openSUSE build service (https://build.opensuse.org/package/show/home:PatrickFrey/strus).

strus's People

Contributors

andreasbaumann, dw, patrickfrey

strus's Issues

Browsing the collection

  • Document::addBrowseIndexTerm and Query::addBrowseTerm with Variant type or string as value
  • Browsing:
    ** StorageClient::createDocumentBrowser( termType, rangeStart, rangeEnd, elements);
    ** DocumentBrowser::defineMetaDataRestriction( ...), DocumentBrowser::setMaxNofRanks( ...), DocumentBrowser::setMinRank(...), DocumentBrowser::setUserName(...), DocumentBrowser::evaluate() like in Query;

Browsing needs a new BlockType where the search term is mapped directly as key and no termValueMap is involved.

Example use case: browse the documents inserted between two dates.
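
The date-range use case can be pictured with an ordered container in which the browse term itself is the key. The following is only a sketch under that assumption: `BrowseIndexSketch`, `add` and `browse` are hypothetical names, and a `std::multimap` stands in for the key/value store.

```cpp
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a browse block: the term value is the key itself,
// so a range scan needs no lookup in a term value map.
class BrowseIndexSketch
{
public:
    // Register a document number under a browse term, e.g. an ISO date "2016-05-20".
    void add( const std::string& term, int docno)
    {
        m_index.insert( std::make_pair( term, docno));
    }

    // Return the documents whose term lies in [rangeStart, rangeEnd], in key order.
    std::vector<int> browse( const std::string& rangeStart, const std::string& rangeEnd) const
    {
        std::vector<int> res;
        if (rangeEnd < rangeStart) return res; // empty range
        std::multimap<std::string,int>::const_iterator it = m_index.lower_bound( rangeStart);
        std::multimap<std::string,int>::const_iterator end = m_index.upper_bound( rangeEnd);
        for (; it != end; ++it) res.push_back( it->second);
        return res;
    }

private:
    std::multimap<std::string,int> m_index;
};
```

With ISO-formatted dates as terms, lexicographic key order equals chronological order, which is what makes the plain range scan sufficient here.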

creating a storage with illegal metadata table leads to storage corruption

shell> strusCreate -s 'path=test; metadata=test UFLOAT17 '
ERROR failed to create storage: error creating storage client: unknown meta data element type name 'UFLOAT17'

so far so good.

Now I see a storage directory containing:

-rw-r--r--.  1 root root    0 May 20 13:41 000005.log
-rw-r--r--.  1 root root   16 May 20 13:41 CURRENT
-rw-r--r--.  1 root root    0 May 20 13:41 LOCK
-rw-r--r--.  1 root root  172 May 20 13:41 LOG
-rw-r--r--.  1 root root   57 May 20 13:41 LOG.old
-rw-r--r--.  1 root root   50 May 20 13:41 MANIFEST-000004

Trying to use the storage results in:

shell> strusInspect  -s 'path=test' metatable
ERROR failed to create storage client: error creating storage client: error creating storage client: corrupt storage, not all mandatory variables defined

The storage should not be in a corrupt state if creation fails.
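
One way to avoid the half-created directory would be to validate the metadata description before anything is written to disk. A minimal sketch, assuming a comma-separated "name TYPE" description; `parseMetaDataSketch` is a hypothetical helper, and the set of accepted type names is passed in rather than taken from strus:

```cpp
#include <set>
#include <sstream>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Parse a description such as "doclen UINT16, weight FLOAT32" and reject
// unknown type names. Intended to run BEFORE the database is created, so a
// bad description cannot leave files behind.
std::vector<std::pair<std::string,std::string> >
parseMetaDataSketch( const std::string& description, const std::set<std::string>& knownTypes)
{
    std::vector<std::pair<std::string,std::string> > res;
    std::istringstream defs( description);
    std::string def;
    while (std::getline( defs, def, ','))
    {
        std::istringstream fields( def);
        std::string name, type, extra;
        if (!(fields >> name)) continue; // skip empty definitions
        if (!(fields >> type))
        {
            throw std::runtime_error( "missing type for meta data element '" + name + "'");
        }
        if (fields >> extra)
        {
            throw std::runtime_error( "unexpected token '" + extra + "' in meta data definition");
        }
        if (knownTypes.count( type) == 0)
        {
            throw std::runtime_error( "unknown meta data element type name '" + type + "'");
        }
        res.push_back( std::make_pair( name, type));
    }
    return res;
}
```

An alternative with the same effect would be to create the storage in a temporary directory and rename it into place only after every step has succeeded.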

There is a distribution detector in cmake files

In cmake/LinuxDistribution.cmake.

What is it used for? Building software should really not depend on the
distribution, and if it must, there should be distribution-independent flags
that serve the same purpose.

For checking for OSX, WINDOWS, LINUX we should use the cmake variables.

A quick grep over all repos shows that INSTALLER_PLATFORM is only used in
cmake/report_build_settings.cmake to print Canonical: xxxx.

Implement alternative posinfo block

I am not sure at all that the current implementation of the posinfo block is really smart.
An alternative implementation should be tried to find a better solution.
Until the posinfo problems in annotations (issue #1) are solved, it does not make sense to look for a solution here.

Weighting function interface question on constness resp. non-constness

WeightingFunctionContextInterface* WeightingFunctionInstanceSmart::createFunctionContext(
        const StorageClientInterface* storage_,
        MetaDataReaderInterface* metadata,
        const GlobalStatistics& stats) const

Why is the storage client interface const, but not the metadata reader interface?

Introspection interface needed for storage

Introspection interface needed for storage client that can iterate on all
Term types,
Term values,
Document ids,
User names

Potentially skip to a certain value or its upper bound.

Alignment error accessing document meta data on SPARC64 with FreeBSD

Starting program: /usr/home/abaumann/strus/tests/metaDataRestrictions/src/testMetaDataRestrictions 10 10 10
(no debugging symbols found)...(no debugging symbols found)...(no debugging symbols found)...(no debugging symbols found)...(no debugging symbols found)...Error while reading shared library symbols:
Dwarf Error: wrong version in compilation unit header (is 4, should be 2) [in module /usr/home/abaumann/strus/src/utils/libstrus_utils.so.0.3]
Error while reading shared library symbols:
Dwarf Error: wrong version in compilation unit header (is 4, should be 2) [in module /usr/local/lib/gcc48/libstdc++.so.6](no debugging symbols found)...Error while reading shared library symbols:
Dwarf Error: wrong version in compilation unit header (is 4, should be 2) [in module /usr/local/lib/gcc48/libgcc_s.so.1](no debugging symbols found)...(no debugging symbols found)...[New LWP 100070](no debugging symbols found)...[New Thread 41804400 (LWP 100070/testMetaDataRestric)]
__sparc_utrap: fatal memory address not aligned

Program received signal SIGILL, Illegal instruction.
[Switching to Thread 41804400 (LWP 100070/testMetaDataRestric)]
0x0000000040ec0b88 in kill () from /lib/libc.so.7
(gdb) bt
#0 0x0000000040ec0b88 in kill () from /lib/libc.so.7
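
The `__sparc_utrap: fatal memory address not aligned` message indicates a load or store through a misaligned address, which strict-alignment CPUs such as SPARC64 reject. Where exactly this happens in the metadata access is not shown by the trace. As a general sketch of the usual remedy, values in a packed record can be copied with `memcpy` instead of being read through a cast pointer (`loadUint32` and `storeUint32` are hypothetical helper names):

```cpp
#include <cstring>
#include <stdint.h>

// Read a 32-bit value from a possibly unaligned address.
// Dereferencing a cast pointer, *(const uint32_t*)ptr, is what traps on
// strict-alignment CPUs; memcpy has no alignment requirement.
inline uint32_t loadUint32( const void* ptr)
{
    uint32_t value;
    std::memcpy( &value, ptr, sizeof( value));
    return value;
}

// Write a 32-bit value to a possibly unaligned address.
inline void storeUint32( void* ptr, uint32_t value)
{
    std::memcpy( ptr, &value, sizeof( value));
}
```

Compilers typically turn these calls into a single load or store on platforms that tolerate unaligned access, so the portable form usually costs nothing there.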

valueIteratorInterface storage createDocIdIterator is weird to use

If I don't call skip, does the iterator give me docids from the beginning?
Does it return deleted documents or not? Apparently not. Other iterators do (like the one iterating
over the document numbers). This should maybe be clarified.
What's the end condition when reaching EOF? Does the last element of the vector
contain an empty string?

An example of getting all docids:

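#include <string>
#include <vector>
#include <boost/scoped_ptr.hpp>

std::vector<std::string> SearchIndex::getDocids( )
{
    boost::scoped_ptr<strus::ValueIteratorInterface> it;
    it.reset( m_storage->createDocIdIterator( ) );

    std::vector<std::string> res;
    std::vector<std::string> v;
    do {
        // fetch the next chunk of at most 1024 document ids
        v = it->fetchValues( 1024 );
        res.insert( res.end( ), v.begin( ), v.end( ) );
        // v.back( ) is the last element; v[v.size( )] would read past the end
    } while( !v.empty( ) && v.back( ) != "" );

    return res;
}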

wrong number of documents

Inserted 2493 documents in a new index,
nof docs of strusInspect shows 2493.
If I reinsert the same documents I get 4986.

In storageTransaction.cpp I see:

 m_storage->declareNofDocumentsInserted( m_nof_documents);

without any checks whether the documents already existed before.
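
The missing check can be sketched as follows: only documents whose id was not known before increment the counter. `DocumentCounterSketch` is a hypothetical class, and the in-memory `std::set` merely stands in for a lookup of the document id in the storage.

```cpp
#include <set>
#include <string>

// Counts a document only the first time its id is seen, so reinserting the
// same documents leaves the total unchanged.
class DocumentCounterSketch
{
public:
    DocumentCounterSketch() : m_nofDocuments( 0) {}

    // Returns true if the document is new, false if it replaces an existing one.
    bool insertDocument( const std::string& docid)
    {
        bool isNew = m_docids.insert( docid).second;
        if (isNew) ++m_nofDocuments;
        return isNew;
    }

    unsigned int nofDocuments() const
    {
        return m_nofDocuments;
    }

private:
    std::set<std::string> m_docids;   // stand-in for the storage's docid lookup
    unsigned int m_nofDocuments;
};
```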

Metadata type double

Only short float and float are implemented so far. For precalculated document weights a double-precision floating point type would be appropriate.

Problems inserting big positions?

DEBUG: field 482:11'html_meta_file': './data/etext/10556.html', @192151315
DEBUG: lookup expression for field 'html_meta_file'
DEBUG: got expression number for 'html_meta_file' to be 0
token positions of document '1055610556' are out or range (document too big, 150199 token positions assigned)
DEBUG: buffer reset, rest: 1055710557   Brooke, L. Leslie (Leonard Leslie), 1862-1940   Johnny Crow's Party             English PZ: Language and Literatures: 
failed to process document 'gutenberg.tsv': failed to process document 'gutenberg.tsv': error closing document in transaction: corrupt data (unpackInt32_ 1)

done

The positions produced by the experimental TSV segmenter with the ZIP-file @zipinclude function are quite big,
because a position is basically the position within the TSV file plus the position of the file within the
uncompressed ZIP stream.

See

https://github.com/andreasbaumann/strusExamples/tree/master/gutenberg

and

https://github.com/andreasbaumann/strusAnalyzer/tree/tsv_extensions
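
One general technique for positions that outgrow a fixed-width field is a variable-length encoding, where small positions cost one byte and large ones a few more. The sketch below is not the encoding strus uses; `packPosition` and `unpackPosition` are hypothetical names for a plain 7-bits-per-byte scheme.

```cpp
#include <cstddef>
#include <stdint.h>
#include <string>

// Append `pos` to `out`, 7 bits per byte, low bits first.
// The high bit of each byte signals that more bytes follow.
void packPosition( std::string& out, uint32_t pos)
{
    while (pos >= 0x80)
    {
        out.push_back( static_cast<char>( (pos & 0x7F) | 0x80));
        pos >>= 7;
    }
    out.push_back( static_cast<char>( pos));
}

// Decode one position starting at `idx` and advance `idx` past it.
// Returns false on truncated input or when more than 5 bytes are chained;
// `idx` may have been advanced in that case.
bool unpackPosition( const std::string& in, std::size_t& idx, uint32_t& pos)
{
    uint32_t res = 0;
    for (unsigned int shift = 0; shift < 35; shift += 7)
    {
        if (idx >= in.size()) return false;
        unsigned char byte = static_cast<unsigned char>( in[ idx++]);
        res |= static_cast<uint32_t>( byte & 0x7F) << shift;
        if ((byte & 0x80) == 0)
        {
            pos = res;
            return true;
        }
    }
    return false;
}
```

Storing differences between consecutive positions instead of absolute values would keep most encoded numbers small even in very large documents.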

fdatasync called too often in commit

Some data (metadata description, df, variables) is stored by calling the wrong function (DatabaseClientInterface::writeImm) instead of being written as part of the batch. In the Wikipedia search project this leads to a significant slowdown of the insert.

strus doesn't build with older gcc

gcc is 4.4.7 (CentOS 6):

[ 10%] Building CXX object src/storage/CMakeFiles/strus_storage_static.dir/attributeReader.cpp.o
In file included from /home/build/strus/src/storage/storageClient.hpp:12,
                 from /home/build/strus/src/storage/attributeReader.hpp:12,
                 from /home/build/strus/src/storage/attributeReader.cpp:8:
/opt/eurospider/strus/include/strus/numericVariant.hpp: In constructor 'strus::NumericVariant::String::String(const strus::NumericVariant&)':
/opt/eurospider/strus/include/strus/numericVariant.hpp:91: error: expected ')' before 'PRId64'
compilation terminated due to -Wfatal-errors.

There is also a warning:

In file included from /home/build/strus/src/utils/cstring.c:1:
/home/build/strus/include/private/cstring.h:8:1: warning: C++ style comments are not allowed in ISO C90
/home/build/strus/include/private/cstring.h:8:1: warning: (this will be reported only once per input file)
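
The error `expected ')' before 'PRId64'` is typical for the format macro not being defined at all. With older glibc headers, C++ code only gets `PRId64` if `__STDC_FORMAT_MACROS` is defined before `<inttypes.h>` is first included. Whether that is the cause here is an assumption. A minimal sketch of the workaround, with `formatInt64` as a hypothetical helper:

```cpp
// Must come before the first include of <inttypes.h>, directly or indirectly.
#ifndef __STDC_FORMAT_MACROS
#define __STDC_FORMAT_MACROS
#endif
#include <inttypes.h>
#include <stdio.h>
#include <string>

// Format a 64-bit integer portably; PRId64 expands to the right
// length modifier for the platform ("ld", "lld", ...).
std::string formatInt64( int64_t value)
{
    char buf[ 32];
    snprintf( buf, sizeof( buf), "%" PRId64, value);
    return std::string( buf);
}
```

Passing `-D__STDC_FORMAT_MACROS` on the compiler command line has the same effect and avoids problems with include order.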

missing a simple summarizer

something just highlighting the hits and doing a little bit of abstraction.

Imagine a 'title' and an 'author' field and a query 'Shakespeare':

**Shakespeare**, William, 1564-1616 The Complete Works of William **Shakespeare**

for the CLI mode mainly.

The supplied summarizers are either too internal (list matches) or far too sophisticated (phrase).
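
What is meant is roughly the following, sketched outside of the strus summarizer interface: a function that wraps each hit in markers. `highlightHits` is a hypothetical name. Matching is case-insensitive for ASCII only and works on substrings, not on tokens.

```cpp
#include <cctype>
#include <string>

// Lower-case ASCII letters; the length of the string does not change,
// so offsets in the lowered copy are valid in the original.
static std::string toLowerAscii( const std::string& src)
{
    std::string res( src);
    for (std::string::size_type i = 0; i < res.size(); ++i)
    {
        res[ i] = static_cast<char>( std::tolower( static_cast<unsigned char>( res[ i])));
    }
    return res;
}

// Wrap every occurrence of `term` in `text` with the given markers.
std::string highlightHits( const std::string& text, const std::string& term,
                           const std::string& open = "**", const std::string& close = "**")
{
    if (term.empty()) return text;
    const std::string ltext = toLowerAscii( text);
    const std::string lterm = toLowerAscii( term);
    std::string res;
    std::string::size_type pos = 0;
    for (;;)
    {
        std::string::size_type hit = ltext.find( lterm, pos);
        if (hit == std::string::npos) break;
        res.append( text, pos, hit - pos);     // text before the hit
        res.append( open);
        res.append( text, hit, term.size());   // the hit in its original spelling
        res.append( close);
        pos = hit + term.size();
    }
    res.append( text, pos, std::string::npos); // remaining text
    return res;
}
```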

Strong license

Hi there :)

I'm always overjoyed to discover some new full text search solution, especially one already shipping with a Python binding, however as a preferred go-to option while working commercially, the choice of GPL, and GPL 3 in particular pushes Strus lower in the list than e.g. the ancient CLucene, Groonga, or some out of process solution like Solr or ES.

IIRC in the past the Xapian guys have considered the GPL may have been a mistake, as it prevented the kind of adoptions that might have led to a much stronger ecosystem around their software, and they've been gradually trying to move to LGPL for more than half a decade now.

One needn't look further than Lucene to see how a liberal license can benefit an excellent design. No doubt you've considered this at length, but just in case you haven't, this is a short appeal for you to consider a more liberal license. :)

Thanks regardless, Strus looks awesome for much more than just full text search, I'll be keeping it in mind.

big token positions

2016-08-09 10:52:22; strusWebService, error: Token positions of document 693-2009 are out or range (document too big, only 76263 token positions were assigned, maximum allowed position is %65535) (master.cpp:96)

An idea is to have small, big, very big positions in the index. Simply dropping the positions
is not really good. The document is a big PDF, but splitting it creates a clustering and a
"too small retrieval item" problem.

Java bindings on FreeBSD segfault

#
# A fatal error has been detected by the Java Runtime Environment:
#
#  SIGSEGV (0xb) at pc=0x00000008021d33b3, pid=12399, tid=34384930816
#
# JRE version: OpenJDK Runtime Environment (7.0) (build 1.7.0_95-b00)
# Java VM: OpenJDK 64-Bit Server VM (24.95-b01 mixed mode bsd-amd64 compressed oops)
# Problematic frame:
# V  [libjvm.so+0x5d33b3]  JNI_GetCreatedJavaVMs+0x1f7c3
#
# Core dump written. Default location: /cores/core or core.12399
#
# If you would like to submit a bug report, please visit:
#   http://bugreport.java.com/bugreport/crash.jsp
#

---------------  T H R E A D  ---------------

Current thread (0x0000000803021000):  JavaThread "main" [_thread_in_vm, id=25192448, stack(0x00007fffffafe000,0x00007fffffbfe000)]

siginfo:si_signo=SIGSEGV: si_errno=0, si_code=1 (SEGV_MAPERR), si_addr=0x000000000000726f

Registers:
RAX=0x0000000814aa90d8, RBX=0x0000000803316ca0, RCX=0x0000000803051008, RDX=0x0000000000000000
RSP=0x00007fffffbfd0b0, RBP=0x00007fffffbfd0b0, RSI=0x000000000000726f, RDI=0x000000000000726f
R8 =0x0000000000000000, R9 =0x0000000000000000, R10=0xfffffe003d3a7108, R11=0x0000000000000246
R12=0x00000008028e44b0, R13=0x00000008028e6900, R14=0x000000000000726f, R15=0x0000000803021000
RIP=0x00000008021d33b3, EFLAGS=0x0000000000000001, ERR=0x0000000000000006
  TRAPNO=0x000000000000000c

Top of Stack: (sp=0x00007fffffbfd0b0)
0x00007fffffbfd0b0:   00007fffffbfd0f0 0000000802193662
0x00007fffffbfd0c0:   00000008021935a0 0000000803316ca0
0x00007fffffbfd0d0:   0000000000000000 00007fffffbfd191
0x00007fffffbfd0e0:   0000000803316c90 00000008030211d8
0x00007fffffbfd0f0:   00007fffffbfd120 000000081b41c0d8
0x00007fffffbfd100:   00007fffffbfd300 00007fffffbfd300
0x00007fffffbfd110:   00007fffffbfd1c0 00000008030be7b0
0x00007fffffbfd120:   00007fffffbfd270 0000000820cbf960
0x00007fffffbfd130:   0000000820cbf6c0 0000000820c90000
0x00007fffffbfd140:   0000000000001ab8 0000000803316c50
0x00007fffffbfd150:   00007fffffbfd300 00000008030be590
0x00007fffffbfd160:   0000000803316ca0 000000000000001c
0x00007fffffbfd170:   0000000000000031 000000000000001c
0x00007fffffbfd180:   0000000803316ca0 00007fffffbfd200
0x00007fffffbfd190:   00000008030be590 00000008031dfd10
0x00007fffffbfd1a0:   0000000000001ab8 00000008008275b0
0x00007fffffbfd1b0:   0000000820cd3891 000000000000001a
0x00007fffffbfd1c0:   0000000000000021 000000000000001a
0x00007fffffbfd1d0:   0000000803316c80 0000000000000000
0x00007fffffbfd1e0:   0000000000000000 0000000000000000
0x00007fffffbfd1f0:   0000000000000000 00007f0000000001
0x00007fffffbfd200:   0000000000000000 0000000000000000
0x00007fffffbfd210:   0000000000000000 00007fffffbfd220
0x00007fffffbfd220:   0000000000000000 0000000000000000
0x00007fffffbfd230:   0000000000000000 0000000000000000
0x00007fffffbfd240:   0000000000000000 00000008030be5f8
0x00007fffffbfd250:   00000008030be590 00000008031dfd10
0x00007fffffbfd260:   0000000803316c50 00007fffffbfd4b0
0x00007fffffbfd270:   00007fffffbfd460 0000000820cbf007
0x00007fffffbfd280:   00007fffffbfd3c0 00000008030be3c0
0x00007fffffbfd290:   00007fffffbfd480 0000000800816400
0x00007fffffbfd2a0:   00007fffffbfd340 0000000800610bbb 

Instructions: (pc=0x00000008021d33b3)
0x00000008021d3393:   66 66 66 66 2e 0f 1f 84 00 00 00 00 00 55 48 89
0x00000008021d33a3:   e5 48 85 ff 74 0d 48 8b 05 60 36 6e 00 48 8b 00
0x00000008021d33b3:   48 89 07 5d c3 0f 1f 84 00 00 00 00 00 55 48 89
0x00000008021d33c3:   e5 48 85 ff 74 0d 48 8b 05 40 36 6e 00 48 8b 00 

Register to memory mapping:

RAX=0x0000000814aa90d8 is an oop
java.lang.Object 
 - klass: 'java/lang/Object'
RBX=0x0000000803316ca0 is an unknown value
RCX=0x0000000803051008 is an unknown value
RDX=0x0000000000000000 is an unknown value
RSP=0x00007fffffbfd0b0 is pointing into the stack for thread: 0x0000000803021000
RBP=0x00007fffffbfd0b0 is pointing into the stack for thread: 0x0000000803021000
RSI=0x000000000000726f is an unknown value
RDI=0x000000000000726f is an unknown value
R8 =0x0000000000000000 is an unknown value
R9 =0x0000000000000000 is an unknown value
R10=0xfffffe003d3a7108 is an unknown value
R11=0x0000000000000246 is an unknown value
R12=0x00000008028e44b0: _ZN9Arguments17SharedArchivePathE+0xa6f0 in /usr/local/openjdk7/jre/lib/amd64/server/libjvm.so at 0x0000000801c00000
R13=0x00000008028e6900: _ZN9Arguments17SharedArchivePathE+0xcb40 in /usr/local/openjdk7/jre/lib/amd64/server/libjvm.so at 0x0000000801c00000
R14=0x000000000000726f is an unknown value
R15=0x0000000803021000 is a thread


Stack: [0x00007fffffafe000,0x00007fffffbfe000],  sp=0x00007fffffbfd0b0,  free space=1020k
Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code)
V  [libjvm.so+0x5d33b3]  JNI_GetCreatedJavaVMs+0x1f7c3
V  [libjvm.so+0x593662]  AsyncGetCallTrace+0xcc0d2
C  [libstrus_java.so.0.7.1+0x1c0d8]  operator delete(void*)+0x38
C  [libleveldb.so.1+0x2f960]  leveldb::VersionSet::WriteSnapshot(leveldb::log::Writer*)+0x2a0
C  [libleveldb.so.1+0x2f007]  leveldb::VersionSet::LogAndApply(leveldb::VersionEdit*, leveldb::port::Mutex*)+0x437
C  [libleveldb.so.1+0x1ecd3]  leveldb::DB::Open(leveldb::Options const&, std::__1::basic_string<char, leveldb::Options const&::char_traits<char>, leveldb::Options const&::allocator<char> > const&, leveldb::DB**)+0x193
C  [libstrus_database_leveldb.so.0.7+0x62a7]  _init+0x212f
C  [libstrus_database_leveldb.so.0.7+0x668f]  _init+0x2517
C  [libstrus_database_leveldb.so.0.7+0x49ad]  _init+0x835
C  [libstrus_module.so.0.7+0x6637]  strus::StorageModule::StorageModule(strus::PostingIteratorJoinConstructor const*, strus::WeightingFunctionConstructor const*, strus::SummarizerFunctionConstructor const*)+0x1cf7
C  [libstrus_java.so.0.7.1+0x6421c]  StorageClient::StorageClient(Reference const&, Reference const, std::__1::basic_string<char, Reference const&::char_traits<char>, Reference const&::allocator<char> > const&)+0x7c
C  [libstrus_java.so.0.7.1+0x6a7e8]  Context::createStorageClient(std::__1::basic_string<char, Context::createStorageClient::char_traits<char>, Context::createStorageClient::allocator<char> > const&)+0x38
C  [libstrus_java.so.0.7.1+0x4ee38]  Java_net_strus_api_strusJNI_Context_1createStorageClient_1_1SWIG_11+0x148
j  net.strus.api.strusJNI.Context_createStorageClient__SWIG_1(JLnet/strus/api/Context;Ljava/lang/String;)J+0
j  net.strus.api.Context.createStorageClient(Ljava/lang/String;)Lnet/strus/api/StorageClient;+10
j  net.strus.example.Status.main([Ljava/lang/String;)V+13
v  ~StubRoutines::call_stub
V  [libjvm.so+0x583f13]  AsyncGetCallTrace+0xbc983
V  [libjvm.so+0x583398]  AsyncGetCallTrace+0xbbe08
V  [libjvm.so+0x5a08d6]  AsyncGetCallTrace+0xd9346
V  [libjvm.so+0x5a5439]  AsyncGetCallTrace+0xddea9
C  [java+0x55eb]  JavaMain+0x9bb
C  [libthr.so.3+0x94f5]  operator->+0x725

Java frames: (J=compiled Java code, j=interpreted, Vv=VM code)
j  net.strus.api.strusJNI.Context_createStorageClient__SWIG_1(JLnet/strus/api/Context;Ljava/lang/String;)J+0
j  net.strus.api.Context.createStorageClient(Ljava/lang/String;)Lnet/strus/api/StorageClient;+10
j  net.strus.example.Status.main([Ljava/lang/String;)V+13
v  ~StubRoutines::call_stub

---------------  P R O C E S S  ---------------

Java Threads: ( => current thread )
  0x0000000803026800 JavaThread "Service Thread" daemon [_thread_blocked, id=50622464, stack(0x00007fffff3f7000,0x00007fffff4f7000)]
  0x0000000803025800 JavaThread "C2 CompilerThread1" daemon [_thread_blocked, id=50619392, stack(0x00007fffff4f8000,0x00007fffff5f8000)]
  0x0000000803025000 JavaThread "C2 CompilerThread0" daemon [_thread_blocked, id=50616320, stack(0x00007fffff5f9000,0x00007fffff6f9000)]
  0x0000000803024000 JavaThread "Signal Dispatcher" daemon [_thread_blocked, id=50613248, stack(0x00007fffff6fa000,0x00007fffff7fa000)]
  0x0000000803023800 JavaThread "Finalizer" daemon [_thread_blocked, id=50609152, stack(0x00007fffff7fb000,0x00007fffff8fb000)]
  0x0000000803022800 JavaThread "Reference Handler" daemon [_thread_blocked, id=50606080, stack(0x00007fffff8fc000,0x00007fffff9fc000)]
=>0x0000000803021000 JavaThread "main" [_thread_in_vm, id=25192448, stack(0x00007fffffafe000,0x00007fffffbfe000)]

Other Threads:
  0x000000080325f800 VMThread [stack: 0x00007fffff9fd000,0x00007fffffafd000] [id=50603008]
  0x0000000803260000 WatcherThread [stack: 0x00007fffff2f6000,0x00007fffff3f6000] [id=50624512]

VM state:not at safepoint (normal execution)

VM Mutex/Monitor currently owned by a thread: None

Heap
 def new generation   total 4224K, used 546K [0x0000000806e00000, 0x0000000807290000, 0x000000080b750000)
  eden space 3776K,  14% used [0x0000000806e00000, 0x0000000806e88b08, 0x00000008071b0000)
  from space 448K,   0% used [0x00000008071b0000, 0x00000008071b0000, 0x0000000807220000)
  to   space 448K,   0% used [0x0000000807220000, 0x0000000807220000, 0x0000000807290000)
 tenured generation   total 9408K, used 0K [0x000000080b750000, 0x000000080c080000, 0x0000000814a00000)
   the space 9408K,   0% used [0x000000080b750000, 0x000000080b750000, 0x000000080b750200, 0x000000080c080000)
 compacting perm gen  total 21248K, used 2560K [0x0000000814a00000, 0x0000000815ec0000, 0x0000000819c00000)
   the space 21248K,  12% used [0x0000000814a00000, 0x0000000814c80280, 0x0000000814c80400, 0x0000000815ec0000)
No shared spaces configured.

Card table byte_map: [0x0000000800722000,0x00000008007ba000] byte_map_base: 0x00000007fc6eb000

Polling page: 0x0000000800659000

Code Cache  [0x0000000803a3f000, 0x0000000803caf000, 0x0000000806a3f000)
 total_blobs=191 nmethods=0 adapters=161 free_code_cache=48760Kb largest_free_block=49931136

Compilation events (0 events):
No events

GC Heap History (0 events):
No events

Deoptimization events (0 events):
No events

Internal exceptions (5 events):
Event: 0.050 Thread 0x0000000803021000 Threw 0x0000000806e0f0a0 at /wrkdirs/usr/ports/java/openjdk7/work/openjdk/hotspot/src/share/vm/prims/jni.cpp:3991
Event: 0.104 Thread 0x0000000803021000 Threw 0x0000000806e58308 at /wrkdirs/usr/ports/java/openjdk7/work/openjdk/hotspot/src/share/vm/prims/jvm.cpp:1319
Event: 0.112 Thread 0x0000000803021000 Threw 0x0000000806e5dbf0 at /wrkdirs/usr/ports/java/openjdk7/work/openjdk/hotspot/src/share/vm/prims/jvm.cpp:1319
Event: 0.138 Thread 0x0000000803021000 Threw 0x0000000806e61b50 at /wrkdirs/usr/ports/java/openjdk7/work/openjdk/hotspot/src/share/vm/prims/jvm.cpp:1319
Event: 0.141 Thread 0x0000000803021000 Threw 0x0000000806e701e0 at /wrkdirs/usr/ports/java/openjdk7/work/openjdk/hotspot/src/share/vm/prims/jvm.cpp:1319

Events (10 events):
Event: 0.112 loading class 0x000000080311c0b0
Event: 0.112 loading class 0x000000080311c0b0 done
Event: 0.112 loading class 0x0000000803289740
Event: 0.112 loading class 0x0000000803289740 done
Event: 0.113 loading class 0x000000080311cc20
Event: 0.114 loading class 0x000000080311cc20 done
Event: 0.137 loading class 0x000000080327fd90
Event: 0.137 loading class 0x000000080327fd90 done
Event: 0.141 loading class 0x000000080327fac0
Event: 0.141 loading class 0x000000080327fac0 done


Dynamic libraries:
0x0000000000400000  /usr/local/openjdk7//bin/java
0x0000000800829000  /lib/libz.so.6
0x0000000800a3f000  /lib/libthr.so.3
0x0000000800c64000  /lib/libc.so.7
0x0000000801c00000  /usr/local/openjdk7/jre/lib/amd64/server/libjvm.so
0x000000080100d000  /lib/libm.so.5
0x000000080290a000  /usr/lib/libc++.so.1
0x0000000802bca000  /lib/libcxxrt.so.1
0x0000000802de6000  /lib/libgcc_s.so.1
0x0000000803400000  /usr/local/openjdk7/jre/lib/amd64/libverify.so
0x000000080360f000  /usr/local/openjdk7/jre/lib/amd64/libjava.so
0x0000000803837000  /usr/local/openjdk7/jre/lib/amd64/libzip.so
0x000000081b400000  /usr/home/abaumann/strusBindings/lang/java/libstrus_java.so.0.7.1
0x000000081b698000  /usr/local/openjdk7/jre/lib/amd64/libjawt.so
0x000000081b899000  /usr/local/openjdk7/jre/lib/amd64/xawt/libmawt.so
0x000000081baf5000  /usr/local/lib/strus/libstrus_module.so.0.7
0x000000081bd06000  /usr/local/lib/strus/libstrus_rpc_client.so.0.7
0x000000081bf6a000  /usr/local/lib/strus/libstrus_rpc_client_socket.so.0.7
0x000000081c18b000  /usr/local/lib/strus/libstrus_utils.so.0.7
0x000000081c39a000  /usr/local/lib/strus/libstrus_error.so.0.7
0x000000081c5a2000  /usr/local/openjdk7/jre/lib/amd64/libawt.so
0x000000081c86d000  /usr/local/lib/libXext.so.6
0x000000081ca7e000  /usr/local/lib/libX11.so.6
0x000000081cdb7000  /usr/local/lib/libXrender.so.1
0x000000081cfc0000  /usr/local/lib/libXtst.so.6
0x000000081d1c5000  /usr/local/lib/libXi.so.6
0x000000081d3d3000  /usr/local/lib/strus/libstrus_analyzer.so.0.7
0x000000081d5ec000  /usr/local/lib/strus/libstrus_segmenter_textwolf.so.0.7
0x000000081d829000  /usr/local/lib/strus/libstrus_textproc.so.0.7
0x000000081da3d000  /usr/local/lib/strus/libstrus_queryeval.so.0.7
0x000000081dc5a000  /usr/local/lib/strus/libstrus_queryproc.so.0.7
0x000000081deb7000  /usr/local/lib/strus/libstrus_statsproc.so.0.7
0x000000081e0c6000  /usr/local/lib/strus/libstrus_storage.so.0.7
0x000000081e333000  /usr/local/lib/strus/libstrus_database_leveldb.so.0.7
0x000000081e546000  /usr/local/lib/libintl.so.8
0x000000081e751000  /usr/lib/librt.so.1
0x000000081e957000  /usr/local/lib/libboost_thread.so.1.55.0
0x000000081eb72000  /usr/local/lib/libboost_system.so.1.55.0
0x000000081ed75000  /usr/local/lib/libboost_date_time.so.1.55.0
0x000000081ef83000  /usr/local/lib/libboost_atomic.so.1.55.0
0x000000081f185000  /usr/local/lib/libboost_chrono.so.1.55.0
0x000000081f38d000  /usr/local/lib/libxcb.so.1
0x000000081f5ac000  /usr/lib/librpcsvc.so.5
0x000000081f7b5000  /usr/local/lib/strus/libstrus_detector_std.so.0.7
0x000000081f9b8000  /usr/local/lib/strus/libstrus_normalizer_snowball.so.0.7
0x000000081fbc2000  /usr/local/lib/strus/libstrus_stemmer.so.0.7
0x000000081fe14000  /usr/local/lib/strus/libstrus_normalizer_dictmap.so.0.7
0x0000000820023000  /usr/local/lib/strus/libstrus_normalizer_charconv.so.0.7
0x0000000820231000  /usr/local/lib/strus/libstrus_normalizer_dateconv.so.0.7
0x0000000820455000  /usr/local/lib/strus/libstrus_tokenizer_punctuation.so.0.7
0x000000082066b000  /usr/local/lib/strus/libstrus_tokenizer_word.so.0.7
0x000000082086f000  /usr/local/lib/strus/libstrus_aggregator_vsm.so.0.7
0x0000000820a79000  /usr/local/lib/strus/libstrus_scalarfunc.so.0.7
0x0000000820c90000  /usr/local/lib/libleveldb.so.1
0x0000000820ee3000  /usr/local/lib/libXau.so.6
0x00000008210e5000  /usr/local/lib/libpthread-stubs.so.0
0x00000008212e6000  /usr/local/lib/libXdmcp.so.6
0x000000080060c000  /libexec/ld-elf.so.1

VM Arguments:
java_command: net.strus.example.Status
Launcher Type: SUN_STANDARD

Environment Variables:
JAVA_HOME=/usr/local/openjdk7/
PATH=/opt/maven/bin:/opt/hadoop/bin:/bin:/usr/bin:/usr/local/bin:/usr/X11R6/bin:/usr/KDE/bin:/opt/bin
LD_LIBRARY_PATH=/home/abaumann/strusBindings/lang/java
SHELL=/bin/tcsh
HOSTTYPE=FreeBSD
OSTYPE=FreeBSD
MACHTYPE=x86_64

Signal Handlers:
SIGSEGV: [libjvm.so+0x894ab0], sa_mask[0]=0x7fffffff, sa_flags=0x00000042
SIGBUS: [libjvm.so+0x894ab0], sa_mask[0]=0x7fffffff, sa_flags=0x00000042
SIGFPE: [libjvm.so+0x74da90], sa_mask[0]=0x7fffffff, sa_flags=0x00000042
SIGPIPE: [libjvm.so+0x74da90], sa_mask[0]=0x7fffffff, sa_flags=0x00000042
SIGXFSZ: [libjvm.so+0x74da90], sa_mask[0]=0x7fffffff, sa_flags=0x00000042
SIGILL: [libjvm.so+0x74da90], sa_mask[0]=0x7fffffff, sa_flags=0x00000042
SIGUSR1: SIG_DFL, sa_mask[0]=0x00000000, sa_flags=0x00000000
SIGUSR2: [libjvm.so+0x74e3f0], sa_mask[0]=0x00000000, sa_flags=0x00000042
SIGHUP: [libjvm.so+0x74c130], sa_mask[0]=0x7fffffff, sa_flags=0x00000042
SIGINT: [libjvm.so+0x74c130], sa_mask[0]=0x7fffffff, sa_flags=0x00000042
SIGTERM: [libjvm.so+0x74c130], sa_mask[0]=0x7fffffff, sa_flags=0x00000042
SIGQUIT: [libjvm.so+0x74c130], sa_mask[0]=0x7fffffff, sa_flags=0x00000042


---------------  S Y S T E M  ---------------

OS:BSDuname:FreeBSD 10.1-RELEASE FreeBSD 10.1-RELEASE #0 r274401: Tue Nov 11 21:02:49 UTC 2014     [email protected]:/usr/obj/usr/src/sys/GENERIC amd64
rlimit: STACK 524288k, CORE infinity, NPROC 5749, NOFILE 28746, AS infinity
load average:0.32 0.42 0.25

CPU:total 2 (1 cores per cpu, 2 threads per core) family 15 model 4 stepping 3, cmov, cx8, fxsr, mmx, sse, sse2, sse3, ht, tsc

/proc/cpuinfo:
  <Not Available>

Memory: 4k page, physical 898288k(224572k free)

/proc/meminfo:


vm_info: OpenJDK 64-Bit Server VM (24.95-b01) for bsd-amd64 JRE (1.7.0_95-b00), built on Apr  2 2016 01:52:58 by "root" with gcc 4.2.1 Compatible FreeBSD Clang 3.4.1 (tags/RELEASE_34/dot1-final 208032)

time: Thu Apr  7 15:27:14 2016
elapsed time: 0 seconds

cmake relocation fails

Installing into /opt/strus
with -DCMAKE_INSTALL_PREFIX=/opt/strus fails.

The probing of the strus packages works, I see:

-- Setting strusbase prefix path to /opt/strus
-- Set strusbase include directories to /opt/strus/include
-- Set strusbase linking directories to /opt/strus/lib/strus

but not one variable is actually used in the CMake files:

[ 53%] Building CXX object src/queryproc/utils/CMakeFiles/queryproc_utils.dir/positionWindow.cpp.o
cd /home/build/strus/build/src/queryproc/utils && /usr/bin/c++    -I/home/build/strus/include -I/home/build/strus/src/queryproc/utils  -std=c++98  -Wall -pedantic -g -Wfatal-errors -fvisibility=hidden -fPIC -O3 -O3 -DNDEBUG   -o CMakeFiles/queryproc_utils.dir/positionWindow.cpp.o -c /home/build/strus/src/queryproc/utils/positionWindow.cpp
In file included from /home/build/strus/src/queryproc/utils/positionWindow.cpp:8:0:
/home/build/strus/src/queryproc/utils/positionWindow.hpp:12:31: fatal error: strus/base/stdint.h: No such file or directory
 #include "strus/base/stdint.h"

for instance ../src/queryproc/utils/CMakeLists.txt:

include_directories(
  "${Boost_INCLUDE_DIRS}"
  "${PROJECT_SOURCE_DIR}/include"
  "${PROJECT_SOURCE_DIR}/src/queryproc/utils"
)

The variables strusbase_INCLUDE_DIRS and strusbase_LIBRARY_DIRS should be
added everywhere where needed.

how to add markers in forward index

having spans of meta features with a start and an end (sequence of tokens) it would be
nice if the forward index can store:

word begin_marker word word sign word end_marker word

Now the problem is the token positions, because some tokens (begin_marker,
end_marker) should have the same position as the first and the last word of
the span.
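
The intended position assignment can be sketched like this: ordinary tokens get consecutive positions, a begin marker takes the position of the token that follows it, and an end marker takes the position of the token that precedes it. `FwdToken`, `TokenKind` and `assignPositions` are hypothetical names, and the sign in the example is treated as an ordinary token.

```cpp
#include <cstddef>
#include <string>
#include <vector>

enum TokenKind { Word, BeginMarker, EndMarker };

struct FwdToken
{
    std::string value;
    TokenKind kind;
    unsigned int pos;   // assigned position, 0 = not assigned

    FwdToken( const std::string& value_, TokenKind kind_)
        : value( value_), kind( kind_), pos( 0) {}
};

// Assign positions so that markers share the position of the first
// respectively last token of the span they enclose.
// An end marker before any word keeps position 0; a begin marker after the
// last word gets a position that no word has.
void assignPositions( std::vector<FwdToken>& tokens)
{
    unsigned int lastWordPos = 0;
    for (std::size_t i = 0; i < tokens.size(); ++i)
    {
        switch (tokens[ i].kind)
        {
            case Word:        tokens[ i].pos = ++lastWordPos; break;
            case BeginMarker: tokens[ i].pos = lastWordPos + 1; break; // next word
            case EndMarker:   tokens[ i].pos = lastWordPos; break;     // previous word
        }
    }
}
```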

Position information of overlapping annotations (attributes in XML) is not handled correctly

The start position of an annotation is bound to the tag, if the tag is selected, or otherwise to the first term after the tag. Subsequent positions are counted from this base. This has the following consequences:

  1. Elements in annotations except the first get a wrong position
  2. When matching a structure in an annotation you might get matches covering two annotations if the annotations are close or overlapping.

Proposed solution:

  • All elements of an annotation get bound to the one position in the content they are bound to.
  • Provide special posting set operators that use a second position inside the annotation for matching structures in annotations.
  • Implement another type of block for annotations that has a tuple for each position they match: The first element of the tuple is the content position and the second the position inside the annotation. The first position is used in ordinary operations, the second in the operations referring to the annotation positions.
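
The tuple idea from the proposal could look like the sketch below. `AnnotationPos` and `matchSequenceInAnnotation` are hypothetical names. The function reports a match only if both elements are bound to the same content position and are adjacent inside the annotation, so a match can no longer span two annotations. Inputs are assumed to be sorted by content position, and annotations are assumed to be distinguishable by their content position.

```cpp
#include <cstddef>
#include <vector>

// One occurrence of an annotation element.
struct AnnotationPos
{
    unsigned int contentPos;  // position in the content the annotation is bound to
    unsigned int innerPos;    // position of the element inside the annotation
};

// Content positions of annotations in which an element of `second`
// directly follows an element of `first`.
std::vector<unsigned int> matchSequenceInAnnotation(
        const std::vector<AnnotationPos>& first,
        const std::vector<AnnotationPos>& second)
{
    std::vector<unsigned int> res;
    for (std::size_t i = 0; i < first.size(); ++i)
    {
        for (std::size_t j = 0; j < second.size(); ++j)
        {
            if (first[ i].contentPos == second[ j].contentPos
                && first[ i].innerPos + 1 == second[ j].innerPos)
            {
                // report every annotation once
                if (res.empty() || res.back() != first[ i].contentPos)
                {
                    res.push_back( first[ i].contentPos);
                }
                break;
            }
        }
    }
    return res;
}
```

The nested loop keeps the sketch short; a real posting set operator would merge the two sorted lists in one pass.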

endless loop in query evaluation

One thread in 100% CPU:

#0  0x004b1a43 in strus::IteratorUnion::skipDoc(int const&) () from /usr/lib/strus/libstrus_queryproc.so.0.8
#1  0x0054abd0 in strus::Accumulator::nextRank(int&, unsigned int&, float&) ()
   from /usr/lib/strus/libstrus_queryeval.so.0.8
#2  0x00543354 in strus::Query::evaluate() () from /usr/lib/strus/libstrus_queryeval.so.0.8
#3  0x08076acf in apps::query::query_cmd (this=0x9baa0a4, name="teste", qry="", query_in_url=false)
    at /data/b/aba/strusWebService/src/query.cpp:270
#4  0x08077673 in apps::query::query_payload_cmd (this=0x9baa0a4, name="teste")
    at /data/b/aba/strusWebService/src/query.cpp:46
#5  0x08077e43 in operator() (this=0x9bac710, a1="teste") at /usr/include/cppcms/url_dispatcher.h:233
#6  callable_impl<void, cppcms::url_dispatcher::binder1<apps::query> >::call (this=0x9bac710, a1="teste")
    at /usr/include/booster/function.h:178
#7  0x0062ca6f in ?? () from /usr/lib/libcppcms.so.1
#8  0x0062c1bb in cppcms::url_dispatcher::dispatch(std::basic_string<char, std::char_traits<char>, std::allocator<char> >) () from /usr/lib/libcppcms.so.1
#9  0x00627443 in cppcms::application::main(std::basic_string<char, std::char_traits<char>, std::allocator<char> >) () from /usr/lib/libcppcms.so.1
#10 0x0062110a in cppcms::http::context::dispatch(booster::intrusive_ptr<cppcms::application>, std::basic_string<char, std::char_traits<char>, std::allocator<char> >, bool) () from /usr/lib/libcppcms.so.1
#11 0x00622738 in booster::function<void ()()>::callable_impl<void, cppcms_boost::_bi::bind_t<void, void (*)(booster::intrusive_ptr<cppcms::application>, std::basic_string<char, std::char_traits<char>, std::allocator<char> >, bool), cppcms_boost::_bi::list3<cppcms_boost::_bi::value<booster::intrusive_ptr<cppcms::application> >, cppcms_boost::_bi::value<std::basic_string<char, std::char_traits<char>, std::allocator<char> > >, cppcms_boost::_bi::value<bool> > > >::call() () from /usr/lib/libcppcms.so.1
#12 0x00624237 in cppcms::impl::thread_pool::worker() () from /usr/lib/libcppcms.so.1
#13 0x00623f6e in booster::function<void ()()>::callable_impl<void, cppcms_boost::_bi::bind_t<void, cppcms_boost::_mfi::mf0<void, cppcms::impl::thread_pool>, cppcms_boost::_bi::list1<cppcms_boost::_bi::value<cppcms::impl::thread_pool*> > > >::call() () from /usr/lib/libcppcms.so.1
#14 0x00b8e41a in booster_thread_func () from /usr/lib/libbooster.so.0
#15 0x00419bc9 in start_thread () from /lib/libpthread.so.0

A 'finish' in gdb never returns. I am trying to find out whether this is a corruption in my search index
or actually a bug in the union iterator.

FreeBSD 32-bit storage operations test fails

Seen on 32-bit FreeBSD 10.3 and 11 once optimization was actually enabled (see the fixed bug
concerning compiler options, caused by Clang != clang):

...
2: error fetching next chunk of storage dump: error in dumped dkey 'd[02][14][cf][a7]': illegal range in boolean block (not strictly ascending or unjoined overlapping ranges)
2: Error in test (6) DocumentUpdate: error fetching next chunk of storage dump: error in dumped dkey 'd[02][14][cf][a7]': illegal range in boolean block (not strictly ascending or unjoined overlapping ranges)
1/1 Test #2: StorageOperations ................***Failed    9.39 sec

Not all leveldb options seem to be configurable

            path=<LevelDB storage path>
            create=<yes/no, yes=do create if database does not exist yet>
            cache=<size of LRU cache for LevelDB>
            compression=<yes/no>
            max_open_files=<maximum number of open files for LevelDB>
            write_buffer_size=<Amount of data to build up in memory per file>
            block_size=<approximate size of user data packed per block>
            cachedterms=<file with list of terms to cache>

Checked against LevelDB's options.h, notably:

  • verify_checksums
  • paranoid_checks
  • compression: in LevelDB this is not a boolean but an enum (0 = no compression,
    1 = Snappy compression), and more compression algorithms may be added in the future.
    So the parameter should take NO_COMPRESSION, SNAPPY_COMPRESSION or something similar, IMHO.
  • info_log: a low-level log file of LevelDB operations
  • not quite clear: is cache the same as block_cache?

More exotic and maybe not necessary to expose as options:

  • block_restart_interval
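
The compression point above could be handled by parsing the configuration value into an enum instead of a boolean. The following is a minimal sketch, not the strus implementation: CompressionOption and parseCompression are hypothetical names, and the accepted strings are assumptions chosen so that the existing yes/no values keep working. The numeric values mirror LevelDB's kNoCompression (0) and kSnappyCompression (1).

```cpp
#include <cassert>
#include <stdexcept>
#include <string>

// Hypothetical enum for illustration; the values mirror
// leveldb::kNoCompression (0) and leveldb::kSnappyCompression (1).
enum CompressionOption
{
    NoCompression = 0,
    SnappyCompression = 1
};

// Map a configuration value to a compression type.
// "yes" and "no" are kept as aliases for backward compatibility (an assumption).
inline CompressionOption parseCompression( const std::string& value)
{
    if (value == "no" || value == "none") return NoCompression;
    if (value == "yes" || value == "snappy") return SnappyCompression;
    throw std::invalid_argument( "unknown compression type: " + value);
}
```

A new algorithm would then only need one more enum value and one more accepted string, without changing the meaning of existing configurations.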

compilation on FreeBSD 11 fails

Is the corresponding specialization of LocalStructAllocator missing?

Building CXX object src/storage/CMakeFiles/strus_storage_static.dir/attributeMap.cpp.o
In file included from /home/abaumann/strus/src/storage/attributeMap.cpp:8:
In file included from /home/abaumann/strus/src/storage/attributeMap.hpp:13:
In file included from /home/abaumann/strus/include/private/stringMap.hpp:12:
In file included from /usr/local/include/boost/unordered_map.hpp:17:
In file included from /usr/local/include/boost/unordered/unordered_map.hpp:19:
In file included from /usr/local/include/boost/functional/hash.hpp:6:
In file included from /usr/local/include/boost/functional/hash/hash.hpp:560:
In file included from /usr/local/include/boost/functional/hash/extensions.hpp:22:
In file included from /usr/local/include/boost/detail/container_fwd.hpp:94:
/usr/include/c++/v1/map:837:5: error: implicit instantiation of undefined template
      '__static_assert_test<false>'
    static_assert((is_same<typename allocator_type::value_type, value_type>::value),
    ^
/usr/include/c++/v1/__config:632:35: note: expanded from macro 'static_assert'
    typedef __static_assert_check<sizeof(__static_assert_test<(__b)>)> \
                                  ^
/home/abaumann/strus/src/storage/attributeMap.hpp:50:6: note: in instantiation of template class
      'std::__1::map<long long, const char *, std::__1::less<long long>,
      strus::LocalStructAllocator<std::__1::pair<long long, const char *> > >' requested here
        Map m_map;
            ^
/usr/include/c++/v1/__config:627:24: note: template is declared here
template <bool> struct __static_assert_test;
                       ^
1 error generated.
*** Error code 1

Limits of operators should be part of the API

The maximum number of arguments of (in this case) the union operator is hard-coded to 64.

error creating 'union' iterator: number of arguments of union out of range (> 64)

in src/queryproc/iterator/postingIteratorUnion.cpp

if (args_.size() > 64)
    throw strus::runtime_error( _TXT( "number of arguments of union out of range (> %u)"), 64);

This should be a constant in a header file (DRY) and it should be configurable at
runtime because some queries can be really big. :-)

Are other operators (and functions) affected?
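
A minimal sketch of what a shared, runtime-configurable limit could look like. All names here (the namespace sketch, DefaultMaxUnionArgs, OperatorLimits) are hypothetical and not part of strus, and std::runtime_error stands in for strus::runtime_error to keep the example self-contained.

```cpp
#include <cassert>
#include <cstddef>
#include <sstream>
#include <stdexcept>
#include <string>

namespace sketch {

// Single definition of the default limit, meant to live in a header (DRY).
const std::size_t DefaultMaxUnionArgs = 64;

// Hypothetical limits object that could be passed to the query processor.
class OperatorLimits
{
public:
    OperatorLimits()
        :m_maxUnionArgs( DefaultMaxUnionArgs) {}
    explicit OperatorLimits( std::size_t maxUnionArgs_)
        :m_maxUnionArgs( maxUnionArgs_) {}

    std::size_t maxUnionArgs() const
    {
        return m_maxUnionArgs;
    }

    // Throws if the number of union arguments exceeds the configured limit.
    void checkUnionArgs( std::size_t nofArgs) const
    {
        if (nofArgs > m_maxUnionArgs)
        {
            std::ostringstream msg;
            msg << "number of arguments of union out of range (> "
                << m_maxUnionArgs << ")";
            throw std::runtime_error( msg.str());
        }
    }

private:
    std::size_t m_maxUnionArgs;
};

} // namespace sketch
```

The check and the error message both read the same value, so the number 64 appears only once, and a caller with very large queries could construct the limits with a higher value.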

Cannot set custom CFLAGS or CXXFLAGS

For instance:

cmake -DCMAKE_INSTALL_PREFIX=/usr -DCMAKE_BUILD_TYPE=Release -DLIB_INSTALL_DIR=lib -DCMAKE_CXX_FLAGS_RELEASE='-g -O0' -DCMAKE_C_FLAGS_RELEASE='-g -O0' .

or

cmake -DCMAKE_INSTALL_PREFIX=/usr -DCMAKE_BUILD_TYPE=Release -DLIB_INSTALL_DIR=lib -DCMAKE_CXX_FLAGS='-g -O0' -DCMAKE_C_FLAGS='-g -O0' .

has no effect, as I can see in:

Compiler:
  C++ compilation flags: -std=c++98  -Wall -pedantic -g -Wfatal-errors -fvisibility=hidden -fPIC -O3
  C compilation flags: -std=c99 -Wall -pedantic -Wfatal-errors -fPIC -O3

This is because in cmake/build_rules.cmake the corresponding flag variables are
just brutally set:

if(CMAKE_COMPILER_IS_GNUCXX)
  set( STRUS_OPTIMIZATION_LEVEL "3" )
  set( CMAKE_CXX_FLAGS "-std=c++98  -Wall -pedantic -g -Wfatal-errors -fvisibility=hidden -fPIC -O${STRUS_OPTIMIZATION_LEVEL}" )
  set( CMAKE_C_FLAGS "-std=c99 -Wall -pedantic -Wfatal-errors -fPIC -O${STRUS_OPTIMIZATION_LEVEL}" )
endif()

With 'make VERBOSE=1' I can even see that both sets of flags are passed together:

[ 33%] Building CXX object src/utils/CMakeFiles/strus_private_utils.dir/utils.cpp.o
cd /home/user/strusAnalyzer_tsv/src/utils && /bin/c++    -I/home/user/strusAnalyzer_tsv/include  -std=c++98  -Wall -pedantic -g -Wfatal-errors -fvisibility=hidden -fPIC -O3 -g -O0   -o CMakeFiles/strus_private_utils.dir/utils.cpp.o -c /home/user/strusAnalyzer_tsv/src/utils/utils.cpp

-O3 and -O0 appear on the same command line.

All cmake support should be fixed in this regard, see also the tips in:

http://voices.canonical.com/jussi.pakkanen/2013/03/26/a-list-of-common-cmake-antipatterns/

Several reasons why this fix is important:

  • Developers want to set debug options like optimization levels when calling cmake,
    not by editing a file
  • Packagers must respect the flags of their distribution, it's not allowed to
    use your own flags.

This bug applies to all strus repos.
