
vitrivr / cottontaildb


Cottontail DB is a column store and vector database aimed at multimedia retrieval. It supports both classical Boolean retrieval and vector-space retrieval (nearest-neighbour search, as used in similarity search) through a unified data and query model.

Home Page: https://www.vitrivr.org/vitrivr.html

License: MIT License

Kotlin 91.10% Dockerfile 0.02% ANTLR 0.78% JavaScript 0.05% TypeScript 3.22% CSS 0.28% HTML 1.28% SCSS 0.06% Java 3.21%
multimedia retrieval database multimedia-retrieval cottontail-db vector-space-retrieval knearest-neighbours-lookup cottontaildb embedding-similarity similarity-search

cottontaildb's People

Contributors

floribur, frankier, gabuzi, lucaro, ludlows, orpham, ppanopticon, rasmuswilli, sauterl, schoenja, silvanheller, spiess, x4e-jonas


cottontaildb's Issues

Tests not successfully running on dev

There are currently a number of tests not working on dev:

  • Running the MappedFileChannelStoreTest leads to a deadlock while growing the store.
  • testL2Distance() and testL2SquaredDistance() fail in the IntVectorDistanceTest
  • the UniqueHashIndexTest fails because of the changed name-handling.
  • Some of the complex vector tests fail, but that is to be expected, I guess, since @gabuzi is actively working on those.

[PQ Index] Make distance metric configurable during training

Training in the PQ index currently uses the Mahalanobis distance. This function should be configurable, e.g. to the L1, L2 or another distance. Furthermore:

  • The function used for training should be stored with the index config
  • Mismatches between the distance used during training and at query time should be considered during query planning
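The two points above can be sketched in Python (purely illustrative; IndexConfig and DISTANCES are hypothetical names, not Cottontail DB API): the training distance is stored with the index config, and the planner can check it against the query-time distance.

```python
# Hypothetical sketch: persist the training distance with the index config so a
# mismatch with the query-time distance can be detected during query planning.
DISTANCES = {
    "L1": lambda a, b: sum(abs(x - y) for x, y in zip(a, b)),
    "L2": lambda a, b: sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5,
}

class IndexConfig:
    """Stores the distance used for training; persisted with the index."""
    def __init__(self, distance="L2"):
        if distance not in DISTANCES:
            raise ValueError(f"unknown distance: {distance}")
        self.distance = distance

    def matches(self, query_distance):
        # Query-planning check: was the index trained with the same metric?
        return self.distance == query_distance

cfg = IndexConfig("L1")
print(cfg.matches("L2"))  # False -> planner should penalise or avoid this index
```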

DB stops directly after start

Since commit 1241e4b I'm no longer able to run cottontaildb, neither locally nor on the server. There is no error; it just starts up, then shuts down directly and terminates with BUILD SUCCESSFUL. Any ideas?
(screenshot)
(The installed Gradle version is 6.x, so the 7.x warning cannot be the cause.)

Select all projection

It is currently not possible to select all fields in a projection without explicitly enumerating them.

Cottontail blocks indefinitely on creating and batch inserting entity within same transaction

When creating an entity and performing a batch insert on it within the same transaction, Cottontail DB blocks indefinitely.

The CLI may still be responsive after this, but calling system locks causes the CLI to stop responding as well.

A minimal example for reproducibility using the Cottontail DB Python Client:

from cottontaildb_client import CottontailDBClient, Type, Literal, column_def

schema = 'test_schema'
entity = 'test_entity'

with CottontailDBClient('localhost', 1865) as client:
    client.create_schema(schema)
    columns = [column_def('id', Type.STRING, nullable=False)]
    client.start_transaction()
    client.create_entity(schema, entity, columns)
    batch = [
        (schema, entity, {'id': Literal(stringData='test_1')}),
        (schema, entity, {'id': Literal(stringData='test_2')})
    ]
    client.insert_batch(batch)
    client.commit_transaction()

Add more Unit Tests

We should have more Unit Tests that test basic functionality such as:

Basics

  • Arithmetic operations of advanced types (e.g. Complex32 or VectorValues)
  • Distance calculations for kNN lookups
  • Basic data structures (e.g. Entity, Column etc.)
  • Advanced data structures (e.g. Indexes)

Execution

  • Individual Operations (FilterOperator, LimitOperator, EntityScanOperator)
  • Chains of Operations

Feel free to continue this list.
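As a sketch of the distance-calculation tests suggested above (in Python rather than the project's Kotlin test suite), comparing with a tolerance instead of exact equality to avoid the floating-point precision failures mentioned for the complex vector tests:

```python
import math

def l2(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def l2_squared(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def test_l2_consistency():
    a, b = [1.0, 2.0, 3.0], [4.0, 6.0, 3.0]
    # Tolerance-based comparison instead of exact equality avoids spurious
    # failures from floating-point rounding.
    assert math.isclose(l2(a, b) ** 2, l2_squared(a, b), rel_tol=1e-9)
    assert math.isclose(l2([0.0, 0.0], [3.0, 4.0]), 5.0, rel_tol=1e-9)

test_l2_consistency()
```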

Entity optimization not persistent

When optimizing an entity, e.g. to rebuild an index, the changes are not persisted and disappear after a restart. This is true for at least the non-unique hash index; other indices have not been tested so far. This behaviour was observed on the dev branch, as of version 108552b.
Steps to reproduce:

  1. create an entity
  2. fill entity with data
  3. create a non-unique hash index (which is not initialized, despite setRebuild(true))
  4. query entity --> there are no results returned since the index is used despite it being empty
  5. optimize entity via the CLI
  6. query entity --> the query now returns results
  7. stop cottontail via the CLI and start it again
  8. query entity --> there are again no results

Drop schema CLI prompts for confirmation before parsing arguments

The drop schema CLI command prompts the user for confirmation before the command arguments are parsed, resulting in the CLI prompt Do you really want to drop the schema null [y/N]?:, which is also printed when no schema argument is passed.

The problem is likely that the private val confirm option prompt is executed at command instantiation rather than at command execution.

This issue may affect other commands as well.

[PQ Index] Make pre-kNN configurable

The PQ implementation currently performs two stages of kNN: first on the approximation (preKnn) and then on the actual database vectors (kNN).

Tests show that having preK > k for low values of k dramatically increases the quality of the results with very little impact on execution time. Depending on the use-cases, an argument can be made for both modes of operation.

Hence, this should be configurable. There are multiple options:

  • Make this a build-time index config (easy but inflexible).
  • Make this a query-time index config (not supported yet).
  • Make single stage PQ default and add the ability for sub-selects to Cottontail DB. Two-stage kNN could then be expressed as sub-select with two different sets of k.
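The two-stage lookup described above can be sketched as follows (illustrative Python, not the PQ index implementation; preK appears as pre_k, and the L1 distance is only a stand-in for scanning PQ codes):

```python
import math

def two_stage_knn(query, vectors, k, pre_k, approx_dist, exact_dist):
    # Stage 1 (preKnn): rank all vectors by the cheap approximate distance.
    candidates = sorted(range(len(vectors)),
                        key=lambda i: approx_dist(query, vectors[i]))[:pre_k]
    # Stage 2 (kNN): exact distance only on the pre_k candidates.
    return sorted(candidates, key=lambda i: exact_dist(query, vectors[i]))[:k]

exact = lambda q, v: math.sqrt(sum((a - b) ** 2 for a, b in zip(q, v)))
approx = lambda q, v: sum(abs(a - b) for a, b in zip(q, v))  # stand-in for PQ

vectors = [[0.0, 0.0], [1.0, 1.0], [5.0, 5.0], [0.5, 0.4]]
print(two_stage_knn([0.0, 0.0], vectors, k=1, pre_k=3,
                    approx_dist=approx, exact_dist=exact))  # [0]
```

With pre_k > k, more candidates survive the approximate stage, which is exactly the quality/latency trade-off discussed above.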

Fix UnitTests for Complex32VectorValue and Complex64VectorValue

These Unit Tests

  • org.vitrivr.cottontail.math.basics.Complex32VectorValueTest
  • org.vitrivr.cottontail.math.basics.Complex64VectorValueTest

which test the proper functioning of basic mathematical operations, sometimes fail because of numeric / precision errors.

Primary Key in column definition of entity

Hello! I am using cottontail for my master thesis. Thank you for your work!
It seems to me that the primary flag in the column definition of an entity has no effect.
It is possible to insert multiple entries with the same string ID into an entity with primary key set for the column.
Furthermore, the CLI command "entity about schema entity" does not list primary information for the columns.

Best,
Simon

BooleanVectorValues implementation not working

BooleanVectorValues are currently not working. They are implemented on top of BitSets, which in their raw form are not suitable for fixed-length vectors: an all-zero BitSet is seen as a BitSet of length 0 (via BitSet::length()). BitSet::size() is another way to obtain the size of a BitSet, but it is reported in multiples of 64 bits, since BitSets are backed by long arrays; it refers to the currently allocated storage and should be considered an implementation detail.
We would need a field to track the size of the vector, but this is not possible for inline classes (only computed properties are allowed).
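The problem can be illustrated in Python (hypothetical BitVector class, not the Cottontail DB implementation): packed storage alone cannot distinguish an all-zero vector from an empty one, so an explicit length field is required.

```python
class BitVector:
    def __init__(self, bits):
        self.size = len(bits)      # explicit logical length of the vector
        self.value = 0             # packed storage, analogous to BitSet's longs
        for i, b in enumerate(bits):
            if b:
                self.value |= 1 << i

    def __len__(self):
        return self.size           # NOT derivable from the storage alone

zeros = BitVector([0, 0, 0, 0])
# The storage alone would report length 0, just like BitSet.length():
print(len(zeros), zeros.value.bit_length())  # 4 0
```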

Limit is ignored for Text-Queries

Steps to reproduce:

  • Have an entity with a text column and n entries
  • Do a LIKE query with n-k limit

You will get more than n-k results if there are more than n-k matches.

Re-implement select distinct

I don't know what the status is on dev, but on master support for SELECT DISTINCT was removed back in September (37e9107#diff-57d928ad9738b57469365a563d358f1c8c69d2b8d0a53f732fbc035bc279db82L59) and never re-introduced. Unfortunately, Cineast relies on this operation to retrieve all possible options for columns in the LSC Context.

This is not time-critical, so we can wait for the merge from dev to master but I probably have a BSc Thesis starting in April at which point a fully functioning LSC instance would be nice.

Fat JAR broken (breaks Lucene)

The ./gradlew jar task produces a JAR that is missing information Lucene needs in META-INF for its SPI-based service loading (see https://stackoverflow.com/a/42462012).

For example, I get the following when running cineast setup against it, once it reaches a Lucene-based index:

2020-07-23 11:15:36 ERROR CottonDDLService:260 - Error while creating index 'cineast.cineast_tags.index-lucene-cineast_cineast_tags_name'
java.util.ServiceConfigurationError: Cannot instantiate SPI class: org.apache.lucene.codecs.lucene70.Lucene70Codec
	at org.apache.lucene.util.NamedSPILoader.reload(NamedSPILoader.java:82)
	at org.apache.lucene.util.NamedSPILoader.<init>(NamedSPILoader.java:51)
	at org.apache.lucene.util.NamedSPILoader.<init>(NamedSPILoader.java:38)
	at org.apache.lucene.codecs.Codec$Holder.<clinit>(Codec.java:47)
	at org.apache.lucene.codecs.Codec.getDefault(Codec.java:143)
	at org.apache.lucene.index.LiveIndexWriterConfig.<init>(LiveIndexWriterConfig.java:129)
	at org.apache.lucene.index.IndexWriterConfig.<init>(IndexWriterConfig.java:157)
	at org.vitrivr.cottontail.database.index.lucene.LuceneIndex.<init>(LuceneIndex.kt:92)
	at org.vitrivr.cottontail.database.index.IndexType.create(IndexType.kt:48)
	at org.vitrivr.cottontail.database.entity.Entity.createIndex(Entity.kt:190)
	at org.vitrivr.cottontail.server.grpc.services.CottonDDLService.createIndex(CottonDDLService.kt:237)
	at org.vitrivr.cottontail.grpc.CottonDDLGrpc$MethodHandlers.invoke(CottonDDLGrpc.java:1020)
	at io.grpc.stub.ServerCalls$UnaryServerCallHandler$UnaryServerCallListener.onHalfClose(ServerCalls.java:172)
	at io.grpc.internal.ServerCallImpl$ServerStreamListenerImpl.halfClosed(ServerCallImpl.java:331)
	at io.grpc.internal.ServerImpl$JumpToApplicationThreadServerStreamListener$1HalfClosed.runInContext(ServerImpl.java:818)
	at io.grpc.internal.ContextRunnable.run(ContextRunnable.java:37)
	at io.grpc.internal.SerializingExecutor.run(SerializingExecutor.java:123)
	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1130)
	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:630)
	at java.base/java.lang.Thread.run(Thread.java:832)
Caused by: java.lang.IllegalArgumentException: An SPI class of type org.apache.lucene.codecs.PostingsFormat with name 'Lucene50' does not exist.  You need to add the corresponding JAR file supporting this SPI to your classpath.  The current classpath supports the following names: [IDVersion]
	at org.apache.lucene.util.NamedSPILoader.lookup(NamedSPILoader.java:116)
	at org.apache.lucene.codecs.PostingsFormat.forName(PostingsFormat.java:112)
	at org.apache.lucene.codecs.lucene70.Lucene70Codec.<init>(Lucene70Codec.java:166)
	at org.apache.lucene.codecs.lucene70.Lucene70Codec.<init>(Lucene70Codec.java:81)
	at java.base/jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
	at java.base/jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
	at java.base/jdk.internal.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
	at java.base/java.lang.reflect.Constructor.newInstanceWithCaller(Constructor.java:500)
	at java.base/java.lang.reflect.ReflectAccess.newInstance(ReflectAccess.java:124)
	at java.base/jdk.internal.reflect.ReflectionFactory.newInstance(ReflectionFactory.java:346)
	at java.base/java.lang.Class.newInstance(Class.java:604)
	at org.apache.lucene.util.NamedSPILoader.reload(NamedSPILoader.java:72)
	... 19 more

It works just fine when using ./gradlew run (but then the REPL doesn't work properly).

Add optional DISTINCT in SELECT

It would be great to have the possibility to get distinct values only. Especially when doing batched kNNs, for instance, we get a lot of duplicates.

Performance for queries with large IN-Clauses / Parallelization

Observing Cottontail's performance during Cristina Illi's MSc thesis, it seems to me that Cottontail either does not utilize parallelization for queries that require index scans / full table scans, or there is something broken about my configuration of non-unique hash indices / their performance.
In particular, when using the current VBS dump (https://download-dbis.dmi.unibas.ch/vitrivr/cottontaildb-data-descriptions-201125.zip), queries with 20-30k elements in the IN clause for the id column on the largest table (features_segmenttags) can take up to three minutes of execution time, during which all CPU cores except one idle.
I'm not sure if parallelization is the fix for this or if there are other issues here.

Express kNN parallelisation in execution graph

Currently, parallelisation of kNN queries is realised within a dedicated executor implementation called ParallelEntityScanKnnTask. Instead, for the sake of consistency, that parallelisation could be expressed in the execution graph as follows:

  1. Partitioned kNN lookup on column
  2. Merge + sort of partial results into one recordset
  3. Selection of Top K entries

Based on such a generalisation, candidate lists produced by Indexes such as LSH could also be parallelised as follows:

  1. Produce candidate list with Index
  2. Partitioned kNN lookup on candidate list
  3. Merge + sort of partial results into one recordset
  4. Selection of Top K entries

This could potentially speed-up lookup for very large collections and very high dimensional vectors.
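Steps 1-3 above can be sketched as follows (illustrative Python; the per-partition calls are where the actual parallelisation would happen):

```python
import heapq
import math

def knn_partition(query, partition, k):
    """Step 1: kNN within one partition; returns sorted (distance, row_id)."""
    return heapq.nsmallest(
        k, ((math.dist(query, vec), rid) for rid, vec in partition))

def parallel_knn(query, partitions, k):
    partials = [knn_partition(query, p, k) for p in partitions]  # parallelisable
    # Steps 2+3: merge the sorted partial results and keep the global top k.
    return heapq.nsmallest(k, heapq.merge(*partials))

parts = [[(0, [3.0, 4.0]), (1, [6.0, 8.0])],
         [(2, [1.0, 0.0]), (3, [0.0, 2.0])]]
print(parallel_knn([0.0, 0.0], parts, k=2))  # [(1.0, 2), (2.0, 3)]
```

The same merge + top-k tail works unchanged when step 1 instead runs over a candidate list produced by an index such as LSH.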

kNN Retrieval Time Increase for Complex Vectors

k=1 lookup tests with a dataset of roughly 9 million Complex32Vectors of length 20 show a large increase in retrieval time for rev. c0eb4e2 (called the new version here) over a version based on rev. 1696878 (the old version).

I tested with a query of 16384 Complex32Vectors in batches of 10000 with the CosineDistance measure.

The new version did not return results for the first 10000 vectors after 7 h, while the old version returned results for the entire query set after approx. 2.5 h.

Insertion times for the 9 million vectors are also increased for the new version (ca 30 min vs. ca 12 min).

All tests were run on a 13" MacBook Pro from 2015 with 16 GB of RAM and a Core i5-5287U CPU @ 2.9 GHz.

Lucene Indices are broken on Windows

I'll just leave this here for reference.
Windows continuously causes issues (#58 #57 #7). We can leave this as wontfix, or we can find a way to CI-test Cottontail releases against Windows.

To reproduce, run the Unit-Tests in Cineast against a windows instance of Cottontail.

Docker images -> public registry

Currently the Docker images are provided through GitHub packages. Unfortunately this means that even though it's a public repository, one has to auth to GitHub before pulling the image. This makes it difficult/impossible to use with some other services (e.g. SingularityHub). As far as I know, other registries such as DockerHub and GitLab CI don't have this limitation. Perhaps the Docker image could be published to one of those instead?

Drop entity throws error and does not properly delete files

When attempting to drop an existing entity cottontail throws the following exception (on the dev branch).

Attempting to create a new entity with the same name afterwards does not work.

ERROR CottonDDLService:155 - Error while dropping entity 'warren.cottontail.tab4'
java.nio.file.NoSuchFileException: ./cottontaildb-data/schema_cottontail/entity_$tab4
	at java.base/sun.nio.fs.UnixException.translateToIOException(UnixException.java:92)
	at java.base/sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:111)
	at java.base/sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:116)
	at java.base/sun.nio.fs.UnixFileAttributeViews$Basic.readAttributes(UnixFileAttributeViews.java:55)
	at java.base/sun.nio.fs.UnixFileSystemProvider.readAttributes(UnixFileSystemProvider.java:149)
	at java.base/sun.nio.fs.LinuxFileSystemProvider.readAttributes(LinuxFileSystemProvider.java:99)
	at java.base/java.nio.file.Files.readAttributes(Files.java:1763)
	at java.base/java.nio.file.FileTreeWalker.getAttributes(FileTreeWalker.java:219)
	at java.base/java.nio.file.FileTreeWalker.visit(FileTreeWalker.java:276)
	at java.base/java.nio.file.FileTreeWalker.walk(FileTreeWalker.java:322)
	at java.base/java.nio.file.FileTreeIterator.<init>(FileTreeIterator.java:71)
	at java.base/java.nio.file.Files.walk(Files.java:3824)
	at java.base/java.nio.file.Files.walk(Files.java:3878)
	at org.vitrivr.cottontail.database.schema.Schema.dropEntity(Schema.kt:206)
	at org.vitrivr.cottontail.server.grpc.services.CottonDDLService.dropEntity(CottonDDLService.kt:142)
	at org.vitrivr.cottontail.grpc.CottonDDLGrpc$MethodHandlers.invoke(CottonDDLGrpc.java:1075)
	at io.grpc.stub.ServerCalls$UnaryServerCallHandler$UnaryServerCallListener.onHalfClose(ServerCalls.java:180)
	at io.grpc.internal.ServerCallImpl$ServerStreamListenerImpl.halfClosed(ServerCallImpl.java:331)
	at io.grpc.internal.ServerImpl$JumpToApplicationThreadServerStreamListener$1HalfClosed.runInContext(ServerImpl.java:814)
	at io.grpc.internal.ContextRunnable.run(ContextRunnable.java:37)
	at io.grpc.internal.SerializingExecutor.run(SerializingExecutor.java:123)
	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
	at java.base/java.lang.Thread.run(Thread.java:834)

nullable property for entity columns

Hello! Me again :)
Cottontail provides a "nullable" property in the column definition of an entity.
I am now trying to figure out how I can insert an entry with a "NULL" value for a column with the property "nullable=True".
However, cottontail always responds with an error, mostly something like: "INSERT failed because of a database error. A value of NULL cannot be cast to INT."

Environment:

  • cottontail-db Version: 0.12.11
  • cottontail-python-client Version: 0.0.4 (Built with Cottontail DB Proto version 0.12.2)

Examples of what I've tried so far (column 'count_category' is defined as column_def('count_category', Type.INTEGER, nullable=True)):
(i) entry = {'count_category': Literal(intData=None)}
(ii) entry = {'count_category': Literal(nullData=None)}
(iii) omit the column 'count_category' from the insert specification

Do you have any idea what I am doing wrong? Is this feature already supported? Might the problem be caused by the cottontail-python-client?

Many thanks for your help!
Best Simon

forceUnmapMappedFiles causes Crashes on Unix-Systems

Setting the forceUnmapMappedFiles variable in the config to true causes crashes on startup and when dropping / creating entities.

This is an issue on both Ubuntu and macOS. The easiest way to reproduce the bug is to remove the data directory and launch Cottontail via ./gradlew run.

Example stacktrace:

Exception in thread "main" java.lang.NoSuchMethodError: sun.nio.ch.DirectBuffer.cleaner()Lsun/misc/Cleaner;
        at org.mapdb.volume.ByteBufferVol.unmap(ByteBufferVol.java:340)
        at org.mapdb.volume.MappedFileVol.close(MappedFileVol.java:183)
        at org.mapdb.WriteAheadLog.destroyWalFiles(WriteAheadLog.java:792)
        at org.mapdb.StoreWAL.commit(StoreWAL.kt:632)
        at ch.unibas.dmi.dbis.cottontail.database.catalogue.Catalogue.initStore(Catalogue.kt:254)
        at ch.unibas.dmi.dbis.cottontail.database.catalogue.Catalogue.<init>(Catalogue.kt:59)
        at ch.unibas.dmi.dbis.cottontail.CottontailKt.main(Cottontail.kt:33)

Add integration tests

I have spotted (by accident) a potential issue with the CosineDistance calculation for complex numbers (see #21). Such issues could be found more quickly with tests.

Additionally, larger tests that use the gRPC endpoints could verify that the whole stack works as intended. With larger datasets, these tests could also serve as an indicator of performance costs of code changes.
Such tests could be:

  • Testing of data import
  • Testing of a KNN query on the imported data

When building and starting Cottontail DB, the command bin/cottontaildb /path/to/your/config.json cannot find the target path bin/cottontaildb

Is bin/cottontaildb /path/to/your/config.json a command to run in the terminal? If so, why can't I find the directory bin/cottontaildb? And is the config.json file supposed to be located under the cottontaildb project folder? I replaced /path/to/your/ with my own path. An attempt to put the cottontaildb project into the /bin folder also failed.

Repairing Indices / Dropping them when they're not present anymore

Currently, if something goes wrong on the file system (e.g. you delete index.db or one of the index files under an entity), you cannot drop an index that is no longer present, since Cottontail crashes while loading the entity.
Ideally, dropIndex() would cause entity headers to drop the index even if it is no longer present on disk.
Even if you delete the whole folder of an entity, you can no longer drop it, since Cottontail first tries to drop the associated index structures (which are, of course, not present anymore).

CLI Expansion

Collection of additional features for the (new) CLI and main discussion thread.

See the work in progress: https://github.com/vitrivr/cottontaildb/tree/feature/cli-expansion

Please comment to add other features you would like to see in the CLI!

  • table <schema> <entity> - Tabulated presentation of preview
  • clear <schema> <entity> - Clears the specified entity, without actually dropping the entity
  • create <schema> <entity> ??? - Creates the specified entity. ??? stands for the column definition, TBD
  • index - Creates an index on the given column. Also rebuilds this index
  • use <schema> - To set the schema and subsequent commands will use the specified schema. Not entirely sure about this one, tbh
  • TBD More flexible querying than the current 'find'
  • Better visualization

LuceneIndex Optimization for ongoing inserts

Currently, a Lucene index has to be completely rebuilt before newly inserted entries become visible. This is suboptimal for very large collections, but not a real problem at the moment, since the use case of 'I already have 100 million documents and want to index the 100-million-and-first document and instantly retrieve it' is not very common.

Concurrent Inserts on the same table can cause a crash

Unfortunately, I don't have a reproducible example. I think what's causing the crash is two concurrent inserts that want to write to the same Volume (file, to be more specific); the second transaction then tries to acquire the file lock, which causes an exception. (At least that's what I gather from the Javadoc for tryLock().)

Stacktrace:

ERROR CottonDMLService:84 - Insert failed because of a unknown error: null
java.nio.channels.OverlappingFileLockException
        at java.base/sun.nio.ch.FileLockTable.checkList(FileLockTable.java:229)
        at java.base/sun.nio.ch.FileLockTable.add(FileLockTable.java:123)
        at java.base/sun.nio.ch.FileChannelImpl.tryLock(FileChannelImpl.java:1154)
        at java.base/java.nio.channels.FileChannel.tryLock(FileChannel.java:1165)
        at org.mapdb.CottontailDBVolume.acquireFileLock(CottontailDBVolume.kt:550)
        at org.mapdb.CottontailDBVolume.<init>(CottontailDBVolume.kt:69)
        at org.mapdb.CottontailDBVolume$CottontailDBVolumeFactory.factory(CottontailDBVolume.kt:57)
        at org.mapdb.CottontailDBVolume$CottontailDBVolumeFactory.makeVolume(CottontailDBVolume.kt:44)
        at org.mapdb.CottontailStoreWAL$realVolume$1.invoke(CottontailStoreWAL.kt:82)

Publish Cottontail Docker-container on Dockerhub

Currently, our docker container is only available on Github. This means you need to authenticate with a personal token to pull it, which is suboptimal for shared machines (and also in general, since it's effort for every new machine you set up).

Ideally, we would simply also publish the container on dockerhub, like we do with Cineast.

Preview in CLI / limit query causes java.lang.IllegalArgumentException when executed on empty entity

Running preview in the CLI, which generates a limit query, causes a java.lang.IllegalArgumentException when executed on an empty entity.

cottontaildb>count cineast cineast_segment
Counting elements of entity cineast.cineast_segment
2020-07-23 10:24:30 TRACE CottonDQLService:41 - Parsing & binding query aff5c7db-38fd-45b1-a07a-dfd4ae889e7c took 3ms.
2020-07-23 10:24:30 TRACE CottonDQLService:46 - Executing query aff5c7db-38fd-45b1-a07a-dfd4ae889e7c took 1ms.
2020-07-23 10:24:30 TRACE CottonDQLService:150 - Sending back 1 rows for position 0 of query aff5c7db-38fd-45b1-a07a-dfd4ae889e7c took 0ms.
2020-07-23 10:24:30 INFO  CottonDQLService:52 - Query aff5c7db-38fd-45b1-a07a-dfd4ae889e7c took 5ms.
data {
  key: "warren.cineast.cineast_segment.count()"
  value {
    longData: 0
  }
}
cottontaildb> preview cineast cineast_segment
Showing first 10 elements of entity cineast.cineast_segment
Previewing 10 elements of cineast_segment
2020-07-23 10:18:34 ERROR CottonDQLService:67 - Error while executing query query {
  from {
    entity {
      schema {
        name: "cineast"
      }
      name: "cineast_segment"
    }
  }
  projection {
    attributes {
      key: "*"
      value: ""
    }
  }
  limit: 10
}

java.lang.IllegalArgumentException: Start of a ranged entity scan cannot be greater than its end.
	at org.vitrivr.cottontail.database.queries.planning.nodes.basics.EntityScanNodeExpression$RangedEntityScanNodeExpression.<init>(EntityScanNodeExpression.kt:49)
	at org.vitrivr.cottontail.database.queries.planning.rules.LimitPushdownRule.apply(LimitPushdownRule.kt:22)
	at org.vitrivr.cottontail.database.queries.planning.CottontailQueryPlanner.generateCandidates(CottontailQueryPlanner.kt:53)
	at org.vitrivr.cottontail.database.queries.planning.CottontailQueryPlanner.optimize(CottontailQueryPlanner.kt:28)
	at org.vitrivr.cottontail.server.grpc.helper.GrpcQueryBinder.parseAndBindSimpleQuery(GrpcQueryBinder.kt:96)
	at org.vitrivr.cottontail.server.grpc.helper.GrpcQueryBinder.parseAndBind(GrpcQueryBinder.kt:60)
	at org.vitrivr.cottontail.server.grpc.services.CottonDQLService.query(CottonDQLService.kt:40)
	at org.vitrivr.cottontail.grpc.CottonDQLGrpc$MethodHandlers.invoke(CottonDQLGrpc.java:341)
	at io.grpc.stub.ServerCalls$UnaryServerCallHandler$UnaryServerCallListener.onHalfClose(ServerCalls.java:172)
	at io.grpc.internal.ServerCallImpl$ServerStreamListenerImpl.halfClosed(ServerCallImpl.java:331)
	at io.grpc.internal.ServerImpl$JumpToApplicationThreadServerStreamListener$1HalfClosed.runInContext(ServerImpl.java:818)
	at io.grpc.internal.ContextRunnable.run(ContextRunnable.java:37)
	at io.grpc.internal.SerializingExecutor.run(SerializingExecutor.java:123)
	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1130)
	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:630)
	at java.base/java.lang.Thread.run(Thread.java:832)
Command execution failed: UNKNOWN: Query execution failed failed because of a unknown error: Start of a ranged entity scan cannot be greater than its end.

NPEs lead to unspecific error messages during the query execution

NPEs often lead to exceptions with unspecific messages of the form "...cannot be null", caused by another NPE that occurs when trying to create an ExecutionPlanException in src/main/kotlin/org/vitrivr/cottontail/execution/ExecutionPlan.kt, line 99.

This is a rather unspecific issue; I'm just noting it down for reference and as a reminder that we should be aware of nullability issues, especially when using Java code from Kotlin (see https://kotlinlang.org/docs/reference/null-safety.html).

Cosine distance can be NaN

As of rev. eef5737, I have observed NaN values in the cosine-distance calculation.

I suspect it is caused by the norm2 method of the complex vector values, which ultimately takes Kotlin's square root of the real and imaginary parts of the number; this can lead to NaNs if either the real or the imaginary part is negative (see the excerpt from org.vitrivr.cottontail.model.values.Complex32Value below).

This does not lead to an error from the DB; instead, results for a kNN query are returned without any indication of a problem.

override fun pow(x: Double): Complex64Value {
    val r = this.real.value.pow(x.toFloat()) + this.imaginary.value.pow(x.toFloat())
    val theta =  this.imaginary.value / this.real.value
    return Complex64Value(r * kotlin.math.cos(x*theta), r * kotlin.math.sin(x*theta))
}

override fun pow(x: Int): Complex64Value {
    val r = this.real.value.pow(x) + this.imaginary.value.pow(x)
    val theta =  this.imaginary.value / this.real.value
    return Complex64Value(r * kotlin.math.cos(x*theta), r * kotlin.math.sin(x*theta))
}

override fun sqrt(): Complex64Value = pow(1.0 / 2.0)
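A NaN-free alternative can be sketched in Python (illustrative, not the Cottontail DB implementation): compute the norm as the square root of the sum of squared component magnitudes, which is always a non-negative real, instead of a complex-valued pow(0.5) as in the excerpt above.

```python
import math

def cosine_distance(a, b):
    # Hermitian inner product; only the real part matters for the distance.
    dot = sum((x * y.conjugate()).real for x, y in zip(a, b))
    norm_a = math.sqrt(sum(abs(x) ** 2 for x in a))  # real-valued, never NaN
    norm_b = math.sqrt(sum(abs(x) ** 2 for x in b))
    return 1.0 - dot / (norm_a * norm_b)

v = [complex(-3, -4)]  # negative real/imaginary parts are unproblematic here
print(cosine_distance(v, v))  # 0.0
```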

Exception in thread "main" java.nio.file.NoSuchFileException: /cottontaildb-data/config.json

I found the image now, but it tells me that I don't have a config file. The error is: Exception in thread "main" java.nio.file.NoSuchFileException: /cottontaildb-data/config.json. I put the configuration file of cottontaildb in /cottontaildb-data and in /users/rgasser/downloads/data/cottontaildb-data. Someone who encountered a similar error thought it was a permission problem and solved it by copying the missing file into the corresponding host path mounted into Docker; I tried the same, but it did not solve the problem. What exactly is the config file here, and how can I solve this problem?

Optimizing Entities causes Entries to disappear

Reproducible on both dev and master with the following test (it can be added to AbstractIndexTest and fails in the index tests):

    @Test
    fun optimizationCountTest() {
        log("Optimizing entity ${this.entityName}.")
        val txn = this.manager.Transaction(TransactionType.SYSTEM)
        val catalogueTx = txn.getTx(this.catalogue) as CatalogueTx
        val schema = catalogueTx.schemaForName(this.schemaName)
        val schemaTx = txn.getTx(schema) as SchemaTx
        val entity = schemaTx.entityForName(this.entityName)
        val entityTx = txn.getTx(entity) as EntityTx
        val preCount = entityTx.count()
        entityTx.optimize()
        entityTx.commit()
        val countTx = txn.getTx(entity) as EntityTx
        val postCount = countTx.count()
        if (postCount != preCount) {
            fail("optimizing caused elements to disappear")
        }
        countTx.commit()
    }

Optimizing entity with unique-hash index

I don't have a clean example to reproduce this, but if you use the VBS2020 dump (https://download-dbis.dmi.unibas.ch/vitrivr/cottontaildb-data.zip) and try to optimize the cineast_tags entity, the following error occurs on Windows (regardless of whether forceUnmapMappedFiles is set to true or false; you can also not delete the index):

2020-11-11 19:48:10 ERROR CottonDDLService:341 - Error while optimizing entity 'warren.cineast.cineast_tags'
org.mapdb.DBException$VolumeIOError
	at org.mapdb.volume.MappedFileVol.truncate(MappedFileVol.java:285)
	at org.mapdb.WriteAheadLog.destroyWalFiles(WriteAheadLog.java:791)
	at org.mapdb.StoreWAL.commit(StoreWAL.kt:643)
	at org.mapdb.DB.commit(DB.kt:435)
	at org.vitrivr.cottontail.database.index.hash.UniqueHashIndex$Tx.performCommit(UniqueHashIndex.kt:294)
	at org.vitrivr.cottontail.database.index.Index$Tx.commit(Index.kt:160)
	at org.vitrivr.cottontail.database.entity.Entity$Tx.commit(Entity.kt:365)
	at org.vitrivr.cottontail.database.general.TransactionKt.begin(Transaction.kt:42)
	at org.vitrivr.cottontail.database.entity.Entity.updateAllIndexes(Entity.kt:268)
	at org.vitrivr.cottontail.server.grpc.services.CottonDDLService.optimize(CottonDDLService.kt:326)
	at org.vitrivr.cottontail.grpc.CottonDDLGrpc$MethodHandlers.invoke(CottonDDLGrpc.java:1107)
	at io.grpc.stub.ServerCalls$UnaryServerCallHandler$UnaryServerCallListener.onHalfClose(ServerCalls.java:182)
	at io.grpc.internal.ServerCallImpl$ServerStreamListenerImpl.halfClosed(ServerCallImpl.java:331)
	at io.grpc.internal.ServerImpl$JumpToApplicationThreadServerStreamListener$1HalfClosed.runInContext(ServerImpl.java:814)
	at io.grpc.internal.ContextRunnable.run(ContextRunnable.java:37)
	at io.grpc.internal.SerializingExecutor.run(SerializingExecutor.java:123)
	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
	at java.base/java.lang.Thread.run(Thread.java:834)
Caused by: java.io.IOException: Der Vorgang ist bei einer Datei mit einem geöffneten Bereich, der einem Benutzer zugeordnet ist, nicht anwendbar
	at java.base/sun.nio.ch.FileDispatcherImpl.truncate0(Native Method)
	at java.base/sun.nio.ch.FileDispatcherImpl.truncate(FileDispatcherImpl.java:90)
	at java.base/sun.nio.ch.FileChannelImpl.truncate(FileChannelImpl.java:430)
	at org.mapdb.volume.MappedFileVol.truncate(MappedFileVol.java:283)
	... 18 more

Insertion Time Increase

Insertion of roughly 9 million Complex32Vectors of length 20 shows an increase in required time for rev. c0eb4e2 (called the new version here) over a version based on rev. 1696878 (the old version).

The new version took about 30 min, while the old version took ca 12 min.

All tests were run on a 13" MacBook Pro from 2015 with 16 GB of RAM and a Core i5-5287U CPU @ 2.9 GHz.
