Comments (6)
Hi. I don't think this has anything to do with the columnar library, since it behaves the same way without it:
mysql> create table idx(title text, type int, price float);
Query OK, 0 rows affected (0.00 sec)
mysql> show create table idx;
+-------+------------------------------------------------------------+
| Table | Create Table |
+-------+------------------------------------------------------------+
| idx | CREATE TABLE idx (
type integer,
price float,
title text
) |
+-------+------------------------------------------------------------+
1 row in set (0.00 sec)
If you believe it's a problem, please create an issue at https://github.com/manticoresoftware/manticoresearch/issues and explain what's wrong with this behaviour.
from columnar.
Thank you for your reply.
My usage scenario is as follows. I have a large number of CSV files that have already been generated, and more are still being generated. So far I have inserted these CSV files into ClickHouse. Now I want to run full-text search on my data, and because of the huge amount of data I feel that ClickHouse is not suitable for it. That is how I found your project, which I would like to use for full-text search. The column order of my CSV files is already fixed, but the order of the fields appears to be changed when I create the table, so I am afraid these CSV files cannot be inserted into Manticore. Also, does Manticore have a command like this ClickHouse one?
cat a.csv | clickhouse-client --database=db --query="insert into table test format CSV"
My CREATE TABLE statement looks like this:
create table t_log_http (
log_time bigint,
uuid bigint,
src_mac text stored,
dst_mac text stored,
src_location text stored,
dst_location text stored,
app_protocol text stored,
app_name text stored,
is_ipv6 bit(8),
src_ip text stored,
dst_ip text stored,
ip_protocol text stored,
src_port bit(16),
dst_port bit(16),
method text stored,
uri text stored,
host text,
url_category text stored,
referer text stored,
user_agent text stored,
xff text stored,
subject text stored,
rsp_code text stored) engine='columnar';
I can't find a way to do this in the official documentation, so my plan is to insert the data with something like the following, but it is far too slow:
cat t_log_http.csv|awk -F "," '{printf("insert into t_log_http values(%s,%s,'\''%s'\'',%s,%s,%s,%s,%s,%s,%s,%s,%s,'\''%s'\'',%s);\n",$1,$2,$3,$4,$5,$6,$7,$8,$9,$10,$11,$12,$13,$14)}' > t_log_http.sql
➤ Sergey Nikolaev commented:
the order of the fields seems to be adjusted when I create the table, and I am afraid that these csv files cannot be inserted into the manticore
When you do INSERT INTO tbl VALUES(...), in my opinion it's always good to specify the field list explicitly, e.g. insert into t_log_http(log_time, uuid, ...) values(...).
There's no tool like clickhouse-client that can accept data on STDIN and convert it into INSERT INTO statements. A simple awk script may be suboptimal since it inserts data line by line, which may not be very fast, but it's the right direction. I recommend two ways:
- use a plain index and a source of type csvpipe. Interactive course: https://play.manticoresearch.com/csv/. Docs: https://manual.manticoresearch.com/Adding_data_from_external_storages/Fetching_from_CSV,TSV
- write a more complex script that does batched and parallel inserts, e.g. you can take this as an example: https://gist.githubusercontent.com/sanikolaev/6a48de957b41481512ff8d94ed4af351/raw/e49d9212bcd964ad08c7199c08f401f6470b4501/load_sql.php
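The batching part of the second option can be sketched in a few lines of Python. This is only an illustration and not the linked PHP script: the function names, the batch size, and the caller-supplied set of numeric columns are all hypothetical, and the string escaping assumes backslash-escaped single quotes. It emits multi-row INSERT statements with an explicit column list, so the CSV column order does not have to match the order the table reports.

```python
import csv
import io
from typing import Iterable, Iterator, Sequence, Set


def sql_literal(value: str, numeric: bool) -> str:
    """Render one CSV cell as a SQL literal."""
    if numeric:
        # Validate instead of trusting the file: int() raises ValueError on junk.
        return str(int(value))
    # Assumption: backslash and single quote are escaped with a backslash.
    escaped = value.replace("\\", "\\\\").replace("'", "\\'")
    return f"'{escaped}'"


def batched_inserts(
    rows: Iterable[Sequence[str]],
    table: str,
    columns: Sequence[str],
    numeric_columns: Set[str],
    batch_size: int = 1000,
) -> Iterator[str]:
    """Yield one multi-row INSERT statement per batch_size CSV rows."""
    col_list = ", ".join(columns)
    batch = []
    for row in rows:
        if len(row) != len(columns):
            raise ValueError(f"expected {len(columns)} cells, got {len(row)}")
        cells = [
            sql_literal(cell, name in numeric_columns)
            for name, cell in zip(columns, row)
        ]
        batch.append("(" + ", ".join(cells) + ")")
        if len(batch) == batch_size:
            yield f"INSERT INTO {table} ({col_list}) VALUES {', '.join(batch)};"
            batch = []
    if batch:
        # Flush the final, possibly shorter, batch.
        yield f"INSERT INTO {table} ({col_list}) VALUES {', '.join(batch)};"


if __name__ == "__main__":
    # Hypothetical three-column sample; a real file would have all 23 columns.
    sample = io.StringIO("1,10,aa:bb\n2,20,cc:dd\n")
    for stmt in batched_inserts(
        csv.reader(sample), "t_log_http",
        ["log_time", "uuid", "src_mac"], {"log_time", "uuid"},
    ):
        print(stmt)
```

The generated statements could then be piped into a MySQL client connected to Manticore, and several such pipelines could run in parallel on different files.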
The reason I don't use plain mode is what I see in the docs:
What you cannot do with a plain index:
- insert more data into an index after it's built
- update it
- delete from it
- create/delete/alter a plain index online (you need to define it in a configuration file)
- use UUID for automatic ID generation. When you fetch data from an external storage it must include a unique identifier for each document
In my usage scenario, I will continuously generate csv files and need to continuously insert data. Is the plain index mode still appropriate in this case?
Full-text fields and attributes are different types, and statements such as desc, show create table and so on iterate over attributes first and then over fields. That is why the original order is not preserved, i.e. attributes and fields are not mixed.
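A toy model of that ordering, written in Python purely for illustration (it is not Manticore's code, and `listing_order` is a hypothetical name): each column is tagged by the caller as a full-text field or an attribute, and the listing puts attributes first, keeping the declared order within each group.

```python
from typing import List, Tuple

# (column name, True if it is a full-text field, False if it is an attribute)
Column = Tuple[str, bool]


def listing_order(declared: List[Column]) -> List[str]:
    """Return column names as a listing would show them: attributes, then fields."""
    attributes = [name for name, is_field in declared if not is_field]
    fields = [name for name, is_field in declared if is_field]
    return attributes + fields


if __name__ == "__main__":
    # Mirrors the example table from this thread: idx(title text, type int, price float)
    declared = [("title", True), ("type", False), ("price", False)]
    print(listing_order(declared))  # ['type', 'price', 'title']
```

Because the listed order can differ from the declared one, naming the columns explicitly in every INSERT keeps the CSV layout independent of it.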
In my usage scenario, I will continuously generate csv files and need to continuously insert data. Is the plain index mode still appropriate in this case?
Continuous data processing is also possible with plain indexes, since you can:
- combine multiple plain indexes into a distributed index (https://mnt.cr/distributed)
- add or remove indexes in the config while Manticore is running, or even make your config a script (https://mnt.cr/shebang), and run RELOAD INDEXES (https://mnt.cr/reload%20indexes) to sync with the config
- merge indexes (https://manual.manticoresearch.com/Adding_data_from_external_storages/Adding_data_from_indexes/Merging_indexes)
etc.
But it's easier to just use an RT index. I gave you a script example; if you know PHP, it shouldn't be a big deal to make it do what you want. Eventually we plan to implement a manticore-client (similar to clickhouse-client), but unfortunately it's only a plan so far. Pull requests are very welcome!