<div class="snippet-clipboard-content notranslate position-relative overflow-auto" data-snippet-clip

Why the create table sql is different from the show create table sql about columnar HOT 6 CLOSED

manticoresoftware commented on August 12, 2024

Why the create table sql is different from the show create table sql

from columnar.

Comments (6)

sanikolaev commented on August 12, 2024

Hi. I don't think it has anything to do with the columnar library since without it it behaves similarly:

mysql> create table idx(title text, type int, price float);
Query OK, 0 rows affected (0.00 sec)

mysql> show create table idx;
+-------+------------------------------------------------------------+
| Table | Create Table                                               |
+-------+------------------------------------------------------------+
| idx   | CREATE TABLE idx (
type integer,
price float,
title text
) |
+-------+------------------------------------------------------------+
1 row in set (0.00 sec)

If you believe it's a problem please create an issue in https://github.com/manticoresoftware/manticoresearch/issues and explain what's bad in this behaviour.

from columnar.

sangensong commented on August 12, 2024

Thank you for your reply.
My usage scenario is like this. I have a lot of csv files that have been generated, and these csv files are still being generated. At present, I have inserted these csv files into clickhouse. Now, I want to perform full-text search on my data. Due to the huge amount of data, I feel that it is not suitable to use clickhouse for full-text search. At this time, I found your project, and I want to use your project to do it. Full Text Search. Because the field correspondence of my csv file has been determined, I see that the order of the fields seems to be adjusted when I create the table, and I am afraid that these csv files cannot be inserted into the manticore. Also, I would like to ask, does a command like clickhouse exist in manticore

cat a.csv | clickhouse-client --database=db --query="insert into table test format CSV"

My create table sql like this:

create table t_log_http (
    log_time bigint, 
    uuid bigint, 
    src_mac text stored, 
    dst_mac text stored, 
    src_location text stored, 
    dst_location text stored, 
    app_protocol text stored, 
    app_name text stored, 
    is_ipv6 bit(8), 
    src_ip text stored, 
    dst_ip text stored, 
    ip_protocol text stored, 
    src_port bit(16), 
    dst_port bit(16), 
    method text stored, 
    uri text stored, 
    host text, 
    url_category text stored, 
    referer text stored, 
    user_agent text stored, 
    xff text stored, 
    subject text stored, 
    rsp_code text stored) engine='columnar';

I currently don't see the way I want on the official documentation, so what I think is to use a method like the following to insert data, but this execution efficiency is too low

cat t_log_http.csv|awk -F "," '{printf("insert into t_log_http values(%s,%s,'\''%s'\'',%s,%s,%s,%s,%s,%s,%s,%s,%s,'\''
%s'\'',%s);\n",$1,$2,$3,$4,$5,$6,$7,$8,$9,$10,$11,$12,$13,$14)}' > t_log_http.sql

from columnar.

githubmanticore commented on August 12, 2024

➤ Sergey Nikolaev commented:

the order of the fields seems to be adjusted when I create the table, and I am afraid that these csv files cannot be inserted into the manticore

When you do INSERT INTO tbl VALUES(...) in my opinion it's always good to specify the fields list explicitly, e.g. insert into t_log_http(log_time, uuid, ...) values(...).

There's no tool like clickhouse-client which can accept data at STDIN and convert it into INSERT INTO. Just simple awk may be suboptimal since it inserts data line by line, which may be not very fast, but it's the right direction. I recommend two ways:

use plain index and source of type csvpipe. Interactive course: https://play.manticoresearch.com/csv/. Docs: https://manual.manticoresearch.com/Adding_data_from_external_storages/Fetching_from_CSV,TSV
make more complex script which can do batches and parallel inserts, e.g. you can take this as an example https://gist.githubusercontent.com/sanikolaev/6a48de957b41481512ff8d94ed4af351/raw/e49d9212bcd964ad08c7199c08f401f6470b4501/load_sql.php .

from columnar.

sangensong commented on August 12, 2024

The reason I don't use plain mode is that I see on the docs

What you cannot do with a plain index:

insert more data into an index after it's built
update it
delete from it
create/delete/alter a plain index online (you need to define it in a configuration file)
use UUID for automatic ID generation. When you fetch data from an external storage it must include a unique identifier for each document

In my usage scenario, I will continuously generate csv files and need to continuously insert data. Is the plain index mode still appropriate in this case?

from columnar.

tomatolog commented on August 12, 2024

full text fields and attributes are different types and all statements like desc show create table and so on iterate attributes then fields that is why there is no original order, ie attributes and fields are not mixed.

from columnar.

sanikolaev commented on August 12, 2024

In my usage scenario, I will continuously generate csv files and need to continuously insert data. Is the plain index mode still appropriate in this case?

Continuous data processing is also possible with plain indexes, since you can:

combine multiple plain indexes into a distributed index (https://mnt.cr/distributed)
add remove indexes to config while Manticore is running or make your config a script at all (https://mnt.cr/shebang) and do RELOAD INDEXES (https://mnt.cr/reload%20indexes) to sync with the config
merge indexes (https://manual.manticoresearch.com/Adding_data_from_external_storages/Adding_data_from_indexes/Merging_indexes)

etc.

But it's easier to just use an RT index. I gave you a script example, if you know php it shouldn't be a big deal to make it do what you want. Eventually we have in plans to implement some manticore-client (similar to clickhouse-client), but unfortunately it's only plans yet. Pull requests are very welcome!

from columnar.

Why the create table sql is different from the show create table sql about columnar HOT 6 CLOSED

Comments (6)

Related Issues (20)

Recommend Projects

React

Vue.js

Typescript

TensorFlow

Django

Laravel

D3

Recommend Topics

javascript

web

server

Machine learning

Visualization

Game

Recommend Org

Facebook

Microsoft

Google

Alibaba

D3

Tencent

Jobs