sanikolaev avatar sanikolaev commented on August 12, 2024

Hi. I don't think it has anything to do with the columnar library since without it it behaves similarly:

mysql> create table idx(title text, type int, price float);
Query OK, 0 rows affected (0.00 sec)

mysql> show create table idx;
+-------+------------------------------------------------------------+
| Table | Create Table                                               |
+-------+------------------------------------------------------------+
| idx   | CREATE TABLE idx (
type integer,
price float,
title text
) |
+-------+------------------------------------------------------------+
1 row in set (0.00 sec)

If you believe this is a problem, please create an issue at https://github.com/manticoresoftware/manticoresearch/issues and explain what is wrong with this behaviour.

from columnar.

sangensong avatar sangensong commented on August 12, 2024

Thank you for your reply.

My usage scenario is as follows. I have a lot of CSV files that have already been generated, and more are still being generated. So far I have inserted them into ClickHouse. Now I want to run full-text search over this data, but because of the huge data volume I feel that ClickHouse is not suitable for it. That is when I found your project, and I would like to use it for full-text search. The field order of my CSV files is already fixed, but I see that the order of the fields seems to be changed when I create the table, so I am afraid that these CSV files cannot be inserted into Manticore. Also, I would like to ask: does Manticore have a command like the following ClickHouse one?

cat a.csv | clickhouse-client --database=db --query="insert into table test format CSV"

My CREATE TABLE statement looks like this:

create table t_log_http (
    log_time bigint, 
    uuid bigint, 
    src_mac text stored, 
    dst_mac text stored, 
    src_location text stored, 
    dst_location text stored, 
    app_protocol text stored, 
    app_name text stored, 
    is_ipv6 bit(8), 
    src_ip text stored, 
    dst_ip text stored, 
    ip_protocol text stored, 
    src_port bit(16), 
    dst_port bit(16), 
    method text stored, 
    uri text stored, 
    host text, 
    url_category text stored, 
    referer text stored, 
    user_agent text stored, 
    xff text stored, 
    subject text stored, 
    rsp_code text stored) engine='columnar';

I could not find a way to do this in the official documentation, so my idea is to insert the data with something like the following, but it is far too slow:

cat t_log_http.csv|awk -F "," '{printf("insert into t_log_http values(%s,%s,'\''%s'\'',%s,%s,%s,%s,%s,%s,%s,%s,%s,'\''%s'\'',%s);\n",$1,$2,$3,$4,$5,$6,$7,$8,$9,$10,$11,$12,$13,$14)}' > t_log_http.sql

githubmanticore avatar githubmanticore commented on August 12, 2024

➤ Sergey Nikolaev commented:

the order of the fields seems to be adjusted when I create the table, and I am afraid that these csv files cannot be inserted into the manticore

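When you run INSERT INTO tbl VALUES(...), in my opinion it's always better to specify the field list explicitly, e.g. insert into t_log_http(log_time, uuid, ...) values(...). Then the values are matched to fields by name, and the order in which the table lists them doesn't matter.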

There's no tool like clickhouse-client that accepts data on STDIN and converts it into INSERT INTO statements. A simple awk script may be suboptimal since it inserts data row by row, which may not be very fast, but it's the right direction. I recommend two ways:

  1. Use a plain index with a source of type csvpipe. Interactive course: https://play.manticoresearch.com/csv/. Docs: https://manual.manticoresearch.com/Adding_data_from_external_storages/Fetching_from_CSV,TSV
  2. Write a more complex script that does batched and parallel inserts, e.g. you can take this as an example: https://gist.githubusercontent.com/sanikolaev/6a48de957b41481512ff8d94ed4af351/raw/e49d9212bcd964ad08c7199c08f401f6470b4501/load_sql.php .
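
As an illustration of the second option combined with the explicit field list suggested above, here is a minimal Python sketch that turns CSV rows into multi-row INSERT statements. It is not the linked PHP script and not an official tool. The function names, the batch size, the backslash escaping of quotes, and the rule that every non-text column holds an integer are assumptions made for this example; sending the statements to the server (for instance by piping them into a mysql client) and running several loaders in parallel are left out.

```python
import csv
import io
from itertools import islice


def sql_literal(value, is_text):
    """Quote text values; validate everything else as an integer (assumption)."""
    if is_text:
        # Assumed escaping: backslash-escape backslashes and single quotes.
        escaped = value.replace("\\", "\\\\").replace("'", "\\'")
        return "'" + escaped + "'"
    return str(int(value))  # raises ValueError on non-numeric input


def batched_inserts(csv_file, table, columns, text_columns, batch_size=1000):
    """Yield one multi-row INSERT with an explicit column list per batch."""
    reader = csv.reader(csv_file)
    col_list = ", ".join(columns)
    while True:
        batch = list(islice(reader, batch_size))
        if not batch:
            break
        rows = []
        for row in batch:
            values = [sql_literal(v, c in text_columns)
                      for c, v in zip(columns, row)]
            rows.append("(" + ", ".join(values) + ")")
        yield f"INSERT INTO {table} ({col_list}) VALUES {', '.join(rows)};"


if __name__ == "__main__":
    # Hypothetical three-column sample; a real file would carry all columns.
    sample = io.StringIO("1,10,aa:bb\n2,20,cc:dd\n")
    for stmt in batched_inserts(sample, "t_log_http",
                                ["log_time", "uuid", "src_mac"], {"src_mac"}):
        print(stmt)
```

Because the column list is spelled out in CSV order, the statements do not depend on the order in which the table itself reports its columns.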

sangensong avatar sangensong commented on August 12, 2024

The reason I don't use plain mode is this passage in the docs:

What you cannot do with a plain index:

  • insert more data into an index after it's built
  • update it
  • delete from it
  • create/delete/alter a plain index online (you need to define it in a configuration file)
  • use UUID for automatic ID generation. When you fetch data from an external storage it must include a unique identifier for each document

In my usage scenario, I will continuously generate csv files and need to continuously insert data. Is the plain index mode still appropriate in this case?

tomatolog avatar tomatolog commented on August 12, 2024

Full-text fields and attributes are different types, and all statements such as DESC, SHOW CREATE TABLE and so on iterate over attributes first and then over fields. That is why the original order is not preserved, i.e. attributes and fields are not mixed.
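
The ordering described here can be mimicked with a small Python sketch. It is only a simplified model for illustration, not Manticore's implementation: the helper name `listing_order` is hypothetical, and it assumes that every `text` column is a full-text field and every other column is an attribute.

```python
def listing_order(columns):
    """Return columns as a listing would show them: attributes, then fields.

    `columns` is a list of (name, type) pairs in declaration order.
    Assumption: type "text" means full-text field, anything else an attribute.
    """
    attributes = [c for c in columns if c[1] != "text"]
    fields = [c for c in columns if c[1] == "text"]
    # Declaration order is kept within each group; only the groups are split.
    return attributes + fields


declared = [("title", "text"), ("type", "int"), ("price", "float")]
listed = listing_order(declared)
```

For the `idx` table from the first comment this gives type, price, title, which matches the SHOW CREATE TABLE output shown there.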

sanikolaev avatar sanikolaev commented on August 12, 2024

In my usage scenario, I will continuously generate csv files and need to continuously insert data. Is the plain index mode still appropriate in this case?

Continuous data processing is also possible with plain indexes, since you can:

etc.

But it's easier to just use an RT index. I gave you a script example; if you know PHP, it shouldn't be a big deal to make it do what you want. We do plan to eventually implement a manticore-client (similar to clickhouse-client), but unfortunately it is only a plan so far. Pull requests are very welcome!
