
Comments (2)

findinpath commented on August 16, 2024

I tried to reproduce exactly the scenario you pointed out, and initially had the following files:

content |                                                                            file_path                                                                            | file_format | record_count | file_size_in_bytes |       column_sizes   >
---------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------+-------------+--------------+--------------------+---------------------->
       0 | hdfs://hadoop-master:9000/user/hive/warehouse/t1-57ebe823af944510bbbaec9f2d745fed/data/20240516_081115_00000_t8p79-27602614-48a2-46eb-9baa-b6db886c2e23.parquet | PARQUET     |       874421 |            1752503 | {1=889537, 2=862660} >
       0 | hdfs://hadoop-master:9000/user/hive/warehouse/t1-57ebe823af944510bbbaec9f2d745fed/data/20240516_081115_00000_t8p79-3c85df99-c2ab-47d5-9253-e330fcb9c711.parquet | PARQUET     |      2262365 |            4518053 | {1=2301435, 2=2216305>
       0 | hdfs://hadoop-master:9000/user/hive/warehouse/t1-57ebe823af944510bbbaec9f2d745fed/data/20240516_081115_00000_t8p79-fc9ace1d-b5c5-4fc1-83f8-0f41ecdf966c.parquet | PARQUET     |      4015604 |            8040134 | {1=4084631, 2=3955190>
       0 | hdfs://hadoop-master:9000/user/hive/warehouse/t1-57ebe823af944510bbbaec9f2d745fed/data/20240516_081115_00000_t8p79-252522f7-225d-47ab-b119-8f9ccc692251.parquet | PARQUET     |      6275254 |           12568807 | {1=6382707, 2=6185785>
       0 | hdfs://hadoop-master:9000/user/hive/warehouse/t1-57ebe823af944510bbbaec9f2d745fed/data/20240516_081115_00000_t8p79-cb981908-7b63-4a96-a209-6882586fb0b2.parquet | PARQUET     |      9906127 |           19865711 | {1=10076133, 2=978926>
       0 | hdfs://hadoop-master:9000/user/hive/warehouse/t1-57ebe823af944510bbbaec9f2d745fed/data/20240516_081115_00000_t8p79-0bdc5507-edec-4ebd-8e05-6a4f5c09e142.parquet | PARQUET     |     16666230 |           33443003 | {1=16946156, 2=164965>
(6 rows)

After running optimize, however, there was only one file:

trino> SELECT * from iceberg.default."t1$files";
 content |                                                                            file_path                                                                            | file_format | record_count | file_size_in_bytes |       column_sizes   >
---------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------+-------------+--------------+--------------------+---------------------->
       0 | hdfs://hadoop-master:9000/user/hive/warehouse/t1-57ebe823af944510bbbaec9f2d745fed/data/20240516_081901_00002_t8p79-03481c56-03ef-46da-9482-0bdc1c8f7eb4.parquet | PARQUET     |     40000001 |           80273843 | {1=40660676, 2=396128>
(1 row)
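
As a quick sanity check on the two listings, the record counts of the six original files should add up to the record count of the single file left after optimize. The sketch below just redoes that arithmetic with the `record_count` and `file_size_in_bytes` values copied from the output above; the variable names and the `totals` helper are made up for illustration and are not part of Trino.

```python
# (record_count, file_size_in_bytes) pairs copied from the "t1$files" listings.
before = [
    (874421, 1752503),
    (2262365, 4518053),
    (4015604, 8040134),
    (6275254, 12568807),
    (9906127, 19865711),
    (16666230, 33443003),
]
after = [(40000001, 80273843)]


def totals(files):
    """Return (total records, total bytes) for a list of (records, bytes) pairs."""
    return sum(r for r, _ in files), sum(b for _, b in files)


records_before, bytes_before = totals(before)
records_after, bytes_after = totals(after)

# Rows should be conserved by compaction; byte sizes may differ slightly
# because the data is re-encoded into a new Parquet file.
print("records:", records_before, "->", records_after)
print("bytes:  ", bytes_before, "->", bytes_after)
```

The record totals match (40000001 on both sides), so the one remaining file holds all of the rows from the original six.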


jhatcher1 commented on August 16, 2024

After trying some things, I think it might be related to the number of workers in the cluster. When I scaled down to a single worker I was able to optimize down to one file, but with 3 workers I could only get it down to 3 files.
