supabase / index_advisor

PostgreSQL Index Advisor

Home Page: https://supabase.com/docs/guides/database/extensions/index_advisor

License: PostgreSQL License

Makefile 0.84% PLpgSQL 99.16%

extension indexing postgres postgresql

index_advisor's Introduction

PostgreSQL Index Advisor

A PostgreSQL extension for recommending indexes to improve query performance.

Features

Supports generic parameters e.g. $1, $2
Supports materialized views
Identifies tables/columns obfuscated by views

API

Description

For a given query, searches for a set of SQL DDL create index statements that improve the query's execution time.

Signature

index_advisor(query text)
returns
    table  (
        startup_cost_before jsonb,
        startup_cost_after jsonb,
        total_cost_before jsonb,
        total_cost_after jsonb,
        index_statements text[],
        errors text[]
    )

Usage

For a minimal example, the index_advisor function can be given a single table query with a filter on an unindexed column.

create extension if not exists index_advisor cascade;

create table book(
  id int primary key,
  title text not null
);

select
    *
from
  index_advisor('select book.id from book where title = $1');

 startup_cost_before | startup_cost_after | total_cost_before | total_cost_after |                  index_statements                   | errors
---------------------+--------------------+-------------------+------------------+-----------------------------------------------------+--------
 0.00                | 1.17               | 25.88             | 6.40             | {"CREATE INDEX ON public.book USING btree (title)"} | {}

(1 row)
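
The result is easier to scan with one suggested statement per row and the costs as plain numbers. The following is a minimal sketch that uses only the columns listed in the signature above. The aliases ia, stmt, cost_before and cost_after are illustrative names, not part of the extension.

```sql
-- Sketch: one row per suggested index, with the jsonb costs cast to numeric.
-- Assumes the book table from the example above already exists.
select
    (ia.total_cost_before #>> '{}')::numeric as cost_before,
    (ia.total_cost_after  #>> '{}')::numeric as cost_after,
    stmt as index_statement
from
    index_advisor('select book.id from book where title = $1') as ia
    cross join lateral unnest(ia.index_statements) as stmt;
```

If index_statements is empty, this query returns no rows, which can itself serve as a signal that no improving index was found.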

More complex queries may generate additional suggested indexes:

create extension if not exists index_advisor cascade;

create table author(
    id serial primary key,
    name text not null
);

create table publisher(
    id serial primary key,
    name text not null,
    corporate_address text
);

create table book(
    id serial primary key,
    author_id int not null references author(id),
    publisher_id int not null references publisher(id),
    title text
);

create table review(
    id serial primary key,
    book_id int references book(id),
    body text not null
);

select
    *
from
    index_advisor('
        select
            book.id,
            book.title,
            publisher.name as publisher_name,
            author.name as author_name,
            review.body review_body
        from
            book
            join publisher
                on book.publisher_id = publisher.id
            join author
                on book.author_id = author.id
            join review
                on book.id = review.book_id
        where
            author.id = $1
            and publisher.id = $2
    ');

 startup_cost_before | startup_cost_after | total_cost_before | total_cost_after |                  index_statements                         | errors
---------------------+--------------------+-------------------+------------------+-----------------------------------------------------------+--------
 27.26               | 12.77              | 68.48             | 42.37            | {"CREATE INDEX ON public.book USING btree (author_id)",   | {}
                                                                                    "CREATE INDEX ON public.book USING btree (publisher_id)",
                                                                                    "CREATE INDEX ON public.review USING btree (book_id)"}
(1 row)
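
Because the extension requires hypopg (see Install), one way to sanity-check a suggestion before building a real index is to create it as a hypothetical index and inspect the plan. This is a sketch, not part of index_advisor itself. It assumes hypopg's hypopg_create_index and hypopg_reset functions are available in the session, and it swaps a literal value in for the $1 parameter so that explain can run.

```sql
-- Sketch: try one suggested index hypothetically (nothing is written to disk).
select *
from hypopg_create_index('create index on public.book using btree (author_id)');

-- Plain explain (without analyze) can consider hypothetical indexes.
-- The literal 1 stands in for the $1 parameter.
explain
select book.id, book.title
from book
where book.author_id = 1;

-- Drop all hypothetical indexes created in this session.
select hypopg_reset();
```

On small or empty tables the planner may still prefer a sequential scan, so the plan is only informative with representative data and up-to-date statistics.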

Install

Requires Postgres with hypopg installed.

git clone https://github.com/supabase/index_advisor.git
cd index_advisor
sudo make install
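
After the files are installed, the extension still has to be created in each database that will use it. A minimal sketch of that step follows, using a standard catalog query as a quick check. The cascade option is what pulls in hypopg, as in the usage examples above.

```sql
-- Create the extension (and hypopg via cascade) in the current database.
create extension if not exists index_advisor cascade;

-- Confirm that both extensions are present.
select extname, extversion
from pg_extension
where extname in ('hypopg', 'index_advisor');
```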

Run Tests

make install; make installcheck

index_advisor's Issues

ERROR: Failed to run sql query: data type json has no default operator class for access method "btree"

Running the function will always result in the following error:

Failed to run sql query: data type json has no default operator class for access method "btree"

Current workaround is to make the following change:

from

        where
            pc.relnamespace::regnamespace::text not in ( -- ignore schema list
                'pg_catalog', 'pg_toast', 'information_schema'
            )
            and er.oid is null -- ignore entities owned by extensions
            and pc.relkind in ('r', 'm') -- regular tables, and materialized views
            and pc.relpersistence = 'p' -- permanent tables (not unlogged or temporary)
            and pa.attnum > 0
            and not pa.attisdropped
            and pi.indrelid is null
        )

to

        where
            pc.relnamespace::regnamespace::text not in ( -- ignore schema list
                'pg_catalog', 'pg_toast', 'information_schema'
            )
            and er.oid is null -- ignore entities owned by extensions
            and pc.relkind in ('r', 'm') -- regular tables, and materialized views
            and pc.relpersistence = 'p' -- permanent tables (not unlogged or temporary)
            and pa.attnum > 0
            and not pa.attisdropped
            and pi.indrelid is null
            and pa.atttypid in (20,16,1082,1184,1114,701,23,21,700,1083,2950,1700,25,18,1042,1043)
        )
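
The numbers in the added atttypid filter are type OIDs. The first query below is a sketch for decoding them against pg_type. The second lists the types that have a default btree operator class, which is the property the error message complains about. Neither query is part of the extension. Whether a catalog-based filter could replace the hard-coded list is an assumption to verify, since types that reach an operator class through binary coercion or a polymorphic class (for example varchar, arrays, enums) may not match on opcintype directly.

```sql
-- Decode the hard-coded OIDs from the workaround into type names.
select t.oid, t.typname
from pg_type t
where t.oid in (20,16,1082,1184,1114,701,23,21,700,1083,2950,1700,25,18,1042,1043)
order by t.oid;

-- List types that have a default btree operator class.
-- json should be absent from this list, while jsonb should be present.
select oc.opcintype::regtype as type_name
from pg_opclass oc
join pg_am am on am.oid = oc.opcmethod
where am.amname = 'btree'
  and oc.opcdefault
order by oc.opcintype;
```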

Handle multi-queries optimization

Hello,

I just saw this repository listed on HN.
I like the idea.
At first, I thought it was completely trivial to optimize index creations for a single query.
And then I thought that it still would be nice to test all possibilities when you have a query with 15 distinct fields used in the WHERE clause.
Clearly, this is not hard to optimize when you know your data well,
but still it is better if you can automate an exhaustive search,
in case you missed some unknown feature of your data.
And you can also save some time.
So the base step is still interesting.
And the more interesting step(s) in my opinion is to handle query distributions.
For example I worked on MySQL queries with a common part regarding tables and joins mainly, for a filter list view screen,
but maybe between 30 and 70 or 100 possible filters.
Somehow, in most cases, the trick was to simplify the job by requiring a small datetime range that filtered out most of the data.
But we had customers that were not using the setting that forced small datetime ranges, or were using it only for "normal users" and not for "admin" users.
And thus the query-building logic that I wrote generated optimized "USE INDEX" hints for MySQL.
(I don't remember if I also had to do some dynamic STRAIGHT JOIN for these queries.)
But the fact is that I also had a policy of adding all indexes without looking too closely at database size constraints.
In my current work, the production database is far larger since we do not bound the time range of the history.
And thus the size of indexes matters much more.
If you could handle the optimization of choosing indexes in a clever way:

for variations of the "same" query with objectives to minimize the size of indexes
for queries using the same tables
for queries using sets of tables that are not disjoint
for "connected components" of queries graph (an edge if sets of tables are not disjoint)

somehow with a continuum before and between these possibilities,
it would be nice.
Of course, I think that the underlying problem must be NP-complete.
But clearly, a good tool must help, because our approximate solutions by hand
are far from perfect most of the time, unless the data has really distinct features and you know them, or you have a lot of luck.

Best regards,
Laurent Lyaudet

Error handling for invalid queries is not implemented

If an existing query fails, the function will throw an error such as:

Failed to run sql query: relation "xxxx" does not exist

This scenario will happen when:

The user is trying to optimize based on queries found in pg_stat_statements
A table has been deleted that is referenced in a pg_stat_statements query

Desired result:
Error handling should detect and gracefully ignore queries that may currently fail.
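
Until such handling exists in the extension, a caller can approximate it with a wrapper that traps the exception. The sketch below is one possible workaround, not the project's implementation. The function name safe_index_advisor and its parameter q are hypothetical, and the usage query assumes the pg_stat_statements extension is installed.

```sql
-- Hypothetical wrapper: returns the advisor's output, or the error text
-- in the errors column if index_advisor raises for this query.
create or replace function safe_index_advisor(q text)
returns table (index_statements text[], errors text[])
language plpgsql
as $$
begin
    return query
        select ia.index_statements, ia.errors
        from index_advisor(q) as ia;
exception when others then
    -- e.g. relation "xxxx" does not exist: report it instead of failing
    return query select array[]::text[], array[sqlerrm];
end;
$$;

-- Usage sketch: advise on every recorded statement without aborting on failures.
select s.query, a.index_statements, a.errors
from pg_stat_statements s
cross join lateral safe_index_advisor(s.query) as a;
```

Catching `others` is deliberately broad here. A narrower handler (for example `undefined_table`) would avoid hiding unrelated failures.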

Queries containing embedded comments may fail because they contain a semicolon

Queries obtained from pg_stat_statements often contain embedded comments, which in turn often contain a semicolon. This causes index_advisor to throw an error even though the semicolon is not part of the actual SQL statement.

To fix this, queries should be run through a function to strip embedded comments:

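trim(regexp_replace(
         regexp_replace(
            -- non-greedy match, so SQL between two separate /* */ comments is kept
            regexp_replace(query,'/\*.*?\*/','','g'),
         '--[^\r\n]*', ' ', 'g')   -- strip line comments
      , '\s+', ' ', 'g'))          -- collapse runs of whitespace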
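
The expression above can be wrapped in a helper so it is easy to apply before calling index_advisor. The following is a sketch under a few assumptions. The function name strip_sql_comments is hypothetical, and the block-comment pattern uses a non-greedy `.*?` so that SQL sitting between two separate comments survives. It does not try to protect `--` or `/*` sequences that appear inside string literals.

```sql
-- Hypothetical helper: remove /* ... */ and -- comments, then collapse whitespace.
create or replace function strip_sql_comments(query text)
returns text
language sql
immutable
as $$
    select trim(regexp_replace(
        regexp_replace(
            regexp_replace(query, '/\*.*?\*/', '', 'g'),  -- block comments
        '--[^\r\n]*', ' ', 'g'),                          -- line comments
    '\s+', ' ', 'g'));                                    -- whitespace runs
$$;

select strip_sql_comments('select 1 /* a; */ + /* b */ 2 -- done;');
-- returns: select 1 + 2
```

The cleaned text can then be passed on, for example as index_advisor(strip_sql_comments(query)).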
