PostgreSQL Index Advisor

Home Page: https://supabase.com/docs/guides/database/extensions/index_advisor

License: PostgreSQL License

index_advisor's Introduction

A PostgreSQL extension for recommending indexes to improve query performance.

Features

  • Supports generic parameters e.g. $1, $2
  • Supports materialized views
  • Identifies tables/columns obfuscated by views

API

Description

For a given query, searches for a set of SQL DDL create index statements that improve the query's execution time.

Signature

index_advisor(query text)
returns
    table  (
        startup_cost_before jsonb,
        startup_cost_after jsonb,
        total_cost_before jsonb,
        total_cost_after jsonb,
        index_statements text[],
        errors text[]
    )

Usage

For a minimal example, the index_advisor function can be given a single table query with a filter on an unindexed column.

create extension if not exists index_advisor cascade;

create table book(
  id int primary key,
  title text not null
);

select
    *
from
  index_advisor('select book.id from book where title = $1');
 startup_cost_before | startup_cost_after | total_cost_before | total_cost_after |                  index_statements                   | errors
---------------------+--------------------+-------------------+------------------+-----------------------------------------------------+--------
 0.00                | 1.17               | 25.88             | 6.40             | {"CREATE INDEX ON public.book USING btree (title)"} | {}
(1 row)
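
Because index_statements is a text[] column, the suggestions can be expanded into one row per statement, which is convenient for reviewing them or copying them into a migration. A minimal sketch, assuming the book table from the example above already exists:

```sql
-- One suggested CREATE INDEX statement per output row.
select
    unnest(index_statements) as statement
from
    index_advisor('select book.id from book where title = $1');
```

The statements are only suggestions; review them before running any of them against a real database.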

More complex queries may generate additional suggested indexes:

create extension if not exists index_advisor cascade;

create table author(
    id serial primary key,
    name text not null
);

create table publisher(
    id serial primary key,
    name text not null,
    corporate_address text
);

create table book(
    id serial primary key,
    author_id int not null references author(id),
    publisher_id int not null references publisher(id),
    title text
);

create table review(
    id serial primary key,
    book_id int references book(id),
    body text not null
);

select
    *
from
    index_advisor('
        select
            book.id,
            book.title,
            publisher.name as publisher_name,
            author.name as author_name,
            review.body review_body
        from
            book
            join publisher
                on book.publisher_id = publisher.id
            join author
                on book.author_id = author.id
            join review
                on book.id = review.book_id
        where
            author.id = $1
            and publisher.id = $2
    ');
 startup_cost_before | startup_cost_after | total_cost_before | total_cost_after |                  index_statements                         | errors
---------------------+--------------------+-------------------+------------------+-----------------------------------------------------------+--------
 27.26               | 12.77              | 68.48             | 42.37            | {"CREATE INDEX ON public.book USING btree (author_id)",   | {}
                                                                                    "CREATE INDEX ON public.book USING btree (publisher_id)",
                                                                                    "CREATE INDEX ON public.review USING btree (book_id)"}
(1 row)

Install

Requires Postgres with hypopg installed.

git clone https://github.com/supabase/index_advisor.git
cd index_advisor
sudo make install

Run Tests

make install; make installcheck

index_advisor's People

Contributors

kiwicopple, olirice

index_advisor's Issues

ERROR: Failed to run sql query: data type json has no default operator class for access method "btree"

Running the function will always result in the following error:

Failed to run sql query: data type json has no default operator class for access method "btree"

The current workaround is to make the following change, which restricts the candidate columns to a fixed list of type OIDs:

from

        where
            pc.relnamespace::regnamespace::text not in ( -- ignore schema list
                'pg_catalog', 'pg_toast', 'information_schema'
            )
            and er.oid is null -- ignore entities owned by extensions
            and pc.relkind in ('r', 'm') -- regular tables, and materialized views
            and pc.relpersistence = 'p' -- permanent tables (not unlogged or temporary)
            and pa.attnum > 0
            and not pa.attisdropped
            and pi.indrelid is null
        )

to

        where
            pc.relnamespace::regnamespace::text not in ( -- ignore schema list
                'pg_catalog', 'pg_toast', 'information_schema'
            )
            and er.oid is null -- ignore entities owned by extensions
            and pc.relkind in ('r', 'm') -- regular tables, and materialized views
            and pc.relpersistence = 'p' -- permanent tables (not unlogged or temporary)
            and pa.attnum > 0
            and not pa.attisdropped
            and pi.indrelid is null
            and pa.atttypid in (20,16,1082,1184,1114,701,23,21,700,1083,2950,1700,25,18,1042,1043)
        )
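
The numbers in the added filter are built-in type OIDs (for example, 23 is int4 and 25 is text). The sketch below is not part of the workaround. It shows one way to check what the list covers, and how to see which types have a default btree operator class on a given server. The second query is exploratory only and is not proposed as a replacement for the hard-coded list.

```sql
-- Map the hard-coded OIDs from the workaround to their type names.
select
    oid,
    typname
from
    pg_catalog.pg_type
where
    oid in (20,16,1082,1184,1114,701,23,21,700,1083,2950,1700,25,18,1042,1043)
order by
    oid;

-- Exploratory: input types that have a default btree operator class.
-- json is expected to be absent, which matches the error message above.
select
    oc.opcintype::regtype as indexable_type
from
    pg_catalog.pg_opclass oc
    join pg_catalog.pg_am am
        on am.oid = oc.opcmethod
where
    am.amname = 'btree'
    and oc.opcdefault
order by
    oc.opcintype;
```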

Handle multi-queries optimization

Hello,

I just saw this repository listed on HN.
I like the idea.
At first, I thought that optimizing index creation for a single query was completely trivial.
Then I realized it would still be nice to test all the possibilities when a query uses 15 distinct fields in its WHERE clause.
Clearly, this is not hard to optimize when you know your data well,
but it is still better to automate an exhaustive search,
in case you missed some unknown feature of your data.
It also saves some time.
So the base step is still interesting.
The more interesting step(s), in my opinion, is to handle distributions of queries.
For example, I worked on MySQL queries for a filtered list view screen that shared a common part, mainly the tables and joins,
but had somewhere between 30 and 70, or even 100, possible filters.
In most cases, the trick was to simplify the job by requiring a small datetime range that filtered out most of the data.
But we had customers that did not use the setting that forced small datetime ranges, or applied it only to "normal" users and not to "admin" users.
So the query-building logic I wrote generated optimized "USE INDEX" hints for MySQL.
(I don't remember whether I also had to add a dynamic STRAIGHT_JOIN for this query or these queries.)
However, I also had a policy of adding all indexes without paying much attention to database size constraints.
In my current work, the production database is much larger, since we do not bound the time range of the history.
So the size of the indexes matters much more.
It would be nice if you could choose indexes in a clever way:

  • for variations of the "same" query, with the objective of minimizing the size of the indexes
  • for queries using the same tables
  • for queries using sets of tables that are not disjoint
  • for "connected components" of the query graph (with an edge between two queries when their sets of tables are not disjoint)

ideally with a continuum before and between these possibilities.
Of course, I think the underlying problem must be NP-complete.
But a good tool would clearly help, because our approximate solutions by hand
are far from perfect most of the time, unless the data has really distinct features and you know them, or you have a lot of luck.

Best regards,
Laurent Lyaudet

Error handling for invalid queries is not implemented

If an existing query fails, the function will throw an error such as:

Failed to run sql query: relation "xxxx" does not exist

This scenario will happen when:

  1. The user is trying to optimize based on queries found in pg_stat_statements
  2. A table has been deleted that is referenced in a pg_stat_statements query

Desired result:
Error handling should detect and gracefully ignore queries that may currently fail.
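
One possible shape for this, sketched under assumptions: the wrapper name index_advisor_safe is hypothetical and not part of the extension, and the pg_stat_statements extension is assumed to be installed. The wrapper traps any exception raised while advising on a query and reports it in the errors column instead of aborting the whole scan.

```sql
-- Hypothetical helper (not part of index_advisor): never raises,
-- failures are returned as data in the errors column.
create or replace function index_advisor_safe(query_text text)
returns table (
    index_statements text[],
    errors text[]
)
language plpgsql
as $$
begin
    return query
        select ia.index_statements, ia.errors
        from index_advisor(query_text) as ia;
exception
    when others then
        -- e.g. relation "xxxx" does not exist
        return query select array[]::text[], array[sqlerrm];
end;
$$;

-- Advise on every recorded statement; queries that reference dropped
-- tables show up with a populated errors array instead of stopping the run.
select
    s.queryid,
    a.index_statements,
    a.errors
from
    pg_stat_statements s
    cross join lateral index_advisor_safe(s.query) a;
```

Trapping `others` is deliberately broad here; a narrower condition list may be preferable if only missing relations should be ignored.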

Queries containing embedded comments may fail because they contain a semicolon

Queries obtained from pg_stat_statements often contain embedded comments, which in turn often contain a semicolon. This causes index_advisor to throw an error even though the semicolon is not part of the actual SQL statement.

To fix this, queries should be run through a function to strip embedded comments:

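trim(regexp_replace(
         regexp_replace(
            -- strip /* ... */ block comments; non-greedy, so two separate
            -- comments do not swallow the SQL between them
            regexp_replace(query, '/\*.*?\*/', ' ', 'g'),
         -- strip -- line comments up to the end of the line
         '--[^\r\n]*', ' ', 'g')
      -- collapse runs of whitespace into a single space
      , '\s+', ' ', 'g'))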
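
The expression can be wrapped in a small helper so it is easy to apply before calling index_advisor. This is a sketch only: the function name strip_sql_comments is hypothetical, and it uses a non-greedy block-comment pattern so that each comment is removed on its own. It is a plain regex pass, so it does not understand string literals that happen to contain `--` or `/*`, and it does not handle nested block comments.

```sql
-- Hypothetical helper (not part of index_advisor).
create or replace function strip_sql_comments(query text)
returns text
language sql
immutable
as $$
    select trim(regexp_replace(
        regexp_replace(
            regexp_replace(query, '/\*.*?\*/', ' ', 'g'),  -- block comments
            '--[^\r\n]*', ' ', 'g'),                       -- line comments
        '\s+', ' ', 'g'));                                 -- extra whitespace
$$;

-- Both comments below contain a semicolon; neither survives.
select strip_sql_comments('select 1 /* a; b */ from t -- trailing; note');
-- returns: select 1 from t
```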
