postgrespro / aqo

Adaptive query optimization for PostgreSQL

License: GNU Affero General Public License v3.0

Makefile 0.47% C 72.93% PLpgSQL 19.82% Perl 6.78%
adaptive optimization postgresql query-optimizer

aqo's Introduction

PostgreSQL Database Management System
=====================================

This directory contains the source code distribution of the PostgreSQL
database management system.

PostgreSQL is an advanced object-relational database management system
that supports an extended subset of the SQL standard, including
transactions, foreign keys, subqueries, triggers, user-defined types
and functions.  This distribution also contains C language bindings.

PostgreSQL has many language interfaces, many of which are listed here:

	http://www.postgresql.org/download

See the file INSTALL for instructions on how to build and install
PostgreSQL.  That file also lists supported operating systems and
hardware platforms and contains information regarding any other
software packages that are required to build or run the PostgreSQL
system.  Copyright and license information can be found in the
file COPYRIGHT.  A comprehensive documentation set is included in this
distribution; it can be read as described in the installation
instructions.

The latest version of this software may be obtained at
http://www.postgresql.org/download/.  For more information look at our
web site located at http://www.postgresql.org/.

aqo's People

Contributors

alena0704, danolivo, dmpgpro, funbringer, knizhnik, kulaginm, parar020100, pyhalov, queenofpigeons, svglukhov, tigvarts, xhevx


aqo's Issues

First-time user question - setting up and running aqo

The following plan shows a large cardinality estimation error. AQO isn't able to improve the estimate. Is this hitting a known limitation of AQO?

There is also an index-based plan that delivers the order needed by the aggregate and has a lower startup cost; it should have been selected in this case because of the LIMIT 1.

The plan:

Limit (cost=6479.18..6498.46 rows=1 width=4) (actual time=51.979..51.979 rows=1 loops=1)
-> Subquery Scan on tmp (cost=6479.18..6498.46 rows=1 width=4) (actual time=51.978..51.978 rows=1 loops=1)
Filter: ((tmp.pstn_vrsn_nbr = tmp.a) AND (((tmp.prty_1_trd_rfrnc_id = 'MQa4147f9c7ac809116532482232ee0630'::text) AND (tmp.prty_1_id = 'EQP2317'::text)) OR ((tmp.prty_2_trd_rfrnc_id = 'MQ7867600a47151071bd01bc6c71c0f4d9'::text) AND (tmp.prty_2_id = 'EQP2895'::text))))
-> WindowAgg (cost=6479.18..6487.62 rows=482 width=96) (actual time=51.974..51.974 rows=1 loops=1)
-> Sort (cost=6479.18..6480.39 rows=482 width=92) (actual time=51.962..51.962 rows=2 loops=1)
Sort Key: pos.dtcc_trd_ref_id
Sort Method: quicksort Memory: 102kB
-> Gather (cost=1000.00..6457.70 rows=482 width=92) (actual time=0.343..51.517 rows=519 loops=1)
Workers Planned: 2
Workers Launched: 2
-> Parallel Seq Scan on "position" pos (cost=0.00..5409.50 rows=201 width=92) (actual time=0.314..41.780 rows=173 loops=3)
Filter: ((prty_1_id = 'EQP2317'::text) OR (prty_2_id = 'EQP2895'::text))
Rows Removed by Filter: 83160
Planning time: 0.747 ms
Execution time: 53.444 ms
Using aqo: true
(16 rows)

The SQL:

explain analyze
SELECT 1
FROM
(SELECT
dtcc_trd_ref_id,
pstn_vrsn_nbr,
prty_1_trd_rfrnc_id,
prty_1_id,
prty_2_trd_rfrnc_id,
prty_2_id ,
MAX(pstn_vrsn_nbr) OVER (PARTITION BY dtcc_trd_ref_id) AS a
FROM ddl_position.position pos
WHERE (prty_1_id = 'EQP2317' or prty_2_id = 'EQP2895')
) as tmp
WHERE
pstn_vrsn_nbr = a
AND ((prty_1_trd_rfrnc_id = 'MQa4147f9c7ac809116532482232ee0630' AND prty_1_id = 'EQP2317') OR
(prty_2_trd_rfrnc_id = 'MQ7867600a47151071bd01bc6c71c0f4d9' AND prty_2_id = 'EQP2895')
)
limit 1 ;

DockerHub Distribution

It would be great to have a ready-to-run Docker image. It would lower the barrier to getting started with aqo.

Cannot build aqo

I'm trying to build aqo on PostgreSQL 12.5, but the build always fails with an error like the one in the attached picture.
I followed the steps in README.md strictly.


How to use this extension on a Postgres using Docker ?

Hello there! First, thanks for creating this repository; it looks very promising and I can't wait to test it out.

I have some trouble understanding the steps to get going, especially when using Postgres in a Docker container. So I have some questions:

For now I'm using an official image (postgres:13.4-buster). Is it possible to install the extension in that container, or do I need to create my own Dockerfile that builds from the official image with some additional steps?

If so, what are the steps required? I'm sure I'm not the only one using Postgres in Docker; maybe it would be a good idea to put this in the README? What do you think?

Thanks !
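Since aqo is built against PostgreSQL sources (the README describes applying a core patch for some releases), installing it into the unmodified official image may not work; a custom image that compiles a patched server is the safer route. A rough, untested sketch — the branch names, patch file name, and paths below are assumptions, so check the aqo README for the branch and patch matching your PostgreSQL release:

```dockerfile
# Illustrative only: build a patched PostgreSQL with aqo in contrib/.
FROM debian:bullseye AS build
RUN apt-get update && apt-get install -y \
    build-essential git bison flex libreadline-dev zlib1g-dev
RUN git clone --branch REL_13_STABLE --depth 1 \
        https://github.com/postgres/postgres.git /postgres \
 && git clone --branch stable13 --depth 1 \
        https://github.com/postgrespro/aqo.git /postgres/contrib/aqo
WORKDIR /postgres
# Some aqo branches ship a core patch; apply it if present.
RUN if [ -f contrib/aqo/aqo_pg13.patch ]; then \
        patch -p1 < contrib/aqo/aqo_pg13.patch; fi
RUN ./configure --prefix=/usr/local/pgsql && make -j"$(nproc)" \
 && make -C contrib/aqo install && make install
```

After that, add `aqo` to `shared_preload_libraries` and run `CREATE EXTENSION aqo;` in the running server.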

AQO service storage problems

One of the main aqo problems is the storage (the aqo_* relations). The extension writes the query and the learning result inside a (sub)transaction, before and after query execution. This causes several problems:

  1. Writing to the service relations changes the CommandCounter state. That cannot happen in parallel mode (currently we disable aqo in that case). Also, changing the CommandCounter state while partitioning triggers are executing leads to an infinite cycle of invalidation messages (this can be demonstrated with the alter_table.sql regression test). There are other problems as well.
  2. Parameterized plans share the same hash value. This induces aqo-related deadlocks that would not occur without aqo learning. It means that under high load we need to use the AQO_MODE_FORCED mode and train aqo before the load begins.

One of the Join Order Benchmark queries stops reducing cardinality error

In one case (and one case only), the cardinality error stabilizes without progressing towards 0. Here are the query and the plan after convergence. I also have the states of the aqo tables, which I can forward on request.

Here is one line of the explain output showing a non-converged estimated vs actual rows:

Hash Join (cost=96459.24..130001.90 rows=68198 width=29) (actual time=973.541..1415.390 rows=198750 loops=3)

The query:

EXPLAIN (analyze,verbose,buffers) SELECT MIN(chn.name) AS character, MIN(t.title) AS movie_with_american_producer FROM char_name AS chn, cast_info AS ci, company_name AS cn, company_type AS ct, movie_companies AS mc, role_type AS rt, title AS t WHERE ci.note like '%(producer)%' AND cn.country_code = '[us]' AND t.production_year > 1990 AND t.id = mc.movie_id AND t.id = ci.movie_id AND ci.movie_id = mc.movie_id AND chn.id = ci.person_role_id AND rt.id = ci.role_id AND cn.id = mc.company_id AND ct.id = mc.company_type_id;

The text format plan:

QUERY PLAN
Finalize Aggregate (cost=242898.15..242898.16 rows=1 width=64) (actual time=6689.777..6689.777 rows=1 loops=1)
Output: min(chn.name), min(t.title)
Buffers: shared hit=23037955
-> Gather (cost=242897.93..242898.14 rows=2 width=64) (actual time=6689.740..6689.811 rows=3 loops=1)
Output: (PARTIAL min(chn.name)), (PARTIAL min(t.title))
Workers Planned: 2
Workers Launched: 2
Buffers: shared hit=23037955
-> Partial Aggregate (cost=241897.93..241897.94 rows=1 width=64) (actual time=6670.368..6670.368 rows=1 loops=3)
Output: PARTIAL min(chn.name), PARTIAL min(t.title)
Buffers: shared hit=23037955
Worker 0: actual time=6663.069..6663.069 rows=1 loops=1
Buffers: shared hit=7855588
Worker 1: actual time=6658.810..6658.810 rows=1 loops=1
Buffers: shared hit=7632929
-> Nested Loop (cost=96460.50..241897.93 rows=1 width=33) (actual time=2452.152..6670.319 rows=3 loops=3)
Output: chn.name, t.title
Inner Unique: true
Buffers: shared hit=23037955
Worker 0: actual time=2371.185..6663.041 rows=2 loops=1
Buffers: shared hit=7855588
Worker 1: actual time=2539.235..6658.735 rows=4 loops=1
Buffers: shared hit=7632929
-> Nested Loop (cost=96460.36..241897.77 rows=1 width=37) (actual time=2452.140..6670.293 rows=3 loops=3)
Output: chn.name, ci.role_id, t.title
Inner Unique: true
Buffers: shared hit=23037933
Worker 0: actual time=2371.169..6663.020 rows=2 loops=1
Buffers: shared hit=7855583
Worker 1: actual time=2539.220..6658.703 rows=4 loops=1
Buffers: shared hit=7632920
-> Nested Loop (cost=96460.23..241897.62 rows=1 width=41) (actual time=2452.113..6670.247 rows=3 loops=3)
Output: chn.name, ci.role_id, mc.company_type_id, t.title
Inner Unique: true
Buffers: shared hit=23037911
Worker 0: actual time=2371.132..6662.976 rows=2 loops=1
Buffers: shared hit=7855578
Worker 1: actual time=2539.188..6658.652 rows=4 loops=1
Buffers: shared hit=7632911
-> Nested Loop (cost=96459.80..199048.11 rows=89456 width=29) (actual time=973.601..6596.097 rows=260701 loops=3)
Output: ci.person_role_id, ci.role_id, mc.company_type_id, t.title
Join Filter: (t.id = ci.movie_id)
Buffers: shared hit=23037869
Worker 0: actual time=968.453..6589.125 rows=263841 loops=1
Buffers: shared hit=7855569
Worker 1: actual time=976.696..6582.546 rows=264690 loops=1
Buffers: shared hit=7632894
-> Hash Join (cost=96459.24..130001.90 rows=68198 width=29) (actual time=973.541..1415.390 rows=198750 loops=3)
Output: mc.movie_id, mc.company_type_id, t.title, t.id
Inner Unique: true
Hash Cond: (mc.movie_id = t.id)
Buffers: shared hit=270485
Worker 0: actual time=968.404..1435.629 rows=203090 loops=1
Buffers: shared hit=90069
Worker 1: actual time=976.625..1420.650 rows=197823 loops=1
Buffers: shared hit=90479
-> Hash Join (cost=6990.11..39505.27 rows=391430 width=8) (actual time=49.950..343.319 rows=384599 loops=3)
Output: mc.movie_id, mc.company_type_id
Inner Unique: true
Hash Cond: (mc.company_id = cn.id)
Buffers: shared hit=54971
Worker 0: actual time=49.496..350.360 rows=393049 loops=1
Buffers: shared hit=18231
Worker 1: actual time=49.708..348.897 rows=389949 loops=1
Buffers: shared hit=18641
-> Parallel Seq Scan on public.movie_companies mc (cost=0.00..29661.37 rows=1087137 width=12) (actual time=0.143..82.673 rows=869710 loops=3)
Output: mc.id, mc.movie_id, mc.company_id, mc.company_type_id, mc.note
Buffers: shared hit=37478
Worker 0: actual time=0.111..81.862 rows=860027 loops=1
Buffers: shared hit=12390
Worker 1: actual time=0.136..85.262 rows=892083 loops=1
Buffers: shared hit=12800
-> Hash (cost=5932.46..5932.46 rows=84612 width=4) (actual time=49.380..49.380 rows=84843 loops=3)
Output: cn.id
Buckets: 131072 Batches: 1 Memory Usage: 4007kB
Buffers: shared hit=17433
Worker 0: actual time=48.959..48.959 rows=84843 loops=1
Buffers: shared hit=5811
Worker 1: actual time=49.206..49.206 rows=84843 loops=1
Buffers: shared hit=5811
-> Seq Scan on public.company_name cn (cost=0.00..5932.46 rows=84612 width=4) (actual time=0.110..35.890 rows=84843 loops=3)
Output: cn.id
Filter: ((cn.country_code)::text = '[us]'::text)
Rows Removed by Filter: 150154
Buffers: shared hit=17433
Worker 0: actual time=0.058..35.900 rows=84843 loops=1
Buffers: shared hit=5811
Worker 1: actual time=0.126..35.363 rows=84843 loops=1
Buffers: shared hit=5811
-> Hash (cost=67601.90..67601.90 rows=1749378 width=21) (actual time=914.259..914.259 rows=1749032 loops=3)
Output: t.title, t.id
Buckets: 2097152 Batches: 1 Memory Usage: 110019kB
Buffers: shared hit=215514
Worker 0: actual time=909.719..909.719 rows=1749032 loops=1
Buffers: shared hit=71838
Worker 1: actual time=917.516..917.516 rows=1749032 loops=1
Buffers: shared hit=71838
-> Seq Scan on public.title t (cost=0.00..67601.90 rows=1749378 width=21) (actual time=0.099..446.134 rows=1749032 loops=3)
Output: t.title, t.id
Filter: (t.production_year > 1990)
Rows Removed by Filter: 779280
Buffers: shared hit=215514
Worker 0: actual time=0.117..418.654 rows=1749032 loops=1
Buffers: shared hit=71838
Worker 1: actual time=0.059..457.994 rows=1749032 loops=1
Buffers: shared hit=71838
-> Index Scan using movie_id_cast_info on public.cast_info ci (cost=0.56..1.00 rows=1 width=12) (actual time=0.015..0.026 rows=1 loops=596250)
Output: ci.id, ci.person_id, ci.movie_id, ci.person_role_id, ci.note, ci.nr_order, ci.role_id
Index Cond: (ci.movie_id = mc.movie_id)
Filter: (ci.note ~~ '%(producer)%'::text)
Rows Removed by Filter: 34
Buffers: shared hit=22767384
Worker 0: actual time=0.015..0.025 rows=1 loops=203090
Buffers: shared hit=7765500
Worker 1: actual time=0.015..0.026 rows=1 loops=197823
Buffers: shared hit=7542415
-> Index Scan using char_name_pkey on public.char_name chn (cost=0.43..0.48 rows=1 width=20) (actual time=0.000..0.000 rows=0 loops=782104)
Output: chn.id, chn.name, chn.imdb_index, chn.imdb_id, chn.name_pcode_nf, chn.surname_pcode, chn.md5sum
Index Cond: (chn.id = ci.person_role_id)
Buffers: shared hit=42
Worker 0: actual time=0.000..0.000 rows=0 loops=263841
Buffers: shared hit=9
Worker 1: actual time=0.000..0.000 rows=0 loops=264690
Buffers: shared hit=17
-> Index Only Scan using company_type_pkey on public.company_type ct (cost=0.13..0.15 rows=1 width=4) (actual time=0.010..0.010 rows=1 loops=10)
Output: ct.id
Index Cond: (ct.id = mc.company_type_id)
Heap Fetches: 4
Buffers: shared hit=22
Worker 0: actual time=0.019..0.019 rows=1 loops=2
Buffers: shared hit=5
Worker 1: actual time=0.010..0.010 rows=1 loops=4
Buffers: shared hit=9
-> Index Only Scan using role_type_pkey on public.role_type rt (cost=0.14..0.15 rows=1 width=4) (actual time=0.006..0.006 rows=1 loops=10)
Output: rt.id
Index Cond: (rt.id = ci.role_id)
Heap Fetches: 4
Buffers: shared hit=22
Worker 0: actual time=0.008..0.008 rows=1 loops=2
Buffers: shared hit=5
Worker 1: actual time=0.006..0.006 rows=1 loops=4
Buffers: shared hit=9
Planning time: 11.438 ms
Execution time: 6698.929 ms
Plan Hash: -1203362165
Using aqo: true
(146 rows)

ERROR: could not find block containing chunk

The error above is raised when attempting to EXPLAIN ANALYZE a statement (shown below) that is being managed by aqo. If aqo.mode is set to 'disabled', then the error no longer appears and the EXPLAIN ANALYZE completes successfully. If aqo.mode is subsequently reset, the error reappears. This is currently reproducible.

In case there is some sort of interaction here, I'll mention that pg_stat_statements and apg_plan_mgmt were also overriding the ExecutorEnd hook in this test, with shared_preload_libraries listing them in the order pg_stat_statements,aqo,apg_plan_mgmt.

imdbload=# EXPLAIN (ANALYZE) SELECT MIN(n.name) AS member_in_charnamed_movie FROM cast_info AS ci, company_name AS cn, keyword AS k, movie_companies AS mc, movie_keyword AS mk, name AS n, title AS t WHERE k.keyword ='character-name-in-title' AND n.name LIKE '%Bert%' AND n.id = ci.person_id AND ci.movie_id = t.id AND t.id = mk.movie_id AND mk.keyword_id = k.id AND t.id = mc.movie_id AND mc.company_id = cn.id AND ci.movie_id = mc.movie_id AND ci.movie_id = mk.movie_id AND mc.movie_id = mk.movie_id;
ERROR: could not find block containing chunk 0x2b625af3cd80
imdbload=# set aqo.mode = 'disable';
ERROR: invalid value for parameter "aqo.mode": "disable"
HINT: Available values: intelligent, forced, controlled, learn, disabled.
imdbload=# set aqo.mode = 'disabled';
SET
imdbload=# EXPLAIN (ANALYZE) SELECT MIN(n.name) AS member_in_charnamed_movie FROM cast_info AS ci, company_name AS cn, keyword AS k, movie_companies AS mc, movie_keyword AS mk, name AS n, title AS t WHERE k.keyword ='character-name-in-title' AND n.name LIKE '%Bert%' AND n.id = ci.person_id AND ci.movie_id = t.id AND t.id = mk.movie_id AND mk.keyword_id = k.id AND t.id = mc.movie_id AND mc.company_id = cn.id AND ci.movie_id = mc.movie_id AND ci.movie_id = mk.movie_id AND mc.movie_id = mk.movie_id;
QUERY PLAN
Aggregate (cost=2938.32..2938.33 rows=1 width=32) (actual time=4747.865..4747.865 rows=1 loops=1)
-> Nested Loop (cost=2.71..2938.31 rows=1 width=15) (actual time=26.390..4744.284 rows=11538 loops=1)
-> Nested Loop (cost=2.28..2937.87 rows=1 width=27) (actual time=26.380..4718.974 rows=11538 loops=1)
-> Nested Loop (cost=1.86..2937.43 rows=1 width=31) (actual time=26.365..4683.451 rows=11538 loops=1)
-> Nested Loop (cost=1.43..2936.94 rows=1 width=23) (actual time=26.357..4668.745 rows=1972 loops=1)
-> Nested Loop (cost=1.00..2698.36 rows=528 width=12) (actual time=0.653..1276.244 rows=1038393 loops=1)
-> Nested Loop (cost=0.43..2662.48 rows=34 width=4) (actual time=0.644..39.356 rows=41840 loops=1)
-> Seq Scan on keyword k (cost=0.00..2626.12 rows=1 width=4) (actual time=0.628..10.811 rows=1 loops=1)
Filter: (keyword = 'character-name-in-title'::text)
Rows Removed by Filter: 134169
-> Index Scan using keyword_id_movie_keyword on movie_keyword mk (cost=0.43..36.02 rows=34 width=8) (actual time=0.012..22.448 rows=41840 loops=1)
Index Cond: (keyword_id = k.id)
-> Index Scan using movie_id_cast_info on cast_info ci (cost=0.56..0.91 rows=15 width=8) (actual time=0.004..0.026 rows=25 loops=41840)
Index Cond: (movie_id = mk.movie_id)
-> Index Scan using name_pkey on name n (cost=0.43..0.45 rows=1 width=19) (actual time=0.003..0.003 rows=0 loops=1038393)
Index Cond: (id = ci.person_id)
Filter: (name ~~ '%Bert%'::text)
Rows Removed by Filter: 1
-> Index Scan using movie_id_movie_companies on movie_companies mc (cost=0.43..0.47 rows=2 width=8) (actual time=0.004..0.006 rows=6 loops=1972)
Index Cond: (movie_id = ci.movie_id)
-> Index Only Scan using company_name_pkey on company_name cn (cost=0.42..0.44 rows=1 width=4) (actual time=0.003..0.003 rows=1 loops=11538)
Index Cond: (id = mc.company_id)
Heap Fetches: 11538
-> Index Only Scan using title_pkey on title t (cost=0.43..0.45 rows=1 width=4) (actual time=0.002..0.002 rows=1 loops=11538)
Index Cond: (id = ci.movie_id)
Heap Fetches: 11538
Planning time: 6.762 ms
Execution time: 4747.965 ms
(28 rows)
imdbload=#

Recommended way to delete queries

What is the recommended way to delete queries that a user doesn't want to track any more? Do you recommend just deleting unwanted rows from aqo_query_texts and aqo_queries, or is there an API for that?
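Absent a documented API, in aqo versions where the service relations are ordinary tables one approach is to delete the matching rows by query hash. A hedged sketch — the table and column names below follow older aqo releases and may differ in your version (newer releases may ship a dedicated SQL function for this, so check the extension's SQL script first); the hash value is the example `SQL Hash` reported by an EXPLAIN elsewhere in these issues:

```sql
-- Illustrative only: stop tracking one query by its hash.
-- Verify table/column names against your installed aqo version.
BEGIN;
DELETE FROM aqo_data
 WHERE fspace_hash IN (SELECT fspace_hash
                         FROM aqo_queries
                        WHERE query_hash = -1864849711);
DELETE FROM aqo_query_stat  WHERE query_hash = -1864849711;
DELETE FROM aqo_query_texts WHERE query_hash = -1864849711;
DELETE FROM aqo_queries     WHERE query_hash = -1864849711;
COMMIT;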

AQO state for IndexScan doesn't contain Index column

explore_query_plan.sql.zip

Index Scan using person_pkey on person (cost=0.29..3226.29 rows=9738 width=8) (actual rows=9738 loops=1) (AQO: cardinality=9738, error=0%, fss hash = -1150030445)

SELECT * FROM aqo_data WHERE fsspace_hash=-1150030445;
fspace_hash | fsspace_hash | nfeatures | features | targets
-------------+--------------+-----------+----------+---------
1916361181 | -1150030445 | 0 | | {0.01}

Index Only Scan using person_id_age_idx on public.person (cost=0.29..8.31 rows=1 width=8) (actual rows=1 loops=10) (AQO: cardinality=1, error=0%, fss hash = 1972255378)

SELECT * FROM aqo_data WHERE fsspace_hash=1972255378;
fspace_hash | fsspace_hash | nfeatures | features | targets
-------------+--------------+-----------+-------------------------+---------
1916361181 | 1972255378 | 1 | {{-11.512925464970229}} | {0}

Hook calling order

The copy_generic_path_info_hook implementation currently performs its own action first, and only at the end checks whether a prev_copy_generic_path_info_hook was defined, calling that previous hook last. The test and call of the previous hook should occur at the start of aqo_prev_copy_generic_path_info, so that execution follows the order specified in shared_preload_libraries whenever more than one extension overrides the hook.

A sequence of hook callbacks following the call-my-predecessor-first convention would call the hooks in the order specified in shared_preload_libraries.

Getting join cardinality convergence on a difficult query

The Join Order Benchmark contains the difficult query shown below. AQO was able to improve the cardinality estimates enough to improve this plan from 40 seconds to 30 seconds on an AWS r4.large, but as you can see from the plan below, the cardinality estimates are still quite far from the actuals at the upper levels of the plan after more than 10 iterations.

If you are able to get this plan to converge, can you describe how you were able to do it? What is the best known plan for this query?

I tried adding a pg_hint_plan rows hint at one point to see if this would help aqo get past a stuck point, but this made the plan much worse (60 sec) and again did not converge from that point after multiple iterations. Is it helpful or harmful to attempt to "help" aqo by specifying a rows hint, in general?

Thank you,

/Jim Finnerty

================================

imdbload=# explain (analyze) SELECT MIN(n.name) AS voicing_actress, MIN(t.title) AS jap_engl_voiced_movie FROM aka_name AS an, char_name AS chn, cast_info AS ci, company_name AS cn, info_type AS it, movie_companies AS mc, movie_info AS mi, name AS n, role_type AS rt, title AS t WHERE ci.note in ('(voice)', '(voice: Japanese version)', '(voice) (uncredited)', '(voice: English version)') AND cn.country_code ='[us]' AND it.info = 'release dates' AND n.gender ='f' AND rt.role ='actress' AND t.production_year > 2000 AND t.id = mi.movie_id AND t.id = mc.movie_id AND t.id = ci.movie_id AND mc.movie_id = ci.movie_id AND mc.movie_id = mi.movie_id AND mi.movie_id = ci.movie_id AND cn.id = mc.company_id AND it.id = mi.info_type_id AND n.id = ci.person_id AND rt.id = ci.role_id AND n.id = an.person_id AND ci.person_id = an.person_id AND chn.id = ci.person_role_id;
QUERY PLAN

Finalize Aggregate (cost=320321.28..320321.29 rows=1 width=64) (actual time=30947.238..30947.238 rows=1 loops=1)
-> Gather (cost=320321.06..320321.27 rows=2 width=64) (actual time=26312.904..30948.645 rows=3 loops=1)
Workers Planned: 2
Workers Launched: 2
-> Partial Aggregate (cost=319321.06..319321.07 rows=1 width=64) (actual time=27533.687..27533.687 rows=1 loops=3)
-> Hash Join (cost=266398.43..315653.05 rows=733602 width=32) (actual time=14696.038..26887.515 rows=586882 loops=3)
Hash Cond: (mi.info_type_id = it.id)
-> Nested Loop (cost=266396.00..315336.23 rows=115441 width=36) (actual time=14695.834..25412.125 rows=3447230 loops=3)
Join Filter: (t.id = mi.movie_id)
-> Nested Loop (cost=266395.57..307771.28 rows=3908 width=44) (actual time=14695.811..17634.932 rows=88661 loops=3)
-> Nested Loop (cost=266395.14..300561.72 rows=13115 width=52) (actual time=14694.376..17106.971 rows=32440 loops=3)
-> Nested Loop (cost=266394.71..294819.73 rows=7332 width=33) (actual time=14694.349..16690.333 rows=32462 loops=3)
-> Nested Loop (cost=266394.28..285195.83 rows=14323 width=37) (actual time=14694.315..16280.735 rows=34797 loops=3)
-> Merge Join (cost=266393.85..271532.82 rows=26327 width=16) (actual time=14694.278..15562.276 rows=63185 loops=3)
Merge Cond: (mc.movie_id = ci.movie_id)
-> Sort (cost=91113.37..92107.56 rows=397675 width=4) (actual time=2670.659..2864.283 rows=384599 loops=3)
Sort Key: mc.movie_id
Sort Method: external merge Disk: 4920kB
-> Hash Join (cost=7342.99..48688.15 rows=397675 width=4) (actual time=262.085..1904.540 rows=384599 loops=3)
Hash Cond: (mc.company_id = cn.id)
-> Parallel Seq Scan on movie_companies mc (cost=0.00..29661.37 rows=1087137 width=8) (actual time=0.224..667.425 rows=869710 loops=3)
-> Hash (cost=5932.46..5932.46 rows=85962 width=4) (actual time=260.817..260.817 rows=84843 loops=3)
Buckets: 131072 Batches: 2 Memory Usage: 2525kB
-> Seq Scan on company_name cn (cost=0.00..5932.46 rows=85962 width=4) (actual time=0.197..194.391 rows=84843 loops=3)
Filter: ((country_code)::text = '[us]'::text)
Rows Removed by Filter: 150154
-> Materialize (cost=175280.28..176661.11 rows=276166 width=12) (actual time=12018.184..12343.888 rows=305714 loops=3)
-> Sort (cost=175280.28..175970.70 rows=276166 width=12) (actual time=12018.176..12199.636 rows=276140 loops=3)
Sort Key: ci.movie_id
Sort Method: external merge Disk: 5888kB
-> Nested Loop (cost=0.56..145600.04 rows=276166 width=12) (actual time=0.708..11363.055 rows=276166 loops=3)
-> Seq Scan on role_type rt (cost=0.00..1.15 rows=1 width=4) (actual time=0.027..0.033 rows=1 loops=3)
Filter: ((role)::text = 'actress'::text)
Rows Removed by Filter: 11
-> Index Scan using role_id_cast_info on cast_info ci (cost=0.56..142837.23 rows=276166 width=16) (actual time=0.676..11244.881 rows=276166 loops=3)
Index Cond: (role_id = rt.id)
Filter: (note = ANY ('{(voice),"(voice: Japanese version)","(voice) (uncredited)","(voice: English version)"}'::text[]))
Rows Removed by Filter: 7175807
-> Index Scan using title_pkey on title t (cost=0.43..0.52 rows=1 width=21) (actual time=0.010..0.010 rows=1 loops=189554)
Index Cond: (id = mc.movie_id)
Filter: (production_year > 2000)
Rows Removed by Filter: 0
-> Index Only Scan using char_name_pkey on char_name chn (cost=0.43..0.67 rows=1 width=4) (actual time=0.010..0.010 rows=1 loops=104391)
Index Cond: (id = ci.person_role_id)
Heap Fetches: 27791
-> Index Scan using name_pkey on name n (cost=0.43..0.78 rows=1 width=19) (actual time=0.011..0.011 rows=1 loops=97386)
Index Cond: (id = ci.person_id)
Filter: ((gender)::text = 'f'::text)
Rows Removed by Filter: 0
-> Index Only Scan using person_id_aka_name on aka_name an (cost=0.42..0.52 rows=3 width=4) (actual time=0.010..0.014 rows=3 loops=97320)
Index Cond: (person_id = n.id)
Heap Fetches: 55315
-> Index Scan using movie_id_movie_info on movie_info mi (cost=0.43..1.45 rows=39 width=8) (actual time=0.013..0.061 rows=39 loops=265983)
Index Cond: (movie_id = mc.movie_id)
-> Hash (cost=2.41..2.41 rows=1 width=4) (actual time=0.058..0.058 rows=1 loops=3)
Buckets: 1024 Batches: 1 Memory Usage: 9kB
-> Seq Scan on info_type it (cost=0.00..2.41 rows=1 width=4) (actual time=0.023..0.052 rows=1 loops=3)
Filter: ((info)::text = 'release dates'::text)
Rows Removed by Filter: 112
Planning time: 203.974 ms
Execution time: 30954.839 ms
SQL Hash: -1864849711, Plan Hash: 1163570348
Using aqo: true
(63 rows)

query_id usage in bundle of AQO and pg_stat_statements (PGSS) extensions

In stable 14 and beyond we intend to use the same query_id in both extensions.
While adding AQO TAP tests for this feature we found the following issues:

  1. AQO stores non-parameterized query texts.
  2. In the case of EXPLAIN ANALYZE, the query_id values in the two extensions differ.
  3. We should also check the case of queries inside stored procedures.
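One way to eyeball issue 2 is to compare the id each extension records for the same statement after running it once plainly and once under EXPLAIN ANALYZE. A hedged sketch — pg_stat_statements exposes a `queryid` column, but the AQO column naming varies across versions (`query_hash` in older releases), so adjust to your installation:

```sql
-- Id recorded by pg_stat_statements for the statement.
SELECT queryid, calls, query
  FROM pg_stat_statements
 WHERE query LIKE 'SELECT MIN(n.name)%';

-- Id recorded by aqo for (nominally) the same statement.
-- Older aqo releases name this column query_hash.
SELECT query_hash, query_text
  FROM aqo_query_texts
 WHERE query_text LIKE 'SELECT MIN(n.name)%';
```

If the two ids differ between the plain and EXPLAIN ANALYZE runs, issue 2 is reproduced.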

License?

There's no license information for this project, which makes it rather hard for people to use if they work in an environment where these things are of concern.

Could you add a license file somewhere under which this code is released?

WARNING: Unexpected number of features for hash

The following warnings were issued while running EXPLAIN ANALYZE on the query shown below from the Join Order Benchmark. This also caused postgres to crash. I'll delete the data and retry.

imdbload=# explain (hashes, verbose, analyze, buffers) SELECT MIN(n.name) AS voicing_actress, MIN(t.title) AS jap_engl_voiced_movie FROM aka_name AS an, char_name AS chn, cast_info AS ci, company_name AS cn, info_type AS it, movie_companies AS mc, movie_info AS mi, name AS n, role_type AS rt, title AS t WHERE ci.note in ('(voice)', '(voice: Japanese version)', '(voice) (uncredited)', '(voice: English version)') AND cn.country_code ='[us]' AND it.info = 'release dates' AND n.gender ='f' AND rt.role ='actress' AND t.production_year > 2000 AND t.id = mi.movie_id AND t.id = mc.movie_id AND t.id = ci.movie_id AND mc.movie_id = ci.movie_id AND mc.movie_id = mi.movie_id AND mi.movie_id = ci.movie_id AND cn.id = mc.company_id AND it.id = mi.info_type_id AND n.id = ci.person_id AND rt.id = ci.role_id AND n.id = an.person_id AND ci.person_id = an.person_id AND chn.id = ci.person_role_id;
WARNING: unexpected number of features for hash (-101880755, -778540527): expected 0 features, obtained 8
WARNING: unexpected number of features for hash (-101880755, -1173058144): expected 7 features, obtained 6
WARNING: unexpected number of features for hash (-101880755, 1726969303): expected 9 features, obtained 8
WARNING: unexpected number of features for hash (-101880755, 1200086649): expected 8 features, obtained 6

Document dependency of aqo and pg_hint_plan

The relationship of pg_hint_plan Rows hints and aqo should be explored and documented, and if there is a dependency of the order of aqo and pg_hint_plan in shared_preload_libraries, this should be documented also.

How can i move aqo data from one db to another?

Our db have huge amount queries. For some postgres make good plans, for some - bad.
I want to use aqo on copy database. After it gather enough data for changing only required plans of queries, i want to copy aqo data to production db.
But i saw only views with names aqo*. Only views?
Can i copy views to another bd& And how?
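In releases where the aqo_* objects are plain tables rather than views over in-memory storage (newer releases moved the storage out of the catalog, so this may not apply to your version), the learned data could in principle be moved with psql's \copy. A hedged sketch, with table names as they appear in older releases:

```sql
-- On the source database: export the learned model.
\copy aqo_queries     TO 'aqo_queries.csv'     CSV
\copy aqo_query_texts TO 'aqo_query_texts.csv' CSV
\copy aqo_data        TO 'aqo_data.csv'        CSV

-- On the target database (with the aqo extension already installed):
\copy aqo_queries     FROM 'aqo_queries.csv'     CSV
\copy aqo_query_texts FROM 'aqo_query_texts.csv' CSV
\copy aqo_data        FROM 'aqo_data.csv'        CSV
```

If the objects really are views in your version, check what relations or functions they are defined over before attempting this.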
