awsdocs / amazon-redshift-developer-guide

This is the documentation for the Amazon Redshift Developer Guide

License: Other

amazon-redshift-developer-guide's Introduction

NOTICE

This repository is archived, read-only, and no longer updated. For more information, read the announcement on the AWS News Blog.

You can find up-to-date AWS technical documentation on the AWS Documentation website, where you can also submit feedback and suggestions for improvement.

amazon-redshift-developer-guide's Issues

Small Spelling issue

The following sentence here:

The following example loads data from as Amazon EMR cluster

Did we mean to say:

The following example loads data from an Amazon EMR cluster

Thank you

description for table column position in stl_load_errors page is wrong

On this page: https://docs.aws.amazon.com/redshift/latest/dg/r_STL_LOAD_ERRORS.html

The description for table column position is "Position of the error in the field." when it should be "Position of the error in the raw_line" because on this page "the field" means "the table column", not "the raw_line".

queues and concurrency level

Hey, is the following statement true? I was told that the total concurrency for WLM, regardless of the number of queues, is 50. Based on this statement, can WLM support 400 (= 8 * 50) queries?

"You can define up to eight queues. Each queue can be configured with a maximum concurrency level of 50"
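The two readings the question contrasts can be put side by side with a little arithmetic. This is only a sketch of the poster's reasoning, not a statement of how WLM behaves: the queue count and per-queue level come from the quoted sentence, while `ASSUMED_TOTAL_CAP` and both function names are hypothetical and stand in for the figure the poster "was told".

```python
# Figures taken from the quoted documentation sentence.
MAX_QUEUES = 8
MAX_PER_QUEUE = 50

# Assumption: the cluster-wide figure the poster was told about.
ASSUMED_TOTAL_CAP = 50


def total_if_limits_multiply(queues: int = MAX_QUEUES,
                             per_queue: int = MAX_PER_QUEUE) -> int:
    """Reading 1: every queue can reach its own maximum at the same time."""
    return queues * per_queue


def total_if_shared_cap(queues: int = MAX_QUEUES,
                        per_queue: int = MAX_PER_QUEUE,
                        cap: int = ASSUMED_TOTAL_CAP) -> int:
    """Reading 2: the per-queue levels share one overall cap."""
    return min(queues * per_queue, cap)


if __name__ == "__main__":
    print("limits multiply:", total_if_limits_multiply())
    print("shared cap:", total_if_shared_cap())
```

The gap between the two results is what the question asks the documentation to resolve.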

Using COPY to Load Data and Permissions

From reading the documentation:

I see the following: The role must have, at a minimum, the permissions listed in IAM Permissions for COPY, UNLOAD, and CREATE LIBRARY.

It would be nice to have a note or pointer on each of these pages mentioning that, if the data is encrypted, the KMS key policy must also grant access and the IAM user or role needs KMS permissions as well.

https://docs.aws.amazon.com/redshift/latest/dg/loading-data-access-permissions.html
https://docs.aws.amazon.com/redshift/latest/dg/c_loading-encrypted-files.html

proof-of-concept-playbook.html is missing.

I found a little issue, but I didn't find the doc source.

Redshift tutorial links to s3 demo file with bad permissions that don't allow access

amazon-redshift-developer-guide/doc_source/tutorial-loading-data-download-files.md links to this URL: https://s3.amazonaws.com/awssampledb/LoadingDataSampleFiles.zip

... as part of the Tutorial. However, the S3 .zip file has file-level or bucket-level permissions that do not allow the sample files to be downloaded.

Vacuum Delete not working even if Background vacuum and manual vacuum command is executed

We have found that the vacuum delete operation, under full vacuum or the vacuum command, does not work when some orphan transaction IDs are still active in the cluster.
These transaction IDs can be found running before the vacuum is executed, and their presence causes the vacuum delete operation to be skipped in order to maintain the integrity of the data that the active transaction might be using.
Using the SQL below, we can find the PID and transaction that might still be active:

select *,datediff(s,txn_start,getdate())/86400||' days '||datediff(s,txn_start,getdate())%86400/3600||' hrs '||datediff(s,txn_start,getdate())%3600/60||' mins '||datediff(s,txn_start,getdate())%60||' secs'
from svv_transactions where lockable_object_type='transactionid' and pid<>pg_backend_pid() order by 3;

This is important because if we run vacuum on a table after a delete operation while other transactions are already running in the cluster, vacuum delete will not execute and the rows marked for deletion are not removed. As a result, the next phase of the vacuum, i.e., vacuum sort, will run longer, since it re-sorts all the rows, including the rows marked for deletion. This delays the ETLs.
A possible way to avoid this is to use truncate and load; if that is not possible, make sure nothing is running in parallel when the vacuum is started.
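The elapsed-time expression in the query above splits a number of seconds into days, hours, minutes, and seconds using integer division and modulo. A minimal Python sketch of the same arithmetic follows; the function name `format_elapsed` is hypothetical, and the sketch assumes a non-negative whole number of seconds, as `datediff(s, txn_start, getdate())` would return for a transaction that has already started.

```python
def format_elapsed(total_seconds: int) -> str:
    """Mirror the day/hour/minute/second breakdown used in the SQL above."""
    days = total_seconds // 86400            # 86400 seconds per day
    hours = total_seconds % 86400 // 3600    # leftover seconds -> whole hours
    minutes = total_seconds % 3600 // 60     # leftover seconds -> whole minutes
    seconds = total_seconds % 60             # remaining seconds
    return f"{days} days {hours} hrs {minutes} mins {seconds} secs"


if __name__ == "__main__":
    # 1 day + 2 hours + 3 minutes + 4 seconds = 93784 seconds
    print(format_elapsed(93784))
```

A transaction whose result shows a large number of days is a likely candidate for the orphan transactions described above.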

The last column, "avg_request_parallelism" appears twice https://docs.aws.amazon.com/redshift/latest/dg/r_SVL_S3QUERY.html

The last column, "avg_request_parallelism", appears twice on the current doc page for the SVL_S3QUERY system log view. The link to the page is: https://docs.aws.amazon.com/redshift/latest/dg/r_SVL_S3QUERY.html

It looks like a simple editing typo. It would be nice to correct it when the team gets a chance.

Switch data type name and alias for numeric and decimal

This page: https://docs.aws.amazon.com/redshift/latest/dg/c_Supported_data_types.html (GitHub: https://github.com/awsdocs/amazon-redshift-developer-guide/blob/master/doc_source/c_Supported_data_types.md ) shows the DECIMAL data type having alias NUMERIC. However, when you create decimal / numeric columns in the DB, the canonical name returned by the DB is "NUMERIC". Maybe switch the name and alias for this type in the docs?

dev=# create table brad_loves_numbers (dec DECIMAL(8,3), num DECIMAL(10,5));
CREATE TABLE

dev=# \d brad_loves_numbers
 Table "public.brad_loves_numbers"
 Column |     Type      | Modifiers
--------+---------------+-----------
 dec    | numeric(8,3)  |
 num    | numeric(10,5) |

dev=# select column_name, data_type from information_schema.columns where table_name LIKE 'brad_loves_numbers';
 column_name | data_type
-------------+-----------
 num         | numeric
 dec         | numeric
(2 rows)

Unsupported Type

I'm getting an error while creating a table with the SUPER data type.

[Amazon](500310) Invalid operation: Column "table_name.column_name" has unsupported type "super".

I'm running the same script I used to run to create my database, but as of cluster version 1.0.23412 it no longer shows as supported when I try to use this data type.

service_class_name missing from doc'd columns in stl_wlm_query

Clicking the LoadingDataSampleFiles.zip link leads to AccessDenied error.

Unable to download the LoadingDataSampleFiles.zip file that is referred to on: https://docs.aws.amazon.com/redshift/latest/dg/tutorial-loading-data-download-files.html

When clicking the link, the error returned is:

Error is consistent across multiple browsers and ISPs.

Note about DECIMAL type limitation not clear for NUMERIC

Hi,
The Note included in this section states a limitation for the DECIMAL data type, but it is unclear whether the limitation also applies to the NUMERIC data type.
Even though the docs indicate that both types are equivalent, a reader could conclude that the limitation can be avoided by using the NUMERIC data type.
Regards

Is_integer function documentation needs update

According to the is_integer function's documentation, the function accepts a column or a SUPER value. We ran into an issue when we passed it a Varchar type field. Here is the code we ran:

CREATE TEMP TABLE t(s char(100));
INSERT INTO t VALUES ('5');
SELECT s, is_integer(s) FROM t;

description for stl_query.starttime is wrong

It is "Time in UTC that the query started executing", but as the query below shows, the time is actually before the query is put into a service class, let alone started executing:

select stl_query.starttime, stl_wlm_query.*
from stl_query
join stl_wlm_query
    using(query)
limit 1

Please change to "Time in UTC that the query started processing", "Time in UTC that the query was accepted", or be vague like other Redshift doc pages: "Time in UTC that the query started".

Similarly for endtime.

INTERVAL word is missing

Please add the word INTERVAL to this list.

SHOW PROCEDURE not displaying procedure definition

I can create (and replace) a stored procedure correctly, but the definition can't be retrieved by using SHOW PROCEDURE sp_name or SHOW PROCEDURE sp_name(arg_name arg_type,...)

Only the following is displayed:
0 rows affected
SHOW executed successfully

I can confirm that it exists with the following query, but how can I see the definition?

SELECT *
FROM PG_PROC_INFO
WHERE proowner<>1

`ALTER FUNCTION` is missing

The following SQL seems to be working, but it is undocumented.

ALTER FUNCTION function_name OWNER TO new_owner
