This project aimed to sharpen my data engineering skills in pipeline best practices using dbt and Snowflake. dbt is a tool for building and documenting the data pipeline process, from raw data collection through transformation to the data mart layer.
- Instantiate a dbt project directory named `digitalskola_climate`
- Connect to Snowflake
- Classify models into staging, intermediate, and mart layers in `digitalskola_climate/dbt_project.yml`, as sketched below
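A minimal sketch of what that layer configuration might look like in `dbt_project.yml`; the materializations and schema names are assumptions, not taken from this project:

```yaml
# dbt_project.yml (excerpt) -- hypothetical layer configuration
models:
  digitalskola_climate:
    staging:
      +materialized: view
      +schema: staging
    intermediate:
      +materialized: view
      +schema: intermediate
    mart:
      +materialized: table
      +schema: mart
```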
- Configure `schema.yml` and `sources.yml`
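For illustration, a `schema.yml` declaring a model and its generic tests might look like this; the model and column names are placeholders (a `sources.yml` sketch appears under the sources step further down):

```yaml
# models/staging/schema.yml -- hypothetical model declarations
version: 2

models:
  - name: stg_climate
    description: "Staged climate observations"
    columns:
      - name: station_id
        description: "Weather station identifier"
        tests:
          - not_null
          - unique
```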
- Run and learn the behaviours of `dbt run -s <model>`, `dbt build`, `dbt test`, and `dbt deps`, summarized below
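In short, these commands behave as follows (the model name is a placeholder):

```sh
dbt run -s my_model   # build only the selected model
dbt test              # run the tests declared in schema.yml
dbt build             # run models, tests, seeds, and snapshots in DAG order
dbt deps              # install packages listed in packages.yml
```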
- Generate documentation and run the UI
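The two commands for this step:

```sh
dbt docs generate   # compile the project and build the docs catalog
dbt docs serve      # serve the documentation site locally
```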
- Implement best practices using macros from `dbt_utils`, such as the `star` macro (see the sketch below)
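A minimal sketch of the `star` macro in a model, assuming `dbt_utils` is listed in `packages.yml` and installed with `dbt deps`; the model and column names are placeholders:

```sql
-- models/staging/stg_climate.sql -- hypothetical model
-- dbt_utils.star() expands to every column of the relation except those listed
select
    {{ dbt_utils.star(from=ref('raw_climate'), except=['_loaded_at']) }}
from {{ ref('raw_climate') }}
```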
- Create a Python virtual environment: `python3 -m venv env`
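Remember to activate the environment before installing anything into it:

```sh
source env/bin/activate   # on Windows: env\Scripts\activate
```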
- Install the dependencies from `requirements.txt`: `python3 -m pip install -r requirements.txt`
- Initialize the dbt project with `dbt init`. The account identifier should be `<orgname>-<account_name>`
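When `dbt init` prompts for the Snowflake connection, the answers look roughly like this (exact prompts vary by dbt version; all values are placeholders):

```text
account: myorg-myaccount    # <orgname>-<account_name>, without any https:// prefix
user: MY_USER
warehouse: MY_WH
database: DIGITALSKOLA_CLIMATE
schema: PUBLIC
threads: 4
```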
- Choose key-pair auth as a more secure way to authenticate
- Generate a key pair and register the public key for the target user in Snowflake: https://docs.snowflake.com/en/user-guide/key-pair-auth
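Following the Snowflake guide linked above, the key pair is generated with OpenSSL and the public key is attached to the user; the user name is a placeholder:

```sh
# generate an encrypted private key, then derive its public key
openssl genrsa 2048 | openssl pkcs8 -topk8 -inform PEM -out rsa_key.p8
openssl rsa -in rsa_key.p8 -pubout -out rsa_key.pub
```

Then, in Snowflake, set the public key (paste the contents of `rsa_key.pub` without the PEM header/footer lines):

```sql
ALTER USER my_user SET RSA_PUBLIC_KEY='MIIBIjANBgkq...';
```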
- Run your first model: `dbt run -s my_first_dbt_model`
- If the database is not found, don't forget to create it first: `CREATE DATABASE <DB_NAME>;`
- Set up `profiles.yml`: https://docs.getdbt.com/docs/core/connect-data-platform/snowflake-setup
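A minimal sketch of a `profiles.yml` for Snowflake with key-pair auth; the target, role, warehouse, and environment-variable names are assumptions:

```yaml
# ~/.dbt/profiles.yml -- hypothetical profile
digitalskola_climate:
  target: dev
  outputs:
    dev:
      type: snowflake
      account: "{{ env_var('SNOWFLAKE_ACCOUNT') }}"   # <orgname>-<account_name>
      user: "{{ env_var('SNOWFLAKE_USER') }}"
      private_key_path: "{{ env_var('SNOWFLAKE_PRIVATE_KEY_PATH') }}"
      private_key_passphrase: "{{ env_var('SNOWFLAKE_PRIVATE_KEY_PASSPHRASE') }}"
      role: TRANSFORMER
      warehouse: MY_WH
      database: DIGITALSKOLA_CLIMATE
      schema: PUBLIC
      threads: 4
```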
- Set up sources: https://docs.getdbt.com/docs/build/sources
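For illustration, a `sources.yml` pointing at the raw schema might look like this; the source, schema, and table names are placeholders:

```yaml
# models/staging/sources.yml -- hypothetical source declaration
version: 2

sources:
  - name: raw
    database: DIGITALSKOLA_CLIMATE
    schema: RAW
    tables:
      - name: climate
```

A model can then reference the table with `{{ source('raw', 'climate') }}` instead of hard-coding its name.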
- Open `profiles.yml`
- For confidential params/keys, replace the value with `"{{ env_var('VAR_NAME') }}"`
- Run the command `export VAR_NAME=<confidential_value>` before invoking dbt (see the example below)
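For example, with the environment-variable names assumed in the `profiles.yml` sketch above:

```sh
export SNOWFLAKE_ACCOUNT=myorg-myaccount
export SNOWFLAKE_USER=MY_USER
export SNOWFLAKE_PRIVATE_KEY_PATH="$HOME/.ssh/rsa_key.p8"
export SNOWFLAKE_PRIVATE_KEY_PASSPHRASE='my-passphrase'
dbt debug   # verify that the connection picks up the variables
```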