snowflake-labs / snowpark-python-demos

This repository provides various demos/examples of using Snowpark for Python.

License: Apache License 2.0

Jupyter Notebook 95.59% Python 4.39% PLpgSQL 0.01%
python snowpark dataengineering datascience machine-learning

snowpark-python-demos's Introduction

Snowpark For Python Demos

This repository provides various demos/examples of using Snowpark for Python. Please navigate to each of the subfolders to learn more about a specific demo/example.

Snowpark For Python Overview

The Snowpark for Python library provides an intuitive API for querying and processing data using DataFrames. With it, you can build applications that process data in Snowflake without first moving the data out of Snowflake. The library also lets data application developers run complex transformations within Snowflake (using User-Defined Functions, User-Defined Table Functions, and Stored Procedures) while taking advantage of the built-in unlimited scalability, performance, governance, and security features.

Snowpark For Python: API Source Code | Developer Guide | API Reference

Snowpark For Python QuickStart Guides

Developer Resources

For more resources, please visit Snowpark Day.

snowpark-python-demos's People

Contributors

ccarrero-sf, cginther-snowflake, fjkattan, iamontheinet, indexseek, jdanielmyers, jfielding1, sfc-gh-dkaufman, sfc-gh-ejohnson, sfc-gh-ghernandez, sfc-gh-imehaddi, sfc-gh-mgorkow, sfc-gh-mstellwall, sfc-gh-praj, sfc-gh-scoombes, sfc-gh-skhara, sfc-gh-twhite, sfc-gh-vbatra, sfc-gh-vshiv, vinodhini-sd

snowpark-python-demos's Issues

Argument mismatch error

I am trying to wrap the model training in a stored procedure in the customer spend prediction demo.
However, I get the error below and have been unable to debug it.
image

Below is the code for the model training definition and SP registration
image

No Loading of Data in External table

In the retail-churn-analytics demo, the data is not loaded from the S3 bucket into the external tables that are created.

CREATE OR REPLACE EXTERNAL TABLE SRC_CUSTOMER
(CUSTOMER_ID VARCHAR(40) as (value:c1::varchar),
CREATED_DT DATE as (value:c2::date),
CITY VARCHAR(40) as (value:c3::varchar),
STATE VARCHAR(2) as (value:c4::varchar),
FAV_DELIVERY_DAY VARCHAR(40) as (value:c5::varchar),
REFILL NUMBER(38,0) as (value:c6::integer),
DOOR_DELIVERY NUMBER(38,0) as (value:c7::integer),
PAPERLESS NUMBER(38,0) as (value:c8::integer),
CUSTOMER_NAME VARCHAR(40) as (value:c9::varchar),
RETAINED NUMBER(38,0) as (value:c10::integer)
)
LOCATION = @churn_source_data/customer/
REFRESH_ON_CREATE = TRUE
AUTO_REFRESH = TRUE
FILE_FORMAT = ( TYPE = CSV SKIP_HEADER=1);

This is the exact syntax I used.
@iamontheinet, could you please advise?

Retail churn analytics data source is incorrect

The specified data source does not contain the fields referenced in the code:

CREATE OR REPLACE EXTERNAL TABLE SRC_CUSTOMER
(CUSTOMER_ID VARCHAR(40) as (value:c1::varchar),
CREATED_DT DATE as (value:c2::date),
CITY VARCHAR(40) as (value:c3::varchar),
STATE VARCHAR(2) as (value:c4::varchar),
FAV_DELIVERY_DAY VARCHAR(40) as (value:c5::varchar),
REFILL NUMBER(38,0) as (value:c6::integer),
DOOR_DELIVERY NUMBER(38,0) as (value:c7::integer),
PAPERLESS NUMBER(38,0) as (value:c8::integer),
CUSTOMER_NAME VARCHAR(40) as (value:c9::varchar),
RETAINED NUMBER(38,0) as (value:c10::integer)
)

The datasets for this demo were generated from the Kaggle dataset referenced below.

Reference: https://www.kaggle.com/uttamp/store-data

batch_predict_roi UDF input type error

Hello all,
In the Snowpark_For_Python.ipynb demo, the batch_predict_roi UDF returns the error below. I assume this is because the function expects a pandas DataFrame:
def batch_predict_roi(budget_allocations_df: PandasDataFrame[int, int, int, int]) -> PandasSeries[float]:
but it receives an array instead. Any help in clearing this up would be appreciated.

https://github.com/Snowflake-Labs/snowpark-python-demos/blob/main/Advertising-Spend-ROI-Prediction/Snowpark_For_Python.ipynb

Failed to execute query [queryID: 01a80045-0004-3087-002b-9e87000e507e]
SELECT "SEARCH_ENGINE", "SOCIAL_MEDIA", "VIDEO", "EMAIL", batch_predict_roi(array_construct("SEARCH_ENGINE", "SOCIAL_MEDIA", "VIDEO", "EMAIL")) AS "PREDICTED_ROI" FROM ( SELECT * FROM ( VALUES (250000 :: INT, 250000 :: INT, 200000 :: INT, 450000 :: INT), (500000 :: INT, 500000 :: INT, 500000 :: INT, 500000 :: INT), (8500 :: INT, 9500 :: INT, 2000 :: INT, 500 :: INT) AS SNOWPARK_TEMP_TABLE_H7ZD6LZPZ4("SEARCH_ENGINE", "SOCIAL_MEDIA", "VIDEO", "EMAIL"))) LIMIT 10
001044 (42P13): SQL compilation error: error line 1 at position 58
Invalid argument types for function 'BATCH_PREDICT_ROI': (ARRAY)
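
A possible reading of this error, offered as a sketch rather than a confirmed diagnosis: the type hint PandasDataFrame[int, int, int, int] suggests a function that takes four integer arguments, while the failing query passes a single ARRAY built with array_construct. The snippet below only builds the two SQL call fragments so the difference is easy to see. The helper name udf_call_sql is hypothetical and is not part of Snowpark or the demo notebook.

```python
def udf_call_sql(udf_name, columns, wrap_in_array=False):
    """Build the SQL fragment that invokes a UDF on the given columns.

    Hypothetical helper for illustration only; it is not a Snowpark API.
    """
    quoted = ", ".join(f'"{name}"' for name in columns)
    if wrap_in_array:
        # One ARRAY argument: the shape that appears in the failing query.
        return f"{udf_name}(array_construct({quoted}))"
    # One argument per column: the shape a four-input signature would accept.
    return f"{udf_name}({quoted})"


CHANNELS = ["SEARCH_ENGINE", "SOCIAL_MEDIA", "VIDEO", "EMAIL"]

failing = udf_call_sql("batch_predict_roi", CHANNELS, wrap_in_array=True)
column_wise = udf_call_sql("batch_predict_roi", CHANNELS)

print(failing)
print(column_wise)
```

Whether the right fix is to pass the four columns individually or to register the UDF with a single array input is not settled by the issue; the sketch only shows the mismatch between the two call shapes.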

Readme setup instructions `conda`

For https://github.com/Snowflake-Labs/snowpark-python-demos/tree/main/Advertising-Spend-ROI-Prediction
The setup instructions say to run pip install conda and then conda ..., but doing so produces the following error:

conda
ERROR: The install method you used for conda--probably either `pip install conda`
or `easy_install conda`--is not compatible with using conda as an application.
If your intention is to install conda as a standalone application, currently
supported install methods include the Anaconda installer and the miniconda
installer.  You can download the miniconda installer from
https://conda.io/miniconda.html.

The instructions should therefore be updated to drop pip install conda and install conda via the Miniconda installer instead.

UDF Error with Credit Card Fraud Detection

I am working through the credit card fraud detection Snowpark exercises. Everything works except that I get an error when I run a query that uses the new UDF.

Failed Query In Snowsight:
SELECT TRANSACTION_ID, TX_DATETIME, CUSTOMER_ID, TERMINAL_ID, TX_AMOUNT ,detect_fraud_batch_udf(TX_AMOUNT,TX_DURING_WEEKEND, TX_DURING_NIGHT, CUST_CNT_TX_1, CUST_AVG_AMOUNT_1, CUST_CNT_TX_7, CUST_AVG_AMOUNT_7, CUST_CNT_TX_30,CUST_AVG_AMOUNT_30, NB_TX_WINDOW_1, TERM_RISK_1, NB_TX_WINDOW_7,TERM_RISK_7, NB_TX_WINDOW_30,TERM_RISK_30) AS FRAUD_PROB
FROM CUSTOMER_TRX_FRAUD_FEATURES
WHERE TX_DATETIME > '2019-07-15 00:00:00' LIMIT 10;

100357 (P0000): Python Interpreter Error:
Traceback (most recent call last):
File "_udf_code.py", line 32, in compute
File "_udf_code.py", line 21, in wrapper
File "/var/folders/ck/ll2bz1_s3ng7w67zf6mdvqh40000gn/T/ipykernel_77660/546210058.py", line 17, in detect_fraud_batch
File "/Users/hayan/opt/anaconda3/envs/snowpark_070/lib/python3.8/site-packages/cachetools/__init__.py", line 641, in wrapper
File "/var/folders/ck/ll2bz1_s3ng7w67zf6mdvqh40000gn/T/ipykernel_77660/546210058.py", line 10, in read_file
File "/usr/lib/python_udf/5dd4a97c20bf6b6243e739c66c4fbfa80ab53172655f7f15cf1c55d0f462ae66/lib/python3.8/site-packages/joblib/numpy_pickle.py", line 577, in load
obj = _unpickle(fobj)
File "/usr/lib/python_udf/5dd4a97c20bf6b6243e739c66c4fbfa80ab53172655f7f15cf1c55d0f462ae66/lib/python3.8/site-packages/joblib/numpy_pickle.py", line 506, in _unpickle
obj = unpickler.load()
File "/usr/lib/python_udf/5dd4a97c20bf6b6243e739c66c4fbfa80ab53172655f7f15cf1c55d0f462ae66/lib/python3.8/pickle.py", line 1212, in load
dispatch[key[0]](self)
KeyError: 255
in function DETECT_FRAUD_BATCH_UDF with handler compute
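
The traceback ends inside the Python-level unpickler in pickle.py, which looks up each opcode byte in a dispatch table. A KeyError of 255 therefore means the byte 0xFF appeared where an opcode was expected. The issue does not establish why (a corrupted upload, a different file format, or a compressed file are all only guesses), so the sketch below is a diagnostic aid and not a fix. The function names sniff_serialized_file and pure_python_load are hypothetical, and pickle._Unpickler is a CPython implementation detail used here only to mirror the traceback.

```python
import gzip
import io
import pickle


def sniff_serialized_file(head: bytes) -> str:
    """Guess the file type from its leading bytes (hypothetical helper)."""
    if head[:2] == b"\x1f\x8b":
        return "gzip"    # gzip magic number
    if head[:1] == b"\x80":
        return "pickle"  # PROTO opcode that opens protocol 2+ pickles
    return "unknown"


def pure_python_load(data: bytes):
    """Unpickle with the pure-Python unpickler, as in the traceback above."""
    return pickle._Unpickler(io.BytesIO(data)).load()


# Stand-in for a serialized model file; the real file is not available here.
model_bytes = pickle.dumps({"weights": [0.1, 0.2]})

print(sniff_serialized_file(model_bytes))                 # pickle
print(sniff_serialized_file(gzip.compress(model_bytes)))  # gzip

try:
    # A stream whose first byte is 0xFF, which is not a pickle opcode.
    pure_python_load(b"\xff" + model_bytes)
except KeyError as exc:
    print("KeyError:", exc)  # KeyError: 255
```

Running the sniffing function on the first few bytes of the staged model file would at least show whether the UDF is reading the bytes it expects.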

Snowflake BUILD 2022: Sentiment Analysis Demo notebook is incomplete

@sfc-gh-scoombes

After creating the various stages and uploading two files to them, the notebook goes straight to querying the table TRAINING_DATA, but this table was never created. Am I missing something?

# create the stage for python and model data
session.sql('create stage if not exists scratch.raw_data').collect()
session.sql('create stage if not exists scratch.model_data').collect()
session.sql('create stage if not exists scratch.python_load').collect()

# create the directory stage for the data
session.sql('create stage if not exists scratch.raw_data_stage directory = (enable = true)').collect()

# upload the unstructured file and stop words to the stages
session.file.put('reviews__0_0_0.dat','@scratch.raw_data_stage',auto_compress=False)
session.file.put('en_core_web_sm.zip','@scratch.model_data')

# refresh the stage
session.sql('alter stage scratch.raw_data_stage refresh').collect()

session.table("TRAINING_DATA").show(30)

Thanks!
