snowflake-labs / snowpark-python-demos
This repository provides various demos/examples of using Snowpark for Python.
License: Apache License 2.0
Adding packaging metadata would allow easy direct install via pip and from requirements.txt files, without having to git clone the repo into a directory on your PYTHONPATH.
Simple example: https://github.com/jeslago/epftoolbox/blob/master/setup.py
Hello All,
In the Snowpark_For_Python.ipynb demo, the batch_predict_roi UDF returns the error below. I assume this is because of the expectation of a pandas df in the function:
def batch_predict_roi(budget_allocations_df: PandasDataFrame[int, int, int, int]) -> PandasSeries[float]:
but instead, it gets an array. Any help in clearing this up would be appreciated.
Failed to execute query [queryID: 01a80045-0004-3087-002b-9e87000e507e]
SELECT "SEARCH_ENGINE", "SOCIAL_MEDIA", "VIDEO", "EMAIL", batch_predict_roi(array_construct("SEARCH_ENGINE", "SOCIAL_MEDIA", "VIDEO", "EMAIL")) AS "PREDICTED_ROI"
FROM ( SELECT * FROM ( VALUES (250000 :: INT, 250000 :: INT, 200000 :: INT, 450000 :: INT), (500000 :: INT, 500000 :: INT, 500000 :: INT, 500000 :: INT), (8500 :: INT, 9500 :: INT, 2000 :: INT, 500 :: INT) AS SNOWPARK_TEMP_TABLE_H7ZD6LZPZ4("SEARCH_ENGINE", "SOCIAL_MEDIA", "VIDEO", "EMAIL"))) LIMIT 10
001044 (42P13): SQL compilation error: error line 1 at position 58
Invalid argument types for function 'BATCH_PREDICT_ROI': (ARRAY)
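A note on the symptom: the generated SQL wraps the four columns in array_construct, which suggests the client registered the UDF with a single array input rather than a vectorized pandas signature; one possible cause would be an older snowflake-snowpark-python version that doesn't recognize the PandasDataFrame type hints. Purely as a local illustration of the handler shape a vectorized UDF expects (fake_predict and its 0.0001 factor are stand-ins for the demo's trained model):

```python
import pandas as pd

# Stand-in for the demo's trained model; the real UDF loads a scikit-learn
# model from a stage. The 0.0001 factor is arbitrary, for illustration only.
def fake_predict(features: pd.DataFrame) -> pd.Series:
    return features.sum(axis=1) * 0.0001

def batch_predict_roi(budget_allocations_df: pd.DataFrame) -> pd.Series:
    # A vectorized UDF handler receives one pandas DataFrame per batch.
    # Columns arrive positionally (labeled 0..N-1), one per SQL argument.
    return pd.Series(fake_predict(budget_allocations_df))

# Simulate one batch for the four budget columns in the failing query:
batch = pd.DataFrame({0: [250000], 1: [250000], 2: [200000], 3: [450000]})
roi = batch_predict_roi(batch)
```

When registration picks up the pandas type hints, Snowpark calls the UDF with four separate column arguments instead of array_construct.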
The data source specified does not contain the fields mentioned in the code.
CREATE OR REPLACE EXTERNAL TABLE SRC_CUSTOMER
(CUSTOMER_ID VARCHAR(40) as (value:c1::varchar),
CREATED_DT DATE as (value:c2::date),
CITY VARCHAR(40) as (value:c3::varchar),
STATE VARCHAR(2) as (value:c4::varchar),
FAV_DELIVERY_DAY VARCHAR(40) as (value:c5::varchar),
REFILL NUMBER(38,0) as (value:c6::integer),
DOOR_DELIVERY NUMBER(38,0) as (value:c7::integer),
PAPERLESS NUMBER(38,0) as (value:c8::integer),
CUSTOMER_NAME VARCHAR(40) as (value:c9::varchar),
RETAINED NUMBER(38,0) as (value:c10::integer)
)
These datasets were generated for this demo from the Kaggle dataset below.
Reference: https://www.kaggle.com/uttamp/store-data
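One way to check the mismatch before recreating the external table is to query the staged files positionally ($1..$10) and compare against the value:c1..c10 mappings in the DDL. A hedged sketch that just builds that probe query (the stage path and file-format name are assumptions for illustration):

```python
# Diagnostic sketch: query the raw staged CSV by position ($1..$10) to see
# which field actually lands in which column. The stage path and the file
# format name 'my_csv_format' are assumptions, not taken from the demo.
columns = ", ".join(f"${i}" for i in range(1, 11))
probe_sql = (
    f"SELECT {columns} "
    "FROM @churn_source_data/customer/ "
    "(FILE_FORMAT => 'my_csv_format') LIMIT 5"
)
# session.sql(probe_sql).collect()   # run via a Snowpark session to inspect
```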
@sfc-gh-scoombes
After creating the various stages and uploading two files to them, the notebook goes straight to querying the table TRAINING_DATA. But as far as I can tell, this table was never created. Am I missing something?
# create the stage for python and model data
session.sql('create stage if not exists scratch.raw_data').collect()
session.sql('create stage if not exists scratch.model_data').collect()
session.sql('create stage if not exists scratch.python_load').collect()
# create the directory stage for the data
session.sql('create stage if not exists scratch.raw_data_stage directory = (enable = true)').collect()
# upload the unstructured file and stop words to the stages
session.file.put('reviews__0_0_0.dat','@scratch.raw_data_stage',auto_compress=False)
session.file.put('en_core_web_sm.zip','@scratch.model_data')
# refresh the stage
session.sql('alter stage scratch.raw_data_stage refresh').collect()
session.table("TRAINING_DATA").show(30)
Thanks!
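For what it's worth, it does look like a materialization step is missing between the stage refresh and the session.table call. A hedged sketch of what that step might look like, using the stage and file names from the snippet above (the single-column SELECT is a placeholder; the real notebook would define the actual parsing and columns):

```python
# Hypothetical missing step: build TRAINING_DATA from the uploaded file.
# Stage and file names come from the snippet above; the single-column
# SELECT is a placeholder for whatever parsing the notebook actually does.
def create_training_data_sql(stage: str = "scratch.raw_data_stage",
                             file_name: str = "reviews__0_0_0.dat") -> str:
    return (
        "CREATE OR REPLACE TABLE TRAINING_DATA AS "
        f"SELECT $1 AS REVIEW_TEXT FROM @{stage}/{file_name}"
    )

sql = create_training_data_sql()
# session.sql(sql).collect()   # would run against the Snowflake session
```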
@iamontheinet
@sfc-gh-ejohnson
Are there any recordings or webinars that utilize this use case? If so, can you please share the link?
Thank you!
For the retail-churn-analytics demo: the data doesn't get copied from the S3 bucket into the external tables that are created.
CREATE OR REPLACE EXTERNAL TABLE SRC_CUSTOMER
(CUSTOMER_ID VARCHAR(40) as (value:c1::varchar),
CREATED_DT DATE as (value:c2::date),
CITY VARCHAR(40) as (value:c3::varchar),
STATE VARCHAR(2) as (value:c4::varchar),
FAV_DELIVERY_DAY VARCHAR(40) as (value:c5::varchar),
REFILL NUMBER(38,0) as (value:c6::integer),
DOOR_DELIVERY NUMBER(38,0) as (value:c7::integer),
PAPERLESS NUMBER(38,0) as (value:c8::integer),
CUSTOMER_NAME VARCHAR(40) as (value:c9::varchar),
RETAINED NUMBER(38,0) as (value:c10::integer)
)
LOCATION = @churn_source_data/customer/
REFRESH_ON_CREATE = TRUE
AUTO_REFRESH = TRUE
FILE_FORMAT = ( TYPE = CSV SKIP_HEADER=1);
This is the exact syntax I have used.
@iamontheinet, please advise.
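One note that may help: external tables never copy data into Snowflake; they read the staged files in place. If SELECTs return no rows, a few standard diagnostics usually narrow it down (statements below assume the stage and table names from the DDL, and would need a Snowpark session to run):

```python
# External tables read files in place rather than copying them. Typical
# checks when an external table appears empty (names taken from the DDL):
DIAGNOSTICS = [
    "LIST @churn_source_data/customer/",          # are the files visible at all?
    "ALTER EXTERNAL TABLE SRC_CUSTOMER REFRESH",  # re-sync file metadata manually
    "SELECT COUNT(*) FROM SRC_CUSTOMER",          # any rows after the refresh?
]
# for stmt in DIAGNOSTICS:
#     print(session.sql(stmt).collect())          # requires a Snowpark session
```

If LIST shows no files, the problem is the stage location or credentials rather than the table definition; if the manual REFRESH fixes it, the S3 event notifications behind AUTO_REFRESH are likely not set up.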
For https://github.com/Snowflake-Labs/snowpark-python-demos/tree/main/Advertising-Spend-ROI-Prediction
The setup instructions say to use `pip install conda` and then run `conda ...`, but running `conda` then produces the following error:
ERROR: The install method you used for conda--probably either `pip install conda`
or `easy_install conda`--is not compatible with using conda as an application.
If your intention is to install conda as a standalone application, currently
supported install methods include the Anaconda installer and the miniconda
installer. You can download the miniconda installer from
https://conda.io/miniconda.html.
The instructions should therefore be updated to remove `pip install conda` and install via the Miniconda installer instead.
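A sketch of what the corrected setup steps could look like (installer URL pattern as documented by conda; the `uname`-based filename may need adjusting per OS/arch, e.g. macOS installers are named `MacOSX`, and the env name/Python version are assumptions):

```shell
# Replacement for the `pip install conda` step: install Miniconda directly.
# The filename pattern may need adjusting to match the Miniconda download page.
MINICONDA_URL="https://repo.anaconda.com/miniconda/Miniconda3-latest-$(uname -s)-$(uname -m).sh"
# curl -fsSL -o miniconda.sh "$MINICONDA_URL"   # download the installer
# bash miniconda.sh -b -p "$HOME/miniconda3"    # silent install
# source "$HOME/miniconda3/bin/activate"
# conda create -n snowpark python=3.8 -y        # env name/version are assumptions
# conda activate snowpark
```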
I am running through the credit card fraud detection Snowpark exercises. Everything looks good, except I am getting an error when I try to run a query with the new UDF.
Failed Query In Snowsight:
SELECT TRANSACTION_ID, TX_DATETIME, CUSTOMER_ID, TERMINAL_ID, TX_AMOUNT ,detect_fraud_batch_udf(TX_AMOUNT,TX_DURING_WEEKEND, TX_DURING_NIGHT, CUST_CNT_TX_1, CUST_AVG_AMOUNT_1, CUST_CNT_TX_7, CUST_AVG_AMOUNT_7, CUST_CNT_TX_30,CUST_AVG_AMOUNT_30, NB_TX_WINDOW_1, TERM_RISK_1, NB_TX_WINDOW_7,TERM_RISK_7, NB_TX_WINDOW_30,TERM_RISK_30) AS FRAUD_PROB
FROM CUSTOMER_TRX_FRAUD_FEATURES
WHERE TX_DATETIME > '2019-07-15 00:00:00' LIMIT 10;
100357 (P0000): Python Interpreter Error:
Traceback (most recent call last):
File "_udf_code.py", line 32, in compute
File "_udf_code.py", line 21, in wrapper
File "/var/folders/ck/ll2bz1_s3ng7w67zf6mdvqh40000gn/T/ipykernel_77660/546210058.py", line 17, in detect_fraud_batch
File "/Users/hayan/opt/anaconda3/envs/snowpark_070/lib/python3.8/site-packages/cachetools/__init__.py", line 641, in wrapper
File "/var/folders/ck/ll2bz1_s3ng7w67zf6mdvqh40000gn/T/ipykernel_77660/546210058.py", line 10, in read_file
File "/usr/lib/python_udf/5dd4a97c20bf6b6243e739c66c4fbfa80ab53172655f7f15cf1c55d0f462ae66/lib/python3.8/site-packages/joblib/numpy_pickle.py", line 577, in load
obj = _unpickle(fobj)
File "/usr/lib/python_udf/5dd4a97c20bf6b6243e739c66c4fbfa80ab53172655f7f15cf1c55d0f462ae66/lib/python3.8/site-packages/joblib/numpy_pickle.py", line 506, in _unpickle
obj = unpickler.load()
File "/usr/lib/python_udf/5dd4a97c20bf6b6243e739c66c4fbfa80ab53172655f7f15cf1c55d0f462ae66/lib/python3.8/pickle.py", line 1212, in load
dispatch[key[0]](self)
KeyError: 255
in function DETECT_FRAUD_BATCH_UDF with handler compute
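A KeyError out of pickle's dispatch table means the bytes being read are not a valid pickle stream at that point. Common causes are the staged model file being altered in transit (session.file.put compresses uploads by default unless auto_compress=False is passed) or the model being written by an incompatible library/Python version. A stdlib-only local illustration of the compressed-file failure mode (the dict is a stand-in for the trained model):

```python
import gzip
import os
import pickle
import tempfile

# Stand-in for the trained model object saved in the demo.
model = {"coef": [1.0, 2.0]}

workdir = tempfile.mkdtemp()
path = os.path.join(workdir, "model.pkl")
with open(path, "wb") as f:
    pickle.dump(model, f)

# Simulate a compressed upload (PUT defaults to AUTO_COMPRESS=TRUE): the
# staged copy is gzipped, but the UDF still reads it as a plain pickle.
gz_path = path + ".gz"
with open(path, "rb") as src, gzip.open(gz_path, "wb") as dst:
    dst.write(src.read())

try:
    with open(gz_path, "rb") as f:
        pickle.load(f)   # gzip bytes are not a valid pickle stream
    load_failed = False
except Exception:
    load_failed = True
```

If this is the cause, re-uploading the model with `session.file.put(..., auto_compress=False)` should fix the UDF; if not, comparing the joblib/scikit-learn/Python versions between the local env and the UDF packages would be the next check.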
It seems it was added to conda back in February. Is Snowpark using an older Anaconda distribution? It would be great to be able to include this simply, without having to download each of the wheels from PyPI and installing that way.