snowflake-labs / snowpark-python-demos
This repository provides various demos/examples of using Snowpark for Python.
License: Apache License 2.0
Adding packaging metadata would allow easy direct install via pip and from requirements.txt files, without having to git clone the repo into a directory on your PYTHONPATH.
Simple example: https://github.com/jeslago/epftoolbox/blob/master/setup.py
Hello All,
In the Snowpark_For_Python.ipynb demo, the batch_predict_roi UDF returns the error below. I assume this is because of the expectation of a pandas df in the function:
def batch_predict_roi(budget_allocations_df: PandasDataFrame[int, int, int, int]) -> PandasSeries[float]:
but instead, it gets an array. Any help in clearing this up would be appreciated.
Failed to execute query [queryID: 01a80045-0004-3087-002b-9e87000e507e]
SELECT "SEARCH_ENGINE", "SOCIAL_MEDIA", "VIDEO", "EMAIL", batch_predict_roi(array_construct("SEARCH_ENGINE", "SOCIAL_MEDIA", "VIDEO", "EMAIL")) AS "PREDICTED_ROI"
FROM ( SELECT * FROM ( VALUES (250000 :: INT, 250000 :: INT, 200000 :: INT, 450000 :: INT), (500000 :: INT, 500000 :: INT, 500000 :: INT, 500000 :: INT), (8500 :: INT, 9500 :: INT, 2000 :: INT, 500 :: INT) AS SNOWPARK_TEMP_TABLE_H7ZD6LZPZ4("SEARCH_ENGINE", "SOCIAL_MEDIA", "VIDEO", "EMAIL"))) LIMIT 10
001044 (42P13): SQL compilation error: error line 1 at position 58
Invalid argument types for function 'BATCH_PREDICT_ROI': (ARRAY)
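A note on the symptom: the generated SQL wraps the four columns in array_construct, which suggests the client registered the UDF with a single array input rather than a vectorized pandas signature; one possible cause would be an older snowflake-snowpark-python version that doesn't recognize the PandasDataFrame type hints. Purely as a local illustration of the handler shape a vectorized UDF expects (fake_predict and its 0.0001 factor are stand-ins for the demo's trained model):

```python
import pandas as pd

# Stand-in for the demo's trained model; the real UDF loads a scikit-learn
# model from a stage. The 0.0001 factor is arbitrary, for illustration only.
def fake_predict(features: pd.DataFrame) -> pd.Series:
    return features.sum(axis=1) * 0.0001

def batch_predict_roi(budget_allocations_df: pd.DataFrame) -> pd.Series:
    # A vectorized UDF handler receives one pandas DataFrame per batch.
    # Columns arrive positionally (labeled 0..N-1), one per SQL argument.
    return pd.Series(fake_predict(budget_allocations_df))

# Simulate one batch for the four budget columns in the failing query:
batch = pd.DataFrame({0: [250000], 1: [250000], 2: [200000], 3: [450000]})
roi = batch_predict_roi(batch)
```

When registration picks up the pandas type hints, Snowpark calls the UDF with four separate column arguments instead of array_construct.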
The data source specified does not contain the fields mentioned in the code.
CREATE OR REPLACE EXTERNAL TABLE SRC_CUSTOMER
(CUSTOMER_ID VARCHAR(40) as (value:c1::varchar),
CREATED_DT DATE as (value:c2::date),
CITY VARCHAR(40) as (value:c3::varchar),
STATE VARCHAR(2) as (value:c4::varchar),
FAV_DELIVERY_DAY VARCHAR(40) as (value:c5::varchar),
REFILL NUMBER(38,0) as (value:c6::integer),
DOOR_DELIVERY NUMBER(38,0) as (value:c7::integer),
PAPERLESS NUMBER(38,0) as (value:c8::integer),
CUSTOMER_NAME VARCHAR(40) as (value:c9::varchar),
RETAINED NUMBER(38,0) as (value:c10::integer)
)
These datasets were generated for this demo from the Kaggle dataset below.
Reference: https://www.kaggle.com/uttamp/store-data
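One way to check the mismatch before recreating the external table is to query the staged files positionally ($1..$10) and compare against the value:c1..c10 mappings in the DDL. A hedged sketch that just builds that probe query (the stage path and file-format name are assumptions for illustration):

```python
# Diagnostic sketch: query the raw staged CSV by position ($1..$10) to see
# which field actually lands in which column. The stage path and the file
# format name 'my_csv_format' are assumptions, not taken from the demo.
columns = ", ".join(f"${i}" for i in range(1, 11))
probe_sql = (
    f"SELECT {columns} "
    "FROM @churn_source_data/customer/ "
    "(FILE_FORMAT => 'my_csv_format') LIMIT 5"
)
# session.sql(probe_sql).collect()   # run via a Snowpark session to inspect
```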
@sfc-gh-scoombes
After creating the various stages and uploading two files to them, the notebook goes straight to querying the table TRAINING_DATA. But as far as I can tell, this table was never created. Am I missing something?
# create the stage for python and model data
session.sql('create stage if not exists scratch.raw_data').collect()
session.sql('create stage if not exists scratch.model_data').collect()
session.sql('create stage if not exists scratch.python_load').collect()
# create the directory stage for the data
session.sql('create stage if not exists scratch.raw_data_stage directory = (enable = true)').collect()
# upload the unstructured file and stop words to the stages
session.file.put('reviews__0_0_0.dat','@scratch.raw_data_stage',auto_compress=False)
session.file.put('en_core_web_sm.zip','@scratch.model_data')
# refresh the stage
session.sql('alter stage scratch.raw_data_stage refresh').collect()
session.table("TRAINING_DATA").show(30)
Thanks!
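For what it's worth, it does look like a materialization step is missing between the stage refresh and the session.table call. A hedged sketch of what that step might look like, using the stage and file names from the snippet above (the single-column SELECT is a placeholder; the real notebook would define the actual parsing and columns):

```python
# Hypothetical missing step: build TRAINING_DATA from the uploaded file.
# Stage and file names come from the snippet above; the single-column
# SELECT is a placeholder for whatever parsing the notebook actually does.
def create_training_data_sql(stage: str = "scratch.raw_data_stage",
                             file_name: str = "reviews__0_0_0.dat") -> str:
    return (
        "CREATE OR REPLACE TABLE TRAINING_DATA AS "
        f"SELECT $1 AS REVIEW_TEXT FROM @{stage}/{file_name}"
    )

sql = create_training_data_sql()
# session.sql(sql).collect()   # would run against the Snowflake session
```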
@iamontheinet
@sfc-gh-ejohnson
Are there any recordings or webinars that utilize this use case? If so, can you please share the link?
Thank you!
For the retail-churn-analytics demo: the data doesn't get copied from the S3 bucket into the external tables that are created.
CREATE OR REPLACE EXTERNAL TABLE SRC_CUSTOMER
(CUSTOMER_ID VARCHAR(40) as (value:c1::varchar),
CREATED_DT DATE as (value:c2::date),
CITY VARCHAR(40) as (value:c3::varchar),
STATE VARCHAR(2) as (value:c4::varchar),
FAV_DELIVERY_DAY VARCHAR(40) as (value:c5::varchar),
REFILL NUMBER(38,0) as (value:c6::integer),
DOOR_DELIVERY NUMBER(38,0) as (value:c7::integer),
PAPERLESS NUMBER(38,0) as (value:c8::integer),
CUSTOMER_NAME VARCHAR(40) as (value:c9::varchar),
RETAINED NUMBER(38,0) as (value:c10::integer)
)
LOCATION = @churn_source_data/customer/
REFRESH_ON_CREATE = TRUE
AUTO_REFRESH = TRUE
FILE_FORMAT = ( TYPE = CSV SKIP_HEADER=1);
This is the exact syntax I have used.
@iamontheinet, please advise.
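One note that may help: external tables never copy data into Snowflake; they read the staged files in place. If SELECTs return no rows, a few standard diagnostics usually narrow it down (statements below assume the stage and table names from the DDL, and would need a Snowpark session to run):

```python
# External tables read files in place rather than copying them. Typical
# checks when an external table appears empty (names taken from the DDL):
DIAGNOSTICS = [
    "LIST @churn_source_data/customer/",          # are the files visible at all?
    "ALTER EXTERNAL TABLE SRC_CUSTOMER REFRESH",  # re-sync file metadata manually
    "SELECT COUNT(*) FROM SRC_CUSTOMER",          # any rows after the refresh?
]
# for stmt in DIAGNOSTICS:
#     print(session.sql(stmt).collect())          # requires a Snowpark session
```

If LIST shows no files, the problem is the stage location or credentials rather than the table definition; if the manual REFRESH fixes it, the S3 event notifications behind AUTO_REFRESH are likely not set up.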
For https://github.com/Snowflake-Labs/snowpark-python-demos/tree/main/Advertising-Spend-ROI-Prediction
The setup instructions say to use `pip install conda` and then run `conda ...`, but running `conda` then produces the following error:
ERROR: The install method you used for conda--probably either `pip install conda`
or `easy_install conda`--is not compatible with using conda as an application.
If your intention is to install conda as a standalone application, currently
supported install methods include the Anaconda installer and the miniconda
installer. You can download the miniconda installer from
https://conda.io/miniconda.html.
The instructions should therefore be updated to remove `pip install conda` and install via the Miniconda installer instead.
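A sketch of what the corrected setup steps could look like (installer URL pattern as documented by conda; the `uname`-based filename may need adjusting per OS/arch, e.g. macOS installers are named `MacOSX`, and the env name/Python version are assumptions):

```shell
# Replacement for the `pip install conda` step: install Miniconda directly.
# The filename pattern may need adjusting to match the Miniconda download page.
MINICONDA_URL="https://repo.anaconda.com/miniconda/Miniconda3-latest-$(uname -s)-$(uname -m).sh"
# curl -fsSL -o miniconda.sh "$MINICONDA_URL"   # download the installer
# bash miniconda.sh -b -p "$HOME/miniconda3"    # silent install
# source "$HOME/miniconda3/bin/activate"
# conda create -n snowpark python=3.8 -y        # env name/version are assumptions
# conda activate snowpark
```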
I am running through the credit card fraud detection Snowpark exercises. Everything looks good, except I am getting an error when I try to run a query with the new UDF.
Failed Query In Snowsight:
SELECT TRANSACTION_ID, TX_DATETIME, CUSTOMER_ID, TERMINAL_ID, TX_AMOUNT ,detect_fraud_batch_udf(TX_AMOUNT,TX_DURING_WEEKEND, TX_DURING_NIGHT, CUST_CNT_TX_1, CUST_AVG_AMOUNT_1, CUST_CNT_TX_7, CUST_AVG_AMOUNT_7, CUST_CNT_TX_30,CUST_AVG_AMOUNT_30, NB_TX_WINDOW_1, TERM_RISK_1, NB_TX_WINDOW_7,TERM_RISK_7, NB_TX_WINDOW_30,TERM_RISK_30) AS FRAUD_PROB
FROM CUSTOMER_TRX_FRAUD_FEATURES
WHERE TX_DATETIME > '2019-07-15 00:00:00' LIMIT 10;
100357 (P0000): Python Interpreter Error:
Traceback (most recent call last):
File "_udf_code.py", line 32, in compute
File "_udf_code.py", line 21, in wrapper
File "/var/folders/ck/ll2bz1_s3ng7w67zf6mdvqh40000gn/T/ipykernel_77660/546210058.py", line 17, in detect_fraud_batch
File "/Users/hayan/opt/anaconda3/envs/snowpark_070/lib/python3.8/site-packages/cachetools/__init__.py", line 641, in wrapper
File "/var/folders/ck/ll2bz1_s3ng7w67zf6mdvqh40000gn/T/ipykernel_77660/546210058.py", line 10, in read_file
File "/usr/lib/python_udf/5dd4a97c20bf6b6243e739c66c4fbfa80ab53172655f7f15cf1c55d0f462ae66/lib/python3.8/site-packages/joblib/numpy_pickle.py", line 577, in load
obj = _unpickle(fobj)
File "/usr/lib/python_udf/5dd4a97c20bf6b6243e739c66c4fbfa80ab53172655f7f15cf1c55d0f462ae66/lib/python3.8/site-packages/joblib/numpy_pickle.py", line 506, in _unpickle
obj = unpickler.load()
File "/usr/lib/python_udf/5dd4a97c20bf6b6243e739c66c4fbfa80ab53172655f7f15cf1c55d0f462ae66/lib/python3.8/pickle.py", line 1212, in load
dispatch[key[0]](self)
KeyError: 255
in function DETECT_FRAUD_BATCH_UDF with handler compute
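A KeyError out of pickle's dispatch table means the bytes being read are not a valid pickle stream at that point. Common causes are the staged model file being altered in transit (session.file.put compresses uploads by default unless auto_compress=False is passed) or the model being written by an incompatible library/Python version. A stdlib-only local illustration of the compressed-file failure mode (the dict is a stand-in for the trained model):

```python
import gzip
import os
import pickle
import tempfile

# Stand-in for the trained model object saved in the demo.
model = {"coef": [1.0, 2.0]}

workdir = tempfile.mkdtemp()
path = os.path.join(workdir, "model.pkl")
with open(path, "wb") as f:
    pickle.dump(model, f)

# Simulate a compressed upload (PUT defaults to AUTO_COMPRESS=TRUE): the
# staged copy is gzipped, but the UDF still reads it as a plain pickle.
gz_path = path + ".gz"
with open(path, "rb") as src, gzip.open(gz_path, "wb") as dst:
    dst.write(src.read())

try:
    with open(gz_path, "rb") as f:
        pickle.load(f)   # gzip bytes are not a valid pickle stream
    load_failed = False
except Exception:
    load_failed = True
```

If this is the cause, re-uploading the model with `session.file.put(..., auto_compress=False)` should fix the UDF; if not, comparing the joblib/scikit-learn/Python versions between the local env and the UDF packages would be the next check.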
It seems it was added to conda back in February. Is Snowpark using an older Anaconda distribution? It would be great to be able to include this simply, without having to download each of the wheels from PyPI and installing that way.