hopsworks-api's Introduction

Hopsworks Client

hopsworks is the Python API for interacting with a Hopsworks cluster. Don't have a Hopsworks cluster yet? Register an account on Hopsworks Serverless and get started for free. Once connected to your project, you can:

  • insert dataframes into the online or offline store, create training datasets, or serve real-time feature vectors from the Feature Store via the Feature Store API. Already have data somewhere that you want to import? Check out our Storage Connectors documentation.
  • register ML models in the model registry and deploy them with model serving via the Machine Learning API.
  • manage environments, executions, Kafka topics and more once you deploy your own Hopsworks cluster, either on-prem or in the cloud. Hopsworks is open source and has its own Community Edition.

Our tutorials cover a wide range of use cases and examples of what you can build using Hopsworks.

Getting Started On Hopsworks

Once you have created a project on Hopsworks Serverless and created a new API key, use your favourite virtualenv and package manager to install the library:

pip install "hopsworks[python]"

Fire up a notebook and connect to your project. You will be prompted to enter your newly created API key:

import hopsworks

project = hopsworks.login()

Feature Store API

Access the Feature Store of your project to use it as a central repository for your feature data. Use your favourite data engineering library (pandas, polars, Spark, etc.) to insert data into the Feature Store, create training datasets or serve real-time feature vectors. Want to predict the likelihood of e-scooter accidents in real time? Here's how you can do it:

fs = project.get_feature_store()

# Write to Feature Groups
bike_ride_fg = fs.get_or_create_feature_group(
  name="bike_rides",
  version=1,
  primary_key=["ride_id"],
  event_time="activation_time",
  online_enabled=True,
)

bike_ride_fg.insert(bike_rides_df)

# Read from Feature Views
profile_fg = fs.get_feature_group("user_profile", version=1)

bike_ride_fv = fs.get_or_create_feature_view(
  name="bike_rides_view",
  version=1,
  query=bike_ride_fg.select_except(["ride_id"]).join(profile_fg.select(["age", "has_license"]), on="user_id")
)

# Batch data for January 2021
bike_rides_jan_2021_df = bike_ride_fv.get_batch_data(
  start_time="2021-01-01",
  end_time="2021-01-31"
)

# Create a training dataset
version, job = bike_ride_fv.create_train_test_split(
    test_size=0.2,
    description='Description of a dataset',
    # you can have different data formats such as csv, tsv, tfrecord, parquet and others
    data_format='csv'
)

# Predict the probability of accident in real-time using new data + context data
bike_ride_fv.init_serving()

# poll_ride_queue() and model are placeholders for your own event source and trained model
while True:
    new_ride_vector = poll_ride_queue()
    feature_vector = bike_ride_fv.get_feature_vector(
      {"user_id": new_ride_vector["user_id"]},
      passed_features=new_ride_vector
    )
    accident_probability = model.predict(feature_vector)

The API enables interaction with the Hopsworks Feature Store. It makes creating new features, feature groups and training datasets easy.
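
As a rough mental model of the real-time step in the example above, the serving call combines features looked up by key with values supplied in the request. The sketch below imitates that with plain dictionaries. `ONLINE_STORE`, `FEATURE_ORDER`, `assemble_feature_vector` and the `speed_kmh` feature are hypothetical stand-ins, not part of the Hopsworks API, and the rule that request-time values take precedence is an assumption of this sketch.

```python
# Conceptual sketch only: a dict stands in for the online feature store.
from typing import Any, Dict, List

# Hypothetical precomputed features, keyed by user_id.
ONLINE_STORE: Dict[int, Dict[str, Any]] = {
    42: {"age": 31, "has_license": True},
}

# Hypothetical order in which the model expects its inputs.
FEATURE_ORDER: List[str] = ["age", "has_license", "speed_kmh"]


def assemble_feature_vector(entry: Dict[str, int], passed_features: Dict[str, Any]) -> List[Any]:
    """Look up stored features by key, then overlay request-time values."""
    stored = ONLINE_STORE[entry["user_id"]]
    merged = {**stored, **passed_features}  # request-time values win (assumption)
    return [merged[name] for name in FEATURE_ORDER]


new_ride = {"user_id": 42, "speed_kmh": 23.5}
vector = assemble_feature_vector({"user_id": new_ride["user_id"]}, new_ride)
print(vector)
```

The point of the split is that slow-changing context (the user profile) is precomputed, while values only known at request time travel with the request.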

The API is environment independent and can be used in two modes:

  • Spark mode: For data engineering jobs that create and write features into the feature store or generate training datasets. It requires a Spark environment such as the one provided in the Hopsworks platform or Databricks. In Spark mode, HSFS provides bindings both for Python and JVM languages.

  • Python mode: For data science jobs that explore the features available in the feature store, generate training datasets and feed them into a training pipeline. Python mode requires just a Python interpreter and can be used in Hopsworks from Python Jobs/Jupyter Kernels, as well as in Amazon SageMaker or KubeFlow.
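
To make the "generate training datasets" step more concrete, here is a minimal sketch of what a `test_size=0.2` split means, using only the standard library. The `train_test_split` helper below is hypothetical and is not the Hopsworks implementation, which may shuffle or split differently (for example by event time).

```python
import random
from typing import List, Sequence, Tuple


def train_test_split(rows: Sequence[dict], test_size: float, seed: int = 0) -> Tuple[List[dict], List[dict]]:
    """Shuffle rows and hold out a `test_size` fraction for testing."""
    if not 0.0 < test_size < 1.0:
        raise ValueError("test_size must be between 0 and 1")
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # seeded for reproducibility
    n_test = round(len(shuffled) * test_size)
    # First n_test rows become the test set, the rest the training set.
    return shuffled[n_test:], shuffled[:n_test]


rides = [{"ride_id": i} for i in range(10)]
train, test = train_test_split(rides, test_size=0.2)
print(len(train), len(test))
```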

A Scala API is also available. Here is a short sample:

import com.logicalclocks.hsfs._
val connection = HopsworksConnection.builder().build()
val fs = connection.getFeatureStore();
val attendances_features_fg = fs.getFeatureGroup("games_features", 1);
attendances_features_fg.show(1)

Machine Learning API

You can also use the Machine Learning API to interact with the Hopsworks Model Registry and Model Serving. The API makes it easy to export, manage and deploy models. For example, to register models and deploy them for serving:

mr = project.get_model_registry()
ms = project.get_model_serving()

# Create a new model:
model = mr.tensorflow.create_model(name="mnist",
                                   version=1,
                                   metrics={"accuracy": 0.94},
                                   description="mnist model description")
model.save("/tmp/model_directory") # or /tmp/model_file

# Download a model:
model = mr.get_model("mnist", version=1)
model_path = model.download()

# Get the best-performing model
best_model = mr.get_best_model('mnist', 'accuracy', 'max')

# Deploy the model:
deployment = model.deploy()
deployment.start()

# Make predictions with a deployed model
data = { "instances": [ model.input_example ] }
predictions = deployment.predict(data)

# Delete the model once it is no longer needed:
model.delete()
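
The "best-performing model" lookup above takes a model name, a metric and a direction. A minimal sketch of that selection logic over an in-memory list might look like the following. The `versions` data and the `pick_best` helper are hypothetical and only illustrate the idea, not how the registry implements it.

```python
from typing import Dict, List

# Hypothetical registry contents: one entry per model version.
versions: List[Dict] = [
    {"name": "mnist", "version": 1, "metrics": {"accuracy": 0.94}},
    {"name": "mnist", "version": 2, "metrics": {"accuracy": 0.97}},
    {"name": "mnist", "version": 3, "metrics": {"accuracy": 0.91}},
]


def pick_best(models: List[Dict], metric: str, direction: str) -> Dict:
    """Return the entry with the highest ('max') or lowest ('min') metric value."""
    if direction not in ("max", "min"):
        raise ValueError("direction must be 'max' or 'min'")
    chooser = max if direction == "max" else min
    # Ignore versions that never recorded the requested metric.
    candidates = [m for m in models if metric in m["metrics"]]
    return chooser(candidates, key=lambda m: m["metrics"][metric])


best = pick_best(versions, "accuracy", "max")
print(best["version"])
```

Use "min" as the direction for metrics where lower is better, such as a loss.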

Tutorials

Need more inspiration or want to learn more about the Hopsworks platform? Check out our tutorials.

Documentation

Documentation is available at Hopsworks Documentation.

Issues

For general questions about the usage of Hopsworks and the Feature Store please open a topic on Hopsworks Community.

Please report any issues using the GitHub issue tracker.

Related to Feature Store API

If your issue is related to the Feature Store API, please attach the client environment printed by the snippet below:

import hopsworks
import hsfs
hopsworks.login().get_feature_store()
print(hsfs.get_env())
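
If the snippet above cannot run (for example because the login itself fails), basic environment details can still be collected with the standard library. This is a hedged complement to `hsfs.get_env()`, not a replacement for it. The `describe_environment` helper is hypothetical.

```python
import platform
import sys
from importlib import metadata


def describe_environment(packages=("hopsworks", "hsfs")):
    """Collect interpreter, OS, and package versions for an issue report."""
    info = {
        "python": sys.version.split()[0],
        "platform": platform.platform(),
    }
    for name in packages:
        try:
            info[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            info[name] = "not installed"
    return info


for key, value in describe_environment().items():
    print(f"{key}: {value}")
```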

Contributing

If you would like to contribute to this library, please see the Contribution Guidelines.

hopsworks-api's People

Contributors

aversey, berthoug, bubriks, davitbzh, dependabot[bot], dhananjay-mk, ermiasg, gibchikafa, jacarte, javierdlrm, jimdowling, kennethmhc, kherashchenko, kouzant, lovew-lc, maismail, manu-sj, maxxx-zh, mklepium, moritzmeister, o-alex, rktraz, robzor92, siroibaf, smkniazi, tdoehmen, tkakantousis, vatj, yiksanchan

hopsworks-api's Issues

Cannot install on Python 3.10

Trying to install the library on 3.10 fails with the following error:

❯ pip install hopsworks
Collecting hopsworks
  Using cached hopsworks-3.0.4.tar.gz (35 kB)
  Preparing metadata (setup.py) ... done
  Using cached hopsworks-3.0.3.tar.gz (35 kB)
  Preparing metadata (setup.py) ... done
  Using cached hopsworks-3.0.2.tar.gz (34 kB)
  Preparing metadata (setup.py) ... done
  Using cached hopsworks-3.0.1.tar.gz (34 kB)
  Preparing metadata (setup.py) ... done
ERROR: Cannot install hopsworks==3.0.1, hopsworks==3.0.2, hopsworks==3.0.3 and hopsworks==3.0.4 because these package versions have conflicting dependencies.

The conflict is caused by:
    hopsworks 3.0.4 depends on hsfs[python]<3.1.0 and >=3.0.0
    hopsworks 3.0.3 depends on hsfs[python]<3.1.0 and >=3.0.0
    hopsworks 3.0.2 depends on hsfs[python]<3.1.0 and >=3.0.0
    hopsworks 3.0.1 depends on hsfs[python]<3.1.0 and >=3.0.0

To fix this you could try to:
1. loosen the range of package versions you've specified
2. remove package versions to allow pip attempt to solve the dependency conflict

ERROR: ResolutionImpossible: for help visit https://pip.pypa.io/en/latest/topics/dependency-resolution/#dealing-with-dependency-conflicts

Hopsworks prints API Key when configured incorrectly

If the API key is stored incorrectly as a secret (for example on huggingface.co) and ends with a newline, as in "[API-Key]\n", the resulting stack trace will contain the plain API key. The http

image

I have redacted my API key, but this error is only there when the API Key contains a \n at the end. If I remove the \n from the end of my API key, the error disappears.
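
Until this is handled inside the library, one possible client-side workaround is to normalize the key before using it and to avoid echoing it in any error. The sketch below assumes the key arrives as a string from a secret store. `clean_api_key` and `mask` are hypothetical helpers, not part of hopsworks.

```python
def clean_api_key(raw: str) -> str:
    """Strip surrounding whitespace and reject keys with embedded whitespace."""
    key = raw.strip()
    if not key:
        raise ValueError("API key is empty")
    if any(ch.isspace() for ch in key):
        # Deliberately do not echo the key in the error message.
        raise ValueError("API key contains whitespace")
    return key


def mask(key: str, visible: int = 4) -> str:
    """Return a form of the key that is safe to log."""
    return key[:visible] + "*" * max(len(key) - visible, 0)


key = clean_api_key("abcd1234efgh\n")  # made-up key with a trailing newline
print(mask(key))
```

The cleaned value could then be handed to the login call, for example through an `api_key_value` argument if the installed version accepts one.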

Improvements to login

Support setting project name, api_key_value and api_key_file in login function arguments.

Support multiple projects. Prompt for user to select project to use.

Dependency Conflict

I ran into an issue when attempting to run pip install hopsworks.

Installing collected packages: sqlalchemy
  Attempting uninstall: sqlalchemy
    Found existing installation: SQLAlchemy 2.0.31
    Uninstalling SQLAlchemy-2.0.31:
      Successfully uninstalled SQLAlchemy-2.0.31
ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
ipython-sql 0.5.0 requires sqlalchemy>=2.0, but you have sqlalchemy 1.4.48 which is incompatible.
Successfully installed sqlalchemy-1.4.48

I then tried upgrading sqlalchemy to resolve this, and got a different conflict:

Requirement already satisfied: sqlalchemy in /usr/local/lib/python3.10/dist-packages (1.4.48)
Collecting sqlalchemy
  Downloading SQLAlchemy-2.0.31-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (3.1 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 3.1/3.1 MB 39.2 MB/s eta 0:00:00
Requirement already satisfied: typing-extensions>=4.6.0 in /usr/local/lib/python3.10/dist-packages (from sqlalchemy) (4.12.2)
Requirement already satisfied: greenlet!=0.4.17 in /usr/local/lib/python3.10/dist-packages (from sqlalchemy) (3.0.3)
Installing collected packages: sqlalchemy
  Attempting uninstall: sqlalchemy
    Found existing installation: SQLAlchemy 1.4.48
    Uninstalling SQLAlchemy-1.4.48:
      Successfully uninstalled SQLAlchemy-1.4.48
ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
hsfs 3.7.6 requires sqlalchemy<=1.4.48, but you have sqlalchemy 2.0.31 which is incompatible.
Successfully installed sqlalchemy-2.0.31

I need some advice on what I should do.
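
One way to read the two error messages together is to check candidate versions against both constraints quoted in the logs. The sketch below uses a simplified numeric comparison rather than pip's full version-specifier rules. `parse`, `satisfies_hsfs` and `satisfies_ipython_sql` are hypothetical helpers.

```python
from typing import Tuple


def parse(version: str) -> Tuple[int, ...]:
    """Turn '1.4.48' into (1, 4, 48) for simple numeric comparison."""
    return tuple(int(part) for part in version.split("."))


def satisfies_hsfs(version: str) -> bool:
    # hsfs 3.7.6 requires sqlalchemy<=1.4.48 (from the log above)
    return parse(version) <= parse("1.4.48")


def satisfies_ipython_sql(version: str) -> bool:
    # ipython-sql 0.5.0 requires sqlalchemy>=2.0 (from the log above)
    return parse(version) >= parse("2.0")


for candidate in ["1.4.48", "2.0.31"]:
    print(candidate, satisfies_hsfs(candidate), satisfies_ipython_sql(candidate))
```

Because every version at or below 1.4.48 is also below 2.0, no single sqlalchemy version can satisfy both packages at these pinned versions. Installing them in separate virtual environments is one possible way around the conflict.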

Kafka external support

The library needs to get the certificates in .pem format and return a configuration for connecting to kafka from an external environment.
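
A minimal sketch of what such a returned configuration could look like is shown below, assuming the certificates have already been written to disk as PEM files. The property names follow the librdkafka convention used by clients such as confluent-kafka. The `kafka_ssl_config` helper, the file names, the broker address and the directory are all hypothetical.

```python
from pathlib import Path
from typing import Dict


def kafka_ssl_config(bootstrap_servers: str, cert_dir: str) -> Dict[str, str]:
    """Build an SSL client configuration from PEM files in `cert_dir`."""
    base = Path(cert_dir)
    return {
        "bootstrap.servers": bootstrap_servers,
        "security.protocol": "SSL",
        # Hypothetical file names; adjust to wherever the certificates are stored.
        "ssl.ca.location": str(base / "ca_chain.pem"),
        "ssl.certificate.location": str(base / "client_cert.pem"),
        "ssl.key.location": str(base / "client_key.pem"),
    }


config = kafka_ssl_config("broker.example.com:9092", "/tmp/hopsworks_certs")
for key, value in config.items():
    print(f"{key} = {value}")
```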
