nvidia-merlin / systems

License: Apache License 2.0

Languages: Python 99.66%, Shell 0.34%
Topics: deep-learning, gpu, recommender-system, recommendation-system, ensemble, machine-learning, python, tensorflow

systems's Introduction

Merlin Systems provides tools for combining recommendation models with other elements of production recommender systems like feature stores, nearest neighbor search, and exploration strategies into end-to-end recommendation pipelines that can be served with Triton Inference Server.

Quickstart

Merlin Systems uses the Merlin Operator DAG API, the same API used in NVTabular for feature engineering, to create serving ensembles. To combine a feature engineering workflow and a TensorFlow model into an inference pipeline:

import tensorflow as tf

from merlin.systems.dag import Ensemble
from merlin.systems.dag.ops.tensorflow import PredictTensorflow
from merlin.systems.dag.ops.workflow import TransformWorkflow
from nvtabular.workflow import Workflow

# Load saved NVTabular workflow and TensorFlow model
workflow = Workflow.load(nvtabular_workflow_path)
model = tf.keras.models.load_model(tf_model_path)

# Remove target/label columns from the feature processing workflow
workflow = workflow.remove_inputs([<target_columns>])

# Define ensemble pipeline
pipeline = (
	workflow.input_schema.column_names >>
	TransformWorkflow(workflow) >>
	PredictTensorflow(model)
)

# Export artifacts to disk
ensemble = Ensemble(pipeline, workflow.input_schema)
ensemble.export(export_path)

After you export your ensemble, reference the export directory when you run an instance of Triton Inference Server to host your ensemble:

tritonserver --model-repository=/export_path/

Refer to the Merlin Example Notebooks for notebooks that demonstrate how to train and evaluate a ranking model with Merlin Models and then serve it as an ensemble on Triton Inference Server.

For training models with XGBoost and Implicit and then serving them with Merlin Systems, see these examples.

Building a Four-Stage Recommender Pipeline

Merlin Systems can also build more complex serving pipelines that integrate multiple models and external tools (like feature stores and nearest neighbor search):

# Assumed imports for this example (module paths follow the merlin.systems layout)
import feast
import numpy as np
import tensorflow as tf

from merlin.schema import ColumnSchema, Schema
from merlin.systems.dag import Ensemble
from merlin.systems.dag.ops.faiss import QueryFaiss
from merlin.systems.dag.ops.feast import QueryFeast
from merlin.systems.dag.ops.session_filter import FilterCandidates
from merlin.systems.dag.ops.softmax_sampling import SoftmaxSampling
from merlin.systems.dag.ops.tensorflow import PredictTensorflow
from merlin.systems.dag.ops.unroll_features import UnrollFeatures

# Load artifacts for the pipeline
retrieval_model = tf.keras.models.load_model(retrieval_model_path)
ranking_model = tf.keras.models.load_model(ranking_model_path)
feature_store = feast.FeatureStore(feast_repo_path)

# Define the fields expected in requests
request_schema = Schema([
    ColumnSchema("user_id", dtype=np.int32),
])

# Fetch user features, use them to compute a user vector with the retrieval model,
# and find candidate items closest to the user vector with nearest neighbor search
user_features = request_schema.column_names >> QueryFeast.from_feature_view(
    store=feature_store, view="user_features", column="user_id"
)

retrieval = (
    user_features
    >> PredictTensorflow(retrieval_model)
    >> QueryFaiss(faiss_index_path, topk=100)
)

# Filter out candidate items that the user has already interacted with
# in the current session and fetch item features for the rest
filtering = retrieval["candidate_ids"] >> FilterCandidates(
    filter_out=user_features["movie_ids"]
)

item_features = filtering >> QueryFeast.from_feature_view(
    store=feature_store, view="movie_features", column="filtered_ids",
)

# Join user and item features for the candidates and use them to predict relevance scores
combined_features = item_features >> UnrollFeatures(
    "movie_id", user_features, unrolled_prefix="user"
)

ranking = combined_features >> PredictTensorflow(ranking_model)

# Sort candidate items by relevance score with some randomized exploration
ordering = combined_features["movie_id"] >> SoftmaxSampling(
    relevance_col=ranking["output"], topk=10, temperature=20.0
)

# Create and export the ensemble
ensemble = Ensemble(ordering, request_schema)
ensemble.export("./ensemble")

Refer to the Example Notebooks for the building-and-deploying-multi-stage-RecSys notebooks that use Merlin Models and Merlin Systems.

Installation

Merlin Systems requires Triton Inference Server and TensorFlow. The simplest setup is to use the Merlin TensorFlow Inference Docker container, which has both pre-installed.

Installing Merlin Systems Using Pip

You can install Merlin Systems with pip:

pip install merlin-systems

Installing Merlin Systems from Source

Merlin Systems can be installed from source by cloning the GitHub repository and running setup.py:

git clone https://github.com/NVIDIA-Merlin/systems.git
cd systems && python setup.py develop

Running Merlin Systems from Docker

Merlin Systems is installed on multiple Docker containers that are available from the NVIDIA GPU Cloud (NGC) catalog. The following table lists the containers that include Triton Inference Server for use with Merlin.

Container Name      Container Location                                                                       Functionality
merlin-hugectr      https://catalog.ngc.nvidia.com/orgs/nvidia/teams/merlin/containers/merlin-hugectr        Merlin frameworks, HugeCTR, and Triton Inference Server
merlin-tensorflow   https://catalog.ngc.nvidia.com/orgs/nvidia/teams/merlin/containers/merlin-tensorflow     Merlin frameworks selected for only TensorFlow support, and Triton Inference Server

If you want to add support for GPU-accelerated workflows, you will first need to install the NVIDIA Container Toolkit to provide GPU support for Docker. You can use the NGC links referenced in the table above to obtain more information about how to launch and run these containers.

Feedback and Support

To report bugs or get help, please open an issue.

systems's People

Contributors

ajschmidt8, albert17, ayodeawe, benfred, bschifferer, edknv, jperez999, karlhigley, mikemckiernan, nv-alaiacano, oliverholworthy, radekosmulski, rjzamora, rnyak

systems's Issues

[DOC] Docstring coverage in Systems

Coverage improved:

  • From 35.0% to 70.7% by #52

Modules with missing docstrings:

  • dag
  • triton
  • workflow

Name                         Total  Miss  Cover  Cover%
_version.py                     22     0     22    100%
dag/ensemble.py                  2     0      2    100%
dag/node.py                      3     1      2     67%
dag/op_runner.py                 3     3      0      0%
dag/ops/faiss.py                 7     0      7    100%
dag/ops/feast.py                 7     4      3     43%
dag/ops/operator.py              8     1      7     88%
dag/ops/session_filter.py        6     0      6    100%
dag/ops/softmax_sampling.py      6     1      5     83%
dag/ops/tensorflow.py            4     2      2     50%
dag/ops/unroll_features.py       5     2      3     60%
dag/ops/workflow.py              3     1      2     67%
triton/conversions.py            1     0      1    100%
triton/export.py                 7     0      7    100%
triton/oprunner_model.py         3     3      0      0%
triton/utils.py                  2     2      0      0%
triton/workflow_model.py         3     2      1     33%
workflow/base.py                 3     3      0      0%
workflow/hugectr.py              2     2      0      0%
workflow/pytorch.py              1     1      0      0%
workflow/tensorflow.py           1     1      0      0%
TOTAL                           99    29     70   70.7%

[FEA] Create a util function with run_ensemble_on_tritonserver() function

🚀 Feature request

We need to use _run_ensemble_on_tritonserver() in the PoC example to be able to send requests to TIS, but we do not have this function publicly available. To make it work, we had to create another Python file that we can call the functions from, and added the _run_ensemble_on_tritonserver and run_triton_server functions to it. See the example Python file below.

Can we have such a util function available either in the examples folder or in merlin/systems? Thanks.

import contextlib
import glob
import os
import random
import signal
import subprocess
import time
from distutils.spawn import find_executable

import dask
import numpy as np
import pandas as pd
import pytest

from merlin.io import Dataset
import cudf

import merlin.systems.triton as triton
import merlin.systems.triton.conversions as data_conversions
import tritonclient as tritonclient
import tritonclient.grpc as grpcclient

TRITON_SERVER_PATH = find_executable("tritonserver")

@contextlib.contextmanager
def run_triton_server(modelpath):
    cmdline = [
        TRITON_SERVER_PATH,
        "--model-repository",
        modelpath,
        "--backend-config=tensorflow,version=2",
    ]
    env = os.environ.copy()
    env["CUDA_VISIBLE_DEVICES"] = "0"
    with subprocess.Popen(cmdline, env=env) as process:
        try:
            with grpcclient.InferenceServerClient("localhost:8001") as client:
                # wait until server is ready
                for _ in range(60):
                    if process.poll() is not None:
                        retcode = process.returncode
                        raise RuntimeError(f"Tritonserver failed to start (ret={retcode})")

                    try:
                        ready = client.is_server_ready()
                    except tritonclient.utils.InferenceServerException:
                        ready = False

                    if ready:
                        yield client
                        return

                    time.sleep(1)

                raise RuntimeError("Timed out waiting for tritonserver to become ready")
        finally:
            # signal triton to shutdown
            process.send_signal(signal.SIGINT)



def _run_ensemble_on_tritonserver(
    tmpdir,
    output_columns,
    df,
    model_name,
):
    inputs = triton.convert_df_to_triton_input(df.columns, df)
    outputs = [grpcclient.InferRequestedOutput(col) for col in output_columns]
    response = None
    with run_triton_server(tmpdir) as client:
        response = client.infer(model_name, inputs, outputs=outputs)

    return response

[BUG] Error while serving NVT Model on Triton Inference Server

Describe the bug

I am getting the error below from TIS when loading the models to the server. These columns are not tagged as continuous or categorical after the groupby op, so their proper tags are missing from workflow.output_schema, and there are more output columns produced by the workflow than are in cats and conts. This is causing the issue.

Would the AddMetadata op solve the issue if we add it after the groupby op and tag these columns as continuous? (A sketch follows the error output below.)

Model            Version  Status
t4r_pytorch_nvt  1        UNAVAILABLE: Internal: ValueError: The following extra columns were found
                          in the workflow's output: {'timestamp/weekday/sin-list_trim', 'day-first',
                          'timestamp/age_days-list_trim'}

                          At:
                            /nvtabular/nvtabular/inference/workflow/base.py(70): __init__
                            /nvtabular/nvtabular/inference/workflow/tensorflow.py(33): __init__
                            /workspace/models_new/t4r_pytorch_nvt/1/model.py(85): initialize
t4r_pytorch_pt   1        READY
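
Regarding the AddMetadata question above, here is a hedged sketch of tagging the untagged groupby outputs as continuous (the op and tag names come from NVTabular and merlin.schema; the column list is taken from the error message and is otherwise illustrative):

import nvtabular as nvt
from merlin.schema import Tags

# Hypothetical workaround: explicitly tag the extra groupby output columns as continuous
# so they carry proper tags in the workflow's output schema.
extra_cols = ["day-first", "timestamp/age_days-list_trim", "timestamp/weekday/sin-list_trim"]
tagged = extra_cols >> nvt.ops.AddMetadata(tags=[Tags.CONTINUOUS])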

Steps/Code to reproduce bug
The issue can be reproduced by running the getting-started notebooks in the Transformers4Rec repo. Add this script after model training:

import nvtabular as nvt
workflow = nvt.Workflow.load('workflow_etl')
from nvtabular.inference.triton import export_pytorch_ensemble
export_pytorch_ensemble(
    model,
    workflow,
    sparse_max=trainer.get_train_dataloader().dataset.sparse_max,
    name="t4r_pytorch",
    model_path="/workspace/models",
    label_columns=[],
)

Expected behavior
No error should occur when we load models to TIS.

Environment details (please complete the following information):

  • Environment location: [Bare-metal, Docker, Cloud(specify cloud provider)]
  • Method of NVTabular install: [conda, Docker, or from source]: DOCKER
    • If method of install is [Docker], provide docker pull & docker run commands used

Using docker-pytorch-inference:22.03 with the latest NVTabular main branch pulled.

Additional context
Add any other context about the problem here.

Use GPU tensors in Triton ensemble operators

  • Use DLPack to share GPU memory between Triton models in the ensemble
  • Upgrade numpy in the containers and see if DLPack works with Triton tensors
  • Try to build a repro of transferring cupy tensors to Triton with DLPack (re: issue with contiguous arrays); see the sketch after this list
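
A minimal sketch of the cupy/DLPack hand-off this issue is about, assuming cupy and a GPU are available (the Triton side is omitted; the point is the zero-copy export and the contiguity caveat):

import cupy as cp

# A non-contiguous view: slicing columns out of a 2D array
x = cp.arange(12, dtype=cp.float32).reshape(3, 4)[:, :2]

# DLPack consumers generally expect contiguous memory, so make the buffer contiguous first
x = cp.ascontiguousarray(x)

# Export the GPU buffer as a DLPack capsule (no device-to-host copy)
capsule = x.toDlpack()

# Re-import on the consumer side; y shares the same GPU memory as x
y = cp.fromDlpack(capsule)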

[DOC] Systems - Readme Bash v1.0

  • Systems
    • Recently updated fixes to the doc, adding several components; needs to be reviewed
    • Need to add an advanced example to the Merlin Systems repo

[FEA] run_ensemble_on_tritonserver should use send_triton_request

🚀 Feature request

run_ensemble_on_tritonserver should start the Triton server, use the send_triton_request function to send the request to the Triton server, and then stop the Triton server.

Motivation

We use run_ensemble_on_tritonserver in the unit tests of the notebooks in place of send_triton_request, so that we automatically start the server, send the request, and stop the server. We still want to test send_triton_request in the notebook. If run_ensemble_on_tritonserver uses send_triton_request, we accomplish both: testing the notebooks and testing the send_triton_request function. A sketch of this follows.
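
A minimal sketch of the requested behavior, assuming run_triton_server is the context manager shown in the earlier issue and send_triton_request takes the dataframe, the output column list, and a triton_model keyword (as in the traceback quoted in the next issue):

from merlin.systems.triton.utils import run_triton_server, send_triton_request

def run_ensemble_on_tritonserver(model_repository, output_columns, df, model_name):
    # Start Triton, send the request via send_triton_request, and shut the server
    # down when the context manager exits.
    with run_triton_server(model_repository):
        return send_triton_request(df, output_columns, triton_model=model_name)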

[BUG] run_triton_server() func does not launch the TIS

Bug description

At the last step of the 02-Deploying-multi-stage-RecSys-with-Merlin-Systems.ipynb notebook, I run run_triton_server(export_path), but this does not launch TIS. Based on the run_triton_server function definition, it returns a client object, but it is not clear how the returned client object is used in the send_triton_request() function to be able to send the request to the running server.

Please see the error showing that run_triton_server(export_path) does not launch TIS:

# create a request to be sent to TIS
import numpy as np

from merlin.core.dispatch import make_df
from merlin.systems.triton.utils import run_triton_server
from merlin.systems.triton.utils import send_triton_request
from merlin.core.dispatch import get_lib

request = make_df({"user_id": [1]})
request["user_id"] = request["user_id"].astype(np.int32)
run_triton_server(export_path)
outputs = ensemble.graph.output_schema.column_names
output = send_triton_request(request, outputs)

---------------------------------------------------------------------------
InferenceServerException                  Traceback (most recent call last)
Input In [49], in <cell line: 14>()
     10 run_triton_server(export_path)
     12 outputs = ensemble.graph.output_schema.column_names
---> 14 output = send_triton_request(request, outputs)

File /systems/merlin/systems/triton/utils.py:132, in send_triton_request(df, outputs_list, endpoint, request_id, triton_model)
    129 except Exception as e:
    130     raise e
--> 132 if not triton_client.is_server_live():
    133     raise ValueError("Client could not establish commuincation with Triton Inference Server.")
    135 inputs = triton.convert_df_to_triton_input(df.columns, df, grpcclient.InferInput)

File /usr/local/lib/python3.8/dist-packages/tritonclient/grpc/__init__.py:302, in InferenceServerClient.is_server_live(self, headers)
    300     return response.live
    301 except grpc.RpcError as rpc_error:
--> 302     raise_error_grpc(rpc_error)

File /usr/local/lib/python3.8/dist-packages/tritonclient/grpc/__init__.py:62, in raise_error_grpc(rpc_error)
     61 def raise_error_grpc(rpc_error):
---> 62     raise get_error_grpc(rpc_error) from None

InferenceServerException: [StatusCode.UNAVAILABLE] failed to connect to all addresses

It doesn't look like send_triton_request accepts a client object. It'd be nice to have an example of how these functions work together.
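
A hedged sketch of how the two functions might be used together, assuming run_triton_server is a context manager (as in the test helper quoted in an earlier issue) so the server only runs inside the with block:

import numpy as np
from merlin.core.dispatch import make_df
from merlin.systems.triton.utils import run_triton_server, send_triton_request

export_path = "./ensemble"  # illustrative: the directory the ensemble was exported to
outputs = ensemble.graph.output_schema.column_names  # `ensemble` comes from the notebook above

request = make_df({"user_id": [1]})
request["user_id"] = request["user_id"].astype(np.int32)

# The server only runs inside the with block; send the request while it is active
with run_triton_server(export_path):
    response = send_triton_request(request, outputs)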

Steps/Code to reproduce bug

Please run the notebooks 01 and 02 in here Deploying-multi-stage-RecSys to repro the issue.

Environment details

  • Merlin version:
  • Platform:
  • Python version:
  • PyTorch version (GPU?):
  • Tensorflow version (GPU?):

merlin-tensorflow-inference:22.04 with the latest main branches pulled.

[FEA] Avoid Deprecation warnings while importing Ensemble class

🚀 Feature request

It'd be nice to avoid all these DeprecationWarnings when we run from merlin.systems.dag.ensemble import Ensemble.

/usr/local/lib/python3.8/dist-packages/tritonclient/grpc/model_config_pb2.py:19: DeprecationWarning: Call to deprecated create function FileDescriptor(). Note: Create unlinked descriptors is going to go away. Please use get/find descriptors from generated code or query the descriptor_pool.
  DESCRIPTOR = _descriptor.FileDescriptor(
/usr/local/lib/python3.8/dist-packages/tritonclient/grpc/model_config_pb2.py:33: DeprecationWarning: Call to deprecated create function EnumValueDescriptor(). Note: Create unlinked descriptors is going to go away. Please use get/find descriptors from generated code or query the descriptor_pool.
  _descriptor.EnumValueDescriptor(
/usr/local/lib/python3.8/dist-packages/tritonclient/grpc/model_config_pb2.py:27: DeprecationWarning: Call to deprecated create function EnumDescriptor(). Note: Create unlinked descriptors is going to go away. Please use get/find descriptors from generated code or query the descriptor_pool.
  _DATATYPE = _descriptor.EnumDescriptor(
/usr/local/lib/python3.8/dist-packages/tritonclient/grpc/model_config_pb2.py:322: DeprecationWarning: Call to deprecated create function FieldDescriptor(). Note: Create unlinked descriptors is going to go away. Please use get/find descriptors from generated code or query the descriptor_pool.
  _descriptor.FieldDescriptor(
/usr/local/lib/python3.8/dist-packages/tritonclient/grpc/model_config_pb2.py:315: DeprecationWarning: Call to deprecated create function Descriptor(). Note: Create unlinked descriptors is going to go away. Please use get/find descriptors from generated code or query the descriptor_pool.
  _MODELRATELIMITER_RESOURCE = _descriptor.Descriptor(
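
A possible interim workaround, sketched under the assumption that tritonclient has not yet been imported in the process (the warnings are emitted at import time):

import warnings

# Suppress DeprecationWarnings raised while tritonclient's generated protobuf
# modules are imported as a side effect of importing Ensemble.
with warnings.catch_warnings():
    warnings.simplefilter("ignore", DeprecationWarning)
    from merlin.systems.dag.ensemble import Ensemble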

[BUG] Type annotation breaks FAISS op

Bug description

This line has an incorrect type annotation that prevents the Faiss operator from running.

Steps/Code to reproduce bug

Run the end-to-end example notebooks

Expected behavior

Should run without errors (maybe the annotation should be List[Union[dict, list]]?)

Set up the initial repo structure

  • Initialize a git repo
  • Create merlin package directory
  • Add .gitignore, pre-commit config, setup.cfg, setup.py
  • Add Github Actions CI
  • Add README
  • Add CLA
  • Set up Versioneer

Make it possible to use operators for serving or batch

  • Add the capability to select output formats at the operator and/or ensemble level (np/cp/pandas/cudf etc)
  • Adjust the operators as needed to work with pandas/cudf output
  • Figure out how to handle operators that fetch data from external systems (provide an option to replace that with a cudf join?)
  • ...

[FEA] create send triton request function

Create a function that abstracts away the following when creating a Triton inference request:

inputs = triton.convert_df_to_triton_input(df.columns, df)
outputs = [grpcclient.InferRequestedOutput(col) for col in output_columns]

# send request to tritonserver
with grpcclient.InferenceServerClient("localhost:8001") as client:
    response = client.infer("ensemble_model", inputs, request_id="1", outputs=outputs)

# access individual response columns to get values back
for col in ensemble.graph.output_schema.column_names:
    print(col, response.as_numpy(col), response.as_numpy(col).shape)

Would like to set it up as:

response = send_triton_request(...)
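
A hedged sketch of the requested helper, built only from the snippet above (the defaults for the endpoint, request ID, and model name are illustrative; the real function may differ):

import merlin.systems.triton as triton
import tritonclient.grpc as grpcclient

def send_triton_request(df, output_columns, endpoint="localhost:8001",
                        request_id="1", triton_model="ensemble_model"):
    # Convert the request dataframe into Triton inputs and declare the requested outputs
    inputs = triton.convert_df_to_triton_input(df.columns, df)
    outputs = [grpcclient.InferRequestedOutput(col) for col in output_columns]

    # Send the request to the running Triton server
    with grpcclient.InferenceServerClient(endpoint) as client:
        response = client.infer(triton_model, inputs, request_id=request_id, outputs=outputs)

    # Return the response columns as numpy arrays keyed by column name
    return {col: response.as_numpy(col) for col in output_columns}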

[BUG] Convert cuDF to Triton Object does not work with null values

Bug description

If I use convert_df_to_triton_input with a dataframe containing Null values, then I get an error:

ValueError: Column must have no nulls.

We could replace .values_host with .to_pandas().values. However, that will change the null value to -2147483648.

Expected behavior

I can convert DataFrames with Nulls
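
A minimal repro sketch of what the issue describes, assuming a GPU and cudf are available (the outcomes in the comments are the ones reported above):

import cudf

s = cudf.Series([1, None, 3], dtype="int32")

# The suggested alternative works, but per the report the null becomes -2147483648
print(s.to_pandas().values)

# As reported above, pulling host values from a column that contains nulls fails with
# "ValueError: Column must have no nulls."
try:
    s.values_host
except ValueError as err:
    print(err)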
