nvidia-merlin / systems

License: Apache License 2.0

Languages: Python 99.66%, Shell 0.34%
Topics: deep-learning, gpu, recommender-system, recommendation-system, ensemble, machine-learning, python, tensorflow

systems's Introduction

Merlin Systems provides tools for combining recommendation models with other elements of production recommender systems like feature stores, nearest neighbor search, and exploration strategies into end-to-end recommendation pipelines that can be served with Triton Inference Server.

Quickstart

Merlin Systems uses the Merlin Operator DAG API, the same API used in NVTabular for feature engineering, to create serving ensembles. To combine a feature engineering workflow and a TensorFlow model into an inference pipeline:

import tensorflow as tf

from merlin.systems.dag import Ensemble
from merlin.systems.dag.ops.tensorflow import PredictTensorflow
from merlin.systems.dag.ops.workflow import TransformWorkflow
from nvtabular.workflow import Workflow

# Load saved NVTabular workflow and TensorFlow model
workflow = Workflow.load(nvtabular_workflow_path)
model = tf.keras.models.load_model(tf_model_path)

# Remove target/label columns from the feature processing workflow
workflow = workflow.remove_inputs([<target_columns>])

# Define ensemble pipeline
pipeline = (
	workflow.input_schema.column_names >>
	TransformWorkflow(workflow) >>
	PredictTensorflow(model)
)

# Export artifacts to disk
ensemble = Ensemble(pipeline, workflow.input_schema)
ensemble.export(export_path)

After you export your ensemble, reference the export directory when you run an instance of Triton Inference Server to host your ensemble:

tritonserver --model-repository=/export_path/

Refer to the Merlin Example Notebooks for notebooks that demonstrate how to train and evaluate a ranking model with Merlin Models and then serve it as an ensemble on Triton Inference Server.

For training models with XGBoost and Implicit and then serving them with Merlin Systems, see these examples.

Building a Four-Stage Recommender Pipeline

Merlin Systems can also build more complex serving pipelines that integrate multiple models and external tools (like feature stores and nearest neighbor search):

# Assumed imports for this example (module paths follow the merlin.systems layout)
import feast
import numpy as np
import tensorflow as tf

from merlin.schema import ColumnSchema, Schema
from merlin.systems.dag import Ensemble
from merlin.systems.dag.ops.faiss import QueryFaiss
from merlin.systems.dag.ops.feast import QueryFeast
from merlin.systems.dag.ops.session_filter import FilterCandidates
from merlin.systems.dag.ops.softmax_sampling import SoftmaxSampling
from merlin.systems.dag.ops.tensorflow import PredictTensorflow
from merlin.systems.dag.ops.unroll_features import UnrollFeatures

# Load artifacts for the pipeline
retrieval_model = tf.keras.models.load_model(retrieval_model_path)
ranking_model = tf.keras.models.load_model(ranking_model_path)
feature_store = feast.FeatureStore(feast_repo_path)

# Define the fields expected in requests
request_schema = Schema([
    ColumnSchema("user_id", dtype=np.int32),
])

# Fetch user features, use them to compute a user vector with the retrieval model,
# and find candidate items closest to the user vector with nearest neighbor search
user_features = request_schema.column_names >> QueryFeast.from_feature_view(
    store=feature_store, view="user_features", column="user_id"
)

retrieval = (
    user_features
    >> PredictTensorflow(retrieval_model)
    >> QueryFaiss(faiss_index_path, topk=100)
)

# Filter out candidate items that the user has already interacted with
# in the current session and fetch item features for the rest
filtering = retrieval["candidate_ids"] >> FilterCandidates(
    filter_out=user_features["movie_ids"]
)

item_features = filtering >> QueryFeast.from_feature_view(
    store=feature_store, view="movie_features", column="filtered_ids",
)

# Join user and item features for the candidates and use them to predict relevance scores
combined_features = item_features >> UnrollFeatures(
    "movie_id", user_features, unrolled_prefix="user"
)

ranking = combined_features >> PredictTensorflow(ranking_model)

# Sort candidate items by relevance score with some randomized exploration
ordering = combined_features["movie_id"] >> SoftmaxSampling(
    relevance_col=ranking["output"], topk=10, temperature=20.0
)

# Create and export the ensemble
ensemble = Ensemble(ordering, request_schema)
ensemble.export("./ensemble")

Refer to the Example Notebooks for the building-and-deploying-multi-stage-RecSys notebooks that use Merlin Models and Merlin Systems.

Installation

Merlin Systems requires Triton Inference Server and TensorFlow. The simplest setup is to use the Merlin TensorFlow Inference Docker container, which has both pre-installed.

Installing Merlin Systems Using Pip

You can install Merlin Systems with pip:

pip install merlin-systems

Installing Merlin Systems from Source

Merlin Systems can be installed from source by cloning the GitHub repository and running setup.py:

git clone https://github.com/NVIDIA-Merlin/systems.git
cd systems && python setup.py develop

Running Merlin Systems from Docker

Merlin Systems is installed on multiple Docker containers that are available from the NVIDIA GPU Cloud (NGC) catalog. The following table lists the containers that include Triton Inference Server for use with Merlin.

Container Name      Container Location                                                                       Functionality
merlin-hugectr      https://catalog.ngc.nvidia.com/orgs/nvidia/teams/merlin/containers/merlin-hugectr        Merlin frameworks, HugeCTR, and Triton Inference Server
merlin-tensorflow   https://catalog.ngc.nvidia.com/orgs/nvidia/teams/merlin/containers/merlin-tensorflow     Merlin frameworks selected for only TensorFlow support, and Triton Inference Server

If you want to add support for GPU-accelerated workflows, you will first need to install the NVIDIA Container Toolkit to provide GPU support for Docker. You can use the NGC links referenced in the table above to obtain more information about how to launch and run these containers.

Feedback and Support

To report bugs or get help, please open an issue.

systems's People

Contributors

ajschmidt8, albert17, ayodeawe, benfred, bschifferer, edknv, jperez999, karlhigley, mikemckiernan, nv-alaiacano, oliverholworthy, radekosmulski, rjzamora, rnyak

systems's Issues

[DOC] Docstring coverage in Systems

Coverage improved:

  • From 35.0% to 70.7% by #52

Modules with missing docstrings:

  • dag
  • triton
  • workflow

Name                         Total  Miss  Cover  Cover%
_version.py                     22     0     22    100%
dag/ensemble.py                  2     0      2    100%
dag/node.py                      3     1      2     67%
dag/op_runner.py                 3     3      0      0%
dag/ops/faiss.py                 7     0      7    100%
dag/ops/feast.py                 7     4      3     43%
dag/ops/operator.py              8     1      7     88%
dag/ops/session_filter.py        6     0      6    100%
dag/ops/softmax_sampling.py      6     1      5     83%
dag/ops/tensorflow.py            4     2      2     50%
dag/ops/unroll_features.py       5     2      3     60%
dag/ops/workflow.py              3     1      2     67%
triton/conversions.py            1     0      1    100%
triton/export.py                 7     0      7    100%
triton/oprunner_model.py         3     3      0      0%
triton/utils.py                  2     2      0      0%
triton/workflow_model.py         3     2      1     33%
workflow/base.py                 3     3      0      0%
workflow/hugectr.py              2     2      0      0%
workflow/pytorch.py              1     1      0      0%
workflow/tensorflow.py           1     1      0      0%
TOTAL                           99    29     70   70.7%

[FEA] Create a util function with run_ensemble_on_tritonserver() function

🚀 Feature request

We need to use _run_ensemble_on_tritonserver() in the PoC example to be able to send requests to TIS, but we do not have this function publicly available. To make it work, we had to create another Python file that we can call the functions from, and added the _run_ensemble_on_tritonserver and run_triton_server functions to it. See the example Python file below.

Can we have such a util function available either in the examples folder or in merlin/systems? Thanks.

import contextlib
import glob
import os
import random
import signal
import subprocess
import time
from distutils.spawn import find_executable

import dask
import numpy as np
import pandas as pd
import pytest

from merlin.io import Dataset
import cudf

import merlin.systems.triton as triton
import merlin.systems.triton.conversions as data_conversions
import tritonclient as tritonclient
import tritonclient.grpc as grpcclient

TRITON_SERVER_PATH = find_executable("tritonserver")

@contextlib.contextmanager
def run_triton_server(modelpath):
    cmdline = [
        TRITON_SERVER_PATH,
        "--model-repository",
        modelpath,
        "--backend-config=tensorflow,version=2",
    ]
    env = os.environ.copy()
    env["CUDA_VISIBLE_DEVICES"] = "0"
    with subprocess.Popen(cmdline, env=env) as process:
        try:
            with grpcclient.InferenceServerClient("localhost:8001") as client:
                # wait until server is ready
                for _ in range(60):
                    if process.poll() is not None:
                        retcode = process.returncode
                        raise RuntimeError(f"Tritonserver failed to start (ret={retcode})")

                    try:
                        ready = client.is_server_ready()
                    except tritonclient.utils.InferenceServerException:
                        ready = False

                    if ready:
                        yield client
                        return

                    time.sleep(1)

                raise RuntimeError("Timed out waiting for tritonserver to become ready")
        finally:
            # signal triton to shutdown
            process.send_signal(signal.SIGINT)



def _run_ensemble_on_tritonserver(
    tmpdir,
    output_columns,
    df,
    model_name,
):
    inputs = triton.convert_df_to_triton_input(df.columns, df)
    outputs = [grpcclient.InferRequestedOutput(col) for col in output_columns]
    response = None
    with run_triton_server(tmpdir) as client:
        response = client.infer(model_name, inputs, outputs=outputs)

    return response

[BUG] Error while serving NVT Model on Triton Inference Server

Describe the bug

I am getting the error below from TIS when loading the models to the server. These columns are not tagged as continuous or categorical after the groupby op, so their proper tags are missing from workflow.output_schema, and there are more output columns produced by the workflow than are in cats and conts. This is causing the issue.

Would the AddMetadata op solve the issue if we add it after the groupby op and tag these columns as continuous? (A sketch follows the error output below.)

Model            Version  Status
t4r_pytorch_nvt  1        UNAVAILABLE: Internal: ValueError: The following extra columns were found
                          in the workflow's output: {'timestamp/weekday/sin-list_trim', 'day-first',
                          'timestamp/age_days-list_trim'}

                          At:
                            /nvtabular/nvtabular/inference/workflow/base.py(70): __init__
                            /nvtabular/nvtabular/inference/workflow/tensorflow.py(33): __init__
                            /workspace/models_new/t4r_pytorch_nvt/1/model.py(85): initialize
t4r_pytorch_pt   1        READY
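
Regarding the AddMetadata question above, here is a hedged sketch of tagging the untagged groupby outputs as continuous (the op and tag names come from NVTabular and merlin.schema; the column list is taken from the error message and is otherwise illustrative):

import nvtabular as nvt
from merlin.schema import Tags

# Hypothetical workaround: explicitly tag the extra groupby output columns as continuous
# so they carry proper tags in the workflow's output schema.
extra_cols = ["day-first", "timestamp/age_days-list_trim", "timestamp/weekday/sin-list_trim"]
tagged = extra_cols >> nvt.ops.AddMetadata(tags=[Tags.CONTINUOUS])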

Steps/Code to reproduce bug
The issue can be reproduced by running the getting-started notebooks in the Transformers4Rec repo. Add this script after model training:

import nvtabular as nvt
workflow = nvt.Workflow.load('workflow_etl')
from nvtabular.inference.triton import export_pytorch_ensemble
export_pytorch_ensemble(
    model,
    workflow,
    sparse_max=trainer.get_train_dataloader().dataset.sparse_max,
    name="t4r_pytorch",
    model_path="/workspace/models",
    label_columns=[],
)

Expected behavior
No error should occur when we load models to TIS.

Environment details (please complete the following information):

  • Environment location: [Bare-metal, Docker, Cloud(specify cloud provider)]
  • Method of NVTabular install: [conda, Docker, or from source]: DOCKER
    • If method of install is [Docker], provide docker pull & docker run commands used

Using docker-pytorch-inference:22.03 with the latest NVTabular main branch pulled.

Additional context
Add any other context about the problem here.

Use GPU tensors in Triton ensemble operators

  • Use DLPack to share GPU memory between Triton models in the ensemble
  • Upgrade numpy in the containers and see if DLPack works with Triton tensors
  • Try to build a repro of transferring cupy tensors to Triton with DLPack (re: issue with contiguous arrays); see the sketch after this list
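
A minimal sketch of the cupy/DLPack hand-off this issue is about, assuming cupy and a GPU are available (the Triton side is omitted; the point is the zero-copy export and the contiguity caveat):

import cupy as cp

# A non-contiguous view: slicing columns out of a 2D array
x = cp.arange(12, dtype=cp.float32).reshape(3, 4)[:, :2]

# DLPack consumers generally expect contiguous memory, so make the buffer contiguous first
x = cp.ascontiguousarray(x)

# Export the GPU buffer as a DLPack capsule (no device-to-host copy)
capsule = x.toDlpack()

# Re-import on the consumer side; y shares the same GPU memory as x
y = cp.fromDlpack(capsule)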

[DOC] Systems - Readme Bash v1.0

  • Systems
    • Recently updated fixes to the doc, adding several components; needs to be reviewed
    • Need to add an advanced example to the Merlin Systems repo

[FEA] run_ensemble_on_tritonserver should use send_triton_request

🚀 Feature request

run_ensemble_on_tritonserver should start the Triton server, use the send_triton_request function to send the request to the Triton server, and then stop the Triton server.

Motivation

We use run_ensemble_on_tritonserver in the unit tests of the notebooks in place of send_triton_request, so that we automatically start the server, send the request, and stop the server. We still want to test send_triton_request in the notebook. If run_ensemble_on_tritonserver uses send_triton_request, we accomplish both: testing the notebooks and testing the send_triton_request function. A sketch of this follows.
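
A minimal sketch of the requested behavior, assuming run_triton_server is the context manager shown in the earlier issue and send_triton_request takes the dataframe, the output column list, and a triton_model keyword (as in the traceback quoted in the next issue):

from merlin.systems.triton.utils import run_triton_server, send_triton_request

def run_ensemble_on_tritonserver(model_repository, output_columns, df, model_name):
    # Start Triton, send the request via send_triton_request, and shut the server
    # down when the context manager exits.
    with run_triton_server(model_repository):
        return send_triton_request(df, output_columns, triton_model=model_name)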

[BUG] run_triton_server() func does not launch the TIS

Bug description

At the last step of the 02-Deploying-multi-stage-RecSys-with-Merlin-Systems.ipynb notebook, I run run_triton_server(export_path), but this does not launch TIS. Based on the run_triton_server function definition, it returns a client object, but it is not clear how the returned client object is used in the send_triton_request() function to be able to send the request to the running server.

Please see the error showing that run_triton_server(export_path) does not launch TIS:

# create a request to be sent to TIS
import numpy as np

from merlin.core.dispatch import make_df
from merlin.systems.triton.utils import run_triton_server
from merlin.systems.triton.utils import send_triton_request
from merlin.core.dispatch import get_lib

request = make_df({"user_id": [1]})
request["user_id"] = request["user_id"].astype(np.int32)
run_triton_server(export_path)
outputs = ensemble.graph.output_schema.column_names
output = send_triton_request(request, outputs)

---------------------------------------------------------------------------
InferenceServerException                  Traceback (most recent call last)
Input In [49], in <cell line: 14>()
     10 run_triton_server(export_path)
     12 outputs = ensemble.graph.output_schema.column_names
---> 14 output = send_triton_request(request, outputs)

File /systems/merlin/systems/triton/utils.py:132, in send_triton_request(df, outputs_list, endpoint, request_id, triton_model)
    129 except Exception as e:
    130     raise e
--> 132 if not triton_client.is_server_live():
    133     raise ValueError("Client could not establish commuincation with Triton Inference Server.")
    135 inputs = triton.convert_df_to_triton_input(df.columns, df, grpcclient.InferInput)

File /usr/local/lib/python3.8/dist-packages/tritonclient/grpc/__init__.py:302, in InferenceServerClient.is_server_live(self, headers)
    300     return response.live
    301 except grpc.RpcError as rpc_error:
--> 302     raise_error_grpc(rpc_error)

File /usr/local/lib/python3.8/dist-packages/tritonclient/grpc/__init__.py:62, in raise_error_grpc(rpc_error)
     61 def raise_error_grpc(rpc_error):
---> 62     raise get_error_grpc(rpc_error) from None

InferenceServerException: [StatusCode.UNAVAILABLE] failed to connect to all addresses

It doesn't look like send_triton_request accepts a client object. It'd be nice to have an example of how these functions work together.
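
A hedged sketch of how the two functions might be used together, assuming run_triton_server is a context manager (as in the test helper quoted in an earlier issue) so the server only runs inside the with block:

import numpy as np
from merlin.core.dispatch import make_df
from merlin.systems.triton.utils import run_triton_server, send_triton_request

export_path = "./ensemble"  # illustrative: the directory the ensemble was exported to
outputs = ensemble.graph.output_schema.column_names  # `ensemble` comes from the notebook above

request = make_df({"user_id": [1]})
request["user_id"] = request["user_id"].astype(np.int32)

# The server only runs inside the with block; send the request while it is active
with run_triton_server(export_path):
    response = send_triton_request(request, outputs)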

Steps/Code to reproduce bug

Please run the notebooks 01 and 02 in here Deploying-multi-stage-RecSys to repro the issue.

Environment details

  • Merlin version:
  • Platform:
  • Python version:
  • PyTorch version (GPU?):
  • Tensorflow version (GPU?):

merlin-tensorflow-inference:22.04 with the latest main branches pulled.

[FEA] Avoid Deprecation warnings while importing Ensemble class

🚀 Feature request

It'd be nice to avoid all these DeprecationWarnings when we run from merlin.systems.dag.ensemble import Ensemble.

/usr/local/lib/python3.8/dist-packages/tritonclient/grpc/model_config_pb2.py:19: DeprecationWarning: Call to deprecated create function FileDescriptor(). Note: Create unlinked descriptors is going to go away. Please use get/find descriptors from generated code or query the descriptor_pool.
  DESCRIPTOR = _descriptor.FileDescriptor(
/usr/local/lib/python3.8/dist-packages/tritonclient/grpc/model_config_pb2.py:33: DeprecationWarning: Call to deprecated create function EnumValueDescriptor(). Note: Create unlinked descriptors is going to go away. Please use get/find descriptors from generated code or query the descriptor_pool.
  _descriptor.EnumValueDescriptor(
/usr/local/lib/python3.8/dist-packages/tritonclient/grpc/model_config_pb2.py:27: DeprecationWarning: Call to deprecated create function EnumDescriptor(). Note: Create unlinked descriptors is going to go away. Please use get/find descriptors from generated code or query the descriptor_pool.
  _DATATYPE = _descriptor.EnumDescriptor(
/usr/local/lib/python3.8/dist-packages/tritonclient/grpc/model_config_pb2.py:322: DeprecationWarning: Call to deprecated create function FieldDescriptor(). Note: Create unlinked descriptors is going to go away. Please use get/find descriptors from generated code or query the descriptor_pool.
  _descriptor.FieldDescriptor(
/usr/local/lib/python3.8/dist-packages/tritonclient/grpc/model_config_pb2.py:315: DeprecationWarning: Call to deprecated create function Descriptor(). Note: Create unlinked descriptors is going to go away. Please use get/find descriptors from generated code or query the descriptor_pool.
  _MODELRATELIMITER_RESOURCE = _descriptor.Descriptor(
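
A possible interim workaround, sketched under the assumption that tritonclient has not yet been imported in the process (the warnings are emitted at import time):

import warnings

# Suppress DeprecationWarnings raised while tritonclient's generated protobuf
# modules are imported as a side effect of importing Ensemble.
with warnings.catch_warnings():
    warnings.simplefilter("ignore", DeprecationWarning)
    from merlin.systems.dag.ensemble import Ensemble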

[BUG] Type annotation breaks FAISS op

Bug description

This line has an incorrect type annotation that prevents the Faiss operator from running.

Steps/Code to reproduce bug

Run the end-to-end example notebooks

Expected behavior

Should run without errors (maybe the annotation should be List[Union[dict, list]]?)

Set up the initial repo structure

  • Initialize a git repo
  • Create merlin package directory
  • Add .gitignore, pre-commit config, setup.cfg, setup.py
  • Add Github Actions CI
  • Add README
  • Add CLA
  • Set up Versioneer

Make it possible to use operators for serving or batch

  • Add the capability to select output formats at the operator and/or ensemble level (np/cp/pandas/cudf etc)
  • Adjust the operators as needed to work with pandas/cudf output
  • Figure out how to handle operators that fetch data from external systems (provide an option to replace that with a cudf join?)
  • ...

[FEA] create send triton request function

Create a function that abstracts away the following when creating a Triton inference request:

inputs = triton.convert_df_to_triton_input(df.columns, df)
outputs = [grpcclient.InferRequestedOutput(col) for col in output_columns]

# send request to tritonserver
with grpcclient.InferenceServerClient("localhost:8001") as client:
    response = client.infer("ensemble_model", inputs, request_id="1", outputs=outputs)

# access individual response columns to get values back
for col in ensemble.graph.output_schema.column_names:
    print(col, response.as_numpy(col), response.as_numpy(col).shape)

Would like to set it up as:

response = send_triton_request(...)
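
A hedged sketch of the requested helper, built only from the snippet above (the defaults for the endpoint, request ID, and model name are illustrative; the real function may differ):

import merlin.systems.triton as triton
import tritonclient.grpc as grpcclient

def send_triton_request(df, output_columns, endpoint="localhost:8001",
                        request_id="1", triton_model="ensemble_model"):
    # Convert the request dataframe into Triton inputs and declare the requested outputs
    inputs = triton.convert_df_to_triton_input(df.columns, df)
    outputs = [grpcclient.InferRequestedOutput(col) for col in output_columns]

    # Send the request to the running Triton server
    with grpcclient.InferenceServerClient(endpoint) as client:
        response = client.infer(triton_model, inputs, request_id=request_id, outputs=outputs)

    # Return the response columns as numpy arrays keyed by column name
    return {col: response.as_numpy(col) for col in output_columns}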

[BUG] Convert cuDF to Triton Object does not work with null values

Bug description

If I use convert_df_to_triton_input with a dataframe containing Null values, then I get an error:

ValueError: Column must have no nulls.

We could replace .values_host with .to_pandas().values. However, that will change the null value to -2147483648.

Expected behavior

I can convert DataFrames with Nulls
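
A minimal repro sketch of what the issue describes, assuming a GPU and cudf are available (the outcomes in the comments are the ones reported above):

import cudf

s = cudf.Series([1, None, 3], dtype="int32")

# The suggested alternative works, but per the report the null becomes -2147483648
print(s.to_pandas().values)

# As reported above, pulling host values from a column that contains nulls fails with
# "ValueError: Column must have no nulls."
try:
    s.values_host
except ValueError as err:
    print(err)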
