machine-learning-exchange / katalog

MLX Katalog is a project to hold the default content samples to bootstrap Machine Learning eXchange.

License: Apache License 2.0

Python 3.19% · Jupyter Notebook 96.09% · Dockerfile 0.08% · Makefile 0.06% · Shell 0.57%

katalog's People

Contributors

animeshsingh, ckadner, drewbutlerbb4, jaulet, kmh4321, mlx-bot, romeokienzler, srishtipithadia, tomcli


katalog's Issues

Should we move external pipelines from KFP-Tekton into the katalog?

The catalog_upload.json file (MLX bootstrapper) pulls in some external assets.

$ curl -s https://raw.githubusercontent.com/machine-learning-exchange/mlx/main/bootstrapper/catalog_upload.json | \
    grep -E '^  "|"url":' |  grep -v "/katalog/"

  "components": 
      "url": "https://raw.githubusercontent.com/Trusted-AI/AIF360/master/mlops/kubeflow/bias_detector_pytorch/component.yaml"
      "url": "https://raw.githubusercontent.com/Trusted-AI/adversarial-robustness-toolbox/main/utils/mlops/kubeflow/robustness_evaluation_fgsm_pytorch/component.yaml"

  "pipelines": 
      "url": "https://github.com/kubeflow/kfp-tekton/blob/master/sdk/python/tests/compiler/testdata/parallel_join.yaml"
      "url": "https://github.com/kubeflow/kfp-tekton/blob/master/sdk/python/tests/compiler/testdata/sequential.yaml"
      "url": "https://github.com/kubeflow/kfp-tekton/blob/master/sdk/python/tests/compiler/testdata/resourceop_basic.yaml"


We probably want to keep the 2 Trusted-AI components where they are. The KFP-Tekton pipelines, however, we may want to move into the katalog repo, especially since they are only meant to be compiler test data in KFP-Tekton.

One impact of the current situation is that the script that updates the README cannot get a proper name (and description) for these pipelines, and it cannot (re)generate the true listing of katalog assets. So whoever runs the script needs to heed the warning the script prints out and exercise some judgment before committing the changes, or risk unintentionally removing the external asset links.

A better way, albeit more work, might be to have the script read the existing README file, find absolute YAML links which point outside the katalog repo, and re-inject them into the newly generated README. But that introduces additional complexity.
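The re-injection approach could be sketched as follows. This is a hypothetical helper, not part of the existing script; it only has to recognize Markdown links to YAML files hosted outside the katalog repo:

```python
import re

# Hypothetical sketch: collect absolute Markdown links to YAML files that
# point outside the katalog repo, so they can be re-injected into a
# regenerated README. Function name and regex are assumptions.
EXTERNAL_YAML_LINK = re.compile(r"\[([^\]]+)\]\((https?://[^)]+\.ya?ml)\)")

def find_external_yaml_links(readme_text: str) -> list[tuple[str, str]]:
    """Return (link_text, url) pairs for YAML links not under /katalog/."""
    return [
        (text, url)
        for text, url in EXTERNAL_YAML_LINK.findall(readme_text)
        if "/katalog/" not in url
    ]
```

The generator would then append these preserved links to the freshly generated listing instead of silently dropping them.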

/cc @animeshsingh
/cc @Tomcli

Originally posted by @ckadner in #69 (comment)

Enhance `check_doc_links` script to verify links in YAML files

All too often there are typos in links, or links go out of date. This is especially bad when MLX requires the URL to load resources like:

  • related assets
  • readme_url

A script for this already exists:
https://github.com/machine-learning-exchange/katalog/blob/main/tools/python/verify_doc_links.py

And the change to include YAML files is similar to the script in the mlx repo which included JSON files:
https://github.com/machine-learning-exchange/mlx/blob/20c66c092d1775cc15de2af906c835f0c4e066c1/tools/python/verify_doc_links.py#L34

Return top 3 predictions in Codenet Language Classifier model

@kmh4321 -- I thought we had decided to return the 3 highest predictions if there is more than one? When I run the prediction on a Python script, I get an 85% match for Haskell. It would be interesting to see the next best prediction.

Curl

curl -X POST "http://0.0.0.0:5000/model/predict" \
  -H "accept: application/json" \
  -H "Content-Type: multipart/form-data" \
  -F "file=@animal_tasks.py;type=text/x-python-script"

Request URL

http://0.0.0.0:5000/model/predict

Server response

Code: 200

Response body:

{
  "status": "ok",
  "predictions": [
    {
      "language": "Haskell",
      "probability": 0.8541885018348694
    }
  ]
}
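The single-result response above could be broadened to a top-3 list. A minimal sketch, assuming the classifier exposes one probability per candidate language (names below are placeholders, not the model's actual API):

```python
# Hypothetical sketch: return the 3 most probable languages instead of
# only the argmax, in the same shape as the "predictions" field above.
def top_k_predictions(probs: dict[str, float], k: int = 3) -> list[dict]:
    """Rank candidate languages by probability and keep the top k."""
    ranked = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)
    return [{"language": lang, "probability": p} for lang, p in ranked[:k]]
```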

Originally posted by @ckadner in #41 (comment)

Create and document an authoritative list of `filter_categories` and "categorize" remaining MAX/DAX assets

We should create an authoritative list of filter_categories that users who upload models/datasets can rely on and expand on.

The authoritative list will also be helpful when deciding which labels the UI will put on the featured cards (@drewbutlerbb4 ):

    filter_categories:
      domain:      "image-to-text"
      platform:    ["kubernetes", "kfserving"]
      language:    "python"
      framework:   "tensorflow"
      industry:    "finance"
      application: ["devops", "CI/CD"]
      mediatype:    ["audio", "video", "text"]
      ...

If we run out of good ideas for more interesting categorizations, we could take a look at the LF conference submissions, where they slot talks into various categories (see https://linuxfoundation.smapply.io/prog/).
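Once an authoritative list exists, uploads could be validated against it. A hedged sketch, where the allow-list values below are illustrative placeholders mirroring the example above, not a finalized taxonomy:

```python
# Hypothetical sketch: validate an asset's filter_categories against an
# authoritative allow-list before upload. The values here are illustrative.
ALLOWED_FILTER_CATEGORIES = {
    "domain": {"image-to-text", "language-modeling"},
    "platform": {"kubernetes", "kfserving"},
    "language": {"python"},
    "framework": {"tensorflow", "pytorch"},
    "industry": {"finance"},
    "application": {"devops", "ci/cd"},
    "mediatype": {"audio", "video", "text"},
}

def validate_filter_categories(categories: dict) -> list[str]:
    """Return human-readable problems; an empty list means valid."""
    problems = []
    for key, value in categories.items():
        allowed = ALLOWED_FILTER_CATEGORIES.get(key)
        if allowed is None:
            problems.append(f"unknown category: {key}")
            continue
        # Accept both a single string and a list of strings, as in the example.
        values = value if isinstance(value, list) else [value]
        for v in values:
            if v.lower() not in allowed:
                problems.append(f"unknown {key} value: {v}")
    return problems
```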

Originally posted by @ckadner in #16 (comment)

Add Codenet language classifier Model

Thanks @kmh4321 for working on this!

Tasks:

  • create the model source repo on Github, with REST API and Dockerfile
  • build and push the Docker image
  • add the Model YAML to the katalog repo
  • add the Model to the Bootstrapper and Quickstart in the mlx repo
  • fix up the README (fill in placeholders)

Future MAX and DAX assets

List of MAX models to be migrated into MLX:

  • CodeNet code complexity analysis (currently prototype)
  • ...

List of DAX datasets to be integrated into MLX:

  • FinTabNet - 15 GB
  • Genomics (2 datasets) - 80 GB -> need to confirm if we need this
  • ...

@kmh4321

Broken links in MAX and DAX asset READMEs

Run the following command:

$ make check_doc_links

See all the broken links in *.md files in the katalog project subfolders:

Checking for Markdown files here:

  **/*.md

Checked 534 links (270 unique URLs) in 29 Markdown files.

dataset-samples/codenet_langclass/codenet_langclass.md:5: `[paper](https://github.com/IBM/Project_CodeNet/blob/main/ProjectCodeNet.pdf)` 404
dataset-samples/codenet/codenet.md:3: `[paper](https://github.com/IBM/Project_CodeNet/blob/main/ProjectCodeNet.pdf)` 404
dataset-samples/codenet_mlm/codenet_mlm.md:5: `[paper](https://github.com/IBM/Project_CodeNet/blob/main/ProjectCodeNet.pdf)` 404
model-samples/codenet-language-classification/codenet-language-classification.md:21: `[LICENSE](/model-samples/codenet-language-classification/LICENSE)` 404
model-samples/codenet-language-classification/codenet-language-classification.md:22: `[LICENSE](/model-samples/codenet-language-classification/LICENSE)` 404
model-samples/codenet-language-classification/codenet-language-classification.md:23: `[LICENSE](/model-samples/codenet-language-classification/LICENSE)` 404
model-samples/codenet-language-classification/codenet-language-classification.md:24: `[samples README](/model-samples/codenet-language-classification/samples/README.md)` 404
model-samples/codenet-language-classification/codenet-language-classification.md:114: `[INSERT SWAGGER UI SCREENSHOT HERE](/model-samples/codenet-language-classification/docs/swagger-screenshot.png)` 404
model-samples/max-image-resolution-enhancer/max-image-resolution-enhancer.md:93: `[Example Result](/model-samples/max-image-resolution-enhancer/images/example.png)` 404
model-samples/max-question-answering/max-question-answering.md:82: `[demo notebook](/model-samples/max-question-answering/[https://github.com/IBM/MAX-Question-Answering/samples/demo.ipynb])` 404
model-samples/max-recommender/max-recommender.md:3: `[Neural Collaborative Filtering model](/model-samples/max-recommender/[https://github.com/microsoft/recommenders])` 404
model-samples/max-weather-forecaster/max-weather-forecaster.md:5: `[CODAIT team](/model-samples/max-weather-forecaster/codait.org)` 404

ERROR: Found 12 invalid Markdown links
make: *** [check_doc_links] Error 1

Add `id` field to notebook YAML

Currently the notebook YAMLs do not have an id or notebook_identifier field. This makes it hard to reference a notebook as a related_asset from other assets, like datasets.

Note: Make sure the value of the ids added to the YAML files match the existing ones:

curl -X GET --header 'Accept: application/json' 'http://ml-exchange.org/apis/v1alpha1/notebooks' -s | grep '"id":'
      "id": "aif360-bias-detection-example",
      "id": "art-detector-model",
      "id": "art-poisoning-attack",
      "id": "jfk-airport-analysis",
      "id": "project-codenet-language-classification",
      "id": "project-codenet-mlm",
      "id": "qiskit-neural-network-classifier-and-regressor",
      "id": "qiskit-quantum-kernel-machine-learning",

These ids were generated from the notebook name by lower-casing it and replacing spaces with dashes.
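The naming convention above can be captured in a one-liner; a minimal sketch (the helper name is an assumption):

```python
# Minimal sketch of the id convention described above: lower-case the
# notebook name and replace spaces with dashes.
def notebook_id(name: str) -> str:
    return name.lower().replace(" ", "-")
```

For example, `notebook_id("Project CodeNet Language Classification")` yields `project-codenet-language-classification`, matching the id returned by the API listing above.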

Related issues:

Rename MAX/DAX markdown files to README.md

So that they get displayed on GitHub when users land in the respective folders.

  • Rename the *.md files to README.md under dataset-samples/* and model-samples/*
  • Update the readme_url fields in the accompanying *.yaml files

Add READMEs for MAX models

Similar to how we show markdown content (README.md) for DAX datasets, we need to add README.md files for MAX models.

Related issues:

Error in cell 5 of "ART Detector Model" notebook

The ART Detector Model notebook has an error in cell 5.

UnknownError:  Failed to get convolution algorithm. This is probably because cuDNN failed to initialize, so try looking to see if a warning log message was printed above.
	 [[node conv2d_1/convolution (defined at /home/beat/codes/anaconda3/envs/py37_tf220/lib/python3.7/site-packages/keras/backend/tensorflow_backend.py:3009) ]] [Op:__inference_keras_scratch_graph_32692]

@kmh4321

Add related assets (notebooks) to CodeNet dataset

In order to run the 2 notebooks associated with the CodeNet dataset, the dataset YAML needs to include a related_assets like this:

    related_assets:
      - name: MLX Language classification notebook
        description: Language classification notebook with CodeNet dataset mounted as PVC
        mime_type: application/x-ipynb+json
        application:
          name: MLX
          asset_id: notebooks/project-codenet-language-classification

Update compiled pipeline katalog with Kubeflow 1.4

Right now our pipeline katalog is compiled with the Kubeflow 1.3 (kfp-tekton 0.8) SDK. Although the pipelines can run on Kubeflow 1.4, some features like metadata tracking are not fully functional, because Kubeflow 1.4 handles the artifact name mapping slightly differently.

Remove "MAX" prefix from model names

  • katalog README (#46)
  • model YAMLs name (#47)
    name: MAX Image Caption Generator
    model_identifier: max-image-caption-generator
    description: "IBM Model Asset eXchange(MAX) model that generates captions from a fixed vocabulary describing the contents of images in the COCO dataset"
  • keep asset IDs for model URLs
  • mlx bootstrapper and quickstart catalog_upload.json (machine-learning-exchange/mlx#224)
  • list of models in model-samples/README.md (#50)
  • rename YAML and MD file names (optional)

FYI @animeshsingh

Add links to AI FactSheets to model READMEs

From @michaelhind

Previously the MAX webpage linked to the example FactSheet for the models that have FactSheets. Likewise, the FactSheet examples linked to the MAX model page. With MAX moving to ML-Exchange, it seems that the link to the FactSheets has been lost.

Would it be possible to add the link?

Here are the 4 FSs that correspond to models in ML Exchange. (We also have 1 for Audio Classifier, but it looks like that one didn’t make it to ML Exchange.)
