machine-learning-exchange / katalog

MLX Katalog is a project to hold the default content samples to bootstrap Machine Learning eXchange.

License: Apache License 2.0

Python 3.19% · Jupyter Notebook 96.09% · Dockerfile 0.08% · Makefile 0.06% · Shell 0.57%

katalog's People

Contributors

animeshsingh, ckadner, drewbutlerbb4, jaulet, kmh4321, mlx-bot, romeokienzler, srishtipithadia, tomcli


katalog's Issues

Should we move external pipelines from KFP-Tekton into the katalog?

The catalog_upload.json file (MLX bootstrapper) pulls in some external assets.

$ curl -s https://raw.githubusercontent.com/machine-learning-exchange/mlx/main/bootstrapper/catalog_upload.json | \
    grep -E '^  "|"url":' |  grep -v "/katalog/"

  "components": 
      "url": "https://raw.githubusercontent.com/Trusted-AI/AIF360/master/mlops/kubeflow/bias_detector_pytorch/component.yaml"
      "url": "https://raw.githubusercontent.com/Trusted-AI/adversarial-robustness-toolbox/main/utils/mlops/kubeflow/robustness_evaluation_fgsm_pytorch/component.yaml"

  "pipelines": 
      "url": "https://github.com/kubeflow/kfp-tekton/blob/master/sdk/python/tests/compiler/testdata/parallel_join.yaml"
      "url": "https://github.com/kubeflow/kfp-tekton/blob/master/sdk/python/tests/compiler/testdata/sequential.yaml"
      "url": "https://github.com/kubeflow/kfp-tekton/blob/master/sdk/python/tests/compiler/testdata/resourceop_basic.yaml"


We probably want to keep the 2 Trusted-AI components where they are. The KFP-Tekton pipelines, however, we may want to move into the katalog repo, especially since they are only meant to be compiler test data in KFP-Tekton.

One impact of the current situation is that the script that updates the README cannot get a proper name (and description) for these pipelines, and it cannot (re)generate the true listing of katalog assets. So whoever runs the script needs to heed the warning the script prints out and exercise some judgment before committing the changes, or risk unintentionally removing the external asset links.

A better way, albeit more work, might be to have the script read the existing README file, find absolute YAML links which point outside the katalog repo, and re-inject them into the newly generated README. But that introduces additional complexity.
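The re-injection approach could be sketched as follows. This is a hypothetical helper, not part of the existing script; it only has to recognize Markdown links to YAML files hosted outside the katalog repo:

```python
import re

# Hypothetical sketch: collect absolute Markdown links to YAML files that
# point outside the katalog repo, so they can be re-injected into a
# regenerated README. Function name and regex are assumptions.
EXTERNAL_YAML_LINK = re.compile(r"\[([^\]]+)\]\((https?://[^)]+\.ya?ml)\)")

def find_external_yaml_links(readme_text: str) -> list[tuple[str, str]]:
    """Return (link_text, url) pairs for YAML links not under /katalog/."""
    return [
        (text, url)
        for text, url in EXTERNAL_YAML_LINK.findall(readme_text)
        if "/katalog/" not in url
    ]
```

The generator would then append these preserved links to the freshly generated listing instead of silently dropping them.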

/cc @animeshsingh
/cc @Tomcli

Originally posted by @ckadner in #69 (comment)

Enhance `check_doc_links` script to verify links in YAML files

All too often there are typos in links, or links go out of date. This is especially bad when MLX requires the URL to load resources like:

  • related assets
  • readme_url

A script for this already exists:
https://github.com/machine-learning-exchange/katalog/blob/main/tools/python/verify_doc_links.py

And the change to include YAML files is similar to the script in the mlx repo which included JSON files:
https://github.com/machine-learning-exchange/mlx/blob/20c66c092d1775cc15de2af906c835f0c4e066c1/tools/python/verify_doc_links.py#L34

Return top 3 predictions in Codenet Language Classifier model

@kmh4321 -- I thought we had decided to return the 3 highest predictions if there is more than one? When I run the prediction on a Python script, I get an 85% match for Haskell. It would be interesting to see the next best prediction.

Curl

curl -X POST "http://0.0.0.0:5000/model/predict" \
  -H "accept: application/json" \
  -H "Content-Type: multipart/form-data" \
  -F "file=@animal_tasks.py;type=text/x-python-script"

Request URL

http://0.0.0.0:5000/model/predict

Server response

Code: 200

Response body:

{
  "status": "ok",
  "predictions": [
    {
      "language": "Haskell",
      "probability": 0.8541885018348694
    }
  ]
}
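The single-result response above could be broadened to a top-3 list. A minimal sketch, assuming the classifier exposes one probability per candidate language (names below are placeholders, not the model's actual API):

```python
# Hypothetical sketch: return the 3 most probable languages instead of
# only the argmax, in the same shape as the "predictions" field above.
def top_k_predictions(probs: dict[str, float], k: int = 3) -> list[dict]:
    """Rank candidate languages by probability and keep the top k."""
    ranked = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)
    return [{"language": lang, "probability": p} for lang, p in ranked[:k]]
```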

Originally posted by @ckadner in #41 (comment)

Create and document an authoritative list of `filter_categories` and "categorize" remaining MAX/DAX assets

We should create an authoritative list of filter_categories that users who upload models/datasets can rely on and expand on.

The authoritative list will also be helpful when deciding which labels the UI will put on the featured cards (@drewbutlerbb4 ):

    filter_categories:
      domain:      "image-to-text"
      platform:    ["kubernetes", "kfserving"]
      language:    "python"
      framework:   "tensorflow"
      industry:    "finance"
      application: ["devops", "CI/CD"]
      mediatype:    ["audio", "video", "text"]
      ...

If we run out of good ideas for more interesting categorizations, we could take a look at the LF conference submissions, where they slot talks into various categories (see https://linuxfoundation.smapply.io/prog/).
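Once an authoritative list exists, uploads could be validated against it. A hedged sketch, where the allow-list values below are illustrative placeholders mirroring the example above, not a finalized taxonomy:

```python
# Hypothetical sketch: validate an asset's filter_categories against an
# authoritative allow-list before upload. The values here are illustrative.
ALLOWED_FILTER_CATEGORIES = {
    "domain": {"image-to-text", "language-modeling"},
    "platform": {"kubernetes", "kfserving"},
    "language": {"python"},
    "framework": {"tensorflow", "pytorch"},
    "industry": {"finance"},
    "application": {"devops", "ci/cd"},
    "mediatype": {"audio", "video", "text"},
}

def validate_filter_categories(categories: dict) -> list[str]:
    """Return human-readable problems; an empty list means valid."""
    problems = []
    for key, value in categories.items():
        allowed = ALLOWED_FILTER_CATEGORIES.get(key)
        if allowed is None:
            problems.append(f"unknown category: {key}")
            continue
        # Accept both a single string and a list of strings, as in the example.
        values = value if isinstance(value, list) else [value]
        for v in values:
            if v.lower() not in allowed:
                problems.append(f"unknown {key} value: {v}")
    return problems
```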

Originally posted by @ckadner in #16 (comment)

Add Codenet language classifier Model

Thanks @kmh4321 for working on this!

Tasks:

  • create the model source repo on Github, with REST API and Dockerfile
  • build and push the Docker image
  • add the Model YAML to the katalog repo
  • add the Model to the Bootstrapper and Quickstart in the mlx repo
  • fix up the README (fill in placeholders)

Future MAX and DAX assets

List of MAX models to be migrated into MLX:

  • CodeNet code complexity analysis (currently prototype)
  • ...

List of DAX datasets to be integrated into MLX:

  • FinTabNet - 15 GB
  • Genomics (2 datasets) - 80 GB -> need to confirm if we need this
  • ...

@kmh4321

Broken links in MAX and DAX asset READMEs

Run the following command:

$ make check_doc_links

See all the broken links in *.md files in the katalog project subfolders:

Checking for Markdown files here:

  **/*.md

Checked 534 links (270 unique URLs) in 29 Markdown files.

dataset-samples/codenet_langclass/codenet_langclass.md:5: `[paper](https://github.com/IBM/Project_CodeNet/blob/main/ProjectCodeNet.pdf)` 404
dataset-samples/codenet/codenet.md:3: `[paper](https://github.com/IBM/Project_CodeNet/blob/main/ProjectCodeNet.pdf)` 404
dataset-samples/codenet_mlm/codenet_mlm.md:5: `[paper](https://github.com/IBM/Project_CodeNet/blob/main/ProjectCodeNet.pdf)` 404
model-samples/codenet-language-classification/codenet-language-classification.md:21: `[LICENSE](/model-samples/codenet-language-classification/LICENSE)` 404
model-samples/codenet-language-classification/codenet-language-classification.md:22: `[LICENSE](/model-samples/codenet-language-classification/LICENSE)` 404
model-samples/codenet-language-classification/codenet-language-classification.md:23: `[LICENSE](/model-samples/codenet-language-classification/LICENSE)` 404
model-samples/codenet-language-classification/codenet-language-classification.md:24: `[samples README](/model-samples/codenet-language-classification/samples/README.md)` 404
model-samples/codenet-language-classification/codenet-language-classification.md:114: `[INSERT SWAGGER UI SCREENSHOT HERE](/model-samples/codenet-language-classification/docs/swagger-screenshot.png)` 404
model-samples/max-image-resolution-enhancer/max-image-resolution-enhancer.md:93: `[Example Result](/model-samples/max-image-resolution-enhancer/images/example.png)` 404
model-samples/max-question-answering/max-question-answering.md:82: `[demo notebook](/model-samples/max-question-answering/[https://github.com/IBM/MAX-Question-Answering/samples/demo.ipynb])` 404
model-samples/max-recommender/max-recommender.md:3: `[Neural Collaborative Filtering model](/model-samples/max-recommender/[https://github.com/microsoft/recommenders])` 404
model-samples/max-weather-forecaster/max-weather-forecaster.md:5: `[CODAIT team](/model-samples/max-weather-forecaster/codait.org)` 404

ERROR: Found 12 invalid Markdown links
make: *** [check_doc_links] Error 1

Add `id` field to notebook YAML

Currently the notebook YAMLs do not have an id or notebook_identifier field. This makes it hard to reference a notebook as a related_asset from other assets, like datasets.

Note: Make sure the value of the ids added to the YAML files match the existing ones:

curl -X GET --header 'Accept: application/json' 'http://ml-exchange.org/apis/v1alpha1/notebooks' -s | grep '"id":'
      "id": "aif360-bias-detection-example",
      "id": "art-detector-model",
      "id": "art-poisoning-attack",
      "id": "jfk-airport-analysis",
      "id": "project-codenet-language-classification",
      "id": "project-codenet-mlm",
      "id": "qiskit-neural-network-classifier-and-regressor",
      "id": "qiskit-quantum-kernel-machine-learning",

These ids were generated from the notebook name by lower-casing it and replacing spaces with dashes.
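The naming convention above can be captured in a one-liner; a minimal sketch (the helper name is an assumption):

```python
# Minimal sketch of the id convention described above: lower-case the
# notebook name and replace spaces with dashes.
def notebook_id(name: str) -> str:
    return name.lower().replace(" ", "-")
```

For example, `notebook_id("Project CodeNet Language Classification")` yields `project-codenet-language-classification`, matching the id returned by the API listing above.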

Related issues:

Rename MAX/DAX markdown files to README.md

So that they get displayed on GitHub when users land in the respective folders.

  • Rename the *.md files to README.md under dataset-samples/* and model-samples/*
  • Update the readme_url fields in the accompanying *.yaml files

Add READMEs for MAX models

Similar to how we show markdown content (README.md) for DAX datasets, we need to add README.md files for MAX models.

Related issues:

Error in cell 5 of "ART Detector Model" notebook

The ART Detector Model notebook has an error in cell 5.

UnknownError:  Failed to get convolution algorithm. This is probably because cuDNN failed to initialize, so try looking to see if a warning log message was printed above.
	 [[node conv2d_1/convolution (defined at /home/beat/codes/anaconda3/envs/py37_tf220/lib/python3.7/site-packages/keras/backend/tensorflow_backend.py:3009) ]] [Op:__inference_keras_scratch_graph_32692]

@kmh4321

Add related assets (notebooks) to CodeNet dataset

In order to run the 2 notebooks associated with the CodeNet dataset, the dataset YAML needs to include a related_assets like this:

    related_assets:
      - name: MLX Language classification notebook
        description: Language classification notebook with CodeNet dataset mounted as PVC
        mime_type: application/x-ipynb+json
        application:
          name: MLX
          asset_id: notebooks/project-codenet-language-classification

Update compiled pipeline katalog with Kubeflow 1.4

Right now our pipeline katalog is compiled with the Kubeflow 1.3 (kfp-tekton 0.8) SDK. Although the pipelines can run on Kubeflow 1.4, some features like metadata tracking are not fully functional, because Kubeflow 1.4 handles the artifact name mapping slightly differently.

Remove "MAX" prefix from model names

  • katalog README (#46)
  • model YAMLs name (#47)
    name: MAX Image Caption Generator
    model_identifier: max-image-caption-generator
    description: "IBM Model Asset eXchange(MAX) model that generates captions from a fixed vocabulary describing the contents of images in the COCO dataset"
  • keep asset IDs for model URLs
  • mlx bootstrapper and quickstart catalog_upload.json (machine-learning-exchange/mlx#224)
  • list of models in model-samples/README.md (#50)
  • rename YAML and MD file names (optional)

FYI @animeshsingh

Add links to AI FactSheets to model READMEs

From @michaelhind

Previously the MAX webpage linked to the example FactSheet for the models that have FactSheets. Likewise, the FactSheet examples linked to the MAX model page. With MAX moving to ML-Exchange, it seems that the link to the FactSheets has been lost.

Would it be possible to add the link?

Here are the 4 FSs that correspond to models in ML Exchange. (We also have 1 for Audio Classifier, but it looks like that one didn’t make it to ML Exchange.)
