machine-learning-exchange / katalog

MLX Katalog is a project to hold the default content samples to bootstrap Machine Learning eXchange.

License: Apache License 2.0
TODO: Open an issue in the Elyra-AI notebook runner to not require `==` for Python dependency versions in the `catalog_upload.json` files under `bootstrapper` and `quickstart`.

See elyra-ai/kfp-notebook, `bootstrapper.py`, `def package_list_to_dict()`:
https://github.com/elyra-ai/kfp-notebook/blob/v0.26.0/etc/docker-scripts/bootstrapper.py#L548
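The limitation above (only strictly pinned dependencies are recognized) can be illustrated with a hypothetical sketch of a `package_list_to_dict()`-style parser; this is not the actual Elyra code, just an approximation of the behavior being discussed:

```python
# Hypothetical sketch of parsing a requirements list into a dict of
# {package: version}, in the spirit of package_list_to_dict() in
# bootstrapper.py. Only "pkg==version" lines are recognized here,
# which is exactly the limitation the TODO above is about.

def package_list_to_dict(requirements_lines):
    """Map package names to pinned versions; skip comments and blanks."""
    packages = {}
    for line in requirements_lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        # Looser specifiers like '>=' or '~=' are silently dropped,
        # so the dependency would not be installed as expected.
        if "==" in line:
            name, version = line.split("==", 1)
            packages[name.strip()] = version.strip()
    return packages

print(package_list_to_dict(["numpy==1.21.0", "# comment", "pandas>=1.0"]))
# {'numpy': '1.21.0'}
```

Note how `pandas>=1.0` is lost entirely, which is why requiring strict `==` pins is a problem for notebook authors.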
The `catalog_upload.json` file (MLX bootstrapper) pulls in some external assets:

```sh
$ curl -s https://raw.githubusercontent.com/machine-learning-exchange/mlx/main/bootstrapper/catalog_upload.json | \
    grep -E '^ "|"url":' | grep -v "/katalog/"

  "components":
      "url": "https://raw.githubusercontent.com/Trusted-AI/AIF360/master/mlops/kubeflow/bias_detector_pytorch/component.yaml"
      "url": "https://raw.githubusercontent.com/Trusted-AI/adversarial-robustness-toolbox/main/utils/mlops/kubeflow/robustness_evaluation_fgsm_pytorch/component.yaml"
  "pipelines":
      "url": "https://github.com/kubeflow/kfp-tekton/blob/master/sdk/python/tests/compiler/testdata/parallel_join.yaml"
      "url": "https://github.com/kubeflow/kfp-tekton/blob/master/sdk/python/tests/compiler/testdata/sequential.yaml"
      "url": "https://github.com/kubeflow/kfp-tekton/blob/master/sdk/python/tests/compiler/testdata/resourceop_basic.yaml"
```
We probably want to keep the two Trusted-AI components where they are. The KFP-Tekton pipelines, however, we may want to move into the katalog repo, especially since they are only meant to be compiler test data in KFP-Tekton.
One impact of the current situation is that the script that updates the README cannot get a proper name (and description) for these pipelines, and it cannot (re)generate the true listing of katalog assets. So whoever runs the script needs to heed the warning the script prints out and exercise some judgment before committing the changes, or risk that the external asset links get removed unintentionally.
A better way, albeit more work, might be to have the script read the existing README file, find absolute YAML links that point outside the katalog repo, and re-inject them into the newly generated README. But that introduces additional complexity.
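The "find external YAML links" part could look roughly like the sketch below. This is not the actual katalog tooling, just a minimal illustration of the idea using a regex over Markdown link syntax:

```python
import re

# Hedged sketch (not the real tools/python script): find absolute links to
# YAML files in an existing README that point outside the katalog repo, so
# a README generator could re-inject them after regenerating the listing.
EXTERNAL_YAML_LINK = re.compile(r'\[([^\]]+)\]\((https?://[^)]+\.ya?ml[^)]*)\)')

def external_yaml_links(readme_text):
    """Return (link text, URL) pairs for YAML links outside the katalog repo."""
    return [
        (text, url)
        for text, url in EXTERNAL_YAML_LINK.findall(readme_text)
        if "/katalog/" not in url
    ]
```

For example, a link to `kfp-tekton` test data would be kept, while a link under `/katalog/` would be filtered out.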
/cc @animeshsingh
/cc @Tomcli
Originally posted by @ckadner in #69 (comment)
All too often there are typos in links, or links go out of date. This is especially bad when MLX requires the URL to load resources, like:
related assets
readme_url
A script for this already exists:
https://github.com/machine-learning-exchange/katalog/blob/main/tools/python/verify_doc_links.py
And the change to include YAML files would be similar to the one in the `mlx` repo that included JSON files:
https://github.com/machine-learning-exchange/mlx/blob/20c66c092d1775cc15de2af906c835f0c4e066c1/tools/python/verify_doc_links.py#L34
The Related Links that point to other MAX/DAX assets could be replaced with links to the corresponding MAX/DAX assets in MLX -- either pointing to ml-exchange.org or using relative links that open the asset within the local MLX UI.
Originally posted by @ckadner in #15 (comment)
@kmh4321 -- I thought we had decided to print the 3 highest predictions if there is more than one? When I run the prediction on a Python script I get an 85% match for Haskell. It would be interesting to see the next best predictions.
Curl:

```sh
curl -X POST "http://0.0.0.0:5000/model/predict" -H "accept: application/json" -H "Content-Type: multipart/form-data" -F "file=@animal_tasks.py;type=text/x-python-script"
```
Request URL
http://0.0.0.0:5000/model/predict
Code: 200
Response body:

```json
{
  "status": "ok",
  "predictions": [
    {
      "language": "Haskell",
      "probability": 0.8541885018348694
    }
  ]
}
```
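Picking the top three predictions from a response shaped like the one above is straightforward; here is a small sketch (the response schema is assumed from the example, not from the model's API docs):

```python
# Sketch: sort a MAX-style predictions list by probability and keep the
# top three, as suggested above. The "predictions" schema is assumed
# from the example response in this thread.
def top_predictions(response, k=3):
    preds = sorted(response["predictions"],
                   key=lambda p: p["probability"], reverse=True)
    return preds[:k]

response = {"status": "ok", "predictions": [
    {"language": "Haskell", "probability": 0.854},
    {"language": "Python", "probability": 0.09},
    {"language": "OCaml", "probability": 0.03},
    {"language": "Scala", "probability": 0.01},
]}
print([p["language"] for p in top_predictions(response)])
# ['Haskell', 'Python', 'OCaml']
```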
Originally posted by @ckadner in #41 (comment)
We should create an authoritative list of `filter_categories` that users who upload models/datasets can rely on and expand. The authoritative list will also help when deciding which labels the UI puts on the featured cards (@drewbutlerbb4):
```yaml
filter_categories:
  domain: "image-to-text"
  platform: ["kubernetes", "kfserving"]
  language: "python"
  framework: "tensorflow"
  industry: "finance"
  application: ["devops", "CI/CD"]
  mediatype: ["audio", "video", "text"]
  ...
```
If we run out of good ideas for more interesting categorizations, we could take a look at the LF conference submissions, where talks are slotted into various categories (see https://linuxfoundation.smapply.io/prog/).
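Once an authoritative list exists, validating uploaded assets against it could look like the sketch below. The allowed values here are illustrative placeholders, not the official list:

```python
# Hypothetical sketch of validating an asset's filter_categories against
# an authoritative allow-list. The ALLOWED values below are made up for
# illustration; the real authoritative list is what this issue proposes.
ALLOWED = {
    "platform": {"kubernetes", "kfserving"},
    "language": {"python"},
    "framework": {"tensorflow", "pytorch"},
}

def invalid_category_values(filter_categories):
    """Return (category, value) pairs not found in the allow-list."""
    errors = []
    for category, values in filter_categories.items():
        if category not in ALLOWED:
            continue  # categories without an allow-list yet are skipped
        # Values may be a single string or a list of strings (see the
        # YAML example above).
        for value in ([values] if isinstance(values, str) else values):
            if value not in ALLOWED[category]:
                errors.append((category, value))
    return errors

print(invalid_category_values({"platform": ["kubernetes", "openshift"],
                               "language": "python"}))
# [('platform', 'openshift')]
```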
Originally posted by @ckadner in #16 (comment)
Thanks @kmh4321 for working on this!
Tasks:
`mlx` repo
List of MAX models to be migrated into MLX:
List of DAX datasets to be integrated into MLX:
Run the following command to see all the broken links in `*.md` files in the katalog project subfolders:

```sh
$ make check_doc_links

Checking for Markdown files here: **/*.md

Checked 534 links (270 unique URLs) in 29 Markdown files.

dataset-samples/codenet_langclass/codenet_langclass.md:5: `[paper](https://github.com/IBM/Project_CodeNet/blob/main/ProjectCodeNet.pdf)` 404
dataset-samples/codenet/codenet.md:3: `[paper](https://github.com/IBM/Project_CodeNet/blob/main/ProjectCodeNet.pdf)` 404
dataset-samples/codenet_mlm/codenet_mlm.md:5: `[paper](https://github.com/IBM/Project_CodeNet/blob/main/ProjectCodeNet.pdf)` 404
model-samples/codenet-language-classification/codenet-language-classification.md:21: `[LICENSE](/model-samples/codenet-language-classification/LICENSE)` 404
model-samples/codenet-language-classification/codenet-language-classification.md:22: `[LICENSE](/model-samples/codenet-language-classification/LICENSE)` 404
model-samples/codenet-language-classification/codenet-language-classification.md:23: `[LICENSE](/model-samples/codenet-language-classification/LICENSE)` 404
model-samples/codenet-language-classification/codenet-language-classification.md:24: `[samples README](/model-samples/codenet-language-classification/samples/README.md)` 404
model-samples/codenet-language-classification/codenet-language-classification.md:114: `[INSERT SWAGGER UI SCREENSHOT HERE](/model-samples/codenet-language-classification/docs/swagger-screenshot.png)` 404
model-samples/max-image-resolution-enhancer/max-image-resolution-enhancer.md:93: `[Example Result](/model-samples/max-image-resolution-enhancer/images/example.png)` 404
model-samples/max-question-answering/max-question-answering.md:82: `[demo notebook](/model-samples/max-question-answering/[https://github.com/IBM/MAX-Question-Answering/samples/demo.ipynb])` 404
model-samples/max-recommender/max-recommender.md:3: `[Neural Collaborative Filtering model](/model-samples/max-recommender/[https://github.com/microsoft/recommenders])` 404
model-samples/max-weather-forecaster/max-weather-forecaster.md:5: `[CODAIT team](/model-samples/max-weather-forecaster/codait.org)` 404

ERROR: Found 12 invalid Markdown links
make: *** [check_doc_links] Error 1
```
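The core of a link check like the one above can be sketched in a few lines. This is not the actual `tools/python/verify_doc_links.py`, just a minimal illustration of the approach (extract HTTP(S) links from Markdown, then probe each URL):

```python
import re
import urllib.request

# Minimal sketch of a Markdown link checker; the real script in
# tools/python/verify_doc_links.py is more thorough (relative links,
# caching, parallel requests, etc.).
MD_LINK = re.compile(r'\[[^\]]*\]\((https?://[^)]+)\)')

def check_links(markdown_text, timeout=10):
    """Return a list of (url, problem) pairs for links that fail."""
    broken = []
    for url in sorted(set(MD_LINK.findall(markdown_text))):
        try:
            req = urllib.request.Request(url, method="HEAD")
            with urllib.request.urlopen(req, timeout=timeout) as resp:
                if resp.status >= 400:
                    broken.append((url, resp.status))
        except Exception as exc:  # DNS errors, 404s, timeouts, ...
            broken.append((url, str(exc)))
    return broken
```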
FYI @animeshsingh @Tomcli
I will fix those shortly
Currently the notebook YAMLs do not have an `id` or `notebook_identifier` field. This makes it hard to reference a notebook as a `related_asset` from other assets, like datasets.
Note: Make sure the values of the `id`s added to the YAML files match the existing ones:
```sh
$ curl -X GET --header 'Accept: application/json' 'http://ml-exchange.org/apis/v1alpha1/notebooks' -s | grep '"id":'

    "id": "aif360-bias-detection-example",
    "id": "art-detector-model",
    "id": "art-poisoning-attack",
    "id": "jfk-airport-analysis",
    "id": "project-codenet-language-classification",
    "id": "project-codenet-mlm",
    "id": "qiskit-neural-network-classifier-and-regressor",
    "id": "qiskit-quantum-kernel-machine-learning",
```
These got generated from the `name` of the notebook, by lower-casing it and replacing spaces with dashes (`-`).
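That derivation is simple enough to write down as a one-liner (a sketch of the apparent rule, inferred from the `id`s above):

```python
# Sketch of how the notebook ids above appear to be derived from the
# notebook name: lower-case and replace spaces with dashes.
def notebook_id_from_name(name):
    return name.lower().replace(" ", "-")

print(notebook_id_from_name("Project CodeNet Language Classification"))
# project-codenet-language-classification
```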
Related issues:
Rename the `*.md` files to `README.md` under `dataset-samples/*` and `model-samples/*`, so that they get displayed in GitHub when users land in the respective folders, and update the `readme_url` fields in the accompanying `*.yaml` files.
Similar to how we show markdown content (`README.md`) for DAX datasets, we need to add `README.md` files for MAX models.
Related issues:
The ART Detector Model notebook has an error in cell 5:

```
UnknownError: Failed to get convolution algorithm. This is probably because cuDNN failed to initialize, so try looking to see if a warning log message was printed above.
	 [[node conv2d_1/convolution (defined at /home/beat/codes/anaconda3/envs/py37_tf220/lib/python3.7/site-packages/keras/backend/tensorflow_backend.py:3009) ]] [Op:__inference_keras_scratch_graph_32692]
```
In order to run the 2 notebooks associated with the CodeNet dataset, the dataset YAML needs to include a `related_assets` section like this:

```yaml
related_assets:
  - name: MLX Language classification notebook
    description: Language classification notebook with CodeNet dataset mounted as PVC
    mime_type: application/x-ipynb+json
    application:
      name: MLX
      asset_id: notebooks/project-codenet-language-classification
```
Right now our pipeline katalog is compiled with the Kubeflow 1.3 (kfp-tekton 0.8) SDK. Although the pipelines can run on Kubeflow 1.4, some features, like metadata tracking, are not fully functional because Kubeflow 1.4 handles the artifact name mapping slightly differently.
https://github.com/machine-learning-exchange/katalog#list-of-default-katalog-assets
This list needs to reflect the actual assets in the katalog and indicate whether each asset is featured, based on `bootstrapper/catalog_upload.json` in the `mlx` repo.
And of course the links need to be verified (see #43).
`name` (#47):

```yaml
name: MAX Image Caption Generator
model_identifier: max-image-caption-generator
description: "IBM Model Asset eXchange (MAX) model that generates captions from a fixed vocabulary describing the contents of images in the COCO dataset"
```

`mlx` bootstrapper and quickstart `catalog_upload.json` (machine-learning-exchange/mlx#224)

FYI @animeshsingh
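Deriving the set of "featured" assets from `catalog_upload.json` could be sketched as follows. The JSON layout is assumed from the grep excerpt earlier in this thread (top-level asset-type keys mapping to lists of objects with a `url` field), so this is an illustration, not the actual README-generation code:

```python
import json

# Hedged sketch: collect the asset URLs listed in
# bootstrapper/catalog_upload.json so a README generator could mark those
# assets as "featured". Layout assumed from the grep output above.
def featured_urls(catalog_upload_text):
    catalog = json.loads(catalog_upload_text)
    urls = set()
    for assets in catalog.values():      # e.g. "components", "pipelines"
        for asset in assets:
            if "url" in asset:
                urls.add(asset["url"])
    return urls

sample = '{"pipelines": [{"name": "sequential", "url": "https://example.org/sequential.yaml"}]}'
print(featured_urls(sample))
# {'https://example.org/sequential.yaml'}
```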
From @michaelhind:
Previously the MAX webpage linked to the example FactSheet for the models that have FactSheets. Likewise, the FactSheet examples linked to the MAX model page. With MAX moving to ML-Exchange, it seems that link to the FactSheets has been lost.
Would it be possible to add the link?
Here are the 4 FactSheets that correspond to models in ML Exchange. (We also have one for Audio Classifier, but it looks like that one didn't make it to ML Exchange.)