Comments (8)

yhwang commented on May 27, 2024

Here is the error message:

Traceback (most recent call last):
  File "/usr/src/app/swagger_server/util.py", line 259, in invoke_controller_impl
    results = impl_func(**parameters)
  File "/usr/src/app/swagger_server/controllers_impl/pipeline_service_controller_impl.py", line 194, in list_pipelines
    api_pipelines: [ApiPipeline] = load_data(ApiPipelineExtended, filter_dict=filter_dict, sort_by=sort_by,
  File "/usr/src/app/swagger_server/data_access/mysql_client.py", line 678, in load_data
    _verify_or_create_table(table_name, swagger_class)
  File "/usr/src/app/swagger_server/data_access/mysql_client.py", line 359, in _verify_or_create_table
    _validate_schema(table_name, swagger_class)
  File "/usr/src/app/swagger_server/data_access/mysql_client.py", line 440, in _validate_schema
    raise ApiError(err_msg)
swagger_server.util.ApiError: The MySQL table 'mlpipeline.pipelines_extended' does not match Swagger class 'ApiPipelineExtended'.
 Found table with columns:
  - 'UUID' varchar(255)
  - 'CreatedAtInSec' bigint(20)
  - 'Name' varchar(255)
  - 'Description' varchar(255)
  - 'Parameters' longtext
  - 'Status' varchar(255)
  - 'DefaultVersionId' varchar(255)
  - 'Namespace' varchar(255)
  - 'Annotations' longtext
  - 'Featured' tinyint(1)
  - 'PublishApproved' tinyint(1).
 Expected table with columns:
  - 'UUID' varchar(255)
  - 'CreatedAtInSec' bigint(20)
  - 'Name' varchar(255)
  - 'Description' longtext
  - 'Parameters' longtext
  - 'Status' varchar(255)
  - 'DefaultVersionId' varchar(255)
  - 'Namespace' varchar(63)
  - 'Annotations' longtext
  - 'Featured' tinyint(1)
  - 'PublishApproved' tinyint(1).
 Delete and recreate the table by calling the API endpoint 'DELETE /pipelines_extended/*' (500)

After importing the quickstart catalog, the pipelines URL works and I can see all pipeline cards. The stress test repeatedly sends requests to get 2 of the pipeline cards. After the test had run for a while, the /apis/v1alpha1/pipelines API started sending back 500 (Internal Server Error), and I saw the error message above in the mlx-api pod. I always start with 1 pod for mlx-api and, after importing the quickstart catalog, scale up to 3 or more pods. I am not sure whether this is related to the issue.
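
For reference, the "Found" and "Expected" column lists in the error message can be compared mechanically. The following is a minimal sketch, not code from the MLX API: the two dictionaries are transcribed from the error message, and `diff_columns` is a hypothetical helper name.

```python
# Column types transcribed from the "Found table with columns" list in the error message.
FOUND = {
    "UUID": "varchar(255)",
    "CreatedAtInSec": "bigint(20)",
    "Name": "varchar(255)",
    "Description": "varchar(255)",
    "Parameters": "longtext",
    "Status": "varchar(255)",
    "DefaultVersionId": "varchar(255)",
    "Namespace": "varchar(255)",
    "Annotations": "longtext",
    "Featured": "tinyint(1)",
    "PublishApproved": "tinyint(1)",
}

# Column types transcribed from the "Expected table with columns" list.
EXPECTED = {
    "UUID": "varchar(255)",
    "CreatedAtInSec": "bigint(20)",
    "Name": "varchar(255)",
    "Description": "longtext",
    "Parameters": "longtext",
    "Status": "varchar(255)",
    "DefaultVersionId": "varchar(255)",
    "Namespace": "varchar(63)",
    "Annotations": "longtext",
    "Featured": "tinyint(1)",
    "PublishApproved": "tinyint(1)",
}


def diff_columns(found, expected):
    """Return {column: (found_type, expected_type)} for every column whose type differs."""
    names = list(found) + [name for name in expected if name not in found]
    return {
        name: (found.get(name), expected.get(name))
        for name in names
        if found.get(name) != expected.get(name)
    }


if __name__ == "__main__":
    for name, (have, want) in diff_columns(FOUND, EXPECTED).items():
        print(f"{name}: found {have}, expected {want}")
```

Run against the lists above, the helper reports two columns, 'Description' and 'Namespace'; all other columns match.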

ckadner commented on May 27, 2024

Could there have been some pods that crashed? There is a code path in the MLX API that creates the pipelines table if it does not exist. That code path was never used, since we always find the pipelines table already created by KFP or by the init_db.sh script I wrote for the quickstart with Docker Compose.

Tomcli commented on May 27, 2024

@ckadner when I rerun the init_db.sh job, the tables are recreated and everything works fine. But once we run the stress test again, the above error pops up.

ckadner commented on May 27, 2024

> @ckadner when I rerun the init_db.sh job, the tables are recreated and everything works fine. But once we run the stress test again, the above error pops up.

That seems to indicate that the MLX API pod does not find the pipelines table and creates it with the wrong column length for the namespace column. This should not happen unless there is a new MySQL instance that does not get initialized in time before the first call to the MLX API to GET /apis/v1alpha1/pipelines.

ckadner commented on May 27, 2024

This may be an instance of inopportune timing due to the stress test scenario. If we need to support that, I can make changes to the MLX API. (In the Docker Compose setup I made the catalog upload service dependent on the MySQL service having finished the initialization.)

yhwang commented on May 27, 2024

@ckadner I guess the problem is caused by the second or third pod when we scale up mlx-api. As I mentioned, we always do the quickstart import with replicas=1, on the 1st pod. Then I scale mlx-api up to replicas=2 or 3, and this error shows up in the 2nd and 3rd pods.

ckadner commented on May 27, 2024

> @ckadner I guess the problem is caused by the second or third pod when we scale up mlx-api. As I mentioned, we always do the quickstart import with replicas=1, on the 1st pod. Then I scale mlx-api up to replicas=2 or 3, and this error shows up in the 2nd and 3rd pods.

The 2nd and 3rd replicas of the MLX API connect to the same (already initialized) MySQL database.

  • The init_db.sql was not being run. In Docker Compose, the mysql service gets initialized via a "magic" volume:
    volumes:
        - ./init_db.sql:/docker-entrypoint-initdb.d/init_db.sql
    MySQL uses this volume to find initialization scripts: anything under /docker-entrypoint-initdb.d/ is executed when MySQL first initializes its database (PR #126)
  • The 1st mlx-api pod does not find the pipelines table, runs CREATE TABLE with the incorrect namespace column, and internally remembers that it created the table
  • The 2nd and 3rd mlx-api pods check and find that the pipelines table exists, but then go on to verify the table schema and complain about the incorrect namespace column length
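
The sequence in the bullets above can be reproduced with a small simulation. This is a sketch under stated assumptions, not the actual mysql_client.py logic: the dictionary standing in for MySQL, the `ApiPod` class, the `SchemaMismatch` exception, and the two-column schemas are all hypothetical simplifications.

```python
class SchemaMismatch(Exception):
    """Raised when an existing table does not match the expected schema."""


# Expected schema (simplified to two columns for illustration).
EXPECTED = {"UUID": "varchar(255)", "Namespace": "varchar(63)"}

# Assumption: the create path emits a schema that disagrees with EXPECTED.
CREATED = {"UUID": "varchar(255)", "Namespace": "varchar(255)"}


class ApiPod:
    """One API replica; the database dict is shared, the cache is not."""

    def __init__(self, name, database):
        self.name = name
        self.database = database      # shared stand-in for the MySQL instance
        self._verified = set()        # per-process cache of checked tables

    def verify_or_create_table(self, table):
        if table in self._verified:
            return "cached"
        if table not in self.database:
            # Table is missing: create it and remember that, skipping validation.
            self.database[table] = dict(CREATED)
            self._verified.add(table)
            return "created"
        if self.database[table] != EXPECTED:
            raise SchemaMismatch(f"{self.name}: table '{table}' does not match")
        self._verified.add(table)
        return "verified"


def simulate():
    """Run the scale-up scenario and return the outcome of each call."""
    database = {}  # init script never ran, so the table does not exist yet
    pod1 = ApiPod("mlx-api-1", database)
    pod2 = ApiPod("mlx-api-2", database)
    outcomes = [
        pod1.verify_or_create_table("pipelines_extended"),
        pod1.verify_or_create_table("pipelines_extended"),
    ]
    try:
        outcomes.append(pod2.verify_or_create_table("pipelines_extended"))
    except SchemaMismatch as err:
        outcomes.append(f"error: {err}")
    return outcomes


if __name__ == "__main__":
    for outcome in simulate():
        print(outcome)
```

In this sketch the first pod never validates the table it created, so only the pods added later hit the mismatch, which is consistent with the error appearing after scaling up.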

ckadner commented on May 27, 2024

The MLX API is not designed to be running with multiple replicas:

  • database schema initialization and/or schema verification is done once (and cached) at API startup (this issue)
  • API settings are currently file-based and stored in the API pod, so multiple pods can toggle each other's settings on and off (i.e. the Inference Services were either on or off depending on which API instance was chosen, see #135)
  • GET request caching assumes a single-instance API deployment (PR #140)
  • there are likely more issues to be listed here :-)
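
The settings problem in the list above comes down to per-instance state behind a load balancer. A minimal sketch, assuming round-robin request routing and using hypothetical names (`ApiInstance`, `inference_services`, `round_robin_reads`) that do not come from the MLX code base:

```python
import itertools


class ApiInstance:
    """One API replica holding its own local copy of the settings."""

    def __init__(self):
        # Assumption: settings live in a file inside the pod, modeled here as a dict.
        self.settings = {"inference_services": False}

    def set_setting(self, key, value):
        self.settings[key] = value

    def get_setting(self, key):
        return self.settings[key]


def round_robin_reads(instances, key, count):
    """Read a setting `count` times, rotating through the instances in order."""
    rotation = itertools.cycle(instances)
    return [next(rotation).get_setting(key) for _ in range(count)]


if __name__ == "__main__":
    first, second = ApiInstance(), ApiInstance()
    # The toggle request happens to land on the first instance only.
    first.set_setting("inference_services", True)
    print(round_robin_reads([first, second], "inference_services", 4))
```

With two instances and the toggle applied to only one of them, consecutive reads alternate between on and off, matching the behavior described for #135. Moving such state into a shared store would be one way to avoid this, though the thread does not say which fix was chosen.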
