kubeedge / sedna

AI toolkit over KubeEdge

Home Page: https://sedna.readthedocs.io

License: Apache License 2.0

Makefile 0.48% Dockerfile 0.33% Go 49.16% Python 41.27% Shell 8.76%

sedna's Introduction

English | 简体中文

Sedna

CI Go Report Card LICENSE

What is Sedna?

Sedna is an edge-cloud synergy AI project incubated in KubeEdge SIG AI. Benefiting from the edge-cloud synergy capabilities provided by KubeEdge, Sedna implements collaborative training and collaborative inference across edge and cloud, such as joint inference, incremental learning, federated learning, and lifelong learning. Sedna supports popular AI frameworks such as TensorFlow, PyTorch, PaddlePaddle, and MindSpore.

Sedna can add edge-cloud synergy capabilities to existing training and inference scripts with little modification, bringing the benefits of reduced costs, improved model performance, and data privacy protection.

Features

Sedna has the following features:

  • Provide the edge-cloud synergy AI framework.

    • Provide dataset and model management across edge-cloud, helping developers quickly implement synergy AI applications.
  • Provide edge-cloud synergy training and inference frameworks.

    • Joint inference: when edge resources are limited, hard inference tasks are offloaded to the cloud to improve overall accuracy while keeping throughput.
    • Incremental training: For small samples and non-iid data on the edge, models can be adaptively optimized over time on the cloud or edge.
    • Federated learning: for scenarios where the data is too large to move, raw data cannot be migrated to the cloud, or privacy requirements are high, models are trained at the edge and only parameters are aggregated on the cloud, effectively resolving data silos.
    • Lifelong learning: confronted with the challenge of heterogeneous data distributions in complex scenarios and small samples on the edge, edge-cloud synergy lifelong learning:
      • leverages the cloud knowledge base, which gives the scheme memory: it continuously learns and accumulates historical knowledge to overcome the catastrophic forgetting challenge.
      • is essentially the combination of two other learning schemes, i.e., multi-task learning and incremental learning, so that it can learn unseen tasks over time with knowledge shared among various scenarios.
    • etc.
  • Compatibility

    • Compatible with mainstream AI frameworks such as TensorFlow, PyTorch, PaddlePaddle, and MindSpore.
    • Provides extended interfaces for developers to quickly integrate third-party algorithms; some necessary algorithms for edge-cloud synergy are preset, such as hard example mining and aggregation algorithms (a rough sketch of such an extension point follows the feature list).
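As a rough illustration of what such an extension interface can look like, the snippet below registers a custom hard-example-mining filter under a name and selects it at runtime. This is a self-contained sketch with hypothetical names, not the actual Sedna API.

# Illustrative sketch only: a registry-based extension point for hard-example
# mining, with hypothetical names (not the actual Sedna interface).
from typing import Callable, Dict

HARD_EXAMPLE_FILTERS: Dict[str, Callable[[float], bool]] = {}

def register_filter(name: str):
    """Register a hard-example filter under a given name."""
    def wrapper(fn: Callable[[float], bool]):
        HARD_EXAMPLE_FILTERS[name] = fn
        return fn
    return wrapper

@register_filter("threshold")
def threshold_filter(confidence: float, threshold: float = 0.9) -> bool:
    """Treat low-confidence inferences as hard examples to offload."""
    return confidence < threshold

# A worker would look the filter up by the name given in its configuration.
is_hard = HARD_EXAMPLE_FILTERS["threshold"](0.42)
print(is_hard)  # True: this sample would be sent to the cloud worker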

Architecture

Sedna's edge-cloud synergy is implemented based on the following capabilities provided by KubeEdge:

  • Unified orchestration of applications across edge and cloud.
  • Router: the cross edge-cloud message channel in the management plane.
  • EdgeMesh: cross edge-cloud microservice discovery and traffic governance in the data plane.

Component

Sedna consists of the following components:

GlobalManager

  • Unified edge-cloud synergy AI task management
  • Cross edge-cloud synergy management and collaboration
  • Central Configuration Management

LocalController

  • Local process control of edge-cloud synergy AI tasks
  • Local general management: model, dataset, and status synchronization

Worker

  • Performs inference or training, based on existing ML frameworks.
  • Launched on demand; think of them as docker containers.
  • Different workers for different features.
  • Can run on the edge or in the cloud.

Lib

  • Expose the Edge AI features to applications, i.e. training or inference programs.
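For orientation, a joint-inference worker built on the Lib roughly follows the loop below: run the small edge model first, and offload hard examples to the big cloud model. This is a self-contained sketch with stubbed model calls, not the actual Lib API.

def edge_infer(frame):
    """Stub for the little model running on the edge node."""
    return {"label": "helmet", "confidence": 0.55}

def cloud_infer(frame):
    """Stub for a call to the big model served on the cloud node."""
    return {"label": "helmet", "confidence": 0.97}

def is_hard_example(result, threshold=0.9):
    """Offload when the edge model is not confident enough."""
    return result["confidence"] < threshold

def process(frames):
    results = []
    for frame in frames:
        result = edge_infer(frame)
        if is_hard_example(result):
            result = cloud_infer(frame)  # collaborative inference path
        results.append(result)
    return results

print(process([object()]))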

Guides

Documents

Documentation is located on readthedocs.io. These documents can help you understand Sedna better.

Installation

Follow the Sedna installation document to install Sedna.

Examples

Example 1: Using Joint Inference Service in Helmet Detection Scenario.
Example 2: Using Incremental Learning Job in Helmet Detection Scenario.
Example 3: Using Federated Learning Job in Surface Defect Detection Scenario.
Example 4: Using Federated Learning Job in YOLOv5-based Object Detection.
Example 5: Using Lifelong Learning Job in Thermal Comfort Prediction Scenario.
Example 6: Using MultiEdge Inference Service to Track an Infected COVID-19 Carrier in Pandemic Scenarios.

Roadmap

Meeting

Regular Community Meeting:

Resources:

Contact

If you have questions, feel free to reach out to us in the following ways:

Contributing

If you're interested in being a contributor and want to get involved in developing the Sedna code, please see CONTRIBUTING for details on submitting patches and the contribution workflow.

License

Sedna is under the Apache 2.0 license. See the LICENSE file for details.


sedna's Issues

How does federated learning server and client communicate through KubeEdge?

Hi,

The server and client of federated learning communicate through a websocket in the example provided here:

f"ws://{self.config.bind_ip}:{self.config.bind_port}")

However, I cannot find the settings that actually bind the IP and port to environment variables (ENV). I am also wondering (1) whether the websocket goes through KubeEdge or is a separate connection, and if it does, (2) how the websocket actually connects with the KubeEdge network.

Thank you.

Add multi-edge collaborative inference support

Motivation

Multi-edge collaborative inference refers to the use of multiple edge computing nodes for collaborative inference. This technology can make full use of the distributed computing resources of edge computing nodes, reduce the delay of edge AI services, and improve inference accuracy and throughput. It is a key technology of edge intelligence. Therefore, we propose a multi-edge collaborative inference framework to help users easily build multi-edge collaborative AI services based on KubeEdge.

Goals

  • The framework can utilize multiple edge computing nodes for collaborative inference.
  • Utilize KubeEdge's EdgeMesh to realize multi-edge load balancing.
  • Provide typical multi-edge inference case studies (such as ReID, multi-source data fusion, etc.).

Solution

Take pedestrian ReID as an example:

  1. Pedestrian ReID workflow (workflow diagram omitted)

The client is used to read the camera video data and carry out the first stage of inference.
The server is used to receive the region proposal predicted by the client for final target detection and pedestrian feature matching.

  2. CRD example (preliminary design)
apiVersion: sedna.io/v1alpha1
kind: MultiEdgeInferenceService
metadata:
  name: pedestrian-reid
  namespace: default
spec:
  clientWorkers:
    - template:
        spec:
          containers:
            - image: kubeedge/sedna-example-multi-edge-inference-reid-client:v0.1.0
            ......
          nodeSelector:
            ......
  serverWorkers:
    - template:
        metadata:
          labels:
            app: reid-server
        spec:
          containers:
            - image: kubeedge/sedna-example-multi-edge-inference-reid-server:v0.1.0
            ......
          nodeSelector:
            ......
    - template:
        metadata:
          labels:
            app: user-server
        spec:
          containers:
            - image: kubeedge/sedna-example-multi-edge-inference-user-server:v0.1.0
            ......
          nodeSelector:
            ......
    - template:
        metadata:
          labels:
            app: mqtt-server
        spec:
          containers:
            - image: kubeedge/sedna-example-multi-edge-inference-mqtt-server:v0.1.0
            ......
          nodeSelector:
            ......

Open issues

We hope to build a general framework to adapt to a variety of multi-edge collaborative inference scenarios. Therefore, users and developers are welcome to put forward more scenarios, applications and requirements to improve our architecture and CRD interface design.

Add nodeSelector for worker pod template

In issue #19 we added pod template support, but nodeSelector is not supported in PR #33 because of its complex changes:

  1. federated learning: since the train workers need to access the aggregation worker, we currently create the train workers injected with the aggregation service NodeIP:NodePort, so GM needs to know the nodeName first. With nodeSelector, we don't know which node the aggregation worker will be scheduled on when creating the train workers, so we need to delay the train-worker creation (with the service info) until the aggregation worker is scheduled.
    Another note: with EdgeMesh support, this is not a problem.
  2. joint inference: same problem as federated learning; the edgeWorker needs to access the cloudWorker.
  3. the downstream controller: it syncs the features to the edge nodes when it gets feature updates, so downstream needs to know the nodeName.

the ***trainingWorkers*** design in federatedlearningjob.yaml

Hi, I do not quite understand the YAML design here. Why are there 2 dataset sections and 2 template sections under trainingWorkers here?

I guess what you mean is 2 edges with different datasets and different configurations. If this is the case, it makes more sense to me to have 2 worker sections under trainingWorkers and assign the dataset and template to each worker:

trainingWorkers:
  - dataset:
      name: "edge1-surface-defect-detection-dataset"
    template:
      spec:
        nodeName: "edge1"
        containers:
          - image: kubeedge/sedna-example-federated-learning-surface-defect-detection-train:v0.1.0
            name: train-worker
            imagePullPolicy: IfNotPresent
            env: # user defined environments
              - name: "batch_size"
                value: "32"
              - name: "learning_rate"
                value: "0.001"
              - name: "epochs"
                value: "1"
            resources: # user defined resources
              limits:
                memory: 2Gi
  - dataset:
      name: "edge2-surface-defect-detection-dataset"
    template:
      spec:
        nodeName: "edge2"
        containers:
          - image: kubeedge/sedna-example-federated-learning-surface-defect-detection-train:v0.1.0
            name: train-worker
            imagePullPolicy: IfNotPresent
            env: # user defined environments
              - name: "batch_size"
                value: "32"
              - name: "learning_rate"
                value: "0.001"
              - name: "epochs"
                value: "1"
            resources: # user defined resources
              limits:
                memory: 2Gi

examples: enhance incremental_learning/helmet_detection

tracking the issues of the incremental learning example.

  • the namespace sedna-test should be renamed; default is recommended.
  • there is a useless file named train_data.txt.bak1 in dataset.tar.gz
  • there is a useless directory named base_model/model/ in model.tar.gz
  • need to test the hard-example-mining algorithm because of high hard-example storage
  • with a bad model, the eval worker should report low precision/recall instead of raising an exception
  • reduce the memory usage of the training worker

Lib Refactoring

Description

Some of the current components in the Sedna Lib provide services independently based on different features. Each feature provides an independent interface and code structure, which is confusing. To better facilitate those use cases, we need to start thinking about decoupling some of our components. While not a specific action item for Sedna, we believe this conversation is worthwhile.

Discussion so far:

  1. Secondary development is difficult: developers cannot replace components with their customized processing logic (e.g. pre-processing, post-processing, feature engineering, etc.).
  2. Too many non-global variables are defined in baseConfig; since every function has access to these, it becomes increasingly hard to figure out which feature of Sedna actually reads and writes each variable.
  3. Right now the Sedna Lib supports only TensorFlow, which possibly limits the number of potential users. If we can provide a way to extend to other ML frameworks, it will benefit more developers.
  4. The current feature modules (incremental_learning, joint_inference, federated_learning, etc.) have classes scattered around in a generally unorganized way, which throws users off when they go to learn and use different modules.

Details

​ During reviews I've seen a few areas where we can decouple concepts:

  1. By using a registry of class-factory functions to emulate virtual constructors, developers can invoke different components by changing variables in the Config file.
class ClassFactory(object):
    """A Factory Class to manage all class need to register with config."""

    __registry__ = {}

    @classmethod
    def register(cls, type_name='common', alias=None):
        """Register class into registry.

        :param type_name: type_name: type name of class registry
        :param alias: alias of class name
        :return: wrapper
        """

        def wrapper(t_cls):
            """Register class with wrapper function.

            :param t_cls: class need to register
            :return: wrapper of t_cls
            """
            t_cls_name = alias if alias is not None else t_cls.__name__
            if type_name not in cls.__registry__:
                cls.__registry__[type_name] = {t_cls_name: t_cls}
            else:
                if t_cls_name in cls.__registry__[type_name]:
                    raise ValueError(
                        "Cannot register duplicate class ({})".format(t_cls_name))
                cls.__registry__[type_name].update({t_cls_name: t_cls})

            return t_cls

        return wrapper

    @classmethod
    def register_cls(cls, t_cls, type_name='common', alias=None):
        """Register class with type name.

        :param t_cls: class need to register.
        :param type_name: type name.
        :param alias: class name.
        :return:
        """
        t_cls_name = alias if alias is not None else t_cls.__name__
        if type_name not in cls.__registry__:
            cls.__registry__[type_name] = {t_cls_name: t_cls}
        else:
            if t_cls_name in cls.__registry__[type_name]:
                raise ValueError(
                    "Cannot register duplicate class ({})".format(t_cls_name))
            cls.__registry__[type_name].update({t_cls_name: t_cls})
        return t_cls
    
    @classmethod
    def is_exists(cls, type_name, t_cls_name=None):
        """Check whether the class is registered under the type name."""
        # Assumed helper: referenced by get_cls below but missing from the snippet.
        if type_name not in cls.__registry__:
            return False
        if t_cls_name is None:
            return True
        return t_cls_name in cls.__registry__[type_name]

    @classmethod
    def get_cls(cls, type_name, t_cls_name=None):
        """Get class and bind config to class.

        :param type_name: type name of class registry
        :param t_cls_name: class name
        :return:t_cls
        """
        if not cls.is_exists(type_name, t_cls_name):
            raise ValueError("can't find class type {} class name {} in class registry".format(type_name, t_cls_name))
        if t_cls_name is None:
            raise ValueError("can't find class. class type={}".format(type_name))
        t_cls = cls.__registry__.get(type_name).get(t_cls_name)
        return t_cls
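For illustration, registering and looking up a class with the factory above could look like this (hypothetical class and type names):

@ClassFactory.register(type_name="hard_example_mining", alias="IBT")
class IBTFilter:
    """Toy hard-example filter used only to demonstrate the registry."""

    def __init__(self, threshold_img=0.9):
        self.threshold_img = threshold_img

    def is_hard(self, confidence):
        return confidence < self.threshold_img

# The worker can now pick the implementation by the names in its Config file.
filter_cls = ClassFactory.get_cls("hard_example_mining", "IBT")
print(filter_cls(threshold_img=0.8).is_hard(0.5))  # True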
  2. Clean up and redesign the base Config class: each feature maintains its own specific variables, and developers can manually update them.

    from inspect import isfunction, ismethod  # needed by __allattr__ below

    class ConfigSerializable(object):
        """Serializable config base class."""
    
        __original__value__ = None
    
        @property
        def __allattr__(self):
            attrs = filter(lambda attr: not (attr.startswith("__") or ismethod(getattr(self, attr))
                                             or isfunction(getattr(self, attr))), dir(self))
            return list(attrs)
    
        def update(self, **kwargs):
            for attr in self.__allattr__:
                if attr not in kwargs:
                    continue
                setattr(self, attr, kwargs[attr])
    
        def to_json(self):
            """Serialize config to a dictionary. It will be very useful in distributed systems. """
            pass
    
        def __getitem__(self, item):
            return getattr(self, item, None)
    
        def get(self, item, default=""):
            return self.__getitem__(item) or default
    
        @classmethod
        def from_json(cls, data):
            """Restore config from a dictionary or a file."""
            pass
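A hypothetical feature-specific config could then subclass it, keeping its own variables instead of piling everything into a single baseConfig:

    class JointInferenceConfig(ConfigSerializable):
        """Illustrative per-feature config (hypothetical field names)."""

        model_url = ""
        hard_example_mining = "IBT"
        threshold_img = 0.9

    cfg = JointInferenceConfig()
    cfg.update(model_url="/models/little.pb", threshold_img=0.85)
    print(cfg.get("threshold_img"))  # 0.85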
    
  3. Refer to tools such as moxing, neuropod, etc.; decouple the ML framework from the features of Sedna, allowing developers to choose their favorite framework.
    A few frameworks that would be nice to support:

  4. Add common file operations and a unified log format in the common module, use an abstract base class to standardize the feature modules' interfaces, and invoke features by inheriting from the base class (a rough sketch follows this list).

    Specifically, the goals are:

    • Clean up the base module's classes and package structure
    • Create a necessary amount of modules based on big features
    • Revise dependency structure for all the modules
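One possible shape for such a base class, sketched with assumed method names (not the current Lib interface):

from abc import ABC, abstractmethod

class JobBase(ABC):
    """Assumed common interface that each feature module could inherit."""

    def __init__(self, estimator, config=None):
        self.estimator = estimator   # the user's model wrapped by the Lib
        self.config = config or {}

    @abstractmethod
    def train(self, train_data, **kwargs):
        """Feature-specific training entry point."""

    @abstractmethod
    def inference(self, data, **kwargs):
        """Feature-specific inference entry point."""

class JointInference(JobBase):
    """Inference-only feature: delegates prediction to the estimator."""

    def train(self, train_data, **kwargs):
        raise NotImplementedError("joint inference does not train")

    def inference(self, data, **kwargs):
        return self.estimator.predict(data)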

Implement automatic model conversion and model serving image selection to support heterogeneous hardware platform

Function description: Automatically select the AI serving runtime (TF Serving, Paddle Serving, MindSpore Serving, TensorFlow Lite, Paddle Lite, MNN, OpenVINO, etc.) according to the hardware platform, and automatically perform model format conversion.

Solution:
Provide a k8s operator to realize the following functions:

  1. Read the edge node hardware type label and select the model serving image. The edge node hardware type is labeled via kubectl as follows:
kubectl label node edge-node-1 hardware=Ascend
  2. Deploy the model conversion image for model conversion;
  3. Deploy the model serving image.

lifelong learning enhancements tracking issue

This issue is to track the known enhancements to be done for lifelong learning (added in #72):

  • The KB server needs its own standalone code directory instead of being placed in the lib directory
  • The KB API needs to be reviewed again (such as /file/download, /file/upload)
  • Reduce the KB image size (v0.3.0 is 1.28G as shown by docker images)
  • In the multi-task-learning lib code, add docs and code comments
  • Remove the scikit-learn requirement for the federated-learning surface-defect-detection example
  • Reduce the lifelong example image size (v0.3.0 is 1.36G)
  • Review the requirements of lib

the issue of downloading the raw images in dataset

In #18, I proposed adding an init-container to download datasets/models before running workers.

In the incremental-learning example, the user creates a dataset with the url s3://helmet_detection/train_data/train_data.txt, where train_data.txt only contains the label info, not the image blobs.

So where, and by whom, are these images downloaded?

Model Management module is needed.

Why a model management module is needed?

Currently, the model management capability of Sedna is fragmented and has the following problems:

1. Unclear module responsibilities.

It turns out that the code for model uploading and downloading is duplicated in different modules (LC and Lib in Sedna), and they are even written in different languages.
In this case, for example, if you want to extend the protocol for saving the model, you have to add code with the same function to these different modules, which is difficult to maintain.

2. Difficult to leverage basic capabilities.

For example, the model saving function may be used by both model compression and model conversion; it is not necessary to implement the model saving function separately in each.

3. Model basic functions interfaces are not designed in a unified manner.

For example, model compression and model deployment may be combined, as may model transformation and model deployment. If there is a basic interface to standardize the functions of the model, it can provide more flexible functions.

4. The style of interfaces exposed to users might be different.

Em.., this point is intuitive.

I will add some concrete examples to illustrate this later.

Current model management requirements

I summarize the requirements for model management of sedna's existing features.

Incremental Learning

Federated Learning

Joint Inference

Future model management requirements

Through a survey, I have learned that models can also have the following behaviors:

  • write
  • read
  • watch
  • version/history
  • evaluate
  • predict/serving
  • publish/deploy
  • conversion
  • compression
  • monitoring
  • searching

For example, multi-task lifelong learning is on the roadmap of Sedna; it requires capabilities such as multi-model serving, model metric recording, and model metadata search.

Summary

Therefore, I hope that we can design an edge-cloud synergy model management component with a unified architecture and interface style based on current or future features.
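To make this concrete, a unified model-management interface could look roughly like the sketch below (hypothetical names, not a settled design):

from abc import ABC, abstractmethod

class ModelStore(ABC):
    """Assumed unified interface covering a subset of the behaviors above."""

    @abstractmethod
    def write(self, name: str, version: str, path: str) -> None:
        """Save a model artifact under a name and version."""

    @abstractmethod
    def read(self, name: str, version: str = "latest") -> str:
        """Return a local path to the requested model version."""

    @abstractmethod
    def history(self, name: str) -> list:
        """List the known versions of a model."""

class InMemoryModelStore(ModelStore):
    """Toy implementation for illustration only."""

    def __init__(self):
        self._models = {}

    def write(self, name, version, path):
        self._models.setdefault(name, {})[version] = path

    def read(self, name, version="latest"):
        versions = self._models[name]
        key = sorted(versions)[-1] if version == "latest" else version
        return versions[key]

    def history(self, name):
        return sorted(self._models.get(name, {}))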

Model serving should support hot loading

In the joint inference or incremental learning scenario, sometimes we need to redeploy our model. For example, in incremental learning, after continuous training on new data, the model precision reaches the trigger condition and the model needs to be redeployed.
How to redeploy the model without service interruption needs to be considered.
Some serving frameworks support hot loading, for example PyTorch serving, but some frameworks do not, such as MindSpore.
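For frameworks without built-in hot loading, one common workaround is a small reload loop that watches the model file and swaps the in-memory model when it changes. A self-contained sketch with a stubbed loader (not Sedna code):

import os
import time

def load_model(path):
    """Stub: replace with the framework-specific model loading call."""
    return {"path": path, "loaded_at": time.time()}

def serve_with_hot_reload(model_path, handle_request, poll_seconds=5):
    """Serve requests and swap the in-memory model when the file changes."""
    model = load_model(model_path)
    last_mtime = os.path.getmtime(model_path)
    while True:
        handle_request(model)              # serve with the current model
        mtime = os.path.getmtime(model_path)
        if mtime != last_mtime:            # new model written by the trainer
            model = load_model(model_path) # reload without restarting the worker
            last_mtime = mtime
        time.sleep(poll_seconds)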

Add examples images GPU version

What would you like to be added/modified:
Add GPU versions of the example images.
Why is this needed:
Training and inference of deep learning models on GPU is much quicker than on CPU, so we need to add corresponding GPU versions for our examples.

Edge AI Benchmark review

Edge AI Benchmark Draft pdf is here

If you have comments about this proposal, you can directly reply under this issue.
If you want to join the benchmark writing, you can contact Dr. Zimu Zheng.

Dr. Zimu Zheng
Email: [email protected]
WeChat ID: moodzheng

[Incremental Learning] Add the support with different `nodeName` of train/eval workers

What would you like to be added/modified:
Currently, in the incremental learning feature, the nodeNames of the dataset and the train/eval workers must be the same.

When the dataset is in shared storage, support for different nodeNames could be added.

Why is this needed:

  1. In principle we can't require that the user train and evaluate the model on the same node.
  2. Training requires many more resources than the eval worker, so they may not be on the same node.
  3. Sometimes the user may need to do evaluation on the same or a similar node as the infer worker, e.g. both at the edge.

incremental-learning: data path problem of dataset label file

A label file contains a field specifying the image path, which can be a relative or an absolute path.
As far as I know, the main case is the relative path, such as:

imgs/black/34290.jpg,1
imgs/black/36347.jpg,1
imgs/white/30049.jpg,0

But in incremental learning, LC monitors the original dataset, splits it into a train set and a test set, and saves them in another path.
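One way LC could keep relative image paths valid after splitting is to resolve them against the original label file's directory when writing the new train/test files. A rough sketch (assumed file layout and helper name):

import os

def split_label_file(label_file, out_dir, test_ratio=0.2):
    """Split a label file and rewrite relative image paths as absolute ones."""
    base_dir = os.path.dirname(os.path.abspath(label_file))
    with open(label_file) as f:
        lines = [line.strip() for line in f if line.strip()]

    resolved = []
    for line in lines:
        img, label = line.rsplit(",", 1)
        if not os.path.isabs(img):
            img = os.path.join(base_dir, img)  # keep the path valid elsewhere
        resolved.append(f"{img},{label}")

    split = int(len(resolved) * (1 - test_ratio))
    os.makedirs(out_dir, exist_ok=True)
    with open(os.path.join(out_dir, "train.txt"), "w") as f:
        f.write("\n".join(resolved[:split]))
    with open(os.path.join(out_dir, "test.txt"), "w") as f:
        f.write("\n".join(resolved[split:]))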

There is something wrong when I deploy the jointinferenceservice example...

What happened?

I successfully deployed Sedna following the installation documentation, and I can see the gm and lc pods running, like this:

archlab@cloud-master:~/gopath/src$ kubectl get pod -n sedna -o wide
NAME                 READY   STATUS    RESTARTS   AGE   IP               NODE           NOMINATED NODE   READINESS GATES
gm-f58b846ff-mlltx   1/1     Running   0          27h   10.244.0.3       cloud-master   <none>           <none>
lc-g799f             1/1     Running   0          27h   192.168.30.207   cloudnode-1    <none>           <none>
lc-l7m5t             1/1     Running   0          27h   192.168.30.206   cloud-master   <none>           <none>
lc-q7jhf             1/1     Running   0          27h   192.168.60.36    edgenode2201   <none>           <none>

Then I tried the Helmet Detection Experiment. And I built images and models we needed, like this:

archlab@cloud-master:~/gopath/src$ kubectl get Model -A
NAMESPACE   NAME                                      AGE
default     helmet-detection-inference-big-model      9h
default     helmet-detection-inference-little-model   8h

archlab@cloud-master:~/gopath/src$ kubectl get jointinferenceservices.sedna.io
NAME                                 AGE
helmet-detection-inference-example   8h

Finally I mocked the video stream for inference on the edge side:

archlab001@edgenode2201:~/joint_inference_example/data/video$ sudo ffmpeg -re -i video.mp4 -vcodec libx264 -strict -2 -f rtsp rtsp://localhost/video 
sudo: unable to resolve host: edgenode2201
ffmpeg version 2.8.17-0ubuntu0.1 Copyright (c) 2000-2020 the FFmpeg developers
  built with gcc 5.4.0 (Ubuntu 5.4.0-6ubuntu1~16.04.12) 20160609
  configuration: --prefix=/usr --extra-version=0ubuntu0.1 --build-suffix=-ffmpeg --toolchain=hardened --libdir=/usr/lib/x86_64-linux-gnu --incdir=/usr/include/x86_64-linux-gnu --cc=cc --cxx=g++ --enable-gpl --enable-shared --disable-stripping --disable-decoder=libopenjpeg --disable-decoder=libschroedinger --enable-avresample --enable-avisynth --enable-gnutls --enable-ladspa --enable-libass --enable-libbluray --enable-libbs2b --enable-libcaca --enable-libcdio --enable-libflite --enable-libfontconfig --enable-libfreetype --enable-libfribidi --enable-libgme --enable-libgsm --enable-libmodplug --enable-libmp3lame --enable-libopenjpeg --enable-libopus --enable-libpulse --enable-librtmp --enable-libschroedinger --enable-libshine --enable-libsnappy --enable-libsoxr --enable-libspeex --enable-libssh --enable-libtheora --enable-libtwolame --enable-libvorbis --enable-libvpx --enable-libwavpack --enable-libwebp --enable-libx265 --enable-libxvid --enable-libzvbi --enable-openal --enable-opengl --enable-x11grab --enable-libdc1394 --enable-libiec61883 --enable-libzmq --enable-frei0r --enable-libx264 --enable-libopencv
  libavutil      54. 31.100 / 54. 31.100
  libavcodec     56. 60.100 / 56. 60.100
  libavformat    56. 40.101 / 56. 40.101
  libavdevice    56.  4.100 / 56.  4.100
  libavfilter     5. 40.101 /  5. 40.101
  libavresample   2.  1.  0 /  2.  1.  0
  libswscale      3.  1.101 /  3.  1.101
  libswresample   1.  2.101 /  1.  2.101
  libpostproc    53.  3.100 / 53.  3.100
Input #0, mov,mp4,m4a,3gp,3g2,mj2, from 'video.mp4':
  Metadata:
    major_brand     : mp42
    minor_version   : 0
    compatible_brands: isommp42
    creation_time   : 2019-03-31 14:46:46
  Duration: 00:15:06.00, start: 0.000000, bitrate: 1430 kb/s
    Stream #0:0(und): Video: h264 (Main) (avc1 / 0x31637661), yuv420p(tv, bt709), 1280x720 [SAR 1:1 DAR 16:9], 1298 kb/s, 23.98 fps, 23.98 tbr, 24k tbn, 47.95 tbc (default)
    Metadata:
      creation_time   : 2019-03-31 14:46:46
      handler_name    : ISO Media file produced by Google Inc. Created on: 03/31/2019.
    Stream #0:1(eng): Audio: aac (LC) (mp4a / 0x6134706D), 44100 Hz, stereo, fltp, 128 kb/s (default)
    Metadata:
      creation_time   : 2019-03-31 14:46:46
      handler_name    : ISO Media file produced by Google Inc. Created on: 03/31/2019.
[libx264 @ 0x750740] using SAR=1/1
[libx264 @ 0x750740] using cpu capabilities: MMX2 SSE2Fast SSSE3 SSE4.2
[libx264 @ 0x750740] profile High, level 3.1
[libx264 @ 0x750740] 264 - core 148 r2643 5c65704 - H.264/MPEG-4 AVC codec - Copyleft 2003-2015 - http://www.videolan.org/x264.html - options: cabac=1 ref=3 deblock=1:0:0 analyse=0x3:0x113 me=hex subme=7 psy=1 psy_rd=1.00:0.00 mixed_ref=1 me_range=16 chroma_me=1 trellis=1 8x8dct=1 cqm=0 deadzone=21,11 fast_pskip=1 chroma_qp_offset=-2 threads=6 lookahead_threads=1 sliced_threads=0 nr=0 decimate=1 interlaced=0 bluray_compat=0 constrained_intra=0 bframes=3 b_pyramid=2 b_adapt=1 b_bias=0 direct=1 weightb=1 open_gop=0 weightp=2 keyint=250 keyint_min=23 scenecut=40 intra_refresh=0 rc_lookahead=40 rc=crf mbtree=1 crf=23.0 qcomp=0.60 qpmin=0 qpmax=69 qpstep=4 ip_ratio=1.40 aq=1:1.00
Output #0, rtsp, to 'rtsp://localhost/video':
  Metadata:
    major_brand     : mp42
    minor_version   : 0
    compatible_brands: isommp42
    encoder         : Lavf56.40.101
    Stream #0:0(und): Video: h264 (libx264), yuv420p, 1280x720 [SAR 1:1 DAR 16:9], q=-1--1, 23.98 fps, 90k tbn, 23.98 tbc (default)
    Metadata:
      creation_time   : 2019-03-31 14:46:46
      handler_name    : ISO Media file produced by Google Inc. Created on: 03/31/2019.
      encoder         : Lavc56.60.100 libx264
    Stream #0:1(eng): Audio: aac, 44100 Hz, stereo, fltp, 128 kb/s (default)
    Metadata:
      creation_time   : 2019-03-31 14:46:46
      handler_name    : ISO Media file produced by Google Inc. Created on: 03/31/2019.
      encoder         : Lavc56.60.100 aac
Stream mapping:
  Stream #0:0 -> #0:0 (h264 (native) -> h264 (libx264))
  Stream #0:1 -> #0:1 (aac (native) -> aac (native))
Press [q] to stop, [?] for help
frame=21722 fps= 24 q=-1.0 Lsize=N/A time=00:15:05.99 bitrate=N/A    
video:313537kB audio:19168kB subtitle:0kB other streams:0kB global headers:0kB muxing overhead: unknown
[libx264 @ 0x750740] frame I:172   Avg QP:21.53  size: 54827
[libx264 @ 0x750740] frame P:11817 Avg QP:24.38  size: 22320
[libx264 @ 0x750740] frame B:9733  Avg QP:27.95  size:  4919
[libx264 @ 0x750740] consecutive B-frames: 21.3% 49.7% 21.3%  7.6%
[libx264 @ 0x750740] mb I  I16..4: 12.7% 70.1% 17.2%
[libx264 @ 0x750740] mb P  I16..4:  4.6% 12.8%  2.5%  P16..4: 44.0% 15.3%  6.2%  0.0%  0.0%    skip:14.7%
[libx264 @ 0x750740] mb B  I16..4:  0.3%  0.5%  0.2%  B16..8: 39.8%  4.2%  0.8%  direct: 2.7%  skip:51.5%  L0:45.2% L1:47.5% BI: 7.3%
[libx264 @ 0x750740] 8x8 transform intra:64.3% inter:63.0%
[libx264 @ 0x750740] coded y,uvDC,uvAC intra: 53.6% 36.5% 3.6% inter: 21.1% 9.3% 0.1%
[libx264 @ 0x750740] i16 v,h,dc,p: 23% 30% 11% 37%
[libx264 @ 0x750740] i8 v,h,dc,ddl,ddr,vr,hd,vl,hu: 23% 23% 19%  4%  5%  5%  7%  5%  8%
[libx264 @ 0x750740] i4 v,h,dc,ddl,ddr,vr,hd,vl,hu: 24% 22% 15%  5%  8%  7%  9%  5%  6%
[libx264 @ 0x750740] i8c dc,h,v,p: 65% 17% 13%  4%
[libx264 @ 0x750740] Weighted P-Frames: Y:21.0% UV:4.7%
[libx264 @ 0x750740] ref P L0: 59.7% 18.9% 15.5%  5.3%  0.6%
[libx264 @ 0x750740] ref B L0: 85.8% 13.6%  0.5%
[libx264 @ 0x750740] ref B L1: 95.8%  4.2%
[libx264 @ 0x750740] kb/s:2835.01

After all this, I can't see any container or pod running. Why? And there are no results in my output dir.

My Environment Settings

  • create_big_model_resources.yaml
apiVersion: sedna.io/v1alpha1
kind:  Model
metadata:
  name: helmet-detection-inference-big-model
  namespace: default
spec:
  url: "/home/archlab/gopath/src/joint_inference_example/data/big-model/yolov3_darknet.pb"
  format: "pb"
  • create_little_model_resources.yaml
apiVersion: sedna.io/v1alpha1
kind: Model
metadata:
  name: helmet-detection-inference-little-model
  namespace: default
spec:
  url: "/home/archlab/gopath/src/joint_inference_example/data/little-model/yolov3_resnet18.pb"
  format: "pb"
  • create_joint_inference_service.yaml
apiVersion: sedna.io/v1alpha1
kind: JointInferenceService
metadata:
  name: helmet-detection-inference-example
  namespace: default
spec:
  edgeWorker:
    model:
      name: "helmet-detection-inference-little-model"
    hardExampleMining:
      name: "IBT"
      parameters:
        - key: "threshold_img"
          value: "0.9"
        - key: "threshold_box"
          value: "0.9"
    template:
      spec:
        nodeName: edgenode2201
        containers:
        - image: kubeedge/sedna-example-joint-inference-helmet-detection-little:v0.1.0
          imagePullPolicy: IfNotPresent
          name:  little-model
          env:  # user defined environments
          - name: input_shape
            value: "416,736"
          - name: "video_url"
            value: "rtsp://localhost/video"
          - name: "all_examples_inference_output"
            value: "/data/output"
          - name: "hard_example_cloud_inference_output"
            value: "/data/hard_example_cloud_inference_output"
          - name: "hard_example_edge_inference_output"
            value: "/data/hard_example_edge_inference_output"
          resources:  # user defined resources
            requests:
              memory: 64M
              cpu: 100m
            limits:
              memory: 2Gi
          volumeMounts:
            - name: outputdir
              mountPath: /data/
        volumes:   # user defined volumes
          - name: outputdir
            hostPath:
              # user must create the directory in host
              path: /home/archlab001/joint_inference_example/joint_inference/output
              type: Directory

  cloudWorker:
    model:
      name: "helmet-detection-inference-big-model"
    template:
      spec:
        nodeName: cloud-master
        containers:
          - image: kubeedge/sedna-example-joint-inference-helmet-detection-big:v0.1.0
            name:  big-model
            imagePullPolicy: IfNotPresent
            env:  # user defined environments
              - name: "input_shape"
                value: "544,544"
            resources:  # user defined resources
              requests:
                memory: 2Gi

And my dir is:

# In Cloud Side
archlab@cloud-master:~/gopath/src/joint_inference_example$ ls
create_big_model_resources.yaml      create_little_model_resources.yaml  joint_inference
create_joint_inference_service.yaml  data
archlab@cloud-master:~/gopath/src/joint_inference_example$ pwd
/home/archlab/gopath/src/joint_inference_example
# In Edge Side
archlab@cloud-master:~/gopath/src/joint_inference_example/joint_inference/output$ pwd
/home/archlab/gopath/src/joint_inference_example/joint_inference/output

Add more e2e testcases

Following #14, with only one testcase (i.e. create dataset) added, we need to add more e2e testcases:

  • dataset: get/delete
  • model: create/get/delete
  • joint-inference: create/get/delete
  • federated-learning: create/get/delete
  • incremental-learning: create/get/delete

[Enhancement Request] Integrate Plato into Sedna as a backend for supporting federated learning

What would you like to be added/modified:

Plato is a new software framework to facilitate scalable federated learning. So far, Plato already supports PyTorch and MindSpore. Several advantages are summarized as follows:

  1. Simplicity: Plato provides user-friendly APIs.
  2. Scalability: Plato is scalable. Plato also supports running multiple (unlimited) workers, which share one GPU in turn.
  3. Extensibility: Plato manages various machine learning models and aggregation algorithms.
  4. Framework-agnostic: Most of the codebases in Plato can be used with various machine learning libraries.
  5. Hierarchical Design: Plato supports multiple-level cells, including edge-cloud (2 levels) federated learning and device-edge-cloud (3 levels) federated learning.

This proposal discusses how to integrate Plato into Sedna as a backend for supporting federated learning. @li-ch @baochunli @jaypume

Why is this needed:
The motivation of this proposal can be summarized as follows:

  1. Algorithm:
    Sedna (Aggregator) currently supports FedAvg. With Plato, Sedna can choose various aggregation algorithms, such as FedAvg, Adaptive Freezing, Mistnet, and Adaptive sync.
  2. Dataset:
    Sedna currently requires the user data to be prepared manually. With Plato, a "datasources" module can be provided, including various public datasets (e.g., cifar10, cinic10, and coco). Non-IID samplers could also be supported.
  3. Model:
    Sedna specifies the model in the images as a file. It uploads the whole model to the server. With Plato, it can specify all models as user configurations. The Report class can help the worker to determine the strategy of uploading gradients for fast convergence, such as Adaptive Freezing, Nova, Sarah, Mistnet, and so on.

Plans:

  1. Overview
    Sedna aims to provide the following federated learning features:

    • Write easy and short configuration files in Sedna to support flexible federated learning setups.
    • It should handle real datasets in the industry and simulate a non-iid version of public standard dataset in academia.
    • It should consider how to configure a customized model.

    Therefore, two resources are updated:

    • Dataset: The definition of Dataset
    • Model: The definition of model

    Configuration updates in aggregationWorker and trainingWorkers:

    apiVersion: sedna.io/v1alpha1
    kind: FederatedLearningJob
    metadata:
      name: surface-defect-detection
    spec:
      aggregationWorker:
        # read and write
        model:
          name: "surface-defect-detection-model"
        platoConfig: 
          url: "sdd_rcnn.yml" # stored in S3 or github
        template:
          spec:
            nodeName: $CLOUD_NODE
            containers:
              - image: kubeedge/sedna-example-federated-learning-surface-defect-detection-aggregation:v0.1.0
                name:  agg-worker
                imagePullPolicy: IfNotPresent
                # env: # user defined environments
                resources:  # user defined resources
                  limits:
                    memory: 2Gi
        - dataset:
            name: "cloud-surface-defect-detection-dataset"
    
      trainingWorkers:
        # read only
        model:
          name: "surface-defect-detection-model"
        - dataset:
            name: "edgeX-surface-defect-detection-dataset"
          template:
            spec:
              nodeName: $EDGE1_NODE
              containers:
                - image: kubeedge/sedna-example-federated-learning-surface-defect-detection-train:v0.1.0
                  name:  train-worker
                  imagePullPolicy: IfNotPresent
                  env:  # user defined environments or given by the GlobalManager. 
                    - name: "server_ip"
                      value: "localhost"
                    - name: "server_port"
                      value: "8000"
                  resources:  # user defined resources
                    limits:
                      memory: 2Gi
  2. How to write Plato code in Sedna
    The users only need to prepare the configuration file in public storage. The Plato code is settled in the Sedna libraries:
    An example of configuration file sdd_rcnn.yml:

    clients:
     # Type
     type: mistnet
     # The total number of clients
     total_clients: 1
     # The number of clients selected in each round
     per_round: 1
     # Should the clients compute test accuracy locally?
     do_test: false
    
    # this will be discarded in the future
    # server:
    #  address: localhost
    #  port: 8000
    
    data:
     datasource: sednaDataResource
     # Number of samples in each partition
     partition_size: 128
     # IID or non-IID?
     sampler: iid
    
    trainer:
     # The type of the trainer
     type: yolov5
     # The maximum number of training rounds
     rounds: 1
     # Whether the training should use multiple GPUs if available
     parallelized: false
     # The maximum number of clients running concurrently
     max_concurrency: 3
     # The target accuracy
     target_accuracy: 0.99
     # Number of epoches for local training in each communication round
     epochs: 500
     batch_size: 16
     optimizer: SGD
     linear_lr: false
     # The machine learning model
     model_name: sednaModelResource
    
    algorithm:
     # Aggregation algorithm
     type: mistnet
     cut_layer: 4
     epsilon: 100
  3. How to integrate the Dataset in Plato
    In this part, several functions are added to Dataset.

    apiVersion: sedna.io/v1alpha1
    kind: Dataset
    metadata:
      name: "edge1-surface-defect-detection-dataset"
    spec:
      name: COCO
      data_params: packages/coco128.yaml
      # if download_url is None, the data should be stored in disk by default
      download_url: https://github.com/ultralytics/yolov5/releases/download/v1.0/coco128.zip 
      data_path: ./data/
        train_path: ./data/COCO/coco128/images/train2017/
        test_path: ./data/COCO/coco128/images/train2017/
      # number of classes
      num_classes: 80
      # image size
      image_size: 640
      # class names
      classes:
          [
              "person",
              "bicycle",
              ...
          ]
      # remark
      format: ""
      nodeName: $EDGE1_NODE
  4. How to integrate the Models management tools in Plato
    In this part, several functions are added to Model.

    apiVersion: sedna.io/v1alpha1
    kind: Model
    metadata:
      name: "surface-defect-detection-model"
    spec:
      model_name: vgg_16
      url: "/model"
      # ONNX (https://onnx.ai/) or specify a framework 
      format: "ckpt"
      framework: "PyTorch"
      model_config: packages/models/vgg_16.yaml
      # if true, the model needs to be loaded from url before training
      pretrained: True

To-Do Lists

  • Enhance aggregationWorker and trainingWorkers interfaces
  • Datasets interface
  • Models management
  • Examples and demo presentation
    • CV: yolo-v5 demo in KubeEdge
    • NLP: huggingface demo in KubeEdge

Model serving should support multi-model

In model inference, multiple models may be composed for inference. For example, model A receives the input, and its output is fed to model B for inference to obtain the final result. For more scenarios, see this link.
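A minimal illustration of composed (cascaded) serving, with both models stubbed (not a Sedna API):

def model_a(inputs):
    """Stub: first-stage model, e.g. a detector producing region proposals."""
    return [{"region": r} for r in inputs]

def model_b(proposal):
    """Stub: second-stage model consuming model A's output."""
    return {"region": proposal["region"], "label": "ok"}

def composed_inference(inputs):
    # The output of model A is fed into model B to produce the final result.
    return [model_b(p) for p in model_a(inputs)]

print(composed_inference(["frame-1", "frame-2"]))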

Implementation for injecting storage-initializer

In #18, I proposed adding an init-container to download datasets/models before running workers.
Then we need to inject the storage-initializer into workers.

the simple way

The obvious way to implement it is to modify the worker-creating logic of each collaboration feature in GM.
I can abstract the common logic into one func/file.
Its pros: simple and quick.
Its cons: need to modify the GM.

the more decoupled way

Another good way I found is to leverage the k8s admission webhooks used by KFServing.
Its pros: decoupled from each collaboration feature.
Its cons: adds an extra webhook server; more code work.

What I decide to do now

For simplicity, first implement the simple way, then evolve to the admission hook way when needed, since the injecting code can be reused.

repository setup tracking issue

This issue is to track the things that need to be done to completely set up our repository.

  • add fossa checker for license checking
  • add prow bot support for auto-merging PRs etc.
  • add e2e testcases (currently empty)

Feature worker spec support configmap

What would you like to be added/modified:
Feature worker spec configuration information can be injected through a ConfigMap or environment variables. (Currently, it is environment variables.)

Why is this needed:
Developers build containers on top of the Worker. The ConfigMap is equivalent to the configuration file of a native application, and there can be one or more configuration files.
After the container is started, it obtains the ConfigMap content from the host machine, generates a local file, and maps the file into a specified directory in the container as a volume.
Applications in the container then read the configuration files from that directory in the original manner.
For a container, the configuration file is packaged in a specific directory inside the container, so the entire process does not intrude on applications.

Add pod template like support for worker spec

What would you like to be added:

Add pod template like support for worker spec

Why is this needed:

Current state of the worker spec definition:

 type WorkerSpec struct {
    ScriptDir        string     `json:"scriptDir"`
    ScriptBootFile   string     `json:"scriptBootFile"`
    FrameworkType    string     `json:"frameworkType"`
    FrameworkVersion string     `json:"frameworkVersion"`
    Parameters       []ParaSpec `json:"parameters"`
 }

 // ParaSpec is a description of a parameter
 type ParaSpec struct {
    Key   string `json:"key"`
    Value string `json:"value"`
 }
  1. ScriptDir/ScriptBootFile is the entrypoint of worker, localpath or shared storage(e.g. s3).
  2. FrameworkType/FrameworkVersion specifies the base container image of worker.
  3. Parameters specifies the environment of worker.

pros:

  1. simple for demos

cons:

  1. need to copy the code script to all known nodes manually.
  2. doesn't support docker-container capabilities: code version management, distribution, etc.
  3. doesn't support k8s-pod-like features: resource limits, user-defined volumes, etc.
  4. needs shared storage (e.g. s3) for code if not a localpath.
  5. need to build a base image if the current base image can't satisfy the user
    requirements (user-defined code package dependencies, or a new framework),
    and then re-edit the configuration of GM and restart it.

Contributor Experience Improvement Tracking Issue

One of the top-priority pieces of work for the next few months is to improve the contributor experience and help more new contributors get on board.

This issue is to track the things that need to be done.

General

  • [Docs] Separate user manual and contributor guide docs.
  • [Docs] Code of Conduct

Releases

  • [Docs] Release lifecycle
  • [Tooling] Release automation: automate the process of making a new release.
  • [Docs] Usage of Milestone and Projects view.

Feature guide

  • [Docs] Feature lifecycle
  • [Docs] Proposing Enhancements (features).
  • [Docs] How to write Enhancement (KEP) guide.

Developer self-service

  • [Docs] local env setup (for development), with local-up script already done
  • [Docs] PR flow.
  • [Docs] Debugging guide.
  • [Docs] Reporting Bugs
  • [Tooling] Speed up CI checking.

Add shared storage support for dataset/model

What would you like to be added:

Add shared storage support for dataset/model, such as s3/http protocols.

Why is this needed:

Currently only dataset/model URIs with a host localpath are supported, thus limiting cross-node model training/serving.
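A sketch of what a shared-storage download helper might look like for http(s) URIs, using only the standard library (s3 would additionally need a client such as boto3; the function name is hypothetical):

import os
import urllib.request

def download_if_remote(uri, dest_dir="/tmp/sedna"):
    """Return a local path for the dataset/model, downloading http(s) URIs."""
    if uri.startswith(("http://", "https://")):
        os.makedirs(dest_dir, exist_ok=True)
        local_path = os.path.join(dest_dir, os.path.basename(uri))
        urllib.request.urlretrieve(uri, local_path)
        return local_path
    # Anything else is treated as a node-local path, the current behavior.
    return uri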

Joint inference cannot be connected to the cloud for analysis

What happened:
When I use the joint_inference/helmet_detection_inference example, there are some errors in the log of helmet-detection-inference-example-edge-5pg66:
[2021-04-16 02:32:48,383][sedna.joint_inference.joint_inference][ERROR][124]: send request to http://192.168.2.211:30691 failed, error is HTTPConnectionPool(host='192.168.2.211', port=30691): Max retries exceeded with url: / (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7efe5409dcf8>: Failed to establish a new connection: [Errno 111] Connection refused',)), retry times: 5 [2021-04-16 02:32:48,384][sedna.joint_inference.joint_inference][WARNING][365]: retrieve cloud infer service failed, use edge result

What you expected to happen:
The above error does not exist

How to reproduce it (as minimally and precisely as possible):
Just follow the example: https://github.com/kubeedge/sedna/blob/main/examples/joint_inference/helmet_detection_inference/README.md

Anything else we need to know?:

Environment:

Sedna Version
$ kubectl get -n sedna deploy gm -o jsonpath='{.spec.template.spec.containers[0].image}'
# paste output here
kubeedge/sedna-gm:v0.1.0
$ kubectl get -n sedna ds lc -o jsonpath='{.spec.template.spec.containers[0].image}'
# paste output here

kubeedge/sedna-lc:v0.1.0

Kubernetes Version
$ kubectl version
# paste output here

Client Version: version.Info{Major:"1", Minor:"18", GitVersion:"v1.18.3", GitCommit:"2e7996e3e2712684bc73f0dec0200d64eec7fe40", GitTreeState:"clean", BuildDate:"2020-05-20T12:52:00Z", GoVersion:"go1.13.9", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"18", GitVersion:"v1.18.3", GitCommit:"2e7996e3e2712684bc73f0dec0200d64eec7fe40", GitTreeState:"clean", BuildDate:"2020-05-20T12:43:34Z", GoVersion:"go1.13.9", Compiler:"gc", Platform:"linux/amd64"}

KubeEdge Version
$ cloudcore --version
# paste output here
1.6
$ edgecore --version
# paste output here

1.6

CloudSide Environment:

Hardware configuration
$ lscpu
# paste output here
OS
$ cat /etc/os-release
# paste output here
Kernel
$ uname -a
# paste output here
Others

EdgeSide Environment:

Hardware configuration
$ lscpu
# paste output here
OS
$ cat /etc/os-release
# paste output here
Kernel
$ uname -a
# paste output here
Others

feature enhancements tracking issue

This issue is to track these known enhancements to be done:

  • add resource limits for worker
  • add gpu support for worker
  • add pod template like support for worker spec
  • add model management with central storage support
  • add dataset management with central storage support
  • add example code style checker
  • add descriptions for CRD fields
  • abstract the worker controller into one; currently each feature controller has its own similar worker implementation
  • move the feature CR logic embedded in upstream/downstream to respective feature controller
  • replace self-built websocket between gm and lc with KubeEdge message communication
  • improve the state translation implementation of incremental learning
  • make the python lib interface clearer
  • model serving should support hot loading & multiple models
  • the basic TensorFlow images in the examples needs to be unified to one version
  • the networking differences need to be considered when the LC is deployed on the cloud

gm and kb pods are always in pending status

What happened:
Hello, I tried to install Sedna and encountered some problems. I followed the instructions of the Sedna installation document, but the gm and kb pods stay in Pending status forever.

What you expected to happen:
gm and kb pods are ready and in running status.

How to reproduce it (as minimally and precisely as possible):
Follow the instructions in Sedna installation document.

Anything else we need to know?:
I have solved it through adding tolerations to gm and kb deployment in install.sh. It looks like this:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: kb
...
spec:
...
  template:
  ...
    spec:
    ...
      containers:
      ...
      tolerations:
      - key: node-role.kubernetes.io/master
        effect: NoSchedule
...

Modify gm deployment in the same way and do install.sh, then it works.

It confuses me that the kb and gm deployments use the nodeSelector sedna: control-plane, but they can't tolerate the taint node-role.kubernetes.io/master:NoSchedule on the master (cloud) node, which is the node labeled sedna: control-plane.

Is there any reason?

decouple collaboration feature code and public code

What would you like to be added/modified:

  1. Partition the folders under "sedna/pkg/globalmanager/" by collaboration feature and place the corresponding code there.
  2. Partition the folders under "sedna/pkg/localcontroller/" by collaboration feature and place the corresponding code there.
  3. Decouple the upstream file by collaboration feature and put the corresponding code into the collaboration feature folder.
  4. Decouple the downstream file by collaboration feature and put the corresponding code into the collaboration feature folder.
  5. Any other modifications after decoupling.

Why is this needed:
Currently, multiple collaboration features are available, and more may be added in the future. Decoupling facilitates the growth of the developer ecosystem; for example, after decoupling, developers can easily integrate more features.

Automatically pushing docker-images when creating a release

What would you like to be added/modified:
Add a github action for automatically pushing docker-images when creating a release.
Why is this needed:

  1. automatically push instead of manually
  2. the image should be consistent with the same code

Incremental learning supports hot model updates.

What would you like to be added/modified:
Currently, models are updated by restarting the infer worker. Incremental learning should support hot model updates rather than updates only through restarts.

Why is this needed:
The inference engine supports model reloading, so it is hoped that dynamic updates can be supported.

Add GlobalManager high availability support

What would you like to be added/modified:
Add GlobalManager multi-instance support for high availability and high throughput.

Why is this needed:
Currently GlobalManager is deployed as a k8s deployment with replicas=1, and only one instance is supported.

For GlobalManager's high availability and high throughput, we need to deploy GlobalManager with replicas >=2.

Add arm support

Add Sedna's support for below platforms:

  1. Raspberry Pi
  2. Arm64 Server

Add operator to support dynamic neural networks

Dynamic neural network is an emerging technology in deep learning. Compared to static models which have fixed computational graphs and parameters at the inference stage, dynamic networks can adapt their structures or parameters to different inputs, leading to notable advantages in terms of accuracy, computational efficiency, adaptiveness, etc.

In general, dynamic neural network has the following advantages:

  1. Adaptiveness. Dynamic models are able to achieve a desired trade-off between accuracy and efficiency for dealing with varying computational budgets on the fly. Therefore, they are more adaptable to different hardware platforms and changing environments, compared to static models with a fixed computational cost.

  2. Representation power. Due to the data-dependent network architecture/parameters, dynamic networks have significantly enlarged parameter space and improved representation power.

  3. Compatibility. Dynamic networks are compatible with most advanced techniques in deep learning, such as model compression and neural architecture search (NAS).

Demo:

  1. Model early-exiting with BranchyNet. A real industrial case is here.
  2. Model layer skipping with ResNet.
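To illustrate the early-exit idea behind BranchyNet-style models, the inference loop checks the confidence of an intermediate branch and skips the remaining layers when it is high enough. A stubbed sketch, not taken from the demo code:

def early_branch(x):
    """Stub for an intermediate exit head returning (prediction, confidence)."""
    return "helmet", 0.95

def remaining_layers(x):
    """Stub for the rest of the network, only run for hard inputs."""
    return "helmet"

def early_exit_infer(x, threshold=0.9):
    pred, conf = early_branch(x)
    if conf >= threshold:
        return pred             # cheap path: exit early, saving computation
    return remaining_layers(x)  # expensive path for uncertain inputs

print(early_exit_infer(object()))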

tracking issues of pod template support #19

  • refactor workerSpec of API/implementation into pod template
  • change CRD of all features, and we can avoid this by kube-builder #11
  • update the installation doc
  • update the local-up script
  • update the proposal
  • update all three examples
  • add all example dockerfiles

Wrong base model in surface defect detection example with the federated learning job

What happened:
I have successfully set up and run the surface defect detection example with the federated learning job. I then used the generated model to run inference on other images in the Magnetic tile defect dataset. However, I noticed that the results are not ideal: the model always predicts "No defect" for all the images I fed into it.

How to reproduce it (as minimally and precisely as possible):

fl_instance = FederatedLearning(estimator=Estimator)
fl_instance.load(model_url)

print(fl_instance.predict(test_data))

Anything else we need to know?:

Environment:

Sedna Version
v0.3.0
