openebs-archive / longhorn

This project is forked from longhorn/longhorn-engine

We put storage on cows and move them around

License: Apache License 2.0

Makefile 0.11% Go 75.22% Python 22.75% Shell 1.81% Dockerfile 0.10%

longhorn's People

Contributors

angie1015, ibuildthecloud, imikushin, jaciechao, kmjayadeep, kmova, kp6, niusmallnan, payes, sheng-liang, yasker

longhorn's Issues

Issue while syncing: Failed to find meta file error

Detailed logs for this issue can be seen here:
https://api.travis-ci.org/v3/job/409381115/log.txt

[0014] Done running ssync[ssync -port 9704 -daemon volume-snap-5f24a71d-7d81-40f5-b5dd-2d3a01716433.img.meta -timeout 7]
[2018-07-28T21:19:52Z] Done synchronizing volume-snap-5f24a71d-7d81-40f5-b5dd-2d3a01716433.img.meta to volume-snap-5f24a71d-7d81-40f5-b5dd-2d3a01716433.img.meta@172.18.0.5:9704
[2018-07-28T21:19:52Z] reloadAndVerify tcp://172.18.0.5:9502        
[2018-07-28T21:19:52Z] Reload Replica                               
[2018-07-28T21:19:52Z] Reloading volume                             
[2018-07-28T21:19:52Z] error in Reload                              
[2018-07-28T21:19:52Z] Error Failed to find metadata for volume-snap-00103065-6f23-4da5-9d4b-d1168d5e9559.img in doOp: /v1/replicas/1?action=reload
[2018-07-28T21:19:52Z] error in reloadReplica Failed to find metadata for volume-snap-00103065-6f23-4da5-9d4b-d1168d5e9559.img
[2018-07-28T21:19:52Z] Error in request: Failed to find metadata for volume-snap-00103065-6f23-4da5-9d4b-d1168d5e9559.img
172.18.0.5 - -[28/Jul/2018:21:19:52 +0000] "POST /v1/replicas/1?action=reload HTTP/1.1" 500 226
[2018-07-28T21:19:52Z] Error in reloadreplica tcp://172.18.0.5:9502
[2018-07-28T21:19:52Z] Error adding replica, err: Bad response: 500 500 Internal Server Error: {"actions":{},"code":"Server Error","detail":"","links":{"self":"http://172.18.0.5:9502/v1/replicas/1"},"message":"Failed to find metadata for volume-snap-00103065-6f23-4da5-9d4b-d1168d5e9559.img","status":500,"type":"error"}
, will retry
[2018-07-28T21:19:54Z] Closing replica                              
[2018-07-28T21:19:54Z] Addreplica tcp://172.18.0.5:9502             
[2018-07-28T21:19:54Z] Get Volume info from controller
[2018-07-28T21:19:54Z] CheckAndResetFailedRebuild tcp://172.18.0.5:9502
[2018-07-28T21:19:54Z] Opening volume /vol3, size 2147483648/512    
[2018-07-28T21:19:54Z] Error Failed to find metadata for volume-snap-00103065-6f23-4da5-9d4b-d1168d5e9559.img during open
[2018-07-28T21:19:54Z] Error during open in checkAndResetFailedRebuild
[2018-07-28T21:19:54Z] CheckAndResetFailedRebuild failed, err:Failed to find metadata for volume-snap-00103065-6f23-4da5-9d4b-d1168d5e9559.img
[2018-07-28T21:19:54Z] Error adding replica, err: Failed to find metadata for volume-snap-00103065-6f23-4da5-9d4b-d1168d5e9559.img, will retry
[2018-07-28T21:19:56Z] Closing replica    
[2018-07-28T21:19:56Z] Close replica failed, s.r not set
[2018-07-28T21:19:56Z] Addreplica tcp://172.18.0.5:9502     
[2018-07-28T21:19:56Z] Get Volume info from controller              
[2018-07-28T21:19:56Z] CheckAndResetFailedRebuild tcp://172.18.0.5:9502
[2018-07-28T21:19:56Z] Opening volume /vol3, size 2147483648/512    
[2018-07-28T21:19:56Z] Error Failed to find metadata for volume-snap-00103065-6f23-4da5-9d4b-d1168d5e9559.img during open
[2018-07-28T21:19:56Z] Error during open in checkAndResetFailedRebuild
[2018-07-28T21:19:56Z] CheckAndResetFailedRebuild failed, err:Failed to find metadata for volume-snap-00103065-6f23-4da5-9d4b-d1168d5e9559.img
[2018-07-28T21:19:56Z] Error adding replica, err: Failed to find metadata for volume-snap-00103065-6f23-4da5-9d4b-d1168d5e9559.img, will retry
[2018-07-28T21:19:58Z] Closing replica

Revert snapshot fails intermittently due to "Snapshot revert failed: Server status error: Internal Server Error" from Jiva replicas

Revert snapshot fails intermittently due to "Bad status: 500 500 Internal Server Error" from Jiva replicas.

This issue was observed during e2e run.

After successful creation of a snapshot, the attempt to revert to it errored out with "Snapshot revert failed: Server status error: Internal Server Error".

E2E ansible logs:

TASK [Confirm successful snapshot creation] ************************************
task path: /var/lib/jenkins/[*******]/e2e/ansible/playbooks/feature/snapshots/simple-volume/snapshot.yml:163
changed: [localhost -> None] => {"changed": true, "cmd": "source ~/.profile; kubectl exec maya-apiserver-6c5764ddf5-hjwbn -n [*******] -c maya-apiserver -- mayactl snapshot list --volname simple-volume-vut -n simple-volume", "delta": "0:00:00.676632", "end": "2018-08-16 17:57:50.484217", "failed_when_result": false, "rc": 0, "start": "2018-08-16 17:57:49.807585", "stderr": "", "stderr_lines": [], "stdout": "\nSnapshot Details:\n------------------\nNAME                                     CREATED AT                       SIZE(in MB)      PARENT                                   CHILDREN\n-----                                    -----------                      ------------     -------                                  --------- \nbb4e8281-5a6f-4e08-a428-063bb6ac60fe     Thu Aug 16 12:27:40 UTC 2018     0.0000           NA                                       18b70eb4-ec31-49b5-9f61-56af94d52525\n                                                                                                                                     \n18b70eb4-ec31-49b5-9f61-56af94d52525     Thu Aug 16 12:27:44 UTC 2018     31.3401          bb4e8281-5a6f-4e08-a428-063bb6ac60fe     quicksnap\n                                                                                                                                     \nquicksnap                                Thu Aug 16 12:27:49 UTC 2018     98.6493          18b70eb4-ec31-49b5-9f61-56af94d52525     head-003\n                                                                                                                                     ", "stdout_lines": ["", "Snapshot Details:", "------------------", "NAME                                     CREATED AT                       SIZE(in MB)      PARENT                                   CHILDREN", "-----                                    -----------                      ------------     -------                                  --------- ", "bb4e8281-5a6f-4e08-a428-063bb6ac60fe     Thu Aug 16 12:27:40 UTC 2018     0.0000           NA                                       18b70eb4-ec31-49b5-9f61-56af94d52525", "                                                                                                                                     ", "18b70eb4-ec31-49b5-9f61-56af94d52525     Thu Aug 16 12:27:44 UTC 2018     31.3401          bb4e8281-5a6f-4e08-a428-063bb6ac60fe     quicksnap", "                                                                                                                                     ", "quicksnap                                Thu Aug 16 12:27:49 UTC 2018     98.6493          18b70eb4-ec31-49b5-9f61-56af94d52525     head-003", "                                                                                                                                     "]}

TASK [Remount the volume] ******************************************************
task path: /var/lib/jenkins/[*******]/e2e/ansible/playbooks/feature/snapshots/simple-volume/snapshot.yml:174
changed: [localhost -> None] => {"changed": true, "dump": "0", "fstab": "/etc/fstab", "fstype": "ext4", "name": "/mnt/jiva", "opts": "discard,_netdev", "passno": "0", "src": "/dev/sdc"}

TASK [Remove the file created] *************************************************
task path: /var/lib/jenkins/[*******]/e2e/ansible/playbooks/feature/snapshots/simple-volume/snapshot.yml:184
changed: [localhost -> None] => {"changed": true, "path": "/mnt/jiva/f1", "state": "absent"}

TASK [Unmount the volume again before snap revert] *****************************
task path: /var/lib/jenkins/[*******]/e2e/ansible/playbooks/feature/snapshots/simple-volume/snapshot.yml:191
changed: [localhost -> None] => {"changed": true, "dump": "0", "fstab": "/etc/fstab", "name": "/mnt/jiva", "opts": "defaults", "passno": "0"}

TASK [Revert volume snapshot] **************************************************
task path: /var/lib/jenkins/[*******]/e2e/ansible/playbooks/feature/snapshots/simple-volume/snapshot.yml:198
fatal: [localhost -> None]: FAILED! => {"changed": true, "cmd": "source ~/.profile; kubectl exec maya-apiserver-6c5764ddf5-hjwbn -n [*******] -c maya-apiserver -- mayactl snapshot revert --volname simple-volume-vut -n simple-volume --snapname quicksnap", "delta": "0:00:00.636941", "end": "2018-08-16 17:57:52.597013", "failed_when_result": true, "msg": "non-zero return code", "rc": 1, "start": "2018-08-16 17:57:51.960072", "stderr": "Snapshot revert failed: Server status error: Internal Server Error\ncommand terminated with exit code 1", "stderr_lines": ["Snapshot revert failed: Server status error: Internal Server Error", "command terminated with exit code 1"], "stdout": "Executing volume snapshot revert ...", "stdout_lines": ["Executing volume snapshot revert ..."]}

API for configuring S3 credentials

The snapshots can be backed up to S3. The backup API should allow passing the S3 credentials and parameters (bucket/folder) for storing or retrieving snapshots. If no credentials are provided, it should fall back to the defaults provided to the container environment.
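
A minimal sketch, in Go, of what such a credential fallback could look like. The struct, field names, and environment variables here are assumptions for illustration, not the actual longhorn backup API.

    package main

    import (
        "fmt"
        "os"
    )

    // S3Config carries the parameters a backup or restore request could pass in.
    // All names here are hypothetical.
    type S3Config struct {
        AccessKey string
        SecretKey string
        Bucket    string
        Folder    string
    }

    // withDefaults fills empty credentials from the standard AWS environment
    // variables exposed to the container, as the issue suggests.
    func (c S3Config) withDefaults() S3Config {
        if c.AccessKey == "" {
            c.AccessKey = os.Getenv("AWS_ACCESS_KEY_ID")
        }
        if c.SecretKey == "" {
            c.SecretKey = os.Getenv("AWS_SECRET_ACCESS_KEY")
        }
        return c
    }

    func main() {
        // A request that only names the destination; credentials come from the env.
        cfg := S3Config{Bucket: "backups", Folder: "vol3"}.withDefaults()
        fmt.Printf("backing up to s3://%s/%s (access key set: %v)\n",
            cfg.Bucket, cfg.Folder, cfg.AccessKey != "")
    }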

Ability to collect core dump in case of crash or when manually induced

As a Developer I would want to:

  • Debug production issues more easily in case of crashes.
  • Collect a coredump and be able to get all details from it, such as the build number (see the sketch after this list).
  • Have documented steps for getting a stack trace, so that the support team can share it with me.
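
A minimal Go sketch of the build-number point above: embed the build number at link time so it can be recovered from a core dump or printed on demand. The -ldflags value and the BuildNumber variable are assumptions, not the project's actual tooling.

    package main

    import (
        "fmt"
        "runtime/debug"
    )

    // BuildNumber is meant to be overridden at build time, e.g.:
    //   go build -ldflags "-X main.BuildNumber=1234"
    var BuildNumber = "dev"

    func main() {
        fmt.Println("build:", BuildNumber)
        // "crash" makes a panic abort the process so the kernel can write a
        // core dump (the environment must also allow it, e.g. ulimit -c unlimited).
        debug.SetTraceback("crash")
    }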

Make is failing with hash sum mismatch

Issue:
When trying to build the project with the make command, Step 14 fails with a hash sum mismatch error and the dependencies fail to install.

Full error message:

E: Failed to fetch http://archive.ubuntu.com/ubuntu/dists/xenial-updates/main/binary-amd64/by-hash/SHA256/76599a679c52da6081c2626159418966af8973ca91bb02e8f131b91e880e05d5 Hash Sum mismatch E: Some index files failed to download. They have been ignored, or old ones used instead.

Speed up the replica sync process

Currently in longhorn, when a replica connects to the controller, the replica gets its data synced from another healthy RW replica. The healthy replica reads the entire data set to find the snapshots that need to be rebuilt on the WO replica, which is time-consuming. This user story is to speed up the sync process between the RW and WO replicas.
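
One possible direction, sketched in Go: compare per-chunk checksums between the RW and WO replicas and transfer only the chunks that differ, instead of reading the entire data set. The function names and chunk size are illustrative assumptions, not the ssync implementation.

    package main

    import (
        "bytes"
        "crypto/sha256"
        "fmt"
    )

    const chunkSize = 4 << 10 // 4 KiB chunks for the example

    // chunkSums returns one checksum per fixed-size chunk of data.
    func chunkSums(data []byte) [][32]byte {
        var sums [][32]byte
        for off := 0; off < len(data); off += chunkSize {
            end := off + chunkSize
            if end > len(data) {
                end = len(data)
            }
            sums = append(sums, sha256.Sum256(data[off:end]))
        }
        return sums
    }

    // dirtyChunks lists the chunk indexes the WO replica actually needs.
    func dirtyChunks(src, dst []byte) []int {
        srcSums, dstSums := chunkSums(src), chunkSums(dst)
        var dirty []int
        for i := range srcSums {
            if i >= len(dstSums) || !bytes.Equal(srcSums[i][:], dstSums[i][:]) {
                dirty = append(dirty, i)
            }
        }
        return dirty
    }

    func main() {
        src := make([]byte, 16*chunkSize)
        dst := make([]byte, 16*chunkSize)
        copy(src[5*chunkSize:], []byte("changed block"))           // dirty exactly one chunk
        fmt.Println("chunks to transfer:", dirtyChunks(src, dst))  // prints [5]
    }

With an index-based comparison like this, rebuilding a mostly identical WO replica is dominated by hashing rather than by transferring unchanged data.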

Ability to monitor the bottlenecks in the storage IO path

The IO traverses from the frontend controller through the replication/backend containers to the actual storage media. The time spent at each layer of the IO stack should be available for debugging and analysis.

The timing information collected should be able to pinpoint the cause of slow IO. For instance, it should be easy to identify whether the slowness is caused by network latency, disk latency, or the load on the hosts running the containers.
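
A minimal Go sketch of the kind of per-layer instrumentation this asks for: wrap whatever a layer reads from and record the latency of each call. The interface, layer names, and fake backend are illustrative assumptions, not the controller's actual types.

    package main

    import (
        "fmt"
        "time"
    )

    type readerAt interface {
        ReadAt(p []byte, off int64) (int, error)
    }

    // timedReader records the latency of every ReadAt on the wrapped layer.
    type timedReader struct {
        layer string
        next  readerAt
    }

    func (t timedReader) ReadAt(p []byte, off int64) (int, error) {
        start := time.Now()
        n, err := t.next.ReadAt(p, off)
        fmt.Printf("layer=%s off=%d bytes=%d latency=%s\n", t.layer, off, n, time.Since(start))
        return n, err
    }

    // fakeDisk stands in for the actual storage media in this example.
    type fakeDisk struct{}

    func (fakeDisk) ReadAt(p []byte, off int64) (int, error) {
        time.Sleep(2 * time.Millisecond) // pretend disk latency
        return len(p), nil
    }

    func main() {
        // Controller -> replica -> disk, with each layer timed independently, so
        // a slow hop (network vs. disk) shows up in its own latency line.
        disk := timedReader{layer: "disk", next: fakeDisk{}}
        replica := timedReader{layer: "replica", next: disk}
        buf := make([]byte, 4096)
        replica.ReadAt(buf, 0)
    }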

Define the contributing guidelines for openebs/longhorn

The guidelines should consider the following:

  • This project is forked from rancher/longhorn, which is pretty active. Features developed in
    openebs/longhorn and rancher/longhorn will need to be merged with each other.
  • In addition, we need a branch where openebs-specific features are also implemented.

Details need to be given on which branch to use for generating pull requests to rancher/longhorn, and on the branches for merging openebs features.

Replica self-registration with Controller

The controller should be able to run independently without any replicas, and IO should be handled gracefully. The controller can be launched with a secret key.

When replicas are launched with the controller IP and secret key, they should be able to self-register with the controller. Likewise, if replicas are launched before the controller, or if the controller crashes, the replicas should self-register.
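
A minimal Go sketch of the registration loop described above: the replica posts its address and the shared secret to the controller and retries until it succeeds. The endpoint, payload, and addresses are assumptions, not an existing longhorn API.

    package main

    import (
        "bytes"
        "encoding/json"
        "fmt"
        "log"
        "net/http"
        "time"
    )

    // registration is the hypothetical payload a replica would send.
    type registration struct {
        Address string `json:"address"`
        Secret  string `json:"secret"`
    }

    func register(controllerURL, replicaAddr, secret string) error {
        body, _ := json.Marshal(registration{Address: replicaAddr, Secret: secret})
        resp, err := http.Post(controllerURL+"/v1/replicas/register", "application/json", bytes.NewReader(body))
        if err != nil {
            return err
        }
        defer resp.Body.Close()
        if resp.StatusCode != http.StatusOK {
            return fmt.Errorf("controller returned %s", resp.Status)
        }
        return nil
    }

    func main() {
        // Retry forever so replicas started before the controller (or across a
        // controller crash) eventually register themselves.
        for {
            if err := register("http://controller:9501", "tcp://replica-1:9502", "s3cr3t"); err != nil {
                log.Printf("register failed: %v, retrying", err)
                time.Sleep(2 * time.Second)
                continue
            }
            log.Print("registered with controller")
            return
        }
    }

Retrying indefinitely covers both start orderings (replica before controller) and controller restarts, at the cost of the controller having to treat re-registration as idempotent.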

e2e: Maintain data consistency over controller container restarts

The following e2e scenario needs to be supported:

  • Controller is running.
  • Replica1 and Replica2 are registered with the Controller.
  • Connect to the volume from the client side, write data, and record its md5sum.
  • Restart the controller with the same IP address.
  • The client should auto-reconnect to the volume.
  • Verify that the md5sum of the data on the volume matches the previously stored value (a minimal md5 sketch follows this list).
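
A minimal Go sketch of the md5 verification step, assuming the volume is attached at a device path such as /dev/sdc (the path is an assumption for illustration):

    package main

    import (
        "crypto/md5"
        "fmt"
        "io"
        "os"
    )

    // volumeMD5 streams the whole device (or file) and returns its md5 digest.
    func volumeMD5(path string) (string, error) {
        f, err := os.Open(path)
        if err != nil {
            return "", err
        }
        defer f.Close()
        h := md5.New()
        if _, err := io.Copy(h, f); err != nil {
            return "", err
        }
        return fmt.Sprintf("%x", h.Sum(nil)), nil
    }

    func main() {
        before, _ := volumeMD5("/dev/sdc") // taken before restarting the controller
        // ... restart the controller, wait for the client to auto-reconnect ...
        after, _ := volumeMD5("/dev/sdc")
        fmt.Println("data consistent across restart:", before == after)
    }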
