
harmony-one / harmony-ops

Stars: 35, Watchers: 19, Forks: 25, Size: 17.92 MB

Harmony Ops Master Repository.

License: MIT License

Shell 32.11% Makefile 0.20% Go 8.28% Python 35.19% JavaScript 3.37% CSS 2.50% HTML 17.26% Jinja 1.08%
harmony-one devops aws blockchain harmony scripts

harmony-ops's Introduction

Harmony Foundational Node Success Guide

Harmony Day ONE Mainnet

This guide provides information to help Harmony Foundational Node operators run their nodes successfully.

What is Harmony?

Our vision is ‘Open Consensus for 10 Billion People.’

Our open infrastructure is a high-throughput, low-latency, and low-fee consensus platform designed to power decentralized economies of the future.

We aim to deliver both scalability and decentralization. The promise of open consensus is to enable decentralized coordination at scale, but no platform has yet been able to achieve both.

Similar to the way that Google vertically integrates its search infrastructure, we take a full stack approach to solve consensus at scale. We apply 10x innovations at every layer in consensus algorithms, systems, and networking to maximize the performance of our network while maintaining decentralization. Our end-to-end integration allows us to iterate faster and make more aggressive optimizations than could be done with a modular approach.

Our Approach

Harmony's technical architecture implements a full & secure sharding scheme, efficient consensus, and scalable networking infrastructure.

Harmony relies on a secure sharding process, in which validators are distributed into shards based on randomness that is both unpredictable and unbiasable. Harmony is a Proof-of-Stake blockchain: validators need to stake a certain amount of tokens to be eligible for block validation. Harmony integrates an efficient consensus protocol called FBFT, which combines BLS multi-signatures with a view change protocol to achieve high robustness and low latency. By adopting networking technologies including RaptorQ fountain codes and Kademlia routing, Harmony achieves cross-shard transactions that scale sublinearly with the number of shards.
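The idea of assigning validators to shards from shared randomness can be illustrated with a toy sketch. It assumes the randomness is already available as bytes (how Harmony actually generates it is not modeled) and that validators are identified by string keys; `assign_shards` is a hypothetical helper, not Harmony's resharding code. Ranking validators by a hash of the randomness and their key means no one can predict or choose a shard before the randomness is known.

```python
import hashlib
from typing import Dict, Iterable


def assign_shards(validator_keys: Iterable[str], randomness: bytes, num_shards: int) -> Dict[str, int]:
    """Toy shard assignment: deterministic for a given `randomness`, unpredictable before it."""
    if num_shards <= 0:
        raise ValueError("num_shards must be positive")
    # Rank validators by SHA-256(randomness || key). Until `randomness` is revealed,
    # no validator can know (or pick) its position in this ordering.
    ranked = sorted(
        validator_keys,
        key=lambda key: hashlib.sha256(randomness + key.encode()).digest(),
    )
    # Deal the ranked validators out round-robin so shard sizes differ by at most one.
    return {key: i % num_shards for i, key in enumerate(ranked)}


if __name__ == "__main__":
    keys = [f"validator-{i}" for i in range(10)]
    print(assign_shards(keys, b"epoch-42-randomness", 4))
```

Round-robin dealing keeps shards balanced; a real protocol also has to weigh stake, which this sketch ignores.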

By innovating on each layer, Harmony aims to provide a scalable, secure, and decentralized system that supports economic activities, including data marketplaces, gaming, and financial transactions, for billions of people.

To participate in our research and technical discussions, join our Research forum at talk.harmony.one.

harmony-ops's People

Contributors

alajko, andybowu, cem-harmony, daniel-vdm, dependabot[bot], diego1q2w, fxfactorial, jhd2best, john-harmony, mattlockyer, pr2a, rhazberries, sophoah, trueinsider, xchickens


harmony-ops's Issues

DevOps.sh Script improvement

For log checking and Bingo keyword verification, the current script works well for the legacy nodes, but it does not work for nodes launched by Terraform because the log folder location has changed. I plan to improve it.

@LeoHChen Your thoughts?
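One way to make the check independent of where the logs live is to probe a list of candidate folders and use whichever exists. This sketch assumes the logs are `*.log` files and that the Bingo keyword appears verbatim in log lines (matched case-insensitively); both folder paths are placeholders, not the real locations used by devops.sh or the Terraform setup.

```python
from pathlib import Path
from typing import Iterable, Optional

# Hypothetical locations: replace with the real legacy and Terraform log folders.
CANDIDATE_LOG_DIRS = [
    Path("/path/to/legacy/logs"),
    Path("/path/to/terraform/logs"),
]


def find_log_dir(candidates: Iterable[Path] = CANDIDATE_LOG_DIRS) -> Optional[Path]:
    """Return the first candidate directory that exists on this host, or None."""
    for directory in candidates:
        if directory.is_dir():
            return directory
    return None


def count_keyword(log_dir: Path, keyword: str = "bingo") -> int:
    """Count lines across *.log files in `log_dir` that contain `keyword`, ignoring case."""
    needle = keyword.lower()
    total = 0
    for log_file in sorted(log_dir.glob("*.log")):
        with log_file.open(errors="replace") as handle:
            total += sum(1 for line in handle if needle in line.lower())
    return total


if __name__ == "__main__":
    log_dir = find_log_dir()
    if log_dir is None:
        print("no known log folder found")
    else:
        print(f"{log_dir}: {count_keyword(log_dir)} Bingo lines")
```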

Upgrade Grafana EBS Volume to 250 GB

Per LC's request, we'd like to store one month of historical data on the Grafana server, which requires increasing its EBS volume to 250 GB.
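As a sanity check on the 250 GB figure, the Prometheus storage documentation gives a rough sizing rule: needed disk space is retention time in seconds, times ingested samples per second, times bytes per sample (about 1 to 2 bytes). The sketch below applies it; the series count per node and the scrape interval are hypothetical inputs, so the real ingestion rate should be measured before relying on the result.

```python
SECONDS_PER_DAY = 86_400


def ingestion_rate(num_targets: int, series_per_target: int, scrape_interval_s: float) -> float:
    """Samples per second: every series on every target yields one sample per scrape."""
    return num_targets * series_per_target / scrape_interval_s


def prometheus_disk_bytes(retention_days: float, samples_per_second: float,
                          bytes_per_sample: float = 2.0) -> float:
    """Sizing rule from the Prometheus docs: retention_seconds * samples/s * bytes/sample."""
    return retention_days * SECONDS_PER_DAY * samples_per_second * bytes_per_sample


if __name__ == "__main__":
    # Hypothetical inputs: 400 nodes, 1,000 series per node, 15 s scrape interval.
    rate = ingestion_rate(400, 1_000, 15)
    print(f"{prometheus_disk_bytes(30, rate) / 1e9:.1f} GB for 30 days")  # 138.2 GB for 30 days
```

Under these assumptions one month fits in 250 GB with room to spare; doubling the series per node (about 276 GB) would not.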

Amazon EC2 Instance Retirement

6 in total (5 in us-west-2 and 1 in eu-west-1):

  • i-02da31bb79757709b (us-west-2)
  • i-00d3acf938c7f8520 (us-west-2)
  • i-058e4758c92d0a955 (us-west-2)
  • i-091232c0b9f84ef02 (us-west-2)
  • i-0980e4873f3764c84 (us-west-2)
  • i-08b896007684d3611 (eu-west-1)
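Retirement notices can also be found programmatically. A sketch, assuming the input is the dictionary returned by boto3's EC2 `describe_instance_status` call (made once per region, for example with `IncludeAllInstances=True`); `pending_retirements` is a hypothetical helper. According to the EC2 API reference, the description of a completed event starts with `[Completed]`, so those events are skipped.

```python
from typing import Any, Dict, List, Tuple

# Scheduled-event codes that mean AWS will retire or stop the instance.
RETIREMENT_CODES = {"instance-retirement", "instance-stop"}


def pending_retirements(response: Dict[str, Any]) -> List[Tuple[str, str, Any]]:
    """Return (instance_id, event_code, not_before) for each pending retirement/stop event.

    `response` is shaped like the output of
    boto3.client("ec2", region_name=...).describe_instance_status(IncludeAllInstances=True).
    """
    found = []
    for status in response.get("InstanceStatuses", []):
        for event in status.get("Events", []):
            if event.get("Code") not in RETIREMENT_CODES:
                continue
            if event.get("Description", "").startswith("[Completed]"):
                continue  # event already finished
            found.append((status["InstanceId"], event["Code"], event.get("NotBefore")))
    return found
```

With many instances the API paginates, so production code would iterate over `client.get_paginator("describe_instance_status")` instead of a single call.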

Consolidated Issues for Harmony Explorer

Priority: Low

URL: https://explorer.harmony.one/#/

All 4 shards are currently working properly, but the information displayed on the explorer website is not accurate.

From messages gathered on Discord and WeChat, there are a few issues we'd like to address:

  • stability: the explorer sometimes goes offline for no apparent reason
  • it shows only 2 shards online, although all 4 are healthy
  • the block count gets reset, and the explorer sometimes needs to re-sync

[Screenshot: 2019-08-26, 10:53:37 AM]

Recover an Offline Node on Sep 7, 2019

https://harmonyone.pagerduty.com/incidents/POL952U

The monitor shard-3-008 (13.228.72.14) is currently DOWN (Port Is Not Listening).

We were unable to recover the node with the devops.sh script, so we launched a new node with Terraform to replace the old one.

Finally:

  • update the UptimeRobot monitor with the new IP

./uptimerobot.sh -G update shard-3-008 54.249.98.66 3

  • update shard3.txt (locally on the devops instance and on S3)
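The shard3.txt update can be scripted. This sketch assumes the file holds whitespace-separated fields with the node IP as its own field (the real format is not shown here) and replaces only exact IP matches, so that, for example, 13.228.72.140 is left alone. The S3 upload is left as a comment because the bucket and key are not given.

```python
from pathlib import Path


def replace_node_ip(path: str, old_ip: str, new_ip: str) -> int:
    """Swap `old_ip` for `new_ip` wherever it is a whole field; return the number of lines changed."""
    file_path = Path(path)
    out_lines = []
    changed = 0
    for line in file_path.read_text().splitlines():
        fields = line.split()
        if old_ip in fields:
            # Rejoining with single spaces normalizes whitespace on the edited line only.
            line = " ".join(new_ip if field == old_ip else field for field in fields)
            changed += 1
        out_lines.append(line)
    file_path.write_text("\n".join(out_lines) + "\n")
    # Afterwards, push the file to S3, e.g. boto3.client("s3").upload_file(path, BUCKET, KEY)
    # where BUCKET and KEY are placeholders for the real location.
    return changed
```

For the incident above, the call would be `replace_node_ip("shard3.txt", "13.228.72.14", "54.249.98.66")`.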

Investigation needed for the shrinking disk space issue on the Grafana Server

This issue first occurred about two months ago, when we started monitoring 400+ nodes on our testnet. Prometheus keeps all historical data from every server, and there is not yet a mechanism to recycle that data.

We need to find a sustainable solution to this problem.
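One possible direction, assuming a Prometheus 2.x release that supports the `--storage.tsdb.retention.time` flag, is to cap retention so the data always fits on the disk. The sketch inverts the Prometheus sizing rule (retention seconds times samples per second times bytes per sample) to find the longest retention that fits with a safety margin; the 80% headroom and the ingestion rate in the example are hypothetical.

```python
SECONDS_PER_DAY = 86_400


def max_retention_days(disk_bytes: float, samples_per_second: float,
                       bytes_per_sample: float = 2.0, headroom: float = 0.8) -> int:
    """Longest whole-day retention whose estimated size fits in `headroom` of the disk."""
    bytes_per_day = samples_per_second * bytes_per_sample * SECONDS_PER_DAY
    return int(disk_bytes * headroom // bytes_per_day)


def retention_flag(days: int) -> str:
    """Prometheus command-line flag that drops data older than `days`."""
    return f"--storage.tsdb.retention.time={days}d"


if __name__ == "__main__":
    # Hypothetical: 250 GB volume, 400 nodes x 1,000 series scraped every 15 s.
    days = max_retention_days(250e9, 400 * 1_000 / 15)
    print(retention_flag(days))  # --storage.tsdb.retention.time=43d
```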

Local diagnostic system for foundational node runners

We need to design a system that helps foundational node runners run and monitor their nodes properly. This will also reduce our support workload and improve network liveness.

The goals:

  1. monitor the running status of FN nodes: earnings, issues, online/offline state, alerting/paging
  2. monitor the status of the host (CPU, RAM, networking, disk) and raise alarms
  3. collect diagnostic logs to help our team triage node issues and reduce communication cost

The project can be implemented in stages. We should also reuse existing frameworks and systems where possible.

Stages

In the interim, we can use a monitoring script and a diagnostic script.
In the longer term, we can build a local Grafana server, Logstash, and a pushagent to collect metrics from the node and the local host, and to visualize the data in a more user-friendly way.
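For the longer-term setup, metrics produced by local scripts could be pushed to a Prometheus Pushgateway, which accepts the Prometheus text exposition format at `/metrics/job/<job>/instance/<instance>` (default port 9091). This minimal sketch only builds the request; the metric names, job name, and gateway URL are hypothetical.

```python
from typing import Dict, Tuple
from urllib.parse import quote


def pushgateway_request(base_url: str, job: str, instance: str,
                        metrics: Dict[str, float]) -> Tuple[str, str]:
    """Return (url, body) for pushing gauges under the given job/instance grouping key."""
    url = (f"{base_url.rstrip('/')}/metrics/job/{quote(job, safe='')}"
           f"/instance/{quote(instance, safe='')}")
    lines = []
    for name, value in sorted(metrics.items()):
        lines.append(f"# TYPE {name} gauge")
        lines.append(f"{name} {value}")
    # The text format requires the last line to end with a newline.
    return url, "\n".join(lines) + "\n"


if __name__ == "__main__":
    url, body = pushgateway_request(
        "http://localhost:9091", "fn-monitor", "shard-3-008",
        {"harmony_node_up": 1, "harmony_node_bingo_lines": 5},
    )
    # To send: urllib.request.urlopen(urllib.request.Request(url, data=body.encode(), method="PUT"))
    print(url)
    print(body, end="")
```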

Monitoring script

We already have a few monitoring scripts. We need to adapt them to the latest logs.
ETA: 10 hours
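A basic building block for the monitoring script is a TCP check matching a "port is not listening" alert. A standard-library sketch; the port in the usage lines is a placeholder, not the port a Harmony node actually uses.

```python
import socket


def port_is_listening(host: str, port: int, timeout: float = 3.0) -> bool:
    """True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # connection refused, host unreachable, or timed out
        return False


if __name__ == "__main__":
    NODE_PORT = 9000  # placeholder: use the port your node actually listens on
    print(port_is_listening("127.0.0.1", NODE_PORT))
```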

Diagnostic script

Node runners can run this script to collect diagnostic data when they report that their node is not earning, is offline, or is not working. It can create a tarball and upload the data to a public repo so that the team can analyze it further.
ETA: 20 hours
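A minimal sketch of the collection step, assuming the node logs are `*.log` files in a single folder; the archive layout and the `collect_diagnostics` name are hypothetical, and the upload is omitted because the destination repo is not specified.

```python
import io
import platform
import shutil
import tarfile
import time
from pathlib import Path


def collect_diagnostics(log_dir: str, out_dir: str) -> Path:
    """Bundle a host summary plus the node's *.log files into a .tar.gz; return its path."""
    usage = shutil.disk_usage(log_dir)
    summary = (
        f"hostname: {platform.node()}\n"
        f"platform: {platform.platform()}\n"
        f"disk_total_bytes: {usage.total}\n"
        f"disk_free_bytes: {usage.free}\n"
    ).encode()

    archive = Path(out_dir) / f"diagnostics-{time.strftime('%Y%m%d-%H%M%S')}.tar.gz"
    with tarfile.open(archive, "w:gz") as tar:
        # Write the summary straight from memory, without a temporary file.
        info = tarfile.TarInfo("host-summary.txt")
        info.size = len(summary)
        info.mtime = int(time.time())
        tar.addfile(info, io.BytesIO(summary))
        # Only *.log files are collected; other files in the folder are skipped.
        for log_file in sorted(Path(log_dir).glob("*.log")):
            tar.add(log_file, arcname=f"logs/{log_file.name}")
    return archive
```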

Prometheus and Grafana deployment using Terraform

We have a bash script for this task, but in the long term we should provision this server with an open-source IaC tool. Based on our research, Terraform is the best candidate for this project.

Tools to be installed:

  • Prometheus
  • Grafana
  • Prometheus Alertmanager
  • Prometheus Pushgateway?
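Whichever tool provisions the server, Prometheus still needs the list of nodes to scrape. One option is a `file_sd_configs` target file, which Prometheus picks up again when it changes. This sketch assumes each node runs node_exporter on its default port 9100 and that per-shard IP lists come from files such as shard3.txt; the `shard` label name is our own choice.

```python
import json
from typing import Dict, List


def file_sd_targets(shard_ips: Dict[int, List[str]], port: int = 9100) -> str:
    """Render Prometheus file_sd JSON: one target group per shard, labelled by shard."""
    groups = [
        {
            "targets": [f"{ip}:{port}" for ip in ips],
            "labels": {"shard": str(shard)},  # label values must be strings
        }
        for shard, ips in sorted(shard_ips.items())
    ]
    return json.dumps(groups, indent=2)


if __name__ == "__main__":
    print(file_sd_targets({3: ["54.249.98.66"]}))
```

The Prometheus scrape job would then reference the generated file in the `files` list of its `file_sd_configs` entry.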

Consolidated Issues for /1h Website

ISSUE 1
Shard 2 shows as offline, but I suspect its leader has been rotated.

ISSUE 2
The hourly reward does not look right; it should be around 46.

Send out notification to offline node runners

While providing technical support to our node runners, we found that a common request is a prompt notification (email or SMS) when a node goes offline.

[Screenshot: 2019-08-28, 9:04:17 AM]
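A sketch of the email half of such a notification, using Python's standard `email` and `smtplib` modules; the sender address, SMTP host, and wording are placeholders, and SMS would need a separate provider.

```python
import smtplib
from email.message import EmailMessage


def offline_alert(to_addr: str, node_name: str, ip: str,
                  sender: str = "alerts@example.com") -> EmailMessage:
    """Build (but do not send) an offline-node notification email."""
    msg = EmailMessage()
    msg["Subject"] = f"[Harmony] Node {node_name} ({ip}) appears to be offline"
    msg["From"] = sender
    msg["To"] = to_addr
    msg.set_content(
        f"Our monitoring could not reach your node {node_name} at {ip}.\n"
        "Please check that the node process is running and its port is reachable.\n"
    )
    return msg


def send(msg: EmailMessage, smtp_host: str = "localhost") -> None:
    """Deliver the message through an SMTP relay (placeholder host)."""
    with smtplib.SMTP(smtp_host) as smtp:
        smtp.send_message(msg)
```

Keeping message construction separate from delivery makes it easy to swap the transport later.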

MainNet Preparation

This issue tracks tasks that may be worked on in preparation for mainnet.

Task List

  • Monitoring
    • mystatus.sh
    • UptimeRobot automatic scripting
  • Documentation
    • Create Harmony Wiki
      • Status Page
    • Troubleshooting Playbook
    • Foundational Node Operators Guide
    • Release Process

Some nodes earning rewards less than expected

A customer reported that their nodes earn around 27 in rewards per hour, although they should earn around 40. This needs investigation.

Note that the customer has shared the account addresses with me.
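A first step in the investigation is to measure the actual hourly rate from two balance readings of the same account. A sketch, assuming no transfers in or out between the readings; the numbers in the example are illustrative, not the customer's data.

```python
from typing import Tuple

Reading = Tuple[float, float]  # (unix_time_seconds, balance)


def hourly_reward(first: Reading, last: Reading) -> float:
    """Average reward per hour between two balance readings."""
    (t0, b0), (t1, b1) = first, last
    if t1 <= t0:
        raise ValueError("readings must be in chronological order")
    return (b1 - b0) / ((t1 - t0) / 3600)


if __name__ == "__main__":
    # Illustrative: balance grows from 1,000 to 1,162 over six hours.
    rate = hourly_reward((0, 1_000.0), (6 * 3600, 1_162.0))
    print(f"{rate:.1f} per hour")  # 27.0 per hour
```

Comparing this rate per node against the expected figure shows whether the shortfall is uniform or limited to specific nodes.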

EC2 Instance Retirement - i-0f06451ba2ca27c9c

EC2 has detected degradation of the underlying hardware hosting your Amazon EC2 instance (instance-ID: i-0f06451ba2ca27c9c) associated with your AWS account (AWS Account ID: 656503231766) in the eu-central-1 region. Due to this degradation your instance could already be unreachable. We will stop your instance after 2019-09-11 08:00 UTC.

Shard 0..2 are offline from /balances website

The /balances webpage shows shards 0, 1, and 2 as offline, but the explorer webpage shows all 4 shards up and online. Most likely the leader IPs in those shards (0, 1, and 2) have changed.

[Screenshot: 2019-09-04, 8:47:32 PM]

Decommission Retired Grafana Server

There was an issue with the underlying hardware running the Grafana server, so we launched a new server to replace the old one. We now need to delete the old, retired Grafana server under AWS account 6565-0323-1766.

Harmony Node Reboot Recovery

We need to automatically recover the Harmony node after an EC2 instance reboot.

  1. add a service script that starts soldier automatically after boot
  2. modify the user-data script to install the service and run it for the first time
  3. after soldier starts, if an init.json file already exists, run a local curl command to re-initialize the node
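Step 3 could be a small script run after soldier starts. This sketch assumes soldier accepts the saved init.json as a JSON POST on a local endpoint; the URL is a placeholder because the actual curl command is not shown here. The HTTP call is injectable so the logic can be tested without a running soldier.

```python
import json
import urllib.request
from pathlib import Path
from typing import Callable, Optional

# Placeholder: replace with the local endpoint the original curl command targets.
INIT_URL = "http://127.0.0.1:19000/init"


def http_post_json(url: str, body: bytes) -> int:
    """POST a JSON body and return the HTTP status code."""
    request = urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/json"}, method="POST"
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        return response.status


def reinit_if_configured(init_path: str, url: str = INIT_URL,
                         post: Optional[Callable[[str, bytes], int]] = None) -> bool:
    """Re-send an existing init.json to soldier; return False if there is nothing to send."""
    path = Path(init_path)
    if not path.is_file():
        return False  # fresh instance: the first run happens via the user-data script
    body = path.read_bytes()
    json.loads(body)  # fail early on a corrupt file rather than sending it
    (post or http_post_json)(url, body)
    return True
```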
