
query-ethereum

Production

Geth node

We need to make this restart if it crashes or if the server restarts. We should also do a full sync eventually

```bash
docker run --restart always -d -it \
  -p 8547:8547 -p 8545:8545 -p 30303:30303 \
  -v ~/development/query-ethereum/geth-data:/root/.ethereum \
  ethereum/client-go \
  --graphql --graphql.addr 0.0.0.0 \
  --rpc --rpcaddr 0.0.0.0 --nousb \
  --graphql.vhosts=* --rpcvhosts=*
```
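To keep the restart policy versioned with the repo rather than buried in a shell command, the same container could be described in a docker-compose.yml. This is a sketch of the command above, not a tested config:

```yaml
# Sketch of the geth `docker run` command as a compose service.
# `restart: always` restarts the container on crashes and after reboots
# (as long as the Docker daemon itself starts on boot).
version: "3"
services:
  geth:
    image: ethereum/client-go
    restart: always
    ports:
      - "8545:8545"
      - "8547:8547"
      - "30303:30303"
    volumes:
      - ./geth-data:/root/.ethereum
    command: >
      --graphql --graphql.addr 0.0.0.0
      --rpc --rpcaddr 0.0.0.0 --nousb
      --graphql.vhosts=* --rpcvhosts=*
```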

Ethereum ETL Docker Container

First clone the repository:

```bash
git clone https://github.com/blockchain-etl/ethereum-etl.git
```

Then create the docker image:

```bash
cd ethereum-etl
docker build -t ethereum-etl:latest .
```

Installing docker on EC2

```bash
sudo apt install docker.io
sudo groupadd docker
sudo usermod -aG docker ${USER}
```

If permissions are still broken after that, you can change the permissions on the Docker socket directly:

```bash
sudo chmod 666 /var/run/docker.sock
```

Then exit the terminal and log back in so the group change takes effect

Setting up Node.js server in production

You need to create a .env-production file in the root directory with the environment variables

You also need to install docker and create the ethereum-etl image

```bash
git clone
npm install
npm run startup-production
npx pm2 save
npx pm2 startup
```
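The exact variable names live in the server code; purely as an illustration, a .env-production might contain entries along these lines (every name and value here is hypothetical):

```
# Hypothetical example values; check the server code for the real variable names
POSTGRES_HOST=localhost
POSTGRES_PASSWORD=changeme
GRAPHQL_PORT=4000
```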

query-ethereum's People

Contributors: lastmjs

query-ethereum's Issues

Geth crashed

I believe geth ran for 1-2 weeks without problems, but it just crashed on February 16th, I'm not sure why. A quick reboot should get it up and running again. It also runs consistently about 20-30 minutes behind, and I'm not quite sure why. I hope it's not just because of the specs of the machine

Conferences

We should probably get on this sooner rather than later...I would like to be on Lambda, with transactions and logs up to date and syncing before jumping into conferences though

Consider custom queries

It might be nice to have some custom queries for use-case specific stuff. For example, providing similar functionality to what eth gas station does, just as a custom query. Perhaps this library could evolve into something like web3.js or ethers.js, it'll just be entirely exposed through GraphQL. There can be queries that are executed locally and remotely. The local queries will do stuff that can only be done on the client, like signing transactions, generating keys, etc. The remote queries can do stuff like check gas prices, submit transactions, etc...hmmm...this would all be really cool
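A custom gas-price query in the eth gas station style might look something like this; the gasPrice field and its shape are hypothetical and don't exist in the schema yet:

```graphql
# Hypothetical custom query sketching the eth-gas-station use case
query {
  gasPrice {
    safeLow
    average
    fast
  }
}
```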

Use Chainlink for historical price data

I'm hoping I can get price data from Chainlink. I believe this will work very well for all current blocks and moving forward, but obviously I won't be able to get price data from the beginning of the chain

Beta

  • put in last and first query

  • We need to make importing much more efficient if possible...perhaps hook into geth's db directly instead of going through the GraphQL endpoint or rpc endpoint

  • Generate TypeScript types from the GraphQL schema

  • Put an interval on stats. By default, the interval is the entire range from the first date to the last date of the returned records. If you set the interval to seconds, minutes, hours, days, weeks etc, the stats will return an array with each of the stats for the interval as each item. You'll be able to grab the startDate and endDate of the interval inside of the stats object

  • Get Ethereum node to stay up to date...constantly import into postgres, shouldn't be that bad

  • Optimize stats, only calculate stats that are requested

  • Add transactions

  • Add price data

  • Consider rate limiting and pricing

  • Solicit feedback on next steps

  • Ensure infrastructure will scale, consider fargate and auto-scaling if necessary

  • Setup service for geth on AWS, make sure it will always remain running and will restart when necessary

  • Security audit of entire system

  • Write introductory article (perhaps on dev.to), post on Twitter, Reddit. Reach out to people you know are doing analytics

  • Give the load balancer its own security group...for some reason it is in the same security group as the Postgres database, so I've opened both the database and the load balancer to general traffic

  • Support this use case: https://twitter.com/gane5h/status/1223665938970509313?s=20

  • Seems like we're going to need some more powerful filtering, like intervals...that person above wants to see week over week, I've personally been wanting to see month over month how tx/sec is increasing or decreasing, figure out how to do stuff like this elegantly

  • Add Google analytics to get some user counts if possible, not sure how to easily integrate that into the playground but hopefully it's possible.

  • For performance, I want to be able to query entire months of block and transaction data from June 2017 to January 2018. That's the height of the bull market and very interesting data. Right now it's too much for the servers to handle, and I believe it actually crashes the GraphQL server process. pm2 restarts it. Get rid of that behavior of crashing

  • Create production-ready versions of the geth docker container and the ethereum-etl docker container

  • https only for queryethereum.com

  • check in the ethereum-etl-data directory somehow, permissions are weird because of docker, but we need it to exist in production

  • Lock down permissions, do not run all processes as root

  • Multithread everything

  • optimize the first query

    • See if you can use normal select for normal queries, and then use the cursor only when necessary. using the cursor to go all the way to the beginning is really bad for performance
  • Make sure the node process has enough memory, allow it to use up as much memory as possible

    • I broke the server asking for too much data, this cannot happen. I think using threads will help with this
  • Make repo self-contained...the repo should be able to install docker, clone ethereum-etl, build the image, all from npm install or npm start...same for the ethereum node, especially make sure that it can restart correctly

  • Consider downgrading to t3.small

    • This gives us 2 cores and 2 GB of memory, which is probably the lowest we should go for the node server...I want at least 2 cores, and below 2 GB of memory just seems really low
    • The big question is the geth server, not sure how small that one can be
    • We might even be able to do a t3.micro, still has 2 cpus, but 1 GB of RAM. We'll have to decide on the db, node, graphql, the graphql server itself can probably be pretty small. In fact, it might even do well as a lambda function. In fact, that would be nice. We could then get rid of the load balancer as well, and the lambda functions should scale pretty niftily...hmmm...that could work rather nicely, we'll see
    • We do need to continuously update the postgres server though, so lambdas might not work super well for that. There are scheduled tasks
  • Consider Unlock protocol for payments...making my own might be much better

  • Only select the fields from the select sql query that are requested in the selection set

  • We have to speed up importing from geth somehow...instead of executing update manies, perhaps create a bunch of update many and send them all at once

    • perhaps individual create statements would work better
    • We also might want to get rid of the unnecessary indices that might be slowing things down a lot
  • If a request gets cancelled on the client, it would be nice to cancel it on the server. For example, I asked for all blocks on accident, and I cancelled it on the client but I believe it was too late. The server is frozen now

    • I should put a cap on the memory that a node process is allowed to use, I think I just ran into the problem of a node process using all of the memory available
  • Put the memory max as an environment variable so that it is easy to switch when we move ec2 instances around

  • Some really good robustness testing

    • Make a request for all blocks
    • Make sure other requests can still come through and that the process that asked for all of the blocks is killed eventually and cleaned up
  • If one person asks for too much from a query, it will kill everyone else...either we need to deal with this or use AWS Lambda

  • I think Lambda is the way to go.

    • Serverless framework
    • Set up scheduled events, maybe once per minute, to sync geth with postgres
    • Biggest concerns: how long is spinup time? How will we do subscriptions? We don't have subscriptions right now, but if we ever need them we'll probably have to setup a dedicated long-running server...I don't know if lambda can multi-thread, in case you would ever need that for calculations
    • I think we need to do this to scale
  • this query doesn't work:

    ```graphql
    query {
      latestBlock: blocks(last: 1) {
        items {
          number
          timestamp
        }
      }

      latestStats: blocks(last: 10000) {
        stats {
          transactionCount {
            total
            average {
              perSecond
            }
          }
        }
      }
    }
    ```

  • Optimize the groupings if needed

    • the bigger groupings over longer periods of time are not optimal, I think because of the array spreads that I am doing. They are large, and happen on every single block
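The interval idea above (per-day or per-week stats with a startDate and endDate on each bucket) can be sketched in plain JavaScript. `bucketByInterval` and the sample data are hypothetical, just to show the grouping logic, not the actual API:

```javascript
// Group records into fixed-width time buckets and compute per-bucket stats.
// This is a sketch of the proposed `interval` argument, not the real schema.
function bucketByInterval(records, intervalMs) {
  const buckets = new Map();
  for (const record of records) {
    // Align each record's timestamp to the start of its bucket
    const startDate = Math.floor(record.timestamp / intervalMs) * intervalMs;
    if (!buckets.has(startDate)) buckets.set(startDate, []);
    buckets.get(startDate).push(record);
  }
  // One stats object per bucket, carrying the startDate/endDate described above
  return [...buckets.entries()].map(([startDate, items]) => ({
    startDate,
    endDate: startDate + intervalMs,
    transactionCount: items.reduce((sum, b) => sum + b.transactionCount, 0),
  }));
}

const DAY = 24 * 60 * 60 * 1000;
// Made-up blocks: two in the first day, one in the second
const blocks = [
  { timestamp: 1000, transactionCount: 5 },
  { timestamp: 2000, transactionCount: 7 },
  { timestamp: DAY + 500, transactionCount: 3 },
];
console.log(bucketByInterval(blocks, DAY));
```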

Consider example website

It might be useful to create an Ethereum network statistics dashboard to show what the capabilities of Query Ethereum are...try to differentiate it from anything out there. I'm not sure what exactly to build, but showing gas used, gas limit, network utilization, price, gas per second, all of that stuff...I think it would be interesting. All of the stuff that I want to see constantly, which is going to be all of the indicators of demand for ETH. So gas used, gas price, ETH transfers, erc20 transfers, all of that
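One metric such a dashboard could show is network utilization, i.e. gasUsed as a fraction of gasLimit across a window of blocks. A minimal sketch (the sample numbers are made up, not real chain data):

```javascript
// Network utilization: what fraction of the blocks' total gas limit was used.
function networkUtilization(blocks) {
  const totalUsed = blocks.reduce((sum, b) => sum + b.gasUsed, 0);
  const totalLimit = blocks.reduce((sum, b) => sum + b.gasLimit, 0);
  return totalUsed / totalLimit;
}

// Hypothetical sample blocks
const sample = [
  { gasUsed: 9000000, gasLimit: 10000000 },
  { gasUsed: 5000000, gasLimit: 10000000 },
];
console.log(networkUtilization(sample)); // 0.7
```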

Prepare for Gitcoin CLR matching

It would probably be a good idea to enter the next CLR matching round in March

Requirements:

  • Nice logo
  • Excellent README
  • Excellent documentation with examples, showing how developers and non-developers alike can run chain analysis
  • Transactions
  • Transaction logs
  • It would be really neat to be able to do MakerDAO, DAI, DSR analysis by then

Alpha

  • Allow querying the gasLimit and difficulty (the aggregate values get too large)
  • Get rid of SQL injection possibilities
  • Get rid of requirement for client to add required fields to the selection set for stats
  • Clean up code, declarative, do not repeat, typed, etc
  • Fix sql injection
  • Import all blocks from node to postgres on my local machine
  • Test it all out
  • Deploy an Ethereum node in production (only sync through present, no need to keep it up to date for now)
  • Deploy GraphQL server in production
  • Deploy Postgres server in production
  • ssl

1.0

  • Allow people to make charts right from the browser, just run a query and click create chart

Validation

Round 1, started early February 2020

I reached out to maybe 10-15 people directly, I've posted on Twitter, and on Reddit. About 5 people have responded directly to me, saying generally that the idea is a good one and looks useful. They were all non-developers, and all expressed the same concern: this looks like it's for developers. Most didn't seem to think it would apply to them as non-developers. One person did think it would apply to him, but mentioned that more documentation or something needs to be done to help non-developers.

Some tech-minded people replied to Tweets. They asked how this is different from The Graph, and two people have asked for direct use-cases (DSR deposits and what you would have earned on DeFi protocols if you had invested early). Tweets get a handful of likes when I post updates.

Also, I posted on r/ethereum. The upvote number is 11, 100% upvoted. I also got picked up on Week in Ethereum, and apparently it was one of the most clicked items: https://twitter.com/evan_van_ness/status/1229862064400850944

With all of this feedback, I'd say we have passed round 1 of validation. This project looks useful/valuable, and I should continue building it out. The next steps seem to me to be adding transactions and events, and building out the capabilities for the use cases that have been requested so far. I should also add analytics to the website, and tutorials or documentation of some kind. A landing page perhaps. I also would like to move to Lambda for scalability, and increase the power of the geth node if necessary. I would like to make sure that it is up-to-date, not 30 minutes behind. Also, I would like to add pricing data. Chainlink seems like a great solution for the past yearish of data (essentially it launched in June 2019). More historical data will have to be obtained from elsewhere, though the main APIs seem to have bad terms.

Round 2 of validation will commence during the implementation of these next features.

Scaling ideas

  • AWS ECS with AWS Fargate for GraphQL
  • AWS Lambda for GraphQL (still need to figure out how to keep Postgres in sync with geth)

historical returns from DeFi

See if we can provide for this use case: https://twitter.com/davecraige/status/1225239343297519616?s=20

Essentially he wants to be able to plug in a date and a number to see the returns from different DeFi protocols. For example, for the DSR, we would have to go through time to find all DSR interest rates, and then calculate from there based on the principal investment. With compound we would have to find all compound rates, etc
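The DSR part of that calculation is essentially compounding the principal through each historical rate period. A sketch with made-up rates; the real values would come from on-chain MakerDAO rate history, and the daily compounding here only approximates the DSR's per-second accrual:

```javascript
// Compound a principal through a sequence of (annualRate, days) periods.
// Rates below are hypothetical, for illustration only.
function compoundThroughPeriods(principal, periods) {
  let balance = principal;
  for (const { annualRate, days } of periods) {
    // Daily compounding as an approximation of a continuous savings rate
    balance *= Math.pow(1 + annualRate / 365, days);
  }
  return balance;
}

const periods = [
  { annualRate: 0.06, days: 90 }, // e.g. DSR at 6% for ~3 months
  { annualRate: 0.02, days: 90 }, // then cut to 2%
];
console.log(compoundThroughPeriods(1000, periods));
```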

UI

In some of my feedback, non-developers have expressed concern that it is still too much for them to use the GraphQL API directly. What if there were a project that took a GraphQL schema and generated a UI, with dropdowns and everything? This might solve that problem rather easily
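A first pass at that generated UI could be driven by introspection: walk the schema's fields and emit a form control per argument. A sketch against a hand-written stand-in for introspection output (the real thing would run a full `__schema` query; the shapes and names below are assumptions):

```javascript
// Turn a (simplified) introspection type into form-control descriptors.
// `schemaType` mimics a fragment of GraphQL introspection output.
function fieldsToControls(schemaType) {
  return schemaType.fields.map((field) => ({
    name: field.name,
    // Enums become dropdowns; everything else falls back to a text input
    control: field.type.kind === 'ENUM' ? 'dropdown' : 'text',
    options: field.type.kind === 'ENUM' ? field.type.enumValues : undefined,
  }));
}

// Hypothetical arguments for a `blocks` query
const blocksArgs = {
  fields: [
    { name: 'last', type: { kind: 'SCALAR' } },
    { name: 'interval', type: { kind: 'ENUM', enumValues: ['SECOND', 'MINUTE', 'HOUR', 'DAY'] } },
  ],
};
console.log(fieldsToControls(blocksArgs));
```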
