Essentially he wants to be able to plug in a date and an amount and see the returns from different DeFi protocols. For example, for the DSR we would have to walk through time, find every DSR rate change, and compound the principal across each segment; with Compound we would do the same with its historical rates, and so on.
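A minimal sketch of that calculation, assuming the rate-change history has already been queried out of the database (the names here and the continuous-compounding approximation are mine, not a settled design):

```ts
// Sketch: compound a principal across historical rate segments.
// RateSegment and dsrHistorySince are hypothetical; continuous compounding
// here approximates the DSR's per-second accrual.
interface RateSegment {
  from: Date;
  to: Date;
  apr: number; // e.g. 0.04 for a 4% rate
}

function projectedValue(principal: number, segments: RateSegment[]): number {
  return segments.reduce((value, s) => {
    const years =
      (s.to.getTime() - s.from.getTime()) / (365.25 * 24 * 3600 * 1000);
    return value * Math.exp(s.apr * years);
  }, principal);
}

// e.g. projectedValue(1000, dsrHistorySince(new Date('2019-11-01')))
```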
We need to make importing much more efficient if possible...perhaps hook into geth's database directly instead of going through the GraphQL or RPC endpoint
Generate TypeScript types from the GraphQL schema
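One way to do this, assuming we use graphql-codegen (a recent version that supports a TypeScript config file; the endpoint and output path are placeholders):

```ts
// codegen.ts — sketch only
import type { CodegenConfig } from '@graphql-codegen/cli';

const config: CodegenConfig = {
  schema: 'http://localhost:4000/graphql', // our GraphQL endpoint
  generates: {
    './src/generated/graphql.ts': {
      plugins: ['typescript'], // emit TS types for every schema type
    },
  },
};

export default config;
```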
Put an interval on stats. By default, the interval is the entire range from the first date to the last date of the returned records. If you set the interval to seconds, minutes, hours, days, weeks, etc., stats will return an array with one stats object per interval. You'll be able to grab the startDate and endDate of each interval inside the stats object; the shape is sketched below
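Roughly the shape I have in mind (hypothetical, not implemented; only startDate/endDate are new, the stats fields already exist in the schema):

```ts
// Sketch of one item in the array that stats(interval: ...) would return.
interface IntervalStats {
  startDate: string; // ISO timestamp opening this interval
  endDate: string;   // ISO timestamp closing this interval
  transactionCount: {
    total: number;
    average: { perSecond: number };
  };
}

type StatsResult = IntervalStats[]; // one entry per second/minute/hour/day/week
```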
Get the Ethereum node to stay up to date...continuously import into Postgres; it shouldn't be that bad
Optimize stats, only calculate stats that are requested
Add transactions
Add price data
Consider rate limiting and pricing
Solicit feedback on next steps
Ensure the infrastructure will scale; consider Fargate and auto-scaling if necessary
Set up a service for geth on AWS; make sure it will always remain running and will restart when necessary
Security audit of entire system
Write introductory article (perhaps on dev.to), post on Twitter, Reddit. Reach out to people you know are doing analytics
Give the load balancer its own security group...for some reason it is in the same security group as the Postgres database, so I've opened both the database and the load balancer to general traffic
Seems like we're going to need some more powerful filtering, like intervals...the person above wants to see week over week, and I've personally been wanting to see month over month whether tx/sec is increasing or decreasing. Figure out how to do this kind of thing elegantly
Add Google Analytics to get some user counts if possible; I'm not sure how to integrate that into the playground easily, but hopefully it's possible.
For performance, I want to be able to query entire months of block and transaction data from June 2017 to January 2018. That's the height of the bull market and very interesting data. Right now it's too much for the servers to handle; I believe it actually crashes the GraphQL server process, and pm2 restarts it. Get rid of that crashing behavior
Create production-ready versions of the geth docker container and the ethereum-etl docker container
HTTPS only for queryethereum.com
Check in the ethereum-etl-data directory somehow; permissions are weird because of Docker, but we need it to exist in production
Lock down permissions, do not run all processes as root
Multithread everything
Optimize the first query
See if you can use a normal SELECT for normal queries, and use the cursor only when necessary. Using the cursor to go all the way back to the beginning is really bad for performance; a sketch of the split is below
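A sketch of that split, assuming node-postgres with pg-cursor and a `blocks` table matching the import schema:

```ts
import { Pool } from 'pg';
import Cursor from 'pg-cursor';

const pool = new Pool();

// Bounded queries: plain SELECT, no cursor overhead
async function recentBlocks(limit: number) {
  const { rows } = await pool.query(
    'SELECT number, timestamp FROM blocks ORDER BY number DESC LIMIT $1',
    [limit]
  );
  return rows;
}

// Unbounded scans: stream through a cursor in batches
async function scanAllBlocks(onBatch: (rows: unknown[]) => Promise<void>) {
  const client = await pool.connect();
  try {
    const cursor = client.query(
      new Cursor('SELECT number, timestamp FROM blocks ORDER BY number')
    );
    let rows = await cursor.read(1000);
    while (rows.length > 0) {
      await onBatch(rows);
      rows = await cursor.read(1000);
    }
    await cursor.close();
  } finally {
    client.release();
  }
}
```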
Make sure the Node process has enough memory; allow it to use as much memory as possible
I broke the server by asking for too much data; this cannot happen. I think using threads will help with this, along the lines of the worker sketch below
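For example, with Node's worker_threads; `stats-worker.js` is a hypothetical worker script that computes stats over `workerData` and posts the result back:

```ts
import { Worker } from 'worker_threads';
import * as path from 'path';

// Sketch: push a heavy stats calculation onto a worker thread so one big
// request can't block the event loop for everyone else.
function computeStatsInWorker(blocks: unknown[]): Promise<unknown> {
  return new Promise((resolve, reject) => {
    const worker = new Worker(path.join(__dirname, 'stats-worker.js'), {
      workerData: blocks,
    });
    worker.once('message', resolve);
    worker.once('error', reject);
    worker.once('exit', (code) => {
      if (code !== 0) reject(new Error(`worker exited with code ${code}`));
    });
  });
}
```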
Make the repo self-contained...it should be able to install Docker, clone ethereum-etl, and build the image, all from npm install or npm start...same for the Ethereum node; especially make sure that it can restart correctly
Consider downgrading to t3.small
This gives us 2 cores and 2 GB of memory, which is probably the lowest we should go for the node server...I want at least 2 cores, and below 2 GB of memory just seems really low
The big question is the geth server, not sure how small that one can be
We might even be able to do a t3.micro, which still has 2 vCPUs but only 1 GB of RAM. We'll have to decide for the DB, the node, and GraphQL; the GraphQL server itself can probably be pretty small. In fact, it might even do well as a Lambda function. That would be nice: we could then get rid of the load balancer as well, and the Lambda functions should scale pretty niftily...hmmm...that could work rather nicely, we'll see
We do need to continuously update the Postgres server, though, so Lambdas might not work super well for that. But there are scheduled tasks
Consider Unlock Protocol for payments...though building my own might be much better
Only include the columns in the SQL SELECT that are actually requested in the GraphQL selection set; see the sketch below
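A sketch of mapping the selection set to columns, assuming field names map one-to-one to column names and whitelisting so nothing user-controlled reaches the SQL string:

```ts
import { GraphQLResolveInfo, FieldNode } from 'graphql';

// Sketch: derive the SQL column list from the GraphQL selection set.
function selectedColumns(
  info: GraphQLResolveInfo,
  allowed: Set<string>
): string[] {
  const selections = info.fieldNodes[0].selectionSet?.selections ?? [];
  return selections
    .filter((s): s is FieldNode => s.kind === 'Field')
    .map((s) => s.name.value)
    .filter((name) => allowed.has(name));
}

// In a resolver:
//   const cols = selectedColumns(info, new Set(['number', 'timestamp', 'hash']));
//   const sql = `SELECT ${cols.join(', ')} FROM blocks LIMIT $1`;
```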
We have to speed up importing from geth somehow...instead of executing individual update-many calls, perhaps build up a batch of them and send them all at once
Perhaps individual create statements would work better
We also might want to get rid of unnecessary indices that might be slowing writes down a lot. A batched-insert sketch is below
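A sketch of the batched version with node-postgres; the table and column names are assumptions based on what we import:

```ts
import { Pool } from 'pg';

const pool = new Pool();

// Sketch: one multi-row INSERT instead of one statement per block.
async function insertBlocks(
  blocks: { number: number; hash: string; timestamp: number }[]
): Promise<void> {
  if (blocks.length === 0) return;
  const values: unknown[] = [];
  const rows = blocks.map((b, i) => {
    values.push(b.number, b.hash, b.timestamp);
    const o = i * 3;
    return `($${o + 1}, $${o + 2}, $${o + 3})`;
  });
  await pool.query(
    `INSERT INTO blocks (number, hash, timestamp)
     VALUES ${rows.join(', ')}
     ON CONFLICT (number) DO NOTHING`,
    values
  );
}
```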
If a request gets cancelled on the client, it would be nice to cancel it on the server. For example, I asked for all blocks by accident and cancelled on the client, but I believe it was too late; the server is frozen now. A sketch of detecting the disconnect is below
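A minimal sketch, assuming the Express request object is reachable from the resolver context, so a long scan can poll whether the client has gone away:

```ts
import type { Request } from 'express';

// Expose a flag the long-running scan can poll.
function watchDisconnect(req: Request): () => boolean {
  let closed = false;
  req.once('close', () => {
    closed = true;
  });
  return () => closed;
}

// Inside the cursor loop from the sketch above:
//   const gone = watchDisconnect(req);
//   ...
//   if (gone()) { await cursor.close(); break; }
```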
I should put a cap on the memory that a Node process is allowed to use; I think I just ran into the problem of a Node process using all of the available memory
Put the memory max in an environment variable so that it is easy to change when we move EC2 instances around, for example:
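For instance, via the pm2 config we already rely on for restarts (NODE_MAX_OLD_SPACE_MB is a made-up variable name; script path and default are assumptions):

```js
// ecosystem.config.js — sketch only
module.exports = {
  apps: [
    {
      name: 'graphql-server',
      script: './dist/server.js',
      // Cap V8's old space so one process can't eat the whole instance
      node_args: `--max-old-space-size=${process.env.NODE_MAX_OLD_SPACE_MB || 1536}`,
      // Have pm2 restart the process before it hits the cap
      max_memory_restart: `${process.env.NODE_MAX_OLD_SPACE_MB || 1536}M`,
    },
  ],
};
```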
Some really good robustness testing:
- [ ] Make a request for all blocks
- [ ] Make sure other requests can still come through, and that the process that asked for all of the blocks is eventually killed and cleaned up
If one person asks for too much in a query, it will kill everyone else...either we need to deal with this or use AWS Lambda
I think Lambda is the way to go.
Serverless framework
Set up scheduled events, maybe once per minute, to sync geth into Postgres; a config sketch follows
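A sketch of what that could look like with the Serverless Framework's TypeScript config (service name, handler paths, and runtime are placeholders):

```ts
// serverless.ts — sketch only
import type { AWS } from '@serverless/typescript';

const config: AWS = {
  service: 'query-ethereum',
  provider: { name: 'aws', runtime: 'nodejs14.x' },
  functions: {
    graphql: {
      handler: 'src/handler.graphql',
      events: [{ http: { path: 'graphql', method: 'post' } }],
    },
    syncGethToPostgres: {
      handler: 'src/handler.sync',
      events: [{ schedule: 'rate(1 minute)' }], // the once-per-minute sync
    },
  },
};

module.exports = config;
```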
Biggest concerns: how long is the spin-up time? How will we do subscriptions? We don't have subscriptions right now, but if we ever need them we'll probably have to set up a dedicated long-running server...I also don't know if Lambda can multithread, in case we ever need that for calculations
I think we need to do this to scale
This query doesn't work:

```graphql
query {
  latestBlock: blocks(last: 1) {
    items {
      number
      timestamp
    }
  }
  latestStats: blocks(last: 10000) {
    stats {
      transactionCount {
        total
        average {
          perSecond
        }
      }
    }
  }
}
```
- [ ] Optimize the groupings if needed
- [ ] The bigger groupings over longer periods of time are not optimal, I think because of the array spreads I'm doing; the arrays are large, and the spread happens on every single block (see the sketch below)
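The suspected hot spot, sketched (Block and the grouping key are simplified stand-ins for the real shapes):

```ts
interface Block {
  number: number;
  timestamp: number;
}

// O(n^2): spreading copies the whole accumulated array on every block
//   groups[key] = [...(groups[key] ?? []), block];

// O(n): mutate the bucket in place instead
function addToGroup(
  groups: Map<string, Block[]>,
  key: string,
  block: Block
): void {
  const bucket = groups.get(key);
  if (bucket) {
    bucket.push(block);
  } else {
    groups.set(key, [block]);
  }
}
```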
I believe geth ran for 1-2 weeks without problems, but it crashed on February 16th and I'm not sure why. A quick reboot should get it up and running again. It also runs consistently about 20-30 minutes behind, and I'm not quite sure why; I hope it's not just because of the specs of the machine
In some of my feedback, non-developers have expressed concern that it is still too much for them to use the GraphQL API directly. What if there were a project that took a GraphQL schema and generated a UI, with dropdowns and everything? This might solve that problem rather easily
We should probably get on this sooner rather than later...I would like to be on Lambda, with transactions and logs up to date and syncing before jumping into conferences though
I reached out to maybe 10-15 people directly, I've posted on Twitter, and on Reddit. About 5 people have responded directly to me, saying generally that the idea is a good one and looks useful. They were all non-developers, and all expressed the same concern: this looks like it's for developers. Most didn't seem to think it would apply to them as non-developers. One person did think it would apply to him, but mentioned that more documentation or something needs to be done to help non-developers.
Some tech-minded people replied to Tweets. They asked how this is different from The Graph, and two people have asked for direct use-cases (DSR deposits and what you would have earned on DeFi protocols if you had invested early). Tweets get a handful of likes when I post updates.
With all of this feedback, I'd say we have passed round 1 of validation. This project looks useful/valuable, and I should continue building it out. The next steps seem to me to be adding transactions and events, and building out the capabilities for the use cases that have been requested so far. I should also add analytics to the website, and tutorials or documentation of some kind. A landing page, perhaps. I also would like to move to Lambda for scalability, and increase the power of the geth node if necessary. I would like to make sure that it is up to date, not 30 minutes behind. Also, I would like to add pricing data. Chainlink seems like a great solution for roughly the past year of data (it launched in June 2019). More historical data will have to be obtained elsewhere, though the main APIs seem to have bad terms.
Round 2 of validation will commence during the implementation of these next features.
It might be nice to have some custom queries for use-case-specific stuff. For example, providing functionality similar to what ETH Gas Station does, just as a custom query. Perhaps this library could evolve into something like web3.js or ethers.js, just entirely exposed through GraphQL. There could be queries that are executed locally and remotely: local queries would do things that can only be done on the client, like signing transactions and generating keys, while remote queries would do things like check gas prices and submit transactions...hmmm...this would all be really cool. A rough schema sketch is below
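A very rough sketch of how local and remote operations could sit in one schema (every name here is hypothetical):

```ts
// Sketch only: a partial SDL mixing client-resolved and server-resolved fields.
const typeDefs = /* GraphQL */ `
  type Query {
    gasPrice: String!        # remote: read from the server's node
    generateKeyPair: String! # local: keys never leave the client
  }

  type Mutation {
    signTransaction(tx: String!): String!       # local: client-held key
    submitTransaction(signed: String!): String! # remote: broadcast via the node
  }
`;
```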
It might be useful to create an Ethereum network statistics dashboard to show what the capabilities of Query Ethereum are...try to differentiate it from anything out there. I'm not sure exactly what to build, but showing gas used, gas limit, network utilization, price, gas per second...I think it would be interesting. All of the stuff I want to see constantly, which is all of the indicators of demand for ETH: gas used, gas price, ETH transfers, ERC-20 transfers, all of that
I'm hoping I can get price data from Chainlink. I believe this will work very well for all current blocks and moving forward, but obviously I won't be able to get price data from the beginning of the chain. A feed-reading sketch is below
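A sketch of reading a feed with ethers.js; the proxy address and ABI fragment are my assumptions and should be checked against Chainlink's docs before relying on them:

```ts
import { ethers } from 'ethers';

// Assumed ETH/USD aggregator proxy on mainnet — verify before use
const FEED = '0x5f4eC3Df9cbd43714FE2740f5E3616155c5b8419';
const ABI = [
  'function latestAnswer() view returns (int256)',
  'function decimals() view returns (uint8)',
];

async function ethUsd(provider: ethers.providers.Provider): Promise<number> {
  const feed = new ethers.Contract(FEED, ABI, provider);
  const answer = await feed.latestAnswer(); // BigNumber
  const decimals: number = await feed.decimals();
  return answer.toNumber() / 10 ** decimals;
}
```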