
Metrics collection (opera-sds-int, open)

riverma commented on June 3, 2024
Metrics collection

from opera-sds-int.

Comments (4)

riverma commented on June 3, 2024

@hhlee445 @chrisjrd @maseca - FYI and to provide guidance to @philipjyoon.


niarenaw commented on June 3, 2024

A lot of these metrics would be directly impacted by the number of autoscaling fleets and their maximum sizes. Do we want to standardize these?


philipjyoon commented on June 3, 2024

Some ideas from @niarenaw

  • Accumulated size (in bytes) of a given AWS S3 bucket over a given time frequency (down to minutes)
    • can do this programmatically by running the following command before and after the test and taking the difference: aws s3 ls s3://$BUCKET --recursive --summarize --human-readable
    • can also compute s3 size on the aws s3 console
  • Throughput (in bytes/sec) of a given AWS S3 bucket over a given time frequency (down to minutes)
    • can derive from previous metric and total length of load test
    • better granularity with Metrics tab on aws s3 console
  • Elasticsearch statistics (num docs, query time, etc.) for a given index over a given time frequency (down to minutes)
    • can use elasticsearch sdk or use the web ui to generate queries and filter by time range
    • I’m pretty horrible at the elasticsearch DSL syntax, but might be time I learn it properly
  • PCM queue sizes (QUEUED / PENDING jobs especially) over a given time frequency (down to minutes)
    • probably easiest to get these using Figaro and Lucene queries (ex. “job_queue:<> AND timestamp:<>” for each queue)
    • can make these programmatic by querying ES directly instead
  • AWS EC2 spot errors (insufficient capacity/terminations)
    • using the AWS CloudTrail console, can search for BidEvictedEvent events in a given time range
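
A minimal sketch of the before/after S3 measurement and the derived throughput described in the list above. It assumes the `aws` CLI is installed and configured, and it drops `--human-readable` so that the `Total Size:` summary line is reported in raw bytes and can be subtracted. The function names are hypothetical and only for illustration.

```python
import re
import subprocess


def parse_total_size(ls_output: str) -> int:
    """Extract the byte count from the 'Total Size:' line that
    `aws s3 ls --recursive --summarize` prints at the end of its output."""
    match = re.search(r"^\s*Total Size:\s*(\d+)\s*$", ls_output, re.MULTILINE)
    if match is None:
        raise ValueError("no raw-byte 'Total Size:' line found "
                         "(was --human-readable used?)")
    return int(match.group(1))


def bucket_size_bytes(bucket: str) -> int:
    """Run the listing command for one bucket and return its total size.
    Note: this lists every object, so it can be slow on large buckets."""
    result = subprocess.run(
        ["aws", "s3", "ls", f"s3://{bucket}", "--recursive", "--summarize"],
        check=True, capture_output=True, text=True,
    )
    return parse_total_size(result.stdout)


def throughput_bytes_per_sec(before: int, after: int,
                             elapsed_seconds: float) -> float:
    """Average throughput over the test: bytes accumulated / test duration."""
    if elapsed_seconds <= 0:
        raise ValueError("elapsed_seconds must be positive")
    return (after - before) / elapsed_seconds
```

Calling `bucket_size_bytes` once before and once after the load test, then passing both values and the test duration to `throughput_bytes_per_sec`, gives a single average figure for the whole run; finer granularity would need repeated sampling during the test.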


philipjyoon commented on June 3, 2024

Thoughts on S3 size: What Nick has found seems to be the only way we can get near-real-time, high-frequency metrics on S3 bucket size. However, it can get very slow, as well as costly, for large buckets. I think something like 0.005 cents per object query?

I did find an alternative using CloudWatch, but it only works at daily frequency, so it is not very useful to us:

aws --profile saml-pub cloudwatch get-metric-statistics --namespace AWS/S3 --start-time 2022-06-08T23:22:00 --end-time 2022-06-08T23:59:00 --period 86400 --statistics Average --metric-name BucketSizeBytes --dimensions Name=BucketName,Value=opera-dev-isl-fwd-pyoon Name=StorageType,Value=StandardStorage
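
A small sketch of how the command above could be assembled for a given day. Because `BucketSizeBytes` is a daily storage metric, this sketch assumes the query window should cover the whole calendar day so that it contains that day's datapoint; a window shorter than a day may return nothing. The helper name is hypothetical, and the CLI flags are the same ones used in the command above.

```python
from datetime import datetime, timedelta
from typing import List, Optional


def bucket_size_metric_command(bucket: str, day: datetime,
                               profile: Optional[str] = None) -> List[str]:
    """Build the `aws cloudwatch get-metric-statistics` argument list for
    one full calendar day (midnight to midnight) of BucketSizeBytes."""
    start = day.replace(hour=0, minute=0, second=0, microsecond=0)
    end = start + timedelta(days=1)
    fmt = "%Y-%m-%dT%H:%M:%S"

    cmd = ["aws"]
    if profile:
        cmd += ["--profile", profile]
    cmd += [
        "cloudwatch", "get-metric-statistics",
        "--namespace", "AWS/S3",
        "--metric-name", "BucketSizeBytes",
        "--start-time", start.strftime(fmt),
        "--end-time", end.strftime(fmt),
        "--period", "86400",  # one datapoint per day
        "--statistics", "Average",
        "--dimensions",
        f"Name=BucketName,Value={bucket}",
        "Name=StorageType,Value=StandardStorage",
    ]
    return cmd
```

The returned list can be passed directly to `subprocess.run`, which avoids shell quoting issues with the `--dimensions` values.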

Perhaps there are other metrics we can measure instead that would give us the same or similar insight into what's happening in the PCM and where the bottlenecks lie. If we are looking to see whether the ingest workers are lagging behind the download workers (this is what high-frequency ISL S3 accumulated size would tell us), we could just measure the length of the queue that the ingest workers consume. I don't know if these queue entries would have file sizes in them; however, at least for HSLS and HSLL data, file sizes seem to be quite uniform.
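
One way the queue length could be sampled programmatically is by sending a count query to Elasticsearch. The sketch below only builds the request body. The field names `job_queue` and `timestamp` are taken from the Lucene example earlier in this thread; whether they match the actual index mapping (and whether `job_queue` is a keyword field, which the `term` filter assumes) is an assumption, and the function name is hypothetical.

```python
import json


def queue_count_body(job_queue: str, start: str, end: str) -> dict:
    """Build an Elasticsearch query body that matches jobs on one queue
    within a time range. Intended for the _count (or _search) endpoint."""
    return {
        "query": {
            "bool": {
                "filter": [
                    # Exact match on the queue name (assumes a keyword field).
                    {"term": {"job_queue": job_queue}},
                    # Inclusive time window, e.g. ISO 8601 strings.
                    {"range": {"timestamp": {"gte": start, "lte": end}}},
                ]
            }
        }
    }


if __name__ == "__main__":
    # Placeholder queue name and window; print the JSON to paste into a request.
    body = queue_count_body("<queue-name>",
                            "2022-06-08T23:00:00", "2022-06-08T23:59:00")
    print(json.dumps(body, indent=2))
```

Running this once per minute for each queue of interest would give a time series of queue depth, which could stand in for the high-frequency S3 size measurement.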

