
docs's People

Contributors

ajaytripathy, biancaburtoiu, bstuder99, chipzoller, dwbrown2, elenalape, garreeoke, ivancherepov, ivankube, jessegoodier, kaelanspatel, kbrwn, kc-adawson, keithhand, kirbsauce, kwombach12, linhlam-kc, mattray, mbolt35, michaelmdresser, mmurph3, nealormsbee, nickcurie, nikovacevic, praveenkumar0566, rossfisherkc, sean-holcomb, srpomeroy, teevans, thomasvn


docs's Issues

Some Athena queries failing with Permission Denied error

I've followed the docs to set up AWS cost reporting. I'm running the latest cost-analyzer Helm chart (1.85.3).

I can see in Athena, in my master billing account, that some of the queries are completing, but others are failing with:

Your query has the following errors:[ErrorCategory:USER_ERROR, ErrorCode:PERMISSION_ERROR], Detail:Permission denied on S3 path: s3://kubecost-reports/kubecost_/kubecost_intergration/kubecost_intergration/year=2021/month=8/kubecost_intergration-00002.snappy.parquet, Message:Amazon Athena experienced a permission error. Please provide proper permission and submitting the query again. If the issue reoccurs, contact AWS support for further assistance. You will not be charged for this query. We apologize for the inconvenience.

As far as I understand it, this is Kubecost (via Athena) trying to read from that parquet file when executing a query? It has the following permissions:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "AthenaAccess",
            "Effect": "Allow",
            "Action": [
                "athena:*"
            ],
            "Resource": [
                "*"
            ]
        },
        {
            "Sid": "ReadAccessToAthenaCurDataViaGlue",
            "Effect": "Allow",
            "Action": [
                "glue:GetDatabase*",
                "glue:GetTable*",
                "glue:GetPartition*",
                "glue:GetUserDefinedFunction",
                "glue:BatchGetPartition"
            ],
            "Resource": [
                "arn:aws:glue:*:*:catalog",
                "arn:aws:glue:*:*:database/athenacurcfn*",
                "arn:aws:glue:*:*:table/athenacurcfn*/*"
            ]
        },
        {
            "Sid": "AthenaQueryResultsOutput",
            "Effect": "Allow",
            "Action": [
                "s3:GetBucketLocation",
                "s3:GetObject",
                "s3:ListBucket",
                "s3:ListBucketMultipartUploads",
                "s3:ListMultipartUploadParts",
                "s3:AbortMultipartUpload",
                "s3:CreateBucket",
                "s3:PutObject"
            ],
            "Resource": [
                "arn:aws:s3:::aws-athena-query-results-*"
            ]
        },
        {
            "Sid": "S3ReadAccessToAwsBillingData",
            "Effect": "Allow",
            "Action": [
                "s3:Get*",
                "s3:List*"
            ],
            "Resource": [
                "arn:aws:s3:::kubecost-reports"
            ]
        }
    ]
}

Any clues on how to investigate further? I'm sure something must be misaligned, but having gone through it all I'm struggling to find the mismatch.
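One possibility worth checking (an assumption on my part, not something confirmed above): s3:GetObject is an object-level action, so the S3ReadAccessToAwsBillingData statement needs the object ARN (the bucket ARN plus /*) in addition to the bucket ARN. With only the bucket ARN listed, ListBucket succeeds but reads of the parquet objects are denied, which matches the error above. A sketch of the adjusted statement:

{
    "Sid": "S3ReadAccessToAwsBillingData",
    "Effect": "Allow",
    "Action": [
        "s3:Get*",
        "s3:List*"
    ],
    "Resource": [
        "arn:aws:s3:::kubecost-reports",
        "arn:aws:s3:::kubecost-reports/*"
    ]
}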

gz#768

Clarification on spot data feed configuration

Hi there, I'm trying to get the spot instance data feed working for a deployment of Kubecost, and I've run into trouble getting it to work fully; part of my confusion lies in how the docs are formatted. For context, this is on a 1.16.0 cluster in AWS, and we are using kiam to grant access rather than access keys and secrets. The IAM role granted to Kubecost is overly permissive in terms of S3, so I don't believe that to be the problem.

Is the Prometheus metric node_total_hourly_cost a trustworthy sign that the spot feed is working? I did notice the metric is available before configuring the spot feed, but after configuring the spot data feed, the metric dropped significantly, which makes me think things might be working.

There's also the issue that I don't see the following logs at all:

I1104 00:21:02.905327       1 awsprovider.go:1397] Found spot info {Timestamp:2019-11-03 20:35:48 UTC UsageType:USE2-SpotUsage:t2.micro Operation:RunInstances:SV050 InstanceID:i-05487b228492b1a54 MyBidID:sir-9s4rvgbj MyMaxPrice:0.010 USD MarketPrice:0.004 USD Charge:0.004 USD Version:1}
I1104 00:21:02.922372       1 awsprovider.go:1376] Spot feed version is "#Version: 1.0"

However I do see these logs:

I0518 17:40:32.750949       1 awsprovider.go:1804] Found 0 spot data files from yesterday
I0518 17:40:32.765787       1 awsprovider.go:1813] Found 2 spot data files from today

After crawling through the code a bit, it appears the logs about finding spot data files confirm that Kubecost is able to read from the S3 bucket. The numbers here also match the files in the bucket itself. Shortly after the above logs, I also see:

2020/05/19 03:19:43 error getting addresses: EnvAccessKeyNotFound: AWS_ACCESS_KEY_ID or AWS_ACCESS_KEY not found in environment
2020/05/19 03:19:43 error getting disks: EnvAccessKeyNotFound: AWS_ACCESS_KEY_ID or AWS_ACCESS_KEY not found in environment

I believe these are generated by the AWS SDK. So, while kiam and the associated role seem to be working, something else appears to be trying to use environment variables that don't exist.

With that in mind, is there something missing in the docs, or something I'm missing in the docs, to get the spot data feed working?
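For reference, a minimal sketch of the Helm values that point Kubecost at the spot data feed (key names assumed from kubecostProductConfigs in the cost-analyzer chart and worth verifying against your chart version; bucket, region, prefix, and account ID below are placeholders):

kubecostProductConfigs:
  projectID: "123456789012"            # AWS account ID that owns the feed bucket (placeholder)
  awsSpotDataBucket: my-spot-data-feed # bucket the spot data feed writes to (placeholder)
  awsSpotDataRegion: us-east-2         # region of that bucket (placeholder)
  awsSpotDataPrefix: spot-feed         # prefix configured on the data feed, if any (placeholder)

If access comes from kiam rather than keys, no access-key values should be needed; the pod's role just needs read access to that bucket.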

Metrics issue with Prometheus preventing Kubecost from working

Hi there.

It is not working for me. I installed using Helm 3 on a 3-node cluster with K8s version 1.18.4.

When I access the localhost link, it shows the below error at the end of the page:
Unable to establish a connection with Prometheus at http://localhost:9090/api

The deployment kubecost-prometheus-server has the issue. Its pod is not running; it shows a persistent volume claim error.


Because of the above, even the Grafana dashboard does not show the metrics.

Please let me know how we can make it work. Thanks.
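Not an official fix, but a sketch of a common workaround when no StorageClass can satisfy the claim (keys assumed from the bundled Prometheus chart; persistence can also be disabled while testing, at the cost of losing metrics on restart):

prometheus:
  server:
    persistentVolume:
      enabled: true
      storageClass: managed-premium   # any StorageClass that exists in your cluster (placeholder)
      # or, to rule storage out entirely while testing:
      # enabled: false

Describing the pending PVC should also show which StorageClass it is asking for and why provisioning fails.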

Document queries run by ETL tool

It is currently necessary to reference source code in order to determine the queries run by the ETL tool.

A document containing these queries for various platforms may be valuable to users.

It can be referenced as a footnote in the existing ETL docs.

Clarify requestSizing API targetCPUUtilization parameter

In this doc:

https://github.com/kubecost/docs/blob/3a61e569e60ec30671240efb7970a17959e4afaa/api-request-right-sizing.md

It says that targetCPUUtilization should be between 0 and 1.
It does not say whether this range is exclusive (0, 1) or inclusive [0, 1].
It actually behaves in an odd way:

  • 0 returns 500: error: RecommmendRequestSizesMaxHeadroom: expected target CPU utilization to be > 0, was 0.000000
  • -1 returns a response
  • 2 returns a response
  • it is not clear which number notations/syntaxes are allowed. Is 1e-2 allowed?
  • 0e returns 200. 1abc returns 200 too. What does that mean?

I would also like to know which value is used for this parameter if it is not sent.

gz#1207

Wrong cost computation for GPU node

Hello,
I have a single Azure Standard_NC6 instance in one of my node pools; it's showing a daily usage cost of over $900. That's more than the monthly cost for that instance.

AWS Spot Instance Not Showing

Hi there, about a week ago I configured the AWS integration and permissions. I was able to view which EKS nodes were spot instances and their different hourly rates, but recently I cannot view these anymore. Could someone look into this?

gz#2146

(related to Zendesk ticket #2146)

Upgrade from 1.66.0 to 1.67.0 is having issues

Hello, I am trying to upgrade kubecost from 1.66.0 to 1.67.0 in a cluster where Kubernetes version is 1.13.

As mentioned in the document https://github.com/kubecost/docs/blob/master/thanos-upgrade.md, I have made the changes for the secret and deployed them. After that, I upgraded to version 1.67.0 using Helm.
Thanos is getting upgraded to version 0.15.0.

But during the upgrade, the cost-analyzer pod is in a pending state. Below is the message when describing the pod. Could you please help with this issue?

Type     Reason             Age                From                 Message
Normal   NotTriggerScaleUp  74s                cluster-autoscaler   pod didn't trigger scale-up (it wouldn't fit if a new node is added): 2 Insufficient cpu, 6 node(s) had taints that the pod didn't tolerate, 8 node(s) had no available volume zone
Normal   NotTriggerScaleUp  64s                cluster-autoscaler   pod didn't trigger scale-up (it wouldn't fit if a new node is added): 8 node(s) had no available volume zone, 2 Insufficient cpu, 6 node(s) had taints that the pod didn't tolerate
Normal   NotTriggerScaleUp  54s                cluster-autoscaler   pod didn't trigger scale-up (it wouldn't fit if a new node is added): 2 Insufficient cpu, 8 node(s) had no available volume zone, 6 node(s) had taints that the pod didn't tolerate
Warning  FailedScheduling   10s (x4 over 77s)  default-scheduler    persistentvolumeclaim "costing-cost-analyzer-db" not found
Normal   NotTriggerScaleUp  3s (x5 over 44s)   cluster-autoscaler   pod didn't trigger scale-up (it wouldn't fit if a new node is added): 8 node(s) had no available volume zone, 6 node(s) had taints that the pod didn't tolerate, 2 Insufficient cpu

While upgrading, it creates a new PVC, and as the old PVC is also there, I assume it is confused about which PV/PVC to use.

NAME                       STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS      AGE
costing-cost-analyzer      Bound    pvc-64ef5f3d-cb78-11ea-b62b-0a6a682c3ebc   1Gi        RWO            ebs-general-gp2   90d
costing-cost-analyzer-db   Bound    pvc-0a814698-12e5-11eb-b072-025920e6cffc   32Gi       RWO            ebs-general-gp2   12m

How to Set up CUR integration in AWS Gov Cloud

AWS S3 CUR Integration.pdf
Need documentation around this process

IAM Roles Anywhere is NOT supported for non-AWS workloads: https://docs.aws.amazon.com/IAM/latest/UserGuide/id_roles_common-scenarios_non-aws.html

The CUR would need to be created in the commercial account and transferred over to an S3 bucket in GovCloud so Kubecost can pull the data correctly.

This can be achieved in various ways: the AWS CLI, the GovCloud import tool, s3 sync, DataSync, Lambda, etc.

The high-level concept is to pull the data from the S3 bucket down to an intermediary and then upload it to an S3 bucket in AWS GovCloud.

Public sector customers want clearer direction on how to achieve this and how Kubecost would be set up to run in GovCloud. One approach for setting up Kubecost in GovCloud is the S3 CUR integration (attached) after the CUR is transferred from commercial to GovCloud.

s3 sync
https://docs.aws.amazon.com/cli/latest/reference/s3/sync.html

datasync
https://docs.aws.amazon.com/managedservices/latest/userguide/data-sync.html

govcloud import tool
https://aws.amazon.com/blogs/publicsector/gov-cloud-import-tool-how-to-transfer-information-between-identity-boundaries/

Question about the request right-sizing savings calculation

Refer to: https://github.com/kubecost/docs/blob/main/api-request-right-sizing.md
There is a section with the following text:
"Within the query window, the pod could haved saved: 2 cores * (15min / (60 min/hr)) = 0.5 core-hours 0.67 core-hours * $7/core-hour = $3.50"
I'm confused: should "0.5 core-hours 0.67 core-hours * $7/core-hour" be "0.5 core-hours * $7/core-hour", giving the result "= $3.50"?
Was it a typo?
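For what it's worth, redoing the arithmetic from the quoted passage supports the typo theory; the stray 0.67 doesn't fit either step:

$2\ \text{cores} \times (15\ \text{min} / (60\ \text{min/hr})) = 0.5\ \text{core-hours}$
$0.5\ \text{core-hours} \times \$7/\text{core-hour} = \$3.50$

(0.67 core-hours × $7/core-hour would be $4.69, not $3.50.)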

gz#2045

(related to Zendesk ticket #2045)

Document all sliders and options on the new allocation page clearly

Essentially we want every slider on our core cost allocation page clearly documented. I'm envisioning detailed screenshots and samples. This would include, for example:

  1. What are services? https://github.com/kubecost/cost-analyzer-frontend/issues/1179#issuecomment-1284250065
  2. What is idle?
  3. How do I make queries for labels?
  4. How do I query data on a per-day basis?
    ...and many more

This task represents a common theme where users don't actually know how the allocation page works and what their various options are.

The order of navigation links should be improved

We should look at re-ordering these links. See examples below.

  • Spot guide is probably not the best starting point after our Welcome guide.

  • Installing Kubecost with Rafay is not the first link we would want to show under Setup.

The navbar is overflowing

On the landing page of Kubecost.com, the navigation bar is overflowing along the x-axis on mobile phone browsers.
It is causing really bad UX.

If you want, I could create a PR for this.

gz#1960

(related to Zendesk ticket #1960)

Update terminology and clarify AWS cloud integration doc

"Master Payer" is now "Management Account" according to AWS: https://docs.aws.amazon.com/organizations/latest/userguide/orgs_getting-started_concepts.html

Member accounts have access to the CUR if the management account allows it, and they do not need management account IAM permissions to access or enable it. This setup is not well reflected in the "Setting up IAM permissions" section:

My kubernetes clusters all run in the same account as the master payer account.
My kubernetes clusters run in different accounts from the master payer account

My Kubernetes clusters run in the same account as the master payer account
My Kubernetes clusters run in different accounts

Maybe these options should reflect the location of the CUR rather than the location of the cluster?

api test

I'm having a problem with this.

Confusion about storage requirements

The storage configuration documentation states the following:

Where ingested samples can be measured as the average over a recent period, e.g. sum(avg_over_time(scrape_samples_post_metric_relabeling[24h])). On average, Prometheus uses around 1.5-2 bytes per sample. So ingesting 100k samples per minute and retaining for 15 days would demand around 40 GB. It's recommended to add another 20-30% capacity for headroom and WAL. More info on disk sizing here.

Is my arithmetic wrong, or does the example actually suggest 4GB is needed rather than 40GB?

2 bytes per sample * 100k samples per minute * 15 days (21,600 minutes) = 4,320,000,000 bytes = 4.32 GB
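For reference, plugging the quoted numbers into the usual Prometheus sizing relation (retention × ingestion rate × bytes per sample) gives the same figure:

$21{,}600\ \text{min} \times 100{,}000\ \text{samples/min} \times 2\ \text{bytes/sample} = 4.32 \times 10^{9}\ \text{bytes} \approx 4.3\ \text{GB}$

Even with the recommended 20-30% headroom, that is nowhere near 40 GB.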

gz#2123

(related to Zendesk ticket #2123)

Add instructions for setting up the Athena Database and Table

The AWS Cloud Integration doc explains that you have to supply the Athena database and table, and says these were created in "Step 2: Setting up the CUR". However, setting up the CUR is actually Step 1, and that step never details how to set up the database and table. The following "Step 2: Setting up Athena" also doesn't detail how to create this database and table.

* `athenaDatabase` the name of the database created by the CUR setup
* The athena database name is available as the value (physical id) of `AWSCURDatabase` in the CloudFormation stack created above (in [Step 2: Setting up the CUR](#Step-2:-Setting-up-Athena))
* `athenaTable` the name of the table created by the CUR setup
* The table name is typically the database name with the leading `athenacurcfn_` removed (but is not available as a CloudFormation stack resource)

I'm not sure if these databases and tables are supposed to be automatically created, or whether I need to create them manually, and which settings to set them up with.

Could this be clarified in the documentation?
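For context, these are the values that ultimately get supplied to the chart (a sketch only; key names are assumed from kubecostProductConfigs in the cost-analyzer values, and everything below is a placeholder):

kubecostProductConfigs:
  athenaProjectID: "123456789012"             # AWS account ID that owns the CUR
  athenaBucketName: s3://aws-athena-query-results-123456789012-us-east-1
  athenaRegion: us-east-1
  athenaDatabase: athenacurcfn_my_cur_report  # Glue database created by the CUR CloudFormation stack
  athenaTable: my_cur_report                  # Glue table inside that database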

The current documentation website has limited discoverability/support for hierarchical documentation

Ref: opencost/opencost#781

This might need split into two issues.

The initial question I had was: for documentation (such as how to integrate Kubecost's alerting with MS Teams), where in the documentation hierarchy should it go? I think at the moment it would be in the alerting.md file.

This has led to a larger question on my side: is the architect theme being used for the documentation suitable for a growing documentation base?

Looking at the docs as they stand, they seem to be hard to discover/search from the documentation website. I think this is due in part to the flat structure of the architect theme. Compared to themes more aimed at documentation (Docsy, Doks, etc.), architect doesn't natively support a hierarchical navigation approach and instead relies on a simple sidebar plus links within existing pages. Would there be any interest in trying a different structure for the documentation to aid discoverability?

gz#1306

Document federated AWS configuration as a simple diagram or flow chart

In a recent customer discussion, they shared that they were using individual/sub AWS accounts with their cloud-integration.json to allow their primary Kubecost installation access to the subaccounts' CUR data. It appeared that savings plan data was only available through the master payer account, which they were not using, so it was not applied.

The federated AWS configuration and documentation are confusing; a simple diagram or flowchart would be useful for this very common deployment. Explaining how to support various AWS account configurations (individual accounts, sub accounts, etc.) would be very helpful.

A simple explainer like https://github.com/kubecost/poc-common-configurations/blob/main/aws/README-enterprise.md would also be appreciated.

Add notes around AWS S3 Lifecycle Management best practices for cost data

For the AWS Spot Data Feed and the Athena Results Bucket, we should add some notes around best practices for lifecycle management.

For Athena query results, we use them immediately and don't look at previous results. So long as they are going to a dedicated S3 bucket, lifecycle policies can be used to expire and delete objects after 1 day.

For the Spot data feed, we only use them when pulling current pricing information. So long as we are the only consumer of the spot data feed files, lifecycle policies can be used to expire and delete objects after a few days.

Lifecycle policies could also be used on the S3 bucket where the CUR is stored. But that would need to be defined based on the customer's requirements for historical cost data.

It might be worth setting expiration to something like 7 days in case troubleshooting is needed.

Using lifecycle policies to remove objects we no longer need will limit wasteful spending in customer accounts.
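As a starting point, a sketch of such a lifecycle configuration (the kind applied with aws s3api put-bucket-lifecycle-configuration); the 7-day expiration is only an example, and retention on the CUR bucket itself should follow the customer's historical-data requirements:

{
    "Rules": [
        {
            "ID": "expire-kubecost-query-results",
            "Status": "Enabled",
            "Filter": { "Prefix": "" },
            "Expiration": { "Days": 7 },
            "AbortIncompleteMultipartUpload": { "DaysAfterInitiation": 7 }
        }
    ]
}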

New docs are failing to publish

We are unable to publish new docs to guide.kubecost.com right now due to expired credentials. Docs that already exist are still receiving updates.

See e.g. https://github.com/kubecost/docs/actions/runs/3113402788

Technical details of the issue explained here: https://github.com/kubecost/zd-docs-uploader/issues/7. Raising an issue here because this repo has more visibility.

@bstuder99 we were talking about this earlier.

@Adam-Stack-PM FYI -- we probably need to put eng effort here to find a solution and implement it. There are a few different ways to fix, some quicker than others, but I don't have time to walk through them and pick one.

External costs - AWS master-payer pattern clarifications

I've got a master-payer account structure, with subaccounts for e.g. Dev, QA, Prod, etc. Each cluster in each subaccount runs its own Kubecost cost-analyzer deployment.

Am I right in thinking that the 'external' costs that get pulled in from AWS will be for all accounts (as per the CUR being exported and queried over)? I can see a lot of things in the QA QA1 cluster's 'Assets' view that I presume are from e.g. Prod.

Is there any way to only see kubernetes_cluster-tagged resources in this view? Or is the idea that the assets view shows everything and you then use switch-cluster to show numbers for a specific one?

gz#770

Where do we need to set the PV storage class?

Hi Team,

Please let me know in which file we need to put this entry: "persistentVolume.dbStorageClass=your-topology-aware-storage-class-name".

Thanks!
Kishor.
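A sketch of where that entry usually lands, assuming persistentVolume is a top-level key of the cost-analyzer chart as the dotted path suggests; pass the file with helm install/upgrade -f values.yaml (or use the equivalent --set flag):

persistentVolume:
  dbStorageClass: your-topology-aware-storage-class-name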

Deployment controllers and service section data not showing up in kubecost dashboard

Hi Team, we have been looking into Kubecost for getting cost at the namespace level in AKS, and it sounds like a good fit for our needs.
Deployment controllers aren't being identified; they get put into a generic section called "no controllers", as shown in the image. Why is this happening?

We are passing a custom values file with the Helm chart kubecost/cost-analyzer, which disables Prometheus and Grafana and adds an ingress resource:

global:
  grafana:
    domainName: grafana-xxx.com
    enabled: false
    proxy: false
    scheme: http
  notifications:
    alertmanager:
      enabled: false
  prometheus:
    enabled: false
    fqdn: http://prometheus-server.monitoring.svc.cluster.local
  thanos:
    enabled: false

ingress:
  enabled: true
  annotations:
    kubernetes.io/ingress.class: nginx
  paths: ["/"] 
  hosts:
    - kubecost.xx.com
  tls: []

prometheus:
  alertmanager:
    enabled: false
    persistentVolume:
      enabled: false
  nodeExporter:
    enabled: true

networkCosts:
  enabled: true

kubecostProductConfigs:
  azureStorageSecretName: kubecost-storage

Chart version: cost-analyzer-1.86.1
Not sure why the deployment controllers and service section doesn't show up?

kubecost-bug-report-1637727299197.txt

Thank You !

gz#1077

Markdown not rendering properly on the docs website

This issue is with respect to the getting started page. If we look at the cost optimization section on that page, it's confusing to me. With too many numbers and prices floating around, it's very easy to get confused and one has to re-read it. Overall, I had to read it three times to get it.

The same document is much easier to read on the GitHub page.


  • I think the problem is arising because the markdown is not rendering properly on the website.

Proposed solution

  • We can add a diagram to make things clearer; with visuals it will be much easier to understand without getting distracted.
  • If not a diagram, then we can add math block support, which GitHub recently launched; with that, the equation will look like this on the GitHub page:

$((0.5/2) * 20 + (0.5/1) * 1) / (20 + 1) = 5.5 / 21 \approx 26\%$

  • I'm not sure how this will look on the main docs website.

gz#2000

(related to Zendesk ticket #2000)

Thanos Receivers with Kubecost

Hi,

We are using Thanos Receivers in our cluster.
We have implemented Kubecost; the documentation uses the Thanos sidecar with Prometheus, but we are using Thanos Receivers to avoid being tightly coupled with Prometheus.

We'd like to configure Kubecost to use our Thanos Receivers, but I couldn't find how in the documentation.
Hope someone can help me.
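Not from the docs, but a sketch of how this is commonly wired up: Thanos Query fans out to Receivers the same way it does to sidecars, so pointing Kubecost at your existing Thanos Query (or Query Frontend) service should work. Key names below are assumed from the cost-analyzer chart's values, and the service name/port are placeholders:

global:
  prometheus:
    enabled: false
    # Thanos Query serves the Prometheus HTTP API, so short-term queries can go here too.
    fqdn: http://thanos-query.thanos.svc.cluster.local:9090
  thanos:
    enabled: true
    # Long-term queries; point at your Thanos Query / Query Frontend endpoint.
    queryService: http://thanos-query.thanos.svc.cluster.local:9090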
