kubecost / docs
Product and technical docs
Home Page: http://kubecost.com
The Thanos component URL should be changed to https://thanos.io/tip/components/.
As titled. Here's the enumeration of routes and what is marked as write. https://github.com/kubecost/kubecost-cost-model/blob/develop/pkg/auth/samlauth.go#L269
I've followed the docs to set up AWS cost reporting. Am running the latest cost-analyzer helm chart (1.85.3)
I can see in Athena in my master billing account that some of the queries are completing, but others are failing with:
Your query has the following errors:[ErrorCategory:USER_ERROR, ErrorCode:PERMISSION_ERROR], Detail:Permission denied on S3 path: s3://kubecost-reports/kubecost_/kubecost_intergration/kubecost_intergration/year=2021/month=8/kubecost_intergration-00002.snappy.parquet, Message:Amazon Athena experienced a permission error. Please provide proper permission and submitting the query again. If the issue reoccurs, contact AWS support for further assistance. You will not be charged for this query. We apologize for the inconvenience.
As far as I understand it, this is kubecost (via athena) trying to read from that parquet file when executing a query? It's got the following permission:
{
"Version": "2012-10-17",
"Statement": [
{
"Sid": "AthenaAccess",
"Effect": "Allow",
"Action": [
"athena:*"
],
"Resource": [
"*"
]
},
{
"Sid": "ReadAccessToAthenaCurDataViaGlue",
"Effect": "Allow",
"Action": [
"glue:GetDatabase*",
"glue:GetTable*",
"glue:GetPartition*",
"glue:GetUserDefinedFunction",
"glue:BatchGetPartition"
],
"Resource": [
"arn:aws:glue:*:*:catalog",
"arn:aws:glue:*:*:database/athenacurcfn*",
"arn:aws:glue:*:*:table/athenacurcfn*/*"
]
},
{
"Sid": "AthenaQueryResultsOutput",
"Effect": "Allow",
"Action": [
"s3:GetBucketLocation",
"s3:GetObject",
"s3:ListBucket",
"s3:ListBucketMultipartUploads",
"s3:ListMultipartUploadParts",
"s3:AbortMultipartUpload",
"s3:CreateBucket",
"s3:PutObject"
],
"Resource": [
"arn:aws:s3:::aws-athena-query-results-*"
]
},
{
"Sid": "S3ReadAccessToAwsBillingData",
"Effect": "Allow",
"Action": [
"s3:Get*",
"s3:List*"
],
"Resource": [
"arn:aws:s3:::kubecost-reports"
]
}
]
}
Any clues on how to investigate further? I'm sure something must be misaligned, but having gone through it all I'm struggling to find the mismatch.
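One thing worth double-checking (an observation, not a confirmed diagnosis): in AWS IAM, object-level actions such as s3:GetObject are evaluated against object ARNs, while the S3ReadAccessToAwsBillingData statement above lists only the bucket ARN. A statement that covers both the bucket and the objects inside it would look like:

```json
{
    "Sid": "S3ReadAccessToAwsBillingData",
    "Effect": "Allow",
    "Action": [
        "s3:Get*",
        "s3:List*"
    ],
    "Resource": [
        "arn:aws:s3:::kubecost-reports",
        "arn:aws:s3:::kubecost-reports/*"
    ]
}
```

Bucket-level actions like s3:ListBucket apply to the bucket ARN itself, whereas s3:GetObject applies to arn:aws:s3:::bucket/* — a bucket-only Resource is a common cause of Athena "Permission denied on S3 path" errors.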
gz#768
Instead of "Redhat OpenShift" it should be "Red Hat OpenShift". I will submit a PR.
I'm pretty sure this header button isn't relevant to our users. If so, can we remove from this page?
https://guide.kubecost.com/hc/en-us/articles/4407595950359-Welcome#getting-started-1
Prometheus is deployed with many jobs. For some of these jobs it is difficult to determine exactly what is being scraped and why, and some appear to overlap. Let's try to document these jobs:
https://github.com/kubecost/cost-analyzer-helm-chart/blob/develop/cost-analyzer/values.yaml#L383
Hi there, I'm trying to get the spot instance data feed working for a deployment of Kubecost, and I've run into trouble getting it to work fully; part of my confusion lies in how the docs are formatted. For context, this is on a 1.16.0 cluster in AWS, and we are using kiam to grant access rather than access keys and secrets. The IAM role granted to Kubecost is overly permissive in terms of S3, so I don't believe that to be the problem.
Is the Prometheus metric node_total_hourly_cost a trustworthy sign that the spot feed is working? I did notice the metric was available before configuring the spot feed, but after configuring the spot data feed the metric dropped significantly, which makes me think things might be working.
There's also the issue that I don't see the following logs at all:
I1104 00:21:02.905327 1 awsprovider.go:1397] Found spot info {Timestamp:2019-11-03 20:35:48 UTC UsageType:USE2-SpotUsage:t2.micro Operation:RunInstances:SV050 InstanceID:i-05487b228492b1a54 MyBidID:sir-9s4rvgbj MyMaxPrice:0.010 USD MarketPrice:0.004 USD Charge:0.004 USD Version:1}
I1104 00:21:02.922372 1 awsprovider.go:1376] Spot feed version is "#Version: 1.0"
However I do see these logs:
I0518 17:40:32.750949 1 awsprovider.go:1804] Found 0 spot data files from yesterday
I0518 17:40:32.765787 1 awsprovider.go:1813] Found 2 spot data files from today
After crawling through the code a bit, it appears the logs about finding spot data files confirm that Kubecost is able to read from the S3 bucket. The numbers here are also accurate to the files in the bucket itself. Shortly after the above logs, I also see:
2020/05/19 03:19:43 error getting addresses: EnvAccessKeyNotFound: AWS_ACCESS_KEY_ID or AWS_ACCESS_KEY not found in environment
2020/05/19 03:19:43 error getting disks: EnvAccessKeyNotFound: AWS_ACCESS_KEY_ID or AWS_ACCESS_KEY not found in environment
Which I believe to be generated by the AWS SDK. So, while kiam and the associated role seem to be working, something else looks to be trying to use environment variables that don't exist.
With that in mind, is there something missing in the docs, or something I'm missing in the docs, to get the spot data feed working?
Hi there.
It is not working for me. I installed using Helm 3 on a 3-node cluster running Kubernetes 1.18.4.
When I access the localhost link, it shows the following error at the end of the page:
Unable to establish a connection with Prometheus at http://localhost:9090/api
The deployment kubecost-prometheus-server has the issue: its pod is not running and shows a persistent volume claim error.
Because of the above, even the Grafana dashboard does not show the metrics.
Please let me know how we can make it work. Thanks.
Kubecost has a tunable sampling resolution to capture short-lived pods more accurately. Let's document where to set this, how much extra memory it needs, etc.
It is currently necessary to reference source code in order to determine the queries run by the ETL tool.
A document containing these queries for various platforms may be valuable to users.
It can be referenced as a footnote in the existing ETL docs.
In the docs it says that targetCPUUtilization should be between 0 and 1, but it does not say whether this range is exclusive (0, 1) or inclusive [0, 1].
It actually behaves in an odd way:
0 returns 500: error: RecommmendRequestSizesMaxHeadroom: expected target CPU utilization to be > 0, was 0.000000
1 returns a response
2 returns a response
1e-2 allowed?
0e returns 200.
1abc returns 200 too. What does it mean?
I would like to know which value will be used for this parameter if it is not sent.
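Based on the observed responses, the lower bound appears to be exclusive (0 is rejected with "expected target CPU utilization to be > 0") while the parser's leniency (e.g. 1abc and 0e returning 200) is unclear. A minimal sketch of the validation the docs could specify — assuming strict float parsing and a half-open interval (0, 1], neither of which is confirmed by the source:

```python
def validate_target_cpu_utilization(raw: str) -> float:
    """Hypothetical validator: strict parse, then require 0 < t <= 1."""
    try:
        t = float(raw)  # strict parsing rejects trailing garbage like "1abc"
    except ValueError:
        raise ValueError(f"not a number: {raw!r}")
    if not (0 < t <= 1):
        raise ValueError(f"expected target CPU utilization in (0, 1], was {t:f}")
    return t
```

Under these assumptions, "0.5" and "1e-2" are accepted, while "0", "2", "0e", and "1abc" are all rejected rather than returning 200.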
gz#1207
It looks like the background color is wrong.
When you hover on icons they are no longer visible.
@kbrwn are you best to tackle?
https://guide.kubecost.com/ ends up on https://guide.kubecost.com/hc/en-us/articles/6152374933655-Windows-Node-Support
A better landing page would probably be https://guide.kubecost.com/hc/en-us/articles/4407595950359-Welcome
This step, https://github.com/kubecost/docs/blob/main/aws-cloud-integrations.md#step-4-attaching-iam-permissions-to-kubecost, needs to be simplified by providing AWS CLI commands so customers can follow the instructions more easily.
Create a complete list of both internal and external event triggers and notifications. Then determine if there are additional items we would like to add.
Related Issues
kubecost/cost-analyzer-helm-chart#1319
opencost/opencost#264
kubecost/cost-analyzer-helm-chart#1252
Related PRs
https://github.com/kubecost/kubecost-cost-model/pull/796
opencost/opencost#1214
https://github.com/kubecost/kubecost-cost-model/pull/784
Hi there, about a week ago I configured the AWS integration and permissions. I was able to view which EKS nodes were spot instances and their different hourly rates, but recently I cannot see these. Could someone look into this?
gz#2146
(related to Zendesk ticket #2146)
Hello, I am trying to upgrade Kubecost from 1.66.0 to 1.67.0 in a cluster running Kubernetes 1.13.
As described in https://github.com/kubecost/docs/blob/master/thanos-upgrade.md, I made the secret changes and deployed them, then upgraded to version 1.67.0 using Helm.
Thanos is upgraded to version 0.15.0.
But during the upgrade the cost-analyzer pod is stuck in Pending; below is the message from describing the pod. Could you please help with this issue?
` Type Reason Age From Message
Normal NotTriggerScaleUp 74s cluster-autoscaler pod didn't trigger scale-up (it wouldn't fit if a new node is added): 2 Insufficient cpu, 6 node(s) had taints that the pod didn't tolerate, 8 node(s) had no available volume zone
Normal NotTriggerScaleUp 64s cluster-autoscaler pod didn't trigger scale-up (it wouldn't fit if a new node is added): 8 node(s) had no available volume zone, 2 Insufficient cpu, 6 node(s) had taints that the pod didn't tolerate
Normal NotTriggerScaleUp 54s cluster-autoscaler pod didn't trigger scale-up (it wouldn't fit if a new node is added): 2 Insufficient cpu, 8 node(s) had no available volume zone, 6 node(s) had taints that the pod didn't tolerate
Warning FailedScheduling 10s (x4 over 77s) default-scheduler persistentvolumeclaim "costing-cost-analyzer-db" not found
Normal NotTriggerScaleUp 3s (x5 over 44s) cluster-autoscaler pod didn't trigger scale-up (it wouldn't fit if a new node is added): 8 node(s) had no available volume zone, 6 node(s) had taints that the pod didn't tolerate, 2 Insufficient cpu`
The upgrade creates a new PVC, and since the old PVC is still there, I assume it is unclear which PV/PVC should be used.
`NAME STATUS VOLUME CAPACITY ACCESS MODES STORAGECLASS AGE
costing-cost-analyzer Bound pvc-64ef5f3d-cb78-11ea-b62b-0a6a682c3ebc 1Gi RWO ebs-general-gp2 90d
costing-cost-analyzer-db Bound pvc-0a814698-12e5-11eb-b072-025920e6cffc 32Gi RWO ebs-general-gp2 12m`
AWS S3 CUR Integration.pdf
Need documentation around this process
IAM Roles Anywhere is NOT supported for non-AWS workloads: https://docs.aws.amazon.com/IAM/latest/UserGuide/id_roles_common-scenarios_non-aws.html
The CUR would need to be created in the commercial account and transferred over to an s3 bucket in govcloud so kubecost can pull the data correctly.
This can be achieved in various ways: awscli, the govcloud import tool, s3 sync, DataSync, Lambda, etc.
The high level concept is to pull the data from the s3 bucket down to an intermediary then upload it to an s3 bucket in aws govcloud.
Public sector customers want clearer direction on how to achieve this and how kubecost would be set up to run in govcloud. One approach for setting up kubecost in govcloud is the s3 CUR integration (attached) after it is transferred from commercial to govcloud.
s3 sync
https://docs.aws.amazon.com/cli/latest/reference/s3/sync.html
datasync
https://docs.aws.amazon.com/managedservices/latest/userguide/data-sync.html
govcloud import tool
https://aws.amazon.com/blogs/publicsector/gov-cloud-import-tool-how-to-transfer-information-between-identity-boundaries/
Let's reflect changes in kubecost/cost-analyzer-helm-chart#1643 in latest AMP docs. @linhlam-kc would you still be up for taking this one? Thanks again for offering.
AMP documentation: https://guide.kubecost.com/hc/en-us/articles/4409859798679-Amazon-Managed-Service-for-Prometheus
Referring to https://github.com/kubecost/docs/blob/main/api-request-right-sizing.md, there is a section that reads:
"Within the query window, the pod could haved saved: 2 cores * (15min / (60 min/hr)) = 0.5 core-hours 0.67 core-hours * $7/core-hour = $3.50 "
I'm confused: should "0.5 core-hours 0.67 core-hours * $7/core-hour" be "0.5 core-hours * $7/core-hour", giving the result "= $3.50"? Was it a typo?
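For what it's worth, the arithmetic supports the second reading: 0.5 core-hours at $7/core-hour is exactly $3.50, whereas 0.67 core-hours would give $4.69. A quick check using the numbers from the quoted doc section:

```python
# Arithmetic check (numbers taken from the quoted doc section)
cores_saved = 2                            # cores the pod could have shed
window_hours = 15 / 60                     # 15-minute query window, in hours
core_hours = cores_saved * window_hours    # 2 * 0.25 = 0.5 core-hours
savings = core_hours * 7.0                 # at $7/core-hour
print(f"{core_hours} core-hours -> ${savings:.2f}")  # 0.5 core-hours -> $3.50
```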
gz#2045
(related to Zendesk ticket #2045)
Essentially we want clear documentation for every slider on our core cost allocation page, clearly defined. I'm envisioning detailed screenshots and samples. This would include, for example:
This task represents a common theme where users don't actually know how the allocation page works or what their various options are.
Issue
A new issue was missing the needs-triage tag.
Originally posted by @dwbrown2 in #263 (comment)
Solution
Install GitHub bot in docs repository.
Step 3: Tagging Azure resources
The table still refers to "AWS Tags."
On the landing page of kubecost.com, the navigation bar overflows on the x-axis in mobile phone browsers.
It is causing really bad UX.
If you want I could create a PR for this.
gz#1960
(related to Zendesk ticket #1960)
Document how we join the CUR with providerIds for savings plans, spot nodes, and reserved instances.
"Master Payer" is now "Management Account" according to AWS: https://docs.aws.amazon.com/organizations/latest/userguide/orgs_getting-started_concepts.html
Member accounts have access to the CUR if the management account allows it, and do not need management account IAM permissions to access or enable it. This setup is not well reflected in the "Setting up IAM permissions" section:
My kubernetes clusters all run in the same account as the master payer account.
My kubernetes clusters run in different accounts from the master payer account
My Kubernetes clusters run in the same account as the master payer account
My Kubernetes clusters run in different accounts
Maybe these options should reflect the location of the CUR rather than the location of the cluster?
Node-exporter is optional, not essential to core Kubecost. Please update https://github.com/kubecost/docs/blob/master/architecture.md#kubecost-core-architecture-overview.
It gives you a little extra context on the overview page about what your kubelets are doing, but no core allocation models are affected.
I'm having a problem with this.
The storage configuration documentation states the following:
Where ingested samples can be measured as the average over a recent period, e.g. sum(avg_over_time(scrape_samples_post_metric_relabeling[24h])). On average, Prometheus uses around 1.5-2 bytes per sample. So ingesting 100k samples per minute and retaining for 15 days would demand around 40 GB. It's recommended to add another 20-30% capacity for headroom and WAL. More info on disk sizing here.
Is my arithmetic wrong, or does the example actually suggest 4GB is needed rather than 40GB?
2 bytes per sample * 100k samples per minute * 15 days (21,600 minutes) = 4,320,000,000 bytes = 4.32 GB
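The reporter's arithmetic can be checked directly; under the stated assumptions (2 bytes/sample, 100k samples/min, 15-day retention) the base figure does come to ~4.3 GB before the recommended 20-30% headroom, so the doc's 40 GB example appears to be off by an order of magnitude:

```python
bytes_per_sample = 2                 # upper end of Prometheus's 1.5-2 bytes/sample
samples_per_minute = 100_000
retention_minutes = 15 * 24 * 60     # 15 days = 21,600 minutes

base_bytes = bytes_per_sample * samples_per_minute * retention_minutes
base_gb = base_bytes / 1e9           # 4.32 GB, matching the reporter's figure
with_headroom_gb = base_gb * 1.3     # +30% for headroom and WAL
print(base_gb, round(with_headroom_gb, 2))  # 4.32 5.62
```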
gz#2123
(related to Zendesk ticket #2123)
The AWS Cloud Integration doc explains that you have to supply the Athena database and table, and says these were created in "Step 2: Setting up the CUR". However, "Setting up the CUR" is actually Step 1, and that step never details how to set up the database and table. The following "Step 2: Setting up Athena" also doesn't detail how to create them.
docs/aws-cloud-integrations.md
Lines 376 to 379 in 6f5bc36
I'm not sure if these databases and tables are supposed to be automatically created, or whether I need to create them manually, and which settings to set them up with.
Could this be clarified in the documentation?
This might need to be split into two issues.
The initial question I had was: for documentation such as how to integrate Kubecost's alerting with MS Teams, where in the documentation hierarchy should it go? At the moment I think it would be in the alerting.md file.
This has led to a larger question on my side: is the architect theme being used for the documentation suitable for a growing documentation base?
Looking at the docs as they stand, they seem hard to discover/search from the documentation website. I think this is due in part to the flat structure of the architect theme. Compared to themes more aimed at documentation (docsy, doks, etc.), architect doesn't natively support a hierarchical navigation approach and instead relies on a simple sidebar plus links within existing pages. Would there be any interest in trying a different structure for the documentation to aid discoverability?
gz#1306
There are some broken links in the AWS Cloud Integration doc, specifically in the "Step 4: Attaching IAM permissions to Kubecost" section, under "Attach via Pod Annotation on EKS".
In a recent customer discussion, they shared that they were using individual/sub AWS accounts with their cloud-integration.json to allow their primary Kubecost instance access to the subaccounts' CUR data. It appeared that savings plan data was only available through the master payer account, which they were not using, so it was not applied.
The federated AWS configuration and documentation is confusing; a simple diagram or flowchart would be useful for this very common deployment. Explaining how to support various AWS account configurations (individual accounts, sub-accounts, etc.) would be very helpful.
A simple explainer like https://github.com/kubecost/poc-common-configurations/blob/main/aws/README-enterprise.md would also be appreciated.
For the AWS Spot Data Feed and the Athena Results Bucket, we should add some notes around best practices for lifecycle management.
For Athena query results, we use them immediately and don't look at previous results. So long as they are going to a dedicated S3 bucket, lifecycle policies can be used to expire and delete objects after 1 day.
For the Spot data feed, we only use them when pulling current pricing information. So long as we are the only consumer of the spot data feed files, lifecycle policies can be used to expire and delete objects after a few days.
Lifecycle policies could also be used on the S3 bucket where the CUR is stored. But that would need to be defined based on the customer's requirements for historical cost data.
Might be worth setting expiration to something like 7 days in case of the need for troubleshooting.
Using lifecycle policies to remove objects we no longer need will limit wasteful spending in customer accounts.
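As a sketch of what such a rule could look like (the expiration window is illustrative, not prescriptive), an S3 lifecycle configuration that expires Athena query results after 7 days:

```json
{
  "Rules": [
    {
      "ID": "expire-athena-query-results",
      "Status": "Enabled",
      "Filter": { "Prefix": "" },
      "Expiration": { "Days": 7 }
    }
  ]
}
```

This can be applied to a dedicated results bucket with `aws s3api put-bucket-lifecycle-configuration --bucket <bucket> --lifecycle-configuration file://lifecycle.json`; the same shape, with a longer window, would suit the spot data feed bucket.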
We are unable to publish new docs to guide.kubecost.com right now due to expired credentials. Docs that already exist are still receiving updates.
See e.g. https://github.com/kubecost/docs/actions/runs/3113402788
Technical details of the issue explained here: https://github.com/kubecost/zd-docs-uploader/issues/7. Raising an issue here because this repo has more visibility.
@bstuder99 we were talking about this earlier.
@Adam-Stack-PM FYI -- we probably need to put eng effort here to find a solution and implement it. There are a few different ways to fix, some quicker than others, but I don't have time to walk through them and pick one.
I'm unable to access the docs site at https://www.docs.kubecost.com/
I believe this worked correctly before the guide rerouting was set up.
I am running cost-model:prod-1.76.0 on a Kubernetes 1.18 cluster.
The kube_pod_container_resource_requests_memory_bytes metric is DEPRECATED in the latest kube-state-metrics (version 2.0), but Kubecost requires it.
Tags got italicized by mistake on some pages:
We have kubernetes_label_NAME rendered in italics instead of 'kubernetes_label_NAME'.
gz#1635
I've got a master-payer account structure, with subaccounts for e.g. Dev, QA, Prod, etc. Each cluster in each subaccount runs its own Kubecost cost-analyzer deployment.
Am I right in thinking that the 'external' costs pulled in from AWS will be for all accounts (as per the CUR being exported and queried)? I can see a lot of things in the QA1 cluster's 'Assets' view that I presume are from e.g. Prod.
Is there any way to see only kubernetes_cluster-tagged resources in this view? Or is the idea that the Assets view shows everything and you then use switch-cluster to show numbers for a specific cluster?
gz#770
Hi Team,
Please let me know in which file we need to put the entry "persistentVolume.dbStorageClass=your-topology-aware-storage-class-name".
Thanks!
Kishor.
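For reference, this key is typically supplied through the cost-analyzer chart's values. The snippet below is a sketch assuming the key sits at the top level of values.yaml (verify against your chart version); the same setting can also be passed with `--set persistentVolume.dbStorageClass=...` on the helm command line:

```yaml
# values.yaml (assumed layout; check your cost-analyzer chart version)
persistentVolume:
  enabled: true
  dbStorageClass: your-topology-aware-storage-class-name
```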
Hi Team, we have been looking into Kubecost for getting costs in AKS at the namespace level, and it sounds like a good fit for our needs.
Deployment controllers aren't being identified and end up in a generic section called "no controllers", as shown in the image. Why is this happening?
We are passing a custom values file with the helm chart kubecost/cost-analyzer, which disables Prometheus and Grafana and adds an ingress resource:
global:
  grafana:
    domainName: grafana-xxx.com
    enabled: false
    proxy: false
    scheme: http
  notifications:
    alertmanager:
      enabled: false
  prometheus:
    enabled: false
    fqdn: http://prometheus-server.monitoring.svc.cluster.local
  thanos:
    enabled: false
ingress:
  enabled: true
  annotations:
    kubernetes.io/ingress.class: nginx
  paths: ["/"]
  hosts:
    - kubecost.xx.com
  tls: []
prometheus:
  alertmanager:
    enabled: false
  persistentVolume:
    enabled: false
  nodeExporter:
    enabled: true
networkCosts:
  enabled: true
kubecostProductConfigs:
  azureStorageSecretName: kubecost-storage
Not sure why the deployment controllers and services section doesn't show up?
kubecost-bug-report-1637727299197.txt
Thank You !
gz#1077
This issue is with respect to the getting started page. If we look at the cost optimization section on this page, it's confusing to me. With too many numbers and prices floating around, it's very easy to get confused, and one has to re-read it; overall, I had to read it three times to get it.
The same document is much easier to read on the GitHub page.
((0.5/2) * 20 + (0.5/1) * 1) / (20 + 1) = 5.5 / 21 ≈ 26%
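For readers checking the arithmetic from the getting started page, the expression itself evaluates cleanly (a quick check of the numbers, not a statement about what the quantities represent):

```python
# (0.5/2) * 20 = 5.0, (0.5/1) * 1 = 0.5, so the numerator is 5.5
value = ((0.5 / 2) * 20 + (0.5 / 1) * 1) / (20 + 1)
print(f"{value:.4f}  (~{value:.0%})")  # 0.2619  (~26%)
```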
gz#2000
(related to Zendesk ticket #2000)
@AjayTripathy suggested I create an issue to request an update of the docs.kubecost.com DNS entry to the redirect server IP (34.83.246.98).
Hi,
We are using Thanos Receivers in our cluster.
We have implemented Kubecost; the documentation uses the Thanos sidecar in Prometheus, but we are using Thanos Receivers to avoid tight coupling with Prometheus.
We'd like to configure Kubecost to use our Thanos Receivers, but I couldn't find this in the documentation.
Hope someone can help me.
https://github.com/kubecost/kubecost-cost-model/issues/928 for context and what config options conflict.