green-software-foundation / sci Goto Github PK

View Code? Open in Web Editor NEW

248.0 21.0 51.0 556 KB

A specification that describes how to calculate a carbon intensity for software applications.

License: Other

CSS 27.27% HTML 71.92% JavaScript 0.81%

sci's Introduction

Software Carbon Intensity (SCI) Specification

A specification that describes how to calculate a carbon intensity score for software applications.

Created and managed by the Standards Working Group in the greensoftware.foundation.

Project Scope

This document, the Software Carbon Intensity technical specification, describes how to calculate the carbon intensity of a software application. It describes the methodology of calculating the total carbon emissions and the selection criteria to turn the total into a rate that can be used to achieve real-world, physical emissions reductions, also known as abatement.

Electricity has a carbon intensity depending on where and when it is consumed. An intensity is a rate. It has a numerator and a denominator. A rate provides you with helpful information when considering how to design, develop, and deploy software applications. This specification describes the carbon intensity of a software application or service.

Getting Started

The development version of the specification is here.
The latest published version of the specification is here.
The dev branch contains the current version that is being worked on and the main branch contains the latest published version.
Check the issues tab for active and closed conversations regarding the spec.

GitHub Training

Getting started with GitHub

Contributing

The recommended approach for getting involved with the specification is to:

Read the development version of the specification.
Raise an issue, question, or recommendation in the issues tab above and start a discussion with other members.
Once agreement has been reached, then raise a pull request to update the specification with your recommended changes.
Let others know about your pull request by either commenting on the relevant issue or posting in the Standards Working Group slack channel.
Pull requests are reviewed and merged during Standards Working Group meetings.
Only chairs of the Standards Working Group can merge pull requests.

Versioning

We use Semantic Versioning for versioning.

Copyright

Standard WG projects are copyrighted under Creative Commons Attribution 4.0.

License

Standard WG projects are licensed under the MIT License - see the LICENSE.md file for details.

Patent

Standard WG projects operate under the W3C Patent Mode.

Feedback

sci's People

Contributors

Stargazers

Watchers

sci's Issues

Accepted Methodology/Data Providers For Marginal Carbon Intensity

I can think of five data providers for location-based marginal carbon intensity. This opens up some questions:

How do we account for discrepancies?
Is it in scope of the SCI to define acceptable sources fo for this information?
Do we require these data sources meet a given standard that someone else has defined? (e.g. World Resources Institute, RMI, GHG)

Data providers:

WattTime ( @Henry-WattTime) has validated through RMI (also WRI certified?), which can be found on our website on this page at the Methodology and Validation section
ElectricityMap (WRI certified?)
REsurety models transmission constraints to get a locational marginal emissions intensity figure per building.
WeDeex by CSN Energy tracks carbon intensity of grid
e2Intel uses LEEM (Location Emission Estimation Methodology) - Liang Downey (MSFT) is trying to join GSF and bring LEEM into the discussion

Creating a citation for the SCI

I think it will be a good idea to add a citation file to the SCI repo as people start to reference the SCI once it is made public.

Define Marginal Carbon Intensity

"This is the emissions intensity of the marginal power plant which will be turned up if you schedule some compute (e.g. increase electricity demand from the grid) at that moment." - this is vague and needs proper definition. What is a "marginal power plant"?

Review term "Software Boundary"

Why is the section labeled "Software Boundary" when it makes reference to infrastructure? In what sense is this a software boundary?

In the LCA methodology, this would be related to scope.

Characteristic: Simple or Complex

The more complexity in calculating the software carbon intensity, the fewer people and teams that will adopt it.

How do we make things simple so they are easy to adopt, and yet not so simple that they lose accurate information?

One direction could be in using tooling or modelling as much as possible in calculating the number (in the specification we need to steer clear of implementation details but we can describe the types of tooling that might help).

Another direction is to leverage as much as possible existing methods of calculation but adjust them to fit our desired characteristics. E.g. cost, energy, network bandwidth and have our own emission factors which people can use to multiply out to a carbon number.

Perhaps being consequential makes our life a lot easier, if we ignore calculating the total carbon emissions and just focus on what are the emissions when just one more person uses your application, does that make things simpler?

Characteristic: Consequential versus Attributional

Telemetry Based and Benchmark Based SCI Calculations

Discussed in #66

^{Originally posted by jawache September 2, 2021}
In the SCI the conversation currently revolves a lot around the idea that you can measure the total carbon emissions for an application. For a lot of applications, esp. open source applications, this is not possible and we should be more inclusive and start talking about an alternative method of calculating the SCI.

For example if you are the maintainer of an open source library you have very little idea of where you library is used, the hardware, energy consumption, the carbon intensity. It would be impossible for you to be able to calculate a total carbon emissions. For those projects their only option for calculating the SCI is via a benchmark, a program which puts the application through a consistent set of use cases, a consistent load and then calcultes teh result. There are plenty of examples of performance benchmarks out there e.g. https://github.com/python/pyperformance or https://github.com/tensorflow/benchmarks/tree/master/perfzero and they measure the E, M and some thought needs to be put into I for these use cases.

Reducing the problem space a little bit.

There are two broad categories for applications, those that run in the cloud (>1 machine) those that run on one device (=1 machine).

There are also two methods of calculation, via telemetry (so you can calculate the total) or via benchmark.

+---------+--------------------+---------------------+
|         |     Telemetry      |      Benchmark      |
+---------+--------------------+---------------------+
| Cloud   |  Total / Baseline  |  Total of Benchmark |
| Device  |  Total / Baseline  |  Total of Benchmark |
+---------+--------------------+---------------------+

For example, if you are tensorflow you may choose perfzero as your benchmark.

You run perfzero and calculate SCI as (E * I) + M using the numbers you have directly measured from running that benchmark.
The SCI might then be reported as X per perfzero, with the name of the benchmark as the baseline.

If for example you are an AI Cloud SaaS then you would calculating using telemetry, like so:

T = (E * I) + M
Then you decide your baseline, if it's number of users then you would calculate it as
SCI = T / number of users which gives you something like SCI = X per user.

Use-Case: Measuring the total carbon emissions for a given ML workload (training/inference)

AzureML is a great use case to explore, due to the high cost (including carbon) of ML workloads. They are discrete 'jobs' and can showcase the carbon intensity specification work.

What Scope? (1/2/3)

It looks like we've got to scope this (heh) and fit the SCI within standard scope 1/2/3 methodologies. Flagging this to revisit

Baseline, acceptable R

As discussed during the working group

key idea:
"What does the software scale by"
(or functional unit)

R:
API call/request (per call)
Benchmark (version) (for lab vs. real-world)
User (account/person/per install/
Machine
Minute (time unit)
Device/Physical site (home, factory, device)
Data volume (bytes through system)
Job (batch jobs, training rounds)
Transaction (payments systems)
Mining
Database read/write

Embodied Carbon (`M`) Needs Detail

Flagging this to work with @arexub (Alex Bitukov) from MSFT around the latest embodied carbon work.

There's significant ambiguity around two key areas

we can provide guidelines for what's expected when calculating embodied carbon, but going beyond is not worthwhile
variability: same product can produce different results

Goal: generalize enough around key variables (@arexub (Alex Bitukov) from MSFT:

sourced materials
MFG overhead
transportation/packaging
use phase (energy consumption * useful life)
end of life
the actual rack you're running your workload on (not even accounting for datacenter infrastructure (e.g. concrete/wires/etc))

Electronics: some specific standards we can reference

Key unknown: how to amortize embodied carbon over lifespan (demoninator) of equipment?
total carbon / lifespan

What is the market perspective for this specification?

From a market perspective...

What can you do with this specification?
What problem does this solve?
How can this specification be applied?
Consider the target audience and provide deployment examples as possible.

[This was taken from the template spec doc]

Methodology: review section goals

"measuring the total change in global emissions associated with a particular piece of software"

replace measuring - none of the current document is related to measurement.

What is meant with "global emissions"?

What is a "particular piece of software"?

Update the citation file with WG participants

While #83 gets resolved (if not before release of SCIv1) then we should update the citation file manually with the names of WG participants. Based on point raised by @vaughanknight in #82

"Software Boundary" - inconsistent types of devices and term MAY is confusing

The list is a mix of types of devices and service functions.

Additionally, the term "MAY" is confusing - is this supposed to be an exhaustive list? I suppose not - so it should be explicitly stated that these are examples.

This section requires review and generalisation, so it can support a standard.

Defining Baselines (`R`)

In today's WG meeting we discussed defining possible definitions of R, being the baseline as a denominator. This issue is to create a thread and discussion to capture this.

Some items discussed for example:

per user
per workstation
per install
per $

We also discussed creating a "Recommended baselines" or "Accepted baselines" and a "Not accepted baselines".

I will leave it there for now so we can discuss.

Approved PRs should automatically merge

@Sealjay suggested approved PRs should merge automatically after a few days, which will bypass permissions issues.

Methodology For Energy Consumption (`E`)

This is a can of worms, but I'm flagging this so we can puzzle through fun questions such as how we account for....

Shared hardware? @SaraEmilyBergman has a great point here
idle load?
Utilization/energy consumption relationship
Acceptable ways to measure and report energy consumption

All software-based energy measurement contains errors. In terms of methodology, we have two approaches that may need to meet somewhere in the middle

top-down (e.g. datacenter plug)
bottom-up (on-chip) libraries (e.g. NVIDIA-smi or DCGM libraries) have been shown to have between 8-73% error

Two ways to measure energy per @jawache:

Direct measurement

Modeling

if everything is a model, what are the requirements? Some examples:
e.g. mapping utilization to energy per lookup table provided by hardware companies
what are the boundaries of your software model ?per @vaughanknight. - Can't just give a number and not account for software boundary #56. The boundary needs guidance on what significance means, including supporting network infrastructure such as:
idle machines
data stack
supporting network infrastructure
Compte
Storage
Networking
Memory

per @SaraEmilyBergman, we should add fundamental constraints/guardrails:

power consumption can't exceed the input power

Standardize Units For SCI

Propose we standardize around units for reporting & methodology (e.g Joules/kWh/ (gCO2eq/kWh))

The carbon intensity of electricity is a measure of how much carbon (CO2eq) emissions are produced per kilowatt-hour (kWh) of electricity consumed, for a standard unit of gCO2eq/kWh.

Let's use a good metric measurement if possible!

Some immediate thoughts:

Energy (Joules? kWh?)
Location-based carbon intensity MOER (I)
Embodied carbon (kg? g? lbs?)

@Henry-WattTime last time I dug into WattTime's documentation, it was really hard to find units within WattTime's documentation around carbon. I think I did't find a clear delineation between kg and lbs. Is this something you can add to WattTime's documentation help surface?

Emissions Boundaries & Scope For SCI Equation

Ben Davy showed a compelling visualization for his formula that outlines scope. I'm borrowing from his blog post to propose that we do something similar in our SCI for clarity:

Here are the definitions of Scope 1, 2, and 3 according to the GHG Protocol:
Scope 1 emissions are direct emissions from owned or controlled sources.
Scope 2 emissions are indirect emissions from the generation of purchased energy.
Scope 3 emissions are all indirect emissions (not included in scope 2) that occur in the value chain of the reporting company, including both upstream and downstream emissions.

Will need to account for who is reporting this. e.g. who is reporting
“If outsourced to the cloud, IT emissions previously accounted for under the Greenhouse Gas Protocol Scope 1 — emissions that are directly linked to the activities of an organization from sources that it owns and control — and Scope 2 — emissions from the generation of purchased energy — move to Scope 3, referring to all other emissions.”

READ THIS FIRST!

(content coming soon)

[AI] Support for Calculating carbon intensity

See Issue #22
@buchananwp - To bring someone in from Microsoft with more formal experience in carbon measurement methodologies.

Carbon Savings Quantification: Counterfactual & Delta

Flagging this as a high-level area to address: When we implement carbon-savings methodologies, how do we track/report it? This should be part of SCI equation itself, and will lay out clearly how you capture, quantify, and report the SCI savings due to your GSE Action. This metric is used to measure the success of Green Software Actions/practices.

By measuring SCI score deltas for different choices, we can nudge user behaviour in positive directions:

(developers) to track the impact of a given GSE Action
(end user) to help understand the impact of their behavior choices. The SCI spec could define how exactly you quantity and report those savings to the end user
(companies) to start tallying carbon savings for implementation of GSE actions

This extends to carbon aware, "eventually" I think carbon aware features will be checkboxes selected by customers "do you want this workload to run carbon aware y/n?. We're currently debating two paths/terms:

Carbon "Delta"This is a retrospective delta between two SCI scores. (e.g. load shifted by X time, resulting in Y measured savings)
Carbon "Counterfactual" - This hypothetical capability tracks are if the suggested green runtime was accepted by the user or not and the carbon reduction for each of these. This way we can get a singular statement of, "Over X predictions made, users on average reduced their carbon footprint by Z %"

### - Example of application:

ML Job/workload shifting: we can compare this carbon against a counterfactual by applying the methodology (energy consumption, location-based marginal carbon intensity) against the prior action. e.g. 'by shifting your workload to a greener region, you saved X% on your carbon emissions, for a total of Y emissions reductions" compared to your original action
Windows update at green times of day
Eco-mode: Ask customers 'do you want to run this in eco-mode'? When this box is checked,
Carbon-Aware Libraries - TensorFlow leverages an optional library that uses a new green runtime. Usage of this library is tallied across all actions it could be applied against

@TaylorPrewitt is driving

Questions to consider as we shape the SCI

I think as a part of the scope of this SCI, we should try and at least come up with answers to the following questions so that the SCI becomes something that is meaningful and actually adopted in practice:

What are the properties of the entire software system that we should be measuring?
Which of these properties currently have/lack metrics for effective measurement?
As highlighted above, what consitutes effective measurement?
What are the current measurement methods that are being utilized?
What are the strengths / limitations of these methods?
Which circumstances are the ones where it is appropriate to use one methodology over another?
What are the different properties of the metrics that we desire?
What are the impacts of these properties and the chosen metrics on the evaluation process?
What data would be needed to enable these measurements to take place?
Which of those are already available and which ones need to be collected?

Defining software boundaries

For anybody using the SCI, we should provide a set of software boundaries for different types of applications.

Contributing to the SCI should invite a user to add to the citation file

Raised during discussion on #82

Site reliability engineering principles and impact of SCI on them

What is SLI and SLO?
Reliability is the concept of a system having the quality of being trustworthy or of performing consistently well. Today most applications have a Service level objective (SLO) to facilitate monitoring. SLOs have typically 3 constraints

Service level indicator or SLI.
The target metric or Objective in percentile
The observation window
SLI + Objective +Observation window = SLO
SLIs have an event, a success criterion, and specify where and how you record success or failure. It is specified as the proportion of events that were good.

Example: Percentage of Query requests that return an error code other than 5XX or a timeout measured on the Query Gateway Frontend

SLIs exist to help engineering teams make better decisions. Your SLO performance is critical information to have when you’re making decisions about how hard and fast you can push your systems. SLOs are also important data points for other engineers when they’re making assumptions about their dependencies on your service or system. Lastly, your larger organization should use your SLIs and SLOs to make informed decisions about investment levels and about balancing reliability work against engineering velocity.
Availability SLO example
• 90% of HTTP requests as reported by the load balancer succeeded in the last 30 day window ( Here 90% HTTP requests succeeded is the SLI , Return error code other than 5xx or timeout is the Objective and 30 days is the observation window)

Latency SLO examples:
• 100% of requests in the last 5 minutes as measured at load balancer are served in less than 900ms
• 99.99% of requests in the last 5 minutes as measured at load balancer are served in less than 500ms
• 90% of requests in the last 5 minutes as measured at load balancer are served in less than 200ms

How organizations can re-define SRE considering SCI.
Software carbon intensity is a relative score between applications to be more carbon efficient, carbon aware and hardware efficient.

SCI for applications (CI)= (Energy used by the application * Located based marginal carbon emissions) + Embodied Carbon per baseline

Here the baseline is per API call, per additional user, per additional ML job etc.

From an SRE principle and alignment then, organizations would like to track carbon emissions from their applications using the SCI score and raise alerts on breach, i.e. if it increases beyond a certain percentage . If an application has a SCI score of x, organizations would then track variance from this value and configure monitoring principles .

How would you then correctly define the metric as per SRE principles?

In the above formula for SCI, baseline is a key aspect. We will explain with an example where we are considering the baseline as "one instance of a batch job". The batch job is a piece of component within a larger "software" or "application" which could potentially be a web application workload with a batch job doing a long running business process that does not need user interactions

Let us assume that the SCI value of an batch job running in West Europe has a CI value of 100 kgCO2 per instance of Azure webjob. By an initial assumption, let us assume that the SLO for SCI has been defined as not more than 20% variance. If during the operating window of the job, the service cranks up and the carbon intensity increases to 121 KgCo2 , then an alert has to be signaled. However this is theoretical. We have to look at this increase in the context of so many factors like interplay of SCI increases with other SLOs like latency, performance etc; how the west Europe datacenter was powered (% of coal/renewables) during the time of heightened operation of this web job; inefficient threading and garbage collection practices that would have been in the code that surfaced during the peak operation etc.

When this incident (as per SRE principles, this is an incident that should be monitored and alerted like a Sev 1,2,3 incident) happens, there could be multiple tuning techniques. One tuning technique that comes to mind for this incident is to try moving the workload to a different datacenter that is better powered by renewables ( by calling Watt Time API) or shifting the workload to a different time of the day. These are techniques that need detailed and vetted data upfront for the "orchestration algorithm" to make dynamic decisions regarding moving the workload. However we can tell today that we do not have defined and foolproof information to tell how much do each of these tuning techniques will contribute to managing the increased carbon intensity . This data has to be collated and cross verified over a longer period of time going forward to come up with authentic deductions.

Hence for the initial version of the specification, I propose that we raise the level of abstraction for monitoring of SCI at an application level rather than the individual component. i.e We will keep the baseline for software carbon intensity at the "application level" rather than a batch job, ML job, API call etc.

Thus we can consider that the metric we will use for the site reliability engineer will be the Total carbon emissions (C) value. The formula for this metric is :C= O+ M where O= E*I

Scope of Metric
For this metric, the definition of scope around which SCI will operate is very important. Scope is the boundary area where we will apply the monitoring. Since we are talking about software, the boundary here is the software boundary as defined in the SCI specification.
However, we may not be able to apply this uniformly for all software. Software varies by architecture , environment, hosting type (dedicated infrastructure vs shared infrastructure vs serverless ) and the implementation of SRE monitoring for SCI varies by these factors .We will discuss these factors below:

Architecture of Software
Different application architectures need monitoring differently. Consider for example the following workloads:
1. Web based multi-tier application or Long running process deployed on either cloud or on-premise
2. Mobile app connecting to backend APIs on cloud or on-premise
3. Desktop app connecting to backend APIs on cloud or on-premise
4. AI based machine learning model experiments
5. Open Source or Closed Source Framework SDKs
6. Server less applications

From a pure monitoring perspective of the SRE metric , doing it on server based workloads in the above list may be the first step . For e.g Web based multi-tier applications have either Virtual machines or EC2s connecting to APIs and databases and hence we can monitor the Operational emissions of these server components. Similarly we can calculate the metric for backend server APIs and serverless components of the Mobile and desktop apps.

There would be challenges however in doing the same for desktop devices and mobile platforms as the emissions calculations would need to know at a rough level the total number of mobile devices or desktops , their types etc to calculate roughly the value for the desktop apps. Hence for the first release of specifications, we can propose that monitoring will be for a subset of the above workloads i.e for workloads which have majorly server components.

Hosting Infrastructure - Dedicated Vs Shared

Monitoring techniques will also vary based on the hosting mode for the software. For those with dedicated infrastructure, SCI will just be the sum total of Operational Emission values across the different layers. In the equation for SCI ( SCI per unit of baseline = (E*I) + M , the value of M does not make an impact when calculating delta carbon intensity = Current CI- Original CI since the hardware is exclusively reserved for the purpose of the said software. Hence the monitoring technique can potentially look for variances in the Operational Emissions value and variances of it to raise alert for the Site Reliability Engineer.

The situation is different when we consider Shared infrastructure servers, multi-tenant databases, SaaS software shared by multiple customers. Here multiple micro-services could share the same PaaS compute platforms and storage services which by design is carbon friendly. In these cases, the percentage of allocation of infrastructure is necessary information to be able to calculate the carbon intensity value for the specific customer software. Hence we need to include the Embodied Emissions (M) value in the monitoring metric .

Application Environment Types
The usage of the above SRE metric also changes by environment. This statement is tied to the Application architecture factor somewhat but broadly, the concept is that for the purpose of carbon tracking and monitoring, measurement should be done for all environments like development, testing, QA, Performance and Production. This is because the carbon emission of the software increases manifold for lower environments like development and QA for workloads like machine learning models.

Multiple iterations of running AI experiments in lower environments should be tracked for carbon emissions and hence the scope of the metric should be monitored at the environment level.

Similarly for the other common workload scenarios like web or desktop applications, multiple performance tests are executed to achieve the SLO targets for throughput and /or latency. Through the process of trying to achieve these targets, the compute and storage resources are used more intensively than it would be on a production environment. Hence tracking of the metric is recommended at Environment scope as well.

Conclusion
Some of the deductions we have made at the end of this article:

Total carbon emissions (C) is the metric we will monitor at an SRE level .
The application of this metric can be done at multiple scope levels - Environment , Hosting infrastructure type and Application architecture
In future iterations of the specification, work to be done to understand the SRE impact of the C metric for other SLO attributes like latency, availability. A brief write up below:

Availability SLOs: Availability SLOs can be met either by software changes and redundant application design patterns or hardware redundancy. However, in the most common of scenarios, it is met by having hot standby and/or warm/cold standby infrastructure configurations. This directly impacts the “Embodied carbon” co-efficient in the above equation and hence tradeoffs have to be defined between meeting Availability SLO and allowed variance in SCI.

Latency SLOs:
Meeting latency SLOs involves either increasing the compute power allocated to the workload, spending developer cycles to fix performance issues, allocating the workload to synchronous services rather than async services that can run in energy efficient time sand also scaling the hardware required. Hence attempting to meet aggressive latency SLOs involves impacting all the co-efficient of the above equation: carbon efficiency, carbon aware and hardware efficiency.

Hence from a specification point of view, the SCI score can be integrated into the SLO examples as follows

Availability SLO example with SCI
• 90% of HTTP requests as reported by the load balancer succeeded in the last 30 day window and ensuring that the overall SCI does not go higher than x%

Latency SLO example with SCI
• 100% of requests in the last 5 minutes as measured at load balancer are served in less than 900ms and ensuring that the overall SCI does not go higher than x%

How can we monitor SCI impact?
Performance tests are great ways to measure SCI impact on SRE. Today they are used primarily to see if the application meets Service level objectives. We can add a couple of addition of performance tests (not a lot as that would mean transferring the SCI from prod environment to performance environments and cycles) to monitor for performance and adjusting the performance goal downwards (mostly!) to ensure SCI variances are not breached.

Improve Readability of Specification Methodology

Propose LaTex for our equation to improve readability.

https://www.overleaf.com/learn/how-to/Writing_Markdown_in_LaTeX_Documents
https://stackoverflow.com/questions/11256433/how-to-show-math-equations-in-general-githubs-markdownnot-githubs-blog

Collapsible markdown:
https://gist.github.com/pierrejoubert73/902cc94d79424356a8d20be2b382e1ab
Improve equation readibility
I'd like to get it to something similar to this:

How to calculate carbon intensity, do we need 24/7 hourly buckets?

The operational carbon emissions for a piece of software is the energy consumption * carbon intensity

Carbon awareness is a pillar of green software, if an application does more when there are more renewables available and less when there are less then it's reduced real-world carbon emissions. The calculation above needs to be sensitive to that kind of change to an application and the problem with most carbon emissions metrics these days is that they are not sensitive to carbon awareness, so it's hard for product teams to invest in making their application carbon aware.

The number in the above equation links to carbon awareness is carbon intensity but I don't think we can treat it as one number, I think we need to treat the above as a bucketed calculation, a weighted average.

So let's say my application runs for two hours. In the first hour I use 5 kWh in my second hour I use 45 kWh, the total energy consumption is 50 kWh.

The carbon intensity in the first hour is 2000gCO2/kWh and the carbon intensity in the second hour is 400gCO2/kWh. What is the carbon intensity number I will use in the above equation?

It's not the average of (2000 + 400)/2, that doesn't make sense. It has to be the weighted average based on the amount of energy consumed for each of those hours.

( (5 * 2000) + (45 * 4000) ) / 50 = 560gCO2/kWh

If we use an hourly weighted average then if you were to make your application more carbon aware, if you were to make it do more when there is more renewables available and less carbon intensity, then the operational carbon emissions number will go down.

Questions:

Is it ok to roughly calculate a yearly average first? The above is a hard calculation to do, I think it's ok to roughly calculate the average for CI, to begin with, but then as you invest more time into your SCI calculation move into hourly buckets.
What size buckets make sense? I think given the way organizations are moving towards 24/7 hourly matching, hourly buckets make a lot of sense. Since then improving your SCI score would directly help organizations meet their 24/7 hourly matching target.

Question: Do we have someone that works with (any) Government/regulators or those that work closely with gov/reg on policy engaged here?

Do we have someone that works with (any) Government/regulators or those that work closely with gov/reg on policy engaged here? I think it would be good to get a perspective from the other side, and at some point understanding what would incentivise and drive adoption (which would then hand over to the community WG)

I've worked over the last few years with people across these groups that work on driving policy, and with groups driving policy with policy makers and they would give quite a bit of insight. But I'm not sure if we already had someone.

Review LF build tools and provide recommendations

Following #39 - identify LF build tools, and requirements we need to adhere to, including:

Folder structure
Templating requirements
Deployment - GH actions, CICD, basis of the tech

Provide recommendations

Community contributions determination

More info @jawache - http://bit.ly/across-workshop-community-pitch - This is related to a bigger issue, around how to identify contributors - understanding explicitly that source control is not the full picture of open source, as it excludes governance, decision making, and consensus as you mentioned. https://citation-file-format.github.io/ - this file is one approach to recognize contributors without requiring people to update the repo

Originally posted by @Sealjay in #84 (comment)

Include non-linearity in the limitations/discussions section

So you make an interesting point @jawache ,in fact, I'd be shocked if the rate wasn't non-linear since the underlying mechanics are. More than a rate then perhaps we should be thinking of a model given some expected traffic pattern. This way, the end consumer can clearly see points of inflection and explore re-architecting the solution if need be when they hit a part of the curve that exacerbates their marginal carbon emissions.

Originally posted by @atg-abhishek in #3 (comment)

Actions To Reduce Carbon

use less electricity
use electricity more intelligently
use less hardware

Flagging this as future work item. @jawache can elucidate when we're ready

Software Boundary: define "significantly"

Use Case: Measuring the total carbon emissions for Windows

Getting a little bit more concrete here, I think Windows is an excellent use case to explore. Windows really shows the weakness in the GHG attributional model of emissions calculation.

In the GHG attributional model, the carbon emissions of Windows for Microsoft are JUST the carbon emissions from running Windows at Microsoft. I.e. we only count the electricity we bought to run Windows for Microsoft employees. However, Windows is run on 1 billion devices worldwide. Each individual company that uses Windows calculates its own carbon emissions from its own use of Windows. If we add them all up together we get the carbon emissions of Windows for the whole world.

But because the metric we use is the GHG attributional total for Microsoft, the Windows team is not as incentivized as it could be. If they did some work to make WIndows more energy-efficient it would benefit the whole world and every company that uses Windows, but it would only count a little bit towards Microsoft meeting its own carbon emissions targets.

In the SCI contributional model, Windows would calculate the total carbon emissions of all of Windows, worldwide, regardless of which organization is using it and who bought the electricity to run Windows. That way the Windows team is incentivized to do a lot more to improve the energy efficiency of Windows.

NOTE: The Windows team at MSFT doesn't only use the GHG attributional model to measure its carbon emissions for this very reason.

Getting to specifics.

The carbon cost of Windows = Global Energy Usage * Carbon Intensity + Embodied Carbon

Global Energy Usage

I would count this as the energy consumption of JUST the OS, since the OS developers are going to want to know how their changes impacted the total number.
If windows become more energy-efficient, this number goes down.

Carbon Intensity

This number needs to be sensitive to carbon awareness.
If the Windows team made Windows do more when there are more renewables available on the grid, then this number should go down.
E.g. if the Windows team timeshifted charging the laptop to only,

Embodied Carbon

This is the embodied carbon not of the device Windows is running on, but on the services used to support Windows.
E.g. the Windows org has a large set of services that support Windows, APIs, Updates. This is the embodied carbon of that.
Making windows hardware efficient, using fewer hardware resources to do the same job, would reduce this number.

Interesting thoughts:

For SaSS products, the embodied carbon will be higher, since you control all the hardware that is using your software.
For open-source software or installed software like Windows, the energy cost and carbon intensity will be the primary factors
If you are an open-source piece of software, how would you even calculate your total energy consumption? Windows knows it only because of deep investment into telemetry, maybe some of this will have to be modeled?

Goals/criteria for a case study to validate the SCI standard

There is an idea to validate the SCI standard using some existing software, to help us learn whether the standard is/isn't providing the intended value. If it isn't, the case study would hopefully help us identify gaps in the standard. Personally, I think we're still some way off from being able to do this effectively, however as we think about selecting a piece of software, I wanted to open a discussion on the goals of the case study. I currently see three potential potential goals:

1. Validate whether the SCI standard can be used to quantify the carbon emissions of a piece of software. E.g I have a software application running in the cloud and I want to quantify it's carbon emissions over time. Can I use the SCI standard to do this?
2. Validate whether the SCI can be used to reduce the carbon emissions from a piece of software. E.g I want to make changes to the same software application and quantify the change in emissions. Can I use the SCI standard to do this?
3. Validate whether a piece of software can be built as an implementation of the SCI standard. E.g. I want to offer a product/service/tool that allows people to quantify their carbon emissions from their software. Can I use and meet the SCI standard in doing so?

Keen to read/hear what others think, but my sense is that 2 can't be achieved without 1, and 2 brings us the most value for learning if/how the standard can help create an immediate real-world impact on the software we build. If this is true, here are some suggested criteria for deciding on a piece of software:

It's simple enough to run the software in commonly used environments (locally, the cloud, on-premise, etc)
Software is complex enough that there are possible changes (code, infrastructure, dependancies, etc) that could be made to potentially reduce carbon emissions. But also simple enough so we can understand how it works.
It's open source - so the WG can best understand the case study and the underlying software under test
It's written in one of the top X most common languages

There could be others, but thought I'd share those to start the discussion on both the goals for the case study and how we select a piece of software for it.

Disambiguate 'Marginal'

We'll need to update the SCI on terms & methodology

Currently, we have 'R' which is a baseline, also represneting 'marginal' (R)
We've also got 'location-based marginal carbon intensity (I)

Methodology Summary: remove "real-world"

This is superfluous.

Adding Building, Training, Redundancy and Failover to the boundary list (and fixed a formatting issue)

Building, training, were not in the list and I think they are important to call out specifically. Some systems have the majority of their carbon footprint is in the building/training vs operations.

Failover I feel is an ommision because we talk about "idle servers", but that may only be looked at in the scale unit. Sometimes these are also not idle, taking 1% of load to ensure they are working etc.

There was also a formatting issue at the bottom fo the boundaries bullet point listing I noticed that can be fixed.

Reference material

@atg-abhishek talked about having a place to add additional reference and background material, like the background on fuel mix that @Henry-WattTime shared today.

That could be good, at least as reference material or external sources for new joiners to the GSF.

Do we have an appropriate place outside of the specification to reference this @seanmcilroy29 ? Maybe as an addendum to the dictionary, or as background to the way we work? Slack is great for ongoing discussions, but it might be worth being able to link to sources or material that informed our viewpoint.

Characteristic: Calculation of emissions should not netted off using offsets

1g of carbon that is offset, is not equivalent to 1g of carbon that is not emitted.

If the focus of the standard is to drive behaviour change that reduces carbon emissions, that reduces the amount of carbon in our atmosphere, then allowing someone to improve their software carbon intensity number simply by purchasing an offset negates the goal.

The main challenge is that offsets are not the same, paying someone to plant a tree is not the same as paying someone to not cut down a tree which is not the same as direct air capture of carbon. However, for the purposes of offsetting emissions, they are often all treated the same and also treated the same as not emitting 1g of carbon in the first place.

The specification MUST not allow offsets to be used to neutralise the carbon emissions of the hardware the software uses.

If you are connected to a grid, renewables are an offset for electricity consumption. Not using 1kWh of electricity, is better than using 1kWh of electricity consumed from the grid and then offset with renewables. We need to measure the software energy consumption since that is the thing we want to reduce, not the software energy consumption offset with renewables.

So again, the specification MUST not allow renewable offsets (in the form of PPA, RECs etc..) to be used to neutralise the carbon emissions from the electricity the software uses.

Improve phrasing of problem statement

The phrasing is vague in many places and needs clarity prior to release for public consumption. Example:
"The purpose of this specification will be to enable standardization across industry empowering individuals and organizations to make more informed choices in the software solutions that they pick. " - "more informed choices in the software solutions that they pick" - what kind of choices are these? What software solutions are meant here?

Badge for Conformance

"Conformance" for the SCI

Should there be different badges for different levels of conformance with the SCI?

Methodology section: poor phrasing in statement about carbon intensity

"Electricity has a carbon intensity depending on where and when it is consumed" -
This is wrong: carbon intensity is related to the generation of electricity.

Methodology section: define "marginal rate" and "growth of a software product"

Also, in "A rate provides you with helpful information when considering the growth of a software product and allows for the computation of a marginal rate." - be clearer about "helpful". How is it helpful?

Finally, regarding "computation" - is this the same as calculation?

Readability: Intro & Executive Summary

Intro & executive summary
1. bump the 'core characteristics' section to the top for readability
2.Incorporate comments from 'getting started' discussion with core characteristics section in the spec
simplified SCI equation at the front (similar to the slide deck that Asim showed)
references/dictionary at the bottom
delete empty 'boundaries' section

Baseline: Do we expose (`C`) AND (`CI`)?

C = total carbon emissions per piece of software
CI = carbon emissions per unit baseline

This is a root issue per @Henry-WattTime. Some open questions around:

What do we do if only one piece is available?
What if C goes up, but CI goes own?

Regional Averages vs. Granular Carbon Intensity: What Is The Delta? (`I`)

For carbon intensity calculations, is a regional average sufficient, or do we need the real-time carbon intensity of that electricity source?

Question: What do we gain/lose by accounting with these two methods?

e.g. for a given datacenter region, perhaps location-based carbon intensity

Potential application: carbon accounting to the real-time information to estimate the potential error introduced by the delta in averages (regional averages vs. granular).

Data needed: hourly demand curve for the source (e.g. Azure, windows) to appropriately capture the seasonality

Proposal/expected outcome of the analysis: a comparison of the averages currently used for Company carbon accounting and real-time data sources (WattTime, ElectricityMap) using historical demand curves (Azure/Windows) that capture seasonality. We hope this will better capture progress and highlight paths toward a consistent methodology.

Rephrase "Software Sustainability Actions"

"All actions that can reduce the carbon emissions of a piece of software fall into one of three categories." This statement needs justification. I don't think is it necessary to make such a strong claim for the purpose of this section.

I suggest softening it to something such as:
"The SCI specification intends to encourages the reduction of carbon emission from software through the actions from these three categories: ... "