softsys4ai / unicorn

A Framework for Reasoning about System Performance using Causal AI

License: MIT License

Python 98.20% Shell 1.72% Dockerfile 0.07%
causal-inference causality machine-learning optimization performance-analysis performance-testing performance-tuning systems

unicorn's People

Contributors

iqbal128855, majavid, pooyanjamshidi, rahlk

unicorn's Issues

Evaluation of Source Environments

Determine the transfer learning pipeline, including the following:
--- How good is the source modeling?
--- How much updating is needed?
--- Explainability: what changes across environments?
--- Experiments with different source budgets

Bootstrap sampling

  1. Sample with replacement X% of the data
  2. Build a causal model
  3. Sample another X% (with replacement) and build a causal model
    - How to update the edges' weight thresholds?
    - Keep an edge between two nodes when it appears in more than Y% of the models and does not violate constraints
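The steps above can be sketched in Python. This is a minimal sketch: `build_causal_model` is a hypothetical callback standing in for the actual structure-learning call, and the X%/Y% thresholds from the list are exposed as parameters; constraint checking would filter the returned edge set further.

```python
import random
from collections import Counter


def bootstrap_edges(rows, build_causal_model, frac=0.8, rounds=10, keep_ratio=0.6):
    """Bootstrap causal discovery: resample with replacement, rebuild the
    model each round, and keep only edges that recur often enough.

    `build_causal_model` is a hypothetical callback that takes a sample of
    rows and returns a set of directed edges, e.g. {("batch_size", "latency")}.
    """
    counts = Counter()
    k = int(frac * len(rows))  # X% of the data per round
    for _ in range(rounds):
        sample = [random.choice(rows) for _ in range(k)]  # with replacement
        counts.update(build_causal_model(sample))
    # keep an edge only if it appeared in more than keep_ratio of the rounds
    return {edge for edge, c in counts.items() if c / rounds > keep_ratio}
```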

Update Causal Structure Learning Algorithm.

-- Use FCI with the entropic approach to resolve edges.
-- Break down the computation effort required for causal structure discovery, computing path causal effects, computing individual treatment effects, and measuring recommended configurations.
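One simple way to break down these efforts is to time each pipeline stage separately. A minimal sketch, where the stage names follow the list above and the stage bodies are placeholders for the actual calls:

```python
import time
from contextlib import contextmanager


@contextmanager
def stage(name, timings):
    """Record wall-clock time for one pipeline stage into `timings`."""
    start = time.perf_counter()
    yield
    timings[name] = time.perf_counter() - start


timings = {}
with stage("structure_discovery", timings):
    pass  # run FCI / entropic edge resolution here
with stage("path_causal_effects", timings):
    pass  # compute path causal effects here
with stage("individual_treatment_effect", timings):
    pass  # compute individual treatment effects here
with stage("measure_recommended_configs", timings):
    pass  # measure recommended configurations here
```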

Policies for handling edge-type mismatches

What are the policies?

  • Bi-directed & no-edge → we get a confidence score; use whichever edge direction has the higher confidence.
  • Un-directed edge & no-edge → no edge
  • Tail has a bubble and head has arrow → keep the directed edge and remove the bubble
  • No-edge & edge → edge
  • No-edge & no-edge → no-edge
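A sketch of how these rules might be encoded. The edge-mark strings ("-->", "<->", "---", "o->", `None` for no edge) and the per-direction `confidence` map are hypothetical encodings chosen for illustration, not Unicorn's actual representation:

```python
def resolve(e1, e2, confidence=None):
    """Resolve a mismatch between two learned edge marks for the same pair.

    Hypothetical encoding: "-->" directed, "<->" bi-directed, "---"
    un-directed, "o->" bubble tail with arrowhead, None for no edge.
    """
    pair = {e1, e2}
    if pair == {"<->", None}:
        # bi-directed & no-edge: pick the direction with the higher
        # confidence score (hypothetical {"a->b": s1, "b->a": s2} map)
        return max(confidence, key=confidence.get)
    if pair == {"---", None}:
        return None          # un-directed & no-edge -> no edge
    if pair == {None}:
        return None          # no-edge & no-edge -> no-edge
    if "o->" in pair:
        return "-->"         # bubble tail, arrow head -> keep the directed edge
    if None in pair:
        (other,) = pair - {None}
        return other         # no-edge & edge -> edge
    return e1                # no mismatch: keep as-is
```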

What do the edge marks mean?

Bubble/un-directed edge → selection variables
Bi-directed edge → hidden variables

When are the policies applied?

  1. Case 1 (greedy): apply the above rules at every step.
    • At each iteration there is a DAG (say DAG_t, DAG_t-1, ...).
    • If there are conflicts, count how many times each variant (a->b, b->a, a--b) appears and use the one with the maximum count.
  2. Case 2: apply the rules once at the end.
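Case 1's counting rule can be sketched as follows. The `(src, dst, kind)` triple encoding of a DAG's edges, with kind in `{"->", "--"}`, is an assumption made for illustration:

```python
from collections import Counter


def majority_orientation(dag_history):
    """Across iterations (DAG_t, DAG_t-1, ...), count how often each
    orientation of a node pair appears and keep the most frequent one.

    Each DAG in `dag_history` is an iterable of (src, dst, kind) triples.
    """
    counts = Counter()
    for dag in dag_history:
        for src, dst, kind in dag:
            counts[(src, dst, kind)] += 1
    # for each unordered pair, keep the variant with the maximum count
    resolved = {}
    for (src, dst, kind), c in counts.items():
        pair = frozenset({src, dst})
        if pair not in resolved or c > resolved[pair][1]:
            resolved[pair] = ((src, dst, kind), c)
    return {edge for edge, _ in resolved.values()}
```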

Run Scalability experiments with Facebook DLRM systems.

--- Performance analysis of the Facebook DLRM systems with different configurations. Show how difficult it is to debug misconfigurations in real-world production systems and discuss the challenges. Discuss the richness of the performance landscape (more complex behavior).
--- Run CAUPER, BugDoc, SMAC, DeltaDebugging, Encore, and CBI on the DLRM fault dataset and evaluate against the ground-truth dataset for both single- and multi-objective performance faults.
--- Show proof of CAUPER's scalability in the Facebook DLRM system, where configuration options can take a large number of allowable values.
--- Write up the evaluation of the Facebook DLRM systems. Analyze along three slices: latency, energy, and heat.

Structure Learning

Enrich the causal models with a Functional Causal Model (FCM) using CGNN and work on visualization for the FCM.
Update the causal model with the Causal Interaction model and compare it with CGNN.
Compare CGNN, FCI (with entropic calculation), and the Causal Interaction model.
If we use CGNN, we need to find the correct strategy:
--- How do we find the initial skeleton?
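One candidate strategy, shown only as an illustration and not as the project's chosen approach: seed CGNN with a skeleton built from pairwise correlations, connecting two variables when their absolute Pearson correlation exceeds a threshold.

```python
import math


def correlation_skeleton(columns, rows, threshold=0.3):
    """Build an undirected skeleton by thresholding |Pearson correlation|.

    `rows` is a list of dicts mapping column name -> numeric value.
    Returns a set of frozensets, one per connected pair.
    """
    def pearson(xs, ys):
        n = len(xs)
        mx, my = sum(xs) / n, sum(ys) / n
        cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
        vx = math.sqrt(sum((x - mx) ** 2 for x in xs))
        vy = math.sqrt(sum((y - my) ** 2 for y in ys))
        return cov / (vx * vy) if vx and vy else 0.0

    skeleton = set()
    for i, a in enumerate(columns):
        for b in columns[i + 1:]:
            xs = [r[a] for r in rows]
            ys = [r[b] for r in rows]
            if abs(pearson(xs, ys)) > threshold:
                skeleton.add(frozenset({a, b}))
    return skeleton
```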

Questions regarding offline mode and entropy-based orientation

Dear experts of Unicorn,

Thanks a lot for open-sourcing this excellent research. I have read your EuroSys paper and learned a lot! I just have two questions regarding the codebase due to my lack of knowledge:

  1. If I understand correctly, the offline mode of Unicorn debugging experiments cannot be reproduced for all the test scenarios (i.e., hardware+software). In particular, I can only find the measurement.json file under the Single Objective, Image folders of TX2 and Xavier. In the other debug directories, there is only the data.csv file. Could you give me some hints to reproduce the offline experiments for the other scenarios?
  2. I am really interested in the entropy-based edge orientation approach, detailed in Sec. 4 of the EuroSys paper (i.e., resolving partially directed edges). However, I failed to find the corresponding algorithm in the codebase. For instance, in the resolve_edges method of causal_model.py, the edges seem to be oriented based on fixed rules without involving entropies.
    # replace trail and undirected edges with single edges using entropic policy
    for i in range(len(PAG)):
        if trail_edge in PAG[i]:
            PAG[i] = PAG[i].replace(trail_edge, directed_edge)
        elif undirected_edge in PAG[i]:
            PAG[i] = PAG[i].replace(undirected_edge, directed_edge)
        else:
            continue

    for edge in PAG:
        cur = edge.split(" ")
        if cur[1] == directed_edge:
            node_one = self.colmap[int(cur[0].replace("X", "")) - 1]
            node_two = self.colmap[int(cur[2].replace("X", "")) - 1]
            options[node_one][directed_edge].append(node_two)
        elif cur[1] == bi_edge:
            node_one = self.colmap[int(cur[0].replace("X", "")) - 1]
            node_two = self.colmap[int(cur[2].replace("X", "")) - 1]
            options[node_one][bi_edge].append(node_two)
        else:
            print("[ERROR]: unexpected edges")

I did find a function that computes the entropy for the EnCore method in debugging_based.py, but maybe it is not the same. Could you point me to the right location of the entropy-based method that orients the undetermined edges of FCI? Thanks in advance!

Best regards,
Tianzhu

Run MLPerf benchmark with Facebook DLRM.

Run the MLPerf benchmark with Facebook DLRM on different hardware (Jetson Xavier and TX2, possibly on a GPU cloud). Change the software (RMC1, RMC2, and RMC3) and the workload (single-stream, multi-stream, and offline; varying the number of inference queries).

Run MLPerf benchmark for NLP.

Run the MLPerf benchmark with Google BERT + the SQuAD 1.1 dataset on different hardware (Jetson Xavier and TX2, possibly on a GPU cloud).
