py-why / causal-learn Goto Github PK

View Code? Open in Web Editor NEW

1.1K 16.0 181.0 27.51 MB

Causal Discovery in Python. It also includes (conditional) independence tests and score functions.

Home Page: https://causal-learn.readthedocs.io/en/latest/

License: MIT License

Python 98.46% Jupyter Notebook 1.54%

causal causal-discovery causal-inference causality python graph structure tetrad time-series hidden-causal

causal-learn's People

Contributors

Stargazers

Watchers

Forkers

kunwuz shuowang-ai lijush raegher ziatdinovmax kenneth-lee-ch lindaweijiasun biwei-huang nonmean ccxu-ustc jingmouren uclwilson snowgy allensmile zhangc927 minghao2016 felix660 vincentwei2021 zhichao-li sq206 wxyhhhhh323 molimomo lzu-cvpr waitalone falconerchen huaminginagora ggzhang0071 huoniu ziyeejiang hhmy27 zclllyybb phage-gp zren111 xihuishawpy jiangjao lipi12q markdana kildallithro 1587causalai iriszyy scorpiobao zhi-yi-huang aahmadai chunlei-ren cassie230 wean2016 huangmingzhi kaihcohrs musfiqshohan horizonailab duanzi019 vanaja-artefact xrosliang dumari shuyanw zachwooddoughty jarodyv turuibo ldtenacity liyi722 muz1lee t-triobox black-swan-icl shangulaike hlzhang109 zhenlanji yzbrlan howardhuang98 sera91 chihyuanchiu vishalbelsare saraalrawi ikeuchi-screen wucutin erdungao tofuwen jinanzou aoqiz py-zhai cogito233 delacylab daryabiparva9 leizan yu-jian-hua zhangleiray007 dingchenwei huangzhongyu xavierwong charonwangg tmacmilan ariktan rayuelaproject huiouyang16 asm-def briziorusso jinshi201 fengxie009 14110951d0 gg-big-org hejinyang123

causal-learn's Issues

Unresolved GIT confict

I just checked out CL from the repository and noticed there is an unresolved conflict in

https://github.com/cmu-phil/causal-learn/blob/main/tests/TestANM.py

Here:

<<<<<<< HEAD
test.test_anm_simul()
test.test_anm_pair()

test.test_anm_pair()

8443c49

Add required edges by background knowledge

Hi, Thanks for your great work.

I hope to add an edge that is not detected during the PC skeleton phase according to background knowledge.
I read the codes in causal-learn/causallearn/utils/PCUtils/BackgroundKnowledgeOrientUtils.py and it looks like only edges already detected in the skeleton phase can be defined as 'required'.

Is there any method to solve my problem?
Can I direct add the following lines in BackgroundKnowledgeOrientUtils.py?

def orient_by_background_knowledge(cg: CausalGraph, background_knowledge: BackgroundKnowledge):
    """
    orient the direction of edges using background background_knowledge after running skeleton_discovery in PC algorithm

    Parameters
    ----------
    cg: a CausalGraph object. Where cg.G.graph[j,i]=1 and cg.G.graph[i,j]=-1 indicates  i -> j ,
                    cg.G.graph[i,j] = cg.G.graph[j,i] = -1 indicates i -- j,
                    cg.G.graph[i,j] = cg.G.graph[j,i] = 1 indicates i <-> j.
    background_knowledge: artificial background background_knowledge

    Returns
    -------

    """
    if type(cg) != CausalGraph or (type(background_knowledge) != BackgroundKnowledge and type(background_knowledge) != CustomBackgroundKnowledge):
        raise TypeError(
            'cg must be type of CausalGraph and background_knowledge must be type of BackgroundKnowledge. cg = ' + str(
                type(cg)) + ' background_knowledge = ' + str(type(background_knowledge)))
    for edge in cg.G.get_graph_edges():
        if cg.G.is_undirected_from_to(edge.get_node1(), edge.get_node2()):
            if background_knowledge.is_forbidden(edge.get_node2(), edge.get_node1()):
                cg.G.remove_edge(edge)
                cg.G.add_directed_edge(edge.get_node1(), edge.get_node2())
            elif background_knowledge.is_forbidden(edge.get_node1(), edge.get_node2()):
                cg.G.remove_edge(edge)
                cg.G.add_directed_edge(edge.get_node2(), edge.get_node1())
            elif background_knowledge.is_required(edge.get_node2(), edge.get_node1()):
                cg.G.remove_edge(edge)
                cg.G.add_directed_edge(edge.get_node2(), edge.get_node1())
            elif background_knowledge.is_required(edge.get_node1(), edge.get_node2()):
                cg.G.remove_edge(edge)
                cg.G.add_directed_edge(edge.get_node1(), edge.get_node2())

    # custom change to add required edges
    for node1 in cg.G.get_nodes():
        for node2 in cg.G.get_nodes():
            if not cg.G.is_undirected_from_to(node1,node2):
                if background_knowledge.is_required(node1,node2):
                    cg.G.add_directed_edge(node1,node2)
                elif background_knowledge.is_required(node2,node1):
                    cg.G.add_directed_edge(node2,node1)

Is there an implementation of Degenerate Gaussian?

In the ReadMe.md, it says causal-learn is a Python translation and extension of Tetrad. So, is there an implementation of Degenerate Gaussian(DG) algorithm, which is published in a paper named "Learning High-dimensional Directed Acyclic Graphs with Mixed Data-types"? As in Tetrad program, it will always use CG or DG algorithm to manipulate mixed data first. Thank you for reply.

granger2d should work for more than 2d data

Problem: granger_test_2d will throw an error if applied to data with more than 2 variables (d > 2).

I think that probably without the check below, the granger_test_2d method in the Granger class will work over data with more than 2 variables.

https://github.com/cmu-phil/causal-learn/blob/6f297e743c462b4a2f8191250ea9ac03c288abf0/causallearn/search/Granger/Granger.py#L43

If this behavior is wanted maybe documentation should be changed, as doc is now it states that granger_test_2d will return matrices of pvalues and adjacency and it accept data (nxd) as input.

Generalized score with mixed data

hello,

i used generalized score to do GES on mixed data, which contains float and string types of data. But somehow it failed.
Is there any example for this situation?

Best Regards
Yuqiang

How to Visualize Casual Graph?

Awesome work!
I just try to use TestDirectLiNGAM to get the casual graph on public dataset. And the result is interesting! Thank you for your work~ I am interested in visualizing the casual graph I got ( TestDirectLiNGAM method produce a adjacency_matrix_). Is there any operation to get a png which visualize the graph?
Aftering reading docs, i notice there is some Graph operations but no visualize2Graph Operation :(

Passing domain knowledge

I am planning to get rid of Java dependencies in cause2e by replacing py-causal with causal-learn for the discovery step.

However, my applications require passing domain knowledge in the form of required or forbidden edges in the causal graph. Py-causal and Tetrad have a great interface for domain knowledge. Will this be included in causal-learn, too? In the docs, I have only found possibilities for LiNGAM-type models, but not for GES or PC.

Thanks for finally translating Tetrad to Python!

Please clarify how mixed data are to be represented in causal-learn.

This is a request for information. For the mixed data project, could you clarify how mixed data are to be represented in causal-learn? That is, how is one to know, programmatically, which columns are for discrete data and which for continuous data? This cannot be gleaned from an np array itself, since binary data, for instance, can be treated as either continuous (with values 0 and 1) or discrete, and ordinal discrete data may often be treated as either continuous or discrete as well.

CD-NOD with independent change principle.

Hi, thanks for your great work.

I hope to use CD-NOD phase 3, i.e., identifying directions with independent change principle.

However, I notice it is still not implemented (a TODO comment).

When do you plan to release these codes?

Many thanks.

Add cache for local score function for efficiency.

Local score is calculated repeatly during search.

Make the function def local_score_BDeu(Data: ndarray, i: int, PAi: List[int], parameters=None) -> float: cachable. Strongly recommend to use functools.lru_cache, that is a efficient python built-in decorator. But it requires all parameters are hashable, currently they are not, e.g. ndarray.

A commonly solution is to create a Score class to store data, and local score function only takes i, and parents tuple (now it is a list).

Deleting edges takes a very long time, I came up with a solution.

I was running this program (newest version from a few days ago) on a dataset of roughly 6000 x 800 matrix and found in-between depth levels the program was taking a long time to get to the next depth level. I extrapolated and found this was going to take 331 days for just one depth level. I studied the code and eventually traced the bottleneck to GeneralGraph.remove_edge(edge1), which rebuilds the entire edge list in the final line self.reconstitute_dpath(self.get_graph_edges()). get_graph_edges is the problem as it traverses all of self.graph to build an edge list.

I fixed the problem by the following: create new functions remove_edge_only(self, edge: Edge) which is the same as remove_edge except it omits the last line self.reconstitute_dpath(self.get_graph_edges(). I also added a function
def clear_dpath(self):
self.dpath = np.zeros((self.num_vars, self.num_vars), np.dtype(int))

Then in SkeletonDiscovery.py:136, I call remove_edge_only instead of remove_edge. At the end of that loop, I call
cg.G.clear_dpath()
cg.G.reconstitute_dpath(cg.G.get_graph_edges())
so the edge list is rebuilt only once at the end of all edge deletions instead of after every individual edge deletion. This brought the runtime down to seconds and produced identical causal images on smaller datasets. Since this isn't my project, is there some kind of approval process I need to upload suggested changes like this to or is this something you all would want to implement yourselves?

Can't add a new node to GeneralGraph after deleting edges

If I attempt to add a new node to a GeneralGraph using add_node after calling the method remove_nodes, I get the following error:

ValueError: all the input array dimensions for the concatenation axis must match exactly, but along dimension 1, the array at index 0 has size 10 and the array at index 1 has size 5.

The more detailed error message:

Traceback (most recent call last):
  File <PATH_TO_SCRIPT>, line 20, in <module>
    graph.add_node(GraphNode("X11"))
  File "<PATH_TO_PROJECT>/ctf_venv/lib/python3.9/site-packages/causallearn/graph/GeneralGraph.py", line 205, in add_node
    dpath = np.vstack((self.dpath, row))
  File "<__array_function__ internals>", line 180, in vstack
  File "<PATH_TO_PROJECT>/ctf_venv/lib/python3.9/site-packages/numpy/core/shape_base.py", line 282, in vstack
    return _nx.concatenate(arrs, 0)
  File "<__array_function__ internals>", line 180, in concatenate

Minimal example to reproduce:

# Generate a graph with 10 edges
nodes = [GraphNode(f"X{i}") for i in range(1, 11)]
graph = GeneralGraph(nodes)
for i in range(9):
    graph.add_edge(Edge(nodes[i], nodes[i+1], Endpoint.TAIL, Endpoint.ARROW))
print(graph)

# Delete the first 5 nodes
nodes_to_remove = nodes[:5]
graph.remove_nodes(nodes_to_remove)

# Try to add a new node (X11)
graph.add_node(GraphNode("X11"))
print(graph)

It appears that self.dpath is not being updated correctly upon calling remove_nodes. If you compare its dimensions to self.graph you can see that it has not deleted the paths involving the removed nodes. For example, in the above example we removed 5 nodes but if we call print(graph.graph.shape, graph.dpath.shape), we get the following: (5, 5) (10, 10).

QUESTION: To get DAG/Graph from FCM-based methods

With FCM-based methods, for instance DirectiINGAM, how do we get DAG so that we can compare and contrast the structure with other methods such as PC,FCI, GES etc? Are there any utilities?

from causallearn.search.FCMBased import lingam
model = lingam.DirectLiNGAM()
model.fit(data)

print(model.causal_order_)
print(model.adjacency_matrix_)

Thanks

Typo.

"Constrained-based" in the README should be "Constraint-based".

Need reference for PCMAX (default setting for PC in CL apparently)

After some sleuthing in the Python code and some performance testing with the help of Pablo Puig and Bryan Andrews, I've come to the conclusion that the default setting for PC in CL is maxP. That is an algorithm of mine called PCMAX. The only reference for for this algorithm currently in the literature that I know of is an arXiv tech report I put out in 2016, this one:

Ramsey, J. (2016). Improving accuracy and scalability of the pc algorithm by maximizing p-value. arXiv preprint arXiv:1610.00378.

Anyway, the performance of PC in CL is the same as the performance of PCMAX in Tetrad (and different from the performance of PC in Tetrad.)

If this is going to be made the default setting for PC for CL, perhaps this tech report should be referenced for the PC algorithm in the documentation? Otherwise no one will know what the algroithm is. I'll see if I can't sent that to a conference somewhere to get it published. I didn't realize that was being made the default here.

FAS does not use required edges anymore

Hi,

Since the new commit on Fas.py, the required edges are not used anymore to create the graph. May I ask why and what the benefit of this is? I made a program for which the required edges were essential, so I am very curious 😅

Kind regards,
Vera

Question: Clarification on Edge properties from FCI algo

In the docs the edge properties are explained as:

"""
edges: list. Contains graph’s edges properties. If edge.properties have the Property ‘dd’, then there is no latent confounder. Otherwise, there might be latent confounders. If edge.properties have the Property ‘nl’, then it is definitely direct. Otherwise, it is possibly direct.
"""
The abbreviations seem odd to me. Unfortunately, I can't find the original paper on the internet free and I can't find another reference to these abbreviations. Consequently, I'm inquiring here for clarification.

Is it possible that dd and nl have been switched? I.e. dd means definitely direct and nl means no latent?

Additionally, there is no explanation fro pl. Does it mean possible latent confounders?

I'm guessing that pd means possibly direct.

Thanks for the clarification!

score_g function

I am not sure if the function "score_g" in causallearn.utils.GESUtils is designed to calculate the score of a causal graph given a specific dataset. But when I used it, I found when parameter "PA" is not empty list, it comes to an error "IndexError: only integers, slices (:), ellipsis (...), numpy.newaxis (None) and integer or boolean arrays are valid indices".
I think the reason of the problem is the “PA” given by "score_g" does not match the parameter (PAi) request of local_score_BIC.
Much appreciate for your response.

ImportError: cannot import name 'GIN' from 'causallearn.search.FCMBased'

from causallearn.search.FCMBased.GIN.GIN import GIN
G, K = GIN(data)

ImportError Traceback (most recent call last)
/var/folders/8b/hhnbt0nd4zsg2qhxc28q23w80000gn/T/ipykernel_52081/3662974966.py in
----> 1 from causallearn.search.FCMBased import GIN
2 G, K = GIN(data)

ImportError: cannot import name 'GIN' from 'causallearn.search.FCMBased' (/Users/datalab/github/causal-learn/causallearn/search/FCMBased/init.py)

PC learns different graphs dependent on the ordering

Hi,

Thank you for your great work on this package!
I am testing the behaviour of the PC algorithm on simple simulated data. I found that the number of directed edges detected differs based on the ordering of the variables given to the algorithm.

I am running the following code:

import numpy as np
import matplotlib.pyplot as plt
from causallearn.search.ConstraintBased.PC import pc

def simulate_data(n_obs):
    '''
    Simulate data from the following graph

        A       B
         \     /
          v   v
            C
          /   \
         v     v
        D       E
    '''
    A = np.random.normal(size = n_obs)
    B = np.random.normal(size = n_obs)
    C = A + B + np.random.normal(size = n_obs)*0.25
    D = C + np.random.normal(size = n_obs)*0.25
    E = C + np.random.normal(size = n_obs)*0.25
    return np.stack((A,B,C,D,E), axis =1)

# generate data
n = 10000
data = simulate_data(n)

# test different permutations
permutations = [[0,1,2,3,4], [0,1,3,2,4]]

for permutation in permutations:
    graph = pc(data[:,permutation], 0.05, 'fisherz', node_names = np.array(["A","B", "C", "D", "E"])[permutation],  verbose = False)
    graph.draw_pydot_graph(labels=np.array(["A","B", "C", "D", "E"])[permutation])
    plt.show()

When the ordering of variables C and D is permuted, the PC algorithm returns the Graph with an undirected edge from C to D. However, when the ordering is unpermuted, the PC algorithm correctly directs the edge from C to D. This happens, even though the p-values in the CI tests are unchanged for the two permutations.
Is this expected behaviour or can you help me fix this?

Mixed data example

May I know if there are any examples of causal-learn usage on mixed data (categorical and continuous/ordinal)? This is supported in tetrad. Many thanks!

Using background knowledge makes FCI algorithm slower

Hi!
I would like to use the FCI algorithm with background knowledge, but I have noticed that the computation speed with the background is much slower than the computation speed without passing the background knowledge to FCI (or PC). I work with about 300 variables and there are not many dependencies (about 1,2 or 3) between them. In my background knowledge there are a lot of forbidden edges and some required edges. As far as I understand the FAS function, the amount of forbidden edges will reduce the adjacency list by a lot and also the separating set will be smaller than the version without the background knowledge.

I cannot find a explanation for the increase of computation speed, what am I missing?

Thanks in advance!

multi-threading support

I found algorithm takes a long time to run, maybe should add multi-threading support

Derivation of the Fisher-z independence test

Hi, I would like to understand where the derivation of the Fisher-z test comes from. I have tried to derive it from Wikipedia (https://en.wikipedia.org/wiki/Partial_correlation), since that source is the most elaborating. I think I understand everything, but I am not very sure about why the inverse of the matrix is used. Is there any documentation available to read into?

Kind regards,
Vera

memory consumption

On my dataset (~3.6w, ~40 feats), I found it may use over 40GB memory, ( and I have not get it end successfully)
Maybe it needs to use some db to reduce max memory usage?

[RFC, META-ISSUE] Complete continuous integration (CI) for unit-testing, documentation, and test coverage

Hi, this is a meta issue to track the items required to bring causal-learn into a more PRable repository:

This list is to track the high-level issues that need to be resolved in a series of PRs. Each may have some description as well associated regarding what needs to be done in detail, what the end result should look like and some motivation. My recommendation is using a combination of GH actions and circleCI as this has worked well in the past for me, but we can change based on discussion.

Implement easy-to-use build, test, docs, and formatting commands, such as https://github.com/py-why/dodiscover/blob/b0cb5c48317cf1c6859b3dc9646925fdac3ecf1f/pyproject.toml#L112-L122. These will be used to ease developer workflows and also make CIs easier to run.
Implement different requirement dependency groups e.g. for building, testing, doc-building and actually running of causal-learn
Enable CI that runs all unit-tests in each PR and commit to main. Should also upload test coverage via GH actions
Enable CI that builds (installs) causal-learn in each PR and commit to main. Testing installation for Windows, Mac OSX and Linux via GH actions.
Enable CI that builds the documentation in each PR and commit to main via circleCI
Add basic templates to GH issues and PRs: Can copy and modify files from https://github.com/py-why/dodiscover/tree/main/.github

Some possible issues that may need to be resolved along the way:

Ensure unit-tests run in a reasonable amount of time. I'm not sure how long the unit-tests takes to run currently, but if it requires some intensive computations, then it might be beneficial to refactor to make the unit-tests faster
Ensure docs are built in a reasonable amount of time. Examples should be self-contained and short to illustrate a point. If large dataset is needed, one can always trim the dataset and then add a note saying why we did it.

Some implementation details related to CI:

Copy GH action workflow from https://github.com/py-why/dodiscover/blob/main/.github/workflows/main.yml.
Enable CircleCI (need an admin) and copy over circleCI artifact redirector for building docs https://github.com/py-why/dodiscover/blob/main/.github/workflows/circle_artifacts.yml
Copy CircleCI workflow from https://github.com/py-why/dodiscover/blob/main/.circleci/config.yml

Still a WIP to get this list fully flushed out, but here's a first go at it.

Add Error Handling for cit.py line 165/166

Hi, I experienced a ValueError: math domain error on this line from log as the value of (1 + r) / (1 - r) was negative.
Please see the screenshot below and the value of r from the debugger.
I'm not sure why this might be happening, but it would be good to catch potential errors here :)

the direct of two nodes in adjacency matrix which is result of ICALiNGAM

how to get the right graph direction between tow nodes? for example:

causal_order_ = [3,0,2,1,4,5]
adjacency_matrix =
[[0, 0, 0, 3, 0, 0]
[3, 0, 2, 0, 0, 0]
[0, 0, 0, 6, 0, 0]
[0, 0, 0, 0, 0, 0]
[8, 0, -1, 0, 0, 0]
[4, 0, 0, 0, 0, 0]]
how can I get the right directed graph?

thank you

FCI algorithm with KCI testing method

Hello,

I am testing the fci algrithm and its defaul independence testing method is 'fisherz'. I tried to use the 'kci' testing method. But it shows:

TypeError: '(slice(None, None, None), [0])' is an invalid key.

So, is there any example about the fci algorithm with kci method? Thank you.

Please provide developing guidelines for outside contributors

When I tried to pass unittests in tests/, I got confused about how maintainers collect so many test cases.
Finally, I executed python -m unittest discover -p "Test*.py" under tests/, which I hope is the correct usage.

The tests first failed due to the lack of pygam (required by CAMUV) and torch (required by PNL), as I executed python setup.py install to install the requirements.
I thought that a simple patch in setup.py could solve this problem. But WAIT! What if the maintainers just want to make big packages like torch optional? It is so nice of you to think like that.
However, it would be better to document such packages somewhere, like in README.md or setup.py.

A general advice is to provide more information about how to participate in the development of this project in README.md. It would be better to provide executable commands in continuous integration configuration.

What's more, could the developing guidelines of this repository, if it already exists, change a little bit?
For example, print is widely used in this project. I find it annoying as a user like me cannot disable those messages. Could this project use the logging package with the DEBUG level instead?

Import Error during using

I am a novice in this field, seek for help...

Operation:
pip install causal-learn(sucessfully)
touch a new file test.py(sucessfully)
copy testPC.py to test.py(sucessfully)
python3 test.py (encounter Error ：

(main) ➜  Test git:(main) ✗ python3 testC.py 
Traceback (most recent call last):
  File "testC.py", line 8, in <module>
    from causallearn.utils.cit import chisq, fisherz, gsq, kci, mv_fisherz, d_separation
ImportError: cannot import name 'd_separation' from 'causallearn.utils.cit' (/Users/sheshuaijie/app/miniconda3/envs/main/lib/python3.8/site-packages/causallearn/utils/cit.py)

Any Suggestion will be greatly appreciate!!!!

Passing required domain knowledge using add_required_by_node

Hello,
I'm new in causal discovery, and I'm not sure if it's an issue in the causal-learn package or if I'm using wrongly the add_required_by_node function.
I first built a graph based on data (without any knowledge), let's call it G1.
Then I wanted to add required edges based on domain expertise to create a new graphe G2.
The expected result : G2 will contain all directed edges defined by domain knowledge.
The observed result : G2 contains the directed edge (if the edge was already present in G1), otherwise the directed edge is not created.

Below the details :

data:
First I create a graph without any knowledge (G1), using this code :

from causallearn.search.ConstraintBased.PC import pc
      data_np = data.to_numpy()
      cg = pc(data_np)

below the result :

Second I wanted to add domain knowledge rule
rules = [
```
  ("a", "b", "required"), #a->b
  ("c", "d", "required"), #c->d
  ("b", "c", "required"), #b->c
```
]
I used add_required_by_node and add_forbidden_by_node methods to add either required or forbidden edges and to create the graph G2, using this code:

forbidden_edges = []
required_edges =  [(0, 1), (2, 3), (1, 2)] # {"a":0, "b":1, "c":2, "d":3}
data_np = data.to_numpy()
cg = pc(data_np)
nodes = cg.G.get_nodes()

bk = BackgroundKnowledge()
for (node_1_idx, node_2_idx) in forbidden_edges:
    bk.add_forbidden_by_node(nodes[node_1_idx], nodes[node_2_idx])

for (node_1_idx, node_2_idx) in required_edges:
    bk.add_required_by_node(nodes[node_1_idx], nodes[node_2_idx])

cg_with_knowledge = pc(data_np, background_knowledge=bk)

below the result :

While I was expecting more something like this (results based on another package):

Orientation rules for FCI

Hello,

Are the edge orientation rules for the FCI implementation taken from Zhang et al (2008)? This paper introduces additional orientation rules and proves their completeness. I am wondering if the entire set of orientation rules required for completeness are implemented in this package.

Zhang, J. On the completeness of orientation rules for causal discovery in the presence of latent confounders and selection bias. Artificial Intelligence 172, 1873–1896 (2008).

If not, from which citation(s) are the orientation rules taken?

Thanks!

Using scoring function 'local_score_marginal_multi' on ges() function gives error

When I call the ges() function with the score function local_score_marginal_multi with no parameters, it gives the following error: File "lib\site-packages\causallearn\score\LocalScoreFunction.py", line 824, in local_score_marginal_multi X = Data[:, parameters["dlabel"][Xi]] KeyError: 0

It looks like the labels of the parameters are not correctly parsed. In the LocalScoreFunction.py Xi is an int, but the parameters["dlabel"] have that the keys are strings. However, fixing this problem locally gives other errors, so I am not sure what goes wrong further.

The same applies when using the function with the score functionlocal_score_CV_multi

Facing problem in SHD estimation

Hi. I am getting the following KeyError whenever I try to estimate the SHD using the true graph and estimated graph.

Traceback (most recent call last): File "test_GES_metrics.py", line 25, in <module> shd = SHD(truth_cpdag, est) File "F:\REPOSITORIES\causal-learn\causallearn\graph\SHD.py", line 33, in __init__ if (not truth.get_edge(truth.get_node(nodes_name[i]), truth.get_node(nodes_name[j]))) and est.get_edge( File "F:\REPOSITORIES\causal-learn\causallearn\graph\GeneralGraph.py", line 530, in get_edge i = self.node_map[node1] KeyError: None

I am unable to figure out why this mapping error is happening and how to solve this.

implementation of fGES (fast greedy equivalence search)

Does this repo have plans to implement the algorithm fGES[1]? fGES seems to work well for large scale problems. I wanna do some work on a large scale problem. If there is a related plan, it will help to use fGES more conveniently on the python platform, instead of calling Tetrad implemented in Java.

[1] Ramsey J, Glymour M, Sanchez-Romero R, et al. A million variables and more: the fast greedy equivalence search algorithm for learning high-dimensional graphical causal models, with an application to functional magnetic resonance images[J]. International journal of data science and analytics, 2017, 3(2): 121-129.

The performance gap bewteen `KCI_UInd` and `KCI_CInd` under a similar setting

The issue is based on the code in Pull request #55

Here is just a weird problem with the performance gap between KCI_UInd and KCI_CInd. Intuitively, the test of $X\bot Y$ and $X\bot Y|Z=1$ (Z is a constant) should have a similar performance, or the latter test(use KCI_CInd) should have a worse performance due to it handling a more universal case. However, when I ran the code, the result is not as I excepted.

I test the code by a random collider dataset, which means $X\bot Z$, $X\equiv Y$; and I also visualize the test statistics, mean and var for convenient debugging. And the result shows a similar p-value of $X\bot Z$, $X\bot Y$ and a different p-value of $X\bot Z | 1$, $X\bot Y | 1$.

Following is my test code:

from icecream import ic
from causallearn.utils.cit import CIT
from tqdm import trange
import numpy as np


def generate_single_sample(type, dim):
    if (type == 'chain'):
        X = np.random.random(dim)
        Y = np.random.random(dim)+X
        Z = np.random.random(dim)+Y
        #X->Y->Z
    elif (type == 'collider'):
        # X->Y<-Z
        X = np.random.random(dim)
        Z = np.random.random(dim)
        Y = np.random.random(dim)+X+Z
    #Y = np.zeros(dim)+np.average(Y)
    return list(X)+list(Y)+list(Z)+[1]# 31 dim X:0..9; Y:10..19; Z:20..29; 1: 30

def generate_dataset(dim, size):
    dataset = []
    for i in range(size):
        datapoint = generate_single_sample('collider', dim)
        dataset.append(datapoint)
    dataset = np.array(dataset)
    return dataset


if __name__ == '__main__':
    dataset = generate_dataset(10, 1000)
    cit_tester = CIT(dataset, method = 'kci')
    #ic(cit_tester.kci(0, 20, []))
    # Origin version can not pass this due to the feature-30 have the similar value
    #ic(cit_tester.kci(0, 20, [30]))
    # The follow is from one of my recent requirements, which is using CIT to test high dim variables
    # Test high dim variables is not supported by current cit class, which is different from the documents,
    # so I also implement this function in the last commit.
    # An issue is related to the "CIT of test high dim variables" which I will put forward latter
    ic(cit_tester.kci(range(10), range(20,30), range(10,20)))
    ic(cit_tester.kci(range(10), range(20,30), []))
    ic(cit_tester.kci(range(10), range(10,20), []))
    ic(cit_tester.kci(range(10), range(20,30), [30]))
    ic(cit_tester.kci(range(10), range(10,20), [30]))

Add reference paper for implementation of granger lasso method

Hello everyone!

I want to use your library as part of an ongoing project, because I saw that you implemented a granger causality test for multi-dimensional time series called granger_lasso (docs).

I am curious based on which academic literature you implemented this method. There exist many papers who combine granger causality with some lasso regularization, so it is unclear for me why you implemented it that way.

It would be great if you can provide the reference paper here or, even better, update the documentation so other people can also benefit from this information.

Thank you very much in advanced.

GeneralGraph.subgraph bug

Hi,
thanks for the great work on the package.
I think I found a bug in GeneralGraph.subgraph() (causallearn.graph.GeneralGraph) when building on top of the method.
My code:

from causallearn.graph.GeneralGraph import GeneralGraph
import numpy as np
_ , relevant_nodes = cdag.get_parents_plus(cluster3) # A list of nodes (node objects)
#cdag.cg.G.subgraph(relevant_nodes)
subgraph = GeneralGraph(relevant_nodes)
graph = cdag.cg.G.graph # ndarray
for i in range(cdag.cg.G.num_vars):
    print(i)
    if (not cdag.cg.G.nodes[i] in relevant_nodes):
        print(cdag.cg.G.nodes[i].get_name())
        graph = np.delete(graph, i, axis = 0)

Throws error: index 8 is out of bound for axis 0 with size 8

My code is specific to my environment, but logically works the same as

import numpy as np
array = np.zeros((5,5))
for i in range(5):
    for j in range(5):
        array[i,j] = i+j
delete = [1,2,4]
for i in range(5):
    if i in delete:
        array = np.delete(array, i, axis=0)

In causallearn, the graph is a ndarray, and iteratively deletes rows/columns. This causes an index out of bounds error, as the array gets smaller and so an index later on in the loop can be out of bounds.

Interestingly, when i directly restrict from the node list of the graph, i don't get an error:

from causallearn.graph.GraphClass import CausalGraph
test = CausalGraph(no_of_var=5, node_names=['X1','X2','X3','X4','X5'])
node_list = test.G.get_nodes()
restricted_nodes = node_list[0:2] + node_list[3:5]
subgraph = test.G.subgraph(restricted_nodes)

Am i missing something or is this bugged?

A fix (which I submit as a pull request (#118) also) would be to change the code to:

def subgraph(self, nodes: List[Node]):
    subgraph = GeneralGraph(nodes)

    graph = self.graph

    nodes_to_delete = []

    for i in range(self.num_vars):
        if not (self.nodes[i] in nodes):
            nodes_to_delete .append(i)

    graph = np.delete(graph, nodes_to_delete, axis = 0)
    graph = np.delete(graph, nodes_to_delete, axis = 1)

    subgraph.graph = graph
    subgraph.reconstitute_dpath(subgraph.get_graph_edges())

    return subgraph

Let me know what you think.
Best,
Jan Marco

pip install?

Dumb question: Does pip install causal-learn grab the latest version from the repository, or do I need to grab the repository version and install that myself locally?

Thanks!

Inconsistent return types

One general problem we're having in comparing algorithms to one another is that algorithms that basically should return a GeneralGraph return it in different ways. So for instance, the way to get a graph from PC is different from the way one gets it from FCI or from GES. Is it possible to make this uniform with the same syntax?

Basically one should be able to write something like this (I think):

G = search_algorithm(data, ...)

uniformly, or perhaps

G = search_algorithm(data,...).G

maybe, but uniformly for all algorithms, it seems.

CD-NOD outputs directed cycles.

Hi, Thanks for your great work.

When I am using CD-NOD in my private data, I realize the output PDAG contains a directed cycle.

I use KCI test, other settings (e.g. orientation rules) are set exactly as default values in the illustration example.

Any idea why this phonomene shows? How should I solve this bug?

On the development of PNL

When will the implementation of the PNL method be completed? I saw that development has stopped since Feb 18, 2022. What are the remaining parts of the algorithm that needs to be implemented precisely? Maybe I can help

How to transfer general graph into matirx

How to handle named variables

Hi,

I have been working with the causal-learn package for a couple of weeks now and I was wondering if there is a plan to add support for named variables?

As I understand it, at the moment data is generally assumed to take the form of a 2d np.array where each column contains a value for a particular variable X1, X2, ..., XN. I can quite easily take a csv file and convert it to this format but, in doing so, I lose the names of my columns. This relies on me having to keep track of which variable (X1, X2, ..., XN) corresponds to which named column of my csv data, which makes tasks such as visualising the output quite difficult.

I have been using the to_pyplot method to draw the discovered DAG and passing it the labels. However, this relies on the order being preserved (simply replacing X1 with the first label, X2 with the second, and so on). This has tripped me a couple of times when comparing two graphs produced from similar data where the variables do not align.

I am not sure what the solution is here, but it would be really useful if it were possible to attach actual variable names to the graph rather than doing this manually after computation. One possibility would be to support pandas dataframes for reading in data.

Does DAG2PAG supports selection variable?

Hi, I hope to use the dag2pag function in the causal-learn lab, i.e., causallearn.utils.DAG2PAG.

However, I notice in the documentation, this function only support add latent variables.

Is there a way to add selection variables (maybe a custom change to the function) ?

Many thanks!

How to use BIC score

Hi, sorry to bother you@kunwuz ,

I'm having some problems using the bic score. It always get a big number like -6951.8887543026485
rewards = local_score_BIC(data,0,[0])

The data is created by myself. I don't know the eaxt meaning of i and Pai.

Usage of KCI test

I am interested in the KCI independence test function (causallearn.utils.cit.kci) and testing it now.

In the test, the input "data" is set with a simple numpy data table and column indices X Y are set as 1,2. However, after running the function, there is a TypeError ""str" object is not callable", as shown below. Does anyone know what is the cause of this problem , or show me with a correct example about this function's usage? Thank you.

fisherz test occasionally (but rarely) errors out

I am unsure why, but occasionally the fisherz test experiences the following error. I have not had this problem with the KCI test on the same data. I do not know if this is a bug or if it is a problem with the data. If a problem with the data, it would be more useful to be informed why.

File "/miniconda3/envs/causal/lib/python3.8/site-packages/causallearn/utils/cit.py", line 172, in fisherz
r = -inv[0, 1] / sqrt(inv[0, 0] * inv[1, 1])
ValueError: math domain error

PC algorithm and Meek rules

I generate some sample data and use the PC algorithm to get an essential graph. but when I plot the graph. It looks like this.

I have several questions.

Why there is no arrow from X8 -> X9 in this case, shouldn't Meek rules be able to orient X8-X9 as X8->X9?
What is the difference between Meek.meek and Meek.definite_meek()?
If I want to orient a DAG with Meek rules after I manually change some orientations of the graph. How should I go about it?

Thank you for having a look at this thread.

py-why / causal-learn Goto Github PK

causal-learn's People

Contributors

Stargazers

Watchers

Forkers

causal-learn's Issues

<<<<<<< HEAD test.test_anm_simul() test.test_anm_pair()

Recommend Projects

Recommend Topics

Recommend Org

Jobs

<<<<<<< HEAD
test.test_anm_simul()
test.test_anm_pair()