GithubHelp home page GithubHelp logo

Comments (11)

tofuwen avatar tofuwen commented on May 27, 2024

Hi Andrew,

Thanks so much for your input! We really appreciate it.

Currently we are working on "code quality improvement project", and we will fix this issue during the project. (We already included this issue in our tracking doc, and assigned to the owner).

Currently our main focus is to "adding more tests to the causal-learn" project, as without reliable and automatic tests, it's hard to ensure the correctness of the code.

And of course, feel free to send a PR to help us make causal-learn better! :) We are more than happy to review. :)

And regarding the priority of this issue, it seems to me that users can reconstruct the graph from data, instead of calling "add_node()" / "remove_nodes()" right? I just want to make sure that no one is blocked by this issue now. :)

from causal-learn.

AndrewC19 avatar AndrewC19 commented on May 27, 2024

Hi,

No problems, and thanks for the quick response!

I'm not sure what you mean by reconstructing the graph from data. For my use case, I want to first create a graph by learning its structure from data (using the pc algorithm). Then, I want to merge a series of binary indicator nodes into a single categorical variable. To achieve this, I do the following:

  1. For each categorical variable, obtain its binary indicators, B (e.g. if my category is favourite drink, I might have three indicators drink_coffee, drink_tea, drink_water).
  2. For each indicator in B, record its parents, P, and children, C.
  3. Remove each indicator from the graph.
  4. Add a new node to the graph for the category (e.g. drink)
  5. Add an edge from all nodes in the set of parents P to the new node.
  6. Add an edge from the new node to all nodes in the set of children C.

As you can see, this requires me to update the graph after removing a set of nodes. Is there a way around this for the time being?

from causal-learn.

AndrewC19 avatar AndrewC19 commented on May 27, 2024

Also, two questions regarding making a PR:

  1. Do you have any contributor guidelines?
  2. I believe the underlying problem here is the updating of the dpath instance variable for GeneralGraph. Specifically, the method reconstitute_dpath is called from within remove_node with the set of all graph edges as an argument. There doesn't appear to be any documentation for reconstitute_dpath so I'm not sure what it should be doing and, therefore, how to fix it. Could you give me some guidance on what this method should be doing?

from causal-learn.

tofuwen avatar tofuwen commented on May 27, 2024

Hi Andrew,

Thanks for the clear explanation! Yeah, it seems in your use case, you need the remove_node() to behave functionally.

  1. Regarding your first comment, I am wondering why you want to perform PC on the graph generated in this way. How about directly perform PC on graph with categorical variables, instead of the binary indicators? The problem of your current construction is that it violates the faithfulness assumption (i.e. drink_coffee, drink_tea, drink_water forms a deterministic relationship: drink_coffee + drink_tea + drink_water == 1). And as a result, conditioning on any two variables, the third variable is a constant, which is independent to any other variables.

  2. "Do you have any contributor guidelines": currently we don't have one. You reminds me creating a contributor guideline. :) I am wondering in your experience, do you have an example about good contributor guidelines? We just started the "code quality control project", so lots of things are new. Previously each individual researchers contributed their own code to the package. So as you can see, the code style is not consistent across the packages, and we lack good tests. We are currently in progress to make the code quality of causal-learn much better. :)

Regarding making PR to causal-learn, currently what I am thinking is:

  1. Follow "Google Python Style Guide"
  2. send PR with good descriptions and test plan
  1. Regarding your last question, sorry I am not expert here as well. Let me tag the owner here to better help you. cc @chenweiDelight

from causal-learn.

AndrewC19 avatar AndrewC19 commented on May 27, 2024

Hi again,

Thanks for the in-depth response. In response to your comments:

  1. Could you explain in slightly more detail how/why this violates the faithfulness assumption?

  2. Could you explain how you would apply PC to data that contains categorical and continuous data? My understanding is that this doesn't work by default and that your categorical variables need to be pre-processed properly.

    To this end, I have implemented the Multinomial Logistic Regression Test from the paper "Evaluation of Causal Structure Learning Methods on Mixed Data Types (Raghu, Poon, and Benos, 2018)". Following this approach, if I want to test whether X and Y are independent conditional on an adjustment set S, where X and Y are continuous and some variables in S are categorical, I can convert categorical S to binary indicator variables and apply linear regression. It also states that, if either X or Y are categorical, then we should apply logistic regression.

    Based on your first comment, would this approach not violate the faithfulness assumption too?

  3. I would suggest providing instructions on how to create a PR and the general expected quality (e.g. coding standards, maintaining test coverage, passing test cases). Here's an example of good contributor guidelines: https://github.com/pykale/pykale/blob/main/.github/CONTRIBUTING.md.

    It might also be worth setting up some continuous integration tests with GitHub actions. This can be implemented so that whenever a PR is made, your tests are run automatically (potentially on multiple different operating systems and python versions) to ensure that your PR does not break the existing code.

Hope that helps!

from causal-learn.

tofuwen avatar tofuwen commented on May 27, 2024

Hi Andrew,

  1. I briefly googled "violation of faithfulness because of deterministic relations", and found https://arxiv.org/pdf/2010.14265.pdf, which states "This assumption can, however, be violated in many ways, including xor
    connections, deterministic functions or cancelling paths. " But I don't find a good example in the paper --- you probably need to dig more into references to check. But hope this can be served as a good starting point.

I can also give you an example: Consider we have two boolean variables "drink or not", "happy or not". And assume in the true graph, "drink" causes "happy". Now let's create the third variable "not drink" (i.e. "not drink" and "drink" must be one true and one false, they form deterministic relationships)

Consider running PC on the graph with these three variables: "drink or not", "not drink", "happy" ==> we will get that
conditioning on "not drink", "drink" is constant, thus "happy" and "drink" is independent, so there won't be an edge between.

This violates faithfulness assumptions, i.e. these independence found in the data are NOT due to separations in the true causal graph.

Hope this helps.

  1. Hmm, I am curious about this. I've tagged experts regarding this questions in another PR. I don't know what's the best way to run PC in causal learn with those data types.

  2. thanks! that's super helpful! :)

from causal-learn.

AndrewC19 avatar AndrewC19 commented on May 27, 2024

Hi again,

Thanks so much for your help, that's much clearer now. I really appreciate you taking time to respond to my questions in such detail.

I was wondering whether it would be possible to arrange a virtual meeting? I am planning to conduct an experiment as part of my thesis that evaluates how well different causal discovery algorithms apply to different types of computational model data. I am fairly new to causal discovery and would greatly appreciate some help with a few questions that aren't necessarily related to causal-learn.

from causal-learn.

tofuwen avatar tofuwen commented on May 27, 2024

Hi Andrew,

I'd like to help, but I am also relatively new to causal discovery (I just started my PhD last August, and my first year project is not very causal-related), so I am not sure whether I have the ability to help you.

The good news is: my team has lots of causal experts --- how about you sending me the questions via email (yewenf AT andrew.cmu.edu), and let me see whether I know the answers or I can ask some causal experts in my team. :)

We can discuss more in the emails --- let's the thread here to focus on causal-learn related issues, so when others check this issue later, they won't be distracted. :)

from causal-learn.

chenweiDelight avatar chenweiDelight commented on May 27, 2024

Hi Andrew,

Thank you for pointing out this issue!
It seems that there are some bugs in remove_node(). For example, when removing the node, it does not update the 'dpath'.
If you would like to reconstruct the graph, I think it needs to fix these bugs first. If you have any ideas or updates about fixing the bugs in remove_node(), it would be great if you send RP.

Another solution is, that you can preprocess the data of the categorical variables (e.g. drink), and then apply PC to the new data to output the graph.

from causal-learn.

AndrewC19 avatar AndrewC19 commented on May 27, 2024

Hi,

Your proposed solution is what I am doing at the moment; I preprocess my data to convert categorical variables to binary indicators. However, @tofuwen pointed out that this approach would lead to problems as the inclusion of binary indicator variables violates the causal faithfulness assumption.

I'm a little bit confused with how best to proceed!

from causal-learn.

chenweiDelight avatar chenweiDelight commented on May 27, 2024

Hi Andrew,

I am sorry that I'm confused about your task. For the causal faithfulness assumption, please refer to the paper that @tofuwen mentioned. You can send the detail of your task and your questions to @tofuwen and me (chenweiDelight AT gmail.com)via email. We can discuss more in the emails.

from causal-learn.

Related Issues (20)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.