
Comments (5)

thomaskeller79 commented on August 17, 2024

Just a small comment: the rddl.competition.Server component can handle the instance, which is why I assume it's not a syntax issue in the rddl file but has something to do with the RandomBoolPolicy.

from rddlsim.

fdouglis commented on August 17, 2024

So, I just used RandomBoolPolicy based on some other examples. What would be the appropriate command to run it the same way that prost uses it, but just a single execution? Or is the issue that if you aren't feeding in commands through the server instance, you need some other policy to tell it how to decide on actions to take?

I feel like there is a certain disconnect in the way rddlsim runs standalone and via prost -- there are things that prost runs ok with but cause invariant failures in rddlsim alone, and vice-versa. I think the problem is that the things that check invariants consider just the current state. For instance, when picking a random action and a random value for that action, rddlsim would try the action and if it fails, back up and try another. If it can pick every possible action/value over a long horizon and it hits an exception, it has to back up and try the next thing ... deep stack and potentially exponential runtime. With prost I have the opposite problem, where it decides that something leads down a bad path and it aborts, rather than dropping that path and looking only at other options. You've said if it goes down a bad path, it is a badly formed domain. Is there documentation or anything that can better explain how one forms such paths properly?

I've seen examples in a couple of domains that added the intermediate fluent manually, things like this:

        // enforce proceed-interm-level at all levels but @level0
        (current-level == @level1) => proceed-interm-level;

        // forbid proceed-interm-level at all other levels but @level0
        proceed-interm-level => (current-level == @level1);

But when I've tried things like this for parameterized actions, it hasn't gone well at all. I hit prost complaining about parse-time errors that are actually violations of at least one precondition for all actions. I added the change to get a human-readable version of the preconditions, and my own changes to say "action N failed precondition M" so I can see which preconditions are tripping it up -- but not the state it is in at the time it decides that it should fail. That is, rddlsim will say "invariant failed, and the current state is x := true and y := false" but prost simply says "nope, not gonna happen".

You mentioned -log VERBOSE as an option, by the way, and maybe that would help here, but I didn't figure out a place to put that option such that I saw a difference in the output. Where does the option go?

Thanks much.


fdouglis commented on August 17, 2024

And yeah, I see I made this comment in the rddlsim issue, and it would be more appropriate someplace in the prost GitHub or in an email ... whoops.


thomaskeller79 commented on August 17, 2024

> So, I just used RandomBoolPolicy based on some other examples. What would be the appropriate command to run it the same way that prost uses it, but just a single execution? Or is the issue that if you aren't feeding in commands through the server instance, you need some other policy to tell it how to decide on actions to take?

I assume that the RandomBoolPolicy samples a random action, ignoring action applicability. I don't know anything about the other policies implemented in rddlsim, @ssanner can you say something about that?

> I feel like there is a certain disconnect in the way rddlsim runs standalone and via prost -- there are things that prost runs ok with but cause invariant failures in rddlsim alone, and vice-versa. I think the problem is that the things that check invariants consider just the current state. For instance, when picking a random action and a random value for that action, rddlsim would try the action and if it fails, back up and try another. If it can pick every possible action/value over a long horizon and it hits an exception, it has to back up and try the next thing ... deep stack and potentially exponential runtime. With prost I have the opposite problem, where it decides that something leads down a bad path and it aborts, rather than dropping that path and looking only at other options. You've said if it goes down a bad path, it is a badly formed domain. Is there documentation or anything that can better explain how one forms such paths properly?

No, there is no such documentation. Prost expects that there is at least one applicable action in every reachable state, which is an assumption that is made because it is unclear what happens in a state without an applicable action (and because there are different possibilities to define the semantics of this case, something no one ever did because it wasn't necessary). As a general rule, you can avoid this by adding a dummy action to your domain that is applicable whenever no other action is applicable. Of course, it is easier to say this than implement this, because it requires that you know the set of states where no action is applicable, and it requires that it is possible to describe that set compactly with a logical formula.
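The requirement and the dummy-action workaround can be illustrated with a brute-force check over a toy state space. This is only a sketch in Python and not how Prost works internally: the state variables `x` and `y`, the actions `move` and `load`, and their preconditions are all hypothetical, and the check enumerates every state rather than only the reachable ones.

```python
from itertools import product

# Hypothetical toy domain: two boolean state variables and two actions.
# Each precondition maps a state (x, y) to True if the action is applicable.
preconditions = {
    "move": lambda x, y: x and not y,
    "load": lambda x, y: y,
}

def dead_states(preconds):
    """Return every state in which no action is applicable."""
    return [
        state
        for state in product([False, True], repeat=2)
        if not any(pre(*state) for pre in preconds.values())
    ]

# States where neither real action is applicable.
before = dead_states(preconditions)

# Dummy action: applicable exactly when no real action is applicable.
real = dict(preconditions)
with_dummy = dict(preconditions)
with_dummy["dummy"] = lambda x, y: not any(pre(x, y) for pre in real.values())

after = dead_states(with_dummy)
print("dead states without dummy:", before)
print("dead states with dummy:", after)
```

In a real domain the hard part remains the one described above: the dummy action's precondition has to be written as a compact logical formula, whereas this sketch simply negates the other preconditions by enumeration.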

> I've seen examples in a couple of domains that added the intermediate fluent manually, things like this:
>
>         // enforce proceed-interm-level at all levels but @level0
>         (current-level == @level1) => proceed-interm-level;
>
>         // forbid proceed-interm-level at all other levels but @level0
>         proceed-interm-level => (current-level == @level1);

These are necessary because of the interm-fluent compilation. If you have more than 1 level because you have interm-fluents at higher levels, it is probably best to replace these with:

        (current-level ~= @level0) => proceed-interm-level;
        proceed-interm-level => (current-level ~= @level0);

With these, you basically say that in every state where the interm level (current-level) is different from @level0, you have to apply the artificial "proceed-interm-level" action (which is only there to allow the evaluation of interm-fluents as state-fluents). If all your other constraints now only talk about states where (current-level == @level0) and you make sure that there is an applicable action for all possible assignments of all other variables, you should be fine.
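To see why the `~=` form matters once there are more than two levels, the two constraint pairs can be evaluated by hand. The sketch below does this in Python for a hypothetical instance with three levels (`level0`, `level1`, `level2`); the function names are made up for illustration, and it only evaluates the implications, it does not model Prost's interm-fluent compilation.

```python
LEVELS = ["level0", "level1", "level2"]  # hypothetical three-level instance

def original_ok(level, proceed):
    # (current-level == @level1) => proceed-interm-level
    # proceed-interm-level => (current-level == @level1)
    return (level != "level1" or proceed) and (not proceed or level == "level1")

def generalized_ok(level, proceed):
    # (current-level ~= @level0) => proceed-interm-level
    # proceed-interm-level => (current-level ~= @level0)
    return (level == "level0" or proceed) and (not proceed or level != "level0")

def allowed(constraint):
    """For each level, list the values of proceed-interm-level that satisfy the pair."""
    return {
        level: [p for p in (False, True) if constraint(level, p)]
        for level in LEVELS
    }

original = allowed(original_ok)
generalized = allowed(generalized_ok)
for level in LEVELS:
    print(level, "original:", original[level], "generalized:", generalized[level])
```

With three levels, the `== @level1` pair forbids `proceed-interm-level` at `level2`, so if every other action is restricted to `@level0`, nothing would be applicable there. The `~=` pair forces it at every level other than `@level0`.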

> But when I've tried things like this for parameterized actions, it hasn't gone well at all. I hit prost complaining about parse-time errors that are actually violations of at least one precondition for all actions. I added the change to get a human-readable version of the preconditions, and my own changes to say "action N failed precondition M" so I can see which preconditions are tripping it up -- but not the state it is in at the time it decides that it should fail. That is, rddlsim will say "invariant failed, and the current state is x := true and y := false" but prost simply says "nope, not gonna happen".

I agree this can be useful. Feel free to open an issue for prost that handles this (don't make it too general, but describe exactly the case where prost crashes because there is no applicable action; then there is a chance that I will find the time to actually implement this).

> You mentioned -log VERBOSE as an option, by the way, and maybe that would help here, but I didn't figure out a place to put that option such that I saw a difference in the output. Where does the option go?

E.g., ./prost.py elevators_inst_mdp__1 [PROST -log VERBOSE -s 1 -se [IPC2014]]

Note that this only affects the logging of information in the search component; we want to add this to the parser in issue 114.

> Thanks much.


fdouglis commented on August 17, 2024

> > So, I just used RandomBoolPolicy based on some other examples. What would be the appropriate command to run it the same way that prost uses it, but just a single execution? Or is the issue that if you aren't feeding in commands through the server instance, you need some other policy to tell it how to decide on actions to take?

> I assume that the RandomBoolPolicy samples a random action, ignoring action applicability. I don't know anything about the other policies implemented in rddlsim, @ssanner can you say something about that?

Yeah, that is the part that confuses me. It seems like in the vanilla rddlsim, it will try a boolean generator like that, and if it fails, it will back out. With Prost, it feels like you assume that if you go down a path where you then find yourself unable to generate an action that satisfies the preconditions, you abort.

...
> No, there is no such documentation. Prost expects that there is at least one applicable action in every reachable state, which is an assumption that is made because it is unclear what happens in a state without an applicable action (and because there are different possibilities to define the semantics of this case, something no one ever did because it wasn't necessary). As a general rule, you can avoid this by adding a dummy action to your domain that is applicable whenever no other action is applicable. Of course, it is easier to say this than implement this, because it requires that you know the set of states where no action is applicable, and it requires that it is possible to describe that set compactly with a logical formula.

I actually tried such a dummy action, but it didn't help. Again, more likely operator error than anything else.

...
> These are necessary because of the interm-fluent compilation. If you have more than 1 level because you have interm-fluents at higher levels, it is probably best to replace these with:
>
>         (current-level ~= @level0) => proceed-interm-level;
>         proceed-interm-level => (current-level ~= @level0);

Yeah, I basically did that. Is the notion that proceed-interm-level is a default action, such that it's not good enough to say "don't do real actions unless @level0, but do proceed-interm at other levels so you have something to do"?

> I agree this can be useful. Feel free to open an issue for prost that handles this (don't make it too general, but describe exactly the case where prost crashes because there is no applicable action; then there is a chance that I will find the time to actually implement this).

OK, thanks.

...

> E.g., ./prost.py elevators_inst_mdp__1 [PROST -log VERBOSE -s 1 -se [IPC2014]]

Thanks.

