Comments (6)

petrelharp commented on May 29, 2024

from slim.

bhaller commented on May 29, 2024

Hrm. I have no guesses, but I'll look into it when I have a sec.

bhaller commented on May 29, 2024

OK, I've put some thinking-out-loud here so you can see the process, but the upshot is, bug fixed. So with a seed of 2 instead of 123, some of the first-generation individuals live on to the end, and their locations are corrupted too; it's not only the individuals that are present in the table solely because they were remembered. I added a call to sim.treeSeqOutput("first_gen.treestxt", _binary=F); before the binary-file output call, and the positions in the text output are corrupted too, so it is not an issue specifically with binary output. I also wrote a little script to read the .trees file back in and then print positions, and the positions of first-generation individuals are corrupted there too:

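// Minimal nonWF model that reads first_gen.trees back in and prints positions
initialize() {
	initializeSLiMModelType("nonWF");
	initializeSLiMOptions(dimensionality="xy");
	initializeTreeSeq();
	
	initializeMutationType("m1", 0.5, "g", 0.0, 2);
	initializeGenomicElementType("g1", m1, 1.0);
	initializeGenomicElement(g1, 0, 9);
	initializeMutationRate(0.0);
	initializeRecombinationRate(1e-8);
}
reproduction() {
	// intentionally empty: no offspring are needed for this check
}
1 late() {
	// load the saved tree sequence, which recreates p1 and its individuals
	sim.readFromPopulationFile("first_gen.trees");
	// print each individual's pedigree ID and (x, y) position
	for (ind in p1.individuals) {
		catn(c("late", ind.pedigreeID, ind.x, ind.y));
	}
	sim.simulationFinished();
}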

So the corruption is also hosing SLiM's reading back in, unsurprisingly; this is just a confirmation of what we (think we) already know, that the data in the files is in fact bad (rather than this being a weird pyslim bug or something).

So, there is something about the way individuals in the first generation are handled that is different from other individuals, and that results in their location data being corrupted. Well. When addSubpop() is called, that results in the first-gen individuals being archived immediately, right? With whatever random data happens to be in their location properties. Then setSpatialPosition() gets called, but that does not update the archived bad data. Then in the late() event you call treeSeqRememberIndividuals(), but only on the individuals that survived through the first mortality phase. It should update the archived data for those individuals; but nobody ever updates the archive for the other first-gen individuals.

So we have two groups of first-gen individuals. The first group is those who died during the first mortality phase; in my test run with seed 2, that is ids 1 4 6 9. They get archived with garbage and never get updated, so it is not surprising that they have garbage. To fix them, I would think you would want to call treeSeqRememberIndividuals() in your 1 early() event after setting up their spatial positions.

But then the second group is those who lived through the first mortality phase; in my test run that is 0 2 3 5 7 8. They should have their archives updated by treeSeqRememberIndividuals(), it seems to me, and that apparently is not happening. SLiMSim::ExecuteMethod_treeSeqRememberIndividuals() calls SLiMSim::AddIndividualsToTable(), and the code there to update the location data... makes no sense. It was passing memcpy() &location, which is a pointer to a std::vector; and it was passing a size of location.size(), which is the number of entries in location, not the number of bytes. Changing that to location.data() and location.size() * sizeof(double) seems to have fixed the bug, as far as my tests indicate.
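
The size half of that memcpy() mistake can be reproduced in a small standalone C++ sketch. This is not SLiM's actual code: the helper names copy_location_wrong_size(), copy_location_fixed(), and decode_location() are hypothetical, and the byte buffer is only an assumed stand-in for a binary location column in a table row. The pointer half of the bug (passing &location rather than location.data()) is described in a comment rather than executed.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

// Hypothetical helper (not SLiM's code): fills a zeroed byte buffer from a
// location vector, but passes the element count where memcpy() expects a
// byte count, so only the first location.size() bytes are copied.
// The bug described above also passed &location, the address of the
// std::vector object itself, instead of the address of its elements; that
// variant would copy the vector's internal bookkeeping and is not run here.
std::vector<unsigned char> copy_location_wrong_size(const std::vector<double>& location) {
    std::vector<unsigned char> buffer(location.size() * sizeof(double), 0);
    if (!location.empty())
        std::memcpy(buffer.data(), location.data(), location.size());  // too few bytes
    return buffer;
}

// Corrected version: source is the element storage, size is in bytes.
std::vector<unsigned char> copy_location_fixed(const std::vector<double>& location) {
    std::vector<unsigned char> buffer(location.size() * sizeof(double), 0);
    if (!location.empty())
        std::memcpy(buffer.data(), location.data(), location.size() * sizeof(double));
    return buffer;
}

// Decodes a byte buffer back into doubles so the two versions can be compared.
std::vector<double> decode_location(const std::vector<unsigned char>& buffer) {
    assert(buffer.size() % sizeof(double) == 0);
    std::vector<double> out(buffer.size() / sizeof(double));
    if (!out.empty())
        std::memcpy(out.data(), buffer.data(), out.size() * sizeof(double));
    return out;
}
```

Round-tripping a location such as {1.5, 2.5, 0.0} through copy_location_fixed() and decode_location() should give back the same values, while the wrong-size version leaves most of the buffer untouched and so does not.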

And after fixing that bug, the first group of individuals (1 4 6 9) did indeed still have garbage location data, and adding a treeSeqRememberIndividuals() call at the end of the 1 early() event did indeed fix that. So that also seems to make sense, although it's sucky that one has to call treeSeqRememberIndividuals() for this to work; that's a bit counterintuitive and will probably hose other people too. Not sure how to fix it, though.

This really is one of those "how did this ever work??" bugs, since any call to treeSeqRememberIndividuals() that caused an update of an existing record ought to have corrupted the location data, as far as I can tell. So all of the individuals that lived for more than one generation in your model should have become corrupted, it seems to me; I'm not sure why they didn't. The mysteries of life; I'm not inclined to track that down. :->

If you think there are any loose ends here, let me know. Thanks for the catch, this is a good bug. :->

petrelharp commented on May 29, 2024

Whew, good catch. I don't think that the bad info in non-surviving individuals is terrible; but perhaps there should be a bit in the treeSeqRememberIndividuals() documentation saying that (a) you can call it in early and/or late; and (b) maybe you'd want to call it in early, before mortality; and (c) a brief explanation of this gotcha.

I guess we didn't see this because we've got no tests that check whether first-gen locations are correctly recorded.

bhaller commented on May 29, 2024

@petrelharp The new doc:

– (void)treeSeqRememberIndividuals(object individuals)
Permanently adds the individuals specified by individuals to the sample retained across tree sequence table simplification. This method may only be called if tree sequence recording has been turned on with initializeTreeSeq(). All currently living individuals are always retained across simplification; this method does not need to be called, and indeed should not be called, for that purpose. Instead, treeSeqRememberIndividuals() is for permanently adding particular individuals to the retained sample. Typically this would be used, for example, to retain particular individuals that you wanted to be able to trace ancestry back to in later analysis. However, this is not the typical usage pattern for tree sequence recording; most models will not need to call this method.
Calling treeSeqRememberIndividuals() on an individual that is already remembered will cause the archived information about the remembered individual to be updated to reflect the individual’s current state. A case where this is particularly important is for the spatial location of individuals in continuous-space models. SLiM automatically remembers the individuals that comprise the first generation of any new subpopulation created with addSubpop(), for easy recapitation and other analysis (see section 16.10). However, since these first-generation individuals are remembered at the moment they are created, their spatial locations have not yet been set up, and will contain garbage – and those garbage values will be archived in their remembered state. If you need correct spatial locations of first-generation individuals for your post-simulation analysis, you should call treeSeqRememberIndividuals() explicitly on the first generation, after setting spatial locations, to update the archived information with the correct spatial positions.

Does that seem good?

bhaller commented on May 29, 2024

Sure, good edit. This will not make it into the 3.2 docs (that train has just left the station :->), but it'll roll into whatever the next version after that is. :->
