allendowney / thinkcomplexity2
Book and code for Think Complexity, 2nd edition
Home Page: https://allendowney.github.io/ThinkComplexity2/
Where is the class Cell2DViewer? It's missing from Cell2D.py, but Life.py in the code directory imports it.
The Chapter 2 notebook uses G.nodes_iter(), which was removed in NetworkX 2.0:
https://networkx.github.io/documentation/stable/release/migration_guide_from_1.x_to_2.0.html
A note on the module versions needed, or a requirements.txt file, would be useful.
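For anyone hitting this, the migration is a one-line change; a minimal sketch, assuming NetworkX 2.x is installed:

```python
import networkx as nx

G = nx.Graph()
G.add_edges_from([("Alice", "Bob"), ("Bob", "Chuck")])

# NetworkX 1.x: first_node = next(G.nodes_iter())
# NetworkX 2.x removed nodes_iter; graphs are directly iterable over
# their nodes, so next(iter(G)) does the same job.
first_node = next(iter(G))
print(first_node in G)   # True
```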
As a developer, I wanted to run the notebooks in VS Code. I struggled a bit to get this working (the imports cell threw ModuleNotFoundError), so I'm sharing the steps here for future readers.

Install Anaconda: https://www.anaconda.com/products/distribution

Then set up the environment:

    git clone https://github.com/AllenDowney/ThinkComplexity2.git
    cd ThinkComplexity2
    conda env create -f environment.yml
    conda activate ThinkComplexity2

From the Command Palette, choose Python: Select Interpreter and select the workspace environment (it should be the Recommended one). Then use the kernel picker in the top right to select the same environment (shown as base (Python 3.9.12) in the screenshot below).
I wasn't sure whether it was intentional that you're using thinkplot.Pdf instead of thinkplot.Pmf for the analysis in Chapter 4. This is in Section 4.3, e.g.

    thinkplot.Pdf(pmf_fb, label='Facebook')
    thinkplot.Pdf(pmf_ws, label='WS graph')

It makes more sense to me to use thinkplot.Pmf, since we are dealing with discrete quantities.
Currently the make_all_agents function in Chapter 11 makes use of binary arithmetic:
    def make_all_agents(fit_land, agent_maker):
        """Make an array of Agents.

        fit_land: FitnessLandscape
        agent_maker: class used to make Agent

        returns: array of Agents
        """
        N = fit_land.N
        xs = np.arange(2**N)
        ys = 2**np.arange(N)[::-1]
        locs = np.bitwise_and.outer(xs, ys) > 0
        agents = [agent_maker(loc, fit_land) for loc in locs]
        return np.array(agents)
make_all_agents uses the outer product of bitwise_and, which is not the most obvious operation. Would using itertools.product (which returns the Cartesian product of the input iterables) make this more readable/obvious?
    def make_all_agents(fit_land, agent_maker):
        """Make an array of Agents.

        fit_land: FitnessLandscape
        agent_maker: class used to make Agent

        returns: array of Agents
        """
        locations = itertools.product([0, 1], repeat=fit_land.N)
        agents = [agent_maker(loc, fit_land) for loc in locations]
        return np.array(agents)
(itertools.product(S, repeat=N) creates a generator that yields the elements of the N-fold Cartesian product of S.)
Currently make_all_agents is not included in the text of the book. This is understandable given the slightly complex use of binary arithmetic; perhaps using itertools would allow it to be included. Just a suggestion :)
(Note that this particular implementation creates tuples of 0/1 and not arrays of True/False; this could be refactored.)
(A very minor further suggestion would be to use the variable name agent_class instead of agent_maker. I spent a little while looking around for some sort of factory function, but that could well be a very personal misread.)
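As a rough sketch of that refactor, outside the book's classes and just comparing the location arrays: the 0/1 tuples from itertools.product can be converted to boolean NumPy arrays that match what the bitwise version produces.

```python
import itertools
import numpy as np

N = 3  # number of loci, standing in for fit_land.N

# The book's version: binary counting via an outer bitwise AND.
xs = np.arange(2**N)
ys = 2**np.arange(N)[::-1]
locs_bitwise = np.bitwise_and.outer(xs, ys) > 0

# The itertools version, converted from 0/1 tuples to boolean arrays.
locs_product = np.array(list(itertools.product([0, 1], repeat=N)), dtype=bool)

# Both enumerate the 2**N locations in the same (binary counting) order.
print(np.array_equal(locs_bitwise, locs_product))   # True
```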
The link to the Data files from the Barabasi and Albert paper in Exercise 4.3 no longer works:
http://www3.nd.edu/~networks/resources.htm
It looks like SNAP is maintaining one of the Notre Dame datasets, but not the IMDB Actors dataset:
https://snap.stanford.edu/data/web-NotreDame.html
A quick look doesn't turn up an appropriate place to redirect to, so possibly remove the link or contact Barabasi's group at Northeastern.
I was able to upload the Jupyter notebooks for chapters 1-4 of ThinkComplexity2e, but attempting Chapter 5 (chap05.ipynb) led to the following error: SyntaxError: JSON Parse error: Unrecognized token '<'. (This usually means the parser received an HTML page rather than the notebook's JSON.)
ThinkComplexity2/book/book.tex, line 1210 in 30d23de: the output is not a NodeView, but a list: ['Alice', 'Bob', 'Chuck'].
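A minimal check of the reported behavior, assuming NetworkX 2.x:

```python
import networkx as nx

G = nx.Graph()
G.add_edges_from([("Alice", "Bob"), ("Bob", "Chuck")])

# G.nodes() returns a NodeView; an explicit conversion is what
# produces the plain list shown in the book's example output.
print(list(G.nodes()))   # ['Alice', 'Bob', 'Chuck']
```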
Rendering of equations doesn't look very good. Page 30 is the first example to look at; another is the exponents on page 116. And on page 118, the number 1.5 is spaced strangely.
Is there a different DocBook I could generate to make these look better? Or can we tweak them by hand?
The Appendix Exercise A.6 provides some dead links:
You can download my map implementations from \url{thinkcomplex.com/Map.py}, and the code I used in this section from \url{thinkcomplex.com/listsum.py}.
Hi Dr Allen, could your is_connected() function be replaced by a simple mathematical check? I mean, instead of:

    def is_connected(G):
        start = next(iter(G))
        reachable = reachable_nodes(G, start)
        return len(reachable) == len(G)

it could be:

    def is_connected(g: nx.Graph):
        max_edges = (g.number_of_nodes() * (g.number_of_nodes() - 1)) // 2
        return g.number_of_edges() == max_edges

This simple logic would replace the reachable_nodes() function.
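For what it's worth, that edge-count check seems to test whether the graph is complete rather than connected; a connected graph can have as few as n - 1 edges. A self-contained sketch (plain dicts instead of networkx, with a hypothetical bfs_reachable helper standing in for the book's reachable_nodes):

```python
# A path graph 0-1-2 is connected, but has fewer than the maximum
# possible number of edges, so the edge-count test disagrees.
def bfs_reachable(adj, start):
    seen, stack = set(), [start]
    while stack:
        node = stack.pop()
        if node not in seen:
            seen.add(node)
            stack.extend(adj[node])
    return seen

path = {0: [1], 1: [0, 2], 2: [1]}   # adjacency list for the path graph
n = len(path)
num_edges = sum(len(v) for v in path.values()) // 2   # 2 edges
max_edges = n * (n - 1) // 2                          # 3 for a complete graph

print(len(bfs_reachable(path, 0)) == n)   # True: the graph is connected
print(num_edges == max_edges)             # False: edge-count test says no
```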
The beginning of Section 7.1 discusses Turing's diffusion model:

In 1952 Alan Turing published a paper called "The chemical basis of morphogenesis", which describes the behavior of systems involving two chemicals that diffuse in space and react with each other.

It then branches off to talk about a CA-based version:

Turing's model is based on differential equations, but it can also be implemented using a cellular automaton.

But before we get to Turing's model, we'll start with something simpler: a diffusion system with just one chemical.

The way it's written implies a return to Turing's DE-based model, but the chapter never comes back to it.
It looks like when I have a code snippet in a caption, the typesetting of the caption gets messed up. There's an example on page 119.
Same issue comes up in the footnote on page 124.
Is the DocBook I am generating correct, and getting rendered wrong, or is there something wrong with it?
The notebook ThinkComplexity2/code/appAsoln.ipynb
contains an incorrect link to the book website.
Note that the output for the straightforward loop-based implementation is different from the correlation-based versions. In the static version of the notebook, the input is:
[[1 1 1 0 1 1 1 1 0 1]
[0 1 0 1 0 1 0 0 1 1]
[0 0 1 0 0 1 1 0 0 1]
[0 0 0 0 0 1 1 0 0 1]
[0 0 0 1 0 1 1 1 1 1]
[0 0 1 0 1 0 1 0 0 0]
[0 0 0 1 0 0 0 0 0 0]
[0 1 0 0 0 1 0 1 1 0]
[1 1 0 1 0 0 1 1 1 1]
[0 0 1 0 0 0 0 0 0 0]]
The output for the loop-based implementation is:
[[0 0 0 0 0 0 0 0 0 0]
[0 0 0 1 0 0 0 0 0 0]
[0 0 1 0 0 0 0 1 0 0]
[0 0 0 0 0 0 0 0 0 0]
[0 0 0 1 0 0 0 0 1 0]
[0 0 1 0 1 0 1 0 1 0]
[0 0 1 1 1 1 1 1 0 0]
[0 1 0 0 1 0 0 0 0 0]
[0 1 0 0 0 0 1 0 0 0]
[0 0 0 0 0 0 0 0 0 0]]
The output for the cross correlation version is:
[[1 1 1 1 1 1 1 1 0 1]
[1 0 0 1 0 0 0 0 0 1]
[0 0 1 0 0 0 0 1 0 1]
[0 0 0 0 0 0 0 0 0 1]
[0 0 0 1 0 0 0 0 1 1]
[0 0 1 0 1 0 1 0 1 0]
[0 0 1 1 1 1 1 1 0 0]
[1 1 0 0 1 0 0 0 0 1]
[1 1 0 0 0 0 1 0 0 1]
[0 1 1 0 0 0 0 1 1 0]]
Note that the difference is on the boundaries, and that's because the following line doesn't work for the boundaries:
neighbors = a[i-1:i+2, j-1:j+2]
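A quick way to see the problem: at the top-left corner the slice start becomes -1, which NumPy interprets as counting from the end of the array, so the slice comes out empty instead of clipping to row/column 0.

```python
import numpy as np

a = np.arange(25).reshape(5, 5)

# At i = j = 0 the slice indices become -1:2; the -1 start maps to the
# last row/column, which is past the stop index, so the result is empty.
i = j = 0
neighbors = a[i-1:i+2, j-1:j+2]   # equivalent to a[-1:2, -1:2]
print(neighbors.shape)            # (0, 0) -- no neighbors counted at all
```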
A student in my class made this observation, and also supplied an alternative version that explicitly checks for boundaries. I have confirmed that this version produces the same output as the cross correlation version.
    b = np.zeros_like(a)
    rows, cols = a.shape
    for i in range(rows):
        for j in range(cols):
            state = a[i, j]
            if i == 0 and j == 0:
                neighbors = a[i:i+2, j:j+2]
            elif j == 0:
                neighbors = a[i-1:i+2, j:j+2]
            elif i == 0:
                neighbors = a[i:i+2, j-1:j+2]
            else:
                neighbors = a[i-1:i+2, j-1:j+2]
            k = np.sum(neighbors) - state
            if state:
                if k == 2 or k == 3:
                    b[i, j] = 1
            else:
                if k == 3:
                    b[i, j] = 1
    print(b)
Chapter 6, page 98 says "if the center cell is 1" instead of "if the center cell is 10".
I noticed targets was initialized as a list, but then replaced by a set (_random_subset returns a set) in Section 4.6's barabasi_albert_graph:
    def barabasi_albert_graph(n, k):
        G = nx.empty_graph(k)
        targets = list(range(k))
        repeated_nodes = []
        for source in range(k, n):
            G.add_edges_from(zip([source]*k, targets))
            repeated_nodes.extend(targets)
            repeated_nodes.extend([source] * k)
            targets = _random_subset(repeated_nodes, k)
        return G
Would it make more sense to just initialize targets as a set? I don't know your preferred way to receive these kinds of corrections/typos, so I posted it here.
In Chapter 2 Graphs you say
We can use reachable_nodes to write is_connected:
    def is_connected(G):
        start = next(G.nodes_iter())
        reachable = reachable_nodes(G, start)
        return len(reachable) == len(G)
is_connected chooses a starting node by calling nodes_iter, which returns an iterator object, and passing the result to next, which returns the first node.
seen gets the set of nodes that can be reached from start. If the size of this set is the same as the size of the graph, that means we can reach all nodes, which means the graph is connected.
You say seen when I think you should say reachable, because seen is the variable returned from the reachable_nodes function, which in turn gets assigned to reachable. But here you are explaining the function is_connected, not the function reachable_nodes.
In Chapter 11's default implementation of choose_dead, we see that a random 10% of agents die in every round, so 0.1 is the probability of death:
    def choose_dead(self, ps):
        """Choose which agents die in the next timestep.

        ps: probability of survival for each agent

        returns: indices of the chosen ones
        """
        n = len(self.agents)
        is_dead = np.random.random(n) < 0.1
        index_dead = np.nonzero(is_dead)[0]
        return index_dead
When this is overridden to use differential survival, < is flipped to > in the is_dead line to interpret ps as the probability of survival:
    class SimWithDiffSurvival(Simulation):

        def choose_dead(self, ps):
            """Choose which agents die in the next timestep.

            ps: probability of survival for each agent

            returns: indices of the chosen ones
            """
            n = len(self.agents)
            is_dead = np.random.random(n) > ps
            index_dead = np.nonzero(is_dead)[0]
            return index_dead
However, this doesn't happen in Chapter 12, and I believe it's a bug that affects the conclusions made in the chapter. Note that Chapter 12 uses the same default implementation of choose_dead as Chapter 11; it interprets 0.1 as the probability of death. But when introducing differential survival, choose_dead is overridden like this:
    # class PDSimulation(Simulation):

    def choose_dead(self, fits):
        """Choose which agents die in the next timestep.

        fits: fitness of each agent

        returns: indices of the chosen ones
        """
        ps = prob_survive(fits)
        n = len(self.agents)
        is_dead = np.random.random(n) < ps
        index_dead = np.nonzero(is_dead)[0]
        return index_dead
Note that < isn't flipped this time. So imagine that all the agents had infinite fitness, and therefore their probability of survival was 1.0. The line is_dead = np.random.random(n) < ps would return all True and kill them all off. This doesn't make sense.
Since choose_dead takes a parameter ps (which stands for probability of survival), I think we should flip < to > and use 0.9 in the default implementation so that the semantics are consistent. Then we never need a sign flip, and there won't be confusion down the road. However, this still leaves open the experiments and conclusions made in Chapter 12, which, I believe, were reached with this bug in place.
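To make the suggestion concrete, here is a standalone sketch of the proposed default (written as a plain function rather than the Simulation method, with hypothetical arguments):

```python
import numpy as np

# ps is always interpreted as the probability of survival: an agent
# dies when a uniform draw exceeds its ps. Passing np.full(n, 0.9)
# reproduces the original 10% death rate with no sign flip needed later.
def choose_dead(ps, n):
    is_dead = np.random.random(n) > ps
    return np.nonzero(is_dead)[0]

# Sanity check: with certain survival (ps = 1.0), nobody dies.
print(len(choose_dead(np.ones(100), 100)))   # 0
```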
Chapter 6 notebook refers to "Gosling's" gun, but this should be Gosper's gun
There are numerous stochastic experiments in the book/notebooks. Would it be worthwhile setting a seed for them?
I believe all stochastic operations are done in numpy
so the following would suffice for each experiment:
np.random.seed(0)
(For repeated experiments I often use

    for seed in range(number_repetitions):
        np.random.seed(seed)
        ...

so that each repetition gets its own seed.)

This ensures reproducibility of results. Just a suggestion :)
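A small aside: newer NumPy code often uses the Generator API instead of the global seed, which keeps reproducibility local to a generator object rather than shared process-wide state.

```python
import numpy as np

# Two generators built from the same seed produce identical streams,
# without touching the global np.random state.
rng = np.random.default_rng(0)
sample = rng.random(3)
print(sample)

rng2 = np.random.default_rng(0)
print(np.array_equal(sample, rng2.random(3)))   # True
```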