allendowney / thinkcomplexity2
Book and code for Think Complexity, 2nd edition
Home Page: https://allendowney.github.io/ThinkComplexity2/
Where is the class Cell2DViewer? It's missing from Cell2D.py, but Life.py in the code directory imports it.
The Chapter 2 notebook uses G.nodes_iter(), which was removed in NetworkX 2.0:
https://networkx.github.io/documentation/stable/release/migration_guide_from_1.x_to_2.0.html
A note on the module versions needed, or a requirements.txt file, would be useful.
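For anyone hitting this, the migration is a one-line change; a minimal sketch, assuming NetworkX 2.x is installed:

```python
import networkx as nx

G = nx.Graph()
G.add_edges_from([("Alice", "Bob"), ("Bob", "Chuck")])

# NetworkX 1.x: first_node = next(G.nodes_iter())
# NetworkX 2.x removed nodes_iter; graphs are directly iterable over
# their nodes, so next(iter(G)) does the same job.
first_node = next(iter(G))
print(first_node in G)   # True
```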
As a developer, I wanted to run the notebooks in VS Code. I struggled a bit to get this working (the imports cell threw ModuleNotFoundError), so I'm sharing the steps here for future readers.

Install Anaconda: https://www.anaconda.com/products/distribution

Then set up the environment:

    git clone https://github.com/AllenDowney/ThinkComplexity2.git
    cd ThinkComplexity2
    conda env create -f environment.yml
    conda activate ThinkComplexity2

From the Command Palette, choose Python: Select Interpreter and select the workspace environment (it should be the Recommended one). Then use the kernel picker in the top right to select the same environment (shown as base (Python 3.9.12) in the screenshot below).
I wasn't sure whether it was intentional that you're using thinkplot.Pdf instead of thinkplot.Pmf for the analysis in Chapter 4. This is in Section 4.3, e.g.

    thinkplot.Pdf(pmf_fb, label='Facebook')
    thinkplot.Pdf(pmf_ws, label='WS graph')

It makes more sense to me to use thinkplot.Pmf, since we are dealing with discrete quantities.
Currently the make_all_agents function in Chapter 11 makes use of binary arithmetic:
    def make_all_agents(fit_land, agent_maker):
        """Make an array of Agents.

        fit_land: FitnessLandscape
        agent_maker: class used to make Agent

        returns: array of Agents
        """
        N = fit_land.N
        xs = np.arange(2**N)
        ys = 2**np.arange(N)[::-1]
        locs = np.bitwise_and.outer(xs, ys) > 0
        agents = [agent_maker(loc, fit_land) for loc in locs]
        return np.array(agents)
make_all_agents uses the outer product of bitwise_and, which is not the most obvious operation. Would using itertools.product (which returns the Cartesian product of the input iterables) make this more readable/obvious?
    def make_all_agents(fit_land, agent_maker):
        """Make an array of Agents.

        fit_land: FitnessLandscape
        agent_maker: class used to make Agent

        returns: array of Agents
        """
        locations = itertools.product([0, 1], repeat=fit_land.N)
        agents = [agent_maker(loc, fit_land) for loc in locations]
        return np.array(agents)
(itertools.product(S, repeat=N) creates a generator that yields the elements of the N-fold Cartesian product of S.)
Currently make_all_agents is not included in the text of the book. This is understandable given the slightly complex use of binary arithmetic; perhaps using itertools would allow it to be included. Just a suggestion :)
(Note that this particular implementation creates tuples of 0/1 and not arrays of True/False; this could be refactored.)
(A very minor further suggestion would be to use the variable name agent_class instead of agent_maker. I spent a little while looking around for some sort of factory function, but that could well be a very personal misread.)
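As a rough sketch of that refactor, outside the book's classes and just comparing the location arrays: the 0/1 tuples from itertools.product can be converted to boolean NumPy arrays that match what the bitwise version produces.

```python
import itertools
import numpy as np

N = 3  # number of loci, standing in for fit_land.N

# The book's version: binary counting via an outer bitwise AND.
xs = np.arange(2**N)
ys = 2**np.arange(N)[::-1]
locs_bitwise = np.bitwise_and.outer(xs, ys) > 0

# The itertools version, converted from 0/1 tuples to boolean arrays.
locs_product = np.array(list(itertools.product([0, 1], repeat=N)), dtype=bool)

# Both enumerate the 2**N locations in the same (binary counting) order.
print(np.array_equal(locs_bitwise, locs_product))   # True
```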
The link to the Data files from the Barabasi and Albert paper in Exercise 4.3 no longer works:
http://www3.nd.edu/~networks/resources.htm
It looks like SNAP is maintaining one of the Notre Dame datasets, but not the IMDB Actors dataset:
https://snap.stanford.edu/data/web-NotreDame.html
A quick look doesn't turn up an appropriate place to redirect to, so possibly remove the link or contact Barabasi's group at Northeastern.
I was able to upload the Jupyter notebooks for chapters 1-4 of ThinkComplexity2e, but attempting Chapter 5 (chap05.ipynb) led to the following error: SyntaxError: JSON Parse error: Unrecognized token '<'. (This usually means the parser received an HTML page rather than the notebook's JSON.)
ThinkComplexity2/book/book.tex, line 1210 in 30d23de: the output is not a NodeView, but a list: ['Alice', 'Bob', 'Chuck'].
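A minimal check of the reported behavior, assuming NetworkX 2.x:

```python
import networkx as nx

G = nx.Graph()
G.add_edges_from([("Alice", "Bob"), ("Bob", "Chuck")])

# G.nodes() returns a NodeView; an explicit conversion is what
# produces the plain list shown in the book's example output.
print(list(G.nodes()))   # ['Alice', 'Bob', 'Chuck']
```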
Rendering of equations doesn't look very good. Page 30 is the first example to look at; another is the exponents on page 116. And on page 118, the number 1.5 is spaced strangely.
Is there a different DocBook I could generate to make these look better? Or can we tweak them by hand?
The Appendix Exercise A.6 provides some dead links:
You can download my map implementations from \url{thinkcomplex.com/Map.py}, and the code I used in this section from \url{thinkcomplex.com/listsum.py}.
Hi Dr Allen, could your is_connected() function be replaced by a simple mathematical check? I mean, instead of:

    def is_connected(G):
        start = next(iter(G))
        reachable = reachable_nodes(G, start)
        return len(reachable) == len(G)

it could be:

    def is_connected(g: nx.Graph):
        max_edges = (g.number_of_nodes() * (g.number_of_nodes() - 1)) // 2
        return g.number_of_edges() == max_edges

This simple logic would replace the reachable_nodes() function.
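For what it's worth, that edge-count check seems to test whether the graph is complete rather than connected; a connected graph can have as few as n - 1 edges. A self-contained sketch (plain dicts instead of networkx, with a hypothetical bfs_reachable helper standing in for the book's reachable_nodes):

```python
# A path graph 0-1-2 is connected, but has fewer than the maximum
# possible number of edges, so the edge-count test disagrees.
def bfs_reachable(adj, start):
    seen, stack = set(), [start]
    while stack:
        node = stack.pop()
        if node not in seen:
            seen.add(node)
            stack.extend(adj[node])
    return seen

path = {0: [1], 1: [0, 2], 2: [1]}   # adjacency list for the path graph
n = len(path)
num_edges = sum(len(v) for v in path.values()) // 2   # 2 edges
max_edges = n * (n - 1) // 2                          # 3 for a complete graph

print(len(bfs_reachable(path, 0)) == n)   # True: the graph is connected
print(num_edges == max_edges)             # False: edge-count test says no
```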
The beginning of Section 7.1 discusses Turing's diffusion model:

In 1952 Alan Turing published a paper called "The chemical basis of morphogenesis", which describes the behavior of systems involving two chemicals that diffuse in space and react with each other.

It then branches off to talk about a CA-based version:

Turing's model is based on differential equations, but it can also be implemented using a cellular automaton.

But before we get to Turing's model, we'll start with something simpler: a diffusion system with just one chemical.

The way it's written implies a return to Turing's DE-based model, but the chapter never comes back to it.
It looks like when I have a code snippet in a caption, the typesetting of the caption gets messed up. There's an example on page 119.
Same issue comes up in the footnote on page 124.
Is the DocBook I am generating correct, and getting rendered wrong, or is there something wrong with it?
The notebook ThinkComplexity2/code/appAsoln.ipynb
contains an incorrect link to the book website.
Note that the output for the straightforward loop-based implementation is different from the correlation-based versions. In the static version of the notebook, the input is:
[[1 1 1 0 1 1 1 1 0 1]
[0 1 0 1 0 1 0 0 1 1]
[0 0 1 0 0 1 1 0 0 1]
[0 0 0 0 0 1 1 0 0 1]
[0 0 0 1 0 1 1 1 1 1]
[0 0 1 0 1 0 1 0 0 0]
[0 0 0 1 0 0 0 0 0 0]
[0 1 0 0 0 1 0 1 1 0]
[1 1 0 1 0 0 1 1 1 1]
[0 0 1 0 0 0 0 0 0 0]]
The output for the loop-based implementation is:
[[0 0 0 0 0 0 0 0 0 0]
[0 0 0 1 0 0 0 0 0 0]
[0 0 1 0 0 0 0 1 0 0]
[0 0 0 0 0 0 0 0 0 0]
[0 0 0 1 0 0 0 0 1 0]
[0 0 1 0 1 0 1 0 1 0]
[0 0 1 1 1 1 1 1 0 0]
[0 1 0 0 1 0 0 0 0 0]
[0 1 0 0 0 0 1 0 0 0]
[0 0 0 0 0 0 0 0 0 0]]
The output for the cross correlation version is:
[[1 1 1 1 1 1 1 1 0 1]
[1 0 0 1 0 0 0 0 0 1]
[0 0 1 0 0 0 0 1 0 1]
[0 0 0 0 0 0 0 0 0 1]
[0 0 0 1 0 0 0 0 1 1]
[0 0 1 0 1 0 1 0 1 0]
[0 0 1 1 1 1 1 1 0 0]
[1 1 0 0 1 0 0 0 0 1]
[1 1 0 0 0 0 1 0 0 1]
[0 1 1 0 0 0 0 1 1 0]]
Note that the difference is on the boundaries, and that's because the following line doesn't work for the boundaries:
neighbors = a[i-1:i+2, j-1:j+2]
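A quick way to see the problem: at the top-left corner the slice start becomes -1, which NumPy interprets as counting from the end of the array, so the slice comes out empty instead of clipping to row/column 0.

```python
import numpy as np

a = np.arange(25).reshape(5, 5)

# At i = j = 0 the slice indices become -1:2; the -1 start maps to the
# last row/column, which is past the stop index, so the result is empty.
i = j = 0
neighbors = a[i-1:i+2, j-1:j+2]   # equivalent to a[-1:2, -1:2]
print(neighbors.shape)            # (0, 0) -- no neighbors counted at all
```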
A student in my class made this observation, and also supplied an alternative version that explicitly checks for boundaries. I have confirmed that this version produces the same output as the cross correlation version.
    b = np.zeros_like(a)
    rows, cols = a.shape
    for i in range(rows):
        for j in range(cols):
            state = a[i, j]
            if i == 0 and j == 0:
                neighbors = a[i:i+2, j:j+2]
            elif j == 0:
                neighbors = a[i-1:i+2, j:j+2]
            elif i == 0:
                neighbors = a[i:i+2, j-1:j+2]
            else:
                neighbors = a[i-1:i+2, j-1:j+2]
            k = np.sum(neighbors) - state
            if state:
                if k == 2 or k == 3:
                    b[i, j] = 1
            else:
                if k == 3:
                    b[i, j] = 1
    print(b)
Chapter 6, page 98 says "if the center cell is 1" instead of "if the center cell is 10".
I noticed targets was initialized as a list, but then replaced by a set (_random_subset returns a set) in Section 4.6's barabasi_albert_graph:
    def barabasi_albert_graph(n, k):
        G = nx.empty_graph(k)
        targets = list(range(k))
        repeated_nodes = []
        for source in range(k, n):
            G.add_edges_from(zip([source]*k, targets))
            repeated_nodes.extend(targets)
            repeated_nodes.extend([source] * k)
            targets = _random_subset(repeated_nodes, k)
        return G
Would it make more sense to just initialize targets as a set? I don't know your preferred way to receive these kinds of corrections/typos, so I posted it here.
In Chapter 2 Graphs you say
We can use reachable_nodes to write is_connected:
    def is_connected(G):
        start = next(G.nodes_iter())
        reachable = reachable_nodes(G, start)
        return len(reachable) == len(G)
is_connected chooses a starting node by calling nodes_iter, which returns an iterator object, and passing the result to next, which returns the first node.
seen gets the set of nodes that can be reached from start. If the size of this set is the same as the size of the graph, that means we can reach all nodes, which means the graph is connected.
You say seen when I think you should say reachable, because seen is the variable returned from the reachable_nodes function, which in turn gets assigned to reachable. But here you are explaining the function is_connected, not the function reachable_nodes.
In Chapter 11's default implementation of choose_dead, we see that a random 10% of agents die in every round, so 0.1 is the probability of death:
    def choose_dead(self, ps):
        """Choose which agents die in the next timestep.

        ps: probability of survival for each agent

        returns: indices of the chosen ones
        """
        n = len(self.agents)
        is_dead = np.random.random(n) < 0.1
        index_dead = np.nonzero(is_dead)[0]
        return index_dead
When this is overridden to use differential survival, < is flipped to > in the is_dead line to interpret ps as the probability of survival:
    class SimWithDiffSurvival(Simulation):

        def choose_dead(self, ps):
            """Choose which agents die in the next timestep.

            ps: probability of survival for each agent

            returns: indices of the chosen ones
            """
            n = len(self.agents)
            is_dead = np.random.random(n) > ps
            index_dead = np.nonzero(is_dead)[0]
            return index_dead
However, this doesn't happen in Chapter 12, and I believe it's a bug that affects the conclusions made in the chapter. Note that Chapter 12 uses the same default implementation of choose_dead as Chapter 11; it interprets 0.1 as the probability of death. But when introducing differential survival, choose_dead is overridden like this:
    # class PDSimulation(Simulation):

    def choose_dead(self, fits):
        """Choose which agents die in the next timestep.

        fits: fitness of each agent

        returns: indices of the chosen ones
        """
        ps = prob_survive(fits)
        n = len(self.agents)
        is_dead = np.random.random(n) < ps
        index_dead = np.nonzero(is_dead)[0]
        return index_dead
Note that < isn't flipped this time. So imagine that all the agents had infinite fitness, and therefore their probability of survival was 1.0. The line is_dead = np.random.random(n) < ps would return all True and kill them all off. This doesn't make sense.
Since choose_dead takes a parameter ps (which stands for probability of survival), I think we should flip < to > and use 0.9 in the default implementation so that the semantics are consistent. Then we never need a sign flip, and there won't be confusion down the road. However, this still leaves open the experiments and conclusions made in Chapter 12, which, I believe, were reached with this bug in place.
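To make the suggestion concrete, here is a standalone sketch of the proposed default (written as a plain function rather than the Simulation method, with hypothetical arguments):

```python
import numpy as np

# ps is always interpreted as the probability of survival: an agent
# dies when a uniform draw exceeds its ps. Passing np.full(n, 0.9)
# reproduces the original 10% death rate with no sign flip needed later.
def choose_dead(ps, n):
    is_dead = np.random.random(n) > ps
    return np.nonzero(is_dead)[0]

# Sanity check: with certain survival (ps = 1.0), nobody dies.
print(len(choose_dead(np.ones(100), 100)))   # 0
```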
Chapter 6 notebook refers to "Gosling's" gun, but this should be Gosper's gun
There are numerous stochastic experiments in the book/notebooks. Would it be worthwhile setting a seed for them?
I believe all stochastic operations are done in numpy
so the following would suffice for each experiment:
np.random.seed(0)
(For repeated experiments I often use

    for seed in range(number_repetitions):
        np.random.seed(seed)
        ...

so that each repetition gets its own seed.)

This ensures reproducibility of results. Just a suggestion :)
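A small aside: newer NumPy code often uses the Generator API instead of the global seed, which keeps reproducibility local to a generator object rather than shared process-wide state.

```python
import numpy as np

# Two generators built from the same seed produce identical streams,
# without touching the global np.random state.
rng = np.random.default_rng(0)
sample = rng.random(3)
print(sample)

rng2 = np.random.default_rng(0)
print(np.array_equal(sample, rng2.random(3)))   # True
```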