The following quoted section, as it's currently written, confuses entropy with Kolmogorov complexity. https://en.wikipedia.org/wiki/Kolmogorov_complexity
"For example, the sequence of letters aaaaa has low entropy, conceptually, as it is the same letter appearing multiple times; it could be abbreviated to just 5 a’s. In contrast, the sequence of letters bkawe has high entropy, with no apparent pattern, and no apparent way to abbreviate it without losing its content. Shannon’s view of information was thus as an amount of information, measured by the compressibility of some data.
Another way to think about Shannon’s entropic idea of information is through probability: if we were to observe each character in the sequences above, and make a prediction about the likelihood of the next character, the first sequence would result in increasingly high confidence of seeing another a. In contrast, in the second sequence, the probability of seeing any particular letter is quite low. The implication of these ideas is that the more rare “events” or “observations” in some phenomenon, the more information that is required to represent it."
Specifically, when you say the sequence "could be abbreviated to just 5 a's", that's most definitely Kolmogorov complexity, which perhaps would merit inclusion in the book too. The reason this example must fall back on Kolmogorov complexity is that you don't make any reference to a random variable describing how those sequences were generated. Without a prior distribution you have to appeal to Kolmogorov complexity; with a prior distribution, the entropy is well defined and thus provides a bound on compressibility.
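To make the "with a prior distribution" point concrete, here's a minimal Python sketch (the letter frequencies are made up purely for illustration, not taken from the book):

```python
import math

# Hypothetical prior over letters: a source that emits 'a' most of the time.
prior = {"a": 0.9, "b": 0.025, "k": 0.025, "w": 0.025, "e": 0.025}

# Shannon entropy in bits per symbol: H = -sum(p * log2(p))
H = -sum(p * math.log2(p) for p in prior.values())
print(f"entropy of the source: {H:.3f} bits/symbol")

# By the source coding theorem, no lossless code can use fewer than H bits
# per symbol on average, so H bounds compressibility -- but only once the
# generating distribution has been specified.
```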
Also, I don't believe the second paragraph I quoted above is correct either. To follow the reasoning as it's currently written, you need at least some modeled hyperparameter governing the generation of the next letter in the sequence, with a distribution over the values of that hyperparameter itself. If you don't want to complicate the example, one might think we could adjust the second paragraph to just say that if a letter sequence were generated by drawing letters uniformly at random, the sequence 'aaaaa' would have low entropy and 'bkawe' would have high entropy since it looks random -- but that's not true! Under a uniform i.i.d. model both sequences are equally probable, so they carry exactly the same amount of information.
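A quick sanity check of that claim (a Python sketch, assuming 26 lowercase letters drawn i.i.d. uniformly at random): both sequences have probability (1/26)^5, so their surprisal -log2 P(seq) is identical.

```python
import math

def surprisal_bits(seq, alphabet_size=26):
    # -log2 P(seq) under a uniform i.i.d. model over the alphabet
    return -math.log2((1 / alphabet_size) ** len(seq))

print(surprisal_bits("aaaaa"))  # ~23.5 bits
print(surprisal_bits("bkawe"))  # ~23.5 bits -- exactly the same
```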
Perhaps better would be to use an example where you toss a coin x times and count how many heads and tails you get (like this example: https://courses.lumenlearning.com/physics/chapter/15-7-statistical-interpretation-of-entropy-and-the-second-law-of-thermodynamics-the-underlying-explanation/). Then a set of tosses that is all heads would be very low entropy, and a set of tosses with about equal heads and tails would be high entropy. The key distinction is whether you view the outcome as a sequence or as a set; I think heads/tails are easier to think of as a set.
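Here's a small Python sketch of that coin-toss version, counting microstates per macrostate (the choice of 10 tosses is arbitrary): the all-heads macrostate corresponds to a single arrangement, while the half-and-half macrostate corresponds to many, which is why it's the high-entropy one.

```python
from math import comb, log2

n = 10  # number of tosses; 10 is arbitrary, just for illustration
for heads in (n, n // 2):
    microstates = comb(n, heads)  # number of orderings with this many heads
    print(f"{heads} heads out of {n}: {microstates} arrangement(s), "
          f"log2 = {log2(microstates):.2f} bits")
```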
"Elements of Information Theory" by Cover and Thomas is a book I like on Shannon information
Anyhow, thanks for putting this book up. I've been quite enjoying it so far, just thought I could help on this little detail.