
archive-program's People

Contributors

1999azzar, aayu24, abdulkhadhar, erkinalp, harmon758, haykam821, hongqn, jcwebhole, jodyswartz, karaketir16, kmchris, lgutter, mage1k99, marcauberer, miguellopes, nishkarshraj, noctisatrae, rezendi, sannidhya-kushwaha, schaecsn, shahzain-lab, shawndavenport, simenfjeldolsen, simonw, slord6, sourangshughosh, v3lop5, vidyabhandary, viswamvs, zuberdunge

archive-program's Issues

Guidelines

Here are some ideas/guidelines to evaluate books about technology (but not literature):

  1. General principles instead of specific technologies

    • Example: "Relational Databases" instead of "MySQL" or "PostgreSQL"
    • Exceptions: technologies that became standards with multiple implementations: C, UNIX, etc.
  2. Technical instead of pop science

    • Example: "The Elements of Computing Systems" instead of "But How Do It Know?" (which I assume is intended for a general audience)
  3. Comprehensive instead of partial

    • Example: "The Art of Computer Programming" instead of "Everyday Data Structures"
  4. Unique instead of redundant

    • We don't need several books about each topic.
    • If you have to pick one, pick the most general, technical and comprehensive :)
  5. Optional, but recommended: include official standards when possible


I also propose the following changes in the structure:

  1. Split "Fundamentals of computing" and "the Internet":
    • "Fundamentals of computing"
    • Move "the internet" to "Networking"
  2. Split "Compilers, Assembler and Operating systems":
    • "Compilers and Interpreters"
    • "Operating Systems"
    • Move "Assembler" to "Programming Languages".

With that in mind, here's a shorter list of books for sections 1 to 7:

1. Fundamentals of computing

  • Code: The Hidden Language of Computer Hardware and Software by Charles Petzold (Pearson Education)
  • The Elements of Computing Systems: Building a Modern Computer from First Principles by Noam Nisan and Shimon Schocken (MIT Press)

2. Algorithms and data structures

  • The Art of Computer Programming by Donald Knuth (Pearson)

Optional:

  • Introduction to Algorithms, by Thomas H. Cormen (MIT Press)

3. Compilers and Interpreters

  • Compilers: Principles, Techniques, and Tools by Alfred V. Aho, Ravi Sethi, Jeffrey D. Ullman (Addison-Wesley)
  • Modern Compiler Implementation in C by Andrew W. Appel, Maia Ginsburg (Cambridge University Press)
  • Structure and Interpretation of Computer Programs, by Abelson, Sussman, and Sussman (MIT Press)

Optional:

  • Lex & Yacc by John R. Levine, Tony Mason, Doug Brown (O'Reilly)

4. Programming Languages

  • Assembly: Programming from the Ground Up by Jonathan Bartlett (GNU Free Documentation License)
  • Forth: A sometimes minimal FORTH compiler and tutorial for Linux / i386 systems, by Richard W.M. Jones (PUBLIC DOMAIN)
  • C: The C Programming Language, by Brian W. Kernighan and Dennis M. Ritchie (K&R) (Pearson)
  • C++: The C++ Programming Language, by Bjarne Stroustrup (Pearson)
  • Scheme: How to Design Programs, by Matthias Felleisen (MIT Press, Creative Commons CC BY-NC-ND)

Optional:

  • JavaScript: JavaScript: The Definitive Guide, by David Flanagan (O'Reilly)
  • Python: ?
  • Java: ?

5. Operating systems

  • Modern Operating Systems by Andrew S. Tanenbaum (Pearson PLC)
  • Operating Systems: Design and Implementation by Andrew S. Tanenbaum (Prentice Hall)
  • Operating System Concepts, by Abraham Silberschatz (Wiley)
  • Advanced Programming in the UNIX Environment, by W. Richard Stevens (Addison-Wesley Professional)
  • The Art of Unix Programming by Eric S. Raymond (Addison-Wesley, but also available under a Creative Commons license)

Optional:

  • The Linux Programming Interface, by Michael Kerrisk (No Starch Press)

6. Databases

  • A Relational Model of Data for Large Shared Data Banks, by E. F. Codd (IBM Research Laboratory)
  • An Introduction to Database Systems, by C.J. Date (Pearson)
  • Fundamentals of Database Systems, by Ramez Elmasri (Pearson)

7. Networking

  • Cabling: The Complete Guide To Copper and Fiber-Optic Networking by Andrew Oliviero and Bill Woodward (Wiley)
  • Ethernet: The Definitive Guide by Charles E. Spurgeon and Joann Zimmerman (O'Reilly)
  • TCP/IP Illustrated (volumes 1-3), by Richard Stevens (Addison-Wesley Professional)
  • DNS and BIND by Cricket Liu and Paul Albitz (O'Reilly)
  • HTTP: The Definitive Guide by David Gourley, Brian Totty, Marjorie Sayer, Anshu Aggarwal, and Sailu Reddy (O'Reilly)
  • Computer Networks, by Andrew S. Tanenbaum (Pearson)

Before adding a book, ask yourself: Is the book about general principles? Is it technical? Is it comprehensive? Does it provide unique information?

[Suggestion] Publish the tech tree

It would be nice to publish the tech tree reel at some point, if at all possible. It is obviously too late to change anything on it now, but it is still an interesting resource.

Separate "Fundamentals of computing" and "Internet"

The books about "Internet" should be moved to "Networking and connectivity":

Fundamentals of computing

  • The Pattern On The Stone by W. Daniel Hillis (Basic Books)
  • But How Do It Know? by J. Clark Scott (John C Scott)
  • Code: The Hidden Language of Computer Hardware and Software by Charles Petzold (Pearson Education)
  • The Elements of Computing Systems: Building a Modern Computer from First Principles by Noam Nisan and Shimon Schocken (MIT Press)

Networking and Internet

  • Cabling: The Complete Guide To Copper and Fiber-Optic Networking by Andrew Oliviero and Bill Woodward (Wiley)
  • Ethernet: The Definitive Guide by Charles E. Spurgeon and Joann Zimmerman (O'Reilly)
  • Understanding TCP/IP by Alena Kabelová and Libor Dostálek (Packt)
  • (...)
  • Tubes: A Journey To The Center Of The Internet by Andrew Blum (HarperCollins)
  • Introduction to Networking: How the Internet Works by Charles Severance, illustrated by Mauro Toselli and Aimee Andrion (Charles Severance)

Add Turkish translation

Turkic languages are spoken by 250 million people. Turkish is the most commonly spoken Turkic language, with 90 million speakers. Going to provide the translation myself in the near future.

2D Barcode Format

Is the code used to generate the frames available somewhere? Is that a standardized project?

cc @craSH

Web development – choice of frameworks

I'll be the first to admit I'm biased, but is there a reason why the list would favor Flask over Django, particularly when the latter is arguably more popular, robust and mature?

I see Ruby on Rails is favored over Sinatra, so size / scope doesn't seem to be the deciding factor.

Another Archiving?

I just wanted to find out whether this will be the only archive program or whether there will be another one, and if so, when?

Neutral language not opinionated

The tone should be descriptive, not opinionated. Statements such as the following only expose the personal views of the anonymous authors and should play no part.

Examples (emphasis mine)

a clumsy but straightforward language

a very cryptic, limited, but fast and powerful family

Delays?

Hello! Is this project suffering some delays due to coronavirus pandemic? Just curious. I hope everybody is fine. Thanks.

(Could repos with tag 'coronavirus', and related, be included after you started the snapshot?)

Commit/change history

It's stated in the FAQ that:

The snapshot will consist of the HEAD of the default branch of each repository,

And the guide confirms:

However, in order to save space, this archive's repositories generally do not include git histories.

I understand the need to save space, but I want to point out that in some cases, the change history can be extremely important for understanding why code was written.

As an example, in the course of my work, I regularly need to read and understand the source code of the Linux kernel. A great deal of information about how the code works, and how and why it came to be designed and organized the way that it is, is found in the commit history.

In any large project that has matured over many years and seen many mistakes made, bugs fixed, edge cases covered, and lessons learned, all of that knowledge is in the commit logs. Many a developer who has worked on someone else's code has cried out to themselves, the universe, or their favourite deity, "What the hell is this line for? What on earth were they thinking?". And the first (and sometimes only) place to look for answers is the commit logs.

I know there are storage costs and practicality to consider, but I would strongly suggest that commit history be preserved, at least for some projects. Possible heuristics could be importance/popularity and complexity (lines of code?). I don't presume to know enough to tell you how to manage the archive, but as a developer, I can tell you that some codebases almost cannot be worked on without commit history.

The matter has been escalated to private discussions.

Thank you for your participation. For the record, the matter concerns the questionable practice "For the GitHub Arctic Code Vault, we are unable to remove data that has already been stored.", which was enforced without any prior knowledge of, or notification to, end users (here, customers).

Factual error

The following factual statement is incorrect:

This is known as the closed source model, and, historically, was the early, crude approach to software development.

Code was all open before it was later closed and then often open again.

Source https://en.wikipedia.org/wiki/History_of_free_and_open-source_software

In the 1950s and 1960s, computer operating software and compilers were delivered as a part of hardware purchases without separate fees. At the time, source code, the human-readable form of software, was generally distributed with the software providing the ability to fix bugs or add new functions.[1] Universities were early adopters of computing technology. Many of the modifications developed by universities were openly shared, in keeping with the academic principles of sharing knowledge, and organizations sprung up to facilitate sharing.

Were deleted repositories archived?

For example, if I created a repository after 2019/11 and before 2020/02 and then deleted it recently (say, a month ago), was the repository archived?

Few suggestions

A few possible improvements:

  • Source code of open source software

    Open source software is made available to any and all who want to use it, at no cost, so they can in turn improve it, or use it to build something new and better.

    It might be better to explicitly mention that it is the source code that is made available to all, like so: Source code of Open source software is made available to any and all ...

  • Open source software project

    An open source software project

    Excuse the nitpick. The guide defines this as "open source software project" and then refers to it as "open source project" everywhere else. Maybe it's better to just define it that way.

  • GitHub not defined
    GitHub seems to be mentioned in a dozen places but I don't see any clear definition as to what GitHub is in a way similar to how other things such as computer, Git etc. are defined. It would be nice to define it somewhere.

Error: Cannot set property 'innerHTML' of null

Error in main.js:

Uncaught TypeError: Cannot set property 'innerHTML' of null
    at showRemaining (main.js:247)

Stack trace line:247

showRemaining @ main.js:247
setInterval (async)
bind @ main.js:262
init @ main.js:223
(anonymous) @ main.js:14
l @ jquery-3.3.1.min.js:2
c @ jquery-3.3.1.min.js:2
setTimeout (async)
(anonymous) @ jquery-3.3.1.min.js:2
u @ jquery-3.3.1.min.js:2
fireWith @ jquery-3.3.1.min.js:2
fire @ jquery-3.3.1.min.js:2
u @ jquery-3.3.1.min.js:2
fireWith @ jquery-3.3.1.min.js:2
ready @ jquery-3.3.1.min.js:2
_ @ jquery-3.3.1.min.js:2

Property:

document.getElementById("countdown").innerHTML = ""; // null
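The trace suggests that `showRemaining` keeps firing from a `setInterval` timer on a page that has no `#countdown` element, so `getElementById` returns `null`. Below is a minimal guarded sketch, assuming the update is the single `innerHTML` assignment shown above; the actual body of `showRemaining` in main.js is not shown in the issue, and the `doc`/`text` parameters are hypothetical, added only so the function can run outside a browser.

```javascript
// Hypothetical guarded version of the countdown update in main.js.
// `doc` is any object with getElementById (the browser's `document`
// in practice); `text` is the string to display.
function showRemaining(doc, text) {
  const el = doc.getElementById("countdown");
  if (el === null) {
    // No #countdown element on this page: skip the update instead of
    // throwing "Cannot set property 'innerHTML' of null".
    return false;
  }
  el.innerHTML = text;
  return true;
}

// Browser usage sketch: stop the timer once the element is missing.
// const timer = setInterval(() => {
//   if (!showRemaining(document, "")) clearInterval(timer);
// }, 1000);
```

Returning a boolean lets the caller decide whether to clear the interval, rather than silently polling a missing element forever.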

Use plain text (.txt) file instead of markdown

I think this guide should be a plain .txt file that can be read and understood easily. GitHub parses Markdown into readable text, but what would happen when someone reaches the Code Vault in an age when there are no such parsers? 🤔

License concept

The guide says nothing about the concept of a license. Since this is about open source, and most open source projects are released under one of the available open source licenses, I think it would be useful to give an overview of them (or at least of the concept itself).

Replace Javascript titles

Instead of:

  • Learning JavaScript by Ethan Brown (O'Reilly)
  • Mastering JavaScript Functional Programming by Federico Kereki (Packt)

I recommend the following books:

  • JavaScript: The Definitive Guide by David Flanagan (O'Reilly)
  • JavaScript: The Good Parts by Douglas Crockford (O'Reilly)

Wikis and other knowledge projects

Hello! I just read in your blog post[1] that you will add a snapshot of Wikipedia to the vault. It's great!

When I learned about the GitHub Vault some time ago, I created two repos, Wiktionary[2] and Wikispecies[3]. They include XML dumps of those wiki sites (current page versions only) and some metadata (date, username of last edit, etc.).

Are you going to include in the wiki snapshot only Wikipedia? Only English? Full history or current versions only?

Other interesting projects which could be added, if there aren't issues (copyright or other), are: Project Gutenberg (60,000 books), Open Library (index of all known books) and Rosetta Project.

Thanks!

[1] https://github.blog/2020-02-03-the-arctic-code-vault-starts-production-and-your-open-source-projects-are-being-archived/
[2] https://github.com/emijrp/dictionaries-timecapsule
[3] https://github.com/emijrp/species-timecapsule

Localizing the Guide to different languages

While open source software is written by people from all over the world, "it is not guaranteed that the inheritors of this archive will know English," as mentioned by the guide.

The guide also serves as a great introduction to programming and data storage. Through translation, we can make it more accessible not only to the Vault's future handlers, but also to the general public interested in computer principles.

Volunteers

Language GitHub user
Bulgarian @sahwar
German @marcauberer
Romanian @vladfrangu
Simplified Chinese @xyx0826
Spanish @erubio0

Book about Data Compression?

I don't know how comprehensive the documentation of the XZ format included on every reel is, but if it is just the C source code of xzdec or a similar tool, it would be prudent to include a book about data compression concepts (Huffman coding, arithmetic coding, LZ77, etc.). While it should be possible to port any program from one language to another without understanding why it works, this is certainly easier with some background on what the program is doing.
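To illustrate the kind of background meant here, a minimal sketch of Huffman coding, one of the concepts listed above. It is not how XZ works (XZ relies on LZ77-style matching plus range coding, not Huffman codes); the function names are hypothetical, and a sorted array stands in for a priority queue to keep it short.

```javascript
// Build a Huffman code table for the characters of `text`:
// repeatedly merge the two least frequent nodes, then read codes off
// the resulting tree (left edge = "0", right edge = "1").
function huffmanCodes(text) {
  const freq = new Map();
  for (const ch of text) freq.set(ch, (freq.get(ch) || 0) + 1);

  // Leaf nodes: { weight, symbol }; internal nodes: { weight, left, right }.
  const nodes = [...freq].map(([symbol, weight]) => ({ weight, symbol }));
  if (nodes.length === 0) return new Map();
  if (nodes.length === 1) return new Map([[nodes[0].symbol, "0"]]);

  while (nodes.length > 1) {
    nodes.sort((a, b) => a.weight - b.weight);
    const [a, b] = nodes.splice(0, 2); // two lightest nodes
    nodes.push({ weight: a.weight + b.weight, left: a, right: b });
  }

  const codes = new Map();
  (function walk(node, prefix) {
    if (node.symbol !== undefined) {
      codes.set(node.symbol, prefix);
      return;
    }
    walk(node.left, prefix + "0");
    walk(node.right, prefix + "1");
  })(nodes[0], "");
  return codes;
}

// Concatenate the code of each character into one bit string.
function encode(text, codes) {
  return [...text].map((ch) => codes.get(ch)).join("");
}
```

For example, `huffmanCodes("aaaabbc")` gives the frequent symbol `a` a 1-bit code and `b` and `c` 2-bit codes, so the 7-character string encodes in 10 bits instead of the 14 a fixed 2-bit code would need.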

Separate Compilers, assembler, and operating systems

Compilers, assembler, and operating systems are three different topics.

Assembly should be part of Programming languages (see #73).

The other two should be split into separate topics:

Compilers and Interpreters

  • Lex & Yacc by John R. Levine, Tony Mason, Doug Brown (O'Reilly)
  • Compilers: Principles, Techniques, and Tools by Alfred V. Aho, Ravi Sethi, Jeffrey D. Ullman (Addison-Wesley)
  • Modern Compiler Implementation in C by Andrew W. Appel, Maia Ginsburg (Cambridge University Press)
  • Structure and Interpretation of Computer Programs, by Abelson, Sussman, and Sussman (MIT Press)

Operating Systems

  • Modern Operating Systems by Andrew S. Tanenbaum (Pearson PLC)
  • Operating Systems: Design and Implementation by Andrew S. Tanenbaum (Prentice Hall)

And maybe:

  • The Art of Unix Programming by Eric S. Raymond (Addison-Wesley, but also available under a Creative Commons license)

Theory of Computation and Complexity Theory

There are no books about computation models and complexity theory.

If there is no space for many books I suggest these four:

  • Introduction to the Theory of Computation by Sipser: an undergrad introduction to the Theory of Computation.
  • Computational Complexity: A Modern Approach by Arora and Barak: a more advanced (grad) book about Complexity Theory covering many theorems and topics.
  • The Annotated Turing: A Guided Tour Through Alan Turing's Historic Paper on Computability and the Turing Machine by Petzold: what the title says. It also gives some context about the paper and a little bit about Turing's life.
  • Computers and Intractability: A Guide to the Theory of NP-Completeness by Garey and Johnson: a catalog of NP-Complete problems.

All of them are self-contained with sections or appendices covering the basics. They could be added to the "Fundamentals of computing and the Internet" section or the "Algorithms and data structures" section.

End frame inconsistency

So, using the hypothetical CPython example, the item in this list with the ID 12345 might have a start frame of 054321, a start byte of 03210321, an end frame of of 054545, and an end byte of 12321232.

This specifies the end frame as 54545, but in the next paragraph:

Decode all frames from the start frame, 54321, to the end frame, 54544

54544 is referred to as the end frame.

Is this because the end frame value is exclusive, i.e. it is actually the frame after the last frame in the range? Or is this just a mistake in the guide?

If it's the former, an explanation in the guide to clarify the inconsistency might be necessary.
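A small sketch of why the distinction matters: under an exclusive reading, end frame 54545 means the last decoded frame is 54544 (224 frames); under an inclusive reading, 54545 is decoded too (225 frames). The function name and the `endIsExclusive` flag are hypothetical; the guide does not say which reading it intends, which is exactly the question above.

```javascript
// List the frames to decode for a start/end pair under either reading
// of "end frame". Frame numbers are from the guide's CPython example.
function framesToDecode(startFrame, endFrame, endIsExclusive) {
  const last = endIsExclusive ? endFrame - 1 : endFrame;
  const frames = [];
  for (let f = startFrame; f <= last; f++) frames.push(f);
  return frames;
}

const exclusive = framesToDecode(54321, 54545, true);
const inclusive = framesToDecode(54321, 54545, false);
console.log(exclusive[exclusive.length - 1], exclusive.length); // 54544 224
console.log(inclusive[inclusive.length - 1], inclusive.length); // 54545 225
```

An off-by-one here would either drop the final frame (and the end bytes of the item) or decode one frame too many, so the guide would ideally state the convention explicitly.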

Can we buy a single frame from a copy of the archive reel?

  • How much does it cost to produce a frame of the reel?
  • Can we buy one slice of it as a souvenir? Is there any plan for that?
  • Could the souvenir hold a short, uncompressed Hello World program, so that the buyer could use a light source and some kind of magnifier to read the code directly? That would be fun.
  • Is the manufacturing process patented? Could other manufacturers replicate it?

Request for disclaimer about private/sensitive info

This is a great project and I can understand its significance. However, I would like to point out that several people may (unknowingly) have private and/or sensitive info recorded in their repositories which may be archived. While this information is likely publicly accessible and may have been scraped by others, it would still serve a user well to be able to delete said info when they so choose. Now that this information may be archived, users may suddenly want to double check and/or opt-out.

Of course, GitHub's Privacy Policy clearly notes that:

If you choose to store any Sensitive Personal Information on our servers, you are responsible for complying with any regulatory controls regarding that data.

but it is very likely that people have not read the Privacy Policy and it would thus be helpful to add a small disclaimer that sensitive information may be present in a user's repository and they may want to review it. Again, I understand that this is the user's responsibility and they should have dealt with it in the first place, but a small nudge in the right direction would be immensely helpful.

Programming languages

The current list of programming languages is quite... arbitrary.

I suggest the following languages based on their historical importance and current popularity:

  1. Python
  2. Java
  3. JavaScript
  4. C
  5. LISP (*)

(*) "Lisp is worth learning for the profound enlightenment experience you will have when you finally get it; that experience will make you a better programmer for the rest of your days, even if you never actually use Lisp itself a lot." (Eric S. Raymond)

Other languages that could be mentioned:

  1. C#
  2. Swift
  3. Kotlin
  4. Go
  5. Rust
  6. Scala
  7. Lua
  8. Perl
  9. Julia
  10. Haskell

The tech tree does not leave enough information to rebuild civilization from collapse.

What follows, which we call the Tech Tree, is a selection of works intended to describe how the world makes and uses software today, as well as an overview of how computers work and the foundational technologies required to make and use computers

I don't think the current tech tree is nearly focused enough to archive "how computers work and the foundational technologies required to make and use computers".

Let's look at some examples.

Compilers, assembler, and operating systems

All three compiler books cover building a compiler for a low-level imperative language (essentially C). There is no book about compiling a functional programming language (SML/Haskell), a declarative language (Datalog), or a high-level dynamically typed language (Smalltalk). Furthermore, there is no book about SAT/SMT solving or garbage collection, both of which are foundations of compilers and programming languages (C compilers use graph coloring, an NP-complete problem, for register allocation).

Programming languages

All of the books only cover contemporary programming languages; there are zero books about how to design a language. How does one design a type system that ensures safety without sacrificing too much expressiveness, while still allowing efficient type checking and inference? How do you manage different kinds of effects (e.g. references, concurrency, probability, nondeterminism, exceptions)? How do you design a language so that it contains a few simple yet universal constructs, instead of accumulating lots of ad hoc constructs and quickly becoming overly complex (see Gedanken and Scheme)?

Scientific computing

Scientific computing workloads often involve:

  • lots of domain knowledge
  • numerical algorithms (e.g. finite element methods)
  • optimization for supercomputers

I didn't see any of them.

Machine learning

There are five books on deep learning, but none on Bayesian methods, probabilistic graphical models, symbolic methods, or classical machine learning.

Even if this is just for deep learning, it is still not enough: there is no mention of how to rebuild a deep learning framework.

Recursion Error

According to my calculations, RecursionError: maximum recursion depth exceeded

Concepts of time

In many places in this document phrases are used which relate to time or the passage of time. These concepts are artificial constructs that may be meaningless in the future.

For example (emphasis mine)

Most modern languages include libraries of pre-written functions, and such libraries can be very voluminous and elaborate. Some of today's most popular programming languages include:

C, one of the oldest and fastest

Replace implementation-specific titles

Replace the following titles:

  • Learning MySQL and MariaDB by Russell J. T. Dyer (O'Reilly)
  • PostgreSQL Development Essentials by Manpreet Kaur, Baji Shaik (Packt)

With a single title about data modeling and database design:

  • Database Design for Mere Mortals: A Hands-On Guide to Relational Database Design, Michael J. Hernandez (Addison-Wesley Professional)
