
bfeist / apollo17


Apollo 17 Mission Timeline Reconstruction

Home Page: http://benfeist.com

License: GNU Affero General Public License v3.0

Python 2.44% JavaScript 23.06% CSS 2.20% HTML 69.24% PHP 3.06%

apollo17's Introduction

This is a project to reconstruct the Apollo 17 mission timeline in order to generate digital, corrected transcripts of the entire mission. From these transcripts I hope to create an interactive web experience of the entire mission.

Website on this effort: http://benfeist.com/project-apollo-17/

Original post on this project: https://groups.google.com/forum/?fromgroups#!topic/spacelog/H7D4UiLfhPo

Feel free to contact me if you're interested in this project or the output. Ben Feist ([email protected])

"MISSION_DATA" folder: The main source of authoritative transcript information produced by this project is "A17 master TEC and PAO utterances.csv".

"! Previous Steps" folder contains files that were used for each step of the transcript digitization. The steps were:

  1. Convert source images (complete)
  2. OCR Technical Air-to-ground (TEC) transcript into dirty CSV tables (complete)
  3. OCR Public Affairs Office (PAO) transcript into dirty CSV tables (complete)
  4. Process CSV OCR output with Python scripts (multiple one-time operations to clean various issues) (complete) (more here http://benfeist.com/digitizing-apollo-17-part-5-python-processing/ )
  5. Reconstruct the entire mission timeline in Adobe Premiere, laying in air-to-ground audio from the Internet Archive and television video from the NASA History Office. (complete, pending the last 5% of source material being digitized by JSC Audio Lab) (more here http://benfeist.com/digitizing-apollo-17-part-6-timeline-reconstruction/ )
  6. Listen to the reconstructed timeline. Correct the transcript of each utterance, including timestamps, transcription errors from 1972, and OCR errors. (complete) (more here http://benfeist.com/digitizing-apollo-17-part-7-listening-in-real-time/ )
  7. Render all Premiere Pro video segments that were created for timecode purposes, and upload all 39 eight-hour segments (125 GB) to YouTube. YouTube channel containing these videos: https://www.youtube.com/channel/UC3pGYbJCfrINT1DNBJMxC2Q/videos
  8. Generate HTML transcript from corrected utterance CSV. (complete)

Future Steps:
  • Generate HTML for AFJ
  • Generate MC output for Spacelog.org

The "_Website/_webroot" folder contains the apollo17.org website itself.

The "! Previous Steps/OCR/Abbyy Image OCR" folder contains a project that can be opened using ABBYY FineReader 11 Pro. This is where the body of the conversion work is being done.

The "! Previous Steps/OCR/OCR_Output" folder contains pipe-delimited CSV files that are direct outputs from FineReader. These CSV files are quite dirty: as many as 100 pages were completely misread by FineReader because the typing is tilted in the scans.
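As a rough illustration of working with pipe-delimited output like this, the sketch below reads rows with Python's standard `csv` module. The `timestamp|speaker|text` column layout, the `CC` speaker label, and the `read_pipe_rows` helper are assumptions made for the example; they are not taken from the actual files or scripts in this repository.

```python
import csv
import io


def read_pipe_rows(handle):
    """Yield each row of a pipe-delimited file as a list of stripped fields."""
    # QUOTE_NONE treats stray quote characters from the OCR as ordinary text.
    reader = csv.reader(handle, delimiter="|", quoting=csv.QUOTE_NONE)
    for row in reader:
        yield [field.strip() for field in row]


# Hypothetical sample row in an assumed timestamp|speaker|text layout.
sample = io.StringIO("156:10:51|CC|Command Module VHP off\n")
rows = list(read_pipe_rows(sample))
print(rows)
```

Disabling quote handling is a deliberate choice for dirty input: an unbalanced quote in an OCR'd line would otherwise swallow the following fields or lines.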

The "Processing_Scripts" folder contains Python scripts that were written to scrub the OCR CSV output. These scripts are changed often to assist with whatever portion of the cleaning process is currently being addressed. They perform tasks such as timestamp processing, checking of callsigns, merging dialog lines that are split across pages in the typewritten originals, etc.
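To give a flavour of what "timestamp processing" can involve, here is a minimal sketch that normalizes a Ground Elapsed Time string to a zero-padded `HHH:MM:SS` form and rejects values that do not parse. The `normalize_get` function and its rules are hypothetical and only illustrate the idea; the real scripts in "Processing_Scripts" may work quite differently.

```python
import re

# One to three hour digits, then two-digit minutes and seconds.
GET_PATTERN = re.compile(r"^(\d{1,3}):(\d{2}):(\d{2})$")


def normalize_get(raw):
    """Return a zero-padded HHH:MM:SS string, or None if raw is not a valid GET."""
    match = GET_PATTERN.match(raw.strip())
    if match is None:
        return None
    hours, minutes, seconds = (int(part) for part in match.groups())
    if minutes > 59 or seconds > 59:
        return None
    return f"{hours:03d}:{minutes:02d}:{seconds:02d}"


print(normalize_get("24:12:13"))  # → 024:12:13
print(normalize_get("47:15s"))    # → None
```

Returning `None` instead of raising keeps a cleaning pass going over a whole file, so that suspicious rows can be collected and reviewed by hand afterwards.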

The "_MC_Output" folder contains the output from scripts like makeTEC_MCFromRawCSV.py in the "Processing_Scripts" folder. This "MC" output is a format that's usable by Spacelog.org.

The "_AFJ" folder contains the output of all transcript data into HTML compatible with the Apollo Flight Journal.

apollo17's People

Contributors

bfeist, tedtedson

apollo17's Issues

Feature Request: Input GET

It'd actually be kind of cool to be able to input the Ground Elapsed Time (GET) on the front page to get to a particular moment in the stream.

This would be useful to anyone who, like me, only has time to watch occasionally but wants to see everything and takes notes of where they left off.

Apart from that it could be used to share funny moments, like collecting up the funniest time codes regarding the on-board scissors or quotes that would make good samples for music or whatever.
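One possible building block for a feature like this, sketched in Python purely for illustration (the site itself is JavaScript, and nothing here reflects its actual code): convert a typed GET into a number of seconds to seek to, and back again for display or sharing. The names `get_to_seconds` and `seconds_to_get` are hypothetical, and pre-launch (negative) times are not handled.

```python
def get_to_seconds(get):
    """Convert an 'HHH:MM:SS' Ground Elapsed Time string to total seconds."""
    parts = get.strip().split(":")
    if len(parts) != 3:
        raise ValueError(f"expected HHH:MM:SS, got {get!r}")
    hours, minutes, seconds = (int(part) for part in parts)
    if not (0 <= minutes < 60 and 0 <= seconds < 60):
        raise ValueError(f"minutes or seconds out of range in {get!r}")
    return hours * 3600 + minutes * 60 + seconds


def seconds_to_get(total):
    """Convert total seconds back to a zero-padded 'HHH:MM:SS' string."""
    hours, remainder = divmod(total, 3600)
    minutes, seconds = divmod(remainder, 60)
    return f"{hours:03d}:{minutes:02d}:{seconds:02d}"


print(get_to_seconds("069:33:00"))  # → 250380
print(seconds_to_get(250380))       # → 069:33:00
```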

Some more possible corrections

I'll try to heave these in in larger batches, even at the risk that someone else caught these. Been kinda busy, might have missed a bunch, the usual disclaimers ;)

229:15:11 the news reading; tut the report -> "but the report"
president Kixon -> "president Nixon"
assignment nay be -> "assignment may be"
John Scaii -> (sounds like) "John Scali" and also A3C -> ABC
. a format -> ". A formal"
Major Joe Dennis -> (I'm sure he said) "Mayor Joe Dennis"
Mrs Kart is currently -> "Mrs Hart is currently"
(Also if the plane type is spelled "single-engine" in English, there's a singleengine)

229:28:20 reao -> "read" but audio's missing for that.
229:29:05 catsup -> "ketchup", right? Or are both correct spellings? Also, gravy5 chocolate bar seems like a typo

229:52:01 then and then. -> "them"
229:55:53 trie -> "the" (there could be more of these, seems like it could be a common one)

232:15:53 awhile -> I think it's "a while" in English but not sure

233:21:30 interestea -> "interested"

233:29:29 you-do -> That dash probably shouldn't be there

233:32:04 alor.g -> "along", indisinguishable . -> "indisinguishable." (without an extra space)

233:34:07 lateral or echelon structure I could swear he says "on" but I'm not a geologist nor is this my first language ;)

233:46:45 47:15s the "s" is probably a mistake

234:01:09 Looks like the font reads R-l but he says "R-1"

235:39:01 Gordy. is -> "Gordy. Is"

236:22:29 it's path back -> "its path back"

236:57:14, 236:57:59 call sign/name/whatever missing

236:59:21 lunan -> "lunar"

237:00:29 Sure he says "DELTA-VGx" or something, and "DELTA-Vc"

237:02:07 far aide -> "far side"

237:04:47 ana apparent -> "and apparent"

237:10:08 point doesn't he say "pointing"?

237:15:45 sounds more like Evans than Schmitt

Potential breakage

Noticed the lines starting at 43:30:40 don't seem to be in sync; it's followed by 43:36:28 and then 43:31:47.

Also, not sure if this is too nitpicky, but "ROOSA" looks odd in all caps there.
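A check for this kind of out-of-sequence line could look roughly like the sketch below, which flags any timestamp that is earlier than the one before it. It assumes the timestamps have already been pulled out of the transcript into a list in display order; `out_of_order` and `get_to_seconds` are hypothetical helpers, not part of the project's scripts.

```python
def get_to_seconds(get):
    """Convert an 'HH:MM:SS' or 'HHH:MM:SS' string to total seconds."""
    hours, minutes, seconds = (int(part) for part in get.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def out_of_order(timestamps):
    """Return (index, timestamp) pairs that are earlier than the entry before them."""
    flagged = []
    for index in range(1, len(timestamps)):
        previous = get_to_seconds(timestamps[index - 1])
        current = get_to_seconds(timestamps[index])
        if current < previous:
            flagged.append((index, timestamps[index]))
    return flagged


# The three timestamps reported in this issue, in the order they appear.
print(out_of_order(["43:30:40", "43:36:28", "43:31:47"]))
# → [(2, '43:31:47')]
```

Note that this flags the line after the jump; in the case above the real culprit may be 43:36:28 itself, so flagged rows are best treated as pointers for manual review.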

Strange echoes?

Haven't got past circa 121:48 yet, but noticed there's duplication in the soundtrack there.

Is that in the original material, some kind of echoing effect, or an edit glitch?

Cheers!

Duplicate timestamp

Noticed this at 086:27:42, there are two lines with the same time stamp. In the transcript Mission Control asks to toggle the UV and then asks to copy a voice check, but the second one isn't heard until 086:32:32 or so.

Wish I had the hard drive capacity currently to do a git clone, maybe I can do it later on other hardware if I have time, but it should be almost a Python one-liner to check for duplicates in the transcript data. Is this possible for you to do?

Thanks!
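For reference, the duplicate check described above really is close to a one-liner with `collections.Counter`. This is a minimal sketch that assumes the timestamps have already been extracted into a list of strings; how they are read from the transcript CSV is left out because its column layout is not described here, and `duplicate_timestamps` is a hypothetical name.

```python
from collections import Counter


def duplicate_timestamps(timestamps):
    """Return the sorted timestamps that occur more than once."""
    counts = Counter(timestamps)
    return sorted(stamp for stamp, count in counts.items() if count > 1)


print(duplicate_timestamps(["086:27:42", "086:27:42", "086:32:32"]))
# → ['086:27:42']
```

Not every repeated timestamp is necessarily an error, since two utterances can fall within the same second, so the result is a list of candidates to listen to rather than a list of bugs.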

Last ones

Been very very hectic lately for me but I finally got through to the end, and boy was it great :D

Hope I didn't miss anything, I was taking very quick notes for a long time and then had the fortune of watching the recovery almost in one go, but here's something for you to check:

301:10:01 BUSS/dumping slash character correct?

301:18:15 the comma

301:38:23 pago pago -> pango pango

303:08:25 misplaced, on purpose? after the 203:43:06 bunch

even without audio 303:07:25 fi ne -> "fine"

303:36:40 and others, a stereo mix would be nicer than overlap/cutoff but these were there before and I guess not doable?

303:47:32 wasn't there another change like "TAPE RECORDER's" with a lowercase s, because only the certain nouns are in all caps?

303:45:15 "6-1/2 g's" I think most of the "six and a half" style utterances have been without dashes

304:01:54 I see it! it's -> "I see it! It's"

304:07:19 amont -> "among"

304:12:23 it's -> "It's"

304:15:01 R-1 and R-2 are correct in this instance? The call signs do get a bit random later

304:19:01 Tyconderoga -> "Ticonderoga"

304:24:21 is that on time? -> "Is that on time?"

304:33:07 Okay,(garble) would be clearer with a space after the comma

304:33:10 (garble) -> "(Garble)" there's some inconsistencies with this later on you may want to check out, maybe change 304:32:45 to lowercase for less effort

304:33:41 Chopper -> "chopper" (not a name, right?)

304:35:25 and 304:35:39 3-1/2 miles -> "3 1/2 miles" (unless most are with dashes but then the others may want attention)

304:36:34 Does SPKR whoever actually end with "keep cranking the valve"? Does that make sense?

304:38:59 Not sure what's going on here, it's as if there's (garble) missing there

304:46:12 collar's -> "collar is"

304:49:49 GARBLE -> (garble)

304:56:45 or so, the Public Affairs transcript is missing

305:34:33 is misplaced between 304:56:38 and 304:57:21

304:57:21 motor whale -> "motor whale boat"

305:15:22 Astronauts -> "astronauts"

305:22:11 was -> "is" as he says

305:24:56 placque -> "plaque", right? Not sure about the spelling, but this was inconsistent earlier

305:32:55 Another inconsistent 6-1/2 years ago

205:35:24 After "stiff-upper-liping it" he says "with talk of Skylab" but that's not in the transcript

Thanks a million for creating this show!

Wonder if it caught the eye of Cernan and Schmitt and whoever may still be alive who contributed to the mission :)

Timeline links

Unfortunately I can view this only in the background at work and sometimes I have to skip around to stay even remotely in sync (as I can't sync my life to their rest period) but I noticed Evans and his scissors 069:33:00 popped up at some point.

Now I'm getting to the ALFMED part, noticing that if I hit the 068:03:55 image, it skips forward to Evans and his scissors again.

Maybe it's about their clock time update, or maybe it's something else, but it would be nice seeing the pictures :)

Audio loss and duplicate timestamps

287:32:08 it's ultraviolet spectrometer -> "its"

287:53:02 simply -simply -> "simply - simply", formatting like all the others

288:05:51 VERB 49, -> "VERB 49."

The background static dropped off every once in a while, probably due to the source
material and I haven't really reported these, but one occurred after 289:35:31,
and it didn't recover at 289:56:02 probably because of the overlapping utterances.

The last 289:56:02 news , -> "news, "

Similarly broken at 290:02:30, no audio, but Mission Control's broken with you If,
though I can't say if there should be a period or comma there.

I think I'll give this some pause, it looks like a ton of trouble ahead with duplicate GET codes :/

(I did refresh real hard to make sure this isn't a known fixed issue, but let me know if there's
an issue on my side)

Typo in timecode

Not sure where to go looking for this exactly, but I made a note that timecode 24:12:13 says "burnped" when it should say "bumped" - an OCR error.

Love the work, btw :)

VHF OCR mistake

156:10:51 reads "Command Module VHP off" though Mission Control clearly says VHF :)

New this week

281:43:07 RET 9OK he says RET 90K

284:23:10 into space If I think that could be "into space, if" or maybe "into space if"

284:23:43 almost as if isn't is added in the transcript, and the rest would read truly indeed has really been a beginning

284:30:06 doesn't he say "as submitted"? as missing from the transcript

284:30:28 Looks like the "I uhh" is cleaned up, fine, but vex-y is not right. Looks like transcribed "very" but he might say "fairly", not sure I picked that up right

284:31:11 Evans says "all of the vehicles", transcript says all the vehicles

287:02:26 FAN 2 -> "H_2 FAN 2"

Been the busiest week ever, don't think I covered a lot of hours at all :D Let's see how long it takes for me to watch through the last 20 or so hours I have remaining, probably weeks ;)

.small ones

Hi!

Noticed 166:55:42 transcribed as ".small ones", with an extra period there. You have this in the develop branch under Corrections from others so did you get this from someone? The Excel files don't seem to contain it.

At least two new transcript errors

Hi!

110:23:28 "Epic moment of ray life." ... "my life"

114:37:43 "look axound a little more" ... "around"

And I gotta say it again, it's incredible how exciting watching this can be :)

Silent video

The video is silent roughly from 184:17:30 to 186:01:44. I did not listen through all the silence, so I don't know if it's continuously like that or not.

The transcript also seems to glitch a bit at 184:15:11.

Fortunately not a whole lot seems to be happening there and maybe skipping to launch is a good thing because of all the vacation/holiday-long breaks I've had to have in watching this ;)

Question about overlapping sound/transcript

112:36 and onwards for example, as a general question, there have been Public Affairs utterances that go over something in the transcript, and then that talk kicks in. Is this in the original material and your choice to have only one soundtrack or is this configurable somewhere or what gives?

Thanks!

Audio discrepancy during the telescope video

Found one more, fortunately, though this one's pretty obvious so you may have got it elsewhere.

Watching at around 106:09 there's talk that's transcribed at 106:54:49.

How can that be, as it's repeated at 106:54? Is there an issue with the soundtrack on the Youtube video?

Updates again

Been really slow progress lately, seems that the real-time stream will soon be in sync with me :D Maybe some of these haven't been reported by anyone else:

237:48:47 Tranquillity -> "Tranquility". This is something there could be more of. Wikipedia confirms the spelling is "Tranquillitatis" for Latin, "Tranquility" for English

237:22:04 snow up -> "show up"

237:25:06 Symthii -> "Smythii"

237:28:49 taKe -> "take"

238:01:41 Negative, But that should probably be but (he goes too fast for the comma to be a period IMO)

238:02:15 Communications; Officer -> "Communications Officer"

238:08:55 annulus - =the -> "annulus - the"

239:15:02 it's sleep -> "its sleep"

243:07:41 higjh gain -> "high gain"

243:08:47 SC were you correcting these or do I remember wrong?

249:37:01 nautical miles: from Earth. -> "nautical miles from Earth"

251:17:45 It's net really -> "It's not really"

251:58:11 ve're working -> "we're working"

252:30:01 CMPs -> "CMP's" (right?)

252:37:29 Borth Vietnam -> "North Vietnam", thi.-President I'm quite sure he said "the President", supervisee -> "supervised"

At 254:54:39 the transcript didn't move forwards, similar at 255:02 - there was more of this around Evans' EVA

254:55:40 Pretty sure it should read Once you've

255:32:48 .Okay -> "Okay"

255:43:13 in w suit -> "in the suit" (unless this is Polish or something ;)

256:28:36 loud and clear; -> "loud and clear."

257:00:16 Okay. is -> "Okay. Is"

258:05:21 . is there -> ". Is there"

New list

Been silly busy at work, but I'll just leave these here...

263:10:34 presumatly -> "presumably"

263:23:24 about one-half foot doesn't he say "about half a foot per second"

263:38:59 right.now. -> "right now."

264:37:02 the data, replayed imo he says it without the comma, what do you think?

265:17:54 erase-able is that correct English? I'd go without the dash/hyphen, and
in effect, make a decision is also something I'm not sure about, being non-native and all

267:36:25 BAT I'm sure they talked about "BATT" before, but this (like some of the above)
might not be an OCR mistake, just another typist doing things differently.

274:58:02 wakeup -> "wakeup."

276:28:01 -it's convenient -> "- it's convenient" (space after the dash)

280:05:53 SC again?

280:25:28 is that -> "Is that" (new sentence)
