softwareunderground / 52things

52 Things You Should Know About Geocomputing

52things's Issues

Quality Checking your Spatial Data - review (chapter 28, Hassan Sabirin)

This chapter is an important reminder of why you should check your co-ordinate system. It is also a helpful reminder of the many small, discrete ways these errors can creep into your database.

It is 837 words long. I am unsure of the word limit for this book, but if the chapter needs shortening, some of the introductory context could be trimmed to lead the reader more directly to the warnings about incorrect co-ordinate systems. My suggestions below may add more lines of code, so some text may have to be cut.

Main review points:

It would benefit from copy-editing and formatting
I am unsure of the intended audience for the book (in terms of experience), but some of the acronyms and steps in the code comments could be explained more fully (e.g. say what "srs" means and, at the end of code block 2, give some examples of the code the author has in mind).
"The next thing to do is to adapt this measurement method in a while or for loop and run it across all rows in your well header database (perhaps by using ODBC/SQL calls)."

For the above section from the chapter, it would be really helpful to have an example code block to show what the author intends - having a start to finish workflow is so helpful for developers new to this kind of thing.
I didn't have the contents of the table to refer to for this review, but I ran the code in a Jupyter notebook. There are a number of syntax errors, which I have attempted to rectify below, but the author should double-check these.

In addition, Code Block 1 as shown below produced some FutureWarning and deprecation warnings; it would be beneficial if the author could update the calls to the current functionality.

Code Block 1: suggested syntax corrections

import pyproj

# Initialise the spatial reference systems (SRS)
w84UTM50N_srs = pyproj.Proj("+init=EPSG:32650")

# Without preserve_units=True, all results will be returned in metres
TimbBRSO_ft_srs = pyproj.Proj("+init=EPSG:29872", preserve_units=True)

# These are the original/stored coordinates in Timbalai BRSO (ft)
long = 114.0
lat = 5.0
x = 1900000.0
y = 2050000.0

# And this is the WGS84 UTM 50N coordinate
W84_x = 278000.0
W84_y = 626000.0

# Section A - projection and unprojection
# 'c_' indicates calculated values
(c_x, c_y) = TimbBRSO_ft_srs(long, lat)  # Timbalai BRSO geographic to projected coordinate
(c_long, c_lat) = TimbBRSO_ft_srs(x, y, inverse=True)  # Timbalai BRSO projected to geographic coordinate

# Section B - transformation between different datums/projections, here from WGS84 50N to Timbalai BRSO (ft)
(c_29872_x, c_29872_y) = pyproj.transform(w84UTM50N_srs, TimbBRSO_ft_srs, W84_x, W84_y)
# And the other way around
(c_32650_x, c_32650_y) = pyproj.transform(TimbBRSO_ft_srs, w84UTM50N_srs, x, y)
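
On the deprecation warnings: here is a minimal sketch of how Code Block 1 might look with pyproj's newer `CRS` and `Transformer` classes. This is my assumption about a suitable replacement, not the author's code. The function name `build_transformers` is hypothetical, and the results should be compared with the values from the original calls before anything is swapped in. As far as I understand, `Transformer` works in each CRS's own units, so the `preserve_units` flag should not be needed, but that is worth verifying.

```python
def build_transformers():
    """Return (utm_to_brso, brso_to_geographic) transformers.

    Hypothetical helper: a sketch of the Transformer-based API,
    not the chapter author's method.
    """
    # Imported inside the function so this sketch can still be loaded
    # on a machine where pyproj is not installed.
    from pyproj import CRS, Transformer

    wgs84_utm50n = CRS.from_epsg(32650)      # WGS84 UTM 50N, as in Code Block 1
    timbalai_brso_ft = CRS.from_epsg(29872)  # Timbalai BRSO (ft), as in Code Block 1

    # always_xy=True keeps the (x, y) / (longitude, latitude) argument
    # order that Code Block 1 already uses.
    utm_to_brso = Transformer.from_crs(
        wgs84_utm50n, timbalai_brso_ft, always_xy=True
    )

    # Projected <-> geographic on the same datum: geodetic_crs is the
    # geographic CRS that the projected CRS is built on.
    brso_to_geographic = Transformer.from_crs(
        timbalai_brso_ft, timbalai_brso_ft.geodetic_crs, always_xy=True
    )
    return utm_to_brso, brso_to_geographic
```

With these, `utm_to_brso.transform(W84_x, W84_y)` would stand in for the first `pyproj.transform` call and `utm_to_brso.transform(x, y, direction="INVERSE")` for the second. Likewise `brso_to_geographic.transform(x, y)` would replace the `inverse=True` call, and `brso_to_geographic.transform(long, lat, direction="INVERSE")` the forward projection.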

Code Block 2: suggested syntax corrections

#Here we calculate the abs difference between stored and calculated values.
#Refer to variables defined in Section A
abs_diff_x = abs(c_x - x)  # calculate absolute difference
abs_diff_y = abs(c_y - y)

# You can also calculate the offset using Pythagoras, sqrt(dX*dX + dY*dY), and then its magnitude, log10(offset)
# Insert your own code here to do something when the difference in values exceeds a certain value or magnitude
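
To illustrate the kind of start-to-finish example I have in mind for the "run it across all rows" step, here is a rough sketch. Everything in it is an assumption on my part: the column names (`well_id`, `long`, `lat`, `x`, `y`), the `tolerance` parameter, and the idea of passing the projection in as a callable are hypothetical, and the rows could equally come from an ODBC/SQL cursor. The author will know better what fits the chapter.

```python
import math


def coordinate_offset(stored_xy, calculated_xy):
    """Straight-line (Pythagoras) distance between stored and calculated x, y."""
    dx = calculated_xy[0] - stored_xy[0]
    dy = calculated_xy[1] - stored_xy[1]
    return math.hypot(dx, dy)


def offset_magnitude(offset):
    """Order of magnitude (log10) of an offset; None when the offset is zero."""
    return math.log10(offset) if offset > 0 else None


def flag_suspect_rows(rows, project, tolerance):
    """Return (well_id, offset, magnitude) for rows whose offset exceeds tolerance.

    rows      : iterable of dicts with hypothetical keys
                'well_id', 'long', 'lat', 'x', 'y'
    project   : callable (long, lat) -> (x, y), e.g. the projection object
                from Section A of Code Block 1
    tolerance : largest acceptable offset, in the units of x and y
    """
    suspects = []
    for row in rows:
        calculated = project(row["long"], row["lat"])
        offset = coordinate_offset((row["x"], row["y"]), calculated)
        if offset > tolerance:
            suspects.append((row["well_id"], offset, offset_magnitude(offset)))
    return suspects
```

Something like `flag_suspect_rows(rows, TimbBRSO_ft_srs, tolerance)` would then give a list of wells to inspect by hand, with the magnitude hinting at the type of error (a few units versus thousands).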

Dinneen: SEG-Y

I like this. The analogy is quite apt. (Using the 'edition notice' or copyright page might be more accurate, strictly speaking, but the 'judge a book by its cover' thing works better.) The message is straightforward -- it's good, solid, memorable advice. Classic 52 Things :)

line 5, semi-colons to commas
SEG-Y consistent throughout
line 9, passive to active
Bullets to single lines.
Suggested final sentence (felt a little chastising):

We all judge books by their covers. Let's make SEG-Y files people want to read.

Figure

Is probably fine as-is. Worst-case scenario: we OCR it and set it in text.

Turn data into colours

This is a fantastic chapter; I could not help reading it, given my obsession with colours.

I fixed the link to the figure already as it was out of date.

A must read; great review of a lot of things that go on in the background when we tinker with colour. I like the light, fun style, and the narrative form of evolving from simple to complex really works. I like the reference to 'objects'.

The only thing I wonder: can you label (e.g. A, B, C, D) the four panels in the figure and reference them in the text as you go (or just reference them as first from top etcetera)?

Teaching students to code - Jan Niederau

This looks solid.

Comparing the title to the content, it is not entirely clear to me that the thrust is to teach students to code, rather than just the use of notebooks as a teaching tool for the subject being taught. What is described in many cases (using widgets and such for interactivity) would not need to be coded by students themselves. I acknowledge that it would be somewhat strange to use notebooks without coding them, and there are references to writing some, but the title felt off somehow.

Highlighting the use of automatic marking and similar things is valuable though, and certainly the use of notebooks as a teaching tool is a really good use case for them, especially with interactive widgets.

Ennen: Software, Software Everywhere

I liked this one a lot.

I wonder if we are expecting things to change or if the list of software mentioned is anticipated to be stable? A quick look around suggests that most of these have been around for a while, so are likely to be around for a decent time.

Geologists also do a lot more than look for oil and gas, so it might need a qualification that that is the problem space being discussed. (Since there is probably not space to start discussing geochemical packages and the various modelling software used on mines and such.)

Something that might help would be linking things back to the first few tasks mentioned. It feels like that is something that will frame the entire discussion, but then there is a long digression into seismic processing that does not seem to intuitively fit into the processes suggested as being handled by Petra and Geographix.

Is it worth adding Python/SciPy alongside MATLAB, Mathematica and Maple? I certainly see and use it more, but that might be my own echo chamber.

Quantum GIS has rebranded, and is simply QGIS now.

I enjoyed this essay, and the advice on learning a new package seems valuable and spot on to me. Thanks for writing this.

Digitalization, from harry nyquist.... (chapter 16) review

What to do in such circumstances is part of ‘edge computing,’ the subject of this month’s lead.

Not sure what this bit is doing here. It seems this was taken from a monthly publication and not edited accordingly.

Dramsch: General Purpose GPU Programming

Nice!

I wonder if giving some values for the anticipated speed-up might be worthwhile, or is this changing too quickly for that to be useful?

Otherwise, I thought this was well-written, and a really good overview of the benefits of using a GPU. It also pointed out the difficulties nicely. I do not really have too much more to say.

Keep on improving your geocomputing projects (chapter 39) reviewer wanted

I can use another set of eyes on this chapter's draft:
https://github.com/softwareunderground/52things/blob/master/chapters/Niccoli_2.md

My eyes are tired of re-reading it

Rusic: I hate computers

I like Alberto's chapters a lot. They were also among the first I received, after doing my writing class at RSI. I love reading stories like this, telling how far we've come as individuals, and also as a discipline.

I really like the title, but I'm also not sure it quite fits. Maybe part 2 could be "I love computers"... but Bitter sweet complexity is nice too.
Replaced the line about the Schrödinger equation, which AFAIK we can solve numerically, with something about simulating organisms, which seemed to me to preserve the point.

We can even imagine the possibility of simulating a complex organism, allowing us to delve deeper into the secrets of life.

Language throughout, nothing major.

The phoenix - review (chapter 47 , Ágoston Sasvári)

This chapter will need copy editing for little things like relative pronouns etcetera.

But it is a solid chapter, an interesting and enjoyable read. It makes a good story.

Reviewing_submissions.md

I was interested to see that you didn't promote pull request reviews as an initial chance to jump into the reviewing process.

@lheagy and I have found that pretty interesting in the geosci sphere when getting a bunch of authors to collaborate on textbook chapters. Might be something worth exploring over here as well?

Example:
geoscixyz/em#480

Arm-wavers Anonymous

@jessepisel I reviewed your chapter.

Nice read, clear and interesting.

I tried your code in a Jupyter notebook running Python 3.6 and it works.
The only thing that needs to be changed is the order of the last three lines.

As it is right now it will generate an extra figure at the end:

so it needs to be:

Storey: De profundis

Ok, this was really interesting. I had no idea that this was a thing that needed to be taken into account, so thanks for giving me that new insight.

I wonder about how strong the link to geocomputing is though. Is this something that computing can help with? Is there a dataset of common references that can be queried when starting a job? Is there a standard data format used following the points at the end?

I am really glad that I know about this as a problem now, but I feel like the geocomputing link (for this collection of essays) could be made a bit more explicit.

Seismic data encryption - review (chapter 10, Graham Ganssle)

@gganssle :

I've never done any encryption, but after reading Simon Singh's The Code Book I've developed quite the interest for it. This chapter was a fun read. I loved it. The level is perfect! And great that you enable readers by providing links to the tools.

One thing needs to be fixed: the link for Diskcryptor (third link) is old or broken. Is there a new one, or alternative?
The GnuPG link was also incorrect, but I have already fixed it.

At 552 words the length is right, but the chapter has 2 figures, so I will leave the final say to the expert (@kwinkunks).

Software challenges in oil & gas - review (chapter 18, Bill Menger)

This chapter sits at 1243 words. Way too long.

It looks interesting, so I will read and review it anyway and add in a separate comment.

BUT: it will need to either be significantly rewritten, split in half, or not included.

Hale: My favourite 10-line program

I really like this chapter. IIRC, it was the first submission. And of course we're lucky to have something from Dave Hale.

Only drawback is that the code takes quite some parsing if you're not used to C++. It would be interesting to see a pure Python port (one would approach it in a completely different way, at least in NumPy, so it wouldn't be a fair comparison).

Changes

'awhile' to 'a while'
'in a one-line program' to 'in one line'
format var names as code (throughout)
hyphen to minus
space before '(black)'
no figure caption
delete remark about efficiency on line 28, concatenate paragraphs for space.
line 24: awkward mixture of code and math symbology
replace bullets in lines 34ff
cut at "One final tip"

Figures

Skip Figure 3 (discussion of edges and zero mean)
Will need recoloring, but can probably do this in GIMP. We'll keep the colour versions for online / e-book.

The blog reference will go in as a footnote.

Human neural networks in geocomputing (55)

The chapter is ready to go and only requires some copy editing.

Best Practices are not (always) the best approach

@mtb-za I have just re-read your Bentley 2 chapter.

Apart from perhaps two typos that I mentally registered but ignored, I have no recommendations for changes, just comments. But it is ready to go!

This is a nice chapter. Very readable.

I think it'll make a useful read both for someone that is coding for colleagues that have no programming experience, and for those colleagues (they too can be educated about some of these dilemmas).

I say this as someone who paid an undergraduate computer science student (with a few weekly pizzas and some Italian tutoring) to write Perl scripts to do just this kind of work, but on a scale of tens of thousands of proprietary files from 3C seismometers. This was circa 2002, when I knew very little CS and could not yet even write Matlab scripts. I had to be educated to some extent, at least enough to explain in the appendix of my Master's thesis what was going on.

Domain-driven design in geocomputing (chapter 3) review

I enjoyed reading this chapter. These topics used to be very much NOT on my radar, but of late I recognize signs of what you define as ubiquitous language, so it is really nice to hear about it in a formal context. The example of “well” is very relevant.

I did not look at grammar (though nothing jumped out) and to me this is ready to go.

A Geological Model is a Single Hypothesis - of Many Possible Ones (chapter 34) - review

Overall very well written and, save for some copyediting, I'd say this is pretty much ready for publication ✔️
I'd recommend putting the chapter in front of the GemPy chapter (33), as it introduces the need for stochastic modeling and talks about open-source geomodeling software.
It could be updated to mention explicitly the now-published GemPy paper when talking about open-source geomodeling tools in the second-to-last paragraph.

Is Geology Cartesian?

Great chapter.

@kwinkunks can you forward this to the authors?

Is it possible to label the figure and specify mesh type?
i.e. TL: Cartesian grid - TR: S grid? - BL: mesh grid? - BR: hybrid grid?

Dunnington: Grammar of graphics

This very nice essay is very well written and the length is spot on.

I made very few changes. There are quite a lot of parentheses, mostly justified, but I took one or two out. We have a house style for book names and refs so I transformed them.

Figure

We will possibly need to place the legend inside the plot area in order to fit this; we'll know at layout.

Am=d: A linear algebra approach to seismic modelling (chapter 4) - review

Chapter still seems relevant
It needs equations
It needs 52 things Geophysics in the References
Haase’s paper in the References is not cited in the text

I will test the code and report in a separate comment.

GemPy: 3D Geological Modelling in Python (chapter 33) - review

Paging @Leguark

Suggestions to improve the article

Update code snippets for latest (significantly different) GemPy 2.x version
Update the two plots with better figures.
- Figure 1 doesn't really convey much information. I suggest it can be cut down to one clear view of the data in 3-D. Maybe add a 2-D slice plot instead of the 3 different camera views.
- Figure 2 should show the model fault
- clean marching cubes "artifacts" (cut surfaces with fault)
- White plot background for printing?
In the second to last paragraph you switch from geomodeling to "geomodeling as an inference problem" without much explanation. I doubt most readers will know why automatically differentiable software is key for that and what that all even means. I'd recommend first just talking about stochastic simulations in general, and how running GemPy on GPUs can speed things up. Maybe then hint at the ML use cases afterwards. If you're running into the word limit, maybe cut all the references to the widely known open-source packages used in GemPy, which are probably out of scope for such a short article.

I'd be happy to make those changes if you'd like 😄

In Praise of Small Tools, or a short Ode to the CommandLine (chapter 1) review

The last "problem" in the first sentence does not sound right.
"approach" perhaps?

Given the huge scope of geological problems, it is not reasonable to assume that all problems can be solved using large, monolithic problems.

Other than that, I like the chapter.
Is it worth mentioning something like cookiecutter?
https://github.com/cookiecutter/cookiecutter

Thurmond: The tyranny of formats

A really nice chapter -- great connections from everyday problems to more specialist or aspirational challenges, and back to the everyday. I did not do much:

light copy-editing throughout
kept most important links

@grajohnt — I wonder if it's worth mentioning the most important patterns for this kind of work (say, text processing, regular expressions, and parsers)? I don't know, but if you feel like there's an easy way to add a couple of sentences of advice for someone trying to code up a data translator, let's include them. What do you think?

Some advice on reproducing figures - review wanted

Why use virtual outcrop? (chapter 35) - review

Very interesting chapter, loved reading it.
Ready to go (apart from grammar review but nothing stood out).

The Steady Advance of Linux - review (chapter 17, Bill Menger)

Disclaimer: I say this with a mixture of silly pride and sheepishness at my stubbornness and my blind corner: I do not get Linux, and likely I will never get it. I tried...

Having said that, I enjoyed reading this chapter a ton!

Question: with 3 years since this was added, and 5 from original publication, has anything further changed significantly that it is worth adding? With the chapter being at 513 words there's definitely room for another paragraph or two. I will ask the author.

And two very small issues:

on line 4 I see a repetition of sorts in "put this to work at work!"
similarly, on line 7 I think "With a little work, Ingres worked" can be rewritten

In Praise of Small Tools, or a short Ode to the CommandLine - error in Pseudo code

The first line of code in this article contains an error:
cat data.csv | grep i 'sio2' > silica.csv

This should be (note the dash in the grep flag):
cat data.csv | grep -i 'sio2' > silica.csv
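
For readers more at home in Python than in the shell, here is a rough sketch of what the `-i` flag does in the corrected command: it keeps every line that contains the pattern, ignoring case. The function name `grep_i` is hypothetical, and unlike grep it treats the pattern as plain text rather than a regular expression.

```python
def grep_i(lines, pattern):
    """Keep the lines that contain pattern, ignoring case (like grep -i).

    Sketch only: the pattern is matched as literal text, not as a regex.
    """
    needle = pattern.lower()
    return [line for line in lines if needle in line.lower()]
```

Applied to the lines of data.csv, `grep_i(lines, "sio2")` would keep rows labelled "SiO2", "sio2" or "SIO2" alike. Without the dash, grep instead reads `i` as the pattern and `'sio2'` as a file name.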

Open source geostatistical modeling - review (chapter 23, Michael Pyrcz)

Good length, good chapter, I enjoyed reading it.
Informative, relevant. Ready for copy editing.

Reproducible research - review (chapter 9, Sergey Fomel)

Fantastic read.

720 words but it is acceptable.

I did not know the etymology of "Reproducible research" was tied to Claerbout. The whole thing is very interesting. A great way to promote both reproducible research and Madagascar!

The body of the chapter is probably ready for layout.

I could not track the first reference down ( Claerbout, JF (1991). Electronic document preface, in SEP-72, Stanford Exploration Project, p 1–18. )

R, RStudio, and the tidyverse for Geocomputing (chapter 36) - review

As a reader: I like this chapter very much. I resist picking up R in my scientific computing work, so this did strike a chord in a positive way. I now have RStudio on my desktop! Baby steps. I also agree with your “do-first, learn-the-details afterward”, as that is my approach too.

As a reviewer, just two comments:

“typing code in Python” seems a bit underwhelming when, in the section below that, you write about “data analysis, generating documents/figures, and building software tools”.
With respect to sections 1 and 3, is it still the case in 2020 that Python, Jupyter, PyCharm, etc. are behind? I do not know; you have more experience.

Learn Javascript!

Review coming soon by @mycarta

Prototype colourmaps for fault interpretation

Reviews welcome

What's so special about geoscience? - Review (chapter 13, Matt Hall)

I really liked this chapter Matt.
It is exciting, witty as usual (I love "how many cobras does a honey badger eat at one sitting?"), interesting. Well written as usual.

Reference is correct.

But:
it is 840 words long, can it work? If not, do you want to go back and trim it?

Would you change anything in view of last couple of years of even more accelerated transformation in your personal experience with Agile, and outside?

Getting started in HPC in 3 easy steps

Review needed for #38

Crossplots on the boardroom table (14)

A nice read for anyone crossing domains

Suggestions

I know Eirik is no longer 44 so this needs an update :)
Could benefit from a figure detailing the "stack" or difference between classification and regression
At 950 words it's too long, and I think there are a few easy cuts by tidying up the English, especially where Eirik goes into great detail about E&P.
The cross plots on the boardroom table are probably introduced a little too late in the blog and the message gets lost about switching to geocomputing.
Paragraphs beginning at lines 13 and 15 are very similar and could either be reduced or merged.

The Virtual Geoscience Revolution

Hi @bsburnham
Review coming soon

Simple Machine Learning - review (chapter 21, Didi Ooi)

Overall nicely written chapter. I like the style, the structure, and the objectives, which I think are met.
However, I have a few comments on specifics of Machine Learning; see below, organized by section. I may call on others to help out. Ultimately it may need further work from the author.

1. Understand each variable independently
About determining the normality: I recently had an in-depth discussion with a friend (a statistician) about this because I was confused by contradictory recommendations in this regard. He assured me there are no distributional assumptions on the predictors, only on the dependent variable, so this needs to be clarified.

2. Feature engineering
All good

3. Understand bivariate relationship
All good

4. Exploit multivariate patterns
I would not use only PCA. I would consider suggesting multiple methods to explore multivariate relationships / variable importance, ideally a combination of model-based methods and some that are not model-based, and decide based on majority vote (the variables most methods agree upon).
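
To make the majority-vote idea concrete, here is a small sketch. It assumes each method has already produced its own shortlist of important variables; the function name `majority_vote` and the `min_votes` parameter are hypothetical, and the threshold of "more than half of the methods" is just one reasonable choice.

```python
from collections import Counter


def majority_vote(selections, min_votes=None):
    """Return the variables chosen by at least min_votes methods.

    selections : dict mapping a method name to the variables it selected
    min_votes  : votes needed to keep a variable; defaults to a strict
                 majority of the methods (an assumption, adjust as needed)
    """
    if min_votes is None:
        min_votes = len(selections) // 2 + 1
    votes = Counter()
    for chosen in selections.values():
        votes.update(set(chosen))  # set(): one vote per method per variable
    return sorted(name for name, count in votes.items() if count >= min_votes)
```

The shortlists themselves would come from whatever methods are chosen (PCA loadings, a model-based importance score, a model-free measure, and so on); the vote only combines them.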

5. Train your Machine Learning model
Here we have a recommendation for an 80/20 training / validation split. This needs to be clarified on two levels:

the terminology. It is unclear to me what the author means by validation (for terminology I try to stick to Sebastian Raschka's, see diagram below):

If the intended meaning is just an 80/20 train/test split, like in the first row of the diagram, then it may be OK, although 80/20 is seldom a good generic split. I could be wrong, but I have a sense the author may be referring to the second row because she mentions training competitive models, in which case this approach would be incorrect. It certainly needs to be clarified.
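
For what it is worth, this is the distinction as I understand it, sketched in plain Python: when several models are being compared, the data is split three ways, so that the validation set is used to choose between models and the test set is touched only once at the end. The function name and the 60/20/20 fractions are placeholders of mine, not a recommendation.

```python
import random


def three_way_split(samples, train=0.6, validation=0.2, seed=0):
    """Shuffle samples and split them into train, validation and test lists.

    The fractions are placeholders; whatever is left after train and
    validation becomes the test set.
    """
    if not 0 < train + validation < 1:
        raise ValueError("train + validation must leave room for a test set")
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)  # seeded so the split is repeatable
    n_train = round(len(shuffled) * train)
    n_validation = round(len(shuffled) * validation)
    return (
        shuffled[:n_train],                          # fit each candidate model
        shuffled[n_train:n_train + n_validation],    # compare and select models
        shuffled[n_train + n_validation:],           # final, one-off evaluation
    )
```

If only a single model is trained and nothing is tuned or selected, a plain two-way train/test split does the same job with one less set.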

6. Prediction!
All good

The obsolete geoscientist - review (chapter 37, Andrew Pethick)

Good chapter, interesting experience, I really enjoyed reading it. A bit short at 544 words.
Also read by @dabiged

Only fixed a couple of missing words and a repeated one:

I opted for a combination of options (ii) and (iii) only later opting for (i) a few weeks later. becomes:
I opted for a combination of options (ii) and (iii) only TO GO for (i) a few weeks later.

Are us geoscientists becomes:
Are WE geoscientists

I am excited what the industry will look like in another 25 years becomes:
I am excited BY what the industry will look like in another 25 years

Standing on the Shoulders of the Guy in the UK Office (11)

Good content but the flow isn't quite there
750 words but I think it's acceptable
would like to see Innersource and Open Source introduced properly in the paragraph beginning line 7, e.g. "two main giants are inner source / open source", as it's a bit of a jump to the next section
paragraph beginning line 15 is difficult to understand

My name is bot, geobot (41)

All good here. I think this is a (3) already.

What is geocomputing? - review (chapter 38, Matt Hall)

@kwinkunks :

I like this chapter.

It is interesting, witty, thought provoking, answers the questions posed.
It is a bit short at 307 words but if as you say you will split the figure in two it should work. You are the pro.

As you point out, it is from a blog post and needs a bit of adapting, but not too much, in my view. Perhaps things like "These concerns are valid, sort of" are OK for a blog but a bit casual for an essay; maybe I'm wrong.
"As you may know, we offer a multi-day course on 'geocomputing'" has context in the blog post because people know you there, but needs a rethink here. That's all really for me.

Reflections on Building Technical Communication Tools

For me this chapter is good to go!

Getting started in Geocomputing can seem daunting but it doesn’t need to be!

Review coming soon by @mycarta

Machine Learning for Geological Modeling

@GeostatsGuy
This is a great chapter; exactly the kind of review anyone doing geological modeling should go back to re-reading every once in a while.
The only modification I would suggest: there is no reference in the text to the figure and it needs one and a bit of context or else there’s a disconnect. It could be as simple as referencing it as an example of the simpler, more explainable, perhaps more accurate models you talk about in the second-last section.

Hardware is hard: teaching geotech (chapter 15) - review

The link in the article is broken: http://tge.geoscience.tech/ does not resolve.

Teaching geoscience students to code (chapter 20) - review

In paragraph 1 there is the following statement:

It was until I studied abroad in France,

I think this should read:

It wasn't until I studied abroad in France,

Saltman: Speeding things up

First a general comment: I really like this chapter, content and style. Exactly the kind of reading I wish I'd done on week 1, say day 2 of my Python adventure. It will help many.

The code works. I ran all of it and got the same results.

Text without the code snippet is ~650 words, just about right.

But it is double that amount with the code; I am not sure if it'll fit, I'd check with @kwinkunks - I wonder if you could aggregate all code into a figure with 4 subplots (a, b, c, d) and use screen captures of the snippets instead of text. Just a thought.
