softwareunderground / 52things
52 Things You Should Know About Geocomputing
This chapter is an important reminder to check your co-ordinate system. It also shows the many small, discrete ways that co-ordinate errors can creep into a database, and it is an important chapter to have.
It is 837 words long: I am unsure of a word limit for this book but if it needed reducing some of the introductory context setting could be reduced to lead the user more directly to warnings about incorrect co-ordinate systems. My suggestions below may add more code lines and so some text may have to be reduced.
Main review points:
It would benefit from copy-editing and formatting
I am unsure of the intended audience for the book (in terms of experience), but some of the acronyms and steps in the code comments could be explained more fully (e.g. what does "srs" mean and at the end of code block 2, give some examples of what code the author has in mind).
"The next thing to do is to adapt this measurement method in a while or for loop and run it across all rows in your well header database (perhaps by using ODBC/SQL calls)."
For the above section from the chapter, it would be really helpful to have an example code block to show what the author intends - having a start to finish workflow is so helpful for developers new to this kind of thing.
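As a starting point for that request, here is a minimal sketch of what such a loop might look like. It is not the author's code. The table `well_header` and its columns are hypothetical, an in-memory SQLite database stands in for the ODBC/SQL connection, and `project` is a placeholder for whichever projection callable is in use (for example the `TimbBRSO_ft_srs` object from the chapter); the `fake_project` function exists only so the sketch runs without pyproj.

```python
import math
import sqlite3


def find_suspect_wells(conn, project, tolerance):
    """Return (well_id, offset) for rows whose stored x/y disagree with
    the x/y recomputed from the stored longitude/latitude."""
    suspects = []
    cursor = conn.execute(
        "SELECT well_id, lon, lat, x, y FROM well_header"  # hypothetical schema
    )
    for well_id, lon, lat, x, y in cursor:
        c_x, c_y = project(lon, lat)  # calculated projected coordinates
        offset = math.hypot(c_x - x, c_y - y)
        if offset > tolerance:
            suspects.append((well_id, offset))
    return suspects


def fake_project(lon, lat):
    """Stand-in projection for demonstration only; swap in a real pyproj callable."""
    return lon * 1000.0, lat * 1000.0


# Build a tiny demonstration table; all values are made up.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE well_header (well_id TEXT, lon REAL, lat REAL, x REAL, y REAL)"
)
conn.executemany(
    "INSERT INTO well_header VALUES (?, ?, ?, ?, ?)",
    [
        ("W-1", 114.0, 5.0, 114000.0, 5000.0),  # stored x/y agree with projection
        ("W-2", 114.5, 5.5, 114800.0, 5900.0),  # stored x/y disagree
    ],
)
suspects = find_suspect_wells(conn, fake_project, tolerance=1.0)
print(suspects)
```

Passing the projection in as an argument keeps the database loop independent of the pyproj version, which may matter given the deprecation warnings mentioned in this review.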
I didn't have the contents of the table to refer to for this review but I ran the code in a Jupyter notebook. There are a number of syntax errors which I have attempted to rectify below but these should be double checked by the author.
In addition, Code Block 1 as shown below gave me some future and deprecation warnings. It would help if the author could update the calls to the current functionality.
import pyproj
# Initialise the spatial reference systems (srs)
w84UTM50N_srs = pyproj.Proj("+init=EPSG:32650")
# Without preserve_units=True, all results will be returned in metres
TimbBRSO_ft_srs = pyproj.Proj("+init=EPSG:29872", preserve_units=True)
# these are original/stored coordinates in Timbalai BRSO(ft)
long = 114.0
lat = 5.0
x = 1900000.0
y = 2050000.0
# and this is WGS84 UTM 50N coordinate
W84_x = 278000.0
W84_y = 626000.0
# Section A - projection and unprojection
# 'c_' indicates calculated values
(c_x, c_y) = TimbBRSO_ft_srs(long, lat)  # Timbalai BRSO geographic to projected coordinates
(c_long, c_lat) = TimbBRSO_ft_srs(x, y, inverse=True)  # Timbalai BRSO projected to geographic coordinates
# Section B - transformation between different datums/projections, here from WGS84 UTM 50N to Timbalai BRSO (ft)
(c_29872_x, c_29872_y) = pyproj.transform(w84UTM50N_srs, TimbBRSO_ft_srs, W84_x, W84_y)
# and the other way around
(c_32650_x, c_32650_y) = pyproj.transform(TimbBRSO_ft_srs, w84UTM50N_srs, x, y)
# Here we calculate the absolute difference between stored and calculated values.
# Refer to the variables defined in Section A.
abs_diff_x = abs(c_x - x)  # calculate absolute difference
abs_diff_y = abs(c_y - y)
# You can also calculate the offset using Pythagoras, sqrt(dX*dX + dY*dY), and then its magnitude, log10(offset)
# Insert your own code here to do something when the difference in values exceeds a certain value or magnitude
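The closing comments of Code Block 1 leave the offset and magnitude calculation to the reader. One possible way to fill that gap is sketched below. The function names, the default threshold and the sample coordinates are assumptions made for illustration; they do not come from the chapter.

```python
import math


def offset_magnitude(stored, calculated):
    """Return (offset, magnitude) between stored and calculated (x, y) pairs.

    offset is the straight-line distance sqrt(dX*dX + dY*dY);
    magnitude is log10(offset), or None when the two points coincide.
    """
    d_x = calculated[0] - stored[0]
    d_y = calculated[1] - stored[1]
    offset = math.hypot(d_x, d_y)
    magnitude = math.log10(offset) if offset > 0 else None
    return offset, magnitude


def needs_review(stored, calculated, max_magnitude=0.0):
    """Flag a pair whose offset exceeds 10**max_magnitude coordinate units.

    The default threshold (one coordinate unit) is an arbitrary placeholder.
    """
    _, magnitude = offset_magnitude(stored, calculated)
    return magnitude is not None and magnitude > max_magnitude


# Illustrative values only, not taken from the chapter's data.
stored_xy = (1900000.0, 2050000.0)
calculated_xy = (1900060.0, 2050080.0)
offset, magnitude = offset_magnitude(stored_xy, calculated_xy)
print(offset, magnitude, needs_review(stored_xy, calculated_xy))
```

Working with the magnitude rather than the raw offset makes it easy to bucket errors by order of size, which may hint at the cause (for example a unit mix-up versus a datum shift).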
I like this. The analogy is quite apt. (Using the 'edition notice' or copyright page might be more accurate, strictly speaking, but the 'judge a book by its cover' thing works better.) The message is straightforward -- it's good, solid, memorable advice. Classic 52 Things :)
We all judge books by their covers. Let's make SEG-Y files people want to read.
Figure
This is a fantastic chapter, could not help reading it given my obsession with colours.
I fixed the link to the figure already as it was out of date.
A must read; great review of a lot of things that go on in the background when we tinker with colour. I like the light, fun style, and the narrative form of evolving from simple to complex really works. I like the reference to 'objects'.
The only thing I wonder: can you label the four panels in the figure (e.g. A, B, C, D) and reference them in the text as you go (or simply refer to them as "first from top", and so on)?
This looks solid.
Comparing the title to the content, it is not entirely clear to me that the thrust is to teach students to code, rather than just the use of notebooks as a teaching tool for the subject being taught. What is described in many cases (using widgets and such for interactivity) would not need to be coded by students themselves. I acknowledge that it would be somewhat strange to use notebooks without coding them, and there are references to writing some, but the title felt off somehow.
Highlighting the use of automatic marking and similar things is valuable though, and certainly the use of notebooks as a teaching tool is a really good use case for them, especially with interactive widgets.
I liked this one a lot.
I wonder if we are expecting things to change or if the list of software mentioned is anticipated to be stable? A quick look around suggests that most of these have been around for a while, so are likely to be around for a decent time.
Geologists also do a lot more than look for oil and gas, so it might need a qualification that that is the problem space being discussed. (Since there is probably not space to start discussing geochemical packages and the various modelling software used on mines and such.)
Something that might help would be linking things back to the first few tasks mentioned. It feels like that is something that will frame the entire discussion, but then there is a long digression into seismic processing that does not seem to intuitively fit into the processes suggested as being handled by Petra and Geographix.
Is it worth adding Python/SciPy alongside MATLAB, Mathematica and Maple? I certainly see and use it more, but that might be my own echo chamber.
Quantum GIS has rebranded, and is simply QGIS now.
I enjoyed this essay, and the advice on learning a new package seems valuable and spot on to me. Thanks for writing this.
What to do in such circumstances is part of ‘edge computing,’ the subject of this month’s lead.
Not sure what this bit is doing here. It seems this was taken from a monthly publication and not edited accordingly.
Nice!
I wonder if putting some numbers on the anticipated speed-up might be worthwhile, or is this changing too quickly for that to be useful?
Otherwise, I thought this was well-written, and a really good overview of the benefits of using a GPU. It also pointed out the difficulties nicely. I do not really have too much more to say.
I can use another set of eyes on this chapter's draft:
https://github.com/softwareunderground/52things/blob/master/chapters/Niccoli_2.md
My eyes are tired of re-reading it
I like Alberto's chapters a lot. They were also among the first I received, after doing my writing class at RSI. I love reading stories like this, telling how far we've come as individuals, and also as a discipline.
We can even imagine the possibility of simulating a complex organism, allowing us to delve deeper into the secrets of life.
This chapter will need copy editing for little things like relative pronouns etcetera.
But it is a solid chapter, an interesting and enjoyable read. It makes a good story.
I was interested to see that you didn't promote pull request reviews as an initial chance to jump into the reviewing process.
@lheagy and I have found that pretty interesting in the geosci sphere when getting a bunch of authors to collaborate on textbook chapters. Might be something worth exploring over here as well?
Example:
geoscixyz/em#480
@jessepisel I reviewed your chapter.
Nice read, clear and interesting.
I tried your code in a Jupyter notebook running Python 3.6 and it works.
The only thing that needs to be changed is the order of the last three lines.
As it is right now it will generate an extra figure at the end:
so it needs to be:
Ok, this was really interesting. I had no idea that this was a thing that needed to be taken into account, so thanks for giving me that new insight.
I wonder about how strong the link to geocomputing is though. Is this something that computing can help with? Is there a dataset of common references that can be queried when starting a job? Is there a standard data format used following the points at the end?
I am really glad that I know about this as a problem now, but I feel like the geocomputing link (for this collection of essays) could be made a bit more explicit.
I've never done any encryption, but after reading Simon Singh's The Code Book I've developed quite the interest for it. This chapter was a fun read. I loved it. The level is perfect! And great that you enable readers by providing links to the tools.
One thing needs to be fixed: the link for Diskcryptor (third link) is old or broken. Is there a new one, or alternative?
The GnuPG link was also incorrect, but I have already fixed it.
At 552 words the length is right, but it has 2 figures, so I will leave the final say to the expert (@kwinkunks).
This chapter sits at 1243 words. Way too long.
It looks interesting, so I will read and review it anyway and add in a separate comment.
BUT: it will need to either be significantly rewritten, split in half, or not included.
I really like this chapter. IIRC, it was the first submission. And of course we're lucky to have something from Dave Hale.
Only drawback is that the code takes quite some parsing if you're not used to C++. It would be interesting to see a pure Python port (one would approach it in a completely different way, at least in NumPy, so it wouldn't be a fair comparison).
The blog reference will go in as a footnote.
The chapter is ready to go, and only requires some copy editing.
@mtb-za I have just re-read your Bentley 2 chapter.
Apart from perhaps two typos that I mentally registered but ignored, I have no recommendations for changes, just comments. But it is ready to go!
This is a nice chapter. Very readable.
I think it'll make a useful read both for someone that is coding for colleagues that have no programming experience, and for those colleagues (they too can be educated about some of these dilemmas).
I say this as someone who paid an undergrad computer science student (with a few weekly pizzas and some Italian tutoring) to write Perl scripts to do just this kind of work, but on a scale of tens of thousands of proprietary files from 3C seismometers. That was circa 2002, when I knew very little CS and could not yet even write Matlab scripts. I had to be educated to some extent, at least enough to explain what was going on in the appendix of my Master's thesis.
I enjoyed reading this chapter. These topics used to be very much NOT on my radar, but lately I recognize signs of what you define as ubiquitous language, so it is really nice to hear about it in a formal context. The example of "well" is very relevant.
I did not look at grammar (though nothing jumped out) and to me this is ready to go.
Great chapter.
@kwinkunks can you forward this to the authors?
Is it possible to label the figure and specify mesh type?
i.e. TL: Cartesian grid - TR: S grid? - BL: mesh grid? - BR: hybrid grid?
This very nice essay is very well written and the length is spot on.
I made very few changes. There are quite a lot of parentheses, mostly justified, but I took one or two out. We have a house style for book names and refs so I transformed them.
Figure
We will possibly need to place the legend inside the plot area in order to fit this; we'll know at layout.
I will test the code and report in a separate comment.
Paging @Leguark
GemPy 2.x version
I'd be happy to make those changes if you'd like 😄
The last "problem" in the first sentence does not sound right.
"approach" perhaps?
Given the huge scope of geological problems, it is not reasonable to assume that all problems can be solved using large, monolithic problems.
Other than that, I like the chapter.
Is it worth mentioning something like cookiecutter?
https://github.com/cookiecutter/cookiecutter
A really nice chapter -- great connections from everyday problems to more specialist or aspirational challenges, and back to the everyday. I did not do much:
@grajohnt — I wonder if it's worth mentioning the most important patterns for this kind of work (say, text processing, regular expressions, and parsers)? I don't know, but if you feel like there's an easy way to add a couple of sentences of advice for someone trying to code up a data translator, let's include them. What do you think?
Very interesting chapter, loved reading it.
Ready to go (apart from grammar review but nothing stood out).
Disclaimer: I say this with a mixture of silly pride and sheepishness at my stubbornness and my blind corner: I do not get Linux, and likely I will never get it. I tried...
Having said that, I enjoyed reading this chapter a ton!
Question: with 3 years since this was added, and 5 from original publication, has anything further changed significantly that it is worth adding? With the chapter being at 513 words there's definitely room for another paragraph or two. I will ask the author.
And two very small issues:
The first line of code in this article contains an error:
cat data.csv | grep i 'sio2' > silica.csv
This should be (note the dash in the grep flag):
cat data.csv | grep -i 'sio2' > silica.csv
Good length, good chapter, I enjoyed reading it.
Informative, relevant. Ready for copy editing.
Fantastic read.
720 words but it is acceptable.
I did not know the etymology of "Reproducible research" was tied to Claerbout. The whole thing is very interesting. A great way to promote both reproducible research and Madagascar!
The body of the chapter is probably ready for layout.
I could not track the first reference down ( Claerbout, JF (1991). Electronic document preface, in SEP-72, Stanford Exploration Project, p 1–18. )
As a reader: I like this chapter very much. I resist picking up R in my scientific computing work, so this did strike a chord in a positive way. I now have RStudio on my desktop! Baby steps. I also agree with your "do-first, learn-the-details afterward", as that is my approach too.
As a reviewer, just two comments:
Review coming soon by @mycarta
Reviews welcome
I really liked this chapter Matt.
It is exciting, witty as usual (I love "how many cobras does a honey badger eat at one sitting?"), interesting. Well written as usual.
Reference is correct.
But:
it is 840 words long, can it work? If not, do you want to go back and trim it?
Would you change anything in view of last couple of years of even more accelerated transformation in your personal experience with Agile, and outside?
Review needed for #38
A nice read for anyone crossing domain
Suggestions
I know Eirik is no longer 44 so this needs an update :)
Could benefit from a figure detailing the "stack" or difference between classification and regression
At 950 words it's too long, and I think there are a few easy cuts to be made by tidying up the English, especially where Eirik goes into great detail about E&P.
The cross plots on the boardroom table are probably introduced a little too late in the blog, and the message about switching to geocomputing gets lost.
The paragraphs beginning at lines 13 and 15 are very similar and could be either reduced or merged.
Hi @bsburnham
Review coming soon
Overall nicely written chapter. I like the style, the structure, and the objectives, which I think are met.
However, I have a few comments on specifics of Machine Learning; see below, organized by section. I may call on others to help out. Ultimately it may need further work from the author.
1. Understand each variable independently
About determining normality: I recently had an in-depth discussion with a friend (a statistician) about this because I was confused by contradictory recommendations on the subject. He assured me there are no distributional assumptions on the predictors, only on the dependent variable, so this needs to be clarified.
2. Feature engineering
All good
3. Understand bivariate relationship
All good
4. Exploit multivariate patterns
I would not use PCA alone. I would consider suggesting multiple methods to explore multivariate relationships / variable importance, ideally a combination of model-based and non-model-based ones, and decide based on a majority vote (the variables most methods agree upon).
5. Train your Machine Learning model
Here we have a recommendation for an 80/20 training / validation split. This needs to be clarified on two levels:
6. Prediction!
All good
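The majority-vote suggestion under point 4 can be sketched as follows. This is only an illustration of the idea, not something taken from the chapter: the method names, feature names and rankings are all made up, and in practice each ranking would come from a real variable-importance method.

```python
from collections import Counter


def majority_vote(rankings, top_k, min_votes):
    """Return the features that appear in the top_k of at least min_votes rankings.

    rankings maps a method name to its features, ordered from most to
    least important, with each feature listed once per method.
    """
    votes = Counter()
    for ranked_features in rankings.values():
        votes.update(ranked_features[:top_k])
    return sorted(name for name, count in votes.items() if count >= min_votes)


# Hypothetical rankings for illustration only.
rankings = {
    "pca_loadings": ["porosity", "gamma_ray", "depth", "density"],
    "random_forest": ["porosity", "density", "gamma_ray", "depth"],
    "mutual_information": ["gamma_ray", "porosity", "resistivity", "depth"],
}
selected = majority_vote(rankings, top_k=2, min_votes=2)
print(selected)
```

Mixing model-based and non-model-based rankings in the `rankings` dictionary is what makes the vote informative; three methods that share the same assumptions would tend to agree for the same reasons.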
Good chapter, interesting experience, I really enjoyed reading it. A bit short at 544 words.
Also read by @dabiged
Only fixed a couple of missing words and a repeated one:
I opted for a combination of options (ii) and (iii) only later opting for (i) a few weeks later. becomes:
I opted for a combination of options (ii) and (iii) only TO GO for (i) a few weeks later.
Are us geoscientists becomes:
Are WE geoscientists
I am excited what the industry will look like in another 25 years becomes:
I am excited BY what the industry will look like in another 25 years
Good content but the flow isn't quite there
750 words, but I think it's acceptable
Would like to see Innersource and Open Source introduced properly in the paragraph beginning at line 7, e.g. "two main giants are inner source / open source", as it's a bit of a jump to the next section
The paragraph beginning at line 15 is difficult to understand
All good here I think this is a (3) already
I like this chapter.
It is interesting, witty, thought provoking, answers the questions posed.
It is a bit short at 307 words but if as you say you will split the figure in two it should work. You are the pro.
As you point out, it is from a blog post and needs a bit of adapting, but not too much, in my view. Perhaps things like "These concerns are valid, sort of" are OK for a blog but a bit casual for an essay; maybe I'm wrong.
"As you may know, we offer a multi-day course on 'geocomputing'" has context in the blog post because people know you there, but it needs a rethink here. That's all really for me.
For me this chapter is good to go!
Review coming soon by @mycarta
@GeostatsGuy
This is a great chapter; exactly the kind of review anyone doing geological modeling should go back and re-read every once in a while.
The only modification I would suggest: there is no reference in the text to the figure and it needs one and a bit of context or else there’s a disconnect. It could be as simple as referencing it as an example of the simpler, more explainable, perhaps more accurate models you talk about in the second-last section.
The link in the article is broken: http://tge.geoscience.tech/ does not resolve.
In paragraph 1 there is the following statement:
It was until I studied abroad in France,
I think this should read:
It wasn't until I studied abroad in France,
First a general comment: I really like this chapter, content and style. Exactly the kind of reading I wish I'd done on week 1, say day 2 of my Python adventure. It will help many.
The code works. I ran all of it and got the same results.
Text without the code snippet is ~650 words, just about right.
But it is double that amount with the code; I am not sure if it'll fit, I'd check with @kwinkunks - I wonder if you could aggregate all code into a figure with 4 subplots (a, b, c, d) and use screen captures of the snippets instead of text. Just a thought.