compgeolab / eql-gradient-boosted

Paper: Gradient-boosted equivalent sources method for interpolating very large gravity and magnetic datasets

Home Page: https://doi.org/10.31223/X58G7C

License: BSD 3-Clause "New" or "Revised" License

Python 0.27% Makefile 0.02% TeX 0.88% Jupyter Notebook 98.84%
python geophysics geodesy gravity machine-learning gradient-boosting equivalent-source gridding interpolation fatiando-a-terra

eql-gradient-boosted's People

Contributors

leouieda, santisoler

eql-gradient-boosted's Issues

Submit the paper to a journal

Information

This is the standard EarthArXiv disclaimer for a preprint:

\noindent
\textbf{Disclaimer:}
This is a non-peer reviewed preprint submitted to EarthArXiv.
It is currently under review at \textit{insert-journal}.

Checklist

  • Reserve a DOI on figshare or Zenodo for the supplementary material (zip archive of this repository). Paste the DOI at the top of this issue.
  • Add the supplementary DOI to the manuscript
  • Make a cover letter for the editor. Include a short summary, suggested reviewers, and contributions by student authors
  • Submit the paper to the Journal
  • Submit the preprint to EarthArXiv
  • Add the preprint DOI to the README.md
  • Make a release on GitHub
  • Upload the .zip version of the repository to figshare or Zenodo and publish it
  • Add the preprint to the CompGeoLab (and your personal) website
  • Tell people about your preprint on social media (include a figure and maybe a thread with the main results)

Add noise to the data

@santisoler we forgot a very important thing! Adding some random noise to the data. I suspect this will drive the scores down a bit but I don't think it will change the results much.
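For reference, a minimal sketch of what this could look like (the data array and noise level here are placeholders, not the values used in the paper):

```python
import numpy as np

rng = np.random.default_rng(seed=42)
synthetic_data = np.zeros(500)  # placeholder for the synthetic gravity values
noise_std = 1.0  # illustrative noise level in mGal, not the paper's value
noisy_data = synthetic_data + rng.normal(scale=noise_std, size=synthetic_data.size)
```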

Move matplotlib config file to notebook that creates figures

The matplotlib.rc file was created so that several notebooks could share the same configuration when creating figures for the manuscript. Since we centralised figure creation in a single notebook, there's no need to maintain this file: we can simply change matplotlib's configuration in the notebook itself.
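Something like the following could replace the file (the values here are illustrative, not necessarily the ones in matplotlib.rc):

```python
import matplotlib

# Set the shared figure style at the top of the figure notebook
# instead of keeping a separate matplotlib.rc file.
matplotlib.rcParams.update({
    "font.size": 9,          # illustrative values, not the project's
    "figure.dpi": 300,
    "savefig.bbox": "tight",
})
```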

Post-publication updates

A checklist of things to do once the paper is accepted:

  • Update the preprint disclaimer to include the version of record
  • Update the README with the version of record link
  • Update the figshare archive
  • Tag a new version as accepted
  • Update the preprint on EarthArXiv
  • Sign and submit the publisher agreement
  • Double-check the source files and update any that need changing on the journal system

Change conclusions for source distributions and use relative depth as default

Based on the results of testing the performance of the different source distributions, we can conclude that:

  • Block-averaging the sources reduces the computational load while keeping good accuracy. It's computationally cheaper than using sources below data or grid_sources.
  • Changing the depth type doesn't introduce any considerable improvement on the results.
  • Using variable depths might be more expensive when searching the best set of parameters through CV.
  • Using relative depth or constant depth reduces the number of parameters to the minimum (depth and damping), while generating the same level of accuracy.

So it would be better to stick with relative depth in any further application of the gridders, like the ones we do with EQLHarmonicBoost.
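As a rough illustration of why block-averaging reduces the load, here is a minimal pure-NumPy sketch of the idea (the actual implementation uses proper tooling; `block_average` is a hypothetical helper):

```python
import numpy as np

def block_average(easting, northing, block_size):
    """Reduce scattered points to one averaged point per block.

    A minimal stand-in for the block-averaged source layout, shown
    only to illustrate why fewer sources mean a cheaper problem.
    """
    cols = np.floor(easting / block_size).astype(int)
    rows = np.floor(northing / block_size).astype(int)
    blocks = {}
    for c, r, e, n in zip(cols, rows, easting, northing):
        blocks.setdefault((r, c), []).append((e, n))
    averaged = np.array([np.mean(pts, axis=0) for pts in blocks.values()])
    return averaged[:, 0], averaged[:, 1]

easting = np.array([0.1, 0.2, 1.1, 1.3])
northing = np.array([0.1, 0.3, 0.2, 0.1])
e_avg, n_avg = block_average(easting, northing, block_size=1.0)
# Four observation points collapse into two block-averaged sources
```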

Plot the SVD of each layer type

@birocoles and @valcris suggested plotting the normalized singular values of the Jacobians for each source distribution type. This will help give a more theoretical basis for our conclusions on which layout is better.
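A hedged sketch of the computation behind such a plot (the random matrix is only a stand-in for the actual Jacobians):

```python
import numpy as np

rng = np.random.default_rng(42)
# Hypothetical stand-in for the equivalent-source Jacobian (Green's
# functions between sources and observation points).
jacobian = rng.normal(size=(50, 50))

# Normalize by the largest singular value so curves for different
# source distributions are comparable on a single plot.
singular_values = np.linalg.svd(jacobian, compute_uv=False)
normalized = singular_values / singular_values.max()
```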

Ditch jupytext and .py files

After working with jupytext for a long time, we haven't experienced much of a gain.
The process of keeping both the .py files and the .ipynb notebooks updated has caused some problems, and the benefits for review can be achieved through other tools with less hassle.

Preprint deposit on EarthArXiv

After submission, we could post a preprint to EarthArXiv if desired. An advantage of doing this is that we could start coding the gradient boosting into Harmonica and cite the preprint while we wait for the paper reviews.

If we do this, we need to format the paper for EarthArXiv by removing the GJI style and putting a disclaimer at the top saying that it's a preprint that hasn't been peer-reviewed.

@santisoler what do you think?

Replace R2 score with RMS on cross validation

Following #60, we would also need to move from the R2 score to RMS on cross validation of the EQL gridders, which is only used on real-world cases (for example, the Australia gravity data on #61).

To be able to use a different score on cross validation we would need to use the latest version of Verde that introduces a new scoring argument to verde.cross_val_score.

I think we can safely pass sklearn.metrics.mean_squared_error, although now we would set the best predictor as the one that achieves the minimum score. Also remember to compute its square root before appending it to the scores list.
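A possible sketch, assuming Verde's `scoring` argument accepts sklearn-style scorers (worth double-checking in the Verde docs): build the scorer with `make_scorer`, which negates the MSE so that maximising the score still picks the best predictor, then take the square root when reporting:

```python
import numpy as np
from sklearn.metrics import make_scorer, mean_squared_error

# greater_is_better=False makes sklearn negate the MSE, so the best
# predictor is still the one with the maximum (least negative) score.
mse_scorer = make_scorer(mean_squared_error, greater_is_better=False)

def scores_to_rms(scores):
    """Convert negated-MSE scores into RMS values for reporting."""
    return [float(np.sqrt(-s)) for s in scores]

print(scores_to_rms([-4.0, -9.0]))  # [2.0, 3.0]
```

The scorer would then be passed as `scoring=mse_scorer` to `verde.cross_val_score`.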

Keep .ipynb files in the history as well

Jupytext is awesome for actually getting a diff of the notebooks. But it also kind of sucks when I just want to see your results without having to run the code here (for example, it might take a long time). One alternative would be to keep both the .py and .ipynb files in the history. We can review PRs using the .py files but still be able to quickly check the outputs in the notebooks. It's a bit wasteful, but it's not like we'll be committing to this repo for years.

What do you think?

Test how gradient boosting performance changes with window size

Add some experiments showing how the gradient boosting algorithm behaves when changing the window size on synthetic data.
Topics of interest would be:

  • How accurate are predictions for small windows? Is there a threshold below which predictions become garbage?
  • How does window size affect computational load?

The last one might be tricky because increasing the window reduces the number of iterations but increases the size of the least-squares problem. There might be a trade-off between the two.
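A back-of-envelope cost model can make the trade-off concrete. Everything here is a hypothetical assumption for illustration: uniform point density, a dense O(m³) least-squares fit per window, and no per-window overhead (which is the term that pushes back in the other direction):

```python
import math

def boosting_cost_model(n_points, region_size, window_size):
    """Back-of-envelope cost model (hypothetical, for illustration only).

    Assumes points are uniformly distributed over a square region and
    that fitting each window costs O(m^3) for the m points inside it.
    Per-window overhead is deliberately not modelled.
    """
    n_windows = math.ceil(region_size / window_size) ** 2
    points_per_window = n_points / n_windows
    fit_cost = n_windows * points_per_window ** 3
    return n_windows, fit_cost

# Halving the window size quadruples the number of windows (iterations)
# but makes each least-squares fit much cheaper:
for w in (100, 50, 25):
    n, cost = boosting_cost_model(n_points=10_000, region_size=100, window_size=w)
    print(w, n, f"{cost:.2e}")
```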

Ideas for paper titles

Just to register some ideas:

  1. Reducing the number of parameters in equivalent source processing

Add author contribution statement

From the GJI instructions:

Author contribution statement: GJI encourages authors to include an author contribution statement, stating for example: who analysed the results, who processed the data, who wrote the paper. This should be included as a narrative statement in the acknowledgements section of the paper.

Feedback from EGU2020

Questions from the chat session:

Peter Lelievre, Memorial University (16:19) Hello Santiago. I'm happy to see someone investigating these important practical questions. I have two sets of comments/questions. First, regarding the x's in the table on slide 20 of your presentation: combining the grid source layout with relative depth positioning should be possible: just interpolate onto the topography elevations and then push downwards by some constant amount, no? I feel like combining the grid source layout with variable depth positioning should also be possible: can you use the same strategy you show on page 18 and calculate the mean distance of the gridded points to their nearest neighbours in the original measurement locations?

Hans-Juergen Goetze CAU Kiel (audience) (16:17) Santiago: Could 3D kriging possibly be an acceptable method?

Paolo Mancinelli UniChieti (16:20) Have you tested it on real case datasets?
