compgeolab / eql-gradient-boosted

Paper: Gradient-boosted equivalent sources method for interpolating very large gravity and magnetic datasets

Home Page: https://doi.org/10.31223/X58G7C

License: BSD 3-Clause "New" or "Revised" License

Python 0.27% Makefile 0.02% TeX 0.88% Jupyter Notebook 98.84%
python geophysics geodesy gravity machine-learning gradient-boosting equivalent-source gridding interpolation fatiando-a-terra

eql-gradient-boosted's People

Contributors

leouieda, santisoler

eql-gradient-boosted's Issues

Submit the paper to a journal

Information

This is the standard EarthArXiv disclaimer for a preprint:

\noindent
\textbf{Disclaimer:}
This is a non-peer reviewed preprint submitted to EarthArXiv.
It is currently under review at \textit{insert-journal}.

Checklist

  • Reserve a DOI on figshare or Zenodo for the supplementary material (zip archive of this repository). Paste the DOI at the top of this issue.
  • Add the supplementary DOI to the manuscript
  • Make a cover letter for the editor. Include a short summary, suggested reviewers, and contributions by student authors
  • Submit the paper to the Journal
  • Submit the preprint to EarthArXiv
  • Add the preprint DOI to the README.md
  • Make a release on GitHub
  • Upload the .zip version of the repository to figshare or Zenodo and publish it
  • Add the preprint to the CompGeoLab (and your personal) website
  • Tell people about your preprint on social media (include a figure and maybe a thread with the main results)

Add noise to the data

@santisoler we forgot a very important thing! Adding some random noise to the data. I suspect this will drive the scores down a bit but I don't think it will change the results much.
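For reference, a minimal sketch of what this could look like (the data array and noise level here are placeholders, not the values used in the paper):

```python
import numpy as np

rng = np.random.default_rng(seed=42)
synthetic_data = np.zeros(500)  # placeholder for the synthetic gravity values
noise_std = 1.0  # illustrative noise level in mGal, not the paper's value
noisy_data = synthetic_data + rng.normal(scale=noise_std, size=synthetic_data.size)
```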

Move matplotlib config file to notebook that creates figures

The matplotlib.rc file was created so that several notebooks could share the same configuration when creating figures for the manuscript. Since we centralised figure creation in a single notebook, there's no need to maintain this file: we can simply change matplotlib's configuration in the notebook itself.
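Something like the following could replace the file (the values here are illustrative, not necessarily the ones in matplotlib.rc):

```python
import matplotlib

# Set the shared figure style at the top of the figure notebook
# instead of keeping a separate matplotlib.rc file.
matplotlib.rcParams.update({
    "font.size": 9,          # illustrative values, not the project's
    "figure.dpi": 300,
    "savefig.bbox": "tight",
})
```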

Post-publication updates

A checklist of things to do once the paper is accepted:

  • Update the preprint disclaimer to include the version of record
  • Update the README with the version of record link
  • Update the figshare archive
  • Tag a new version as accepted
  • Update the preprint on EarthArXiv
  • Sign and submit the publisher agreement
  • Double-check the source files and update any that need changing on the journal system

Change conclusions for source distributions and use relative depth as default

Based on the results of testing the performance of the different source distributions, we can conclude that:

  • Block-averaging the sources reduces the computational load while keeping good accuracy. It's computationally cheaper than using sources below data or grid_sources.
  • Changing the depth type doesn't introduce any considerable improvement on the results.
  • Using variable depths might be more expensive when searching the best set of parameters through CV.
  • Using relative depth or constant depth reduces the number of parameters to the minimum (depth and damping), while generating the same level of accuracy.

So it would be better to stick with relative depth in any further application of the gridders, like the ones we do with EQLHarmonicBoost.
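As a rough illustration of why block-averaging reduces the load, here is a minimal pure-NumPy sketch of the idea (the actual implementation uses proper tooling; `block_average` is a hypothetical helper):

```python
import numpy as np

def block_average(easting, northing, block_size):
    """Reduce scattered points to one averaged point per block.

    A minimal stand-in for the block-averaged source layout, shown
    only to illustrate why fewer sources mean a cheaper problem.
    """
    cols = np.floor(easting / block_size).astype(int)
    rows = np.floor(northing / block_size).astype(int)
    blocks = {}
    for c, r, e, n in zip(cols, rows, easting, northing):
        blocks.setdefault((r, c), []).append((e, n))
    averaged = np.array([np.mean(pts, axis=0) for pts in blocks.values()])
    return averaged[:, 0], averaged[:, 1]

easting = np.array([0.1, 0.2, 1.1, 1.3])
northing = np.array([0.1, 0.3, 0.2, 0.1])
e_avg, n_avg = block_average(easting, northing, block_size=1.0)
# Four observation points collapse into two block-averaged sources
```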

Plot the SVD of each layer type

@birocoles and @valcris suggested plotting the normalized singular values of the Jacobians for each source distribution type. This will help give a more theoretical basis for our conclusions on which layout is better.
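A hedged sketch of the computation behind such a plot (the random matrix is only a stand-in for the actual Jacobians):

```python
import numpy as np

rng = np.random.default_rng(42)
# Hypothetical stand-in for the equivalent-source Jacobian (Green's
# functions between sources and observation points).
jacobian = rng.normal(size=(50, 50))

# Normalize by the largest singular value so curves for different
# source distributions are comparable on a single plot.
singular_values = np.linalg.svd(jacobian, compute_uv=False)
normalized = singular_values / singular_values.max()
```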

Ditch jupytext and .py files

After working with jupytext for a long time, we haven't experienced much of a gain.
The process of keeping both the .py files and the .ipynb notebooks updated has caused some problems, and the benefits for review can be achieved through other tools with less hassle.

Preprint deposit on EarthArXiv

After submission, we could post a preprint to EarthArXiv if desired. An advantage of doing this is that we could start coding the gradient boosting into Harmonica and cite the preprint while we wait for the paper reviews.

If we do this, we need to format the paper for EarthArXiv by removing the GJI style and putting a disclaimer at the top saying that it's a preprint that hasn't been peer-reviewed.

@santisoler what do you think?

Replace R2 score with RMS on cross validation

Following #60, we would also need to move from the R2 score to RMS on cross validation of the EQL gridders, which is only used on real-world cases (for example, the Australia gravity data on #61).

To be able to use a different score on cross validation we would need to use the latest version of Verde that introduces a new scoring argument to verde.cross_val_score.

I think we can safely pass sklearn.metrics.mean_squared_error, although now we would set the best predictor as the one that achieves the minimum score. Also remember to compute its square root before appending it to the scores list.
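A possible sketch, assuming Verde's `scoring` argument accepts sklearn-style scorers (worth double-checking in the Verde docs): build the scorer with `make_scorer`, which negates the MSE so that maximising the score still picks the best predictor, then take the square root when reporting:

```python
import numpy as np
from sklearn.metrics import make_scorer, mean_squared_error

# greater_is_better=False makes sklearn negate the MSE, so the best
# predictor is still the one with the maximum (least negative) score.
mse_scorer = make_scorer(mean_squared_error, greater_is_better=False)

def scores_to_rms(scores):
    """Convert negated-MSE scores into RMS values for reporting."""
    return [float(np.sqrt(-s)) for s in scores]

print(scores_to_rms([-4.0, -9.0]))  # [2.0, 3.0]
```

The scorer would then be passed as `scoring=mse_scorer` to `verde.cross_val_score`.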

Keep .ipynb files in the history as well

Jupytext is awesome for actually getting a diff of the notebooks. But it also kind of sucks when I just want to see your results without having to run the code here (for example, it might take a long time). One alternative would be to keep both the .py and .ipynb files in the history. We can review PRs using the .py files but still be able to quickly check the outputs in the notebooks. It's a bit wasteful, but it's not like we'll be committing to this repo for years.

What do you think?

Test how gradient boosting performance changes with window size

Add some experiments showing how the gradient boosting algorithm behaves when changing the window size on synthetic data.
Topics of interest would be:

  • How accurate are predictions for small windows? Is there a threshold below which predictions become garbage?
  • How does window size affect computational load?

The last one might be tricky because increasing the window reduces the number of iterations but increases the size of the least-squares problem. There might be a trade-off between the two.
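A back-of-envelope cost model can make the trade-off concrete. Everything here is a hypothetical assumption for illustration: uniform point density, a dense O(m³) least-squares fit per window, and no per-window overhead (which is the term that pushes back in the other direction):

```python
import math

def boosting_cost_model(n_points, region_size, window_size):
    """Back-of-envelope cost model (hypothetical, for illustration only).

    Assumes points are uniformly distributed over a square region and
    that fitting each window costs O(m^3) for the m points inside it.
    Per-window overhead is deliberately not modelled.
    """
    n_windows = math.ceil(region_size / window_size) ** 2
    points_per_window = n_points / n_windows
    fit_cost = n_windows * points_per_window ** 3
    return n_windows, fit_cost

# Halving the window size quadruples the number of windows (iterations)
# but makes each least-squares fit much cheaper:
for w in (100, 50, 25):
    n, cost = boosting_cost_model(n_points=10_000, region_size=100, window_size=w)
    print(w, n, f"{cost:.2e}")
```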

Ideas for paper titles

Just to register some ideas:

  1. Reducing the number of parameters in equivalent source processing

Add author contribution statement

From the GJI instructions:

Author contribution statement: GJI encourages authors to include an author contribution statement, stating for example: who analysed the results, who processed the data, who wrote the paper. This should be included as a narrative statement in the acknowledgements section of the paper.

Feedback from EGU2020

Questions from the chat session:

Peter Lelievre, Memorial University (16:19) Hello Santiago. I'm happy to see someone investigating these important practical questions. I have two sets of comments/questions. First, regarding the x's in the table on slide 20 of your presentation: combining the grid source layout with relative depth positioning should be possible: just interpolate onto the topography elevations and then push downwards by some constant amount, no? I feel like combining the grid source layout with variable depth positioning should also be possible: can you use the same strategy you show on page 18 and calculate the mean distance of the gridded points to their nearest neighbours in the original measurement locations?

Hans-Juergen Goetze CAU Kiel (audience) (16:17) Santiago: Could 3D kriging possibly be an acceptable method?

Paolo Mancinelli UniChieti (16:20) Have you tested it on real case datasets?
