
software paper about pyjanitor [OPEN]

pyjanitor-devs commented on August 20, 2024
software paper

from pyjanitor.

Comments (24)

jcmkk3 commented on August 20, 2024

Not sure how comprehensive you're trying to be with the "Comparison to other tools" section, but another couple of tidyverse-influenced libraries for Python are plydata (https://github.com/has2k1/plydata) and kadro (https://github.com/koaning/kadro). I think both are some of the most interesting prior art in the Python space.

Also, one that isn't tied to the tidyverse but has similar objectives, with a verb-based method-chaining approach to data preparation: pdpipe (https://github.com/shaypal5/pdpipe).

zbarry commented on August 20, 2024

I think the wishlist section code example you've given argues very clearly why pyjanitor is so game-changing.

zbarry commented on August 20, 2024

Add a note in the architecture section that chaining does not copy the data on each call, unless copying is the point of the method. That's an important point for people concerned about performance.
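
To make the point concrete, here is a minimal sketch of chaining without copying. It uses a toy `Table` class (hypothetical, standing in for a DataFrame) rather than pandas or pyjanitor, so it only illustrates the pattern: each method changes the object in place and returns `self`, and a copy is made only by the method whose purpose is copying.

```python
class Table:
    """Toy stand-in for a DataFrame: maps column names to lists of values."""

    def __init__(self, columns):
        self.columns = columns

    def rename_column(self, old, new):
        # Mutates in place and returns self, so the chain copies nothing.
        self.columns[new] = self.columns.pop(old)
        return self

    def drop_column(self, name):
        # Same pattern: modify, then hand back the same object.
        del self.columns[name]
        return self

    def copy(self):
        # The only method here whose purpose is to duplicate the data.
        return Table({name: list(values) for name, values in self.columns.items()})


original = Table({"a": [1, 2], "b": [3, 4]})
chained = original.rename_column("a", "x").drop_column("b")
snapshot = chained.copy()
```

Here `chained` and `original` are the same object, while `snapshot` is an independent copy.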

ericmjl commented on August 20, 2024

> Wasn't very clear what I meant, haha. Idk, when I was reading this in a meeting, I kind of snickered because for whatever reason, to me at the time, it read like "yeah, sure, these people chipped in, but these other guys are more important" which, while that may not be an incorrect statement, could possibly be said in a different way. Not sure if others would read it like that.

Got it. Yes, could use rephrasing. Feel free to PR a change.

Zsailer commented on August 20, 2024

After thinking about it for a bit, I don't think you need to add anything more about pandas-flavor 👍
The current level of detail is appropriate for your readers. Anything more gets into the "weeds" of Pandas.

That said, I've added a TL;DR section to the pandas-flavor README that you could always reference if you'd like (no pressure from me 😃). It provides a simple explanation of how method registration works in the register_dataframe_method decorator.
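
For readers of the thread, the general idea behind method registration can be sketched roughly as below. This is a toy illustration and not pandas-flavor's actual implementation: `Frame` and `register_method` are hypothetical names, and unlike `register_dataframe_method` the toy decorator takes the target class as an argument.

```python
class Frame:
    """Toy stand-in for a DataFrame: just a list of rows."""

    def __init__(self, rows):
        self.rows = rows


def register_method(cls):
    """Return a decorator that attaches a plain function to cls as a method."""

    def decorator(func):
        # After this, instances of cls can call func as instance.<func name>(...).
        setattr(cls, func.__name__, func)
        return func

    return decorator


@register_method(Frame)
def drop_empty(frame):
    # Remove empty rows, then return the frame so calls can be chained.
    frame.rows = [row for row in frame.rows if row]
    return frame


@register_method(Frame)
def first(frame, n):
    # Keep only the first n rows.
    frame.rows = frame.rows[:n]
    return frame


result = Frame([[1], [], [2], [3]]).drop_empty().first(2)
```

The functions are written once as ordinary functions, and the decorator makes them available as chainable methods on the class.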

ericmjl commented on August 20, 2024

@Zsailer thanks for the feedback! I'm more than happy to include you on the paper regardless, because pandas-flavor was very enabling for pyjanitor to become a reality. Would you still like to be included? Please let me know!

zbarry commented on August 20, 2024

Thanks Eric; this is quite the honor.

szuckerman commented on August 20, 2024

Great! I'd be glad to take a look.

zbarry commented on August 20, 2024

Tidy data manuscript citation to possibly use: https://www.jstatsoft.org/article/view/v059i10

zbarry commented on August 20, 2024

I'm just stream-of-consciousness'ing things here that I might add in myself later.

Cite the SO study

zbarry commented on August 20, 2024

> More significant has been the contributions from data scientists seeking a cleaner API for cleaning data

The connotation here is probably not what is intended.

zbarry commented on August 20, 2024

Diagrams showing how a DataFrame is progressively mutated over a chain of methods might be interesting.

ericmjl commented on August 20, 2024

> I'm just stream-of-consciousness'ing things here that I might add in myself later.
>
> Cite the SO study

Already done.

ericmjl commented on August 20, 2024

>> More significant has been the contributions from data scientists seeking a cleaner API for cleaning data
>
> The connotation here is probably not what is intended.

Wait what? Not sure what you mean by that.

ericmjl commented on August 20, 2024

> Diagrams showing how a DataFrame is progressively mutated over a chain of methods might be interesting.

I think that belongs in the docs, which could be much improved. There is already one example available, copied directly from the janitor repository.

zbarry commented on August 20, 2024

> I think that belongs in the docs, which could be much improved. There is already one example available, copied directly from the janitor repository.

I was thinking more in the sense of an overview figure, so people can get a clear visual indication of how easily you can track the chain and its effects on the DataFrame, though it's not super important. It's more for improving the visual appeal of the paper than anything else.

> Newcomer contributors to open source have made their maiden contributions to pyjanitor, and experienced software engineers have also chipped in. More significant has been the contributions from data scientists seeking a cleaner API for cleaning data.

Wasn't very clear what I meant, haha. Idk, when I was reading this in a meeting, I kind of snickered because for whatever reason, to me at the time, it read like "yeah, sure, these people chipped in, but these other guys are more important" which, while that may not be an incorrect statement, could possibly be said in a different way. Not sure if others would read it like that.

zbarry commented on August 20, 2024

Just curious - what would be the eventual format the paper would be written in? E.g., Word, LaTeX, etc.? If it's the latter, there are good templates out there for structuring manuscripts, of course. Happy to do the typesetting if we go for it.

ericmjl commented on August 20, 2024

@zbarry that'll depend, but for an arXiv deposition, I'd probably use some tooling I already have to convert the Markdown text into LaTeX, with Pandoc in the loop. It's something I did for my thesis, where a PDF of the paper can be built in a continuous integration step as well. It'll be like how readthedocs builds docs, except for papers!

I think you're better versed in LaTeX typesetting than I am, so if you'd like to build the template, please go for it!
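
For anyone following along, the Markdown-to-LaTeX step described above could look roughly like the sketch below. The file names `paper.md` and `paper.tex` and the helper names are assumptions for illustration; the actual tooling is not shown in this thread. It assumes the `pandoc` executable is installed and on the PATH.

```python
import shutil
import subprocess


def pandoc_command(source, output):
    """Build the argument list for a standalone Pandoc conversion."""
    # -s/--standalone emits a complete document rather than a fragment;
    # Pandoc infers the output format from the file extension.
    return ["pandoc", source, "--standalone", "-o", output]


def build_paper(source="paper.md", output="paper.tex"):
    """Run Pandoc; a CI job could call this on every push."""
    if shutil.which("pandoc") is None:
        raise RuntimeError("pandoc is not installed or not on PATH")
    # check=True makes a failed conversion fail the build step.
    subprocess.run(pandoc_command(source, output), check=True)
```

Calling `build_paper()` from a continuous integration job would rebuild the LaTeX output whenever the Markdown source changes.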

zbarry commented on August 20, 2024

Whoa, that's cool lol. I'd expect nothing less. Sounds good.

ericmjl commented on August 20, 2024

@jcmkk3 thanks for the feedback! Yes, I will have to update the "comparison to other tools" as well.

ericmjl commented on August 20, 2024

Going to start a GitHub projects board for this.

ericmjl commented on August 20, 2024

Inviting @eli-s-goldberg and @Zsailer onto the thread.

@eli-s-goldberg I have finally gotten to reviewing your proposed changes, and I definitely like and value the feedback. I am making changes on the basis of this. I'd also like to invite you to contribute an end-user testimony of some kind to the paper - if you're inclined! Totally understand if you'd like to decline, given your load at the moment.

@Zsailer I am also inviting you onto the thread because pandas-flavor has been very enabling for this project. At the moment, I am wondering if you are open to contributing comments on whether additional description about pandas-flavor would help educate an end-user about pyjanitor's architecture? Again, also only if you're inclined to do so!

I hope to keep things lightweight for both of you, since I'm sure you're both super busy with your respective things. If you don't want to go through the hassle of a PR, I'm happy to accept your contributions via the issue tracker (i.e. just copy/paste the text you'd like added or modified).

Zsailer commented on August 20, 2024

@ericmjl thanks for the invite!

I'd be happy to contribute. I'll take a closer look at the "architecture" section in the next couple of days and leave comments.

After a quick read, the paper is looking great!

Zsailer commented on August 20, 2024

I'd be honored! 😃
