fast-pandas's Issues

Potential typo in section 1.3 of readme

Hey there,

Thanks for all the work you put into this, it will be a great reference in the future.

I noticed something that could be either a typo or an overlooked error - or I could just be misunderstanding. In section 1.3 of the readme, it seems like you've swapped the contents of the query_selection and bracket_selection functions:

def query_selection(df):
    return df[(df["A"] > 0) & (df["A"] < 100)]

def bracket_selection(df):
    return df.query("A > 0 and A < 100")

It seems like query_selection should be making a call to the df.query() method, and bracket_selection should be using [] to select data. If this isn't the case, then I'd argue you should probably use different or more explicit function names, as I definitely got the wrong impression from your current names. =)

Finally, if this was unintentional, I'd be curious as to whether the benchmarks were actually reversed, or if this was just a typo in creating the write-up.

Thanks again!
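For reference, the naming this issue proposes can be sketched as follows. The function bodies are the ones quoted above with the names swapped back. The test frame is a hypothetical stand-in, since the readme's benchmark data is not shown here. This only confirms that both functions select the same rows; it does not say which benchmark numbers in the readme belong to which method.

```python
import numpy as np
import pandas as pd


def bracket_selection(df):
    # Boolean-mask selection with square brackets.
    return df[(df["A"] > 0) & (df["A"] < 100)]


def query_selection(df):
    # The same filter expressed through DataFrame.query.
    return df.query("A > 0 and A < 100")


# Hypothetical data (assumption): integers in [-50, 150) in a single column "A".
rng = np.random.default_rng(0)
df = pd.DataFrame({"A": rng.integers(-50, 150, size=1_000)})

# Both approaches should return the same rows in the same order.
same_rows = bracket_selection(df).equals(query_selection(df))
print("identical results:", same_rows)
```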

Improving the timing even more

I don't know the implementation details of np.mean and df["A"].mean(), but I guess that np.mean(df["A"]) is slow because numpy has to convert the pd.Series in some way.
Using np.mean(df["A"].values) is probably faster, especially for large arrays, because it passes numpy the underlying array directly.
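A rough way to check this guess is to time the three spellings side by side. The sketch below assumes a hypothetical one-million-row float column; the absolute numbers will vary by machine and by pandas/numpy version, so treat the output as indicative only.

```python
import timeit

import numpy as np
import pandas as pd

# Hypothetical data (assumption): one million uniform floats in column "A".
rng = np.random.default_rng(0)
df = pd.DataFrame({"A": rng.random(1_000_000)})

candidates = {
    'df["A"].mean()': lambda: df["A"].mean(),
    'np.mean(df["A"])': lambda: np.mean(df["A"]),
    'np.mean(df["A"].values)': lambda: np.mean(df["A"].values),
}

calls = 20
# Take the best of three repeats to reduce noise from other processes.
timings = {
    name: min(timeit.repeat(fn, number=calls, repeat=3))
    for name, fn in candidates.items()
}

for name, seconds in timings.items():
    print(f"{name:28s} {seconds / calls * 1e3:.3f} ms per call")
```

One caveat: the variants are not interchangeable when the column contains NaN. Series.mean skips missing values by default, while np.mean on the raw array returns nan.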

benchmarks for pivot, groupby, resampling

These microbenchmarks are informative, but when supporting TensorFlow data prep for a larger dataset (2.5 million events), pandas ventures into swap territory (for me).

Maybe there are expert pandas techniques for this, and more than one way to do things.

Either legend is wrong or bullet points are wrong

In the selection section these bullet points don't match the graph according to the legend.

  • loc and query selections are identical in performance.
  • Square bracket selection is the slowest method.

According to the graph it looks like loc and square brackets are identical in performance. It looks like query is the slowest method. I'm not sure if the issue is with the legend in the graph or the bullet points.
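One way to settle whether the legend or the bullet points are wrong is to rerun the comparison locally. The sketch below is not the readme's benchmark. It assumes a hypothetical 100,000-row integer column and reuses the filter from section 1.3, timing loc, square-bracket, and query selection on the same frame.

```python
import timeit

import numpy as np
import pandas as pd

# Hypothetical data (assumption): 100,000 integers in [-50, 150) in column "A".
rng = np.random.default_rng(0)
df = pd.DataFrame({"A": rng.integers(-50, 150, size=100_000)})

methods = {
    "loc": lambda: df.loc[(df["A"] > 0) & (df["A"] < 100)],
    "bracket": lambda: df[(df["A"] > 0) & (df["A"] < 100)],
    "query": lambda: df.query("A > 0 and A < 100"),
}

# Sanity check first: all three must select the same rows.
results = {label: fn() for label, fn in methods.items()}

calls = 10
timings = {
    label: min(timeit.repeat(fn, number=calls, repeat=3))
    for label, fn in methods.items()
}

# Print fastest first so the ranking can be compared with the graph.
for label in sorted(timings, key=timings.get):
    print(f"{label:8s} {timings[label] / calls * 1e3:.3f} ms per call")
```

The ranking can change with the frame size, so repeating the run at several sizes gives a fairer comparison with the graph.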
