Comments (14)

SethMMorton commented on June 7, 2024

FYI - this was released as part of fastnumbers 3.1.0

from fastnumbers.

SethMMorton commented on June 7, 2024

Use isreal().

argenisleon commented on June 7, 2024

Thanks for the fast response @SethMMorton
I was looking for something like is_int_or_float with a return of 0 for int or 1 for float.

Maybe it is something niche, but it would be helpful for filtering data types in columns.

SethMMorton commented on June 7, 2024

To be pedantic, isreal is is_int_or_float - you're looking for is_int_xor_float 😄

No, there is currently no direct functionality for that. You could do int_xor_float = isint(x) + isfloat(x). It will be 0 if it's not a number, 1 if it's an int, and 2 if it's a float.

argenisleon commented on June 7, 2024

Thanks. I am just trying to gain the maximum speed I can get :)

SethMMorton commented on June 7, 2024

If this is something you think would be useful, I would be very open to a PR. I don't think it would require adding any new algorithms, just a new top-level function utilizing existing code.

argenisleon commented on June 7, 2024

Thanks, @SethMMorton,

At the moment I have neither the bandwidth nor the C knowledge to tackle this, but I will be more than happy to collaborate in the future if I cannot get other options to work.

SethMMorton commented on June 7, 2024

Is the suggestion I made usable, or is this something you need? I'm curious, what application are you needing this for?

argenisleon commented on June 7, 2024

Thanks @SethMMorton

I am the main developer of Bumblebee https://github.com/ironmussa/Bumblebee/tree/develop-3.0
I am trying to infer the data type of a column as fast as I can. For that, I would like to apply a function to every element in a pandas Series (like an array) to know whether it is object, int, float, or null, and then count every data type.

The problem with the default approach in Dask is that it loads the data in chunks and tries to infer the data type in every chunk. Sometimes it fails because different chunks result in different data types.

The final goal is to reduce memory usage by casting to the data type that best represents the data.
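
For readers following along, the per-element classify-then-count idea described above could look roughly like the sketch below. It is an illustration only: classify and count_types are hypothetical names, and the built-in int() and float() stand in for the much faster fastnumbers checks, so this is not Bumblebee's actual code.

```python
from collections import Counter


def classify(value):
    """Return a coarse label for one cell: 'int', 'float', 'null' or 'object'.

    Hypothetical helper; int()/float() are slow stand-ins for fastnumbers.
    """
    if value is None:
        return "null"
    if isinstance(value, bool):
        # bool is a subclass of int, but it is not numeric data here.
        return "object"
    if isinstance(value, int):
        return "int"
    if isinstance(value, float):
        # NaN is the usual missing-value marker in pandas; NaN != NaN.
        return "null" if value != value else "float"
    if isinstance(value, str):
        text = value.strip()
        try:
            int(text)
            return "int"
        except ValueError:
            pass
        try:
            float(text)
            return "float"
        except ValueError:
            return "object"
    return "object"


def count_types(values):
    """Count how many cells fall under each label."""
    return Counter(classify(v) for v in values)


counts = count_types(["1", "2.5", "abc", None, 3, 4.0])
print(counts)
```

Counting over the whole column, rather than inferring per chunk, is what avoids the disagreement between chunks mentioned above.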

SethMMorton commented on June 7, 2024

Doesn't pandas auto-infer the datatype for you? Or are you trying to infer the type of your dataset before inserting into the dataframe?

Either way, I can see the value of a function that tells you the type, rather than just answering "is this a particular type?" I think I was a bit thrown off by the specificity of is_int_or_float. A better name would be something like detect_type, and instead of returning a number, it would return the actual Python type: int, float, or whatnot.

Rough Python equivalent put in terms of existing fastnumbers functionality:

from fastnumbers import isint, isfloat

def detect_type(x):
    if isint(x):
        return int
    elif isfloat(x):
        return float
    else:
        return None

Open questions:

  • If given a string that is non-numeric, should it return None as shown above, or return str?
  • If given something completely crazy, like a list, should it return None, list, or raise a TypeError?

argenisleon commented on June 7, 2024

Yes, pandas can infer the datatype. The problem is Dask because it is inferring the datatype in every chunk of data.

The code you wrote is exactly what I am doing right now :)
About your questions:

If given a string that is non-numeric, should it return None as shown above, or return str?
Return str.

If given something completely crazy, like a list, should it return None, list, or raise a TypeError?
Return list.
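
With those two answers applied, the rough equivalent could be sketched as follows. This is only an illustration: it uses the built-in int() and float() as stand-ins for isint and isfloat, and it is not necessarily what the eventual fastnumbers implementation does.

```python
def detect_type(x):
    """Return the Python type that best describes x.

    Sketch only: numeric strings map to int or float, other strings
    map to str, and any non-string input maps to its own type.
    """
    if isinstance(x, str):
        for candidate in (int, float):
            try:
                candidate(x)
            except ValueError:
                continue
            return candidate
        # Non-numeric string: return str instead of None.
        return str
    # Anything else, including a list, reports its own type.
    return type(x)


for sample in ("12", "1.5", "abc", [1, 2], 3, 3.0):
    print(repr(sample), detect_type(sample))
```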

SethMMorton commented on June 7, 2024

@argenisleon I have created a PR for this at #38. Can you please review? At the very least, please review the following:

argenisleon commented on June 7, 2024

Sure @SethMMorton, I will be reviewing this today.

SethMMorton commented on June 7, 2024

Closed by #38
