Comments (14)
FYI - this was released as part of fastnumbers 3.1.0
from fastnumbers.
Use isreal().
- https://fastnumbers.readthedocs.io/en/master/api.html#fastnumbers.isreal
- https://github.com/SethMMorton/fastnumbers#checking-functions
Thanks for the fast response @SethMMorton
I was looking for something like is_int_or_float, with a return of 0 for int or 1 for float.
Maybe it is something niche, but it would be helpful for filtering data types in columns.
To be pedantic, isreal is is_int_or_float - you're looking for is_int_xor_float 😄
No, there is currently no functionality for that directly. You could do int_xor_float = isint(x) + isfloat(x). It will be 0 if it's not a number, 1 if it's an int, and 2 if it's a float.
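The 0/1/2 encoding described above can be sketched in plain Python. This is only an illustration under stated assumptions: classify_number is a hypothetical helper, and the built-in int() and float() conversions stand in for fastnumbers' isint and isfloat, whose exact treatment of integer-like input is not confirmed in this thread.

```python
def classify_number(x):
    """Return 0 for non-numeric input, 1 for an int, 2 for a float.

    Hypothetical helper: the built-in int() and float() conversions stand in
    for fastnumbers' isint() and isfloat().
    """
    # bool is a subclass of int; treating it as non-numeric is an assumption.
    if isinstance(x, bool):
        return 0
    if isinstance(x, int):
        return 1
    if isinstance(x, float):
        return 2
    if not isinstance(x, str):
        return 0
    # Try the narrower conversion first so "12" counts as an int.
    try:
        int(x)
        return 1
    except ValueError:
        pass
    try:
        float(x)
        return 2
    except ValueError:
        return 0
```

Checking int before float matters here: every integer string also parses as a float, so the order decides which code an input like "12" receives.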
Thanks. I am just trying to gain the maximum speed I can get :)
If this is something you think would be useful, I would be very open to a PR. I don't think it would require adding any new algorithms, just a new top-level function utilizing existing code.
Thanks, @SethMMorton,
At the moment I have neither the bandwidth nor the C knowledge to tackle this, but I will be more than happy to collaborate in the future if I cannot get other options to work.
Is the suggestion I made usable, or is this something you need? I'm curious, what application are you needing this for?
Thanks @SethMMorton
I am the main developer of Bumblebee: https://github.com/ironmussa/Bumblebee/tree/develop-3.0
I am trying to infer the data type of a column as fast as I can. For that, I would like to apply a function to every element in a pandas Series (like an array) to know if it is object, int, float, or null, and then count every data type.
The problem with the default approach in Dask is that it loads the data in chunks and tries to infer the data type in every chunk. Sometimes it fails because different chunks result in different data types.
The final goal is to reduce memory usage by casting to the data type that best represents the data.
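The counting approach described here could look roughly like the sketch below. It is a plain-Python illustration, not Bumblebee or Dask code: label_value and count_types are hypothetical names, nested lists stand in for the chunks of a column, and the built-in int() and float() conversions stand in for the fastnumbers checks.

```python
from collections import Counter


def label_value(x):
    """Label one value as 'null', 'int', 'float', or 'object' (hypothetical helper)."""
    if x is None:
        return "null"
    if isinstance(x, float) and x != x:  # NaN is the only float not equal to itself
        return "null"
    if isinstance(x, bool):  # assumption: booleans are not counted as ints
        return "object"
    if isinstance(x, int):
        return "int"
    if isinstance(x, float):
        return "float"
    if isinstance(x, str):
        # Try the narrower type first so "3" is labelled int, not float.
        for label, convert in (("int", int), ("float", float)):
            try:
                convert(x)
                return label
            except ValueError:
                continue
    return "object"


def count_types(chunks):
    """Accumulate label counts over all chunks before deciding on a column type."""
    totals = Counter()
    for chunk in chunks:
        totals.update(label_value(x) for x in chunk)
    return totals


# Two chunks of the same column that would look like different types in isolation.
chunks = [["1", "2", "3"], ["4.5", None, "abc"]]
counts = count_types(chunks)
```

Summing the counts over every chunk means the type decision is made once for the whole column instead of once per chunk.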
Doesn't pandas auto-infer the datatype for you? Or are you trying to infer the type of your dataset before inserting into the dataframe?
Either way, I can see the value of a function that tells you the type, rather than just answering "is this a particular type?" I think I was a bit thrown off by the specificity of is_int_or_float - I think a better name would be something like detect_type, and instead of returning a number, it would return the actual Python type int or float or whatnot.
Rough Python equivalent put in terms of existing fastnumbers functionality:
from fastnumbers import isint, isfloat

def detect_type(x):
    if isint(x):
        return int
    elif isfloat(x):
        return float
    else:
        return None
Open questions:
- If given a string that is non-numeric, should it return None as shown above, or return str?
- If given something completely crazy, like a list, should it return None, list, or raise a TypeError?
Yes, pandas can infer the datatype. The problem is Dask because it is inferring the datatype in every chunk of data.
The code you wrote is exactly what I am doing right now :)
About your questions:
- If given a string that is non-numeric, should it return None as shown above, or return str?
  Return str.
- If given something completely crazy, like a list, should it return None, list, or raise a TypeError?
  Return list.
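Following these answers (str for a non-numeric string, list for a list), the behavior could be sketched in plain Python as below. This is an illustration under assumptions, not the fastnumbers implementation and not a claim about what the PR does: the built-in int() and float() conversions stand in for isint and isfloat.

```python
def detect_type(x):
    """Return int or float for numeric input, otherwise the input's own type.

    Plain-Python stand-in; int() and float() replace fastnumbers' checks.
    """
    if isinstance(x, str):
        # Try the narrower type first so "12" is reported as int, not float.
        for candidate in (int, float):
            try:
                candidate(x)
                return candidate
            except ValueError:
                continue
        return str  # non-numeric string
    return type(x)  # int, float, list, or whatever else was passed in
```

Returning type(x) for everything that is not a string means no input raises a TypeError, which matches the "return list" answer above.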
@argenisleon I have created a PR for this at #38. Can you please review? At the very least, please review the following:
Sure @SethMMorton, I will review this today.
Closed by #38