Comments (6)
I asked about this inconsistency in egen here, if you want to read more about it. For now, I'll stick to upgrading variable types.
from stata-gtools.
Right. I put that in place in case no type was specified, but I suppose if the user wants to specify a different type then I should allow for that. I'll fix it over the weekend. Thanks!
Actually, I had misunderstood this comment. Due to the way the plugin API works, there's no way to do this efficiently, and even then it's not obvious to me what the default behavior should be. For now, I'll just upgrade the type to something that is for sure safe. egen does not always do this, by the way. Consider:
clear
set obs `=2^24 + 10'
g long x = _n
egen group = group(x)
format %21.0gc x group
l in `=_N - 10' / `=_N'
qui gdistinct x group
matrix list r(distinct)
And you can see that egen does not adequately type group, so there aren't enough levels.
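To make the cause concrete, here is a minimal sketch, assuming the new variable ends up stored as a float (Stata's default storage type unless it has been changed with set type). A float represents consecutive integers exactly only up to 2^24, so group ids beyond that point collide:

```stata
* Floats store every integer exactly only up to 2^24 = 16,777,216
display %12.0f float(2^24)            // 16777216
display %12.0f float(2^24 + 1)        // 16777216 (rounded, not stored exactly)
display float(2^24 + 1) == 2^24 + 1   // 0: the two values differ
* A long holds every integer up to c(maxlong) exactly
display %12.0f c(maxlong)
```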
Hi Mauricio,
Just to be clear, do you plan on upgrading the user-specified type in order to fit the number of levels?
I would argue that we should follow two rules, in order of importance:
- As much as possible, upgrade type to avoid losing information. In particular, if group() returns incorrect results, that would be both hard to detect and would create serious problems down the line (e.g. if I get the mean by group, but I used float as default so now I treat two groups as one).
- If possible, downgrade type to save memory. So if the user doesn't specify the type of the new variable, and it fits in -byte-, then use -byte- and not -float- or anything else.
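The two rules above can be sketched as a small type-selection routine. This is only an illustration: smallest_group_type is a hypothetical helper, not part of gtools or egen, and it assumes the number of levels is already known before the variable is created, which may not hold inside the plugin.

```stata
* Hypothetical helper (not part of gtools): return the smallest integer
* storage type that can hold group ids 1..nlevels without loss.
capture program drop smallest_group_type
program define smallest_group_type, rclass
    args nlevels
    if (`nlevels' <= c(maxbyte))      local type byte
    else if (`nlevels' <= c(maxint))  local type int
    else if (`nlevels' <= c(maxlong)) local type long
    else                              local type double
    return local type `type'
end

smallest_group_type 100
display "`r(type)'"    // byte
smallest_group_type 16777226
display "`r(type)'"    // long
```

Note that float never appears as a candidate: any count of levels that fits exactly in a float also fits in a long.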
Also, even if you don't like the ideas above, I would argue that the default type for egen group should never be float, because it is always dominated by -long-:
clear all
set obs `=3e7'
gen long i = _n // long supports up to 2bn
gegen long g1 = group(i)
cap noi assert i == g1 // works
gegen g2 = group(i)
cap noi assert i == g2 // fails b/c g2 is float
gegen byte g3 = group(i)
cap noi assert i == g3 // fails b/c g3 is byte
"gegen, group" now forces a type if there might be loss of information. In other words, I follow the first rule but not always the second because, as I mentioned, the second is not always possible to implement efficiently.
I think you're using an old version of gtools for your example. In gtools 0.12.5 I don't get the issues you mention, and with gegen byte g3 = group(i) you should see the message "(warning: user-requested type 'byte' upgraded to 'long')" (unless I didn't upload to github correctly, in which case do let me know).
Cool! The first rule is the key one, because we can always call compress for the 2nd one!!
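As a quick illustration of that workflow (a sketch that assumes gtools is installed; the variable names are arbitrary), create the group id with a safe type first and let compress shrink it afterwards:

```stata
clear
set obs 50
gen long i = _n
gegen long g = group(i)   // safe type up front (rule 1)
describe g                // storage type: long
compress g                // demotes g to the smallest lossless type (rule 2)
describe g                // storage type: byte, since 50 levels fit
```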