Comments (6)
I do not think it's specific to `total`. The issue is not quite what you describe, and it affects every command. If `gegen` is called without a `by` prefix, then the expression is computed without `by`. If you want to compute the expression by a set of variables, you need to use the `by` prefix.
I cannot change the default behavior of `gegen`. Computations are done in C and I cannot parse Stata's syntax there. However, I can try to print a warning here and in the documentation. The whole point of `gtools` is that the data needn't be sorted; I didn't realize that `egen` computed the expression after the sort even when `by` is not a prefix.
FYI this works:
bys id: gegen gtot = total(cat!=cat[_n-1])
PS: Sorts are not stable by default, and the sort order of your data within each group will affect the result. You ought to do something like
bys id (subid): gegen gtot = total(cat!=cat[_n-1])
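To illustrate the point about sort order, here is a minimal sketch on hypothetical toy data (the rows below are made up; only the variable names `id`, `subid` and `cat` come from this thread). It uses the built-in `gen` and `egen` so that it runs without gtools, and it assumes `subid` uniquely orders the rows within each `id`:

```stata
* Hypothetical toy data; the values are invented for illustration
clear
input str1 id byte subid str3 cat
"1" 1 "one"
"1" 2 "two"
"1" 3 "two"
"2" 1 "one"
"2" 2 "one"
"2" 3 "two"
end

* Fails loudly if (id, subid) does not uniquely identify rows,
* i.e. if the within-group order would still be ambiguous
isid id subid

* Sort by id and then subid, but form the by-groups on id only
bysort id (subid): gen byte changed = cat != cat[_n-1]

* Total the flags within each id
bysort id: egen tot = total(changed)

list, sepby(id)
```

Because `cat[_n-1]` depends on which row comes first, fixing the within-group order with `(subid)` is what makes the result reproducible from run to run.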
from stata-gtools.
I see, makes sense that this problem is not specific to gegen total.
If I've understood gtools correctly, its improvement over egen disappears when gegen is combined with the by-prefix (and not the by()-option), right?
I.e.
bys id: gegen gtot = total(cat!=cat[_n-1])
isn't necessarily any faster than using egen, as in the following?
gsort id
by id: egen gtot = total(cat!=cat[_n-1])
I agree: If it's not easily fixed, then a warning might help other users to be aware of this potential problem.
Never mind the mistake in the toy example. I guess the problem here is what happens when gtools tries to call the _n==0 observation when using by().
(Btw: Thanx for a very good package. gtools gives me several hours of extra coding time each week).
from stata-gtools.
Right; the `by` prefix eliminates much of the speed gains. It may still be faster in some cases, but it could also be slower in others.
The issue is that gtools is computing the expression for the whole data, whereas `egen` is doing it by group. Not really the 0th observation. If cat was all the same, then `gtools` would give yet a different answer, though `egen` would not change.
from stata-gtools.
Hm, haven't read enough about gtools to understand exactly what's going on here. But yeah, if I replace the toy example above with
(...)
replace cat="one" //if id!="1"
(...)
I get a data set where cat is always equal to "one". Then gtools produces
id | cat | gtot | tot |
---|---|---|---|
1 | one | 1 | 1 |
1 | one | 1 | 1 |
1 | one | 1 | 1 |
2 | one | 0 | 1 |
2 | one | 0 | 1 |
2 | one | 0 | 1 |
3 | one | 0 | 1 |
3 | one | 0 | 1 |
3 | one | 0 | 1 |
Which is another surprising result.
So, what's the moral here? Subscripting with gtools should be used with caution (or not at all)?
from stata-gtools.
@adamreir The lesson is this:
- `gtools` internals are in C and cannot parse Stata syntax. If you are creating variables inside of `gegen`, they are being created before gtools internals are called.
- Therefore you should think of variables created inside `gegen` functions as equivalent to using `gen`, because that is what it is doing.
  - If you call `by ...: gegen` then the variable creation will be equivalent to `by ...: gen`.
  - If you call `gegen` then the variable creation will be equivalent to simply calling `gen`.
- After variable creation, gtools is called and the function is invoked correctly by group.
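The two cases in the list above can be imitated with plain `gen` and `egen`, so the sketch below runs without gtools. The data are a hypothetical reconstruction of the constant-`cat` example from earlier in the thread, and the variable names `flag_all`, `flag_by`, `tot_all` and `tot_by` are made up for illustration. This mimics the behavior described above; it is not gtools' actual code.

```stata
* Hypothetical toy data: three ids, three rows each, cat always "one"
clear
input str1 id str3 cat
"1" "one"
"1" "one"
"1" "one"
"2" "one"
"2" "one"
"2" "one"
"3" "one"
"3" "one"
"3" "one"
end

* Like gegen with by() and no by prefix:
* the expression is generated over the whole data, then totalled by id
gen byte flag_all = cat != cat[_n-1]
egen tot_all = total(flag_all), by(id)

* Like by id: gegen (and like egen):
* the expression is generated within each id, then totalled by id
bysort id: gen byte flag_by = cat != cat[_n-1]
egen tot_by = total(flag_by), by(id)

list, sepby(id)
```

In the first case only the very first row of the data has no predecessor, so only the first `id` should receive a nonzero total. In the second case the first row of every `id` is flagged. This should match the `gtot` and `tot` columns in the table above.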
from stata-gtools.
Aha, now I understand what's going on! Will keep this in mind.
Thanx a lot!
from stata-gtools.