gsbdbi / ds-wgan
Design of Simulations using WGAN
License: MIT License
Currently, WGAN training requires at least one continuous variable. Would it be possible to add a fix so that we can train with only categorical variables, i.e. leaving the continuous variables empty?
Is there an easy way to use/implement categorical context variables?
Is my understanding correct that all `context_vars` will implicitly be treated as continuous, so that I should (?) turn them into dummy variables manually beforehand if they are categorical (and take on more than 2 values)? Or is there a reason to prefer treating categorical context variables as continuous?
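The manual dummy encoding asked about above can be sketched as follows. This is only an illustration under stated assumptions: the column `region` and the helper `dummy_encode` are hypothetical, and whether the package really treats every context variable as continuous is the open question of this issue, not something the sketch confirms. For a DataFrame, `pandas.get_dummies` performs the same transformation.

```python
def dummy_encode(values, drop_first=True):
    """Turn one categorical column into 0/1 dummy columns.

    Returns a dict mapping each dummy column name to a list of 0/1 values.
    """
    levels = sorted(set(values))
    if drop_first:
        # The first level becomes the reference category.
        levels = levels[1:]
    return {
        f"level_{lvl}": [1 if v == lvl else 0 for v in values]
        for lvl in levels
    }


# Hypothetical categorical context variable with three levels.
region = ["north", "south", "west", "south"]
dummies = dummy_encode(region)
```

The resulting dummy columns could then be listed as context variables in place of the original categorical column.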
Hi,
I am trying to generate several Y's simultaneously (some binary, some continuous) based on X, t. It turned out that training failed because the test error and training loss both blow up to infinity. It works if I generate only continuous Y's.
I include both my data and the Jupyter notebook that I use, which is based on your Colab example. Could you kindly have a look? Thanks!
example.zip
The current `requirements.txt` file does not specify versions.
However, this line does not run with torch <= 1.0.0.
The reason is that the output of `max` is a `tuple` in versions 1.0.0 and older, but a `namedtuple` in version 1.1.0 and up. The line linked above relies on the output being a `namedtuple`.
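The difference described above can be illustrated without torch. The sketch below uses stand-in objects, not real torch return values, and the field names `values` and `indices` are assumptions chosen for illustration. It shows that positional unpacking works for both return styles, while attribute access only works for the namedtuple.

```python
from collections import namedtuple

# Stand-ins for the two return styles described above (not torch objects).
MaxResult = namedtuple("MaxResult", ["values", "indices"])
old_style = ([0.9, 0.7], [2, 0])           # plain tuple
new_style = MaxResult([0.9, 0.7], [2, 0])  # namedtuple


def unpack_max(result):
    """Positional unpacking works for a tuple and a namedtuple alike."""
    values, indices = result
    return values, indices


same_result = unpack_max(old_style) == unpack_max(new_style)

# Attribute access is only available on the namedtuple.
has_attr_old = hasattr(old_style, "values")
has_attr_new = hasattr(new_style, "values")
```

Two possible remedies follow from this: unpack the result positionally so the code does not depend on the return type, or state a minimum torch version in `requirements.txt`.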
When running the second line of the third block in the Google Colab notebook for the tutorial
!pip3 install git+https://github.com/gsbDBI/ds-wgan.git@package#egg=wgan
installation fails and gives the error
"Did not find a branch or tag 'package', assuming 'revision' or 'ref'".
It would be great if there were a way to simulate integer / ordered categorical data, say age in years. Treating it as a categorical variable seems to yield data sets where other variables are less smooth in age than desired, and it probably also increases the complexity of the training task (by turning each value into a dummy?). Treating it as a continuous variable requires rounding ex post, but ideally the rounding would already happen during training.
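The ex post workaround mentioned above (treat the variable as continuous, then round) can be sketched as below. The helper `round_to_integer_support` and the bounds 18 and 99 are hypothetical, and this does not address the actual request, which is rounding during training.

```python
def round_to_integer_support(samples, lower, upper):
    """Round generated values to integers and clip them to [lower, upper]."""
    return [min(max(int(round(x)), lower), upper) for x in samples]


# Hypothetical generated ages, treated as continuous during training.
generated_age = [17.6, 34.2, 80.9, 101.3]
age_years = round_to_integer_support(generated_age, lower=18, upper=99)
```

Clipping keeps rounded values inside the range observed in the real data, which plain rounding alone would not guarantee.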
The GAN output for the welfare data skews the values of almost all categorical variables. For example, race is 1, 2, or 3 in the original data, but in the generated data it is 0, 1, or 2:
race_welfare | race_gan |
---|---|
1 | 0 |
2 | 1 |
3 | 2 |
Similarly, the numbering is off for categorical variables with many categories (below are just the first 20 categories of the indus80 variable):
indus_80_welfare | indus_80_gan |
---|---|
10 | 0 |
11 | 1 |
20 | 2 |
21 | 3 |
30 | 4 |
31 | 5 |
40 | 6 |
41 | 7 |
42 | 8 |
50 | 9 |
60 | 10 |
100 | 11 |
101 | 12 |
102 | 13 |
110 | 14 |
111 | 15 |
112 | 16 |
120 | 17 |
121 | 18 |
122 | 19 |
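Until this is fixed in the package, the generated codes could be mapped back to the original labels by hand. The sketch below assumes, based only on the tables above, that each generated code is the rank of the original label in sorted order. The helper `build_code_maps` is hypothetical and not part of the package.

```python
def build_code_maps(original_values):
    """Build label -> code and code -> label maps from the real data.

    Assumes codes are assigned by the sorted order of the distinct labels.
    """
    labels = sorted(set(original_values))
    to_code = {label: code for code, label in enumerate(labels)}
    to_label = {code: label for label, code in to_code.items()}
    return to_code, to_label


# Labels as they appear in the real data.
race_welfare = [1, 2, 3, 2, 1]
to_code, to_label = build_code_maps(race_welfare)

# Codes as they appear in the generated data.
race_gan = [0, 2, 1]
restored = [to_label[code] for code in race_gan]
```

The same mapping applies to variables with many categories, such as the industry codes in the second table.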
It would be really nice if it were possible to use the generator to generate data without access to the real data. In particular, to get the scaling/centering and variable names right in the deprocess function, the package could offer a function to save (and load) those aspects of the data wrapper that are truly needed (variable names and types, means/standard deviations/values for categorical variables?).
In principle that should be possible, so that the user no longer needs access to the real data, or needs to load it, when they want to generate artificial data.
(Simulating one very large data set once after training, while the real data is still loaded, isn't always a great option, in particular when considering very large samples.)
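The requested save/load step could look roughly like the sketch below. None of these functions exist in the package: `save_metadata`, `load_metadata`, `deprocess_continuous`, and the metadata layout are all hypothetical. The sketch also assumes that continuous variables are standardized as (x - mean) / std, which the issue suggests but does not confirm.

```python
import json
import os
import tempfile


def save_metadata(path, metadata):
    """Write the statistics needed for deprocessing to a JSON file."""
    with open(path, "w") as f:
        json.dump(metadata, f)


def load_metadata(path):
    """Read the statistics back, with no access to the real data."""
    with open(path) as f:
        return json.load(f)


def deprocess_continuous(scaled, mean, std):
    """Undo an assumed standardization of the form (x - mean) / std."""
    return [x * std + mean for x in scaled]


# Hypothetical metadata extracted once, while the real data is loaded.
metadata = {
    "continuous": {"age": {"mean": 40.0, "std": 10.0}},
    "categorical": {"race": [1, 2, 3]},
}
path = os.path.join(tempfile.mkdtemp(), "wrapper_metadata.json")
save_metadata(path, metadata)

# Later, possibly on another machine: only the JSON file is needed.
loaded = load_metadata(path)
ages = deprocess_continuous([-1.0, 0.0, 1.5], **loaded["continuous"]["age"])
```

Storing only summary statistics and category values keeps the file small and avoids shipping the real data together with the trained generator.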