Comments (5)
Sampling is cheap so you could just sample a bunch of records then filter for the ones you're interested in.
Exactly what @kevinykuo says: right now there is no way to fix variables when sampling, so the only option is to discard the invalid rows and resample until you have as many as you need.
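A minimal sketch of the sample-then-filter approach (rejection sampling). It is plain Python so it runs without CTGAN installed. `sample_matching` and `toy_sampler` are hypothetical names. With a fitted CTGAN you would pass something like `lambda k: model.sample(k).to_dict("records")`, because `sample` returns a pandas DataFrame.

```python
import random
from typing import Callable, Dict, List

Row = Dict[str, object]


def sample_matching(
    sample_fn: Callable[[int], List[Row]],
    predicate: Callable[[Row], bool],
    n: int,
    batch_size: int = 1000,
    max_batches: int = 100,
) -> List[Row]:
    """Rejection sampling: draw batches, keep rows that satisfy `predicate`,
    and stop once `n` matching rows have been collected."""
    kept: List[Row] = []
    for _ in range(max_batches):
        batch = sample_fn(batch_size)
        kept.extend(row for row in batch if predicate(row))
        if len(kept) >= n:
            return kept[:n]
    # A very rare condition makes this loop expensive, so fail loudly instead of spinning.
    raise RuntimeError(
        f"only {len(kept)} of {n} matching rows after {max_batches} batches"
    )


# Hypothetical stand-in for a fitted model's sampler, so the sketch runs on its own.
rng = random.Random(0)


def toy_sampler(k: int) -> List[Row]:
    return [
        {"product": rng.choice(["mortgage", "card", "auto"]), "rate": rng.uniform(1.0, 10.0)}
        for _ in range(k)
    ]


rows = sample_matching(toy_sampler, lambda r: r["product"] == "card", n=50)
print(len(rows))  # 50
```

The cost scales with how rare the condition is. A value that shows up in 1% of generated rows needs roughly 100 rows sampled for every row kept.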
Another option, if you know what you need in advance, is to create a subset of the original dataset containing only the rows with the value(s) you need, and then fit a dedicated CTGAN instance on that subset so that all newly generated rows have those values fixed.
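A minimal sketch of the subset-and-fit approach, assuming the training data is a list of dict rows. `fit_on_fixed_subset` is a hypothetical helper. The identity `fit_fn` in the demo just returns the subset, which shows the rows a model would be trained on. The commented CTGAN call assumes pandas and a discrete `product` column.

```python
from typing import Callable, Dict, List, TypeVar

Row = Dict[str, object]
M = TypeVar("M")


def fit_on_fixed_subset(
    rows: List[Row],
    fixed: Dict[str, object],
    fit_fn: Callable[[List[Row]], M],
) -> M:
    """Keep only the training rows whose columns equal the values in `fixed`,
    then fit a dedicated model on that subset via `fit_fn`."""
    subset = [r for r in rows if all(r.get(col) == val for col, val in fixed.items())]
    if not subset:
        raise ValueError(f"no training rows match {fixed}")
    return fit_fn(subset)


# With CTGAN, fit_fn might look like this (assumes pandas is imported as pd):
#     def fit_fn(subset):
#         model = CTGAN()
#         model.fit(pd.DataFrame(subset), discrete_columns=["product"])
#         return model
training = [
    {"product": "card", "score": 700},
    {"product": "auto", "score": 650},
    {"product": "card", "score": 720},
]
# Identity fit_fn: returns the subset itself instead of a trained model.
card_rows = fit_on_fixed_subset(training, {"product": "card"}, fit_fn=lambda s: s)
print(len(card_rows))  # 2
```

This approach requires one model per distinct combination of fixed values. That per-combination cost is the drawback raised later in the thread.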
What about setting `condvec` by hand while sampling? Shouldn't we then get the desired class most of the time?
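The CTGAN paper builds the conditional vector by concatenating one mask per discrete column. Every mask is zero except the mask for the chosen column, which is one-hot on the chosen category. Setting it by hand therefore means building that vector for the class you want. The sketch below does this in plain Python. `build_condvec` is a hypothetical helper, not part of CTGAN's API, and the sketch assumes the column and category order matches the order used internally by the fitted model's data transformer. The generator is only penalized (through a cross-entropy term) for ignoring the condition, so you should expect the desired class most of the time, not every time.

```python
from typing import Dict, List, Sequence


def build_condvec(
    categories: Dict[str, Sequence[str]],
    column: str,
    value: str,
) -> List[float]:
    """Concatenate one mask per discrete column (in dict order). Every entry is
    0.0 except the slot for (column, value), which is 1.0."""
    if column not in categories:
        raise KeyError(f"unknown discrete column: {column}")
    vec: List[float] = []
    for col, cats in categories.items():
        mask = [0.0] * len(cats)
        if col == column:
            mask[list(cats).index(value)] = 1.0  # ValueError if value is not a category
        vec.extend(mask)
    return vec


categories = {"product": ["mortgage", "card", "auto"], "region": ["north", "south"]}
print(build_condvec(categories, "region", "south"))  # [0.0, 0.0, 0.0, 0.0, 1.0]
```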
This is exactly the feature I’m looking for, and it would be a great addition. More specifically, I would like the ability to fix any value, discrete or continuous. I work in financial services, where we would want to fix certain combinations of high-level parameters in a given month (e.g., interest rate) while letting other variables, such as the customer’s credit profile, be generated. We may have a high-level lending plan, but the loans that actually get written will vary significantly because customer profiles vary.
I can’t just generate synthetic credit profiles, because only certain financial products accept certain credit profiles, and it would be inefficient to build a model for each combination of features I want to treat as fixed.
I would think this could be handled by the conditioning vector?
This feature has been covered in PR #68.
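Recent CTGAN releases expose conditional sampling as `CTGAN.sample(n, condition_column=None, condition_value=None)`. As far as I know, it is documented as increasing the probability of the requested value rather than guaranteeing it. Whether this is exactly the interface PR #68 introduced is an assumption. A minimal sketch that combines the condition with a filter and reports the hit rate is shown here. `conditioned_sample` is a hypothetical helper. `_StubModel` and `_StubFrame` are hypothetical stand-ins that let it run without CTGAN or pandas.

```python
from typing import Any, Dict, List, Optional, Tuple


def conditioned_sample(
    model: Any, n: int, column: str, value: Any
) -> Tuple[List[Dict[str, Any]], float]:
    """Ask the model to favour column == value, then keep only matching rows.

    Returns the matching rows and the fraction of the batch that matched."""
    # Assumes CTGAN's sample(n, condition_column=..., condition_value=...) signature,
    # which returns a pandas DataFrame; to_dict("records") gives a list of dicts.
    frame = model.sample(n, condition_column=column, condition_value=value)
    records = frame.to_dict("records")
    matches = [r for r in records if r[column] == value]
    return matches, len(matches) / max(len(records), 1)


class _StubFrame:
    """Hypothetical minimal stand-in for a DataFrame."""

    def __init__(self, rows: List[Dict[str, Any]]) -> None:
        self._rows = rows

    def to_dict(self, orient: str) -> List[Dict[str, Any]]:
        return list(self._rows)


class _StubModel:
    """Hypothetical model that honours the condition for 3 rows out of every 4."""

    def sample(
        self, n: int, condition_column: Optional[str] = None, condition_value: Any = None
    ) -> _StubFrame:
        return _StubFrame(
            [{condition_column: condition_value if i % 4 else "other"} for i in range(n)]
        )


rows, hit_rate = conditioned_sample(_StubModel(), 8, "product", "card")
print(len(rows), hit_rate)  # 6 0.75
```

If fewer than `n` rows survive the filter, draw another conditioned batch and repeat until you have enough.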
Related Issues (20)
- Should a 5-Likert scale be treated as either continuous or discrete?
- Multi GPU support
- Avoid generating the conditional column
- Add support for Python 3.11
- Add progress bar for CTGAN fitting (+ save the loss values)
- Question about large amount of training dataset in TVAE -- is there max?
- Add verbosity TVAE (progress bar + save the loss values)
- Condition with inequality for continuous columns
- Drop support for Python 3.7
- Question regarding CTGAN for data synthesis and classification tasks
- Tracking and Saving TVAE Loss Values
- Set generator to eval mode before sampling?
- Switch default branch from master to main
- Remove or implement CTGAN tests
- `ClusterBasedNormalizer` refactor
- Hyperparameters
- Doubts on the usage of conditional sampling
- Support Python 3.12
- Tune about CTGAN
- TypeError while ctgan.fit()