Comments (3)
@petulla and @mikiwz - I think I found the reason for this issue. In this line:
https://github.com/georgian-io/Multimodal-Toolkit/blob/master/multimodal_transformers/data/load_data.py#L228
The package concatenates the train, val, and test dfs. Then, if you're processing the categorical features via one-hot encoding, which is the default, it one-hot encodes all of those dfs together.
For example, say your train df has a categorical feature with values ["a", "b"]. On its own, this would be one-hot encoded as 2 separate columns (a and b). However, say your test data has values ["a", "c"]. With the way this is currently packaged, the train and test data are concatenated, so the one-hot encoding produces 3 columns (a, b, and c). But if you load your test dataset separately, you only one-hot encode "a" and "c", which gives 2 columns instead of 3. This is the issue: the model was trained on 3 columns, but you're giving it 2 columns to predict with.
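The mismatch described above can be reproduced in a few lines. This is a minimal sketch that uses plain pandas (`pd.get_dummies`) as a stand-in for the toolkit's encoder, with a hypothetical column name `cat`; it is not the toolkit's own code path, only an illustration of why the column counts differ.

```python
import pandas as pd

# Toy frames mirroring the example: train has {"a", "b"}, test has {"a", "c"}.
# The column name "cat" is made up for illustration.
train_df = pd.DataFrame({"cat": ["a", "b"]})
test_df = pd.DataFrame({"cat": ["a", "c"]})

# Encoding the concatenated frames sees all three categories.
combined = pd.concat([train_df, test_df], ignore_index=True)
combined_cols = list(pd.get_dummies(combined, columns=["cat"]).columns)

# Encoding the test frame on its own sees only two categories.
test_only_cols = list(pd.get_dummies(test_df, columns=["cat"]).columns)

print(combined_cols)   # ['cat_a', 'cat_b', 'cat_c']
print(test_only_cols)  # ['cat_a', 'cat_c']
```

A model whose tabular input was sized for the three combined columns would then receive only two at prediction time.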
The way around this is to either not use categorical data, or use label encoding instead:
test_dataset_2 = load_data(
    test_data,
    data_args.column_info['text_cols'],
    tokenizer,
    label_col=data_args.column_info['label_col'],
    label_list=data_args.column_info['label_list'],
    numerical_cols=data_args.column_info['num_cols'],
    sep_text_token_str=tokenizer.sep_token,
    categorical_encode_type="label"
)
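For anyone encoding categorical features themselves before handing data to a model, another general way to keep the width fixed is to remember the one-hot columns produced at training time and reindex later frames to them. The sketch below is plain pandas with a hypothetical column name `cat`; it is an assumption-laden illustration of the idea, not something `load_data` does or exposes.

```python
import pandas as pd

# Data that was encoded together at training time (categories a, b, c).
# The column name "cat" is made up for illustration.
fit_df = pd.DataFrame({"cat": ["a", "b", "a", "c"]})
fit_cols = list(pd.get_dummies(fit_df, columns=["cat"], dtype=int).columns)

# A test frame loaded later, on its own, contains only a and c.
test_df = pd.DataFrame({"cat": ["a", "c"]})
test_ohe = pd.get_dummies(test_df, columns=["cat"], dtype=int)

# Align to the training-time columns: categories missing from the test
# frame are added as all-zero columns, and any column not seen at
# training time would be dropped.
test_aligned = test_ohe.reindex(columns=fit_cols, fill_value=0)

print(list(test_aligned.columns))    # ['cat_a', 'cat_b', 'cat_c']
print(test_aligned.values.tolist())  # [[1, 0, 0], [0, 0, 1]]
```

The trade-off is that categories never seen at training time are silently dropped, so this only fixes the shape, not the missing information.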
I have this same issue when trying to use load_data() on separate test and train dataframes. Weirdly, one of the two tabular_torch_dataset.TorchTextDataset objects returned from load_data will train; the other will not.
I have to load from file and set everything up just as in the Colab to make it work.
@codeKgu Seems like you put a ton of work into this repo. Would be great to get this fixed.
Closing as this has been answered.