gem-benchmark / nl-augmenter

NL-Augmenter 🦎 → 🐍 A Collaborative Repository of Natural Language Transformations

License: MIT License

Python 86.87% Jupyter Notebook 13.10% Makefile 0.03%

nl-augmenter's People

Contributors: aadesh11, abinayam02, ashish3586, asnota, boyleconnor, bryanwilie, filco306, gentaiscool, gxywang, juand-r, jzcs2018, kaustubhdhole, kvadityasrivatsa, marco-digio, mnamysl, nickeilf, raft001, saad-mahamood, samuelcahyawijaya, sirrob1997, sotwi, tanay2001, tanfiona, timothy22000, uyaseen, vyraun, wwydmanski, xudongolivershen, zhexiongliu, zijwang

nl-augmenter's Issues

The default performance evaluation shows strange results

Hi all,

If one runs the evaluate.py script against our transformation (#230), the results look very strange: the performance is far too good, considering the dramatic changes made by our transformation.

Here is the performance of the model aychang/roberta-base-imdb on the test[:20%] split of the imdb dataset
The accuracy on this subset which has 1000 examples = 96.0
Applying transformation:
100%|██████████| 1000/1000 [00:19<00:00, 51.83it/s]
Finished transformation! 1000 examples generated from 1000 original examples, with 1000 successfully transformed and 0 unchanged (1.0 perturb rate)
Here is the performance of the model on the transformed set
The accuracy on this subset which has 1000 examples = 100.0

On the other hand, if we use non-default models, they produce reasonable results (kudos to @sotwi):

roberta-base-SST-2: 94.0 -> 51.0
bert-base-uncased-QQP: 92.0 -> 67.0
roberta-large-mnli: 91.0 -> 43.0

I suspect the problem in the default test is caused by some deficiency in the aychang/roberta-base-imdb model and/or the imdb dataset, but I'm not knowledgeable enough about the model's inner workings to pinpoint the source of the problem.


How to reproduce the strange results:

Get the writing_system_replacement transformation from #230.

cd to the NL-Augmenter dir.

Run this:

python3 evaluate.py -t WritingSystemReplacement


Expected results:

a massive drop in accuracy, similar to the results by @sotwi on non-default models, as mentioned above.

Observed results:

a perfect accuracy of 100.0.

`insert_abbreviation` incorrectly imports python file and incorrectly assumes resources are relative to transformation directory

These issues appear when trying to use this transformation outside of the root NL-Augmenter directory, for example from another sub-directory of the root directory. The fixes needed are the following (a sketch follows the list below):

  • import grammaire.py using the full import path: import transformations.insert_abbreviation.grammaire as grammaire
  • Remove sys.path.append("./transformations/insert_abbreviation")
  • Use file = os.path.join(os.path.dirname(os.path.abspath(__file__)), '<file_name>.txt') to resolve paths relative to the transformation.py script file. This allows easy access to the two .txt resource files.
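
A minimal sketch of the combined fix, keeping the resource file name as a placeholder ('<file_name>.txt' as above):

import os

# Full import path instead of sys.path manipulation
import transformations.insert_abbreviation.grammaire as grammaire

# Resolve resources relative to this transformation.py, not the working directory
_DIR = os.path.dirname(os.path.abspath(__file__))
resource_path = os.path.join(_DIR, "<file_name>.txt")  # placeholder file name

with open(resource_path, encoding="utf-8") as f:
    abbreviations = f.read().splitlines()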

`english_inflectional_variation` throws ValueError when called

Here is the stack trace when the EnglishInflectionalVariation class is initialised:

File "/Users/saad/Documents/Research Work/GEM/NL-Augmenter/transformations/english_inflectional_variation/__init__.py", line 1, in <module>
    from .transformation import *
  File "/Users/saad/Documents/Research Work/GEM/NL-Augmenter/transformations/english_inflectional_variation/transformation.py", line 1, in <module>
    import random, lemminflect
  File "/Users/saad/Documents/Research Work/GEM/NL-Augmenter/venv/lib/python3.9/site-packages/lemminflect/__init__.py", line 49, in <module>
    spacy.tokens.Token.set_extension('inflect', method=Inflections().spacyGetInfl)
  File "spacy/tokens/token.pyx", line 47, in spacy.tokens.token.Token.set_extension
ValueError: [E090] Extension 'inflect' already exists on Token. To overwrite the existing extension, set `force=True` on `Token.set_extension`.
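
One possible workaround (an assumption, not the repository's fix) is to clear a stale 'inflect' extension before lemminflect is first imported, so that its own registration succeeds:

import sys

from spacy.tokens import Token

# lemminflect registers the 'inflect' extension on Token at import time and
# raises ValueError if something registered it already. Removing the stale
# extension before the first import lets lemminflect register it cleanly.
if "lemminflect" not in sys.modules and Token.has_extension("inflect"):
    Token.remove_extension("inflect")

import lemminflect  # noqa: E402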

Standardize loading of different spacy models

Some of the transformations/filters use different spaCy models (en, es, zh, de). The way they are loaded needs to be standardized: the initialize_models function in initialize.py should be rewritten to accept a language parameter, and the following transformations/filters should be updated (a loader sketch follows the lists below).

Once the changes are done, test the modules individually with pytest using the command below:

pytest -s --t=<module_name>

Transformations:

  • grapheme_to_phoneme_transformation
  • city_names_transformation
  • synonym_substitution
  • ocr_perturbation
  • change_person_named_entities
  • antonyms_substitute
  • emojify
  • sentence_reordering
  • transformer_fill
  • auxiliary_negation_removal
  • correct_common_misspellings
  • word_noise
  • yes_no_question
  • subject_object_switch
  • dyslexia_words_swap
  • close_homophones_swap
  • gender_neutral_rewrite
  • tense
  • adjectives_antonyms_switch
  • abbreviation_transformation
  • hashtagify
  • token_replacement
  • mr_value_replacement
  • urban_dict_swap
  • syntactically_diverse_paraphrase
  • yoda_transform
  • disability_transformation
  • replace_numerical_values
  • unit_converter
  • suspecting_paraphraser
  • change_date_format
  • negate_strengthen
  • gender_culture_diverse_name
  • lexical_counterfactual_generator
  • change_two_way_ne
  • gender_culture_diverse_name_two_way
  • replace_abbreviation_and_acronyms
  • replace_financial_amounts
  • slangificator
  • summarization_transformation
  • pinyin
  • gender_neopronouns
  • spanish_gender_swap
  • add_hashtags

Filters:

  • question_filter
  • length
  • polarity
  • yesno_question
  • keywords
  • soundex
  • numeric
  • code_mixing
  • speech_tag
  • quantitative_ques
  • group_inequity
  • token_amount
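
As referenced above, a hedged sketch of what a language-aware loader in initialize.py could look like (the per-language model names are assumptions, not the repository's final choice):

import spacy

# Cache of loaded pipelines, keyed by language code
_SPACY_MODELS = {}

# Assumed default model per language; the actual mapping is up to the maintainers
_DEFAULT_MODELS = {
    "en": "en_core_web_sm",
    "es": "es_core_news_sm",
    "de": "de_core_news_sm",
    "zh": "zh_core_web_sm",
}


def get_spacy_model(lang: str = "en"):
    """Load the spaCy pipeline for `lang` once and reuse it afterwards."""
    if lang not in _SPACY_MODELS:
        _SPACY_MODELS[lang] = spacy.load(_DEFAULT_MODELS[lang])
    return _SPACY_MODELS[lang]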

Error when evaluating TEXT_TO_TEXT_GENERATION

When running python evaluate.py -t ButterFingersPerturbation -task "TEXT_TO_TEXT_GENERATION" -p 1, the following error occurs:

Here is the performance of the model on the transformed set
Length of Evaluation dataset is 226
Traceback (most recent call last):
  File "evaluate.py", line 67, in <module>
    if_filter
  File "./NL-Augmenter/evaluation/evaluation_engine.py", line 41, in evaluate
    percentage_of_examples=percentage_of_examples,
  File "./NL-Augmenter/evaluation/evaluation_engine.py", line 115, in execute_model
    split=f"test[:{percentage_of_examples}%]",
  File "./NL-Augmenter/evaluation/evaluate_text_generation.py", line 44, in evaluate
    dataset, summarization_pipeline, transformation=operation
  File "./NL-Augmenter/evaluation/evaluate_text_generation.py", line 70, in transformation_performance
    pt_dataset, summarization_pipeline
  File "./NL-Augmenter/evaluation/evaluate_text_generation.py", line 81, in performance_on_dataset
    article, gold_summary = example
  File "./NL-Augmenter/dataset.py", line 301, in <genexpr>
    yield (datapoint[field] for field in self.fields)
TypeError: string indices must be integers
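
For context, this TypeError is the generic symptom of indexing a plain string with a field name, i.e. the datapoint yielded for this task appears to be a raw string rather than a dict. A minimal illustration (not the repository's code):

datapoint = "Some raw article text"   # a plain string instead of a dict
fields = ["article", "highlights"]    # hypothetical field names

# This is the pattern that fails: strings only accept integer indices
values = tuple(datapoint[field] for field in fields)
# TypeError: string indices must be integers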


Language Detection

How can we detect on the fly which language is being used for the evaluation?
We want generate to apply the correct transformation for the current language on the fly...

Thanks in advance

Spacy behaves differently when testing one case vs testing all cases

It seems Spacy's tokenizer behaves differently when I run pytest -s --t=emojify and pytest -s --t=light --f=light.

For example, I added the following snippet in my generate() function:

print([str(t) for t in self.nlp(sentence)])

With input sentence "Apple is looking at buying U.K. startup for $132 billion."

pytest -s --t=emojify gives:

['Apple', 'is', 'looking', 'at', 'buying', 'U.K.', 'startup', 'for', '$', '132', 'billion', '.']

However, pytest -s --t=light --f=light gives:

['Apple', 'is', 'looking', 'at', 'buying', 'U.K.', 'startup', 'for', '$1', '32', 'billion.']

I use the following code to load spaCy:

import spacy
from initialize import spacy_nlp
self.nlp = spacy_nlp if spacy_nlp else spacy.load("en_core_web_sm")

It looks very strange. Am I overlooking something?

`GermanGenderSwap` missing `noun_pairs.json` file and incorrectly assumes the resources are on the script path

Hi @raft001,

It seems that in addition to issue #310 there are two other issues that need addressing:

  • noun_pairs.json is missing. This is needed on line 17.
  • The script assumes that the resource *.json files will always be on the script path. Please instead do the following to resolve the path:
    file = os.path.join(os.path.dirname(os.path.abspath(__file__)), '<file_name>.json')

Then file can be used as the absolute path to your resource files.

Standardize module names - Transformation

The module number-to-word should be changed to number_to_word.

Solution:

  • Rename the folder from number-to-word to number_to_word
  • Add an entry number_to_word in the test/mapper.py file in the appropriate dictionary (either the heavy or light transformation dictionary, depending on the heavy flag)
  • Once added, test the module by executing
pytest -s --t=number_to_word

Spacy upgrade to 3.0+

Hi there,
Just wondering - is there any reason spacy is locked to the old version spacy==2.2.4 in the main requirements.txt?

Spacy 3.0 was quite a big upgrade from 2.2.4, and 3.1.0 was just released today, so it might make sense to look forward and make that a requirement instead.

I don't think any current implementations would break with this upgrade, but I'm happy to make a PR for it and fix things if needed.

Cannot Run `evaluate.py` Script

I've tried running the evaluate.py script in this Colab notebook. I get the following error:

OSError: /usr/local/lib/python3.7/dist-packages/torchtext/_torchtext.so: undefined symbol: _ZNK3c104Type14isSubtypeOfExtESt10shared_ptrIS0_EPSo

Should we add a global seed for all transformations?

Almost all transformations, such as butter_fingers_perturbation or replace_numerical_values, use a seed in their constructor that is set to some value. How are we going to handle the global seed? We could easily set one in initialize.py that gets imported in each transformation and used as the default, similar to what is currently done for spacy_nlp. Otherwise, we could also set it during evaluation; as far as I can tell that is not currently done, but I think having a global default is a little cleaner.
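
To make the idea concrete, a minimal sketch of the initialize.py option; the names GLOBAL_SEED and SomeTransformation are assumptions, not existing code:

# initialize.py (sketch)
GLOBAL_SEED = 42  # assumed default value


# transformations/some_transformation/transformation.py (sketch)
from initialize import GLOBAL_SEED


class SomeTransformation:
    def __init__(self, seed=GLOBAL_SEED, max_outputs=1):
        self.seed = seed
        self.max_outputs = max_outputs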

Happy to make the required changes if that's something we'd want.

Loading of Filter Tests

I think there might be something broken with the filter tests, at least when I extended the test.json of the TextContainsKeywordsFilter to contain another test case:

{
    "type": "keywords",
    "test_cases": [
        {
            "class": "TextContainsKeywordsFilter",
            "args": {
                "keywords": ["in", "at"]
            },
            "inputs": {
                "sentence": "Andrew played cricket in India"
            },
            "outputs": true
        },
        {
            "class": "TextContainsKeywordsFilter",
            "args": {
                "keywords": ["sad"]
            },
            "inputs": {
                "sentence": "Andrew played cricket in India"
            },
            "outputs": false
        }
    ]
}

And then ran: pytest -s --f=keywords

It fails the test, although from my understanding it should still pass. In particular, after printing self.keywords in the filter method, it seems that no new instance is created for the new test case and the old keywords are still used, which causes the second test case to fail.

Am I misusing something here? I ran into this when writing the tests for my addition of a filter.

`gender_neutral_rewrite` Unresolved references to spaCy and Unresolved List reference

When running the gender_neutral_rewrite there are several unresolved references to the spacy_nlp variable. In particular on line:

  • Line 27: self.nlp = spacy_nlp if spacy_nlp else spacy.load("en_core_web_sm")

Please use from initialize import spacy_nlp to get a handle on the global spacy instance.

There is also an unresolved reference on Line 495: def generate(self, sentence: str) -> List[str]. List[str] is not resolvable. Should this be lower case? e.g. list[str]
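
For reference, both spellings can work: the capitalised List needs an import from typing, while the lowercase builtin generic list[str] is only valid as an annotation on Python 3.9+. A minimal example of the typing import (the class name here is illustrative only):

from typing import List


class GenderNeutralRewrite:  # illustrative name, not necessarily the class in the PR
    def generate(self, sentence: str) -> List[str]:
        ...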

`Formal2Casual` fails to load due to unavailable huggingface model

from nlaugmenter.transformations.formality_change.transformation import Formal2Casual
OSError: prithivida/parrot_adequacy_on_BART is not a local folder and is not a valid model identifier listed on 'https://huggingface.co/models'
If this is a private repository, make sure to pass a token having permission to this repo with `use_auth_token` or log in with `huggingface-cli login` and pass `use_auth_token=True`.

The model (prithivida/parrot_adequacy_on_BART) is indeed not available on huggingface anymore. Perhaps an acceptable alternative is to use prithivida/parrot_adequacy_model instead?

Tests do not Check that Expected and Generated Outputs have Same Number of Sentences

This issue concerns the following line in the main test script:

for pred_output, output in zip(perturbs, outputs):

The zip() builtin (which is used in the above-mentioned line to pair up expected sentences with generated sentences) clips the longer of its two inputted iterables to the length of the shorter iterable. E.g.:

>>> list(zip([1,2,3], [6,7,8,9,10]))
[(1, 6), (2, 7), (3, 8)]

This means that even if a transformation generates fewer sentences (e.g. 0) than the expected number of sentences, it will still pass and the later expected sentences will not get evaluated. This also makes it impossible to test affirmatively that a transformation does not generate any outputs for a given input.

I would recommend either asserting that the two iterables are of equal length, or replacing zip() with zip_longest().
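
For comparison, itertools.zip_longest keeps the unmatched items (padding with None), so a length mismatch becomes visible to the test:

from itertools import zip_longest

print(list(zip_longest([1, 2, 3], [6, 7, 8, 9, 10])))
# [(1, 6), (2, 7), (3, 8), (None, 9), (None, 10)]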

`ocr_perturbation` requirements issues

The ocr_perturbation package requires trdg==1.6.0. However, under macOS 11.6 with Python 3.9 it will not install due to a dependency on pillow==7.0.0, which generates a RequiredDependencyException: zlib error.

Installing pillow==8.3.2 works fine but is too new for trdg==1.6.0.

Installing trdg==1.7.0 leads to a dependency conflict with opencv-python:

ERROR: Cannot install opencv-python==4.5.3.56, trdg and trdg==1.7.0 because these package versions have conflicting dependencies.

The conflict is caused by:
    trdg 1.7.0 depends on numpy<1.17 and >=1.16.4
    opencv-python 4.5.3.56 depends on numpy>=1.19.3
    trdg 1.7.0 depends on numpy<1.17 and >=1.16.4
    opencv-python 4.5.2.54 depends on numpy>=1.19.3
    trdg 1.7.0 depends on numpy<1.17 and >=1.16.4
    opencv-python 4.5.2.52 depends on numpy>=1.19.3
    trdg 1.7.0 depends on numpy<1.17 and >=1.16.4
    opencv-python 4.5.1.48 depends on numpy>=1.19.3
    trdg 1.7.0 depends on numpy<1.17 and >=1.16.4
    opencv-python 4.4.0.46 depends on numpy>=1.19.3
    trdg 1.7.0 depends on numpy<1.17 and >=1.16.4
    opencv-python 4.4.0.42 depends on numpy>=1.17.3
    trdg 1.7.0 depends on numpy<1.17 and >=1.16.4
    opencv-python 4.4.0.40 depends on numpy>=1.17.3
    trdg 1.7.0 depends on numpy<1.17 and >=1.16.4
    opencv-python 4.3.0.38 depends on numpy>=1.17.3

`sentiment_emoji_augmenter` throws SyntaxWarning messages

When run, it emits the following SyntaxWarning messages:

/Users/saad/Documents/Research Work/GEM/NL-Augmenter/transformations/sentiment_emoji_augmenter/transformation.py:103: SyntaxWarning: "is" with a literal. Did you mean "=="?
  if sentiment is "pos":
/Users/saad/Documents/Research Work/GEM/NL-Augmenter/transformations/sentiment_emoji_augmenter/transformation.py:106: SyntaxWarning: "is" with a literal. Did you mean "=="?
  elif sentiment is "neg":
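
As the warning suggests, the comparison should use == (value equality) rather than is (object identity); a self-contained illustration of the corrected check:

sentiment = "pos"

# "is" compares object identity, which is not guaranteed for string literals;
# "==" compares values, which is what the transformation needs here.
if sentiment == "pos":
    print("add positive emoji")
elif sentiment == "neg":
    print("add negative emoji")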

`correct_common_misspellings` throws FileNotFoundError and incorrectly assumes resources are relative to transformation directory

These issues appear when trying to use this transformation outside of the root NL-Augmenter directory, for example from another sub-directory of the root directory. The fixes needed are the following:

  • Remove:
spell_corrections = os.path.join(
        "transformations", "correct_common_misspellings", "spell_corrections.json"
    )
  • Use file = os.path.join(os.path.dirname(os.path.abspath(__file__)), 'spell_corrections.json') to resolve the path relative to the transformation.py script file.

Informal & Untested Suggestions for Possible Transformations

Here are some informal, untested ideas that could be used for perturbations & augmentations. @vgtomahawk is making a formal list in this branch.

Meanwhile here is an informal list for the benefit of the participants.

  1. Interchange positions of SRL AM arguments for non-overlapping AM arguments:

    • Alex left for Delhi with his wife at 5 pm. --> Alex left for Delhi at 5 pm with his wife.
    • "at 5 pm" (AM-TMP) and "with his wife" (AM-COM) can be exchanged. This is safe to do only with non-core and non-overlapping arguments. Check what SRL is here.

  2. The ButterFingersPerturbation could be implemented for keyboard layouts other than English - like Devanagari (Hindi, Marathi, Nepali), Shahmukhi (Urdu, Persian), South Indian languages (Tamil, Telugu, Kannada, Malayalam), Chinese, etc.

  3. Style transfer approaches could be interesting to look at - changing formal to informal and vice versa. Check this model.

    • What the heck is going on? --> What is going on?
    • What you upto? --> What are you doing?

  4. Word order changes: active to passive & vice versa, topicalisation, extraposition, wh-fronting (& vice versa), and others used in constituency tests; scrambling (for German, Turkic languages).

    • John went to the store to buy bread. --> To buy bread, John went to the store.
The above are only related to SentenceOperation. There are other transformation types too which could be looked at.

Add CUDA argument in evaluate.py to set the "is_cuda" flag in evaluate methods to False. (for non-Nvidia GPUs to use CPU)

Hi All,

I am using macOS for my project, so I am running into an issue when trying to evaluate my transformations. As I do not have an Nvidia GPU, I would like PyTorch to use the CPU; otherwise I get an "AssertionError: Torch not compiled with CUDA enabled".

macOS users without an Nvidia GPU have to set device = -1 to avoid using the GPU (see allenai/allennlp#877: "AssertionError: Torch not compiled with CUDA enabled").

This seems to stem from the fact that there is currently no way to change the is_cuda flag, which is set to True by default in the evaluate() method inside evaluate_text_classification.py. (There is already code to set the device to 0 or -1 based on the is_cuda flag.)

I am able to run my evaluations by changing the is_cuda flag in the code, but it would be better to expose it as an argument so that future users who want to use a CPU instead of a GPU can do so when running python evaluate.py -t [transformation] -task [task_type]. A possible sketch is below.
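
A hedged sketch of how such an argument could be wired into evaluate.py; the flag name and the surrounding argument parsing are assumptions:

import argparse

parser = argparse.ArgumentParser()
# ... existing -t / -task / -p arguments would go here ...
# Hypothetical flag: lets machines without an Nvidia GPU opt out of CUDA
parser.add_argument("--no_cuda", action="store_true",
                    help="Force evaluation on the CPU instead of CUDA")
args = parser.parse_args()

is_cuda = not args.no_cuda
device = 0 if is_cuda else -1  # mirrors the existing device selection logic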

I will be happy to make the required changes if that's something we'd want.

Thanks,
Tim

Swap Transformations

Thank you for your great work! It's super useful!

I have a suggestion for improvement -
Some transformations work on a "swap" principle. For example, in GenderSwap, if the original sentence contains "sister" it is transformed to "brother" and vice versa.
There are scenarios where it is important to know in which direction the transformation went, female to male or male to female. In my case, for example, I want to compare my model's performance on female/male sentences at inference time.

I really liked the way TenseTransformation works. You need to specify in the constructor what tense (past/present/future) you want to transform to.
Maybe that could be applicable for other swap transformations?
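
For illustration, a tiny hypothetical sketch of such a direction parameter (the class and word list are made up, not part of the library):

PAIRS = {"sister": "brother", "mother": "father"}  # tiny illustrative lexicon


class DirectionalGenderSwap:
    """Hypothetical sketch: the caller fixes the swap direction up front."""

    def __init__(self, direction="female_to_male"):
        self.direction = direction  # "female_to_male" or "male_to_female"
        self.mapping = PAIRS if direction == "female_to_male" else {v: k for k, v in PAIRS.items()}

    def generate(self, sentence: str):
        words = [self.mapping.get(w, w) for w in sentence.split()]
        return [" ".join(words)]


print(DirectionalGenderSwap("female_to_male").generate("my sister called"))  # ['my brother called']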

Thanks again!

Style paraphrasers work best in a two-stage pipeline, can re-use HuggingFace `generate(...)` APIs

Hi everyone, I'm the original author of the STRAP paraphrasers (paper link) which were recently accepted to NL-Augmenter (#227), an effort led by @Filco306. Excited to see these models in NL-Augmenter!

After discussing with @Filco306 and seeing the PR, I saw that 6 different variants of the paraphraser have been provided: a "Basic" style-agnostic paraphraser as well as five style-specific paraphrasers (link). While the "Basic" paraphraser is implemented fine, for the style-specific paraphrasers it's recommended to use a two-step pipelined process ---

(1) normalize the text using the "Basic" paraphraser;
(2) pass the output from (1) through the style-specific paraphraser.

This is important since all style-specific paraphrasers were trained on the outputs of "Basic", so any other text is technically out-of-distribution. In an ablation study (-Inf PP. in Table 3 of the paper) we saw a significant drop in style transfer performance without this step. Moreover, the two-step process helps boost output diversity since the "Basic" paraphraser strips input style. This should be fairly simple to implement.
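
In code, the recommended two-step process is roughly the following sketch, where the two callables stand in for the PR's "Basic" and style-specific paraphrasers:

def two_step_style_transfer(sentence, basic_paraphrase, style_paraphrase):
    """Sketch of the recommended pipeline (the callables are str -> str stand-ins)."""
    # Step 1: normalize, i.e. strip the input's original style with "Basic"
    normalized = basic_paraphrase(sentence)
    # Step 2: apply the target style to the normalized text, which matches the
    # distribution the style-specific models were trained on
    return style_paraphrase(normalized)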

Another minor point is that the models are fully compatible with the new HuggingFace generate(...) APIs, which provide additional functionality compared to what was originally implemented in my repository (in other words, this import can be avoided). Here's an example of how to do it:

out = gpt2.generate(
    input_ids=gpt2_sentences[:, 0:init_context_size],
    max_length=gpt2_sentences.shape[1],
    return_dict_in_generate=True,
    eos_token_id=eos_token_id,
    output_scores=True,
    do_sample=top_k > 0 or top_p > 0.0,
    top_k=top_k,
    top_p=top_p,
    temperature=temperature,
    num_beams=beam_size,
    token_type_ids=segments[:, 0:init_context_size]
)

Also CCing the NL-Augmenter reviewers for the style paraphraser to keep them in the loop --- @sebastianGehrmann @Nickeilf @juand-r @kaustubhdhole

`summarization_transformation` has unresolved reference to spaCy

When running this transformation there are several unresolved references to the spacy_nlp variable. In particular on line:

  • Line 21: self.nlp = spacy_nlp if spacy_nlp else spacy.load("en_core_web_sm", disable=['ner','textcat'])

Please use from initialize import spacy_nlp to get a handle on the global spacy instance.

Spacy Loading can be done once

Many transformations load spaCy multiple times and reparse the same utterance. We need a mechanism to load spaCy once and parse each string once, or at least cache the parse, so that when running all transformations together there is no repeated parsing.
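
A minimal sketch of the caching idea (a shared loader plus a memoized parse); the helper names are assumptions, not an existing NL-Augmenter API:

from functools import lru_cache

import spacy

_NLP = None


def get_nlp():
    """Load the English pipeline once per process."""
    global _NLP
    if _NLP is None:
        _NLP = spacy.load("en_core_web_sm")
    return _NLP


@lru_cache(maxsize=4096)
def cached_parse(text: str):
    """Reuse the parse when several transformations see the same utterance."""
    return get_nlp()(text)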

`p1_noun_transformation` wptools dependency issues

The p1_noun_transformation relies on wptools as a dependency. However, wptools depends on pycurl. Unfortunately, pycurl keeps throwing the following message when used:

  File "/Users/saad/Documents/Research Work/GEM/NL-Augmenter/transformations/p1_noun_transformation/__init__.py", line 1, in <module>
    from .transformation import *
  File "/Users/saad/Documents/Research Work/GEM/NL-Augmenter/transformations/p1_noun_transformation/transformation.py", line 9, in <module>
    import wptools
  File "/Users/saad/Documents/Research Work/GEM/NL-Augmenter/venv/lib/python3.9/site-packages/wptools/__init__.py", line 23, in <module>
    from . import core
  File "/Users/saad/Documents/Research Work/GEM/NL-Augmenter/venv/lib/python3.9/site-packages/wptools/core.py", line 14, in <module>
    from . import request
  File "/Users/saad/Documents/Research Work/GEM/NL-Augmenter/venv/lib/python3.9/site-packages/wptools/request.py", line 17, in <module>
    import pycurl
ImportError: pycurl: libcurl link-time ssl backends (secure-transport) do not include compile-time ssl backend (openssl)

PR Filter label

There should probably be another label called "filter" to make it quick to check in the PRs which transformations/filters have already been implemented. Both of my PRs are filters and should therefore not have a transformation label.

Change batch size and number of visible devices for text-style-transfer

Hi @Filco306

Thank you for your great work to make the powerful paraphrasing model easily accessible through HuggingFace! Now it is much easier for me to work with it without the hassle of handling complicated dependencies!

But is there any way for us to use a larger batch size and more GPUs to accelerate the paraphrasing process? At the moment I can use only one GPU and a small batch size. I read your implementation here, but there does not seem to be an easy way to do either.

Thank you. I am looking forward to your reply.

`re.sub` method error during the evaluation

Hi,
While running the evaluate method (for #246), I get an error in my re.sub call for one of the tests, most likely due to a problem with the escape characters. I can replace it with string.replace to solve the problem. However, this branch is already merged. Do you suggest creating a new branch, or leaving the corresponding eval columns empty?
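
For reference, the usual fix for this class of error is to escape the dynamic part of the pattern with re.escape (or fall back to str.replace), e.g.:

import re

old, new = "1+1 (approx.)", "two"
text = "The answer is 1+1 (approx.) today."

# re.escape neutralizes characters like '+', '(' and ')' that would otherwise
# be interpreted as regex syntax
print(re.sub(re.escape(old), new, text))

# Plain string replacement avoids regex semantics entirely
print(text.replace(old, new))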

Is the first test case skipped?

When adjusting the tests for #146, I noticed that I almost never needed to adjust the first test case in each test.json, but all the others. It almost feels as if the first one is being skipped, since it is unlikely that all the other test cases needed slight adjustments while the first one always matched perfectly. Can someone quickly check that everything works as intended there? It could well be chance, but I just want to make sure.

Typos discovered by codespell

codespell --ignore-words-list="fro,ist,oder"

./dataset.py:122: relavent ==> relevant
./dataset.py:143: hierachy ==> hierarchy
./notebooks/Write_a_sample_transformation.ipynb:1442: tht ==> the, that
./notebooks/Write_a_sample_transformation.ipynb:1718: exisiting ==> existing
./evaluation/evaluate_text_generation.py:84: upto ==> up to
./transformations/change_two_way_ne/README.md:11: implemetation ==> implementation

Data augmentation methods and filters that require the entire dataset

Hello!

First of all, thanks for the effort to build such a collaborative framework!

At the moment, the augmentation methods and filters are only provided with a single example per call. Since many techniques need the whole dataset with class information (to condition on the class, to interpolate instances, etc.), I wanted to ask whether there are plans to add this to the framework.

OSError in the PR Workflow Test

Hi, I just found an OS error in the PRs' workflow.

Collecting huggingface-hub<0.1.0
  Downloading huggingface_hub-0.0.8-py3-none-any.whl (34 kB)
Collecting sacremoses
  Downloading sacremoses-0.0.45-py3-none-any.whl (895 kB)
Collecting tokenizers<0.11,>=0.10.1
  Downloading tokenizers-0.10.3-cp37-cp37m-manylinux_2_5_x86_64.manylinux1_x86_64.manylinux_2_12_x86_64.manylinux2010_x86_64.whl (3.3 MB)
Collecting filelock
  Downloading filelock-3.0.12-py3-none-any.whl (7.6 kB)
ERROR: Could not install packages due to an OSError: [Errno 2] No such file or directory: '/opt/hostedtoolcache/Python/3.7.10/x64/lib/python3.7/site-packages/importlib_metadata-4.6.0.dist-info/METADATA'

Perhaps somebody has an idea about this error, which has occurred in many PRs recently.

Thanks!
