xweat's Introduction

XWEAT

Cross-lingual version of WEAT, which includes the tests in English, German, Spanish, Croatian, Italian, Russian, and Turkish.
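
For orientation, the effect size that WEAT reports can be sketched in a few lines of Python. This is a minimal illustration of the standard WEAT formulation, not the code in weat.py: the function names are hypothetical, the toy vectors are made up, and the use of the population standard deviation is an assumption (some implementations use the sample standard deviation instead).

```python
import math
from statistics import mean, pstdev


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def association(w, A, B):
    """s(w, A, B): mean similarity of w to attribute set A minus that to B."""
    return mean(cosine(w, a) for a in A) - mean(cosine(w, b) for b in B)


def effect_size(X, Y, A, B):
    """Difference of mean associations of target sets X and Y,
    normalised by the standard deviation over all targets.
    Assumption: population standard deviation (pstdev)."""
    s_x = [association(x, A, B) for x in X]
    s_y = [association(y, A, B) for y in Y]
    return (mean(s_x) - mean(s_y)) / pstdev(s_x + s_y)


# Toy 2-d vectors, purely illustrative (hypothetical data).
X = [[1.0, 0.0], [0.9, 0.1]]   # target set 1
Y = [[0.0, 1.0], [0.1, 0.9]]   # target set 2
A = [[1.0, 0.0]]               # attribute set 1
B = [[0.0, 1.0]]               # attribute set 2

d = effect_size(X, Y, A, B)
print(round(d, 3))
```

If any of the four word sets ends up empty, the means above are undefined, so the script's output is only meaningful when enough test words are found in the embedding vocabulary.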

Example usage: the following loop tests monolingual bias in all seven languages, once per similarity type (cosine and Euclidean).

for similarity_type in "cosine" "euclidean" ; do
    for language in "en" "de" "es" "hr" "it" "ru" "tr" ; do
        for test_number in 3 4 5 6 10; do
            python weat.py \
                --test_number $test_number \
                --permutation_number 1000000 \
                --output_file ./results/w2v_wiki_${language}_${similarity_type}_${test_number}_cased.res \
                --lower False \
                --use_glove False \
                --is_vec_format True \
                --lang $language \
                --embeddings \
                <PATH TO YOUR EMBEDDINGS>/cbow.wiki.${language}.300w5.vec \
                --similarity_type $similarity_type |& tee ./results/w2v_wiki_${language}_${similarity_type}_${test_number}_cased.out
        done
    done
done

This example tests bias in cross-lingual embedding spaces. Here, Spanish is tested in bilingual spaces with each of the other six languages.

for similarity_type in "cosine" ; do
    targets_language="es"
    for attributes_language in "en" "de" "hr" "it" "ru" "tr" ; do
        for test_number in 6 7 8 9 10 1 2 ; do
            xspace=${targets_language}-${attributes_language}
            embedding_dir="/smith/fasttext/${xspace}" # be aware that you need to have trained the bilingual embedding space first

            python xweat.py \
                --test_number $test_number \
                --permutation_number 1000000 \
                --output_file ./results/ft_xling_space-${xspace}_ta-${targets_language}-${attributes_language}_${similarity_type}_${test_number}.res \
                --lower False \
                --use_glove False \
                --targets_lang $targets_language \
                --attributes_lang $attributes_language \
                --targets_embedding_vocab \
                ${embedding_dir}/${targets_language}.vocab \
                --targets_embedding_vectors \
                ${embedding_dir}/${targets_language}.vectors \
                --attributes_embedding_vocab \
                ${embedding_dir}/${attributes_language}.vocab \
                --attributes_embedding_vectors \
                ${embedding_dir}/${attributes_language}.vectors \
                --similarity_type $similarity_type |& tee ./results/ft_xling_space-${xspace}_ta-${targets_language}-${attributes_language}_${similarity_type}_${test_number}.out
        done
    done
done

For more information on our approach, experiments, and results, please refer to our paper:

Lauscher A. and Glavas G. (2019). Are We Consistently Biased? Multidimensional Analysis of Biases in Distributional Word Vectors. To appear at *SEM 2019.

xweat's People

Contributors

anlausch

Forkers

rafikt1992 umanlp

xweat's Issues

About replicating monolingual experiments

Hi! Thanks for sharing this codebase. I was trying to replicate the results in Table 5 of the paper.

I have tried three languages so far (DE, IT, TR) and unfortunately could not replicate the results.
I am probably missing something. Could you help me with this? Let me explain what I did for German (DE).

I downloaded the fastText embeddings from here (specifically the text version; the file I downloaded is named wiki.de.vec).

Then I ran the following command after cloning the repository:

similarity_type="cosine"
language="de"
for test_number in 1 2; do
    python weat.py \
           --test_number $test_number \
           --permutation_number 1000000 \
           --output_file ./results/w2v_wiki_${language}_${similarity_type}_${test_number}_cased.res \
           --lower False \
           --use_glove False \
           --is_vec_format True \
           --lang $language \
           --embeddings \
           data/fastTextEmbeddings/wiki.${language}.vec \
           --similarity_type $similarity_type |& tee ./results/w2v_wiki_${language}_${similarity_type}_${test_number}_cased.out
done

Then I checked the automatically created files. For example, in w2v_wiki_de_cosine_1_cased.res I found this:

Config: 1 and False and 1000000
Result: (0.0, nan, 0.0)
0.10255803883075715

and in w2v_wiki_de_cosine_2_cased.res:

Config: 2 and False and 1000000
Result: (0.0, nan, 0.0)
0.08906068411138322

I was also getting a bunch of warnings, some of which are shown below (written to w2v_wiki_de_cosine_1_cased.out):

WARNING:root:Not in vocab Veilchen
WARNING:root:Not in vocab Trauer
WARNING:root:Not in vocab Tod
WARNING:root:Not in vocab Kricket
WARNING:root:Not in vocab Tausendfüßler
WARNING:root:Not in vocab Hornisse
WARNING:root:Not in vocab Ehre
WARNING:root:Not in vocab Rüsselkäfer
WARNING:root:Not in vocab Narzisse
WARNING:root:Not in vocab Butterblume
WARNING:root:Not in vocab Spinne
WARNING:root:Not in vocab Urlaub
WARNING:root:Not in vocab Käfer
WARNING:root:Not in vocab Qual
WARNING:root:Not in vocab Absturz
WARNING:root:Not in vocab Rose
WARNING:root:Not in vocab Himmel
WARNING:root:Not in vocab Termite
WARNING:root:Not in vocab Orchidee
WARNING:root:Not in vocab Zinnie
WARNING:root:Not in vocab Tarantel
WARNING:root:Not in vocab Tragödie
WARNING:root:Not in vocab Heuschrecke
WARNING:root:Not in vocab Familie
WARNING:root:Not in vocab Regenbogen
WARNING:root:Not in vocab Nelke
WARNING:root:Not in vocab Paradies
WARNING:root:Not in vocab Ameise
WARNING:root:Not in vocab Lachen
WARNING:root:Not in vocab Lilie
WARNING:root:Not in vocab Klee
WARNING:root:Not in vocab Gefängnis
WARNING:root:Not in vocab Bettwanze
WARNING:root:Not in vocab Mord
WARNING:root:Not in vocab Diplom
WARNING:root:Not in vocab Made
WARNING:root:Not in vocab Diamant
WARNING:root:Not in vocab Glockenblume
WARNING:root:Not in vocab Vergnügen
WARNING:root:Not in vocab Krokus
WARNING:root:Not in vocab Missbrauch
WARNING:root:Not in vocab Frieden
INFO:root:Popped T2 0
INFO:root:Popped A2 8
fromnumeric.py:2957: RuntimeWarning: Mean of empty slice.
  out=out, **kwargs)
methods.py:80: RuntimeWarning: invalid value encountered in double_scalars
  ret = ret.dtype.type(ret / rcount)
_methods.py:135: RuntimeWarning: Degrees of freedom <= 0 for slice
  keepdims=keepdims)
_methods.py:105: RuntimeWarning: invalid value encountered in true_divide
  arrmean, rcount, out=arrmean, casting='unsafe', subok=False)
methods.py:127: RuntimeWarning: invalid value encountered in double_scalars
  ret = ret.dtype.type(ret / rcount)
INFO:root:Calculating p value ...
INFO:root:Number of possible permutations: 1
INFO:root:(0.0, nan, 0.0)

Can you tell me what I need to do to run the code successfully? Thanks in advance!
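
The warnings together with the line "Number of possible permutations: 1" suggest that almost none of the test words were found in the embedding vocabulary, which would leave empty sets and explain the nan. One possible cause, stated here as an assumption and not a confirmed diagnosis, is a casing mismatch between the capitalised German nouns and the vocabulary of the .vec file when --lower False is set. A minimal sketch for checking this independently of weat.py follows; the function names are hypothetical, and the file is assumed to be in the word2vec text format (a header line, then one word per line followed by its vector).

```python
import io


def read_vec_vocab(handle, has_header=True):
    """Collect the first space-separated token of every line as a word."""
    vocab = set()
    for i, line in enumerate(handle):
        if has_header and i == 0:
            continue  # skip the "<count> <dimension>" header line
        word = line.rstrip("\n").split(" ", 1)[0]
        if word:
            vocab.add(word)
    return vocab


def coverage(words, vocab):
    """Count how many words are found as written and after lowercasing."""
    found_cased = sum(1 for w in words if w in vocab)
    found_lowercased = sum(1 for w in words if w.lower() in vocab)
    return {
        "total": len(words),
        "found_cased": found_cased,
        "found_lowercased": found_lowercased,
    }


# Toy stand-in for a .vec file (hypothetical contents, not real vectors).
toy_vec = io.StringIO(
    "3 2\n"
    "rose 0.1 0.2\n"
    "spinne 0.3 0.4\n"
    "Himmel 0.5 0.6\n"
)
toy_vocab = read_vec_vocab(toy_vec)
report = coverage(["Rose", "Spinne", "Himmel", "Tod"], toy_vocab)
print(report)
```

To run this against the real file, open data/fastTextEmbeddings/wiki.de.vec with encoding="utf-8" and pass the handle to read_vec_vocab, then call coverage with a few of the words from the warnings. If the lowercased count is much higher than the cased count, rerunning with --lower True may be worth trying.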
