WikiGender

According to the Wikipedia article on grammatical gender (version at time of writing),

in Spanish, female gender is often attributed to objects that are "used by women, natural, round, or light" and male gender to objects "used by men, artificial, angular, or heavy."

The source cited is a similar statement in the book Social Psychology of Culture by Chi-Yue Chiu and Ying-yi Hong. As far as I can see, no evidence is provided for this claim.

This project attempts to test the above claim by comparing masculine and feminine nouns on the Spanish Wikipedia, focusing on the part of the claim that concerns natural and artificial objects.

Method

  • The program download_wikipedia.py downloads a specified version of the Spanish Wikipedia from a data dump.
  • The master list of masculine and feminine nouns comes from the dictionary FreeLing.
  • The program count_all_pages.py iterates through all the articles and tallies the masculine and feminine nouns, writing the results to a text file.
  • The program sum_counts.py sums the masculine and feminine noun counts for a given article and all the articles that it links to.
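
The counting and summing steps above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the tiny word sets stand in for the FreeLing noun list, and the function names, the `page_counts` mapping and the `links` mapping are all assumptions made for the example.

```python
import re
from collections import Counter

# Hypothetical miniature lexicon; the project takes its noun list from FreeLing.
MASCULINE = {"tren", "diseño", "libro"}
FEMININE = {"casa", "ciencia", "naturaleza"}

# Runs of Unicode letters (no digits, no underscores).
WORD_RE = re.compile(r"[^\W\d_]+")


def count_genders(text):
    """Return (masculine, feminine) noun counts for one article's text."""
    counts = Counter()
    for word in WORD_RE.findall(text.lower()):
        # A word present in both sets is counted once for each gender.
        if word in MASCULINE:
            counts["m"] += 1
        if word in FEMININE:
            counts["f"] += 1
    return counts["m"], counts["f"]


def sum_counts(article, page_counts, links):
    """Sum the counts of an article and of every article it links to.

    page_counts maps title -> (masculine, feminine);
    links maps title -> iterable of linked titles (both hypothetical).
    """
    titles = {article} | set(links.get(article, ()))
    masculine = sum(page_counts.get(t, (0, 0))[0] for t in titles)
    feminine = sum(page_counts.get(t, (0, 0))[1] for t in titles)
    return masculine, feminine
```

Using a set of titles means an article that links to itself, or that links to the same page twice, is only counted once. Whether the real sum_counts.py does the same is not stated here.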

Shortcomings

  • The word de is both a preposition and a feminine noun in Spanish (the name of the letter D). Every instance of the preposition de would be incorrectly counted as a feminine noun by count_all_pages.py. There are presumably many other words that are incorrectly counted as nouns in the same way.
  • Certain words occur frequently in Wikipedia articles regardless of topic. For example, Referencias, meaning References, occurs in virtually every article. These words are not related to the subject matter of the article, but would still be counted.
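
To illustrate how much these two effects can distort a tally, here is a small sketch that counts the same tokens with and without an exclusion list. The lexicon, the `EXCLUDED` set and the `tally` function are hypothetical and are not part of the project. Excluding words by hand is only one possible mitigation.

```python
# Hypothetical lexicon entries, chosen to reproduce the two problems above.
MASCULINE = {"tren"}
FEMININE = {"de", "casa", "referencias"}

# Hypothetical exclusion list: ambiguous words and page furniture.
EXCLUDED = {"de", "referencias"}


def tally(tokens, excluded=frozenset()):
    """Return (masculine, feminine) counts, skipping any excluded words."""
    masculine = sum(1 for t in tokens if t in MASCULINE and t not in excluded)
    feminine = sum(1 for t in tokens if t in FEMININE and t not in excluded)
    return masculine, feminine


tokens = "el tren de la casa referencias".split()
print(tally(tokens))            # (1, 3): "de" and "referencias" inflate the feminine count
print(tally(tokens, EXCLUDED))  # (1, 1)
```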

Results

The table below lists, for each article, the number of nouns in that article and all the articles it links to, together with the ratio of masculine nouns to feminine nouns.

Article           Translation        Number of nouns  Ratio M/F
Ingeniería        Engineering                309,208     0.9266
Tren              Train                      153,426     0.9708
Diseño            Design                      69,966     0.9770
Arquitectura      Architecture               236,646     0.9554
Ciencia política  Political science          280,862     0.9804
Botánica          Botany                     309,742     0.9821
Biología          Biology                    542,049     0.9822
Naturaleza        Nature                     455,088     1.0045
Animalia          Animalia                   256,790     1.0652
Wikipedia total                          438,157,505     0.9349
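
To get a feel for the size of these differences, a ratio M/F of r can be converted into the share of nouns that are masculine, r / (1 + r). The sketch below does this for three rows of the table. It assumes that "Number of nouns" is the sum of the masculine and feminine counts, which is not stated explicitly. The function names are illustrative only.

```python
def masculine_share(ratio_mf):
    """Fraction of counted nouns that are masculine, given the M/F ratio."""
    return ratio_mf / (1.0 + ratio_mf)


def split_counts(total_nouns, ratio_mf):
    """Approximate (masculine, feminine) counts, assuming total = M + F."""
    feminine = total_nouns / (1.0 + ratio_mf)
    masculine = total_nouns - feminine
    return masculine, feminine


# (number of nouns, ratio M/F) copied from the table
RESULTS = {
    "Ingeniería": (309_208, 0.9266),
    "Animalia": (256_790, 1.0652),
    "Wikipedia total": (438_157_505, 0.9349),
}

for title, (total, ratio) in RESULTS.items():
    print(f"{title}: {masculine_share(ratio):.1%} masculine")
```

Under that assumption, the lowest and highest ratios in the table (Ingeniería and Animalia) correspond to masculine shares of about 48% and 52% respectively.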

Interpretation

I selected these few articles because I thought they reflected the natural vs artificial dichotomy. The articles on artificial subjects seem to have lower masculine:feminine ratios, while the articles on natural subjects have higher ratios. The claim we are testing predicts the opposite: a higher ratio for artificial articles than for natural ones. The difference is very small, however, so it's possible that there is no discernible difference at all.

Caveats

  • It is possible this negative result is a consequence of the shortcomings listed above.
  • It is possible I have incorrectly understood what artificial and natural objects are.

Resources and licenses
