phseiff / gender-render

Template-system and proof-of-concept for rendering gender-neutral text-, email- and RPG-text-templates with the correct pronouns of all people involved.

Home Page: https://phseiff.com/gender-render

License: Other

Python 100.00%
python3 gender gender-equality specification template-engine template-language grammatical-gender pronouns

gender-render's Introduction

Hello! My name is phseiff, and I'm an avid hobby developer, manga-binge-reader, vegetarian, programmer, writer and denglish-speaker.

In my free time, I work on holistic software projects with a social vision (like my newest project, gender*render), draw and design (for example my website), post about my thoughts on mastodon, and socialise with awesome people.

I also love good food, especially cheese. 🧀

Feel free to drop me an email, raise an issue on github or contact me on mastodon if you feel like it; I'm always up for a chat or having an exchange about our projects, especially if they're in similar areas.

they/them

gender-render's Issues

Separate repository for managing data for gendered nouns?

As it is now, the implementation reads the data which describes the gendered versions of every noun from a separate repository maintained by another person (https://github.com/ecmonsen/gendered_words). To work with this data (its data format is somewhat burdensome and bloated for our purposes), we read it from a fork and then convert it into a different data format for usage. The resulting data then runs through an algorithm that "fixes" it, because there are lots of things in it like

  • links to words that are not part of the database
  • words that end with a gender-indicator (-men, -woman, -aunt etc.), yet don't have versions for different genders/ gender neutral individuals
  • links that are one-sided
  • words that have no neutral version listed, but an algorithm can easily determine one, and words that are wrongly listed as male rather than neutral

For further information, see this issue I raised as well as the gender_nouns-submodule of gender*render, which implements the algorithms above.
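
A minimal sketch of two of the repair steps listed above: dropping links to words that are not in the database, and making one-sided links mutual. The data layout used here (a dict mapping each word to a `gender` label and a set of `links`) is an assumption made for illustration; it is neither the format of gendered_words nor the one used by the gender_nouns submodule, and `repair_links` is a hypothetical name.

```python
def repair_links(nouns):
    """Return a repaired copy of `nouns` (hypothetical layout, see above)."""
    # Copy the entries so the input data stays untouched.
    fixed = {
        word: {"gender": entry["gender"], "links": set(entry["links"])}
        for word, entry in nouns.items()
    }

    # Step 1: drop links that point to words missing from the database.
    for entry in fixed.values():
        entry["links"] = {target for target in entry["links"] if target in fixed}

    # Step 2: make one-sided links mutual.
    for word, entry in fixed.items():
        for target in list(entry["links"]):
            fixed[target]["links"].add(word)

    return fixed


example = {
    "actor": {"gender": "male", "links": {"actress", "thespian"}},
    "actress": {"gender": "female", "links": set()},
}
repaired = repair_links(example)
```

In this example, the link to "thespian" is dropped because that word has no entry of its own, and "actress" gains a link back to "actor". Steps that can misjudge (such as deriving a "-person" form) are deliberately left out of the sketch.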

The way this is handled right now -reading from a different repository and converting to a different data type- comes with its pros and cons:

  • pro: The repository is maintained, so we don't need to worry about maintaining its data ourselves (or so I thought, at least; I am not entirely sure about it anymore).
  • pro: The data from the repository is exactly what we need, so there is no need to make a second repository with the same data (this did not hold true, since the data seems to be made for a very specific purpose and does not really attempt to suit other purposes, though it obviously partially does).
  • contra: We need to convert the data from the repository to our own data format (and I have to maintain the code for it), run it through a complicated pipeline, and make sure the result is always shipped with the implementation and always up to date with the data from the repository (we have all of this, and it works, but it is still less than ideal and somewhat bloated).
  • contra: The data has holes that need to be fixed by algorithm (as in, 500 words for each step of the pipeline, not just 3 or 4 inconsistencies), and these fixes are unreviewed; the female version of manager should definitely not be "womanager", to name one example. The solution to this would be to manually go over all changes applied by the pipeline, adding those that seem correct to the original data via pull request, and adding the correct alternatives to the data in cases where the pipeline was wrong. This does, however, come with problems:
    • The maintainer of the original project might not want a "-person"-version for all "-man"-words, since the original project's vision is less focused on non-binary issues than this project is; so we would need to move the fork used by this project away from the state of the original data, which makes the first pro-argument somewhat obsolete.
    • The data format from which we read is much more cumbersome than the one we actually use, so adding changes to it rather than applying them to data managed in the data format to which we convert is cumbersome.

My preferred approach would be transitioning to a different way of managing the noun data we read from, but whether this transition is necessary depends highly on whether the repository I read the data from turns out to be maintained, and on what its vision and perspective are; I should eventually raise an issue asking about this.

If a change of concept turns out to be necessary here, I would prefer the following approach:

  • Create a phseiff/gender-nouns-repository (named analogously to the submodule of gender-render).
  • This repository should contain merely the README file and the gendered noun data in our preferred data format, with all secure steps (steps that cannot misjudge) of our pipeline applied to it.
  • The code of gender_render.gender_nouns would be changed to read its data from the new repository, so the data format conversion step would fall away.
  • The new repository's README should reference phseiff/gender-render, explain how to use the gender_nouns submodule outside the context of rendering templates (since there are many other valid use cases for it), explain the data format of the gendered-noun-data, why one needs to run it through the pipeline before using it, and so on. It should also prominently ask people to go through the automatic changes applied by the pipeline and create pull requests with corrected versions of the changes they deem incorrect, and contain detailed information on how to do this efficiently (e.g. by turning on advanced logging for the pipeline).

This would make the data easier to maintain, make its shortcomings easier to fix, and help people participate in it; as a side effect, it would also make using gender*render for noun gendering alone (without the rest) easier. I feel like this change would also go well with creating an extension specification for the format of the gendered noun data.

Diversify branding for a broader scope of applications

Currently, gender*render is branded as a tool mainly for rendering email templates, and analogously, the spec is branded as a proof of concept that gendering people in automated emails is possible. I feel, however, like this misses a relatively big branch of applications, namely text dialogues in computer games (mainly RPGs and visual novels).

As it is now, most computer games offer only two genders (with binary pronoun preferences) to play, though this is admittedly more of a social and/or historical issue, whilst actively progressive games offer a non-binary option with gender-neutral they/them pronouns. There are, however, as far as I know (feel free to correct me on this), no games where the player can customize their pronouns, which I feel would be necessary to make games inclusive, though it is admittedly outside the technical scope of lots of game companies. Branding gender*render for the purpose of gendering procedurally generated or displayed dialogue in games and other products, in addition to correctly gendering emails, would therefore be appropriate and add a possibly even more common use case to gender*render's portfolio, as well as broaden the scope of problems it "proves" to be solvable.

I admit that replacing every mention of "emails" with "emails and video game dialogue" would be cumbersome and bloated, but at least the first mention of "automatically generated emails" in every prominent document (that is, those documents including gender*render's vision, which is mainly the spec and the README) should be extended to procedurally generated dialogue.

I could also imagine a section in the specification's introduction about this use case (like the introduction I gave above) in addition to the section explaining the use case of automated emails.

I think I will do this once I get to work on a general wording overhaul for the main specification, which is worded somewhat messily here and there right now due to lots of changes applied to it during its initial development. Before I get to do this, I will have to work on some things at hand (the issues labelled with needed) and, in the process, potentially reevaluate which spec or extension spec some features belong in and flesh out the way the spec, extension specs and semantic versioning flow with each other.

specification proposal: gender verbs

As it is now, gender*render assumes every unknown property used as the context value of a tag to be a gendered noun.
However, nouns are not the only words whose form depends on the preferences of the person they refer to: verbs do as well, because some pronouns are grammatically singular, whilst others are grammatically plural. I must admit that I did not think about that when I wrote version 0.1.0 of the specification.
I do, however, think that this issue is easy to address.

The requirements would be as follows:

  • When gender*render encounters an unknown word, it checks whether it is a verb or a noun. If the word is a noun with different versions for different genders, it is assumed to be a noun; otherwise, if it is a verb, it is assumed to be a verb; otherwise, it is assumed to be a noun. This construction should suffice to ensure that, in cases where a word can be a noun as well as a verb (are there even words like this in the English language? Let me know if you know more about this!), confusion can be avoided by simply not setting the word into a tag if it is meant to be a noun and said noun doesn't even have multiple versions for multiple genders anyway.
  • verbs are assumed to be added to tags in their plural-form, since this form fits the "they" used in template-syntax and is easier to convert to the singular form than vice versa.
  • nouns are rendered as specified by the main specification.
  • verbs are gendered as follows: If the individual's pronoun data specifies gender_verbs (potential values are singular and plural), the verb is gendered according to said information (for example, similar to https://smrtenglish.com/cg/lesson/217/2531). Otherwise, the verb form is determined based on the person's pronouns (subject), and a warning should be raised (DefaultValueUsedWarning? Or a new type of warning?). Which pronouns correspond to plural and which correspond to singular is defined by the specification (or would it be a better idea to handle it like gendered nouns and make it implementation-dependent? This seems like it'd be a bad idea to me, since it's far more subjective). In case the pronouns cannot be determined as either plural or singular, an error would be raised (probably MissingInformationError).
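
The verb-handling rules above could look roughly like the following sketch. Everything here is an assumption made for illustration: the two warning/error classes are local stand-ins named after the ones mentioned in the proposal, the singular/plural pronoun sets are placeholders for whatever the specification would define, and `to_singular` is a deliberately naive conversion that ignores modal verbs and most irregular forms.

```python
import warnings


class DefaultValueUsedWarning(UserWarning):
    """Local stand-in for the warning type discussed in the proposal."""


class MissingInformationError(Exception):
    """Local stand-in for the error type discussed in the proposal."""


# Placeholder mapping; the proposal leaves the real one to the specification.
SINGULAR_SUBJECTS = {"he", "she", "it"}
PLURAL_SUBJECTS = {"they"}

# A few irregular verbs, given in the plural form used inside tags.
IRREGULAR = {"are": "is", "were": "was", "have": "has"}


def to_singular(verb):
    """Naively convert a plural-form verb ("walk") to singular ("walks")."""
    if verb in IRREGULAR:
        return IRREGULAR[verb]
    if verb.endswith(("s", "sh", "ch", "x", "z", "o")):
        return verb + "es"
    if verb.endswith("y") and len(verb) > 1 and verb[-2] not in "aeiou":
        return verb[:-1] + "ies"
    return verb + "s"


def render_verb(verb, pronoun_data):
    """Render `verb` (plural form) for one individual's pronoun data."""
    number = pronoun_data.get("gender_verbs")
    if number is None:
        # Fall back to the subject pronoun and warn about the default.
        subject = pronoun_data.get("subject")
        if subject in PLURAL_SUBJECTS:
            number = "plural"
        elif subject in SINGULAR_SUBJECTS:
            number = "singular"
        else:
            raise MissingInformationError(
                f"cannot tell whether {subject!r} is singular or plural"
            )
        warnings.warn(
            f"gender_verbs not set; assuming {number}", DefaultValueUsedWarning
        )
    if number not in ("singular", "plural"):
        raise ValueError(f"unexpected gender_verbs value: {number!r}")
    return to_singular(verb) if number == "singular" else verb
```

For instance, `render_verb("walk", {"gender_verbs": "singular"})` gives "walks", while `render_verb("walk", {"subject": "they"})` leaves the verb as "walk" and emits the warning.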

I feel like this is an important feature, but it should not necessarily be part of the main specification: it comes with lots of logic and therefore seems slightly too big to add to the main specification, especially since it can be encapsulated relatively well. On the other hand, it's about as important as gendered noun handling (that is, pretty important), so maybe I should actually make it part of the main specification, or move it out of it together with noun gendering?

Comments & Opinions much appreciated.

future plans

I plan on adding support for pronoun-less identities and better support for identities with no preferred way of addressing in the future.
It would also be interesting to be able to pass the renderer the name of a gender rather than a piece of individual pronoun data, provided there is a big enough social consensus on the pronouns associated with that gender; the name of the gender would then be converted to a piece of individual pronoun data, with an extension specification defining the form of the data used for this as well as the feature itself. Before thinking about such an addition, however, all other current and accepted specification proposals should be resolved, and it should be discussed whether such a change is even in the scope of this project and compatible with its vision, since it might lead to, or inherently entail, marginalization of people whose genders are not listed in this specific set of genders.

The above list is merely there to give a slight overview of things that might get implemented in the future, not to suggest that anyone try to find solutions to them. It's solely here for transparency, and it is definitely not suited for pull requests.

specification proposal: global capitalization system

As it is now, gender*render does not account for the fact that the capitalization of words depends on their context. For example,

I ate. {they} didn't join me in doing so.

would (for a person with they/them pronouns) become

I ate. they didn't join me in doing so.

On the other hand the "They" in

I ate. {They} didn't join me in doing so.

would not even be recognized as a tag, because it uses the wrong capitalization.
The specification simply doesn't take capitalization into account (yet).

However, the implementation of noun gendering makes nouns whose first letter was uppercase lowercase before gendering them, and then makes the first letter of their gendered version uppercase again before returning it, therefore making actor an actress and Actor an Actress if the person uses female noun gendering. This behavior, however, is not specified by the specification, though it is arguably implied by it, since it requires correct gendering of valid nouns, and nouns with an uppercase first letter are valid nouns and would not be gendered correctly if they lost their capitalization in the process.

I feel like (a) the behavior of nouns should not be a poorly documented implementation feature not even explicitly mentioned by the specification, and (b) their behavior should be extended to every type of context value, since every tag can be the first one of a sentence and therefore require capitalization.

My concept for implementing this is as follows:

  • Add an extension spec for it, or add it to the main spec (opinions on this?).
  • Every context value can be written with an uppercase first character, in which case it is converted to lowercase during parsing and the information about its case is stored with the tag (this should be implementation-specific, but my way of going about this would be to add a new section called capitalization which has 0 or 1 as a value, and possibly different numbers if more "capitalization types" were added in later versions of the specification).
  • When rendering a template, the capitalization of a tag would be re-applied to the value the tag is resolved to.
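
As a rough sketch of the concept above: the context value is lowercased during parsing, a `capitalization` value of 0 or 1 is stored with the tag, and that value is re-applied to whatever the tag resolves to. The dict-based tag layout, the function names and the `resolve` callback are assumptions for illustration only, not the parser of the actual implementation.

```python
def parse_context_value(raw):
    """Lowercase the first character and remember whether it was uppercase."""
    capitalized = raw[:1].isupper()
    return {
        "context": raw[:1].lower() + raw[1:],
        "capitalization": int(capitalized),  # 0 = as-is, 1 = first letter upper
    }


def apply_capitalization(tag, resolved):
    """Re-apply the stored capitalization to the resolved value."""
    if tag["capitalization"] == 1:
        return resolved[:1].upper() + resolved[1:]
    return resolved


def render_tag(raw, resolve):
    """Parse one context value, resolve it, and restore its capitalization.

    `resolve` is a hypothetical callback mapping a lowercase context value
    to the text it is rendered as for the person in question.
    """
    tag = parse_context_value(raw)
    return apply_capitalization(tag, resolve(tag["context"]))


# Hypothetical resolution table for a person using she/her pronouns
# and female noun gendering.
table = {"they": "she", "actor": "actress"}
sentence = "I ate. " + render_tag("They", table.__getitem__) + " didn't join me."
```

Slicing is used instead of `str.capitalize()` because the latter would also lowercase the remainder of the resolved value.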

Suggestions, Comments and Opinions are welcome.

specification proposal: possessive nouns

Every noun in English has a possessive form. If a worker owns a chair, it's "the worker's chair". If Jesus owns a chair, it's "Jesus' chair". So whether a word (a person's personal-name, name or a noun replacing them) is followed by a "'" or a "'s" when it is put into its possessive form depends on whether it ends with an "s" or not.

I think this should be addressed by adding an extension specification implementing a 's-context value which is resolved to "'" or "'s" depending on whether the word (or resolved value of the tag) preceding it ends with an "s" or not. The specification would recommend raising a warning in case the "'s"-tag follows a hard-coded word rather than a tag, and an error in case the "'s"-tag is preceded by nothing but whitespace back to the start of the file or the previous newline. It would also require a definition of "word" in the specification, which should generally not be an issue.

Another thing this should entail is that the string "'s " (if it is hardcoded into the template rather than rendered from a tag) is, if it immediately follows a tag, replaced with a tag à la "{'s} "; this should make the syntax less cumbersome.
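
A small sketch of the two ideas above, under stated assumptions: a "word" is taken to be any whitespace-delimited chunk (the proposal notes that the specification would still have to define this), `ValueError` stands in for whichever error type the specification would choose, and both function names are hypothetical. The recommended warning for a "'s"-tag that follows a hard-coded word is not covered here.

```python
import re


def resolve_possessive(text_before_tag):
    """Resolve a 's-tag to "'" or "'s" based on the word preceding it."""
    # Only text on the current line counts as "preceding" the tag.
    line = text_before_tag.rsplit("\n", 1)[-1]
    words = line.split()
    if not words:
        raise ValueError("'s-tag is not preceded by any word on its line")
    return "'" if words[-1].lower().endswith("s") else "'s"


def rewrite_hardcoded_possessive(template):
    """Turn a hard-coded "'s " directly after a tag into a "{'s} " tag."""
    return re.sub(r"\}'s ", "}{'s} ", template)
```

With these, `resolve_possessive("the worker")` yields "'s", `resolve_possessive("Jesus")` yields "'", and `rewrite_hardcoded_possessive("{name}'s chair")` yields "{name}{'s} chair".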

It might be a good idea to -in the same specification or a different one- implement similar features (minus the string replacement) for the distinction between "a" and "an".

(Also see Wikipedia on this)

Quick Start table style

This is a minor, stylistic thing in the README that I thought about, not really worth raising an issue about it, but I didn't come to a conclusion myself, so I figured I would post it here to possibly get some second opinions about it.

As it is now, every other row in the table in which the Quick Start section is arranged (this one) is gray, and the rows in between are white, because this is how GitHub's CSS styles tables.
This comes with...

  • ...an advantage: rows are easy to tell apart, and the table is visually more distinct from its non-table surroundings.
  • ...a disadvantage: every second row (the gray ones) has exactly the same color as the code field within it, so the code fields in every second row are visually indistinguishable from their surroundings, which is really suboptimal.
  • ...a second (maybe?) disadvantage: I'm not sure if it is desirable to highlight every second row, since there is not much of a difference between even and uneven rows, and this is not even intended to be a "real" table (in the classical sense of a table) anyway, so this coloring might even be confusing and tangential to its purpose and meaning.

I figured out that, since I am using an HTML table rather than a markdown table in the markdown file, it's possible to make every row white (rather than gray) by inserting an empty row between every two non-empty rows. These empty rows will then be colored gray, which won't have any effect since they'll be invisible anyway.
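
To illustrate the trick, here is a small sketch that generates such a table. It assumes that the gray shading is applied to even-numbered rows, so that the empty spacer rows land on the even positions; the function name and the single-cell row layout are made up for the example and do not reflect the actual README.

```python
def table_with_spacer_rows(cells):
    """Build an HTML table with an empty <tr> between all content rows."""
    lines = ["<table>"]
    for index, cell in enumerate(cells):
        if index > 0:
            # Empty row: takes the even (gray) position, but has no cells
            # and is therefore not visible.
            lines.append("  <tr></tr>")
        lines.append(f"  <tr><td>{cell}</td></tr>")
    lines.append("</table>")
    return "\n".join(lines)


html = table_with_spacer_rows(["first step", "second step", "third step"])
```

Every content row ends up on an odd position (1, 3, 5, ...) and therefore keeps the white background.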

My question: Do you think that this would look better/ more intuitive than it currently does, or do the advantages of having every second row gray outnumber the disadvantages?
You can answer with a thumbs-up for yes/ thumbs-down for no, or leave a comment, if you have an opinion on this.

Note that I might apply this to other tables in future repositories as well, so this is partially a general question and not just me over-optimizing this project's README, in case you are worrying about that :)

Good first issues

There are some details in the implementation that are less-than-ideal, not connected to any of the other bigger issues I already created for this repository, and relatively separated from specification content and bigger concepts.
These issues are mostly marked as "ToDo" in the implementation (note that not every issue marked as ToDo is necessarily one of them). If you want, you can work on one of them and create a pull request (especially those regarding the warning system), but please raise an issue before you do to discuss the way you go about the implementation, so we can make sure it doesn't get into the way of vague design decisions that need to be upheld for later features.

Minor ideas

I use this issue to dump small ideas for this project that I would like to realize sooner or later, or see realized.
All of them are minor things, just ideas that I'd like to share here for transparency and to give people a chance to object or offer themselves to implement them, and to help me organize them better.

  • An RSS feed to inform people of specification releases. This might sound like a minor improvement, but it might help to get more feedback on specification changes in the long run, since many people might have an interest in the project without having a GitHub account, due to the relatively IT-unrelated vision of the project (compared to other IT projects, at least).
  • An "additional reading"-section in the README, that links to relevant resources and issues
  • An in-depth guide to gender*render that extends the quick-start guide and works without having to read the specification, or at least a link to the specification at the end of the quick-start guide.
  • Modification on the spec download section:
    • The title of the spec next to every specification name
    • Hiding older versions in a dropdown that can be clicked on to show them (relevant once there are more versions)
    • The version of the latest version and a link to its html version next to the "download latest"-link
  • A maintenance document (linked in the additional reading section) in which I outline how I maintain the repository, how releases are made and handled, how I version stuff, how I use git, how I develop the specification and its features, how I use issues for organization et cetera, so people can continue this project as a fork in case something happens to me (this is a measure recommended to all open source projects), and to increase transparency
  • A section in the README on how one can support the project, that asks for feedback and explains how giving stars is valuable since it helps the project get to trending, therefore increasing the odds of people seeing it there and potentially becoming more aware of the issues the project tackles (or maybe link that in the additional reading section?)
  • Evaluate to what extent putting things into these handy hide-until-someone-clicks-it-tags that GitHub offers in the README might help with organizing it and making things easier to access that wouldn't have fit into the README otherwise
  • Create (and link to the additional reading section) a document that broadly explains the project to non-tech people, so I (and other people) can share the link to said document with non-tech people who might be interested in the project or take inspiration from the explanations there (this is relevant since the project's mission is not only relevant to tech people)
  • I would like to keep all of these things in the repository itself rather than a wiki to have them version controlled and integrated into every clone of the repo.
  • Improve the banner of the repository (the one that is embedded into every link to the project) to show the whole name and a tiny explanation of what it is and does.
  • Maybe add a better starting section to the README, that explains the issue the project tackles from a "look how gender selection forms usually work"-perspective rather than a "look at it from the server side!"-perspective, to be more intuitive and start at a point that's more familiar to most people?
  • Should I make the examples in the quickstart section less specific? People could think that only a limited set of pronouns and id-values is supported if I use specific ones in my examples rather than generic values like "foo" and "bar". On the other hand, having specific values for a specific example makes it easier to grasp why features are important.
  • I should probably add more subsub-sections and subsubsub-sections to the specification, plus include a deeper nesting depth in the table of contents, to make it easier to read and navigate.
