carltonnorthern / nicknames

A CSV file with US given names (first name) and their associated nicknames or diminutive names.

License: Apache License 2.0

Java 3.52% Perl 17.16% Python 70.51% R 8.82%

nicknames's Issues

BUG: can't instantiate default nicknamer twice in a row

Doing

Nicknamer()
Nicknamer()

gives

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/nickcrews/Library/Application Support/hatch/env/virtual/noatak-UM6-FHel/noatak/lib/python3.9/site-packages/nicknames/__init__.py", line 36, in __init__
    nickname_lookup = _lookup_nicknames_default()
  File "/Users/nickcrews/Library/Application Support/hatch/env/virtual/noatak-UM6-FHel/noatak/lib/python3.9/site-packages/nicknames/__init__.py", line 120, in _lookup_nicknames_default
    with DEFAULT_NICKNAME_RESOURCE as f:
  File "/Users/nickcrews/.pyenv/versions/3.9.4/lib/python3.9/contextlib.py", line 115, in __enter__
    del self.args, self.kwds, self.func
AttributeError: args

apparently because of the way the package resource is used. Investigating.
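For anyone hitting the same thing: the traceback is consistent with a generator-based context manager being entered twice, since those objects are single-use. The following is only a minimal sketch of that failure mode, not the package's actual code; open_resource, SHARED_RESOURCE, load_shared and load_fresh are hypothetical names, and the yielded rows are made-up stand-ins for the packaged CSV.

```python
from contextlib import contextmanager


@contextmanager
def open_resource():
    # Hypothetical stand-in for opening the packaged CSV resource.
    yield ["bob,robert", "rob,robert"]


# Anti-pattern: one context manager object created once and then reused.
SHARED_RESOURCE = open_resource()


def load_shared():
    # Works on the first call only; the object cannot be re-entered.
    with SHARED_RESOURCE as lines:
        return list(lines)


def load_fresh():
    # Creating a new context manager on every call avoids the problem.
    with open_resource() as lines:
        return list(lines)
```

Calling load_shared() a second time raises, while load_fresh() can be called repeatedly. If the real cause is the same, building the resource context manager inside the loading function would be one possible fix.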

Documentation out of date?

Hi,

Thanks for creating this useful package. It seems the documentation is out of date or out of sync with the package on PyPI:

Python 3.9.12 (main, Apr  5 2022, 01:53:17)
>>> from us_nicknames import NickNamer
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ImportError: cannot import name 'NickNamer' from 'us_nicknames' 

I do have the latest version (pip-installed today):

>>> import us_nicknames
>>> us_nicknames.__version__
'0.1.2'

How is this file structured?

I don't understand how this file is structured. If I want to find the name associated with "Dicky," how would I do that aside from looking through the file manually?
And why are some names spread across multiple lines? Shouldn't each name appear on only one line? Example:

russ,russell
russell,russ,rusty
rusty,russell

Shouldn't those closely associated names be on one line?

I tried the Perl script out but all it did was display how many names had more than 5 mentions in the file.
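One way to avoid searching the file by hand is to load it into a lookup table. This is a rough sketch only, assuming each row lists a name in the first column followed by its nicknames; build_index and SAMPLE are hypothetical names, and the sample rows are the three lines quoted above rather than the full file.

```python
import csv
import io
from collections import defaultdict

# The three rows quoted in this issue, used as sample input.
SAMPLE = """russ,russell
russell,russ,rusty
rusty,russell
"""


def build_index(csv_text):
    """Map every name to the set of names it shares a row with as head or nickname."""
    related = defaultdict(set)
    for row in csv.reader(io.StringIO(csv_text)):
        cells = [cell.strip().lower() for cell in row if cell.strip()]
        if len(cells) < 2:
            continue  # skip blank or single-name rows
        head, nicknames = cells[0], cells[1:]
        for nickname in nicknames:
            # Link in both directions so a lookup works from either name.
            related[head].add(nickname)
            related[nickname].add(head)
    return related


index = build_index(SAMPLE)
```

With the index built, index["rusty"] gives the names linked to "rusty" regardless of which line they appear on. For the real file, the text would come from reading names.csv instead of SAMPLE.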

Additional nicknames and name variants to add

traci = tracy (which you already have), tracie
falon = fallon, Fal, Fall, Fallie, Fally, Falcon, Lon, Lonnie (https://momlovesbest.com/fallon-name-meaning)
hillary = hilary
toni = tony, antonia, etc.
lindsay = lindsey, lindsie, lindsy
garrett = Barrett, Gare, Garrison, Gars, Gary, Jerry, Rhett, Variations: Garratt, Garret, Garrod, Jarrett, Jared, Jarratt, Jerrold (https://momlovesbest.com/garrett-name-meaning)
gareth = gary, gare
dacia = Daycia, Daisha, Dacya
marc = mark, marcus, etc.
sheri = sherry, sherryl, sheryl, sherri, cheri, cherie, etc.
dianne = diane, dian
angelika = angelica
miguel = Miguell, Miguael, Miguaell, Miguail, Miguaill, Miguayl, Miguayll = michael/mick (spanish version)
monika = monica, monique
michele = michelle
shelley = sheley, michelle, shellie, etc.
hayley = hailey, haylee, etc.
karl = carl
rosemary = rosemarie, marie, mary, rose, etc.
jalen = Jay, Jaye, Len, Lenny, Lennie, Jaylin, Alen, Al, Jaylen, Jaelen, Jaelin, Jaelyn, Jailyn, Jaylyn
rachael = rachel
kellie = kelli, kelly, kelley
kalli = kali, cali
jodi = jody
lori = lorrie, laurie, lorelei, etc.
shawn = shaun
allen = allan, alan, al
erika = erica
marcia = marcie, marsha
dona = donna
kristi = kristy, Christy, christine, christina, krista, etc.
norman = norm
chelsie = chelsey
stephine = stephanie, stephany, stephani
audree = audrey
kerri = kerry
fiona = fionna
savanna = savannah
bryanna = brianna, bri, briana, etc.
jaine = jane, jayne
leilani = lani
jesse = jessica, jess, jessie
abby = abbie
glenn = glen
carri = carrie, kari, kara
donn = don, donald
kym = kymberly, kim, kimberly, kimberli
gerri, geri = geraldine
nichole = nicky, nicki, nicholette, nicci, nicole
jamey = jaime, jamie
tami = tammie, tammy
derek = derick, derrick, derrek, rick, etc.
jenni = jennie, jenny
karin = karen
gabriela = gabriella
marni = marnie
dena = deena, dina, adina, adena
brittnie = brittany
juston = justin
lesli = leslie, lesley, les
kev = kevin
aga = athaga
carla = karla, carly
tiffanee = tiffany
staci = stacy, stacey, stacie
sara = sarah
katia = kate, katie
terri = teri, terrie, terry
ashly = ashley
jeanie = jeannie
matt = matthew, matthews
jillian = jill
laurel = laurie

(these all came from a registration list I'm working on)

allie for allison

Thanks for this -- really finding it useful.

I found an instance of Allie used for Allison in my dataset.

Get PyPI tokens set up

Make release on PyPI

Hi! This looks to me to be one of the better maintained datasets of diminutive names on GitHub. It would be easier to use from Python if it were actually released on PyPI, so people could do a pip install nicknames (surprisingly, this package name isn't taken; we could definitely choose another name too).

If I open a PR for this, would you be open to it? I'd add a GitHub Action similar to this one that would build and release the wheels automatically on a git tag. Your day-to-day admin overhead would be minimal: you'd just have to set up a PyPI account and add the access token to this repo's Secrets once. I can help with this too if you want.

Thank you!

Create better SQL resources for names.csv

I'll start by saying that having names-mysql.sql is far better than not having it. Thanks to the guys that created it.
But there are a few aspects I don't like about it. I'm thinking of adding some improvements. I would very much like feedback from others about what would be most useful. Here's my mini-spec:

  • A SQL neutral (ANSI) version of the file would be appropriate here rather than MySQL or other technology-specific version.
  • Include a version exactly parallel to names.csv. i.e. it would have exactly the same number of rows.
  • Include a normalized version, following the model of the names-mysql.sql available today.
  • Include a Python script for regenerating these files whenever names.csv is updated.

Ideas for the file names:

  • names.sql
  • names-normalized.sql
  • generate-sql.py
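To make the normalized variant concrete, here is a rough sketch of what such a generator could look like. It is not a finished spec: the table name, the column names and the helpers sql_quote and generate_normalized_sql are all assumptions, and it assumes each CSV row is a name followed by its nicknames.

```python
import csv
import io


def sql_quote(value):
    # Standard SQL string literal: double any embedded single quote.
    return "'" + value.replace("'", "''") + "'"


def generate_normalized_sql(csv_text, table="nicknames"):
    """Return SQL with one (name, nickname) row per pair found in the CSV."""
    statements = [
        f"CREATE TABLE {table} ("
        "name VARCHAR(64) NOT NULL, "
        "nickname VARCHAR(64) NOT NULL, "
        "PRIMARY KEY (name, nickname));"
    ]
    seen = set()
    for row in csv.reader(io.StringIO(csv_text)):
        cells = [cell.strip().lower() for cell in row if cell.strip()]
        if len(cells) < 2:
            continue  # nothing to pair up on this row
        name = cells[0]
        for nickname in cells[1:]:
            if (name, nickname) in seen:
                continue  # avoid violating the primary key
            seen.add((name, nickname))
            statements.append(
                f"INSERT INTO {table} (name, nickname) "
                f"VALUES ({sql_quote(name)}, {sql_quote(nickname)});"
            )
    return "\n".join(statements) + "\n"
```

The script would read names.csv, pass its text to generate_normalized_sql, and write the result to names-normalized.sql. Sticking to CREATE TABLE, VARCHAR and plain INSERT statements keeps the output close to standard SQL.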

The JavaParser has the wrong type definition for the dimNames map

Currently the map is declared with:

public Map<String, String> dimNames = new HashMap<String, String>();

but that won't compile, because the code needs a map keyed on String whose values are Lists of Strings. So this is the correct declaration:

Map<String, List<String>> dimNames = new HashMap<>();

Nice list - what is the best way for suggesting new aliases/nicknames?

How to deal with nickname-canonical-nickname transitive links (jon-johnathon-john)

There are 4 possible combos of "formalness", and how we typically treat them:

  1. (formal, formal) (jonathon, johnathon): we usually include these
  2. (formal, casual) (johnathon, john): we usually include these
  3. (casual, formal) (john, johnathon): we almost never include these
  4. (casual, casual) (john, jon): we are very inconsistent about including these, e.g. (jon, john) is present, but (abbie, abbey) is not.

So in order to catch the (abbie, abbey) case, someone would need to do the abbie->abbigail lookup, and then the abbigail->abbey lookup, e.g.:

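def are_aliases(n1, n2):
    # nn is an existing nickname lookup instance.
    # n1 and n2 are aliases if some canonical name of n1 lists n2 as a nickname.
    for canon in nn.canonicals_of(n1):
        if n2 in nn.nicknames_of(canon):
            return True
    return False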

I'm thinking of some use cases; ideally all of them could be supported. Here is my take on expected behavior:

  • canonicals_of(jonathon) should just be {johnathon}, no jon or john included.
  • nicknames_of(jonathon) should be {johnathon, jon, john}
  • canonicals_of(jon): should this be merely {johnathon, jonathon}, or should it also include {john}?
  • nicknames_of(john) should be {jon}

What do you think of these test cases and expected outputs? Once we know the expected outputs, that can inform what data representation we should use.

If we went with my suggestion of listing individual pairs, then we could annotate the pairs with their level of casualness. But that is a whole other level of subjectivity we may want to avoid.
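To make the pair-based idea easier to discuss, here is a small sketch of a lookup built from explicit (canonical, nickname) pairs, together with the two-hop alias check. PairLookup and PAIRS are hypothetical, the pairs are picked only for illustration and are not taken from names.csv, and this is not how the package currently stores its data.

```python
from collections import defaultdict


class PairLookup:
    """Hypothetical lookup built from directed (canonical, nickname) pairs."""

    def __init__(self, pairs):
        self._nicknames = defaultdict(set)
        self._canonicals = defaultdict(set)
        for canonical, nickname in pairs:
            self._nicknames[canonical].add(nickname)
            self._canonicals[nickname].add(canonical)

    def nicknames_of(self, name):
        # Return a copy so callers cannot mutate the internal sets.
        return set(self._nicknames.get(name, ()))

    def canonicals_of(self, name):
        return set(self._canonicals.get(name, ()))


def are_aliases(lookup, n1, n2):
    # Two-hop check: n1 -> each of its canonical names -> their nicknames.
    return any(
        n2 in lookup.nicknames_of(canonical)
        for canonical in lookup.canonicals_of(n1)
    )


# Illustrative pairs only.
PAIRS = [
    ("abbigail", "abbie"),
    ("abbigail", "abbey"),
    ("johnathon", "jon"),
    ("johnathon", "john"),
]

lookup = PairLookup(PAIRS)
```

With this data, are_aliases(lookup, "abbie", "abbey") succeeds through the shared canonical name even though no (abbie, abbey) pair is stored. A casualness annotation could be added as a third element of each pair, at the cost of the subjectivity mentioned above.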

@carltonnorthern I'd love your thoughts here if you have the time. Thanks!
