carltonnorthern / nicknames
A CSV file with US given names (first name) and their associated nicknames or diminutive names.
License: Apache License 2.0
Calling
Nicknamer()
Nicknamer()
gives
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/nickcrews/Library/Application Support/hatch/env/virtual/noatak-UM6-FHel/noatak/lib/python3.9/site-packages/nicknames/__init__.py", line 36, in __init__
    nickname_lookup = _lookup_nicknames_default()
  File "/Users/nickcrews/Library/Application Support/hatch/env/virtual/noatak-UM6-FHel/noatak/lib/python3.9/site-packages/nicknames/__init__.py", line 120, in _lookup_nicknames_default
    with DEFAULT_NICKNAME_RESOURCE as f:
  File "/Users/nickcrews/.pyenv/versions/3.9.4/lib/python3.9/contextlib.py", line 115, in __enter__
    del self.args, self.kwds, self.func
AttributeError: args
because of the way the package resource is used. Investigating.
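One plausible reading of the traceback, offered as an assumption rather than a confirmed diagnosis: a context manager created through contextlib's @contextmanager machinery is single-use, so entering the same object a second time fails inside __enter__. The sketch below reproduces that pattern with a hypothetical open_resource() helper; it is not the package's actual code.

```python
from contextlib import contextmanager


@contextmanager
def open_resource():
    """Hypothetical stand-in for whatever opens the packaged CSV."""
    yield ["alex,alexander"]


# Assumption: the failing code keeps ONE context-manager object at module level.
SHARED_RESOURCE = open_resource()


def load_shared():
    # Works once; a second call re-enters an already used context manager.
    with SHARED_RESOURCE as rows:
        return list(rows)


def load_fresh():
    # Creating a new context manager on every call avoids the reuse problem.
    with open_resource() as rows:
        return list(rows)


first = load_shared()
try:
    load_shared()
    second_call_failed = False
except (AttributeError, RuntimeError):
    second_call_failed = True

print(first, second_call_failed)
```

If this is indeed the cause, building the resource inside the loading function (as load_fresh does) would let the constructor be called any number of times.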
I'm opening this because 'alex' has 'alexander'. Seems like an oversight.
The existing name is quite verbose. A rename would also make it match the Python package name. I think all existing links should be redirected.
Hi,
Thanks for creating this useful package. It seems the documentation is out of date or out of sync with the package on PyPI:
Python 3.9.12 (main, Apr 5 2022, 01:53:17)
>>> from us_nicknames import NickNamer
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ImportError: cannot import name 'NickNamer' from 'us_nicknames'
I do have the latest version (pip installed today):
>>> import us_nicknames
>>> us_nicknames.__version__
'0.1.2'
I don't understand how this file is structured. If I want to find the name associated with "Dicky," how would I do that aside from looking through the file manually?
And why are some names spread across multiple lines? Shouldn't each name appear on only one line? Example:
russ,russell
russell,russ,rusty
rusty,russell
Shouldn't those closely associated names be on one line?
I tried the Perl script out but all it did was display how many names had more than 5 mentions in the file.
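One way to answer the "Dicky" question without scanning the file by hand is to build a reverse index from each nickname to the names it is listed under. The sketch below assumes each row starts with a given name followed by its nicknames, which is how the russ/russell/rusty rows above appear to read. The richard row is a made-up example and is not quoted from the actual file.

```python
import csv
import io
from collections import defaultdict

# Inline sample data. The first three rows come from the question above;
# the last row is hypothetical and only illustrates the lookup.
SAMPLE_CSV = """\
russ,russell
russell,russ,rusty
rusty,russell
richard,dick,dicky,rich
"""


def build_reverse_index(lines):
    """Map each nickname to the set of first-column names that list it."""
    index = defaultdict(set)
    for row in csv.reader(lines):
        names = [cell.strip().lower() for cell in row if cell.strip()]
        if len(names) < 2:
            continue  # skip blank rows and rows without any nickname
        head, *nicknames = names
        for nickname in nicknames:
            index[nickname].add(head)
    return index


reverse_index = build_reverse_index(io.StringIO(SAMPLE_CSV))
print(sorted(reverse_index["dicky"]))
print(sorted(reverse_index["russell"]))
```

With the real file, the same function could be fed an open file handle instead of the inline sample.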
traci = tracy (which you already have), tracie
falon = fallon, Fal, Fall, Fallie, Fally, Falcon, Lon, Lonnie (https://momlovesbest.com/fallon-name-meaning)
hillary = hilary
toni = tony, antonia, etc.
lindsay = lindsey, lindsie, lindsy
garrett = Barrett, Gare, Garrison, Gars, Gary, Jerry, Rhett, Variations: Garratt, Garret, Garrod, Jarrett, Jared, Jarratt, Jerrold (https://momlovesbest.com/garrett-name-meaning)
gareth = gary, gare
dacia = Daycia, Daisha, Dacya
marc = mark, marcus, etc.
sheri = sherry, sherryl, sheryl, sherri, cheri, cherie, etc.
dianne = diane, dian
angelika = angelica
miguel = Miguell, Miguael, Miguaell, Miguail, Miguaill, Miguayl, Miguayll = michael/mick (spanish version)
monika = monica, monique
michele = michelle
shelley = sheley, michelle, shellie, etc.
hayley = hailey, haylee, etc.
karl = carl
rosemary = rosemarie, marie, mary, rose, etc.
jalen = Jay, Jaye, Len, Lenny, Lennie, Jaylin, Alen, Al, Jaylen, Jaelen, Jaelin, Jaelyn, Jailyn, Jaylyn
rachael = rachel
kellie = kelli, kelly, kelley
kalli = kali, cali
jodi = jody
lori = lorrie, laurie, lorelei, etc.
shawn = shaun
allen = allan, alan, al
erika = erica
marcia = marcie, marsha
dona = donna
kristi = kristy, Christy, christine, christina, krista, etc.
norman = norm
chelsie = chelsey
stephine = stephanie, stephany, stephani
audree = audrey
kerri = kerry
fiona = fionna
savanna = savannah
bryanna = brianna, bri, briana, etc.
jaine = jane, jayne
leilani = lani
jesse = jessica, jess, jessie
abby = abbie
glenn = glen
carri = carrie, kari, kara
donn = don, donald
kym = kymberly, kim, kimberly, kimberli
gerri, geri = geraldine
nichole = nicky, nicki, nicholette, nicci, nicole
jamey = jaime, jamie
tami = tammie, tammy
derek = derick, derrick, derrek, rick, etc.
jenni = jennie, jenny
karin = karen
gabriela = gabriella
marni = marnie
dena = deena, dina, adina, adena
brittnie = brittany
juston = justin
lesli = leslie, lesley, les
kev = kevin
aga = athaga
carla = karla, carly
tiffanee = tiffany
staci = stacy, stacey, stacie
sara = sarah
katia = kate, katie
terri = teri, terrie, terry
ashly = ashley
jeanie = jeannie
matt = matthew, matthews
jillian = jill
laurel = laurie
(these all came from a registration list I'm working on)
Adding to Margaret and William.
Original issue reported on code.google.com by [email protected] on 2 Mar 2014 at 11:46
Not an issue per se, but would you consider adding Latino/Latina names to this package?
Thanks for this -- really finding it useful.
I found an instance of Allie used for Allison in my dataset.
There are some problems with names.csv.
For example, Jon is the shortened form of Jonathan. But it is not mentioned in names.csv.
See mdahlman@fb82614
Hi! This looks to me to be one of the better maintained datasets of diminutive names on GitHub. It would be easier to use from Python if it were released on PyPI, so people could run pip install nicknames (surprisingly, this package name isn't taken; we could definitely choose another name too).
If I open a PR for this, would you be open to it? I'd add a GitHub Action similar to this one that would build and release the wheels automatically on a git tag. Your day-to-day admin overhead would be minimal: you'd just have to set up a PyPI account and add the access token to this repo's Secrets once. I can help with this too if you want.
Thank you!
I'll start by saying that having names-mysql.sql is far better than not having it. Thanks to the guys that created it.
But there are a few aspects I don't like about it. I'm thinking of adding some improvements. I would very much like feedback from others about what would be most useful. Here's my mini-spec:
names-mysql.sql available today.
names.csv is updated.
Ideas for the file names:
names.sql
names-normalized.sql
generate-sql.py
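As a rough illustration of what a script like generate-sql.py might do, the sketch below turns CSV rows into INSERT statements for a normalized table with one row per (name, nickname) pair. The table name, column names, and column sizes are assumptions made for this example; the mini-spec above does not define them.

```python
import csv
import io

SAMPLE_CSV = "russell,russ,rusty\nrusty,russell\n"


def sql_quote(value):
    """Escape single quotes so the value is safe inside an SQL string literal."""
    return value.replace("'", "''")


def generate_sql(lines, table="nicknames"):
    """Return SQL statements for a hypothetical normalized pair table."""
    statements = [
        f"CREATE TABLE {table} ("
        "name VARCHAR(64) NOT NULL, "
        "nickname VARCHAR(64) NOT NULL, "
        "PRIMARY KEY (name, nickname));"
    ]
    seen = set()
    for row in csv.reader(lines):
        names = [cell.strip().lower() for cell in row if cell.strip()]
        if len(names) < 2:
            continue  # nothing to pair up on this row
        head, *nicknames = names
        for nickname in nicknames:
            if (head, nickname) in seen:
                continue  # avoid duplicate primary keys
            seen.add((head, nickname))
            statements.append(
                f"INSERT INTO {table} (name, nickname) "
                f"VALUES ('{sql_quote(head)}', '{sql_quote(nickname)}');"
            )
    return statements


statements = generate_sql(io.StringIO(SAMPLE_CSV))
print("\n".join(statements))
```

Generating the SQL from names.csv in this way would keep the two files from drifting apart, since the CSV stays the single source of truth.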
Currently the map is declared with:
public Map<String, String> dimNames = new HashMap<String, String>();
but that won't compile, because the code needs a map keyed on String whose value type is a List of Strings. So this is the correct declaration:
Map<String, List<String>> dimNames = new HashMap<>();
Nice list - what is the best way for suggesting new aliases/nicknames?
There are 4 possible combos of "formalness", and how we typically treat them:
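So in order to catch the (abbie, abbey) case, someone would need to do the abbie->abbigail lookup, and then the abbigail->abby lookup, e.g.:
# nn is the nickname lookup object that provides canonicals_of() and nicknames_of()
def are_aliases(n1, n2):
    # n1 and n2 are aliases if some canonical name of n1 lists n2 as a nickname
    for canon in nn.canonicals_of(n1):
        if n2 in nn.nicknames_of(canon):
            return True
    return False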
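To make the two-hop lookup concrete, here is a self-contained sketch. ToyNickNamer is a hypothetical stand-in written for this example, not the package's class; only the method names canonicals_of and nicknames_of are taken from the discussion, and the lookup table is toy data.

```python
class ToyNickNamer:
    """Hypothetical stand-in that exposes the two lookups discussed above."""

    def __init__(self, table):
        # canonical name -> set of nicknames
        self._nicknames = {canon: set(nicks) for canon, nicks in table.items()}
        # nickname -> set of canonical names (the reverse mapping)
        self._canonicals = {}
        for canon, nicks in self._nicknames.items():
            for nick in nicks:
                self._canonicals.setdefault(nick, set()).add(canon)

    def nicknames_of(self, name):
        return self._nicknames.get(name, set())

    def canonicals_of(self, name):
        return self._canonicals.get(name, set())


# Toy data for illustration only.
nn = ToyNickNamer({"abbigail": {"abbie", "abbey", "abby"}})


def are_aliases(n1, n2):
    # Hop 1: nickname -> canonical names. Hop 2: canonical -> nicknames.
    for canon in nn.canonicals_of(n1):
        if n2 in nn.nicknames_of(canon):
            return True
    return False


print(are_aliases("abbie", "abbey"))
print(are_aliases("abbie", "abbigail"))
```

Note that with this definition a nickname and its own canonical name are not reported as aliases, because the canonical name is not in its own nickname set; whether that is desirable is part of the question being discussed.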
I'm thinking of some use cases; ideally all of them could be supported. Here is my take on expected behavior:
canonicals_of(jonathon) should just be {johnathon}, no jon or john included.
nicknames_of(jonathon) should be {johnathon, jon, john}
canonicals_of(jon): should this be merely {johnathon, jonathon}, or should it also include {john}?
nicknames_of(john) should be {jon}
What do you think of these test cases and expected outputs? Once we know the expected outputs, that can inform what data representation we should use.
If we went with my suggestion of listing individual pairs, then we could annotate the pairs with their level of casualness. But that is a whole other level of subjectivity we may want to avoid.
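A minimal sketch of the pair-based representation, to show how both lookups could be derived from one list. The NamePair type, the casualness scale, and the specific values are all invented for illustration; nothing here reflects an agreed design or real data.

```python
from collections import defaultdict
from typing import NamedTuple


class NamePair(NamedTuple):
    canonical: str
    nickname: str
    casualness: int  # hypothetical scale: 1 = common, higher = more casual


# Made-up pairs and casualness values, for illustration only.
PAIRS = [
    NamePair("johnathon", "jon", 1),
    NamePair("johnathon", "john", 1),
    NamePair("jonathon", "jon", 1),
    NamePair("john", "jon", 2),
]


def build_lookups(pairs, max_casualness=None):
    """Derive nicknames_of and canonicals_of mappings from the pair list."""
    nicknames = defaultdict(set)
    canonicals = defaultdict(set)
    for pair in pairs:
        if max_casualness is not None and pair.casualness > max_casualness:
            continue  # drop pairs that are more casual than requested
        nicknames[pair.canonical].add(pair.nickname)
        canonicals[pair.nickname].add(pair.canonical)
    return dict(nicknames), dict(canonicals)


nicknames_all, canonicals_all = build_lookups(PAIRS)
nicknames_strict, canonicals_strict = build_lookups(PAIRS, max_casualness=1)

print(sorted(canonicals_all["jon"]))
print(sorted(canonicals_strict["jon"]))
```

With an annotation like this, the open question of whether canonicals_of(jon) should include john becomes a filter setting rather than a property of the data, at the cost of having to assign the subjective casualness values in the first place.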
@carltonnorthern I'd love your thoughts here if you have the time. Thanks!