Comments (6)
Hi! Thanks for opening this! A few things:
- The relationship between canonical and nickname is not symmetrical: Matt is a nickname for Matthew, but not vice versa. matt and Matthew are already present in the data. Are you sure you're using the library correctly?
- I want to be conservative with what links are added, so that there aren't false positives. For instance, I'm skeptical of how common Barrett-Garrett is. 95% of your suggestions look good, but I want to leave out a few of the weirder ones.
- I can add these cases to the code, but only if you help me with the grunt work of formatting them, putting them in the form CANONICAL,NICK0,NICK1,NICK2, etc.
Let me know what you'd like to do!
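For anyone doing the formatting work, here is a minimal sketch of grouping suggested pairs into CANONICAL,NICK0,NICK1,... lines. The function name `to_csv_lines` is hypothetical and not part of this library, and lowercasing every name is an assumption based on the lowercase examples in this thread.

```python
from collections import defaultdict


def to_csv_lines(pairs):
    """Group (canonical, nickname) pairs into CANONICAL,NICK0,NICK1,... lines.

    Hypothetical helper for illustration only; names are lowercased and
    stripped, which is an assumption about the file's conventions.
    """
    grouped = defaultdict(list)
    for canonical, nick in pairs:
        canonical, nick = canonical.strip().lower(), nick.strip().lower()
        # Skip self-links and duplicates, keeping first-seen order.
        if nick != canonical and nick not in grouped[canonical]:
            grouped[canonical].append(nick)
    return [
        ",".join([canonical, *nicks])
        for canonical, nicks in sorted(grouped.items())
        if nicks
    ]


suggestions = [
    ("Matthew", "Matt"),
    ("matthew", "matty"),
    ("Sarah", "Sara"),
    ("Matthew", "matt"),  # duplicate, dropped
]
for line in to_csv_lines(suggestions):
    print(line)
```

Each pair is one-directional, so a symmetrical case would need to be listed twice, once in each direction.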
from nicknames.
I'm unclear on the file structure and how to decide whether something is a name or a nickname (e.g., is Kari a real name or a nickname for Carrie?). I would need more info to help with this.
I'm OK if stuff gets left out; the point of my issue report was to try to close some of the holes. Some of the names I'd never seen before either, but then I looked them up and found they were common in other countries (e.g., Garrett is big in Ireland).
The file structure is CANONICAL,NICK0,NICK1,NICK2,NICK3, etc., as you can see in the csv. Does that make sense?
Yeah, the Kari/Carrie case is ambiguous. I would lean towards Carrie being the longer one and therefore the canonical one. But for the Sara/Sarah case, I think that is symmetrical, so you should have a line sarah,sara as well as sara,sarah. Just try your best and I can go through and give my 2 cents and we should be able to find something. Just trying to make it better than how it currently is, it doesn't need to be perfect.
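To make the format and the asymmetry concrete, here is a rough sketch of reading such lines into a lookup table. `load_nicknames` and `SAMPLE` are hypothetical names, not this library's API, and the sample lines are illustrative rather than copied from names.csv.

```python
import csv
import io


def load_nicknames(text):
    """Parse CANONICAL,NICK0,NICK1,... lines into {canonical: set of nicknames}.

    Hypothetical helper for illustration; lowercasing and stripping
    whitespace around each cell are assumptions.
    """
    table = {}
    for row in csv.reader(io.StringIO(text)):
        names = [cell.strip().lower() for cell in row if cell.strip()]
        if len(names) < 2:
            continue  # a line with no nicknames carries no link
        canonical, *nicks = names
        table.setdefault(canonical, set()).update(nicks)
    return table


# Illustrative lines only: the symmetrical Sara/Sarah case appears twice,
# once in each direction.
SAMPLE = """matthew,matt
carrie,kari
sarah,sara
sara,sarah
"""

table = load_nicknames(SAMPLE)
print("matt" in table["matthew"])  # matt is listed under matthew
print("matt" in table)             # matt is not a canonical name here
print(table["sarah"], table["sara"])
```

Because the first field is always the canonical name, a link only runs both ways when both lines are present.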
names.csv
This is close. It may not cover all the possible canonicals, but it is enough that code could also look in the nicknames to find related canonical names.
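One way to read "look in the nicknames" is a reverse lookup from nickname to canonical names. This is a sketch under that assumption; the function names and the small `table` are hypothetical and not taken from the library or from names.csv.

```python
from collections import defaultdict


def canonicals_by_nickname(table):
    """Invert {canonical: nicknames} into {nickname: canonicals}."""
    reverse = defaultdict(set)
    for canonical, nicks in table.items():
        for nick in nicks:
            reverse[nick].add(canonical)
    return dict(reverse)


def related_names(name, table):
    """Names linked to `name` in either direction (hypothetical helper)."""
    name = name.strip().lower()
    reverse = canonicals_by_nickname(table)
    return (table.get(name, set()) | reverse.get(name, set())) - {name}


# Illustrative table, not the real data.
table = {"matthew": {"matt"}, "carrie": {"kari"}}
print(related_names("Matt", table))
print(related_names("matthew", table))
```

With this, a name that only ever appears as a nickname still resolves to something, even if it has no canonical line of its own.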
Thank you @afeibus! I made a few adjustments, but most of them looked great. Your work is very much appreciated! If you want, take a look at the above linked change and double check that I didn't make any changes to your edits that you disagree with.
Closing as done, but if you find any problems with the tweaks I made, please raise a new issue (and link to this one).