aeshirey / nameparsersharp

License: GNU Lesser General Public License v2.1

C# 100.00%

nameparsersharp's Introduction

NameParserSharp

Based on the Python nameparser 0.36, NameParserSharp is a C# library that parses a human name into the constituent fields Title, First, Middle, Last, Suffix, and Nickname, exposed as properties of the HumanName class. For example:

var jfk = new HumanName("president john 'jack' f kennedy");

// jfk.Title == "president"
// jfk.First == "john"
// jfk.Middle == "f"
// jfk.Last == "kennedy"
// jfk.Nickname == "jack"

var jfkAlt = new HumanName("kennedy, president john (jack) f");

Assert.IsTrue(jfk == jfkAlt);

NameParserSharp implements the functionality of the Python project it is based on in idiomatic C#. It also:

eliminates nearly all regular expressions for efficiency
adds unit tests
improves nickname handling by accepting more delimiters: John (Jack) Torrence == John 'Jack' Torrence == John "Jack" Torrence
parses multiple names out of a single string, as in "mr john and mrs jane doe"

NameParserSharp is available as a NuGet package: Install-Package NameParserSharp

nameparsersharp's People

nikolaymakhonin yschiller eeschi dannycabrera redactedhash chrisckc edoust jamesrm9235 rwooters wayneseguin enkodellc rip-leo jiangsheng itzalive

nameparsersharp's Issues

Invalid Parsing

“JOSEPH J MA SR” is parsed as:

FNAME: JOSEPH
LNAME: J
SUFFIX: MA SR

instead of:

FNAME: JOSEPH
MNAME: J
LNAME: MA
SUFFIX: SR
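One possible heuristic for this case is to peel suffixes off the right-hand end of the name, but stop as soon as peeling would leave a bare initial such as "J" in the last-name slot. Under that rule "SR" is removed and "MA" stays as the last name. This is a minimal sketch under stated assumptions. The suffix list, `SplitName`, and `IsInitial` are hypothetical and are not NameParserSharp code.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical suffix list for illustration; the library keeps its own list.
var suffixes = new HashSet<string>(StringComparer.OrdinalIgnoreCase)
{
    "jr", "sr", "ii", "iii", "iv", "ma", "md", "phd", "esq"
};

// A bare initial such as "J" or "J." is never accepted as a last name.
bool IsInitial(string token) => token.TrimEnd('.').Length == 1;

(string First, string Middle, string Last, string Suffix) SplitName(string fullName)
{
    var tokens = fullName.Split(' ', StringSplitOptions.RemoveEmptyEntries).ToList();
    var suffixParts = new List<string>();

    // Peel suffixes from the right while at least a first and last name remain,
    // and stop if peeling would leave an initial as the last token.
    while (tokens.Count > 2
           && suffixes.Contains(tokens[^1].TrimEnd('.'))
           && !IsInitial(tokens[^2]))
    {
        suffixParts.Insert(0, tokens[^1]);
        tokens.RemoveAt(tokens.Count - 1);
    }

    string first = tokens.Count > 0 ? tokens[0] : "";
    string last = tokens.Count > 1 ? tokens[^1] : "";
    string middle = tokens.Count > 2
        ? string.Join(" ", tokens.Skip(1).Take(tokens.Count - 2))
        : "";
    return (first, middle, last, string.Join(" ", suffixParts));
}

Console.WriteLine(SplitName("JOSEPH J MA SR"));
```

The cost of this rule is that a name whose real last name is a single letter would be misread. That is rare, but possible.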

Parser throws ArgumentOutOfRangeException when the name is in parentheses (nickname only)

If the whole name is wrapped in parentheses, you get an ArgumentOutOfRangeException, e.g.:

var humanName = new HumanName("(FIRSTNAME MIDDLENAME LASTNAME)");

System.ArgumentOutOfRangeException: Index was out of range. Must be non-negative and less than the size of the collection.
Parameter name: index
at System.ThrowHelper.ThrowArgumentOutOfRangeException(ExceptionArgument argument, ExceptionResource resource)
at System.Collections.Generic.List`1.get_Item(Int32 index)
at NameParser.HumanName.ParseFullName() in c:\Users\Adam\Source\Repos\NameParserSharp\NameParser\NameParser\Parser.cs:line 394
at NameParser.HumanName.set_FullName(String value) in c:\Users\Adam\Source\Repos\NameParserSharp\NameParser\NameParser\Parser.cs:line 46
at NameParser.HumanName..ctor(String fullName)
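Until the parser handles this input, a caller could defensively strip a single pair of parentheses that wraps the entire string before parsing. The helper below is a minimal sketch of that workaround, and `StripEnclosingParens` is a hypothetical name. It removes the outer pair only when the opening parenthesis matches the final closing one, so an input like "(Jack) John (Doe)" is left untouched.

```csharp
using System;

// Hypothetical pre-processing helper, not part of NameParserSharp.
string StripEnclosingParens(string input)
{
    string s = input.Trim();
    if (s.Length < 2 || s[0] != '(' || s[^1] != ')')
        return s;

    int depth = 0;
    for (int i = 0; i < s.Length; i++)
    {
        if (s[i] == '(') depth++;
        else if (s[i] == ')') depth--;

        // The first parenthesis closed before the end, e.g. "(Jack) John (Doe)".
        if (depth == 0 && i < s.Length - 1)
            return s;
    }
    // Only strip when the parentheses are balanced and the outer pair spans everything.
    return depth == 0 ? s[1..^1].Trim() : s;
}

Console.WriteLine(StripEnclosingParens("(FIRSTNAME MIDDLENAME LASTNAME)"));
```

The result would then be passed to the constructor, e.g. `new HumanName(StripEnclosingParens(raw))`, where `raw` stands for the unprocessed input.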

Request: functionality to add to the lists

Love the functionality of this utility.

Our data contains slightly different variants of some suffixes, e.g. "2nd" or "3rd" instead of "Jr", "II", or "III". Rather than trying to anticipate every way users might enter their suffixes, it would be nice to let the consuming developer add their own entries when their data calls for them.
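Until the lists are configurable, one caller-side approach is to normalize known variants before the string reaches the parser. The sketch below assumes a hypothetical alias table owned by the consuming code, not a NameParserSharp setting. It rewrites any token that matches an alias, ignoring a trailing period, and leaves every other token alone.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical alias table owned by the consuming code.
var suffixAliases = new Dictionary<string, string>(StringComparer.OrdinalIgnoreCase)
{
    ["2nd"] = "II",
    ["3rd"] = "III",
    ["4th"] = "IV",
};

// Replaces every space-separated token that matches an alias (ignoring a trailing
// period) with its canonical form. Runs of spaces collapse to one as a side effect.
string NormalizeSuffixes(string fullName, IReadOnlyDictionary<string, string> aliases) =>
    string.Join(" ", fullName
        .Split(' ', StringSplitOptions.RemoveEmptyEntries)
        .Select(token => aliases.TryGetValue(token.TrimEnd('.'), out var canonical) ? canonical : token));

Console.WriteLine(NormalizeSuffixes("John Smith 2nd", suffixAliases));
```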

Additional prefixes (for config.cs)

I have some additional prefixes for you to make the list more complete:

den
dem
't
onder
op
het
in

First Name is a Known Title

I commented on issue #20, but my issue is slightly different.

The titles list includes "Junior", which, in my data at least, is also a first name.

I saw the fix you made with the Prefer enum and wondered whether you were considering the same for the title, so we could have a Prefer.FirstOverTitle?

It is actually an easy fix on the user's end, so there is no urgency; it would just be nice to set a flag for it.

// Workaround: when the parser puts a first name such as "Junior" into Title and leaves First empty,
// write the Title value into the First column instead (fields are delimited with "þ").
bool titleIsFirstName = testName.Title != "" && testName.First == "";
string title = titleIsFirstName ? "" : testName.Title;
string first = titleIsFirstName ? testName.Title : testName.First;
sb.Append(string.Join("þ", title, first, testName.Middle, testName.Last, testName.Suffix) + Environment.NewLine);

Nice tool, by the way. It's very good; I have tried a few of these and written a couple myself, and the variation in names is just a killer.

Support for netstandard

Requesting support for netstandard so this package can be used in netstandard and netcore libraries.

The caveat is that this will also require Visual Studio 2017 or later.

Req: Separate prefixes

This is not so much an issue as a feature request. I would like the prefixes kept separate from the last name. At the moment they are joined to the last name in the function 'join_on_conjunctions'.

The reason is sorting. For example, my own name is Sander van der Linden. Although my last name is 'van der Linden', it is sorted under L, not V. This is not uncommon, at least in European countries, and it would be much easier to do if the prefixes were separated. From what I gather, the following changes would be required. It seems to work, but you may want to check it against more test cases:

In the FullName property (around line 39/40), add _PrefixList = new List<string>();
Around line 74, add:

public string Prefix
{
    get { return string.Join(" ", _PrefixList); }
}

Around line 84:
private IList<string> _PrefixList;

Around line 138:

if (includeEmpty || !string.IsNullOrEmpty(Prefix))
    d["prefix"] = Prefix;

Around line 166:

private static bool ArePrefixes(string piece)
{
    // Only the first word needs to be a prefix when the piece contains several (requires System.Linq).
    return IsPrefix(piece.Split(' ').First());
}

Around line 272:

else if (ArePrefixes(piece))
    _PrefixList.Add(piece);

Line 517 should become the following (since there can be multiple prefixes, even when there is a suffix):
var newPiece = string.Join(" ", pieces.Skip(i + prefixes.Length).Take(j - i + 1));

Line 521: .Concat(new[] { string.Join(" ", prefixes) })
Line 522: .Concat(new[] { newPiece })
Lines 527-529:

pieces = pieces.Take(i)
    .Concat(new[] { string.Join(" ", prefixes) })
    .Concat(new[] { string.Join(" ", pieces.Skip(i + prefixes.Length)) })
    .ToList();
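The sorting goal can also be sketched independently of the parser's internals: split leading prefix tokens off a last name and sort on what remains. `SplitLastName` and the prefix set below are illustrative assumptions. The set is a small subset, not the library's list. The final token is always kept as the core, so a last name is never treated as entirely prefix.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Illustrative prefix subset; NameParserSharp keeps its own, longer list.
var lastNamePrefixes = new HashSet<string>(StringComparer.OrdinalIgnoreCase)
{
    "van", "von", "der", "den", "de", "la", "del", "op", "het", "onder"
};

// Splits leading prefix tokens off a last name: "van der Linden" -> ("van der", "Linden").
// The final token always stays in Core.
(string Prefix, string Core) SplitLastName(string lastName)
{
    var tokens = lastName.Split(' ', StringSplitOptions.RemoveEmptyEntries);
    int prefixCount = 0;
    while (prefixCount < tokens.Length - 1 && lastNamePrefixes.Contains(tokens[prefixCount]))
        prefixCount++;
    return (string.Join(" ", tokens.Take(prefixCount)), string.Join(" ", tokens.Skip(prefixCount)));
}

// Sorting on Core files "van der Linden" under L rather than V.
var sorted = new[] { "van der Linden", "Jansen", "de Vries" }
    .OrderBy(name => SplitLastName(name).Core, StringComparer.OrdinalIgnoreCase)
    .ToList();
Console.WriteLine(string.Join(", ", sorted));
```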

Invalid Name Parsing

Love this library (thank you!).

I found a case where the name "VAN L JOHNSON" puts the entire name into the first name field, instead of First: "VAN", Middle: "L", Last: "JOHNSON".

First name is a known prefix

I greatly appreciate your project. In my data I have a Mr. Del Richards. The parser returns "Del Richards" as the last name because "del" is in the prefix list; however, Del is his first name. I'm thinking it could treat the prefix as the first name when there is no other first name, although there will be times when a name genuinely has no first name. Any thoughts on how to parse this name correctly?
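Both this case and "VAN L JOHNSON" fit one possible heuristic: after any leading title, the first remaining token is always the first name, even when it is a known last-name prefix. Prefixes then attach to the last name only from the second token onward. This is a hypothetical sketch, not NameParserSharp's logic. The word lists and `ParseName` are assumptions. The trade-off is that a name with no first name, such as "Mr. de la Rosa", would come out as First "de", Last "la Rosa".

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical word lists, for illustration only.
var titles = new HashSet<string>(StringComparer.OrdinalIgnoreCase) { "mr", "mrs", "ms", "dr" };
var prefixes = new HashSet<string>(StringComparer.OrdinalIgnoreCase) { "van", "von", "de", "del", "la" };

(string Title, string First, string Middle, string Last) ParseName(string fullName)
{
    var tokens = fullName.Split(' ', StringSplitOptions.RemoveEmptyEntries).ToList();

    // A single leading title such as "Mr." is taken off first.
    string title = "";
    if (tokens.Count > 0 && titles.Contains(tokens[0].TrimEnd('.')))
    {
        title = tokens[0];
        tokens.RemoveAt(0);
    }
    if (tokens.Count == 0) return (title, "", "", "");
    if (tokens.Count == 1) return (title, "", "", tokens[0]);

    // The first remaining token is the first name even if it is a prefix ("Del", "VAN").
    // Walk left from the end, pulling prefix tokens into the last name, but never token 0.
    int lastStart = tokens.Count - 1;
    while (lastStart > 1 && prefixes.Contains(tokens[lastStart - 1]))
        lastStart--;

    string middle = string.Join(" ", tokens.Skip(1).Take(lastStart - 1));
    string last = string.Join(" ", tokens.Skip(lastStart));
    return (title, tokens[0], middle, last);
}

Console.WriteLine(ParseName("Mr. Del Richards"));
```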

D'Juan O'Connor does not parse properly

Library Version
1.5.0

Describe the bug
D'Juan O'Connor

(Made up test name)

Actual & Expected behavior
What is your actual and expected result? Ideally, include code that can be copied & pasted directly into the test project

Expected result: FirstName: D'Juan, LastName: O'Connor, Nickname:
Actual result: FirstName: DConnor, LastName: , Nickname: JuanO

Applicability
Is your use-case unique, or is it likely to affect many other people?

This would affect any name containing exactly two apostrophes that aren't nickname delimiters.

Is there a general property of your input data that might help describe it or guide a resolution? (eg, input is guaranteed to have a first name; need to handle different last name prefixes than are standard)

The nickname logic possibly needs to be cleaned up to look for space, apostrophe, nickname, apostrophe, space, instead of just any pair of apostrophes.
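A token-based version of that idea is sketched below. This is an assumption-labeled illustration, not NameParserSharp's implementation, and `ExtractNickname` is a hypothetical name. A nickname opens only where a token starts with `'`, `"`, or `(`, meaning at the start of the string or after a space. It closes only where a token ends with the matching delimiter. Apostrophes inside words, as in "D'Juan O'Connor", are therefore ignored. A quoted nickname at the very start of the string is still found.

```csharp
using System;
using System.Linq;

// Returns the name with the nickname removed, plus the nickname itself ("" if none).
(string Rest, string Nickname) ExtractNickname(string fullName)
{
    var tokens = fullName.Split(' ', StringSplitOptions.RemoveEmptyEntries);
    for (int i = 0; i < tokens.Length; i++)
    {
        // The opening delimiter must start a token, so the apostrophes inside
        // "D'Juan" and "O'Connor" are never mistaken for nickname quotes.
        char close = tokens[i][0] switch
        {
            '\'' => '\'',
            '"' => '"',
            '(' => ')',
            _ => '\0',
        };
        if (close == '\0') continue;

        for (int j = i; j < tokens.Length; j++)
        {
            // The closing delimiter must end a token; a one-token nickname needs at
            // least two characters so a lone "'" cannot both open and close it.
            string t = tokens[j];
            if (t[^1] != close || (j == i && t.Length < 2)) continue;

            string quoted = string.Join(" ", tokens.Skip(i).Take(j - i + 1));
            string rest = string.Join(" ", tokens.Take(i).Concat(tokens.Skip(j + 1)));
            return (rest, quoted[1..^1].Trim());
        }
    }
    return (string.Join(" ", tokens), "");
}

Console.WriteLine(ExtractNickname("D'Juan O'Connor"));
Console.WriteLine(ExtractNickname("'TREY' ROBERT HENRY BUSH III"));
```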

Nickname parsing fails when single quotes are used and the nickname is at the beginning of the string

This WILL parse correctly with TREY as the Nickname:
"TREY" ROBERT HENRY BUSH III

This WILL NOT parse correctly:
'TREY' ROBERT HENRY BUSH III
-- Ends up with First = TREY' (the closing single quote is left in the string, probably because the opening quote gets trimmed prematurely)

Note that if the nickname in single quotes is not at the start of the string, it parses correctly:
ROBERT 'TREY' HENRY BUSH III

Incorrectly identifying name parts

I was trying to develop a name parsing routine myself when I came across this repo, and I think it will be tremendously helpful for my use case. However, I've run across a couple of instances that seem to be tripping the parser up a bit.

The first one is how the parser handles names that actually include one of the pre-defined titles as a part of the name. For example, the name "Robert Lee Elder III" does not parse correctly. In this case, it should be:

First Name: Robert
Middle Name: Lee
Last Name: Elder
Suffix: III

However, what I actually get is:

Title: Elder
First Name: Robert
Middle Name: Lee
Last Name: III

Obviously, I can simply remove this entry from the list of titles and recompile for my purposes today, but it would be better if it's possible to rework the logic to ignore the title if it's determined that the original name is in First [Middle] Last format and the "title" is found at or towards the end. I realize that logic can get extremely complicated - that's the main reason I was trying to find someone else's code to handle it for me - but it would make this much more reliable.
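One way to express "ignore the title towards the end" is to recognize title words only as a run at the very start of the name. Once a non-title token has been seen, a later "Elder" is left in place as an ordinary name part. This is a minimal sketch under stated assumptions. The title list and `SplitLeadingTitle` are hypothetical and do not reflect how NameParserSharp currently works.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical title list for illustration.
var knownTitles = new HashSet<string>(StringComparer.OrdinalIgnoreCase)
{
    "mr", "mrs", "dr", "president", "elder"
};

// Only a run of title words at the very start counts as the title; later title
// words ("Elder" in "Robert Lee Elder III") stay in the name. At least one token
// is always left for the name itself.
(string Title, string Rest) SplitLeadingTitle(string fullName)
{
    var tokens = fullName.Split(' ', StringSplitOptions.RemoveEmptyEntries);
    int n = 0;
    while (n < tokens.Length - 1 && knownTitles.Contains(tokens[n].TrimEnd('.')))
        n++;
    return (string.Join(" ", tokens.Take(n)), string.Join(" ", tokens.Skip(n)));
}

Console.WriteLine(SplitLeadingTitle("Robert Lee Elder III"));
Console.WriteLine(SplitLeadingTitle("Elder Robert Smith"));
```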

The second is something the parser, I realize, is not currently designed to handle: business names. For my own needs I've added some tests that seem to handle that, but it might be worth considering in a future release. This would involve identifying the abbreviations and designators used by businesses (LLC, Corp, Assn, etc.).
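A crude first pass at business detection is to check whether the final token is a common business designator. If it is, the input can be routed away from the person-name parser. The designator list and `LooksLikeBusiness` below are assumptions for illustration. Real data would need a longer list, and probably checks beyond the last token, e.g. "Acme Corp of America".

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical designator list; extend it for your own data.
var businessDesignators = new HashSet<string>(StringComparer.OrdinalIgnoreCase)
{
    "llc", "inc", "corp", "co", "ltd", "assn", "lp", "llp"
};

// True when the final token (ignoring surrounding periods and commas) is a designator.
bool LooksLikeBusiness(string input)
{
    var last = input.Split(' ', StringSplitOptions.RemoveEmptyEntries).LastOrDefault();
    return last != null && businessDesignators.Contains(last.Trim('.', ','));
}

Console.WriteLine(LooksLikeBusiness("Acme Widgets, LLC"));
```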

NuGet DLL version is incorrect

I've just upgraded to 1.2.0.0 (thanks for the updates btw). I've got a build running in TeamCity for my project that is using NameParserSharp, and I am using NuGet Package restore. I noticed that your DLLs have the wrong version number. Shouldn't it match the release number?

Not a major issue, but it looks a little odd because the version number never increases between releases. It could also cause DLL versioning issues for users.

If you have an automated build process using TeamCity or Maven, you can patch the assembly version automatically as part of the automated build process: https://essenceofcode.com/2013/09/18/assembly-versioning-with-team-city-and-git/

Name with suffix and last name prefix fail to parse correctly

Names that contain both a suffix and a last name prefix, "Quincy De La Rosa Sr" for example, do not parse correctly: the suffix ends up both in the last name and in the suffix.

I've added requisite unit tests and submitted a pull request that resolves the issue: #12

Improvement: A few examples that aren't parsing properly

First of all, this is an excellent library and in most of our use cases it is working really well.

I have a number of edge case examples where it isn't parsing correctly, namely those with joint names:

N.B. output is string.Format (Title, First, Last)

Mr S Bloggs and Miss L Jones => Mr S Jones
Mrs K Bloggs and Mr S Jones => Mrs K Jones
Mr R & Mrs J Bloggs => Mr R Bloggs
Mr Ian Bloggs & Mrs Elizabeth Bloggs => Mr Ian Bloggs
Mrs M L Jones => Mrs L Jones

Probably edge cases, but I'd thought I'd drop you a note.
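For inputs like these, one caller-side workaround is to split on "and" or "&" first and parse each part separately. A part that has only one non-title token, such as "Mr R", borrows the last name of the final part. The sketch below assumes a small hypothetical title list, and `SplitJointNames` is a hypothetical name. It is not how NameParserSharp itself handles multiple names.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical title list for illustration.
var jointTitles = new HashSet<string>(StringComparer.OrdinalIgnoreCase) { "mr", "mrs", "miss", "ms", "dr" };

// Splits "A and B" / "A & B" into separate names. A part with only one non-title
// token (e.g. "Mr R") borrows the last token of the final part as its last name.
List<string> SplitJointNames(string input)
{
    var parts = new List<List<string>> { new List<string>() };
    foreach (var token in input.Split(' ', StringSplitOptions.RemoveEmptyEntries))
    {
        if (token == "&" || token.Equals("and", StringComparison.OrdinalIgnoreCase))
            parts.Add(new List<string>());
        else
            parts[^1].Add(token);
    }
    parts.RemoveAll(p => p.Count == 0);
    if (parts.Count == 0) return new List<string>();

    string sharedLast = parts[^1][^1];
    return parts.Select(p =>
    {
        int nameTokens = p.Count(t => !jointTitles.Contains(t.TrimEnd('.')));
        return nameTokens == 1 && p != parts[^1]
            ? string.Join(" ", p.Append(sharedLast))
            : string.Join(" ", p);
    }).ToList();
}

Console.WriteLine(string.Join(" | ", SplitJointNames("Mr R & Mrs J Bloggs")));
```

Each resulting string could then be parsed on its own.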

Request to accept two names in the first name

Some people have two words in their first name, e.g. "John Doe A. Smith".

Expected:
Both words should be extracted, since together they form his first name.

Actual:
Only the first word is extracted; the second ends up in the middle name, even though it is not his middle name ("A." is his middle initial).
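Without outside knowledge there is no general way to tell a two-word first name from a first name plus a middle name. A middle initial is a useful signal, though. The sketch below treats every token before the first mid-name initial as the first name. Without an initial, it falls back to a one-word first name. It is a hypothetical helper (`SplitAroundInitial`), not library behavior.

```csharp
using System;
using System.Linq;

// Heuristic: if an initial ("A." or "A") appears after position 0 and before the
// last token, everything before it is the first name.
(string First, string Middle, string Last) SplitAroundInitial(string fullName)
{
    var tokens = fullName.Split(' ', StringSplitOptions.RemoveEmptyEntries);
    if (tokens.Length < 3)
        return (tokens.FirstOrDefault() ?? "", "", tokens.Length == 2 ? tokens[1] : "");

    // Search only the interior tokens (indices 1 .. Length - 2).
    int initial = Array.FindIndex(tokens, 1, tokens.Length - 2, t => t.TrimEnd('.').Length == 1);
    int firstEnd = initial > 0 ? initial : 1; // exclusive end of the first-name tokens
    return (
        string.Join(" ", tokens.Take(firstEnd)),
        string.Join(" ", tokens.Skip(firstEnd).Take(tokens.Length - firstEnd - 1)),
        tokens[^1]);
}

Console.WriteLine(SplitAroundInitial("John Doe A. Smith"));
```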
