ua-parser / uap-core

The regex file necessary to build language ports of Browserscope's user agent parser.

License: Other

Perl 8.03% JavaScript 91.97%

uap-core's People

commenthol tobie r4fek riiiqpl winnersoftlab chadbailey59 diminished davemarchevsky ishwar-kumar forceward ajvondrak vanchinathan83 yozik04 sahat frantz obormot russellwhitaker codebynumbers lone nicole-ashley housepage adorsys netproteus shiftcars sibiyes itoto sociablelabs alonbg modulexcite josealvarezmuguerza sunnynot xplenty selwin jstangroome staslos die-tageszeitung holidayextras glogiotatidis brightroll themackworth seanwang1123 nico-at-worxx mattrobenolt gkalabin efouts terrybrown baigang sjokim nerdbaggy zires nikunjb goosewobbler dn009757 tsuikm simudream plutoshe sinaad msokk clippit salmani jiunjiunma ioninteractive thadafinser duforetntabmo itismewxg seanevans keccah escribano axelcho courage-zen guavatak acelan86 adrifelt shaun-stripe lubosan80 synchro gamoshi-archives bportnoy changhaifeng xbaran frazer-jin jazd dbeckham mldbai aflatter thomas-lehmann k-wojcik shikev rhysparry reticool mekegi wislow rockrotem danm huncrys yurigorokhov holographix mailspice sape-ru sonifi

uap-core's Issues

Windows 10 should be parsed as osFamily=Windows osMajor=10

Right now it's parsed as its own family, no longer part of "Windows".
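A minimal sketch of the requested behaviour in Python. It assumes a hypothetical standalone rule rather than the actual regexes.yaml entry: match the `Windows NT 10.0` token and report family `Windows` with major `10`. In regexes.yaml terms this would presumably be expressed with `os_replacement` and `os_v1_replacement` on an `os_parsers` entry.

```python
import re

# Hypothetical rule: "Windows NT 10.0" -> family "Windows", major "10",
# instead of a separate "Windows 10" family.
WINDOWS_NT_10 = re.compile(r"Windows NT 10\.0")


def parse_os(ua: str) -> dict:
    """Return a uap-style OS dict for Windows 10, or family 'Other'."""
    if WINDOWS_NT_10.search(ua):
        return {"family": "Windows", "major": "10", "minor": None}
    return {"family": "Other", "major": None, "minor": None}


print(parse_os("Mozilla/5.0 (Windows NT 10.0; Win64; x64)"))
```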

Monitoring plugins

Hi folks,
we use a monitoring tool that watches our infrastructure. It uses checks from the Monitoring Plugins project. The HTTP check identifies itself as:
check_http/v1.4.16 (nagios-plugins 1.4.16)
check_http/v2.0 (nagios-plugins 2.0)

This is what I have added to regexes.yaml to make it parse:

regex: '^(check_http)/v(\d+)\.(\d+)(?:\.(\d+))?'
family_replacement: 'Monitoring_plugins'

It would be great if someone could check it and possibly include it upstream.

Best regards
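One way to check a proposed rule before submitting it is to run it against the sample strings. The Python sketch below assumes the dots in the version part are escaped (an unescaped `.` matches any character) and that the optional third component is grouped; `parse_check_http` is a hypothetical helper that applies `family_replacement` the way a parser would.

```python
import re

# Proposed rule with escaped dots and an optional third version component.
CHECK_HTTP = re.compile(r"^(check_http)/v(\d+)\.(\d+)(?:\.(\d+))?")


def parse_check_http(ua: str):
    """Apply the rule plus family_replacement; return None when it doesn't match."""
    match = CHECK_HTTP.match(ua)
    if match is None:
        return None
    # family_replacement overrides the captured "check_http".
    return ("Monitoring_plugins",) + match.groups()[1:]


for ua in ("check_http/v1.4.16 (nagios-plugins 1.4.16)",
           "check_http/v2.0 (nagios-plugins 2.0)"):
    print(parse_check_http(ua))
```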

Edge still detected as Chrome

http://uaparser.dmolsen.com/

  ua->family: Chrome
  ua->major: 42
  ua->minor: 0
  ua->patch: 2311
  ua->toString: Chrome 42.0.2311
  ua->toVersionString: 42.0.2311
  os->family: Windows
  os->major: 
  os->minor: 
  os->patch: 
  os->patch_minor: 
  os->toString: Windows
  os->toVersionString: 
  device->family: Other
  toFullString: Chrome 42.0.2311/Windows
  uaOriginal: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/42.0.2311.135 Safari/537.36 Edge/12.0

https://msdn.microsoft.com/en-us/library/hh869301(v=vs.85).aspx
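For context on why ordering matters here: parsers try the `user_agent_parsers` rules in order and use the first match, so an Edge rule only helps if it sits above the generic Chrome rule. A simplified Python sketch with two hypothetical rules (not the real regexes.yaml entries):

```python
import re

# Simplified, hypothetical rule list. As with regexes.yaml, the first rule
# that matches wins, so the Edge rule has to come before the Chrome rule.
RULES = [
    ("Edge", re.compile(r"Edge/(\d+)(?:\.(\d+))?")),
    ("Chrome", re.compile(r"Chrome/(\d+)\.(\d+)\.(\d+)")),
]


def parse_user_agent(ua: str) -> tuple:
    """Return (family, *version parts) for the first matching rule."""
    for family, pattern in RULES:
        match = pattern.search(ua)
        if match:
            return (family,) + match.groups()
    return ("Other",)


EDGE_UA = ("Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
           "(KHTML, like Gecko) Chrome/42.0.2311.135 Safari/537.36 Edge/12.0")
print(parse_user_agent(EDGE_UA))  # ('Edge', '12', '0')
```

Swapping the two entries makes the same UA come out as Chrome 42.0.2311, which is the result reported above.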

Microsoft Office user agents

We use SharePoint, so we see the various Microsoft Office components hitting the server. Is it possible to get these detected, either just as a generic "Microsoft Office" or as individual components?
For the version, 15.0.4693 aligns with Office 2013, patch/update 4693. Office 2010 is version 14, but I don't have any agent strings to confirm that it uses the same format.
The various agents are below, but since Microsoft likes being consistent, each product generally has more than one string.

Word
Microsoft Office Word 2013 (15.0.4693) Windows NT 6.2
Microsoft Office/15.0 (Windows NT 6.2; Microsoft Word 15.0.4693; Pro)

Excel
Microsoft Office Excel 2013 (15.0.4693) Windows NT 6.2
Microsoft Office/15.0 (Windows NT 6.2; Microsoft Excel 15.0.4693; Pro)

OneNote
Microsoft Office OneNote 2013
Microsoft Office OneNote 2013 (15.0.4693) Windows NT 6.2

Outlook
Microsoft Office Outlook 2013 (15.0.4693) Windows NT 6.2
Microsoft Outlook Social Connector (15.0.4569) MsoStatic (15.0.4569)

PowerPoint
Microsoft Office PowerPoint 2013 (15.0.4693) Windows NT 6.2
Microsoft Office/15.0 (Windows NT 6.2; Microsoft PowerPoint 15.0.4693; Pro)

Visio
Microsoft Office Visio 2013 (15.0.4693) Windows NT 6.2
Microsoft Office/15.0 (Windows NT 6.2; Microsoft Visio 15.0.4693; Pro)

Access
Microsoft Office/15.0 (Windows NT 6.2; Access Web Datasheet 15.0.4693; Pro)
Microsoft Office/15.0 (Windows NT 6.2; Microsoft Access 15.0.4693; Pro)

Lync
Microsoft Office/15.0 (Windows NT 6.2; Microsoft Lync 15.0.4675; Pro)

FrontPage
MSFrontPage/15.0

Unknown
Microsoft Office SyncProc 2013 (15.0.4693) Windows NT 6.2
Microsoft Office Upload Center 2013 (15.0.4693) Windows NT 6.2
non-browser; Microsoft Office/15.0 (Windows NT 6.2; 15.0.4691; Pro)
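A rough sketch of how the two most common formats above could be matched, written in Python with hypothetical patterns. It returns the product name plus the three version components and deliberately leaves the other strings (Outlook Social Connector, MSFrontPage, the `non-browser` variant) unmatched.

```python
import re

# Format seen as "Microsoft Office/15.0 (Windows NT 6.2; Microsoft Word 15.0.4693; Pro)"
OFFICE_SLASH = re.compile(
    r"Microsoft Office/\d+\.\d+ \([^;]+; Microsoft ([A-Za-z]+) (\d+)\.(\d+)\.(\d+)"
)
# Format seen as "Microsoft Office Excel 2013 (15.0.4693) Windows NT 6.2"
OFFICE_YEAR = re.compile(
    r"Microsoft Office ([A-Za-z]+(?: [A-Za-z]+)*) \d{4} \((\d+)\.(\d+)\.(\d+)\)"
)


def parse_office(ua: str):
    """Return (product, major, minor, patch), or None if neither format matches."""
    for pattern in (OFFICE_SLASH, OFFICE_YEAR):
        match = pattern.search(ua)
        if match:
            return match.groups()
    return None


print(parse_office("Microsoft Office/15.0 (Windows NT 6.2; Microsoft Word 15.0.4693; Pro)"))
print(parse_office("Microsoft Office Upload Center 2013 (15.0.4693) Windows NT 6.2"))
```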

Chrome build version not parsed

Chrome versions consist of 4 numbers (like 39.0.2171.95), but only the first 3 are currently parsed, like this:

  "major" => "39",
  "minor" => "0",
  "patch" => "2171"

I see 2 potential solutions:

Skip the second number, which in Chrome is almost always 0. It would result in:

  "major" => "39",
  "minor" => "2171",
  "patch" => "95"

This would unfortunately break applications that actually rely on the existing mapping.

The second solution is to introduce patch_minor for all user agents. It already exists for the OS parser, so this would be logical and consistent.

  "major" => "39",
  "minor" => "0",
  "patch" => "2171",
  "patch_minor" => "95"

I understand this would be a major change, but think it's worth it.

What do you think?
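A sketch of the second option in Python: capture an optional fourth component and expose it as `patch_minor`. The regex and the `parse_chrome` helper are illustrative, not the current regexes.yaml rule.

```python
import re

# Illustrative rule: three required components plus an optional fourth.
CHROME = re.compile(r"Chrome/(\d+)\.(\d+)\.(\d+)(?:\.(\d+))?")
FIELDS = ("major", "minor", "patch", "patch_minor")


def parse_chrome(ua: str):
    """Return the version fields, with patch_minor set to None if absent."""
    match = CHROME.search(ua)
    if match is None:
        return None
    return dict(zip(FIELDS, match.groups()))


print(parse_chrome("Chrome/39.0.2171.95 Safari/537.36"))
```

Consumers that only read major, minor and patch would see the same values as before, which is the main argument for this option over re-mapping the fields.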

Spotify Desktop App

We are seeing a lot of ads requested by the Spotify Desktop Application. This is being detected as Safari. Here are some example strings:

Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_4) AppleWebKit/537.36 (KHTML, like Gecko) Spotify/1.0.9.133 Safari/537.36
Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Spotify/1.0.9.133 Safari/537.36
Mozilla/5.0 (Macintosh; Intel Mac OS X 10_7_5) AppleWebKit/537.36 (KHTML, like Gecko) Spotify/1.0.4.90 Safari/537.36
Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Spotify/1.0.3.101 Safari/537.36
Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_5) AppleWebKit/537.36 (KHTML, like Gecko) Spotify/1.0.8.59 Safari/537.36
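A hypothetical rule for these strings, sketched in Python. Because every sample also contains `Safari/537.36`, such a rule would presumably need to sit above the generic Safari rules, since the first matching rule wins.

```python
import re

# Hypothetical rule: four numeric components after "Spotify/".
SPOTIFY = re.compile(r"(Spotify)/(\d+)\.(\d+)\.(\d+)\.(\d+)")

SAMPLES = [
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_4) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Spotify/1.0.9.133 Safari/537.36",
    "Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Spotify/1.0.3.101 Safari/537.36",
]

for ua in SAMPLES:
    print(SPOTIFY.search(ua).groups())
```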

Organization migration

I'm so happy you began to migrate!

Some notes about this repository: instead of deleting your "old" repository data (https://github.com/tobie/ua-parser), you should transfer it and then rename it to uap-core.

See https://help.github.com/articles/transferring-a-repository/

You will keep everything (stars, forks, links, issues, ...), and all links pointing to the old repository will be redirected.
Only one more (annoying) step: clean up the repository after the transfer:

Delete all implementations that have already been moved
Delete all package files (composer, package, setup.py, ...), and more generally all files related to implementations
Focus on the data.

So IMO the best structure here should be like this:

docs
  / ...
test_resources
  / ...
.gitattributes
.gitignore
CONTRIBUTING.md
LICENSE
README.md
regexes.yaml

That's all!

⛵

Differentiating WebView Mobile Chrome from real Mobile Chrome

Original issue by @bwaters at ua-parser/uap-php#15:

Is there any way for uap-php to tell the difference between a WebView in an Android app and the Chrome Mobile browser? https://developer.chrome.com/multidevice/user-agent

It looks like if the UA has a Version/x.x token, then it is a WebView, e.g.:

Mozilla/5.0 (Linux; Android 4.4.4; SAMSUNG-SM-G870A Build/KTU84P) AppleWebKit/537.36 (KHTML, like Gecko) Version/4.0 Chrome/33.0.0.0 Mobile Safari/537.36

versus a real Chrome UA like this one:

Mozilla/5.0 (Linux; Android 4.4.4; SAMSUNG-SM-G900A Build/KTU84P) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.2272.96 Mobile Safari/537.36
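The heuristic above could be sketched like this in Python. The family name `Chrome Mobile WebView` and both patterns are illustrative assumptions, and the `Version/x.y` check is only as reliable as the observation in the issue.

```python
import re

# Heuristic from the issue: "Version/x.y" right before "Chrome/" => WebView.
WEBVIEW = re.compile(r"Version/\d+\.\d+ Chrome/(\d+)\.(\d+)\.(\d+)")
CHROME_MOBILE = re.compile(r"Chrome/(\d+)\.(\d+)\.(\d+)[\d.]* Mobile")


def classify_chrome(ua: str) -> tuple:
    """Check the WebView pattern first, since it is the more specific one."""
    match = WEBVIEW.search(ua)
    if match:
        return ("Chrome Mobile WebView",) + match.groups()
    match = CHROME_MOBILE.search(ua)
    if match:
        return ("Chrome Mobile",) + match.groups()
    return ("Other",)


WEBVIEW_UA = ("Mozilla/5.0 (Linux; Android 4.4.4; SAMSUNG-SM-G870A Build/KTU84P) "
              "AppleWebKit/537.36 (KHTML, like Gecko) Version/4.0 Chrome/33.0.0.0 "
              "Mobile Safari/537.36")
CHROME_UA = ("Mozilla/5.0 (Linux; Android 4.4.4; SAMSUNG-SM-G900A Build/KTU84P) "
             "AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.2272.96 "
             "Mobile Safari/537.36")
print(classify_chrome(WEBVIEW_UA))
print(classify_chrome(CHROME_UA))
```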

Recent HbbTV-related commit specifies non-existent capture group on replacement

This recent commit introduced a reference to a non-existing capture group:
a431239

→ git blame -L4337,+4 regexes.yaml
a4312396 (Joe Green 2015-04-15 12:23:22 +0100 4337)   - regex: '(HbbTV)/1\.1\.1.*CE-HTML/1\.\d;Vendor/(THOMSON);'
a4312396 (Joe Green 2015-04-15 12:23:22 +0100 4338)     device_replacement: '$1'
a4312396 (Joe Green 2015-04-15 12:23:22 +0100 4339)     brand_replacement: 'Thomson'
a4312396 (Joe Green 2015-04-15 12:23:22 +0100 4340)     model_replacement: '$3'

Also, the second (and last) actually specified capture group yields "THOMSON" but the brand_replacement specifies "Thomson"; this should be rationalized. Attn. @mrjgreen
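Mistakes like this could be caught mechanically by comparing each `$N` in a replacement field against the number of capture groups in the regex. A minimal Python sketch (`bad_references` is a hypothetical helper), applied to the HbbTV rule quoted above:

```python
import re

DOLLAR_REF = re.compile(r"\$(\d+)")


def bad_references(regex: str, replacements: dict) -> list:
    """Return (field, "$N") pairs whose N exceeds the regex's capture groups."""
    group_count = re.compile(regex).groups
    bad = []
    for field, template in replacements.items():
        for number in DOLLAR_REF.findall(template):
            if int(number) > group_count:
                bad.append((field, "$" + number))
    return bad


HBBTV_RULE = r"(HbbTV)/1\.1\.1.*CE-HTML/1\.\d;Vendor/(THOMSON);"
print(bad_references(HBBTV_RULE, {
    "device_replacement": "$1",
    "brand_replacement": "Thomson",
    "model_replacement": "$3",
}))  # [('model_replacement', '$3')]
```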

Add composer.json support, so that test resources can be used without a git submodule

It would be really great to get the test resources without complicated git commands, just with a composer require for PHP.

Device Type Project - recommended/supported tool

The question of adding a "device_type" field comes up on a regular basis, but as @elsigh points out in #31 and #65 the ua-parser project is not the right place to make decisions on what is and isn't a phone/tv/tablet/refrigerator etc... Currently I am applying a set of basic rules to the output from the ua-parser to determine the device type (desktop/phone/tablet/other) but due to the simplicity of the approach it's not very accurate. A more complex solution would be difficult to keep up to date without community collaboration.

So - I wondered how people would feel about a side-project within the ua-parser organization to offer a device classification solution, or maybe to begin with a page to point people in the direction of some maintained projects that do this?

@commenthol has done a lot of awesome work on the https://github.com/commenthol/ua-parser-caps project which meets this requirement, plus a lot more - maybe that would be a good place to start for anybody looking to classify devices. As far as I can tell the project currently has only a JavaScript implementation, but is based on a regex ruleset that could be read by any library.

Does anybody have any thoughts on whether this is a good use of time, or is device classification a challenge that we really don't want to try to tackle?

regexes.yaml now has negative lookaheads (which break the uap-go implementation)

We've talked about performance before. Interestingly, when I wanted to use the uap-go implementation, I found that a fair number of negative lookaheads have been added to regexes.yaml at some point between the version of uap-core it's pinned to (99e8ba5) and now. Go's regexp package doesn't implement lookaheads: it guarantees matching in time linear in the size of the input, and its RE2-based engine omits lookarounds to keep that guarantee. I'm sure there are some hacks the Go implementation could use to deal with these, but maybe we shouldn't allow negative lookaheads in regexes.yaml for this reason?
It may be that regexes.yaml isn't the place to enforce anything related to perf, but it's also true that negative lookaheads are kinda hard to digest (but then again, so is regex generally to many people).
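A crude way to find the offending entries is to scan each pattern for lookaround openers (`(?=`, `(?!`, `(?<=`, `(?<!`), which Go's regexp package rejects. A Python sketch; it does not special-case escaped parentheses such as `\(?!`, so treat hits as candidates to review rather than definite problems.

```python
import re

# Openers for lookahead/lookbehind; Go's regexp (RE2 syntax) rejects all four.
LOOKAROUND = re.compile(r"\(\?(?:=|!|<=|<!)")


def find_lookarounds(patterns):
    """Return the patterns that appear to contain a lookaround construct."""
    return [p for p in patterns if LOOKAROUND.search(p)]


patterns = [
    r"; *(?:ARCHOS|Archos) ?(GAMEPAD(?:(?! Build|[;/\(\)\-]).)*)",
    r"(Tizen)/(\d+)\.(\d+)",
]
print(find_lookarounds(patterns))
```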

Java User Agents get categorized as Other

This PR adds them as a user agent family and adds tests.

#60

Several recently introduced/modified HTC-related regexes reference non-existent capture groups

I've submitted a PR with fixes which addresses this issue: #84

add support for device categories

I'd like to bring up this topic again and perhaps offer a compromise solution that does not impact performance and yet maintains reasonable accuracy. There have been a couple of pull requests to provide desktop vs. mobile distinctions, but that doesn't really work because desktop vs. mobile is a false dichotomy. I suggest that we add a 'category' field whose value can be one of a number of predefined strings (this list can grow over time): "smartphone", "tablet", "game console", "wearable", etc.

The policy should be that we can add this category to existing regular expressions but that in general we should not add new regular expressions only to determine the category. A few consequences of this policy:

there will be no category for "personal computer". A UA string is determined to be a personal computer by the absence of any other category. We will let the library user make that determination; we will simply return either some default value (such as "unknown") or no value at all.
there will be a category labelled "smartphone or tablet" for entries in the regular expression file that match against both smartphones and tablets. Possibly if it doesn't impact performance we can later add additional regular expressions to differentiate some of these ambiguous entries. Based on our data if we only disambiguate smartphone vs tablet categories for Apple and Samsung devices that should correctly identify around 75% of the mobile devices that we see.
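The proposed policy could look roughly like this: `category` is an optional key on existing device rules, and a UA with no categorized match (for example a desktop browser) gets a default. All rule contents and the `device_category` helper are hypothetical, sketched in Python.

```python
import re

# Hypothetical device rules carrying the proposed optional "category" key.
DEVICE_RULES = [
    {"regex": r"\biPad\b", "device_replacement": "iPad", "category": "tablet"},
    {"regex": r"\biPhone\b", "device_replacement": "iPhone", "category": "smartphone"},
    {"regex": r"\bAndroid\b", "device_replacement": "Generic Android",
     "category": "smartphone or tablet"},
]


def device_category(ua: str, default: str = "unknown") -> str:
    """Category of the first matching rule, or the default when none is set."""
    for rule in DEVICE_RULES:
        if re.search(rule["regex"], ua):
            return rule.get("category", default)
    return default


print(device_category("Mozilla/5.0 (iPad; CPU OS 8_3 like Mac OS X)"))
print(device_category("Mozilla/5.0 (Windows NT 6.1; WOW64)"))
```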

Crystal Semantics bot

I see "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; CrystalSemanticsBot http://www.crystalsemantics.com/service-navigation/imprint/useragent/)" coming back as IE 6 on Windows XP, but it should be identified as a web crawler.

Add support for device type

Similar to the request from @mspiegel--

I'd like to be able to identify the type or category of the device, or at least test if the device belongs to current broad categories, like is-a-phone or is-a-tablet.

Firefox on Android recognized as Android Stock Browser

The following UA
Mozilla/5.0 (Android 5.0; Tablet; rv:41.0) Gecko/41.0 Firefox/41.0

should return Firefox 41 as the browser instead of Android 5.
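A hypothetical Python rule for Gecko-based Firefox on Android, keyed on the `Mobile`/`Tablet` token that follows the Android platform token (the Android version is treated as optional). It would need to be tried before whatever rule currently produces `Android 5`; the `Firefox Mobile` family name is an assumption.

```python
import re

# Hypothetical rule: "(Android[ <ver>]; Mobile|Tablet; ..." plus a Firefox token.
FIREFOX_ANDROID = re.compile(
    r"\(Android(?: [\d.]+)?; (?:Mobile|Tablet);.*\bFirefox/(\d+)\.(\d+)"
)


def parse_firefox_android(ua: str):
    """Return ("Firefox Mobile", major, minor), or None if the rule doesn't match."""
    match = FIREFOX_ANDROID.search(ua)
    return ("Firefox Mobile",) + match.groups() if match else None


print(parse_firefox_android("Mozilla/5.0 (Android 5.0; Tablet; rv:41.0) Gecko/41.0 Firefox/41.0"))
```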

Otter and QupZilla are parsed as Safari

The following UAs are parsed as "Safari"

Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/538.1 (KHTML, like Gecko) Otter/0.9.04
=> should parse to Otter

Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/538.1 (KHTML, like Gecko) QupZilla/1.7.0 Safari/538.1
=> should parse to QupZilla
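A single hypothetical rule could cover both browsers, provided it is tried before the generic Safari rule. Sketched in Python; the alternation can be extended with further browser names.

```python
import re

# Hypothetical rule; must precede the generic Safari rule to take effect.
EXTRA_BROWSERS = re.compile(r"(Otter|QupZilla)/(\d+)\.(\d+)(?:\.(\d+))?")

OTTER_UA = ("Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/538.1 "
            "(KHTML, like Gecko) Otter/0.9.04")
QUPZILLA_UA = ("Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/538.1 "
               "(KHTML, like Gecko) QupZilla/1.7.0 Safari/538.1")

for ua in (OTTER_UA, QUPZILLA_UA):
    print(EXTRA_BROWSERS.search(ua).groups())
```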

Client type?

Is there a way to retrieve the type (i.e. phone, tablet, laptop, desktop, etc) of a device using ua-parser? This alternate JavaScript library performs this by hardcoding the type depending on which regexes match. I couldn't find anything like it in the ua-parser docs.

Support the new IE preview

A new monster UA has emerged:

Mozilla/5.0 (Windows NT 6.4; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/36.0.1985.143 Safari/537.36 Edge/12.0

It seems to me that the best way to detect this monster is to treat Edge/12.0 as IE 12 and ignore the other browser tokens in the string.

Source:
https://gist.github.com/jacobrossi/c9699b27df2f4e97c0bd
http://blogs.msdn.com/b/ie/archive/2014/11/11/living-on-the-edge-our-next-step-in-interoperability.aspx#10572654

Why is Internet Explorer parsed as IE

I'm wondering: is there any reason to parse Internet Explorer as IE? Firefox isn't parsed as FF, so why should Internet Explorer be parsed as IE?

From an end-user point of view (for example, in a session overview), Internet Explorer sounds more logical than IE.

Currently I'm changing IE to Internet Explorer after parsing, but why not change it to Internet Explorer in regexes.yaml?
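The post-parse renaming described here can be kept in one lookup table, which also makes it easy to drop if regexes.yaml changes. A Python sketch; the table contents and `display_family` are assumptions, not part of any uap port.

```python
# Hypothetical display-name table applied after parsing.
DISPLAY_NAMES = {"IE": "Internet Explorer"}


def display_family(family: str) -> str:
    """Return a friendlier family name, leaving unknown families unchanged."""
    return DISPLAY_NAMES.get(family, family)


print(display_family("IE"), "/", display_family("Firefox"))
```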

package is not published to npm

Is there a good reason not to have the uap-core in npm registry?

Pinterest Browser

I see "Mozilla/5.0 (iPad; CPU OS 8_3 like Mac OS X) AppleWebKit/600.1.4 (KHTML, like Gecko) Mobile/12F69 [Pinterest/iOS]" coming back as Mobile Safari, but it should be the Pinterest Browser

iOS 9 support

At some point I hope to get around to this, but if not:

TestApp/1.0 CFNetwork/758.0.2 Darwin/15.0.0

I would expect this to parse as iOS 9.
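A sketch of one possible approach in Python: read the Darwin version from the CFNetwork-style UA and map it to an iOS major version. Only the pairing asserted in the issue (Darwin 15 to iOS 9) is filled in. Note that OS X 10.11 also reports Darwin 15, so a real rule would presumably need more context than this.

```python
import re

CFNETWORK_DARWIN = re.compile(r"CFNetwork/[\d.]+ Darwin/(\d+)\.(\d+)\.(\d+)")

# Darwin major -> iOS major. Only the pairing stated in the issue is filled in.
DARWIN_TO_IOS = {"15": "9"}


def ios_from_cfnetwork(ua: str):
    """Return ("iOS", major) when the Darwin version is in the table, else None."""
    match = CFNETWORK_DARWIN.search(ua)
    if match is None:
        return None
    ios_major = DARWIN_TO_IOS.get(match.group(1))
    return ("iOS", ios_major) if ios_major else None


print(ios_from_cfnetwork("TestApp/1.0 CFNetwork/758.0.2 Darwin/15.0.0"))  # ('iOS', '9')
```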

iOS Captive Network user-agent not parsing

The iOS user agent used on captive networks:

CaptiveNetworkSupport-324 wispr

does not parse; it just comes out as:

Other

Differentiating from embedded Mobile Safari and real Mobile Safari

It appears that the Mobile Safari browser is not differentiated from the embedded UIWebView version of the browser. It is useful to know which WebKit is rendering the page, as the UIWebView version is known to be much worse than the actual Mobile Safari browser.

This is related to #38 but for the iOS side of things. There was also an issue created in the old UAParser repository, but it was never resolved.

Here is a SO answer talking about the issue as it applies to iOS.

Doesn't detect some Blackberry 10 UAs

Known Blackberry 10 UA:

Mozilla/5.0 (BB10; Touch) AppleWebKit/537.35  (KHTML, like Gecko) Version/10.3.1.2243 Mobile Safari/537.35

This comes out as BlackBerry WebKit 0.0.0. Tested with the npm useragent module, after running its internal update process to pull in the master copy of the regexes from here.

Agent {
  family: 'BlackBerry WebKit',
  major: '0',
  minor: '0',
  patch: '0',
  source: 'Mozilla/5.0 (BB10; Touch) AppleWebKit/537.35  (KHTML, like Gecko) Version/10.3.1.2243 Mobile Safari/537.35' }

This UA string matches the pattern explained by BB in this blog post. Issue in the polyfill service: polyfillpolyfill/polyfill-service#491.
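A hypothetical Python rule that reads the version from the `Version/` token when the UA contains a `(BB10;` platform token. The pattern is illustrative, not the current regexes.yaml entry.

```python
import re

# Hypothetical rule: BB10 platform token, then the version from "Version/".
BB10 = re.compile(r"\(BB10;.*?Version/(\d+)\.(\d+)\.(\d+)(?:\.(\d+))?")

UA = ("Mozilla/5.0 (BB10; Touch) AppleWebKit/537.35  (KHTML, like Gecko) "
      "Version/10.3.1.2243 Mobile Safari/537.35")
print(("BlackBerry WebKit",) + BB10.search(UA).groups())
```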

Edge is being parsed as Chrome

I tested with some sites I found:
http://www.whatsmyua.info/
http://uaparser.dmolsen.com/

Both show:

rawUa: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/42.0.2311.135 Safari/537.36 *Edge/12.10240*
string: *Chrome 42.0.2311*
family: *Chrome*
major: 42
minor:
patch: 2311
device: Other

  ua->family: Chrome
  ua->major: 42
  ua->minor: 0
  ua->patch: 2311
  ua->toString: Chrome 42.0.2311
  ua->toVersionString: 42.0.2311

Are those sites updated with the latest code? If not, do you know of any that are?

Edge mobile browser detects as Chrome Mobile

Hi!
The Edge mobile browser is detected as Chrome Mobile. The desktop browser is detected correctly.
I got user agents from here: https://msdn.microsoft.com/en-us/library/hh869301(v=vs.85).aspx

In [1]: ua_parser.VERSION
Out[1]: (0, 4, 1)

In [2]: ua = "Mozilla/5.0 (Windows Phone 10.0; Android 4.2.1; DEVICE INFO) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/42.0.2311.135 Mobile Safari/537.36 Edge/12.9600"

In [3]: Parse(ua)
Out[3]:
{'device': {'brand': u'Generic',
  'family': u'Generic Smartphone',
  'model': u'Smartphone'},
 'os': {'family': 'Windows Phone',
  'major': '10',
  'minor': '0',
  'patch': None,
  'patch_minor': None},
 'string': 'Mozilla/5.0 (Windows Phone 10.0; Android 4.2.1; DEVICE INFO) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/42.0.2311.135 Mobile Safari/537.36 Edge/12.9600',
 'user_agent': {'family': u'Chrome Mobile',
  'major': '42',
  'minor': '0',
  'patch': '2311'}}

Windows 10 is not detected

Windows 10:
UA: "Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/39.0.2171.71 Safari/537.36 Edge/12.0"

Xbox One:
UA: "Mozilla/5.0 (compatible; MSIE 10.0; Windows NT 6.2; Trident/6.0; Xbox; Xbox One)"

Xbox 360:
UA: "Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.1; Trident/5.0; Xbox)"

IE parsed as Outlook

The UA below is classified as "IE" by the original BrowserScope code (see http://www.browserscope.org/ua ), but the uap regex classify it as "Outlook". It seems like "IE" is the correct answer. Is "Outlook" preferred or is this a bug?

UA:
Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 6.1; WOW64; Trident/4.0; SLCC2; .NET CLR 2.0.50727; .NET CLR 3.5.30729; .NET CLR 3.0.30729; InfoPath.3; .NET4.0C; .NET4.0E; Microsoft Outlook 14.0.7109; ms-office; MSOffice 14)

I tested the BrowserScope code by going to http://www.browserscope.org/ua.

I tested the uap regex using the uap-php code:
$ php bin/uaparser.php ua-parser:parse "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 6.1; WOW64; Trident/4.0; SLCC2; .NET CLR 2.0.50727; .NET CLR 3.5.30729; .NET CLR 3.0.30729; InfoPath.3; .NET4.0C; .NET4.0E; Microsoft Outlook 14.0.7109; ms-office; MSOffice 14)"
{"ua":{"major":"2010","minor":null,"patch":null,"family":"Outlook"},"os":{"major":null,"minor":null,"patch":null,"patchMinor":null,"family":"Windows 7"},"device":{"brand":null,"model":null,"family":"Other"},"originalUserAgent":"Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 6.1; WOW64; Trident/4.0; SLCC2; .NET CLR 2.0.50727; .NET CLR 3.5.30729; .NET CLR 3.0.30729; InfoPath.3; .NET4.0C; .NET4.0E; Microsoft Outlook 14.0.7109; ms-office; MSOffice 14)"}

question : same thing but for android phones

Just wondering if anyone knows of a UA parser for Android.

The Android fragmentation problem means that it's important to write capability-testing code and fall back to other means.
From a capability tester, you could perhaps build a UA data silo.

specification device_parser section example out of date

The matching regex and device lookup result for the example used in the device_parser section of the specification is out of date. This useragent:

Mozilla/5.0 (Linux; U; Android 4.2.2; de-de; PEDI_PLUS_W Build/JDQ39) AppleWebKit/534.30 (KHTML, like Gecko) Version/4.0 Safari/534.30

Actually resolves to:

uap-clj.core=> (def pedi-plus-ua "Mozilla/5.0 (Linux; U; Android 4.2.2; de-de; PEDI_PLUS_W Build/JDQ39) AppleWebKit/534.30 (KHTML, like Gecko) Version/4.0 Safari/534.30")
#'uap-clj.core/pedi-plus-ua

uap-clj.core=> (pprint (:device (lookup-useragent pedi-plus-ua)))
{:family "Odys PEDI PLUS W", :brand "Odys", :model "PEDI PLUS W"}

Using this regex:

uap-clj.core=> (first (remove nil? (map #(re-find #"(?i).*PEDI.*" (:regex %)) regexes-device)))
"; *(PEDI)_(PLUS)_(W) Build"

I'm going to edit this and generate a PR for update. Oh, and along the way I'll also fix the use of "family_replacement" in the same section as well.
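To illustrate how the replacement fields produce that result, here is a Python sketch that expands `$1`..`$9` in replacement templates. The three templates are assumptions chosen to reproduce the output above, not copied from regexes.yaml, and `expand`/`lookup_device` are hypothetical helpers.

```python
import re


def expand(template, match):
    """Replace $1..$9 with the corresponding groups; unmatched groups become ''."""
    return re.sub(r"\$(\d)",
                  lambda ref: match.group(int(ref.group(1))) or "",
                  template).strip()


RULE = {
    "regex": r"; *(PEDI)_(PLUS)_(W) Build",
    # Assumed templates that reproduce the uap-clj output shown above.
    "device_replacement": "Odys $1 $2 $3",
    "brand_replacement": "Odys",
    "model_replacement": "$1 $2 $3",
}


def lookup_device(ua, rule):
    match = re.search(rule["regex"], ua)
    if match is None:
        return None
    return {
        "family": expand(rule["device_replacement"], match),
        "brand": expand(rule["brand_replacement"], match),
        "model": expand(rule["model_replacement"], match),
    }


UA = ("Mozilla/5.0 (Linux; U; Android 4.2.2; de-de; PEDI_PLUS_W Build/JDQ39) "
      "AppleWebKit/534.30 (KHTML, like Gecko) Version/4.0 Safari/534.30")
print(lookup_device(UA, RULE))
```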

Get the test suite running again

What it says on the tin; we're stalled for stable new editions until the tests run.

Specify regex dialect in specification.md

I'm working on ua-parser/uap-go#6. In the process of implementing a potential fix (adding some new struct fields), I saw this error while testing against the latest version of regexes.yaml and test_resources:

panic: regexp: Compile(`; *(?:ARCHOS|Archos) ?(GAMEPAD(?:(?! Build|[;/\(\)\-]).)*)`): error parsing regexp: invalid or unsupported Perl syntax: `(?!`

Ideally specification.md would point to a spec (pcre perhaps?) that all regexes in regexes.yaml must conform to so parser implementations can use the appropriate library for their language.

'js_user_agent_string' not mentioned in specification

'js_user_agent_string' and associated Chrome Frame related substitution alternates are apparently used in at least two ua-parser language implementations (Python and PHP) but are not addressed in the specification document for uap-core. Apparently there's historical context predating the languages/core split which is not documented in ua-parser/uap-core, which can cause puzzlement to developers new to the project (e.g. myself, with a Clojure implementation I'm bringing up to date on the specification w.r.t. Browser and O/S, as well as adding Device, PR pending.)

Does someone have guidance on this?

qutebrowser parsed as Safari 0.0.0

qutebrowser sends a user agent like Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/538.1 (KHTML, like Gecko) qutebrowser/0.2.1 Safari/538.1, which is parsed as Safari 0.0.0.

Bump.

The last bump was in Nov 2014. It's time :D

Facebook Mobile Browser

I see "Mozilla/5.0 (Linux; Android 5.0.1; SCH-I545 Build/LRX22C; wv) AppleWebKit/537.36 (KHTML, like Gecko) Version/4.0 Chrome/43.0.2357.121 Mobile Safari/537.36 [FB_IAB/FB4A;FBAV/38.0.0.47.240;]" parsing as Chrome but it should be the Facebook Browser.
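A hypothetical Python rule for the in-app browser suffix, which would need to be tried before the Chrome rules. The family name `Facebook` is an assumption.

```python
import re

# Hypothetical rule for the "[FB_IAB/<app>;FBAV/<version>;]" suffix.
FACEBOOK_IAB = re.compile(r"\[FB_IAB/[^;\]]+;FBAV/(\d+)\.(\d+)\.(\d+)")

UA = ("Mozilla/5.0 (Linux; Android 5.0.1; SCH-I545 Build/LRX22C; wv) "
      "AppleWebKit/537.36 (KHTML, like Gecko) Version/4.0 Chrome/43.0.2357.121 "
      "Mobile Safari/537.36 [FB_IAB/FB4A;FBAV/38.0.0.47.240;]")
print(("Facebook",) + FACEBOOK_IAB.search(UA).groups())
```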

Instagram uses Chrome but is parsed as Mobile Safari

This is the user agent string:

Instagram 3.4.1 (iPhone5,1; iPhone OS 6.0.2; en_US; en) AppleWebKit/420+

This is what it gets parsed as:
{"family":"Mobile Safari","os":"iOS","osversion":"6"}

This is how I know it is Chrome:

window.performance.memory.usedJSHeapSize is set (in JavaScript)

Add regex for FxiOS (Firefox on iOS)

See https://developer.mozilla.org/en-US/docs/Web/HTTP/Gecko_user_agent_string_reference#Firefox_for_iOS

specification.md out of date with respect to devices

The section on device_parsers lists 'family_replacement', 'brand_replacement', and 'model_replacement' fields. After commit 22ef8a1 it looks like devices has the fields 'device_replacement', 'brand_replacement', and 'model_replacement'. Also it seems like all devices have all three fields explicitly enumerated and there doesn't appear to be an implicit match ordering as with the user_agent or operating system section. Is the documentation out of date?

BlackBerry browser version number

I'm seeing lots of user agent strings like this, which seem to be valid:

BlackBerry 9670/63.94.0.711 ldrepos/XXXXX-123 Configuration/XXXXXX-123 VendorID/123

Current Behaviour

This appears to be parsing the same number for browser version and device model (9670).

The string is matching rule: (Black[bB]erry)\s?(\d+)

Expected Behaviour

It seems that the browser version is not available in this case, so the expected output could just be BlackBerry. In this case the regex would instead be: (Black[bB]erry)

Is anybody able to confirm whether or not this is the intended behaviour? If somebody can confirm what the expected output should be I will attempt to create a PR with some tests.

Additional Information

Full parse result below:

    "user_agent": {
        "family": "BlackBerry",
        "major": "9670",
        "minor": null,
        "patch": null,
        "regex": "@(Black[bB]erry)\\s?(\\d+)@"
    },
    "os": {
        "family": "BlackBerry OS",
        "major": "63",
        "minor": "94",
        "patch": "0",
        "regex": "@(Black[Bb]erry)[0-9a-z]+\/(\\d+)\\.(\\d+)\\.(\\d+)(?:\\.(\\d+))?@"
    },
    "device": {
        "device": "BlackBerry 9670",
        "brand": null,
        "model": "9670",
        "regex": "@Black[Bb]erry([0-9]+)@"
    }

Thanks!
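A quick Python comparison of the current rule and the one proposed above, using the sample UA. `as_user_agent` is a hypothetical helper returning the family and the first version component.

```python
import re

UA = ("BlackBerry 9670/63.94.0.711 ldrepos/XXXXX-123 "
      "Configuration/XXXXXX-123 VendorID/123")

CURRENT_RULE = re.compile(r"(Black[bB]erry)\s?(\d+)")
PROPOSED_RULE = re.compile(r"(Black[bB]erry)")


def as_user_agent(pattern, ua):
    """Return (family, major); major is None when the rule captures no version."""
    family, *versions = pattern.search(ua).groups()
    return family, (versions[0] if versions else None)


print(as_user_agent(CURRENT_RULE, UA))   # ('BlackBerry', '9670')
print(as_user_agent(PROPOSED_RULE, UA))  # ('BlackBerry', None)
```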

Get real data set for performance testing

Firefox Tablet not properly recognized

User-Agent: Mozilla/5.0 (Android 4.1.2; Tablet; rv:41.0) Gecko/41.0 Firefox/41.0
results in

UAParser\Result\Client Object
(
    [ua] => UAParser\Result\UserAgent Object
        (
            [major] => 4
            [minor] => 1
            [patch] => 2
            [family] => Android
        )

    [os] => UAParser\Result\OperatingSystem Object
        (
            [major] => 4
            [minor] => 1
            [patch] => 2
            [patchMinor] =>
            [family] => Android
        )

    [device] => UAParser\Result\Device Object
        (
            [brand] => Generic
            [model] => Tablet
            [family] => Generic Tablet
        )

    [originalUserAgent] => Mozilla/5.0 (Android 4.1.2; Tablet; rv:41.0) Gecko/41.0 Firefox/41.0
)

How should we handle IE compatibility mode?

The date for the support change announced here is approaching. I use the analysis of UAs to determine support and understanding what version of IE end users are running in addition to what version is being emulated is crucial.

Currently compatibility mode user agent strings such as these from IE 11 are parsed as 10 and 9.

Mozilla/5.0 (compatible; MSIE 10.0; Windows NT 6.3; Trident/7.0)
Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.3; Trident/7.0)

Does it make sense to add the idea of engines like @commenthol has in his ua-parser2 or are there other ideas?

Some guidance before I spend time on a PR would be appreciated.
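One possible approach, sketched in Python: report both the emulated version from the `MSIE` token and the actual version implied by the `Trident` token (Trident 4.0, 5.0, 6.0 and 7.0 correspond to IE 8, 9, 10 and 11). The `ie_versions` helper and its return shape are hypothetical, not an existing uap field.

```python
import re

MSIE = re.compile(r"MSIE (\d+)\.\d+")
TRIDENT = re.compile(r"Trident/(\d+)\.\d+")

# Trident major -> actual Internet Explorer major version.
TRIDENT_TO_IE = {"4": 8, "5": 9, "6": 10, "7": 11}


def ie_versions(ua: str):
    """Return (emulated, actual) IE major versions; either may be None."""
    msie = MSIE.search(ua)
    trident = TRIDENT.search(ua)
    emulated = int(msie.group(1)) if msie else None
    actual = TRIDENT_TO_IE.get(trident.group(1)) if trident else None
    return emulated, actual


print(ie_versions("Mozilla/5.0 (compatible; MSIE 10.0; Windows NT 6.3; Trident/7.0)"))  # (10, 11)
print(ie_versions("Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.3; Trident/7.0)"))   # (9, 11)
```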

Riddler twice in regexes.yaml

Riddler appears twice in the same regex:

  # Bots
  - regex: '(1470\.net crawler|50\.nu|8bo Crawler Bot|Aboundex|Accoona-[A-z]+-Agent|AdsBot-Google(?:-[a-z]+)?|altavista|AppEngine-Google|archive.*?\.org_bot|archiver|Ask Jeeves|[Bb]ai[Dd]u[Ss]pider(?:-[A-Za-z]+)*|bingbot|BingPreview|blitzbot|BlogBridge|BoardReader(?: [A-Za-z]+)*|boitho.com-dc|BotSeer|\b\w*favicon\w*\b|\bYeti(?:-[a-z]+)?|Catchpoint bot|[Cc]harlotte|Checklinks|clumboot|Comodo HTTP\(S\) Crawler|Comodo-Webinspector-Crawler|ConveraCrawler|CRAWL-E|CrawlConvera|Daumoa(?:-feedfetcher)?|Feed Seeker Bot|findlinks|Flamingo_SearchEngine|FollowSite Bot|furlbot|Genieo|gigabot|GomezAgent|gonzo1|(?:[a-zA-Z]+-)?Googlebot(?:-[a-zA-Z]+)?|Google SketchUp|grub-client|gsa-crawler|heritrix|HiddenMarket|holmes|HooWWWer|htdig|ia_archiver|ICC-Crawler|Icarus6j|ichiro(?:/mobile)?|IconSurf|IlTrovatore(?:-Setaccio)?|InfuzApp|Innovazion Crawler|InternetArchive|IP2[a-z]+Bot|jbot\b|KaloogaBot|Kraken|Kurzor|larbin|LEIA|LesnikBot|Linguee Bot|LinkAider|LinkedInBot|Lite Bot|Llaut|lycos|Mail\.RU_Bot|masidani_bot|Mediapartners-Google|Microsoft .*? Bot|mogimogi|mozDex|MJ12bot|msnbot(?:-media *)?|msrbot|netresearch|Netvibes|NewsGator[^/]*|^NING|Nutch[^/]*|Nymesis|ObjectsSearch|Orbiter|OOZBOT|PagePeeker|PagesInventory|PaxleFramework|Peeplo Screenshot Bot|PlantyNet_WebRobot|Pompos|Read%20Later|Reaper|RedCarpet|Retreiver|Riddler|Riddler|Rival IQ|scooter|Scrapy|Scrubby|searchsight|seekbot|semanticdiscovery|Simpy|SimplePie|SEOstats|SimpleRSS|SiteCon|Slurp|snappy|Speedy Spider|Squrl Java|TheUsefulbot|ThumbShotsBot|Thumbshots\.ru|TwitterBot|URL2PNG|Vagabondo|VoilaBot|^vortex|Votay bot|^voyager|WASALive.Bot|Web-sniffer|WebThumb|WeSEE:[A-z]+|WhatWeb|WIRE|WordPress|Wotbox|www\.almaden\.ibm\.com|Xenu(?:.s)? Link Sleuth|Xerka [A-z]+Bot|yacy(?:bot)?|Yahoo[a-z]*Seeker|Yahoo! Slurp|Yandex\w+|YodaoBot(?:-[A-z]+)?|YottaaMonitor|Yowedo|^Zao|^Zao-Crawler|ZeBot_www\.ze\.bz|ZooShot|ZyBorg)(?:[ /]v?(\d+)(?:\.(\d+)(?:\.(\d+))?)?)?'
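Duplicates like this could be caught with a small check that splits the alternation on top-level `|` characters and counts repeats. A Python sketch, meant to be applied to the text between the outer parentheses of the bot regex; it tracks nesting, character classes and escapes but is not a full regex parser.

```python
from collections import Counter


def top_level_alternatives(body: str) -> list:
    """Split on '|' characters that are not inside (...), [...] or escaped."""
    parts, current = [], []
    depth, in_class, escaped = 0, False, False
    for ch in body:
        if escaped:
            escaped = False
        elif ch == "\\":
            escaped = True
        elif in_class:
            if ch == "]":
                in_class = False
        elif ch == "[":
            in_class = True
        elif ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
        elif ch == "|" and depth == 0:
            parts.append("".join(current))
            current = []
            continue
        current.append(ch)
    parts.append("".join(current))
    return parts


def duplicate_alternatives(body: str) -> list:
    """Return the alternatives that occur more than once, sorted."""
    counts = Counter(top_level_alternatives(body))
    return sorted(alt for alt, n in counts.items() if n > 1)


print(duplicate_alternatives("Reaper|RedCarpet|Retreiver|Riddler|Riddler|Rival IQ"))  # ['Riddler']
```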

{device,brand,model}_replacement can refer to non-existent capture groups

We found that '; *(Google )?(Nexus [Ss](?: 4G)?) Build/' fails, as both device_replacement and model_replacement can refer to a non-existent capture group (if "Google " is not present, then $1 is not defined).

This should probably be rewritten (untested and off the top of my head) as '; *(Google |)(Nexus [Ss](?: 4G)?) Build/' to avoid this.

Really crudely, it looks like someone will also need to flip through the following to check that at least these are safe. Someone is going to have to come up with a regex to test your regexes... Dawg :) That should help in future, as the file gets updated and added to, so that problems like this do not crop up again:

$ grep -A3 '([^(?:)][^()]*)?' regexes.yaml | grep -B3 '_replacement.*\$' | grep regex:
  - regex: '(Namoroka|Shiretoko|Minefield)/(\d+)\.(\d+)([ab]\d+[a-z]*)?'
  - regex: 'Android Application[^\-]+ - (Sony) ?(Ericsson)? (.+) \w+ - '
  - regex: '; *(Advent )?(Vega(?:Bean|Comb)?).* Build'
  - regex: '; *(Ainol )?((?:NOVO|[Nn]ovo)[^;/]+) Build'
  - regex: '; *(ALLVIEW[ _]?|Allview[ _]?)?(AX1_Shine|AX2_Frenzy) Build'
  - regex: '; *(CUBE[ _])?([KU][0-9]+ ?GT.*|A5300) Build'
  - regex: '; *(HUAWEI |Huawei-)?([UY][^;/]+) Build/(?:Huawei|HUAWEI)([UY][^\);]+)\)'
  - regex: '; *(MODECOM )?(FreeTab) ?([^;/]+) Build'
  - regex: '; *(A\d+)[ _](Duo)? Build'
  - regex: '; *(NOOK )?(BNRV200|BNRV200A|BNTV250|BNTV250A|BNTV400|BNTV600|LogicPD Zoom2) Build'
  - regex: '; *(SKY[ _])?(IM\-[AT]\d{3}[^;/]+).* Build/'
  - regex: 'Android 4\..*; *(M[12356789]|U[12368]|S[123])\ ?(pro)? Build'
  - regex: '; *(?:Polaroid[ _])?((?:MIDC\d{3,}|PMID\d{2,}|PTAB\d{3,})[^;/]*)(\/[^;/]*)? Build/'
  - regex: '; *(A2|A5|A8|A900)_?(Classic)? Build'
  - regex: '; *(SAMSUNG |Samsung )?((?:Galaxy (?:Note II|S\d)|GT-I9082|GT-I9205|GT-N7\d{3}|SM-N9005)[^;/]*)\/?[^;/]* Build/'
  - regex: '; *(Google )?(Nexus [Ss](?: 4G)?) Build/'
  - regex: '; *(SAMSUNG-)?(GT\-[BINPS]\d{4}[^\/]*)(\/[^ ]*) Build'
  - regex: '; *((?:SCH|SGH|SHV|SHW|SPH|SC|SM)\-[A-Za-z0-9 ]+)(/?[^ ]*)? Build'
  - regex: ' ((?:SCH)\-[A-Za-z0-9 ]+)(/?[^ ]*)? Build'
  - regex: '; *((?:CSL_Spice|Spice|SPICE|CSL)[ _\-]?)?([Mm][Ii])([ _\-])?(\d{3}[^;/]*) Build/'
  #~ - regex: '; *(Sprint)? ?(HTC_?)?(X515E|ATP515CKIT|APA7373KT|PG06100|APC715CKT|APX515CKT|PG86100|EVOV4G|APA9292KT|PC36100) Build'
  - regex: '\b(T-Mobile ?)?(myTouch)[ _]?([34]G)[ _]?([^\/]*) (?:Mozilla|Build)'
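The difference between the two forms is easy to see with Python's `re` module: an optional group that does not participate in the match yields `None`, while an empty alternative yields an empty string. How each port substitutes an unmatched `$1` may vary, which is why the second form is safer. The UA below is an illustrative string, not taken from test_resources.

```python
import re

UA = "Mozilla/5.0 (Linux; Android 4.0.4; Nexus S Build/ABC123)"  # illustrative

optional_group = re.search(r"; *(Google )?(Nexus [Ss](?: 4G)?) Build/", UA)
empty_alternative = re.search(r"; *(Google |)(Nexus [Ss](?: 4G)?) Build/", UA)

print(repr(optional_group.group(1)), repr(optional_group.group(2)))        # None 'Nexus S'
print(repr(empty_alternative.group(1)), repr(empty_alternative.group(2)))  # '' 'Nexus S'
```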

SMART-TV, Tizen, SamsungBrowser

User-Agent: Mozilla/5.0 (SMART-TV; Linux; Tizen 2.3) AppleWebkit/538.1 (KHTML, like Gecko) SamsungBrowser/1.0 TV Safari/538.1

{
    "ua": {
        "major": null,
        "minor": null,
        "patch": null,
        "family": "Safari"
    },
    "os": {
        "major": null,
        "minor": null,
        "patch": null,
        "patchMinor": null,
        "family": "Linux"
    },
    "device": {
        "brand": null,
        "model": null,
        "family": "Other"
    },
    "originalUserAgent": "Mozilla\/5.0 (SMART-TV; Linux; Tizen 2.3) AppleWebkit\/538.1 (KHTML, like Gecko) SamsungBrowser\/1.0 TV Safari\/538.1"
}

1. No device is recognized.
2. The OS is not recognized as Tizen 2.3. The regex in regexes.yaml is:

  ##########
  # Tizen OS from Samsung
  # spoofs Android so pushing it above
  ##########
  - regex: '(Tizen)/(\d+)\.(\d+)'

But according to http://developer.samsung.com/technical-doc/view.do?v=T000000203 there should be no / between platform and platform version:
Mozilla/$(MOZILA_VER) ($(DEVICE_TYPE); $(OS); $(PLATFORM) $(PLATFORM_VER); SAMSUNG $(MODEL_NAME) Build/$(BUILD_TAG)) AppleWebKit/$(APPLEWEBKIT_VER) (KHTML, like Gecko) $(APP_NAME)/$(APP_VER) (Chrome/$(CHROME_VER)) $(UX RECOMMEND) Safari/$(SAFARI_VER)
3. The browser is recognized as Safari, which looks like a fallback for something more specific.
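A hypothetical fix, sketched in Python: accept either `/` or a space between `Tizen` and its version, and add a separate rule for the `SamsungBrowser` token. Both patterns are illustrative, not the current regexes.yaml entries; like any other rule, they would have to sit above the Linux and Safari fallbacks to take effect.

```python
import re

# Hypothetical rules: accept "Tizen 2.3" as well as "Tizen/2.3".
TIZEN_OS = re.compile(r"(Tizen)[/ ](\d+)\.(\d+)")
SAMSUNG_BROWSER = re.compile(r"(SamsungBrowser)/(\d+)\.(\d+)")

UA = ("Mozilla/5.0 (SMART-TV; Linux; Tizen 2.3) AppleWebkit/538.1 "
      "(KHTML, like Gecko) SamsungBrowser/1.0 TV Safari/538.1")
print(TIZEN_OS.search(UA).groups())         # ('Tizen', '2', '3')
print(SAMSUNG_BROWSER.search(UA).groups())  # ('SamsungBrowser', '1', '0')
```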