ua-parser / uap-core
The regex file necessary to build language ports of Browserscope's user agent parser.
License: Other
Right now it's parsed as its own family, no longer part of "Windows".
Hi folks,
We use a monitoring tool that watches our infrastructure. It uses checks from the Monitoring Plugins project. The HTTP check identifies itself as:
check_http/v1.4.16 (nagios-plugins 1.4.16)
check_http/v2.0 (nagios-plugins 2.0)
This is what I have added to regexes.yaml to make it parse.
It would be great if someone could check it and possibly include it upstream.
Best regards
ua->family: Chrome
ua->major: 42
ua->minor: 0
ua->patch: 2311
ua->toString: Chrome 42.0.2311
ua->toVersionString: 42.0.2311
os->family: Windows
os->major:
os->minor:
os->patch:
os->patch_minor:
os->toString: Windows
os->toVersionString:
device->family: Other
toFullString: Chrome 42.0.2311/Windows
uaOriginal: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/42.0.2311.135 Safari/537.36 Edge/12.0
https://msdn.microsoft.com/en-us/library/hh869301(v=vs.85).aspx
We use SharePoint, so we see the various Microsoft Office components hitting the server. Is it possible to get these detected, either just as a generic "Microsoft Office" or as individual components?
For the version, 15.0.4693 aligns with Office 2013, patch/update 4693. Office 2010 is version 14, but I don't have any agent strings to confirm that it uses the same format.
The various agents are below, but as Microsoft like being consistent each product generally has more than one string.
Word
Microsoft Office Word 2013 (15.0.4693) Windows NT 6.2
Microsoft Office/15.0 (Windows NT 6.2; Microsoft Word 15.0.4693; Pro)
Excel
Microsoft Office Excel 2013 (15.0.4693) Windows NT 6.2
Microsoft Office/15.0 (Windows NT 6.2; Microsoft Excel 15.0.4693; Pro)
OneNote
Microsoft Office OneNote 2013
Microsoft Office OneNote 2013 (15.0.4693) Windows NT 6.2
Outlook
Microsoft Office Outlook 2013 (15.0.4693) Windows NT 6.2
Microsoft Outlook Social Connector (15.0.4569) MsoStatic (15.0.4569)
PowerPoint
Microsoft Office PowerPoint 2013 (15.0.4693) Windows NT 6.2
Microsoft Office/15.0 (Windows NT 6.2; Microsoft PowerPoint 15.0.4693; Pro)
Visio
Microsoft Office Visio 2013 (15.0.4693) Windows NT 6.2
Microsoft Office/15.0 (Windows NT 6.2; Microsoft Visio 15.0.4693; Pro)
Access
Microsoft Office/15.0 (Windows NT 6.2; Access Web Datasheet 15.0.4693; Pro)
Microsoft Office/15.0 (Windows NT 6.2; Microsoft Access 15.0.4693; Pro)
Lync
Microsoft Office/15.0 (Windows NT 6.2; Microsoft Lync 15.0.4675; Pro)
FrontPage
MSFrontPage/15.0
Unknown
Microsoft Office SyncProc 2013 (15.0.4693) Windows NT 6.2
Microsoft Office Upload Center 2013 (15.0.4693) Windows NT 6.2
non-browser; Microsoft Office/15.0 (Windows NT 6.2; 15.0.4691; Pro)
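As a rough illustration of how the two string formats listed above could be handled, here is a minimal Python sketch. The two patterns, the `parse_office` helper, and the "Microsoft Office <product>" family naming are all hypothetical and are not taken from regexes.yaml; strings without a product name or version (such as the bare OneNote one) are deliberately left unmatched.

```python
import re

# Hypothetical patterns, written only for the sample strings in this issue.
# Long form:  "Microsoft Office Word 2013 (15.0.4693) Windows NT 6.2"
LONG_RE = re.compile(r"Microsoft Office (\w+(?: \w+)*?) \d{4} \((\d+)\.(\d+)\.(\d+)\)")
# Short form: "Microsoft Office/15.0 (Windows NT 6.2; Microsoft Word 15.0.4693; Pro)"
SHORT_RE = re.compile(
    r"Microsoft Office/\d+\.\d+ \(.*?; (?:Microsoft )?([A-Za-z ]+?) (\d+)\.(\d+)\.(\d+);"
)

def parse_office(ua):
    """Return family and version parts for an Office UA string, or None if neither form matches."""
    for regex in (LONG_RE, SHORT_RE):
        m = regex.search(ua)
        if m:
            product, major, minor, patch = m.groups()
            return {
                "family": "Microsoft Office " + product,  # assumed naming scheme
                "major": major,
                "minor": minor,
                "patch": patch,
            }
    return None

word_long = parse_office("Microsoft Office Word 2013 (15.0.4693) Windows NT 6.2")
word_short = parse_office("Microsoft Office/15.0 (Windows NT 6.2; Microsoft Word 15.0.4693; Pro)")
datasheet = parse_office("Microsoft Office/15.0 (Windows NT 6.2; Access Web Datasheet 15.0.4693; Pro)")
unversioned = parse_office("Microsoft Office OneNote 2013")
```

Whether the result should be one generic family or one family per component is exactly the open question in this issue; the sketch only shows that both formats carry the same product and version information.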
Chrome versions consist of 4 numbers (like 39.0.2171.95), but only the first 3 are currently parsed. For example:
"major" => "39",
"minor" => "0",
"patch" => "2171"
I see 2 potential solutions:
Skip the second number in Chrome, which is almost always 0. This would result in:
"major" => "39",
"minor" => "2171",
"patch" => "95"
This would unfortunately break applications that actually rely on the existing mapping.
The second solution is to introduce patch_minor for all user agents. This already exists for the OS parser, so it would be logical and consistent.
"major" => "39",
"minor" => "0",
"patch" => "2171",
"patch_minor" => "95"
I understand this would be a major change, but I think it's worth it.
What do you think?
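To make the second proposal concrete, here is a small Python sketch of a pattern with an optional fourth capture group feeding a patch_minor field. The pattern and the `parse_chrome_version` helper are hypothetical, and the UA string is an illustrative one built around the 39.0.2171.95 version mentioned above.

```python
import re

# Hypothetical pattern: three mandatory version parts plus an optional fourth.
CHROME_RE = re.compile(r"Chrome/(\d+)\.(\d+)\.(\d+)(?:\.(\d+))?")

def parse_chrome_version(ua):
    """Return major/minor/patch/patch_minor as strings, or None if there is no Chrome token."""
    m = CHROME_RE.search(ua)
    if m is None:
        return None
    keys = ("major", "minor", "patch", "patch_minor")
    return dict(zip(keys, m.groups()))

# Illustrative UA string (not taken from the issue).
result = parse_chrome_version(
    "Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) "
    "Chrome/39.0.2171.95 Safari/537.36"
)
```

Existing consumers that only read major, minor and patch would see the same values as before, which is the main argument for this option over shifting the fields.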
We are seeing a lot of ads requested by the Spotify Desktop Application. This is being detected as Safari. Here are some example strings:
Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_4) AppleWebKit/537.36 (KHTML, like Gecko) Spotify/1.0.9.133 Safari/537.36
Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Spotify/1.0.9.133 Safari/537.36
Mozilla/5.0 (Macintosh; Intel Mac OS X 10_7_5) AppleWebKit/537.36 (KHTML, like Gecko) Spotify/1.0.4.90 Safari/537.36
Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Spotify/1.0.3.101 Safari/537.36
Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_5) AppleWebKit/537.36 (KHTML, like Gecko) Spotify/1.0.8.59 Safari/537.36
I'm so happy you began to migrate!
Some notes about this repository: instead of deleting your "old" repository data (https://github.com/tobie/ua-parser), you should transfer it and then rename it to uap-core.
See https://help.github.com/articles/transferring-a-repository/
You will keep everything (stars, forks, links, issues, ...) and all links pointing to the old repository will be redirected.
Only one more (bad) step: clean up the repository after the transfer:
So IMO the best structure here should be like this:
docs
/ ...
test_resources
/ ...
.gitattributes
.gitignore
CONTRIBUTING.md
LICENSE
README.md
regexes.yaml
That's all!
⛵
Original issue by @bwaters at ua-parser/uap-php#15:
Is there any way for uap-php to tell the difference between a WebView in an Android app and the Chrome Mobile browser? https://developer.chrome.com/multidevice/user-agent
It looks like if the string has a Version/x.x token then it is a WebView, e.g.:
Mozilla/5.0 (Linux; Android 4.4.4; SAMSUNG-SM-G870A Build/KTU84P) AppleWebKit/537.36 (KHTML, like Gecko) Version/4.0 Chrome/33.0.0.0 Mobile Safari/537.36
versus a real Chrome UA like this one:
Mozilla/5.0 (Linux; Android 4.4.4; SAMSUNG-SM-G900A Build/KTU84P) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.2272.96 Mobile Safari/537.36
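The heuristic described above (a Version/x.x token directly before the Chrome token) can be sketched in Python as follows. The `is_android_webview` helper and its pattern are hypothetical, and the rule is only as reliable as the two sample strings suggest.

```python
import re

# Hypothetical heuristic: "Version/x.x" immediately followed by a Chrome token.
WEBVIEW_RE = re.compile(r"Version/\d+\.\d+ Chrome/\d+")

def is_android_webview(ua):
    """Guess whether an Android UA string comes from a WebView rather than Chrome itself."""
    return "Android" in ua and WEBVIEW_RE.search(ua) is not None

webview_ua = (
    "Mozilla/5.0 (Linux; Android 4.4.4; SAMSUNG-SM-G870A Build/KTU84P) "
    "AppleWebKit/537.36 (KHTML, like Gecko) Version/4.0 Chrome/33.0.0.0 Mobile Safari/537.36"
)
chrome_ua = (
    "Mozilla/5.0 (Linux; Android 4.4.4; SAMSUNG-SM-G900A Build/KTU84P) "
    "AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.2272.96 Mobile Safari/537.36"
)
webview_detected = is_android_webview(webview_ua)
chrome_detected = is_android_webview(chrome_ua)
```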
This recent commit introduced a reference to a non-existing capture group:
a431239
→ git blame -L4337,+4 regexes.yaml
a4312396 (Joe Green 2015-04-15 12:23:22 +0100 4337) - regex: '(HbbTV)/1\.1\.1.*CE-HTML/1\.\d;Vendor/(THOMSON);'
a4312396 (Joe Green 2015-04-15 12:23:22 +0100 4338) device_replacement: '$1'
a4312396 (Joe Green 2015-04-15 12:23:22 +0100 4339) brand_replacement: 'Thomson'
a4312396 (Joe Green 2015-04-15 12:23:22 +0100 4340) model_replacement: '$3'
Also, the second (and last) actually specified capture group yields "THOMSON" but the brand_replacement specifies "Thomson"; this should be rationalized. Attn. @mrjgreen
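A check for this class of mistake could be automated. The Python sketch below compares the highest `$n` used in any `*_replacement` field with the number of capture groups in the regex. The `undefined_group_refs` helper is hypothetical, and it assumes the pattern compiles under Python's `re` module, which will not hold for every entry in regexes.yaml.

```python
import re

def undefined_group_refs(entry):
    """Map each *_replacement key to the $n references that exceed the regex's group count."""
    pattern = re.compile(entry["regex"])
    bad = {}
    for key, value in entry.items():
        if not key.endswith("_replacement"):
            continue
        refs = [int(n) for n in re.findall(r"\$(\d+)", value)]
        too_high = [n for n in refs if n > pattern.groups]
        if too_high:
            bad[key] = too_high
    return bad

# The entry quoted in the git blame output above, as a plain dict.
entry = {
    "regex": r"(HbbTV)/1\.1\.1.*CE-HTML/1\.\d;Vendor/(THOMSON);",
    "device_replacement": "$1",
    "brand_replacement": "Thomson",
    "model_replacement": "$3",
}
problems = undefined_group_refs(entry)
```

Here the regex has two capture groups, so the `$3` in model_replacement is the only reference flagged.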
It would be really great to get the rest of the resources without complicated git commands, just with a composer require for PHP.
The question of adding a "device_type" field comes up on a regular basis, but as @elsigh points out in #31 and #65 the ua-parser project is not the right place to make decisions on what is and isn't a phone/tv/tablet/refrigerator etc... Currently I am applying a set of basic rules to the output from the ua-parser to determine the device type (desktop/phone/tablet/other) but due to the simplicity of the approach it's not very accurate. A more complex solution would be difficult to keep up to date without community collaboration.
So - I wondered how people would feel about a side-project within the ua-parser organization to offer a device classification solution, or maybe to begin with a page to point people in the direction of some maintained projects that do this?
@commenthol has done a lot of awesome work on the https://github.com/commenthol/ua-parser-caps project which meets this requirement, plus a lot more - maybe that would be a good place to start for anybody looking to classify devices. As far as I can tell the project currently has only a JavaScript implementation, but is based on a regex ruleset that could be read by any library.
Does anybody have any thoughts on whether this is a good use of time, or is device classification a challenge that we really don't want to try to tackle?
We've talked about performance before, but interestingly I wanted to use the uap-go implementation and found that some time between the version of uap-core that it's pinned to (99e8ba5) and now there have been a fair number of negative lookaheads added to regexes.yaml. Golang doesn't implement negative lookaheads because they are slower than O(n). I'm sure there are some hacks the go implementation could make to deal with these, but maybe we shouldn't allow negative lookaheads in regexes.yaml for this reason?
It may be that regexes.yaml isn't the place to enforce anything related to perf, but it's also true that negative lookaheads are kinda hard to digest (but then again, so is regex generally to many people).
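If the project did decide to disallow lookarounds, a crude lint would be enough to flag them. The following Python sketch is only that: the `find_lookarounds` helper is hypothetical, and the textual check can produce false positives on unusual escaped sequences.

```python
import re

# Matches the opening of a lookahead or lookbehind: (?=  (?!  (?<=  (?<!
# Non-capturing groups "(?:" are not matched.
LOOKAROUND_RE = re.compile(r"\(\?(?:<?[=!])")

def find_lookarounds(patterns):
    """Return the patterns that appear to contain a lookahead or lookbehind."""
    return [p for p in patterns if LOOKAROUND_RE.search(p)]

# Sample patterns: one with a negative lookahead, one without.
patterns = [
    r"; *(?:ARCHOS|Archos) ?(GAMEPAD(?:(?! Build|[;/\(\)\-]).)*)",
    r"(Tizen)/(\d+)\.(\d+)",
]
flagged = find_lookarounds(patterns)
```

Run against every `regex:` value in regexes.yaml, a check like this could be part of the test suite so that ports using RE2-style engines do not break silently.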
This PR adds them as a user agent family and adds tests.
I've submitted a PR with fixes which addresses this issue: #84
I hoped to bring up this topic again and maybe offer a compromise solution that does not impact performance and yet maintains reasonable accuracy. There have been a couple of pull requests to provide desktop vs mobile distinctions but that doesn't really work because the desktop vs mobile dichotomy is a false dichotomy. I suggest that we add 'category' whose values can be one of a number of predefined strings (this list can grow over time): "smartphone", "tablet", "game console", "wearable", etc.
The policy should be that we can add this category to existing regular expressions but that in general we should not add new regular expressions only to determine the category. A few consequences of this policy:
I see "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; CrystalSemanticsBot http://www.crystalsemantics.com/service-navigation/imprint/useragent/)" coming back as IE 6 on Windows XP, but it should be identified as a web crawler.
Similar to the request from @mspiegel--
I'd like to be able to identify the type or category of the device, or at least test if the device belongs to current broad categories, like is-a-phone or is-a-tablet.
The following UA
Mozilla/5.0 (Android 5.0; Tablet; rv:41.0) Gecko/41.0 Firefox/41.0
should return Firefox 41 as the browser instead of Android 5.
The following UAs are parsed as "Safari"
Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/538.1 (KHTML, like Gecko) Otter/0.9.04
=> should parse to Otter
Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/538.1 (KHTML, like Gecko) QupZilla/1.7.0 Safari/538.1
=> should parse to QupZilla
Is there a way to retrieve the type (i.e. phone, tablet, laptop, desktop, etc.) of a device using ua-parser? This alternate JavaScript library does this by hardcoding the type depending on which regexes match. I couldn't find anything like it in the ua-parser docs.
A new monster UA has emerged:
Mozilla/5.0 (Windows NT 6.4; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/36.0.1985.143 Safari/537.36 Edge/12.0
It seems to me that the best way to detect this monster is to treat Edge/12.0 as IE 12 and ignore the other browser tokens.
Source:
https://gist.github.com/jacobrossi/c9699b27df2f4e97c0bd
http://blogs.msdn.com/b/ie/archive/2014/11/11/living-on-the-edge-our-next-step-in-interoperability.aspx#10572654
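The "check the Edge token first, ignore the rest" idea can be sketched in Python like this. The patterns, the `browser_family` helper, and the family label "Edge" are assumptions made for illustration; the issue itself floats reporting it as IE 12 instead.

```python
import re

# Hypothetical patterns; order matters because Edge UAs also carry Chrome and Safari tokens.
EDGE_RE = re.compile(r"Edge/(\d+)\.(\d+)")
CHROME_RE = re.compile(r"Chrome/(\d+)\.(\d+)")

def browser_family(ua):
    """Return (family, major, minor), testing for the Edge token before Chrome."""
    m = EDGE_RE.search(ua)
    if m:
        return ("Edge", m.group(1), m.group(2))  # label is an assumption
    m = CHROME_RE.search(ua)
    if m:
        return ("Chrome", m.group(1), m.group(2))
    return ("Other", None, None)

edge_result = browser_family(
    "Mozilla/5.0 (Windows NT 6.4; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) "
    "Chrome/36.0.1985.143 Safari/537.36 Edge/12.0"
)
chrome_result = browser_family(
    "Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) "
    "Chrome/36.0.1985.143 Safari/537.36"
)
```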
I'm wondering, is there any reason to parse Internet Explorer as IE? Firefox isn't parsed as FF, so why should Internet Explorer be parsed as IE?
From an end user's point of view (for example in a session overview), Internet Explorer sounds more logical than IE.
Currently I'm changing IE to Internet Explorer after parsing, but why not change it to Internet Explorer in regexes.yaml?
Is there a good reason not to have uap-core in the npm registry?
I see "Mozilla/5.0 (iPad; CPU OS 8_3 like Mac OS X) AppleWebKit/600.1.4 (KHTML, like Gecko) Mobile/12F69 [Pinterest/iOS]" coming back as Mobile Safari, but it should be the Pinterest Browser
At some point I hope to get around to this, but if not:
TestApp/1.0 CFNetwork/758.0.2 Darwin/15.0.0
I would expect this to parse as iOS 9.
The iOS user agent used on captive networks
CaptiveNetworkSupport-324 wispr
does not parse; it just comes out as Other.
It appears that the Mobile Safari browser is not differentiated from the embedded "uiwebview" version of the browser. It is useful to know which webkit is rendering the page as the uiwebview version is known for being much worse than the actual Mobile Safari browser.
This is related to #38 but for the iOS side of things. There was also an issue created in the old UAParser repository, but it was never resolved.
Here is a SO answer talking about the issue as it applies to iOS.
Known Blackberry 10 UA:
Mozilla/5.0 (BB10; Touch) AppleWebKit/537.35 (KHTML, like Gecko) Version/10.3.1.2243 Mobile Safari/537.35
This comes out as BlackBerry Webkit 0.0.0. Tested with npm useragent module having run its internal update process to bring in the master copy of the regexes from here.
Agent {
family: 'BlackBerry WebKit',
major: '0',
minor: '0',
patch: '0',
source: 'Mozilla/5.0 (BB10; Touch) AppleWebKit/537.35 (KHTML, like Gecko) Version/10.3.1.2243 Mobile Safari/537.35' }
This UA string matches the pattern explained by BB in this blog post. Issue in the polyfill service: polyfillpolyfill/polyfill-service#491.
I tested with some sites I found:
http://www.whatsmyua.info/
http://uaparser.dmolsen.com/
Both show:
rawUa: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/42.0.2311.135 Safari/537.36 *Edge/12.10240*
string: *Chrome 42.0.2311*
family: *Chrome*
major: 42
minor:
patch: 2311
device: Other
ua->family: Chrome
ua->major: 42
ua->minor: 0
ua->patch: 2311
ua->toString: Chrome 42.0.2311
ua->toVersionString: 42.0.2311
Are those sites updated with the latest code? If not, do you know of any that are?
Hi!
The Edge mobile browser is detected as Chrome Mobile. The desktop browser is detected correctly.
I got user agents from here: https://msdn.microsoft.com/en-us/library/hh869301(v=vs.85).aspx
In [1]: ua_parser.VERSION
Out[1]: (0, 4, 1)
In [2]: ua = "Mozilla/5.0 (Windows Phone 10.0; Android 4.2.1; DEVICE INFO) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/42.0.2311.135 Mobile Safari/537.36 Edge/12.9600"
In [3]: Parse(ua)
Out[3]:
{'device': {'brand': u'Generic',
'family': u'Generic Smartphone',
'model': u'Smartphone'},
'os': {'family': 'Windows Phone',
'major': '10',
'minor': '0',
'patch': None,
'patch_minor': None},
'string': 'Mozilla/5.0 (Windows Phone 10.0; Android 4.2.1; DEVICE INFO) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/42.0.2311.135 Mobile Safari/537.36 Edge/12.9600',
'user_agent': {'family': u'Chrome Mobile',
'major': '42',
'minor': '0',
'patch': '2311'}}
Windows 10:
UA: "Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/39.0.2171.71 Safari/537.36 Edge/12.0"
Xbox One:
UA: "Mozilla/5.0 (compatible; MSIE 10.0; Windows NT 6.2; Trident/6.0; Xbox; Xbox One)"
Xbox 360:
UA: "Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.1; Trident/5.0; Xbox)"
The UA below is classified as "IE" by the original BrowserScope code (see http://www.browserscope.org/ua ), but the uap regex classifies it as "Outlook". It seems like "IE" is the correct answer. Is "Outlook" preferred or is this a bug?
UA:
Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 6.1; WOW64; Trident/4.0; SLCC2; .NET CLR 2.0.50727; .NET CLR 3.5.30729; .NET CLR 3.0.30729; InfoPath.3; .NET4.0C; .NET4.0E; Microsoft Outlook 14.0.7109; ms-office; MSOffice 14)
I tested the BrowserScope code by going to http://www.browserscope.org/ua:
I tested the uap regex using the uap-php code:
$ php bin/uaparser.php ua-parser:parse "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 6.1; WOW64; Trident/4.0; SLCC2; .NET CLR 2.0.50727; .NET CLR 3.5.30729; .NET CLR 3.0.30729; InfoPath.3; .NET4.0C; .NET4.0E; Microsoft Outlook 14.0.7109; ms-office; MSOffice 14)"
{"ua":{"major":"2010","minor":null,"patch":null,"family":"Outlook"},"os":{"major":null,"minor":null,"patch":null,"patchMinor":null,"family":"Windows 7"},"device":{"brand":null,"model":null,"family":"Other"},"originalUserAgent":"Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 6.1; WOW64; Trident/4.0; SLCC2; .NET CLR 2.0.50727; .NET CLR 3.5.30729; .NET CLR 3.0.30729; InfoPath.3; .NET4.0C; .NET4.0E; Microsoft Outlook 14.0.7109; ms-office; MSOffice 14)"}
Just wondering if anyone knows of a UA parser for Android.
The Android fragmentation problem means that it's important to write capability-testing code and fall back to other means.
From a capability tester, you could perhaps build a UA data silo.
The matching regex and device lookup result for the example used in the device_parser section of the specification are out of date. This useragent:
Mozilla/5.0 (Linux; U; Android 4.2.2; de-de; PEDI_PLUS_W Build/JDQ39) AppleWebKit/534.30 (KHTML, like Gecko) Version/4.0 Safari/534.30
Actually resolves to:
uap-clj.core=> (def pedi-plus-ua "Mozilla/5.0 (Linux; U; Android 4.2.2; de-de; PEDI_PLUS_W Build/JDQ39) AppleWebKit/534.30 (KHTML, like Gecko) Version/4.0 Safari/534.30")
#'uap-clj.core/pedi-plus-ua
uap-clj.core=> (pprint (:device (lookup-useragent pedi-plus-ua)))
{:family "Odys PEDI PLUS W", :brand "Odys", :model "PEDI PLUS W"}
Using this regex:
uap-clj.core=> (first (remove nil? (map #(re-find #"(?i).*PEDI.*" (:regex %)) regexes-device)))
"; *(PEDI)_(PLUS)_(W) Build"
I'm going to edit this and generate a PR for update. Oh, and along the way I'll also fix the use of "family_replacement" in the same section as well.
What it says on the tin; we're stalled for stable new editions until the tests run.
I'm working on ua-parser/uap-go#6. In the process of implementing a potential fix (adding some new struct fields), I saw this error while testing against the latest version of regexes.yaml and test_resources:
panic: regexp: Compile(`; *(?:ARCHOS|Archos) ?(GAMEPAD(?:(?! Build|[;/\(\)\-]).)*)`): error parsing regexp: invalid or unsupported Perl syntax: `(?!`
Ideally specification.md would point to a spec (PCRE perhaps?) that all regexes in regexes.yaml must conform to, so parser implementations can use the appropriate library for their language.
'js_user_agent_string' and associated Chrome Frame related substitution alternates are apparently used in at least two ua-parser language implementations (Python and PHP) but are not addressed in the specification document for uap-core. Apparently there's historical context predating the languages/core split which is not documented in ua-parser/uap-core, which can cause puzzlement to developers new to the project (e.g. myself, with a Clojure implementation I'm bringing up to date on the specification w.r.t. Browser and O/S, as well as adding Device, PR pending.)
Does someone have guidance on this?
qutebrowser sends a user agent like Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/538.1 (KHTML, like Gecko) qutebrowser/0.2.1 Safari/538.1, which is parsed as Safari 0.0.0.
The last bump was in Nov 2014. It's time :D
I see "Mozilla/5.0 (Linux; Android 5.0.1; SCH-I545 Build/LRX22C; wv) AppleWebKit/537.36 (KHTML, like Gecko) Version/4.0 Chrome/43.0.2357.121 Mobile Safari/537.36 [FB_IAB/FB4A;FBAV/38.0.0.47.240;]" parsing as Chrome but it should be the Facebook Browser.
This is the user agent string:
Instagram 3.4.1 (iPhone5,1; iPhone OS 6.0.2; en_US; en) AppleWebKit/420+
This is what it gets parsed as:
{"family":"Mobile Safari","os":"iOS","osversion":"6"}
This is how I know it is Chrome:
window.performance.memory.usedJSHeapSize
is set (in JavaScript)
The section on device_parsers lists 'family_replacement', 'brand_replacement', and 'model_replacement' fields. After commit 22ef8a1 it looks like devices has the fields 'device_replacement', 'brand_replacement', and 'model_replacement'. Also it seems like all devices have all three fields explicitly enumerated and there doesn't appear to be an implicit match ordering as with the user_agent or operating system section. Is the documentation out of date?
I'm seeing lots of user agent strings like this, which seem to be valid:
BlackBerry 9670/63.94.0.711 ldrepos/XXXXX-123 Configuration/XXXXXX-123 VendorID/123
This appears to be parsing the same number for browser version and device model (9670).
The string is matching rule: (Black[bB]erry)\s?(\d+)
It seems that the browser version is not available in this case, so the expected output could just be BlackBerry. In this case the regex would instead be: (Black[bB]erry)
Is anybody able to confirm whether or not this is the intended behaviour? If somebody can confirm what the expected output should be I will attempt to create a PR with some tests.
Full parse result below:
"user_agent": {
"family": "BlackBerry",
"major": "9670",
"minor": null,
"patch": null,
"regex": "@(Black[bB]erry)\\s?(\\d+)@"
},
"os": {
"family": "BlackBerry OS",
"major": "63",
"minor": "94",
"patch": "0",
"regex": "@(Black[Bb]erry)[0-9a-z]+\/(\\d+)\\.(\\d+)\\.(\\d+)(?:\\.(\\d+))?@"
},
"device": {
"device": "BlackBerry 9670",
"brand": null,
"model": "9670",
"regex": "@Black[Bb]erry([0-9]+)@"
}
Thanks!
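For reference, the behaviour described in this issue is easy to reproduce outside any particular port. This Python sketch runs the current rule and the proposed rule against the sample string; the variable names are just for illustration.

```python
import re

ua = "BlackBerry 9670/63.94.0.711 ldrepos/XXXXX-123 Configuration/XXXXXX-123 VendorID/123"

# Rule currently matching, as quoted in the issue.
current_rule = re.compile(r"(Black[bB]erry)\s?(\d+)")
# Rule proposed in the issue: family only, no version capture.
proposed_rule = re.compile(r"(Black[bB]erry)")

current_groups = current_rule.search(ua).groups()
proposed_groups = proposed_rule.search(ua).groups()
```

With the current rule the second group captures the device model number 9670, which is why it ends up in the browser's major field; the proposed rule leaves the version empty.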
User-Agent: Mozilla/5.0 (Android 4.1.2; Tablet; rv:41.0) Gecko/41.0 Firefox/41.0
results in
UAParser\Result\Client Object
(
[ua] => UAParser\Result\UserAgent Object
(
[major] => 4
[minor] => 1
[patch] => 2
[family] => Android
)
[os] => UAParser\Result\OperatingSystem Object
(
[major] => 4
[minor] => 1
[patch] => 2
[patchMinor] =>
[family] => Android
)
[device] => UAParser\Result\Device Object
(
[brand] => Generic
[model] => Tablet
[family] => Generic Tablet
)
[originalUserAgent] => Mozilla/5.0 (Android 4.1.2; Tablet; rv:41.0) Gecko/41.0 Firefox/41.0
)
How should we handle IE compatibility mode?
The date for the support change announced here is approaching. I use the analysis of UAs to determine support and understanding what version of IE end users are running in addition to what version is being emulated is crucial.
Currently compatibility mode user agent strings such as these from IE 11 are parsed as 10 and 9.
Mozilla/5.0 (compatible; MSIE 10.0; Windows NT 6.3; Trident/7.0)
Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.3; Trident/7.0)
Does it make sense to add the idea of engines like @commenthol has in his ua-parser2 or are there other ideas?
Some guidance before spending time on a PR would be appreciated.
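One possible approach, sketched in Python below, is to read the Trident token alongside the MSIE token and report both the emulated and the actual version. The `ie_versions` helper and its field names are hypothetical; the Trident-to-IE mapping (4 to IE 8, 5 to IE 9, 6 to IE 10, 7 to IE 11) is the commonly documented one.

```python
import re

# Trident major version to the IE version that ships it.
TRIDENT_TO_IE = {"4": "8", "5": "9", "6": "10", "7": "11"}

MSIE_RE = re.compile(r"MSIE (\d+)\.")
TRIDENT_RE = re.compile(r"Trident/(\d+)\.")

def ie_versions(ua):
    """Return the emulated (MSIE token) and actual (Trident-derived) IE major versions."""
    msie = MSIE_RE.search(ua)
    trident = TRIDENT_RE.search(ua)
    emulated = msie.group(1) if msie else None
    actual = TRIDENT_TO_IE.get(trident.group(1)) if trident else None
    # Fall back to the MSIE token when there is no usable Trident token.
    return {"emulated": emulated, "actual": actual or emulated}

compat_10 = ie_versions("Mozilla/5.0 (compatible; MSIE 10.0; Windows NT 6.3; Trident/7.0)")
compat_9 = ie_versions("Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.3; Trident/7.0)")
```

Whether this belongs in an "engine" field or in the user agent version itself is the design question raised above; the sketch takes no position on that.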
Riddler appears twice in the same regex:
# Bots
- regex: '(1470\.net crawler|50\.nu|8bo Crawler Bot|Aboundex|Accoona-[A-z]+-Agent|AdsBot-Google(?:-[a-z]+)?|altavista|AppEngine-Google|archive.*?\.org_bot|archiver|Ask Jeeves|[Bb]ai[Dd]u[Ss]pider(?:-[A-Za-z]+)*|bingbot|BingPreview|blitzbot|BlogBridge|BoardReader(?: [A-Za-z]+)*|boitho.com-dc|BotSeer|\b\w*favicon\w*\b|\bYeti(?:-[a-z]+)?|Catchpoint bot|[Cc]harlotte|Checklinks|clumboot|Comodo HTTP\(S\) Crawler|Comodo-Webinspector-Crawler|ConveraCrawler|CRAWL-E|CrawlConvera|Daumoa(?:-feedfetcher)?|Feed Seeker Bot|findlinks|Flamingo_SearchEngine|FollowSite Bot|furlbot|Genieo|gigabot|GomezAgent|gonzo1|(?:[a-zA-Z]+-)?Googlebot(?:-[a-zA-Z]+)?|Google SketchUp|grub-client|gsa-crawler|heritrix|HiddenMarket|holmes|HooWWWer|htdig|ia_archiver|ICC-Crawler|Icarus6j|ichiro(?:/mobile)?|IconSurf|IlTrovatore(?:-Setaccio)?|InfuzApp|Innovazion Crawler|InternetArchive|IP2[a-z]+Bot|jbot\b|KaloogaBot|Kraken|Kurzor|larbin|LEIA|LesnikBot|Linguee Bot|LinkAider|LinkedInBot|Lite Bot|Llaut|lycos|Mail\.RU_Bot|masidani_bot|Mediapartners-Google|Microsoft .*? Bot|mogimogi|mozDex|MJ12bot|msnbot(?:-media *)?|msrbot|netresearch|Netvibes|NewsGator[^/]*|^NING|Nutch[^/]*|Nymesis|ObjectsSearch|Orbiter|OOZBOT|PagePeeker|PagesInventory|PaxleFramework|Peeplo Screenshot Bot|PlantyNet_WebRobot|Pompos|Read%20Later|Reaper|RedCarpet|Retreiver|Riddler|Riddler|Rival IQ|scooter|Scrapy|Scrubby|searchsight|seekbot|semanticdiscovery|Simpy|SimplePie|SEOstats|SimpleRSS|SiteCon|Slurp|snappy|Speedy Spider|Squrl Java|TheUsefulbot|ThumbShotsBot|Thumbshots\.ru|TwitterBot|URL2PNG|Vagabondo|VoilaBot|^vortex|Votay bot|^voyager|WASALive.Bot|Web-sniffer|WebThumb|WeSEE:[A-z]+|WhatWeb|WIRE|WordPress|Wotbox|www\.almaden\.ibm\.com|Xenu(?:.s)? Link Sleuth|Xerka [A-z]+Bot|yacy(?:bot)?|Yahoo[a-z]*Seeker|Yahoo! Slurp|Yandex\w+|YodaoBot(?:-[A-z]+)?|YottaaMonitor|Yowedo|^Zao|^Zao-Crawler|ZeBot_www\.ze\.bz|ZooShot|ZyBorg)(?:[ /]v?(\d+)(?:\.(\d+)(?:\.(\d+))?)?)?'
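Duplicates like this are hard to spot by eye in such a long alternation, so a small script can help. The Python sketch below splits a group body on top-level `|` characters and reports repeated alternatives. Both helpers are hypothetical, the splitter ignores edge cases such as a literal `]` at the start of a character class, and the sample is a short excerpt of the alternation rather than the full pattern.

```python
from collections import Counter

def split_top_level(pattern):
    """Split a regex on '|' characters that sit outside any group or character class."""
    parts, current = [], []
    depth, in_class, i = 0, False, 0
    while i < len(pattern):
        ch = pattern[i]
        if ch == "\\" and i + 1 < len(pattern):
            # Keep escaped characters together so "\|" or "\(" are not misread.
            current.append(pattern[i:i + 2])
            i += 2
            continue
        if in_class:
            if ch == "]":
                in_class = False
        elif ch == "[":
            in_class = True
        elif ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
        elif ch == "|" and depth == 0:
            parts.append("".join(current))
            current = []
            i += 1
            continue
        current.append(ch)
        i += 1
    parts.append("".join(current))
    return parts

def duplicate_alternatives(group_body):
    """Return the alternatives that occur more than once, sorted."""
    counts = Counter(split_top_level(group_body))
    return sorted(alt for alt, n in counts.items() if n > 1)

# Excerpt of the bot alternation (the body inside the first capture group).
sample = r"Reaper|RedCarpet|Retreiver|Riddler|Riddler|Rival IQ|msnbot(?:-media *)?|[Cc]harlotte"
duplicates = duplicate_alternatives(sample)
```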
We found that '; *(Google )?(Nexus [Ss](?: 4G)?) Build/' failed, as both device_replacement and model_replacement can refer to nonexistent capture groups (if Google is not present then $2 is not defined).
This should probably be rewritten (untested and off the top of my head) as '; *(Google |)(Nexus [Ss](?: 4G)?) Build/' to avoid this.
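The difference between the two forms can be seen directly in Python, used here only as one example engine; other regex engines may represent a non-participating group differently. The fragment being matched is an illustrative piece of a UA string without the "Google " prefix.

```python
import re

# Illustrative fragment of a UA string with no "Google " prefix.
ua_fragment = "; Nexus S Build/"

# Original form: the whole first group is optional.
optional_form = re.compile(r"; *(Google )?(Nexus [Ss](?: 4G)?) Build/")
# Suggested form: the group always participates, possibly matching the empty string.
empty_alt_form = re.compile(r"; *(Google |)(Nexus [Ss](?: 4G)?) Build/")

optional_groups = optional_form.search(ua_fragment).groups()
empty_alt_groups = empty_alt_form.search(ua_fragment).groups()
```

In Python's `re`, the optional group that did not participate is reported as `None`, while the empty alternative yields an empty string, so a replacement template can substitute it without special handling.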
Really crudely, it looks like someone will need to go through at least the following to check that they are safe, but someone is going to have to come up with a regex to test your regexes... Dawg :) That should help in future as they get updated and added to, so that problems like this do not crop up again:
$ grep -A3 '([^(?:)][^()]*)?' regexes.yaml | grep -B3 '_replacement.*\$' | grep regex:
- regex: '(Namoroka|Shiretoko|Minefield)/(\d+)\.(\d+)([ab]\d+[a-z]*)?'
- regex: 'Android Application[^\-]+ - (Sony) ?(Ericsson)? (.+) \w+ - '
- regex: '; *(Advent )?(Vega(?:Bean|Comb)?).* Build'
- regex: '; *(Ainol )?((?:NOVO|[Nn]ovo)[^;/]+) Build'
- regex: '; *(ALLVIEW[ _]?|Allview[ _]?)?(AX1_Shine|AX2_Frenzy) Build'
- regex: '; *(CUBE[ _])?([KU][0-9]+ ?GT.*|A5300) Build'
- regex: '; *(HUAWEI |Huawei-)?([UY][^;/]+) Build/(?:Huawei|HUAWEI)([UY][^\);]+)\)'
- regex: '; *(MODECOM )?(FreeTab) ?([^;/]+) Build'
- regex: '; *(A\d+)[ _](Duo)? Build'
- regex: '; *(NOOK )?(BNRV200|BNRV200A|BNTV250|BNTV250A|BNTV400|BNTV600|LogicPD Zoom2) Build'
- regex: '; *(SKY[ _])?(IM\-[AT]\d{3}[^;/]+).* Build/'
- regex: 'Android 4\..*; *(M[12356789]|U[12368]|S[123])\ ?(pro)? Build'
- regex: '; *(?:Polaroid[ _])?((?:MIDC\d{3,}|PMID\d{2,}|PTAB\d{3,})[^;/]*)(\/[^;/]*)? Build/'
- regex: '; *(A2|A5|A8|A900)_?(Classic)? Build'
- regex: '; *(SAMSUNG |Samsung )?((?:Galaxy (?:Note II|S\d)|GT-I9082|GT-I9205|GT-N7\d{3}|SM-N9005)[^;/]*)\/?[^;/]* Build/'
- regex: '; *(Google )?(Nexus [Ss](?: 4G)?) Build/'
- regex: '; *(SAMSUNG-)?(GT\-[BINPS]\d{4}[^\/]*)(\/[^ ]*) Build'
- regex: '; *((?:SCH|SGH|SHV|SHW|SPH|SC|SM)\-[A-Za-z0-9 ]+)(/?[^ ]*)? Build'
- regex: ' ((?:SCH)\-[A-Za-z0-9 ]+)(/?[^ ]*)? Build'
- regex: '; *((?:CSL_Spice|Spice|SPICE|CSL)[ _\-]?)?([Mm][Ii])([ _\-])?(\d{3}[^;/]*) Build/'
#~ - regex: '; *(Sprint)? ?(HTC_?)?(X515E|ATP515CKIT|APA7373KT|PG06100|APC715CKT|APX515CKT|PG86100|EVOV4G|APA9292KT|PC36100) Build'
- regex: '\b(T-Mobile ?)?(myTouch)[ _]?([34]G)[ _]?([^\/]*) (?:Mozilla|Build)'
User-Agent: Mozilla/5.0 (SMART-TV; Linux; Tizen 2.3) AppleWebkit/538.1 (KHTML, like Gecko) SamsungBrowser/1.0 TV Safari/538.1
{
"ua": {
"major": null,
"minor": null,
"patch": null,
"family": "Safari"
},
"os": {
"major": null,
"minor": null,
"patch": null,
"patchMinor": null,
"family": "Linux"
},
"device": {
"brand": null,
"model": null,
"family": "Other"
},
"originalUserAgent": "Mozilla\/5.0 (SMART-TV; Linux; Tizen 2.3) AppleWebkit\/538.1 (KHTML, like Gecko) SamsungBrowser\/1.0 TV Safari\/538.1"
}
##########
# Tizen OS from Samsung
# spoofs Android so pushing it above
##########
- regex: '(Tizen)/(\d+)\.(\d+)'
But according to http://developer.samsung.com/technical-doc/view.do?v=T000000203 there should be no / between platform and platform version:
Mozilla/$(MOZILA_VER) (
SAMSUNG
(KHTML, like Gecko)
3. The browser is recognized as Safari, which looks like a fallback for something more specific.
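The separator mismatch is easy to demonstrate. In this Python sketch the first pattern is the rule quoted above from regexes.yaml, and the second is a hypothetical relaxed variant that accepts either a slash or a space between platform and version.

```python
import re

ua = (
    "Mozilla/5.0 (SMART-TV; Linux; Tizen 2.3) AppleWebkit/538.1 "
    "(KHTML, like Gecko) SamsungBrowser/1.0 TV Safari/538.1"
)

# Rule as it appears in regexes.yaml: requires "Tizen/2.3".
current_rule = re.compile(r"(Tizen)/(\d+)\.(\d+)")
# Hypothetical relaxed rule: accepts "Tizen/2.3" or "Tizen 2.3".
relaxed_rule = re.compile(r"(Tizen)[/ ](\d+)\.(\d+)")

current_match = current_rule.search(ua)
relaxed_groups = relaxed_rule.search(ua).groups()
```

The current rule finds nothing in this string, which is consistent with the OS falling through to Linux in the parse result above.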