tobie / ua-parser
A multi-language port of Browserscope's user agent parser.
License: Other
Hey,
When checking for the -get argument, you guys are doing this:
if (defined('STDIN') && isset($argv) && ($argv[1] == '-get')) {
Which I think should be this:
if (defined('STDIN') && isset($argv[1]) && ($argv[1] == '-get')) {
I would submit a pull, but I've got \r\n vs. \n issues with the repo, so my git wants to replace all of the empty lines.
Cheers guys,
Mike
Any plans to port this to Python 3 in the near future?
If not, I'll probably fork it, make a py3k branch, and port it myself; any particular bits that you would expect to be difficult?
Thanks for your nice work on this!
regex: 'Minimo' in yaml causes:
File "C:\Python27\lib\site-packages\ua_parser-1.0-py2.7.egg\ua_parser\user_agent_parser.py", line 85, in ParseUserAgent
family, v1, v2, v3 = uaParser.Parse(user_agent_string)
File "C:\Python27\lib\site-packages\ua_parser-1.0-py2.7.egg\ua_parser\user_agent_parser.py", line 60, in Parse
family = match.group(1)
IndexError: no such group
regex: '(Minimo)' works fine
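A minimal Python sketch of one way to guard against patterns that define no capture group, instead of assuming group 1 always exists. The helper name `first_group_or_match` and the sample string are hypothetical and not part of ua_parser; falling back to the whole match is an assumption about the desired behaviour.

```python
import re


def first_group_or_match(pattern, user_agent_string):
    """Return capture group 1 if the pattern defines one, else the whole match.

    Hypothetical helper, not part of ua_parser.
    """
    match = re.search(pattern, user_agent_string)
    if match is None:
        return None
    # match.re.groups is the number of capture groups in the compiled pattern.
    if match.re.groups >= 1:
        return match.group(1)
    return match.group(0)


# Illustrative input only; both pattern variants now yield a family name.
sample = "Minimo/0.020"
print(first_group_or_match("Minimo", sample))
print(first_group_or_match("(Minimo)", sample))
```

With a guard like this, an unparenthesized pattern in the YAML would no longer raise IndexError, although quoting the pattern as '(Minimo)' remains the simpler fix.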
see issue #30
The Java implementation doesn't pass the unit tests. Is @sjiang still active?
It does not correctly detect mobile devices.
One of the HTC device_replacement strings is unquoted, which is causing the NodeJS/JavaScript yamlparser to erroneously interpret it as a Date.
https://github.com/tobie/ua-parser/blob/master/regexes.yaml#L527
Is line 51 of UAParser.php debug code?
"loading yaml..." is output on web pages using UA-Parser.
Line 51 in c4a18f2
Hi.
The Polaris browser is classified as Safari.
It uses a UA string like "Mozilla/5.0 (Android) AppleWebkit/536.2 (KHTML, like Gecko) Polaris/8.0 Safari/536.2".
Thank you.
Example user-agent string:
python-requests/0.14 CPython/2.6 Linux/2.6-43-server
Using the PHP parser, Opera Mini and Opera Mobile are reporting isMobile as false and isTablet as true. This is on a Google Nexus S with the latest updates. The default "Browser", Firefox mobile and Chrome beta are reporting correctly.
UA strings:
Opera Mobile:
Opera/9.80 (Android 4.0.4; Linux; Opera Mobi/ADR-1204201824; U; en) Presto/2.10.254 Version/12.00
Opera Mini:
Opera/9.80 (Android; Opera Mini/7.29703/27.1662; U; en) Presto/2.8.119 Version/11.10
Hi,
Working on an update to the JS port and realized a lot of the data to parse device info is embedded in the code (notably in the PHP port) rather than inside of the regexes.yaml file. This is leading to oddities in the Python version (see #89).
Can we agree to put the relevant data in regexes.yaml and write pseudo code somewhere so as to define how to use this information?
@elsigh-
I looked at Travis [http://about.travis-ci.org/] today. Not sure multiple languages in one project are supported but I'm going to play around with it anyway. The validator doesn't choke at least ;) For the PHP lib I'm going to build out a better test script that uses the files you already have built.
One requirement of Travis is that on success the test script kicks out 0 (zero). Any other output is treated as an error. I glanced at the Python but I'm not sure what its final output is on success. I'll be honest, I'm being lazy. I'm kind of hoping you happen to know off the top of your head ;)
I'll create a separate branch for playing around with this. This is a first for me but I don't think I should bork anything too badly.
@tobie-
Are you cool if I give Travis access to your repo? It gets read/write so it can set-up the service hooks but otherwise keeps its hands off the repo.
I probably won't attempt this until next week but I wanted to start the conversation while I was thinking about it.
Hi there,
First of all, thanks for this helpful library.
I noticed that there's a setup.py file in this repository, added in pull request #28, but it doesn't look like this package has been added to PyPI. I tried doing pip install ua_parser and pip complained about not finding ua_parser.
Is registering this package on PyPI planned?
I've submitted a PR for a new C# port, but before I can finalize it I have a few questions, and Tobie asked me to open a new issue, so here it goes (pretty new to the whole GitHub thing, so forgive me if I ask obvious questions):
1: I've included unit tests of the yaml-defined testcases under test_resources but I get one failing test on matching the useragent 'Mozilla/5.0 (BlackBerry; U; BlackBerry 9320; en-GB) AppleWebKit/534.11+ (KHTML, like Gecko) Version/7.1.0.398 Mobile Safari/534.11+' as a mobile device. From running the java tests I see they have the same issue (seems the issue, at least for me and for the java version, is that the list of mobile OS'es are matched exactly against the family and BlackBerry 9320 != BlackBerry so the device is not marked as mobile). Is this a known issue or is it just c#/java that suffer from this?
2: I've added some Apache2 copyright and my name to the readme. I'd like to support this going forward but how can I contribute and stay in the loop for updates/fixes across the ported platforms?
3: As far as I can tell there is no Travis-CI support for C# which is too bad since I'd like to integrate into some CI system. Any requirements/thoughts/ideas on this from more experienced githubbers?
4: I think @tobie has touched on this but c# is really not all that happy with _ (illegal escape) so I am currently just working around that and can remove this if the regex'es are adjusted (not really a question come to think of it, more of a reminder)
> 'Mozilla/5.0 (X11; Linux i686) AppleWebKit/534.34 (KHTML, like Gecko) rekonq Safari/534.34'
{ family: 'Rekonq',
major: NaN,
minor: null,
patch: null,
os: 'Linux' }
> 'Mozilla/5.0 (X11; U; Linux i686; en-GB) AppleWebKit/533.3 (KHTML, like Gecko) rekonq Safari/533.3'
{ family: 'Rekonq',
major: NaN,
minor: null,
patch: null,
os: 'Linux' }
// Rekonq 1.0+
> 'Mozilla/5.0 (X11; Linux i686) AppleWebKit/534.34 (KHTML, like Gecko) rekonq/1.0 Safari/534.34'
{ family: 'Rekonq',
major: NaN,
minor: null,
patch: null,
os: 'Linux' }
Unfortunately rekonq didn't put versions in its user agent string until recently. Perhaps it would be appropriate to pick WebKit's or Safari's value instead? Or default to 0.x.x.
Either way the NaN, null, null is somewhat confusing. And even for 1.0+ it doesn't recognize the version.
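A rough Python sketch of how an optional version group could cover both the old, version-less rekonq strings and the 1.0+ ones, returning None rather than NaN when no version is present. The pattern and the function name `parse_rekonq` are assumptions for illustration, not the entry from regexes.yaml.

```python
import re

# Hypothetical pattern: the "/major.minor" part is optional.
REKONQ = re.compile(r"rekonq(?:/(\d+)\.(\d+))?")


def parse_rekonq(user_agent_string):
    """Return family and version parts, using None where no version was sent."""
    match = REKONQ.search(user_agent_string)
    if match is None:
        return None
    major, minor = match.group(1), match.group(2)
    return {
        "family": "Rekonq",
        # Only convert when the optional group actually matched.
        "major": int(major) if major is not None else None,
        "minor": int(minor) if minor is not None else None,
    }


old_ua = "Mozilla/5.0 (X11; Linux i686) AppleWebKit/534.34 (KHTML, like Gecko) rekonq Safari/534.34"
new_ua = "Mozilla/5.0 (X11; Linux i686) AppleWebKit/534.34 (KHTML, like Gecko) rekonq/1.0 Safari/534.34"
print(parse_rekonq(old_ua))
print(parse_rekonq(new_ua))
```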
Sort of a stupid question... should regexes.yaml take into account IE's compatibility mode? e.g. when running in compatibility mode IE 8-10 will send a UA that ua-parser sees as IE 7. By looking at the Trident version you can tell what the actual version of IE is. I'm sort of torn on it but figured I'd ask in case anyone felt strongly one way or the other.
Some example UA strings taken from testing via BrowserStack:
IE 10 default:
Mozilla/5.0 (compatible; MSIE 10.0; Windows NT 6.2; Trident/6.0)
IE 10 in compatibility mode:
Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 6.2; Trident/6.0; ...)
IE 9 default:
Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.0; Trident/5.0)
IE 9 in compatibility mode:
Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 6.0; Trident/5.0; ...)
IE 8 default:
Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 6.0; Trident/4.0; ...)
IE 8 in compatibility mode:
Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 6.0; Trident/4.0; ...)
IE 7 default:
Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.2; ...)
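The idea of using the Trident token to recover the real IE version could be sketched in Python as follows. The mapping is taken only from the sample strings above (Trident/4.0, 5.0 and 6.0); the function name `real_ie_major` is hypothetical, and whether ua-parser should report the real or the emulated version is left open, as in the question.

```python
import re

# Mapping derived from the sample UA strings in this issue only.
TRIDENT_TO_IE = {"4.0": "8", "5.0": "9", "6.0": "10"}


def real_ie_major(user_agent_string):
    """Return the IE major version, preferring the Trident token when known."""
    msie = re.search(r"MSIE (\d+)\.\d+", user_agent_string)
    if msie is None:
        return None  # not an IE user agent string
    trident = re.search(r"Trident/(\d+\.\d+)", user_agent_string)
    if trident is not None and trident.group(1) in TRIDENT_TO_IE:
        return TRIDENT_TO_IE[trident.group(1)]
    # No (known) Trident token: trust the MSIE token, e.g. a genuine IE 7.
    return msie.group(1)


compat_ie10 = "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 6.2; Trident/6.0; ...)"
genuine_ie7 = "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.2; ...)"
print(real_ie_major(compat_ie10))
print(real_ie_major(genuine_ie7))
```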
Hi
Opera 12.50 alpha (aka Opera Next http://www.opera.com/browser/next/) on desktop doesn't render http://detector.dmolsen.com/, whereas Opera 12.01 does. In Opera Next, masking as Firefox makes it work. That's often indicative of UA detection at play. detector.dmolsen.com uses ua-parser, hence filing this issue.
Opera 12.50 simplifies the UA string (see http://my.opera.com/ODIN/blog/2012/08/03/a-hot-opera-12-50-summer-time-snapshot). Opera Mobile and Opera Mini will move to the new UA string in the future, too.
"Opera 12.50 will ship with a simplified UA string. Firstly, we have dropped the "U;" token, which signified that the browser provides crypto support that is stronger than what the "international" builds of Netscape offered circa 1995. The second change is removal of the language indicator. As an example, while the UA string for Opera 12.01 on Mac is currently
Opera/9.80 (Macintosh; Intel Mac OS X 10.8.0; U; en) Presto/2.10.289 Version/12.01
today's snapshot for Opera 12.50 on Mac shows
Opera/9.80 (Macintosh; Intel Mac OS X 10.8.0) Presto/2.12.363 Version/12.50
Both these changes correspond to similar changes in the IE, Firefox, Chrome and Safari browsers' UA strings."
Ran across this article today. Anyone see an actual user-agent string to fill in the gaps?
http://devblog.blackberry.com/2012/08/blackberry-10-user-agent-string/
The UA:
Mozilla/5.0 (BB10; ) AppleWebKit/ (KHTML, like Gecko) Version/<BB Version #> Mobile Safari/
Hi,
Trying to parse on Linux (ubuntu) :
Warning: preg_match(): Unknown modifier '' [...] UAParser.php, line 255
I printed $osRegex (var_dump) :
array(1) { ["regex"]=> string(15) "(GoogleTV)/\d+" }
Thanks for the lib and the work !
Hey guys,
We just started feeding live data through the UAParser and it's working quite well for us. The only issue we have had thus far comes from Symbian related devices (Nokia):
This user agent string, for example:
Mozilla/5.0 (SymbianOS/9.1; U; en) AppleWebKit/413 (KHTML, like Gecko) Safari/413
Fails in UAParser.php with an "Undefined offset" error:
PHP Notice: Undefined offset: 1 in .../ua_parser/php/UAParser.php on line 307
Which I think is happening because the YAML defines this regex block:
- regex: 'Symbian'
  device_replacement: 'Nokia'
If you change it to the following, it seems to work:
- regex: '(Symbian)'
  device_replacement: 'Nokia'
In other words, that one wasn't defining a capture group, so $matches[1] would never be defined. Now, I don't know as much about user agents as you guys, so is this a proper change, or should it perhaps be done another way?
Cheers,
Mike
I'm working on integrating ua-parser in TestSwarm, however I'm getting this error when accessing our API from node.
I do usually configure a user agent (I forgot it in this case), and TestSwarm does have different behaviour for API clients from node.
Anyhow, the error got through and caused an E_NOTICE in the output (which then caused the response to be invalid JSON; don't worry, error_reporting is only enabled in development):
--- UAParser.php
function parse($ua = null ) {
$ua ? $ua : $_SERVER["HTTP_USER_AGENT"]
PHP Notice: Undefined index: HTTP_USER_AGENT in /ua-parser/php/UAParser.php on line 55
PHP Warning: Cannot modify header information - headers already sent by (output started at /ua-parser/php/UAParser.php:55)
I suggest:
Only fall back to $_SERVER if the passed parameter is null (not if it is an empty string or some other falsey value).
When reading from $_SERVER, do an isset check in case the key is not set (for instance, it isn't set when on the command-line).
I am getting the following error, once in a while.
Any idea?
/home/meeting/node_modules/ua-parser/js/index.js:15
var m = ua.match(regexp);
^
TypeError: Cannot call method 'match' of undefined
at Array.parser as 0
at Object.parse (/home/meeting/node_modules/ua-parser/js/index.js:53:31)
at IncomingMessage.app.get.callback (/home/meeting/app.js:294:21)
at IncomingMessage.EventEmitter.emit (events.js:115:20)
at IncomingMessage._emitEnd (http.js:366:10)
at HTTPParser.parserOnMessageComplete as onMessageComplete
at Socket.socketOnData as ondata
at TCP.onread (net.js:402:27)
Although this seems easily fixable through pre-processing, I'm curious to know why we're doing this in the first place. Is this something Python specific?
Is there a reason why YAML instead of JSON is used to store the regex patterns? Loading regexes.yml in Python felt slow so I converted the YAML file to JSON (https://gist.github.com/4199468) to do a quick comparison between the two:
>>> timeit.timeit("import json; f = open('regexes.json'); json.load(f); f.close()", number=10)
0.02537703514099121
>>> timeit.timeit("import yaml; f = open('regexes.yaml'); yaml.load(f); f.close()", number=10)
2.897357940673828
As you can see, the JSON version performed over 100x faster. Furthermore, many programming languages also have JSON parsing support built in (at least for both Python and JavaScript).
Thoughts?
Subversion folders should be removed.
I found the UA 'EasouSpider' in a log, but ua-parser does not recognize it (the same goes for 'Bot', 'Spider', ...).
The problem is on line 832 of regexes.yaml: the pattern is case sensitive!
example:
import ua_parser.user_agent_parser as ua_parser
ua = ua_parser.ParseDevice('EasouSpider')
ua
{'is_spider': False, 'is_mobile': False, 'family': None}
ua = ua_parser.ParseDevice('Easouspider')
ua
{'is_spider': True, 'is_mobile': False, 'family': 'Spider'}
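To illustrate the case-sensitivity point, here is a small Python sketch comparing a case-sensitive and a case-insensitive compile of the same pattern. `SPIDER_PATTERN` is a hypothetical stand-in; it is not the pattern on line 832 of regexes.yaml.

```python
import re

# Hypothetical stand-in for the spider pattern; NOT the one in regexes.yaml.
SPIDER_PATTERN = r"(bot|spider|crawl)"

case_sensitive = re.compile(SPIDER_PATTERN)
case_insensitive = re.compile(SPIDER_PATTERN, re.IGNORECASE)


def is_spider(compiled_pattern, user_agent_string):
    """Return True if the compiled pattern matches anywhere in the string."""
    return compiled_pattern.search(user_agent_string) is not None


for ua in ("EasouSpider", "Easouspider"):
    print(ua, is_spider(case_sensitive, ua), is_spider(case_insensitive, ua))
```

Compiling with re.IGNORECASE is one option; listing both spellings in the YAML pattern is another that would also work for ports whose regex engines take flags differently.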
Hi,
I tested it with a BlackBerry 9320 mobile phone user agent. It gives is_mobile: False.
a = user_agent_parser.Parse('Mozilla/5.0 (BlackBerry; U; BlackBerry 9320; en-GB) AppleWebKit/534.11+ (KHTML, like Gecko) Version/7.1.0.398 Mobile Safari/534.11+')
a['os']
{'major': '7', 'patch_minor': '398', 'minor': '1', 'family': 'BlackBerry OS', 'patch': '0'}
a['device']
{'is_spider': False, 'is_mobile': False, 'family': 'BlackBerry 9320'}
a['user_agent']
{'major': '7', 'minor': '1', 'family': 'Blackberry WebKit', 'patch': '0'}
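Another report in this tracker attributes the same symptom in the C# and Java ports to the mobile list being compared for exact equality with the family, so 'BlackBerry 9320' never equals 'BlackBerry'. Assuming the Python port behaves similarly (not confirmed here), a prefix comparison might look like the sketch below. The list contents and function names are hypothetical.

```python
# Hypothetical list; the real mobile family list lives in the library.
MOBILE_FAMILY_PREFIXES = ("BlackBerry", "iPhone", "Android")


def is_mobile_exact(family):
    """Exact comparison: fails for families that carry a model number."""
    return family in MOBILE_FAMILY_PREFIXES


def is_mobile_prefix(family):
    """Prefix comparison: 'BlackBerry 9320' starts with 'BlackBerry'."""
    # str.startswith accepts a tuple of candidate prefixes.
    return family is not None and family.startswith(MOBILE_FAMILY_PREFIXES)


family = "BlackBerry 9320"
print(is_mobile_exact(family), is_mobile_prefix(family))
```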
Seems the code contained in the if statement at https://github.com/tobie/ua-parser/blob/master/py/ua_parser/user_agent_parser.py#L250 should never be reached because of https://github.com/tobie/ua-parser/blob/master/py/ua_parser/user_agent_parser.py#L245.
Not sure if that's deliberate or a typo.
Traceback (most recent call last):
File "/home/ubuntu/workspace/rtbopsConfig/rtbServers/rtbWorkerServer/workerServer.py", line 29, in <module>
from ua_parser import user_agent_parser
File "/usr/local/lib/python2.7/dist-packages/ua_parser-1.0-py2.7.egg/ua_parser/user_agent_parser.py", line 395, in <module>
yamlFile = open(os.path.join(ROOT_DIR, '../../regexes.yaml'))
IOError: [Errno 2] No such file or directory: '/usr/local/lib/python2.7/dist-packages/ua_parser-1.0-py2.7.egg/ua_parser/../../regexes.yaml'
We currently run parseInt on the version number parts. This is nice to store the data compactly if we're writing to a database, but it has the side-effect that if the version number isn't really a number, eg, it has the strings a, b, pre, etc., as part of it, then parseInt becomes lossy.
We should either not do parseInt (not so desirable), or check that the part is really a number before doing a parseInt (may be a little slower).
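The second option (check that the part is really a number before converting) can be sketched in Python; the issue itself concerns JavaScript's parseInt, so this only illustrates the idea. The function name `parse_version_part` is hypothetical.

```python
import re

# Only plain ASCII digit runs are treated as numbers.
_DIGITS_ONLY = re.compile(r"[0-9]+")


def parse_version_part(part):
    """Convert a version part to int only when that loses no information."""
    if part is None:
        return None
    if _DIGITS_ONLY.fullmatch(part):
        return int(part)
    # Parts such as '0b1' or 'pre' are kept as strings instead of truncated.
    return part


for raw in ("12", "0b1", "pre", None):
    print(repr(raw), "->", repr(parse_version_part(raw)))
```

The trade-off is that callers then receive mixed int and str values, which matters if the parts are written to a typed database column.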
Some modifications to the Node.js version should make it readily available as an AMD module...
In the last line: might be 'mobile_os_families' instead of 'mobile_user_agent_families'
Here is my research so far on UA strings:
http://stackoverflow.com/questions/13076839/what-is-the-user-agent-string-for-surface-rt
Mozilla/5.0 (compatible; MSIE 10.0; Windows NT 6.2; ARM; Trident/6.0; Touch)
New to this library but would like to help if I can...
if (preg_match("/".str_replace("/","\/",$osRegex['regex'])."/",self::$ua,$matches)) {
In the Python implementation Chrome Frame is handled through parsing the user agent string accessible by JavaScript. This seems contrary to how Chrome Frame user agent strings are described to work. It's also not very practical. How do you determine the UA of a CSS request, for example?
As cool as it'd be to get repos into this one official library it makes sense to keep some separate (e.g. the Ruby gem). Should we at least link to them in the main README?
I think the only other ua-parser related library I've found out there is written in Haskell.
Need to close up my old repo and point to this new one with a simple README. Will also post a blog article on dmolsen.com to announce the move.
Now that @elsigh merged it into regexes.yaml, I think we can get rid of user_agents_regex.yaml. It just needs an update to the PHP code.
I noticed that in PHP certain properties (for example osPatch / osBuild, or device) do not always exist, thus generating a notice (Notice: Undefined property: stdClass::$osPatch) when accessed.
Simple test: create a PHP page that shows all properties (as shown in the PHP README.md), then access the page via Chrome and Firefox (both latest version): osPatch / osBuild are available with Chrome but not with Firefox.
Wouldn't it be better if all properties had a default value (blank, zero or false)?
Hi,
The User-Agent string sent by Google Chrome for Android has changed between 4.0.3 and 4.0.4. The pattern CrMo used in the regex is no longer returned, and the library fails to detect it as a browser in the "Chrome Mobile" family.
Here is a typical User-Agent string returned by the browser:
Mozilla/5.0 (Linux; Android 4.0.4; Galaxy Nexus Build/IMM76B) AppleWebKit/535.19 (KHTML, like Gecko) Chrome/18.0.1025.133 Mobile Safari/535.19
The only difference that remains with the tablet/desktop version is the "Mobile" token. I suggest adding the following regex to regexes.yaml next to the current one for Google Chrome for Android (or at least before the more generic browser parsing rules):
- regex: '(Chrome)/(\d+)\.(\d+)\.(\d+)\.(\d+) Mobile'
  family_replacement: 'Chrome Mobile'
I checked the regexp locally. That seems to work as expected.
Unit test for test_resources/test_user_agent_parser.yaml:
- user_agent_string: 'Mozilla/5.0 (Linux; Android 4.0.4; Galaxy Nexus Build/IMM76B) AppleWebKit/535.19 (KHTML, like Gecko) Chrome/18.0.1025.133 Mobile Safari/535.19'
  family: 'Chrome Mobile'
  v1: '18'
  v2: '0'
  v3: '1025'
  v4: '133'
Let me know if you would rather see a pull request for this.
See Google Chrome developers doc at:
https://developers.google.com/chrome/mobile/docs/user-agent
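For anyone who wants to check the proposed pattern locally, a quick Python sketch follows. The regex and the User-Agent string are copied from this issue; the function name `parse_chrome_mobile` and the returned dict layout (mirroring the test YAML keys) are assumptions, not the library's API.

```python
import re

# Pattern proposed in this issue.
CHROME_MOBILE = re.compile(r"(Chrome)/(\d+)\.(\d+)\.(\d+)\.(\d+) Mobile")

UA = (
    "Mozilla/5.0 (Linux; Android 4.0.4; Galaxy Nexus Build/IMM76B) "
    "AppleWebKit/535.19 (KHTML, like Gecko) "
    "Chrome/18.0.1025.133 Mobile Safari/535.19"
)


def parse_chrome_mobile(user_agent_string):
    """Return family and v1..v4 if the 'Mobile' token follows the version."""
    match = CHROME_MOBILE.search(user_agent_string)
    if match is None:
        return None
    return {
        "family": "Chrome Mobile",  # the family_replacement from the issue
        "v1": match.group(2),
        "v2": match.group(3),
        "v3": match.group(4),
        "v4": match.group(5),
    }


print(parse_chrome_mobile(UA))
```

A string without the "Mobile" token right after the version does not match, so tablet and desktop Chrome would still fall through to the generic rules.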
@elsigh & @tobie-
@elsigh talked about merging ua-parser-php into this project. i'd love to see that but one caveat... the names of my attributes are different than the js version of ua-parser and, i'm sure, the python version. do you want me to standardize before attempting to merge?
also, one feature that i like but, again, may not fit the original ua-parser philosophy is my inclusion of isMobile, isTablet, isDesktop, etc. There is probably a better way to do it but just curious what your thoughts are on that feature as well.
Really looking forward to contributing to the ua-parser project. It's been really helpful.
Currently output contains:
{
family:
major:
minor:
patch:
os:
}
Would be nice to have device, isMobile and isSpider in there as well (like some other implementations do).
Basically to add:
os.family
os.major
os.minor
device.family
device.isMobile
device.isSpider
isMobileDevice returns "true" on IE9 when compatibility mode is activated
Hi,
It seems that the OS versions for Windows 7 are NULL. Is this for a reason or can't the major and minor versions be found like that?
Currently (though not all implementations make use of it yet) the regex library of ua-parser is able to parse:
Being able to extract information like "Trident", "Gecko" and "WebKit" (and their versions) would be useful. Especially now that there are more and more different browsers that are very similar, it would be more future proof to check the layout engine instead of the primary browser family (e.g. WebKit instead of (Safari | Flock | Mobile Safari | Chrome on iOS | Android Browser | ..). Or Chromium instead of (Chromium | Chrome | Opera | Chrome Mobile | ..)).
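As a rough illustration of what layout engine extraction could look like, here is a Python sketch. The pattern, the engine list and the function name `parse_engine` are hypothetical; nothing like this exists in regexes.yaml as far as this issue states. The sample strings are taken from other issues in this tracker.

```python
import re

# Hypothetical engine pattern. "like Gecko" in WebKit strings is not followed
# by a slash, so it is not mistaken for a Gecko version token.
ENGINE = re.compile(r"(Trident|AppleWebKit|Presto|Gecko)/(\d+(?:\.\d+)*)")


def parse_engine(user_agent_string):
    """Return the first layout engine token found, with its version string."""
    match = ENGINE.search(user_agent_string)
    if match is None:
        return None
    return {"family": match.group(1), "version": match.group(2)}


samples = [
    "Mozilla/5.0 (compatible; MSIE 10.0; Windows NT 6.2; Trident/6.0)",
    "Opera/9.80 (Macintosh; Intel Mac OS X 10.8.0) Presto/2.12.363 Version/12.50",
    "Mozilla/5.0 (X11; Linux i686) AppleWebKit/534.34 (KHTML, like Gecko) rekonq Safari/534.34",
]
for ua in samples:
    print(parse_engine(ua))
```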