tobie / ua-parser

A multi-language port of Browserscope's user agent parser.

License: Other

Makefile 0.25% C++ 8.32% C# 11.09% Shell 0.08% D 3.27% Go 3.91% Haskell 4.62% Java 17.92% JavaScript 6.24% Perl 17.95% PHP 16.48% PigLatin 0.65% Python 8.88% CMake 0.32% Batchfile 0.02%

ua-parser's People

Contributors

3rd-eden, atifaziz, basvandijk, bluesmoon, commenthol, dmatth, dmolsen, drucifer, elsigh, emberian, enemaerke, fyrd, georgevreilly, gquinones, ironholds, jpvincent, lstrojny, mamod, nielsbasjes, ozataman, philipzae, rascalking, selwin, shripadk, sjiang, stephenwalz, synchro, tobie, yihuanz, yurigorokhov

ua-parser's Issues

Undefined offset: 1 when running from php/cli

Hey,

When checking for the -get argument, you guys are doing this:

if (defined('STDIN') && isset($argv) && ($argv[1] == '-get')) {

Which I think should be this:

if (defined('STDIN') && isset($argv[1]) && ($argv[1] == '-get')) {

I would submit a pull, but I've got \r\n vs. \n issues with the repo, so my git wants to replace all of the empty lines.

Cheers guys,
Mike

Python3 support

Any plans to port this to Python 3 in the near future?

If not, I'll probably fork it, make a py3k branch, and port it myself; any particular bits that you would expect to be difficult?

Thanks for your nice work on this!

Python fails on some regexes missing parentheses

regex: 'Minimo' in yaml causes:

File "C:\Python27\lib\site-packages\ua_parser-1.0-py2.7.egg\ua_parser\user_agent_parser.py", line 85, in ParseUserAgent
family, v1, v2, v3 = uaParser.Parse(user_agent_string)
File "C:\Python27\lib\site-packages\ua_parser-1.0-py2.7.egg\ua_parser\user_agent_parser.py", line 60, in Parse
family = match.group(1)
IndexError: no such group

regex: '(Minimo)' works fine

see issue #30
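
A minimal sketch of the failure mode and one possible guard, using only the standard `re` module. The helper name `first_group_or_match` and the sample string are hypothetical and not part of ua-parser:

```python
import re

def first_group_or_match(pattern, user_agent_string):
    """Return group 1 if the pattern defines one, else the whole match."""
    match = re.search(pattern, user_agent_string)
    if match is None:
        return None
    # match.re is the compiled pattern; .groups is its capture group count.
    if match.re.groups >= 1:
        return match.group(1)
    # No capture group: match.group(1) would raise "IndexError: no such group".
    return match.group(0)

ua = "Example/1.0 Minimo/0.020"  # illustrative string only
print(first_group_or_match("Minimo", ua))    # Minimo
print(first_group_or_match("(Minimo)", ua))  # Minimo
```

Whether the fix belongs in the parser (a guard like this) or in the YAML data (adding the parentheses) is a design choice the thread leaves open.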

improper detection for Polaris browser

Hi.

Polaris browser is classified as Safari.

It uses a UA string like "Mozilla/5.0 (Android) AppleWebkit/536.2 (KHTML, like Gecko) Polaris/8.0 Safari/536.2".

thank you.
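
As a rough illustration, a dedicated rule could pick out the `Polaris/8.0` token, assuming it is tried before the generic Safari rule. The pattern below is a hypothetical sketch and is not taken from regexes.yaml:

```python
import re

# Hypothetical pattern for illustration; not the project's actual rule.
POLARIS = re.compile(r"(Polaris)/(\d+)\.(\d+)")

ua = ("Mozilla/5.0 (Android) AppleWebkit/536.2 (KHTML, like Gecko) "
      "Polaris/8.0 Safari/536.2")

match = POLARIS.search(ua)
family, major, minor = match.groups()
print(family, major, minor)  # Polaris 8 0
```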

Opera reporting incorrectly on Nexus S

Using the PHP parser, Opera Mini and Opera Mobile are reporting isMobile as false and isTablet as true. This is on a Google Nexus S with the latest updates. The default "Browser", Firefox mobile and Chrome beta are reporting correctly.

UA strings:

Opera Mobile:
Opera/9.80 (Android 4.0.4; Linux; Opera Mobi/ADR-1204201824; U; en) Presto/2.10.254 Version/12.00

Opera Mini:
Opera/9.80 (Android; Opera Mini/7.29703/27.1662; U; en) Presto/2.8.119 Version/11.10

Bring device parser data to regexes.yaml

Hi,

Working on an update to the JS port and realized a lot of the data to parse device info is embedded in the code (notably in the PHP port) rather than inside of the regexes.yaml file. This is leading to oddities in the Python version (see #89).

Can we agree to put the relevant data in regexes.yaml and write pseudo code somewhere so as to define how to use this information?

/cc @dmolsen @elsigh

Adding PHP Tests & Support for Travis CI

@elsigh-

I looked at Travis [http://about.travis-ci.org/] today. Not sure multiple languages in one project are supported but I'm going to play around with it anyway. The validator doesn't choke at least ;) For the PHP lib I'm going to build out a better test script that uses the files you already have built.

One requirement of Travis is that on success the test script kicks out 0 (zero). Any other output is treated as an error. I glanced at the Python but I'm not sure what its final output is on success. I'll be honest, I'm being lazy. I'm kind of hoping you happen to know off the top of your head ;)

I'll create a separate branch for playing around with this. This is a first for me but I don't think I should bork anything too badly.

@tobie-

Are you cool if I give Travis access to your repo? It gets read/write so it can set-up the service hooks but otherwise keeps its hands off the repo.

I probably won't attempt this until next week but I wanted to start the conversation while I was thinking about it.

Upload package to PyPI

Hi there,

First of all, thanks for this helpful library.

I noticed that there's a setup.py file in this repository, added in pull request #28, but it doesn't look like this package has been added to PyPI. I tried running pip install ua_parser, and pip complained about not finding ua_parser.

Is registering this package on PyPI planned?

Question on new C# port

I've submitted a PR for a new C# port but before I can finalize it I have a few questions and Tobie asked me to open a new issue so here it goes (pretty new to the whole github thing, so forgive me if I ask obvious questions):

1: I've included unit tests of the yaml-defined testcases under test_resources but I get one failing test on matching the useragent 'Mozilla/5.0 (BlackBerry; U; BlackBerry 9320; en-GB) AppleWebKit/534.11+ (KHTML, like Gecko) Version/7.1.0.398 Mobile Safari/534.11+' as a mobile device. From running the java tests I see they have the same issue (seems the issue, at least for me and for the java version, is that the list of mobile OS'es are matched exactly against the family and BlackBerry 9320 != BlackBerry so the device is not marked as mobile). Is this a known issue or is it just c#/java that suffer from this?

2: I've added some Apache2 copyright and my name to the readme. I'd like to support this going forward but how can I contribute and stay in the loop for updates/fixes across the ported platforms?

3: As far as I can tell there is no Travis-CI support for C# which is too bad since I'd like to integrate into some CI system. Any requirements/thoughts/ideas on this from more experienced githubbers?

4: I think @tobie has touched on this but C# is really not all that happy with \_ (illegal escape) so I am currently just working around that and can remove this if the regexes are adjusted (not really a question come to think of it, more of a reminder)

Merge YAML files

  • merge regexes.yaml and user_agents_regex.yaml.
  • update lang specific code so they all point to the same file.

Family "Rekonq" has "NaN, null, null" as version.

> 'Mozilla/5.0 (X11; Linux i686) AppleWebKit/534.34 (KHTML, like Gecko) rekonq Safari/534.34'
{ family: 'Rekonq',
  major: NaN,
  minor: null,
  patch: null,
  os: 'Linux' }

> 'Mozilla/5.0 (X11; U; Linux i686; en-GB) AppleWebKit/533.3 (KHTML, like Gecko) rekonq Safari/533.3'
{ family: 'Rekonq',
  major: NaN,
  minor: null,
  patch: null,
  os: 'Linux' }

// Rekonq 1.0+
> 'Mozilla/5.0 (X11; Linux i686) AppleWebKit/534.34 (KHTML, like Gecko) rekonq/1.0 Safari/534.34'
{ family: 'Rekonq',
  major: NaN,
  minor: null,
  patch: null,
  os: 'Linux' }

Unfortunately rekonq didn't put versions in its user agent string until recently. Perhaps it would be appropriate to pick WebKit's or Safari's value instead? Or default to 0.x.x.

Either way the NaN, null, null is somewhat confusing. And even for 1.0+ it doesn't recognize the version.


[1] https://bugs.kde.org/show_bug.cgi?id=293298
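
One way to avoid the `NaN` is to treat the version as optional and report a missing part as empty rather than converting it. The following is a sketch under that assumption; the pattern and the `parse_rekonq` helper are hypothetical, not the project's rule:

```python
import re

# Hypothetical pattern: the "/major.minor" part is optional.
REKONQ = re.compile(r"(rekonq)(?:/(\d+)\.(\d+))?", re.IGNORECASE)

def parse_rekonq(ua):
    match = REKONQ.search(ua)
    if match is None:
        return None
    major, minor = match.group(2), match.group(3)
    return {
        "family": "Rekonq",
        # Only convert when the group actually matched.
        "major": int(major) if major is not None else None,
        "minor": int(minor) if minor is not None else None,
    }

old_ua = ("Mozilla/5.0 (X11; Linux i686) AppleWebKit/534.34 "
          "(KHTML, like Gecko) rekonq Safari/534.34")
new_ua = ("Mozilla/5.0 (X11; Linux i686) AppleWebKit/534.34 "
          "(KHTML, like Gecko) rekonq/1.0 Safari/534.34")
print(parse_rekonq(old_ua))
print(parse_rekonq(new_ua))
```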

IE & Compatibility Mode

Sort of a stupid question... should regexes.yaml take into account IE's compatibility mode? e.g. when running in compatibility mode IE 8-10 will send a UA that ua-parser sees as IE 7. By looking at the Trident version you can tell what the actual version of IE is. I'm sort of torn on it but figured I'd ask in case anyone felt strongly one way or the other.

Some examples, taken from testing via BrowserStack:

IE 10 default:
Mozilla/5.0 (compatible; MSIE 10.0; Windows NT 6.2; Trident/6.0)

IE 10 in compatibility mode:
Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 6.2; Trident/6.0; ...)

IE 9 default:
Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.0; Trident/5.0)

IE 9 in compatibility mode:
Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 6.0; Trident/5.0; ...)

IE 8 default:
Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 6.0; Trident/4.0; ...)

IE 8 in compatibility mode:
Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 6.0; Trident/4.0; ...)

IE 7 default:
Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.2; ...)
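
The Trident check described above could be sketched like this. The mapping is read off the sample strings in this issue; the function name `real_ie_major` is hypothetical, and whether ua-parser should report the real or the emulated version is exactly the open question:

```python
import re

# Trident version to IE major version, as seen in the samples above.
TRIDENT_TO_IE = {"4.0": "8", "5.0": "9", "6.0": "10"}

MSIE = re.compile(r"MSIE (\d+)\.(\d+)")
TRIDENT = re.compile(r"Trident/(\d+\.\d+)")

def real_ie_major(ua):
    """Return the IE major version, preferring the Trident token if present."""
    msie = MSIE.search(ua)
    if msie is None:
        return None
    reported = msie.group(1)
    trident = TRIDENT.search(ua)
    if trident is None:
        return reported  # e.g. a genuine IE 7, which sends no Trident token
    return TRIDENT_TO_IE.get(trident.group(1), reported)

compat = "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 6.2; Trident/6.0; ...)"
print(real_ie_major(compat))  # 10
```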

Opera's UA string is changing

Hi

Opera 12.50 alpha (aka Opera Next http://www.opera.com/browser/next/) on desktop doesn't render http://detector.dmolsen.com/, whereas Opera 12.01 does. In Opera.Next, masking as Firefox makes it work. That's often indicative of UA detection at play. detector.dmolsen.com uses ua-parser, hence filing this issue.

Opera 12.50 simplifies the UA string (see http://my.opera.com/ODIN/blog/2012/08/03/a-hot-opera-12-50-summer-time-snapshot). Opera Mobile and Opera Mini will move to the new UA string in the future, too.

"Opera 12.50 will ship with a simplified UA string. Firstly, we have dropped the "U;" token, which signified that the browser provides crypto support that is stronger than what the "international" builds of Netscape offered circa 1995. The second change is removal of the language indicator. As an example, while the UA string for Opera 12.01 on Mac is currently
Opera/9.80 (Macintosh; Intel Mac OS X 10.8.0; U; en) Presto/2.10.289 Version/12.01

today's snapshot for Opera 12.50 on Mac shows
Opera/9.80 (Macintosh; Intel Mac OS X 10.8.0) Presto/2.12.363 Version/12.50

Both these changes correspond to similar changes in the IE, Firefox, Chrome and Safari browsers' UA strings."
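
A pattern that keys only on the `Version/` token is unaffected by the dropped `U;` and language tokens. This is a hypothetical sketch, not the rule ua-parser ships:

```python
import re

# Hypothetical pattern: ignores everything between "Opera/9.80" and "Version/".
OPERA = re.compile(r"^Opera/9\.80 .*Version/(\d+)\.(\d+)")

old_ua = ("Opera/9.80 (Macintosh; Intel Mac OS X 10.8.0; U; en) "
          "Presto/2.10.289 Version/12.01")
new_ua = ("Opera/9.80 (Macintosh; Intel Mac OS X 10.8.0) "
          "Presto/2.12.363 Version/12.50")

for ua in (old_ua, new_ua):
    print(OPERA.search(ua).groups())
# ('12', '01')
# ('12', '50')
```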

PHP - Warning on preg_match at googleTV check

Hi,

Trying to parse on Linux (ubuntu) :

Warning: preg_match(): Unknown modifier '\' [...] UAParser.php, line 255

I printed $osRegex (var_dump) :

array(1) { ["regex"]=> string(15) "(GoogleTV)/\d+" }

Thanks for the lib and the work !

UAParser.php fails for Symbian device user agents

Hey guys,

We just started feeding live data through the UAParser and it's working quite well for us. The only issue we have had thus far comes from Symbian related devices (Nokia):

This user agent string, for example:

Mozilla/5.0 (SymbianOS/9.1; U; en) AppleWebKit/413 (KHTML, like Gecko) Safari/413

Fails in UAParser.php with an "Undefined offset" error:

PHP Notice: Undefined offset: 1 in .../ua_parser/php/UAParser.php on line 307

Which I think is happening because in the YAML there is this defined regex block:

 - regex: 'Symbian'
   device_replacement: 'Nokia'

If you change to the following, it seems to work:

 - regex: '(Symbian)'
   device_replacement: 'Nokia'

In other words that one wasn't defining a capture group, so there would never be $matches[1] defined. Now, I don't know as much about user agents as you guys so is this a proper change, or should it perhaps be done another way?

Cheers,
Mike
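
Rules like this one could be caught before release by checking each pattern's capture group count. A small sketch in Python, assuming the patterns have already been loaded from the YAML file into a list (the helper name is hypothetical):

```python
import re

def patterns_without_groups(patterns):
    """Return the patterns that define no capture group at all."""
    return [p for p in patterns if re.compile(p).groups == 0]

# Sample patterns quoted in the issues on this page.
patterns = ["Symbian", "(Symbian)", r"(GoogleTV)/\d+"]
print(patterns_without_groups(patterns))  # ['Symbian']
```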

PHP Notice: Undefined index: HTTP_USER_AGENT

I'm working on integrating ua-parser in TestSwarm, however I'm getting this error when accessing our API from node.

I do usually configure a user agent (I forgot it in this case), and TestSwarm does have different behaviour for API clients from node.

Anyhow, the error got through and caused an E_NOTICE in the output (which then caused the response to be invalid JSON, don't worry, error_reporting only enabled in development):

--- UAParser.php
    function parse($ua = null ) {
        $ua ? $ua : $_SERVER["HTTP_USER_AGENT"]
PHP Notice:  Undefined index: HTTP_USER_AGENT in /ua-parser/php/UAParser.php on line 55
PHP Warning:  Cannot modify header information - headers already sent by (output started at /ua-parser/php/UAParser.php:55)

I suggest:

  • Only use $_SERVER if the passed parameter is null (not if it is an empty string or some other falsey value)
  • When using $_SERVER, do an isset check in case the key is not set (for instance, it isn't set when on the command-line).
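
The two suggestions can be sketched as follows. This is a Python illustration of the logic only, not the PHP fix itself; `server` stands in for PHP's `$_SERVER` and the function name is hypothetical:

```python
def resolve_user_agent(ua=None, server=None):
    """Pick the UA string to parse.

    Falls back to the request header only when `ua` is None,
    so an empty string passed by the caller is kept as-is.
    """
    if ua is not None:
        return ua
    if server is None:
        return None
    # .get() returns None instead of failing when the header is absent,
    # for example on the command line.
    return server.get("HTTP_USER_AGENT")

print(resolve_user_agent(None, {"HTTP_USER_AGENT": "node"}))  # node
print(resolve_user_agent(None, {}))                           # None
```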

TypeError: Cannot call method 'match' of undefined

I am getting the following error, once in a while.

Any idea?

/home/meeting/node_modules/ua-parser/js/index.js:15
var m = ua.match(regexp);
^
TypeError: Cannot call method 'match' of undefined
at Array.parser [as 0]
at Object.parse (/home/meeting/node_modules/ua-parser/js/index.js:53:31)
at IncomingMessage.app.get.callback (/home/meeting/app.js:294:21)
at IncomingMessage.EventEmitter.emit (events.js:115:20)
at IncomingMessage._emitEnd (http.js:366:10)
at HTTPParser.parserOnMessageComplete [as onMessageComplete]
at Socket.socketOnData [as ondata]
at TCP.onread (net.js:402:27)

\_ in Regexes crashes the C# parser

Although this seems easily fixable through pre-processing, I'm curious to know why we're doing this in the first place. Is this something Python specific?

Using JSON instead of YAML to store regexes

Is there a reason why YAML instead of JSON is used to store the regex patterns? Loading regexes.yml in python felt slow so I converted the YAML file to JSON (https://gist.github.com/4199468) to do a quick comparison between the two:

>>> timeit.timeit("import json; f = open('regexes.json'); json.load(f); f.close()", number=10)
0.02537703514099121
>>> timeit.timeit("import yaml; f = open('regexes.yaml'); yaml.load(f); f.close()", number=10)
2.897357940673828

As you can see, the JSON version performed over 100x faster. Furthermore, many programming languages also have JSON parsing support built in (at least for both Python and JavaScript).

Thoughts?

problem of general spider detect (case sensitive)

I found the UA 'EasouSpider' in a log, but ua-parser treats it as nothing (same for 'Bot', 'Spider', ...).

The problem is on line 832 of regexes.yaml: the pattern is case sensitive!

example:

import ua_parser.user_agent_parser as ua_parser
ua = ua_parser.ParseDevice('EasouSpider')
ua
{'is_spider': False, 'is_mobile': False, 'family': None}
ua = ua_parser.ParseDevice('Easouspider')
ua
{'is_spider': True, 'is_mobile': False, 'family': 'Spider'}
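
A case-insensitive match would treat both spellings the same. The pattern below is a hypothetical stand-in, since the actual rule on line 832 is not quoted in this report:

```python
import re

# Hypothetical generic spider pattern, compiled case-insensitively.
SPIDER = re.compile(r"(bot|spider|crawler)", re.IGNORECASE)

def is_spider(ua):
    return SPIDER.search(ua) is not None

print(is_spider("EasouSpider"))  # True
print(is_spider("Easouspider"))  # True
```

Note that a substring rule this broad can produce false positives, so the real pattern may need to be stricter.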

Blackberry Webkit family not being detected as mobile

Hi,
I tested it with the BlackBerry 9320 mobile phone user agent. It gives is_mobile: False.
a = user_agent_parser.Parse('Mozilla/5.0 (BlackBerry; U; BlackBerry 9320; en-GB) AppleWebKit/534.11+ (KHTML, like Gecko) Version/7.1.0.398 Mobile Safari/534.11+')

a['os']
{'major': '7', 'patch_minor': '398', 'minor': '1', 'family': 'BlackBerry OS', 'patch': '0'}
a['device']
{'is_spider': False, 'is_mobile': False, 'family': 'BlackBerry 9320'}
a['user_agent']
{'major': '7', 'minor': '1', 'family': 'Blackberry WebKit', 'patch': '0'}
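
If the mobile check compares the device family for exact equality, 'BlackBerry 9320' never equals 'BlackBerry'. A prefix comparison is one possible fix, sketched here with a hypothetical list (the real list of mobile families is not shown in this report):

```python
# Hypothetical prefixes for illustration only.
MOBILE_FAMILY_PREFIXES = ("BlackBerry", "iPhone", "Android")

def is_mobile(device_family):
    """Treat a family as mobile if it starts with a known mobile prefix."""
    if not device_family:
        return False
    # str.startswith accepts a tuple of candidate prefixes.
    return device_family.startswith(MOBILE_FAMILY_PREFIXES)

print(is_mobile("BlackBerry 9320"))  # True
print(is_mobile("Spider"))           # False
```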

Just installed the latest commit - this is what happens

Traceback (most recent call last):
File "/home/ubuntu/workspace/rtbopsConfig/rtbServers/rtbWorkerServer/workerServer.py", line 29, in <module>
from ua_parser import user_agent_parser
File "/usr/local/lib/python2.7/dist-packages/ua_parser-1.0-py2.7.egg/ua_parser/user_agent_parser.py", line 395, in <module>
yamlFile = open(os.path.join(ROOT_DIR, '../../regexes.yaml'))
IOError: [Errno 2] No such file or directory: '/usr/local/lib/python2.7/dist-packages/ua_parser-1.0-py2.7.egg/ua_parser/../../regexes.yaml'

User Agent version may not be a number

We currently run parseInt on the version number parts. This is nice to store the data compactly if we're writing to a database, but it has the side-effect that if the version number isn't really a number, eg, it has the strings a, b, pre, etc., as part of it, then parseInt becomes lossy.

We should either not do parseInt (not so desirable), or check that the part is really a number before doing a parseInt (may be a little slower).
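
The second option, converting only when the part is purely numeric, might look like this. It is a Python sketch of the idea rather than the JavaScript change itself, and `to_version_part` is a hypothetical name. For reference, JavaScript's `parseInt('5pre')` returns 5 and silently drops the suffix:

```python
def to_version_part(part):
    """Convert a version part to int only if it is all ASCII digits."""
    if part is None:
        return None
    # isdigit() alone also accepts some non-ASCII digits, so check both.
    if part.isascii() and part.isdigit():
        return int(part)
    return part  # keep values such as '0b2' or '5pre' unchanged

print(to_version_part("10"))    # 10
print(to_version_part("5pre"))  # 5pre
```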

Linking to other ua-parser Libraries

As cool as it'd be to get repos into this one official library it makes sense to keep some separate (e.g. the Ruby gem). Should we at least link to them in the main README?

I think the only other ua-parser related library I've found out there is written in Haskell.

PHP - default values for properties

I noticed that in PHP certain properties (for example osPatch / osBuild, or device) do not always exist, thus generating a notice (Notice: Undefined property: stdClass::$osPatch) when accessed.

Simple test: create a PHP page that shows all properties (as shown in the PHP README.md), then access the page via Chrome and Firefox (both latest version): osPatch / osBuild are available with Chrome but not with Firefox.

Wouldn't it be better if all properties had a default value (blank, zero or false)?

Google Chrome for Android >=4.0.4 not detected as "Chrome Mobile"

Hi,

The User-Agent string sent by Google Chrome for Android has changed between 4.0.3 and 4.0.4. The pattern CrMo used in the regex is no longer returned, and the library fails to detect it as a browser in the "Chrome Mobile" family.

Here is a typical User-Agent string returned by the browser:

Mozilla/5.0 (Linux; Android 4.0.4; Galaxy Nexus Build/IMM76B) AppleWebKit/535.19 (KHTML, like Gecko) Chrome/18.0.1025.133 Mobile Safari/535.19

The only difference that remains with the tablet/desktop version is the "Mobile" token. I suggest adding the following regex to regexes.yaml next to the current one for Google Chrome for Android (or at least before the more generic browser parsing rules):

  - regex: '(Chrome)/(\d+)\.(\d+)\.(\d+)\.(\d+) Mobile'
    family_replacement: 'Chrome Mobile'

I checked the regexp locally. That seems to work as expected.

Unit test for test_resources/test_user_agent_parser.yaml:

  - user_agent_string: 'Mozilla/5.0 (Linux; Android 4.0.4; Galaxy Nexus Build/IMM76B) AppleWebKit/535.19 (KHTML, like Gecko) Chrome/18.0.1025.133 Mobile Safari/535.19'
    family: 'Chrome Mobile'
    v1: '18'
    v2: '0'
    v3: '1025'
    v4: '133'

Let me know if you would rather see a pull request for this.

See Google Chrome developers doc at:
https://developers.google.com/chrome/mobile/docs/user-agent
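
The proposed rule can be checked quickly with Python's `re` module. The pattern and the mobile UA string are the ones from this issue; the desktop string is an illustrative variant without the "Mobile" token:

```python
import re

CHROME_MOBILE = re.compile(r"(Chrome)/(\d+)\.(\d+)\.(\d+)\.(\d+) Mobile")

mobile_ua = ("Mozilla/5.0 (Linux; Android 4.0.4; Galaxy Nexus Build/IMM76B) "
             "AppleWebKit/535.19 (KHTML, like Gecko) "
             "Chrome/18.0.1025.133 Mobile Safari/535.19")
# Illustrative desktop-style string, not taken from the issue.
desktop_ua = ("Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/535.19 "
              "(KHTML, like Gecko) Chrome/18.0.1025.133 Safari/535.19")

match = CHROME_MOBILE.search(mobile_ua)
print(match.groups())                    # ('Chrome', '18', '0', '1025', '133')
print(CHROME_MOBILE.search(desktop_ua))  # None
```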

Merging ua-parser-php

@elsigh & @tobie-

@elsigh talked about merging ua-parser-php into this project. i'd love to see that but one caveat... the names of my attributes are different than the js version of ua-parser and, i'm sure, the python version. do you want me to standardize before attempting to merge?

also, one feature that i like but, again, may not fit the original ua-parser philosophy is my inclusion of isMobile, isTablet, isDesktop, etc. There is probably a better way to do it but just curious what your thoughts are on that feature as well.

Really looking forward to contributing to the ua-parser project. It's been really helpful.

Update node-js implementation with device and os objects

Currently output contains:

{
  family:
  major:
  minor:
  patch:
  os:
}

Would be nice to have device, isMobile and isSpider in there as well (like some other implementations do).

Basically to add:

os.family
os.major
os.minor
device.family
device.isMobile
device.isSpider

Add support for parsing layout information

Currently (though not all implementations make use of it yet) the regex library of ua-parser is able to parse:

  • browser family and version (major, minor, patch)
  • os family and version (major, minor, patch)
  • device family, isMobile, isSpider

Being able to extract information like "Trident", "Gecko" and "WebKit" (and their versions) would be useful. Especially now that there are more and more different browsers that are very similar, it would be more future proof to check the layout engine instead of the primary browser family, e.g. WebKit instead of (Safari | Flock | Mobile Safari | Chrome on iOS | Android Browser | ..), or Chromium instead of (Chromium | Chrome | Opera | Chrome Mobile | ..).
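
A rough sketch of what engine detection could look like, assuming an ordered rule list in the style of the existing browser rules. The rule names, patterns and `parse_engine` helper are hypothetical, and the sample strings come from other issues on this page:

```python
import re

VERSION = r"(\d+(?:\.\d+)*)"

# Hypothetical rules, tried in order; the first match wins.
ENGINE_RULES = [
    ("Trident", re.compile(r"Trident/" + VERSION)),
    ("Presto", re.compile(r"Presto/" + VERSION)),
    ("WebKit", re.compile(r"AppleWebKit/" + VERSION, re.IGNORECASE)),
    # WebKit strings say "like Gecko" without a slash, so this does not match them.
    ("Gecko", re.compile(r"Gecko/" + VERSION)),
]

def parse_engine(ua):
    """Return (engine, version) for the first matching rule, else None."""
    for name, pattern in ENGINE_RULES:
        match = pattern.search(ua)
        if match:
            return name, match.group(1)
    return None

print(parse_engine("Mozilla/5.0 (compatible; MSIE 10.0; Windows NT 6.2; Trident/6.0)"))
# ('Trident', '6.0')
```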
