
geolite2legacy

This tool will convert MaxMind GeoLite2 Database to the old legacy format.

It's based on mmutils, but it reads the new GeoLite2 data directly from zip files containing the CSV databases.

You can download databases from https://dev.maxmind.com/geoip/geoip2/geolite2/

It has been tested with python/pypy 2.7 and python 3.5+.

Limitations

  • Processing may be slow, especially for City blocks. Consider using pypy, which is a lot faster.
  • Some software may expect iso-8859-1 encoded names, but the script outputs utf-8. You can force a different encoding, e.g. with -e iso-8859-1, but some names may then come out wrong.
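
The encoding limitation can be illustrated with a minimal Python sketch. The helper name `encode_name` and the use of `errors="replace"` are assumptions made for illustration; the script's own error handling may differ.

```python
# Sketch only: shows why forcing iso-8859-1 can mangle some names.
# encode_name is a hypothetical helper, not part of geolite2legacy.py.
def encode_name(name, encoding="utf-8"):
    """Encode a place name, replacing characters the target charset lacks."""
    return name.encode(encoding, errors="replace")

# Every character of this name exists in Latin-1, so nothing is lost.
print(encode_name("Brasília", "iso-8859-1"))

# 'Ł' and 'ź' do not exist in Latin-1, so they are replaced by '?'.
print(encode_name("Łódź", "iso-8859-1"))
```

Names that contain only Western European characters survive the conversion, while names from other scripts lose information.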

Examples

$ ./geolite2legacy.py -i GeoLite2-Country-CSV.zip -f geoname2fips.csv -o GeoIP.dat
Database type Country - Blocks IPv4
wrote 306385-node trie with 300679 networks (251 distinct labels) in 8 seconds

# ./geolite2legacy.py -i GeoLite2-ASN-CSV.zip -o GeoIPASNum.dat
Database type ASN - Blocks IPv4
wrote 518484-node trie with 417952 networks (62896 distinct labels) in 15 seconds

Usage

usage: geolite2legacy.py [-h] -i INPUT_FILE -o OUTPUT_FILE [-f FIPS_FILE]
                         [-e ENCODING] [-d] [-6]

optional arguments:
  -h, --help            show this help message and exit
  -i INPUT_FILE, --input-file INPUT_FILE
                        input zip file containings csv databases
  -o OUTPUT_FILE, --output-file OUTPUT_FILE
                        output GeoIP dat file
  -f FIPS_FILE, --fips-file FIPS_FILE
                        geonameid to fips code mappings
  -e ENCODING, --encoding ENCODING
                        encoding to use for the output rather than utf-8
  -d, --debug           debug mode
  -6, --ipv6            use ipv6 database

Run inside Docker container

  1. Build the Docker image:
docker build -t geolite2legacy .
  2. Run the container. This command assumes that you have downloaded the GeoLite2 database to the current directory:
docker run -it -v $(pwd):/src geolite2legacy:latest -i /src/GeoLite2-Country-CSV.zip -o /src/GeoIP.dat

The MIT License (MIT)

Copyright (c) 2015 Mark Teodoro
Copyright (c) 2019 Gianluigi Tiesi

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.

geolite2legacy's People

Contributors

fffonion, lnussel, lvasiliev, miyurusankalpa, poz2k4444, selivan, sherpya

geolite2legacy's Issues

Support IPv4-mapped addresses

It would be nice if we could build an IPv6 database with all IPv4 addresses included as IPv4-mapped IPv6 addresses (::ffff:*:*).
Currently, this can be done by editing the IPv6 CSV inside the zip (country example):

tail -n+2 GeoIP2-Country-Blocks-IPv4.csv |
    awk -F, '{ split($1,a,"/"); split(a[1],a1,"."); m = 96+a[2]; printf("::ffff:%02x%02x:%02x%02x/%d,%s,%s,%s,%s,%s\n",a1[1],a1[2],a1[3],a1[4],m,$2,$3,$4,$5,$6) }' >> GeoIP2-Country-Blocks-IPv6.csv
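
The same mapping can be sketched in Python with the standard `ipaddress` module. The function name `to_ipv4_mapped` is hypothetical and this is not how the script itself handles IPv6; it only mirrors the arithmetic of the awk one-liner (prefix length plus 96, address placed under ::ffff:0:0).

```python
import ipaddress

def to_ipv4_mapped(cidr):
    """Return the IPv4-mapped IPv6 network for an IPv4 CIDR string."""
    net = ipaddress.IPv4Network(cidr)
    # ::ffff:0:0/96 is the IPv4-mapped prefix; add the 32-bit IPv4 address to it.
    mapped_base = ipaddress.IPv6Address("::ffff:0:0")
    address = ipaddress.IPv6Address(int(mapped_base) + int(net.network_address))
    # The IPv4 prefix length is shifted by the 96 leading bits of the mapping.
    return ipaddress.IPv6Network((address, 96 + net.prefixlen))

print(to_ipv4_mapped("1.0.0.0/24"))
```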

Missing modules: ipaddr and pygeoip

I am running in Visual Studio Community edition on windows 10 Pro 64 bit
Python 3.7 64bit

Two missing modules are:

ipaddr
pygeoip

Where do I get these from, and where do I put them?

I had this working but had to reset my PC and re-install.

Thanks

Converting the Country database fails with an error

CentOS Linux release 7.6.1810 (Core)
python-2.7.5-76.el7.x86_64
python-pygeoip-0.2.6-5.el7.noarch
python-ipaddr-2.1.11-1.el7.noarch

Command:
yum -y install python-ipaddr python-pygeoip
wget https://geolite.maxmind.com/download/geoip/database/GeoLite2-Country.tar.gz
wget https://raw.githubusercontent.com/sherpya/geolite2legacy/master/geolite2legacy.py
chmod 700 geolite2legacy.py
./geolite2legacy.py -i GeoLite2-Country-CSV.zip -o GeoIP.dat

Error:
Traceback (most recent call last):
  File "./geolite2legacy.py", line 202, in <module>
    class ASNv6RadixTree(ASNRadixTree):
  File "./geolite2legacy.py", line 204, in ASNv6RadixTree
    edition = pygeoip.const.ASNUM_EDITION_V6
AttributeError: 'module' object has no attribute 'ASNUM_EDITION_V6'

Error in database city

On CentOS, Python 2.7.5, pygeoip 0.3.2, latest database (23-11-2018).
Command: ./geolite2legacy.py -i GeoLite2-City-CSV.zip -o GeoLiteCity.dat
Error:

Traceback (most recent call last):
  File "./geolite2legacy.py", line 377, in <module>
    r.load(locs, TextIOWrapper(ziparchive.open(blocks, 'r')))
  File "./geolite2legacy.py", line 118, in load
    for nets, data in self.gen_nets(locations, outfile):
  File "./geolite2legacy.py", line 214, in gen_nets
    location['subdivision_1_name'].encode('utf-8'),  # region
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc5 in position 0: ordinal not in range(128)

ipaddress.AddressValueError: '1.0.0.0/24' does not appear to be an IPv4 or IPv6 network.

I'm on fresh Ubuntu Server 20.04. The python-ipaddress is installed by the command:

sudo apt-get install -y python-ipaddress

But geolite2legacy.py returns the following error:

./geolite2legacy.py -i GeoLite2-Country-CSV.zip -f geoname2fips.csv -o GeoIP.dat
Database type Country - Blocks IPv4 - Encoding: utf-8
Traceback (most recent call last):
  File "./geolite2legacy.py", line 475, in <module>
    main()
  File "./geolite2legacy.py", line 459, in main
    r.load(locs, TextIOWrapper(ziparchive.open(blocks, 'r'), encoding='utf-8'))
  File "./geolite2legacy.py", line 162, in load
    for nets, data in self.gen_nets(locations, outfile):
  File "./geolite2legacy.py", line 319, in gen_nets
    nets = [IPNetwork(row['network'])]
  File "/usr/lib/python2.7/dist-packages/ipaddress.py", line 199, in ip_network
    ' a unicode object?' % address)
ipaddress.AddressValueError: '1.0.0.0/24' does not appear to be an IPv4 or IPv6 network. Did you pass in a bytes (str in Python 2) instead of a unicode object?

In order to solve this error I've made the following change:

git diff
diff --git a/geolite2legacy.py b/geolite2legacy.py
index e6a39f1..73c3780 100755
--- a/geolite2legacy.py
+++ b/geolite2legacy.py
@@ -225,7 +225,7 @@ class ASNRadixTree(RadixTree):

     def gen_nets(self, locations, infile):
         for row in csv.DictReader(infile):
-            nets = [IPNetwork(row['network'])]
+            nets = [IPNetwork(row['network'].decode('utf-8'))]
             org = decode_text(row['autonomous_system_organization'])
             asn = row['autonomous_system_number']
             entry = u'AS{} {}'.format(asn, org)
@@ -254,7 +254,7 @@ class CityRev1RadixTree(RadixTree):
             if location is None:
                 continue

-            nets = [IPNetwork(row['network'])]
+            nets = [IPNetwork(row['network'].decode('utf-8'))]
             country_iso_code = location['country_iso_code'] or location['continent_code']
             fips_code = geoname2fips.get(location['geoname_id'])
             if fips_code is None:
@@ -316,7 +316,7 @@ class CountryRadixTree(RadixTree):
             if location is None:
                 continue

-            nets = [IPNetwork(row['network'])]
+            nets = [IPNetwork(row['network'].decode('utf-8'))]
             country_iso_code = location['country_iso_code'] or location['continent_code']
             yield nets, (country_iso_code,)

Is this the correct way to solve this issue?
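
One alternative to calling `.decode('utf-8')` unconditionally is to decode only when the value is a byte string, which keeps the same code working on Python 2 and Python 3. This is a sketch under that assumption; `to_text` and `parse_network` are hypothetical helper names, not functions from the script.

```python
import ipaddress

def to_text(value, encoding="utf-8"):
    """Return value as text, decoding only when it is a byte string."""
    # On Python 2, csv yields str (bytes); on Python 3 it yields text already.
    if isinstance(value, bytes):
        return value.decode(encoding)
    return value

def parse_network(value):
    """Parse a CIDR string that may arrive as bytes or as text."""
    return ipaddress.ip_network(to_text(value))

print(parse_network(b"1.0.0.0/24"))
print(parse_network(u"1.0.0.0/24"))
```

An unconditional `.decode()` raises AttributeError on Python 3, where CSV fields are already text, so the type check is what makes the fix portable.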

Is Netspeed (GeoIP2-Connection) database supported?

I had hoped that the GeoIP2-Connection database (known in the legacy format as GeoIPNetSpeed.dat) would be supported, but after executing:
python geolite2legacy.py -i GeoIP2-Connection.zip

I receive the error: Missing Locations or Block files, please check the archive.

Archive contains:
GeoIP2-Connection-Type-Blocks-IPv4.csv
GeoIP2-Connection-Type-Blocks-IPv6.csv

in format for block IPv4:
network,connection_type
1.0.0.0/24,Corporate
1.0.4.0/22,Corporate
1.0.16.0/24,Cable/DSL
1.0.64.0/18,Cable/DSL
etc...etc...etc...etc...

in format for block IPv6:
network,connection_type
a10:33c0::/29,Corporate
2001:200::/37,Corporate
2001:200:800::/40,Corporate
2001:200:900::/40,Cable/DSL
2001:200:a00::/39,Corporate
2001:200:c00::/38,Corporate
2001:200:1000::/36,Corporate
etc...etc...etc...etc...

etc....

If it is not supported at the moment, could you please update the script to support converting GeoIP2-Connection to a .dat database?

Thanks
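
Reading the block format shown in this issue could look like the following sketch. The numeric IDs are an assumption based on the netspeed constants of the legacy GeoIP C API (unknown 0, dialup 1, cable/DSL 2, corporate 3); they are not taken from this project, and `read_connection_blocks` is a hypothetical name.

```python
import csv
import io

# Assumption: IDs follow the legacy GeoIP C API netspeed constants.
# Any connection type not listed here falls back to 0 (unknown).
NETSPEED_IDS = {"Dialup": 1, "Cable/DSL": 2, "Corporate": 3}

def read_connection_blocks(csv_file):
    """Yield (network, netspeed_id) pairs from a Connection-Type blocks CSV."""
    for row in csv.DictReader(csv_file):
        yield row["network"], NETSPEED_IDS.get(row["connection_type"], 0)

sample = io.StringIO(
    "network,connection_type\n"
    "1.0.0.0/24,Corporate\n"
    "1.0.16.0/24,Cable/DSL\n"
)
print(list(read_connection_blocks(sample)))
```

Unlike the Country and City archives, this archive has no Locations file, so a converter would have to take its labels directly from the blocks file as shown.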

GEOIP_REGION returns a number instead of the region code

Hello there :)

I've successfully converted a GeoIP2-Lite-City to GeoIP Legacy, however the HTTP_GEOIP_REGION is returning a number code instead of the letter one:

    [HTTP_GEOIP_CITY] => Brasília
    [HTTP_GEOIP_REGION_NAME] => Distrito Federal
    [HTTP_GEOIP_REGION] => 07

I expected GEOIP_REGION to be "DF", not "07".

Is this the expected behavior?

Thank you

Python 3.5 Problems in step 4

Step 4/6 : RUN pip install -U pip && pip install -r requirements.txt
---> Running in 9127c8d8639b
DEPRECATION: Python 3.5 reached the end of its life on September 13th, 2020. Please upgrade your Python as Python 3.5 is no longer maintained. pip 21.0 will drop support for Python 3.5 in January 2021. pip 21.0 will remove support for this functionality.
WARNING: Retrying (Retry(total=4, connect=None, read=None, redirect=None, status=None)) after connection broken by 'NewConnectionError('<pip._vendor.urllib3.connection.HTTPSConnection object at 0xb5cf93d0>: Failed to establish a new connection: [Errno -3] Temporary failure in name resolution',)': /simple/pip/

Any idea how to fix it?

About REGION_IGNORE

There's something I'd like to ask: why is BOURGOGNE FRANCHE COMTE ignored?
https://github.com/sherpya/geolite2legacy/blob/master/geoname2fips.py#L445

The OSS I use contains the following source code. I expect the region_code to be A6, but because the region is ignored it becomes 00.

https://github.com/matomo-org/matomo/blob/ac07aa40497901c7f2a7a99166a88a1b72265d53/plugins/UserCountry/LocationProvider/GeoIp.php#L210

    private static function getTestIpAndResult()
    {
        static $result = null;
        if (is_null($result)) {
            // TODO: what happens when IP changes? should we get this information from piwik.org?
            $expected = array(self::COUNTRY_CODE_KEY => 'FR',
                              self::REGION_CODE_KEY  => 'A6',
                              self::CITY_NAME_KEY    => 'Besançon');
            $result = array(self::TEST_IP, $expected);
        }
        return $result;
    }

Updated geolite2legacy still raises KeyError: 'region'

I am trying to convert a file following your how-to, but I get the error below. Can you please help?
I have all the latest files from your repo. :)

./geolite2legacy.py -i GeoLite2-Country-CSV.zip -f geoname2fips.csv -o GeoIP.dat
Traceback (most recent call last):
  File "./geolite2legacy.py", line 447, in <module>
    main()
  File "./geolite2legacy.py", line 427, in main
    parse_fips(opts.fips_file)
  File "./geolite2legacy.py", line 358, in parse_fips
    geoname2fips[row['geoname_id']] = row['region']
KeyError: 'region'
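
A KeyError like this usually means the CSV header does not contain the column the script expects, for example because the file comes from a different revision than the script. A minimal sketch of an up-front header check follows; `load_fips_mapping` is a hypothetical function, and the column names `geoname_id` and `region` are taken from the traceback.

```python
import csv
import io

def load_fips_mapping(csv_file, required=("geoname_id", "region")):
    """Return {geoname_id: region}, failing early if a column is missing."""
    reader = csv.DictReader(csv_file)
    header = reader.fieldnames or []
    missing = [column for column in required if column not in header]
    if missing:
        raise ValueError("FIPS file is missing column(s): " + ", ".join(missing))
    return {row["geoname_id"]: row["region"] for row in reader}

sample = io.StringIO("geoname_id,region\n3030293,A6\n")
print(load_fips_mapping(sample))
```

Checking the header first turns a KeyError deep inside the loop into a message that names the missing column.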

Support A1, A2 special countries

It would be nice if we could build database with A1 (Anonymous Proxy), A2 (Satellite Provider) special countries support.
Currently, this hack of editing csv inside zip could be used (country example):

echo '12000001,en,--,"",A1,"Anonymous Proxy",0
12000002,en,--,"",A2,"Satellite Provider",0' >> GeoIP2-Country-Locations-en.csv

sed -i 's/^\(.*\),,\([0-9]*,[0-9]*,1,0\)$/\1,12000001,\2/; s/^\(.*\),,\([0-9]*,[0-9]*,0,1\)$/\1,12000002,\2/;' GeoIP2-Country-Blocks-IPv4.csv GeoIP2-Country-Blocks-IPv6.csv
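
The sed substitution above can also be sketched in Python, which makes the column logic explicit. The column names are assumed to follow the GeoLite2 Country blocks CSV layout, the two IDs are the made-up ones from the shell hack, and `tag_special_blocks` is a hypothetical helper.

```python
import csv
import io

# Made-up geoname IDs, matching the ones appended to the Locations file above.
ANONYMOUS_PROXY_ID = "12000001"
SATELLITE_PROVIDER_ID = "12000002"

def tag_special_blocks(rows):
    """Fill an empty geoname_id with the A1/A2 placeholder IDs."""
    for row in rows:
        flags = (row["is_anonymous_proxy"], row["is_satellite_provider"])
        if not row["geoname_id"]:
            if flags == ("1", "0"):
                row["geoname_id"] = ANONYMOUS_PROXY_ID
            elif flags == ("0", "1"):
                row["geoname_id"] = SATELLITE_PROVIDER_ID
        yield row

sample = io.StringIO(
    "network,geoname_id,registered_country_geoname_id,"
    "represented_country_geoname_id,is_anonymous_proxy,is_satellite_provider\n"
    "1.2.3.0/24,,6252001,,1,0\n"
    "5.6.7.0/24,,,,0,1\n"
)
for block in tag_special_blocks(csv.DictReader(sample)):
    print(block["network"], block["geoname_id"])
```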

Replace ipaddr module by ipaddress

With recent Python releases (>2.7), it makes sense to replace the ipaddr module with ipaddress.

ipaddr has become a hidden library by now.

nginx problems with dat file

I do not know if this is related directly to nginx or to the database transformation, but I know that lots of people will use your script for either logstash or nginx enrichment:

After converting to .dat, nginx throws these errors:
nginx: [emerg] invalid GeoIP database "/etc/nginx/geoip/GeoIP_City.dat" type:2
nginx: [emerg] invalid GeoIP City database "/etc/nginx/geoip/GeoIP_Country.dat" type:1

Regions/states

$ geoiplookup 207.179.201.1

geolite2legacy result:
GeoIP City Edition, Rev 1: US, 53, N/A ...

original GeoIP legacy database result:
GeoIP City Edition, Rev 1: US, IL, Illinois ...

At least US states are ignored completely and not written.

I don't understand the purpose of geoname2fips at all. All the country and region data is already in GeoLite2-City-Locations-en.csv, so why isn't it used directly?

python 2.7 ASN conversion results in decode error

$ pip list 2>/dev/null | egrep 'pygeoip|ipaddr'
ipaddr (2.2.0)
pygeoip (0.3.2)

$ python --version
Python 2.7.15

$ geolite2legacy/geolite2legacy.py -i GeoLite2-ASN-CSV.zip -f geolite2legacy/geoname2fips.csv -o GeoIPASNum.dat
Database type ASN - Blocks IPv4
Traceback (most recent call last):
  File "geolite2legacy/geolite2legacy.py", line 430, in <module>
    main()
  File "geolite2legacy/geolite2legacy.py", line 418, in main
    r.load(locs, TextIOWrapper(ziparchive.open(blocks, 'r'), encoding='utf-8'))
  File "geolite2legacy/geolite2legacy.py", line 143, in load
    for nets, data in self.gen_nets(locations, outfile):
  File "geolite2legacy/geolite2legacy.py", line 211, in gen_nets
    yield nets, (asn.encode('utf-8'),)
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 9: ordinal not in range(128)

works OK with python 3.6

$ python3 geolite2legacy/geolite2legacy.py -i GeoLite2-ASN-CSV.zip -f geolite2legacy/geoname2fips.csv -o GeoIPASNum.dat
Database type ASN - Blocks IPv4
wrote 517773-node trie with 417209 networks (62788 distinct labels) in 18 seconds

Is it possible to patch the geo database during conversion to the legacy format?

The problem is that for a number of cities the country is determined incorrectly. For example, for Sevastopol (which has always, regardless of the status of Crimea, been extraterritorial and belonged to Russia), the MaxMind databases consider that "Sevastopol is Ukraine."

Is it possible to apply a patch, using a list of IPs, when converting the database?

And how would one do it?

country fips file

Maybe this file could help a bit ;-)
http://download.geonames.org/export/dump/countryInfo.txt

I found the link in these files:
https://github.com/mschmitt/GeoLite2xtables/blob/master/10_download_countryinfo

geoipupdate doesn't produce zip files

"It's based on mmutils but it reads new GeoLite2 directly from zip files containings CSV databases"

But why? The new databases are in mmdb format, and that is what "geoipupdate" produces when run from cron jobs.

[root@srv-rhsoft:/usr/share/GeoIP]$ ls
insgesamt 141M
-rw-r--r-- 1 root root 4,5M 2018-03-24 11:10 GeoIPASNum.dat
-rw-r--r-- 1 root root 5,6M 2018-12-29 11:11 GeoIPASNumv6.dat
-rw-r--r-- 1 root root 20M 2018-03-27 14:05 GeoIPCity.dat
-rw-r--r-- 1 root root 23M 2018-12-27 19:05 GeoIPCityv6.dat
-rw-r--r-- 1 root root 1,2M 2018-03-27 14:17 GeoIP.dat
-rw-r--r-- 1 root root 2,4M 2013-06-09 00:52 GeoIPISP.dat
-rw-r--r-- 1 root root 2,4M 2018-12-27 20:30 GeoIPv6.dat
-rw-r--r-- 1 root root 7,2M 2021-10-09 13:39 GeoLite2-ASN.mmdb
-rw-r--r-- 1 root root 70M 2021-10-09 13:39 GeoLite2-City.mmdb
-rw-r--r-- 1 root root 6,1M 2021-10-09 13:39 GeoLite2-Country.mmdb

Build time / version in DAT file comment

Would it be possible to use .mmdb file build time as a .dat file comment?

Right now, the comment is just a static string: geolite2legacy.py

f.write(b'geolite2legacy.py')  # .dat file comment - can be anything

mmdblookup shows 1561400523 (2019-06-24 18:22:03 UTC)

# mmdblookup -v -f GeoLite2-Country.mmdb -i 8.8.8.8 | egrep '^\s+Build '
    Build epoch:   1561400523 (2019-06-24 18:22:03 UTC)

Or maybe if the geolite2legacy.py script allowed for a custom comment via a script argument, for example:

-c COMMENT, --comment COMMENT
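
Building such a comment from a build epoch could look like the sketch below. The function `build_comment` and the comment layout are assumptions for illustration; how the epoch would be obtained (for example from mmdblookup output) is left open.

```python
from datetime import datetime, timezone

def build_comment(build_epoch, prefix="geolite2legacy.py"):
    """Return a bytes comment that embeds the database build time in UTC."""
    built = datetime.fromtimestamp(build_epoch, tz=timezone.utc)
    text = "{} {}".format(prefix, built.strftime("%Y-%m-%d %H:%M:%S UTC"))
    return text.encode("ascii")

# Build epoch quoted in the mmdblookup output above.
print(build_comment(1561400523))
```

With the epoch from the example, the comment reads `geolite2legacy.py 2019-06-24 18:22:03 UTC`, matching the time mmdblookup reports.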

Problem converting

Hi
Environment: CentOS7.X, python.x86_64 2.7.5-77.el7_6, python-ipaddr.noarch 2.1.11-1.el7, python-pygeoip.noarch 0.2.6-5.el7.
Downloaded GeoLite2-Country-CSV.zip on 8/5/19
used commandline:
./geolite2legacy.py -d -i GeoLite2-Country-CSV.zip -f geoname2fips.csv -o GeoNew.dat

But got error:
Traceback (most recent call last):
  File "./geolite2legacy.py", line 219, in <module>
    class ASNv6RadixTree(ASNRadixTree):
  File "./geolite2legacy.py", line 221, in ASNv6RadixTree
    edition = pygeoip.const.ASNUM_EDITION_V6
AttributeError: 'module' object has no attribute 'ASNUM_EDITION_V6'

This even happens if I just do
./geolite2legacy.py -h

Any help would be appreciated.

Country database sets wrong edition type value

Hi,
I recently ran into an issue using the GeoIP.dat files downloaded from https://www.miyuru.lk/geoiplegacy
While the file works as expected with nginx, it is also used by a Magento 1 module (Openstream/GeoIP) to provide geoip redirects.

After investigating the module and testing with the last published GeoIP.dat file from MaxMind, it seems that the
database type (set as COUNTRY_EDITION) in the MaxMind DB has the value 106, whereas in geolite2legacy the constant has the value 1.

Other than the Magento module not accepting the geolite2legacy-generated file, everything else (geoiplookup tools, nginx) appears to be happy with the difference in the database type value, but it could be that they just assume default values if the type value doesn't match something else.
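
One possible explanation, stated here as an assumption rather than something confirmed in this thread: legacy readers such as the GeoIP C library treat type values of 106 and above as the edition number plus 105, a compatibility rule for very old databases. Under that assumption, 106 and 1 both denote the country edition, and a reader that only compares against 106 would reject the newer value. The function name below is hypothetical.

```python
COUNTRY_EDITION = 1  # value reported for geolite2legacy output in this issue

def normalize_database_type(raw_type):
    """Map old-style type values (>= 106) onto plain edition numbers.

    Assumption: old-style files store the edition number plus 105.
    """
    return raw_type - 105 if raw_type >= 106 else raw_type

# Both encodings resolve to the same edition under this rule.
print(normalize_database_type(106) == COUNTRY_EDITION)
print(normalize_database_type(1) == COUNTRY_EDITION)
```

A consumer that applies this normalization before checking the type would accept both files.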

Question to encoding

Hello,

Thanks for your converter. I want to use it, but currently I have an issue with encoding.
I tested this IP: 91.38.193.110
The city is called Füssen (German umlaut).
When using the converted db on the console, geoiplookup shows the city as Füssen.
In my utf-8 PuTTY I would expect to see a correct umlaut when using this encoding.
What's wrong here?
https://geolite.maxmind.com/download/geoip/database/GeoLite2-City-CSV.zip
geolite2legacy.py -i GeoLite2-City-CSV.zip -f geoname2fips.csv -e utf-8 -o GeoLiteCity.dat
In the csv file itself the umlaut seems to be correct; I can see a gorgeous ü when grepping in GeoLite2-City-Locations-de.csv

What do you think?

Thanks,
Hans
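
A common cause of broken umlauts is that a consumer decodes utf-8 bytes as iso-8859-1. Whether geoiplookup does this here is an assumption; the sketch only shows what that mismatch would produce for this city name.

```python
name = "Füssen"

# What a utf-8 encoded .dat file would contain for this city name.
stored = name.encode("utf-8")

# What a consumer sees if it assumes the bytes are iso-8859-1.
misread = stored.decode("iso-8859-1")
print(misread)

# Decoding with the matching charset restores the original name.
print(stored.decode("utf-8"))
```

If the output on the console looks like the first line, the database content is probably fine and the reading side is using the wrong charset.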
