Extracts information about every game published on Steam through its Web API and stores it in JSON format. It also collects extra data from SteamSpy.

I used this code to generate this dataset: 'Steam Games Dataset'.

Requisites 🔧

Python 3.8
requests and argparse.

pip3 install requests argparse

Usage 🚀

Start generating data simply with:

python SteamGamesScraper.py

The first time you run it, the file 'applist.json' is created with all the IDs that Steam provides (>140K). On later runs, that file is used instead of requesting all the data again. If you want to get new IDs, simply delete the file 'applist.json'.
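The caching behaviour described above can be sketched as follows. This is only an illustration, not the script's actual code: `load_or_create_applist`, `fake_fetch` and the demo file name are hypothetical, and the fetcher is a stand-in for the real Steam request so the example runs without network access.

```python
import json
import os
import tempfile


def load_or_create_applist(path, fetch_ids):
    """Return the cached ID list if `path` exists; otherwise fetch it and cache it."""
    if os.path.exists(path):
        with open(path, "r", encoding="utf-8") as handle:
            return json.load(handle)
    ids = fetch_ids()  # hypothetical callable standing in for the Steam request
    with open(path, "w", encoding="utf-8") as handle:
        json.dump(ids, handle)
    return ids


# Demonstration with a stand-in fetcher that records how often it is called.
calls = []


def fake_fetch():
    calls.append(1)
    return [10, 20, 30]


demo_path = os.path.join(tempfile.mkdtemp(), "applist_demo.json")
first = load_or_create_applist(demo_path, fake_fetch)   # fetches and writes the file
second = load_or_create_applist(demo_path, fake_fetch)  # reads the cached file
```

Deleting the cache file makes the next call fetch the IDs again, which matches the behaviour described above.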

Only game data is saved. DLCs, music, tools, etc. are ignored and added to the file 'discarted.json' so that they are not requested again in future runs. You can delete the file to request those IDs again.

Finally, all games are stored in the file 'games.json' if:

They have already been released.
The 'developers' field is not empty.
A price is included, if the game is not free.
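As a rough sketch, the three conditions could be checked like this. The function name `should_store` is hypothetical, and the record is assumed to already use the field names of the 'games.json' format ('release_date', 'developers', 'is_free', 'price'); the real script may test the raw Steam response differently.

```python
def should_store(app):
    """Return True when a record meets the three storage conditions."""
    # Treat a missing release_date as "not released" to stay on the safe side.
    released = not app.get("release_date", {}).get("coming_soon", True)
    has_developers = bool(app.get("developers"))
    # Free games need no price; paid games must carry one.
    has_price = app.get("is_free", False) or app.get("price") is not None
    return released and has_developers and has_price


released_paid = {
    "release_date": {"coming_soon": False, "date": "..."},
    "developers": ["..."],
    "is_free": False,
    "price": 0.99,
}
upcoming = dict(released_paid, release_date={"coming_soon": True, "date": "..."})
no_developers = dict(released_paid, developers=[])
paid_without_price = dict(released_paid, price=None)
```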

The format of 'games.json' is this:

{
    "906850": {
        "name": "...",
        "release_date": {
            "coming_soon": false,
            "date": "..."
        },
        "required_age": 0,
        "is_free": false,
        "price": 0.99,
        "detailed_description": "...",
        "supported_languages": "...",
        "reviews": "...",
        "header_image": "...",
        "website": "...",
        "support_url": "...",
        "support_email": "...",
        "windows": true,
        "mac": false,
        "linux": false,
        "metacritic_score": 0,
        "metacritic_url": "...",
        "achievements": 0,
        "recommendations": 0,
        "notes": "",
        "packages": [
            {
                "title": "...",
                "description": "...",
                "subs": [
                    {
                        "text": "...",
                        "description": "...",
                        "price": 0.99
                    }
                ]
            }
        ],
        "developers": [
            "..."
        ],
        "publishers": [
            "..."
        ],
        "categories": [
            "..."
        ],
        "genres": [
            "..."
        ],
        "screenshots": [
            "..."
        ],
        "movies": [
            "..."
        ],
        "user_score": 0,
        "score_rank": "",
        "negative": 0,
        "positive": 1,
        "estimated_owners": "0 - 20000",
        "average_playtime_forever": 0,
        "average_playtime_2weeks": 0,
        "median_playtime_forever": 0,
        "median_playtime_2weeks": 0,
        "peak_ccu": 0,
        "tags": {
            "...": 22,
            ...
        }
    },
    ...
}

In the file 'ParseExample.py' you can see a simple example of how to parse the information.
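For orientation, a minimal way to read the structure shown above might look like the sketch below. It is not the content of 'ParseExample.py': the helper `summarize` is hypothetical and the sample values ("Example Game", "Example Studio") are placeholders, embedded as a string so the example runs without a real 'games.json'.

```python
import json

# Tiny stand-in for games.json, using the same top-level layout (AppID -> record).
sample = """
{
    "906850": {
        "name": "Example Game",
        "price": 0.99,
        "developers": ["Example Studio"],
        "tags": {"Indie": 22}
    }
}
"""


def summarize(dataset):
    """Return (app_id, name, price) tuples sorted by AppID."""
    return [
        (app_id, game["name"], game["price"])
        for app_id, game in sorted(dataset.items())
    ]


# With a real file you would call json.load() on the opened 'games.json' instead.
dataset = json.loads(sample)
rows = summarize(dataset)
for app_id, name, price in rows:
    print(f"{app_id}: {name} ({price})")
```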

⚙️ Parameters

To change the input file, use the parameter '-i' / '-infile':

python SteamGamesScraper.py -i games.json

To change the output file, use the parameter '-o' / '-outfile':

python SteamGamesScraper.py -o output.json

There is a general API rate limit of 200 requests per five minutes for each unique IP address, which works out to one request every 1.5 seconds. That is why the script waits 1.5 seconds between requests by default. You can change this with the parameter '-s' / '-sleep':

python SteamGamesScraper.py -s 2.0

It is not recommended to set the wait time below 1.5 seconds.
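The 1.5 second figure follows directly from the limit: 300 seconds divided by 200 requests. A small sketch of that arithmetic and of a throttled loop is shown below; `throttled` is a hypothetical helper, and the `sleep` argument is injectable only so the demo does not actually wait.

```python
import time

REQUESTS_PER_WINDOW = 200
WINDOW_SECONDS = 5 * 60
MIN_DELAY = WINDOW_SECONDS / REQUESTS_PER_WINDOW  # 300 / 200 = 1.5 seconds


def throttled(items, delay=MIN_DELAY, sleep=time.sleep):
    """Yield items one by one, pausing `delay` seconds between consecutive items."""
    for index, item in enumerate(items):
        if index > 0:
            sleep(delay)
        yield item


# Demo: record the pauses instead of sleeping.
pauses = []
processed = list(throttled([1, 2, 3], sleep=pauses.append))
```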

You can disable the extra data collected in SteamSpy using '-p' / '-steamspy':

python SteamGamesScraper.py -p False

When this option is deactivated, some data will appear as empty.

When Steam denies a request, the script retries it up to four times by default. You can change the number of retries with '-r' / '-retries':

python SteamGamesScraper.py -r 10

Although it is not recommended, you can make the script retry indefinitely by setting the value to 0.
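One way to picture the retry rule, including the special value 0, is the sketch below. `request_with_retries` is a hypothetical helper, a denied request is modelled as a `None` result, and the count is interpreted here as the total number of attempts; the script itself may count differently.

```python
import itertools


def request_with_retries(do_request, retries=4, wait=lambda: None):
    """Call do_request until it returns something other than None.

    retries=0 means "keep trying forever"; otherwise at most `retries` attempts.
    """
    attempts = itertools.count(1) if retries == 0 else range(1, retries + 1)
    for _attempt in attempts:
        result = do_request()
        if result is not None:
            return result
        wait()  # placeholder for the pause between attempts
    return None


# Demo: a request that is denied twice and then succeeds.
outcomes = iter([None, None, {"ok": True}])
result = request_with_retries(lambda: next(outcomes), retries=4)

# Demo: a request that is always denied gives up after the allowed attempts.
gave_up = request_with_retries(lambda: None, retries=2)
```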

By default prices are requested in US dollars. You can change the currency with the parameter '-c' / '--currency' and the country or region code:

python SteamGamesScraper.py -c es

By default the language is set to English. You can change the language with the parameter '-l' / '--language' and the country or region code:

python SteamGamesScraper.py -l en
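To illustrate how the currency and language values might reach Steam, the sketch below builds a request URL with both. Everything here is an assumption rather than the script's confirmed behaviour: the endpoint, the query parameter names 'appids', 'cc' and 'l', and the helper `build_details_url` are used for illustration only.

```python
from urllib.parse import urlencode

# Assumed endpoint; the script may use a different one.
STORE_ENDPOINT = "https://store.steampowered.com/api/appdetails"


def build_details_url(app_id, currency="us", language="en"):
    """Build a details URL carrying the currency and language codes (assumed names)."""
    query = urlencode({"appids": app_id, "cc": currency, "l": language})
    return f"{STORE_ENDPOINT}?{query}"


url = build_details_url(906850, currency="es", language="en")
print(url)
```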

Games that have not yet been released are added to the file 'notreleased.json' and will not be checked again. If you want to ignore this list, you can set the parameter '-d' / '-released' to False, or delete the file.
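The effect of these skip lists can be sketched as a simple filter over the full ID list. `pending_ids` is a hypothetical helper and the IDs are made up; ignoring the not-released list simply means leaving it out of the call.

```python
def pending_ids(all_ids, *skip_lists):
    """Return the IDs that appear in none of the skip lists, keeping their order."""
    skip = set()
    for ids in skip_lists:
        skip.update(ids)
    return [app_id for app_id in all_ids if app_id not in skip]


all_ids = [10, 20, 30, 40, 50]
discarded = [20]     # e.g. DLCs, music, tools
not_released = [40]  # e.g. games marked as coming soon

todo = pending_ids(all_ids, discarded, not_released)
todo_recheck = pending_ids(all_ids, discarded)  # not-released list ignored
```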

At the end of the scan, or when you press Ctrl + C, all data is saved. You can also enable auto-save every X new entries with '-a' / '-autosave':

python SteamGamesScraper.py -a 100

A backup file will also be generated with the previous data.
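A minimal sketch of auto-saving with a backup is shown below. The helper `save_with_backup` and the '.bak' suffix are hypothetical choices for illustration; the script may name its backup file differently. The demo saves every 2 entries so the backup is easy to inspect.

```python
import json
import os
import shutil
import tempfile


def save_with_backup(data, path):
    """Copy the existing file to '<path>.bak' (assumed name), then write the new data."""
    if os.path.exists(path):
        shutil.copyfile(path, path + ".bak")
    with open(path, "w", encoding="utf-8") as handle:
        json.dump(data, handle, indent=4)


out = os.path.join(tempfile.mkdtemp(), "output.json")
autosave = 2  # stands in for the value given with '-a'
games = {}
for count, app_id in enumerate(["1", "2", "3", "4"], start=1):
    games[app_id] = {"name": "..."}
    if count % autosave == 0:
        save_with_backup(games, out)

with open(out, encoding="utf-8") as handle:
    current = json.load(handle)
with open(out + ".bak", encoding="utf-8") as handle:
    previous = json.load(handle)
```

After the loop, `current` holds all four entries and `previous` holds the state from the earlier save.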

Do you want to add new games from a file? You can use the parameter '-u' / '-update' and the CSV file name to add new games. The AppID must be in the first column.

python SteamGamesScraper.py -u update.csv
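Reading the AppIDs from the first column could look like the sketch below. `read_app_ids` is a hypothetical helper, and skipping header or blank rows whose first cell is not numeric is an assumption made here, not documented behaviour. The sample data is embedded so the example runs without a real CSV file.

```python
import csv
import io


def read_app_ids(csv_file):
    """Collect the first column of each row, keeping only numeric values."""
    ids = []
    for row in csv.reader(csv_file):
        if row and row[0].strip().isdigit():
            ids.append(row[0].strip())
    return ids


# Stand-in for update.csv: a header row, a blank line and two entries.
sample = io.StringIO("appid,name\n906850,Some Game\n\n440,Another Game\n")
ids = read_app_ids(sample)
```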

Contributors ✨

License 📜

Code released under MIT License.
