fronkongames / steam-games-scraper

Extract information from all games published in Steam thanks to its Web API, and store it in JSON format.

Home Page: https://fronkongames.github.io/

License: MIT License

Python 100.00%
gamedev gamedevelopment indiedev python steam dataset games machine-learning machinelearning scraper

steam-games-scraper's Introduction


Extract information from all games published in Steam thanks to its Web API, and store it in JSON format. It also collects extra data from SteamSpy.

I used this code to generate this dataset: 'Steam Games Dataset'.

Requisites 🔧

  • Python 3.8
  • requests and argparse.
pip3 install requests argparse

Usage 🚀

Start generating data simply with:

python SteamGamesScraper.py

On the first run, the file 'appplist.json' is created with all the IDs that Steam provides (>140K). On subsequent runs, that file is used instead of requesting all the data again. To fetch new IDs, simply delete the file 'appplist.json'.

Only game data is saved. DLCs, music, tools, etc. are ignored and added to the file 'discarted.json' so that they are not requested again in future runs. You can delete that file to request those IDs again.
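The caching behavior described above (reuse a local file if it exists, otherwise request the data and write the file) can be sketched as follows. This is a minimal illustration, not the scraper's actual code: the function name `load_or_fetch` and the `fetch` callable are hypothetical stand-ins for whatever performs the real request.

```python
import json
import os


def load_or_fetch(path, fetch):
    """Return cached JSON from `path`, or call `fetch()` and cache its result.

    `fetch` is a hypothetical callable that returns JSON-serializable data.
    """
    # Reuse the file from a previous run when it is present.
    if os.path.exists(path):
        with open(path, "r", encoding="utf-8") as fin:
            return json.load(fin)

    # Otherwise request fresh data and store it for the next run.
    data = fetch()
    with open(path, "w", encoding="utf-8") as fout:
        json.dump(data, fout)
    return data
```

Deleting the file forces the next call to go through `fetch` again, which mirrors the "delete the file to get new IDs" advice.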

Finally, all games are stored in the file 'games.json' if they meet these conditions:

  • The game has already been released.
  • The 'developers' field is not empty.
  • A price is included if the game is not free.
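
The three conditions above can be expressed as a single predicate. This is only a sketch under assumptions: the function name `should_store` is hypothetical, the field names are taken from the 'games.json' format documented in this README, and "price included" is interpreted here as a price greater than zero.

```python
def should_store(game: dict) -> bool:
    """Return True if a game entry meets the three storage conditions."""
    # 1. Already released: the entry is not flagged as coming soon.
    release = game.get("release_date") or {}
    released = not release.get("coming_soon", True)

    # 2. The 'developers' list must contain at least one name.
    has_developers = bool(game.get("developers"))

    # 3. Free games need no price; paid games must carry one
    #    (assumption: a missing or zero price counts as "not included").
    is_free = bool(game.get("is_free", False))
    has_price = is_free or (game.get("price") or 0) > 0

    return released and has_developers and has_price
```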

The format is this:

{
    "906850": {
        "name": "...",
        "release_date": {
            "coming_soon": false,
            "date": "..."
        },
        "required_age": 0,
        "is_free": false,
        "price": 0.99,
        "detailed_description": "...",
        "supported_languages": "...",
        "reviews": "...",
        "header_image": "...",
        "website": "...",
        "support_url": "...",
        "support_email": "...",
        "windows": true,
        "mac": false,
        "linux": false,
        "metacritic_score": 0,
        "metacritic_url": "...",
        "achievements": 0,
        "recommendations": 0,
        "notes": "",
        "packages": [
            {
                "title": "...",
                "description": "...",
                "subs": [
                    {
                        "text": "...",
                        "description": "...",
                        "price": 0.99
                    }
                ]
            }
        ],
        "developers": [
            "..."
        ],
        "publishers": [
            "..."
        ],
        "categories": [
            "..."
        ],
        "genres": [
            "..."
        ],
        "screenshots": [
            "..."
        ],
        "movies": [
            "..."
        ],
        "user_score": 0,
        "score_rank": "",
        "negative": 0,
        "positive": 1,
        "estimated_owners": "0 - 20000",
        "average_playtime_forever": 0,
        "average_playtime_2weeks": 0,
        "median_playtime_forever": 0,
        "median_playtime_2weeks": 0,
        "peak_ccu": 0,
        "tags": {
            "...": 22,
            ...
        }
    },
    ...
}

In the file 'ParseExample.py' you can see a simple example of how to parse the information.
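
For orientation, reading the file could look roughly like the sketch below. It is not the content of 'ParseExample.py'; the function names `load_games` and `summarize` are hypothetical, and the only fields used are ones that appear in the format above ('name', 'price', 'tags').

```python
import json


def load_games(path: str) -> dict:
    """Load the dataset: a dict keyed by AppID (as a string)."""
    with open(path, "r", encoding="utf-8") as fin:
        return json.load(fin)


def summarize(games: dict, top: int = 3) -> list:
    """Return (app_id, name, price, top_tags) tuples, one per game."""
    rows = []
    for app_id, game in games.items():
        tags = game.get("tags")
        # 'tags' maps tag name -> vote count; treat anything else as no tags.
        if not isinstance(tags, dict):
            tags = {}
        # Sort tag names by vote count, highest first.
        top_tags = sorted(tags, key=tags.get, reverse=True)[:top]
        rows.append((app_id, game.get("name", ""), game.get("price", 0.0), top_tags))
    return rows
```

A typical use would be `for row in summarize(load_games("games.json")): print(row)`.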

⚙️ Parameters

To change the input file, use the parameter '-i' / '-infile':

python SteamGamesScraper.py -i games.json

To change the output file, use the parameter '-o' / '-outfile':

python SteamGamesScraper.py -o output.json

There is a general API rate limit of 200 requests per five minutes for each unique IP address, which works out to one request every 1.5 seconds (300 s / 200). That is why the scraper waits 1.5 seconds between requests by default. You can change this with the parameter '-s' / '-sleep':

python SteamGamesScraper.py -s 2.0

It is not recommended to set the wait time below 1.5 seconds.
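
The 1.5-second figure follows directly from the rate limit, and a fixed pause between requests is enough to stay under it. The sketch below shows the arithmetic and a simple throttled loop; the names `min_delay` and `throttled` are hypothetical and are not taken from the scraper.

```python
import time


def min_delay(max_requests: int, window_seconds: float) -> float:
    """Smallest fixed pause that keeps a steady stream under the limit."""
    return window_seconds / max_requests


def throttled(items, delay, sleep=time.sleep):
    """Yield items one by one, pausing `delay` seconds between them."""
    for index, item in enumerate(items):
        if index > 0:
            # No pause before the first item, one pause before each later item.
            sleep(delay)
        yield item


# 200 requests per 5 minutes (300 seconds) gives a 1.5-second pause.
DEFAULT_DELAY = min_delay(200, 300)
```

The `sleep` argument is injectable only so the loop can be exercised without actually waiting.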

You can disable the extra data collected in SteamSpy using '-p' / '-steamspy':

python SteamGamesScraper.py -p False

When this option is deactivated, some data will appear as empty.

When Steam denies a request, the scraper retries up to four times by default. You can change the number of retries with '-r' / '-retries':

python SteamGamesScraper.py -r 10

Although it is not recommended, you can make it retry indefinitely by setting the value to 0.
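
The retry semantics described here (a bounded number of attempts, with 0 meaning "keep trying") might be sketched as below. This is an illustration under assumptions, not the scraper's code: `request_with_retries` and the `fetch` callable are hypothetical, a denied request is modeled as `fetch()` returning `None`, and the limit is counted as total attempts.

```python
import time


def request_with_retries(fetch, retries=4, delay=1.5, sleep=time.sleep):
    """Call `fetch()` until it returns something other than None.

    `retries` is the maximum number of attempts; 0 means retry forever.
    Returns the result, or None once the attempts are used up.
    """
    attempt = 0
    while True:
        result = fetch()
        if result is not None:
            return result
        attempt += 1
        # Give up only when a positive limit has been reached.
        if retries > 0 and attempt >= retries:
            return None
        # Wait before the next attempt to respect the rate limit.
        sleep(delay)
```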

By default, prices are requested in US dollars. You can change the currency with the parameter '-c' / '--currency' and the country or region code:

python SteamGamesScraper.py -c es

By default, the language is set to English. You can change the language with the parameter '-l' / '--language' and the country or region code:

python SteamGamesScraper.py -l en

Games that have not yet been released are added to the file 'notreleased.json' and will not be checked again. If you want to ignore this list, you can set the parameter '-d' / '-released' to False, or delete the file.

At the end of the scan, or when you press Ctrl + C, all data is saved. You can also enable auto-save every X new entries with '-a' / '-autosave':

python SteamGamesScraper.py -a 100

A backup file will also be generated with the previous data.
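
One way to picture auto-save with a backup is the sketch below. It is an assumption-laden illustration: the names `save_with_backup` and `AutoSaver`, and the '.bak' suffix for the backup file, are hypothetical and may differ from what the scraper actually does.

```python
import json
import os
import shutil


def save_with_backup(data, path, backup_suffix=".bak"):
    """Copy the previous file to a backup, then write the new data."""
    if os.path.exists(path):
        # Keep the previous data (hypothetical naming: '<path>.bak').
        shutil.copyfile(path, path + backup_suffix)
    with open(path, "w", encoding="utf-8") as fout:
        json.dump(data, fout, indent=4, ensure_ascii=False)


class AutoSaver:
    """Save the dataset every `every` new entries (0 disables auto-save)."""

    def __init__(self, path, every):
        self.path = path
        self.every = every
        self.pending = 0

    def entry_added(self, dataset):
        """Count one new entry; return True if a save was triggered."""
        self.pending += 1
        if self.every > 0 and self.pending >= self.every:
            save_with_backup(dataset, self.path)
            self.pending = 0
            return True
        return False
```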

Do you want to add new games from a file? You can use the parameter '-u' / '-update' and the CSV file name to add new games. The AppID must be in the first column.

python SteamGamesScraper.py -u update.csv
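
Reading AppIDs from the first column of such a CSV file could be done as in the sketch below. The function name `read_app_ids` is hypothetical, and skipping non-numeric first cells (for example a header row) is an assumption about how the file should be treated, not documented behavior.

```python
import csv


def read_app_ids(csv_path):
    """Return the AppIDs found in the first column of a CSV file."""
    app_ids = []
    with open(csv_path, newline="", encoding="utf-8") as fin:
        for row in csv.reader(fin):
            # Skip blank lines and rows whose first cell is not a number,
            # such as a header row (assumption).
            if row and row[0].strip().isdigit():
                app_ids.append(row[0].strip())
    return app_ids
```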

Contributors ✨

License 📜

Code released under MIT License.

steam-games-scraper's People

Contributors

elementmedia, fronkongames

steam-games-scraper's Issues

python .\SteamGamesScraper.py doesn't generate anything

I was wondering if this tool is still working? I installed the dependencies and started with "python .\SteamGamesScraper.py" however nothing is generated. I get the following returned pretty quick:

[i 15:44:12] Steam Games Scraper 1.2.0 by Martin Bustos [email protected].
[i 15:44:12] Loading 'discarted.json'.
[i 15:44:12] New dataset created.
[i 15:44:12] 74200 apps discarted.
[E 15:44:12] File update.csv not found.
[i 15:44:12] Done.

Include tags in code

FronkonGames,

Is it possible to insert tags in this csv? I'm having a huge difficulty because to finish my data analysis I need the tags column, or each tag described with 1 or 0 to determine if this tag belongs to the game line or not.

Tags are very important for data analysis and I realized that the data is up to date and it's not long before I finish.
