RimWorld game save data analyzer
viperior / rimhistory
License: MIT License
Verify that the Python version used in GitHub workflows is >3.6. Update as needed.
Use multiprocessing to parallelize the process of loading the XML data into Save objects and converting extracted subsets into pandas DataFrames. This massively improves load times when working with a large number of source files and auto-scales based on the machine's local resources. Memory usage is kept in check by ensuring excess XML data is deleted at the end of Save.__init__().
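A minimal sketch of what the parallel load could look like, assuming a process Pool and that Save objects are picklable; the helper function is illustrative, not the shipped implementation:
from multiprocessing import Pool

from rimhistory.save import Save


def load_save(path: str) -> Save:
    """Worker: parse one save file into a Save object (XML extraction plus DataFrame builds)."""
    return Save(path_to_save_file=path)


if __name__ == "__main__":
    save_file_paths = ["saves/save 1.rws", "saves/save 2.rws", "saves/save 3.rws"]

    # Pool() defaults to os.cpu_count() workers, so the load scales with the
    # machine's local resources without extra configuration.
    with Pool() as pool:
        saves = pool.map(load_save, save_file_paths)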
After adding support for loading and aggregating data from multiple save files, I discovered what appears to be a data duplication bug for pawn data. I am using a new save file series with more playtime and event history than the original test file. The number of pawns being reported is 26, which suggests possible duplication. The highest number reported from a single save in the series should be around 5 or 6.
Full error details from pytest output:
Run python -m pytest -v -x -n auto
python -m pytest -v -x -n auto
shell: /usr/bin/bash -e {0}
env:
pythonLocation: /opt/hostedtoolcache/Python/3.10.2/x64
LD_LIBRARY_PATH: /opt/hostedtoolcache/Python/3.10.2/x64/lib
============================= test session starts ==============================
platform linux -- Python 3.10.2, pytest-7.0.1, pluggy-1.0.0
rootdir: /home/runner/work/rimhistory/rimhistory, configfile: pytest.ini, testpaths: tests
plugins: xdist-2.5.0, forked-1.4.0
gw0 I / gw1 I
gw0 [14] / gw1 [14]
.........F
=================================== FAILURES ===================================
_____________________________ test_get_pawn_count ______________________________
[gw1] linux -- Python 3.10.2 /opt/hostedtoolcache/Python/3.10.2/x64/bin/python
test_data_list = ['test_data/demosave 1.rws.gz', 'test_data/demosave 3.rws.gz', 'test_data/demosave 2.rws.gz']
def test_get_pawn_count(test_data_list: list) -> None:
"""Test counting the number of pawns identified from the save data
Parameters:
test_data_list (list): The list of paths to the test input data files (fixture)
Returns:
None
"""
pawn_data = Save(path_to_save_file=test_data_list[0]).data.dataset.pawn.dictionary_list
> assert len(pawn_data) == 3
E AssertionError: assert 26 == 3
E + where 26 = len([{'pawn_ambient_temperature': '28.46966', 'pawn_biological_age': '4', 'pawn_chronological_age': '4', 'pawn_id': 'Thing...8601', 'pawn_biological_age': '100', 'pawn_chronological_age': '100', 'pawn_id': 'Thing_Android2Tier288152', ...}, ...])
tests/test_pawn_data.py:17: AssertionError
=========================== short test summary info ============================
FAILED tests/test_pawn_data.py::test_get_pawn_count - AssertionError: assert ...
!!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
!!!!!!!!!!!! xdist.dsession.Interrupted: stopping after 1 failures !!!!!!!!!!!!!
========================= 1 failed, 9 passed in 18.55s =========================
Error: Process completed with exit code 2.
Branch: feature/async-load
Commit tested: b337a17
I'm guessing pawn data is duplicated in the save file for various reasons. I need to identify the "single source of truth" about colonist-type pawns and formulate the correct XPath patterns to target those. I may also need to select the "current" XML element representing the current, rather than a past, state of the pawn.
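A rough sketch of the kind of narrowing I have in mind, using ElementTree; the element names, the Class='Pawn' predicate, and the id child element are assumptions about the save schema, not the XPath rimhistory currently uses:
import xml.etree.ElementTree as ET

tree = ET.parse("saves/save 1.rws")
root = tree.getroot()

# Hypothetical XPath: only match thing elements of class Pawn under a things list,
# instead of every element that happens to carry pawn fields.
pawn_elements = root.findall(".//things/thing[@Class='Pawn']")

# Deduplicate by pawn id in case the same pawn appears in more than one place.
unique_pawn_ids = {element.findtext("id") for element in pawn_elements}
print(len(unique_pawn_ids))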
Simplify the property names used in the Save class by completing #21 and removing the then-obsolete dataset level in the data Bunch object. Store the named datasets at the same level as the non-dataset properties.
from save import Save
save = Save(path_to_save_file="saves/save 1.rws")
# Before change
plant_data = save.data.dataset.plant
# After change
plant_data = save.data.plant
Document the data model created by rimhistory's ELT design
A RimWorld save file can be thought of as a snapshot of the game state. To conduct time series analysis, multiple snapshots are needed. Now that the ELT design for extracting some of the key datasets is complete, it is time to implement a way to combine data from a series of save files into a single dataset. The time dimension will distinguish each dataset being unioned.
For example, the app can currently load plant data from one save at a time:
import statistics
from rimhistory.save import Save
save_1 = Save("saves/mysave 1.rws")
save_2 = Save("saves/mysave 2.rws")
save_3 = Save("saves/mysave 3.rws")
total_plants_1 = len(save_1.data.dataset.plant.dictionary_list)
total_plants_2 = len(save_2.data.dataset.plant.dictionary_list)
total_plants_3 = len(save_3.data.dataset.plant.dictionary_list)
average = statistics.mean([total_plants_1, total_plants_2, total_plants_3])
print(f"Average living plants over time = {average}")
The SaveSeries class will streamline these operations:
from rimhistory.save import SaveSeries
series = SaveSeries(
save_dir_path="path/to/saves",
save_file_regex_pattern=r"mysave\s\d{1,10}"
)
average = len(series.dataset.plant.dataframe.index) / len(series.dictionary)
print(f"Average living plants over time = {average}")
In run https://github.com/viperior/rimhistory/runs/6662782380, the gameTicks value is off, but not by much. I'm not sure whether this is because I forgot to update the expected value after changing the test data or whether it's a sorting issue. Determine the root cause and fix it so the build workflow passes.
Add a feature that allows rimhistory to load RimWorld save files hosted in an S3 bucket.
New options in config.json:
rimworld_save_file_source (str): The source to use to load the save files (local, s3)
rimworld_save_file_s3_bucket (str): The name of the S3 bucket to load save files from
The current implementation uses glob and regex to scan a local directory for save files. It does not sniff the XML before loading into a series, although that would be a nice addition to enhance series loading. Instead, it relies on file naming conventions. It ignores autosaves. Rimhistory would work best when paired with a mod that uses templates to name autosaves. I found 2 or 3 such mods available on the Steam Workshop.
To support S3 loading, new functionality must be added to perform the matching against S3 object names in the bucket. After that, the content is loaded as a string into a Save object via SaveSeries.
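A sketch of that matching step using boto3 (not currently a dependency); the bucket name, pattern, and config handling are placeholders:
import re

import boto3

s3 = boto3.client("s3")
bucket_name = "my-rimworld-saves"  # would come from rimworld_save_file_s3_bucket
pattern = re.compile(r"mysave\s\d{1,10}")

# Match save files by object key, mirroring the local glob + regex approach.
response = s3.list_objects_v2(Bucket=bucket_name)
matching_keys = [obj["Key"] for obj in response.get("Contents", []) if pattern.search(obj["Key"])]

# Read each matching object's content as a string for the Save object to parse.
for key in matching_keys:
    save_file_content = s3.get_object(Bucket=bucket_name, Key=key)["Body"].read().decode("utf-8")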
The Save class is storing the datasets as:
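Inferred from the attribute paths used elsewhere in these notes (the exact Bunch layout is an assumption):
save.data.dataset.plant.dictionary_list  # list of dictionaries extracted from the XML
save.data.dataset.plant.dataframe        # pandas DataFrame built from that list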
Reduce memory usage by dropping the list of dictionaries as soon as the pandas DataFrame is created from it. This also allows the Bunch path to be simplified from save.data.dataset.plant.dataframe to save.data.dataset.plant. A list of dictionaries can be re-created as needed, but most operations can access the pandas DataFrame directly.
class Save:
    def __init__(self, path_to_save_file: str) -> None:
        # Extract datasets
        # Delete the root object to free up memory
        # Generate pandas DataFrames from each dataset initialized as a list of dictionaries
        # <-- Add a step here that deletes each list of dictionaries
        # Apply transformations to DataFrames
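A sketch of the added step, written as a standalone helper against an assumed dict-like Bunch of datasets (the function name is hypothetical):
import pandas


def drop_dictionary_lists(dataset_bunch) -> None:
    """Build each dataset's DataFrame, then drop its list of dictionaries.

    dataset_bunch is assumed to be the dict-like Bunch at save.data.dataset.
    """
    for dataset in dataset_bunch.values():
        dataset.dataframe = pandas.DataFrame(dataset.dictionary_list)
        dataset.dictionary_list = None  # drop the reference so the list can be garbage collected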