pastelmind / d2txt

Python scripts for editing Diablo 2 TXT files, or converting them to INI files

License: MIT License
Hello, I've recently used d2txt for the first time and got the following errors when decompiling the official D2 1.10 TXT files:
d2txt.DuplicateColumnNameError: ('mindam', 161, 'armor.txt')
d2txt.DuplicateColumnNameError: ('Type2', 9, 'AutoMap.txt')
d2txt.DuplicateColumnNameError: ('SkillName', 25, 'CharTemplate.txt')
Currently, methods that accept file objects or paths (e.g. D2TXT.load_txt()) do not support pathlib.Path objects, because they directly check if the path is a string with isinstance(inifile, str). Add support for Path objects by trying open() first and catching a TypeError.
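A minimal sketch of this approach (load_txt here is illustrative, not the actual function body; the real parsing step is elided in favor of a plain read):

```python
def load_txt(txtfile):
    """Accept a str path, a pathlib.Path, or an open file object.

    Sketch only: open() handles both str and Path, and raises
    TypeError for file objects, which we catch and use directly.
    """
    try:
        with open(txtfile, newline="") as file:
            return file.read()
    except TypeError:
        # Not a path: assume txtfile is an already-open file object
        return txtfile.read()
```

This avoids type-sniffing the argument entirely: open() already knows how to reject non-paths.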
As of writing, D2TXTRow stores values in a list ordered by column order. It searches a list of column names to determine the appropriate index when retrieving a cell by column name. This is inefficient (O(N)).

Solution: Use a dict to speed up random access to each cell by column name.
There are two ways of doing this. One way is to use a master dict held by D2TXT that maps column names to indices. Another way is to make each D2TXTRow store values in a dict, and keep a master list held by D2TXT that remembers the order of column names.
Master dict, row list approach:
- Due to how collections.abc.Mapping is implemented (source code), sequential access requires a key lookup, which increases time cost to O(1) average, O(N) amortized worst case (dict → row list).
- Accessing a cell requires an extra indirection through the master dict, which increases complexity.

Master list, row dict approach:
- A dict must be built each time a row is created.
- Column order could be kept with an OrderedDict, but this is incompatible with column insertion/deletion.

Decision: Use the master dict, row list approach. Row creations and iterations (including loading and saving TXT and INI files) are frequent, while column operations are relatively infrequent. Extra indirection when accessing cells by column name is acceptable.
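The chosen design can be sketched with hypothetical stand-in classes: the table owns a master dict mapping column names to indices, and each row stores its cells in a plain list.

```python
class Table:
    """Illustrative stand-in for D2TXT: owns the master column-name dict."""

    def __init__(self, column_names):
        self.column_names = list(column_names)
        # Master dict: column name -> index into each row's cell list
        self.column_index = {name: i for i, name in enumerate(column_names)}


class Row:
    """Illustrative stand-in for D2TXTRow: cells live in a plain list."""

    def __init__(self, table, values):
        self.table = table
        self.cells = list(values)

    def __getitem__(self, column_name):
        # One dict lookup plus one list index: O(1) on average
        return self.cells[self.table.column_index[column_name]]
```

Rows stay cheap to create (one list copy), while name-based access pays only the single dict lookup.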
To facilitate ease of use, D2TXT.__setitem__() should accept both sequences and mappings. A sequence is treated as a list of cell values in column order, and can be used to quickly create a row. A mapping is treated as a collection of column name-cell value pairs, and can be used to intuitively create a row. Use isinstance() with Abstract Base Classes to check if the given object is a sequence or a mapping.
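The ABC check could look like this (normalize_row is a hypothetical helper, not part of d2txt; note that str is itself a Sequence and must be excluded):

```python
from collections.abc import Mapping, Sequence


def normalize_row(value, column_names):
    """Turn either a sequence or a mapping into an ordered list of cells."""
    if isinstance(value, Mapping):
        # Mapping: look up each column by name, default to an empty cell
        return [value.get(name, "") for name in column_names]
    if isinstance(value, Sequence) and not isinstance(value, str):
        # Sequence: assumed to already be in column order
        return list(value)
    raise TypeError(f"Expected a sequence or mapping, got {type(value).__name__}")
```

Checking against the ABCs (rather than list/dict directly) also accepts tuples, OrderedDicts, and other user-defined containers.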
The goal is to reduce the number of config files as much as possible.
Only the legacy_tox_ini format is supported, an ugly trick that stores the entire INI as a TOML string in pyproject.toml. I'd rather store the config using proper TOML syntax. That said, the documentation states that "there is a plan to add a pure TOML one soon", so I can sit and wait. I am looking forward to its release, since it also includes several desirable fixes.

Many columns in TXT files are closely related to each other and often used together. When decompiling a TXT file to TOML, grouping these columns into a single key would improve readability and reduce both the number of lines and the overall file size. These columns shall be grouped when decompiling to TOML, and un-grouped when compiling back to TXT.
Each column group has a unique group alias. When a D2TXT
is decompiled, columns that belong to a column group are not directly written to the TOML file. Instead, their values are combined into a single line, using the group alias (which does not overlap with existing keys in the TXT file) as the key.
The details of column grouping rules, syntax, and the (mostly) full list of columns grouped have been moved to the wiki.
The Columns section of INI files generated by d2ini.py is used to specify the order of columns in TXT files. Currently, it merely lists each column name in order with empty values:
[Columns]
column name 1=
column name 2=
...
This is ugly. It also depends on ConfigParser preserving the order of keys (see the Customizing Parser Behaviour section of the ConfigParser docs for more info).
Using column numbers as keys and names as values is more intuitive:
[Columns]
1=column name 1
2=column name 2
...
However, this requires extra code for parsing and sorting the COLUMNS section. It also creates an ambiguous situation when a column number is missing:
; column 2 is deleted by user, intentionally or by accident
[Columns]
1=column name 1
3=column name 3
...
If a column number is missing, should D2INI raise an exception, create an empty column, or ignore it?
Enabling the allow_no_value option in ConfigParser would prevent such ambiguities, in addition to eliminating useless equals signs:
[Columns]
column name 1
column name 2
...
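A quick sketch of parsing such a section with allow_no_value (setting optionxform=str keeps the original case of column names, which ConfigParser lowercases by default; the column names below are made up):

```python
import configparser

parser = configparser.ConfigParser(allow_no_value=True)
parser.optionxform = str  # preserve column-name case

parser.read_string("""\
[Columns]
Name
MinDam
MaxDam
""")

# Keys come back in file order; their values are None
columns = list(parser["Columns"])
```

Since every line is a bare key, there is no column number to go missing and no ambiguity to resolve.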
Hi Pastel, I've written a simple test script calling the skills file from my D2R install:
skills = D2TXT.load_txt('skills.txt')
# Plague poppy max spawn
skills[222]['petmax'] = "1"
D2TXT.to_txt('result/skills.txt')
But when I compile, I get a TypeError on every single argument that can be called from your class. Is this only my issue?
Continuing the discussion from #3 (comment), which format should we switch to?

Some TXT files (e.g. MonStats.txt) have features that prevent us from saving them to INI.

JSON does not support comments. This alone makes it undesirable.
TOML is very similar to INI files. It supports inline arrays. It also supports arrays of tables, which solves item 4.
TOML requires all string values to be quoted. This makes the output slightly more verbose, but also leaves less room for mistakes.
YAML supports inline sequences (flow style). However, PyYAML follows special rules to decide whether to emit block or flow style. I don't know if I can (or need to) modify this rule.
Item 4 can be solved by storing each row in a sequence.
YAML is better than TOML for deeply nested structures. However, Diablo 2's TXT files (tab-separated values) are mostly flat, which means YAML's advantage is moot.
YAML also supports unquoted string literals.
YAML has several gotchas. Notably, it has multiple confusing variants of multi-line string syntax.
I know little about other markup formats. I don't want to use an obscure format.
As of writing, D2TXT.__setitem__() does not support the slice syntax:
# This will raise an exception or corrupt your table!
d2txt[1:1] = [{'column 1': 'value 1', 'column 2': 'value 2'}]
This is because it does not check whether the given key is actually a slice object. For more information, see the docs on the slice() function, as well as the section on slice objects in the Data model docs.
To add support for slice syntax, a type-check for the slice object is necessary, plus some code for inserting them:
# NOT TESTED
def __setitem__(self, key, value):
    if isinstance(key, slice):
        self._rows[key] = [D2TXTRow(self, row) for row in value]
    else:
        self._rows[key] = D2TXTRow(self, value)
Note that D2TXT.__getitem__() and D2TXT.__delitem__() do (unintentionally) support the slice syntax, because they delegate the key (which is either a number or a slice object) to the internal list of rows:
def __getitem__(self, key):
    return self._rows[key]
Of course, this returns a list of rows still bound to the original D2TXT object. Should it return a D2TXT object instead? Maybe.
A shallow copy on slice would return a table whose rows are still bound to the original D2TXT. This may be confusing, since it creates two seemingly distinct table objects that actually refer to the same table. Also, column operations would be tricky to implement: a column operation on one table must propagate to all rows of every other table.
A deep copy on slice would return a clone of the table. Since column operations on one table do not affect the other, there are no headaches. However, since built-in collections return a shallow copy on slice, cloning D2TXT on slice might feel inconsistent.
Just returning a list (current behavior) solves most of the headaches. You don't get a new table, or a "view" of the old table; just a list of rows still linked to the original table. Practicality beats consistency.
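For illustration, a minimal toy table (not the real D2TXT; rows are kept as plain dicts here) shows how the slice check makes `table[i:i] = [...]` behave as an insertion:

```python
class MiniTable:
    """Toy stand-in for D2TXT; rows are stored as dicts for simplicity."""

    def __init__(self, rows=()):
        self._rows = [dict(row) for row in rows]

    def __getitem__(self, key):
        return self._rows[key]

    def __setitem__(self, key, value):
        if isinstance(key, slice):
            # Slice assignment: value is an iterable of rows
            self._rows[key] = [dict(row) for row in value]
        else:
            self._rows[key] = dict(value)

    def __len__(self):
        return len(self._rows)


table = MiniTable([{"id": "1"}, {"id": "2"}])
table[1:1] = [{"id": "1.5"}]  # empty slice: inserts without replacing
```

The slice branch simply forwards the slice to the underlying list, which already implements insertion, replacement, and deletion semantics.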
Based on the findings in #9, add support for conversion to and from TOML. Also drop support for INI files, which rely on fragile, home-grown syntax.
TOML is better than INI in many aspects. It is more strict and leaves less room for mistakes. It brings its own string escaping rules, so that I don't have to make my own. It supports complex structures like nested lists, so that I don't have to reinvent obscure syntaxes (see #3).
I shall use toml and qtoml. Both are small packages (~90 KiB each, not counting upstream dependencies), so pulling both as dependencies sounds OK.
D2TXTRow should inherit from collections.abc.Mapping, not collections.abc.Sequence. The initial decision was affected by the fact that each row is internally stored as a list. However, each cell is usually accessed by column name rather than column index. Thus, it is more intuitive to treat D2TXTRow as a mapping.
D2TXTRow does not inherit from collections.abc.MutableMapping yet, because I do not intend to add support for adding/deleting entire columns in the near future. Even if column operations are added later, they are expensive, and should be exposed through methods on D2TXT. D2TXTRow must separately implement __setitem__() that only allows direct assignment to existing keys (column names).
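Sketched out with a simplified stand-in class (cells keyed directly by column name, unlike the real list-backed implementation):

```python
from collections.abc import Mapping


class Row(Mapping):
    """Mapping over column names; assignment only to existing columns."""

    def __init__(self, cells):
        self._cells = dict(cells)

    def __getitem__(self, key):
        return self._cells[key]

    def __iter__(self):
        return iter(self._cells)

    def __len__(self):
        return len(self._cells)

    def __setitem__(self, key, value):
        # Deliberately not MutableMapping: no new columns,
        # only updates to cells in existing columns
        if key not in self._cells:
            raise KeyError(key)
        self._cells[key] = value
```

Inheriting from Mapping provides keys(), values(), items(), get(), and containment tests for free from the three abstract methods.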
Previously, D2TXTRow.__getitem__() and D2TXTRow.__setitem__() accepted both column names (str) and indices (int) as keys. However, this required type()-checking the key. It could also create ambiguous situations when a column uses a number as its name ("1", "2", etc.). To prevent such ambiguities, these methods should no longer accept column indices as keys.
Note that ordered iterations are still possible using D2TXTRow.values() and D2TXTRow.items(). Due to how collections.abc.Mapping is implemented (source code), these operations require a key lookup. Thus, each sequential access has a time cost of O(1) average, O(N) amortized worst case (compared to always O(1) when inheriting from collections.abc.Sequence).
As of writing, D2TXT does not provide explicit support for adding new rows. In particular, D2TXT.__setitem__() accepts a list (same as the internal representation), despite D2TXTRow behaving like a mapping for practical purposes.
D2TXT should accept most insert operations supported by list, and accept dict objects when inserting new rows:
d2txt.insert({}, i)
d2txt.insert(row_dict, j)
d2txt.append({})
d2txt.append(row_dict)
If row_dict contains a key that does not match any column name in the D2TXT object, raise a KeyError.
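The key-validation rule might look like this (illustrative class and storage format, not the actual implementation):

```python
class MiniD2TXT:
    """Toy sketch: append() accepts a mapping, validates its keys."""

    def __init__(self, column_names):
        self._column_names = list(column_names)
        self._rows = []

    def append(self, row_dict):
        unknown = set(row_dict) - set(self._column_names)
        if unknown:
            # Reject keys that match no column name
            raise KeyError(f"unknown column name(s): {sorted(unknown)}")
        # Columns absent from row_dict default to empty cells
        self._rows.append(
            [row_dict.get(name, "") for name in self._column_names]
        )
```

Validating eagerly at insertion time surfaces typos in column names immediately, rather than silently dropping the cell.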
Many small packages are contained in a single module (file). d2txt.py and d2ini.py are small, so combining them into a single module may be OK.
For more information, see the Python Modules section of An Overview of Packaging for Python from the Python Packaging User Guide.
Use d2txt.py and d2ini.py for the moment. If I ever add support for other file formats, I can always add a new script.
Column Groups were proposed in #3 and implemented in #14. In doing so, I discovered that several types of columns are not suitable for describing with TOML arrays.
For example, several item-related TXT files have multiple fields with names like modXcode, modXparam, modXmin, modXmax, where X is a positive integer. Such fields are clearly meant to be edited as one subgroup, so it is desirable to place each on a single line. Using TOML arrays, they can be grouped like this:
[[items]]
id = 1
--mod1 = ['damage', '100', '200']
--mod2 = ['fire-damage', '50', '60']
--mod3 = ['swing', '20', '20']
Note that the member columns have heterogeneous types: modXcode is a string, while the other fields are numbers. Since arrays in the TOML v0.5.0 spec disallow heterogeneous types, integer fields must be encoded as strings, or use the "nested array trick":
--mod1 = [['damage'], [100], [200]]
This is ugly.
Since each value in this group has a different meaning, placing them in a single array (a data structure meant to store multiple entries of the same type) is fundamentally awkward.
In addition, one has to memorize the order of each value within the group. I attempted to solve this by providing descriptive column group aliases:
--mod1-CodeMinMax = [['damage'], [100], [200]]
But this is just as ugly and hard to decipher.
This type of data is suited for nested dictionaries. In TOML, such data structures can be described compactly using inline tables:
[[items]]
id = 1
mod1 = { code = 'damage', min = 100, max = 200 }
mod2 = { code = 'fire-damage', min = 50, max = 60 }
mod3 = { code = 'swing', min = 20, max = 20 }
There are two other possible formats, both more verbose than inline tables. Dotted child tables:
[[items]]
id = 1
[items.mod1]
code = 'damage'
min = 100
max = 200
[items.mod2]
code = 'fire-damage'
min = 50
max = 60
[items.mod3]
code = 'swing'
min = 20
max = 20
...and dotted keys:
[[items]]
id = 1
mod1.code = 'damage'
mod1.min = 100
mod1.max = 200
mod2.code = 'fire-damage'
mod2.min = 50
mod2.max = 60
mod3.code = 'swing'
mod3.min = 20
mod3.max = 20
These forms are more verbose than tables without column groups. Inline tables are clearly better for the job.
Keys for column group tables are prefixed with two underscores (__). This keeps them visually distinct from column group arrays, which are prefixed with two dashes (--).
Example:
__rArm = { left = 5, right = 10, top = 20, bottom = 25 }
Since each subkey describes the purpose of each value, the key (alias) should be short. Less than 12 characters is good.
Column group tables should ideally have between 2 and 6 values. Each value should have a distinct meaning (cf. etype1, etype2, etype3, ...).
The column_groups table at the top of each TOML file describes the column groups used in the file. Each key-value pair describes a column group array: the key is the column group alias, and the value is an array of member columns.
This format can be extended to describe column group tables as well:
[[column_groups]]
# Metadata for column group arrays
--Mod1-MinMax = ['mod1code', 'mod1min', 'mod1max']
--Mod2-MinMax = ['mod2code', 'mod2min', 'mod2max']
--Mod3-MinMax = ['mod3code', 'mod3min', 'mod3max']
# Metadata for column group tables
--Mod1 = { code = 'mod1code', min = 'mod1min', max = 'mod1max' }
--Mod2 = { code = 'mod2code', min = 'mod2min', max = 'mod2max' }
--Mod3 = { code = 'mod3code', min = 'mod3min', max = 'mod3max' }
Subkeys such as code, min, and max are henceforth referred to as column member aliases, or member aliases for short.
The metadata maps member aliases to member column names to stay consistent with how the inline tables are used in rows. It also allows us to parse TOML slightly faster, since this metadata can be converted directly to mappings of member aliases to column names.
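As a sketch, un-grouping a row using such metadata might work like this (the group alias, member aliases, and column names are hypothetical examples, not the final syntax):

```python
# Hypothetical metadata: group alias -> {member alias: member column name}
GROUP_META = {"__Mod1": {"code": "mod1code", "min": "mod1min", "max": "mod1max"}}


def ungroup(row):
    """Expand column group tables back into their member columns."""
    out = {}
    for key, value in row.items():
        if key in GROUP_META:
            # Inline table: map each member alias back to its column name
            for alias, column in GROUP_META[key].items():
                out[column] = value[alias]
        else:
            out[key] = value
    return out


row = {"id": 1, "__Mod1": {"code": "damage", "min": 100, "max": 200}}
```

The reverse direction (grouping) is the same loop run over the metadata instead of the row.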
Both toml and qtoml can parse inline tables. Unfortunately, neither supports generating inline tables via public APIs.
I skimmed the source code of toml v0.10.0 and qtoml 0.3.0. Both packages do support generating inline tables from dictionaries, though not easily.
toml generates an inline table if a toml.TomlEncoder initialized with preserve=True is used and the object is an instance of toml.decoder.InlineTableDict. This class is a direct subclass of object and therefore cannot be used as a mapping. However, toml.TomlDecoder provides the get_empty_inline_table() method, which returns an InlineTableDict instance that also subclasses dict. Easy peasy. Too bad we don't use toml.dumps(): it's bugged.
qtoml generates an inline table for an object if qtoml.encoder.TOMLEncoder.is_scalar() returns True. Normally, this only occurs if a dictionary is inside a list inside another list. I tried to circumvent this by creating a custom dict subclass, which I added to qtoml.TOMLEncoder.st as the key, with TOMLEncoder.dump_itable as the value.
from qtoml.encoder import TOMLEncoder

class MyInlineDict(dict):
    pass

hacked_encoder = TOMLEncoder()
hacked_encoder.st[MyInlineDict] = hacked_encoder.dump_itable
This made the encoder generate inline tables for instances of my custom class. Unfortunately, it was also generating nested tables for the same data:
[[items]]
mod1 = { code = 'damage', min = 100, max = 200 }
mod2 = { code = 'damage', min = 150, max = 250 }
[items.mod1]
code = 'damage'
min = 100
max = 200
[items.mod2]
code = 'damage'
min = 150
max = 250
This is because the current implementation of TOMLEncoder.dump_sections() always renders dict instances as individual sections. Thus, we need a custom mapping class that is not a subclass of dict.
Fortunately, the standard library already provides such a class: collections.UserDict. It acts just like a dict, except that isinstance(o, dict) returns False.
Final solution:
from qtoml.encoder import TOMLEncoder
from collections import UserDict
hacked_encoder = TOMLEncoder()
hacked_encoder.st[UserDict] = hacked_encoder.dump_itable
# Use hacked_encoder.dump_sections(o, [], True) to dump TOML
Python has explicit library support for warnings. Let's use them where appropriate.
Potential uses of warnings: situations that argparse fails to deal with.

I followed the installation instructions:
pip install d2txt
in PowerShell. The command d2txt is not available; PowerShell and CMD don't recognize it as a command. (Neither does the Python interactive environment.) Do I need further steps to add d2txt to my PATH?
Thanks in advance! :)