xparq / args
Perhaps the tiniest C++ cmdline processor with a serious feature set
FEATURES
--------

(Just to codify existing behavior as "expected" (and testable), rather than leaving them "accidental"...)

- [x] Classic named (option) and unnamed (positional) arguments
  - [x] intermixed
- [x] Prefix char either - or / freely mixed,
  - [ ] but that can be disabled
- [x] Both short and long options: -x --long
  - [x] Long options only as --long (so //whaaat is always positional)
- [x] Aggregated short options: -xyz
  - [x] with the last one possibly taking values: -xyz param-for-z
    - [x] multiple values, too: -xyZ Zval1 Zval2
    - [x] greedy, too: -xyZ Zval1 Zval2 ... Zvalx-up to -this
- [x] A bare -- turns off named args. for the rest of the cmdline by default, but it
  - [x] can be configured to be a regular positional arg. (*for now it always is!*)
- [x] Options are predicates by default, with simple bool checks: args["x"], args["long"]
- [x] Long options can take values without config.: --name=val
- [x] Any option can take values if configured so: -a file --except *pattern
  - [x] long ones also without = in this case
  - [x] query (as std::string): args("a") -> "file", args("except") -> "*pattern"
- [x] Outputs also available in args.named() -> std::map, args.positional() -> std::vector
  - [x] Use the non-const accessors to modify these containers as you wish (they are *yours*, right? ;) especially after parsing...)
- [x] Options (short or long) can also have multiple parameters --multi a b c
  - [x] query like: args("multi", 2) -> "c",
  - [x] or get them all with args.named("multi") -> std::vector{"a", "b", "c"}
  - [x] Options can be set to "greedy" to take each value up to the next opt.,
  - [x] or only a fixed n. of values
- [x] Repeated options override earlier ones by default
  - [x] Repeated options can also be set to
    - [x] be ignored,
    - [x] append (for multi-val opts.),
    - [x] fail
- [x] Parsing on construction: Args args(argc, argv)
- [x] Deferred parsing: Args args; args.parse(argc, argv)
- [x] Reparsing with different config: reparse(flags = Defaults, rules = {})
- [x] The instance can be reused for completely new parses, too: parse(new_argc, new_argv, flags = Defaults, rules = {})
- [x] The last used argc/argv are available as args.argc, args.argv (in case they're needed outside of main(), e.g. via myApp.args)
- [x] exename(): argv[0], but stripping the path and
  - [x] the extension (".exe" by default, but -> exename(false, ".mysuffix"),
  - [x] unless its "true value" :) is requested with exename(true)
- [x] Quick bool check if there have been any args: if (args), if (!args)

EXAMPLES
--------

- A simple one:

    #include "Args.hpp"
    #include <iostream>
    using std::cout;

    int main(int argc, char** argv)
    {
        Args args(argc, argv);

        if (args)
            cout << "Some args are present.\n";
        if (!args || args["h"])
            cout << "Usage: " << args.exename() << " "
                 << "[-h] [-x] [--long] [whatever...]\n";
        if (args["x"])
            cout << " 'x' was set\n";
        if (args["long"])
            cout << " 'long' was set"
                 << (args("long").empty() ? "" : " to " + args("long")) << '\n';
        for (auto a: args.positional())
            cout << " positional arg.: " << a << '\n';
    }

E.g., these are failing:
RUN args-test --arg=1 not-a-param!
EXPECT "\
-------- NAMED:
arg = 1
-------- POSITIONAL:
not-a-param!
"
RUN args-test --arg 1 not-a-param!
EXPECT "\
-------- NAMED:
arg
-------- POSITIONAL:
1
not-a-param!
"
Just dump the current state... Well, and also quote args with spaces... and other scary chars...
map doesn't preserve order, so we're fucked with the named ones!... :-o (And no, ordered_map doesn't mean the order of insertion either!... :-/ )
And then listvals(), which is used in the tests, could be added, too, as that could just use the very same mechanics. The tests use a stream as an output:
auto listvals(auto const& container, const char* tail = "\n", const char* sep = ", ")
{
for (auto v = container.begin(); v != container.end(); ++v)
cout << (v == container.begin() ? "":sep)
<< *v
<< (v+1 == container.end() ? tail:"");
}
but it could just as well write to a string (and an improved one could even be nice and precalc. its length first -- and an even nicer one that also does auto-quoting, or an even nicer one that does that with multi-char quotes... :) ):
#include <string>
#include <string_view>
#include <cstring>
#include <cassert>
using namespace std; // the code below uses string, string_view, strlen and size_t unqualified
#define FOUND(expr) ((expr) != std::string::npos)
#define CONTAINS(str, chars) FOUND((str).find_first_of(chars))
string listvals(auto const& container, const char prewrap[] = "", const char postwrap[] = "", const char sep[] = ", ",
const char* quote = "\"", // not const char[], to hint that it accepts nullptr!
const char* scary_chars = " \t\n")
// The pre/post wrapping are optional parts that only get written if not empty,
// to support cases where callers would otherwise have to add an annoying
// `if (container.empty())` or two themselves.
{
string result;
if (!container.empty()) {
size_t QLEN = quote ? strlen(quote) : 0;
// Precalc. size... (Note: we're processing cmd args. We got time.)
size_t size = strlen(prewrap) + (container.size() - 1) * strlen(sep) + strlen(postwrap);
for (auto& v : container)
size += v.length()
+ (quote && *quote && CONTAINS(v, scary_chars) ? // add quotes...
(QLEN>1 ? QLEN:2) : 0); // special case for 1 (-> pair)!
result.reserve(size);
// Write...
result += prewrap;
for (auto v = container.begin(); v != container.end(); ++v) {
if (quote && *quote && CONTAINS(*v, scary_chars))
{ result += string_view(quote, quote + (QLEN/2 ? QLEN/2 : 1)); // special case for 1 quote!
result += *v;
result += string_view(quote + QLEN/2); }
else { result += *v; }
result += (v+1 == container.end() ? postwrap : sep);
}
//cout << "\n\n["<<result<<"]: " << "result.length() =? size: " << dec << result.length() << " vs. " << size << "\n\n";
assert(result.length() == size);
}
return result;
}
#undef FOUND
#undef CONTAINS
...then the args "serializer" could be as simple as this (well, it still needs to write to a string, like the other one!):
void dumpargs(Args& args, char prefixchar = '-', const char* longprefix = "--")
{
// Named...
for (auto& [name, val] : args.named()) {
if (name.length() == 1)
cout << prefixchar << name << listvals(val, " ", "", " ");
else
cout << longprefix << name << listvals(val, "=", "", " ");
cout << " ";
}
// Positional...
cout << listvals(args.positional(), "", "", " ");
}
Could be extra useful if the named/positional accessors would drop their (pretty orthodox) const (#55)! Then you could manipulate the arg set, and then "render" it to a new command line!
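On the ordering problem mentioned above (std::map iterates in key order, not in command-line order), one possible approach is to keep the entries in a vector and use a map only as an index. A minimal sketch under that assumption; `OrderedNamed` and its members are made up for illustration and are not part of Args.hpp:

```cpp
#include <cstddef>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical container (not part of Args.hpp): keeps named args in the
// order they first appeared, while still allowing lookup by name.
class OrderedNamed
{
public:
    using Values = std::vector<std::string>;
    using Entry = std::pair<std::string, Values>;

    // Returns the value list for `name`, creating an empty one at the end
    // of the sequence if the name has not been seen yet.
    Values& operator[](const std::string& name)
    {
        auto it = index_.find(name);
        if (it == index_.end()) {
            it = index_.emplace(name, entries_.size()).first;
            entries_.emplace_back(name, Values{});
        }
        return entries_[it->second].second;
    }

    // Iterates in first-seen (command-line) order.
    auto begin() const { return entries_.begin(); }
    auto end() const { return entries_.end(); }
    std::size_t size() const { return entries_.size(); }

private:
    std::vector<Entry> entries_;               // insertion order
    std::map<std::string, std::size_t> index_; // name -> position in entries_
};
```

A serializer like dumpargs() could then loop over such a container and emit the options in the order the user typed them. Erasing entries is left out here, as it would require reindexing.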
const std::vector<std::string>& positional() const { return unnamed_params; }
//! Note: std::map[key] would want to create a key if not yet there,
//! so named()[...] would fail, if const! But constness it is... (for consistency).
//! Use the op() and op[] accessors for direct element access!
const std::map<std::string, std::vector<std::string>>& named() const { return named_params; }
It's pretty orthodox to have them always const. I mean it's yours, change it all you want! :) Especially for #54!...
I can imagine use cases where a pristine command-line, safe from tampering, is nice to have, but as soon as you have the modifying methods, the same object is no longer tamper-proof anyway: just having const accessors, too, won't help. ;)
So, just copy your command-line to a const object if you want a safe reference.
Going with non-const only... (You'll get your compilation errors all right when trying to actually use those methods on a const obj.)
-> Err... OK, but then... No... Other const accessors do call these, so I still need the const pairs after all, too... :-/
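The conclusion above (keep the const accessors and add non-const twins) might look roughly like this. This is only a sketch: `ArgStore` and `has()` are hypothetical stand-ins, not the actual Args class.

```cpp
#include <map>
#include <string>
#include <vector>

// Hypothetical stand-in for the real Args class, reduced to its containers.
class ArgStore
{
public:
    using Named = std::map<std::string, std::vector<std::string>>;
    using Positional = std::vector<std::string>;

    // Read-only views: usable on const objects and from other const members.
    const Named& named() const { return named_; }
    const Positional& positional() const { return positional_; }

    // Mutable views: the caller may edit the parsed results freely.
    Named& named() { return named_; }
    Positional& positional() { return positional_; }

    // A const member that relies on the const overload above.
    bool has(const std::string& name) const { return named().count(name) != 0; }

private:
    Named named_;
    Positional positional_;
};
```

Overload resolution picks the non-const version on a mutable object, so `store.named()["x"]` compiles there and fails on a const one, which matches the "you'll get your compilation errors all right" expectation.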
-> #4
So this shouldn't touch other extensions, only the usual ones (like .exe on Windows), but there could be any other extensions, and the real problem is that it could be not at all (or just not well) defined on other systems...
The default (no flag) should be "best-effort" stripping of .exe (any other straightforward ones on mainstream systems??).
And then various params should tune the exact behavior, incl. supplying the string to chop off.
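As an illustration of the "best-effort" stripping described above, here is a free-function sketch. The parameter order mirrors the exename(false, ".mysuffix") call from the feature list, but everything else (the name, the exact matching rules, case-sensitive suffix comparison) is an assumption, not the library's actual code:

```cpp
#include <string>
#include <string_view>

// Hypothetical free-function version of exename(); the real member's exact
// signature and behavior may differ.
// - full == true: return argv0 untouched
// - otherwise: strip the directory part, then `ext` if the name ends with it
std::string exename(std::string_view argv0, bool full = false, std::string_view ext = ".exe")
{
    if (full) return std::string(argv0);

    // Accept both separators, as Windows tools may see either one.
    const auto slash = argv0.find_last_of("/\\");
    std::string_view name = (slash == std::string_view::npos) ? argv0 : argv0.substr(slash + 1);

    // Only chop the suffix if something would remain (a bare ".exe" stays).
    if (!ext.empty() && name.size() > ext.size()
        && name.substr(name.size() - ext.size()) == ext)
        name.remove_suffix(ext.size());

    return std::string(name);
}
```

Other extensions (such as .sh) are left alone unless explicitly passed as `ext`, which keeps the default conservative.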
E.g. -> test/_engine/run_case
GitHub shows it best, check the other files, too!
There's no way currently to have -1 or "--++##XXX#++--" etc. as positional args, or allow things like /c or //file/... both as an option and a path. (However, that // for long opts. really shouldn't be supported at all... Add a (negative) test for that!)
RejectUnknown flag, the only thing left is uncomment (and implement) it. ;)
-- could perhaps just turn that flag on, and call it a day! :-o (Which would be awkward for consistency, in case someone would like to check the flags after parsing. It's not a very appealing attitude to use config data arbitrarily as mutable state. :) )
RejectUnknown has no effect, there should be a way to disable them (to get filenames like --known).
-- -- should obviously result in a positional arg. called --. (Test it, as it's probably easy to f* up!)
E.g. BB tolerated:
script.sh via just script :-o (not via PATHEXT)
./ prefix for running executables in the current dir
(Plus a bunch of other small changes.)
Specific test case that brought me here: --take-two 1 should really be easily/readily detectable!
Well, first of all, rule violations should be detected... And then some sort of reporting would be nice... ;)
Even just removing the faulty ones could be better than nothing. (That would actually fit the minimalistic design quite nicely.)
It's just recursive, that's all...
The build cmd should just come from a macro with a dynamic name, like $(BUILD$(TOOLSET)).
Actually, there's no way currently to have -file as a positional arg! :-o
-> #4
I mean, now that I actually looked at the ash manpage, I can be sure that $var is just fine. :) (Yyyep, it would have taken less effort to look it up upfront than all the repeatedly mistyped { and }...)
But be careful not to screw up the useful simplicity by e.g. removing the unknown options from the regular containers (named or positional), because it would be suicide to always require a cumbersome ruleset for checking the supported options!
This should strictly be an optional aid for processing errors.
Fortunately, at least the entire --opt=val string is captured. (But remember --opt=val1,val2,val3, or perhaps even --opt=v1, v2, v3, or even --val="a b c", v2,v3!...)
False alarm; see #43 instead!
It does, but Space Test is having a problem with:
RUN args-test //
Something, somewhere converts it to a single /... Even with '//' or \/\/ (or the even more perverted \'//\')...
It's just fine directly from the command-line with args-test //!... :-o :-/
Copied from sfml-test:
#include "Args.hpp"
#include <iostream> // cerr
using namespace std;
int main(int argc, char* argv[])
{
Args args(argc, argv, {
{"moons", 1}, // number of moons to start with
{"i", -1}, // any number of args up to the next arg or EOS
});
//auto exename = args.exename();
//test: args = Args(argc, argv); // <- no args take params.
cerr << "--------\n"; for (auto const& p : args.named()) {
cerr << p.first << (p.second.empty() ? "\n" : " = ");
for (auto const& v : p.second) cerr << " " << v << "," <<endl;
}
cerr << "--------\n"; for (auto const& p : args.unnamed()) { cerr << p << endl; }
if (args["?"] || args["h"] || args["help"]) {
cout << "Usage: [-V] [-moons n]" << endl;
return 0;
}
if (args["V"]) {
cout << "Version: " << LAST_COMMIT_HASH << endl;
return 0;
}
}
I only noticed due to some weird #include hacking in another project... :)
The use case is repeating arg with --arg to override any prev. settings.
--arg can kinda still set it to empty, albeit its semantics is a predicate (bool), not "set to empty"!... Its "emptiness" just means [] returns true, but () still returns ""...
But... Even though the override logic should be fixed here (#16!), it's still up to the app, actually, to decide what to do with (arg) == "" or [arg] == false, or any other values!
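To make the [] vs. () distinction above concrete, here is a reduced sketch of the two accessors. `MiniArgs` is hypothetical and only models the lookup side, assuming (as in the README) that values are stored as a vector per option name:

```cpp
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Hypothetical reduction of the two accessors discussed above.
struct MiniArgs
{
    std::map<std::string, std::vector<std::string>> named;

    // Predicate: was the option present at all?
    bool operator[](const std::string& name) const
    {
        return named.find(name) != named.end();
    }

    // Value: the n-th parameter of the option, or "" if there is none.
    std::string operator()(const std::string& name, std::size_t n = 0) const
    {
        const auto it = named.find(name);
        return (it != named.end() && n < it->second.size()) ? it->second[n] : std::string();
    }
};
```

With this shape, a bare --arg and a missing --arg both yield "" from (), and only [] tells them apart, which is exactly why the final interpretation is left to the app.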
--thing= but have [thing] as false!
Well, it is supported, implicitly, as the shell (or the C runtime on Windows) already passes quoted strings (with spaces) as single argv words.
But the processing is far from intuitive sometimes, so improvements might be welcome. Via other, more specific issues, tho, as they come up.
Alas, find won't fail on its -exec children:
find -name '*.case' -not -path "./_*/*" -exec ${runner} \{\} \;
Trying with xargs:
find -name '*.case' -not -path "./_*/*" -print0 | xargs -0 -n1 ${runner}
But my alternative (selective) runner loop also had a bug! set failed=1 instead of failed=1... :-/
Testing the GHA with an intentionally broken test case... OK, did "fail properly".
With moons taking 1 param, args-test.exe --moons=11 --unknown=123 --moons=-99 is now incorrect:
NAMED:
moons = 11, -99
unknown = 123
The new moons should be ignored if it's a multi-value arg. and already has enough:
--moons=1,2,3 --unknown=123 --moons=4,5 --moons=6
Should be 1, 2, 3 + error flags.
-> But... Now (after a ~year), however, I think the "it doesn't yet have enough..." case shouldn't actually be supported! -> #44 instead!
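The repeat-handling modes from the feature list (override by default, or ignore, append, fail) could be modeled along these lines. A sketch only: the enum and function names are invented here, and the real flags in Args.hpp may be spelled and wired differently.

```cpp
#include <map>
#include <string>
#include <vector>

// Hypothetical names; the real flags in Args.hpp may be spelled differently.
enum class OnRepeat { Override, Append, Ignore, Fail };

using Named = std::map<std::string, std::vector<std::string>>;

// Merges one occurrence of `name` (with its values) into `named`.
// Returns false only when a repetition is rejected by OnRepeat::Fail.
bool store_option(Named& named, const std::string& name,
                  const std::vector<std::string>& values, OnRepeat policy)
{
    const auto it = named.find(name);
    if (it == named.end()) {        // first occurrence: always stored
        named.emplace(name, values);
        return true;
    }
    switch (policy) {
    case OnRepeat::Override: it->second = values; break;
    case OnRepeat::Append:   it->second.insert(it->second.end(), values.begin(), values.end()); break;
    case OnRepeat::Ignore:   break; // keep the earlier values
    case OnRepeat::Fail:     return false;
    }
    return true;
}
```

Under Override, --moons=11 --moons=-99 would end up as just -99, instead of the "11, -99" shown in the faulty output above.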
--moons=1,2,3 --junk --moons=4,5,6 ?
--moons=1,2 --junk --moons=3 -x --moons=4,5,6 ? Shouldn't this accumulate all the params then? I feel that greediness is a weaker factor, though, than consistency (with the other arities).
Override as default, Append as an option...
for (a : args) { cout << a << endl; }
But what exactly to include here, and how to uniformly represent each (included) item (token?), despite their various possible semantic roles on the cmdline??
One straightforward option is to only include the positional parameters (simply forwarding to unnamed()).
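The "just forward to unnamed()" option could be as small as a begin()/end() pair. A sketch with a hypothetical `IterableArgs` type that models only the positional container:

```cpp
#include <string>
#include <vector>

// Hypothetical type modeling only the positional part of an args object.
struct IterableArgs
{
    std::vector<std::string> unnamed_params;

    const std::vector<std::string>& unnamed() const { return unnamed_params; }

    // Range-for support: iterates over the positional args only.
    auto begin() const { return unnamed().begin(); }
    auto end() const { return unnamed().end(); }
};
```

With these two members, `for (const auto& a : args)` works as in the snippet above; named options would still have to be reached via named().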
This is to help debugging individual cases without entering the test case dir manually just for executing a local exe there, or even just changing the last command too much...
In this mode the EXPECT phase will likely be off, so a warning should be printed.
Note: the params would be appended to every RUN/SH statement of the case.
These all fail (note: the test app has --greedy and -G as "greedy", but that doesn't matter here):
SH echo Override with empty, long greedy
RUN args-test --greedy 1 2 3 --greedy
EXPECT "Override with empty, long
-------- NAMED (1):
greedy
-------- POSITIONAL (0):
"
Should also verify this then:
SH echo Override with empty, short greedy
RUN args-test --G 1 2 3 -G
EXPECT "Override with empty, long
-------- NAMED (1):
G
-------- POSITIONAL (0):
"
And of course the simplest case of empty override:
RUN args-test --thing=1 --thing
EXPECT "Override with empty, long
-------- NAMED (1):
thing
-------- POSITIONAL (0):
"
Non-greedy known ones can't be empty (--take-two takes 2)?
SH echo Override with empty, long (ERROR: insufficient parameters)
RUN args-test --take-two 1 2 3 --take-two
EXPECT "Override with empty, long
-------- NAMED (1):
take-two
-------- POSITIONAL (0):
"
...and:
SH echo Override with empty, short
RUN args-test --G 1 2 3 -G
EXPECT "Override with empty, long
-------- NAMED (1):
G
-------- POSITIONAL (0):
"
The current generic one has an array (std::vector) for every named argument, because they may have multiple values.
But they almost never do...
So, there should be a getter to receive a simple map<string, string>
a) in case no args have multiple values,
b) or even if some do, they could just be omitted, by some explicit request (e.g. a different call, or a param.).
OTOH, this feels like such a superficial issue: why would (99% of) anyone not be done with just the direct [] and () named accessors?...
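For what it's worth, such a getter could be a thin conversion over the existing container. The sketch below takes variant b) literally and simply omits multi-value options; `flat()` is a hypothetical name, not an existing member:

```cpp
#include <map>
#include <string>
#include <vector>

using Named = std::map<std::string, std::vector<std::string>>;

// Hypothetical helper: flattens to name -> single value.
// Options without values map to ""; options with several values are omitted.
std::map<std::string, std::string> flat(const Named& named)
{
    std::map<std::string, std::string> result;
    for (const auto& [name, values] : named) {
        if (values.size() > 1) continue; // multi-value: left out on purpose
        result[name] = values.empty() ? std::string() : values.front();
    }
    return result;
}
```

Whether omitted options should be reported in some way is left open here, as it is in the text above.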
All would be still configurable, of course, incl. having the current behavior exactly. It's just about more intuitive defaults.
The current default of short options being just nullary predicates* has bitten me at OON, with -C cfgfile "not working"...
The new defaults may sound like a convoluted, arbitrary rule, but may be more intuitive actually, as positional args. are rarely intermixed with named ones in practice (they tend to come after the "switches")!
One big problem, though: during the parsing loop there's no way to know in advance that the "last one" is actually the last one, so that might mean reparsing that whole chunk! Which, well, could be reduced to
And another, possibly even a show-stopper for this change: this would prevent intermixing named and unnamed options at all, without requiring to define rules for each (named) option! :-o
But... This may just be how command-line args should actually work? I can't come up right away with a practical example, where unnamed options are not in fact just params of some named arg. But scripts that need to tuck positional args at the end of half-assembled command strings, without knowing what came before, may suffer!
* Wait, no... :-o This is what the general test case says:
RUN args-test.exe --one 99 2 3 4 -i a b c -h --long=1
EXPECT "\
-------- NAMED:
h
i = a, b, c
...
WTF is going on then? :-o
Arrrghs!... :) The test exe has non-default rules, and has -i defined as greedy!
-> #5... Dup.?
Using std::sto...() with a simpler/nicer API
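One possible shape for such a wrapper, shown for int only: it hides the exceptions of std::stoi behind std::optional and also rejects trailing junk. `to_int` is a made-up name for illustration; the eventual API may look quite different.

```cpp
#include <cstddef>
#include <optional>
#include <stdexcept>
#include <string>

// Hypothetical helper: converts a whole option value to int, returning
// std::nullopt instead of throwing on bad input.
std::optional<int> to_int(const std::string& text)
{
    try {
        std::size_t used = 0;
        const int value = std::stoi(text, &used);
        if (used != text.size()) return std::nullopt; // trailing junk, e.g. "12abc"
        return value;
    } catch (const std::invalid_argument&) {
        return std::nullopt; // not a number at all
    } catch (const std::out_of_range&) {
        return std::nullopt; // does not fit in an int
    }
}
```

A call site could then read `int moons = to_int(args("moons")).value_or(1);`, assuming args("moons") returns a std::string as described in the feature list.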
And test this feature e.g. with the README case, outputting the .exe path, which differs in my local expected results...
That's what options are, optional, right?
And then show its use prominently in the Readme.
It's already documented that -n in the rules should mean "at least n"...
Currently only -1 is supported, but that's as a special case, and it's even "broken", in that it means "at least 0" now... :)
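Read literally, the documented rule (n means exactly n values, -n means at least n) could be captured by a few small helpers. This is a sketch of that reading only; the helper names are invented and the parser's real bookkeeping is not shown in the source:

```cpp
#include <cstddef>

// Hypothetical helpers for interpreting an arity rule `n`:
//   n >= 0: the option takes exactly n values
//   n <  0: the option is greedy and takes at least -n values
constexpr std::size_t min_values(int n) { return static_cast<std::size_t>(n < 0 ? -n : n); }
constexpr bool is_greedy(int n) { return n < 0; }

// True if an option that has collected `count` values may accept one more.
constexpr bool wants_more(int n, std::size_t count)
{
    return is_greedy(n) || count < min_values(n);
}

// True if `count` values satisfy the rule (no "insufficient parameters" error).
constexpr bool satisfied(int n, std::size_t count)
{
    return is_greedy(n) ? count >= min_values(n) : count == min_values(n);
}
```

Under this reading, -1 with zero values would be unsatisfied, unlike the current "at least 0" behavior noted above.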
Otherwise this couldn't work:
Args args (argc, argv); // or just Args args;
try {
args.parse(); // nice and cosy place to catch errors!
// or: args.parse(argc, argv); // could also reset & reparse with other sets of args; could be handy!
} catch(...) {
// boo!
}
if (args["flag"]) // use args...
Especially as --multi=1 2 3 also works (however surprising/counter-intuitive it may be), if it's defined to take a fixed number of params. Which is kinda half the solution, it's "just" that the space should also accept a ','... ;) (Well, no... that space is processed in a totally different context, AFAICR.)
But that would beg the question of "shouldn't we also support --multi=1, 2, 3 then?!"...
(--multi=1 ", or, " 3) Which is a) a shitstorm on Windows, and b) would not help on Unix anyway, as the comma would just still be there all the same, quoted or not... Some arbitrary escaping could help there, something different from what the shell already does (to avoid the usual "how many \ now?!", or on Windows: "so, why exactly ^' does fkn nothing here, again?!")! Yuck!
I think it could be OK to just leave that to the app, and perhaps give it a hand with a split() function, with a default set of separators (like ",;: <TAB>"), putting the results into
-> Just saving this comment from the source here: //!! const char* split_sep = ",;"; // split("option") will use this by default
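A split() helper of the kind suggested above might look like this. It is a free-standing sketch operating on a plain string value (not on an option name, as the saved source comment hints), and the default separator set is taken from the ",;: <TAB>" example:

```cpp
#include <cstddef>
#include <string>
#include <string_view>
#include <vector>

// Hypothetical helper along the lines of the split() idea above.
// Splits `value` at any character in `seps`; empty pieces are dropped,
// so "1, 2" and "1,2" give the same result.
std::vector<std::string> split(std::string_view value, std::string_view seps = ",;: \t")
{
    std::vector<std::string> parts;
    std::size_t pos = 0;
    while (pos < value.size()) {
        const auto start = value.find_first_not_of(seps, pos);
        if (start == std::string_view::npos) break; // only separators left
        auto stop = value.find_first_of(seps, start);
        if (stop == std::string_view::npos) stop = value.size();
        parts.emplace_back(value.substr(start, stop - start));
        pos = stop;
    }
    return parts;
}
```

Note that having ':' among the defaults would also split Windows paths like C:\dir, so an app dealing with paths would probably pass its own separator set.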
argv[0] tends to return the full path, but e.g. in help messages or in test cases etc. the exe name would be more useful, and it's tedious to manually carve it out from the path.