xparq / args

Perhaps the tiniest C++ cmdline processor with a serious feature set

C++ 31.37% Shell 60.03% Batchfile 8.36% PowerShell 0.24%

args's Introduction

FEATURES
--------

(Just to codify existing behavior as "expected" (and testable), rather than leaving it
"accidental"...)

- [x] Classic named (option) and unnamed (positional) arguments
      - [x] intermixed
- [x] Prefix char either - or / freely mixed,
      - [ ] but that can be disabled
- [x] Both short and long options: -x --long
      - [x] Long options only as --long (so //whaaat is always positional)
- [x] Aggregated short options: -xyz
      - [x] with the last one possibly taking values: -xyz param-for-z
      - [x] multiple values, too: -xyZ Zval1 Zval2
      - [x] greedy, too: -xyZ Zval1 Zval2 ... ZvalN, up to the next option (e.g. -this)
- [x] A bare -- turns off named args. for the rest of the cmdline by default, but it
      - [x] can be configured to be a regular positional arg. (*for now it always is!*)
- [x] Options are predicates by default, with simple bool checks: args["x"], args["long"]
- [x] Long options can take values without config.: --name=val
- [x] Any option can take values if configured so: -a file --except *pattern
      - [x] long ones also without = in this case
      - [x] query (as std::string): args("a") -> "file", args("except") -> "*pattern"
- [x] Outputs also available in args.named() -> std::map, args.positional() -> std::vector
      - [x] Use the non-const accessors to modify these containers as you wish
            (they are *yours*, right? ;) especially after parsing...)
- [x] Options (short or long) can also have multiple parameters --multi a b c
      - [x] query like: args("multi", 2) -> "c",
      - [x] or get them all with args.named("multi") -> std::vector{"a", "b", "c"}
- [x] Options can be set to "greedy" to take each value up to the next opt.,
      - [x] or only a fixed n. of values
- [x] Repeated options override earlier ones by default
- [x] Repeated options can also be set to
      - [x] be ignored,
      - [x] append (for multi-val opts.),
      - [x] fail
- [x] Parsing on construction: Args args(argc, argv)
- [x] Deferred parsing: Args args; args.parse(argc, argv)
- [x] Reparsing with different config: reparse(flags = Defaults, rules = {})
      - [x] The instance can be reused for completely new parses, too:
            parse(new_argc, new_argv, flags = Defaults, rules = {})
      - [x] The last used argc/argv are available as args.argc, args.argv
            (in case they're needed outside of main(), e.g. via myApp.args)
- [x] exename(): argv[0], but stripping the path and
      - [x] the extension (".exe" by default, but -> exename(false, ".mysuffix")),
      - [x] unless its "true value" :) is requested with exename(true)
- [x] Quick bool check if there have been any args: if (args), if (!args)


EXAMPLES
--------

- A simple one:

	#include "Args.hpp"
	#include <iostream>
	using std::cout;
	int main(int argc, char** argv)
	{
		Args args(argc, argv);

		if (args)
			cout << "Some args are present.\n";

		if (!args || args["h"])
			cout << "Usage: " << args.exename() << " "
		             << "[-h] [-x] [--long] [whatever...]\n";

		if (args["x"])
			cout << "  'x' was set\n";

		if (args["long"])
			cout << "  'long' was set"
			     << (args("long").empty() ? "" : " to " + args("long"))
			     << '\n';

		for (auto a: args.positional())
			cout << "  positional arg.: " << a << '\n';
	}

args's Issues

Flag to disable either / or - as prefix

Add `exename()` (or something like that)

argv[0] tends to return the full path, but e.g. in help messages or in test cases etc. the exe name would be more useful, and it's tedious to manually carve it out from the path.

Add typed accessors

Fix: `-long` is incorrectly recognized as a long option

-> #8, but shouldn't happen even without implementing #8!

Add a default ctor to allow deferred parsing

Otherwise this couldn't work:

Args args (argc, argv); // or just Args args;

try {
    args.parse();  // nice and cosy place to catch errors!
    // or: args.parse(argc, argv);  // could also reset & reparse with other sets of args; could be handy!
} catch(...) {
    // boo!
}

if (args["flag"]) // use args...

Fix: Tests: The set-runner reports a false overall "OK" result despite failed cases!

Alas, find doesn't fail when its -exec children fail:

find -name '*.case' -not -path "./_*/*" -exec ${runner} \{\} \;

Trying with xargs:

find -name '*.case' -not -path "./_*/*" -print0 | xargs -0 -n1 ${runner}

But my alternative (selective) runner loop also had a bug: `set failed=1` instead of `failed=1`... :-/

Testing the GHA with an intentionally broken test case... OK, did "fail properly".

Scripts: Resolve the horrible duplication across the `RUN` vs. `SH` functions...

-> xparq/Space_Test#33

Support --multi=1,2,3 (not just --multi 1 2 3)!

Especially as --multi=1 2 3 also works (however surprising/counter-intuitive it may be), if it's defined to take a fixed number of params. Which is kinda half the solution, it's "just" that the space should also accept a ','... ;) (Well, no... that space is processed in a totally different context, AFAICR.)
But that would beg the question of "shouldn't we also support --multi=1, 2, 3 then?!"...
- And then the whole mess of quoting! (--multi=1 ", or, " 3) Which is a) a shitstorm on Windows, and b) would not help on Unix anyway, as the comma would just still be there all the same, quoted or not... Some arbitrary escaping could help there, something different from what the shell already does (to avoid the usual "how many \ now?!", or on Windows: "so, why exactly ^' does fkn nothing here, again?!")! Yuck!
  - Oh, but -- at least on Unix -- escaped quotes are fine, so we can get them, "only" have to deal with them... which may be quite easy actually: if there's a quote, then there's no trailing separator! Yay!
    - On Windows, OTOH... Well, CMD is so fckd anyway, let's just pretend it doesn't exist! Umm... yay!...
I think it could be OK to just leave that to the app, and perhaps give it a hand with a split() function, with a default set of separators (like ",;: <TAB>"), putting the results into
-> Just saving this comment from the source here: //!! const char* split_sep = ",;"; // split("option") will use this by default
- Wait, it could as well be an internal post-processing step then! :) And then there could also be a split rule that the app could override. Perfect. ;)
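A minimal sketch of the split() idea, as a free function. The name `split_values` is hypothetical, and the default separators ",;" come from the saved source comment; empty pieces are dropped and no quoting or escaping is attempted, so this only covers the easy part discussed above.

```cpp
#include <string>
#include <string_view>
#include <vector>

// Hypothetical helper (not part of Args.hpp): split one option value such as
// "1,2;3" at any of the separator chars. Empty pieces (e.g. from "1,,2") are dropped.
// No quoting/escaping is handled, which is exactly the hard part discussed above.
std::vector<std::string> split_values(std::string_view s, std::string_view seps = ",;")
{
    std::vector<std::string> out;
    std::size_t start = 0;
    while (start <= s.size()) {
        std::size_t end = s.find_first_of(seps, start);
        if (end == std::string_view::npos) end = s.size();
        if (end > start) out.emplace_back(s.substr(start, end - start));
        start = end + 1;
    }
    return out;
}
```

Applied as a post-processing step, it would turn the single captured value of `--multi=1,2,3` into the same three values that `--multi 1 2 3` yields.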

Either suck at documenting that `[flag]` is not the value, just the presence of "flag", or suck at handling `--flag=`, `--flag=0`, `--flag=false` etc...
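One way to pin down the `--flag=0` / `--flag=false` case is to interpret the value separately from presence, sketched here as a hypothetical helper `flag_value()`. Which spellings count as true or false, and that an empty value means "true", are assumptions, not current Args behavior.

```cpp
#include <optional>
#include <string_view>

// Hypothetical helper: interpret the value of a predicate option.
// "" (bare --flag, or --flag=) counts as true, since the option is present.
// Unrecognized values yield std::nullopt, so the app can report an error.
std::optional<bool> flag_value(std::string_view v)
{
    if (v.empty() || v == "1" || v == "true" || v == "yes" || v == "on")
        return true;
    if (v == "0" || v == "false" || v == "no" || v == "off")
        return false;
    return std::nullopt;
}
```

The app would then combine it with the presence check, e.g. `args["flag"] && flag_value(args("flag")).value_or(false)`, assuming `args("flag")` returns the value as a `std::string` (as in the README example).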

Error handling: remove invalid options

Specific test case that brought me here: --take-two 1 should really be easily/readily detectable!

Well, first of all, rule violations should be detected... And then some sort of reporting would be nice... ;)
Even just removing the faulty ones could be better than nothing. (That would actually fit the minimalistic design quite nicely.)

Actually factor out the autotest stuff, it's at least as useful as this project itself

-> https://github.com/xparq/Space_Test

Fix the idiotic CR/LF mixup, wherever it happened...

E.g. -> test/_engine/run_case

GitHub shows it best, check the other files, too!

Should support `--opt=val`, obviously, not (just) the unusual `--opt val`...

Fortunately, at least the entire --opt=val string is captured. (But remember --opt=val1,val2,val3, or perhaps even --opt=v1, v2, v3, or even --val="a b c", v2,v3!...)
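For reference, a sketch of splitting a single `--name=value` token at the first '='. `split_long_opt` is a hypothetical helper, not part of Args.hpp; it deliberately leaves the value untouched, so `--opt=val1,val2,val3` still yields one raw value for any later splitting step.

```cpp
#include <optional>
#include <string>
#include <utility>

// Hypothetical helper: split a "--name=value" token at the FIRST '=' only,
// so "--val=a=b" yields {"val", "a=b"}. Without '=', the value is empty.
// Returns std::nullopt if the token is not a long option (incl. a bare "--").
std::optional<std::pair<std::string, std::string>> split_long_opt(const std::string& tok)
{
    if (tok.size() < 3 || tok.compare(0, 2, "--") != 0) return std::nullopt;
    auto eq = tok.find('=', 2);
    if (eq == std::string::npos) return std::pair{tok.substr(2), std::string{}};
    return std::pair{tok.substr(2, eq - 2), tok.substr(eq + 1)};
}
```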

Add support for -- to stop arg scanning!

Actually, there's no way currently to have -file as a positional arg! :-o

Fix: Tests: incompatibilities between BusyBox and Git sh

E.g. BB tolerated:

- BOM in script files
- running script.sh via just script :-o (not via PATHEXT)
- missing ./ prefix for running executables in the current dir

(Plus a bunch of other small changes.)

Add a simplified `named` accessor for the most common use case

The current generic one has an array (std::vector) for every named argument, because they may have multiple values.
But they almost never do...

So, there should be a getter to receive a simple map<string, string>
a) in case no args have multiple values,
b) or even if some do, they could just be omitted, by some explicit request (e.g. a different call, or a param.).

OTOH, this feels like such a superficial issue: why would (99% of) anyone not be done with just the direct [] and () named accessors?...
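A sketch of such a getter, written as a free function over the container shape shown for named(). The name `simple_named` and the choice to keep the first value of a multi-value option (unless asked to omit it) are assumptions.

```cpp
#include <map>
#include <string>
#include <vector>

// Hypothetical getter: flatten the multi-value map (the shape named() returns)
// into name -> single value. Predicates (no values) map to "".
// Multi-value options keep their first value, unless `omit_multi` is set,
// in which case they are left out entirely (option b) above).
std::map<std::string, std::string>
simple_named(const std::map<std::string, std::vector<std::string>>& named, bool omit_multi = false)
{
    std::map<std::string, std::string> out;
    for (const auto& [name, vals] : named) {
        if (omit_multi && vals.size() > 1) continue;
        out[name] = vals.empty() ? std::string{} : vals.front();
    }
    return out;
}
```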

Test: need a way to delete noise from the test output

And test this feature e.g. with the README case, outputting the .exe path, which differs in my local expected results...

Add some basic examples for args that take values

Fix: line 33: signed/unsigned mismatch on 32-bit

Drop const from named() and positional() (or have it both ways?)

const std::vector<std::string>& positional() const { return unnamed_params; }
//! Note: std::map[key] would want to create a key if not yet there,
//! so named()[...] would fail, if const! But constness it is... (for consistency).
//! Use the op() and op[] accessors for direct element access!
const std::map<std::string, std::vector<std::string>>& named() const { return named_params; }

It's pretty orthodox to have them always const. I mean it's yours, change it all you want! :) Especially for #54!...

I can imagine use cases where a pristine command-line, safe from tampering, is nice to have, but as soon as you have the modifying methods, the same object is no longer tamper-proof anyway: just having const accessors, too, won't help. ;)
So, just copy your command-line to a const object if you want a safe reference.

Going with non-const only... (You'll get your compilation errors all right when trying to actually use those methods on a const obj.)
-> Err... OK, but then... No... Other const accessors do call these, so I still need the const pairs after all, too... :-/
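The usual way out is the const/non-const overload pair, sketched below on a hypothetical stand-in class (`ArgsLike`; the member names follow the snippet above). Const members keep working on const objects, while non-const callers get mutable access.

```cpp
#include <map>
#include <string>
#include <vector>

// Hypothetical stand-in for the relevant part of Args: each container gets
// both a non-const accessor (for callers who want to edit the parsed result)
// and a const one (for const objects, and for other const members that call it).
class ArgsLike {
public:
    using Named = std::map<std::string, std::vector<std::string>>;
    using Positional = std::vector<std::string>;

    Named&            named()            { return named_params; }
    const Named&      named()      const { return named_params; }
    Positional&       positional()       { return unnamed_params; }
    const Positional& positional() const { return unnamed_params; }

    // A const member can only call the const overload, which is why
    // the const pair can't simply be dropped.
    bool has(const std::string& name) const { return named().count(name) != 0; }

private:
    Named named_params;
    Positional unnamed_params;
};
```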

Eval. op[] returning `std::optional<string>`

That's what options are, optional, right?

And then show its use prominently in the Readme.
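A sketch of what such an accessor could look like, as a hypothetical free function over the named() container: `std::nullopt` for an absent option, otherwise its first value ("" for a bare predicate).

```cpp
#include <map>
#include <optional>
#include <string>
#include <vector>

// Hypothetical accessor: std::nullopt if the option is absent; otherwise its
// first value, or "" for a bare predicate like --verbose.
// So `if (opt)` is the old presence check, and `*opt` / `value_or()` give the value.
std::optional<std::string> get_opt(const std::map<std::string, std::vector<std::string>>& named,
                                   const std::string& name)
{
    auto it = named.find(name);
    if (it == named.end()) return std::nullopt;
    return it->second.empty() ? std::string{} : it->second.front();
}
```

A side effect if op[] itself returned this: existing checks like `if (args["x"])` or `!args || args["h"]` would keep compiling, since `std::optional` converts to bool in such contexts, while something like `args["moons"].value_or("1")` would give the value with a default in one go.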

Actually support "min. number of option params"

It's already documented that -n in the rules should mean "at least n"...

Currently only -1 is supported, but that's as a special case, and it's even "broken", in that it means "at least 0" now... :)
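A sketch of the documented semantics as a hypothetical helper: `n >= 0` means exactly n params, `n < 0` means at least -n (and greedy beyond that). Note that with this reading -1 would mean "at least 1", unlike the current special case.

```cpp
// Hypothetical interpretation of an arity rule value n:
//   n >= 0 : exactly n params
//   n <  0 : at least -n params, greedily taking all that are available
// `available` = number of non-option words following the option.
// Returns how many params to take, or -1 if the rule can't be satisfied.
int params_to_take(int n, int available)
{
    if (n >= 0) return available >= n ? n : -1;
    int min = -n;
    return available >= min ? available : -1;
}
```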

Add tests

Copied from sfml-test:

#include "Args.hpp"
#include <iostream> // cerr
using namespace std;

int main(int argc, char* argv[])
{
    Args args(argc, argv, {
        {"moons", 1}, // number of moons to start with
        {"i", -1}, // any number of args up to the next arg or EOS
    });
    //auto exename = args.exename();

    //test: args = Args(argc, argv); // <- no args take params.

    cerr << "--------\n"; for (auto const& p : args.named()) {
        cerr << p.first << (p.second.empty() ? "\n" : " = ");
        for (auto const& v : p.second) cerr << "    " << v << "," <<endl;
    }
    cerr << "--------\n"; for (auto const& p : args.unnamed()) { cerr << p << endl; }

    if (args["?"] || args["h"] || args["help"]) {
        cout << "Usage: [-V] [-moons n]" << endl;
        return 0;
    }
    if (args["V"]) {
        cout << "Version: " << LAST_COMMIT_HASH << endl;
        return 0;
    }
}

Empty (predicate) override option doesn't clear previous values

These all fail (note: the test app has --greedy and -G as "greedy", but that doesn't matter here):

SH echo Override with empty, long greedy
RUN args-test --greedy 1 2 3 --greedy
EXPECT "Override with empty, long greedy
-------- NAMED (1):
greedy
-------- POSITIONAL (0):
"

Should also verify this then:

SH echo Override with empty, short greedy
RUN args-test -G 1 2 3 -G
EXPECT "Override with empty, short greedy
-------- NAMED (1):
G
-------- POSITIONAL (0):
"

And of course the simplest case of empty override:

SH echo Override with empty, long
RUN args-test --thing=1 --thing
EXPECT "Override with empty, long
-------- NAMED (1):
thing
-------- POSITIONAL (0):
"

Non-greedy known ones can't be empty (--take-two takes 2)?

SH echo Override with empty, long (ERROR: insufficient parameters)
RUN args-test --take-two 1 2 3 --take-two
EXPECT "Override with empty, long (ERROR: insufficient parameters)
-------- NAMED (1):
take-two
-------- POSITIONAL (0):
"

...and:

SH echo Override with empty, short
RUN args-test -G 1 2 3 -G
EXPECT "Override with empty, short
-------- NAMED (1):
G
-------- POSITIONAL (0):
"

`.exename`/`.exedir` shouldn't be a function, probably

Add test to see if // is accepted as positional

It does, but Space Test is having a problem with:

RUN args-test //

Something, somewhere converts it to a single /... Even with '//' or \/\/ (or the even more perverted \'//\')...

It's just fine directly from the command-line with args-test //!... :-o :-/

Remove that lame "functional" pun from the comments...

It's just recursive, that's all...

Fix: repeated args should not append (but overwrite or ignore) if already have enough params

With moons taking 1 param, args-test.exe --moons=11 --unknown=123 --moons=-99 is now incorrect:

NAMED:
moons = 11, -99
unknown = 123

The new moons should be ignored if it's a multi-value arg. and already has enough:

--moons=1,2,3 --unknown=123 --moons=4,5 --moons=6

Should be 1, 2, 3 + error flags.
-> But... Now (after a ~year), however, I think the "it doesn't yet have enough..." case shouldn't actually be supported! -> #44 instead!

And it should overwrite if it's a single-val option.
- Probably also, in general, if it takes a fixed number of params: --moons=1,2,3 --junk --moons=4,5,6?
But what to do if they are greedy: --moons=1,2 --junk --moons=3 -x --moons=4,5,6? Shouldn't this accumulate all the params then? I feel that greediness is a weaker factor, though, than consistency (with the other arities).

Actually...

There should just be a rule for this (similarly to arity), with Override as default, Append as an option...
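A sketch of such a rule, using hypothetical names (`Repeat`, `store`) rather than anything in Args.hpp. The four policies mirror the ones listed in the README features (override, ignore, append, fail).

```cpp
#include <map>
#include <string>
#include <vector>

// Hypothetical per-option repeat policy (names are illustrative).
enum class Repeat { Override, Ignore, Append, Fail };

using Named = std::map<std::string, std::vector<std::string>>;

// Record one occurrence of `name` with its `vals` according to `policy`.
// Returns false only for Repeat::Fail on a repeated option.
bool store(Named& named, const std::string& name,
           const std::vector<std::string>& vals, Repeat policy = Repeat::Override)
{
    auto it = named.find(name);
    if (it == named.end()) { named[name] = vals; return true; }
    switch (policy) {
        case Repeat::Override: it->second = vals; return true;
        case Repeat::Ignore:   return true;
        case Repeat::Append:   it->second.insert(it->second.end(), vals.begin(), vals.end()); return true;
        case Repeat::Fail:     return false;
    }
    return false; // unreachable; silences "control reaches end" warnings
}
```

With the default Override, the `--moons=11 --unknown=123 --moons=-99` case above ends up as moons = -99.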

Support range-based iteration directly on an Args object

for (auto& a : args) { cout << a << endl; }

But what exactly to include here, and how to uniformly represent each (included) item (token?), despite their various possible semantic roles on the cmdline??

One straightforward option is to only include the positional parameters (simply forwarding to unnamed()).
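A sketch of that straightforward option on a hypothetical minimal class: `begin()`/`end()` just forward to the positional container, so a range-for visits exactly what the positional accessor would.

```cpp
#include <string>
#include <vector>

// Hypothetical minimal class: range-based for over it visits only
// the positional args, by forwarding begin()/end() to their container.
class ArgsIterable {
public:
    std::vector<std::string> unnamed_params; // filled by the parser in the real thing

    auto begin() const { return unnamed_params.begin(); }
    auto end()   const { return unnamed_params.end(); }
};
```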

Add exedir() too

-> #4

Implicit `bool` conv. for `if (args)` checking
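A sketch of that conversion on a hypothetical minimal struct. Making it `explicit` is a design choice: `if (args)`, `!args` and `args && ...` still work (they are contextual conversions), while accidental uses like `int n = args;` won't compile. Whether argv[0] should count as an arg is not addressed here.

```cpp
#include <map>
#include <string>
#include <vector>

// Hypothetical minimal struct: true if any named or positional arg was parsed.
struct ArgsBool {
    std::map<std::string, std::vector<std::string>> named_params;
    std::vector<std::string> unnamed_params;

    explicit operator bool() const { return !named_params.empty() || !unnamed_params.empty(); }
};
```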

Should --arg= actually delete arg, not set it to empty?

The use case is repeating arg with --arg to override any prev. settings.

Done in #57

--arg can kinda still be set to empty, albeit its semantics is a predicate (bool), not "set to empty"!... Its "emptiness" just means [] returns true, but () still returns ""...

But... Even though the override logic should be fixed here (#16!), it's still up to the app, actually, to decide what to do with (arg) == "" or [arg] == false, or any other values!

Clean up the semantics!... It could be equally unexpected (to some, or in some cases) to set --thing= but have [thing] as false!

Short args can't have values?! Anyway, Args("x") seems to not return it

False alarm; see #43 instead!

Do parse out each short arg from an aggregate (-xyz)...

Add flag to exename() to strip/keep ("the usual") extensions

-> #4

So this shouldn't touch other extensions, only the usual ones (like .exe on Windows). There could be any other extensions, though, and the real problem is that "the usual one" may not be defined at all (or not well-defined) on other systems...

The default (no flag) should be "best-effort" stripping of .exe (any other straightforward ones on mainstream systems??).

And then various params should tune the exact behavior, incl. supplying the string to chop off.
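A sketch of the best-effort default as a hypothetical free function (`exename_of`, not the actual exename()): strip everything up to the last '/' or '\', then remove `ext` only if the name ends with it.

```cpp
#include <string>
#include <string_view>

// Hypothetical stand-in for exename(): strip the directory part (either '/'
// or '\\' as separator, so Windows paths work too), then strip `ext` only if
// the name ends with it. Other extensions are left alone.
std::string exename_of(std::string_view argv0, std::string_view ext = ".exe")
{
    auto slash = argv0.find_last_of("/\\");
    std::string_view name = slash == std::string_view::npos ? argv0 : argv0.substr(slash + 1);
    if (!ext.empty() && name.size() > ext.size()
        && name.substr(name.size() - ext.size()) == ext)
        name.remove_suffix(ext.size());
    return std::string(name);
}
```

The comparison is case-sensitive, so `TOOL.EXE` would keep its extension; that is one of the details the tuning params would have to cover.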

Scripts: Make sure the sh scripts all have only LFs (not CRLFs)

-> xparq/Space_Test#32

Handy one-liner to...() utility methods (since already having <string>)

-> #5... Dup.?

Using std::sto...() with a simpler/nicer API

.stoi(base = 10, default = 0); // falling back to 'default' on error
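A sketch of that API as a hypothetical free function `to_int(s, base, def)` (the issue suggests a member `.stoi()`; the free-function form is just easier to show standalone). It catches the two exceptions std::stoi throws, and additionally rejects trailing junk, which std::stoi itself silently accepts ("12abc" -> 12).

```cpp
#include <stdexcept>
#include <string>

// Hypothetical "nicer" std::stoi: returns `def` instead of throwing,
// and also rejects trailing junk such as "12abc".
int to_int(const std::string& s, int base = 10, int def = 0)
{
    try {
        std::size_t used = 0;
        int v = std::stoi(s, &used, base);
        return used == s.size() ? v : def;
    } catch (const std::invalid_argument&) {
        return def;
    } catch (const std::out_of_range&) {
        return def;
    }
}
```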

Perhaps another nice utility to also generate (not just consume) cmdlines

Just dump the current state... Well, and also quote args with spaces... and other scary chars...
- Umm... Except map doesn't preserve the original order, so we're fucked with the named ones!... :-o (And no, std::map being "ordered" doesn't mean the order of insertion either!... :-/ )
- Ummmm... And even if some rudimentary quoting is kinda easy, there's still the open-ended problem of command-lines being consumed by shells, so such a generator must actually speak the (quoting-globbing-escaping) language of a certain particular target shell that it prepares the command-line to!... A hard NO to that!
  - Mmm, but actually... ;) OK, well, just that some generic features which are mostly OK for most shells, or are a good baseline for further custom app-level processing, would still be nice.
    (A quoting example is done below, and some escaping callback lambda could also be added, too.)
    -> Also, wtime does something similar, and it's super handy. See its impl. -- with escaping for Win32::CreateProcess -- below, in another comment!
  - Also, BTW, the escaping function can be a callback!
    - ...at the cost of adding <functional> to the comp. burden! :-/

And then listvals() (already there in the tests) could be added, too, reusing the same mechanics. The tests use a stream as an output:

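#include <iostream>
#include <iterator> // std::next

// Print the items separated by `sep`, then `tail` (nothing at all for an empty container).
void listvals(auto const& container, const char* tail = "\n", const char* sep = ", ")
{
    for (auto v = container.begin(); v != container.end(); ++v)
        std::cout << (v == container.begin() ? "" : sep)
                  << *v
                  << (std::next(v) == container.end() ? tail : "");
}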

but it could just as well write to a string (and an improved one could even be nice and precalc. its length first -- and an even nicer one that also does auto-quoting, or an even nicer one that does that with multi-char quotes... :) ):

#include <string>
#include <string_view>
#include <cstring>
#include <cassert>

using std::string;
using std::string_view;

#define FOUND(expr) ((expr) != std::string::npos)
#define CONTAINS(str, chars) FOUND((str).find_first_of(chars))
string listvals(auto const& container, const char prewrap[] = "", const char postwrap[] = "", const char sep[] = ", ",
    const char* quote = "\"", // not const char[], to hint that it accepts nullptr!
    const char* scary_chars = " \t\n")
// The pre/post wrapping are optional parts that only get written if not empty,
// to support cases where callers would otherwise have to add an annoying
// `if (container.empty())` or two themselves.
{
    string result;
    if (!container.empty()) {
	    size_t QLEN = quote ? strlen(quote) : 0;
	    // Precalc. size... (Note: we're processing cmd args. We got time.)
	    size_t size = strlen(prewrap) + (container.size() - 1) * strlen(sep) + strlen(postwrap);
	    for (auto& v : container)
		    size += v.length()
			    + (quote && *quote && CONTAINS(v, scary_chars) ? // add quotes...
				    (QLEN>1 ? QLEN:2) : 0); // special case for 1 (-> pair)!
	    result.reserve(size);
	    // Write...
	    result += prewrap;
	    for (auto v = container.begin(); v != container.end(); ++v) {
		    if (quote && *quote && CONTAINS(*v, scary_chars))
			    { result += string_view(quote, quote + (QLEN/2 ? QLEN/2 : 1)); // special case for 1 quote!
			      result += *v;
			      result += string_view(quote + QLEN/2); }
		    else    { result += *v; }
		    result += (v+1 == container.end() ? postwrap : sep);
	    }
//cout << "\n\n["<<result<<"]: " << "result.length() =? size: " << dec << result.length() << " vs. " << size << "\n\n";
	    assert(result.length() == size);
    }
    return result;
}
#undef FOUND
#undef CONTAINS

...then the args "serializer" could be as simple as (well, but still needs to write to a string, as the other!):

void dumpargs(Args& args, char prefixchar = '-', const char* longprefix = "--")
{
    // Named...
    for (auto& [name, val] : args.named()) {
	    if (name.length() == 1)
		    cout << prefixchar << name << listvals(val, " ", "", " ");
	    else
		    cout << longprefix << name << listvals(val, "=", "", " ");
	    cout << " ";
    }
    // Positional...
    cout << listvals(args.positional(), "", "", " ");
}

Could be extra useful if the named/positional accessors would drop their (pretty orthodox) const (#55)! Then you could manipulate the arg set, and then "render" it to a new command line!

Change short options to "optionally accept" 1 param by default, except the last one, which should take none...?

Also, "optionally accept" 1 param means "take the next param, except if it looks like another named arg" (like greedy consuming of params, but for 1)!

All would be still configurable, of course, incl. having the current behavior exactly. It's just about more intuitive defaults.
The current default of short options being just nullary predicates* has bitten me at OON, with -C cfgfile "not working"...

The new defaults may sound like a convoluted, arbitrary rule, but may be more intuitive actually, as positional args. are
rarely intermixed with named ones in practice (they tend to come after the "switches")!

One big problem, though: during the parsing loop there's no way to know in advance that the "last one" is actually the last one, so that might mean reparsing that whole chunk! Which, well, could be reduced to
- a) flag every "non-rule-driven" (i.e. implicit-arity) short option as "tentatively last" (for later massaging)
- b) remove that flag if another named arg comes (either explicit (rule-driven), or implicit (default))
- c) when finished, if there's a "tentatively last" named option, take its param away (assert that it has one, or there's no positional args at all!), and add it to the positionals
And another problem, possibly even a show-stopper for this change: this would prevent intermixing named and unnamed args entirely, unless rules are defined for each (named) option! :-o
But... This may just be how command-line args should actually work? I can't come up with a practical example right away where unnamed args are not in fact just params of some named arg. But scripts that need to tuck positional args at the end of half-assembled command strings, without knowing what came before, may suffer!
- Implement it anyway, and take it for a long test ride!

* Wait, no... :-o This is what the general test case says:

RUN args-test.exe --one 99 2 3 4 -i a b c -h --long=1

EXPECT "\
-------- NAMED:
h
i = a, b, c
...

WTF is going on then? :-o

Arrrghs!... :) The test exe has non-default rules, and has -i defined as greedy!
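A sketch of the proposed default, following steps a)-c) from the list above, as a hypothetical standalone parser (`parse_optional_one`). It ignores rules, -xyz aggregates and --name=val entirely, and treats every option alike; it only shows how the "tentatively last" bookkeeping can avoid reparsing.

```cpp
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Hypothetical result type and parser; not the Args.hpp implementation.
struct Parsed {
    std::map<std::string, std::vector<std::string>> named;
    std::vector<std::string> positional;
};

// "-x", "--long" look like options; "-", "--" and "---" do not (nothing after the dashes).
static bool looks_like_option(const std::string& s)
{
    return s.size() > 1 && s[0] == '-' && s.find_first_not_of('-') != std::string::npos;
}

Parsed parse_optional_one(const std::vector<std::string>& words)
{
    Parsed p;
    std::string last_opt;        // name of the most recent option
    bool tentative = false;      // a) last_opt took a param that may have to be returned
    std::size_t insert_at = 0;   // where that param belongs among the positionals

    for (std::size_t i = 0; i < words.size(); ++i) {
        const std::string& w = words[i];
        if (!looks_like_option(w)) { p.positional.push_back(w); continue; }

        last_opt = w.substr(w.find_first_not_of('-'));
        auto& vals = p.named[last_opt];
        vals.clear();            // repeated options override
        tentative = false;       // b) a newer option supersedes the previous "last" one
        if (i + 1 < words.size() && !looks_like_option(words[i + 1])) {
            vals.push_back(words[++i]);
            tentative = true;
            insert_at = p.positional.size();
        }
    }
    if (tentative) {             // c) the last option gives its param back
        auto& vals = p.named[last_opt];
        p.positional.insert(p.positional.begin() + static_cast<std::ptrdiff_t>(insert_at), vals.back());
        vals.pop_back();
    }
    return p;
}
```

With this, `-C cfgfile -v file1 file2` gives C = cfgfile and two positionals. With -C as the last (or only) option, though, cfgfile is handed back to the positionals: that is the rule as proposed, not a quirk of the sketch.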

Scripts: Add a `run_test` (singular) front-end script too, which could pass extra args along to the test case

This is to help debugging individual cases without entering the test case dir manually just for executing a local exe there, or even just changing the last command too much...

In this mode the EXPECT phase will likely be off, so a warning should be printed.

Note: the params would be appended to every RUN/SH statement of the case.

Wow, the header guard was still missing :)

I only noticed due to some weird #include hacking in another project... :)

Add name=val support by default (+ flag to disable)

Undefined long args are too greedy by default, eating up positional args!

E.g., these are failing:

RUN args-test --arg=1 not-a-param!
EXPECT "\
-------- NAMED:
arg = 1
-------- POSITIONAL:
not-a-param!
"

RUN args-test --arg 1 not-a-param!
EXPECT "\
-------- NAMED:
arg
-------- POSITIONAL:
1
not-a-param!
"

Error handling: consider an "unknown" container for unexpected options

But be careful not to screw up the useful simplicity by e.g. removing the unknown options from the regular containers (named or positional), because it would be suicide to always require a cumbersome ruleset for checking the supported options!

This should strictly be an optional aid for processing errors.

Tests: Merge the two main `Makefile` flavors

The build cmd should just come from a macro with a dynamic name, like $(BUILD$(TOOLSET)).

Support quoted params

Well, it is supported, implicitly, as the shell (or the C runtime on Windows) already passes quoted strings (with spaces) as single argv words.

But the processing is far from intuitive sometimes, so improvements might be welcome. Via other, more specific issues, tho, as they come up.

Test: Don't forget to run the test GHA on both 32 and 64 bits

`--` should close named arg. lists

There's no way currently to have -1 or "--++##XXX#++--" etc. as positional args, or allow things like /c or //file/... both as an option and a path. (However, that // for long opts. really shouldn't be supported at all... Add a (negative) test for that!)

But, since ~1.9 there's at least a RejectUnknown flag; the only thing left is to uncomment (and implement) it. ;)
So, seeing -- could perhaps just turn that flag on, and call it a day! :-o (Which would be awkward for consistency, in case someone would like to check the flags after parsing. It's not a very appealing attitude to use config data arbitrarily as mutable state. :) )
- And it's also wrong! Even with known args, where RejectUnknown has no effect, there should be a way to disable them (to get filenames like --known).
Also, then -- -- should obviously result in a positional arg. called --. (Test it, as it's probably easy to f* up!)
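A sketch of that behavior in isolation, as a hypothetical helper that only sorts words into options and positionals: the first bare `--` switches scanning off and is dropped, and everything after it, including another `--` or `--known`, is positional. The '/' prefix and option values are left out.

```cpp
#include <string>
#include <vector>

// Hypothetical helper: the FIRST bare "--" ends option scanning and is dropped;
// every later word, including another "--", is positional.
// (Options are just collected as-is here; values and rules are out of scope.)
void split_at_double_dash(const std::vector<std::string>& words,
                          std::vector<std::string>& options,
                          std::vector<std::string>& positional)
{
    bool scanning = true;
    for (const auto& w : words) {
        if (scanning && w == "--") { scanning = false; continue; }
        if (scanning && w.size() > 1 && w[0] == '-')
            options.push_back(w);
        else
            positional.push_back(w);
    }
}
```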

Scripts: use niceties like `$@`, `local`, `...=(*)` etc., and undo all the tedious rookie `${var}` chickening...

I mean, now that I actually looked at the ash manpage, I can be sure that $var is just fine. :) (Yyyep, it would have taken less effort to look it up upfront than all the repeatedly mistyped { and }...)
