xparq / args

Perhaps the tiniest C++ cmdline processor with a serious feature set

C++ 31.12% Shell 60.25% Batchfile 8.39% PowerShell 0.24%

args's Introduction

FEATURES
--------

(Just to codify existing behavior as "expected" (and testable), rather than leaving it
"accidental"...)

- [x] Classic named (option) and unnamed (positional) arguments
      - [x] intermixed
- [x] Prefix char either - or / freely mixed,
      - [ ] but that can be disabled
- [x] Both short and long options: -x --long
      - [x] Long options only as --long (so //whaaat is always positional)
- [x] Aggregated short options: -xyz
      - [x] with the last one possibly taking values: -xyz param-for-z
      - [x] multiple values, too: -xyZ Zval1 Zval2
      - [x] greedy, too: -xyZ Zval1 Zval2 ... Zvalx-up to -this
- [x] A bare -- turns off named args. for the rest of the cmdline by default, but it
      - [x] can be configured to be a regular positional arg. (*for now it always is!*)
- [x] Options are predicates by default, with simple bool checks: args["x"], args["long"]
- [x] Long options can take values without config.: --name=val
- [x] Any option can take values if configured so: -a file --except *pattern
      - [x] long ones also without = in this case
      - [x] query (as std::string): args("a") -> "file", args("except") -> "*pattern"
- [x] Outputs also available in args.named() -> std::map, args.positional() -> std::vector
      - [x] Use the non-const accessors to modify these containers as you wish
            (they are *yours*, right? ;) especially after parsing...)
- [x] Options (short or long) can also have multiple parameters --multi a b c
      - [x] query like: args("multi", 2) -> "c",
      - [x] or get them all with args.named("multi") -> std::vector{"a", "b", "c"}
- [x] Options can be set to "greedy" to take each value up to the next opt.,
      - [x] or only a fixed n. of values
- [x] Repeated options override earlier ones by default
- [x] Repeated options can also be set to
      - [x] be ignored,
      - [x] append (for multi-val opts.),
      - [x] fail
- [x] Parsing on construction: Args args(argc, argv)
- [x] Deferred parsing: Args args; args.parse(argc, argv)
- [x] Reparsing with different config: reparse(flags = Defaults, rules = {})
      - [x] The instance can be reused for completely new parses, too:
            parse(new_argc, new_argv, flags = Defaults, rules = {})
      - [x] The last used argc/argv are available as args.argc, args.argv
            (in case they're needed outside of main(), e.g. via myApp.args)
- [x] exename(): argv[0], but stripping the path and
      - [x] the extension (".exe" by default, but -> exename(false, ".mysuffix")),
      - [x] unless its "true value" :) is requested with exename(true)
- [x] Quick bool check if there have been any args: if (args), if (!args)


EXAMPLES
--------

- A simple one:

	#include "Args.hpp"
	#include <iostream>
	using std::cout;
	int main(int argc, char** argv)
	{
		Args args(argc, argv);

		if (args)
			cout << "Some args are present.\n";

		if (!args || args["h"])
			cout << "Usage: " << args.exename() << " "
		             << "[-h] [-x] [--long] [whatever...]\n";

		if (args["x"])
			cout << "  'x' was set\n";

		if (args["long"])
			cout << "  'long' was set"
			     << (args("long").empty() ? "" : " to " + args("long"))
			     << '\n';

		for (auto a: args.positional())
			cout << "  positional arg.: " << a << '\n';
	}

args's People

Contributors

xparq, y-onodera01

args's Issues

Perhaps another nice utility to also generate (not just consume) cmdlines

  • Just dump the current state... Well, and also quote args with spaces... and other scary chars...

    • Umm... Except map doesn't preserve order, so we're fucked with the named ones!... :-o (And no, ordered_map doesn't mean the order of insertion either!... :-/ )
    • Ummmm... And even if some rudimentary quoting is kinda easy, there's still the open-ended problem of command-lines being consumed by shells, so such a generator must actually speak the (quoting-globbing-escaping) language of a certain particular target shell that it prepares the command-line to!... A hard NO to that!
      • Mmm, but actually... ;) OK, well, just that some generic features which are mostly OK for most shells, or are a good baseline for further custom app-level processing, would still be nice.
        (A quoting example is done below, and some escaping callback lambda could also be added, too.)
  • And then listvals() that's used in the tests could be added, too, as that could just use the very same mechanics. The tests use a stream as an output:

    auto listvals(auto const& container, const char* tail = "\n", const char* sep = ", ")
    {
        for (auto v = container.begin(); v != container.end(); ++v)
    	    cout << (v == container.begin() ? "":sep)
    	         << *v
    	         << (v+1 == container.end() ? tail:"");
    }
    

    but it could just as well write to a string (and an improved one could even be nice and precalc. its length first -- and an even nicer one that also does auto-quoting, or an even nicer one that does that with multi-char quotes... :) ):

    #include <string>
    #include <string_view>
    #include <cstring>
    #include <cassert>
    using std::string, std::string_view; // the code below uses the unqualified names
    
    #define FOUND(expr) ((expr) != std::string::npos)
    #define CONTAINS(str, chars) FOUND((str).find_first_of(chars))
    string listvals(auto const& container, const char prewrap[] = "", const char postwrap[] = "", const char sep[] = ", ",
        const char* quote = "\"", // not const char[], to hint that it accepts nullptr!
        const char* scary_chars = " \t\n")
    // The pre/post wrapping are optional parts that only get written if not empty,
    // to support cases where callers would otherwise have to add an annoying
    // `if (container.empty())` or two themselves.
    {
        string result;
        if (!container.empty()) {
    	    size_t QLEN = quote ? strlen(quote) : 0;
    	    // Precalc. size... (Note: we're processing cmd args. We got time.)
    	    size_t size = strlen(prewrap) + (container.size() - 1) * strlen(sep) + strlen(postwrap);
    	    for (auto& v : container)
    		    size += v.length()
    			    + (quote && *quote && CONTAINS(v, scary_chars) ? // add quotes...
    				    (QLEN>1 ? QLEN:2) : 0); // special case for 1 (-> pair)!
    	    result.reserve(size);
    	    // Write...
    	    result += prewrap;
    	    for (auto v = container.begin(); v != container.end(); ++v) {
    		    if (quote && *quote && CONTAINS(*v, scary_chars))
    			    { result += string_view(quote, quote + (QLEN/2 ? QLEN/2 : 1)); // special case for 1 quote!
    			      result += *v;
    			      result += string_view(quote + QLEN/2); }
    		    else    { result += *v; }
    		    result += (v+1 == container.end() ? postwrap : sep);
    	    }
    //cout << "\n\n["<<result<<"]: " << "result.length() =? size: " << dec << result.length() << " vs. " << size << "\n\n";
    	    assert(result.length() == size);
        }
        return result;
    }
    #undef FOUND
    #undef CONTAINS
    
  • ...then the args "serializer" could be as simple as this (well, it still needs to write to a string, like the other one!):

    void dumpargs(Args& args, char prefixchar = '-', const char* longprefix = "--")
    {
        // Named...
        for (auto& [name, val] : args.named()) {
    	    if (name.length() == 1)
    		    cout << prefixchar << name << listvals(val, " ", "", " ");
    	    else
    		    cout << longprefix << name << listvals(val, "=", "", " ");
    	    cout << " ";
        }
        // Positional...
        cout << listvals(args.positional(), "", "", " ");
    }
    
  • Could be extra useful if the named/positional accessors would drop their (pretty orthodox) const (#55)! Then you could manipulate the arg set, and then "render" it to a new command line!
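A string-writing variant of the serializer could look roughly like the sketch below. It is not part of the library: render_cmdline and quote_if_needed are made-up names, the containers are assumed to have the types returned by named() and positional(), and the quoting is the naive "wrap in double quotes if there is whitespace" kind, with no escaping. Since std::map iterates in key order, the named options come out sorted, not in their original order (the problem noted above).

```cpp
#include <cstddef>
#include <map>
#include <string>
#include <vector>

using NamedArgs = std::map<std::string, std::vector<std::string>>;

// Wrap a value in double quotes if it contains whitespace (no escaping is attempted).
std::string quote_if_needed(const std::string& s)
{
    return s.find_first_of(" \t\n") != std::string::npos ? '"' + s + '"' : s;
}

// Hypothetical helper: rebuild a command line from the parsed containers.
std::string render_cmdline(const NamedArgs& named, const std::vector<std::string>& positional)
{
    std::string out;
    auto add = [&out](const std::string& word) {
        if (!out.empty()) out += ' ';
        out += word;
    };
    for (const auto& [name, values] : named) {
        const bool is_short = name.length() == 1;
        std::string word = (is_short ? "-" : "--") + name;
        for (std::size_t i = 0; i < values.size(); ++i) {
            if (i == 0 && !is_short) {
                // The first value of a long option is attached with '=', as in --name=val
                word += '=' + quote_if_needed(values[i]);
            } else {
                add(word);
                word = quote_if_needed(values[i]);
            }
        }
        add(word);
    }
    for (const auto& p : positional) add(quote_if_needed(p));
    return out;
}
```

Whether the result parses back to the same state depends on the rules in effect when it is reparsed (e.g. a short option only gets its values back if it is configured to take them).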

Drop const from named() and positional() (or have it both ways?)

const std::vector<std::string>& positional() const { return unnamed_params; }
//! Note: std::map[key] would want to create a key if not yet there,
//! so named()[...] would fail, if const! But constness it is... (for consistency).
//! Use the op() and op[] accessors for direct element access!
const std::map<std::string, std::vector<std::string>>& named() const { return named_params; }

It's pretty orthodox to have them always const. I mean it's yours, change it all you want! :) Especially for #54!...

I can imagine use cases where a pristine command-line, safe from tampering, is nice to have, but as soon as you have the modifying methods, the same object is no longer tamper-proof anyway: just having const accessors, too, won't help. ;)
So, just copy your command-line to a const object if you want a safe reference.

Going with non-const only... (You'll get your compilation errors all right when trying to actually use those methods on a const obj.)
-> Err... OK, but then... No... Other const accessors do call these, so I still need the const pairs after all, too... :-/

Add flag to exename() to strip/keep ("the usual") extensions

-> #4

So this shouldn't touch other extensions, only the usual ones (like .exe on Windows). But there could be any other extensions, and the real problem is that "the usual ones" may not be defined at all (or just not well-defined) on other systems...

The default (no flag) should be "best-effort" stripping of .exe (any other straightforward ones on mainstream systems??).

And then various params should tune the exact behavior, incl. supplying the string to chop off.
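As an illustration only, here is a free-function sketch of that behavior. The name exename_of and its parameter order are assumptions made for this example (loosely mirroring exename(false, ".mysuffix")), not the library's actual implementation; the comparison is case-sensitive, so ".EXE" would not be stripped.

```cpp
#include <string>
#include <string_view>

// Hypothetical stand-alone version of exename(): strips the path, then (optionally)
// one specific extension. Other extensions are left alone.
std::string exename_of(std::string_view argv0, bool keep_ext = false,
                       std::string_view ext = ".exe")
{
    // Strip the directory part; accept both '/' and '\' as separators.
    if (auto pos = argv0.find_last_of("/\\"); pos != std::string_view::npos)
        argv0.remove_prefix(pos + 1);
    // Chop only the given extension, and only if something is left in front of it.
    if (!keep_ext && !ext.empty() && argv0.size() > ext.size()
        && argv0.substr(argv0.size() - ext.size()) == ext)
        argv0.remove_suffix(ext.size());
    return std::string(argv0);
}
```

With these assumptions, exename_of("C:\\tools\\args-test.exe") gives "args-test", while exename_of("archive.tar.gz") is returned unchanged.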

`--` should close named arg. lists

There's no way currently to have -1 or "--++##XXX#++--" etc. as positional args, or allow things like /c or //file/... both as an option and a path. (However, that // for long opts. really shouldn't be supported at all... Add a (negative) test for that!)

  • But, since ~1.9 there's at least a RejectUnknown flag, so the only thing left is to uncomment (and implement) it. ;)
  • So, seeing -- could perhaps just turn that flag on, and call it a day! :-o (Which would be awkward for consistency, in case someone would like to check the flags after parsing. It's not a very appealing attitude to use config data arbitrarily as mutable state. :) )
    • And it's also wrong! Even with known args, where RejectUnknown has no effect, there should be a way to disable them (to get filenames like --known).
  • Also, then -- -- should obviously result in a positional arg. called --. (Test it, as it's probably easy to f* up!)
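The intended behavior can be sketched as a tiny pre-pass, independent of the flags (so no config data gets mutated). Everything here is hypothetical: Split and split_at_double_dash are illustration-only names, and the "looks like an option" test is a simplification (leading - or /, with //... treated as positional).

```cpp
#include <string>
#include <vector>

struct Split {
    std::vector<std::string> options;     // words that look like named args, prefix included
    std::vector<std::string> positional;  // everything else
};

// Hypothetical pre-pass: after the first bare "--", every word is positional.
Split split_at_double_dash(const std::vector<std::string>& words)
{
    Split result;
    bool named_enabled = true;  // local parsing state, not a config flag
    for (const auto& w : words) {
        if (named_enabled && w == "--") {  // only the first bare "--" is consumed
            named_enabled = false;
            continue;
        }
        const bool looks_named = w.size() > 1
            && (w[0] == '-' || (w[0] == '/' && w[1] != '/'));
        (named_enabled && looks_named ? result.options : result.positional).push_back(w);
    }
    return result;
}
```

With this, "-x file -- -1 -- --known" yields -x as the only option, and file, -1, -- and --known as positionals.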

Error handling: remove invalid options

Specific test case that brought me here: --take-two 1 should really be easily/readily detectable!

Well, first of all, rule violations should be detected... And then some sort of reporting would be nice... ;)
Even just removing the faulty ones could be better than nothing. (That would actually fit the minimalistic design quite nicely.)

Error handling: consider an "unknown" container for unexpected options

But be careful not to screw up the useful simplicity by e.g. removing the unknown options from the regular containers (named or positional), because it would be suicide to always require a cumbersome ruleset for checking the supported options!

This should strictly be an optional aid for processing errors.
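A minimal sketch of such an optional aid, assuming the app supplies the set of names it supports. unknown_options is a made-up helper for this example; it only reads the named container, so nothing is removed from it.

```cpp
#include <map>
#include <set>
#include <string>
#include <vector>

using NamedArgs = std::map<std::string, std::vector<std::string>>;

// Hypothetical helper: list the option names that are not in `known`.
// `named` is only read, so the regular containers stay complete.
std::vector<std::string> unknown_options(const NamedArgs& named,
                                         const std::set<std::string>& known)
{
    std::vector<std::string> unknown;
    for (const auto& entry : named)
        if (known.count(entry.first) == 0)
            unknown.push_back(entry.first);
    return unknown;
}
```

Apps that don't care about unexpected options simply never call it.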

Add test to see if // is accepted as positional

It is, but Space Test is having a problem with:

RUN args-test //

Something, somewhere converts it to a single /... Even with '//' or \/\/ (or the even more perverted \'//\')...

It's just fine directly from the command-line with args-test //!... :-o :-/

Add tests

Copied from sfml-test:

#include "Args.hpp"
#include <iostream> // cerr
using namespace std;

int main(int argc, char* argv[])
{
    Args args(argc, argv, {
        {"moons", 1}, // number of moons to start with
        {"i", -1}, // any number of args up to the next arg or EOS
    });
    //auto exename = args.exename();

    //test: args = Args(argc, argv); // <- no args take params.

    cerr << "--------\n"; for (auto const& p : args.named()) {
        cerr << p.first << (p.second.empty() ? "\n" : " = ");
        for (auto const& v : p.second) cerr << "    " << v << "," <<endl;
    }
    cerr << "--------\n"; for (auto const& p : args.unnamed()) { cerr << p << endl; }

    if (args["?"] || args["h"] || args["help"]) {
        cout << "Usage: [-V] [-moons n]" << endl;
        return 0;
    }
    if (args["V"]) {
        cout << "Version: " << LAST_COMMIT_HASH << endl;
        return 0;
    }
}

Should --arg= actually delete arg, not set it to empty?

The use case is repeating arg with --arg to override any prev. settings.

--arg can kinda still set it to empty, albeit its semantics is that of a predicate (bool), not "set to empty"!... Its "emptiness" just means [] returns true, but () still returns ""...

But... Even though the override logic should be fixed here (#16!), it's still up to the app, actually, to decide what to do with (arg) == "" or [arg] == false, or any other values!

  • Clean up the semantics!... It could be equally unexpected to some/in some cases to set --thing= but have [thing] as false!

Support quoted params

Well, it is supported, implicitly, as the shell (or the C runtime on Windows) already passes quoted strings (with spaces) as single argv words.

But the processing is far from intuitive sometimes, so improvements might be welcome. Via other, more specific issues, tho, as they come up.

Fix: Tests: The set-runner falsely reports an overall "OK" result with failed cases!

Alas, find won't fail on its -exec children:

find -name '*.case' -not -path "./_*/*" -exec ${runner} \{\} \;

Trying with xargs:

find -name '*.case' -not -path "./_*/*" -print0 | xargs -0 -n1 ${runner}

But my alternative (selective) runner loop also had a bug! set failed=1 instead of failed=1... :-/

Testing the GHA with an intentionally broken test case... OK, did "fail properly".

Fix: repeated args should not append (but overwrite or ignore) if they already have enough params

With moons taking 1 param, args-test.exe --moons=11 --unknown=123 --moons=-99 is now incorrect:

NAMED:
moons = 11, -99
unknown = 123
  • The new moons should be ignored if it's a multi-value arg. and already has enough:

    --moons=1,2,3 --unknown=123 --moons=4,5 --moons=6

Should be 1, 2, 3 + error flags.
-> But... Now (after a ~year), however, I think the "it doesn't yet have enough..." case shouldn't actually be supported! -> #44 instead!

  • And it should overwrite if it's a single-val option.
    • Probably also, in general, if it takes a fixed number of params: --moons=1,2,3 --junk --moons=4,5,6?
  • But what to do if they are greedy: --moons=1,2 --junk --moons=3 -x --moons=4,5,6? Shouldn't this accumulate all the params then? I feel that greediness is a weaker factor, though, than consistency (with the other arities).

Actually...

  • There should just be a rule for this (similarly to arity), with Override as default, Append as an option...
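Such a rule could be sketched like this. The enum and function names are invented for the example; only the four behaviors themselves (override, append, ignore, fail) come from the notes above.

```cpp
#include <stdexcept>
#include <string>
#include <vector>

enum class OnRepeat { Override, Append, Ignore, Fail };  // hypothetical names

// Merge the values of a repeated option into the ones stored earlier.
void apply_repeat(std::vector<std::string>& stored,
                  const std::vector<std::string>& incoming,
                  OnRepeat rule = OnRepeat::Override)
{
    switch (rule) {
    case OnRepeat::Override:  // the later occurrence wins, even if it is empty
        stored = incoming;
        break;
    case OnRepeat::Append:    // accumulate, e.g. for multi-value options
        stored.insert(stored.end(), incoming.begin(), incoming.end());
        break;
    case OnRepeat::Ignore:    // keep the first occurrence
        break;
    case OnRepeat::Fail:
        throw std::invalid_argument("option repeated");
    }
}
```

With Override as the default, --moons=11 --moons=-99 ends up as just -99, regardless of arity.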

Support range-based iteration directly on an Args object

for (auto const& a : args)  { cout << a << endl; }

But what exactly to include here, and how to uniformly represent each (included) item (token?), despite their various possible semantic roles on the cmdline??

One straightforward option is to only include the positional parameters (simply forwarding to unnamed()).
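That positional-only option needs very little code. The sketch below uses a stand-in struct (ArgsLike, a made-up name) instead of the real class, and only models the positional container.

```cpp
#include <string>
#include <vector>

// Stand-in for the real class; only the positional container is modeled here.
struct ArgsLike {
    std::vector<std::string> unnamed_params;

    const std::vector<std::string>& positional() const { return unnamed_params; }

    // Forwarding begin()/end() is all a range-based for loop needs.
    auto begin() const { return positional().begin(); }
    auto end() const { return positional().end(); }
};
```

With that in place, for (auto const& a : args) visits the positional args in command-line order.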

Empty (predicate) override option doesn't clear previous values

These all fail (note: the test app has --greedy and -G as "greedy", but that doesn't matter here):

SH echo Override with empty, long greedy
RUN args-test --greedy 1 2 3 --greedy
EXPECT "Override with empty, long
-------- NAMED (1):
greedy
-------- POSITIONAL (0):
"

Should also verify this then:

SH echo Override with empty, short greedy
RUN args-test --G 1 2 3 -G
EXPECT "Override with empty, long
-------- NAMED (1):
G
-------- POSITIONAL (0):
"

And of course the simplest case of empty override:

RUN args-test --thing=1 --thing
EXPECT "Override with empty, long
-------- NAMED (1):
thing
-------- POSITIONAL (0):
"

Non-greedy known ones can't be empty (--take-two takes 2)?

SH echo Override with empty, long (ERROR: insufficient parameters)
RUN args-test --take-two 1 2 3 --take-two
EXPECT "Override with empty, long
-------- NAMED (1):
take-two
-------- POSITIONAL (0):
"

...and:

SH echo Override with empty, short
RUN args-test --G 1 2 3 -G
EXPECT "Override with empty, long
-------- NAMED (1):
G
-------- POSITIONAL (0):
"

Add a simplified `named` accessor for the most common use case

The current generic one has an array (std::vector) for every named argument, because they may have multiple values.
But they almost never do...

So, there should be a getter to receive a simple map<string, string>
a) in case no args have multiple values,
b) or even if some do, they could just be omitted, by some explicit request (e.g. a different call, or a param.).

OTOH, this feels like such a superficial issue: why would (99% of) anyone not be done with just the direct [] and () named accessors?...
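For what it's worth, variant b) is only a few lines. named_simple is a hypothetical name, and the input is assumed to have the type that named() returns.

```cpp
#include <map>
#include <string>
#include <vector>

using NamedArgs = std::map<std::string, std::vector<std::string>>;

// Hypothetical flattening getter: predicates map to "", single values to the value.
// Options with more than one value are left out.
std::map<std::string, std::string> named_simple(const NamedArgs& named)
{
    std::map<std::string, std::string> flat;
    for (const auto& [name, values] : named) {
        if (values.size() > 1) continue;  // multi-value: omitted
        flat[name] = values.empty() ? std::string() : values.front();
    }
    return flat;
}
```

Note that a predicate and an option explicitly set to empty both end up as "", so the [] vs. () distinction is lost in this view.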

Change short options to "optionally accept" 1 param by default, except the last one, which should take none...?

  • Also, "optionally accept" 1 param means "take the next param, except if it looks like another named arg" (like greedy consuming of params, but for 1)!

All would be still configurable, of course, incl. having the current behavior exactly. It's just about more intuitive defaults.
The current default of short options being just nullary predicates* has bitten me at OON, with -C cfgfile "not working"...

The new defaults may sound like a convoluted, arbitrary rule, but may be more intuitive actually, as positional args. are
rarely intermixed with named ones in practice (they tend to come after the "switches")!

  • One big problem, though: during the parsing loop there's no way to know in advance that the "last one" is actually the last one, so that might mean reparsing that whole chunk! Which, well, could be reduced to

    • a) flag every "non-rule-driven" (i.e. implicit-arity) short option as "tentatively last" (for later massaging)
    • b) remove that flag if another named arg comes (either explicit (rule-driven), or implicit (default))
    • c) when finished, if there's a "tentatively last" named option, take its param away (assert that it has one, or there's no positional args at all!), and add it to the positionals
  • And another, possibly even a show-stopper for this change: this would prevent intermixing named and unnamed options at all, without requiring to define rules for each (named) option! :-o
    But... This may just be how command-line args should actually work? I can't come up right away with a practical example, where unnamed options are not in fact just params of some named arg. But scripts that need to tuck positional args at the end of half-assembled command strings, without knowing what came before, may suffer!

    • Implement it anyway, and take it for a long test ride!

* Wait, no... :-o This is what the general test case says:

RUN args-test.exe --one 99 2 3 4 -i a b c -h --long=1

EXPECT "\
-------- NAMED:
h
i = a, b, c
...

WTF is going on then? :-o

Arrrghs!... :) The test exe has non-default rules, and has -i defined as greedy!

Actually support "min. number of option params"

It's already documented that -n in the rules should mean "at least n"...

Currently only -1 is supported, but that's as a special case, and it's even "broken", in that it means "at least 0" now... :)
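Under the documented reading, the check itself would be trivial. This is a sketch of that convention only (arity_satisfied is a made-up name), not of the current parser, where -1 still behaves as "at least 0".

```cpp
#include <cstddef>

// Assumed convention: rule >= 0 is an exact count, rule < 0 means "at least -rule".
bool arity_satisfied(int rule, std::size_t count)
{
    if (rule >= 0)
        return count == static_cast<std::size_t>(rule);
    // Negate in a wider type, so that even the smallest int is handled.
    return count >= static_cast<std::size_t>(-static_cast<long long>(rule));
}
```

A check like this would also flag cases like --take-two 1: arity_satisfied(2, 1) is false.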

Add a default ctor to allow deferred parsing

Otherwise this couldn't work:

Args args (argc, argv); // or just Args args;

try {
    args.parse();  // nice and cosy place to catch errors!
    // or: args.parse(argc, argv);  // could also reset & reparse with other sets of args; could be handy!
} catch(...) {
    // boo!
}

if (args["flag"]) // use args...

Support --multi=1,2,3 (not just --multi 1 2 3)!

  • Especially as --multi=1 2 3 also works (however surprising/counter-intuitive it may be), if it's defined to take a fixed number of params. Which is kinda half the solution, it's "just" that the space should also accept a ','... ;) (Well, no... that space is processed in a totally different context, AFAICR.)

  • But that would beg the question of "shouldn't we also support --multi=1, 2, 3 then?!"...

    • And then the whole mess of quoting! (--multi=1 ", or, " 3) Which is a) a shitstorm on Windows, and b) would not help on Unix anyway, as the comma would just still be there all the same, quoted or not... Some arbitrary escaping could help there, something different from what the shell already does (to avoid the usual "how many \ now?!", or on Windows: "so, why exactly ^' does fkn nothing here, again?!")! Yuck!
      • Oh, but -- at least on Unix -- escaped quotes are fine, so we can get them, "only" have to deal with them... which may be quite easy actually: if there's a quote, then there's no trailing separator! Yay!
        • On Windows, OTOH... Well, CMD is so fckd anyway, let's just pretend it doesn't exist! Umm... yay!...
  • I think it could be OK to just leave that to the app, and perhaps give it a hand with a split() function, with a default set of separators (like ",;: <TAB>"), putting the results into
    -> Just saving this comment from the source here: //!! const char* split_sep = ",;"; // split("option") will use this by default

    • Wait, it could as well be an internal post-processing step then! :) And then there could also be a split rule that the app could override. Perfect. ;)
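A possible shape for that split() helper, as a sketch: the free-function signature is an assumption, the default separator set is the ",;" from the saved source comment, and empty items are kept on purpose (so "a,,b" gives three items). Quotes get no special treatment here.

```cpp
#include <cstddef>
#include <string>
#include <string_view>
#include <vector>

// Hypothetical split(): cut `value` at any of the separator chars; empty items are kept.
std::vector<std::string> split(std::string_view value, std::string_view seps = ",;")
{
    std::vector<std::string> parts;
    std::size_t start = 0;
    while (true) {
        const std::size_t end = value.find_first_of(seps, start);
        if (end == std::string_view::npos) {
            parts.emplace_back(value.substr(start));  // the last (or only) item
            return parts;
        }
        parts.emplace_back(value.substr(start, end - start));
        start = end + 1;
    }
}
```

Running it as a post-processing step on the value of --multi=1,2,3 would give the same three values as --multi 1 2 3.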

Add `exename()` (or something like that)

argv[0] tends to contain the full path, but e.g. in help messages or in test cases etc. the exe name would be more useful, and it's tedious to manually carve it out from the path.
