perlpunk / yaml-pp-p5

A YAML 1.2 processor in perl

Home Page: https://metacpan.org/pod/YAML::PP

Perl 99.76% Shell 0.14% Raku 0.08% Makefile 0.02%

yaml perl5 parser yaml-parser yaml-processor

yaml-pp-p5's Introduction

Perl module YAML::PP

YAML::PP is a modular YAML processor for YAML 1.2.

In addition to loading and dumping, it provides a parser and an emitter. The parsing events are compatible with the YAML Test Suite and other libraries like libyaml.

Loading and Dumping can be customized.

It supports the YAML 1.2 Failsafe, JSON and Core Schemas.

You can find the full documentation here: https://metacpan.org/release/YAML-PP

yaml-pp-p5's Issues

Emitter formats empty array oddly

YAML::PP::Dump({ foo => [] });

yields:

foo:
  []

It would be more normal to get:

foo: []

Improper rejection of plain scalar whose first character is in the printable range \u0080-\u00FF

YAML::PP fails to load valid YAML files that have plain scalars starting with a printable Unicode character in the range \u0080 through \u00FF. Printable ASCII characters work as the first character, as do characters at \u0100 or higher. The affected characters also work fine when they are not the first character.

I suspect the common Perl "Unicode bug" in the regexes handling plain scalars, but I wasn't able to easily identify a fix within YAML::PP. Quoting the plain scalar is a sufficient workaround.
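For readers unfamiliar with the "Unicode bug" suspected above, here is a minimal sketch using only core Perl. It does not touch YAML::PP at all, and whether YAML::PP's plain scalar regexes are affected in exactly this way is an assumption. It only shows that, without `use feature 'unicode_strings'` or the `/u` modifier (Perl 5.14+), a character class can treat the same character differently depending on the string's internal representation.

```perl
use strict;
use warnings;

# É (U+00C9) stored in Perl's native single-byte form (UTF8 flag off).
my $native = "\xC9";

# The same character after an internal upgrade (UTF8 flag on).
my $upgraded = $native;
utf8::upgrade($upgraded);

# Default semantics: the result depends on the internal flag.
my $native_is_word   = $native   =~ /\w/  ? 1 : 0;
my $upgraded_is_word = $upgraded =~ /\w/  ? 1 : 0;

# /u forces Unicode rules, so the internal form no longer matters.
my $forced_is_word   = $native   =~ /\w/u ? 1 : 0;

printf "native: %d, upgraded: %d, with /u: %d\n",
    $native_is_word, $upgraded_is_word, $forced_is_word;
```

The two strings compare equal with `eq`, yet only the upgraded one matches `\w` under the default rules. That would be consistent with characters in \u0080-\u00FF failing while \u0100 and above (which are always stored upgraded) work.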

YAML::PP is version 0.009, as installed from App::Cpanminus under perlbrew using perl 5.28.0.

Output

The bug manifests in a Perl exception that generates output similar to:

$ perl test2.pl 
Line      : 5
Column    : 15
Expected  : ALIAS DOUBLEQUOTE FLOWMAP_START FLOWSEQ_START FOLDED LITERAL PLAIN SINGLEQUOTE
Got       : Invalid plain scalar
Where     : perl-5.28.0/lib/site_perl/5.28.0/YAML/PP/Parser.pm line 516
YAML      : "\x{c9}ric Bischoff\n"
  at perl-5.28.0/lib/site_perl/5.28.0/YAML/PP/Loader.pm line 60.

The "\x{c9}ric Bischoff" is Éric Bischoff in the source YAML file (with the É being \u00c9, the exact UTF-8 bytes in the source files are c3 89).

Troubleshooting Performed

I have confirmed the source file is both valid UTF-8 (using iconv) and valid YAML (using various online validators). The YAML 1.2 spec appears to say this should work with all printable Unicode characters that aren't "indicators" or otherwise confusable with other YAML syntax.

I mentioned that higher Unicode code points are unaffected. In fact a name of ☃ric Bischoff (snowman as first character, \u2603) works perfectly.

The workaround we identified for those who can't change their name so easily is to quote the name, which YAML::PP parses fine.

Test case

This test case reproduces the bug and prints out characters which cause YAML::PP to fail to load in the range noted.

use 5.014;
use YAML::PP qw(Load);
use feature 'unicode_strings';

# Allow Perl to spit out UTF-8 to STDOUT
binmode STDOUT, ':encoding(UTF-8)';

my $base = "description: Foo\nmembers:\n- displayname: ";
# Toggle single-quoting or plain scalar testcase
$base .= @ARGV ? "'Xric Bischoff'" : "Xric Bischoff";
my $index = index ($base, 'X');
say "$base\n\n---------\nReplacing 'X' with other printable chars:";

for my $char (0x21 .. 0x110) {
    my $str = $base;

    # Unprintable chars are not valid parts of a plain scalar
    my $replacement = chr($char);
    next if $replacement !~ /[[:print:]]/;

    substr ($str, $index, 1, $replacement);

    my $data = eval { Load($str) };
    say sprintf ("\\x%X (%s)", $char, $replacement), " doesn't work" if ($@);
}

quote special YAML keywords when dumping

Hello,

perl -Ilib -MYAML::PP -MData::Dumper -E '
  my $x = YAML::PP::->new->load_string(qq(---\ntest:\n  - "no"\n  - "3x"\n  - "true"));
  print Dumper($x);
  print YAML::PP::->new->dump_string($x)'

"no" and "3x" should be quoted in the resulting YAML, as "true" is.
Or am I missing something?

Dump: Block styles, folding, etc for multi-line strings

I'd love to see support for block notations, literal style, folded style, etc. added. Or maybe it is already there and I just can't figure out how to make it work (especially since it seems like Emitter.pm has some handling for this starting at around line 356).

# block notation (newlines become spaces)
content:
  Arbitrary free text
  over multiple lines stopping
  after indentation changes...

# literal style (newlines are preserved)
content: |
  Arbitrary free text
  over "multiple lines" stopping
  after indentation changes...

# + indicator (keep extra newlines after block)
content: |+
  Arbitrary free text with two newlines after


# - indicator (remove extra newlines after block)
content: |-
  Arbitrary free text without newlines after it


# folded style (folded newlines are preserved)
content: >
  Arbitrary free text
  over "multiple lines" stopping
  after indentation changes...

I would have hoped it would just work for multi-line strings either starting or ending with (or maybe even containing) one or more newlines...

#!/usr/bin/env perl
use JSON::XS;
use YAML::PP;
use Data::Printer;

my $ypp = YAML::PP->new(boolean => 'JSON::PP', schema => ['JSON']);

my $string = <<END;
This is a multiline string

Testing heredoc and folding in YAML dumping

It doesn't seem to work

END

my $data = {
    properties => {
        foo => {
            in => 'query',
            required => JSON::XS->true,
            description => 'a test',
        },
        bar => {
            in => 'query',
            required => JSON::XS->false
        },
        string => $string
    }
};

print $ypp->dump_string($data);

Current output:

❯ perl yaml2.pl
---
properties:
  bar:
    in: query
    required: false
  foo:
    description: a test
    in: query
    required: true
  string: "This is a multiline string\n\nTesting heredoc and folding in YAML dumping\n\nIt doesn't seem to work\n\n"

Would love to see

❯ perl yaml2.pl
---
properties:
  bar:
    in: query
    required: false
  foo:
    description: a test
    in: query
    required: true
  string: |+
    This is a multiline string

    Testing heredoc and folding in YAML dumping

    It doesn't seem to work

Support binary data via `!!binary` tag

Example:

canonical: !!binary "\
 R0lGODlhDAAMAIQAAP//9/X17unp5WZmZgAAAOfn515eXvPz7Y6OjuDg4J+fn5\
 OTk6enp56enmlpaWNjY6Ojo4SEhP/++f/++f/++f/++f/++f/++f/++f/++f/+\
 +f/++f/++f/++f/++f/++SH+Dk1hZGUgd2l0aCBHSU1QACwAAAAADAAMAAAFLC\
 AgjoEwnuNAFOhpEMTRiggcz4BNJHrv/zCFcLiwMWYNG84BwwEeECcgggoBADs="
generic: !!binary |
 R0lGODlhDAAMAIQAAP//9/X17unp5WZmZgAAAOfn515eXvPz7Y6OjuDg4J+fn5
 OTk6enp56enmlpaWNjY6Ojo4SEhP/++f/++f/++f/++f/++f/++f/++f/++f/+
 +f/++f/++f/++f/++f/++SH+Dk1hZGUgd2l0aCBHSU1QACwAAAAADAAMAAAFLC
 AgjoEwnuNAFOhpEMTRiggcz4BNJHrv/zCFcLiwMWYNG84BwwEeECcgggoBADs=
description:
 The binary value above is a tiny arrow encoded as a gif image.

dump_string must not care about Perl's internal representation of a variable

As of 0.0016, dump_string actually cares what the internal UTF8 flag of a scalar is set to and behaves differently depending on its state.

Perl's current internal representation should have no user visible effects. A library should be able to return a scalar containing the bytes \x{C3}\x{A9} with the UTF8 flag off, or a scalar containing \x{C3}\x{83}\x{C2}\x{A9} with the UTF8 flag on, and both be considered the byte sequence for é when decoded as UTF-8.
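To illustrate the point about the internal representation, here is a small core-Perl sketch (no YAML::PP involved). It shows that upgrading a scalar changes only the UTF8 flag, not the string as seen from Perl code, which is why a dumper arguably should not branch on that flag.

```perl
use strict;
use warnings;

my $bytes = "\xC3\xA9";        # two characters, UTF8 flag off

my $upgraded = $bytes;
utf8::upgrade($upgraded);      # same two characters, UTF8 flag on

# From Perl's point of view these are the same string.
my $same_string = $bytes eq $upgraded ? 1 : 0;
my $same_length = length($bytes) == length($upgraded) ? 1 : 0;

# Only the internal flag differs.
my $flag_before = utf8::is_utf8($bytes)    ? 1 : 0;
my $flag_after  = utf8::is_utf8($upgraded) ? 1 : 0;

print "same string: $same_string, same length: $same_length, ",
    "flags: $flag_before vs $flag_after\n";
```

Any code path that calls `utf8::is_utf8` to decide how to encode output will treat these two equal strings differently.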

Option to indent lists relative to mapping keys

Currently lists as values of mapping keys are not indented relative to mapping keys, so that e.g.

{ foo => ['bar', 'baz'] }

becomes

foo:
- bar
- baz

rather than

foo:
  - bar
  - baz

I understand that this is done so that the output shall contain as little whitespace as possible but it is a problem when using indentation as fold method in Vim because such lists will also not stay folded when the level the key is on is unfolded.

The best solution would probably be an option to indent lists relative to mapping keys, since this is a matter of taste.

Implement emitting flow style collections

Currently the emitter can only emit block style sequences and mappings, not flow style (except empty ones, e.g. [], {})

Unsupported YAML structure

Hi,

I've found a piece of YAML in the wild that YAML::PP doesn't accept. (https://github.com/awslabs/aws-data-lake-solution/tree/master/deployment). The thing that is not parsing is:

RoleName: !Join ["-", ["data-lake-kibana-configure-role", Ref: "AWS::Region" ]]

with:

Line      : 1
Column    : 62
Expected  : EOL FLOWSEQ_END FLOW_COMMA WS
Got       : COLON
Where     : /redacted/local/lib/perl5/YAML/PP/Parser.pm line 367
YAML      : ": \"\"AWS::Region\" ]]\n"
  at /redacted/local/lib/perl5/YAML/PP/Loader.pm line 57.

I've tried with latest YAML::PP 0.018 (I first tried with 0.016 also).

It would seem that the string is valid YAML for some parser since AWS is accepting it. Can we get YAML::PP to parse this YAML? Can I help in some way?

I attach a small script that reproduces the error:

#!/usr/bin/env perl

use strict;
use warnings;

use YAML::PP;

my $string = <<EOF;
RoleName: !Join ["-", ["data-lake-kibana-configure-role", Ref: "AWS::Region" ]]
EOF

my $pp = YAML::PP->new;
my $s = $pp->load_string($string);

use Data::Dumper;
print Dumper($s);

anchors don't survive when files are included using the Include Schema

I'd like to use included yaml files to keep common configuration data. For example:

# db.yml
db : &db
  host : hostname
  port : 5432

And in the yaml files used by the applications,

# app1.yml
config : !include db.yml
args :
   db : *db

But (perhaps unsurprisingly) the anchor is not carried over from the included file:

my $include = YAML::PP::Schema::Include->new;
my $yp      = YAML::PP->new( schema => [ q{+}, 'Merge', $include ] );
$include->yp($yp);
p $yp->load_file('app1.yml');

Results in

No anchor defined for alias 'db' at .../YAML/PP/Parser.pm line 61.
 at .../YAML/PP/Loader.pm line 94.

I find that I use anchors quite a bit to avoid duplication, and the ability for them to exist across included file boundaries would be quite helpful.

t/54.glob.t fails on perl 5.8.8 or lower

perl 5.8.8 or lower has a bug related to qr//m (see Perl/perl5#1783).
That is, on perl 5.8.8 or lower, the following script prints "not match"

#!/usr/bin/env perl
use strict;
use warnings;

my $str = <<'EOF';
a
b
EOF

print $str =~ qr/a$/m ? "match" : "not match", "\n";

As a result, t/54.glob.t fails on perl 5.8.8 or lower.

❯ perl -v
This is perl, v5.8.8 built for darwin-2level

❯ prove -l t/54.glob.t
t/54.glob.t .. 1/?
    #   Failed test 'IO::Scalar fileno correct'
    #   at t/54.glob.t line 181.
    #                   '--- !perl/glob:IO::File
    # IO:
    #   fileno: 4
    #   stat:
    #     atime: 1607237357
    #     blksize: 4096
    #     blocks: 16
    #     ctime: 1607237355
    #     device: 16777221
    #     gid: 20
    #     inode: 26870573
    #     links: 1
    #     mode: 33188
    #     mtime: 1607237355
    #     rdev: 0
    #     size: 5126
    #     uid: 501
    #   tell: 0
    # NAME: GEN0
    # PACKAGE: Symbol
    # '
    #     doesn't match '(?m-xis:fileno: 4$)'
    # Looks like you failed 1 test of 2.

#   Failed test 'ioscalar'
#   at t/54.glob.t line 182.
# Looks like you failed 1 test of 3.
t/54.glob.t .. Dubious, test returned 1 (wstat 256, 0x100)
Failed 1/3 subtests

Test Summary Report
-------------------
t/54.glob.t (Wstat: 256 Tests: 3 Failed: 1)
  Failed test:  2
  Non-zero exit status: 1
Files=1, Tests=3,  0 wallclock secs ( 0.02 usr  0.01 sys +  0.09 cusr  0.01 csys =  0.13 CPU)
Result: FAIL

ypp fails to parse !~

https://play.yaml.io/main/parser?input=LSAhfiBmb28K

Portable serialization for regexes across perl versions

Currently a regex serialized in a recent perl version can not always be loaded in perl < 5.14.
Example: (?^u:regex)

Just removing the ^u can lead to wrong results (qr{\w}u and qr{\w} behave differently).
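One possible direction, sketched here with the core `re` module and not taken from YAML::PP itself, is to serialize the pattern source and the modifiers separately instead of the stringified `(?^u:...)` form. A loader on an older perl could then at least detect a `u` modifier it cannot honour, rather than silently dropping it. Note that this snippet itself needs Perl 5.14+ to compile, because of the `/u` in the literal.

```perl
use strict;
use warnings;
use re qw(regexp_pattern);

my $re = qr/\w+/iu;

# In list context, regexp_pattern returns the pattern source and the
# modifier string as two separate values.
my ($pattern, $modifiers) = regexp_pattern($re);

print "pattern:   $pattern\n";
print "modifiers: $modifiers\n";

# A loader could refuse (or warn) when it sees a modifier it does not support.
my $needs_unicode_rules = index($modifiers, 'u') >= 0 ? 1 : 0;
print "needs /u: $needs_unicode_rules\n";
```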

YAML OpenAPI format/profile

Hello! I would like to ask some information about YAML::PP and OpenAPI YAML files.

The OpenAPI specification is at https://swagger.io/specification/ and says that it uses the YAML 1.2 format with some constraints:

Tags MUST be limited to those allowed by the JSON Schema ruleset.
Keys used in YAML maps MUST be limited to a scalar string, as defined by the YAML Failsafe schema ruleset.

Can YAML::PP parse YAML files with the above constraints correctly?

And could YAML::PP be configured to generate a YAML file from a Perl structure according to these constraints?

Support both `!perl/...` and `!!perl/...` tags

The current behaviour of existing perl modules:

Dump:

# my $object = bless {}, "Foo";
# YAML.pm
--- !!perl/hash:Foo {}
# YAML::XS
--- !!perl/hash:Foo {}
# YAML::Syck
--- !!perl/hash:Foo {}

Load:

# --- !perl/hash:Foo {}
# YAML::Syck
Foo=HASH(0x556c4af24858)
# YAML.pm
Foo=HASH(0x556c4af24858)
# YAML::XS
perl/hash:Foo=HASH(0x564a5359d360)

# --- !!perl/hash:Foo {}
# YAML::Syck
Foo=HASH(0x556c4af2f4b8)
# YAML.pm
Foo=HASH(0x556c4af2f4b8)
# YAML::XS
Foo=HASH(0x564a5359d858)

So all modules use two exclamation marks when dumping.
YAML.pm and YAML::Syck support both variants when loading, YAML::XS only supports two exclamation marks.

The recommendation of the YAML spec as I understand it is to use one exclamation mark.
Two exclamation marks are reserved for standard YAML tags.

To let YAML::PP interact with the other YAML modules, it needs to support both.

Doc issue

https://github.com/perlpunk/YAML-PP-p5/blob/master/lib/YAML/PP.pm#L858

looks like it also needs YAML_FLOW_MAPPING_STYLE and YAML_FLOW_SEQUENCE_STYLE.

Recent released versions are prefixed with `v`

I dunno if it was intentional, but the declared version has gone from 0.036 to v0.37.0

Dump() can create invalid YAML 1.1

For certain strings, YAML::PP::Dump() creates YAML with control characters not escaped. It is valid YAML 1.2, but not YAML 1.1:

use Encode;
use Devel::Peek;
use YAML::PP;
use YAML::XS ();            
my $binary = "\342\202\254";   
my $dump = YAML::PP::Dump($binary);
Dump $dump;                      
my $encoded = encode_utf8($dump);
Dump $encoded;            
YAML::XS::Load($encoded);

__END__

SV = PV(0x56201c459070) at 0x56201c478358
  REFCNT = 1
  FLAGS = (POK,IsCOW,pPOK,UTF8)
  PV = 0x56201ca49e80 "--- '\303\242\302\202\302\254'\n"\0 [UTF8 "--- '\x{e2}\x{82}\x{ac}'\n"]
  CUR = 13
  LEN = 16
  COW_REFCNT = 0
SV = PV(0x56201c99c0e0) at 0x56201ca54c38
  REFCNT = 1
  FLAGS = (POK,IsCOW,pPOK)
  PV = 0x56201ca49e80 "--- '\303\242\302\202\302\254'\n"\0
  CUR = 13
  LEN = 16
  COW_REFCNT = 1
YAML::XS::Load Error: The problem:

    control characters are not allowed

was found at document: 0

https://yaml.org/spec/1.1/

[128]	nb-double-char	::=	( nb-char - “\” - “"” ) | ns-esc-char	
[34]	nb-char	::=	c-printable - b-char

https://yaml.org/spec/1.2/spec.html

[107]	nb-double-char	::=	c-ns-esc-char | ( nb-json - “\” - “"” )
[2]	nb-json	::=	#x9 | [#x20-#x10FFFF]

Emitter formatting for boolean.pm values

YAML::XS style seems better.

★ ~/src/+pegex-parser/pegex-json-pm master $ perl -Mboolean -MYAML::PP -E 'say YAML::PP->new(schema => ['Perl'])->dump_string(true)'
--- !perl/scalar:boolean
=: 1

★ ~/src/+pegex-parser/pegex-json-pm master $ perl -Mboolean -MYAML::XS -E 'say YAML::XS::Dump(true)'
--- !!perl/scalar:boolean 1

Merge breaks when merging a node with a sequence.

Given a node that is a sequence

node1: &node1
  - item1
  - item2
  - item3

node2:
  <<: [ *node1 ]

I get the following error:

Expected hash for merge key at centos7/v5.18.2/lib/perl5/YAML/PP/Constructor.pm line 160, <STDIN> line 1.
 at centos7/v5.18.2/lib/perl5/YAML/PP/Loader.pm line 92.

node1: &node1
  item1: {}
  item2: {}
  item3: {}

node2:
  <<: [ *node1 ]

It works for nodes that are hashes.

Bug: Literal scalars with explicit indent seem to have a problem

See: https://spec.yaml.io/main/playground/parser?input=LS0tCi0geHh4OiB8MgogICAgICBvbmUKICAgICAgdHdvCiAgICAgIHRocmVlCiAgeXl5OiB8CiAgICBvbmUKICAgIHR3bwogICAgdGhyZWUKICB6eno6IHwKICAgIG9uZQogICAgdHdvCiAgICB0aHJlZQo=

> cat foo.yaml 
---
- xxx: |2
      one
      two
      three
  yyy: |
    one
    two
    three
  zzz: |
    one
    two
    three
> perl -MYAML::PP -e '$y = YAML::PP->new; print $y->dump_string($y->load_file(shift))' foo.yaml 
---
- xxx: |2
        one
        two
        three
    yyy: |
      one
      two
      three
    zzz: |
      one
      two
      three

Schema to support TO_JSON methods

What about a schema which recognises TO_JSON methods the same way as the JSON modules do? That would be a useful compromise I think. Would it be hard to implement?

Update yamlpp-* tools to support the Merge feature.

The utilities shipped with YAML::PP do not support the https://metacpan.org/pod/YAML::PP::Schema::Merge feature.

It'd be really nice if they had it turned on by default or via some command line flag.

@perlpunk Kudos on being one of the first Perl YAML parsers to support this feature.

Thanks

YAML::PP::Load loops infinitely when given tainted string on perl < 5.14

There is a bug in perl versions below 5.14 which causes pos to not work correctly with tainted strings. This was fixed by Perl/perl5@fd69380.

YAML::PP reads from strings using YAML::PP::Reader, which relies on pos to keep track of where it is in the string. If pos is not maintained properly, it will loop infinitely.

Rather than using pos, it should be possible to consume the string line by line:

diff --git c/lib/YAML/PP/Reader.pm i/lib/YAML/PP/Reader.pm
index 456630f..aeda5df 100644
--- c/lib/YAML/PP/Reader.pm
+++ i/lib/YAML/PP/Reader.pm
@@ -18,8 +18,7 @@ sub new {

 sub read {
     my ($self) = @_;
-    my $pos = pos $self->{input} || 0;
-    my $yaml = substr($self->{input}, $pos);
+    my $yaml = $self->{input};
     $self->{input} = '';
     return $yaml;
 }
@@ -29,7 +28,7 @@ sub readline {
     unless (length $self->{input}) {
         return;
     }
-    if ( $self->{input} =~ m/\G([^\r\n]*(?:\n|\r\n|\r|\z))/g ) {
+    if ( $self->{input} =~ s/\A([^\r\n]*(?:\n|\r\n|\r|\z))// ) {
         my $line = $1;
         unless (length $line) {
             $self->{input} = '';

This could lead to copying large strings though. It may also be reasonable to untaint the string before processing it.
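As a standalone illustration of the line-by-line approach in the diff above, the following sketch consumes a buffer without relying on `pos`. The helper name `take_line` is hypothetical and not part of YAML::PP; the regex is the one from the patch.

```perl
use strict;
use warnings;

# Hypothetical helper: removes and returns the first line of the buffer,
# including its line ending. Returns undef once the buffer is empty.
sub take_line {
    my ($buffer_ref) = @_;
    return unless length $$buffer_ref;
    if ( $$buffer_ref =~ s/\A([^\r\n]*(?:\n|\r\n|\r|\z))// ) {
        return $1;
    }
    return;
}

my $input = "a: 1\r\nb: 2\nc: 3";
my @lines;
while ( defined( my $line = take_line(\$input) ) ) {
    push @lines, $line;
}

printf "%d lines read, %d characters left in buffer\n",
    scalar @lines, length $input;
```

Each call shortens the buffer by at least one character, so the loop terminates even if `pos` misbehaves. The cost is the copying mentioned above: every `s///` rewrites the remaining string.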

Parse error on plain key ending with colon

This causes a parse error:

foo:: bar

Undeclared dependency Tie::IxHash

The test suite fails without Tie::IxHash:

...
#   Failed test 'YAML/PP/Schema/Tie/IxHash.pm loaded ok'
#   at t/00.compile.t line 66.
#          got: '512'
#     expected: '0'
Can't locate Tie/IxHash.pm in @INC (you may need to install the Tie::IxHash module) (@INC contains: /home/cpansand/.cpan/build/2019030321/YAML-PP-0.010_001-W9_LJt/blib/arch /home/cpansand/.cpan/build/2019030321/YAML-PP-0.010_001-W9_LJt/blib/lib /home/cpansand/.cpan/build/2019030321/YAML-PP-0.010_001-W9_LJt/blib/lib /home/cpansand/.cpan/build/2019030321/YAML-PP-0.010_001-W9_LJt/blib/arch /usr/perl5.22.4p/lib/site_perl/5.22.4/amd64-freebsd /usr/perl5.22.4p/lib/site_perl/5.22.4 /usr/perl5.22.4p/lib/5.22.4/amd64-freebsd /usr/perl5.22.4p/lib/5.22.4 .) at /home/cpansand/.cpan/build/2019030321/YAML-PP-0.010_001-W9_LJt/blib/lib/YAML/PP/Schema/Tie/IxHash.pm line 10.
BEGIN failed--compilation aborted at /home/cpansand/.cpan/build/2019030321/YAML-PP-0.010_001-W9_LJt/blib/lib/YAML/PP/Schema/Tie/IxHash.pm line 10.
Compilation failed in require at -e line 1.
# Looks like you failed 1 test of 23.
t/00.compile.t ............. 
Dubious, test returned 1 (wstat 256, 0x100)
Failed 1/23 subtests 
...
Can't locate Tie/IxHash.pm in @INC (you may need to install the Tie::IxHash module) (@INC contains: /home/cpansand/.cpan/build/2019030321/YAML-PP-0.010_001-W9_LJt/blib/lib /home/cpansand/.cpan/build/2019030321/YAML-PP-0.010_001-W9_LJt/blib/arch /usr/perl5.22.4p/lib/site_perl/5.22.4/amd64-freebsd /usr/perl5.22.4p/lib/site_perl/5.22.4 /usr/perl5.22.4p/lib/5.22.4/amd64-freebsd /usr/perl5.22.4p/lib/5.22.4 .) at /home/cpansand/.cpan/build/2019030321/YAML-PP-0.010_001-W9_LJt/blib/lib/YAML/PP/Schema/Tie/IxHash.pm line 10.
BEGIN failed--compilation aborted at /home/cpansand/.cpan/build/2019030321/YAML-PP-0.010_001-W9_LJt/blib/lib/YAML/PP/Schema/Tie/IxHash.pm line 10.
Compilation failed in require at t/38.schema-ixhash.t line 9.
BEGIN failed--compilation aborted at t/38.schema-ixhash.t line 9.
t/38.schema-ixhash.t ....... 
Dubious, test returned 2 (wstat 512, 0x200)
No subtests run 
...

order is not preserved in new subhashes

I wrote these test cases:

diff --git a/t/52.preserve.t b/t/52.preserve.t
index 0cfb1e2..7c7e141 100644
--- a/t/52.preserve.t
+++ b/t/52.preserve.t
@@ -124,6 +124,20 @@ EOM
     is(exists $data->{a}, 1, 'exists(a)');
     is(exists $data->{A}, '', 'exists(A)');
 
+    # can we preserve ordering in new subhashes?
+    $data->{d}{z} = 1;
+    $data->{d}{a} = 2;
+    @{$data->{d}}{qw(y b x c)} = 3..6;
+
+    @keys = keys %$data;
+    is("@keys", "a y b x c z d", 'keys()');
+
+    @keys = keys %{$data->{d}};
+    is("@keys", "z a y b x c", 'keys() of a new subhash');
+
+    @values = values %{$data->{d}};
+    is("@values", "1 2 3 4 5 6", 'values() of a new subhash');
+
     %$data = ();
     is(scalar keys %$data, 0, 'clear');
     is(scalar %$data, 0, 'clear');

and sadly, they fail:

...
# Subtest: preserve-order
    ok 1 - preserve=1 Key order preserved
    ok 2 - keys()
    ok 3 - hash a
    ok 4 - First key
    ok 5 - Next key
    ok 6 - delete(z)
    ok 7 - keys()
    ok 8 - keys()
    ok 9 - scalar
    ok 10 - values()
    ok 11 - exists(a)
    ok 12 - exists(A)
    ok 13 - keys()
    not ok 14 - keys() of a new subhash
    #   Failed test 'keys() of a new subhash'
    #   at t/52.preserve.t line 136.
    #          got: 'a x c b y z'
    #     expected: 'z a y b x c'
    not ok 15 - values() of a new subhash
    #   Failed test 'values() of a new subhash'
    #   at t/52.preserve.t line 139.
    #          got: '2 5 6 4 3 1'
    #     expected: '1 2 3 4 5 6'
    ok 16 - clear
    ok 17 - clear
    1..17
    # Looks like you failed 2 tests of 17.
...

I think we just need to create a new tied hash when making an assignment to a new slot.

On combining different resolver types

When implementing a "catchall" resolver in https://github.com/pplu/cfn-perl/tree/feature/yaml_support, I found out that combining some resolvers has unexpected behaviour:

$schema->add_sequence_resolver(tag => '!X', ...);
$schema->add_sequence_resolver(tag => qr/^!.*/, ...);

I was expecting the !X to be processed by the first resolver, when in fact, it gets processed always by the last one. I was adding specific resolvers for each tag, and wanted a "catchall" resolver for dying when an unsupported tag was found.

I just wanted you to notice this as feedback, since I think not many people are using these APIs.

I don't need this as a feature, since I've moved the code to only having one resolver for each type (scalar, sequence, mapping), and dispatching depending on the tag detected (https://github.com/pplu/cfn-perl/blob/3aeef4fc0d957e77a5defc7835b784cf2501f3cd/lib/Cfn/YAML/Schema.pm#L51)
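The workaround described above (one resolver per node type that dispatches on the tag itself) might look roughly like this. This is a plain-Perl sketch, not the code from the linked repository; `%handler_for` and `resolve_sequence` are hypothetical names, and the handler body is a placeholder.

```perl
use strict;
use warnings;

# Hypothetical dispatch table: one entry per supported tag.
my %handler_for = (
    '!X' => sub { my ($items) = @_; return { x => $items } },
);

# Single entry point for all tagged sequences. Unknown tags die here,
# which plays the role of the "catchall" resolver.
sub resolve_sequence {
    my ($tag, $items) = @_;
    my $handler = $handler_for{$tag}
        or die "Unsupported tag '$tag'\n";
    return $handler->($items);
}

my $known = resolve_sequence('!X', [1, 2]);
my $error = eval { resolve_sequence('!Y', []); 1 } ? '' : $@;

print "known tag resolved to a ", ref($known), "\n";
print "unknown tag: $error";
```

Because the lookup is explicit, the result does not depend on the order in which resolvers were registered.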

Booleans not as references

Currently booleans in the data spawn YAML references so that for example the first time false occurs it says in_use: &1 false and every subsequent false becomes *1. It may be strictly speaking correct and you save 3 chars each time, but it's not very human readable. Is there a way to avoid this?

Note that I don't want to turn references off completely, just for booleans.

Feature request: Add support for types

Hello!

In some cases it is necessary to distinguish in YAML between numbers stored as string literals (numbers in single or double quotes) and numbers stored as numbers (without quotes).

Similar thing is needed also in JSON and e.g. Cpanel::JSON::XS encoder/decoder provides additional argument for passing types (when encoding) or getting types (when decoding). Documentation is written at: https://metacpan.org/pod/Cpanel::JSON::XS::Type

JSON example of usage:

use Cpanel::JSON::XS;
use Cpanel::JSON::XS::Type;
my $json_string = '{"key1":{"key2":[10,"10",10.6]},"key3":"10.5"}';
my $perl_struct = decode_json($json_string, 0, my $type_spec);
# $perl_struct is { key1 => { key2 => [ 10, 10, 10.6 ] }, key3 => 10.5 }
# $type_spec is { key1 => { key2 => [ JSON_TYPE_INT, JSON_TYPE_STRING, JSON_TYPE_FLOAT ] }, key3 => JSON_TYPE_STRING }

use Cpanel::JSON::XS;
use Cpanel::JSON::XS::Type;
encode_json([10, 10, 10.25], [JSON_TYPE_STRING, JSON_TYPE_INT, JSON_TYPE_STRING]);
# '["10",10,"10.25"]'

Via YAML::PP I can load e.g. following example

use YAML::PP qw(Load);
my $perl_struct = Load(<<EOF);
array:
 - "10000000000000000000"
 - 10000000000000000000
EOF

And I would like to get information about the types of the elements under the key array (as in the JSON example above), e.g. that the first element is a string and the second element is a number.

Is it possible to provide some kind of support for types when loading and dumping YAML via YAML::PP?

please use more explicit code in the get-set-ers

Example:

sub loader {
    @_ > 1 and $_[0]->{loader}= $_[1];  # setter
    return $_[0]->{loader};
}

The functionality is the same as the current code,
it just makes more explicit that it is always returning the scalar.

Support merge keys

What is a merge key?

https://yaml.org/type/merge.html

Problems

The specification of merge keys is incomplete

If the value associated with the key is a single mapping node, each of its key/value pairs is
inserted into the current mapping, unless the key already exists in it.

It's not clear what "already exists" means. If you look at the last item in the example, the x key is overriding the one from the merge key, although at the time of the parsing/loading of the merge key it does not yet exist.
Also the spec doesn't say what happens if a merge key comes after other keys, or if two merge keys are allowed at all:

<< : *alias1
key1: value
<< : *alias2

That means that a post-processing of the mapping is necessary.
Also a merge key is very different from all other tag definitions, because the usual tags target the node itself.
A merge key like the following (explicitly marked with !!merge to make it clear)

!!merge << : *alias

actually alters the behaviour of the surrounding mapping, especially since post-processing is necessary.

That means that it is more work to implement it. While all other tags can be implemented generically, we need extra code to make this work.
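To make the post-processing idea concrete, here is a rough plain-Perl sketch. It is not YAML::PP's implementation: `merge_mapping` is a hypothetical helper, it takes the mapping as an ordered list of pairs, and it assumes that explicit keys always win and that earlier merge sources take precedence over later ones (the second assumption follows the merge spec's rule for sequences, since the spec says nothing about several merge keys).

```perl
use strict;
use warnings;

# Hypothetical post-processing step: $pairs is an arrayref of [key, value]
# pairs in document order, with '<<' standing for a plain merge key.
sub merge_mapping {
    my ($pairs) = @_;
    my (%result, @sources);
    for my $pair (@$pairs) {
        my ($key, $value) = @$pair;
        if ($key eq '<<') {
            # A merge value is either one mapping or a sequence of mappings.
            push @sources, ref($value) eq 'ARRAY' ? @$value : $value;
        }
        else {
            $result{$key} = $value;    # explicit keys always win
        }
    }
    for my $source (@sources) {
        die "Expected hash for merge key\n" unless ref($source) eq 'HASH';
        for my $key (keys %$source) {
            # Assumption: earlier merge sources beat later ones.
            $result{$key} = $source->{$key} unless exists $result{$key};
        }
    }
    return \%result;
}

my $merged = merge_mapping([
    [ '<<'   => { x => 1, y => 2 } ],
    [ 'key1' => 'value' ],
    [ '<<'   => { y => 20, z => 30 } ],
    [ 'x'    => 'explicit' ],
]);

print join(', ', map { "$_=$merged->{$_}" } sort keys %$merged), "\n";
```

Collecting the merge sources first and applying them after all explicit keys are known is what makes the position of `<<` within the mapping irrelevant.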

Also, implementations differ slightly.
The merge key has to be a plain scalar, without quotes, but js-yaml also reads "<<" as a merge key, preventing people from using a literal << as a key.
Ruby's implementation differs slightly if you change the order of keys, so that the merge key comes after other keys.

You can only merge mappings

There is no similar thing for merging lists, although that could also be useful in some situations.

Why do we still want merge keys then?

First of all, they are implemented in PyYAML, ruamel, SnakeYAML, js-yaml, Ruby psych and others, and people are using them often. They are useful after all.

I think people using YAML in perl should be able to use them too.

Make Dumper non-recursive

Currently the Dumper goes recursively through the data structure. This can use a lot of memory for deeply nested structures.

Suggestion: shorter alias for YAML::PP::Highlight

This is just a suggestion, not a bug report.

YAML::PP::Highlight is great but I would really enjoy using a shorter alias, maybe YPH, to add it at a lower cost when debugging?

Change of behavior for undef keys

Hi,

I've found a change in behavior in the handling of serialization and deserialization of some YAML structures between YAML::PP versions.

In current YAML::PP, a document like this:

key:

is getting deserialized to

{ key => '' }

When we serialize it again, it becomes what you would expect from the serialization of that Perl data structure:

---
key: ''

The resulting YAML, IMHO, is not "equivalent" to the original one. We've detected this because one of our test suites started failing when building with up-to-date dependencies. We've tracked the change down to YAML::PP 0.012, that doesn't display this behavior, instead deserializing the document to:

{ key => undef }

which I think is more expected, since then roundtripping the YAML brings you to:

---
key: null

Which seems like a better behavior for unspecified values for keys, since they will return to the same Perl datastructure.

Do you think it's worth returning YAML::PP to the 0.012 behavior? Or was it a design decision? Is there some type of way to control the serialization that I'm not aware of?

Thanks in advance, and always open to help solve this "bug"

BTW: here's a small test script that passes on 0.012, but fails on later YAML::PPs

#!/usr/bin/env perl

use strict;
use warnings;

use Test::More;
use YAML::PP qw/Load Dump/;

{
  my $yaml = "key:";

  my $perl = Load($yaml);
  ok(exists($perl->{ key }), 'key exists');
  is($perl->{ key }, undef, 'and is an undef');

  like(Dump($perl), qr/key: null/);

  my $roundtrip = Load(Dump($perl));
  ok(exists($roundtrip->{ key }), 'key exists');
  is($roundtrip->{ key }, undef, 'and is an undef');
}

done_testing;

Support loading and dumping typeglobs

nan vs NaN (0.006_001)

The t/33.schema-dump.t test fails for older perls (< 5.22):

    #   Failed test 'Schema Failsafe dump'
    #   at t/33.schema-dump.t line 176.
    #          got: '---
    # - 1
    # - 3.14159
    # - 42
    # - ''
    # - ~
    # - 0
    # - 3.14159
    # - 0x10
    # - 0o7
    # - 1e23
    # - true
    # - false
    # - null
    # - NULL
    # - TRUE
    # - False
    # - inf
    # - -inf
    # - nan
    # '
    #     expected: '---
    # - 1
    # - 3.14159
    # - 42
    # - ''
    # - ~
    # - 0
    # - 3.14159
    # - 0x10
    # - 0o7
    # - 1e23
    # - true
    # - false
    # - null
    # - NULL
    # - TRUE
    # - False
    # - Inf
    # - -Inf
    # - NaN
    # '
    # Looks like you failed 1 test of 1.
(etc)

It seems that older perls emit "nan" instead of "NaN".

boolean.pm values cannot be emitted

boolean.pm values cannot be emitted with YAML::PP 0.24:

$ perl5.30.3 -Mboolean -MYAML::PP -E '$p=YAML::PP->new; say $p->dump_string([true,false])'  
Reftype SCALAR not implemented at /opt/perl-5.30.3/lib/site_perl/5.30.3/YAML/PP/Representer.pm line 120.

Possible workaround: specify boolean=>"boolean" (but why? this should only be a load-time option?):

$ perl5.30.3 -Mboolean -MYAML::PP -E '$p=YAML::PP->new(boolean=>"boolean"); say $p->dump_string([true,false])' 
---
- true
- false

Question: is it possible to force all one-line string scalars to be single-quoted?

The title nearly says it all. The two additional points are that I would also like keys to be single-quoted, and would prefer that keys/values containing only ASCII alphanumerics and SPACE be exempt.

t/31.schema.t fails tests 238 and 3838 when nvtype is IBM DoubleDouble

Hi,

This is a -Duselongdouble build of perl-5.32.0 (and 5.31.0 suffers the same problem) on Debian wheezy:

$ perl -V:longdblkind
longdblkind='6';

The errors are reported in the test suite as follows:

t/31.schema.t .............. 20/?
# Failed test '(yaml11) type float: load(!!float 190:20:30.15) eq '685230.15''
# at t/31.schema.t line 138.
# got: 685230.15
# expected: 685230.15
t/31.schema.t .............. 3615/?
# Failed test '(yaml11) type float: load(190:20:30.15) eq '685230.15''
# at t/31.schema.t line 138.
# got: 685230.15
# expected: 685230.15
t/31.schema.t .............. 4123/? # Looks like you failed 2 tests of 4394.
t/31.schema.t .............. Dubious, test returned 2 (wstat 512, 0x200)
Failed 2/4394 subtests

and, at the end of the test suite:

t/31.schema.t (Wstat: 512 Tests: 4394 Failed: 2)
Failed tests: 238, 3838
Non-zero exit status: 2

The problem is that $data != $def{data} in this particular case, though they are both (obviously) stringifying to the same string of "685230.15".
For this particular case
scalar(reverse(unpack "h*", pack("D<", $data))) is 4124e95c4ccccccdbdb999999999999a
scalar(reverse(unpack "h*", pack("D<", $def{data}))) is 4124e95c4ccccccdbdb9999999999998
which is a very minor discrepancy (of 2 units in the last place).

The former value (i.e. $data) is the correct representation of 685230.15 for this architecture, though both perl and C will inaccurately assign the latter value (which translates to 685230.150000000000000000000000005).
(The least significant double is a negative value, so $def{data} is in fact greater than $data, despite the appearance to the contrary.)

Could the test be changed from "==" to "eq" ? (The diagnostic message suggests that the intention was to test "eq", not "==".)

Cheers,
Rob
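
The difference between the two comparisons raised above can be illustrated without a DoubleDouble build. This is a minimal sketch, assuming an ordinary perl whose nvtype is a 64-bit IEEE double (not the build from this report); the variable names are illustrative only. It shows two values that stringify identically but are not numerically equal, which is the same shape of failure as in the test output.

```perl
use strict;
use warnings;

# Assumption: nvtype is a 64-bit IEEE double, where perl stringifies
# numbers with 15 significant digits by default.
my $sum     = 0.1 + 0.2;   # differs from 0.3 in the last bits
my $literal = 0.3;

my $same_number = ($sum == $literal)     ? 1 : 0;   # numeric comparison
my $same_string = ("$sum" eq "$literal") ? 1 : 0;   # string comparison

print "==: $same_number, eq: $same_string\n";
```

On such a build the numeric comparison is expected to fail while the string comparison succeeds, so a test that means to compare the stringified form should use "eq".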

Parser events have offset, but not line number

Is it possible to add the $token->{line} to parser events? I can extend the parser for now to work for most cases (will give an update if I notice anything missing).

My goal here is to have the loader maintain the location of YAML nodes alongside the constructed objects.

Description and implementation of YAML::PP::Schema::Binary do not make sense

First description:

YAML-PP-p5/lib/YAML/PP/Schema/Binary.pm

Lines 77 to 81 in 78a0506

    By prepending a base64 encoded binary string with the C<!!binary> tag, it can
    be automatically decoded when loading.

    If you are using this schema, any string containing C<[\x{7F}-\x{10FFFF}]>
    will be dumped as binary. That also includes encoded utf8.

encode_base64() is a function: {0x00..0xFF}ᴺ → {0x2B, 0x2F, 0x30..0x39, 0x41..0x5A, 0x61..0x7A}ᴹ
So YAML::PP::Schema::Binary obviously cannot take a string that contains values from {0x000100..0x10FFFF}.

Binary data is defined as a stream of octets, i.e. values from the set {0x00..0xFF}, which is exactly what the encode_base64() function accepts.

And next implementation:

YAML-PP-p5/lib/YAML/PP/Schema/Binary.pm

Lines 29 to 39 in 78a0506

    my $binary = $node->{value};
    unless ($binary =~ m/[\x{7F}-\x{10FFFF}]/) {
        # ASCII
        return;
    }
    if (utf8::is_utf8($binary)) {
        # utf8
        return;
    }
    # everything else must be base64 encoded
    my $base64 = encode_base64($binary);

unless ($binary =~ m/[\x{7F}-\x{10FFFF}]/) is equivalent to if ($binary =~ m/\A[\x{00}-\x{7E}]*\z/): it matches strings consisting only of 7-bit ASCII characters other than \x{7F}. The comment says this branch is for ASCII, which is not true, as it is ASCII ∖ {0x7F}.

Next there is the check if (utf8::is_utf8($binary)), which in our case is basically:
if ($binary =~ m/[^\x00-\xFF]/ or (int(rand(2)) == 1 and $binary =~ m/[\x80-\xFF]/))
So this code always skips strings with values from the set {0x000100..0x10FFFF} and then (pseudo-)randomly (depending on the internal representation of the scalar) also skips strings with values from the set {0x80..0xFF}. The comment says this branch is for utf8, but that is not true; see the documentation and my simplified implementation above.
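
The dependence on the internal representation can be seen with core Perl alone. This is a minimal sketch with illustrative variable names, not code from YAML::PP: two strings with identical content compare equal, yet utf8::is_utf8 gives different answers for them.

```perl
use strict;
use warnings;

# Two strings holding the same single character U+00E9.
my $native   = "\xE9";
my $upgraded = "\xE9";
utf8::upgrade($upgraded);   # changes internal storage only, not content

my $equal         = ($native eq $upgraded)   ? 1 : 0;
my $flag_native   = utf8::is_utf8($native)   ? 1 : 0;
my $flag_upgraded = utf8::is_utf8($upgraded) ? 1 : 0;

print "equal: $equal, is_utf8: $flag_native vs $flag_upgraded\n";
```

A check based on utf8::is_utf8 therefore treats these two equal strings differently.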

And finally it calls the encode_base64 function. When is this function called? Strings containing only {0x00..0x7E} are ignored by the first (ASCII) check. The second check always ignores strings that have at least one value from {0x000100..0x10FFFF}. So encode_base64 is called when the string contains at least one value 0x7F or at least one value from the set {0x80..0xFF}, and is_utf8 (pseudo-)randomly returned false.

Suggested fix

Change the YAML::PP::Schema::Binary module to work only with binary data, as the name suggests, that is, with strings that contain only values from the set {0x00..0xFF}.
Always use encode_base64() when the input string is not 7-bit ASCII, that is, when it contains at least one value from the set {0x80..0xFF}.
Decide whether Base64 encoding is really needed for strings containing the character 0x7F. It is 7-bit ASCII, so 7-bit applications need no special treatment for it. But I'm not sure whether YAML needs special treatment of 0x7F or not.
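
The three points above could be sketched as follows. This is only an illustration under stated assumptions, not the YAML::PP implementation: binary_to_base64 is a hypothetical helper name, it uses MIME::Base64 from core, and it arbitrarily treats 0x7F as plain ASCII, which is the open question in the last point.

```perl
use strict;
use warnings;
use MIME::Base64 qw(encode_base64);

# Hypothetical helper (not part of YAML::PP): returns the Base64 form
# of an octet string, or nothing when no encoding should be applied.
sub binary_to_base64 {
    my ($string) = @_;

    # Code points above 0xFF are not octets, so this is not binary data.
    return if $string =~ m/[^\x00-\xFF]/;

    # Only 7-bit values (0x7F counted as ASCII here, by assumption).
    return unless $string =~ m/[\x80-\xFF]/;

    # At least one value in 0x80..0xFF: encode, without a trailing newline.
    return encode_base64($string, '');
}

my $ascii = binary_to_base64("abc");
my $bytes = binary_to_base64("\xFF\x00");
my $wide  = binary_to_base64("\x{263A}");

print defined $bytes ? "encoded: $bytes\n" : "not encoded\n";
```

Because the decision is made on the code points in the string and not on utf8::is_utf8, the result does not depend on how perl happens to store the scalar internally.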

CC @2shortplanks Please take a look and correct anything I got wrong. I simplified some parts to make them easier to understand.

Anti-issue: YAML::PP parses JSON that all the other perl JSON modules can't!

So yeah, this is an anti-issue - I discovered recently that JSON is "a subset of YAML 1.2"; and then discovered YAML::PP. In short: Thank you. YAML::PP doesn't bomb on JSON that is produced with ham-fisted UTF-8 encoding.

It appears that one company in particular that distributes a data feed has somehow "switched on" interpreting all data ingested as UTF-8, even when it wasn't UTF-8 encoded. Imagine interpreting the header of a ZIP file as Unicode. The result is corrupted garbage, and it isn't standards compliant.

Example:
{"Subject": "CN=\ufffd\ufffd\ufffd\ufffd\ufffd\ufffd\u0531/OU=\ufffd\ufffd\u01b4\ufffd/OU=\u027d\ufffd\ufffd\ufffd\ude64\ufffd\ufffd\u0467/O=sdlg" }

Nothing else in Perl land seems to be able to parse the above JSON document. YAML::PP does, as of v0.005.

My request: Please let this continue to be the case. If you do end up adding validation of unicode character sequences, give folks an option to turn it off.

Getting a "Bad indendation in FLOWMAP" error

Hi,

I found this piece of CloudFormation that YAML::PP 0.026 doesn't parse, although CloudFormation considers it valid YAML:

Resources:
  Resource1:
    Properties:
      Item: {
        'accountId': { 'N': '15' },
        'username': { 'S': 'user' }
      }
  Resource2:
    Type: "XXX"

You get the following error:

Line      : 7
Column    : 7
Message   : Bad indendation in FLOWMAP
Where     : local/lib/perl5/YAML/PP/Parser.pm line 197
YAML      : "}"
  at local/lib/perl5/YAML/PP/Loader.pm line 94.

bug? not implemented yet? Or is there any parser / schema option that I'm not aware of?

Wrapping long strings

Currently long strings aren't wrapped.
Since it's common for YAML emitters to have a maximum width, wrapping should be implemented with an option to set a width.
