ppi's People

Contributors

adamkennedy, akiym, book, dependabot[bot], dolmen, dsteinbrunner, grinnz, guillaumeaubert, h3xx, happy-barney, jkeenan, jmaslak, karenetheridge, kentfredric, kevindawson, manwar, moregan, nanto, oalders, randyl, reneeb, rurban, s-nez, trwyant, tsee, tsibley, van-de-bugger, wbraswell, wchristian, wolfsage

ppi's Issues

RT 67831: implicit statement end not recognized for perl 5.12-style package

https://rt.cpan.org/Public/Bug/Display.html?id=67831

PPI 1.215 and 1.216_01 do not recognize the implicit end of statement that follows the block in a Perl 5.12 package statement:

ppidump 'package Foo {} sub bar { 1; }'
                    PPI::Document
                      PPI::Statement::Package
[    1,   1,   1 ]     PPI::Token::Word         'package'
[    1,   9,   9 ]     PPI::Token::Word         'Foo'
                        PPI::Structure::Block   { ... }
[    1,  16,  16 ]     PPI::Token::Word         'sub'
[    1,  20,  20 ]     PPI::Token::Word         'bar'
                        PPI::Structure::Block   { ... }
                          PPI::Statement
[    1,  26,  26 ]         PPI::Token::Number   '1'
[    1,  27,  27 ]         PPI::Token::Structure        ';'

With an explicit statement terminator, it's fine:

ppidump 'package Foo {} ; sub bar { 1; }'
                    PPI::Document
                      PPI::Statement::Package
[    1,   1,   1 ]     PPI::Token::Word         'package'
[    1,   9,   9 ]     PPI::Token::Word         'Foo'
                        PPI::Structure::Block   { ... }
[    1,  16,  16 ]     PPI::Token::Structure    ';'
                      PPI::Statement::Sub
[    1,  18,  18 ]     PPI::Token::Word         'sub'
[    1,  22,  22 ]     PPI::Token::Word         'bar'
                        PPI::Structure::Block   { ... }
                          PPI::Statement
[    1,  28,  28 ]         PPI::Token::Number   '1'
[    1,  29,  29 ]         PPI::Token::Structure        ';'

The RT ticket includes a patch.

Package names beginning with 'v' plus a digit parsed as version strings

PPI 1.215 and 1.216_01

ppidump 'package v10;'
                    PPI::Document
                      PPI::Statement::Package
[    1,   1,   1 ]     PPI::Token::Word         'package'
[    1,   9,   9 ]     PPI::Token::Number::Version      'v10'
[    1,  12,  12 ]     PPI::Token::Structure    ';'

perl -WE 'package v10; print __PACKAGE__'
v10

and

ppidump 'package v10g;'
                    PPI::Document
                      PPI::Statement::Package
[    1,   1,   1 ]     PPI::Token::Word         'package'
[    1,   9,   9 ]     PPI::Token::Number::Version      'v10'
[    1,  12,  12 ]     PPI::Token::Word         'g'
[    1,  13,  13 ]     PPI::Token::Structure    ';'

[Feature Request] Parse/Token plugins

As said on IRC to @wchristian , if a rewrite is going to happen, something that would be "nice" to think about is having a proviso for non-standard syntax extensions.

So that perhaps, code that knows it is about to parse Devel::Declare based code, can load a plugin that knows how to spice the syntax, and pass the plugin to PPI, and PPI can emit structures, and re-serialize back, sanely and safely.

Then maybe down the road, we could work out how to write a plugin that dynamically loads other plugins on demand based on hints in the source being parsed, and cover more of the edgecases encountered by metasyntax.

Though I don't exactly have any idea of how such a plugin would look, or how such a plugin would be passed over, just a general sense of "this would be nice and useful"

Please release the current master as 1.216_01

@adamkennedy There's more to do, but we've got a sizable amount of changes that we'd like to see chewed through by the CPAN smokers. Can you please release the current master as dev version 1.216_01?

Alternately, if you feel like handing out COMAINT, i'd happily do it myself too. :)

size limit questions

Currently PPI has a hard-coded size limit on the files it is willing to parse. There are two questions here:

  • The comment on that code says that big files "blow up the Tokenizer/Lexer". What does this mean? A crash? Subtle errors? Too much resource use?
  • I'm thinking of making the tokenizer take an option for maximum size (in addition to an env var), but the tokenizer constructor does not yet have any code for options it can take. Is there a recommended example or method, or should I just go with what seems proper?

Misparse of &&= and ||=

Hi there-

I was smoking Perl::Critic with the latest PPI, and I came across this...

As of 1.216_01, PPI incorrectly parses the expression $foo ||= 0; as follows:

                    PPI::Document
                      PPI::Statement
[    1,   1,   1 ]     PPI::Token::Symbol   '$foo'
[    1,   6,   6 ]     PPI::Token::Operator     '||'
[    1,   8,   8 ]     PPI::Token::Operator     '='
[    1,  10,  10 ]     PPI::Token::Number   '0'
[    1,  11,  11 ]     PPI::Token::Structure    ';'

Notice that || and = are parsed as two operators. Same goes for &&= but not //= or other types of assignment operators.
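
One common way to avoid this kind of split is to try the longest operators first. The following is only a minimal sketch of that idea, assuming an abbreviated operator list and a hypothetical helper name (first_operator); it is not PPI's tokenizer code:

```perl
use strict;
use warnings;

# Hypothetical, abbreviated operator list for illustration only.
my @operators = ( '||', '&&', '//', '=', '||=', '&&=', '//=' );

# Sort longest first so that '||=' is tried before '||' and '='.
my $alternation = join '|',
    map  { quotemeta($_) }
    sort { length($b) <=> length($a) } @operators;
my $operator_re = qr/^($alternation)/;

# Return the operator at the start of $source, or undef if there is none.
sub first_operator {
    my ($source) = @_;
    return $source =~ $operator_re ? $1 : undef;
}

print first_operator('||= 0;'), "\n";    # prints "||="
```

With the list sorted this way, '||= 0;' yields a single '||=' rather than '||' followed by '='.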

rt.cpan.org tickets to close

From a pass over the PPI rt.cpan.org queue:

RT tickets to close with 1.216 due to the merging of moregan's branches:

  • 68176 and 71705 -- support all augmented assignment operators
    3353672
  • RT 75039: don't allow '=CUT' to terminate POD
    1bdce9b
  • RT 36540 -- support upper case in hex and binary numbers
    6279fbb
  • RT 45014: parse '12.34..56.78' parsed as version string + '..' + float
    b4d5644
  • RT 51693: fix pod markup containing '>'
    1cafad1
  • RT 30863: spelling fix
    c2d6b37
  • RT 67264: fix spelling of Tom Christiansen
    dda1721

Other RT tickets to close due to changes in 1.216 and before:

  • RT 85049 -- Merge pull request #6 from dsteinbrunner/patch-1
    7b07326
  • RT 69026 Patched in #3
    45968ef
  • RT 90792 Patched in: #2
    4158513
  • RT 45471 -- appears to not be a bug, according to the submitter's remarks
  • RT 36556 -- apparently fixed between Sat Jun 07 18:37:32 2008 and PPI 1.215
  • RT 35829 -- in 1.215 there is no code at all in the perldoc output.
    Submitter was seeing either something that is now gone (submitted
    May 2008), or the inline tests.

RT 74527: sub v2 {} parsed as a version string

ppidump 'sub v2 {1;}'
                    PPI::Document
                      PPI::Statement
[    1,   1,   1 ]     PPI::Token::Word         'sub'
[    1,   5,   5 ]     PPI::Token::Number::Version      'v2'
                        PPI::Structure::Block   { ... }
                          PPI::Statement
[    1,   9,   9 ]         PPI::Token::Number   '1'
[    1,  10,  10 ]         PPI::Token::Structure        ';'

even if the sub name only starts off looking like a version string:

ppidump 'sub v2go {1;}'
                    PPI::Document
                      PPI::Statement
[    1,   1,   1 ]     PPI::Token::Word         'sub'
[    1,   5,   5 ]     PPI::Token::Number::Version      'v2'
[    1,   7,   7 ]     PPI::Token::Word         'go'
                        PPI::Structure::Block   { ... }
                          PPI::Statement
[    1,  11,  11 ]         PPI::Token::Number   '1'
[    1,  12,  12 ]         PPI::Token::Structure        ';'

After this PPI sticks everything up to the next explicit statement separator into the sub's statement, a la #31

The RT ticket includes an idea of where to fix the problem: https://rt.cpan.org/Public/Bug/Display.html?id=74527

php-style error handling to perl-style error handling?

A generic question:

Errors in PPI are handled by capturing them and stuffing them in ->errstr, effectively hiding them unless the user knows to look for them. Over the past years the perl community has converged on treating this as an anti-pattern. Are there any stringent reasons against migrating PPI to "die on failure" behavior?

Many operators/builtins not separated from following single quote

PPI::Token::Word has code to make sure that some operators/builtins (eq ne ge le gt lt q qq qx qw qr m s tr y pack unpack) are separated from an immediately following single quote, because that's what perl does. E.g., "$foo eq'bar'" is parsed by PPI and by perl as a symbol, the eq operator, and a single-quoted string. However, there are many words that perl separates that PPI does not, e.g. 'cmp':

ppidump "\$foo cmp'bar'"
                    PPI::Document
                      PPI::Statement
[    1,   1,   1 ]     PPI::Token::Symbol       '$foo'
[    1,   6,   6 ]     PPI::Token::Word         'cmp'bar'
[    1,  13,  13 ]     PPI::Token::Quote::Single        '''

regen/keywords.pl in the perl sources lists dozens of words that PPI::Token::Word doesn't handle. Presumably most (all?) of them should be handled.

See Perl-Critic/Perl-Critic#451 for a real-world example.

request for comaint :)

Mithaldu here.

As mentioned in my Email, i'd like to help PPI index better on metacpan, as well as give it love in regards to the many bug tickets it has. As such, i would like to have comaint. :)

Can't load file written in perl string normally

Hi.

If the file to be loaded contains a perl string, like

use utf8;
my $hash = { 東京 => 'tokyo' };

then the result of PPI::Document::File->new($file) is undef, which means it cannot handle perl strings correctly (if I remember correctly, perl allows a bareword written in a perl string as a hash key).

So I wrote a patch that adds a perl string option to the constructor.
moznion@757f382
However, I think a better way probably exists.

What do you think?

logic for detection of labeled statements in Lexer::_add_element needs to be tested

Right now it checks that $Parent->schild(1) is false, but then it goes on to request and use $second->content, which seems broken. Flipping the comparison doesn't cause any tests to fail so i assume this is untested.

Tried to dig through commit history, but it ends at "cvs import" before any code changes to that segment are made.

Should AUTOLOAD, DESTROY, et al. tokenize as PPI::Statement::Scheduled ?

In https://rt.cpan.org/Public/Bug/Display.html?id=27364 Jeffrey Thalhammer suggests that AUTOLOAD should yield a PPI::Statement::Scheduled, not a mere PPI::Statement::Sub.

DESTROY is similar to AUTOLOAD. They are both special methods called by Perl. They are subs even when, as Perl allows, "sub" is omitted. They are not quite as special as BEGIN, END, etc., which are code blocks and can be repeated. threads.pm calls the CLONE method (from 5.7.3) and the CLONE_SKIP method (from 5.8.7).

Is PPI::Statement::Scheduled reserved only for the five special blocks (BEGIN, UNITCHECK, CHECK, INIT, END) "intended to be run at a specific time during the loading process." as the documentation says, or should it apply to special functions Perl will call in general? If the latter, would the tie methods count? Anything else?

dead code?

I'm currently writing tests to improve the coverage of the test suite, and am finding what looks like dead code. After about 8 hours altogether of trying to find code that will trigger this if condition i haven't found any, and it looks strongly like the handling of these circumstances has been moved to the commit function. Most notably i see this because the only way to satisfy the if condition is to have a single : recognized as a sub attribute, followed by triggering of Word->__TOKENIZER__on_char. However, the execution path seems to only be able to lead into Whitespace->__TOKENIZER__on_char, which can only lead to Word->__TOKENIZER__commit.

If you know of any string that can trigger this code, please let me know so i can add the test, otherwise i regard this as dead code that can be removed.

x sometimes parsing as operator not word

PPI 1.215 is parsing some instances of 'x' as an operator rather than a word:

ppidump '1=>x'
                    PPI::Document
                      PPI::Statement
[    1,   1,   1 ]     PPI::Token::Number       '1'
[    1,   2,   2 ]     PPI::Token::Operator     '=>'
[    1,   4,   4 ]     PPI::Token::Operator     'x'

ppidump '%hash=(1=>x)'
                    PPI::Document
                      PPI::Statement
[    1,   1,   1 ]     PPI::Token::Symbol       '%hash'
[    1,   6,   6 ]     PPI::Token::Operator     '='
                        PPI::Structure::List    ( ... )
                          PPI::Statement::Expression
[    1,   8,   8 ]         PPI::Token::Number   '1'
[    1,   9,   9 ]         PPI::Token::Operator         '=>'
[    1,  11,  11 ]         PPI::Token::Operator         'x'

perl (5.8.8 and 5.18.1) parses the x as a word:

perl -We "my %hash=(1=>x);"
Unquoted string "x" may clash with future reserved word at -e line 1.

Curiously, xor gets different treatment from perl:

perl -WE "my %hash=(1=>xor);"
syntax error at -e line 1, near "xor)"
Execution of -e aborted due to compilation errors.

RT 75038: PPI::Token::Number::Version tokens cut off at first underscore

https://rt.cpan.org/Public/Bug/Display.html?id=75038

An example from the PPI::Token::Number::Version documentation (PPI 1.215):

ppidump '10_000.10_000.10_000'
                    PPI::Document
                      PPI::Statement
[    1,   1,   1 ]     PPI::Token::Number::Float        '10_000.10_000'
[    1,  14,  14 ]     PPI::Token::Number::Float        '.10_000'

alternately:

ppidump 'v10_000.10_000.10_000'
                    PPI::Document
                      PPI::Statement
[    1,   1,   1 ]     PPI::Token::Number::Version      'v10'
[    1,   4,   4 ]     PPI::Token::Word         '_000'
[    1,   8,   8 ]     PPI::Token::Number::Float        '.10_000'
[    1,  15,  15 ]     PPI::Token::Number::Float        '.10_000'

whereas

ppidump '10000.10000.10000'
                    PPI::Document
                      PPI::Statement
[    1,   1,   1 ]     PPI::Token::Number::Version      '10000.10000.10000'

ppidump 'v10000.10000.10000'
                    PPI::Document
                      PPI::Statement
[    1,   1,   1 ]     PPI::Token::Number::Version      'v10000.10000.10000'

better api for encoded input

Ether summarized this in IRC just now and it seems perfectly fine and i'll need to do it asap.

14-01-22@02:41:35 (@ether) Mithaldu: basically, PPI needs separate interfaces for new_from_file, new_from_handle, and new_from_content -- for the first, you need a separate encoding argument; for the second, you should force the caller to apply the right layers to the $fh in advance; for new_from_content, presume characters (decoded)

-1 parsing as number rather than operator and 1

Perl-Critic/Perl-Critic#500 is not happy with this "-1" parsing as a number rather than the operator '-' followed by the number 1:

ppidump '(1)-1'
                    PPI::Document
                      PPI::Statement
                        PPI::Structure::List    ( ... )
                          PPI::Statement::Expression
[    1,   2,   2 ]         PPI::Token::Number   '1'
[    1,   4,   4 ]     PPI::Token::Number       '-1'

However it's a different story with '+':

ppidump '(1)+1'
                    PPI::Document
                      PPI::Statement
                        PPI::Structure::List    ( ... )
                          PPI::Statement::Expression
[    1,   2,   2 ]         PPI::Token::Number   '1'
[    1,   4,   4 ]     PPI::Token::Operator     '+'
[    1,   5,   5 ]     PPI::Token::Number       '1'

Capture variables above $9 misparsed

Perl-Critic/Perl-Critic#455
https://rt.cpan.org/Public/Bug/Display.html?id=72980

PPI does not recognize that the numbered capture variables can go higher than $9. E.g. for:

$_ = 'xxxxxxxxxxxxxxxxxxxx';
/(.)(.)(.)(.)(.)(.)(.)(.)(.)(.)(.)(.)(.)/;
my $x = $13;
print $x;
ppidump 'my $x = $13;'
                    PPI::Document
                      PPI::Statement::Variable
[    1,   1,   1 ]     PPI::Token::Word         'my'
[    1,   4,   4 ]     PPI::Token::Symbol       '$x'
[    1,   7,   7 ]     PPI::Token::Operator     '='
[    1,   9,   9 ]     PPI::Token::Magic        '$1'
[    1,  11,  11 ]     PPI::Token::Number       '3'
[    1,  12,  12 ]     PPI::Token::Structure    ';'
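
For illustration, the expected behavior is that every digit after the '$' belongs to the variable name. A minimal sketch of that rule, using a hypothetical helper (leading_capture_variable) that is not part of PPI:

```perl
use strict;
use warnings;

# Hypothetical helper, not PPI code: return the numbered capture variable
# at the start of $source (including all of its digits), or undef if the
# text does not start with one.
sub leading_capture_variable {
    my ($source) = @_;

    # A failed match in list context assigns nothing, leaving $name undef.
    my ($name) = $source =~ /^(\$[0-9]+)/;
    return $name;
}

print leading_capture_variable('$13;'), "\n";    # prints "$13"
```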

RT 86553: hashref in function call parses as block not constructor

https://rt.cpan.org/Public/Bug/Display.html?id=86553

ppidump 'do_something({ %options });'
                    PPI::Document
                      PPI::Statement
[    1,   1,   1 ]     PPI::Token::Word         'do_something'
                        PPI::Structure::List    ( ... )
                          PPI::Statement::Compound
                            PPI::Structure::Block       { ... }
                              PPI::Statement
[    1,  16,  16 ]             PPI::Token::Symbol       '%options'
[    1,  27,  27 ]     PPI::Token::Structure    ';'

Happily, the normal Perl workaround makes it parse as expected:

ppidump 'do_something(+{ %options });'
                    PPI::Document
                      PPI::Statement
[    1,   1,   1 ]     PPI::Token::Word         'do_something'
                        PPI::Structure::List    ( ... )
                          PPI::Statement::Expression
[    1,  14,  14 ]         PPI::Token::Operator         '+'
                            PPI::Structure::Constructor         { ... }
                              PPI::Statement
[    1,  17,  17 ]             PPI::Token::Symbol       '%options'
[    1,  28,  28 ]     PPI::Token::Structure    ';'

statement of word + block doesn't recognize implicit statement end

When a statement consists of a word plus a block and is supposed to end implicitly after the block, the statement instead keeps picking up tokens until it encounters an explicit statement end. E.g.:

ppidump 'DESTROY {} sub foo {} 1; sub bar{}'
                    PPI::Document
                      PPI::Statement
[    1,   1,   1 ]     PPI::Token::Word         'DESTROY'
                        PPI::Structure::Block   { ... }
[    1,  12,  12 ]     PPI::Token::Word         'sub'
[    1,  16,  16 ]     PPI::Token::Word         'foo'
                        PPI::Structure::Block   { ... }
[    1,  23,  23 ]     PPI::Token::Number       '1'
[    1,  24,  24 ]     PPI::Token::Structure    ';'
                      PPI::Statement::Sub
[    1,  26,  26 ]     PPI::Token::Word         'sub'
[    1,  30,  30 ]     PPI::Token::Word         'bar'
                        PPI::Structure::Block   { ... }

The DESTROY+block statement is supposed to end when the block ends, but it doesn't actually end until it sees the ';' after sub foo. If we change the initial statement to be something other than word+block, the DESTROY statement ends properly and foo is recognized as a sub:

ppidump 'sub DESTROY {} sub foo {} 1; sub bar{}'
                    PPI::Document
                      PPI::Statement::Sub
[    1,   1,   1 ]     PPI::Token::Word         'sub'
[    1,   5,   5 ]     PPI::Token::Word         'DESTROY'
                        PPI::Structure::Block   { ... }
                      PPI::Statement::Sub
[    1,  16,  16 ]     PPI::Token::Word         'sub'
[    1,  20,  20 ]     PPI::Token::Word         'foo'
                        PPI::Structure::Block   { ... }
                      PPI::Statement
[    1,  27,  27 ]     PPI::Token::Number       '1'
[    1,  28,  28 ]     PPI::Token::Structure    ';'
                      PPI::Statement::Sub
[    1,  30,  30 ]     PPI::Token::Word         'sub'
[    1,  34,  34 ]     PPI::Token::Word         'bar'
                        PPI::Structure::Block   { ... }

This word+block pattern occurs for the special subs AUTOLOAD and DESTROY if you omit their optional 'sub'. Those two cases will no longer be an issue when #39 is applied for #31 .

do+block doesn't have the problem because it does not end implicitly. It's followed by 'until' or ';' or an expression.

'sub {} sub foo{}' fits the word+block+implicit end pattern, but it doesn't compile.

Are there any other naturally-occurring instances of word+block+implicit statement end?

PPI::Token::Prototype::__TOKENIZER__on_char uses capture var in undefined state

sub __TOKENIZER__on_char {
        my $class = shift;
        my $t     = shift;

        # Suck in until we find the closing bracket (or the end of line)
        my $line = substr( $t->{line}, $t->{line_cursor} );
        if ( $line =~ /^(.*?(?:\)|$))/ ) {
                $t->{token}->{content} .= $1;
                $t->{line_cursor} += length $1;
        }

        # Shortcut if end of line
        return 0 unless $1 =~ /\)$/;

        # Found the closing bracket
        $t->_finalize_token->__TOKENIZER__on_char( $t );
}

If $line does not match the regex, there will nevertheless be a regex match against whatever contents $1 had when this function was called.

I haven't come up with a failing test yet.
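
A failed match leaves $1 holding whatever the last successful match in scope put there, so one way to sidestep the problem is to capture into a lexical and test that instead. The following is only a sketch of the pattern as a stand-alone function with a hypothetical name (consume_prototype), not a patch against PPI::Token::Prototype:

```perl
use strict;
use warnings;

# Hypothetical stand-alone helper, not PPI code: take the text up to and
# including the closing ')' (or up to the end of the line) and report
# whether the closing bracket was found.
sub consume_prototype {
    my ($line) = @_;

    # In list context a failed match returns an empty list, so $chunk is
    # undef on failure instead of a stale $1 from some earlier match.
    my ($chunk) = $line =~ /^(.*?(?:\)|$))/;
    return ( '', 0 ) unless defined $chunk;

    return ( $chunk, $chunk =~ /\)$/ ? 1 : 0 );
}

my ( $text, $closed ) = consume_prototype('$$) { ... }');
print "$text $closed\n";    # prints "$$) 1"
```

The regex can only fail when the text contains an embedded newline before any ')', for example "a\nb", which may be why a failing test is hard to come by when the tokenizer feeds it one line at a time.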

cpan testing script

This should be written as an author test script:

A script that requires a minicpan to be available, unpacks all distributions in it, and runs all perl files in it through PPI, throwing a fail when errors are encountered, both to find general bugs and to find cases that trigger code thought to be dead (#9). It could additionally also collect statistics of file size and file parsing time in order to find cases where PPI performs badly (#5).

RT 75921: match on implicit $_ after map/grep not recognized

https://rt.cpan.org/Public/Bug/Display.html?id=75921

PPI 1.215 does not recognize a match against implicit $_ that follows map/grep if the match does not include 'm':

ppidump 'map { 0 } /z/'
                    PPI::Document
                      PPI::Statement
[    1,   1,   1 ]     PPI::Token::Word         'map'
                        PPI::Structure::Block   { ... }
                          PPI::Statement
[    1,   7,   7 ]         PPI::Token::Number   '0'
[    1,  11,  11 ]     PPI::Token::Operator     '/'
[    1,  12,  12 ]     PPI::Token::Word         'z'
[    1,  13,  13 ]     PPI::Token::Operator     '/'

But with more information like 'm', it's fine:

ppidump 'map { 0 } m/z/'
                    PPI::Document
                      PPI::Statement
[    1,   1,   1 ]     PPI::Token::Word         'map'
                        PPI::Structure::Block   { ... }
                          PPI::Statement
[    1,   7,   7 ]         PPI::Token::Number   '0'
[    1,  11,  11 ]     PPI::Token::Regexp::Match        'm/z/'

contributor access to the repo for @moregan?

Adam, @moregan is doing excellent work in hunting down bugs and writing both tests and fixes. Especially the ability to manage the issues on this repo would be a great boon for him. Could you grant him access to the repo, on the condition that we proceed as before and only merge branches after they've been reviewed?

what is a structure without braces?

Currently in the hospital, so keeping myself short.

Adding more tests I found a few places where code assumes that structure objects without braces (strictly: without ->start()) can exist. How would such an object come to be? I can't think of an initial parse that would result in such, nor does deleting the opening brace token do it.

PPI::Statement::Variable too greedy

ppidump 'open( my $fh, ">", $filename );'
                    PPI::Document
                      PPI::Statement
[    1,   1,   1 ]     PPI::Token::Word         'open'
                        PPI::Structure::List    ( ... )
                          PPI::Statement::Variable
[    1,   7,   7 ]         PPI::Token::Word     'my'
[    1,  10,  10 ]         PPI::Token::Symbol   '$fh'
[    1,  13,  13 ]         PPI::Token::Operator         ','
[    1,  15,  15 ]         PPI::Token::Quote::Double    '">"'
[    1,  18,  18 ]         PPI::Token::Operator         ','
[    1,  20,  20 ]         PPI::Token::Symbol   '$filename'
[    1,  31,  31 ]     PPI::Token::Structure    ';'

variables() on the PPI::Statement::Variable returns just '$fh', which seems right to me, but it doesn't seem right that anything following '$fh' is part of the statement.

It also doesn't seem right that initializers for declared variables become part of the PPI::Statement::Variable:

ppidump 'my $x = 1;';
                    PPI::Document
                      PPI::Statement::Variable
[    1,   1,   1 ]     PPI::Token::Word         'my'
[    1,   4,   4 ]     PPI::Token::Symbol       '$x'
[    1,   7,   7 ]     PPI::Token::Operator     '='
[    1,   9,   9 ]     PPI::Token::Number       '1'
[    1,  10,  10 ]     PPI::Token::Structure    ';'

The absence of a facility like initializers() in PPI::Statement::Variable implies (to me) that including the initializer is not by design.

In playing around with Lexer.pm I found it pretty easy to have a variable declaration without parens stop after it sees the variable:

+               if ( $Statement->isa('PPI::Statement::Variable') ) {
+                       my @schildren = $Statement->schildren();
+                       if ( @schildren > 1 and !$schildren[1]->isa('PPI::Structure::List') ) {
+                               return $self->_rollback( $Token );
+                       }
+               }

but from the results it looks like that change is too naive:

ppidump 'open( my $fh, ">", $filename );'
                    PPI::Document
                      PPI::Statement
[    1,   1,   1 ]     PPI::Token::Word         'open'
                        PPI::Structure::List    ( ... )
                          PPI::Statement::Variable
[    1,   7,   7 ]         PPI::Token::Word     'my'
[    1,  10,  10 ]         PPI::Token::Symbol   '$fh'
                          PPI::Statement::Expression
[    1,  13,  13 ]         PPI::Token::Operator         ','
[    1,  15,  15 ]         PPI::Token::Quote::Double    '">"'
[    1,  18,  18 ]         PPI::Token::Operator         ','
[    1,  20,  20 ]         PPI::Token::Symbol   '$filename'
[    1,  31,  31 ]     PPI::Token::Structure    ';'

I don't know enough about the lexing/parsing to know whether it's just a case of needing a little more logic at statement end/statement begin, whether it's a fundamental problem of a variable declaration being an expression, or what.

x operator not recognized in '$a x3'

ppidump '$a x3'
                    PPI::Document
                      PPI::Statement
[    1,   1,   1 ]     PPI::Token::Symbol       '$a'
[    1,   4,   4 ]     PPI::Token::Word         'x3'

The 'x' should be recognized as the x operator, since Perl does:

perl -WE '(1)x3'
Useless use of repeat (x) in void context at -e line 1.

nonsensical code in Whitespace->__TOKENIZER__on_char

There is a bit there that seems to try to determine whether a character outside of the ASCII range is word or whitespace, however instead of actually looking at the current character it looks at the stringified tokenizer, which is just a perl address. I'm unclear on whether the tokenizer is supposed to stringify, or whether this was just a piece where the meaning of $t changed without the code adapting. Anything but the obvious change to chr($char) you'd like done here?

RT 30037: minus operator turns function name into two words

https://rt.cpan.org/Ticket/Display.html?id=30037
With PPI 1.215:

ppidump '$a=-xx::cc()'
                    PPI::Document
                      PPI::Statement
[    1,   1,   1 ]     PPI::Token::Symbol       '$a'
[    1,   3,   3 ]     PPI::Token::Operator     '='
[    1,   4,   4 ]     PPI::Token::Word         '-xx'
[    1,   7,   7 ]     PPI::Token::Word         '::cc'
                        PPI::Structure::List    ( ... )

Without the minus you get what you'd expect:

ppidump '$a=xx::cc()'
                    PPI::Document
                      PPI::Statement
[    1,   1,   1 ]     PPI::Token::Symbol       '$a'
[    1,   3,   3 ]     PPI::Token::Operator     '='
[    1,   4,   4 ]     PPI::Token::Word         'xx::cc'
                        PPI::Structure::List    ( ... )

See also https://rt.cpan.org/Public/Bug/Display.html?id=55749, which has a lot of analysis and sample Perl code

RT 27364: DESTROY and AUTOLOAD don't parse as subs without 'sub'

Jeff Thalhammer points out that Perl allows you to omit 'sub' from DESTROY and AUTOLOAD:

moregan@moregan[~]$ perl -WE 'AUTOLOAD {1;}'
moregan@moregan[~]$ perl -WE 'package x; DESTROY {1;}'
moregan@moregan[~]$ 

but PPI doesn't recognize them as subs unless 'sub' is included:

ppidump 'AUTOLOAD {;}'
                    PPI::Document
                      PPI::Statement
[    1,   1,   1 ]     PPI::Token::Word     'AUTOLOAD'
                        PPI::Structure::Block   { ... }
                          PPI::Statement::Null
[    1,  11,  11 ]         PPI::Token::Structure    ';'

ppidump 'sub AUTOLOAD {;}'
                    PPI::Document
                      PPI::Statement::Sub
[    1,   1,   1 ]     PPI::Token::Word     'sub'
[    1,   5,   5 ]     PPI::Token::Word     'AUTOLOAD'
                        PPI::Structure::Block   { ... }
                          PPI::Statement::Null
[    1,  15,  15 ]         PPI::Token::Structure    ';'

https://rt.cpan.org/Public/Bug/Display.html?id=27364

extract generated test scripts?

I just realized that a whole bunch of test scripts that i need to change are generated from POD at runtime. Since having code in comments is a terrible idea, i'd like to extract them and put them into scripts permanently. Any particular opposition to this?

Many package names that are also keywords misparsed

PPI 1.215 and 1.216_01

Some package names that are also keywords don't parse as Word:

ppidump 'package x;'
                    PPI::Document
                      PPI::Statement::Package
[    1,   1,   1 ]     PPI::Token::Word         'package'
[    1,   9,   9 ]     PPI::Token::Operator     'x'
[    1,  10,  10 ]     PPI::Token::Structure    ';'

bless, return, and scalar as package names parse as Word, but they force the following curly braces to be a hash constructor instead of a block:

ppidump 'package scalar {}'
                    PPI::Document
                      PPI::Statement::Package
[    1,   1,   1 ]     PPI::Token::Word         'package'
[    1,   9,   9 ]     PPI::Token::Word         'scalar'
                        PPI::Structure::Constructor     { ... }

Comments still direct people to rt.cpan.org

ack rt.cpan.org
lib/PPI.pm
762:L<http://rt.cpan.org/NoAuth/ReportBug.html?Queue=PPI>

lib/PPI/Structure/List.pm
34:Got any ideas for methods? Submit a report to rt.cpan.org!

lib/PPI/Structure/Subscript.pm
37:Got any ideas for methods? Submit a report to rt.cpan.org!

lib/PPI/Structure/Given.pm
32:Got any ideas for methods? Submit a report to rt.cpan.org!

lib/PPI/Structure/For.pm
32:Got any ideas for methods? Submit a report to rt.cpan.org!

lib/PPI/Structure/When.pm
32:Got any ideas for methods? Submit a report to rt.cpan.org!

lib/PPI/Structure/Unknown.pm
38:Got any ideas for methods? Submit a report to rt.cpan.org!

lib/PPI/Structure/Constructor.pm
31:Got any ideas for methods? Submit a report to rt.cpan.org!

lib/PPI/Structure/Block.pm
41:Got any ideas for methods? Submit a report to rt.cpan.org!

lib/PPI/Structure/Condition.pm
36:Got any ideas for methods? Submit a report to rt.cpan.org!

lib/PPI/Token/QuoteLike/Command.pm
27:Got any ideas for methods? Submit a report to rt.cpan.org!

lib/PPI/Token/QuoteLike/Readline.pm
36:Got any ideas for methods? Submit a report to rt.cpan.org!

lib/PPI/Token/QuoteLike/Backtick.pm
27:Got any ideas for methods? Submit a report to rt.cpan.org!

lib/PPI/Token/Cast.pm
30:Got any ideas for methods? Submit a report to rt.cpan.org!

lib/PPI/Token/Regexp/Transliterate.pm
35:Got any ideas for methods? Submit a report to rt.cpan.org!

lib/PPI/Token/Regexp/Match.pm
41:Got any ideas for methods? Submit a report to rt.cpan.org!

lib/PPI/Token/Regexp/Substitute.pm
31:Got any ideas for methods? Submit a report to rt.cpan.org!

lib/PPI/Token/Pod.pm
25:Got any ideas for more methods? Submit a report to rt.cpan.org!

lib/PPI/Token/ArrayIndex.pm
25:Got any ideas for methods? Submit a report to rt.cpan.org!

lib/PPI/Token/Attribute.pm
31:Got any ideas for methods? Submit a report to rt.cpan.org!

lib/PPI/Token/Quote/Interpolate.pm
27:Got any ideas for methods? Submit a report to rt.cpan.org!

lib/PPI/Token/Quote/Literal.pm
27:Got any ideas for methods? Submit a report to rt.cpan.org!

lib/PPI/Token/Quote/Double.pm
30:Got any ideas for methods? Submit a report to rt.cpan.org!

lib/PPI/Token/Quote/Single.pm
33:Got any ideas for methods? Submit a report to rt.cpan.org!

lib/PPI/Token/Label.pm
27:Got any ideas for methods? Submit a report to rt.cpan.org!

lib/PPI/Token/Operator.pm
40:Got any ideas for methods? Submit a report to rt.cpan.org!

lib/PPI/Statement/Given.pm
32:Got any ideas for methods? Submit a report to rt.cpan.org!

lib/PPI/Statement/When.pm
40:Got any ideas for methods? Submit a report to rt.cpan.org!

t/08_regression.t
45:# Regression Test for rt.cpan.org #11522
132:# rt.cpan.org: Ticket #16671 $_ is not localized

inc/Module/Install/Metadata.pm
560:     https?\Q://rt.cpan.org/\E[^>]+|

Pod below __END__ not parsed if __DATA__ section is present

Pod below __END__ is not recognized if a __DATA__ section is present.
See #15 for additional info.

use strict;
use warnings;
use feature 'say';
use PPI::Document;

=head1 Test
This is a piece of POD.
=head2 Sub
This is also a piece of POD.
=cut

my $content = '';
{
    local $/;    # slurp mode: read the whole file at once
    open my $fh, '<', $0 or die "Cannot open $0: $!";
    $content = <$fh>;
    close $fh;
}

my $doc = PPI::Document->new(\$content);
my $pod = PPI::Token::Pod->merge(@{$doc->find('PPI::Token::Pod')});
say $pod;

__DATA__
# some data here
__END__

=head1 More
This should also be a piece of POD. Should it?
=cut

This test script yields:

=pod

=head1 Test
This is a piece of POD.
=head2 Sub
This is also a piece of POD.

=cut

Expected output:

=pod

=head1 Test
This is a piece of POD.
=head2 Sub
This is also a piece of POD.

=head1 More
This should also be a piece of POD. Should it?

=cut

Caching ->isa()

Has anyone looked into caching ->isa() at the PPI::Element level?

The number of calls to ->isa() in a "make nytprof" run of Perl::Critic is staggering, and I have to think that at least some of those are redundant. Maybe it would be a win if each call to ->isa( 'whatever' ) would cache the result of that lookup.

I can go poking at this, but didn't want to waste my time if this was already considered and rejected.
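
As a starting point for that experiment, here is a minimal sketch of per-class memoization. The package My::Element and the method cached_isa are hypothetical names, not part of PPI. The cache is keyed by the object's class, so it assumes @ISA does not change after the first lookup. Perl already caches method resolution internally, so whether this is a net win would have to be measured.

```perl
use strict;
use warnings;

package My::Element;    # hypothetical stand-in for PPI::Element

my %ISA_CACHE;          # { class of object }{ class asked about } => 0 or 1

sub new { my ($class) = @_; return bless {}, $class }

# Memoized wrapper around the built-in ->isa(). Assumes @ISA is static.
sub cached_isa {
    my ($self, $class) = @_;
    my $own = ref($self) || $self;
    return $ISA_CACHE{$own}{$class} //= ( $self->isa($class) ? 1 : 0 );
}

package My::Token;      # hypothetical subclass
our @ISA = ('My::Element');

package main;

my $token = My::Token->new;
print $token->cached_isa('My::Element') ? "yes\n" : "no\n";
print $token->cached_isa('My::Other')   ? "yes\n" : "no\n";
```

Negative results are stored as 0 rather than undef so that they are cached too; with undef, the //= would repeat the lookup on every call.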

content is read as octets, not characters, with no concept of decoding

new($filename) reads the file as bytes, with no encoding layers, so any content that isn't Latin1 will cause issues. new(\$content) is read exactly the same way.

Files need to be decoded, so an encoding parameter is needed. Content strings either need to be decoded similarly, or be passed as already-decoded characters.

I would suggest new APIs that don't conflict with the existing ones, to try to preserve backcompat as much as possible.
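
To make the octets-versus-characters distinction concrete, the sketch below uses only the core Encode module and does not call PPI at all. The source string is an invented example. It shows that the same source has different lengths before and after decoding, which is why token contents and column positions can be off when non-ASCII source is parsed as raw bytes.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Perl source as it would arrive from disk: UTF-8 octets, here containing
# the two-byte sequence for U+00E9 inside a string literal.
my $octets = "my \$name = '\xC3\xA9';";

# Decoding turns the octets into characters; the two bytes become one character.
my $chars = decode('UTF-8', $octets);

printf "octets: %d, characters: %d\n", length($octets), length($chars);
```

A decoding-aware API would presumably apply this step before tokenizing; what such an API should look like is exactly the open question in this issue.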

RFC: ellipsis "..." statement parses as operator. What types would be better?

Perl 5.12 introduced the ellipsis statement, "...". Currently "..." always parses as a PPI::Token::Operator. perl5120delta.pod refers to it as an operator, but perlsyn is pretty clear that it's really a statement, making the use of Operator wrong.

It's not too bad that the ellipsis becomes a child of a simple PPI::Statement, but, given the fact that it throws, would it be more appropriate to have it be a child of PPI::Statement::Break? A new statement type altogether?

The existing token types don't seem to fit the ellipsis. Should there be a PPI::Token::Ellipsis (subclass of PPI::Token)?

comments?

POD below __DATA__ not recognized.

Pod in the __DATA__ section is valid according to the documentation. Maybe PPI ignores it on purpose (how good is good enough?), but Pod below __END__ is parsed correctly, so it probably is not intentional.

use strict;
use warnings;
use feature 'say';
use PPI::Document;

=head1 Test
This is a piece of POD.
=head2 Sub
This is also a piece of POD.
=cut

my $content = '';
{
    local $/;    # slurp mode: read the whole file at once
    open my $fh, '<', $0 or die "Cannot open $0: $!";
    $content = <$fh>;
    close $fh;
}

my $doc = PPI::Document->new(\$content);
my $pod = PPI::Token::Pod->merge(@{$doc->find('PPI::Token::Pod')});
say $pod;

__DATA__

=head1 More
This should also be a piece of POD. Should it?
=cut

This little test yields:

=pod

=head1 Test
This is a piece of POD.
=head2 Sub
This is also a piece of POD.

=cut

When it should yield:

=pod

=head1 Test
This is a piece of POD.
=head2 Sub
This is also a piece of POD.

=head1 More
This should also be a piece of POD. Should it?

=cut

PPI::Token::Prototype->prototype does not strip parens/whitespace as documented

=head2 prototype

The C<prototype> accessor returns the actual prototype pattern, stripped
of braces and any whitespace inside the pattern.

=cut

sub prototype {
        my $self  = shift;
        my $proto = $self->content;
        $proto =~ s/\(\)\s//g; # Strip brackets and whitespace
        $proto;
}

The documentation says the return value of prototype() has parentheses and internal whitespace stripped, but the stripping never happens because of the malformed regex, which was probably intended to have the parens and \s in a character class. As written, the regex only matches the literal sequence "()" followed by a whitespace character, so in practice prototype() returns the same value as content().
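
A minimal sketch of the character-class version, written as standalone functions so it runs without PPI. The function names are hypothetical, and each takes the raw token content as an argument instead of calling $self->content. The original regex is included for comparison.

```perl
use strict;
use warnings;

# Regex as it appears in the quoted accessor: matches only "()" plus one
# whitespace character, so a typical prototype is left untouched.
sub strip_prototype_original {
    my ($content) = @_;
    (my $proto = $content) =~ s/\(\)\s//g;
    return $proto;
}

# Assumed intent: remove every paren and whitespace character.
sub strip_prototype_fixed {
    my ($content) = @_;
    (my $proto = $content) =~ s/[()\s]//g;
    return $proto;
}

my $content = '( $ ; @ )';
print "original: '", strip_prototype_original($content), "'\n";
print "fixed:    '", strip_prototype_fixed($content),    "'\n";
```

With this input the original leaves the string unchanged, while the character-class version reduces it to the bare pattern $;@.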

How to deal with ambiguous parses?

Right now PPI has no clear policy for code whose meaning cannot be determined with confidence. As an example, compare these two snippets:

sub d { 1 };
my @c = 3 .. 6;
say 1 if d ~~ @c;

use v5.10.1;             # !
sub d () { 1 };          # !
my @c = 3 .. 6;
say 1 if d ~~ @c;

The ~~ in the last line is interpreted as a single operator in both cases. However, in the first case it should be two operators (d takes a list, so its argument is ~~@c), and only in the second case, where d is declared with an empty prototype, should it be parsed as the smart-match operator.

In my opinion the current handling of that code is unacceptable.

I am, however, unsure how it should be handled instead, so I am fishing for general opinions on how PPI should handle ambiguous parses.

Merging method

Can you please rebase branches before merging them, or cherry-pick their commits and then close issues, instead of using the auto-merge? There is a massive multitude of reasons for this, which can be summarized as "non-ff merge results in utterly crazy and debug-hostile history": https://github.com/adamkennedy/PPI/network

Alternatively, just tag an issue as "good to merge" and I'll deal with it.

anon hashref after operator treated as code block

From Perl-Critic/Perl-Critic#192: a hash constructor after || parses as a code block.

ppidump '0 || {b => 1, a => 1};'
                    PPI::Document
                      PPI::Statement
[    1,   1,   1 ]     PPI::Token::Number       '0'
[    1,   3,   3 ]     PPI::Token::Operator     '||'
                        PPI::Structure::Block   { ... }
                          PPI::Statement
[    1,   7,   7 ]         PPI::Token::Word     'b'
[    1,   9,   9 ]         PPI::Token::Operator         '=>'
[    1,  12,  12 ]         PPI::Token::Number   '1'
[    1,  13,  13 ]         PPI::Token::Operator         ','
[    1,  15,  15 ]         PPI::Token::Word     'a'
[    1,  17,  17 ]         PPI::Token::Operator         '=>'
[    1,  20,  20 ]         PPI::Token::Number   '1'
[    1,  22,  22 ]     PPI::Token::Structure    ';'

At least some other operators do not parse it as a block:

ppidump '0, {b => 1, a => 1};'
                    PPI::Document
                      PPI::Statement
[    1,   1,   1 ]     PPI::Token::Number       '0'
[    1,   2,   2 ]     PPI::Token::Operator     ','
                        PPI::Structure::Constructor     { ... }
                          PPI::Statement::Expression
[    1,   5,   5 ]         PPI::Token::Word     'b'
[    1,   7,   7 ]         PPI::Token::Operator         '=>'
[    1,  10,  10 ]         PPI::Token::Number   '1'
[    1,  11,  11 ]         PPI::Token::Operator         ','
[    1,  13,  13 ]         PPI::Token::Word     'a'
[    1,  15,  15 ]         PPI::Token::Operator         '=>'
[    1,  18,  18 ]         PPI::Token::Number   '1'
[    1,  20,  20 ]     PPI::Token::Structure    ';'
