ppi's Issues
RT 67831: implicit statement end not recognized for perl 5.12-style package
https://rt.cpan.org/Public/Bug/Display.html?id=67831
PPI 1.215/1.216_01 do not recognize the implicit end of statement that follows the block in a Perl 5.12 package statement:
ppidump 'package Foo {} sub bar { 1; }'
PPI::Document
PPI::Statement::Package
[ 1, 1, 1 ] PPI::Token::Word 'package'
[ 1, 9, 9 ] PPI::Token::Word 'Foo'
PPI::Structure::Block { ... }
[ 1, 16, 16 ] PPI::Token::Word 'sub'
[ 1, 20, 20 ] PPI::Token::Word 'bar'
PPI::Structure::Block { ... }
PPI::Statement
[ 1, 26, 26 ] PPI::Token::Number '1'
[ 1, 27, 27 ] PPI::Token::Structure ';'
With an explicit statement terminator, it's fine:
ppidump 'package Foo {} ; sub bar { 1; }'
PPI::Document
PPI::Statement::Package
[ 1, 1, 1 ] PPI::Token::Word 'package'
[ 1, 9, 9 ] PPI::Token::Word 'Foo'
PPI::Structure::Block { ... }
[ 1, 16, 16 ] PPI::Token::Structure ';'
PPI::Statement::Sub
[ 1, 18, 18 ] PPI::Token::Word 'sub'
[ 1, 22, 22 ] PPI::Token::Word 'bar'
PPI::Structure::Block { ... }
PPI::Statement
[ 1, 28, 28 ] PPI::Token::Number '1'
[ 1, 29, 29 ] PPI::Token::Structure ';'
The RT ticket includes a patch.
Package names beginning with 'v' plus a digit parsed as version strings
PPI 1.215 and 1.216_01
ppidump 'package v10;'
PPI::Document
PPI::Statement::Package
[ 1, 1, 1 ] PPI::Token::Word 'package'
[ 1, 9, 9 ] PPI::Token::Number::Version 'v10'
[ 1, 12, 12 ] PPI::Token::Structure ';'
perl -WE 'package v10; print __PACKAGE__'
v10
and
ppidump 'package v10g;'
PPI::Document
PPI::Statement::Package
[ 1, 1, 1 ] PPI::Token::Word 'package'
[ 1, 9, 9 ] PPI::Token::Number::Version 'v10'
[ 1, 12, 12 ] PPI::Token::Word 'g'
[ 1, 13, 13 ] PPI::Token::Structure ';'
PPI::Statement::Sub methods need testing.
There is no testing of the PPI::Statement::Sub methods name, prototype, block, forward, and reserved. Verified by inserting a 'die;' as the first line of all these methods and getting a clean test run.
There is some incidental coverage of reserved, name, and block (merely because they're used) in https://github.com/moregan/PPI/tree/AUTOLOAD-DESTROY-without-sub
[Feature Request] Parse/Token plugins
As said on IRC to @wchristian, if a rewrite is going to happen, something that would be "nice" to think about is having a provision for non-standard syntax extensions.
So that perhaps, code that knows it is about to parse Devel::Declare based code, can load a plugin that knows how to spice the syntax, and pass the plugin to PPI, and PPI can emit structures, and re-serialize back, sanely and safely.
Then maybe down the road, we could work out how to write a plugin that dynamically loads other plugins on demand based on hints in the source being parsed, and cover more of the edge cases encountered by metasyntax.
Though I don't exactly have any idea of how such a plugin would look, or how such a plugin would be passed over, just a general sense of "this would be nice and useful".
Please release the current master as 1.216_01
@adamkennedy There's more to do, but we've got a sizable amount of changes that we'd like to see chewed through by the CPAN smokers. Can you please release the current master as dev version 1.216_01?
Alternately, if you feel like handing out COMAINT, i'd happily do it myself too. :)
how is version bumping done for this dist?
@adamkennedy I scanned through Makefile and couldn't find anything that bumps the versions in the dist. Do you have some script for that?
size limit questions
Currently PPI has a hard-coded size limit on the files it is willing to parse. There are two questions here:
- the comment on that code says that big files "blow up the Tokenizer/Lexer". What does this mean? Crash, subtle errors? Too much resource use?
- I'm thinking of making the tokenizer take an option for maximum size (in addition to an env var), but the tokenizer constructor does not yet have any code for options it can take. Is there a recommended example or method, or should i just go with what seems proper?
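One way the option handling asked about above might look, as a minimal sketch: an explicit argument wins, then an environment variable, then a built-in default. The names `max_size`, `PPI_MAX_SIZE`, `resolve_max_size` and `size_ok`, as well as the default value, are hypothetical and not part of PPI's current API.

```perl
use strict;
use warnings;

# Hypothetical default for this sketch; the real hard-coded limit
# lives in the PPI sources.
use constant DEFAULT_MAX_SIZE => 1_048_576;

# Resolve the size limit: explicit option, then environment, then default.
# 'max_size' and PPI_MAX_SIZE are assumed names, not existing PPI API.
sub resolve_max_size {
    my (%opts) = @_;
    for my $candidate ( $opts{max_size}, $ENV{PPI_MAX_SIZE} ) {
        next unless defined $candidate;
        die "max_size must be a non-negative integer\n"
            unless $candidate =~ /\A[0-9]+\z/;
        return $candidate + 0;
    }
    return DEFAULT_MAX_SIZE;
}

# Returns true if the source fits within the resolved limit.
sub size_ok {
    my ( $source_ref, %opts ) = @_;
    return length($$source_ref) <= resolve_max_size(%opts);
}
```

Keeping the resolution in one small function would let the tokenizer constructor stay free of environment handling, whatever option style is eventually chosen.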
Misparse of &&= and ||=
Hi there-
I was smoking Perl::Critic with the latest PPI, and I came across this...
As of 1.216_01, PPI incorrectly parses the expression $foo ||= 0; as follows:
PPI::Document
PPI::Statement
[ 1, 1, 1 ] PPI::Token::Symbol '$foo'
[ 1, 6, 6 ] PPI::Token::Operator '||'
[ 1, 8, 8 ] PPI::Token::Operator '='
[ 1, 10, 10 ] PPI::Token::Number '0'
[ 1, 11, 11 ] PPI::Token::Structure ';'
Notice that || and = are parsed as two operators. Same goes for &&= but not //= or other types of assignment operators.
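This symptom is what a tokenizer produces when shorter operators are tried before longer ones. As a generic illustration only (this is not PPI's tokenizer code, and `split_operators` and the operator list are hypothetical), longest-match-first ordering keeps `||=` and `&&=` intact:

```perl
use strict;
use warnings;

# A small sample of operators; deliberately incomplete.
my @OPERATORS = ( '||=', '&&=', '//=', '||', '&&', '//', '=' );

# Build one alternation with the longest operators first, so that
# '||=' is tried before '||' and '='.
my $OP_RE = join '|',
    map  { quotemeta($_) }
    sort { length($b) <=> length($a) } @OPERATORS;

# Split a string consisting only of operators into individual operators.
sub split_operators {
    my ($text) = @_;
    my @found;
    pos($text) = 0;
    while ( $text =~ /\G($OP_RE)/gc ) {
        push @found, $1;
    }
    die "unrecognized operator text at offset " . pos($text) . "\n"
        if pos($text) < length $text;
    return @found;
}
```

Whether the actual cause in PPI is an ordering problem or a missing entry in its operator table is not established here.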
rt.cpan.org tickets to close
From a pass over the PPI rt.cpan.org queue:
RT tickets to close with 1.216 due to merging of moregan branches:
- 68176 and 71705 -- support all augmented assignment operators (3353672)
- RT 75039: don't allow '=CUT' to terminate POD (1bdce9b)
- RT 36540 -- support upper case in hex and binary numbers (6279fbb)
- RT 45014: parse '12.34..56.78' parsed as version string + '..' + float (b4d5644)
- RT 51693: fix pod markup containing '>' (1cafad1)
- RT 30863: spelling fix (c2d6b37)
- RT 67264: fix spelling of Tom Christiansen (dda1721)
Other RT tickets to close due to changes in 1.216 and before:
- RT 85049 -- Merge pull request #6 from dsteinbrunner/patch-1 (7b07326)
- RT 69026 Patched in #3 (45968ef)
- RT 90792 Patched in: #2 (4158513)
- RT 45471 -- appears to not be a bug, according to the submitter's remarks
- RT 36556 -- apparently fixed between Sat Jun 07 18:37:32 2008 and PPI 1.215
- RT 35829 -- in 1.215 there is no code at all in the perldoc output. Submitter was seeing either something that is now gone (submitted May 2008), or the inline tests.
RT 74527: sub v2 {} parsed as a version string
ppidump 'sub v2 {1;}'
PPI::Document
PPI::Statement
[ 1, 1, 1 ] PPI::Token::Word 'sub'
[ 1, 5, 5 ] PPI::Token::Number::Version 'v2'
PPI::Structure::Block { ... }
PPI::Statement
[ 1, 9, 9 ] PPI::Token::Number '1'
[ 1, 10, 10 ] PPI::Token::Structure ';'
even if the sub name only starts off looking like a version string:
ppidump 'sub v2go {1;}'
PPI::Document
PPI::Statement
[ 1, 1, 1 ] PPI::Token::Word 'sub'
[ 1, 5, 5 ] PPI::Token::Number::Version 'v2'
[ 1, 7, 7 ] PPI::Token::Word 'go'
PPI::Structure::Block { ... }
PPI::Statement
[ 1, 11, 11 ] PPI::Token::Number '1'
[ 1, 12, 12 ] PPI::Token::Structure ';'
After this PPI sticks everything up to the next explicit statement separator into the sub's statement, a la #31
The RT ticket includes an idea of where to fix the problem: https://rt.cpan.org/Public/Bug/Display.html?id=74527
php-style error handling to perl-style error handling?
A generic question:
Errors in PPI are handled by capturing them and stuffing them in ->errstr, effectively hiding them unless the user knows to look for them. Over the past years the perl community has converged on treating this as an anti-pattern. Are there any stringent reasons against migrating PPI to "die on failure" behavior?
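For illustration, here are the two styles side by side, using a stand-in class rather than PPI itself. `My::Parser`, its `parse` method and `parse_or_die` are hypothetical; only the `errstr` convention mirrors what the question describes.

```perl
use strict;
use warnings;

package My::Parser;

my $errstr = '';

# errstr style: return undef on failure and stash the message.
sub parse {
    my ( $class, $source ) = @_;
    $errstr = '';
    unless ( defined $source and length $source ) {
        $errstr = 'No source given';
        return undef;
    }
    return bless { source => $source }, $class;
}

sub errstr { $errstr }

# "die on failure" style, layered on top of the errstr interface.
sub parse_or_die {
    my ( $class, $source ) = @_;
    my $self = $class->parse($source);
    die $class->errstr . "\n" unless defined $self;
    return $self;
}

package main;

# With the dying variant the failure can no longer be ignored silently.
my $ok = eval { My::Parser->parse_or_die(''); 1 };
print "caught: $@" unless $ok;
```

A wrapper of this kind would allow a gradual migration: existing callers keep checking `errstr`, while new code opts in to exceptions.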
Many operators/builtins not separated from following single quote
PPI::Token::Word has code to make sure that some operators/builtins (eq ne ge le gt lt q qq qx qw qr m s tr y pack unpack) are separated from an immediately-following single quote because that's what perl does. E.g.: "$foo eq'bar'" is parsed by PPI and by perl as a symbol, the eq operator, and a single-quoted string. However, there are many words that perl separates that PPI does not, e.g. 'cmp':
ppidump "\$foo cmp'bar'"
PPI::Document
PPI::Statement
[ 1, 1, 1 ] PPI::Token::Symbol '$foo'
[ 1, 6, 6 ] PPI::Token::Word 'cmp'bar'
[ 1, 13, 13 ] PPI::Token::Quote::Single '''
There are dozens of words PPI::Token::Word doesn't handle in regen/keywords.pl of the perl sources. Presumably most (all?) of them should be handled.
See Perl-Critic/Perl-Critic#451 for a real-world example.
request for comaint :)
Mithaldu here.
As mentioned in my Email, i'd like to help PPI index better on metacpan, as well as give it love in regards to the many bug tickets it has. As such, i would like to have comaint. :)
Can't load file written in perl string normally
Hi.
If the file to be loaded contains a perl string, like
use utf8;
my $hash = { 東京 => 'tokyo' };
then the result of PPI::Document::File->new($file) is undef, which means it cannot handle perl strings correctly (if I remember correctly, the perl implementation allows a bareword written as a perl string to be used as a hash key).
So I wrote a patch that adds a perl string option to the constructor.
moznion@757f382
However, I think a better way probably exists.
How do you feel?
logic for detection of labeled statements in Lexer::_add_element needs to be tested
Right now it checks that $Parent->schild(1) is false, but then it goes on to request and use $second->content, which seems broken. Flipping the comparison doesn't cause any tests to fail so i assume this is untested.
Tried to dig through commit history, but it ends at "cvs import" before any code changes to that segment are made.
Should AUTOLOAD, DESTROY, et al. tokenize as PPI::Statement::Scheduled ?
In https://rt.cpan.org/Public/Bug/Display.html?id=27364 Jeffrey Thalhammer suggests that AUTOLOAD should yield a PPI::Statement::Scheduled, not a mere PPI::Statement::Sub.
DESTROY is similar to AUTOLOAD. They are both special methods called by Perl. They are subs even when, as Perl allows, "sub" is omitted. They are not quite as special as BEGIN, END, etc. which are code blocks and can be repeated. threads.pm calls the CLONE method (from 5.7.3) and CLONE_SKIP method (from 5.8.7).
Is PPI::Statement::Scheduled reserved only for the five special blocks (BEGIN, UNITCHECK, CHECK, INIT, END) "intended to be run at a specific time during the loading process." as the documentation says, or should it apply to special functions Perl will call in general? If the latter, would the tie methods count? Anything else?
dead code?
I'm currently writing tests to improve the coverage of the test suite, and am finding what looks like dead code. After about 8 hours altogether of trying to find code that will trigger this if condition i haven't found any, and it looks strongly like the handling of this circumstance has been moved to the commit function. Most notably i see this because the only way to satisfy the if condition is to have a single : recognized as a sub attribute, followed by triggering of Word->__TOKENIZER__on_char, however the execution path seems to only be able to lead into Whitespace->__TOKENIZER__on_char, which can only lead to Word->__TOKENIZER__commit.
If you know of any string that can trigger this code, please let me know so i can add the test, otherwise i regard this as dead code that can be removed.
x sometimes parsing as operator not word
PPI 1.215 is parsing some instances of 'x' as an operator rather than a word:
ppidump '1=>x'
PPI::Document
PPI::Statement
[ 1, 1, 1 ] PPI::Token::Number '1'
[ 1, 2, 2 ] PPI::Token::Operator '=>'
[ 1, 4, 4 ] PPI::Token::Operator 'x'
ppidump '%hash=(1=>x)'
PPI::Document
PPI::Statement
[ 1, 1, 1 ] PPI::Token::Symbol '%hash'
[ 1, 6, 6 ] PPI::Token::Operator '='
PPI::Structure::List ( ... )
PPI::Statement::Expression
[ 1, 8, 8 ] PPI::Token::Number '1'
[ 1, 9, 9 ] PPI::Token::Operator '=>'
[ 1, 11, 11 ] PPI::Token::Operator 'x'
perl (5.8.8 and 5.18.1) parses the x as a word:
perl -We "my %hash=(1=>x);"
Unquoted string "x" may clash with future reserved word at -e line 1.
Curiously, xor gets different treatment from perl:
perl -WE "my %hash=(1=>xor);"
syntax error at -e line 1, near "xor)"
Execution of -e aborted due to compilation errors.
RT 75038: PPI::Token::Number::Version tokens cut off at first underscore
https://rt.cpan.org/Public/Bug/Display.html?id=75038
An example from the PPI::Token::Number::Version documentation (PPI 1.215):
ppidump '10_000.10_000.10_000'
PPI::Document
PPI::Statement
[ 1, 1, 1 ] PPI::Token::Number::Float '10_000.10_000'
[ 1, 14, 14 ] PPI::Token::Number::Float '.10_000'
alternately:
ppidump 'v10_000.10_000.10_000'
PPI::Document
PPI::Statement
[ 1, 1, 1 ] PPI::Token::Number::Version 'v10'
[ 1, 4, 4 ] PPI::Token::Word '_000'
[ 1, 8, 8 ] PPI::Token::Number::Float '.10_000'
[ 1, 15, 15 ] PPI::Token::Number::Float '.10_000'
whereas
ppidump '10000.10000.10000'
PPI::Document
PPI::Statement
[ 1, 1, 1 ] PPI::Token::Number::Version '10000.10000.10000'
ppidump 'v10000.10000.10000'
PPI::Document
PPI::Statement
[ 1, 1, 1 ] PPI::Token::Number::Version 'v10000.10000.10000'
better api for encoded input
Ether summarized this in IRC just now and it seems perfectly fine and i'll need to do it asap.
14-01-22@02:41:35 (@ether) Mithaldu: basically, PPI needs separate interfaces for new_from_file, new_from_handle, and new_from_content -- for the first, you need a separate encoding argument; for the second, you should force the caller to apply the right layers to the $fh in advance; for new_from_content, presume characters (decoded)
-1 parsing as number rather than operator and 1
Perl-Critic/Perl-Critic#500 is not happy with this "-1" parsing as a number rather than the operator '-' followed by the number 1:
ppidump '(1)-1'
PPI::Document
PPI::Statement
PPI::Structure::List ( ... )
PPI::Statement::Expression
[ 1, 2, 2 ] PPI::Token::Number '1'
[ 1, 4, 4 ] PPI::Token::Number '-1'
However it's a different story with '+':
ppidump '(1)+1'
PPI::Document
PPI::Statement
PPI::Structure::List ( ... )
PPI::Statement::Expression
[ 1, 2, 2 ] PPI::Token::Number '1'
[ 1, 4, 4 ] PPI::Token::Operator '+'
[ 1, 5, 5 ] PPI::Token::Number '1'
Capture variables above $9 misparsed
Perl-Critic/Perl-Critic#455
https://rt.cpan.org/Public/Bug/Display.html?id=72980
PPI does not recognize that the numbered capture variables can go higher than $9. E.g. for:
$_ = 'xxxxxxxxxxxxxxxxxxxx';
/(.)(.)(.)(.)(.)(.)(.)(.)(.)(.)(.)(.)(.)/;
my $x = $13;
print $x;
ppidump 'my $x = $13;'
PPI::Document
PPI::Statement::Variable
[ 1, 1, 1 ] PPI::Token::Word 'my'
[ 1, 4, 4 ] PPI::Token::Symbol '$x'
[ 1, 7, 7 ] PPI::Token::Operator '='
[ 1, 9, 9 ] PPI::Token::Magic '$1'
[ 1, 11, 11 ] PPI::Token::Number '3'
[ 1, 12, 12 ] PPI::Token::Structure ';'
RT 36384: PPI won't parse source containing NUL
https://rt.cpan.org/Public/Bug/Display.html?id=36384
PPI 1.215/1.216_01:
perl -WE 'open( my $fh, ">", "contains_nul.pl"); print $fh "my \$a; \0 my \$b; print 1;";'
xxd -g1 contains_nul.pl
0000000: 6d 79 20 24 61 3b 20 00 20 6d 79 20 24 62 3b 20 my $a; . my $b;
0000010: 70 72 69 6e 74 20 31 3b print 1;
perl contains_nul.pl
1
ppidump contains_nul.pl
Could not parse code: Encountered unexpected character '0'
RT 41170: When token before ":" in a ternary expression is a bareword, it's misparsed as a label
Still an issue in 1.215:
ppidump '$foo = $condition ? undef : 1;'
PPI::Document
PPI::Statement
[ 1, 1, 1 ] PPI::Token::Symbol '$foo'
[ 1, 6, 6 ] PPI::Token::Operator '='
[ 1, 8, 8 ] PPI::Token::Symbol '$condition'
[ 1, 19, 19 ] PPI::Token::Operator '?'
[ 1, 21, 21 ] PPI::Token::Label 'undef :'
[ 1, 29, 29 ] PPI::Token::Number '1'
[ 1, 30, 30 ] PPI::Token::Structure ';'
most of the tests check whether parsing completed, but don't diag errors that resulted if parsing failed
RT 86553: hashref in function call parses as block not constructor
https://rt.cpan.org/Public/Bug/Display.html?id=86553
ppidump 'do_something({ %options });'
PPI::Document
PPI::Statement
[ 1, 1, 1 ] PPI::Token::Word 'do_something'
PPI::Structure::List ( ... )
PPI::Statement::Compound
PPI::Structure::Block { ... }
PPI::Statement
[ 1, 16, 16 ] PPI::Token::Symbol '%options'
[ 1, 27, 27 ] PPI::Token::Structure ';'
Happily, the normal Perl workaround makes it parse as expected:
ppidump 'do_something(+{ %options });'
PPI::Document
PPI::Statement
[ 1, 1, 1 ] PPI::Token::Word 'do_something'
PPI::Structure::List ( ... )
PPI::Statement::Expression
[ 1, 14, 14 ] PPI::Token::Operator '+'
PPI::Structure::Constructor { ... }
PPI::Statement
[ 1, 17, 17 ] PPI::Token::Symbol '%options'
[ 1, 28, 28 ] PPI::Token::Structure ';'
statement of word + block doesn't recognize implicit statement end
When a statement consists of a word plus a block and is supposed to end implicitly after the block, the statement instead keeps picking up tokens until it encounters an explicit statement end. E.g.:
ppidump 'DESTROY {} sub foo {} 1; sub bar{}'
PPI::Document
PPI::Statement
[ 1, 1, 1 ] PPI::Token::Word 'DESTROY'
PPI::Structure::Block { ... }
[ 1, 12, 12 ] PPI::Token::Word 'sub'
[ 1, 16, 16 ] PPI::Token::Word 'foo'
PPI::Structure::Block { ... }
[ 1, 23, 23 ] PPI::Token::Number '1'
[ 1, 24, 24 ] PPI::Token::Structure ';'
PPI::Statement::Sub
[ 1, 26, 26 ] PPI::Token::Word 'sub'
[ 1, 30, 30 ] PPI::Token::Word 'bar'
PPI::Structure::Block { ... }
The DESTROY+block statement is supposed to end when the block ends, but it doesn't actually end until it sees the ';' after sub foo. If we change the initial statement to be something other than word+block, the DESTROY statement ends properly and foo is recognized as a sub:
ppidump 'sub DESTROY {} sub foo {} 1; sub bar{}'
PPI::Document
PPI::Statement::Sub
[ 1, 1, 1 ] PPI::Token::Word 'sub'
[ 1, 5, 5 ] PPI::Token::Word 'DESTROY'
PPI::Structure::Block { ... }
PPI::Statement::Sub
[ 1, 16, 16 ] PPI::Token::Word 'sub'
[ 1, 20, 20 ] PPI::Token::Word 'foo'
PPI::Structure::Block { ... }
PPI::Statement
[ 1, 27, 27 ] PPI::Token::Number '1'
[ 1, 28, 28 ] PPI::Token::Structure ';'
PPI::Statement::Sub
[ 1, 30, 30 ] PPI::Token::Word 'sub'
[ 1, 34, 34 ] PPI::Token::Word 'bar'
PPI::Structure::Block { ... }
This word+block pattern occurs for the special subs AUTOLOAD and DESTROY if you omit their optional 'sub'. Those two cases will no longer be an issue when #39 is applied for #31.
do+block doesn't have the problem because it does not end implicitly. It's followed by 'until' or ';' or an expression.
'sub {} sub foo{}' fits the word+block+implicit end pattern, but it doesn't compile.
Are there any other naturally-occurring instances of word+block+implicit statement end?
PPI::Token::Prototype::__TOKENIZER__on_char uses capture var in undefined state
sub __TOKENIZER__on_char {
my $class = shift;
my $t = shift;
# Suck in until we find the closing bracket (or the end of line)
my $line = substr( $t->{line}, $t->{line_cursor} );
if ( $line =~ /^(.*?(?:\)|$))/ ) {
$t->{token}->{content} .= $1;
$t->{line_cursor} += length $1;
}
# Shortcut if end of line
return 0 unless $1 =~ /\)$/;
# Found the closing bracket
$t->_finalize_token->__TOKENIZER__on_char( $t );
}
If $line does not match the regex, there will nevertheless be a regex match against whatever contents $1 had when this function was called.
I haven't come up with a failing test yet.
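A defensive variant of the matching step, as a sketch only: the captured text is copied into a lexical inside the successful-match branch, so a failed match can never fall through to a stale `$1`. The helper names `consume_prototype_text` and `prototype_closed` are hypothetical and stand apart from the tokenizer's real interface.

```perl
use strict;
use warnings;

# Return the text up to and including the closing ')' (or up to the end
# of the line), or undef if the pattern does not match at all.
sub consume_prototype_text {
    my ($line) = @_;
    return undef unless $line =~ /^(.*?(?:\)|$))/;
    return $1;
}

# True only if something was consumed and it ends with ')'.
sub prototype_closed {
    my ($line) = @_;
    my $consumed = consume_prototype_text($line);
    return ( defined $consumed and $consumed =~ /\)$/ ) ? 1 : 0;
}
```

With this shape, a string such as "a\nb\n" (an embedded newline, which the pattern cannot cross) yields undef instead of reusing an earlier capture.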
cpan testing script
This should be written as an author test script:
A script that requires a minicpan to be available, unpacks all distributions in it, and runs all perl files in it through PPI, throwing a fail when errors are encountered, both to find general bugs and to find cases that trigger code thought to be dead (#9). It could additionally also collect statistics of file size and file parsing time in order to find cases where PPI performs badly (#5).
RT 75921: match on implicit $_ after map/grep not recognized
https://rt.cpan.org/Public/Bug/Display.html?id=75921
PPI 1.215 does not recognize a match against implicit $_ that follows map/grep if the match does not include 'm':
ppidump 'map { 0 } /z/'
PPI::Document
PPI::Statement
[ 1, 1, 1 ] PPI::Token::Word 'map'
PPI::Structure::Block { ... }
PPI::Statement
[ 1, 7, 7 ] PPI::Token::Number '0'
[ 1, 11, 11 ] PPI::Token::Operator '/'
[ 1, 12, 12 ] PPI::Token::Word 'z'
[ 1, 13, 13 ] PPI::Token::Operator '/'
But with more information like 'm', it's fine:
ppidump 'map { 0 } m/z/'
PPI::Document
PPI::Statement
[ 1, 1, 1 ] PPI::Token::Word 'map'
PPI::Structure::Block { ... }
PPI::Statement
[ 1, 7, 7 ] PPI::Token::Number '0'
[ 1, 11, 11 ] PPI::Token::Regexp::Match 'm/z/'
contributor access to the repo for @moregan?
Adam, @moregan is doing excellent work in hunting down bugs and writing both tests and fixes. Especially the ability to manage the issues on this repo would be a great boon for him. Could you grant him access to the repo, on the condition that we proceed as before and only merge branches after they've been reviewed?
what is a structure without braces?
Currently in the hospital, so keeping myself short.
Adding more tests I found a few places where code assumes that structure objects without braces (strictly: without ->start()) can exist. How would such an object come to be? I can't think of an initial parse that would result in such, nor does deleting the opening brace token do it.
PPI::Statement::Variable too greedy
ppidump 'open( my $fh, ">", $filename );'
PPI::Document
PPI::Statement
[ 1, 1, 1 ] PPI::Token::Word 'open'
PPI::Structure::List ( ... )
PPI::Statement::Variable
[ 1, 7, 7 ] PPI::Token::Word 'my'
[ 1, 10, 10 ] PPI::Token::Symbol '$fh'
[ 1, 13, 13 ] PPI::Token::Operator ','
[ 1, 15, 15 ] PPI::Token::Quote::Double '">"'
[ 1, 18, 18 ] PPI::Token::Operator ','
[ 1, 20, 20 ] PPI::Token::Symbol '$filename'
[ 1, 31, 31 ] PPI::Token::Structure ';'
variables() on the PPI::Statement::Variable returns just '$fh', which seems right to me, but it doesn't seem right that anything following '$fh' is part of the statement.
It also doesn't seem right that initializers for declared variables become part of the PPI::Statement::Variable:
ppidump 'my $x = 1;';
PPI::Document
PPI::Statement::Variable
[ 1, 1, 1 ] PPI::Token::Word 'my'
[ 1, 4, 4 ] PPI::Token::Symbol '$x'
[ 1, 7, 7 ] PPI::Token::Operator '='
[ 1, 9, 9 ] PPI::Token::Number '1'
[ 1, 10, 10 ] PPI::Token::Structure ';'
The absence of a facility like initializers() in PPI::Statement::Variables implies (to me) that including the initializer is not by design.
In playing around with Lexer.pm I found it pretty easy to have a variable declaration without parens stop after it sees the variable:
+ if ( $Statement->isa('PPI::Statement::Variable') ) {
+ my @schildren = $Statement->schildren();
+ if ( @schildren > 1 and !$schildren[1]->isa('PPI::Structure::List') ) {
+ return $self->_rollback( $Token );
+ }
+ }
but from the results it looks like that change is too naive:
ppidump 'open( my $fh, ">", $filename );'
PPI::Document
PPI::Statement
[ 1, 1, 1 ] PPI::Token::Word 'open'
PPI::Structure::List ( ... )
PPI::Statement::Variable
[ 1, 7, 7 ] PPI::Token::Word 'my'
[ 1, 10, 10 ] PPI::Token::Symbol '$fh'
PPI::Statement::Expression
[ 1, 13, 13 ] PPI::Token::Operator ','
[ 1, 15, 15 ] PPI::Token::Quote::Double '">"'
[ 1, 18, 18 ] PPI::Token::Operator ','
[ 1, 20, 20 ] PPI::Token::Symbol '$filename'
[ 1, 31, 31 ] PPI::Token::Structure ';'
I don't know enough about the lexing/parsing to know whether it's just a case of needing a little more logic at statement end/statement begin, whether it's a fundamental problem of a variable declaration being an expression, or what.
x operator not recognized in '$a x3'
ppidump '$a x3'
PPI::Document
PPI::Statement
[ 1, 1, 1 ] PPI::Token::Symbol '$a'
[ 1, 4, 4 ] PPI::Token::Word 'x3'
The 'x' should be recognized as the x operator, since Perl does:
perl -WE '(1)x3'
Useless use of repeat (x) in void context at -e line 1.
nonsensical code in Whitespace->__TOKENIZER__on_char
There is a bit there that seems to try to determine whether a character outside of the ASCII range is word or whitespace, however instead of actually looking at the current character it looks at the stringified tokenizer, which is just a perl address. I'm unclear on whether the tokenizer is supposed to stringify, or whether this was just a piece where the meaning of $t changed without the code adapting. Anything but the obvious change to chr($char) you'd like done here?
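A sketch of what classifying by the character itself could look like, assuming the value at hand is a code point as the `chr($char)` remark suggests. `classify_char` is a hypothetical helper, not the tokenizer's code, and the `/u` flag needs perl 5.14 or later.

```perl
use strict;
use warnings;

# Classify a character by its code point rather than by any other value.
# The /u flag forces Unicode rules for \w and \s, so results do not
# depend on how the string happens to be stored internally.
sub classify_char {
    my ($ord) = @_;
    my $char = chr $ord;
    return 'word'       if $char =~ /\w/u;
    return 'whitespace' if $char =~ /\s/u;
    return 'other';
}
```

On older perls the `/u` flag is unavailable, so a real change would need a different way to get consistent semantics for code points 128 to 255.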
RT 30037: minus operator turns function name into two words
https://rt.cpan.org/Ticket/Display.html?id=30037
With PPI 1.215:
ppidump '$a=-xx::cc()'
PPI::Document
PPI::Statement
[ 1, 1, 1 ] PPI::Token::Symbol '$a'
[ 1, 3, 3 ] PPI::Token::Operator '='
[ 1, 4, 4 ] PPI::Token::Word '-xx'
[ 1, 7, 7 ] PPI::Token::Word '::cc'
PPI::Structure::List ( ... )
Without the minus you get what you'd expect:
ppidump '$a=xx::cc()'
PPI::Document
PPI::Statement
[ 1, 1, 1 ] PPI::Token::Symbol '$a'
[ 1, 3, 3 ] PPI::Token::Operator '='
[ 1, 4, 4 ] PPI::Token::Word 'xx::cc'
PPI::Structure::List ( ... )
See also https://rt.cpan.org/Public/Bug/Display.html?id=55749, which has a lot of analysis and sample Perl code
RT 27364: DESTROY and AUTOLOAD don't parse as subs without 'sub'
Jeff Thalhammer points out that Perl allows you to omit 'sub' from DESTROY and AUTOLOAD:
moregan@moregan[~]$ perl -WE 'AUTOLOAD {1;}'
moregan@moregan[~]$ perl -WE 'package x; DESTROY {1;}'
moregan@moregan[~]$
but PPI doesn't recognize them as subs unless 'sub' is included:
ppidump 'AUTOLOAD {;}'
PPI::Document
PPI::Statement
[ 1, 1, 1 ] PPI::Token::Word 'AUTOLOAD'
PPI::Structure::Block { ... }
PPI::Statement::Null
[ 1, 11, 11 ] PPI::Token::Structure ';'
ppidump 'sub AUTOLOAD {;}'
PPI::Document
PPI::Statement::Sub
[ 1, 1, 1 ] PPI::Token::Word 'sub'
[ 1, 5, 5 ] PPI::Token::Word 'AUTOLOAD'
PPI::Structure::Block { ... }
PPI::Statement::Null
[ 1, 15, 15 ] PPI::Token::Structure ';'
extract generated test scripts?
I just realized that a whole bunch of test scripts that i need to change are generated from POD at runtime. Since having code in comments is a terrible idea, i'd like to extract them and put them into scripts permanently. Any particular opposition to this?
Many package names that are also keywords misparsed
PPI 1.215 and 1.216_01
Some package names that are also keywords don't parse as Word:
ppidump 'package x;'
PPI::Document
PPI::Statement::Package
[ 1, 1, 1 ] PPI::Token::Word 'package'
[ 1, 9, 9 ] PPI::Token::Operator 'x'
[ 1, 10, 10 ] PPI::Token::Structure ';'
bless, return, and scalar as package names parse as Word, but they force the following curly braces to be a hash constructor instead of a block:
ppidump 'package scalar {}'
PPI::Document
PPI::Statement::Package
[ 1, 1, 1 ] PPI::Token::Word 'package'
[ 1, 9, 9 ] PPI::Token::Word 'scalar'
PPI::Structure::Constructor { ... }
Comments still direct people to rt.cpan.org
ack rt.cpan.org
lib/PPI.pm
762:L<http://rt.cpan.org/NoAuth/ReportBug.html?Queue=PPI>
lib/PPI/Structure/List.pm
34:Got any ideas for methods? Submit a report to rt.cpan.org!
lib/PPI/Structure/Subscript.pm
37:Got any ideas for methods? Submit a report to rt.cpan.org!
lib/PPI/Structure/Given.pm
32:Got any ideas for methods? Submit a report to rt.cpan.org!
lib/PPI/Structure/For.pm
32:Got any ideas for methods? Submit a report to rt.cpan.org!
lib/PPI/Structure/When.pm
32:Got any ideas for methods? Submit a report to rt.cpan.org!
lib/PPI/Structure/Unknown.pm
38:Got any ideas for methods? Submit a report to rt.cpan.org!
lib/PPI/Structure/Constructor.pm
31:Got any ideas for methods? Submit a report to rt.cpan.org!
lib/PPI/Structure/Block.pm
41:Got any ideas for methods? Submit a report to rt.cpan.org!
lib/PPI/Structure/Condition.pm
36:Got any ideas for methods? Submit a report to rt.cpan.org!
lib/PPI/Token/QuoteLike/Command.pm
27:Got any ideas for methods? Submit a report to rt.cpan.org!
lib/PPI/Token/QuoteLike/Readline.pm
36:Got any ideas for methods? Submit a report to rt.cpan.org!
lib/PPI/Token/QuoteLike/Backtick.pm
27:Got any ideas for methods? Submit a report to rt.cpan.org!
lib/PPI/Token/Cast.pm
30:Got any ideas for methods? Submit a report to rt.cpan.org!
lib/PPI/Token/Regexp/Transliterate.pm
35:Got any ideas for methods? Submit a report to rt.cpan.org!
lib/PPI/Token/Regexp/Match.pm
41:Got any ideas for methods? Submit a report to rt.cpan.org!
lib/PPI/Token/Regexp/Substitute.pm
31:Got any ideas for methods? Submit a report to rt.cpan.org!
lib/PPI/Token/Pod.pm
25:Got any ideas for more methods? Submit a report to rt.cpan.org!
lib/PPI/Token/ArrayIndex.pm
25:Got any ideas for methods? Submit a report to rt.cpan.org!
lib/PPI/Token/Attribute.pm
31:Got any ideas for methods? Submit a report to rt.cpan.org!
lib/PPI/Token/Quote/Interpolate.pm
27:Got any ideas for methods? Submit a report to rt.cpan.org!
lib/PPI/Token/Quote/Literal.pm
27:Got any ideas for methods? Submit a report to rt.cpan.org!
lib/PPI/Token/Quote/Double.pm
30:Got any ideas for methods? Submit a report to rt.cpan.org!
lib/PPI/Token/Quote/Single.pm
33:Got any ideas for methods? Submit a report to rt.cpan.org!
lib/PPI/Token/Label.pm
27:Got any ideas for methods? Submit a report to rt.cpan.org!
lib/PPI/Token/Operator.pm
40:Got any ideas for methods? Submit a report to rt.cpan.org!
lib/PPI/Statement/Given.pm
32:Got any ideas for methods? Submit a report to rt.cpan.org!
lib/PPI/Statement/When.pm
40:Got any ideas for methods? Submit a report to rt.cpan.org!
t/08_regression.t
45:# Regression Test for rt.cpan.org #11522
132:# rt.cpan.org: Ticket #16671 $_ is not localized
inc/Module/Install/Metadata.pm
560: https?\Q://rt.cpan.org/\E[^>]+|
RT 37352: $$$a parsed as $$ magic plus $a
https://rt.cpan.org/Ticket/Display.html?id=37352
https://rt.cpan.org/Ticket/Display.html?id=72679
ppidump '$$$a = 3;'
PPI::Document
PPI::Statement
[ 1, 1, 1 ] PPI::Token::Magic '$$'
[ 1, 3, 3 ] PPI::Token::Symbol '$a'
[ 1, 6, 6 ] PPI::Token::Operator '='
[ 1, 8, 8 ] PPI::Token::Number '3'
[ 1, 9, 9 ] PPI::Token::Structure ';'
whereas:
perl -WE 'my $a=\\"foo"; print $$$a;'
foo
Pod below __END__ not parsed if __DATA__ section is present
Pod below __END__ is not recognized if a __DATA__ section is present.
See #15 for additional info.
use strict;
use warnings;
use feature 'say';
use PPI::Document;
=head1 Test
This is a piece of POD.
=head2 Sub
This is also a piece of POD.
=cut
my $content = '';
{
local $/;
open my $fh, '<', $0;
$content = <$fh>;
close $fh;
}
my $doc = PPI::Document->new(\$content);
my $pod = PPI::Token::Pod->merge(@{$doc->find('PPI::Token::Pod')});
say $pod;
__DATA__
# some data here
__END__
=head1 More
This should also be a piece of POD. Should it?
=cut
This test script yields:
=pod
=head1 Test
This is a piece of POD.
=head2 Sub
This is also a piece of POD.
=cut
Expected output:
=pod
=head1 Test
This is a piece of POD.
=head2 Sub
This is also a piece of POD.
=head1 More
This should also be a piece of POD. Should it?
=cut
Caching ->isa()
Has anyone looked into caching ->isa() at the PPI::Element level?
The number of calls to ->isa() in a "make nytprof" run of Perl::Critic is staggering, and I have to think that at least some of those are redundant. Maybe it would be a win if each call to ->isa( 'whatever' ) would cache the result of that lookup.
I can go poking at this, but didn't want to waste my time if this was already considered and rejected.
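A minimal sketch of the idea, assuming the cache is keyed on the pair (object's class, queried class) and that `@ISA` does not change after the first lookup. `cached_isa` and `%ISA_CACHE` are hypothetical names; nothing like this is known to exist in PPI::Element.

```perl
use strict;
use warnings;
use Scalar::Util 'blessed';

my %ISA_CACHE;

# Memoize isa() per (object class, queried class) pair.
# Assumption: @ISA is not modified after the first lookup; if it is,
# the cache must be cleared.
sub cached_isa {
    my ( $object, $class ) = @_;
    my $own = blessed $object;
    return 0 unless defined $own;
    return $ISA_CACHE{$own}{$class} //= ( $object->isa($class) ? 1 : 0 );
}
```

Whether two hash lookups actually beat perl's built-in method resolution would need measuring under the same nytprof run before anything like this is adopted.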
content is read as octets, not characters, with no concept of decoding
new($filename) reads the file as bytes, with no encoding layers, so any content that isn't Latin1 will cause issues. new(\$content) is read exactly the same way.
Files need to be decoded, so an encoding parameter is needed. Content strings either need to be decoded similarly, or be passed as already-decoded characters.
I would suggest new APIs that don't conflict with the existing ones, to try to preserve backcompat as much as possible.
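Until such an API exists, the decoding step itself can be sketched on the caller's side. This is only an illustration of reading a file through an encoding layer; `slurp_decoded` is a hypothetical helper, the UTF-8 default is an assumption made for the sketch, and whether PPI then handles the decoded characters correctly is exactly what this issue leaves open.

```perl
use strict;
use warnings;

# Read a file and return its content as decoded characters.
# The encoding should be supplied by the caller; 'UTF-8' is only the
# default assumed by this sketch.
sub slurp_decoded {
    my ( $filename, $encoding ) = @_;
    $encoding = 'UTF-8' unless defined $encoding;
    open my $fh, "<:encoding($encoding)", $filename
        or die "Cannot open $filename: $!\n";
    local $/;    # slurp mode
    my $content = <$fh>;
    close $fh or die "Cannot close $filename: $!\n";
    return $content;
}
```

The returned string holds characters rather than octets, so `length` counts characters and a non-Latin1 bareword such as a hash key is a single run of word characters.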
RFC: ellipsis "..." statement parses as operator. What types would be better?
Perl 5.12 introduced the ellipsis statement, "...". Currently "..." always parses as a PPI::Token::Operator. perl5120delta.pod refers to it as an operator, but perlsyn is pretty clear that it's really a statement, making the use of Operator wrong.
It's not too bad that the ellipsis becomes a child of a simple PPI::Statement, but, given the fact that it throws, would it be more appropriate to have it be a child of PPI::Statement::Break? A new statement type altogether?
The existing token types don't seem to fit the ellipsis. Should there be a PPI::Token::Ellipsis (subclass of PPI::Token)?
Comments?
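For reference, a minimal sketch of the run-time behavior under discussion: the ellipsis compiles as a complete statement and throws an "Unimplemented" exception when executed. The sub name not_yet is made up for this example.

```perl
use strict;
use warnings;

# The ellipsis statement (Perl 5.12+) is the entire body of this sub
sub not_yet { ... }

# Calling the sub dies, so the eval returns undef and $@ holds the message
my $ok = eval { not_yet(); 1 };
print "died: $@" unless $ok;
```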
POD below __DATA__ not recognized.
POD in the __DATA__ section is valid according to the documentation. Maybe this omission is on purpose (how good is good enough?), but POD below __END__ is parsed correctly, so maybe it is not.
use strict;
use warnings;
use feature 'say';
use PPI::Document;
=head1 Test
This is a piece of POD.
=head2 Sub
This is also a piece of POD.
=cut
my $content = '';
{
local $/;
open my $fh, '<', $0 or die "Cannot open $0: $!";
$content = <$fh>;
close $fh;
}
my $doc = PPI::Document->new(\$content);
my $pod = PPI::Token::Pod->merge(@{$doc->find('PPI::Token::Pod')});
say $pod;
__DATA__
=head1 More
This should also be a piece of POD. Should it?
=cut
This little test yields:
=pod
=head1 Test
This is a piece of POD.
=head2 Sub
This is also a piece of POD.
=cut
When it should yield:
=pod
=head1 Test
This is a piece of POD.
=head2 Sub
This is also a piece of POD.
=head1 More
This should also be a piece of POD. Should it?
=cut
PPI::Token::Prototype->prototype does not strip parens/whitespace as documented
=head2 prototype
The C<prototype> accessor returns the actual prototype pattern, stripped
of braces and any whitespace inside the pattern.
=cut
sub prototype {
my $self = shift;
my $proto = $self->content;
$proto =~ s/\(\)\s//g; # Strip brackets and whitespace
$proto;
}
The documentation says the return value of prototype() has parentheses and internal whitespace stripped, but the stripping never happens because the regex is malformed; it was probably intended to put the parens and \s in a character class. As written, it only matches the literal sequence "()" followed by a whitespace character, so prototype() will effectively always return the same value as content().
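A minimal sketch of the character-class form the report suspects was intended, written as a stand-alone function so that it runs without PPI. The name strip_prototype is hypothetical, and this is not necessarily the fix PPI adopted.

```perl
use strict;
use warnings;

# Hypothetical stand-alone version of the accessor's stripping step
sub strip_prototype {
    my ($proto) = @_;
    $proto =~ s/[()\s]//g;    # remove every paren and whitespace character
    return $proto;
}

print strip_prototype('( $$ ; $ )'), "\n";    # prints $$;$
```

With the character class, each paren or whitespace character is removed individually wherever it appears, instead of requiring the exact sequence "()" plus whitespace.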
How to deal with ambiguous parses?
Right now PPI seems to be decidedly undecided on how to deal with code whose meaning cannot be determined confidently. An example follows:
sub d { 1 };
my @c = 3 .. 6;
say 1 if d ~~ @c;
use v5.10.1; # !
sub d () { 1 }; # !
my @c = 3 .. 6;
say 1 if d ~~ @c;
The ~~ in the last line is interpreted as a single operator in both cases. However, in the first case it should be two operators, and only in the second case should it be parsed as the smart-match operator.
In my opinion the current handling of that code is unacceptable.
I am however unsure on how it should be handled instead and am as such fishing for general opinions on how PPI should handle ambiguous parses.
Merging method
Can you please rebase branches before merging them, or cherry-pick their commits and then close issues, instead of using the auto-merge? There is a massive multitude of reasons for this, which can be summarized as "non-ff merge results in utterly crazy and debug-hostile history": https://github.com/adamkennedy/PPI/network
Alternatively, just tag an issue as "good to merge" and I'll deal with it.
anon hashref after operator treated as code block
From Perl-Critic/Perl-Critic#192: a hash constructor parses as a code block.
ppidump '0 || {b => 1, a => 1};'
PPI::Document
PPI::Statement
[ 1, 1, 1 ] PPI::Token::Number '0'
[ 1, 3, 3 ] PPI::Token::Operator '||'
PPI::Structure::Block { ... }
PPI::Statement
[ 1, 7, 7 ] PPI::Token::Word 'b'
[ 1, 9, 9 ] PPI::Token::Operator '=>'
[ 1, 12, 12 ] PPI::Token::Number '1'
[ 1, 13, 13 ] PPI::Token::Operator ','
[ 1, 15, 15 ] PPI::Token::Word 'a'
[ 1, 17, 17 ] PPI::Token::Operator '=>'
[ 1, 20, 20 ] PPI::Token::Number '1'
[ 1, 22, 22 ] PPI::Token::Structure ';'
At least some other operators do not parse it as a block:
ppidump '0, {b => 1, a => 1};'
PPI::Document
PPI::Statement
[ 1, 1, 1 ] PPI::Token::Number '0'
[ 1, 2, 2 ] PPI::Token::Operator ','
PPI::Structure::Constructor { ... }
PPI::Statement::Expression
[ 1, 5, 5 ] PPI::Token::Word 'b'
[ 1, 7, 7 ] PPI::Token::Operator '=>'
[ 1, 10, 10 ] PPI::Token::Number '1'
[ 1, 11, 11 ] PPI::Token::Operator ','
[ 1, 13, 13 ] PPI::Token::Word 'a'
[ 1, 15, 15 ] PPI::Token::Operator '=>'
[ 1, 18, 18 ] PPI::Token::Number '1'
[ 1, 20, 20 ] PPI::Token::Structure ';'