s-expressionists / eclector

A portable Common Lisp reader that is highly customizable, can recover from errors and can return concrete syntax trees

Home Page: https://s-expressionists.github.io/Eclector/

License: BSD 2-Clause "Simplified" License

Common Lisp 98.72% NewLisp 1.28%
common-lisp reader extensible error-recovery portable concrete-syntax-trees

eclector's People

Contributors

bike, robert-strandh, scymtym

eclector's Issues

sharpsign-dot signals a warning and it should signal an error

beach wants to talk with Bike and scymtym about precisely which error to signal, so I will leave this here.

The with-preserved-backquote-context can be moved out of the if at the same time.

(defun sharpsign-dot (stream char parameter)
  (declare (ignore char))
  (unless (null parameter)
    (warn 'numeric-parameter-supplied-but-ignored
          :parameter parameter
          :macro-name 'sharpsign-dot))
  (with-preserved-backquote-context
      (if *read-suppress*
          (progn
            (read stream t nil t)       ; throw away the result
            (values))
          (eval (read stream t nil t)))))

Read-from-string with *read-base* 4 inconsistent

Consider: (newest eclector, current sbcl)

("eclector")
* (let ((*read-base* 4))
           (eclector.reader:read-from-string "2140.969"))
36.969
8
* (let ((*read-base* 4))(read-from-string "2140.969"))
2140.969
8
* (let ((*read-base* 4))
           (eclector.reader:read-from-string "2140"))
36
4

Seems that the digits before the dot are read with base 4, and the digits after the dot with base 10 (since they don't fit into base 4).

This is probably a problem for every *read-base* other than 10.

23.2.13 *read-base* says:
The value of *read-base*, called the current input base, is the radix in which integers and ratios are to be read by the Lisp reader. The parsing of other numeric types (e.g., floats) is not affected by this option.

So I believe it is an error to apply *read-base* to floats at all.
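
For comparison, here is a small sketch of what the passage quoted above implies, using only the standard reader (no Eclector). The helper name read-in-base is made up for this illustration.

```lisp
;; READ-IN-BASE is a hypothetical helper, not part of any library.
(defun read-in-base (base string)
  "Read one object from STRING with *READ-BASE* bound to BASE."
  (with-standard-io-syntax
    (let ((*read-base* base))
      (values (read-from-string string)))))

(read-in-base 4 "2130")    ; integer, digits taken in base 4   => 156
(read-in-base 4 "21/3")    ; ratio, both parts taken in base 4 => 3
(read-in-base 4 "10.")     ; trailing dot means decimal integer => 10
(read-in-base 4 "2130.5")  ; float, always decimal             => 2130.5
```

A conforming reader should give the same float result whatever *read-base* is.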

Fully support custom mechanism for tracking the current package

Some clients implement custom behavior for packages and symbols by defining methods on interpret-symbol-token, interpret-symbol, wrap-in-quote and friends. These clients can have their own notion of the current package (e.g. stored in the client object). There is (at least) one blind spot though: the sharpsign-plus-minus macro function, when reading feature expressions, changes the current package by binding cl:*package*. A generic function (call-with-current-package client thunk package-designator) should be called instead.
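
A rough sketch of how such a protocol function could look. It is defined here in the current package for illustration only; the class tracking-client and its slot are assumptions and not existing Eclector code.

```lisp
(defgeneric call-with-current-package (client thunk package-designator)
  (:documentation "Call THUNK with PACKAGE-DESIGNATOR as the current package.")
  ;; Default behavior: same as binding CL:*PACKAGE* directly.
  (:method (client thunk package-designator)
    (declare (ignorable client))
    (let ((*package* (find-package package-designator)))
      (funcall thunk))))

;; Hypothetical client that keeps its own notion of the current package.
(defclass tracking-client ()
  ((%current-package :initform "COMMON-LISP-USER"
                     :accessor current-package)))

(defmethod call-with-current-package ((client tracking-client)
                                      thunk package-designator)
  (let ((old (current-package client)))
    (setf (current-package client) package-designator)
    (unwind-protect (funcall thunk)
      (setf (current-package client) old))))
```

With something like this, sharpsign-plus-minus would call (call-with-current-package client thunk "KEYWORD") instead of binding cl:*package* itself.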

.. is illegal, .||. is a symbol

According to HyperSpec section 2.3.3 (The Consing Dot), a token consisting solely of multiple dots (more than one dot, no escapes) is illegal.
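
The rule can be written down as a small classifier. This is only a sketch: classify-dot-token and its escapedp argument are invented names, and a real reader would track escapes while accumulating the token.

```lisp
(defun classify-dot-token (token &key escapedp)
  "Classify TOKEN, a string of token characters with escape syntax removed.
ESCAPEDP says whether any escape (backslash or vertical bars) was seen."
  (cond ((or escapedp
             (zerop (length token))
             (notevery (lambda (char) (char= char #\.)) token))
         :ordinary-token)               ; e.g. .||. or .5
        ((= 1 (length token))
         :consing-dot)                  ; a lone unescaped dot
        (t
         :illegal)))                    ; two or more unescaped dots

(classify-dot-token "..")               ; => :ILLEGAL
(classify-dot-token ".." :escapedp t)   ; => :ORDINARY-TOKEN (the .||. case)
```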

Document side effects

A suggestion: document the methods whose default implementations have side effects. This would be useful for implementing a side-effect-free parser. I know of #. and the interning of symbols; perhaps there are other corners of the spec to be aware of?

Not taking SET-MACRO-CHARACTER into account?

I have trouble loading Maxima with the CST version of Clasp, while it loads fine with the AST version.
The cst-version of clasp uses eclector.concrete-syntax-tree:cst-read to read lisp-sources.

The problem seems to be the following (tested with sbcl and eclector from git)

Define a macro character "_" (the silly pprints are just for testing):

(ql:quickload :eclector)
(ql:quickload "eclector-concrete-syntax-tree")
(set-macro-character #\_ (lambda (stream char)
                             (pprint `(char = ,char))
                             (let ((was (peek-char nil stream nil nil t)))
                               (pprint `(peek = ,was))
                               (case was
                                 (#\" (values))
                                 (#\N (read-char stream t nil t) (values))
                                 (otherwise '_))))
                       t)

Use that

* (read-from-string "_N\"The message-lookup\"")
(CHAR = #\_)
(PEEK = #\N)
"The message-lookup"
22

but then in Eclector:

* (eclector.reader:read-from-string "_N\"The message-lookup\"")
_N
2
* (with-input-from-string (stream "_N\"The message-lookup\"")
  (eclector.concrete-syntax-tree:cst-read stream))
#<CONCRETE-SYNTAX-TREE:ATOM-CST raw: _N {10046D7143}>
NIL

Am I missing something or are user-defined macro-characters not used?

Lambda-List of read-char seems incorrect

In Eclector:
(input-stream &optional (eof-error-p t) eof-value recursive-p)
In the CLHS:
21.2.17 read-char | Function
read-char &optional input-stream eof-error-p eof-value recursive-p => char
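
To make the difference concrete, the standard function accepts all of the following calls, including one with no arguments and one with NIL as the stream designator. The function name read-char-demo is just for this illustration.

```lisp
(defun read-char-demo ()
  "Call CL:READ-CHAR with zero, three and three arguments on the input \"ab\"."
  (with-input-from-string (*standard-input* "ab")
    (list (read-char)                            ; stream defaults to *STANDARD-INPUT*
          (read-char *standard-input* nil :eof)  ; explicit stream
          (read-char nil nil :eof))))            ; NIL designates *STANDARD-INPUT*

(read-char-demo) ; => (#\a #\b :EOF)
```

A lambda list with a required input-stream parameter rejects the first form.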

Issue with source-location.

When taking the README example with the following input:

(with-input-from-string (stream "1 ") ;; with an extra space
  (eclector.parse-result:read (make-instance 'my-client) stream))

the source-location of the 1 is considered to be the range (0 . 2) where I think it should be (0 . 1).
I think removing this when might fix the issue, but I am not sure it serves another purpose.
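
Possibly related: the standard reader shows the same off-by-one depending on whether the terminating whitespace is consumed. The sketch below uses only cl:read-from-string; end-position is a made-up helper, and whether this is the actual cause in Eclector is an assumption.

```lisp
(defun end-position (string &key preserve-whitespace)
  "Return the index of the first character of STRING not consumed by the reader."
  (nth-value 1 (read-from-string string t nil
                                 :preserve-whitespace preserve-whitespace)))

(end-position "1 ")                         ; => 2, the trailing space was consumed
(end-position "1 " :preserve-whitespace t)  ; => 1, the space was left in place
```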

Eclector doesn't do shit

SBCL:

(princ (read (make-string-input-stream "#\\Pile_of_Poo")))
| 💩
#\PILE_OF_POO

Eclector:

(princ (eclector.reader:read (make-string-input-stream "#\\Pile_of_Poo")))
|- ECLECTOR.READER:UNKNOWN-CHARACTER-NAME

But seriously, the client should probably have control over character lookup. Maybe a generic function interpret-character-name in the spirit of interpret-symbol.
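
A sketch of what such a generic function might look like. The name interpret-character-name comes from the suggestion above; everything else (the default method, custom-names-client, the "Bang" name) is made up for illustration and is not Eclector code.

```lisp
(defgeneric interpret-character-name (client name)
  (:documentation "Return the character named NAME for CLIENT, or NIL.")
  ;; Default: defer to the host implementation's character names.
  (:method (client name)
    (declare (ignorable client))
    (name-char name)))

;; Hypothetical client with its own table of additional names.
(defclass custom-names-client ()
  ((%names :initform (let ((table (make-hash-table :test #'equalp)))
                       (setf (gethash "Bang" table) #\!)
                       table)
           :reader names)))

(defmethod interpret-character-name ((client custom-names-client) name)
  ;; EQUALP hashing makes the lookup case-insensitive, like NAME-CHAR.
  (or (gethash name (names client))
      (call-next-method)))

(interpret-character-name (make-instance 'custom-names-client) "BANG") ; => #\!
```

The reader would signal unknown-character-name only when this function returns NIL.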

READ-CST of long lists can blow the stack

Reproducible by e.g., (eclector.concrete-syntax-tree:cst-read (make-string-input-stream (format nil "(~{~A~^ ~})" (make-list 20000 :initial-element 1)))), though of course the required size might vary.

I believe this is due to the recursion in make-cons-cst in read-cst.lisp. On Clasp, the stack trace looks like

[...]
frame #19640: 0x000000010c7beaf8 clasp`::LAMBDA^COMMON-LISP^FN^^() at read-cst.lisp:0
frame #19641: 0x000000010c7beaf8 clasp`::LAMBDA^COMMON-LISP^FN^^() at read-cst.lisp:0
frame #19642: 0x000000010c7beaf8 clasp`::LAMBDA^COMMON-LISP^FN^^() at read-cst.lisp:0
frame #19643: 0x000000010c7beaf8 clasp`::LAMBDA^COMMON-LISP^FN^^() at read-cst.lisp:0
frame #19644: 0x000000010c7be2d7 clasp`::MAKE-EXPRESSION-RESULT^ECLECTOR.PARSE-RESULT^((CST-CLIENT T T T))^METHOD^^() at read-cst.lisp:0

i.e. the make-expression-result is the last intelligible call.
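
To illustrate the general shape of the problem and one way around it (this is not the actual make-cons-cst code; the node structure and function names are invented): building the chain from the end keeps the stack depth constant, whereas the recursive version needs one frame per element.

```lisp
(defstruct node head tail)

(defun nodes-from-list/recursive (list)
  "One stack frame per element; deep recursion for long lists."
  (if (null list)
      nil
      (make-node :head (first list)
                 :tail (nodes-from-list/recursive (rest list)))))

(defun nodes-from-list/iterative (list)
  "Same result, built back to front in a loop with constant stack depth."
  (let ((result nil))
    (dolist (element (reverse list) result)
      (setf result (make-node :head element :tail result)))))

(defun node-count (node)
  "Count the nodes in a chain without recursing."
  (loop for n = node then (node-tail n)
        while n
        count t))

(node-count (nodes-from-list/iterative (make-list 200000 :initial-element 1)))
;; => 200000
```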

Make READ-EXTENDED-TOKEN

To consolidate duplicated code between (at least):

  • sharpsign-backslash
  • sharpsign-colon
  • read-token

Eclector probably needs PEEK-CHAR

<beach> Just an observation:
<beach> I think Eclector must have an implementation of PEEK-CHAR, because for
	some combination of arguments, it checks whether the character is a
	whitespace character. 
<beach> So the native PEEK-CHAR can not be used, because it would not consult
	the Eclector readtable for character syntax.
<drmeister> Check with Bike - we ran into this already.
<beach> Also, currently, Eclector doesn't use PEEK-CHAR simply because I
	completely forgot about its existence.  But some parts of the reader
	could be faster when PEEK-CHAR is used.
<scymtym> beach: sounds right. could you make an issue for this? ideally
	  including a code snippet demonstrating the problem
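
A possible snippet for the issue, using only standard streams. peek-char-skipping is a hypothetical helper that takes the whitespace test as an argument; in Eclector that test would consult the Eclector readtable. Here "_" stands in for a character that only the custom readtable treats as whitespace.

```lisp
(defun peek-char-skipping (whitespacep stream)
  "Skip characters satisfying WHITESPACEP, then return the next character
without consuming it, or NIL at end of input."
  (loop for char = (read-char stream nil nil)
        while (and char (funcall whitespacep char))
        finally (when char (unread-char char stream))
                (return char)))

(defun peek-demo ()
  (with-input-from-string (stream "__x")
    (list (peek-char t stream)   ; native: "_" is not standard whitespace
          (peek-char-skipping (lambda (char) (char= char #\_)) stream)
          (read-char stream))))  ; the peeked character is still there

(peek-demo) ; => (#\_ #\x #\x)
```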

Ensure every CST element has a SOURCE, suggest how to capture whitespace and comments

I'd like to be able to parse a source file into a tree structure which includes all information required to build the source (including whitespace and comments). This would be easier if every parsed CST element has a valid associated source range. Currently that isn't the case for (at least) QUOTE elements. See:

SEL/LISP-DIFF> (labels ((maptree (fn tree)
                          (if (concrete-syntax-tree:consp tree)
                              (cons
                               (maptree fn (concrete-syntax-tree:first tree))
                               (maptree fn (concrete-syntax-tree:rest tree)))
                              (funcall fn tree))))
                 (maptree #'concrete-syntax-tree:source
                          (with-input-from-string (in "'(1 2 3 4)")
                            (cst-read in))))
(NIL ((2 . 3) (4 . 5) (6 . 7) (8 . 9)))

I've taken a shot at building this using cst-read and record-skipped-input with some custom code to fold everything together and add whitespace. See https://github.com/GrammaTech/sel/blob/master/ast-diff/lisp-diff.lisp#L71, is there a better way?

Thanks!
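
For the leaf ranges shown above, the text between them can be recovered with a few lines of plain Common Lisp. This is a sketch assuming the ranges are sorted and non-overlapping; uncovered-text is a made-up name and does not use the concrete-syntax-tree API.

```lisp
(defun uncovered-text (text ranges)
  "Return the substrings of TEXT that lie outside RANGES, in order.
RANGES is a sorted list of non-overlapping (START . END) conses."
  (let ((position 0)
        (gaps '()))
    (loop for (start . end) in ranges
          do (push (subseq text position start) gaps)
             (setf position end))
    (push (subseq text position) gaps)
    (nreverse gaps)))

(uncovered-text "'(1 2 3 4)" '((2 . 3) (4 . 5) (6 . 7) (8 . 9)))
;; => ("'(" " " " " " " ")")
```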

Full test coverage for macro functions

  • semicolon
  • single-quote
  • double-quote
  • backquote
  • comma
  • left-parenthesis
  • right-parenthesis
  • sharpsign-single-quote
  • sharpsign-left-parenthesis
  • sharpsign-dot
  • sharpsign-backslash
  • read-rational
  • sharpsign-b
  • sharpsign-x
  • sharpsign-o
  • sharpsign-r
  • sharpsign-asterisk
  • sharpsign-vertical-bar
  • sharpsign-a
  • sharpsign-colon
  • sharpsign-c
  • sharpsign-p
  • sharpsign-plus-minus
  • sharpsign-invalid
  • sharpsign-equals
  • sharpsign-sharpsign

Add reports to (non-mixin) reader conditions

  • invalid-context-for-backquote
  • comma-not-inside-backquote
  • unquote-splicing-in-dotted-list
  • unquote-splicing-at-top
  • invalid-context-for-consing-dot
  • consing-dot-most-be-followed-by-object
  • multiple-objects-following-consing-dot
  • invalid-context-for-right-parenthesis
  • sub-char-must-not-be-a-decimal-digit
  • char-must-be-a-dispatching-character
  • symbol-does-not-exist
  • symbol-is-not-external
  • symbol-name-must-not-end-with-package-marker
  • two-package-markers-must-be-adjacent
  • two-package-markers-must-not-be-first
  • symbol-can-have-at-most-two-package-markers
  • uninterned-symbol-must-not-contain-package-marker
  • numeric-parameter-supplied-but-ignored
  • numeric-parameter-not-supplied-but-required
  • unknown-character-name
  • digit-expected
  • invalid-radix
  • invalid-default-float-format
  • too-many-elements
  • no-elements-found
  • incorrect-initialization-length
  • single-feature-expected
  • sharpsign-invalid
  • sharpsign-equals-label-defined-more-than-once
  • sharpsign-sharpsign-undefined-label

READ-SUPPRESS.SHARP-R.6

In latest sbcl, eclector git master

Interaction of *read-suppress* and #R:
I believe no error should be signaled; SBCL's READ-FROM-STRING does not signal one.
23.2.16 *read-suppress*
...
Except as noted below, any standardized reader macro that is defined to read a following object or token will do so, but not signal an error if the object read is not of an appropriate type or syntax.

* (WITH-STANDARD-IO-SYNTAX
  (LET ((*READ-SUPPRESS* T))
    (eclector.reader:READ-FROM-STRING "#0r0")))

debugger invoked on a ECLECTOR.READER:INVALID-RADIX in thread #<THREAD "main thread" RUNNING {10005185B3}>: 0 is too small to be a radix.

Type HELP for debugger help, or (SB-EXT:EXIT) to exit from SBCL.

restarts (invokable by number or by possibly-abbreviated name):
  0: [ABORT] Exit debugger, returning to top level.

(ECLECTOR.BASE:%READER-ERROR #<SB-IMPL::STRING-INPUT-STREAM {1004BA3233}> ECLECTOR.READER:INVALID-RADIX :RADIX 0)
   source: (APPLY (FUNCTION ERROR) DATUM :STREAM STREAM :STREAM-POSITION STREAM-POSITION (ALEXANDRIA.0.DEV:REMOVE-FROM-PLIST ARGUMENTS :STREAM-POSITION))
0] 0
* (WITH-STANDARD-IO-SYNTAX
  (LET ((*READ-SUPPRESS* T))
    (READ-FROM-STRING "#0r0")))
NIL
4
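
One way to get the SBCL behavior is to skip the validation entirely while *read-suppress* is true. The sketch below is not Eclector's code; the condition invalid-radix-sketch and the function checked-radix are invented for illustration.

```lisp
(define-condition invalid-radix-sketch (error)
  ((%radix :initarg :radix :reader radix))
  (:report (lambda (condition stream)
             (format stream "~D is not a valid radix." (radix condition)))))

(defun checked-radix (radix)
  "Return RADIX if it is usable, NIL when *READ-SUPPRESS* is true,
and signal INVALID-RADIX-SKETCH otherwise."
  (cond (*read-suppress* nil)                    ; nothing is validated
        ((typep radix '(integer 2 36)) radix)
        (t (error 'invalid-radix-sketch :radix radix))))

(let ((*read-suppress* t))
  (checked-radix 0)) ; => NIL, no error
```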

Provide full set of READ functions in ECLECTOR.{P-R,C-S-T} packages

  • eclector.parse-result
    • eclector.parse-result:read
    • eclector.parse-result:read-preserving-whitespace
    • eclector.parse-result:read-from-string
    • Update tests
    • Update documentation
  • eclector.concrete-syntax-tree
    • eclector.concrete-syntax-tree:read
    • eclector.concrete-syntax-tree:read-preserving-whitespace
    • eclector.concrete-syntax-tree:read-from-string
    • Update tests
    • Update documentation
    • Deprecate eclector.concrete-syntax-tree:cst-read

READ-SUPPRESS.17

I believe the following should not fail:

(in-package :eclector.reader)
(multiple-value-list
              (WITH-STANDARD-IO-SYNTAX
                (LET ((*READ-SUPPRESS* T))
                  (READ-FROM-STRING "#garbage"))))
->
debugger invoked on a ECLECTOR.READTABLE:UNKNOWN-MACRO-SUB-CHARACTER in thread #<THREAD "main thread" RUNNING {10005205B3}>: g is not a sub-character of the dispatch macro character #.

SYNTAX.DOT-TOKEN.7

This is a deviation from SBCL and ansi-tests, but I don't really know which behavior is correct:

(in-package :eclector.reader)
* (WITH-STANDARD-IO-SYNTAX
         (READ-FROM-STRING ".||"))

debugger invoked on a ECLECTOR.READER:INVALID-CONTEXT-FOR-CONSING-DOT in thread #<THREAD "main thread" RUNNING {10005205B3}>: A consing dot appeared in an illegal position.

Type HELP for debugger help, or (SB-EXT:EXIT) to exit from SBCL.

restarts (invokable by number or by possibly-abbreviated name):
  0: [ABORT] Exit debugger, returning to top level.

(ECLECTOR.BASE:%READER-ERROR #<SB-IMPL::STRING-INPUT-STREAM {10044D9503}> ECLECTOR.READER:INVALID-CONTEXT-FOR-CONSING-DOT)
   source: (APPLY (FUNCTION ERROR) DATUM :STREAM STREAM :STREAM-POSITION STREAM-POSITION (ALEXANDRIA.0.DEV:REMOVE-FROM-PLIST ARGUMENTS :STREAM-POSITION))
0] 0
* (cl:read-from-string ".||") 
|.|
3

Allow recovering from errors

This is very broad and may have to be split into multiple issues.

Some initial thoughts:

  • What should the client-facing interface be? A recover restart?
  • Problems with right parentheses seem to be the most common obstacle for recovery.
  • Do we need dedicated condition sub-classes? Dedicated result objects?
