This project forked from maciejhirsz/logos

Home Page: https://crates.io/crates/logos

License: Apache License 2.0

Logos

Create ridiculously fast Lexers.

Logos has two goals:

  • To make it easy to create a Lexer, so you can focus on more complex problems.
  • To make the generated Lexer faster than anything you'd write by hand.

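To achieve those, Logos:

  • Combines all token definitions into a single deterministic state machine.
  • Optimizes branches into lookup tables or jump tables.
  • Prevents backtracking inside token definitions.
  • Unwinds loops, and batches reads to minimize bounds checking.
  • Does all of that heavy lifting at compile time.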

Example

use logos::Logos;

#[derive(Logos, Debug, PartialEq)]
enum Token {
    // Tokens can be literal strings, of any length.
    #[token("fast")]
    Fast,

    #[token(".")]
    Period,

    // Or regular expressions.
    #[regex("[a-zA-Z]+")]
    Text,

    // Logos requires one token variant to handle errors,
    // it can be named anything you wish.
    #[error]
    // We can also use this variant to define whitespace,
    // or any other matches we wish to skip.
    #[regex(r"[ \t\n\f]+", logos::skip)]
    Error,
}

fn main() {
    let mut lex = Token::lexer("Create ridiculously fast Lexers.");

    assert_eq!(lex.next(), Some(Token::Text));
    assert_eq!(lex.span(), 0..6);
    assert_eq!(lex.slice(), "Create");

    assert_eq!(lex.next(), Some(Token::Text));
    assert_eq!(lex.span(), 7..19);
    assert_eq!(lex.slice(), "ridiculously");

    assert_eq!(lex.next(), Some(Token::Fast));
    assert_eq!(lex.span(), 20..24);
    assert_eq!(lex.slice(), "fast");

    assert_eq!(lex.next(), Some(Token::Text));
    assert_eq!(lex.span(), 25..31);
    assert_eq!(lex.slice(), "Lexers");

    assert_eq!(lex.next(), Some(Token::Period));
    assert_eq!(lex.span(), 31..32);
    assert_eq!(lex.slice(), ".");

    assert_eq!(lex.next(), None);
}

Callbacks

Logos can also call arbitrary functions whenever a pattern is matched, which can be used to put data into a variant:

use logos::{Lexer, Logos};

// Note: callbacks can return `Option` or `Result`
fn kilo(lex: &mut Lexer<Token>) -> Option<u64> {
    let slice = lex.slice();
    let n: u64 = slice[..slice.len() - 1].parse().ok()?; // skip 'k'
    Some(n * 1_000)
}

fn mega(lex: &mut Lexer<Token>) -> Option<u64> {
    let slice = lex.slice();
    let n: u64 = slice[..slice.len() - 1].parse().ok()?; // skip 'm'
    Some(n * 1_000_000)
}

#[derive(Logos, Debug, PartialEq)]
enum Token {
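    // Skip whitespace between numbers; without this, the spaces in the
    // input below would be reported as `Token::Error`.
    #[error]
    #[regex(r"[ \t\n\f]+", logos::skip)]
    Error,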

    // Callbacks can use closure syntax, or refer
    // to a function defined elsewhere.
    //
    // Each pattern can have its own callback.
    #[regex("[0-9]+", |lex| lex.slice().parse())]
    #[regex("[0-9]+k", kilo)]
    #[regex("[0-9]+m", mega)]
    Number(u64),
}

fn main() {
    let mut lex = Token::lexer("5 42k 75m");

    assert_eq!(lex.next(), Some(Token::Number(5)));
    assert_eq!(lex.slice(), "5");

    assert_eq!(lex.next(), Some(Token::Number(42_000)));
    assert_eq!(lex.slice(), "42k");

    assert_eq!(lex.next(), Some(Token::Number(75_000_000)));
    assert_eq!(lex.slice(), "75m");

    assert_eq!(lex.next(), None);
}

Logos can handle callbacks with the following return types:

Return type      Produces
-------------    ------------------------------------------
()               Token::Unit
bool             Token::Unit or <Token as Logos>::ERROR
Result<(), _>    Token::Unit or <Token as Logos>::ERROR
T                Token::Value(T)
Option<T>        Token::Value(T) or <Token as Logos>::ERROR
Result<T, _>     Token::Value(T) or <Token as Logos>::ERROR
Skip             skips matched input
Filter<T>        Token::Value(T) or skips matched input

Callbacks can also be used to perform more specialized lexing in places where regular expressions are too limiting. For specifics, look at Lexer::remainder and Lexer::bump.
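As a rough illustration of that idea, the sketch below consumes a nested block comment by hand, something a regular expression cannot express. It uses only the standard library: `Cursor` is a hypothetical stand-in that merely borrows the names `remainder` and `bump`, and `block_comment` is an illustrative helper, not part of Logos. In a real callback the same steps would presumably be carried out on the `&mut Lexer<Token>` argument instead.

```rust
/// Hypothetical stand-in for a lexer cursor. It only mirrors the *names*
/// `remainder` and `bump`; it is not the Logos `Lexer` type.
struct Cursor<'s> {
    source: &'s str,
    pos: usize,
}

impl<'s> Cursor<'s> {
    fn new(source: &'s str) -> Self {
        Cursor { source, pos: 0 }
    }

    /// Input that has not been consumed yet.
    fn remainder(&self) -> &'s str {
        &self.source[self.pos..]
    }

    /// Consume `n` more bytes of input.
    fn bump(&mut self, n: usize) {
        self.pos += n;
    }
}

/// Meant to run right after an opening `/*` has been matched: consumes the
/// rest of a possibly nested block comment. Returns `false` if the comment
/// is never closed.
fn block_comment(cur: &mut Cursor) -> bool {
    let mut depth = 1usize;
    while depth > 0 {
        let rest = cur.remainder();
        match (rest.find("/*"), rest.find("*/")) {
            // A nested opener appears before the next closer.
            (Some(open), Some(close)) if open < close => {
                depth += 1;
                cur.bump(open + 2);
            }
            // The next delimiter is a closer.
            (_, Some(close)) => {
                depth -= 1;
                cur.bump(close + 2);
            }
            // No closer left: the comment is unterminated.
            (_, None) => return false,
        }
    }
    true
}

fn main() {
    let mut cur = Cursor::new("/* outer /* inner */ still outer */ rest");
    cur.bump(2); // pretend the `/*` pattern has just matched
    assert!(block_comment(&mut cur));
    assert_eq!(cur.remainder(), " rest");
}
```

Returning `bool` here lines up with the `bool` row of the return-type table: `false` would surface as the error variant.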

Token disambiguation

The rule of thumb is:

  • Longer beats shorter.
  • Specific beats generic.

If any two definitions could match the same input, like fast and [a-zA-Z]+ in the example above, it's the longer and more specific definition of Token::Fast that will be the result.

This is done by comparing the numeric priority attached to each definition. Every consecutive, non-repeating single byte adds 2 to the priority, while every range or regex class adds 1. Loops and optional blocks are ignored, while alternations count the shortest alternative:

  • [a-zA-Z]+ has a priority of 1 (lowest possible), because at minimum it can match a single byte to a class.
  • foobar has a priority of 12.
  • (foo|hello)(bar)? has a priority of 6, foo being its shortest possible match.
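
The arithmetic behind those three figures can be reproduced with a small standalone sketch. The `Pattern` tree and the `priority` and `literal` functions below are hypothetical and exist only for this illustration; they are not how Logos represents patterns internally, and they cover only the rules stated above.

```rust
/// Hypothetical, simplified pattern tree used only for this illustration.
#[allow(dead_code)]
enum Pattern {
    /// A single literal byte, e.g. `f`.
    Byte(u8),
    /// A range or regex class, e.g. `[a-zA-Z]`.
    Class,
    /// Patterns matched one after another.
    Concat(Vec<Pattern>),
    /// `a|b|c`
    Alt(Vec<Pattern>),
    /// `x?` or `x*`: may match nothing at all.
    Optional(Box<Pattern>),
    /// `x+`: one mandatory `x`, then a loop.
    OneOrMore(Box<Pattern>),
}

fn priority(p: &Pattern) -> usize {
    match p {
        Pattern::Byte(_) => 2,
        Pattern::Class => 1,
        Pattern::Concat(parts) => parts.iter().map(priority).sum(),
        // Alternations count the shortest (lowest-priority) alternative.
        Pattern::Alt(alts) => alts.iter().map(priority).min().unwrap_or(0),
        // Optional blocks are ignored.
        Pattern::Optional(_) => 0,
        // Only the mandatory first repetition counts; the loop is ignored.
        Pattern::OneOrMore(inner) => priority(inner),
    }
}

/// Builds a literal string pattern out of single bytes.
fn literal(s: &str) -> Pattern {
    Pattern::Concat(s.bytes().map(Pattern::Byte).collect())
}

fn main() {
    // [a-zA-Z]+
    let text = Pattern::OneOrMore(Box::new(Pattern::Class));
    // foobar
    let foobar = literal("foobar");
    // (foo|hello)(bar)?
    let alt = Pattern::Concat(vec![
        Pattern::Alt(vec![literal("foo"), literal("hello")]),
        Pattern::Optional(Box::new(literal("bar"))),
    ]);

    assert_eq!(priority(&text), 1);
    assert_eq!(priority(&foobar), 12);
    assert_eq!(priority(&alt), 6);
}
```

Under the same rules, the literal fast from the first example scores 8 against 1 for [a-zA-Z]+, which is why Token::Fast wins for that input.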

How fast?

Ridiculously fast!

test identifiers                       ... bench:         647 ns/iter (+/- 27) = 1204 MB/s
test keywords_operators_and_punctators ... bench:       2,054 ns/iter (+/- 78) = 1037 MB/s
test strings                           ... bench:         553 ns/iter (+/- 34) = 1575 MB/s

Acknowledgements

License

This code is distributed under the terms of both the MIT license and the Apache License (Version 2.0); choose whichever works for you.

See LICENSE-APACHE and LICENSE-MIT for details.
