fy0 / tinyre

A light fork of python's regex engine (but slow, ~3k lines).

License: zlib License

C 93.41% Python 3.91% CMake 2.51% C++ 0.17%

tinyre's Introduction

tinyre ver 0.9.2

A tiny regex engine.
It aims to be compatible with "Secret Labs' Regular Expression Engine" (SRE, the engine behind Python's re module).

Warning: the project already works fine, but it is slow.

Features:

  • UTF-8 support
    Cheers for Unicode!

  • no octal escapes
    In Python, \1 means group 1 and \1-100 means group n, but \01 matches \1, \07 matches \7, \08 matches ['\0', '8'], \377 matches 0o377, and \400 matches neither 0o400 nor [chr(0o40), '\0']!
    What the hell... I give up. Go away, octal numbers!

  • custom maximum number of backtracking steps
    An evil regex: 'a?'*n+'a'*n against 'a'*n
    For example: 'a?a?a?a?a?a?a?a?a?a?a?a?a?a?a?a?a?a?a?a?a?a?a?a?a?aaaaaaaaaaaaaaaaaaaaaaaaa' matches 'aaaaaaaaaaaaaaaaaaaaaaaaa'
    Matching takes a very long time because the number of backtracking steps grows exponentially with n; backtracking engines such as Perl, Python, and PCRE become impractically slow once the string is a few dozen characters long.
    You can set a limit on the number of backtracking steps to avoid this situation; when the limit is reached, the match fails.

  • more than 100 groups ...
    but who cares?
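
The backtracking limit can be illustrated with a toy matcher. This is a minimal sketch in Python, not tinyre's actual implementation: the names match_with_limit, evil_tokens and max_steps are hypothetical, and the matcher only understands literal characters with an optional "?" suffix, which is just enough to express 'a?'*n+'a'*n.

```python
class _LimitReached(Exception):
    """Internal signal: the (hypothetical) step budget is used up."""


def match_with_limit(tokens, text, max_steps):
    """Match (char, optional) tokens against the whole of `text`.

    Returns True on a match, and False when there is no match or when
    more than `max_steps` matching steps were attempted.
    """
    steps = 0

    def walk(ti, si):
        nonlocal steps
        steps += 1
        if steps > max_steps:
            raise _LimitReached
        if ti == len(tokens):
            return si == len(text)
        char, optional = tokens[ti]
        # Greedy: first try to consume one character for this token...
        if si < len(text) and text[si] == char and walk(ti + 1, si + 1):
            return True
        # ...then backtrack and skip the token if it is optional.
        return optional and walk(ti + 1, si)

    try:
        return walk(0, 0)
    except _LimitReached:
        # Give up: report a failed match instead of running for ages.
        return False


def evil_tokens(n):
    """Token form of the evil regex 'a?' * n + 'a' * n."""
    return [("a", True)] * n + [("a", False)] * n


if __name__ == "__main__":
    print(match_with_limit(evil_tokens(3), "a" * 3, 10**6))    # small n: matches
    print(match_with_limit(evil_tokens(20), "a" * 20, 1000))   # budget exhausted
```

With n=20 the greedy search has to try about 2^20 combinations of the optional tokens before it finds the match, so a budget of 1000 steps is exhausted long before that and the call reports a failed match.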

Supported:

  • "." Matches any character except a newline.
  • "^" Matches the start of the string.
  • "$" Matches the end of the string or just before the newline at the end of the string.
  • "*" Matches 0 or more (greedy) repetitions of the preceding RE. Greedy means that it will match as many repetitions as possible.
  • "+" Matches 1 or more (greedy) repetitions of the preceding RE.
  • "?" Matches 0 or 1 (greedy) of the preceding RE.
  • *?,+?,?? Non-greedy versions of the previous three special characters.
  • {m} Matches m copies of the previous RE.
  • {m,n} Matches from m to n repetitions of the preceding RE.
  • {m,n}? Non-greedy version of the above.
  • "\" Either escapes special characters or signals a special sequence.
  • "\1-N" Matches the text matched earlier by the group index.
  • [] Indicates a set of characters.
  • [^] A "^" as the first character indicates a complementing set.
  • "|" A|B, creates an RE that will match either A or B.
  • (...) Matches the RE inside the parentheses. The contents can be retrieved or matched later in the string.
  • (?ims) Set the I, M or S flag for the RE (see below).
  • (?:...) Non-grouping version of regular parentheses.
  • (?P<name>...) The substring matched by the group is accessible by name.
  • (?P=name) Matches the text matched earlier by the group named name.
  • (?#...) A comment; ignored.
  • (?=...) Matches if ... matches next, but doesn't consume the string.
  • (?!...) Matches if ... doesn't match next.
  • (?<=...) Matches if preceded by ... (must be fixed length).
  • (?<!...) Matches if not preceded by ... (must be fixed length).
  • (?(id/name)yes|no) Matches yes pattern if the group with id/name matched, the (optional) no pattern otherwise.
  • \d \D \w \W \s \S
  • Flag: DOTALL
  • Flag: IGNORECASE
  • Flag: MULTILINE
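
Since tinyre aims to be compatible with SRE, several of the constructs listed above can be tried out with Python's standard re module. The sketch below uses only the stdlib; that tinyre returns identical results for these inputs is an assumption, and supported_syntax_demo is a name made up for this example.

```python
import re


def supported_syntax_demo():
    """Exercise a few constructs from the list with Python's stdlib `re`."""
    results = {}

    # (?P<name>...) and (?P=name): a named group and a backreference to it.
    m = re.match(r"(?P<word>\w+) (?P=word)", "hello hello")
    results["named_backref"] = m.group("word")

    # *? is the non-greedy version of *: stop at the first ">".
    results["non_greedy"] = re.match(r"<.*?>", "<a><b>").group(0)

    # {m,n} repeats the preceding RE between m and n times (greedy).
    results["counted"] = re.match(r"a{2,3}", "aaaa").group(0)

    # (?=...) looks ahead without consuming; (?<=...) looks behind.
    results["lookahead"] = re.match(r"foo(?=bar)", "foobar").group(0)
    results["lookbehind"] = re.search(r"(?<=-)\w+", "spam-egg").group(0)

    # (?(1)yes|no): require ">" only if group 1 (the "<") matched.
    cond = re.compile(r"(<)?\w+(?(1)>)$")
    results["conditional"] = [
        bool(cond.match(s)) for s in ("<user>", "user", "<user")
    ]

    return results


if __name__ == "__main__":
    for name, value in supported_syntax_demo().items():
        print(name, value)
```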

Some of the functions in this module take flags as optional parameters:

  • I IGNORECASE Perform case-insensitive matching.
  • M MULTILINE "^" matches the beginning of lines (after a newline) as well as the string. "$" matches the end of lines (before a newline) as well as the end of the string.
  • S DOTALL "." matches any character at all, including the newline.
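
The effect of the three flags can be checked with Python's standard re module, whose I, M and S flags have the meanings described above. This is a small sketch using the stdlib only; how the flags are spelled in tinyre's own Python wrapper is not shown here, and flag_demo is a hypothetical helper name.

```python
import re


def flag_demo():
    """Show each flag switched off and on, using the stdlib `re` module."""
    text = "a\nb\nc"
    return {
        # I: case-insensitive matching.
        "ignorecase": bool(re.match("abc", "ABC", re.I)),
        # M: without it, "^" and "$" only anchor at the string boundaries.
        "multiline_off": re.findall("^b$", text),
        "multiline_on": re.findall("^b$", text, re.M),
        # S: without it, "." does not match a newline.
        "dotall_off": bool(re.match("a.b", "a\nb")),
        "dotall_on": bool(re.match("a.b", "a\nb", re.S)),
        # (?i) sets the flag from inside the pattern.
        "inline": bool(re.match("(?i)abc", "ABC")),
    }


if __name__ == "__main__":
    for name, value in flag_demo().items():
        print(name, value)
```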

Use

C/C++

#include "tinyre.h"

tre_Pattern* pattern;
tre_Match* match;

pattern = tre_compile("^(bb)*a", 0);
match = tre_match(pattern, "bbbbabc", 0);

// Group  0: bbbba
// Group  1: bb

Python

Edit CMakeLists.txt: set build_target to py3lib and disable the debug build type.

project (tinyre)
#set(CMAKE_BUILD_TYPE Debug)
#set(build_target demo)
set(build_target py3lib)

Then build the extension and copy it into lib_py3:

mkdir build
cd build && cmake .. && make
cp ./_tinyre.so ../lib_py3

Finally, try it from the lib_py3 directory:

cd ../lib_py3
python3
>>> import tre
>>> tre.match("^(bb)*a", "bbbbabc")
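
For comparison, the same pattern can be run through Python's standard re module, which tinyre plans to stay compatible with. This is only a reference sketch with the stdlib; the shape of the object returned by tre.match is not assumed here.

```python
import re

# Same pattern and input as in the C example above.
m = re.match("^(bb)*a", "bbbbabc")

# Group 0 is the whole match; group 1 holds the last repetition of (bb).
groups = (m.group(0), m.group(1))
print(groups)  # ('bbbba', 'bb')
```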

Doc

Basic design
TODO list
Changelog

License: zlib
