
Comments (3)

BurntSushi commented on August 15, 2024

Nice find. At first I thought this might be a case of the latency of the regex engine playing a role here (because it generally has more overhead than what you have here). But some other examples reveal this probably isn't the case?

$ time rg -c '[A-Z]+' enwik9
7421426

real    0.604
user    0.547
sys     0.057
maxmem  959 MB
faults  0

$ time ltrep-1297041 -c '[A-Z]+' enwik9
7421426

real    0.904
user    0.843
sys     0.060
maxmem  954 MB
faults  0

$ time rg -c '[A-Z][A-Z]+' enwik9
1212981

real    1.141
user    1.077
sys     0.063
maxmem  959 MB
faults  0

$ time ltrep-1297041 -c '[A-Z][A-Z]+' enwik9
1212981

real    0.925
user    0.884
sys     0.040
maxmem  954 MB
faults  0

So I think this probably warrants some investigation.

Also, just because something is a toy doesn't mean it is expected to never be faster than something that isn't a toy. For example, this is the "toy" version of memchr:

fn memchr(needle: u8, haystack: &[u8]) -> Option<usize> {
    haystack.iter().position(|&b| b == needle)
}

But sometimes this will be faster than a SIMD optimized version, because the SIMD optimized version needs a preamble, a fallback for the case when the haystack is smaller than the vector size, code to create the vector and so on. And the SIMD version might not get inlined if it's using AVX2 and the surrounding code wasn't compiled with AVX2 enabled. (Which is the common case, although the rise of things like x86-64-v3 is changing that.)
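
A rough sketch of the shape such an implementation tends to take (illustrative only, not the memchr crate's actual code; the scalar chunk scan below stands in for the real vector compare-and-mask):

fn memchr_vectorized_shape(needle: u8, haystack: &[u8]) -> Option<usize> {
    // One AVX2 register holds 32 bytes.
    const VECTOR_SIZE: usize = 32;

    // Preamble/fallback: short haystacks never reach the vector path, so
    // tiny inputs pay this branch on top of what the "toy" version does.
    if haystack.len() < VECTOR_SIZE {
        return haystack.iter().position(|&b| b == needle);
    }

    // Main loop: VECTOR_SIZE bytes per iteration. Real code would splat
    // `needle` into a vector, compare a whole chunk and extract a mask here.
    let mut i = 0;
    while i + VECTOR_SIZE <= haystack.len() {
        let chunk = &haystack[i..i + VECTOR_SIZE];
        if let Some(j) = chunk.iter().position(|&b| b == needle) {
            return Some(i + j);
        }
        i += VECTOR_SIZE;
    }

    // Tail: the last partial chunk needs separate handling.
    haystack[i..].iter().position(|&b| b == needle).map(|j| i + j)
}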

And like, despite your regex engine being a toy, your search code is still very tight and about as good as you can do with a DFA: you transition from state to state and report whether a match state was seen. ripgrep's regex engine basically does the same, but with more frills.
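
For illustration, a loop in that spirit might look something like this (a sketch only, not ltrep's or ripgrep's actual code):

fn dfa_contains_match(
    transitions: &[[usize; 256]], // transitions[state][byte] -> next state
    accepting: &[bool],           // accepting[state] -> is this a match state?
    start: usize,
    haystack: &[u8],
) -> bool {
    // Walk the DFA over every byte and remember whether a match state was
    // ever entered; this is about as tight as a full DFA scan gets.
    let mut state = start;
    let mut matched = accepting[state];
    for &b in haystack {
        state = transitions[state][b as usize];
        matched |= accepting[state];
    }
    matched
}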


BurntSushi commented on August 15, 2024

My hypothesis here is that your toy regex engine is benefiting precisely from the coupling that comes from the nature of it being a toy. That is, in your grep, the code for searching the input and the code for walking the DFA are effectively intertwined. That generally isn't true for a general purpose regex engine: there's usually an abstraction boundary that separates them. To test this, I generated a file with the same number of lines as enwik9 but where every line was just AZ:

use std::io::Write;

fn main() -> anyhow::Result<()> {
    let lines = 13_147_025;
    let mut out = std::io::stdout().lock();
    for _ in 0..lines {
        writeln!(out, "AZ")?;
    }
    Ok(())
}

Then I benchmarked ripgrep, ltrep and GNU grep with hyperfine:

$ hyperfine --output pipe "LC_ALL=C grep -E -c '[A-Z][A-Z]' enwik9-two-capitals" "rg --no-config -c '[A-Z][A-Z]' enwik9-two-capitals" "ltrep-1297041 -c '[A-Z][A-Z]' enwik9-two-capitals"
Benchmark 1: LC_ALL=C grep -E -c '[A-Z][A-Z]' enwik9-two-capitals
  Time (mean ± σ):     112.7 ms ±   2.5 ms    [User: 110.7 ms, System: 2.2 ms]
  Range (min … max):   110.5 ms … 120.4 ms    25 runs

Benchmark 2: rg --no-config -c '[A-Z][A-Z]' enwik9-two-capitals
  Time (mean ± σ):     327.9 ms ±   1.5 ms    [User: 326.3 ms, System: 1.7 ms]
  Range (min … max):   325.4 ms … 331.0 ms    10 runs

Benchmark 3: ltrep-1297041 -c '[A-Z][A-Z]' enwik9-two-capitals
  Time (mean ± σ):      22.5 ms ±   3.4 ms    [User: 20.6 ms, System: 2.0 ms]
  Range (min … max):    16.0 ms …  31.3 ms    91 runs

Summary
  ltrep-1297041 -c '[A-Z][A-Z]' enwik9-two-capitals ran
    5.00 ± 0.76 times faster than LC_ALL=C grep -E -c '[A-Z][A-Z]' enwik9-two-capitals
   14.55 ± 2.20 times faster than rg --no-config -c '[A-Z][A-Z]' enwik9-two-capitals

ltrep doesn't just beat ripgrep, it also beats GNU grep. GNU grep doesn't have as much abstraction as ripgrep, but it has more than ltrep. Also, at a certain point, feature support plays a role here. As the features and optimizations and use cases grow, so too do the abstractions.

But ltrep's advantage here is data dependent. Your particular implementation examines every byte of input; in exchange, your code is simpler, but it can be significantly slower. For example, I wrote this program to generate a similar file as above, but with 200 bytes of filler ("az" repeated 100 times) after the initial AZ:

use std::io::Write;

fn main() -> anyhow::Result<()> {
    let lines = 13_147_025;
    let mut out = std::io::stdout().lock();
    let filler = "az".repeat(100);
    for _ in 0..lines {
        write!(out, "AZ")?;
        write!(out, "{filler}")?;
        writeln!(out, "")?;
    }
    Ok(())
}

Then I re-ran the benchmarks:

$ hyperfine --output pipe "LC_ALL=C grep -E -c '[A-Z][A-Z]' enwik9-two-capitals-with-filler" "rg --no-config -c '[A-Z][A-Z]' enwik9-two-capitals-with-filler" "ltrep-1297041 -c '[A-Z][A-Z]' enwik9-two-capitals-with-filler"
Benchmark 1: LC_ALL=C grep -E -c '[A-Z][A-Z]' enwik9-two-capitals-with-filler
  Time (mean ± σ):     330.5 ms ±   7.5 ms    [User: 161.5 ms, System: 168.8 ms]
  Range (min … max):   321.7 ms … 341.6 ms    10 runs

Benchmark 2: rg --no-config -c '[A-Z][A-Z]' enwik9-two-capitals-with-filler
  Time (mean ± σ):     531.5 ms ±  11.2 ms    [User: 460.0 ms, System: 71.1 ms]
  Range (min … max):   502.5 ms … 544.9 ms    10 runs

Benchmark 3: ltrep-1297041 -c '[A-Z][A-Z]' enwik9-two-capitals-with-filler
  Time (mean ± σ):      2.294 s ±  0.019 s    [User: 2.238 s, System: 0.055 s]
  Range (min … max):    2.261 s …  2.312 s    10 runs

Summary
  LC_ALL=C grep -E -c '[A-Z][A-Z]' enwik9-two-capitals-with-filler ran
    1.61 ± 0.05 times faster than rg --no-config -c '[A-Z][A-Z]' enwik9-two-capitals-with-filler
    6.94 ± 0.17 times faster than ltrep-1297041 -c '[A-Z][A-Z]' enwik9-two-capitals-with-filler

Both GNU grep and ripgrep know to stop searching on each line after seeing the AZ. But ltrep continues. Your code is less branchy because of it, but now it's doing a bunch of wasted work.

I think there's room for ripgrep to improve here, but I'd consider this difference to be overall small. And once the abstraction genie is out of the bottle, it's hard to put it back.


Bricktech2000 commented on August 15, 2024

Short-circuiting out of the scan when the DFA hits an accepting state in the middle of a partial match is such low-hanging fruit that I can't believe I didn't think of it. In the general case I believe this boils down to preprocessing the DFA by flagging states which either always or never reach an accepting state. And I wouldn't be surprised if the additional compare and branch needed in the hot loop nullified LTRE's "head start".
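
For illustration, the hot loop might end up looking something like this (a sketch only, not LTRE's actual code; the dead-state flag covers the "never reaches an accepting state" half, and the already-matched check stands in for the "always reaches" half):

fn line_matches_early_exit(
    transitions: &[[usize; 256]], // transitions[state][byte] -> next state
    accepting: &[bool],           // accepting[state] -> is this a match state?
    dead: &[bool],                // dead[state] -> no accepting state reachable
    start: usize,
    line: &[u8],
) -> bool {
    let mut state = start;
    let mut matched = accepting[state];
    for &b in line {
        state = transitions[state][b as usize];
        matched |= accepting[state];
        // The extra compare and branch in the hot loop: stop as soon as the
        // outcome for this line can no longer change.
        if matched || dead[state] {
            break;
        }
    }
    matched
}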

Thanks for taking the time to provide such a great response.

