ryo1kato / mlr-grep
Multi-line log grep
License: MIT License
Make sure our implementation is O(n).
Inadvertent use of, e.g., ByteString concatenation can cause huge memcpy calls and introduce O(n^2) behavior.
Perhaps use stringsearch?
https://bitbucket.org/dafis/stringsearch
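To illustrate the concern above, here is a minimal sketch of how strict ByteString concatenation can turn quadratic. It is not taken from hmlgrep; `joinSlow` and `joinFast` are hypothetical names, and whether the real code has this exact pattern is an assumption.

```haskell
import qualified Data.ByteString.Char8 as B
import Data.List (foldl')

-- Quadratic: every B.append on strict ByteStrings allocates a new buffer
-- and copies the whole accumulator, so k chunks cost O(k^2) bytes copied.
joinSlow :: [B.ByteString] -> B.ByteString
joinSlow = foldl' B.append B.empty

-- Linear: B.concat sizes the result once and copies each chunk once.
joinFast :: [B.ByteString] -> B.ByteString
joinFast = B.concat

main :: IO ()
main = do
  let chunks = replicate 1000 (B.pack "line\n")
  print (joinSlow chunks == joinFast chunks)  -- same bytes either way
  print (B.length (joinFast chunks))
```

Both functions return identical output; only the amount of copying differs, which is why this kind of problem tends to show up in `+RTS -s` allocation figures rather than in test results.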
Don't read entire input into RAM.
Too much memory copying seems to be happening (7 GB allocated to process 4 MB of data).
Probably a problem in toLogs.
head -c 4000000 test-result/test.data| time ./hmlgrep 'no pattern.match' +RTS -s
7,058,003,368 bytes allocated in the heap
30,336,800 bytes copied during GC
3,564,752 bytes maximum residency (5 sample(s))
905,648 bytes maximum slop
8 MB total memory in use (1 MB lost due to fragmentation)
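One possible way to avoid holding the whole input in memory is a lazy ByteString, which is read chunk by chunk as it is consumed. The sketch below is line-based for brevity and does not handle multi-line records or regular expressions; `grepLines` is a hypothetical helper, not part of hmlgrep.

```haskell
import qualified Data.ByteString.Char8 as B
import qualified Data.ByteString.Lazy.Char8 as BL

-- Keep only the lines that contain the needle. The lazy input is split
-- into lines on demand, and each line is made strict only for the search.
grepLines :: B.ByteString -> BL.ByteString -> [B.ByteString]
grepLines needle = filter (needle `B.isInfixOf`) . map BL.toStrict . BL.lines

main :: IO ()
main = do
  input <- BL.getContents            -- lazy: stdin is read incrementally
  mapM_ B.putStrLn (grepLines (B.pack "ERROR") input)
```

Because each matching line is printed as soon as it is found, lines that have already been processed can be garbage collected, provided nothing else keeps a reference to `input`.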
Work more like GNU grep or AWK do, to make it faster.
http://lists.freebsd.org/pipermail/freebsd-current/2010-August/019310.html
The test is currently commented out.
[color_dot] Testing: --color . test/test1.txt
--- test-result/color_dot.amlgrep 2014-08-17 10:26:14.000000000 +0900
+++ test-result/color_dot.hmlgrep 2014-08-17 10:26:14.000000000 +0900
@@ -21,4 +21,3 @@
123
456
679
-
=====================================
Inconsistent results: amlgrep and hmlgrep
Arguments were: --color . test/test1.txt
Extremely slow when the input is larger than 20,000,000 bytes.
for i in 1 2 3 4 5 6 7 8 9 10;do head -c ${i}0000000 test-result/test.data | time ./hmlgrep --rs=^---- 'Go*gle' >/dev/null +RTS -s |& grep allocated; done
1,838,886,064 bytes allocated in the heap
./hmlgrep --rs=^---- 'Go*gle' +RTS -s > /dev/null 2>&1 0.85s user 0.03s system 99% cpu 0.880 total
grep --color=auto allocated 0.03s user 0.00s system 3% cpu 0.879 total
5,212,401,480 bytes allocated in the heap
./hmlgrep --rs=^---- 'Go*gle' +RTS -s > /dev/null 2>&1 3.31s user 0.06s system 99% cpu 3.375 total
grep --color=auto allocated 0.06s user 0.00s system 1% cpu 3.375 total
10,101,414,752 bytes allocated in the heap
./hmlgrep --rs=^---- 'Go*gle' +RTS -s > /dev/null 2>&1 7.47s user 0.10s system 99% cpu 7.580 total
grep --color=auto allocated 0.09s user 0.00s system 1% cpu 7.580 total
16,513,555,648 bytes allocated in the heap
./hmlgrep --rs=^---- 'Go*gle' +RTS -s > /dev/null 2>&1 13.91s user 0.17s system 99% cpu 14.106 total
grep --color=auto allocated 0.12s user 0.00s system 0% cpu 14.106 total
24,452,660,056 bytes allocated in the heap
./hmlgrep --rs=^---- 'Go*gle' +RTS -s > /dev/null 2>&1 22.66s user 0.25s system 99% cpu 22.927 total
grep --color=auto allocated 0.15s user 0.01s system 0% cpu 22.927 total
33,915,184,208 bytes allocated in the heap
./hmlgrep --rs=^---- 'Go*gle' +RTS -s > /dev/null 2>&1 32.62s user 0.31s system 99% cpu 32.985 total
grep --color=auto allocated 0.18s user 0.01s system 0% cpu 32.985 total
44,896,582,768 bytes allocated in the heap
./hmlgrep --rs=^---- 'Go*gle' +RTS -s > /dev/null 2>&1 45.96s user 0.39s system 99% cpu 46.409 total
grep --color=auto allocated 0.21s user 0.01s system 0% cpu 46.409 total
57,405,759,288 bytes allocated in the heap
./hmlgrep --rs=^---- 'Go*gle' +RTS -s > /dev/null 2>&1 64.94s user 0.59s system 99% cpu 1:05.63 total
grep --color=auto allocated 0.25s user 0.01s system 0% cpu 1:05.63 total
71,441,795,904 bytes allocated in the heap
./hmlgrep --rs=^---- 'Go*gle' +RTS -s > /dev/null 2>&1 79.48s user 0.65s system 99% cpu 1:20.20 total
grep --color=auto allocated 0.27s user 0.01s system 0% cpu 1:20.20 total
86,995,466,840 bytes allocated in the heap
./hmlgrep --rs=^---- 'Go*gle' +RTS -s > /dev/null 2>&1 93.02s user 0.66s system 99% cpu 1:33.70 total
grep --color=auto allocated 0.27s user 0.01s system 0% cpu 1:33.70 total
Relatively much faster when the input is smaller than 10 MB (10,000,000 bytes).
for i in 1 2 3 4 5 6 7 8 9 10;do head -c ${i}000000 test-result/test.data | time ./hmlgrep --rs=^---- 'Go*gle' >/dev/null +RTS -s |& grep allocated; done
118,545,560 bytes allocated in the heap
./hmlgrep --rs=^---- 'Go*gle' +RTS -s > /dev/null 2>&1 0.05s user 0.01s system 97% cpu 0.058 total
grep --color=auto allocated 0.00s user 0.00s system 9% cpu 0.058 total
247,628,352 bytes allocated in the heap
./hmlgrep --rs=^---- 'Go*gle' +RTS -s > /dev/null 2>&1 0.10s user 0.01s system 98% cpu 0.110 total
grep --color=auto allocated 0.01s user 0.00s system 6% cpu 0.109 total
390,995,392 bytes allocated in the heap
./hmlgrep --rs=^---- 'Go*gle' +RTS -s > /dev/null 2>&1 0.16s user 0.01s system 98% cpu 0.173 total
grep --color=auto allocated 0.01s user 0.00s system 6% cpu 0.173 total
553,867,384 bytes allocated in the heap
./hmlgrep --rs=^---- 'Go*gle' +RTS -s > /dev/null 2>&1 0.22s user 0.01s system 99% cpu 0.236 total
grep --color=auto allocated 0.01s user 0.00s system 5% cpu 0.236 total
727,632,200 bytes allocated in the heap
./hmlgrep --rs=^---- 'Go*gle' +RTS -s > /dev/null 2>&1 0.31s user 0.01s system 99% cpu 0.321 total
grep --color=auto allocated 0.02s user 0.00s system 5% cpu 0.321 total
919,795,080 bytes allocated in the heap
./hmlgrep --rs=^---- 'Go*gle' +RTS -s > /dev/null 2>&1 0.39s user 0.02s system 99% cpu 0.411 total
grep --color=auto allocated 0.02s user 0.00s system 4% cpu 0.411 total
1,127,436,576 bytes allocated in the heap
./hmlgrep --rs=^---- 'Go*gle' +RTS -s > /dev/null 2>&1 0.46s user 0.02s system 99% cpu 0.476 total
grep --color=auto allocated 0.02s user 0.00s system 4% cpu 0.476 total
1,345,972,736 bytes allocated in the heap
./hmlgrep --rs=^---- 'Go*gle' +RTS -s > /dev/null 2>&1 0.59s user 0.02s system 99% cpu 0.612 total
grep --color=auto allocated 0.02s user 0.00s system 4% cpu 0.611 total
1,586,868,912 bytes allocated in the heap
./hmlgrep --rs=^---- 'Go*gle' +RTS -s > /dev/null 2>&1 0.70s user 0.02s system 99% cpu 0.727 total
grep --color=auto allocated 0.03s user 0.00s system 3% cpu 0.727 total
1,838,886,768 bytes allocated in the heap
./hmlgrep --rs=^---- 'Go*gle' +RTS -s > /dev/null 2>&1 0.85s user 0.03s system 99% cpu 0.881 total
grep --color=auto allocated 0.03s user 0.00s system 3% cpu 0.880 total
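The timings above can be checked for their growth rate. This sketch takes the user times reported for the 10, 20, 40, and 80 million byte runs and estimates the exponent k in t ~ n^k between consecutive doublings; `samples` and `exponentBetween` are names introduced here for illustration.

```haskell
-- (input size in bytes, user seconds), copied from the measurements above
samples :: [(Double, Double)]
samples = [(10e6, 0.85), (20e6, 3.31), (40e6, 13.91), (80e6, 64.94)]

-- Estimated exponent k in t ~ n^k between two measurements.
exponentBetween :: (Double, Double) -> (Double, Double) -> Double
exponentBetween (n1, t1) (n2, t2) = logBase (n2 / n1) (t2 / t1)

main :: IO ()
main = mapM_ print (zipWith exponentBetween samples (drop 1 samples))
```

Each doubling of the input multiplies the run time by roughly four, so the estimated exponents come out close to 2. That is consistent with quadratic rather than linear behavior, although it does not by itself identify the cause.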
... to speed things up.
Most probably a non-existent filename was given on the command line.
About 1 GB of data.