bminixhofer / srx
A mostly compliant Rust implementation of the Segmentation Rules eXchange (SRX) 2.0 standard for text segmentation.
License: Apache License 2.0
RUST_BACKTRACE=1 cargo spellcheck
[2021-02-17T13:34:51Z ERROR cargo_spellcheck::documentation::cluster] BUG: Failed to guarantee literal content/span integrity: Regex should match >/**
* The parameters that are required for the parachains.
*/<
[2021-02-17T13:34:51Z ERROR cargo_spellcheck::documentation::cluster] BUG: Failed to guarantee literal content/span integrity: Regex should match >/**
* The parameters that are not essential, but still may be of interest for parachains.
*/<
[2021-02-17T13:34:51Z ERROR cargo_spellcheck::documentation::cluster] BUG: Failed to guarantee literal content/span integrity: Regex should match >/**
* Parameters that will unlikely be needed by parachains.
*/<
thread '<unnamed>' panicked at 'index out of bounds: the len is 22 but the index is 22', /home/bernhard/.cargo/registry/src/github.com-1ecc6299db9ec823/srx-0.1.1/src/lib.rs:127:20
stack backtrace:
0: rust_begin_unwind
at /rustc/e1884a8e3c3e813aada8254edfa120e85bf5ffca/library/std/src/panicking.rs:495:5
1: core::panicking::panic_fmt
at /rustc/e1884a8e3c3e813aada8254edfa120e85bf5ffca/library/core/src/panicking.rs:92:14
2: core::panicking::panic_bounds_check
at /rustc/e1884a8e3c3e813aada8254edfa120e85bf5ffca/library/core/src/panicking.rs:69:5
3: srx::Rules::split
4: nlprule::tokenizer::Tokenizer::pipe
5: nlprule::rules::Rules::suggest
6: cargo_spellcheck::checker::nlprules::check_sentence
7: rayon::iter::plumbing::Folder::consume_iter
8: rayon::iter::plumbing::bridge_producer_consumer::helper
9: rayon_core::join::join_context::{{closure}}
10: rayon::iter::plumbing::bridge_producer_consumer::helper
11: std::panicking::try
12: <rayon_core::job::StackJob<L,F,R> as rayon_core::job::Job>::execute
13: rayon_core::registry::WorkerThread::wait_until_cold
14: rayon_core::join::join_context::{{closure}}
15: rayon::iter::plumbing::bridge_producer_consumer::helper
16: rayon_core::job::StackJob<L,F,R>::run_inline
17: rayon_core::join::join_context::{{closure}}
18: rayon::iter::plumbing::bridge_producer_consumer::helper
19: rayon_core::job::StackJob<L,F,R>::run_inline
20: rayon_core::join::join_context::{{closure}}
21: rayon::iter::plumbing::bridge_producer_consumer::helper
22: std::panicking::try
23: <rayon_core::job::StackJob<L,F,R> as rayon_core::job::Job>::execute
24: rayon_core::registry::WorkerThread::wait_until_cold
25: rayon_core::join::join_context::{{closure}}
26: rayon::iter::plumbing::bridge_producer_consumer::helper
27: std::panicking::try
28: <rayon_core::job::StackJob<L,F,R> as rayon_core::job::Job>::execute
29: rayon_core::registry::WorkerThread::wait_until_cold
30: rayon_core::registry::ThreadBuilder::run
note: Some details are omitted, run with `RUST_BACKTRACE=full` for a verbose backtrace.
RUST_BACKTRACE=full cargo spellcheck
[2021-02-17T13:35:43Z ERROR cargo_spellcheck::documentation::cluster] BUG: Failed to guarantee literal content/span integrity: Regex should match >/**
* The parameters that are required for the parachains.
*/<
[2021-02-17T13:35:43Z ERROR cargo_spellcheck::documentation::cluster] BUG: Failed to guarantee literal content/span integrity: Regex should match >/**
* The parameters that are not essential, but still may be of interest for parachains.
*/<
[2021-02-17T13:35:43Z ERROR cargo_spellcheck::documentation::cluster] BUG: Failed to guarantee literal content/span integrity: Regex should match >/**
* Parameters that will unlikely be needed by parachains.
*/<
thread '<unnamed>' panicked at 'index out of bounds: the len is 22 but the index is 22', /home/bernhard/.cargo/registry/src/github.com-1ecc6299db9ec823/srx-0.1.1/src/lib.rs:127:20
stack backtrace:
0: 0x55d2b56c82b0 - std::backtrace_rs::backtrace::libunwind::trace::h04d12fdcddff82aa
at /rustc/e1884a8e3c3e813aada8254edfa120e85bf5ffca/library/std/src/../../backtrace/src/backtrace/libunwind.rs:100:5
1: 0x55d2b56c82b0 - std::backtrace_rs::backtrace::trace_unsynchronized::h1459b974b6fbe5e1
at /rustc/e1884a8e3c3e813aada8254edfa120e85bf5ffca/library/std/src/../../backtrace/src/backtrace/mod.rs:66:5
2: 0x55d2b56c82b0 - std::sys_common::backtrace::_print_fmt::h9b8396a669123d95
at /rustc/e1884a8e3c3e813aada8254edfa120e85bf5ffca/library/std/src/sys_common/backtrace.rs:67:5
3: 0x55d2b56c82b0 - <std::sys_common::backtrace::_print::DisplayBacktrace as core::fmt::Display>::fmt::he009dcaaa75eed60
at /rustc/e1884a8e3c3e813aada8254edfa120e85bf5ffca/library/std/src/sys_common/backtrace.rs:46:22
4: 0x55d2b56efadc - core::fmt::write::h77b4746b0dea1dd3
at /rustc/e1884a8e3c3e813aada8254edfa120e85bf5ffca/library/core/src/fmt/mod.rs:1078:17
5: 0x55d2b56c2cb2 - std::io::Write::write_fmt::heb7e50902e98831c
at /rustc/e1884a8e3c3e813aada8254edfa120e85bf5ffca/library/std/src/io/mod.rs:1518:15
6: 0x55d2b56caae5 - std::sys_common::backtrace::_print::h2d880c9e69a21be9
at /rustc/e1884a8e3c3e813aada8254edfa120e85bf5ffca/library/std/src/sys_common/backtrace.rs:49:5
7: 0x55d2b56caae5 - std::sys_common::backtrace::print::h5f02b1bb49f36879
at /rustc/e1884a8e3c3e813aada8254edfa120e85bf5ffca/library/std/src/sys_common/backtrace.rs:36:9
8: 0x55d2b56caae5 - std::panicking::default_hook::{{closure}}::h658e288a7a809b29
at /rustc/e1884a8e3c3e813aada8254edfa120e85bf5ffca/library/std/src/panicking.rs:208:50
9: 0x55d2b56ca788 - std::panicking::default_hook::hb52d73f0da9a4bb8
at /rustc/e1884a8e3c3e813aada8254edfa120e85bf5ffca/library/std/src/panicking.rs:227:9
10: 0x55d2b56cb2a6 - std::panicking::rust_panic_with_hook::hfe7e1c684e3e6462
at /rustc/e1884a8e3c3e813aada8254edfa120e85bf5ffca/library/std/src/panicking.rs:597:17
11: 0x55d2b56cadc7 - std::panicking::begin_panic_handler::{{closure}}::h42939e004b32765c
at /rustc/e1884a8e3c3e813aada8254edfa120e85bf5ffca/library/std/src/panicking.rs:499:13
12: 0x55d2b56c876c - std::sys_common::backtrace::__rust_end_short_backtrace::h9d2070f7bf9fd56c
at /rustc/e1884a8e3c3e813aada8254edfa120e85bf5ffca/library/std/src/sys_common/backtrace.rs:141:18
13: 0x55d2b56cad29 - rust_begin_unwind
at /rustc/e1884a8e3c3e813aada8254edfa120e85bf5ffca/library/std/src/panicking.rs:495:5
14: 0x55d2b56ee151 - core::panicking::panic_fmt::ha0bb065d9a260792
at /rustc/e1884a8e3c3e813aada8254edfa120e85bf5ffca/library/core/src/panicking.rs:92:14
15: 0x55d2b56ee112 - core::panicking::panic_bounds_check::h625de1b83193c0a3
at /rustc/e1884a8e3c3e813aada8254edfa120e85bf5ffca/library/core/src/panicking.rs:69:5
16: 0x55d2b558692d - srx::Rules::split::h757e86f32ff5bdea
17: 0x55d2b553fc3d - nlprule::tokenizer::Tokenizer::pipe::hc177c781b863d3a9
18: 0x55d2b554e15d - nlprule::rules::Rules::suggest::h2b49a4c1e56676c4
19: 0x55d2b53a05a1 - cargo_spellcheck::checker::nlprules::check_sentence::hd763024dab0ff06d
20: 0x55d2b53743f7 - rayon::iter::plumbing::Folder::consume_iter::hb660d392b87ca017
21: 0x55d2b5328090 - rayon::iter::plumbing::bridge_producer_consumer::helper::hccb5a12ba30bde38
22: 0x55d2b532f4d6 - rayon_core::join::join_context::{{closure}}::hbb5523a69dc511c3
23: 0x55d2b532857c - rayon::iter::plumbing::bridge_producer_consumer::helper::hccb5a12ba30bde38
24: 0x55d2b5398390 - std::panicking::try::hd1772e21a95afb23
25: 0x55d2b531ba27 - <rayon_core::job::StackJob<L,F,R> as rayon_core::job::Job>::execute::h82871da77560993d
26: 0x55d2b568b901 - rayon_core::registry::WorkerThread::wait_until_cold::h4014a63026918c4e
27: 0x55d2b532f647 - rayon_core::join::join_context::{{closure}}::hbb5523a69dc511c3
28: 0x55d2b532857c - rayon::iter::plumbing::bridge_producer_consumer::helper::hccb5a12ba30bde38
29: 0x55d2b53151cd - rayon_core::job::StackJob<L,F,R>::run_inline::h55d3cd0b1bb830ef
30: 0x55d2b532f5b3 - rayon_core::join::join_context::{{closure}}::hbb5523a69dc511c3
31: 0x55d2b532857c - rayon::iter::plumbing::bridge_producer_consumer::helper::hccb5a12ba30bde38
32: 0x55d2b532f4d6 - rayon_core::join::join_context::{{closure}}::hbb5523a69dc511c3
33: 0x55d2b532857c - rayon::iter::plumbing::bridge_producer_consumer::helper::hccb5a12ba30bde38
34: 0x55d2b532f4d6 - rayon_core::join::join_context::{{closure}}::hbb5523a69dc511c3
35: 0x55d2b532857c - rayon::iter::plumbing::bridge_producer_consumer::helper::hccb5a12ba30bde38
36: 0x55d2b5398390 - std::panicking::try::hd1772e21a95afb23
37: 0x55d2b531ba27 - <rayon_core::job::StackJob<L,F,R> as rayon_core::job::Job>::execute::h82871da77560993d
38: 0x55d2b568b901 - rayon_core::registry::WorkerThread::wait_until_cold::h4014a63026918c4e
39: 0x55d2b532f647 - rayon_core::join::join_context::{{closure}}::hbb5523a69dc511c3
40: 0x55d2b532857c - rayon::iter::plumbing::bridge_producer_consumer::helper::hccb5a12ba30bde38
41: 0x55d2b5398390 - std::panicking::try::hd1772e21a95afb23
42: 0x55d2b531ba27 - <rayon_core::job::StackJob<L,F,R> as rayon_core::job::Job>::execute::h82871da77560993d
43: 0x55d2b568b901 - rayon_core::registry::WorkerThread::wait_until_cold::h4014a63026918c4e
44: 0x55d2b568a24a - rayon_core::registry::ThreadBuilder::run::h8a2ded62c0d4cce9
45: 0x55d2b568d8a5 - std::sys_common::backtrace::__rust_begin_short_backtrace::h392c810344304a0f
46: 0x55d2b568d38d - core::ops::function::FnOnce::call_once{{vtable.shim}}::h2edf92cb3a4e4f5b
47: 0x55d2b56d34ca - <alloc::boxed::Box<F,A> as core::ops::function::FnOnce<Args>>::call_once::h09ff301006f1aeca
at /rustc/e1884a8e3c3e813aada8254edfa120e85bf5ffca/library/alloc/src/boxed.rs:1307:9
48: 0x55d2b56d34ca - <alloc::boxed::Box<F,A> as core::ops::function::FnOnce<Args>>::call_once::he79488c8f00b5f31
at /rustc/e1884a8e3c3e813aada8254edfa120e85bf5ffca/library/alloc/src/boxed.rs:1307:9
49: 0x55d2b56d34ca - std::sys::unix::thread::Thread::new::thread_start::h587efff279c68ba7
at /rustc/e1884a8e3c3e813aada8254edfa120e85bf5ffca/library/std/src/sys/unix/thread.rs:71:17
50: 0x7fa7ea47f3f9 - start_thread
51: 0x7fa7ea25fb53 - __clone
52: 0x0 - <unknown>
Reproducible when running against paritytech/polkadot. I won't have time to dig into this until the weekend at the earliest.
When using U+203C, U+FE0F, U+2757, and others, sentence splitting does not work as anticipated.
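For anyone trying to reproduce this, the following is a minimal sketch that builds an input containing the code points named above and prints where each character starts in the UTF-8 byte sequence. The sample sentence and the helper `byte_and_char_len` are made up for illustration; the report does not say what input was used or what causes the behavior, so this only shows that byte offsets and character counts differ for such text.

```rust
/// Returns (UTF-8 byte length, number of `char`s) for a string.
/// Hypothetical helper, not part of srx.
fn byte_and_char_len(s: &str) -> (usize, usize) {
    (s.len(), s.chars().count())
}

fn main() {
    // U+203C followed by the variation selector U+FE0F, then U+2757.
    // The sentence itself is an invented sample.
    let sample = "Wait\u{203C}\u{FE0F} Really\u{2757} Yes.";

    let (bytes, chars) = byte_and_char_len(sample);
    println!("bytes = {}, chars = {}", bytes, chars);

    // Print the byte offset at which each character starts.
    for (offset, ch) in sample.char_indices() {
        println!("{:>2}: U+{:04X}", offset, ch as u32);
    }
}
```

A string like `sample` can then be passed to the segmenter to check how the splits come out.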
Committing to one regex backend is always a compromise. It would be nice to have a `RegexBackend` trait so that users can bring their own backend. This would also mean putting the default regex backend behind a feature flag (probably included in `default-features`, though).
An alternative is multiple features for different backends, but I currently favor a trait-based approach because it is more customizable.
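A rough sketch of what such a trait could look like, using only the standard library. Everything here is an assumption for illustration: the trait shape (`compile`, `find_at`), the `LiteralBackend` toy implementation, and the `split_after` function are not taken from srx. A real default backend would wrap a regex crate and sit behind a Cargo feature, which is not shown.

```rust
use std::fmt::Debug;

/// Hypothetical trait a segmenter could be generic over.
pub trait RegexBackend: Sized {
    type Error: Debug;

    /// Compile a pattern into a matcher.
    fn compile(pattern: &str) -> Result<Self, Self::Error>;

    /// Byte range `(start, end)` of the first match at or after `start`.
    fn find_at(&self, text: &str, start: usize) -> Option<(usize, usize)>;
}

/// Toy backend that treats the pattern as a literal string.
pub struct LiteralBackend {
    needle: String,
}

impl RegexBackend for LiteralBackend {
    type Error = String;

    fn compile(pattern: &str) -> Result<Self, Self::Error> {
        if pattern.is_empty() {
            return Err("empty pattern".to_string());
        }
        Ok(LiteralBackend { needle: pattern.to_string() })
    }

    fn find_at(&self, text: &str, start: usize) -> Option<(usize, usize)> {
        // `get` returns None if `start` is out of range or not on a char boundary.
        let rest = text.get(start..)?;
        rest.find(self.needle.as_str())
            .map(|i| (start + i, start + i + self.needle.len()))
    }
}

/// Splits `text` after every match of the break pattern, for any backend.
pub fn split_after<'a, B: RegexBackend>(backend: &B, text: &'a str) -> Vec<&'a str> {
    let mut segments = Vec::new();
    let mut seg_start = 0;
    let mut search_from = 0;
    while let Some((_, end)) = backend.find_at(text, search_from) {
        if end <= search_from {
            break; // guard against backends that report empty matches
        }
        segments.push(&text[seg_start..end]);
        seg_start = end;
        search_from = end;
    }
    if seg_start < text.len() {
        segments.push(&text[seg_start..]);
    }
    segments
}

fn main() {
    let backend = LiteralBackend::compile(". ").expect("valid pattern");
    println!("{:?}", split_after(&backend, "One. Two. Three."));
}
```

Because `split_after` only depends on the trait, a user-supplied backend can be swapped in without touching the segmentation code, which is the customizability argument made above.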
error[E0599]: no function or associated item named `from_str` found for struct `SRX` in the current scope
--> segment/src/main.rs:9:20
|
9 | let srx = SRX::from_str(&fs::read_to_string("data/segment.srx").unwrap())?;
| ^^^^^^^^ function or associated item not found in `SRX`
I am trying to compile the example from the documentation, but it does not compile even though the `from_xml` feature is enabled.
use std::fs;
use srx::SRX;

fn main() {
    let srx = SRX::from_xml(&fs::read_to_string("data/language_tools.segment.srx").unwrap())?;
    let english_rules = srx.language_rules("en");

    println!("Hello, world!");
    assert_eq!(
        english_rules.split("e.g. U.K. and Mr. do not split. SRX is a rule-based format.").collect::<Vec<_>>(),
        vec!["e.g. U.K. and Mr. do not split. ", "SRX is a rule-based format."]
    );
}
error[E0599]: no function or associated item named `from_xml` found for struct `SRX` in the current scope
--> src/main.rs:5:20
|
5 | let srx = SRX::from_xml(&fs::read_to_string("data/language_tools.segment.srx").unwrap())?;
| ^^^^^^^^ function or associated item not found in `SRX`
error[E0277]: the `?` operator can only be used in a function that returns `Result` or `Option` (or another type that implements `Try`)
--> src/main.rs:5:15
|
4 | / fn main() {
5 | | let srx = SRX::from_xml(&fs::read_to_string("data/language_tools.segment.srx").unwrap())?;
| | ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ cannot use the `?` operator in a function that returns `()`
6 | | let english_rules = srx.language_rules("en");
7 | |
... |
12 | | );
13 | | }
| |_- this function should return `Result` or `Option` to accept `?`
|
= help: the trait `Try` is not implemented for `()`
= note: required by `from_error`
error: aborting due to 2 previous errors
[package]
name = "rsegment"
version = "0.1.0"
edition = "2018"
[dependencies]
srx = { version = "0.1.3", features = ["from_xml"] }
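The second diagnostic (E0277) is independent of srx: `?` can only be used inside a function that returns `Result` or `Option`, so `main` needs a `Result` return type. The sketch below shows that pattern with a made-up stand-in type, `ToyRules`, which is not the srx API. It also shows that calling `from_str` as an associated function requires `std::str::FromStr` to be in scope when a type implements that trait. Whether that is what happens with `SRX` in the reported versions is an assumption that would need checking against the crate's documentation.

```rust
use std::error::Error;
use std::str::FromStr; // must be in scope to call `ToyRules::from_str`

/// Hypothetical stand-in for a rules type parsed from text; not the srx API.
struct ToyRules {
    breaks: Vec<String>,
}

impl FromStr for ToyRules {
    type Err = String;

    // Treats every non-empty line as one rule.
    fn from_str(s: &str) -> Result<Self, Self::Err> {
        let breaks: Vec<String> = s
            .lines()
            .map(str::trim)
            .filter(|line| !line.is_empty())
            .map(String::from)
            .collect();
        if breaks.is_empty() {
            Err("no rules found".to_string())
        } else {
            Ok(ToyRules { breaks })
        }
    }
}

/// Parses the rules and returns how many were loaded.
fn load(source: &str) -> Result<usize, Box<dyn Error>> {
    // `?` converts the `String` error into `Box<dyn Error>`.
    let rules = ToyRules::from_str(source)?;
    Ok(rules.breaks.len())
}

// Returning `Result` from `main` is what makes `?` legal here.
fn main() -> Result<(), Box<dyn Error>> {
    let count = load(".\n!\n")?;
    println!("loaded {} rules", count);
    Ok(())
}
```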