starkat99 / widestring-rs
A wide string Rust library for converting to and from wide-character strings, including UTF-16 and UTF-32 encoding.
Home Page: https://docs.rs/widestring/
License: Other
As the name suggests, this would remove some boilerplate in my public code. Specifically, I need TryFrom<OsString> for U16CString, although I don't see why it shouldn't be implemented for all fallible conversions.
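As an illustration of the boilerplate such an impl would absorb, here is a minimal std-only sketch. NulTerminatedU16 and FromOsStringError are hypothetical stand-ins, not widestring's U16CString or its error types. A downstream crate cannot write TryFrom<OsString> for U16CString itself, because the orphan rule forbids implementing a foreign trait for a foreign type, so the impl would have to live in widestring.

```rust
use std::convert::TryFrom;
use std::ffi::OsString;

/// Hypothetical nul-terminated UTF-16 buffer (NOT widestring's U16CString).
#[derive(Debug, PartialEq)]
pub struct NulTerminatedU16(Vec<u16>);

/// Hypothetical error type for the conversion.
#[derive(Debug, PartialEq)]
pub enum FromOsStringError {
    /// The OsString was not valid Unicode (this sketch goes through UTF-8).
    NotUnicode(OsString),
    /// An interior nul was found at this code-unit index.
    InteriorNul(usize),
}

impl TryFrom<OsString> for NulTerminatedU16 {
    type Error = FromOsStringError;

    fn try_from(s: OsString) -> Result<Self, Self::Error> {
        // Portable path via String; on Windows, OsStrExt::encode_wide would
        // keep unpaired surrogates instead of rejecting them.
        let text = s.into_string().map_err(FromOsStringError::NotUnicode)?;
        let mut units: Vec<u16> = text.encode_utf16().collect();
        if let Some(pos) = units.iter().position(|&u| u == 0) {
            return Err(FromOsStringError::InteriorNul(pos));
        }
        units.push(0); // the terminator
        Ok(NulTerminatedU16(units))
    }
}
```

With the impl in place, callers can simply write NulTerminatedU16::try_from(os_string)? instead of hand-rolling the checks at each call site.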
It would be nice if the UtfNString types had insert and insert_str methods like String, for inserting a character or string (slice) at a certain position.
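A minimal sketch of what String-like insert and insert_str could look like on a UTF-16 buffer, assuming indices are measured in u16 code units and that inserting inside a surrogate pair should panic, as String::insert does for byte indices that are not on a char boundary. Utf16Buf is a hypothetical type, not widestring's Utf16String.

```rust
/// Hypothetical UTF-16 buffer; indices are in u16 code units.
pub struct Utf16Buf {
    units: Vec<u16>,
}

impl Utf16Buf {
    pub fn new(s: &str) -> Self {
        Utf16Buf { units: s.encode_utf16().collect() }
    }

    /// True unless `idx` points at a low surrogate (i.e. mid-pair) or past the end.
    fn is_char_boundary(&self, idx: usize) -> bool {
        idx == self.units.len()
            || (idx < self.units.len() && !(0xDC00u16..=0xDFFF).contains(&self.units[idx]))
    }

    /// Inserts a string slice at code-unit index `idx`.
    pub fn insert_str(&mut self, idx: usize, s: &str) {
        assert!(self.is_char_boundary(idx), "index is not a char boundary");
        // Replacing the empty range idx..idx inserts the encoded units.
        self.units.splice(idx..idx, s.encode_utf16());
    }

    /// Inserts one char, which takes one or two code units.
    pub fn insert(&mut self, idx: usize, c: char) {
        assert!(self.is_char_boundary(idx), "index is not a char boundary");
        let mut buf = [0u16; 2];
        let encoded = c.encode_utf16(&mut buf);
        self.units.splice(idx..idx, encoded.iter().cloned());
    }

    pub fn to_string_lossy(&self) -> String {
        String::from_utf16_lossy(&self.units)
    }
}
```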
Despite what the documentation says, there is no conversion from Utf32String to Vec<char> (although there is one the other way around):
This also means that Utf32String is the same representation as a Vec<char>; indeed conversions between the two exist and are simple typecasts.
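Until such a conversion exists, a safe std-only workaround is to convert element by element. The function names below are hypothetical. Going from char to u32 can never fail, while going from u32 back to char must reject surrogates and values above 0x10FFFF, which is why the reverse direction cannot be an unchecked cast on arbitrary u32 data.

```rust
/// Vec<char> -> Vec<u32>: always valid, every char is a Unicode scalar value.
fn chars_to_utf32(chars: Vec<char>) -> Vec<u32> {
    chars.into_iter().map(|c| c as u32).collect()
}

/// Vec<u32> -> Vec<char>: validates each unit and reports the index of the
/// first one that is not a valid char (surrogate or > 0x10FFFF).
fn utf32_to_chars(units: Vec<u32>) -> Result<Vec<char>, usize> {
    units
        .into_iter()
        .enumerate()
        .map(|(i, u)| std::char::from_u32(u).ok_or(i))
        .collect()
}
```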
#[test]
fn truncated_with_surrogate() {
    // Character U+24B62, encoded as D852 DF62 in UTF-16
    let buf = "𤭢";
    let mut s = widestring::U16String::from_str(buf);
    s.pop_char();
}
output:
thread 'windows::mount::tests::truncated_with_surrogate' panicked at C:\Users\gbleu\.cargo\registry\src\index.crates.io-6f17d22bba15001f\widestring-1.0.2\src\ustring.rs:1286:42:
index out of bounds: the len is 1 but the index is 1
stack backtrace:
0: std::panicking::begin_panic_handler
at /rustc/82e1608dfa6e0b5569232559e3d385fea5a93112/library\std\src\panicking.rs:645
1: core::panicking::panic_fmt
at /rustc/82e1608dfa6e0b5569232559e3d385fea5a93112/library\core\src\panicking.rs:72
2: core::panicking::panic_bounds_check
at /rustc/82e1608dfa6e0b5569232559e3d385fea5a93112/library\core\src\panicking.rs:190
3: core::slice::index::impl$2::index<u16>
at /rustc/82e1608dfa6e0b5569232559e3d385fea5a93112\library\core\src\slice\index.rs:258
4: alloc::vec::impl$12::index<u16,usize,alloc::alloc::Global>
at /rustc/82e1608dfa6e0b5569232559e3d385fea5a93112\library\alloc\src\vec\mod.rs:2732
5: widestring::ustring::U16String::pop_char
at C:\Users\gbleu\.cargo\registry\src\index.crates.io-6f17d22bba15001f\widestring-1.0.2\src\ustring.rs:1286
6: libparsec_platform_mountpoint::windows::mount::tests::truncated_with_surrogate
at .\tests\unit\windows_volume_label.rs:40
7: libparsec_platform_mountpoint::windows::mount::tests::truncated_with_surrogate::closure$0
at .\tests\unit\windows_volume_label.rs:35
8: core::ops::function::FnOnce::call_once<libparsec_platform_mountpoint::windows::mount::tests::truncated_with_surrogate::closure_env$0,tuple$<> >
at /rustc/82e1608dfa6e0b5569232559e3d385fea5a93112\library\core\src\ops\function.rs:250
9: core::ops::function::FnOnce::call_once
at /rustc/82e1608dfa6e0b5569232559e3d385fea5a93112/library\core\src\ops\function.rs:250
note: Some details are omitted, run with `RUST_BACKTRACE=full` for a verbose backtrace.
Canceling due to test failure: 0 tests still running
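For comparison, here is a hypothetical stand-alone pop_char over a Vec<u16> that peeks at the preceding unit with last() instead of indexing, so neither a lone low surrogate nor an empty remainder can go out of bounds. It is a sketch of the intended behavior, not a patch against widestring's ustring.rs; returning unpaired surrogates as Err(u16) is an assumption of this sketch.

```rust
/// Hypothetical: removes and returns the last character of a UTF-16 buffer.
/// Unpaired surrogates are returned as Err(unit).
fn pop_char(buf: &mut Vec<u16>) -> Option<Result<char, u16>> {
    let last = buf.pop()?;
    if (0xDC00u16..=0xDFFF).contains(&last) {
        // Low surrogate: combine only if a high surrogate precedes it.
        // last() yields None on an empty buffer instead of panicking.
        if let Some(&high) = buf.last() {
            if (0xD800u16..=0xDBFF).contains(&high) {
                buf.pop();
                let c = 0x10000 + (((high as u32) - 0xD800) << 10) + ((last as u32) - 0xDC00);
                return Some(Ok(std::char::from_u32(c).unwrap()));
            }
        }
        return Some(Err(last));
    }
    // A BMP char, or an unpaired high surrogate (from_u32 rejects those).
    Some(std::char::from_u32(last as u32).ok_or(last))
}
```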
I guess this is due to an off-by-one issue here:
Line 1286 in e123e39
self.inner[self.len() - 1]
)
As of my PR #12, the library won't compile with Rust 1.26.0.
The options are either to drop support for this version (which is already quite old), which is a breaking change and would therefore also require a version bump, or to fix it.
I could try fixing it if 1.26.0 support is mandatory, but I don't see why it should be. I'd like some input on this.
Hi there,
this is actually more of a question than an issue, but I'm very interested in your opinion. I have been using your WideCString type in a very small Rust project and need to transform a variable of type WideCString into a slice of type &str.
I can see various ways of doing so, specifically:
I would be really interested in your perspective on this. What would be the best way to go?
For me this is not only a question of writing code that just works. I'm currently learning Rust and am really interested in writing "good" code (whatever that means). So I'm trying to understand a little more than I strictly have to...
Thanks
Norbert
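One point worth noting for this question: WideCString stores wide code units, while a &str must be UTF-8, so a &str cannot borrow directly from the wide string. Allocating a String is unavoidable, and a &str can then be borrowed from that String. A minimal std-only sketch, assuming the wide string's contents are available as a &[u16] without the nul terminator (the function names are hypothetical):

```rust
/// Strict conversion: fails on unpaired surrogates.
fn wide_to_string(units: &[u16]) -> Result<String, std::string::FromUtf16Error> {
    // UTF-16 -> UTF-8 is a re-encoding, so a new String is allocated.
    String::from_utf16(units)
}

/// Lossy conversion: unpaired surrogates become U+FFFD.
fn wide_to_string_lossy(units: &[u16]) -> String {
    String::from_utf16_lossy(units)
}
```

The &str is then just a borrow of the owned result, e.g. let owned = wide_to_string(&w)?; let s: &str = &owned;. The choice between the two functions is whether invalid data should be an error or silently replaced.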
For Rust &str's, the standard library has the (still unstable) std::str::pattern::Pattern trait. It's used for methods like contains, starts_with, or matches.
This trait should be ported to work on types from this library (ideally, the trait from the standard library would be generalized to allow different string types).
#1 requested string matching functions for the types exposed by this crate; Pattern would allow this by providing a std-like API.
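Until something like Pattern can be used with these types, a simplified analogue can be written over UTF-16 code units. The trait U16Pattern and the free functions below are hypothetical. Like the standard library's Pattern, it has impls for a single unit, a slice, and a FnMut predicate, but it only supports prefix matching plus a naive contains.

```rust
/// Hypothetical, much-simplified analogue of std::str::pattern::Pattern.
pub trait U16Pattern {
    /// Does `haystack` begin with this pattern?
    fn is_prefix_of(&mut self, haystack: &[u16]) -> bool;
}

// A single code unit.
impl U16Pattern for u16 {
    fn is_prefix_of(&mut self, haystack: &[u16]) -> bool {
        haystack.first() == Some(&*self)
    }
}

// A slice of code units.
impl<'a> U16Pattern for &'a [u16] {
    fn is_prefix_of(&mut self, haystack: &[u16]) -> bool {
        haystack.starts_with(*self)
    }
}

// A predicate on the first code unit.
impl<F: FnMut(u16) -> bool> U16Pattern for F {
    fn is_prefix_of(&mut self, haystack: &[u16]) -> bool {
        match haystack.first() {
            Some(&u) => (*self)(u),
            None => false,
        }
    }
}

pub fn starts_with<P: U16Pattern>(haystack: &[u16], mut pat: P) -> bool {
    pat.is_prefix_of(haystack)
}

/// Naive O(n*m) search: tries the pattern at every code-unit offset.
pub fn contains<P: U16Pattern>(haystack: &[u16], mut pat: P) -> bool {
    (0..=haystack.len()).any(|i| pat.is_prefix_of(&haystack[i..]))
}
```

The unit and predicate impls can coexist with the blanket closure impl because the Fn traits are marked fundamental, the same property std's own Pattern and crates like regex rely on.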
Just as a heads up, the link to the documentation on crates.io is broken.
It would be great to provide a macro that works like include_str, but for the types provided by Widestring.
Gonna work on a PR!
Currently UStr/UCStr/UString/UCString are generic over u16/u32 to share common implementation details. This leads to confusing docs and makes it impossible to add const functions, among other problems. These generic details should therefore be removed, and the U16/U32 strings should simply be entirely separate types, using macros where possible to reduce code duplication.
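A minimal sketch of the macro approach, assuming each type is a thin wrapper around a Vec of its code unit. The macro name define_ustring and the types U16Buf/U32Buf are hypothetical. Because each generated type is concrete rather than generic, new can be a const fn, which the sketch uses to initialize a static.

```rust
/// Hypothetical macro stamping out one independent string type per code unit.
macro_rules! define_ustring {
    ($name:ident, $unit:ty) => {
        #[derive(Debug, Default, Clone, PartialEq)]
        pub struct $name {
            inner: Vec<$unit>,
        }

        impl $name {
            /// Not generic, so it can be a const fn.
            pub const fn new() -> Self {
                $name { inner: Vec::new() }
            }

            pub fn push_slice(&mut self, units: &[$unit]) {
                self.inner.extend_from_slice(units);
            }

            pub fn len(&self) -> usize {
                self.inner.len()
            }

            pub fn as_slice(&self) -> &[$unit] {
                &self.inner
            }
        }
    };
}

define_ustring!(U16Buf, u16);
define_ustring!(U32Buf, u32);

// Allowed because U16Buf::new is a const fn.
static EMPTY16: U16Buf = U16Buf::new();
```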
I was trying to use this library with a C library that uses wchar_t* strings in its API. Unfortunately, widestring decided to use u16 as its “wide character” type, while wchar_t is a 32-bit type on Linux.
Any reason why widestring can't just use wchar_t as its character type? IMO that would be the sensible thing to do…
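For reference, the C wchar_t is typically 16 bits on Windows and 32 bits on Linux and macOS, so a wide-string type meant for FFI has to pick its code unit per platform. A minimal sketch using a hypothetical WChar alias selected with cfg; it assumes UTF-16 on Windows and UTF-32 elsewhere, which matches common practice but is not guaranteed for every C target.

```rust
/// Hypothetical alias matching the usual wchar_t width.
#[cfg(windows)]
pub type WChar = u16;
#[cfg(not(windows))]
pub type WChar = u32;

/// Encodes `s` as a nul-terminated wide string in the platform's wchar_t width.
pub fn to_wchar_vec(s: &str) -> Vec<WChar> {
    // UTF-16 where wchar_t is 16 bits.
    #[cfg(windows)]
    let mut v: Vec<WChar> = s.encode_utf16().collect();
    // UTF-32 where wchar_t is 32 bits.
    #[cfg(not(windows))]
    let mut v: Vec<WChar> = s.chars().map(|c| c as u32).collect();
    v.push(0);
    v
}
```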
The documentation for Utf16String::truncate says that
If new_len is greater than the string’s current length, this has no effect.
But because of the assertion in that method, this is not true: if new_len is greater than the string's current length, truncate will always panic.
widestring-rs/src/utfstring.rs
Line 1527 in c731486
As a user of the library, I would expect the implementation to conform to the documentation, which also matches the behavior of std::string::String::truncate. No effect would be least surprising; a panic is not. Alternatively, adjusting the documentation to match the current behavior would at least be less surprising than the current mismatch.
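A minimal sketch of the documented behavior, using a hypothetical Utf16Text type rather than widestring's Utf16String: a new_len at or past the end returns early, and only a new_len that would split a surrogate pair panics, mirroring String::truncate's char-boundary check.

```rust
/// Hypothetical UTF-16 string used only to illustrate truncate semantics.
pub struct Utf16Text {
    units: Vec<u16>,
}

impl Utf16Text {
    pub fn new(s: &str) -> Self {
        Utf16Text { units: s.encode_utf16().collect() }
    }

    pub fn len(&self) -> usize {
        self.units.len()
    }

    /// Shortens to `new_len` code units; no effect if `new_len >= len()`.
    /// Panics if `new_len` would split a surrogate pair.
    pub fn truncate(&mut self, new_len: usize) {
        if new_len >= self.units.len() {
            return; // the documented "no effect" case
        }
        let unit = self.units[new_len];
        assert!(
            !(0xDC00u16..=0xDFFF).contains(&unit),
            "new_len is not on a char boundary"
        );
        self.units.truncate(new_len);
    }
}
```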
There is quite a handy crate, https://docs.rs/wchar. It provides the wch_c! macro, which converts a string literal to &'static [u16] at compile time.
It would be great to have a similar macro in widestring-rs, so you could define &'static U16CStr constants.
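Such a macro can be built on const fn in stable Rust (loops in const fn need 1.46, const generics need 1.51). The sketch below is hypothetical: utf16_len, const_encode_utf16 and wide_cstr! are invented names, and the macro yields a nul-terminated &'static [u16] rather than a U16CStr, since building the crate's own type in const context is beyond what this sketch assumes. It does not reject interior nuls.

```rust
/// Hypothetical: UTF-16 length of `s`, read from the UTF-8 lead bytes.
const fn utf16_len(s: &str) -> usize {
    let b = s.as_bytes();
    let (mut i, mut n) = (0, 0);
    while i < b.len() {
        let lead = b[i];
        if lead < 0x80 { i += 1; n += 1; }
        else if lead < 0xE0 { i += 2; n += 1; }
        else if lead < 0xF0 { i += 3; n += 1; }
        else { i += 4; n += 2; } // 4-byte sequences need a surrogate pair
    }
    n
}

/// Hypothetical: encodes `s` into an N-unit array; unused trailing units stay 0,
/// so N = utf16_len(s) + 1 gives a nul-terminated result.
const fn const_encode_utf16<const N: usize>(s: &str) -> [u16; N] {
    let b = s.as_bytes();
    let mut out = [0u16; N];
    let (mut i, mut j) = (0, 0);
    while i < b.len() {
        let lead = b[i] as u32;
        // Decode one UTF-8 sequence (input is valid UTF-8 because it is a &str).
        let (cp, width) = if lead < 0x80 {
            (lead, 1)
        } else if lead < 0xE0 {
            (((lead & 0x1F) << 6) | (b[i + 1] as u32 & 0x3F), 2)
        } else if lead < 0xF0 {
            (((lead & 0x0F) << 12) | ((b[i + 1] as u32 & 0x3F) << 6) | (b[i + 2] as u32 & 0x3F), 3)
        } else {
            (((lead & 0x07) << 18)
                | ((b[i + 1] as u32 & 0x3F) << 12)
                | ((b[i + 2] as u32 & 0x3F) << 6)
                | (b[i + 3] as u32 & 0x3F), 4)
        };
        if cp >= 0x10000 {
            let v = cp - 0x10000;
            out[j] = (0xD800 | (v >> 10)) as u16;
            out[j + 1] = (0xDC00 | (v & 0x3FF)) as u16;
            j += 2;
        } else {
            out[j] = cp as u16;
            j += 1;
        }
        i += width;
    }
    out
}

/// Hypothetical macro: nul-terminated UTF-16 built at compile time.
macro_rules! wide_cstr {
    ($s:expr) => {{
        const S: &str = $s;
        const N: usize = utf16_len(S) + 1; // +1 for the terminator
        static W: [u16; N] = const_encode_utf16::<N>(S);
        let w: &'static [u16] = &W;
        w
    }};
}
```

Because the argument can be any constant &str expression, wide_cstr!(include_str!("file.txt")) should also work, which would cover the include_str-style request made earlier on this page.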
Hi,
Great work on widestring stuff!
I wonder why WideCString::from_str_unchecked is marked unsafe, given that it accepts anything that can be converted to an OsStr?
It forces a lot of my higher-level code to be unsafe too, but passing a String and eventually converting it into a u16 slice seems entirely safe. Am I doing something wrong? I only want to convert a T: AsRef<OsStr> to a *const u16 and pass it to a WinAPI function.
I realize that any nul values in my string may cause the underlying value to be seen differently by the C environment, but that does not seem unsafe as long as Rust manages the underlying memory, and I don't think WideCString is giving it away anyway.
Cheers,
Andrej
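One plausible reason for the unsafe marker (an interpretation, not something stated in this thread): C-string types promise that the only nul is the terminator, and an unchecked constructor lets that invariant be broken, so any code trusting it, for example code that derives the length by scanning for the first nul, could disagree with the stored length. A safe alternative is to check and report the interior nul. CheckedWide is a hypothetical type used for illustration:

```rust
/// Hypothetical owned, nul-terminated UTF-16 string with the invariant
/// "exactly one nul, at the end".
pub struct CheckedWide {
    units: Vec<u16>, // includes the trailing nul
}

impl CheckedWide {
    /// Returns Err(index) of the first interior nul instead of building a
    /// string that C code would read as shorter.
    pub fn new(s: &str) -> Result<CheckedWide, usize> {
        let mut units: Vec<u16> = s.encode_utf16().collect();
        if let Some(pos) = units.iter().position(|&u| u == 0) {
            return Err(pos);
        }
        units.push(0);
        Ok(CheckedWide { units })
    }

    /// Pointer usable as an LPCWSTR argument; valid while `self` is alive.
    pub fn as_ptr(&self) -> *const u16 {
        self.units.as_ptr()
    }

    /// Length in code units, excluding the terminator.
    pub fn len(&self) -> usize {
        self.units.len() - 1
    }
}
```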
The documentation here for UCStr::from_ptr_with_nul is a little self-contradictory: "Safety" claims the pointer mustn't be null, but then "Panics" documents safe behavior (panicking) on null, and yet the implementation doesn't check for null and may or may not hit debug-only asserts inside std:
Line 132 in e7236b6
Line 139 in e7236b6
Lines 148 to 152 in e7236b6
Additionally, while UCStr::from_ptr_with_nul doesn't scan for nuls, it appears UCString::from_ptr_with_nul does (and will truncate), and it also handles the len=0, ptr=null case without panicking at all:
Lines 587 to 597 in e7236b6
calls:
Lines 181 to 191 in e7236b6
Should I create a PR for UCStr to try and return &[UChar::NUL] for the length 0 case? (Maybe UCStr can gain a Default impl?)
Should it scan/truncate too (which would change the result of str.len())? Perhaps with matching function signatures?
The inconsistent "Safety" vs "Panics" documentation crops up in multiple places. Should I try and drop this text from all the "Safety" sections where p is already null-checked soundly and documented to be null-checked under "Panics"?
p must be non-null.
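A minimal sketch of the behavior being proposed, written as a hypothetical free function rather than widestring's UCStr::from_ptr_with_nul: the null check is an explicit assert, so a "Panics" section stays accurate in release builds, and len == 0 returns a shared terminator-only slice. It assumes len counts code units including the terminator.

```rust
/// Shared terminator-only slice for the zero-length case.
static EMPTY_WITH_NUL: [u16; 1] = [0];

/// Hypothetical. `len` counts units including the terminator (an assumption).
///
/// # Safety
/// If `len > 0`, `p` must be aligned and valid for reads of `len` u16s for `'a`.
///
/// # Panics
/// Panics if `len > 0` and `p` is null, or if the last unit is not 0.
pub unsafe fn slice_with_nul<'a>(p: *const u16, len: usize) -> &'a [u16] {
    if len == 0 {
        // Proposed: (any pointer, len 0) is the empty string.
        return &EMPTY_WITH_NUL;
    }
    // Checked in all builds, not only via debug asserts inside std.
    assert!(!p.is_null(), "null pointer with non-zero length");
    let s = unsafe { std::slice::from_raw_parts(p, len) };
    assert_eq!(s[len - 1], 0, "missing nul terminator");
    s
}
```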
I'm using widestring to implement an R7RS/R6RS Scheme VM. To convert a number to a string, I first have to call to_string, which creates a String, and then U32String::from_str, which means two heap allocations.
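One way to avoid the intermediate String is to implement std::fmt::Write for the UTF-32 buffer, so that write! formats straight into it. Utf32Buf is a hypothetical wrapper, not widestring's U32String; whether the crate implements fmt::Write for its own types is not assumed here.

```rust
use std::fmt::{self, Write};

/// Hypothetical UTF-32 buffer that formatting macros can write into directly.
#[derive(Debug, Default)]
pub struct Utf32Buf {
    units: Vec<u32>,
}

impl Utf32Buf {
    pub fn as_slice(&self) -> &[u32] {
        &self.units
    }
}

impl fmt::Write for Utf32Buf {
    // Each formatted piece is appended as chars; no temporary String is built.
    fn write_str(&mut self, s: &str) -> fmt::Result {
        self.units.extend(s.chars().map(|c| c as u32));
        Ok(())
    }
}
```

For example, write!(buf, "{}", n) appends the digits of n to the buffer's own Vec, which is the only allocation involved (it may still grow).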
Thank you for this great crate; I just started using it.
I wanted to split a Utf16String into lines, similar to the built-in lines() method for UTF-8 strings in the standard library. It would be great if this could be added. More generally, any kind of splitting iterator like those in the standard library (split_terminator and so on) is currently missing.
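A minimal sketch of a lines-style iterator over UTF-16 code units, assuming the same rules as str::lines: split on \n, strip a \r only when it directly precedes that \n, and yield no empty item after a final line ending. utf16_lines is a hypothetical name; it relies on slice::split_inclusive and slice::strip_suffix (Rust 1.51+).

```rust
const LF: u16 = 0x0A; // '\n'
const CR: u16 = 0x0D; // '\r'

/// Hypothetical lines() for UTF-16 data.
fn utf16_lines<'a>(s: &'a [u16]) -> impl Iterator<Item = &'a [u16]> + 'a {
    // split_inclusive keeps each '\n' on the line it ends and produces
    // no trailing empty piece after a final '\n'.
    s.split_inclusive(|&u| u == LF).map(|line| match line.strip_suffix(&[LF]) {
        // Only a '\r' right before the '\n' belongs to the line ending.
        Some(rest) => rest.strip_suffix(&[CR]).unwrap_or(rest),
        // Last line without a terminator: keep it unchanged.
        None => line,
    })
}
```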
0.4.0 was fine, but 0.4.1 breaks on 1.34.2:
Compiling widestring v0.4.1
error[E0658]: use of unstable library feature 'alloc': this library is unlikely to be stabilized in its current form or name (see issue #27783)
--> /builds/inliniac/suricata-ci/suricata/rust/vendor/widestring/src/lib.rs:195:1
|
195 | extern crate alloc;
| ^^^^^^^^^^^^^^^^^^^
We use widestring in Suricata. We use a minimum of rustc 1.34.2 as this is what Debian stable uses and we want to make sure Debian can (continue to) package Suricata.
The documentation of to_wide_string states that the resulting WideString won't contain the nul terminator, but it does, because it copies the underlying vector (which includes the terminator).
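The documented behavior amounts to copying everything except the final terminator. A minimal sketch with a hypothetical helper operating on a plain u16 slice:

```rust
/// Hypothetical: owned copy of a nul-terminated buffer without its terminator.
fn copy_without_nul(with_nul: &[u16]) -> Vec<u16> {
    match with_nul.split_last() {
        // Drop exactly one trailing 0.
        Some((&0, rest)) => rest.to_vec(),
        // No terminator present: copy as-is.
        _ => with_nul.to_vec(),
    }
}
```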
It is possible to create an empty U16String using U16String::new(), but it is impossible to create an empty U16CString. It looks like the ::new() method on U16CString has been deprecated, so I guess you've been reluctant to introduce this function until the deprecated one is removed (or at all)? Is there any reason why we can't create an empty U16CString using some convenience method?
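For what it's worth, an empty C string needs no special representation: it is a buffer containing only the terminator, so a new or Default constructor is straightforward. WideCBuf is a hypothetical type used only to illustrate this:

```rust
/// Hypothetical nul-terminated UTF-16 buffer.
#[derive(Debug, Clone, PartialEq)]
pub struct WideCBuf {
    units: Vec<u16>, // always ends with exactly one 0
}

impl WideCBuf {
    /// Empty string: length 0, buffer holds just the terminator.
    pub fn new() -> Self {
        WideCBuf { units: vec![0] }
    }

    /// Length excluding the terminator.
    pub fn len(&self) -> usize {
        self.units.len() - 1
    }

    pub fn as_slice_with_nul(&self) -> &[u16] {
        &self.units
    }
}

impl Default for WideCBuf {
    fn default() -> Self {
        WideCBuf::new()
    }
}
```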
It would really help if some of these features existed on WideStr and WideCStr:
It would also be nice to better document the behaviour of to_os_string/from_str on non-Windows platforms, where there's no canonical 1:1 relationship between an OsString and a WideString.
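To illustrate why the mapping is not 1:1: on Unix, an OsStr is an arbitrary byte sequence, so any conversion to UTF-16 must decide what to do with bytes that are not UTF-8. The hypothetical function below takes the lossy route and turns invalid sequences into U+FFFD, so distinct OsStrs can map to the same wide string. This is one reasonable policy, not a description of what widestring's to_os_string/from_str actually do.

```rust
use std::ffi::OsStr;

/// Hypothetical lossy OsStr -> UTF-16 conversion via (lossy) UTF-8.
fn os_str_to_utf16_lossy(s: &OsStr) -> Vec<u16> {
    s.to_string_lossy().encode_utf16().collect()
}
```

On Unix, for example, OsStr::from_bytes(&[0x61, 0xFF]) (via std::os::unix::ffi::OsStrExt) comes out as [0x61, 0xFFFD], and the original 0xFF byte cannot be recovered.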
The winapi crate is no longer maintained and there are now official windows/windows-sys crates from Microsoft. widestring should transition from using winapi to windows/windows-sys.
This may not be a bug at all.
When I run my application using widestring 0.4, it works fine, but with 0.5 it crashes:
V:\myapp>rustc -V
rustc 1.56.0 (09c42c458 2021-10-18)
V:\myapp>\test.exe
thread 'main' panicked at 'range end index 18446744073709551615 out of range for slice of length 0', C:\Users\mark\.cargo\registry\src\github.com-1ecc6299db9ec823\widestring-0.5.0\src\ucstring.rs:119:15
stack backtrace:
0: 0x13fd17c60 - std::sys_common::backtrace::_print::impl$0::fmt
at /rustc/09c42c45858d5f3aedfa670698275303a3d19afa\/library\std\src\sys_common\backtrace.rs:46
1: 0x13fcf25fa - core::fmt::write
at /rustc/09c42c45858d5f3aedfa670698275303a3d19afa\/library\core\src\fmt\mod.rs:1150
2: 0x13fd175a8 - std::io::Write::write_fmt<std::sys::windows::stdio::Stderr>
at /rustc/09c42c45858d5f3aedfa670698275303a3d19afa\/library\std\src\io\mod.rs:1667
3: 0x13fd16cfd - std::panicking::rust_panic_with_hook
at /rustc/09c42c45858d5f3aedfa670698275303a3d19afa\/library\std\src\panicking.rs:624
4: 0x13fd1ded5 - std::panicking::begin_panic_handler::closure$0
at /rustc/09c42c45858d5f3aedfa670698275303a3d19afa\/library\std\src\panicking.rs:521
5: 0x13fd1de49 - std::sys_common::backtrace::__rust_end_short_backtrace<std::panicking::begin_panic_handler::closure$0,never$>
at /rustc/09c42c45858d5f3aedfa670698275303a3d19afa\/library\std\src\sys_common\backtrace.rs:141
6: 0x13fd1de04 - std::panicking::begin_panic_handler
at /rustc/09c42c45858d5f3aedfa670698275303a3d19afa\/library\std\src\panicking.rs:517
7: 0x13fd30280 - core::panicking::panic_fmt
at /rustc/09c42c45858d5f3aedfa670698275303a3d19afa\/library\core\src\panicking.rs:101
8: 0x13fd30387 - core::slice::index::slice_end_index_len_fail
at /rustc/09c42c45858d5f3aedfa670698275303a3d19afa\/library\core\src\slice\index.rs:41
9: 0x13fce6089 - __acrt_rg_country_count
10: 0x13fce1006 - __acrt_rg_country_count
11: 0x13fcf110d - main
12: 0x13fd23a35 - __scrt_common_main_seh
at f:\dd\vctools\crt\vcstartup\src\startup\exe_common.inl:253
13: 0x76e0571d - BaseThreadInitThunk
14: 0x7706385d - RtlUserThreadStart
Unfortunately, this does not appear to give me any clue as to which widestring function is failing.
In my code I use only two widestring functions, WideCString::from_ptr_str and WideCString::from_str. The former is only used inside one function:
pub fn str_for_win16(p: *const Wchar) -> String {
    if p.is_null() {
        return String::new();
    }
    unsafe { WideCString::from_ptr_str(p).to_string_lossy() }
}
The docs say that WideCString::from_ptr_str will panic if the pointer is null, but as you can see I always avoid this.
However, I use WideCString::from_str in many places, and it turned out that one of these uses was the problem for me. The solution I applied was to replace from_str with from_str_truncate. This made all my failing tests pass when using widestring 0.5.
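Going by its name, from_str_truncate cuts the input at the first nul instead of rejecting it, which is consistent with (though the issue doesn't confirm) some of the failing inputs containing a nul. A std-only sketch of the truncating strategy, with a hypothetical function name:

```rust
/// Hypothetical: encodes `s` as UTF-16, stops at the first nul,
/// and appends a single terminator.
fn encode_truncating(s: &str) -> Vec<u16> {
    // take_while drops the first nul and everything after it.
    let mut units: Vec<u16> = s.encode_utf16().take_while(|&u| u != 0).collect();
    units.push(0);
    units
}
```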
It would be useful to be able to print strings directly without converting them to a String, thereby avoiding the allocation.
While trying to implement this locally, I noticed that implementing Display automatically implies a ToString implementation, which conflicts with the current to_string functions: they return an error when encountering invalid code units, whereas my Display implementation performs the conversion lossily. Therefore, implementing Display would be a breaking change.
Solutions I could think of:
- a display method similar to Path::display
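A sketch of the Path::display option, assuming lossy output is acceptable for display purposes. Utf16Display and display are hypothetical names. Because the adapter, not the string type, implements Display, the blanket ToString impl applies to the adapter and does not clash with the string's own fallible to_string.

```rust
use std::fmt::{self, Write};

/// Hypothetical adapter that formats UTF-16 without allocating a String.
pub struct Utf16Display<'a>(&'a [u16]);

impl<'a> fmt::Display for Utf16Display<'a> {
    fn fmt(&self, f: &mut fmt::Formatter) -> fmt::Result {
        for r in std::char::decode_utf16(self.0.iter().cloned()) {
            // Unpaired surrogates are shown as U+FFFD.
            f.write_char(r.unwrap_or(std::char::REPLACEMENT_CHARACTER))?;
        }
        Ok(())
    }
}

pub fn display<'a>(units: &'a [u16]) -> Utf16Display<'a> {
    Utf16Display(units)
}
```

This writes char by char into the Formatter, so width and padding flags such as {:>10} are not honored; supporting them via Formatter::pad would require a &str and thus reintroduce a buffer.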