Slicing strings by Unicode characters
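Rust strings (`&str`, `String`) are, at bottom, sequences of 8-bit unsigned integers encoded as UTF-8. Because UTF-8 is a variable-length encoding, `String` does not support reading a character directly by index with `[]` (a `String` is not itself a sequence of `char`, so it cannot return a `&char` as the signature of `ops::Index::index` would require). Slicing likewise takes UTF-8 byte offsets as its indices.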
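To make the byte-offset behavior concrete, here is a minimal sketch using only the standard library. The helper names `byte_len_and_char_count` and `byte_slice` are hypothetical, chosen for illustration; `str::get` is used instead of `[]` so that an offset inside a multi-byte character yields `None` rather than a panic.

```rust
/// Returns (length in UTF-8 bytes, number of `char`s).
/// Hypothetical helper for illustration only.
fn byte_len_and_char_count(s: &str) -> (usize, usize) {
    (s.len(), s.chars().count())
}

/// Byte-indexed slicing that returns `None` instead of panicking when an
/// offset is out of range or falls inside a multi-byte character.
/// Hypothetical helper for illustration only.
fn byte_slice(s: &str, start: usize, end: usize) -> Option<&str> {
    s.get(start..end)
}
```

For "你好 Rust!", the two CJK characters take three bytes each, so `byte_slice(s, 0, 3)` is the first character, while `byte_slice(s, 0, 1)` is `None` because byte 1 is not a character boundary. The indexing form `&s[0..1]` would panic in the same situation.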
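However, `str` provides the `.chars()` and `.char_indices()` methods, which return iterators over `char` and `(usize, char)` respectively. Both yield Unicode characters as `char`, so these two iterators can be used for character-based indexing and slicing.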
    let index: usize = 1;
    let my_string = "你好 Rust!";
    let ch: Option<char> = my_string.chars().nth(index); // Some('好')
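The `usize` in each `(usize, char)` pair from `.char_indices()` is the byte offset at which that character starts, which is what makes it usable for slicing. A small sketch of that mapping follows; the helper names `nth_char` and `byte_offset_of_char` are hypothetical, and both walk the string from the front, so they cost O(n) in the character position.

```rust
/// The `n`-th character (0-based), if it exists.
/// Hypothetical helper for illustration only.
fn nth_char(s: &str, n: usize) -> Option<char> {
    s.chars().nth(n)
}

/// Byte offset at which the `n`-th character (0-based) starts, if it exists.
/// Hypothetical helper for illustration only.
fn byte_offset_of_char(s: &str, n: usize) -> Option<usize> {
    s.char_indices().nth(n).map(|(byte_idx, _)| byte_idx)
}
```

With "你好 Rust!", character 1 is '好' and starts at byte 3; character 2 is the space and starts at byte 6. A position at or beyond the character count gives `None`.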
Slicing is a little more involved:
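    // `index` is a Range<usize> of character positions (example value).
    let index = 1..4;
    let my_string = "你好 Rust!";
    assert!(
        index.end >= index.start,
        "Start index must not be greater than end index, but {} is greater than {}",
        index.start,
        index.end
    );
    // Skip to the start character; peeking reads its byte offset without consuming it.
    let mut it = my_string.char_indices().skip(index.start).peekable();
    let start = match it.peek() {
        Some((idx, _)) => *idx,
        None => panic!("Start index {} is out of bounds", index.start),
    };
    // Advance to the character at position `index.end`. If the iterator runs
    // out first, the slice extends to the end of the string.
    let end = match it.nth(index.end - index.start) {
        Some((idx, _)) => idx,
        None => my_string.len(),
    };
    // safe: checked slicing on byte offsets
    let safe_slice = &my_string[start..end];
    // unsafe: skips the bounds and char-boundary checks; sound here because
    // both offsets come from `char_indices()` or `len()`
    let unsafe_slice = unsafe { my_string.get_unchecked(start..end) };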
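This approach calls `.char_indices()` only once, needs no heap allocation, and walks the string only as far as the end index rather than traversing all of it, so it is somewhat faster than the implementation in the `utf8_slice` crate.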
    test slice_my_snippet            ... bench:  12 ns/iter (+/- 1)
    test slice_with_byte_index       ... bench:   0 ns/iter (+/- 0)
    test slice_with_utf8_slice_crate ... bench: 121 ns/iter (+/- 16)
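For reuse, the same idea can be wrapped in a function. The following is a sketch under stated assumptions, not the code that was benchmarked above: the name `slice_by_chars` is hypothetical, and it returns `None` for a reversed or out-of-range character range instead of panicking or clamping. Like the snippet it is based on, it makes a single lazy pass over `.char_indices()` and allocates nothing.

```rust
use std::ops::Range;

/// Slices `s` by character positions instead of byte offsets.
/// Returns `None` if the range is reversed or reaches past the last character.
/// Hypothetical helper for illustration only.
fn slice_by_chars(s: &str, range: Range<usize>) -> Option<&str> {
    if range.start > range.end {
        return None;
    }
    // Byte offset of each character start, followed by the string's byte
    // length, so that a position equal to the character count is also valid.
    let mut offsets = s
        .char_indices()
        .map(|(byte_idx, _)| byte_idx)
        .chain(std::iter::once(s.len()));
    // `nth` consumes everything up to and including the start position.
    let start = offsets.nth(range.start)?;
    let end = if range.end == range.start {
        start
    } else {
        // The iterator now begins one position past `start`.
        offsets.nth(range.end - range.start - 1)?
    };
    // Both offsets are character boundaries, so this lookup succeeds.
    s.get(start..end)
}
```

Returning `Option` keeps the out-of-range case visible to the caller; a version that panics, in the style of `[]` indexing, would be an equally reasonable design.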
from blog-contents.