
rust-magics about blog-contents HOT 1 OPEN

Yixuan-Wang avatar Yixuan-Wang commented on May 29, 2024
rust-magics


Comments (1)

Yixuan-Wang avatar Yixuan-Wang commented on May 29, 2024

Slicing strings by Unicode characters

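Rust strings (&str, String) are, at bottom, sequences of 8-bit unsigned integers encoded as UTF-8. Because UTF-8 is a variable-length encoding, the Unicode character at a given character index cannot be located in $O(1)$ time, and The Book points out that the indexing operator [] in Rust is expected to be $O(1)$. String therefore does not support reading a character directly by index (String is not itself a sequence of char, so it also cannot return a &char as the signature of ops::Index::index would require), and slicing likewise takes UTF-8 byte offsets as its indices.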
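To make the byte-versus-character distinction concrete, here is a minimal sketch using only standard `str` methods. The helper names `byte_and_char_len` and `try_byte_slice` are hypothetical and exist only for illustration.

```rust
use std::ops::Range;

/// Returns (length in UTF-8 bytes, number of `char`s).
/// `len()` is O(1); counting chars walks the whole string.
fn byte_and_char_len(s: &str) -> (usize, usize) {
    (s.len(), s.chars().count())
}

/// Byte-range slicing that returns `None` instead of panicking when the
/// range is out of bounds or does not fall on char boundaries.
fn try_byte_slice(s: &str, range: Range<usize>) -> Option<&str> {
    s.get(range)
}
```

For "你好 Rust!", each of the two Chinese characters occupies 3 bytes, so `try_byte_slice(s, 0..3)` yields the first character while `try_byte_slice(s, 0..1)` yields `None`, because byte 1 falls inside a character.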

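However, str provides the .chars() and .char_indices() methods, which return iterators over char and (usize, char) respectively. Both yield Unicode characters as char, so these two iterators can be used for character-based indexing and slicing.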

let index: usize = 1;
let my_string = "你好 Rust!";
let ch: Option<char> = my_string.chars().nth(index); // Some('好')
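As a small companion sketch, `.char_indices()` yields each character together with the byte offset at which it starts, which is the piece of information that slicing needs. The function name `nth_char_with_offset` is a hypothetical helper, not part of the standard library.

```rust
/// Returns the byte offset and value of the `n`-th char of `s`, if it exists.
/// This is O(n) in the character index, because the iterator must decode
/// every preceding character.
fn nth_char_with_offset(s: &str, n: usize) -> Option<(usize, char)> {
    s.char_indices().nth(n)
}
```

With "你好 Rust!", character 1 starts at byte 3 and character 3 ('R') starts at byte 7; an index past the last character returns `None`.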

Slicing is a bit more involved:

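// Char-index range to slice (example value); this shadows the usize `index` above.
let index = 1..4;

assert!(
  index.end >= index.start,
  "Start index should not be greater than end index, but {} is greater than {}",
  index.start,
  index.end
);

// Peek to read the byte offset of the char at `index.start` without consuming it
let mut it = my_string.char_indices().skip(index.start).peekable();

let start = match it.peek() {
  Some((idx, _)) => *idx,
  None => panic!("Start index {} is out of bounds", index.start),
};

// The char at `index.end` marks the exclusive end of the slice;
// if the string ends before that char, slice up to the end of the string
let end = match it.nth(index.end - index.start) {
  Some((idx, _)) => idx,
  None => my_string.len(),
};

// Checked slicing
let safe_slice = &my_string[start..end];
// Unchecked slicing; sound here because start and end are char boundaries and start <= end
let unsafe_slice = unsafe { my_string.get_unchecked(start..end) };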
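The snippet above can be wrapped into a reusable function. The following is one possible sketch, not the author's exact code: the name `slice_by_chars` is hypothetical, and unlike the snippet it returns `None` for any out-of-range index instead of panicking or clamping the end to the string length.

```rust
use std::ops::Range;

/// Slices `s` by char indices (start inclusive, end exclusive).
/// Returns `None` if the range is reversed or reaches past the last char.
fn slice_by_chars(s: &str, range: Range<usize>) -> Option<&str> {
    if range.start > range.end {
        return None;
    }
    // Byte offsets of every char, followed by `s.len()` so that the end of
    // the string is itself a valid boundary.
    let mut offsets = s
        .char_indices()
        .map(|(offset, _)| offset)
        .chain(std::iter::once(s.len()));

    let start = offsets.nth(range.start)?;
    let end = if range.end == range.start {
        start
    } else {
        // `offsets` has already advanced past `start`, so step the remainder.
        offsets.nth(range.end - range.start - 1)?
    };
    s.get(start..end)
}
```

Like the snippet, this makes a single pass over `.char_indices()`, allocates nothing on the heap, and stops as soon as the end offset is found.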

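This approach needs only a single call to .char_indices(), performs no heap allocation, and does not have to traverse the whole string, so it is somewhat faster than the implementation in the utf8_slice crate.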

test slice_my_snippet            ... bench:          12 ns/iter (+/- 1)
test slice_with_byte_index       ... bench:           0 ns/iter (+/- 0)
test slice_with_utf8_slice_crate ... bench:         121 ns/iter (+/- 16)
