use std::str;

fn main() {
    // `s` is a `&str` holding a literal that spells
    // the word "hello" in Thai. Rust string literals
    // are always valid UTF-8.
    let s = "สวัสดี";

    // `len()` returns the length of the UTF-8 encoding
    // in bytes, not the number of characters.
    println!("Len: {}", s.len());

    // Rust does not allow indexing a `&str` by integer.
    // `bytes()` iterates over the raw bytes instead; this loop
    // prints the hex value of every byte that encodes `s`.
    for b in s.bytes() {
        print!("{:x} ", b);
    }
    println!();

    // To count the characters in a string, use `chars().count()`.
    // Each Thai character here takes three bytes in UTF-8,
    // so the count is smaller than `len()`.
    println!("Char count: {}", s.chars().count());

    // A `&str` is not directly iterable. `char_indices()` decodes
    // each `char` and yields it with its byte offset in the string.
    for (idx, c) in s.char_indices() {
        println!("{:?} starts at {}", c, idx);
    }

    // We can perform the same iteration by hand: validate the
    // remaining bytes with `str::from_utf8`, take the first `char`,
    // and advance by its encoded width.
    println!("\nUsing str::from_utf8");
    let mut i = 0;
    while i < s.len() {
        let ch = str::from_utf8(&s.as_bytes()[i..])
            .unwrap()
            .chars()
            .next()
            .unwrap();
        println!("{:?} starts at {}", ch, i);
        i += ch.len_utf8();

        // This demonstrates passing a `char` value to a function.
        examine_char(ch);
    }
}

fn examine_char(c: char) {
    // We can compare a `char` value to a character literal directly.
    if c == 't' {
        println!("found tee");
    } else if c == 'ส' {
        println!("found so sua");
    }
}
When you run this program, you’ll see output similar to the following. The `{:?}` formatter prints the combining marks U+0E31 and U+0E35 as escapes rather than as bare glyphs:
Len: 18
e0 b8 aa e0 b8 a7 e0 b8 b1 e0 b8 aa e0 b8 94 e0 b8 b5
Char count: 6
'ส' starts at 0
'ว' starts at 3
'\u{e31}' starts at 6
'ส' starts at 9
'ด' starts at 12
'\u{e35}' starts at 15

Using str::from_utf8
'ส' starts at 0
found so sua
'ว' starts at 3
'\u{e31}' starts at 6
'ส' starts at 9
found so sua
'ด' starts at 12
'\u{e35}' starts at 15
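The byte offsets in this output matter because Rust only allows slicing a `&str` on character boundaries. The following is a minimal sketch, not part of the original example, that uses the standard `is_char_boundary` and `get` methods to check an offset before slicing. The helper name `first_char_slice` is hypothetical and exists only for illustration.

```rust
/// Hypothetical helper: returns the slice covering the first `char` of `s`,
/// or `None` if `s` is empty.
fn first_char_slice(s: &str) -> Option<&str> {
    let c = s.chars().next()?;
    // `len_utf8` is the encoded width of `c`, so this range ends on a boundary.
    s.get(0..c.len_utf8())
}

fn main() {
    let s = "สวัสดี";

    // Offsets 0 and 3 start a character; offset 1 falls inside one.
    println!("{}", s.is_char_boundary(0)); // true
    println!("{}", s.is_char_boundary(1)); // false
    println!("{}", s.is_char_boundary(3)); // true

    // `get` returns `None` for a range that splits a character,
    // whereas `&s[0..1]` would panic.
    println!("{:?}", s.get(0..3)); // Some("ส")
    println!("{:?}", s.get(0..1)); // None

    println!("{:?}", first_char_slice(s)); // Some("ส")
}
```

Using `get` rather than direct range indexing is a reasonable default whenever an offset comes from outside the string itself, since a bad offset then shows up as `None` instead of a runtime panic.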
This Rust code demonstrates concepts similar to the original Go example. The main differences are in the syntax and the specific methods used. For example, Rust uses `chars().count()` instead of `utf8.RuneCountInString()`, and `char_indices()` instead of ranging over the string directly. The concept of “runes” in Go is replaced by Rust’s `char` type, which represents a Unicode scalar value.
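To make the last point concrete, the sketch below (not from the original example) shows what "Unicode scalar value" implies in practice: a `char` converts to a `u32` code point, and `char::from_u32` rejects surrogate code points, which are not scalar values. The function name `describe` is hypothetical.

```rust
/// Hypothetical helper: formats a `char` as its code point and UTF-8 width.
fn describe(c: char) -> String {
    format!("U+{:04X}, UTF-8 width {}", c as u32, c.len_utf8())
}

fn main() {
    println!("{}", describe('t')); // U+0074, UTF-8 width 1
    println!("{}", describe('ส')); // U+0E2A, UTF-8 width 3

    // 0xD800 is a surrogate code point, so it is not a valid `char`.
    println!("{:?}", char::from_u32(0xD800)); // None

    // 0x0E2A is a scalar value, so the conversion succeeds.
    println!("{:?}", char::from_u32(0x0E2A)); // Some('ส')
}
```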