Strings and Runes in Chapel
```chapel
use IO;

proc main() {
  const s = "สวัสดี";

  // Chapel strings store UTF-8 encoded text, so `numBytes`
  // reports the length of the raw bytes stored within.
  writeln("Len: ", s.numBytes);

  // The `byte` method returns the raw byte value at a given
  // byte index. This loop prints the hex values of all the
  // bytes that make up the codepoints in `s`.
  for i in 0..#s.numBytes {
    writef("%02xu ", s.byte(i));
  }
  writeln();

  // To count how many characters are in a string, we can use
  // the `numCodepoints` method. Each Thai codepoint here takes
  // several bytes, and some (such as the vowel marks in `s`)
  // combine with the preceding letter, so this count may be
  // surprising.
  writeln("Character count: ", s.numCodepoints);

  // `items()` yields each codepoint as a one-codepoint string,
  // and `toCodepoint()` gives its integer value. Zipping with
  // `0..` numbers the codepoints; this is a codepoint index,
  // not a byte offset.
  for (c, idx) in zip(s.items(), 0..) {
    writef("U+%04xi '%s' is codepoint %i\n", c.toCodepoint(), c, idx);
  }

  writeln("\nUsing manual iteration");
  // Track the byte offset ourselves by adding each
  // codepoint's width in bytes.
  var offset = 0;
  for c in s.items() {
    writef("U+%04xi '%s' starts at %i\n", c.toCodepoint(), c, offset);
    offset += c.numBytes;
    examineChar(c);
  }
}

proc examineChar(c: string) {
  // A one-codepoint string can be compared directly to a string literal.
  if c == "t" {
    writeln("found tee");
  } else if c == "ส" {
    writeln("found so sua");
  }
}
```

This Chapel code demonstrates working with strings and characters (codepoints) in a manner similar to the original Go example. Here are some key points about the translation:
- Chapel uses the `use` statement to import modules, similar to Go's `import`.
- The `main` procedure in Chapel is equivalent to the `main` function in Go.
- Chapel's `string` type is similar to Go's, representing UTF-8 encoded text.
- Chapel provides `numBytes` and `numCodepoints` to get a string's length in bytes and in codepoints.
- The `items()` iterator yields each codepoint as a one-codepoint `string`, while `codepoints()` yields the integer codepoint values as `int(32)`.
- Chapel doesn't have a built-in `rune` type, so we use a one-codepoint `string` to represent an individual character; `toCodepoint()` converts it to its integer value.
- The `examineChar` procedure demonstrates how to compare individual characters.
- Chapel's formatted output uses `writef` with `%` conversions, similar to C's `printf`, although the conversion letters differ (for example, `%i` for integers and `%xi` for hexadecimal).
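The next sketch makes the relationship between one-codepoint strings and integer codepoint values concrete. It assumes the Chapel 2.x string API: `items()` yields one-codepoint strings, `codepoints()` yields `int(32)` values, and `codepointToString` converts a value back to a string. The helper `codepointValues` is hypothetical and exists only for this illustration.

```chapel
// A minimal sketch, assuming Chapel 2.x string methods: items(),
// codepoints(), toCodepoint(), and codepointToString().

// Hypothetical helper: collects the int(32) codepoint values of `s`.
proc codepointValues(s: string) {
  var vals: [0..#s.numCodepoints] int(32);
  var k = 0;
  for cp in s.codepoints() {  // codepoints() yields int(32) values
    vals[k] = cp;
    k += 1;
  }
  return vals;
}

proc main() {
  const s = "สวัสดี";

  // items() yields one-codepoint strings; toCodepoint() and
  // numBytes describe each one.
  for c in s.items() do
    writeln(c, " -> ", c.toCodepoint(), " (", c.numBytes, " bytes)");

  // codepointToString() turns each int(32) value back into a
  // string, so this loop rebuilds the original text.
  var rebuilt = "";
  for cp in codepointValues(s) do
    rebuilt += codepointToString(cp);
  writeln(rebuilt == s);
}
```

Keeping both forms is useful: the string form can be compared with literals, as `examineChar` does, while the integer form is convenient for range checks and numeric formatting.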
This code provides similar functionality to the Go example, allowing you to examine the bytes, codepoints, and characters of a UTF-8 encoded string in Chapel.
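The manual loop finds each codepoint's starting byte offset by adding `numBytes`. You can also recover those offsets from the raw bytes, because a UTF-8 leading byte encodes the length of its sequence in its high bits. The following sketch is one way to do this. It assumes the string holds valid UTF-8 and uses `string.byte` to read individual bytes. The helpers `utf8SeqLen` and `codepointOffsets` are hypothetical names used only here.

```chapel
// Hypothetical helper: the length of a UTF-8 sequence, read from its
// leading byte. Returns 0 for a continuation or invalid byte.
proc utf8SeqLen(lead: uint(8)): int {
  if lead < 0x80 then return 1;           // 0xxxxxxx: ASCII
  if lead >> 5 == 0b110 then return 2;    // 110xxxxx
  if lead >> 4 == 0b1110 then return 3;   // 1110xxxx
  if lead >> 3 == 0b11110 then return 4;  // 11110xxx
  return 0;
}

// Hypothetical helper: the byte offset at which each codepoint of `s`
// starts, found by skipping from one leading byte to the next.
// Assumes `s` holds valid UTF-8.
proc codepointOffsets(s: string) {
  var offsets: [0..#s.numCodepoints] int;
  var i = 0, k = 0;
  while i < s.numBytes {
    const len = utf8SeqLen(s.byte(i));
    if len == 0 then halt("unexpected UTF-8 byte at offset ", i);
    offsets[k] = i;
    k += 1;
    i += len;
  }
  return offsets;
}

proc main() {
  const s = "สวัสดี";
  writeln(codepointOffsets(s));
}
```

Its output should list the same starting offsets that the manual loop prints, since both walk the string one UTF-8 sequence at a time.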