Strings and Runes in Chapel

use IO;

proc main() {
    const s = "สวัสดี";

    // Chapel strings hold UTF-8 encoded text, so `numBytes`
    // reports the length of the raw bytes stored within.
    writeln("Len: ", s.numBytes);

    // `s.byte(i)` returns the raw byte at byte offset `i` as a
    // `uint(8)`. This loop prints the hex value of every byte
    // that makes up the code points in `s`.
    for i in 0..#s.numBytes {
        writef("%02xu ", s.byte(i));
    }
    writeln();

    // To count the code points in a string, use `numCodepoints`.
    // Some of the Thai "characters" in `s` are combining vowel marks
    // stored as separate code points, so this count (6) is larger
    // than the number of characters a reader perceives.
    writeln("Codepoint count: ", s.numCodepoints);

    // Iterating over a string yields each code point as a
    // one-codepoint string, and `toCodepoint()` gives its integer
    // value. Because a code point may span several bytes, the byte
    // offset where each one starts is tracked separately.
    var offset = 0;
    for c in s {
        writef("U+%04xi '%s' starts at %i\n", c.toCodepoint(), c, offset);
        offset += c.numBytes;
    }

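    writeln("\nUsing manual iteration");
    // Walk the string by byte offset instead: `s[i: byteIndex]`
    // returns the complete code point starting at byte `i`, and its
    // `numBytes` says how far to advance.
    var i = 0;
    while i < s.numBytes {
        const c = s[i: byteIndex];
        writef("U+%04xi '%s' starts at %i\n", c.toCodepoint(), c, i);
        i += c.numBytes;
        examineChar(c);
    }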
}

proc examineChar(c: string) {
    // A one-codepoint string can be compared directly with a string literal.
    if c == "t" {
        writeln("found tee");
    } else if c == "ส" {
        writeln("found so sua");
    }
}

This Chapel code demonstrates working with the bytes and code points of a UTF-8 string, mirroring the original Go example. Here are some key points about the translation:

  1. Chapel’s use statement imports modules, similar to Go’s import; here use IO; makes writef available.

  2. The main procedure in Chapel is equivalent to the main function in Go.

  3. Chapel’s string type is similar to Go’s, representing UTF-8 encoded text.

  4. Chapel provides methods like numBytes and numCodepoints to work with string lengths.

  5. Iterating directly over a string (for c in s) yields each code point as a one-codepoint string, while the codepoints() iterator yields each code point’s integer value as an int(32).

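  6. Chapel doesn’t have a built-in rune type, so individual characters are represented as strings containing a single code point (which may be several bytes long); toCodepoint() converts such a string to its integer value.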

  7. The examineChar procedure demonstrates how to compare individual characters.

  8. Chapel’s writef uses % placeholders similar to C’s printf, but the conversion specifiers differ: for example, %i formats an integer and %xi formats it in hexadecimal.
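Items 5 and 6 can be made concrete with a short sketch of the rune-like round trip between a one-codepoint string and its integer value. It assumes a recent Chapel release where `toCodepoint()` and `codepointToString()` are available, and `describe` is a hypothetical helper introduced only for this illustration.

```chapel
use IO;

// Hypothetical helper (not part of the original program): print a
// one-codepoint string, its code point value, and its UTF-8 width.
proc describe(c: string) {
  writef("'%s' is U+%04xi and occupies %i byte(s)\n",
         c, c.toCodepoint(), c.numBytes);
}

proc main() {
  // Build a one-codepoint string from an integer value, much as a
  // Go rune literal such as 'ส' denotes the code point U+0E2A.
  const so = codepointToString(0x0E2A: int(32));
  describe(so);
  describe("t");

  // Round trip: string -> int(32) code point -> string.
  assert(so == "ส");
  assert(codepointToString(so.toCodepoint()) == so);

  // One code point, but three bytes of UTF-8.
  assert(so.numCodepoints == 1 && so.numBytes == 3);
}
```

Representing a character as a string keeps comparisons such as examineChar’s c == "ส" simple, while toCodepoint() recovers the numeric value when it is needed.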

This code provides similar functionality to the Go example, allowing you to examine the bytes, codepoints, and characters of a UTF-8 encoded string in Chapel.
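The same bytes-versus-code-points distinction appears when indexing. The sketch below assumes Chapel’s byteIndex and codepointIndex types, cast from int, select how a string subscript is interpreted; either way the subscript returns the whole code point at that position as a one-codepoint string.

```chapel
use IO;

proc main() {
  const s = "สวัสดี";

  // 18 bytes of UTF-8 encode 6 code points (3 bytes each here).
  writeln(s.numBytes, " bytes, ", s.numCodepoints, " code points");

  // Codepoint indexing: index 2 is the third code point, the combining
  // vowel sign U+0E31 that attaches to the preceding consonant.
  const third = s[2: codepointIndex];

  // Byte indexing: byte offset 9 is the first byte of the second 'ส'.
  const atByte9 = s[9: byteIndex];

  writef("codepoint index 2 -> U+%04xi, byte index 9 -> U+%04xi\n",
         third.toCodepoint(), atByte9.toCodepoint());

  // Every code point in this string is 3 bytes wide, so code point k
  // starts at byte offset 3*k.
  for k in 0..#s.numCodepoints do
    assert(s[k: codepointIndex] == s[(3*k): byteIndex]);
}
```

The fixed 3-byte stride only holds because every character here is from the Thai block; mixed scripts need the running-offset approach used in the main program.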