Strings and Runes in Dart

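Dart strings are immutable sequences of UTF-16 code units, and the language and standard library treat a string as text encoded in UTF-16. Dart has no separate character type: indexing a string yields a String of length 1, which holds a single UTF-16 code unit. A Unicode code point, which Dart calls a rune, is represented as an int. It occupies one code unit if it lies in the Basic Multilingual Plane and two code units (a surrogate pair) otherwise.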
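To make the difference between code units and code points concrete, here is a minimal sketch using a character outside the Basic Multilingual Plane. The helper names `codeUnitCount` and `codePointCount` are hypothetical and exist only for this illustration.

```dart
/// Number of UTF-16 code units in [s] (hypothetical helper).
int codeUnitCount(String s) => s.length;

/// Number of Unicode code points in [s] (hypothetical helper).
int codePointCount(String s) => s.runes.length;

void main() {
  // U+1F600 lies outside the Basic Multilingual Plane, so UTF-16
  // stores it as a surrogate pair of two code units.
  const emoji = '\u{1F600}';

  print(codeUnitCount(emoji)); // 2
  print(codePointCount(emoji)); // 1

  // Indexing yields a one-code-unit String: here, the high surrogate alone.
  print(emoji[0].codeUnitAt(0).toRadixString(16)); // d83d

  // The single rune is the full code point.
  print(emoji.runes.first.toRadixString(16)); // 1f600
}
```

For text that stays within the Basic Multilingual Plane, such as the Thai string used in the program that follows, the two counts agree.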

import 'dart:core';

void main() {
  // `s` is a `String` assigned a literal value
  // representing the word "hello" in the Thai language.
  // Dart string literals are UTF-16 encoded text.
  const s = "สวัสดี";

  // `length` is the number of UTF-16 code units, not bytes or
  // visible characters. Every character in this string lies in the
  // Basic Multilingual Plane, so each one takes exactly one code unit.
  print("Len: ${s.length}");

  // Iterating over the code units of the string. `print` always appends
  // a newline, so we join the hex values to show them on one line.
  print("Code units:");
  print(s.codeUnits.map((unit) => unit.toRadixString(16)).join(" "));
  print("");

  // To count how many Unicode code points are in a string, we can use
  // the `runes` property. Note that this may be different from the
  // number of visible characters due to combining characters.
  print("Rune count: ${s.runes.length}");

  // A `RuneIterator` decodes each code point and, through `rawIndex`,
  // reports the code unit offset at which it starts in the string.
  final it = s.runes.iterator;
  while (it.moveNext()) {
    print("U+${it.current.toRadixString(16).padLeft(4, '0')} starts at ${it.rawIndex}");
  }

  // When the offsets are not needed, a `for-in` loop over
  // `String.runes` visits the same code points.
  print("\nUsing String.runes");
  for (var rune in s.runes) {
    print("U+${rune.toRadixString(16).padLeft(4, '0')} '${String.fromCharCode(rune)}'");
    examineRune(rune);
  }
}

void examineRune(int rune) {
  // Dart has no rune literal, so we take the code point of a
  // one-character string and compare it with the rune value.
  if (rune == 't'.runes.first) {
    print("found tee");
  } else if (rune == 'ส'.runes.first) {
    print("found so sua");
  }
}

To run the program, save it as strings_and_runes.dart and use dart run:

$ dart run strings_and_runes.dart
Len: 6
Code units:
e2a e27 e31 e2a e14 e35

Rune count: 6
U+0e2a starts at 0
U+0e27 starts at 1
U+0e31 starts at 2
U+0e2a starts at 3
U+0e14 starts at 4
U+0e35 starts at 5

Using String.runes
U+0e2a 'ส'
found so sua
U+0e27 'ว'
U+0e31 'ั'
U+0e2a 'ส'
found so sua
U+0e14 'ด'
U+0e35 'ี'

This program demonstrates how to work with strings and Unicode code points (runes) in Dart: reading a string's length, iterating over its code units, counting and iterating over its runes, and examining individual runes. Keep in mind that length counts UTF-16 code units and runes yields code points, and neither is guaranteed to match the number of characters a reader sees.
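One place where the choice between code units and runes matters is when rearranging text. The following is a minimal sketch, not a general-purpose routine: `reverseByRunes` and `reverseByCodeUnits` are hypothetical helpers, and neither accounts for combining characters such as the Thai vowel marks above.

```dart
/// Reverses [s] one code point at a time (hypothetical helper).
String reverseByRunes(String s) =>
    String.fromCharCodes(s.runes.toList().reversed);

/// Reverses [s] one UTF-16 code unit at a time (hypothetical helper).
String reverseByCodeUnits(String s) =>
    String.fromCharCodes(s.codeUnits.reversed);

void main() {
  // 'a', U+1F600 (stored as a surrogate pair), 'b'.
  const s = 'a\u{1F600}b';

  // Reversing by runes keeps the surrogate pair intact.
  print(reverseByRunes(s) == 'b\u{1F600}a'); // true

  // Reversing by code units swaps the two halves of the pair,
  // so the result no longer contains a valid encoding of U+1F600.
  final broken = reverseByCodeUnits(s);
  print(broken.codeUnits.map((u) => u.toRadixString(16)).join(' '));
  // 62 de00 d83d 61
}
```

Working at the rune level avoids splitting surrogate pairs, but a combining mark would still end up before its base character after reversal.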