Strings and Runes in Scala

Our first program will demonstrate strings and runes in Scala. Here’s the full source code:

object StringsAndRunes {
  def main(args: Array[String]): Unit = {
    // 's' is a String assigned a literal value
    // representing the word "hello" in the Thai language.
    // Scala strings are JVM strings, held in memory as UTF-16.
    val s = "สวัสดี"

    // length returns the number of Char values (UTF-16 code
    // units) in the string, not the number of bytes.
    println(s"Len: ${s.length}")

    // This loop generates the hex values of all
    // the characters in 's'.
    println("Hex values:")
    s.foreach(c => print(f"${c.toInt}%x "))
    println()

    // To count Unicode code points rather than Char values, we can
    // use codePointCount. The two counts agree here because every
    // Thai character fits in a single 16-bit Char.
    println(s"Character count: ${s.codePointCount(0, s.length)}")

    // A for loop over zipWithIndex visits each Char together with its index.
    for ((char, idx) <- s.zipWithIndex) {
      println(f"U+${char.toInt}%04X '${char}' starts at $idx")
    }

    println("\nUsing foreach")
    s.zipWithIndex.foreach { case (char, idx) =>
      println(f"U+${char.toInt}%04X '${char}' starts at $idx")
      examineChar(char)
    }
  }

  def examineChar(c: Char): Unit = {
    // We can compare a Char value to a character literal directly.
    if (c == 't') {
      println("found tee")
    } else if (c == 'ส') {
      println("found so sua")
    }
  }
}

To run the program, save it as StringsAndRunes.scala and use the scala command:

$ scala StringsAndRunes.scala
Len: 6
Hex values:
e2a e27 e31 e2a e14 e35 
Character count: 6
U+0E2A 'ส' starts at 0
U+0E27 'ว' starts at 1
U+0E31 'ั' starts at 2
U+0E2A 'ส' starts at 3
U+0E14 'ด' starts at 4
U+0E35 'ี' starts at 5

Using foreach
U+0E2A 'ส' starts at 0
found so sua
U+0E27 'ว' starts at 1
U+0E31 'ั' starts at 2
U+0E2A 'ส' starts at 3
found so sua
U+0E14 'ด' starts at 4
U+0E35 'ี' starts at 5

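In Scala on the JVM, a String is a java.lang.String: a sequence of 16-bit Char values, each of which is a UTF-16 code unit. Unlike in some other languages, Scala doesn’t have a separate “rune” type. Code points in the Basic Multilingual Plane, including all the Thai characters above, fit in one Char. Code points above U+FFFF are stored as a surrogate pair of two Char values, and a whole code point is represented as an Int.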
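
To make the difference between Char values and code points concrete, here is a minimal sketch using a string that contains one character outside the Basic Multilingual Plane. The object name and the codePointsOf helper are hypothetical names chosen for this illustration; length, codePointCount and codePoints are standard java.lang.String methods.

```scala
object CodeUnitsVsCodePoints {
  // Hypothetical helper: the code points of a string as a sequence of Ints.
  def codePointsOf(s: String): Seq[Int] = s.codePoints().toArray.toSeq

  def main(args: Array[String]): Unit = {
    // 'a' followed by U+1F600, written as a surrogate pair so the
    // result does not depend on the source file encoding.
    val s = "a\uD83D\uDE00"

    // length counts UTF-16 code units, so the surrogate pair counts twice.
    println(s"Char count: ${s.length}")

    // codePointCount treats the surrogate pair as one code point.
    println(s"Code point count: ${s.codePointCount(0, s.length)}")

    // Print each code point in the usual U+XXXX notation.
    codePointsOf(s).foreach(cp => println(f"U+$cp%04X"))
  }
}
```

For this input the Char count is 3 and the code point count is 2.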

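The length method on a string returns the number of Char values, not the number of bytes and not the number of visible characters. For สวัสดี it returns 6, one per code point, even though each of those code points takes three bytes in UTF-8 and the vowel signs ั and ี are combining marks drawn above the preceding consonant. For strings that may contain characters above U+FFFF, codePointCount(0, s.length) gives the number of code points.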
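
As a rough illustration of how the Char count differs from the encoded size, the sketch below compares length with the number of bytes the same string occupies in UTF-8. The object name and the utf8Length helper are hypothetical; getBytes and StandardCharsets are part of the Java standard library.

```scala
import java.nio.charset.StandardCharsets

object EncodedLengths {
  // Hypothetical helper: number of bytes the string occupies in UTF-8.
  def utf8Length(s: String): Int = s.getBytes(StandardCharsets.UTF_8).length

  def main(args: Array[String]): Unit = {
    val s = "สวัสดี"

    // Six Char values, one per Thai code point.
    println(s"UTF-16 code units: ${s.length}")

    // Each Thai code point is encoded as three bytes in UTF-8.
    println(s"UTF-8 bytes: ${utf8Length(s)}")
  }
}
```

For the Thai greeting this reports 6 code units and 18 UTF-8 bytes.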

Scala’s foreach method and for comprehensions provide convenient ways to iterate over the characters in a string. The zipWithIndex method is used to pair each character with its index in the string.
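
Iterating with zipWithIndex visits one Char at a time, so a character stored as a surrogate pair would show up as two entries. One possible way to walk a string by code point instead is sketched below; codePointsWithOffsets is a hypothetical helper written for this example, built on the standard codePointAt and Character.charCount methods.

```scala
object CodePointIteration {
  // Hypothetical helper: pairs each code point with the Char index
  // at which it starts.
  def codePointsWithOffsets(s: String): List[(Int, Int)] = {
    val out = List.newBuilder[(Int, Int)]
    var i = 0
    while (i < s.length) {
      val cp = s.codePointAt(i)
      out += ((cp, i))
      // Advance by 1 for a BMP character, by 2 for a surrogate pair.
      i += Character.charCount(cp)
    }
    out.result()
  }

  def main(args: Array[String]): Unit = {
    // A Thai letter, U+1F600 as a surrogate pair, then another Thai letter.
    val s = "ส\uD83D\uDE00ด"
    for ((cp, idx) <- codePointsWithOffsets(s)) {
      val text = new String(Character.toChars(cp))
      println(f"U+$cp%04X '$text' starts at $idx")
    }
  }
}
```

With this input the start indices are 0, 1 and 3, because the middle character occupies two Char positions.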

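The toInt method on a Char returns its numeric value as a UTF-16 code unit. For characters in the Basic Multilingual Plane this is the same as the Unicode code point, which we format as a hexadecimal value for display.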

In the examineChar method, we demonstrate how to compare a Char to a character literal with ==. Scala uses single quotes for character literals.
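
The same comparison can also be written with pattern matching, which some Scala code prefers over an if/else chain. The sketch below is an alternative under that assumption, not a replacement for examineChar; the object name and the describe method are hypothetical, and describe returns an Option so the result can be checked without printing.

```scala
object ExamineWithMatch {
  // Hypothetical variant of examineChar: returns the message instead
  // of printing it, and None when the character is not recognized.
  def describe(c: Char): Option[String] = c match {
    case 't' => Some("found tee")
    case 'ส' => Some("found so sua")
    case _   => None
  }

  def main(args: Array[String]): Unit = {
    // Print a message for every Char that describe recognizes.
    for (c <- "สวัสดี"; msg <- describe(c)) println(msg)
  }
}
```

Run on the Thai greeting, this prints "found so sua" twice, once for each ส.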

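This example shows how Scala handles Unicode strings and characters. For text confined to the Basic Multilingual Plane, such as Thai, working Char by Char is enough. Characters above U+FFFF need the code point methods of java.lang.String, such as codePointAt and codePointCount.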