Strings and Runes in Scala
Our first program will demonstrate strings and runes in Scala. Here’s the full source code:
object StringsAndRunes {
  def main(args: Array[String]): Unit = {
    // `s` is a String assigned a literal value
    // representing the word "hello" in the Thai language.
    // Scala strings are java.lang.String values, stored internally as
    // UTF-16 code units (the source file itself is usually UTF-8).
    val s = "สวัสดี"

    // `length` returns the number of UTF-16 code units (Char values).
    // Every Thai character here fits in one code unit, so this prints 6.
    println(s"Len: ${s.length}")

    // This loop prints the hex value of every Char (UTF-16 code unit) in `s`.
    println("Hex values:")
    s.foreach(c => print(f"${c.toInt}%x "))
    println()

    // To count Unicode code points rather than code units, use
    // codePointCount. The two differ only for characters outside the
    // Basic Multilingual Plane, which are stored as surrogate pairs.
    println(s"Character count: ${s.codePointCount(0, s.length)}")

    // A for comprehension iterates over each Char; zipWithIndex pairs
    // it with its UTF-16 index.
    for ((char, idx) <- s.zipWithIndex) {
      println(f"U+${char.toInt}%04X '$char' starts at $idx")
    }

    println("\nUsing foreach")
    s.zipWithIndex.foreach { case (char, idx) =>
      println(f"U+${char.toInt}%04X '$char' starts at $idx")
      examineChar(char)
    }
  }

  def examineChar(c: Char): Unit = {
    // A Char value can be compared to a character literal directly.
    if (c == 't') {
      println("found tee")
    } else if (c == 'ส') {
      println("found so sua")
    }
  }
}
To run the program, save it as StringsAndRunes.scala and use the scala command:
$ scala StringsAndRunes.scala
Len: 6
Hex values:
e2a e27 e31 e2a e14 e35
Character count: 6
U+0E2A 'ส' starts at 0
U+0E27 'ว' starts at 1
U+0E31 'ั' starts at 2
U+0E2A 'ส' starts at 3
U+0E14 'ด' starts at 4
U+0E35 'ี' starts at 5
Using foreach
U+0E2A 'ส' starts at 0
found so sua
U+0E27 'ว' starts at 1
U+0E31 'ั' starts at 2
U+0E2A 'ส' starts at 3
found so sua
U+0E14 'ด' starts at 4
U+0E35 'ี' starts at 5
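In Scala, a string is a java.lang.String: a sequence of 16-bit Char values, each of which is a UTF-16 code unit rather than a full Unicode code point. Unlike some other languages (Go, for example), Scala has no separate “rune” type. The closest equivalent is an Int holding a code point, which methods such as codePointAt return. Characters in the Basic Multilingual Plane, including all the Thai letters above, fit in a single Char. Characters outside it, such as most emoji, are stored as a surrogate pair of two Chars.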
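Because a Char is a 16-bit code unit, a character outside the Basic Multilingual Plane takes two Chars. The following minimal sketch appends U+1F600 (GRINNING FACE) to the Thai word. The emoji and the object name SurrogatePairs are illustrative choices and are not part of the original program. The sketch shows length and codePointCount disagreeing on that string.

```scala
object SurrogatePairs {
  // "สวัสดี" followed by U+1F600, a code point outside the BMP.
  // Character.toChars returns the two surrogate Chars for it.
  val sample: String = "สวัสดี" + new String(Character.toChars(0x1F600))

  def main(args: Array[String]): Unit = {
    // length counts UTF-16 code units: 6 Thai Chars + 2 surrogate halves.
    println(s"length: ${sample.length}")
    // codePointCount counts code points: 6 Thai letters + 1 emoji.
    println(s"code points: ${sample.codePointCount(0, sample.length)}")
    // The emoji occupies indices 6 and 7 as a high/low surrogate pair.
    println(Character.isHighSurrogate(sample.charAt(6)))
    println(Character.isLowSurrogate(sample.charAt(7)))
  }
}
```

Running it should print a length of 8 and a code point count of 7, followed by true twice.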
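The length method on a string returns the number of UTF-16 code units. That is neither the number of bytes nor, in general, the number of code points. For the Thai string above, the number of Chars and the number of code points both happen to be 6, and the UTF-8 encoding takes 18 bytes (3 per letter). Even 6 is not the number of visible characters. ั (U+0E31) and ี (U+0E35) are vowel marks that combine with the preceding consonant, so a reader sees four character clusters.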
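To compare length with other ways of measuring the same string, here is a small sketch. The object StringMeasures and its helper names are hypothetical. It uses only standard JDK calls: String.getBytes with StandardCharsets.UTF_8, String.codePointCount, and java.text.BreakIterator. The grapheme count comes from the JDK's character BreakIterator, whose segmentation rules have changed across JDK versions. Treat that figure as approximate. On current JDKs it should report four clusters for this word.

```scala
import java.nio.charset.StandardCharsets
import java.text.BreakIterator

object StringMeasures {
  // Number of bytes needed to encode `s` as UTF-8.
  def utf8Bytes(s: String): Int = s.getBytes(StandardCharsets.UTF_8).length

  // Number of Unicode code points in `s`.
  def codePoints(s: String): Int = s.codePointCount(0, s.length)

  // Approximate count of user-perceived characters (grapheme clusters),
  // using the JDK's character BreakIterator.
  def graphemes(s: String): Int = {
    val it = BreakIterator.getCharacterInstance
    it.setText(s)
    var count = 0
    while (it.next() != BreakIterator.DONE) count += 1
    count
  }

  def main(args: Array[String]): Unit = {
    val s = "สวัสดี"
    println(s"length (UTF-16 code units): ${s.length}")
    println(s"UTF-8 bytes: ${utf8Bytes(s)}")
    println(s"code points: ${codePoints(s)}")
    println(s"grapheme clusters: ${graphemes(s)}")
    // The raw UTF-8 bytes in hex, comparable to a byte-level dump.
    val hex = s.getBytes(StandardCharsets.UTF_8).map(b => f"${b & 0xff}%02x")
    println(hex.mkString(" "))
  }
}
```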
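Scala’s foreach method and for comprehensions provide convenient ways to iterate over the characters in a string. The zipWithIndex method pairs each Char with its index in the string. Both work on UTF-16 code units. A string containing surrogate pairs is therefore visited one half at a time, and the indices count Chars, not code points.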
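When a string may contain surrogate pairs, you can walk it by code point instead, still recording each one's starting UTF-16 index. The following is a minimal sketch of that approach. The object CodePointIteration and the function codePointsWithOffsets are hypothetical names. The loop reads a code point with codePointAt and advances by Character.charCount, so the two halves of a surrogate pair are consumed together.

```scala
object CodePointIteration {
  // Returns (code point, UTF-16 start index) pairs, treating each
  // surrogate pair as a single code point.
  def codePointsWithOffsets(s: String): List[(Int, Int)] = {
    val buf = List.newBuilder[(Int, Int)]
    var i = 0
    while (i < s.length) {
      val cp = s.codePointAt(i)
      buf += ((cp, i))
      // charCount is 2 for supplementary code points, 1 otherwise.
      i += Character.charCount(cp)
    }
    buf.result()
  }

  def main(args: Array[String]): Unit = {
    // "a", U+1F600 (an illustrative emoji), then "b".
    val s = "a" + new String(Character.toChars(0x1F600)) + "b"
    for ((cp, idx) <- codePointsWithOffsets(s))
      println(f"U+$cp%04X starts at $idx")
  }
}
```

This prints three lines, with the emoji starting at index 1 and "b" at index 3. zipWithIndex would instead yield four entries, one for each surrogate half.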
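The toInt method on a Char returns its UTF-16 code unit value, which we format as a hexadecimal value for display. For characters in the Basic Multilingual Plane, such as the Thai letters here, this equals the Unicode code point. For a character stored as a surrogate pair, use codePointAt on the string to get the full code point.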
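The difference between toInt and a real code point only shows up for surrogate pairs. The following sketch uses U+1F600 as an illustrative example, and the object name is hypothetical. It prints the two raw code units, then the combined code point from codePointAt.

```scala
object CodeUnitVsCodePoint {
  // U+1F600 as a two-Char string (high surrogate, low surrogate).
  val emoji: String = new String(Character.toChars(0x1F600))

  def main(args: Array[String]): Unit = {
    // toInt on each Char yields the raw UTF-16 code unit values.
    println(emoji.map(c => f"${c.toInt}%04X").mkString(" "))
    // codePointAt(0) combines the surrogate pair into one code point.
    println(f"${emoji.codePointAt(0)}%X")
    // For a BMP character like 'ส', toInt and codePointAt agree.
    println("ส".charAt(0).toInt == "ส".codePointAt(0))
  }
}
```

The first line should read D83D DE00 and the second 1F600.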
In the examineChar method, we demonstrate how to compare a Char to a literal value. Scala uses single quotes for character literals and double quotes for string literals.
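A pattern match on Char literals is a common alternative to an if/else chain like the one in examineChar. The following sketch shows that style. The object MatchChars and the method describe are hypothetical names. describe returns an Option so that characters without a description produce no output.

```scala
object MatchChars {
  // Describes a few known characters; anything else yields None.
  def describe(c: Char): Option[String] = c match {
    case 't' => Some("found tee")
    case 'ส' => Some("found so sua")
    case _   => None
  }

  def main(args: Array[String]): Unit =
    // Prints "found so sua" once for each 'ส' in the word.
    "สวัสดี".foreach(c => describe(c).foreach(println))
}
```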
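This example shows that Scala strings are sequences of UTF-16 Chars. For text like Thai, where every letter lies in the Basic Multilingual Plane, Char-based methods line up with code points. For characters outside that plane you need code-point methods such as codePointCount and codePointAt. Counting what a reader perceives as characters requires grapheme segmentation on top of that.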