Strings and Runes in Kotlin

Kotlin, like many programming languages, has special handling for strings and characters. In Kotlin, strings are immutable sequences of Char values, and each Char is a single 16-bit UTF-16 code unit rather than a full Unicode code point. Let’s explore how Kotlin handles strings and characters, including Unicode support.

import java.nio.charset.Charset

fun main() {
    // `s` is a String assigned a literal value
    // representing the word "hello" in the Thai language.
    // Kotlin strings are held in memory as sequences of UTF-16 code units.
    val s = "สวัสดี"

    // This prints the length of the string in Chars (UTF-16 code units), not bytes.
    println("Len: ${s.length}")

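    // In Kotlin, we can iterate over the bytes of a string
    // by converting it to a ByteArray in a chosen encoding (UTF-8 here).
    // `print` keeps the label on the same line as the hex values.
    print("Bytes: ")
    s.toByteArray(Charset.forName("UTF-8")).forEach {
        // toUByte() avoids negative values for bytes above 0x7f.
        print("${it.toUByte().toString(16)} ")
    }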
    println()

    // To count how many characters are in a string, we can use
    // the length property. Note that some Unicode characters
    // might be represented by surrogate pairs, which are counted as two characters.
    println("Character count: ${s.length}")

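    // A `for` loop over a string iterates through its Chars.
    // `withIndex()` pairs each Char with its index in the string.
    // `char.code` replaces `toInt()`, which is deprecated since Kotlin 1.5.
    for ((index, char) in s.withIndex()) {
        println("${char.code.toString(16).uppercase()} starts at $index")
    }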

    // We can achieve a similar iteration by using the
    // String's iterator explicitly. Here `index` tracks the offset
    // of each Char in the UTF-8 encoding, not its position in the string.
    println("\nUsing string iterator")
    var index = 0
    s.iterator().forEach { char ->
        println("${char.code.toString(16).uppercase()} starts at $index")
        examineChar(char)
        // Advance by the number of UTF-8 bytes this Char occupies.
        index += char.toString().toByteArray(Charset.forName("UTF-8")).size
    }
}

// This demonstrates passing a Char value to a function.
fun examineChar(c: Char) {
    // We can compare a Char value to a character literal directly.
    when (c) {
        't' -> println("found tee")
        'ส' -> println("found so sua")
    }
}

When you run this Kotlin program, you’ll see output similar to this:

Len: 6
Bytes: e0 b8 aa e0 b8 a7 e0 b8 b1 e0 b8 aa e0 b8 94 e0 b8 b5 
Character count: 6
E2A starts at 0
E27 starts at 1
E31 starts at 2
E2A starts at 3
E14 starts at 4
E35 starts at 5

Using string iterator
E2A starts at 0
found so sua
E27 starts at 3
E31 starts at 6
E2A starts at 9
found so sua
E14 starts at 12
E35 starts at 15

This Kotlin code demonstrates several key points:

  1. Kotlin strings are sequences of Char values, and their length is the number of UTF-16 code units, not bytes. A character outside the Basic Multilingual Plane takes two Chars (a surrogate pair).

  2. We can convert a string to a byte array to examine its UTF-8 encoding.

  3. Iterating over a string gives us characters, not bytes.

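  4. We can get the numeric value of a Char from its code property (toInt() is deprecated since Kotlin 1.5). For characters in the Basic Multilingual Plane, such as the Thai letters here, that value is the Unicode code point.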

  5. Kotlin’s when expression can be used to match characters, similar to a switch statement in other languages.

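  6. Kotlin doesn’t have a built-in ‘rune’ type like Go. The Char type is the closest equivalent, but it holds a 16-bit UTF-16 code unit, so it can represent a whole Unicode character only within the Basic Multilingual Plane. Code points beyond that range are handled as Int values.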
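
The difference between Chars and code points only shows up once a string contains a surrogate pair, which the Thai example does not. The following is a minimal sketch for Kotlin on the JVM; the helper names codePointLength and codePointsOf and the sample string are illustrative assumptions, not part of the original program.

```kotlin
// Returns the number of Unicode code points in `text`.
// `text.length` would instead count UTF-16 code units (Chars).
fun codePointLength(text: String): Int = text.codePointCount(0, text.length)

// Collects every code point in `text` as a hex label such as "U+1F600".
fun codePointsOf(text: String): List<String> {
    val result = mutableListOf<String>()
    var i = 0
    while (i < text.length) {
        val cp = text.codePointAt(i)
        result.add("U+" + cp.toString(16).uppercase())
        // Advance one Char for a BMP code point, two for a surrogate pair.
        i += Character.charCount(cp)
    }
    return result
}

fun main() {
    val thai = "สวัสดี"
    // Hypothetical sample: "hi" followed by U+1F600, written as its surrogate pair.
    val mixed = "hi\uD83D\uDE00"

    println(thai.length)             // 6
    println(codePointLength(thai))   // 6
    println(mixed.length)            // 4
    println(codePointLength(mixed))  // 3
    println(codePointsOf(mixed))     // [U+68, U+69, U+1F600]
}
```

For the Thai string both counts agree, because every letter sits in the Basic Multilingual Plane. For the second string, length reports one more than the number of code points.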

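Remember that while Go stores strings as UTF-8 bytes, Kotlin uses UTF-16. As a result, Go’s len reports bytes and its range loop yields runes with byte offsets, whereas Kotlin’s length and indices count UTF-16 code units, so the same text can report different lengths and positions in the two languages.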
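
To make the encoding difference concrete, here is a rough sketch that walks a string by code point while tracking where each one would start in the UTF-8 encoding, similar in spirit to what a Go range loop reports. The function name utf8Offsets is a hypothetical helper, and the sketch assumes Kotlin on the JVM.

```kotlin
// Pairs each code point in `text` with the byte offset at which it
// would start if the string were encoded as UTF-8.
fun utf8Offsets(text: String): List<Pair<Int, Int>> {
    val result = mutableListOf<Pair<Int, Int>>()
    var charIndex = 0   // position in UTF-16 code units
    var byteOffset = 0  // position in UTF-8 bytes
    while (charIndex < text.length) {
        val cp = text.codePointAt(charIndex)
        result.add(byteOffset to cp)
        val charCount = Character.charCount(cp)
        // Encode just this code point to learn how many UTF-8 bytes it needs.
        byteOffset += text.substring(charIndex, charIndex + charCount)
            .toByteArray(Charsets.UTF_8).size
        charIndex += charCount
    }
    return result
}

fun main() {
    val s = "สวัสดี"
    println("UTF-16 code units: ${s.length}")                          // 6
    println("UTF-8 bytes: ${s.toByteArray(Charsets.UTF_8).size}")      // 18
    println("UTF-16 bytes: ${s.toByteArray(Charsets.UTF_16BE).size}")  // 12
    for ((offset, cp) in utf8Offsets(s)) {
        println("U+${cp.toString(16).uppercase().padStart(4, '0')} starts at byte $offset")
    }
}
```

Each Thai letter takes three bytes in UTF-8 but a single 16-bit unit in UTF-16, which is why the byte offsets advance by three while the Char indices advance by one.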