Strings and Runes in Erlang

In Erlang, strings are represented as lists of integers, where each integer is a Unicode code point. Unlike many other languages, Erlang has no dedicated string type: a double-quoted literal is simply shorthand for such a list.
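
To make the list representation concrete, here is a minimal sketch. The module name string_repr and the function demo/0 are hypothetical, chosen only for this illustration, and the source file is assumed to be saved as UTF-8 (the default encoding for Erlang source files since OTP 17).

```erlang
-module(string_repr).
-export([demo/0]).

%% Hypothetical demo: each match below crashes with badmatch if the
%% claim in its comment does not hold.
demo() ->
    % A double-quoted literal is shorthand for a list of integers.
    true = ("abc" =:= [97, 98, 99]),
    % $a is the character literal for the code point of "a".
    true = ([$a, $b, $c] =:= "abc"),
    % Non-ASCII characters are stored as code points, not as UTF-8 bytes.
    [16#0E2A | _] = "สวัสดี",
    ok.
```

Calling string_repr:demo() returns ok when all three matches succeed.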

-module(strings_and_runes).
-export([main/0]).

main() ->
    % S is a string (list of integers) representing the word "hello" in Thai
    S = "สวัสดี",

    % Print the length of the string (number of Unicode code points)
    io:format("Len: ~p~n", [length(S)]),

    % Print the Unicode code points in hexadecimal
    [io:format("~.16B ", [C]) || C <- S],
    io:format("~n"),

    % Count the code points (for a list string this is again length/1)
    io:format("Character count: ~p~n", [length(S)]),

    % Iterate over each character in the string
    lists:foldl(fun(C, Idx) ->
        % ~4.16.0B pads the hex value to four digits with zeros, e.g. U+0E2A
        io:format("U+~4.16.0B '~ts' starts at ~p~n", [C, [C], Idx]),
        Idx + 1
    end, 0, S),

    io:format("~nUsing lists:nth~n"),
    % lists:nth/2 is 1-based, so subtract 1 to report a zero-based index
    lists:foreach(fun(Idx) ->
        C = lists:nth(Idx, S),
        io:format("U+~4.16.0B '~ts' starts at ~p~n", [C, [C], Idx-1]),
        examine_char(C)
    end, lists:seq(1, length(S))).

examine_char(C) ->
    case C of
        $t -> io:format("found tee~n");
        16#0E2A -> io:format("found so sua~n");
        _ -> ok
    end.

In this Erlang version:

  1. We define a module strings_and_runes with a main/0 function.

  2. Strings in Erlang are lists of integers, where each integer represents a Unicode code point.

  3. We use length/1 to get the number of Unicode code points in the string (6 here). Unlike Go’s len, this is not the UTF-8 byte length, which would be 18 for this word.

  4. To print the hexadecimal representation of each character, we use a list comprehension with io:format/2.

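  5. We use lists:foldl/3 to iterate over the string, printing each character’s Unicode code point and its zero-based index. The index counts code points, not bytes.

  6. Instead of Go’s utf8.DecodeRuneInString, we use lists:nth/2 to access individual characters in the string. Note that lists:nth/2 is 1-based and walks the list from the head on every call, so this loop is quadratic in the string length.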

  7. The examine_char/1 function demonstrates pattern matching on specific characters.
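
The difference between code points, UTF-8 bytes, and user-perceived characters can be sketched as follows. The module string_lengths and the function counts/1 are hypothetical names used only for this illustration; the calls themselves (length/1, unicode:characters_to_binary/1, byte_size/1, and string:length/1, the last available since OTP 20) are standard.

```erlang
-module(string_lengths).
-export([counts/1]).

%% Returns {CodePoints, Utf8Bytes, GraphemeClusters} for a string
%% given as a list of Unicode code points.
counts(S) ->
    % Number of list elements, i.e. code points.
    CodePoints = length(S),
    % Size of the same text once encoded as a UTF-8 binary.
    Utf8Bytes = byte_size(unicode:characters_to_binary(S)),
    % string:length/1 counts grapheme clusters, so a combining mark
    % is grouped with the character it attaches to.
    Graphemes = string:length(S),
    {CodePoints, Utf8Bytes, Graphemes}.
```

For the Thai word used above, string_lengths:counts("สวัสดี") should yield 6 code points and 18 UTF-8 bytes. The grapheme cluster count is lower (4), because the vowel signs U+0E31 and U+0E35 combine with the preceding consonants.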

To run this program:

$ erlc strings_and_runes.erl
$ erl -noshell -s strings_and_runes main -s init stop
Len: 6
E2A E27 E31 E2A E14 E35 
Character count: 6
U+0E2A 'ส' starts at 0
U+0E27 'ว' starts at 1
U+0E31 'ั' starts at 2
U+0E2A 'ส' starts at 3
U+0E14 'ด' starts at 4
U+0E35 'ี' starts at 5

Using lists:nth
U+0E2A 'ส' starts at 0
found so sua
U+0E27 'ว' starts at 1
U+0E31 'ั' starts at 2
U+0E2A 'ส' starts at 3
found so sua
U+0E14 'ด' starts at 4
U+0E35 'ี' starts at 5

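Note that a list string already holds one Unicode code point per element, so iterating over it needs no UTF-8 decoding step. Erlang has no separate “rune” type; a code point is just an integer. UTF-8 only comes into play when text is stored in a binary, where the unicode module or the utf8 bit-syntax type converts between bytes and code points.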
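
When the text is held in a UTF-8 binary rather than a list, the closest analogue to decoding one rune at a time is matching on the utf8 segment type in the bit syntax. The following is a minimal sketch under that assumption; the module utf8_walk and the function offsets/1 are hypothetical names, and invalid UTF-8 input is deliberately not handled (it fails with a function_clause error).

```erlang
-module(utf8_walk).
-export([offsets/1]).

%% Walks a UTF-8 encoded binary and returns a list of
%% {ByteOffset, CodePoint} pairs, one per decoded code point.
offsets(Bin) when is_binary(Bin) ->
    offsets(Bin, 0, []).

offsets(<<>>, _Offset, Acc) ->
    lists:reverse(Acc);
offsets(<<C/utf8, Rest/binary>>, Offset, Acc) ->
    % Re-encode the code point to learn how many bytes it occupied.
    Width = byte_size(<<C/utf8>>),
    offsets(Rest, Offset + Width, [{Offset, C} | Acc]).
```

Calling utf8_walk:offsets(unicode:characters_to_binary("สวัสดี")) reports the byte offsets 0, 3, 6, 9, 12, and 15, since each of these Thai code points takes three bytes in UTF-8.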