Strings and Runes in Idris

Idris, like many functional programming languages, treats strings as a sequence of characters rather than bytes. However, we can still explore similar concepts to demonstrate string handling and character manipulation in Idris.

import Data.String
import Data.Char

main : IO ()
main = do
    -- Define a string constant
    let s = "สวัสดี"

    -- Print the length of the string
    putStrLn $ "Len: " ++ show (length s)

    -- Print the characters of the string
    putStrLn $ "Characters: " ++ show (unpack s)

    -- Count the number of characters
    putStrLn $ "Character count: " ++ show (length $ unpack s)

    -- Iterate over the characters with their indices
    let indexedChars = zip [0..] (unpack s)
    traverse_ (\(i, c) -> 
        putStrLn $ "Character '" ++ singleton c ++ "' starts at " ++ show i
    ) indexedChars

    -- Demonstrate character comparison
    putStrLn "\nExamining characters:"
    traverse_ examineChar (unpack s)

examineChar : Char -> IO ()
examineChar c = 
    case c of
        't' -> putStrLn "found tee"
        'ส' -> putStrLn "found so sua"
        _   -> pure ()

In this Idris version:

  1. We use Data.String and Data.Char for string and character operations.

  2. The s constant is defined as a string containing Thai characters.

  3. We use length s to get the number of characters in the string. In Idris, this gives us the character count, not the byte count.

  4. We use unpack s to convert the string to a list of characters, which we can then manipulate.

  5. To iterate over characters with their indices, we use zip with a range and the unpacked string.

  6. The examineChar function demonstrates character comparison. In Idris, character literals are enclosed in single quotes.

  7. We use traverse_ for side-effecting operations over lists, which is similar to a for loop in imperative languages.

To run this program, save it as StringsAndChars.idr and use the Idris compiler:

$ idris -o StringsAndChars StringsAndChars.idr
$ ./StringsAndChars
Len: 6
Characters: ['ส','ว','ั','ส','ด','ี']
Character count: 6
Character 'ส' starts at 0
Character 'ว' starts at 1
Character 'ั' starts at 2
Character 'ส' starts at 3
Character 'ด' starts at 4
Character 'ี' starts at 5

Examining characters:
found so sua
found so sua

This example demonstrates how Idris handles strings and characters. Unlike some languages, Idris treats strings as sequences of Unicode characters by default, simplifying operations on text containing non-ASCII characters.