Strings and Runes in Haskell

In Haskell, the built-in String type is a list of characters ([Char]). Each Char is a Unicode code point, which makes it the counterpart of a rune in Go. For larger or performance-sensitive text, the Text type from the text package is the usual choice. Let’s explore how Haskell handles strings and characters.
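As a small sketch of that representation before the full program: because String is a synonym for [Char], ordinary list functions apply to it directly, and T.pack / T.unpack convert to and from Text. The names greeting, shout, and roundTrip are illustrative only and are not used in the program that follows.

```haskell
import Data.Char (toUpper)
import qualified Data.Text as T

-- String is a type synonym for [Char], so list functions work on it directly.
greeting :: String
greeting = "hello"

-- map applies toUpper to every Char in the list.
shout :: String -> String
shout = map toUpper

-- Text is a packed representation; T.pack and T.unpack convert between the two.
roundTrip :: String -> String
roundTrip = T.unpack . T.pack

main :: IO ()
main = do
    putStrLn (shout greeting)              -- prints HELLO
    print (length greeting)                -- prints 5
    print (roundTrip greeting == greeting) -- prints True
```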

import Data.Char (ord)
import qualified Data.Text as T

main :: IO ()
main = do
    -- s is a Text value built from a string literal:
    -- the word "hello" in Thai.
    -- Haskell string literals may contain any Unicode text.
    let s = T.pack "สวัสดี"

    -- T.length counts characters (Unicode code points), not bytes as Go's len does
    putStrLn $ "Len: " ++ show (T.length s)

    -- Print the Unicode code points of each character
    putStrLn $ "Unicode code points: " ++ 
               unwords (map (show . ord) (T.unpack s))

    -- The same count again: Text needs no separate rune-counting function
    putStrLn $ "Character count: " ++ show (T.length s)

    -- Iterate over each character with its index
    mapM_ (\(idx, c) -> putStrLn $ 
           "U+" ++ showHex (ord c) ++ " '" ++ [c] ++ 
           "' starts at " ++ show idx)
        (zip [0..] (T.unpack s))

    putStrLn "\nUsing T.foldl'"
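    -- T.foldl' is a pure fold, so the accumulator pairs the next index
    -- with the IO action built up so far; that action runs after the fold.
    let step (idx, acts) c =
            ( idx + 1
            , acts >> do
                putStrLn $ "U+" ++ showHex (ord c) ++ " '" ++ [c] ++
                           "' starts at " ++ show idx
                examineChar c
            )
    snd (T.foldl' step (0 :: Int, return ()) s)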

-- This demonstrates passing a Char value to a function
examineChar :: Char -> IO ()
examineChar c
    | c == 't'  = putStrLn "found tee"
    | c == 'ส'  = putStrLn "found so sua"
    | otherwise = return ()

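-- Helper: uppercase hexadecimal, zero-padded to at least four digits
showHex :: Int -> String
showHex n = replicate (4 - length digits) '0' ++ digits
  where
    digits = go n
    go k | k < 16    = [digit k]
         | otherwise = go (k `div` 16) ++ [digit (k `mod` 16)]
    digit d = "0123456789ABCDEF" !! d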

In this Haskell version:

  1. We use Data.Text (imported as T) for efficient Unicode string handling. Unlike Go’s string type, which is a read-only slice of bytes, a Text value is a sequence of Unicode characters.

  2. T.length gives us the number of characters in the string, equivalent to utf8.RuneCountInString in Go.

  3. We use map and ord to print the Unicode code points of each character.

  4. The zip [0..] idiom pairs each character with its index when iterating. The index is a character position, not a byte offset as in Go’s range loop.

  5. T.foldl' is used to demonstrate explicit iteration over the string, similar to the DecodeRuneInString example in Go.

  6. The examineChar function shows how to compare characters directly.
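
The difference between character counts and byte counts noted in the list can be made concrete. The following is a minimal sketch, assuming the text and bytestring packages are available; the helper names charCount, utf8ByteCount, utf8Width, and byteOffsets are hypothetical and not part of the program above. It encodes the Text as UTF-8 to obtain a byte length, and computes the byte offset at which each character would start, which is what the Go version reports as "starts at".

```haskell
import Data.Char (ord)
import qualified Data.ByteString as BS
import qualified Data.Text as T
import qualified Data.Text.Encoding as TE

-- Number of Chars (Unicode code points) in the text.
charCount :: T.Text -> Int
charCount = T.length

-- Number of bytes in the UTF-8 encoding of the text.
utf8ByteCount :: T.Text -> Int
utf8ByteCount = BS.length . TE.encodeUtf8

-- Width in bytes of a single Char when encoded as UTF-8.
utf8Width :: Char -> Int
utf8Width c
    | n < 0x80    = 1
    | n < 0x800   = 2
    | n < 0x10000 = 3
    | otherwise   = 4
  where
    n = ord c

-- Pair each Char with the byte offset at which its UTF-8 encoding starts.
byteOffsets :: T.Text -> [(Int, Char)]
byteOffsets t = zip (scanl (+) 0 (map utf8Width cs)) cs
  where
    cs = T.unpack t

main :: IO ()
main = do
    let s = T.pack "สวัสดี"
    print (charCount s)              -- prints 6
    print (utf8ByteCount s)          -- prints 18
    print (map fst (byteOffsets s))  -- prints [0,3,6,9,12,15]
```

Each Thai character here occupies three bytes in UTF-8, which is why six characters take 18 bytes.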

When you run this program, the output resembles that of the Go version, except that lengths and positions are counted in characters rather than bytes.
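
The U+XXXX notation in that output can also be produced without a hand-written hex helper. This is one possible sketch, using printf from Text.Printf in base; the function name codePoint is an illustrative choice, not something the program above defines.

```haskell
import Data.Char (ord)
import Text.Printf (printf)

-- Format a Char as U+XXXX, zero-padded to at least four uppercase hex digits.
codePoint :: Char -> String
codePoint c = printf "U+%04X" (ord c)

main :: IO ()
main = do
    putStrLn (codePoint 'ส')  -- prints U+0E2A
    putStrLn (codePoint 'a')  -- prints U+0061
```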

This example demonstrates how Haskell handles Unicode strings and characters, providing similar functionality to Go’s string and rune types.