Strings and Runes in PureScript

module Main where

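import Prelude

import Data.Array (zip, (..))
import Data.Char (toCharCode)
import Data.Foldable (traverse_)
import Data.String.CodeUnits (length, toCharArray)
import Data.Tuple (Tuple(..))
import Effect (Effect)
import Effect.Console (log)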

main :: Effect Unit
main = do
  -- `s` is a `String` assigned a literal value
  -- representing the word "hello" in the Thai
  -- language. On the JavaScript backend a PureScript
  -- `String` is a sequence of UTF-16 code units.
  let s = "สวัสดี"

  -- `length` from `Data.String.CodeUnits` counts UTF-16 code units.
  -- Every character in this string fits in one code unit, so this is 6.
  log $ "Len: " <> show (length s)

  -- `toCharArray` converts a string to an array of `Char` values
  -- (one per UTF-16 code unit). Mapping `toCharCode` over it gives
  -- their numeric values, which for these Thai letters are the
  -- Unicode code points.
  log $ "Unicode code points: " <> show (map toCharCode (toCharArray s))

  -- For this string the code unit count equals the character count.
  -- For text outside the Basic Multilingual Plane, use `length` from
  -- `Data.String.CodePoints` to count code points instead.
  log $ "Character count: " <> show (length s)

  -- A `for` loop in PureScript can be achieved using recursion
  -- or higher-order functions like `traverse_`. Here we use
  -- `traverse_` to iterate over the characters in the string.
  traverse_ (\(Tuple idx char) -> 
    log $ show char <> " starts at " <> show idx
  ) (zip (0 .. (length s - 1)) (toCharArray s))

  -- This demonstrates passing a `Char` value to a function.
  traverse_ examineChar (toCharArray s)

examineChar :: Char -> Effect Unit
examineChar c = 
  -- We can compare a `Char` value to a character literal directly.
  if c == 't' 
    then log "found tee"
  else if c == 'ส'
    then log "found so sua"
  else pure unit

In PureScript, String is a primitive type rather than an array of characters; functions such as toCharArray convert a string into an array. Converting is convenient but allocates a new array, which can be costly for large strings. Here’s a breakdown of the key differences and concepts:

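  1. PureScript uses String for text. On the JavaScript backend a String is a JavaScript string, that is, a sequence of UTF-16 code units.

  2. The length function from Data.String.CodeUnits returns the number of UTF-16 code units, not bytes. The length function from Data.String.CodePoints returns the number of Unicode code points. Both give 6 for this string.

  3. PureScript doesn’t have a built-in concept of “runes”. A Char holds a single UTF-16 code unit; the closest analogue to a rune is the CodePoint type from Data.String.CodePoints.

  4. To get numeric values, we convert the string to an array of Char with toCharArray and then map over it with toCharCode. For characters in the Basic Multilingual Plane, such as the Thai letters here, these values are the Unicode code points.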

  5. Iteration over characters in a string is typically done using higher-order functions like traverse_ or by converting the string to an array of characters first.

  6. PureScript doesn’t have mutable variables or traditional for-loops. Instead, we use recursion or higher-order functions for iteration.

  7. The examineChar function demonstrates how to work with individual characters.
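
The difference between code units and code points in points 2 to 4 only shows up for characters outside the Basic Multilingual Plane. The following is a minimal sketch, assuming the `strings` and `enums` packages are installed; the module name `CodePointsExample` and the sample string are made up for illustration.

```purescript
module CodePointsExample where

import Prelude

import Data.Enum (fromEnum)
import Data.String.CodePoints as CP
import Data.String.CodeUnits as CU
import Effect (Effect)
import Effect.Console (log)

main :: Effect Unit
main = do
  -- "😀" (U+1F600) lies outside the Basic Multilingual Plane, so it
  -- takes two UTF-16 code units but is a single code point.
  let s = "a😀"

  -- Counts UTF-16 code units: 3
  log $ "Code units: " <> show (CU.length s)

  -- Counts Unicode code points: 2
  log $ "Code points: " <> show (CP.length s)

  -- `fromEnum` turns each `CodePoint` into its numeric value.
  log $ "Values: " <> show (map fromEnum (CP.toCodePointArray s))
```

For the Thai string used in the main program both counts agree, which is why the simpler `Char`-based functions are enough there.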

To run this program, compile it with the PureScript compiler and run the result with Node.js or in a browser, depending on your project setup. The transcript below uses pulp; newer projects commonly use spago run instead.

$ pulp run
Len: 6
Unicode code points: [3626,3623,3633,3626,3604,3637]
Character count: 6
'ส' starts at 0
'ว' starts at 1
'ั' starts at 2
'ส' starts at 3
'ด' starts at 4
'ี' starts at 5
found so sua
found so sua

This example demonstrates how PureScript handles strings and characters, which differs from some other languages in its functional and immutable approach.
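
As a further illustration of the immutable approach, here is a hedged sketch that replaces a mutable counter with a left fold and uses `forWithIndex_` for indexed iteration. It assumes the `foldable-traversable` and `strings` packages; the module name `CountExample` and the helper `countChar` are hypothetical and not part of the program above.

```purescript
module CountExample where

import Prelude

import Data.Foldable (foldl)
import Data.FoldableWithIndex (forWithIndex_)
import Data.String.CodeUnits (singleton, toCharArray)
import Effect (Effect)
import Effect.Console (log)

-- Hypothetical helper: count how often `target` occurs in `str`.
-- The running total is threaded through the fold instead of
-- being stored in a mutable variable.
countChar :: Char -> String -> Int
countChar target str =
  foldl (\n c -> if c == target then n + 1 else n) 0 (toCharArray str)

main :: Effect Unit
main = do
  let s = "สวัสดี"

  log $ "so sua count: " <> show (countChar 'ส' s)

  -- `forWithIndex_` supplies the index directly, so no separate
  -- range of indices has to be zipped with the characters.
  forWithIndex_ (toCharArray s) \idx c ->
    log $ singleton c <> " starts at " <> show idx
```

Here `singleton` turns each `Char` back into a `String`, so the character is logged without the surrounding quotes that `show` adds.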