Strings and Runes in Clojure

Clojure strings are Java strings (java.lang.String), and Clojure's sequence functions treat them as sequences of characters. This example demonstrates how to work with strings and characters in Clojure.
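To see what treating a string as a sequence of characters looks like in practice, here is a minimal sketch. The name greeting is only illustrative and does not appear in the main example.

```clojure
; `greeting` is a hypothetical name used only for this sketch.
(def greeting "hello")

; A Clojure string is a plain Java string.
(prn (class greeting))                ; java.lang.String

; seq views the string as a sequence of characters.
(prn (seq greeting))                  ; (\h \e \l \l \o)

; Each element is a java.lang.Character.
(prn (class (first greeting)))        ; java.lang.Character

; Sequence functions return sequences, so use str to get a string back.
(prn (apply str (reverse greeting)))  ; "olleh"
```

Because sequence functions such as reverse return a sequence rather than a string, apply str is used here to rebuild a string from the result.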

(ns strings-and-chars
  (:require [clojure.string :as str]))

; Compares a character value to character literals directly.
; Defined before main because Clojure resolves symbols at compile time.
(defn examine-char [c]
  (cond
    (= c \t) (println "found tee")
    (= c \ส) (println "found so sua")))

(defn main []
  ; `s` is bound to a string literal representing the word
  ; "hello" in the Thai language. Clojure strings are
  ; java.lang.String values, stored as UTF-16 code units.
  (let [s "สวัสดี"]

    ; count returns the number of chars (UTF-16 code units) in a string
    (println "Len:" (count s))

    ; .getBytes returns the raw bytes of a string in a given encoding
    (println "Bytes:" (str/join " " (map #(format "%02x" %) (.getBytes s "UTF-8"))))

    ; Every character in this string fits in a single char, so count
    ; also gives the number of characters
    (println "Character count:" (count s))

    ; map-indexed with vector pairs each character with its index
    (doseq [[idx ch] (map-indexed vector s)]
      (println (format "U+%04X '%c' starts at %d" (int ch) ch idx)))

    ; This demonstrates passing a character to a function
    (doseq [ch s]
      (examine-char ch))))

(main)

To run this program, save it as strings_and_chars.clj and use the Clojure command-line tool:

$ clj -M strings_and_chars.clj
Len: 6
Bytes: e0 b8 aa e0 b8 a7 e0 b8 b1 e0 b8 aa e0 b8 94 e0 b8 b5
Character count: 6
U+0E2A 'ส' starts at 0
U+0E27 'ว' starts at 1
U+0E31 'ั' starts at 2
U+0E2A 'ส' starts at 3
U+0E14 'ด' starts at 4
U+0E35 'ี' starts at 5
found so sua
found so sua

In this Clojure version:

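  1. We use count to get the length of the string. On the JVM this is the number of char values (UTF-16 code units), not the number of bytes.

  2. To get the raw bytes, we call the .getBytes method with an explicit encoding and format each byte as hexadecimal.

  3. Every character in this Thai string fits in a single char, so count also equals the number of code points (what Go would report as the rune count). This does not hold for characters outside the Basic Multilingual Plane.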

  4. We use map-indexed with vector to iterate over the string with index and character pairs.

  5. Characters in Clojure are java.lang.Character values, which can be compared directly with character literals (e.g., \t or \ส) using =.

  6. The examine-char function demonstrates how to work with individual characters.
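The indices printed by the example are character positions, while the hex dump shows that each Thai character occupies three bytes in UTF-8. As a rough sketch of how the two relate, the hypothetical helper utf8-offsets below (not part of the original program) pairs each character with the byte offset at which its UTF-8 encoding starts. It assumes the string contains only characters that fit in a single char.

```clojure
; Hypothetical helper for illustration; assumes every character
; in s fits in a single char (no surrogate pairs).
(defn utf8-offsets
  "Returns a sequence of [byte-offset char] pairs for the string s."
  [s]
  (let [; number of UTF-8 bytes needed for each character
        sizes   (map #(count (.getBytes (str %) "UTF-8")) s)
        ; running total of the sizes, starting at 0
        offsets (reductions + 0 sizes)]
    ; map stops at the shorter collection, so the final total is dropped
    (map vector offsets s)))

(doseq [[offset ch] (utf8-offsets "สวัสดี")]
  (println (format "U+%04X starts at byte %d" (int ch) offset)))
; U+0E2A starts at byte 0
; U+0E27 starts at byte 3
; U+0E31 starts at byte 6
; U+0E2A starts at byte 9
; U+0E14 starts at byte 12
; U+0E35 starts at byte 15
```

These byte offsets are the positions that Go's range over a string would report for the same text.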

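Note that Clojure, being a JVM language, represents each character as a Java char, which is a 16-bit UTF-16 code unit rather than a full Unicode code point. Characters outside the Basic Multilingual Plane are therefore stored as surrogate pairs of two char values, so count and character-by-character iteration see them as two elements. This differs from Go’s rune, which always holds a whole code point.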
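The surrogate-pair caveat can be seen with a character outside the Basic Multilingual Plane. The following is a minimal sketch, assuming Java 8 or later for the .codePoints method. The names s and code-points are illustrative only.

```clojure
; U+1F600 is outside the Basic Multilingual Plane, so the JVM
; stores it as a surrogate pair of two chars.
(def s (str "a" (String. (Character/toChars 0x1F600))))

; Illustrative helper: the Unicode code points of a string.
(defn code-points
  "Returns the Unicode code points of s as a vector of ints."
  [^String s]
  (vec (.toArray (.codePoints s))))

; count sees UTF-16 code units: one for "a", two for U+1F600.
(println "chars:" (count s))                                      ; chars: 3

; .codePointCount counts whole code points instead.
(println "code points:" (.codePointCount ^String s 0 (count s)))  ; code points: 2

(println (code-points s))                                         ; [97 128512]
```

When a string may contain such characters, counting or iterating by code point is the closer match to working with runes in Go.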