Strings and Runes in Racket

This program demonstrates string and character handling in Racket. Here’s the full source code:

#lang racket

(define s "สวัสดี")

; Print the length of the string (string-length counts characters, not bytes)
(printf "Len: ~a\n" (string-length s))

; Print each byte of the string's UTF-8 encoding in hexadecimal
(for ([b (string->bytes/utf-8 s)])
  (printf "~x " b))
(newline)

; Print the number of characters in the string
(printf "Char count: ~a\n" (string-length s))

; Iterate over each character together with its character index
; (an index into the string, not a byte offset)
(for ([c (in-string s)]
      [i (in-naturals)])
  (printf "U+~x '~a' starts at ~a\n"
          (char->integer c)
          c
          i))

; Define a function that reports when a character is #\t or the Thai letter ส
(define (examine-char c)
  (cond
    [(char=? c #\t) (printf "found tee\n")]
    [(char=? c #\ส) (printf "found so sua\n")]
    [else (void)]))

; Use the examine-char function
(printf "\nUsing examine-char\n")
(for ([c (in-string s)])
  (examine-char c))
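Because Racket's case compares datums with equal?, character literals can be matched directly, which reads a little more compactly than a chain of char=? tests. The sketch below is a hypothetical variant named examine-char/case (not part of the original program); it returns the message, or #f when nothing matches, rather than printing, which makes it easier to test in isolation.

```racket
#lang racket

;; Hypothetical variant of examine-char built on `case`.
;; `case` compares with equal?, so character literals work as datums.
;; Returns a message string, or #f when the character is not of interest.
(define (examine-char/case c)
  (case c
    [(#\t) "found tee"]
    [(#\ส) "found so sua"]
    [else #f]))

;; Print a line for every matching character in the Thai greeting.
(for ([c (in-string "สวัสดี")])
  (define msg (examine-char/case c))
  (when msg
    (displayln msg)))
```

Run on "สวัสดี", this prints "found so sua" twice, matching the original program's output.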

Let’s break down the code and explain its components:

  1. We define a string s containing the Thai word for “hello”.

  2. We print the length of the string using string-length. In Racket, this returns the number of characters (Unicode code points), not the number of bytes.

  3. To print the hexadecimal representation of each byte, we first convert the string to a byte string using string->bytes/utf-8, then iterate over each byte.

  4. We print the character count, which in Racket is the same as the string length.

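  5. We iterate over each character in the string, printing its Unicode code point in hexadecimal, the character itself, and its character index. The vowel marks ั and ี show up as separate entries because a Racket character is a single code point, not a user-perceived letter.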

  6. We define a function examine-char that checks whether a character is ‘t’ or ‘ส’ and prints a message accordingly.

  7. Finally, we use the examine-char function on each character in our string.
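Since string-length counts characters, getting a byte count takes an explicit encoding step. The minimal sketch below, using only functions from racket/base, contrasts the character count with the UTF-8 byte length, both by encoding the whole string and by asking string-utf-8-length, which computes the same number without building a byte string.

```racket
#lang racket

(define s "สวัสดี")

;; string-length counts characters (Unicode code points).
(define char-count (string-length s))

;; Encode to UTF-8, then measure the resulting byte string.
(define byte-count (bytes-length (string->bytes/utf-8 s)))

;; Same number, computed without allocating the byte string.
(define byte-count/direct (string-utf-8-length s))

;; Bytes needed to encode each individual character.
(define per-char (for/list ([c (in-string s)]) (char-utf-8-length c)))

(printf "characters: ~a\n" char-count)
(printf "UTF-8 bytes: ~a (direct: ~a)\n" byte-count byte-count/direct)
(printf "bytes per character: ~a\n" per-char)
```

For the Thai greeting this reports 6 characters and 18 UTF-8 bytes, with each character taking 3 bytes, which matches the 18 hex values printed by the original program.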

To run this program, save it as strings-and-chars.rkt and use the Racket interpreter:

$ racket strings-and-chars.rkt
Len: 6
e0 b8 aa e0 b8 a7 e0 b8 b1 e0 b8 aa e0 b8 94 e0 b8 b5 
Char count: 6
U+e2a 'ส' starts at 0
U+e27 'ว' starts at 1
U+e31 'ั' starts at 2
U+e2a 'ส' starts at 3
U+e14 'ด' starts at 4
U+e35 'ี' starts at 5

Using examine-char
found so sua
found so sua
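The “starts at” numbers above are character indices. If you instead want the byte offset at which each character's UTF-8 encoding begins (for example, to line up with the hex bytes on the second output line), you can accumulate char-utf-8-length while iterating. The helper name char-byte-offsets below is hypothetical; this is a sketch, not a library function.

```racket
#lang racket

(define s "สวัสดี")

;; Hypothetical helper: returns a list of (char . byte-offset) pairs,
;; where byte-offset is where that character starts in the UTF-8 encoding.
(define (char-byte-offsets str)
  (define-values (_end pairs)
    (for/fold ([offset 0] [acc '()])
              ([c (in-string str)])
      ;; Advance by this character's encoded size; record where it started.
      (values (+ offset (char-utf-8-length c))
              (cons (cons c offset) acc))))
  (reverse pairs))

(for ([p (in-list (char-byte-offsets s))])
  (printf "U+~x '~a' starts at byte ~a\n"
          (char->integer (car p))
          (car p)
          (cdr p)))
```

For “สวัสดี” the offsets come out as 0, 3, 6, 9, 12 and 15, since every character here occupies three bytes.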

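This example demonstrates how Racket handles Unicode strings and characters. Unlike languages such as Go, where a string is a sequence of bytes, Racket treats a string as a sequence of characters (Unicode code points), so length and indexing work in characters. When you need the encoded bytes, for example to count them or write them to a byte-oriented API, you convert explicitly with string->bytes/utf-8.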
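To illustrate the explicit conversion, the sketch below round-trips the greeting through UTF-8 and contrasts slicing by characters with slicing by bytes. substring always cuts on character boundaries, while subbytes can split a character; bytes->string/utf-8 then either raises an error or, given an error character (here #\?, an arbitrary choice), substitutes it for the invalid bytes.

```racket
#lang racket

(define s "สวัสดี")

;; Encode when a byte-level view is needed.
(define encoded (string->bytes/utf-8 s))

;; Decoding a complete encoding restores the original string.
(define decoded (bytes->string/utf-8 encoded))
(printf "round trip ok: ~a\n" (string=? s decoded))

;; substring slices by character index, so it never splits a character.
(printf "first two characters: ~a\n" (substring s 0 2))

;; The first 4 bytes hold one full character plus the first byte of the next.
(define cut (subbytes encoded 0 4))

;; With an error character, invalid bytes are replaced instead of raising.
(define partial (bytes->string/utf-8 cut #\?))
(printf "decoded partial bytes: ~a\n" partial)
```

Omitting the #\? argument makes the last decode raise exn:fail:contract, which is usually the safer default when the input is supposed to be valid UTF-8.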