Strings and Runes in OCaml

OCaml treats strings differently from Go. A Go string can be ranged over as runes, whereas an OCaml string is an immutable sequence of bytes and a char is a single byte. We can still demonstrate the same concepts using OCaml's string and character handling capabilities.


(* Define a string containing Thai characters *)
let s = "สวัสดี"

(* Print the length of the string in bytes *)
let () = Printf.printf "Len: %d\n" (String.length s)

(* Print the hex values of each byte *)
let () =
  String.iter (fun c -> Printf.printf "%x " (Char.code c)) s;
  print_newline ()

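(* Count the characters by decoding UTF-8 (String.length counts bytes).
   String.get_utf_8_uchar requires OCaml 4.14 or later. *)
let () =
  let rec count i n =
    if i >= String.length s then n
    else count (i + Uchar.utf_decode_length (String.get_utf_8_uchar s i)) (n + 1)
  in
  Printf.printf "Character count: %d\n" (count 0 0)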

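(* Iterate over the decoded characters and their byte offsets *)
let () =
  let rec loop i =
    if i < String.length s then begin
      let d = String.get_utf_8_uchar s i in
      let n = Uchar.utf_decode_length d in
      Printf.printf "U+%04X '%s' starts at %d\n"
        (Uchar.to_int (Uchar.utf_decode_uchar d)) (String.sub s i n) i;
      loop (i + n)
    end in
  loop 0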

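(* Define a function to examine a decoded character (a Uchar.t).
   An OCaml char literal holds a single byte, so we compare code points. *)
let examine_char u =
  match Uchar.to_int u with
  | 0x74 -> print_endline "found tee"
  | 0x0E2A -> print_endline "found so sua"
  | _ -> ()

(* Use the examine_char function on each decoded character *)
let () =
  print_endline "\nExamining characters:";
  let rec loop i =
    if i < String.length s then begin
      let d = String.get_utf_8_uchar s i in
      examine_char (Uchar.utf_decode_uchar d);
      loop (i + Uchar.utf_decode_length d)
    end in
  loop 0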

This OCaml code demonstrates concepts similar to those in the original Go example:

  1. We define a string s containing Thai characters.

  2. We print the length of the string in bytes using String.length.

  3. We iterate over each byte in the string and print its hexadecimal value.

  4. We count the number of characters in the string. This differs from String.length, which counts bytes: each Thai character here occupies three bytes in UTF-8, so the string has 6 characters but 18 bytes.

  5. We iterate over each character in the string, printing its Unicode code point and starting position.

  6. We define an examine_char function that checks for specific characters, similar to the examineRune function in the Go example.

  7. Finally, we apply the examine_char function to each character in the string.
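
The counting and iteration steps above can share a single decoding helper. The following is a minimal sketch, assuming OCaml 4.14 or later (for String.get_utf_8_uchar); fold_utf_8 is a hypothetical name chosen for illustration, not a standard library function.

```ocaml
(* Hypothetical helper: fold over the decoded characters of a UTF-8 string.
   [f] receives the accumulator, the byte offset, and the decoded Uchar.t.
   Invalid bytes decode to Uchar.rep (U+FFFD) and still advance the offset. *)
let fold_utf_8 f init s =
  let rec go acc i =
    if i >= String.length s then acc
    else
      let d = String.get_utf_8_uchar s i in
      go (f acc i (Uchar.utf_decode_uchar d)) (i + Uchar.utf_decode_length d)
  in
  go init 0

let s = "สวัสดี"

(* Number of decoded characters, as opposed to String.length s bytes *)
let char_count = fold_utf_8 (fun n _ _ -> n + 1) 0 s

(* Byte offset at which each character starts, in order *)
let offsets = List.rev (fold_utf_8 (fun acc i _ -> i :: acc) [] s)

let () =
  Printf.printf "bytes: %d, characters: %d\n" (String.length s) char_count;
  List.iter (Printf.printf "%d ") offsets;
  print_newline ()
```

Writing the traversal as a fold keeps the offset arithmetic in one place, so counting, printing, and examining characters become different functions passed to the same loop.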

To run this program, save it as strings_and_chars.ml and compile it with:

$ ocamlc str.cma strings_and_chars.ml -o strings_and_chars

Then run it with:

$ ./strings_and_chars

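This example shows how OCaml's handling of strings and characters differs from Go's strings and runes. An OCaml string is a sequence of bytes, and a char is a single byte, so a multi-byte character such as ส cannot be written as a char literal. Unicode scalar values are represented by Uchar.t, and since OCaml 4.14 the standard library can decode UTF-8 with String.get_utf_8_uchar. For anything beyond that, libraries such as Uutf (streaming codecs), Uunf (normalization), Uuseg (segmentation), or Camomile are needed.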
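
For basic encoding and validation, recent versions of the standard library may be enough. The sketch below assumes OCaml 4.14 or later (for String.is_valid_utf_8); of_code_points is a hypothetical helper name, not part of any library.

```ocaml
(* Hypothetical helper: encode a list of code points as a UTF-8 string.
   Uchar.of_int raises Invalid_argument if a value is not a Unicode
   scalar value (for example, a surrogate). *)
let of_code_points cps =
  let b = Buffer.create 16 in
  List.iter (fun cp -> Buffer.add_utf_8_uchar b (Uchar.of_int cp)) cps;
  Buffer.contents b

(* The six code points of the Thai greeting used in this example *)
let s = of_code_points [0x0E2A; 0x0E27; 0x0E31; 0x0E2A; 0x0E14; 0x0E35]

(* Cutting the string in the middle of a three-byte sequence
   leaves bytes that are no longer valid UTF-8. *)
let truncated = String.sub s 0 2

let () =
  Printf.printf "same bytes as the literal: %b\n" (s = "สวัสดี");
  Printf.printf "valid: %b\n" (String.is_valid_utf_8 s);
  Printf.printf "truncated valid: %b\n" (String.is_valid_utf_8 truncated)
```

Checking validity before slicing by byte offset is one way to catch the truncation case early, since String.sub itself works on bytes and will not complain.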