Strings and Runes in the R Programming Language

In R, a string is one element of a character vector rather than an indexable sequence of characters, and R has no direct equivalent of Go’s rune type. We can still demonstrate similar string manipulation techniques using R’s built-in functions and the stringi package for Unicode operations.

# Install and load the stringi package for Unicode operations
if (!require(stringi)) install.packages("stringi")
library(stringi)

# Define a string in Thai
s <- "สวัสดี"

# Print the number of bytes in the string
cat("Len:", nchar(s, type = "bytes"), "\n")

# Print the hex values of each byte
cat("Hex values:", paste(charToRaw(s), collapse = " "), "\n")

# Count the number of characters (equivalent to runes in this case)
cat("Character count:", nchar(s, type = "chars"), "\n")

# Iterate over each character (code point) and its zero-based character index
for (i in seq_len(nchar(s))) {
  char <- substr(s, i, i)
  # utf8ToInt() returns the Unicode code point of a single character
  cat(sprintf("U+%04X '%s' starts at %d\n",
              utf8ToInt(char), char, i - 1))
}

# Demonstrate passing a character to a function
examine_char <- function(char) {
  if (char == "t") {
    cat("found tee\n")
  } else if (char == "ส") {
    cat("found so sua\n")
  }
}

cat("\nUsing stri_split_boundaries\n")
# type = "character" splits on grapheme-cluster boundaries, so a combining
# vowel mark stays attached to the consonant before it
chars <- stri_split_boundaries(s, type = "character")[[1]]
for (i in seq_along(chars)) {
  char <- chars[i]
  # Print the first code point of the cluster
  cat(sprintf("U+%04X '%s' starts at %d\n",
              utf8ToInt(char)[1], char, i - 1))
  examine_char(char)
}

Let’s break down the R code and explain its components:

  1. We start by loading the stringi package, which provides advanced string manipulation functions, especially for Unicode strings.

  2. We define a string s containing Thai characters, just like in the original example.

  3. To get the length in bytes, we use nchar(s, type = "bytes").

  4. To print hex values of each byte, we convert the string to raw bytes using charToRaw() and then format it.

  5. To count characters (equivalent to runes in this context), we use nchar(s, type = "chars").

  6. We iterate over each character in the string using a for loop and substr(). Base R’s utf8ToInt() converts each character to its Unicode code point, and i - 1 is the zero-based character index rather than a byte offset.

  7. The examine_char() function demonstrates how to compare characters directly.

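  8. Finally, we use stri_split_boundaries() with type = "character" to split the string into grapheme clusters (user-perceived characters). This is the closest counterpart to the DecodeRuneInString loop in the original example, but it is not identical: DecodeRuneInString decodes one code point at a time, whereas a grapheme cluster keeps a combining mark such as ั attached to its base consonant, so this loop visits four items instead of six.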
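Go’s range loop reports the byte offset at which each rune starts, whereas the loops in this script report a character index. A minimal sketch of the byte-offset version, using only base R and assuming a UTF-8 string, might look like the following. The variable names (code_points, widths, byte_offsets) are illustrative and not part of the original example.

```r
# The same Thai string, written with \u escapes so it is UTF-8 in any locale.
s <- "\u0e2a\u0e27\u0e31\u0e2a\u0e14\u0e35"

# Decode the string into integer code points (the closest thing to Go runes).
code_points <- utf8ToInt(s)

# Turn each code point back into a one-character string.
chars <- intToUtf8(code_points, multiple = TRUE)

# UTF-8 width of each character in bytes, and the zero-based byte offset
# at which each character starts.
widths <- nchar(chars, type = "bytes")
byte_offsets <- cumsum(c(0L, head(widths, -1)))

for (i in seq_along(chars)) {
  cat(sprintf("U+%04X '%s' starts at %d\n",
              code_points[i], chars[i], byte_offsets[i]))
}
```

Each character in this string occupies three bytes in UTF-8, so the offsets advance in steps of three (0, 3, 6, 9, 12, 15) instead of one.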

When you run this R script, you’ll get output similar to the following:

Len: 18 
Hex values: e0 b8 aa e0 b8 a7 e0 b8 b1 e0 b8 aa e0 b8 94 e0 b8 b5 
Character count: 6 
U+0E2A 'ส' starts at 0
U+0E27 'ว' starts at 1
U+0E31 'ั' starts at 2
U+0E2A 'ส' starts at 3
U+0E14 'ด' starts at 4
U+0E35 'ี' starts at 5

Using stri_split_boundaries
U+0E2A 'ส' starts at 0
found so sua
U+0E27 'วั' starts at 1
U+0E2A 'ส' starts at 2
found so sua
U+0E14 'ดี' starts at 3

This R code demonstrates similar concepts to the original example, showing how to work with Unicode strings, examine individual characters, and handle multi-byte characters in R.
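
The difference between bytes, code points, and user-perceived characters can be made concrete by counting all three for the same string. The following is a sketch rather than a definitive recipe. It assumes stringi is installed for the last count and leaves that count as NA otherwise; the variable names are illustrative.

```r
# The same Thai string, written with \u escapes so it is UTF-8 in any locale.
s <- "\u0e2a\u0e27\u0e31\u0e2a\u0e14\u0e35"

n_bytes <- nchar(s, type = "bytes")        # UTF-8 bytes
n_code_points <- nchar(s, type = "chars")  # Unicode code points
cat("bytes:", n_bytes, "\n")
cat("code points:", n_code_points, "\n")

# Grapheme clusters need a Unicode-aware break iterator, which stringi provides.
n_clusters <- NA_integer_
if (requireNamespace("stringi", quietly = TRUE)) {
  clusters <- stringi::stri_split_boundaries(s, type = "character")[[1]]
  n_clusters <- length(clusters)
}
cat("grapheme clusters:", n_clusters, "\n")
```

For this string the script reports 18 bytes and 6 code points. The grapheme cluster count is lower than the code point count because the two vowel marks combine with the consonants that precede them.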