Strings and Runes in R Programming Language
In R, a string is a single element of a character vector rather than a sequence of individual characters, and there isn’t a direct equivalent to Go’s concept of runes. However, we can demonstrate similar string manipulation techniques using R’s built-in functions and the stringi package for Unicode operations.
# Install and load the stringi package for Unicode operations
if (!require(stringi)) install.packages("stringi")
library(stringi)
# Define a string in Thai
s <- "สวัสดี"
# Print the number of bytes in the string
cat("Len:", nchar(s, type = "bytes"), "\n")
# Print the hex values of each byte
cat("Hex values:", paste(charToRaw(s), collapse = " "), "\n")
# Count the number of characters (equivalent to runes in this case)
cat("Character count:", nchar(s, type = "chars"), "\n")
# Iterate over each character and its position
for (i in seq_len(nchar(s))) {
  char <- substr(s, i, i)
  # utf8ToInt() returns the Unicode code point of a single character
  cat(sprintf("U+%04X '%s' starts at %d\n",
              utf8ToInt(char), char, i - 1L))
}
# Demonstrate passing a character to a function
examine_char <- function(char) {
  if (char == "t") {
    cat("found tee\n")
  } else if (char == "ส") {
    cat("found so sua\n")
  }
}
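cat("\nUsing stri_split_boundaries\n")
# type = "character" splits on grapheme cluster boundaries, so combining
# marks stay attached to their base character
chars <- stri_split_boundaries(s, type = "character")[[1]]
for (i in seq_along(chars)) {
  char <- chars[i]
  # Print the first code point of each piece
  cat(sprintf("U+%04X '%s' starts at %d\n",
              utf8ToInt(char)[1], char, i - 1L))
  examine_char(char)
}
Let’s break down the R code and explain its components: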
We start by loading the stringi package, which provides advanced string manipulation functions, especially for Unicode strings.
We define a string s containing Thai characters, just like in the original example.
To get the length in bytes, we use nchar(s, type = "bytes").
To print the hex value of each byte, we convert the string to raw bytes using charToRaw() and then format the result.
To count characters (equivalent to runes in this context), we use nchar(s, type = "chars").
We iterate over each character in the string using a for loop and substr(), printing each character’s Unicode code point in U+XXXX form.
The examine_char() function demonstrates how to compare characters directly.
Finally, we use stri_split_boundaries() to split the string into user-perceived characters (grapheme clusters). This plays the role of the DecodeRuneInString loop in the original example, but combining marks stay attached to their base character, so the pieces are not always single code points.
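R has no rune type, but a code point can be held as a plain integer, which is probably the nearest analogue. The following is a minimal sketch using only base R (utf8ToInt() and intToUtf8()); the variable names are illustrative and not part of the script above, and the string is written with \u escapes as an assumption that this avoids file-encoding problems.

```r
# "สวัสดี" written with \u escapes so the string is marked as UTF-8
s <- "\u0e2a\u0e27\u0e31\u0e2a\u0e14\u0e35"

# Decode the string into integer code points (the nearest analogue of runes)
code_points <- utf8ToInt(s)
print(sprintf("U+%04X", code_points))

# Compare every code point against a known value, like comparing a rune in Go
so_sua <- utf8ToInt("\u0e2a")
is_so_sua <- code_points == so_sua
print(which(is_so_sua))  # positions where so sua occurs

# Encode the integers back into a string; the round trip is lossless
round_trip <- intToUtf8(code_points)
stopifnot(round_trip == s)
```

Working on the integer vector avoids repeated substr() calls and makes numeric comparisons (ranges, equality) straightforward.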
When you run the full script above, you’ll get output similar to the following:
Len: 18
Hex values: e0 b8 aa e0 b8 a7 e0 b8 b1 e0 b8 aa e0 b8 94 e0 b8 b5
Character count: 6
U+0E2A 'ส' starts at 0
U+0E27 'ว' starts at 1
U+0E31 'ั' starts at 2
U+0E2A 'ส' starts at 3
U+0E14 'ด' starts at 4
U+0E35 'ี' starts at 5
Using stri_split_boundaries
U+0E2A 'ส' starts at 0
found so sua
U+0E27 'วั' starts at 1
U+0E2A 'ส' starts at 2
found so sua
U+0E14 'ดี' starts at 3
Note that the second loop yields four pieces rather than six, since the combining vowel marks ั and ี are grouped with the preceding consonant.
This R code demonstrates similar concepts to the original example, showing how to work with Unicode strings, examine individual characters, and handle multi-byte characters in R.
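The positions printed above are character indices, whereas Go’s range loop reports the byte offset at which each rune starts. A minimal sketch of how those byte offsets could be computed in base R is shown below; it assumes UTF-8 encoding, and names such as widths and offsets are illustrative.

```r
# "สวัสดี" written with \u escapes so the string is marked as UTF-8
s <- "\u0e2a\u0e27\u0e31\u0e2a\u0e14\u0e35"

# Split into single code points; strsplit() with "" works per character
chars <- strsplit(s, "")[[1]]

# Width of each character in bytes under UTF-8
widths <- nchar(chars, type = "bytes")

# Each character starts where the previous ones end: a shifted cumulative sum
offsets <- cumsum(c(0L, head(widths, -1)))

for (i in seq_along(chars)) {
  cat(sprintf("U+%04X '%s' starts at byte %d\n",
              utf8ToInt(chars[i]), chars[i], offsets[i]))
}
```

Because every character in this string occupies three bytes, the offsets advance in steps of three; a string mixing ASCII and Thai would show uneven steps.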