Strings and Runes in Wolfram Language

A Wolfram Language string is a sequence of characters. The language and its standard library treat strings as a fundamental data type with built-in support for Unicode.

```wolfram
(* s is a String assigned a literal value
   representing the word "hello" in the Thai language. *)
s = "สวัสดี"

(* StringLength gives the number of characters in the string *)
Print["Len: ", StringLength[s]]

(* ToCharacterCode with the "UTF-8" encoding gives the bytes of the string;
   IntegerString formats each byte as hexadecimal *)
Print[StringRiffle[IntegerString[ToCharacterCode[s, "UTF-8"], 16], " "]]

(* StringLength gives us the number of characters (equivalent to runes in Go) *)
Print["Character count: ", StringLength[s]]

(* We can use MapIndexed to iterate over each character with its position.
   Positions are 1-based, so we subtract 1 to print 0-based offsets. *)
MapIndexed[
  (Print[
    StringTemplate["U+`` '``' starts at ``"][
      ToUpperCase[IntegerString[ToCharacterCode[#1][[1]], 16, 4]],
      #1,
      First[#2] - 1
    ]
  ]) &,
  Characters[s]
];

(* This demonstrates passing a character to a function.
   It is defined before the loop below, which calls it. *)
examineCharacter[char_String] :=
  Which[
    char === "t", Print["found tee"],
    char === "ส", Print["found so sua"]
  ]

(* We can achieve a similar iteration by repeatedly taking the first
   character with StringTake and removing it with StringDrop *)
Print["\nUsing StringTake and StringDrop"]
rest = s;
i = 0;
While[rest =!= "",
  char = StringTake[rest, 1];
  Print[
    StringTemplate["U+`` '``' starts at ``"][
      ToUpperCase[IntegerString[ToCharacterCode[char][[1]], 16, 4]],
      char,
      i
    ]
  ];
  examineCharacter[char];
  rest = StringDrop[rest, 1];
  i++
]
```

When you run this code, you’ll see output similar to the following:

Len: 6
e0 b8 aa e0 b8 a7 e0 b8 b1 e0 b8 aa e0 b8 94 e0 b8 b5
Character count: 6
U+0E2A 'ส' starts at 0
U+0E27 'ว' starts at 1
U+0E31 'ั' starts at 2
U+0E2A 'ส' starts at 3
U+0E14 'ด' starts at 4
U+0E35 'ี' starts at 5

Using StringTake and StringDrop
U+0E2A 'ส' starts at 0
found so sua
U+0E27 'ว' starts at 1
U+0E31 'ั' starts at 2
U+0E2A 'ส' starts at 3
found so sua
U+0E14 'ด' starts at 4
U+0E35 'ี' starts at 5

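In Wolfram Language, strings are sequences of Unicode characters. Unlike some other languages, there’s no separate concept of “runes”: each character in a string is a single Unicode code point. A combining mark such as 'ั' therefore counts as a character of its own, which is why the Thai word above has six characters.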

The StringLength function gives the number of characters in a string, which is equivalent to the “rune count” in the original Go example.
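To make the difference between a character count and a byte count concrete, here is a minimal sketch that compares StringLength with the length of the string's UTF-8 encoding. The variable names `charCount` and `byteCount` are illustrative and do not appear in the example above.

```wolfram
s = "สวัสดี";

(* Number of characters (Unicode code points) *)
charCount = StringLength[s];

(* Number of bytes when the string is encoded as UTF-8;
   this is the quantity that len(s) reports in Go *)
byteCount = Length[ToCharacterCode[s, "UTF-8"]];

Print[{charCount, byteCount}]  (* {6, 18} *)
```

Each Thai character in this word takes three bytes in UTF-8, which accounts for the factor of three.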

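We use ToCharacterCode to get the numeric codes of characters, and IntegerString to format these as hexadecimal strings. By default ToCharacterCode returns Unicode code points; given an encoding such as "UTF-8" as its second argument, it returns the encoded bytes instead.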
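The conversion can also be run in both directions. The following sketch, with illustrative variable names `codes` and `hex`, formats each code point in the usual U+XXXX style and then rebuilds the string with FromCharacterCode.

```wolfram
s = "สวัสดี";

(* Code points as integers *)
codes = ToCharacterCode[s];

(* Four-digit uppercase hexadecimal, prefixed with "U+" *)
hex = Map["U+" <> ToUpperCase[IntegerString[#, 16, 4]] &, codes];
Print[hex];  (* {U+0E2A, U+0E27, U+0E31, U+0E2A, U+0E14, U+0E35} *)

(* FromCharacterCode reverses the conversion *)
Print[FromCharacterCode[codes] === s]  (* True *)
```

IntegerString produces lowercase digits, so ToUpperCase is needed to get the conventional uppercase form.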

The Characters function splits a string into a list of individual characters, which we can then process with functions like MapIndexed.
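MapIndexed is also useful when you want to collect results rather than print them. As a small sketch (the name `pairs` is illustrative), this builds a list of position and character pairs:

```wolfram
s = "สวัสดี";

(* MapIndexed passes the position as a one-element list, hence First[#2].
   Positions are 1-based. *)
pairs = MapIndexed[{First[#2], #1} &, Characters[s]];

Print[pairs]
```

Because positions in Wolfram Language start at 1, the main example subtracts 1 to print 0-based offsets.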

The examineCharacter function demonstrates how to compare characters directly. In Wolfram Language, a character is simply a string of length one, so you can compare it against a string literal (like “ส”) without needing a special rune type.
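The same kind of comparison can be written with Switch, which matches a value against a series of literals or patterns. The sketch below uses a hypothetical helper, `describeCharacter`, that returns a description instead of printing it; the "no match" fallback is an assumption added for illustration.

```wolfram
(* describeCharacter is a hypothetical helper, not part of the example above *)
describeCharacter[char_String] := Switch[char,
  "t", "found tee",
  "ส", "found so sua",
  _, "no match"
]

s = "สวัสดี";
Print[describeCharacter /@ Characters[s]]

(* Count how many characters equal the literal "ส" *)
Print[Count[Characters[s], "ส"]]  (* 2 *)
```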

This example demonstrates various ways to work with strings and Unicode characters in Wolfram Language, covering similar concepts to the original Go code but using idiomatic Wolfram Language constructs and functions.
