Strings and Characters in Java

Our first program will demonstrate working with strings and characters in Java. Here’s the full source code:

public class StringsAndChars {
    public static void main(String[] args) {
        // s is a String assigned a literal value
        // representing the word "hello" in the Thai language.
        // A Java String is a sequence of UTF-16 code units (char values).
        final String s = "สวัสดี";

        // length() returns the number of UTF-16 code units (chars) in s,
        // which is not always the number of Unicode characters.
        System.out.println("Length: " + s.length());

        // This loop prints the hex value of each UTF-16 code unit
        // (char) in s.
        for (int i = 0; i < s.length(); i++) {
            System.out.printf("%x ", (int) s.charAt(i));
        }
        System.out.println();

        // To count the Unicode code points in a string, we can use
        // codePointCount(). length() counts UTF-16 code units instead,
        // and some characters (like many emoji) take two code units,
        // so the two results can differ.
        System.out.println("Character count: " + s.codePointCount(0, s.length()));

        // codePoints() yields the code points of s in order. We track the
        // char index at which each code point starts ourselves.
        int index = 0;
        for (int codePoint : s.codePoints().toArray()) {
            System.out.printf("U+%X starts at %d%n", codePoint, index);
            index += Character.charCount(codePoint);
        }

        // The same iteration written by hand: codePointAt() decodes the
        // code point starting at index i, and Character.charCount()
        // tells us how many chars it occupies.
        System.out.println("\nUsing codePointAt()");
        for (int i = 0; i < s.length();) {
            int codePoint = s.codePointAt(i);
            System.out.printf("U+%X starts at %d%n", codePoint, i);
            i += Character.charCount(codePoint);
            examineCodePoint(codePoint);
        }
    }

    private static void examineCodePoint(int codePoint) {
        // A char literal widens to int, so a code point can be
        // compared with it directly.
        if (codePoint == 't') {
            System.out.println("found tee");
        } else if (codePoint == 'ส') {
            System.out.println("found so sua");
        }
    }
}

To run the program, compile it with javac and run it with java:

$ javac StringsAndChars.java
$ java StringsAndChars
Length: 6
e2a e27 e31 e2a e14 e35 
Character count: 6
U+E2A starts at 0
U+E27 starts at 1
U+E31 starts at 2
U+E2A starts at 3
U+E14 starts at 4
U+E35 starts at 5

Using codePointAt()
U+E2A starts at 0
found so sua
U+E27 starts at 1
U+E31 starts at 2
U+E2A starts at 3
found so sua
U+E14 starts at 4
U+E35 starts at 5

This Java program demonstrates key concepts about strings and characters:

  1. In Java, strings are sequences of UTF-16 code units.
  2. The length() method returns the number of UTF-16 code units in the string.
  3. charAt() returns a single UTF-16 code unit. For a character outside the Basic Multilingual Plane it returns only one half of a surrogate pair, not the whole character.
  4. To properly handle all Unicode characters, use methods like codePointCount(), codePoints(), and codePointAt().
  5. The Character class provides utility methods for working with code points, such as charCount(), which reports whether a code point occupies one or two chars.
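
Every character in the Thai example lies in the Basic Multilingual Plane, so length() and codePointCount() both report 6. The following minimal sketch shows where they diverge. The class name CodeUnitsVsCodePoints and its helper methods are illustrative, not part of the program above, and the test string (the letter "a" followed by U+1F600, written as explicit \u escapes) is an assumption chosen for the demonstration.

```java
class CodeUnitsVsCodePoints {
    // Number of UTF-16 code units, which is what length() reports.
    static int codeUnits(String s) {
        return s.length();
    }

    // Number of Unicode code points in the whole string.
    static int codePoints(String s) {
        return s.codePointCount(0, s.length());
    }

    public static void main(String[] args) {
        // "a" followed by U+1F600, spelled out as its surrogate pair.
        String s = "a\uD83D\uDE00";

        System.out.println("length(): " + codeUnits(s));           // 3
        System.out.println("codePointCount(): " + codePoints(s));  // 2

        // charAt(1) sees only the high surrogate of the pair...
        System.out.printf("charAt(1): %x%n", (int) s.charAt(1));   // d83d
        // ...while codePointAt(1) decodes the full code point.
        System.out.printf("codePointAt(1): U+%X%n", s.codePointAt(1)); // U+1F600
    }
}
```

The string has two code points but three chars, which is why index arithmetic based on charAt() can land in the middle of a character.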

Java’s handling of strings differs from languages that treat a string as a sequence of UTF-8 bytes: the String API is defined in terms of UTF-16 code units. Characters outside the Basic Multilingual Plane are represented by a surrogate pair, that is, two chars, so length() and charAt() can give surprising results for them.
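
To make the surrogate-pair representation concrete, here is a small sketch that encodes a code point into chars and decodes it back using methods of the Character class. The class name SurrogatePairs and the helpers encode() and decode() are hypothetical names introduced for illustration. U+1F600 is an arbitrary supplementary code point picked as the example.

```java
class SurrogatePairs {
    // Encode a code point as UTF-16: one char for a BMP code point,
    // two chars (a surrogate pair) for a supplementary one.
    static char[] encode(int codePoint) {
        return Character.toChars(codePoint);
    }

    // Combine a high and a low surrogate back into a code point.
    static int decode(char high, char low) {
        if (!Character.isSurrogatePair(high, low)) {
            throw new IllegalArgumentException("not a surrogate pair");
        }
        return Character.toCodePoint(high, low);
    }

    public static void main(String[] args) {
        int codePoint = 0x1F600;
        char[] units = encode(codePoint);

        System.out.println("supplementary: "
                + Character.isSupplementaryCodePoint(codePoint));
        System.out.printf("code units: %x %x%n", (int) units[0], (int) units[1]);
        System.out.printf("decoded: U+%X%n", decode(units[0], units[1]));

        // A BMP code point such as U+0E2A (Thai so sua) fits in one char.
        System.out.println("BMP code units: " + encode(0x0E2A).length);
    }
}
```

Code that walks a string with codePointAt() and Character.charCount(), as the main program does, performs this decoding implicitly at each step.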