This Java program demonstrates how strings and characters work. Here’s the full source code:
To run the program, compile and execute it using javac and java:
This example demonstrates several important concepts:
In Java, strings are sequences of UTF-16 code units. Characters in the Basic Multilingual Plane (BMP), like those in the Thai example, are represented by a single code unit, while characters outside it require two code units (a surrogate pair).
The length() method returns the number of UTF-16 code units, not necessarily the number of visible characters.
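To count Unicode code points instead, we use codePointCount(0, s.length()) for a string s. A code point is still not always a visible character: a base letter followed by a combining mark counts as two code points.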
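The difference between the two counts can be sketched as follows. This is a minimal illustration, not the original program: the class name `LengthDemo`, its helper methods, and the sample strings are assumptions chosen for the example.

```java
public class LengthDemo {
    // Number of UTF-16 code units, which is what length() reports.
    static int codeUnits(String s) {
        return s.length();
    }

    // Number of Unicode code points across the whole string.
    static int codePoints(String s) {
        return s.codePointCount(0, s.length());
    }

    public static void main(String[] args) {
        // Assumed sample text: a Thai greeting, written with Unicode escapes
        // so the file compiles regardless of source encoding.
        String thai = "\u0E2A\u0E27\u0E31\u0E2A\u0E14\u0E35";
        // U+1F600 lies outside the BMP, so it is stored as a surrogate pair.
        String emoji = "\uD83D\uDE00";

        System.out.println(codeUnits(thai) + " " + codePoints(thai));   // 6 6
        System.out.println(codeUnits(emoji) + " " + codePoints(emoji)); // 2 1
    }
}
```

For BMP-only text the two numbers agree; they diverge only once a supplementary character appears.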
We can iterate over the code points in a string using the codePoints() method, which returns an IntStream of Unicode code points.
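As a rough sketch of that iteration, the snippet below formats each code point in the conventional `U+XXXX` notation. The class name `CodePointsDemo` and the `describe` helper are hypothetical names introduced for this example.

```java
import java.util.List;
import java.util.stream.Collectors;

public class CodePointsDemo {
    // Walks the string one code point at a time (not one char at a time)
    // and renders each code point as U+XXXX.
    static List<String> describe(String s) {
        return s.codePoints()
                .mapToObj(cp -> String.format("U+%04X", cp))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // 'a' followed by U+1F600, which occupies two chars in the string.
        String s = "a\uD83D\uDE00";
        System.out.println(describe(s)); // [U+0061, U+1F600]
    }
}
```

The surrogate pair is delivered as one `int` value, which is why the stream is an IntStream rather than a stream of `char`.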
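Java chars are 16-bit UTF-16 code units. A character outside the Basic Multilingual Plane (BMP) does not fit in a single char; it is stored as a surrogate pair of two chars, so code that handles such characters should work with int code points.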
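To make the surrogate-pair mechanics concrete, here is a small sketch built on the standard `Character` helpers. The class name `SurrogateDemo` and the choice of U+1F600 as the sample code point are assumptions for illustration.

```java
public class SurrogateDemo {
    // Splits a code point into the char(s) that encode it in UTF-16.
    static char[] encode(int codePoint) {
        return Character.toChars(codePoint);
    }

    public static void main(String[] args) {
        int cp = 0x1F600; // outside the BMP
        char[] units = encode(cp);

        System.out.println(units.length); // 2
        System.out.println(String.format("%04X %04X", (int) units[0], (int) units[1])); // D83D DE00
        System.out.println(Character.isHighSurrogate(units[0])); // true
        System.out.println(Character.isLowSurrogate(units[1]));  // true

        // Recombining the pair yields the original code point.
        System.out.println(Character.toCodePoint(units[0], units[1]) == cp); // true
    }
}
```

A BMP code point such as `'a'` passes through `Character.toChars` as a single-element array, so the same helper covers both cases.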
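Calling getBytes(StandardCharsets.UTF_8) returns the string's UTF-8 encoded bytes, similar to the raw bytes Go exposes by treating strings as byte slices. Unlike in Go, these bytes are not the string's in-memory form; Java produces them by encoding the string on each call.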
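A hedged sketch of dumping those bytes in hexadecimal follows. The class name `Utf8BytesDemo`, the `hex` helper, and the sample Thai text are illustrative assumptions, not part of the original program.

```java
import java.nio.charset.StandardCharsets;

public class Utf8BytesDemo {
    // Encodes the string as UTF-8 and renders each byte as two hex digits.
    static String hex(String s) {
        byte[] bytes = s.getBytes(StandardCharsets.UTF_8);
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            // Mask to 0..255 because Java bytes are signed.
            sb.append(String.format("%02x ", b & 0xFF));
        }
        return sb.toString().trim();
    }

    public static void main(String[] args) {
        // U+0E2A is a BMP character: one char in Java, three bytes in UTF-8.
        System.out.println(hex("\u0E2A")); // e0 b8 aa

        // Assumed sample text: six Thai code points, three bytes each.
        String thai = "\u0E2A\u0E27\u0E31\u0E2A\u0E14\u0E35";
        System.out.println(thai.getBytes(StandardCharsets.UTF_8).length); // 18
    }
}
```

Passing the charset explicitly avoids depending on the platform default encoding, which the no-argument `getBytes()` would use.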
This Java code provides similar functionality to the Go example, demonstrating how to work with strings and characters in a Unicode-aware manner.