Our first program will demonstrate working with strings and characters in Java. Here’s the full source code:
This program demonstrates several concepts related to strings and characters in Java:
Java strings are sequences of UTF-16 code units.
We can encode a string to UTF-8 and get the resulting bytes using getBytes(StandardCharsets.UTF_8).
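The length() method returns the number of UTF-16 code units. This differs from the number of Unicode code points whenever the string contains characters outside the Basic Multilingual Plane, because each of those takes two code units (a surrogate pair).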
We can count Unicode code points with codePointCount(beginIndex, endIndex) and iterate over them with codePoints(), which returns an IntStream.
Java uses the int type to represent Unicode code points, similar to Go’s rune type.
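The points above can be seen in a minimal sketch. It is not the program from this page: the class name CodePointSketch, the helper methods, and the sample string (the letter "a" followed by U+1F600, a character outside the Basic Multilingual Plane) are assumptions chosen so that the three counts differ.

```java
import java.nio.charset.StandardCharsets;

public class CodePointSketch {
    // Number of bytes the string occupies once encoded as UTF-8.
    static int utf8Length(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    // Number of Unicode code points; a surrogate pair counts once.
    static int codePointLength(String s) {
        return s.codePointCount(0, s.length());
    }

    public static void main(String[] args) {
        // Hypothetical sample: "a" plus U+1F600 written as a surrogate pair.
        String s = "a\uD83D\uDE00";

        System.out.println("UTF-16 code units: " + s.length());
        System.out.println("Code points: " + codePointLength(s));
        System.out.println("UTF-8 bytes: " + utf8Length(s));

        // codePoints() yields each code point as an int.
        s.codePoints().forEach(cp -> System.out.printf("U+%04X%n", cp));
    }
}
```

For this sample the sketch reports 3 code units, 2 code points, and 5 UTF-8 bytes, then lists U+0061 and U+1F600.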
To run this program:
This output demonstrates that the Thai string “สวัสดี” consists of 6 Unicode code points, each of which is encoded as 3 bytes in UTF-8, for 18 bytes in total.
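To check these numbers independently, here is a small sketch that walks the same Thai string and prints the UTF-8 byte offset at which each code point starts. The class name ThaiGreetingSketch and the helper utf8Width are hypothetical, and the string is written with Unicode escapes so the file compiles regardless of source encoding.

```java
import java.nio.charset.StandardCharsets;

public class ThaiGreetingSketch {
    // UTF-8 byte count of a single code point.
    static int utf8Width(int codePoint) {
        return new String(Character.toChars(codePoint))
                .getBytes(StandardCharsets.UTF_8).length;
    }

    public static void main(String[] args) {
        // The six code points of the Thai greeting discussed above.
        String s = "\u0E2A\u0E27\u0E31\u0E2A\u0E14\u0E35";

        int byteOffset = 0;
        for (int i = 0; i < s.length(); ) {
            int cp = s.codePointAt(i);
            System.out.printf("U+%04X starts at byte %d%n", cp, byteOffset);
            byteOffset += utf8Width(cp);
            // Advance by 1 or 2 chars depending on the code point.
            i += Character.charCount(cp);
        }
        System.out.println("Total UTF-8 bytes: " + byteOffset);
    }
}
```

Because every code point in this string lies in the Basic Multilingual Plane, length() and the code point count are both 6 here; only the UTF-8 byte count (18) differs.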