Title here
Summary here
Our Java program will demonstrate working with strings and characters. Here’s the full source code:
import java.nio.charset.StandardCharsets;

public class StringsAndChars {
    public static void main(String[] args) {
        // s is a String assigned a literal value
        // representing the word "hello" in the Thai language.
        // A Java String is a sequence of UTF-16 code units (char values).
        final String s = "สวัสดี";
        // length() returns the number of UTF-16 code units, not bytes.
        System.out.println("Len: " + s.length());
        // This loop prints the hex values of all
        // the bytes that make up the UTF-8 representation of s.
        byte[] utf8Bytes = s.getBytes(StandardCharsets.UTF_8);
        for (byte b : utf8Bytes) {
            // %x formats a negative byte as its unsigned value (e0, not a negative number).
            System.out.printf("%x ", b);
        }
        System.out.println();
        // To count the Unicode code points in a string, use codePointCount().
        // Here it equals length(), because every Thai character lies in the
        // Basic Multilingual Plane and needs no surrogate pair. The count may
        // still be surprising: U+0E31 and U+0E35 are combining vowel marks, so
        // a reader sees four characters where Java counts six code points.
        System.out.println("Character count: " + s.codePointCount(0, s.length()));
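        // codePoints() returns an IntStream of code points, decoding any
        // surrogate pairs. It does not report offsets, so we track the UTF-16
        // offset of each code point ourselves. (Using s.indexOf() here would
        // be wrong: it finds the first occurrence, so the second U+0E2A
        // would be reported at offset 0 instead of 3.)
        int offset = 0;
        for (int codePoint : s.codePoints().toArray()) {
            System.out.printf("U+%04X starts at %d%n", codePoint, offset);
            offset += Character.charCount(codePoint);
        }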
        // We can achieve the same iteration by using the
        // Character.charCount and String.codePointAt methods explicitly.
        System.out.println("\nUsing codePointAt");
        for (int i = 0; i < s.length(); ) {
            int codePoint = s.codePointAt(i);
            System.out.printf("U+%04X starts at %d%n", codePoint, i);
            examineCodePoint(codePoint);
            i += Character.charCount(codePoint);
        }
    }

    private static void examineCodePoint(int codePoint) {
        // Values enclosed in single quotes are char literals. Comparing an int
        // code point with a char literal works directly because the char is
        // widened to int.
        if (codePoint == 't') {
            System.out.println("found tee");
        } else if (codePoint == 'ส') {
            System.out.println("found so sua");
        }
    }
}
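The program prints length() and the UTF-8 bytes separately. The sketch below puts three common "lengths" of the same string side by side. The class EncodedLengths and its methods are hypothetical names chosen for illustration, and the Thai word is written with \u escapes so the file compiles whatever source encoding javac assumes.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper class: compares different "lengths" of one string.
class EncodedLengths {
    // The Thai word "hello" (สวัสดี), written with Unicode escapes so the file
    // compiles correctly regardless of the compiler's source encoding.
    static final String THAI_HELLO = "\u0E2A\u0E27\u0E31\u0E2A\u0E14\u0E35";

    // Number of UTF-16 code units (char values); this is what length() reports.
    static int utf16Units(String s) {
        return s.length();
    }

    // Number of bytes needed to encode the string as UTF-8.
    static int utf8Bytes(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    // Number of bytes needed as UTF-16 big-endian, without a byte-order mark.
    static int utf16Bytes(String s) {
        return s.getBytes(StandardCharsets.UTF_16BE).length;
    }

    public static void main(String[] args) {
        String s = THAI_HELLO;
        System.out.println("length():     " + utf16Units(s)); // 6
        System.out.println("UTF-8 bytes:  " + utf8Bytes(s));  // 18
        System.out.println("UTF-16 bytes: " + utf16Bytes(s)); // 12

        // Decoding the UTF-8 bytes gives back an equal string.
        byte[] bytes = s.getBytes(StandardCharsets.UTF_8);
        String decoded = new String(bytes, StandardCharsets.UTF_8);
        System.out.println("round trip equal: " + decoded.equals(s)); // true
    }
}
```

StandardCharsets.UTF_16BE is used rather than UTF_16 because the UTF_16 encoder prepends a two-byte byte-order mark, which would inflate the count to 14.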
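Because the source contains Thai character literals, javac must read it as UTF-8. That is the default from JDK 18 onward; on older JDKs, pass -encoding UTF-8 to javac. To run the program, compile and execute it using the javac and java commands: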
$ javac StringsAndChars.java
$ java StringsAndChars
Len: 6
e0 b8 aa e0 b8 a7 e0 b8 b1 e0 b8 aa e0 b8 94 e0 b8 b5
Character count: 6
U+0E2A starts at 0
U+0E27 starts at 1
U+0E31 starts at 2
U+0E2A starts at 3
U+0E14 starts at 4
U+0E35 starts at 5
Using codePointAt
U+0E2A starts at 0
found so sua
U+0E27 starts at 1
U+0E31 starts at 2
U+0E2A starts at 3
found so sua
U+0E14 starts at 4
U+0E35 starts at 5
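The output lists six code points, yet U+0E31 and U+0E35 are combining vowel marks that attach to the preceding consonant, so a Thai reader sees four characters. One way to count such user-perceived characters (grapheme clusters) is the \X construct, which java.util.regex has supported since Java 9. The following is a sketch; GraphemeCount and countGraphemes are hypothetical names.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: counts user-perceived characters (grapheme clusters).
class GraphemeCount {
    // \X matches one grapheme cluster: a base character plus any marks attached to it.
    private static final Pattern GRAPHEME = Pattern.compile("\\X");

    static int countGraphemes(String s) {
        Matcher m = GRAPHEME.matcher(s);
        int count = 0;
        while (m.find()) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        // สวัสดี written with Unicode escapes.
        String s = "\u0E2A\u0E27\u0E31\u0E2A\u0E14\u0E35";
        System.out.println("code points: " + s.codePointCount(0, s.length())); // 6
        System.out.println("graphemes:   " + countGraphemes(s));               // 4
    }
}
```

The four clusters are ส, วั, ส and ดี: each combining vowel mark joins the consonant before it.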
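This Java program demonstrates several aspects of working with strings and characters: measuring a string's length in UTF-16 code units with length(), encoding it to UTF-8 bytes with getBytes(StandardCharsets.UTF_8), counting code points with codePointCount(), iterating over code points with codePoints() or with codePointAt() and Character.charCount(), and comparing code points with character literals.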
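Note that Java's String API represents text as a sequence of UTF-16 code units (char values), whereas some other languages, such as Go, store strings as UTF-8 bytes. The difference matters most for characters outside the Basic Multilingual Plane (code points above U+FFFF), such as many emoji: each one occupies two char values, a surrogate pair, so length() counts it twice while codePointCount() counts it once.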
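Characters outside the Basic Multilingual Plane are where UTF-16 shows through. The sketch below uses U+1F600 (GRINNING FACE) purely as an example; the class BeyondBmp, the constant GRINNING_FACE and the method codePointOffsets are hypothetical names. It reuses the codePointAt / Character.charCount loop from the program, and it shows that matching such a code point in a check like examineCodePoint needs an int constant, because a char literal cannot hold a value above U+FFFF.

```java
// Hypothetical example: a string containing a character outside the BMP.
class BeyondBmp {
    // U+1F600 needs two UTF-16 code units (a surrogate pair) and cannot be
    // written as a char literal, so it is kept as an int constant.
    static final int GRINNING_FACE = 0x1F600;

    // "a", then U+1F600 written as its surrogate pair, then "b".
    static final String TEXT = "a\uD83D\uDE00b";

    // Returns the UTF-16 offset at which each code point starts.
    static int[] codePointOffsets(String s) {
        int[] offsets = new int[s.codePointCount(0, s.length())];
        int n = 0;
        for (int i = 0; i < s.length(); ) {
            offsets[n++] = i;
            i += Character.charCount(s.codePointAt(i));
        }
        return offsets;
    }

    public static void main(String[] args) {
        System.out.println("length():         " + TEXT.length());                          // 4
        System.out.println("codePointCount(): " + TEXT.codePointCount(0, TEXT.length()));   // 3
        System.out.println("high surrogate?   " + Character.isHighSurrogate(TEXT.charAt(1))); // true

        for (int offset : codePointOffsets(TEXT)) {
            int cp = TEXT.codePointAt(offset);
            // An int comparison matches the supplementary code point.
            String note = (cp == GRINNING_FACE) ? " (grinning face)" : "";
            System.out.printf("U+%04X starts at %d%s%n", cp, offset, note);
        }
    }
}
```

The offsets come out as 0, 1 and 3: the emoji starts at offset 1 and occupies two char values, so "b" starts at 3 rather than 2.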