Our first program will demonstrate working with strings and characters in Java. Here’s the full source code:
public class StringsAndChars {
    public static void main(String[] args) {
        // s is a String assigned a literal value
        // representing the word "hello" in the Thai language.
        // Java strings are sequences of UTF-16 code units.
        final String s = "สวัสดี";
        // length() returns the number of UTF-16 code units (char values).
        System.out.println("Length: " + s.length());
        // This loop prints the hex value of every
        // UTF-16 code unit in s.
        for (int i = 0; i < s.length(); i++) {
            System.out.printf("%x ", (int) s.charAt(i));
        }
        System.out.println();
        // To count Unicode code points rather than code units, use
        // codePointCount(). Characters outside the Basic Multilingual
        // Plane (like most emoji) occupy two UTF-16 code units, so
        // codePointCount() can be smaller than length(). Every Thai
        // character in s is in the BMP, so both counts are 6 here.
        System.out.println("Character count: " + s.codePointCount(0, s.length()));
        // codePoints() returns an IntStream of the code points in s;
        // a for-each loop over the resulting array visits each one.
        // Character.charCount() gives the number of code units a code
        // point occupies, which lets us track its starting index.
        int index = 0;
        for (int codePoint : s.codePoints().toArray()) {
            System.out.printf("U+%X starts at %d%n", codePoint, index);
            index += Character.charCount(codePoint);
        }
        // The same walk can be done by hand with codePointAt(),
        // advancing the index by Character.charCount() each time.
        System.out.println("\nUsing codePointAt()");
        for (int i = 0; i < s.length();) {
            int codePoint = s.codePointAt(i);
            System.out.printf("U+%X starts at %d%n", codePoint, i);
            i += Character.charCount(codePoint);
            examineCodePoint(codePoint);
        }
    }
    private static void examineCodePoint(int codePoint) {
        // A char literal is promoted to int, so it can be
        // compared to a code point value directly.
        if (codePoint == 't') {
            System.out.println("found tee");
        } else if (codePoint == 'ส') {
            System.out.println("found so sua");
        }
    }
}
To run the program, compile it with javac and run it with java. The -encoding UTF-8 flag makes javac read the Thai literals in the source file as UTF-8; it matters on JDKs older than 18, where the platform default encoding may be something else:
$ javac -encoding UTF-8 StringsAndChars.java
$ java StringsAndChars
Length: 6
e2a e27 e31 e2a e14 e35
Character count: 6
U+E2A starts at 0
U+E27 starts at 1
U+E31 starts at 2
U+E2A starts at 3
U+E14 starts at 4
U+E35 starts at 5

Using codePointAt()
U+E2A starts at 0
found so sua
U+E27 starts at 1
U+E31 starts at 2
U+E2A starts at 3
found so sua
U+E14 starts at 4
U+E35 starts at 5
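Every character in the Thai string lies in the Basic Multilingual Plane, so the program never shows length() and codePointCount() disagreeing. The following minimal sketch fills that gap. It assumes a hypothetical class name, SurrogateDemo, and a sample string made of "hi" followed by U+1F600 (GRINNING FACE); neither is part of the original program. It shows charAt() returning the two halves of a surrogate pair while codePointAt() combines them.

```java
public class SurrogateDemo {
    public static void main(String[] args) {
        // Sample string (an assumption for illustration): "hi" followed by
        // U+1F600 GRINNING FACE, written as a surrogate-pair escape so the
        // source file's encoding does not matter.
        String s = "hi\uD83D\uDE00";
        // length() counts UTF-16 code units: 2 for "hi" plus 2 for the pair.
        System.out.println("length(): " + s.length());
        // codePointCount() counts Unicode code points: 'h', 'i', and the emoji.
        System.out.println("codePointCount(): " + s.codePointCount(0, s.length()));
        // charAt() sees each half of the surrogate pair as a separate char.
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            System.out.printf("charAt(%d) = U+%04X, surrogate: %b%n",
                    i, (int) c, Character.isSurrogate(c));
        }
        // codePointAt() at the high surrogate combines the pair into one code point.
        System.out.printf("codePointAt(2) = U+%X%n", s.codePointAt(2));
    }
}
```

Running it should report length(): 4 but codePointCount(): 3, with charAt(2) and charAt(3) returning the surrogates D83D and DE00, and codePointAt(2) returning U+1F600.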
This Java program demonstrates key concepts about strings and characters:
- The length() method returns the number of UTF-16 code units in the string, which is not necessarily the number of characters.
- Individual code units can be read with charAt(), but this may not work correctly for characters outside the Basic Multilingual Plane, which are split across two code units.
- To work with whole Unicode code points, use codePointCount(), codePoints(), and codePointAt().
- The Character class provides utility methods for working with code points, such as Character.charCount().

Java's handling of strings differs from some other languages because its String API is defined in terms of UTF-16 code units. This can lead to surprising results for characters outside the Basic Multilingual Plane, since each one is represented by a surrogate pair of two char values.
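To make the Character utilities concrete, here is a small sketch. The class name CodePointUtils and its describe() helper are hypothetical, introduced only for illustration. It uses Character.toChars() to show the UTF-16 code units behind a code point, and Character.highSurrogate(), Character.lowSurrogate(), and Character.toCodePoint() (available since Java 7) to split a supplementary code point into its surrogate pair and rejoin it.

```java
public class CodePointUtils {
    // Hypothetical helper: lists the UTF-16 code units that encode one code point.
    public static String describe(int codePoint) {
        // Character.toChars() returns one char for a BMP code point
        // and a high/low surrogate pair for a supplementary one.
        char[] units = Character.toChars(codePoint);
        StringBuilder sb = new StringBuilder();
        sb.append(String.format("U+%04X", codePoint))
          .append(" -> ")
          .append(Character.charCount(codePoint))
          .append(" char(s):");
        for (char unit : units) {
            sb.append(' ').append(String.format("%04X", (int) unit));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // U+0E2A (so sua, from the Thai example) is in the BMP: one char.
        System.out.println(describe(0x0E2A));
        // U+1F600 lies outside the BMP, so it needs a surrogate pair.
        System.out.println(describe(0x1F600));
        // Split the supplementary code point into its surrogates, then rejoin them.
        char high = Character.highSurrogate(0x1F600);
        char low = Character.lowSurrogate(0x1F600);
        System.out.printf("toCodePoint: U+%X%n", Character.toCodePoint(high, low));
        System.out.println("supplementary? " + Character.isSupplementaryCodePoint(0x1F600));
    }
}
```

Under these assumptions it should print "U+0E2A -> 1 char(s): 0E2A", then "U+1F600 -> 2 char(s): D83D DE00", followed by "toCodePoint: U+1F600" and "supplementary? true".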