// A TypeScript string is a primitive data type that represents
// text as a sequence of UTF-16 code units. Unlike Go, TypeScript
// doesn't have a separate 'rune' type, but it does support Unicode
// code points through the use of surrogate pairs.
// We'll use the 'string' type to store our Thai greeting
const s: string = "สวัสดี";
// In TypeScript, the length property of a string returns the number
// of UTF-16 code units, which may not always equal the number of
// characters for strings containing characters outside the BMP
console.log("Length:", s.length);
// We can iterate over the UTF-16 code units of the string
for (let i = 0; i < s.length; i++) {
    console.log(s.charCodeAt(i).toString(16));
}
console.log();
// To count the number of Unicode code points, we can use the
// spread operator, which splits the string into an array of
// single-code-point strings
console.log("Code point count:", [...s].length);
// A for...of loop iterates over the string by code point. We track the
// UTF-16 offset ourselves, since the iterator does not report it
let offset = 0;
for (const ch of s) {
    console.log(`U+${ch.codePointAt(0)!.toString(16).padStart(4, '0')} '${ch}' starts at ${offset}`);
    offset += ch.length;
}
// We can also use String.prototype.codePointAt to iterate over code points
console.log("\nUsing String.prototype.codePointAt");
for (let i = 0; i < s.length;) {
    const codePoint = s.codePointAt(i)!;
    console.log(`U+${codePoint.toString(16).padStart(4, '0')} '${String.fromCodePoint(codePoint)}' starts at ${i}`);
    // Code points above U+FFFF take two UTF-16 code units
    i += codePoint > 0xFFFF ? 2 : 1;
    examineCodePoint(codePoint);
}
function examineCodePoint(cp: number): void {
    // A code point is a plain number, so we can compare it directly
    // to the code point of a single-character string
    if (cp === 't'.codePointAt(0)) {
        console.log("found tee");
    } else if (cp === 'ส'.codePointAt(0)) {
        console.log("found so sua");
    }
}
This TypeScript code demonstrates how to work with strings and Unicode code points. Here are some key differences from the Go version:
- TypeScript uses the `string` type for all strings, which are sequences of UTF-16 code units.
- There is no `rune` type in TypeScript. Instead, we work with code points using methods like `codePointAt` and `fromCodePoint`.
- The `.length` property of a string returns the number of UTF-16 code units, which can differ from the number of perceived characters, for example when the string contains characters outside the Basic Multilingual Plane (BMP).
- To compare a code point against a character, we call `.codePointAt(0)` on a single-character string to get that character's code point.

This code provides similar functionality to the Go example, allowing you to examine and work with Unicode strings in TypeScript.
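
The Thai greeting above sits entirely inside the BMP, so its `.length` and its code point count happen to match. As a rough sketch of the case where they differ, the snippet below uses a string containing U+1F600, which needs a surrogate pair. The helper name `listCodePoints`, the `CodePointInfo` interface, and the sample string are assumptions made for illustration and are not part of the original example.

```typescript
// Hypothetical helper (not in the original example): lists each code point
// in a string together with the UTF-16 offset at which it starts.
interface CodePointInfo {
    codePoint: number;
    offset: number;
}

function listCodePoints(text: string): CodePointInfo[] {
    const result: CodePointInfo[] = [];
    let offset = 0;
    while (offset < text.length) {
        const codePoint = text.codePointAt(offset)!;
        result.push({ codePoint, offset });
        // Code points above U+FFFF occupy two UTF-16 code units.
        offset += codePoint > 0xffff ? 2 : 1;
    }
    return result;
}

// Assumed sample string: 'a', U+1F600 (outside the BMP), 'b'.
const sample = "a\u{1F600}b";

console.log("UTF-16 code units:", sample.length);              // 4
console.log("Code points:", listCodePoints(sample).length);    // 3
for (const { codePoint, offset } of listCodePoints(sample)) {
    console.log(`U+${codePoint.toString(16).toUpperCase().padStart(4, '0')} starts at ${offset}`);
}
```

Here the offsets are 0, 1, and 3: the middle code point spans two code units, which is why an index into `[...sample]` would not be a valid position in the original string.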