Strings and Runes in TypeScript

// A TypeScript string is a primitive value that represents text as
// a sequence of UTF-16 code units. Unlike Go, TypeScript has no
// separate 'rune' type: a Unicode code point is just a number, and
// code points outside the BMP are stored as surrogate pairs.

// We'll use the 'string' type to store our Thai greeting
const s: string = "สวัสดี";

// The length property of a string returns the number of UTF-16
// code units. This differs from the number of code points when the
// string contains characters outside the BMP (Thai letters are inside it)
console.log("Length:", s.length);

// We can iterate over the UTF-16 code units of the string
for (let i = 0; i < s.length; i++) {
    console.log(s.charCodeAt(i).toString(16));
}
console.log();

// To count the number of Unicode code points, we can use the spread
// operator, which splits the string into one short string per code point
console.log("Code point count:", [...s].length);

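// A for...of loop iterates over the string one code point at a time.
// We track the UTF-16 offset ourselves so "starts at" stays correct
// even for characters stored as surrogate pairs.
let offset = 0;
for (const char of s) {
    console.log(`U+${char.codePointAt(0)!.toString(16).padStart(4, '0')} '${char}' starts at ${offset}`);
    offset += char.length;
}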

// We can also use String.prototype.codePointAt to iterate over code points
console.log("\nUsing String.prototype.codePointAt");
for (let i = 0; i < s.length;) {
    const codePoint = s.codePointAt(i)!;
    console.log(`U+${codePoint.toString(16).padStart(4, '0')} '${String.fromCodePoint(codePoint)}' starts at ${i}`);
    i += codePoint > 0xFFFF ? 2 : 1;
    examineCodePoint(codePoint);
}

function examineCodePoint(cp: number): void {
    // A code point is a plain number, so we compare it with the
    // code point of a one-character string obtained via codePointAt(0)
    if (cp === 't'.codePointAt(0)) {
        console.log("found tee");
    } else if (cp === 'ส'.codePointAt(0)) {
        console.log("found so sua");
    }
}

This TypeScript code demonstrates how to work with strings and Unicode code points. Here are some key differences from the Go version:

  1. TypeScript uses the string type for all strings, which are sequences of UTF-16 code units.
  2. There’s no built-in rune type in TypeScript. Instead, we work with code points using methods like codePointAt and fromCodePoint.
  3. The length property of a string in TypeScript returns the number of UTF-16 code units, which differs from the number of code points when the string contains characters outside the Basic Multilingual Plane (BMP).
  4. To count actual Unicode code points, we can use the spread operator, which splits the string into an array of one-code-point strings.
  5. The string iterator used by for...of and the spread operator already walks a string one code point at a time. When we also need the UTF-16 offset of each code point, we step through the string ourselves using codePointAt.
  6. Instead of using single quotes for rune literals, we use codePointAt(0) to get the code point of a single-character string.
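
The Thai greeting above stays entirely inside the BMP, so its length and its code point count are the same. The following minimal sketch illustrates points 3 and 4 with a hypothetical sample string (not part of the original example) that includes an emoji outside the BMP. It uses Array.from, which relies on the same string iterator as the spread operator.

```typescript
// Hypothetical sample: 'a', the Thai letter so sua (in the BMP),
// and an emoji (U+1F600) that lies outside the BMP.
const sample: string = "aส😀";

// length counts UTF-16 code units; the emoji needs two (a surrogate pair).
const codeUnitCount: number = sample.length;

// Array.from uses the string iterator, which yields one string per code point.
const codePointCount: number = Array.from(sample).length;

console.log("Code units:", codeUnitCount);   // 4
console.log("Code points:", codePointCount); // 3
```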

This code provides similar functionality to the Go example, allowing you to examine and work with Unicode strings in TypeScript.
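
As a rough sketch of how the codePointAt loop could be packaged for reuse, the helper below returns every code point together with the UTF-16 offset where it starts, which is the closest analogue to the index Go reports when ranging over a string (Go reports byte offsets, while this reports code unit offsets). The names CodePointEntry, codePointsWithOffsets, and formatCodePoint are hypothetical and do not appear in the original example.

```typescript
// Hypothetical record type: one code point and the UTF-16 offset where it starts.
interface CodePointEntry {
    offset: number;
    codePoint: number;
}

// Hypothetical helper: walk the string with codePointAt, advancing by two
// code units whenever the code point lies outside the BMP.
function codePointsWithOffsets(text: string): CodePointEntry[] {
    const entries: CodePointEntry[] = [];
    let i = 0;
    while (i < text.length) {
        const cp = text.codePointAt(i)!;
        entries.push({ offset: i, codePoint: cp });
        i += cp > 0xFFFF ? 2 : 1;
    }
    return entries;
}

// Format a code point in the conventional U+XXXX notation.
function formatCodePoint(cp: number): string {
    return "U+" + cp.toString(16).toUpperCase().padStart(4, "0");
}

const entries = codePointsWithOffsets("aส😀");
for (const entry of entries) {
    console.log(`${formatCodePoint(entry.codePoint)} starts at ${entry.offset}`);
}
```

With this sample the emoji starts at offset 2 and occupies two code units, so a following character would start at offset 4.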
