Strings and Runes in PHP

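A PHP string is a sequence of bytes. PHP does not attach an encoding to a string: the bytes of a literal are whatever the source file contains, so a script saved as UTF-8 produces UTF-8 strings. Multi-byte characters, like those found in many non-Latin alphabets, therefore need encoding-aware functions to be handled one character at a time. Unlike some other languages, PHP doesn't have a separate 'character' type - a single character is simply a string containing one character.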
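
As a small illustration of the point about characters being ordinary strings, here is a minimal sketch. The variable names are illustrative and not part of the main example, and it assumes the file is saved as UTF-8.

```php
<?php

// Illustrative values only; this assumes the file is saved as UTF-8.
$ascii = "abc";
$first = $ascii[0];

var_dump(is_string($first)); // bool(true): there is no separate char type
var_dump(strlen($first));    // int(1)

// Indexing is by byte, so for a multi-byte character it yields
// only the first byte, not the whole character.
$thai = "ส";
var_dump(strlen($thai));     // int(3)
var_dump(bin2hex($thai[0])); // string(2) "e0"
```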

```php
<?php

// $s is a string assigned a literal value
// representing the word "hello" in the Thai language.
// PHP stores the literal's bytes exactly as they appear in the
// source file, which is assumed to be saved as UTF-8.
$s = "สวัสดี";

// strlen() returns the number of bytes in the string
echo "Len: " . strlen($s) . "\n";

// This loop generates the hex values of all
// the bytes that constitute the string $s.
for ($i = 0; $i < strlen($s); $i++) {
    printf("%x ", ord($s[$i]));
}
echo "\n";

// To count how many characters are in a string, we can use
// the mb_strlen() function from the mbstring extension.
// This correctly handles multi-byte characters.
echo "Character count: " . mb_strlen($s, 'UTF-8') . "\n";

// We can use mb_str_split() to split the string into an array of characters
// and then iterate over them. The array key is the character's index.
foreach (mb_str_split($s, 1, 'UTF-8') as $idx => $char) {
    printf("U+%X '%s' starts at %d\n", mb_ord($char, 'UTF-8'), $char, $idx);
}

echo "\nUsing mb_strlen and mb_substr\n";
$i = 0;
while ($i < mb_strlen($s, 'UTF-8')) {
    $char = mb_substr($s, $i, 1, 'UTF-8');
    printf("U+%X '%s' starts at %d\n", mb_ord($char), $char, $i);
    examineChar($char);
    $i++;
}

function examineChar($c) {
    // We can compare a character directly to a string literal
    if ($c === 't') {
        echo "found tee\n";
    } elseif ($c === 'ส') {
        echo "found so sua\n";
    }
}
```

When you run this script, you’ll see output similar to this:

```
Len: 18
e0 b8 aa e0 b8 a7 e0 b8 b1 e0 b8 aa e0 b8 94 e0 b8 b5 
Character count: 6
U+E2A 'ส' starts at 0
U+E27 'ว' starts at 1
U+E31 'ั' starts at 2
U+E2A 'ส' starts at 3
U+E14 'ด' starts at 4
U+E35 'ี' starts at 5

Using mb_strlen and mb_substr
U+E2A 'ส' starts at 0
found so sua
U+E27 'ว' starts at 1
U+E31 'ั' starts at 2
U+E2A 'ส' starts at 3
found so sua
U+E14 'ด' starts at 4
U+E35 'ี' starts at 5
```

This PHP script demonstrates how to work with UTF-8 encoded strings, which matters whenever text contains non-ASCII characters. It shows how to get the byte length of a string, iterate over its bytes, count its characters (Unicode code points) rather than its bytes, and iterate over multi-byte characters one at a time.

The mb_* functions from the mbstring extension are used to correctly handle multi-byte characters. These functions are crucial when working with strings that contain characters from non-Latin alphabets or emoji.
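
To make the difference between the byte-oriented and mb_* functions concrete, here is a minimal sketch. `firstChars()` is a hypothetical helper introduced only for this illustration, and the file is assumed to be saved as UTF-8.

```php
<?php

// Hypothetical helper for this illustration: returns the first $n
// characters (code points) of a UTF-8 string.
function firstChars(string $text, int $n): string
{
    return mb_substr($text, 0, $n, 'UTF-8');
}

$s = "สวัสดี";

// strlen() counts bytes; each of these Thai characters takes 3 bytes.
echo strlen($s), "\n";             // 18
echo mb_strlen($s, 'UTF-8'), "\n"; // 6

// substr() cuts by byte offset, so it can split a character in half
// and leave an invalid UTF-8 sequence behind.
$broken = substr($s, 0, 4);
var_dump(mb_check_encoding($broken, 'UTF-8')); // bool(false)

// mb_substr() cuts by character, so the result stays valid UTF-8.
echo firstChars($s, 2), "\n";      // สว
```

The same pattern applies to other byte-oriented functions: when a string may contain multi-byte characters, the mb_* counterpart is usually the safer choice.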

The examineChar function demonstrates how to compare individual characters in a string, which can be useful for more complex string processing tasks.
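
As one example of such a task, the sketch below counts how often a given character occurs in a string. `countChar()` is a hypothetical helper, not part of the script above. Note that `===` compares raw bytes, so both strings must use the same encoding for the comparison to be meaningful.

```php
<?php

// Hypothetical helper: counts how many times the single character $needle
// occurs in $text, comparing one code point at a time.
function countChar(string $text, string $needle): int
{
    $count = 0;
    foreach (mb_str_split($text, 1, 'UTF-8') as $char) {
        if ($char === $needle) {
            $count++;
        }
    }
    return $count;
}

echo countChar("สวัสดี", "ส"), "\n"; // 2
```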

Remember to ensure that your PHP installation has the mbstring extension enabled to use these multi-byte string functions. Note also that mb_ord() requires PHP 7.2 or later and mb_str_split() requires PHP 7.4 or later.
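
If you cannot be sure that mbstring is available, one possible approach is to check for it at runtime. The sketch below uses a hypothetical helper, `countCodePoints()`, that falls back to a PCRE pattern with the `u` modifier; treating invalid input as zero characters is an assumption made for this illustration.

```php
<?php

// Hypothetical helper: counts code points, using mbstring when it is
// loaded and a PCRE-based fallback otherwise.
function countCodePoints(string $text): int
{
    if (extension_loaded('mbstring')) {
        return mb_strlen($text, 'UTF-8');
    }

    // With the /u modifier "." matches one whole UTF-8 code point,
    // and /s lets it match newlines as well.
    $matches = preg_match_all('/./us', $text);

    // preg_match_all() returns false on failure (for example, invalid
    // UTF-8); this sketch assumes 0 is an acceptable result in that case.
    return $matches === false ? 0 : $matches;
}

echo countCodePoints("สวัสดี"), "\n"; // 6
```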
