Strings and Runes in C

Our program demonstrates working with strings and characters in C. Here’s the full source code:

#include <stdio.h>
#include <string.h>
#include <wchar.h>
#include <locale.h>

void examine_char(wchar_t c) {
    if (c == L't') {
        printf("found tee\n");
    } else if (c == L'ส') {
        printf("found so sua\n");
    }
}

int main(void) {
    setlocale(LC_ALL, "");

    const wchar_t *s = L"สวัสดี";

    // Print the size of the string's storage in bytes (wchar_t units, not UTF-8)
    printf("Len: %zu\n", wcslen(s) * sizeof(wchar_t));

    // Print the hex values of each wide character
    for (int i = 0; s[i] != L'\0'; i++) {
        printf("%04x ", (unsigned int)s[i]);
    }
    printf("\n");

    // Count the number of characters
    printf("Character count: %zu\n", wcslen(s));

    // Iterate over each character
    for (int i = 0; s[i] != L'\0'; i++) {
        printf("U+%04X starts at %zu\n", (unsigned int)s[i], i * sizeof(wchar_t));
    }

    printf("\nUsing individual character access\n");
    for (int i = 0; s[i] != L'\0'; i++) {
        wchar_t c = s[i];
        printf("U+%04X starts at %zu\n", (unsigned int)c, i * sizeof(wchar_t));
        examine_char(c);
    }

    return 0;
}

This C program demonstrates working with wide characters and strings to handle Unicode text. Let’s break it down:

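  1. We include the necessary headers and call setlocale(LC_ALL, "") to adopt the locale from the environment. This program prints only ASCII through printf(), so its output does not depend on the locale, but the locale matters as soon as you print or convert wide strings with functions such as wprintf() or wcstombs().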

  2. We define a constant wide string s containing Thai characters.

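  3. We print the size of the string's storage in bytes: wcslen() returns the number of wide characters, and multiplying by sizeof(wchar_t) gives the byte count. Because wchar_t is 4 bytes on Linux and macOS but 2 bytes on Windows, this is 24 or 12 bytes respectively. It is not the string's UTF-8 length, which is 18 bytes (three bytes per Thai character).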

  4. We iterate over the string to print the hexadecimal values of each wide character.

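  5. We count and print the number of characters using wcslen(). Strictly, this counts wchar_t units; for this string that is one per Unicode code point, giving 6. That is not the number of visible characters: ั (U+0E31) and ี (U+0E35) are combining vowel marks, so the word displays as four user-perceived characters. With a 16-bit wchar_t, characters outside the Basic Multilingual Plane take two units (a surrogate pair), so wcslen() would count them twice.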

  6. We iterate over the string again, printing each character's Unicode code point and its byte offset within the wchar_t array (the index multiplied by sizeof(wchar_t)).

  7. We demonstrate accessing individual characters and passing them to a function (examine_char).

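  8. The examine_char function compares the character against the wide-character literals L't' and L'ส' and prints a message on a match. The string contains no 't', so only the "found so sua" branch fires, once for each ส.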

To compile and run this program:

$ gcc -o strings_and_chars strings_and_chars.c
$ ./strings_and_chars
Len: 24
0e2a 0e27 0e31 0e2a 0e14 0e35 
Character count: 6
U+0E2A starts at 0
U+0E27 starts at 4
U+0E31 starts at 8
U+0E2A starts at 12
U+0E14 starts at 16
U+0E35 starts at 20

Using individual character access
U+0E2A starts at 0
found so sua
U+0E27 starts at 4
U+0E31 starts at 8
U+0E2A starts at 12
found so sua
U+0E14 starts at 16
U+0E35 starts at 20

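Note that the exact output depends on the platform's wide character implementation, mainly the size of wchar_t. With a 16-bit wchar_t (as on Windows), Len is 12 and the offsets advance by 2 instead of 4. The locale setting does not affect this particular output, since everything the program prints is ASCII.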

Handling Unicode strings in C is more involved than in some higher-level languages. The size and encoding of wchar_t are implementation-defined, and functions such as wcslen() count wchar_t units rather than user-perceived characters. Wide characters (wchar_t) and the wide character functions do let us handle Unicode text, but they require more explicit management than languages with built-in Unicode support.