Strings and Runes in Cilk

This example shows how to work with strings and runes in Cilk. Cilk is an extension of C, so it has no dedicated string or rune type: a string is a null-terminated array of characters, and a Unicode code point (a "rune" in Go terminology) is stored here as a wchar_t wide character.

#include <cilk/cilk.h>
#include <stdio.h>
#include <string.h>
#include <wchar.h>
#include <locale.h>

void examine_char(wchar_t c) {
    if (c == L't') {
        printf("found tee\n");
    } else if (c == L'ส') {
        printf("found so sua\n");
    }
}

int main() {
    setlocale(LC_ALL, "");

    const wchar_t *s = L"สวัสดี";

    // Print the length of the string in bytes
    printf("Len: %zu\n", wcslen(s) * sizeof(wchar_t));

    // Print the hex values of each wide character
    for (int i = 0; s[i] != L'\0'; i++) {
        printf("%04x ", (unsigned int)s[i]);
    }
    printf("\n");

    // Count the number of characters
    printf("Character count: %zu\n", wcslen(s));

    // Iterate over each character
    for (int i = 0; s[i] != L'\0'; i++) {
        // The offset has type size_t, so it is printed with %zu rather than %d
        printf("U+%04X starts at %zu\n", (unsigned int)s[i], (size_t)i * sizeof(wchar_t));
    }

    printf("\nUsing individual character access\n");
    for (int i = 0; s[i] != L'\0'; i++) {
        wchar_t c = s[i];
        printf("U+%04X starts at %zu\n", (unsigned int)c, (size_t)i * sizeof(wchar_t));
        examine_char(c);
    }

    return 0;
}

In this Cilk program:

  1. We use wide characters (wchar_t) to represent Unicode characters.

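  2. The setlocale(LC_ALL, "") function is called to adopt the locale from the user’s environment. The locale governs how wide characters are converted for input and output (for example with wprintf or the %lc conversion); this program prints only numeric code point values, so the call is a precaution here rather than a requirement.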

  3. We define a string s containing Thai characters using a wide string literal.

  4. We print the length of the string in bytes, which is the number of wide characters multiplied by the size of wchar_t.

  5. We iterate over the string to print the hexadecimal value of each wide character.

  6. We count and print the number of characters in the string using wcslen().

  7. We iterate over the string again, printing the Unicode code point and its byte offset for each character.

  8. Finally, we demonstrate accessing individual characters and passing them to a function.

Note that Cilk, like C, doesn’t have built-in UTF-8 handling the way Go does. This example uses wide characters, whose encoding depends on the platform: wchar_t is typically 32 bits (UTF-32) on Linux and macOS and 16 bits (UTF-16) on Windows. To work with UTF-8 directly, you would decode the bytes yourself or use a dedicated Unicode library.

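To compile and run this program (the command assumes the OpenCilk build of clang, which enables Cilk with -fopencilk; the output shown is from a system where wchar_t is 4 bytes):

$ clang -fopencilk -O3 strings_and_runes.c -o strings_and_runes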
$ ./strings_and_runes
Len: 24
0e2a 0e27 0e31 0e2a 0e14 0e35 
Character count: 6
U+0E2A starts at 0
U+0E27 starts at 4
U+0E31 starts at 8
U+0E2A starts at 12
U+0E14 starts at 16
U+0E35 starts at 20

Using individual character access
U+0E2A starts at 0
found so sua
U+0E27 starts at 4
U+0E31 starts at 8
U+0E2A starts at 12
found so sua
U+0E14 starts at 16
U+0E35 starts at 20

This example demonstrates how to work with Unicode strings in Cilk. The approach differs from Go’s because Cilk inherits C’s string model: there is no built-in rune type or UTF-8-aware iteration, so wide characters stand in for runes.