Regular Expressions in C

Our program demonstrates common regular expression tasks in C using the POSIX regex library. Here’s the full source code:

#include <stdio.h>
#include <stdlib.h>
#include <regex.h>
#include <string.h>

void check_match(regex_t *regex, const char *str) {
    int result = regexec(regex, str, 0, NULL, 0);
    printf("%s\n", result == 0 ? "true" : "false");
}

void find_string(regex_t *regex, const char *str) {
    regmatch_t match;
    if (regexec(regex, str, 1, &match, 0) == 0) {
        printf("%.*s\n", (int)(match.rm_eo - match.rm_so), str + match.rm_so);
    } else {
        printf("No match\n");
    }
}

void find_string_index(regex_t *regex, const char *str) {
    regmatch_t match;
    if (regexec(regex, str, 1, &match, 0) == 0) {
        printf("idx: [%d %d]\n", (int)match.rm_so, (int)match.rm_eo);
    } else {
        printf("No match\n");
    }
}

int main(void) {
    regex_t regex;
    int reti;

    // Compile the regular expression
    reti = regcomp(&regex, "p([a-z]+)ch", REG_EXTENDED);
    if (reti) {
        fprintf(stderr, "Could not compile regex\n");
        exit(1);
    }

    // Test if the pattern matches a string
    check_match(&regex, "peach");

    // Find the match for the regexp
    find_string(&regex, "peach punch");

    // Find the index of the match
    find_string_index(&regex, "peach punch");

    // Note: submatches are available by passing regexec a larger regmatch_t
    // array (one slot per parenthesized group). POSIX has no built-in
    // "find all" call; that requires looping over regexec manually.

    // Replace subsets of strings (simplified version)
    char input[] = "a peach";
    char output[100];
    regmatch_t match;
    if (regexec(&regex, input, 1, &match, 0) == 0) {
        strncpy(output, input, match.rm_so);
        output[match.rm_so] = '\0';  // strncpy does not terminate the prefix; strcat needs it
        strcat(output, "<fruit>");
        strcat(output, input + match.rm_eo);
        printf("%s\n", output);
    }

    // Clean up
    regfree(&regex);

    return 0;
}

To run the program, compile it with gcc and then execute:

$ gcc -o regex_example regex_example.c
$ ./regex_example
true
peach
idx: [0 5]
a <fruit>

This C program uses the POSIX regex library, which is less feature-rich than Go’s regexp package. Some key differences and limitations:

  1. Compiled patterns must be released with regfree, and the return values of regcomp and regexec must be checked explicitly.
  2. POSIX regex supports submatch extraction through the regmatch_t array passed to regexec, but it has no built-in “find all” or replace functions. These must be implemented manually with repeated regexec calls.
  3. String manipulation in C is more verbose and error-prone: buffers must be sized and null-terminated by hand.
  4. C doesn’t have built-in dynamic arrays or slices, so returning multiple matches requires a caller-supplied array or an additional data structure.

Despite these limitations, this example demonstrates basic regex functionality in C, including pattern matching, finding matches, and simple string replacement.

For more advanced regex operations in C, consider a third-party library such as PCRE (Perl Compatible Regular Expressions; PCRE2 is the current version), which offers a richer feature set, closer to that of Go’s regexp package.