Strings and Runes in Perl

use strict;
use warnings;
use utf8;
use Encode qw(decode encode);

# $s is a string assigned a literal value
# representing the word "hello" in the Thai
# language. Because of the utf8 pragma, Perl
# decodes the UTF-8 source literal into a string
# of Unicode characters rather than raw bytes.
my $s = "สวัสดี";

# With `use utf8`, $s holds decoded characters (code points),
# so length($s) would return 6. To see the raw UTF-8 bytes,
# encode the string first; length() then counts bytes.
my $bytes = encode('UTF-8', $s);
print "Len: ", length($bytes), "\n";

# Iterating over each byte of the encoded string
print "Bytes: ";
for my $i (0 .. length($bytes) - 1) {
    printf("%x ", ord(substr($bytes, $i, 1)));
}
print "\n";
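The byte loop depends on one fact: a Perl string holds either decoded characters or raw bytes, and `length` counts whichever units it contains. The following minimal sketch shows that difference side by side. It uses only the core `Encode` module, and the variable names are illustrative only.

```perl
use strict;
use warnings;
use utf8;                      # the literal below is decoded into characters
use Encode qw(encode decode);

my $chars = "สวัสดี";                  # 6 code points
my $bytes = encode('UTF-8', $chars);   # 18 bytes (3 per Thai character)

printf "characters: %d\n", length($chars);
printf "bytes:      %d\n", length($bytes);

# Decoding the bytes gives back a string equal to the original.
my $round_trip = decode('UTF-8', $bytes);
print "round trip ok\n" if $round_trip eq $chars;

# unpack "C*" on the byte string lists each byte as a number.
print join(' ', map { sprintf '%02x', $_ } unpack('C*', $bytes)), "\n";
```

Keeping decoded text and encoded bytes in separate variables makes it obvious which one a given `length` or `substr` call is operating on.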

# Because $s holds decoded characters, length() counts
# characters (Unicode code points), not bytes. Splitting on
# an empty pattern, as in scalar(split //, $s), gives the same count.
print "Character count: ", length($s), "\n";
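Counting code points does not always match what a reader would call counting characters. In สวัสดี, the vowel marks ั (U+0E31) and ี (U+0E35) combine with the consonant before them. One way to count user-perceived characters is Perl's `\X` regex escape, which matches one extended grapheme cluster. The helper name `grapheme_count` is hypothetical.

```perl
use strict;
use warnings;
use utf8;

# Hypothetical helper: counts extended grapheme clusters with \X.
sub grapheme_count {
    my ($str) = @_;
    my $count = () = $str =~ /\X/g;
    return $count;
}

my $s = "สวัสดี";
printf "code points: %d\n", length($s);          # 6
printf "graphemes:   %d\n", grapheme_count($s);  # 4: ส, วั, ส, ดี
```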

# We can use the `unpack` function to get an array of
# Unicode code points, which play the role of runes in Go.
# The offset printed for each one is its position in bytes,
# found by encoding the characters that precede it.
my @code_points = unpack("U*", $s);
for my $i (0 .. $#code_points) {
    printf("U+%04X starts at %d\n", $code_points[$i],
           length(encode('UTF-8', substr($s, 0, $i))));
}
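The loop above re-encodes the whole prefix for every code point, so the work grows quadratically on long strings. A single-pass alternative keeps a running byte offset, which is close to what Go's `range` loop over a string reports. This sketch assumes that encoding each character separately is acceptable, and `code_points_with_offsets` is a hypothetical name.

```perl
use strict;
use warnings;
use utf8;
use Encode qw(encode);

# Hypothetical helper: returns [code point, byte offset] pairs for a
# decoded string, with offsets measured in UTF-8 bytes.
sub code_points_with_offsets {
    my ($str) = @_;
    my @result;
    my $offset = 0;
    for my $ch (split //, $str) {
        push @result, [ ord($ch), $offset ];
        $offset += length(encode('UTF-8', $ch));
    }
    return @result;
}

for my $pair (code_points_with_offsets("สวัสดี")) {
    printf "U+%04X starts at %d\n", @$pair;
}
```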

# This demonstrates passing a Unicode code point to a function.
for my $code_point (@code_points) {
    examine_code_point($code_point);
}

sub examine_code_point {
    my $cp = shift;

    # chr() turns the code point back into a one-character string,
    # which can then be compared with a character literal using eq.
    if (chr($cp) eq 't') {
        print "found tee\n";
    } elsif (chr($cp) eq 'ส') {
        print "found so sua\n";
    }
}
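A code point is just an integer, so the same checks can also be written without converting back to a string. You can compare it against `ord` of a literal or against a hex constant. In this sketch, `describe_code_point` is a hypothetical variant of `examine_code_point` that returns its message instead of printing it, which makes it easy to test.

```perl
use strict;
use warnings;
use utf8;

# Hypothetical variant of examine_code_point: compares integers directly.
sub describe_code_point {
    my ($cp) = @_;
    return 'found tee'    if $cp == ord('t');
    return 'found so sua' if $cp == 0x0E2A;    # ส, THAI CHARACTER SO SUA
    return '';
}

for my $cp (map { ord } split //, "สวัสดี") {
    my $msg = describe_code_point($cp);
    print "$msg\n" if $msg ne '';
}
```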

This Perl script demonstrates concepts similar to those in the original Go example:

  1. It shows how to work with UTF-8 encoded strings in Perl.
  2. It demonstrates getting the byte length of a string and iterating over its bytes.
  3. It shows how to count the characters (Unicode code points) in a decoded string.
  4. It demonstrates how to work with Unicode code points, which are similar to runes in Go.
  5. It shows how to examine individual characters in the string.

Note that Perl doesn’t have a built-in type specifically for Unicode code points like Go’s rune type. Instead, a code point is an ordinary integer: ord or unpack produces one from a string, and chr turns it back into a one-character string.
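The sketch below converts between the two representations and formats a code point in the `U+0E2A 'ส'` style that Go's `%#U` verb produces. The helper `format_code_point` is hypothetical. The `binmode` call is an assumption about your output setup: it makes Perl encode STDOUT as UTF-8, so printing a non-ASCII character does not trigger a "Wide character" warning.

```perl
use strict;
use warnings;
use utf8;

# Encode output as UTF-8 so wide characters print cleanly.
binmode(STDOUT, ':encoding(UTF-8)');

# Hypothetical helper: formats a code point like Go's %#U verb.
sub format_code_point {
    my ($cp) = @_;
    return sprintf("U+%04X '%s'", $cp, chr($cp));
}

my $cp = ord("ส");               # string -> integer (0x0E2A)
print format_code_point($cp), "\n";
print "round trip ok\n" if chr($cp) eq "ส";
```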

To run this script, save it as strings_and_unicode.pl and execute it with:

$ perl strings_and_unicode.pl

This will output information about the Thai string, including its byte length, individual bytes, character count, and details about each Unicode code point.