Strings and Runes in Perl

use strict;
use warnings;
use utf8;
use Encode qw(decode encode);

# $s is a string assigned a literal value
# representing the word "hello" in the Thai
# language. With the utf8 pragma in effect, Perl
# decodes the literal into a string of characters.
my $s = "สวัสดี";

# Because of `use utf8`, $s holds characters, not bytes, so
# length($s) alone would count characters. Encoding to UTF-8
# first gives the length of the raw bytes.
print "Len: ", length(encode('UTF-8', $s)), "\n";

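# To iterate over the raw bytes, encode the string to UTF-8
# first, then print each byte as a hex value.
my $bytes = encode('UTF-8', $s);
print "Bytes: ";
for my $i (0..length($bytes)-1) {
    printf("%x ", ord(substr($bytes, $i, 1)));
}
print "\n";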

# To count how many characters are in a string, call
# `length` on the character string. With the utf8 pragma
# in effect, this counts code points, not bytes.
print "Character count: ", length($s), "\n";

# We can use the `unpack` function to get a list of
# Unicode code points, which are similar to runes in Go.
# The byte offset of each character is the UTF-8 length
# of the substring that precedes it.
my @code_points = unpack("U*", $s);
for my $i (0..$#code_points) {
    printf("U+%04X starts at %d\n", $code_points[$i],
           length(encode('UTF-8', substr($s, 0, $i))));
}

# This demonstrates passing a Unicode code point to a function.
for my $code_point (@code_points) {
    examine_code_point($code_point);
}

sub examine_code_point {
    my $cp = shift;
    
    # `chr` turns the code point back into a character, which
    # can then be compared to a character literal with `eq`.
    if (chr($cp) eq 't') {
        print "found tee\n";
    } elsif (chr($cp) eq 'ส') {
        print "found so sua\n";
    }
}

This Perl script mirrors the original Go example. It gets the byte length of a UTF-8 encoded string and iterates over its bytes, counts the characters in the string, lists its Unicode code points (the counterpart of runes in Go) together with their byte offsets, and examines individual characters.
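The difference between bytes and characters can be sketched on its own. The example below is a minimal illustration, not part of the original script: it writes the Thai word as escaped raw bytes so that it does not depend on `use utf8` or on the encoding of the source file, and the variable names are chosen here for illustration only.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Raw UTF-8 bytes of the Thai word, written with hex escapes.
my $bytes = "\xE0\xB8\xAA\xE0\xB8\xA7\xE0\xB8\xB1"
          . "\xE0\xB8\xAA\xE0\xB8\x94\xE0\xB8\xB5";

# decode turns the byte string into a character string.
my $chars = decode('UTF-8', $bytes);

# length counts bytes in the first case and characters in the second.
print "Byte length: ", length($bytes), "\n";
print "Character length: ", length($chars), "\n";

# encode goes the other way and reproduces the original bytes.
my $round_trip = encode('UTF-8', $chars);
print "Round trip ok\n" if $round_trip eq $bytes;
```

Text read from files or sockets typically arrives as bytes, so decoding it first is what makes `length`, `substr`, and `ord` operate on characters.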

Note that Perl doesn’t have a built-in type specifically for Unicode code points like Go’s rune type. Instead, we work with Unicode code points as integer values.
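As a small sketch of what working with code points as integers looks like, the example below uses the built-in `ord` and `chr` functions. The range check against the Thai Unicode block (U+0E00 to U+0E7F) is an illustration added here, not something the original script does.

```perl
use strict;
use warnings;

# U+0E2A is the first character of the example string (so sua).
my $char = chr(0x0E2A);

# ord returns the code point as a plain integer.
my $cp = ord($char);
printf("U+%04X\n", $cp);

# Being an ordinary number, it supports numeric comparison.
print "in the Thai block\n" if $cp >= 0x0E00 && $cp <= 0x0E7F;

# chr maps the integer back to a one-character string.
print "round trip ok\n" if chr($cp) eq $char;
```

Because the value is just a number, nothing in the type system marks it as a code point; the caller has to keep track of what the integer means.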

To run this script, save it as strings_and_unicode.pl and execute it with:

$ perl strings_and_unicode.pl

This will output information about the Thai string, including its byte length, individual bytes, character count, and details about each Unicode code point.
