use strict;
use warnings;
use utf8;
use Encode qw(decode encode);
# $s is assigned a string literal: the word "hello"
# in Thai. Because `use utf8` is in effect, Perl decodes
# the UTF-8 source text, so $s holds characters
# (Unicode code points) rather than raw bytes.
my $s = "สวัสดี";
# With `use utf8` in effect, length($s) counts characters,
# not bytes. To get the length of the raw UTF-8 bytes,
# encode the string first.
print "Len: ", length(encode('UTF-8', $s)), "\n";
# Iterate over each byte of the UTF-8 encoding and
# print its value in hex.
print "Bytes: ";
for my $byte (unpack("C*", encode('UTF-8', $s))) {
    printf("%x ", $byte);
}
print "\n";
# To count how many characters are in a string, use
# `length`. Because $s is a decoded character string,
# this counts Unicode code points rather than bytes.
print "Character count: ", length($s), "\n";
# We can use the `unpack` function to get an array of
# Unicode code points, which are similar to runes in Go.
my @code_points = unpack("U*", $s);
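# Print each code point with the byte offset at which
# it starts in the UTF-8 encoding of the string.
for my $i (0..$#code_points) {
    printf("U+%04X starts at %d\n", $code_points[$i],
        length(encode('UTF-8', substr($s, 0, $i))));
}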
# This demonstrates passing a Unicode code point to a function.
for my $code_point (@code_points) {
    examine_code_point($code_point);
}
sub examine_code_point {
    my $cp = shift;
    # Convert the code point to a character with `chr`, then
    # compare it to a character literal.
    if (chr($cp) eq 't') {
        print "found tee\n";
    } elsif (chr($cp) eq 'ส') {
        print "found so sua\n";
    }
}
This Perl script demonstrates concepts similar to those in the original Go example.
Note that Perl doesn’t have a built-in type specifically for Unicode code points like Go’s rune type. Instead, we work with Unicode code points as integer values.
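Because code points are plain integers in Perl, the built-in `ord` and `chr` functions are enough to convert between a character and its code point. The following is a minimal sketch of that round trip; the helper names `code_points_of` and `string_from` are hypothetical and exist only for illustration.

```perl
use strict;
use warnings;
use utf8;

# Hypothetical helper: return the code points of a character
# string as a list of integers.
sub code_points_of {
    my ($str) = @_;
    return map { ord($_) } split //, $str;
}

# Hypothetical helper: build a character string back from a
# list of integer code points.
sub string_from {
    my (@cps) = @_;
    return join '', map { chr($_) } @cps;
}

my @cps = code_points_of("สวัสดี");
printf "U+%04X\n", $_ for @cps;

# Code points are ordinary integers, so numeric comparison works.
print "first is so sua\n" if $cps[0] == 0x0E2A;
```

Comparing integers with `==` avoids the `chr` conversion when the code point value is already known.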
To run this script, save it as strings_and_unicode.pl
and execute it with:
$ perl strings_and_unicode.pl
This will output information about the Thai string, including its byte length, individual bytes, character count, and details about each Unicode code point.
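The difference between byte length and character count in that output can also be sketched on its own. This assumes the core `Encode` module; `utf8_byte_length` is a hypothetical helper name, not part of the script above.

```perl
use strict;
use warnings;
use utf8;
use Encode qw(encode);

# Hypothetical helper: number of bytes in the UTF-8 encoding
# of a character string.
sub utf8_byte_length {
    my ($str) = @_;
    return length(encode('UTF-8', $str));
}

my $s = "สวัสดี";
print "Characters: ", length($s), "\n";            # counts code points
print "Bytes:      ", utf8_byte_length($s), "\n";  # counts UTF-8 bytes
```

Every character in this Thai string takes three bytes in UTF-8, so the byte count is three times the character count.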