Strings and Runes in Crystal

A Crystal `String` is an immutable sequence of bytes holding UTF-8 encoded text. The language treats strings specially, as containers of text encoded in UTF-8. A single character is represented by the `Char` type, a 32-bit Unicode code point (the counterpart of what Go calls a rune).
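To make the `String`/`Char` distinction concrete, here is a minimal sketch of how a single `Char` relates to its code point and to its UTF-8 bytes. It uses only `Char#ord`, `Char#bytesize`, `Char#bytes` and `Int#chr` from the standard library; the variable name `c` is illustrative.

```crystal
# A Char literal uses single quotes; a String literal uses double quotes.
c = 'ส'

# The code point as an Int32 (0x0E2A, THAI CHARACTER SO SUA).
puts "U+#{c.ord.to_s(16).rjust(4, '0')}" # prints U+0e2a

# Encoded as UTF-8, this one Char occupies three bytes.
puts c.bytesize                          # prints 3
puts c.bytes.map(&.to_s(16)).join(" ")   # prints e0 b8 aa

# Going the other way: build a Char from an integer code point.
puts 0x0E2A.chr == c                     # prints true
```

These three bytes are exactly the first three values in the program's hex dump.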
```crystal
# The String class in Crystal provides many useful methods for working with UTF-8 encoded strings.
s = "สวัสดี"
# `bytesize` returns the number of bytes in the UTF-8 encoding, not the number of characters.
puts "Len: #{s.bytesize}"
# This loop prints the hex value of every byte that makes up the code points in `s`.
s.each_byte do |byte|
  print "#{byte.to_s(16)} "
end
puts
# To count how many characters (Unicode code points) are in a string, use `size`.
# Some Thai vowel marks are separate code points that combine with the
# preceding consonant, so this count can exceed the number of characters you see.
puts "Char count: #{s.size}"
# `each_char_with_index` iterates over each Char together with its character index.
s.each_char_with_index do |char, idx|
  puts "U+#{char.ord.to_s(16).rjust(4, '0')} '#{char}' starts at #{idx}"
end
puts "\nUsing char_at method"
i = 0
while i < s.size
  # `char_at` returns the Char at character index i (`s[i]` is equivalent).
  char = s.char_at(i)
  puts "U+#{char.ord.to_s(16).rjust(4, '0')} '#{char}' starts at #{i}"
  examine_char(char)
  i += 1
end
# This demonstrates passing a Char value to a method.
# Top-level methods can be defined after the code that calls them.
def examine_char(c : Char)
  # We can compare a `Char` value to a character literal directly.
  if c == 't'
    puts "found tee"
  elsif c == 'ส'
    puts "found so sua"
  end
end
```

When you run this program, you'll see output similar to this:

```
Len: 18
e0 b8 aa e0 b8 a7 e0 b8 b1 e0 b8 aa e0 b8 94 e0 b8 b5
Char count: 6
U+0e2a 'ส' starts at 0
U+0e27 'ว' starts at 1
U+0e31 'ั' starts at 2
U+0e2a 'ส' starts at 3
U+0e14 'ด' starts at 4
U+0e35 'ี' starts at 5

Using char_at method
U+0e2a 'ส' starts at 0
found so sua
U+0e27 'ว' starts at 1
U+0e31 'ั' starts at 2
U+0e2a 'ส' starts at 3
found so sua
U+0e14 'ด' starts at 4
U+0e35 'ี' starts at 5
```
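The `starts at` values in this output are character indices (0 through 5), while the hex dump shows that each of these Thai characters occupies three bytes. If byte offsets are wanted instead (what Go's `range` over a string reports), a minimal sketch is to keep a running total of `Char#bytesize` while iterating. The names `offsets` and `byte_offset` are illustrative; Crystal's `Char::Reader` is another way to walk a string while tracking the byte position.

```crystal
s = "สวัสดี"

offsets = [] of Int32 # byte offset at which each Char starts
byte_offset = 0

s.each_char do |char|
  offsets << byte_offset
  puts "U+#{char.ord.to_s(16).rjust(4, '0')} '#{char}' starts at byte #{byte_offset}"
  # Advance by the number of UTF-8 bytes this Char occupies.
  byte_offset += char.bytesize
end

# For this string the offsets are 0, 3, 6, 9, 12 and 15, and the final
# running total equals s.bytesize (18).
```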
This example demonstrates how Crystal handles Unicode strings and characters. It shows various methods for working with strings and individual characters, including byte-level operations, character counting, and iteration over characters.
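The "surprising" character count comes from three different notions of length: bytes, code points, and user-perceived characters (grapheme clusters). A minimal sketch comparing them, assuming a Crystal version recent enough to provide `String#grapheme_size`:

```crystal
s = "สวัสดี"

# Number of bytes in the UTF-8 encoding.
puts s.bytesize # prints 18

# Number of Unicode code points (Char values); this is what `size` counts.
puts s.size # prints 6

# The code points themselves, formatted like the program's output.
puts s.chars.map { |char| "U+#{char.ord.to_s(16).rjust(4, '0')}" }.join(" ")

# Number of grapheme clusters (user-perceived characters). The vowel marks
# U+0E31 and U+0E35 combine with the consonant before them, leaving 4.
puts s.grapheme_size # prints 4
```

Which length is "right" depends on the task: `bytesize` for buffers and I/O, `size` for code-point indexing such as `char_at`, and grapheme clusters for anything a reader would count by eye.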