Strings and Runes in Visual Basic .NET
Our first program will demonstrate string handling and Unicode support in Visual Basic .NET. Here’s the full source code:
Imports System
Imports System.Text

Module StringsAndRunes

    Sub Main()
        ' s is a String assigned a literal value
        ' representing the word "hello" in the Thai language.
        ' Visual Basic .NET strings are stored as UTF-16 encoded text.
        Const s As String = "สวัสดี"

        ' Length returns the number of UTF-16 code units (Char values),
        ' not the number of user-perceived characters.
        Console.WriteLine("Len: " & s.Length)

        ' This loop generates the hex values of all
        ' the UTF-16 code units in s.
        For i As Integer = 0 To s.Length - 1
            Console.Write("{0:X4} ", AscW(s(i)))
        Next
        Console.WriteLine()

        ' Length is also the number of Char values in the string. Every
        ' Thai character lies in the Basic Multilingual Plane, so no
        ' surrogate pairs are involved, but combining vowel marks count
        ' as separate Chars, so the result may be surprising.
        Console.WriteLine("Char count: " & s.Length)

        ' A For Each loop over a String yields each Char (UTF-16 code
        ' unit) in order; it does not merge combining marks.
        Dim idx As Integer = 0
        For Each c As Char In s
            Console.WriteLine("U+{0:X4} '{1}' starts at {2}", AscW(c), c, idx)
            idx += 1
        Next

        ' StringInfo groups Chars into text elements: a base character
        ' plus any combining marks that follow it.
        Console.WriteLine(vbNewLine & "Using StringInfo")
        Dim si As New System.Globalization.StringInfo(s)
        ' starts(i) is the Char index at which text element i begins.
        Dim starts As Integer() = System.Globalization.StringInfo.ParseCombiningCharacters(s)
        For i As Integer = 0 To si.LengthInTextElements - 1
            Dim element As String = si.SubstringByTextElements(i, 1)
            ' AscW on a String reports the value of its first Char.
            Console.WriteLine("U+{0:X4} '{1}' starts at {2}",
                              AscW(element), element, starts(i))
            ExamineChar(element(0))
        Next
    End Sub

    Sub ExamineChar(c As Char)
        ' We can compare a Char value to a character literal directly.
        If c = "t"c Then
            Console.WriteLine("found tee")
        ElseIf c = "ส"c Then
            Console.WriteLine("found so sua")
        End If
    End Sub

End Module
This program demonstrates several concepts:
In Visual Basic .NET, strings are sequences of UTF-16 code units. This differs from languages such as Go, where strings are sequences of bytes.
The Length property of a string gives the number of UTF-16 code units, not necessarily the number of visible characters.
We can iterate over a string using both index-based and For Each loops.
The AscW function returns the numeric value of a Char, which is a single UTF-16 code unit. For characters in the Basic Multilingual Plane, which includes all Thai characters, this value equals the Unicode code point.
To handle user-perceived characters that span several Char values (such as a Thai consonant followed by a combining vowel mark), we use the StringInfo class from the System.Globalization namespace.
Visual Basic .NET has no built-in language type equivalent to Go's runes. The Char type holds a single UTF-16 code unit; on .NET Core 3.0 and later, the System.Text.Rune struct represents a full Unicode scalar value.
Character literals in Visual Basic .NET are denoted with a trailing c, like "ส"c.
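The difference between UTF-16 code units, Unicode code points, and text elements is easier to see with a string that contains a character outside the Basic Multilingual Plane. The following is a minimal sketch, not part of the original program: the module name CodePointSketch and the helper CodePoints are hypothetical names introduced for illustration, and the appended character U+1D11E (musical G clef) is chosen only because it requires a surrogate pair.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.Globalization

Module CodePointSketch

    ' Hypothetical helper: decodes s into Unicode code points,
    ' combining each surrogate pair into a single value.
    ' Assumes s is well-formed UTF-16 (no lone surrogates).
    Function CodePoints(s As String) As List(Of Integer)
        Dim result As New List(Of Integer)
        Dim i As Integer = 0
        While i < s.Length
            result.Add(Char.ConvertToUtf32(s, i))
            ' A surrogate pair occupies two Chars; everything else occupies one.
            If Char.IsSurrogatePair(s, i) Then
                i += 2
            Else
                i += 1
            End If
        End While
        Return result
    End Function

    Sub Main()
        ' Thai "hello" followed by U+1D11E, which lies outside the BMP.
        Dim s As String = "สวัสดี" & Char.ConvertFromUtf32(&H1D11E)
        Dim info As New StringInfo(s)

        Console.WriteLine("UTF-16 code units: " & s.Length)
        Console.WriteLine("Code points: " & CodePoints(s).Count)
        Console.WriteLine("Text elements: " & info.LengthInTextElements)
    End Sub

End Module
```

For this string, Length reports 8 code units (6 for the Thai text plus 2 for the surrogate pair), while the helper finds 7 code points. The text element count is lower still, because StringInfo groups each combining vowel mark with its base consonant; the exact grouping follows the segmentation rules of the runtime in use.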
When you run this program, you’ll see output showing the length of the string, its UTF-16 code units, and information about each character. The exact output may vary depending on the console’s ability to display Thai characters.
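If the Thai text appears as question marks, one common cause is a console that defaults to a legacy code page. A minimal sketch of one possible remedy, assuming the terminal and its font can render Thai, is to set the output encoding to UTF-8 before writing (the module name ConsoleEncodingSketch is hypothetical):

```vbnet
Imports System
Imports System.Text

Module ConsoleEncodingSketch

    Sub Main()
        ' Ask the console to emit UTF-8 so that non-ASCII text is not
        ' replaced with "?" by a legacy code page.
        Console.OutputEncoding = Encoding.UTF8
        Console.WriteLine("สวัสดี")
    End Sub

End Module
```

This changes only how the bytes are written; whether the glyphs display correctly still depends on the terminal.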
This example demonstrates how Visual Basic .NET handles Unicode strings, which is crucial for internationalization and working with non-ASCII text.