Strings and Runes in Visual Basic .NET

Our first program will demonstrate string handling and Unicode support in Visual Basic .NET. Here’s the full source code:

Imports System
Imports System.Text

Module StringsAndRunes
    Sub Main()
        ' s is a String assigned a literal value
        ' representing the word "hello" in the Thai language.
        ' Visual Basic .NET string literals are UTF-16 encoded text.
        Const s As String = "สวัสดี"

        ' This will produce the length of the string in UTF-16 code units.
        Console.WriteLine("Len: " & s.Length)

        ' This loop generates the hex values of all
        ' the UTF-16 code units in s.
        For i As Integer = 0 To s.Length - 1
            Console.Write("{0:X4} ", AscW(s(i)))
        Next
        Console.WriteLine()

        ' The Length property counts UTF-16 code units, not user-perceived
        ' characters. All Thai letters lie in the Basic Multilingual Plane,
        ' so none of them needs a surrogate pair, but the combining vowel
        ' marks are counted separately, so the result may be surprising.
        Console.WriteLine("Character count: " & s.Length)

        ' A For Each loop over a String yields one Char (UTF-16 code unit)
        ' at a time; it does not merge combining marks or surrogate pairs.
        Dim idx As Integer = 0
        For Each c As Char In s
            Console.WriteLine("U+{0:X4} '{1}' starts at {2}", AscW(c), c, idx)
            idx += 1
        Next

        Console.WriteLine(vbNewLine & "Using StringInfo")
        Dim si As New System.Globalization.StringInfo(s)
        ' ParseCombiningCharacters returns the index in s at which each
        ' text element (base character plus combining marks) starts.
        Dim starts As Integer() = System.Globalization.StringInfo.ParseCombiningCharacters(s)
        For i As Integer = 0 To si.LengthInTextElements - 1
            Dim element As String = si.SubstringByTextElements(i, 1)
            Console.WriteLine("U+{0:X4} '{1}' starts at {2}",
                              AscW(element), element, starts(i))
            ExamineChar(element(0))
        Next
    End Sub

    Sub ExamineChar(c As Char)
        ' We can compare a Char value to a character literal directly.
        If c = "t"c Then
            Console.WriteLine("found tee")
        ElseIf c = "ส"c Then
            Console.WriteLine("found so sua")
        End If
    End Sub
End Module

This program demonstrates several concepts:

  1. In Visual Basic .NET, strings are sequences of UTF-16 code units. This is different from some other languages where strings might be sequences of bytes.

  2. The Length property of a string gives the number of UTF-16 code units, not necessarily the number of visible characters.

  3. We can iterate over a string with an index-based For loop or with a For Each loop; both visit one Char (UTF-16 code unit) at a time.

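  4. The AscW function returns the numeric value of a Char, which is a single UTF-16 code unit. For characters in the Basic Multilingual Plane, including all Thai characters, this value equals the Unicode code point.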

  5. To handle user-perceived characters that span several Chars (for example, a Thai consonant followed by a combining vowel mark), we use the StringInfo class from the System.Globalization namespace.

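  6. The Char type holds a single UTF-16 code unit, so it is not a direct equivalent of Go’s rune, but it is enough for working with individual characters in the Basic Multilingual Plane. On .NET Core 3.0 and later, System.Text.Rune represents a full Unicode scalar value.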

  7. Character literals in Visual Basic .NET are denoted with a trailing c, like "ส"c.
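
The difference between code units, code points, and text elements is easiest to see side by side. The following is a minimal sketch, not part of the original program: the module name UnicodeCounts and the helper functions CountCodePoints and CountTextElements are hypothetical names introduced here for illustration. It relies only on Char.IsSurrogatePair, Char.ConvertFromUtf32, and StringInfo.LengthInTextElements.

```vbnet
Imports System
Imports System.Globalization

' Hypothetical helper module, introduced only for this illustration.
Module UnicodeCounts
    ' Counts Unicode code points by treating each valid surrogate pair as one.
    Function CountCodePoints(value As String) As Integer
        Dim count As Integer = 0
        Dim i As Integer = 0
        While i < value.Length
            ' A high surrogate followed by a low surrogate encodes one code point.
            If Char.IsSurrogatePair(value, i) Then
                i += 2
            Else
                i += 1
            End If
            count += 1
        End While
        Return count
    End Function

    ' Counts text elements (user-perceived characters) via StringInfo.
    Function CountTextElements(value As String) As Integer
        Return New StringInfo(value).LengthInTextElements
    End Function

    Sub Main()
        ' Thai "hello": every character is in the Basic Multilingual Plane.
        Dim thai As String = "สวัสดี"
        ' U+1F600 lies outside the BMP, so it is stored as a surrogate pair.
        Dim emoji As String = Char.ConvertFromUtf32(&H1F600)

        For Each sample As String In {thai, emoji}
            Console.WriteLine("code units: {0}, code points: {1}, text elements: {2}",
                              sample.Length, CountCodePoints(sample), CountTextElements(sample))
        Next
    End Sub
End Module
```

For the Thai string this should report 6 code units, 6 code points, and 4 text elements, because the two combining vowel marks attach to the consonants before them. For the emoji it should report 2 code units, 1 code point, and 1 text element.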

When you run this program, you’ll see output showing the length of the string, its UTF-16 code units, and information about each character. The exact output may vary depending on the console’s ability to display Thai characters.

This example demonstrates how Visual Basic .NET handles Unicode strings, which is crucial for internationalization and working with non-ASCII text.