Strings and Runes in COBOL

COBOL handles strings differently from most modern languages. Text is stored in fixed-length or variable-length character fields whose size is declared in the data definition. Traditional COBOL has no native notion of Unicode or UTF-8 encoding, but some modern COBOL compilers provide extensions for Unicode support.

       IDENTIFICATION DIVISION.
       PROGRAM-ID. STRING-EXAMPLE.
       ENVIRONMENT DIVISION.
       DATA DIVISION.
       WORKING-STORAGE SECTION.
       01 WS-THAI-HELLO    PIC X(18) VALUE "สวัสดี".
       01 WS-COUNTER       PIC 9(2).
       01 WS-CHAR          PIC X.

       PROCEDURE DIVISION.
       MAIN-PROCEDURE.
      *    LENGTH returns the declared size of the field in bytes.
           DISPLAY "Len: " FUNCTION LENGTH(WS-THAI-HELLO)

      *    Print each byte of the field as two hex digits.
           PERFORM VARYING WS-COUNTER FROM 1 BY 1
               UNTIL WS-COUNTER > FUNCTION LENGTH(WS-THAI-HELLO)
               MOVE WS-THAI-HELLO(WS-COUNTER:1) TO WS-CHAR
               DISPLAY FUNCTION HEX-OF(WS-CHAR) WITH NO ADVANCING
               DISPLAY " " WITH NO ADVANCING
           END-PERFORM
           DISPLAY SPACE

      *    LENGTH counts bytes, so this prints 18, not 6.
           DISPLAY "Character count: " FUNCTION LENGTH(WS-THAI-HELLO)

           STOP RUN.

In this COBOL program:

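  1. We define a string WS-THAI-HELLO containing the Thai word for “hello”. The field is declared as PIC X(18) because the six Thai characters occupy three bytes each in UTF-8. COBOL doesn’t have native support for UTF-8, so the text might not display correctly in all environments.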

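  2. We use FUNCTION LENGTH to get the length of the string in bytes. This is comparable to the len() function in many other languages, except that it reports the declared size of the field (18 here), including any trailing padding, not the length of the text stored in it.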

  3. We step through the string one byte at a time using a PERFORM VARYING loop and reference modification, WS-THAI-HELLO(WS-COUNTER:1). This is similar to the for loop in the original example.

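  4. We display the hexadecimal representation of each byte using FUNCTION HEX-OF. HEX-OF shows the bytes exactly as they are stored in the field, so the output consists of UTF-8 bytes only if the source file was saved as UTF-8; with a different source encoding the values will differ. HEX-OF is a relatively recent intrinsic function, so older compilers may not provide it.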

  5. We count the characters using FUNCTION LENGTH. Note that this counts bytes, not characters: it reports 18 for the six-character Thai word, unlike the RuneCountInString function in the original example, which counts Unicode code points.
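
The difference between the declared field size and the stored text is easiest to see with a field that is wider than its content. The following is a minimal sketch, not part of the original example: WS-GREETING and the program name are hypothetical, and it assumes a compiler that provides FUNCTION TRIM, such as GnuCOBOL.

       IDENTIFICATION DIVISION.
       PROGRAM-ID. LENGTH-EXAMPLE.
       DATA DIVISION.
       WORKING-STORAGE SECTION.
      * Hypothetical field, wider than the text stored in it.
       01 WS-GREETING      PIC X(10) VALUE "hello".

       PROCEDURE DIVISION.
       MAIN-PROCEDURE.
      *    Declared size of the field, padding included.
           DISPLAY "Field size: " FUNCTION LENGTH(WS-GREETING)
      *    Size of the content once trailing spaces are removed.
           DISPLAY "Content size: "
               FUNCTION LENGTH(FUNCTION TRIM(WS-GREETING TRAILING))
           STOP RUN.

The first DISPLAY should report 10 and the second 5, although the exact formatting of the numbers (for example, leading zeros) may depend on the compiler.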

COBOL doesn’t have a built-in concept of “runes” or Unicode code points. Handling Unicode in COBOL typically requires using specific compiler extensions or external libraries, which are beyond the scope of this basic example.
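
Even without such extensions, code points can be counted by hand at the byte level. In UTF-8, every byte in the range 128 to 191 is a continuation byte, so counting the bytes outside that range gives the number of code points. The following is a minimal sketch under stated assumptions, not a general Unicode solution: the names WS-IDX, WS-BYTE-VALUE, and WS-RUNE-COUNT are illustrative, the source file is assumed to be saved as UTF-8, and FUNCTION ORD is assumed to return the byte value plus one, as it does on ASCII-based systems.

       IDENTIFICATION DIVISION.
       PROGRAM-ID. RUNE-COUNT.
       DATA DIVISION.
       WORKING-STORAGE SECTION.
       01 WS-THAI-HELLO    PIC X(18) VALUE "สวัสดี".
       01 WS-IDX           PIC 9(2).
       01 WS-BYTE-VALUE    PIC 9(3).
       01 WS-RUNE-COUNT    PIC 9(2) VALUE 0.

       PROCEDURE DIVISION.
       MAIN-PROCEDURE.
           PERFORM VARYING WS-IDX FROM 1 BY 1
               UNTIL WS-IDX > FUNCTION LENGTH(WS-THAI-HELLO)
      *        ORD is 1-based, so subtract 1 to get the byte value
      *        (assumes an ASCII-based native character set).
               COMPUTE WS-BYTE-VALUE =
                   FUNCTION ORD(WS-THAI-HELLO(WS-IDX:1)) - 1
      *        Bytes 128..191 continue a code point; any other
      *        byte starts a new one.
               IF WS-BYTE-VALUE < 128 OR WS-BYTE-VALUE > 191
                   ADD 1 TO WS-RUNE-COUNT
               END-IF
           END-PERFORM
           DISPLAY "Rune count: " WS-RUNE-COUNT
           STOP RUN.

For the Thai word above, this should count six code points where FUNCTION LENGTH reports 18 bytes. The sketch works here because the text fills the field exactly; in a wider field, each trailing space would also be counted as a code point unless the loop stops at the trimmed length.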

To compile and run this COBOL program with GnuCOBOL:

$ cobc -x string-example.cob
$ ./string-example
Len: 18
E0 B8 AA E0 B8 A7 E0 B8 B1 E0 B8 AA E0 B8 94 E0 B8 B5 
Character count: 18

Note that the output may vary depending on your COBOL compiler and system encoding. The hexadecimal values shown here are the UTF-8 encoding of the Thai text, so they assume the source file was saved as UTF-8.

COBOL’s string handling is quite different from that of many modern languages, especially when it comes to Unicode support: lengths and positions are measured in bytes of a declared field, not in characters.