Character classification

A GLS locale groups the characters of a code set into character classes. Each class contains characters that have a related purpose.

The contents of a character class can be language specific. For example, the lower class contains all alphabetic lowercase characters in a code set. In the default locale, the default code set groups the English characters a through z into the lower class, but it also includes lowercase characters such as á, è, î, õ, and ü.

The default code set on UNIX platforms is ISO8859-1. The default code set for Windows environments is Microsoft 1252.

For more information about the default locale and the default code set, see the HCL Informix GLS User's Guide.

The LC_CTYPE category of a GLS locale file defines the following character classes.
Character class Contains
alpha Alphabetic characters:
  • Single-byte alphabetic characters a through z and A through Z
  • Any single-byte non-English characters that the locale defines
  • Any multibyte alphabetic or digit characters that the locale defines
This class includes characters in the lower and upper classes.
lower Lowercase alphabetic characters:
  • Single-byte alphabetic characters a through z
  • Any single-byte non-English lowercase characters that the locale defines
  • Any multibyte lowercase characters that the locale defines
No characters in this class are in the upper class.
upper Uppercase alphabetic characters:
  • Single-byte alphabetic characters A through Z
  • Any single-byte non-English uppercase characters that the locale defines
  • Any multibyte uppercase alphabetic characters that the locale defines
No characters in this class are in the lower class.
digit Single-byte decimal digits 0 through 9
xdigit Hexadecimal digits:
  • Single-byte numeric digits 0 through 9
  • Single-byte representations of hexadecimal digits a through f and A through F
This class includes characters in the digit class.
alnum All characters in both the alpha and digit classes.
blank Horizontal white space:
  • Single-byte horizontal-space characters:
    • space (ASCII 0x020)
    • tab (ASCII 0x009)
  • Any multibyte horizontal-space characters that the locale defines
space Horizontal and vertical white space:
  • Single-byte horizontal-space characters as defined in the blank class
  • Single-byte vertical-space characters: new line, vertical tab, form feed, carriage return
  • Any multibyte vertical-space characters that the locale defines
This class includes characters in the blank class.
cntrl Control characters:
  • Single-byte control characters: ASCII 0x000 to 0x01F
  • Any other control characters that the locale defines
graph Graphical characters are all characters that have visual representation. This class includes characters in the alpha, lower, upper, digit, xdigit, and punct classes.
punct Punctuation:
  • Single-byte punctuation characters:

    ! @ # $ % ^ & * ( ) - = + \ | ` ~ [ ] { } ; : ' " , . ? < >

  • Any non-ASCII punctuation characters that the locale defines
print All printable characters
This class includes characters in the alpha, lower, upper, digit, xdigit, graph, and punct classes.

Your application must not assume which characters belong to a particular character class. For example, it must not contain code such as the following to determine whether a character is lowercase:
if ( one_char >= 'a' && one_char <= 'z' )
Instead, use functions in the HCL Informix® GLS library to identify the class of a particular character. The following table lists the GLS character classes and the Informix GLS functions that test for these classes for both multibyte and wide characters.
Table 1. Informix GLS character-class functions
Character class          Multibyte-character function   Wide-character function
alnum (alpha or digit)   ifx_gl_ismalnum()              ifx_gl_iswalnum()
alpha                    ifx_gl_ismalpha()              ifx_gl_iswalpha()
lower                    ifx_gl_ismlower()              ifx_gl_iswlower()
upper                    ifx_gl_ismupper()              ifx_gl_iswupper()
blank                    ifx_gl_ismblank()              ifx_gl_iswblank()
space                    ifx_gl_ismspace()              ifx_gl_iswspace()
digit                    ifx_gl_ismdigit()              ifx_gl_iswdigit()
xdigit                   ifx_gl_ismxdigit()             ifx_gl_iswxdigit()
cntrl                    ifx_gl_ismcntrl()              ifx_gl_iswcntrl()
graph                    ifx_gl_ismgraph()              ifx_gl_iswgraph()
punct                    ifx_gl_ismpunct()              ifx_gl_iswpunct()
print                    ifx_gl_ismprint()              ifx_gl_iswprint()
These Informix GLS functions check the LC_CTYPE category of the current locale to determine whether a specified character belongs to the corresponding character class. The following code fragment uses the ifx_gl_ismlower() function to determine whether a multibyte character is lowercase:
if ( ifx_gl_ismlower(one_char, char_size) )
The Informix GLS functions in Table 1 do not return a unique value if they encounter an error. To detect an error, initialize the ifx_gl_lc_errno() error number to 0 before you call one of these functions, and then call ifx_gl_lc_errno() immediately after you call the function. For example, the following code fragment performs error checking for the ifx_gl_ismlower() function:
/* Initialize the error number */
ifx_gl_lc_errno() = 0;

/* Determine if 'mb' character is lowercase */
value = ifx_gl_ismlower(mb, mb_size);

/* If the error number has changed, ifx_gl_ismlower() has
 * set it to indicate the cause of an error */
if ( ifx_gl_lc_errno() != 0 ) {
    /* Handle error */
} else if ( value != 0 ) {
    /* Character 'mb' is in lower class */
} else {
    /* Character 'mb' is NOT in lower class */
}

Copyright© 2019 HCL Technologies Limited