canonical_maps index parameter

You can index synonyms by creating a canonical map. You specify canonical mapping strings with the canonical_maps index parameter when you create the bts index.

canonical_maps index parameter

                              .-,-------------------------------------------------------.
                              V                                                         |
|--canonical_maps--=--"--+-(----+-{original_char}-----------------+--:--{mapped_string}-+--)-+--"--|
                         |      '-{[original_char original_char]}-'                          |
                         +-file:directory/filename-------------------------------------------+
                         '-table:table.column------------------------------------------------'

Element          Description
column           The name of the column that contains canonical mapping strings.
directory        The directory path for the canonical mapping file.
filename         The name of the file that contains canonical mapping strings.
table            The name of the table that contains the column with canonical mapping strings.
original_char    The characters to replace with a mapped string during indexing and searching.
mapped_string    The characters that replace the original characters during indexing. If the mapped_string is empty, the original characters are not indexed.
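
As a minimal sketch of the table form, the following statement assumes a hypothetical table named canon_tab whose map_str column already contains the canonical mapping strings. Both of those names are illustrative assumptions. The index, table, and sbspace names are borrowed from the examples in this topic.

```sql
-- Hypothetical names: canon_tab (table) and map_str (column that holds the mapping strings).
CREATE INDEX docs_idx ON repository
    (document_text bts_lvarchar_ops)
    USING bts
    (canonical_maps="table:canon_tab.map_str")
    IN mysbspace;
```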

Usage

Use canonical maps to improve the accuracy of queries by equating characters with a canonical representation of those characters. For example, you can specify that a letter with a diacritical mark is indexed without its diacritical mark. You can also normalize strings that tend to be inconsistent or delete character strings from indexed text.

You can update your canonical map by re-creating the bts index.
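
For instance, to add a mapping for ú to an index that already maps ù and æ, you might drop the index and create it again with the extended map. This is only a sketch; it reuses the docs_idx, repository, and mysbspace names from the examples in this topic.

```sql
-- Remove the index that was built with the old canonical map.
DROP INDEX docs_idx;

-- Create the index again with the revised canonical map ({ú}:{u} is the new entry).
CREATE INDEX docs_idx ON repository
    (document_text bts_lvarchar_ops)
    USING bts
    (canonical_maps="({ù}:{u},{ú}:{u},{æ}:{ae})")
    IN mysbspace;
```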

During indexing and searching, all characters are transformed to lowercase by default. Therefore, any uppercase characters in the original characters must be mapped to lowercase characters in the mapped string. For some locales, the uppercase characters of letters with diacritical marks or ligatures are considered independent characters from their lowercase equivalents. For those locales, you must map both the uppercase and the lowercase characters with diacritical marks or ligatures to the same lowercase letter. You cannot specify an uppercase letter in a mapped string.
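
As an illustration of this rule, assuming a locale that treats Ù and ù (and Æ and æ) as independent characters, the following sketch lists both cases as original characters and maps them to lowercase strings. The index, table, and sbspace names are reused from the examples in this topic.

```sql
-- Both the uppercase and lowercase originals are listed in brackets.
-- The mapped strings (u and ae) contain lowercase letters only.
CREATE INDEX docs_idx ON repository
    (document_text bts_lvarchar_ops)
    USING bts
    (canonical_maps="({[Ùù]}:{u},{[Ææ]}:{ae})")
    IN mysbspace;
```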

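Blank spaces in mapping strings are significant: a blank space inside a set of braces is treated as part of the original characters or of the mapped string.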

The mapped characters are both indexed and searched. Therefore, when results are returned, words that contain the original characters are treated as if those characters were the same as their corresponding mapped characters. For example, if you map the character ù to the letter u, then both Raùl and Raul are indexed as raul. Similarly, if you search for Raùl or for Raul, all rows that contain either Raùl or Raul are returned.
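
For illustration, assume that a bts index with the mapping {ù}:{u} already exists on the document_text column of the repository table (both names are taken from the examples in this topic). A search with the bts_contains() search predicate might then look like the following sketch, in which either query should return the same rows:

```sql
-- Search with the unaccented spelling.
SELECT * FROM repository
    WHERE bts_contains(document_text, 'Raul');

-- Search with the accented spelling; the query term is mapped in the same way.
SELECT * FROM repository
    WHERE bts_contains(document_text, 'Raùl');
```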

If you want to prevent symbols from being indexed, consider how they are being used. For example, if you delete the forward slash character (/) with the mapping {/}:{}, then the string /home/john/henry is indexed as homejohnhenry.

Example: Map characters as inline comma-separated strings

The following example shows how to create an index that specifies two character substitutions:
CREATE INDEX docs_idx on repository
    (document_text bts_lvarchar_ops)
    USING bts
     (canonical_maps="({ù}:{u},{æ}:{ae})")
    IN mysbspace;

Example: Map characters as a file

The following example illustrates a file of character mappings. Some mapped characters have multiple original characters. This example assumes the locale en_us.8859-1, which does not designate uppercase letters that have diacritical marks as uppercase. Therefore, both uppercase and lowercase versions of letters are included in the original characters.

{[Ææ]}:{ae},

{[Œœ]}:{oe},

{[Ññ]}:{ny},

{[ÀÁÂÄÃÅàáâäãå]}:{a},

{[ÈÉÊËèéêë]}:{e},

{[ÌÍÎÏìíîï]}:{i},

{[ÒÓÔÖÕòóôöõ]}:{o},

{[ÙÚÛÜùúûü]}:{u},

{[Çç]}:{c},

{[Øø]}:{o},

{[Ýý]}:{y},

{ß}:{ss},

{mc }:{mc}
The following example shows how to create an index that specifies a mapping file named canon:
CREATE INDEX docs_idx on repository
    (document_text bts_lvarchar_ops)
    USING bts
     (canonical_maps="file:/tmp/canon")
    IN mysbspace;

Example: Map single characters

The following example illustrates how to map a single character to another character. The original character is in the first set of braces and the character to map it to is in the second set of braces.

The following example maps the single character ù to the single character u:
{ù}:{u}

Example: Specify multiple original characters

The following examples illustrate how to specify multiple original characters. You can put multiple characters in a set of brackets and enclose the brackets in braces. Do not put a blank space between the characters inside the brackets. Otherwise, the blank space is treated as an original character, and every blank space in the text is indexed as the mapped string.

The following example maps both ù and ú to the letter u:

{[ùú]}:{u}  
The following example also maps both ù and ú to the letter u, but it uses two sets of mapping strings that are separated by a comma:
{ù}:{u},{ú}:{u}

Example: Specify multiple characters in mapping strings

A mapped string can contain more than one character. The following example maps the single character æ to the two letters ae:
{æ}:{ae}

Example: Prevent the indexing of characters

The following example prevents the indexing of the characters 's by specifying empty braces for the mapped string:
{'s}:{}

Example: Manage multiple spellings

The following example illustrates how to manage multiple spellings by mapping the possible strings to each other. For example, if you want to search for the name McHenry and you know that the indexed name might be spelled as either mchenry or mc henry, your query string is:

 'mchenry OR "mc henry"'
Alternatively, you can map the two prefixes:
{mc }:{mc}
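
With this mapping, the string mc followed by a blank space is indexed as mc, so mc henry and mchenry should both be indexed as mchenry. As a sketch that reuses the index, table, and sbspace names from the earlier examples in this topic, the index and a single-term query might look like this:

```sql
-- The blank space after "mc" in the original characters is significant.
CREATE INDEX docs_idx ON repository
    (document_text bts_lvarchar_ops)
    USING bts
    (canonical_maps="({mc }:{mc})")
    IN mysbspace;

-- One search term is intended to match both spellings, so no OR clause is needed.
SELECT * FROM repository
    WHERE bts_contains(document_text, 'mchenry');
```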

Copyright© 2019 HCL Technologies Limited