canonical_maps index parameter
You can index synonyms by creating a canonical map. You specify canonical mapping strings with the canonical_maps index parameter when you create the bts index.
Syntax of the canonical_maps index parameter (one of the following forms):
canonical_maps="(mapping,mapping,...)"
canonical_maps="file:directory/filename"
canonical_maps="table:table.column"
Each mapping in the comma-separated list takes one of the following forms:
{original_char}:{mapped_string}
{[original_char original_char]}:{mapped_string}
Element | Description |
---|---|
column | The name of the column that contains canonical mapping strings. |
directory | The directory path for the canonical mapping file. |
filename | The name of the file that contains canonical mapping strings. |
table | The name of the table that contains the column with canonical mapping strings. |
original_char | The characters to replace with a mapped string during indexing and searching. |
mapped_string | The characters that replace the original characters during indexing. If the mapped_string is empty, the original characters are not indexed. |
Usage
Use canonical maps to improve the accuracy of queries by equating characters with a canonical representation of those characters. For example, you can specify that a letter with a diacritical mark is indexed without its diacritical mark. You can also normalize strings that tend to be inconsistent or delete character strings from indexed text.
You can update your canonical map by re-creating the bts index.
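As a rough sketch of that update, the statements below drop and re-create the index with an extended map. The index, table, column, and sbspace names are borrowed from the examples in this topic, and the added {ú}:{u} mapping is purely illustrative.

```sql
-- Drop the existing bts index so that it can be rebuilt with a new map.
DROP INDEX docs_idx;

-- Re-create the index with the updated canonical map.
CREATE INDEX docs_idx ON repository
  (document_text bts_lvarchar_ops)
  USING bts
  (canonical_maps="({ù}:{u},{ú}:{u},{æ}:{ae})")
  IN mysbspace;
```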
During indexing and searching, all characters are transformed to lowercase by default. Therefore, any uppercase characters in the original characters must be mapped to lowercase characters in the mapping string. For some locales, the uppercase characters of letters with diacritical marks or ligatures are considered independent characters from their lowercase equivalents. For those locales, you must map both the uppercase and the lowercase characters with diacritical marks or ligatures to the same lowercase letter. You cannot specify an uppercase letter in a mapped string.
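Blank spaces in mapping strings are significant. For example, the mapping {mc }:{mc} replaces the letters mc only when a blank space follows them.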
The mapped characters are indexed and searched. Therefore, when results are returned, words with the original characters are treated as if those characters are the same as their corresponding mapped characters. For example, if you map the character ù to the letter u, then both Raùl and Raul are indexed as raul. Similarly, if you search for Raùl or for Raul, all rows that contain either Raùl or Raul are returned.
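A query against such an index might look like the following sketch. It assumes the repository table and document_text column from the examples in this topic, a bts index created with the {ù}:{u} mapping, and the bts_contains() search predicate. The select list is illustrative.

```sql
-- Both queries are expected to return the same rows, because the search
-- term is mapped to raul before it is matched against the index.
SELECT * FROM repository
  WHERE bts_contains(document_text, 'Raùl');

SELECT * FROM repository
  WHERE bts_contains(document_text, 'Raul');
```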
If you want to prevent symbols from being indexed, consider how they are being used. For example, if you delete the forward slash character (/) with the mapping {/}:{}, then the string /home/john/henry is indexed as homejohnhenry.
Example: Map characters as inline comma-separated strings
CREATE INDEX docs_idx on repository
(document_text bts_lvarchar_ops)
USING bts
(canonical_maps="({ù}:{u},{æ}:{ae})")
IN mysbspace;
Example: Map characters as a file
The following example illustrates a file of character mappings. Some mapped characters have multiple original characters. This example assumes the locale en_us.8859-1, which does not designate uppercase letters that have diacritical marks as uppercase. Therefore, both uppercase and lowercase versions of letters are included in the original characters.
{Ææ}:{ae},
{Œœ}:{oe},
{Ññ}:{ny},
{[ÀÁÂÄÃÅàáâäãå]}:{a},
{[ÈÉÊËèéêë]}:{e},
{[ÌÍÎÏìíîï]}:{i},
{[ÒÓÔÖÕòóôöõ]}:{o},
{[ÙÚÛÜùúûü]}:{u},
{Çç}:{c},
{Øø}:{o},
{Ýý}:{y},
{ß}:{ss},
{mc }:{mc}
The following statement creates a bts index that reads its canonical map from the file /tmp/canon:
CREATE INDEX docs_idx on repository
(document_text bts_lvarchar_ops)
USING bts
(canonical_maps="file:/tmp/canon")
IN mysbspace;
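The syntax also allows the mappings to come from a table column. This topic gives no example of that form, so the following is only a sketch: the table name canon_maps, the column name map_string, and the column type are assumptions, as is storing one mapping string per row.

```sql
-- Hypothetical table that holds canonical mapping strings.
CREATE TABLE canon_maps (map_string LVARCHAR(255));

INSERT INTO canon_maps VALUES ('{ù}:{u}');
INSERT INTO canon_maps VALUES ('{æ}:{ae}');

-- Reference the column with the table:table.column form.
CREATE INDEX docs_idx ON repository
  (document_text bts_lvarchar_ops)
  USING bts
  (canonical_maps="table:canon_maps.map_string")
  IN mysbspace;
```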
Example: Map single characters
The following example illustrates how to map a single character to another character. The original character is in the first set of braces and the character to map it to is in the second set of braces.
{ù}:{u}
Example: Specify multiple original characters
The following examples illustrate how to specify multiple original characters. You can put multiple characters in a set of brackets and enclose the brackets in braces. Do not put a blank space between the characters inside the brackets. Otherwise, every blank space in the text is indexed as the mapped string.
The following example maps both ù and ú to the letter u:
{[ùú]}:{u}
The following example has the same effect but lists a separate mapping for each character:
{ù}:{u},{ú}:{u}
Example: Specify multiple characters in mapping strings
The following example maps the ligature æ to the two-character string ae:
{æ}:{ae}
Example: Prevent the indexing of characters
The following example prevents the string 's from being indexed by mapping it to an empty string:
{'s}:{}
Example: Manage multiple spellings
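The following example illustrates how to manage multiple spellings by mapping the variant spellings to a single form. For example, if you want to search for the name McHenry and you know that the indexed name might be spelled as either mchenry or mc henry, then without a canonical map your query string is:
'mchenry OR "mc henry"'
If you specify the following mapping, both spellings are indexed as mchenry, and the query string can be reduced to 'mchenry':
{mc }:{mc}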
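With a mapping such as {mc }:{mc} in the index, a single-term search should be enough. A minimal sketch, assuming the repository table and document_text column from the earlier examples and the bts_contains() search predicate:

```sql
-- "mc henry" is indexed as mchenry, so one term covers both spellings.
SELECT * FROM repository
  WHERE bts_contains(document_text, 'mchenry');
```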