International/national Character Set Support:

The interBasis search engine supports a wide range of national and international character sets. A single database may contain documents using different character sets.

By default, the character set is chosen to match the locale or, where applicable, the IANA name:
Id        IANA Name     Locale Name   Official Name
ascii     US-ASCII      C             ANSI_X3.4-1968
8859-1    ISO-8859-1    iso_8859_1    ISO_8859-1:1987
8859-2    ISO-8859-2    iso_8859_2    ISO_8859-2:1987
8859-3    ISO-8859-3    iso_8859_3    ISO_8859-3:1988
8859-4    ISO-8859-4    iso_8859_4    ISO_8859-4:1988
8859-5    ISO-8859-5    iso_8859_5    ISO_8859-5:1988
8859-6    ISO-8859-6    iso_8859_6    ISO_8859-6:1987
8859-7    ISO-8859-7    iso_8859_7    ISO_8859-7:1987
8859-8    ISO-8859-8    iso_8859_8    ISO_8859-8:1988
8859-9    ISO-8859-9    iso_8859_9    ISO_8859-9:1989

If no locale is specified and no character set information is available (e.g. no MIME header), the ISO 8859-1 set is used (see note 1).
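
The fallback rule above can be sketched in a few lines of Python. This is an illustration only, not interBasis code: the function name resolve_charset, its parameters, and the two lookup tables are hypothetical, the tables hold only a subset of the rows listed above, and the assumption that a MIME charset takes precedence over the locale is ours, since the text does not state the order.

```python
# Hypothetical sketch of the default-charset rule; not interBasis code.

# IANA name -> Id (subset of the table above).
IANA_TO_ID = {
    "US-ASCII": "ascii",
    "ISO-8859-1": "8859-1",
    "ISO-8859-2": "8859-2",
    "ISO-8859-5": "8859-5",
}

# Locale name -> Id (subset of the table above).
LOCALE_TO_ID = {
    "C": "ascii",
    "iso_8859_1": "8859-1",
    "iso_8859_2": "8859-2",
    "iso_8859_5": "8859-5",
}

FALLBACK_ID = "8859-1"  # used when neither source yields a known charset


def resolve_charset(mime_charset=None, locale_name=None):
    """Return a charset Id: MIME charset first (assumed order), then locale, then fallback."""
    if mime_charset:
        found = IANA_TO_ID.get(mime_charset.upper())  # IANA names are case-insensitive
        if found:
            return found
    if locale_name:
        found = LOCALE_TO_ID.get(locale_name)
        if found:
            return found
    return FALLBACK_ID
```

Called with no arguments, the sketch falls through to the ISO 8859-1 Id, which mirrors the "no locale, no MIME header" case described above.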

The extended list is available via index-time options:
Id        Description
ascii     7-bit ASCII
8859-1    ISO 8859-1 (Latin-1)
8859-2    ISO 8859-2 (Latin-2)
8859-3    ISO 8859-3 (Latin-3)
8859-4    ISO 8859-4 (Latin-4; obsolete)
8859-5    ISO 8859-5 (Part 5, Cyrillic)
8859-6    ISO 8859-6 (Part 6, Arabic)
8859-7    ISO 8859-7 (Part 7, Greek)
8859-8    ISO 8859-8 (Part 8, Hebrew)
8859-9    ISO 8859-9 (Latin-5)
8859-10   ISO 8859-10 (Latin-6)
usmarc    USMARC ANSEL (Extended Latin)
koi8      KOI-8 (GOST 19769-74)
ucode     Russian U-code
cp866     IBM PC: CP 866 (Russian)
av        Alternativnyj Variant
cp1251    IBM PC: CP 1251 (Russian)
ov        Osnovnoj Variant
sf1       ISO-646: Finnish/Swedish SF-1 variant
sf2       ISO-646: Finnish/Swedish SF-2 variant (recommended)
tis       Thai+ASCII (TIS 620-1986)
viet1     Vietnamese VSCII-1 (1993)
viet2     Vietnamese VSCII-2 (1993)
viscii    Vietnamese VISCII 1.1 (1992)
cp437     IBM PC: CP 437
cp850     IBM PS/2: CP 850 (Multilingual)
apple     Macintosh Standard Roman character set
next      NEXTSTEP character set
atari     ATARI-ST character set
LATEX     LaTeX character encodings for European characters (Latin-1)
HTML      HTML including many entity extensions
UTF-8     File System Safe Universal Transformation Format (UCS-2)
unicode   Universal Character Set (16-bit)
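
To see why the Id given at index time matters, the following Python sketch decodes the same byte under several of the character sets listed above. It is not part of interBasis: the function decode_with_id and the mapping from Ids to Python codec names are assumptions made for illustration, and only Ids with an obvious counterpart in Python's standard codecs are included.

```python
# Hypothetical mapping from a few Ids above to Python standard codec names.
ID_TO_PYTHON_CODEC = {
    "ascii": "ascii",
    "8859-1": "latin_1",
    "8859-5": "iso8859_5",
    "8859-7": "iso8859_7",
    "cp866": "cp866",
    "cp1251": "cp1251",
    "cp437": "cp437",
    "cp850": "cp850",
}


def decode_with_id(raw, charset_id):
    """Decode raw bytes using the Python codec assumed to match the given Id."""
    try:
        codec = ID_TO_PYTHON_CODEC[charset_id]
    except KeyError:
        raise ValueError("no codec mapping for Id: " + charset_id)
    return raw.decode(codec)


if __name__ == "__main__":
    sample = b"\xe4"  # one byte, read differently by each character set
    for charset_id in ("8859-1", "8859-5", "cp1251"):
        print(charset_id, repr(decode_with_id(sample, charset_id)))
```

The single byte 0xE4 comes out as a Latin letter under 8859-1 and as two different Cyrillic letters under 8859-5 and cp1251, which is why a document indexed under the wrong Id yields wrong search terms.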

Support for over 50 additional character sets, including many Asian and Oriental languages, is in preparation.

1) The ISO 8859-1 (Latin-1) character set is the common Unix default in the US and Europe.


© Copyright 1995-1997   Basis Systeme netzwerk, Munich. All Rights Reserved.