The interBasis search engine supports a wide range of national and international character sets. A single database may contain documents using different character sets.
By default, the charset is chosen to match the locale name or, where applicable, the IANA name.

| Id | IANA Name | Locale Name | Official Name |
|---|---|---|---|
| ascii | US-ASCII | C | ANSI_X3.4-1968 |
| 8859-1 | ISO-8859-1 | iso_8859_1 | ISO_8859-1:1987 |
| 8859-2 | ISO-8859-2 | iso_8859_2 | ISO_8859-2:1987 |
| 8859-3 | ISO-8859-3 | iso_8859_3 | ISO_8859-3:1988 |
| 8859-4 | ISO-8859-4 | iso_8859_4 | ISO_8859-4:1988 |
| 8859-5 | ISO-8859-5 | iso_8859_5 | ISO_8859-5:1988 |
| 8859-6 | ISO-8859-6 | iso_8859_6 | ISO_8859-6:1987 |
| 8859-7 | ISO-8859-7 | iso_8859_7 | ISO_8859-7:1987 |
| 8859-8 | ISO-8859-8 | iso_8859_8 | ISO_8859-8:1988 |
| 8859-9 | ISO-8859-9 | iso_8859_9 | ISO_8859-9:1989 |
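
The Id-to-IANA mapping in the table above can be sketched as a simple lookup. This is only an illustration in Python, not part of interBasis: the helper names `iana_name` and `decode` are hypothetical, and the sketch assumes that Python's standard codecs accept the IANA names listed in the table.

```python
# Id -> IANA name, copied from the table above.
ID_TO_IANA = {
    "ascii": "US-ASCII",
    "8859-1": "ISO-8859-1",
    "8859-2": "ISO-8859-2",
    "8859-3": "ISO-8859-3",
    "8859-4": "ISO-8859-4",
    "8859-5": "ISO-8859-5",
    "8859-6": "ISO-8859-6",
    "8859-7": "ISO-8859-7",
    "8859-8": "ISO-8859-8",
    "8859-9": "ISO-8859-9",
}


def iana_name(charset_id: str) -> str:
    """Return the IANA name for an Id from the table (hypothetical helper)."""
    try:
        return ID_TO_IANA[charset_id]
    except KeyError:
        raise ValueError(f"unknown charset id: {charset_id!r}") from None


def decode(data: bytes, charset_id: str) -> str:
    """Decode raw document bytes using the charset named by a table Id."""
    # Assumption: Python's codec registry recognizes the IANA name.
    return data.decode(iana_name(charset_id))


if __name__ == "__main__":
    print(iana_name("8859-7"))
    print(decode(b"caf\xe9", "8859-1"))
```

An unknown Id raises `ValueError` rather than silently falling back to a default, so a mistyped Id is caught early.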

The extended list is available via index-time options:

| Id | Description |
|---|---|
| ascii | 7-bit ASCII |
| 8859-1 | ISO 8859-1 (Latin-1) |
| 8859-2 | ISO 8859-2 (Latin-2) |
| 8859-3 | ISO 8859-3 (Latin-3) |
| 8859-4 | ISO 8859-4 (Latin-4; obsolete) |
| 8859-5 | ISO 8859-5 (Part 5, Cyrillic) |
| 8859-6 | ISO 8859-6 (Part 6, Arabic) |
| 8859-7 | ISO 8859-7 (Part 7, Greek) |
| 8859-8 | ISO 8859-8 (Part 8, Hebrew) |
| 8859-9 | ISO 8859-9 (Latin-5) |
| 8859-10 | ISO 8859-10 (Latin-6) |
| usmarc | USMARC ANSEL (Extended Latin) |
| koi8 | KOI-8 (GOST 19768-74) |
| ucode | Russian U-code |
| cp866 | IBM PC: CP 866 (Russian) |
| av | Alternativnyj Variant |
| cp1251 | MS Windows: CP 1251 (Russian) |
| ov | Osnovnoj Variant |
| sf1 | ISO-646: Finnish/Swedish SF-1 variant |
| sf2 | ISO-646: Finnish/Swedish SF-2 variant (recommended) |
| tis | Thai+ASCII (TIS 620-1986) |
| viet1 | Vietnamese VSCII-1 (1993) |
| viet2 | Vietnamese VSCII-2 (1993) |
| viscii | Vietnamese VISCII 1.1 (1992) |
| cp437 | IBM PC: CP 437 |
| cp850 | IBM PS/2: CP 850 (Multilingual) |
| apple | Macintosh Standard Roman character set |
| next | NEXTSTEP character set |
| atari | ATARI-ST character set |
| LATEX | LaTeX character encodings for European characters (Latin-1) |
| HTML | HTML including many entity extensions |
| UTF-8 | File System Safe Universal Transformation Format (UCS-2) |
| unicode | Universal Character Set (16-bit) |
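
Because one database may hold documents in different character sets, the same bytes can mean different text depending on the charset declared for a document. The sketch below illustrates this with Python's standard codecs. It does not use any interBasis API, and the mapping from table Ids to Python codec names is an assumption made for the example.

```python
# Python codec names assumed to correspond to three Ids from the table.
CODECS = {
    "8859-5": "iso8859_5",
    "cp866": "cp866",
    "cp1251": "cp1251",
}


def decode_all(data: bytes) -> dict[str, str]:
    """Decode the same bytes under each codec.

    Bytes that a codec cannot decode are replaced with U+FFFD.
    """
    return {
        charset_id: data.decode(codec, errors="replace")
        for charset_id, codec in CODECS.items()
    }


text = "поиск"  # Russian for "search"
raw = text.encode("cp866")  # the bytes as they would be stored on disk
results = decode_all(raw)

if __name__ == "__main__":
    for charset_id, decoded in results.items():
        print(f"{charset_id:8} {decoded!r}")
```

Only the `cp866` entry reproduces the original word. The other two codecs decode the bytes without error but yield unrelated characters, which is why the charset has to be stated correctly for each document.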

Support for over 50 additional character sets, including many Asian and Oriental languages, is in preparation.

1) The ISO 8859-1 (Latin-1) character set is the common Unix default in the US and Europe.