Textual File Handling
- Last Updated: Jan 09, 2024
By default, all sequential text files read by Marine are expected to be in UTF8 format with a Byte Order Mark (BOM) present to identify them. Similarly, all sequential text files written by Marine are, by default, in Unicode UTF8 format with a BOM present.
The following Marine environment variables are available for users to modify how Marine handles sequential text files:
CADC_LANG specifies the file encoding for reading Marine external files that do not have an expected Unicode BOM present. If the variable is unset, the encoding defaults to LATIN1. Files with the following Unicode BOMs are translated to UTF8 on reading: UTF16 little-endian, UTF16 big-endian, UTF32 little-endian, UTF32 big-endian.
CADC_LANG_NEW specifies the file encoding for new files written by Marine. If the variable is unset, the encoding defaults to Unicode UTF8 with a BOM present.
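The read behaviour described above can be illustrated with a short Python sketch. This is not Marine's implementation; it only mimics the documented rule: a recognised Unicode BOM decides the encoding, and otherwise the value of CADC_LANG is used, falling back to LATIN1 when it is unset. The function name `decode_text` and the mapping `FALLBACK_CODECS` (which covers only a few CADC_LANG values) are hypothetical names introduced for this example.

```python
import codecs
import os

# BOMs are checked longest-first: the UTF32 little-endian BOM (FF FE 00 00)
# begins with the same two bytes as the UTF16 little-endian BOM (FF FE).
BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]

# Assumption: a partial, illustrative mapping from CADC_LANG values to
# Python codec names. It is not taken from the Marine documentation.
FALLBACK_CODECS = {
    "LATIN1": "iso8859-1",
    "LATIN2": "iso8859-2",
    "CP932": "cp932",
    "CP1252": "cp1252",
}


def decode_text(data, cadc_lang=None):
    """Decode raw file bytes following the documented read rule."""
    # A recognised BOM takes priority over any environment setting.
    for bom, codec in BOMS:
        if data.startswith(bom):
            return data[len(bom):].decode(codec)
    # No BOM: use the explicit value, then CADC_LANG, then LATIN1.
    name = cadc_lang or os.environ.get("CADC_LANG") or "LATIN1"
    return data.decode(FALLBACK_CODECS[name])
```

For example, `decode_text(b"caf\xe9", "LATIN1")` decodes a BOM-less Latin-1 file, while bytes that start with a UTF16 or UTF32 BOM are decoded as Unicode regardless of the CADC_LANG value.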
The following encodings are currently supported:
| Group | Encoding | Description |
|---|---|---|
| Unicode | UTF8 | Unicode UTF8 |
| | UTF16LE | UTF16 little-endian |
| | UTF16BE | UTF16 big-endian |
| | UTF32LE | UTF32 little-endian |
| | UTF32BE | UTF32 big-endian |
| ISO | LATIN1 | ISO8859-1 |
| | LATIN2 | ISO8859-2 |
| | LATIN5 | ISO8859-5 Cyrillic |
| Windows code page | CP932 | Japanese shift-JIS |
| | CP936 | Simplified Chinese GBK |
| | CP949 | Korean |
| | CP950 | Traditional Chinese Big5 |
| | CP1250 | Central European |
| | CP1251 | Cyrillic |
| | CP1252 | LATIN1 + some extras (beware) |
| For backwards compatibility with legacy PDMS Projects | JAPANESE | Japanese shift-JIS |
| | CHINESE | Simplified Chinese (EUC) |
| | KOREAN | Korean (EUC) |
| | TCHINESE | Traditional Chinese (used in Taiwan for example) (EUC) |
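When preparing a text file outside Marine, it can help to see what the bytes look like for a given encoding name. The Python sketch below is only an illustration: the mapping from the names in the table above to Python codec names is an assumption, the legacy PDMS names are left out, and prepending a BOM for every Unicode encoding is likewise assumed (the text above confirms a BOM only for the default UTF8 case). `ENCODINGS` and `encode_text` are hypothetical names.

```python
import codecs

# Assumed mapping: table name -> (Python codec, BOM to prepend).
# b"" means no BOM is written for that encoding.
ENCODINGS = {
    "UTF8": ("utf-8", codecs.BOM_UTF8),
    "UTF16LE": ("utf-16-le", codecs.BOM_UTF16_LE),
    "UTF16BE": ("utf-16-be", codecs.BOM_UTF16_BE),
    "UTF32LE": ("utf-32-le", codecs.BOM_UTF32_LE),
    "UTF32BE": ("utf-32-be", codecs.BOM_UTF32_BE),
    "LATIN1": ("iso8859-1", b""),
    "LATIN2": ("iso8859-2", b""),
    "LATIN5": ("iso8859-5", b""),
    "CP932": ("cp932", b""),
    "CP936": ("gbk", b""),
    "CP949": ("cp949", b""),
    "CP950": ("cp950", b""),
    "CP1250": ("cp1250", b""),
    "CP1251": ("cp1251", b""),
    "CP1252": ("cp1252", b""),
}


def encode_text(text, name="UTF8"):
    """Encode text under the given table name; UTF8 with a BOM by default."""
    codec, bom = ENCODINGS[name]
    return bom + text.encode(codec)
```

The "beware" note against CP1252 shows up directly here: `encode_text("€", "CP1252")` produces the single byte 0x80, whereas `encode_text("€", "LATIN1")` raises `UnicodeEncodeError` because ISO8859-1 has no euro sign.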