Character Encoding Settings
In addition to the two most commonly used character encodings (or "encodings" for short), US-ASCII and ISO-8859-1 (West European, Latin 1), LISTSERV Maestro supports various other encodings. This section outlines the encodings that are available. If you are not familiar with encodings and their usage, first read A Short Introduction to Encodings below.
LISTSERV Maestro lets you choose among the encodings listed in the table below for encoding email messages. Each encoding has its advantages and disadvantages, so consider carefully which one to use. See Pitfalls to Consider When Choosing an Encoding below.
|US-ASCII||American / English - contains the basic Latin letters, digits, and common punctuation characters. US-ASCII is contained in all of the following encodings, meaning that in each of them the values 0-127 map to the same characters as in US-ASCII.|
|ISO-8859-1||West European, Latin 1 - adds characters for the more common West European languages to US-ASCII|
|ISO-8859-2||East European, Latin 2 - adds characters for the Central and East European languages to US-ASCII|
|ISO-8859-3||South European, Latin 3 - adds characters for the South European languages to US-ASCII|
|ISO-8859-4||North European, Latin 4 - adds characters for the North European languages to US-ASCII|
|ISO-8859-5||Cyrillic - adds the Cyrillic characters to US-ASCII|
|ISO-8859-6||Arabic - adds the basic Arabic alphabet to US-ASCII|
|ISO-8859-7||Greek - adds the Greek characters to US-ASCII|
|ISO-8859-8||Hebrew - adds the Hebrew characters to US-ASCII|
|ISO-8859-9||Turkish - very similar to ISO-8859-1, but replaces some rarely used characters with Turkish ones|
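|ISO-8859-15||Latin 9 - nearly identical to ISO-8859-1, but replaces the international currency symbol '¤' with the Euro symbol '€' and replaces seven other rarely used characters with letters such as 'Š', 'Œ' and 'Ÿ'|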
|GB-2312||Simplified Chinese - mostly used in mainland China and Singapore|
|Big5||Traditional Chinese - mostly used in Taiwan and Hong Kong|
|UTF-8||International Unicode encoding, in UTF-8 format. Unicode is the most complete encoding: where US-ASCII contains only 128 different characters and each ISO-8859 encoding contains 256, Unicode contains well over 100,000, making room in a single encoding for most of the world's characters, including Asian characters and symbols.|
|LISTSERV Maestro determines optimal encoding automatically (but not Unicode)||LISTSERV Maestro automatically chooses US-ASCII, one of the ISO-8859 encodings, one of the Chinese encodings, or EUC-JP or Shift-JIS, depending on which characters are actually used in the mail. It never chooses UTF-8. Note: Because of the way it encodes text, ISO-2022-JP is never chosen automatically. If your email displays best with this encoding, you have to select it manually.|
|LISTSERV Maestro determines optimal encoding automatically (allow Unicode)||LISTSERV Maestro automatically chooses US-ASCII, any of the other encodings (except ISO-2022-JP, see above), or even UTF-8, depending on which characters are actually used in the mail.|
Pitfalls to Consider When Choosing an Encoding
Few errors occur with the "standard" encodings US-ASCII and ISO-8859-1 to ISO-8859-4. They are very common, and all or almost all email clients and Web mailers can display messages encoded with them. A problem can occur if your message, or the merge data that is merged into it, contains characters that cannot be displayed in the chosen encoding. If this happens, the recipient sees these characters as '?' (or as some other unpredictable, usually incorrect character).
Most notably, even if the whole email text contains only English characters, making US-ASCII the logical choice, the recipients' merge data may very well contain characters that are not part of US-ASCII. These characters would then appear incorrectly to the recipients. See Mail Merge and Encodings below for what to consider when using mail merge.
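To illustrate the effect, the following Python sketch merges a hypothetical recipient name into an ASCII-only message and encodes the result as US-ASCII. Python's `errors="replace"` handler substitutes '?' for every character the target encoding cannot represent, which mirrors the symptom described above. It is not LISTSERV Maestro's own code, and the `{name}` placeholder is illustrative only, not LISTSERV Maestro merge syntax.

```python
# Hypothetical merge data: the body is plain English, the recipient's name is not.
body_template = "Dear {name}, thank you for subscribing."
merge_value = "Jürgen Müller"

message = body_template.format(name=merge_value)

# Encode the merged message in US-ASCII, replacing every character
# that US-ASCII cannot represent with '?'.
wire_bytes = message.encode("us-ascii", errors="replace")

print(wire_bytes.decode("us-ascii"))
# prints: Dear J?rgen M?ller, thank you for subscribing.
```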
The encodings for Cyrillic, Arabic, Greek, Hebrew, Japanese, Chinese, and Korean are also quite common. However, to display these characters correctly, your recipients may have to install additional language support. Without it, the message may display incorrectly even if the email client (or Web mailer) itself can handle the encoding. Arabic and Hebrew are also read right-to-left, which adds another layer of complexity to displaying the message correctly.
Understanding your targeted recipients' capabilities makes selecting an appropriate encoding easier. For example, if you are writing a message in Greek, you are probably also planning to send it to a select number of recipients who are all able to read and understand Greek. Recipients who understand Greek and expect to receive Greek language email probably have Greek language support already installed.
The encodings ISO-8859-9 for Turkish and ISO-8859-15 for Latin 1 with the Euro currency symbol are less widely supported than the standard encodings, so a number of older email clients may not be able to display them. Even though these encodings do not contain any unusual characters, use them with care. Send thorough test deliveries to a variety of email clients and Web mailers to verify that the email is received and displayed as expected.
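The sketch below shows one way things can go wrong, assuming (hypothetically) that a client which does not recognize ISO-8859-15 falls back to ISO-8859-1. In ISO-8859-15 the Euro sign is stored as byte 0xA4, the same position that holds the currency sign '¤' in ISO-8859-1. Real clients may behave differently, which is exactly why the test delivery matters.

```python
text = "Price: 10 €"
data = text.encode("iso-8859-15")        # '€' is stored as byte 0xA4 in ISO-8859-15
print(hex(data[-1]))                     # 0xa4

# Assumed fallback: a client that ignores ISO-8859-15 and reads the bytes as
# ISO-8859-1 shows the currency sign '¤' (also 0xA4) instead of the Euro sign.
misread = data.decode("iso-8859-1")
print(misread)                           # Price: 10 ¤
```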
UTF-8 is one of the encoding forms of Unicode. The great advantage of Unicode is its ability to represent almost every conceivable character, including Asian characters. Unicode can also represent a mix of characters that do not fit into a single ISO-8859 encoding (for example, a Russian message in Cyrillic letters that includes mathematical formulas using Greek letters).
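A short Python check makes the Cyrillic-plus-Greek example concrete. The helper `fits` is hypothetical and simply tries to encode the text strictly; the sample text is invented for illustration.

```python
def fits(text: str, encoding: str) -> bool:
    """Return True if every character of `text` can be represented in `encoding`."""
    try:
        text.encode(encoding)  # strict: raises on the first unrepresentable character
        return True
    except UnicodeEncodeError:
        return False


text = "Формула: α + β"  # Cyrillic word plus Greek letters

for enc in ("iso-8859-5", "iso-8859-7", "utf-8"):
    print(enc, fits(text, enc))
# iso-8859-5 False  (no Greek letters)
# iso-8859-7 False  (no Cyrillic letters)
# utf-8 True
```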
The main disadvantage of Unicode is that a number of older email clients cannot properly display UTF-8 encoded mail. Additionally, even recipients whose clients can display UTF-8 might not have support for the correct language installed. As with the more "exotic" ISO-8859 encodings, UTF-8 is a safe choice if you know your recipients well. For example, UTF-8 would be a safe selection for sending a Japanese text message to a select group of recipients whom you know to have email clients that support UTF-8 and Japanese language support installed.
The automatic options, where LISTSERV Maestro determines the optimal encoding automatically, have advantages in some situations, but use them only with a good understanding of what they entail, so that you do not accidentally send your email with an unintended encoding.
The first automatic choice (but not Unicode) sends the email using US-ASCII as long as no non-US-ASCII characters are found in the email body or subject line. If there are non-US-ASCII characters, one of the ISO-8859 encodings or one of the Asian encodings is selected instead. The encoding is chosen so that the number of non-displayable characters, which are replaced with '?', is as small as possible, ideally zero. ISO-8859 encodings with lower numbers are given precedence (ISO-8859-1 over ISO-8859-2, and so on). After the ISO-8859 encodings, the Chinese encodings are considered first, then the Japanese encodings (ignoring ISO-2022-JP, see below), and finally the Korean encodings. The result may still contain non-displayable characters, but their number is minimized and UTF-8 is avoided. Because merge data is not taken into account, problems may appear depending on where the merge data comes from, so do not use this method with mail merge in combination with certain recipient types (see Mail Merge and Encodings below).
The second automatic choice (allow Unicode) works similarly to the first one. UTF-8 is selected only when none of the other encodings can display all characters in the message. If that happens, all the caveats of using UTF-8 apply, so be careful with this option. The same merge data considerations as for the first automatic choice apply here too.
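The selection procedure described in the two paragraphs above can be sketched in Python as follows. This is a simplified model under stated assumptions, not LISTSERV Maestro's implementation: the candidate list and its order are guesses based on the text, EUC-KR stands in for the Korean encoding (the text does not name one), ISO-8859-15 is placed after ISO-8859-9 following the "lower numbers first" rule, and a single string is examined rather than body and subject separately.

```python
# Hypothetical precedence list modeled on the description above; the exact
# list and order LISTSERV Maestro uses are not documented here.
CANDIDATES = [
    "us-ascii",
    "iso-8859-1", "iso-8859-2", "iso-8859-3", "iso-8859-4", "iso-8859-5",
    "iso-8859-6", "iso-8859-7", "iso-8859-8", "iso-8859-9", "iso-8859-15",
    "gb2312", "big5",        # Chinese
    "euc-jp", "shift_jis",   # Japanese (ISO-2022-JP deliberately left out)
    "euc-kr",                # Korean: assumed, the text names no Korean encoding
]


def count_unencodable(text: str, encoding: str) -> int:
    """Count characters of `text` that `encoding` cannot represent (they would become '?')."""
    missing = 0
    for ch in text:
        try:
            ch.encode(encoding)
        except UnicodeEncodeError:
            missing += 1
    return missing


def choose_encoding(text: str, allow_unicode: bool = False) -> str:
    """Return the first candidate that loses no characters. If none exists,
    return UTF-8 when allowed, else the candidate that loses the fewest."""
    best, best_missing = CANDIDATES[0], None
    for enc in CANDIDATES:
        missing = count_unencodable(text, enc)
        if missing == 0:
            return enc
        # Strict '<' keeps the earlier (higher-precedence) candidate on ties.
        if best_missing is None or missing < best_missing:
            best, best_missing = enc, missing
    return "utf-8" if allow_unicode else best


print(choose_encoding("Hello"))                          # us-ascii
print(choose_encoding("Grüße"))                          # iso-8859-1
print(choose_encoding("Łódź"))                           # iso-8859-2
print(choose_encoding("שלום 中文"))                      # iso-8859-8 (2 characters lost)
print(choose_encoding("שלום 中文", allow_unicode=True))  # utf-8
```

Returning the first loss-free candidate implements the precedence rule directly: a lower-numbered ISO-8859 encoding always wins over a later one that would fit equally well.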
Either automatic choice is best suited if you do not know what kind of characters are in the email, and you want to maximize the chance that the correct encoding is used.
Special considerations about the encoding ISO-2022-JP: This encoding represents Japanese text using only byte values in the US-ASCII range, switching character sets with escape sequences, so the method of counting '?' occurrences does not work. When such text is interpreted with the (wrong) encoding US-ASCII, the result contains almost no non-displayable characters, which causes the system to choose US-ASCII even though the text is actually encoded with ISO-2022-JP. If your email content is displayed best with this encoding, choose ISO-2022-JP explicitly from the list and avoid the automatic modes.
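The following Python sketch shows why the '?'-counting heuristic is fooled. It encodes a short Japanese word with Python's `iso2022_jp` codec and checks that every resulting byte falls in the US-ASCII range, so reading those bytes as US-ASCII produces no '?' characters at all.

```python
# Japanese greeting "konnichiwa" encoded with ISO-2022-JP.
data = "こんにちは".encode("iso2022_jp")

# Every byte is below 128, i.e. inside the US-ASCII range: ISO-2022-JP
# switches character sets with escape sequences instead of using high bytes.
print(all(b < 128 for b in data))   # True

# Decoding as US-ASCII therefore "succeeds" without a single '?', even though
# the result is escape-sequence gibberish rather than Japanese text.
as_ascii = data.decode("us-ascii", errors="replace")
print("?" in as_ascii)              # False
```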
Mail Merge and Encodings
Whether you select a specific encoding or let LISTSERV Maestro determine the encoding automatically, the entire email message, in all its copies to all its recipients, is sent using that same encoding.
This can create problems when using mail merge in conjunction with certain recipient types:
Send to a Subscriber List: With this recipient type, all merge data is processed through LISTSERV Maestro, where LISTSERV Maestro is able to apply the same encoding to the merge data as it applies to the email body itself. The only problem that might arise is if the merge data from the subscriber list contains a character that is not displayable in the chosen encoding of the email body. If this is the case, the character will be replaced with a '?'. This is true even for the automatic choices because the merge data from the subscriber list is not taken into consideration when LISTSERV Maestro tries to determine the optimal encoding.
To ensure that characters from the subscriber list merge data are not replaced with '?', either set the encoding for sending to the same encoding that the subscriber list uses, or set it to Unicode if your email body (outside of the merge data) contains characters that are not included in the list's encoding.
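The rule above can be expressed as a small decision function. The name `sending_encoding` is hypothetical, and the `{name}` placeholder merely marks where merge data would go; it is not LISTSERV Maestro merge syntax. Merge data stored in the list's encoding is representable in that encoding by definition, so only the body needs to be checked.

```python
def sending_encoding(body: str, list_encoding: str) -> str:
    """Hypothetical helper: keep the subscriber list's encoding if the body
    (outside the merge data) fits into it; otherwise fall back to UTF-8."""
    try:
        body.encode(list_encoding)
    except UnicodeEncodeError:
        return "utf-8"
    return list_encoding


print(sending_encoding("Dear {name}, welcome!", "iso-8859-1"))    # iso-8859-1
print(sending_encoding("Dear {name}, only 10 €!", "iso-8859-1"))  # utf-8 ('€' is not in ISO-8859-1)
```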
Upload Recipients: The same comments as for "Send to a Subscriber List" apply (see above). All merge data is processed through LISTSERV Maestro, allowing LISTSERV Maestro to apply the chosen encoding.
To ensure that characters from the uploaded merge data are not replaced with '?', set the encoding for sending to the same encoding that you selected when you uploaded the text file, or set it to Unicode if your email body (outside of the merge data) contains characters that are not included in the upload encoding.
Select Recipients from a Database or LDAP Directory: The same comments as for "Send to a Subscriber List" apply (see above). All merge data is processed through LISTSERV Maestro, allowing LISTSERV Maestro to apply the chosen encoding.
To ensure that characters from the merge data read from the database are not replaced with '?', set the encoding for sending to the same encoding that the database uses, or set it to Unicode if your email body (outside of the merge data) contains characters that are not included in the database encoding. You may have to ask your Database Administrator for this information.
Send to a Classic LISTSERV List: In this case, the merge data comes directly from the LISTSERV list and does not pass through LISTSERV Maestro, so the chosen encoding is not applied to it. The merge data is inserted exactly as stored, so the encoding of your mail must match the encoding of the stored data. If you want to use this recipient type together with mail merge, find out the encoding in which the data is stored in the LISTSERV list (which may be backed by a database), and use that same encoding for your mail. For a standard LISTSERV installation, the best assumption is ISO-8859-1, which is also a safe choice if the actual encoding is US-ASCII, since US-ASCII is entirely contained in ISO-8859-1.
Let LISTSERV Select Recipients from a Database or LDAP Directory: The same comments as for "Send to a Classic LISTSERV List" apply. The merge data never passes through LISTSERV Maestro. Ask your Database Administrator which encoding the data is written to the database with, and use the same encoding for your mail.
A Short Introduction to Encodings
What is an encoding? Why are they used?
Computers store all information as numbers, not as letters and text. Reading only numbers is extremely difficult for human beings; therefore, encodings have been introduced. An encoding (also called character set, character encoding, code page, or character page) is simply a table that matches numbers to letters, or more precisely, characters. This matching of numbers to characters is called mapping.
For example, in the US-ASCII encoding the number 65 represents the letter 'A', 66 represents 'B', and so on, while 97 represents 'a', 98 represents 'b', and so on. Not only letters are represented, but also digits (49 stands for '1'), punctuation marks (46 for '.'), and other characters; the '@' character, for instance, is assigned the value 64.
When you give a computer a sequence of numbers like 77, 97, 105, and 108 and tell it that these numbers map to characters from the US-ASCII encoding, then the computer will determine that it is supposed to display these four numbers as the character string "Mail" on your screen.
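The same mapping can be tried out in Python, where `bytes.decode` turns numbers into characters and `ord` goes the other way:

```python
# Numbers -> characters, using the US-ASCII mapping.
codes = [77, 97, 105, 108]
word = bytes(codes).decode("us-ascii")
print(word)                      # Mail

# Characters -> numbers, for the examples given above.
values = [ord(ch) for ch in "Aa1.@"]
print(values)                    # [65, 97, 49, 46, 64]
```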
Apart from US-ASCII, which maps only the numbers 0-127 to characters, there are many other encodings. The most widely used ones in the Western hemisphere are the encodings of the ISO-8859 family, each of which defines a mapping for the numbers 0-255.
ISO-8859-1, the so-called "Latin 1" encoding for West European languages, contains all sorts of "special" characters required by various European languages, such as 'ö' and 'ä' used in German and various Scandinavian languages, or 'é' and 'ç' used in French. In comparison, ISO-8859-7 is for the Greek language and contains all the Greek letters, such as 'α' and 'β'.
What all ISO-8859 encodings have in common is that they contain the US-ASCII mapping, meaning that the numbers 0-127 are mapped to exactly the same characters as in US-ASCII. The remaining numbers, 128-255, are used to include all the special characters for the language or group of languages for which they are designed.
Different ISO-8859 encodings map the numbers 128-255 to different characters. For example, ISO-8859-1 maps the number 225 to the character 'á' (as used in Spanish), while the same number in ISO-8859-7 means the Greek character 'α'. As a result, simply giving a computer the number 225 and telling it that it represents a character is not enough; the computer also needs to know which encoding to take the character from. When computers exchange data, including email, the sending computer transmits a sequence of numbers. The receiving computer needs to know which encoding to use to map these numbers back to characters, so that the correct characters are displayed to the user.
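A Python sketch of both points: the numbers 0-127 decode to the same characters in US-ASCII and in the ISO-8859 encodings (ISO-8859-1 and ISO-8859-7 are checked here), while the number 225 decodes to a different character depending on the encoding.

```python
ascii_range = bytes(range(128))

# Positions 0-127: identical in US-ASCII, ISO-8859-1 and ISO-8859-7.
same = (ascii_range.decode("us-ascii")
        == ascii_range.decode("iso-8859-1")
        == ascii_range.decode("iso-8859-7"))
print(same)                          # True

# Position 225: a different character in each ISO-8859 encoding.
latin1_char = bytes([225]).decode("iso-8859-1")
greek_char = bytes([225]).decode("iso-8859-7")
print(latin1_char, greek_char)       # á α
```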
Therefore, an email must carry information about which encoding to use to map its numbers back to characters. The sending computer can determine this encoding itself (as LISTSERV Maestro does if you choose one of the "LISTSERV Maestro determines optimal encoding automatically" options), but unforeseen problems may cause it to select an encoding that is not really the best choice (see above for the drawbacks of the automatic choices).
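In Internet mail, this information is carried by the `charset` parameter of the `Content-Type` header defined by the MIME standard. The sketch below uses Python's standard `email` package purely to illustrate that header; it does not reflect how LISTSERV Maestro builds its messages.

```python
from email.message import EmailMessage

msg = EmailMessage()
msg["Subject"] = "Greetings"
# Encode the body in ISO-8859-1 and record that choice in the
# charset parameter of the Content-Type header.
msg.set_content("Grüße aus Köln", charset="iso-8859-1")

print(msg["Content-Type"])          # the header, including the charset parameter
print(msg.get_content_charset())    # iso-8859-1
```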
To provide the most flexibility, LISTSERV Maestro lets you define which encoding to use for your email. Alternatively, you can let LISTSERV Maestro select the encoding, but be aware of the potential problems (see above).