LISTSERV Maestro 11.1-3 Help

CSV Files

The term "comma-separated values" (also "tab-separated values" or "CSV") is often used as a catch-all for text-based data formats in which the data is laid out line by line. Each line contains one data record made up of a number of columns, and the columns are separated by a comma, a tab, or some other separator character.

LISTSERV Maestro can correctly interpret comma-separated text files in various formats, as long as the following rules are followed:

  • Any character may be used as the separator character, although comma, tab, or semicolon is conventional.
     
  • The same separator character must be used in all lines for the entire file.
     
  • All lines in the file must have the same number of columns, which means the same number of separator characters (not counting separator characters that are part of a quoted value).
     
  • Empty columns may be used to make sure that the same number of separator characters is present in every line of the file.
     
  • Having two separator characters in direct succession, without any characters in between, creates an empty column.
     
  • If a line begins with the separator character, Maestro assumes the line begins with an empty column.
     
  • If a line ends with the separator character, Maestro assumes the line ends with an empty column.
     
  • If the character that is used as the separator character also appears as part of the value of one or more fields, then those fields must be enclosed in quotation marks or another quote character.
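
As an illustration of the rules above for files without quoted values, here is a minimal Python sketch. It is not Maestro's own parser; the function names `split_unquoted` and `check_column_counts` are hypothetical and chosen only for this example. It shows how leading, trailing, and consecutive separators produce empty columns, and how the "same number of columns in every line" rule can be checked.

```python
def split_unquoted(line, sep=","):
    """Split one line of a file WITHOUT quoted values into its columns.

    Two separators in a row, a leading separator, or a trailing separator
    each yield an empty string, i.e. an empty column.
    """
    return line.split(sep)


def check_column_counts(lines, sep=","):
    """Return the common column count, or raise ValueError if lines differ."""
    counts = {len(split_unquoted(line, sep)) for line in lines}
    if len(counts) > 1:
        raise ValueError(f"inconsistent column counts: {sorted(counts)}")
    return counts.pop() if counts else 0


rows = ["John,,Chicago,USA", ",Summers,London,GB", "Karl,Hauser,Frankfurt,"]
for row in rows:
    print(split_unquoted(row))
print(check_column_counts(rows))
```

A plain split is sufficient here only because the file is assumed to contain no quoted values; once a value contains the separator, the quoting rules come into play.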

The last rule listed above introduces the concept of "quoted values". As described, a value must be quoted if it contains the separator (because otherwise the separator would be interpreted as the start of another value). For Maestro to handle quoted values correctly, you must tell Maestro whether the comma-separated file contains any quoted values or not.

If a file does not contain any quoted values, then the additional rules explained below do not apply. Even if one of the usual quote characters (for example quotation marks or the apostrophe) appears anywhere in the file, Maestro interprets it as just another normal character.
However, such a file also cannot contain any value that includes the separator. If at least one value contains the separator, then this value must be quoted, which makes the file a file with quoted values.
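
The difference between the two modes can be illustrated with Python's standard `csv` module. This is only an analogy: the `csv` module is not Maestro's parser and its dialect may differ from Maestro's rules in details. The helper name `read_rows` and its `has_quoted_values` flag are hypothetical, standing in for the setting that tells the reader whether the file contains quoted values.

```python
import csv


def read_rows(lines, has_quoted_values, sep=",", quote='"'):
    """Read CSV lines either with or without quote processing."""
    if has_quoted_values:
        # Quote characters delimit fields; a doubled quote inside a quoted
        # field stands for one literal quote character.
        reader = csv.reader(lines, delimiter=sep, quotechar=quote,
                            doublequote=True)
    else:
        # No special processing: quote characters are ordinary characters.
        reader = csv.reader(lines, delimiter=sep, quoting=csv.QUOTE_NONE)
    return list(reader)


line = '"Chicago, Illinois",USA'
print(read_rows([line], has_quoted_values=True))
print(read_rows([line], has_quoted_values=False))
```

Read without quote processing, the same line splits at every comma and the quotation marks stay in the values, which is why a file with a separator inside a value has to be declared as a file with quoted values.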

If a file does contain quoted values (at least one of them), then it must follow these additional rules:

  • Any character, except for the separator character, can be used as the quote character (quotation marks or apostrophe are conventional). This character must be used both as the opening and closing quote and must be used for all quoted fields in the file.
     
  • A field must be quoted if it fulfills either of these two conditions:
     
    • If the field contains the separator character in the value, then the field must be quoted.
       
    • If the field contains the quote character in the value and this quote character is also the first character of the value, then the field must be quoted. (This however also means that if the field contains the quote character but not as the first character, then it is not necessary to quote the field.)

  • It is not necessary that all fields are quoted. Only fields that fall into one of the two cases described above have to be quoted. However, it is legal to also quote fields which do not fulfill these conditions.
    Usually one of two styles is used: One style quotes all fields (both the ones that have to be quoted and the ones which do not), while the other style quotes only exactly those fields which have to be quoted (all others are left unquoted). Maestro is able to understand both of these styles (and also mixes of the two styles, as long as the rules described here are followed).
     
  • If a field is a "quoted field" and the quote character also appears as part of the value of the field, then this character must be escaped. Escape the quote character by using it twice, in direct succession. The double appearance of the quote character will be interpreted as a single appearance that is part of the field value.
     
  • If a field is an "unquoted field" and the quote character also appears as part of the value of the field, then this character must not be escaped. Quote-escaping is only necessary in quoted fields!
     
  • A "quoted field" is parsed from the file as follows: The field starts with the opening quote and ends with the next unescaped quote character after the opening quote. (The end of the field must then be followed by a separator character or by the end of the line. Trailing white space after the last field of the line is allowed.)
    The value of the field is the text between the two quotes, excluding the quotes. Any escaped quotes in the value will be unescaped.
     
  • An "unquoted field" is parsed from the file as follows: The field starts with the first character and ends with the next appearance of the separator character (or the end of the line). The value of the field is the text with this start and end, excluding the separator character.
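
The parsing rules above can be sketched as a small Python function. This is a minimal illustration written from the rules as stated here, not Maestro's actual implementation; the name `parse_line` and its error handling are assumptions made for the example.

```python
def parse_line(line, sep=",", quote='"'):
    """Split one line into field values, honoring quoted fields."""
    fields = []
    i, n = 0, len(line)
    while True:
        if i < n and line[i] == quote:
            # Quoted field: read up to the next unescaped quote character.
            i += 1
            chars = []
            while True:
                if i >= n:
                    raise ValueError("unterminated quoted field")
                if line[i] == quote:
                    if i + 1 < n and line[i + 1] == quote:
                        chars.append(quote)  # doubled quote = one literal quote
                        i += 2
                        continue
                    i += 1  # closing quote
                    break
                chars.append(line[i])
                i += 1
            fields.append("".join(chars))
            # Only a separator, the end of the line, or trailing white space
            # at the end of the line may follow the closing quote.
            if i < n and line[i] != sep:
                if line[i:].strip() == "":
                    i = n
                else:
                    raise ValueError("unexpected text after closing quote")
        else:
            # Unquoted field: runs to the next separator or the end of line.
            # Quote characters inside it are ordinary characters.
            j = line.find(sep, i)
            if j == -1:
                j = n
            fields.append(line[i:j])
            i = j
        if i >= n:
            return fields
        i += 1  # skip the separator; another field follows, possibly empty


print(parse_line('"""Little"" Lucy",Summers,"London, England",GB'))
print(parse_line('Karl "Big Boy",Hauser,Frankfurt,'))
```

Note that the quote character is treated specially only when it is the first character of a field, which matches the rule that an unquoted field may contain the quote character anywhere except at its start.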

Here are some examples:

Simple values, separated by comma, not quoted:

John,Doe,Chicago,USA
Lucy,Summers,London,GB
Karl,Hauser,Frankfurt,D

This will be parsed as follows:

John    Doe        Chicago      USA
Lucy    Summers    London       GB
Karl    Hauser     Frankfurt    D

Simple values, separated by comma, not quoted, with empty fields:

John,,Chicago,USA
,Summers,London,GB
Karl,Hauser,Frankfurt,

This will be parsed as follows:

John       (empty)    Chicago      USA
(empty)    Summers    London       GB
Karl       Hauser     Frankfurt    (empty)

Values, some of which contain a comma, separated by comma, quoted with <">:

Using the style that quotes all values:

"John","Doe","Chicago, Illinois","USA"
"Lucy","Summers","London, England","GB"
"Karl","Hauser","Frankfurt","D"

Or using the style that quotes only the values that have to be quoted:

John,Doe,"Chicago, Illinois",USA
Lucy,Summers,"London, England",GB
Karl,Hauser,Frankfurt,D

(The only values that have to be quoted in this example are the two values containing the separator character <,>.)

Both will be parsed as follows:

John    Doe        Chicago, Illinois    USA
Lucy    Summers    London, England      GB
Karl    Hauser     Frankfurt            D

Values, some of which contain a comma, separated by comma, quoted with <">, with empty fields:

Using the style that quotes all values:

"John","","Chicago, Illinois","USA"
"","Summers","London, England","GB"
"Karl","Hauser","Frankfurt",""

Or using the style that quotes only the values that have to be quoted:

John,,"Chicago, Illinois",USA
,Summers,"London, England",GB
Karl,Hauser,Frankfurt,

(The only values that have to be quoted in this example are the two values containing the separator character <,>.)

Both will be parsed as follows:

John       (empty)    Chicago, Illinois    USA
(empty)    Summers    London, England      GB
Karl       Hauser     Frankfurt            (empty)

Values, some of which contain a comma and some the quote character, separated by comma, quoted with <">:

Using the style that quotes all values:

"John","Doe","Chicago ""The Windy City"", Illinois","USA"
"""Little"" Lucy","Summers","London, England","GB"
"Karl ""Big Boy""","Hauser","Frankfurt","D"

Or using the style that quotes only the values that have to be quoted:

John,Doe,"Chicago ""The Windy City"", Illinois",USA
"""Little"" Lucy",Summers,"London, England",GB
Karl "Big Boy",Hauser,Frankfurt,D

(The values that have to be quoted in this example are the two values containing the separator character <,> and also the first value of the second row, which starts with the quote character <">. In comparison, the first value of the third row does contain the quote character too, but not as the first character. Therefore this field does not have to be quoted and the quote character is therefore also not escaped.)

Both will be parsed as follows:

John             Doe        Chicago "The Windy City", Illinois    USA
"Little" Lucy    Summers    London, England                       GB
Karl "Big Boy"   Hauser     Frankfurt                             D
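
Going the other way, the two quoting styles shown in the examples can be produced with a short Python sketch. The function names `format_field` and `format_line` and the `quote_all` flag are hypothetical; the logic simply applies the quoting and escaping rules described on this page and is not taken from Maestro itself.

```python
def format_field(value, sep=",", quote='"', quote_all=False):
    """Return the text of one field as it should appear in the file."""
    # A field has to be quoted if it contains the separator or if it
    # starts with the quote character.
    must_quote = sep in value or value.startswith(quote)
    if quote_all or must_quote:
        # Inside a quoted field, each quote character is written twice.
        return quote + value.replace(quote, quote * 2) + quote
    # Unquoted fields are written as they are, with no escaping.
    return value


def format_line(values, sep=",", quote='"', quote_all=False):
    """Join the formatted fields of one record with the separator."""
    return sep.join(format_field(v, sep, quote, quote_all) for v in values)


record = ['Karl "Big Boy"', "Hauser", "Frankfurt", "D"]
print(format_line(record))                  # quotes only where required
print(format_line(record, quote_all=True))  # quotes every field
```

Either output follows the rules described here, so the choice between the two styles is a matter of preference for whoever generates the file.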
© 2002-2023 L-Soft Sweden AB. All rights reserved.