Escaped double-quote in attribute value produces invalid CSV


I have an XML file whose content is essentially tabular. Each "row" is represented as an element, and each "column" as an attribute of that element.

A typical record in the file looks like this:
<Location Id="151393" Name="Yalta - Downtown" Lat="44.498" Lng="34.173" CountryCode="UA" Address="Hotel &quot;massandra&quot;, Drazhinskogo, 46, Yalta, 98600" CityName="Yalta" PostalCode="98600" Airport="0" RailwayStation="0"/>
The encoded value of the Address attribute looks like this:
"Hotel &quot;massandra&quot;, Drazhinskogo, 46, Yalta, 98600"
Because the double-quote character is an XML control character, double-quotes in the data are entity-encoded (&quot;).

The decoded Address value of this record should look like this:
Hotel "massandra", Drazhinskogo, 46, Yalta, 98600
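The decoding step can be checked independently of XmlToCsv. The following is a minimal Python sketch (Python is used only for illustration, not because the tool is written in it) that parses a shortened copy of the record with the standard library's xml.etree.ElementTree. The shortened record and the variable names are assumptions made for the example.

```python
import xml.etree.ElementTree as ET

# Shortened copy of the record, with the double-quotes entity-encoded as &quot;
record = (
    '<Location Id="151393" Name="Yalta - Downtown" '
    'Address="Hotel &quot;massandra&quot;, Drazhinskogo, 46, Yalta, 98600" '
    'CityName="Yalta"/>'
)

element = ET.fromstring(record)

# The XML parser decodes &quot; back into a literal double-quote
address = element.get("Address")
print(address)
```

Any conforming XML parser should hand back the decoded value, so the escaping problem lies in the CSV writing step rather than in reading the XML.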
When writing CSV, XmlToCsv should encode each double-quote in the data as two double-quotes ("") inside the quoted field.
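The quote-doubling rule can be sketched in a few lines of Python. This is only an illustration of the expected encoding, not XmlToCsv's actual code; csv_quote is a hypothetical helper name. The result is cross-checked against Python's own csv module, which doubles quotes by default.

```python
import csv
import io


def csv_quote(value: str) -> str:
    """Hypothetical helper: double every embedded quote, then wrap the field in quotes."""
    return '"' + value.replace('"', '""') + '"'


address = 'Hotel "massandra", Drazhinskogo, 46, Yalta, 98600'
encoded = csv_quote(address)
print(encoded)

# Cross-check: csv.writer with QUOTE_ALL applies the same doubling rule
buffer = io.StringIO()
writer = csv.writer(buffer, quoting=csv.QUOTE_ALL, lineterminator="\n")
writer.writerow([address])
print(buffer.getvalue().rstrip("\n"))
```

Both print statements should show the same text, which is the form a CSV reader expects for a field containing double-quotes.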

The CSV-encoded value should look like this:
"Hotel ""massandra"", Drazhinskogo, 46, Yalta, 98600"
I created a file called 151393.xml and passed it through XmlToCsv.Console:
$ XmlToCsv.Console -xml 151393.xml -dir .
There is one output file called Location.csv. It looks like this:
"151393","Yalta - Downtown","44.498","34.173","UA","Hotel "massandra", Drazhinskogo, 46, Yalta, 98600","Yalta","98600","0","0"
The value in the Address column contains unescaped double-quote characters.
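To illustrate why this matters, here is a small Python sketch that feeds the output line above to Python's csv module in its default, lenient mode. This shows how one particular reader reacts; other tools, ogr2ogr included, may behave differently.

```python
import csv

# The line produced by XmlToCsv.Console, copied from Location.csv
broken_line = (
    '"151393","Yalta - Downtown","44.498","34.173","UA",'
    '"Hotel "massandra", Drazhinskogo, 46, Yalta, 98600",'
    '"Yalta","98600","0","0"'
)

row = next(csv.reader([broken_line]))

# The record has 10 columns, but the unescaped quotes end the Address
# field early, so the commas inside the address split it into extra fields
print(len(row))
print(row[5])
```

The reader does not raise an error here. It silently returns more than ten fields and a truncated Address, which makes the problem easy to miss until the import goes wrong.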

This confuses data import tools like ogr2ogr. See my original question on gis.se for the problems this can cause.

Can you modify the tool to correctly encode data containing double-quotes?



iainelder wrote Feb 24, 2014 at 10:17 PM

I updated the description to use &quot; in the right places. I must have copied it wrongly.

Thanks for fixing this!

iainelder wrote Feb 24, 2014 at 10:19 PM

Hmm, looks like codeplex doesn't let me write &quot; literally in code blocks.

The attached file contains a real example.