Escaped double-quote in attribute value produces invalid CSV

Sep 4, 2013 at 5:09 PM
I have an XML file whose content is essentially tabular. Each "row" is presented as an element, and each "column" is presented as an attribute.

A typical record in the file looks like this:
<Location Id="151393" Name="Yalta - Downtown" Lat="44.498" Lng="34.173" CountryCode="UA" Address="Hotel &quot;massandra&quot;, Drazhinskogo, 46, Yalta, 98600" CityName="Yalta" PostalCode="98600" Airport="0" RailwayStation="0"/>
The encoded value of the Address attribute looks like this:
"Hotel &quot;massandra&quot;, Drazhinskogo, 46, Yalta, 98600"
Because the double-quote character delimits attribute values in XML, double quotes in the data are entity-encoded (&quot;).

The decoded Address value of this record should look like this:
Hotel "massandra", Drazhinskogo, 46, Yalta, 98600
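One way to confirm the decoded value is to give the record to an XML parser, which resolves `&quot;` itself. This sketch uses Python's standard `xml.etree.ElementTree`. It is only an illustration and says nothing about how XmlToCsv parses its input.

```python
import xml.etree.ElementTree as ET

# The record from 151393.xml, as it appears in the source file.
record = (
    '<Location Id="151393" Name="Yalta - Downtown" Lat="44.498" Lng="34.173" '
    'CountryCode="UA" Address="Hotel &quot;massandra&quot;, Drazhinskogo, 46, '
    'Yalta, 98600" CityName="Yalta" PostalCode="98600" Airport="0" '
    'RailwayStation="0"/>'
)

# The parser replaces each &quot; with a literal " while reading the attribute.
location = ET.fromstring(record)
address = location.get("Address")
print(address)  # Hotel "massandra", Drazhinskogo, 46, Yalta, 98600
```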
In the CSV output, XmlToCsv should encode each double quote in the data as two double quotes ("").
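This is the standard rule from RFC 4180 (section 2, rule 7): when a field is enclosed in double quotes, each double quote inside it is escaped by preceding it with another double quote. A minimal Python sketch of the rule (the function name `quote_csv_field` is hypothetical, not part of XmlToCsv):

```python
def quote_csv_field(value: str) -> str:
    """Enclose a value in double quotes, doubling any embedded double quotes."""
    return '"' + value.replace('"', '""') + '"'


decoded = 'Hotel "massandra", Drazhinskogo, 46, Yalta, 98600'
encoded = quote_csv_field(decoded)
print(encoded)  # "Hotel ""massandra"", Drazhinskogo, 46, Yalta, 98600"
```

In practice, Python's `csv.writer` with `quoting=csv.QUOTE_ALL` applies the same rule, so hand-rolling it is rarely necessary.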

The CSV-encoded value should look like this:
"Hotel ""massandra"", Drazhinskogo, 46, Yalta, 98600"
I created a file called 151393.xml and passed it through XmlToCsv.Console:
$ XmlToCsv.Console -xml 151393.xml -dir .
There is one output file called Location.csv. It looks like this:
Id,Name,Lat,Lng,CountryCode,Address,CityName,PostalCode,Airport,RailwayStation
"151393","Yalta - Downtown","44.498","34.173","UA","Hotel "massandra", Drazhinskogo, 46, Yalta, 98600","Yalta","98600","0","0"
The value in the Address column contains unescaped double-quote characters.

This confuses data import tools like ogr2ogr. See my original question on gis.se for the problems this can cause.
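The failure is easy to reproduce with a standard CSV parser. This sketch feeds the line from Location.csv to Python's `csv` module, standing in for ogr2ogr (whose exact behavior may differ): the lenient default parser splits the address at its commas, and `strict=True` rejects the line outright.

```python
import csv

header = "Id,Name,Lat,Lng,CountryCode,Address,CityName,PostalCode,Airport,RailwayStation"
broken = ('"151393","Yalta - Downtown","44.498","34.173","UA",'
          '"Hotel "massandra", Drazhinskogo, 46, Yalta, 98600",'
          '"Yalta","98600","0","0"')

columns = next(csv.reader([header]))

# Lenient parsing (the default): the stray quote ends the quoted field early,
# so the commas inside the address are treated as field delimiters.
row = next(csv.reader([broken]))
print(len(columns), len(row))  # 10 14

# Strict parsing raises csv.Error instead of guessing.
strict_error = None
try:
    next(csv.reader([broken], strict=True))
except csv.Error as exc:
    strict_error = exc
print("strict parse failed:", strict_error)
```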

Can you modify the tool to correctly encode data containing double-quotes?
Coordinator
Sep 7, 2013 at 12:06 PM
Hi Iain

Is the following CSV output what you want? Please confirm, and I'll check in the changes that fix this issue to source control and publish an updated version of the conversion library.

Id,Name,Lat,Lng,CountryCode,Address,CityName,PostalCode,Airport,RailwayStation
"151393","Yalta - Downtown","44.498","34.173","UA","Hotel ""massandra"", Drazhinskogo, 46, Yalta, 98600","Yalta","98600","0","0"

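A quick check before confirming is to read the proposed line back with a standard CSV parser and verify that it yields exactly ten fields with the decoded address. A sketch using Python's `csv.DictReader`:

```python
import csv

header = "Id,Name,Lat,Lng,CountryCode,Address,CityName,PostalCode,Airport,RailwayStation"
fixed = ('"151393","Yalta - Downtown","44.498","34.173","UA",'
         '"Hotel ""massandra"", Drazhinskogo, 46, Yalta, 98600",'
         '"Yalta","98600","0","0"')

# DictReader pairs the header with the values; "" inside a quoted field becomes ".
record = next(csv.DictReader([header, fixed]))
print(len(record))        # 10
print(record["Address"])  # Hotel "massandra", Drazhinskogo, 46, Yalta, 98600
```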
Kind regards,

Tim


Sep 7, 2013 at 2:12 PM
That looks perfect.

Will you reply here when the update is available?
Coordinator
Sep 10, 2013 at 9:50 AM
Hi Iainelder,

I am in the middle of moving from Germany to Holland and have not yet had a chance to upload this fix; my apologies. It will probably be done this weekend.

Kind regards,

Tim


Sep 10, 2013 at 10:37 AM
Edited Sep 10, 2013 at 10:39 AM
Thanks for the update, Tim. Hope the move goes smoothly!

I'm happy to wait till the weekend for the fix.

For now I'm using A7Soft's xml2csv to convert the XML.

The equivalent command line for xml2csv looks like this:
xml2csv 151393.xml locations.csv xml2csv-fields.txt -Q
The file xml2csv-fields.txt contains the names of all the attributes I want to appear as columns in the CSV. It looks like this:
Id,Name,Lat,Lng,CountryCode,Address,CityName,Airport,AirportCode,RailwayStation
The parameter -Q means 'enclose all values in double quotes and escape double quotes by repeating them' (the same behavior I requested here).
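For comparison, the explicit-field-list approach can be approximated with Python's `csv.DictWriter`. This is a sketch, not a reimplementation of xml2csv: how xml2csv handles a listed field the element lacks (AirportCode here) or an attribute that is not listed (PostalCode) isn't stated above, so the `restval` and `extrasaction` choices below are assumptions.

```python
import csv
import io
import xml.etree.ElementTree as ET

record = (
    '<Location Id="151393" Name="Yalta - Downtown" Lat="44.498" Lng="34.173" '
    'CountryCode="UA" Address="Hotel &quot;massandra&quot;, Drazhinskogo, 46, '
    'Yalta, 98600" CityName="Yalta" PostalCode="98600" Airport="0" '
    'RailwayStation="0"/>'
)
# Same column list as xml2csv-fields.txt.
fields = "Id,Name,Lat,Lng,CountryCode,Address,CityName,Airport,AirportCode,RailwayStation".split(",")

out = io.StringIO()
writer = csv.DictWriter(
    out,
    fieldnames=fields,
    quoting=csv.QUOTE_ALL,  # quote every value; an embedded " is written as ""
    restval="",             # assumption: a listed field missing from the element is left empty
    extrasaction="ignore",  # assumption: attributes not in the list are dropped
    lineterminator="\n",
)
writer.writeheader()
writer.writerow(ET.fromstring(record).attrib)
print(out.getvalue())
```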

As you can see, your tool has a simpler interface because it automatically works out all the column names.
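Working out the column names automatically amounts to collecting attribute names in the order they are first seen. The sketch below is illustrative only: the function `xml_to_csv` is hypothetical and does not reflect XmlToCsv's actual code. It writes one CSV per element name, assumes records are either the root element or its direct children, and relies on Python 3.8+ `ElementTree` preserving attribute order. Only the values are quoted, mirroring the Location.csv layout earlier in the thread.

```python
import csv
import io
import xml.etree.ElementTree as ET


def xml_to_csv(xml_text: str) -> dict:
    """Hypothetical sketch: one CSV per element name, columns taken from attributes.

    Assumes each record is either the root element itself (as in 151393.xml)
    or a direct child of the root.
    """
    root = ET.fromstring(xml_text)
    records = list(root) if len(root) else [root]

    # Group records by tag and collect column names in first-seen order.
    tables = {}
    for element in records:
        columns, rows = tables.setdefault(element.tag, ([], []))
        for name in element.attrib:
            if name not in columns:
                columns.append(name)
        rows.append(element.attrib)

    result = {}
    for tag, (columns, rows) in tables.items():
        out = io.StringIO()
        # Header uses minimal quoting; every value is quoted, with " doubled.
        csv.writer(out, lineterminator="\n").writerow(columns)
        values = csv.writer(out, quoting=csv.QUOTE_ALL, lineterminator="\n")
        for attrib in rows:
            values.writerow([attrib.get(name, "") for name in columns])
        result[tag + ".csv"] = out.getvalue()
    return result


record = (
    '<Location Id="151393" Name="Yalta - Downtown" Lat="44.498" Lng="34.173" '
    'CountryCode="UA" Address="Hotel &quot;massandra&quot;, Drazhinskogo, 46, '
    'Yalta, 98600" CityName="Yalta" PostalCode="98600" Airport="0" '
    'RailwayStation="0"/>'
)
files = xml_to_csv(record)
print(files["Location.csv"])
```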

I will be happy to switch back to your tool when the update is released!
Feb 20, 2014 at 11:27 AM
Hi Tim,

I have the same problem with double quotes. Have you fixed this yet?

Best regards,
Sven
Coordinator
Feb 24, 2014 at 10:07 PM
This is finally resolved. Please let me know if it works for you.
Feb 25, 2014 at 6:56 AM
Yes, it works. Thank you ;-)
