about ram

Jul 29, 2012 at 2:55 PM

Hi:

Thank you for your converter, but I have a problem with it.
When I try to convert a large XML file (about 1 GB, three major tables),

it doesn't work.

Is there something I need to modify to make it work?

Thanks

Coordinator
Aug 1, 2012 at 11:06 AM

Hi Mingchei,

When I wrote the original code for the converter in 2008, I tested it with files up to 500 MB on a machine with 8 GB of RAM. The code may need to be changed so that it flushes the CSV output at certain intervals during the conversion, to avoid running out of memory. Could you send me a copy of the error message you receive? I will try to advise you further from there.
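As a rough illustration of flushing at intervals, a minimal sketch might look like the following. The class `CsvFlushSketch`, the method `WriteTable`, and the parameter `flushEvery` are hypothetical names and are not part of the converter's code. Note that this only limits how much output is buffered; the DataSet still holds the whole parsed XML file in memory.

```csharp
using System;
using System.Data;
using System.IO;

public static class CsvFlushSketch
{
    // Hypothetical helper (not part of the converter): writes one table as CSV
    // and flushes the writer every `flushEvery` rows.
    // Returns the number of interval flushes (the final flush is not counted).
    public static int WriteTable(DataTable table, TextWriter writer, int flushEvery)
    {
        if (flushEvery < 1) throw new ArgumentOutOfRangeException("flushEvery");

        int flushes = 0;
        int rowsSinceFlush = 0;

        foreach (DataRow row in table.Rows)
        {
            string[] fields = new string[table.Columns.Count];
            for (int i = 0; i < fields.Length; i++)
            {
                // Convert.ToString yields an empty string for null and DBNull.
                fields[i] = Escape(Convert.ToString(row[i]));
            }
            writer.WriteLine(string.Join(",", fields));

            rowsSinceFlush++;
            if (rowsSinceFlush == flushEvery)
            {
                writer.Flush();   // push buffered rows to the underlying stream
                flushes++;
                rowsSinceFlush = 0;
            }
        }

        writer.Flush();           // write out whatever is left
        return flushes;
    }

    // Quotes a field if it contains a comma, a quote or a line break.
    private static string Escape(string value)
    {
        if (value.IndexOfAny(new[] { ',', '"', '\r', '\n' }) < 0) return value;
        return "\"" + value.Replace("\"", "\"\"") + "\"";
    }
}
```

It could be called with a StreamWriter on the output file, for example `CsvFlushSketch.WriteTable(table, writer, 10000)`; the interval of 10000 rows is an arbitrary value chosen for illustration.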

Best regards,

Tim van der Schaaf

Coordinator
Aug 2, 2012 at 8:31 AM
Edited Aug 2, 2012 at 8:51 AM

Hi Mingchei,

I found this on MSDN:

If you call ReadXml to load a very large file, you may encounter slow performance. To ensure best performance for ReadXml, on a large file, call the BeginLoadData method for each table in the DataSet, and then call ReadXml. Finally, call EndLoadData for each table in the DataSet, as shown in the following example:

foreach (DataTable dataTable in dataSet.Tables)
   dataTable.BeginLoadData();

dataSet.ReadXml("file.xml"); 

foreach (DataTable dataTable in dataSet.Tables)
   dataTable.EndLoadData();

The full article is available on MSDN.

You should make adjustments to the code along the lines of the following example.

In file XmlToCsvUsingDataSet.cs, rewrite code as follows:

Original code:

public XmlToCsvUsingDataSet(string xmlSourceFilePath, bool autoRenameWhenNamingConflict)
        {
            XmlDataSet = new DataSet();

            try
            {
                XmlDataSet.ReadXml(xmlSourceFilePath);

                foreach (DataTable table in XmlDataSet.Tables)
                {
                    TableNameCollection.Add(table.TableName);
                }
            }
            catch (DuplicateNameException)
            {
...

Change it as follows for memory optimization (NOT TESTED!):

public XmlToCsvUsingDataSet(string xmlSourceFilePath, bool autoRenameWhenNamingConflict)
        {
            XmlDataSet = new DataSet();

            try
            {
                // Note: a newly created DataSet has no tables yet, so this loop
                // only has an effect if a schema is loaded before this point.
                foreach (DataTable table in XmlDataSet.Tables)
                {
                    table.BeginLoadData();
                }

                XmlDataSet.ReadXml(xmlSourceFilePath);

                foreach (DataTable table in XmlDataSet.Tables)
                {
                    TableNameCollection.Add(table.TableName);
                    table.EndLoadData();
                }
            }
            catch (DuplicateNameException)
            {
...
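
One thing to keep in mind with the modified constructor above: the DataSet is empty until a schema exists, so the BeginLoadData loop has no tables to work on. A possible variation, sketched here under assumptions and not tested against the converter, is to infer the schema first and then load the data. `LargeXmlLoadSketch` and `Load` are hypothetical names. This reads the file twice, and since the MSDN passage talks about load performance, it is not established that it lowers peak memory use for a 1 GB file.

```csharp
using System.Data;

public static class LargeXmlLoadSketch
{
    // Hypothetical helper, not the converter's actual code.
    public static DataSet Load(string xmlSourceFilePath)
    {
        DataSet dataSet = new DataSet();

        // First pass: infer the schema so that dataSet.Tables is populated.
        // The empty array means no namespaces are excluded from inference.
        dataSet.InferXmlSchema(xmlSourceFilePath, new string[0]);

        // Suspend notifications, index maintenance and constraints per table.
        foreach (DataTable table in dataSet.Tables)
        {
            table.BeginLoadData();
        }

        try
        {
            // Second pass: read the data into the schema inferred above.
            dataSet.ReadXml(xmlSourceFilePath, XmlReadMode.IgnoreSchema);
        }
        finally
        {
            // Always switch the tables back to normal mode.
            foreach (DataTable table in dataSet.Tables)
            {
                table.EndLoadData();
            }
        }

        return dataSet;
    }
}
```

In the constructor, the result of such a helper would be assigned to XmlDataSet, and the table names would be added to TableNameCollection afterwards as before.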

I can't test and implement this myself right away, but I thought it might help you and others if you feel like making the adjustment and recompiling the code yourself.

Regards,

Tim