Fastest method to import large XML into SQL-Server table

I have a really large and not to pretty XML which I want to import in my sql-server database. The format of the XML is like I said really ugly:

<myxml xmlns="http://somenamespace.whatever.com/schemas/xmldata/1/" xmlns:xs="http://www.w3.org/2001/XMLSchema-instance">     <mydata>         <item>             <record>some</record>             <record>123</record>             <record xs:nil="true" />             <record>random</record>             <record>234</record>             </item>         <item>             <record>345</record>             <record>in all</record>             <record>these</record>             <record>cells</record>             <record>123asdf</record>                     </item>         <item>             <record>how</record>             <record>to</record>             <record>import</record>             <record>987654321</record>             <record xs:nil="true" />         </item>     </mydata> </myxml> 

This is just a small sample. In fact there the XML is over a 100ML, it has over 200k on items and each item has 15 records but this sample will do just fine.

I know what each ‘record’ in the ‘item’ represents but for me its enough to import all record-values to column with varchar(100). Lets say this table:"

CREATE TABLE [dbo].[DataFromXml](     [Column1] [varchar](100) NULL,     [Column2] [varchar](100) NULL,     [Column3] [varchar](100) NULL,     [Column4] [varchar](100) NULL,     [Column5] [varchar](100) NULL ) ON [PRIMARY] GO 

I’m able to get this done with this code:

CREATE TABLE XmlTable (     XMLData XML )  INSERT INTO XmlTable(XMLData) SELECT CONVERT(XML, BulkColumn)  FROM OPENROWSET(BULK 'D:\myverylarge.xml', SINGLE_CLOB) AS x;  DECLARE @XML AS XML SELECT @XML=XMLData FROM XmlTable  ;WITH XMLNAMESPACES ('http://www.w3.org/2001/XMLSchema-instance' as xs, DEFAULT 'http://somenamespace.whatever.com/schemas/xmldata/1/') INSERT INTO DataFromXml(Column1, Column2, Column3, Column4, Column5) SELECT  ref.value('record[1][not(@xs:nil = "true")]' ,'varchar(100)') as Column1         ,ref.value('record[2][not(@xs:nil = "true")]' ,'varchar(100)') as Column2         ,ref.value('record[3][not(@xs:nil = "true")]' ,'varchar(100)') as Column3         ,ref.value('record[4][not(@xs:nil = "true")]' ,'varchar(100)') as Column4         ,ref.value('record[5][not(@xs:nil = "true")]' ,'varchar(100)') as Column5         FROM @XML.nodes('/myxml/mydata/item') xmlData( ref ) 

This runs for a minute or 2 which might not be so bad. I dont have a good reference. My feeling is that this could be a lot faster since getting the XML (over 100MB) into the database using the OPENROWSET just takes seconds.

Can I optimize the insert and if so, how would I do that?

Asked on July 16, 2020 in XML.
Add Comment
2 Answer(s)

Dealing with NULL values is something special in XML.

The definition of a NULL value in XML is not existing. So

<a>     <b>hi</b>     <c></c>     <d/> </a> 
  • <a> is the root elelement.
  • <b> is an element with a text() node.
  • <c> is an empty element
  • <d> is a self-closing element
  • <e> is – uhm – not there…

Important hint: <c> and <d> are the same, absolutely no difference!

You can query for the element with

.value('(/a/b)[1]','nvarchar(100)') 

And you can query for the the text() node specifically

.value('(/a/b/text())[1]','nvarchar(100)') 

In this you find a possible answer (a bit hidden): You can do all your code without the NULL checking predicate, if you query for the text() node specifically.

Change this

ref.value('record[1][not(@xs:nil = "true")]' ,'varchar(100)') 

to this

ref.value('(record[1]/text())[1]' ,'varchar(100)') 

What might break this: If a <record>‘s content may be an empty string you would get a NULL back and not ''. But it should be much faster… Hope, this is okay for you…

About performance: Read this answer. It covers your issue quite well. Especially the part where the time is consumed (follow the links in this answer).

Answered on July 16, 2020.
Add Comment

Just to complement @Shnugo answer.

All credit goes to him.

This is your exact SQL statement. It should give you around 20% performance improvement. Please give it a shot.

;WITH XMLNAMESPACES ('http://www.w3.org/2001/XMLSchema-instance' as xs, DEFAULT 'http://somenamespace.whatever.com/schemas/xmldata/1/') INSERT INTO DataFromXml(Column1, Column2, Column3, Column4, Column5) SELECT  ref.value('(record[1]/text())[1]' ,'varchar(100)') as Column1         ,ref.value('(record[2]/text())[1]' ,'varchar(100)') as Column2         ,ref.value('(record[3]/text())[1]' ,'varchar(100)') as Column3         ,ref.value('(record[4]/text())[1]' ,'varchar(100)') as Column4         ,ref.value('(record[5]/text())[1]' ,'varchar(100)') as Column5 FROM @XML.nodes('/myxml/mydata/item') xmlData(ref); 
Add Comment

Your Answer

By posting your answer, you agree to the privacy policy and terms of service.