Fastest method to import large XML into SQL-Server table
I have a really large and not to pretty XML which I want to import in my sql-server database. The format of the XML is like I said really ugly:
<myxml xmlns="http://somenamespace.whatever.com/schemas/xmldata/1/" xmlns:xs="http://www.w3.org/2001/XMLSchema-instance"> <mydata> <item> <record>some</record> <record>123</record> <record xs:nil="true" /> <record>random</record> <record>234</record> </item> <item> <record>345</record> <record>in all</record> <record>these</record> <record>cells</record> <record>123asdf</record> </item> <item> <record>how</record> <record>to</record> <record>import</record> <record>987654321</record> <record xs:nil="true" /> </item> </mydata> </myxml>
This is just a small sample. In fact there the XML is over a 100ML, it has over 200k on items and each item has 15 records but this sample will do just fine.
I know what each ‘record’ in the ‘item’ represents but for me its enough to import all record-values to column with varchar(100). Lets say this table:"
CREATE TABLE [dbo].[DataFromXml]( [Column1] [varchar](100) NULL, [Column2] [varchar](100) NULL, [Column3] [varchar](100) NULL, [Column4] [varchar](100) NULL, [Column5] [varchar](100) NULL ) ON [PRIMARY] GO
I’m able to get this done with this code:
CREATE TABLE XmlTable ( XMLData XML ) INSERT INTO XmlTable(XMLData) SELECT CONVERT(XML, BulkColumn) FROM OPENROWSET(BULK 'D:\myverylarge.xml', SINGLE_CLOB) AS x; DECLARE @XML AS XML SELECT @XML=XMLData FROM XmlTable ;WITH XMLNAMESPACES ('http://www.w3.org/2001/XMLSchema-instance' as xs, DEFAULT 'http://somenamespace.whatever.com/schemas/xmldata/1/') INSERT INTO DataFromXml(Column1, Column2, Column3, Column4, Column5) SELECT ref.value('record[1][not(@xs:nil = "true")]' ,'varchar(100)') as Column1 ,ref.value('record[2][not(@xs:nil = "true")]' ,'varchar(100)') as Column2 ,ref.value('record[3][not(@xs:nil = "true")]' ,'varchar(100)') as Column3 ,ref.value('record[4][not(@xs:nil = "true")]' ,'varchar(100)') as Column4 ,ref.value('record[5][not(@xs:nil = "true")]' ,'varchar(100)') as Column5 FROM @XML.nodes('/myxml/mydata/item') xmlData( ref )
This runs for a minute or 2 which might not be so bad. I dont have a good reference. My feeling is that this could be a lot faster since getting the XML (over 100MB) into the database using the OPENROWSET just takes seconds.
Can I optimize the insert and if so, how would I do that?
Dealing with NULL values is something special in XML.
The definition of a NULL value in XML is not existing. So
<a> <b>hi</b> <c></c> <d/> </a>
<a>
is the root elelement.<b>
is an element with atext()
node.<c>
is an empty element<d>
is a self-closing element<e>
is – uhm – not there…
Important hint: <c>
and <d>
are the same, absolutely no difference!
You can query for the element with
.value('(/a/b)[1]','nvarchar(100)')
And you can query for the the text()
node specifically
.value('(/a/b/text())[1]','nvarchar(100)')
In this you find a possible answer (a bit hidden): You can do all your code without the NULL checking predicate, if you query for the text()
node specifically.
Change this
ref.value('record[1][not(@xs:nil = "true")]' ,'varchar(100)')
to this
ref.value('(record[1]/text())[1]' ,'varchar(100)')
What might break this: If a <record>
‘s content may be an empty string you would get a NULL
back and not ''
. But it should be much faster… Hope, this is okay for you…
About performance: Read this answer. It covers your issue quite well. Especially the part where the time is consumed (follow the links in this answer).
Just to complement @Shnugo answer.
All credit goes to him.
This is your exact SQL statement. It should give you around 20% performance improvement. Please give it a shot.
;WITH XMLNAMESPACES ('http://www.w3.org/2001/XMLSchema-instance' as xs, DEFAULT 'http://somenamespace.whatever.com/schemas/xmldata/1/') INSERT INTO DataFromXml(Column1, Column2, Column3, Column4, Column5) SELECT ref.value('(record[1]/text())[1]' ,'varchar(100)') as Column1 ,ref.value('(record[2]/text())[1]' ,'varchar(100)') as Column2 ,ref.value('(record[3]/text())[1]' ,'varchar(100)') as Column3 ,ref.value('(record[4]/text())[1]' ,'varchar(100)') as Column4 ,ref.value('(record[5]/text())[1]' ,'varchar(100)') as Column5 FROM @XML.nodes('/myxml/mydata/item') xmlData(ref);