Parse UTF-8-BOM xml files using XmlService Google App script
I’m currently trying to extract the content of xml files located on my google drive to a spreadsheet format. I made a function that parse an xml and extract the data I need to a sheet of the current spreadsheet. My function works fine and make the job, I’m using UTF-8 encoded xml files.
My problem comes when I parse an xml encoded in UTF-8-BOM. The XmlService.parse(data)
doesn’t work anymore :
I’m having the error Exception: Error on line 1: Content is not allowed in prolog.
on the line var xmlDocument=XmlService.parse(data);
I tried to modify following line according to an answer i saw here.
var data = DriveApp.getFileById(xml).getBlob().getDataAsString("UTF-8-BOM");
But I have the following error : Exception: Invalid argument: UTF-8-BOM
on this line.
Do you have any idea of how i could make it parse my UTF-8-BOM xml?
By the way, even if the xml file is encoded in UTF-8-BOM, the first line is : <?xml version="1.0" encoding="UTF-8" standalone="yes"?>
Here is my code if needed, it works for UTF-8 xml files.
function parseBasicXML(xml,sheet) { var data = DriveApp.getFileById(xml).getBlob().getDataAsString(); var xmlDocument=XmlService.parse(data); var root=xmlDocument.getRootElement(); var items = root.getChildren("ZZZZ"); for (var i = 0; i < items.length; i++) { var items2 = items[i].getChild("AAAA"); var items3 = items[i].getChild("YYYY") sheet.getRange(i+2, 1).setValue(root.getAttribute("XXX").getValue()); sheet.getRange(i+2, 4).setValue(items[i].getAttribute("BBBB").getValue()); if(items2 != null){ sheet.getRange(i+2, 9).setValue(items2.getAttribute("WWWW").getValue()); sheet.getRange(i+2, 13).setValue(items2.getAttribute("TTTT").getValue());} if(items3 != null){sheet.getRange(i+2,7).setValue(items3.getAttribute("DDDD").getValue()); sheet.getRange(i+2,8).setValue(items3.getAttribute("OOOO").getValue());} } }
Here is a sample of an xml structure I use :
<RRRR Version="NA" Speed="100000" Gap="4" ZZZZ="AAAA" OOOO="N/A" Comments=""> <Message LLLL="XXXX" DDDD="XXXX" SSM_00="XXXX" S_01="XXXX" S_10="XXXX" S_11="XXXX" SSSS="XXXX"> <TTTT> <DDDD NNNN="XXXX" DDDD="XXXX" BBBB="XXX"/> <DDDD NNNN="XXXX" DDDD="XXXX" BBBB="XXX"/> <DDDD NNNN="XXXX" DDDD="XXXX" BBBB="XXX"/> </TTTT> <SSS> <VVVV>00</VVVV> </SSS> </Message> <Message LLLL="XXXX" DDDD="XXXX" S_00="XXXX" S_01="XXXX" S_10="XXXX" S_11="XXXX" SSSS="XXXX"> <TTTT> <DDDD NNNN="XXXX" DDDD="" BBBB="XXX"/> <DDDD NNNN="XXXX" DDDD="" BBBB="XXX"/> <DDDD NNNN="XXXX" DDDD="" BBBB="XXX"/> </TTTT> <SSS> <VVVV>00</VVVV> </SSS> </Message> <Message LLLL="XXXX" DDDD="XXXX" S_00="XXXX" S_01="XXXX" S_10="XXXX" S_11="XXXX" SSSS="XXXX"> <TTTT> <DDDD NNNN="XXXX" DDDD="" BBBB="XXX"/> <DDDD NNNN="XXXX" DDDD="" BBBB="XXX"/> <DDDD NNNN="XXXX" DDDD="" BBBB="XXX"/> </TTTT> <SSS> <VVVV>00</VVVV> </SSS> </Message> </RRRR>
I made the following code so you can test it when you encode the xml file as UTF-8 and UTF-8-BOM : (the xml
data correspond to the ID of the xml file on your dirve, the sheet
the sheet name you want your datas to be exported)
function parseSampleXML(xml,sheet) { var data = DriveApp.getFileById(xml).getBlob().getDataAsString(); // Added var xmlDocument=XmlService.parse(data); var root=xmlDocument.getRootElement(); var items = root.getChildren("Message"); for (var i = 0; i < items.length; i++) { var items3 = items[i].getChild("SSSS") sheet.getRange(i+2, 1).setValue(root.getAttribute("ZZZZ").getValue()); sheet.getRange(i+2, 4).setValue(items[i].getAttribute("LLLL").getValue()); sheet.getRange(i+2, 5).setValue(items[i].getAttribute("DDDD").getValue()); sheet.getRange(i+2, 6).setValue(items[i].getAttribute("SSSS").getValue()); if(items3 != null){sheet.getRange(i+2,7).setValue(items3.getValue());} } }
Thanks in advance
Victor