XML scraping using Node.js

I have a very large XML file that I got by exporting all the data from Tally. I am trying to extract elements from it with cheerio, but I am having trouble with the formatting or something similar. Reading the file with fs.readFileSync() works fine, and console.log shows the complete XML, but when I write the file with fs.writeFileSync() it looks like this: output image

And my scraping code outputs an empty file:

const cheerio = require('cheerio');
const fs = require('fs');

var xml = fs.readFileSync('Master.xml', 'utf8');
const htmlC = cheerio.load(xml);

var list = [];
list = htmlC('ENVELOPE').find('BODY>TALLYMESSAGE>STOCKITEM>LANGUAGENAME.LIST>NAME.LIST>NAME').each(function (index, element) {
    list.push(htmlC(element).attr('data-prefix'));
})
console.log(list)
fs.writeFileSync("data.html", list, () => {})
1 Answer(s)

You might try checking to make sure that Cheerio isn’t decoding all the HTML entities. Change:

const htmlC = cheerio.load(xml); 

to:

const htmlC = cheerio.load(xml, { decodeEntities: false }); 
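
Separately from entity decoding, the write step in the question may also be worth a look: fs.writeFileSync takes a string or buffer as its data and an options value as its third parameter, not a callback, so passing an array (or a Cheerio object) plus a function may not give the file you expect. Below is a minimal sketch of writing a collected list as one entry per line. The helper name writeList, the output file name, and the sample values are placeholders of mine, not anything from the original post, and it only uses Node's built-in modules.

```javascript
const fs = require('fs');
const os = require('os');
const path = require('path');

// Hypothetical helper (not from the original post): serialize a list of
// values, one per line, and write it synchronously.
function writeList(filePath, items) {
  // Skip null/undefined entries, e.g. from attr() calls on elements
  // that do not carry the requested attribute.
  const lines = items.filter((item) => item != null).map(String);
  // writeFileSync takes (file, data, options); there is no callback argument.
  fs.writeFileSync(filePath, lines.join('\n') + '\n', 'utf8');
  return lines.length;
}

// Usage with placeholder data and a temporary output file.
const outFile = path.join(os.tmpdir(), 'names-example.txt');
const written = writeList(outFile, ['Item A', undefined, 'Item B']);
console.log(written, JSON.stringify(fs.readFileSync(outFile, 'utf8')));
```

In the question's code, this would mean keeping the array and the Cheerio selection in separate variables, then passing only the array to a helper like this one.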
Answered on July 17, 2020.