XML scraping using Node.js
I have a very large XML file that I got by exporting all the data from Tally. I am trying to use web scraping with cheerio to extract elements from it, but I am having trouble with the formatting or something similar. Reading the file with fs.readFileSync() works fine, and console.log shows the complete XML, but when I write the file with fs.writeFileSync(), it looks like this:
And my web scraping code outputs an empty file:
const cheerio = require('cheerio');
const fs = require('fs');
var xml = fs.readFileSync('Master.xml', 'utf8');
const htmlC = cheerio.load(xml);
var list = [];
list = htmlC('ENVELOPE').find('BODY>TALLYMESSAGE>STOCKITEM>LANGUAGENAME.LIST>NAME.LIST>NAME').each(function (index, element) {
    list.push(htmlC(element).attr('data-prefix'));
})
console.log(list)
fs.writeFileSync("data.html", list, () => {})
You might try checking to make sure that Cheerio isn’t decoding all the HTML entities. Change:
const htmlC = cheerio.load(xml);
to:
const htmlC = cheerio.load(xml, { decodeEntities: false });