Problem in storing dataframe to csv format

I am currently using the below code to web scrape data and then store it in a CSV file.

from bs4 import BeautifulSoup  import requests  url='https://www.business-standard.com/rss/companies-101.rss' soup = BeautifulSoup(requests.get(url).content, 'xml')  news_items = []  for item in soup.findAll('item'):     news_item = {}     news_item['title'] = item.title.text     news_item['excerpt'] = item.description.text      print(item.link.text)     s = BeautifulSoup(requests.get(item.link.text).content, 'html.parser')      news_item['text'] = s.select_one('.p-content').get_text(strip=True, separator=' ')     news_item['link'] = item.link.text     news_item['pubDate'] = item.pubDate.text     news_item['Category'] = 'Company'     news_items.append(news_item)  import pandas as pd df = pd.DataFrame(news_items) df.to_csv('company_data.csv',index = False)  

When displaying the data frame, the results look fine as attached.enter image description here But while opening the csv file, the columns are not as expected. enter image description hereCan anyone tell me the reason.

Add Comment
2 Answer(s)

The issue is that your data contains commas and the default seperator for to_csv is "," So each comma in your data set is treated as a seperate column.

If you perform df.to_excel('company_data.xlsx', index=False) you won’t have this issue since it is not comma seperated.

Add Comment

A csv file is not an Excel file despite what Microsoft pretends but is a text file containing records and fields. The records are separated with sequences of \r\n and the fields are separated with a delimiter, normally the comma. Fields can contain new lines or delimiters provided they are enclosed in quotation marks.

But Excel is known to have a very poor csv handling module. More exactly it can read what it has written, or what is formatted the way it would have written it. To be locale friendly, MS folks decided that they will use the locale decimal separator (which is the comma , in some West European locales) and will use another separator (the semicolon ; when the comma is the decimal separator). As a result, using Excel to read CSV files produced by other tools is a nightmare with possible workarounds like changing the separator when writing the CSV file, but no clean way. LibreOffice is far behind MS Office for most features except CSV handling. So my advice is to avoid using Excel for CSV files but use LibreOffice Calc instead.

Answered on July 15, 2020.
Add Comment

Your Answer

By posting your answer, you agree to the privacy policy and terms of service.