How to turn list into data frame after list comprehension

I have the following code. After calling my function in a list comprehension, how do I properly turn the results into a data frame with country as one column and population as the other?

from bs4 import BeautifulSoup
import html
from urllib.request import urlopen
import pandas as pd

countries = ['af', 'ax']

def get_data(countries):
    url = 'https://www.cia.gov/library/publications/the-world-factbook/geos/'+countries+'.html'
    page = urlopen(url)
    soup = BeautifulSoup(page,'html.parser')
    # geography
    country = soup.find('span', {'class' : 'region'}).text
    population = soup.find('div', {'id' : 'field-population'}).find_next('span').get_text(strip=True)
    dataframe = [country, population]
    dataframe = pd.DataFrame([dataframe])
    return dataframe

results = [get_data(p) for p in countries]

Here is what I tried, and the data frame it gives me:

results = pd.DataFrame(results)

                                      0                                       1
0   0 Afghanistan Name: 0, dtype: object    0 Afghanistan Name: 0, dtype: object
1   0 Akrotiri Name: 0, dtype: object       0 Akrotiri Name: 0, dtype: object

3 Answer(s)

I’m not quite sure why you’re returning a DataFrame from get_data(). If you return a dictionary instead, converting the list of results to a dataframe later is much more straightforward.

countries = ['af', 'ax']

def get_data(countries):
    url = 'https://www.cia.gov/library/publications/the-world-factbook/geos/'+countries+'.html'
    page = urlopen(url)
    soup = BeautifulSoup(page,'html.parser')
    # geography
    country = soup.find('span', {'class' : 'region'}).text
    population = soup.find('div', {'id' : 'field-population'}).find_next('span').get_text(strip=True)
    scraped = {'country':country, 'population':population}
    return scraped

results = [get_data(p) for p in countries]

This returns a list of dictionaries such as:

[{'country': 'Afghanistan', 'population': '36,643,815'},
 {'country': 'Akrotiri',
  'population': 'approximately 15,500 on the Sovereign Base Areas of Akrotiri and Dhekelia including 9,700 Cypriots and 5,800 Service and UK-based contract personnel and dependents'}]

So when you convert with pd.DataFrame(results) you get:

       country                                         population
0  Afghanistan                                         36,643,815
1     Akrotiri  approximately 15,500 on the Sovereign Base Are...
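
For anyone who wants to try this without hitting the website, here is a minimal offline sketch of the same idea. The `SAMPLE` data, `fake_get_data` and `fake_get_frame` are hypothetical stand-ins for the scraper (the Akrotiri value is shortened), not part of the original code. It also suggests that if the function really has to keep returning one-row DataFrames, `pd.concat` can stack them instead of `pd.DataFrame`.

```python
import pandas as pd

# Hypothetical stand-in data: hard-coded instead of scraped.
SAMPLE = {
    'af': ('Afghanistan', '36,643,815'),
    'ax': ('Akrotiri', 'approximately 15,500'),
}

def fake_get_data(code):
    """Return one record as a dictionary, like the rewritten get_data()."""
    country, population = SAMPLE[code]
    return {'country': country, 'population': population}

def fake_get_frame(code):
    """Return a one-row DataFrame, like the get_data() in the question."""
    return pd.DataFrame([fake_get_data(code)])

countries = ['af', 'ax']

# A list of dictionaries converts directly; the keys become column names.
df_from_dicts = pd.DataFrame([fake_get_data(c) for c in countries])

# A list of one-row DataFrames has to be stacked with pd.concat instead.
df_from_frames = pd.concat(
    [fake_get_frame(c) for c in countries], ignore_index=True
)

print(df_from_dicts)
print(df_from_frames)
```

Both routes should give the same two-column frame; the dictionary version just avoids building a throwaway DataFrame per country.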
Answered on July 16, 2020.
In [136]: from bs4 import BeautifulSoup
     ...: import html
     ...: from urllib.request import urlopen
     ...: import pandas as pd
     ...:
     ...: countries = ['af', 'ax']
     ...:
     ...: def get_data(countries):
     ...:     url = 'https://www.cia.gov/library/publications/the-world-factbook/geos/'+countries+'.html'
     ...:     page = urlopen(url)
     ...:     soup = BeautifulSoup(page,'html.parser')
     ...:     # geography
     ...:     country = soup.find('span', {'class' : 'region'}).text
     ...:     population = soup.find('div', {'id' : 'field-population'}).find_next('span').get_text(strip=True)
     ...:     json_str = {"country":country, "population":population}
     ...:     return json_str
     ...: results = [get_data(p) for p in countries]
     ...: df = pd.DataFrame(results)

In [137]: df
Out[137]:
       country                                         population
0  Afghanistan                                         36,643,815
1     Akrotiri  approximately 15,500 on the Sovereign Base Are...
Answered on July 16, 2020.

If you rewrite your original function as:

def get_data(countries):
    url = 'https://www.cia.gov/library/publications/the-world-factbook/geos/'+countries+'.html'
    page = urlopen(url)
    soup = BeautifulSoup(page,'html.parser')
    # geography
    country = soup.find('span', {'class' : 'region'}).text
    population = soup.find('div', {'id' : 'field-population'}).find_next('span').get_text(strip=True)
    return country, population

and call

results = [get_data(p) for p in countries] 

as you suggested, you can do something like this:

def listToFrame(res, column_labels=None):
    C = len(res[0]) # number of columns
    if column_labels is None:
        column_labels = list(range(C))
    dct = {}
    for c in range(C):
        col = []
        for r in range(len(res)):
            col.append(res[r][c])
        dct[column_labels[c]] = col
    return pd.DataFrame(dct)

df = listToFrame(results)

or, even nicer,

df = listToFrame(results, ['Country', 'Population']) 
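
As a possible shortcut, the `pd.DataFrame` constructor itself accepts a list of tuples together with a `columns` argument, so a helper may not be needed at all. A small sketch, assuming `results` has the shape returned by the rewritten `get_data()` (the values below are hard-coded samples, not scraped):

```python
import pandas as pd

# Sample tuples in the (country, population) shape that get_data() returns;
# hard-coded here so the snippet runs without network access.
results = [
    ('Afghanistan', '36,643,815'),
    ('Akrotiri', 'approximately 15,500'),
]

# Each tuple becomes a row; `columns` supplies the labels.
df = pd.DataFrame(results, columns=['Country', 'Population'])
print(df)
```

Leaving out `columns` gives integer labels 0 and 1, which matches the default behaviour of `listToFrame` above.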