I'm getting a memory error with pandas and lists while scraping data
This code reads data from an Excel or CSV file, then for each row in a specific column it scrapes one specific website using that value. The problem is that I can't access any part of the HTML, so I decided to use pyautogui and some images to detect when the page is loading. The script keeps collecting data in two variables, and at some point the thread runs out of memory. The script exports the data to a CSV file.
Maybe the problem is that the two variables that collect the data are consuming all my RAM. I'm using pandas to export the extracted data to a .csv file, with each scraped result written on the same row as the input value used to scrape it. Right now I do this by creating a DataFrame and then exporting it to the .csv. Is there any way to insert data into a .csv without needing to do that? Sorry for my bad English. Describing the process:
import easygui as eg
import pandas as pd

ler_excel = eg.fileopenbox("Select file...")
df = pd.read_csv(ler_excel, sep=';', keep_default_na=False)
list_data = df['data'].tolist()
operations = []
yes_or_no = []
This part loads the .csv file, reads the values from the data column, and saves them in a list. I also create two lists to store the data scraped for each value.
Once the data for a row is collected, the code does this:
operations.append('MANY TEXT SCRAPED')
if condition in yes_or_no:
    yes_or_no.append('yes')
else:
    yes_or_no.append('no')
This is where the operations and yes_or_no lists become too big for my memory to handle. From there, my code inserts the contents of the lists into columns of the DataFrame:
df["OPERATIONS"] = pd.Series(operations)
df["YES_NO"] = pd.Series(yes_or_no)
Then all I need to do is export the data to a .csv:
df.to_csv(cwd+"/database.csv", sep = ';', index = False)
But when df becomes heavy, somewhere in the code the thread running the loop runs out of memory, because I need to do several operations, like waiting for elements to appear using pg.locateOnScreen and copying and pasting data selected with the mouse. Is there any way to clear the data from the variables without losing the index or the order? I need the SCRAPED DATA for EVERY input value to be in the same row as that value, but in another column. I don't know if there's a way to do that.
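To make the question concrete, here is a minimal sketch of the row-by-row writing being asked about, using only the standard-library csv module instead of a DataFrame. It is an illustration under assumptions, not a tested fix: `scrape_one` and `scrape_to_csv` are hypothetical names, and `scrape_one` is a placeholder for the pyautogui steps. Each result is appended to the file next to the input value that produced it, so the order is preserved and nothing accumulates in lists.

```python
import csv
import os
import tempfile


def scrape_one(value):
    # Hypothetical stand-in for the pyautogui steps (locateOnScreen, copy/paste).
    # Returns the scraped text and a yes/no flag for one input value.
    text = f"scraped text for {value}"
    flag = "yes" if "1" in str(value) else "no"
    return text, flag


def scrape_to_csv(values, out_path, sep=";"):
    """Append one output row per input value so results never pile up in RAM."""
    # Write the header only when the file is new or empty.
    is_new = not os.path.exists(out_path) or os.path.getsize(out_path) == 0
    with open(out_path, "a", newline="", encoding="utf-8") as f:
        writer = csv.writer(f, delimiter=sep)
        if is_new:
            writer.writerow(["data", "OPERATIONS", "YES_NO"])
        for value in values:
            operations, yes_no = scrape_one(value)
            # The input value and its scraped data share a row, keeping the order.
            writer.writerow([value, operations, yes_no])
            f.flush()  # push the row to disk so a later crash loses little


# Small demo with made-up input values, written to a temporary folder.
out_path = os.path.join(tempfile.mkdtemp(), "database.csv")
scrape_to_csv(["a1", "b2", "c3"], out_path)

with open(out_path, newline="", encoding="utf-8") as f:
    rows = list(csv.reader(f, delimiter=";"))
```

In the real script, `values` would be the entries of df['data']. Because the file is opened in append mode, an interrupted run leaves the rows written so far on disk.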