Why can I not access the Table containers in the html?
I am very new to Python and web scraping. I am trying to access the data in all of the tables on this web page, and I am unsure why my code is not working. Perhaps it has something to do with JavaScript and Python's inability to execute it. My code is:
from urllib.request import urlopen
from bs4 import BeautifulSoup
import requests

headers = {"User-Agent":"Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_4) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/83.0.4103.97 Safari/537.36"}
res = requests.get("https://www.mcmaster.com/cam-lock-fittings/material~aluminum/", headers=headers)
soup = BeautifulSoup(res.text, 'lxml')
item_containers = soup.findAll("div", {"class":"ItmTblCntnr PrsnttnTbl"})
print(len(item_containers))
Any help would be greatly appreciated! Thanks!
Maybe you should try using the html.parser and the content attribute of the response:
soup = BeautifulSoup(res.content, "html.parser")
By the way, which version of Beautiful Soup are you using? In mine I have to use find_all instead of findAll.
I went ahead and opened the web page that you are trying to access with your code. The spinner animation you see while the page loads indicates that it is using JavaScript to fetch its content. When you make a request with the requests library, it doesn't execute any JavaScript. It only receives the HTML that the server sends. In this case, the tables you are trying to access probably don't exist in the initial page load. So if you want to scrape a page like this, you would use browser automation software such as Selenium.
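Here is a rough sketch of what that could look like with Selenium. It assumes Chrome and the selenium package are installed, and that the rendered page really does contain div elements with the ItmTblCntnr class from your question (I have not verified this against the live site). The function names count_item_tables and fetch_rendered_html are just ones I made up for the example.

```python
from bs4 import BeautifulSoup


def count_item_tables(html: str) -> int:
    """Count <div> elements that carry the ItmTblCntnr class."""
    soup = BeautifulSoup(html, "html.parser")
    # class_ matches any div whose class list includes this name,
    # regardless of what other classes the div has.
    return len(soup.find_all("div", class_="ItmTblCntnr"))


def fetch_rendered_html(url: str, timeout: int = 15) -> str:
    """Load the page in headless Chrome and return the HTML after JavaScript ran."""
    # Imported here so the parsing helper above works without Selenium installed.
    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    options = webdriver.ChromeOptions()
    options.add_argument("--headless=new")
    driver = webdriver.Chrome(options=options)
    try:
        driver.get(url)
        # Assumption: the tables use the class name from the question.
        # Raises TimeoutException if no such element appears in time.
        WebDriverWait(driver, timeout).until(
            EC.presence_of_element_located((By.CSS_SELECTOR, "div.ItmTblCntnr"))
        )
        return driver.page_source
    finally:
        driver.quit()
```

You would then call count_item_tables(fetch_rendered_html("https://www.mcmaster.com/cam-lock-fittings/material~aluminum/")). If you pass res.text from your requests call to count_item_tables instead and get 0, that suggests the tables are added by JavaScript after the initial load.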