Why can I not access the Table containers in the html?

I am very new to Python and web scraping. I am trying to access the data in all of the tables on this web page, and I am unsure why my code is not working. Perhaps it has something to do with JavaScript and Python's inability to run it. My code is:

from urllib.request import urlopen
from bs4 import BeautifulSoup
import requests

headers = {"User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_4) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/83.0.4103.97 Safari/537.36"}
res = requests.get("https://www.mcmaster.com/cam-lock-fittings/material~aluminum/", headers=headers)

soup = BeautifulSoup(res.text, 'lxml')

item_containers = soup.findAll("div", {"class": "ItmTblCntnr PrsnttnTbl"})

print(len(item_containers))

Any help would be greatly appreciated! Thanks!

Asked on July 16, 2020 in Python.
2 Answer(s)

Maybe you should try using html.parser and the content attribute of the response:

soup = BeautifulSoup(res.content, "html.parser") 

By the way, which version of Beautiful Soup are you using? In mine I have to use find_all instead of findAll.
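A minimal sketch of that suggestion, assuming bs4 4.7 or later (needed for CSS selectors via select()). It parses a small hard-coded HTML sample, not the real page, with the built-in html.parser and shows two ways of matching the containers. find_all with class_ matches any div whose class list contains ItmTblCntnr, while select() requires both classes in any order. Passing the full string "ItmTblCntnr PrsnttnTbl" as the class value, as in the question, only matches when the attribute is exactly that string. The function name count_item_containers is hypothetical.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup standing in for the real page
SAMPLE_HTML = """
<div class="ItmTblCntnr PrsnttnTbl">table 1</div>
<div class="PrsnttnTbl ItmTblCntnr">table 2</div>
<div class="ItmTblCntnr">table 3</div>
"""


def count_item_containers(html):
    """Return (find_all count, select count) for the container divs."""
    # html.parser ships with Python, so lxml is not required
    soup = BeautifulSoup(html, "html.parser")
    # Matches any div whose class list includes "ItmTblCntnr"
    by_find_all = soup.find_all("div", class_="ItmTblCntnr")
    # Matches only divs carrying both classes, in any order
    by_select = soup.select("div.ItmTblCntnr.PrsnttnTbl")
    return len(by_find_all), len(by_select)


print(count_item_containers(SAMPLE_HTML))
```

Against the live site this would be called as count_item_containers(res.content). If it still returns (0, 0), the markup is most likely not in the HTML the server sends at all, and changing the parser will not help.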


I went ahead and opened the web page that you are trying to access with your code. The spinner animation you see while the page loads indicates that it uses JavaScript to fill in its content. When you make a request with the requests library, no JavaScript is executed; you only receive the HTML that the server sends. In this case, the tables you are trying to access probably don't exist in the initial page load. So if you want to scrape a page like this, you would use browser automation software such as Selenium.
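A rough Selenium sketch of that approach, assuming Selenium 4 with Chrome installed locally (recent Selenium 4 releases can download a matching driver themselves). The function names fetch_rendered_html and count_containers are made up for this example, and the CSS selector assumes the class names from the question are the ones the rendered page actually uses. The browser loads the page, waits until JavaScript has inserted at least one matching div, and hands the rendered HTML back to BeautifulSoup.

```python
from bs4 import BeautifulSoup

URL = "https://www.mcmaster.com/cam-lock-fittings/material~aluminum/"
# Assumption: the rendered tables use the class names from the question
SELECTOR = "div.ItmTblCntnr.PrsnttnTbl"


def fetch_rendered_html(url, selector, timeout=20):
    """Load url in headless Chrome and return the HTML after JavaScript runs."""
    # Imported here so count_containers works even without Selenium installed
    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    options = webdriver.ChromeOptions()
    options.add_argument("--headless=new")
    driver = webdriver.Chrome(options=options)
    try:
        driver.get(url)
        # Block until a matching element exists; raises TimeoutException otherwise
        WebDriverWait(driver, timeout).until(
            EC.presence_of_element_located((By.CSS_SELECTOR, selector))
        )
        return driver.page_source
    finally:
        # Always close the browser, even if the wait times out
        driver.quit()


def count_containers(html, selector=SELECTOR):
    """Count elements matching selector in an HTML string."""
    return len(BeautifulSoup(html, "html.parser").select(selector))
```

Usage: print(count_containers(fetch_rendered_html(URL, SELECTOR))). The explicit wait is the important part: driver.get() returns once the page's load event fires, but content fetched by JavaScript afterwards may not be in page_source yet.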

