Scraping a specific website with a search box and javascripts in Python

Question

Home

Scraping a specific website with a search box and javascripts in Python

0

On the website https://sray.arabesque.com/dashboard there is a search box "input" in html. I want to enter a company name in the search box, choose the first suggestion for that name in the dropout menu (e.g., "Anglo American plc"), go to the url with the info about that company, load javascripts to get full html version of the obtained page, and then scrape it for GC Score, ESG Score, Temperature Score in the bottom.

!apt install chromium-chromedriver !cp /usr/lib/chromium-browser/chromedriver /usr/bin !pip install selenium  from selenium import webdriver from selenium.webdriver.common.keys import Keys options = webdriver.ChromeOptions() options.add_argument('-headless') options.add_argument('-no-sandbox') options.add_argument('-disable-dev-shm-usage')  wd = webdriver.Chrome('chromedriver',options=options)  companies = ['Anglo American plc']  for company in companies:   # dryscrape.start_xvfb()   # session = dryscrape.Session()   # session.visit("https://srayapi.arabesque.com/api/sray/company/history/004BTP-E")   resp = wd.get('https://sray.arabesque.com/dashboard/') #print(driver.page_source)   e = wd.find_element_by_id(id_='mat-input-0')   e.send_keys(company)   e.send_keys(Keys.ENTER)   innerHTML = e.execute_script("return document.body.innerHTML")   print(innerHTML)

I don’t quite understand how to visit an URL with info about Anglo American and scrape it if we don’t know the URL after entering the company name in the search box.

Billymargojosefa Asked on July 16, 2020 in Python.

Share
Comment(0)

Add Comment

2 Answer(s)

Votes
Oldest

0

You can do that using selenium.Couple of things you need to update.

While interacting headless you need to provide window size.

Induce WebDriverWait() to avoid synchronization issue.

Code:

from selenium import webdriver from selenium.webdriver.support.ui import WebDriverWait from selenium.webdriver.support import expected_conditions as EC from selenium.webdriver.common.by import By  options = webdriver.ChromeOptions() options.add_argument('-headless') options.add_argument('-no-sandbox') options.add_argument('-disable-dev-shm-usage') options.add_argument('window-size=1920,1080')  wd = webdriver.Chrome(options=options)  companies = ['Anglo American plc']  for company in companies:   wd.get('https://sray.arabesque.com/dashboard/')   WebDriverWait(wd, 20).until(EC.element_to_be_clickable((By.XPATH, "//a[text()='list']"))).click()   WebDriverWait(wd, 20).until(EC.element_to_be_clickable((By.XPATH, "//input[@id='mat-input-0']"))).send_keys(company)   WebDriverWait(wd, 20).until(EC.element_to_be_clickable((By.XPATH, "//span[contains(.,' Anglo American plc ')]"))).click()   WebDriverWait(wd, 20).until(EC.element_to_be_clickable((By.XPATH, "(//span[normalize-space(.)='Open dashboard'])[1]"))).click()   WebDriverWait(wd,10).until(EC.visibility_of_element_located((By.CSS_SELECTOR,"div.mat-tab-labels")))   print(wd.find_element_by_xpath("//div[@class='mat-tab-label-content'][contains(.,'GC Score')]/span").text)   print(wd.find_element_by_xpath("//div[@class='mat-tab-label-content'][contains(.,'ESG Score')]/span").text)   print(wd.find_element_by_xpath("//div[@class='mat-tab-label-content'][contains(.,'Temp')]/span").text)

Output:

57.03 53.78 2.7°C

Mohamedclarissaangie Answered on July 16, 2020.

Share
Comment(0)

Add Comment

0

Without exactly knowing why you want to use selenium, use the search and then getting another site, here is what I would do to get the data you are looking for:

import requests import json  session = requests.Session() url = 'https://srayapi.arabesque.com/api/sray/q' response = session.get(url).json()  rays = response['data']['rays'] [ray for ray in rays if ray['name'].startswith('Anglo American')]

Then do whatever you want, so for esg, gc and temperature perhaps:

myObj = [{result['name']: {'gc': result['gc'], 'esg': result['esg'], 'temp': result['score_near']}} for result in results]

Mohamedlucianofrankie Answered on July 16, 2020.

Share
Comment(0)

Add Comment

Your Answer

Answer 1

BuddyPress is a plugin for WordPress that enables you to create a social network or community website. It has all the...

Answer 2

I value you getting some margin to help me with this task. Without you, no part of this would have...

Answer 3

Try to define a Cohesive class, until and unless the methods are written relevant to the class and it defines...

Answer 4

Try to add exportAllData: true, as an other option, hope it helps :)

Answer 5

DataSet can read an XML, infer schema and create a tabular representation that's easy to manipulate: DataSet ip1 = new...

Answer 6

I created a class and used Xml Linq : using System; using System.Collections.Generic; using System.Linq; using System.Text; using System.Xml; using...

Answer 7

XDocument first = XDocument.Load(args[0]); XDocument second = XDocument.Load(args[1]); var result = new XElement( "ipaddresses", first.Root.Elements("ip") .Zip(second.Root.Elements("ip"), (f, s) => {...

Answer 8

Following your code for the header row, you could achieve this by an <xsl:apply-templates select="/report/order_actions/order_action[order_id = current()/order_id]" /> As well...

Answer 9

BuddyPress is a plugin for WordPress that enables you to create a social network or community website. It has all the...

Answer 10

I value you getting some margin to help me with this task. Without you, no part of this would have...

Answer 11

Try to define a Cohesive class, until and unless the methods are written relevant to the class and it defines...

Answer 12

Try to add exportAllData: true, as an other option, hope it helps :)

Answer 13

DataSet can read an XML, infer schema and create a tabular representation that's easy to manipulate: DataSet ip1 = new...

Answer 14

I created a class and used Xml Linq : using System; using System.Collections.Generic; using System.Linq; using System.Text; using System.Xml; using...

Answer 15

XDocument first = XDocument.Load(args[0]); XDocument second = XDocument.Load(args[1]); var result = new XElement( "ipaddresses", first.Root.Elements("ip") .Zip(second.Root.Elements("ip"), (f, s) => {...

Answer 16

Following your code for the header row, you could achieve this by an <xsl:apply-templates select="/report/order_actions/order_action[order_id = current()/order_id]" /> As well...

LATEST ANSWERS

Scraping a specific website with a search box and javascripts in Python

Your Answer

TOP USERS

HOT QUESTIONS

LATEST ANSWERS

Scraping a specific website with a search box and javascripts in Python

Your Answer

Tags Widget

TOP USERS

HOT QUESTIONS