Unable to parse the links of different cases from next pages using requests

I've created a script to parse the links of different cases revealed upon selecting an option in a dropdown on a webpage. The website is http://surrogateweb.co.ocean.nj.us/BluestoneWeb/Default.aspx; the option Probate should be chosen from the dropdown titled Case Type, located at the top right, before hitting the search button. All the other options should be left as they are.

The script can parse the links of different cases from the first page flawlessly. However, I can't make the script go on to the next pages to collect links from there as well.

This is how the pager for the next pages appears at the bottom of the results: (screenshot omitted)

And this is how the dropdown should look when the option is chosen: (screenshot omitted)

This is what I've tried so far:

import requests
from bs4 import BeautifulSoup

link = "http://surrogateweb.co.ocean.nj.us/BluestoneWeb/Default.aspx"

with requests.Session() as s:
    s.headers['User-Agent'] = 'Mozilla/5.0 (Windows NT 6.1; ) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/83.0.4103.116 Safari/537.36'
    r = s.get(link)
    soup = BeautifulSoup(r.text, "lxml")
    # collect every named form field (including the ASP.NET hidden fields
    # such as __VIEWSTATE) so the POST mirrors what the browser would send
    payload = {i['name']: i.get('value', '') for i in soup.select('input[name],select')}
    for k, v in payload.items():
        if k.endswith('ComboBox_case_type'):
            payload[k] = "Probate"
        elif k.endswith('ComboBox_case_type_VI'):
            payload[k] = "WILL"
        elif k.endswith('ComboBox_case_type$DDD$L'):
            payload[k] = "WILL"
        elif k.endswith('ComboBox_town$DDD$L'):
            payload[k] = "%"

    # replay the search postback and pull the case links off the first page
    r = s.post(link, data=payload)
    soup = BeautifulSoup(r.text, "lxml")
    for pk_id in soup.select("a.dxeHyperlink_Youthful[href*='Q_PK_ID']"):
        print(pk_id.get("href"))

How can I collect the links of different cases from next pages using requests?

PS: I'm not after any selenium-related solution.
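From what I understand, the pager triggers an ASP.NET postback, so a requests-only solution would presumably replay that postback while re-sending the fresh hidden fields from the previous response. A rough sketch of what I mean (every DevExpress-specific name and value below is a guess that would need to be confirmed by watching the request the "next" button actually sends in the browser's network tab):

import requests
from bs4 import BeautifulSoup

link = "http://surrogateweb.co.ocean.nj.us/BluestoneWeb/Default.aspx"

def hidden_fields(soup):
    # re-read every named input so each POST carries the fresh
    # __VIEWSTATE / __EVENTVALIDATION tokens from the previous response
    return {i['name']: i.get('value', '') for i in soup.select('input[name]')}

with requests.Session() as s:
    s.headers['User-Agent'] = 'Mozilla/5.0'
    soup = BeautifulSoup(s.get(link).text, "lxml")
    # ... perform the first search POST exactly as in the script above ...
    # then, to turn the page, replay the postback the pager would fire:
    payload = hidden_fields(soup)
    # ASSUMED control id and argument -- copy the real ones from the
    # request the pager sends (browser DevTools, Network tab)
    payload['__EVENTTARGET'] = 'ctl00$ContentPlaceHolder1$ASPxSplitter1$ASPxGridView_search'
    payload['__EVENTARGUMENT'] = 'PBN'
    soup = BeautifulSoup(s.post(link, data=payload).text, "lxml")
    for pk_id in soup.select("a.dxeHyperlink_Youthful[href*='Q_PK_ID']"):
        print(pk_id.get("href"))

Note that DevExpress grids sometimes page through a callback (__CALLBACKID/__CALLBACKPARAM) instead of a plain postback, which is why checking the actual network traffic would be the first step.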

1 Answer(s)

This code works, but it uses selenium instead of requests.

You need to install the selenium Python library and download geckodriver. If you don't want to keep geckodriver in c:/program, change executable_path= to the path where your geckodriver lives. You may also want to make the sleep times shorter, but the site loads so slowly (for me) that I had to set long sleep times so the page loads completely before the script tries to read from it.
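A side note in case you are on Selenium 4 or newer: the executable_path argument and the find_element_by_* helpers used in the code below were removed, so the setup and lookups would be spelled like this instead (the geckodriver path is still yours to adjust):

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.firefox.service import Service

# Selenium 4 style: the driver path moves into a Service object
driver = webdriver.Firefox(service=Service('c:/program/geckodriver.exe'))
# and element lookups take a By strategy plus the selector
dropdown = driver.find_element(By.CSS_SELECTOR, '#ContentPlaceHolder1_ASPxSplitter1_ASPxComboBox_case_type_B-1')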

from selenium import webdriver
from bs4 import BeautifulSoup
import time

link = "http://surrogateweb.co.ocean.nj.us/BluestoneWeb/Default.aspx"
driver = webdriver.Firefox(executable_path='c:/program/geckodriver.exe')
driver.get(link)
# open the Case Type dropdown and pick the Probate entry
dropdown = driver.find_element_by_css_selector('#ContentPlaceHolder1_ASPxSplitter1_ASPxComboBox_case_type_B-1')
dropdown.click()
time.sleep(0.5)
cases = driver.find_elements_by_css_selector('.dxeListBoxItem_Youthful')
for case in cases:
    if case.text == 'Probate':
        time.sleep(5)
        case.click()
        time.sleep(5)
search = driver.find_element_by_css_selector('#ContentPlaceHolder1_ASPxSplitter1_ASPxButton_search')
search.click()
# walk the pager: scrape the current page, then click "next" until it is gone
while True:
    time.sleep(15)
    soup = BeautifulSoup(driver.page_source, "lxml")
    for pk_id in soup.select("a.dxeHyperlink_Youthful[href*='Q_PK_ID']"):
        print(pk_id.get("href"))
    next_buttons = driver.find_elements_by_css_selector('.dxWeb_pNext_Youthful')
    if len(next_buttons) > 0:
        next_buttons[0].click()
    else:
        break
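If the fixed sleeps feel brittle, explicit waits poll until a condition holds instead of pausing for a fixed time, so they finish as soon as the page is ready. A minimal sketch of the idea, reusing the selectors from the answer (written against the same find_element_by_* era API as the code above):

from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

wait = WebDriverWait(driver, 20)  # poll up to 20 seconds, return as soon as ready
# block until the search button is clickable instead of sleeping blindly
search = wait.until(EC.element_to_be_clickable(
    (By.CSS_SELECTOR, '#ContentPlaceHolder1_ASPxSplitter1_ASPxButton_search')))
search.click()
# likewise, wait for at least one result link before parsing the page source
wait.until(EC.presence_of_element_located(
    (By.CSS_SELECTOR, "a.dxeHyperlink_Youthful[href*='Q_PK_ID']")))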
Answered on July 16, 2020.
