What kind of object is this and how do I work with it?

I called urllib.request.urlopen on a URL and then called read() on the result. This is the content I got:

b'%PDF-1.5\r%\xc8\xc8\xc8\xc8\xc8\xc8\xc8\r1 0 obj\n<</Type/Page/Parent 41 0 R/Resources<</Font<</F1 32460 0 R/F2 32461 0 R/F5 32464 0 R/F4 32463 0 R/F6 32465 0 R>>/ProcSet[/PDF/Text/ImageB/ImageC/ImageI]>>/MediaBox[0 0 595.2 841.6]/Contents 2 0 R/Group<</Type/Group/S/Transparency/CS/DeviceRGB>>/Tabs/S/StructParents 1>>\r\nendobj\n2 0 obj\n<</Filter/FlateDecode/Length 38817>>stream\r\n\x1e\x9a\x04z\xe6\xd7S\x9e2\xf7_V<oRy\x9f1\x84\x94x\xed\xf8g\xe1.x\nx\r`p\xc2S\x06\x8b{\xb0\xe2\xca\xedG\n\xdcJ\x82\x1e\xa6)\xd9*\xb3\xf9\x8c\x12P!&\xb0\xbc\xb7Z\x8d\xa7@\x11<\x9d\xbba\xee&\xa8v?\xd4\xc9\x83\xfc\xbb2H\x01\xcf\x08\xa2\xee\x90\x8a\x1b\xee\x0e\x19F\xa9\xd8\xbb_|;]\x8e\x8a\xfc\xd7\x11\xf7\xa2\xd2W\x84+\xa5\xe0\xc3\xcc=\x9b\xd4\xf0L\xdep\xf1\xbf40>\x13PY\x89(\xc5\xbd<\xeft\x93\xc5\xa8\xd0\xf6\x16' 

(that’s just the first 500 characters)

My goal is to extract the panel data in the PDF.

Can someone please suggest where I can learn more about this and how to read it into the original structure?

The URL on its own points to an online PDF.

1 Answer(s)

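This is a bytes object, not bytecode: read() returns the raw binary content of the response. The %PDF-1.5 header at the start shows that it is the PDF file itself.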

Check it out here (JournalDev) and here (Python Documentation)

You can also find more information about working with PDFs in Python here
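
As a rough illustration of working with the value returned by read(), here is a minimal sketch that uses only the standard library. It checks for the PDF header, wraps the bytes in an in-memory file object, and can write them to disk unchanged. The helper names is_pdf and save_pdf and the file name example.pdf are hypothetical, and the short data value is a stand-in for the real download.

```python
import io
from pathlib import Path


def is_pdf(data: bytes) -> bool:
    """Return True if the bytes start with the PDF magic header."""
    return data.startswith(b"%PDF-")


def save_pdf(data: bytes, path: str) -> Path:
    """Write the raw bytes to disk unchanged (binary, no decoding)."""
    if not is_pdf(data):
        raise ValueError("data does not look like a PDF")
    target = Path(path)
    target.write_bytes(data)
    return target


# Stand-in for urllib.request.urlopen(url).read(); only the first few
# bytes of the output shown in the question are reproduced here.
data = b"%PDF-1.5\r%\xc8\xc8\xc8\xc8\xc8\xc8\xc8\r1 0 obj\n"

print(type(data))    # the object is bytes, not str
print(is_pdf(data))  # header check

# A file-like view of the same bytes, for libraries that expect a file.
buffer = io.BytesIO(data)

# To keep a copy on disk (hypothetical file name):
# save_pdf(data, "example.pdf")
```

The bytes should not be decoded to str, because the compressed streams inside a PDF are not text. Extracting the tables themselves would need a third-party PDF library (for example pypdf or pdfplumber, neither of which is named in the original answer), which can typically be given either the saved file or a file-like object such as buffer.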

Answered on July 16, 2020.