Using ElementTree to extract <content:encoded>

I’m trying to extract the text between <content:encoded> and </content:encoded> using ElementTree in Python. The code I’m using is below, but so far it fails to extract anything. The output I want is "I love playing basketball and eating food." Can anyone help me see what’s wrong with my code?

xml = '''<item>
    <title>Defensive Moves</title>
    <link>www.timmy256.wordpress.com</link>
    <pubDate></pubDate>
    <dc:creator><![CDATA[jross]]></dc:creator>
    <guid isPermaLink="false"> www.timmy256.wordpress.com </guid>
    <description></description>
    <content:encoded><![CDATA[I love playing basketball and eating food.]]></content:encoded>
</item>'''

import xml.etree.ElementTree as ET

tree = ET.parse(xml)
root = tree.getroot()
data = root.iter("content:encoded").text
1 Answer(s)

Another method, using the simplified_scrapy library instead of ElementTree:

from simplified_scrapy import SimplifiedDoc

xml = '''<item>
    <title>Defensive Moves</title>
    <link>www.timmy256.wordpress.com</link>
    <pubDate></pubDate>
    <dc:creator><![CDATA[jross]]></dc:creator>
    <guid isPermaLink="false"> www.timmy256.wordpress.com </guid>
    <description></description>
    <content:encoded><![CDATA[I love playing basketball and eating food.]]></content:encoded>
</item>'''

doc = SimplifiedDoc(xml)
# html() appears to return the inner markup with the CDATA wrapper still attached,
# so the slice drops the 9-character "<![CDATA[" prefix and the 3-character "]]>" suffix.
print(doc.select('item>content:encoded>html()')[9:-3])

Result:

I love playing basketball and eating food. 

Here are more examples: https://github.com/yiyedata/simplified-scrapy-demo/tree/master/doc_examples
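
If you would rather stay with ElementTree, as the question asks, the original code has three separate problems: ET.parse() expects a filename or file object rather than an XML string, the dc: and content: prefixes are never declared in the fragment (so the parser rejects it with an "unbound prefix" error), and root.iter() returns an iterator, which has no .text attribute. Below is a minimal sketch of one way around all three. It assumes the prefixes map to the namespace URIs conventionally used in RSS/WordPress feeds; the fragment itself does not say, so check your real feed. The wrapper element <root> and the helper name extract_encoded are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Assumed prefix-to-URI bindings (the usual RSS content module and Dublin Core
# URIs); the fragment in the question does not declare them.
NS = {
    "content": "http://purl.org/rss/1.0/modules/content/",
    "dc": "http://purl.org/dc/elements/1.1/",
}

item_xml = '''<item>
    <title>Defensive Moves</title>
    <link>www.timmy256.wordpress.com</link>
    <pubDate></pubDate>
    <dc:creator><![CDATA[jross]]></dc:creator>
    <guid isPermaLink="false"> www.timmy256.wordpress.com </guid>
    <description></description>
    <content:encoded><![CDATA[I love playing basketball and eating food.]]></content:encoded>
</item>'''


def extract_encoded(fragment):
    """Return the text of <content:encoded> in an <item> fragment, or None."""
    # Wrap the fragment in a hypothetical <root> element that declares the
    # prefixes; without the declarations the parser raises "unbound prefix".
    decls = " ".join(f'xmlns:{prefix}="{uri}"' for prefix, uri in NS.items())
    root = ET.fromstring(f"<root {decls}>{fragment}</root>")
    # find() resolves the "content:" prefix through the NS mapping.
    node = root.find("item/content:encoded", NS)
    return node.text if node is not None else None


print(extract_encoded(item_xml))
```

If you parse the complete feed rather than a single <item>, the <rss> root normally carries the xmlns declarations already, so the wrapping step should be unnecessary and only the namespace mapping passed to find() is needed.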
