I currently have this code in Python using the feedparser module:



Python
import feedparser

RSS_FEEDS = {'cnn': 'http://rss.cnn.com/rss/edition.rss'}

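def get_news_test(publication="cnn"):
    # Download and parse the RSS feed for the chosen publication
    feed = feedparser.parse(RSS_FEEDS[publication])
    articles_cnn = feed['entries']

    # Each entry is a dict-like object describing one <item> of the feed
    for article in articles_cnn:
        print(article)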


get_news_test()


Each entry corresponds to an RSS item like the following (a single item from the feed):

XML
<item>
    <title>
        <![CDATA[Are China's latest weapons science fiction or battle-ready?]]>
    </title>
    <description>
        <![CDATA[Since the beginning of January, the Chinese military has revealed a dizzying array of sophisticated and powerful new weaponry. ]]>
    </description>
    <link>https://www.cnn.com/2019/01/19/asia/china-new-weapons-2019-intl/index.html</link>
    <guid isPermaLink="true">https://www.cnn.com/2019/01/19/asia/china-new-weapons-2019-intl/index.html</guid>
    <pubDate>Sun, 20 Jan 2019 06:04:16 GMT</pubDate>
    <media:group>
        <media:content medium="image" url="https://cdn.cnn.com/cnnnext/dam/assets/190119113947-china-df-26-missile-beijing-super-169.jpg" height="619" width="1100" />
        <media:content medium="image" url="https://cdn.cnn.com/cnnnext/dam/assets/190119113947-china-df-26-missile-beijing-large-11.jpg" height="300" width="300" />
        <media:content medium="image" url="https://cdn.cnn.com/cnnnext/dam/assets/190119113947-china-df-26-missile-beijing-vertical-large-gallery.jpg" height="552" width="414" />
        <media:content medium="image" url="https://cdn.cnn.com/cnnnext/dam/assets/190119113947-china-df-26-missile-beijing-video-synd-2.jpg" height="480" width="640" />
        <media:content medium="image" url="https://cdn.cnn.com/cnnnext/dam/assets/190119113947-china-df-26-missile-beijing-live-video.jpg" height="324" width="576" />
        <media:content medium="image" url="https://cdn.cnn.com/cnnnext/dam/assets/190119113947-china-df-26-missile-beijing-t1-main.jpg" height="250" width="250" />
        <media:content medium="image" url="https://cdn.cnn.com/cnnnext/dam/assets/190119113947-china-df-26-missile-beijing-vertical-gallery.jpg" height="360" width="270" />
        <media:content medium="image" url="https://cdn.cnn.com/cnnnext/dam/assets/190119113947-china-df-26-missile-beijing-story-body.jpg" height="169" width="300" />
        <media:content medium="image" url="https://cdn.cnn.com/cnnnext/dam/assets/190119113947-china-df-26-missile-beijing-t1-main.jpg" height="250" width="250" />
        <media:content medium="image" url="https://cdn.cnn.com/cnnnext/dam/assets/190119113947-china-df-26-missile-beijing-assign.jpg" height="186" width="248" />
        <media:content medium="image" url="https://cdn.cnn.com/cnnnext/dam/assets/190119113947-china-df-26-missile-beijing-hp-video.jpg" height="144" width="256" />
    </media:group>
</item>


I know I can access some parts of each entry, for instance the title, with:

Python
print(article.title)


Someone said this is JSON data, but I am having a hard time getting at the individual image tags.

What I have tried:

I have tried using <media:content> as a key, but that doesn't work.
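A likely reason the key lookup fails: feedparser does not keep the raw tag name. As far as I know, it turns each `<media:content>` element into a dict of that element's attributes and collects them in a list under the key `media_content` (colon replaced by an underscore). Here is a minimal sketch assuming that layout; `image_urls` is a hypothetical helper name, and printing `article.keys()` is a quick way to confirm what your feedparser version actually provides. The demo uses a plain dict shaped like a feedparser entry so it runs without network access.

```python
from typing import Any, Iterable, List, Mapping


def image_urls(entry: Mapping[str, Any]) -> List[str]:
    """Return the url attribute of every media:content element in an entry.

    Assumes feedparser's layout: entry["media_content"] is a list of dicts
    holding each element's attributes (url, medium, height, width).
    """
    media: Iterable[Mapping[str, str]] = entry.get("media_content", [])
    return [m["url"] for m in media if "url" in m]


# With feedparser installed, usage would look like:
#   feed = feedparser.parse('http://rss.cnn.com/rss/edition.rss')
#   for article in feed['entries']:
#       print(article.title, image_urls(article))

# Offline demo: a dict mimicking the assumed feedparser entry layout
sample_entry = {
    "title": "Are China's latest weapons science fiction or battle-ready?",
    "media_content": [
        {"medium": "image",
         "url": "https://cdn.cnn.com/cnnnext/dam/assets/190119113947-china-df-26-missile-beijing-super-169.jpg",
         "height": "619", "width": "1100"},
        {"medium": "image",
         "url": "https://cdn.cnn.com/cnnnext/dam/assets/190119113947-china-df-26-missile-beijing-large-11.jpg",
         "height": "300", "width": "300"},
    ],
}
print(image_urls(sample_entry))
```

Because feedparser entries are dict subclasses, `.get` also covers entries that have no images at all (an empty list comes back).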
Posted; updated 19-Jan-19 21:44pm

1 solution

It is not JSON; it is XML. See the "XML Processing Modules" page of the Python 3.7.2 documentation.
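A minimal sketch of that approach with `xml.etree.ElementTree` from the standard library. The key point is that `media:` is a namespace prefix, so the lookup has to map it to the Media RSS namespace URI. `http://search.yahoo.com/mrss/` is the URI Media RSS feeds normally declare, but that is an assumption here: check the `xmlns:media` attribute at the top of the real feed. The feed text is a trimmed copy of the item from the question, and `item_images` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET
from typing import List, Tuple

# Maps the "media" prefix to its namespace URI (assumed; confirm against
# the xmlns:media declaration in the real feed).
NS = {"media": "http://search.yahoo.com/mrss/"}

RSS_TEXT = """<rss version="2.0" xmlns:media="http://search.yahoo.com/mrss/">
  <channel>
    <item>
      <title><![CDATA[Are China's latest weapons science fiction or battle-ready?]]></title>
      <media:group>
        <media:content medium="image" url="https://cdn.cnn.com/cnnnext/dam/assets/190119113947-china-df-26-missile-beijing-super-169.jpg" height="619" width="1100" />
        <media:content medium="image" url="https://cdn.cnn.com/cnnnext/dam/assets/190119113947-china-df-26-missile-beijing-large-11.jpg" height="300" width="300" />
      </media:group>
    </item>
  </channel>
</rss>"""


def item_images(rss_text: str) -> List[Tuple[str, List[str]]]:
    """Return (title, [image urls]) for every <item> in an RSS document."""
    root = ET.fromstring(rss_text)
    results = []
    for item in root.iter("item"):
        title = (item.findtext("title") or "").strip()
        # ".//media:content" searches all descendants, so it also finds
        # elements nested inside <media:group>.
        urls = [el.get("url") for el in item.findall(".//media:content", NS)]
        results.append((title, urls))
    return results


for title, urls in item_images(RSS_TEXT):
    print(title)
    for url in urls:
        print("   ", url)
```

For the live feed, `ET.fromstring` also accepts the bytes returned by `urllib.request.urlopen(...).read()` on the feed URL from the question.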
 
 
Comments
Member 14123629 20-Jan-19 4:44am    
Thanks! I did this and can get a list of the image URLs, but I still don't know how to get at the individual elements. :(

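from bs4 import BeautifulSoup
import requests

# Download the raw RSS feed
source = requests.get('http://rss.cnn.com/rss/edition.rss')

# Parse it as XML (BeautifulSoup's 'xml' parser needs lxml installed)
soup = BeautifulSoup(source.text, 'xml')

# Print every <media:content> element found in the feed
for url in soup.find_all("media:content"):
    print(url)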
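To get at the individual elements rather than a printed list, keep the element objects and read their attributes. With BeautifulSoup that means `tag["url"]` or `tag.get("width")` on each tag returned by `find_all`. The sketch below swaps in the standard library's ElementTree so it runs without extra packages; it turns each `<media:content>` into a plain dict, takes one by position, and picks the widest image. `media_items` and `largest_image` are hypothetical helper names, the namespace URI is assumed as above, and the snippet is a trimmed copy of the question's `<media:group>`.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List

MEDIA_NS = "http://search.yahoo.com/mrss/"  # assumed Media RSS namespace URI

GROUP = f"""<media:group xmlns:media="{MEDIA_NS}">
  <media:content medium="image" url="https://cdn.cnn.com/cnnnext/dam/assets/190119113947-china-df-26-missile-beijing-super-169.jpg" height="619" width="1100" />
  <media:content medium="image" url="https://cdn.cnn.com/cnnnext/dam/assets/190119113947-china-df-26-missile-beijing-large-11.jpg" height="300" width="300" />
  <media:content medium="image" url="https://cdn.cnn.com/cnnnext/dam/assets/190119113947-china-df-26-missile-beijing-story-body.jpg" height="169" width="300" />
</media:group>"""


def media_items(group_xml: str) -> List[Dict[str, str]]:
    """Return each <media:content> element's attributes as a plain dict."""
    group = ET.fromstring(group_xml)
    # ElementTree expands "media:content" to "{namespace-uri}content"
    return [dict(el.attrib) for el in group.iter(f"{{{MEDIA_NS}}}content")]


def largest_image(items: List[Dict[str, str]]) -> Dict[str, str]:
    """Pick the item with the greatest width (attributes are strings, so convert)."""
    return max(items, key=lambda item: int(item.get("width", "0")))


items = media_items(GROUP)
first = items[0]                     # one individual element, by position
print(first["url"], first["width"], first["height"])
print(largest_image(items)["url"])   # the widest image
```

Converting `width` to `int` matters: compared as strings, `"300"` would sort above `"1100"`.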
Richard MacCutchan 20-Jan-19 6:49am    
Sorry, but I do not know BeautifulSoup. Try a Google search to find sample code.

This content, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)


