I currently have this code in Python, using the feedparser module:



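import feedparser

# Map of publication names to their RSS feed URLs
RSS_FEEDS = {'cnn': 'http://rss.cnn.com/rss/edition.rss'}

def get_news_test(publication="cnn"):
    # Download and parse the feed; 'entries' holds one dict-like object per <item>
    feed = feedparser.parse(RSS_FEEDS[publication])
    articles_cnn = feed['entries']

    for article in articles_cnn:
        print(article)


get_news_test()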


Each entry comes from an <item> in the feed. The raw RSS for a single item (one iteration of the loop) looks like this:

<item>
            <title>
                <![CDATA[Are China's latest weapons science fiction or battle-ready?]]>
            </title>
            <description>
                <![CDATA[Since the beginning of January, the Chinese military has revealed a dizzying array of sophisticated and powerful new weaponry. ]]>
            </description>
            <link>https://www.cnn.com/2019/01/19/asia/china-new-weapons-2019-intl/index.html</link>
            <guid isPermaLink="true">https://www.cnn.com/2019/01/19/asia/china-new-weapons-2019-intl/index.html</guid>
            <pubDate>Sun, 20 Jan 2019 06:04:16 GMT</pubDate>
            <media:group>
                <media:content medium="image" url="https://cdn.cnn.com/cnnnext/dam/assets/190119113947-china-df-26-missile-beijing-super-169.jpg" height="619" width="1100" />
                <media:content medium="image" url="https://cdn.cnn.com/cnnnext/dam/assets/190119113947-china-df-26-missile-beijing-large-11.jpg" height="300" width="300" />
                <media:content medium="image" url="https://cdn.cnn.com/cnnnext/dam/assets/190119113947-china-df-26-missile-beijing-vertical-large-gallery.jpg" height="552" width="414" />
                <media:content medium="image" url="https://cdn.cnn.com/cnnnext/dam/assets/190119113947-china-df-26-missile-beijing-video-synd-2.jpg" height="480" width="640" />
                <media:content medium="image" url="https://cdn.cnn.com/cnnnext/dam/assets/190119113947-china-df-26-missile-beijing-live-video.jpg" height="324" width="576" />
                <media:content medium="image" url="https://cdn.cnn.com/cnnnext/dam/assets/190119113947-china-df-26-missile-beijing-t1-main.jpg" height="250" width="250" />
                <media:content medium="image" url="https://cdn.cnn.com/cnnnext/dam/assets/190119113947-china-df-26-missile-beijing-vertical-gallery.jpg" height="360" width="270" />
                <media:content medium="image" url="https://cdn.cnn.com/cnnnext/dam/assets/190119113947-china-df-26-missile-beijing-story-body.jpg" height="169" width="300" />
                <media:content medium="image" url="https://cdn.cnn.com/cnnnext/dam/assets/190119113947-china-df-26-missile-beijing-t1-main.jpg" height="250" width="250" />
                <media:content medium="image" url="https://cdn.cnn.com/cnnnext/dam/assets/190119113947-china-df-26-missile-beijing-assign.jpg" height="186" width="248" />
                <media:content medium="image" url="https://cdn.cnn.com/cnnnext/dam/assets/190119113947-china-df-26-missile-beijing-hp-video.jpg" height="144" width="256" />
            </media:group>
        </item>


I know I can access some parts of this, for instance the title, with:

print(article.title)


Someone said this is JSON data, but I am having a hard time getting at the individual image tags.

What I have tried:

I have tried using <media:content> as a key, but that doesn't work.
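For reference, a minimal sketch of the feedparser route. It assumes (as I understand feedparser's Media RSS support) that each entry exposes its <media:content> elements as a list of attribute dicts under the key media_content rather than under the literal tag name; check with print(article.keys()) before relying on it. The helper names image_urls and largest_image are hypothetical, made up for this example.

```python
from typing import Any, List, Mapping, Optional


def image_urls(entry: Mapping[str, Any]) -> List[str]:
    """Return the image URLs from one feedparser entry.

    Assumes <media:content> elements were exposed as a list of attribute
    dicts under the "media_content" key (verify with entry.keys()).
    """
    urls: List[str] = []
    for media in entry.get("media_content", []):
        # Each item is a dict of the element's attributes (url, medium, height, width).
        if media.get("medium", "image") == "image" and "url" in media:
            urls.append(media["url"])
    return urls


def largest_image(entry: Mapping[str, Any]) -> Optional[str]:
    """Return the URL of the image with the most pixels, or None if there is none."""
    best_url, best_area = None, -1
    for media in entry.get("media_content", []):
        if "url" not in media:
            continue
        # Attribute values are assumed to arrive as strings, so convert before comparing.
        try:
            area = int(media.get("width", 0)) * int(media.get("height", 0))
        except ValueError:
            area = 0
        if area > best_area:
            best_url, best_area = media["url"], area
    return best_url
```

With the loop from the question, this would be something like print(image_urls(article)) inside for article in articles_cnn:, and each element of the returned list is one individual image URL.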
Updated 19-Jan-19 22:44pm

1 solution


Solution 1

It is not JSON; it is XML. See XML Processing Modules in the Python 3.7.2 documentation.
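Since the feed is XML, the standard library's xml.etree.ElementTree (one of the modules on that page) can pull out each image directly. The media: prefix has to be mapped to its namespace URI; this sketch assumes the usual Media RSS URI http://search.yahoo.com/mrss/, so check the xmlns:media declaration at the top of the actual feed. The function name items_with_images is hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List, Union

# Assumed namespace URI for the "media:" prefix (the standard Media RSS URI).
# Check the xmlns:media="..." declaration in the actual feed.
NAMESPACES = {"media": "http://search.yahoo.com/mrss/"}


def items_with_images(rss_xml: Union[str, bytes]) -> List[Dict[str, object]]:
    """Return the title, link and image URLs of every <item> in an RSS document."""
    root = ET.fromstring(rss_xml)
    results: List[Dict[str, object]] = []
    for item in root.iter("item"):
        # ".//" searches all descendants, so <media:content> inside
        # <media:group> is found too; the prefix is resolved via NAMESPACES.
        images = [
            content.get("url")
            for content in item.findall(".//media:content", NAMESPACES)
            if content.get("url")
        ]
        results.append({
            # CDATA sections become ordinary text; strip surrounding whitespace.
            "title": (item.findtext("title") or "").strip(),
            "link": (item.findtext("link") or "").strip(),
            "images": images,
        })
    return results
```

Passing it requests.get('http://rss.cnn.com/rss/edition.rss').content should give one dict per story, and each "images" list holds the individual URLs, so items[0]["images"][0] would be the first image of the first story.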
   
Comments
Member 14123629 20-Jan-19 4:44am
   
Thanks! I did this and can get a list of the image URLs, but I still don't know how to get at the individual elements. :(

from bs4 import BeautifulSoup
import requests

# Download the raw RSS feed
source = requests.get('http://rss.cnn.com/rss/edition.rss')

# Parse it as XML (BeautifulSoup's 'xml' parser requires lxml)
soup = BeautifulSoup(source.text, 'xml')

# Print every <media:content> element in the feed
for url in soup.find_all("media:content"):
    print(url)
Richard MacCutchan 20-Jan-19 6:49am
   
Sorry, but I do not know BeautifulSoup. Try a Google search to find sample code.

This content, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)



