
Brrrr! It's cold in here

DavidCrow, 23 Oct 2009
A brief description of how to parse XML from a few weather-related sites.

Introduction

While I was working on my article dealing with time servers, someone posted a question on the C/C++/MFC board about a central weather server. Even though such a thing does not currently exist, I thought it should still be possible to do some sort of "screen scraping" to get the desired data.

A few years back, I cobbled together some little VBScript files for myself that would automatically log me in to certain websites (e.g., company timesheet, no-purchase-necessary contests). They worked well enough, but they made a lot of assumptions, namely that certain HTML elements would exist and be in specific locations. If the owner of the HTML page moved stuff around, my scripts would be broken. Having this in the back of my mind, I was not too fond of trying to extract weather information from a "moving target".

Knowing that some sites, such as CP, were sending out some of their content via XML (i.e., RSS), I decided to give that a look. The result of that exercise, albeit not earth-shattering, is detailed below. Undeniably, harvesting data from an XML file rather than an HTML file is still susceptible to elements not being present or present in a different location, but it seems to be less of an issue.

I quickly found three websites that had their weather information available in XML format: NOAA, Google, and Yahoo!. You will need to change each of the URLs referenced below for your specific area unless, of course, you are interested in the weather for my area.

XML Parser

When I first started this project, I wanted to use TinyXML because I had never used it before. Since it is strictly C++ code, I had to make a few minor tweaks to get it to compile alongside MFC. Maybe I was just being a bit too pedantic, but the code I ended up with to parse all three XML files just did not sit right with me. Admittedly, this was my first attempt at using it, so maybe my reasoning is just off. I settled for using Microsoft's XML Core Services (MSXML) instead. To expose the type library, simply add the following to the project's stdafx.h file:

#import <msxml6.dll>
using namespace MSXML2;

This will expose various "smart pointer" interfaces such as IXMLDOMDocument2Ptr, IXMLDOMNodePtr, and IXMLDOMNamedNodeMapPtr. I used MSXML version 6 (even though the ProgID has MSXML2 in the name) for this exercise for no other reason than it was the newest version on my machine.

NOAA

The file that we want to download from NOAA's National Weather Service is http://www.weather.gov/xml/current_obs/KTUL.xml. I've trimmed out the elements that are not part of this exercise, leaving:

<current_observation version="1.0"> 
   <location>Tulsa International Airport, OK</location> 
   <observation_time>Last Updated on Oct 14 2009, 2:53 pm CDT</observation_time> 
   <weather>Overcast</weather> 
   <temperature_string>55.0 F (12.8 C)</temperature_string> 
   <relative_humidity>83</relative_humidity> 
   <wind_string>North at 4.6 MPH (4 KT)</wind_string> 
   <dewpoint_string>50.0 F (10.0 C)</dewpoint_string> 
   <windchill_string>54 F (12 C)</windchill_string> 
   <visibility_mi>10.00</visibility_mi> 
</current_observation>

Because this file is laid out in a very straightforward fashion (i.e., little to no nesting of elements), it is very easy to parse. After the file has been downloaded, the parsing code looks like:

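IXMLDOMDocument2Ptr pDoc;
HRESULT hr = pDoc.CreateInstance(_T("MSXML2.DOMDocument.6.0"));
if (SUCCEEDED(hr))
{
    // load synchronously so the document is fully parsed before it is queried
    pDoc->async = VARIANT_FALSE;

    if (pDoc->load(COleVariant(temp.m_strTempFilename)))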
    {
        IXMLDOMNodePtr pNode = pDoc->selectSingleNode(_T("current_observation/location"));
        m_lblLocation.SetWindowText(pNode->GetnodeTypedValue().bstrVal);

        pNode = pDoc->selectSingleNode(_T("current_observation/observation_time"));
        m_lblLastUpdated.SetWindowText(pNode->GetnodeTypedValue().bstrVal);

        pNode = pDoc->selectSingleNode(_T("current_observation/weather"));
        m_lblWeather.SetWindowText(pNode->GetnodeTypedValue().bstrVal);
        
        pNode = pDoc->selectSingleNode(_T("current_observation/temperature_string"));
        m_lblTemperature.SetWindowText(pNode->GetnodeTypedValue().bstrVal);
        
        pNode = pDoc->selectSingleNode(_T("current_observation/dewpoint_string"));
        m_lblDewPoint.SetWindowText(pNode->GetnodeTypedValue().bstrVal);
        
        pNode = pDoc->selectSingleNode(_T("current_observation/relative_humidity"));        
        m_lblHumidity.SetWindowText(pNode->GetnodeTypedValue().bstrVal);

        pNode = pDoc->selectSingleNode(_T("current_observation/wind_string"));
        m_lblWind.SetWindowText(pNode->GetnodeTypedValue().bstrVal);
        
        pNode = pDoc->selectSingleNode(_T("current_observation/windchill_string"));
        m_lblWindChill.SetWindowText(pNode->GetnodeTypedValue().bstrVal);
        
        pNode = pDoc->selectSingleNode(_T("current_observation/visibility_mi"));
        m_lblVisibility.SetWindowText(pNode->GetnodeTypedValue().bstrVal);
    }
}

As you can see, this is highly redundant. Once the file has been loaded, a cleaned-up version looks like:

struct 
{
    const TCHAR *pszXMLChildName;   // initialized from string literals, so const
    CWnd *pwndControl;
} ControlInfo[] =
{
    { _T("location"),           &m_lblLocation },
    { _T("observation_time"),   &m_lblLastUpdated },
    { _T("weather"),            &m_lblWeather },
    { _T("temperature_string"), &m_lblTemperature },
    { _T("dewpoint_string"),    &m_lblDewPoint },
    { _T("relative_humidity"),  &m_lblHumidity },
    { _T("wind_string"),        &m_lblWind },
    { _T("windchill_string"),   &m_lblWindChill },
    { _T("visibility_mi"),      &m_lblVisibility }
};
...
IXMLDOMNodePtr pParent = pDoc->selectSingleNode(_T("current_observation"));

for (int x = 0; x < sizeof(ControlInfo) / sizeof(ControlInfo[0]); x++)
{
    IXMLDOMNodePtr pNode = pParent->selectSingleNode(ControlInfo[x].pszXMLChildName);
    if (pNode != NULL)
        ControlInfo[x].pwndControl->SetWindowText(pNode->GetnodeTypedValue().bstrVal);
}

If I wanted to add or remove any elements, it would simply be a matter of changing the ControlInfo array.

Google

Google's weather feed is very similar to NOAA's. It has a bit more nesting and more sections, though. The file to download is http://www.google.com/ig/api?weather=74135. In it, the elements that we are interested in are laid out like:

<xml_api_reply version="1"> 
   <weather module_id="0" tab_id="0" mobile_row="0" mobile_zipped="1" row="0" section="0"> 
      <forecast_information> 
         <city data="Tulsa, OK" /> 
         <current_date_time data="2009-10-15 18:53:00 +0000" /> 
      </forecast_information> 
      <current_conditions> 
         <condition data="Overcast" /> 
         <temp_f data="52" /> 
         <humidity data="Humidity: 80%" /> 
         <wind_condition data="Wind: N at 7 mph" /> 
      </current_conditions> 
   </weather> 
</xml_api_reply>

After this file has been loaded, we can do the parsing with:

IXMLDOMNodePtr pNode = 
  pDoc->selectSingleNode(_T("xml_api_reply/weather/forecast_information/city"));
IXMLDOMNamedNodeMapPtr pAttributes = pNode->Getattributes();
IXMLDOMNodePtr pAttribute = pAttributes->getNamedItem(_T("data"));
m_lblCity.SetWindowText(pAttribute->GetnodeTypedValue().bstrVal);

pNode = pDoc->selectSingleNode(_T("xml_api_reply/weather/") 
             _T("forecast_information/current_date_time"));
pAttributes = pNode->Getattributes();
pAttribute = pAttributes->getNamedItem(_T("data"));
m_lblForecast.SetWindowText(pAttribute->GetnodeTypedValue().bstrVal);

pNode = pDoc->selectSingleNode(_T("xml_api_reply/weather/current_conditions/condition"));
pAttributes = pNode->Getattributes();
pAttribute = pAttributes->getNamedItem(_T("data"));
m_lblCurrent.SetWindowText(pAttribute->GetnodeTypedValue().bstrVal);

pNode = pDoc->selectSingleNode(_T("xml_api_reply/weather/current_conditions/temp_f"));
pAttributes = pNode->Getattributes();
pAttribute = pAttributes->getNamedItem(_T("data"));
m_lblTemperature.SetWindowText(pAttribute->GetnodeTypedValue().bstrVal);

pNode = pDoc->selectSingleNode(_T("xml_api_reply/weather/current_conditions/humidity"));
pAttributes = pNode->Getattributes();
pAttribute = pAttributes->getNamedItem(_T("data"));
m_lblHumidity.SetWindowText(pAttribute->GetnodeTypedValue().bstrVal);

pNode = pDoc->selectSingleNode(_T("xml_api_reply/weather/current_conditions/wind_condition"));
pAttributes = pNode->Getattributes();
pAttribute = pAttributes->getNamedItem(_T("data"));
m_lblWind.SetWindowText(pAttribute->GetnodeTypedValue().bstrVal);

This format differs somewhat from NOAA's in that attributes, rather than element text, are used to hold the data. As before, this code can be abbreviated to:

struct 
{
    const TCHAR *pszXMLChildName;   // initialized from string literals, so const
    CWnd *pwndControl;
} ControlInfo[] =
{
    { _T("forecast_information/city"),              &m_lblCity },
    { _T("forecast_information/current_date_time"), &m_lblForecast },
    { _T("current_conditions/condition"),           &m_lblCurrent },
    { _T("current_conditions/temp_f"),              &m_lblTemperature },
    { _T("current_conditions/humidity"),            &m_lblHumidity },
    { _T("current_conditions/wind_condition"),      &m_lblWind }
};
...
IXMLDOMNodePtr pParent = pDoc->selectSingleNode(_T("xml_api_reply/weather"));

for (int x = 0; x < sizeof(ControlInfo) / sizeof(ControlInfo[0]); x++)
{
    IXMLDOMNodePtr pNode = pParent->selectSingleNode(ControlInfo[x].pszXMLChildName);
    if (pNode != NULL)
    {
        IXMLDOMNamedNodeMapPtr pAttributes = pNode->Getattributes();
        IXMLDOMNodePtr pAttribute = pAttributes->getNamedItem(_T("data"));

        // the element may exist without a data attribute
        if (pAttribute != NULL)
            ControlInfo[x].pwndControl->SetWindowText(pAttribute->GetnodeTypedValue().bstrVal);
    }
}
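
Both Google snippets repeat the same three steps: select the node, get its attribute map, and read one attribute. A small helper along the following lines could fold those steps together and make the "element or attribute is missing" cases explicit. This is only a sketch: GetAttributeText is a hypothetical name that is not part of the original project, and it assumes the #import line shown earlier.

```cpp
#include <afxwin.h>          // MFC core (CString)
#import <msxml6.dll>
using namespace MSXML2;

// Hypothetical helper (not in the original article): returns the text of
// attribute pszAttribute on the first node matching pszXPath under pParent,
// or an empty string if the node or the attribute does not exist.
CString GetAttributeText(IXMLDOMNodePtr pParent, LPCTSTR pszXPath, LPCTSTR pszAttribute)
{
    if (pParent == NULL)
        return CString();

    IXMLDOMNodePtr pNode = pParent->selectSingleNode(pszXPath);
    if (pNode == NULL)
        return CString();

    IXMLDOMNamedNodeMapPtr pAttributes = pNode->Getattributes();
    if (pAttributes == NULL)
        return CString();

    IXMLDOMNodePtr pAttribute = pAttributes->getNamedItem(pszAttribute);
    if (pAttribute == NULL)
        return CString();

    // Gettext() returns a _bstr_t, which converts to either character width
    return CString(static_cast<LPCTSTR>(pAttribute->Gettext()));
}
```

With something like this in place, each row of the loop reduces to a call such as m_lblCity.SetWindowText(GetAttributeText(pParent, _T("forecast_information/city"), _T("data"))). Note that the #import wrappers throw _com_error on failures such as a malformed query, so a try/catch around the parsing code is still worth considering.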

Yahoo!

I saved Yahoo!'s for last simply because it uses namespaces, which require a few more lines of code to extract the data. There was also the need to convert the wind direction from degrees to a compass direction. The file to download is http://xml.weather.yahoo.com/forecastrss?p=74135&u=f. The relevant elements of the file are:

<rss version="2.0" xmlns:yweather="http://xml.weather.yahoo.com/ns/rss/1.0"> 
   <channel> 
      <lastBuildDate>Fri, 16 Oct 2009 12:53 pm CDT</lastBuildDate> 
      <yweather:location city="Tulsa" region="OK" country="US" /> 
      <yweather:units temperature="F" distance="mi" pressure="in" speed="mph" /> 
      <yweather:wind chill="59" direction="0" speed="6" /> 
      <yweather:atmosphere humidity="53" visibility="10" pressure="30.25" rising="2" /> 
      <yweather:astronomy sunrise="7:31 am" sunset="6:47 pm" /> 
      <item> 
         <title>Conditions for Tulsa, OK at 12:53 pm CDT</title> 
         <yweather:condition text="Partly Cloudy" code="30" 
                   temp="59" date="Fri, 16 Oct 2009 12:53 pm CDT" /> 
      </item> 
   </channel> 
</rss>

One interesting difference in Yahoo!'s format is that the measurement units are not part of the values themselves. In the code below, I extract these units and store them for later use. Before the file is downloaded, I needed to tell the document object about the namespace, because the queries refer to elements by their yweather prefix. This is done with a call to the setProperty() method:

pDoc->setProperty("SelectionNamespaces", 
                  "xmlns:yweather=\"http://xml.weather.yahoo.com/ns/rss/1.0\"");

The yweather namespace will be used in the selectSingleNode() calls below. After the file has been downloaded, it can then be parsed with code like:

IXMLDOMNodePtr pParent = pDoc->selectSingleNode(_T("rss/channel"));
if (pParent != NULL)
{
    IXMLDOMNodePtr pChild = pParent->selectSingleNode(_T("lastBuildDate"));
    m_lblForecast.SetWindowText(pChild->GetnodeTypedValue().bstrVal);

    pChild = pParent->selectSingleNode(_T("//yweather:location"));
    IXMLDOMNamedNodeMapPtr pAttributes = pChild->Getattributes();
    IXMLDOMNodePtr pAttribute = pAttributes->getNamedItem(_T("city"));
    m_lblLocation.SetWindowText(pAttribute->GetnodeTypedValue().bstrVal);

    pChild = pParent->selectSingleNode(_T("//yweather:units"));
    pAttributes = pChild->Getattributes();
    CString strUnitTemp     = CString(_T(' ')) + 
      pAttributes->getNamedItem(_T("temperature"))->GetnodeTypedValue().bstrVal;
    CString strUnitDistance = CString(_T(' ')) + 
      pAttributes->getNamedItem(_T("distance"))->GetnodeTypedValue().bstrVal;
    CString strUnitPressure = CString(_T(' ')) + 
      pAttributes->getNamedItem(_T("pressure"))->GetnodeTypedValue().bstrVal;
    CString strUnitSpeed    = CString(_T(' ')) + 
      pAttributes->getNamedItem(_T("speed"))->GetnodeTypedValue().bstrVal;

    pChild = pParent->selectSingleNode(_T("//yweather:wind"));
    pAttributes = pChild->Getattributes();
    CString strWindSpeed = CString(_T(' ')) + 
      pAttributes->getNamedItem(_T("speed"))->GetnodeTypedValue().bstrVal + strUnitSpeed;
    m_lblWind.SetWindowText(ComputeDirection(pAttributes->getNamedItem(
      _T("direction"))->GetnodeTypedValue().bstrVal) + strWindSpeed);

    pChild = pParent->selectSingleNode(_T("//yweather:atmosphere"));
    pAttributes = pChild->Getattributes();
    m_lblHumidity.SetWindowText(pAttributes->getNamedItem(
      _T("humidity"))->GetnodeTypedValue().bstrVal + CString(_T('%')));
    m_lblBarometer.SetWindowText(pAttributes->getNamedItem(
      _T("pressure"))->GetnodeTypedValue().bstrVal + strUnitPressure);
    m_lblVisibility.SetWindowText(pAttributes->getNamedItem(
      _T("visibility"))->GetnodeTypedValue().bstrVal + strUnitDistance);

    pChild = pParent->selectSingleNode(_T("//yweather:astronomy"));
    pAttributes = pChild->Getattributes();
    m_lblSunrise.SetWindowText(pAttributes->getNamedItem(
      _T("sunrise"))->GetnodeTypedValue().bstrVal);
    m_lblSunset.SetWindowText(pAttributes->getNamedItem(
      _T("sunset"))->GetnodeTypedValue().bstrVal);

    pChild = pParent->selectSingleNode(_T("item//yweather:condition"));
    pAttributes = pChild->Getattributes();
    m_lblCondition.SetWindowText(pAttributes->getNamedItem(
      _T("text"))->GetnodeTypedValue().bstrVal);
    m_lblTemperature.SetWindowText(pAttributes->getNamedItem(
      _T("temp"))->GetnodeTypedValue().bstrVal + strUnitTemp);
}

Not much can be done to shrink this down!

The wind direction is a value from 0 to 360 degrees that needs to be converted to one of 16 compass directions. Dividing a circle into 16 sectors makes each sector 22.5 degrees wide. Because each direction sits in the middle of its sector, the sector extends 11.25 degrees to either side of it. For example, N covers 348.75 to 11.25 degrees, and S covers 168.75 to 191.25 degrees. To account for this, simply add 11.25 degrees (clockwise) before dividing by 22.5 degrees. The truncated result will be a number in the range 0-16. The sectors are numbered 0-15, so to keep values of 348.75 degrees and above in sector 0, use the modulo operator. The function to do this looks like:

CString CYahooComDlg::ComputeDirection( const TCHAR *pszDegrees )
{
    CString strDirections[16] = { _T("N"), _T("NNE"), _T("NE"), _T("ENE"),
                                  _T("E"), _T("ESE"), _T("SE"), _T("SSE"),
                                  _T("S"), _T("SSW"), _T("SW"), _T("WSW"),
                                  _T("W"), _T("WNW"), _T("NW"), _T("NNW") };

    // shift by half a sector so each direction sits in the middle of its sector
    double dDegrees = _tstof(pszDegrees) + 11.25;

    return strDirections[(UINT) (dDegrees / 22.5) % 16];
}
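
To sanity-check the arithmetic, here is a minimal, portable sketch of the same conversion in standard C++ with no MFC dependency. The name CompassPoint and the wrapping of out-of-range input into 0-360 are assumptions added for illustration; the original function does neither.

```cpp
#include <cmath>
#include <string>

// Hypothetical stand-alone variant of ComputeDirection(): maps a bearing in
// degrees to one of 16 compass directions. Unlike the original, it first wraps
// the input into [0, 360), so negative or oversized values are tolerated.
std::string CompassPoint(double degrees)
{
    static const char* const kPoints[16] = {
        "N", "NNE", "NE", "ENE", "E", "ESE", "SE", "SSE",
        "S", "SSW", "SW", "WSW", "W", "WNW", "NW", "NNW" };

    // Wrap into [0, 360).
    double wrapped = std::fmod(degrees, 360.0);
    if (wrapped < 0.0)
        wrapped += 360.0;

    // Shift by half a sector (11.25) and divide by the sector width (22.5).
    // The truncated result is 0..16, so % 16 folds the final half-sector
    // (348.75 degrees and above) back onto N.
    const unsigned sector = static_cast<unsigned>((wrapped + 11.25) / 22.5) % 16;
    return kPoints[sector];
}
```

For example, CompassPoint(0.0) and CompassPoint(350.0) both give "N", CompassPoint(180.0) gives "S", and CompassPoint(200.0) gives "SSW".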

Extras

URLDownloadToFile() needs a location that it can write to. My first choice for this location was the Local Settings\Temp folder. The problem was that I could not find a CSIDL value for that or even its parent. There was the TEMP environment variable, however. As I'm not a big fan of relying solely on environment variables, the code falls back on CSIDL_PERSONAL, which resolves to the My Documents folder, if TEMP is not available. Since this code would be needed in several locations, I created a handy little class (a function would probably have sufficed) to lessen some of the redundancy. All of the work is done in the constructor, so the object is ready to go once it's created.

CTempFilename::CTempFilename( HWND hWnd )
{
    BOOL    bResult = FALSE;
    TCHAR   szPath[MAX_PATH],
            szFilename[MAX_PATH];

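    // first try the environment variable (the size argument is in characters,
    // and a return value >= that size means the buffer was too small)
    DWORD dwLength = GetEnvironmentVariable(_T("TEMP"), szPath, MAX_PATH);
    if (dwLength != 0 && dwLength < MAX_PATH)
        bResult = TRUE;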

    // if that didn't work, use a CSIDL
    if (! bResult)
    {
        if (SUCCEEDED(SHGetFolderPath(hWnd, CSIDL_PERSONAL, NULL, 
                                      SHGFP_TYPE_CURRENT, szPath)))
            bResult = TRUE;
    }

    if (bResult && GetTempFileName(szPath, _T("Weather"), 0, szFilename) != 0)
        m_strTempFilename = szFilename;
}
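
The download step itself is not shown above. A minimal sketch of how URLDownloadToFile() might be paired with the temporary filename is below. DownloadToFile is a hypothetical name introduced for illustration, and the sketch assumes an MFC project that links against urlmon.lib.

```cpp
#include <afxwin.h>     // MFC core (CString)
#include <urlmon.h>     // URLDownloadToFile
#pragma comment(lib, "urlmon.lib")

// Hypothetical helper (not in the original article): downloads strURL into
// the file named by strFilename. Returns true when the download succeeded.
bool DownloadToFile(const CString &strURL, const CString &strFilename)
{
    // an empty name means no temporary file could be created
    if (strFilename.IsEmpty())
        return false;

    // the caller, reserved, and status-callback arguments are not needed here
    HRESULT hr = URLDownloadToFile(NULL, strURL, strFilename, 0, NULL);
    return SUCCEEDED(hr);
}

// Possible usage from a dialog class (assumed context):
//
//   CTempFilename temp(GetSafeHwnd());
//   if (DownloadToFile(_T("http://www.weather.gov/xml/current_obs/KTUL.xml"),
//                      temp.m_strTempFilename))
//   {
//       // create the document and call pDoc->load() as shown earlier
//   }
```

Because GetTempFileName() is called with a unique value of 0, it creates the file on disk, so it may be worth calling DeleteFile() on it once the document has been loaded. Note also that only the first three characters of the "Weather" prefix end up in the generated name.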

Enjoy!

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)

About the Author

DavidCrow
Software Developer (Senior) Pinnacle Business Systems
United States United States

Last Updated 23 Oct 2009
Article Copyright 2009 by DavidCrow