You need to define 'relevant content' precisely. The trouble at the moment is that you don't actually know what you want. If you have downloaded the whole page then you have the complete object tree: you can analyse it, assign values to various metrics for each node, and run a heuristic analysis to find the 'main' node. But first you need to define what you are looking for.
Because divs can be used for actual divisions and for layout, it can be quite difficult. How do you distinguish between
<pre><div id="layout">
<div id="content">
<p>bla bla</p>
</div>
<div id="footer">
<p>some template footer stuff
</div>
</div></pre>
... and
<pre><div id="content">
<div id="p1">
<p>bla bla</p>
</div>
<div id="p2">
<p>yak yak
</div>
</div></pre>
... where you want only the first paragraph in the top example but both in the second? If the divs have different styles then that can be a clue, but then again you will want to include image captions, insets and other divs which may well have a different style.
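To make the "metrics per node" idea concrete, here is a minimal sketch using only Python's standard `html.parser`. The metric is an arbitrary assumption chosen for illustration: each `<p>` credits its non-link text length to its direct parent, and the parent with the highest total wins. The names `Node`, `TreeBuilder`, `find_main_node` and `SAMPLE` are hypothetical, and the tree building is deliberately crude (it only copes with simple cases of unclosed `<p>` tags).

```python
from html.parser import HTMLParser

VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img",
             "input", "link", "meta", "source", "track", "wbr"}
SKIP_TEXT = {"script", "style"}


class Node:
    def __init__(self, tag, attrs, parent=None):
        self.tag = tag
        self.attrs = dict(attrs)
        self.parent = parent
        self.children = []
        self.text = 0       # text characters anywhere below this node
        self.link_text = 0  # the subset of those characters inside an <a>


class TreeBuilder(HTMLParser):
    """Builds a rough element tree and accumulates text metrics."""

    def __init__(self):
        super().__init__()
        self.root = Node("root", [])
        self.current = self.root

    def handle_starttag(self, tag, attrs):
        if tag == "p" and self.current.tag == "p":
            self.current = self.current.parent  # implicit </p>
        node = Node(tag, attrs, self.current)
        self.current.children.append(node)
        if tag not in VOID_TAGS:
            self.current = node

    def handle_endtag(self, tag):
        # Close the nearest open element with this name; ignore stray end tags.
        node = self.current
        while node is not self.root and node.tag != tag:
            node = node.parent
        if node is not self.root:
            self.current = node.parent

    def handle_data(self, data):
        n = len(data.strip())
        if n == 0 or self.current.tag in SKIP_TEXT:
            return
        chain = []
        node = self.current
        while node is not None:
            chain.append(node)
            node = node.parent
        in_link = any(node.tag == "a" for node in chain)
        for node in chain:
            node.text += n
            if in_link:
                node.link_text += n


def walk(node):
    yield node
    for child in node.children:
        yield from walk(child)


def find_main_node(html):
    """Return the element whose direct <p> children hold the most non-link text."""
    builder = TreeBuilder()
    builder.feed(html)
    builder.close()
    scores = {}
    for node in walk(builder.root):
        if node.tag == "p":
            parent = node.parent
            scores[parent] = scores.get(parent, 0) + node.text - node.link_text
    return max(scores, key=scores.get) if scores else None


SAMPLE = """
<div id="layout">
  <div id="content">
    <p>First paragraph of the article, with a reasonable amount of text.
    <p>Second paragraph, also plain text.
  </div>
  <div id="footer">
    <p><a href="/about">About</a> <a href="/contact">Contact</a>
  </div>
</div>
"""

main = find_main_node(SAMPLE)
print(main.attrs.get("id"))
```

Note that this particular metric would pick a single inner div in the second layout above rather than the enclosing content div, which is exactly why the definition of what you are looking for has to come first.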
If you know the sites that you're scraping then you can use that information to help you. In the extreme case, you will know that site X puts its main content inside a <div id="content"> and you can just go straight to that element.
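For the known-site case, a sketch along these lines might be enough. It assumes a hand-maintained table mapping host names to the id of the main content element; `KNOWN_CONTENT_IDS`, `ElementTextById` and `extract_known_content` are hypothetical names, and `example.com` is a placeholder. It again uses only the standard library.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse

# Hypothetical per-site knowledge: host name -> id of the main content element.
KNOWN_CONTENT_IDS = {"example.com": "content"}


class ElementTextById(HTMLParser):
    """Collects the text inside the first element with the wanted id."""

    def __init__(self, wanted_id):
        super().__init__()
        self.wanted_id = wanted_id
        self.tag = None    # tag name of the wanted element once found
        self.depth = 0     # nesting depth of same-named tags inside it
        self.done = False
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if self.done:
            return
        if self.depth > 0:
            # Only count same-named tags, so unclosed <p> tags don't matter.
            if tag == self.tag:
                self.depth += 1
        elif dict(attrs).get("id") == self.wanted_id:
            self.tag = tag
            self.depth = 1

    def handle_endtag(self, tag):
        if self.depth > 0 and tag == self.tag:
            self.depth -= 1
            if self.depth == 0:
                self.done = True

    def handle_data(self, data):
        # Note: text inside <script> or <style> is not filtered out here.
        if self.depth > 0 and data.strip():
            self.chunks.append(data.strip())


def extract_known_content(url, html):
    """Return the main text for a known site, or None for an unknown one."""
    wanted_id = KNOWN_CONTENT_IDS.get(urlparse(url).hostname)
    if wanted_id is None:
        return None  # unknown site: fall back to a heuristic
    parser = ElementTextById(wanted_id)
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)
```

Returning `None` for unknown hosts keeps the lookup separate from whatever heuristic you fall back to for sites you have not catalogued.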
Any page that relies on scripts to work (i.e. to load the primary content) won't be readable unless you run a scripting engine. Not many do that for the main content, but it's something to be aware of.