Adam,
Thanks again for your efforts; it works like a champ!
Byron
Hi Adam, just a quick note to say thanks for your article. Really helpful.
The application works fine for me on XP, but on Vista it won't run.
Any suggestions?
Cmd: GetSiteThumbnail.exe http://www.google.com 12312313.jpg 1024 768 115 86
Error:
------------------------------------------------------
[Window Title]
Microsoft Windows
[Main Instruction]
Site Thumbnail Getter has stopped working
[Content]
A problem caused the program to stop working correctly. Windows will close the program and notify you if a solution is available.
[Close program]
------------------------------------------------------
(I've tried running it in administrator mode: still nothing.)
Weird, as it seems to be working in Vista (Home Premium) for me. Do you perhaps run it off a network share? That sometimes causes problems for .NET apps, although I tried that too and it still worked. Do you have the latest .NET installed on both machines (patches and the like)? I might have compiled it against 3.5 (although I think it should still run).
Alas, that's all the help I can offer at this point, as it works perfectly well for me and was actually even compiled on Vista.
Hi Adam,
thanks for your reply.
I'm running it on Vista Ultimate with .NET Framework 3.5. No network share: c:\thumbnail\GetSiteThumbnail.exe
Hmm, I'll look a little further into the matter, since you say it's working fine for you. I'll keep you posted.
Cheers
After spending a lot of time trying to set up a reliable way to capture webpage snapshots, I gave up! Here are some issues I ran into along the way:
1. Capturing snapshots is a resource hog.
2. Security was a big issue: websites containing harmful script cannot be captured, because they have to be loaded on your server, making it vulnerable.
3. Huge bandwidth consumption.
4. There are so many different websites, and you will run into issues such as pop-up messages, Flash, and JavaScript, to name a few. It is difficult to capture a wide range of websites correctly.
My final solution was to abandon my attempt and use a free website thumbnail generator. Snapcasa serves 16,000+ website snapshots daily on three of my domains for free and without a watermark. There are many snapshot providers online, but I found snapcasa.com to be the best. Save yourself the frustration I went through and let one of the online snapshot providers worry about it.
Your post was not very helpful on a site where people are looking for coding solutions to technical problems. I'm not sure how the services you suggest would work in, say, an intranet environment. This is a good article, and it can be extended to solve specific problem areas because the source code is provided, unlike Snapcasa etc. Anyone could google for a service like that!
I like this idea, but I get this error on the line:
while (webBrowser.ReadyState != WebBrowserReadyState.Complete)
System.UnauthorizedAccessException was unhandled
Message="Access is denied. (Exception from HRESULT: 0x80070005 (E_ACCESSDENIED))"
Source="System.Windows.Forms"
StackTrace:
at System.Windows.Forms.UnsafeNativeMethods.IHTMLLocation.GetHref()
at System.Windows.Forms.WebBrowser.get_Document()
at System.Windows.Forms.WebBrowser.get_ReadyState()
at GetSiteThumbnail.WebPageBitmap.Fetch() in E:\projecten.2005\rss\siteThumbnail\WebPageBitmap.cs:line 29
at GetSiteThumbnail.Program.Main(String[] args) in E:\projecten.2005\rss\siteThumbnail\Program.cs:line 45
at System.AppDomain.nExecuteAssembly(Assembly assembly, String[] args)
at System.AppDomain.ExecuteAssembly(String assemblyFile, Evidence assemblySecurity, String[] args)
at Microsoft.VisualStudio.HostingProcess.HostProc.RunUsersAssembly()
at System.Threading.ThreadHelper.ThreadStart_Context(Object state)
at System.Threading.ExecutionContext.Run(ExecutionContext executionContext, ContextCallback callback, Object state)
at System.Threading.ThreadHelper.ThreadStart()
This happens the 4th time the while loop evaluates the condition. Does anyone else have this problem?
I'm using WinXP SP2, IE7 and .NET 2.0.
-- modified at 18:15 Friday 14th September, 2007
It seems that it only happens with sites that use JavaScript (e.g. www.tweakers.net)
(see also reply http://www.codeproject.com/csharp/ExtendedWebBrowser.asp?df=100&forumid=285594&exp=0&select=2126380)
Google.nl 'works' (no exception, but a blank image)
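A possible workaround, offered as an untested sketch: the stack trace shows that reading ReadyState goes through WebBrowser.Document, and that is where the access-denied error surfaces. The loop below never reads ReadyState or Document while waiting; instead it waits for a flag set by the DocumentCompleted event, with a timeout so it cannot spin forever. BrowserWait, WaitFor, NavigateAndWait and the timeout value are hypothetical names, not part of the article's code, and the browser still has to live on an STA thread.

```csharp
using System;
using System.Diagnostics;
using System.Threading;
using System.Windows.Forms;

static class BrowserWait
{
    // Hypothetical helper: calls pumpMessages until isDone() returns true or the
    // timeout expires. Returns false on timeout.
    public static bool WaitFor(Func<bool> isDone, TimeSpan timeout, Action pumpMessages)
    {
        Stopwatch watch = Stopwatch.StartNew();
        while (!isDone())
        {
            if (watch.Elapsed > timeout)
                return false;
            pumpMessages();
            Thread.Sleep(10); // avoid burning a full CPU core while waiting
        }
        return true;
    }

    // Hypothetical helper: navigates and waits for DocumentCompleted instead of
    // polling ReadyState (which reads WebBrowser.Document on every iteration).
    public static bool NavigateAndWait(WebBrowser browser, string url, TimeSpan timeout)
    {
        bool completed = false;
        WebBrowserDocumentCompletedEventHandler handler = (s, e) => completed = true;
        browser.DocumentCompleted += handler;
        try
        {
            browser.Navigate(url);
            return WaitFor(() => completed, timeout, Application.DoEvents);
        }
        finally
        {
            browser.DocumentCompleted -= handler;
        }
    }
}
```

Call it as BrowserWait.NavigateAndWait(webBrowser, url, TimeSpan.FromSeconds(30)) in place of the Navigate/ReadyState loop. DocumentCompleted fires once per frame, so on framed pages the flag may be set before the whole page has finished. The pump is passed in as a delegate mainly so the waiting logic can be exercised without a browser.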
I ported this app into mine, which is web-based and written in C# 2.0. I have a problem with this:
webBrowser.Navigate(url);
while (webBrowser.ReadyState != WebBrowserReadyState.Complete)
{
    Application.DoEvents();
}
The DoEvents() call sends the HTML fetched from url to my browser, so I end up being asked to download the HTML file. This shouldn't happen, since the code should go on to call the method that creates the thumbnail. Any ideas?
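It is not certain this explains the download prompt, but one likely problem is that a web request thread is not an STA thread with a Windows message loop, which the WebBrowser ActiveX control expects. A common pattern, sketched here under that assumption and not tested inside ASP.NET, is to host the browser on a dedicated STA thread that runs its own message loop via Application.Run and leaves the loop once DocumentCompleted has fired for the top-level document. StaCapture and Capture are hypothetical names.

```csharp
using System;
using System.Drawing;
using System.Threading;
using System.Windows.Forms;

static class StaCapture
{
    // Hypothetical helper: loads url in a WebBrowser on its own STA thread with its
    // own message loop, so the calling thread never pumps browser messages itself.
    public static Bitmap Capture(string url, int width, int height, TimeSpan timeout)
    {
        Bitmap result = null;
        Exception error = null;

        Thread thread = new Thread(() =>
        {
            try
            {
                using (WebBrowser browser = new WebBrowser())
                {
                    browser.ScrollBarsEnabled = false;
                    browser.ScriptErrorsSuppressed = true;
                    browser.Size = new Size(width, height);
                    browser.DocumentCompleted += (sender, e) =>
                    {
                        // DocumentCompleted fires once per frame; wait for the top-level document.
                        if (e.Url != browser.Url) return;
                        try
                        {
                            Bitmap bitmap = new Bitmap(width, height);
                            browser.DrawToBitmap(bitmap, new Rectangle(0, 0, width, height));
                            result = bitmap;
                        }
                        catch (Exception ex) { error = ex; }
                        finally { Application.ExitThread(); } // ends Application.Run() below
                    };
                    browser.Navigate(url);
                    Application.Run(); // message loop for this thread only
                }
            }
            catch (Exception ex) { error = ex; }
        });
        thread.SetApartmentState(ApartmentState.STA); // WebBrowser is an ActiveX control and needs STA
        thread.IsBackground = true; // a hung page will not keep the process alive
        thread.Start();

        if (!thread.Join(timeout))
            throw new TimeoutException("Page did not finish loading: " + url);
        if (error != null)
            throw new InvalidOperationException("Capture failed for " + url, error);
        return result;
    }
}
```

From a page or handler this would be used as using (Bitmap shot = StaCapture.Capture(url, 1024, 768, TimeSpan.FromSeconds(30))) { ... }. Creating a thread and a browser per request is expensive, so a long-lived STA worker would likely scale better under load.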
I'm very new to C#. While using the code (in a C# console app) I'm getting this error message: "The type or namespace name 'WebBrowser' could not be found (are you missing a using directive or an assembly reference?)". I don't know what to do; any help, please!
Also, how do I make sure the app keeps running on a different machine? Is there a set of dependent DLLs that I can pack along with the app?
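That error usually means the project lacks a reference to System.Windows.Forms.dll (console projects don't add it by default) or the file lacks using System.Windows.Forms;. The skeleton below is a sketch of what a console app needs: both references, the using directives, and [STAThread] on Main, because the WebBrowser is an ActiveX control. The usage text, the 1024x768 size and the Run method are hypothetical. For deployment, as far as I know the target machine needs the .NET Framework version you compiled against plus Internet Explorer (the WebBrowser control wraps the installed IE); if you use the mshtml interop discussed elsewhere in this thread, ship Microsoft.mshtml.dll next to the exe.

```csharp
// Project references needed: System.Windows.Forms.dll and System.Drawing.dll
// (Add Reference > .NET tab in Visual Studio).
using System;
using System.Drawing;
using System.Drawing.Imaging;
using System.Windows.Forms; // WebBrowser lives in this namespace

public static class Program
{
    [STAThread] // WebBrowser is an ActiveX control and must be created on an STA thread
    public static int Main(string[] args)
    {
        return Run(args);
    }

    public static int Run(string[] args)
    {
        if (args.Length < 2)
        {
            Console.WriteLine("usage: capture <url> <output.png>");
            return 1;
        }

        using (WebBrowser browser = new WebBrowser())
        {
            browser.ScrollBarsEnabled = false;
            browser.Size = new Size(1024, 768);
            browser.Navigate(args[0]);
            // Same polling loop as the article: pump messages until the page has loaded.
            while (browser.ReadyState != WebBrowserReadyState.Complete)
            {
                Application.DoEvents();
            }

            using (Bitmap bitmap = new Bitmap(browser.Width, browser.Height))
            {
                browser.DrawToBitmap(bitmap, new Rectangle(Point.Empty, browser.Size));
                bitmap.Save(args[1], ImageFormat.Png);
            }
        }
        return 0;
    }
}
```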
I want a similar application design (to create a website thumbnailer). I was wondering if there is any equivalent in Java?
I've not touched Java for a couple of years now. From what I can remember, you could probably (theoretically) access the IE COM objects through JNI, but then of course you would lose portability. If there is a good HTML control for Java, I guess you could try to use that, but as I said, it's been a long time since I did anything in Java, so I can't really offer any good suggestions there.
Very nice job. The problem is that you don't always know the size of the page you want to capture, and the thumbnail width and height parameters might cause only part of the page to be captured.
To overcome this, you can get these values from the web browser itself.
For example, add this code to the DocumentCompleted event handler (scrollWidth and scrollHeight are int fields of the class):
HtmlDocument doc = webBrowser.Document;
doc.Body.SetAttribute("scroll", "yes");
scrollHeight = int.Parse(doc.Body.GetAttribute("scrollHeight"));
scrollWidth = int.Parse(doc.Body.GetAttribute("scrollWidth"));
webBrowser.Size = new Size(scrollWidth, scrollHeight);
and then GetBitmap should look similar to this:
internal Bitmap GetBitmap()
{
    Bitmap bitmap = new Bitmap(scrollWidth, scrollHeight);
    Rectangle bitmapRect = new Rectangle(0, 0, scrollWidth, scrollHeight);
    webBrowser.DrawToBitmap(bitmap, bitmapRect);
    return bitmap;
}
And you can skip the thumbnail width and height parameters.
I am building a similar class library to take website snapshots. I only seem to have problems with blank images when I try to capture websites that have Java applets on them. I was curious, so I downloaded your application to see if it has the same issue: unfortunately it does.
I created a Windows Forms application with a built-in WebBrowser control, and the same website with the Java applet loads just fine there. I noticed something, though, in the Windows Forms application: the website has a slight delay while it loads the Java applet.
I have a theory: even though the DocumentCompleted event fires in my class library, I don't think the WebBrowser control is really done painting the page to the browser window, so when you call Browser.DrawToBitmap, it paints a blank image because the applet hasn't been drawn to the browser's window yet.
You can reproduce the blank image with your application using the following page:
http://www.w3.org/People/mimasa/test/object/java/clock
Can someone please verify whether they also get a blank page on their system? I have tried it on two different systems so far. It could be an IE protection setting, but I can't find a way around it, either in code or by lowering my IE security settings. Any help would be greatly appreciated, thanks.
- Jon
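If the theory is right, one heuristic worth trying (an assumption, not a confirmed fix) is to keep pumping messages for a short, fixed time after DocumentCompleted and only then call DrawToBitmap. There is no event that reports when an applet has finished painting, so the delay length is a guess; and if the applet draws into its own child window that DrawToBitmap does not capture, waiting will not help. SettleDelay, PumpFor and WaitForLatePainting are hypothetical names.

```csharp
using System;
using System.Diagnostics;
using System.Threading;
using System.Windows.Forms;

static class SettleDelay
{
    // Keeps calling pumpMessages until at least `delay` has elapsed and returns how
    // many times it pumped. A short sleep between pumps avoids a busy loop.
    public static int PumpFor(TimeSpan delay, Action pumpMessages)
    {
        int iterations = 0;
        Stopwatch watch = Stopwatch.StartNew();
        do
        {
            pumpMessages();
            iterations++;
            Thread.Sleep(20);
        } while (watch.Elapsed < delay);
        return iterations;
    }

    // Hypothetical usage on the browser's STA thread, after DocumentCompleted and
    // before DrawToBitmap: give late painters (such as an applet) extra time to draw.
    public static void WaitForLatePainting(TimeSpan delay)
    {
        PumpFor(delay, Application.DoEvents);
    }
}
```

For example, call SettleDelay.WaitForLatePainting(TimeSpan.FromSeconds(3)) on the browser's thread right before DrawToBitmap, and tune the delay against the test page above.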
such as www.google.com?
Hey, I've had this problem on a number of machines that I distributed a similar application to. I just found the common link: machines running the .NET 2.0 framework with MSIE 7 installed work just fine. No IE7 means a lot of blank, or white, images.
Hopefully this helps!
-Colin
I have .NET Framework 2.0 and IE7, but I still get all blank pages.
zhongjie, MCPD
I tried the same code in a Windows application (Forms application) and it works fine. I think maybe some API behaves weirdly in a console application.
Zhongjie
MCPD
According to MSDN, the DrawToBitmap method is not supported for the WebBrowser control, which is why some pages render OK and others don't. If you look into this problem, some people have problems with pages showing Java. For me, simple pages like www.google.co.uk would not render.
The only sure-fire way around it I found was to fall back to using the raw interfaces:
1) Add a reference to MSHTML.tlb
2) Create your own version of IHTMLElementRender which uses an IntPtr rather than _RemotableHandle:
[InterfaceType(1)]
[Guid("3050F669-98B5-11CF-BB82-00AA00BDCE0B")]
public interface IHTMLElementRender2
{
    void DrawToDC(IntPtr hdc);
    void SetDocumentPrinter(string bstrPrinterName, ref _RemotableHandle hdc);
}
3) Use this code to capture the initial web page image:
IHTMLDocument2 rawDoc = (IHTMLDocument2)hiddenWebBrowser.Document.DomDocument;
IHTMLElement rawBody = rawDoc.body;
IHTMLElementRender2 render = (IHTMLElementRender2)rawBody;
Bitmap screenCapture = new Bitmap(width, height);
Rectangle drawRectangle = new Rectangle(0, 0, width, height);
myWebBrowser.DrawToBitmap(screenCapture, drawRectangle);
Graphics graphics = Graphics.FromImage(screenCapture);
IntPtr graphicshdc = graphics.GetHdc();
render.DrawToDC(graphicshdc);
bmpg.ReleaseHdc();
bmpg.Dispose();
Hmm... are hiddenWebBrowser and myWebBrowser the same?
If not, how should we use them separately?
And where does bmpg come from?
Should we call screenCapture.Save(...) afterwards?
Please forgive my ignorance.
Thanks a million!
Hi all,
Piers, your solution worked fine.
Here is the complete WebPageBitmap.cs that works for me:
using System;
using System.Drawing;
using System.Runtime.InteropServices;
using System.Windows.Forms;
using mshtml; // needs a reference to the MSHTML type library

namespace GetSiteThumbnail
{
    // IHTMLElementRender redeclared so that DrawToDC takes a plain IntPtr HDC instead of _RemotableHandle.
    [InterfaceType(ComInterfaceType.InterfaceIsIUnknown)]
    [Guid("3050F669-98B5-11CF-BB82-00AA00BDCE0B")]
    public interface IHTMLElementRender2
    {
        void DrawToDC(IntPtr hdc);
        void SetDocumentPrinter(string bstrPrinterName, ref _RemotableHandle hdc);
    }

    class WebPageBitmap
    {
        private WebBrowser webBrowser;
        private string url;
        private int width;
        private int height;
        private bool isReady;

        public WebPageBitmap(string url, int width, int height, bool scrollBarsEnabled)
        {
            this.url = url;
            this.width = width;
            this.height = height;
            webBrowser = new WebBrowser();
            webBrowser.DocumentCompleted +=
                new WebBrowserDocumentCompletedEventHandler(documentCompletedEventHandler);
            webBrowser.Size = new Size(width, height);
            webBrowser.ScrollBarsEnabled = scrollBarsEnabled;
        }

        public void Fetch()
        {
            webBrowser.Navigate(url);
            // Pump window messages until the page reports that it has finished loading.
            while (webBrowser.ReadyState != WebBrowserReadyState.Complete)
            {
                Application.DoEvents();
            }
        }

        private void documentCompletedEventHandler(object sender, WebBrowserDocumentCompletedEventArgs e)
        {
            isReady = true;
        }

        // thumbwidth and thumbheight are not used here; the bitmap has the browser's full size.
        internal Bitmap GetBitmap(int thumbwidth, int thumbheight)
        {
            IHTMLDocument2 rawDoc = (IHTMLDocument2)webBrowser.Document.DomDocument;
            IHTMLElement rawBody = rawDoc.body;
            IHTMLElementRender2 render = (IHTMLElementRender2)rawBody;

            Bitmap bitmap = new Bitmap(width, height);
            Rectangle bitmapRect = new Rectangle(0, 0, width, height);
            // DrawToBitmap alone can come out blank; DrawToDC then paints the body over it.
            webBrowser.DrawToBitmap(bitmap, bitmapRect);
            using (Graphics graphics = Graphics.FromImage(bitmap))
            {
                IntPtr graphicshdc = graphics.GetHdc();
                try
                {
                    render.DrawToDC(graphicshdc);
                }
                finally
                {
                    graphics.ReleaseHdc(graphicshdc);
                }
            }
            return bitmap;
        }
    }
}
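On the earlier question about saving: GetBitmap above returns the full browser-sized bitmap and ignores thumbwidth and thumbheight, so the caller still has to scale and save it. Below is a minimal sketch of the scaling step, assuming System.Drawing is available; Thumbnailer and Scale are hypothetical names, not part of the posted code.

```csharp
using System.Drawing;
using System.Drawing.Drawing2D;

static class Thumbnailer
{
    // Scales a full-size capture down to thumbWidth x thumbHeight. The caller owns
    // (and should dispose) both the source and the returned bitmap.
    public static Bitmap Scale(Bitmap source, int thumbWidth, int thumbHeight)
    {
        Bitmap thumb = new Bitmap(thumbWidth, thumbHeight);
        using (Graphics g = Graphics.FromImage(thumb))
        {
            g.InterpolationMode = InterpolationMode.HighQualityBicubic;
            g.PixelOffsetMode = PixelOffsetMode.HighQuality;
            g.DrawImage(source, 0, 0, thumbWidth, thumbHeight);
        }
        return thumb;
    }
}
```

With the class above, from an [STAThread] Main: create new WebPageBitmap(url, 1024, 768, false), call Fetch(), pass the result of GetBitmap(115, 86) to Thumbnailer.Scale(full, 115, 86), and call Save(path, ImageFormat.Jpeg) on the thumbnail, disposing both bitmaps afterwards. The 1024x768 and 115x86 values appear to be the browser and thumbnail sizes from the command line earlier in the thread.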
Thanks a bunch for the thorough investigation of the topic. This has been bugging me for quite a while!
I'll be releasing the new code shortly so that people can make full use of it.
Cheers!
I did a rather crappy workaround, but it will reduce the number of blank thumbs.
I fetch the HTML with HttpWebRequest/HttpWebResponse.
Then I run these replacements (VB.NET):
dt = Regex.Replace(dt, "(<head>)", String.Format("$1<base href=""{0}"" />", url))
dt = Regex.Replace(dt, "<script[^>]*>(.*?)</script>", "", RegexOptions.IgnoreCase Or RegexOptions.Singleline)
dt = Regex.Replace(dt, "(<noscript>)", "", RegexOptions.IgnoreCase)
dt = Regex.Replace(dt, "(</noscript>)", "", RegexOptions.IgnoreCase)
The above injects a <base href="..."> and removes <script> blocks and <noscript> tags.
Instead of calling WebBrowser.Navigate, I set WebBrowser.DocumentText.
This will probably raise script errors, so set WebBrowser.ScriptErrorsSuppressed to True. You can also avoid them by removing event-handler attributes from the tags.
It sucks, but it will give you shots of Google and Yahoo.
-- modified at 2:47 Monday 15th October, 2007
//casper
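For C# users, here is a sketch of the same preprocessing. It follows casper's steps, with two small changes that are my own assumptions: the <head> match also accepts attributes, and the base URL is inserted through a MatchEvaluator so characters like $ in the URL are not treated as substitution tokens. Regex-based HTML editing is a heuristic and will miss unusual markup. HtmlPreprocessor and StripScripts are hypothetical names.

```csharp
using System.Text.RegularExpressions;

static class HtmlPreprocessor
{
    // Hypothetical C# port of the VB.NET replacements above.
    public static string StripScripts(string html, string baseUrl)
    {
        // 1. Insert <base href="..."> right after the first <head> (with or without
        //    attributes) so relative image and stylesheet URLs still resolve.
        Regex head = new Regex(@"<head(\s[^>]*)?>", RegexOptions.IgnoreCase);
        html = head.Replace(html, m => m.Value + "<base href=\"" + baseUrl + "\" />", 1);

        // 2. Remove whole <script>...</script> blocks; Singleline lets . span newlines.
        html = Regex.Replace(html, @"<script[^>]*>.*?</script>", "",
            RegexOptions.IgnoreCase | RegexOptions.Singleline);

        // 3. Drop the <noscript> and </noscript> tags but keep their content, since
        //    with the scripts gone the fallback content is what should render.
        html = Regex.Replace(html, @"</?noscript[^>]*>", "", RegexOptions.IgnoreCase);
        return html;
    }
}
```

To use it, download the page (for example with WebClient.DownloadString), pass the text through StripScripts, set ScriptErrorsSuppressed = true on the browser, and assign the result to DocumentText instead of calling Navigate.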