See more: C# ASP.NET .NET4
Hi friends,
I'm doing my final-year project on web page summarization.
I need some help from you all. My project summarizes the pages linked from a Google search and shows a short abstract of each page in place of the snippet below each link in the search results.
To start, I have to retrieve the contents of those 10 links. I tried web crawlers and html2txt software, but every attempt failed.
Could someone please guide me on retrieving the contents of the pages that the Google search links to? When I used crawlers, they retrieved everything, including all the hyperlinks on the search results page.
I'm only talking about the 10 links that Google returns for our query. Please help me. I have only 3 months left to complete my project.
 
[edit]Urgency deleted - OriginalGriff[/edit]
Posted 2-Jan-13 4:09am
Edited 2-Jan-13 4:11am
Comments
OriginalGriff at 2-Jan-13 10:12am
   
Urgency deleted: It may be urgent to you, but it isn't to us. All that stressing the urgency does is make us think you have left it too late and want us to do it for you. This annoys some people, and can slow a response.
 
BTW: Multiple exclamation marks are a sign of a diseased mind...
Ponscedric at 2-Jan-13 10:14am
   
thanks

1 solution


Solution 1

Read over this: http://www.codersource.net/MicrosoftNet/CAdvanced/HTMLScreenScrapinginC.aspx
 
It's a Windows application and he is just getting the links on a page, but it should get you headed in the right direction.
Comments
Ponscedric at 2-Jan-13 10:35am
   
But sir, I want to retrieve just the contents of a link. Say we use "data mining" as our search query in Google: it will return the top 10 links related to our query. I just want to retrieve the contents of those 10 links alone, not unnecessary links :(
Adam R Harris at 2-Jan-13 12:02pm
   
You are going to have to use Regular Expressions or something similar to parse the links out of the response. Then you are going to have to get that page content the same way you got it from Google.
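
As a rough illustration of those two steps, here is a minimal sketch in Python (chosen for brevity; the same approach carries over to C#). It assumes, purely for illustration, that each organic result on the search page is an `<a href="...">` nested inside an `<h3>` heading; the real markup may differ and can change, so check the page source first. The names `ResultLinkParser`, `TextExtractor`, `extract_result_links`, `html_to_text` and `fetch` are made up for this example, and it uses the standard-library HTML parser rather than regular expressions.

```python
import re
from html.parser import HTMLParser
from urllib.request import Request, urlopen


class ResultLinkParser(HTMLParser):
    """Collects href values of <a> tags nested inside <h3> headings.

    Assumption: result titles are wrapped in <h3>; other links
    (ads, navigation, footer) are outside any <h3> and are ignored.
    """

    def __init__(self):
        super().__init__()
        self.in_heading = False
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "h3":
            self.in_heading = True
        elif tag == "a" and self.in_heading:
            href = dict(attrs).get("href")
            # Keep absolute links only; relative ones point back into the search site.
            if href and href.startswith("http"):
                self.links.append(href)

    def handle_endtag(self, tag):
        if tag == "h3":
            self.in_heading = False


class TextExtractor(HTMLParser):
    """Collects text content, skipping the bodies of <script> and <style>."""

    def __init__(self):
        super().__init__()
        self.skip_depth = 0
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self.skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self.skip_depth:
            self.skip_depth -= 1

    def handle_data(self, data):
        if not self.skip_depth:
            self.parts.append(data)


def extract_result_links(search_html, limit=10):
    """Step 1: parse the result links out of the search response."""
    parser = ResultLinkParser()
    parser.feed(search_html)
    return parser.links[:limit]


def html_to_text(page_html):
    """Step 2b: reduce a fetched page to plain text with collapsed whitespace."""
    extractor = TextExtractor()
    extractor.feed(page_html)
    return re.sub(r"\s+", " ", " ".join(extractor.parts)).strip()


def fetch(url):
    """Step 2a: download a page; assumes UTF-8 and replaces undecodable bytes."""
    request = Request(url, headers={"User-Agent": "Mozilla/5.0"})
    with urlopen(request, timeout=10) as response:
        return response.read().decode("utf-8", errors="replace")
```

With these helpers, the text of each result would come from calling `html_to_text(fetch(link))` for every `link` in `extract_result_links(fetch(search_url))`, where `search_url` is whatever search URL you are already requesting. The plain text can then be fed to the summarizer.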

This content, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)

Last Updated 2 Jan 2013
Copyright © CodeProject, 1999-2014