I have decided to implement a web crawler for my CS major project. The project is focused on adaptive search. I want the retrieved pages to be as user-specific as possible, and time efficiency is not much of a constraint. After searching the web for a few days (I have no prior knowledge of web mining), I found that most research on adaptive web mining falls into three major areas:

1) Genetic-based algorithms: inspired by evolutionary biology. One of the most researched and efficient algorithms is InfoSpiders <a href="http://cseweb.ucsd.edu/~rik/foa/l2h/foa-7-6-2.html">link</a>.

2) Ant-based algorithms: based on a model of the collective behaviour of social insects.

3) Machine learning based algorithms: these aim to learn, statistically, the characteristics of the link structure of the web. For example, algorithms based on the Hidden Markov model <a href="http://en.wikipedia.org/wiki/Hidden_Markov_model">link</a>.
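
To make item 1 above concrete, here is a minimal sketch of the evolutionary idea: crawl agents gain energy from relevant pages, pay a cost per move, die when their energy runs out, and split into a mutated child when it gets high. It is only loosely in the spirit of such algorithms and is not the InfoSpiders algorithm itself. The toy `WEB` graph, the `relevance` function, and every parameter (`cost`, `split_at`, `max_agents`) are assumptions made up for illustration; a real crawler would fetch pages over HTTP and use a much better relevance estimate.

```python
import random

# Toy in-memory "web": page id -> (text, outgoing links). Hypothetical data.
WEB = {
    "a": ("python crawler tutorial", ["b", "c"]),
    "b": ("cooking recipes", ["a", "d"]),
    "c": ("adaptive crawler design", ["d", "e"]),
    "d": ("crawler politeness and robots", ["c", "e"]),
    "e": ("celebrity gossip", ["a"]),
}


def relevance(text, keywords):
    """Fraction of the agent's keywords that occur in the page text."""
    words = set(text.split())
    return sum(k in words for k in keywords) / len(keywords)


def crawl(query, seeds, steps=30, cost=0.3, split_at=1.0, max_agents=8, rng=None):
    """Run a population of crawl agents; query must be a non-empty keyword list."""
    rng = rng or random.Random(0)  # fixed seed so runs are repeatable
    vocab = sorted({w for text, _ in WEB.values() for w in text.split()})
    agents = [{"page": s, "energy": 0.5, "keywords": list(query)} for s in seeds]
    visited = set()
    for _ in range(steps):
        survivors = []
        for agent in agents:
            text, links = WEB[agent["page"]]
            is_new = agent["page"] not in visited
            visited.add(agent["page"])
            # Reward only pages nobody has seen yet; every move costs energy.
            gain = relevance(text, agent["keywords"]) if is_new else 0.0
            agent["energy"] += gain - cost
            if agent["energy"] <= 0:
                continue  # the agent dies
            agent["page"] = rng.choice(links)  # follow a random outgoing link
            survivors.append(agent)
            if agent["energy"] >= split_at:
                # Reproduce: split the energy and mutate one keyword of the child.
                agent["energy"] /= 2
                child_keywords = list(agent["keywords"])
                child_keywords[rng.randrange(len(child_keywords))] = rng.choice(vocab)
                survivors.append({"page": agent["page"],
                                  "energy": agent["energy"],
                                  "keywords": child_keywords})
        # Keep only the most energetic agents to bound the population.
        agents = sorted(survivors, key=lambda a: a["energy"], reverse=True)[:max_agents]
        if not agents:
            break
    return visited, agents


if __name__ == "__main__":
    pages, population = crawl(["crawler"], seeds=["a"])
    print(sorted(pages), len(population))
```

The link choice here is uniformly random to keep the sketch short. The adaptive part of a real system would replace `rng.choice(links)` with a learned estimate of which link is most promising, which is also where the ant-based and machine learning approaches differ from each other.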

Since I have no prior knowledge of web mining, I am unable to decide which algorithm is most suitable for my project (genetic-based or machine learning based), given that I have 7-8 months to complete it and I want it to be as close as possible to practical applications. I also find both fields (machine learning and genetic algorithms) very interesting. So please enlighten me.
Updated 1-Oct-14 6:11am
Comments
Sergey Alexandrovich Kryukov 1-Oct-14 12:11pm    
This is very interesting and advanced stuff; only the formulation of the question looks confusing: how can it be genetic vs. learning? Why is genetic not a kind of learning, while hidden Markov is? Just a note on terminology.
—SA
unknown_ 1-Oct-14 23:03pm    
By genetic algorithm I largely mean InfoSpiders, because I didn't come across any other. Also, machine learning based algorithms are more mathematical in nature and use statistics and probability to update the index.

This content, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)


