I am doing a project on text summarization. A genetic algorithm (GA) can be used for summarization.

The project works like this: a news article is taken as input and pre-processed, and then each sentence in the pre-processed text is given a score. I have finished that part. Now I want to apply a GA to the sentence scores to produce a summary of the single document. I have no idea how a GA can be applied to text. If anybody knows, please explain with an example using text, not genes.
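One common way to connect sentence scores to a GA (an assumption here, not the only possible encoding) is to treat each candidate summary as a chromosome with one bit per sentence: 1 keeps the sentence, 0 drops it. The sketch uses three made-up news sentences with made-up scores, and `decode` and `total_score` are hypothetical helper names, just to show how a bit string maps back to summary text.

```python
# Hypothetical example: three pre-processed news sentences whose scores
# were already computed by an earlier step (values are made up).
sentences = [
    "The central bank raised interest rates on Tuesday.",
    "Analysts had expected the move for several weeks.",
    "The weather in the capital was mild.",
]
scores = [0.9, 0.6, 0.1]

def decode(chromosome, sentences):
    """Turn a 0/1 chromosome into summary text: gene i == 1 keeps sentence i."""
    return " ".join(s for gene, s in zip(chromosome, sentences) if gene == 1)

def total_score(chromosome, scores):
    """Sum of the scores of the sentences selected by the chromosome."""
    return sum(sc for gene, sc in zip(chromosome, scores) if gene == 1)

candidate = [1, 0, 0]
print(decode(candidate, sentences))
# → The central bank raised interest rates on Tuesday.
print(total_score(candidate, scores))
# → 0.9
```

With this encoding, "genes" are simply keep/drop decisions on real sentences, so every chromosome the GA produces can be read back as a piece of text.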

I know that in a GA the fitness is calculated first, but I don't know how to apply the mutation and crossover concepts to text. If you have a program, please help me...
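A minimal sketch of the full loop, assuming the one-bit-per-sentence encoding (1 = sentence is in the summary) and a fitness that is the sum of the selected sentences' scores, set to 0 when more than `max_sentences` sentences are chosen. Tournament selection, one-point crossover, bit-flip mutation and elitism are standard GA choices picked here for illustration; all function names and parameter values are hypothetical.

```python
import random

def fitness(chrom, scores, max_sentences):
    """Total score of the selected sentences; over-long summaries score 0."""
    if sum(chrom) > max_sentences:
        return 0.0
    return sum(s for gene, s in zip(chrom, scores) if gene == 1)

def tournament(pop, fits, rng, k=3):
    """Tournament selection: return the fittest of k random individuals."""
    contenders = rng.sample(range(len(pop)), k)
    return pop[max(contenders, key=lambda i: fits[i])]

def crossover(a, b, rng):
    """One-point crossover: swap the tails of two parents after a random cut."""
    point = rng.randint(1, len(a) - 1)
    return a[:point] + b[point:], b[:point] + a[point:]

def mutate(chrom, rng, rate):
    """Bit-flip mutation: each gene is toggled with probability `rate`."""
    return [1 - g if rng.random() < rate else g for g in chrom]

def summarize_ga(scores, max_sentences, pop_size=20, generations=50,
                 mutation_rate=0.05, seed=0):
    """Return the best 0/1 chromosome found (one gene per sentence)."""
    rng = random.Random(seed)
    n = len(scores)  # needs n >= 2 for one-point crossover
    pop = [[rng.randint(0, 1) for _ in range(n)] for _ in range(pop_size)]
    for _ in range(generations):
        fits = [fitness(c, scores, max_sentences) for c in pop]
        # Elitism: copy the current best chromosome into the next generation.
        elite = pop[max(range(pop_size), key=lambda i: fits[i])]
        next_pop = [elite[:]]
        while len(next_pop) < pop_size:
            child1, child2 = crossover(tournament(pop, fits, rng),
                                       tournament(pop, fits, rng), rng)
            next_pop.append(mutate(child1, rng, mutation_rate))
            if len(next_pop) < pop_size:
                next_pop.append(mutate(child2, rng, mutation_rate))
        pop = next_pop
    fits = [fitness(c, scores, max_sentences) for c in pop]
    return pop[max(range(pop_size), key=lambda i: fits[i])]

# Hypothetical scores for a five-sentence article; keep at most 2 sentences.
scores = [0.9, 0.2, 0.7, 0.1, 0.5]
best = summarize_ga(scores, max_sentences=2)
print(best)  # the sentences whose genes are 1 form the summary
```

In text terms, crossover combines "keep sentence 1 and 2" from one candidate summary with "keep sentence 4 and 5" from another, and mutation adds or removes a single sentence. For these scores the best feasible choice is sentences 1 and 3 (total 1.6), which a run of this size will usually find, though a GA gives no guarantee of reaching the optimum.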
Updated 24-Feb-11 15:26pm
Orcun Iyigun 24-Feb-11 20:55pm    
What have you done so far?
Orcun Iyigun 24-Feb-11 21:22pm    
Updated for readability.

1 solution

Well, this is not a simple question (the field you are digging into is still very much an active research area).

I would recommend:

1. Read current articles about automatic text summarization and the application of genetic algorithms.

For this you can search online or, e.g., in the conference proceedings of AI conferences such as ECAI or IJCAI.

2. Understand the Data:

That is, you take a sentence and model some of its information as a sequence of values, which then "kind of" forms your "gene sequence". The problem here is to find significant features from which you can conclude that a specific word needs to be in the summary, e.g. the word's frequency in the text (but be careful here: stopwords like "no", "yes", etc. should be filtered out). Otherwise you will need so much data to train your model that it will simply be infeasible.
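A minimal sketch of the word-frequency feature mentioned above, assuming a tiny hand-written stop-word list and a simple regex tokenizer (both hypothetical; a real system would use a fuller list and a proper tokenizer). Each sentence is scored by the average frequency, across the whole text, of its non-stop-words.

```python
import re
from collections import Counter

# Tiny, hypothetical stop-word list for illustration only.
STOPWORDS = {"the", "a", "an", "and", "of", "in", "on", "to", "is", "was",
             "it", "as", "no", "yes"}

def tokenize(text):
    """Lower-case the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())

def sentence_scores(sentences):
    """Score each sentence by the average text frequency of its content words."""
    freq = Counter(w for s in sentences for w in tokenize(s) if w not in STOPWORDS)
    scores = []
    for s in sentences:
        content = [w for w in tokenize(s) if w not in STOPWORDS]
        # Sentences made only of stop-words get a score of 0.
        scores.append(sum(freq[w] for w in content) / len(content) if content else 0.0)
    return scores

print(sentence_scores(["The bank raised rates.",
                       "Rates rose as the bank expected.",
                       "It was sunny."]))
```

Averaging rather than summing is a design choice: a plain sum would favour long sentences regardless of content. Note that this tokenizer drops digits, which matters for news text with numbers.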

3. Look into standard preprocessing methods in computational linguistics, such as stemming, stop-word filtering, taggers for phrase types, etc. These all reduce your search space and simplify your search for the "gene" sequence ;)
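To illustrate how stemming and stop-word filtering shrink the search space, here is a toy sketch. `toy_stem` is a hypothetical suffix stripper, NOT the Porter algorithm; in practice you would use an established stemmer such as NLTK's `PorterStemmer` from `nltk.stem`. The suffix list and the 3-letter minimum stem length are arbitrary assumptions.

```python
# Toy suffix-stripping stemmer for illustration only (NOT the Porter algorithm).
# Longer suffixes come first so e.g. "ations" is tried before "s".
SUFFIXES = ("ations", "ation", "ings", "ing", "edly", "ed", "es", "s")

def toy_stem(word):
    """Strip the first matching suffix, keeping a stem of at least 3 letters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

def vocabulary(tokens, stopwords):
    """Distinct stems left after stop-word filtering and stemming."""
    return sorted({toy_stem(t) for t in tokens if t not in stopwords})

tokens = ["the", "summarization", "of", "texts", "summarizes", "a", "text", "summarized"]
print(len(set(tokens)), "->", vocabulary(tokens, {"the", "of", "a"}))
# → 8 -> ['summariz', 'text']
```

Eight distinct tokens collapse to two features, so each sentence's "gene" sequence gets much shorter. A crude stripper like this also over-stems some words (e.g. "rates" becomes "rat"), which is why a real stemmer is preferable.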

4. To me it seems to be an extremely ambitious project (but I have been out of the field for a few years now...)

Hope this helps a bit,
Cheers, Arndt

This content, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)
