Why are the for loops taking too much time to execute in the following program?

Question


In this program, I pass a Wikipedia URL to the text extraction logic, but after the text has been extracted, the "for" loops take too much time to execute.
The same logic is very fast in a Python program.

How can I reduce the execution time?

import java.io.IOException;
import java.net.URL;
import java.util.Scanner;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class TextExtraction1
{
    static TextExtraction1 fj;

    // Downloads the page and returns its HTML source, one line per '\n'
    public String toHtmlString(String url) throws IOException
    {
        StringBuilder sb = new StringBuilder();
        for (Scanner sc = new Scanner(new URL(url).openStream()); sc.hasNext(); )
            sb.append(sc.nextLine()).append('\n');
        return sb.toString();
    }

    // Returns the number of matches of the regex key in target
    static int search(String key, String target)
    {
        int count = 0;
        Pattern p = Pattern.compile(key);
        Matcher m = p.matcher(target);
        while (m.find()) { count++; }
        return count;
    }

    String extractText(String s) throws IOException
    {
        String h1 = fj.toHtmlString(s);
        System.out.println("extracted \n\n");
        int i2 = 0;
        String h2[] = h1.split("\n");
        String html = "";
        long start = System.currentTimeMillis();

        // 1st loop: join all lines into one string
        for (String h3 : h2)
        {
            html += h3;
            html += "";
        }
        long end = System.currentTimeMillis();
        System.out.println(++i2 + " th loop end in " + (end - start) / 1000 + " seconds");
        boolean capture = true;
        String filtered_text = "";

        String html_text[] = html.split("<");
        String h_text[];

        // 2nd loop: split into tags and text, skip everything between <script and </script>
        start = System.currentTimeMillis();
        for (String h : html_text)
        {
            h = "<" + h;
            h_text = h.split(">");
            for (String w : h_text)
            {
                if (w.length() > 0) { if (w.substring(0, 1).equals("<")) { w += ">"; } }
                if (search("</script>", w) > 0) { capture = true; }
                else if (search("<script", w) > 0) { capture = false; }
                else if (capture) { filtered_text += w; filtered_text += "\n"; }
            }
        }
        end = System.currentTimeMillis();
        html_text = filtered_text.split("\n");

        System.out.println(++i2 + " th loop end in " + (end - start) / 1000 + " seconds");
        return html_text[0];
    }

    public static void main(String[] args) throws IOException
    {
        fj = new TextExtraction1();
        System.out.println(fj.extractText("https://en.wikipedia.org/wiki/Varanasi"));
    }
}

The same code in Python is very fast:

import urllib2
import re
import sys

def get_text(f1):
    h1 = f1.read()
    html = ''
    h2 = h1.split('\n')
    f = open("guru99.txt", "w+")

    for h3 in h2:
        html += h3
        html += ' '

    capture = True
    filtered_text = ''
    html_text = html.split('<')

    i = 0
    for h in html_text:
        h = '<' + h
        h_text = h.split('>')

        for w in h_text:
            if w:
                if w[0] == '<':
                    w += '>'

            if re.search(r'</script>', w):
                capture = True
            elif re.search(r'<script', w):
                capture = False
            else:
                if capture:
                    filtered_text += w
                    filtered_text += '\n'
    return filtered_text

def get_url_text(url):
    try:
        f = urllib2.urlopen(url)
    except (urllib2.HTTPError, urllib2.URLError):
        return '\n'
    else:
        return get_text(f)

def main():
    get_url_text(sys.argv[1])

if __name__ == "__main__": main()

What I have tried:

I just converted the "for" loop into a while loop:

String h3 = "";
int i3 = 0;
while (i3 < h2.length)
{
    h3 = h2[i3];
    html += h3;
    html += "";
    i3++;
}

Posted 25-Jan-17 1:03am

sanjay gupta

Updated 25-Jan-17 9:18am

2 solutions

Add a Solution

Add your solution here

Treat my content as plain text, not as HTML

Preview 0

…

Existing Members

Sign in to your account

...or Join us

Download, Vote, Comment, Publish.

Your Email
Password
Forgot your password?

Your Email
This email is in use. Do you need your password?
Optional Password

I have read and agree to the Terms of Service and Privacy Policy
Please subscribe me to the CodeProject newsletters

When answering a question please:

Read the question carefully.
Understand that English isn't everyone's first language so be lenient of bad spelling and grammar.
If a question is poorly phrased then either ask for clarification, ignore it, or edit the question and fix the problem. Insults are not welcome.
Don't tell someone to read the manual. Chances are they have and don't get it. Provide an answer or move on to the next question.

Let's work to help developers, not make them feel stupid.

This content, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)

Jochen Arndt · Answer 1 · 2017-01-25T02:09:00

Solution 1

You should try to optimise the Java code.

The best optimisation can be achieved by avoiding dynamic object creation inside loops.

An example:

Python

# Python
if w:
    if w[0] == '<':
        w += '>'

Java

// Java
if(w.length()>0)
{	
    if(w.substring(0, 1).equals("<"))
    {
        w +=">";
    }	
}

Here, substring() creates a new String object and then performs a string comparison.
Why not just use String.charAt() and perform a character comparison?

Java

if(w.length()>0)
{	
    if(w.charAt(0) == '<')
    {
        w += ">";
    }	
}

Another optimisation is to store the regex Patterns in class (static) members, so that Pattern.compile() is executed only once instead of on every call to search().
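A minimal sketch of that idea, assuming the two patterns used by the question's search() calls. The class name PatternCache and the helper has() are hypothetical; count() keeps the contract of the original search().

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class PatternCache {
    // Compiled once, when the class is loaded, instead of on every call
    static final Pattern SCRIPT_END = Pattern.compile("</script>");
    static final Pattern SCRIPT_START = Pattern.compile("<script");

    // Same contract as search(): number of matches of p in target
    static int count(Pattern p, String target) {
        int count = 0;
        Matcher m = p.matcher(target);
        while (m.find()) {
            count++;
        }
        return count;
    }

    // The caller only tests "> 0", so stopping at the first match is enough
    static boolean has(Pattern p, String target) {
        return p.matcher(target).find();
    }

    public static void main(String[] args) {
        System.out.println(count(SCRIPT_START, "<script><script>")); // 2
        System.out.println(has(SCRIPT_END, "</script>"));            // true
    }
}
```

Since both patterns are plain literals with no regex metacharacters, String.contains() would give the same yes/no answer without any regex at all.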

Posted 25-Jan-17 2:09am

Jochen Arndt

Comments

Afzaal Ahmad Zeeshan 25-Jan-17 15:51pm

Not sure, and too lazy to Google, but can't you use an indexer in Java to get the character at a given index of a String object? Just curious. :-)

Jochen Arndt 26-Jan-17 2:57am

I would have to Google it too.

An indexer would be better when iterating over the characters because it avoids the bounds checking. Here it would require splitting the loop into two: one to process the characters and one to process the substrings.

Afzaal Ahmad Zeeshan 26-Jan-17 6:58am

Yup, I also looked around and found only charAt() in the Java API, although they could have written an interface that allows that.

Your answer was good, and my comment was just another quick question to you only, nothing about the post. 5ed for that. :-)

Patrice T · Answer 2 · 2017-01-25T09:18:00

There is a tool that tells you where a program spends its time: a profiler.
Profiling (computer programming) - Wikipedia

You should try to use StringBuilder every time you have to concatenate strings.
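To see the effect without a profiler, here is a rough timing sketch (the class ConcatTiming and the 5,000 lines of 80 characters are made up for illustration). Each html += line copies the whole accumulated string, so the loop does quadratic work whether it is written as a for or a while loop; a StringBuilder appends into one growing buffer. The numbers depend on the JVM and machine.

```java
import java.util.Arrays;

public class ConcatTiming {
    // Mirrors the question's first loop: every += copies the whole accumulated string
    static String joinWithPlus(String[] lines) {
        String html = "";
        for (String line : lines) {
            html += line;
        }
        return html;
    }

    // Same result, but all lines are appended into one growable buffer
    static String joinWithBuilder(String[] lines) {
        StringBuilder sb = new StringBuilder();
        for (String line : lines) {
            sb.append(line);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Hypothetical input: 5,000 lines of 80 'x' characters each
        char[] row = new char[80];
        Arrays.fill(row, 'x');
        String[] lines = new String[5_000];
        Arrays.fill(lines, new String(row));

        long t0 = System.nanoTime();
        String a = joinWithPlus(lines);
        long t1 = System.nanoTime();
        String b = joinWithBuilder(lines);
        long t2 = System.nanoTime();

        System.out.println("String +=     : " + (t1 - t0) / 1_000_000 + " ms");
        System.out.println("StringBuilder : " + (t2 - t1) / 1_000_000 + " ms");
        System.out.println("same result   : " + a.equals(b));
    }
}
```

For the first loop specifically, String.join("", lines) (Java 8+) produces the same result in a single call.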
Note that

Java

filtered_text += w;     filtered_text += "\n";

is slower than

Java

filtered_text += w + "\n";
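Both forms still copy the whole of filtered_text on every iteration; the first copies it twice, which is why it is slower. Below is a sketch of the filtering loop with one StringBuilder, charAt() and String.contains() in place of the regex search(). The class FilterSketch and the sample HTML are hypothetical; otherwise the logic follows the question's second loop.

```java
public class FilterSketch {
    // Same tag/text splitting and <script> skipping as extractText(),
    // but the output is accumulated in a single StringBuilder
    static String filter(String html) {
        StringBuilder filtered = new StringBuilder(html.length());
        boolean capture = true;
        for (String part : html.split("<")) {
            String h = "<" + part;
            for (String w : h.split(">")) {
                // Character comparison instead of substring(0, 1).equals("<")
                if (w.length() > 0 && w.charAt(0) == '<') {
                    w += ">";
                }
                // Literal checks instead of compiling a regex for each piece
                if (w.contains("</script>")) {
                    capture = true;
                } else if (w.contains("<script")) {
                    capture = false;
                } else if (capture) {
                    filtered.append(w).append('\n');
                }
            }
        }
        return filtered.toString();
    }

    public static void main(String[] args) {
        String html = "<p>Hello</p><script>var x = 1;</script><b>World</b>";
        System.out.print(filter(html));
    }
}
```

For the sample input this prints <>, <p>, Hello, </p>, <b>, World and </b> on separate lines; the script body is skipped. The leading <> comes from the empty first element of html.split("<"), exactly as in the original loop.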
