I have a for loop that goes through a list of URLs, grabs the <title> tag of each one, and writes it to a file. Here is the code:
Python
#!/usr/bin/env python
import os
import requests
from bs4 import BeautifulSoup
import time, random
from concurrent.futures import ProcessPoolExecutor, as_completed
from concurrent.futures import ThreadPoolExecutor
from concurrent.futures import Future 
import concurrent.futures
from threading import Thread
import asyncio
#import GRequests


class unu:
    def get_title(self):
        
        #print("Wut")
        with open("ip_scanate.txt") as f:
                
                
            for line in f:

                
                line2 = line.rstrip('\r\n')
                try:
                    
                    rechi = requests.get("http://"+ line2 + ":8080", verify = False, timeout = 1)
                    print("Connection Succesful! " + line2)
                    print(rechi.status_code)
                    print(" ")
                    con = BeautifulSoup(rechi.content, 'html.parser')
                    title = con.title
                    
                    
                    
                except:
                    print("TIMED OUT " + line2)
                    print(" ")
                    continue


                g = open('hosts_final2.txt','a')
                try:
                    #if title.string == "Linksys Smart Wi-Fi":
                    #if title.string == "Tracer Synchrony":
                    
                    print(title)
                    g.write(title.string + " " + line2 + "\n")
                    #g.write(title.string + " " + line2 + "\n")
                    #else:
                    print("")
                except AttributeError:
                    print("empty source")


def multiprocessing_funct():

    obj = unu()
    #obj.get_title()
    executor = concurrent.futures.ProcessPoolExecutor(10)
    future = executor.submit(obj.get_title)
    #concurrent.futures.wait(futures)
    
    
    #for task in as_completed(processes):
        #obj.write_file(self)
    

            
multiprocessing_funct()


What I have tried:

I have tried this code, but it does not work. I would like to know of some solutions to this problem; I have been trying different things over the past few days, but nothing has worked.
Updated 8-Jun-20 3:36am
Comments
Patrice T 7-Jun-20 22:56pm    
Please elaborate on "does not work".

1 solution

You need to be aware that threading is not a "magic bullet" that will solve all your performance woes at a stroke - it needs to be carefully thought about and planned, or it can do two things:
1) Slow your machine to a crawl, and make your application considerably slower than it started out.
2) Crash or lock up your app completely.

The reasons why are simple:
1) Threads require two things to run: memory and a free core. The memory will be at the very least the size of a system stack in your language (usually around 1MB for Windows, 8MB for Linux) plus some overhead for the thread itself and yet more for any memory based objects each thread creates; and a thread can only run when a core becomes available. If you generate more threads than you have cores then most of them will spend a lot of time sitting waiting for a core to be available.
The more threads you generate, the worse the problems become: more threads put more load on the system to switch threads more often, and that takes core time as well. All threads in the system from all processes share the cores in the machine, so other apps and system threads also need their time to run. Add too many, and the system will spend more and more of its time trying to work out which thread to run, and performance degrades. Generate enough threads to exceed the physical memory in your computer and performance suddenly takes an enormous hit as the virtual memory system comes in and starts thrashing memory pages to the HDD.

2) Multiple threads within a process have to be thread safe because they share memory and other resources - which means that several things can happen:
2a) If two threads need the same resources then you can easily end up in a situation where thread A has locked resource X and wants Y, while thread B has locked resource Y and wants X. At this point a "deadly embrace" (a deadlock) has occurred and neither thread (nor any other that needs X or Y) can ever run again.
2b) If your code isn't thread safe, then different threads can try to read and / or alter the same memory at the same time: this often happens when trying to add or remove items from a collection. At this point strange things start to happen up to and including your app crashing.
2c) If resources have a finite capacity - like the bandwidth on an internet connection for example - then bad threading can easily use it all - at either end of the link. If you run out of capacity, your threads will stall waiting for it (and everybody else using the connection will also suffer). If the other end runs out of capacity it may stutter, slow down, crash, or assume that you are a DDoS attack and take precautions.
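
As a rough illustration of point 2b, here is a minimal sketch of one way to keep shared data consistent when several threads update it: every read and write goes through a single lock. The names TitleLog and fill_log, and the host names, are made up for this example and do not come from the question's code.

```python
import threading


class TitleLog:
    """Hypothetical shared store: every access is guarded by one lock."""

    def __init__(self):
        self._lock = threading.Lock()
        self._entries = []
        self._count = 0

    def add(self, host, title):
        # The list append and the counter update happen as one unit,
        # so no other thread can observe them half done.
        with self._lock:
            self._entries.append((host, title))
            self._count += 1

    def snapshot(self):
        # Return a copy so callers never touch the shared list directly.
        with self._lock:
            return self._count, list(self._entries)


def fill_log(log, n_threads=8, per_thread=500):
    """Start n_threads workers that each add per_thread entries to log."""

    def worker(worker_id):
        for i in range(per_thread):
            log.add(f"host-{worker_id}-{i}", "example title")

    threads = [threading.Thread(target=worker, args=(n,)) for n in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
```

Because there is only one lock, the circular wait from 2a cannot arise here. Once code needs two or more locks, acquiring them in the same order everywhere is the usual way to avoid that.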

You can't just go "multithread this" and assume it will work: it's something that needs very, very careful planning.
Think about it this way: if you have a very large bus it is a slow way to get from A to B, but when you average it out over the large number of passengers it's pretty quick. But if you put each passenger in a separate car in theory they can all get there faster - except you are putting a lot more vehicles on the same roads which means more chance of traffic jams, accidents, breakdowns, and so forth. Put too many on the same roads and they get blocked up with cars and nobody can move anywhere because there is a car in their way ...
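
To make the planning a little more concrete for the loop in the question, a bounded thread pool is one common starting point. The sketch below is only an assumption about how it could be structured, not a tested fix for the original script: scan_hosts, fake_fetch_title and the host addresses are invented for illustration. In the real script, the fetch function would wrap the requests.get and BeautifulSoup calls for a single host.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def scan_hosts(hosts, fetch_title, max_workers=10):
    """Call fetch_title(host) for every host on a bounded pool of threads.

    fetch_title is any callable that returns a title string or raises
    an exception on failure. Returns (results, failures) as two dicts
    keyed by host.
    """
    results = {}
    failures = {}
    # max_workers caps how many requests are in flight at once, which
    # limits both thread count and load on the network.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # One task per host, rather than one task for the whole loop.
        future_to_host = {pool.submit(fetch_title, host): host for host in hosts}
        for future in as_completed(future_to_host):
            host = future_to_host[future]
            try:
                results[host] = future.result()
            except Exception as exc:
                failures[host] = exc
    return results, failures


def fake_fetch_title(host):
    """Stand-in for the real network call (hypothetical, for the demo only)."""
    if host.endswith(".0"):
        raise TimeoutError(f"no answer from {host}")
    return f"Title of {host}"


if __name__ == "__main__":
    found, failed = scan_hosts(["10.0.0.1", "10.0.0.2", "10.0.0.0"], fake_fetch_title, max_workers=2)
    for host in sorted(found):
        print(found[host], host)
    for host in sorted(failed):
        print("FAILED", host)
```

The results are gathered in the calling thread, so writing them to the output file afterwards involves only one thread and needs no locking. The right value for max_workers depends on the machine and the network, so it is worth starting small and measuring.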
 

This content, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)


