I'm trying to remove a logo from an HTML page using 'Web Scraping' in Python. The code itself has been tested before and runs without errors, but I'm having trouble selecting the HTML tag that contains the image source. Please guide me in selecting the tag.

What I have tried:

Here's my code. Please pay attention to the links variable:

import plotly.express as px


def Histogram(file_path, rows, columns):
    # file_reading is a helper defined elsewhere (not shown here);
    # it is assumed to return a pandas DataFrame.
    df = file_reading(file_path, rows)
    column1 = columns[0]
    fig = px.histogram(df, x=column1).update_xaxes(
        categoryorder='total ascending')
    hist_path = "templates/boxplot.html"
    fig.write_html(hist_path)

    # Markup of the Plotly logo button that should be removed,
    # written as one string via implicit concatenation.
    links = [
        '<a href="https://plotly.com/" target="_blank" '
        'data-title="Produced with Plotly.js (v2.16.1)" '
        'class="modebar-btn plotlyjsicon modebar-btn--logo '
        '__web-inspector-hide-shortcut__">'
        '<SVG xmlns="http://www.w3.org/2000/svg" viewBox="0 0 132 132" '
        'height="1em" width="1em"><defs> <style>  .cls-0{fill:#000;}  '
        '.cls-1{fill:#FFF;}  .cls-2{fill:#F26;}  .cls-3{fill:#D69;}  '
        '.cls-4{fill:#BAC;}  .cls-5{fill:#9EF;} </style></defs> '
        '<title>plotly-logomark</title> <g id="symbol">  '
        '<rect class="cls-0" x="0" y="0" width="132" height="132" '
        'rx="18" ry="18"></rect>  '
        '<circle class="cls-5" cx="102" cy="30" r="6"></circle>  '
        '<circle class="cls-4" cx="78" cy="30" r="6"></circle>  '
        '<circle class="cls-4" cx="78" cy="54" r="6"></circle>  '
        '<circle class="cls-3" cx="54" cy="30" r="6"></circle>  '
        '<circle class="cls-2" cx="30" cy="30" r="6"></circle>  '
        '<circle class="cls-2" cx="30" cy="54" r="6"></circle>  '
        '<path class="cls-1" '
        'd="M30,72a6,6,0,0,0-6,6v24a6,6,0,0,0,12,0V78A6,6,0,0,0,30,72Z">'
        '</path>  <path class="cls-1" '
        'd="M78,72a6,6,0,0,0-6,6v24a6,6,0,0,0,12,0V78A6,6,0,0,0,78,72Z">'
        '</path>  <path class="cls-1" '
        'd="M54,48a6,6,0,0,0-6,6v48a6,6,0,0,0,12,0V54A6,6,0,0,0,54,48Z">'
        '</path>  <path class="cls-1" '
        'd="M102,48a6,6,0,0,0-6,6v48a6,6,0,0,0,12,0V54A6,6,0,0,0,102,48Z">'
        '</path> </g></svg></a>'
    ]

    replace_with = ""

    # Read the generated page, strip the logo markup, and write it back in place.
    with open(hist_path, 'r+') as f:
        content = f.read()
        content = content.replace(links[0], replace_with)
        f.seek(0)
        f.truncate()
        f.write(content)

    return hist_path
Updated 31-Dec-22 15:45pm
Comments
Richard MacCutchan 30-Dec-22 6:10am    
"I'm having an issue with selecting the HTML tag"
You need to explain what you mean by that.
Apoorva 2022 30-Dec-22 8:06am    
I mean I want to know which portion of the HTML code referring to the logo needs to be selected. What I copied is too big; I don't think I captured it right.
Gerry Schmitz 30-Dec-22 9:11am
"Removing" is not scraping. "There's no issue with the code ... but I'm having an issue ...". And there is no "image"; it's using "path commands" to draw.
Dave Kreskowiak 30-Dec-22 11:38am    
And why are you doing this? If you're trying to remove that logo to use the page in your own site, that's illegal.
Apoorva 2022 30-Dec-22 21:49pm    
It's a part of my training assignment. It's not going to be used anywhere or sold to anyone. My task is to remove any one of the buttons in the page. I chose the logo as it is the most useless of all (to me). The other buttons are related to downloading the plots, zooming, etc., which are the important features I must demonstrate. In fact, my instructor told me the same, that removing the logo could be illegal.
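
On the question of which portion to select: an exact string match on the whole copied block is fragile, because any difference in whitespace or attribute order makes replace() do nothing. A minimal sketch, assuming the saved file really contains the logo anchor, is to match the whole <a> element by its modebar-btn--logo class instead. The names LOGO_ANCHOR and strip_logo_anchor and the sample markup are hypothetical, made up for illustration, and are not part of the original code.

```python
import re

# Hypothetical pattern (not from the original post): matches one <a> element
# whose class attribute contains "modebar-btn--logo", including its inner SVG.
LOGO_ANCHOR = re.compile(
    r'<a\b[^>]*\bclass="[^"]*\bmodebar-btn--logo\b[^"]*"[^>]*>.*?</a>',
    re.DOTALL | re.IGNORECASE,
)


def strip_logo_anchor(html: str) -> str:
    """Return html with the logo anchor removed; unchanged if it is absent."""
    return LOGO_ANCHOR.sub("", html)


# Made-up sample markup, shortened from the block in the question.
sample = (
    '<div class="modebar-group">'
    '<a href="https://plotly.com/" target="_blank"\n'
    '   class="modebar-btn plotlyjsicon modebar-btn--logo">'
    '<svg viewBox="0 0 132 132"><path d="M30,72Z"></path></svg></a>'
    '</div>'
)
cleaned = strip_logo_anchor(sample)
print(cleaned)
```

Two caveats. As far as I can tell, Plotly.js builds the mode bar in the browser when the page loads, so the file written by write_html may not contain this anchor at all, in which case no text replacement will find it. Plotly's own configuration has a displaylogo option for this; passing config={'displaylogo': False} to fig.write_html is probably the simpler route, subject to the licensing concern raised above.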

This content, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)


