I am trying to write a program that downloads e-books (PDF) from www.pdfdrive.net. The site first checks the availability and condition of the PDF and only then provides the PDF link.
Unfortunately, I am unable to get past this availability check from my C# program.

When I use "Inspect Element" in my Mozilla browser, it shows me the PDF URL:
<div id="alternatives" class="mt-2" style="text-align: left;">

<h2 style="color: #696969;line-height: 39px;padding-top: 3px;text-align: center;" id="file-available">
                        Your download will begin in a moment.<br>If it doesn't 
                        <a class="btn btn-success btn-responsive" href="https://ssrvmmath.files.wordpress.com/2014/07/irodov-problems_in_general_physics.pdf" target="_blank" rel="nofollow" onclick="c(); ga('send', 'event', 'Download', 'download-page');">Go to PDF</a>
</h2>
<div style="font-size:12px;text-align:center;margin-bottom: 7px;">
                    hosted by ssrvmmath.files.wordpress.com.
                    <a href="/home/dmca" target="_blank" rel="_nofollow">Report</a></div>
<div style="text-align:center">
    <span class="sexy_line big"></span>
    <form class="form-inline" onsubmit="createAlert(); return false;" id="alert-form" style="margin-top:12px;">
        
        <div class="input-group" style="margin: auto;padding: 0px 12px;">
            <img src="/assets/img/pd-alerts.png" style="width:218px; height:41px; border:0; margin-right: 11px;" class="hidemobile">
            <input class="form-control" autocomplete="on" id="alert-email" placeholder="Enter your email" style="" value="" type="email">
            <span class="input-group-btn">
                <button type="submit" class="btn btn-info btn-responsive">Create Alert</button>
            </span>
        </div>
   <div class="row subscribe-options">
    <div class="col">
     <input checked="checked" id="newversion" name="newversion" style="vertical-align: middle;" type="checkbox"> Alert me when the new version of the file available.
    </div>
    <div class="col">
    <input checked="checked" id="subscribe" name="subscribe" style="vertical-align: middle;" type="checkbox"> Send me weekly top trending free books
    </div>
  </div>

</form>
            <span class="sexy_line big"></span>
            
</div>
<script>ga('send', 'event', 'Download-result', 'healthy');</script>

</div>


But when I use "View Page Source", it shows only:

<div id="alternatives" class="mt-2" style="display:none; text-align: left;"></div>


I want to save the HTML containing the PDF URL to a file named pip.txt.

What should I do?

What I have tried:

First I tried loading the site in the default web browser, which did not work.
I also tried the following code:
// Requires: using System; using System.IO; using System.Net; using System.Text;
var request = (HttpWebRequest)WebRequest.Create("https://www.pdfdrive.net/irodov-problems-in-general-physics-d24882553.html");
request.Method = "GET";
request.AllowAutoRedirect = false;
request.UserAgent = "Mozilla/5.0 (Windows NT 6.1; rv:7.0.1) Gecko/20100101 Firefox/7.0.1";
request.Headers.Add("DNT", "1");
request.Accept = "text/html,application/xhtml+xml,application/xml";

string responseStr;
// The using blocks dispose the response, stream and reader, so no explicit Close() calls are needed.
using (var response = (HttpWebResponse)request.GetResponse())
using (var stream = response.GetResponseStream())
using (var sr = new StreamReader(stream, Encoding.UTF8))
{
    responseStr = sr.ReadToEnd();
}

// Write the requested URL followed by the downloaded HTML to pip.txt.
// (The original used e.Url, which is not defined in this snippet.)
File.WriteAllText(Path.Combine(Environment.CurrentDirectory, "pip.txt"), request.RequestUri + responseStr);
Updated 30-Dec-17 20:26pm
Comments
Richard MacCutchan 1-Jan-18 5:28am
   
If you are the same person who posted Downloading pdf from [DELETED] programmatically[^], then please delete the duplicate account

1 solution

Because the HTML is not populated in View Source, the HTML you see in the inspector is evidently loaded via AJAX calls after the page has loaded. Those calls are made by JavaScript, which does not run when you simply download the page's HTML (which is what your code does). That explains why the HTML you download does not contain the link you are looking for.

If you look at the browser's network console, you will see that the page makes an AJAX call to /ebook/broken?id= which, if successful, returns the HTML containing the download link.
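
Once that HTML fragment is in hand (for example, copied from the inspector or returned by a successful call), pulling the link out of it is the easy part. Below is a minimal sketch, assuming the fragment has the same shape as the markup quoted in the question, i.e. an element with id="file-available" followed by an anchor tag. The class name PdfLinkExtractor and the regular expression are illustrative only; a proper HTML parser would be more robust.

```csharp
using System;
using System.Net;
using System.Text.RegularExpressions;

public static class PdfLinkExtractor
{
    // Finds the first <a ... href="..."> that follows id="file-available".
    // Assumption: the markup looks like the fragment quoted in the question.
    private static readonly Regex LinkPattern = new Regex(
        "id=\"file-available\".*?<a\\b[^>]*\\bhref=\"([^\"]+)\"",
        RegexOptions.Singleline | RegexOptions.IgnoreCase);

    // Returns the PDF URL, or null when the fragment has no download link
    // (for example the empty "alternatives" div seen in View Source).
    public static string Extract(string html)
    {
        if (string.IsNullOrEmpty(html))
        {
            return null;
        }

        Match match = LinkPattern.Match(html);
        return match.Success ? WebUtility.HtmlDecode(match.Groups[1].Value) : null;
    }
}
```

The result could then be written out with File.WriteAllText("pip.txt", link) once it is known to be non-null.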

The call includes a session parameter in the URL that the server uses to validate the request, so unless you can work out how the session IDs are generated and spoof a legitimate call, you are going to be stuck.

In short, you are requesting the wrong URL. And because the server validates the broken-link call against a session ID, you will probably not be able to accomplish this unless you can produce session IDs that their system accepts.
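
For illustration only, here is roughly what requesting the right URL might look like if a valid session value were available, for instance one copied by hand from the request shown in the browser's network tab. This is a sketch under assumptions, not the site's documented API: the query parameter name "session", the X-Requested-With header, and the class name BrokenCheckClient are all guesses, and the server may well reject the request anyway.

```csharp
using System;
using System.IO;
using System.Net;
using System.Text;

public static class BrokenCheckClient
{
    // Builds the availability-check URL. The "id" parameter comes from the
    // answer above; the parameter name "session" is an assumption.
    public static string BuildUrl(string bookId, string session)
    {
        return "https://www.pdfdrive.net/ebook/broken?id=" + Uri.EscapeDataString(bookId)
             + "&session=" + Uri.EscapeDataString(session);
    }

    // Performs the GET request and returns the response body as text.
    public static string Fetch(string bookId, string session, string refererUrl)
    {
        var request = (HttpWebRequest)WebRequest.Create(BuildUrl(bookId, session));
        request.Method = "GET";
        request.Referer = refererUrl; // the book page that would normally issue the call
        request.Headers.Add("X-Requested-With", "XMLHttpRequest"); // assumed, mimics an AJAX call

        using (var response = (HttpWebResponse)request.GetResponse())
        using (var stream = response.GetResponseStream())
        using (var reader = new StreamReader(stream, Encoding.UTF8))
        {
            return reader.ReadToEnd();
        }
    }
}
```

A session value copied from the browser is likely to expire, so this does not remove the underlying obstacle described above.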
   
Comments
Arnav Das 1-Jan-18 1:01am
   
If I programmatically click the download button from within the request, will I be navigated to the PDF?
If not, is there any other way to download the PDF?
Arnav Das 1-Jan-18 11:57am
   
Thank you very much.
I have tried with the default browser, but not with Selenium.
I will definitely try it.
Thank you once again.

This content, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)