I am using the jsoup API to scrape the contents of a web page. I am attaching the Java source code I have tried and the HTML source.
Java
Document document = Jsoup
.connect(
"http://sports.williamhill.com/bet/en-gb/betting/e/4509085/St-Patricks-v-Limerick.html")
.timeout(1000 * 1000).get();
Elements content = document.select("div.livePushContent");
Elements info = content.select("div#allMarketsTab");
Elements primerycollect = info.select("div#primaryCollectionContainer");
 
System.out.println("1..............................................");
System.out.println("No of Tab's: "
+ info.select("div#primaryCollectionContainer")
.select("div.marketHolderExpanded").size());
Elements market = primerycollect.select("table.tableData");
Elements tbody = market.select("tbody");
for (int i = 0; i < info.select("div#primaryCollectionContainer")
.select("div.marketHolderExpanded").size(); i++) {
    int a = 1;
    a+=i;
    System.out.println("Tab No"+a+": "+info.select("div#primaryCollectionContainer")
    .select("div.marketHolderExpanded").get(i)
    .select("table.tableData").select("thead").select("tr")
    .select("th[class~=leftPad title]").select("span").last().text());
}
 
Elements primerycollect1 = info.select("div#sur_collection_267");
System.out.println(".............................................."); 
System.out.println("2..............................................");
System.out.println("No of Tab's: "
+ info.select("div#sur_collection_267")
.select("div.marketHolderExpanded").size());
Elements market1 = primerycollect1.select("table.tableData");
Elements tbody1 = market1.select("tbody");
for (int i = 0; i < info.select("div#sur_collection_267")
.select("div.marketHolderExpanded").size(); i++) {
    System.out.println("Tab No"+i+++": "+info.select("div#sur_collection_267")
    .select("div.marketHolderExpanded")
    .select("table.tableData")
    .select("thead").select("tr")
    .select("th[class~=leftPad title]").select("span").last().text());
}

System.out.println("3..............................................");
System.out.println("No of Tab's: "
+ info.select("div#sur_collection_25")
.select("div#collection25").size());
Elements primerycollect2 = info.select("div#sur_collection_25");
Elements market2 = primerycollect2.select("table.tableData");
Elements tbody2 = market2.select("tbody");
for (int i = 0; i < info.select("div#sur_collection_25")
.select("div#collection25").size(); i++) {
    System.out.println("Tab No"+i+": "+info.select("div#sur_collection_25")
    .select("div.collectionContainer displayBlock displayNone")
    .select("div.marketHolderCollapsed").get(i)
    .select("table.tableData").select("thead").select("tr")
    .select("th[class~=leftPad title]").select("span").last().text());
}
 
System.out.println("..............................................");


The output I am getting is:
1..............................................
No of Tab's: 10
Tab No1: Match Betting
Tab No2: Correct Score
Tab No3: Double Result
Tab No4: Draw No Bet
Tab No5: Match Handicaps
Tab No6: Double Chance
Tab No7: Both Teams To Score
Tab No8: 1st Half Betting
Tab No9: 2nd Half Betting
Tab No10: Total Match Goals Under/Over
..............................................
2..............................................
No of Tab's: 1
Tab No0: GOAL scored in the first 5 minutes? 00:00 - 04:59
3..............................................
No of Tab's: 1
Exception in thread "main" java.lang.IndexOutOfBoundsException: Index: 0, Size: 0
at java.util.ArrayList.rangeCheck(Unknown Source)
at java.util.ArrayList.get(Unknown Source)
at org.jsoup.select.Elements.get(Elements.java:523)
at com.yotechnologies.scraper.williamhill.test.Trial3.main(Trial3.java:67)
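
A side note on the "Tab No0" label in part 2 of the output: the expression `"Tab No"+i+++": "` is read by the compiler as `"Tab No" + (i++) + ": "`, so `i` is incremented once in the label and again by the loop header, and every second index is skipped. Loop 2 also has no `.get(i)`, so `.last()` picks the last span across all matched holders. The following is a minimal sketch in plain Java, with no jsoup involved; `PostfixSketch` and both method names are hypothetical and exist only for illustration.

```java
import java.util.ArrayList;
import java.util.List;

public class PostfixSketch {

    // Mirrors the shape of loop 2: the label expression itself increments i.
    static List<String> labelsWithPostfix(int size) {
        List<String> out = new ArrayList<>();
        for (int i = 0; i < size; i++) {
            // i++ yields the old value, then bumps i; the loop header bumps it again.
            out.add("Tab No" + i++ + ": ");
        }
        return out;
    }

    // Builds a 1-based label without modifying the loop counter.
    static List<String> labelsOneBased(int size) {
        List<String> out = new ArrayList<>();
        for (int i = 0; i < size; i++) {
            out.add("Tab No" + (i + 1) + ": ");
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(labelsWithPostfix(4)); // two labels only: indexes 0 and 2
        System.out.println(labelsOneBased(4));    // four labels: 1 to 4
    }
}
```

With only one matching element, as in the output above, the skipped index goes unnoticed; with more elements, half of them would be missing.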

Now I am attaching the HTML source code that I am not able to scrape:



Goals


Goals

Show All 30 Markets


My actual requirement is to scrape all the tab names on the page, such as Match Betting, Correct Score, etc. The attached HTML source has 52 tabs. The problem occurs in the third part, as you can see in the output: it should return St Patricks To Score Both Halves, Limerick To Score Both Halves, etc., but it returns nothing and throws an exception, so I am not able to scrape them. I want to know the reason for that. Please help me. The site is updated frequently; if the page no longer has any tabs, please let me know and I will provide a different URL.
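
One likely cause of the exception in part 3, sketched under assumptions: in a CSS selector a space is a descendant combinator, so `div.collectionContainer displayBlock displayNone` looks for a `<displayNone>` element inside a `<displayBlock>` element inside the div. That matches nothing, and `get(i)` on the empty result throws `IndexOutOfBoundsException`. To require several classes on one element, chain them with dots, for example `div.collectionContainer.displayNone`. The markup below is a hypothetical stand-in for the real page (class names are taken from the code above, the structure is assumed), and `SelectorSketch`, `brokenCount`, and `tabTitles` are illustrative names.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

import java.util.ArrayList;
import java.util.List;

public class SelectorSketch {

    // Hypothetical markup; the real page structure is assumed, not known.
    static final String HTML =
          "<div id='sur_collection_25'>"
        + "<div id='collection25' class='collectionContainer displayNone'>"
        + "<div class='marketHolderCollapsed'><table class='tableData'><thead><tr>"
        + "<th class='leftPad title'><span>St Patricks To Score Both Halves</span></th>"
        + "</tr></thead></table></div>"
        + "<div class='marketHolderCollapsed'><table class='tableData'><thead><tr>"
        + "<th class='leftPad title'><span>Limerick To Score Both Halves</span></th>"
        + "</tr></thead></table></div>"
        + "</div></div>";

    // The selector from the question: spaces make it search for nested
    // <displayBlock> and <displayNone> tags, which do not exist.
    static int brokenCount(String html) {
        Document doc = Jsoup.parse(html);
        return doc.select("div.collectionContainer displayBlock displayNone").size();
    }

    // Iterates over the matched holders directly, so no index can go out of range.
    static List<String> tabTitles(String html) {
        Document doc = Jsoup.parse(html);
        List<String> titles = new ArrayList<>();
        for (Element holder : doc.select("div#sur_collection_25 div.marketHolderCollapsed")) {
            // Dots (no spaces) require both classes on the same <th>.
            Element title = holder.select("th.leftPad.title span").last();
            if (title != null) {
                titles.add(title.text());
            }
        }
        return titles;
    }

    public static void main(String[] args) {
        System.out.println(brokenCount(HTML)); // 0
        System.out.println(tabTitles(HTML));
    }
}
```

If the collapsed markets are filled in by JavaScript after the page loads, they will not be present in the HTML that jsoup receives, because jsoup does not execute scripts; in that case no selector will find them and the loop above simply returns an empty list.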
Updated 3-Jun-13 21:51pm

This content, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)


