Python
import re

from scrapy.linkextractors import LinkExtractor
from scrapy.spiders import CrawlSpider, Rule

from ..items import YingpingItem  # assumed location: the project's items.py


class YinPin(CrawlSpider):
    name = "yingping"
    allowed_domains = ['movie.mtime.com']
    start_urls = ['http://movie.mtime.com/']

    # Follow links to movie detail pages such as http://movie.mtime.com/<id>/
    # (dots are escaped so they only match a literal ".").
    rules = (
        Rule(LinkExtractor(allow=(r'http://movie\.mtime\.com/\d+/$',)), callback='movie_info', follow=True),
    )

    def movie_info(self, response):
        movie_url = response.url
        # The movie number is the first run of digits in the URL.
        movie_num = int(re.search(r'\d+', movie_url).group())
        movie_name = response.xpath('//*[@id="db_head"]/div[2]/div/div[1]/h1/text()').extract_first()
        movie_release_time = response.xpath('//*[@id="db_head"]/div[2]/div/div[1]/p[1]/a/text()').extract_first()
        movie_type = response.xpath('//*[@id="db_head"]/div[2]/div/div[2]/a/text()').extract()
        # As in the original code, the last link text is discarded before joining.
        movie_type = ' '.join(movie_type[:-1])
        self.logger.info(movie_url)
        yield YingpingItem(
            movie_num=movie_num,
            movie_name=movie_name,
            movie_release_time=movie_release_time,
            movie_type=movie_type,
        )


What I have tried:

I modified the settings and tried crawling a single test page, but neither approach worked. I don't know how to fix it.
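One way to narrow this down without running the crawler is to check the URL pattern on its own. The sketch below is only a rough check, assuming the link extractor applies the `allow` pattern as a regular-expression search against each absolute URL. The helper names `matches_rule` and `movie_number` and the sample URLs are hypothetical, made up for illustration, and are not taken from the real site.

```python
import re

# Same pattern as the spider's Rule, with the dots escaped.
ALLOW = re.compile(r'http://movie\.mtime\.com/\d+/$')
NUMBER = re.compile(r'\d+')


def matches_rule(url):
    """Return True if the allow pattern is found in the URL."""
    return ALLOW.search(url) is not None


def movie_number(url):
    """Return the first run of digits in the URL as an int, or None."""
    match = NUMBER.search(url)
    return int(match.group()) if match else None


if __name__ == "__main__":
    # Hypothetical sample URLs, for illustration only.
    samples = [
        'http://movie.mtime.com/12345/',
        'https://movie.mtime.com/12345/',
        'http://movie.mtime.com/12345/comment.html',
        'http://movie.mtime.com/',
    ]
    for url in samples:
        print(url, matches_rule(url), movie_number(url))
```

If the links on the start page use `https://` or carry anything after the trailing slash, this pattern rejects them, which would leave the callback with nothing to process.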
Updated 4-Feb-18 17:36pm

This content, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)


