2019-01-01 19:21:26 [searchwww][scrapy.core.engine] INFO: Spider opened
2019-01-01 19:21:26 [searchwww][scrapy.extensions.logstats] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
2019-01-01 19:21:26 [searchwww][scrapy.extensions.telnet] DEBUG: Telnet console listening on 127.0.0.1:6031
2019-01-01 19:21:38 [searchwww][scrapy.core.engine] DEBUG: Crawled (200) <GET https://searchwww.sec.gov/EDGARFSClient/jsp/EDGAR_MainAccess.jsp?search_text=F-1+for&sort=Date&startDoc=101&numResults=100&isAdv=true&formType=FormF1&fromDate=mm/dd/yyyy&toDate=mm/dd/yyyy&stemming=true> (referer: None)
2019-01-01 19:21:38 [searchwww][scrapy.core.engine] DEBUG: Crawled (301) <GET http://www.sec.gov/Archives/edgar/data/1747624/000121390018017885/ff12018_fitboxxholdings.htm> (referer: None)
2019-01-01 19:21:38 [searchwww][scrapy.pipelines.files] WARNING: File (code: 301): Error downloading file from <GET http://www.sec.gov/Archives/edgar/data/1747624/000121390018017885/ff12018_fitboxxholdings.htm> referred in <None>
from scrapy.pipelines.files import FilesPipeline
from scrapy import Request


class download_pipeline(FilesPipeline):

    def file_path(self, request, response=None, info=None):
        # Save each file under the name passed along in request.meta
        return request.meta.get('filename', '')

    def get_media_requests(self, item, info):
        # Build the download request for the item's file URL
        file_url = item['file_url']
        meta = {'filename': item['name']}
        yield Request(url=file_url, meta=meta)
This download pipeline keeps failing with a 301 error. Any advice would be appreciated.
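For context, a subclassed FilesPipeline like this is normally enabled in settings.py roughly as below; the module path and the FILES_STORE directory are placeholders, not details from the original post.

# settings.py -- minimal sketch; 'myproject.pipelines' and the store path are placeholders
ITEM_PIPELINES = {
    'myproject.pipelines.download_pipeline': 1,
}
FILES_STORE = './downloads'  # FilesPipeline stays disabled until FILES_STORE is set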
1
wellCh4n 2019-01-02 10:29:54 +08:00
Is the request being redirected?
4
Ewig OP
That part is handled by the framework; the request is yielded back from the pipeline. When I send a normal request myself it works fine, so I can't figure it out.
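That difference would be consistent with the 301 being an http-to-https redirect on sec.gov: an ordinary spider Request goes through the default RedirectMiddleware and follows the redirect, while FilesPipeline media requests treat a non-200 status as a failed download unless redirects are explicitly allowed. A minimal sketch of two things worth trying (not verified against this exact site):

# Option 1: let media downloads follow redirects (Scrapy >= 1.4).
# In settings.py:
MEDIA_ALLOW_REDIRECTS = True

# Option 2: request the https URL up front so no redirect happens at all
# (assumes the 301 is only http -> https).
def get_media_requests(self, item, info):
    file_url = item['file_url'].replace('http://', 'https://', 1)
    meta = {'filename': item['name']}
    yield Request(url=file_url, meta=meta)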