-
Notifications
You must be signed in to change notification settings - Fork 39
Open
Description
以下是settings.py的所有配置,不确定是我配置的有问题,还是其他地方出了问题,只要运行起来,刚开始可以爬到一些代理IP并存入数据库里,然后就开始自动进入我自己的爬虫程序。
BOT_NAME = 'CommoditySpider'
SPIDER_MODULES = ['CommoditySpider.spiders']
NEWSPIDER_MODULE = 'CommoditySpider.spiders'
USER_AGENT = 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/69.0.3497.81 Safari/537.36'
ROBOTSTXT_OBEY = True
DOWNLOAD_DELAY = 2
COOKIES_ENABLED = False
DEFAULT_REQUEST_HEADERS = {
'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8',
'Accept-Language': 'zh-CN,zh;q=0.8',
'Accept-Encoding': 'gzip, deflate, br',
'Content-Type': 'text/html;charset=UTF-8',
'Cache-Control': 'no-cache',
}
ITEM_PIPELINES = {
'CommoditySpider.aliexpresslines.pipelines.AliExpressPipeline': 300
}
AUTOTHROTTLE_ENABLED = True
AUTOTHROTTLE_START_DELAY = 3
AUTOTHROTTLE_MAX_DELAY = 60
AUTOTHROTTLE_TARGET_CONCURRENCY = 1.0
HTTPCACHE_ENABLED = True
HTTPCACHE_EXPIRATION_SECS = 0
HTTPCACHE_DIR = 'httpcache'
HTTPCACHE_IGNORE_HTTP_CODES = []
HTTPCACHE_STORAGE = 'scrapy.extensions.httpcache.FilesystemCacheStorage'
DOWNLOADER_MIDDLEWARES = {
# 第二行的填写规则
# yourproject.myMiddlewares(文件名).middleware类
# 设置 User-Agent
'scrapy.contrib.downloadermiddleware.useragent.UserAgentMiddleware': 400,
}
# 默认使用 IP 代理池
if IF_USE_PROXY:
DOWNLOADER_MIDDLEWARES = {
# 第二行的填写规则
# yourproject.myMiddlewares(文件名).middleware类
# 设置 User-Agent
'scrapy.contrib.downloadermiddleware.useragent.UserAgentMiddleware': None,
'proxyPool.scrapy.RandomUserAgentMiddleware.RandomUserAgentMiddleware': 400,
# 设置代理
'scrapy.contrib.downloadermiddleware.httpproxy.HttpProxyMiddleware': None,
'proxyPool.scrapy.middlewares.ProxyMiddleware': 100,
# 设置自定义捕获异常中间层
'proxyPool.scrapy.middlewares.CatchExceptionMiddleware': 105,
# 设置自定义重连中间件
'scrapy.contrib.downloadermiddleware.retry.RetryMiddleware': None,
'proxyPool.scrapy.middlewares.RetryMiddleware': 95,
}```
Metadata
Metadata
Assignees
Labels
No labels

