I've been lazy and busy lately, not in the mood to do anything, so it's been a long time since the last update. Honestly I'd rather write a bit of a diary, but I never know where to start; somehow I still found the time for this messy little thing, though.
Writing this kind of zero-nutrition scraping log is so much easier than writing a paper. (And a lot more fun.)
The lookup site is 主机掌中宝 (eshop-switch.com): it tracks the lowest regional prices for digital Switch games, but it can't cover everyone's needs. I, for example, want to see games with a high original price and a deep discount, so I dusted off what little scraping skill I have.
I. Prepare the database
First create a MySQL database; let's just call it ns. Then create a data table named ns_discount, which can be built directly with SQL:
CREATE TABLE IF NOT EXISTS `ns_discount` (
    `GAME_ID` INT NOT NULL AUTO_INCREMENT,
    `GAME` VARCHAR(100),
    `GAME_EN` VARCHAR(100),
    `HIT_COUNT` INT,
    `ID` INT,
    `IMAGE` VARCHAR(200),
    `IMAGE_M` VARCHAR(200),
    `PRICE` FLOAT,
    `REGION_NAME` VARCHAR(8),
    `SALE` INT,
    `TAGS` VARCHAR(20),
    PRIMARY KEY (`GAME_ID`)
);
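If the ns database itself doesn't exist yet, one extra statement creates it first (a sketch; adjust the charset and privileges to your setup — utf8mb4 keeps Chinese game titles intact):

CREATE DATABASE IF NOT EXISTS `ns` DEFAULT CHARACTER SET utf8mb4;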
II. Prepare the crawler
Create a new project with Scrapy, move into it, and generate a spider inside:
scrapy startproject ns
cd ns
scrapy genspider nsp eshop-switch.com
Only four places need changes; for everything else the framework defaults are fine.
1. Edit nsp.py
import scrapy
from scrapy.http import FormRequest

from ns.items import NsItem


class NspSpider(scrapy.Spider):
    name = 'nsp'
    allowed_domains = ['eshop-switch.com']

    def start_requests(self):
        # The site paginates its discount list; POST one form request per page.
        url = 'http://www.eshop-switch.com/game/queryGame'
        for page in range(1, 233):
            data = {
                'current_page': str(page),
                'order_by': '0',
                'search': '',
                'tag': '',
                'page_size': '24',
            }
            yield FormRequest(url, formdata=data, callback=self.parse_page)

    def parse_page(self, response):
        # Each response is JSON; the games for the page sit under the 'list' key.
        sale_list = response.json()['list']
        for i in sale_list:
            item = NsItem()
            for key in ['SALE', 'HIT_COUNT', 'GAME', 'IMAGE', 'REGION_NAME',
                        'PRICE', 'GAME_EN', 'IMAGE_M', 'ID', 'TAGS']:
                # Some entries omit fields; skip missing keys instead of crashing.
                if key in i:
                    item[key] = i[key]
            yield item
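Before firing off all 233 pages, it can be worth probing the endpoint once by hand. Below is a minimal standalone sketch using requests, assuming (as the spider above does) that the endpoint returns JSON with the games under a 'list' key. Note that response.json() in the spider itself requires Scrapy 2.2 or newer; on older versions, json.loads(response.text) does the same job.

import requests

# POST the same form fields the spider sends, for a single page.
resp = requests.post(
    'http://www.eshop-switch.com/game/queryGame',
    data={'current_page': '1', 'order_by': '0',
          'search': '', 'tag': '', 'page_size': '24'},
    timeout=10,
)
resp.raise_for_status()
games = resp.json().get('list', [])
print(len(games), 'games on page 1')
if games:
    print('first:', games[0].get('GAME'))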
2. Edit items.py
import scrapy


class NsItem(scrapy.Item):
    SALE = scrapy.Field()
    HIT_COUNT = scrapy.Field()
    GAME = scrapy.Field()
    IMAGE = scrapy.Field()
    REGION_NAME = scrapy.Field()
    PRICE = scrapy.Field()
    GAME_EN = scrapy.Field()
    IMAGE_M = scrapy.Field()
    ID = scrapy.Field()
    TAGS = scrapy.Field()
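The field names deliberately mirror the ns_discount column names; that one-to-one mapping is what lets the pipeline below build its INSERT statement straight from item.keys().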
3. Edit pipelines.py
import pymysql


class NsPipeline:
    def __init__(self):
        # Connect to the local MySQL database created in step I.
        self.connection = pymysql.connect(
            host='127.0.0.1',
            user='root',
            password='hahahahahaha',
            db='ns',
            charset='utf8mb4',
        )
        self.cursor = self.connection.cursor()

    def process_item(self, item, spider):
        # Build a parameterized INSERT so values are escaped properly
        # (naive string formatting breaks on quotes in game titles).
        columns = ', '.join(item.keys())
        placeholders = ', '.join(['%s'] * len(item))
        insert_sql = 'INSERT INTO ns_discount ({}) VALUES ({})'.format(columns, placeholders)
        self.cursor.execute(insert_sql, list(item.values()))
        self.connection.commit()
        return item

    def close_spider(self, spider):
        self.connection.close()
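One commit per item is the simplest thing that works; at 233 pages of 24 entries (roughly 5,600 rows) it's slow but tolerable, and batching rows through cursor.executemany would be the obvious upgrade if it isn't.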
4. Enable the pipeline in settings.py
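Concretely, that means uncommenting (or adding) the ITEM_PIPELINES entry in settings.py so Scrapy routes items through NsPipeline; 300 is just the template's default priority value:

ITEM_PIPELINES = {
    'ns.pipelines.NsPipeline': 300,
}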
Then run the spider and the data gets scraped straight into the database.
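From the project root, the standard Scrapy command kicks it off:

scrapy crawl nsp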
III. Query the results
Finally, run a query to see whether there's anything with an original price above 200 and a current price below 50:
SELECT
    n.`ID`,
    n.`GAME`,
    n.`PRICE`,
    n.`SALE`,
    ROUND(n.`PRICE` / (1 - n.`SALE` / 100), 2) AS origin,
    n.`REGION_NAME`,
    n.`TAGS`
FROM `ns_discount` AS n
WHERE n.`PRICE` < 50
  AND ROUND(n.`PRICE` / (1 - n.`SALE` / 100), 2) > 200
ORDER BY origin DESC, n.`SALE` DESC;
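As a sanity check on the origin column: a game listed at PRICE 45 with SALE 80 (i.e. 80% off) works out to an original price of 45 / (1 - 80/100) = 225, so it clears both thresholds.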



