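A while ago, while writing a crawler, I tried to replicate a POST request made by the browser, but the server kept returning an internal server error, even though the same request worked fine in the browser.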

# [Input]
import requests
url = 'https://cmis.cicpa.org.cn/publicQuery/getCpaListByPage'
data = {"OFF_NAME":"",
        "ASC_GUID":"0000010f-8496-8440-e06b-4f9f27a6e22a",
        "PER_CODE":"100000510872",
        "PER_NAME":"",
        "pageNow":1,
        "pageSize":10}
r = requests.post(url, data=data)
r.json()


# [Output]
{'result': 0, 'msg': '服务器内部错误!', 'info': None}

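It turned out that the browser sends these parameters as a Request Payload, meaning they belong in the request body as JSON rather than as form fields, as shown in the figure.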

Requests

requests offers a handy shortcut: just change data= to json=:

# [Input]
import requests
url = 'https://cmis.cicpa.org.cn/publicQuery/getCpaListByPage'
data = {"OFF_NAME":"",
        "ASC_GUID":"0000010f-8496-8440-e06b-4f9f27a6e22a",
        "PER_CODE":"100000510872",
        "PER_NAME":"",
        "pageNow":1,
        "pageSize":10}
r = requests.post(url, json=data)
r.json()


# [Output]
{'result': 1,
 'msg': 'success',
 'info': {'rows': [{'ID': '0000010f-849c-6e98-8abc-aecb9e70286e',
    'PER_NAME': '马永香',
    'OFF_GUID': '0000010f-8496-8857-403f-a440bedd405c',
    'OFF_NAME': '北京中之光会计师事务所有限责任公司'}],
  'totalCount': 1,
  'pageCount': 1,
  'pageNow': 1,
  'pageSize': 10}}
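
To see why this one-word change matters, the two requests can be prepared without being sent and then inspected. This is only a minimal sketch: `requests.Request(...).prepare()` is part of the public requests API, while the names `form_req` and `json_req` are purely illustrative. With `data=`, the dict is form-encoded; with `json=`, it is serialized to JSON and the Content-Type header is set to match.

```python
import json

import requests

url = 'https://cmis.cicpa.org.cn/publicQuery/getCpaListByPage'
data = {"OFF_NAME": "",
        "ASC_GUID": "0000010f-8496-8440-e06b-4f9f27a6e22a",
        "PER_CODE": "100000510872",
        "PER_NAME": "",
        "pageNow": 1,
        "pageSize": 10}

# Build both requests locally; prepare() does not touch the network.
form_req = requests.Request('POST', url, data=data).prepare()
json_req = requests.Request('POST', url, json=data).prepare()

# data= produces a form-encoded body (key=value pairs joined by '&').
print(form_req.headers['Content-Type'])  # application/x-www-form-urlencoded
print(form_req.body)

# json= produces a JSON body and sets the JSON content type.
print(json_req.headers['Content-Type'])  # application/json
print(json_req.body)

# The JSON body round-trips back to the original dict.
print(json.loads(json_req.body) == data)
```

Since nothing is actually sent, this is a quick way to compare what the script would send against what the browser's network panel shows.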

Scrapy

With Scrapy, you have to serialize data to a JSON string and put it in the body:

# [Input] (run inside `scrapy shell`; url and data are defined as in the Requests example)
import json
fetch(scrapy.Request(url, method='POST', body=json.dumps(data), headers={'Content-Type':'application/json'}))
response.json()


# [Output]
{'result': 1,
 'msg': 'success',
 'info': {'rows': [{'ID': '0000010f-849c-6e98-8abc-aecb9e70286e',
    'PER_NAME': '马永香',
    'OFF_GUID': '0000010f-8496-8857-403f-a440bedd405c',
    'OFF_NAME': '北京中之光会计师事务所有限责任公司'}],
  'totalCount': 1,
  'pageCount': 1,
  'pageNow': 1,
  'pageSize': 10}}

In my case, if I leave out

headers = {'Content-Type':'application/json'}

the server still returns the "internal server error". This is what cost me half a day.