目录
1. 打开百度翻译网页,查找翻译结果的网络资源包
2. 获取翻译结果网络资源包的url、请求头、请求体,解析json文件数据
3. 观察请求体字段,发现 query 字段便是我们输入的需要翻译的值
4. ctrl F 快捷键搜索sign值的网络资源包,并逐一检查
5. 拿到与请求体的结构高度相似的资源包,通过资源包查看 js 代码
6. 打上断点,网页重新进行翻译操作
7. 进入生成sing值的 b 函数
8. 新建 GetSign.js 文件,将生成 sign 的相关函数复制进去
9. 下载能够调用 js 代码的包——PyExecJS,并测试能否得到 sign 值
10. python调用获取 sign值 的 js代码,并设计循环输入翻译
示例:通过爬虫,获取百度翻译结果的数据包,实现翻译程序
注:本代码仅供学习使用,无任何商业用途
1. 打开百度翻译网页,查找翻译结果的网络资源包
示例输入:你好世界
2. 获取翻译结果网络资源包的url、请求头、请求体,解析json文件数据
import requests
import json
url = 'https://fanyi.baidu.com/v2transapi?from=zh&to=en'
head = {
'Accept': '*/*',
'Accept-Encoding': 'gzip, deflate, br',
'Accept-Language': 'zh-CN,zh;q=0.9',
'Acs-Token': '1681821543479_1681821547074_cdT+RJYeqGduG6BGb1EQqooYQeVUGl4fWWVaRRFLLS9VMj1oUeo7+LGHskLILBwXG5ixw+3TmSgPA3eXigF/SZJvUY7ZBCSG2bHqPTeCE5vksblcieD5l2xmK/SK5Mg2mx63+EqIsrlVUBsgRRzhmcvKeRiHzQ6mw8CN03KrdjAGQASyEwH418TOl+mU7a9cI+wDmoPRhTJ+nATiAn2OoJlCHDpvmWN+7D92f7EQmVuBI0N8j+fs+vJ8rvT6S+o9ToY0IJQzl0U/leQ/qAv0jFStvp+I6dZixZdh6aCzin9sCVKdEGgt2/LBmFE1USpx3IJBnOzRq+LE7DWZu6mm5gAtWbyI2bLvWKeFA2G9+2Oz0iYR5fqrT8+jPjyB8FbJsviypmeSFBP8AhymqOQUJS9eXYCpMBVyFiv9H2zaONxZ7pYhG4yzhWEja/wLZSgKWbb32C2wTbHPwUd7AUfsAbOilLvs7hrwW6i9V7Pp6rc=',
'Connection': 'keep-alive',
'Content-Length': '165',
'Content-Type': 'application/x-www-form-urlencoded; charset=UTF-8',
'Cookie': 'BAIDUID=0647EC4156FC828CCBB17B759524E3CB:FG=1; APPGUIDE_10_0_2=1; REALTIME_TRANS_SWITCH=1; FANYI_WORD_SWITCH=1; HISTORY_SWITCH=1; SOUND_SPD_SWITCH=1; SOUND_PREFER_SWITCH=1; BIDUPSID=0647EC4156FC828CCBB17B759524E3CB; PSTM=1677378660; BAIDUID_BFESS=0647EC4156FC828CCBB17B759524E3CB:FG=1; BA_HECTOR=aha124a58g81a481a184249r1i3sgie1n; ZFY=jo6gNdVtZQsuvZ3s6FJsXdFdp7ChS0d1062AcKWdkuE:C; H_PS_PSSID=38516_36547_38469_38368_38468_38486_37931_26350_22157; delPer=0; PSINO=5; BDORZ=B490B5EBF6F3CD402E515D22BCDA1598; BCLID=7777928453825166006; BCLID_BFESS=7777928453825166006; BDSFRCVID=4A0OJexroG07VWbfhAjSrpnZrLweG7bTDYrEOwXPsp3LGJLVFe3JEG0Pts1-dEu-S2OOogKKKgOTHICF_2uxOjjg8UtVJeC6EG0Ptf8g0M5; BDSFRCVID_BFESS=4A0OJexroG07VWbfhAjSrpnZrLweG7bTDYrEOwXPsp3LGJLVFe3JEG0Pts1-dEu-S2OOogKKKgOTHICF_2uxOjjg8UtVJeC6EG0Ptf8g0M5; H_BDCLCKID_SF=tRAOoC_-tDvDqTrP-trf5DCShUFstU_jB2Q-XPoO3KtbSx3Pb47NMltXKMTzbf7f5mkf3fbgy4op8P3y0bb2DUA1y4vp0tLeWeTxoUJ2-KDVeh5Gqq-KXU4ebPRiB-b9QgbA5hQ7tt5W8ncFbT7l5hKpbt-q0x-jLTnhVn0MBCK0HPonHjD5j6vL3D; H_BDCLCKID_SF_BFESS=tRAOoC_-tDvDqTrP-trf5DCShUFstU_jB2Q-XPoO3KtbSx3Pb47NMltXKMTzbf7f5mkf3fbgy4op8P3y0bb2DUA1y4vp0tLeWeTxoUJ2-KDVeh5Gqq-KXU4ebPRiB-b9QgbA5hQ7tt5W8ncFbT7l5hKpbt-q0x-jLTnhVn0MBCK0HPonHjD5j6vL3D; Hm_lvt_64ecd82404c51e03dc91cb9e8c025574=1679997333,1681233344,1681821530; Hm_lpvt_64ecd82404c51e03dc91cb9e8c025574=1681821542; ab_sr=1.0.1_MWRjYzE0MDliYTg3MWQ4ZWYxNGQwNDgwNDkwNzk5YTkxNmZhYzE4MWUyM2I2NWVlY2Y2NmI4MjM4NTk4Mjk2ZGE1NmFhM2IxYzUxZDczZTAyZWQ1NTc3YzQ2M2Q5YzZiZmY5YmIyMjI5YjU5MTViNjE4MmVjZjBhYjYyZmIxNDc0NTM5YzI0OWNiMjVjZDQzOWU1ZDNmOTU5MTk0ZGZkYw==',
'Host': 'fanyi.baidu.com',
'Origin': 'https://fanyi.baidu.com',
'Referer': 'https://fanyi.baidu.com/translate?aldtype=16047&query=&keyfrom=baidu&smartresult=dict&lang=auto2zh',
'sec-ch-ua': '"Chromium";v="112", "Google Chrome";v="112", "Not:A-Brand";v="99"',
'sec-ch-ua-mobile': '?0',
'sec-ch-ua-platform': '"Windows"',
'Sec-Fetch-Dest': 'empty',
'Sec-Fetch-Mode': 'cors',
'Sec-Fetch-Site': 'same-origin',
'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/112.0.0.0 Safari/537.36',
'X-Requested-With': 'XMLHttpRequest'
}
data = {
'from': 'zh',
'to': 'en',
'query': '你好世界',
'transtype': 'realtime',
'simple_means_flag': 3,
'sign': '1265.321472',
'token': '29bebf2fa756361e24b224d2c2b0151f',
'domain': 'common'
}
response = requests.post(url, headers=head, data=data)
print(response.text)
print(response.status_code)
res = json.loads(response.text)
val = res['trans_result']['data'][0]['dst']
print(val)
3. 观察请求体字段,发现 query 字段便是我们输入的需要翻译的值
修改 query 字段里面的值,进行翻译验证
出现报错信息,我们重新在网页进行翻译验证,观察并比较请求体信息
观察可得,随着 query 的改变,sign 也一起改变了,sign 的值看起来并无规律,所以我们接下来逆向解析js代码,获取sign
4. ctrl F 快捷键搜索sign值的网络资源包,并逐一检查
5. 拿到与请求体的结构高度相似的资源包,通过资源包查看 js 代码
在资源包响应体response内部点击右键,选择进入源代码,搜索sign并找到第17个sign处,发现sign的值和函数 b 以及参数 i 有关,并且参数 i 的值就是 query 的值
6. 打上断点,网页重新进行翻译操作
停到断点处之后,点击右上角箭头,调试下一步,进入函数
7. 进入生成sing值的 b 函数
b 函数的代码实现原理是什么我们不关心,我们需要一步步调试,找到和 b 函数相关的函数有哪些,然后一起复制进一个新建的 js 文件
8. 新建 GetSign.js 文件,将生成 sign 的相关函数复制进去
通过调试可得,相关的函数还有一个 n 函数,将 n 函数与 b 函数一起复制进入GetSign.js里面
观察其他 n 函数代码格式,修改函数的命名格式
9. 下载能够调用 js 代码的包——PyExecJS,并测试能否得到 sign 值
# pip install PyExecJS
import execjs
with open("GetSign.js", 'r', encoding='utf-8') as f:
ctx = execjs.compile(f.read())
sign = ctx.call('GetSign', '我的世界') # 函数名 + 参数
print(sign)
运行报错 window 未定义
在前端 js 代码查找window,发现 b 函数内的 window 与 r 值相关,而且是对 r 赋值
将 b 函数调试结束,我们可以得到 r 的值, 我们直接对 r 赋值
成功得到 sign 值
10. python调用获取 sign值 的 js代码,并设计循环输入翻译
import requests
import json
import execjs
url = 'https://fanyi.baidu.com/v2transapi?from=zh&to=en'
head = {
'Accept': '*/*',
'Accept-Encoding': 'gzip, deflate, br',
'Accept-Language': 'zh-CN,zh;q=0.9',
'Acs-Token': '1681821543479_1681821547074_cdT+RJYeqGduG6BGb1EQqooYQeVUGl4fWWVaRRFLLS9VMj1oUeo7+LGHskLILBwXG5ixw+3TmSgPA3eXigF/SZJvUY7ZBCSG2bHqPTeCE5vksblcieD5l2xmK/SK5Mg2mx63+EqIsrlVUBsgRRzhmcvKeRiHzQ6mw8CN03KrdjAGQASyEwH418TOl+mU7a9cI+wDmoPRhTJ+nATiAn2OoJlCHDpvmWN+7D92f7EQmVuBI0N8j+fs+vJ8rvT6S+o9ToY0IJQzl0U/leQ/qAv0jFStvp+I6dZixZdh6aCzin9sCVKdEGgt2/LBmFE1USpx3IJBnOzRq+LE7DWZu6mm5gAtWbyI2bLvWKeFA2G9+2Oz0iYR5fqrT8+jPjyB8FbJsviypmeSFBP8AhymqOQUJS9eXYCpMBVyFiv9H2zaONxZ7pYhG4yzhWEja/wLZSgKWbb32C2wTbHPwUd7AUfsAbOilLvs7hrwW6i9V7Pp6rc=',
'Connection': 'keep-alive',
'Content-Length': '165',
'Content-Type': 'application/x-www-form-urlencoded; charset=UTF-8',
'Cookie': 'BAIDUID=0647EC4156FC828CCBB17B759524E3CB:FG=1; APPGUIDE_10_0_2=1; REALTIME_TRANS_SWITCH=1; FANYI_WORD_SWITCH=1; HISTORY_SWITCH=1; SOUND_SPD_SWITCH=1; SOUND_PREFER_SWITCH=1; BIDUPSID=0647EC4156FC828CCBB17B759524E3CB; PSTM=1677378660; BAIDUID_BFESS=0647EC4156FC828CCBB17B759524E3CB:FG=1; BA_HECTOR=aha124a58g81a481a184249r1i3sgie1n; ZFY=jo6gNdVtZQsuvZ3s6FJsXdFdp7ChS0d1062AcKWdkuE:C; H_PS_PSSID=38516_36547_38469_38368_38468_38486_37931_26350_22157; delPer=0; PSINO=5; BDORZ=B490B5EBF6F3CD402E515D22BCDA1598; BCLID=7777928453825166006; BCLID_BFESS=7777928453825166006; BDSFRCVID=4A0OJexroG07VWbfhAjSrpnZrLweG7bTDYrEOwXPsp3LGJLVFe3JEG0Pts1-dEu-S2OOogKKKgOTHICF_2uxOjjg8UtVJeC6EG0Ptf8g0M5; BDSFRCVID_BFESS=4A0OJexroG07VWbfhAjSrpnZrLweG7bTDYrEOwXPsp3LGJLVFe3JEG0Pts1-dEu-S2OOogKKKgOTHICF_2uxOjjg8UtVJeC6EG0Ptf8g0M5; H_BDCLCKID_SF=tRAOoC_-tDvDqTrP-trf5DCShUFstU_jB2Q-XPoO3KtbSx3Pb47NMltXKMTzbf7f5mkf3fbgy4op8P3y0bb2DUA1y4vp0tLeWeTxoUJ2-KDVeh5Gqq-KXU4ebPRiB-b9QgbA5hQ7tt5W8ncFbT7l5hKpbt-q0x-jLTnhVn0MBCK0HPonHjD5j6vL3D; H_BDCLCKID_SF_BFESS=tRAOoC_-tDvDqTrP-trf5DCShUFstU_jB2Q-XPoO3KtbSx3Pb47NMltXKMTzbf7f5mkf3fbgy4op8P3y0bb2DUA1y4vp0tLeWeTxoUJ2-KDVeh5Gqq-KXU4ebPRiB-b9QgbA5hQ7tt5W8ncFbT7l5hKpbt-q0x-jLTnhVn0MBCK0HPonHjD5j6vL3D; Hm_lvt_64ecd82404c51e03dc91cb9e8c025574=1679997333,1681233344,1681821530; Hm_lpvt_64ecd82404c51e03dc91cb9e8c025574=1681821542; ab_sr=1.0.1_MWRjYzE0MDliYTg3MWQ4ZWYxNGQwNDgwNDkwNzk5YTkxNmZhYzE4MWUyM2I2NWVlY2Y2NmI4MjM4NTk4Mjk2ZGE1NmFhM2IxYzUxZDczZTAyZWQ1NTc3YzQ2M2Q5YzZiZmY5YmIyMjI5YjU5MTViNjE4MmVjZjBhYjYyZmIxNDc0NTM5YzI0OWNiMjVjZDQzOWU1ZDNmOTU5MTk0ZGZkYw==',
'Host': 'fanyi.baidu.com',
'Origin': 'https://fanyi.baidu.com',
'Referer': 'https://fanyi.baidu.com/translate?aldtype=16047&query=&keyfrom=baidu&smartresult=dict&lang=auto2zh',
'sec-ch-ua': '"Chromium";v="112", "Google Chrome";v="112", "Not:A-Brand";v="99"',
'sec-ch-ua-mobile': '?0',
'sec-ch-ua-platform': '"Windows"',
'Sec-Fetch-Dest': 'empty',
'Sec-Fetch-Mode': 'cors',
'Sec-Fetch-Site': 'same-origin',
'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/112.0.0.0 Safari/537.36',
'X-Requested-With': 'XMLHttpRequest'
}
while True:
with open("GetSign.js", "r", encoding="utf-8") as f:
Source = input('请输入被翻译文本:')
ctx = execjs.compile(f.read())
sign = ctx.call("GetSign", Source)
data = {
'from': 'zh',
'to': 'en',
'query': Source,
'transtype': 'realtime',
'simple_means_flag': 3,
'sign': sign,
'token': '29bebf2fa756361e24b224d2c2b0151f',
'domain': 'common'
}
response = requests.post(url, headers=head, data=data)
# print(response.text)
# print(response.status_code)
res = json.loads(response.text)
val = res['trans_result']['data'][0]['dst']
print(val)