For reference, see the open-source project: https://github.com/Sjj1024/douyin-live
Desktop live-room project: https://github.com/Sjj1024/LiveBox
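The ttwid parameter
ttwid works like a client ID. Even in guest mode it lets the page attach tracking events to a visitor, and the behavior data collected under a ttwid is used for content and ad recommendations. It is also a basic service shared across Douyin's parent company, so a generated ID can be used with any of that company's services.
How to obtain it:
Request the live room's HTML page and read ttwid from the response cookies: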
import requests

# USER_AGENT is assumed to be defined elsewhere in the project (a browser User-Agent string).


def parseLiveRoomUrl(url):
    """
    Parse the danmaku (live comment) WebSocket address of a live room.
    :param url: live room URL
    :return:
    """
    headers = {
        'authority': 'live.douyin.com',
        'accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.7',
        'accept-language': 'zh-CN,zh;q=0.9,en;q=0.8,en-GB;q=0.7,en-US;q=0.6',
        'cache-control': 'max-age=0',
        'cookie': 'xgplayer_user_id=251959789708; passport_assist_user=Cj1YUtyK7x-Br11SPK-ckKl61u5KX_SherEuuGPYIkLjtmV3X8m3EU1BAGVoO541Sp_jwUa8lBlNmbaOQqheGkoKPOVVH42rXu6KEb9WR85pUw4_qNHfbcotEO-cml5itrJowMBlYXDaB-GDqJwNMxMElMoZUycGhzdNVAT4XxCJ_74NGImv1lQgASIBA3Iymus%3D; n_mh=nNwOatDm453msvu0tqEj4bZm3NsIprwo6zSkIjLfICk; LOGIN_STATUS=1; store-region=cn-sh; store-region-src=uid; sid_guard=b177a545374483168432b16b963f04d5%7C1697713285%7C5183999%7CMon%2C+18-Dec-2023+11%3A01%3A24+GMT; ttwid=1%7C9SEGPfK9oK2Ku60vf6jyt7h6JWbBu4N_-kwQdU-SPd8%7C1697721607%7Cc406088cffa073546db29932058720720521571b92ba67ba902a70e5aaffd5d6; odin_tt=1f738575cbcd5084c21c7172736e90f845037328a006beefec4260bf8257290e2d31b437856575c6caeccf88af429213; __live_version__=%221.1.1.6725%22; device_web_cpu_core=16; device_web_memory_size=8; live_use_vvc=%22false%22; csrf_session_id=38b68b1e672a92baa9dcb4d6fd1c5325; FORCE_LOGIN=%7B%22videoConsumedRemainSeconds%22%3A180%7D; __ac_nonce=0658d6780004b23f5d0a8; __ac_signature=_02B4Z6wo00f01Klw1CQAAIDAXxndAbr7OHypUNCAAE.WSwYKFjGSE9AfNTumbVmy1cCS8zqYTadqTl8vHoAv7RMb8THl082YemGIElJtZYhmiH-NnOx53mVMRC7MM8xuavIXc-9rE7ZEgXaA13; webcast_leading_last_show_time=1703765888956; webcast_leading_total_show_times=1; webcast_local_quality=sd; xg_device_score=7.90435294117647; live_can_add_dy_2_desktop=%221%22; msToken=sTwrsWOpxsxXsirEl0V0d0hkbGLze4faRtqNZrIZIuY8GYgo2J9a0RcrN7r_l179C9AQHmmloI94oDvV8_owiAg6zHueq7lX6TgbKBN6OZnyfvZ6OJyo2SQYawIB_g==; tt_scid=NyxJTt.vWxv79efmWAzT2ZAiLSuybiEOWF0wiVYs5KngMuBf8oz5sqzpg5XoSPmie930; pwa2=%220%7C0%7C1%7C0%22; download_guide=%223%2F20231228%2F0%22; msToken=of81bsT85wrbQ9nVOK3WZqQwwku95KW-wLfjFZOef2Orr8PRQVte27t6Mkc_9c_ROePolK97lKVG3IL5xrW6GY6mdUDB0EcBPfnm8-OAShXzlELOxBBCdiQYIjCGpQ==; IsDouyinActive=false; odin_tt=7409a7607c84ba28f27c62495a206c66926666f2bbf038c847b27817acbdbff28c3cf5930de4681d3cfd4c1139dd557e; ttwid=1%7C9SEGPfK9oK2Ku60vf6jyt7h6JWbBu4N_-kwQdU-SPd8%7C1697721607%7Cc406088cffa073546db29932058720720521571b92ba67ba902a70e5aaffd5d6',
        'referer': 'https://live.douyin.com/721566130345?cover_type=&enter_from_merge=web_live&enter_method=web_card&game_name=&is_recommend=&live_type=game&more_detail=&room_id=7317569386624125734&stream_type=vertical&title_type=&web_live_tab=all',
        'upgrade-insecure-requests': '1',
        'user-agent': USER_AGENT
    }
    res = requests.get(url=url, headers=headers)
    global ttwid, roomStore, liveRoomId, liveRoomTitle, live_stream_url
    # The server returns ttwid as a response cookie.
    data = res.cookies.get_dict()
    ttwid = data['ttwid']
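The ttwid value in the cookie header above is URL-encoded; once decoded, it consists of several fields separated by '|'. The sketch below only splits the value apart. The helper name split_ttwid is made up for this illustration, and reading the third field as a Unix timestamp is a guess based on how it looks, not something confirmed here.

```python
from urllib.parse import unquote


def split_ttwid(raw_ttwid):
    """Split a URL-encoded ttwid cookie value into its '|'-separated fields."""
    return unquote(raw_ttwid).split("|")


# ttwid value copied from the cookie header in the example above.
sample = ("1%7C9SEGPfK9oK2Ku60vf6jyt7h6JWbBu4N_-kwQdU-SPd8%7C1697721607%7C"
          "c406088cffa073546db29932058720720521571b92ba67ba902a70e5aaffd5d6")

fields = split_ttwid(sample)
print(len(fields))  # 4
print(fields[2])    # 1697721607 (looks like a Unix timestamp; an assumption)
```

The last field is 64 hexadecimal characters long, which would be consistent with some kind of digest over the other fields, but that too is only an observation.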
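The x-bogus parameter
x-bogus is a parameter that guards against forged requests (the name translates literally as "x-forgery") and is used mainly for anti-crawling. It is another basic service of Douyin's parent company, and the mechanism is applied in almost all of its products. As long as you use the products normally, all of this is transparent to you. Reference: the x-bogus generation algorithm.
a-bogus: the same as x-bogus; it is the newer version of x-bogus.
How to obtain it: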
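The msToken parameter
msToken can be read as "Message Token": a token attached to each message request, used mainly for request statistics. It also serves as an anti-crawling mechanism. If too many requests share the same msToken, they are classified as malicious and a captcha check appears. When making requests ourselves, we can therefore simulate msToken with a UUID or a snowflake-style ID; a unique string longer than 32 characters works best.
It can be generated as follows: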
import random
import string


def get_ms_token(self, randomlength=107):
    """
    Generate a random string of the given length.
    """
    # The original alphabet read 'HIGK'/'higk', which repeated G/g and left out J/j.
    base_str = string.ascii_letters + string.digits + '='
    return ''.join(random.choice(base_str) for _ in range(randomlength))
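The text above also mentions using a UUID as the token. A minimal sketch of that variant follows; the function name make_ms_token and the min_length parameter are invented for this example, and whether the server accepts such a value is not verified here.

```python
import uuid


def make_ms_token(min_length=33):
    """Build a unique token by concatenating UUID4 hex strings (32 chars each)."""
    token = ""
    while len(token) < min_length:
        token += uuid.uuid4().hex
    return token


print(len(make_ms_token()))  # 64 with the default min_length
```

Because a single UUID4 hex string is exactly 32 characters, two are concatenated by default to get past the "longer than 32" guideline.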
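The signature parameter
signature is a digital signature. Its main purpose is to stop a "man in the middle" from tampering with the data in transit. Signatures are generally created with a private key and verified with the public key. The parameters in the URL take part in the signature, and so does the body of a POST request. signature is used mainly when submitting POST or PUT forms; I have not run into it in other cases so far.
How to obtain it: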
def get_signature(x_ms_stub):
    # jsengine, load_webmssdk, logger and USER_AGENT come from the surrounding project.
    try:
        ctx = jsengine.jsengine()
        # Minimal browser globals that webmssdk.js expects to find.
        js_dom = f"""
            document = {{}}
            window = {{}}
            navigator = {{
                'userAgent': '{USER_AGENT}'
            }}
        """.strip()
        js_enc = load_webmssdk('webmssdk.js')
        final_js = js_dom + js_enc
        ctx.eval(final_js)
        # Call the get_sign function exposed by the loaded script.
        function_caller = f"get_sign('{x_ms_stub}')"
        signature = ctx.eval(function_caller)
        # print("signature: ", signature)
        return signature
    except Exception:
        logger.exception("get_signature error")
        return "00000000"
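To make the tamper-detection idea concrete, here is a small sketch in which both the URL parameters and the request body feed into a signature. It is not Douyin's algorithm: HMAC-SHA256 with a shared secret stands in for the real scheme, and SECRET_KEY, sign_request, verify_request and the sample parameters are all hypothetical.

```python
import hashlib
import hmac
from urllib.parse import urlencode

SECRET_KEY = b"demo-secret"  # hypothetical key, for illustration only


def sign_request(params, body=b""):
    """Sign the sorted URL parameters plus the request body with HMAC-SHA256."""
    canonical = urlencode(sorted(params.items())).encode() + b"\n" + body
    return hmac.new(SECRET_KEY, canonical, hashlib.sha256).hexdigest()


def verify_request(params, body, signature):
    """Return True only if neither the parameters nor the body were altered."""
    return hmac.compare_digest(sign_request(params, body), signature)


params = {"room_id": "123", "page": "1"}
sig = sign_request(params, b"content=hello")
print(verify_request(params, b"content=hello", sig))     # True
print(verify_request(params, b"content=tampered", sig))  # False
```

Sorting the parameters before signing means the signature does not depend on the order in which they appear in the URL.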
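__ac_nonce: a temporary encryption parameter, used together with _signature.
webid: like ttwid, it acts as a client ID, or you could call it a browser ID. The difference is where it comes from: ttwid can be read from the cookies, while webid can be obtained by requesting any video page and extracting it with a regular expression from a script tag in the returned HTML: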
import json
import logging
import re
import time

import requests


def get_ttwid_webid(self, req_url):
    """
    Get ttwid and webid.
    :param req_url: URL of the video page to request
    :return: (ttwid, webid)
    """
    render_data_re = re.compile(
        r'<script id="RENDER_DATA" type="application/json">(.*?)</script>')
    headers = {
        "User-Agent": 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/123.0.0.0 Safari/537.36',
        "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.7",
        "Accept-Language": "zh-CN,zh;q=0.9,en;q=0.8"
    }
    # Retry until both values are found.
    while True:
        try:
            response = requests.request("GET", req_url, headers=headers, verify=False, timeout=3)
            # ttwid comes from the response cookies.
            cookies_dict = response.cookies.get_dict()
            ttwid_str = cookies_dict.get('ttwid')
            # webid is embedded in the URL-encoded JSON of the RENDER_DATA script tag.
            render_data_text = render_data_re.findall(response.text)
            if render_data_text:
                render_data_text = requests.utils.unquote(render_data_text[0])
                render_data_json = json.loads(render_data_text, strict=False)
                webid = render_data_json.get('app').get('odin').get('user_unique_id')
                return ttwid_str, webid
        except Exception as e:
            logging.error(e)
        # Pause before retrying, whether the request failed or RENDER_DATA was missing.
        time.sleep(1)
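The extraction step can be tried offline, without sending any request. The sketch below runs the same regular expression, URL-decoding and JSON lookup against a hand-built page fragment. The function name extract_webid, the sample HTML and the sample ID are all made up for the demonstration; a real page is far larger.

```python
import json
import re
from urllib.parse import quote, unquote

RENDER_DATA_RE = re.compile(
    r'<script id="RENDER_DATA" type="application/json">(.*?)</script>', re.S)


def extract_webid(html):
    """Return app.odin.user_unique_id from the RENDER_DATA script, or None."""
    match = RENDER_DATA_RE.search(html)
    if match is None:
        return None
    data = json.loads(unquote(match.group(1)))
    return data.get("app", {}).get("odin", {}).get("user_unique_id")


# Hypothetical page fragment: the JSON is URL-encoded, as in the real page.
payload = quote(json.dumps({"app": {"odin": {"user_unique_id": "7300000000000000000"}}}))
html = f'<html><script id="RENDER_DATA" type="application/json">{payload}</script></html>'

print(extract_webid(html))             # 7300000000000000000
print(extract_webid("<html></html>"))  # None
```

Returning None when the script tag is absent lets the caller decide whether to retry, instead of relying on an exception.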
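Live stream address and streamer information
The streamer's information and the live status can be extracted from the live room's HTML, and so can the stream address. The extraction uses regular expressions to pull out the embedded JSON. The Rust code is as follows: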
pub async fn get_room_info(&mut self) -> Result<LiveInfo, Box<dyn std::error::Error>> {
    println!("Fetching room_info for live room: {}", self.room_url);
    let mut headers = reqwest::header::HeaderMap::new();
    headers.insert("accept", "text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.7".parse()?);
    headers.insert("accept-language", "zh-CN,zh;q=0.9,en;q=0.8".parse()?);
    headers.insert("cache-control", "max-age=0".parse()?);
    headers.insert("cookie", "has_avx2=null; device_web_cpu_core=8; device_web_memory_size=8; live_use_vvc=%22false%22; xgplayer_user_id=903249300469; csrf_session_id=e2291ffdef635bf0666cf1a399de55de; webcast_local_quality=sd; SEARCH_RESULT_LIST_TYPE=%22single%22; bd_ticket_guard_client_web_domain=2; passport_csrf_token=644447c8fab148b9d360fdda67a1c4f8; passport_csrf_token_default=644447c8fab148b9d360fdda67a1c4f8; passport_assist_user=Cj2OyRQMJT-e_Yzf449bZPEv0yUeH6BvdNRi_8h6CtjL9f25qppJm9BSLatEfMUCnczMIxVgMYDWHw364CG_GkoKPOLAdamXQxwSIWj12fJ2HNDAzmKpcbYG_dL6XR46fpnifsYZHT0z_Z_o1N8bgNfGp2ung2BIO__buAxCBRCn0s4NGImv1lQgASIBA6XIvQ8%3D; n_mh=nNwOatDm453msvu0tqEj4bZm3NsIprwo6zSkIjLfICk; sso_uid_tt=25ba3d5cf4d058bf184ba57854a3853e; sso_uid_tt_ss=25ba3d5cf4d058bf184ba57854a3853e; toutiao_sso_user=561cc4c4cf742716616ebfd519d93768; toutiao_sso_user_ss=561cc4c4cf742716616ebfd519d93768; sid_ucp_sso_v1=1.0.0-KDUwODgyZmE5YzRjZGYxYjM2ZDE4ZDM5OWY3NDBiZGM4ZmE3YmFiM2MKHwj1jJ219wIQzs7vsAYY7zEgDDD-jcTZBTgGQPQHSAYaAmhsIiA1NjFjYzRjNGNmNzQyNzE2NjE2ZWJmZDUxOWQ5Mzc2OA; ssid_ucp_sso_v1=1.0.0-KDUwODgyZmE5YzRjZGYxYjM2ZDE4ZDM5OWY3NDBiZGM4ZmE3YmFiM2MKHwj1jJ219wIQzs7vsAYY7zEgDDD-jcTZBTgGQPQHSAYaAmhsIiA1NjFjYzRjNGNmNzQyNzE2NjE2ZWJmZDUxOWQ5Mzc2OA; passport_auth_status=854512750fb9b055d3809c2222eba72c%2C; passport_auth_status_ss=854512750fb9b055d3809c2222eba72c%2C; uid_tt=66858707d0775da51ae9674c1c591c27; uid_tt_ss=66858707d0775da51ae9674c1c591c27; sid_tt=c5061d6aae3b61f174b0c0696c6b7418; sessionid=c5061d6aae3b61f174b0c0696c6b7418; sessionid_ss=c5061d6aae3b61f174b0c0696c6b7418; _bd_ticket_crypt_doamin=2; _bd_ticket_crypt_cookie=bfcf62e1ae0bb79801498b683a86f505; __security_server_data_status=1; sid_guard=c5061d6aae3b61f174b0c0696c6b7418%7C1713104723%7C5183998%7CThu%2C+13-Jun-2024+14%3A25%3A21+GMT; sid_ucp_v1=1.0.0-KDY3ZWE2NzI3NDg3NjFjOWFlZjQ1ZmE0ZDE0OGI5NTY5NmYyMmE3MTcKGQj1jJ219wIQ087vsAYY7zEgDDgGQPQHSAQaAmxxIiBjNTA2MWQ2YWFlM2I2MWYxNzRiMGMwNjk2YzZiNzQxOA; ssid_ucp_v1=1.0.0-KDY3ZWE2NzI3NDg3NjFjOWFlZjQ1ZmE0ZDE0OGI5NTY5NmYyMmE3MTcKGQj1jJ219wIQ087vsAYY7zEgDDgGQPQHSAQaAmxxIiBjNTA2MWQ2YWFlM2I2MWYxNzRiMGMwNjk2YzZiNzQxOA; s_v_web_id=verify_luzmbzzg_wsuYknjY_Tc27_468O_9tHB_sKsLLWh3qV2R; ttwid=1%7CngabJA52sDUnYMxFKTFQmYEe2_RYNkefWVWEfuA53Mo%7C1713104743%7C34512c898d125865794d949a2477dda7493530c850da7c59a19c32a46642876c; LOGIN_STATUS=1; store-region=cn-sh; store-region-src=uid; publish_badge_show_info=%220%2C0%2C0%2C1714188960632%22; volume_info=%7B%22isUserMute%22%3Afalse%2C%22isMute%22%3Atrue%2C%22volume%22%3A0.6%7D; stream_recommend_feed_params=%22%7B%5C%22cookie_enabled%5C%22%3Atrue%2C%5C%22screen_width%5C%22%3A1470%2C%5C%22screen_height%5C%22%3A956%2C%5C%22browser_online%5C%22%3Atrue%2C%5C%22cpu_core_num%5C%22%3A8%2C%5C%22device_memory%5C%22%3A8%2C%5C%22downlink%5C%22%3A10%2C%5C%22effective_type%5C%22%3A%5C%224g%5C%22%2C%5C%22round_trip_time%5C%22%3A200%7D%22; strategyABtestKey=%221714189004.807%22; webcast_leading_last_show_time=1714189045348; webcast_leading_total_show_times=6; __live_version__=%221.1.1.9879%22; FOLLOW_LIVE_POINT_INFO=%22MS4wLjABAAAAdWaAD1s4nTXy5AWB9YQOjjVuEBSdF9Ke149hLM64PdY%2F1714233600000%2F0%2F0%2F1714215078555%22; FOLLOW_NUMBER_YELLOW_POINT_INFO=%22MS4wLjABAAAAdWaAD1s4nTXy5AWB9YQOjjVuEBSdF9Ke149hLM64PdY%2F1714233600000%2F0%2F1714214478555%2F0%22; passport_fe_beating_status=true; download_guide=%223%2F20240427%2F0%22; home_can_add_dy_2_desktop=%221%22; pwa2=%220%7C0%7C3%7C0%22; __ac_nonce=0662cd7bb00365383c41; __ac_signature=_02B4Z6wo00f01CXmuJwAAIDDdxmYhyaqaAglxrwAAG9c5BKd035mkv0aWuHT5.B6XgeFC-vvGuX4JgBV0ExZ0N.P5fFlhkfuwOam9askMq120O4j4k80SiSu9eVCx7llvUbc0L38xoSU.Iztae; xg_device_score=7.4571195567119375; live_can_add_dy_2_desktop=%221%22; bd_ticket_guard_client_data=eyJiZC10aWNrZXQtZ3VhcmQtdmVyc2lvbiI6MiwiYmQtdGlja2V0LWd1YXJkLWl0ZXJhdGlvbi12ZXJzaW9uIjoxLCJiZC10aWNrZXQtZ3VhcmQtcmVlLXB1YmxpYy1rZXkiOiJCRHpRSzBPc3RkRE5EOEIxVVM2QUhibjFxeGFEK0FCYkljbmMzeWxwMC9ZVE5SVks4cUZQTEtTSFRjbGtZdys2NnlpR1hEdDVsT05XaHd6UDFScWUrbUE9IiwiYmQtdGlja2V0LWd1YXJkLXdlYi12ZXJzaW9uIjoxfQ%3D%3D; msToken=dkS_SWd4Y0gsRB3GqURAf762ahdZlFp8lnIe5X99t-lSKMPg7ly9UIa4MZtcIS6gtJS_GkR2qE_V3WZfqOihnoDO2Td2BZlE0ZTYfHzH3memmRkD256iF9-MJUWi; odin_tt=441e6d35a29801763dc3805e40897a3197ad8eb3dede1b88ebda81998f313f13820db809f050a325915ac1347cb213ba; IsDouyinActive=false; msToken=ckrB5duL8xVXFP110HedaLvZk2iXY6ADOnnYAk3wQiInRW7veHIyuMdqd47VCyM-wNyW6ZpY6f7YqTH4Hwrwne--fd8bLF9qLOWgvIB3MG47BhPkkBNiDL77xuyU; ttwid=1%7CngabJA52sDUnYMxFKTFQmYEe2_RYNkefWVWEfuA53Mo%7C1713104743%7C34512c898d125865794d949a2477dda7493530c850da7c59a19c32a46642876c".parse()?);
    headers.insert("priority", "u=0, i".parse()?);
    headers.insert("referer", "https://live.douyin.com/972176515698?_ct=1714214842847&action_type=click&enter_from_merge=web_search&enter_method=web_video_head&enter_method_temai=web_video_head&group_id=undefined&is_livehead_preview_mini_window_show=&mini_window_show_type=&preview_info_str=eyJ1cmwiOiIiLCJsb3dVcmwiOiIiLCJ1aWQiOiIxMDA3NzQ5MjE4NDUiLCJ1dWlkIjoiNzM0NzE0NTY1MzUwMjAxOTEyNiIsImlzX211bHRpcGxlIjowLCJpc19wYWlkIjowLCJpc19tdWx0aV9jYW1lcmEiOjAsInJlc29sdXRpb25zIjpbXX0%3D&request_id=2024042718421695936492028C53AC640D&room_info=7362491920259713818&search_tab=aweme_general&_ct=1714214842848".parse()?);
    headers.insert(
        "sec-ch-ua",
        "\"Chromium\";v=\"124\", \"Google Chrome\";v=\"124\", \"Not-A.Brand\";v=\"99\""
            .parse()?,
    );
    headers.insert("sec-ch-ua-mobile", "?0".parse()?);
    headers.insert("sec-ch-ua-platform", "\"macOS\"".parse()?);
    headers.insert("sec-fetch-dest", "document".parse()?);
    headers.insert("sec-fetch-mode", "navigate".parse()?);
    headers.insert("sec-fetch-site", "same-origin".parse()?);
    headers.insert("sec-fetch-user", "?1".parse()?);
    headers.insert("upgrade-insecure-requests", "1".parse()?);
    headers.insert("user-agent", "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/124.0.0.0 Safari/537.36".parse()?);
    let request = self.request.get(self.room_url.clone()).headers(headers);
    let response = request.send().await?;
    // Read the cookies first, then the body: text() consumes the response.
    // Pick ttwid out of the response cookies.
    let cookies = response.cookies();
    let mut ttwid = String::new();
    for c in cookies {
        println!("cookies: {:?} value:{:?}", c.name(), c.value());
        if c.name() == "ttwid" {
            ttwid = c.value().to_string();
        }
    }
    let body = response.text().await?;
    // println!("Live room HTML: {}", body);
    // Check whether the stream has already ended; if so, only fetch the streamer's avatar.
    let re;
    let mut unique_id = "";
    if body.contains(r#"status\":4"#) {
        println!("The stream has ended");
        // Match the streamer info with a regular expression.
        re = Regex::new(r#"anchor\\":(.*?),\\"open_id_str"#).unwrap();
    } else {
        // Match the live room info with a regular expression.
        re = Regex::new(r#"roomInfo\\":\{\\"room\\":(.*?),\\"toolbar_data"#).unwrap();
        let unique_re = Regex::new(r#"user_unique_id\\":\\"(.*?)\\"}"#).unwrap();
        unique_id = unique_re.captures(&body).unwrap().get(1).unwrap().as_str();
    }
    let main_info = re.captures(&body).unwrap().get(1).unwrap().as_str();
    // Append the closing brace and unescape the quotes so the result parses as JSON.
    let room_info = String::from(main_info) + "}";
    self.room_info = room_info.replace(r#"\""#, r#"""#);
    // println!("Live room info: {}", self.room_info);
    Ok(LiveInfo {
        room_info: self.room_info.clone(),
        ttwid,
        unique_id: String::from(unique_id),
    })
}
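For readers following the Python examples, the core of the Rust function above (match the escaped JSON, close the brace, unescape the quotes) can be sketched as follows. The function name extract_room and the sample fragment are invented for this illustration; the sample only mimics the shape the regular expression expects, with toolbar_data as a key inside the room object, and is not real page content.

```python
import json
import re

# Same pattern as the Rust code: the JSON in the page has its quotes escaped as \".
ROOM_RE = re.compile(r'roomInfo\\":\{\\"room\\":(.*?),\\"toolbar_data')


def extract_room(body):
    """Pull the room object out of the page text and parse it as JSON."""
    match = ROOM_RE.search(body)
    if match is None:
        return None
    # The capture ends before "toolbar_data", so close the object by hand,
    # then turn every escaped quote back into a plain quote.
    room_json = (match.group(1) + "}").replace('\\"', '"')
    return json.loads(room_json)


# Hypothetical fragment shaped like the escaped JSON embedded in the page.
body = r'x\"roomInfo\":{\"room\":{\"id_str\":\"123\",\"status\":2,\"title\":\"demo\",\"toolbar_data\":{}}}'

room = extract_room(body)
print(room["status"], room["title"])  # 2 demo
```

This approach depends on the key order in the page: if toolbar_data stopped following the fields of interest, the capture would no longer form valid JSON, so a failed parse is worth handling in real use.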