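Preface
Good morning, good afternoon, or good evening, everyone ❤ ~
"Faced with right and wrong, one careless step can send you into the abyss; only by holding on to your beliefs can you stay true to your original intentions."
《狂飙》 (The Knockout), the first breakout hit series of 2023, has aired its finale. Today we'll scrape its comments to see why it is so popular.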
Development environment:
- Python 3.8
- PyCharm Professional
Modules used:
- requests >>> pip install requests (sends the HTTP requests)
- parsel >>> pip install parsel (parses the returned data)
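To install a third-party Python module, use either of these:
- Press Win + R, type cmd, click OK, then enter the install command pip install <module name> (for example, pip install requests) and press Enter
- In PyCharm, click Terminal and enter the same install command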
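If you are not sure whether the installs worked, a quick check like the one below can tell you before you run the scraper. This is only a minimal sketch: the helper name missing_modules is made up for this example, and it relies on nothing but the standard library's importlib.util.find_spec.

```python
import importlib.util


def missing_modules(names):
    """Return the names from `names` that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # The two third-party modules this tutorial relies on
    missing = missing_modules(["requests", "parsel"])
    if missing:
        print("Missing, install with: pip install " + " ".join(missing))
    else:
        print("All required modules are installed.")
```

Run it with the same interpreter PyCharm uses for the project, otherwise it may report on a different environment.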
Code
# Import the modules
import requests
import parsel
# Disguise the request as a normal browser visit (request headers)
headers = {
'Cookie': 'll="118267"; bid=vmTru_a25m8; __utma=30149280.50068328.1675317520.1675317520.1675317520.1; __utmc=30149280; __utmz=30149280.1675317520.1.1.utmcsr=baidu|utmccn=(organic)|utmcmd=organic; ap_v=0,6.0; _pk_ref.100001.4cf6=%5B%22%22%2C%22%22%2C1675317540%2C%22https%3A%2F%2Fwww.douban.com%2F%22%5D; _pk_ses.100001.4cf6=*; __utma=223695111.62892083.1675317540.1675317540.1675317540.1; __utmb=223695111.0.10.1675317540; __utmc=223695111; __utmz=223695111.1675317540.1.1.utmcsr=douban.com|utmccn=(referral)|utmcmd=referral|utmcct=/; __gads=ID=fb33508fbeefffdc-22b1618a7fd900c1:T=1675317540:RT=1675317540:S=ALNI_Ma0hUcCRHqTpc0wmcM01k3qpX3big; __gpi=UID=0000099c3e5d1190:T=1675317540:RT=1675317540:S=ALNI_MYqY1aqMuFbXYpmO6sFDn6zMnHB9g; __yadk_uid=KpA5hjYEmww6Sf2qskRgZamuj7aaecAC; ct=y; __utmb=30149280.3.10.1675317520; _vwo_uuid_v2=D091DE0AFC99F8C5AFC3169D9CB1E30F3|b218e266efb05a6a7a8652ac6ceecfe9; _pk_id.100001.4cf6=a8eb1a0fc7d89e94.1675317540.1.1675318206.1675317540.',
'Host': 'movie.*****.com',
'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/109.0.0.0 Safari/537.36',
}
for page in range(0, 4000):
    print(page)
    # Send the request; each page holds 20 comments, so start advances by 20
    url = f'https://movie.***.com/subject/35465232/comments?start={page*20}&limit=20&status=P&sort=new_score'
    response = requests.get(url=url, headers=headers)
    # Parse the returned HTML
    select = parsel.Selector(response.text)
    comments = select.css('.comment-item .comment')
    for comment in comments:
        name = comment.css('.comment-info a::text').get()
        try:
            # Strip 'allstar' and '0 rating' from the class value to keep only the star digit
            score_str = comment.css('.comment-info .rating::attr(class)').get()
            score = score_str.replace('0 rating', '').replace('allstar', '')
        except AttributeError:
            # No rating element on this comment: .get() returned None
            score = 0
        comment_time = comment.css('.comment-info .comment-time::text').get().strip()
        vote_count = comment.css('.comment-vote .votes.vote-count::text').get()
        comment_content = comment.css('.comment-content span::text').get()
        print(name, score, comment_time, vote_count, comment_content)
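The rating step is the least obvious part of the loop, so here it is pulled out into a small function you can try on its own. Treat it as a sketch rather than the method used above: the name parse_score is hypothetical, it swaps the two replace() calls for a regular expression, and it assumes the class value has the form "allstarN0 rating" (which is what those replace() calls imply).

```python
import re


def parse_score(rating_class):
    """Turn a class value such as 'allstar50 rating' into a star count.

    Returns 0 when the comment has no rating (None or an unexpected value).
    """
    if not rating_class:
        return 0
    match = re.search(r"allstar(\d)0", rating_class)
    return int(match.group(1)) if match else 0


if __name__ == "__main__":
    print(parse_score("allstar50 rating"))
    print(parse_score(None))
```

Unlike the loop above, this always returns an int, which is a little easier to work with if you later want to average the scores.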
Results
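The code as posted can collect the first ten pages of data.
Comments beyond that are only visible, and only collectable, after you log in.
Log in, replace the 'Cookie' value in headers with your own, and you can collect all of them~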
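Since only the first pages come back without a login, looping over all 4000 page numbers mostly sends requests that return nothing. One possible way to handle this is to stop as soon as a page yields no comments. The sketch below shows just that idea: collect_pages and fetch_page are hypothetical names, and it assumes a page past the limit simply parses to an empty list (the site may instead redirect to a login page, so check what you actually get back).

```python
import time


def collect_pages(fetch_page, max_pages=4000, delay=0.0):
    """Call fetch_page(start) for start = 0, 20, 40, ... and stop at the first empty page.

    fetch_page is any function that takes the `start` offset and returns
    a list of parsed comments for that page.
    """
    rows = []
    for page in range(max_pages):
        items = fetch_page(page * 20)
        if not items:
            # Nothing on this page: assume we ran past what we are allowed to see
            break
        rows.extend(items)
        if delay:
            # Optional pause between requests
            time.sleep(delay)
    return rows


if __name__ == "__main__":
    # Stand-in fetcher for demonstration: pretends only the first 10 pages have data
    def fake_fetch(start):
        return [f"comment {start + i}" for i in range(20)] if start < 200 else []

    print(len(collect_pages(fake_fetch)))
```

In the real script, fetch_page would wrap the requests.get call and the parsel parsing from the loop above.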
Closing 💝
That's about it for today's share!
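If you have suggestions or questions, leave a comment or send me a message! Let's keep at it together (ง •_•)ง
If you liked this post, follow the blog, or give it a like, a bookmark, or a comment!!!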