Site being scraped (the full code is at the end of the article):
https://koubei.16888.com/57233/0-0-0-2
How it works:
from bs4 import BeautifulSoup
After fetching the HTML, extract the text with find_all(). Inspecting the page shows the review data lives in the following tag:
content_text = soup.find_all('span', class_='show_dp f_r')
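To see what this selector matches, here is a minimal self-contained sketch; the HTML snippet is invented to mimic the page structure (each review contributes three spans with the same class), and note that passing a space-separated string to class_ matches the full class attribute exactly:

```python
from bs4 import BeautifulSoup

# Hypothetical HTML mimicking the review page: three spans per review,
# all sharing the class "show_dp f_r".
html = """
<div>
  <span class="show_dp f_r">Spacious interior</span>
  <span class="show_dp f_r">High fuel consumption</span>
  <span class="show_dp f_r">Good value overall</span>
</div>
"""

soup = BeautifulSoup(html, "html.parser")
spans = soup.find_all("span", class_="show_dp f_r")
texts = [s.text for s in spans]
print(texts)  # one string per span, in document order
```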
Because the pros, cons, and summary spans all share the same class name, a small index-based split routes each one to its own file:
for index, x in enumerate(content_text):
    if index % 3 == 0:
        # Every first span of a group of three is a "pros" review.
        with open("car_post.txt", "a", encoding='utf-8') as f:
            f.write(x.text + "\n")
    elif index % 3 == 1:
        # Every second span is a "cons" review.
        with open("car_nev.txt", "a", encoding='utf-8') as f:
            f.write(x.text + "\n")
    else:
        # Every third span is the overall summary.
        with open("car_text.txt", "a", encoding='utf-8') as f:
            f.write(x.text + "\n")
Result preview
Negative reviews: (screenshot omitted)
Positive reviews: (screenshot omitted)
Summaries: (screenshot omitted)
Full code

from bs4 import BeautifulSoup
import requests

headers = {
    "user-agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/107.0.0.0 Safari/537.36 Edg/107.0.1418.35"
}

# Walk the paginated review listing, pages 1..299.
for j in range(1, 300):
    url = "https://koubei.16888.com/57233/0-0-0-{}".format(j)
    resp = requests.get(url, headers=headers)
    resp.encoding = "utf-8"
    soup = BeautifulSoup(resp.text, "html.parser")
    content_text = soup.find_all('span', class_='show_dp f_r')
    # Pros, cons, and summaries repeat in groups of three with the same
    # class name, so route each span to its file by index.
    for index, x in enumerate(content_text):
        if index % 3 == 0:
            with open("car_post.txt", "a", encoding='utf-8') as f:
                f.write(x.text + "\n")
        elif index % 3 == 1:
            with open("car_nev.txt", "a", encoding='utf-8') as f:
                f.write(x.text + "\n")
        else:
            with open("car_text.txt", "a", encoding='utf-8') as f:
                f.write(x.text + "\n")
    print(j)  # progress indicator
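A slightly more defensive variant of the loop above would reuse one connection, add a timeout, skip failed responses, and pause between pages. The sketch below shows that shape; page_url, fetch_reviews, and crawl are hypothetical helpers, the user-agent is trimmed, and the 1-second delay is a guess since the site's real rate limits are unknown:

```python
import time
import requests
from bs4 import BeautifulSoup

BASE_URL = "https://koubei.16888.com/57233/0-0-0-{}"
HEADERS = {"user-agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64)"}  # trimmed UA

def page_url(page):
    # Listing pages on this site are 1-indexed.
    return BASE_URL.format(page)

def fetch_reviews(session, page):
    """Return the review span texts from one page, or [] on a bad response."""
    resp = session.get(page_url(page), headers=HEADERS, timeout=10)
    if resp.status_code != 200:
        return []
    resp.encoding = "utf-8"
    soup = BeautifulSoup(resp.text, "html.parser")
    return [s.text for s in soup.find_all("span", class_="show_dp f_r")]

def crawl(last_page, delay=1.0):
    """Yield (page, texts) for pages 1..last_page, pausing between requests."""
    with requests.Session() as session:
        for page in range(1, last_page + 1):
            yield page, fetch_reviews(session, page)
            time.sleep(delay)  # be polite to the server
```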