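Preface: which language is best for crawlers? Python, of course.
I use whatever language the job calls for and know a little of each without mastering any; being able to get the work done is enough. Last year I wrote a script that fetches Bing's daily wallpaper, saves it locally, and uploads it to Alibaba Cloud OSS. If all you want is local wallpaper rotation, saving the files is enough. I have long wanted to build a wallpaper site, but never got around to it.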
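1. About Bing wallpapers:
When we visit Bing China at https://cn.bing.com, the background is a curated picture, and that picture changes every day. We can therefore fetch it daily and use it as a source of wallpapers.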
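2. How to get Bing's daily wallpaper:
One idea is to visit https://cn.bing.com every day, parse the wallpaper URL out of the page, and then download the image. That works, but it is more trouble than necessary: Microsoft already exposes a wallpaper endpoint that returns the wallpaper for today and the preceding seven days:
https://cn.bing.com/HPImageArchive.aspx?format=js&idx=0&n=10
Requesting this URL returns data in JSON format, for example: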
{
"images": [
{
"startdate": "20240329",
"fullstartdate": "202403291600",
"enddate": "20240330",
"url": "/th?id=OHR.SleepySloth_ZH-CN6084460583_1920x1080.jpg&rf=LaDigue_1920x1080.jpg&pid=hp",
"urlbase": "/th?id=OHR.SleepySloth_ZH-CN6084460583",
"copyright": "睡在号角树上的褐喉树懒,哥斯达黎加 (© Juan Carlos Vindas/Getty Images)",
"copyrightlink": "https://www.bing.com/search?q=%E8%A4%90%E5%96%89%E6%A0%91%E6%87%92&form=hpcapt&mkt=zh-cn",
"title": "来自“颠倒世界”的问候",
"quiz": "/search?q=Bing+homepage+quiz&filters=WQOskey:%22HPQuiz_20240329_SleepySloth%22&FORM=HPQUIZ",
"wp": true,
"hsh": "c56a6fc0a2750b049a409de4450daf27",
"drk": 1,
"top": 1,
"bot": 1,
"hs": [
]
},
{
"startdate": "20240328",
"fullstartdate": "202403281600",
"enddate": "20240329",
"url": "/th?id=OHR.SouthStackLight_ZH-CN5932471774_1920x1080.jpg&rf=LaDigue_1920x1080.jpg&pid=hp",
"urlbase": "/th?id=OHR.SouthStackLight_ZH-CN5932471774",
"copyright": "日落时的南斯塔克灯塔,霍利希德,威尔士,英国 (© mariotlr/Getty Images)",
"copyrightlink": "https://www.bing.com/search?q=%E9%9C%8D%E5%88%A9%E5%B8%8C%E5%BE%B7&form=hpcapt&mkt=zh-cn",
"title": "潮涨潮落,灯火通明",
"quiz": "/search?q=Bing+homepage+quiz&filters=WQOskey:%22HPQuiz_20240328_SouthStackLight%22&FORM=HPQUIZ",
"wp": true,
"hsh": "259f54d785e9ebf610729787a05fb5f2",
"drk": 1,
"top": 1,
"bot": 1,
"hs": [
]
},
{
"startdate": "20240327",
"fullstartdate": "202403271600",
"enddate": "20240328",
"url": "/th?id=OHR.ShanghaiBlossoms_ZH-CN5594677517_1920x1080.jpg&rf=LaDigue_1920x1080.jpg&pid=hp",
"urlbase": "/th?id=OHR.ShanghaiBlossoms_ZH-CN5594677517",
"copyright": "上海的樱花,中国 (© Yaorusheng/Getty Images)",
"copyrightlink": "https://www.bing.com/search?q=%E6%A8%B1%E8%8A%B1&form=hpcapt&mkt=zh-cn",
"title": "花香满径",
"quiz": "/search?q=Bing+homepage+quiz&filters=WQOskey:%22HPQuiz_20240327_ShanghaiBlossoms%22&FORM=HPQUIZ",
"wp": true,
"hsh": "0b0be2a4a80ccb847b97fad514d4974d",
"drk": 1,
"top": 1,
"bot": 1,
"hs": [
]
},
{
"startdate": "20240326",
"fullstartdate": "202403261600",
"enddate": "20240327",
"url": "/th?id=OHR.TeatroColon_ZH-CN5378730986_1920x1080.jpg&rf=LaDigue_1920x1080.jpg&pid=hp",
"urlbase": "/th?id=OHR.TeatroColon_ZH-CN5378730986",
"copyright": "布宜诺斯艾利斯哥伦布剧院,阿根廷 (© Wei Hao Ho/Alamy Stock Photo)",
"copyrightlink": "https://www.bing.com/search?q=%E4%B8%96%E7%95%8C%E6%88%8F%E5%89%A7%E6%97%A5&form=hpcapt&mkt=zh-cn",
"title": "戏剧成为关注的焦点",
"quiz": "/search?q=Bing+homepage+quiz&filters=WQOskey:%22HPQuiz_20240326_TeatroColon%22&FORM=HPQUIZ",
"wp": true,
"hsh": "e49947b0643729ced7b6066a2842b68f",
"drk": 1,
"top": 1,
"bot": 1,
"hs": [
]
},
{
"startdate": "20240325",
"fullstartdate": "202403251600",
"enddate": "20240326",
"url": "/th?id=OHR.HangRaiVietnam_ZH-CN1601428109_1920x1080.jpg&rf=LaDigue_1920x1080.jpg&pid=hp",
"urlbase": "/th?id=OHR.HangRaiVietnam_ZH-CN1601428109",
"copyright": "海水从古老的珊瑚礁上倾泻而下,杭莱,越南 (© Thang Tat Nguyen/Getty Images)",
"copyrightlink": "https://www.bing.com/search?q=%E5%AE%81%E9%A1%BA%E6%B5%B7%E7%8D%AD%E6%B4%9E&form=hpcapt&mkt=zh-cn",
"title": "潮汐探戈",
"quiz": "/search?q=Bing+homepage+quiz&filters=WQOskey:%22HPQuiz_20240325_HangRaiVietnam%22&FORM=HPQUIZ",
"wp": true,
"hsh": "fb962e1ad50ebf0fa2c5a506d99dfbfe",
"drk": 1,
"top": 1,
"bot": 1,
"hs": [
]
},
{
"startdate": "20240324",
"fullstartdate": "202403241600",
"enddate": "20240325",
"url": "/th?id=OHR.TulipAbbotsford_ZH-CN1401627293_1920x1080.jpg&rf=LaDigue_1920x1080.jpg&pid=hp",
"urlbase": "/th?id=OHR.TulipAbbotsford_ZH-CN1401627293",
"copyright": "弗雷泽河谷的郁金香田,阿伯兹福德,不列颠哥伦比亚省,加拿大 (© LeonU/Getty Images)",
"copyrightlink": "https://www.bing.com/search?q=%E4%B8%8D%E5%88%97%E9%A2%A0%E5%93%A5%E4%BC%A6%E6%AF%94%E4%BA%9A%E7%9C%81+%E9%98%BF%E4%BC%AF%E5%85%B9%E7%A6%8F%E5%BE%B7&form=hpcapt&mkt=zh-cn",
"title": "春意盎然",
"quiz": "/search?q=Bing+homepage+quiz&filters=WQOskey:%22HPQuiz_20240324_TulipAbbotsford%22&FORM=HPQUIZ",
"wp": true,
"hsh": "a1b4fe059803db763eda39cc2b48c33b",
"drk": 1,
"top": 1,
"bot": 1,
"hs": [
]
},
{
"startdate": "20240323",
"fullstartdate": "202403231600",
"enddate": "20240324",
"url": "/th?id=OHR.WhiteEyes_ZH-CN1130380430_1920x1080.jpg&rf=LaDigue_1920x1080.jpg&pid=hp",
"urlbase": "/th?id=OHR.WhiteEyes_ZH-CN1130380430",
"copyright": "樱花树枝上的灰胸绣眼鸟,韩国 (© TigerSeo/Getty Images)",
"copyrightlink": "https://www.bing.com/search?q=%E7%81%B0%E8%83%B8%E7%BB%A3%E7%9C%BC%E9%B8%9F&form=hpcapt&mkt=zh-cn",
"title": "你能挪过去一点吗?",
"quiz": "/search?q=Bing+homepage+quiz&filters=WQOskey:%22HPQuiz_20240323_WhiteEyes%22&FORM=HPQUIZ",
"wp": true,
"hsh": "006ea10e45abfa33db0e42504a10ed3e",
"drk": 1,
"top": 1,
"bot": 1,
"hs": [
]
},
{
"startdate": "20240322",
"fullstartdate": "202403221600",
"enddate": "20240323",
"url": "/th?id=OHR.AmazonClouds_ZH-CN0578911147_1920x1080.jpg&rf=LaDigue_1920x1080.jpg&pid=hp",
"urlbase": "/th?id=OHR.AmazonClouds_ZH-CN0578911147",
"copyright": "巴西亚马逊上空巨大的砧状云 (© NASA)",
"copyrightlink": "https://www.bing.com/search?q=%E4%B8%96%E7%95%8C%E6%B0%94%E8%B1%A1%E6%97%A5&form=hpcapt&mkt=zh-cn",
"title": "造雾",
"quiz": "/search?q=Bing+homepage+quiz&filters=WQOskey:%22HPQuiz_20240322_AmazonClouds%22&FORM=HPQUIZ",
"wp": true,
"hsh": "49990f5f0aecedc302513c6a05dc6e48",
"drk": 1,
"top": 1,
"bot": 1,
"hs": [
]
}
],
"tooltips": {
"loading": "正在加载...",
"previous": "上一个图像",
"next": "下一个图像",
"walle": "此图片不能下载用作壁纸。",
"walls": "下载今日美图。仅限用作桌面壁纸。"
}
}
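The response above can be requested and read with a few lines of Python. The sketch below is only an illustration: the function names and the defaults idx=0 and n=8 are my own assumptions (the sample response holds eight entries), and it uses the standard library instead of requests so it runs without extra packages.

```python
import json
import urllib.parse
import urllib.request

ARCHIVE_ENDPOINT = "https://cn.bing.com/HPImageArchive.aspx"


def build_archive_url(idx=0, n=8):
    """Build the archive URL from the idx and n query parameters."""
    query = urllib.parse.urlencode({"format": "js", "idx": idx, "n": n})
    return ARCHIVE_ENDPOINT + "?" + query


def parse_images(payload_text):
    """Return (startdate, urlbase, title) for every entry under "images"."""
    data = json.loads(payload_text)
    return [(img["startdate"], img["urlbase"], img["title"])
            for img in data.get("images", [])]


def fetch_archive(idx=0, n=8, timeout=10):
    """Download the archive JSON as text (performs a network request)."""
    request = urllib.request.Request(
        build_archive_url(idx, n),
        headers={"User-Agent": "Mozilla/5.0"},  # browser-like header, as in the script
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8")
```

Calling parse_images(fetch_archive()) would give one tuple per wallpaper; only fetch_archive touches the network.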
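Parsing the JSON shows that the urlbase field of each entry in images is the image address we need. For example, the wallpaper for March 30 (the first entry above) is at https://cn.bing.com/th?id=OHR.SleepySloth_ZH-CN6084460583_1920x1080.jpg . The image URL follows the pattern "https://cn.bing.com" + urlbase + "_" + imagesize + ".jpg". My monitor's resolution is 1920x1080, so that size is enough for me; to download the largest resolution, the URL should be "https://cn.bing.com" + urlbase + "_UHD.jpg".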
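The URL pattern can be written as a pair of small helpers. This is a minimal sketch; the names wallpaper_url and local_file_name are hypothetical, and the file-name scheme mirrors the one used by the full script in section 4.

```python
BING_HOST = "https://cn.bing.com"


def wallpaper_url(urlbase, size="1920x1080"):
    """Join host, urlbase, and size suffix; pass size="UHD" for the largest image."""
    return BING_HOST + urlbase + "_" + size + ".jpg"


def local_file_name(startdate, urlbase, size="1920x1080"):
    """Build a local file name such as 20240329-OHR.Name_1920x1080.jpg."""
    image_id = urlbase.replace("/th?id=", "")
    return startdate + "-" + image_id + "_" + size + ".jpg"


if __name__ == "__main__":
    base = "/th?id=OHR.SleepySloth_ZH-CN6084460583"
    print(wallpaper_url(base))
    print(wallpaper_url(base, "UHD"))
    print(local_file_name("20240329", base, "UHD"))
```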
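3. Uploading to Alibaba Cloud OSS:
Alibaba Cloud provides complete sample code, so I will not go into detail here; see the official documentation. Install the oss2 library with pip, import it, and it is ready to use.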
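For orientation, an upload with oss2 might look like the sketch below. The helper names are hypothetical, and the endpoint, bucket name, and credentials must come from your own account. oss2 is imported inside make_bucket so that the other helpers can be tried without the package installed.

```python
def object_key(pub_date, image_name):
    """Name the OSS object '<pubDate>-<imageName>', as the full script does."""
    return pub_date + "-" + image_name


def make_bucket(access_key_id, access_key_secret, endpoint, bucket_name):
    """Create an oss2 Bucket client (requires `pip install oss2`)."""
    import oss2  # lazy import: only needed when talking to OSS

    auth = oss2.Auth(access_key_id, access_key_secret)
    return oss2.Bucket(auth, endpoint, bucket_name)


def upload_file(bucket, key, local_path):
    """Upload a local file and report whether OSS answered with HTTP 200."""
    result = bucket.put_object_from_file(key, local_path)
    return result.status == 200
```

A call would then be upload_file(make_bucket(key_id, key_secret, endpoint, name), object_key(pub_date, image_name), local_path), with all arguments supplied by you.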
4. The complete Python code:
import json
import logging
import os

import oss2
import pymysql
import requests

folder = "f:\\BingImage\\"  # directory where the images are stored
db = pymysql.connect(host='MySQL host', user='username', password='password', port=3306,
                     database='database name')
cursor = db.cursor()
insertsql = "insert into bingimage(imgDes,imgUrl,imgSize,pubDate,getDate,ifDown,baseUrl,title,hsh,ifup) values(%s,%s,%s,%s,current_time ,1,%s,%s,%s,%s)"
checksql = "select count(*) from bingimage where baseUrl =%s"
accessKeyId = ''  # Alibaba Cloud accessKeyId
accessKeySecret = ''  # Alibaba Cloud accessKeySecret
auth = oss2.Auth(accessKeyId, accessKeySecret)
bucket = oss2.Bucket(auth, 'https://oss-cn-shenzhen.aliyuncs.com', 'imagefrombing')
try:
    headers = {
        "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/115.0.0.0 Safari/537.36"
    }
    res = requests.get("https://cn.bing.com/HPImageArchive.aspx?format=js&idx=0&n=10", headers=headers)
    res.raise_for_status()
    data = json.loads(res.text)
    for img in data['images']:
        hsh = img['hsh']
        baseUrl = img['urlbase']
        # Skip wallpapers that are already recorded in the database.
        cursor.execute(checksql, (baseUrl,))
        row = cursor.fetchone()
        if row[0] == 0:
            pubDate = img['startdate']
            imgDes = img['copyright']
            title = img['title']
            # 1920x1080 copy: saved locally only (ifup = 0).
            imgSize = '1920x1080'
            imgUrl = "https://cn.bing.com" + baseUrl + "_" + imgSize + ".jpg"
            r = requests.get(imgUrl)
            r.raise_for_status()  # do not save an error page as a .jpg
            imgName = baseUrl.replace("/th?id=", "") + "_" + imgSize + ".jpg"
            with open(folder + pubDate + "-" + imgName, "wb") as f:
                f.write(r.content)
            if os.path.exists(folder + pubDate + "-" + imgName):
                cursor.execute(insertsql, (imgDes, imgUrl, imgSize, pubDate, baseUrl, title, hsh, 0))
                db.commit()
                print(hsh + "-" + imgSize + "-download complete")
            # UHD copy: saved locally and uploaded to OSS (ifup = 1).
            imgSize = 'UHD'
            imgUrl = "https://cn.bing.com" + baseUrl + "_" + imgSize + ".jpg"
            r = requests.get(imgUrl)
            r.raise_for_status()
            imgName = baseUrl.replace("/th?id=", "") + "_" + imgSize + ".jpg"
            with open(folder + pubDate + "-" + imgName, "wb") as f:
                f.write(r.content)
            if os.path.exists(folder + pubDate + "-" + imgName):
                with open(folder + pubDate + "-" + imgName, "rb") as f2:
                    bucket.put_object(pubDate + "-" + imgName, f2)
                cursor.execute(insertsql, (imgDes, imgUrl, imgSize, pubDate, baseUrl, title, hsh, 1))
                db.commit()
                print(hsh + "-" + imgSize + "-download complete")
        else:
            print(hsh + ' already exists in the database')
except Exception:
    # Log the full traceback instead of silently swallowing the error.
    logging.exception("Failed to fetch Bing wallpapers")
finally:
    db.close()
5. Running the download every day:
On Windows, create a scheduled task that runs this Python script twice a day.
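One way to register such a task is the schtasks command-line tool, here driven from Python. This is a rough sketch under my own assumptions: the task names, the two start times, and the helper names are hypothetical, and the same task can just as well be created by hand in Task Scheduler.

```python
import subprocess
import sys


def schtasks_command(task_name, script_path, start_time, python_exe=sys.executable):
    """Build the argument list for a daily `schtasks /Create` call."""
    action = '"{}" "{}"'.format(python_exe, script_path)
    return [
        "schtasks", "/Create",
        "/SC", "DAILY",      # run once per day
        "/TN", task_name,    # task name shown in Task Scheduler
        "/TR", action,       # command line to run
        "/ST", start_time,   # start time in HH:MM (24-hour)
        "/F",                # overwrite a task with the same name
    ]


def register_twice_daily(script_path, times=("08:00", "20:00")):
    """Create one daily task per start time (Windows only)."""
    for number, start_time in enumerate(times, start=1):
        command = schtasks_command("BingWallpaper{}".format(number), script_path, start_time)
        subprocess.run(command, check=True)
```

Two daily tasks at different times give the twice-a-day schedule; register_twice_daily has to be run once on the Windows machine that holds the script.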
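6. Setting up wallpaper rotation on Windows 11:
Open Settings -> Personalization -> Background, choose the slideshow option, and select the folder where the images are saved.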