Advanced Web Scraping: Simulating a Browser with Selenium


Contents

  • Introduction
  • Environment setup
    • 1. Install conda first (recommended)
    • 2. Create a virtual environment and install the required packages
    • 3. Download ChromeDriver and a matching Chrome version
  • Code
    • settings.py configuration
    • Scrapy spider example
    • Middleware: middlewares.py
  • Appendix: Selenium tutorials

Introduction

Selenium is a tool for automating browser interactions and is most commonly used for testing web applications. It can also be used for scraping: by simulating what a user does in a real browser, it can extract data from rendered pages. Here is a basic overview of Selenium as a crawler:

  1. Browser automation: Selenium lets you drive a browser programmatically: opening pages, clicking buttons, filling in forms, and so on, so you can reproduce what a user would do by hand.

  2. Multiple browsers: Selenium supports the major browsers, including Chrome, Firefox and Edge; pick whichever fits your needs.

  3. Extracting page data: with Selenium you can load a page and pull data out of it, which is especially useful for pages that load content dynamically or require user interaction.

  4. Waiting for elements: because pages may load content asynchronously, Selenium provides wait mechanisms that block until a specific element has appeared before the script continues.

  5. Selectors: Selenium supports the usual locator strategies, such as CSS selectors and XPath, for finding elements on a page.

  6. Scraping dynamic pages: for pages whose content is generated by JavaScript, Selenium is a strong option because it executes the JavaScript and gives you the rendered result (see the sketch after this list).
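
Points 4 to 6 are easiest to see in a few lines of code. The sketch below is not from the original post; the URL and the selector are placeholders. It opens a page, waits explicitly for an element to appear, and prints its text:

# Minimal standalone Selenium sketch (placeholder URL and selector).
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

browser = webdriver.Chrome()   # assumes Chrome and a matching chromedriver are installed
try:
    browser.get("https://example.com")   # placeholder URL
    wait = WebDriverWait(browser, 10)    # wait at most 10 seconds
    # block until the element is present in the DOM (covers JavaScript-rendered content)
    element = wait.until(EC.presence_of_element_located((By.CSS_SELECTOR, "h1")))
    print(element.text)
finally:
    browser.quit()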

Selenium makes many things easier for a crawler, but there are trade-offs. First, scraping with Selenium is comparatively slow, because it goes through a real, user-like browser session. Second, sites may detect automated browsers and actively block them, so use Selenium carefully and respect each site's terms of use.

Before using Selenium this way you need to be familiar with the Scrapy framework: Selenium only does useful crawling work here once it is wired into Scrapy's downloader middleware. For the prerequisites, see: https://blog.csdn.net/shizuguilai/article/details/135554205

Environment setup

1. Install conda first (recommended)

Reference: https://blog.csdn.net/Q_fairy/article/details/129158178

2. Create a virtual environment and install the required packages

# Create a conda environment named "scrapy" (the Python version below is an example;
# pinning one makes sure python and pip exist inside the environment)
conda create -n scrapy python=3.10
# Activate the environment
conda activate scrapy
# Install the required packages
pip install scrapy
pip install selenium
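
A quick, optional check that both packages landed in the new environment (not part of the original post) is to import them from the activated environment and print their versions:

# run with the "scrapy" environment activated
import scrapy
import selenium

print("scrapy:", scrapy.__version__)
print("selenium:", selenium.__version__)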

3. Download ChromeDriver and a matching Chrome version

Reference: https://zhuanlan.zhihu.com/p/665018772
Remember to add the driver's directory to the PATH environment variable.
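
If you are unsure the driver is picked up correctly, a short smoke test helps. This is a hedged sketch: with the driver on PATH, webdriver.Chrome() needs no arguments; the explicit Service path shown is only a placeholder for the case where PATH is not configured.

from selenium import webdriver
from selenium.webdriver.chrome.service import Service

# Option 1: chromedriver is on PATH
browser = webdriver.Chrome()

# Option 2: point Selenium at the driver explicitly (placeholder path)
# service = Service(executable_path=r"C:\chromedriver\chromedriver.exe")
# browser = webdriver.Chrome(service=service)

browser.get("https://example.com")
print(browser.title)   # prints the page title if the driver and browser versions match
browser.quit()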

Code

Directory layout: the Scrapy spider scripts go under the spiders/ package.
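
The screenshot of the project tree from the original post is not reproduced here. As a rough guide, assuming the default "scrapy startproject sw" layout plus the files referenced in this post, the structure looks like this (the spider file name is illustrative):

sw/
├── scrapy.cfg
└── sw/
    ├── __init__.py
    ├── items.py          # SwItem is defined here
    ├── middlewares.py    # SeleniumMiddleware is added here
    ├── pipelines.py
    ├── settings.py
    └── spiders/
        └── nhc_spider.py # the spider script shown below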

settings.py configuration

# Scrapy settings for sw project
#
# For simplicity, this file contains only settings considered important or
# commonly used. You can find more settings consulting the documentation:
#
#     https://docs.scrapy.org/en/latest/topics/settings.html
#     https://docs.scrapy.org/en/latest/topics/downloader-middleware.html
#     https://docs.scrapy.org/en/latest/topics/spider-middleware.html

BOT_NAME = "sw"

SPIDER_MODULES = ["sw.spiders"]
NEWSPIDER_MODULE = "sw.spiders"
DOWNLOAD_DELAY = 3
RANDOMIZE_DOWNLOAD_DELAY = True
USER_AGENT = 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.124 Safari/537.36'
COOKIES_ENABLED = True


# Crawl responsibly by identifying yourself (and your website) on the user-agent
#USER_AGENT = "sw (+http://www.yourdomain.com)"

# Obey robots.txt rules
ROBOTSTXT_OBEY = False

# In settings.py

# ----------- Selenium settings -------------
SELENIUM_TIMEOUT = 25           # Selenium page-load timeout, in seconds
LOAD_IMAGE = True               # whether to load images
WINDOW_HEIGHT = 900             # browser window size
WINDOW_WIDTH = 900

# Configure maximum concurrent requests performed by Scrapy (default: 16)
#CONCURRENT_REQUESTS = 32

# Configure a delay for requests for the same website (default: 0)
# See https://docs.scrapy.org/en/latest/topics/settings.html#download-delay
# See also autothrottle settings and docs
#DOWNLOAD_DELAY = 3
# The download delay setting will honor only one of:
#CONCURRENT_REQUESTS_PER_DOMAIN = 16
#CONCURRENT_REQUESTS_PER_IP = 16

# Disable cookies (enabled by default)
#COOKIES_ENABLED = False

# Disable Telnet Console (enabled by default)
#TELNETCONSOLE_ENABLED = False

# Override the default request headers:
#DEFAULT_REQUEST_HEADERS = {
#    "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
#    "Accept-Language": "en",
#}

# Enable or disable spider middlewares
# See https://docs.scrapy.org/en/latest/topics/spider-middleware.html
#SPIDER_MIDDLEWARES = {
#    "sw.middlewares.SwSpiderMiddleware": 543,
#}

# Enable or disable downloader middlewares
# See https://docs.scrapy.org/en/latest/topics/downloader-middleware.html
#DOWNLOADER_MIDDLEWARES = {
#    "sw.middlewares.SwDownloaderMiddleware": 543,
#}

# Enable or disable extensions
# See https://docs.scrapy.org/en/latest/topics/extensions.html
#EXTENSIONS = {
#    "scrapy.extensions.telnet.TelnetConsole": None,
#}

# Configure item pipelines
# See https://docs.scrapy.org/en/latest/topics/item-pipeline.html
#ITEM_PIPELINES = {
#    "sw.pipelines.SwPipeline": 300,
#}
# ITEM_PIPELINES = {
#    "sw.pipelines.SwPipeline": 300,
# }

# DB_SETTINGS = {
#     'host': '127.0.0.1',
#     'port': 3306,
#     'user': 'root',
#     'password': '123456',
#     'db': 'scrapy_news_2024_01_08',
#     'charset': 'utf8mb4',
# }

# Enable and configure the AutoThrottle extension (disabled by default)
# See https://docs.scrapy.org/en/latest/topics/autothrottle.html
#AUTOTHROTTLE_ENABLED = True
# The initial download delay
#AUTOTHROTTLE_START_DELAY = 5
# The maximum download delay to be set in case of high latencies
#AUTOTHROTTLE_MAX_DELAY = 60
# The average number of requests Scrapy should be sending in parallel to
# each remote server
#AUTOTHROTTLE_TARGET_CONCURRENCY = 1.0
# Enable showing throttling stats for every response received:
#AUTOTHROTTLE_DEBUG = False

# Enable and configure HTTP caching (disabled by default)
# See https://docs.scrapy.org/en/latest/topics/downloader-middleware.html#httpcache-middleware-settings
#HTTPCACHE_ENABLED = True
#HTTPCACHE_EXPIRATION_SECS = 0
#HTTPCACHE_DIR = "httpcache"
#HTTPCACHE_IGNORE_HTTP_CODES = []
#HTTPCACHE_STORAGE = "scrapy.extensions.httpcache.FilesystemCacheStorage"

# Set settings whose default value is deprecated to a future-proof value
REQUEST_FINGERPRINTER_IMPLEMENTATION = "2.7"
TWISTED_REACTOR = "twisted.internet.asyncioreactor.AsyncioSelectorReactor"
FEED_EXPORT_ENCODING = "utf-8"
# REDIRECT_ENABLED = False

Scrapy spider example

"""
Created on 2024/01/06 14:00 by Fxy
"""
import scrapy
from sw.items import SwItem
import time
from datetime import datetime
import locale
# from scrapy_splash import SplashRequest  # not used in this example; requires scrapy-splash if enabled
# Scrapy signal helpers
from scrapy.utils.project import get_project_settings
# the import below has been deprecated and removed from Scrapy, so it is not used
# from scrapy.xlib.pydispatch import dispatcher
from scrapy import signals
# the approach Scrapy currently recommends
from pydispatch import dispatcher
from selenium import webdriver
from selenium.webdriver.support.ui import WebDriverWait


class NhcSpider(scrapy.Spider):
    '''
    Scrapy attributes
    '''
    # spider name
    name = "1000_nhc"
    # domains the spider is allowed to crawl
    allowed_domains = ["xxxx.cn"]
    # start URL(s)
    start_urls = ["xxxx.shtml"]
    # create a single SwItem instance (reused for every parsed page)
    item = SwItem()
    custom_settings = {
        'LOG_LEVEL': 'INFO',
        'DOWNLOAD_DELAY': 0,
        'COOKIES_ENABLED': False,  # enabled by default
        'DOWNLOADER_MIDDLEWARES': {
            # the SeleniumMiddleware defined in middlewares.py
            'sw.middlewares.SeleniumMiddleware': 543,  # the number is the middleware priority
            # disable Scrapy's default user-agent middleware
            'scrapy.downloadermiddlewares.useragent.UserAgentMiddleware': None,
        }
    }
    '''
    Custom attributes
    '''
    # organisation name
    org = "xxxx数据"
    # organisation name in English
    org_e = "None"
    # date formats
    site_date_format = '发布时间:\n            \t%Y-%m-%d\n            ' # the date format as it appears on the page (label text kept verbatim)
    date_format = '%d.%m.%Y %H:%M:%S' # target date format
    # language pair
    language_type = "zh2zh"  # zh-to-zh language code, used when calling the translation API
    # flags read by the Selenium middleware
    meta = {'usedSelenium': name, 'dont_redirect': True}

    # Chrome is initialised inside the spider, so the browser object lives on the spider
    def __init__(self, timeout=40, isLoadImage=True, windowHeight=None, windowWidth=None):
        # read the parameters from settings.py
        self.mySetting = get_project_settings()
        self.timeout = self.mySetting['SELENIUM_TIMEOUT']
        self.isLoadImage = self.mySetting['LOAD_IMAGE']
        self.windowHeight = self.mySetting['WINDOW_HEIGHT']
        self.windowWidth = self.mySetting['WINDOW_WIDTH']
        # initialise the Chrome options
        options = webdriver.ChromeOptions()
        options.add_experimental_option('useAutomationExtension', False)  # hide Selenium automation traits
        options.add_experimental_option('excludeSwitches', ['enable-automation'])  # hide Selenium automation traits
        options.add_argument('--ignore-certificate-errors')  # ignore certificate errors
        options.add_argument('--ignore-certificate-errors-spki-list')
        options.add_argument('--ignore-ssl-errors')  # ignore SSL errors
        # chrome_options = webdriver.ChromeOptions()
        # chrome_options.binary_location = "E:\\学校的一些资料\\文档\研二上\\chrome-win64\\chrome.exe"  # replace with the path to your specific Chrome build
        # 1. Create a Chrome (or Firefox) browser object; this opens a browser window on the machine
        # browser = webdriver.Chrome(executable_path="E:\\chromedriver\\chromedriver", chrome_options=chrome_options)  # first argument: driver path, second: browser options
        self.browser = webdriver.Chrome(options=options)  # "chrome_options=" is deprecated; use "options="
        self.browser.execute_cdp_cmd("Page.addScriptToEvaluateOnNewDocument", {  # hide the webdriver flag
            "source": """
            Object.defineProperty(navigator, 'webdriver', {
            get: () => undefined
            })
            """
        })
        if self.windowHeight and self.windowWidth:
            self.browser.set_window_size(self.windowWidth, self.windowHeight)
        self.browser.set_page_load_timeout(self.timeout)   # page-load timeout
        self.wait = WebDriverWait(self.browser, 30)         # explicit-wait timeout for locating elements
        super(NhcSpider, self).__init__()
        # connect the spider_closed signal to mySpiderCloseHandle so Chrome is quit when the spider finishes
        dispatcher.connect(receiver=self.mySpiderCloseHandle,
                           signal=signals.spider_closed)


    # signal handler: quit the Chrome browser
    def mySpiderCloseHandle(self, spider):
        print(f"mySpiderCloseHandle: enter ")
        self.browser.quit()

    def start_requests(self):
        yield scrapy.Request(url=self.start_urls[0],
            meta=self.meta,
            callback=self.parse,
            # errback = self.error
        )

    # Main entry point: collect the archive article links from the response
    def parse(self, response):
        # locale.setlocale(locale.LC_TIME, 'en_US')  # switch the locale to English if the site dates need it

        achieve_links = response.xpath('//ul[@class="zxxx_list"]/li/a/@href').extract()
        print("achieve_links", achieve_links)
        for achieve_link in achieve_links:
            full_achieve_link = "http://xxxx.cn" + achieve_link
            print("full_achieve_link", full_achieve_link)
            # follow each archive link
            yield scrapy.Request(full_achieve_link, callback=self.parse_item, dont_filter=True, meta=self.meta)

        # pagination: "下一页" is the site's "next page" link text
        xpath_expression = '//*[@id="page_div"]/div[@class="pagination_index"]/span/a[text()="下一页"]/@href'
        next_page = response.xpath(xpath_expression).extract_first()
        print("next_page = ", next_page)

        # go to the next page
        if next_page is not None:
            full_next_page = "http://xxxx/" + next_page
            print("full_next_page", full_next_page)
            meta_page = {'usedSelenium': self.name, "whether_wait_id": True}  # pagination requests use different meta: the middleware also waits for the pager element
            yield scrapy.Request(full_next_page, callback=self.parse, dont_filter=True, meta=meta_page)


    # Extract each article's content and store it in the item
    def parse_item(self, response):
        source_url = response.url
        title_o = response.xpath('//div[@class="tit"]/text()').extract_first().strip()
        # title_t = my_tools.get_trans(title_o, "de2zh")
        publish_time = response.xpath('//div[@class="source"]/span[1]/text()').extract_first()
        date_object = datetime.strptime(publish_time, self.site_date_format)  # parse using the page's date format
        date_object = date_object.strftime(self.date_format)                  # re-format as the target date string
        publish_time = datetime.strptime(date_object, self.date_format)       # parse the formatted string back into a datetime

        content_o = [content.strip() for content in response.xpath('//div[@id="xw_box"]//text()').extract()]
        # content_o = ' '.join(content_o)  # content_o is a list of strings; join it if a single string is needed
        # content_t = my_tools.get_trans(content_o, "de2zh")

        print("source_url:", source_url)
        print("title_o:", title_o)
        # print("title_t:", title_t)
        print("publish_time:", publish_time)  # e.g. 15.01.2008
        print("content_o:", content_o)
        # print("content_t:", content_t)
        print("-" * 50)

        page_data = {
            'source_url': source_url,
            'title_o': title_o,
            # 'title_t': title_t,
            'publish_time': publish_time,
            'content_o': content_o,
            # 'content_t': content_t,
            'org': self.org,
            'org_e': self.org_e,
        }
        self.item['url'] = page_data['source_url']
        self.item['title'] = page_data['title_o']
        # self.item['title_t'] = page_data['title_t']
        self.item['time'] = page_data['publish_time']
        self.item['content'] = page_data['content_o']
        # self.item['content_t'] = page_data['content_t']
        # record the scrape time, truncated to the target date format
        current_time = datetime.now()
        formatted_time = current_time.strftime(self.date_format)
        datetime_object = datetime.strptime(formatted_time, self.date_format)
        self.item['scrapy_time'] = datetime_object
        self.item['org'] = page_data['org']
        self.item['trans_org'] = page_data['org_e']

        yield self.item
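
The spider above fills an SwItem imported from sw.items, but items.py is not shown in the original post. A minimal sketch, assuming the item only needs the fields the spider actually sets:

# sw/items.py -- hedged sketch; the field names come from the spider above
import scrapy


class SwItem(scrapy.Item):
    url = scrapy.Field()          # article URL
    title = scrapy.Field()        # original title
    time = scrapy.Field()         # publish time (datetime)
    content = scrapy.Field()      # list of text fragments from the article body
    scrapy_time = scrapy.Field()  # time the item was scraped
    org = scrapy.Field()          # organisation name
    trans_org = scrapy.Field()    # organisation name in English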

Middleware: middlewares.py

# Define here the models for your spider middleware
#
# See documentation in:
# https://docs.scrapy.org/en/latest/topics/spider-middleware.html

from scrapy import signals

# useful for handling different item types with a single interface
from itemadapter import is_item, ItemAdapter


class SwSpiderMiddleware:
   # Not all methods need to be defined. If a method is not defined,
   # scrapy acts as if the spider middleware does not modify the
   # passed objects.

   @classmethod
   def from_crawler(cls, crawler):
       # This method is used by Scrapy to create your spiders.
       s = cls()
       crawler.signals.connect(s.spider_opened, signal=signals.spider_opened)
       return s

   def process_spider_input(self, response, spider):
       # Called for each response that goes through the spider
       # middleware and into the spider.

       # Should return None or raise an exception.
       return None

   def process_spider_output(self, response, result, spider):
       # Called with the results returned from the Spider, after
       # it has processed the response.

       # Must return an iterable of Request, or item objects.
       for i in result:
           yield i

   def process_spider_exception(self, response, exception, spider):
       # Called when a spider or process_spider_input() method
       # (from other spider middleware) raises an exception.

       # Should return either None or an iterable of Request or item objects.
       pass

   def process_start_requests(self, start_requests, spider):
       # Called with the start requests of the spider, and works
       # similarly to the process_spider_output() method, except
       # that it doesn’t have a response associated.

       # Must return only requests (not items).
       for r in start_requests:
           yield r

   def spider_opened(self, spider):
       spider.logger.info("Spider opened: %s" % spider.name)


class SwDownloaderMiddleware:
   # Not all methods need to be defined. If a method is not defined,
   # scrapy acts as if the downloader middleware does not modify the
   # passed objects.

   @classmethod
   def from_crawler(cls, crawler):
       # This method is used by Scrapy to create your spiders.
       s = cls()
       crawler.signals.connect(s.spider_opened, signal=signals.spider_opened)
       return s

   def process_request(self, request, spider):
       # Called for each request that goes through the downloader
       # middleware.

       # Must either:
       # - return None: continue processing this request
       # - or return a Response object
       # - or return a Request object
       # - or raise IgnoreRequest: process_exception() methods of
       #   installed downloader middleware will be called
       return None

   def process_response(self, request, response, spider):
       # Called with the response returned from the downloader.

       # Must either;
       # - return a Response object
       # - return a Request object
       # - or raise IgnoreRequest
       return response

   def process_exception(self, request, exception, spider):
       # Called when a download handler or a process_request()
       # (from other downloader middleware) raises an exception.

       # Must either:
       # - return None: continue processing this exception
       # - return a Response object: stops process_exception() chain
       # - return a Request object: stops process_exception() chain
       pass

   def spider_opened(self, spider):
       spider.logger.info("Spider opened: %s" % spider.name)


# -*- coding: utf-8 -*-  Selenium-based middleware below
from selenium import webdriver
from selenium.common.exceptions import TimeoutException
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.keys import Keys
from scrapy.http import HtmlResponse
from logging import getLogger
import time

class SeleniumMiddleware:
    # The spider instance is passed in, so the Chrome browser created in its __init__ is available here
    def process_request(self, request, spider):
        '''
        Fetch the page with Chrome
        :param request: the Request object
        :param spider: the Spider object
        :return: an HtmlResponse
        '''
        print(f"chrome is getting page = {request.url}")
        # the usedSelenium flag in meta decides whether this request is rendered with Selenium
        usedSelenium = request.meta.get('usedSelenium', None)  # read usedSelenium from request.meta; default to None if absent
        if usedSelenium == "1000_nhc":
            try:
                spider.browser.get(request.url)
                time.sleep(4)
                if request.meta.get('whether_wait_id', False):  # read whether_wait_id from request.meta; default to False if absent
                    print("Waiting for the pagination element to appear...")
                    # wait explicitly for the page to finish loading
                    wait = WebDriverWait(spider.browser, 20)  # wait at most 20 seconds
                    # example: wait until a specific element is present; adjust to the target page
                    wait.until(EC.presence_of_element_located((By.ID, "page_div")))  # wait for the pager before moving on
            except TimeoutException:  # the element never appeared: retry the request
                print("Timeout waiting for element. Retrying the request.")
                # Note: the original post calls self.retry_request(), which is not defined in the
                # snippet; returning a copy of the request is one simple way to reschedule it.
                return request.replace(dont_filter=True)
            except Exception as e:
                print(f"chrome getting page error, Exception = {e}")
                return HtmlResponse(url=request.url, status=500, request=request)
            else:
                time.sleep(4)
                # the page loaded: build a successful response (HtmlResponse is a Response subclass)
                return HtmlResponse(url=request.url,
                                    body=spider.browser.page_source,
                                    request=request,
                                    # ideally match the page's actual encoding
                                    encoding='utf-8',
                                    status=200)


            # try:
            #     spider.browser.get(request.url)
            #     # wait for the search box to appear
            #     input = spider.wait.until(
            #         EC.presence_of_element_located((By.XPATH, "//div[@class='nav-search-field ']/input"))
            #     )
            #     time.sleep(2)
            #     input.clear()
            #     input.send_keys("iphone 7s")
            #     # press Enter to run the search
            #     input.send_keys(Keys.RETURN)
            #     # wait for the search results to appear
            #     searchRes = spider.wait.until(
            #         EC.presence_of_element_located((By.XPATH, "//div[@id='resultsCol']"))
            #     )
            # except Exception as e:
            #     print(f"chrome getting page error, Exception = {e}")
            #     return HtmlResponse(url=request.url, status=500, request=request)
            # else:
            #     time.sleep(3)
            #     # the page loaded: build a successful response (HtmlResponse is a Response subclass)
            #     return HtmlResponse(url=request.url,
            #                         body=spider.browser.page_source,
            #                         request=request,
            #                         # ideally match the page's actual encoding
            #                         encoding='utf-8',
            #                         status=200)

        # for any other request, fall through (implicitly return None) so Scrapy
        # downloads the page with its normal downloader


Appendix: Selenium tutorials

Reference 1, waiting for a specific element with Selenium: https://selenium-python-zh.readthedocs.io/en/latest/waits.html
Reference 2, Selenium usage in detail: https://pythondjango.cn/python/tools/7-python_selenium/#%E5%85%83%E7%B4%A0%E5%AE%9A%E4%BD%8D%E6%96%B9%E6%B3%95
Reference 3, a worked example from another author: https://blog.csdn.net/zwq912318834/article/details/79773870
