Python爬虫学习笔记(十二)————scrapy案例

news2025/1/11 12:46:27

目录

1.yield

2.案例:当当网

3.案例:电影天堂


1.yield

(1)带有 yield 的函数不再是一个普通函数,而是一个生成器generator,可用于迭代

(2) yield 是一个类似 return 的关键字,迭代一次遇到yield时就返回yield后面(右边)的值。重点是:下一次迭代 时,从上一次迭代遇到的yield后面的代码(下一行)开始执行

(3)简要理解:yield就是 return 返回一个值,并且记住这个返回的位置,下次迭代就从这个位置后(下一行)开始

2.案例:当当网

(1)工程结构

(2)__init__.py文件

# This package will contain the spiders of your Scrapy project
#
# Please refer to the documentation for information on how to create and manage
# your spiders.

(3)dang.py文件

import scrapy
from scrapy_dangdang_095.items import ScrapyDangdang095Item



class DangSpider(scrapy.Spider):
    name = 'dang'
    # 如果是多页下载的话 那么必须要调整的是allowed_domains的范围 一般情况下只写域名
    allowed_domains = ['category.dangdang.com']
    start_urls = ['http://category.dangdang.com/cp01.01.02.00.00.00.html']


    base_url = 'http://category.dangdang.com/pg'
    page = 1

    def parse(self, response):
#       pipelines 下载数据
#       items     定义数据结构的
#         src = //ul[@id="component_59"]/li//img/@src
#         alt = //ul[@id="component_59"]/li//img/@alt
#         price = //ul[@id="component_59"]/li//p[@class="price"]/span[1]/text()
#         所有的seletor的对象 都可以再次调用xpath方法
        li_list = response.xpath('//ul[@id="component_59"]/li')

        for li in li_list:
            src = li.xpath('.//img/@data-original').extract_first()
            # 第一张图片和其他的图片的标签的属性是不一样的
            # 第一张图片的src是可以使用的  其他的图片的地址是data-original
            if src:
                src = src
            else:
                src = li.xpath('.//img/@src').extract_first()

            name = li.xpath('.//img/@alt').extract_first()
            price = li.xpath('.//p[@class="price"]/span[1]/text()').extract_first()

            book = ScrapyDangdang095Item(src=src,name=name,price=price)

            # 获取一个book就将book交给pipelines
            yield book


#       每一页的爬取的业务逻辑全都是一样的,所以我们只需要将执行的那个页的请求再次调用parse方法
# 就可以了
#         http://category.dangdang.com/pg2-cp01.01.02.00.00.00.html
#         http://category.dangdang.com/pg3-cp01.01.02.00.00.00.html
#         http://category.dangdang.com/pg4-cp01.01.02.00.00.00.html

        if self.page < 100:
            self.page = self.page + 1

            url = self.base_url + str(self.page) + '-cp01.01.02.00.00.00.html'

#             怎么去调用parse方法
#             scrapy.Request就是scrpay的get请求
#             url就是请求地址
#             callback是你要执行的那个函数  注意不需要加()
            yield scrapy.Request(url=url,callback=self.parse)

(4)items.py文件

# Define here the models for your scraped items
#
# See documentation in:
# https://docs.scrapy.org/en/latest/topics/items.html

import scrapy


class ScrapyDangdang095Item(scrapy.Item):
    # define the fields for your item here like:
    # name = scrapy.Field()
    # 通俗的说就是你要下载的数据都有什么

    # 图片
    src = scrapy.Field()
    # 名字
    name = scrapy.Field()
    # 价格
    price = scrapy.Field()

(5)middlewares.py文件

# Define here the models for your spider middleware
#
# See documentation in:
# https://docs.scrapy.org/en/latest/topics/spider-middleware.html

from scrapy import signals

# useful for handling different item types with a single interface
from itemadapter import is_item, ItemAdapter


class ScrapyDangdang095SpiderMiddleware:
    # Not all methods need to be defined. If a method is not defined,
    # scrapy acts as if the spider middleware does not modify the
    # passed objects.

    @classmethod
    def from_crawler(cls, crawler):
        # This method is used by Scrapy to create your spiders.
        s = cls()
        crawler.signals.connect(s.spider_opened, signal=signals.spider_opened)
        return s

    def process_spider_input(self, response, spider):
        # Called for each response that goes through the spider
        # middleware and into the spider.

        # Should return None or raise an exception.
        return None

    def process_spider_output(self, response, result, spider):
        # Called with the results returned from the Spider, after
        # it has processed the response.

        # Must return an iterable of Request, or item objects.
        for i in result:
            yield i

    def process_spider_exception(self, response, exception, spider):
        # Called when a spider or process_spider_input() method
        # (from other spider middleware) raises an exception.

        # Should return either None or an iterable of Request or item objects.
        pass

    def process_start_requests(self, start_requests, spider):
        # Called with the start requests of the spider, and works
        # similarly to the process_spider_output() method, except
        # that it doesn’t have a response associated.

        # Must return only requests (not items).
        for r in start_requests:
            yield r

    def spider_opened(self, spider):
        spider.logger.info('Spider opened: %s' % spider.name)


class ScrapyDangdang095DownloaderMiddleware:
    # Not all methods need to be defined. If a method is not defined,
    # scrapy acts as if the downloader middleware does not modify the
    # passed objects.

    @classmethod
    def from_crawler(cls, crawler):
        # This method is used by Scrapy to create your spiders.
        s = cls()
        crawler.signals.connect(s.spider_opened, signal=signals.spider_opened)
        return s

    def process_request(self, request, spider):
        # Called for each request that goes through the downloader
        # middleware.

        # Must either:
        # - return None: continue processing this request
        # - or return a Response object
        # - or return a Request object
        # - or raise IgnoreRequest: process_exception() methods of
        #   installed downloader middleware will be called
        return None

    def process_response(self, request, response, spider):
        # Called with the response returned from the downloader.

        # Must either;
        # - return a Response object
        # - return a Request object
        # - or raise IgnoreRequest
        return response

    def process_exception(self, request, exception, spider):
        # Called when a download handler or a process_request()
        # (from other downloader middleware) raises an exception.

        # Must either:
        # - return None: continue processing this exception
        # - return a Response object: stops process_exception() chain
        # - return a Request object: stops process_exception() chain
        pass

    def spider_opened(self, spider):
        spider.logger.info('Spider opened: %s' % spider.name)

(6) pipelines.py文件

# Define your item pipelines here
#
# Don't forget to add your pipeline to the ITEM_PIPELINES setting
# See: https://docs.scrapy.org/en/latest/topics/item-pipeline.html


# useful for handling different item types with a single interface
from itemadapter import ItemAdapter

# 如果想使用管道的话 那么就必须在settings中开启管道
class ScrapyDangdang095Pipeline:
    # 在爬虫文件开始的之前就执行的一个方法
    def open_spider(self,spider):
        self.fp = open('book.json','w',encoding='utf-8')

    # item就是yield后面的book对象
    def process_item(self, item, spider):
        # 以下这种模式不推荐  因为每传递过来一个对象 那么就打开一次文件  对文件的操作过于频繁

        # # (1) write方法必须要写一个字符串 而不能是其他的对象
        # # (2) w模式 会每一个对象都打开一次文件 覆盖之前的内容
        # with open('book.json','a',encoding='utf-8')as fp:
        #     fp.write(str(item))

        self.fp.write(str(item))


        return item

    # 在爬虫文件执行完之后  执行的方法
    def close_spider(self,spider):
        self.fp.close()

import urllib.request

# 多条管道开启
#    (1) 定义管道类
#   (2) 在settings中开启管道
# 'scrapy_dangdang_095.pipelines.DangDangDownloadPipeline':301
class DangDangDownloadPipeline:
    def process_item(self, item, spider):

        url = 'http:' + item.get('src')
        filename = './books/' + item.get('name') + '.jpg'

        urllib.request.urlretrieve(url = url, filename= filename)

        return item

(7)settings.py文件

# Scrapy settings for scrapy_dangdang_095 project
#
# For simplicity, this file contains only settings considered important or
# commonly used. You can find more settings consulting the documentation:
#
#     https://docs.scrapy.org/en/latest/topics/settings.html
#     https://docs.scrapy.org/en/latest/topics/downloader-middleware.html
#     https://docs.scrapy.org/en/latest/topics/spider-middleware.html

BOT_NAME = 'scrapy_dangdang_095'

SPIDER_MODULES = ['scrapy_dangdang_095.spiders']
NEWSPIDER_MODULE = 'scrapy_dangdang_095.spiders'


# Crawl responsibly by identifying yourself (and your website) on the user-agent
#USER_AGENT = 'scrapy_dangdang_095 (+http://www.yourdomain.com)'

# Obey robots.txt rules
ROBOTSTXT_OBEY = True

# Configure maximum concurrent requests performed by Scrapy (default: 16)
#CONCURRENT_REQUESTS = 32

# Configure a delay for requests for the same website (default: 0)
# See https://docs.scrapy.org/en/latest/topics/settings.html#download-delay
# See also autothrottle settings and docs
#DOWNLOAD_DELAY = 3
# The download delay setting will honor only one of:
#CONCURRENT_REQUESTS_PER_DOMAIN = 16
#CONCURRENT_REQUESTS_PER_IP = 16

# Disable cookies (enabled by default)
#COOKIES_ENABLED = False

# Disable Telnet Console (enabled by default)
#TELNETCONSOLE_ENABLED = False

# Override the default request headers:
#DEFAULT_REQUEST_HEADERS = {
#   'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
#   'Accept-Language': 'en',
#}

# Enable or disable spider middlewares
# See https://docs.scrapy.org/en/latest/topics/spider-middleware.html
#SPIDER_MIDDLEWARES = {
#    'scrapy_dangdang_095.middlewares.ScrapyDangdang095SpiderMiddleware': 543,
#}

# Enable or disable downloader middlewares
# See https://docs.scrapy.org/en/latest/topics/downloader-middleware.html
#DOWNLOADER_MIDDLEWARES = {
#    'scrapy_dangdang_095.middlewares.ScrapyDangdang095DownloaderMiddleware': 543,
#}

# Enable or disable extensions
# See https://docs.scrapy.org/en/latest/topics/extensions.html
#EXTENSIONS = {
#    'scrapy.extensions.telnet.TelnetConsole': None,
#}

# Configure item pipelines
# See https://docs.scrapy.org/en/latest/topics/item-pipeline.html
ITEM_PIPELINES = {
   #  管道可以有很多个  那么管道是有优先级的  优先级的范围是1到1000   值越小优先级越高
   'scrapy_dangdang_095.pipelines.ScrapyDangdang095Pipeline': 300,

#    DangDangDownloadPipeline
   'scrapy_dangdang_095.pipelines.DangDangDownloadPipeline':301
}

# Enable and configure the AutoThrottle extension (disabled by default)
# See https://docs.scrapy.org/en/latest/topics/autothrottle.html
#AUTOTHROTTLE_ENABLED = True
# The initial download delay
#AUTOTHROTTLE_START_DELAY = 5
# The maximum download delay to be set in case of high latencies
#AUTOTHROTTLE_MAX_DELAY = 60
# The average number of requests Scrapy should be sending in parallel to
# each remote server
#AUTOTHROTTLE_TARGET_CONCURRENCY = 1.0
# Enable showing throttling stats for every response received:
#AUTOTHROTTLE_DEBUG = False

# Enable and configure HTTP caching (disabled by default)
# See https://docs.scrapy.org/en/latest/topics/downloader-middleware.html#httpcache-middleware-settings
#HTTPCACHE_ENABLED = True
#HTTPCACHE_EXPIRATION_SECS = 0
#HTTPCACHE_DIR = 'httpcache'
#HTTPCACHE_IGNORE_HTTP_CODES = []
#HTTPCACHE_STORAGE = 'scrapy.extensions.httpcache.FilesystemCacheStorage'

3.案例:电影天堂

(1)工程结构

 

 (2)mv.py文件

import scrapy

from scrapy_movie_099.items import ScrapyMovie099Item

class MvSpider(scrapy.Spider):
    name = 'mv'
    allowed_domains = ['www.dytt8.net']
    start_urls = ['https://www.dytt8.net/html/gndy/china/index.html']

    def parse(self, response):
#         要第一个的名字 和 第二页的图片
        a_list = response.xpath('//div[@class="co_content8"]//td[2]//a[2]')

        for a in a_list:
            # 获取第一页的name 和 要点击的链接
            name = a.xpath('./text()').extract_first()
            href = a.xpath('./@href').extract_first()

            # 第二页的地址是
            url = 'https://www.dytt8.net' + href


            # 对第二页的链接发起访问
            yield  scrapy.Request(url=url,callback=self.parse_second,meta={'name':name})

    def parse_second(self,response):
        # 注意 如果拿不到数据的情况下  一定检查你的xpath语法是否正确
        src = response.xpath('//div[@id="Zoom"]//img/@src').extract_first()
        # 接受到请求的那个meta参数的值
        name = response.meta['name']

        movie = ScrapyMovie099Item(src=src,name=name)

        yield movie

(3)items.py文件

# Define here the models for your scraped items
#
# See documentation in:
# https://docs.scrapy.org/en/latest/topics/items.html

import scrapy


class ScrapyMovie099Item(scrapy.Item):
    # define the fields for your item here like:
    # name = scrapy.Field()
    name = scrapy.Field()
    src = scrapy.Field()

(4)middlewares.py文件

# Define here the models for your spider middleware
#
# See documentation in:
# https://docs.scrapy.org/en/latest/topics/spider-middleware.html

from scrapy import signals

# useful for handling different item types with a single interface
from itemadapter import is_item, ItemAdapter


class ScrapyMovie099SpiderMiddleware:
    # Not all methods need to be defined. If a method is not defined,
    # scrapy acts as if the spider middleware does not modify the
    # passed objects.

    @classmethod
    def from_crawler(cls, crawler):
        # This method is used by Scrapy to create your spiders.
        s = cls()
        crawler.signals.connect(s.spider_opened, signal=signals.spider_opened)
        return s

    def process_spider_input(self, response, spider):
        # Called for each response that goes through the spider
        # middleware and into the spider.

        # Should return None or raise an exception.
        return None

    def process_spider_output(self, response, result, spider):
        # Called with the results returned from the Spider, after
        # it has processed the response.

        # Must return an iterable of Request, or item objects.
        for i in result:
            yield i

    def process_spider_exception(self, response, exception, spider):
        # Called when a spider or process_spider_input() method
        # (from other spider middleware) raises an exception.

        # Should return either None or an iterable of Request or item objects.
        pass

    def process_start_requests(self, start_requests, spider):
        # Called with the start requests of the spider, and works
        # similarly to the process_spider_output() method, except
        # that it doesn’t have a response associated.

        # Must return only requests (not items).
        for r in start_requests:
            yield r

    def spider_opened(self, spider):
        spider.logger.info('Spider opened: %s' % spider.name)


class ScrapyMovie099DownloaderMiddleware:
    # Not all methods need to be defined. If a method is not defined,
    # scrapy acts as if the downloader middleware does not modify the
    # passed objects.

    @classmethod
    def from_crawler(cls, crawler):
        # This method is used by Scrapy to create your spiders.
        s = cls()
        crawler.signals.connect(s.spider_opened, signal=signals.spider_opened)
        return s

    def process_request(self, request, spider):
        # Called for each request that goes through the downloader
        # middleware.

        # Must either:
        # - return None: continue processing this request
        # - or return a Response object
        # - or return a Request object
        # - or raise IgnoreRequest: process_exception() methods of
        #   installed downloader middleware will be called
        return None

    def process_response(self, request, response, spider):
        # Called with the response returned from the downloader.

        # Must either;
        # - return a Response object
        # - return a Request object
        # - or raise IgnoreRequest
        return response

    def process_exception(self, request, exception, spider):
        # Called when a download handler or a process_request()
        # (from other downloader middleware) raises an exception.

        # Must either:
        # - return None: continue processing this exception
        # - return a Response object: stops process_exception() chain
        # - return a Request object: stops process_exception() chain
        pass

    def spider_opened(self, spider):
        spider.logger.info('Spider opened: %s' % spider.name)

(5)pipelines.py文件

# Define your item pipelines here
#
# Don't forget to add your pipeline to the ITEM_PIPELINES setting
# See: https://docs.scrapy.org/en/latest/topics/item-pipeline.html


# useful for handling different item types with a single interface
from itemadapter import ItemAdapter


class ScrapyMovie099Pipeline:

    def open_spider(self,spider):
        self.fp = open('movie.json','w',encoding='utf-8')


    def process_item(self, item, spider):

        self.fp.write(str(item))
        return item


    def close_spider(self,spider):
        self.fp.close()

(6)settings.py文件

# Scrapy settings for scrapy_movie_099 project
#
# For simplicity, this file contains only settings considered important or
# commonly used. You can find more settings consulting the documentation:
#
#     https://docs.scrapy.org/en/latest/topics/settings.html
#     https://docs.scrapy.org/en/latest/topics/downloader-middleware.html
#     https://docs.scrapy.org/en/latest/topics/spider-middleware.html

BOT_NAME = 'scrapy_movie_099'

SPIDER_MODULES = ['scrapy_movie_099.spiders']
NEWSPIDER_MODULE = 'scrapy_movie_099.spiders'


# Crawl responsibly by identifying yourself (and your website) on the user-agent
#USER_AGENT = 'scrapy_movie_099 (+http://www.yourdomain.com)'

# Obey robots.txt rules
ROBOTSTXT_OBEY = True

# Configure maximum concurrent requests performed by Scrapy (default: 16)
#CONCURRENT_REQUESTS = 32

# Configure a delay for requests for the same website (default: 0)
# See https://docs.scrapy.org/en/latest/topics/settings.html#download-delay
# See also autothrottle settings and docs
#DOWNLOAD_DELAY = 3
# The download delay setting will honor only one of:
#CONCURRENT_REQUESTS_PER_DOMAIN = 16
#CONCURRENT_REQUESTS_PER_IP = 16

# Disable cookies (enabled by default)
#COOKIES_ENABLED = False

# Disable Telnet Console (enabled by default)
#TELNETCONSOLE_ENABLED = False

# Override the default request headers:
#DEFAULT_REQUEST_HEADERS = {
#   'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
#   'Accept-Language': 'en',
#}

# Enable or disable spider middlewares
# See https://docs.scrapy.org/en/latest/topics/spider-middleware.html
#SPIDER_MIDDLEWARES = {
#    'scrapy_movie_099.middlewares.ScrapyMovie099SpiderMiddleware': 543,
#}

# Enable or disable downloader middlewares
# See https://docs.scrapy.org/en/latest/topics/downloader-middleware.html
#DOWNLOADER_MIDDLEWARES = {
#    'scrapy_movie_099.middlewares.ScrapyMovie099DownloaderMiddleware': 543,
#}

# Enable or disable extensions
# See https://docs.scrapy.org/en/latest/topics/extensions.html
#EXTENSIONS = {
#    'scrapy.extensions.telnet.TelnetConsole': None,
#}

# Configure item pipelines
# See https://docs.scrapy.org/en/latest/topics/item-pipeline.html
ITEM_PIPELINES = {
   'scrapy_movie_099.pipelines.ScrapyMovie099Pipeline': 300,
}

# Enable and configure the AutoThrottle extension (disabled by default)
# See https://docs.scrapy.org/en/latest/topics/autothrottle.html
#AUTOTHROTTLE_ENABLED = True
# The initial download delay
#AUTOTHROTTLE_START_DELAY = 5
# The maximum download delay to be set in case of high latencies
#AUTOTHROTTLE_MAX_DELAY = 60
# The average number of requests Scrapy should be sending in parallel to
# each remote server
#AUTOTHROTTLE_TARGET_CONCURRENCY = 1.0
# Enable showing throttling stats for every response received:
#AUTOTHROTTLE_DEBUG = False

# Enable and configure HTTP caching (disabled by default)
# See https://docs.scrapy.org/en/latest/topics/downloader-middleware.html#httpcache-middleware-settings
#HTTPCACHE_ENABLED = True
#HTTPCACHE_EXPIRATION_SECS = 0
#HTTPCACHE_DIR = 'httpcache'
#HTTPCACHE_IGNORE_HTTP_CODES = []
#HTTPCACHE_STORAGE = 'scrapy.extensions.httpcache.FilesystemCacheStorage'

本文来自互联网用户投稿,该文观点仅代表作者本人,不代表本站立场。本站仅提供信息存储空间服务,不拥有所有权,不承担相关法律责任。如若转载,请注明出处:http://www.coloradmin.cn/o/778562.html

如若内容造成侵权/违法违规/事实不符,请联系多彩编程网进行投诉反馈,一经查实,立即删除!

相关文章

《数据分析-JiMuReport07》JiMuReport报表开发-下拉框条数参数调整

JimuReport报表下拉框条数参数调整 {selectSearchPageSize:n} 1.下拉框条数限制 下拉框默认只显示10条记录&#xff0c;如果想要显示更多条数可以通过添加参数实现。 2.参数 selectSearchPageSize参数&#xff0c;设置参数大小 3.效果 可以看到设置的下拉框条数20条已经实现

细说小程序底部标签---【浅入深出系列006】

浅入深出系列总目录在000集 如何0元学微信小程序–【浅入深出系列000】 文章目录 本系列校训学习资源的选择 学习语法的前提底部标签的总概鹅厂的自定义标签官方说明&#xff1a; 先来了解app.json文件tabBar 位于app.json哪里 使用流程要注意的是&#xff1a;配套资源作业&a…

el-popover在原生table中,弹出多个以及内部取消按钮无效问题

问题&#xff1a;当el-popover和原生table同时使用的时候会失效&#xff08;不是el-table) <el-popover placement"bottom" width"500" trigger"click" :key"popover-${item.id}"></el-popover> 解决&#xff1a; :key…

虚拟数字人——NeRF实现实时对话数字人

前言 1.这是一个能实时对话的虚拟数字人demo,使用的是NeRF&#xff08;Neural Radiance Fields&#xff09;&#xff0c;训练方式可以看看我前面的博客。 2.文本转语音是用了VITS语音合成&#xff0c;项目git:https://github.com/jaywalnut310/vits . 3.语言模型是用了新开…

Jenkins从配置到实战(一) - 实现C/C++项目自动化编译

前言 本文章主要介绍了&#xff0c;如何去安装和部署Jenkins&#xff0c;并实现自动拉取项目代码&#xff0c;自动化编译流程。 网站 官网中文网站 下载安装 可以下载这个 安装jenkins前先安装java yum search java|grep jdkyum install java-1.8.0-openjdk 安装jenkins j…

NE555 PWM输出

NE555是一种集成电路&#xff08;IC&#xff09;&#xff0c;通常用于电子电路的各种目的&#xff0c;包括计时器、振荡器等等。 本文介绍搭建NE555电路输出PWM信号&#xff0c;电路如图下&#xff1a; 使用该电路可以输出PWM占空比≥50%波形&#xff0c;仿真波形如下图&#…

20230723在win10的命令行下显示文本文件的内容type

20230723在win10的命令行下显示文本文件的内容type 2023/7/23 20:35 百度搜索&#xff1a;WINDOWS 命令行 打开文本文件 windows命令行读取文件命令-WinFrom控件库|.net开源控件库... 2023年7月14日 linux下,可能会用到cat或都是more命令,windows下可以使用type或more命令 type…

VMware Fusion 14 Tech Preview - 适用于 Arm 的 Windows 11 上的全面 3D 加速

VMware Fusion 14 Tech Preview - 适用于 Arm 的 Windows 11 上的全面 3D 加速 VMware Fusion Tech Preview 2023 请访问原文链接&#xff1a;https://sysin.org/blog/vmware-fusion-14/&#xff0c;查看最新版。原创作品&#xff0c;转载请保留出处。 作者主页&#xff1a;…

求解包含约束的最优化问题:罚函数法

文章目录 外点罚函数法内点罚函数法罚函数法 vs 拉格朗日乘子法 外点罚函数法 针对包含约束条件的最优化问题&#xff0c;此前介绍的拉格朗日乘子法和KKT条件已经提供一种有效的解决方案。但由于我是从智能优化算法入门运筹优化行业的&#xff0c;所以在遇到这类问题时&#x…

day35-Image Carousel(图片轮播图简易版)

50 天学习 50 个项目 - HTMLCSS and JavaScript day35-Image Carousel&#xff08;图片轮播图简易版&#xff09; 效果 index.html <!DOCTYPE html> <html lang"en"><head><meta charset"UTF-8" /><meta name"viewport…

93、简述kafka架构设计

kafka架构设计 Consumer Group&#xff1a;消费者组&#xff0c;消费者组内每个消费者负责消费不同分区的数据&#xff0c;提高消费能力。逻辑上的一个订阅者。Topic: 可以理解为一个队列&#xff0c;Topic 将消息分类&#xff0c;生产者和消费者面向的是同一个 Topic。Partiti…

netty组件详解-中

接着之前的博客netty组件详解-上&#xff0c;我们继续深入到源码层面&#xff0c;来探究netty的各个组件和其设计思想&#xff1a; netty内置的通讯模式 我们在编写netty代码时&#xff0c;经常使用NioServerSocketChannel 作为通讯模式。 例如下面的简单netty客户端示例: pri…

Docker迁移默认的/var/lib/docker目录

安装完Docker后&#xff0c;默认存储路径在/var/lib/docker目录&#xff0c;如果服务器挂载的硬盘不是根目录的话&#xff0c;可能会造成资源不够用。这时候就需要迁移docker默认的目录。 1.停止docker服务 systemctl stop docker 复制 2.创建docker新目录 mkdir -p /data…

airtest-selenium 脚本爬取百度热搜标题

目录 1. 前言 2. 爬取标题的脚本 3. 命令行运行 Web 自动化脚本 1&#xff09;python 环境准备 2&#xff09;chrome 与 chromedriver 版本对应 3&#xff09;命令行运行 1. 前言 airtest-selenium是一个基于Python的UI自动化测试框架&#xff0c;它结合了airtest和sele…

【Redis】缓存问题小记

文章目录 1、缓存模型和思路1.1、缓存更新策略1.2、具体实现思路 2、缓存穿透问题2.1、方案分析2.2、缓存空对象实现思路2.3、小总结 3、缓存雪崩4、缓存击穿4.1、方案分析4.1.1、互斥锁4.1.2、逻辑过期4.1.3、方案对比 4.2、互斥锁实现思路4.3、逻辑过期实现思路 1、缓存模型和…

微服务——统一网关Getway

为什么需要网关&#xff1f; 网关的两种实现: 网关Getway——快速入门 步骤一 网关背身也是一个微服务&#xff0c;需要注册到nacos中去 步骤二 成功运行后 可以通过网关进行请求转发到对应服务。 流程如下&#xff1a; 路由断言工厂 网关路由可以配置的东西有如下。 spri…

RocketMQ分布式事务 -> 最终一致性实现

文章目录 前言事务消息场景代码示例订单服务事务日志表TransactionMQProducerOrderTransactionListener业务实现类调用总结 积分服务积分记录表消费者启动消费者监听器增加积分幂等性消费消费异常 前言 分布式事务的问题常在业务与面试中被提及, 近日摸鱼看到这篇文章, 阐述的…

Web前端开发概述(二)

&#x1f60a;Web前端开发概述&#xff08;二&#xff09; &#x1f47b;前言&#x1fa81;前端开发背景&#x1f50d;当下前端开发要求&#x1f526;Web前端开发技术&#x1f3ad;HTML&#x1f3ad;CSS&#x1f3ad;JavaScript&#x1f3ad;HTML DOM&#x1f3ad;BOM&#x1f…

Spring中AOP的通知类型和执行顺序

Spring中AOP的通知类型&#xff1a; Around&#xff1a;环绕通知&#xff0c;此注解标注的通知方法在目标方法前、后都被执行Before&#xff1a;前置通知&#xff0c;此注解标注的通知方法在目标方法前被执行After &#xff1a;后置通知&#xff0c;此注解标注的通知方法在目标…

Jmeter+Jenkins+Ant自动化持续集成环境搭建

一、安装准备 1.JDK:jdk-8u121-windows-x64 2.jmeter工具&#xff1a;apache-jmeter-2.13 3.ANT工具&#xff1a;apache-ant-1.9.7-bin 4.jenkins工具&#xff1a;jenkins-2.32.2 二、软件安装 1.JDK的安装 >双击JDK安装包&#xff0c;选择安装路径&#xff08;本人是…