python 字符串驻留机制

news2025/4/6 7:48:53

偶然发现一个python字符串的现象:

>>> a = '123_abc'
>>> b = '123_abc'
>>> a is b
True
>>> c = 'abc#123'
>>> d = 'abc#123'
>>> c is d
False

这是为什么呢,原来它们的id不一样。

>>> id(a) == id(b)
True
>>> id(c) == id(d)
False

那为什么它们的地址有的相同,有的不同呢?查询后得知这是一种 Python 的字符串驻留机制。

字符串驻留机制

也称为字符串常量优化(string interning),是一种在 Python 解释器中自动进行的优化过程。它主要目的是减少内存的使用,提高程序的运行效率。

工作原理

  1. 小字符串:Python 只会对短小的字符串进行驻留。但是,这个长度并不是固定的,它可能会因 Python 的不同版本或实现而有所不同。
  2. 字符串池(String Pool):Python 解释器维护一个字符串池,用于存储所有已经出现过的字符串常量。
  3. 驻留(Interning):当解释器遇到一个新的字符串字面量时,它会首先检查这个字符串是否已经存在于字符串池中。如果存在,则直接使用池中的引用;如果不存在,就将这个字符串添加到池中,并返回这个字符串的引用。
  4. 内存节省:由于相同的字符串字面量在程序中可能被多次使用,通过字符串驻留机制,可以确保这些重复的字符串只存储一次,从而节省内存。
  5. 性能提升:字符串比较操作可以通过比较它们的引用地址来完成,这比逐字符比较要快得多。因此,字符串驻留可以提高字符串比较的性能。
  6. 自动和透明:字符串驻留是自动进行的,程序员不需要显式地进行任何操作。Python 解释器会在后台处理这一过程。
  7. 不可变类型:字符串驻留机制只适用于不可变类型,因为可变类型的对象内容可能会改变,这会使得引用地址比较失去意义。如果字符串可以修改,那么驻留机制可能会导致意外的副作用。
  8. 限制:字符串驻留机制虽然有诸多好处,但也存在一些限制。例如,如果程序中使用了大量的动态生成的字符串,那么字符串驻留可能不会带来太大的好处,因为这些字符串可能不会被重复使用。
  9. 字符串字面量:只有当字符串是字面量时,Python 才会尝试进行驻留。通过其他方式(如 str() 函数、字符串拼接等)创建的字符串通常不会被驻留。
  10. 编译时驻留:字符串驻留是在 Python 源代码编译成字节码时进行的,而不是在运行时。这意味着在运行时动态生成的字符串通常不会被驻留。

显式驻留

Python 提供了一个sys库函数 intern(),允许程序员显式地将一个字符串驻留。使用这个函数可以手动控制字符串的驻留过程:

>>> from sys import intern
>>> s = intern('abc#123')
>>> t = intern('abc#123')
>>> s is t
True
>>> s = 'abc#123'
>>> t = 'abc#123'
>>> s is t
False
>>> a = intern('abc_123')
>>> b = 'abc_123'
>>> a is b
True

字符串长短的分界

小字符串才进行驻留,具体多少长度是界线也没有细查,大概用二分法也能枚举出来。

>>> x = 'abcdefghijklmnopqrstuvwxyz'*100
>>> y = 'abcdefghijklmnopqrstuvwxyz'*100
>>> x is y
True
>>> x = 'abcdefghijklmnopqrstuvwxyz'*200
>>> y = 'abcdefghijklmnopqrstuvwxyz'*200
>>> x is y
False
>>> x = 'abcdefghijklmnopqrstuvwxyz'*150
>>> y = 'abcdefghijklmnopqrstuvwxyz'*150
>>> x is y
True
>>> x = 'abcdefghijklmnopqrstuvwxyz'*175
>>> y = 'abcdefghijklmnopqrstuvwxyz'*175
>>> x is y
False
......

附:字符串方法

(版本python 3.12)

capitalize(self, /)
    Return a capitalized version of the string.

    More specifically, make the first character have upper case and the rest lower
    case.

casefold(self, /)
    Return a version of the string suitable for caseless comparisons.

center(self, width, fillchar=' ', /)
    Return a centered string of length width.

    Padding is done using the specified fill character (default is a space).

count(...)
    S.count(sub[, start[, end]]) -> int

    Return the number of non-overlapping occurrences of substring sub in
    string S[start:end].  Optional arguments start and end are
    interpreted as in slice notation.

encode(self, /, encoding='utf-8', errors='strict')
    Encode the string using the codec registered for encoding.

    encoding
      The encoding in which to encode the string.
    errors
      The error handling scheme to use for encoding errors.
      The default is 'strict' meaning that encoding errors raise a
      UnicodeEncodeError.  Other possible values are 'ignore', 'replace' and
      'xmlcharrefreplace' as well as any other name registered with
      codecs.register_error that can handle UnicodeEncodeErrors.

endswith(...)
    S.endswith(suffix[, start[, end]]) -> bool

    Return True if S ends with the specified suffix, False otherwise.
    With optional start, test S beginning at that position.
    With optional end, stop comparing S at that position.
    suffix can also be a tuple of strings to try.

expandtabs(self, /, tabsize=8)
    Return a copy where all tab characters are expanded using spaces.

    If tabsize is not given, a tab size of 8 characters is assumed.

find(...)
    S.find(sub[, start[, end]]) -> int

    Return the lowest index in S where substring sub is found,
    such that sub is contained within S[start:end].  Optional
    arguments start and end are interpreted as in slice notation.

    Return -1 on failure.

format(...)
    S.format(*args, **kwargs) -> str

    Return a formatted version of S, using substitutions from args and kwargs.
    The substitutions are identified by braces ('{' and '}').

format_map(...)
    S.format_map(mapping) -> str

    Return a formatted version of S, using substitutions from mapping.
    The substitutions are identified by braces ('{' and '}').

index(...)
    S.index(sub[, start[, end]]) -> int

    Return the lowest index in S where substring sub is found,
    such that sub is contained within S[start:end].  Optional
    arguments start and end are interpreted as in slice notation.

    Raises ValueError when the substring is not found.

isalnum(self, /)
    Return True if the string is an alpha-numeric string, False otherwise.

    A string is alpha-numeric if all characters in the string are alpha-numeric and
    there is at least one character in the string.

isalpha(self, /)
    Return True if the string is an alphabetic string, False otherwise.

    A string is alphabetic if all characters in the string are alphabetic and there
    is at least one character in the string.

isascii(self, /)
    Return True if all characters in the string are ASCII, False otherwise.

    ASCII characters have code points in the range U+0000-U+007F.
    Empty string is ASCII too.

isdecimal(self, /)
    Return True if the string is a decimal string, False otherwise.

    A string is a decimal string if all characters in the string are decimal and
    there is at least one character in the string.

isdigit(self, /)
    Return True if the string is a digit string, False otherwise.

    A string is a digit string if all characters in the string are digits and there
    is at least one character in the string.

isidentifier(self, /)
    Return True if the string is a valid Python identifier, False otherwise.

    Call keyword.iskeyword(s) to test whether string s is a reserved identifier,
    such as "def" or "class".

islower(self, /)
    Return True if the string is a lowercase string, False otherwise.

    A string is lowercase if all cased characters in the string are lowercase and
    there is at least one cased character in the string.

isnumeric(self, /)
    Return True if the string is a numeric string, False otherwise.

    A string is numeric if all characters in the string are numeric and there is at
    least one character in the string.

isprintable(self, /)
    Return True if the string is printable, False otherwise.

    A string is printable if all of its characters are considered printable in
    repr() or if it is empty.

isspace(self, /)
    Return True if the string is a whitespace string, False otherwise.

    A string is whitespace if all characters in the string are whitespace and there
    is at least one character in the string.

istitle(self, /)
    Return True if the string is a title-cased string, False otherwise.

    In a title-cased string, upper- and title-case characters may only
    follow uncased characters and lowercase characters only cased ones.

isupper(self, /)
    Return True if the string is an uppercase string, False otherwise.

    A string is uppercase if all cased characters in the string are uppercase and
    there is at least one cased character in the string.

join(self, iterable, /)
    Concatenate any number of strings.

    The string whose method is called is inserted in between each given string.
    The result is returned as a new string.

    Example: '.'.join(['ab', 'pq', 'rs']) -> 'ab.pq.rs'

ljust(self, width, fillchar=' ', /)
    Return a left-justified string of length width.

    Padding is done using the specified fill character (default is a space).

lower(self, /)
    Return a copy of the string converted to lowercase.

lstrip(self, chars=None, /)
    Return a copy of the string with leading whitespace removed.

    If chars is given and not None, remove characters in chars instead.

partition(self, sep, /)
    Partition the string into three parts using the given separator.

    This will search for the separator in the string.  If the separator is found,
    returns a 3-tuple containing the part before the separator, the separator
    itself, and the part after it.

    If the separator is not found, returns a 3-tuple containing the original string
    and two empty strings.

removeprefix(self, prefix, /)
    Return a str with the given prefix string removed if present.

    If the string starts with the prefix string, return string[len(prefix):].
    Otherwise, return a copy of the original string.

removesuffix(self, suffix, /)
    Return a str with the given suffix string removed if present.

    If the string ends with the suffix string and that suffix is not empty,
    return string[:-len(suffix)]. Otherwise, return a copy of the original
    string.

replace(self, old, new, count=-1, /)
    Return a copy with all occurrences of substring old replaced by new.

      count
        Maximum number of occurrences to replace.
        -1 (the default value) means replace all occurrences.

    If the optional argument count is given, only the first count occurrences are
    replaced.

rfind(...)
    S.rfind(sub[, start[, end]]) -> int

    Return the highest index in S where substring sub is found,
    such that sub is contained within S[start:end].  Optional
    arguments start and end are interpreted as in slice notation.

    Return -1 on failure.

rindex(...)
    S.rindex(sub[, start[, end]]) -> int

    Return the highest index in S where substring sub is found,
    such that sub is contained within S[start:end].  Optional
    arguments start and end are interpreted as in slice notation.

    Raises ValueError when the substring is not found.

rjust(self, width, fillchar=' ', /)
    Return a right-justified string of length width.

    Padding is done using the specified fill character (default is a space).

rpartition(self, sep, /)
    Partition the string into three parts using the given separator.

    This will search for the separator in the string, starting at the end. If
    the separator is found, returns a 3-tuple containing the part before the
    separator, the separator itself, and the part after it.

    If the separator is not found, returns a 3-tuple containing two empty strings
    and the original string.

rsplit(self, /, sep=None, maxsplit=-1)
    Return a list of the substrings in the string, using sep as the separator string.

      sep
        The separator used to split the string.

        When set to None (the default value), will split on any whitespace
        character (including \n \r \t \f and spaces) and will discard
        empty strings from the result.
      maxsplit
        Maximum number of splits (starting from the left).
        -1 (the default value) means no limit.

    Splitting starts at the end of the string and works to the front.

rstrip(self, chars=None, /)
    Return a copy of the string with trailing whitespace removed.

    If chars is given and not None, remove characters in chars instead.

split(self, /, sep=None, maxsplit=-1)
    Return a list of the substrings in the string, using sep as the separator string.

      sep
        The separator used to split the string.

        When set to None (the default value), will split on any whitespace
        character (including \n \r \t \f and spaces) and will discard
        empty strings from the result.
      maxsplit
        Maximum number of splits (starting from the left).
        -1 (the default value) means no limit.

    Note, str.split() is mainly useful for data that has been intentionally
    delimited.  With natural text that includes punctuation, consider using
    the regular expression module.

splitlines(self, /, keepends=False)
    Return a list of the lines in the string, breaking at line boundaries.

    Line breaks are not included in the resulting list unless keepends is given and
    true.

startswith(...)
    S.startswith(prefix[, start[, end]]) -> bool

    Return True if S starts with the specified prefix, False otherwise.
    With optional start, test S beginning at that position.
    With optional end, stop comparing S at that position.
    prefix can also be a tuple of strings to try.

strip(self, chars=None, /)
    Return a copy of the string with leading and trailing whitespace removed.

    If chars is given and not None, remove characters in chars instead.

swapcase(self, /)
    Convert uppercase characters to lowercase and lowercase characters to uppercase.

title(self, /)
    Return a version of the string where each word is titlecased.

    More specifically, words start with uppercased characters and all remaining
    cased characters have lower case.

translate(self, table, /)
    Replace each character in the string using the given translation table.

      table
        Translation table, which must be a mapping of Unicode ordinals to
        Unicode ordinals, strings, or None.

    The table must implement lookup/indexing via __getitem__, for instance a
    dictionary or list.  If this operation raises LookupError, the character is
    left untouched.  Characters mapped to None are deleted.

upper(self, /)
    Return a copy of the string converted to uppercase.

zfill(self, width, /)
    Pad a numeric string with zeros on the left, to fill a field of the given width.

    The string is never truncated.


目录

字符串驻留机制

工作原理

显式驻留

字符串长短的分界

附:字符串方法


本文来自互联网用户投稿,该文观点仅代表作者本人,不代表本站立场。本站仅提供信息存储空间服务,不拥有所有权,不承担相关法律责任。如若转载,请注明出处:http://www.coloradmin.cn/o/1853978.html

如若内容造成侵权/违法违规/事实不符,请联系多彩编程网进行投诉反馈,一经查实,立即删除!

相关文章

uni-pay 2.x:一站式支付解决方案,让支付变得简单高效

一、引言 在移动互联网时代,支付功能已成为各类应用不可或缺的一部分。然而,支付功能的开发往往伴随着复杂的流程和高昂的成本,特别是在对接微信支付、支付宝支付等主流支付渠道时,前端后端的开发工作量和出错率都较高。为了简化…

TCP与UDP_三次握手_四次挥手

TCP vs UDP TCP数据 具体可以通过Cisco Packet Tracer工具查看: UDP数据 三次握手、四次挥手 为什么是3/4次?这牵扯到单工、双工通信的问题 TCP建立连接:表白 TCP释放连接:分手 TCP—建立连接—三次握手 解释: 首先&…

对于C++ 程序员来说,35岁魔咒是否存在?

大家常说程序员职业生涯会在35岁左右遇到所谓的“35岁魔咒”。这意味着在这个年龄段,程序员可能会面临就业不稳定或职业发展的挑战。对于C程序员来说,这个问题更加引人关注。 随着时间的推移,技术行业不断演进,新的编程语言层出不…

linux高级编程(1)

linux操作系统编程: 实现一个 用户程序 (1).库函数 --来实现 (2).系统调用 也就是说,程序要进行系统调用的话,有直接和间接(通过库函数)两种方式 linux里面对文件的处理: 思想: 一切皆文件 everything is file&…

【jenkins1】gitlab与jenkins集成

文章目录 1.Jenkins-docker配置:运行在8080端口上,机器只要安装docker就能装载image并运行容器2.Jenkins与GitLab配置:docker ps查看正在运行,浏览器访问http://10....:8080/2.1 GitLab与Jenkins的Access Token配置:不…

20-OWASP top10--XXS跨站脚本攻击

目录 什么是xxs? XSS漏洞出现的原因 XSS分类 反射型XSS 储存型XSS DOM型 XSS XSS漏洞复现 XSS的危害或能做什么? 劫持用户cookie 钓鱼登录 XSS获取键盘记录 同源策略 (1)什么是跨域 (2)同源策略…

PD虚拟机和VMware有什么区别?PD虚拟机和VMware谁更好用?

随着电脑硬件设备的飞快发展,一些高端的技术已经不再遥不可及,比如虚拟化,虚拟机技术已经成为IT领域和个人用户不可或缺的工具。特别是PD虚拟机(Parallels Desktop)和VMware,作为市场上两个主流的虚拟机软件…

【Web APIs】DOM 文档对象模型 ⑤ ( 获取特殊元素 | 获取 html 元素 | 获取 body 元素 )

文章目录 一、获取特殊元素1、获取 html 元素2、获取 body 元素3、完整代码示例 本博客相关参考文档 : WebAPIs 参考文档 : https://developer.mozilla.org/zh-CN/docs/Web/APIgetElementById 函数参考文档 : https://developer.mozilla.org/zh-CN/docs/Web/API/Document/getE…

4、SpringMVC 实战小项目【加法计算器、用户登录、留言板、图书管理系统】

SpringMVC 实战小项目 3.1 加法计算器3.1.1 准备⼯作前端 3.1.2 约定前后端交互接⼝需求分析接⼝定义请求参数:响应数据: 3.1.3 服务器代码 3.2 ⽤⼾登录3.2.1 准备⼯作3.2.2 约定前后端交互接⼝3.2.3 实现服务器端代码 3.3 留⾔板实现服务器端代码 3.4 图书管理系统准备后端 3…

【STM32c8t6】AHT20温湿度采集

【STM32c8t6】AHT20温湿度采集 一、探究目的二、探究原理2.1 I2C2.1. 硬件I2C2.1. 软件I2C 2.2 AHT20数据手册 三、实验过程3.1 CubeMX配置3.2 实物接线图3.3 完整代码3.4 效果展示 四、探究总结 一、探究目的 学习I2C总线通信协议,使用STM32F103完成基于I2C协议的A…

打印机状态显示错误是什么原因?这5个有效方法要记好!

打印机是现代办公中不可或缺的设备之一,但在使用过程中,打印机状态显示错误是一个常见的问题。本文将详细探讨打印机状态显示错误的原因及其解决方法。 摘要 打印机状态显示错误的原因及解决方法如下: 1、网络连接问题:原因&…

【docker1】指令,docker-compose,Dockerfile

文章目录 1.pull/image,run/ps(进程),exec/commit2.save/load:docker save 镜像id,不是容器id3.docker-compose:多容器:宿主机(eth0网卡)安装docker会生成一…

Linux简单使用——配置仓库

虚拟机和Xshell连接 在虚拟机上打开终端查看IP 在Xshell上建立会话 输入ssh root192.168.231.123 防火墙关闭 、 重启计算机命令 删除文件 然后ls查看 清除之前的垃圾 最后做一下命令缓存

C语言 | Leetcode C语言题解之第174题地下城游戏

题目: 题解: int calculateMinimumHP(int** dungeon, int dungeonSize, int* dungeonColSize) {int n dungeonSize, m dungeonColSize[0];int dp[n 1][m 1];memset(dp, 0x3f, sizeof(dp));dp[n][m - 1] dp[n - 1][m] 1;for (int i n - 1; i >…

DataStructure.时间和空间复杂度

时间和空间复杂度 【本节目标】1. 如何衡量一个算法的好坏2. 算法效率3. 时间复杂度3.1 时间复杂度的概念3.2 大O的渐进表示法3.3 推导大O阶方法3.4 常见时间复杂度计算举例3.4.1 示例13.4.2 示例23.4.3 示例33.4.4 示例43.4.5 示例53.4.6 示例63.4.7 示例7 4.空间复杂度4.1 示…

车载测试系列:CAN协议之远程帧

远程帧(也叫遥控帧):是接收单元向发送单元请求发送具有标识符的数据所用的帧,由 6 个段组成,没有数据段。 当某个节点需要数据时,可以发送远程帧请求另一节点发送相应数据帧。 简单的说:发起方…

首次使用回声状态网络 (ESN) 和语音特征进行帕金森病 (PD) 预测

帕金森病(Parkinsons disease, PD)是一种使人衰弱的神经退行性疾病,它需要进行精确和早期的诊断,以便为患者提供有效的治疗和护理。这种疾病是由James Parkinson在1817年首次确定的,其特征是多巴胺生成神经元的退化。多…

每日一题——Python代码实现力扣1. 两数之和(举一反三+思想解读+逐步优化)五千字好文

一个认为一切根源都是“自己不够强”的INTJ 个人主页:用哲学编程-CSDN博客专栏:每日一题——举一反三Python编程学习Python内置函数 Python-3.12.0文档解读 目录 菜鸡写法 代码分析 时间复杂度分析 空间复杂度分析 改进建议 我要更强 方法1: 使…

项目训练营第四天

项目训练营第四天 前端部分修改 前端用的是WebStorm和Ant Design Pro框架 Ant Design Pro是比较流行的一个前端登陆、注册、管理框架,能帮我们快速实现前端界面的开发 效果大致如图 使用起来也极为方便,首先在WebStorm 控制台中输入如下命令 # 使用…

Linux的基本指令第二篇

1.cat - 查看文件 语法:cat [选项] [文件] 功能: 查看目标文件的内容 -b 对非空输出行编号 -n对输出的所有行编号 -s不输出多行空行 现有一个文件test.c cat -n test.c cat -b test.c cat -s test.c 创建一个新文件 加入源文件的内容 || …