Chapter 12: Regular expressions | Python for Everybody 讲义笔记_En

news2024/10/5 17:21:26

文章目录

  • Python for Everybody
    • 课程简介
    • Regular Expressions
      • Regular Expressions
      • Character matching in regular expressions
      • Extracting data using regular expressions
      • Combining searching and extracting
      • Escape character
      • Summary
      • Bonus section for Unix / Linux users
      • Debugging
      • Python Regular Expression Quick Guide
      • Glossary
      • Exercises


Python for Everybody

Exploring Data Using Python 3
Dr. Charles R. Severance


课程简介

Python for Everybody 零基础程序设计(Python 入门)

  • This course aims to teach everyone the basics of programming computers using Python. 本课程旨在向所有人传授使用 Python 进行计算机编程的基础知识。
  • We cover the basics of how one constructs a program from a series of simple instructions in Python. 我们介绍了如何通过 Python 中的一系列简单指令构建程序的基础知识。
  • The course has no pre-requisites and avoids all but the simplest mathematics. Anyone with moderate computer experience should be able to master the materials in this course. 该课程没有任何先决条件,除了最简单的数学之外,避免了所有内容。任何具有中等计算机经验的人都应该能够掌握本课程中的材料。
  • This course will cover Chapters 1-5 of the textbook “Python for Everybody”. Once a student completes this course, they will be ready to take more advanced programming courses. 本课程将涵盖《Python for Everyday》教科书的第 1-5 章。学生完成本课程后,他们将准备好学习更高级的编程课程。
  • This course covers Python 3.

在这里插入图片描述

coursera

Python for Everybody 零基础程序设计(Python 入门)

Charles Russell Severance
Clinical Professor

在这里插入图片描述

个人主页
Twitter

在这里插入图片描述

University of Michigan


课程资源

coursera原版课程视频
coursera原版视频-中英文精校字幕-B站
Dr. Chuck官方翻录版视频-机器翻译字幕-B站

PY4E-课程配套练习
Dr. Chuck Online - 系列课程开源官网



Regular Expressions

Regular Expressions allow us to search for patterns in strings and extract data from strings using the regular expression programming language.


Regular Expressions

So far we have been reading through files, looking for patterns and extracting various bits of lines that we find interesting. We have been using string methods like split and find and using lists and string slicing to extract portions of the lines.

This task of searching and extracting is so common that Python has a very powerful module called regular expressions that handles many of these tasks quite elegantly. The reason we have not introduced regular expressions earlier in the book is because while they are very powerful, they are a little complicated and their syntax takes some getting used to.

Regular expressions are almost their own little programming language for searching and parsing strings. As a matter of fact, entire books have been written on the topic of regular expressions. In this chapter, we will only cover the basics of regular expressions. For more detail on regular expressions, see:
https://en.wikipedia.org/wiki/Regular_expression
https://docs.python.org/library/re.html

The regular expression module re must be imported into your program before you can use it. The simplest use of the regular expression module is the search() function. The following program demonstrates a trivial use of the search function.

# Search for lines that contain 'From'
import re
hand = open('mbox-short.txt')
for line in hand:
    line = line.rstrip()
    if re.search('From:', line):
        print(line)

# Code: http://www.py4e.com/code3/re01.py

We open the file, loop through each line, and use the regular expression search() to only print out lines that contain the string “From:”. This program does not use the real power of regular expressions, since we could have just as easily used line.find() to accomplish the same result.

The power of the regular expressions comes when we add special characters to the search string that allow us to more precisely control which lines match the string. Adding these special characters to our regular expression allow us to do sophisticated matching and extraction while writing very little code.

For example, the caret character is used in regular expressions to match “the beginning” of a line. We could change our program to only match lines where “From:” was at the beginning of the line as follows:

# Search for lines that start with 'From'
import re
hand = open('mbox-short.txt')
for line in hand:
    line = line.rstrip()
    if re.search('^From:', line):
        print(line)

# Code: http://www.py4e.com/code3/re02.py

Now we will only match lines that start with the string “From:”. This is still a very simple example that we could have done equivalently with the startswith() method from the string module. But it serves to introduce the notion that regular expressions contain special action characters that give us more control as to what will match the regular expression.


Character matching in regular expressions

There are a number of other special characters that let us build even more powerful regular expressions. The most commonly used special character is the period or full stop, which matches any character.

In the following example, the regular expression F..m: would match any of the strings “From:”, “Fxxm:”, “F12m:”, or “F!@m:” since the period characters in the regular expression match any character.

# Search for lines that start with 'F', followed by
# 2 characters, followed by 'm:'
import re
hand = open('mbox-short.txt')
for line in hand:
    line = line.rstrip()
    if re.search('^F..m:', line):
        print(line)

# Code: http://www.py4e.com/code3/re03.py

This is particularly powerful when combined with the ability to indicate that a character can be repeated any number of times using the * or + characters in your regular expression. These special characters mean that instead of matching a single character in the search string, they match zero-or-more characters (in the case of the asterisk) or one-or-more of the characters (in the case of the plus sign).

We can further narrow down the lines that we match using a repeated wild card character in the following example:

# Search for lines that start with From and have an at sign
import re
hand = open('mbox-short.txt')
for line in hand:
    line = line.rstrip()
    if re.search('^From:.+@', line):
        print(line)

# Code: http://www.py4e.com/code3/re04.py

The search string ^From:.+@ will successfully match lines that start with “From:”, followed by one or more characters (.+), followed by an at-sign. So this will match the following line:

From: stephen.marquard@uct.ac.za

You can think of the .+ wildcard as expanding to match all the characters between the colon character and the at-sign.

From:.+@

It is good to think of the plus and asterisk characters as “pushy”. For example, the following string would match the last at-sign in the string as the .+ pushes outwards, as shown below:

From: stephen.marquard@uct.ac.za, csev@umich.edu, and cwen @iupui.edu

It is possible to tell an asterisk or plus sign not to be so “greedy” by adding another character. See the detailed documentation for information on turning off the greedy behavior.


Extracting data using regular expressions

If we want to extract data from a string in Python we can use the findall() method to extract all of the substrings which match a regular expression. Let’s use the example of wanting to extract anything that looks like an email address from any line regardless of format. For example, we want to pull the email addresses from each of the following lines:

From stephen.marquard@uct.ac.za Sat Jan  5 09:14:16 2008
Return-Path: <postmaster@collab.sakaiproject.org>
          for <source@collab.sakaiproject.org>;
Received: (from apache@localhost)
Author: stephen.marquard@uct.ac.za

We don’t want to write code for each of the types of lines, splitting and slicing differently for each line. This following program uses findall() to find the lines with email addresses in them and extract one or more addresses from each of those lines.

import re
s = 'A message from csev@umich.edu to cwen@iupui.edu about meeting @2PM'
lst = re.findall('\S+@\S+', s)
print(lst)

# Code: http://www.py4e.com/code3/re05.py

The findall() method searches the string in the second argument and returns a list of all of the strings that look like email addresses. We are using a two-character sequence that matches a non-whitespace character (\S).

The output of the program would be:

['csev@umich.edu', 'cwen@iupui.edu']

Translating the regular expression, we are looking for substrings that have at least one non-whitespace character, followed by an at-sign, followed by at least one more non-whitespace character. The \S+ matches as many non-whitespace characters as possible.

The regular expression would match twice (csev@umich.edu and cwen@iupui.edu), but it would not match the string “@2PM” because there are no non-blank characters before the at-sign. We can use this regular expression in a program to read all the lines in a file and print out anything that looks like an email address as follows:

# Search for lines that have an at sign between characters
import re
hand = open('mbox-short.txt')
for line in hand:
    line = line.rstrip()
    x = re.findall('\S+@\S+', line)
    if len(x) > 0:
        print(x)

# Code: http://www.py4e.com/code3/re06.py

We read each line and then extract all the substrings that match our regular expression. Since findall() returns a list, we simply check if the number of elements in our returned list is more than zero to print only lines where we found at least one substring that looks like an email address.

If we run the program on mbox-short.txt we get the following output:

...
['<source@collab.sakaiproject.org>;']
['<source@collab.sakaiproject.org>;']
['apache@localhost)']
['source@collab.sakaiproject.org;']
['cwen@iupui.edu']
['source@collab.sakaiproject.org']
['cwen@iupui.edu']
['cwen@iupui.edu']
['wagnermr@iupui.edu']

Some of our email addresses have incorrect characters like “<” or “;” at the beginning or end. Let’s declare that we are only interested in the portion of the string that starts and ends with a letter or a number.

To do this, we use another feature of regular expressions. Square brackets are used to indicate a set of multiple acceptable characters we are willing to consider matching. In a sense, the \S is asking to match the set of “non-whitespace characters”. Now we will be a little more explicit in terms of the characters we will match.

Here is our new regular expression:

[a-zA-Z0-9]\S*@\S*[a-zA-Z]

This is getting a little complicated and you can begin to see why regular expressions are their own little language unto themselves. Translating this regular expression, we are looking for substrings that start with a single lowercase letter, uppercase letter, or number “[a-zA-Z0-9]”, followed by zero or more non-blank characters (\S*), followed by an at-sign, followed by zero or more non-blank characters (\S*), followed by an uppercase or lowercase letter. Note that we switched from + to * to indicate zero or more non-blank characters since [a-zA-Z0-9] is already one non-blank character. Remember that the * or + applies to the single character immediately to the left of the plus or asterisk.

If we use this expression in our program, our data is much cleaner:

# Search for lines that have an at sign between characters
# The characters must be a letter or number
import re
hand = open('mbox-short.txt')
for line in hand:
    line = line.rstrip()
    x = re.findall('[a-zA-Z0-9]\S*@\S*[a-zA-Z]', line)
    if len(x) > 0:
        print(x)

# Code: http://www.py4e.com/code3/re07.py
...
['wagnermr@iupui.edu']
['cwen@iupui.edu']
['postmaster@collab.sakaiproject.org']
['200801032122.m03LMFo4005148@nakamura.uits.iupui.edu']
['source@collab.sakaiproject.org']
['source@collab.sakaiproject.org']
['source@collab.sakaiproject.org']
['apache@localhost']

Notice that on the source@collab.sakaiproject.org lines, our regular expression eliminated two letters at the end of the string (“>;”). This is because when we append [a-zA-Z] to the end of our regular expression, we are demanding that whatever string the regular expression parser finds must end with a letter. So when it sees the “>” at the end of “sakaiproject.org>;” it simply stops at the last “matching” letter it found (i.e., the “g” was the last good match).

Also note that the output of the program is a Python list that has a string as the single element in the list.


Combining searching and extracting

If we want to find numbers on lines that start with the string “X-” such as:

X-DSPAM-Confidence: 0.8475
X-DSPAM-Probability: 0.0000

we don’t just want any floating-point numbers from any lines. We only want to extract numbers from lines that have the above syntax.

We can construct the following regular expression to select the lines:

^X-.*: [0-9.]+

Translating this, we are saying, we want lines that start with X-, followed by zero or more characters (.*), followed by a colon (:) and then a space. After the space we are looking for one or more characters that are either a digit (0-9) or a period [0-9.]+. Note that inside the square brackets, the period matches an actual period (i.e., it is not a wildcard between the square brackets).

This is a very tight expression that will pretty much match only the lines we are interested in as follows:

# Search for lines that start with 'X' followed by any non
# whitespace characters and ':'
# followed by a space and any number.
# The number can include a decimal.
import re
hand = open('mbox-short.txt')
for line in hand:
    line = line.rstrip()
    if re.search('^X\S*: [0-9.]+', line):
        print(line)

# Code: http://www.py4e.com/code3/re10.py

When we run the program, we see the data nicely filtered to show only the lines we are looking for.

X-DSPAM-Confidence: 0.8475
X-DSPAM-Probability: 0.0000
X-DSPAM-Confidence: 0.6178
X-DSPAM-Probability: 0.0000
...

But now we have to solve the problem of extracting the numbers. While it would be simple enough to use split, we can use another feature of regular expressions to both search and parse the line at the same time.

Parentheses are another special character in regular expressions. When you add parentheses to a regular expression, they are ignored when matching the string. But when you are using findall(), parentheses indicate that while you want the whole expression to match, you only are interested in extracting a portion of the substring that matches the regular expression.

So we make the following change to our program:

# Search for lines that start with 'X' followed by any
# non whitespace characters and ':' followed by a space
# and any number. The number can include a decimal.
# Then print the number if it is greater than zero.
import re
hand = open('mbox-short.txt')
for line in hand:
    line = line.rstrip()
    x = re.findall('^X\S*: ([0-9.]+)', line)
    if len(x) > 0:
        print(x)

# Code: http://www.py4e.com/code3/re11.py

Instead of calling search(), we add parentheses around the part of the regular expression that represents the floating-point number to indicate we only want findall() to give us back the floating-point number portion of the matching string.

The output from this program is as follows:

['0.8475']
['0.0000']
['0.6178']
['0.0000']
['0.6961']
['0.0000']
...

The numbers are still in a list and need to be converted from strings to floating point, but we have used the power of regular expressions to both search and extract the information we found interesting.

As another example of this technique, if you look at the file there are a number of lines of the form:

Details: http://source.sakaiproject.org/viewsvn/?view=rev&rev=39772

If we wanted to extract all of the revision numbers (the integer number at the end of these lines) using the same technique as above, we could write the following program:

# Search for lines that start with 'Details: rev='
# followed by numbers
# Then print the number if one is found
import re
hand = open('mbox-short.txt')
for line in hand:
    line = line.rstrip()
    x = re.findall('^Details:.*rev=([0-9]+)', line)
    if len(x) > 0:
        print(x)

# Code: http://www.py4e.com/code3/re12.py

Translating our regular expression, we are looking for lines that start with Details:, followed by any number of characters (.*), followed by rev=, and then by one or more digits. We want to find lines that match the entire expression but we only want to extract the integer number at the end of the line, so we surround [0-9]+ with parentheses.

When we run the program, we get the following output:

['39772']
['39771']
['39770']
['39769']
...

Remember that the [0-9]+ is “greedy” and it tries to make as large a string of digits as possible before extracting those digits. This “greedy” behavior is why we get all five digits for each number. The regular expression module expands in both directions until it encounters a non-digit, or the beginning or the end of a line.

Now we can use regular expressions to redo an exercise from earlier in the book where we were interested in the time of day of each mail message. We looked for lines of the form:

From stephen.marquard@uct.ac.za Sat Jan  5 09:14:16 2008

and wanted to extract the hour of the day for each line. Previously we did this with two calls to split. First the line was split into words and then we pulled out the fifth word and split it again on the colon character to pull out the two characters we were interested in.

While this worked, it actually results in pretty brittle code that is assuming the lines are nicely formatted. If you were to add enough error checking (or a big try/except block) to insure that your program never failed when presented with incorrectly formatted lines, the code would balloon to 10-15 lines of code that was pretty hard to read.

We can do this in a far simpler way with the following regular expression:

^From .* [0-9][0-9]:

The translation of this regular expression is that we are looking for lines that start with From (note the space), followed by any number of characters (.*), followed by a space, followed by two digits [0-9][0-9], followed by a colon character. This is the definition of the kinds of lines we are looking for.

In order to pull out only the hour using findall(), we add parentheses around the two digits as follows:

^From .* ([0-9][0-9]):

This results in the following program:

# Search for lines that start with From and a character
# followed by a two digit number between 00 and 99 followed by ':'
# Then print the number if one is found
import re
hand = open('mbox-short.txt')
for line in hand:
    line = line.rstrip()
    x = re.findall('^From .* ([0-9][0-9]):', line)
    if len(x) > 0: print(x)

# Code: http://www.py4e.com/code3/re13.py

When the program runs, it produces the following output:

['09']
['18']
['16']
['15']
...

Escape character

Since we use special characters in regular expressions to match the beginning or end of a line or specify wild cards, we need a way to indicate that these characters are “normal” and we want to match the actual character such as a dollar sign or caret.

We can indicate that we want to simply match a character by prefixing that character with a backslash. For example, we can find money amounts with the following regular expression.

import re
x = 'We just received $10.00 for cookies.'
y = re.findall('\$[0-9.]+',x)

Since we prefix the dollar sign with a backslash, it actually matches the dollar sign in the input string instead of matching the “end of line”, and the rest of the regular expression matches one or more digits or the period character. Note: Inside square brackets, characters are not “special”. So when we say [0-9.], it really means digits or a period. Outside of square brackets, a period is the “wild-card” character and matches any character. Inside square brackets, the period is a period.


Summary

While this only scratched the surface of regular expressions, we have learned a bit about the language of regular expressions. They are search strings with special characters in them that communicate your wishes to the regular expression system as to what defines “matching” and what is extracted from the matched strings. Here are some of those special characters and character sequences:
^ Matches the beginning of the line.

$ Matches the end of the line.

. Matches any character (a wildcard).

\s Matches a whitespace character.

\S Matches a non-whitespace character (opposite of \s).

* Applies to the immediately preceding character(s) and indicates to match zero or more times.

*? Applies to the immediately preceding character(s) and indicates to match zero or more times in “non-greedy mode”.

+ Applies to the immediately preceding character(s) and indicates to match one or more times.

+? Applies to the immediately preceding character(s) and indicates to match one or more times in “non-greedy mode”.

? Applies to the immediately preceding character(s) and indicates to match zero or one time.

?? Applies to the immediately preceding character(s) and indicates to match zero or one time in “non-greedy mode”.

[aeiou] Matches a single character as long as that character is in the specified set. In this example, it would match “a”, “e”, “i”, “o”, or “u”, but no other characters.

[a-z0-9] You can specify ranges of characters using the minus sign. This example is a single character that must be a lowercase letter or a digit.

[^A-Za-z] When the first character in the set notation is a caret, it inverts the logic. This example matches a single character that is anything other than an uppercase or lowercase letter.

( ) When parentheses are added to a regular expression, they are ignored for the purpose of matching, but allow you to extract a particular subset of the matched string rather than the whole string when using findall().

\b Matches the empty string, but only at the start or end of a word.

\B Matches the empty string, but not at the start or end of a word.

\d Matches any decimal digit; equivalent to the set [0-9].

\D Matches any non-digit character; equivalent to the set [^0-9].

Bonus section for Unix / Linux users

Support for searching files using regular expressions was built into the Unix operating system since the 1960s and it is available in nearly all programming languages in one form or another.

As a matter of fact, there is a command-line program built into Unix called grep (Generalized Regular Expression Parser) that does pretty much the same as the search() examples in this chapter. So if you have a Macintosh or Linux system, you can try the following commands in your command-line window.

$ grep '^From:' mbox-short.txt
From: stephen.marquard@uct.ac.za
From: louis@media.berkeley.edu
From: zqian@umich.edu
From: rjlowe@iupui.edu

This tells grep to show you lines that start with the string “From:” in the file mbox-short.txt. If you experiment with the grep command a bit and read the documentation for grep, you will find some subtle differences between the regular expression support in Python and the regular expression support in grep. As an example, grep does not support the non-blank character \S so you will need to use the slightly more complex set notation [^ ], which simply means match a character that is anything other than a space.


Debugging

Python has some simple and rudimentary built-in documentation that can be quite helpful if you need a quick refresher to trigger your memory about the exact name of a particular method. This documentation can be viewed in the Python interpreter in interactive mode.

You can bring up an interactive help system using help().

>>> help()

help> modules

If you know what module you want to use, you can use the dir() command to find the methods in the module as follows:

>>> import re
>>> dir(re)
[.. 'compile', 'copy_reg', 'error', 'escape', 'findall',
'finditer', 'match', 'purge', 'search', 'split', 'sre_compile',
'sre_parse', 'sub', 'subn', 'sys', 'template']

You can also get a small amount of documentation on a particular method using the help command combined with the desired method.

>>> help (re.search)
Help on function search in module re:

search(pattern, string, flags=0)
    Scan through string looking for a match to the pattern, returning
    a match object, or None if no match was found.
>>>

The built-in documentation is not very extensive, but it can be helpful when you are in a hurry or don’t have access to a web browser or search engine.


Python Regular Expression Quick Guide

Python Regular Expression Quick Guide

Python Regular Expression Quick Guide

^        Matches the beginning of a line
$        Matches the end of the line
.        Matches any character
\s       Matches whitespace
\S       Matches any non-whitespace character
*        Repeats a character zero or more times
*?       Repeats a character zero or more times 
         (non-greedy)
+        Repeats a character one or more times
+?       Repeats a character one or more times 
         (non-greedy)
[aeiou]  Matches a single character in the listed set
[^XYZ]   Matches a single character not in the listed set
[a-z0-9] The set of characters can include a range
(        Indicates where string extraction is to start
)        Indicates where string extraction is to end

Glossary

brittle code
Code that works when the input data is in a particular format but is prone to breakage if there is some deviation from the correct format. We call this “brittle code” because it is easily broken.
greedy matching
The notion that the + and * characters in a regular expression expand outward to match the largest possible string.
grep
A command available in most Unix systems that searches through text files looking for lines that match regular expressions. The command name stands for “Generalized Regular Expression Parser”.
regular expression
A language for expressing more complex search strings. A regular expression may contain special characters that indicate that a search only matches at the beginning or end of a line or many other similar capabilities.
wild card
A special character that matches any character. In regular expressions the wild-card character is the period.


Exercises

Exercise 1:
Write a simple program to simulate the operation of the grep command on Unix. Ask the user to enter a regular expression and count the number of lines that matched the regular expression:

$ python grep.py
Enter a regular expression: ^Author
mbox.txt had 1798 lines that matched ^Author

$ python grep.py
Enter a regular expression: ^X-
mbox.txt had 14368 lines that matched ^X-

$ python grep.py
Enter a regular expression: java$
mbox.txt had 4175 lines that matched java$

Exercise 2:
Write a program to look for lines of the form:

New Revision: 39772

Extract the number from each of the lines using a regular expression and the findall() method. Compute the average of the numbers and print out the average as an integer.

Enter file:mbox.txt
38549

Enter file:mbox-short.txt
39756

本文来自互联网用户投稿,该文观点仅代表作者本人,不代表本站立场。本站仅提供信息存储空间服务,不拥有所有权,不承担相关法律责任。如若转载,请注明出处:http://www.coloradmin.cn/o/849194.html

如若内容造成侵权/违法违规/事实不符,请联系多彩编程网进行投诉反馈,一经查实,立即删除!

相关文章

[保研/考研机试] 约瑟夫问题No.2 C++实现

题目要求&#xff1a; 输入、输出样例&#xff1a; 源代码&#xff1a; #include<iostream> #include<queue> #include<vector> using namespace std;//例题5.2 约瑟夫问题No.2 int main() {int n, p, m;while (cin >> n >> p >> m) {//如…

【LeetCode】字符串与栈的练习

字符串相乘 class Solution { public:/** 将两个字符串相乘拆分成两步&#xff1a;* 先将一个字符串的每个字符与另一个字符串相乘得到一个计算结果* 再将所有计算结果的字符串进行相加*/string multiply(string num1, string num2) {string result "0";// 一个字…

微服务——es数据聚合+RestClient实现聚合

数据聚合 聚合的种类 DSL实现Bucket聚合 如图所示&#xff0c;设置了10个桶&#xff0c;那么就显示了数量最多的前10个桶&#xff0c;品牌含有7天酒店的有30家&#xff0c; 品牌含有如家的也有30家。 修改排序规则 限定聚合范围 DSL实现Metrics聚合 如下案例要求对不同的品…

录像模糊变高清:提高录制视频清晰度的方法

录像是记录生活点滴的重要方式之一&#xff0c;然而&#xff0c;由于种种原因&#xff0c;我们可能会遇到一些模糊、不清晰的视频。为了解决这一问题&#xff0c;今天来给大家分享一下如何利用牛学长视频修复工具修复录像视频清晰度的方法&#xff0c;方便快捷&#xff0c;无需…

springBoot集成caffeine,自定义缓存配置 CacheManager

目录 springboot集成caffeine Maven依赖 配置信息&#xff1a;properties文件 config配置 使用案例 Caffeine定制化配置多个cachemanager springboot集成redis并且定制化配置cachemanager springboot集成caffeine Caffeine是一种基于服务器内存的缓存库。它将数据存储在…

批量查询快递信息的最佳解决方案

快递查询是我们日常生活中经常需要进行的操作&#xff0c;然而&#xff0c;当我们有多个快递单号需要查询时&#xff0c;逐个查询就显得非常繁琐和耗时。为了解决这个问题&#xff0c;今天给大家推荐一款实用的软件——【固乔快递查询助手】。 首先&#xff0c;在浏览器中搜索并…

Activity启动过程详解(Android 12源码分析)

Activity的启动方式 启动一个Activity&#xff0c;通常有两种情况&#xff0c;一种是在应用内部启动Activity&#xff0c;另一种是Launcher启动 1、应用内启动 通过startActivity来启动Activity 启动流程&#xff1a; 一、Activity启动的发起 二、Activity的管理——ATMS 三、…

Mysql根据创建时间表分区实践

背景 最近订单表遇到大数据量的问题&#xff0c;并且表中随着时间的积累会变得更大&#xff0c;当数据量较大时&#xff0c;存储的物理文件会变得非常大、使用性能很差。 我们用的是GaussDB。为了提高查询效率&#xff0c;建议表大于500w进行分区&#xff0c;所以在规划阶段我…

opencv基础47 查找图像轮廓cv2.findContours()详解

什么是图像轮廓&#xff1f; 图像轮廓是指图像中物体边缘的连续性曲线。在计算机视觉和图像处理中&#xff0c;轮廓通常被用于检测物体、分割图像以及提取物体特征。 图像轮廓是由一系列连续的像素点组成&#xff0c;这些像素点位于物体边界上。轮廓的特点是在物体和背景之间的…

springboot生成表结构和表数据sql

需求 业务背景是需要某单机程序需要把正在进行的任务导出&#xff0c;然后另一台电脑上单机继续运行&#xff0c;我这里选择的方案是同步SQL形式&#xff0c;并保证ID随机&#xff0c;多个数据库不会重复。 实现 package com.nari.web.controller.demo.controller;import cn…

Android 11 获取启动其他应用

Android 11 获取启动其他应用 本文代码地址 https://gitee.com/chenjim/QueryAppInfo 最新更新地址 https://gitee.com/chenjim/chenjimblog 前言 如果应用以 Android 11&#xff08;API 级别 30&#xff09;或更高版本为目标平台&#xff0c;并查询与设备上已安装的其他应用相…

Python 面试必知必会(一):数据结构

《Python Cookbook》的作者David Beazley的课程PPT开源了&#xff0c;目标用户是希望从编写基础脚本过渡到编写更复杂程序的高级 Python 程序员&#xff0c;课程主题侧重于流行库和框架中使用的编程技术&#xff0c;主要目的是更好地理解 Python 语言本身&#xff0c;以便阅读他…

简易图书管理系统(面向对象思想)

目录 前言 1.整体思路 2.Book包 2.1Book类 2.2BookList类 3.user包 3.1User类 3.2NormalUser类 3.3AdminUser类 4.operation 4.1IOPeration接口 4.2ExitOperation类 4.3FindOperation类 4.4ShowOperation类 4.5AddOperation类 4.6DelOperation类 4.7BorrowOpera…

下半年提速拓店,为什么说屈臣氏引领美妆零售的未来?

屈臣氏过去是美妆零售的先锋&#xff0c;目前来看它或许仍然是先锋。 杰弗里摩尔在《公司进化论》中总结提出&#xff0c;自由市场经济运作的方式&#xff0c;遵循着一些类似自然界有机系统的定律&#xff1a;通俗来说&#xff0c;资源竞争带来“刺激创新”——由消费者偏好形…

看重ARM?苹果、三星、英伟达等知名企业纷纷表示加大投资

根据日经亚洲的报道&#xff0c;芯片设计公司Arm计划进行首次公开募股并在纳斯达克上市。苹果、三星电子、英伟达、英特尔等知名企业计划在Arm美股上市后投资该公司。 据悉&#xff0c;Arm将于9月份上市&#xff0c;预计估值将达到至少600亿美元&#xff08;约合4314亿元人民币…

【PyQt5+matplotlib】获取鼠标在canvas上的点击坐标

示例代码&#xff1a; import sys import matplotlib.pyplot as plt from PyQt5.QtWidgets import QApplication, QMainWindow, QVBoxLayout, QWidget from matplotlib.backends.backend_qt5agg import FigureCanvasQTAgg as FigureCanvasclass MyMainWindow(QMainWindow):de…

vulnhub靶场-y0usef笔记

vulnhub靶场-y0usef笔记 信息收集 首先fscan找到目标机器ip http://192.168.167.70/ nmap扫描端口 Host is up (0.00029s latency). Not shown: 998 closed tcp ports (reset) PORT STATE SERVICE VERSION 22/tcp open ssh OpenSSH 6.6.1p1 Ubuntu 2ubuntu2.13 (Ub…

(十五)大数据实战——hive的安装部署

前言 Hive是由Facebook开源&#xff0c;基于Hadoop的一个数据仓库工具&#xff0c;可以将结构化的数据文件映射为一张表&#xff0c;并提供类SQL查询功能。本节内容我们主要介绍一下hive的安装与部署的相关内容。 正文 上传hive安装包到hadoop101服务器/opt/software目录 解…

Reinforcement Learning with Code (对比Monte-Carlo与TD算法)【Code 3. MonteCarlo】

Reinforcement Learning with Code & &#xff08;对比Monte-Carlo与TD算法&#xff09;【Code 3. MonteCarlo】 This note records how the author begin to learn RL. Both theoretical understanding and code practice are presented. Many material are referenced s…

成功搞定H7-TOO的FreeRTOS Trace图形化链表方式展示任务管理

之前推出了H7-TOOL的RTOS Trace功能&#xff0c;已经支持RTX5&#xff0c;ThreadX&#xff0c;uCOS-III&#xff0c;uCOS-II和FreeRTOS&#xff0c;特色是不需要目标板额外做任何代码&#xff0c;实时检测RTOS任务执行情况&#xff0c;支持在线和脱机玩法&#xff0c;效果是下面…