Chapter 10: Dictionaries | Python for Everybody 讲义笔记_En

news2024/9/20 6:02:22

文章目录

  • Python for Everybody
    • 课程简介
    • Dictionaries
      • Dictionaries
      • Dictionary as a set of counters
      • Dictionaries and files
      • Looping and dictionaries
      • Advanced text parsing
      • Debugging
      • Glossary


Python for Everybody

Exploring Data Using Python 3
Dr. Charles R. Severance


课程简介

Python for Everybody 零基础程序设计(Python 入门)

  • This course aims to teach everyone the basics of programming computers using Python. 本课程旨在向所有人传授使用 Python 进行计算机编程的基础知识。
  • We cover the basics of how one constructs a program from a series of simple instructions in Python. 我们介绍了如何通过 Python 中的一系列简单指令构建程序的基础知识。
  • The course has no pre-requisites and avoids all but the simplest mathematics. Anyone with moderate computer experience should be able to master the materials in this course. 该课程没有任何先决条件,除了最简单的数学之外,避免了所有内容。任何具有中等计算机经验的人都应该能够掌握本课程中的材料。
  • This course will cover Chapters 1-5 of the textbook “Python for Everybody”. Once a student completes this course, they will be ready to take more advanced programming courses. 本课程将涵盖《Python for Everyday》教科书的第 1-5 章。学生完成本课程后,他们将准备好学习更高级的编程课程。
  • This course covers Python 3.

在这里插入图片描述

coursera

Python for Everybody 零基础程序设计(Python 入门)

Charles Russell Severance
Clinical Professor

在这里插入图片描述

个人主页
Twitter

在这里插入图片描述

University of Michigan


课程资源

coursera原版课程视频
coursera原版视频-中英文精校字幕-B站
Dr. Chuck官方翻录版视频-机器翻译字幕-B站

PY4E-课程配套练习
Dr. Chuck Online - 系列课程开源官网



Dictionaries

The dictionary data structures allows us to store multiple values in an object and look up the values by their key.


Dictionaries

A dictionary is like a list, but more general. In a list, the index positions have to be integers; in a dictionary, the indices can be (almost) any type.

You can think of a dictionary as a mapping between a set of indices (which are called keys) and a set of values. Each key maps to a value. The association of a key and a value is called a key-value pair or sometimes an item.

As an example, we’ll build a dictionary that maps from English to Spanish words, so the keys and the values are all strings.

The function dict creates a new dictionary with no items. Because dict is the name of a built-in function, you should avoid using it as a variable name.

>>> eng2sp = dict()
>>> print(eng2sp)
{}

The curly brackets, {}, represent an empty dictionary. To add items to the dictionary, you can use square brackets:

>>> eng2sp['one'] = 'uno'

This line creates an item that maps from the key 'one' to the value “uno”. If we print the dictionary again, we see a key-value pair with a colon between the key and value:

>>> print(eng2sp)
{'one': 'uno'}

This output format is also an input format. For example, you can create a new dictionary with three items.

>>> eng2sp = {'one': 'uno', 'two': 'dos', 'three': 'tres'}
>>> print(eng2sp)
{'one': 'uno', 'two': 'dos', 'three': 'tres'}

Since Python 3.7x the order of key-value pairs is the same as their input order, i.e. dictionaries are now ordered structures.

But that doesn’t really matter because the elements of a dictionary are never indexed with integer indices. Instead, you use the keys to look up the corresponding values:

>>> print(eng2sp['two'])
'dos'

The key 'two' always maps to the value “dos” so the order of the items doesn’t matter.

If the key isn’t in the dictionary, you get an exception:

>>> print(eng2sp['four'])
KeyError: 'four'

The len function works on dictionaries; it returns the number of key-value pairs:

>>> len(eng2sp)
3

The in operator works on dictionaries; it tells you whether something appears as a key in the dictionary (appearing as a value is not good enough).

>>> 'one' in eng2sp
True
>>> 'uno' in eng2sp
False

To see whether something appears as a value in a dictionary, you can use the method values, which returns the values as a type that can be converted to a list, and then use the in operator:

>>> vals = list(eng2sp.values())
>>> 'uno' in vals
True

The in operator uses different algorithms for lists and dictionaries. For lists, it uses a linear search algorithm. As the list gets longer, the search time gets longer in direct proportion to the length of the list. For dictionaries, Python uses an algorithm called a hash table that has a remarkable property: the in operator takes about the same amount of time no matter how many items there are in a dictionary. I won’t explain why hash functions are so magical, but you can read more about it at wikipedia.org/wiki/Hash_table.

Exercise 1:
Download a copy of the file www.py4e.com/code3/words.txt
Write a program that reads the words in words.txt and stores them as keys in a dictionary. It doesn’t matter what the values are. Then you can use the in operator as a fast way to check whether a string is in the dictionary.

Dictionary as a set of counters

Suppose you are given a string and you want to count how many times each letter appears. There are several ways you could do it:

  1. You could create 26 variables, one for each letter of the alphabet. Then you could traverse the string and, for each character, increment the corresponding counter, probably using a chained conditional.

  2. You could create a list with 26 elements. Then you could convert each character to a number (using the built-in function ord), use the number as an index into the list, and increment the appropriate counter.

  3. You could create a dictionary with characters as keys and counters as the corresponding values. The first time you see a character, you would add an item to the dictionary. After that you would increment the value of an existing item.

Each of these options performs the same computation, but each of them implements that computation in a different way.

An implementation is a way of performing a computation; some implementations are better than others. For example, an advantage of the dictionary implementation is that we don’t have to know ahead of time which letters appear in the string and we only have to make room for the letters that do appear.

Here is what the code might look like:

word = 'brontosaurus'
d = dict()
for c in word:
    if c not in d:
        d[c] = 1
    else:
        d[c] = d[c] + 1
print(d)

We are effectively computing a histogram, which is a statistical term for a set of counters (or frequencies).

The for loop traverses the string. Each time through the loop, if the character c is not in the dictionary, we create a new item with key c and the initial value 1 (since we have seen this letter once). If c is already in the dictionary we increment d[c].

Here’s the output of the program:

{'a': 1, 'b': 1, 'o': 2, 'n': 1, 's': 2, 'r': 2, 'u': 2, 't': 1}

The histogram indicates that the letters “a” and “b” appear once; “o” appears twice, and so on.

Dictionaries have a method called get that takes a key and a default value. If the key appears in the dictionary, get returns the corresponding value; otherwise it returns the default value. For example:

>>> counts = { 'chuck' : 1 , 'annie' : 42, 'jan': 100}
>>> print(counts.get('jan', 0))
100
>>> print(counts.get('tim', 0))
0

We can use get to write our histogram loop more concisely. Because the get method automatically handles the case where a key is not in a dictionary, we can reduce four lines down to one and eliminate the if statement.

word = 'brontosaurus'
d = dict()
for c in word:
    d[c] = d.get(c,0) + 1
print(d)

The use of the get method to simplify this counting loop ends up being a very commonly used “idiom” in Python and we will use it many times in the rest of the book. So you should take a moment and compare the loop using the if statement and in operator with the loop using the get method. They do exactly the same thing, but one is more succinct.

Dictionaries and files

One of the common uses of a dictionary is to count the occurrence of words in a file with some written text. Let’s start with a very simple file of words taken from the text of Romeo and Juliet.

For the first set of examples, we will use a shortened and simplified version of the text with no punctuation. Later we will work with the text of the scene with punctuation included.

But soft what light through yonder window breaks
It is the east and Juliet is the sun
Arise fair sun and kill the envious moon
Who is already sick and pale with grief

We will write a Python program to read through the lines of the file, break each line into a list of words, and then loop through each of the words in the line and count each word using a dictionary.

You will see that we have two for loops. The outer loop is reading the lines of the file and the inner loop is iterating through each of the words on that particular line. This is an example of a pattern called nested loops because one of the loops is the outer loop and the other loop is the inner loop.

Because the inner loop executes all of its iterations each time the outer loop makes a single iteration, we think of the inner loop as iterating “more quickly” and the outer loop as iterating more slowly.

The combination of the two nested loops ensures that we will count every word on every line of the input file.

fname = input('Enter the file name: ')
try:
    fhand = open(fname)
except:
    print('File cannot be opened:', fname)
    exit()

counts = dict()
for line in fhand:
    words = line.split()
    for word in words:
        if word not in counts:
            counts[word] = 1
        else:
            counts[word] += 1

print(counts)

# Code: http://www.py4e.com/code3/count1.py

In our else statement, we use the more compact alternative for incrementing a variable. counts[word] += 1 is equivalent to counts[word] = counts[word] + 1. Either method can be used to change the value of a variable by any desired amount. Similar alternatives exist for -=, *=, and /=.

When we run the program, we see a raw dump of all of the counts in unsorted hash order. (the romeo.txt file is available at www.py4e.com/code3/romeo.txt)

python count1.py
Enter the file name: romeo.txt
{'But': 1, 'soft': 1, 'what': 1, 'light': 1, 'through': 1, 'yonder': 1,
'window': 1, 'breaks': 1, 'It': 1, 'is': 3, 'the': 3, 'east': 1, 'and': 3,
'Juliet': 1, 'sun': 2, 'Arise': 1, 'fair': 1, 'kill': 1, 'envious': 1,
'moon': 1, 'Who': 1, 'already': 1, 'sick': 1, 'pale': 1, 'with': 1,
'grief': 1}

It is a bit inconvenient to look through the dictionary to find the most common words and their counts, so we need to add some more Python code to get us the output that will be more helpful.

Looping and dictionaries

If you use a dictionary as the sequence in a for statement, it traverses the keys of the dictionary. This loop prints each key and the corresponding value:

counts = { 'chuck' : 1 , 'annie' : 42, 'jan': 100}
for key in counts:
    print(key, counts[key])

Here’s what the output looks like:

chuck 1
annie 42
jan 100

Again, the keys are ordered.

We can use this pattern to implement the various loop idioms that we have described earlier. For example if we wanted to find all the entries in a dictionary with a value above ten, we could write the following code:

counts = { 'chuck' : 1 , 'annie' : 42, 'jan': 100}
for key in counts:
    if counts[key] > 10 :
        print(key, counts[key])

The for loop iterates through the keys of the dictionary, so we must use the index operator to retrieve the corresponding value for each key. Here’s what the output looks like:

annie 42
jan 100

We see only the entries with a value above 10.

If you want to print the keys in alphabetical order, you first make a list of the keys in the dictionary using the keys method available in dictionary objects, and then sort that list and loop through the sorted list, looking up each key and printing out key-value pairs in sorted order as follows:

counts = { 'chuck' : 1 , 'annie' : 42, 'jan': 100}
lst = list(counts.keys())
print(lst)
lst.sort()
for key in lst:
    print(key, counts[key])

Here’s what the output looks like:

['chuck', 'annie', 'jan']
annie 42
chuck 1
jan 100

First you see the list of keys in non-alphabetical order that we get from the keys method. Then we see the key-value pairs in alphabetical order from the for loop.

Advanced text parsing

In the above example using the file romeo.txt, we made the file as simple as possible by removing all punctuation by hand. The actual text has lots of punctuation, as shown below.

But, soft! what light through yonder window breaks?
It is the east, and Juliet is the sun.
Arise, fair sun, and kill the envious moon,
Who is already sick and pale with grief,

Since the Python split function looks for spaces and treats words as tokens separated by spaces, we would treat the words “soft!” and “soft” as different words and create a separate dictionary entry for each word.

Also since the file has capitalization, we would treat “who” and “Who” as different words with different counts.

We can solve both these problems by using the string methods lower, punctuation, and translate. The translate is the most subtle of the methods. Here is the documentation for translate:

line.translate(str.maketrans(fromstr, tostr, deletestr))

Replace the characters in fromstr with the character in the same position in tostr and delete all characters that are in deletestr. The fromstr and tostr can be empty strings and the deletestr parameter can be omitted.

We will not specify the tostr but we will use the deletestr parameter to delete all of the punctuation. We will even let Python tell us the list of characters that it considers “punctuation”:

>>> import string
>>> string.punctuation
'!"#$%&\'()*+,-./:;<=>?@[\\]^_`{|}~'

The parameters used by translate were different in Python 2.0.

We make the following modifications to our program:

import string

fname = input('Enter the file name: ')
try:
    fhand = open(fname)
except:
    print('File cannot be opened:', fname)
    exit()

counts = dict()
for line in fhand:
    line = line.rstrip()
    line = line.translate(line.maketrans('', '', string.punctuation))
    line = line.lower()
    words = line.split()
    for word in words:
        if word not in counts:
            counts[word] = 1
        else:
            counts[word] += 1

print(counts)

# Code: http://www.py4e.com/code3/count2.py

Part of learning the “Art of Python” or “Thinking Pythonically” is realizing that Python often has built-in capabilities for many common data analysis problems. Over time, you will see enough example code and read enough of the documentation to know where to look to see if someone has already written something that makes your job much easier.

The following is an abbreviated version of the output:

Enter the file name: romeo-full.txt
{'romeo': 40, 'and': 42, 'juliet': 32, 'act': 1, '2': 2, 'scene': 2,
'ii': 1, 'capulets': 1, 'orchard': 2, 'enter': 1, 'he': 5, 'jests': 1,
'at': 9, 'scars': 1, 'that': 30, 'never': 2, 'felt': 1, 'a': 24, 'wound': 1,
'appears': 1, 'above': 6, 'window': 2, 'but': 18, 'soft': 1, 'what': 11,
'light': 5, 'through': 2, 'yonder': 2, 'breaks': 1, ...}

Looking through this output is still unwieldy and we can use Python to give us exactly what we are looking for, but to do so, we need to learn about Python tuples. We will pick up this example once we learn about tuples.

Debugging

As you work with bigger datasets it can become unwieldy to debug by printing and checking data by hand. Here are some suggestions for debugging large datasets:

Scale down the input
If possible, reduce the size of the dataset. For example if the program reads a text file, start with just the first 10 lines, or with the smallest example you can find. You can either edit the files themselves, or (better) modify the program so it reads only the first n lines.

If there is an error, you can reduce n to the smallest value that manifests the error, and then increase it gradually as you find and correct errors.

Check summaries and types
Instead of printing and checking the entire dataset, consider printing summaries of the data: for example, the number of items in a dictionary or the total of a list of numbers.

A common cause of runtime errors is a value that is not the right type. For debugging this kind of error, it is often enough to print the type of a value.

Write self-checks
Sometimes you can write code to check for errors automatically. For example, if you are computing the average of a list of numbers, you could check that the result is not greater than the largest element in the list or less than the smallest. This is called a “sanity check” because it detects results that are “completely illogical”.

Another kind of check compares the results of two different computations to see if they are consistent. This is called a “consistency check”.

Pretty print the output
Formatting debugging output can make it easier to spot an error.
Again, time you spend building scaffolding can reduce the time you spend debugging.

Glossary

dictionary
A mapping from a set of keys to their corresponding values.
hashtable
The algorithm used to implement Python dictionaries.
hash function
A function used by a hashtable to compute the location for a key.
histogram
A set of counters.
implementation
A way of performing a computation.
item
Another name for a key-value pair.
key
An object that appears in a dictionary as the first part of a key-value pair.
key-value pair
The representation of the mapping from a key to a value.
lookup
A dictionary operation that takes a key and finds the corresponding value.
nested loops
When there are one or more loops “inside” of another loop. The inner loop runs to completion each time the outer loop runs once.
value
An object that appears in a dictionary as the second part of a key-value pair. This is more specific than our previous use of the word “value”.

本文来自互联网用户投稿,该文观点仅代表作者本人,不代表本站立场。本站仅提供信息存储空间服务,不拥有所有权,不承担相关法律责任。如若转载,请注明出处:http://www.coloradmin.cn/o/821267.html

如若内容造成侵权/违法违规/事实不符,请联系多彩编程网进行投诉反馈,一经查实,立即删除!

相关文章

《向量数据库指南》——腾讯云向量数据库Tencent Cloud VectorDB产品特性,架构和应用场景

腾讯云向量数据库(Tencent Cloud VectorDB)是一款全托管的自研企业级分布式数据库服务,专用于存储、检索、分析多维向量数据。该数据库支持多种索引类型和相似度计算方法,单索引支持 10 亿级向量规模,可支持百万级 QPS 及毫秒级查询延迟。腾讯云向量数据库不仅能为大模型提…

Window版本ES(ElasticSearch)的安装,使用,启动

首先我这里是根据小破站狂神说up主&#xff0c;而学习的&#xff0c;下面是笔记文档&#xff0c;文档可能比我更细&#xff0c;但我还是记录一下&#xff0c;嘿嘿嘿 ElasticSearch7.6入门学习笔记-KuangStudy-文章 下面直接开始&#xff0c;首先我们需要下载ELK三个安装包&…

spring boot合并 http请求(前后端实现)

为什么要合并http请求 页面加载时要初始化很多资源信息&#xff0c;会发送好多http请求&#xff0c;根据http的通信握手特点&#xff0c;多个http请求比较浪费资源。进而如果能将多个http请求合并为一个发送给服务器&#xff0c;由服务器获取对应的资源返回给客户端 奇思妙解 …

echarts实现正负轴柱状图

效果&#xff1a; data变量配置&#xff1a; // 正负柱状图zhengfuZhu: {},data1: [],data2: [],data3: [],data4: [],xAxisData1: [],//横轴刻度seriesLabel: { //柱子上的字体show: false,position: outside,color: #BEC3EB,},seriesEmphasis: { //itemStyle: {shadowBlur: …

【福建事业单位-语言理解】01中心理解

【福建事业单位-语言理解】01中心理解 一、中心理解题&#xff1a;关键词1.1 转折关系词总结 1.2因果关系词总结 1.3必要条件关系对策&#xff08;只有才的变体题型&#xff09;反面论证&#xff08;如果等词汇错误做法不好的结果&#xff09;对策特殊脉络总结 1.4 并列结构总结…

怎么找小红书美食探店博主合作?

在小红书上有许多活跃的美食探店博主&#xff0c;他们通过分享自己的餐厅体验、特色美食以及口味评价&#xff0c;吸引了大量用户的关注。对于想要在小红书上进行美食推广的企业来说&#xff0c;与这些博主合作是一种非常直接的方式&#xff0c;接下来伯乐网络传媒来给大家分享…

【已解决】标签死活不响应单击事件

大家好&#xff0c;我是执念斩长河。今天在公司写代码的时候突然遇到一个问题&#xff0c;这个问题困扰了我不久&#xff0c;就是html中li标签不能响应我的单击事件。最后在仔细分析下&#xff0c;解决了这个问题。 文章目录 1、问题来源2、问题解决方案3、问题解决效果4、总结…

C语言假期作业 DAY 10

一、选择题 1、求函数返回值&#xff0c;传入 -1 &#xff0c;则在64位机器上函数返回&#xff08; &#xff09; int func(int x) { int count 0; while (x) { count; x x&(x - 1);//与运算 } r eturn count; } A: 死循环 B: 64 C: 32 D: 16 答案解析 正确答案&#xff…

Elasticsearch:如何将整个 Elasticsearch 索引导出到文件 - Python 8.x

在实际的使用中&#xff0c;我们有时希望把 Elasticsearch 的索引保存到 JSON 文件中。在之前&#xff0c;我写了一篇管如何备份 Elasticsearch 索引的文章 “Elasticsearch&#xff1a;索引备份及恢复”。在今天&#xff0c;我们使用一种 Python 的方法来做进一步的探讨。你可…

SpringBoot复习:(12)SpringApplicationRunListener和 SpringApplicationRunListeners

SpringApplicationRunListener接口定义如下&#xff1a; public interface SpringApplicationRunListener {default void starting() {}default void environmentPrepared(ConfigurableEnvironment environment) {}default void contextPrepared(ConfigurableApplicationConte…

2023年【零声教育】13代C/C++Linux服务器开发高级架构师课程体系分析

对于零声教育的C/CLinux服务器高级架构师的课程到2022目前已经迭代到13代了&#xff0c;像之前小编也总结过&#xff0c;但是课程每期都有做一定的更新&#xff0c;也是为了更好的完善课程跟上目前互联网大厂的岗位技术需求&#xff0c;之前课程里面也包含了一些小的分支&#…

【连通块染色,双指针维护区间map,整除分块】CF616 CDE

Dashboard - Educational Codeforces Round 5 - Codeforces C 题意&#xff1a; 思路&#xff1a; 经典的给格子染色&#xff0c;直接dfs染色就行&#xff0c;不要用并查集&#xff0c;虽然也能做但是不好写 然后对于每一个*&#xff0c;把周围的格子的颜色扔到set里去重统…

什么叫做阻塞队列的有界和无界

问题描述 什么叫做阻塞队列的有界和无界&#xff1a; 解决方案&#xff1a; 1. 阻塞队列&#xff08;如图&#xff09;&#xff0c;是一种特殊的队列&#xff0c;它在普通队列的基础上提供了两个附加功能 a. 当队列为空的时候&#xff0c;获取队列中元素的消费者线程会被阻塞…

ChinaJoy 2023微星雷鸟17游戏本震撼发布:搭载AMD锐龙9 7945HX首发8499元

ChinaJoy 2023展会中微星笔记本再次给大家带来惊喜&#xff0c;发布了搭载AMD移动端16大核的旗舰游戏本&#xff1a;雷鸟17&#xff0c;更重要的这样一款旗舰性能的游戏本&#xff0c;首发价8499元堪称当今游戏本市场中的“性价比爆款”&#xff01; 本着和玩家一同制霸游戏战场…

leetcode 712. Minimum ASCII Delete Sum for Two Strings(字符串删除字母的ASCII码之和)

两个字符串s1, s2, 删除其中的字母使s1和s2相等。 问删除字母的最小ASCII码之和是多少。 思路&#xff1a; DP 先考虑极端的情况&#xff0c;s1为空&#xff0c;那么要想达到s2和s1相等&#xff0c;就要把s2中的字母删完&#xff0c; ASCII码之和就是s2中所有字母的ASCII码之…

大型集团企业一体化运维监控方案

当前&#xff0c;云计算、大数据、人工智能等IT技术迅猛发展&#xff0c;企业的信息化步入了一个崭新的时代&#xff0c;企业规模不断壮大&#xff0c;业务不断拓展&#xff0c;企业信息化依赖的网络结构和IT技术越来越复杂。企业运维部门采用的运维工具和技术实力直接决定企业…

算法训练营第五十七天||647.回文子串、516.最长回文子序列、动态规划总结

647.回文子串 出自代码随想录 如果大家做了很多这种子序列相关的题目&#xff0c;在定义dp数组的时候 很自然就会想题目求什么&#xff0c;我们就如何定义dp数组。 绝大多数题目确实是这样&#xff0c;不过本题如果我们定义&#xff0c;dp[i] 为 下标i结尾的字符串有 dp[i]个…

Cordova+Vue2.0打包apk,保姆教程来袭!

1.环境准备&#xff08;全部都需要配置环境变量&#xff09; java version "1.8.0_341" 安卓sdk android-29 Gradle 4.10.1 node v16.16.0 cordova 10.0.0 (cordova-lib10.1.0)2.安卓环境变量 1. 确认已安装 Android SDK Build-Tools 和 Android SDK Platform-Tool…

Sentaurus TCAD-Sdevice CV (电容特性)

File {grid "nnode|NMOS_msh.tdr" ## 输入包含掺杂信息的网格文件current "plot" ## 输出包含时间、电压和电流的_des.plt曲线文件&#xff0c;简称current plotplot "tdrdat" ## 输出包含掺杂浓度、电流密度、电势分布、电场分布…

使用redis-cli操作redis

redis-cli是原生redis自带的命令行工具&#xff0c;可以帮助我们通过简单的命令连接redis服务&#xff0c;并进行数据管理&#xff0c;即redis键&#xff08;key&#xff09;和redis数据结构的管理。 关于如何进入redis-cli命令行客户端&#xff0c;请查看文章&#xff1a;Redi…