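1 An example of image-scraping code
- The code below scrapes some images
- I wrote a short piece of bs4 code while learning, but it failed as soon as I ran it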
#E:\work\FangCloudV2\personal_space\2learn\python3\py0001.txt
import requests
from bs4 import BeautifulSoup

# Photo page to scrape
url = "https://movie.douban.com/celebrity/1011562/photos/"
res = requests.get(url)
content = BeautifulSoup(res.text, "html.parser")

# Select every <div class="cover"> element
data = content.find_all("div", attrs={"class": "cover"})
picture_list = []
for d in data:
    # Take the src attribute of the first <img> inside the div
    plist = d.find("img")["src"]
    picture_list.append(plist)
print(picture_list)
Source page: 刘涛 (Liu Tao), latest photos: https://movie.douban.com/celebrity/1011562/photos/
2 Running the script with python directly in cmd fails
2.1 The error
- Open cmd
- Run python <file>; it fails
- Error message: ModuleNotFoundError: No module named 'bs4'
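2.2 Cause of the error
- The bs4 (BeautifulSoup) module is not installed for the default Python, so the import fails
- In either of the following cases the error would not occur:
- If bs4 had been installed for the default Python first
- If I had run the script from cmd, Spyder, or PyCharm under the Anaconda environment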
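Because the outcome depends on which interpreter runs the script, it can help to ask the interpreter itself. The following is a minimal sketch using only the standard library; the helper name is_importable is hypothetical and chosen for illustration.

```python
import importlib.util
import sys


def is_importable(module_name):
    """Return True if this interpreter can locate the named top-level module."""
    return importlib.util.find_spec(module_name) is not None


# The interpreter that is actually running this script.
print("interpreter:", sys.executable)

# The two third-party modules the scraper imports.
for name in ("requests", "bs4"):
    status = "available" if is_importable(name) else "missing"
    print(f"{name}: {status}")
```

Running the same file once with the default Python and once from the Anaconda prompt should make the difference between the two environments visible.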
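2.3 Finding out what Python-related software is installed on the PC
- The next question is:
- (The machine I use was not necessarily set up by me, or I may have forgotten what I installed long ago)
- Can I find out, before installing, whether bs4 is already installed?
- Likewise, I want to know whether pip, requests, and other modules are already installed
- And where are these modules installed?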
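2.3.1 Command to list all Python versions
- py -0p
- Lists every Python version installed on the machine, with its path
- The one marked with * is the default version
- On my machine it shows two: the default one and the one inside Anaconda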
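2.3.2 pip list
- pip list
- pip list --format=columns
- Lists the modules installed for the Python that this pip belongs to
- Which path on disk do the modules shown by pip list actually correspond to?
- The interpreter's site-packages folder, for example:
- Python311\site-packages
- C:\Program Files (x86)\Microsoft Visual Studio\Shared\Python37_64\Lib\site-packages
- (the pip\_vendor subfolder inside it holds the libraries bundled with pip itself, not the modules installed through pip install)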
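The install folder can also be read from the interpreter instead of being located by hand. This is a small sketch using the standard-library sysconfig and importlib modules; the function names site_packages_dir and module_location are hypothetical.

```python
import importlib.util
import sysconfig


def site_packages_dir():
    """Return the folder where this interpreter installs pure-Python packages."""
    return sysconfig.get_paths()["purelib"]


def module_location(module_name):
    """Return the file a top-level module would be loaded from, or None if it is absent."""
    spec = importlib.util.find_spec(module_name)
    return None if spec is None else spec.origin


print("site-packages:", site_packages_dir())
print("requests:", module_location("requests"))
print("bs4:", module_location("bs4"))
```

Each installed Python reports its own folder, so the script has to be run with the interpreter you are asking about.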
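2.3.3 The pip show <module> command
- pip show pip
- pip show requests
- Shows details: name, version, install location, and so on
- For a module that is not installed, pip reports that the package was not found
- bs4 is one example here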
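The version that pip show reports can also be queried from Python. The sketch below assumes Python 3.8 or later, where importlib.metadata is part of the standard library; installed_version is a hypothetical helper name. Note that the lookup uses the distribution name, which for Beautiful Soup is beautifulsoup4, while the import name is bs4.

```python
from importlib import metadata


def installed_version(distribution_name):
    """Return the installed version string, or None if the distribution is absent."""
    try:
        return metadata.version(distribution_name)
    except metadata.PackageNotFoundError:
        return None


# Distribution names as published on PyPI, not import names.
for name in ("pip", "requests", "beautifulsoup4"):
    print(name, installed_version(name))
```

A result of None for a name means that distribution is not installed for the interpreter running the script.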
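2.3.4 Other common pip commands
- pip --help # show help and the full command list
- pip --version
- pip list
- pip list -o # list outdated packages
- pip show XXX
- pip install XXX
- pip install --upgrade XXX
- pip uninstall XXX
- pip search XXX # still listed in the help output, but PyPI no longer serves search requests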
Commands:
- install Install packages.
- download Download packages.
- uninstall Uninstall packages.
- freeze Output installed packages in requirements format.
- inspect Inspect the python environment.
- list List installed packages.
- show Show information about installed packages.
- check Verify installed packages have compatible dependencies.
- config Manage local and global configuration.
- search Search PyPI for packages.
- cache Inspect and manage pip's wheel cache.
- index Inspect information available from package indexes.
- wheel Build wheels from your requirements.
- hash Compute hashes of package archives.
- completion A helper command used for command completion.
- debug Show information useful for debugging.
- help Show help for commands.
General Options:
- -h, --help Show help.
- --debug Let unhandled exceptions propagate outside the main subroutine, instead of logging them to stderr.
- --isolated Run pip in an isolated mode, ignoring environment variables and user configuration.
- --require-virtualenv Allow pip to only run in a virtual environment; exit with an error otherwise.
- --python <python> Run pip with the specified Python interpreter.
- -v, --verbose Give more output. Option is additive, and can be used up to 3 times.
- -V, --version Show version and exit.
- -q, --quiet Give less output. Option is additive, and can be used up to 3 times (corresponding to WARNING, ERROR, and CRITICAL logging levels).
- --log <path> Path to a verbose appending log.
- --no-input Disable prompting for input.
- --keyring-provider <keyring_provider> Enable the credential lookup via the keyring library if user input is allowed. Specify which mechanism to use [disabled, import, subprocess]. (default: disabled)
- --proxy <proxy> Specify a proxy in the form scheme://[user:passwd@]proxy.server:port.
- --retries <retries> Maximum number of retries each connection should attempt (default 5 times).
- --timeout <sec> Set the socket timeout (default 15 seconds).
- --exists-action <action> Default action when a path already exists: (s)witch, (i)gnore, (w)ipe, (b)ackup, (a)bort.
- --trusted-host <hostname> Mark this host or host:port pair as trusted, even though it does not have valid or any HTTPS.
- --cert <path> Path to PEM-encoded CA certificate bundle. If provided, overrides the default. See 'SSL Certificate Verification' in pip documentation for more information.
- --client-cert <path> Path to SSL client certificate, a single file containing the private key and the certificate in PEM format.
- --cache-dir <dir> Store the cache data in <dir>.
- --no-cache-dir Disable the cache.
- --disable-pip-version-check Don't periodically check PyPI to determine whether a new version of pip is available for download. Implied with --no-index.
- --no-color Suppress colored output.
- --no-python-version-warning Silence deprecation warnings for upcoming unsupported Pythons.
- --use-feature <feature> Enable new functionality, that may be backward incompatible.
- --use-deprecated <feature> Enable deprecated functionality, that will be removed in the future.
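With more than one Python on the machine, a bare pip command may belong to a different interpreter than the one running the script. One common way around this is to call pip as python -m pip. The sketch below builds such a command for the current interpreter; pip_command and run_pip are hypothetical helper names, and run_pip assumes pip is installed for that interpreter.

```python
import subprocess
import sys


def pip_command(*args):
    """Build a pip command line bound to the interpreter running this script."""
    return [sys.executable, "-m", "pip", *args]


def run_pip(*args):
    """Run pip with the given arguments and return its standard output as text."""
    result = subprocess.run(
        pip_command(*args), capture_output=True, text=True, check=True
    )
    return result.stdout


# Show the command that would be executed, without running it.
print(pip_command("show", "requests"))
```

Calling run_pip("list") would then return the same listing as pip list, but for the interpreter that is executing the script.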
2.3.5 A less useful command
- python -m site
- What I noticed in its output was the folder at the py3.7 level:
- C:\Program Files (x86)\Microsoft Visual Studio\Shared\Python37_64
- rather than the folder where modules installed with pip are stored:
- C:\Program Files (x86)\Microsoft Visual Studio\Shared\Python37_64\Lib\site-packages
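2.3.6 Once bs4 is installed, the problem is solved
3 I chose to use bs4 (BeautifulSoup) under Anaconda
- I did not go on to install bs4 for the default Python
- Instead I chose to run cmd under Anaconda,
- where bs4 is already installed
- Note that this cmd is the one started from Anaconda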
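3.1 Another error (error 1)
- ImportError: cannot import name 'beautifulsoup' from 'bs4' (e:\ProgramData\anaconda3\lib\site-packages\bs4\__init__.py)
- Caused by writing: from bs4 import beautifulsoup
- Names in Python are case-sensitive; writing the class name with a capital B and a capital S fixes it
- from bs4 import BeautifulSoup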
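3.2 No error this time, but the scrape still did not succeed
- I suspect the request was rejected because no headers were sent
- It only returned an empty list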
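One way to test the suspicion in 3.2 is to send a browser-like User-Agent header and look at the HTTP status code before parsing. The following is only a sketch under assumptions: the User-Agent string is illustrative, the helper names fetch_html and extract_image_urls are hypothetical, and it is not confirmed that the site accepts the request once headers are added.

```python
import requests
from bs4 import BeautifulSoup

# Assumption: a generic browser-like User-Agent; the exact string is illustrative.
HEADERS = {"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64)"}


def fetch_html(url, timeout=10):
    """Download a page with custom headers and return (status_code, text)."""
    res = requests.get(url, headers=HEADERS, timeout=timeout)
    return res.status_code, res.text


def extract_image_urls(html):
    """Collect the src of the first <img> inside each <div class="cover">."""
    soup = BeautifulSoup(html, "html.parser")
    urls = []
    for cover in soup.find_all("div", attrs={"class": "cover"}):
        img = cover.find("img")
        # Skip cover divs without an image instead of raising an error.
        if img is not None and img.get("src"):
            urls.append(img["src"])
    return urls


# Offline demonstration on a hand-written snippet (not taken from the real page).
SAMPLE_HTML = """
<div class="cover"><a href="#"><img src="https://example.com/a.jpg"></a></div>
<div class="cover"></div>
"""
print(extract_image_urls(SAMPLE_HTML))
```

To try it against the real page, call fetch_html with the URL from section 1 and print the status code first. A status other than 200 would support the idea that the request was being refused rather than the parsing being wrong.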
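Additional notes (parked here for now)
1
Inspect
Press F12
What the two views show is different
Inspect: right-click an empty area of the page and choose Inspect
Select an element such as an image and click Inspect to jump to that image's tag in the markup
2
There are two ways to parse the content
Beautiful Soup
Parses mostly by HTML structure: head, body, div, p, a, li, and so on
It can also be set to parse as XML
XPath selects by position in the XML tree
Nodes
such as div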
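To make the XPath idea concrete, here is a small sketch that selects nodes by path. It uses the standard-library xml.etree.ElementTree, which supports only a limited subset of XPath and needs well-formed XML, so the markup below is a hand-written stand-in and not the real page. The function name cover_image_sources is hypothetical.

```python
import xml.etree.ElementTree as ET

# Hand-written, well-formed markup used only for illustration.
DOC = """<html><body>
  <div class="cover"><img src="a.jpg"/></div>
  <div class="cover"><img src="b.jpg"/></div>
  <div class="other"><img src="c.jpg"/></div>
</body></html>"""


def cover_image_sources(root):
    """Return the src of every <img> that is a direct child of a div with class 'cover'."""
    return [img.get("src") for img in root.findall(".//div[@class='cover']/img")]


root = ET.fromstring(DOC)
print(cover_image_sources(root))
```

The path expression reads as: anywhere below the root, find div nodes whose class attribute equals cover, then take their img children. This is the same selection that the Beautiful Soup code in section 1 expresses with find_all and find.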
3