Chrome image
Selenium provides an official image (repository on GitHub), but it bundles quite a lot of components:
- supervisord
- java
- chrome
- webDriver
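In practice I ran into a few problems:
- When Chrome performs memory- or CPU-intensive work, the Java process sometimes gets killed, and supervisord does not restart it. In ps, webdriver and Chrome then show up as orphan processes.
- In some cases webdriver does not exit cleanly. Starting webdriver again then launches another browser, so two browsers end up running.
Adding memory is of course the simplest fix, but for better control I packaged Chrome myself, together with my own business code. If either of the situations above occurs, I can then deal with it from the command line.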
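Dealing with those situations from the command line mostly means finding leftover Chrome/chromedriver processes and stopping them. A minimal Python sketch, assuming a Linux container; it reads /proc/<pid>/comm directly instead of calling ps (the ps tool is often missing from slim base images), and the helper names find_pids and kill_leftovers are hypothetical:

```python
import os
import signal

# Substrings of process names to look for. Chrome typically shows up as "chrome",
# Ubuntu's Chromium as "chromium-browse" (comm is truncated to 15 characters),
# Debian's as "chromium", and the driver as "chromedriver" (contains "chrome").
DEFAULT_NAMES = ("chrome", "chromium")


def find_pids(names=DEFAULT_NAMES, proc_root="/proc"):
    """Return the sorted PIDs whose <proc_root>/<pid>/comm contains one of `names`."""
    pids = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # skip /proc/self, /proc/meminfo, ...
        try:
            with open(os.path.join(proc_root, entry, "comm")) as f:
                comm = f.read().strip()
        except OSError:
            continue  # the process exited while we were scanning
        if any(name in comm for name in names):
            pids.append(int(entry))
    return sorted(pids)


def kill_leftovers(names=DEFAULT_NAMES, sig=signal.SIGTERM):
    """Send `sig` to every matching process and return the PIDs that were signalled."""
    signalled = []
    for pid in find_pids(names):
        if pid == os.getpid():
            continue  # never signal ourselves
        try:
            os.kill(pid, sig)
            signalled.append(pid)
        except ProcessLookupError:
            pass  # already gone
    return signalled
```

Call kill_leftovers() only when no healthy session should be running, since it stops every matching process; pass sig=signal.SIGKILL for processes that ignore SIGTERM. If your own code runs as PID 1 in the container, killed children can linger as zombies; starting the container with docker run --init adds a small init process that reaps them.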
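Two images are provided:
- Based on Ubuntu
- Based on Debian
The overall process is:
- Download the Chrome package for the target platform (either download it manually and install it locally, or let apt-get find and download it).
- Download the matching webDriver version.
- Install fonts with apt-get.
- Install Python and pip, then install the selenium library with pip. Use whatever stack you are comfortable with here: I provide Python code, but you could equally install a JDK, package your code as a jar and run it with java -jar, or install JDK 17 and work interactively from the command line.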
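Once these steps are done, a quick check that every piece is actually present saves debugging later. A minimal sketch; the binary names and the /google/chromedriver path are assumptions taken from the Dockerfiles in this article (use "chromium" instead of "chromium-browser" on the Debian image), and missing_prerequisites is a hypothetical helper:

```python
import importlib.util
import shutil

# Assumed names: chromium-browser is the Ubuntu binary and /google/chromedriver
# is where this article's Dockerfiles copy the driver.
REQUIRED_BINARIES = ("chromium-browser", "/google/chromedriver", "python3")
REQUIRED_MODULES = ("selenium",)


def missing_prerequisites(binaries=REQUIRED_BINARIES, modules=REQUIRED_MODULES,
                          which=shutil.which, find_spec=importlib.util.find_spec):
    """Return the binaries and Python modules from the steps above that cannot be found."""
    # shutil.which() checks PATH for bare names and the exact file for full paths
    missing = [name for name in binaries if which(name) is None]
    # find_spec() returns None for a top-level module that is not installed
    missing += [name for name in modules if find_spec(name) is None]
    return missing


if __name__ == "__main__":
    problems = missing_prerequisites()
    print("missing: " + ", ".join(problems) if problems else "all prerequisites found")
```

The which and find_spec parameters exist only so the check can be exercised without a real image; in the container, run the file with python3 and read the single line it prints.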
Based on Ubuntu
Download the package for the target platform
- Manual download and install
Two packages are needed:
  - chromium-codecs-ffmpeg-extra 90.0.4430.72-0ubuntu0.16.04.1 (amd64 binary) in ubuntu xenial
    - Page: https://launchpad.net/ubuntu/xenial/amd64/chromium-codecs-ffmpeg-extra/90.0.4430.72-0ubuntu0.16.04.1
    - Download link: http://launchpadlibrarian.net/534151129/chromium-codecs-ffmpeg-extra_90.0.4430.72-0ubuntu0.16.04.1_amd64.deb
  - chromium-browser 90.0.4430.72-0ubuntu0.16.04.1
    - Page: https://www.ubuntuupdates.org/package_metas?cx=005406051221663887916%3Aaw7ejs-ceqo&cof=FORID%3A11&ie=UTF-8&q=google-chrome-stable&commit=Package+Search
    - Download link: https://www.ubuntuupdates.org/package/core/xenial/universe/updates/chromium-browser
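- Online download and install with apt-get
This method also works on Debian, so it is not repeated in the Debian section.
It is taken from Selenium's Dockerfile, which assumes wget, curl, unzip, and sudo are available in the base image:
https://github.com/SeleniumHQ/docker-selenium/blob/trunk/NodeChrome/Dockerfile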
ARG CHROME_VERSION="google-chrome-stable"
RUN wget -q -O - https://dl-ssl.google.com/linux/linux_signing_key.pub | apt-key add - \
  && echo "deb http://dl.google.com/linux/chrome/deb/ stable main" >> /etc/apt/sources.list.d/google-chrome.list \
  && apt-get update -qqy \
  && apt-get -qqy install \
    ${CHROME_VERSION:-google-chrome-stable} \
  && rm /etc/apt/sources.list.d/google-chrome.list \
  && rm -rf /var/lib/apt/lists/* /var/cache/apt/*

#============================================
# Chrome webdriver
#============================================
# can specify versions by CHROME_DRIVER_VERSION
# Latest released version will be used by default
#============================================
ARG CHROME_DRIVER_VERSION
RUN if [ -z "$CHROME_DRIVER_VERSION" ]; \
  then CHROME_MAJOR_VERSION=$(google-chrome --version | sed -E "s/.* ([0-9]+)(\.[0-9]+){3}.*/\1/") \
    && NO_SUCH_KEY=$(curl -ls https://chromedriver.storage.googleapis.com/LATEST_RELEASE_${CHROME_MAJOR_VERSION} | head -n 1 | grep -oe NoSuchKey) ; \
    if [ -n "$NO_SUCH_KEY" ]; then \
      echo "No Chromedriver for version $CHROME_MAJOR_VERSION. Use previous major version instead" \
      && CHROME_MAJOR_VERSION=$(expr $CHROME_MAJOR_VERSION - 1); \
    fi ; \
    CHROME_DRIVER_VERSION=$(wget --no-verbose -O - "https://chromedriver.storage.googleapis.com/LATEST_RELEASE_${CHROME_MAJOR_VERSION}"); \
  fi \
  && echo "Using chromedriver version: "$CHROME_DRIVER_VERSION \
  && wget --no-verbose -O /tmp/chromedriver_linux64.zip https://chromedriver.storage.googleapis.com/$CHROME_DRIVER_VERSION/chromedriver_linux64.zip \
  && rm -rf /opt/selenium/chromedriver \
  && unzip /tmp/chromedriver_linux64.zip -d /opt/selenium \
  && rm /tmp/chromedriver_linux64.zip \
  && mv /opt/selenium/chromedriver /opt/selenium/chromedriver-$CHROME_DRIVER_VERSION \
  && chmod 755 /opt/selenium/chromedriver-$CHROME_DRIVER_VERSION \
  && sudo ln -fs /opt/selenium/chromedriver-$CHROME_DRIVER_VERSION /usr/bin/chromedriver
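The driver-version logic in that Dockerfile (take Chrome's major version, ask the bucket for LATEST_RELEASE_<major>, fall back to the previous major if the key is missing) can be sketched in Python. This is an illustrative port, not Selenium's code: it treats an HTTP 404 from the storage bucket as the NoSuchKey case, and all function names are hypothetical.

```python
import re
import urllib.error
import urllib.request

BASE_URL = "https://chromedriver.storage.googleapis.com"


def chrome_major_version(version_output):
    """Extract the major version from output such as 'Google Chrome 90.0.4430.72'.

    Roughly mirrors the Dockerfile's sed expression: a number followed by three .N groups.
    """
    match = re.search(r"(\d+)(?:\.\d+){3}", version_output)
    if match is None:
        raise ValueError("no version number in %r" % version_output)
    return int(match.group(1))


def fetch_text(url):
    """Return the response body, or None if the bucket has no such object (HTTP 404)."""
    try:
        with urllib.request.urlopen(url, timeout=30) as resp:
            return resp.read().decode("utf-8").strip()
    except urllib.error.HTTPError as err:
        if err.code == 404:
            return None
        raise


def resolve_driver_version(major, fetch=fetch_text):
    """Look up LATEST_RELEASE_<major>, falling back to <major - 1> like the Dockerfile."""
    version = fetch("%s/LATEST_RELEASE_%d" % (BASE_URL, major))
    if version is None:
        version = fetch("%s/LATEST_RELEASE_%d" % (BASE_URL, major - 1))
    if version is None:
        raise LookupError("no chromedriver release for Chrome %d or %d" % (major, major - 1))
    return version


def driver_zip_url(driver_version):
    """Download URL of the linux64 chromedriver archive for an exact driver version."""
    return "%s/%s/chromedriver_linux64.zip" % (BASE_URL, driver_version)
```

Chained together: feed the output of google-chrome --version (or chromium-browser --version) to chrome_major_version, pass the result to resolve_driver_version, and download driver_zip_url(...) of that version. The fetch parameter lets you swap in a stub when testing offline.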
Install fonts
#================
# Font libraries
#================
# libfontconfig ~1 MB
# libfreetype6 ~1 MB
# xfonts-cyrillic ~2 MB
# xfonts-scalable ~2 MB
# fonts-liberation ~3 MB
# fonts-ipafont-gothic ~13 MB
# fonts-wqy-zenhei ~17 MB
# fonts-tlwg-loma-otf ~300 KB
# ttf-ubuntu-font-family ~5 MB
# Ubuntu Font Family, sans-serif typeface hinted for clarity (also usable on Debian)
# fonts-noto-color-emoji ~10 MB
# Removed packages:
# xfonts-100dpi ~6 MB
# xfonts-75dpi ~6 MB
# Regarding fonts-liberation see:
# https://github.com/SeleniumHQ/docker-selenium/issues/383#issuecomment-278367069
# Layer size: small: 50.3 MB (with --no-install-recommends)
RUN apt-get -qqy update \
&& apt-get -qqy --no-install-recommends install \
libfontconfig \
libfreetype6 \
xfonts-cyrillic \
xfonts-scalable \
fonts-liberation \
fonts-ipafont-gothic \
fonts-wqy-zenhei \
fonts-tlwg-loma-otf \
ttf-ubuntu-font-family \
fonts-noto-color-emoji \
&& rm -rf /var/lib/apt/lists/* \
&& apt-get -qyy clean
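To confirm that the installed fonts actually cover Chinese text (otherwise Chinese pages render as empty boxes in screenshots and PDFs), you can ask fontconfig. A small sketch, assuming the fc-list tool from the fontconfig package is available in the image; parse_families and families_for_lang are hypothetical helper names:

```python
import subprocess


def parse_families(fc_list_output):
    """Collect family names from `fc-list <pattern> family` output.

    fc-list prints one font per line; localized names of the same family are
    separated by commas.
    """
    families = set()
    for line in fc_list_output.splitlines():
        for name in line.split(","):
            name = name.strip()
            if name:
                families.add(name)
    return families


def families_for_lang(lang="zh"):
    """Ask fontconfig which installed families cover `lang` (requires fc-list)."""
    result = subprocess.run(
        ["fc-list", ":lang=" + lang, "family"],
        stdout=subprocess.PIPE,
        universal_newlines=True,  # Python 3.6 has no text=True
        check=True,
    )
    return parse_families(result.stdout)
```

An empty set from families_for_lang("zh") means no installed font covers Chinese, so fonts-wqy-zenhei (or another CJK font) did not make it into the image.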
Download webdriver
Download the webDriver version that matches Chrome; here that is version 90:
http://chromedriver.storage.googleapis.com/index.html
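ChromeDriver's version-selection guidance is to match Chrome's MAJOR.MINOR.BUILD (the first three numbers of the version). A small sketch for checking that inside the image, assuming the chromium-browser binary and the /google/chromedriver path used in this article's Dockerfile; the helper names are hypothetical:

```python
import re
import subprocess

# MAJOR.MINOR.BUILD followed by the PATCH number, e.g. "90.0.4430" + ".72"
_VERSION = re.compile(r"(\d+\.\d+\.\d+)\.\d+")


def build_prefix(version_output):
    """Return MAJOR.MINOR.BUILD from a `--version` line, e.g. '90.0.4430'."""
    match = _VERSION.search(version_output)
    if match is None:
        raise ValueError("no version number in %r" % version_output)
    return match.group(1)


def versions_compatible(browser_output, driver_output):
    """True if browser and driver share the same MAJOR.MINOR.BUILD."""
    return build_prefix(browser_output) == build_prefix(driver_output)


def installed_version(cmd):
    """Run `<cmd> --version` and return its output (Python 3.6 compatible)."""
    result = subprocess.run([cmd, "--version"], stdout=subprocess.PIPE,
                            universal_newlines=True, check=True)
    return result.stdout.strip()
```

In the container, versions_compatible(installed_version("chromium-browser"), installed_version("/google/chromedriver")) should return True; if it returns False, download a different driver before debugging anything else.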
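Dockerfile
With the preparation done, here is the Dockerfile. Note that it installs from packages downloaded locally beforehand; if you install online instead, replace the local-install steps with the apt-get commands shown above.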
FROM ubuntu:18.04
RUN mkdir /google
#==============================
# copy chrome and driver
#==============================
COPY chromium-browser_90.0.4430.72-0ubuntu0.16.04.1_amd64.deb \
chromedriver \
chromium-codecs-ffmpeg-extra_90.0.4430.72-0ubuntu0.16.04.1_amd64.deb \
/google/
#==============================
# Locale and encoding settings,install fonts and chrome,python3.6 pip3
#==============================
ENV LANG_WHICH en
ENV LANG_WHERE US
ENV ENCODING UTF-8
ENV LANGUAGE ${LANG_WHICH}_${LANG_WHERE}.${ENCODING}
ENV LANG ${LANGUAGE}
# Layer size: small: ~9 MB (with --no-install-recommends)
RUN apt-get -y update \
&& apt-get -y --no-install-recommends install \
language-pack-en \
tzdata \
locales \
&& locale-gen ${LANGUAGE} \
&& dpkg-reconfigure --frontend noninteractive locales \
&& apt-get -y autoremove \
&& apt-get -y --no-install-recommends install \
libfontconfig \
libfreetype6 \
xfonts-cyrillic \
xfonts-scalable \
fonts-liberation \
fonts-ipafont-gothic \
fonts-wqy-zenhei \
fonts-tlwg-loma-otf \
ttf-ubuntu-font-family \
fonts-noto-color-emoji \
python3.6 python3-pip \
&& apt-get install -y --no-install-recommends \
/google/chromium-codecs-ffmpeg-extra_90.0.4430.72-0ubuntu0.16.04.1_amd64.deb \
/google/chromium-browser_90.0.4430.72-0ubuntu0.16.04.1_amd64.deb \
&& rm -rf /var/lib/apt/lists/* \
&& apt-get -qyy clean
#==============================
# pip3 install selenium
#==============================
RUN pip3 install selenium
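Directory structure
The build directory contains the Dockerfile plus the files referenced by COPY: the two .deb packages and the chromedriver binary.
Build command
Run the build from that directory: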
docker build . -t test/chrome:90
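Then open a shell in the container to check the result, for example:
docker run -it --rm test/chrome:90 bash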
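Based on Debian
Compared with the Ubuntu image, mainly the Chrome packages differ and everything else stays the same; Chrome again has to be downloaded in the matching version.
The Chromium version installed here is 108.
Download the package for the target platform
Two packages need to be installed:
- chromium (108.0.5359.94-1~deb11u1)
  - Page: https://packages.debian.org/bullseye/chromium
  - Download link: https://packages.debian.org/bullseye/amd64/chromium/download
- chromium-common (108.0.5359.94-1~deb11u1)
  - Page: https://packages.debian.org/bullseye/chromium-common
  - Download link: https://packages.debian.org/bullseye/amd64/chromium-common/download
Note that the webdriver must also be version 108.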
Dockerfile
FROM ubuntu:18.04
RUN mkdir /google
#==============================
# copy chrome and driver
#==============================
COPY chromium-common_108.0.5359.94-1~deb11u1_amd64.deb \
chromedriver \
chrome/chromium_108.0.5359.94-1~deb11u1_amd64.deb \
/google/
#==============================
# Locale and encoding settings,install fonts and chrome,python3.6 pip3
#==============================
ENV LANG_WHICH en
ENV LANG_WHERE US
ENV ENCODING UTF-8
ENV LANGUAGE ${LANG_WHICH}_${LANG_WHERE}.${ENCODING}
ENV LANG ${LANGUAGE}
# Layer size: small: ~9 MB (with --no-install-recommends)
RUN apt-get -y update \
&& apt-get -y --no-install-recommends install \
language-pack-en \
tzdata \
locales \
&& locale-gen ${LANGUAGE} \
&& dpkg-reconfigure --frontend noninteractive locales \
&& apt-get -y autoremove \
&& apt-get -y --no-install-recommends install \
libfontconfig \
libfreetype6 \
xfonts-cyrillic \
xfonts-scalable \
fonts-liberation \
fonts-ipafont-gothic \
fonts-wqy-zenhei \
fonts-tlwg-loma-otf \
ttf-ubuntu-font-family \
fonts-noto-color-emoji \
python3.6 python3-pip \
&& apt-get install -y --no-install-recommends \
/google/chromium-common_108.0.5359.94-1~deb11u1_amd64.deb \
/google/chromium_108.0.5359.94-1~deb11u1_amd64.deb \
&& rm -rf /var/lib/apt/lists/* \
&& apt-get -qyy clean
#==============================
# pip3 install selenium
#==============================
RUN pip3 install selenium
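P.S. Note that I do not delete the .deb packages after installing them; you can remove them once the step above has finished. Deleting them in a later RUN step only removes them from the container filesystem, though: the image does not get smaller, because the files still live in the COPY layer.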
Python code
This example exports a web page as PDF:
import base64
import time
from selenium import webdriver

option = webdriver.ChromeOptions()
option.add_argument("--headless")
option.add_argument("--no-sandbox")  # Chrome refuses to run as root without this
# Selenium 3.x style; Selenium 4.10+ removed executable_path in favour of a Service object.
# /google/chromedriver is where the Dockerfiles above copy the driver.
driver = webdriver.Chrome(executable_path="/google/chromedriver", port=5055, options=option)
try:
    # Register the custom endpoints on the command executor
    driver.command_executor._commands["SendCommand"] = ("POST", "/session/$sessionId/chromium/send_command")
    driver.command_executor._commands["executeCdpCommand"] = ("POST", "/session/$sessionId/goog/cdp/execute")
    # Set the page load timeout before loading, so it applies to this load
    driver.set_page_load_timeout(300)
    driver.get("https://www.baidu.com/")
    time.sleep(10)  # crude wait for the page to finish rendering
    # A4 paper size in inches, including background graphics
    param = {"paperWidth": 8.27, "paperHeight": 11.69, "printBackground": True}
    page_res = driver.execute("executeCdpCommand", {"cmd": "Page.printToPDF", "params": param})["value"]
    pdf_bytes = base64.b64decode(page_res["data"])
    with open("/test1.pdf", "wb") as f:
        f.write(pdf_bytes)
finally:
    # Always quit, so chromedriver and Chrome are not left behind as orphan processes
    driver.quit()
The PDF is written inside the container; copy it out before viewing it, for example:
docker cp <container>:/test1.pdf .
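The fixed time.sleep(10) in the script above can be replaced with polling document.readyState, and Selenium's Chrome driver provides execute_cdp_cmd(cmd, cmd_args), which wraps the same goog/cdp/execute endpoint (present in Selenium 4 and, as far as I know, 3.141), so the manual route registration is not needed. A minimal sketch under those assumptions; wait_until_loaded, a4_print_params and save_page_as_pdf are hypothetical helper names, and driver is any Selenium Chrome driver:

```python
import base64
import time

MM_PER_INCH = 25.4


def a4_print_params(print_background=True):
    """Page.printToPDF parameters for A4 (210 x 297 mm); CDP expects inches."""
    return {
        "paperWidth": round(210 / MM_PER_INCH, 2),   # 8.27
        "paperHeight": round(297 / MM_PER_INCH, 2),  # 11.69
        "printBackground": print_background,
    }


def wait_until_loaded(driver, timeout=60.0, poll=0.5):
    """Poll document.readyState instead of sleeping for a fixed time."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if driver.execute_script("return document.readyState") == "complete":
            return
        time.sleep(poll)
    raise TimeoutError("page did not finish loading within %s s" % timeout)


def save_page_as_pdf(driver, path, params=None):
    """Print the current page via CDP, write the decoded PDF to `path`, return its size."""
    result = driver.execute_cdp_cmd("Page.printToPDF", params or a4_print_params())
    pdf_bytes = base64.b64decode(result["data"])
    with open(path, "wb") as f:
        f.write(pdf_bytes)
    return len(pdf_bytes)
```

Typical use after driver.get(url): call wait_until_loaded(driver), then save_page_as_pdf(driver, "/test1.pdf"). A readyState of "complete" does not cover content that JavaScript loads afterwards, so script-heavy pages may still need a short extra wait.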
Related reading:
https://stackoverflow.com/questions/49614217/selenium-clear-chrome-cache
https://chromedevtools.github.io/devtools-protocol/tot/Storage/
https://peter.sh/experiments/chromium-command-line-switches/#hide-scrollbars
https://stackoverflow.com/questions/53902507/unknown-error-session-deleted-because-of-page-crash-from-unknown-error-cannot