Example 1: Error caused by a missing jar
Error description:
Take parsing XML with Spark in a PySpark environment as an example.
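Spark's core package does not support parsing XML, so the spark-xml dependency has to be added, configured as: config("spark.jars.packages", "com.databricks:spark-xml_2.12:0.16.0")
When the job is deployed with spark-submit, however, this config() call in the code is typically too late to take effect: spark-submit starts the driver JVM and resolves dependencies before the Python script runs. If the --packages option is missing from the spark-submit command, the job fails with: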
Failed to find data source: xml. Please find packages
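The value given to spark.jars.packages (and to --packages) is a comma-separated list of Maven coordinates in groupId:artifactId:version form. As a minimal sketch, the helper below splits such a value and reads the Scala version suffix (_2.12) from the artifact name; that suffix has to match the Scala version of your Spark build. The names MavenCoordinate, parse_packages and scala_suffix are hypothetical and not part of PySpark.

```python
from typing import List, NamedTuple, Optional


class MavenCoordinate(NamedTuple):
    """One Maven coordinate, as used by spark.jars.packages / --packages."""
    group_id: str
    artifact_id: str
    version: str


def parse_packages(packages: str) -> List[MavenCoordinate]:
    """Split a comma-separated packages value into groupId:artifactId:version triples."""
    coords = []
    for entry in packages.split(","):
        parts = entry.strip().split(":")
        # Each entry needs exactly three non-empty parts
        if len(parts) != 3 or not all(parts):
            raise ValueError(f"expected groupId:artifactId:version, got {entry!r}")
        coords.append(MavenCoordinate(*parts))
    return coords


def scala_suffix(artifact_id: str) -> Optional[str]:
    """Return the Scala binary version suffix (e.g. '2.12') of an artifact, if any."""
    _, sep, suffix = artifact_id.rpartition("_")
    return suffix if sep and suffix.replace(".", "").isdigit() else None


if __name__ == "__main__":
    for c in parse_packages("com.databricks:spark-xml_2.12:0.16.0"):
        # prints: com.databricks spark-xml_2.12 0.16.0 2.12
        print(c.group_id, c.artifact_id, c.version, scala_suffix(c.artifact_id))
```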
The code is as follows (tested and working):
from pyspark.sql import SparkSession
import pyspark.sql.functions as F
from pyspark.sql.types import FloatType
import time

if __name__ == '__main__':
    starttime = time.time()
    spark = SparkSession.builder.appName("spark parse xml") \
        .config("spark.sql.shuffle.partitions", "4") \
        .config("spark.jars.packages", "com.databricks:spark-xml_2.12:0.16.0") \
        .getOrCreate()
    xmlstarttime = time.time()
    # Maven coordinate format: groupId:artifactId:version
    # spark-submit --master local[*] --packages com.databricks:spark-xml_2.12:0.16.0 index.py
    # Each <KYV> element becomes one row; attributePrefix "" keeps attribute column names unprefixed
    df = spark.read.format("xml") \
        .options(rootTag='KYV') \
        .options(rowTag='KYV') \
        .option("attributePrefix", "") \
        .load('hdfs://node1:8020/qar/keyValues/K_AirFASE_B_STD.xml')
    df.printSchema()  # printSchema() prints the schema itself and returns None
    print(df.columns)
Incorrect example:
> spark-submit --master local[*] index.py
Correct example:
> spark-submit --master local[*] --packages com.databricks:spark-xml_2.12:0.16.0 index.py
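The only difference between the two commands is the --packages option, which, like every spark-submit option, has to appear before the application file. A minimal sketch that assembles the spark-submit argument list in Python, for example to launch jobs from a script; build_submit_cmd is a hypothetical helper, and running the list assumes spark-submit is on the PATH.

```python
import shlex
from typing import List, Sequence


def build_submit_cmd(
    app: str,
    packages: Sequence[str] = (),
    py_files: Sequence[str] = (),
    master: str = "local[*]",
) -> List[str]:
    """Assemble a spark-submit argument list.

    spark-submit's own options (--master, --packages, --py-files) must come
    before the application file; anything after it is passed to the script.
    """
    cmd = ["spark-submit", "--master", master]
    if packages:
        # Several Maven coordinates are joined with commas
        cmd += ["--packages", ",".join(packages)]
    if py_files:
        # Several .py/.zip/.egg files are joined with commas as well
        cmd += ["--py-files", ",".join(py_files)]
    cmd.append(app)
    return cmd


if __name__ == "__main__":
    cmd = build_submit_cmd("index.py", packages=["com.databricks:spark-xml_2.12:0.16.0"])
    # shlex.join quotes local[*] so the shell does not try to glob it
    print(shlex.join(cmd))
```

To actually launch the job, pass the list to subprocess.run(cmd, check=True); a list avoids shell quoting issues with local[*].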
Example 2: Module not imported correctly; dependency files must be included with --py-files
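Tip: to include a single file, pass it directly with --py-files filename.py; for several files, such as whole package folders, first compress them into a zip and pass --py-files service-prod.zip.
Run: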
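What matters for the zip is its layout: spark-submit places the archive on the PYTHONPATH, so each package folder must sit at the root of the archive. A minimal sketch using the standard zipfile module; zip_packages is a hypothetical helper, and the folder names derivedParameter and utils are taken from this example's project.

```python
import os
import zipfile
from typing import Iterable


def zip_packages(project_root: str, package_dirs: Iterable[str], zip_path: str) -> None:
    """Zip package folders so that each one sits at the root of the archive.

    With derivedParameter/ and utils/ at the root, code can import
    `derivedParameter.<module>` once the zip is on the PYTHONPATH.
    """
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as zf:
        for pkg in package_dirs:
            for dirpath, dirnames, filenames in os.walk(os.path.join(project_root, pkg)):
                # Skip bytecode caches; only the sources need to ship
                dirnames[:] = [d for d in dirnames if d != "__pycache__"]
                for name in filenames:
                    if name.endswith(".pyc"):
                        continue
                    full_path = os.path.join(dirpath, name)
                    # Archive name relative to project_root, e.g. "derivedParameter/_AILERON_L_STD.py"
                    zf.write(full_path, os.path.relpath(full_path, project_root))
```

Called from the project root as zip_packages(".", ["derivedParameter", "utils"], "service-prod.zip"), this yields roughly the same layout as running zip -r service-prod.zip derivedParameter utils in that folder.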
> spark-submit --master local[*] --packages com.databricks:spark-xml_2.12:0.16.0 --py-files service-prod.zip service-index.py
The following error appears in the console:
Traceback (most recent call last):
  File "/root/s-main/service-index.py", line 6, in <module>
    from derivedParameter._AILERON_1_STD import _AILERON_1
  File "<frozen zipimport>", line 259, in load_module
  File "/root/s-main/service-prod.zip/derivedParameter/_AILERON_1_STD.py", line 3, in <module>
ModuleNotFoundError: No module named 'qar'
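Error analysis:
Line 3 of /root/s-main/service-prod.zip/derivedParameter/_AILERON_1_STD.py imports a module named qar, and that module cannot be found.
A module is not found when its package was not included correctly (or was never added to the project at all).
In the current folder, service-index.py is the main entry script, and service-prod.zip is an archive of the derivedParameter and utils folders. spark-submit places the zip on the PYTHONPATH, so derivedParameter and utils become top-level packages and there is no qar package. The line from qar.derivedParameter._AILERON_L_STD import _AILERON_L in derivedParameter/_AILERON_1_STD.py is therefore wrong. The fix is to drop the qar prefix, because the correct import path for this file is from derivedParameter._AILERON_L_STD import _AILERON_L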
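The failure can be reproduced without Spark, since --py-files ultimately puts the zip on sys.path and Python then imports from it through zipimport. The sketch below builds a tiny hypothetical service-prod.zip (assuming each package folder has an __init__.py, and using a made-up value for _AILERON_L) and imports the same module once with the qar prefix and once without it.

```python
import importlib
import os
import sys
import tempfile
import zipfile


def build_demo_zip(zip_path: str, import_line: str) -> None:
    """Write a hypothetical service-prod.zip with derivedParameter/ at its root.

    _AILERON_1_STD.py starts with `import_line`, mirroring line 3 of the
    failing module in the traceback.
    """
    with zipfile.ZipFile(zip_path, "w") as zf:
        zf.writestr("derivedParameter/__init__.py", "")
        zf.writestr("derivedParameter/_AILERON_L_STD.py", "_AILERON_L = 'L'\n")
        zf.writestr("derivedParameter/_AILERON_1_STD.py",
                    import_line + "\n_AILERON_1 = _AILERON_L + '1'\n")


def try_import(import_line: str) -> str:
    """Put a freshly built zip on sys.path (as --py-files does) and import from it."""
    with tempfile.TemporaryDirectory() as tmp:
        zip_path = os.path.join(tmp, "service-prod.zip")
        build_demo_zip(zip_path, import_line)
        sys.path.insert(0, zip_path)
        try:
            mod = importlib.import_module("derivedParameter._AILERON_1_STD")
            return mod._AILERON_1
        except ModuleNotFoundError as exc:
            return f"ModuleNotFoundError: {exc}"
        finally:
            sys.path.remove(zip_path)
            sys.path_importer_cache.pop(zip_path, None)
            # Forget the demo modules so the next call re-imports from a new zip
            for name in [n for n in sys.modules if n.split(".")[0] == "derivedParameter"]:
                del sys.modules[name]


if __name__ == "__main__":
    # prints: ModuleNotFoundError: No module named 'qar'
    print(try_import("from qar.derivedParameter._AILERON_L_STD import _AILERON_L"))
    # prints: L1
    print(try_import("from derivedParameter._AILERON_L_STD import _AILERON_L"))
```

Because only the zip root is on the path, any import prefix that names the local project folder (here qar) resolves locally in an IDE but fails once the code runs from the archive.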