Table of contents
- Preface
- Taking the first few characters of a column
- Method 1: [x[:7] for x in data["calling_nbr"]]
- Method 2: data['calling_nbr'].str[:7]
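Preface
In data analysis we sometimes need to extract the first few characters of each value in a single column. This article shows two ways to do that in pandas.
Taking the first few characters of a column
1. Build sample data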
import pandas as pd
import numpy as np
# Calling numbers
calling_nbr=["13389012374","13389012375","13389012376","13389012377","13389012379","13389012378","16758439532","16758439533","16758439534","16758439535","16758439536","16758439537"]
# Called (peer) numbers
called_nbr=["14374397533","14374397533","14374397533","15926372438","15926372439"]
# Call start dates
start_date=["20230404","20230406","20230408"]
data=pd.DataFrame({
"calling_nbr":[calling_nbr[x] for x in np.random.randint(0,len(calling_nbr),20)],
"called_nbr":[called_nbr[x] for x in np.random.randint(0,len(called_nbr),20)],
"calling_duration":np.random.randint(10,120,20),
"start_date":[start_date[x] for x in np.random.randint(0,len(start_date),20)]})
data
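Requirement: take the first 7 digits of each calling number in the "calling_nbr" column. For example, if row 0 holds "16758439534", extract the 7 characters "1675843". Store the result in a new DataFrame column named "calling_pre_7".
2. Check the data types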
data.dtypes
Method 1: [x[:7] for x in data["calling_nbr"]]
data["calling_pre_7"]=[x[:7] for x in data["calling_nbr"]]
data
Method 2: data['calling_nbr'].str[:7]
data["calling_pre_7"]=data["calling_nbr"].str[:7]
data
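As a quick sanity check, the two methods can be compared on a small fixed sample instead of the random data above. This is only a minimal sketch: the `sample` DataFrame and the column names `via_list` and `via_str` are made up for illustration.

```python
import pandas as pd

# Small fixed sample (illustrative values taken from the number list above)
sample = pd.DataFrame({"calling_nbr": ["13389012374", "16758439534"]})

# Method 1: slice each string in a list comprehension
sample["via_list"] = [x[:7] for x in sample["calling_nbr"]]

# Method 2: slice through the vectorized .str accessor
sample["via_str"] = sample["calling_nbr"].str[:7]

# Both columns should hold the same 7-character prefixes
print(sample["via_list"].tolist())
print(sample["via_list"].tolist() == sample["via_str"].tolist())
```

Both approaches give the same prefixes here; the `.str` form is shorter and stays inside pandas, while the list comprehension is plain Python.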
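Note:
Both methods work only when the column holds strings (shown as object in pandas). If the column has any other data type, convert it to strings first; otherwise neither method runs.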
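One way to tell up front whether a column needs converting is to inspect its dtype. The sketch below uses `pandas.api.types.is_numeric_dtype`; the two sample Series and the variable names are assumptions made for illustration.

```python
import pandas as pd
from pandas.api.types import is_numeric_dtype

text_col = pd.Series(["13389012374", "16758439534"])  # stored as strings
int_col = pd.Series([13389012374, 16758439534])        # stored as integers

# A numeric column has to be converted with astype(str) before slicing
text_needs_conversion = is_numeric_dtype(text_col)
int_needs_conversion = is_numeric_dtype(int_col)
print(text_needs_conversion, int_needs_conversion)
```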
Example
1. Build sample data
import pandas as pd
import numpy as np
# Calling numbers (stored as integers this time)
calling_nbr=[13389012374,13389012375,13389012376,13389012377,13389012379,13389012378,16758439532,16758439533,16758439534,16758439535,16758439536,16758439537]
# Called (peer) numbers
called_nbr=["14374397533","14374397533","14374397533","15926372438","15926372439"]
# Call start dates
start_date=["20230404","20230406","20230408"]
data=pd.DataFrame({
"calling_nbr":[calling_nbr[x] for x in np.random.randint(0,len(calling_nbr),20)],
"called_nbr":[called_nbr[x] for x in np.random.randint(0,len(called_nbr),20)],
"calling_duration":np.random.randint(10,120,20),
"start_date":[start_date[x] for x in np.random.randint(0,len(start_date),20)]})
data
2. Check the data types
data.dtypes
3. Take the first 7 digits of "calling_nbr"
Method 1:
data["calling_pre_7"]=[x[:7] for x in data["calling_nbr"]]
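Method 2:
data["calling_pre_7"]=data["calling_nbr"].str[:7]
As shown, both methods raise an error when applied directly to a column whose data type is int: method 1 fails because integers cannot be sliced, and method 2 fails because the .str accessor only works on string values.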
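The two failures can be reproduced in isolation. The sketch below uses a small made-up integer Series (`nums`) and catches the exceptions so their types can be printed; the exact error messages may differ between pandas versions, so they are not shown here.

```python
import pandas as pd

nums = pd.Series([13389012374, 16758439534])  # integer column, not strings

errors = {}

# Method 1: integers do not support slicing, so x[:7] fails
try:
    [x[:7] for x in nums]
except (TypeError, IndexError) as exc:
    errors["list comprehension"] = type(exc).__name__

# Method 2: the .str accessor is only available for string values
try:
    nums.str[:7]
except AttributeError as exc:
    errors[".str accessor"] = type(exc).__name__

print(errors)
```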
Fix: first convert "calling_nbr" to the str data type. For more on converting data types, see the article "Pandas数据类型转换" (Pandas data type conversion).
data['calling_nbr']=data['calling_nbr'].astype('str')
data.dtypes
Once the column has been converted to str, either of the following two lines runs successfully.
data["calling_pre_7"]=[x[:7] for x in data["calling_nbr"]]
data["calling_pre_7"]=data["calling_nbr"].str[:7]
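If the original integer column should stay unchanged, the conversion and the slice can also be chained in a single expression. This is a sketch under that assumption; the helper name `add_prefix_column` is hypothetical and not part of pandas.

```python
import pandas as pd

def add_prefix_column(df, source, target, n):
    """Return a copy of df with the first n characters of `source` stored in `target`."""
    out = df.copy()
    # astype(str) yields strings, so .str slicing works for int and str columns alike
    out[target] = out[source].astype(str).str[:n]
    return out

sample = pd.DataFrame({"calling_nbr": [13389012374, 16758439534]})
result = add_prefix_column(sample, "calling_nbr", "calling_pre_7", 7)
print(result["calling_pre_7"].tolist())
```

Because the helper works on a copy and converts only inside the expression, `sample["calling_nbr"]` keeps its integer dtype.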