Numpy
- Numpy的优势
- ndarray属性
- 基本操作
ndarray.func()
numpy.func() - ndarray的运算:逻辑运算、统计运算、数组间运算
- 合并、分割、IO操作、数据处理,不过这个一般使用的是pandas
Numpy的优势
Numpy = numerical数值化 + python 数值计算的python库,用于快速处理任意维度的数组。
ndarrray = n任意个 + d(dimension维度) + array 任意维度的数组的意思
Numpy使用ndarray对象来处理多维数组,该对象是一个快速而灵活的大数据容器
Numpy提供了一个N维数组类型ndarray,他描述相同类型的items的集合
import numpy as np
score = np.array([[80, 89, 86, 67, 79],
[78, 97, 89, 67, 81],
[90, 94, 78, 67, 74],
[91, 91, 90, 67, 69],
[76, 87, 75, 67, 86],
[70, 79, 84, 67, 84],
[94, 92, 93, 67, 64],
[86, 85, 83, 67, 80]])
score
array([[80, 89, 86, 67, 79],
[78, 97, 89, 67, 81],
[90, 94, 78, 67, 74],
[91, 91, 90, 67, 69],
[76, 87, 75, 67, 86],
[70, 79, 84, 67, 84],
[94, 92, 93, 67, 64],
[86, 85, 83, 67, 80]])
type(score)
numpy.ndarray
## ndarray和list的效率的对比
import random
import time
import numpy as np
a = []
for i in range(5000000):
a.append(random.random())
t1 = time.time()
sum1=sum(a)
t2 = time.time()
b = np.array(a)
t4 = time.time()
sum3=np.sum(b)
t5 = time.time()
print('使用原生list的求和计算使用的时间:', t2-t1, "\t使用ndarry的时间计算:", t5-t4)
使用原生list的求和计算使用的时间: 0.03126645088195801 使用ndarry的时间计算: 0.0027697086334228516
从上面的结果显示使用ndarray的时间处理和原生的list相比更加快速
Numpy专门的针对ndarray的操作和运算进行了设计,所以数组的存储效率和输入输出性能远远的高于Python中嵌套列表
- 第一个:内存块存储风格:ndarray必须要相同的类型,可以连续存储 list的通用性强,可以不同类型数据,所以list数据之间是依靠引用的形式存储
- 第二个:并行化处理形式:ndarray支持并行化运算
- 第三个:底层语言:Numpy底层语言是c,内部解除了GIL全局解释器的限制
ndarray属性
属性
ndarray.shape:数组维度的元组
ndarray.ndim:数组维度
ndarray.size:数组中元素的个数
ndarray.itemszie:一个数组元素的长度
ndarray.dtype:数组元素的类型
score
print(score.shape) #(8, 5) 8行5列
print(score.ndim) # 2
print(score.size) # 40
print(score.itemsize) # 4
print(score.dtype) # int32
(8, 5)
2
40
4
int32
## ndarray的形状
import numpy as np
a = np.array([1, 2, 3, 4])
b = np.array([[1, 2, 3], [3, 4, 5]])
c = np.array([[[1, 3, 4], [3, 4, 5]],
[[1, 5, 7], [4, 7, 8]]])
print(a.shape, b.shape, c.shape)
(4,) (2, 3) (2, 2, 3)
print(a, '\n\n', b, '\n\n', c)
[1 2 3 4]
[[1 2 3]
[3 4 5]]
[[[1 3 4]
[3 4 5]]
[[1 5 7]
[4 7 8]]]
data = np.array([1.1, 2.2, 3.3], dtype=np.float32)
data2 = np.array([1.2, 2.2, 3.2], dtype='float32')
print(data, data.dtype, data2, data2.dtype)
[1.1 2.2 3.3] float32 [1.2 2.2 3.2] float32
生成数组
- 生成0和1的:
np.ones(shape[, dtype, order]) np.zeros(shape[, dtype, order])
- np.ones(shape=(2, 3), dtype=‘int32’)
- np.zeros(shape=(2, 3), dtype=np.float32)
- 从现有数组中生成:
np.array() np.copy() np.asarray()
- data1 = np.array(score) ## 深拷贝
- data2 = np.asarray(score) ## 浅拷贝
- data3 = np.copy(score) ## 深拷贝
- 生成固定范围的数组:
np.linspace(satrt, stop, num, endpoint, restep, detype) np.arange()
- np.linspace(0, 10, 100) ## [0, 10]产生100个等距离的数组
- np.arange(a, b, c) ## 产生[a, b) 步长为c的数组
- 生成随机数组:
np.random.rand(d0, d1, d2,....)
返回[0.0, 1.0]内的一组均匀分布的数组, d0, d1, d2表示维度的元组数据- np.random.uniform(low=0.0, high=1.0, size=None) 均匀分布[low, high),size-int类型表输出一位样本数,元组表输出的是对应维度数组
- np.random.normal(loc=0.0, scale=1.0, size=None) 正态分布 均值loc 标准差scale 形状size
np.ones(shape=(2, 4))
array([[1., 1., 1., 1.],
[1., 1., 1., 1.]])
np.zeros((4, 3))
array([[0., 0., 0.],
[0., 0., 0.],
[0., 0., 0.],
[0., 0., 0.]])
data1 = np.array([1, 3, 4, 5])
data1
array([1, 3, 4, 5])
data2 = np.asarray(data1)
data2
array([1, 3, 4, 5])
data3 = np.copy(data1)
data3
array([1, 3, 4, 5])
np.linspace(0, 10, 100)
array([ 0. , 0.1010101 , 0.2020202 , 0.3030303 , 0.4040404 ,
0.50505051, 0.60606061, 0.70707071, 0.80808081, 0.90909091,
1.01010101, 1.11111111, 1.21212121, 1.31313131, 1.41414141,
1.51515152, 1.61616162, 1.71717172, 1.81818182, 1.91919192,
2.02020202, 2.12121212, 2.22222222, 2.32323232, 2.42424242,
2.52525253, 2.62626263, 2.72727273, 2.82828283, 2.92929293,
3.03030303, 3.13131313, 3.23232323, 3.33333333, 3.43434343,
3.53535354, 3.63636364, 3.73737374, 3.83838384, 3.93939394,
4.04040404, 4.14141414, 4.24242424, 4.34343434, 4.44444444,
4.54545455, 4.64646465, 4.74747475, 4.84848485, 4.94949495,
5.05050505, 5.15151515, 5.25252525, 5.35353535, 5.45454545,
5.55555556, 5.65656566, 5.75757576, 5.85858586, 5.95959596,
6.06060606, 6.16161616, 6.26262626, 6.36363636, 6.46464646,
6.56565657, 6.66666667, 6.76767677, 6.86868687, 6.96969697,
7.07070707, 7.17171717, 7.27272727, 7.37373737, 7.47474747,
7.57575758, 7.67676768, 7.77777778, 7.87878788, 7.97979798,
8.08080808, 8.18181818, 8.28282828, 8.38383838, 8.48484848,
8.58585859, 8.68686869, 8.78787879, 8.88888889, 8.98989899,
9.09090909, 9.19191919, 9.29292929, 9.39393939, 9.49494949,
9.5959596 , 9.6969697 , 9.7979798 , 9.8989899 , 10. ])
np.arange(0, 100, 2)
array([ 0, 2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 22, 24, 26, 28, 30, 32,
34, 36, 38, 40, 42, 44, 46, 48, 50, 52, 54, 56, 58, 60, 62, 64, 66,
68, 70, 72, 74, 76, 78, 80, 82, 84, 86, 88, 90, 92, 94, 96, 98])
np.random.uniform(1, 2, 20)
array([1.08186729, 1.14786875, 1.70033877, 1.21356519, 1.80826522,
1.82539046, 1.2411259 , 1.94754535, 1.26016768, 1.95195603,
1.83118684, 1.93096164, 1.42540342, 1.01900246, 1.00777939,
1.94587154, 1.30147204, 1.85872718, 1.51138215, 1.72144173])
np.random.rand(2, 3)
array([[0.93695681, 0.54056962, 0.05346231],
[0.25430123, 0.4679477 , 0.42365386]])
data4 = np.random.normal(0, 1, 10000000)
data4
array([-1.37843425, 0.43112438, 0.74566392, ..., 1.11031839,
-0.35627334, -0.49286865])
import matplotlib.pyplot as plt
plt.figure(figsize=(15, 8), dpi=80)
plt.hist(data4, 1000)
plt.show()
数组的切片操作和数据索引
import numpy as np
stock_change = np.random.normal(loc=0, scale=1, size=(8, 10))
stock_change
array([[-0.0128315 , 1.36389291, 1.67468755, -1.63839812, 0.50246918,
0.40632079, 0.5468709 , -1.51506239, -0.95175431, 0.79676231],
[-0.29024725, -0.85783328, -2.88228976, 0.09475102, 0.26886068,
-0.72337737, 0.32906655, 1.38442008, 0.22017286, 0.11595155],
[-1.48797053, -0.34888996, -0.46878054, 0.06614233, -1.2163201 ,
-0.12437208, -0.48048511, 0.92053831, 1.37148844, 0.4052761 ],
[-0.68483909, 1.45441467, 0.32439071, 2.09266866, -1.40087978,
0.21482243, 1.06350017, -1.12371055, -0.21362273, -0.86489608],
[-0.8955743 , -2.80666246, -1.81775787, -0.64719575, -1.03749633,
-0.09075791, 0.04027887, 0.88156425, -0.38851649, 0.4366844 ],
[-0.6112534 , 0.20743331, -1.10785011, -1.94937533, 0.79183302,
-1.43629441, -0.39276676, 1.43465142, -0.77917209, 0.75375268],
[-0.45255197, 0.21874378, 0.74356075, 0.89123163, 0.80052696,
0.07645454, 1.18475498, 1.21210169, -2.57089921, -0.04719686],
[ 1.49996354, 1.73125796, 0.35972564, -0.31768555, -0.23859956,
0.14878977, 1.78480518, -0.157626 , 0.52180221, 1.53564593]])
stock_change[0, 0:3] # 二维数组中第一个一维数组中的第0到3个之间的数据,左闭右开
array([-1.23848824, 1.80273454, 0.48612183])
a1 = np.array([[[1, 2, 3],[4, 5, 6]], [[12, 3, 4], [5, 6, 7]]])
a1
array([[[ 1, 2, 3],
[ 4, 5, 6]],
[[12, 3, 4],
[ 5, 6, 7]]])
a1[1, 0, 2] ## 三维数组中第二个二维数组中的第一个一维数组的第三个数据
4
形状的修改
- ndarray.reshape(shape)
- ndarray.resize(shape)
- ndarray.T
print(stock_change.shape)
print(stock_change)
data = stock_change.reshape(10, 8) ## 有返回值 不修改对象stock_change的原始数据
print(data.shape)
print(data)
(10, 8)
[[-1.23848824 1.80273454 0.48612183 -0.72560924 0.70273282 1.0001417
-1.50264292 0.07910228]
[ 0.50097203 -0.30643765 -2.06606864 1.06603865 -0.24707909 -0.43582239
1.40507793 0.16617008]
[ 0.90592803 0.42831191 -0.92043446 -0.86909989 1.86906101 -0.27504789
-0.85507962 -0.06812796]
[-0.47386474 -0.12860694 0.78529739 0.6299527 1.35195163 0.52554048
-1.44443021 -0.30228474]
[-2.00270709 -0.93547033 -1.91377025 -0.44282643 0.39398671 -1.15777911
1.06886255 -0.99258445]
[ 1.46011953 0.02989662 -0.57156073 0.33255032 1.10206919 1.10728184
-0.2309872 -0.36046913]
[ 0.6419396 0.45193213 -0.28647482 2.35270101 -1.36580147 -0.3416711
-0.68923525 0.40515396]
[-0.65856583 -0.80067154 1.00151152 -0.59024112 1.72517446 0.99283299
0.32894163 0.29112266]
[-0.02950995 1.00548516 0.28799688 -0.23560119 -0.27545952 -2.06756887
0.10599702 1.29010633]
[ 0.10229354 -1.61937238 -2.19289266 -2.0243394 -1.584921 1.1576834
0.11722609 1.00201755]]
(10, 8)
[[-1.23848824 1.80273454 0.48612183 -0.72560924 0.70273282 1.0001417
-1.50264292 0.07910228]
[ 0.50097203 -0.30643765 -2.06606864 1.06603865 -0.24707909 -0.43582239
1.40507793 0.16617008]
[ 0.90592803 0.42831191 -0.92043446 -0.86909989 1.86906101 -0.27504789
-0.85507962 -0.06812796]
[-0.47386474 -0.12860694 0.78529739 0.6299527 1.35195163 0.52554048
-1.44443021 -0.30228474]
[-2.00270709 -0.93547033 -1.91377025 -0.44282643 0.39398671 -1.15777911
1.06886255 -0.99258445]
[ 1.46011953 0.02989662 -0.57156073 0.33255032 1.10206919 1.10728184
-0.2309872 -0.36046913]
[ 0.6419396 0.45193213 -0.28647482 2.35270101 -1.36580147 -0.3416711
-0.68923525 0.40515396]
[-0.65856583 -0.80067154 1.00151152 -0.59024112 1.72517446 0.99283299
0.32894163 0.29112266]
[-0.02950995 1.00548516 0.28799688 -0.23560119 -0.27545952 -2.06756887
0.10599702 1.29010633]
[ 0.10229354 -1.61937238 -2.19289266 -2.0243394 -1.584921 1.1576834
0.11722609 1.00201755]]
stock_change.resize((10, 8)) ## 无返回值 直接改变stock_change对象
stock_change
array([[-1.23848824, 1.80273454, 0.48612183, -0.72560924, 0.70273282,
1.0001417 , -1.50264292, 0.07910228],
[ 0.50097203, -0.30643765, -2.06606864, 1.06603865, -0.24707909,
-0.43582239, 1.40507793, 0.16617008],
[ 0.90592803, 0.42831191, -0.92043446, -0.86909989, 1.86906101,
-0.27504789, -0.85507962, -0.06812796],
[-0.47386474, -0.12860694, 0.78529739, 0.6299527 , 1.35195163,
0.52554048, -1.44443021, -0.30228474],
[-2.00270709, -0.93547033, -1.91377025, -0.44282643, 0.39398671,
-1.15777911, 1.06886255, -0.99258445],
[ 1.46011953, 0.02989662, -0.57156073, 0.33255032, 1.10206919,
1.10728184, -0.2309872 , -0.36046913],
[ 0.6419396 , 0.45193213, -0.28647482, 2.35270101, -1.36580147,
-0.3416711 , -0.68923525, 0.40515396],
[-0.65856583, -0.80067154, 1.00151152, -0.59024112, 1.72517446,
0.99283299, 0.32894163, 0.29112266],
[-0.02950995, 1.00548516, 0.28799688, -0.23560119, -0.27545952,
-2.06756887, 0.10599702, 1.29010633],
[ 0.10229354, -1.61937238, -2.19289266, -2.0243394 , -1.584921 ,
1.1576834 , 0.11722609, 1.00201755]])
stock_change.T ## 转置
array([[-1.23848824, 0.50097203, 0.90592803, -0.47386474, -2.00270709,
1.46011953, 0.6419396 , -0.65856583, -0.02950995, 0.10229354],
[ 1.80273454, -0.30643765, 0.42831191, -0.12860694, -0.93547033,
0.02989662, 0.45193213, -0.80067154, 1.00548516, -1.61937238],
[ 0.48612183, -2.06606864, -0.92043446, 0.78529739, -1.91377025,
-0.57156073, -0.28647482, 1.00151152, 0.28799688, -2.19289266],
[-0.72560924, 1.06603865, -0.86909989, 0.6299527 , -0.44282643,
0.33255032, 2.35270101, -0.59024112, -0.23560119, -2.0243394 ],
[ 0.70273282, -0.24707909, 1.86906101, 1.35195163, 0.39398671,
1.10206919, -1.36580147, 1.72517446, -0.27545952, -1.584921 ],
[ 1.0001417 , -0.43582239, -0.27504789, 0.52554048, -1.15777911,
1.10728184, -0.3416711 , 0.99283299, -2.06756887, 1.1576834 ],
[-1.50264292, 1.40507793, -0.85507962, -1.44443021, 1.06886255,
-0.2309872 , -0.68923525, 0.32894163, 0.10599702, 0.11722609],
[ 0.07910228, 0.16617008, -0.06812796, -0.30228474, -0.99258445,
-0.36046913, 0.40515396, 0.29112266, 1.29010633, 1.00201755]])
类型的修改和数组去重
- ndarray.astype(type)
- ndarray序列化到本地
- ndarray.tostring()
- ndarray.tobytes()
- np.unique() 去重
stock_change.astype(np.int32)
array([[ 0, 1, 1, -1, 0, 0, 0, -1, 0, 0],
[ 0, 0, -2, 0, 0, 0, 0, 1, 0, 0],
[-1, 0, 0, 0, -1, 0, 0, 0, 1, 0],
[ 0, 1, 0, 2, -1, 0, 1, -1, 0, 0],
[ 0, -2, -1, 0, -1, 0, 0, 0, 0, 0],
[ 0, 0, -1, -1, 0, -1, 0, 1, 0, 0],
[ 0, 0, 0, 0, 0, 0, 1, 1, -2, 0],
[ 1, 1, 0, 0, 0, 0, 1, 0, 0, 1]])
stock_change.tobytes() ## 之前可以使用tostring的方法
b"\x10\x83d\xcbfG\x8a\xbf\x06\xcb\n_\x81\xd2\xf5?\xf6i\x89+\x85\xcb\xfa?(\x9dK\xf1\xe06\xfa\xbf\x040\xd3<:\x14\xe0?\xf4\x96\xb4\xeb(\x01\xda?\x9b\xfe\x94e\xf7\x7f\xe1?\x80I\xb5\x10\xb2=\xf8\xbf\xf2\x01\xcbv\xc5t\xee\xbf\x92\xbe9\xac\x13\x7f\xe9?F\x98\xc71i\x93\xd2\xbf\xcf~\x07\xc6^s\xeb\xbf$a\xd4\xee\xed\x0e\x07\xc0\xf2\xf0\x87I\x9aA\xb8?/\x91\xedg\x035\xd1?\xc0\x85\xe6K\xe8%\xe7\xbf9\r\r*m\x0f\xd5?H\x8d\xcb\xab\x95&\xf6?A\xed \xca\x9f.\xcc?\xb0\xce\x0f;\x00\xaf\xbd?\xe4\xa3\x860\xba\xce\xf7\xbf\x9e5\x1b\x8c6T\xd6\xbfv\xdd\xc3\x15\x80\x00\xde\xbf\x19s/\x1c\xb4\xee\xb0?\x9c\xc7I\x11\x0cv\xf3\xbf`\xcb$A\xd9\xd6\xbf\xbf}\xbd\xa6\x99D\xc0\xde\xbf(\xedu\xc2\x0cu\xed?W\x04\xd2\xdd\x9d\xf1\xf5?MD\xf8)\x0b\xf0\xd9?[`\xc0\xaa3\xea\xe5\xbf6ozQHE\xf7?M*CB\xd1\xc2\xd4?{<!\x11\xc9\xbd\x00@\xb3\x0b\xb0\xeb\x00j\xf6\xbf\x86\xfc\xe7*M\x7f\xcb?_\xca\xdc\xbf\x18\x04\xf1?\x85]G\xea\xb7\xfa\xf1\xbfi4gX\xfdW\xcb\xbf\xc2g^\x8b:\xad\xeb\xbf\x06l\x0bo\x8b\xa8\xec\xbf{L4s\x0bt\x06\xc0;\xdd0F\x89\x15\xfd\xbf\xf6\x06\x03\xde\xd3\xb5\xe4\xbf}\x13\xc9\xc0\x95\x99\xf0\xbf\xfb\rF\x14\xe9;\xb7\xbf\xa1\xa1\x9fpn\x9f\xa4?\x19\xa3\x84<\xc65\xec?\xb9^\xa1Ft\xdd\xd8\xbf\x8b,N\x1f\xa3\xf2\xdb?\xe4@UJc\x8f\xe3\xbfC\x02\xa0\xbf,\x8d\xca?\xf5)\x82\t\xc1\xb9\xf1\xbf\xbdxl0\xa40\xff\xbfi\x02C3\xb2V\xe9?Q^\x8d\xd9\x0f\xfb\xf6\xbf\xb0\x9c\x914\x17#\xd9\xbfe\xdf\xd2\x0cU\xf4\xf6?\xf3\xf7\xf4M\xfa\xee\xe8\xbf\xb6R=\xee\xbd\x1e\xe8?\x84o$\x87\x9c\xf6\xdc\xbf\xf5\xc6$\xd6\xcb\xff\xcb?@\xf6@\xeb?\xcb\xe7?\\\xa311\xf8\x84\xec?S\xf6>\xb5\xea\x9d\xe9?\x06\x18\xed_\x86\x92\xb3?_\xaf\x14\xa2\xc1\xf4\xf2?O\xd2\x02\xbd\xc4d\xf3?p\xe7\x80\x9a3\x91\x04\xc0\xeb\xfe#\xf2/*\xa8\xbf\x9a\xfa\\\xc5\xd9\xff\xf7?\xf4\xfe\xb0\x8a;\xb3\xfb?\x97\x89l\xad\xbe\x05\xd7?\x1d\xc4\xce\xc6\xf5T\xd4\xbf\xfd\x99\xf0'n\x8a\xce\xbf:J\xe4\x15\x8b\x0b\xc3??UZ\xe1\x8f\x8e\xfc?\xebph\xb9\x16-\xc4\xbf\x87]\xab\x8b\x9a\xb2\xe0?\xb4\xa2\tw\x01\x92\xf8?"
temp = np.array([[1, 2, 3, 4], [3, 4, 5, 6]])
np.unique(temp)
array([1, 2, 3, 4, 5, 6])
ndarray的运算
- 逻辑运算:
- stock_change > 0.5 数据大于0.5的标记为True 否则为False
- stock_change[stock_change > 0.5] 返回所有大于0.5的数据
- stock_change[stock_change > 0.5] = 1.1 返回所有大于0.5的数据更改为1.1
- np.all(布尔值) 布尔值里面所有True才返回True, 只要有一个False就返回False
- np.all(stock_change[0:2, 0:5] > 0) 判断里面数据是否全部大于0
- np.any(布尔值) 布尔值里面有一个True就返回True,只有全是False才会返回False
- np.any(stock_change[0:2, 0:5] > 0) 判断里面是否有数据大于0
- 三元运算符:np.where(布尔值, True的位置的值, False位置的值)
- np.where(stock_change>0, 1, 0) 将大于0的数据置为1 否则置为0
- np.where(np.logical_and(stock_change > 0.5, stock_change < 1), 1, 0) 将大于0.5并且小于1的置为1,否则置为0
- np.where(np.logical_or(stock_change > 0.5, stock_change < -0.5), 1, 0) 将大于0.5或者小于-0.5的置为1,否则置为0
- 统计运算:
- 统计指标函数:min,max,mean,median,var,std,函数其中有一个参数axis,为1代表使用行去进行统计,为0使用列进行统计计算。
- np.max(a, axis=1) / ndarray.max(axis=1) / np.max(a) / adarray.max()
- 返回最大值、最小值的位置:
- np.argmax(a. axis=) / np.argmin(a, axis=)
- 统计指标函数:min,max,mean,median,var,std,函数其中有一个参数axis,为1代表使用行去进行统计,为0使用列进行统计计算。
- 数组间运算:
- 数组与数的运算:arr ±*/等等直接对数组中的每个元素执行相同的操作
- 数组与数组的运算:需要满足广播机制
- 广播机制:当操作两个数组进行运算的时候,numpy会比较两个数组的shape,只有满足shape对应位置相等或者相对应的一个地方为1的数组才可以进行运算,结果对应shape取相应的位置的最大值。
- 矩阵运算:矩阵matrix 矩阵必须是二维的,但是数组可以是一位的。
- np.mat() 将数组转换为矩阵
- 有两种方法来存储矩阵:ndarray二维数组、matrix数据结构
- 矩阵运算 (m, n) * (n , l) = (m, l) 也就是第一个矩阵的列数和第二个矩阵的行数要相等
- np.matmul() numpy库中用于矩阵乘法的函数,它的作用是计算两个矩阵的乘积
- np.dot() 向量点乘
逻辑运算
import numpy as np
stock_change = np.random.normal(loc=0, scale=1, size=(8, 10))
stock_change > 0.5
array([[False, False, False, False, False, False, True, True, False,
False],
[False, False, False, False, False, False, False, True, False,
False],
[False, True, False, False, False, False, True, True, True,
True],
[False, True, True, False, False, True, False, False, False,
False],
[False, False, False, True, False, False, False, True, True,
False],
[False, False, False, False, False, False, False, False, True,
False],
[False, False, False, True, False, True, False, False, True,
True],
[False, False, False, False, True, False, True, False, True,
False]])
stock_change[stock_change > 0.5]
array([1.36389291, 1.67468755, 0.50246918, 0.5468709 , 0.79676231,
1.38442008, 0.92053831, 1.37148844, 1.45441467, 2.09266866,
1.06350017, 0.88156425, 0.79183302, 1.43465142, 0.75375268,
0.74356075, 0.89123163, 0.80052696, 1.18475498, 1.21210169,
1.49996354, 1.73125796, 1.78480518, 0.52180221, 1.53564593])
stock_change[stock_change > 0.5] = 1.1
stock_change
array([[-0.0128315 , 1.1 , 1.1 , -1.63839812, 1.1 ,
0.40632079, 1.1 , -1.51506239, -0.95175431, 1.1 ],
[-0.29024725, -0.85783328, -2.88228976, 0.09475102, 0.26886068,
-0.72337737, 0.32906655, 1.1 , 0.22017286, 0.11595155],
[-1.48797053, -0.34888996, -0.46878054, 0.06614233, -1.2163201 ,
-0.12437208, -0.48048511, 1.1 , 1.1 , 0.4052761 ],
[-0.68483909, 1.1 , 0.32439071, 1.1 , -1.40087978,
0.21482243, 1.1 , -1.12371055, -0.21362273, -0.86489608],
[-0.8955743 , -2.80666246, -1.81775787, -0.64719575, -1.03749633,
-0.09075791, 0.04027887, 1.1 , -0.38851649, 0.4366844 ],
[-0.6112534 , 0.20743331, -1.10785011, -1.94937533, 1.1 ,
-1.43629441, -0.39276676, 1.1 , -0.77917209, 1.1 ],
[-0.45255197, 0.21874378, 1.1 , 1.1 , 1.1 ,
0.07645454, 1.1 , 1.1 , -2.57089921, -0.04719686],
[ 1.1 , 1.1 , 0.35972564, -0.31768555, -0.23859956,
0.14878977, 1.1 , -0.157626 , 1.1 , 1.1 ]])
print(np.all(stock_change[0:2, 0:5] > 0))
print(np.any(stock_change[0:2, 0:5] > 0))
False
True
print(np.where(stock_change>0, 1, 0))
[[0 1 1 0 0 0 1 1 1 0]
[0 1 0 0 0 1 1 0 0 1]
[0 1 0 0 0 0 0 1 0 0]
[1 0 1 0 1 1 0 0 0 1]
[1 0 0 1 0 1 0 0 1 0]
[1 1 1 1 0 1 1 1 0 1]
[0 1 0 1 0 0 1 0 1 0]
[1 1 1 0 1 1 1 0 0 1]]
print(np.where(np.logical_and(stock_change > 0.5, stock_change < 1), 1 , 0))
[[0 0 0 0 0 0 0 0 0 0]
[0 0 0 0 0 0 0 0 0 1]
[0 1 0 0 0 0 0 0 0 0]
[0 0 0 0 1 0 0 0 0 0]
[0 0 0 0 0 0 0 0 0 0]
[0 0 1 0 0 0 1 1 0 0]
[0 0 0 0 0 0 1 0 0 0]
[0 0 0 0 0 0 0 0 0 1]]
print(np.where(np.logical_or(stock_change > 0.5, stock_change < -0.5), 1 , 0))
[[1 0 0 0 1 0 0 1 1 0]
[1 1 1 1 0 1 1 1 0 1]
[0 1 1 1 1 0 1 0 1 0]
[1 0 1 0 1 1 1 0 0 1]
[0 1 0 0 0 1 0 1 1 0]
[0 1 1 0 0 0 1 1 1 1]
[1 0 0 0 0 0 1 1 0 0]
[1 0 1 0 1 0 0 1 1 1]]
统计运算
print(np.max(stock_change), stock_change.max())
2.837073584187165 2.837073584187165
print(np.mean(stock_change, axis=0), np.mean(stock_change, axis=1), np.mean(stock_change))
[-0.9652667 -0.15328082 0.08317861 -0.54300528 -0.42430401 -0.27689675
-0.03939256 0.58928582 0.11866925 0.06092911] [-0.24814861 -0.59923979 0.47094442 0.21607003 -0.15542244 -0.36903679
-0.12744662 -0.42778684] -0.15500833265906144
print(np.argmax(stock_change), np.argmax(stock_change, axis=1))
32 [7 7 7 2 3 8 5 8]
数组的运算
数组和数的运算
arr = np.array([[1, 2, 3, 2, 1, 4], [5, 6, 1, 2, 3, 1]])
arr
array([[1, 2, 3, 2, 1, 4],
[5, 6, 1, 2, 3, 1]])
arr+1
array([[2, 3, 4, 3, 2, 5],
[6, 7, 2, 3, 4, 2]])
arr*2
array([[ 2, 4, 6, 4, 2, 8],
[10, 12, 2, 4, 6, 2]])
arr/2
array([[0.5, 1. , 1.5, 1. , 0.5, 2. ],
[2.5, 3. , 0.5, 1. , 1.5, 0.5]])
arr-2
array([[-1, 0, 1, 0, -1, 2],
[ 3, 4, -1, 0, 1, -1]])
数组和数组运算
arr1 = np.array([[1, 2, 3, 2, 1, 4], [5, 6, 1, 2, 3, 1]])
arr2 = np.array([[1], [3]])
print(arr1, "\n\n", arr2)
[[1 2 3 2 1 4]
[5 6 1 2 3 1]]
[[1]
[3]]
print(arr1 * arr2, '\n\n', arr1 / arr2)
[[ 1 2 3 2 1 4]
[15 18 3 6 9 3]]
[[1. 2. 3. 2. 1. 4. ]
[1.66666667 2. 0.33333333 0.66666667 1. 0.33333333]]
矩阵运算
data = np.array([
[80, 86],
[82, 80],
[85, 78],
[90, 90],
[86, 82],
[82, 90],
[78, 80],
[92, 94]])
data
array([[80, 86],
[82, 80],
[85, 78],
[90, 90],
[86, 82],
[82, 90],
[78, 80],
[92, 94]])
data2 = np.mat([[80, 86],
[82, 80],
[85, 78],
[90, 90],
[86, 82],
[82, 90],
[78, 80],
[92, 94]])
print(data2, '\n\n', type(data2))
[[80 86]
[82 80]
[85 78]
[90 90]
[86 82]
[82 90]
[78 80]
[92 94]]
<class 'numpy.matrix'>
data3 = np.mat([[0.3], [0.7]])
data3
matrix([[0.3],
[0.7]])
print(data2 * data3, '\n\n', data @ np.array([[0.3], [0.7]])) ## 计算成绩 第一列乘上0.3 第二列乘上0.7
[[84.2]
[80.6]
[80.1]
[90. ]
[83.2]
[87.6]
[79.4]
[93.4]]
[[84.2]
[80.6]
[80.1]
[90. ]
[83.2]
[87.6]
[79.4]
[93.4]]
print(np.matmul(data2, data3), '\n\n', np.dot(data2, data3))
[[84.2]
[80.6]
[80.1]
[90. ]
[83.2]
[87.6]
[79.4]
[93.4]]
[[84.2]
[80.6]
[80.1]
[90. ]
[83.2]
[87.6]
[79.4]
[93.4]]
合并和分割
- 合并:合并可以从水平的方向进行合并,也可以在垂直的方法进行合并
- numpy.hstack(tuple(column, wise)) 水平拼接
- numpy.vstack(tuple(row, wise)) 垂直拼接
- numpy.concatenate((a1, a2, a3…), axis=0) axis=1来表示水平,axis=0表示垂直
- 分割
- np.split(ary, indices_or_sections, axis=0)
合并
import numpy as np
a = np.array([1, 2, 3])
b = np.array([2, 3, 4])
np.hstack((a, b))
array([1, 2, 3, 2, 3, 4])
np.vstack((a, b))
array([[1, 2, 3],
[2, 3, 4]])
np.concatenate((a, b), axis=0)
array([1, 2, 3, 2, 3, 4])
x = np.array([[1, 2], [3, 4]])
print(np.concatenate((x, x), axis=0))
print('\n\n', np.concatenate((x, x), axis=1))
[[1 2]
[3 4]
[1 2]
[3 4]]
[[1 2 1 2]
[3 4 3 4]]
分割
x1 = np.arange(9.0)
np.split(x1, 3)
[array([0., 1., 2.]), array([3., 4., 5.]), array([6., 7., 8.])]
x1 = np.arange(8.0)
np.split(x1, [3, 5, 6, 8]) ## 按照索引进行分割
[array([0., 1., 2.]),
array([3., 4.]),
array([5.]),
array([6., 7.]),
array([], dtype=float64)]
IO操作和数据处理
- numpy数据读取:
- np.genfromtxt(path, delimiter=) ## 文件路径和分隔符号
- np.genfromtxt(‘tes.csv’, delimiter=‘,’)
import numpy as np
data = np.genfromtxt('gh.csv', delimiter=',')
data
array([[ nan, nan, nan],
[ 12., 213., 321.],
[ 123., 345., 1241.],
[ 14., 24., 123.]])
- 对于上面的数组中的nan值的类型是float64,对于这个的一般处理方式有两种
- 将数据存在nan的行删除
- 使用该列的平均值填充到nan的位置
总结
- Numpy的优势:内存存储风格,ndrray存储相同数据,内存连续存储,底层c语言实现,支持多线程
- ndarray的属性:shape、dtype、ndim、size、itemsize
- 基本操作:ndarray.方法() np.函数()
- 生成数组的方法:np.ones(shape) np.zeros(shape)
- 从现有数组中生成:np.array() np.copy() np.asarray()
- 生成固定范围的数组:np.linspace(a, b, c) np.arange(a, b, c)
- 生成随机数:均匀分布:np.random.uniform() 正态分布:np.random.normal()
- 切片索引
- 形状修改:ndarray.reshape((a, b)) ndarray.resize((a, b)) ndarray.T
- 类型修改:ndarray.astype(type) ndarray.tobytes()
- 数组去重:np.unique()
- numpy的运算:
- 逻辑运算:
- 布尔索引
- np.all() np.any()
- np.where(a, b, c) a是布尔值 b是true对应的值 c是false对应的值
- 统计运算
- 统计指标:max min mean median var std
- 最大值最小值位置:np.argmax() np.argmin()
- 数组间运算
- 数组与数的运算:
- 数组与数组的运算:要注意广播机制
- 矩阵运算:np.mat() np.dot() np.matmul()
- 逻辑运算: