Exercise 9 - Time Series
Exploring Apple stock price data
Step 1: Import the necessary libraries
Run the following code
import pandas as pd
import numpy as np
# visualization
import matplotlib.pyplot as plt
%matplotlib inline
Step 2: Dataset location
Run the following code
path9 = '…/input/pandas_exercise/pandas_exercise/exercise_data/Apple_stock.csv' # Apple_stock.csv
Step 3: Read the data into a DataFrame named apple
Run the following code
apple = pd.read_csv(path9)
apple.head()
Date Open High Low Close Volume Adj Close
0 2014-07-08 96.27 96.80 93.92 95.35 65130000 95.35
1 2014-07-07 94.14 95.99 94.10 95.97 56305400 95.97
2 2014-07-03 93.67 94.10 93.20 94.03 22891800 94.03
3 2014-07-02 93.87 94.06 93.09 93.48 28420900 93.48
4 2014-07-01 93.52 94.07 93.13 93.52 38170200 93.52
Step 4: Check the data type of each column
Run the following code
apple.dtypes
Date object
Open float64
High float64
Low float64
Close float64
Volume int64
Adj Close float64
dtype: object
Step 5: Convert the Date column to datetime type
Run the following code
apple.Date = pd.to_datetime(apple.Date)
apple['Date'].head()
0 2014-07-08
1 2014-07-07
2 2014-07-03
3 2014-07-02
4 2014-07-01
Name: Date, dtype: datetime64[ns]
Step 6: Set Date as the index
Run the following code
apple = apple.set_index('Date')
apple.head()
Open High Low Close Volume Adj Close
Date
2014-07-08 96.27 96.80 93.92 95.35 65130000 95.35
2014-07-07 94.14 95.99 94.10 95.97 56305400 95.97
2014-07-03 93.67 94.10 93.20 94.03 22891800 94.03
2014-07-02 93.87 94.06 93.09 93.48 28420900 93.48
2014-07-01 93.52 94.07 93.13 93.52 38170200 93.52
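Steps 3, 5 and 6 can also be folded into the read itself. The following is a minimal sketch, assuming a small inline CSV: the `csv_text` string is a three-row, two-column excerpt of the output above that stands in for Apple_stock.csv, and the name `sample` is only illustrative.

```python
import io
import pandas as pd

# Three-row excerpt standing in for Apple_stock.csv (illustration only)
csv_text = """Date,Open,Close
2014-07-08,96.27,95.35
2014-07-07,94.14,95.97
2014-07-03,93.67,94.03
"""

# parse_dates converts the column to datetime; index_col makes it the index
sample = pd.read_csv(io.StringIO(csv_text), parse_dates=["Date"], index_col="Date")

print(type(sample.index).__name__)
print(sample.head())
```

With the real file, the same two keyword arguments would be passed alongside `path9`.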
Step 7: Are there any duplicate dates?
Run the following code
apple.index.is_unique
True
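If `is_unique` had come back False, the repeated dates would need to be located. The following is a small sketch of one way to do that, using a hypothetical four-row frame (`demo`, with made-up values) rather than the Apple data:

```python
import pandas as pd

# Hypothetical index with one repeated date (values are made up)
idx = pd.to_datetime(["2014-07-01", "2014-07-02", "2014-07-02", "2014-07-03"])
demo = pd.DataFrame({"Close": [93.52, 93.48, 93.50, 94.03]}, index=idx)

print(demo.index.is_unique)  # False: 2014-07-02 appears twice

# duplicated(keep=False) marks every occurrence of a repeated label
dupes = demo[demo.index.duplicated(keep=False)]
print(dupes)

# Keep only the first row for each date
deduped = demo[~demo.index.duplicated(keep="first")]
print(deduped.index.is_unique)  # True
```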
Step 8: Sort the index in ascending order
Run the following code
apple = apple.sort_index(ascending=True)
apple.head()
Open High Low Close Volume Adj Close
Date
1980-12-12 28.75 28.87 28.75 28.75 117258400 0.45
1980-12-15 27.38 27.38 27.25 27.25 43971200 0.42
1980-12-16 25.37 25.37 25.25 25.25 26432000 0.39
1980-12-17 25.87 26.00 25.87 25.87 21610400 0.40
1980-12-18 26.63 26.75 26.63 26.63 18362400 0.41
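Note that `sort_index` returns a new DataFrame rather than sorting in place, so the result has to be assigned to a name to be kept. A short sketch on a hypothetical three-row frame (`demo`) shows the difference:

```python
import pandas as pd

# Three dates in descending order, as in the raw file
idx = pd.to_datetime(["2014-07-08", "2014-07-07", "2014-07-03"])
demo = pd.DataFrame({"Close": [95.35, 95.97, 94.03]}, index=idx)

sorted_demo = demo.sort_index(ascending=True)  # returns a new DataFrame

print(demo.index.is_monotonic_increasing)         # False: original is unchanged
print(sorted_demo.index.is_monotonic_increasing)  # True
```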
Step 9: Find the last business day of each month
Run the following code
# resample is deferred, so an aggregation must be named explicitly; mean() matches the output below
# ('BM' is the business-month-end alias; pandas 2.2 and later spell it 'BME')
apple_month = apple.resample('BM').mean()
apple_month.head()
Open High Low Close Volume Adj Close
Date
1980-12-31 30.481538 30.567692 30.443077 30.443077 2.586252e+07 0.473077
1981-01-30 31.754762 31.826667 31.654762 31.654762 7.249867e+06 0.493810
1981-02-27 26.480000 26.572105 26.407895 26.407895 4.231832e+06 0.411053
1981-03-31 24.937727 25.016818 24.836364 24.836364 7.962691e+06 0.387727
1981-04-30 27.286667 27.368095 27.227143 27.227143 6.392000e+06 0.423333
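The table above is labelled with each month's last business day, but the values are monthly means, not the prices on that day. If the row for the last trading day actually present in the data is wanted instead, one option is to group by calendar month and keep the last row. This is a sketch on a hypothetical frame (`demo`, with made-up values), not the exercise's own method:

```python
import pandas as pd

# Hypothetical prices on a few trading days across two months (values are made up)
idx = pd.to_datetime(["2014-05-29", "2014-05-30", "2014-06-27", "2014-06-30"])
demo = pd.DataFrame({"Close": [1.0, 2.0, 3.0, 4.0]}, index=idx).sort_index()

# Group rows by calendar month and keep the last row of each group;
# the index must be sorted ascending for "last" to mean "latest"
last_days = demo.groupby(demo.index.to_period("M")).tail(1)
print(last_days)
```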
Step 10: How many days lie between the earliest and latest dates in the dataset?
Run the following code
(apple.index.max() - apple.index.min()).days
12261
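Subtracting two timestamps yields a pandas `Timedelta`, and `.days` reads off its whole-day component. The sketch below repeats the calculation with the two endpoint dates written out literally (1980-12-12 and 2014-07-08, the first and last dates shown in the outputs above):

```python
import pandas as pd

first = pd.Timestamp("1980-12-12")
last = pd.Timestamp("2014-07-08")

span = last - first          # a pandas Timedelta
print(type(span).__name__)   # Timedelta
print(span.days)             # 12261
```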
Step 11: How many months are there in the data?
Run the following code
apple_months = apple.resample('BM').mean()
len(apple_months.index)
404
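Resampling creates one bin for every month between the first and last dates, including any month that contains no rows, so the count above is the number of months spanned. December 1980 through July 2014 is (2014 - 1980) * 12 + (7 - 12) + 1 = 404 months, which agrees with the result. The sketch below contrasts that with counting only the months that contain data, using a hypothetical frame (`demo`) with a gap:

```python
import pandas as pd

# Hypothetical index: rows in January and March, none in February
idx = pd.to_datetime(["2014-01-02", "2014-01-31", "2014-03-03"])
demo = pd.DataFrame({"Close": [1.0, 2.0, 3.0]}, index=idx)

# Number of distinct calendar months that actually contain rows
months_with_data = demo.index.to_period("M").nunique()
print(months_with_data)  # 2

# Number of calendar months spanned, first to last inclusive
first, last = demo.index.min(), demo.index.max()
months_spanned = (last.year - first.year) * 12 + (last.month - first.month) + 1
print(months_spanned)  # 3
```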
Step 12: Plot the Adj Close values in chronological order
Run the following code
# make the plot and assign it to a variable
appl_open = apple['Adj Close'].plot(title="Apple Stock")
# change the size of the graph
fig = appl_open.get_figure()
fig.set_size_inches(13.5, 9)
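An alternative is to create the figure at the desired size first and pass its axes to the plot call. This is a sketch under assumptions: the `adj_close` series below is a hypothetical five-point stand-in for apple['Adj Close'], with made-up values.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Hypothetical series standing in for apple['Adj Close'] (values are made up)
idx = pd.date_range("2014-01-01", periods=5, freq="D")
adj_close = pd.Series([1.0, 1.2, 1.1, 1.3, 1.4], index=idx, name="Adj Close")

# Create the figure at the desired size, then draw onto its axes
fig, ax = plt.subplots(figsize=(13.5, 9))
adj_close.plot(ax=ax, title="Apple Stock")
ax.set_ylabel("Adj Close")
```

Setting `figsize` up front avoids resizing the figure after it has been drawn; either approach gives a plot of the same size.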