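Food delivery services need to show customers an accurate estimate of how long an order will take to arrive, so that they stay transparent with their customers. These companies use machine learning algorithms to predict delivery time, based on how long delivery partners took to cover the same distance in the past.
Food Delivery Time Prediction
To predict delivery time in real time, we need to calculate the distance between the place where the food is prepared and the place where it is consumed. Once we have the distance between the restaurant and the delivery location, we need to find the relationship between that distance and the time delivery partners took to cover the same distance in the past.
Import the necessary Python libraries and the dataset: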
import pandas as pd
import numpy as np
import plotly.express as px
data = pd.read_csv("deliverytime.txt")
print(data.head())
Output
ID Delivery_person_ID Delivery_person_Age Delivery_person_Ratings \
0 4607 INDORES13DEL02 37 4.9
1 B379 BANGRES18DEL02 34 4.5
2 5D6D BANGRES19DEL01 23 4.4
3 7A6A COIMBRES13DEL02 38 4.7
4 70A2 CHENRES12DEL01 32 4.6
Restaurant_latitude Restaurant_longitude Delivery_location_latitude \
0 22.745049 75.892471 22.765049
1 12.913041 77.683237 13.043041
2 12.914264 77.678400 12.924264
3 11.003669 76.976494 11.053669
4 12.972793 80.249982 13.012793
Delivery_location_longitude Type_of_order Type_of_vehicle Time_taken(min)
0 75.912471 Snack motorcycle 24
1 77.813237 Snack scooter 33
2 77.688400 Drinks motorcycle 26
3 77.026494 Buffet motorcycle 21
4 80.289982 Snack scooter 30
Before moving on, let's take a quick look at the column information:
data.info()
Output
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 45593 entries, 0 to 45592
Data columns (total 11 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 ID 45593 non-null object
1 Delivery_person_ID 45593 non-null object
2 Delivery_person_Age 45593 non-null int64
3 Delivery_person_Ratings 45593 non-null float64
4 Restaurant_latitude 45593 non-null float64
5 Restaurant_longitude 45593 non-null float64
6 Delivery_location_latitude 45593 non-null float64
7 Delivery_location_longitude 45593 non-null float64
8 Type_of_order 45593 non-null object
9 Type_of_vehicle 45593 non-null object
10 Time_taken(min) 45593 non-null int64
dtypes: float64(5), int64(2), object(4)
memory usage: 3.8+ MB
Check whether the dataset contains any null values:
data.isnull().sum()
Output
ID 0
Delivery_person_ID 0
Delivery_person_Age 0
Delivery_person_Ratings 0
Restaurant_latitude 0
Restaurant_longitude 0
Delivery_location_latitude 0
Delivery_location_longitude 0
Type_of_order 0
Type_of_vehicle 0
Time_taken(min) 0
dtype: int64
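Calculating the Distance Between Two Latitude-Longitude Points
The dataset has no feature that gives the distance between the restaurant and the delivery location. All we have are the latitude and longitude of each. We can use the haversine formula to compute the distance between two locations from their latitudes and longitudes.
Here is how to calculate the distance between the restaurant and the delivery location with the haversine formula: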
# Set the earth's radius (in kilometers)
R = 6371
# Convert degrees to radians
def deg_to_rad(degrees):
    return degrees * (np.pi/180)
# Function to calculate the distance between two points using the haversine formula
def distcalculate(lat1, lon1, lat2, lon2):
    d_lat = deg_to_rad(lat2-lat1)
    d_lon = deg_to_rad(lon2-lon1)
    a = np.sin(d_lat/2)**2 + np.cos(deg_to_rad(lat1)) * np.cos(deg_to_rad(lat2)) * np.sin(d_lon/2)**2
    c = 2 * np.arctan2(np.sqrt(a), np.sqrt(1-a))
    return R * c
# Calculate the distance for every row at once
# (the NumPy functions above operate element-wise on whole columns, so no explicit loop is needed)
data['distance'] = distcalculate(data['Restaurant_latitude'],
                                 data['Restaurant_longitude'],
                                 data['Delivery_location_latitude'],
                                 data['Delivery_location_longitude'])
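We have now calculated the distance between the restaurant and the delivery location and added it to the dataset as a new feature named distance. Let's look at the dataset again:
print(data.head())
Output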
ID Delivery_person_ID Delivery_person_Age Delivery_person_Ratings \
0 4607 INDORES13DEL02 37 4.9
1 B379 BANGRES18DEL02 34 4.5
2 5D6D BANGRES19DEL01 23 4.4
3 7A6A COIMBRES13DEL02 38 4.7
4 70A2 CHENRES12DEL01 32 4.6
Restaurant_latitude Restaurant_longitude Delivery_location_latitude \
0 22.745049 75.892471 22.765049
1 12.913041 77.683237 13.043041
2 12.914264 77.678400 12.924264
3 11.003669 76.976494 11.053669
4 12.972793 80.249982 13.012793
Delivery_location_longitude Type_of_order Type_of_vehicle Time_taken(min) \
0 75.912471 Snack motorcycle 24
1 77.813237 Snack scooter 33
2 77.688400 Drinks motorcycle 26
3 77.026494 Buffet motorcycle 21
4 80.289982 Snack scooter 30
distance
0 3.025149
1 20.183530
2 1.552758
3 7.790401
4 6.210138
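As a quick sanity check, the distance for the first row can be reproduced with the standard library alone. The sketch below re-implements the same haversine formula with `math` instead of NumPy; the function name `haversine_km` is an illustrative choice, and the coordinates are copied from row 0 of the output above.

```python
import math

EARTH_RADIUS_KM = 6371  # same radius as R in the code above

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers between two (lat, lon) points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = math.radians(lat2 - lat1)
    d_lambda = math.radians(lon2 - lon1)
    a = math.sin(d_phi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2
    c = 2 * math.atan2(math.sqrt(a), math.sqrt(1 - a))
    return EARTH_RADIUS_KM * c

# Row 0 of the dataset: restaurant coordinates, then delivery coordinates
row0_distance = haversine_km(22.745049, 75.892471, 22.765049, 75.912471)
print(round(row0_distance, 3))  # approximately 3.025, in line with the distance column
```

Agreement with the value in the distance column suggests the column was computed as intended, at least for this row.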
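Data Exploration
Now let's explore the data to find relationships between the features. We start with the relationship between distance and the time taken to deliver the food:
figure = px.scatter(data_frame = data,
                    x="distance",
                    y="Time_taken(min)",
                    size="Time_taken(min)",
                    trendline="ols",
                    title = "Relationship Between Distance and Time Taken")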
figure.show()
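The time taken to deliver the food stays fairly consistent across the distance travelled. In other words, most delivery partners deliver the food within 25-30 minutes regardless of distance.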
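The `trendline="ols"` argument asks Plotly to overlay an ordinary least squares fit (this option relies on the statsmodels package being installed). A minimal sketch of what such a fit computes is shown below, using the closed-form formulas for slope and intercept. The helper `ols_line` and the numbers are hypothetical and are not taken from the dataset.

```python
def ols_line(xs, ys):
    """Return (slope, intercept) of the least squares line through the points."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    slope = cov_xy / var_x
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical distances (km) and delivery times (min), for illustration only
distances = [1.0, 3.0, 5.0, 7.0, 9.0]
times = [25.0, 26.0, 27.0, 28.0, 29.0]
slope, intercept = ols_line(distances, times)
print(slope, intercept)  # 0.5 24.5
```

A slope close to zero corresponds to a flat trend line, where delivery time changes little as distance grows.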
Next, let's look at the relationship between delivery time and the age of the delivery partner:
figure = px.scatter(data_frame = data,
                    x="Delivery_person_Age",
                    y="Time_taken(min)",
                    size="Time_taken(min)",
                    color = "distance",
                    trendline="ols",
                    title = "Relationship Between Time Taken and Age")
figure.show()
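There is a linear relationship between the time taken to deliver the food and the age of the delivery partner. Younger delivery partners take less time to deliver the food than older ones.
Now let's look at the relationship between delivery time and the ratings of the delivery partner:
figure = px.scatter(data_frame = data,
                    x="Delivery_person_Ratings",
                    y="Time_taken(min)",
                    size="Time_taken(min)",
                    color = "distance",
                    trendline="ols",
                    title = "Relationship Between Time Taken and Ratings")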
figure.show()
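There is an inverse linear relationship between the time taken to deliver the food and the ratings of the delivery partner. Partners with higher ratings take less time to deliver the food than partners with lower ratings.
Now let's see whether the type of food ordered and the type of vehicle used by the delivery partner affect the delivery time:
fig = px.box(data,
             x="Type_of_vehicle",
             y="Time_taken(min)",
             color="Type_of_order")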
fig.show()
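So the time taken by delivery partners does not differ much with the vehicle they drive or the type of food they deliver.
Based on this analysis, the features that contribute most to the food delivery time are:
- the age of the delivery partner
- the ratings of the delivery partner
- the distance between the restaurant and the delivery location
The next section shows how to train a machine learning model to predict food delivery time.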
Food Delivery Time Prediction Model
Now let's train a machine learning model for the food delivery time prediction task, using an LSTM neural network:
# splitting data: three input features, 10% of the rows held out for testing
from sklearn.model_selection import train_test_split
x = np.array(data[["Delivery_person_Age",
                   "Delivery_person_Ratings",
                   "distance"]])
y = np.array(data[["Time_taken(min)"]])
xtrain, xtest, ytrain, ytest = train_test_split(x, y,
                                                test_size=0.10,
                                                random_state=42)
# creating the LSTM neural network model
# each sample is treated as a sequence of 3 steps (one per feature) with 1 value per step
from keras.models import Sequential
from keras.layers import Dense, LSTM
model = Sequential()
model.add(LSTM(128, return_sequences=True, input_shape= (xtrain.shape[1], 1)))
model.add(LSTM(64, return_sequences=False))
model.add(Dense(25))
model.add(Dense(1))
model.summary()
Output
Model: "sequential"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
lstm (LSTM) (None, 3, 128) 66560
lstm_1 (LSTM) (None, 64) 49408
dense (Dense) (None, 25) 1625
dense_1 (Dense) (None, 1) 26
=================================================================
Total params: 117,619
Trainable params: 117,619
Non-trainable params: 0
_________________________________________________________________
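The parameter counts in the summary can be checked by hand. An LSTM layer has four gates, each with weights over the input and the hidden state plus a bias, which gives 4 × (units × (units + input_dim) + units) parameters. A Dense layer has input_dim × units weights plus units biases. The helper names below are illustrative.

```python
def lstm_params(input_dim, units):
    """Parameters of an LSTM layer: 4 gates, each with input weights, recurrent weights, and a bias."""
    return 4 * (units * (units + input_dim) + units)

def dense_params(input_dim, units):
    """Parameters of a Dense layer: one weight per input-unit pair plus one bias per unit."""
    return input_dim * units + units

layer_counts = [
    lstm_params(1, 128),    # first LSTM sees 1 value per step
    lstm_params(128, 64),   # second LSTM sees the 128 outputs of the first
    dense_params(64, 25),
    dense_params(25, 1),
]
print(layer_counts, sum(layer_counts))  # [66560, 49408, 1625, 26] 117619
```

The per-layer values and the total agree with the Param # column and the Total params line of the summary.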
# training the model
model.compile(optimizer='adam', loss='mean_squared_error')
model.fit(xtrain, ytrain, batch_size=1, epochs=9)
Output
Epoch 1/9
41033/41033 [==============================] - 410s 10ms/step - loss: 69.7154
Epoch 2/9
41033/41033 [==============================] - 405s 10ms/step - loss: 63.6772
Epoch 3/9
41033/41033 [==============================] - 404s 10ms/step - loss: 61.4656
Epoch 4/9
41033/41033 [==============================] - 406s 10ms/step - loss: 60.5741
Epoch 5/9
41033/41033 [==============================] - 401s 10ms/step - loss: 59.7685
Epoch 6/9
41033/41033 [==============================] - 401s 10ms/step - loss: 59.3501
Epoch 7/9
41033/41033 [==============================] - 397s 10ms/step - loss: 59.3121
Epoch 8/9
41033/41033 [==============================] - 402s 10ms/step - loss: 58.6929
Epoch 9/9
41033/41033 [==============================] - 399s 10ms/step - loss: 58.6897
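Two numbers in this log can be cross-checked with simple arithmetic. With `batch_size=1`, the number of steps per epoch equals the number of training rows, and because the loss is a mean squared error in squared minutes, its square root gives a typical error in minutes. The sketch assumes that `train_test_split` rounds the test-set size up, which is consistent with the 41033 steps shown above.

```python
import math

n_rows = 45593                      # rows reported by data.info()
n_test = math.ceil(0.10 * n_rows)   # test_size=0.10, rounded up
n_train = n_rows - n_test           # rows left for training, one step each with batch_size=1

final_mse = 58.6897                 # training loss after the last epoch
rmse_minutes = math.sqrt(final_mse)

print(n_train, round(rmse_minutes, 2))  # 41033 7.66
```

Note that this is the loss on the training data; the held-out `xtest` and `ytest` are not used in the log above.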
Now let's try the model out by entering input values and predicting the food delivery time:
print("Food Delivery Time Prediction")
a = int(input("Age of Delivery Partner: "))
b = float(input("Ratings of Previous Deliveries: "))
# distance is a continuous value in kilometers, so parse it as a float
c = float(input("Total Distance: "))
features = np.array([[a, b, c]])
print("Predicted Delivery Time in Minutes = ", model.predict(features))
Output
Food Delivery Time Prediction
Age of Delivery Partner: 29
Ratings of Previous Deliveries: 2.9
Total Distance: 6
1/1 [==============================] - 0s 23ms/step
Predicted Delivery Time in Minutes = [[41.34929]]
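The network was declared with `input_shape=(xtrain.shape[1], 1)`, so each sample is a sequence of 3 steps with 1 value per step, while the `features` array passed to `model.predict` is 2-D with shape (1, 3). Whether Keras accepts the 2-D array directly may depend on the installed version, so a cautious variant makes the third axis explicit. The sketch below only demonstrates the reshaping with NumPy; the name `features_3d` is illustrative, and the sample values are the ones entered above.

```python
import numpy as np

# Age, rating, and distance from the sample session
features = np.array([[29, 2.9, 6]])

# Add a trailing axis so the shape becomes (samples, timesteps, values per step)
features_3d = features.reshape(features.shape[0], features.shape[1], 1)

print(features.shape, features_3d.shape)  # (1, 3) (1, 3, 1)
# features_3d could then be passed to model.predict(features_3d)
```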
That is how machine learning can be used for the task of food delivery time prediction with Python.