import pandas as pd
from sklearn.model_selection import train_test_split
导入库
# Read the data
X_full = pd.read_csv('/content/Housing-prices-data/train.csv', index_col='Id')
X_test_full = pd.read_csv('/content/Housing-prices-data/test.csv', index_col='Id')
读取数据
index_col='Id'是为了将数据框的索引列设置为’Id’列。
# Remove rows with missing target, separate target from predictors
X_full.dropna(axis=0, subset=['SalePrice'], inplace=True)
y = X_full.SalePrice
X_full.drop(['SalePrice'], axis=1, inplace=True)
SalePrice 是我们尝试预测的目标变量。
删除训练数据中带有缺失目标值(‘SalePrice’)的行。
将目标值(‘SalePrice’)存储在变量y中,并从特征中删除。
# To keep things simple, we'll use only numerical predictors
X = X_full.select_dtypes(exclude=['object'])
X_test = X_test_full.select_dtypes(exclude=['object'])
# Shape of training data (num_rows, num_columns)print(X_train.shape)# Number of missing values in each column of training data
missing_val_count_by_column =(X_train.isnull().sum())print(missing_val_count_by_column[missing_val_count_by_column >0])
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error
# Function for comparing different approachesdefscore_dataset(X_train, X_valid, y_train, y_valid):
model = RandomForestRegressor(n_estimators=100, random_state=0)
model.fit(X_train, y_train)
preds = model.predict(X_valid)return mean_absolute_error(y_valid, preds)
# Get names of columns with missing values
cols_with_missing =[col for col in X_train.columns
if X_train[col].isnull().any()]# Drop columns in training and validation data
reduced_X_train = X_train.drop(cols_with_missing, axis=1)
reduced_X_valid = X_valid.drop(cols_with_missing, axis=1)
# Preprocessed training and validation features
final_imputer = SimpleImputer(strategy='median')
final_X_train = pd.DataFrame(final_imputer.fit_transform(X_train))
final_X_valid = pd.DataFrame(final_imputer.transform(X_valid))
final_X_train.columns = X_train.columns
final_X_valid.columns = X_valid.columns
设置填充策略为’median’(中位数)。这意味着缺失值将会使用每列的中位数值来进行填充。
# Define and fit model
model = RandomForestRegressor(n_estimators=100, random_state=0)
model.fit(final_X_train, y_train)# Get validation predictions and MAE
preds_valid = model.predict(final_X_valid)print("MAE (Your approach):")print(mean_absolute_error(y_valid, preds_valid))
# Fill in the line below: preprocess test data
final_X_test = pd.DataFrame(final_imputer.transform(X_test))
final_X_test.columns = X_test.columns
# Fill in the line below: get test predictions
preds_test = model.predict(final_X_test)
# Save test predictions to file
output = pd.DataFrame({'Id': X_test.index,'SalePrice': preds_test})
output.to_csv('submission.csv', index=False)
之前也写过两篇关于NIOS II 出现:Downloading RLF Process failed的问题,但是总结都不是很全面,小梅哥的教程总结的比较全面特此记录。
问题:nios II 下载程序到板子上时出现 Downloading RLF Process failed的问题。 即当nios中…
Google vs IBM vs Microsoft: 哪个在线数据分析师证书最好? 对目前市场上前三个数据分析师证书进行审查和比较|Madison Hunter
似乎每个重要的公司都推出了自己版本的同一事物:专业数据分析师认证,旨在使您成为雇主的下一个热门商品。
随着…
Qt插件,也叫qt-vsaddin,它以*.vsix后缀名结尾。从visual studio 2010版本开始,VS支持Qt框架的开发,Qt以插件方式集成到VS里。这里在visual studio 2019里配置Qt 5.14.2插件,并配置Qt环境。
1 下载VS2019 下载VS2019,官…