一、写在开头

阅读者可能需要先阅读2023年国赛C题才能读懂下面的内容。

文章着重于解题方向指引和经历分享，只解释部分核心代码。

二、内容概述

刚刚做完比赛，对这段经历和对问题的处理方法进行下记录。

三、个人经历

今年大三，第一次参加数学建模，队友是大二9月份确定好的，徐同学和周同学（两个人性格都超级好）。大二上的时候学院有一门课就是数模，没怎么认真听，不过最后的大作业比较认真的拿sklearn的BP神经网络和k-means写了一个关于中草药的国赛题。后来学院有组织数模培训，但是听了听就是机器学习，都学过就没有去。赛前没有准备太多，为了防止手生也为了CSP在leecode上刷了二三十道dp。

周同学不会用python，沟通能力很好，负责论文（是计科的，不会写但是能看懂代码，word很会用也很认真），徐同学和我一个专业，代码不如我但是自学能力、抗压能力沟通能力很强，也很认真，他论文和模型都做了一些，是队员1所以也充当队长，我负责代码和模型（最后提交的代码都是我敲的）。

我们第一天出题很快就确定了选C题，因为另外两个都读不懂，关于物理的问题也不是我们擅长的（全员网计学院）。开题晚上进行了数据处理，没有做题，11点回去休息。第二天因为读取数据太慢（附件2 87万条数据）花了一上午没做题把处理后数据存到了文件里，当时觉得特别焦虑一上午啥也没写，但是后来做题因为提前存了文件节约了相当多的时间。然后第一天做完了第一题，第二题开了个头。第二天白天做完了第二题，第三题没碰，晚上才开始写第三题，我的队友们在我写第二题的时候在看第三和第四题，当时感觉进度相当慢，有点焦虑，感觉C题比感觉到的难。然后通宵写第三题，我现学python解最优化，但是题目是非凸的，相当不好解。队友用SPSS预测了销量，然后我再继续处理，第二天上午做完了，解最优化用的是现学的遗传算法（之前听说过也了解他的原理，但是是第一次实用）。第四题是开放性问题，我没太重视，写完第三题感觉任务不太多了，摆烂状态帮忙改论文。那天下午继续修改论文，但是下午四点我突然发现我的第三问有一个点写的相当有问题，徐同学觉得别改了，没时间了。但是我感觉时间还够，和他说了我可以后他没再说啥，继续改论文，我就开始高度集中精神修改第四问，然后改的时候也挺焦虑的，特别怕队友怪我（队友什么都没说），不过最后我用1个半小时成功把代码修改好了，结果也很不错。然后就是和队友修修改改论文，八点交MD5回去休息，任务完成。

开题晚上和第一天晚上都是正常作息。第二天晚上熬夜干题，我没有纯熬，12点去找了个桌子椅子睡了3个小时，3点又爬起来写到6点开始恶心又去休息了30分钟起来继续写，徐同学晚上过来我那屋躺地上睡了3小时，周同学断断续续休息了3多小时左右。

当时我们在怎么解第二题和第三题是有分歧的。开题晚上时候第二题我认为定价策略应该算完全成本加成的系数，徐同学认为应该直接用定价，我俩谁也说服不了谁所以决定先去做题，我去做数据处理和第一题，徐同学去做第二题，但是第二天上午他发现他处理后的数据并不和他想的那样，定价和销量负相关，然后俩个人又想了想他觉得是应该用成本加成定价系数，所以我去重做第二问定价策略的部分，他去学ARIMA来做销量预测。我在第二天下午埋头磕第二题的时候徐同学尝试解第三题，我看他写了一个最优化的式子，但是我俩最优化成绩都稀烂（绩点4/5，最优化考70）所以当时讨论怎么解的时候俩人都否定了这么解，然后我继续闷头干题2，他去想新解法。晚上徐同学认为可以用背包加贪心策略去做，但是我感觉这么做一是方法简单难拿高分，而是没法用到前面的结论，所以否定这种做法，我的想法是把之前解第二问的思想加到第三问中，然后现学，磕最优化解法。但是徐同学觉得无法用python实现（我们三个都不擅长matlab，python之前没做过），然后我尝试说服他，然后他觉得我的想法并不合理，也难实现，剩下的时间来不及，提了好几个不合理的点（里面确实有一个地方相当不合理，关于如何预测第三问菜品定价的，后面我会细说）一直不支持我这么做。之后又尝试说服他了好一会儿，他说你就按你的想法做，我就开始干。

下面写下关于那个相当不合理的点的处理。第二天（最后一天）下午四点半我突然发现，第三问的菜品定价系数预测是直接使用第二问用大类的销量和定价系数数据进行训练的神经网络。也就是说我的目标是通过对神经网络输入小类的销量，对具体小类的定价系数进行预测，但是我这里用来预测的神经网络是用大类的数据训练的，而大类的两个值：销量和定价系数，前者是大类总销量，后者是大类定价系数加权平均值。所以我此时建立的实际映射是：通过大类中的大类总销量值约为目前单个蔬菜的销量值对当前单个蔬菜的定价系数进行预测（也就是拿大类全天买的销量极小的某些位置对单品类定价系数进行预测）。这个映射就相当有问题，所以我还是坚持要改，挑选数据，重新训练小品类模型把大品类模型换掉。

四、具体解法

4.1数据处理

首先引入处理excel文件的pandas包和处理矩阵用的numpy包。

读入附件1，这个文件就是小品id、小品类名称、所在的大品类id、大品类名称的对应，相当适合做成hash表，我用python dict()实现。并顺序存储了两个名称、id在list()中，以后遍历就按照这个遍历和确定对立关系（小品类251种，大品类6种）。

读入附件2，文件太大肯定不能每次都来读取他，需要压缩转存。浏览四个问题我们可以知道，题目中提问的时间最小单位为“天”，而附件二每一笔交易是不同时刻的，即“X年X月X日X时X分”，交易的时和分我们是不需要考虑的，所以我们可以读入附件2后，首先遍历确定其天数有多少（函数名：function_time_count(self)，result:1085天）。然后初始化两个空矩阵，一个251*1085用于存放每个小品类的每天销量，一个6*1085用于存放每个大品类的每天销量。

def read_datas(path_data1, path_data2, path_data3, path_data2_sort):
    data2 = pd.read_excel(path_data2)
    data1 = pd.read_excel(path_data1)
    data3 = pd.read_excel(path_data3)
    data22 = pd.read_excel(path_data2_sort)
    return data1, data2, data3, data22


class Data():
    def __init__(self, path_data1=r"D:\win10浏览器\CUMCM2023Problems\C题\附件1.xlsx",
                 path_data2=r"D:\win10浏览器\CUMCM2023Problems\C题\附件2_做第一题简化版.xlsx",
                 path_data3=r"D:\win10浏览器\CUMCM2023Problems\C题\附件3_去除改版.xlsx",
                 path_data2_sort=r"D:\win10浏览器\CUMCM2023Problems\C题\附件2_打折与否排序.xlsx",
                 read_ori_plus=False
                 ):
        self.data1, self.data2, self.data3, self.data22 = read_datas(path_data1, path_data2, path_data3,
                                                                     path_data2_sort)
        self.types = dict()
        self.data1_values = self.data1.values
        self.data2_values = self.data2.values
        self.data3_values = self.data3.values

        self.data22_values = self.data22.values
        self.data2_values_fullSort = self.data22_values
        self.read_ori_plus = read_ori_plus

    def function_integration(self):
        self.function1()
        self.function2()
        # self.record_one_food_function()

    # dict[单位编号]=序列号
    def function1(self):
        rows_type = dict()  # dict[单位编号]=序列号
        i = -1
        type_names = list()
        for row in self.data1_values:
            i += 1
            rows_type[int(row[0])] = i
            type_names.append(row[1])

        self.type_names = type_names
        self.rows_type = rows_type

        # 保存文件
        tf = open(get_path("names_type.json"), "w")
        json.dump(type_names, tf)
        tf.close()

        tf = open(get_path("findIndexFromID.json"), "w")
        json.dump(rows_type, tf)
        tf.close()

    # 共6种大类 dict[分类]=list(编码)
    # dict[小id]=爸爸id
    def function2(self):
        rows_type_big = dict()  # dict[分类]=list(编码)
        typeBig_names = list()
        find_bigtype_fromsmallID = dict()  # dict[小id]=爸爸id
        typeBig_ids = list()  # 大类的id

        before_typeBig = 123
        for row in self.data1_values:
            big_id = int(row[2])
            big_name = row[3]
            id = int(row[0])
            if before_typeBig != big_id:
                before_typeBig = big_id
                typeBig_names.append(big_name)
                typeBig_ids.append(big_id)
                rows_type_big[big_id] = list()
            rows_type_big[big_id].append(id)
            find_bigtype_fromsmallID[id] = big_id

        self.typeBig_ids = typeBig_ids
        self.rows_type_big = rows_type_big  # dict[分类]=list(编码)
        self.typeBig_names = typeBig_names
        self.find_bigtype_fromsmallID = find_bigtype_fromsmallID

        # 保存文件
        tf = open(get_path("findIndexFromBigID.json"), "w")
        json.dump(rows_type_big, tf)
        tf.close()
        tf = open(get_path("names_bigType.json"), "w")
        json.dump(typeBig_names, tf)
        tf.close()
        tf = open(get_path("listBig_ids.json"), "w")
        json.dump(typeBig_ids, tf)
        tf.close()
        tf = open(get_path("find_bigtype_fromsmallID.json"), "w")
        json.dump(find_bigtype_fromsmallID, tf)
        tf.close()

    # 用于求得单个品种食物每天的销量
    # 用于求得单个大品类每天的销量
    def record_one_food_function(self):
        time = '0'
        record_one_type = np.zeros(shape=(251, 1085))
        record_one_bigtype = np.zeros(shape=(6, 1085))
        i = -1
        record_time = list()
        for current in self.data2_values:
            current_time = current[0]
            current_id = int(current[1])
            current_weight = float(current[2])
            if time != current_time:
                time = current_time
                i += 1  # index
                record_time.append(time)

            food_number = self.rows_type[current_id]
            # print(food_number,self.type_names[food_number],current_id)
            record_one_type[food_number][i] += current_weight

            big_id = self.find_bigtype_fromsmallID[current_id]
            index_bigId = self.typeBig_ids.index(big_id)
            # print(current_id,index_bigId,big_id,self.typeBig_names[index_bigId])
            record_one_bigtype[index_bigId][i] += current_weight

        self.record_one_type = record_one_type
        self.record_one_bigtype = record_one_bigtype
        self.record_time = record_time

    # 用于求得单个品种的每天销量
    # result=1085
    def function_time_count(self):
        # 看有几天
        sum_time = 0
        current = '0'
        for row in self.data2_values:
            if current != row[0]:
                current = row[0]
                sum_time += 1
        print(sum_time)
        # 按行遍历
        current_time = '0'
        for row in self.data2_values:
            if current_time != row[0]:
                current_time = row[0]

    # 计算每日物品的进货价
    def function_deal_data3(self):
        values3 = self.data3_values
        times3_list = list()
        record_cost = np.zeros(shape=(251, 1085))

        time = '0'
        i = -1
        for current in values3:
            current_time = current[0]
            current_id = int(current[1])
            current_cost = float(current[2])
            if time != current_time:
                time = current_time
                i += 1
                times3_list.append(time)

            food_number = self.rows_type[current_id]
            record_cost[food_number][i] = current_cost
        # save
        np.save(r'D:\win10浏览器\CUMCM2023Problems\C题\record_cost.npy', record_cost)
        np.save(r"D:\win10浏览器\CUMCM2023Problems\C题\time3_list.npy", np.array(times3_list))

    # 时间Int化
    def deal_with_time_save(self):
        self.time_Int = time_Int(self.time3_list)
        np.save(r'D:\win10浏览器\CUMCM2023Problems\C题\time_Int.npy', self.time_Int)

    # 将每天的每个品类和每个单品的销量进行记录
    def save_array(self):
        np.save(r'D:\win10浏览器\CUMCM2023Problems\C题\record_one_type.npy', self.record_one_type)
        np.save(r'D:\win10浏览器\CUMCM2023Problems\C题\record_one_bigtype.npy', self.record_one_bigtype)

    def load_array(self):
        # record_one_food_function
        self.record_one_type = np.load(r'D:\win10浏览器\CUMCM2023Problems\C题\record_one_type.npy')
        self.record_one_bigtype = np.load(r'D:\win10浏览器\CUMCM2023Problems\C题\record_one_bigtype.npy')

        self.record_cost = np.load(r'D:\win10浏览器\CUMCM2023Problems\C题\record_cost.npy')
        self.time3_list = np.load(r'D:\win10浏览器\CUMCM2023Problems\C题\time3_list.npy', allow_pickle=True)

        if self.read_ori_plus == True:
            self.plus_type = np.load(r'D:\win10浏览器\CUMCM2023Problems\C题\plus_type.npy')
            self.weight_type_noDiscount = np.load(r'D:\win10浏览器\CUMCM2023Problems\C题\weight_type_noDiscount.npy')
            self.weight_type_Discount = np.load(r'D:\win10浏览器\CUMCM2023Problems\C题\weight_type_Discount.npy')
            self.plus_lower_type = np.load(r'D:\win10浏览器\CUMCM2023Problems\C题\plus_lower_type.npy')

        self.plus_Bigtype = np.load(r'D:\win10浏览器\CUMCM2023Problems\C题\plus_Bigtype.npy')
        self.weight_Bigtype_noDiscount = np.load(r'D:\win10浏览器\CUMCM2023Problems\C题\weight_Bigtype_noDiscount.npy')
        self.weight_Bigtype_Discount = np.load(r'D:\win10浏览器\CUMCM2023Problems\C题\weight_Bigtype_Discount.npy')
        self.plus_lower_Bigtype = np.load(r'D:\win10浏览器\CUMCM2023Problems\C题\plus_lower_Bigtype.npy')

        self.time_Int = np.load(r'D:\win10浏览器\CUMCM2023Problems\C题\time_Int.npy')
        self.time_Int_list = list(self.time_Int)
        # function1
        tf = open(get_path("names_type.json"), "r")
        new_dict = json.load(tf)  # list
        self.names_type = new_dict
        tf.close()
        self.type_names = new_dict

        tf = open(get_path("findIndexFromID.json"), "r")
        new_dict = json.load(tf)
        self.findIndexFromID = new_dict  # dict
        tf.close()
        self.rows_type = new_dict

        # function2
        tf = open(get_path("findIndexFromBigID.json"), "r")
        new_dict = json.load(tf)  #
        self.findIndexFromBigID = new_dict
        tf.close()
        self.rows_type_big = new_dict  # dict[分类]=list(编码)

        tf = open(get_path("names_bigType.json"), "r")
        new_dict = json.load(tf)
        self.names_bigType = new_dict  # dict
        tf.close()
        self.typeBig_names = new_dict

        tf = open(get_path("listBig_ids.json"), "r")
        new_dict = json.load(tf)  # list
        self.listBig_ids = new_dict
        tf.close()
        self.typeBig_ids = new_dict

        tf = open(get_path("find_bigtype_fromsmallID.json"), "r")
        new_dict = json.load(tf)
        self.find_bigtype_fromsmallID = new_dict  # dict
        tf.close()
        self.find_bigtype_fromsmallID = new_dict  # 没变

    # 计算加成值
    def addition(self):
        pass

    def save_array2(self):
        pass

其中的核心代码是（获得251种小类和6大类1085天每日销量）：

    # 用于求得单个品种食物每天的销量
    # 用于求得单个大品类每天的销量
    def record_one_food_function(self):
        time = '0'
        record_one_type = np.zeros(shape=(251, 1085))
        record_one_bigtype = np.zeros(shape=(6, 1085))
        i = -1
        record_time = list()
        for current in self.data2_values:
            current_time = current[0]
            current_id = int(current[1])
            current_weight = float(current[2])
            if time != current_time:
                time = current_time
                i += 1  # index
                record_time.append(time)

            food_number = self.rows_type[current_id]
            # print(food_number,self.type_names[food_number],current_id)
            record_one_type[food_number][i] += current_weight

            big_id = self.find_bigtype_fromsmallID[current_id]
            index_bigId = self.typeBig_ids.index(big_id)
            # print(current_id,index_bigId,big_id,self.typeBig_names[index_bigId])
            record_one_bigtype[index_bigId][i] += current_weight

        self.record_one_type = record_one_type
        self.record_one_bigtype = record_one_bigtype
        self.record_time = record_time

4.2第一问求解

引入做二维图的matplotlib.pyplot。

对于大类，一共就6个，直接使用np.corrcoef()计算他们之间的相关系数（皮尔逊相关系数），然后用plt把相关系数们当作热力图输出出来就成，效果如下：

核心代码（热力图）：

# 热力图
def heat_map_type(d):
    np.set_printoptions(threshold=np.inf)

    # print(d.record_one_type[:,0])
    # print(d.record_one_type[:,-1])
    # print(d.record_one_bigtype[0,:])
    corr = pd.DataFrame(d.record_one_type)
    corr = corr.corr()
    # print(corr)

    fig = plt.figure()
    ax = fig.add_subplot(1, 1, 1)
    p1 = sns.heatmap(corr, xticklabels=d.type_names, yticklabels=d.type_names, cmap='viridis')
    ax.set_title('Heat Map2')

    s1 = p1.get_figure()
    plt.show()
    s1.savefig(r'D:\win10浏览器\CUMCM2023Problems\C题\HeatMap_small.jpg', dpi=300, bbox_inches='tight')
    print('ok')

对于小类，251种啊，画出个热力图一个小格快成像素点了不能这么弄，所以用聚类，我用的k-means，为了获取相关度尽量大的把k设为80，将与其他蔬菜相关度低的蔬菜单独聚成一簇，那些簇内元素不为1的就是我实际需要的相关度较高的小类蔬菜。聚类的结果是有相当多一部分的蔬菜是聚到了一类（超过20个菜品聚到了一个簇），剩余的都是一个簇内不超过15个，以四个以下居多，举两个例子：

对于那个相当多元素的簇，我的解释是那些是蔬菜可能可以种大棚里，或者居民在各个季节需求都比较平均，没啥淡旺季，所以进一簇了（就比如小米辣椒）。

核心代码（小品类寻找有联系的蔬菜）：

def KMS_type_show(d, k=80):
    kms = KMeans(n_clusters=k, random_state=0).fit(d.record_one_type)
    labels = kms.labels_
    # record_cluster = labels.sorted()
    # 首先剔除常用少量蔬菜
    count = np.bincount(labels)
    biggest_count = count.max()
    biggest_count_number = np.where(biggest_count == count)[0][0]
    # 其次剔除单个元素
    cluster_need_index = list()
    cluster_need_count = list()
    index = -1
    for i in count:
        index += 1
        if i <= 1 or i == biggest_count:
            pass
        else:
            cluster_need_index.append(index)  # 簇名
            cluster_need_count.append(i)  # 出现次数
    print('最大簇的簇的簇标记为{},其内商品有:'.format(biggest_count_number))
    # goods序号
    cluster_Biggest = np.where(labels == biggest_count_number)[0]
    cluster_set = np.zeros(shape=(biggest_count, 1085))
    names_cs = list()
    for i in range(len(cluster_Biggest)):
        print(d.type_names[cluster_Biggest[i]])
        cluster_set[i, :] = d.record_one_type[cluster_Biggest[i], :]
        names_cs.append(d.type_names[cluster_Biggest[i]])
    # show
    x = np.arange(1085)
    x = np.expand_dims(x, 0).repeat(biggest_count, axis=0)
    plt.plot(x.T, cluster_set.T)
    plt.legend(names_cs, loc="best")
    plt.show()

    # 处理有效簇
    print('以下为有效簇的信息：')
    for i in range(len(cluster_need_index)):
        print('簇标记为{},簇内有商品{}个,分别为：'.format(cluster_need_index[i], cluster_need_count[i]))
        cluster_current = np.where(labels == cluster_need_index[i])[0]
        names_cs = list()
        cluster_set = np.zeros(shape=(cluster_need_count[i], 1085))

        index = -1
        for j in cluster_current:
            index += 1
            print('{}'.format(d.type_names[j], end=' '))
            names_cs.append(d.type_names[j])
            cluster_set[index, :] = d.record_one_type[j, :]
            # show
        x = np.arange(1085)
        x = np.expand_dims(x, 0).repeat(cluster_need_count[i], axis=0)
        plt.plot(x.T, cluster_set.T)
        plt.legend(names_cs, loc="best")
        plt.show()

4.3第二问求解

首先要做出大品类销量和成本加成定价的关系，我的做法是找单个大品类每日的销量和成本加成系数的关系，而题目中说明了"从需求侧来看，蔬菜类商品的销售量与时间往往存在一定的关联关系",所以认为应该考虑大品类每日的销量是受时间的影响的，因此我对每一种大类都使用聚类聚成两类，将其分为淡旺季，然后再进一步对每一季处理。

核心代码（分两季）：

def show_bigtype(d):
    x = np.arange(1085)
    x = np.expand_dims(x, 0).repeat(6, axis=0)
    plt.plot(x.T, d.record_one_bigtype.T)
    plt.legend(d.typeBig_names, loc="best")
    plt.show()
def mounth_analysis():
    mounth_record = np.zeros(shape=(6, 12))
    two_season = np.zeros(shape=(6, 12))

    record = np.load(r'D:\win10浏览器\CUMCM2023Problems\C题\record_one_bigtype.npy')
    day_record = np.load(r'D:\win10浏览器\CUMCM2023Problems\C题\time3_list.npy', allow_pickle=True)

    for i in range(6):
        current_month = 7
        index = 0
        for j in range(len(day_record)):
            if current_month != day_record[j].month:
                current_month = day_record[j].month
            mounth_record[i][current_month - 1] += record[i][j]
    for i in range(6):
        kms = KMeans(n_clusters=2, random_state=0).fit(mounth_record[i, :].reshape(-1, 1))
        two_season[i, :] = kms.labels_
    np.save(r"D:\win10浏览器\CUMCM2023Problems\C题\two_season.npy", two_season)
    print(two_season)

分为两季后，对于每一季的销量和定价系数分别处理、显示、分析，以下举两个图片：

可以看出，销量受加成系数影响较低，不呈现负相关，商超在某种菜品销量高时更倾向于提高定价以获取更多利润。

然后要给出各蔬菜品类未来一周(2023 年 7 月 1-7 日)的日补货总量和定价策略，我们首先用季节性ARIMA时间序列分析对这一周的每日销量进行预测，然后将销量作为这几天的日补货总量。再用季节销量做属性，加成系数做标签，用神经网络对他俩做拟合，以后通过销量预测定价用。部分拟合结果：

之后将ARIMA预测到的销量输入刚才训练好的bp网络中，确定加成定价系数。

核心代码（训练神经网路并展示模型）：

sklearn.neural_network import MLPRegressor
from sklearn.model_selection import train_test_split
import joblib
def Regression_show2(save_sn):
    # 防止刷新模型
    if save_sn != None:
        return
    model = list(list(list() for j in range(2)) for i in range(6))
    for foodBig in range(6):
        for reason in range(2):
            # x是加成定价，y是销量
            X = save_sn[foodBig][reason][0].reshape(-1)
            y = save_sn[foodBig][reason][1].reshape(-1, 1)
            X, y = y, X

            X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.38, random_state=42)
            mlp_regressor = MLPRegressor(hidden_layer_sizes=(400), max_iter=5000, random_state=42)
            mlp_regressor.fit(X_train, y_train)
            y_pred = mlp_regressor.predict(X_test)
            plt.scatter(X_test, y_test, c='b', label='Data')
            plt.scatter(X_test, y_pred, c='r', label='Predictions')
            plt.xlabel('X')
            plt.ylabel('y')
            plt.legend()
            plt.show()
            # save model
            joblib.dump(mlp_regressor,
                        r'D:\win10浏览器\CUMCM2023Problems\C题\model2\model{}_{}.pkl'.format(foodBig, reason))
#提选出时间

4.4第三问求解

首先写上约束公式，出优化公式：

公式解释：

用scikit-opt(国人写的包)里的GA(遗传算法)和sklearn.neural_network里的MLPRegressor。先筛选出6月24日到6月30日的可以进的蔬菜，然后用第二题的相同的方法每一个小类训练训练60个输入销量输出加成系数的神经网络，之后引入sko的GA包把优化公式写进去迭代求解就行，附上官方API:

文档 (scikit-opt.github.io)

迭代四百次的结果（论文里好像忘记加这个图了呜呜）：

可以看出最后最优值（纯利润）在1150RMB收敛了。

核心代码（训练小类网络和用GA算法解优化问题）：

def Regression_show():
    plus = np.load(get_path('plus_2_624707.npy'))
    wei  = np.load(get_path('wei_2_624707.npy'))
    #food_sale可用

    for index in range(max_choice):
        #id = food_sale[index]
        #type_index = findIndexFromID[str(id)]
        # x是加成定价，y是销量
        X = plus[index][:].reshape(-1)
        y = wei[index][:].reshape(-1,1)
        X, y = y, X

        X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.38, random_state=42)
        mlp_regressor = MLPRegressor(hidden_layer_sizes=(400), max_iter=5000, random_state=42)
        mlp_regressor.fit(X_train, y_train)
        if index<10:
            y_pred = mlp_regressor.predict(X_test)
            plt.scatter(X_test, y_test, c='b', label='Data')
            plt.scatter(X_test, y_pred, c='r', label='Predictions')
            plt.xlabel('X')
            plt.ylabel('y')
            plt.legend()
            plt.show()
        # save model
        joblib.dump(mlp_regressor,r'D:\win10浏览器\CUMCM2023Problems\C题\model2\model{}.pkl'.format(index))
def read_modelAnd_predict2(predict_list, bigFoodIndex=0, reason=0):
    # load model
    # print('predict_list',predict_list)
    rfc2 = joblib.load(r"D:\win10浏览器\CUMCM2023Problems\C题\model\model{}_{}.pkl".format(bigFoodIndex, reason))
    result = rfc2.predict(np.array(predict_list).reshape(-1, 1))
    # print('result',result)
    return result
def read_modelAnd_predict(predict_list,Index=0):
    # load model
    # print('predict_list',predict_list)
    rfc2 = joblib.load(r"D:\win10浏览器\CUMCM2023Problems\C题\model2\model{}.pkl".format(Index))
    result = rfc2.predict(np.array(predict_list).reshape(-1, 1))
    # print('result',result)
    return result

#优化目标
def fun_question(x):
    #loss_npList = list(loss_npList)
    es = x[:max_choice:]#选取61个值     进货量
    qs = x[max_choice::]#选取61个值     是否选择
    w=0.0
    for i in range(max_choice):#遍历每一个可取值
        #确定id,选择父类
        id = food_sale[i]
        #print(data4_index_list,'\n',id)

        loss_index = data4_index_list.index(id)
        loss = loss_npList[loss_index]*0.01*0.75#增加

        e_loss = es[i]*(1-loss)
        #print('es[i]={},loss={}'.format(es[i],loss))
        #寻找物品当前的批发价
        cost = cost_71[i]
        if qs[i]>0.5:
            one = read_modelAnd_predict(e_loss,i)*e_loss*qs[i]*cost-cost*es[i]
            w += one
            #print('one',one)

    return -1*w[0]
from sko.GA import GA
#GA计算
def optimize_function():
    #进货量
    #是否选择
    constraint_ueq = (
        lambda x: sum(x[max_choice::]) - 33,
        lambda x: 27 - sum(x[max_choice::]),
    )
    arima_one = read_excel_71_predict()
    std_one   = np.load(get_path('std_canSale.npy'))
    std_two   = list( 2*i for i in std_one)
    lb_0 = [0]*(max_choice)
    ub_1 = [1]*(max_choice)
    #只看销量
    # 最小值
    min_put = [2.5]*max_choice
    #arima +- 2*std
    upper = np.add(arima_one,std_one)
    lower = np.subtract(arima_one,std_one)

    lb = find_lb(min_put,lower)
    ub = finb_up(min_put,upper)
    lb = list(lb)+list(lb_0)
    ub = list(ub)+list(ub_1)

    precision = [1e-7 for i in range(max_choice)]+[1 for i in range(max_choice)]
    ga = GA(func=fun_question,n_dim=max_choice*2,
            size_pop=50,max_iter=300,prob_mut=0.001,
            lb = lb,ub=ub,
            constraint_ueq=constraint_ueq,
            precision=precision
            )
    best_x, best_y = ga.run()
    print('best_x:', best_x, '\n', 'best_y:', best_y)
    np.save(get_path(r'answer3\best_x.npy'),best_x)
    np.save(get_path(r'answer3\best_y.npy'), best_y)
    Y_history = pd.DataFrame(ga.all_history_Y)
    fig, ax = plt.subplots(2, 1)
    ax[0].plot(Y_history.index, Y_history.values, '.', color='red')
    Y_history.min(axis=1).cummin().plot(kind='line')
    plt.show()