《机器学习实战》chap1 机器学习概览

news2025/1/12 22:47:08

《机器学习实战》chap1 机器学习概览

Chap1 The Machine Learning Landscape

这本书第三版也已经出版了:https://github.com/ageron/handson-ml3

  • Hands-on Machine Learning with Scikit-Learn,Keras & TensorFlow

  • 引入

    • 很早的应用:光学字符识别(OCR, Optical Character Recognition)
    • 20世纪90年代:垃圾邮件过滤器 the spam filter
    • … Now 数不清的机器学习应用

现在开始培养自己看英文原版的习惯
结合着看…

文章目录

  • 《机器学习实战》chap1 机器学习概览
  • 1.1 What is Machine Learning
  • 1.2 Why Use Machine Learning?
  • 1.3 Types of Machine Learning Systems
    • 1.3.1 Supervised/Unsupervised Learning
      • Supervised Learning
      • Unsupervised learning
      • Semisupervised learning
      • Reinforcement learning
    • 1.3.2 Batch and Online Learning
      • Batch learning
      • Online learning
    • 1.3.3 Instance-Based Versus Model-Based Learning
      • Instance-based learning
      • Model-based learning
  • 1.4 Main Challenges of Machine Learning
    • Insuffcient Quantity of Training Data
    • Nonrepresentative Training Data
    • Poor-Quality Data
    • Irrelevant Features
    • Overfitting the Training Data
    • Underfitting the Training Data
  • 1.5 Testing and Validating
  • Exercises
  • 参考

Examples of Applications 举得一些例子没有记笔记

1.1 What is Machine Learning

  • definition

    • a slightly more general definition:
      • [Machine Learning is the] field of study that gives computers the ability to learn without being explicitly programmed. —Arthur Samuel, 1959
      • 机器学习是一个研究领域,让计算机无须进行明确编程就具备学 习能力。
    • a more engineering-oriented one:
    • A computer program is said to learn from experience E with respect to some task T and some performance measure P, if its performance on T, as measured by P, improves with experience E. —Tom Mitchell, 1997
    • 一个计算机程序利用经验E来学习任务T,性能是P,如果针对任务T的性能P随着经验E不断增长,则称为机器学习
  • For example, your spam filter is a Machine Learning program that can learn to flag spam given examples of spam emails (e.g., flagged by users) and examples of regular (nonspam, also called “ham”) emails.

    • 你的垃圾邮件过滤器是一个机器学习程序,它可以根据垃圾邮件的例子(例如,被用户标记的垃圾邮件)和普通的例子(非垃圾邮件,也称为 “ham”)邮件的例子学习如何标记出垃圾邮件。
  • The examples that the system uses to learn(系统用来进行学习的样例) are called the training set(训练集). Each training example is called a training instance (or sample)(训练实例,训练样本). In this case, the task T is to flag spam for new emails(标记垃圾邮件), the experience E is the training data(训练数据),

    • the performance measure P needs to be defined; for example, you can use the ratio of correctly classified emails.

    (比如可以把P定义为 可以正确分类邮件的比例)

    • This particular performance measure is called accuracy(准确率) and it is often used in classification tasks(分类任务).

1.2 Why Use Machine Learning?

  • 作者还是用垃圾邮件的例子,感觉讲的很棒

  • Machine Learning is great for:

    • Problems for which existing solutions require a lot of hand-tuning or long lists of rules: one Machine Learning algorithm can often simplify code and perform better.
    • Complex problems for which there is no good solution at all using a traditional approach: the best Machine Learning techniques can find a solution.
    • Fluctuating(波动的) environments: a Machine Learning system can adapt to new data.
    • Getting insights about complex problems and large amounts of data
  • 积累一个我觉得文中很好的表达,下面这个shine

    Another area where Machine Learning shines is for problems that either are too complex for traditional approaches or have no known algorithm.

  • Machine Learning is about making machines get better at some task by learning from data, instead of having to explicitly code rules.

1.3 Types of Machine Learning Systems

哈哈哈最近发现state-of-the-art这个词用的挺多的

There are so many different types of Machine Learning systems that it is useful to classify them in broad categories based on: (据以下标准进行大致分类)

  • Whether or not they are trained with human supervision (supervised, unsuper‐ vised, semisupervised, and Reinforcement Learning)

  • Whether or not they can learn incrementally on the fly (online versus batch learning)

    是否可以动态地进行增量学习 (在线学习和批量学习)

  • Whether they work by simply comparing new data points to known data points, or instead detect patterns in the training data and build a predictive model, much like scientists do (instance-based versus model-based learning)

These criteria are not exclusive; you can combine them in any way you like. For example, a state-of-the-art spam filter may learn on the fly using a deep neural net‐ work model trained using examples of spam and ham; this makes it an online, model-based, supervised learning system.

1.3.1 Supervised/Unsupervised Learning

  • be classified according to the amount and type of supervision they get during training.

  • four major categories: supervised learning, unsupervised learning, semisupervised learning, and Reinforcement Learning.

Supervised Learning

  • typical supervised learning tasks

    • classification

      • 还是以spam filter为例:A labeled training set 中, trained with many example emails with their class (spam or ham)

      image-20230106220604362

    • regression

      Fun fact: this odd-sounding name is a statistics term introduced by Francis Galton while he was studying the fact that the children of tall people tend to be shorter than their parents. Since children were shorter, he called this regression to the mean. This name was then applied to the methods he used to analyze correlations between variables.

      • predict a target numeric value, such as the price of a car, given a set of features (mileage, age, brand, etc.) called predictors.
      • To train the system, you need to give it many examples of cars, including both their predictors and their labels (i.e., their prices).

      image-20230106220617939

  • Some of the most important supervised learning algorithms (covered in this book):

    Some regression algorithms can be used for classification as well, and vice versa. For example, Logistic Regression is commonly used for classification.

    • k-Nearest Neighbors
    • Linear Regression
    • Logistic Regression
    • Support Vector Machines (SVMs)
    • Decision Trees and Random Forests
    • Neural networks

Unsupervised learning

  • The training data is unlabeled. The system tries to learn without a teacher

    image-20230106220915133

以下为典型task和对应的一些重要算法

  • Clustering

    • K-Means
    • DBSCAN
    • Hierarchical Cluster Analysis (HCA) 分层聚类分析
  • Anomaly detection and novelty detection 异常检测和新颖性检测

    Anomaly detection:it learns to recognize them and when it sees a new instance it can tell whether it looks like a normal one or whether it is likely an anomaly

    Novelty detection: 和前者类似,the difference is that novelty detection algorithms expect to see only normal data during training, while anomaly detection algorithms are usually more tolerant, they can often perform well even with a small percentage of outliers in the training set.

    • One-class SVM 单类SVM
    • Isolation Forest
  • Visualization and dimensionality reduction 可视化和降维

    visualization:you feed them a lot of complex and unlabeled data, and they output a 2D or 3D rep‐ resentation of your data that can easily be plotted

    dimensionality reduction:the goal is to simplify the data without losing too much information

    It is often a good idea to try to reduce the dimension of your train‐ ing data using a dimensionality reduction algorithm before you feed it to another Machine Learning algorithm (such as a super‐ vised learning algorithm). It will run much faster, the data will take up less disk and memory space, and in some cases it may also per‐ form better

    • Principal Component Analysis (PCA)
    • Kernel PCA 核主成分分析
    • Locally-Linear Embedding (LLE) 局部线性嵌入
    • t-distributed Stochastic Neighbor Embedding (t-SNE) t-分布随机近邻嵌入
  • Association rule learning 关联规则学习

the goal is to dig into large amounts of data and discover interesting relations between attributes

  • Apriori
  • Eclat

Semisupervised learning

  • deal with partially labeled training data, usually a lot of unlabeled data and a little bit of labeled data.
  • Most semisupervised learning algorithms are combinations of unsupervised and supervised algorithms.

Reinforcement learning

The learning system, called an agent in this context, can observe the environment, select and perform actions, and get rewards in return (or penalties in the form of negative rewards). It must then learn by itself what is the best strategy, called a policy, to get the most reward over time. A policy defines what action the agent should choose when it is in a given situation.

image-20230106222215550

1.3.2 Batch and Online Learning

这一小节目前不是很理解

  • Another criterion used to classify Machine Learning systems is whether or not the system can learn incrementally from a stream of incoming data.

    看系统是否可以从传入的数据流中进行增量学习

Batch learning

  • In batch learning, the system is incapable of learning incrementally: it must be trained using all the available data.

  • This will generally take a lot of time and computing resources, so it is typically done offline. First the system is trained, and then it is launched into production and runs without learning anymore; it just applies what it has learned. This is called offline learning.

Online learning

  • In online learning, you train the system incrementally by feeding it data instances sequentially, either individually or by small groups called mini-batches. Each learning step is fast and cheap, so the system can learn about new data on the fly, as it arrives.(系统可以根据飞速写入的最新数据进行学习)

    on the fly :(计算机)运行中

    image-20230106223203276

1.3.3 Instance-Based Versus Model-Based Learning

  • One more way to categorize Machine Learning systems is by how they generalize(如何泛化). Most Machine Learning tasks are about making predictions. This means that given a number of training examples, the system needs to be able to generalize to examples it has never seen before. Having a good performance measure on the training data is good, but insufficient; the true goal is to perform well on new instances.

    在训练数据上实现良好的性能指标固然重要,但是还不够充分。真正的目的是要在新的对象实 例上表现出色

  • There are two main approaches to generalization: instance-based learning and model-based learning.

Instance-based learning

  • the system learns the examples by heart, then generalizes to new cases by comparing them to the learned examples (or a subset of them), using a similarity measure

Model-based learning

  • build a model of these exam‐ ples, then use that model to make predictions

  • In summary:

    • You studied the data.
    • You selected a model.
    • You trained it on the training data (i.e., the learning algorithm searched for the model parameter values that minimize a cost function).
    • Finally, you applied the model to make predictions on new cases (this is called inference), hoping that this model will generalize well.

    This is what a typical Machine Learning project looks like.


  • There are many different types of ML systems: supervised or not, batch or online, instance-based or model-based, and so on.
  • In a ML project you gather data in a training set, and you feed the training set to a learning algorithm. If the algorithm is model-based it tunes some parameters to fit the model to the training set (i.e., to make good predictions on the training set itself), and then hopefully it will be able to make good predictions on new cases as well. If the algorithm is instance-based, it just learns the examples by heart and uses a similarity measure to generalize to new instances.

1.4 Main Challenges of Machine Learning

In short, since your main task is to select a learning algorithm and train it on some data, the two things that can go wrong are “bad algorithm” and “bad data.” Let’s start with examples of bad data.

Insuffcient Quantity of Training Data

作者这里写的很好玩哈哈哈

For a toddler (刚学会走路的孩子,牙牙学语的小孩)to learn what an apple is, all it takes is for you to point to an apple and say “apple” (possibly repeating this procedure a few times). Now the child is able to recognize apples in all sorts of colors and shapes. Genius. (天才)

Machine Learning is not quite there yet; it takes a lot of data for most Machine Learning algorithms to work properly. Even for very simple problems you typically need thousands of examples, and for complex problems such as image or speech recognition you may need millions of examples (unless you can reuse parts of an existing model).

Nonrepresentative Training Data

In order to generalize well, it is crucial that your training data be representative of the new cases you want to generalize to. This is true whether you use instance-based learning or model-based learning.

It is crucial to use a training set that is representative of the cases you want to generalize to. This is often harder than it sounds: if the sample is too small, you will have sampling noise (i.e., nonrepresentative data as a result of chance), but even very large samples can be nonrepresentative if the sampling method is flawed. This is called sampling bias.

针对你想要泛化的案例使用具有代表性的训练集,这一点至关重要。不过说起来容易,做起来难:如果样本集太小,将会出现采样噪声 (即非代表性数据被选中);而即便是非常大的样本数据,如果采样方式欠妥,也同样可能导致非代表性数据集,这就是所谓的采样偏差。

Poor-Quality Data

Obviously, if your training data is full of errors, outliers(异常值), and noise (e.g., due to poorquality measurements), it will make it harder for the system to detect the underlying patterns, so your system is less likely to perform well. It is often well worth the effort to spend time cleaning up your training data. The truth is, most data scientists spend a significant part of their time doing just that.

Irrelevant Features

As the saying goes: garbage in, garbage out. Your system will only be capable of learning if the training data contains enough relevant features and not too many irrelevant ones. A critical part of the success of a Machine Learning project is coming up with a good set of features to train on. This process, called feature engineering, involves:

  • Feature selection: selecting the most useful features to train on among existing features.
  • Feature extraction: combining existing features to produce a more useful one (as we saw earlier, dimensionality reduction algorithms can help).
  • Creating new features by gathering new data.

前面几个challenge主要是bad data,下面几个主要是bad algorithm

Overfitting the Training Data

Say you are visiting a foreign country and the taxi driver rips you off. You might be tempted to say that all taxi drivers in that country are thieves. Overgeneralizing is something that we humans do all too often, and unfortunately machines can fall into the same trap if we are not careful.

这一段写的也好哈哈,仔细想想我们人确实常犯过拟合的错误。

  • In Machine Learning this is called overfitting: it means that the model performs well on the training data, but it does not generalize well.

  • Overfitting happens when the model is too complex relative to the amount and noisiness of the training data. The possible solutions are:

    • To simplify the model by selecting one with fewer parameters (e.g., a linear model rather than a high-degree polynomial model), by reducing the number of attributes in the training data or by constraining the model
    • To gather more training data
    • To reduce the noise in the training data (e.g., fix data errors and remove outliers)
  • Constraining a model to make it simpler and reduce the risk of overfitting is called regularization.

  • The amount of regularization to apply during learning can be controlled by a hyper‐ parameter. A hyperparameter is a parameter of a learning algorithm (not of the model). As such, it is not affected by the learning algorithm itself; it must be set prior to training and remains constant during training. If you set the regularization hyper‐ parameter to a very large value, you will get an almost flat model (a slope close to zero); the learning algorithm will almost certainly not overfit the training data, but it will be less likely to find a good solution. Tuning hyperparameters is an important part of building a Machine Learning system。

Underfitting the Training Data

  • Underfitting is the opposite of overfitting: it occurs when your model is too simple to learn the underlying structure of the data.
  • The main options to fix this problem are:
  • Selecting a more powerful model, with more parameters
  • Feeding better features to the learning algorithm (feature engineering)
  • Reducing the constraints on the model (e.g., reducing the regularization hyper‐ parameter)

The system will not perform well if your training set is too small, or if the data is not representative, noisy, or polluted with irrelevant features (garbage in, garbage out). Lastly, your model needs to be neither too simple (in which case it will underfit) nor too complex (in which case it will overfit).

once you have trained a model, you don’t want to just “hope” it generalizes to new cases. You want to evaluate it, and fine-tune(微调,精调) it if necessary. Let’s see how👇

1.5 Testing and Validating

这部分里面提到的训练集,测试集,验证集,交叉验证,之前学习周志华老师的课的时候感觉那里老师讲的很清楚,当时笔记记录在这:周志华 《机器学习初步》模型评估与选择

另外,这里也提到了 No Free Lunch(NFL) Theorem 哈哈哈!

  • NFL定理 : 一个算法 L a \mathfrak{L}_{a} La若在某些问题上比另一个算法 L b \mathfrak{L}_{b} Lb 好,必存在 另一些问题 L b \mathfrak{L}_{b} Lb L a \mathfrak{L}_{a} La

In a famous 1996 paper,(“The Lack of A Priori Distinctions Between Learning Algorithms,”) David Wolpert demonstrated that if you make absolutely no assumption about the data, then there is no reason to prefer one model over any other. This is called the No Free Lunch (NFL) theorem. For some datasets the best model is a linear model, while for other datasets it is a neural network. There is no model that is a priori guaranteed to work better (hence the name of the theorem). The only way to know for sure which model is best is to evaluate them all. Since this is not possible, in practice you make some reasonable assumptions about the data and you evaluate only a few reasonable models. For example, for simple tasks you may evaluate linear models with various levels of regularization, and for a complex problem you may evaluate various neural networks.

Exercises

  • 1.How would you define Machine Learning?

    Machine Learning is about building systems that can learn from data. Learning means getting better at some task, given some performance measure.

  • 2.Can you name four types of problems where it shines?

    Machine Learning is great for complex problems for which we have no algorithmic solution, to replace long lists of hand-tuned rules, to build systems that adapt to fluctuating environments, and finally to help humans learn (e.g., data mining).

  • 3.What is a labeled training set?

    A labeled training set is a training set that contains the desired solution (a.k.a. a label) for each instance.

    “a.k.a.” stands for “also known as.” It is used to introduce an alternative name or nickname for someone or something. For example, you might say, “The artist a.k.a. Kanye West.” This means that Kanye West is also known as “the artist.”

  • 4.What are the two most common supervised tasks?

    Regression and classification.

  • 5.Can you name four common unsupervised tasks?

    Clustering, visualization, dimensionality reduction, and association rule learning.

  • 6.What type of Machine Learning algorithm would you use to allow a robot to walk in various unknown terrains?

    Reinforcement Learning is likely to perform best if we want a robot to learn to walk in various unknown terrains, since this is typically the type of problem that Reinforcement Learning tackles. It might be possible to express the problem as a supervised or semi-supervised learning problem, but it would be less natural.

  • 7.What type of algorithm would you use to segment your customers into multiple groups?

    If you don’t know how to define the groups, then you can use a clustering algorithm (unsupervised learning) to segment your customers into clusters of similar customers. However, if you know what groups you would like to have, then you can feed many examples of each group to a classification algorithm (supervised learning), and it will classify all your customers into these groups.

  • 8.Would you frame the problem of spam detection as a supervised learning problem or an unsupervised learning problem?

    Spam detection is a typical supervised learning problem: the algorithm is fed many emails along with their labels (spam or not spam).

  • 9.What is an online learning system?

    An online learning system can learn incrementally, as opposed to a batch learning system. This makes it capable of adapting rapidly to both changing data and autonomous systems, and of training on very large quantities of data.

  • 10.What is out-of-core learning?

    这个前面笔记里面偷懒没记

    Out-of-core algorithms can handle vast quantities of data that cannot fit in a computer’s main memory. An out-of-core learning algorithm chops the data into mini-batches and uses online learning techniques to learn from these mini-batches.

  • 11.What type of learning algorithm relies on a similarity measure to make predictions?

    An instance-based learning system learns the training data by heart; then, when given a new instance, it uses a similarity measure to find the most similar learned instances and uses them to make predictions.

  • 12.What is the difference between a model parameter and a learning algorithm’s hyperparameter?

    A model has one or more model parameters that determine what it will predict given a new instance (e.g., the slope 斜率 of a linear model). A learning algorithm tries to find optimal values(最优解) for these parameters such that the model generalizes well to new instances. A hyperparameter is a parameter of the learning algorithm itself, not of the model (e.g., the amount of regularization to apply).

  • 13.What do model-based learning algorithms search for? What is the most common strategy they use to succeed? How do they make predictions?

    Model-based learning algorithms search for an optimal value for the model parameters such that the model will generalize well to new instances. We usually train such systems by minimizing a cost function that measures how bad the system is at making predictions on the training data, plus a penalty for model complexity if the model is regularized. To make predictions, we feed the new instance’s features into the model’s prediction function, using the parameter values found by the learning algorithm.

  • 14.Can you name four of the main challenges in Machine Learning?

    Some of the main challenges in Machine Learning are the lack of data, poor data quality, nonrepresentative data, uninformative features, excessively simple models that underfit the training data, and excessively complex models that overfit the data.

  • 15.If your model performs great on the training data but generalizes poorly to new instances, what is happening? Can you name three possible solutions?

    If a model performs great on the training data but generalizes poorly to new instances, the model is likely overfitting the training data (or we got extremely lucky on the training data). Possible solutions to overfitting are getting more data, simplifying the model (selecting a simpler algorithm, reducing the number of parameters or features used, or regularizing the model), or reducing the noise in the training data.

  • 16.What is a test set and why would you want to use it?

    A test set is used to estimate the generalization error that a model will make on new instances, before the model is launched in production.

  • 17.What is the purpose of a validation set?

    A validation set is used to compare models. It makes it possible to select the best model and tune the hyperparameters.

  • 18.What is the train-dev set, when do you need it, and how do you use it?

    The train-dev set is used when there is a risk of mismatch between the training data and the data used in the validation and test datasets (which should always be as close as possible to the data used once the model is in production). The train-dev set is a part of the training set that’s held out (the model is not trained on it). The model is trained on the rest of the training set, and evaluated on both the train-dev set and the validation set. If the model performs well on the training set but not on the train-dev set, then the model is likely overfitting the training set. If it performs well on both the training set and the train-dev set, but not on the validation set, then there is probably a significant data mismatch between the training data and the validation + test data, and you should try to improve the training data to make it look more like the validation + test data.

    In the context of machine learning, a “train-dev set” is a set of data that is used to evaluate the performance of a model during the training process. The model is trained on a separate training set, and then its performance is evaluated on the development set to see how well it generalizes to unseen data. The train-dev set is used to tune the hyperparameters of the model and to help prevent overfitting. It is common practice to have a separate test set that is only used to evaluate the final performance of the model once training is complete.

    In the context of machine learning, a development set (also known as a validation set) is a set of data that is used to evaluate the performance of a model during the training process. It is called a “development set” because it is used to develop the model, i.e., to tune the hyperparameters and ensure that the model is not overfitting the training data.

    The development set is different from the training set, which is used to fit the model to the data, and the test set, which is used to evaluate the final performance of the model on unseen data.

    Some people use the terms “development set” and “validation set” interchangeably, while others use them to refer to slightly different things. In general, the term “validation set” is used more commonly, and it usually refers to a set of data that is used to evaluate the model during training and fine-tune the hyperparameters.

  • 19.What can go wrong if you tune hyperparameters using the test set?

    If you tune hyperparameters using the test set, you risk overfitting the test set, and the generalization error you measure will be optimistic (you may launch a model that performs worse than you expect).

参考

  • https://github.com/ageron/handson-ml3

  • https://github.com/ageron/handson-ml2

本文来自互联网用户投稿,该文观点仅代表作者本人,不代表本站立场。本站仅提供信息存储空间服务,不拥有所有权,不承担相关法律责任。如若转载,请注明出处:http://www.coloradmin.cn/o/145767.html

如若内容造成侵权/违法违规/事实不符,请联系多彩编程网进行投诉反馈,一经查实,立即删除!

相关文章

远程办公之怎样在外网登录在线答题网站

很多学校或企业因为教学、测试需要,为学生或员工提供了在线答题平台网站,但弊端是这种在线答题平台只能在校内或在企业内网访问使用,在外网是无法登录访问的。在无公网Ip服务器上部署的web,默认情况下只能内网访问,公网…

TLE4943C/CH505C轮速传感器芯片的输出协议介绍

Infineon公司的TLE4943是一款集成式有源磁场传感器,适用于基于霍尔技术的车轮速度应用。它的基本功能是测量磁极轮或铁磁齿轮的速度。它具有使用AK协议进行通信的两线电流接口。该协议除了提供速度信号外,还提供其他信息,如车轮旋转方向和气隙…

java安装教程-windows

检查是否已经安装过jav打开cmd命令窗口 输入 java -v下载java安装包网址:https://www.oracle.com/java/technologies/downloads/安装java双击运行程序jdk-19_windows-x64_bin.exe,点击下一步进行安装可以更改安装路径,注意安装路径不能有中文…

【TypeScript】TS类型守卫(六)

🐱个人主页:不叫猫先生 🙋‍♂️作者简介:前端领域新星创作者、华为云享专家、阿里云专家博主,专注于前端各领域技术,共同学习共同进步,一起加油呀! 💫系列专栏&#xff…

独立开发变现周刊(第86期):月收入4000美元的日程规划器

分享独立开发、产品变现相关内容,每周五发布。目录1、NotionReads: 在Notion中管理你的阅读书籍2、Zaap.ai: 面向创作者的一站式工具3、microfeed: 开源的可自我托管的轻量级内容管理系统(CMS)4、Reactive Resume:一个免费的开源简历生成器5、一个月收入…

2019年1月政企终端安全态势分析报告

声明 本文是学习2019年1月政企终端安全态势分析报告. 下载地址 http://github5.com/view/55037而整理的学习笔记,分享出来希望更多人受益,如果存在侵权请及时联系我们 漏洞利用病毒攻击政企分析 奇安信终端安全实验室监测数据显示,2019年4月,有6.7%的…

JavaScript中的元编程

紧接上回,伴随着Reflect,Proxy降世,为js带来了更便捷的元编程! 什么是元编程?这词第一次听,有点懵,好像有点高级,这不得学一下装…进自己的知识库 概念 元编程是一种编程技术&…

【数据结构与算法】Collection接口迭代器

Java合集框架 数据结构是以某种形式将数据组织在一起的合集(collection)。数据结构不仅存储数据,还支持访问和处理数据的操作 在面向对象的思想里,一种数据结构也被认为是一个容器(container)或者容器对象…

【MySQL】MySQL表的七大约束

序号系列文章1【MySQL】MySQL介绍及安装2【MySQL】MySQL基本操作详解3【MySQL】MySQL基本数据类型4【MySQL】MySQL表的七大约束文章目录MySQL表的约束1,默认约束2,非空约束3,唯一约束4,主键约束5,自增约束6&#xff0c…

详细分析单调栈,及正确性证明

什么是单调栈 对于一个数组,需要对每个位置生成,左右两边离它最近的,比它小(或比它大)的位置在哪 例如: 如果对每个位置都遍历下左右两边,找到第一个比它小的位置,就是O(N ^ 2)的…

IPv6 时代如何防御 DDoS 攻击?

在互联网世界,每台联网的设备都被分配了一个用于标识和位置定义的 IP 地址。20 世纪 90 年代以来互联网的快速发展,联网设备所需的地址远远多于可用 IPv4 地址的数量,导致了 IPv4 地址耗尽。因此,协议 IPv6 的开发和部署已经刻不容…

从第三方平台导出大数据量本地Excel怎么解决性能问题?

对于日常需要做分析的我们来说,周期性需要从第三方系统导出数据,日积月累数据量越来愈大,由开始的几百条数据慢慢增至十几万甚至百万级的数据量,在本地Excel直接做分析汇总老是卡顿等半天,效率日益低下,每天…

连续四年发布科技趋势预测,他们在探索中国科技的“主干道”

,*本文配图由百度飞桨文心一格提供AI绘画技术支持。古希腊流传着一句谚语:智慧不仅是能够明察眼前,更要能够预测未来。身处科技界,一到年底年初我们就会看到各种各样的趋势预测。这些预测五花八门,神奇多变。但大多数科…

JAVA中使用最广泛的本地缓存?Ehcache的自信从何而来3 —— 本地缓存变身分布式集群缓存,打破本地缓存天花板

大家好,又见面了。 本文是笔者作为掘金技术社区签约作者的身份输出的缓存专栏系列内容,将会通过系列专题,讲清楚缓存的方方面面。如果感兴趣,欢迎关注以获取后续更新。 上一篇文章中,我们知晓了如何在项目中通过不同的…

【Python】Numpy处理多项式类Polynomial

文章目录构造函数求导和积分求根和反演采样与拟合其他方法构造函数 Numpy中提供了多项式模块,里面封装了一些用以快速解决多项式问题的类和函数,其中最重要类的自然是Polynomial,其构造函数为 class numpy.polynomial.polynomial.Polynomia…

list容器的底层结构(详述insert()与erase())

目录 一、带头结点的双向循环链表(list) 二、贯穿list容器的insert与erase接口​编辑 一、带头结点的双向循环链表(list) 二、贯穿list容器的insert与erase接口 通过在指定位置的元素之前插入新元素来扩展容器。 这有效地增加了…

520页(17万字)集团大数据平台整体解决方案-v1.0

【版权声明】本资料来源网络,知识分享,仅供个人学习,请勿商用。【侵删致歉】如有侵权请联系小编,将在收到信息后第一时间删除!完整资料领取见文末,部分资料内容: 1.1.1 系统总体逻辑结构 4-14系…

Golang 面试题总结

一.基础部分 go语言的值类型和引用类型? 值类型:int、float、bool、string和数组这些类型都属于值类型。 值类型的变量直接指向存在内存中的值,值类型的变量的值存储在栈中。当使用 将一个变量的值赋给另一个变量时,如 j i ,实…

九、k8s 安全认证

文章目录1 访问控制概述2 认证管理3 授权管理4 准入控制1 访问控制概述 Kubernetes作为一个分布式集群的管理工具,保证集群的安全性是其一个重要的任务。所谓的安全性其实就是保证对Kubernetes的各种客户端进行认证和鉴权操作。 客户端 在Kubernetes集群中&#…

MySQL调优-深入理解MVCC机制

目录 MySQL调优-深入理解MVCC机制 MVCC多版本并发控制机制 undo日志版本链与read view机制详解 根据图2和图3对应画出下图的undo日志版本链: 版本链比对规则: 注意: 举例1:分析一下下图select1的read_view以及各个select语句…