Research papers that define supervised and unsupervised learning
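I am looking for research papers or books that give a good basic definition of supervised and unsupervised learning, so that I can cite those definitions in my project.
Thanks a lot.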
Christopher M. Bishop, "Pattern Recognition and Machine Learning", page 3 (emphasis mine)
Applications in which the training data comprises examples of the input vectors along with their corresponding target vectors are known as supervised learning problems...
In other pattern recognition problems, the training data consists of a set of input vectors x without any corresponding target values. The goal in such unsupervised learning problems may be to discover groups of similar examples within the data, where it is called clustering, or to determine the distribution of data within the input space, known as density estimation, or to project the data from a high-dimensional space down to two or three dimensions for the purpose of visualization.
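To make the distinction in the quote concrete, here is a minimal sketch in plain Python. It is not taken from Bishop's book: the function names (`fit_centroids`, `predict`, `two_means`) and the toy numbers are made up for illustration. The supervised part gets inputs together with targets, the unsupervised part gets the same inputs with no targets and has to find the groups on its own.

```python
from statistics import mean

def fit_centroids(inputs, targets):
    """Supervised: use each input's target to compute one centroid per class."""
    by_label = {}
    for x, t in zip(inputs, targets):
        by_label.setdefault(t, []).append(x)
    return {label: mean(xs) for label, xs in by_label.items()}

def predict(centroids, x):
    """Assign x to the class whose centroid is closest."""
    return min(centroids, key=lambda label: abs(centroids[label] - x))

def two_means(inputs, iterations=10):
    """Unsupervised: split 1-D inputs into two clusters without any targets.

    Assumes the inputs contain at least two distinct values.
    """
    lo, hi = min(inputs), max(inputs)  # initial cluster centres
    for _ in range(iterations):
        left = [x for x in inputs if abs(x - lo) <= abs(x - hi)]
        right = [x for x in inputs if abs(x - lo) > abs(x - hi)]
        lo, hi = mean(left), mean(right)
    return sorted(left), sorted(right)

# Toy data, invented for this example.
inputs = [1.0, 1.2, 0.8, 5.0, 5.3, 4.9]
targets = ["small", "small", "small", "large", "large", "large"]

# Supervised: inputs plus targets.
centroids = fit_centroids(inputs, targets)
label = predict(centroids, 4.5)

# Unsupervised: inputs only.
clusters = two_means(inputs)
```

Note that the clustering step recovers the two groups but cannot name them "small" or "large"; the names only exist in the supervised setting, where the targets supply them.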
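Take from it what you can. Basically, the most significant difference is whether we have labels that we want the learning model to optimise for. If only some of the labels are missing, it can still be described as weakly supervised learning. If no labels are available at all, the only thing left is to find some structure in the data.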
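Thanks @Pavel Tyshevskyi for the answer. Your answer is excellent, but it is a bit hard to follow for a beginner like me.
After an hour of searching, I found my own version of the answer in the "Approaches to Machine Learning" section of Chapter 1, "Understanding Machine Learning", of the book "Machine Learning For Dummies, IBM Limited Edition". It has simpler definitions, with examples that helped me understand them better. Link to the book: Machine Learning For Dummies, IBM Limited Edition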
Supervised learning
Supervised learning typically begins with an established set of data and a certain understanding of how that data is classified. Supervised learning is intended to find patterns in data that can be applied to an analytics process. This data has labeled features that define the meaning of data. For example, there could be millions of images of animals and include an explanation of what each animal is and then you can create a machine learning application that distinguishes one animal from another. By labeling this data about types of animals, you may have hundreds of categories of different species. Because the attributes and the meaning of the data have been identified, it is well understood by the users that are training the modeled data so that it fits the details of the labels. When the label is continuous, it is a regression; when the data comes from a finite set of values, it known as classification. In essence, regression used for supervised learning helps you understand the correlation between variables. An example of supervised learning is weather forecasting. By using regression analysis, weather forecasting takes into account known historical weather patterns and the current conditions to provide a prediction on the weather.
The algorithms are trained using preprocessed examples, and at this point, the performance of the algorithms is evaluated with test data. Occasionally, patterns that are identified in a subset of the data can’t be detected in the larger population of data. If the model is fit to only represent the patterns that exist in the training subset, you create a problem called overfitting. Overfitting means that your model is precisely tuned for your training data but may not be applicable for large sets of unknown data. To protect against overfitting, testing needs to be done against unforeseen or unknown labeled data. Using unforeseen data for the test set can help you evaluate the accuracy of the model in predicting outcomes and results. Supervised training models have broad applicability to a variety of business problems, including fraud detection, recommendation solutions, speech recognition, or risk analysis.
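The point about overfitting and held-out test data can be sketched in a few lines of plain Python. This is my own toy illustration, not something from the book: the two "models" (`fit_memorizer`, `fit_threshold`) and the data are hypothetical. One model just memorizes the training examples, the other learns a simple cut-off. Both look perfect on the training data, and only the unseen labeled test data tells them apart.

```python
from collections import Counter
from statistics import mean

def accuracy(model, examples):
    """Fraction of (input, label) examples the model labels correctly."""
    return sum(model(x) == y for x, y in examples) / len(examples)

def fit_memorizer(train):
    """Overfit on purpose: look up seen inputs, guess the majority label otherwise."""
    table = dict(train)
    fallback = Counter(y for _, y in train).most_common(1)[0][0]
    return lambda x: table.get(x, fallback)

def fit_threshold(train):
    """Learn a single cut-off halfway between the two class means."""
    low = mean(x for x, y in train if y == "low")
    high = mean(x for x, y in train if y == "high")
    cut = (low + high) / 2
    return lambda x: "high" if x >= cut else "low"

# Toy data, invented for this example; the test inputs never appear in training.
train = [(1, "low"), (2, "low"), (3, "low"), (4, "low"),
         (6, "high"), (7, "high"), (8, "high")]
test = [(0, "low"), (5, "high"), (9, "high"), (10, "high")]

memorizer = fit_memorizer(train)
threshold = fit_threshold(train)

train_scores = (accuracy(memorizer, train), accuracy(threshold, train))
test_scores = (accuracy(memorizer, test), accuracy(threshold, test))
```

Here `train_scores` is the same for both models, while `test_scores` shows the memorizer falling apart on inputs it has not seen, which is exactly why the evaluation has to use data kept out of training.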
Unsupervised learning
Unsupervised learning is best suited when the problem requires a massive amount of data that is unlabeled. For example, social media applications, such as Twitter, Instagram, Snapchat, and.....
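I would refer to the following book: Artificial Intelligence: A Modern Approach (3rd Edition) by Stuart Russell and Peter Norvig. Chapter 18, from page 693 onward, has an analysis of supervised and unsupervised learning. On unsupervised learning: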
In unsupervised learning, the agent learns patterns in the input
even though no explicit feedback is supplied.
The most common unsupervised learning task is clustering:
detecting potentially useful clusters of input examples.
For example, a taxi agent might gradually develop a concept
of “good traffic days” and “bad traffic days” without ever being
given labeled examples of each by a teacher
And on supervised learning:
In supervised learning, the agent observes some example input–output
pairs
and learns a function that maps from input to output. In component 1 above,
the inputs are percepts and the output are provided by a teacher
who says “Brake!” or “Turn left.” In component 2, the inputs are camera
images and the outputs again come from a teacher who says “that’s a bus.”
In 3, the theory of braking is a function from states and braking actions
to stopping distance in feet. In this case the output value is available
directly from the agent’s percepts (after the fact); the environment
is the teacher.
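As a rough illustration of "learns a function that maps from input to output", here is a small sketch using ordinary least squares on one input variable. It is not from the book: `fit_line` is a hypothetical helper, and the speed and stopping-distance numbers are made up and deliberately lie on a straight line to keep the arithmetic simple (real stopping distances are not linear in speed).

```python
def fit_line(pairs):
    """Fit y = slope * x + intercept to (x, y) pairs by ordinary least squares."""
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    sxx = sum((x - mean_x) ** 2 for x, _ in pairs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return lambda x: slope * x + intercept

# Invented input-output pairs: (speed, observed stopping distance).
observed = [(10, 21), (20, 41), (30, 61), (40, 81)]

# The learned function can now be applied to an input that was never observed.
predict_distance = fit_line(observed)
estimate = predict_distance(25)
```

The observed pairs play the role of the teacher here: each input comes with the correct output, and the fitted function generalizes from them to new inputs.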