What is the difference between tables and datasets?

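In my MATLAB code, I mostly use datasets to store data of different types, along with metadata, in a single container variable. However, I have noticed that my colleagues use tables. The two data types look very similar to me: both can be accessed by column name or by index, both support the summary function, and so on.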

What is the difference between these two data types?

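Without going into detail: table is a fairly new data type that ships with base MATLAB. The older dataset, by contrast, is part of the Statistics and Machine Learning Toolbox.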

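As you noticed, they are very similar, though I can't tell you exactly how similar. The doc, however, is quite clear about which one you should use: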

The dataset data type might be removed in a future release. To work with heterogeneous data, use the MATLAB® table data type instead. See MATLAB table documentation for more information.

So table is the replacement for dataset, and it is available to everyone. Just use table and you are future-proof.
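For reference, here is a minimal sketch of the features mentioned in the question (mixed column types, access by name or index, summary), done with a table. The variable names and values are made up for illustration.

```matlab
% Hypothetical example data; names and values are illustrative only.
Name   = {'Ann'; 'Bob'; 'Cy'};
Age    = [34; 27; 45];
Smoker = [false; true; false];

% Heterogeneous columns in one container; the variable names are
% taken from the workspace variable names.
T = table(Name, Age, Smoker);

ages     = T.Age;      % access a variable by name (numeric column)
firstTwo = T(1:2, :);  % parenthesis indexing returns a smaller table
ageCol   = T{:, 2};    % brace indexing extracts the underlying data

summary(T)             % prints a per-variable summary
```

None of this needs a toolbox, which is the main practical difference from dataset.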

As brodroll mentioned in the comments, there is also a statement from MathWorks on MATLAB Central:

Broadly speaking, Tables and datasets essentially serve the same functionality. Following are some of the differences:

1) Tables are included as part of core MATLAB, and do not need the installation of Statistics Toolbox to use them. Moreover, their design and terminology makes them a bit more accessible for non-statistical users, though they remain just as useful for statistics.

2) TABLE is ultimately meant to replace DATASET over time. Hence it is recommended to use TABLE in place of DATASET. Please note that this transition will not happen immediately and upcoming releases will provide more details and strategies for making the transition.

3) You still need to use DATASET in the Statistics Toolbox while using classes such as ‘LinearModel’ and ‘LinearMixedModel’ (which is new in MATLAB R2013b). It is recommended to use TABLE and converting to DATASET only when needed, using TABLE2DATASET.

4) The TABLE class is currently sealed. Hence it is not possible to subclass from it unlike the DATASET class which can be inherited by a subclass.
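If you have existing dataset code, the conversion mentioned in point 3 could look roughly like this. It is only a sketch: it assumes the Statistics and Machine Learning Toolbox is installed (dataset, dataset2table and table2dataset all live there), and the variable names are hypothetical.

```matlab
% Requires the Statistics and Machine Learning Toolbox.
% Hypothetical example data; names are illustrative only.
Id    = [1; 2; 3];
Label = {'a'; 'b'; 'c'};

ds  = dataset(Id, Label);   % legacy dataset array
T   = dataset2table(ds);    % migrate existing data to a table
ds2 = table2dataset(T);     % convert back only where a function
                            % still insists on a dataset array
```

That way the rest of your code can work with tables and only the few calls that need a dataset pay for the conversion.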