Orange Data Table with specific domain
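I am trying to create an Orange data Table from a CSV file. To do this, I am currently following these steps:
- Create the target domain
- Read the file into a temporary data table
- Create a new data table from the data in the temporary table and the target domain
Changing the CSV to a tab file with a three-row header (https://docs.orange.biolab.si/3/data-mining-library/reference/data.io.html) is not an option.
Translating this process into code, I get the following: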
from Orange.data import Domain, DiscreteVariable, ContinuousVariable, Table
# Creating specific domain. Two attributes and a Class variable used as target
target_domain = Domain([ContinuousVariable.make("Attribute 1"),ContinuousVariable.make("Attribute 2")],DiscreteVariable.make("Class"))
print('Target domain:',target_domain)
# Target domain: [Attribute 1, Attribute 2 | Class]
# Reading in the file
test_data = Table.from_file('../data/knn_trainingset_example.csv')
print('Domain from file:',test_data.domain)
# Domain from file: [Attribute 1, Attribute 2, Class]
# Using specific domain with test_data
final_data = Table.from_table(target_domain,test_data)
print('Domain:',final_data.domain)
print('Data:')
print(final_data)
# Domain: [Attribute 1, Attribute 2 | Class]
# Data:
# [[0.800, 6.300 | ?],
# [1.400, 8.100 | ?],
# [2.100, 7.400 | ?],
# [2.600, 14.300 | ?],
# [6.800, 12.600 | ?],
# [8.800, 9.800 | ?],
# ...
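As you can see from the final print statement, the class variable is unknown (?) instead of the expected class (+ or -).
Can someone explain or solve this behaviour, or suggest a better or different way to create a data Table with a specific domain?
OK, thanks! As stated in the reference (https://docs.orange.biolab.si/3/data-mining-library/reference/data.variable.html#discrete-variables), you have to supply the possible values. Supplying them as a tuple does the trick. For future reference, I have put the adjusted code below.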
from Orange.data import Domain, DiscreteVariable, ContinuousVariable, Table
# Creating specific domain. Two attributes and a Class variable used as target
target_domain = Domain([ContinuousVariable.make("Attribute 1"),ContinuousVariable.make("Attribute 2")],DiscreteVariable.make("Class",values=('+','-')))
print('Target domain:',target_domain)
# Target domain: [Attribute 1, Attribute 2 | Class]
# Reading in the file
test_data = Table.from_file('../data/knn_trainingset_example.csv')
print('Domain from file:',test_data.domain)
# Domain from file: [Attribute 1, Attribute 2, Class]
print('Data:')
print(test_data)
# [[0.800, 6.300 | -],
# [1.400, 8.100 | -],
# [2.100, 7.400 | -],
# [2.600, 14.300 | +],
# [6.800, 12.600 | -],
# [8.800, 9.800 | +],
# ...
# Using specific domain with test_data
final_data = Table.from_table(target_domain,test_data)
print('Domain:',final_data.domain)
# Domain: [Attribute 1, Attribute 2 | Class]
print('Data:')
print(final_data)
# Data:
# [[0.800, 6.300 | -],
# [1.400, 8.100 | -],
# [2.100, 7.400 | -],
# [2.600, 14.300 | +],
# [6.800, 12.600 | -],
# [8.800, 9.800 | +],
# ...
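To illustrate why the declared values matter, here is a minimal pure-Python sketch. It is not Orange's implementation: ToyDiscreteVariable and convert_column are hypothetical names made up for this example, and the only assumption is that a discrete column is stored as indices into the variable's value list, so a label with no matching entry can only be recorded as unknown.

```python
import math


class ToyDiscreteVariable:
    """Hypothetical stand-in for a discrete variable: a name plus an ordered value list."""

    def __init__(self, name, values=()):
        self.name = name
        self.values = tuple(values)

    def to_index(self, label):
        """Return the index of label in values, or NaN (unknown) if it was never declared."""
        try:
            return float(self.values.index(label))
        except ValueError:
            return math.nan

    def to_label(self, index):
        """Render a stored index as a printout would, using '?' for unknown."""
        return "?" if math.isnan(index) else self.values[int(index)]


def convert_column(labels, target):
    """Re-encode raw class labels against the target variable's value list."""
    return [target.to_label(target.to_index(label)) for label in labels]


# Class labels of the first six rows shown in the output above.
raw_labels = ["-", "-", "-", "+", "-", "+"]

# No declared values: nothing can be matched, so every entry becomes unknown.
without_values = convert_column(raw_labels, ToyDiscreteVariable("Class"))

# Declared values: each label maps to an index and back again.
with_values = convert_column(raw_labels, ToyDiscreteVariable("Class", values=("+", "-")))

print(without_values)
print(with_values)
```

In this toy model the variable created without values turns the whole column into '?', which mirrors the behaviour in the question, while the variable created with values=('+', '-') keeps the labels.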