How to fix "column has values of Type which is not the same as an earlier observed Type"

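I'm building a machine learning model, and I want to read several values from a text file and process them with a CustomMapping. When the CustomMapping runs, the program throws a System.InvalidOperationException.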

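I've narrowed the cause down to my CustomMapping function. The text file I'm reading has no null values. I've double-checked all my variable declarations and made sure they all use the correct types. My hunch is that the custom mapping interprets the 1s and 0s as Booleans instead of floats, although I can't see any reason why it would.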

Apologies for the large code dump. The question is about a type problem, so I thought it was important to show everything.

My pipeline:

var pipeline = context.Transforms.CustomMapping<ProfileInput, ProfileProcess>(ProfileMapping.Transform, nameof(ProfileMapping))
.Append(context.Transforms.Concatenate("Features", "isBanned", "profileVisibility", "profileConfigured", "lastLogOff", "commentPermission", "timeCreated", "friendCount", "gameBannedFriendsCount", "vacBannedFriendsCount", "gameBannedFriendsPercent", "vacBannedFriendsPercent"));

My custom mapping:

public static void Transform(ProfileInput input, ProfileProcess output)
{
  // Mark the profile as banned if it has any game or VAC ban.
  if (input.numberGameBans > 0 || input.numberVacBans > 0)
    output.isBanned = true;

  output.gameBannedFriendsPercent = input.gameBannedFriendsCount / input.friendCount;
  output.vacBannedFriendsPercent = input.vacBannedFriendsCount / input.friendCount;
  output.profileVisibility = input.profileVisibility;
  output.profileConfigured = input.profileConfigured;
  output.lastLogOff = input.lastLogOff;
  output.commentPermission =  input.commentPermission;
  output.timeCreated = input.timeCreated;
  output.friendCount = input.friendCount;
  output.gameBannedFriendsCount = input.gameBannedFriendsCount;
  output.vacBannedFriendsCount = input.vacBannedFriendsCount;
}

ProfileInput:

public class ProfileInput
{
  [LoadColumn(0)]
  public bool commentPermission;
  [LoadColumn(1)]
  public float lastLogOff;
  [LoadColumn(2)]
  public bool profileConfigured;
  [LoadColumn(3)]
  public float profileVisibility;
  [LoadColumn(4)]
  public float timeCreated;
  [LoadColumn(5)]
  public float numberVacBans;
  [LoadColumn(6)]
  public float numberGameBans;
  [LoadColumn(7)]
  public float vacBannedFriendsCount;
  [LoadColumn(8)]
  public float gameBannedFriendsCount;
  [LoadColumn(9)]
  public float friendCount;
}

ProfileProcess:

public class ProfileProcess
{
  public bool isBanned;
  public float profileVisibility;
  public bool profileConfigured;
  public float lastLogOff;
  public bool commentPermission;
  public float timeCreated;
  public float friendCount;
  public float gameBannedFriendsCount;
  public float vacBannedFriendsCount;
  public float gameBannedFriendsPercent;
  public float vacBannedFriendsPercent;
}

When I run pipeline.Fit(), I get the following exception:

System.InvalidOperationException: 'Column 'profileVisibility' has values of R4which is not the same as earlier observed type of Bool.'

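I expect the code to complete without throwing, with a TransformerChain model as the actual output. I know the pipeline doesn't have a trainer yet, so as it stands the model would be useless.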

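context.Transforms.Concatenate concatenates columns of the same type. The type is defined by the first input column, which in your case is "isBanned". Because that is a Boolean, Concatenate expects every following column to be a Boolean as well, and it fails at the first one that isn't: "profileVisibility", a float (R4).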
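To make the rule concrete, here is a toy C# model of the check. It is not ML.NET's implementation: ColumnTypeCheck and ValidateConcat are hypothetical names, and columns are described by plain System.Type values instead of ML.NET column types. It walks the inputs in order and reports the first column whose type differs from the first input's type.

```csharp
using System;
using System.Collections.Generic;

// Toy model of the Concatenate rule, NOT ML.NET's implementation: the first
// input column fixes the expected item type, and every later column must match it.
// ColumnTypeCheck and ValidateConcat are hypothetical names for illustration.
public static class ColumnTypeCheck
{
    // Returns null if all columns share the first column's type; otherwise a
    // message naming the first column whose type differs.
    public static string ValidateConcat(IReadOnlyList<(string Name, Type ItemType)> inputs)
    {
        if (inputs.Count == 0)
            return null;

        Type expected = inputs[0].ItemType;
        foreach (var (name, itemType) in inputs)
        {
            if (itemType != expected)
                return $"Column '{name}' has values of {itemType.Name}, " +
                       $"which is not the same as earlier observed type of {expected.Name}.";
        }
        return null;
    }
}
```

With the question's column order, passing ("isBanned", typeof(bool)) followed by ("profileVisibility", typeof(float)) returns a message naming profileVisibility, mirroring the real exception (ML.NET reports the types as Bool and R4 rather than .NET type names).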

If you are only concatenating the columns, without any other preprocessing, you can load them directly as floats (0/1) instead of Booleans.
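A minimal sketch of the float-only approach, assuming the text file stores the Boolean fields as 1/0: the Boolean fields in ProfileInput and ProfileProcess become float, and Transform writes isBanned as 1f or 0f. The [LoadColumn(n)] attributes are omitted only so the sketch compiles without Microsoft.ML; in the real project they stay on ProfileInput as in the question. Setting isBanned to 1 when either ban count is positive is an assumption about the intended logic.

```csharp
// Sketch: every field that reaches Concatenate is a float, so all inputs share one item type.
// The [LoadColumn(n)] attributes from the question are omitted so this compiles without Microsoft.ML.
public class ProfileInput
{
    public float commentPermission;   // was bool; 1/0 in the file loads as 1f/0f
    public float lastLogOff;
    public float profileConfigured;   // was bool
    public float profileVisibility;
    public float timeCreated;
    public float numberVacBans;
    public float numberGameBans;
    public float vacBannedFriendsCount;
    public float gameBannedFriendsCount;
    public float friendCount;
}

public class ProfileProcess
{
    public float isBanned;            // 1f = has a game or VAC ban, 0f = no bans
    public float profileVisibility;
    public float profileConfigured;
    public float lastLogOff;
    public float commentPermission;
    public float timeCreated;
    public float friendCount;
    public float gameBannedFriendsCount;
    public float vacBannedFriendsCount;
    public float gameBannedFriendsPercent;
    public float vacBannedFriendsPercent;
}

public static class ProfileMapping
{
    public static void Transform(ProfileInput input, ProfileProcess output)
    {
        // Assumed intent: a profile counts as banned if it has any game or VAC ban.
        output.isBanned = (input.numberGameBans > 0 || input.numberVacBans > 0) ? 1f : 0f;

        // Both operands are float, so this is floating-point division
        // (a friendCount of 0 produces NaN or Infinity rather than an exception).
        output.gameBannedFriendsPercent = input.gameBannedFriendsCount / input.friendCount;
        output.vacBannedFriendsPercent = input.vacBannedFriendsCount / input.friendCount;

        // Pass the remaining columns through unchanged.
        output.profileVisibility = input.profileVisibility;
        output.profileConfigured = input.profileConfigured;
        output.lastLogOff = input.lastLogOff;
        output.commentPermission = input.commentPermission;
        output.timeCreated = input.timeCreated;
        output.friendCount = input.friendCount;
        output.gameBannedFriendsCount = input.gameBannedFriendsCount;
        output.vacBannedFriendsCount = input.vacBannedFriendsCount;
    }
}
```

With these types the Concatenate call in the question can stay as it is, because isBanned, profileConfigured and commentPermission are now floats like every other input column.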

Alternatively, one-hot encode your non-float columns and concatenate the encoded columns instead:

.Append(context.Transforms.Categorical.OneHotEncoding(outputColumnName: "isBannedEncoded", inputColumnName: "isBanned"))
.Append(context.Transforms.Categorical.OneHotEncoding(outputColumnName: "profileConfiguredEncoded", inputColumnName: "profileConfigured"))
.Append(context.Transforms.Categorical.OneHotEncoding(outputColumnName: "commentPermissionEncoded", inputColumnName: "commentPermission"))

.Append(context.Transforms.Concatenate("Features", "isBannedEncoded", "profileVisibility", "profileConfiguredEncoded", "lastLogOff", "commentPermissionEncoded", "timeCreated", "friendCount", "gameBannedFriendsCount", "vacBannedFriendsCount", "gameBannedFriendsPercent", "vacBannedFriendsPercent"));

Hope this helps.