polars dataframe TypeError: must be real number, not str

Question

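So basically I switched from a pandas DataFrame to a polars DataFrame to get better speed with yolov5. When I run the code it works fine for a while (I don't know exactly when the error occurs), and then it gives me TypeError: must be real number, not str. Running the same thing with pandas works fine without any errors; the error only happens with polars. I know there must be a wrong data type somewhere, but I really don't know where to look, since I've only just started with Python. I'd really appreciate any help! Thanks for reading, and have a nice day!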

Traceback (most recent call last):
  File "C:\yolov5\test.py", line 61, in <module>
    boxes = results.polars().xywh[0]
  File "c:\yolov5\.\models\common.py", line 684, in polars
    setattr(new, k, [pl.DataFrame(x, columns=c) for x in a])
  File "c:\yolov5\.\models\common.py", line 684, in <listcomp>
    setattr(new, k, [pl.DataFrame(x, columns=c) for x in a])
  File "C:\Users\jojow\AppData\Local\Programs\Python\Python39\lib\site-packages\polars\internals\frame.py", line 311, in __init__
    self._df = sequence_to_pydf(data, columns=columns, orient=orient)
  File "C:\Users\jojow\AppData\Local\Programs\Python\Python39\lib\site-packages\polars\internals\construction.py", line 495, in sequence_to_pydf
    data_series = [
  File "C:\Users\jojow\AppData\Local\Programs\Python\Python39\lib\site-packages\polars\internals\construction.py", line 496, in <listcomp>
    pli.Series(columns[i], data[i], dtypes.get(columns[i])).inner()
  File "C:\Users\jojow\AppData\Local\Programs\Python\Python39\lib\site-packages\polars\internals\series.py", line 227, in __init__
    self._s = sequence_to_pyseries(name, values, dtype=dtype, strict=strict)
  File "C:\Users\jojow\AppData\Local\Programs\Python\Python39\lib\site-packages\polars\internals\construction.py", line 239, in sequence_to_pyseries
    return constructor(name, values, strict)
TypeError: must be real number, not str

Here is my code (edited):

import polars as pl 
import pandas as pd

class new:
    xyxy = 0

a = [[[370.01605224609375, 346.4305114746094, 398.3968811035156, 
384.5684814453125, 0.9011853933334351, 0, 'corn'], 
[415.436767578125, 279.4227294921875, 433.930419921875, 
305.5151672363281, 0.8829901814460754, 0, 'corn'], 
[383.8118896484375, 268.781494140625, 402.35479736328125, 
292.4585266113281, 0.8579609394073486, 0, 'corn'], 
[431.42791748046875, 570.9154663085938, 476.672119140625, 600.0, 
0.810459554195404, 0, 'corn'], [414.912841796875, 
257.7676086425781, 427.7708740234375, 274.69635009765625,
0.7384995818138123, 0, 'corn'], [391.22821044921875, 
250.48876953125, 403.9199523925781, 268.1374816894531, 
0.6828912496566772, 0, 'corn'], [414.2362060546875, 
250.18174743652344, 423.82537841796875, 264.02667236328125, 
0.517136812210083, 0, 'corn']]]

ca = 'xmin', 'ymin', 'xmax', 'ymax', 'confidence', 'class', 'name'  # xyxy columns
cb = 'xcenter', 'ycenter', 'width', 'height', 'confidence', 'class', 'name'  # xywh columns

for k, c in zip(['xyxy', 'xyxyn', 'xywh', 'xywhn'], [ca, ca, cb, cb]):
    setattr(new, k, [pl.DataFrame(x, columns=c) for x in a])

print(new.xyxy[0])

Answer 1

Based on the information you've provided, I can only give a hint about where to look.

Near the end of your code, you create a new list of DataFrames:

setattr(new, k, [polars.DataFrame(x, columns=c) for x in a])

The error is caused by this call:

polars.DataFrame(x, columns=c)

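What is happening is that one of the lists you pass (as x) to the DataFrame constructor contains a mix of numbers and strings. More specifically, one of the lists starts with one or more numbers but contains a string somewhere after that. This causes an error because Polars tries to create a column of numbers from that list.

An example

Let's take a closer look. Here is an example of creating a DataFrame: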

import polars as pl
pl.DataFrame([["one", "two", "three"], [1.0, 2.0, 3.0]],
             columns=["col1", "col2"])

Notice that ["one", "two", "three"] are all strings, and [1.0, 2.0, 3.0] are all numbers. So each column holds only one type of data, and we get no error...

shape: (3, 2)
┌───────┬──────┐
│ col1  ┆ col2 │
│ ---   ┆ ---  │
│ str   ┆ f64  │
╞═══════╪══════╡
│ one   ┆ 1.0  │
├╌╌╌╌╌╌╌┼╌╌╌╌╌╌┤
│ two   ┆ 2.0  │
├╌╌╌╌╌╌╌┼╌╌╌╌╌╌┤
│ three ┆ 3.0  │
└───────┴──────┘

Now let's see what happens when we accidentally mix a string into the column of numbers:

pl.DataFrame([["one", "two", "three"], [1.0, 2.0, "Oops, this is a string mixed in with numbers"]],
             columns=["col1", "col2"])

We get an error...

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/xxxx/.virtualenvs/PolarsTesting3.10/lib/python3.10/site-packages/polars/internals/frame.py", line 311, in __init__
    self._df = sequence_to_pydf(data, columns=columns, orient=orient)
  File "/home/xxxx/.virtualenvs/PolarsTesting3.10/lib/python3.10/site-packages/polars/internals/construction.py", line 495, in sequence_to_pydf
    data_series = [
  File "/home/xxxx/.virtualenvs/PolarsTesting3.10/lib/python3.10/site-packages/polars/internals/construction.py", line 496, in <listcomp>
    pli.Series(columns[i], data[i], dtypes.get(columns[i])).inner()
  File "/home/xxxx/.virtualenvs/PolarsTesting3.10/lib/python3.10/site-packages/polars/internals/series.py", line 227, in __init__
    self._s = sequence_to_pyseries(name, values, dtype=dtype, strict=strict)
  File "/home/xxxx/.virtualenvs/PolarsTesting3.10/lib/python3.10/site-packages/polars/internals/construction.py", line 239, in sequence_to_pyseries
    return constructor(name, values, strict)
TypeError: must be real number, not str

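Compare this error message with the one you received. They match closely (except for the directories, which are specific to each computer).

So you need to look for a list that starts with one or more numbers but also contains a string. Polars tries to create a column of numbers from this list and raises the error.

Perhaps one or more elements in a list that should be numbers contain a string instead, such as "Error", "NULL", "#N/A" or something similar.

You'll have to debug it to find out.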
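One way to start that debugging is to scan the lists before handing them to Polars. The helper below is only a sketch: `find_mixed_type_lists` and the `sample` data are hypothetical names made up for illustration, and the check uses plain Python types, so it needs nothing from Polars itself.

```python
def find_mixed_type_lists(lists):
    """Return (index, type names) for every inner list holding more than one Python type."""
    mixed = []
    for i, values in enumerate(lists):
        type_names = sorted({type(v).__name__ for v in values})
        if len(type_names) > 1:
            mixed.append((i, type_names))
    return mixed


# Hypothetical sample data: the second list has a stray string among floats.
sample = [["one", "two", "three"], [1.0, 2.0, "Oops"]]
report = find_mixed_type_lists(sample)
print(report)
```

Note that a list mixing ints and floats is reported as well, so treat the result as a hint about where to look rather than as proof of a problem.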

Answer 2

Thanks for adding the data. It makes the problem much easier to solve.

What you need to do is add orient="row" to the call that creates the DataFrame:

pl.DataFrame(x, columns=c, orient="row")

Once we change your code by adding the orient="row" keyword and re-run it, we get:

shape: (7, 7)
┌────────────┬────────────┬────────────┬────────────┬────────────┬───────┬──────┐
│ xmin       ┆ ymin       ┆ xmax       ┆ ymax       ┆ confidence ┆ class ┆ name │
│ ---        ┆ ---        ┆ ---        ┆ ---        ┆ ---        ┆ ---   ┆ ---  │
│ f64        ┆ f64        ┆ f64        ┆ f64        ┆ f64        ┆ i64   ┆ str  │
╞════════════╪════════════╪════════════╪════════════╪════════════╪═══════╪══════╡
│ 370.016052 ┆ 346.430511 ┆ 398.396881 ┆ 384.568481 ┆ 0.901185   ┆ 0     ┆ corn │
├╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌┼╌╌╌╌╌╌┤
│ 415.436768 ┆ 279.422729 ┆ 433.9304   ┆ 305.515167 ┆ 0.8829     ┆ 0     ┆ corn │
├╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌┼╌╌╌╌╌╌┤
│ 383.8118   ┆ 268.781494 ┆ 402.354797 ┆ 292.458527 ┆ 0.857961   ┆ 0     ┆ corn │
├╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌┼╌╌╌╌╌╌┤
│ 431.427917 ┆ 570.915466 ┆ 476.672119 ┆ 600.0      ┆ 0.8104     ┆ 0     ┆ corn │
├╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌┼╌╌╌╌╌╌┤
│ 414.912842 ┆ 257.767609 ┆ 427.770874 ┆ 274.6963   ┆ 0.7385     ┆ 0     ┆ corn │
├╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌┼╌╌╌╌╌╌┤
│ 391.2282   ┆ 250.4887   ┆ 403.919952 ┆ 268.137482 ┆ 0.682891   ┆ 0     ┆ corn │
├╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌┼╌╌╌╌╌╌┤
│ 414.236206 ┆ 250.181747 ┆ 423.825378 ┆ 264.026672 ┆ 0.517137   ┆ 0     ┆ corn │
└────────────┴────────────┴────────────┴────────────┴────────────┴───────┴──────┘

Why the `orient` keyword is needed in this case

Let's start with a simple example. We'll supply three lists and two column names:

pl.DataFrame([[1.1, 'a'], [2.2, 'b'], [3.3, 'c']], columns=['col_1', 'col_2'])

In this example, Polars tries to infer whether each list (e.g., [1.1, 'a']) represents a row or a column. From the documentation for polars.DataFrame:

orient{‘col’, ‘row’}, default None
Whether to interpret two-dimensional data as columns or as rows. If None, the orientation is inferred by matching the columns and data dimensions. If this does not yield conclusive results, column orientation is used.

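So in the case above, Polars tries to infer whether each list represents a column or a row by looking at the number of column names in the columns keyword. Since there are three lists but only two column names, Polars (correctly) infers that each list must represent a row, not a column.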

shape: (3, 2)
┌───────┬───────┐
│ col_1 ┆ col_2 │
│ ---   ┆ ---   │
│ f64   ┆ str   │
╞═══════╪═══════╡
│ 1.1   ┆ a     │
├╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌┤
│ 2.2   ┆ b     │
├╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌┤
│ 3.3   ┆ c     │
└───────┴───────┘

Now let's remove one of the lists, so that there are two lists and two column names:

pl.DataFrame([[1.1, 'a'], [2.2, 'b']], columns=['col_1', 'col_2'])

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/xxxx/.virtualenvs/Whosebug3.10/lib/python3.10/site-packages/polars/internals/frame.py", line 311, in __init__
    self._df = sequence_to_pydf(data, columns=columns, orient=orient)
  File "/home/xxxx/.virtualenvs/Whosebug3.10/lib/python3.10/site-packages/polars/internals/construction.py", line 495, in sequence_to_pydf
    data_series = [
  File "/home/xxxx/.virtualenvs/Whosebug3.10/lib/python3.10/site-packages/polars/internals/construction.py", line 496, in <listcomp>
    pli.Series(columns[i], data[i], dtypes.get(columns[i])).inner()
  File "/home/xxxx/.virtualenvs/Whosebug3.10/lib/python3.10/site-packages/polars/internals/series.py", line 227, in __init__
    self._s = sequence_to_pyseries(name, values, dtype=dtype, strict=strict)
  File "/home/xxx/.virtualenvs/Whosebug3.10/lib/python3.10/site-packages/polars/internals/construction.py", line 239, in sequence_to_pyseries
    return constructor(name, values, strict)
TypeError: must be real number, not str

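This error looks familiar, doesn't it?

Because there are now two lists and two column names, it is unclear whether each list represents a row or a column. So, per the documentation, Polars interprets each list as a column, not a row.

But that causes a problem, because each list (in this case, [1.1, 'a']) contains both a number and a string. This leads to the error.

So, because the number of lists equals the number of column names, we need to tell Polars that each list represents a row, not a column.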

pl.DataFrame([[1.1, 'a'], [2.2, 'b']], columns=['col_1', 'col_2'], orient='row')

Now the error is gone.

shape: (2, 2)
┌───────┬───────┐
│ col_1 ┆ col_2 │
│ ---   ┆ ---   │
│ f64   ┆ str   │
╞═══════╪═══════╡
│ 1.1   ┆ a     │
├╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌┤
│ 2.2   ┆ b     │
└───────┴───────┘
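To make the inference rule concrete, here is a toy model of it in plain Python. This is an assumption-laden sketch of the behaviour described in the documentation quote, not Polars's actual implementation, and `guess_orient` is a hypothetical name used only for illustration.

```python
def guess_orient(data, columns):
    """Toy model of the documented rule; NOT Polars's real code."""
    n_lists = len(data)
    n_items = len(data[0]) if data else 0
    fits_col = n_lists == len(columns)  # each list could be one column
    fits_row = n_items == len(columns)  # each list could be one row
    if fits_row and not fits_col:
        return "row"
    # Either a clear column fit, or an inconclusive match,
    # in which case the documentation says column orientation is used.
    return "col"


cols = ["col_1", "col_2"]
three = guess_orient([[1.1, "a"], [2.2, "b"], [3.3, "c"]], cols)
two = guess_orient([[1.1, "a"], [2.2, "b"]], cols)
print(three, two)
```

With three lists the toy rule picks "row", and with two lists it falls back to "col", which mirrors the two outcomes shown above. Passing orient='row' explicitly avoids relying on any such guess.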

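With that in mind, let's look at your code. How many lists are in each element x of a? Seven. How many column names are supplied? Both ca and cb supply seven column names. Since the number of lists equals the number of column names, Polars interprets each list as a column, not a row. For example, Polars interprets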

[370.01605224609375, 346.4305114746094, 398.3968811035156, 
384.5684814453125, 0.9011853933334351, 0, 'corn']

as a column, not a row. So Polars sees the string 'corn' mixed in with numbers in the same column, hence the error.
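Another way to see what column orientation expects is to transpose the rows yourself. The sketch below uses two detections with values rounded from your data (the rounding is mine, purely for brevity) and plain Python only; after the transpose every column holds a single type.

```python
# Two rows, rounded from the question's data for brevity (illustrative values).
rows = [
    [370.0, 346.4, 398.4, 384.6, 0.90, 0, "corn"],
    [415.4, 279.4, 433.9, 305.5, 0.88, 0, "corn"],
]
names = ["xmin", "ymin", "xmax", "ymax", "confidence", "class", "name"]

# zip(*rows) turns row-oriented lists into column-oriented tuples.
columns = {name: list(values) for name, values in zip(names, zip(*rows))}

# Collect the Python types found in each column.
column_types = {
    name: {type(v).__name__ for v in values} for name, values in columns.items()
}
print(column_types["xmin"], column_types["name"])
```

A dict of column name to list like this leaves no room for ambiguity about orientation, so it is an alternative to orient="row" if you prefer to build the columns explicitly.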
