getting 'TypeError: object of type 'float' has no len()' when trying to convert Json into Dataframe

Question

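I am trying to create a DataFrame from the JSON returned by the Quickbooks APAgingSummary API, but I get the error "TypeError: object of type 'float' has no len()" when I pass the json_normalize data to pandas as a list. I use the same code to create a DataFrame from the Quickbooks AccountListDetail API JSON, and it works fine.

This is the code used to fetch the data: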

    base_url = 'https://sandbox-quickbooks.api.intuit.com'
    url = f"{base_url}/v3/company/{auth_client.realm_id}/reports/AgedPayables?&minorversion=62"
    auth_header = f'Bearer {auth_client.access_token}'
    headers = {
        'Authorization': auth_header,
        'Accept': 'application/json'
    }
    response = requests.get(url, headers=headers)
    responseJson = response.json()
    responseJson

This is the response JSON:

{'Header': {'Time': '2021-10-05T04:33:02-07:00',
  'ReportName': 'AgedPayables',
  'DateMacro': 'today',
  'StartPeriod': '2021-10-05',
  'EndPeriod': '2021-10-05',
  'SummarizeColumnsBy': 'Total',
  'Currency': 'USD',
  'Option': [{'Name': 'report_date', 'Value': '2021-10-05'},
   {'Name': 'NoReportData', 'Value': 'false'}]},
 'Columns': {'Column': [{'ColTitle': '', 'ColType': 'Vendor'},
   {'ColTitle': 'Current',
    'ColType': 'Money',
    'MetaData': [{'Name': 'ColKey', 'Value': 'current'}]},
   {'ColTitle': '1 - 30',
    'ColType': 'Money',
    'MetaData': [{'Name': 'ColKey', 'Value': '0'}]},
   {'ColTitle': '31 - 60',
    'ColType': 'Money',
    'MetaData': [{'Name': 'ColKey', 'Value': '1'}]},
   {'ColTitle': '61 - 90',
    'ColType': 'Money',
    'MetaData': [{'Name': 'ColKey', 'Value': '2'}]},
   {'ColTitle': '91 and over',
    'ColType': 'Money',
    'MetaData': [{'Name': 'ColKey', 'Value': '3'}]},
   {'ColTitle': 'Total',
    'ColType': 'Money',
    'MetaData': [{'Name': 'ColKey', 'Value': 'total'}]}]},
 'Rows': {'Row': [{'ColData': [{'value': 'Brosnahan Insurance Agency',
      'id': '31'},
     {'value': ''},
     {'value': '241.23'},
     {'value': ''},
     {'value': ''},
     {'value': ''},
     {'value': '241.23'}]},
   {'ColData': [{'value': "Diego's Road Warrior Bodyshop", 'id': '36'},
     {'value': '755.00'},
     {'value': ''},
     {'value': ''},
     {'value': ''},
     {'value': ''},
     {'value': '755.00'}]},
   {'ColData': [{'value': 'Norton Lumber and Building Materials', 'id': '46'},
     {'value': ''},
     {'value': '205.00'},
     {'value': ''},
     {'value': ''},
     {'value': ''},
     {'value': '205.00'}]},
   {'ColData': [{'value': 'PG&E', 'id': '48'},
     {'value': ''},
     {'value': ''},
     {'value': '86.44'},
     {'value': ''},
     {'value': ''},
     {'value': '86.44'}]},
   {'ColData': [{'value': 'Robertson & Associates', 'id': '49'},
     {'value': ''},
     {'value': '315.00'},
     {'value': ''},
     {'value': ''},
     {'value': ''},
     {'value': '315.00'}]},
   {'Summary': {'ColData': [{'value': 'TOTAL'},
      {'value': '755.00'},
      {'value': '761.23'},
      {'value': '86.44'},
      {'value': '0.00'},
      {'value': '0.00'},
      {'value': '1602.67'}]},
    'type': 'Section',
    'group': 'GrandTotal'}]}}

This is the code that raises the error:

colHeaders = []

for i in responseJson['Columns']['Column']:
    colHeaders.append(i['ColTitle'])

responseDf = pd.json_normalize(responseJson["Rows"]["Row"])
responseDf[colHeaders] = pd.DataFrame(responseDf.ColData.tolist(), index= responseDf.index)

This is responseDf after json_normalize:

ColData                                        type group Summary.ColData
0   [{'value': 'Brosnahan Insurance Agency', 'id':...   NaN NaN NaN
1   [{'value': 'Diego's Road Warrior Bodyshop', 'i...   NaN NaN NaN
2   [{'value': 'Norton Lumber and Building Materia...   NaN NaN NaN
3   [{'value': 'PG&E', 'id': '48'}, {'value': ''},...   NaN NaN NaN
4   [{'value': 'Robertson & Associates', 'id': '49...   NaN NaN NaN
5   NaN Section GrandTotal  [{'value': 'TOTAL'}, {'value': '755.00'}, {'va...

Each element of ColData contains a list of dicts.

This is the error:

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-215-6ce65ce2ac94> in <module>
      6 
      7 responseDf = pd.json_normalize(responseJson["Rows"]["Row"])
----> 8 responseDf[colHeaders] = pd.DataFrame(responseDf.ColData.tolist(), index= responseDf.index)
      9 responseDf
     10 

C:\ProgramData\Anaconda3\lib\site-packages\pandas\core\frame.py in __init__(self, data, index, columns, dtype, copy)
    507                     if is_named_tuple(data[0]) and columns is None:
    508                         columns = data[0]._fields
--> 509                     arrays, columns = to_arrays(data, columns, dtype=dtype)
    510                     columns = ensure_index(columns)
    511 

C:\ProgramData\Anaconda3\lib\site-packages\pandas\core\internals\construction.py in to_arrays(data, columns, coerce_float, dtype)
    522         return [], []  # columns if columns is not None else []
    523     if isinstance(data[0], (list, tuple)):
--> 524         return _list_to_arrays(data, columns, coerce_float=coerce_float, dtype=dtype)
    525     elif isinstance(data[0], abc.Mapping):
    526         return _list_of_dict_to_arrays(

C:\ProgramData\Anaconda3\lib\site-packages\pandas\core\internals\construction.py in _list_to_arrays(data, columns, coerce_float, dtype)
    559     else:
    560         # list of lists
--> 561         content = list(lib.to_object_array(data).T)
    562     # gh-26429 do not raise user-facing AssertionError
    563     try:

pandas\_libs\lib.pyx in pandas._libs.lib.to_object_array()

TypeError: object of type 'float' has no len()

Any help would be greatly appreciated.

Answer 1

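You get the error because the ColData column in responseDf contains NaN values. The NaN comes from the last element of Rows.Row (the GrandTotal section), which stores its cells under Summary.ColData rather than ColData. NaN is of type float and has no len(), hence the error.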
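
To see the cause in isolation, here is a minimal sketch. The two-row `rows` list is a made-up, shortened stand-in for `responseJson["Rows"]["Row"]`, not the real API response:

```python
import math
import pandas as pd

# Hypothetical, shortened stand-in for responseJson["Rows"]["Row"]:
# one ordinary row with a top-level 'ColData', one summary row without it.
rows = [
    {"ColData": [{"value": "PG&E", "id": "48"}, {"value": "86.44"}]},
    {"Summary": {"ColData": [{"value": "TOTAL"}, {"value": "86.44"}]},
     "type": "Section", "group": "GrandTotal"},
]

df = pd.json_normalize(rows)

# The summary row has no 'ColData' key, so pandas fills that cell with NaN.
missing = df["ColData"].iloc[1]
print(type(missing), math.isnan(missing))
print(df["ColData"].isna().tolist())

# Calling len() on that cell fails the same way the DataFrame constructor does.
try:
    len(missing)
except TypeError as exc:
    print(exc)
```

Checking `responseDf["ColData"].isna()` on your own data should point at the same row.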

To fix this, use .fillna() to replace each NaN with a list containing an empty dict, as follows:

responseDf['ColData'] = responseDf['ColData'].fillna({i: [{}] for i in responseDf.index})

Place this line immediately after the pd.json_normalize line.

The complete code is:

colHeaders = []

for i in responseJson['Columns']['Column']:
    colHeaders.append(i['ColTitle'])

responseDf = pd.json_normalize(responseJson["Rows"]["Row"])

## Add the code here
responseDf['ColData'] = responseDf['ColData'].fillna({i: [{}] for i in responseDf.index})

responseDf[colHeaders] = pd.DataFrame(responseDf.ColData.tolist(), index= responseDf.index)

You will then get past the error, and responseDf looks like this:

print(responseDf)


                                                                                                                                                                 ColData     type       group                                                                                                                                 Summary.ColData                                                                             Current               1 - 30             31 - 60        61 - 90    91 and over                Total
0            [{'value': 'Brosnahan Insurance Agency', 'id': '31'}, {'value': ''}, {'value': '241.23'}, {'value': ''}, {'value': ''}, {'value': ''}, {'value': '241.23'}]      NaN         NaN                                                                                                                                             NaN            {'value': 'Brosnahan Insurance Agency', 'id': '31'}        {'value': ''}  {'value': '241.23'}       {'value': ''}  {'value': ''}  {'value': ''}  {'value': '241.23'}
1         [{'value': 'Diego's Road Warrior Bodyshop', 'id': '36'}, {'value': '755.00'}, {'value': ''}, {'value': ''}, {'value': ''}, {'value': ''}, {'value': '755.00'}]      NaN         NaN                                                                                                                                             NaN         {'value': 'Diego's Road Warrior Bodyshop', 'id': '36'}  {'value': '755.00'}        {'value': ''}       {'value': ''}  {'value': ''}  {'value': ''}  {'value': '755.00'}
2  [{'value': 'Norton Lumber and Building Materials', 'id': '46'}, {'value': ''}, {'value': '205.00'}, {'value': ''}, {'value': ''}, {'value': ''}, {'value': '205.00'}]      NaN         NaN                                                                                                                                             NaN  {'value': 'Norton Lumber and Building Materials', 'id': '46'}        {'value': ''}  {'value': '205.00'}       {'value': ''}  {'value': ''}  {'value': ''}  {'value': '205.00'}
3                                    [{'value': 'PG&E', 'id': '48'}, {'value': ''}, {'value': ''}, {'value': '86.44'}, {'value': ''}, {'value': ''}, {'value': '86.44'}]      NaN         NaN                                                                                                                                             NaN                                  {'value': 'PG&E', 'id': '48'}        {'value': ''}        {'value': ''}  {'value': '86.44'}  {'value': ''}  {'value': ''}   {'value': '86.44'}
4                [{'value': 'Robertson & Associates', 'id': '49'}, {'value': ''}, {'value': '315.00'}, {'value': ''}, {'value': ''}, {'value': ''}, {'value': '315.00'}]      NaN         NaN                                                                                                                                             NaN                {'value': 'Robertson & Associates', 'id': '49'}        {'value': ''}  {'value': '315.00'}       {'value': ''}  {'value': ''}  {'value': ''}  {'value': '315.00'}
5                                                                                                                                                                   [{}]  Section  GrandTotal  [{'value': 'TOTAL'}, {'value': '755.00'}, {'value': '761.23'}, {'value': '86.44'}, {'value': '0.00'}, {'value': '0.00'}, {'value': '1602.67'}]                                                             {}                 None                 None                None           None           None                 None
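
Note that each new column in the result above still holds a dict, and the GrandTotal row is empty because its cells live under Summary.ColData. If you want plain values instead, one possible alternative (a sketch, not part of the fix above) is to build the rows by hand. The helper name `report_to_frame`, the fallback to `ColType` for the blank first `ColTitle`, and the small `sample` dict are all assumptions made for illustration:

```python
import pandas as pd

def report_to_frame(report):
    """Hypothetical helper: one column per report column, plain values in cells."""
    # The first ColTitle is an empty string, so fall back to ColType (assumption).
    headers = [col["ColTitle"] or col["ColType"]
               for col in report["Columns"]["Column"]]
    records = []
    for row in report["Rows"]["Row"]:
        # Ordinary rows carry 'ColData' directly; the total row nests it
        # under 'Summary'.
        cells = row.get("ColData") or row.get("Summary", {}).get("ColData", [])
        records.append([cell.get("value", "") for cell in cells])
    frame = pd.DataFrame(records, columns=headers)
    # Amounts arrive as strings; empty strings become NaN after conversion.
    amount_cols = headers[1:]
    frame[amount_cols] = frame[amount_cols].apply(pd.to_numeric, errors="coerce")
    return frame

# Made-up, shortened report with the same shape as the response JSON.
sample = {
    "Columns": {"Column": [
        {"ColTitle": "", "ColType": "Vendor"},
        {"ColTitle": "Current", "ColType": "Money"},
        {"ColTitle": "Total", "ColType": "Money"},
    ]},
    "Rows": {"Row": [
        {"ColData": [{"value": "PG&E", "id": "48"},
                     {"value": ""}, {"value": "86.44"}]},
        {"Summary": {"ColData": [{"value": "TOTAL"},
                                 {"value": "0.00"}, {"value": "86.44"}]},
         "type": "Section", "group": "GrandTotal"},
    ]},
}

tidy = report_to_frame(sample)
print(tidy)
```

With the real data you would call `report_to_frame(responseJson)`. This drops the `id` field of the vendor cell, so keep the fillna approach if you need it.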

python

json

dataframe

pandas

json-normalize