How to update a df using a for loop and arrays in Python?
Suppose I have created the following df:
import pandas as pd
#column names
column_names = ["Time", "Currency", "Volatility expected", "Event", "Actual", "Forecast", "Previous"]
#create a dataframe including the column names
df = pd.DataFrame(columns=column_names)
Then I created the following array, whose cell values should be added to my df:
rows = ["2:00", "GBP", "", "Construction Output (MoM) (Jan)", "1.1%", "0.5%", "2.0%",
"2:00", "GBP", "", "U.K. Construction Output (YoY) (Jan)", "9.9%", "9.2%", "7.4%"]
So how can I use a for loop to update my df so that it ends up like this:
|Time |Currency |Volatility expected |Event |Actual |Forecast |Previous |
------------------------------------------------------------------------------------------------------------------
|02:00 |GBP | |Construction Output (MoM) (Jan) |1.1% |0.5% |2.0% |
|04:00 |GBP | |U.K. Construction Output (YoY) (Jan)|9.9% |9.2% |7.4% |
I tried:
column_name_location = 0
for row in rows:
    df.at['0', df[column_name_location]] = row
    column_name_location += 1
print(df)
But I got:
KeyError: 0
Could I get some advice here?
If rows is a flat list of items, you can convert it to a numpy array to reshape it first.
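Assuming rows is actually a list of sub-lists, where each sub-list is one row, you can create a pd.Series from each row, using the DataFrame's column names as the Series index, and then append them all with df.append. Note that df.append returns a new DataFrame instead of modifying df in place, so the result has to be assigned back. It was also deprecated in pandas 1.4 and removed in pandas 2.0, where pd.concat takes its place:
df = df.append([pd.Series(r, index=df.columns) for r in rows])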
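For readers on pandas 2.0 or later, where df.append is no longer available, the same idea can be sketched with pd.concat. This is only a minimal illustration, assuming rows is already a list of sub-lists; the name new_rows is made up for the example.

```python
import pandas as pd

column_names = ["Time", "Currency", "Volatility expected", "Event",
                "Actual", "Forecast", "Previous"]
df = pd.DataFrame(columns=column_names)

# Assumed input shape: one sub-list per row
rows = [
    ["2:00", "GBP", "", "Construction Output (MoM) (Jan)", "1.1%", "0.5%", "2.0%"],
    ["2:00", "GBP", "", "U.K. Construction Output (YoY) (Jan)", "9.9%", "9.2%", "7.4%"],
]

# Build a frame with the same columns, then concatenate it onto df.
# ignore_index=True renumbers the resulting rows 0, 1, ...
new_rows = pd.DataFrame(rows, columns=df.columns)
df = pd.concat([df, new_rows], ignore_index=True)

print(df)
```

Like df.append, pd.concat returns a new DataFrame, so the result is assigned back to df.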
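If rows is actually just a flat list, you need to convert it to a numpy array first in order to reshape it into rows of 7 values:
import numpy as np
rows = np.array(rows).reshape(-1, 7).tolist()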
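Putting the reshape step together with the DataFrame constructor, a minimal sketch could look like the following. It assumes the length of the flat list is an exact multiple of the number of columns (reshape raises a ValueError otherwise). Deriving the 7 from len(column_names) and the name nested_rows are illustrative choices, not part of the answer above.

```python
import numpy as np
import pandas as pd

column_names = ["Time", "Currency", "Volatility expected", "Event",
                "Actual", "Forecast", "Previous"]

# Flat list of 14 items, as in the question
rows = ["2:00", "GBP", "", "Construction Output (MoM) (Jan)", "1.1%", "0.5%", "2.0%",
        "2:00", "GBP", "", "U.K. Construction Output (YoY) (Jan)", "9.9%", "9.2%", "7.4%"]

# -1 lets numpy infer the number of rows from the total length
nested_rows = np.array(rows).reshape(-1, len(column_names)).tolist()

df = pd.DataFrame(nested_rows, columns=column_names)
print(df)
```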
It looks like you created a single list of 14 items.
You could instead make it a list of 2 items, where each item is a list of 7 values.
rows = [["2:00", "GBP", "", "Construction Output (MoM) (Jan)", "1.1%", "0.5%", "2.0%"],
["2:00", "GBP", "", "U.K. Construction Output (YoY) (Jan)", "9.9%", "9.2%", "7.4%"]]
With this, we can create the DataFrame directly, as shown below:
df = pd.DataFrame(rows, columns=column_names)
print(df)
This outputs 2 rows:
Time Currency Volatility expected Event Actual Forecast Previous
0 2:00 GBP Construction Output (MoM) (Jan) 1.1% 0.5% 2.0%
1 2:00 GBP U.K. Construction Output (YoY) (Jan) 9.9% 9.2% 7.4%
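If an explicit for loop is still wanted, as the question asks, one possible sketch is to walk the flat list in chunks of 7 and add each chunk as a new row with df.loc. For context, the original attempt fails because df[column_name_location] looks up a column named 0, which does not exist, hence KeyError: 0. The names n_cols, start and chunk are made up for this example, and the sketch assumes the list length is a multiple of the column count.

```python
import pandas as pd

column_names = ["Time", "Currency", "Volatility expected", "Event",
                "Actual", "Forecast", "Previous"]
df = pd.DataFrame(columns=column_names)

# Flat list of 14 items, as in the question
rows = ["2:00", "GBP", "", "Construction Output (MoM) (Jan)", "1.1%", "0.5%", "2.0%",
        "2:00", "GBP", "", "U.K. Construction Output (YoY) (Jan)", "9.9%", "9.2%", "7.4%"]

n_cols = len(column_names)

# Step through the flat list one row's worth of values at a time
for start in range(0, len(rows), n_cols):
    chunk = rows[start:start + n_cols]
    # Assigning to a row label that does not exist yet adds a new row
    df.loc[len(df)] = chunk

print(df)
```

Growing a DataFrame one row at a time is generally slower than building it in a single call, so for larger inputs constructing it directly from the nested list is usually the better choice.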