How to subtract a number from all elements in a DataFrame with pandas?
I am trying to subtract a number from all elements in a DataFrame using pandas. However, only the first element is subtracted, and the other elements become NaN.
Here is the data:
DataFrame_3x5.csv
A B C
0.1 0.3 0.5
0.2 0.4 0.6
0.3 0.5 0.7
0.4 0.6 0.8
0.5 0.7 0.9
Here is my code:
import pandas as pd
data = pd.read_csv(r"DataFrame_3x5.csv")
df = pd.DataFrame(data)
medianList = pd.DataFrame()
for i in range(0, data.shape[1]):
    medianList = medianList.append([df.iloc[:,i].median()], ignore_index=True)
for i in range(0, data.shape[1]):
    print(data.iloc[:,i])
    print(medianList.iloc[i])
    print(data.iloc[:,i] - medianList.iloc[i])
    # print(data.iloc[:,i].sub([medianList.iloc[i]], axis='columns')) # doesn't work
The result is:
0 0.1
1 0.2
2 0.3
3 0.4
4 0.5
Name: A, dtype: float64
0 0.3
Name: 0, dtype: float64
0 -0.2
1 NaN
2 NaN
3 NaN
4 NaN
dtype: float64
0 0.3
1 0.4
2 0.5
3 0.6
4 0.7
Name: B, dtype: float64
0 0.5
Name: 1, dtype: float64
0 -0.2
1 NaN
2 NaN
3 NaN
4 NaN
dtype: float64
0 0.5
1 0.6
2 0.7
3 0.8
4 0.9
Name: C, dtype: float64
0 0.7
Name: 2, dtype: float64
0 -0.2
1 NaN
2 NaN
3 NaN
4 NaN
dtype: float64
What I expected:
0 -0.2
1 -0.1
2 0.0
3 0.1
4 0.2
According to this site,
print(data.iloc[:,i].sub([medianList.iloc[i]], axis='columns'))
... should work, but it actually raises an error:
ValueError: No axis named columns for object type <class 'pandas.core.series.Series'>
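I don't know what else to try. Please help. Thank you.
I think it will work if you call dropna first
and then simply subtract:
df = df.dropna(how='any')
df['Sub'] = df['A'] - df['B'] - df['C']  # int() cannot convert a whole Series, so subtract the columns directly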
You can do this:
df - df.median(axis=0)
and pandas takes care of aligning the medians with the matching columns.
A simple solution:
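To illustrate the one-liner above, here is a minimal sketch. It assumes the file is comma-separated and uses an inline copy of the question's data (`csv_text` and `centered` are names made up for this example). `df.median(axis=0)` returns a Series indexed by column name, and subtracting it from the DataFrame matches those labels against the columns.

```python
import io
import pandas as pd

# Inline copy of the question's data (assumed comma-separated for this sketch)
csv_text = """A,B,C
0.1,0.3,0.5
0.2,0.4,0.6
0.3,0.5,0.7
0.4,0.6,0.8
0.5,0.7,0.9
"""
df = pd.read_csv(io.StringIO(csv_text))

medians = df.median(axis=0)  # Series indexed by column name: A, B, C
centered = df - medians      # equivalent to df.sub(medians, axis="columns")
print(centered.round(1))     # rounded only to hide floating-point noise
```

Every column should come out as the expected values from the question, up to floating-point rounding.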
import pandas as pd
df = pd.read_csv(r"DataFrame_3x5.csv")
df['A'] - df['A'].median()
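As for why the original loop produced NaN: `medianList.iloc[i]` is a one-element Series, not a number, and pandas aligns two Series on their index labels before subtracting. The sketch below reproduces this with a hand-built column (the names `col` and `median_as_series` are only for illustration) and contrasts it with scalar subtraction.

```python
import pandas as pd

col = pd.Series([0.1, 0.2, 0.3, 0.4, 0.5], name="A")

# What the question's loop effectively had: a one-element Series with index label 0
median_as_series = pd.Series([col.median()])
aligned = col - median_as_series  # aligned on index labels; only label 0 matches
print(aligned.isna().tolist())    # [False, True, True, True, True]

# A plain scalar is broadcast to every element instead
scalar_result = col - col.median()
print(scalar_result.round(1).tolist())

# A Series has only one axis (0 / "index"), which is why axis='columns' raised ValueError
same_result = col.sub(col.median())
```

If the median is already stored in a one-element Series, taking the value out first (for example with `.iloc[0]`) gives the same scalar behaviour.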
import pandas as pd
data = pd.read_csv(r"DataFrame_3x5.csv")
df = pd.DataFrame(data)
# Collect each column's median in a plain list (DataFrame.append was removed in pandas 2.0)
medianList = [df.iloc[:, i].median() for i in range(0, data.shape[1])]
j = 0  # position of the column whose median is subtracted (0 is column A)
for i in range(0, data.shape[0]):
    print(data['A'].iloc[i])  # one value from column A
    print(medianList[j])  # the median of column A
    print(data['A'].iloc[i] - medianList[j])