Join columns from merged databases
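I want to add up two columns of a single DataFrame. The DataFrame is the result of merging two separate DataFrames. The code is as follows: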
df1 = pd.read_csv("acc.csv")
df2 = pd.read_csv("gyr.csv")
df = pd.merge(df1, df2, right_index=True, left_index=True)
So I have the columns id, activity, time, accx, accy, accz, id, activity_gur, time, gurx, gury, gurz:
data.head
df["activity"].value_counts()
sitting 32833
standing 31924
lying 31229
running 30429
climbing_up 26938
walking 26080
climbing_down 25281
jumping 4232
Name: activity, dtype: int64
df["activity_gur"].value_counts()
sitting 33267
standing 32546
walking 31912
lying 31822
running 30958
climbing_down 25786
climbing_up 18343
jumping 4312
Name: activity_gur, dtype: int64
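So I want to add a new column to the existing columns (see data.head) that is the sum of activity_gur and activity. That is, a new column named activities would be added on the right-hand side of data.head. For example, the climbing_down entry of that column would be 25281+25786=51067, so the activities column would contain climbing_down in that many rows, starting at row 0. The same goes for the other activities. When I write df["activities"].value_counts(), it should return a result like the ones above. After that, I will delete the activity_gur and activity columns.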
I tried the following:
df1 = pd.DataFrame({'activity': [32833, 31924, 31793, 31229, 30429, 26938, 25281, 4232],
                    'activity_gur': [33267, 32546, 31912, 31822, 30958, 25786, 18343, 4312]})
df['activityfinal'] = df1.activity + df1.activity_gur
But the resulting column only contains the summed values, and I cannot tell which activity each sum comes from.
Can you help me?
Use pandas to join the two columns,
like this:
new_data = activity.append(activity_gur, ignore_index=True)
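Note that `Series.append` was removed in pandas 2.0, so the line above only runs on older versions. Below is a minimal sketch of the same idea with `pd.concat`, assuming `activity` and `activity_gur` stand for the two label columns of `df`. The small frame is made-up illustration data, not the data from the question.

```python
import pandas as pd

# Hypothetical stand-in for the merged frame; only the two label columns matter here.
df = pd.DataFrame({
    "activity": ["sitting", "walking", "sitting", "jumping"],
    "activity_gur": ["sitting", "walking", "walking", "walking"],
})

# Stack both label columns into one long Series (replacement for Series.append).
new_data = pd.concat([df["activity"], df["activity_gur"]], ignore_index=True)

# Counting the stacked labels gives the per-activity sum over both columns.
combined_counts = new_data.value_counts()
print(combined_counts)
```

The stacked Series has twice as many rows as `df`, so it lives in its own object rather than as a new column of `df`.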
Try this:
X = df["activity"].value_counts()
Y = df["activity_gur"].value_counts()
RESULT = (X + Y).to_frame("TOTAL")  # to_frame names the column directly, whatever name the summed Series carries
Then update the existing DataFrame:
df["TOTAL"] = df["activity"].apply(lambda x: RESULT["TOTAL"].loc[x])
After the steps above, drop the columns:
df = df.drop(["activity", "activity_gur"], axis=1)
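The steps above can be sketched end to end as follows. This is a variant under stated assumptions, not the answer's exact code: it uses a tiny made-up frame, `Series.add(..., fill_value=0)` so that an activity missing from one column does not turn into NaN, and `Series.map` for the per-row lookup.

```python
import pandas as pd

# Made-up data: "jumping" never appears in activity_gur.
df = pd.DataFrame({
    "activity": ["sitting", "walking", "sitting", "jumping"],
    "activity_gur": ["sitting", "walking", "walking", "walking"],
})

x = df["activity"].value_counts()
y = df["activity_gur"].value_counts()

# Align on the activity label; a label missing on one side counts as 0.
total = x.add(y, fill_value=0).astype(int)

# Look up the combined count for each row's activity label.
df["TOTAL"] = df["activity"].map(total)

df = df.drop(["activity", "activity_gur"], axis=1)
print(df)
```

After the drop only the counts remain, so keep the activity column if each row should still show which activity its total belongs to.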
Test data:
import pandas as pd
import numpy as np
df = pd.DataFrame({
'activity' : np.random.choice( ['sitting', 'standing', 'lying', 'running', 'climbing_up', 'walking', 'climbing_down', 'jumping'], 10000),
'activity_gur' : np.random.choice( ['sitting', 'standing', 'lying', 'running', 'climbing_up', 'walking', 'climbing_down', 'jumping'], 10000)})
First value_counts object:
activity_value_counts = df["activity"].value_counts().sort_index()
activity_value_counts
Output:
climbing_down 1222
climbing_up 1248
jumping 1274
lying 1193
running 1277
sitting 1283
standing 1227
walking 1276
Name: activity, dtype: int64
Second value_counts object:
activity_gur_value_counts = df["activity_gur"].value_counts().sort_index()
activity_gur_value_counts
Output:
climbing_down 1238
climbing_up 1274
jumping 1236
lying 1262
running 1220
sitting 1259
standing 1247
walking 1264
Name: activity_gur, dtype: int64
Final DataFrame:
df_final = pd.DataFrame({'activity':activity_value_counts})
df_final['activity_gur'] = activity_gur_value_counts
df_final['sum'] = df_final['activity'] + df_final['activity_gur']
df_final
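Output:
               activity  activity_gur   sum
climbing_down      1222          1238  2460
climbing_up        1248          1274  2522
jumping            1274          1236  2510
lying              1193          1262  2455
running            1277          1220  2497
sitting            1283          1259  2542
standing           1227          1247  2474
walking            1276          1264  2540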
You can then drop the activity and activity_gur columns and rename the sum column as needed.
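The drop and rename step could look roughly like this. The frame below is made up, the per-column counts are built in one go from a dict of `value_counts` results, and the new column name `activities` is only an example.

```python
import pandas as pd

# Made-up data standing in for the merged frame.
df = pd.DataFrame({
    "activity": ["sitting", "walking", "sitting", "jumping"],
    "activity_gur": ["sitting", "walking", "walking", "walking"],
})

# One value_counts per column; building a DataFrame from the dict aligns them
# on the activity label, and labels missing from one column become 0.
counts = {col: df[col].value_counts() for col in ["activity", "activity_gur"]}
df_final = pd.DataFrame(counts).fillna(0).astype(int)
df_final["sum"] = df_final["activity"] + df_final["activity_gur"]

# Keep only the combined count and give it a descriptive name.
df_final = df_final.drop(columns=["activity", "activity_gur"])
df_final = df_final.rename(columns={"sum": "activities"})
print(df_final)
```

The result is indexed by activity label, so each combined count stays attached to the activity it came from.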