Join columns from merged databases

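I want to add up 2 columns of a single DataFrame. The DataFrame is the result of merging two separate DataFrames. The code is as follows: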

df1 = pd.read_csv("acc.csv")
df2 = pd.read_csv("gyr.csv")

df = pd.merge(df1, df2, right_index=True, left_index=True)

So I have the columns id, activity, time, accx, accy, accz, id, activity_gur, time, gurx, gury, gurz (see data.head).

df["activity"].value_counts()
sitting          32833
standing         31924
lying            31229
running          30429
climbing_up      26938
walking          26080
climbing_down    25281
jumping           4232
Name: activity, dtype: int64

df["activity_gur"].value_counts()

sitting          33267
standing         32546
walking          31912
lying            31822
running          30958
climbing_down    25786
climbing_up      18343
jumping           4312
Name: activity_gur, dtype: int64

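So I want to add a new column to the existing ones (see data.head) that is the sum of activity_gur and activity. A new column named activities would be added at the right of data.head. For example, the climbing_down total for this column would be 25281 + 25786 = 51067, so starting from row 0 the activities column would contain climbing_down in that many rows. The same goes for the other activities. When I then run df["activities"].value_counts(), it should return counts like the ones above. After that, I will delete the activity_gur and activity columns.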

I tried the following:

df1 = pd.DataFrame({'activity': [32833, 31924, 31793, 31229, 30429, 26938, 25281, 4232],
                    'activity_gur': [33267, 32546, 31912, 31822, 30958, 25786, 18343, 4312]})

df['activityfinal']=df1.activity + df1.activity_gur

But the resulting column only holds the summed values, and I cannot tell which activity each sum belongs to.

Can you help me?

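Use pandas to join the two columns,

like this:

# Series.append was removed in pandas 2.0; pd.concat does the same job
new_data = pd.concat([activity, activity_gur], ignore_index=True)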
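A minimal sketch of that idea, assuming a small made-up DataFrame (the rows below are hypothetical, not the asker's data). Stacking the two label columns into one Series means a single value_counts() call returns the per-activity sums. pd.concat is used here because Series.append is not available in pandas 2.0 and later.

```python
import pandas as pd

# Hypothetical toy data standing in for the merged DataFrame
df = pd.DataFrame({
    "activity":     ["sitting", "walking", "sitting", "jumping"],
    "activity_gur": ["walking", "walking", "sitting", "running"],
})

# Stack both label columns into one long Series (8 labels in total)
new_data = pd.concat([df["activity"], df["activity_gur"]], ignore_index=True)

# Counting the stacked labels gives count(activity) + count(activity_gur)
totals = new_data.value_counts()
print(totals)
```

Note that new_data has twice as many rows as df, so it cannot be assigned back as a column of df. It is only useful for counting.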

Try this:

X = df["activity"].value_counts()
Y = df["activity_gur"].value_counts()
# fill_value=0 keeps activities that occur in only one of the two columns
RESULT = X.add(Y, fill_value=0).to_frame("TOTAL")

Then update the existing DataFrame:

df["TOTAL"] = df["activity"].map(RESULT["TOTAL"])

After the steps above, drop the columns:

df = df.drop(["activity", "activity_gur"], axis=1)
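Put together, the steps above might look like the following sketch. The toy rows and the lowercase names x, y and result are illustrative assumptions; the real df would come from the merge in the question.

```python
import pandas as pd

# Hypothetical toy data; the real df would come from the merge in the question
df = pd.DataFrame({
    "activity":     ["sitting", "walking", "sitting", "jumping"],
    "activity_gur": ["walking", "walking", "sitting", "running"],
})

x = df["activity"].value_counts()
y = df["activity_gur"].value_counts()

# fill_value=0 keeps labels that appear in only one column
# (a plain x + y would give NaN for them)
result = x.add(y, fill_value=0).astype(int).to_frame("TOTAL")

# Look up each row's total by its activity label
df["TOTAL"] = df["activity"].map(result["TOTAL"])

# Drop the two label columns, as in the last step above
df = df.drop(columns=["activity", "activity_gur"])
print(df)
```

Dropping both label columns leaves only the numbers. If you still need to know which activity each total belongs to, keep the activity column, or work with result directly, since its index holds the activity names.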

Test data:

import pandas as pd
import numpy as np

df = pd.DataFrame({
    'activity' : np.random.choice( ['sitting', 'standing', 'lying', 'running', 'climbing_up', 'walking', 'climbing_down', 'jumping'], 10000),
    'activity_gur' : np.random.choice( ['sitting', 'standing', 'lying', 'running', 'climbing_up', 'walking', 'climbing_down', 'jumping'], 10000)})

The first value_counts object:

activity_value_counts = df["activity"].value_counts().sort_index()
activity_value_counts

Output:

climbing_down    1222
climbing_up      1248
jumping          1274
lying            1193
running          1277
sitting          1283
standing         1227
walking          1276
Name: activity, dtype: int64

The second value_counts object:

activity_gur_value_counts = df["activity_gur"].value_counts().sort_index()
activity_gur_value_counts

Output:

climbing_down    1238
climbing_up      1274
jumping          1236
lying            1262
running          1220
sitting          1259
standing         1247
walking          1264
Name: activity_gur, dtype: int64

The final DataFrame:

df_final = pd.DataFrame({'activity':activity_value_counts})
df_final['activity_gur'] = activity_gur_value_counts
df_final['sum'] = df_final['activity'] + df_final['activity_gur']
df_final

Output:

               activity  activity_gur   sum
climbing_down      1222          1238  2460
climbing_up        1248          1274  2522
jumping            1274          1236  2510
lying              1193          1262  2455
running            1277          1220  2497
sitting            1283          1259  2542
standing           1227          1247  2474
walking            1276          1264  2540

You can then drop the activity and activity_gur columns and rename the sum column as needed.
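A more compact sketch of the same table, assuming the same two column names: applying pd.Series.value_counts to both columns at once aligns the labels in the index, so the activity each sum belongs to is never lost. The toy rows and the column name activities are illustrative.

```python
import pandas as pd

# Hypothetical toy data with the same two label columns
df = pd.DataFrame({
    "activity":     ["sitting", "walking", "sitting", "jumping"],
    "activity_gur": ["walking", "walking", "sitting", "running"],
})

cols = ["activity", "activity_gur"]

# One value_counts per column; a label missing from a column becomes NaN,
# hence fillna(0) before converting back to integers
counts = df[cols].apply(pd.Series.value_counts).fillna(0).astype(int)

# Row-wise sum of the two counts, then keep only the total column
counts["activities"] = counts[cols].sum(axis=1)
df_final = counts.drop(columns=cols)
print(df_final)
```

The index of df_final holds the activity names, which is what the plain column addition in the question was missing.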