How to create a dataframe from an existing dataframe where a few columns are appended as rows?

I have the following dataframe:

import numpy as np
import pandas as pd

# each key appears once; the frame is the same one printed below
data = {'Age': [20, 30, 19, 21],
        'city1': ['ny', 'london', np.nan, np.nan],
        'country1': ['usa', 'usa', 'usa', 'usa'],
        'city2': ['london', 'edinburg', np.nan, 'tampa'],
        'country2': ['usa', 'uk', np.nan, np.nan]}
df1 = pd.DataFrame(data)
print(df1)
    Age city1   country1    city2   country2
0   20  ny      usa        london    usa
1   30  london  usa        edinburg  uk
2   19  NaN     usa        NaN       NaN
3   21  NaN     usa        tampa     NaN

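Now I want to create a new dataframe in which each Age value is repeated according to half the number of columns other than Age. In the dataframe above there are four columns besides Age, and half of four is 2, so each Age value must be repeated twice. After forming the new Age column, I need to append city1, country1 as one row and city2, country2 as a second row (as shown in the expected output). I was able to repeat the values as a list, and I tried to get the values from the other columns as lists and append them as rows, as shown below: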

Code:
# repeat each Age value twice
main_list = np.repeat(df1['Age'], 2)
# collect column values row by row (positions 1-2 only, i.e. city1 and country1)
r = []
for i in range(len(df1)):
    r.append(df1.iloc[:, 1:3].loc[i].values.tolist())
print(r)
[['ny', 'usa'], ['london', 'usa'], [nan, 'usa'], [nan, 'usa']]

But as you can see, this only gives the values of city1, country1 and not those of city2, country2. Appending r as a row to the new dataframe also raises an error:

newdata = {'Age':main_list}
res=pd.DataFrame(newdata)
print(res)
   Age
0   20
0   20
1   30
1   30
2   19
2   19
3   21
3   21

res.loc[len(res)] = r
print(res)
ValueError: cannot set a row with mismatched columns
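The loop above only ever reads positions 1-2, and `res.loc[len(res)] = r` tries to write all four lists into a single one-column row. One way to keep the list-building approach is a minimal sketch like the following. It assumes the non-Age columns come in adjacent (city, country) pairs in that order. It visits every pair of every row and builds the frame from the full list in one call. The names `other`, `n_pairs` and `res` are illustrative, not from the original post.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame({
    'Age': [20, 30, 19, 21],
    'city1': ['ny', 'london', np.nan, np.nan],
    'country1': ['usa', 'usa', 'usa', 'usa'],
    'city2': ['london', 'edinburg', np.nan, 'tampa'],
    'country2': ['usa', 'uk', np.nan, np.nan],
})

# Assumption: the non-Age columns are adjacent (city, country) pairs
other = df1.columns.drop('Age')   # city1, country1, city2, country2
n_pairs = len(other) // 2         # half of the non-Age columns -> 2

r = []
for i in range(len(df1)):
    for k in range(n_pairs):
        # values of the k-th (city, country) pair in row i
        r.append(df1[other[2 * k:2 * k + 2]].iloc[i].tolist())

# Build the whole frame at once instead of assigning row by row
res = pd.DataFrame(r, columns=['city', 'country'],
                   index=df1.index.repeat(n_pairs))
res.insert(0, 'Age', np.repeat(df1['Age'].to_numpy(), n_pairs))
print(res)
```

Here `r` ends up holding the eight two-element lists listed under the expected output.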

How can I get the expected list of values and create the dataframe shown below?

Expected output:

r =[['ny', 'usa'], ['london', 'usa'],['london', 'usa'],['edinburg','uk'],
                     [nan, 'usa'],[nan,nan],[nan, 'usa'],['tampa',nan]]

Final dataframe:

   Age  city     country
0   20  'ny'     'usa'
0   20  'london' 'usa'
1   30  'london' 'usa'
1   30  'edinburg' 'uk'
2   19   NaN     'usa'
2   19   NaN      NaN
3   21   NaN     'usa'
3   21  'tampa'  NaN

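You can use pd.wide_to_long. stubnames gives the column prefixes, i identifies each original row (the old index, exposed as a column by reset_index, plus Age), and j names the new level that holds the numeric suffix. droplevel(-1) then drops that suffix level, and reset_index('Age') turns Age back into a column: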

(pd
 .wide_to_long(df1.reset_index(),
               stubnames=['city', 'country'], i=['index', 'Age'], j='id')
 .droplevel(-1)
 .reset_index('Age')
)

Output:

       Age      city country
index                       
0       20        ny     usa
0       20    london     usa
1       30    london     usa
1       30  edinburg      uk
2       19       NaN     usa
2       19       NaN     NaN
3       21       NaN     usa
3       21     tampa     NaN
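If the columns really are laid out as adjacent (city, country) pairs, the same result can also be obtained with a row-major NumPy reshape. This is a minimal sketch under that assumption, not a replacement for wide_to_long when the column order is arbitrary. Each original row's four values are split into two [city, country] rows, and Age and the index are repeated to match. The names `pair_cols`, `n_pairs` and `out` are illustrative.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame({
    'Age': [20, 30, 19, 21],
    'city1': ['ny', 'london', np.nan, np.nan],
    'country1': ['usa', 'usa', 'usa', 'usa'],
    'city2': ['london', 'edinburg', np.nan, 'tampa'],
    'country2': ['usa', 'uk', np.nan, np.nan],
})

# Assumption: columns are ordered city1, country1, city2, country2, ...
pair_cols = df1.columns.drop('Age')
n_pairs = len(pair_cols) // 2

# Row-major reshape: [ny, usa, london, usa] -> [[ny, usa], [london, usa]]
values = df1[pair_cols].to_numpy().reshape(-1, 2)

out = pd.DataFrame(values, columns=['city', 'country'],
                   index=df1.index.repeat(n_pairs))
out.insert(0, 'Age', df1['Age'].repeat(n_pairs).to_numpy())
print(out)
```

wide_to_long matches columns by stub name and suffix. The reshape relies only on column position, so it silently gives wrong pairs if the columns are reordered.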

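You can also use the pivot_longer function from pyjanitor (imported as janitor). Note that its rows come out grouped by pair (all of group 1, then all of group 2) rather than interleaved per original row: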

import janitor  # pyjanitor; importing it registers DataFrame.pivot_longer
df1.pivot_longer('Age', names_to=['.value', 'group'], names_pattern=r'(\D+)(\d+)')

   Age group      city country
0   20     1        ny     usa
1   30     1    london     usa
2   19     1       NaN     usa
3   21     1       NaN     usa
4   20     2    london     usa
5   30     2  edinburg      uk
6   19     2       NaN     NaN
7   21     2     tampa     NaN
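To go from the group-by-group layout above to the interleaved order in the expected output, what matters is keeping the original row labels and sorting on them stably. The following is a minimal plain-pandas sketch, without pyjanitor, that assumes the pairs are exactly (city1, country1) and (city2, country2). It builds the same group-major frame with pd.concat, keeping the original index, and then applies a stable (mergesort) sort_index so each row's group 1 entry stays ahead of its group 2 entry. The names `parts`, `long_df` and `row_major` are illustrative.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame({
    'Age': [20, 30, 19, 21],
    'city1': ['ny', 'london', np.nan, np.nan],
    'country1': ['usa', 'usa', 'usa', 'usa'],
    'city2': ['london', 'edinburg', np.nan, 'tampa'],
    'country2': ['usa', 'uk', np.nan, np.nan],
})

# Assumption: the numbered pairs are exactly suffixes '1' and '2'
parts = []
for g in ['1', '2']:
    part = (df1[['Age', 'city' + g, 'country' + g]]
            .rename(columns={'city' + g: 'city', 'country' + g: 'country'})
            .assign(group=g))          # remember which pair the row came from
    parts.append(part[['Age', 'group', 'city', 'country']])

# Group-major, like the pivot_longer output, but original labels are kept
long_df = pd.concat(parts)

# Stable sort on the original row label interleaves the pairs row by row
row_major = long_df.sort_index(kind='mergesort')
print(row_major)
```

A stable sort is required here. An unstable one could place a row's group 2 entry before its group 1 entry, because both carry the same index label.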