How to create a dataframe from an existing dataframe where a few columns are appended as rows?
I have a dataframe like the following:
import numpy as np
import pandas as pd

data = {'Age': [20, 30, 19, 21],
        'city1': ['ny', 'london', np.nan, np.nan],
        'country1': ['usa', 'usa', 'usa', 'usa'],
        'city2': ['london', 'edinburg', np.nan, 'tampa'],
        'country2': ['usa', 'uk', np.nan, np.nan]}
df1 = pd.DataFrame(data)
print(df1)
   Age   city1 country1     city2 country2
0   20      ny      usa    london      usa
1   30  london      usa  edinburg       uk
2   19     NaN      usa       NaN      NaN
3   21     NaN      usa     tampa      NaN
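Now I want to create a new dataframe in which each Age value is repeated as many times as half the number of columns other than Age. In the dataframe above, leaving out the Age column, there are four columns, and half of four is 2, so each Age value must be repeated twice. Once the new Age column is formed, I need to append city1, country1 as one row and city2, country2 as the second row (as shown in the expected output). I was able to repeat the values, and I tried to pull the values of the other columns out as lists and append them as rows, as shown below: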
Code:
# repeat each Age value twice
main_list = np.repeat(df1['Age'], 2)
# collect the column values row by row
r = []
for i in range(len(df1)):
    r.append(df1.iloc[:, 1:3].loc[i].values.tolist())
print(r)
[['ny', 'usa'], ['london', 'usa'], [nan, 'usa'], [nan, 'usa']]
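But as you can see, this only gives the values of city1, country1 and not those of city2, country2. Appending the list r as a row to the new dataframe also raises an error, as shown below: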
newdata = {'Age':main_list}
res=pd.DataFrame(newdata)
print(res)
Age
0 20
0 20
1 30
1 30
2 19
2 19
3 21
3 21
res.loc[len(res)] = r
print(res)
ValueError: cannot set a row with mismatched columns
How can I get the expected list of values and create a dataframe like the one shown below?
Expected output:
r =[['ny', 'usa'], ['london', 'usa'],['london', 'usa'],['edinburg','uk'],
[nan, 'usa'],[nan,nan],[nan, 'usa'],['tampa',nan]]
Final dataframe:
   Age        city country
0   20        'ny'   'usa'
0   20    'london'   'usa'
1   30    'london'   'usa'
1   30  'edinburg'    'uk'
2   19         NaN   'usa'
2   19         NaN     NaN
3   21         NaN   'usa'
3   21     'tampa'     NaN
You can use wide_to_long:
(pd
 .wide_to_long(df1.reset_index(),
               stubnames=['city', 'country'], i=['index', 'Age'], j='id')
 .droplevel(-1)
 .reset_index('Age')
)
Output:
       Age      city country
index
0       20        ny     usa
0       20    london     usa
1       30    london     usa
1       30  edinburg      uk
2       19       NaN     usa
2       19       NaN     NaN
3       21       NaN     usa
3       21     tampa     NaN
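For anyone who wants to run this end to end, here is a minimal self-contained sketch of the same wide_to_long call, split into two steps so the intermediate result is visible. It rebuilds df1 from the printed table in the question; the names long_df and out are my own and only there for illustration.

```python
import numpy as np
import pandas as pd

# Rebuild the question's dataframe (values taken from the printed df1).
df1 = pd.DataFrame({
    'Age': [20, 30, 19, 21],
    'city1': ['ny', 'london', np.nan, np.nan],
    'country1': ['usa', 'usa', 'usa', 'usa'],
    'city2': ['london', 'edinburg', np.nan, 'tampa'],
    'country2': ['usa', 'uk', np.nan, np.nan],
})

# Step 1: reshape. reset_index() exposes the row label as a column named
# 'index', so that ('index', 'Age') identifies each original row.
long_df = pd.wide_to_long(
    df1.reset_index(),
    stubnames=['city', 'country'],  # column prefixes to collapse
    i=['index', 'Age'],             # identifier columns kept per row
    j='id',                         # receives the numeric suffix (1 or 2)
)

# Step 2: long_df is indexed by (index, Age, id). Drop the suffix level
# and move Age back into a regular column.
out = long_df.droplevel('id').reset_index('Age')
print(out)
```

The stubnames are matched against the column prefixes, and the default suffix pattern expects digits, which fits city1/city2 and country1/country2. If the real column names used a separator such as city_1, the sep argument would need to be set accordingly.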
You can also use the pivot_longer function provided by janitor (the pyjanitor package):
# pip install pyjanitor
import janitor  # importing janitor registers pivot_longer on DataFrame
df1.pivot_longer('Age', names_to=['.value', 'group'], names_pattern=r'(\D+)(\d+)')
   Age group      city country
0   20     1        ny     usa
1   30     1    london     usa
2   19     1       NaN     usa
3   21     1       NaN     usa
4   20     2    london     usa
5   30     2  edinburg      uk
6   19     2       NaN     NaN
7   21     2     tampa     NaN
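If installing pyjanitor is not an option, roughly the same long layout can be sketched with plain pandas by slicing out each city/country pair, renaming it, and concatenating the pieces. This is only an illustration under a few assumptions: df1 is rebuilt from the printed table in the question, and the suffix list groups and the names parts and long_df are hypothetical, not taken from the answers above.

```python
import numpy as np
import pandas as pd

# Rebuild the question's dataframe (values taken from the printed df1).
df1 = pd.DataFrame({
    'Age': [20, 30, 19, 21],
    'city1': ['ny', 'london', np.nan, np.nan],
    'country1': ['usa', 'usa', 'usa', 'usa'],
    'city2': ['london', 'edinburg', np.nan, 'tampa'],
    'country2': ['usa', 'uk', np.nan, np.nan],
})

# Assumption: the numeric suffixes used in the column names are known.
groups = ['1', '2']

parts = []
for g in groups:
    # Take Age plus one city/country pair and give it the common names.
    part = df1[['Age', f'city{g}', f'country{g}']].copy()
    part.columns = ['Age', 'city', 'country']
    part.insert(1, 'group', g)  # remember which pair the row came from
    parts.append(part)

# Stack the pieces on top of each other: all of group 1, then all of group 2.
long_df = pd.concat(parts, ignore_index=True)
print(long_df)
```

To get the rows interleaved per original row, as in the expected output of the question, one option is to call pd.concat(parts) without ignore_index and then sort_index(kind='stable').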