Split dataframe column into two columns based on delimiter
I am preprocessing text for classification, and I import my dataset like this:
dataset = pd.read_csv('lyrics.csv', delimiter = '\t', quoting = 2)
dataset
This prints the following in the terminal:
lyrics,classification
0 I should have known better with a girl like yo...
1 You can shake an apple off an apple tree\nShak...
2 It's been a hard day's night\nAnd I've been wo...
3 Michelle, ma belle\nThese are words that go to...
However, when I take a closer look at the variable dataset in Spyder, I see that I have only one column instead of the desired two.
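Given that the lyrics themselves contain commas, so using "," as the delimiter does not work, how can I correct the dataframe above so that I have:
1) a lyrics column
2) a classification column
with the corresponding data in each row?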
If your lyrics themselves do not contain commas (though they most likely do), you can use read_csv with delimiter=','.
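As a minimal sketch of that simple case, assuming a small in-memory stand-in for lyrics.csv whose lyrics contain no commas (the sample rows below are hypothetical, not taken from the actual file):

```python
import io

import pandas as pd

# Hypothetical stand-in for lyrics.csv; none of the lyrics contain a comma.
raw_no_commas = io.StringIO(
    "lyrics,classification\n"
    "Love me do,1\n"
    "Yesterday all my troubles seemed so far away,0\n"
)

# With a comma delimiter, each line splits cleanly into two columns.
simple_df = pd.read_csv(raw_no_commas, delimiter=",")
print(simple_df)
```

This only works because every line contains exactly one comma; a lyric with its own comma would produce extra fields.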
However, if that is not an option, you can use str.rsplit to split on the last comma only:
dataset.iloc[:, 0].str.rsplit(',', n=1, expand=True)
df
lyrics,classification
0 I should have known better with a girl like yo...
1 You can shake an...,0
2 It's been a hard day's night...,0
df = df.iloc[:, 0].str.rsplit(',', n=1, expand=True)
df.columns = ['lyrics', 'classification']
df
lyrics classification
0 I should have known better with a girl like yo... 0
1 You can shake an... 0
2 It's been a hard day's night... 0
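For anyone who wants to try this end to end, here is a self-contained sketch. It assumes each line of the file has the form "lyrics,classification" with the label after the last comma; the in-memory sample data and the final integer cast are illustrative assumptions, not part of the original file.

```python
import io

import pandas as pd

# Hypothetical stand-in for lyrics.csv: the lyrics may contain commas,
# and the classification label follows the last comma on each line.
raw = io.StringIO(
    "lyrics,classification\n"
    "Michelle, ma belle,1\n"
    "It's been a hard day's night,0\n"
)

# Reading with a tab delimiter reproduces the problem: everything
# ends up in a single column named "lyrics,classification".
one_col = pd.read_csv(raw, delimiter="\t")

# Split from the right, at most once, so commas inside the lyrics survive.
df = one_col.iloc[:, 0].str.rsplit(",", n=1, expand=True)
df.columns = ["lyrics", "classification"]

# Assumption: the labels are integers, so cast them from str.
df["classification"] = df["classification"].astype(int)
print(df)
```

Passing n as a keyword matters on recent pandas versions, where the arguments after the pattern are keyword-only.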