How to reshape data in Python
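I have a DataFrame that contains only one row but many columns:
I want to start a new row every 5 columns. This is the expected output:
The original data is in a list, which I converted to a DataFrame. I don't know whether it is easier to reshape the list instead, but here is a sample list for you to try; the original list is really long. ['review: I stayed around 11 days and enjoyed stay very much.', 'compound: 0.5106, ','neg: 0.0, ','neu: 0.708, ','pos: 0.292, ','review: Plans for weekend stay canceled due to Coronavirus shutdown.','compound: 0.0, ','neg: 0.0, ','neu: 1.0, ','pos: 0.0, ']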
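It is easier to parse the data while it is still a list and then convert the result to a DataFrame.
- For each entry, split it on ":" and add the key/value pair to a dictionary
- Convert the dictionary to a DataFrame
Try this: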
import pandas as pd

lst = ['review: I stayed around 11 days and enjoyed stay very much.', 'compound: 0.5106, ','neg: 0.0, ','neu: 0.708, ','pos: 0.292, ',
       'review: Plans for weekend stay canceled due to Coronavirus shutdown.','compound: 0.0, ','neg: 0.0, ','neu: 1.0, ','pos: 0.0, ']

dd = {}
for x in lst:
    # Split at the first colon only, so a colon inside a review is kept
    key, value = x.split(':', 1)
    # Drop surrounding spaces and the trailing comma, leaving commas inside the text alone
    dd.setdefault(key, []).append(value.strip(' ,'))

print(dd)
print(pd.DataFrame(dd).to_string(index=False))
Output:
review compound neg neu pos
I stayed around 11 days and enjoyed stay very much. 0.5106 0.0 0.708 0.292
Plans for weekend stay canceled due to Coronavirus shutdown. 0.0 0.0 1.0 0.0
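Every value collected this way is still a string. If the scores are needed as numbers, one possible follow-up is `pd.to_numeric`. This is only a sketch: it assumes that every column other than `review` holds a number, and the frame `df` is rebuilt by hand here so the example runs on its own.

```python
import pandas as pd

# Small frame shaped like the output above; all values start out as strings
df = pd.DataFrame({
    "review": ["I stayed around 11 days and enjoyed stay very much.",
               "Plans for weekend stay canceled due to Coronavirus shutdown."],
    "compound": ["0.5106", "0.0"],
    "neg": ["0.0", "0.0"],
    "neu": ["0.708", "1.0"],
    "pos": ["0.292", "0.0"],
})

# Assumption: every column except "review" holds a number
score_cols = [c for c in df.columns if c != "review"]
df[score_cols] = df[score_cols].apply(pd.to_numeric)

print(df.dtypes)
```

`pd.to_numeric` raises an error on a value it cannot parse, which is usually what you want when checking a long list for malformed entries.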
def main():
    data_new = ['review: I stayed around 11 days and enjoyed stay very much.', 'compound: 0.5106, ','neg: 0.0, ','neu: 0.708, ','pos: 0.292, ','review: Plans for weekend stay canceled due to Coronavirus shutdown.','compound: 0.0, ','neg: 0.0, ','neu: 1.0, ','pos: 0.0, ']
    chunk_size = 5
    # Step through the list in blocks of five and print each block as one row
    for start in range(0, len(data_new), chunk_size):
        print(data_new[start:start + chunk_size])

main()
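The same fixed-size slicing can also feed a DataFrame instead of printing. A minimal sketch, assuming every record consists of exactly five "key: value" entries in the same order; the helper name `chunk` is made up for this illustration and does not come from the answer above.

```python
import pandas as pd

def chunk(items, size):
    """Yield consecutive slices of `items` holding `size` elements each."""
    for start in range(0, len(items), size):
        yield items[start:start + size]

data_new = ['review: I stayed around 11 days and enjoyed stay very much.',
            'compound: 0.5106, ', 'neg: 0.0, ', 'neu: 0.708, ', 'pos: 0.292, ',
            'review: Plans for weekend stay canceled due to Coronavirus shutdown.',
            'compound: 0.0, ', 'neg: 0.0, ', 'neu: 1.0, ', 'pos: 0.0, ']

rows = []
for group in chunk(data_new, 5):
    # Split at the first colon only, then drop surrounding spaces and commas
    pairs = (entry.split(":", 1) for entry in group)
    rows.append({key.strip(): value.strip(" ,") for key, value in pairs})

df = pd.DataFrame(rows)
print(df.to_string(index=False))
```

Building one dictionary per record keeps each review together with its own scores, so a record with a missing field shows up as a NaN in that row rather than shifting every later value.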
You can try using a dictionary:

from collections import defaultdict
import pandas as pd

lst = ['review: I stayed around 11 days and enjoyed stay very much.', 'compound: 0.5106, ','neg: 0.0, ','neu: 0.708, ','pos: 0.292, ',
       'review: Plans for weekend stay canceled due to Coronavirus shutdown.','compound: 0.0, ','neg: 0.0, ','neu: 1.0, ','pos: 0.0, ']

data_dict = defaultdict(list)
for entry in lst:
    # Split at the first colon only, so a colon inside a review is kept
    header, value = entry.split(':', 1)
    # Remove surrounding spaces and the trailing comma
    data_dict[header].append(value.strip(' ,'))

pd.DataFrame.from_dict(data_dict)
The output is a DataFrame with one row per review and the columns review, compound, neg, neu and pos.
You can do this easily with numpy:

import numpy as np
import pandas as pd

lis = np.array(['review: I stayed around 11 days and enjoyed stay very much.', 'compound: 0.5106, ','neg: 0.0, ','neu: 0.708, ','pos: 0.292, ','review: Plans for weekend stay canceled due to Coronavirus shutdown.','compound: 0.0, ','neg: 0.0, ','neu: 1.0, ','pos: 0.0, '])
columns = 5  # number of fields per record

# Split each "key: value" entry at the first colon only
t = np.char.split(lis, ":", 1)
cols, vals = zip(*t)

# Trim spaces and trailing commas, then fold the flat values into rows of `columns` fields
vals = np.char.strip(np.array(vals), " ,")
dff = pd.DataFrame(vals.reshape(-1, columns), columns=cols[:columns])
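If the data is already sitting in a one-row DataFrame, as described in the question, the row can also be reshaped without going back to the list. This is a sketch under two assumptions: the number of columns is an exact multiple of 5, and every cell holds a "key: value" string. The names `wide` and `tidy` are made up for the illustration, and `wide` is constructed here only to stand in for the question's frame.

```python
import pandas as pd

lst = ['review: I stayed around 11 days and enjoyed stay very much.',
       'compound: 0.5106, ', 'neg: 0.0, ', 'neu: 0.708, ', 'pos: 0.292, ',
       'review: Plans for weekend stay canceled due to Coronavirus shutdown.',
       'compound: 0.0, ', 'neg: 0.0, ', 'neu: 1.0, ', 'pos: 0.0, ']
wide = pd.DataFrame([lst])  # stand-in for the question's frame: 1 row, 10 columns

n = 5
# reshape(-1, n) raises ValueError if the column count is not a multiple of n
cells = wide.to_numpy().reshape(-1, n)

# Column names come from the text before the first colon in the first record
names = [cell.split(":", 1)[0].strip() for cell in cells[0]]

# Keep only the text after the first colon, trimmed of spaces and commas
tidy = pd.DataFrame(
    [[cell.split(":", 1)[1].strip(" ,") for cell in row] for row in cells],
    columns=names,
)
print(tidy.to_string(index=False))
```

Taking the column names from the first record only works when every record lists its fields in the same order; if that is not guaranteed, the dictionary-based answers above are the safer choice.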