How do you read in a dataframe with lists using pd.read_clipboard?
Here is some data from another question:
   positive                        negative                neutral
1  [marvel, moral, bold, destiny]  []                      [view, should]
2  [beautiful]                     [complicated, need]     []
3  [celebrate]                     [crippling, addiction]  [big]
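What I would do first is add quotes around all the words by hand, and then:
import ast
import pandas as pd
# a regex separator needs the Python parsing engine
df = pd.read_clipboard(sep=r'\s{2,}', engine='python')
# on pandas >= 2.1, DataFrame.map replaces the deprecated applymap
df = df.applymap(ast.literal_eval)
Is there a smarter way to do this?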
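To make the motivation for the quoting step concrete, here is a small sketch using only the standard library. The two sample strings are taken from the first row of the data above; the variable names are illustrative and not part of the original question.

```python
import ast

quoted = "['marvel', 'moral', 'bold', 'destiny']"
unquoted = "[marvel, moral, bold, destiny]"

# With quotes, the cell is a valid Python list literal.
parsed = ast.literal_eval(quoted)
print(parsed)

# Without quotes, the words are bare names, which literal_eval refuses to evaluate.
try:
    ast.literal_eval(unquoted)
    rejected = False
except ValueError:
    rejected = True
print("unquoted cell rejected:", rejected)
```

This is why the cells either need quotes added or need a parser that does not expect Python literals.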
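I did it this way:
df = pd.read_clipboard(sep=r'\s{2,}', engine='python')
# strip the brackets, then split each cell on a comma followed by optional whitespace
df = df.apply(lambda x: x.str.replace(r'[\[\]]', '', regex=True).str.split(r',\s*', expand=False))
PS: I'm sure there must be a better way...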
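One thing to watch with the strip-and-split idea is that an empty cell such as [] splits into a list holding one empty string. A possible per-cell variant in plain Python is sketched below; `parse_cell` is a hypothetical helper name, not something from the answer, and it assumes the words themselves contain no commas or brackets.

```python
import re

def parse_cell(cell):
    """Turn a string like '[a, b, c]' into ['a', 'b', 'c'] (hypothetical helper)."""
    # Drop the square brackets and any surrounding whitespace.
    inner = re.sub(r'[\[\]]', '', cell).strip()
    # An empty cell becomes an empty list rather than [''].
    if not inner:
        return []
    # Split on a comma followed by any amount of whitespace.
    return re.split(r',\s*', inner)

print(parse_cell('[marvel, moral, bold, destiny]'))
print(parse_cell('[]'))
```

With a DataFrame of string cells, something like df.applymap(parse_cell) would apply it cell by cell.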
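Lists of strings
For basic structures, you can use yaml without having to add quotes:
import yaml  # provided by the PyYAML package
# safe_load is used here because PyYAML 6 requires an explicit Loader for yaml.load
df = pd.read_clipboard(sep=r'\s{2,}', engine='python').applymap(yaml.safe_load)
type(df.iloc[0, 0])
Out: list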
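Lists of numeric data
Under certain conditions, you can read your lists as strings and convert them using literal_eval (or pd.eval, if they are simple lists).
For example,
           A   B
0  [1, 2, 3]  11
1  [4, 5, 6]  12
First, make sure there are at least two spaces between the columns, then copy your data and run the following: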
import ast
df = pd.read_clipboard(sep=r'\s{2,}', engine='python')
df['A'] = df['A'].map(ast.literal_eval)
df
           A   B
0  [1, 2, 3]  11
1  [4, 5, 6]  12
df.dtypes
A    object
B     int64
dtype: object
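For a check that does not depend on what is currently on the clipboard, the same steps can be sketched with pd.read_csv reading from an in-memory string. Using io.StringIO and read_csv here is an assumption made for illustration (read_clipboard passes the clipboard text on to read_csv); the sample text mirrors the table above.

```python
import ast
import io

import pandas as pd

# Stand-in for the clipboard contents; columns are separated by two or more spaces.
text = (
    "           A   B\n"
    "0  [1, 2, 3]  11\n"
    "1  [4, 5, 6]  12\n"
)

# Same separator and engine as in the read_clipboard call.
df = pd.read_csv(io.StringIO(text), sep=r'\s{2,}', engine='python')

# Column A arrives as strings such as '[1, 2, 3]'; convert each one to a real list.
df['A'] = df['A'].map(ast.literal_eval)

print(df['A'].tolist())
print(df['B'].tolist())
```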
Notes
- For multiple columns, use applymap in the conversion step:
  df[['A', 'B', ...]] = df[['A', 'B', ...]].applymap(ast.literal_eval)
- If your columns can contain NaNs, define a function that can handle them appropriately:
  parser = lambda x: x if pd.isna(x) else ast.literal_eval(x)
  df[['A', 'B', ...]] = df[['A', 'B', ...]].applymap(parser)
- If your columns contain lists of strings, you will need a YAML parser such as yaml.safe_load (requires installing PyYAML) to parse them if you don't want to manually add quotes to the data. See above.
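The NaN-aware parser from the notes can also be sketched as a named function. The version below swaps pd.isna for a standard-library check so that it runs without pandas; `parse_or_keep` is a hypothetical name, and the check only covers None and float NaN, which is narrower than pd.isna.

```python
import ast
import math

def parse_or_keep(value):
    """Parse a list literal, passing missing values through unchanged (hypothetical helper)."""
    # Missing cells usually arrive as None or float('nan'); leave them untouched.
    if value is None or (isinstance(value, float) and math.isnan(value)):
        return value
    # Everything else is expected to be a string holding a Python literal.
    return ast.literal_eval(value)

print(parse_or_keep('[1, 2, 3]'))
print(parse_or_keep(float('nan')))
```

It would be passed to applymap in the same way as the lambda in the notes.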
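Another version:
import ast
import re
# turn "[a, b]" into "['a','b']" with two substitutions, then evaluate it as a literal
df.applymap(lambda x:
    ast.literal_eval("[" + re.sub(r"[\[\]]", "'",
                           re.sub(r"[,\s]+", "','", x)) + "]"))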
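To follow what the nested substitutions do, it may help to run them one at a time on a single cell. This is only a walk-through of the expression above on one sample string; the intermediate variable names are illustrative.

```python
import ast
import re

cell = "[crippling, addiction]"

# Step 1: replace each run of commas/whitespace with the closing and opening quotes.
step1 = re.sub(r"[,\s]+", "','", cell)
print(step1)   # [crippling','addiction]

# Step 2: replace the original brackets with the outermost quotes.
step2 = re.sub(r"[\[\]]", "'", step1)
print(step2)   # 'crippling','addiction'

# Step 3: wrap in brackets again and evaluate as a Python literal.
result = ast.literal_eval("[" + step2 + "]")
print(result)  # ['crippling', 'addiction']

# An empty cell ends up as a list containing one empty string.
empty = ast.literal_eval("[" + re.sub(r"[\[\]]", "'", re.sub(r"[,\s]+", "','", "[]")) + "]")
print(empty)   # ['']
```

Because step 1 treats any whitespace as a separator, a cell entry containing a space would be split into two items.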
Another option is
In [43]: df.applymap(lambda x: x[1:-1].split(', '))
Out[43]:
   positive                        negative                neutral
1  [marvel, moral, bold, destiny]  []                      [view, should]
2  [beautiful]                     [complicated, need]     []
3  [celebrate]                     [crippling, addiction]  [big]
Note that this assumes the first and last character in each cell is [ and ]. It also assumes there is exactly one space after the commas.
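The two assumptions can be checked on plain strings. The sketch below wraps the slicing expression in a function with a hypothetical name, `slice_and_split`, and shows what happens when an assumption does not hold.

```python
def slice_and_split(cell):
    # Drop the first and last character, then split on comma-plus-one-space.
    return cell[1:-1].split(', ')

print(slice_and_split('[view, should]'))  # ['view', 'should']

# No space after the comma: the separator never matches, so nothing is split.
print(slice_and_split('[view,should]'))   # ['view,should']

# An empty list cell gives a list with one empty string, not an empty list.
print(slice_and_split('[]'))              # ['']
```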
With help from @MaxU:
df = pd.read_clipboard(sep=r'\s{2,}', engine='python')
Then:
>>> df.apply(lambda col: col.str[1:-1].str.split(', '))
   positive                        negative                neutral
1  [marvel, moral, bold, destiny]  []                      [view, should]
2  [beautiful]                     [complicated, need]     []
3  [celebrate]                     [crippling, addiction]  [big]
>>> df.apply(lambda col: col.str[1:-1].str.split(', ')).loc[3, 'negative']
['crippling', 'addiction']
Per the notes from @unutbu, who proposed a similar solution:
assumes the first and last character in each cell is [ and ]. It also assumes there is exactly one space after the commas.