How to extract the first part of a name (first name) from a list of full names and discard names with only one part

Question

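I have a CSV file with a single column of names. I want Python code that checks each name in the column: if the name has more than one part, it should take only the first part and append it to a new CSV file; if the name has only one part in the old CSV file, it should be skipped.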

For example:

Input CSV file:

Column1
Metarhizium robertsii ARSEF 23
Danio rerio
Parascaris equorum
Hevea
Gossypium
Vitis vinifera

The output CSV file should be:

Column1
Metarhizium
Danio
Parascaris
Vitis

Answer 1

Are the parts of a name always separated by a space?

You can use Python's re module with a regular expression, or, if you want something simpler, the str.split() method:

for name in column:
    # Split once at the first space; a one-part name gives a single-element list
    split_name = name.split(' ', 1)
    if len(split_name) > 1:
        # Write the first part on its own line in the new CSV
        new_csv.write(split_name[0] + '\n')
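
The loop above leaves `column` and `new_csv` undefined, so here is a minimal, self-contained sketch of the same idea using only the standard library. The function names (`first_parts`, `first_part_re`, `filter_csv`) and the in-memory sample are assumptions made for illustration, not part of the original answer. It splits on any whitespace rather than a single space, and it also shows one possible regular-expression variant, since the answer mentions the re module without giving a pattern.

```python
import csv
import io
import re


def first_parts(names):
    """Return the first word of every name that has more than one word."""
    result = []
    for name in names:
        parts = name.split()  # split on any run of whitespace
        if len(parts) > 1:
            result.append(parts[0])
    return result


def first_part_re(name):
    """Regex variant: first word if another word follows it, else None."""
    match = re.match(r"\s*(\S+)\s+\S", name)
    return match.group(1) if match else None


def filter_csv(src, dst, column="Column1"):
    """Read names from `column` of file object `src`, write first parts to `dst`."""
    reader = csv.DictReader(src)
    writer = csv.writer(dst, lineterminator="\n")
    writer.writerow([column])
    for first in first_parts(row[column] for row in reader):
        writer.writerow([first])


# In-memory demo so the sketch runs without any files on disk.
sample = io.StringIO(
    "Column1\nMetarhizium robertsii ARSEF 23\nDanio rerio\nHevea\nVitis vinifera\n"
)
out = io.StringIO()
filter_csv(sample, out)
print(out.getvalue())
```

With real files, you would pass `open("input.csv", newline="")` and `open("output.csv", "w", newline="")` to `filter_csv` instead of the `StringIO` objects.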

Answer 2

You can split each name, apply len to the result to build a mask, and then take the first element of each row that passes the filter.

import pandas as pd
df = pd.read_csv("input.csv")
splitted = df.Column1.apply(lambda x: x.split())
output = splitted[splitted.apply(len) > 1].apply(lambda x: x[0])
output.to_csv("output.csv")
# > ,Column1
#  0,Metarhizium
#  1,Danio
#  2,Parascaris
#  5,Vitis
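
As a possible alternative to the two `apply` calls, the same filter can be sketched with pandas' vectorized `.str` accessor. This is only an illustration under some assumptions: the data is read from an in-memory string rather than `input.csv`, and the index is reset so the output does not carry the original row numbers (0, 1, 2, 5) shown above.

```python
import io

import pandas as pd

# Sample data from the question, held in memory instead of input.csv.
csv_text = (
    "Column1\n"
    "Metarhizium robertsii ARSEF 23\n"
    "Danio rerio\n"
    "Parascaris equorum\n"
    "Hevea\n"
    "Gossypium\n"
    "Vitis vinifera\n"
)
df = pd.read_csv(io.StringIO(csv_text))

parts = df["Column1"].str.split()   # list of words for each row
mask = parts.str.len() > 1          # True where a name has more than one word
first = parts[mask].str[0]          # first word of the rows that are kept

# Drop the original row labels and turn the Series back into a one-column frame.
result = first.reset_index(drop=True).to_frame()
print(result.to_csv(index=False))
```

To write a file instead of printing, `result.to_csv("output.csv", index=False)` should give a single `Column1` column without the index.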

Answer 3

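You can first create a flag for the values that contain more than one word, then use the apply() method with a lambda function to retrieve the first word of each flagged name.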

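import pandas as pd
df = pd.read_csv('input.csv')
# Flag the rows whose name has more than one whitespace-separated word
flag = df.loc[:, 'Column1'].str.split().apply(len) > 1
# Take the first word of a name (None if the name is empty)
split_names = lambda name: name.split()[0] if name.split() else None
new_df = df.loc[flag, 'Column1'].apply(split_names)
new_df.to_csv('output.csv', index=False)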

Tags: python, csv, list, bioinformatics, biopython
