Save query results as a DataFrame in Python
I'm really new to Python. I'm running a query and I'd like the results saved as a DataFrame instead of only printed to the terminal. Here is my code:
from intermine.webservice import Service

service = Service("https://www.mousemine.org/mousemine/service")
query = service.new_query("Gene")
query.add_view(
"primaryIdentifier", "symbol", "organism.name",
"homologues.homologue.primaryIdentifier", "homologues.homologue.symbol",
"homologues.homologue.organism.name", "homologues.type",
"homologues.dataSets.name"
)
query.add_constraint("homologues.type", "NONE OF", ["horizontal gene transfer", "least diverged horizontal gene transfer"], code = "B")
query.add_constraint("Gene", "LOOKUP", "ENSMUSG00000026981,ENSMUSG00000068039,ENSMUSG00000035007,ENSMUSG00000022972,", "M. musculus", code = "A")
query.add_constraint("homologues.homologue.organism.name", "=", "Homo sapiens", code = "C")
query.add_constraint("homologues.dataSets.name", "=", "Mouse/Human Orthologies from MGI", code = "D")
for row in query.rows():
    print(row["primaryIdentifier"], row["symbol"], row["organism.name"],
          row["homologues.homologue.primaryIdentifier"],
          row["homologues.homologue.symbol"],
          row["homologues.homologue.organism.name"], row["homologues.type"],
          row["homologues.dataSets.name"])
This is the result I get:
MGI:1915251 Cfap298 Mus musculus 56683 CFAP298 Homo sapiens orthologue Mouse/Human Orthologies from MGI
MGI:2144506 Rundc1 Mus musculus 146923 RUNDC1 Homo sapiens orthologue Mouse/Human Orthologies from MGI
MGI:96547 Il1rn Mus musculus 3557 IL1RN Homo sapiens orthologue Mouse/Human Orthologies from MGI
MGI:98535 Tcp1 Mus musculus 6950 TCP1 Homo sapiens orthologue Mouse/Human Orthologies from MGI
That works fine, but I need it in a DataFrame. It would also be amazing if I could run the query from a table containing all the IDs instead of writing them out one by one (there are 14,000).
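Pandas will help you: https://pandas.pydata.org. If you run into problems, feel free to ask.
Your example is not reproducible because we cannot construct the query, but I think this should work: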
import pandas as pd
df = pd.DataFrame(list(query.rows()))
# OR
df = pd.DataFrame(query.rows(), columns=query.views)
Output:
>>> df
Gene.briefDescription Gene.description Gene.id ... Gene.homologues.homologue.organism.name Gene.homologues.type Gene.homologues.dataSets.name
0 None FUNCTION: <B>Automated description from the Al... 23666503 ... Homo sapiens orthologue Mouse/Human Orthologies from MGI
1 None FUNCTION: <B>Automated description from the Al... 23341647 ... Homo sapiens orthologue Mouse/Human Orthologies from MGI
2 None FUNCTION: <B>Automated description from the Al... 23862751 ... Homo sapiens orthologue Mouse/Human Orthologies from MGI
3 None FUNCTION: <B>Automated description from the Al... 23677242 ... Homo sapiens orthologue Mouse/Human Orthologies from MGI
[4 rows x 21 columns]
Using a loop:
rows = []
for row in query.rows():
    # Collect one dict per result row
    rows.append({
        'primaryIdentifier': row["primaryIdentifier"],
        'symbol': row["symbol"],
        'organism.name': row["organism.name"],
        'homologues.homologue.primaryIdentifier': row["homologues.homologue.primaryIdentifier"],
        'homologues.homologue.symbol': row["homologues.homologue.symbol"],
        'homologues.homologue.organism.name': row["homologues.homologue.organism.name"],
        'homologues.type': row["homologues.type"],
        'homologues.dataSets.name': row["homologues.dataSets.name"],
    })
# Build the DataFrame once; DataFrame.append was removed in pandas 2.0
df = pd.DataFrame(rows)
Note that you need to run import pandas as pd first, and make sure the data are str.
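On the second part of the question (running the query from a table of IDs instead of typing them out), here is a minimal sketch. It assumes the IDs sit in a CSV file with a column named gene_id (a hypothetical name), and it builds the same kind of comma-separated string that the LOOKUP constraint in the question takes. The helper names read_ids and lookup_batches and the batch size are illustrative only; I have not checked whether MouseMine accepts all 14,000 IDs in a single LOOKUP, so the sketch splits them into batches.

```python
import csv
import io


def read_ids(handle, column="gene_id"):
    """Return the non-empty, de-duplicated IDs from one CSV column, in file order."""
    seen = set()
    ids = []
    for record in csv.DictReader(handle):
        value = (record.get(column) or "").strip()
        if value and value not in seen:
            seen.add(value)
            ids.append(value)
    return ids


def lookup_batches(ids, batch_size=500):
    """Yield comma-separated ID strings, each holding at most batch_size IDs."""
    for start in range(0, len(ids), batch_size):
        yield ",".join(ids[start:start + batch_size])


# Stand-in for a real file; with a file on disk use open("ids.csv", newline="").
sample = io.StringIO(
    "gene_id\n"
    "ENSMUSG00000026981\n"
    "ENSMUSG00000068039\n"
    "\n"
    "ENSMUSG00000026981\n"
    "ENSMUSG00000035007\n"
)
ids = read_ids(sample)
batches = list(lookup_batches(ids, batch_size=2))
print(batches)
```

Each string in batches could then be passed as the LOOKUP value in place of the hand-written ID list, running one query per batch and collecting all rows into a single list before building the DataFrame.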