Save as dataframe python

Question

I'm really new to Python. I'm running a query, and I want the result saved as a dataframe instead of just printed in the terminal. Here is my code:

from intermine.webservice import Service

service = Service("https://www.mousemine.org/mousemine/service")
query = service.new_query("Gene")
query.add_view(
    "primaryIdentifier", "symbol", "organism.name",
    "homologues.homologue.primaryIdentifier", "homologues.homologue.symbol",
    "homologues.homologue.organism.name", "homologues.type",
    "homologues.dataSets.name"
)
query.add_constraint("homologues.type", "NONE OF", ["horizontal gene transfer", "least diverged horizontal gene transfer"], code="B")
query.add_constraint("Gene", "LOOKUP", "ENSMUSG00000026981,ENSMUSG00000068039,ENSMUSG00000035007,ENSMUSG00000022972,", "M. musculus", code="A")
query.add_constraint("homologues.homologue.organism.name", "=", "Homo sapiens", code="C")
query.add_constraint("homologues.dataSets.name", "=", "Mouse/Human Orthologies from MGI", code="D")

for row in query.rows():
    print(row["primaryIdentifier"], row["symbol"], row["organism.name"],
          row["homologues.homologue.primaryIdentifier"],
          row["homologues.homologue.symbol"],
          row["homologues.homologue.organism.name"], row["homologues.type"],
          row["homologues.dataSets.name"])

This is the result I get:

MGI:1915251 Cfap298 Mus musculus 56683 CFAP298 Homo sapiens orthologue Mouse/Human Orthologies from MGI
MGI:2144506 Rundc1 Mus musculus 146923 RUNDC1 Homo sapiens orthologue Mouse/Human Orthologies from MGI
MGI:96547 Il1rn Mus musculus 3557 IL1RN Homo sapiens orthologue Mouse/Human Orthologies from MGI
MGI:98535 Tcp1 Mus musculus 6950 TCP1 Homo sapiens orthologue Mouse/Human Orthologies from MGI

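That works fine, but I need it in a dataframe. It would also be amazing if I could run the query with a table containing all the IDs instead of writing them out one by one (there are 14,000 of them).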
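The LOOKUP constraint in the code above already takes a comma-separated string of IDs, so one way to avoid typing 14,000 IDs by hand is to read them from a table and join them into such strings in batches. A minimal sketch follows; the file name "gene_ids.csv", its "ensembl_id" column, the helper name lookup_batches and the batch size of 500 are all hypothetical placeholders, and it is an assumption that MouseMine accepts lookups of that size.

```python
def lookup_batches(ids, batch_size=500):
    """Yield comma-separated strings of at most batch_size IDs each (hypothetical helper)."""
    # Strip whitespace and skip empty entries so no blank ID ends up in the string
    clean = [s for s in (str(i).strip() for i in ids) if s]
    for start in range(0, len(clean), batch_size):
        yield ",".join(clean[start:start + batch_size])

# Hypothetical usage: "gene_ids.csv" and its "ensembl_id" column are placeholders.
# import pandas as pd
# ids = pd.read_csv("gene_ids.csv")["ensembl_id"].dropna()
# for batch in lookup_batches(ids):
#     query = service.new_query("Gene")
#     # ...same add_view and other add_constraint calls as in the question, plus:
#     query.add_constraint("Gene", "LOOKUP", batch, "M. musculus", code="A")
```

Batching keeps each request string bounded; the results of the separate queries can then be combined into one dataframe.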

Answer 1

Pandas will help you: https://pandas.pydata.org. If you run into problems, feel free to ask.
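As a rough illustration of what pandas does here: instead of printing each row, collect the rows as lists and build the DataFrame once, using the view paths as column names. The sketch below uses two rows copied from the printed output in the question; storing every value (including the human IDs) as a string is an assumption made for illustration.

```python
import pandas as pd

# Column names follow the view paths used in the question's add_view call
columns = [
    "primaryIdentifier", "symbol", "organism.name",
    "homologues.homologue.primaryIdentifier", "homologues.homologue.symbol",
    "homologues.homologue.organism.name", "homologues.type",
    "homologues.dataSets.name",
]

# Two rows copied from the printed output in the question, as plain lists
rows = [
    ["MGI:1915251", "Cfap298", "Mus musculus", "56683", "CFAP298",
     "Homo sapiens", "orthologue", "Mouse/Human Orthologies from MGI"],
    ["MGI:96547", "Il1rn", "Mus musculus", "3557", "IL1RN",
     "Homo sapiens", "orthologue", "Mouse/Human Orthologies from MGI"],
]

# Build the DataFrame in one call instead of printing row by row
df = pd.DataFrame(rows, columns=columns)
```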

Answer 2

Your example is not reproducible because we cannot construct the query, but I think this should work:

import pandas as pd

df = pd.DataFrame(list(query.rows()))

# OR

df = pd.DataFrame(query.rows(), columns=query.views)

Output:

>>> df
  Gene.briefDescription                                   Gene.description   Gene.id  ...  Gene.homologues.homologue.organism.name Gene.homologues.type     Gene.homologues.dataSets.name
0                  None  FUNCTION: <B>Automated description from the Al...  23666503  ...                             Homo sapiens           orthologue  Mouse/Human Orthologies from MGI
1                  None  FUNCTION: <B>Automated description from the Al...  23341647  ...                             Homo sapiens           orthologue  Mouse/Human Orthologies from MGI
2                  None  FUNCTION: <B>Automated description from the Al...  23862751  ...                             Homo sapiens           orthologue  Mouse/Human Orthologies from MGI
3                  None  FUNCTION: <B>Automated description from the Al...  23677242  ...                             Homo sapiens           orthologue  Mouse/Human Orthologies from MGI

[4 rows x 21 columns]
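The column names in this output carry the leading "Gene." root (for example Gene.id and Gene.homologues.type). If shorter names matching the add_view paths are preferred, they can be stripped after the fact. A small sketch using two columns and values from the output above; str.removeprefix assumes Python 3.9 or later.

```python
import pandas as pd

# Two columns and values taken from the first row of the output above
df = pd.DataFrame({"Gene.id": [23666503], "Gene.homologues.type": ["orthologue"]})

# Strip the leading "Gene." root so names match the add_view paths (Python 3.9+)
df = df.rename(columns=lambda name: name.removeprefix("Gene."))
```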

Answer 3

Using a loop:

import pandas as pd

records = []

for row in query.rows():
    # One dict per result row; the keys become the DataFrame's column names
    records.append({'primaryIdentifier': row["primaryIdentifier"],
                    'symbol': row["symbol"],
                    'organism.name': row["organism.name"],
                    'homologues.homologue.primaryIdentifier': row["homologues.homologue.primaryIdentifier"],
                    'homologues.homologue.symbol': row["homologues.homologue.symbol"],
                    'homologues.homologue.organism.name': row["homologues.homologue.organism.name"],
                    'homologues.type': row["homologues.type"],
                    'homologues.dataSets.name': row["homologues.dataSets.name"]})

# Build the DataFrame once at the end (DataFrame.append was removed in pandas 2.0)
df = pd.DataFrame(records)

Note that you need import pandas as pd, and make sure the data are strings (str).
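Once the rows are in a DataFrame, they can also be saved to disk rather than only kept in memory. A minimal sketch with a few values from the question's output; the file name "mouse_human_orthologues.csv" is a hypothetical placeholder. Reading back with dtype=str keeps IDs such as "56683" as strings, in line with the note above.

```python
import os
import tempfile

import pandas as pd

df = pd.DataFrame({
    "primaryIdentifier": ["MGI:1915251", "MGI:98535"],
    "symbol": ["Cfap298", "Tcp1"],
    "homologues.homologue.primaryIdentifier": ["56683", "6950"],
})

# Write the table to disk; index=False keeps pandas' 0..n-1 row index out of the file
out_path = os.path.join(tempfile.mkdtemp(), "mouse_human_orthologues.csv")
df.to_csv(out_path, index=False)

# Read it back; dtype=str keeps IDs like "56683" as text rather than integers
restored = pd.read_csv(out_path, dtype=str)
```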
