Jupyter and CSV files

I have some CSV files that I want to analyze, but they are not well-formed CSV files: whoever created them did not keep a consistent format throughout the file.

For the most part they follow a standard layout, with the data arranged in columns, but occasionally a row contains some error/warning/info text.

So column 1 holds the date and time, and columns 2 through n hold the data, but every so often column 2 contains error/warning/info text and the remaining columns are empty.

I can easily exclude these rows from the CSV and analyze the data, but I would like to extract these rows and store them in a separate DataFrame.

However, I'm struggling to do this in a simple way. Am I missing a trick here? Is there a simple way to separate out these rows from a CSV file in Jupyter?

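The code sample below reproduces the data as you describe it. After read_csv(), the split is simple: test whether the second column is numeric or not.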

from pathlib import Path

import pandas as pd

csv = """date,seq,val0,val1,val2
2021-01-01,1.0,0.919113692407093,1.6229411332628496,0.24659242223048927
2021-01-02,11.473684210526315,0.07253225286428067,0.5829646480126915,0.8417325582368181
2021-01-03,21.94736842105263,0.32438619968096405,1.4561059102864153,0.09907995077630782
2021-01-04,32.421052631578945,0.7926071257043146,1.7922407755587069,0.398524618028244
2021-01-05,42.89473684210526,0.2157414433351048,0.42316983774076333,0.26429821215433835
2021-01-06,53.368421052631575,0.5880798026850204,0.30631991278000203,2.157299668724619
2021-01-07,63.84210526315789,0.05680775379053116,0.09056762487241565,1.8432282529150985
2021-01-08,74.3157894736842,0.8638058796950695,0.956874782181419,0.560113292182499
2021-01-09,84.78947368421052,0.8578723804393844,1.3962261744237703,1.8002069590315575
2021-01-01,Adipisci etincidunt quiquia consectetur numquam dolorem aliquam.
2021-01-10,95.26315789473684,0.1842964050114777,0.9421910982208783,1.1097524348417385
2021-01-11,105.73684210526315,0.26926150072049215,0.3263406301607237,0.8337896257615581
2021-01-03,Aliquam neque porro est.
2021-01-04,Quisquam labore dolorem amet dolore.
2021-01-12,116.21052631578947,0.1487208436794849,1.9384707893168265,1.1932374325424484
2021-01-13,126.68421052631578,0.9738540881030379,1.2959312690277112,1.9354291047422771
2021-01-14,137.15789473684208,0.1420363534166592,0.6564997473347189,0.7491839162744267
2021-01-02,Magnam modi voluptatem quaerat.
2021-01-05,Neque dolor dolore quisquam dolor ut.
2021-01-06,Dolorem porro aliquam quiquia.
2021-01-07,Sit modi adipisci porro porro eius ipsum quisquam.
2021-01-15,147.6315789473684,0.9961022971940973,0.13346940964659093,2.4870460594816794
2021-01-16,158.10526315789474,0.8866086488360403,1.7565870140977553,2.7345560454964826
2021-01-17,168.57894736842104,0.27548274054720157,1.0466205997810067,2.146515617796502
2021-01-18,179.05263157894734,0.5564778653140571,1.0674809651747388,2.1899218384075683
2021-01-19,189.52631578947367,0.20504429969811966,0.2887690704253574,0.005236244550076985
2021-01-20,200.0,0.15569496004718852,0.28625583495153517,1.3681772459983979
2021-01-08,Dolorem tempora dolor consectetur.
2021-01-09,Velit ipsum consectetur neque modi magnam quaerat.
2021-01-10,Dolor quaerat sit sit dolorem sit amet dolore.
"""

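# Write the sample to disk so it can be read back like a real file
fname = Path.cwd().joinpath("mixed.csv")
with open(fname, "w") as f:
    f.write(csv)
df = pd.read_csv(fname)

# Rows whose "seq" value is not numeric are the error/warning/info rows
mask = pd.to_numeric(df["seq"], errors="coerce").isna()
# Data rows: convert "seq" from string to float
dfdata = df.loc[~mask].assign(seq=lambda d: d["seq"].astype(float))
# Message rows: drop the columns that are entirely empty
dfmsg = df.loc[mask].pipe(lambda d: d.drop(columns=[c for c in d.columns if d[c].isna().all()]))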
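
If you have several files like this, the same split could be wrapped in a small helper. The following is only a sketch under some assumptions: the function name split_mixed_csv, the key and date_col parameters, and the four-line SAMPLE are all made up for illustration, and it reads from an in-memory string via io.StringIO rather than a file on disk. Note that a data row with an empty second column would also be treated as a message row, since it does not parse as a number.

```python
import io

import pandas as pd

# Hypothetical miniature of the mixed file: data rows plus text-only rows.
SAMPLE = """date,seq,val0,val1
2021-01-01,1.0,0.5,1.5
2021-01-02,Something went wrong.
2021-01-03,2.0,0.25,0.75
2021-01-04,Warning: sensor offline.
"""


def split_mixed_csv(source, key="seq", date_col="date"):
    """Return (data, messages), split on whether `key` parses as a number."""
    df = pd.read_csv(source)
    # True where the key column cannot be read as a number (text or empty).
    is_msg = pd.to_numeric(df[key], errors="coerce").isna()
    # Data rows: convert the key column from strings to floats.
    data = df.loc[~is_msg].astype({key: float}).reset_index(drop=True)
    # Message rows: keep only the date and the text, under a clearer name.
    messages = (
        df.loc[is_msg, [date_col, key]]
        .rename(columns={key: "message"})
        .reset_index(drop=True)
    )
    return data, messages


data, messages = split_mixed_csv(io.StringIO(SAMPLE))
```

Passing a path such as "mixed.csv" instead of the StringIO object should work the same way, since read_csv() accepts both.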