Python: split single column based on conditions

Question

I have a CSV that really should have been JSON, and I am trying to sort it out into multiple columns.

In case it helps, it could be JSON like this:

{"username": "jane.doe@gmail.com",
   "app": [
        {"appid": "123456",
        "appname": "apppname",
        "scopes": ["scope1", "scope2"]},
        {"appid": "23456",
        "appname": "apppname2",
        "scopes": ["scope1", "scope2"]}]},
{"username": "john.doe@gmail.com",
   ...}

Here is the data:

Value
User: jane.doe@gmail.com
Client ID: CI1
anonymous: False
displayText: app1
nativeApp: False
userKey: uk1
scopes:
http://scope1.com
http://scope2.com
Client ID: CI2
anonymous: False
displayText: app2
nativeApp: False
userKey: uk2
scopes:
http://scopeapp2-1.com
http://scopeapp2-1.com

And so on: a user can have any number of apps, and an app can have multiple scopes. Expected output:

User	anonymous	displayText	nativeApp	scopes	Client_id	userKey
jane.doe@gmail.com	false	app1	false	http://scope1.com http://scope2.com	CI1	UK1
jane.doe@gmail.com	false	app2	false	https://scopeapp2-1.com http://scopeapp2-2.com	CI2	UK2

I got it working, but my code feels a bit ugly, and I am wondering whether you have a better idea:

for index, row in df.iterrows():
    if 'User' in df.at[index,'value']:
        x=index
        df.at[x,'User']=df.at[index,'value']
    elif 'Client ID' in df.at[index,'value']:
        df.at[x,'Client_ID']=df.at[index,'value']
        x=x+1
    elif 'anonymous' in df.at[index,'value']:
        df.at[x,'anonymous']=df.at[index,'value']       
    elif 'displayText' in df.at[index,'value']:
        df.at[x,'displayText']=df.at[index,'value']   
    elif 'nativeApp' in df.at[index,'value']:
        df.at[x,'nativeApp']=df.at[index,'value']   
    elif 'userKey' in df.at[index,'value']:
        df.at[x,'userKey']=df.at[index,'value']
    elif 'http' in df.at[index,'value']:
        df.at[x,'scopes']=df.at[x,'scopes'] + ' ' +df.at[index,'value']

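Afterwards I drop the empty rows. I am wondering whether there is a better way to do this; all those elif branches are not very clean...

Any help would be appreciated.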

Answer 1

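I assume you mean that you have a CSV file.

If you can rely on the structure, i.e. one user, followed by 1 to N Client ID sections, each with a scopes section containing 1..N URLs, you can do this: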

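if __name__ == '__main__':
    from itertools import islice
    from pprint import pprint


    def fieldv(line):
        # Return the text after the last colon, e.g. 'Client ID: CI1' -> 'CI1'
        return line.rsplit(':', 1)[1].strip()


    users = []
    client_data = []
    user_record = None
    scopes = []
    with open(..., 'r') as infile:
        while line := infile.readline():
            if line.startswith('User'):
                # A new user starts a fresh list of client records
                user = fieldv(line)
                client_data = []
                user_record = {'User': user, 'client_data': client_data}
                users.append(user_record)
            elif line.startswith(('http://', 'https://')):
                # Scope URLs belong to the most recent client record
                scopes.append(line.strip())
            elif line.startswith('Client ID'):
                # The next 5 lines are anonymous, displayText, nativeApp,
                # userKey and the bare 'scopes:' marker
                d = list(islice(infile, 5))
                scopes = []
                app = {'Client ID': fieldv(line),
                       'anonymous': fieldv(d[0]),
                       # other fields d[1], d[2]...,
                       'scopes': scopes}
                client_data.append(app)
            # Anything else (e.g. the 'Value' header line) is skipped

    pprint(users)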

Printing the users list with the provided data gives:

[{'User': 'jane.doe@gmail.com',
  'client_data': [{'Client ID': 'CI1',
                   'anonymous': 'False',
                   'scopes': ['http://scope1.com', 'http://scope2.com']},
                  {'Client ID': 'CI2',
                   'anonymous': 'False',
                   'scopes': ['http://scopeapp2-1.com',
                              'http://scopeapp2-1.com']}]}]
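
To get from this nested list to the flat, one-row-per-app layout the question asks for, something along these lines might work. It is only a sketch: `flatten_users` is a hypothetical helper name, the sample `users` list just repeats the output shown above, and it assumes every app dict carries a `scopes` list.

```python
# Sample shaped like the list printed above (only the fields shown there).
users = [{'User': 'jane.doe@gmail.com',
          'client_data': [{'Client ID': 'CI1',
                           'anonymous': 'False',
                           'scopes': ['http://scope1.com', 'http://scope2.com']},
                          {'Client ID': 'CI2',
                           'anonymous': 'False',
                           'scopes': ['http://scopeapp2-1.com',
                                      'http://scopeapp2-1.com']}]}]


def flatten_users(users):
    """Hypothetical helper: one flat dict per (user, app) pair."""
    rows = []
    for user in users:
        for app in user['client_data']:
            row = {'User': user['User']}
            # Copy every scalar field of the app as its own column
            row.update({k: v for k, v in app.items() if k != 'scopes'})
            # Join the scope URLs into one space-separated string
            row['scopes'] = ' '.join(app['scopes'])
            rows.append(row)
    return rows


rows = flatten_users(users)
```

A list of dicts like `rows` can be passed straight to `pd.DataFrame(rows)` if a DataFrame is needed.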

Answer 2

Your file is very close to YAML. Insert the missing indentation and list markers, then use json_normalize().

Loading is then simple:

import pandas as pd
import io
from pathlib import Path
import yaml

raw = """User: jane.doe@gmail.com
Client ID: CI1
anonymous: False
displayText: app1
nativeApp: False
userKey: uk1
scopes:
http://scope1.com
http://scope2.com
Client ID: CI2
anonymous: False
displayText: app2
nativeApp: False
userKey: uk2
scopes:
http://scopeapp2-1.com
http://scopeapp2-1.com"""

fn = Path.cwd().joinpath("so.yaml")
with io.StringIO(raw) as f, open(fn, "w") as fw:
    for line in f:
        suffix = ""
        if line.startswith("User:"):
            # Top-level key, followed by the key that holds the list of apps
            prefix = ""
            suffix = "\napp:"
        elif line.startswith("Client ID:"):
            # Each Client ID starts a new list item
            prefix = "  - "
        elif " " in line or line.startswith("scopes:"):
            # Remaining "key: value" fields belong to the current app
            prefix = "    "
        else:
            # Bare URLs become items of the scopes list
            prefix = "    - "
        fw.write(f"{prefix}{line.strip()}{suffix}\n")

with open(fn) as f:
    myyaml = yaml.safe_load(f)

pd.json_normalize(myyaml, record_path="app", meta="User")

	Client ID	anonymous	displayText	nativeApp	userKey	scopes	User
0	CI1	False	app1	False	uk1	['http://scope1.com', 'http://scope2.com']	jane.doe@gmail.com
1	CI2	False	app2	False	uk2	['http://scopeapp2-1.com', 'http://scopeapp2-1.com']	jane.doe@gmail.com
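
Note that the conversion above writes one top-level `User:`/`app:` pair per user, so a file with several users would contain repeated keys in a single YAML mapping. A possible alternative is a small single-pass parser that needs no YAML step and no long elif chain. The following is only a sketch: `parse_records` is a hypothetical name, the second user and its values (`CI3`, `app3`, `uk3`, `http://scope3.example`) are made up for illustration, and it assumes every scope line starts with `http://` or `https://`.

```python
import io

# Sample input: the question's data plus a HYPOTHETICAL second user.
SAMPLE = """Value
User: jane.doe@gmail.com
Client ID: CI1
anonymous: False
displayText: app1
nativeApp: False
userKey: uk1
scopes:
http://scope1.com
http://scope2.com
Client ID: CI2
anonymous: False
displayText: app2
nativeApp: False
userKey: uk2
scopes:
http://scopeapp2-1.com
http://scopeapp2-1.com
User: john.doe@gmail.com
Client ID: CI3
anonymous: False
displayText: app3
nativeApp: False
userKey: uk3
scopes:
http://scope3.example
"""


def parse_records(lines):
    """Hypothetical helper: return one flat dict per Client ID section."""
    rows = []
    user = None   # most recent 'User:' value
    row = None    # record currently being filled
    for raw_line in lines:
        line = raw_line.strip()
        if not line or line == 'Value':
            continue  # skip blank lines and the column header
        if line.startswith(('http://', 'https://')):
            if row is not None:
                row['scopes'].append(line)
            continue
        key, _, value = line.partition(':')
        value = value.strip()
        if key == 'User':
            user, row = value, None
        elif key == 'Client ID':
            # Each Client ID opens a new output row for the current user
            row = {'User': user, 'Client_id': value, 'scopes': []}
            rows.append(row)
        elif key != 'scopes' and row is not None:
            # Any other 'key: value' line becomes a column of the same name
            row[key] = value
    for r in rows:
        # Match the expected output: scopes as one space-separated string
        r['scopes'] = ' '.join(r['scopes'])
    return rows


rows = parse_records(io.StringIO(SAMPLE))
```

Because unknown keys are stored under their own name, new fields in the input do not need a new branch. `pd.DataFrame(rows)` turns the result into a table with one row per app.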
