Python: split single column based on conditions
I have a CSV that really should have been JSON, and I'm trying to sort it into multiple columns.
It could be JSON like this (if that helps):
```json
{"username": "jane.doe@gmail.com",
 "app": [
   {"appid": "123456",
    "appname": "apppname",
    "scopes": ["scope1", "scope2"]},
   {"appid": "23456",
    "appname": "apppname2",
    "scopes": ["scope1", "scope2"]}
 ]}
{"username": "john.doe@gmail.com",
 ...}
```
Here is the data:
Value |
---|
User: jane.doe@gmail.com |
Client ID: CI1 |
anonymous: False |
displayText: app1 |
nativeApp: False |
userKey: uk1 |
scopes: |
http://scope1.com |
http://scope2.com |
Client ID: CI2 |
anonymous: False |
displayText: app2 |
nativeApp: False |
userKey: uk2 |
scopes: |
http://scopeapp2-1.com |
http://scopeapp2-1.com |
And so on: a user can have any number of apps, and each app can have several scopes.

Expected output:
User | anonymous | displayText | nativeApp | scopes | Client_id | userKey |
---|---|---|---|---|---|---|
jane.doe@gmail.com | false | app1 | false | http://scope1.com http://scope2.com | CI1 | UK1 |
jane.doe@gmail.com | false | app2 | false | https://scopeapp2-1.com http://scopeapp2-2.com | CI2 | UK2 |
I got it working, but my code feels a bit ugly, and I wonder whether you have a better idea:
```python
for index, row in df.iterrows():
    if 'User' in df.at[index,'value']:
        x=index
        df.at[x,'User']=df.at[index,'value']
    elif 'Client ID' in df.at[index,'value']:
        df.at[x,'Client_ID']=df.at[index,'value']
        x=x+1
    elif 'anonymous' in df.at[index,'value']:
        df.at[x,'anonymous']=df.at[index,'value']
    elif 'displayText' in df.at[index,'value']:
        df.at[x,'displayText']=df.at[index,'value']
    elif 'nativeApp' in df.at[index,'value']:
        df.at[x,'nativeApp']=df.at[index,'value']
    elif 'userKey' in df.at[index,'value']:
        df.at[x,'userKey']=df.at[index,'value']
    elif 'http' in df.at[index,'value']:
        df.at[x,'scopes']=df.at[x,'scopes'] + ' ' +df.at[index,'value']
```
Afterwards I drop the empty rows.

Is there a better way to do this? All these `elif`s are not very clean...

Any help would be appreciated.
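One way to avoid the per-field `elif` chain is to split each line once on the first `": "` and branch only on structure. A new user starts at `User`, a new app starts at `Client ID`, and every other `key: value` pair is stored under its own key. This is a minimal sketch. It assumes the values arrive as plain strings in file order (for example from the frame's value column), that each app block begins with `Client ID`, and that scope URLs are the only lines without `": "`. The function name `parse_rows` is hypothetical.

```python
def parse_rows(lines):
    """Hypothetical helper: turn flat 'key: value' lines into one dict per app."""
    rows = []
    user = None
    current = None  # the app record currently being filled
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # ignore blank lines
        key, sep, value = line.partition(": ")
        if not sep:
            # No "key: value" separator: the "scopes:" header or a scope URL.
            if line != "scopes:" and current is not None:
                current["scopes"].append(line)
            continue
        if key == "User":
            user = value
        elif key == "Client ID":
            # Start a new app row and copy the owning user onto it.
            current = {"User": user, "Client_ID": value, "scopes": []}
            rows.append(current)
        elif current is not None:
            current[key] = value  # anonymous, displayText, nativeApp, userKey, ...
    # Join each app's scopes with spaces, as in the expected output.
    for row in rows:
        row["scopes"] = " ".join(row["scopes"])
    return rows
```

With pandas, the table would then be `pd.DataFrame(parse_rows(df['Value']))`. This assumes the column is named `Value` as in the data table; the attempt above uses `'value'`. Unknown keys automatically become new columns, so adding a field needs no extra branch.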
I assume you mean you have a CSV file.
If you can rely on the structure (one user, 1 to N Client ID sections, each with a scopes section holding 1..N URLs), you can do this:
```python
if __name__ == '__main__':
    from itertools import islice
    from pprint import pprint

    def fieldv(line):
        # Value part of a "key: value" line (text after the last ':').
        return line.rsplit(':', 1)[1].strip()

    users = []
    client_data = []
    user_record = None
    scopes = []
    with open(..., 'r') as infile:
        # The walrus operator needs Python 3.8+.
        while line := infile.readline():
            if line.startswith('User'):
                user = fieldv(line)
                client_data = []
                user_record = {'User': user, 'client_data': client_data}
                users.append(user_record)
            elif line.startswith('http://'):
                scopes.append(line.strip())
            else:
                # A "Client ID" line: the next 5 lines are anonymous, displayText,
                # nativeApp, userKey and the "scopes:" header.
                d = list(islice(infile, 5))
                scopes = []
                app = {'Client ID': fieldv(line),
                       'anonymous': fieldv(d[0]),
                       # other fields d[1], d[2]...,
                       'scopes': scopes}
                client_data.append(app)

    pprint(users)
```
Printing the list of users for the provided data gives:
```
[{'User': 'jane.doe@gmail.com',
  'client_data': [{'Client ID': 'CI1',
                   'anonymous': 'False',
                   'scopes': ['http://scope1.com', 'http://scope2.com']},
                  {'Client ID': 'CI2',
                   'anonymous': 'False',
                   'scopes': ['http://scopeapp2-1.com',
                              'http://scopeapp2-1.com']}]}]
```
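The `users` list above is nested: one record per user, with the apps under `client_data`. The question asks for one row per app, so a small flattening step is still needed. Here is a sketch of it. The helper name `flatten_users` is hypothetical, and the sample data is the printed output above.

```python
def flatten_users(users):
    """Return a list with one flat dict per app, copying the owning user onto each row."""
    rows = []
    for record in users:
        for app in record["client_data"]:
            row = {"User": record["User"], **app}  # new dict, so `app` is not modified
            # Match the question's expected output: scopes as one space-separated string.
            row["scopes"] = " ".join(app["scopes"])
            rows.append(row)
    return rows


users = [{'User': 'jane.doe@gmail.com',
          'client_data': [{'Client ID': 'CI1', 'anonymous': 'False',
                           'scopes': ['http://scope1.com', 'http://scope2.com']},
                          {'Client ID': 'CI2', 'anonymous': 'False',
                           'scopes': ['http://scopeapp2-1.com', 'http://scopeapp2-1.com']}]}]

for row in flatten_users(users):
    print(row)
```

The result can be passed straight to `pd.DataFrame(...)` to get the table from the question.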
Your file is very close to YAML: insert the missing indentation and list markers, then use `json_normalize()`.
Loading is then simple:
```python
import pandas as pd
import io
from pathlib import Path
import yaml

raw = """User: jane.doe@gmail.com
Client ID: CI1
anonymous: False
displayText: app1
nativeApp: False
userKey: uk1
scopes:
http://scope1.com
http://scope2.com
Client ID: CI2
anonymous: False
displayText: app2
nativeApp: False
userKey: uk2
scopes:
http://scopeapp2-1.com
http://scopeapp2-1.com"""

fn = Path.cwd().joinpath("so.yaml")
with io.StringIO(raw) as f, open(fn, "w") as fw:
    while True:
        suffix = ""
        l = f.readline()
        if not l: break
        elif l.startswith("User:"):
            # top-level key, followed by the list of apps
            prefix = ""
            suffix = "\napp:"
        elif l.startswith("Client ID:"): prefix = "  - "        # new app item
        elif (" " in l) or l.startswith("scopes:"): prefix = "    "  # field of the app
        else: prefix = "      - "                                # scope URL item
        fw.write(f"{prefix}{l.strip()}{suffix}\n")

with open(fn) as f: myyaml = yaml.safe_load(f)
pd.json_normalize(myyaml, record_path="app", meta="User")
```
| | Client ID | anonymous | displayText | nativeApp | userKey | scopes | User |
|---|---|---|---|---|---|---|---|
| 0 | CI1 | False | app1 | False | uk1 | ['http://scope1.com', 'http://scope2.com'] | jane.doe@gmail.com |
| 1 | CI2 | False | app2 | False | uk2 | ['http://scopeapp2-1.com', 'http://scopeapp2-1.com'] | jane.doe@gmail.com |
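The conversion above writes a temporary `so.yaml` and holds one user per document. With several users, the generated text would repeat the top-level `User` and `app` keys, which is not a valid way to store several records in YAML. Below is a sketch of a variant. It assumes that each user block starts with `User:`, each app starts with `Client ID:`, and scope URLs are the only lines without `": "`. It builds the text in memory and makes every user an item of a top-level list. The function name `to_yaml` is hypothetical.

```python
def to_yaml(raw):
    """Hypothetical helper: indent flat 'key: value' lines into a YAML list of users."""
    out = []
    for line in raw.splitlines():
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith("User:"):
            out.append(f"- {line}")          # new user = new top-level list item
            out.append("  app:")             # that user's list of apps
        elif line.startswith("Client ID:"):
            out.append(f"    - {line}")      # new app = new item under 'app'
        elif ": " in line or line == "scopes:":
            out.append(f"      {line}")      # another field of the current app
        else:
            out.append(f"        - {line}")  # a scope URL under 'scopes'
    return "\n".join(out) + "\n"
```

If PyYAML and pandas are installed, loading becomes `pd.json_normalize(yaml.safe_load(to_yaml(raw)), record_path="app", meta="User")`. Because `safe_load` then returns a list of user dicts, `json_normalize` still produces one row per app, and no temporary file is needed.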