How can I parse data without using fixed index positions when some values have different lengths?

Question

I need to parse this data so that each value in the data_to_parse column ends up in its own column.

    userid          data_to_parse
0   54f3ad9a29ada   "value":"N;U;A7;W"}]
1   54f69f2de6aec   "value":"N;U;I6;W"}]
2   54f650f004474   "value":"Y;U;A7;W"}]
3   54f52e8872227   "value":"N;U;I1;W"}]
4   54f64d3075b72   "value":"Y;U;A7;W"}]

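For example, the four additional columns for the first entry would hold the values "N", "U", "A7", and "W". I first tried splitting by index position, like this: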

parsing_df['value_one'] = parsing_df['data_to_parse'].str[9:10]
parsing_df['value_two'] = parsing_df['data_to_parse'].str[11:12]
parsing_df['value_three'] = parsing_df['data_to_parse'].str[13:15]
parsing_df['value_four'] = parsing_df['data_to_parse'].str[16:17]

This works well, except that some values have a different length (for example, rows 937 and 938).

935 54f45edd13582   "value":"N;U;A7;W"}]    N   U   A7  W
936 54f4d55080113   "value":"N;C;A7;L"}]    N   C   A7  L
937 54f534614d44b   "value":"N;U;U;W"}]     N   U   U;  "
938 54f383ee53069   "value":"N;U;U;W"}]     N   U   U;  "
939 54f40656a4be4   "value":"Y;U;A1;W"}]    Y   U   A1  W
940 54f5d4e063d6a   "value":"N;U;A4;W"}]    N   U   A4  W
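
To make the drift concrete, here is a small check on two of the strings shown above. It only uses plain Python slicing with the same offsets as the attempt; the variable names `s_long` and `s_short` are made up for illustration.

```python
# Two raw strings taken from the rows above.
s_long = '"value":"N;U;A7;W"}]'   # third field is two characters wide
s_short = '"value":"N;U;U;W"}]'   # third field is one character wide

# Same fixed offsets as value_three and value_four in the attempt.
long_fields = (s_long[13:15], s_long[16:17])
short_fields = (s_short[13:15], s_short[16:17])

print(long_fields)   # the slices land on the intended fields
print(short_fields)  # everything after the short field is shifted by one
```

Because the third field is one character shorter in rows 937 and 938, every later offset is off by one, which is where the stray `U;` and `"` come from.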

Does anyone have a solution that doesn't rely on hard-coded positions?

Thanks for your help!

Answer 1


Answer 2

A relatively simple way to solve the problem:

txt = """54f45edd13582  "value":"N;U;A7;W"}]
54f4d55080113  "value":"N;C;A7;L"}]
54f534614d44b  "value":"N;U;U;W"}]
54f383ee53069  "value":"N;U;U;W"}]
54f40656a4be4  "value":"Y;U;A1;W"}]
54f5d4e063d6a  "value":"N;U;A4;W"}]
"""

import pandas as pd

# First, clean up the data: strip the closing brace, bracket, and quotes.
txt = txt.replace('}', '').replace(']', '').replace('"', '')

# Then collect one row per line (a list comprehension would also work,
# but I prefer an explicit loop here).
rows = []
for line in txt.splitlines():
    # Split on the literal 'value:' marker. strip() removes whatever
    # whitespace (spaces or a tab) separates the two fields in your data.
    userid, values = line.split('value:')
    rows.append([userid.strip()] + values.split(';'))

# Define your columns.
columns = ['userid', 'value_one', 'value_two', 'value_three', 'value_four']

# Finally, create your DataFrame.
df = pd.DataFrame(rows, columns=columns)
print(df)

Output (please excuse the formatting):

   userid         value_one  value_two  value_three  value_four
0  54f45edd13582  N          U          A7           W
1  54f4d55080113  N          C          A7           L
2  54f534614d44b  N          U          U            W
3  54f383ee53069  N          U          U            W
4  54f40656a4be4  Y          U          A1           W
5  54f5d4e063d6a  N          U          A4           W
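
If the data is already in a DataFrame, as it is in the question, the same idea can be sketched with pandas string methods instead of a loop. This is only an illustration: the small `parsing_df` below is a hypothetical reconstruction from a few of the question's rows, and it assumes every entry has the form `"value":"a;b;c;d"}]` with exactly four fields.

```python
import pandas as pd

# Hypothetical stand-in for the question's parsing_df (three sample rows).
parsing_df = pd.DataFrame({
    "userid": ["54f45edd13582", "54f534614d44b", "54f40656a4be4"],
    "data_to_parse": [
        '"value":"N;U;A7;W"}]',
        '"value":"N;U;U;W"}]',
        '"value":"Y;U;A1;W"}]',
    ],
})

# Capture the text between the quotes that follow "value":
payload = parsing_df["data_to_parse"].str.extract(r'"value":"([^"]*)"', expand=False)

# Split on the ';' delimiter, so the width of each field no longer matters.
parts = payload.str.split(";", expand=True)
parts.columns = ["value_one", "value_two", "value_three", "value_four"]

# Attach the new columns to the original frame (aligned on the index).
result = parsing_df.join(parts)
print(result)
```

Splitting on the delimiter rather than slicing by position is what makes both versions robust to the one-character values in rows 937 and 938.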

parsing

string-parsing

python-3.x

pandas