Pandas: Separate two columns with missing separator
I have data like this:
00052600150.00942615
00052601000.01014910
00052601050.02709672
00052601100.11454732
00052601150.23151254
00052601200.36262522
00052601250.66432348
00052601301.07723763
00052601351.26019487
00052601401.20568581
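The first 10 digits are the timestep in YYMMDDhhmm format, followed by a number.
It should be 0005260010,0.00799872, where the first block is the timestep and the second block is the measured value.
I tried reading the data with pandas and converting it to str, but then I lose the leading zeros. Is there a way to separate the float from the digits?
Regards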
A regular expression with pandas can split your column even though it has no separator:
import pandas as pd

# sample data
df = pd.DataFrame({'A': [
    '00052600150.00942615',
    '00052601000.01014910',
    '00052601050.02709672',
    '00052601100.11454732',
    '00052601150.23151254',
    '00052601200.36262522',
    '00052601250.66432348',
    '00052601301.07723763',
    '00052601351.26019487',
    '00052601401.20568581',
]})

# one capture group per two-digit date/time field, then the float reading
df3 = df['A'].str.extract(
    r'(\d{2})(\d{2})(\d{2})(\d{2})(\d{2})(\d\.\d*)',
    expand=True)
df3.columns = ['Year', 'Month', 'Day', 'Hour', 'Minute', 'Reading']
print(df3)
Output:
   Year  Month  Day  Hour  Minute     Reading
0    00     05   26    00      15  0.00942615
1    00     05   26    01      00  0.01014910
2    00     05   26    01      05  0.02709672
3    00     05   26    01      10  0.11454732
4    00     05   26    01      15  0.23151254
5    00     05   26    01      20  0.36262522
6    00     05   26    01      25  0.66432348
7    00     05   26    01      30  1.07723763
8    00     05   26    01      35  1.26019487
9    00     05   26    01      40  1.20568581
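If you would rather end up with a real timestamp and a numeric reading than six string columns, something along these lines might work. This is only a sketch: the group names `stamp` and `reading` and the variable `result` are made up for illustration, and it assumes every row is exactly 10 digits followed by a decimal number.

```python
import pandas as pd

df = pd.DataFrame({'A': [
    '00052600150.00942615',
    '00052601301.07723763',
]})

# Two named groups: the 10-digit YYMMDDhhmm stamp and the trailing float.
# Named groups become the column names of the extracted frame.
parts = df['A'].str.extract(r'^(?P<stamp>\d{10})(?P<reading>\d+\.\d+)$')

result = pd.DataFrame({
    # %y is a two-digit year, so '00' is interpreted as 2000
    'time': pd.to_datetime(parts['stamp'], format='%y%m%d%H%M'),
    'reading': parts['reading'].astype(float),
})
print(result)
```

Rows that do not match the pattern come back as NaN from `str.extract`, so it is worth checking for missing values before converting if your file may contain malformed lines.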
You can read the column as str and split the values by position. The first 10 characters are the YYMMDDhhmm timestep, so parse them with an explicit format:
import pandas as pd

df = pd.read_csv('yourfile.csv', header=None, dtype='str', names=['col1'])
# first 10 characters: YYMMDDhhmm (the two-digit year 00 is parsed as 2000)
df['time'] = pd.to_datetime(df.col1.str[:10], format='%y%m%d%H%M')
# remaining characters: the measured value
df['value'] = df.col1.str[10:].astype('float')
df
Output:
                   col1                 time     value
0  00052600150.00942615  2000-05-26 00:15:00  0.009426
1  00052601000.01014910  2000-05-26 01:00:00  0.010149
2  00052601050.02709672  2000-05-26 01:05:00  0.027097
3  00052601100.11454732  2000-05-26 01:10:00  0.114547
4  00052601150.23151254  2000-05-26 01:15:00  0.231513
5  00052601200.36262522  2000-05-26 01:20:00  0.362625
6  00052601250.66432348  2000-05-26 01:25:00  0.664323
7  00052601301.07723763  2000-05-26 01:30:00  1.077238
8  00052601351.26019487  2000-05-26 01:35:00  1.260195
9  00052601401.20568581  2000-05-26 01:40:00  1.205686
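Since the question asks for the form timestep,value with the leading zeros kept, here is a small sketch of the same positional split that stays in strings all the way through and writes the two blocks out comma-separated. The in-memory `raw` buffer stands in for your file, and the column names `timestep` and `value` are just placeholders.

```python
import io
import pandas as pd

# Stand-in for the real file: two of the sample lines from the question.
raw = io.StringIO("00052600150.00942615\n00052601000.01014910\n")

# dtype=str keeps each line as text; parsed as a float it would lose the leading zeros.
df = pd.read_csv(raw, header=None, dtype=str, names=['col1'])

out = pd.DataFrame({
    'timestep': df['col1'].str[:10],   # first 10 characters: YYMMDDhhmm
    'value': df['col1'].str[10:],      # the rest: the measurement, still as text
})

# With no path given, to_csv returns the CSV text instead of writing a file.
csv_text = out.to_csv(index=False, header=False)
print(csv_text)
```

Keeping both blocks as strings avoids any reformatting of the numbers on the way out; convert to datetime or float only where you actually need to compute with them.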