Preparing data in Python: one column separated in two by a text line
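I have a file exported from Spike2 as .txt that contains two signals of the same length. I import the file with pandas.read_csv.
The file consists of 19 lines of text, after which my signal values begin in a single column. Partway through there are two lines of text, and then the values of my second signal begin. It follows this pattern: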
"text'.........."
"text'.........."
...
...
"text'.........."
"text'.........."
1.5
2.71
...
...
...
0.56
"text'.........."
1.98
0.567
...
...
...
6.89
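I want to separate my two signals automatically, plot them one above the other (sharing the x axis), and plot a spectrogram of each signal.
So far I have not found an easy way to separate the two signals.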
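Data munging with Pandas is fun.
You can do this in a few steps:
1. When reading the file, use the skiprows= and header=None parameters of pd.read_csv() to ignore the first few lines.
2. Remove all remaining text lines with pd.to_numeric() and df.dropna().
3. Split the column halfway by slicing at len(df) // 2, then put the second half into another column with pd.concat().
4. Assuming you have matplotlib, just call df.plot() to display the result.
Example: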
%matplotlib inline
import pandas as pd
from io import StringIO
text_file = '''text line
text line
text line
text line
text line
text line
1.5
2.71
0.567
2.71
2.71
0.56
text line
1.98
0.567
1.98
2.71
0.56
6.89'''
# Read in the data, one value per line, and skip the first n lines of text.
# StringIO(text_file) is for example only.
# Normally, you would use pd.read_csv('/path/to/file.csv', ...)
# Recent pandas versions reject sep='\n'; the default separator works here
# because none of the remaining lines contains a comma.
df = pd.read_csv(StringIO(text_file), header=None, skiprows=6)
print('Two signals:')
print(df)
print()
# Anything that is not a number (the text line) becomes NaN
print('Force to numbers:')
df = df.apply(pd.to_numeric, errors='coerce')
print(df)
print()
print('Remove NaNs:')
df = df.dropna().reset_index(drop=True)
print(df)
print()
# You should have 2 equal length signals, one after the other, so split half way
print('Split into two columns:')
half = len(df) // 2  # integer division, so the result is a valid slice index
s1 = df[:half].reset_index(drop=True)
s2 = df[half:].reset_index(drop=True)
df = pd.concat([s1, s2], axis=1)
df.columns = ['sig1', 'sig2']
print(df)
print()
# Plot, assuming you have the matplotlib library
df.plot()
Two signals:
0
0 1.5
1 2.71
2 0.567
3 2.71
4 2.71
5 0.56
6 text line
7 1.98
8 0.567
9 1.98
10 2.71
11 0.56
12 6.89
Force to numbers:
0
0 1.500
1 2.710
2 0.567
3 2.710
4 2.710
5 0.560
6 NaN
7 1.980
8 0.567
9 1.980
10 2.710
11 0.560
12 6.890
Remove NaNs:
0
0 1.500
1 2.710
2 0.567
3 2.710
4 2.710
5 0.560
6 1.980
7 0.567
8 1.980
9 2.710
10 0.560
11 6.890
Split into two columns:
sig1 sig2
0 1.500 1.980
1 2.710 0.567
2 0.567 1.980
3 2.710 2.710
4 2.710 0.560
5 0.560 6.890
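The halfway split depends on both signals having exactly the same number of samples. As a variation, here is a minimal sketch that starts a new signal wherever a non-numeric line interrupts the numbers, using only the standard library. The helper name split_signals and the sample text are hypothetical, made up for illustration.

```python
def split_signals(lines):
    """Group consecutive numeric lines into separate signals.

    Any line that does not parse as a float closes the current signal.
    """
    signals = []
    current = []
    for line in lines:
        try:
            current.append(float(line.strip()))
        except ValueError:
            # Text (or blank) line: close the signal collected so far, if any
            if current:
                signals.append(current)
                current = []
    if current:
        signals.append(current)
    return signals


# Made-up sample following the pattern described in the question
sample = """text line
text line
1.5
2.71
0.56
text line
1.98
0.567
6.89"""

sig1, sig2 = split_signals(sample.splitlines())
print(sig1)  # [1.5, 2.71, 0.56]
print(sig2)  # [1.98, 0.567, 6.89]
```

With a real file you could pass the open file object instead of sample.splitlines(). Note that float() also accepts strings such as "nan" or "inf", so header lines consisting only of such a word would be counted as data.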
The spectrogram will have to wait...
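For the stacked plot and the spectrograms the question asks about, one possible sketch uses matplotlib's Axes.specgram on each column of a two-column DataFrame like the one built above. This is not part of the original answer. The function name plot_signals_and_spectrograms, the sampling rate fs and the synthetic sine waves are all assumptions; replace fs with the real sampling rate of your recording.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt


def plot_signals_and_spectrograms(df, fs):
    """Plot each column of df as a trace with its spectrogram underneath.

    All panels are created with a shared x axis (time in seconds).
    """
    n = len(df.columns)
    fig, axes = plt.subplots(2 * n, 1, sharex=True, figsize=(8, 3 * n))
    t = np.arange(len(df)) / fs  # sample index -> seconds
    for i, name in enumerate(df.columns):
        trace_ax, spec_ax = axes[2 * i], axes[2 * i + 1]
        values = df[name].to_numpy()
        trace_ax.plot(t, values)
        trace_ax.set_ylabel(name)
        # specgram computes short-time FFTs; NFFT is the window length in samples
        spec_ax.specgram(values, NFFT=256, Fs=fs, noverlap=128)
        spec_ax.set_ylabel("Hz")
    axes[-1].set_xlabel("time (s)")
    return fig, axes


# Synthetic stand-in for the real recording (assumed values)
fs = 1000.0  # assumed sampling rate in Hz
t = np.arange(2000) / fs
demo = pd.DataFrame({
    "sig1": np.sin(2 * np.pi * 50 * t),
    "sig2": np.sin(2 * np.pi * 120 * t),
})
fig, axes = plot_signals_and_spectrograms(demo, fs)
```

In a notebook with %matplotlib inline the figure is displayed when the cell finishes; in a script, call plt.show() or fig.savefig(...) afterwards. The window length of 256 samples is only a starting point: longer windows give finer frequency resolution at the cost of time resolution.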