Preparing data in Python: one column split into two by text lines

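I have a file exported from Spike2 as .txt that contains two signals of the same length. I import the file with pandas.read_csv. The file starts with 19 lines of text, after which my signal values begin in a single column. Partway through there are two lines of text, and then the values of my second signal begin. The pattern looks like this: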

"text'.........."
"text'.........."
...
...
"text'.........."
"text'.........."
1.5
2.71
...
...
...
0.56
"text'.........."
1.98
0.567
...
...
...
6.89

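I want to separate my two signals automatically, plot them stacked one above the other (sharing the x axis), and plot a spectrogram of each signal.

So far, though, I have not found an easy way to separate the two signals.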

Data munging with pandas is fun.

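You can do this in a few steps:

  1. Skip the leading text lines when reading the file, using the skiprows= and header=None parameters of pd.read_csv().

  2. Remove any remaining text lines with pd.to_numeric() and df.dropna().

  3. Split the data at the midpoint by slicing at len(df) // 2, then put the two halves side by side with pd.concat().

  4. Assuming you have matplotlib installed, just call df.plot() to display the result.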

Example:

# In a Jupyter notebook, run "%matplotlib inline" first so the plot is displayed
import pandas as pd
from io import StringIO

text_file = '''text line
text line
text line
text line
text line
text line
1.5
2.71
0.567
2.71
2.71
0.56
text line
1.98
0.567
1.98
2.71
0.56
6.89'''

# Read the data, one value per line, and skip the first n lines.
# The default separator is fine as long as the text lines contain no commas
# (recent pandas versions reject sep='\n').
# StringIO(text_file) is for example only
# Normally, you would use pd.read_csv('/path/to/file.csv', ...)
df = pd.read_csv(StringIO(text_file), header=None, skiprows=6)
print('Two signals:')
print(df)
print()

# Anything that cannot be parsed as a number (the text lines) becomes NaN
print('Force to numbers:')
df = df.apply(pd.to_numeric, errors='coerce')
print(df)
print()

print('Remove NaNs:')
df = df.dropna().reset_index(drop=True)
print(df)
print()

# You should have 2 equal length signals, one after the other, so split half way
print('Split into two columns:')
half = len(df) // 2  # integer division, so the result can be used as a slice index
s1 = df[:half].reset_index(drop=True)
s2 = df[half:].reset_index(drop=True)
df = pd.concat([s1, s2], axis=1)
df.columns = ['sig1', 'sig2']
print(df)
print()

# Plot, assuming you have matplotlib library
df.plot()

Two signals:
            0
0         1.5
1        2.71
2       0.567
3        2.71
4        2.71
5        0.56
6   text line
7        1.98
8       0.567
9        1.98
10       2.71
11       0.56
12       6.89

Force to numbers:
        0
0   1.500
1   2.710
2   0.567
3   2.710
4   2.710
5   0.560
6     NaN
7   1.980
8   0.567
9   1.980
10  2.710
11  0.560
12  6.890

Remove NaNs:
        0
0   1.500
1   2.710
2   0.567
3   2.710
4   2.710
5   0.560
6   1.980
7   0.567
8   1.980
9   2.710
10  0.560
11  6.890

Split into two columns:
    sig1   sig2
0  1.500  1.980
1  2.710  0.567
2  0.567  1.980
3  2.710  2.710
4  2.710  0.560
5  0.560  6.890
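
The midpoint split relies on both signals having exactly the same number of samples. As a possible alternative, the text lines themselves can be used as separators, so that each run of consecutive numeric lines becomes its own signal. The sketch below is only an illustration: the sample data and the helper name split_signals are made up here, and it assumes every non-numeric line belongs to a header or separator.

```python
from io import StringIO

import pandas as pd

# Hypothetical sample mimicking the export: header text, signal 1, text, signal 2
raw = StringIO("header\nheader\n1.5\n2.71\n0.56\nchannel text\n1.98\n0.567\n6.89\n")


def split_signals(handle):
    """Return one Series per run of consecutive numeric lines (hypothetical helper)."""
    lines = pd.Series(handle.read().splitlines())
    values = pd.to_numeric(lines, errors="coerce")  # text lines become NaN
    is_num = values.notna()
    # A new block starts at every numeric line that follows a non-numeric one
    block_id = (is_num & ~is_num.shift(fill_value=False)).cumsum()
    return [grp.reset_index(drop=True)
            for _, grp in values[is_num].groupby(block_id[is_num])]


sig1, sig2 = split_signals(raw)
df = pd.DataFrame({"sig1": sig1, "sig2": sig2})
print(df)
```

With a real file, open('/path/to/file.txt') would take the place of the StringIO object. No skiprows count is needed, because the leading text lines are discarded along with the separators.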

The spectrogram will have to wait...
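
In the meantime, here is a rough sketch of how the stacked plots and spectrograms asked for in the question might be drawn with matplotlib. The sampling rate fs is an assumption (the question does not give it and it has to come from the Spike2 export), the sine waves are placeholder data standing in for the real sig1 and sig2 columns, and the NFFT and noverlap values are arbitrary starting points.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

fs = 1000.0  # ASSUMED sampling rate in Hz; use the real value from the Spike2 export

# Placeholder data standing in for the two-column DataFrame of signals
t = np.arange(0, 2.0, 1.0 / fs)
df = pd.DataFrame({"sig1": np.sin(2 * np.pi * 50 * t),
                   "sig2": np.sin(2 * np.pi * 120 * t)}, index=t)

# Rows 0-1: the raw traces; rows 2-3: their spectrograms. All share the time axis.
fig, axes = plt.subplots(nrows=4, ncols=1, sharex=True, figsize=(8, 8))
for i, name in enumerate(df.columns):
    axes[i].plot(df.index, df[name])
    axes[i].set_ylabel(name)
    # specgram computes a short-time FFT and draws it as an image
    axes[2 + i].specgram(df[name].to_numpy(), NFFT=256, Fs=fs, noverlap=128)
    axes[2 + i].set_ylabel(name + " freq (Hz)")
axes[-1].set_xlabel("time (s)")
fig.tight_layout()
```

Outside a notebook, call plt.show() afterwards to display the figure. If only the stacked traces are needed, df.plot(subplots=True, sharex=True) gives a similar result in one line.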