Fill in interstitial data in counting series

Question

I have several CSV files, each with 3 columns:

Column 1 is the identifier,
Column 2 is the corresponding concentration,
Column 3 is the volume.

The identifiers simply run in sequence from 1 to 950, but not every one appears in every sample, and therefore not in every CSV file.

Is there a way to fill in the missing numbers using Excel or MATLAB, so that the rows all line up when I combine the files into a 2D array?

Currently it looks like this:

Compound Name,Peak Value,Volume
1,627434.8768,5.50E+07
5,2.53E+07,5.11E+08
7,1.64E+07,3.07E+08

Ideally, it would be:

Compound Name,Peak Value,Volume
1,627434.8768,5.50E+07
2,0,0
3,0,0
4,0,0
5,2.53E+07,5.11E+08
6,0,0
7,1.64E+07,3.07E+08

Answer 1

If your identifiers are always natural numbers, in MATLAB you can simply use them as row indices:

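data = [1, 627434.8768, 5.50E+07
        5, 2.53E+07,    5.11E+08
        7, 1.64E+07,    3.07E+08];             % named "data" to avoid shadowing the built-in input()
output = zeros(max(data(:,1)), size(data,2));  % initialize with zeros
output(data(:,1),:) = data;                    % fill in available values
output(:,1) = (1:size(output,1)).';            % number every row, including the filled-in ones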
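Since the goal is to line up several samples in one 2D array, the same indexing idea can be applied with a fixed row count instead of max(...). The following is only a minimal sketch under stated assumptions: the names samples, nIds and peaks are made up for illustration, the second sample contains invented values, and the data is held in memory rather than read from your files.

```matlab
% Hypothetical samples; columns are [identifier, peak value, volume].
% The first matrix is the example from the question, the second is invented.
samples = {[1, 627434.8768, 5.50E+07
            5, 2.53E+07,    5.11E+08
            7, 1.64E+07,    3.07E+08], ...
           [2, 1.0E+06,     2.0E+07
            5, 3.0E+06,     4.0E+07]};

nIds  = 950;                           % assumed upper bound on the identifiers
peaks = zeros(nIds, numel(samples));   % one column of peak values per sample

for k = 1:numel(samples)
    s = samples{k};
    peaks(s(:,1), k) = s(:,2);         % row index = identifier, gaps stay zero
end
```

Because every sample is padded to the same 950 rows, row r of peaks always refers to identifier r, whichever identifiers a given sample happens to contain.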

Answer 2

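I'll offer a solution using MATLAB. First, read the data with csvread, skipping the first line because it contains the header. Next, create a 2D matrix whose first column runs from 1 to 950. Then use the first column of the data you read as row indices into this matrix, so that each identifier's values land in the corresponding row. Once that is done, write the new matrix to a new file.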

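Of course, you need to open the original file first to get the header, and write it to the new file before your data. You can use fopen to open the file, then fgetl to retrieve the first line of the original file.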

Suppose your file is named test.csv. The code could look like this:

M = csvread('test.csv', 1, 0);       % Read the file, skipping the header row
out = [(1:950).' zeros(950,2)];      % Identifiers 1..950 in column 1, zeros elsewhere
out(M(:,1),2:3) = M(:,2:3);          % Copy the values into the rows given by the identifiers

fid = fopen('test.csv', 'r');        % Open the original file for reading
fid2 = fopen('out.csv', 'w');        % Open the output file for writing
first_line = fgetl(fid);             % Get the header line of the original file
fprintf(fid2, '%s\n', first_line);   % Write the header to the output file
fprintf(fid2, '%d,%e,%e\n', out.');  % Write the filled-in matrix, one row per line
fclose(fid); fclose(fid2);           % Close both files

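The code above is self-explanatory with the comments. The output is written to a file called out.csv. The one subtlety I want to point out is the second-to-last line of the code. Notice that the first column is printed as an integer (the '%d' specifier), while the other numbers use exponential notation (the '%e' specifier) to match the example you showed above. Also, MATLAB stores data in column-major order, so you need to write the transpose of the matrix to the file (hence the .' operator).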
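To see why the transpose matters, here is a small sketch that uses sprintf in place of fprintf so the result can be inspected as text. The matrix A and the variable names are made up for illustration.

```matlab
A = [1 10
     2 20];                              % two rows we want printed as "1,10" and "2,20"

withoutT = sprintf('%d,%d\n', A);        % consumes A column by column: 1, 2, 10, 20
withT    = sprintf('%d,%d\n', A.');      % consumes A row by row:       1, 10, 2, 20

disp(withoutT)                           % lines "1,2" and "10,20"
disp(withT)                              % lines "1,10" and "2,20"
```

The format string is reused for each group of values, and those values are taken in storage order, which is why the untransposed call pairs up numbers from the same column instead of the same row.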

With your test data and the code above, here are the first 11 lines of the output CSV file (the header row plus 10 rows of data):

Compound Name,Peak Value,Volume
1,6.274349e+05,5.500000e+07
2,0.000000e+00,0.000000e+00
3,0.000000e+00,0.000000e+00
4,0.000000e+00,0.000000e+00
5,2.530000e+07,5.110000e+08
6,0.000000e+00,0.000000e+00
7,1.640000e+07,3.070000e+08
8,0.000000e+00,0.000000e+00
9,0.000000e+00,0.000000e+00
10,0.000000e+00,0.000000e+00
