将不一致的数据集从列转置到行 csv 或 excel

Transpose Inconsistent data set from columns to rows csv or excel

您好,我有一个包含大约 380k 行和树列的 csv 文件。 A 列 - 我有密钥,它针对每个 trim 版本重复,并且每个 trim 版本每次都以品牌一词开头 B 列 - 规格 C 列 - trim ID,每个 trim 版本

的编号相同

我的问题是我的数据范围不一致,一些trim版本最多有55行数据,而其他版本只有5-6

(A)KEY                              (B)VALUE            (C)TRIM ID
======                              ========            ==========
1. Brand                            Mitsubishi          20001
2. Model                            ASX                 20001
3. Trim                             ASX (facelift 2012) 20001
4. Engine                           1.8 DI-D (114 Hp)   20001
5. Doors                            5                   20001
6. Power                            114 hp              20001
7. Maximum speed                    189 km/h            20001
8. From 0 to 100 km/h               10.2 sec            20001
9. Fuel tank volume                 63 l                20001
10. Year into production            2012                20001
11. Seats                           5                   20001                       
1. Brand                            BMW                 20015
2. Model                            M4                  20015
3. Trim                             M4 (F83)            20015
4. Engine                           3.0 (431 Hp) DCT    20015
5. Power                            431 hp              20015
1. Brand                            AUDI                25003
2. Model                            A4                  25003
3. Trim                             1.9TDI AVANT SLINE  25003
4. Power                            131 hp              25003

我想将数据转换为每个 TRIM 版本的一行并匹配数据。例如,每次找到品牌时,都会用数据开始新行,其余数据与列名品牌、型号……座位等相匹配。

像这样:

Brand   Model   Generation  Engine  Doors   Power   Maximum speed   Seats   Length
=====   =====    =========   =====  =====   =====   =============   =====   ======
AUDI    A4      2.0T SLINE  2.0T    5       210     220             4         4520
BMW     M3                  330                     280             4
HONDA   CIVIC               1.6i    4       160                     4

我试图用函数解决这个问题,但我想我需要 vba 脚本,但我不擅长。请帮助我。

我认为这可以通过一个支点轻松完成 table。只需将您的 csv 数据导入 excel 并将其转换为数据透视表 table.

好吧,我做到了(以及您提供的数据样本)

Sub createDataTable()
    Dim r
    Dim c
    Dim i
    Dim rng As Range
    Dim newSht As Worksheet
    Dim dataSht As Worksheet
    Dim j 'the counter for the rows of the table
    Dim colName As Range
    Dim theAddress

    Set dataSht = Sheets("Data")
    dataSht.Activate
    r = Range("A1").End(xlDown).Row 'take the last row of the data
    c = Range("A1").End(xlToRight).Column 'Take the last columns of the data
    Set rng = Range(Cells(2, 1), Cells(r, 1)) 'Store the column 1=A of the data

    Sheets.Add After:=Sheets(Sheets.Count) 'Add a new sheet
    Set newSht = ActiveSheet 'Store the new sheet int the var
    newSht.Name = myTime 'Rename the new sheet with the function
    j = 1

    dataSht.Activate
    rng.Copy
    Range("H1").PasteSpecial xlPasteAll
    Application.CutCopyMode = False

    ActiveSheet.Range("$H:$H").RemoveDuplicates Columns:=1, Header:=xlNo

    Range("H1", Selection.End(xlDown)).Copy
    newSht.Activate
    Range("A1").PasteSpecial Paste:=xlPasteAll, Transpose:=True
    Rows("1:1").Font.Bold = True

    dataSht.Activate 'Go to the new sheet (just in case)
    Range("H1", Selection.End(xlDown)).ClearContents

    For Each i In rng 'Here comes the magic
        If i.Value = "Brand" Then 'If is a Brand set a new row
            j = j + 1
        End If
        newSht.Activate 'Lets go to the new sheet
        With newSht.Range("A1:BZ1") 'With the headers...
            Set colName = .Find(i.Value, LookIn:=xlValues) '...Find the header of the column in that range
            If Not colName Is Nothing Then 'If colName has something then
                theAddress = colName.Address 'Put the address of the address just for reference
                Do 'and inner loop
                    Range(Cells(j, colName.Column), Cells(j, colName.Column)).Value = i.Offset(0, 1).Value
                    'put the value of the field inside the cell below the right header in the right row
                    'col header = colName.column
                    'right row = j
                    'Set colName = .FindNext(colName) 'this is not necesary, because the header are unique
                Loop While Not colName Is Nothing And colName.Address <> theAddress
            End If
        End With
        dataSht.Activate
    Next i
End Sub

编辑 #1

此代码创建 headers

Sub createDataTable()
    Dim r
    Dim c
    Dim i
    Dim rng As Range
    Dim newSht As Worksheet
    Dim dataSht As Worksheet
    Dim j 'the counter for the rows of the table
    Dim colName As Range
    Dim theAddress

    Set dataSht = Sheets("Data")
    dataSht.Activate
    'to create headers
    Rows("1:1").Insert Shift:=xlDown
    Range("A1").FormulaR1C1 = "Key"
    Range("B1").FormulaR1C1 = "Value"
    Range("C1").FormulaR1C1 = "Trim"
    Rows("1:1").Font.Bold = True

    r = Range("A1").End(xlDown).Row 'take the last row of the data
    c = Range("A1").End(xlToRight).Column 'Take the last columns of the data
    Set rng = Range(Cells(2, 1), Cells(r, 1)) 'Store the column 1=A of the data

    Sheets.Add After:=Sheets(Sheets.Count) 'Add a new sheet
    Set newSht = ActiveSheet 'Store the new sheet int the var
    newSht.Name = myTime 'Rename the new sheet with the function
    j = 1

    dataSht.Activate
    rng.Copy
    Range("H1").PasteSpecial xlPasteAll
    Application.CutCopyMode = False

    ActiveSheet.Range("$H:$H").RemoveDuplicates Columns:=1, Header:=xlNo

    Range("H1", Selection.End(xlDown)).Copy
    newSht.Activate
    Range("A1").PasteSpecial Paste:=xlPasteAll, Transpose:=True
    Rows("1:1").Font.Bold = True

    dataSht.Activate 'Go to the new sheet (just in case)
    Range("H1", Selection.End(xlDown)).ClearContents

    For Each i In rng 'Here comes the magic
        If i.Value = "Brand" Then 'If is a Brand set a new row
            j = j + 1
        End If
        newSht.Activate 'Lets go to the new sheet
        With newSht.Range("A1:BZ1") 'With the headers...
            Set colName = .Find(i.Value, LookIn:=xlValues) '...Find the header of the column in that range
            If Not colName Is Nothing Then 'If colName has something then
                theAddress = colName.Address 'Put the address of the address just for reference
                Do 'and inner loop
                    Range(Cells(j, colName.Column), Cells(j, colName.Column)).Value = i.Offset(0, 1).Value
                    'put the value of the field inside the cell below the right header in the right row
                    'col header = colName.column
                    'right row = j
                    'Set colName = .FindNext(colName) 'this is not necesary, because the header are unique
                Loop While Not colName Is Nothing And colName.Address <> theAddress
            End If
        End With
        dataSht.Activate
    Next i
End Sub

正如我在屏幕截图中看到的那样,您将代码放入工作sheet,这将return 错误 1004

因为你不能 "manipulate" 来自 sheet 的另一个 sheet。如果您 need/want 这样做,您需要在一个模块内进行,然后从该模块调用该过程。

在这种情况下,您需要添加一个新模块 在 VBA

中选择了工作簿
Insert >>> Module

一个新模块将在您的项目中,并在该模块中添加来自 Edit #2 的过程,然后 运行 它, F5.

如果您需要改进,请告诉我。

编辑 #2

很高兴能为您提供帮助...您遇到的错误是因为我确实向您发送了自定义函数...抱歉...开始!

Function myTime() As String
    Dim HH
    Dim MM
    Dim SS
    Dim TT
    HH = Hour(Now)
    MM = Minute(Now)
    SS = Second(Now)
    myTime = Format(HH, "00") & Format(MM, "00") & Format(SS, "00")
End Function

将此函数放在与所有代码相同的模块中。

我会回答我的问题,因为我找到了一个名为 OpenRefine 的完美而强大的解决方案,它是一个前 google 项目(Google Refine)。

由于我的数据集现在超过一百万行,这是最快和最好的解决方案(比 excel 好得多)。

http://openrefine.org/