How to import BLOB formatted as CSV to postgres

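I have a CSV file that is the output of BLOB storage. The CSV contains 6 related tables. Not every record uses all 6 tables, but every record uses table 1. I want to import table 1 into Postgres. The data is described as follows: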

Files are ASCII text files comprising variable length fields delimited by an asterisk. The files have an extension of “csv”. Records are separated by a Carriage Return/Line Feed. No data items should contain an asterisk.
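Since the spec guarantees that no data item contains an asterisk, and mentions no quoting, the file is not really CSV in the RFC 4180 sense. Splitting each line on `*` should therefore be enough. The following is a minimal sketch under that assumption. The function name `split_fields` is hypothetical.

```python
from __future__ import annotations


def split_fields(line: str) -> list[str]:
    """Split one line of the asterisk-delimited file into its fields.

    The trailing CR/LF (the spec's record separator) is stripped first.
    Empty fields between two asterisks are kept as empty strings, so the
    field positions stay aligned with the specification.
    """
    return line.rstrip("\r\n").split("*")


if __name__ == "__main__":
    print(split_fields("02*1*Ground*Retail Zone A*29.42*330.00*9709\r\n"))
```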

More detail is given in the Technical Arrangement:

Technical Arrangement

The technical arrangement of our summary data is as follows: Fields are in the exact order in which they are listed in this file specification. Records are broken down into Types. Each Type represents a different part of the record. Each record will begin with Type ‘01’ data. For each record type ‘01’, there are one or more record type ‘02’s containing Survey Line Item data. There may be zero or more of record types ‘03’ and ‘06’. There may be zero or one of record types ‘04’ and ‘05’. If a record type ‘06’ exists, there will be one record type ‘07’. The end of a record is only indicated by the next row of Type ‘01’ data or the end of the file. You should use this information to read the file into formal data structures.
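The rules above amount to a simple grouping pass: start a new record at every ‘01’ line and attach the following lines to it until the next ‘01’. A minimal sketch of that pass, plus a check of the stated cardinalities, might look like the following. The names `group_records` and `check_record` are hypothetical, and the check only enforces the rules quoted above.

```python
from collections import Counter


def group_records(lines):
    """Group raw lines into logical records.

    A record starts at a type '01' line and ends just before the next
    type '01' line or at the end of the input. Each record is returned
    as a list of field lists, the '01' row first.
    """
    records = []
    for line in lines:
        line = line.rstrip("\r\n")
        if not line:
            continue  # skip blank lines, e.g. after a final newline
        fields = line.split("*")
        if fields[0] == "01":
            records.append([fields])
        elif records:
            records[-1].append(fields)
        else:
            raise ValueError(f"line before the first '01' row: {line!r}")
    return records


def check_record(record):
    """Return a list of violations of the spec's cardinality rules."""
    counts = Counter(fields[0] for fields in record)
    problems = []
    if counts["02"] < 1:
        problems.append("expected at least one '02'")
    for record_type in ("04", "05"):
        if counts[record_type] > 1:
            problems.append(f"expected at most one '{record_type}'")
    if counts["06"] and counts["07"] != 1:
        problems.append("'06' present, so expected exactly one '07'")
    return problems
```

Applied to the sample data in the question, `group_records` should yield two records, the second containing the ‘03’, ‘06’ and ‘07’ rows.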

I am new to databases and would like to know how to approach this. I know Postgres has Python and Java connectors, which can in turn read BLOB data. Is that the best approach?
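A connector is one workable route. Below is a minimal sketch, assuming only some columns of table 1 are needed. It uses nothing beyond the standard library to turn the ‘01’ lines into CSV text. With psycopg2, such text could then be streamed into a table through `cursor.copy_expert("COPY table1 (...) FROM STDIN WITH (FORMAT csv)", io.StringIO(text))`. The table name `table1`, its column list, and the function `table1_csv` are hypothetical.

```python
import csv
import io


def table1_csv(lines, positions):
    """Return CSV text holding selected fields of the type '01' lines only.

    `positions` are 1-based field numbers, matching Postgres array
    subscripts (position 1 is the record type). Missing trailing fields
    become empty strings; when more than one column is written, csv.writer
    leaves them unquoted, which COPY ... WITH (FORMAT csv) reads as NULL.
    """
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    for line in lines:
        fields = line.rstrip("\r\n").split("*")
        if fields[0] != "01":
            continue  # keep only table 1 rows
        writer.writerow(
            [fields[p - 1] if p <= len(fields) else "" for p in positions]
        )
    return out.getvalue()
```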

Edit: sample data, one entry containing 2 record types, followed by one entry containing five record types (‘01’, ‘02’, ‘03’, ‘06’ and ‘07’):

01*15707127000*8227599000*0335*The Occupier*3****MARKET STREET**BRACKNELL*BERKS*RG12 1JG*290405*Shop And Premises*60.71*14872*14872*14750*2017*Bracknell Forest*00249200003001*20994339144*01-APR-2017**249*NIA*330.00
02*1*Ground*Retail Zone A*29.42*330.00*9709
02*2*Ground*Retail Zone B*31.29*165.00*5163
01*15707136000*492865165*0335**7-8****CHARLES SQUARE**BRACKNELL*BERKS*RG12 1DF*290405*Shop And Premises*325.10*34451*32921*32750*2017*Bracknell Forest*00215600007806*21012750144*01-APR-2017**249*NIA*260.00
02*1*Ground*Retail Zone A*68.00*260.00*17680
02*2*Ground*Remaining Retail Zone*83.50*32.50*2714
02*3*Ground*Office*7.30*26.00*190
02*4*First*Locker Room (Female)*3.20*13.00*42
02*5*First*Locker Room (Male)*5.80*13.00*75
02*6*First*Mess/Staff Room*11.50*13.00*150
02*7*Ground*Internal Storage*7.80*26.00*203
02*8*Ground*Retail Zone B*68.10*130.00*8853
02*9*Ground*Retail Zone C*69.90*65.00*4544
03*Air Conditioning System*289.5*7.00*+2027
06*Divided or split unit*-5.00%
06*Double unit*-5.00%
07*36478*-3557

Copy the text file into an auxiliary table with a single text column:

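drop table if exists text_buffer;
create table text_buffer(text_row text);  -- one row per line of the file
-- server-side COPY reads a file on the database server (requires superuser
-- or the pg_read_server_files role); psql's \copy reads a client-side file
copy text_buffer from '/data/my_file.csv';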
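`COPY ... FROM` without options uses the text format, in which a tab separates columns and a backslash starts an escape sequence. The sample data contains neither, but a line with a tab would fail to load into the single-column buffer, and a backslash could alter the text. A quick pre-flight check is sketched below. The function name `copy_text_hazards` is hypothetical.

```python
def copy_text_hazards(lines):
    """List (line_number, reason) pairs for lines that COPY's text format
    would not load verbatim into one text column: a tab starts a second
    column, and a backslash begins an escape sequence."""
    hazards = []
    for number, line in enumerate(lines, start=1):
        if "\t" in line:
            hazards.append((number, "tab"))
        if "\\" in line:
            hazards.append((number, "backslash"))
    return hazards
```

An empty result suggests the plain `copy` above is safe for that file. Otherwise, cleaning the file first, or loading it through a client as in the connector approach, may be simpler.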

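Convert the text column to a text array, skipping the rows you do not need. You can then select any element as a new column with a given name and type, for example: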

select 
    cols[2]::bigint as bigint1,
    cols[3]::bigint as bigint2,
    cols[4]::text as text1,
    cols[5]::text as text2
    -- specify name and type of any column you need 
from text_buffer,
lateral string_to_array(text_row, '*') cols -- transform text column to text array
where left(text_row, 2) = '01';             -- get only rows for table1

   bigint1   |  bigint2   | text1 |    text2     
-------------+------------+-------+--------------
 15707127000 | 8227599000 | 0335  | The Occupier
 15707136000 |  492865165 | 0335  | 
(2 rows)
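For comparison, here is the same extraction done client-side. This is a sketch only, and the function name `table1_rows` is hypothetical. Postgres arrays are 1-based, so `cols[n]` corresponds to `fields[n - 1]` in Python. The SQL above returns an empty string for the missing second `text2`. Passing a third argument, as in `string_to_array(text_row, '*', '')`, turns empty fields into NULL instead, which avoids cast errors on empty numeric fields. The sketch mimics that NULL behaviour with `None`.

```python
def table1_rows(lines):
    """Yield (bigint1, bigint2, text1, text2) for each type '01' line,
    mirroring cols[2]::bigint, cols[3]::bigint, cols[4], cols[5].
    Empty fields become None, like string_to_array(text_row, '*', '')."""
    for line in lines:
        fields = [f if f != "" else None for f in line.rstrip("\r\n").split("*")]
        if fields[0] != "01":
            continue  # only table 1 rows
        yield (int(fields[1]), int(fields[2]), fields[3], fields[4])
```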