Converting a .txt file to load into a DataFrame

I have a text file (.txt) that looks like this:

{"accountNumber": "737265056", "customerId": "737265056", "creditLimit": 5000.0, "availableMoney": 5000.0, "transactionDateTime": "2016-08-13T14:27:32", "transactionAmount": 98.55, "merchantName": "Uber", "acqCountry": "US", "merchantCountryCode": "US", "posEntryMode": "02", "posConditionCode": "01", "merchantCategoryCode": "rideshare", "currentExpDate": "06/2023", "accountOpenDate": "2015-03-14", "dateOfLastAddressChange": "2015-03-14", "cardCVV": "414", "enteredCVV": "414", "cardLast4Digits": "1803", "transactionType": "PURCHASE", "echoBuffer": "", "currentBalance": 0.0, "merchantCity": "", "merchantState": "", "merchantZip": "", "cardPresent": false, "posOnPremises": "", "recurringAuthInd": "", "expirationDateKeyInMatch": false, "isFraud": false}
{"accountNumber": "737265056", "customerId": "737265056", "creditLimit": 5000.0, "availableMoney": 5000.0, "transactionDateTime": "2016-10-11T05:05:54", "transactionAmount": 74.51, "merchantName": "AMC #191138", "acqCountry": "US", "merchantCountryCode": "US", "posEntryMode": "09", "posConditionCode": "01", "merchantCategoryCode": "entertainment", "cardPresent": true, "currentExpDate": "02/2024", "accountOpenDate": "2015-03-14", "dateOfLastAddressChange": "2015-03-14", "cardCVV": "486", "enteredCVV": "486", "cardLast4Digits": "767", "transactionType": "PURCHASE", "echoBuffer": "", "currentBalance": 0.0, "merchantCity": "", "merchantState": "", "merchantZip": "", "posOnPremises": "", "recurringAuthInd": "", "expirationDateKeyInMatch": false, "isFraud": false}
{"accountNumber": "737265056", "customerId": "737265056", "creditLimit": 5000.0, "availableMoney": 5000.0, "transactionDateTime": "2016-11-08T09:18:39", "transactionAmount": 7.47, "merchantName": "Play Store", "acqCountry": "US", "merchantCountryCode": "US", "posEntryMode": "09", "posConditionCode": "01", "merchantCategoryCode": "mobileapps", "currentExpDate": "08/2025", "accountOpenDate": "2015-03-14", "dateOfLastAddressChange": "2015-03-14", "cardCVV": "486", "enteredCVV": "486", "cardLast4Digits": "767", "transactionType": "PURCHASE", "echoBuffer": "", "currentBalance": 0.0, "merchantCity": "", "merchantState": "", "merchantZip": "", "cardPresent": false, "posOnPremises": "", "recurringAuthInd": "", "expirationDateKeyInMatch": false, "isFraud": false}
{"accountNumber": "737265056", "customerId": "737265056", "creditLimit": 5000.0, "availableMoney": 5000.0, "transactionDateTime": "2016-12-10T02:14:50", "transactionAmount": 7.47, "merchantName": "Play Store", "acqCountry": "US", "merchantCountryCode": "US", "posEntryMode": "09", "posConditionCode": "01", "merchantCategoryCode": "mobileapps", "currentExpDate": "08/2025", "accountOpenDate": "2015-03-14", "dateOfLastAddressChange": "2015-03-14", "cardCVV": "486", "enteredCVV": "486", "cardLast4Digits": "767", "transactionType": "PURCHASE", "echoBuffer": "", "currentBalance": 0.0, "merchantCity": "", "merchantState": "", "merchantZip": "", "cardPresent": false, "posOnPremises": "", "recurringAuthInd": "", "expirationDateKeyInMatch": false, "isFraud": false}
...

How can I load this into a DataFrame?

Each line appears to be a separate JSON object. Using just Python:

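  • Read each line of the file into a string
  • Convert each line from JSON into the corresponding Python dict
  • Append these dicts to a list
  • Convert the list of dicts into a pandas DataFrame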
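
import json

import pandas as pd

# 'data.json' is a placeholder; point it at your own file (a .txt extension works too)
with open('data.json') as f:
    lines = f.readlines()

# Parse each non-empty line as its own JSON object
data = []
for line in lines:
    if line.strip():  # skip blank lines, which json.loads would reject
        data.append(json.loads(line))

# Build the DataFrame from the list of dicts
df = pd.DataFrame(data)

df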

which looks like:

accountNumber   customerId  creditLimit availableMoney  transactionDateTime transactionAmount   merchantName    acqCountry  merchantCountryCode posEntryMode    posConditionCode    merchantCategoryCode    currentExpDate  accountOpenDate dateOfLastAddressChange cardCVV enteredCVV  cardLast4Digits transactionType echoBuffer  currentBalance  merchantCity    merchantState   merchantZip cardPresent posOnPremises   recurringAuthInd    expirationDateKeyInMatch    isFraud
0   737265056   737265056   5000.0  5000.0  2016-08-13T14:27:32 98.55   Uber    US  US  02  01  rideshare   06/2023 2015-03-14  2015-03-14  414 414 1803    PURCHASE        0.0             False           False   False
1   737265056   737265056   5000.0  5000.0  2016-10-11T05:05:54 74.51   AMC #191138 US  US  09  01  entertainment   02/2024 2015-03-14  2015-03-14  486 486 767 PURCHASE        0.0             True            False   False
2   737265056   737265056   5000.0  5000.0  2016-11-08T09:18:39 7.47    Play Store  US  US  09  01  mobileapps  08/2025 2015-03-14  2015-03-14  486 486 767 PURCHASE        0.0             False           False   False
3   737265056   737265056   5000.0  5000.0  2016-12-10T02:14:50 7.47    Play Store  US  US  09  01  mobileapps  08/2025 2015-03-14  2015-03-14  486 486 767 PURCHASE        0.0             False           False   False

If the file contained a single JSON object rather than a new JSON object on every line, you could just use pandas.read_json(file_path).
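
For the one-object-per-line layout in the question, pandas.read_json can also parse the file directly when it is given lines=True. The following is only a minimal sketch: the two records are trimmed to a few fields from the question, and the temporary .txt file is a stand-in for the real file, so both are assumptions made for illustration.

```python
import json
import os
import tempfile

import pandas as pd

# Two sample records, trimmed to a few of the fields shown in the question.
records = [
    {"accountNumber": "737265056", "transactionAmount": 98.55,
     "merchantName": "Uber", "isFraud": False},
    {"accountNumber": "737265056", "transactionAmount": 74.51,
     "merchantName": "AMC #191138", "isFraud": False},
]

# Write one JSON object per line to a temporary .txt file
# (a stand-in for the real file; the path is not from the question).
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    for record in records:
        tmp.write(json.dumps(record) + "\n")
    path = tmp.name

# lines=True makes pandas treat every line as a separate JSON object.
# dtype=False turns off dtype inference, so string fields such as
# accountNumber are not converted to integers.
df = pd.read_json(path, lines=True, dtype=False)

os.remove(path)
print(df)
```

Without dtype=False, read_json tries to infer column types, which can turn numeric-looking strings such as "02" into numbers and drop the leading zero. The json.loads approach above keeps them as strings.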

@Abhishek Mishra, just use 'transactions.txt' instead of 'data.json'. It works like a charm. I had been struggling with the same task.

Thanks @Tyler