Converting .txt file to load into dataframe
I have a text file (.txt) that looks like this:
{"accountNumber": "737265056", "customerId": "737265056", "creditLimit": 5000.0, "availableMoney": 5000.0, "transactionDateTime": "2016-08-13T14:27:32", "transactionAmount": 98.55, "merchantName": "Uber", "acqCountry": "US", "merchantCountryCode": "US", "posEntryMode": "02", "posConditionCode": "01", "merchantCategoryCode": "rideshare", "currentExpDate": "06/2023", "accountOpenDate": "2015-03-14", "dateOfLastAddressChange": "2015-03-14", "cardCVV": "414", "enteredCVV": "414", "cardLast4Digits": "1803", "transactionType": "PURCHASE", "echoBuffer": "", "currentBalance": 0.0, "merchantCity": "", "merchantState": "", "merchantZip": "", "cardPresent": false, "posOnPremises": "", "recurringAuthInd": "", "expirationDateKeyInMatch": false, "isFraud": false}
{"accountNumber": "737265056", "customerId": "737265056", "creditLimit": 5000.0, "availableMoney": 5000.0, "transactionDateTime": "2016-10-11T05:05:54", "transactionAmount": 74.51, "merchantName": "AMC #191138", "acqCountry": "US", "merchantCountryCode": "US", "posEntryMode": "09", "posConditionCode": "01", "merchantCategoryCode": "entertainment", "cardPresent": true, "currentExpDate": "02/2024", "accountOpenDate": "2015-03-14", "dateOfLastAddressChange": "2015-03-14", "cardCVV": "486", "enteredCVV": "486", "cardLast4Digits": "767", "transactionType": "PURCHASE", "echoBuffer": "", "currentBalance": 0.0, "merchantCity": "", "merchantState": "", "merchantZip": "", "posOnPremises": "", "recurringAuthInd": "", "expirationDateKeyInMatch": false, "isFraud": false}
{"accountNumber": "737265056", "customerId": "737265056", "creditLimit": 5000.0, "availableMoney": 5000.0, "transactionDateTime": "2016-11-08T09:18:39", "transactionAmount": 7.47, "merchantName": "Play Store", "acqCountry": "US", "merchantCountryCode": "US", "posEntryMode": "09", "posConditionCode": "01", "merchantCategoryCode": "mobileapps", "currentExpDate": "08/2025", "accountOpenDate": "2015-03-14", "dateOfLastAddressChange": "2015-03-14", "cardCVV": "486", "enteredCVV": "486", "cardLast4Digits": "767", "transactionType": "PURCHASE", "echoBuffer": "", "currentBalance": 0.0, "merchantCity": "", "merchantState": "", "merchantZip": "", "cardPresent": false, "posOnPremises": "", "recurringAuthInd": "", "expirationDateKeyInMatch": false, "isFraud": false}
{"accountNumber": "737265056", "customerId": "737265056", "creditLimit": 5000.0, "availableMoney": 5000.0, "transactionDateTime": "2016-12-10T02:14:50", "transactionAmount": 7.47, "merchantName": "Play Store", "acqCountry": "US", "merchantCountryCode": "US", "posEntryMode": "09", "posConditionCode": "01", "merchantCategoryCode": "mobileapps", "currentExpDate": "08/2025", "accountOpenDate": "2015-03-14", "dateOfLastAddressChange": "2015-03-14", "cardCVV": "486", "enteredCVV": "486", "cardLast4Digits": "767", "transactionType": "PURCHASE", "echoBuffer": "", "currentBalance": 0.0, "merchantCity": "", "merchantState": "", "merchantZip": "", "cardPresent": false, "posOnPremises": "", "recurringAuthInd": "", "expirationDateKeyInMatch": false, "isFraud": false}
.
.
.
.
.
.
.
How can I load it into a dataframe?
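Each line looks like a separate JSON object. Using just Python:
- Read each line of the file into a string
- Convert each line from JSON to the corresponding Python dict
- Append these dicts to a list
- Convert the list of dicts into a Pandas DataFrame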
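import pandas as pd
import json
# Read every line of the file; each line holds one JSON object
with open('data.json') as f:
    lines = f.readlines()
# Parse each line into a dict and collect the dicts in a list
data = []
for line in lines:
    data.append(json.loads(line))
# Build the DataFrame from the list of dicts
df = pd.DataFrame(data)
df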
Which looks like:
accountNumber customerId creditLimit availableMoney transactionDateTime transactionAmount merchantName acqCountry merchantCountryCode posEntryMode posConditionCode merchantCategoryCode currentExpDate accountOpenDate dateOfLastAddressChange cardCVV enteredCVV cardLast4Digits transactionType echoBuffer currentBalance merchantCity merchantState merchantZip cardPresent posOnPremises recurringAuthInd expirationDateKeyInMatch isFraud
0 737265056 737265056 5000.0 5000.0 2016-08-13T14:27:32 98.55 Uber US US 02 01 rideshare 06/2023 2015-03-14 2015-03-14 414 414 1803 PURCHASE 0.0 False False False
1 737265056 737265056 5000.0 5000.0 2016-10-11T05:05:54 74.51 AMC #191138 US US 09 01 entertainment 02/2024 2015-03-14 2015-03-14 486 486 767 PURCHASE 0.0 True False False
2 737265056 737265056 5000.0 5000.0 2016-11-08T09:18:39 7.47 Play Store US US 09 01 mobileapps 08/2025 2015-03-14 2015-03-14 486 486 767 PURCHASE 0.0 False False False
3 737265056 737265056 5000.0 5000.0 2016-12-10T02:14:50 7.47 Play Store US US 09 01 mobileapps 08/2025 2015-03-14 2015-03-14 486 486 767 PURCHASE 0.0 False False False
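One thing to watch with the loop above: `json.loads` raises an error on an empty string, so a blank line (for example at the end of the file) would stop it. A minimal sketch that skips blank lines and reads the file lazily instead of loading all lines first. The helper name `iter_json_lines` is made up for illustration, and UTF-8 encoding is an assumption about the file.

```python
import json


def iter_json_lines(path):
    """Yield one dict per non-blank line of a file with one JSON object per line."""
    # encoding="utf-8" is an assumption; adjust it if the file uses another encoding
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if not line:
                # json.loads("") would raise, so blank lines are skipped
                continue
            yield json.loads(line)
```

The resulting dicts can then be passed to the DataFrame constructor as before, e.g. `pd.DataFrame(list(iter_json_lines('data.json')))`.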
If the file held only one JSON object, rather than a new JSON object on each line, you could just use pandas.read_json(file_path).
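For the one-object-per-line layout in the question, `pandas.read_json` can also be called with `lines=True`, which reads the file as line-delimited JSON. A minimal sketch: the helper name `load_json_lines` is hypothetical, and passing `dtype=False` and `convert_dates=False` is my own choice, made so that ID-like strings such as "02" or "737265056" stay strings instead of being inferred as numbers.

```python
import pandas as pd


def load_json_lines(path):
    """Load a file containing one JSON object per line into a DataFrame."""
    # lines=True: treat each line of the file as its own JSON object.
    # dtype=False, convert_dates=False: skip pandas' type inference so that
    # string fields such as "02" are kept as strings.
    return pd.read_json(path, lines=True, dtype=False, convert_dates=False)
```

Usage would be along the lines of `df = load_json_lines('transactions.txt')`.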
@Abhishek Mishra, just put 'transactions.txt' instead of 'data.json'. It works like magic. I had been struggling with the same task.
Thanks @Tyler