Formatting and converting dates and time
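I have a very large (13 GiB) CSV file (3,856,321 rows and 1,698 columns) in which some of the dates are formatted differently from what is expected. The file looks like this::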
2013/01/08 2:11:30 AM,abdc,good time ...
2015/12/28 8:19:30 PM,abdc,good time ...
2/15/2016 10:46:30 AM,kdafh,almost as good ...
12/13/2014 10:46:00 PM,asjhdk,not that good ...
02-Jan-2014,bad time,good time ...
1/1/2015,nomiss time,boy ...
10/15/2016 17:08:30,bad,boy ...
I want to convert them all to the same time format. The required output is::
1/8/2013 2:11:30,abdc,good time
12/28/2015 20:19:30,abdc,good time
2/15/2016 10:46:30,kdafh,almost as good
12/13/2014 22:46:00,asjhdk,not that good
1/2/2014 00:00:00,bad time,good time
1/1/2015 00:00:00,nomiss time,boy
10/15/2016 17:08:30,bad,boy
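Because the first column uses a handful of fixed layouts, one option is to try each layout in turn. A minimal Python sketch, assuming the five layouts in the sample rows are the only ones in the file and that month names and AM/PM are in English (normalize and INPUT_FORMATS are hypothetical names):

```python
from datetime import datetime

# Assumption: these are the only layouts in the file (taken from the sample rows).
INPUT_FORMATS = (
    "%Y/%m/%d %I:%M:%S %p",  # 2013/01/08 2:11:30 AM
    "%m/%d/%Y %I:%M:%S %p",  # 2/15/2016 10:46:30 AM
    "%m/%d/%Y %H:%M:%S",     # 10/15/2016 17:08:30
    "%d-%b-%Y",              # 02-Jan-2014 (English month abbreviation assumed)
    "%m/%d/%Y",              # 1/1/2015
)


def normalize(stamp: str) -> str:
    """Return stamp as M/D/YYYY HH:MM:SS, without leading zeros on month and day."""
    for fmt in INPUT_FORMATS:
        try:
            dt = datetime.strptime(stamp.strip(), fmt)
        except ValueError:
            continue  # not this layout, try the next one
        return f"{dt.month}/{dt.day}/{dt.year} {dt:%H:%M:%S}"
    raise ValueError(f"unrecognized timestamp: {stamp!r}")


for s in ["2013/01/08 2:11:30 AM", "2015/12/28 8:19:30 PM", "02-Jan-2014", "1/1/2015"]:
    print(normalize(s))
```

This always writes a two-digit hour, so the first row becomes 1/8/2013 02:11:30 rather than 2:11:30; the required output above mixes both styles (2:11:30 but 00:00:00), so one of them has to be picked.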
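I managed to format the times with the following script:
awk -F ',' 'BEGIN{FS=OFS=","}{
  split($1,a," ")              # a[1]=date, a[2]=time, a[3]=AM or PM
  if(a[3]=="PM")
  { split(a[2],b,":");
    b[1]=b[1]+12               # shift the hour to 24-hour form
    a[2]=b[1]":"b[2]":"b[3]
  };
  if(a[2]=="")                 # date only: use midnight
  {
    a[2]="00:00:00"
  }
  tmp=a[1];
  # tmp2=system("date -d `tmp` +%m/%d/%Y");
  # print tmp2
  $1=tmp" "a[2]
}1' time_input.csv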
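Adding 12 to the hour is only right for 1 PM to 11 PM: 12 PM (noon) must stay 12, and 12 AM (midnight) must become 0. The general rule, as a small Python sketch (to_24h is a hypothetical helper name):

```python
def to_24h(hour: int, meridiem: str) -> int:
    """Convert a 12-hour clock hour (1-12) plus "AM"/"PM" to a 24-hour hour (0-23)."""
    if not 1 <= hour <= 12:
        raise ValueError(f"hour out of range for a 12-hour clock: {hour}")
    # hour % 12 maps 12 to 0, so 12 AM -> 0 and 12 PM -> 0 + 12 = 12.
    base = hour % 12
    return base + 12 if meridiem.upper() == "PM" else base


print(to_24h(8, "PM"), to_24h(12, "PM"), to_24h(12, "AM"))
```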
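I borrowed the idea for formatting the date from the question https://unix.stackexchange.com/questions/177888/how-to-convert-date-format-in-file; it is in the commented-out lines near the end of the script. However, it does not work for me. I get this error: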
date: invalid date ‘+%m/%d/%Y’
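The error comes from how the command string is built. awk does not substitute the variable tmp into a string literal; the string is handed to /bin/sh as is, where `tmp` is shell command substitution. Presumably tmp is not a command that prints anything, so it expands to nothing and date receives +%m/%d/%Y as the argument of -d. Separately, awk's system() returns the command's exit status, not its output; the usual awk idiom for capturing output is cmd | getline var. The word splitting can be reproduced with Python's shlex (an illustration of the shell's behavior, not the shell itself):

```python
import shlex

# The string passed to system(); the backticks are shell syntax, not awk.
command = "date -d `tmp` +%m/%d/%Y"

# Assumption: `tmp` prints nothing, so command substitution yields an empty string.
expanded = command.replace("`tmp`", "")

# date sees "+%m/%d/%Y" as the value of -d, hence "invalid date '+%m/%d/%Y'".
print(shlex.split(expanded))
```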
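Is there a simpler, better way to do this? Thanks in advance.
Awk is certainly a good way to do it, but since it is really early here I don't want to think through all those ifs, so here is one in PHP, which has a very nice strtotime function: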
$ cat program.php
<?php
$handle = fopen("file", "r");
if ($handle) {
    while (($line = fgets($handle)) !== false) {
        // Split at the first comma only: $arr[0] is the date,
        // $arr[1] is the rest of the line (trailing newline included).
        $arr = explode(",", $line, 2);
        echo date("m/d/Y H:i:s", strtotime($arr[0])), ",", $arr[1];
    }
    fclose($handle);
} else {
    // error opening the file.
}
Run it:
$ php -f program.php
01/08/2013 02:11:30,abdc,good time
12/28/2015 20:19:30,abdc,good time
02/15/2016 10:46:30,kdafh,almost as good
12/13/2014 22:46:00,asjhdk,not that good
01/02/2014 00:00:00,bad time,good time
01/01/2015 00:00:00,nomiss time,boy
10/15/2016 17:08:30,bad,boy
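The line-by-line reading loop comes from How to read a file line by line in php; I only added explode and strtotime.
The explode line splits the line at the first , into two parts and stores them in the array $arr. strtotime is applied to the first element, $arr[0], and $arr[1] is then output unchanged.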
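The same "convert the first field, pass the rest through" idea in Python, as a sketch: str.partition(",") splits at the first comma only, like explode(",", $line, 2), and also copes with a line that has no comma. As in the PHP answer, it assumes the date field is never quoted. rewrite_first_field and us_date_only are hypothetical names, and us_date_only handles only the 1/1/2015 layout:

```python
from datetime import datetime
from typing import Callable, Iterable, Iterator


def rewrite_first_field(lines: Iterable[str], convert: Callable[[str], str]) -> Iterator[str]:
    """Yield each line with only its first comma-separated field converted."""
    for line in lines:
        # Split at the first comma only; the other 1,697 columns are not touched.
        first, sep, rest = line.partition(",")
        yield convert(first) + sep + rest


def us_date_only(stamp: str) -> str:
    # Illustration only: handles the "1/1/2015" layout and nothing else.
    return datetime.strptime(stamp, "%m/%d/%Y").strftime("%m/%d/%Y %H:%M:%S")


rows = ["1/1/2015,nomiss time,boy\n"]
print("".join(rewrite_first_field(rows, us_date_only)), end="")
```

Because the remaining columns are never split, this avoids parsing and re-joining all 1,698 fields of every row.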
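Using Python, with the dateutil and csv modules:
import csv

import dateutil.parser as parser

# Python 3: open CSV files in text mode with newline='' (not 'rb').
with open('time_input.csv', newline='') as inputfile, \
        open('time_output.csv', 'w', newline='') as outputfile:
    reader = csv.reader(inputfile, delimiter=',')
    writer = csv.writer(outputfile)
    for row in reader:
        # Let dateutil guess the layout of the first column, then rewrite it.
        row[0] = parser.parse(row[0]).strftime('%m/%d/%Y %H:%M:%S')
        writer.writerow(row)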
The result is written to the file time_output.csv.
You can try the following awk command -
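With 3,856,321 rows the parse step runs millions of times. If many rows share the same timestamp string (an assumption about this data, not something the question states), wrapping the conversion in functools.lru_cache skips re-parsing repeats. A minimal sketch; reformat_stamp is a hypothetical stand-in for the parser.parse(...).strftime(...) step and only handles one layout:

```python
from datetime import datetime
from functools import lru_cache


@lru_cache(maxsize=100_000)
def reformat_stamp(stamp: str) -> str:
    # Stand-in for the real parse step; handles only "10/15/2016 17:08:30"-style stamps.
    return datetime.strptime(stamp, "%m/%d/%Y %H:%M:%S").strftime("%m/%d/%Y %H:%M:%S")


stamps = ["10/15/2016 17:08:30"] * 3 + ["10/15/2016 17:08:31"]
converted = [reformat_stamp(s) for s in stamps]
print(reformat_stamp.cache_info())  # repeats are served from the cache
```

If timestamps are mostly unique, the cache only adds overhead, so it is worth checking cache_info() on a sample first.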
Input
vipin@kali:~$ cat kk.txt
2013/01/08 2:11:30 AM,abdc,good time
2015/12/28 8:19:30 PM,abdc,good time
2/15/2016 10:46:30 AM,kdafh,almost as good
12/13/2014 10:46:00 PM,asjhdk,not that good
02-Jan-2014,bad time,good time
1/1/2015,nomiss time,boy
10/15/2016 17:08:30,bad,boy
Filtering -
vipin@kali:~$ awk -F"," '{split($1,a," "); printf ("%s,%s,%s",$2,$3,",");system("date -d \""a[1]" "a[2]"\" +\"%m/%d/%Y %H:%M:%S\"")}' kk.txt
abdc,good time,,01/08/2013 02:11:30
abdc,good time,,12/28/2015 08:19:30
kdafh,almost as good,,02/15/2016 10:46:30
asjhdk,not that good,,12/13/2014 10:46:00
bad time,good time,,01/02/2014 00:00:00
nomiss time,boy,,01/01/2015 00:00:00
bad,boy,,10/15/2016 17:08:30
Redirect the filtered output to the file kk.txt2 -
vipin@kali:~$ awk -F"," '{split($1,a," "); printf ("%s,%s,%s",$2,$3,",");system("date -d \""a[1]" "a[2]"\" +\"%m/%d/%Y %H:%M:%S\"")}' kk.txt > kk.txt2
Output
vipin@kali:~$ awk -F"," '{print $NF,$1,$2}' OFS="," kk.txt2
01/08/2013 02:11:30,abdc,good time
12/28/2015 08:19:30,abdc,good time
02/15/2016 10:46:30,kdafh,almost as good
12/13/2014 10:46:00,asjhdk,not that good
01/02/2014 00:00:00,bad time,good time
01/01/2015 00:00:00,nomiss time,boy
10/15/2016 17:08:30,bad,boy
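Explanation -
Use the split function on column 1 to put its pieces into the array a, then use awk's system function to format the date as needed.
I could have printed the output in order directly, but it printed leading zeros, so I print the formatted date in the last column instead; that is why I move the data into another file.
Finally, you can print the columns in the desired order.
Note that only a[1] and a[2] are passed to date, so the AM/PM marker in a[3] is dropped and PM times such as 8:19:30 PM come out as 08:19:30 instead of 20:19:30. With GNU date, appending " "a[3] after a[2] inside the -d argument passes the marker through.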
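The effect of losing the marker can be reproduced with Python's datetime.strptime, parsing the same stamp once without AM/PM as a 24-hour time and once with it as a 12-hour time (an illustration of what date receives, not a call to date itself):

```python
from datetime import datetime

FMT_OUT = "%m/%d/%Y %H:%M:%S"

# What the awk command hands to date: a[1] and a[2] only, "PM" dropped.
without_meridiem = datetime.strptime("2015/12/28 8:19:30", "%Y/%m/%d %H:%M:%S")
# The same stamp with the marker kept and read as a 12-hour clock.
with_meridiem = datetime.strptime("2015/12/28 8:19:30 PM", "%Y/%m/%d %I:%M:%S %p")

print(without_meridiem.strftime(FMT_OUT))  # 12/28/2015 08:19:30
print(with_meridiem.strftime(FMT_OUT))     # 12/28/2015 20:19:30
```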