"50917234872295425","Wed Mar 23 18:37:48 +0000 2011","120374665","97903753","stuff",""
to this:
"50917234872295425","2011-03-24 07:49:49","120374665","97903753","stuff",""
Task "accomplished" in awk:
awk -F"\",\"" '{OFS="\",\""
gsub(/\{7\}/,"REPLACED")
if (NF > 2) {
ORS=""
print $1 OFS
system("echo -n $(date \"+%Y-%m-%d %H:%M:%S\" -d \"" $2 "\")")
print OFS
for (i=3;i<NF;i++) print $i OFS
print $NF"\n" }
else
print $0}' $1
Task "accomplished" in perl:
perl -ne 'use Date::Parse;I chose awk and Perl because they split columns using multi-character separators (","). Actually, I chose awk first because I am really starting to hate Perl. Then I said, "wow, this was a lot of work and I bet it would be easier to do this in Perl."
s/\{7\}/REPLACED/g;
@cols=split(/","/,$_);
($ss,$mm,$hh,$day,$month,$year,$zone) = strptime($cols[1]);
$year+=1900; # Grr. Perl.
$month+=1; # Double Grr.
print qq!$cols[0]","!;
printf(qq!%04d-%02d-%02d %02d:%02d:%02d","!, $year, $month, $day, $hh, $mm, $ss);
for ($i=2 ; $i<$#cols ; $i++) {
print qq!$cols[$i]","!;
}
print "$cols[$#cols]";' $1
Either way, I had to deal with the rest of the record, which may span multiple lines and is more likely than not to contain foreign languages in the fifth column. Hence the quotes around the words "accomplished." Neither of these methods are fully tested. The ultimate recipient of this data is Postgres and I wonder if I can harness some SQL power to accomplish the same task?
(The answer to that question is "no." While I am ultimately importing data into Postgres, I am importing it using Netezza's "nzload" utility, which expects rather limited date inputs. I think a direct Postgres import would just accept the timestamp as-is.)
I dare say that writing this in awk was quicker, and required no FTP trips to CPAN. Damn you, Perl. Our tenuous relationship really is coming to an end.
0 comments:
Post a Comment