Hadoop Informatica Log Processing
I am working on a project that involves creating a queryable data set from large Informatica log files. The files are imported into a Hadoop cluster using Flume, which a coworker configured before I began the project. My job is to create a table from the data contained in the logs so that queries can be performed easily. The issue I'm encountering has to do with the log file formatting. The logs are in the format:
timestamp : severity : (pid | thread) : (servicetype | servicename) : clientnode : messagecode : message
The issue is that the message field can contain additional colon-delimited comments, for example a message like [ x : y : z ]. When I use HCatalog to create the table, I cannot account for this behavior, and it instead results in additional columns.
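To make the problem concrete, here is a minimal sketch with a hypothetical log line (the line content and field values are invented for illustration, not taken from the original logs): splitting naively on the delimiter produces more than the seven expected fields, because the bracketed comment inside the message is split as well.

```python
# Hypothetical Informatica log line; the message field itself contains
# the " : " delimiter inside its bracketed comment.
line = ("2016-01-15 10:22:01 : INFO : (1234|WRITER_1) : "
        "(REP|rep_svc) : node01 : WRT_8167 : Start loading [ x : y : z ]")

# A naive split on the delimiter breaks the message apart,
# which is what produces the extra columns in HCatalog.
fields = line.split(" : ")
print(len(fields))  # 9 fields instead of the expected 7
```

This is exactly the behavior a plain field-delimited table definition reproduces: the delimiter has no way of knowing that the colons after the sixth separator belong to a single field.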
Any suggestions? I could use Ruby to separate the fields or to replace the delimiter so that field integrity is kept when importing with HCatalog. Is there any pre-processing I can do on the cluster side that would allow me to do this? The files are too large to handle locally.
The answer was to use a Pig script and a Python UDF. The Pig script loads the file and calls the Python script line by line to break the fields apart properly. The result can then be written out as a friendlier CSV and/or stored in a table.
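A minimal sketch of what such a UDF might look like, assuming the first six fields never contain the delimiter (the function name and schema are illustrative, not from the original post): splitting at most six times keeps everything after the sixth separator together as the message field.

```python
# Sketch of a Python UDF in the spirit of the answer above.
# Assumption: only the final message field can contain the delimiter.

DELIM = " : "

def parse_log_line(line):
    """Return the seven log fields as a tuple; the message keeps its colons."""
    parts = line.strip().split(DELIM, 6)  # split at most 6 times -> 7 pieces
    if len(parts) != 7:
        return None  # malformed line; the Pig script can filter these out
    return tuple(parts)
```

In Pig this kind of function could be registered as a Jython UDF (e.g. `REGISTER 'udf.py' USING jython AS logparse;`) and applied to each line before storing the result; that wiring is one possible setup, not necessarily the exact one used here.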