serialization - Serializing multiple files to binary Avro format
I have a directory of files taking up a large amount of space. I'd like to compress and serialize each of these files to the binary Avro format to free up disk space. The schema of the Avro data file would be a record containing a single field that stores the content of the original file. I'm considering using the utility in the avro-tools jar provided by Apache to serialize each file. This utility takes an Avro schema and an input file containing records serialized in JSON format, and produces an Avro data file:
$ java -jar ~/avro-tools-1.7.6.jar fromjson --schema-file twitter.avsc twitter.json > twitter.avro
I'd like to write a bash script that executes this tool on each file in a folder, but I'm not sure how to form the equivalent record for each file in the JSON format the tool expects. Each of the files is hundreds of MB in size. I was wondering whether forming a JSON file ({'content': 'file content..'}) could be done from the command line using text-processing commands or tools (awk, sed, etc.). Also, are there better ways of achieving the larger task of migrating multiple files to Avro storage than the approach I'm working on?
Thanks.
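Looks good so far. You'll need a schema, something like this (Avro record schemas require a name; "FileContent" here is only a placeholder):
{"type": "record", "name": "FileContent", "fields": [{"name": "content", "type": "bytes"}]}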
Although I sympathize with your desire not to fire up a real programming language, the Java Avro API (for example) makes it easier to create records, and easier to get the bytes from a file that's on your system.
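If you would still rather stay with the fromjson tool, the JSON record is probably easier to produce with a short script than with awk or sed, since the file content has to be escaped correctly. Here is a minimal sketch in Python, assuming the field is named "content" as in the schema above and relying on Avro's JSON encoding, which represents bytes as a string whose code points 0-255 stand for the byte values. The function names and the ".json" naming convention are made up for illustration.

```python
import json
from pathlib import Path


def file_to_avro_json(src: Path, dst: Path) -> None:
    """Write one JSON record {"content": ...} holding the bytes of src."""
    data = src.read_bytes()  # reads the whole file into memory
    # Avro's JSON encoding maps each byte to the code point with the same
    # value (0-255), which is what a latin-1 decode produces.
    record = {"content": data.decode("latin-1")}
    # json.dumps escapes non-ASCII characters by default (ensure_ascii=True),
    # so bytes >= 0x80 come out as \u00XX and the output is pure ASCII.
    dst.write_text(json.dumps(record) + "\n", encoding="ascii")


def convert_directory(directory: Path) -> list:
    """Create <name>.json next to every regular file in directory."""
    written = []
    # The listing is taken before any output is written, so the generated
    # .json files are not picked up again during this run.
    for src in sorted(directory.iterdir()):
        if src.is_file() and src.suffix != ".json":
            dst = src.with_name(src.name + ".json")
            file_to_avro_json(src, dst)
            written.append(dst)
    return written
```

Each generated file could then be passed to the fromjson command from the question. Note that the escaped JSON can be several times larger than the original binary file, and the whole record is held in memory at once, so for files of hundreds of MB writing the Avro file directly through an Avro API may well be the more practical route.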
Hope this helps,
Julian