serialization - Serializing multiple files to binary Avro format


I have a directory of files taking up a large amount of space. I'd like to compress and serialize each of these files to binary Avro format to free up disk space. The schema of the Avro data file would be a record containing a single field that stores the content of the original file. I'm considering using the utility in the avro-tools jar provided by Apache to serialize each file. The utility takes an Avro schema and an input file containing records serialized in JSON format, and produces an Avro data file:

$ java -jar ~/avro-tools-1.7.6.jar fromjson --schema-file twitter.avsc twitter.json > twitter.avro 

I'd like to write a bash script that executes this tool for each file in the folder, but I'm not sure how to form the equivalent record for each file in the JSON format the tool expects. Each of the files is hundreds of MB in size. I was wondering whether forming the JSON file ({"content": "file content.."}) can be done from the command line using text processing commands or tools (awk, sed, etc.). Also, are there better ways of achieving the larger task of migrating multiple files to Avro storage than the approach I'm working on?

Thanks.

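It looks fine so far. You'll need a schema, something like this (a record schema also needs a name; "FileContent" is just a placeholder):

{ "type": "record", "name": "FileContent", "fields": [{"name": "content", "type": "bytes"}] }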

Although I sympathize with your desire not to fire up a real programming language, the Java Avro API (for example) makes it easier to create records, and easier to get the bytes from a file that's on the system.
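If you would rather stay with the fromjson route, here is a minimal sketch (in Python rather than awk or sed, since escaping arbitrary binary content with text tools is fragile) of building the one-line JSON record. It relies on Avro's JSON encoding representing a bytes value as a string whose code points 0-255 stand for the byte values. The function name file_to_json_record is made up for illustration, and the sketch reads the whole file into memory, so for files of hundreds of MB expect the JSON to be noticeably larger than the original.

```python
import json
import sys
from pathlib import Path


def file_to_json_record(path):
    """Return a one-line JSON record {"content": ...} for the file at `path`.

    Hypothetical helper for illustration; the field name "content" matches
    the single-field schema discussed above.
    """
    raw = Path(path).read_bytes()
    # Avro's JSON encoding writes a `bytes` value as a string in which each
    # code point 0-255 stands for one byte, i.e. an ISO-8859-1 decode.
    text = raw.decode("latin-1")
    # ensure_ascii=True escapes every non-ASCII character as \uXXXX, so the
    # output is plain ASCII and stays on a single line.
    return json.dumps({"content": text}, ensure_ascii=True)


if __name__ == "__main__":
    # Print one JSON record per file named on the command line.
    for name in sys.argv[1:]:
        print(file_to_json_record(name))
```

Saved under a name of your choosing (say, to_json_record.py), its output for one file can be redirected to a .json file and passed to the fromjson command from the question, together with a schema file containing the record schema.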

Hope that helps,

Julian

