hadoop - Will a sequence file help improve read performance in HDFS compared to the local file system?


I want to compare the performance of HDFS and the local file system for 1000 small files (1-2 MB each). Without using sequence files, HDFS takes almost double the time to read the 1000 files compared to the local file system. I heard about sequence files in "Small Files Problem in HDFS". I want to show a better response time for HDFS than for the local FS when retrieving these records. Will sequence files help, or should I look for something else? (HBase, maybe?)

Edit: I'm using a Java program to read the files, as in "HDFS Read though Java".

Yes, for simple file retrieval, grabbing a single sequence file will be much quicker than grabbing 1000 files. When reading from HDFS you incur much more overhead, including spinning up the JVM (assuming you're using hadoop fs -get ...), getting the location of each of the files from the NameNode, and network time (assuming you have more than one DataNode).

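A sequence file can be thought of as a form of container. If you put all 1000 files into a sequence file, you only need to grab about 32 blocks (if your block size is set to 64 MB and the files are around 2 MB each, roughly 2 GB in total) rather than 1000. This reduces location lookups and the total number of network connections made. You do run into another issue at this point, though: reading the sequence file, since it is a binary format.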
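The block count above can be checked with a little arithmetic. This is only a minimal sketch, assuming 1000 files of 2 MB each and a 64 MB block size; the class and method names are hypothetical helpers, not part of any Hadoop API, and the estimate ignores the small per-record overhead a container format adds.

```java
// Hypothetical helper (not a Hadoop class): estimates how many HDFS blocks
// a set of equally sized files occupies when stored one file at a time
// versus packed into a single container file.
final class BlockEstimate {

    // A file stored on its own occupies at least one block, however small it is.
    static long blocksStoredSeparately(long fileCount, long fileSizeMb, long blockSizeMb) {
        long blocksPerFile = Math.max(1, ceilDiv(fileSizeMb, blockSizeMb));
        return fileCount * blocksPerFile;
    }

    // Packed into one container, only the total size determines the block count.
    static long blocksWhenPacked(long fileCount, long fileSizeMb, long blockSizeMb) {
        return ceilDiv(fileCount * fileSizeMb, blockSizeMb);
    }

    // Integer division rounded up, for positive operands.
    private static long ceilDiv(long a, long b) {
        return (a + b - 1) / b;
    }

    public static void main(String[] args) {
        System.out.println(blocksStoredSeparately(1000, 2, 64)); // prints 1000
        System.out.println(blocksWhenPacked(1000, 2, 64));       // prints 32
    }
}
```

With 1 MB files the packed figure drops to 16 blocks, so the exact number depends on where in the 1-2 MB range your files fall.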

HBase is better suited to low-latency and random reads, so it may be a better option for you. Keep in mind that disk seeks still occur (unless you're working from memory), so reading a bunch of small files locally may be a better solution than using HDFS as a file store.
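To check that against your own data, you need a local baseline to compare the HDFS numbers with. Here is a minimal sketch of one way to time it, assuming all the files sit directly in one directory; `LocalReadTimer` and `readAll` are hypothetical names, and only standard `java.nio.file` calls are used.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical timing harness for the local file system side of the comparison.
final class LocalReadTimer {

    // Reads every regular file directly under dir and returns the total bytes read.
    static long readAll(Path dir) throws IOException {
        long total = 0;
        try (DirectoryStream<Path> files = Files.newDirectoryStream(dir)) {
            for (Path file : files) {
                if (Files.isRegularFile(file)) {
                    total += Files.readAllBytes(file).length;
                }
            }
        }
        return total;
    }

    public static void main(String[] args) throws IOException {
        Path dir = Paths.get(args[0]); // directory holding the small files
        long start = System.nanoTime();
        long bytes = readAll(dir);
        long elapsedMs = (System.nanoTime() - start) / 1_000_000;
        System.out.println(bytes + " bytes in " + elapsedMs + " ms");
    }
}
```

Repeated runs will usually be served from the operating system's page cache, so the first run and later runs can differ a lot; decide which case you want to compare against HDFS.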

