How HDFS protects your data

We often get asked how HDFS protects data and what mechanisms are in place to prevent data corruption. Eric Sammer explains this in detail in Hadoop Operations. In addition to the points below, you can also run a second cluster and sync the files to it, simply to guard against human error, such as someone deleting a subset of the data. If you have enough space in your cluster, enabling the trash feature in core-site.xml and setting the interval to a day or more helps too.

<property>
  <name>fs.trash.interval</name>
  <value>1440</value>
  <description>Number of minutes after which the checkpoint gets deleted.
  If zero, the trash feature is disabled. 1440 means 1 day.</description>
</property>
<property>
  <name>fs.trash.checkpoint.interval</name>
  <value>15</value>
  <description>Number of minutes between trash checkpoints. Should be smaller
  or equal to fs.trash.interval. Every time the checkpointer runs it creates
  a new checkpoint out of the current one and removes checkpoints created more
  than fs.trash.interval minutes ago.</description>
</property>
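To see the trash feature in action, here is a minimal shell sketch; the user name hdpuser and the paths are placeholders, not from the original post. With fs.trash.interval greater than zero, a delete moves the file into the deleting user's .Trash directory, from where it can be restored until the checkpoint expires:

hdfs dfs -rm /data/events/part-00000                   # moved to trash, not deleted
hdfs dfs -ls /user/hdpuser/.Trash/Current/data/events  # the file now lives here
hdfs dfs -mv /user/hdpuser/.Trash/Current/data/events/part-00000 /data/events/
hdfs dfs -rm -skipTrash /data/events/part-00000        # bypasses trash, use with care

For the second-cluster sync mentioned above, DistCp is the usual tool. A sketch, assuming nn1 and nn2 are the NameNodes of the primary and backup clusters; -update copies only files that are missing or differ on the target:

hadoop distcp -update hdfs://nn1:8020/data hdfs://nn2:8020/data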