HBaseAdmin users, beware: notes on an HBase RegionServer exit
First thing in the morning, I found that a RegionServer had died.
The log showed:
2011-09-25 22:31:51,185 [main-SendThread(XXX:2181)] INFO org.apache.zookeeper.ClientCnxn: Unable to read additional data from server sessionid 0x30694969fba6a9, likely server has closed socket, closing socket connection and attempting reconnect
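The cause is clear: the RegionServer could not reach ZooKeeper, and after several failed reconnect attempts it exited.
This machine is a special case, because I run an HBase availability monitor on it.
I worried that the monitor might be the culprit, even though its code is very simple.
So I ran: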
netstat -anp|grep 2181
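Sure enough, the monitor process was holding nearly 3,000 ZooKeeper connections. ZooKeeper limits how many concurrent connections a single client may open (the maxClientCnxns setting, applied per client IP address); the default is 30, and we had raised it to 3000. Since the monitor and the RegionServer run on the same machine, they share that quota. Once the monitor had used it up, the RegionServer could no longer get a ZooKeeper connection, and so it exited.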
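To see which process owns the connections, rather than just the total, the netstat output can be grouped by its PID/Program column. The following is a minimal sketch, assuming the usual Linux `netstat -anp` TCP layout (Proto, Recv-Q, Send-Q, Local Address, Foreign Address, State, PID/Program name). The class name, addresses, and PIDs are made up for illustration and are not taken from the incident.

```java
import java.util.Map;
import java.util.TreeMap;

public class ZkConnectionCounter {

    /**
     * Counts TCP connections whose foreign address ends in ":port",
     * grouped by the "PID/Program name" column of `netstat -anp` output.
     */
    public static Map<String, Integer> countByProcess(String netstatOutput, int port) {
        Map<String, Integer> counts = new TreeMap<>();
        String suffix = ":" + port;
        for (String line : netstatOutput.split("\\R")) {
            String[] cols = line.trim().split("\\s+");
            // Skip headers, non-TCP sockets, and lines without a PID column.
            if (cols.length < 7 || !cols[0].startsWith("tcp")) {
                continue;
            }
            // cols[4] is the foreign address (host:port), cols[6] is PID/Program.
            if (cols[4].endsWith(suffix)) {
                counts.merge(cols[6], 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        // Hypothetical sample lines; real input would come from `netstat -anp`.
        String sample = String.join("\n",
            "tcp 0 0 10.0.0.5:41234 10.0.0.9:2181 ESTABLISHED 4321/java",
            "tcp 0 0 10.0.0.5:41235 10.0.0.9:2181 ESTABLISHED 4321/java",
            "tcp 0 0 10.0.0.5:41300 10.0.0.9:2181 ESTABLISHED 7788/java",
            "tcp 0 0 10.0.0.5:60020 10.0.0.7:51000 ESTABLISHED 7788/java",
            "tcp 0 0 0.0.0.0:60020 0.0.0.0:* LISTEN 7788/java");
        System.out.println(countByProcess(sample, 2181));
    }
}
```

A process whose count is close to the configured connection limit is the one to look at first.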
Because the monitoring logic is so simple, suspicion fell on this one statement:
    try {
        HBaseAdmin.checkHBaseAvailable(hbaseConfig);
    } catch (MasterNotRunningException e) {
        logger.error(e.getMessage(), e);
        return new GangliaData("HBase Cluster Availability", e.getMessage(), 0);
    } catch (ZooKeeperConnectionException e) {
        logger.error(e.getMessage(), e);
        return new GangliaData("HBase Cluster Availability", e.getMessage(), 0);
    }

So I took a look at the code of checkHBaseAvailable:
    public static void checkHBaseAvailable(Configuration conf)
        throws MasterNotRunningException, ZooKeeperConnectionException {
        Configuration copyOfConf = HBaseConfiguration.create(conf);
        copyOfConf.setInt("hbase.client.retries.number", 1);
        new HBaseAdmin(copyOfConf);
    }

All it does is copy the conf and create a new HBaseAdmin. If the instance can be created, the cluster is available; otherwise an exception is thrown.
But wait: this statement creates an HBaseAdmin instance, and the constructor opens a connection to ZooKeeper that is never released.
    public HBaseAdmin(Configuration c)
        throws MasterNotRunningException, ZooKeeperConnectionException {
        this.conf = HBaseConfiguration.create(c);
        this.connection = HConnectionManager.getConnection(this.conf);
        this.pause = this.conf.getLong("hbase.client.pause", 1000);
        this.numRetries = this.conf.getInt("hbase.client.retries.number", 10);
        this.retryLongerMultiplier = this.conf.getInt("hbase.client.retries.longer.multiplier", 10);
        this.connection.getMaster();
    }

Finally, I searched online and, sure enough, this is a known HBase bug: HBASE-4417.
The fix is:
Index: src/main/java/org/apache/hadoop/hbase/client/HBaseAdmin.java
===================================================================
--- src/main/java/org/apache/hadoop/hbase/client/HBaseAdmin.java (revision 1171389)
+++ src/main/java/org/apache/hadoop/hbase/client/HBaseAdmin.java (working copy)
@@ -1254,6 +1254,7 @@
   throws MasterNotRunningException, ZooKeeperConnectionException {
     Configuration copyOfConf = HBaseConfiguration.create(conf);
     copyOfConf.setInt("hbase.client.retries.number", 1);
-    new HBaseAdmin(copyOfConf);
+    HBaseAdmin admin = new HBaseAdmin(copyOfConf);
+    HConnectionManager.deleteConnection(admin.getConfiguration(), false);
   }
 }
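
Why that one added line matters can be shown with a toy model. The sketch below is not HBase or ZooKeeper code: FakeZooKeeper, leakyCheck, and closingCheck are hypothetical stand-ins, and the host address and poll count are invented. It only assumes what was observed above, namely a per-IP connection cap (3000 here) and a check that opens one connection per call.

```java
import java.util.HashMap;
import java.util.Map;

public class ZkLeakModel {

    /** Toy stand-in for a ZooKeeper server that enforces a per-IP connection cap. */
    public static class FakeZooKeeper {
        private final int maxClientCnxns;
        private final Map<String, Integer> openByIp = new HashMap<>();

        public FakeZooKeeper(int maxClientCnxns) {
            this.maxClientCnxns = maxClientCnxns;
        }

        /** Returns false when the IP has already reached its quota. */
        public boolean connect(String ip) {
            int open = openByIp.getOrDefault(ip, 0);
            if (open >= maxClientCnxns) {
                return false;
            }
            openByIp.put(ip, open + 1);
            return true;
        }

        public void close(String ip) {
            int open = openByIp.getOrDefault(ip, 0);
            if (open > 0) {
                openByIp.put(ip, open - 1);
            }
        }
    }

    /** Mimics the unpatched check: opens a connection and never releases it. */
    public static boolean leakyCheck(FakeZooKeeper zk, String ip) {
        return zk.connect(ip);
    }

    /** Mimics the patched check: releases the connection before returning. */
    public static boolean closingCheck(FakeZooKeeper zk, String ip) {
        boolean ok = zk.connect(ip);
        if (ok) {
            zk.close(ip);
        }
        return ok;
    }

    /**
     * Runs `polls` availability checks from one host, then reports whether
     * another process on the same host (the RegionServer) can still connect.
     */
    public static boolean regionServerCanConnect(int limit, int polls, boolean leaky) {
        FakeZooKeeper zk = new FakeZooKeeper(limit);
        String host = "10.0.0.5"; // hypothetical address shared by monitor and RegionServer
        for (int i = 0; i < polls; i++) {
            if (leaky) {
                leakyCheck(zk, host);
            } else {
                closingCheck(zk, host);
            }
        }
        return zk.connect(host);
    }

    public static void main(String[] args) {
        System.out.println("leaky check:   " + regionServerCanConnect(3000, 5000, true));
        System.out.println("closing check: " + regionServerCanConnect(3000, 5000, false));
    }
}
```

In this model the leaky variant exhausts the quota after 3000 polls and locks out every other process on the host, while the closing variant never holds more than one connection at a time. Raising the limit, as we did from 30 to 3000, only postpones the failure.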