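Background
HBase data is distributed across multiple machines running the RegionServer role. Each RegionServer hosts one or more Regions, and each Region manages a different rowkey range. Choosing sensible Region boundaries and a sensible Region count before creating a table therefore helps avoid read/write hotspots and makes full use of every RegionServer's resources. vmaster-hbase provides a pre-splitting feature for this purpose.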
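Manual splitting
The user supplies the split points, based on the characteristics of the data and the number of machines in the resource group.
1.1 Split points are strings
1.2 Split points are integers
HBase stores everything as binary bytes, so every Int split point must be converted to hexadecimal before it is passed in. For example, given the split points 1, 10 and 15, each of type Int, Bytes.toHex(Bytes.toBytes(splitPoint)) can be used to obtain the hexadecimal representation of each one. The table in 1.2.1 lists the resulting bytes in \x-escaped notation.
1.2.1 Hexadecimal representation of the split points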
Int | Hexadecimal representation |
---|---|
1 | \x00\x00\x00\x01 |
10 | \x00\x00\x00\x0a |
15 | \x00\x00\x00\x0f |
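
The conversion in the table above can be reproduced with a minimal sketch that uses only the JDK. It assumes that Bytes.toBytes(int) produces the 4-byte big-endian layout shown in the table; the class IntSplitPoints and its two helper methods are hypothetical names introduced for illustration and are not part of HBase or vmaster-hbase.

```java
import java.nio.ByteBuffer;

public class IntSplitPoints {
    // Encodes an int as 4 big-endian bytes (ByteBuffer is big-endian by default).
    // Assumption: this matches the byte layout listed in the table.
    static byte[] toBytes(int value) {
        return ByteBuffer.allocate(4).putInt(value).array();
    }

    // Renders every byte as \xNN in lowercase hex, the notation used in the table.
    static String toEscapedHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            sb.append(String.format("\\x%02x", b & 0xff));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // The three example split points from the text.
        for (int splitPoint : new int[] {1, 10, 15}) {
            System.out.println(splitPoint + " -> " + toEscapedHex(toBytes(splitPoint)));
        }
    }
}
```

Because the encoding is fixed-width and big-endian, non-negative integers keep their numeric order when compared byte by byte, which is what makes them usable as ordered split points.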
1.2.2 Split point test
Automatic splitting
2.1 HexStringSplit
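Choose the number of regions according to the number of machines; 20 to 30 regions per machine is recommended.
When the rowkey is an integer, this split algorithm is recommended. HexStringSplit divides the whole unsigned integer range 00000000~FFFFFFFF evenly by the number of regions, renders each split point as a hexadecimal string left-padded with '0' to 8 characters, and calls Bytes.toBytes(bigIntegerString) to turn that string into a byte array. The split points are therefore the bytes of the hexadecimal text, not the 4-byte binary integers described in 1.2. The core code is as follows:
2.1.1 Splitting the rowkey range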
```java
public byte[][] split(int n) {
  Preconditions.checkArgument(lastRowInt.compareTo(firstRowInt) > 0,
      "last row (%s) is configured less than first row (%s)", lastRow, firstRow);
  // +1 to range because the last row is inclusive
  BigInteger range = lastRowInt.subtract(firstRowInt).add(BigInteger.ONE);
  Preconditions.checkState(range.compareTo(BigInteger.valueOf(n)) >= 0,
      "split granularity (%s) is greater than the range (%s)", n, range);

  BigInteger[] splits = new BigInteger[n - 1];
  BigInteger sizeOfEachSplit = range.divide(BigInteger.valueOf(n));
  for (int i = 1; i < n; i++) {
    // NOTE: this means the last region gets all the slop.
    // This is not a big deal if we're assuming n << MAXHEX
    splits[i - 1] = firstRowInt.add(sizeOfEachSplit.multiply(BigInteger.valueOf(i)));
  }
  return convertToBytes(splits);
}
```
2.1.2 Converting split points to byte arrays
```java
/**
 * Returns the bytes corresponding to the BigInteger
 *
 * @param bigInteger number to convert
 * @param pad padding length
 * @return byte corresponding to input BigInteger
 */
public static byte[] convertToByte(BigInteger bigInteger, int pad) {
  String bigIntegerString = bigInteger.toString(16);
  bigIntegerString = StringUtils.leftPad(bigIntegerString, pad, '0');
  return Bytes.toBytes(bigIntegerString);
}
```
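To make the two methods above concrete, the following is a minimal JDK-only sketch that combines the range division and the zero padding for the default range 00000000~FFFFFFFF. It is an illustration, not the HBase implementation: the class HexSplitSketch and the method splitPoints are hypothetical names, the padding is done with a plain loop instead of StringUtils.leftPad, and the result is kept as strings rather than converted with Bytes.toBytes.

```java
import java.math.BigInteger;

public class HexSplitSketch {
    // Returns the n-1 split points that cut 00000000~FFFFFFFF into n regions,
    // each rendered as an 8-character, zero-padded lowercase hex string.
    static String[] splitPoints(int n) {
        BigInteger first = BigInteger.ZERO;
        BigInteger last = new BigInteger("FFFFFFFF", 16);
        // +1 because the last row is inclusive
        BigInteger range = last.subtract(first).add(BigInteger.ONE);
        BigInteger step = range.divide(BigInteger.valueOf(n));
        String[] splits = new String[n - 1];
        for (int i = 1; i < n; i++) {
            String hex = first.add(step.multiply(BigInteger.valueOf(i))).toString(16);
            // Left-pad with '0' up to 8 characters (stand-in for StringUtils.leftPad).
            StringBuilder padded = new StringBuilder();
            for (int k = hex.length(); k < 8; k++) {
                padded.append('0');
            }
            splits[i - 1] = padded.append(hex).toString();
        }
        return splits;
    }

    public static void main(String[] args) {
        // Four regions need three split points: 40000000, 80000000, c0000000
        for (String s : splitPoints(4)) {
            System.out.println(s);
        }
    }
}
```

With 4 regions the step is 0x100000000 / 4 = 0x40000000, so the boundaries fall at 40000000, 80000000 and c0000000. With 32 regions the first boundary is 08000000, which shows why the left padding is needed to keep all split points the same length.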
2.2 UniformSplit
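Choose the number of regions according to the number of machines; 20 to 30 regions per machine is recommended.
Use this split algorithm when the rowkey is a raw byte array (byte[]) whose bytes span \x00~\xff and whose values are close to uniformly random, such as hashes. UniformSplit converts the split points with BigInteger's toByteArray().
2.2.1 Split point algorithm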
```java
/**
 * Iterate over keys within the passed range.
 */
public static Iterable<byte[]> iterateOnSplits(
    final byte[] a, final byte[] b, boolean inclusive, final int num) {
  byte[] aPadded;
  byte[] bPadded;
  if (a.length < b.length) {
    aPadded = padTail(a, b.length - a.length);
    bPadded = b;
  } else if (b.length < a.length) {
    aPadded = a;
    bPadded = padTail(b, a.length - b.length);
  } else {
    aPadded = a;
    bPadded = b;
  }
  if (compareTo(aPadded, bPadded) >= 0) {
    throw new IllegalArgumentException("b <= a");
  }
  if (num <= 0) {
    throw new IllegalArgumentException("num cannot be <= 0");
  }
  byte[] prependHeader = {1, 0};
  final BigInteger startBI = new BigInteger(add(prependHeader, aPadded));
  final BigInteger stopBI = new BigInteger(add(prependHeader, bPadded));
  BigInteger diffBI = stopBI.subtract(startBI);
  if (inclusive) {
    diffBI = diffBI.add(BigInteger.ONE);
  }
  final BigInteger splitsBI = BigInteger.valueOf(num + 1);
  // when diffBI < splitBI, use an additional byte to increase diffBI
  if (diffBI.compareTo(splitsBI) < 0) {
    byte[] aPaddedAdditional = new byte[aPadded.length + 1];
    byte[] bPaddedAdditional = new byte[bPadded.length + 1];
    for (int i = 0; i < aPadded.length; i++) {
      aPaddedAdditional[i] = aPadded[i];
    }
    for (int j = 0; j < bPadded.length; j++) {
      bPaddedAdditional[j] = bPadded[j];
    }
    aPaddedAdditional[aPadded.length] = 0;
    bPaddedAdditional[bPadded.length] = 0;
    return iterateOnSplits(aPaddedAdditional, bPaddedAdditional, inclusive, num);
  }
  final BigInteger intervalBI;
  try {
    intervalBI = diffBI.divide(splitsBI);
  } catch (Exception e) {
    LOG.error("Exception caught during division", e);
    return null;
  }

  final Iterator<byte[]> iterator = new Iterator<byte[]>() {
    private int i = -1;

    @Override
    public boolean hasNext() {
      return i < num + 1;
    }

    @Override
    public byte[] next() {
      i++;
      if (i == 0) return a;
      if (i == num + 1) return b;

      BigInteger curBI = startBI.add(intervalBI.multiply(BigInteger.valueOf(i)));
      byte[] padded = curBI.toByteArray();
      if (padded[1] == 0) padded = tail(padded, padded.length - 2);
      else padded = tail(padded, padded.length - 1);
      return padded;
    }

    @Override
    public void remove() {
      throw new UnsupportedOperationException();
    }
  };

  return new Iterable<byte[]>() {
    @Override
    public Iterator<byte[]> iterator() {
      return iterator;
    }
  };
}
```
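
The core idea of the method above (treat both keys as unsigned big integers, divide the distance into num + 1 equal intervals, and convert each boundary back to bytes) can be shown with a much smaller JDK-only sketch. It is a simplification under stated assumptions: both keys must already have the same length, the extra-byte fallback for very narrow ranges is omitted, and new BigInteger(1, bytes) is used in place of the {1, 0} header trick. UniformSplitSketch, splits and toFixedLength are hypothetical names introduced for illustration.

```java
import java.math.BigInteger;
import java.util.ArrayList;
import java.util.List;

public class UniformSplitSketch {
    // Returns `num` split keys between a and b, which must have equal length.
    static List<byte[]> splits(byte[] a, byte[] b, boolean inclusive, int num) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("sketch assumes equal-length keys");
        }
        if (num <= 0) {
            throw new IllegalArgumentException("num cannot be <= 0");
        }
        // Signum 1 makes BigInteger read the bytes as an unsigned magnitude.
        BigInteger start = new BigInteger(1, a);
        BigInteger stop = new BigInteger(1, b);
        if (start.compareTo(stop) >= 0) {
            throw new IllegalArgumentException("b <= a");
        }
        BigInteger diff = stop.subtract(start);
        if (inclusive) {
            diff = diff.add(BigInteger.ONE);
        }
        BigInteger interval = diff.divide(BigInteger.valueOf(num + 1));
        List<byte[]> out = new ArrayList<>();
        for (int i = 1; i <= num; i++) {
            BigInteger cur = start.add(interval.multiply(BigInteger.valueOf(i)));
            out.add(toFixedLength(cur, a.length));
        }
        return out;
    }

    // toByteArray() may add a leading sign byte or drop leading zero bytes,
    // so the value is copied right-aligned into an array of the key length.
    static byte[] toFixedLength(BigInteger value, int len) {
        byte[] raw = value.toByteArray();
        byte[] out = new byte[len];
        int copy = Math.min(raw.length, len);
        System.arraycopy(raw, raw.length - copy, out, len - copy, copy);
        return out;
    }

    public static void main(String[] args) {
        // Split the one-byte range \x00~\xff into four regions.
        for (byte[] key : splits(new byte[] {0x00}, new byte[] {(byte) 0xff}, true, 3)) {
            System.out.println(String.format("\\x%02x", key[0] & 0xff));
        }
    }
}
```

For the one-byte range with inclusive set to true, the distance is 256 and the interval is 64, so the three boundaries are \x40, \x80 and \xc0. The fixed-length copy plays the role of the tail(...) calls in the original: it removes whatever toByteArray() adds or drops so that each split key has the same length as the input keys.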