murmur-hash\Md5\crc32等算法对用户信息散列均匀程度测试

问题的提出

  • 一致性hash,散列并不均匀
  • 集群中机器性能并不均衡。因素众多,比如型号不同、硬件不同;同型号下,机器性能表现也不同。
  • 当使用一致性hash作为负载均衡策略时,会让机器负载更加不均衡。

因此需要对一致性hash作为负载均衡策略时表现的的特性进行详细探索。最主要观察的指标则是散列均匀程度。

提前给出实验结果

  • murmur hash的散列均匀程度远优于crc32,和MD5相当。但murmur hash在计算消耗上低于MD5
  • 使用murmur hash,在100台物理节点,5000虚拟节点倍数的情况下,可以做到散列均匀和性能的平衡

brpc中如何实现一致性hash负载均衡

// code file: brpc/src/brpc/policy/consistent_hashing_load_balancer.cpp
int ConsistentHashingLoadBalancer::SelectServer(
    const SelectIn &in, SelectOut *out) {
    if (!in.has_request_code) {
        LOG(ERROR) << "Controller.set_request_code() is required";
        return EINVAL;
    }
    if (in.request_code > UINT_MAX) {
        LOG(ERROR) << "request_code must be 32-bit currently";
        return EINVAL;
    }
    butil::DoublyBufferedData >::ScopedPtr s;
    if (_db_hash_ring.Read(&s) != 0) {
        return ENOMEM;
    }
    if (s->empty()) {
        return ENODATA;
    }
    std::vector::const_iterator choice =
        std::lower_bound(s->begin(), s->end(), (uint32_t)in.request_code);
    if (choice == s->end()) {
        choice = s->begin();
    }
    for (size_t i = 0; i < s->size(); ++i) {
        if (((i + 1) == s->size() // always take last chance
             || !ExcludedServers::IsExcluded(in.excluded, choice->server_sock.id))
            && Socket::Address(choice->server_sock.id, out->ptr) == 0
            && (*out->ptr)->IsAvailable()) {
            return 0;
        } else {
            if (++choice == s->end()) {
                choice = s->begin();
            }
        }
    }
    return EHOSTDOWN;
}
bool ConsistentHashingLoadBalancer::AddServer(const ServerId& server) {
    std::vector add_nodes;
    add_nodes.reserve(_num_replicas);
    if (!GetReplicaPolicy(_type)->Build(server, _num_replicas, &add_nodes)) {
        return false;
    }
    std::sort(add_nodes.begin(), add_nodes.end());
    bool executed = false;
    const size_t ret = _db_hash_ring.ModifyWithForeground(
                        AddBatch, add_nodes, &executed);
    CHECK(ret == 0 || ret == _num_replicas) << ret;
    return ret != 0;
}

由代码看出hash环中的每台服务器会变为100个虚拟节点,假设有100台机器,则在hash环分布上会有10000个节点

不同散列算法对比

hash/曼哈顿距离/虚拟节点数 100*100 1000*100 3000*100 5000*100 10000*1000
murmur 30% 20% 10% 5% 5%
MD5 30% 20% 10% 5% 5%
crc32 150% 100% 50% 30% 30%

结论,murmur hash的散列均匀程度优于crc32、MD5等

MurMurHash算法分析

测试murmurhash 散列均匀程度的代码

int main () {
    int32_t rep = 100;
    int32_t base = 100;
    std::vector  nodes;
    nodes.resize(base);
    std::vector imeis;

    std::map node_info;
    std::vector rings;

    // init cs rings
    for (auto& node : nodes) {
        node.host = std::to_string(rand());
        for (int32_t i = 0; i < rep; ++i) {
            std::string host = node.host +  "-" + std::to_string(i);
            uint32_t hash = MurmurHash32(host.c_str(), host.size());
            node.hash = hash;
            node_info[hash] = &node;
            rings.push_back(hash);
        }
    }
    std::sort(rings.begin(), rings.end());
    // read imeis
    {
        std::ifstream if_stream("./imei_sample.txt");
        std::string imei;
        while(if_stream >> imei) {
            imeis.emplace_back(imei);
        }
        if_stream.close();
        cout << "imei size:" << imeis.size()
            << endl;
    }
    // search
    for (auto& imei : imeis) {
        uint32_t hash = MurmurHash32(imei.c_str(), imei.size());
        auto iter = std::lower_bound(rings.begin(), rings.end(), hash);
        if (iter == rings.end()) {
            iter = rings.begin();
        }
        node_info[*iter]->count += 1;
    }

    // print
    std::sort(nodes.begin(), nodes.end(), [](const Node& a, Node& b)->bool { return( a.count < b.count); });
    for (auto& node : nodes) {
        cout << "host:" << node.host
            << "\thash:" << node.hash
            << "\tcount:" << node.count
            << "\trate:" << (double)node.count/(double)imeis.size()
            << endl;
    }

    auto max = nodes.rbegin()->count;
    auto avg = imeis.size()/nodes.size();
    auto min = nodes.begin()->count;
    cout << "max:" << nodes.rbegin()->count << endl;
    cout << "avg:" << imeis.size()/nodes.size() << endl;
    cout << "min:" << nodes.begin()->count << endl;
    cout << "up:" << (double)(max-avg)/(double)avg << endl;
    cout << "down:" << (double)(avg-min)/(double)avg << endl;
    return 0;

测试结果

nodes_num chash_num_replicas 单核cpu使用率% 内存使用率% 100万次查询/s 波动率 hash ring时间 us hash ring sort时间 us
100 100 3 0.3 1.56 0.403038 14014 3564
100 1000 3 0.4 1.89783 0.0884258 156900 42973
100 5000 4.7 0.5 2.49725 0.0536913 880742 244460
100 10000 5 0.6 2.60658 0.0416814 1900755 519096
100 50000 10 1 3.24731 0.024373 11680936 2852591
100 100000 15 3 3.52997 0.0270811 25035142 5932167
100 500000 100 11 4.62568 0.0300247 148615862 32576504

结论

总量 235W imei
波动范围:-21~35%
+0.352182
-0.210649

./murmur_hash
imei size:2362768
host:278722862  hash:1925471051 count:18650 rate:0.00789328
host:783368690  hash:4002282954 count:19363 rate:0.00819505
host:1548233367 hash:3058897622 count:19635 rate:0.00831017
host:491705403  hash:611553964  count:19699 rate:0.00833726
host:1984210012 hash:1268219422 count:19909 rate:0.00842613
host:1303455736 hash:2326844163 count:20035 rate:0.00847946
host:855636226  hash:3637853163 count:20067 rate:0.008493
host:1059961393 hash:2408710735 count:20074 rate:0.00849597
host:1957747793 hash:1288415059 count:20084 rate:0.0085002
host:1477171087 hash:1988358724 count:20170 rate:0.0085366
host:1540383426 hash:411668640  count:20173 rate:0.00853787
host:610515434  hash:663691886  count:20526 rate:0.00868727
host:1973594324 hash:34305553   count:20670 rate:0.00874821
host:760313750  hash:3679971615 count:20685 rate:0.00875456
host:137806862  hash:1969944721 count:20771 rate:0.00879096
host:294702567  hash:2546435079 count:20788 rate:0.00879816
host:468703135  hash:646134042  count:20832 rate:0.00881678
host:943947739  hash:90910835   count:21160 rate:0.0089556
host:709393584  hash:574850828  count:21243 rate:0.00899073
host:1681692777 hash:3333659905 count:21308 rate:0.00901824
host:1102520059 hash:1769249379 count:21493 rate:0.00909653
host:1632621729 hash:1495248802 count:21503 rate:0.00910077
host:233665123  hash:1873607595 count:21784 rate:0.00921969
host:859484421  hash:3330505905 count:21797 rate:0.0092252
host:424238335  hash:4012193417 count:22014 rate:0.00931704
host:184803526  hash:2094014013 count:22097 rate:0.00935217
host:1911759956 hash:824332016  count:22155 rate:0.00937671
host:412776091  hash:2345249892 count:22194 rate:0.00939322
host:1125898167 hash:3699427959 count:22223 rate:0.00940549
host:135497281  hash:3267716888 count:22224 rate:0.00940592
host:572660336  hash:2471704921 count:22463 rate:0.00950707
host:596516649  hash:3577994617 count:22477 rate:0.00951299
host:719885386  hash:1271161395 count:22746 rate:0.00962684
host:1843993368 hash:209167012  count:22748 rate:0.00962769
host:1131176229 hash:253515204  count:22804 rate:0.00965139
host:805750846  hash:3215568319 count:22873 rate:0.00968059
host:35005211   hash:2142699485 count:22887 rate:0.00968652
host:1649760492 hash:1041784901 count:22967 rate:0.00972038
host:1967513926 hash:1618568325 count:22975 rate:0.00972376
host:2038664370 hash:3103071240 count:23099 rate:0.00977625
host:2044897763 hash:1674256958 count:23241 rate:0.00983634
host:1656478042 hash:1438521939 count:23273 rate:0.00984989
host:511702305  hash:3873703906 count:23349 rate:0.00988205
host:752392754  hash:3262287234 count:23352 rate:0.00988332
host:304089172  hash:1732284559 count:23392 rate:0.00990025
host:84353895   hash:2348091605 count:23450 rate:0.0099248
host:1469348094 hash:646865057  count:23522 rate:0.00995527
host:1726956429 hash:4181074938 count:23539 rate:0.00996247
host:945117276  hash:4238293984 count:23567 rate:0.00997432
host:1141616124 hash:1571084204 count:23687 rate:0.0100251
host:1937477084 hash:4082607906 count:23749 rate:0.0100513
host:1474612399 hash:3177345892 count:23788 rate:0.0100679
host:1129566413 hash:1470592129 count:23800 rate:0.0100729
host:1734575198 hash:2699928391 count:23809 rate:0.0100767
host:628175011  hash:2004400594 count:23829 rate:0.0100852
host:356426808  hash:1763444061 count:23858 rate:0.0100975
host:1424268980 hash:4064778616 count:23945 rate:0.0101343
host:1100661313 hash:277960042  count:23951 rate:0.0101368
host:336465782  hash:2967368358 count:24001 rate:0.010158
host:861021530  hash:204708028  count:24044 rate:0.0101762
host:1264095060 hash:2871390615 count:24102 rate:0.0102007
host:1780695788 hash:3482626452 count:24127 rate:0.0102113
host:635723058  hash:1724005082 count:24184 rate:0.0102355
host:42999170   hash:3780009203 count:24188 rate:0.0102371
host:608413784  hash:1746093087 count:24209 rate:0.010246
host:1315634022 hash:2863942689 count:24290 rate:0.0102803
host:846930886  hash:3078961593 count:24352 rate:0.0103066
host:1914544919 hash:3276410919 count:24523 rate:0.0103789
host:1801979802 hash:4146573372 count:24672 rate:0.010442
host:1365180540 hash:625462217  count:24727 rate:0.0104653
host:939819582  hash:2372438818 count:24831 rate:0.0105093
host:982906996  hash:247356574  count:24930 rate:0.0105512
host:521595368  hash:3895493184 count:24982 rate:0.0105732
host:2145174067 hash:1076998566 count:25050 rate:0.010602
host:2084420925 hash:1383371566 count:25103 rate:0.0106244
host:1433925857 hash:4101903831 count:25117 rate:0.0106303
host:1956297539 hash:1715097599 count:25379 rate:0.0107412
host:1374344043 hash:649457066  count:25383 rate:0.0107429
host:1101513929 hash:1467247780 count:25506 rate:0.010795
host:756898537  hash:1686538984 count:25587 rate:0.0108292
host:1189641421 hash:176009951  count:25675 rate:0.0108665
host:1159126505 hash:1773799926 count:25939 rate:0.0109782
host:1827336327 hash:1008242909 count:25984 rate:0.0109973
host:1714636915 hash:4167178899 count:26113 rate:0.0110519
host:749241873  hash:3575756704 count:26202 rate:0.0110895
host:1025202362 hash:2435165496 count:26208 rate:0.0110921
host:1585990364 hash:1528149985 count:26263 rate:0.0111154
host:2001100545 hash:3446016448 count:26377 rate:0.0111636
host:1998898814 hash:1116227003 count:26415 rate:0.0111797
host:1350490027 hash:1503077054 count:26472 rate:0.0112038
host:1804289383 hash:4001391486 count:26715 rate:0.0113067
host:1749698586 hash:1034046264 count:26729 rate:0.0113126
host:1411549676 hash:2524237029 count:26744 rate:0.0113189
host:149798315  hash:2620653860 count:26859 rate:0.0113676
host:1369133069 hash:474177003  count:26875 rate:0.0113744
host:2089018456 hash:2266701595 count:27028 rate:0.0114391
host:1889947178 hash:533689514  count:27869 rate:0.0117951
host:1653377373 hash:2085222316 count:28508 rate:0.0120655
host:2053999932 hash:399467675  count:30092 rate:0.0127359
host:1918502651 hash:941599913  count:31948 rate:0.0135214
max:31948
avg:23627
min:18650
up:0.352182
down:0.210649

发表回复

您的电子邮箱地址不会被公开。 必填项已用 * 标注