编程 PHP

php-perl哈希算法实现(times33哈希算法)

Posted in PHP onDecember 30, 2013

APR_DECLARE_NONSTD(unsigned int) apr_hashfunc_default(const char *char_key,
                                                      apr_ssize_t *klen)
{
    unsigned int hash = 0;
    const unsigned char *key = (const unsigned char *)char_key;
    const unsigned char *p;
    apr_ssize_t i;    /*
     * This is the popular `times 33' hash algorithm which is used by
     * perl and also appears in Berkeley DB. This is one of the best
     * known hash functions for strings because it is both computed
     * very fast and distributes very well.
     *
     * The originator may be Dan Bernstein but the code in Berkeley DB
     * cites Chris Torek as the source. The best citation I have found
     * is "Chris Torek, Hash function for text in C, Usenet message
     * <27038@mimsy.umd.edu> in comp.lang.c , October, 1990." in Rich
     * Salz's USENIX 1992 paper about INN which can be found at
     * .
     *
     * The magic of number 33, i.e. why it works better than many other
     * constants, prime or not, has never been adequately explained by
     * anyone. So I try an explanation: if one experimentally tests all
     * multipliers between 1 and 256 (as I did while writing a low-level
     * data structure library some time ago) one detects that even
     * numbers are not useable at all. The remaining 128 odd numbers
     * (except for the number 1) work more or less all equally well.
     * They all distribute in an acceptable way and this way fill a hash
     * table with an average percent of approx. 86%.
     *
     * If one compares the chi^2 values of the variants (see
     * Bob Jenkins ``Hashing Frequently Asked Questions'' at
     * http://burtleburtle.net/bob/hash/hashfaq.html for a description
     * of chi^2), the number 33 not even has the best value. But the
     * number 33 and a few other equally good numbers like 17, 31, 63,
     * 127 and 129 have nevertheless a great advantage to the remaining
     * numbers in the large set of possible multipliers: their multiply
     * operation can be replaced by a faster operation based on just one
     * shift plus either a single addition or subtraction operation. And
     * because a hash function has to both distribute good _and_ has to
     * be very fast to compute, those few numbers should be preferred.
     *
     *                  -- Ralf S. Engelschall 
     */
    if (*klen == APR_HASH_KEY_STRING) {
        for (p = key; *p; p++) {
            hash = hash * 33 + *p;
        }
        *klen = p - key;
    }
    else {
        for (p = key, i = *klen; i; i--, p++) {
            hash = hash * 33 + *p;
        }
    }
    return hash;
}

对函数注释部分的翻译: 这是很出名的times33哈希算法,此算法被perl语言采用并在Berkeley DB中出现.它是已知的最好的哈希算法之一,在处理以字符串为键值的哈希时,有着极快的计算效率和很好哈希分布.最早提出这个算法的是Dan Bernstein,但是源代码确实由Clris Torek在Berkeley DB出实作的.我找到的最确切的引文中这样说”Chris Torek,C语言文本哈希函数,Usenet消息<<27038@mimsy.umd.edu> in comp.lang.c ,1990年十月.”在Rich Salz于1992年在USENIX报上发表的讨论INN的文章中提到.这篇文章可以在上找到. 33这个奇妙的数字,为什么它能够比其他数值效果更好呢?无论重要与否,却从来没有人能够充分说明其中的原因.因此在这里,我来试着解释一下.如果某人试着测试1到256之间的每个数字(就像我前段时间写的一个底层数据结构库那样),他会发现,没有哪一个数字的表现是特别突出的.其中的128个奇数(1除外)的表现都差不多,都能够达到一个能接受的哈希分布,平均分布率大概是86%. 如果比较这128个奇数中的方差值(gibbon:统计术语,表示随机变量与它的数学期望之间的平均偏离程度)的话(见Bob Jenkins的<哈希常见疑问>http://burtleburtle.net/bob/hash/hashfaq.html,中对平方差的描述),数字33并不是表现最好的一个.(gibbon:这里按照我的理解,照常理,应该是方差越小稳定,但是由于这里不清楚作者方差的计算公式,以及在哈希离散表,是不是离散度越大越好,所以不得而知这里的表现好是指方差值大还是指方差值小),但是数字33以及其他一些同样好的数字比如 17,31,63,127和129对于其他剩下的数字,在面对大量的哈希运算时,仍然有一个大大的优势,就是这些数字能够将乘法用位运算配合加减法来替换,这样的运算速度会提高.毕竟一个好的哈希算法要求既有好的分布,也要有高的计算速度,能同时达到这两点的数字很少.

声明：登载此文出于传递更多信息之目的，并不意味着赞同其观点或证实其描述。

PHP 相关文章推荐

不错的PHP学习之php4与php5之间会穿梭一点点感悟

May 03 PHP

php 分库分表hash算法

Nov 12 PHP

php中计算时间差的几种方法

Dec 31 PHP

php skymvc 一款轻量、简单的php

Jun 28 PHP

简单实现限定phpmyadmin访问ip的方法

Mar 05 PHP

PHP设计模式之结构模式的深入解析

Jun 13 PHP

php实现mysql数据库分表分段备份

Jun 18 PHP

PHP简单日历实现方法

Jul 20 PHP

PHP获取当前URL路径的处理方法(适用于多条件筛选列表)

Feb 10 PHP

PHP简单实现二维数组赋值与遍历功能示例

Oct 19 PHP

php7 图形用户界面GUI 开发示例

Feb 22 PHP

PHP时间类完整代码实例

Feb 26 PHP

php实现在线生成条形码示例分享(条形码生成器)

Dec 30 #PHP

md5 16位二进制与32位字符串相互转换示例

Dec 30 #PHP

微信扫描二维码登录网站代码示例

Dec 30 #PHP

浅谈PHP变量作用域以及地址引用问题

Dec 27 #PHP

一个好用的PHP验证码类实例分享

Dec 27 #PHP

PHP连接SQLServer2005方法及代码

Dec 26 #PHP

php截取中文字符串不乱码的方法

Dec 25 #PHP

You might like

php empty() 检查一个变量是否为空

2011/11/10 PHP

PHPMailer使用教程(PHPMailer发送邮件实例分析)

2012/12/06 PHP

使用array_map简单搞定PHP删除文件、删除目录

2014/10/29 PHP

完整删除ecshop中获取店铺信息的API

2014/12/24 PHP

PHP多维数组遍历方法(2种实现方法)

2015/12/10 PHP

PHP6连接SQLServer2005的三部曲

2016/04/15 PHP

php格式文件打开的四种方法

2018/02/24 PHP

背景音乐每次刷新都可以自动更换

2007/02/01 Javascript

用js 让图片在 div或dl里　居中，底部对齐

2008/01/21 Javascript

利用jQuery的$.event.fix函数统一浏览器event事件处理

2009/12/21 Javascript

setTimeout和setInterval的区别你真的了解吗?

2011/03/31 Javascript

关于js注册事件的常用方法

2013/04/03 Javascript

使用javascript:将其它类型值转换成布尔类型值的解决方法详解

2013/05/07 Javascript

js如何取消事件冒泡

2013/09/23 Javascript

JS实现自动切换文字的导航效果代码

2015/08/27 Javascript

12种JavaScript常用的MVC框架比较分析

2015/11/16 Javascript

详解JavaScript基于面向对象之创建对象（2）

2015/12/10 Javascript

Vue过滤器的用法和自定义过滤器使用

2017/02/08 Javascript

从零学习node.js之模块规范（一）

2017/02/21 Javascript

Bootstrap下拉菜单Dropdowns的实现代码

2017/03/17 Javascript

js使用html2canvas实现屏幕截取的示例代码

2017/08/28 Javascript

vue自定义指令directive的使用方法

2019/04/07 Javascript

[02:03]DOTA2亚洲邀请赛 HGT战队出场宣传片

2015/02/07 DOTA

[52:29]DOTA2上海特级锦标赛主赛事日 - 2 胜者组第一轮#3Secret VS OG第三局

2016/03/03 DOTA

[02:55]含熏伴清风，风行者至宝、屠夫身心及典藏宝瓶二展示

2020/09/08 DOTA

Python 专题四文件基础知识

2017/03/20 Python

答题辅助python代码实现

2018/01/16 Python

Python列表生成式与生成器操作示例

2018/08/01 Python

Python callable内置函数原理解析

2020/03/05 Python

PageFactory设计模式基于python实现

2020/04/14 Python

python是怎么被发明的

2020/06/15 Python

使用phonegap播放音频的实现方法

2017/03/31 HTML / CSS

Lovedrobe官网：英国领先的大码服装品牌

2019/09/19 全球购物

查环查孕证明

2014/01/10 职场文书

2015年司机工作总结

2015/04/23 职场文书

Python捕获、播放和保存摄像头视频并提高视频清晰度和对比度

2022/04/14 Python