
Implementing a weighted random sampling function without replacement in Python

Posted in Python on July 24, 2019

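Requirements

A lottery application needs to draw K winners from all participating users (K = the number of prizes), weighted by the number of lottery codes each user holds.

For example, suppose there are three users with weights A(1), B(1), C(2). A should be drawn with probability 25%, B with 25%, and C with 50%.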

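Analysis

The most intuitive approach is to put C into the list twice, as in [A, B, C, C], and pick with the built-in function random.choice([A, B, C, C]), so that C is drawn with probability 50%.

The problem with this approach is that it wastes memory when the weights are large, because the list holds one entry per unit of weight.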
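To make the memory cost concrete, here is a minimal sketch of the expanded-list approach. The helper name expand and the larger set of weights are illustrative assumptions, not part of the original code, and integer weights are assumed.

```python
import random

def expand(users):
    """Repeat each user once per unit of weight (integer weights assumed)."""
    return [name for name, weight in users for _ in range(weight)]

users = [('A', 1), ('B', 1), ('C', 2)]
pool = expand(users)            # ['A', 'B', 'C', 'C']
winner = random.choice(pool)    # C is drawn with probability 2/4

# The list grows with the sum of the weights, not with the number of users.
heavy = expand([('A', 1000), ('B', 1000), ('C', 2000)])
print(len(pool), len(heavy))    # 4 4000
```

Three users with weights in the thousands already need a list of thousands of entries, which is what the interval method avoids.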

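A more general method is to sum all the weights (4 here), pick a random value in the interval [0, 4), and give A, B, and C sub-intervals of different sizes: [0, 1) is A, [1, 2) is B, and [2, 4) is C.

Either random.randint(0, 3) or int(random.random()*4) produces a random integer R between 0 and 3. Whichever interval R falls in determines the user that is selected.

The next step is finding which interval the random number falls in.

One way is to walk the list in order while keeping the running sum S of the weights seen so far; as soon as S is greater than R, return the current element.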

import random
from operator import itemgetter

users = [('A', 1), ('B', 1), ('C', 2)]

def pick(users):
  total = sum(map(itemgetter(1), users))
  rnd = int(random.random()*total) # 0~3
  s = 0 # running sum of the weights seen so far
  for u, w in users:
    s += w
    if s > rnd:
      return u

However, this method is O(N), because in the worst case it has to walk through all of users.
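As a quick sanity check on the linear scan, the sketch below restates it as a helper and counts how often each user is drawn. The name pick_linear and the trial count are assumptions made for this illustration; the observed frequencies should land near 25% / 25% / 50% but will vary from run to run.

```python
import random
from collections import Counter

def pick_linear(users):
    """Linear scan over running weight sums; returns the chosen user's name."""
    total = sum(w for _, w in users)
    rnd = int(random.random() * total)  # integer in [0, total)
    s = 0
    for name, w in users:
        s += w
        if s > rnd:
            return name

users = [('A', 1), ('B', 1), ('C', 2)]
trials = 40_000
counts = Counter(pick_linear(users) for _ in range(trials))

for name, _ in users:
    print(name, round(counts[name] / trials, 3))
```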

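Another approach is to first build the list of cumulative weights in order and then binary-search it, which brings the lookup down to O(log N) (not counting the other processing, such as building the cumulative list).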

import bisect
import itertools
import random
from operator import itemgetter

users = [('A', 1), ('B', 1), ('C', 2)]

def pick(users):
  cum_weights = list(itertools.accumulate(map(itemgetter(1), users))) # [1, 2, 4]
  total = cum_weights[-1]
  rnd = int(random.random()*total) # 0~3
  hi = len(cum_weights) - 1
  index = bisect.bisect(cum_weights, rnd, 0, hi)
  return users[index][0]
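The two lookups can be compared directly by feeding every possible R through both. This is only a sketch: by_scan and by_bisect are hypothetical helper names introduced here, and R is enumerated rather than drawn at random so the result is deterministic.

```python
import bisect
import itertools

users = [('A', 1), ('B', 1), ('C', 2)]
cum_weights = list(itertools.accumulate(w for _, w in users))  # [1, 2, 4]
hi = len(cum_weights) - 1

def by_scan(r):
    """Linear scan: first user whose running weight sum exceeds r."""
    s = 0
    for name, w in users:
        s += w
        if s > r:
            return name

def by_bisect(r):
    """Binary search over the cumulative weights."""
    return users[bisect.bisect(cum_weights, r, 0, hi)][0]

mapping = [(r, by_scan(r), by_bisect(r)) for r in range(cum_weights[-1])]
print(mapping)  # [(0, 'A', 'A'), (1, 'B', 'B'), (2, 'C', 'C'), (3, 'C', 'C')]
```

bisect.bisect is bisect_right, so an R that equals a boundary (R = 1, say) falls into the next interval, matching the s > rnd test of the linear scan. Passing hi = len(cum_weights) - 1 keeps the returned index from running past the last element.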

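The choices function in Python's built-in random module (available since version 3.6) is implemented in exactly this way. Its signature is random.choices(population, weights=None, *, cum_weights=None, k=1): population is the list to choose from, weights holds the weight of each element, cum_weights is the optional precomputed list of cumulative weights (supply one or the other, not both), and k is the number of draws (sampling with replacement). The source code is as follows: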

def choices(self, population, weights=None, *, cum_weights=None, k=1):
  """Return a k sized list of population elements chosen with replacement.
  If the relative weights or cumulative weights are not specified,
  the selections are made with equal probability.
  """
  random = self.random
  if cum_weights is None:
    if weights is None:
      _int = int
      total = len(population)
      return [population[_int(random() * total)] for i in range(k)]
    cum_weights = list(_itertools.accumulate(weights))
  elif weights is not None:
    raise TypeError('Cannot specify both weights and cumulative weights')
  if len(cum_weights) != len(population):
    raise ValueError('The number of weights does not match the population')
  bisect = _bisect.bisect
  total = cum_weights[-1]
  hi = len(cum_weights) - 1
  return [population[bisect(cum_weights, random() * total, 0, hi)]
      for i in range(k)]
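A short usage sketch of random.choices follows. The seed value is arbitrary and is there only so that the two calls consume the same random numbers; the population and weights reuse the example from earlier in the article.

```python
import random

population = ['A', 'B', 'C']

random.seed(2019)  # arbitrary seed, only to make the two calls comparable
by_weights = random.choices(population, weights=[1, 1, 2], k=10)

random.seed(2019)
by_cum_weights = random.choices(population, cum_weights=[1, 2, 4], k=10)

# weights are accumulated into the same cumulative list, so the draws agree.
print(by_weights == by_cum_weights)  # True

# Sampling is with replacement: k may exceed the population size,
# so ten draws from three users necessarily contain repeats.
print(len(by_weights), len(set(by_weights)))

# Supplying both weights and cum_weights is rejected.
try:
    random.choices(population, weights=[1, 1, 2], cum_weights=[1, 2, 4])
except TypeError as exc:
    print("TypeError:", exc)
```

Precomputing cum_weights is worthwhile when the same weights are reused across many calls, since it skips the accumulate step each time.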

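Going further

Python's built-in random.choices samples with replacement. The function for sampling without replacement is random.sample, but it cannot take weights into account (random.sample(population, k)).

The built-in random.sample can draw multiple elements without modifying the original list. It is implemented with two algorithms so that it performs well in all cases. (Source: random.sample)

The first is a partial shuffle that returns as soon as K elements have been obtained. Its time complexity is O(N), dominated by copying the original sequence, which also increases memory usage.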

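import random

def sample_by_partial_shuffle(population, k):
  result = [None] * k
  n = len(population)
  pool = list(population) # copy, so the original sequence is not modified
  for i in range(k):
    j = int(random.random()*(n-i))
    result[i] = pool[j]
    pool[j] = pool[n-i-1] # drop the selected element; fill its slot with an unselected one from the end
  return result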

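The second keeps a set of already-selected indices and draws repeatedly; if a draw lands on an index that is already in the set, it draws again, so no copy of the sequence is needed. random.sample uses this algorithm when k is small relative to n, where the chance of drawing the same element twice is low.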

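import random

def sample_by_selection_set(population, k):
  n = len(population)
  result = [None] * k
  selected = set()
  selected_add = selected.add # cache the bound method for faster access
  for i in range(k):
    j = int(random.random()*n)
    while j in selected:
      j = int(random.random()*n)
    selected_add(j)
    result[i] = population[j]
  return result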
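Both algorithms sit behind the same random.sample call, so from the caller's side the behaviour is identical. The sketch below, with an illustrative five-element population, shows the three properties described above: no repeats, an untouched input, and an error when k exceeds the population size.

```python
import random

population = ['A', 'B', 'C', 'D', 'E']
snapshot = list(population)

winners = random.sample(population, k=3)

print(len(set(winners)) == 3)   # True: no element is drawn twice
print(population == snapshot)   # True: the input list is untouched

try:
    random.sample(population, k=6)  # more draws than elements
except ValueError as exc:
    print("ValueError:", exc)
```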

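The lottery application needs weighted sampling without replacement, so we combine the implementations of random.choices and random.sample into a function, weighted_sample.

The number of participants in a lottery is usually much larger than the number of prizes, so the second random.sample method can be used for the no-replacement part. It can, of course, be optimized further.

The code is as follows: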

import bisect
import itertools
import random

def weighted_sample(population, weights, k=1):
  """Like random.sample, but with weights.
  """
  n = len(population)
  if n == 0:
    return []
  if not 0 <= k <= n:
    raise ValueError("Sample larger than population or is negative")
  if len(weights) != n:
    raise ValueError('The number of weights does not match the population')

  cum_weights = list(itertools.accumulate(weights))
  total = cum_weights[-1]
  if total <= 0: # guard against invalid weights by falling back to unweighted sampling
    return random.sample(population, k=k)
  hi = len(cum_weights) - 1

  selected = set()
  _bisect = bisect.bisect
  _random = random.random
  selected_add = selected.add
  result = [None] * k
  for i in range(k):
    j = _bisect(cum_weights, _random()*total, 0, hi)
    while j in selected:
      j = _bisect(cum_weights, _random()*total, 0, hi)
    selected_add(j)
    result[i] = population[j]
  return result
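A usage sketch for the lottery case follows. To keep the snippet runnable on its own it carries a condensed copy of the function above with the input validation left out; the user names, code counts, and trial count are illustrative assumptions. One caveat worth stating: the redraw loop relies on at least k elements having a positive weight, otherwise it never terminates.

```python
import bisect
import itertools
import random
from collections import Counter

def weighted_sample(population, weights, k=1):
    """Condensed copy of the function above (validation omitted)."""
    cum_weights = list(itertools.accumulate(weights))
    total = cum_weights[-1]
    hi = len(cum_weights) - 1
    selected = set()
    result = []
    for _ in range(k):
        j = bisect.bisect(cum_weights, random.random() * total, 0, hi)
        while j in selected:  # redraw until an unselected index comes up
            j = bisect.bisect(cum_weights, random.random() * total, 0, hi)
        selected.add(j)
        result.append(population[j])
    return result

users = ['A', 'B', 'C']
codes = [1, 1, 2]  # lottery codes held by each user

winners = weighted_sample(users, codes, k=2)
print(winners, len(set(winners)) == 2)  # two distinct winners

# The first winner should follow the weights: roughly 25% / 25% / 50%.
trials = 40_000
first = Counter(weighted_sample(users, codes, k=2)[0] for _ in range(trials))
print({u: round(first[u] / trials, 3) for u in users})
```

When a few users hold most of the codes, the redraw loop repeats more often, which is one place where the further optimization mentioned above would apply.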



- Author -

EVE
