TensorFlow内存管理bfc算法实例


Posted in Python onFebruary 03, 2020

1. 基本介绍

tensorflow设备内存管理模块实现了一个best-fit with coalescing算法(后文简称bfc算法)。

bfc算法是Doung Lea's malloc(dlmalloc)的一个非常简单的版本。

它具有内存分配、释放、碎片管理等基本功能。

2. bfc基本算法思想

1. 数据结构

整个内存空间由一个按基址升序排列的Chunk双向链表来表示,它们的直接前趋和后继必须在地址连续的内存空间。Chunk结构体里含有实际大小、请求大小、是否被占用、基址、直接前趋、直接后继、Bin索引等信息。

TensorFlow内存管理bfc算法实例

2. 申请

用户申请一个内存块(malloc)。根据chunk双链表找到一个合适的内存块,如果该内存块的大小是用户申请的大小的二倍以上,那么就将该内存块切分成两块,这就是split操作。

返回其中一块给用户,并将该内存块标识为占用

Spilt操作会新增一个chunk,所以需要修改chunk双链表以维持前驱和后继关系

如果用户申请512的空间,正好有一块1024的chunk2是空闲的,由于1024/512 =2,所以chunk2 被split为2块:chunk2_1和chunk2_2。返回chunk2_1给用户并将其标志位占用状态。

3. 释放

用户释放一个内存块(free)。先将该块标记为空闲。然后根据chunk数据结构中的信息找到其前驱和后继内存块。如果前驱和后继块中有空闲的块,那么将刚释放的块和空闲的块合并成一个更大的chunk(这就是merge操作,合并当前块和其前后的空闲块)。再修改双链表结构以维持前驱后继关系。这就做到了内存碎片的回收。

如果用户要free chunk3,由于chunk3的前驱chunk2也是空闲的,所以将chunk2和chunk3合并得到一个新的chunk2',大小为chunk2和chunk3之和。

3. bins

1. bins数据结构

bfc算法采取的是被动分块的策略。最开始整个内存是一个chunk,随着用户申请空间的次数增加,最开始的大chunk会被不断的split开来,从而产生越来越多的小chunk。当chunk数量很大时,为了寻找一个合适的内存块而遍历双链表无疑是一笔巨大的开销。为了实现对空闲块的高效管理,bfc算法设计了bin这个抽象数据结构。

每个bin都有一个size属性,一个bin是一个拥有chunk size >= binsize的空闲chunk的集合。集合中的chunk按照chunk size的升序组织成单链表。bfc算法维护了一个bin的集合:bins。它由多个bin以及从属于每个bin的chunks组成。内存中所有的空闲chunk都由bins管理。

TensorFlow内存管理bfc算法实例

图中每一列表示一个bin,列首方格中的数字表示bin的size。bin size的大小都是256的2^n的倍。每个bin下面挂载了一系列的空闲chunk,每个chunk的chunk size都大于等于所属的bin的bin size,按照chunk size的升序挂载成单链表。

2. bins操作

bfc算法针对bins这个集合设计了三个操作:search、insert、delete。

search

给定一个chunk size,从bins中找到大于等于该chunksize的最小的那个空闲chunk。Search操作具体流程如下。如果bin以数组的形式组织,那么可以从index = chunk size /256 >>2的那个bin开始查找。最好的情况是开始查找的那个bin的chunk链表非空,那么直接返回链表头即可。这种情况时间复杂度是常数级的。最坏的情况是遍历bins数组中所有的bin。对于一般大小的内存来说,bins数组元素非常少,比如4G空间只需要23个bin就足够了(256 * 2 ^ 23 > 4G),因此也很快能返回结果。总体来说search操作是非常高效的。对于固定大小内存来说,查找时间是常数量级的。

insert

将一个空闲的chunk插入到一个bin所挂载的chunk链表中,同时需要维持chunk链表的升序关系。具体流程是直接将chunk插入到index = chunk size /256 >>2的那个bin中即可。

delete

将一个空闲的chunk从bins中移除。

4. 总结

将内存分块管理,按块进行空间分配和释放。

通过split操作将大内存块分解成用户需要的小内存块。

通过merge操作合并小的内存块,做到内存碎片回收

通过bin这个抽象数据结构实现对空闲块高效管理。

5. 代码分析

1. 代码地址

https://github.com/tensorflow/tensorflow/tree/master/tensorflow/core/common_runtime

2. 数据结构

Chunk

static const int kInvalidChunkHandle = -1;
...
struct Chunk {
  size_t size = 0; // Full size of buffer.

  // We sometimes give chunks that are larger than needed to reduce
  // fragmentation. requested_size keeps track of what the client
  // actually wanted so we can understand whether our splitting
  // strategy is efficient.
  size_t requested_size = 0;

  // allocation_id is set to -1 when the chunk is not in use. It is assigned a
  // value greater than zero before the chunk is returned from
  // AllocateRaw, and this value is unique among values assigned by
  // the parent allocator.
  int64 allocation_id = -1;
  void* ptr = nullptr; // pointer to granted subbuffer.

  // If not kInvalidChunkHandle, the memory referred to by 'prev' is directly
  // preceding the memory used by this chunk. E.g., It should start
  // at 'ptr - prev->size'
  ChunkHandle prev = kInvalidChunkHandle;

  // If not kInvalidChunkHandle, the memory referred to by 'next' is directly
  // following the memory used by this chunk. E.g., It should be at
  // 'ptr + size'
  ChunkHandle next = kInvalidChunkHandle;

  // What bin are we in?
  BinNum bin_num = kInvalidBinNum;

  bool in_use() const { return allocation_id != -1; }
};

Bin

// A Bin is a collection of similar-sized free chunks.
struct Bin {
  // All chunks in this bin have >= bin_size memory.
  size_t bin_size = 0;

  struct ChunkComparator {
    explicit ChunkComparator(BFCAllocator* allocator)
      : allocator_(allocator) {}
    // Sort first by size and then use pointer address as a tie breaker.
    bool operator()(const ChunkHandle ha,
            const ChunkHandle hb) const NO_THREAD_SAFETY_ANALYSIS {
      const Chunk* a = allocator_->ChunkFromHandle(ha);
      const Chunk* b = allocator_->ChunkFromHandle(hb);
      if (a->size != b->size) {
        return a->size < b->size;
      }
      return a->ptr < b->ptr;
    }

    private:
      BFCAllocator* allocator_; // The parent allocator
  };

  typedef std::set<ChunkHandle, ChunkComparator> FreeChunkSet;
  // List of free chunks within the bin, sorted by chunk size.
  // Chunk * not owned.
  FreeChunkSet free_chunks;
  Bin(BFCAllocator* allocator, size_t bs)
    : bin_size(bs), free_chunks(ChunkComparator(allocator)) {}
};

AllocationRegion

AllocationRegion给一个连续的内存区域做指针到ChunkHandle的映射。

RegionManager

RegionManager聚集了一个或多个AllocationRegion,并提供一个从指针到基础ChunkHandle的间接层,这个间接层可在多个不连续的内存区域进行分配。

3. 分配大小

将每次分配的内存大小调整为kMinAllocationSize的N倍,这样所有内存地址都是很好地按字节对齐了。

// kMinAllocationSize = 256
static const size_t kMinAllocationBits = 8;
static const size_t kMinAllocationSize = 1 << kMinAllocationBits;
...
size_t BFCAllocator::RoundedBytes(size_t bytes) {
  size_t rounded_bytes =
    (kMinAllocationSize *
    ((bytes + kMinAllocationSize - 1) / kMinAllocationSize));
  DCHECK_EQ(size_t{0}, rounded_bytes % kMinAllocationSize);
  return rounded_bytes;
}

4. 初始化bin

typedef int BinNum;
static const int kInvalidBinNum = -1;
static const int kNumBins = 21;
...
// 二进制2^8往左移0,1,2位
// (static_cast<size_t>(256) << 0) = 256
// (static_cast<size_t>(256) << 1) = 512
// (static_cast<size_t>(256) << 2) = 1024
size_t BinNumToSize(BinNum index) {
  return static_cast<size_t>(256) << index;
}
...
char bins_space_[sizeof(Bin) * kNumBins];
// Map from bin size to Bin
Bin* BinFromIndex(BinNum index) {
  return reinterpret_cast<Bin*>(&(bins_space_[index * sizeof(Bin)]));
}
...
// We create bins to fit all possible ranges that cover the
// memory_limit_ starting from allocations up to 256 bytes to
// allocations up to (and including) the memory limit.
for (BinNum b = 0; b < kNumBins; b++) {
  size_t bin_size = BinNumToSize(b);
  VLOG(1) << "Creating bin of max chunk size "
      << strings::HumanReadableNumBytes(bin_size);
  new (BinFromIndex(b)) Bin(this, bin_size);
  CHECK_EQ(BinForSize(bin_size), BinFromIndex(b));
  CHECK_EQ(BinForSize(bin_size + 255), BinFromIndex(b));
  CHECK_EQ(BinForSize(bin_size * 2 - 1), BinFromIndex(b));
  if (b + 1 < kNumBins) {
    CHECK_NE(BinForSize(bin_size * 2), BinFromIndex(b));
  }
}

5. 查找bin

// 求属于第几个bin
BinNum BinNumForSize(size_t bytes) {
  uint64 v = std::max<size_t>(bytes, 256) >> kMinAllocationBits;
  int b = std::min(kNumBins - 1, Log2FloorNonZero(v));
  return b;
}
// 最高位非零的二进制位数,eg: 0001 0101B 为5
inline int Log2FloorNonZero(uint64 n) {
#if defined(__GNUC__)
  return 63 ^ __builtin_clzll(n);
#elif defined(PLATFORM_WINDOWS)
  unsigned long index;
  _BitScanReverse64(&index, n);
  return index;
#else
  int r = 0;
  while (n > 0) {
    r++;
    n >>= 1;
  }
  return r;
#endif
}

6. 查找Chunk

// 先加锁
mutex_lock l(lock_);
void* ptr = FindChunkPtr(bin_num, rounded_bytes, num_bytes);
if (ptr != nullptr) {
  return ptr;
}
// FindChunkPtr函数内部
void* BFCAllocator::FindChunkPtr(BinNum bin_num, size_t rounded_bytes,
                 size_t num_bytes) {
  // First identify the first bin that could satisfy rounded_bytes.
  for (; bin_num < kNumBins; bin_num++) {
    // Start searching from the first bin for the smallest chunk that fits
    // rounded_bytes.
    Bin* b = BinFromIndex(bin_num);
    for (auto citer = b->free_chunks.begin(); citer != b->free_chunks.end();
        ++citer) {
      // 从之前得到的Bin索引开始,查找合适的空闲Chunk:
      const BFCAllocator::ChunkHandle h = (*citer);
      BFCAllocator::Chunk* chunk = ChunkFromHandle(h);
      DCHECK(!chunk->in_use());
      if (chunk->size >= rounded_bytes) {
        // We found an existing chunk that fits us that wasn't in use, so remove
        // it from the free bin structure prior to using.
        RemoveFreeChunkIterFromBin(&b->free_chunks, citer);

        // If we can break the size of the chunk into two reasonably
        // large pieces, do so.
        //
        // TODO(vrv): What should be the criteria when deciding when
        // to split? 
        // 具体实现后面会分析
        if (chunk->size >= rounded_bytes * 2) {
          SplitChunk(h, rounded_bytes);
          chunk = ChunkFromHandle(h); // Update chunk pointer in case it moved
        }

        // The requested size of the returned chunk is what the user
        // has allocated.
        chunk->requested_size = num_bytes;
        // Assign a unique id and increment the id counter, marking the
        // chunk as being in use.
        chunk->allocation_id = next_allocation_id_++;

        // Update stats.
        ++stats_.num_allocs;
        stats_.bytes_in_use += chunk->size;
        stats_.max_bytes_in_use =
          std::max(stats_.max_bytes_in_use, stats_.bytes_in_use);
        stats_.max_alloc_size =
          std::max<std::size_t>(stats_.max_alloc_size, chunk->size);

        VLOG(4) << "Returning: " << chunk->ptr;
        if (VLOG_IS_ON(4)) {
          LOG(INFO) << "A: " << RenderOccupancy();
        }
        return chunk->ptr;
      }
    }
  }
  return nullptr;
}

7. 拆分Chunk

如果Chunk的大小大于等于申请内存大小的2倍,那么将该Chunk拆分成2个:第一个Chunk的大小等于申请内存大小,第二个Chunk作为它的直接后继。

if (chunk->size >= rounded_bytes * 2) {
  SplitChunk(h, rounded_bytes);
  chunk = ChunkFromHandle(h); // Update chunk pointer in case it moved
}

void BFCAllocator::SplitChunk(BFCAllocator::ChunkHandle h, size_t num_bytes) {
  // Allocate the new chunk before we do any ChunkFromHandle
  ChunkHandle h_new_chunk = AllocateChunk();

  Chunk* c = ChunkFromHandle(h);
  CHECK(!c->in_use() && (c->bin_num == kInvalidBinNum));

  // Create a new chunk starting num_bytes after c
  BFCAllocator::Chunk* new_chunk = ChunkFromHandle(h_new_chunk);
  new_chunk->ptr = static_cast<void*>(static_cast<char*>(c->ptr) + num_bytes);
  region_manager_.set_handle(new_chunk->ptr, h_new_chunk);

  // Set the new sizes of the chunks.
  new_chunk->size = c->size - num_bytes;
  c->size = num_bytes;

  // The new chunk is not in use.
  new_chunk->allocation_id = -1;

  // Maintain the pointers.
  // c <-> c_neighbor becomes
  // c <-> new_chunk <-> c_neighbor
  BFCAllocator::ChunkHandle h_neighbor = c->next;
  new_chunk->prev = h;
  new_chunk->next = h_neighbor;
  c->next = h_new_chunk;
  if (h_neighbor != kInvalidChunkHandle) {
    Chunk* c_neighbor = ChunkFromHandle(h_neighbor);
    c_neighbor->prev = h_new_chunk;
  }

  // Add the newly free chunk to the free bin.
  InsertFreeChunkIntoBin(h_new_chunk);
}

8. 回收chunk

加锁,获得ChunkHandle

mutex_lock l(lock_);
BFCAllocator::ChunkHandle h = region_manager_.get_handle(ptr);
FreeAndMaybeCoalesce(h);

FreeAndMaybeCoalesce

void BFCAllocator::FreeAndMaybeCoalesce(BFCAllocator::ChunkHandle h) {
  Chunk* c = ChunkFromHandle(h);
  CHECK(c->in_use() && (c->bin_num == kInvalidBinNum));

  // Mark the chunk as no longer in use
  c->allocation_id = -1;

  // Updates the stats.
  stats_.bytes_in_use -= c->size;

  // This chunk is no longer in-use, consider coalescing the chunk
  // with adjacent chunks.
  ChunkHandle chunk_to_reassign = h;

  // If the next chunk is free, coalesce the two
  if (c->next != kInvalidChunkHandle) {
    Chunk* cnext = ChunkFromHandle(c->next);
    if (!cnext->in_use()) {
    //   VLOG(8) << "Chunk at " << cnext->ptr << " merging with c " <<
    //   c->ptr;

    chunk_to_reassign = h;

    // Deletes c->next
    RemoveFreeChunkFromBin(c->next);
    Merge(h, ChunkFromHandle(h)->next);
    }
  }

  // If the previous chunk is free, coalesce the two
  c = ChunkFromHandle(h);
  if (c->prev != kInvalidChunkHandle) {
    Chunk* cprev = ChunkFromHandle(c->prev);
    if (!cprev->in_use()) {
    //   VLOG(8) << "Chunk at " << c->ptr << " merging into c->prev "
    //    << cprev->ptr;

    chunk_to_reassign = c->prev;

    // Deletes c
    RemoveFreeChunkFromBin(c->prev);
    Merge(ChunkFromHandle(h)->prev, h);
    c = ChunkFromHandle(h);
    }
  }

  InsertFreeChunkIntoBin(chunk_to_reassign);
}

Merge

// Merges h1 and h2 when Chunk(h1)->next is h2 and Chunk(h2)->prev is c1.
// We merge Chunk(h2) into Chunk(h1).
void BFCAllocator::Merge(BFCAllocator::ChunkHandle h1,
             BFCAllocator::ChunkHandle h2) {
  Chunk* c1 = ChunkFromHandle(h1);
  Chunk* c2 = ChunkFromHandle(h2);
  // We can only merge chunks that are not in use.
  CHECK(!c1->in_use() && !c2->in_use());

  // c1's prev doesn't change, still points to the same ptr, and is
  // still not in use.

  // Fix up neighbor pointers
  //
  // c1 <-> c2 <-> c3 should become
  // c1 <-> c3

  BFCAllocator::ChunkHandle h3 = c2->next;
  c1->next = h3;
  CHECK(c2->prev == h1);
  if (h3 != kInvalidChunkHandle) {
    BFCAllocator::Chunk* c3 = ChunkFromHandle(h3);
    c3->prev = h1;
  }

  // Set the new size
  c1->size += c2->size;

  DeleteChunk(h2);
}

以上这篇TensorFlow内存管理bfc算法实例就是小编分享给大家的全部内容了,希望能给大家一个参考,也希望大家多多支持三水点靠木。

Python 相关文章推荐
python读取Android permission文件
Nov 01 Python
Python Web框架Flask中使用新浪SAE云存储实例
Feb 08 Python
python3.4用循环往mysql5.7中写数据并输出的实现方法
Jun 20 Python
Python编程使用NLTK进行自然语言处理详解
Nov 16 Python
基于Python中capitalize()与title()的区别详解
Dec 09 Python
python获取时间及时间格式转换问题实例代码详解
Dec 06 Python
浅谈python requests 的put, post 请求参数的问题
Jan 02 Python
python 使用opencv 把视频分割成图片示例
Dec 12 Python
使用Python实现Wake On Lan远程开机功能
Jan 22 Python
Python unittest单元测试openpyxl实现过程解析
May 27 Python
Django Model层F,Q对象和聚合函数原理解析
Nov 12 Python
Python编程中内置的NotImplemented类型的用法
Mar 23 Python
python numpy数组中的复制知识解析
Feb 03 #Python
opencv python Canny边缘提取实现过程解析
Feb 03 #Python
Pycharm debug调试时带参数过程解析
Feb 03 #Python
Python使用enumerate获取迭代元素下标
Feb 03 #Python
TensorFlow 显存使用机制详解
Feb 03 #Python
opencv python如何实现图像二值化
Feb 03 #Python
python实现人机猜拳小游戏
Feb 03 #Python
You might like
4.与数据库的连接
2006/10/09 PHP
PHP mcrypt可逆加密算法分析
2011/07/19 PHP
带密匙的php加密解密示例分享
2014/01/29 PHP
php获取字段名示例分享
2014/03/03 PHP
PHP简单实现冒泡排序的方法
2016/12/26 PHP
Thinkphp5框架实现图片、音频和视频文件的上传功能详解
2019/08/27 PHP
PHP7 其他语言层面的修改
2021/03/09 PHP
用JavaScript实现动画效果的方法
2013/07/20 Javascript
js为鼠标添加右击事件防止默认的右击菜单弹出
2013/07/29 Javascript
node.js使用require()函数加载模块
2014/11/26 Javascript
js数组去重的hash方法
2016/12/22 Javascript
基于JavaScript实现的顺序查找算法示例
2017/04/14 Javascript
bootstrap table方法之expandRow-collapseRow展开或关闭当前行数据
2020/08/09 Javascript
基于three.js编写的一个项目类示例代码
2018/01/05 Javascript
分享5个好用的javascript文件上传插件
2018/09/16 Javascript
详解JavaScript的数据类型以及数据类型的转换
2019/04/20 Javascript
Javascript实现鼠标点击冒泡特效
2019/12/24 Javascript
jquery实现点击左右按钮切换图片
2021/01/27 jQuery
Python中的特殊语法:filter、map、reduce、lambda介绍
2015/04/14 Python
python发送HTTP请求的方法小结
2015/07/08 Python
深入理解Python中range和xrange的区别
2017/11/26 Python
pytorch中的embedding词向量的使用方法
2019/08/18 Python
Django项目基础配置和基本使用过程解析
2019/11/25 Python
Django Form and ModelForm的区别与使用
2019/12/06 Python
python如何从键盘获取输入实例
2020/06/18 Python
Python图像阈值化处理及算法比对实例解析
2020/06/19 Python
利用python进行文件操作
2020/12/04 Python
python绕过图片滑动验证码实现爬取PTA所有题目功能 附源码
2021/01/06 Python
使用CSS3中的calc()属性来以算式表达尺寸数值
2016/06/06 HTML / CSS
事业单位公务员的职业生涯规划
2014/01/15 职场文书
清华大学自主招生自荐信
2014/01/29 职场文书
计划生育证明格式及范本
2014/10/09 职场文书
联村联户简报
2015/07/21 职场文书
职工食堂管理制度
2015/08/06 职场文书
学习党史心得体会2016
2016/01/23 职场文书
微信小程序 根据不同用户切换不同TabBar
2022/04/21 Javascript