编程 Python

TensorFlow内存管理bfc算法实例

Posted in Python onFebruary 03, 2020

1. 基本介绍

tensorflow设备内存管理模块实现了一个best-fit with coalescing算法（后文简称bfc算法）。

bfc算法是Doung Lea's malloc(dlmalloc)的一个非常简单的版本。

它具有内存分配、释放、碎片管理等基本功能。

2. bfc基本算法思想

1. 数据结构

整个内存空间由一个按基址升序排列的Chunk双向链表来表示，它们的直接前趋和后继必须在地址连续的内存空间。Chunk结构体里含有实际大小、请求大小、是否被占用、基址、直接前趋、直接后继、Bin索引等信息。

TensorFlow内存管理bfc算法实例

2. 申请

用户申请一个内存块（malloc)。根据chunk双链表找到一个合适的内存块，如果该内存块的大小是用户申请的大小的二倍以上，那么就将该内存块切分成两块，这就是split操作。

返回其中一块给用户，并将该内存块标识为占用

Spilt操作会新增一个chunk，所以需要修改chunk双链表以维持前驱和后继关系

如果用户申请512的空间，正好有一块1024的chunk2是空闲的，由于1024/512 =2，所以chunk2 被split为2块：chunk2_1和chunk2_2。返回chunk2_1给用户并将其标志位占用状态。

3. 释放

用户释放一个内存块（free）。先将该块标记为空闲。然后根据chunk数据结构中的信息找到其前驱和后继内存块。如果前驱和后继块中有空闲的块，那么将刚释放的块和空闲的块合并成一个更大的chunk（这就是merge操作，合并当前块和其前后的空闲块）。再修改双链表结构以维持前驱后继关系。这就做到了内存碎片的回收。

如果用户要free chunk3，由于chunk3的前驱chunk2也是空闲的，所以将chunk2和chunk3合并得到一个新的chunk2'，大小为chunk2和chunk3之和。

3. bins

1. bins数据结构

bfc算法采取的是被动分块的策略。最开始整个内存是一个chunk，随着用户申请空间的次数增加，最开始的大chunk会被不断的split开来，从而产生越来越多的小chunk。当chunk数量很大时，为了寻找一个合适的内存块而遍历双链表无疑是一笔巨大的开销。为了实现对空闲块的高效管理，bfc算法设计了bin这个抽象数据结构。

每个bin都有一个size属性，一个bin是一个拥有chunk size >= binsize的空闲chunk的集合。集合中的chunk按照chunk size的升序组织成单链表。bfc算法维护了一个bin的集合：bins。它由多个bin以及从属于每个bin的chunks组成。内存中所有的空闲chunk都由bins管理。

TensorFlow内存管理bfc算法实例

图中每一列表示一个bin，列首方格中的数字表示bin的size。bin size的大小都是256的2^n的倍。每个bin下面挂载了一系列的空闲chunk，每个chunk的chunk size都大于等于所属的bin的bin size，按照chunk size的升序挂载成单链表。

2. bins操作

bfc算法针对bins这个集合设计了三个操作：search、insert、delete。

search

给定一个chunk size，从bins中找到大于等于该chunksize的最小的那个空闲chunk。Search操作具体流程如下。如果bin以数组的形式组织，那么可以从index = chunk size /256 >>2的那个bin开始查找。最好的情况是开始查找的那个bin的chunk链表非空，那么直接返回链表头即可。这种情况时间复杂度是常数级的。最坏的情况是遍历bins数组中所有的bin。对于一般大小的内存来说，bins数组元素非常少，比如4G空间只需要23个bin就足够了（256 * 2 ^ 23 > 4G），因此也很快能返回结果。总体来说search操作是非常高效的。对于固定大小内存来说，查找时间是常数量级的。

insert

将一个空闲的chunk插入到一个bin所挂载的chunk链表中，同时需要维持chunk链表的升序关系。具体流程是直接将chunk插入到index = chunk size /256 >>2的那个bin中即可。

delete

将一个空闲的chunk从bins中移除。

4. 总结

将内存分块管理，按块进行空间分配和释放。

通过split操作将大内存块分解成用户需要的小内存块。

通过merge操作合并小的内存块，做到内存碎片回收

通过bin这个抽象数据结构实现对空闲块高效管理。

5. 代码分析

1. 代码地址

https://github.com/tensorflow/tensorflow/tree/master/tensorflow/core/common_runtime

2. 数据结构

Chunk

static const int kInvalidChunkHandle = -1;
...
struct Chunk {
  size_t size = 0; // Full size of buffer.

  // We sometimes give chunks that are larger than needed to reduce
  // fragmentation. requested_size keeps track of what the client
  // actually wanted so we can understand whether our splitting
  // strategy is efficient.
  size_t requested_size = 0;

  // allocation_id is set to -1 when the chunk is not in use. It is assigned a
  // value greater than zero before the chunk is returned from
  // AllocateRaw, and this value is unique among values assigned by
  // the parent allocator.
  int64 allocation_id = -1;
  void* ptr = nullptr; // pointer to granted subbuffer.

  // If not kInvalidChunkHandle, the memory referred to by 'prev' is directly
  // preceding the memory used by this chunk. E.g., It should start
  // at 'ptr - prev->size'
  ChunkHandle prev = kInvalidChunkHandle;

  // If not kInvalidChunkHandle, the memory referred to by 'next' is directly
  // following the memory used by this chunk. E.g., It should be at
  // 'ptr + size'
  ChunkHandle next = kInvalidChunkHandle;

  // What bin are we in?
  BinNum bin_num = kInvalidBinNum;

  bool in_use() const { return allocation_id != -1; }
};

Bin

// A Bin is a collection of similar-sized free chunks.
struct Bin {
  // All chunks in this bin have >= bin_size memory.
  size_t bin_size = 0;

  struct ChunkComparator {
    explicit ChunkComparator(BFCAllocator* allocator)
      : allocator_(allocator) {}
    // Sort first by size and then use pointer address as a tie breaker.
    bool operator()(const ChunkHandle ha,
            const ChunkHandle hb) const NO_THREAD_SAFETY_ANALYSIS {
      const Chunk* a = allocator_->ChunkFromHandle(ha);
      const Chunk* b = allocator_->ChunkFromHandle(hb);
      if (a->size != b->size) {
        return a->size < b->size;
      }
      return a->ptr < b->ptr;
    }

    private:
      BFCAllocator* allocator_; // The parent allocator
  };

  typedef std::set<ChunkHandle, ChunkComparator> FreeChunkSet;
  // List of free chunks within the bin, sorted by chunk size.
  // Chunk * not owned.
  FreeChunkSet free_chunks;
  Bin(BFCAllocator* allocator, size_t bs)
    : bin_size(bs), free_chunks(ChunkComparator(allocator)) {}
};

AllocationRegion

AllocationRegion给一个连续的内存区域做指针到ChunkHandle的映射。

RegionManager

RegionManager聚集了一个或多个AllocationRegion，并提供一个从指针到基础ChunkHandle的间接层，这个间接层可在多个不连续的内存区域进行分配。

3. 分配大小

将每次分配的内存大小调整为kMinAllocationSize的N倍，这样所有内存地址都是很好地按字节对齐了。

// kMinAllocationSize = 256
static const size_t kMinAllocationBits = 8;
static const size_t kMinAllocationSize = 1 << kMinAllocationBits;
...
size_t BFCAllocator::RoundedBytes(size_t bytes) {
  size_t rounded_bytes =
    (kMinAllocationSize *
    ((bytes + kMinAllocationSize - 1) / kMinAllocationSize));
  DCHECK_EQ(size_t{0}, rounded_bytes % kMinAllocationSize);
  return rounded_bytes;
}

4. 初始化bin

typedef int BinNum;
static const int kInvalidBinNum = -1;
static const int kNumBins = 21;
...
// 二进制2^8往左移0，1，2位
// (static_cast<size_t>(256) << 0) = 256
// (static_cast<size_t>(256) << 1) = 512
// (static_cast<size_t>(256) << 2) = 1024
size_t BinNumToSize(BinNum index) {
  return static_cast<size_t>(256) << index;
}
...
char bins_space_[sizeof(Bin) * kNumBins];
// Map from bin size to Bin
Bin* BinFromIndex(BinNum index) {
  return reinterpret_cast<Bin*>(&(bins_space_[index * sizeof(Bin)]));
}
...
// We create bins to fit all possible ranges that cover the
// memory_limit_ starting from allocations up to 256 bytes to
// allocations up to (and including) the memory limit.
for (BinNum b = 0; b < kNumBins; b++) {
  size_t bin_size = BinNumToSize(b);
  VLOG(1) << "Creating bin of max chunk size "
      << strings::HumanReadableNumBytes(bin_size);
  new (BinFromIndex(b)) Bin(this, bin_size);
  CHECK_EQ(BinForSize(bin_size), BinFromIndex(b));
  CHECK_EQ(BinForSize(bin_size + 255), BinFromIndex(b));
  CHECK_EQ(BinForSize(bin_size * 2 - 1), BinFromIndex(b));
  if (b + 1 < kNumBins) {
    CHECK_NE(BinForSize(bin_size * 2), BinFromIndex(b));
  }
}

5. 查找bin

// 求属于第几个bin
BinNum BinNumForSize(size_t bytes) {
  uint64 v = std::max<size_t>(bytes, 256) >> kMinAllocationBits;
  int b = std::min(kNumBins - 1, Log2FloorNonZero(v));
  return b;
}
// 最高位非零的二进制位数，eg: 0001 0101B 为5
inline int Log2FloorNonZero(uint64 n) {
#if defined(__GNUC__)
  return 63 ^ __builtin_clzll(n);
#elif defined(PLATFORM_WINDOWS)
  unsigned long index;
  _BitScanReverse64(&index, n);
  return index;
#else
  int r = 0;
  while (n > 0) {
    r++;
    n >>= 1;
  }
  return r;
#endif
}

6. 查找Chunk

// 先加锁
mutex_lock l(lock_);
void* ptr = FindChunkPtr(bin_num, rounded_bytes, num_bytes);
if (ptr != nullptr) {
  return ptr;
}
// FindChunkPtr函数内部
void* BFCAllocator::FindChunkPtr(BinNum bin_num, size_t rounded_bytes,
                 size_t num_bytes) {
  // First identify the first bin that could satisfy rounded_bytes.
  for (; bin_num < kNumBins; bin_num++) {
    // Start searching from the first bin for the smallest chunk that fits
    // rounded_bytes.
    Bin* b = BinFromIndex(bin_num);
    for (auto citer = b->free_chunks.begin(); citer != b->free_chunks.end();
        ++citer) {
      // 从之前得到的Bin索引开始，查找合适的空闲Chunk：
      const BFCAllocator::ChunkHandle h = (*citer);
      BFCAllocator::Chunk* chunk = ChunkFromHandle(h);
      DCHECK(!chunk->in_use());
      if (chunk->size >= rounded_bytes) {
        // We found an existing chunk that fits us that wasn't in use, so remove
        // it from the free bin structure prior to using.
        RemoveFreeChunkIterFromBin(&b->free_chunks, citer);

        // If we can break the size of the chunk into two reasonably
        // large pieces, do so.
        //
        // TODO(vrv): What should be the criteria when deciding when
        // to split? 
        // 具体实现后面会分析
        if (chunk->size >= rounded_bytes * 2) {
          SplitChunk(h, rounded_bytes);
          chunk = ChunkFromHandle(h); // Update chunk pointer in case it moved
        }

        // The requested size of the returned chunk is what the user
        // has allocated.
        chunk->requested_size = num_bytes;
        // Assign a unique id and increment the id counter, marking the
        // chunk as being in use.
        chunk->allocation_id = next_allocation_id_++;

        // Update stats.
        ++stats_.num_allocs;
        stats_.bytes_in_use += chunk->size;
        stats_.max_bytes_in_use =
          std::max(stats_.max_bytes_in_use, stats_.bytes_in_use);
        stats_.max_alloc_size =
          std::max<std::size_t>(stats_.max_alloc_size, chunk->size);

        VLOG(4) << "Returning: " << chunk->ptr;
        if (VLOG_IS_ON(4)) {
          LOG(INFO) << "A: " << RenderOccupancy();
        }
        return chunk->ptr;
      }
    }
  }
  return nullptr;
}

7. 拆分Chunk

如果Chunk的大小大于等于申请内存大小的2倍，那么将该Chunk拆分成2个：第一个Chunk的大小等于申请内存大小，第二个Chunk作为它的直接后继。

if (chunk->size >= rounded_bytes * 2) {
  SplitChunk(h, rounded_bytes);
  chunk = ChunkFromHandle(h); // Update chunk pointer in case it moved
}

void BFCAllocator::SplitChunk(BFCAllocator::ChunkHandle h, size_t num_bytes) {
  // Allocate the new chunk before we do any ChunkFromHandle
  ChunkHandle h_new_chunk = AllocateChunk();

  Chunk* c = ChunkFromHandle(h);
  CHECK(!c->in_use() && (c->bin_num == kInvalidBinNum));

  // Create a new chunk starting num_bytes after c
  BFCAllocator::Chunk* new_chunk = ChunkFromHandle(h_new_chunk);
  new_chunk->ptr = static_cast<void*>(static_cast<char*>(c->ptr) + num_bytes);
  region_manager_.set_handle(new_chunk->ptr, h_new_chunk);

  // Set the new sizes of the chunks.
  new_chunk->size = c->size - num_bytes;
  c->size = num_bytes;

  // The new chunk is not in use.
  new_chunk->allocation_id = -1;

  // Maintain the pointers.
  // c <-> c_neighbor becomes
  // c <-> new_chunk <-> c_neighbor
  BFCAllocator::ChunkHandle h_neighbor = c->next;
  new_chunk->prev = h;
  new_chunk->next = h_neighbor;
  c->next = h_new_chunk;
  if (h_neighbor != kInvalidChunkHandle) {
    Chunk* c_neighbor = ChunkFromHandle(h_neighbor);
    c_neighbor->prev = h_new_chunk;
  }

  // Add the newly free chunk to the free bin.
  InsertFreeChunkIntoBin(h_new_chunk);
}

8. 回收chunk

加锁，获得ChunkHandle

mutex_lock l(lock_);
BFCAllocator::ChunkHandle h = region_manager_.get_handle(ptr);
FreeAndMaybeCoalesce(h);

FreeAndMaybeCoalesce

void BFCAllocator::FreeAndMaybeCoalesce(BFCAllocator::ChunkHandle h) {
  Chunk* c = ChunkFromHandle(h);
  CHECK(c->in_use() && (c->bin_num == kInvalidBinNum));

  // Mark the chunk as no longer in use
  c->allocation_id = -1;

  // Updates the stats.
  stats_.bytes_in_use -= c->size;

  // This chunk is no longer in-use, consider coalescing the chunk
  // with adjacent chunks.
  ChunkHandle chunk_to_reassign = h;

  // If the next chunk is free, coalesce the two
  if (c->next != kInvalidChunkHandle) {
    Chunk* cnext = ChunkFromHandle(c->next);
    if (!cnext->in_use()) {
    //   VLOG(8) << "Chunk at " << cnext->ptr << " merging with c " <<
    //   c->ptr;

    chunk_to_reassign = h;

    // Deletes c->next
    RemoveFreeChunkFromBin(c->next);
    Merge(h, ChunkFromHandle(h)->next);
    }
  }

  // If the previous chunk is free, coalesce the two
  c = ChunkFromHandle(h);
  if (c->prev != kInvalidChunkHandle) {
    Chunk* cprev = ChunkFromHandle(c->prev);
    if (!cprev->in_use()) {
    //   VLOG(8) << "Chunk at " << c->ptr << " merging into c->prev "
    //    << cprev->ptr;

    chunk_to_reassign = c->prev;

    // Deletes c
    RemoveFreeChunkFromBin(c->prev);
    Merge(ChunkFromHandle(h)->prev, h);
    c = ChunkFromHandle(h);
    }
  }

  InsertFreeChunkIntoBin(chunk_to_reassign);
}

Merge

// Merges h1 and h2 when Chunk(h1)->next is h2 and Chunk(h2)->prev is c1.
// We merge Chunk(h2) into Chunk(h1).
void BFCAllocator::Merge(BFCAllocator::ChunkHandle h1,
             BFCAllocator::ChunkHandle h2) {
  Chunk* c1 = ChunkFromHandle(h1);
  Chunk* c2 = ChunkFromHandle(h2);
  // We can only merge chunks that are not in use.
  CHECK(!c1->in_use() && !c2->in_use());

  // c1's prev doesn't change, still points to the same ptr, and is
  // still not in use.

  // Fix up neighbor pointers
  //
  // c1 <-> c2 <-> c3 should become
  // c1 <-> c3

  BFCAllocator::ChunkHandle h3 = c2->next;
  c1->next = h3;
  CHECK(c2->prev == h1);
  if (h3 != kInvalidChunkHandle) {
    BFCAllocator::Chunk* c3 = ChunkFromHandle(h3);
    c3->prev = h1;
  }

  // Set the new size
  c1->size += c2->size;

  DeleteChunk(h2);
}

以上这篇TensorFlow内存管理bfc算法实例就是小编分享给大家的全部内容了，希望能给大家一个参考，也希望大家多多支持三水点靠木。

TensorFlow内存管理bfc算法实例

- Author -

齐豪

声明：登载此文出于传递更多信息之目的，并不意味着赞同其观点或证实其描述。

Python 相关文章推荐

Python3实现生成随机密码的方法

Aug 23 Python

Python的socket模块源码中的一些实现要点分析

Jun 06 Python

python实现八大排序算法（2）

Sep 14 Python

微信小程序跳一跳游戏 python脚本跳一跳刷高分技巧

Jan 04 Python

django使用xlwt导出excel文件实例代码

Feb 06 Python

python实现微信远程控制电脑

Feb 22 Python

基于Django框架利用Ajax实现点赞功能实例代码

Aug 19 Python

Python调用钉钉自定义机器人的实现

Jan 03 Python

安装完Python包然后找不到模块的解决步骤

Feb 13 Python

30行Python代码实现高分辨率图像导航的方法

May 22 Python

python pip如何手动安装二进制包

Sep 30 Python

详解tf.device()指定tensorflow运行的GPU或CPU设备实现

Feb 20 Python

python numpy数组中的复制知识解析

Feb 03 #Python

opencv python Canny边缘提取实现过程解析

Feb 03 #Python

Pycharm debug调试时带参数过程解析

Feb 03 #Python

Python使用enumerate获取迭代元素下标

Feb 03 #Python

TensorFlow 显存使用机制详解

Feb 03 #Python

opencv python如何实现图像二值化

Feb 03 #Python

python实现人机猜拳小游戏

Feb 03 #Python