[C++] Application of hash - bitset (STL) bitmap

1. Introduction to bitmaps

1. A motivating problem

Consider this interview question:

Given 4 billion unique, unsorted unsigned integers, how can you quickly determine whether a particular unsigned integer is among them? [Tencent]

If the only goal is to check whether a number appears in a collection of numbers, two methods come to mind:

  1. Put the 4 billion integers into a set or unordered_set container and call find
  2. Sort the 4 billion integers with an external sort, then run a binary search

To see why neither method works for this question, we first need to know how much space 4 billion integers take up:

  • 4 billion integers at 4 bytes each occupy 16 billion bytes, about 16 GB, for the raw data alone. If they are stored in a set, the underlying red-black tree adds its own overhead to every node (a color field plus the three links to parent, left child and right child), so the total is far more than 16 GB and does not fit in memory. The second method fails for the same reason: the data cannot be held in memory, so it cannot be sorted there.

To solve this problem, we use a bitmap.

  • We only need to know whether a value is present among the given integers, and the answer has just two states: present or absent. One binary bit is enough to record that: 1 means the value exists, 0 means it does not. This is hashing by direct addressing, where each value maps to its own bit and that bit marks whether the value exists. The resulting structure is the bitmap.
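As a small illustration of direct addressing (a sketch added for clarity, not code from the original article), the hypothetical helpers `mark_present` and `is_present` below store values in the range [0, 24) in a 3-byte array. Value x lives in bit x % 8 of byte x / 8.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Hypothetical 24-bit bitmap: value x lives in bit (x % 8) of byte (x / 8).
using SmallBitmap = std::array<std::uint8_t, 3>;

// Record that x is present. x is assumed to be smaller than 24.
inline void mark_present(SmallBitmap& bm, std::size_t x)
{
	bm[x / 8] |= static_cast<std::uint8_t>(1u << (x % 8));
}

// Report whether x was recorded. x is assumed to be smaller than 24.
inline bool is_present(const SmallBitmap& bm, std::size_t x)
{
	return (bm[x / 8] & (1u << (x % 8))) != 0;
}
```

After marking 1, 12 and 22 in a zero-initialized `SmallBitmap`, the three bytes hold 0x02, 0x10 and 0x40, and `is_present` returns true for exactly those three values.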

For this question the bitmap must cover every possible 32-bit unsigned integer, so we allocate 2^32 bits, one for each value from 0 to 2^32 - 1. That is 2^32 / 8 bytes = 512 MB of memory.

Compared with 16 GB, the bitmap cuts memory consumption to a small fraction and solves the problem comfortably.
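The two sizes quoted above can be checked at compile time. This is only a worked version of the arithmetic; the constant names are made up for the illustration, and the integers are assumed to be stored as 4-byte values.

```cpp
#include <cstdint>

// Number of integers in the interview question.
constexpr std::uint64_t kCount = 4'000'000'000ULL;
// Raw storage if every integer is kept as a 4-byte value.
constexpr std::uint64_t kRawBytes = kCount * sizeof(std::uint32_t);
// One bit for every possible 32-bit value, 0 through 2^32 - 1.
constexpr std::uint64_t kBitmapBits = 1ULL << 32;
constexpr std::uint64_t kBitmapBytes = kBitmapBits / 8;

static_assert(kRawBytes == 16'000'000'000ULL, "16 billion bytes, about 16 GB");
static_assert(kBitmapBytes == 512ULL * 1024 * 1024, "512 MB (2^29 bytes)");
```

The bitmap size depends only on the range of possible values, not on how many values are actually present.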


2. The concept of bitmap

A bitmap uses each bit to store one state. It suits scenarios with a large amount of data and no repeated values, and it is usually used to check whether a given value exists.


3. Application of bitmap

  1. Quickly find whether a certain data is in a collection
  2. Sort + deduplicate
  3. Find the intersection, union, etc. of two sets
  4. Disk block marking in the operating system
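Applications 2 and 3 can be sketched with std::bitset. The function names `sorted_unique` and `intersection` and the bound `kMaxValue` are hypothetical, chosen only for this example, and all values are assumed to be smaller than `kMaxValue`.

```cpp
#include <bitset>
#include <cstddef>
#include <vector>

// Hypothetical upper bound on the values handled by this sketch.
constexpr std::size_t kMaxValue = 64;

// Sort + deduplicate: set one bit per value, then read the bits back in
// ascending order. std::bitset::set throws std::out_of_range for v >= kMaxValue.
inline std::vector<std::size_t> sorted_unique(const std::vector<std::size_t>& values)
{
	std::bitset<kMaxValue> seen;
	for (std::size_t v : values)
		seen.set(v); // a duplicate just sets the same bit again

	std::vector<std::size_t> result;
	for (std::size_t i = 0; i < kMaxValue; ++i)
		if (seen.test(i))
			result.push_back(i);
	return result;
}

// The intersection of two sets stored as bitmaps is a single bitwise AND.
inline std::bitset<kMaxValue> intersection(const std::bitset<kMaxValue>& a,
                                           const std::bitset<kMaxValue>& b)
{
	return a & b;
}
```

For the input {5, 3, 5, 1, 3}, `sorted_unique` returns {1, 3, 5}. A union works the same way as the intersection, with | in place of &.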

2. Using bitset

1. Constructing a bitset

1. Default constructor: builds a 16-bit bitmap with every bit initialized to 0

bitset<16> bs1;

2. From an integer: the bits are initialized from the binary representation of the value, e.g. 0xfa2 -> 1111 1010 0010

bitset<16> bs2(0xfa2);//0000111110100010

3. From a std::string of 0s and 1s: std::string("01101001") -> 01101001

bitset<16> bs3(string("01101001"));//0000000001101001

4. From a string literal of 0s and 1s: "01101001" -> 01101001

bitset<16> bs4("01101001");//0000000001101001

2. Use of bitset member functions

The common member functions of bitset are shown in the table below:

Member function    Description
set                Set the specified bit, or all bits
reset              Clear the specified bit, or all bits
flip               Invert the specified bit, or all bits
test               Get the state of the specified bit
count              Get the number of bits that are set
size               Get the number of bits the bitset holds
any                Returns true if any bit is set
none               Returns true if no bit is set
all                Returns true if all bits are set

Example:

#include <bitset>
#include <iostream>
using namespace std;

int main()
{
	bitset<16> bs;
	bs.set(4);
	bs.set(6);
	bs.set(2);
	cout << bs.size() << endl;//16
	cout << bs << endl;//0000000001010100
	//Get the state of a given bit
	cout << bs.test(0) << endl;//0
	cout << bs.test(2) << endl;//1
	//Flip all bits
	bs.flip();
	cout << bs << endl;//1111111110101011
	//Flip bit 1
	bs.flip(1);
	cout << bs << endl;//1111111110101001
	cout << bs.count() << endl;//12
	//Clear bit 3
	bs.reset(3);
	cout << bs << endl;//1111111110100001
	//Clear all bits
	bs.reset();
	cout << bs.none() << endl;//1
	cout << bs.any() << endl;//0
	//Set all bits
	bs.set();
	cout << bs.all() << endl;//1
	return 0;
}

Note: set, reset and flip operate on a single bit when a position is passed, and on all bits when called with no argument.


3. Using bitset operators

The supported operators are shown in the table below:

Operator                 Description
>>, <<                   Stream input and output
=                        Assignment
==, !=                   Equality comparison
&=, |=, ^=, <<=, >>=     Compound assignment
~                        Bitwise NOT (unary)
&, |, ^                  Bitwise AND, OR, XOR
[]                       Access a single bit by index

Example:

#include <bitset>
#include <iostream>
using namespace std;

int main()
{
	//>> input and << output operators
	bitset<8> bs;
	cin >> bs;//10100
	cout << bs << endl;//00010100
	//Compound assignment operators
	bitset<8> bs1("101011");
	bitset<8> bs2("100100");
	cout << (bs1 >>= 2) << endl;//00001010
	cout << (bs2 |= bs1) << endl;//00101110
	//Bitwise operators
	bitset<8> bs3("10010");
	bitset<8> bs4("11001");
	cout << (bs3 & bs4) << endl;//00010000
	cout << (bs3 ^ bs4) << endl;//00001011
	//operator[]
	cout << bs3[4] << endl;//1
	cout << bs3[2] << endl;//0
}

3. Simulating bitset

1. The basic framework of the bitmap

  • bitset operates on individual bits and, like an array, gives access to each bit through a subscript. To simulate it, we store the bits in an ordinary array.
  • If int is the storage unit, holding bits 0 to N takes N/32+1 ints. Likewise, with char as the storage unit, holding N bits takes N/8+1 chars. Here we use a vector as the underlying container, with char as its element type.
namespace bitset_realize
{
	//Bitmap of N bits
	template<size_t N>
	class bitset
	{
	public:
		//Constructor
		bitset();
		//Mark the bit that x maps to as 1
		void set(size_t x);
		//Mark the bit that x maps to as 0
		void reset(size_t x);
		//Check whether bit x is 1
		bool test(size_t x);
		//Flip bit x
		void flip(size_t x);
		//Get N, the number of bits the bitmap holds
		size_t size();
		//Count the bits that are 1
		size_t count();
		//Return true if no bit is set to 1
		bool none();
		//Return true if any bit is set to 1
		bool any();
		//Return true if all N bits are set
		bool all();
	private:
		vector<char> _bits;//Bitmap storage
	};
}

2. Member functions

2.1. Constructor

  • A char has 8 bits, so ideally a bitmap of N bits needs N / 8 bytes, but that is exact only when N is a multiple of 8. If N is 10, N / 8 gives 1 byte, which is 2 bits short. To be safe we allocate N / 8 + 1 bytes, which always holds the N bits we need and wastes at most 8 bits (1 byte), in the case where N is a multiple of 8.

The constructor then only has to size the storage to N / 8 + 1 bytes and initialize every bit to 0.

//Constructor
bitset()
{
	//The +1 guarantees enough bits; at most 8 bits are wasted
	_bits.resize(N / 8 + 1, 0);
}
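The sizing rule can be compared with the common round-up formula. The sketch below is for comparison only: `bytes_round_up` is an alternative that the article does not use, and both function names are hypothetical.

```cpp
#include <cstddef>

// The article's rule: always allocate one extra byte.
constexpr std::size_t bytes_floor_plus_one(std::size_t n) { return n / 8 + 1; }

// Alternative, not used in the article: round up to a whole number of bytes,
// so nothing is wasted when n is a multiple of 8.
constexpr std::size_t bytes_round_up(std::size_t n) { return (n + 7) / 8; }

static_assert(bytes_floor_plus_one(10) == 2, "10 bits fit in 2 bytes");
static_assert(bytes_floor_plus_one(16) == 3, "one spare byte when N is a multiple of 8");
static_assert(bytes_round_up(10) == 2, "same result when N is not a multiple of 8");
static_assert(bytes_round_up(16) == 2, "no spare byte");
```

The article's rule has one convenient property: the last char is never completely full, so it can always be treated as the partial byte holding N % 8 valid bits. With the round-up formula, any code that relies on that property would need a separate case for N % 8 == 0.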

2.2. set, reset, test

  • 1. set

set marks the bit that x maps to as 1. The steps are:

  1. x / 8 gives i, the index of the char that holds the bit
  2. x % 8 gives j, the position of the bit inside that char
  3. Bitwise OR the i-th char with 1 << j to set bit j to 1

//Mark the bit that x maps to as 1
void set(size_t x)
{
	//Index of the char that holds the bit for x
	size_t i = x / 8;
	//Position of the bit inside that char
	size_t j = x % 8;
	//Bitwise OR sets bit j to 1
	_bits[i] |= (1 << j);
}
  • 2. reset

reset marks the bit that x maps to as 0. The steps are:

  1. x / 8 gives i, the index of the char that holds the bit
  2. x % 8 gives j, the position of the bit inside that char
  3. Shift 1 left by j bits, invert the whole result, and bitwise AND it with the i-th char

//Mark the bit that x maps to as 0
void reset(size_t x)
{
	//Index of the char that holds the bit for x
	size_t i = x / 8;
	//Position of the bit inside that char
	size_t j = x % 8;
	//Shift 1 left by j bits, invert, then AND with the i-th char
	_bits[i] &= (~(1 << j));
}
  • 3. test

test checks whether bit x is 1. The steps are:

  1. x / 8 gives i, the index of the char that holds the bit
  2. x % 8 gives j, the position of the bit inside that char
  3. Shift 1 left by j bits and bitwise AND it with the i-th char
  4. If the result is non-zero the bit is set; otherwise it is not

//Check whether bit x is 1
bool test(size_t x)
{
	//Index of the char that holds the bit for x
	size_t i = x / 8;
	//Position of the bit inside that char
	size_t j = x % 8;
	//Shift 1 left by j bits and AND with the i-th char; non-zero means set
	return (_bits[i] & (1 << j)) != 0;
}

2.3. flip, count, size

  • 1. flip

flip inverts the specified bit: a 0 becomes 1 and a 1 becomes 0. The steps are:

  1. x / 8 gives i, the index of the char that holds the bit
  2. x % 8 gives j, the position of the bit inside that char
  3. Shift 1 left by j bits and bitwise XOR (^) it with the i-th char

//Flip bit x
void flip(size_t x)
{
	//Index of the char that holds the bit for x
	size_t i = x / 8;
	//Position of the bit inside that char
	size_t j = x % 8;
	//Shift 1 left by j bits and XOR with the i-th char
	_bits[i] ^= (1 << j);
}
  • 2. count

count returns the number of 1 bits in the bitmap. For each char it applies the following steps:

  1. n = n & (n - 1) clears the rightmost 1 in the binary representation of n
  2. While n is not zero, repeat step 1
  3. The number of repetitions is the number of 1 bits in n

//Count the bits that are 1
size_t count()
{
	size_t count = 0;
	for (auto e : _bits)
	{
		//Go through unsigned char so that a byte with its top bit set
		//is not sign-extended when char is signed
		int n = static_cast<unsigned char>(e);
		while (n)
		{
			n = n & (n - 1);
			count++;
		}
	}
	return count;
}
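To see the n & (n - 1) step in isolation, here is a minimal sketch for a single byte. The helper name `popcount_byte` is hypothetical. It takes an unsigned char so that a byte with its top bit set is not sign-extended into extra 1 bits.

```cpp
#include <cstddef>

// Count the 1 bits of one byte by repeatedly clearing the lowest set bit.
// The loop runs once per set bit, not once per bit position.
inline std::size_t popcount_byte(unsigned char byte)
{
	unsigned int n = byte;
	std::size_t ones = 0;
	while (n != 0)
	{
		// Example trace for 0b10110100:
		// 0b10110100 -> 0b10110000 -> 0b10100000 -> 0b10000000 -> 0
		n &= n - 1;
		++ones;
	}
	return ones;
}
```

For 0b10110100 the loop body runs four times, once for each 1 bit, so the function returns 4.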
  • 3. size

size returns N, the number of bits the bitmap holds.

//Get N, the number of bits the bitmap holds
size_t size()
{
	return N;
}

2.4. none, any, all

  • 1. none

none traverses every char and returns true only if all of them are 0.

//Return true if no bit is set to 1
bool none()
{
	//Traverse every char
	for (auto e : _bits)
	{
		if (e != 0)//Some bit is set to 1, so return false
			return false;
	}
	return true;//Every bit is 0, so return true
}
  • 2. any

any returns true if at least one bit in the bitmap is set to 1. That is the exact opposite of none, so we can reuse none directly.

//Return true if any bit is set to 1
bool any()
{
	return !none();
}
  • 3. all

all checks whether every bit in the bitmap is set to 1. There is a special case to handle. Because N may not be a multiple of 8, the constructor allocates one extra char (8 bits), and only the first N % 8 bits of that last char are valid; the rest is padding. So all has to check in two parts:

  1. First check that every bit in the first N / 8 chars (every char except the last) is 1.
  2. Then check that the first N % 8 bits of the last char are all 1.

//Return true if all N bits are set
bool all()
{
	size_t size = _bits.size();
	//First check every char except the last
	for (size_t i = 0; i < size - 1; i++)
	{
		//A full char must be 0xFF; comparing through unsigned char
		//works whether char is signed or unsigned
		if (static_cast<unsigned char>(_bits[i]) != 0xFF)
			return false;
	}
	//Then check the first N % 8 bits of the last char
	for (size_t j = 0; j < N % 8; j++)
	{
		if ((_bits[size - 1] & (1 << j)) == 0)//Same principle as test
			return false;
	}
	return true;
}
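The pieces above can be assembled into one compilable class and exercised. The following is a condensed sketch, not the article's exact code: the namespace `bitset_sketch` and the helper `mask` are hypothetical names, the storage uses unsigned char to sidestep signedness issues, each accessor asserts x < N (an added assumption), and all() is simplified to count() == N, which is valid only because the assertions keep the padding bits at 0.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

namespace bitset_sketch // hypothetical name, avoids clashing with std::bitset
{
	template<std::size_t N>
	class bitset
	{
	public:
		// N / 8 + 1 bytes, all zero, as in the constructor above.
		bitset() : _bits(N / 8 + 1, 0) {}

		void set(std::size_t x)   { assert(x < N); _bits[x / 8] |= mask(x); }
		void reset(std::size_t x) { assert(x < N); _bits[x / 8] &= static_cast<unsigned char>(~mask(x)); }
		void flip(std::size_t x)  { assert(x < N); _bits[x / 8] ^= mask(x); }
		bool test(std::size_t x) const { assert(x < N); return (_bits[x / 8] & mask(x)) != 0; }

		std::size_t size() const { return N; }

		// Clear the lowest set bit of each byte until the byte is zero.
		std::size_t count() const
		{
			std::size_t ones = 0;
			for (unsigned char byte : _bits)
				for (unsigned int n = byte; n != 0; n &= n - 1)
					++ones;
			return ones;
		}

		bool none() const { return count() == 0; }
		bool any() const  { return !none(); }
		// Simplification: relies on the padding bits never being set.
		bool all() const  { return count() == N; }

	private:
		// Single-bit mask for the position of x inside its byte.
		static unsigned char mask(std::size_t x)
		{
			return static_cast<unsigned char>(1u << (x % 8));
		}

		std::vector<unsigned char> _bits;
	};
}
```

With `bitset_sketch::bitset<10> bs;`, calling `bs.set(2)` and `bs.set(9)` makes `bs.count()` return 2. Setting bits 0 through 9 makes `bs.all()` return true, even though the second byte still has 6 unused padding bits.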


Origin blog.csdn.net/m0_64224788/article/details/130207142