HBase: Introduction and Application Scenarios
2021-03-01 22:39:55
Introduction and application scenarios
1. Introduction
Origin
- Google's three papers, often called the "troika" of big data in the early 21st century
  - GFS → HDFS
  - MapReduce → Hadoop MapReduce
  - Bigtable → HBase
The difference between real-time and offline
- The essence of big data: processing data to extract value from it
- The value of data gradually decreases over time
- Offline: the data is processed some time after it has been generated
  - T+1: today's job processes yesterday's data
  - Annual bill: processed once a year
  - Timeliness: hours or longer
- Real-time: data is processed as soon as it is generated
  - Real-time risk control
  - Real-time recommendation
  - Timeliness: within seconds
  - Data generated in one second is processed and put to use in the next
- How to build a real-time system?
  - Real-time data generation
  - Real-time data collection: Flume
  - Real-time data storage: HBase, Kafka
  - Real-time data processing: Flink, Spark Streaming
  - Real-time data application
Bigtable/HBase design ideas
- Requirement: how can big data be read and written quickly?
- Distributed solutions for storing big data
  - Distributed database: manages stored big data through structures such as databases and tables
  - Hive: manages big data in the form of databases and tables
    - The bottom layer is still HDFS
    - Row-and-column data management is implemented on top of files
  - HDFS
    - Application scenario: write once, read many times
    - Disk-based distributed storage: reads and writes go to disk
Question 1: How can a computer read and write data quickly?
- Where a computer stores data: hard disk and memory
- Read/write speed, from fast to slow:
  - Sequential memory access
  - Sequential disk access
  - Random memory access
  - Random disk access
- To improve read/write performance, the data must be in memory
- HBase reads and writes memory first
  - Write: write to memory first
  - Read: read from memory first
Question 2: Memory is relatively small, so how can it hold big data?
- Distributed design
  - Build distributed in-memory storage
  - Data written to HBase is partitioned and written to the memory of different machines
  - Example: 100 GB of data on 10 machines with 32 GB of memory each (32 GB × 10 = 320 GB in total)
  - Each machine holds 10 GB in memory
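The partitioning idea and the arithmetic above can be sketched in a few lines of Python. This is only an illustration, not HBase code: the 4-digit row keys, the split points and the `machine_for`/`put` helpers are hypothetical, and it assumes each machine owns one contiguous range of row keys.

```python
from bisect import bisect_right

# Figures from the example above.
TOTAL_DATA_GB = 100
MACHINES = 10
MEMORY_PER_MACHINE_GB = 32

per_machine_gb = TOTAL_DATA_GB / MACHINES              # data held by each machine
cluster_memory_gb = MACHINES * MEMORY_PER_MACHINE_GB   # total memory in the cluster

# Hypothetical split points: row keys are 4-digit strings "0000".."9999",
# and each machine owns a contiguous range of 1000 keys.
SPLIT_POINTS = [f"{i * 1000:04d}" for i in range(1, MACHINES)]

# One in-memory store per machine.
memory = [dict() for _ in range(MACHINES)]


def machine_for(row_key: str) -> int:
    """Return the index of the machine whose key range contains row_key."""
    return bisect_right(SPLIT_POINTS, row_key)


def put(row_key: str, value: str) -> None:
    """Write the row into the memory of the machine that owns its key range."""
    memory[machine_for(row_key)][row_key] = value


put("4321", "order-4321")
print(per_machine_gb, cluster_memory_gb, machine_for("4321"))
```

Because every machine only holds its own key range, 10 GB per machine fits comfortably into 32 GB of memory, and a read for a given key only has to contact one machine.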
Question 3: What if memory capacity can never keep up with the data volume?
- Data cannot stay in memory forever
- A rule of thumb in data processing: new data is far more likely to be accessed than old data
- Solution
  - New data is stored in memory
    - New data and frequently used data
  - Old data is persisted to the hard disk
- Question: what if old data needs to be read?
  - The first read comes from the hard disk; afterwards the data is placed in the cache (memory)
  - Later reads of the same data are served directly from the cache
  - Sorting: all data written to the hard disk is kept in sorted order
  - Looking up data in sorted data is very fast
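Why sorted data and a cache help can be shown with a small sketch. It assumes a hypothetical "disk file" modelled as two parallel Python lists sorted by row key. The names `disk_keys`, `read_from_disk` and `stats` are made up for illustration and are not part of HBase.

```python
from bisect import bisect_left

# Hypothetical "disk file": row keys kept in sorted order, values alongside.
disk_keys = ["row01", "row05", "row09", "row12", "row20"]
disk_values = ["a", "b", "c", "d", "e"]

cache = {}                  # in-memory cache of rows already read from disk
stats = {"disk_reads": 0}   # counts how often a read had to go to "disk"


def read_from_disk(key):
    """Binary search over the sorted keys: O(log n) comparisons."""
    stats["disk_reads"] += 1
    i = bisect_left(disk_keys, key)
    if i < len(disk_keys) and disk_keys[i] == key:
        return disk_values[i]
    return None


def read(key):
    """Serve from the cache if possible; otherwise read disk and cache the row."""
    if key in cache:
        return cache[key]
    value = read_from_disk(key)
    if value is not None:
        cache[key] = value
    return value


first = read("row09")    # goes to "disk", then fills the cache
second = read("row09")   # served from the cache, no extra disk read
```

With sorted keys the lookup needs a logarithmic number of comparisons instead of a full scan, and the second read of the same row never touches the disk.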
Question 4: Hard disks fail easily. What if the data on a disk is lost?
- How can the data remain readable when a hard disk breaks?
- Store the data in HDFS
  - HDFS stores its data on hard disks
  - Replicas ensure that data is not lost
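The replica idea can be illustrated with a toy model. It assumes three copies per block and a simple "next disks in order" placement. The placement rule, the `Disk` class and both helper functions are hypothetical simplifications, not how HDFS actually chooses replica locations.

```python
REPLICATION = 3  # assumption for this sketch: three copies of every block


class Disk:
    """A toy disk that can hold blocks and can be marked as failed."""

    def __init__(self, name):
        self.name = name
        self.blocks = {}
        self.failed = False


def write_block(disks, block_id, data, replication=REPLICATION):
    """Store the block on `replication` different disks (simplified placement)."""
    start = block_id % len(disks)
    for i in range(replication):
        disks[(start + i) % len(disks)].blocks[block_id] = data


def read_block(disks, block_id):
    """Return the block from the first healthy disk that still holds a copy."""
    for disk in disks:
        if not disk.failed and block_id in disk.blocks:
            return disk.blocks[block_id]
    raise IOError(f"block {block_id} lost: no healthy replica")


disks = [Disk(f"disk{i}") for i in range(5)]
write_block(disks, 7, b"payload")   # copies land on disk2, disk3, disk4
disks[2].failed = True              # one disk breaks
data_after_failure = read_block(disks, 7)
```

As long as at least one replica sits on a healthy disk, the read still succeeds; the data is only lost if every disk holding a copy fails.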
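Summary
- Reads and writes go to memory first, by building distributed in-memory storage
- Write: memory
  - Old data in memory is periodically written to HDFS
  - This frees memory for new data
- Read
  - Read memory first
  - If the data is not there, read the cache (memory)
  - If it is not there either, read HDFS
  - After reading, put the data into the cache
- Bottom layer
  - HDFS provides data persistence
  - This gives both fast reads and writes and data safety
- The main difference from Hive at the bottom layer
  - Hive -> read/write -> HDFS
  - HBase -> read/write memory first -> read/write HDFS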
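The read and write path summarized above can be modelled with a small toy class. This is a sketch under stated assumptions, not HBase's implementation: `TinyStore` and its attributes are hypothetical names, HDFS is modelled as a list of sorted dictionaries, and the flush is triggered by a row-count threshold rather than periodically.

```python
class TinyStore:
    """Toy model of the memory-first read/write path (not real HBase code)."""

    def __init__(self, flush_threshold=3):
        self.flush_threshold = flush_threshold  # hypothetical: flush after N rows
        self.write_memory = {}   # newest data, always written here first
        self.read_cache = {}     # rows recently read back from "HDFS"
        self.hdfs_files = []     # each flush produces one sorted "file"

    def put(self, key, value):
        self.write_memory[key] = value
        self.read_cache.pop(key, None)  # a newer write makes a cached copy stale
        if len(self.write_memory) >= self.flush_threshold:
            self.flush()

    def flush(self):
        """Persist the rows held in memory as one sorted file and free the memory."""
        if self.write_memory:
            self.hdfs_files.append(dict(sorted(self.write_memory.items())))
            self.write_memory = {}

    def get(self, key):
        """Return (value, where it was found): memory, then cache, then HDFS."""
        if key in self.write_memory:
            return self.write_memory[key], "memory"
        if key in self.read_cache:
            return self.read_cache[key], "cache"
        for hdfs_file in reversed(self.hdfs_files):  # newest file first
            if key in hdfs_file:
                self.read_cache[key] = hdfs_file[key]
                return hdfs_file[key], "hdfs"
        return None, "miss"


store = TinyStore(flush_threshold=2)
store.put("r1", "a")
from_memory = store.get("r1")   # still in write memory
store.put("r2", "b")            # threshold reached, both rows are flushed
from_hdfs = store.get("r1")     # first read after the flush goes to "HDFS"
from_cache = store.get("r1")    # the repeated read is served from the cache
```

The three reads of `r1` come back from memory, HDFS and the cache in turn, which mirrors the lookup order in the summary.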
Official website: hbase.apache.org
![Insert picture description here](https://img-blog.csdnimg.cn/20210227171734311.png)
- "Apache HBase™ is the Hadoop database, a distributed, scalable, big data store."
  - In other words, HBase is Hadoop's database: a distributed, scalable store for big data
- "Use Apache HBase™ when you need random, realtime read/write access to your Big Data."
  - In other words, HBase is the tool to use when big data needs random, real-time read/write access
2. Function
- Provides fast, random, real-time reads and writes over big data
3. Application scenarios
- E-commerce
  - Large volumes of product and order information are stored in back-end databases
  - MySQL stores the orders from the last six months
  - HBase can store the information for all products
- Gaming
  - Orders, operations, upgrades
- Finance
- Telecommunications
  - SMS and phone calls
  - Printing call records
- Transportation
- In general: real-time, high-performance random reads and writes over large volumes of data
Origin blog.csdn.net/qq_46893497/article/details/114181210