How to reduce costs and increase efficiency when building an ultra-large-scale storage architecture?

Today, as big data services keep changing and new ones keep emerging, storage services, the foundation of data, face growing challenges from complex environments and requirements. Both offline big data storage and online KV storage serve more and more data application scenarios, and the diversification of storage workloads has driven the evolution of a variety of storage systems. For example, ByteDance's practice in ultra-large-scale big data storage shows new technical features that evolved on the HDFS architecture at a data scale of tens of exabytes. Through a multi-data-center architecture, hierarchical storage, and effective data scheduling, it reduces storage costs while ensuring data usage efficiency and security.
Data storage applications are often closely coupled with upper-layer computing. Separating storage from compute is a relatively cutting-edge direction of evolution, and data volume and business complexity place very high demands on architecture evolution. We can hear how the Bilibili log system solves these problems while reducing costs and increasing efficiency.
Graph storage is a technically challenging storage product, and it is indispensable in certain scenarios. We have specially invited the technical director responsible at Xiaohongshu to share how to handle the challenges that trillion-scale social relationships bring to graph storage in practice.
For online applications, the stability and availability of data services are very important. Stability covers not only the availability of the service itself but also the stability of data latency. On this topic, we can hear ByteDance's best practices for KV storage in solving the problems and challenges of large-scale multi-region deployment.

Topic: Data Storage Application Practice

Producer: Feng Wei, Head of ByteDance Big Data Storage Technology
Personal introduction: Head of big data storage technology at ByteDance, with 10+ years of technology and product experience in distributed storage. Currently responsible for the R&D and operation of ByteDance's big data storage products, including HDFS (self-developed), data lake (storage), and Volcano big data storage acceleration products, involving the management and governance of tens of exabytes of data.

Lecture schedule

Mao Qi, Director of Infrastructure Storage at Xiaohongshu
Personal introduction: He has held core development and architect roles on storage products at EMC, Huawei, and Alibaba Cloud. He is currently responsible for the R&D and architecture evolution of the NoSQL KV database, graph database, and NewSQL database at Xiaohongshu.
Speech title: Xiaohongshu Graph Storage Practice for Trillions of Social Network Relationships
Speech outline: Xiaohongshu is a community-based product that covers lifestyle communities in many fields and stores massive social network relationships. To handle updates and associated reads of ultra-large-scale data in social scenarios, and to reduce database pressure and cost, we developed REDtao, a graph storage system for ultra-large-scale social networks, which greatly improves system stability. It encapsulates the cache and the underlying database and exposes a unified graph query API, so that access converges on one entry point and edges are aggregated efficiently in the cache.
Audience benefits: Applications and benefits of graph storage systems for social network relationships.
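
As a rough illustration of the idea in the outline above (a cache and a database hidden behind one graph query API), here is a minimal Python sketch. It is not REDtao's actual design or API: the class `GraphStore`, its methods, and the in-memory dict standing in for the database are all hypothetical names chosen for this example.

```python
class GraphStore:
    """Hypothetical facade: one graph API in front of a cache and a database."""

    def __init__(self, db):
        # `db` is a plain dict standing in for the underlying database:
        # (source id, edge type) -> list of destination ids.
        self.db = db
        self.cache = {}
        self.db_reads = 0  # counts how often a read falls through to the database

    def get_edges(self, src, edge_type):
        """Read-through: serve from the cache, load from the database on a miss."""
        key = (src, edge_type)
        if key not in self.cache:
            self.db_reads += 1
            self.cache[key] = list(self.db.get(key, []))
        return list(self.cache[key])

    def add_edge(self, src, edge_type, dst):
        """Write to the database, then update the cached edge list if present."""
        key = (src, edge_type)
        self.db.setdefault(key, []).append(dst)
        if key in self.cache:
            self.cache[key].append(dst)


store = GraphStore({})
store.add_edge("user1", "follows", "user2")
first = store.get_edges("user1", "follows")   # miss: loaded from the database
store.add_edge("user1", "follows", "user3")
second = store.get_edges("user1", "follows")  # hit: served from the cache
```

Because callers only see `get_edges` and `add_edge`, repeated reads of the same edge list reach the database once; a production system would also need eviction, sharding, and consistency handling, none of which is shown here.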
 
Tian Yong, Technical Director of HDFS Products at ByteDance
Personal introduction: Technical director of HDFS products at ByteDance. He has participated in the development of multiple distributed products covering files, objects, and NoSQL, and has 10+ years of technical experience in distributed storage. Previously, he was responsible for the R&D of NoSQL products such as Mola/Table at Baidu. His current focus is the technical architecture evolution, cost optimization, and governance of tens of exabytes of data for ByteDance HDFS products.
Speech title: Exploration and Practice of HDFS at EB-Level Storage Scale at ByteDance
Speech outline: HDFS is the oldest and largest storage system at ByteDance, with a storage scale of tens of exabytes and more than 10 years in operation. It supports a variety of nearline and offline scenarios such as big data, machine learning, and Flink/AP/MQ. As ByteDance's business has grown, the product and its technology have gone through a series of evolutions and developed characteristics specific to ByteDance, including deploying a single large cluster across multiple data centers and solving the performance and startup-efficiency problems of the community version with NameNode/DataNode rewritten in C++. It also builds a hierarchical storage system that, combined with the data access patterns of the upper-layer ecosystem, manages the flow of data across storage tiers and across AZs, reducing storage costs and improving data access efficiency. In addition, machine learning is used to identify accidental deletions by users and improve data security. This talk covers the exploration and practice of ByteDance HDFS in these areas:
  1. New features of ByteDance HDFS
  2. Challenges of the multi-data-center architecture
  3. Hierarchical storage practices
  4. Data security protection practices
Audience benefits:
  1. What new features does ByteDance's C++ rewrite of NameNode/DataNode introduce?
  2. How to work with the big data ecosystem to achieve fine-grained governance of massive data, ensuring business stability and optimizing cost?
  3. How to design a multi-data-center architecture? How to solve the bandwidth bottleneck of cross-data-center access?
  4. How to use AI to protect data against accidental deletion?
 
Mingmin Xu, Director of Infrastructure Microservices at Bilibili
Personal introduction: Graduated in 2011 and has worked at Alibaba, Microsoft, TouchPal, ByteDance, and Bilibili in turn, mainly on distributed caching, distributed storage, service governance, observability, and related work. Since joining Bilibili, he has led the microservices direction and is mainly responsible for building service governance, message queues, load balancing, and observability. He is particularly interested in distributed systems, performance optimization, and applications of new hardware.
Speech title: Evolution of the Log Platform Architecture at Bilibili
Speech outline: The talk covers how Bilibili's log platform evolved from 1.0 to the current 3.0 architecture, which separates storage from compute and unifies online and offline processing: what difficulties were encountered, what architectural choices and trade-offs were made, and how the goal of reducing costs and increasing efficiency was achieved with limited manpower and resources.
Audience benefits:
  1. How to make technology selection and planning decisions based on the Bilibili team's current situation
  2. How the Bilibili log platform unifies offline and online processing
  3. How the Bilibili log platform reduces costs and increases efficiency step by step
 
Liu Jian, Head of Abase Product R&D at ByteDance
Personal introduction: Head of Abase product R&D at ByteDance, with 10+ years of technical experience in distributed storage. He participated in the R&D of storage systems such as Mola and Aries at Baidu. His current focus is the stability, cost, data ecosystem, and multi-region support of ultra-large-scale NoSQL databases.
Speech title: Abase2: CRDT support practice in global NoSQL databases
Speech outline: Abase is one of the most widely used and largest NoSQL databases at ByteDance, with a peak QPS of tens of billions and a data scale of exabytes. It supports online KV storage scenarios for almost all of the company's businesses, such as recommendation, search, advertising, Toutiao, Douyin, and e-commerce. As the business grows, more and more users need to deploy Abase clusters in different physical regions and synchronize data between them, to enable reads and writes in the nearest region, provide disaster recovery, and relieve resource bottlenecks. At the same time, because a large number of users access Abase through the Redis interface, we designed and implemented the multi-region deployment architecture of Abase2 and added CRDT support for the main Redis commands. This talk therefore focuses on the engineering practice of Abase2 in supporting global deployment. The specific content includes:
  1. Requirements and challenges of multi-region deployment at ByteDance
  2. Introduction to the architecture of Abase2
  3. CRDT (conflict-free replicated data type) solution introduction
  4. CRDT support engineering practice for String/Hset/Zset commands
Audience benefits:
  1. How to solve the requirements of cross-region database deployment/synchronization/consistency
  2. How to implement CRDT support for Redis main commands
  3. How to achieve high performance while supporting CRDTs
  4. How to achieve cost optimization in the process of multi-regional deployment
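
The outline above does not describe how Abase2 implements CRDTs. As a generic illustration of the concept only, the sketch below shows one of the simplest CRDTs, a last-writer-wins (LWW) register, which is one common way to make a String-style SET converge across regions. The class name, the use of a caller-supplied integer timestamp, and the region id as a tie-breaker are all assumptions made for this example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class LWWRegister:
    """Last-writer-wins register: a generic CRDT sketch, not Abase2's design."""
    value: Optional[str] = None
    timestamp: int = 0   # assumed: a timestamp supplied by the writer
    region: str = ""     # assumed: a region id used only to break timestamp ties

    def set(self, value: str, timestamp: int, region: str) -> "LWWRegister":
        # A local write is treated as a merge with a one-off register.
        return self.merge(LWWRegister(value, timestamp, region))

    def merge(self, other: "LWWRegister") -> "LWWRegister":
        # Keep the state with the larger (timestamp, region) pair.
        # The comparison is a total order, so merge is commutative,
        # associative, and idempotent, and replicas converge.
        if (other.timestamp, other.region) > (self.timestamp, self.region):
            return other
        return self


# Two regions accept concurrent writes to the same key, then exchange state.
replica_a = LWWRegister().set("blue", timestamp=1, region="region-a")
replica_b = LWWRegister().set("green", timestamp=2, region="region-b")
merged_ab = replica_a.merge(replica_b)
merged_ba = replica_b.merge(replica_a)
```

Both merge orders yield the same state, which is the property that lets each region accept writes locally and synchronize later. Hash and sorted-set types generally need richer CRDTs (for example per-field or per-member metadata), which this sketch does not cover.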
 
 
 

Origin my.oschina.net/u/5941630/blog/10086579