Data middle platform construction: a ten-million-yuan waterfall build or a hundred-thousand-yuan iterative build, which one would you choose?

Ten years into the middle platform, a second look shows that everything has changed.

Initially, enterprises faced massive data accumulation and fragmentation caused by the rapid growth of the Internet industry. To solve this, they began integrating data into a central platform to improve the efficiency of data use and the level of data management, and middle platform construction took shape. Led by the industry giants, the trend moved from "building a big middle platform" to "dismantling the middle platform" and then to "getting rid of the middle platform", so the middle platform seems to have run through its whole life cycle at extreme speed.

But does the development of the middle platform really stop here?

1. The middle platform "trap": imitating the surface, not the substance

To build a middle platform, one manufacturing company invested a year and a half and 60 million yuan in capital. Yet the finished middle platform, which consumed so many resources, failed to deliver the expected value in actual operation. On the one hand, business needs were not fully considered in the early stage of construction, so the platform could not connect to the business later on; on the other hand, the construction lacked data governance and data quality assurance, so the data was unreliable and business departments were reluctant to use it. In the end, the middle platform was labeled a "failed investment", and its cost could not be recovered.

In a similar case, a company building a middle platform needed to bring in specialized technical staff for support and maintenance. Because such talent is scarce and expensive, the company had to pour resources into recruiting and training, which squeezed the resources of other departments and held back business development. Construction and operations also became dependent on staff turnover: a single departure could affect the whole system.

Stories of middle platform projects going wrong are still common. Unlike the positive public opinion of the early days, the first things that come to mind when the middle platform is mentioned today are excessive resource investment, heavy reliance on specialized technical support, results that cannot be verified quickly, and high risk. These are the problems it is most often criticized for.

In fact, there has always been a gap between public opinion and practice regarding the development and value of the middle platform. On the one hand, although new companies kept watching from the sidelines or learning and imitating, the approach was never widely and effectively replicated in practice, especially among small and medium-sized enterprises. On the other hand, as the criticism grew, more and more companies began to clear away the fog of excessive marketing and study the essence of middle platform construction, that is, the core value of the middle platform.

In other words, as the tide recedes, the middle platform, as a concept for transforming IT architecture and enterprise organization, is still being optimized and is still evolving, and its core, which has real positive significance and reference value, is gradually becoming visible.

Borrow the core idea, not the form

To go back to the source: before asking "whether to build a middle platform", an enterprise needs to know what it is really pursuing and which problems it wants the middle platform to solve. The answer is obviously not to build a gorgeous castle in the air that cannot adapt to the company's own business needs and organizational structure, fails in operation, and ends up achieving nothing. Yet this is exactly the "beautiful trap" that middle platform construction easily falls into: a ten-million-yuan data middle platform built by following the trend, one that merely looks the part while ignoring the long-term value and strategic significance of the construction.

Talking about construction without regard to reality is simply irresponsible.

A "pseudo middle platform" built by neglecting the fundamentals cannot solve the problem of data silos, cannot deliver the data sharing and data collaboration a middle platform should provide, and certainly cannot make a substantial contribution to business innovation or efficiency. After all the effort, only the drawbacks of the traditional middle platform are left behind, and the conclusion becomes "the middle platform misled me", so the middle platform ends up taking the blame.

The key to solving the problem is to discard the dross and keep the essence. Now that we know where the trap is, we can go around it: grasp and break down the core idea of middle platform theory directly, skip the form, and achieve innovation in substance.

So what is the essence of the middle platform mentioned here?

Servitization: the soul of middle platform construction

Before answering this question, we first have to be clear about where the value of the middle platform lies.

Take the retail industry as an example. In the new retail era, companies rely on e-commerce and social platforms to push online sales, and the social commerce and e-commerce markets keep growing. These platforms give retailers more marketing and sales channels and greatly enrich consumers' shopping choices and experience. At the same time, retailers now need to integrate customer, order, and inventory information from more channels in order to manage their business and optimize their supply chain. Faced with this demand, the middle platform can first serve as a data hub. Internally, it connects to the enterprise's order management, inventory management, supply chain management, financial management, and other systems; externally, it connects to the consumer-facing shopping cart, payment, logistics tracking, after-sales service, and customer service. By integrating and managing data from all channels and systems centrally, it keeps that data consistent and accurate. On the one hand, consumers get a smoother, more convenient, and more efficient shopping and after-sales experience; on the other hand, the enterprise gets more accurate and timely data analysis and decision support, which improves internal operational efficiency and service quality. Second, the flexibility and scalability of the middle platform give the enterprise more freedom to carry out customized development and integration according to business needs, which helps it respond to market changes and shifting consumer demand.

Take manufacturing as an example. An enterprise has many systems and business scenarios, which fall into two categories. One is internal systems that do not face users, such as ERP, BPM, and MES; these make up the back end of the enterprise. The other is user-facing systems, such as CRM, channel management, and the customer service center; these are the front end. The middle platform, as the name suggests, sits between the back end and the front end and provides business capabilities as services. It supplies the enterprise with basic services such as data, business processes, and resource scheduling, which makes the front-end systems more efficient and intelligent and improves overall productivity and customer experience.

It can be seen that the data middle platform is essentially an enterprise data architecture that combines Internet technology with industry characteristics. By consolidating the core capabilities of the enterprise in the form of shared services, it forms an intermediate platform whose main features are openness, sharing, scalability, and reusability. This platform integrates and manages scattered data and resources inside and outside the enterprise and gives the business the ability to prepare data quickly, which is the key mechanism for empowering business innovation and making it more efficient. This is where the importance of the middle platform lies.

Once we understand how middle platform theory works, we also grasp its essence: servitization. Its core is to consolidate and transform the enterprise's core data into a self-contained set of "universal plug" services that can be invoked at any time, both internally and externally.
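As a rough illustration of this "universal plug" idea, the Python sketch below registers one shared capability and lets two different callers reuse it. Every name in it (`ServiceRegistry`, `inventory.stock_level`, the sample inventory) is hypothetical and chosen only for illustration; it is not taken from any particular product.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict


@dataclass
class ServiceRegistry:
    """Hypothetical registry: one shared capability, many callers."""
    _services: Dict[str, Callable[..., Any]] = field(default_factory=dict)

    def register(self, name: str, handler: Callable[..., Any]) -> None:
        # Refuse duplicates so each capability has exactly one owner.
        if name in self._services:
            raise ValueError(f"service already registered: {name}")
        self._services[name] = handler

    def call(self, name: str, **kwargs: Any) -> Any:
        return self._services[name](**kwargs)


# Illustrative core data, consolidated once from back-end systems.
INVENTORY = {"sku-1": 12, "sku-2": 0}


def stock_level(sku: str) -> int:
    """Shared capability: look up stock for a SKU (0 if unknown)."""
    return INVENTORY.get(sku, 0)


registry = ServiceRegistry()
registry.register("inventory.stock_level", stock_level)

# Two different front-end callers reuse the same service.
storefront_can_sell = registry.call("inventory.stock_level", sku="sku-1") > 0
support_sees_stock = registry.call("inventory.stock_level", sku="sku-2")
```

The point of the sketch is only that the lookup logic lives in one place; a storefront and a customer service tool call it instead of each keeping their own copy of the inventory data.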

We therefore conclude that the data middle platform theory matches the needs of the times and still has irreplaceable value for enterprises seeking further transformation and upgrading, or lower costs and higher efficiency. Borrowing from it, however, must never mean copying its form. It has to be "transplanted" according to local conditions: keep the essence, "servitization", and discard the dross of "ignoring risk and investing blindly".

Since building a ten-million-yuan data middle platform at every turn is not advisable, how can we break the traditional constraints at the technical level, and which technologies or tools allow a pragmatic, business-value-oriented middle platform to be built reasonably and at low cost?

2. From large to small, split and iterate: new ideas from the rise of the modern data stack

This brings us to some interesting changes we have seen in the data technology routes used for analytics.

At present, when enterprises want to improve business insight and build a data platform targeting data analysis, there are two technical routes to choose from:

One is the big data system represented by the Hadoop ecosystem;
the other is the modern data stack represented by Snowflake, Fivetran, and DBT.

Here is some analysis of the two technology stacks:

  • The Fall of Big Data
    In the traditional technology stack, data processing relies mainly on big data technologies such as Hadoop and Spark. These technologies are oriented mainly toward offline batch processing and suit the processing and analysis of large volumes of data. Today's Internet application scenarios, however, place higher demands on real-time and interactive data processing.

Big data is gradually being marginalized as the field moves on, and its development has run into problems. Some representative ones:

  • Long setup and learning process: Setting up and learning a big data system takes a lot of time and effort. From data collection, to data cleaning, processing and storage, to data analysis and application, this process needs to be continuously adjusted and improved to adapt to changing business needs and market trends.

  • Slow response to new information: Big data analytics systems often need to run models and algorithms on large amounts of data to find useful information and trends. This process consumes a lot of computing resources and time, so it is relatively slow to respond and may take a while to produce meaningful results.

  • High cost of insight: Big data analysis requires heavy investment in technology and resources, including hardware and software, recruiting and training, and data storage and processing. These costs are high and can make businesses and organizations hesitate when deciding whether to invest in big data.

Many big data projects manage only to collect and store data and never get as far as applying it. So even when a project achieves some results within a year or two, it often stalls at that stage and cannot move forward. Because the big data technology stack is large and complex, planning and staffing take a great deal of time and resources, and any later adjustment or change is also very costly.

In addition, collecting and storing historical data is a thorny issue for big data. Historical data does have analytical value, but in many business scenarios the most valuable data is the most recent. Such data often needs to be collected and analyzed in real time so that decisions and adjustments can be made promptly. Yet storing, computing on, and using data with big data technology is very expensive, and the cost is too high relative to the value produced.

Accordingly, starting in 2018, the three big data vendors Cloudera, MapR, and Hortonworks were acquired or merged one after another. For big data, stuck at this bottleneck, decline is inevitable.

  • The Rise of the Modern Data Stack
    The state of big data pushes us toward a more flexible technology stack. The concept of the Modern Data Stack (MDS) was proposed in response and has gained growing recognition. Its basic definition is: "an ecosystem of data tools that emerged with the rise of cloud data warehouses".

Put plainly, this means splitting the tools needed in digital construction into individual modules, then starting from the problem and selecting the modules the business requires, instead of building a unified data platform or data middle platform in one go as in the past. A modern data stack usually builds on cloud services such as cloud data warehouses, and shows the following key features and advantages:

  • Cloud-native and hosted: Modern data stacks are usually cloud-native and can be built and hosted on cloud platforms. Computing and storage resources can therefore be scaled up or down at any time. This managed approach helps enterprises reduce operating costs and management burden.

  • Composable and pluggable: The components of a modern data stack are generally composable and pluggable. Enterprises can choose and combine different components to build data processing workflows according to their own needs. This flexibility helps them adapt quickly to different business needs and data scenarios.

  • Iterative: Compared with the traditional top-down development of middle platform or big data projects, the modern data stack tends to be built and evolved iteratively. It is agile, lightweight, scalable, and open, so it can respond to business needs and changes more quickly and achieve rapid iteration and delivery through continuous integration and continuous deployment.

  • Self-service: Tool selection can be completed without vendor involvement, and the data processing and analysis tools can be used easily by non-technical staff. This self-service approach reduces the enterprise's dependence on technical personnel and lets business needs be met more quickly.
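The composable, iterative character described in the list above can be sketched in a few lines of Python. This is a toy model under stated assumptions: the stage names, the `compose` helper, and the sample records are all hypothetical, and real tools expose far richer interfaces. It only shows that a pipeline assembled from small interchangeable stages can start small and be extended in a later iteration without rewriting what already works.

```python
from typing import Callable, Dict, Iterable

Record = Dict[str, object]
Stage = Callable[[Iterable[Record]], Iterable[Record]]


def compose(*stages: Stage) -> Stage:
    """Chain stages left to right into one pipeline."""
    def pipeline(records: Iterable[Record]) -> Iterable[Record]:
        for stage in stages:
            records = stage(records)
        return records
    return pipeline


def drop_missing_ids(records: Iterable[Record]) -> Iterable[Record]:
    """Cleaning stage: discard records without an order id."""
    return (r for r in records if r.get("order_id") is not None)


def normalize_channel(records: Iterable[Record]) -> Iterable[Record]:
    """Conversion stage: lower-case the sales channel name."""
    return ({**r, "channel": str(r.get("channel", "")).lower()} for r in records)


def tag_source(source: str) -> Stage:
    """Stage factory: label each record with the system it came from."""
    def stage(records: Iterable[Record]) -> Iterable[Record]:
        return ({**r, "source": source} for r in records)
    return stage


raw = [
    {"order_id": 1, "channel": "WeChat"},
    {"order_id": None, "channel": "Web"},
    {"order_id": 2, "channel": "APP"},
]

# Iteration 1: ship the smallest useful pipeline.
first_iteration = compose(drop_missing_ids)
# Iteration 2: plug in more stages without touching the first one.
second_iteration = compose(drop_missing_ids, normalize_channel, tag_source("crm"))

v1 = list(first_iteration(raw))
v2 = list(second_iteration(raw))
```

Swapping a stage for a different implementation, or for a call into a managed service, changes one argument to `compose` and nothing else.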

Starting from the source, data passes through ingestion and collection, processing, and the presentation of business value. For each of these steps the modern data stack offers a range of tools, including cloud data warehouses, integration tools, and analysis tools. They can help a company complete a quick project in a short time: the time cost can be compressed to about a week, and the capital cost can be as low as a few thousand to tens of thousands of yuan, or in some cases nothing at all.

Compared with the traditional big data technology stack, the modern data stack pays more attention to service. In other words, it is itself a service-oriented technology stack. It emphasizes full-service support and interactive services and lets users manage and process data with a variety of tools and technologies. The aim is to provide more comprehensive, flexible, and efficient data services that better support business needs and help enterprises carry out digital transformation.

Under the modern data stack model, an enterprise that chooses the right tool for each stage gets its digital transformation off to a good start and achieves twice the result with half the effort. So what happens if we apply this concept to the construction of the data middle platform discussed above, which serves the business as a whole?

3. Build a data middle platform with the concept of the modern data stack

First, let us follow the logic of the modern data stack and break the data middle platform down into functional modules.

A data middle platform usually includes the following architectural layers:

  • Data integration layer: Integrates data from different data sources and performs the necessary data cleaning and conversion.
  • Data storage layer: Stores data in a unified data warehouse and provides efficient data query and storage capabilities.
  • Data development layer: Provides tools and platforms for data analysts and developers, enabling them to quickly develop and deploy data analysis applications and data products.
  • Data governance layer: Manages and maintains metadata, standards, quality, and so on, to ensure the correctness, consistency, and reliability of data.
  • Data service layer: Provides data services to business departments and data analysts within the enterprise and to external customers, making data an important part of enterprise value.

These modules follow a divide-and-conquer approach and together form a scalable, maintainable system. Data flows through each layer and finally becomes the high-value, reusable resource the enterprise needs. Each step, or group of steps, in this process can be handled by an independent tool or product, which raises the question of how an enterprise should select its tools.

The following lists some common solutions or tools for each stage:

  • Data integration: Fivetran/Airbyte/Tapdata
  • Data storage: Hive/MongoDB/Doris
  • Data development: DBT/Tapdata
  • Data governance: Atlan/Informatica

To avoid the risk, common in traditional middle platform construction, of missing the expected goals because of an excessive one-time investment, enterprises can take an iterative approach and build the data middle platform step by step.

A preliminary architecture for the data middle platform is first determined according to actual needs, after weighing factors such as enterprise size, business complexity, number of systems, number of supported business scenarios, business value, budget, and human resources. Then, following this plan, one or more key modules are selected for construction, testing, and optimization, and on that basis the infrastructure and data assets of the data middle platform are built up in stages. Besides effectively reducing the risk that returns fall short of the investment, iterative progress lets the enterprise accumulate experience and knowledge throughout the construction process, which provides more reliable support for future data analysis and business innovation.
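One simple way to picture the staged selection of modules is a greedy plan that, in each phase, picks the modules with the best estimated value per unit of cost until the phase budget is used up. The Python sketch below is only that: a toy model. The cost and value scores are invented for illustration, dependencies between modules are ignored, and a real plan would weigh all the factors listed above rather than two numbers.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Module:
    name: str
    cost: int   # estimated cost, illustrative units
    value: int  # estimated business value, illustrative score


def plan_phase(candidates: List[Module], budget: int) -> List[Module]:
    """Greedy pick: highest value per unit cost first, within the budget."""
    chosen: List[Module] = []
    remaining = budget
    ranked = sorted(candidates, key=lambda m: m.value / m.cost, reverse=True)
    for module in ranked:
        if module.cost <= remaining:
            chosen.append(module)
            remaining -= module.cost
    return chosen


def plan_iterations(modules: List[Module], budget_per_phase: int) -> List[List[Module]]:
    """Split all modules into successive phases under a per-phase budget."""
    pending = list(modules)
    phases: List[List[Module]] = []
    while pending:
        phase = plan_phase(pending, budget_per_phase)
        if not phase:
            raise ValueError("a module exceeds the per-phase budget")
        phases.append(phase)
        pending = [m for m in pending if m not in phase]
    return phases


# Made-up scores for the five layers of a data middle platform.
modules = [
    Module("integration", cost=3, value=9),
    Module("storage", cost=3, value=6),
    Module("development", cost=4, value=5),
    Module("governance", cost=4, value=4),
    Module("service", cost=4, value=6),
]

phases = plan_iterations(modules, budget_per_phase=8)
phase_names = [[m.name for m in phase] for phase in phases]
```

With these invented numbers the first phase contains integration and storage, which matches the idea of starting with the step that connects data and loads it into the warehouse; different scores would of course give a different plan.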

Let's look at how Tapdata can be used to complete the "first step", which is often the most critical part of building a data platform and the one with the most visible results: connecting the data and loading it into the warehouse.

4. Tapdata: The perfect integration of service concept and modern technology stack

Tapdata LDP (Live Data Platform) is a product-level realization of this combination of the data middle platform's service concept with the modern technology stack model. As a data-as-a-service platform with real-time data replication capabilities, Tapdata quickly connects enterprise data silos without code, integrates the data into a central data platform in real time, forms reusable standard data models, and supplies multiple downstream interactive applications with data that is always fresh.

Tapdata addresses the first step in every data scenario: data integration. Its biggest difference from traditional data integration is that it also provides a high-speed data caching layer.

With this caching layer, Tapdata upgrades the traditional data integration architecture to a data service architecture, which has the following advantages:

  • Reusable: Data is integrated once and reused many times, which greatly reduces labor costs and improves the efficiency of data integration
  • Based on high-performance distributed MongoDB, it can serve high-performance queries directly and seamlessly extend the query capabilities of existing relational databases
  • Flexible model storage makes it easier to integrate data of the same type but with different structures from different data sources
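The "integrate once, reuse many times" pattern behind these points can be sketched with a plain in-memory dictionary standing in for the caching layer. This Python sketch does not use Tapdata or MongoDB and does not reflect their APIs; `ServingCache`, its methods, and the sample rows are hypothetical and only illustrate the shape of the idea: differently structured rows from several sources are merged into one document per key, and every downstream reader is served from that merged copy.

```python
from typing import Dict, List, Optional


class ServingCache:
    """In-memory stand-in for a document-style caching layer (hypothetical)."""

    def __init__(self) -> None:
        self._docs: Dict[str, dict] = {}
        self.source_reads = 0  # how many times a source system was pulled

    def sync(self, source_rows: List[dict], key: str) -> None:
        """Integrate once: merge rows of differing shape under one key."""
        self.source_reads += 1
        for row in source_rows:
            doc = self._docs.setdefault(str(row[key]), {})
            doc.update(row)

    def get(self, key_value: str) -> Optional[dict]:
        """Serve many: reads hit the cache, not the source systems."""
        return self._docs.get(key_value)


# Two sources describe the same customer with different structures.
crm_rows = [{"customer_id": 7, "name": "Lin"}]
erp_rows = [{"customer_id": 7, "credit_limit": 5000, "region": "East"}]

cache = ServingCache()
cache.sync(crm_rows, key="customer_id")
cache.sync(erp_rows, key="customer_id")

# Three downstream consumers read the same merged document.
for_storefront = cache.get("7")
for_support = cache.get("7")
for_reporting = cache.get("7")
```

However many consumers are added, `source_reads` stays at two: the source systems are touched once each, and all later reads are answered from the merged document.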

5. Connect a data silo once, serve N scenarios

Traditional large-scale middle platforms often fail to take root in the enterprise that adopts them. By comparison, a data service platform that follows the modern data stack concept and is more flexible, more efficient, more economical, and closer to real time is exactly what we have been looking for: the product that remains once the dross is discarded and the essence is kept. It can help enterprises achieve data sharing and exchange quickly, use data more efficiently, and better serve business development.

It is worth noting that when building their own data service platform, enterprises should choose platform tools that have been thoroughly validated, so that the platform is secure and stable. Enterprises also need to analyze their own business requirements in depth, select suitable data service modules, and push the service-oriented construction to land quickly. For enterprises of different sizes, with different needs and budgets, Tapdata also provides customized product plus consulting solutions.

Want to learn more about the technical architecture of the data service platform and see more industry cases? Join the live event on May 10: how to successfully build a data service platform that "connects a data silo once and serves N scenarios".

At the event, Tapdata will also release its latest cloud data service platform. Say goodbye to data middle platforms costing millions: for tens of thousands of yuan, covering product plus consulting, it helps you build an agile, enterprise-grade digital foundation. For details, look out for the special offers at our launch event.


Origin blog.csdn.net/weixin_58202160/article/details/130288100