MaxCompute 2.0 helps ZhongAn Insurance grow rapidly

Abstract: At the Alibaba Cloud Big Data Computing Service (MaxCompute) special session of the 2017 Yunqi Conference, Wang Chaoqun, Data Director of ZhongAn Insurance, gave a talk on how MaxCompute helps ZhongAn Insurance. This article starts with the advantages of MaxCompute, then discusses the benefits that big data can bring to a company's operations, and finally focuses on the construction of ZhongAn Insurance's data platform, including task scheduling, metadata, and data quality monitoring.

Original address: http://click.aliyun.com/m/43993/


The following is a summary of the highlights:

As China's first Internet insurance company, ZhongAn Insurance has used MaxCompute as its computing platform since its founding.


Why choose MaxCompute?

When the company was founded, we had to choose between building our own platform and using MaxCompute. We weighed five aspects: robustness, interaction with application systems, scalability, data security, and cost.

Robustness: 7*24 service capability and recovery time after failures;

Interaction with application systems: the efficiency and cost of acquiring source data and delivering output data;

Scalability: whether computing power scales elastically when data grows exponentially;

Data security: protection against data anomalies and attacks, with multi-layer sandbox protection and a permission system;

Cost: a comparison of self-built costs with MaxCompute costs.


First, in 2013 there were not many computing platforms that offered a complete set of capabilities. MaxCompute had been incubated in Alibaba Finance's production systems and was released externally only after being proven there; it supported computing capacity at a scale of more than 5,000, which met our requirements for elasticity and scalability. Second, we trusted Alibaba Cloud's professional capabilities, and Alibaba Cloud's share of the cloud computing market in China is far ahead of its competitors. Finally, MaxCompute is more than a computing platform: it also supports analysis and mining tools and comes with ready-to-use development tools (DataWorks, Studio), which reduced our development costs during the initial data processing work.

What disruptions can big data bring to company operations?

The figure shows the development of the overall cloud computing and big data ecosystem. Domestic cloud computing is growing at more than 60% a year, and AWS adds a considerable number of new features. Cloud computing is getting closer to everyday life. Since the birth of Hadoop, the variety of products has increased greatly, and the ecosystem keeps getting bigger.

The significance of big data lies not only in its tools, platforms, and ecosystems, but also in its ability to empower people and scenarios, and to support ecosystem development through that empowerment. At Alibaba, tens of thousands of people work with MaxCompute every day. Empowerment creates new occupations, and the practitioners in turn feed back into big data and enrich its scenarios. Over ten years of development, the investment of people and resources has been paying off, capital has seen healthy returns and continues to invest heavily, and the data industry has formed a closed loop.


ZhongAn is a company focused on insurance. We connect and cooperate across ecosystems in a range of industry segments, including e-commerce, 3C, and automobiles. These products have linked us with a variety of ecosystem partners and have also increased our touchpoints with users. Through cooperation with more than 300 ecosystem partners, we have accumulated a large amount of user data and information. Ultimately, we hope that ZhongAn can not only serve these ecosystems, but also expand and strengthen its own open platform through the accumulation of data, customers, and brand.

By the end of 2016, we had served 492 million users and issued 7.2 billion insurance policies, providing China's new Internet generation with their first insurance policy. People under the age of 30 account for about 50% of these users, which indicates that ZhongAn Insurance represents this new way of life. This group has sufficient capacity to generate assets and a higher recognition and awareness of insurance, and it will be the main force of consumption in the future.

ZhongAn Insurance's Data Platform Construction

Every one of these numbers is the result of the efforts of all the company's employees. So what have we built on the MaxCompute data platform, and how does it support rapid business growth?

The data platform is divided into platform tools, data monitoring, and data services. The data itself is multi-source and heterogeneous, and its value lies in its liquidity and openness: data generates value only once it has been processed, quality-checked, and provided to users. Platform tools include MaxCompute, data synchronization, task scheduling, and compute and storage management; data monitoring includes the early warning system, metadata, data lineage, and data quality; data services include the data portal, self-service data retrieval, and service APIs.

Task Scheduling System


Task scheduling is essentially about driving the data processing workflow to completion and tracking its state. Data processing is a multi-step process, so how do we ensure that the steps run on the data in the correct order? We support periodic scheduling on daily, weekly, monthly, and other cycles, as well as group priorities, hourly tasks, and custom time schedules. The daily task volume exceeds 10,000.

Task scheduling forms a directed graph, in which each node can be seen to depend on many data sources. Red represents an error state, blue represents success, green represents running, and yellow represents an existing state. Because different tasks draw on many data sources, failures can be confusing: if a result is wrong, is the task itself at fault, or was the problem caused by the output of an upstream data source? So how do we help developers locate problems faster, reduce development costs, and provide consistent definitions? We solve this with metadata.
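The question of whether a failure belongs to the task itself or to an upstream source can be illustrated with a small graph walk. This is only a minimal sketch under stated assumptions, not ZhongAn's scheduler: the task names, the `Status` enum, and the `root_causes` helper are all hypothetical, and the workflow is reduced to a dictionary of upstream dependencies.

```python
from enum import Enum


class Status(Enum):
    SUCCESS = "success"
    FAILED = "failed"
    RUNNING = "running"


def root_causes(task, upstream, status):
    """Return the failed tasks furthest upstream of `task`.

    A failed task with no failed parents is treated as its own root cause.
    """
    causes = set()
    seen = set()

    def visit(node):
        if node in seen:
            return
        seen.add(node)
        failed_parents = [p for p in upstream.get(node, []) if status[p] is Status.FAILED]
        if not failed_parents:
            causes.add(node)
            return
        for parent in failed_parents:
            visit(parent)

    if status[task] is Status.FAILED:
        visit(task)
    return sorted(causes)


# Hypothetical workflow: two source syncs feed a policy table, which feeds a report.
upstream = {
    "sync_orders": [],
    "sync_users": [],
    "fact_policy": ["sync_orders", "sync_users"],
    "daily_report": ["fact_policy"],
}
status = {
    "sync_orders": Status.FAILED,
    "sync_users": Status.SUCCESS,
    "fact_policy": Status.FAILED,
    "daily_report": Status.FAILED,
}

print(root_causes("daily_report", upstream, status))  # ['sync_orders']
```

In this example the report failed only because an upstream synchronization failed, so the developer is pointed at `sync_orders` rather than at the report task.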

Metadata


Metadata work includes connecting data with data, which helps with model optimization and anomaly location, and connecting data with people, which helps with cost optimization. The metadata includes data dictionary information, lineage information, storage and output information, table owner information, and business metadata. It drives storage and computing optimization, which reduces the cost of using MaxCompute.

The left picture shows basic information about the data, along with its output information and lineage; the right picture shows where a table comes from and which downstream tables its output will affect. With this information, we connect data with data and data with people.
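The two lineage views described here, where a table comes from and which tables it affects, amount to walking the same graph in opposite directions. The sketch below assumes lineage is available as simple (source, output) pairs; the table names and the `build_lineage` and `reachable` helpers are hypothetical and stand in for whatever the metadata system actually stores.

```python
from collections import defaultdict, deque

# Hypothetical lineage edges: (source table, table produced from it).
EDGES = [
    ("ods_orders", "dwd_policy"),
    ("ods_users", "dwd_policy"),
    ("dwd_policy", "dws_policy_daily"),
    ("dwd_policy", "ads_claims_report"),
    ("ods_clicks", "tmp_click_backup"),
]


def build_lineage(edges):
    """Index edges in both directions: table -> sources and table -> outputs."""
    sources, outputs = defaultdict(set), defaultdict(set)
    for src, dst in edges:
        sources[dst].add(src)
        outputs[src].add(dst)
    return sources, outputs


def reachable(start, graph):
    """Breadth-first walk returning every table reachable from `start`."""
    found, queue = set(), deque([start])
    while queue:
        for nxt in graph.get(queue.popleft(), ()):
            if nxt not in found:
                found.add(nxt)
                queue.append(nxt)
    return sorted(found)


sources, outputs = build_lineage(EDGES)
print(reachable("ods_orders", outputs))         # tables affected downstream
print(reachable("ads_claims_report", sources))  # tables it is derived from
```

The downstream walk answers "what will this change affect?", while the upstream walk answers "where could this wrong number have come from?".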

After storage optimization, cost was reduced by 30%. Optimizing storage and computation reduces invalid storage and improves computing efficiency.

Data quality monitoring


Data quality monitoring is embedded in the execution of the task itself as a cross-cutting step: the task runs its own processing, determines its own state, and verifies the accuracy of its data against rules and templates. Only data that is OK is used downstream, so data pollution is avoided and errors surface on their own rather than being discovered downstream. The approach relies on MaxCompute's statistical item collection function. Rules are rules on statistical items, at both table and field level. Templates combine rules, periods, and statistical functions. This turns after-the-fact monitoring into in-process monitoring and supports user customization. It covers key tasks, with a coverage rate of 30%.
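The relationship between rules, templates, and the "only OK goes downstream" gate can be sketched as follows. This is an illustrative sketch only: it computes statistics in plain Python rather than through MaxCompute's statistical item collection, and the `Rule` class, the statistic functions, the template layout, and the column names are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Rule:
    """One statistical-item rule at table level (column=None) or field level."""
    name: str
    column: Optional[str]
    statistic: Callable[[list], float]
    is_ok: Callable[[float], bool]


def row_count(values):
    return float(len(values))


def null_ratio(values):
    return sum(v is None for v in values) / len(values) if values else 1.0


def check_output(rows, rules):
    """Evaluate every rule on a task's output; return (ok, failed rule names)."""
    failed = []
    for rule in rules:
        values = rows if rule.column is None else [r.get(rule.column) for r in rows]
        if not rule.is_ok(rule.statistic(values)):
            failed.append(rule.name)
    return (not failed, failed)


# Hypothetical template for a daily task: rules + period + statistical functions.
DAILY_POLICY_TEMPLATE = {
    "period": "daily",
    "rules": [
        Rule("table_not_empty", None, row_count, lambda n: n > 0),
        Rule("policy_id_not_null", "policy_id", null_ratio, lambda r: r == 0.0),
    ],
}

rows = [{"policy_id": "P1", "premium": 10.0}, {"policy_id": None, "premium": 5.0}]
ok, failed = check_output(rows, DAILY_POLICY_TEMPLATE["rules"])
print(ok, failed)  # False ['policy_id_not_null']
```

A scheduler using this kind of check would mark the task as failed when `ok` is false, so downstream tasks never read the polluted output.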

Data Services and Security

What do we need to consider when data is consumed? Data needs to be open and to circulate, so what should we be careful about in openness and circulation? Data breaches and security incidents can both be disastrous for a company.

Technically, we assign different access levels based on ACLs and role management. We control permissions at the table and field level, mask sensitive information, and require an encryption and approval process for confidential information. Every role needs data, so we combine technical controls with process controls. Security control is the foundation of openness, and process management is the key to it. In this way we strike a balance between openness and security.
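Table-level and field-level control combined with masking can be sketched in a few lines. The example below is a simplified illustration, not MaxCompute's ACL mechanism: the role names, the `GRANTS` and `SENSITIVE` mappings, and the `mask` and `read_row` helpers are hypothetical, and the approval process for confidential information is not modeled.

```python
# Hypothetical role grants: role -> table -> columns the role may read.
GRANTS = {
    "analyst": {"policy": {"policy_id", "premium", "phone"}},
    "auditor": {"policy": {"policy_id", "premium", "phone", "id_number"}},
}

# Hypothetical sensitivity labels: columns masked unless the role is cleared.
SENSITIVE = {"policy": {"phone", "id_number"}}
UNMASKED_ROLES = {"auditor"}


def mask(value):
    """Keep the last four characters and replace the rest with '*'."""
    text = str(value)
    return "*" * max(len(text) - 4, 0) + text[-4:]


def read_row(role, table, row):
    """Return only the granted columns, masking sensitive ones for uncleared roles."""
    allowed = GRANTS.get(role, {}).get(table)
    if allowed is None:
        raise PermissionError(f"role {role!r} has no access to table {table!r}")
    result = {}
    for column, value in row.items():
        if column not in allowed:
            continue  # field-level control: ungranted columns are dropped
        if column in SENSITIVE.get(table, set()) and role not in UNMASKED_ROLES:
            value = mask(value)
        result[column] = value
    return result


row = {"policy_id": "P1", "premium": 10.0, "phone": "13800001234", "id_number": "X123456789"}
print(read_row("analyst", "policy", row))
```

Here the analyst sees the phone number only in masked form and never sees `id_number`, while a role with no grant on the table is rejected outright.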

Building a data platform means moving through three stages (available, easy to use, and applicable), and the system has to be upgraded through many iterations. Data is a service that must meet users' different data needs, and data is also infrastructure: every company faces the task of building and using a data platform.

The richness of the MaxCompute ecosystem, its shared resources and tools, and its in-depth support for mining algorithms are powerful enough to meet our needs, so we can spend more time engaging with users and creating value for them. The cost of MaxCompute is also gradually decreasing. In the future, I hope MaxCompute will support more modes, including UDF and resource libraries such as IP libraries, Python algorithm packages for mining, and artificial intelligence platform support.
