The Data Bridge of the Advertising Business System - "Log Center - Exposure Data Transfer and Settlement"


Exposure Data Transfer and Settlement

Exposure data is a key concern of the ADX: its data flow is an important bridge for communication and settlement between parties, and it directly involves revenue.
Exposure data can be divided by where it is produced into interface exposure, real exposure, client-load exposure, and several other types. Different exposure types correspond to different settlement forms in the advertising contract. Besides impressions, there are other billing forms such as conversions and clicks.

Interface exposure sits earliest in the production chain: it is the final set of ad candidates the server delivers for an ad slot. Next comes client-load exposure, the candidates the client actually loads from the server's response; finally real exposure, the ad candidates that actually appear in the user's field of vision. Exposure volume decreases in that same order.

Note: the exposure data discussed in this article is interface exposure.

Pipeline Architecture Facilitates High Availability

As introduced above, the granularity of exposure data is pv: one request record contains the exposure information of multiple ad slots. The data volume is positively correlated with traffic at the entry point and fluctuates with it. The data could instead be kept at ad-slot granularity, but that multiplies the data volume several times over, and correlating it back with pv-level data requires extra mapping/aggregation logic. Which granularity fits depends on the scenario.
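The pv-vs-slot granularity trade-off can be sketched as a simple flattening step. This is a minimal illustration, not the real schema: field names such as `pv_id`, `slots`, and `slot_id` are assumptions.

```python
# Sketch: one pv-level record carries exposures for several ad slots.
# Flattening it to slot granularity multiplies the record count by the
# number of slots per request, which is the scale cost mentioned above.

def explode_to_slot_level(pv_record):
    """Flatten a pv-granularity record into slot-granularity records."""
    return [
        {"pv_id": pv_record["pv_id"], "slot_id": slot["slot_id"], **slot.get("meta", {})}
        for slot in pv_record["slots"]
    ]

pv = {
    "pv_id": "pv-001",
    "slots": [
        {"slot_id": "banner_top", "meta": {"price": 12}},
        {"slot_id": "feed_3", "meta": {"price": 7}},
    ],
}
slot_rows = explode_to_slot_level(pv)
# One pv record becomes len(pv["slots"]) slot records.
```

Going the other way (slot back to pv) is the extra mapping/aggregation logic the paragraph refers to.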

We need to consider the maximum carrying capacity and scalability of the data transfer service to ensure high availability and high performance. Here we introduce an architectural pattern called the "pipeline architecture", which is well suited to the data-flow role in this scenario.

Pipeline architecture pattern diagram:

In this pattern, the data flow starts from middleware and ends at middleware, with the flow service sandwiched in between.

The middleware efficiently absorbs traffic scale and dynamic, uneven rises and falls, effectively isolating the downstream from that pressure.
The data flow service is fully decoupled from upstream and downstream services; its business function stays highly focused and lightweight. Provided no data is lost, it supports flexible, elastic deployment and function changes, giving strong scalability.
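The middleware-service-middleware shape can be modeled in a few lines. This is a deliberately minimal in-memory sketch: in production the queues would be a message broker such as Kafka, and the flow service would be a long-running consumer; the record fields here are illustrative.

```python
from queue import Queue

# Sketch of the pipeline pattern: data enters from upstream middleware
# (stand-in: a queue), passes through a stateless flow service, and exits
# to downstream middleware (another queue). The service touches no state
# outside the queues, which is what makes it cheap to redeploy or scale.

upstream, downstream = Queue(), Queue()

def flow_service(transform):
    """Drain the upstream queue, apply the transform, emit downstream."""
    while not upstream.empty():
        downstream.put(transform(upstream.get()))

for raw in ({"pv_id": "pv-1"}, {"pv_id": "pv-2"}):
    upstream.put(raw)

flow_service(lambda rec: {**rec, "stage": "exposure-transfer"})
results = [downstream.get() for _ in range(downstream.qsize())]
```

Because the service owns no durable state, replacing `transform` or running more instances of `flow_service` in parallel changes nothing for the upstream or downstream middleware.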

This pattern is widely used in data-streaming scenarios like this one.

Special Cache Design in the Streaming Link

Returning to the data transfer service: the exposure data is reorganized, split, and layered, abstracted into a "public fields - user fields - ad fields - extended fields..." structure, and formed into kv records keyed by pvId/uuid.

In this kv format, the key serves as the unique identifier of an exposure record for the subsequent settlement and data-labeling processes.
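The reorganization into the layered kv shape might look like the following. All field names (`ts`, `media`, `uuid`, `ad_id`, `slot_id`, `ext`) are assumptions for illustration; only the "public / user / ad / extended" layering and the pvId key come from the text.

```python
# Sketch: reorganize a raw exposure log record into the layered
# "public - user - ad - extended" structure, keyed by pvId so the key
# can act as the unique identifier for settlement and labeling.

def to_kv(raw):
    key = raw["pv_id"]  # unique id carried through settlement downstream
    value = {
        "public":   {"ts": raw["ts"], "media": raw["media"]},
        "user":     {"uuid": raw["uuid"]},
        "ad":       {"ad_id": raw["ad_id"], "slot_id": raw["slot_id"]},
        "extended": raw.get("ext", {}),   # optional pass-through fields
    }
    return key, value

k, v = to_kv({"pv_id": "pv-42", "ts": 1700000000, "media": "app-a",
              "uuid": "u-9", "ad_id": "ad-7", "slot_id": "feed_1"})
```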

In addition, since the exposure data flow carries the full ADX data, the data is cached here to provide data support for the "monitoring and reporting" service covered in the next part.

Two-Level Cache

The cache here has two levels: the first level is Redis, the second is HBase. [It can also have more layers, depending on the scenario.] Readers familiar with cache systems, especially in businesses such as ticketing and e-commerce, will appreciate the significance of cache tiering.

Here is a brief description based on the current scenario:

  • Performance is a hard requirement
    • Wherever caching appears, the goal is to improve data read performance. Under high-concurrency, large-scale traffic, the carrying capacity, read/write performance, and storage scale of traditional DB components such as MySQL/SQL Server clearly fall short.
  • Hot and cold data
    • In some scenarios there are data sets read and written at extremely high frequency, forming hot data; the rest is cold data. Given this characteristic, hot data is usually placed in the first cache level and cold data in the second, because the first level typically lives in memory [leaving the kernel page cache aside for now], and its processing speed makes that limited space very expensive.

Based on the factors above, hot and cold data are separated by time window: hot data is stored in Redis, cold data in HBase, both in kv format.
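The hot/cold split by time window can be sketched as a two-level read path. Dicts stand in for the Redis and HBase clients, and the one-hour hot window is an assumption for illustration, not a figure from the article.

```python
import time

# Sketch of the two-level cache: recent ("hot") data is written to the L1
# tier (Redis in the article) and looked up there first; everything is
# durably kept in the L2 tier (HBase), which serves cold reads.

HOT_WINDOW_SECONDS = 3600  # assumption: data newer than 1h counts as hot

redis_l1, hbase_l2 = {}, {}

def write(key, value, ts, now=None):
    now = now if now is not None else time.time()
    if now - ts < HOT_WINDOW_SECONDS:
        redis_l1[key] = value          # hot: keep in the memory-speed tier
    hbase_l2[key] = value              # always land in the durable L2 tier

def read(key):
    if key in redis_l1:                # L1 hit: memory-speed lookup
        return redis_l1[key]
    return hbase_l2.get(key)           # L1 miss: fall through to L2

write("pv-hot", {"ad_id": "a1"}, ts=time.time())
write("pv-cold", {"ad_id": "a2"}, ts=time.time() - 7200)
```

In a real deployment the hot tier would additionally use Redis key TTLs to expire entries as they age out of the hot window.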

NoSQL Cache Component Selection

Let's answer a common question about the selection of cache components here.

The second-level cache is often a traditional component such as MySQL, but here it is HBase. What is the difference?

Readers who ask this mostly work on conventional businesses built around structured data, so it is not surprising that they have not had to think deeply about matching storage engines to data structures.

The core data flow of the ADX system is stream processing: it must tolerate failures while guaranteeing both accuracy and reliability, which differs considerably from conventional business data processing.
Another point is that in this scenario the cached data is in kv format, which is NoSQL-shaped, so a relational database is not a suitable carrier.

HBase is an open-source, distributed, multi-versioned, column-oriented NoSQL database. [For a detailed introduction to HBase, see the follow-up post/public account; we will not focus on it here.]

  • Strong compression ratio
    • Columnar storage achieves a high compression ratio, fitting very large amounts of data into limited space. Other storage engines with strong compression exist as well, such as LevelDB and Prometheus's TSDB.
  • Considerable throughput
    • Relying on Hadoop's distributed file system HDFS as the underlying storage, HBase provides random, real-time read/write access to massive tables with billions of rows and millions of columns.

Finally, the exposure data is split by its settlement-type field and flows to the different settlement services for counting.
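The fan-out by settlement type can be sketched as a simple routing step. The type names (CPM/CPC/CPA) follow the payment forms mentioned earlier (impressions, clicks, conversions); the field name `settle_type` and the bucket targets are stand-ins.

```python
from collections import defaultdict

# Sketch: group exposure records by billing type so each bucket can be
# handed to the corresponding settlement service for counting.

def route_by_settlement(records):
    buckets = defaultdict(list)
    for rec in records:
        buckets[rec.get("settle_type", "UNKNOWN")].append(rec)
    return buckets

buckets = route_by_settlement([
    {"pv_id": "pv-1", "settle_type": "CPM"},
    {"pv_id": "pv-2", "settle_type": "CPC"},
    {"pv_id": "pv-3", "settle_type": "CPM"},
])
```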

Good architectural design and suitable technical component selection play a decisive role in enabling the business/team to deliver value quickly and continuously!

s2s Monitoring and Reporting

s2s monitoring and reporting means that ADX transfers ad exposure, interaction [click/play/download/close...], and win data to the DSP [advertiser side] through an interface. Advertisers perform data attribution and settlement based on this data, making it the core channel for explaining and synchronizing ad performance...
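The delivery side of such a channel is usually a sender with a bounded retry budget. The sketch below is entirely illustrative: the event fields, the retry count, and the transport are assumptions; a real s2s callback follows whatever interface contract the DSP defines, typically over HTTP.

```python
# Sketch of a bounded-retry s2s sender. `transport` stands in for an HTTP
# POST to the DSP's callback endpoint; it returns True on acceptance.

MAX_RETRIES = 3  # assumption: real budgets come from the DSP contract

def send_s2s(event, transport):
    """Try to deliver one event; True if it succeeds within the budget."""
    for _ in range(MAX_RETRIES):
        if transport(event):
            return True
    return False

attempts = []
def flaky_transport(event):
    attempts.append(event)
    return len(attempts) >= 2          # simulate: fails once, then succeeds

ok = send_s2s({"type": "win", "pv_id": "pv-1", "price": 120}, flaky_transport)
```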


See the follow-up article!

Recommended reading:
Advertisement, Recommendation, Search - Three Top Complex Businesses: "Advertising Business System Details"
The Past and Future of the Advertising Business System - "Message Center"
The Data Transfer Station of the Advertising Business System - "Log Center - Real-time Service Monitoring"
The Data Bridge of the Advertising Business System - "Log Center - Exposure Data Transfer and Settlement"
The Core Channel of the Advertising Business System - "Log Center - s2s Monitoring and Reporting"
The Auxiliary Decision-Making of the Advertising Business System - "AB Experimental Platform"
The Framework Precipitation of the Advertising Business System - "Data Consumption Service Framework"
The Smart Fuse of the Advertising Business System - "Smart Flow Control"
The Agile Delivery of the Advertising Business System - "Deployment Based on Docker Containers"
The Business Connection of the Advertising Business System - "PDB - Advertisement Delivery [Quantity and Price]"



Origin blog.csdn.net/qq_34417408/article/details/128643326