Supporting the R&D Behind the Refactoring of the CSDN Personalized Recommendation System

For a content product, the recommendation system plays a pivotal role on the data side. Three aspects of the data determine the quality of such a product:

  • How good the data itself is
  • How well the data matches user needs
  • How well the data is structured and layered

What we usually call "recommendations" actually covers two different types:

  • Related recommendations below the content detail page
  • The recommendation feeds and other information streams in the product's main views, which belong to the [Personalized Recommendation] section

This article introduces our continuous improvement of the recommendation and information feeds in CSDN's main views. Across successive waves of technology change, this part of the system is prone to the following problems:

  • It is not a high priority, so the whole system falls into disrepair
  • The data links the recommendation system depends on break, leaving the data in an unhealthy state
  • After several handovers of R&D staff, newly added code contradicts and conflicts with the code left by predecessors
  • Most worryingly, some strategies are simply wrong, yet nobody knows or fixes them
  • There is no up-to-date architecture diagram or global link-information map

This article focuses on five important engineering guarantees in the governance of a personalized recommendation system:

  • The big map: a continuously maintained, up-to-date architecture and global link-information map
  • Tool building: supporting the necessary data-parsing and unconventional debugging tools
  • Data governance: continuously sorting out and simplifying the work on the data side
  • Keep releasing: resolutely applying the SMART principle, breaking work into tasks that can be shipped, and pushing them live
  • Emphasis on testing: in-depth testing

The big map

First, my colleague's article "Design and Evolution of the CSDN Personalized Recommendation System" [1] covered a great deal of continuous refactoring at the service-code level. That work essentially means patiently sorting out, refactoring, iterating on, and releasing every part of the system.

From the perspective of system governance, directly refactoring the service code is one of the most important links, but not the whole picture, because such a system requires understanding several parts:

  • The service interface part: how downstream systems use its interfaces, and what further processing the recommendation service's interfaces go through before downstream can connect them to the application layer
  • The service policy part: the recommendation service itself (as covered in the article above)
  • The service data part: where does the HBase data in that article's architecture diagram actually come from?

Therefore, from the perspective of system governance, the structure looks roughly like this:

[Figure: system-governance view of the overall architecture]

In the process of system governance, it is necessary to fully sort out how three links work:

  1. How the aggregation interface part works
  2. How the personalized recommendation and policy configuration part works
  3. How the data source part works, and how its configurations relate to one another (including the scheduled computing services, the dozens of computing tasks in the data warehouse, and the streaming computing parts)

Without full-link information, we are like blind men feeling the elephant. This is systems engineering, and it demands serious attention to engineering and data quality.

In overall system governance, our engineering practice is to:

  • Maintain a continuously updated document that traces, end to end, how the personalized push data pipelines flow (a sketch of one possible record format follows this list)
    • Sort out the full-link data flow in a layered, tabular form
    • Drive the document by data flow and data source
    • Keep it jointly updated by service R&D, testing, and data R&D
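
The article does not show the document itself; the following is a minimal, purely illustrative sketch (every layer, field, owner, and task name below is made up) of how one row of such a layered link table could be kept as a structured record rather than free-form text:

```python
# Illustrative sketch only: keeping the full-link data-flow document as structured
# records so it stays easy to scan and to check. All names below are hypothetical.
from dataclasses import dataclass

@dataclass
class LinkEntry:
    layer: str    # which layer of the pipeline this hop belongs to
    source: str   # where the data comes from
    sink: str     # where the data goes
    owner: str    # who maintains this hop (service R&D / data R&D / test)
    trigger: str  # how it is driven: scheduled, streaming, or on demand

LINK_MAP = [
    LinkEntry("data warehouse", "user_behavior_log", "user_profile_table", "data R&D", "scheduled"),
    LinkEntry("streaming",      "click_stream",      "realtime_features",  "data R&D", "streaming"),
    LinkEntry("policy service", "realtime_features", "recommend_response", "service R&D", "on demand"),
]

for entry in LINK_MAP:
    print(f"[{entry.layer}] {entry.source} -> {entry.sink} ({entry.owner}, {entry.trigger})")
```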

Tool building

We all know that in engineering, keeping fully separated development/test/production environments is essential for healthy project iteration. Yet we often encounter systems that, for various historical reasons, lack this separation, which creates plenty of headaches for developing, testing, and validating functionality.

But as engineers we have to solve problems, and sometimes we cannot wait for the ideal environment to appear (that is a matter of cost). So we supported R&D in spending time building online diagnostic tools. The system originally had some simple capabilities for fetching debugging information from its online interfaces, but that was still not enough for an initial diagnosis of the harder problems. This is the area commonly referred to as "observability".

My colleagues put in the work to build a fairly complete and easy-to-use online inspect interface for system diagnostics, which made later problem location and resolution far easier; a rough sketch of the idea appears below.
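
The article does not describe how the inspect interface is implemented. The following is a minimal, purely illustrative sketch, assuming a Python/Flask service with made-up stage functions: the idea is simply to return a per-stage trace of a recommendation request alongside the result, which is the kind of observability payload such an interface provides.

```python
# Illustrative sketch only: a debug "inspect" endpoint that returns the intermediate
# state of a recommendation request alongside the final result. Flask, the route,
# and the stage functions below are assumptions, not CSDN's actual code.
from flask import Flask, jsonify, request

app = Flask(__name__)

def recall_candidates(user_id, trace):
    # Stand-in recall stage: record which sources fired and how many items they returned.
    candidates = [{"item_id": i, "source": "hot"} for i in range(5)]
    trace.append({"stage": "recall", "sources": ["hot"], "count": len(candidates)})
    return candidates

def rank_candidates(user_id, candidates, trace):
    # Stand-in ranking stage: record how many items were scored.
    ranked = sorted(candidates, key=lambda c: c["item_id"], reverse=True)
    trace.append({"stage": "rank", "scored": len(ranked)})
    return ranked

@app.route("/inspect/recommend")
def inspect_recommend():
    user_id = request.args.get("user_id", "")
    trace = []  # per-stage diagnostic records: the extra observability payload
    candidates = recall_candidates(user_id, trace)
    ranked = rank_candidates(user_id, candidates, trace)
    return jsonify({"user_id": user_id, "result": ranked, "trace": trace})

if __name__ == "__main__":
    app.run(port=8080)
```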

Second, the recommendation system's policy configuration is a pile of composite configurations defined in JSON. The code parses these configurations to dynamically build a policy tree, which is essentially a directed acyclic graph (DAG). The difficulty lies in the mapping between the configuration and the code: it is hard to locate and diagnose problems by looking at the configuration alone. Earlier engineers had built a visualization tool to present this DAG, but:

  • the tool had gone unmaintained for a long time and was out of date
  • once there were too many strategies, the visualization was no longer convenient to read
  • the mapping information between nodes and source code was not complete enough to map them directly

This problem was once an R&D bottleneck. The lesson is the same as in the previous section: in system governance, if there is no good tool for quickly pinpointing global information or the important local information, that gap becomes a bottleneck for problem solving.

So we wrote a policy-configuration analysis tool. It parses the flat configuration into compact, pipeline-style JSON that mirrors the actual nesting in the code, quickly renders the skeleton of the directed acyclic graph for locating and diagnosing problems, and also emits various intermediate formatted data. It is essentially the Unix style of composing text tools to solve a problem.
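
The article does not include the tool's code; the sketch below is purely illustrative and assumes a hypothetical flat configuration format in which each node declares its class and the nodes it depends on. The core idea is simply to re-nest the flat list into a compact JSON skeleton of the DAG:

```python
# Illustrative sketch only: turn a flat, JSON-defined policy configuration into a
# nested "skeleton" that mirrors the DAG of strategy nodes, so the structure can be
# read at a glance. The config format and field names are assumptions for the example.
import json

flat_config = [
    {"id": "recall_hot", "class": "HotRecall",   "depends_on": []},
    {"id": "recall_tag", "class": "TagRecall",   "depends_on": []},
    {"id": "merge",      "class": "MergePolicy", "depends_on": ["recall_hot", "recall_tag"]},
    {"id": "rank",       "class": "RankPolicy",  "depends_on": ["merge"]},
]

def build_skeleton(nodes):
    """Nest each node under the node that consumes it; shared nodes are simply repeated."""
    by_id = {n["id"]: {"id": n["id"], "class": n["class"], "inputs": []} for n in nodes}
    for n in nodes:
        for dep in n["depends_on"]:
            by_id[n["id"]]["inputs"].append(by_id[dep])
    # Sink nodes (nothing depends on them) are the roots of the readable skeleton.
    depended_on = {d for n in nodes for d in n["depends_on"]}
    return [by_id[n["id"]] for n in nodes if n["id"] not in depended_on]

print(json.dumps(build_skeleton(flat_config), indent=2))
```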

[Figure: a small example of a parsed policy configuration]
The figure shows a small example of policy configuration; the specific details are meaningful to the system's developers. Once such a composite link grows complicated, if it cannot be displayed conveniently and used to quickly locate the corresponding classes and configuration in the source code, the work becomes very laborious.

Our experience in this area: invest in tools that let system problems be diagnosed quickly and directly, which shortens the R&D cycle.

Data governance

In the process of system governance, we follow some principles to make the system gradually simpler and more reliable:

  • Simplify the work at the service-code layer as much as possible; that layer keeps the policy configuration and feedback mechanisms
  • Solve problems at the data layer whenever possible, because data-layer work is more maintainable and has a shorter development cycle; some issues can be fixed in as little as an hour

For this part of the work, our data-layer R&D colleagues will publish a dedicated technical analysis: "Data Governance of CSDN Personalized Recommendation" [2].

Throughout this process, we insist on continuously sorting out all of the data computation tasks:

  • Clean up old data that is no longer valid
  • Merge redundant strategies
  • Reduce special-case rules
  • Use new structured data features from the upper AI layer
  • Build a complete control API (see the sketch after this list)
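
The article does not describe that control API; as a purely illustrative sketch (the framework, routes, and task names are all assumptions), a minimal control surface over data computation tasks might only need to list tasks, trigger a run, and report status:

```python
# Illustrative sketch only: a minimal control API over data computation tasks
# (list, trigger, check status). Flask, the routes, and the in-memory registry
# are assumptions for the example, not the actual CSDN control API.
from datetime import datetime
from flask import Flask, jsonify

app = Flask(__name__)

# Hypothetical registry of data tasks; in practice this would come from the scheduler.
TASKS = {
    "hot_items_daily":  {"status": "idle", "last_run": None},
    "user_tags_hourly": {"status": "idle", "last_run": None},
}

@app.route("/tasks")
def list_tasks():
    return jsonify(TASKS)

@app.route("/tasks/<name>/run", methods=["POST"])
def run_task(name):
    task = TASKS.get(name)
    if task is None:
        return jsonify({"error": "unknown task"}), 404
    # A real system would submit the job to the scheduler; here we just record the call.
    task["status"] = "running"
    task["last_run"] = datetime.utcnow().isoformat()
    return jsonify({"name": name, **task})

if __name__ == "__main__":
    app.run(port=8081)
```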

The core idea this layer of work reflects is: be data-driven.

Keep releasing

One troublesome part of governing a large system is how to maintain an iteration rhythm. In engineering there is an important principle: "keep releasing is what really matters". If you over-think a piece of work and try to transform it too fast, the system often fails to converge in engineering terms. This is a problem most R&D teams run into.

We have adopted some strategies to steer around this kind of problem. Roughly, they are:

  • With the big map, we can judge clearly:
    • "This problem is best solved at the data layer"
    • "This problem is best solved at the policy layer"
    • "This problem is best solved at the aggregation layer"
  • With the tools, we can quickly locate problems in the DAG configuration:
    • "According to this link skeleton, the problem should be in this link"
    • "According to this link skeleton, we can add a feature at this layer to solve the problem"
    • "According to this link skeleton, these two layers really do conflict and should be merged"
  • With the data layer sorted out, we can quickly confirm:
    • "Find the source of the data quickly, check whether the source has a problem, and if it can be fixed at the source, don't touch anything in the middle"
    • "The data layer can take on more; once it provides more complete information, the policy layer can drop a large chunk of its original logic..."

For tasks that require cooperation across multiple links, the general approach is:

  • The data side prepares the data first
  • Once the data layer is ready, the policy layer follows up with its improvements
  • After testing verifies the policy-layer interface, it goes live iteratively through API releases
  • Once the policy-layer API is ready, the transformation of the aggregation-layer interface is pushed forward...

Releasing in multiple stages avoids the various drawbacks of delivering one end-to-end change all at once.

Emphasis on testing

One critical factor: our testers are familiar with how every part of the system works. This lets them carry out deep, detailed white-box testing during the iteration of each link, which is crucial for governing a system like this. For this part of the work, our testers will also publish a related technical blog: "CSDN Personalized Recommendation System: Negative Feedback Testing" [3].

Summary

A well-structured data system fosters a well-structured ecosystem, so that higher-level goals are not built on a pile of metrics produced by a system that was never working correctly.

References

[1] "Design and Evolution of the CSDN Personalized Recommendation System"
[2] "Data Governance of CSDN Personalized Recommendation"
[3] "CSDN Personalized Recommendation System: Negative Feedback Testing"

–end–
