Big data in practice: how Weibo recommends things to half a billion people


Weibo is a social networking platform for sharing and broadcasting short, real-time information through a follow mechanism. Users subscribe to content by following other accounts, so in this setting a recommendation system integrates well with the subscription and distribution system, and each reinforces the other. Weibo has two core functions: building relationships between users, and distributing content. Weibo recommendation has always worked to optimize both in order to support Weibo's growth, as shown in Figure 1:

 


Figure 1: The mission of Weibo recommendation

As Weibo developed, the direction of its recommendation system changed and was re-established more than once. As the business kept changing, the product goals, architecture, and algorithms changed with it. This article describes how the recommendation architecture evolved through that process, presenting the full development context from the perspectives of product goals, algorithmic needs, and technical development. We also hope to use this opportunity to explore the relationship between business and technology together with readers.

To make the architecture evolution easier to follow, we first need to describe the stages that make up the recommendation process. These are not specific to Weibo: the basic recommendation process is the same across the industry. As shown in Figure 2, recommendation addresses the relationship between users and items: it recommends to each user the items he or she is interested in. A recommended item therefore passes through candidate generation, ranking, strategy, and display. Feedback and evaluation then change the candidates in turn, forming a complete loop.

Figure 2: The recommendation pipeline
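
The loop in Figure 2 can be sketched in a few lines of Python. This is a minimal illustration, not Weibo's implementation: the `Recommender` class, its field names, and the click-count feedback are all assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Recommender:
    """Toy loop: candidates -> ranking -> strategy -> display -> feedback."""
    generate: Callable[[str], List[str]]      # candidate stage
    score: Callable[[str, str], float]        # ranking stage
    rules: List[Callable[[str, List[str]], List[str]]] = field(default_factory=list)
    clicks: Dict[str, int] = field(default_factory=dict)   # feedback store

    def recommend(self, user: str, k: int = 3) -> List[str]:
        items = self.generate(user)
        # sorted() is stable, so items with equal scores keep candidate order.
        items = sorted(items, key=lambda it: self.score(user, it), reverse=True)
        for rule in self.rules:               # strategy stage: filter or rearrange
            items = rule(user, items)
        return items[:k]                      # display stage

    def feedback(self, item: str) -> None:
        """Record one click; the score function below reads these counts."""
        self.clicks[item] = self.clicks.get(item, 0) + 1


def make_demo() -> Recommender:
    """Hypothetical wiring: fixed candidates, click-count ranking, one filter rule."""
    rec = Recommender(generate=lambda user: ["a", "b", "c", "d"],
                      score=lambda user, item: 0.0)
    rec.score = lambda user, item: float(rec.clicks.get(item, 0))
    rec.rules.append(lambda user, items: [it for it in items if it != "d"])
    return rec
```

Calling `feedback` on an item changes what the next `recommend` call returns, which is the point of closing the loop: evaluation and feedback alter the candidates that are shown later.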

Building on this overall process, the Weibo recommendation architecture went through the three stages shown in Figure 3:

An architecture is usually a product of its team and its business environment. Because it is built to solve the problems of that environment, its implementation takes on pronounced characteristics and produces targeted results. For each of the three stages, this article therefore covers three aspects: the environment, the composition and characteristics of the architecture, and the results of putting it into practice.


1. Standalone 1.0

1.1 Environment

The environmental factors that shape an architecture can be divided into internal and external factors. Internal factors mainly concern the team and its members. External factors come from outside the department: the company as a whole or the industry as a whole.

The Weibo recommendation 1.0 period ran from July 2011 to around February 2013, and its main goal was to deliver the business requirements of the time. By "standalone" we mean that each project had its own architecture implementing a complete business process. The architectures were independent of one another, and even the technology stacks differed. The internal factors behind this were:

The team was new and so were most of its members. They had not worked together much and lacked end-to-end experience in the recommendation field.

Team members each had some understanding of recommendation architecture, but there was no consensus on what architecture fit Weibo's situation at the time.

The decisive factors, of course, were external. The internal factors could have been addressed through better coordination and gradual evolution. The external factors at the time included:

There were many project requirements: a team of about five people developed an average of three to five projects in parallel. The most important reason was that Weibo itself was growing rapidly and many parts of the product needed recommendation support. Project cycles were also very short and schedules tight, so there was little time for careful organization and abstraction. Typical products included micro-bar, micro-groups, micro-print, micro-topic, user recommendation, and content ranking.

The team played a supporting role. Most requirements came from outside the team, and because each external team took its product in a different direction, the team struggled to keep up with demand.

The industry had not settled on a direction for recommendation architecture either. Everyone was exploring architectural ideas that suited their own development.

For these reasons, we usually took projects on one at a time and built each pipeline from our own understanding, with the technology stack we knew best, producing one independent architecture after another.

1.2 Composition and characteristics of architecture

Given that the reasons above produced a set of independent systems, you might feel there is no need to describe how the architecture was composed. That would be a mistake: the foundations of the later layered and platform architectures came from precisely this stage. Without the pitfalls the team ran into and summarized here, the later evolution suited to our conditions would not have happened. So we need to analyze the composition and characteristics of the 1.0 architecture.

1) Technical objectives

With reference to Figure 2: because the main goal of Weibo recommendation 1.0 was to deliver the business, there was no feedback or evaluation system, and ranking was replaced by strategy. The work concentrated on candidates, strategy, and display, so the recommendation process was reduced to a simple form: candidates → strategy → display.

2) Architecture composition

We tried to capture the architecture of each project in Figure 4. In practice, each project lead chose apache + mod_python for the web service and redis for storage. Some projects involved complex computation, which gave rise to the C/C++ service framework woo. Projects with special storage requirements led us to develop a series of in-house databases, such as mapdb for static data in the early days and keylistdb for key-list data. Actual deployments were more casual than the figure suggests: a project would deploy a few servers to serve HTTP requests, install redis on a few more machines for data support, agree on rules with the data sources and the business side, and transfer data with rsync. Most of the strategy logic was implemented in Python.

The figure shows the main technology stack:

Web services: apache + mod_python, later moved to mod_wsgi, which has a more mature community. Python was chosen for web development mainly because it was already used for everyday data processing, is quick to develop in, and has a gentle learning curve.

Computing services: C/C++, which grew into the internal framework woo.

db: redis, mapdb, keylistdb, and the like, falling into two kinds of storage: redis and in-house.

Data sources: rsync file transfer, plus firehose for Weibo content [firehose is a data queue used internally at Weibo].

Figure 4: Weibo recommendation 1.0 architecture
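
A standalone 1.0-style project can be pictured as one small WSGI application (the interface that mod_wsgi serves) in front of a key-value store. The sketch below is an assumption-laden illustration, not code from any of the original projects: the `uid` query parameter, the `CANDIDATES` dictionary standing in for redis, and the `BLOCKED` rule are all hypothetical.

```python
import json
from typing import Callable, Dict, Iterable, List
from urllib.parse import parse_qs

# Stand-in for a redis instance filled from an rsync'ed data file (made-up data).
CANDIDATES: Dict[str, List[str]] = {"u1": ["topic:1", "topic:2", "topic:3"]}
# Hypothetical strategy rule: items that must never be shown.
BLOCKED = {"topic:2"}


def application(environ: dict, start_response: Callable) -> Iterable[bytes]:
    """WSGI entry point: candidates -> strategy -> display, no ranking or feedback."""
    query = parse_qs(environ.get("QUERY_STRING", ""))
    uid = query.get("uid", [""])[0]
    # Strategy stage: filter the stored candidates with a simple rule.
    items = [it for it in CANDIDATES.get(uid, []) if it not in BLOCKED]
    body = json.dumps({"uid": uid, "items": items}).encode("utf-8")
    start_response("200 OK", [("Content-Type", "application/json"),
                              ("Content-Length", str(len(body)))])
    return [body]
```

Everything lives in one file, which is why such projects were quick to ship and also why nothing could be shared or tested at the module level once there were twenty of them.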

3) Architecture features

We describe the features as advantages and disadvantages. The advantages are:

Simple and easy to implement, with no need for additional infrastructure support

Business features can be delivered quickly

Projects can proceed in parallel without affecting one another


The disadvantages are:

The recommendation process is incomplete: important parts such as feedback and evaluation are missing, and data is not handled in a uniform way

There is no way for algorithms to plug in, so deeper recommendation work is difficult

Professional operations and maintenance is almost impossible

QA can only test at the functional level; module-level testing is almost impossible because the systems are too scattered

Teamwork is difficult, and projects are hard to decompose

1.3 Achievements

Despite its many shortcomings, this stage laid the foundation for the later architecture optimization. Its results were:

During Weibo's period of rapid growth it met the demand for recommendation support, completing more than twenty independent projects in total.

The basic framework woo was born; the later efficient internal computation framework derives from it.

The static store mapdb was born and became the prototype of Weibo recommendation's later static storage.

The recurring needs of the web application layer were summarized into a general recommendation application framework.

2. Layered 2.0

The previous section introduced standalone 1.0. On the road of architecture development we then reached a fork: on one side was the popular LAMP architecture, on the other the CELL architecture typical of advertising and search. The LAMP architecture separates data from strategy and uses a scripting language as the primary language for business development, which lets projects be built and iterated quickly. The CELL architecture emphasizes processing the flow locally, couples data and services tightly, and tends to come with self-developed services and databases, which suits products that need high performance. We chose an architecture compatible with both that leans towards the business. Why? Let us look at the circumstances at the time.

2.1 Environment

The Weibo recommendation 2.0 period ran from March 2013 to the end of 2014. The internal factors this time were:

1) The team members had worked together for a long time, knew each other well, and had reached a certain consensus on technology choices.

2) The team's products became more focused, consolidated into three types of recommendation: content, user, and vertical. The scenarios also became focused: the feed stream, the post body page, and the right-hand side of the PC home page. This focus favored a unified architecture and also bought time for the technical work.

The external factors were:

1) The company gave recommendation a relatively clear positioning: improve the efficiency of relationship building and content distribution, and lay the technical groundwork for exploring recommendation-style advertising, scenario intervention, and user experience.

2) In the recommendation field, other companies had already published their architectures, which gave Weibo recommendation very good guidance.

2.2 Composition and characteristics of architecture

While delivering the core business, the team kept evolving its tools and frameworks, preparing to build towards the 2.0 goals.

1) Technical objectives

Unlike 1.0, the technical goal of 2.0 was not only to deliver business requirements but also to complete the recommendation process. We needed to address the following:

First, implement the complete recommendation process, with the architecture covering candidates, ranking, strategy, display, feedback, and evaluation.

Put data first by extracting a data architecture: data comparison, so that results are driven by data; data channels, to carry feedback; and data persistence, to meet business requirements.

Provide a convenient way for algorithms to plug in.

Support rapid, iterative business development while also supporting efficient computation.

2) Architecture composition

The Weibo recommendation 2.0 architecture is shown in Figure 5. It is no longer a set of separate systems, and developers no longer use different technologies to solve similar problems. The architecture consists of the following parts:

Application layer: mainly responsible for the strategy and display stages of recommendation. It exploits the strengths of a scripting language to respond to iterative requirements. Most recommendation results could be displayed directly after ranking, but this layer still has to apply front-end product strategies, filter out items, and reorder results, so technically it is IO-intensive. For technology selection, the early development on top of the original apache + mod_python stack had produced the framework common_recom_frame. The framework is aimed at secondary developers, who can build a recommendation business process on top of it. Its core idea is to abstract three interfaces, project, work, and data: a project is one recommendation project, a work is one of the recommendation methods within a project, and data manages access to downstream data. We also set two conventions: a unified recommendation interface, whether for user, content, or vertical business; and a database access layer that hides the differences between protocols, which greatly improves development efficiency. The birth of common_recom_frame largely met the product's demand for varied recommendation strategies and kept the technology ahead of the product.
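
The project / work / data split can be sketched as follows. This is only an illustration of the idea as described in the text, not the API of common_recom_frame: every class, method, and parameter name here (`Project`, `DataSource`, `DictSource`, `recommend`, `num`) is an assumption.

```python
from typing import Callable, Dict, List, Protocol


class DataSource(Protocol):
    """'data' interface: hides the protocol of the downstream store."""
    def get(self, key: str) -> List[str]: ...


class DictSource:
    """In-memory stand-in for a real store such as redis."""
    def __init__(self, rows: Dict[str, List[str]]) -> None:
        self.rows = rows

    def get(self, key: str) -> List[str]:
        return list(self.rows.get(key, []))


# 'work': one recommendation method inside a project.
Work = Callable[[str, DataSource], List[str]]


class Project:
    """'project': a named recommendation scenario that owns several works."""
    def __init__(self, name: str, data: DataSource) -> None:
        self.name, self.data = name, data
        self.works: Dict[str, Work] = {}

    def work(self, name: str) -> Callable[[Work], Work]:
        """Decorator that registers a work under the given name."""
        def register(fn: Work) -> Work:
            self.works[name] = fn
            return fn
        return register

    def recommend(self, work: str, uid: str, num: int = 10) -> Dict[str, object]:
        """Unified interface: same parameters and result shape for every project."""
        items = self.works[work](uid, self.data)[:num]
        return {"project": self.name, "work": work, "uid": uid, "items": items}


# Hypothetical usage: a user-recommendation project with one work.
users = Project("user_recom", DictSource({"u1": ["u7", "u9", "u3"]}))


@users.work("friends_of_friends")
def friends_of_friends(uid: str, data: DataSource) -> List[str]:
    return data.get(uid)
```

Because a work only sees the `DataSource` interface, swapping the in-memory store for a real client would not touch the strategy code, which is the benefit the second convention is after.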

Figure 5: Weibo recommendation 2.0 architecture

Computing layer: mainly responsible for ranking computation, which is CPU-bound. This layer is where algorithms plug in, and it supports iterating on algorithms and models. For technology selection, we kept the protocol of the original woo framework, an efficient internal communication framework developed in C/C++. We extended it considerably, borrowing the ideas of common_recom_frame described above to manage project, work, and data on top of woo and give secondary developers a more efficient tool. This tool is one of the team's open source projects: https://github.com/wbrecom/lab_common_so

Data layer: mainly responsible for the flow and storage of recommendation data. Its job is to solve the IN / OUT / STORE problem: IN is how data enters the system, OUT is how data is accessed, and STORE is how data is stored. While planning this layer we also analyzed the characteristics of Weibo recommendation data, which falls into two categories: static and dynamic. Static data is large-scale data that is updated in full at a low frequency. Dynamic data is incremental data that is updated continuously at a high frequency. Applying the static/dynamic distinction across IN / OUT / STORE produced the following agents and tools: RIN and r9-interface, redis and lushan, and tmproxy and gout. In more detail: RIN handles the intake of dynamic data. It receives data through a web service, uses ckestrel as the back-end queue, and is backed by a multi-service consumer cluster framework, so users only need to write their own business logic to start consuming dynamic data online quickly. r9-interface handles the intake of static data. A large share of static data comes from recommendation computation, and the r9-interface framework uses a Hadoop cluster for the static computation [MR, Hive SQL, and Spark jobs] and takes care of notification, management, and data loading. For storage, dynamic data makes extensive use of redis clusters and static data uses a lushan cluster. lushan is also one of the team's open source projects: https://github.com/wbrecom/lushan. tmproxy and gout solve the OUT problem: gout is a proxy middleware that handles recommendation requests needing static and dynamic data combined, and reduces the impact on the business of changes in the back-end data.
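
To make the static/dynamic distinction concrete, here is a minimal sketch of a read proxy that combines both kinds of data behind one `get` call, in the spirit of what the text says gout does. It is not gout's implementation: the `DataProxy` class, the two dictionaries standing in for the lushan and redis stores, and the merge policy (newest dynamic items first, then static items, duplicates dropped) are assumptions chosen for the example.

```python
from typing import Dict, List


class DataProxy:
    """Serves one key from a static (bulk-loaded) and a dynamic (incremental) store."""

    def __init__(self) -> None:
        self.static: Dict[str, List[str]] = {}    # stand-in for a static store
        self.dynamic: Dict[str, List[str]] = {}   # stand-in for a dynamic store

    def load_static(self, snapshot: Dict[str, List[str]]) -> None:
        """Static data: replace the whole data set at low frequency."""
        self.static = dict(snapshot)

    def push_dynamic(self, key: str, item: str) -> None:
        """Dynamic data: append one incremental update at high frequency."""
        self.dynamic.setdefault(key, []).append(item)

    def get(self, key: str) -> List[str]:
        """Newest dynamic items first, then static items, without duplicates."""
        newest_first = list(reversed(self.dynamic.get(key, [])))
        merged: List[str] = []
        for item in newest_first + self.static.get(key, []):
            if item not in merged:
                merged.append(item)
        return merged
```

Callers only see `get`, so a full reload of the static snapshot or a change in how dynamic updates arrive does not ripple into business code.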


Basic services: the basic services of the recommendation system are monitoring, evaluation, and alerting. Monitoring covers two kinds of data: performance and effectiveness. The evaluation system is mainly used for offline evaluation, so that a change has an expected effect before it goes online and fewer ineffective launches are made. Figure 6 shows the UI of the basic services.

Figure 6: UI of the basic services system
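
The article does not say which metrics the offline evaluation system computes. As one common example of the kind of check that can be run before a launch, the sketch below computes precision@k over logged interactions. The function name and the choice of metric are assumptions, not a description of the Weibo system.

```python
from typing import Iterable, List


def precision_at_k(recommended: List[str], relevant: Iterable[str], k: int) -> float:
    """Fraction of the top-k recommended items that the user actually engaged with."""
    if k <= 0:
        raise ValueError("k must be positive")
    hits = set(relevant)
    top_k = recommended[:k]
    # Divide by k (not len(top_k)) so short result lists are penalized.
    return sum(1 for item in top_k if item in hits) / k
```

Comparing this number for the current strategy and a proposed one on the same logged data gives a rough expectation of the effect before any traffic is exposed to the change.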

3) Features

The advantages are:

Supports the complete recommendation process and handles data in a unified, data-first way

Delivers business features quickly while still allowing the technology to keep deepening

Gives algorithms good support

The data-first idea makes full comparison possible, so recommendation results can keep improving

The layered system is easy to deploy and lets QA step in with testing

The disadvantages are:

It is still some distance from the core of recommendation and is not fully tailored to it

Recommendation strategies and algorithms are left entirely to developers, which works against generality

Algorithm training is not covered; it is only an online serving system and does not constitute a complete recommendation system

2.3 Achievements

Weibo recommendation 2.0 paid off well. Its results were:

1) Weibo's core recommendation businesses all run on this system: body page recommendation, trending user recommendation, trending content recommendation, user recommendation in each scenario, the fan economy, account recommendation, and other products

2) The basic framework lab_common_so was born and open sourced

3) The static storage cluster solution lushan was born and open sourced

4) The RUF framework was born, greatly improving business development productivity and making some contribution to the openresty community

3. Platform 3.0

The section on 2.0 noted an important problem: the architecture was still some distance from the core of recommendation and not fully tailored to it. We hoped to resolve this in recommendation 3.0. What problems did this shortcoming cause, and why did the recommendation architecture have to move forward again while still meeting business needs? Before presenting the design of the Weibo recommendation 3.0 platform, let us look at its environment.

3.1 Environment

The Weibo recommendation 3.0 period runs from the end of 2014 to the present. The internal factors are:

1) Recommendation products are no longer expanding and results matter more, so the focus has shifted from iterating on business features to iterating on effectiveness through technology.

2) When building new recommendation projects or iterating on existing ones, we noticed a lot of repeated work that the architecture did not solve, which led to redundant effort.

The external factors are:

1) The company has also shifted from business expansion to efficiency first, aiming to improve the user experience and the quality of content.

2) Weibo recommendation was still some distance behind the field in recommendation technology, and conditions were now right to catch up.

3.2 Composition and characteristics of architecture

The technical goals of 3.0 also reflect this environment:

1) Technical objectives

Unlike 2.0, covering the full recommendation process is no longer the goal of 3.0. The goals are:

Abstract the recommendation process into general stages: candidates, ranking, training, and feedback

Recommendation is a problem of algorithms and data, so the recommendation system should be built from the algorithm's point of view and should sit closer to algorithmic strategies

2) Architecture composition

As shown in Figure 7, the Weibo recommendation 3.0 architecture is the architecture currently in use. It was clearly developed from 2.0: it keeps the layered system and many of the tools and frameworks used in 2.0. The main differences are described below:

Two standards. The first applies to the application layer: as the output of the whole framework, the application layer exposes a single standard interface with standard input and output parameters. The second applies to the dynamic input, RIN. Because we control the structure of offline computation ourselves, the r9-interface input tool does not need a specification. RIN does, and its standard divides input into levels such as attributes, interaction data, and logs.

The computing layer adds standard methods for generating candidates: the Artemis content candidate module, the item-cands user candidate module, and so on. Project development only needs to select among these methods to generate candidates.

EROS was added to address the algorithm and model problem. Its main functions are: 1) model training, 2) feature selection, and 3) online comparison testing.

In the data layer, r9-interface and RIN add general offline and online strategies for generating candidates and recommendation results.
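
The combination of a standard input/output interface and selectable candidate methods can be sketched like this. The sketch is hypothetical throughout: `RecomRequest`, `RecomResponse`, the `candidate_generator` registry, and the two example generators are invented for illustration and are not the interfaces of Artemis, item-cands, or any other Weibo module.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class RecomRequest:
    """Hypothetical standard input shared by every project."""
    uid: str
    scene: str
    num: int = 10


@dataclass
class RecomResponse:
    """Hypothetical standard output: items plus the generator each came from."""
    uid: str
    items: List[str] = field(default_factory=list)
    sources: Dict[str, str] = field(default_factory=dict)


Generator = Callable[[RecomRequest], List[str]]
GENERATORS: Dict[str, Generator] = {}


def candidate_generator(name: str) -> Callable[[Generator], Generator]:
    """Register a reusable candidate method under a name."""
    def register(fn: Generator) -> Generator:
        GENERATORS[name] = fn
        return fn
    return register


def recommend(req: RecomRequest, use: List[str]) -> RecomResponse:
    """A project only lists the generators it wants; merging is shared code."""
    resp = RecomResponse(uid=req.uid)
    for name in use:
        for item in GENERATORS[name](req):
            if item not in resp.sources and len(resp.items) < req.num:
                resp.items.append(item)
                resp.sources[item] = name
    return resp


@candidate_generator("hot_content")
def hot_content(req: RecomRequest) -> List[str]:
    return ["m1", "m2"]          # made-up item ids


@candidate_generator("followed_authors")
def followed_authors(req: RecomRequest) -> List[str]:
    return ["m2", "m3"]          # made-up item ids
```

With this shape, a new project is mostly configuration: it picks generator names and relies on the shared request and response types, which is the repeated work the 3.0 goals set out to remove.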

Figure 7: Weibo recommendation 3.0 architecture

3) Features

Its advantages are:

It inherits the features of 2.0 and retains its advantages

It reflects a deeper understanding of recommendation and is more tightly integrated with it

It addresses the most important problems in recommendation: candidates, ranking, and training

3.3 Achievements

The results of Weibo recommendation 3.0 are:

1) Weibo's core recommendation businesses are being migrated to this system step by step, with data driving algorithmic improvements in results

2) EROS was born, providing a standard method and process for training

3) A standard input and output method was set for recommendation

4) Candidate generation was abstracted into a general set of methods

4. Summary

The above describes the evolution of the Weibo recommendation architecture in some detail. The team and its members gained a great deal from this process, and the relationship between technology and business runs through the whole account. Here are a few takeaways to share:

1) Technology grows out of the business and helps the business develop; business development in turn pushes technology forward. The two reinforce each other, and developing technology and business together is viable.

2) When choosing a technical architecture, look for the shortest path available now and then keep optimizing and iterating. Trying to get everything done in one go is unrealistic and unreasonable.

3) The best way to promote a framework or tool is not an executive order, nor wining and dining, but making everyone a participant. As with an open source project, everyone is an owner, so that everyone maintains it and everyone uses it.

4) Strive for simplicity and reliability. This is easier said than done, but one good approach is to know what should not be done, rather than only what should be done.

5) In a field like recommendation, setting goals and tracking them matters. Once the data and the goals are laid out, product, architecture, and algorithms will find a way to get there.


Origin blog.csdn.net/HAOXUAN168/article/details/104102008