Design and implementation of a music recommendation system based on the Hadoop platform



Summary

  

In recent years, with the development of network technology, online music platforms have become most people's first choice for listening to music. Faced with massive catalogs, however, users often struggle to choose. Listening to music is an everyday form of entertainment and relaxation, and it is unrealistic to audition songs one by one, so some degree of filtering is required. To meet this need, this paper designs and implements a music recommendation system. The system is built on the Spring Boot framework and uses HDFS on the Hadoop platform for storage and MapReduce for computation. The front end consists of functional modules such as homepage management, song and artist management, and personal information management, and it delivers the playlist and song recommendations that users want. This article first introduces the development status and related theory of music recommendation systems, then analyzes the recommendation system, which uses a recommendation algorithm based on collaborative filtering. It then describes the specific functions and implementation steps of the system modules in detail. Finally, the music recommendation system is tested; all functions execute correctly and achieve the expected results.
Keywords: Hadoop; recommendation system; music recommendation; collaborative filtering

1. Related technologies and basic theories

2.1 Related technologies

2.1.1 Hadoop cluster

Hadoop is an open source software framework under the Apache Software Foundation that provides reliable and stable distributed computing services. A Hadoop cluster has three main modules: MapReduce, HDFS, and YARN. Unlike many other programs, Hadoop provides a mature high-availability solution: it can detect and handle failures by itself to keep programs running normally. It also has a mature ecosystem and supports cluster load balancing. Figure 2-1 is a schematic diagram of the Hadoop framework:


Figure 2-1 Schematic diagram of Hadoop framework
HDFS is a distributed file system used to store data, and MapReduce is a programming framework for computing over massive data [3]. The distinctive feature of HDFS is its high fault tolerance; it also provides high-throughput access to application data, which makes it suitable for applications with very large data sets. The cluster splits large files into blocks and stores them on different nodes. However, HDFS is designed for a write-once, read-many access pattern. The main idea of MapReduce is divide and conquer, carried out in two stages. Map, also called mapping, decomposes a complex task into several small tasks; Reduce, also called reduction, aggregates the results. Throughout this process, task scheduling, load balancing, and fault tolerance are all handled by the MapReduce framework. However, it cannot return results as quickly as a MySQL query; that is, it is not suited to real-time computation. Figure 2-2 is a schematic diagram of MapReduce:

Figure 2-2 MapReduce diagram
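The two stages described above can be imitated in a few lines of plain Python. This is only a minimal single-process sketch of the map, shuffle, and reduce idea, not Hadoop code: the play log and the function names `map_phase`, `shuffle`, and `reduce_phase` are hypothetical and chosen for illustration.

```python
from collections import defaultdict

# Hypothetical play log of (user, song) pairs; not data from the original system.
PLAY_LOG = [("u1", "song_a"), ("u2", "song_a"), ("u1", "song_b"), ("u3", "song_a")]


def map_phase(records):
    """Map: turn each input record into a (key, value) pair."""
    for _user, song in records:
        yield song, 1


def shuffle(pairs):
    """Shuffle: group values that share a key (Hadoop does this between the stages)."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Reduce: aggregate the grouped values for each key."""
    return {key: sum(values) for key, values in groups.items()}


play_counts = reduce_phase(shuffle(map_phase(PLAY_LOG)))
print(play_counts)
```

In a real cluster the map and reduce functions run on different nodes and the framework handles grouping, scheduling, and retries; the sketch only shows how the work is divided.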

2.1.2 Spring Boot framework

Spring Boot is an open source Java framework released by Pivotal in 2014. It greatly simplifies the deployment of enterprise-level Java web applications. Built on top of the Spring framework, it provides an efficient way to set up and run applications. Its advantage is that coding is easy and configuration is relatively simple, because it provides many defaults; to customize the configuration, you only need to override the default values. No XML configuration is required, and it works out of the box. As a one-stop development environment, it lets developers quickly build an enterprise-level application with high development efficiency and simple deployment. Dependencies are declared through Maven, which imports the corresponding jar packages without manual work [4]. Compared with development on the plain Spring framework, Spring Boot noticeably improves development efficiency. It is also compatible with a wide range of relational and non-relational databases.

2.1.3 Vue framework

With the development of the Internet, web pages have become more powerful and dynamic. They present far more information, and more emphasis is placed on aesthetics. Vue, used in this article, is a versatile JavaScript framework that lets a web page be divided into reusable components, each containing its own style. It also supports the separation of front end and back end, which saves development costs. Its architecture is relatively simple, so people can learn it quickly and put it to use. Its core is the reactivity principle, and it has the following advantages:
(1) Vue has a lower learning cost than other frameworks and is simple to pick up.
(2) The directives, filters, and similar features in Vue templates make it very convenient for developers to operate on the DOM.
(3) For front-end applications with complex logical interactions, Vue provides a basic architectural abstraction while ensuring good user interaction on the front end.

2.2 Development environment and technical framework

2.2.1 Technical framework

Core framework: Spring Boot
View framework: Spring MVC
JS framework: Vue

2.2.2 Development environment

IDE: IntelliJ IDEA 2019.3.3
Database: MySQL 5.5
JDK: Java 8
Jar package management tool: Maven 4.0.0
Operating system: Windows

2.3 Recommendation system

A recommendation system is an engineering solution and a convenient channel for providing users with information that interests them. Its purpose is to alleviate the problem of information overload. The following mainly introduces the related theory and techniques [5].
Collaborative filtering is a well-known and commonly used recommendation algorithm whose main functions are prediction and recommendation. It discovers user preferences by mining historical behavior data, groups users by preference, and recommends similar items. It is divided into user-based collaborative filtering and item-based collaborative filtering. The item-based algorithm has two main steps: first, the relationships between items are derived from users' ratings of different items; then, items similar to those a user already likes are recommended to that user based on those relationships.
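The two steps of the item-based variant can be sketched as follows. This is a minimal illustration under stated assumptions: the ratings are invented, cosine similarity is used as the item-to-item relationship (the text does not say which measure the item-based variant would use), and the names `item_similarities` and `recommend_items` are hypothetical.

```python
from math import sqrt

# Hypothetical ratings, user -> {song: rating}; not data from the original system.
RATINGS = {
    "u1": {"x": 5, "y": 4},
    "u2": {"x": 4, "y": 5, "z": 3},
    "u3": {"y": 2, "z": 5},
}


def cosine(a, b):
    """Cosine similarity of two sparse vectors stored as dicts."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def item_similarities(ratings):
    """Step 1: derive item-to-item relationships from users' ratings."""
    items = {}
    for user, songs in ratings.items():
        for song, score in songs.items():
            items.setdefault(song, {})[user] = score
    return {i: {j: cosine(items[i], items[j]) for j in items if j != i} for i in items}


def recommend_items(user, ratings, top_n=3):
    """Step 2: score unseen songs by their similarity to songs the user rated."""
    sims = item_similarities(ratings)
    seen = ratings[user]
    scores = {}
    for song, score in seen.items():
        for other, sim in sims[song].items():
            if other not in seen:
                scores[other] = scores.get(other, 0.0) + sim * score
    return sorted(scores, key=scores.get, reverse=True)[:top_n]


print(recommend_items("u1", RATINGS))
```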
The user-based collaborative filtering algorithm uses a user's historical behavior, such as purchasing items or collecting or liking music, to discover what content the user likes. By analyzing these behaviors, it finds groups of users with similar interests and then makes recommendations among users with the same preferences. In this article, the algorithm proceeds as follows. First, collect users' various operations on music and record their historical behavior. Next, process the selected ratings, collections, and other data to generate a user-rating matrix, in which each rating is a numerical value and different values represent different degrees of preference. Then, calculate the similarity between users; the main methods are cosine similarity, correlation similarity, and adjusted cosine similarity. Finally, use the collaborative filtering algorithm to create the music recommendation list [6]. Put simply: if two users, A and B, have both collected the three songs x, y, and z, then A and B are the same type of user, and other songs that A has listened to can be recommended to B. Figure 2-3 below illustrates the user-based collaborative filtering algorithm:

Figure 2-3 User-based collaborative filtering
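The A and B example can be made concrete with a small sketch. It assumes cosine similarity (one of the measures listed above) and an invented rating table; the song names `w` and `v`, user C, and the function `recommend_songs` are hypothetical and only serve the illustration.

```python
from math import sqrt

# Hypothetical user-rating table: A and B share x, y, z; A has also rated w.
RATINGS = {
    "A": {"x": 5, "y": 4, "z": 5, "w": 4},
    "B": {"x": 5, "y": 5, "z": 4},
    "C": {"v": 5, "z": 1},
}


def cosine(a, b):
    """Cosine similarity between two users' sparse rating vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def recommend_songs(target, ratings, top_n=3):
    """Score songs the target has not rated, weighted by user similarity."""
    scores = {}
    for other, other_ratings in ratings.items():
        if other == target:
            continue
        sim = cosine(ratings[target], other_ratings)
        for song, score in other_ratings.items():
            if song not in ratings[target]:
                scores[song] = scores.get(song, 0.0) + sim * score
    return sorted(scores, key=scores.get, reverse=True)[:top_n]


print(recommend_songs("B", RATINGS))
```

Because B is far more similar to A than to C, the song that only A has rated ranks ahead of the song that only C has rated.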

2. System design

4.1 Overall process design

The overall process of the project is as follows. First, the latest music data set is uploaded to the HDFS file system at a fixed time interval. Then MapReduce performs the collaborative filtering calculation, and the latest recommendation results are read into memory. Finally, the user's browser sends a recommendation request to the server, and the web program reads the recommendation results and presents them [7].
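The "read the results into memory, then serve requests from memory" part of this flow might look like the sketch below. Everything specific here is an assumption made for illustration: the class name `RecommendationCache`, and in particular the output line format `user<TAB>song:score,song:score`, which the original does not specify.

```python
class RecommendationCache:
    """Keeps the latest batch results in memory so requests do not rerun the job."""

    def __init__(self):
        self._by_user = {}

    def refresh(self, lines):
        """Parse one batch of result lines and swap it in as the current result set.

        Assumed line format (hypothetical): "user<TAB>song:score,song:score".
        """
        fresh = {}
        for line in lines:
            line = line.strip()
            if not line:
                continue
            user, _, payload = line.partition("\t")
            pairs = []
            for item in filter(None, payload.split(",")):
                song, _, score = item.partition(":")
                pairs.append((song, float(score)))
            pairs.sort(key=lambda pair: pair[1], reverse=True)
            fresh[user] = [song for song, _ in pairs]
        # Replace the old results only after the whole batch has parsed.
        self._by_user = fresh

    def top(self, user, n=10):
        """Return up to n recommended songs for a user, best first."""
        return self._by_user.get(user, [])[:n]


cache = RecommendationCache()
cache.refresh(["u1\tsong_b:0.4,song_c:0.9", "u2\tsong_a:0.7"])
print(cache.top("u1", 1))
```

Calling `refresh` again after each scheduled batch run replaces the whole result set, so a web request always sees one complete batch.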

4.2 Function module design

4.2.1 Data sources

The data sets required by the project are found from network resources in combination with big data platforms. This project uses the music data set data.zip, which contains many songs. The data is stored in HDFS and processed with MapReduce. Later data processing and the final visualization build on this basis.

4.2.2 Data storage

Data storage in this project involves two aspects: the database and HDFS. When a new user registers, the database automatically saves the username and password; it also stores users' login information, singer and song information, and users' personalized rating data. The database storage design is shown in Figure 4-1:

Figure 4-1 Database storage diagram
Hadoop's HDFS stores the data sets and the recommendation results calculated with MapReduce. Data storage in Hadoop is shown in Figure 4-2:

Figure 4-2 Hadoop data storage diagram

3. System implementation

The data processing module mainly uses MapReduce in Hadoop to process the data and recommend song playlists based on the similarity between users. The MapReduce calculation is divided into five steps. The first step obtains the user-item rating matrix from users' various operations on songs or playlists. The second step uses the rating matrix to construct the similarity matrix between users. The third step transposes the rating matrix. The fourth step multiplies the matrices obtained in the preceding two steps. The last step sets to zero, in the matrix output by the fourth step, the entries for items on which the user has already acted, which yields the final result. Figure 5-1 shows the five-step MapReduce calculation, and Figure 5-2 shows the calculation results:
Figure 5-1 Five-step MapReduce calculation

Figure 5-2 Calculation results
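The five steps can be traced on a tiny in-memory example. This is a sketch under assumptions, not the original MapReduce jobs: the rating values are invented, cosine similarity is assumed for step 2, and step 4 is read as "similarity matrix times rating matrix", computed row against row of the transposed matrix (a common reason for transposing in a MapReduce matrix product).

```python
from math import sqrt

USERS = ["u1", "u2", "u3"]
SONGS = ["a", "b", "c", "d"]

# Step 1: user x song rating matrix (hypothetical values, 0 = no interaction).
R = [
    [5, 3, 0, 0],
    [4, 0, 4, 0],
    [0, 0, 5, 4],
]


def dot(p, q):
    return sum(x * y for x, y in zip(p, q))


def cosine(p, q):
    norm = sqrt(dot(p, p)) * sqrt(dot(q, q))
    return dot(p, q) / norm if norm else 0.0


# Step 2: user x user similarity matrix built from the rating rows.
S = [[cosine(ru, rv) for rv in R] for ru in R]

# Step 3: transpose the rating matrix (song x user).
RT = [list(col) for col in zip(*R)]

# Step 4: multiply; entry (u, i) is row u of S dotted with row i of RT.
P = [[dot(s_row, rt_row) for rt_row in RT] for s_row in S]

# Step 5: zero the entries for songs the user has already acted on.
FINAL = [
    [0.0 if R[u][i] else P[u][i] for i in range(len(SONGS))]
    for u in range(len(USERS))
]


def best_song(user):
    """Return the highest-scoring unseen song for a user."""
    row = FINAL[USERS.index(user)]
    return SONGS[max(range(len(SONGS)), key=lambda i: row[i])]


print(best_song("u1"))
```

Here u1 shares song a with u2, so the song that u2 rated and u1 has not seen (song c) receives the only nonzero score in u1's row.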

4. Conclusion

To push songs to users in a timely and accurate manner, we designed this Hadoop-based music recommendation system. The front-end pages are designed and written with the Vue framework, the back end uses the SSM framework, and a MySQL database stores and processes the system's data. The music recommendation system as a whole includes login and registration, song and singer management, and user-based playlist recommendation. After logging in, users can also perform operations on songs, such as adding them to favorites, commenting, and downloading, which gives users a more complete music recommendation system and a good music experience [12].
In addition to introducing the functions of the system in detail, this article analyzes and plans the construction of the system from the perspectives of operational and economic feasibility. The second chapter introduces the related technologies in detail, and the subsequent outline design explains the design purpose of each functional module and the functions each can achieve. The MySQL database tables are presented, and the relationships between the parts are explained with an ER diagram. Finally, the detailed design of the system gives the core code for the functional implementation and shows the front-end pages.


Origin blog.csdn.net/QQ2743785109/article/details/134193568