Using big data analysis tools for multi-scenario visual data management

Official documentation

https://yanhuang.yuque.com/staff-sbytbc/rb5rur?

Prepare the server environment

Buy a server

Buy a Tencent Cloud server: 1,300 yuan at the newcomer price for one year.

●4 cores, 16 GB memory

●CentOS 6.7

(Note: the latest 2.7.1 GA version also runs on 8 GB of memory, so you can start with 8 GB and upgrade later if that is not enough.)

Install the docker environment

Install docker, the speed is quite fast, about 3~5 minutes

picture
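
The exact commands used in the screenshot are not recoverable, so here is a minimal sketch of one common install path, assuming a systemd-based host:

```sh
# Install Docker via the official convenience script (one common approach).
curl -fsSL https://get.docker.com | sh

# Start the daemon and enable it at boot (assumes systemd).
sudo systemctl enable --now docker

# Verify the installation.
docker --version
```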

Download the Honghu trial version installation package

1. Register a Honghu account to obtain the download address

●https://www.yanhuangdata.com/auth/register?redirect_url=%2Fapi%2Fproducts%2Fhonghu%2F2.7.0%2Fdownload%3Fos%3Dlinux

[screenshot]

After registration, click [Free Community Edition] on the home page

[screenshot]

Click to download the installation package.

Install Honghu

Upload the compressed package

Now upload the installation package to the server; either scp or FTP works. scp is used here (no custom port seems to be needed).

[screenshot]
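
For reference, a minimal scp sketch; the archive name and server address are placeholders rather than the actual values from the original:

```sh
# Copy the installation package to the server over SSH (placeholder names).
scp honghu-<version>-linux.tar.gz root@<server-ip>:/opt/
```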

Tip: If you encounter Permission denied, log in to the server, enable root login by editing the file /etc/ssh/sshd_config, and remove the leading # from the relevant line.

[screenshot]
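
Assuming a standard OpenSSH setup, the line to uncomment and the restart step look like this:

```sh
# In /etc/ssh/sshd_config, uncomment (remove the #) and set:
#   PermitRootLogin yes
# Then restart the SSH service so the change takes effect (assumes systemd):
sudo systemctl restart sshd
```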

Install

The official tutorial covers this in more detail; just follow it: https://yanhuang.yuque.com/staff-sbytbc/rb5rur/auwfm3?#EveRG

Unzip the Honghu installation package into the /opt directory. After decompression completes, all Honghu-related files are located under /opt/honghu.

In the documents that follow, HONGHU_HOME refers to Honghu's installation directory. In the example above, HONGHU_HOME is /opt/honghu.
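
A minimal sketch of the extraction step, assuming a gzipped tarball with a placeholder file name:

```sh
# Unpack the installation package into /opt (placeholder archive name).
sudo tar -xzf honghu-<version>-linux.tar.gz -C /opt

# HONGHU_HOME then refers to the installation directory.
export HONGHU_HOME=/opt/honghu
```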

Start up

Problem: if you encounter the "Resource exhausted on 1 nodes" error, restarting the Docker daemon can resolve it.

[screenshot]

Restart the Docker daemon, then restart Honghu, and the problem is solved.

[screenshot]
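
The Docker restart itself, assuming a systemd host; restart Honghu afterwards using whatever start/stop script the official installation guide provides:

```sh
# Restart the Docker daemon (assumes systemd).
sudo systemctl restart docker

# Then restart Honghu using the start script from the official guide.
```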

Success!

[screenshot]

The password can be changed under account management after login, as shown in the figure below.

[screenshot]

Log in

Log in with the username and password printed in the console when installation completes.

[screenshot]

Sample experience

Since version 2.7.1, Honghu includes [Main Application] -> [Yanhuang Product Showroom].

[screenshot]

Scenario 1: Visual query, management, and alerting for Shopify admin logs

Pain points

Shopify's current user operation log is not intuitive. Each record is a single line, hard to inspect, and has to be continuously pulled and loaded from the main screen. The information is unstructured, so little useful detail is visible, and when a particular error occurs it is difficult to locate the related operation records.

Goals

●Structure the information so it is easy to search and query

●Define certain types of abnormal operations and raise alerts

●Count each user's usage frequency and output data reports

Import Data

Step 1: Upload files

The data is currently in a CSV file, which can be imported directly.

[screenshot]

Step 2: Preview the data; there are 1,000 records.

[screenshot]

[Hint]

Click [Save As] to create a new data source type for the Shopify data.

[screenshot]

Step 3: Import data

[screenshot]

Import complete.

[screenshot]

Query data

After importing, view the data; the following view is displayed by default.

[screenshot]

The SQL query statement is as follows:
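
As a purely hypothetical sketch (the actual statement appears only in the original screenshot), assuming the imported dataset is named shopify_events, a query of this shape pulls out the admin-log fields discussed below:

```sql
-- Hypothetical sketch; the dataset name is a placeholder, not the real one.
SELECT author, description, created_at
FROM shopify_events
ORDER BY created_at DESC
LIMIT 100;
```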

Favorite the query statement

[screenshot]

The next time you enter, click [Expand] on the right, then [Favorite Queries] to view the list of saved query statements and load one. A complex query like this no longer needs to be typed by hand every time; it can be loaded with one click.

[screenshot]

Filter the required fields again

Click [Show Field Bar]

[screenshot]

The raw data is listed by default, which helps with organizing the fields.

[screenshot]

The above is the raw data, but it is still hard to read. I mainly care about three things: who, on what day, did what.

The [Selected Fields] and [Remaining Fields] in the left column of the query results are generated automatically by Honghu to assist filtering. Click a field's name to see the specific data values Honghu extracted for it, which helps identify which fields should be kept.

[screenshot]

Based on these needs, confirm the following fields:

●Operator: author

●Operation content: description

●Time: created_at

Click [Details]

[screenshot]

Select the fields you just confirmed, and save.

[screenshot]

Create a new chart

Take a chart comparing members' operation frequency as an example.

Click the field and select author; we will use the values shown in the pop-up window to build the chart.

[screenshot]

Enter the following query in the query box at the top:

[screenshot]
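
The query itself is only visible in the screenshot. As a hypothetical sketch, reusing the placeholder dataset name from earlier, a count-by-author query would have this shape:

```sql
-- Hypothetical sketch; counts operations per author to compare activity.
SELECT author, COUNT(*) AS operation_count
FROM shopify_events
GROUP BY author
ORDER BY operation_count DESC;
```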

The result is as follows:

[screenshot]

Click [New Chart], select [Histogram], tweak the display in the property configuration bar on the right, and the chart comes out smoothly. It turns out that the [Upsell] role made the most changes to the system, which is very intuitive!

[screenshot]

Export query results

For the table results of a query, you can also filter out unwanted fields and export the rest as a new table. This can be used for further data processing.

[screenshot]

The exported result is a CSV file, as follows:

[screenshot]

Experience with the custom field extraction feature

For log text that the system has not parsed, we can use Honghu to define the parsed fields ourselves. Click [Extract New Field] at the bottom of the field list to begin.

[screenshot]

After entering the new-field extraction page, take the _message field below as an example: I want to split its contents into something more structured.

[screenshot]

[Regular Extraction] is selected here. In the original text, use the mouse to select the string that should become a field; a pop-up window then appears where you name the field.

[screenshot]
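
As a purely illustrative sketch (the real log format is only visible in the screenshots), this is the kind of named-group regular expression such selection-based extraction produces; the sample line and field names are assumptions:

```
# Assumed sample _message line:
#   2023-07-31 12:00:00 INFO user=alice action=update_product
# A named-group regex that splits it into structured fields:
(?<log_time>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) (?<level>\w+) user=(?<user>\w+) action=(?<action>\w+)
```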

After the split, the data looks like the figure below, and the [Field Extraction Preview] underneath conveniently shows the split results in real time.

[screenshot]

Continue to the next step.

[screenshot]

Finish!

[screenshot]

Click [Query the fields you just created] to return to the query data interface. The series of fields we just customized now appears on the left. For these new fields, you can follow the steps above to create more charts, or export them as table content. It has to be said that Honghu is quite efficient at abstracting messy text into structured models!

[screenshot]

Create a new query table

In this log scenario, the core content I care about is: who, on what day, did what.

How can we quickly create queryable content in Honghu? For example, a table like the following:

[screenshot]

The workflow in this Shopify admin log scenario is:

1. A large volume of log data is modeled at read time by Honghu into structured data with extractable fields.

2. The required fields (author, description, created_at) are filtered out and a table is generated.

3. Based on the table, the operation logs of a specific day can be quickly filtered to review that day's operations and help trace marketing actions.

[Suggestion]

At present, Honghu charts do not support generating this kind of table query. Because our log volume is relatively large, more than 5,000 entries can accumulate over a period, making line-by-line browsing impractical. Building on Honghu's read-time modeling, quickly filtering out the useful fields and organizing them into a queryable table would be a valuable feature for bringing order to chaotic data.

That said, if the data volume is not large, running a SQL query, filtering the data, saving the result set as a table, and adding it to a dashboard already meets this need.

Scenario 2: Generating APIs

Scenario description: I often build small query tools, such as a [Shenzhen second-hand housing guide price] lookup, or even the national epidemic data map that Yanhuang demonstrated before. The underlying data usually comes in the following file formats:

[screenshot]

My previous workflow was:

[screenshot]

Among these steps, [manually cleaning and processing the data], [defining fields and loading them into a database], and [developing the API interface] are the most central and time-consuming parts.

After learning about Honghu, I found that this process can be greatly simplified. The workflow becomes:

[screenshot]

Since Honghu already has core capabilities such as read-time data processing, real-time modeling, and arbitrary field definition, could it also expose API interfaces externally? That would simplify API generation while retaining the data management features. For example, generating an API like the following:

[screenshot]
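
To make the proposal concrete: a call to such a generated API might look like the sketch below. The endpoint, host, and parameters are invented purely for illustration; no such API exists in Honghu today.

```sh
# Hypothetical generated endpoint (illustration only, not a real Honghu API):
# query a dataset's filtered fields and receive JSON back.
curl "https://honghu.example.com/api/v1/datasets/housing_guide_price/query?district=nanshan&limit=10"
```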

Of course, this is just an expansion proposal based on Honghu's current core capabilities.

Comparative feedback on other tools

Because of the nature of my work, I have used a number of data tools on the market, such as Tableau and Sensors.

Tableau

Tableau leans toward the presentation of data reports; its strength is the high customization flexibility it offers for data display. Tableau sits more at the application layer.

Honghu's core is read-time data processing, but because Honghu handles the most complex link in the data pipeline, it has the potential to extend upward, complete the [data display layer], and form an integrated data application.

According to Honghu's staff, the current Honghu (and the commercial Yanhuang edition) supports standard RESTful APIs, allows data developers to connect Tableau through the API interface Honghu provides, and reserves an extension interface.

Sensors

Sensors is built mainly on event tracking and reporting, on top of which it provides a complete data analysis path: [data pool - data analysis - data display].

Its advantage is a relatively closed loop with highly real-time data, but its external extensibility is not as high as Honghu's. Of course, the two products are currently positioned differently.


Source: blog.csdn.net/Yhpdata888/article/details/132109854