Application of Hadoop Technology in Business Intelligence (BI)

Hadoop is a widely used distributed computing solution and the name of an open source Apache project. Its core components are HDFS, a distributed file system, and MapReduce, a distributed computing engine. Hadoop is now a proven and mature technology, and it has spawned a large ecosystem whose best-known members include HBase, Hive, Impala, and Spark. HBase is a distributed column-oriented database built on HDFS, and Hive is a data warehouse system built on Hadoop that keeps its data in HDFS. Impala provides low-latency SQL queries over data stored in HDFS and HBase; it relies on the Hive metastore service and can therefore share Hive metadata. Spark is a parallel computing framework similar to MapReduce. It also provides Spark SQL, a query interface similar to Hive's, and is commonly used as a data analysis tool on Hadoop.

Many companies, such as banks, generate large volumes of transaction records that are updated in real time. In these cases Hadoop typically serves as the underlying data store. An intermediate data-processing vendor processes the underlying data, and the BI system then connects to the intermediate tables that the vendor produces in order to access and analyze the data. Hadoop big data platform vendors such as Star Ring and Huawei are the most common choices here, and this arrangement is widely used.

Here is a brief introduction to using the Star Ring big data platform together with FineBI, FanRuan's big data BI tool.

Because Star Ring also exposes a Hive database on top of Hadoop, the approach is essentially the same as for Hive: you can use the JDBC driver provided by Hive. With this driver, FineBI can connect to the Star Ring database and run SQL queries much as it would against a relational database. Whether certain special SQL functions are supported needs to be confirmed with Star Ring staff.
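As a rough illustration of the values such a JDBC data connection usually needs, the sketch below assembles a HiveServer2-style URL. It assumes the platform speaks the standard HiveServer2 protocol, where the Apache Hive JDBC driver class is `org.apache.hive.jdbc.HiveDriver` and the default port is 10000. The host name and the `HiveConnectionSettings` class are hypothetical, and the driver and port actually required by a given Star Ring deployment may differ.

```python
from dataclasses import dataclass


@dataclass
class HiveConnectionSettings:
    """Values typically entered in a BI tool's JDBC data-connection dialog."""

    host: str
    port: int = 10000          # HiveServer2's default port
    database: str = "default"  # Hive's default database name

    # Driver class shipped in the Apache Hive JDBC jar for HiveServer2.
    driver_class: str = "org.apache.hive.jdbc.HiveDriver"

    def jdbc_url(self) -> str:
        """Build a HiveServer2-style JDBC URL."""
        return f"jdbc:hive2://{self.host}:{self.port}/{self.database}"


# Hypothetical host name, for illustration only.
settings = HiveConnectionSettings(host="hive.example.internal")
print(settings.driver_class)
print(settings.jdbc_url())
```

The driver class and URL printed here correspond to the "driver" and "URL" fields that a JDBC-based data connection form generally asks for.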

First copy the driver jars into the report project, and then restart the BI server. After the restart you can create a data connection to the Star Ring database and query data through that connection.

1. Local deployment

The figure below shows the Hadoop jar packages used in FineBI's internal testing. Place these jars in the project's WEB-INF/lib folder; with them in place, the connection test succeeds.
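The copy step can also be scripted. The following is a minimal sketch, assuming the driver jars have been collected in one directory; the `copy_driver_jars` helper, the directory layout, and the placeholder jar name are hypothetical, and the BI server still has to be restarted afterwards.

```python
import shutil
import tempfile
from pathlib import Path


def copy_driver_jars(driver_dir, webapp_dir):
    """Copy every .jar in driver_dir into <webapp_dir>/WEB-INF/lib.

    Returns the sorted names of the jars that were copied.
    """
    lib_dir = Path(webapp_dir) / "WEB-INF" / "lib"
    lib_dir.mkdir(parents=True, exist_ok=True)

    copied = []
    for jar in sorted(Path(driver_dir).glob("*.jar")):
        shutil.copy2(jar, lib_dir / jar.name)
        copied.append(jar.name)
    return copied


# Demo with throwaway directories and an empty placeholder jar.
with tempfile.TemporaryDirectory() as tmp:
    drivers = Path(tmp) / "drivers"
    webapp = Path(tmp) / "webapp"
    drivers.mkdir()
    (drivers / "example-hive-jdbc.jar").write_bytes(b"")

    copied = copy_driver_jars(drivers, webapp)
    installed = sorted(p.name for p in (webapp / "WEB-INF" / "lib").iterdir())

print(copied)
```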

2. Data connection

The data connection is shown in the following figure:

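After the connection test succeeds, click OK. You can then select the corresponding tables in the database and add them to a business package, just as you would when pulling tables from common databases such as MySQL.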

3. Practical analysis cases

(1) A bank's head-office level - institution dimension - four-quadrant chart

(2) Head-office level - institution dimension - trend analysis

(3) Head-office level - product dimension - profitable products

4. About the FineIndex and FineDirect features of FineBI

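Hadoop is the underlying layer and Hive is the database. The cases above use the FineIndex (cube) connection mode, with the data connection made through HiveServer. Once the connection succeeds, the tables in the Hive database are added to a business package, which means the data is pulled from the database into FineBI's multidimensional database (FineIndex). During extraction, the table relationships and name translations (display aliases) defined in the database can be read automatically, or they can be set by hand. Some ETL operations are also available, such as adding formula columns, row/column conversion, join, union, filtering, group statistics, self-loop columns, adding grouping columns, and using only a subset of fields. The processed tables are then used for front-end analysis.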
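To make a few of these ETL operations concrete, here is a small sketch that expresses a join, a filter, a formula column, and group statistics as a single SQL query. An in-memory SQLite database stands in for the Hive source, and every table and column name is hypothetical. In FineBI these steps are configured in the product rather than written by hand, so this only illustrates what the operations compute.

```python
import sqlite3

# In-memory SQLite stands in for the Hive source; all names are hypothetical.
src = sqlite3.connect(":memory:")
src.executescript("""
    CREATE TABLE branches (branch_id INTEGER, branch_name TEXT);
    CREATE TABLE transactions (branch_id INTEGER, income REAL, cost REAL);
    INSERT INTO branches VALUES (1, 'North'), (2, 'South');
    INSERT INTO transactions VALUES
        (1, 120.0, 70.0), (1, 80.0, 50.0), (2, 200.0, 150.0), (2, 0.0, 10.0);
""")

# Join + filter + formula column (income - cost) + group statistics.
rows = src.execute("""
    SELECT b.branch_name,
           SUM(t.income - t.cost) AS profit,    -- formula column, aggregated
           COUNT(*)               AS tx_count   -- group statistic
    FROM transactions AS t
    JOIN branches AS b ON b.branch_id = t.branch_id  -- join on the relationship
    WHERE t.income > 0                               -- filter
    GROUP BY b.branch_name
    ORDER BY b.branch_name
""").fetchall()

for name, profit, tx_count in rows:
    print(name, profit, tx_count)
```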

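In other words, the flow is database -> FineIndex -> front-end analysis. FineIndex acts as an intermediate store that holds the data tables together with their relationships, name translations, and indexes. All of this greatly improves the efficiency of later front-end analysis. When data is fetched directly with SQL, efficiency is limited by the database itself, and with large data volumes ordinary analysis tools easily freeze or even run out of memory, leaving the system unresponsive. Avoiding that is the original motivation for the FineIndex approach. FineIndex therefore serves two purposes: improving efficiency, and performing a second round of integration and processing on the data.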
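The idea of an intermediate store can be sketched in a few lines: data is extracted once, pre-aggregated, and later questions are answered from that structure instead of scanning the source again. This is only a conceptual illustration under assumed data; the row layout and the `build_index` and `total_for` helpers are hypothetical and do not describe FineIndex's actual storage format.

```python
from collections import defaultdict

# Hypothetical rows as they might arrive from the source database:
# (branch, product, amount)
source_rows = [
    ("North", "loan", 100.0),
    ("North", "card", 40.0),
    ("South", "loan", 250.0),
    ("North", "loan", 60.0),
]


def build_index(rows):
    """One-off extraction step: pre-aggregate amounts per (branch, product)."""
    index = defaultdict(float)
    for branch, product, amount in rows:
        index[(branch, product)] += amount
    return dict(index)


def total_for(index, branch, product):
    """Front-end lookup: answered from the index, without scanning the source."""
    return index.get((branch, product), 0.0)


index = build_index(source_rows)
print(total_for(index, "North", "loan"))
```

The cost of scanning the source is paid once at extraction time, which is why later lookups stay fast, and also why the stored result only reflects the data as of the last extraction.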

FineBI also has another connection mode, FineDirect (direct database connection), which mainly addresses the following needs:

  • Real-time analysis results

When enterprise users work with BI tools, they are in most cases running OLAP analysis over large volumes of historical data, but some users need real-time results. In transaction risk analysis in the financial industry, for example, every transaction is analyzed as it happens. If the data first has to go through the process of building a multidimensional database, it arrives with a delay, which makes the analysis results less accurate. A direct connection avoids that delay, but because the calculation is handed over to the database, response speed depends largely on the database's performance.

  • Making full use of big data platforms

As distributed computing solutions continue to be optimized, data computing performance has improved rapidly and markedly. Many enterprises already have their own big data computing platforms, such as Hadoop, Kylin, Greenplum, and Vertica. These platforms handle large data volumes well enough to meet analysis needs without additional modeling, so FineBI's direct-connection engine supports connecting to them.
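The difference between analyzing an extracted copy and querying the database directly can be shown with a small sketch. An in-memory SQLite database stands in for the big data platform, and the table and function names are hypothetical. The point is only that a direct query sees a transaction that arrived after extraction, while the extracted copy does not.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE transactions (id INTEGER, amount REAL)")
db.executemany("INSERT INTO transactions VALUES (?, ?)", [(1, 100.0), (2, 50.0)])


def direct_total(conn):
    """Direct-connection style: the database computes the answer on demand."""
    return conn.execute("SELECT SUM(amount) FROM transactions").fetchone()[0]


# Extraction style: copy the rows once, then analyze the copy.
snapshot = db.execute("SELECT id, amount FROM transactions").fetchall()

# A new transaction arrives after the extraction has finished.
db.execute("INSERT INTO transactions VALUES (3, 25.0)")

snapshot_total = sum(amount for _, amount in snapshot)
live_total = direct_total(db)
print(snapshot_total, live_total)
```

Here the aggregation in `direct_total` runs inside the database, so its speed depends on the database, which matches the trade-off described above.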
