The difference between Greenplum and HBase

Hadoop's HDFS supports storage of massive data sets, and MapReduce supports distributed processing of that data. Oracle can be clustered, but once the data volume passes a certain limit, query processing becomes very slow and the hardware requirements become very high. In fact, the two are not the same kind of thing: Hadoop is a distributed processing architecture oriented toward data computation, while Oracle is a relational database oriented toward data storage. A fairer comparison is between HBase and Oracle.
HBase is a NoSQL, column-oriented database that supports massive data storage and adding columns as needed. Queries are more involved than in a relational database such as Oracle, and HBase supports only one index, the row key. With a well-designed table structure, however, query speed has little to do with the size of the data, and HBase queries can return at the millisecond level.
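
One way to see why a row-key lookup barely slows down as the table grows: HBase keeps rows sorted by row key, so a lookup is a search over sorted keys rather than a scan over all rows. The sketch below is a toy model in plain Python, not HBase's actual storage engine; the class `SortedKeyTable` and its method names are made up for illustration.

```python
import bisect


class SortedKeyTable:
    """Toy model of a table kept sorted by row key (not HBase's real engine)."""

    def __init__(self, rows):
        # rows: dict mapping row key (str) -> value
        self._keys = sorted(rows)
        self._values = [rows[k] for k in self._keys]

    def get(self, row_key):
        """Point lookup by binary search: O(log n) key comparisons."""
        i = bisect.bisect_left(self._keys, row_key)
        if i < len(self._keys) and self._keys[i] == row_key:
            return self._values[i]
        return None

    def scan_prefix(self, prefix):
        """Range scan: jump to the first key >= prefix, then read forward."""
        i = bisect.bisect_left(self._keys, prefix)
        out = []
        while i < len(self._keys) and self._keys[i].startswith(prefix):
            out.append((self._keys[i], self._values[i]))
            i += 1
        return out


table = SortedKeyTable({"user1#a": 1, "user1#b": 2, "user2#a": 3})
```

A query that cannot be expressed as a key lookup or a key-range scan falls back to reading every row, which is why the table structure matters so much.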

HBase can serve real-time data queries very efficiently, but pay attention to the following points:

1. The row key (ROWKEY) of an HBase table must be designed sensibly.
2. Indexes on an HBase table have to be created by yourself (using a coprocessor, or using MapReduce to build the index asynchronously).
3. HBase cannot be queried directly with SQL, but open source SQL projects such as Phoenix solve part of the problem.
Reference: https://github.com/forcedotcom/phoenix
Phoenix is based on the coprocessors available after version 0.92, and solves the problem of running aggregation commands through SQL on HBase, such as SUM/AVG/COUNT/MAX/MIN; it also covers operations such as LIMIT/SORT.
However, Phoenix currently does not support JOIN operations or creating an INDEX; these still have to be implemented by yourself.
I personally recommend custom endpoints (giving up SQL); custom hooks can be bound directly to tables created by Phoenix.

4. HBase itself is not well suited to BI; that requires custom work through MapReduce.
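
Points 1 and 2 can be illustrated with a small sketch. It assumes a hypothetical event table keyed by user and time; the key layout, the `|` separator, the bucket count, and every function name here are illustrative choices, not something HBase or Phoenix prescribes. The row key combines a salt prefix (to spread writes across regions), the user ID, and a reversed timestamp (so the newest row sorts first). The index key shows the usual shape of a hand-built secondary index: a second table whose row key starts with the indexed value.

```python
import hashlib

# Assumed upper bound for millisecond timestamps (13 digits); illustrative only.
MAX_TS = 9_999_999_999_999


def salt_bucket(user_id, buckets=16):
    """Deterministic bucket number derived from the user ID."""
    digest = hashlib.md5(user_id.encode("utf-8")).hexdigest()
    return int(digest, 16) % buckets


def make_row_key(user_id, ts_millis, buckets=16):
    """Build 'salt|user|reversed_ts'; assumes user_id contains no '|'."""
    reversed_ts = MAX_TS - ts_millis  # newer events get smaller values
    return f"{salt_bucket(user_id, buckets):02d}|{user_id}|{reversed_ts:013d}"


def make_index_key(column_value, row_key):
    """Row key for a hand-maintained index table: indexed value first."""
    return f"{column_value}|{row_key}"


newer = make_row_key("alice", 2000)
older = make_row_key("alice", 1000)
```

Because the salt depends only on the user ID, all rows of one user stay adjacent and can be read with a single prefix scan, while different users are spread over the buckets. The index table has to be written alongside the main table, which is the work a coprocessor or an asynchronous MapReduce job would take on.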

For data at the level of millions of rows, whether the focus is OLTP or OLAP, MySQL is of course the choice.
For data above the 100 million level, you can continue with MySQL if the focus is OLTP; if the focus is OLAP, the choice has to be made per scenario.

Real-time computing scenarios: the emphasis is on real-time performance, for places with strict real-time requirements; you can choose Storm.
Batch computing scenarios: the emphasis is on batch processing, often used in data mining and analysis; you can choose Hadoop.
Real-time query scenarios: the emphasis is on real-time query response, typically by converting the data in the DB into index files and querying them through a search engine; you can choose Solr/Elasticsearch.
Enterprise-level ODS/EDW/data mart scenarios: the emphasis is on real-time analysis of big data based on a relational database, often used for business data integration; you can choose Greenplum.
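
The rules of thumb above can be condensed into a small lookup, shown here only as a summary of the text and not as a sizing guide. The function name, the scenario labels, and the exact 100 million cutoff are assumptions made for the sketch; the text does not say what to do between "millions" and "100 million" rows, so the sketch treats everything below the cutoff as the MySQL case.

```python
LARGE_ROW_COUNT = 100_000_000  # assumed cutoff, taken from "over 100 million"

OLAP_CHOICES = {
    "realtime_compute": "Storm",
    "batch_compute": "Hadoop",
    "realtime_query": "Solr/Elasticsearch",
    "enterprise_warehouse": "Greenplum",
}


def suggest_store(row_count, workload, scenario=None):
    """Return the tool the text suggests for a given size and workload."""
    if workload not in ("OLTP", "OLAP"):
        raise ValueError("workload must be 'OLTP' or 'OLAP'")
    # Small data, or transactional work at any size: stay with MySQL.
    if row_count < LARGE_ROW_COUNT or workload == "OLTP":
        return "MySQL"
    # Large analytical workloads depend on the scenario.
    if scenario not in OLAP_CHOICES:
        raise ValueError("large OLAP workloads need a known scenario")
    return OLAP_CHOICES[scenario]
```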

Database systems are generally divided into two types:
one serves front-end applications, where the application is relatively simple but throughput and concurrency are heavy (the OLTP type);
the other is computation-heavy and runs statistics and analysis over large data sets (the OLAP type).

Traditional databases focus on transaction processing, that is, OLTP, where many users read and write at the same time. To keep responses immediate, the system handles data allocation and read/write operations through memory, and I/O becomes the bottleneck.
An OLTP (On-Line Transaction Processing) system is also called a production system; it is event-driven and application-oriented. For example, the transaction system of an e-commerce website is a typical OLTP system. The basic characteristics of OLTP are:
data is generated in the system;
it is transaction-based;
the amount of data involved in each transaction is very small;
response-time requirements are very strict;
the number of users is very large, and they are mainly operators;
database operations are mainly based on indexes.

Analytical databases are based on real-time multi-dimensional analysis technology, that is, they focus on OLAP: they model and summarize data from multiple angles to obtain the information and knowledge contained in the data.
OLAP (On-Line Analytical Processing) is an information analysis and processing process based on a data warehouse, and is the user interface part of the data warehouse. An OLAP system is cross-departmental and subject-oriented, and its basic characteristics are:
it does not generate data itself; its base data comes from the operational data in the production system;
it is a query-based analysis system;
complex queries often use multi-table joins, full table scans, and so on, and the amount of data involved is often very large;
response time depends heavily on the specific query;
the number of users is relatively small, mainly business and management staff.
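
The contrast between the two access patterns can be shown on a tiny scale. The sketch below uses SQLite from Python's standard library purely as a stand-in; the `customers` and `orders` tables and their contents are made up for illustration, and nothing here reflects how MySQL, Greenplum, or HBase behave at scale.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers (id INTEGER PRIMARY KEY, region TEXT);
CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL);
""")
conn.executemany("INSERT INTO customers VALUES (?, ?)",
                 [(1, "north"), (2, "south")])
conn.executemany("INSERT INTO orders VALUES (?, ?, ?)",
                 [(10, 1, 20.0), (11, 1, 5.0), (12, 2, 7.5)])

# OLTP style: a short transaction that touches one row through the
# primary-key index.
with conn:
    conn.execute("UPDATE orders SET amount = ? WHERE id = ?", (25.0, 10))
oltp_row = conn.execute(
    "SELECT amount FROM orders WHERE id = ?", (10,)).fetchone()

# OLAP style: a read-only query that joins tables and aggregates over
# every order row.
olap_rows = conn.execute("""
SELECT c.region, COUNT(*), SUM(o.amount)
FROM orders o JOIN customers c ON c.id = o.customer_id
GROUP BY c.region
ORDER BY c.region
""").fetchall()
```

The first query costs roughly the same however large `orders` grows, while the second has to read the whole table, which is why its response time depends so strongly on the query and the data volume.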
