Introduction and comparison of OLAP and OLTP

Introduction to OLTP and OLAP

    Data processing can be roughly divided into two categories: online transaction processing (OLTP) and online analytical processing (OLAP). OLTP is the main application of traditional relational databases; it handles basic, day-to-day transactions, such as bank transactions. OLAP is the main application of data warehouse systems; it supports complex analytical operations, focuses on decision support, and provides intuitive, easy-to-understand query results.

An OLTP system emphasizes memory efficiency, the hit ratios of the various memory caches, bind variables, and concurrency; an
OLAP system emphasizes data analysis, SQL execution time, disk I/O, and partitioning.

Comparison between OLTP and OLAP:

    OLTP (Online Transaction Processing) systems are highly transactional, generally highly available online systems dominated by small transactions and small queries. Such a system is usually evaluated by the number of transactions and SQL statements executed per second. A single database in this class often processes several hundred to several thousand transactions per second and executes thousands or even tens of thousands of SELECT statements per second. Typical OLTP systems include e-commerce, banking, and securities systems; the business database of eBay in the United States, for example, is a typical OLTP database.
The most likely bottlenecks in OLTP systems are the CPU and the disk subsystem.
(1) A CPU bottleneck usually shows up in the total amount of logical reads and in computational functions or procedures. The total logical reads equal the logical reads of a single statement multiplied by its number of executions, so even a statement that executes quickly can produce a large total if it runs very often. The design and optimization approach is therefore to reduce the logical reads of individual statements, or to reduce how often they run. In addition, heavy use of computational functions, such as custom functions or DECODE, also consumes a lot of CPU time and raises the system load. The correct design or optimization approach is to avoid repeated calculation as much as possible, for example by saving the calculated results to a summary (statistics) table, as sketched after item (2) below.
(2) In an OLTP environment, the capacity of the disk subsystem generally depends on its IOPS. Physical reads in OLTP are mostly db file sequential read, that is, single-block reads, but these reads are very frequent; if the disk subsystem cannot sustain the required IOPS, serious performance problems follow.
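    Returning to point (1), here is a minimal sketch of the "save the calculation results" idea; the table and column names (sales, daily_sales_summary, sale_time, amount) are invented purely for illustration:
        -- run the expensive aggregation once per period (e.g. nightly)
        create table daily_sales_summary as
          select trunc(sale_time) as sale_day, sum(amount) as total_amount
            from sales
           group by trunc(sale_time);

        -- OLTP sessions read the precomputed row instead of re-aggregating
        select total_amount
          from daily_sales_summary
         where sale_day = trunc(sysdate);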
    The common design and optimization techniques for OLTP are caching and B-tree indexes. A cache means that many statements never need to fetch data from the disk subsystem, so web caches and the Oracle buffer cache are very important to OLTP systems. As for index usage, the simpler the statement the better, so that execution plans stay stable; bind variables must be used to reduce statement parsing; table joins and distributed transactions should be minimized; and partitioning, materialized views (MV), parallelism, and bitmap indexes are basically not used. Because concurrency is high, batch updates should be committed quickly in small batches to avoid blocking.
An OLTP system is one in which data blocks change very frequently and SQL statements are submitted very frequently. For data blocks, keep them in memory as much as possible; for SQL, use bind variables as much as possible so that statements are reused, which reduces physical I/O and repeated parsing and greatly improves database performance.
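    As a rough illustration (the customers table is hypothetical), the difference between literal values and a bind variable in SQL*Plus looks something like this:
        -- literal values: each statement is distinct and must be hard parsed
        select * from customers where customer_id = 1001;
        select * from customers where customer_id = 1002;

        -- bind variable: one shared cursor, parsed once and reused
        variable cust_id number
        exec :cust_id := 1001
        select * from customers where customer_id = :cust_id;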
    Besides missing bind variables, hot blocks can also hurt performance. When a block is read by many users at the same time, Oracle must use latches to serialize their operations in order to maintain consistency; while one user holds the latch, the others can only wait, and the more users touch the block, the more obvious the waiting becomes. That is the hot-block problem. The hot block may be a data block or a rollback segment block. For data blocks, the cause is usually uneven data distribution; if it is an index block, a reverse key index can be created to redistribute the keys. For rollback segment blocks, a few more rollback segments can be added to avoid the contention.
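    For the index hot-block case, the reverse key index mentioned above can be created as in the sketch below (the orders table and index name are assumptions):
        -- sequentially generated keys all hit the same right-most leaf block;
        -- REVERSE stores the key bytes reversed, spreading inserts across blocks
        create index idx_orders_id_rev on orders (order_id) reverse;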
    OLAP (Online Analytical Processing) systems, sometimes called DSS (decision support systems), are what we usually call data warehouses. In such a system the number of statements executed is not the evaluation criterion, because a single statement may run for a very long time and read a very large amount of data. The evaluation criterion is therefore the throughput (bandwidth) of the disk subsystem, for example how many MB/s it can sustain.
    Disk throughput often depends on the number of disks. The cache is largely ineffective here, and the dominant read/write types are db file scattered read and direct path read/write. One should therefore use as many disks and as much bandwidth as possible, such as 4Gb Fibre Channel interfaces.
In OLAP systems, partition technology and parallel technology are often used.
    The importance of partitioning in OLAP systems lies mainly in management: data loading can be implemented with partition exchange, backups can be taken per partitioned tablespace, and old data can be removed by dropping partitions. As for performance, partitioning can make some scans of large tables very fast (scanning only a single partition), and combined with parallelism it can also make full-table scans fast. In short, the main benefit of partitioning is manageability; it does not guarantee better query performance, since sometimes it improves performance and sometimes it degrades it.
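    A sketch of these management operations, assuming a range-partitioned fact table; every name below (sales_fact, sales_stage, the partition names) is made up for the example:
        create table sales_fact (
          sale_id      number,
          customer_id  number,
          region_code  varchar2(10),
          sale_day     date,
          amount       number
        )
        partition by range (sale_day) (
          partition p_2023_q1 values less than (date '2023-04-01'),
          partition p_2023_q2 values less than (date '2023-07-01')
        );

        -- load: swap a fully loaded staging table in as a partition
        alter table sales_fact exchange partition p_2023_q2 with table sales_stage;

        -- purge: remove old data by dropping a partition instead of deleting rows
        alter table sales_fact drop partition p_2023_q1;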
    Besides combining parallelism with partitioning, Oracle 10g can combine parallelism with RAC so that multiple nodes scan at the same time, which also works very well: a task such as a parallel full-table scan of a SELECT can be distributed evenly across the RAC nodes.
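    A parallel scan can be requested with hints, for example (the table name follows the sketch above, and a degree of 8 is an arbitrary choice for illustration):
        select /*+ full(s) parallel(s, 8) */ count(*)
          from sales_fact s;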
    In an OLAP system there is no need to use bind variables, because the total number of executions is small, parse time is negligible compared with execution time, and letting the optimizer see the literal values helps avoid a wrong execution plan. OLAP systems, however, can make heavy use of bitmap indexes and materialized views. For large transactions the goal is overall speed; there is no need to commit as quickly as in OLTP, and execution may even be deliberately throttled.
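    Typical OLAP-side objects might look like the sketch below (the column, index, and view names are assumptions): a bitmap index on a low-cardinality column, and a materialized view that stores a precomputed aggregate.
        -- bitmap index on a low-cardinality dimension column
        create bitmap index bix_sales_region on sales_fact (region_code);

        -- materialized view holding a precomputed daily aggregate
        create materialized view mv_sales_by_day
          build immediate
          refresh complete on demand
        as
          select sale_day, sum(amount) as total_amount
            from sales_fact
           group by sale_day;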
    The real home of bind variables is the OLTP system, which typically has a very large number of concurrent users, very dense requests, and SQL that is mostly reusable across those requests.
    In an OLAP system the database spends most of its time running report jobs and executing aggregate SQL such as GROUP BY, so it is appropriate to set the optimizer mode to all_rows. For website databases with many paging operations, first_rows is the better setting. Sometimes, however, even in an OLAP system with paging it is worth putting a hint in the individual SQL statements. For example:
    select /*+ first_rows(20) */ a.* from some_table a;
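    For paging in particular, a common Oracle pattern (again only a sketch; orders and order_date are placeholder names) combines the first_rows hint with ROWNUM to fetch, say, rows 21 to 40:
        select *
          from (select /*+ first_rows(20) */ t.*, rownum as rn
                  from (select * from orders order by order_date desc) t
                 where rownum <= 40)
         where rn > 20;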
Designing and optimizing OLTP and OLAP separately
    Special attention must be paid at the design stage. For example, in a highly available OLTP environment, do not blindly borrow OLAP techniques.
    For example, if partitioning is used but most queries filter on fields other than the partition key, then a local index forces the index of every partition to be scanned and performance becomes even worse, while a global index loses much of the point of partitioning.
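    The difference shows up directly in the index definitions; reusing the hypothetical sales_fact table from the earlier sketch:
        -- local index: one index segment per partition; a query that does not
        -- prune on sale_day has to probe the index of every partition
        create index ix_sales_cust_loc on sales_fact (customer_id) local;

        -- global (non-partitioned) index: a single segment spanning all partitions;
        -- partition maintenance such as DROP or EXCHANGE can invalidate it unless
        -- it is rebuilt or maintained with the UPDATE INDEXES clause
        create index ix_sales_cust_glb on sales_fact (customer_id);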
    The same applies to parallelism, which is generally suited to large tasks. In real life, for example, when translating a book you can assign several people, each translating different chapters, which speeds up the translation. But if only one page needs translating, splitting it line by line among different people and then merging the results is pointless, because one person could have finished the whole page in the time it takes to hand out the work.
    The same is true for bitmap indexes: used in an OLTP environment they easily cause blocking and deadlocks, whereas in an OLAP environment their characteristics can improve query speed. Materialized views are much the same, as are triggers: in an OLTP system with frequent DML they easily become a bottleneck, even causing library cache waits, while in an OLAP environment proper use of them may improve query speed.
    For OLAP systems there is little room for optimization in memory; increasing CPU processing speed and disk I/O speed is the most direct way to improve database performance, which of course also means higher system cost.
    For example, when we need to aggregate hundreds of millions or billions of rows, it is difficult and unnecessary to hold all of this data in memory: the data is rarely reused, caching it makes little sense, and the physical I/O is inevitably large. The bottleneck of such a system is therefore usually disk I/O.
    For OLAP systems, SQL optimization is very important, because the data volume is very large and the performance gap between a full table scan and an index access is huge.
Other
    In versions before Oracle 10g, the templates that can be selected when creating a database are:
        Data Warehouse (data warehouse)
        General Purpose (general purpose, general purpose)
        New Database
        Transaction Processing (transaction processing)
    In Oracle 11g, the templates available when creating a database are:
        General Purpose or Transaction Processing
        Custom Database

        Data Warehouse (data warehouse)

Personal understanding of these templates is:

     Online analytical processing (OLAP): large data volume, little DML. Use the Data Warehouse template.
     Online transaction processing (OLTP): small data volume, frequent DML, many concurrent transactions that are generally very short. Use the General Purpose or Transaction Processing template.

     Decision support system (DSS): typical operations are full table scans, long queries, and long transactions, but the number of concurrent transactions is generally small, and a single transaction often has the system to itself.
