De-Oracle Database Assessment with a Self-Developed Tool: The Database Portrait

"De-Oracle" (moving off Oracle) has been a hot topic in recent years. It raises many questions, such as how to assess the existing database and which technology to choose. De-Oracle is a systematic project that must be assessed thoroughly. This article shows how a self-developed tool generates a "portrait" of a database, which provides first-hand data for the de-Oracle assessment. We hope you find it useful.

First, common questions

Companies considering de-Oracle often face the question of whether they understand their own databases well enough, and they inevitably have doubts like these:

[Managers]

  • How high is the cost of de-Oracle?
  • Is the workload heavy?
  • How long will it take?
  • What are the risks, if any?

[Architects]

  • Can MySQL carry our existing business?
  • Are there any technical risks?
  • Do we need to introduce database and table sharding?
  • Do we need to introduce a cache?
  • How complex is the development work?
  • How much time do we need to invest?
  • What are the data access characteristics?
  • How does the data volume compare before and after migration?

[Developers]

  • How complex is the SQL?
  • Is the amount of rework large?
  • Do we use Oracle dialect features or proprietary objects that need rework?
  • And so on

Faced with these questions, we need to quickly understand the existing Oracle objects, statements, access characteristics, performance and so on, in order to evaluate the technical solution and the workload of the subsequent migration. In other words, we need to draw a "portrait" of our database. Such a portrait guides the whole de-Oracle lifecycle and is especially valuable in the following phases:

  • Decision phase: overall difficulty, cost (person-hours), technical risk
  • Architecture phase: technical solution, object structure, performance evaluation
  • R&D phase: compatibility, complexity, testing
  • Migration phase: schema migration, data migration, data validation

To meet this demand, some vendors have launched assessment products, such as Alibaba's database and application migration service (ADAM). However, such products usually require deploying an agent, uploading analysis packages and so on, which is not feasible for security-sensitive enterprises. My company started its de-Oracle work two years ago and faced exactly this problem, so we developed a portable, install-free tool that runs locally to make assessment easier.

Repository: https://github.com/bjbean/oracle-estimate-report

Second, design ideas

The tool collects and aggregates information from the Oracle database in six areas (environment, space, objects, access characteristics, resource consumption and SQL statements), giving a comprehensive picture of how the database actually runs. To make collection more targeted, the tool provides a section for setting threshold parameters. After the collection is run from the command line, it produces a web-based assessment report that presents the results visually. The report serves not only as the basis for the de-Oracle assessment but also as reference data for the subsequent rework.

Third, interpreting the portrait

The following sections interpret the report data, using MySQL, the most common de-Oracle target, as the reference.

3.1 Summary Information

Figure 1

This section shows summary information about the collection target, including IP, instance, users and so on. Note the analysis time: the script extracts database characteristics from the most recent 24 hours, so it is recommended to run it after the business peak.

3.2 Space Information

Figure 2

Space is one of the important considerations in database selection, and it also affects the subsequent migration. A large database should be considered for splitting, with the principle of keeping each single database as small as possible. Splitting can generally follow these principles, in order of priority:

1) Business layer: vertical split

At the application level, split the data by business line. For example, an e-commerce platform can be split into orders, users, products, inventory and so on. Each resulting part should be highly cohesive, with no strong data dependencies on the others.

2) Business layer: horizontal split

Within a single business, establish lifecycle management for the data and stratify it into hot and cold tiers. Data in different tiers has different access characteristics and can be split further. For example, orders on an e-commerce platform can be divided into active orders (within two weeks, still returnable), inactive orders (two weeks to six months, still accessible to customers) and historical orders (older than six months).
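As a minimal sketch, assuming each order row carries a creation timestamp, the lifecycle tiers from the e-commerce example can be expressed as one routing function. The function name `order_tier` is hypothetical, and "six months" is approximated as 183 days.

```python
from datetime import datetime, timedelta

# Tier boundaries from the e-commerce example in the text (illustrative values):
# active < 2 weeks, inactive from 2 weeks to about 6 months, history beyond that.
ACTIVE_WINDOW = timedelta(weeks=2)
INACTIVE_WINDOW = timedelta(days=183)  # "six months" approximated as 183 days


def order_tier(created_at: datetime, now: datetime) -> str:
    """Return the storage tier ("active", "inactive" or "history") for an order."""
    age = now - created_at
    if age < ACTIVE_WINDOW:
        return "active"
    if age <= INACTIVE_WINDOW:
        return "inactive"
    return "history"


if __name__ == "__main__":
    now = datetime(2024, 7, 1)
    for created in (datetime(2024, 6, 25), datetime(2024, 5, 1), datetime(2023, 1, 1)):
        print(created.date(), order_tier(created, now))
```

In practice a periodic job would move rows whose tier has changed into the table or database serving that tier, so each tier can be sized and indexed for its own access pattern.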

3) Application layer: database and table sharding

If a single database is still large after the splits above, consider database and table sharding. The usual practice is to introduce a database middleware layer that presents one logical database which is physically divided into multiple databases. This is a less "elegant" solution because it is hard to make it transparent to the application; in other words, it is a compromise in which development gives up some database capabilities. Common technical solutions fall into three modes: Client, Proxy and SideCar. The Proxy mode is generally recommended (SideCar mode could be considered for container deployments).
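Whatever the mode (Client, Proxy or SideCar), the middleware's core job is to map a shard key to a physical database and table. The sketch below assumes a hypothetical layout of 4 databases × 8 tables sharded by `user_id` with simple modulo routing; names such as `order_db_N` and `t_order_N` are illustrative, and real middleware usually also offers range or consistent-hash strategies.

```python
# Hypothetical layout: 4 physical databases with 8 tables each = 32 shards in total.
DB_COUNT = 4
TABLES_PER_DB = 8


def route(user_id: int) -> tuple[str, str]:
    """Map a shard key to a (database, table) pair using modulo over all shards."""
    shard = user_id % (DB_COUNT * TABLES_PER_DB)
    db_index = shard // TABLES_PER_DB  # consecutive shards share a database
    table_index = shard % TABLES_PER_DB
    return f"order_db_{db_index}", f"t_order_{table_index}"


if __name__ == "__main__":
    for uid in (0, 9, 31, 33):
        print(uid, route(uid))
```

A query that carries `user_id` touches exactly one table; a query that does not must be fanned out to all 32 tables and the results merged, which is one concrete reason sharding is hard to make transparent to the application.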

4) Infrastructure layer: distributed database

A more radical approach than database and table sharding is to use a distributed database directly, which provides a larger carrier (in capacity and throughput). In recent years distributed databases have matured and are being adopted in production, and companies have begun trying them in key scenarios.

3.3 Object Information

Figure 3

Each type of Oracle object raises different considerations during rework. The report gives summary data as well as detailed data for easy lookup.

1) Table

An excessive number of tables directly increases the size of the data dictionary and thus affects the overall efficiency of the database. For MySQL, issues such as file handles must also be considered. This indicator is not clear-cut and should be judged case by case. The main point is to plan data distribution at the architecture level and avoid putting too many tables in one database. In one case we experienced, a single database with 100,000 tables performed poorly; optimization consolidated them into 20,000 tables. If you choose MySQL, we recommend no more than 5,000 tables per database and no more than 20,000 tables in total (number of databases × tables per database).
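The two MySQL limits above are easy to check mechanically. A minimal sketch, assuming you already have per-schema table counts (for example from a `GROUP BY owner` count over Oracle's `DBA_TABLES` view); the function name `check_table_counts` is hypothetical, and the limits are this section's recommendations.

```python
MAX_TABLES_PER_DB = 5_000   # recommended ceiling for one MySQL database
MAX_TABLES_TOTAL = 20_000   # recommended ceiling for all databases combined


def check_table_counts(tables_per_db: dict[str, int]) -> list[str]:
    """Return one warning for every limit the planned layout exceeds."""
    warnings = []
    for db, count in sorted(tables_per_db.items()):
        if count > MAX_TABLES_PER_DB:
            warnings.append(f"{db}: {count} tables exceeds {MAX_TABLES_PER_DB}")
    total = sum(tables_per_db.values())
    if total > MAX_TABLES_TOTAL:
        warnings.append(f"total: {total} tables exceeds {MAX_TABLES_TOTAL}")
    return warnings


if __name__ == "__main__":
    print(check_table_counts({"crm": 6_000, "order": 15_000}))
```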

2) Table (large table)

Controlling the size of each table is one of the main points of design, as it directly affects access performance. Large tables should be split following the principles above. There is no universal rule for what counts as a large table, so the threshold is configurable by parameter, in two dimensions: physical size or number of rows. The key factor is how the table is accessed: for simple key-value access, a larger size is acceptable; for more complex access, set a lower threshold. If you choose MySQL, complex queries on large tables or across many joined tables are not a good fit; consider offloading such queries asynchronously to ES (Elasticsearch), Solr + HBase or similar.
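A minimal sketch of the two-dimensional threshold described above, with a stricter limit for tables that see complex access. The `TableStats` fields, the `kv_access` flag and all threshold numbers are hypothetical placeholders, not values from the tool and not recommendations; in the tool, thresholds are set through its parameters.

```python
from dataclasses import dataclass


@dataclass
class TableStats:
    name: str
    size_mb: float   # physical (segment) size in MB
    num_rows: int    # row count, e.g. from optimizer statistics
    kv_access: bool  # True if the table is read only by simple key lookups


@dataclass
class Threshold:
    size_mb: float
    num_rows: int


# Placeholder values for illustration only; tune them per system.
KV_LIMIT = Threshold(size_mb=50_000, num_rows=500_000_000)
COMPLEX_LIMIT = Threshold(size_mb=10_000, num_rows=50_000_000)


def is_large(t: TableStats) -> bool:
    """Flag a table that exceeds either dimension of the threshold for its access type."""
    limit = KV_LIMIT if t.kv_access else COMPLEX_LIMIT
    return t.size_mb > limit.size_mb or t.num_rows > limit.num_rows
```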

3) Table (partitioned tables)

Oracle's partitioning has matured steadily through 9i, 10g and later releases, and is regarded as Oracle's weapon for handling massive data. For MySQL, however, partitioning is still not recommended. On the one hand, as hardware improves, the capacity a single table can bear keeps growing; on the other hand, partitioning in MySQL brings problems such as "DDL amplification" and locking during changes. If the team has good control of the database middleware layer, the less complex technique of splitting into multiple tables is recommended instead. This may slightly increase development work, but it brings many advantages for operations and maintenance.

4) Column (large objects)

Large objects (LOBs) are not recommended in any database. If you use them, take the opportunity of the rework to get rid of them quickly. The large-object feature of a database offers little real value; the database's ACID capabilities should be reserved for storing the data that matters most.

Figure 4

5) Index (B-tree)

Too many indexes hurt DML efficiency and take up a lot of space. The "index/table" ratio generally reflects whether the number of indexes is reasonable. There is no recommended value; judge it case by case. Every database faces the same underlying problem: how to build a sound indexing strategy. For this, you can refer to the table from the book "Mass Database Solutions" (published by Huazhang) to sort out index requirements, then create and maintain indexes systematically.
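The "index/table" ratio is simple arithmetic: total indexes divided by total tables. A sketch, assuming a per-table index count (for example from Oracle's `DBA_INDEXES` grouped by `TABLE_NAME`, with unindexed tables from `DBA_TABLES` added at zero, since they do not appear in `DBA_INDEXES`). Because the text gives no recommended value, the per-table `limit` used to list outliers is a hypothetical parameter you choose.

```python
def index_table_ratio(index_counts: dict[str, int]) -> float:
    """Average number of indexes per table (tables without indexes count as 0)."""
    if not index_counts:
        return 0.0
    return sum(index_counts.values()) / len(index_counts)


def heavily_indexed(index_counts: dict[str, int], limit: int) -> list[str]:
    """Tables carrying more indexes than the chosen (hypothetical) limit."""
    return sorted(t for t, n in index_counts.items() if n > limit)


if __name__ == "__main__":
    counts = {"T_ORDER": 7, "T_USER": 3, "T_LOG": 0}
    print(index_table_ratio(counts), heavily_indexed(counts, 5))
```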

6) Index (Other)

Besides the usual B+ tree index, Oracle supports other index types (such as bitmap and function-based indexes). If you choose a different database, these indexes need rework and must be replaced by other means.

7) View

A view is a logical encapsulation of a SQL statement and is useful in some scenarios (such as security). However, views place high demands on the optimizer, and Oracle has done a lot of work in this area (see the author's book "SQL Optimization Best Practices"). For MySQL, views are not recommended; consider reworking them.

8) Trigger / stored procedure / function

A database provides two capabilities: computation and storage. Since the database is the hardest component of the overall infrastructure to scale out, it is important to let it focus on its core capability. Unlike storage, computation can be moved to the application layer, which is usually easy to scale out. Considering future maintainability, portability and other factors as well, this logic is best handled on the application side.

Figure 5

9) Sequence

An Oracle sequence provides an increasing number service that does not guarantee contiguous values. MySQL offers a similar mechanism through the auto-increment (AUTO_INCREMENT) attribute. This part should migrate readily, but if concurrency is very high, you may also consider an ID-generator (number-issuing) service.
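The text does not specify an ID-generator design. As one widely used option, here is a minimal Snowflake-style generator (41-bit millisecond timestamp, 10-bit worker ID, 12-bit sequence) that, like an Oracle sequence, yields increasing but not contiguous numbers. The custom epoch and bit widths are assumptions; in a real deployment each generator instance needs a unique worker ID and a policy for clock rollback.

```python
import threading
import time


class SnowflakeIdGenerator:
    """Snowflake-style 64-bit IDs: ms timestamp | 10-bit worker ID | 12-bit sequence."""

    EPOCH_MS = 1_577_836_800_000  # custom epoch: 2020-01-01T00:00:00Z (an assumption)
    WORKER_BITS = 10
    SEQUENCE_BITS = 12
    MAX_SEQUENCE = (1 << SEQUENCE_BITS) - 1

    def __init__(self, worker_id: int) -> None:
        if not 0 <= worker_id < (1 << self.WORKER_BITS):
            raise ValueError("worker_id out of range")
        self.worker_id = worker_id
        self.last_ms = -1
        self.sequence = 0
        self._lock = threading.Lock()

    @staticmethod
    def _now_ms() -> int:
        return time.time_ns() // 1_000_000

    def next_id(self) -> int:
        with self._lock:
            now = self._now_ms()
            if now < self.last_ms:
                raise RuntimeError("clock moved backwards; refusing to issue IDs")
            if now == self.last_ms:
                self.sequence = (self.sequence + 1) & self.MAX_SEQUENCE
                if self.sequence == 0:
                    # 4096 IDs already issued this millisecond: wait for the next one.
                    while now <= self.last_ms:
                        now = self._now_ms()
            else:
                self.sequence = 0
            self.last_ms = now
            return (
                ((now - self.EPOCH_MS) << (self.WORKER_BITS + self.SEQUENCE_BITS))
                | (self.worker_id << self.SEQUENCE_BITS)
                | self.sequence
            )
```

With a 41-bit timestamp the top bit stays 0 for about 69 years after the epoch, so the IDs fit a signed MySQL BIGINT column.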

10) Synonyms

Synonyms are a symptom of data coupling and should be abandoned whatever the database. Business splitting should be carried through completely, so that nothing depends on such features any more.

3.4 Access Characteristics

Figure 6

This section lists the Top 20 objects by DML count in the past 24 hours, which directly reflects the current "hot spots" of the target system. After selection, these objects need a focused performance assessment before migration, and splitting, caching and other means can be considered to relieve the pressure on them. Beyond these objects, we further recommend building a "business pressure model": after fully understanding and evaluating the business, abstract the business logic and convert it into a pressure data model. The difficulty lies in abstracting the business logic and estimating the traffic share of each module.

The model takes a form similar to the following pseudocode:

Figure 7

Stress test code can be written from this pseudocode and driven by load tools to simulate traffic. This is valuable for assessing system rework, upgrades, capacity expansion, new hardware selection and so on. For de-Oracle in particular, this method can verify whether a new technical solution meets the requirements, and it expresses in business terms how carrying capacity changes before and after de-Oracle. This is one of the factors in deciding whether the solution is technically feasible. Of course, the information above covers only DML, not queries; query data can be obtained from Oracle AWR. For a more complete picture, consider running full-link stress tests together with the application.
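Figure 7's pseudocode is not reproduced in this text, so the following is an independent sketch of the same idea: business actions with traffic shares, each expanded into the DML it issues. All action names, table names and weights are hypothetical. Instead of executing statements, it tallies the DML mix a run would produce, which can be compared with the Top 20 DML statistics to calibrate the weights before wiring the actions to a real load tool.

```python
import random
from collections import Counter

# Hypothetical business pressure model: each action has a share of total traffic
# and the list of DML operations it issues against hot tables.
PRESSURE_MODEL = {
    "place_order":  {"weight": 0.5, "dml": [("INSERT", "t_order"), ("UPDATE", "t_stock")]},
    "pay_order":    {"weight": 0.3, "dml": [("UPDATE", "t_order"), ("INSERT", "t_payment")]},
    "cancel_order": {"weight": 0.2, "dml": [("UPDATE", "t_order"), ("UPDATE", "t_stock")]},
}


def generate_load(n_actions: int, seed: int = 0) -> Counter:
    """Draw n business actions by weight and tally the DML operations they produce."""
    rng = random.Random(seed)
    names = list(PRESSURE_MODEL)
    weights = [PRESSURE_MODEL[name]["weight"] for name in names]
    dml_counts: Counter = Counter()
    for action in rng.choices(names, weights=weights, k=n_actions):
        for op in PRESSURE_MODEL[action]["dml"]:
            dml_counts[op] += 1
    return dml_counts


if __name__ == "__main__":
    for (verb, table), n in sorted(generate_load(10_000).items()):
        print(f"{verb:6} {table:10} {n}")
```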

3.5 Resource Consumption

Figure 8

This section shows resource usage over the most recent 24 hours. The data serves two main purposes:

1) Assessing overall load

Because these indicators are Oracle-specific measures, they cannot be compared directly with other databases. However, expertise combined with historical data can be used to evaluate load pressure, which serves as one basis for assessing alternative technical solutions. Some of these indicators (such as user calls) can also be converted into quantitative targets to guide subsequent testing and other work.
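Converting a cumulative Oracle statistic into a per-second rate is what turns it into a test target. A minimal sketch, assuming time-ordered snapshots of one cumulative counter such as `user calls` (for example taken from AWR snapshots). It reports both the window average and the busiest interval, since an average alone understates the peak the new system must carry. The function name `call_rates` is hypothetical.

```python
def call_rates(snapshots: list[tuple[float, int]]) -> tuple[float, float]:
    """Return (average, peak) per-second rates from cumulative counter snapshots.

    snapshots: (epoch_seconds, cumulative_value) pairs in time order. Assumes the
    counter was not reset (e.g. by an instance restart) inside the window.
    """
    if len(snapshots) < 2:
        raise ValueError("need at least two snapshots")
    interval_rates = [
        (v1 - v0) / (t1 - t0)
        for (t0, v0), (t1, v1) in zip(snapshots, snapshots[1:])
    ]
    (t_first, v_first), (t_last, v_last) = snapshots[0], snapshots[-1]
    average = (v_last - v_first) / (t_last - t_first)
    return average, max(interval_rates)


if __name__ == "__main__":
    hourly = [(0, 0), (3600, 3_600), (7200, 14_400)]
    print(call_rates(hourly))
```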

2) Evaluating bottlenecks

A particularly prominent indicator points to a bottleneck in the existing business. During migration, consider alternatives for it as early as the design phase and focus on it during testing, to reduce technical risk as much as possible.

3.6 SQL Statements

Figure 9

Rewriting SQL statements is the most difficult part of the entire migration. Unless the system is completely rebuilt, SQL rewriting is a task that demands attention. It involves the number of statements to rewrite, their complexity, performance comparison and more, and much of the screening still has to be done manually.

I once worked on a project in which the team spent one month completing the "schema + SQL" migration, but then needed another three months for statement optimization and even restructuring. The reason was that after the migration went live, statements could not meet performance requirements, so tuning had to be done while the system was running in production, which was very painful. Therefore, identifying the existing SQL early and assessing the rewrite workload, difficulty and performance is very important. This part of the report collects and analyzes all of the user's historical SQL (turn on the details switch to display all statements), covering the following dimensions.

1) Total number of SQL statements

This indicator roughly reflects how busy the business is. It also serves as the base for computing the proportions of the problem statements analyzed below.

2) Long SQL

These are statements longer than a specified number of characters; the threshold is configurable by parameter. If you are considering MySQL, keep SQL "small and simple", because MySQL generally performs poorly on complex SQL. Long statements therefore deserve attention; at the very least, they are prone to problems.

3) Anti-join SQL

Anti-join queries (such as NOT IN or NOT EXISTS subqueries) are hard for any database to handle and put the optimizer to the test. Although newer MySQL versions optimize anti-joins well, this category still deserves attention.

4) Oracle Syntax SQL

Oracle-specific syntax, that is, the Oracle dialect (such as specific functions and pseudo-columns), must be reworked during migration. Some vendors now claim their products are compatible with Oracle syntax, but these statements should still be tested specifically.

5) Join 3+ Table SQL

Multi-table joins are a demanding test of the optimizer. MySQL is particularly inefficient at joins, so joining more than two tables is not recommended. The queries listed here join three or more tables and should be considered for rework. Complex queries in particular can be offloaded to a big data platform.

6) Subquery SQL

MySQL does not handle subqueries well. Although the optimizer can rewrite them to some extent, they still deserve attention.
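The article does not describe how the tool detects each category. As a rough illustration, a regex-based heuristic can tag captured statements (for example, SQL text from AWR's `DBA_HIST_SQLTEXT` view) along the dimensions above. The keyword list, the 1,000-character default and all names are assumptions for this sketch.

```python
import re

LONG_SQL_CHARS = 1_000  # hypothetical default; the tool makes this threshold configurable

# Oracle-only constructs with no direct MySQL equivalent (an assumed, incomplete list).
ORACLE_DIALECT = re.compile(
    r"\b(?:ROWNUM|ROWID|NVL2?|DECODE|TO_CHAR|TO_DATE|MINUS|CONNECT\s+BY)\b|\(\+\)",
    re.IGNORECASE,
)
ANTI_JOIN = re.compile(r"\bNOT\s+(?:IN|EXISTS)\s*\(\s*SELECT\b", re.IGNORECASE)
SUBQUERY = re.compile(r"\(\s*SELECT\b", re.IGNORECASE)
# Text after FROM up to the next WHERE / GROUP BY / ORDER BY, or the end of the statement.
FROM_CLAUSE = re.compile(
    r"\bFROM\b(.*?)(?:\bWHERE\b|\bGROUP\s+BY\b|\bORDER\s+BY\b|$)",
    re.IGNORECASE | re.DOTALL,
)
JOIN = re.compile(r"\bJOIN\b", re.IGNORECASE)


def table_refs(sql: str) -> int:
    """Crudely count table references: comma-separated FROM items plus JOIN keywords."""
    count = 0
    for match in FROM_CLAUSE.finditer(sql):
        clause = match.group(1)
        count += clause.count(",") + 1 + len(JOIN.findall(clause))
    return count


def classify(sql: str, long_threshold: int = LONG_SQL_CHARS) -> set[str]:
    """Return the report dimensions (besides the total count) a statement falls into."""
    tags = set()
    if len(sql) > long_threshold:
        tags.add("long")
    if ANTI_JOIN.search(sql):
        tags.add("anti")
    if ORACLE_DIALECT.search(sql):
        tags.add("oracle_syntax")
    if table_refs(sql) >= 3:
        tags.add("join_3plus")
    if SUBQUERY.search(sql):
        tags.add("subquery")
    return tags
```

Counting tags over all captured statements gives per-dimension totals, and dividing by the total SQL count gives the ratios mentioned in item 1. Regexes cannot reliably see through comments, string literals or nested queries, so a production tool would be better served by a real SQL parser.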

Author: Han Feng

First published on the author's WeChat official account, "Han Feng Channel", which you are welcome to follow.

Source: CreditEase Institute of Technology


Origin blog.51cto.com/14159827/2416985