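Database Consistency, ORA-01555, and UNDO_RETENTION

Database Consistency

Definition

  Database consistency means that the result of executing a transaction must take the database from one consistent state to another.

Overview

  Guaranteeing database consistency means that when a transaction completes, all data must be in a consistent state. In a relational database, every rule must be applied to the transaction's modifications so that the integrity of all data is maintained.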


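     Guaranteeing consistency is one of the functions of a database management system. Suppose there are two tables, employee and position. The employee table has attributes such as employee code, name, and position code; the position table has position code, position name, and position grade. You insert a new employee into the employee table, and this employee holds a position the company has only just created. Without a consistency guarantee (for example, a rule that the position code must already exist in the position table), you could end up with an employee whose role nobody can determine. This is only one small aspect of consistency.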


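     Read consistency is another important aspect of database consistency. In practice we run into this situation: we update some rows in a table but have not yet committed, and another user reads the table. This raises the read-consistency question: which version of the data should be read, the one before the update or the one after? The DBMS keeps the pre-update values in a separate storage area (in Oracle, the rollback or undo segments, rather than a temporary table in the usual sense). Until the change is committed, other readers are served those saved values, which keeps what they see consistent. (The user who made the change sees the updated values.)

    There is another case, though. User user1 updates a table, and user2 starts reading it before user1 commits, and reads a large volume of data (say it takes 3 minutes). During those 3 minutes user1 commits. What effect does that have, and how is read consistency preserved? The DBMS must keep enough space for the pre-update values so that everything user2 reads is the consistent data from before the change. The next time user2 reads, it will see the updated data.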

ORA-01555

A customer reported a problem: according to their log, the following error was thrown while executing a SQL statement:
java.sql.SQLException: ORA-01555: snapshot too old: rollback segment number 2 with name "_SYSSMU2$" too small


After some searching I finally understood this error a little. Below is an article that explains ORA-01555:
Oracle internals: read consistency and block cleanouts
Oracle always enforces statement-level read consistency. This guarantees that the data returned by a single query is consistent with respect to time when the query began. Therefore, a query never sees the data-changes made by transactions that commit during the course of execution of the query.
Oracle uniquely identifies any given point in time by a set of numbers called the System Change Numbers (SCN). So SCN can be defined as the state of the database at any one given point in time. To produce read-consistency, Oracle marks the current SCN as the query enters the execution phase. The query can only see the snapshot of the records as they were at the time of marked SCN.
Oracle uses rollback segments to reconstruct the read-consistent snapshot of the data. Whenever a transaction makes any changes, a snapshot of the record before the changes were made is copied to a rollback segment and the data block header is marked appropriately with the address of the rollback segment block where the changes are recorded. The data block also maintains the SCN of the last committed change to the block.

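[ To guarantee read consistency, Oracle serves a query's data from a snapshot for the whole of its execution, so the query ignores changes that other transactions make while it runs. The snapshot is maintained through the rollback segments: whenever a transaction modifies data, the database stores the before image of that data in rollback segment blocks. ]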
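To make the SCN mechanism more concrete, here is a minimal Python sketch of the idea, not Oracle's actual implementation. The `Block` class, its `undo` list, and the sample values are all hypothetical names invented for illustration; the only thing taken from the article is the rule that a query marked at a given SCN must see data as of that SCN, using saved before images when the block has changed since.

```python
from dataclasses import dataclass, field


@dataclass
class Block:
    """Toy data block: current value plus the SCN of its last committed change."""
    value: str
    scn: int
    # Before images, newest first: (scn_of_that_version, value_at_that_scn).
    undo: list = field(default_factory=list)

    def update(self, new_value: str, commit_scn: int) -> None:
        # Save the before image first, then apply the change.
        self.undo.insert(0, (self.scn, self.value))
        self.value, self.scn = new_value, commit_scn

    def read_consistent(self, query_scn: int) -> str:
        # A block not changed since the query began can be read directly.
        if self.scn <= query_scn:
            return self.value
        # Otherwise walk back through the before images to the query's SCN.
        for version_scn, old_value in self.undo:
            if version_scn <= query_scn:
                return old_value
        raise LookupError("no before image old enough for this query")


block = Block(value="salary=1000", scn=5)
query_scn = 10                                  # a query starts and marks SCN 10
block.update("salary=2000", commit_scn=20)      # another transaction commits at SCN 20

seen_by_old_query = block.read_consistent(query_scn)   # still sees the SCN 10 view
seen_by_new_query = block.read_consistent(25)          # a later query sees the update
print(seen_by_old_query)
print(seen_by_new_query)
```

The query that started at SCN 10 keeps seeing the old value even though the change was committed while it was running, which is the statement-level read consistency the article describes.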


Causes of ORA-01555 and how to resolve it
As the data blocks are read on behalf of the query, only blocks with lower SCN than the query SCN will be read. If a block has uncommitted changes of other transactions or changed data with more recent SCN, then the data is reconstructed using the saved snapshot from the rollback segments. In some rare situations, if RDBMS is not able to reconstruct the snapshot for a long running query, the query results in an ORA-01555 error.
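[ A query reads directly only those data blocks whose SCN is lower than its own SCN (assigned when the query starts), that is, blocks that have not been modified while it runs. If a block has been modified since then (its SCN is higher) or holds uncommitted changes, the query's view of that block is rebuilt from the before images saved in the rollback segments. If a long-running query can no longer rebuild that view because the before images it needs are gone, it fails with ORA-01555. ]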


A rollback segment maintains the snapshot of the changed data as long as the transaction is still active (commit or rollback has not been issued). Once a transaction is committed, RDBMS marks it with current SCN and the space used by the snapshot becomes available for reuse.
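[ Once a transaction is no longer active, the space holding its snapshot is released and can be overwritten. Does that mean any data modified by another transaction during a query will cause ORA-01555? Not necessarily: the error appears only when the before image the query needs has actually been overwritten.

The database also has a parameter, UNDO_RETENTION, that specifies how long a snapshot should be kept. If a query runs longer than this value and the current rollback segments run short of space, the space holding the snapshot can be reused as well. ]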


Therefore, ORA-01555 will result if the query is looking for the snapshot which is so old that rollback segment information could not be found because of wrap around or overwrite.
[ In other words, ORA-01555 occurs because the snapshot the query needs was cleared or overwritten while the query was still running. ]
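The overwrite scenario can be reproduced in a toy model. The Python sketch below is only an illustration under simplifying assumptions: `UndoSegment`, `SnapshotTooOld`, and `run` are hypothetical names, the undo area is modelled as a fixed-size queue whose oldest entries are overwritten first, and the returned string "ORA-01555" merely stands in for the real error.

```python
from collections import deque


class SnapshotTooOld(Exception):
    """Stands in for ORA-01555 in this toy model."""


class UndoSegment:
    """Fixed-capacity store of before images; the oldest entries are overwritten first."""

    def __init__(self, capacity: int):
        # Each entry is (row_id, scn_of_before_image, old_value).
        self.entries = deque(maxlen=capacity)

    def save(self, row_id, scn, old_value):
        self.entries.append((row_id, scn, old_value))

    def find(self, row_id, query_scn):
        # Search newest first for a before image no newer than the query's SCN.
        for rid, scn, old_value in reversed(self.entries):
            if rid == row_id and scn <= query_scn:
                return old_value
        raise SnapshotTooOld(f"before image of row {row_id} was overwritten")


def run(capacity: int) -> str:
    undo = UndoSegment(capacity)
    rows = {i: ("v0", 1) for i in range(5)}   # row_id -> (value, last commit SCN)
    query_scn = 1                              # a long-running query starts here
    scn = 1
    # Other transactions update every row and commit while the query is running.
    for row_id in list(rows):
        scn += 1
        old_value, old_scn = rows[row_id]
        undo.save(row_id, old_scn, old_value)
        rows[row_id] = ("v1", scn)
    # The query finally reaches row 0, which changed after the query began.
    value, row_scn = rows[0]
    if row_scn <= query_scn:
        return value
    try:
        return undo.find(0, query_scn)
    except SnapshotTooOld:
        return "ORA-01555"


large = run(capacity=5)   # every before image still fits
small = run(capacity=2)   # the before image of row 0 has been overwritten
print(large, small)
```

With room for all five before images the query still reconstructs the old value; with room for only two, the entry it needs has wrapped around and the read fails, which matches the "wrap around or overwrite" wording quoted above.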


SITUATIONS WHERE ORA-01555 ERRORS COMMONLY OCCUR:
1. Fewer and smaller rollback segments for a very actively changing database
If the database has many transactions changing data and committing very often, then the chance of reusing the space used by a committed transaction is higher. A long running query then may not be able to reconstruct the snapshot due to wrap around and overwrite in rollback segments. Larger rollback segments in this case will reduce the chance of reusing the committed transaction slots.


2. Corrupted rollback segment
If the rollback segment is corrupted and could not be read, then a statement needing to reconstruct a before image snapshot will result in the error.


3. Fetch across commit
This is the situation when a query opens a cursor, then loops through fetching, changing, and committing the records on the same table. In this scenario, very often an ORA-01555 can result.
Let's take the following example to explain this:
A cursor was opened at SCN=10. The execution SCN of the query is then marked as SCN=10. Every fetch by that cursor now needs to get the read-consistent data from SCN=10. The user program is now fetching x number of records, changing them, and committing them. Let's say they were committed with SCN=20. If a later fetch happens to retrieve a record which is in one of the previously committed blocks, then the fetch will see the SCN there as 20. Since the fetch has to get the snapshot from SCN=10, it will try to find it in the rollback segments. If it can roll back sufficiently far, as previously explained, then it can reconstruct the snapshot from SCN=10. If not, then it will result in an ORA-01555 error.
Committing less often (which in turn calls for larger rollback segments) will REDUCE the probability of getting the 'snapshot too old' error.


4. Fetch across commits with delayed block clean out
To complicate things, now we see how delayed block clean outs play an important role in getting this error.
When a data or index block is modified in the database and the transaction committed, Oracle does a fast commit by marking the transaction as committed in the rollback segment header but does not clean the datablocks that were modified. The next transaction which does a select on the modified blocks will do the actual cleanout of the block. This is known as a delayed block cleanout.


Now, take the same scenario as described in previous section. But instead of assuming one table, let us assume that there are two tables in question. i.e: the cursor is opened and then in a loop, it fetches from one table and changes records in another, and commits. Even though the records are getting committed in another table it could still cause ORA-01555 because cleanout has not been done on the table from which the records are being fetched.


For this case, a full table scan before opening and fetching through the cursor will help.
Summary:
Fetches across commits as explained in last two cases are not supported by ANSI standard. According to ANSI standard a cursor is invalidated when a commit is performed and should be closed and reopened. Oracle allows users to do fetch across commits but users should be aware that it might result in ORA-01555.


Related DB configuration:
UNDO_RETENTION
UNDO_RETENTION specifies (in seconds) the low threshold value of undo retention. For AUTOEXTEND undo tablespaces, the system retains undo for at least the time specified in this parameter, and automatically tunes the undo retention period to satisfy the undo requirements of the queries. For fixed-size undo tablespaces, the system automatically tunes for the maximum possible undo retention period, based on undo tablespace size and usage history, and ignores UNDO_RETENTION unless retention guarantee is enabled.


The setting of this parameter should account for any flashback requirements of the system. Automatic tuning of undo retention is not supported for LOBs. The RETENTION value for LOB columns is set to the value of the UNDO_RETENTION parameter.


The UNDO_RETENTION parameter can only be honored if the current undo tablespace has enough space. If an active transaction requires undo space and the undo tablespace does not have available space, then the system starts reusing unexpired undo space. This action can potentially cause some queries to fail with a "snapshot too old" message.


ALTER SYSTEM SET undo_retention=10800 SCOPE=BOTH;  -- set to 3 hours

UNDO_RETENTION

 

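Every database needs a way to manage rollback, or undo, data. After a DML statement has run but before the user has committed the change, the user may decide not to keep the change and want to undo it, returning the data to its state before the change. This requires what are called undo records.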

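With undo records, we can:
1. Roll back a transaction with the ROLLBACK statement, undoing the data changed by DML
2. Recover the database
3. Provide read consistency
4. Analyze data as of an earlier point in time using Oracle Flashback Query
5. Recover the database from logical failures using Oracle Flashback features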

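Automatic Undo Management (AUM) in Oracle 10g
In Oracle 10g, rollback segments can be managed automatically through configuration parameters. To enable automatic management of undo space, you must first specify automatic undo mode in init.ora or in the SPFILE. Next you need to create a dedicated tablespace to hold undo information, which ensures that undo is not stored in the SYSTEM tablespace. You also need to choose a retention period for undo.

To implement AUM, the following three parameters need to be configured:

UNDO_MANAGEMENT
UNDO_TABLESPACE
UNDO_RETENTION

View the current settings of these initialization parameters: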

SQL> show parameter undo_tablespace;

NAME                                 TYPE        VALUE
------------------------------------ ----------- -----------------------
undo_tablespace                      string      UNDOTBS1

SQL> show parameter undo_management;

NAME                                 TYPE        VALUE
------------------------------------ ----------- -----------------------
undo_management                      string      AUTO

SQL> show parameter undo_retention;

NAME                                 TYPE        VALUE
------------------------------------ ----------- -----------------------
undo_retention                       integer     900
SQL>



Descriptions of the initialization parameters:

Initialization Parameter: Description

UNDO_MANAGEMENT: If AUTO, use automatic undo management. The default is MANUAL.

UNDO_TABLESPACE: An optional dynamic parameter specifying the name of an undo tablespace. This parameter should be used only when the database has multiple undo tablespaces and you want to direct the database instance to use a particular undo tablespace.

UNDO_RETENTION: The UNDO_RETENTION parameter is ignored for a fixed size undo tablespace. The database may overwrite unexpired undo information when tablespace space becomes low. For an undo tablespace with the AUTOEXTEND option enabled, the database attempts to honor the minimum retention period specified by UNDO_RETENTION. When space is low, instead of overwriting unexpired undo information, the tablespace auto-extends. If the MAXSIZE clause is specified for an auto-extending undo tablespace, when the maximum size is reached, the database may begin to overwrite unexpired undo information.



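If the initialization parameter UNDO_MANAGEMENT is set to AUTO, Oracle 10g enables AUM.
The undo retention period can be set in the UNDO_RETENTION initialization parameter:
UNDO_RETENTION=1800 sets the retention period to 30 minutes (1800 seconds)
UNDO_RETENTION defaults to 900 seconds.
What is a reasonable value for UNDO_RETENTION?
There is no ideal UNDO_RETENTION interval. The retention interval depends on how long the longest transaction is expected to run. Based on the length of the longest transaction in the database, you can assign UNDO_RETENTION an approximate value.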

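The maxquerylen column of the v$undostat view reports the length, in seconds, of the longest-running query over a past period. UNDO_RETENTION should be set at least as long as the value given in maxquerylen.
Oracle gives the following guidelines for setting the undo retention interval for a new database:
1. OLTP systems: 15 minutes
2. Mixed: 1 hour
3. DSS systems: 3 hours
4. Flashback Query: 24 hours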
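One way to combine the two pieces of advice above (at least as long as maxquerylen, and the per-workload guideline) is to take the larger of the two. The Python sketch below does just that. Taking the maximum is my own assumption rather than an Oracle rule, and `suggest_undo_retention` and the sample values are hypothetical; in practice the samples would come from the maxquerylen column of v$undostat.

```python
# Guideline values from the list above, converted to seconds.
GUIDELINE_SECONDS = {
    "oltp": 15 * 60,
    "mixed": 60 * 60,
    "dss": 3 * 60 * 60,
    "flashback": 24 * 60 * 60,
}


def suggest_undo_retention(workload: str, maxquerylen_samples: list) -> int:
    """Return the larger of the workload guideline and the longest observed query (seconds)."""
    longest_query = max(maxquerylen_samples, default=0)
    return max(GUIDELINE_SECONDS[workload], longest_query)


# Hypothetical maxquerylen values, in seconds.
samples = [120, 640, 2400, 310]

oltp_suggestion = suggest_undo_retention("oltp", samples)   # longest query dominates
dss_suggestion = suggest_undo_retention("dss", samples)     # guideline dominates
print(oltp_suggestion, dss_suggestion)
```

For the OLTP case the 2400-second query is longer than the 15-minute guideline, so the observed value wins; for the DSS case the 3-hour guideline is the larger figure.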

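A higher UNDO_RETENTION value does not by itself guarantee that undo data is kept for the time UNDO_RETENTION specifies. To guarantee that undo is retained for the specified time, you must use the RETENTION GUARANTEE clause.

For example: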

CREATE UNDO TABLESPACE UNDOTBS01
DATAFILE
'E:\oracle\product\10.2.0\oradata\keymen\UNDOTBS01.DBF'
SIZE 500M AUTOEXTEND ON
RETENTION GUARANTEE;



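You can also guarantee undo retention on an existing undo tablespace. Note that the statement is ALTER TABLESPACE, not ALTER DATABASE:

ALTER TABLESPACE UNDOTBS01 RETENTION GUARANTEE;



To turn guaranteed undo retention off:

ALTER TABLESPACE UNDOTBS01 RETENTION NOGUARANTEE;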



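Sizing the undo tablespace
Oracle recommends sizing the undo tablespace with the help of the Undo Advisor. You can create a small undo tablespace (about 500M) with the AUTOEXTEND datafile attribute set to ON, which allows the tablespace to extend automatically. The tablespace will then grow to support increases in the number of active transactions and in transaction length.
After the database has run for a suitable period, you can use the Undo Advisor to get a recommendation for the undo tablespace size. Use the maximum time allowed in the Analysis Time Period field; for this purpose you can use the Longest-Running Query length shown on the OEM Undo Management page. You must also set the New Undo Retention field according to your flashback requirements. For example, if you want tables to be flashed back up to 24 hours, use 24 hours as the value of this field.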
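For a rough cross-check of an advisor recommendation, a commonly cited estimate is: undo space = UNDO_RETENTION x undo blocks generated per second x database block size. This formula is not from the article above, so treat it as an assumption; the Python sketch below only does the arithmetic, and the function name and input figures are hypothetical. The undo block rate would normally be derived from the undoblks column of v$undostat over its sampling interval.

```python
def estimate_undo_bytes(undo_retention_s: int, undo_blocks_per_s: float, db_block_size: int) -> int:
    """Rough undo space estimate in bytes: retention x undo block rate x block size."""
    return int(undo_retention_s * undo_blocks_per_s * db_block_size)


# Hypothetical inputs: 3000 undo blocks generated in a 600-second interval,
# an 8 KB block size, and a 3-hour retention target.
undo_blocks_per_s = 3000 / 600
estimate = estimate_undo_bytes(10800, undo_blocks_per_s, 8192)
estimate_mb = estimate / (1024 * 1024)
print(estimate, estimate_mb)
```

The result ignores any overhead and assumes the peak undo rate holds for the whole retention period, so it is best read as a starting figure rather than a final size.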

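Suppose the database is configured for guaranteed undo retention with the RETENTION GUARANTEE clause. If the undo tablespace is too small to serve all the active transactions that use it, the following happens:
1. When the undo tablespace is 85% full, Oracle issues an automatic tablespace warning alert
2. When the undo tablespace is 97% full, Oracle issues an automatic tablespace critical alert
3. All DML statements are disallowed and receive an out-of-space error
4. DDL statements are allowed to continue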
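The reuse rules described in this post can be summarised as a small decision function. The Python sketch below is a simplified model built only from the statements above (unexpired undo may be reused when space runs out, unless retention is guaranteed; alerts at 85% and 97%). All names in it are hypothetical and it does not describe Oracle's internal data structures.

```python
from enum import Enum
from typing import Optional


class UndoState(Enum):
    ACTIVE = "active"        # belongs to a transaction that has not committed
    UNEXPIRED = "unexpired"  # committed, still inside the retention period
    EXPIRED = "expired"      # committed, older than the retention period


def classify(committed_at: Optional[int], now: int, undo_retention: int) -> UndoState:
    """Classify an undo entry by its commit time (seconds); None means not yet committed."""
    if committed_at is None:
        return UndoState.ACTIVE
    if now - committed_at >= undo_retention:
        return UndoState.EXPIRED
    return UndoState.UNEXPIRED


def can_reuse(state: UndoState, retention_guarantee: bool) -> bool:
    """May this undo space be overwritten when the tablespace is out of free space?"""
    if state is UndoState.ACTIVE:
        return False                  # needed to roll back a live transaction
    if state is UndoState.EXPIRED:
        return True                   # past the retention period
    return not retention_guarantee    # unexpired: reusable only without the guarantee


def alert_level(used_pct: float) -> Optional[str]:
    """Alert thresholds from the list above."""
    if used_pct >= 97:
        return "critical"
    if used_pct >= 85:
        return "warning"
    return None


state = classify(committed_at=500, now=1000, undo_retention=900)
print(state, can_reuse(state, retention_guarantee=False), can_reuse(state, retention_guarantee=True))
```

Without the guarantee, the unexpired entry can be overwritten, which is when long-running queries risk "snapshot too old"; with the guarantee, it cannot, and new DML is what fails once the space runs out.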

Changing the default undo_retention setting

 

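Data in a database was deleted by mistake and had to be recovered.

The database was Oracle 10g Release 2. My first thought was to recover the data with Flashback Query. Unfortunately it failed with ORA-01555: the data could no longer be flashed back.

Checking the database's undo_retention setting at the time, I found it was at the 10g default of 900 seconds, which is not long enough.

I immediately changed the parameter to 10800, that is, 3 hours: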

ALTER SYSTEM SET undo_retention=10800 SCOPE=BOTH;

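As I recall, the default for this parameter was once 10800, but that led to excessive growth of the UNDO tablespace, which was hard to reclaim. Across versions, Oracle keeps weighing the trade-offs and compromising.

Oracle may have reasoned like this: if few people use Flashback Query, and an overly large undo_retention causes trouble, then simply make it small.

After the recovery, I changed undo_retention to 10800 on several other databases as well.
This setting should be added to the installation manual and applied right after a database is installed.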

Reposted from mljavalife.iteye.com/blog/1547249