Thursday, December 4, 2008

About "Latch Free" Wait Event - My Perspective

Hi,

All latch wait events are collectively referred to as "latch free" event in 9i. The specific latch involved must be determined either from a 10046 trace or using the SQLs. There are many types of latches and all the latches are catalogued in v$latchname. Since all latch wait events show up as just "latch free" in v$session_wait, there is no single corrective action that fixes the "latch free" wait issue. The corrective action depends on the type of the latch for which the session is waiting. Here is how we can identify the type of latch and then act accordingly.

The P1 column in v$Session_wait corresponds to Latch Adress. P2 refers to the latch number. P3 is tries and i dont think is too important in finding out the name of the latch.

To find the name of the latch, you can take the value from P1 column of v$session_wait and substitute the value in following sql

select x.latch#,x.name from v$latchname x, v$latch y
where y.addr = '&address_from_P1'
and y.latch#=x.latch#;

You can also do the same by substituting the P2 column value in the following SQL:

SELECT name FROM v$latchname
WHERE latch# = &P2;

If many sessions are waiting on latch free event, then we would want to first find out what latch are they waiting for. If all of them are waiting for a specific latch, then I think the troubleshooting becomes focused on that latch and hence, is easier.

One of the latch that I have had trouble in the past was Cache Buffers Chain Latch.

This latch is acquired when a data block is accessed in buffer cache. If too many sessions want to access the same data block, you will see all these sessions waiting for "latch free" event. This situation is referred to as "hot block". This can happen in situations where sql statements are run in parallel mode and each one wanting to access the data block.

I came across a good metalink document to identify the hot blocks: 163424.1

While I am not aware of a sure fire way of fixing this issue outright, situation specific solutions can be implemented. For eg. If month end /quarter end processing is happening in Oracle Apps and lot of batch jobs are running, then one of the possible solution might be rescheduling some of the jobs contending for the hot blocks at a later time. If a sql statement is running in parallel, it might be worth while to see if it makes sense to run the query in a non parallel fashion.


- Aravind Kamath

Autoconfig failing with ORA-00603: ORACLE server session terminated by fatal error

Hi,

We were in the process of applying the RUP4 in our R12 development environment. As part of this activity, we had to run autoconfig and autoconfig failed with the following error:

Running Profile Process 1 of 2 for BIS_TOP
Executing script in InstantiateFile:
/apps/INSTANCE_NAME/apps/tech_st/10.1.3/perl/bin/perl -I /apps/INSTANCE_NAME/apps/tech_st/10.1.3/perl/lib/5.8.3 -I /apps/INSTANCE_NAME/apps/tech_st/10.1.3/perl/lib/site_perl/5.8.3 -I /apps/INSTANCE_NAME/apps/apps_st/appl/au/12.0.0/perl -I /apps/INSTANCE_NAME/apps/tech_st/10.1.3/Apache/Apache/mod_perl/lib/site_perl/5.8.3/i686-linux-thread-multi /apps/local/INSTANCE_NAME/inst/apps/CONTEXT_NAME/admin/scripts/adexecsql.pl sqlfile=/apps/local/INSTANCE_NAME/inst/apps/CONTEXT_NAME/admin/install/bisdblrp.sql

DECLARE
*
ERROR at line 1:
ORA-00603: ORACLE server session terminated by fatal error

-------
alert log - ORA-00600: internal error code, arguments: [16500], [kqddlu]


The following changes were made in the environment recently:

- Refresh from production
- Database upgrade to 10.2.0.4 from 10.2.0.2


Troubleshooting tips:

The bisdblrp.sql creates two db links EDW_APPS_TO_WH and APPS_TO_APPS. This sql was failing outside of autoconfig with the same error. The script actually drops (if the links already exists) and recreates the db links.The script also spools the output. The spool showed that we were facing this issue while trying to drop the above db links. In the meanwhile, I queried dba_db_links and found two entries for each of the above links. This was the reason for the error we were seeing. Fix was to delete the rows for these db links from sys.link$ table and then run autoconfig. Autoconfig ran fine without issues.

Upon further analysis, I found that the issue was happening due to import of sys.link$ after refresh as one of the post refresh tasks. We are adopting this import process as we have large number of private db links in our database. So the sequence of events were like this:

- export the sys.link$ table from the target database before destroying the database.
- clone the database from production
- refresh the apps from production and run autoconfig as part of this activity
- import the sys.link$ table to get back all the db links.

- Aravind Kamath Posral