Thursday, June 23, 2011

RACGMAIN PROCESS SPAWNS 1000+ PROCESSES THAT ARE NOT GETTING CLEANED

Visit the Below Website to access unlimited exam questions for all IT vendors and Get Oracle Certifications for FREE
http://www.free-online-exams.com
Problem Description: CC : RACGMAIN PROCESS SPAWNS 1000+ PROCESSES THAT ARE NOT GETTING CLEANED


The racgmain process spawns more than 1000 processes that are never cleaned up, causing memory and swap issues. The issue occurs only on one node (the second) of a two-node RAC cluster.

The alert log shows:
2009-07-18 13:10:24.914: [ CRSEVT][139127] CAAMonitorHandler :: 0:Action Script /u01/app/oracle/product/10.2.0/bin/racgwrap(check) timed out for ora.PDPP.PDPP2.inst! (timeout=600)
2009-07-18 13:10:24.914: [ CRSAPP][139127] CheckResource error for ora.PDPP.PDPP2.inst error code = -2
2009-07-18 13:28:42.448: [ CRSEVT][139188] CAAMonitorHandler :: 0:Could not join /u01/app/crs/product/10.2.0/bin/racgwrap(check)
category: 1234, operation: scls_process_join, loc: childcrash, OS error: 0, other: Abnormal termination of the child
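
To see how often the check action times out and for which resource, the CRS log can be scanned for the "timed out" message. The following is a minimal sketch, not an Oracle tool: the regular expression is an assumption derived from the two log lines quoted above, and the function name find_timeouts is made up for illustration.

```python
import re

# Pattern inferred from the alert log line above (an assumption, not an
# official log format specification).
TIMEOUT_RE = re.compile(
    r"Action Script (\S+)\((\w+)\) timed out for (\S+)! \(timeout=(\d+)\)"
)

def find_timeouts(log_text):
    """Return (resource, action, timeout_seconds) for each timed-out action script."""
    return [
        (m.group(3), m.group(2), int(m.group(4)))
        for m in TIMEOUT_RE.finditer(log_text)
    ]

# The two lines below are copied from the alert log excerpt in this post.
log_text = """\
2009-07-18 13:10:24.914: [ CRSEVT][139127] CAAMonitorHandler :: 0:Action Script /u01/app/oracle/product/10.2.0/bin/racgwrap(check) timed out for ora.PDPP.PDPP2.inst! (timeout=600)
2009-07-18 13:10:24.914: [ CRSAPP][139127] CheckResource error for ora.PDPP.PDPP2.inst error code = -2
"""

timeouts = find_timeouts(log_text)
print(timeouts)
```

Run against the full crsd.log, a growing list for the same instance resource would suggest that each timed-out check leaves a process behind.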


- Stop the resource: crs_stop <resource name> -f
- Start the resource again using srvctl start, then monitor for the same behavior.

- If the previous steps do not clear this state, check the following logs:

* $ORA_CRS_HOME/log/<hostname>/crsd/crsd.log
* Check for core files under the CRS home, especially under $ORA_CRS_HOME/log/<hostname>/racg/racgmain
* $ORACLE_HOME/log/<hostname>/racg/*
* As root, run ocrdump and upload the generated output.
- Also upload the output of the following commands:

$ crs_stat
$ crs_stat -p
$ crs_stat -ls
$ srvctl config nodeapps -n <nodename1> -a -g -s -l
$ srvctl config nodeapps -n <nodename2> -a -g -s -l
$ srvctl config database -d <db_name> -a
$ srvctl config service -d <db_name> -a
$ srvctl config asm -n <node name>
$ ocrcheck
$ crsctl query css votedisk
$ crsctl query crs activeversion
$ crsctl query crs softwareversion
$ oifcfg getif
$ oifcfg iflist
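
When the same list has to be collected on more than one cluster, it can help to generate the commands with the placeholders filled in. The sketch below only builds the command strings from the list above; the node names node1 and node2 are hypothetical, the database name PDPP is assumed from the resource name in the alert log, and running the ASM command once per node is an assumption.

```python
def diagnostic_commands(nodes, db_name):
    """Return the diagnostic commands listed above with placeholders filled in."""
    cmds = ["crs_stat", "crs_stat -p", "crs_stat -ls"]
    cmds += [f"srvctl config nodeapps -n {n} -a -g -s -l" for n in nodes]
    cmds += [
        f"srvctl config database -d {db_name} -a",
        f"srvctl config service -d {db_name} -a",
    ]
    # Assumption: the ASM configuration is collected once per node.
    cmds += [f"srvctl config asm -n {n}" for n in nodes]
    cmds += [
        "ocrcheck",
        "crsctl query css votedisk",
        "crsctl query crs activeversion",
        "crsctl query crs softwareversion",
        "oifcfg getif",
        "oifcfg iflist",
    ]
    return cmds

# "node1" and "node2" are hypothetical; PDPP is assumed from ora.PDPP.PDPP2.inst.
commands = diagnostic_commands(["node1", "node2"], "PDPP")
for cmd in commands:
    print(f"$ {cmd}")
```

The printed list can be pasted into a shell session or redirected into a script whose output is then uploaded.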

Solution:

Because CRS is at 10.2.0.4, apply, as agreed by phone, either 10.2.0.4 Bundle Patch #2 (bug 7493592) or 10.2.0.4 CRS Bundle Patch #4 (bug 8436582). Both fix bug 6196746, because the bundle patches are cumulative.
Get Oracle Certifications for all Exams
Free Online Exams.com

WARNING [AJPRequestHandler-RMICallHandler-15] ldap.LDAPAuthenticatorStep - Could not get DN for the userid: deg-syncdispatch-user

Problem:


The OWSM integration with OID fails when Oracle Web Services Manager (OWSM) 10.1.3 is configured to use the LDAP Authenticate policy step for a registered service, which should authenticate users against Oracle Internet Directory (OID). The gateway log reports:

WARNING [AJPRequestHandler-RMICallHandler-15] ldap.LDAPAuthenticatorStep - Could not get DN for the userid: deg-syncdispatch-user

The gateway.log file:
--------------------------------
2009-09-02 09:39:18,103 WARNING [Thread-15] configuration.PolicySetWatchdog - Failed to retrieve policy set from policy manager with url http://mirsal2.dubaitrade.ae:80/policymanager/services/RegistrationService: com.cfluent.policymanager.sdk.base.exception.ServerException: org.xml.sax.SAXException: Bad envelope tag: HTML
2009-09-02 09:39:18,106 FINEST [Thread-15] configuration.PolicySetWatchdog - Failed to retrieve policy set from policy manager with url http://mirsal2.dubaitrade.ae:80/policymanager/services/RegistrationService
com.cfluent.policymanager.sdk.base.exception.ServerException: org.xml.sax.SAXException: Bad envelope tag: HTML
at com.cfluent.policymanager.sdk.client.soap.SoapComponentConfigurator.getUpdatedPolicies(SoapComponentConfigurator.java:229)
at com.cfluent.agent.configuration.PolicySetWatchdog.getPolicySetFromPolicyManager(PolicySetWatchdog.java:168)
at com.cfluent.agent.configuration.PolicySetWatchdog.pollFromPolicyManager(PolicySetWatchdog.java:205)
at com.cfluent.agent.configuration.PolicySetWatchdog.run(PolicySetWatchdog.java:91)
2009-09-02 09:41:18,123 FINEST [Thread-15] configuration.PolicySetWatchdog - Checking Policy Manager
2009-09-02 09:41:18,133 WARNING [Thread-15] configuration.PolicySetWatchdog - Failed to retrieve policy set from policy manager with url http://mirsal2.dubaitrade.ae:80/policymanager/services/RegistrationService: com.cfluent.policymanager.sdk.base.exception.ServerException: org.xml.sax.SAXException: Bad envelope tag: HTML
2009-09-02 09:41:18,133 FINEST [Thread-15] configuration.PolicySetWatchdog - Failed to retrieve policy set from policy manager with url http://mirsal2.dubaitrade.ae:80/policymanager/services/RegistrationService
com.cfluent.policymanager.sdk.base.exception.ServerException: org.xml.sax.SAXException: Bad envelope tag: HTML
at com.cfluent.policymanager.sdk.client.soap.SoapComponentConfigurator.getUpdatedPolicies(SoapComponentConfigurator.java:229)
at com.cfluent.agent.configuration.PolicySetWatchdog.getPolicySetFromPolicyManager(PolicySetWatchdog.java:168)
at com.cfluent.agent.configuration.PolicySetWatchdog.pollFromPolicyManager(PolicySetWatchdog.java:205)
at com.cfluent.agent.configuration.PolicySetWatchdog.run(PolicySetWatchdog.java:91)

In the policymanager.log file of N2:
---------------------------------------------
com.cfluent.utils.db.DBException: Listener refused the connection with the following error:
ORA-12514, TNS:listener does not currently know of service requested in connect descriptor
The Connection descriptor used by the client was:
(DESCRIPTION=(LOAD_BALANCE=on)(ADDRESS=(PROTOCOL=TCP)(HOST=pdppdb03-vip.dubaiworld.ae)(PORT=1750))(ADDRESS=(PROTOCOL=TCP)(HOST=pdppdb04-vip.dubaiworld.ae)(PORT=1750))(CONNECT_DATA=(SERVICE_NAME=PSOA.dubaiworld.ae)))

at com.cfluent.utils.db.DBContext.getConnection(DBContext.java:95)
at com.cfluent.db.registry.ComponentsTable.isActiveComponent(ComponentsTable.java:323)
at com.cfluent.policymanager.da.ComponentAccessor.isValidComponent(ComponentAccessor.java:267)
at com.cfluent.policymanager.service.policy.PolicyQuery.getComponentPolicySet(PolicyQuery.java:52)
at com.cfluent.policymanager.service.soap.RegistrationService.getUpdatedPolicies(RegistrationService.java:192)
at sun.reflect.GeneratedMethodAccessor18.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:585)
at org.apache.axis.providers.java.RPCProvider.invokeMethod(RPCProvider.java:402)
at org.apache.axis.providers.java.RPCProvider.processMessage(RPCProvider.java:309)
at org.apache.axis.providers.java.JavaProvider.invoke(JavaProvider.java:333)
at org.apache.axis.strategies.InvocationStrategy.visit(InvocationStrategy.java:71)
at org.apache.axis.SimpleChain.doVisiting(SimpleChain.java:150)
at org.apache.axis.SimpleChain.invoke(SimpleChain.java:120)
at org.apache.axis.handlers.soap.SOAPService.invoke(SOAPService.java:481)
at org.apache.axis.server.AxisServer.invoke(AxisServer.java:323)
at org.apache.axis.transport.http.AxisServlet.doPost(AxisServlet.java:854)
at javax.servlet.http.HttpServlet.service(HttpServlet.java:763)
at org.apache.axis.transport.http.AxisServletBase.service(AxisServletBase.java:339)
at javax.servlet.http.HttpServlet.service(HttpServlet.java:856)
at com.evermind.server.http.ServletRequestDispatcher.invoke(ServletRequestDispatcher.java:713)
at com.evermind.server.http.ServletRequestDispatcher.forwardInternal(ServletRequestDispatcher.java:370)
at com.evermind.server.http.HttpRequestHandler.doProcessRequest(HttpRequestHandler.java:871)
at com.evermind.server.http.HttpRequestHandler.processRequest(HttpRequestHandler.java:453)
at com.evermind.server.http.AJPRequestHandler.run(AJPRequestHandler.java:313)
at com.evermind.server.http.AJPRequestHandler.run(AJPRequestHandler.java:199)
at oracle.oc4j.network.ServerSocketReadHandler$SafeRunnable.run(ServerSocketReadHandler.java:260)
at oracle.oc4j.network.ServerSocketAcceptHandler.procClientSocket(ServerSocketAcceptHandler.java:234)
at oracle.oc4j.network.ServerSocketAcceptHandler.access$700(ServerSocketAcceptHandler.java:29)
at oracle.oc4j.network.ServerSocketAcceptHandler$AcceptHandlerHorse.run(ServerSocketAcceptHandler.java:879)
at com.evermind.util.ReleasableResourcePooledExecutor$MyWorker.run(ReleasableResourcePooledExecutor.java:303)
at java.lang.Thread.run(Thread.java:595)
Caused by: java.sql.SQLException: Listener refused the connection with the following error:
ORA-12514, TNS:listener does not currently know of service requested in connect descriptor
The Connection descriptor used by the client was:
(DESCRIPTION=(LOAD_BALANCE=on)(ADDRESS=(PROTOCOL=TCP)(HOST=pdppdb03-vip.dubaiworld.ae)(PORT=1750))(ADDRESS=(PROTOCOL=TCP)(HOST=pdppdb04-vip.dubaiworld.ae)(PORT=1750))(CONNECT_DATA=(SERVICE_NAME=PSOA.dubaiworld.ae)))

at oracle.jdbc.driver.DatabaseError.throwSqlException(DatabaseError.java:125)
at oracle.jdbc.driver.DatabaseError.throwSqlException(DatabaseError.java:280)
at oracle.jdbc.driver.T4CConnection.logon(T4CConnection.java:328)
at oracle.jdbc.driver.PhysicalConnection.<init>(PhysicalConnection.java:361)
at oracle.jdbc.driver.T4CConnection.<init>(T4CConnection.java:151)
at oracle.jdbc.driver.T4CDriverExtension.getConnection(T4CDriverExtension.java:32)
at oracle.jdbc.driver.OracleDriver.connect(OracleDriver.java:595)
at java.sql.DriverManager.getConnection(DriverManager.java:525)
at java.sql.DriverManager.getConnection(DriverManager.java:171)
at org.apache.commons.dbcp.DriverManagerConnectionFactory.createConnection(DriverManagerConnectionFactory.java:48)
at org.apache.commons.dbcp.PoolableConnectionFactory.makeObject(PoolableConnectionFactory.java:290)
at org.apache.commons.pool.impl.GenericObjectPool.borrowObject(GenericObjectPool.java:771)
at org.apache.commons.dbcp.PoolingDriver.connect(PoolingDriver.java:175)
at java.sql.DriverManager.getConnection(DriverManager.java:525)
at java.sql.DriverManager.getConnection(DriverManager.java:171)
at com.cfluent.utils.db.DBContext.getConnection(DBContext.java:86)
... 31 more
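
ORA-12514 means the listener was reached but does not know the requested service, so the first check is whether the service name in the connect descriptor matches what the listeners on those hosts have registered (for example as shown by lsnrctl services). A small sketch that pulls the hosts, port and service name out of the descriptor logged above; the regular expressions are an illustrative assumption, not a full TNS parser.

```python
import re

# Connect descriptor copied from the policymanager.log excerpt above.
descriptor = (
    "(DESCRIPTION=(LOAD_BALANCE=on)"
    "(ADDRESS=(PROTOCOL=TCP)(HOST=pdppdb03-vip.dubaiworld.ae)(PORT=1750))"
    "(ADDRESS=(PROTOCOL=TCP)(HOST=pdppdb04-vip.dubaiworld.ae)(PORT=1750))"
    "(CONNECT_DATA=(SERVICE_NAME=PSOA.dubaiworld.ae)))"
)

# Each ADDRESS entry carries HOST immediately followed by PORT in this descriptor.
addresses = re.findall(r"\(HOST=([^)]+)\)\(PORT=(\d+)\)", descriptor)
service_name = re.search(r"\(SERVICE_NAME=([^)]+)\)", descriptor).group(1)

for host, port in addresses:
    print(f"check listener on {host}:{port} for service {service_name}")
```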

//////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////


In the gateway.log of N1:
--------------------------------------
2009-09-02 09:23:23,010 FINEST [AJPRequestHandler-RMICallHandler-5] security.WSBasicCredsExtractor - Found the UsernameToken Header
2009-09-02 09:23:23,010 FINEST [AJPRequestHandler-RMICallHandler-5] security.WSBasicCredsExtractor - Element Value:deg-syncdispatch-user
2009-09-02 09:23:23,011 FINEST [AJPRequestHandler-RMICallHandler-5] security.WSBasicCredsExtractor - Element Value:plvSdpOQ1
2009-09-02 09:23:23,011 FINEST [AJPRequestHandler-RMICallHandler-5] security.WSBasicCredsExtractor - Successfully retrieved username and password
2009-09-02 09:23:23,011 FINEST [AJPRequestHandler-RMICallHandler-5] security.WSBasicCredsExtractor - Removing the UsernameToken Header
2009-09-02 09:23:23,011 FINE [AJPRequestHandler-RMICallHandler-5] CSWComponent - Executing policy step. Policy='SID0003009', Step Name='LDAP Authenticate', Step Class='com.cfluent.policysteps.security.ldap.LDAPAuthenticatorStep'
2009-09-02 09:23:23,011 FINE [AJPRequestHandler-RMICallHandler-5] common.AbstractAuthenticatorStep - Step LDAP Authenticate called
2009-09-02 09:23:23,011 FINEST [AJPRequestHandler-RMICallHandler-5] ldap.LDAPAuthenticatorStep - Inside LDAP authenticate
2009-09-02 09:23:23,011 FINE [AJPRequestHandler-RMICallHandler-5] ldap.LDAPAuthenticatorStep - Getting DN for user deg-syncdispatch-user
2009-09-02 09:23:23,011 FINE [AJPRequestHandler-RMICallHandler-5] ldap.LDAPAuthenticatorStep - BaseDN=cn=Users,dc=DT,dc=tco,dc=co,dc=ae
2009-09-02 09:23:23,011 FINE [AJPRequestHandler-RMICallHandler-5] ldap.LDAPAuthenticatorStep - Filter=(&(uid=deg-syncdispatch-user)(objectclass=inetOrgPerson))
2009-09-02 09:23:23,012 INFO [AJPRequestHandler-RMICallHandler-5] ldap.DirContextHolder - Attempt 1 to connect
2009-09-02 09:23:23,018 WARNING [AJPRequestHandler-RMICallHandler-5] ldap.LDAPAuthenticatorStep - Could not get DN for the userid: deg-syncdispatch-user
2009-09-02 09:23:23,028 FINEST [MessageLogWriterDaemon] util.BoundedTaskQueue - Removing from bounded queue AsyncMessageLog
2009-09-02 09:23:23,053 FINE [AJPRequestHandler-RMICallHandler-5] CSWComponent - Step execution failed: Fault Code=[http://schemas.oblix.com/ws/2003/08/Faults/AuthenticationFault] Fault String=[Unable to authenticate user deg-syncdispatch-user against LDAP Server.] Policy=[SID0003009] Pipeline=[Request] Step Name=[LDAP Authenticate] Step Class=[com.cfluent.policysteps.security.ldap.LDAPAuthenticatorStep]
2009-09-02 09:23:23,053 FINER [AJPRequestHandler-RMICallHandler-5] common.PrepareForServiceStep - Step PrepareForServiceStep called
2009-09-02 09:23:23,053 FINEST [AJPRequestHandler-RMICallHandler-5] wssecurity.WSSecurityUtils - Found the Security Header
2009-09-02 09:23:23,053 FINEST [AJPRequestHandler-RMICallHandler-5] wssecurity.WSSecurityUtils - Removing the Security Header
2009-09-02 09:23:23,053 FINE [AJPRequestHandler-RMICallHandler-5] gateway.Invoker - Result of Request Pipeline is 1
2009-09-02 09:23:23,053 FINEST [AJPRequestHandler-RMICallHandler-5] agent.AgentRuntime - Released read lock
2009-09-02 09:23:23,054 FINEST [AJPRequestHandler-RMICallHandler-5] listener.ProtocolListener - Request message size: 1,100
2009-09-02 09:23:23,054 FINEST [AJPRequestHandler-RMICallHandler-5] listener.ProtocolListener - Response message size: 380
2009-09-02 09:23:23,055 FINEST [AJPRequestHandler-RMICallHandler-5] policysteps.CoremanClient - creating invocation measurement with serviceStatus=-1, invocationstatus=1, componentId=C0003001
2009-09-02 09:23:23,055 FINEST [AJPRequestHandler-RMICallHandler-5] util.BoundedTaskQueue - adding to bounded queue
2009-09-02 09:23:23,055 FINEST [AJPRequestHandler-RMICallHandler-5] agent.AgentRuntime - Released read lock
2009-09-02 09:23:23,301 FINEST [Thread-18] util.BoundedTaskQueue - Removing from bounded queue
2009-09-02 09:23:23,301 FINEST [Thread-18] policysteps.CoremanClient - initInternal called
2009-09-02 09:23:23,303 WARNING [Thread-18] policysteps.CoremanClient - Fail to connect to coreman, Could not get RMI ICoremanAdaptor
2009-09-02 09:23:23,303 FINEST [Thread-18] policysteps.CoremanClient - Fail to connect to coreman
com.cfluent.coreman.sdk.CoremanException: Could not get RMI ICoremanAdaptor
at com.cfluent.coreman.sdk.client.CoremanAdaptorFactory.getRMIClient(CoremanAdaptorFactory.java:70)
at com.cfluent.coreman.sdk.client.CoremanAdaptorFactory.getCoremanAdaptor(CoremanAdaptorFactory.java:86)
at com.cfluent.policysteps.CoremanClient.initInternal(CoremanClient.java:87)
at com.cfluent.policysteps.CoremanClient.sendMeasurementsInternal(CoremanClient.java:251)
at com.cfluent.policysteps.CoremanClientDaemon.sendMeasurements(CoremanClient.java:356)
at com.cfluent.policysteps.CoremanClientDaemon.run(CoremanClient.java:307)

target exception:

java.rmi.ConnectException: Connection refused to host: localhost; nested exception is:
java.net.ConnectException: Connection refused (errno:239)
at sun.rmi.transport.tcp.TCPEndpoint.newSocket(TCPEndpoint.java:574)
at sun.rmi.transport.tcp.TCPChannel.createConnection(TCPChannel.java:185)
at sun.rmi.transport.tcp.TCPChannel.newConnection(TCPChannel.java:171)
at sun.rmi.server.UnicastRef.newCall(UnicastRef.java:306)
at sun.rmi.registry.RegistryImpl_Stub.lookup(Unknown Source)
at java.rmi.Naming.lookup(Naming.java:84)
at com.cfluent.coreman.sdk.client.CoremanAdaptorFactory.getRMIClient(CoremanAdaptorFactory.java:67)
at com.cfluent.coreman.sdk.client.CoremanAdaptorFactory.getCoremanAdaptor(CoremanAdaptorFactory.java:86)
at com.cfluent.policysteps.CoremanClient.initInternal(CoremanClient.java:87)
at com.cfluent.policysteps.CoremanClient.sendMeasurementsInternal(CoremanClient.java:251)
at com.cfluent.policysteps.CoremanClientDaemon.sendMeasurements(CoremanClient.java:356)
at com.cfluent.policysteps.CoremanClientDaemon.run(CoremanClient.java:307)
Caused by: java.net.ConnectException: Connection refused (errno:239)
at java.net.PlainSocketImpl.socketConnect(Native Method)
at java.net.PlainSocketImpl.doConnect(PlainSocketImpl.java:333)
at java.net.PlainSocketImpl.connectToAddress(PlainSocketImpl.java:195)
at java.net.PlainSocketImpl.connect(PlainSocketImpl.java:182)
at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:366)
at java.net.Socket.connect(Socket.java:530)
at java.net.Socket.connect(Socket.java:480)
at java.net.Socket.<init>(Socket.java:366)
at java.net.Socket.<init>(Socket.java:180)
at sun.rmi.transport.proxy.RMIDirectSocketFactory.createSocket(RMIDirectSocketFactory.java:22)
at sun.rmi.transport.proxy.RMIMasterSocketFactory.createSocket(RMIMasterSocketFactory.java:128)
at sun.rmi.transport.tcp.TCPEndpoint.newSocket(TCPEndpoint.java:569)

Troubleshooting:


If you get the error "ldap.LDAPAuthenticatorStep - Could not get DN for the userid: <user>", verify the password using the ldapbind command against your OID instance.
Also ensure that the DN entry is given correctly on OWSM's LDAP configuration page.

See the "Troubleshooting" section of Note 782989.1.


Check that the user can bind and that the password is correct with the following command:
ldapbind -h <HOST> -p <PORT> -D "cn=<username>,cn=users,dc=<domain>" -w <PASSWORD>

You should receive the message "Bind Successful".

To enable OWSM logging with increased debugging, follow this note:

Note.726219.1 <How To Change Log Level for Oracle Web Services Manager (OWSM)>.


Solution:


The cause was that the OID user had been created with the oidadmin tool without the proper values for the attributes that a user entry needs.

To solve this issue, drop the OID user and recreate it using oiddas.
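
The gateway log shows the search OWSM performs before binding: base DN cn=Users,dc=DT,dc=tco,dc=co,dc=ae with the filter (&(uid=<user>)(objectclass=inetOrgPerson)). An entry that lacks either attribute value is not returned, which would produce the "Could not get DN" warning. The sketch below mimics that filter against two entries; both entries are hypothetical, and the assumption that the oidadmin-created user was missing exactly these attributes is not confirmed by the case notes.

```python
def matches_logged_filter(entry, userid):
    """Mimic (&(uid=<userid>)(objectclass=inetOrgPerson)) on a dict of attributes.

    Attribute names in `entry` are expected in lower case; values are
    compared case-insensitively.
    """
    uids = {v.lower() for v in entry.get("uid", [])}
    classes = {v.lower() for v in entry.get("objectclass", [])}
    return userid.lower() in uids and "inetorgperson" in classes

# Hypothetical entries for illustration only.
complete_entry = {
    "uid": ["deg-syncdispatch-user"],
    "objectclass": ["top", "person", "organizationalPerson", "inetOrgPerson"],
}
incomplete_entry = {
    "cn": ["deg-syncdispatch-user"],
    "objectclass": ["top", "person"],
}

print(matches_logged_filter(complete_entry, "deg-syncdispatch-user"))
print(matches_logged_filter(incomplete_entry, "deg-syncdispatch-user"))
```

Comparing the real entry's attributes against the logged filter in this way shows whether recreating the user is needed.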





Reference:


WebIV SEARCH
------------------------
Note 782989.1 : <An Example Of Using LDAP Authenticate Step in OWSM to Authenticate a Registered Web Service to OID>

The above note is used to configure Oracle Web Services Manager ( OWSM ) 10.1.3 to use the LDAP Authenticate Policy Step for a registered service
to allow for authentication of a user using Oracle Internet Directory ( OID ).

DATABASE ACTING DEAD SLOW DAILY

Problem Description: ENG: DATABASE ACTING DEAD SLOW DAILY @ THE SAME TIME WINDOW

The database runs very slowly between about 23:00 and 02:00 every day.

Top 5 Timed Events
~~~~~~~~~~~~~~~~~~~
Event                               Waits  Time(s)  Avg Wait(ms)  % Total Call Time  Wait Class
CPU time                                    13,783                             31.8
log file sync                     737,638   11,829            16               27.3  Commit
log file parallel write           749,259    8,533            11               19.7  System I/O
db file scattered read            305,077    6,418            21               14.8  User I/O
enq: TX - row lock contention       2,234    4,506         2,017               10.4  Application
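
The Avg Wait(ms) column is simply total wait time divided by the number of waits, which makes it easy to sanity-check the figures. A short sketch using the values from the Top 5 Timed Events section above:

```python
# (waits, total wait time in seconds) taken from the Top 5 Timed Events above.
events = {
    "log file sync": (737_638, 11_829),
    "log file parallel write": (749_259, 8_533),
    "db file scattered read": (305_077, 6_418),
    "enq: TX - row lock contention": (2_234, 4_506),
}

avg_wait_ms = {
    name: seconds * 1000 / waits for name, (waits, seconds) in events.items()
}

for name, ms in avg_wait_ms.items():
    print(f"{name}: {ms:.0f} ms per wait")
```

An average of about 16 ms per log file sync, with log file parallel write at about 11 ms, points at slow redo writes during the window, while each row lock wait lasts about two seconds on average.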

SQL ordered by Elapsed Time
~~~~~~~~~~~~~~~~~~~~~~~~~~~

Elapsed Time (s)  CPU Time (s)  Executions  Elap per Exec (s)  % Total DB Time  SQL Id         SQL Module      SQL Text
          11,006         4,407           0                               25.42  30jv4ktjqj3xz  DBMS_SCHEDULER  DECLARE job BINARY_INTEGER := ...
           6,350         2,790           0                               14.67  10828gd6bbnwt  DBMS_SCHEDULER  DECLARE job BINARY_INTEGER := ...
           1,811            55           1            1810.78             4.18  6mcpb06rctk0x  DBMS_SCHEDULER  call dbms_space.auto_space_adv...
           1,715            48          50              34.29             3.96  8szmwam7fysa3  DBMS_SCHEDULER  insert into wri$_adv_objspace_...
           1,568           187           1            1567.97             3.62  b6usrg82hwsa3  DBMS_SCHEDULER  call dbms_stats.gather_databas...

OBSERVATION
===========
30jv4ktjqj3xz
~~~~~~~~~~~~~~~
DECLARE job BINARY_INTEGER := :job; next_date TIMESTAMP WITH TIME ZONE := :mydate; broken BOOLEAN := FALSE; job_name VARCHAR2(30) := :job_name; job_subname VARCHAR2(30) := :job_subname; job_owner VARCHAR2(30) := :job_owner; job_start TIMESTAMP WITH TIME ZONE := :job_start; window_start TIMESTAMP WITH TIME ZONE := :window_start; window_end TIMESTAMP WITH TIME ZONE := :window_end; BEGIN BEGIN OPCTF044.MAIN_PROC('J', 'T1'); COMMIT; END; :mydate := next_date; IF broken THEN :b := 1; ELSE :b := 0; END IF; END;


OBSERVATION
==========
+ The AWR report covers 300 minutes instead of the requested multiple 60-minute reports.
+ We can see high "log file sync" waits.
+ The automatic dbms_stats job runs at night.
+ Other maintenance jobs, such as the space advisor, also run at night.

SQL ordered by Reads
~~~~~~~~~~~~~~~~~~~~
Total Disk Reads: 10,792,730
Captured SQL account for 57.5% of Total
Physical Reads Executions Reads per Exec %Total CPU Time (s) Elapsed Time (s) SQL Id SQL Module SQL Text
462,457 1 462,457.00 4.28 211.52 1874.71 0vj94pmqvwdux sqlplus@jctpdb01 (TNS V1-V3) select min(minbkt), maxbkt, su...
424,535 1 424,535.00 3.93 87.93 838.76 74f7trkxksdzq sqlplus@jctpdb01 (TNS V1-V3) select min(minbkt), maxbkt, su...
407,046 1 407,046.00 3.77 79.50 811.24 69qkjfnh3tvtf sqlplus@jctpdb01 (TNS V1-V3) select min(minbkt), maxbkt, su...
380,101 1 380,101.00 3.52 1882.00 10834.46 amw3yd0j2ak61 sqlplus@jctpdb01 (TNS V1-V3) select /*+ parallel(t, 16) par...
378,133 1 378,133.00 3.50 49.44 448.18 3vuybk2a7828w sqlplus@jctpdb01 (TNS V1-V3) select min(minbkt), maxbkt, su...
355,344 1 355,344.00 3.29 113.11 938.17 cx3nf783fppg4 sqlplus@jctpdb01 (TNS V1-V3) select min(minbkt), maxbkt, su...
351,481 1 351,481.00 3.26 27.67 320.92 40kpkjgbfb3dm sqlplus@jctpdb01 (TNS V1-V3) select min(minbkt), maxbkt, su...
351,309 1 351,309.00 3.26 31.46 400.39 2cdckjfy4cxhb sqlplus@jctpdb01 (TNS V1-V3) select min(minbkt), maxbkt, su...
347,193 1 347,193.00 3.22 23.47 296.92 3w7fvrrccruqp sqlplus@jctpdb01 (TNS V1-V3) select min(minbkt), maxbkt, su...
346,058 1 346,058.00 3.21 11.76 285.14 b602h54cr0bp4 sqlplus@jctpdb01 (TNS V1-V3) select substrb(dump(val, 16, 0...
346,058 1 346,058.00 3.21 11.61 295.87 c9586mbmqfknm sqlplus@jctpdb01 (TNS V1-V3) select substrb(dump(val, 16, 0...
346,058 1 346,058.00 3.21 16.47 271.41 f5zz84nsrp5bm sqlplus@jctpdb01 (TNS V1-V3) select min(minbkt), maxbkt, su...
344,657 1 344,657.00 3.19 11.97 273.83 0dkxnkp0rnhp7 sqlplus@jctpdb01 (TNS V1-V3) select substrb(dump(val, 16, 0...
284,128 1 284,128.00 2.63 500.83 3903.04 g58ma37a1fkpa sqlplus@jctpdb01 (TNS V1-V3) select /*+ parallel(t, 16) par...
269,680 1 269,680.00 2.50 157.69 1354.73 f7kf4x9bskr9w sqlplus@jctpdb01 (TNS V1-V3) select min(minbkt), maxbkt, su...
245,799 1 245,799.00 2.28 127.12 704.66 8wq3bawn5n2p7 sqlplus@jctpdb01 (TNS V1-V3) select min(minbkt), maxbkt, su...
128,275 1 128,275.00 1.19 98.31 632.91 9h2gcqz93mcay sqlplus@jctpdb01 (TNS V1-V3) select min(minbkt), maxbkt, su


OBSERVATION
============
+ DB time increases once the automatic jobs kick in.
+ We can see huge waits for parallelism. The queries are almost all related to statistics gathering.



Disable the automatic statistics job for a couple of days and see whether performance is still bad.

NOTE.311836.1 How to Disable Automatic Statistics Collection in 10G
NOTE.377143.1 How to check what automatic statistics collection is

"Cluster" waits are being observed even though this is a non-RAC database,
and performance did not improve.

#########
Bad Time
########


WORKLOAD REPOSITORY report for
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
DB Name DB Id Instance Inst num Release RAC Host
PRD10 1688280630 prd10 1 10.2.0.2.0 NO jctpdb01


Snap Id Snap Time Sessions Cursors/Session
Begin Snap: 25384 05-Sep-09 23:00:40 386 32.1
End Snap: 25385 05-Sep-09 23:30:40 384 32.9
Elapsed: 30.00 (mins)
DB Time: 312.41 (mins)


Top 5 Timed Events
~~~~~~~~~~~~~~~~~~~
Event                               Waits  Time(s)  Avg Wait(ms)  % Total Call Time  Wait Class
CPU time                                     2,601                             13.9
log file sync                     101,472    2,091            21               11.2  Commit
PX Deq Credit: send blkd          121,035    1,607            13                8.6  Other
log file parallel write            98,120    1,346            14                7.2  System I/O
db file scattered read             48,401      763            16                4.1  User I/O





WORKLOAD REPOSITORY report for
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
DB Name DB Id Instance Inst num Release RAC Host
PRD10 1688280630 prd10 1 10.2.0.2.0 NO jctpdb01


Snap Id Snap Time Sessions Cursors/Session
Begin Snap: 25385 05-Sep-09 23:30:40 384 32.9
End Snap: 25386 06-Sep-09 00:00:42 439 34.8
Elapsed: 30.04 (mins)
DB Time: 383.26 (mins)


Top 5 Timed Events
~~~~~~~~~~~~~~~~~~~
Event                               Waits  Time(s)  Avg Wait(ms)  % Total Call Time  Wait Class
PX Deq Credit: send blkd        1,285,081    5,336             4               23.2  Other
CPU time                                     4,516                             19.6
log file sync                      53,710      894            17                3.9  Commit
log file parallel write            57,896      697            12                3.0  System I/O
PX qref latch                  48,494,766      487             0                2.1  Other



##########
Good Time
##########


WORKLOAD REPOSITORY report for
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
DB Name DB Id Instance Inst num Release RAC Host
PRD10 1688280630 prd10 1 10.2.0.2.0 NO jctpdb01


Snap Id Snap Time Sessions Cursors/Session
Begin Snap: 25122 29-Aug-09 21:00:05 470 35.8
End Snap: 25123 29-Aug-09 22:00:19 489 37.4
Elapsed: 60.23 (mins)
DB Time: 853.29 (mins)


Top 5 Timed Events
~~~~~~~~~~~~~~~~~~~

Event                               Waits  Time(s)  Avg Wait(ms)  % Total Call Time  Wait Class
PX Deq Credit: send blkd        6,082,722   12,816             2               25.0  Other
CPU time                                     7,820                             15.3
PX qref latch                  73,108,081    1,449             0                2.8  Other
log file sync                      76,710      978            13                1.9  Commit
log file parallel write            84,027      856            10                1.7  System I/O




OBSERVATION
===========
+ During the bad time on 05-Sep-09, DB Time was 383.26 mins.
+ During the good time, starting 29-Aug-09 21:00:05, DB Time was 853.29 mins.
+ The waits are mainly on PX.
+ There are no major critical waits during the bad time.
+ The I/O on the box gets a little slower at times.
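
Because the reports cover windows of different length (30 and 60 minutes), the raw DB Time figures are not directly comparable. Dividing DB Time by elapsed time gives the average number of active sessions, which normalizes them. A short sketch using the figures from the three report headers above; the dictionary labels are made up for readability.

```python
# (DB Time in minutes, elapsed minutes) from the three AWR report headers above.
snapshots = {
    "bad  05-Sep-09 23:00": (312.41, 30.00),
    "bad  05-Sep-09 23:30": (383.26, 30.04),
    "good 29-Aug-09 21:00": (853.29, 60.23),
}

avg_active_sessions = {
    label: db_time / elapsed for label, (db_time, elapsed) in snapshots.items()
}

for label, aas in avg_active_sessions.items():
    print(f"{label}: {aas:.1f} average active sessions")
```

On this measure the "good" window (about 14.2) carried at least as much database load as the "bad" windows (about 10.4 and 12.8), which supports looking outside this database for the cause of the slowness.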

1. Can you check with the Unix team whether they observe high network usage, particularly at night during the daily problem window?
2. There are 4 databases on this box. Have you looked at the other databases? The culprit may be another database, with this one merely a victim.


NOTE.301137.1 OS Watcher User Guide

1. Complete OSWatcher output.
2. Exact time when you observed the slowness. I believe we still have the problem from 21:00 -
3. Upload the complete alert log.
4. Upload multiple 30-minute or 1-hour AWR reports covering:

i) 1 hour prior to the start of the problem time.
ii) The problem time.
iii) 1 hour after performance was back to normal.

"ORA-26701: STREAMS process CDC$C_CTMS does not exist"

Problem Description: ACTIVE: APPLY PROCESS ABORTED AND NOT STARTING


### Detailed Problem Statement ###
Env:
CDC is established as follows:

3 source systems:

1-DB01 -10.2.0.2 database
2-DB02 -10.2.0.2 database
3-DB03 -10.2.0.3 database

The destination is a 10.2.0.4 database (DB00).
The apply process for source system DB01 (CDC$C_CTMS) does not start and fails with
"ORA-26701: STREAMS process CDC$C_CTMS does not exist"
This apply process was running before but aborted because of a data error:

"ORA-01438: value larger than specified precision allowed for this column"
However, printing the problematic LCR as per Note:265201.1:
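SET SERVEROUTPUT ON;

-- print_lcr is not a built-in; it is assumed to have been created beforehand
-- (for example from the sample code referenced in the note).
DECLARE
  lcr SYS.AnyData;
BEGIN
  -- Fetch one message of the failed transaction from the error queue.
  lcr := DBMS_APPLY_ADM.GET_ERROR_MESSAGE
           (<MESSAGE_NUMBER>, <LOCAL_TRANSACTION_ID>);
  print_lcr(lcr);
END;
/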
shows the transaction as an update.
The table structures are exactly the same on the source and the destination.

Troubleshooting:


1. Please run the Streams Health Check SQL scripts on the instances where
CDC is configured and upload the output file to MetaLink.

Note.273674.1 Streams Configuration Report and Health Check Script

2. The table involved in the failed update (ORA-1438) has 10 columns.

3. We want to execute the transaction manually and see whether it reproduces the
same error, so please follow the procedure below.

exec dbms_apply_adm.execute_error('<x.x.xxxx>') gave the following errors:

ERROR at line 1:
ORA-01438: value larger than specified precision allowed for this column
ORA-06512: at "SYS.DBMS_APPLY_ERROR", line 147
ORA-06512: at "SYS.DBMS_APPLY_ERROR", line 261
ORA-06512: at "SYS.DBMS_APPLY_ADM", line 468
ORA-06512: at line 1


4. There are two bugs about ORA-1438 and CDC. You will find additional information in:

Note.788425.1 CDC Change Table with more than 128 Columns Fails with ORA-1438
Note.745459.1 CDC APPLY FAILS WITH ORA-01722 OR ORA-01438 INTERMITTENTLY BUT RE-EXECUTION WORKS


Solution:


Apply Recommended patches 6454634 & 8680137 
MERGE LABEL REQUEST ON TOP OF 10.2.0.4 FOR BUGS 8488171 846406

knlc_ProcessMVDD-1: MISSING Streams multi-version data dictionary!!!

Problem Description: knlc_ProcessMVDD-1: MISSING Streams multi-version data dictionary!!!


The following errors were found in alert.log:
knlc_ProcessMVDD-1: MISSING Streams multi-version data dictionary!!!
knlldmm: gdbnm=PROMISP.DUBAIWORLD.AE
knlldmm: objn=-40016384
knlldmm: objv=1
knlldmm: scn=1169006882675
knlldmm: opnum=7
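
Since the object number in these messages changes from one occurrence to the next, it can be useful to extract the knlldmm fields from each block and compare them. A minimal sketch, assuming every detail line has the form "knlldmm: key=value" as in the excerpt above; the function name parse_knlldmm is made up for illustration.

```python
def parse_knlldmm(lines):
    """Collect the key=value pairs from 'knlldmm:' lines into a dict."""
    fields = {}
    for line in lines:
        if line.startswith("knlldmm:"):
            key, _, value = line[len("knlldmm:"):].strip().partition("=")
            fields[key] = value
    return fields

# Lines copied from the alert.log excerpt above.
alert_lines = [
    "knlc_ProcessMVDD-1: MISSING Streams multi-version data dictionary!!!",
    "knlldmm: gdbnm=PROMISP.DUBAIWORLD.AE",
    "knlldmm: objn=-40016384",
    "knlldmm: objv=1",
    "knlldmm: scn=1169006882675",
    "knlldmm: opnum=7",
]

fields = parse_knlldmm(alert_lines)
print(fields)
```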

Troubleshooting:


1-Please run the Streams Health Check SQL scripts on the instances where CDC is configured and investigate any errors in the generated files.

Note.273674.1 Streams Configuration Report and Health Check Script

Explanation:


-- Problem Statement:
DB0 is a 10.2.0.4 database that runs 3 downstream capture processes and captures changes from 3 databases:

1- DB1 database 10.2.0.2
2- DB2 database 10.2.0.2
3- DB3 10.2.0.3

In the alert.log file for DB0, we can see the following errors reported.

knlc_ProcessMVDD-1: MISSING Streams multi-version data dictionary!!!
knlldmm: gdbnm=PROMISP.DUBAIWORLD.AE
knlldmm: objn=-40016384
knlldmm: objv=1
knlldmm: scn=1169006882675
knlldmm: opnum=7

The errors always reference database DB2, but the object_id referenced changes.
No objects correspond to these numbers in either the source or the destination.

Solution:



Bug 7257038 ...fixed on 11.2

REDISCOVERY INFORMATION:
If you get
"knlc_ProcessMVDD-1: MISSING Streams multi-version data dictionary!!!"
in the capture database, it's likely to be this bug.



How to run Oracle RDA to gather streams information

As suggested by Note 273674.1, you can run RDA with the options

./rda.sh -vCRP STC

to collect the Streams monitoring and health check information.

Wednesday, June 22, 2011

Oracle Instance Related Waits 1/2



"Wait for a undo record" & "Wait for stopper event to be increased"


Parallel rollback of a large transaction can sometimes become very slow. After a large running transaction is killed (either by killing the shadow process or by aborting the database), the database seems to hang, or SMON and the parallel query servers take all the available CPU.

In fast-start parallel rollback, the background process SMON acts as a coordinator and rolls back a set of transactions in parallel using multiple server processes.

Fast-start parallel rollback is mainly useful when a system has transactions that run a long time before a commit, especially parallel insert, update and delete operations. When SMON discovers that the amount of recovery work is above a certain threshold, it automatically begins parallel rollback by dispersing the work among several parallel processes.

There are cases where parallel transaction recovery is not as fast as serial transaction recovery because the PQ slaves interfere with each other. In such cases the changes made by the transaction cannot be recovered in parallel without causing a performance problem: the parallel rollback slave processes are most likely contending for the same resource, which makes the rollback even slower than a serial rollback.


Solution
======
Disable parallel rollback by setting the following parameter:
fast_start_parallel_rollback = false

Refer to Oracle Note ID 464246.1
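
As a minimal sketch, the parameter can also be changed on a running instance with ALTER SYSTEM. The snippet below assumes a Python DB-API style cursor (for example from an Oracle driver, which the note itself does not mention) and a session with the ALTER SYSTEM privilege. The helper name disable_parallel_rollback and the RecordingCursor stand-in are hypothetical; only the parameter name and value come from the solution above.

```python
# Statement that turns off fast-start parallel rollback so that dead
# transactions are recovered serially. Parameter and value are from the note.
DISABLE_PARALLEL_ROLLBACK = "ALTER SYSTEM SET fast_start_parallel_rollback = FALSE"


def disable_parallel_rollback(cursor):
    """Run the ALTER SYSTEM statement on any DB-API style cursor.

    Assumption: the session behind the cursor may issue ALTER SYSTEM.
    """
    cursor.execute(DISABLE_PARALLEL_ROLLBACK)
    return DISABLE_PARALLEL_ROLLBACK


class RecordingCursor:
    """Hypothetical stand-in for a real cursor; it only records statements."""

    def __init__(self):
        self.statements = []

    def execute(self, sql):
        self.statements.append(sql)


cursor = RecordingCursor()
disable_parallel_rollback(cursor)
print(cursor.statements[0])
```

With a real connection, the same helper would be called with that connection's cursor instead of the recording stub.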


High CPU Wait


Your instance has spent much of its In Oracle time waiting for CPU.
High CPU Wait findings



Description
What to do next
Perform one of the following options:
- Examine high wait for CPU statements in the Activity workspace.
- Examine high CPU usage statements in the Activity workspace.
- Examine CPU utilization in the Statistics workspace.
- Try to identify the system processes consuming CPU resources using the Insight Savvy for OS tool.
Advice
Perform one of the following options:
- Identify other processes in the system.
  The instance has spent much of its In Oracle time waiting for CPU; every process running in your system affects the available CPU resources.
  Effort spent tuning non-Oracle factors can improve Oracle performance.
- Identify heavy statements using CPU or waiting for CPU and try to tune them.


High Other Host Wait


Your instance has spent much of its In Oracle time in Other Host Wait.
High Other Host Wait findings

Description
What to do next
Perform one of the following options:
- Examine heavy wait for Other Host Wait statements in the Activity workspace.
- Try to identify the system processes consuming OS resources using the Insight Savvy for OS tool.
Advice
Other Host Wait can result from any of the following causes: asynchronous I/O, gateways, or the use of NFS and TP monitors. Check the statements and programs suffering from this state and check whether the above resources are being utilized efficiently.





High Memory Wait


Your instance has spent much of its In Oracle time waiting for memory.
High Memory Wait findings

Description
What to do next
Perform one of the following options:
- Examine high Memory Wait statements in the Activity workspace.
- Try to identify the system processes consuming memory using the Insight Savvy for OS tool.
Advice
Perform one of the following options:
- Identify other processes in the system.
  The instance has spent much of its In Oracle time waiting for memory; every process running in your system affects the available memory.
  Effort spent tuning non-Oracle factors can improve Oracle performance.
  For example, memory pressure can result from setting PARALLEL_MAX_SERVERS too high when using the parallel query option.
- Identify heavy statements using High Memory Wait and try to tune them.


High Shared Pool Wait


Your instance has spent much of its In Oracle time waiting for the group event Shared Pool Wait.
High Shared Pool Wait findings

Description
What to do next
Perform one of the following options:
- Examine the Oracle events that are grouped into the Wait for Shared Pool in the Statistics workspace. Determine the dominant Oracle event and follow the tuning scenario set by this event.
- Examine high Shared Pool Wait statements in the Activity workspace.
Advice
Common scenarios for this wait occur when the shared pool is either too small or too big. Make sure your shared pool is sized according to the type of application being used (cursor sharing, literal usage, and so on).
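
"Determine the dominant Oracle event" can be sketched as picking the event with the largest share of the group's total wait time. This is only an illustration: the function name dominant_event is hypothetical, the figures are invented, and treating these three events as members of the Shared Pool Wait group is an assumption rather than something stated above.

```python
def dominant_event(waits):
    """Return (event_name, share_of_total) for the event with the most wait time.

    `waits` maps an event name to the time waited (any consistent unit).
    Returns None when there is no wait time at all.
    """
    total = sum(waits.values())
    if total <= 0:
        return None
    name = max(waits, key=waits.get)
    return name, waits[name] / total


# Hypothetical figures (seconds waited), not taken from a real system.
sample = {
    "latch: shared pool": 120.0,
    "library cache lock": 30.0,
    "library cache pin": 50.0,
}
print(dominant_event(sample))
```

The same selection applies to the other grouped waits in this post (Rollback Segment, Log Switch and Clear, RAC/OPS): find the event that accounts for most of the group's time, then follow the tuning scenario for that event.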


High Rollback Segment Wait


Your instance has spent much of its In Oracle time waiting for the group event Rollback Segment Wait.
High Rollback Segment Wait findings

Description
What to do next
Perform one of the following options:
- Examine the Oracle events that are grouped into the Wait for Rollback segment. Determine the dominant Oracle event and follow the tuning scenario set by this event.
- Examine the statements or objects with the highest values of Rollback Segment Wait and determine which applications are creating this wait.
Advice
To reduce contention on the rollback segment, consider one of the following solutions:
- Add rollback segments, moving them into a less busy tablespace.
- Change the application flow or change the rollback policy (using no logging on specific objects).


High Redo Log Buffer Wait


Your instance has spent much of its In Oracle time waiting for the Redo Log.
High Redo Log Buffer Wait findings

Description
What to do next
Perform one of the following options:
- Examine the related Oracle events (lower area), Redo Activity (upper area), to determine the problem type in the Statistics workspace.
- Examine high Redo Log Buffer Wait statements in the Activity workspace.
Advice
Use any one of the typical problem scenarios described below.
- If the log buffer is too small, this usually results in long waits for the Log Buffer Space event.
  Consider increasing the LOG_BUFFER parameter.
- If the log buffer is too big, this usually results in a low number of user commits, high redo wastage statistics, and long waits for the Log File Sync event.
  Consider decreasing the LOG_BUFFER parameter and/or the hidden _LOG_IO_SIZE parameter.
- If there are too many commits, this usually results in long waits for the Log File Sync event together with a very high number of user commits.
  Consider changing the application flow and logic (by decreasing the commit frequency or using bulk commits, which results in larger transactions).
- If the LGWR is too slow, check whether Log File Sync is still the dominant event. If it is accompanied by high values for Log File Parallel Write, or occurs although there are not many commits, the LGWR may be underperforming.
  Consider moving the log file to a faster, dedicated device.
Whenever the Log Buffer Space and Log File Sync events occur together, consider changing the hidden _LOG_IO_SIZE parameter.
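
The scenarios above amount to a small decision table. The sketch below encodes them as a function that maps observed symptoms to the matching advice; the function name redo_wait_advice, the boolean inputs, and the wording of the returned strings are all hypothetical, and deciding when a wait counts as "long" or "high" is left to the reader.

```python
def redo_wait_advice(log_buffer_space, log_file_sync, many_commits=False,
                     high_redo_wastage=False, slow_parallel_write=False):
    """Map observed redo symptoms (booleans) to the advice listed above."""
    advice = []
    if log_buffer_space:
        # Long Log Buffer Space waits: the log buffer is probably too small.
        advice.append("increase LOG_BUFFER")
    if log_file_sync and many_commits:
        # Long Log File Sync waits with very many user commits.
        advice.append("reduce commit frequency or use bulk commits")
    if log_file_sync and high_redo_wastage and not many_commits:
        # Few commits plus high redo wastage: the log buffer may be too big.
        advice.append("decrease LOG_BUFFER")
    if log_file_sync and slow_parallel_write:
        # High Log File Parallel Write: LGWR itself is slow.
        advice.append("move the redo logs to a faster, dedicated device")
    if log_buffer_space and log_file_sync:
        # Both events together: look at the hidden log I/O size parameter.
        advice.append("review the hidden log I/O size parameter")
    return advice


print(redo_wait_advice(False, True, many_commits=True))
```

More than one item can be returned, because the symptoms are not mutually exclusive.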


High Log Switch and Clear Wait


Your instance has spent much of its In Oracle time waiting for the group event Log Switch and Clear.
High Log Switch and Clear Wait findings

Description
What to do next
Perform one of the following options:
- Examine the Oracle events that are grouped into the Wait for Log Switch and Clear. Determine the dominant Oracle event and follow the tuning scenario set by this event in the Statistics workspace.
- Examine the statements with the highest values for this wait and determine which applications are creating this wait in the Activity workspace.
Advice
If the related Oracle events show too many log switches, try to reduce them using one of the following options:
- Increase the redo log size (when the Log File Switch (checkpoint incomplete) and/or Log File Switch Completion event is dominant).
- Change the application flow or logging policy (by changing the commit frequency or using No Logging on specific objects).
There can be other reasons for a high Log Switch and Clear wait, such as an LGWR delay where the files cannot be switched until ARCH archiving is completed. This usually shows up as the Log File Switch (archiving needed) event.
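
One way to judge whether there are "too many log switches" is to count switches per clock hour. The sketch below assumes the switch timestamps have already been collected (for example from V$LOG_HISTORY.FIRST_TIME, which is an assumption about the data source). The function names and the threshold of 4 switches per hour are hypothetical; the threshold is a rule of thumb chosen for illustration, not a figure from this post.

```python
from collections import Counter
from datetime import datetime


def switches_per_hour(switch_times):
    """Count log switches in each clock hour."""
    return Counter(
        t.replace(minute=0, second=0, microsecond=0) for t in switch_times
    )


def busy_hours(switch_times, max_per_hour=4):
    """Return the hours whose switch count exceeds the assumed threshold."""
    counts = switches_per_hour(switch_times)
    return sorted(hour for hour, n in counts.items() if n > max_per_hour)


# Hypothetical switch times: six switches between 10:00 and 11:00, two after.
sample = [datetime(2011, 6, 22, 10, m) for m in (1, 9, 18, 27, 40, 55)]
sample += [datetime(2011, 6, 22, 11, 5), datetime(2011, 6, 22, 11, 45)]
print(busy_hours(sample))
```

Hours that show up in the result are candidates for larger redo logs or a change in logging policy, as described above.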


High RAC/OPS Wait


Your instance has spent much of its In Oracle time waiting for RAC or OPS.
High RAC/OPS Wait findings

Description
What to do next
Perform one of the following options:
- Examine the Oracle events that are grouped in the Oracle RAC/OPS Wait. Determine the dominant Oracle event and follow the tuning scenario set by this event in the Statistics workspace.
- Examine load balancing between RAC instances in the Activity workspace.
- Examine heavy objects suffering from RAC Waits in the Activity workspace.
Advice
There are two typical scenarios relevant to RAC Waits. Launch to the Dashboard RAC Database view. This view compares the selected instance with other instances in the same database. From the RAC Database view, examine each of the following issues:
- If there is an instance balancing issue, launch to the Activity workspace and identify the root cause of the imbalance.
- If there is a RAC Wait event, compare it to other events in the database in the Statistics workspace.
- If there are objects suffering from RAC Waits, launch to the Activity workspace and identify their use across database instances.



High Other Lock Wait


Your instance has spent much of its In Oracle time waiting for a latch.
High Other Lock Wait findings

Description
What to do next
Perform one of the following options:
- Examine the Latching view over time and the Oracle latches that are grouped in the Other Lock Wait. Then determine the dominant Oracle latch or enqueue and follow the tuning scenario set by this latch in the Statistics workspace.
- Examine high Other Lock Wait statements in the Activity workspace.
Advice
Examine the Oracle latches that are grouped into the Other Lock Wait. Determine the dominant Oracle latch or enqueue and follow the tuning scenario set by this event.
