StarQuest Technical Documents

SQDR Plus: Oracle XStream Diagnostics

Last Update:13 September 2022
Product: SQDR & SQDR Plus
Version: 5.30 and later
Article ID: SQV00PL072

Use the script xstream_diagnostics.sql to gather some diagnostic details about XStream on your Oracle source.

The queries are to be run in the root container as the XStream admin user (usually c##xstrmadmin in a pdb environment, xstrmadmin in a non-pdb environment.)

Please provide the output of this script along with tier2 support logs when you have an issue for StarQuest support to investigate.

This script (or the individual queries) can be run easily in SQL*Plus or SQL Developer. When running individual queries, run this first to make sure you get full date/time details

ALTER SESSION SET NLS_DATE_FORMAT = 'DD-MON-YYYY HH24:MI:SS';

Here are the more details about some of the queries.:

Capture details including latency

SELECT CAPTURE_NAME,
STATE,
((SYSDATE - CAPTURE_MESSAGE_CREATE_TIME)*86400) LATENCY_SECONDS,
((SYSDATE - CAPTURE_TIME)*86400) LAST_STATUS,
TO_CHAR(CAPTURE_TIME, 'HH24:MI:SS MM/DD/YY') CAPTURE_TIME,
TO_CHAR(CAPTURE_MESSAGE_CREATE_TIME, 'HH24:MI:SS MM/DD/YY') CREATE_TIME
FROM gV$XSTREAM_CAPTURE;

We can see from the output below that the XStream capture has a latency of 421465 seconds (117 hours).
The reason for this latency needs to be investigated.
In this case, the XStream agent was stopped for several days and upon startup the oldest messages were 117 hours old.

CAPTURE_NAME STATE LATENCY_SECONDS LAST_STATUS CAPTURE_TIME CREATE_TIME
CAP$_RAC1PHX1NK_1 WAITING FOR TRANSACTION 421465 0 09:24:14 08/24/22 12:19:49 08/19/22

Current position of XStream Outbound Server

SELECT SERVER_NAME,
SOURCE_DATABASE,
PROCESSED_LOW_POSITION,
TO_CHAR(PROCESSED_LOW_TIME,'HH24:MI:SS MM/DD/YY') PROCESSED_LOW_TIME
FROM ALL_XSTREAM_OUTBOUND_PROGRESS;

Here we can see the processed_low_position for the XStream Outbound Server. This can tell us low watermark transacton which in turn can help us understand if the server is caught up and is updating its position regularly. This query also confirms the Outbound SERVER_NAME and the source database name.

SERVER_NAME SOURCE_DATABASE PROCESSED_LOW_POSITION PROCESSED_LOW_TIME
RAC1PHX1NK RAC1_PDB1.SUB08091883250.VCN1.ORACLEVCN.COM 00000000052E80A2000000010000000100000000052E80A2000000010000000102 09:26:42 08/24/22

State of Capture

SELECT CAPTURE_NAME,
STATE,
TOTAL_MESSAGES_CAPTURED,
TOTAL_MESSAGES_ENQUEUED
FROM gV$XSTREAM_CAPTURE;


This query gives us important information about the state of the capture. Possible states include:

WAITING FOR REDO - This state shows the Outbound Server is waiting on redo logs to continue its work. If redo/archivelogs are missing then the Outbound Server (and StarQuest agent) will wait, without throwing errors until the missing redo is available

WAITING FOR REDO: FILE /rdsdbdata/db/JDEPROD_A/arch/redolog-635566-1-1069256902.arc, THREAD 1, SEQUENCE 635566, SCN 0x0000000c9e375c6a

CAPTURING CHANGES

PAUSED FOR FLOW CONTROL - This state means that the capture is unable to enqueue log entries either because of low memory or because propagations and outbound servers are consuming messages slower than the capture process is creating them. This state indicates flow control that is used to reduce spilling of captured LCRs when propagation or apply has fallen behind or is unavailable.

WAITING FOR TRANSACTION - This state shows the capture is waiting for LogMiner to provide more transactions.

The output of the query also shows number of messages capture and enqueued since the capure process was last started. Check that these numbers are increasing on a non-idle system.

CAPTURE_NAME STATE TOTAL_MESSAGES_CAPTURED TOTAL_MESSAGES_ENQUEUED
CAP$_RAC1PHX1NK_1 WAITING FOR TRANSACTION 1157829 423406

Memory used by Apply

SELECT APPLY_NAME AS APP,
SGA_USED/(1024*1024) AS USED,
SGA_ALLOCATED/(1024*1024) AS ALLOCATED,
TOTAL_MESSAGES_DEQUEUED AS DEQUEUED,
TOTAL_MESSAGES_SPILLED AS SPILLED
FROM gV$XSTREAM_APPLY_READER;

This query shows important information about memory used and allocated by the apply process along with number of messages dequeued and spilled. A large number of spilled messages may affect performance and may require some tuning of parameters of the apply.

APP USED ALLOCATED DEQUEUED SPILLED
RAC1PHX1NK 421.83728790283203125 435.55146026611328125 423078 0

Streams Pool memory usage

select * from gv$sgastat where pool like '%streams%' and BYTES > 1000 order by BYTES desc;

This query gives us more details on streams pool usage. The important components to examine are free memory in the streams pool and how much message is being used by 'apply shared t'. This can confirm low memory conditions along with the previous query about the apply above.

INST_ID POOL NAME BYTES CON_ID
1 streams pool free memory 5942242160 0
1 streams pool apply shared t 459312448 1
1 streams pool Logminer LCR c 25012048 1
1 streams pool capture shared 15850752 1
1 streams pool capture shared sga 6272 1
1 streams pool apply shared transactions 6272 1
1 streams pool recov_kgqbtctx 5952 1
1 streams pool memory_knlso 4200 1
1 streams pool Sender info 3128 1
1 streams pool KGH: NO ACCESS 1920 1
1 streams pool krvxhds 1792 1
1 streams pool kwqbcqini:spilledovermsgs 1584 1

Replication events including errors

select * FROM ALL_REPLICATION_PROCESS_EVENTS order by EVENT_TIME desc

This view can reveal details about replication events involving XStream capture and apply. Columns for error and error message will reveal ORA- errors that have been raised.
If the query returns too many rows you can limit to a specific time frame (where EVENT_TIME > sysdate -3) or only look at errors (where ERROR_NUMBER is not null.
Below we can see output when an Outbound Server was stopped and the Apply then gave an error that it was disabled and then stopped then started again successfully.

STREAMS_TYPE PROCESS_TYPE STREAMS_NAME EVENT_NAME DESCRIPTION EVENT_TIME ERROR_NUMBER ERROR_MESSAGE
XSTREAM APPLY SERVER RAC1PHX1NK START SUCCESS 19-AUG-22 12.46.05.824735000 PM
XSTREAM APPLY READER RAC1PHX1NK START SUCCESS 19-AUG-22 12.46.05.814012000 PM
XSTREAM APPLY COORDINATOR RAC1PHX1NK START SUCCESS 19-AUG-22 12.46.05.785575000 PM
XSTREAM APPLY READER RAC1PHX1NK STOP SUCCESS 19-AUG-22 12.46.05.430969000 PM
XSTREAM APPLY SERVER RAC1PHX1NK STOP SUCCESS 19-AUG-22 12.46.05.430969000 PM
XSTREAM APPLY SERVER RAC1PHX1NK ABORT ERROR 19-AUG-22 12.46.01.895823000 PM 26920 ORA-26920: outbound server "" has been stopped.

Additional diagnostics

For more detailed information, we also suggest the script in the Oracle Support note
Streams Configuration Report and Health Check Script (Doc ID 273674.1)
Though the document refers to Oracle 12, the script runs on later versions as well. Also, if you have a RAC (Real Application Cluster) system, run the Streams Health Check Script on each node of the Oracle RAC database. This script needs to be run with a user that has sysdba privileges.

Oracle documentation

Monitoring XStream Out (Oracle 11.2)
Monitoring XStream Out (Oracle 12.2)
Monitoring XStream Out (Oracle 19.1)

 

 



DISCLAIMER

The information in technical documents comes without any warranty or applicability for a specific purpose. The author(s) or distributor(s) will not accept responsibility for any damage incurred directly or indirectly through use of the information contained in these documents. The instructions may need to be modified to be appropriate for the hardware and software that has been installed and configured within a particular organization.  The information in technical documents should be considered only as an example and may include information from various sources, including IBM, Microsoft, and other organizations.