Thursday, October 28, 2010

SAP PI Monitoring activities

Activities for Production Support and Maintenance projects in SAP PI/XI

On a Production Support and Maintenance project, an SAP PI / XI consultant is responsible for a series of day-to-day activities. Here I list the steps that may be useful for monitoring the PI / XI Production server.
J2EE Engine side monitoring:
1. Open the following URL for each instance of your Production Server: http://:500/rep/start/index.jsp
Make sure that these links are responding.
2. Go to the Runtime Workbench (RWB) using the URL http://:500/rwb
and perform the following:
a) Click Component Monitoring, then Display, and then click Adapter Engine to monitor your Adapter Engine.
b) In Adapter Engine Monitoring, check for adapters with errors.
c) When this is done, move on to Message Monitoring.
d) To do this, click the Message Monitoring tab.
e) Choose Free Entry from the start/end date combo box, then choose the status of the messages that were processed by the Adapter Engine.
f) Check for messages in the To Be Delivered state.
g) Check for messages in the Delivering state.
h) Check for messages in any of the error states.
i) Use the filter on this monitor page if needed.
Once all of the above activities are complete, J2EE Engine side monitoring is done.
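The status checks in steps e) to h) can also be summarized from an exported message list. What follows is only a minimal sketch: it assumes the statuses have been copied out of Message Monitoring as plain strings, and the function name and the status spellings are illustrative assumptions, not an SAP API.

```python
def summarize_statuses(statuses):
    """Count the message states that the daily check cares about.

    `statuses` is a list of status strings copied from Message Monitoring.
    The spellings matched below are assumptions; adjust them to what your
    RWB release actually displays.
    """
    summary = {"to_be_delivered": 0, "delivering": 0, "error": 0}
    for status in statuses:
        normalized = status.strip().lower()
        if normalized == "to be delivered":
            summary["to_be_delivered"] += 1
        elif normalized == "delivering":
            summary["delivering"] += 1
        elif "error" in normalized:
            # Any state whose name contains "error" counts as an error state.
            summary["error"] += 1
    return summary
```

Messages in any other state (for example successfully delivered ones) are simply not counted, so a summary with all zeros means nothing is waiting or failed.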
ABAP side monitoring:
1. Use transaction SXMB_MONI to monitor the messages processed through the entire XI pipeline, that is, the Integration Server.
2. Use transaction SXMB_MONI_BPE to monitor the messages processed by the BPE (Business Process Engine).
3. Use SM37 to check the job status.
4. Use transaction SMQR to monitor the XI runtime queues (I/B, O/B queues).
5. Use transaction SMQ1 to monitor the XI runtime queues (O/B queues).
6. Use transaction SMQ2 to monitor the XI runtime queues (I/B queues).
7. Use SLDCHECK to check the SLD connection. If there are any red entries, report them to the SAP Basis team.
8. Run transaction SM21 (Read System Log) on the XI server to look for error messages around the time the error occurred.
9. Check that all instances of your Production Server are running fine.
To check this, use transaction SM51. All instances should be in the active state.
This completes ABAP side monitoring.
· Check the log and trace files
· Check the URL for server monitoring
http://:500/MessagingSystem/monitor/systemStatus.jsp
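The "make sure these links are responding" checks can be scripted. The sketch below uses only the Python standard library; the host and port are parameters you must supply yourself (they are elided in the URLs above), and the function names are hypothetical. It treats any HTTP response, including a logon challenge, as "responding", which is an assumption you may want to tighten.

```python
from urllib.error import HTTPError, URLError
from urllib.request import urlopen

# Paths taken from the URLs listed in this post.
PATHS = (
    "/rep/start/index.jsp",
    "/rwb",
    "/MessagingSystem/monitor/systemStatus.jsp",
)


def build_urls(host, port):
    """Build the monitoring URLs for one instance (host and port are yours)."""
    return [f"http://{host}:{port}{path}" for path in PATHS]


def http_status(url, timeout=10):
    """Return the HTTP status code, or None if the server cannot be reached."""
    try:
        with urlopen(url, timeout=timeout) as response:
            return response.status
    except HTTPError as err:
        # The server answered, just not with a success code (e.g. 401).
        return err.code
    except (URLError, OSError):
        return None


def check_instance(host, port, fetch=http_status):
    """Map each URL to True if any HTTP response came back."""
    return {url: fetch(url) is not None for url in build_urls(host, port)}
```

Calling `check_instance` once per instance and reporting every URL mapped to False gives the same result as opening each link by hand. The `fetch` parameter exists only so the logic can be tried without a live server.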
Ref: from http://www.saptechnical.com/

http://wiki.sdn.sap.com/wiki/display/XI/Michal+Krawczyk+-+FAQ+blog+-+Wiki
http://www.sdn.sap.com/irj/scn/weblogs?blog=/pub/wlg/2728
http://wiki.sdn.sap.com/wiki/display/XI/Troubleshooting
1. Process XML messages – Standard and Process (check for message failures, determine if ABAP or Java related)
2. Job Overview
3. Persistence Layer Analysis (Observe growth day to day, no sudden jumps).
4. Tcode: SMQ2, SMQ1 (check for blocking)
5. Tcode: SM66 (no long running DIA)
6. Tcode: SM21, ST22 (look for unusual errors)
7. Tcode: ST04 (Overview - hit ratios)
8. Tcode: ST06
   1. CPU (no overloads)
   2. Memory
9. Tcode: ST03 (Workload overview. Response times, special attention on DIA, HTTP)
10. Tcode: SMICM (for each server: look at the Current/Peak/Maximum values and make sure Peak is not hitting Maximum)
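The SMICM check in item 10 is a simple comparison of Peak against Maximum. As a rough sketch, assuming the three values have been noted down per counter (the counter names and the 90% threshold below are illustrative assumptions, not SAP defaults):

```python
def headroom_warnings(counters, threshold=0.9):
    """Return the counters whose Peak is close to, or at, Maximum.

    `counters` maps a counter name to a (current, peak, maximum) tuple.
    A counter is reported when peak >= threshold * maximum.
    """
    warnings = []
    for name, (current, peak, maximum) in counters.items():
        if maximum > 0 and peak >= threshold * maximum:
            warnings.append(name)
    return warnings
```

For example, a counter noted as `(10, 48, 50)` would be reported, while `(20, 100, 500)` would not.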

RWB:

1. Message Monitoring – Database (overview) – Integration Engine and Adapter Engine (look for unusual errors)
2. Performance Monitoring – last 7 days/Daily (look for volume jumps)
More importantly, if you have Wily Introscope in your environment, look for trends in messages, garbage collections, etc.
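"Look for volume jumps" over the last 7 days can be made concrete with a day-over-day comparison. This is a minimal sketch, assuming the daily message counts have been read off Performance Monitoring by hand; the factor of 2 is an arbitrary illustrative threshold, not an SAP recommendation.

```python
def volume_jumps(daily_counts, factor=2.0):
    """Find days whose volume is more than `factor` times the previous day's.

    Returns a list of (day_index, previous_count, current_count) tuples,
    where day_index is the position in `daily_counts` (0 = oldest day).
    """
    jumps = []
    for i in range(1, len(daily_counts)):
        previous, current = daily_counts[i - 1], daily_counts[i]
        if previous > 0 and current > factor * previous:
            jumps.append((i, previous, current))
    return jumps
```

The same comparison can be applied to the day-to-day growth figures from the Persistence Layer Analysis.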

More detailed documentation is available in the 'Process Integration PI 7.1' Troubleshooting Guide:

https://websmp104.sap-ag.de/operationsNWpi71
-> Process Integration
-> Troubleshooting Guide - SAP NetWeaver PI 7.1

XI/PI: Dealing with Errors on the Outbound side


Sometimes the link between SAP XI and the target system (say ERP) goes down and messages fail on the outbound side. It may not be possible to restart them using RWB or transactions such as SXI_MONITOR/SXMB_MONI. This article explains how to deal with such messages.
Generally, messages are picked up and sent via SAP XI when the link returns. However, in some scenarios the SAP XI server may not have been able to finish its conversation with ERP. The main status of such messages is “Processed successfully”, but there is an error on the outbound side as shown below (transactions SXI_MONITOR/SXMB_MONI).
image
These messages are not picked up automatically, and it is not possible to restart them using RWB or transactions such as SXI_MONITOR/SXMB_MONI.
Such messages could be processed in the following way:
  • Send data directly to Integration Engine
  • Change the status of failed message
This example shows how to solve the problem: two error messages are shown and one of them is resolved here.
Step 1: Send data directly to Integration Engine
Go to Component Monitoring in the SAP XI Runtime Workbench. Click the Test Message tab for the Adapter Engine. Specify the URL of the SAP XI Integration Engine to send the message to, e.g. http://:/sap/xi/engine?type=entry
image
Specify the header information. Copy the payload of the message from SXI_MONITOR/SXMB_MONI and paste it into the Payload area in RWB.
image
Send the message using the Send Message button.
Step 2: Change the status of failed message
Call the transaction SWWL.
image
Delete the appropriate work items.
image
Check that the messages are complete in SXI_MONITOR/SXMB_MONI.
image
Another, simpler way to accomplish this is to use transaction SXMB_MONI_BPE. Select Continue Process Following Error under Business Process Engine -> Monitoring and choose Execute (F8). Update the selection criteria as required and choose Execute (F8) again. Choose the appropriate line item and click the Restart Workflow button.
In these ways we can reprocess messages that failed on the outbound side. If the messages do not participate in a BPM process, they can still be traced via the outbound queues or the SM58 logs and restarted.

Problem with Cache Refresh

When we change design or configuration objects created in the IR and ID and the changes are not reflected in those objects, the first thing that comes to mind is a cache refresh. There are several options for performing the cache refresh.
  1. In the IR, from the menu: Environment -> Clear SLD Data Cache.
  2. In the ID, from the menu: Environment -> Clear SLD Data Cache.
From the PI home page, under Administration, we have the cache overview.
When we log on to the ABAP stack and run transaction SXI_CACHE, we can see that the status is red: the cache is not up to date and cannot be refreshed. Even if we perform the cache refresh, the changes are not reflected because the cache is not working properly.
In the IR, the cache notifications option in the menu shows whether the cache is refreshing. It also shows the status in red: the cache is not up to date and is not refreshing.
So how do we resolve this issue, and what causes the red status and the out-of-date cache?
To resolve this issue, log on to the ABAP stack and check the RFC destination of type H, INTEGRATION_DIRECTORY_HMI, to see whether the path prefix is maintained.
If the path prefix is empty, as it was in this case, maintain it and save the destination.
After this step, go to SXI_CACHE and perform the cache refresh. The refresh now runs and the status shows the cache as up to date.
Messages in XI can fail for many reasons. The most common failures are due to connection failures to end systems, wrong or missing configuration settings, unhandled exceptions, or lack of disk space for processing messages. These errors can be categorized as those generated in the
I. Integration Engine
II. Adapter Engine.
I. Errors in Integration Engine
a) qRFC Errors
Often in asynchronous scenarios where inbound queues are used, a queue is set to SYSFAIL status and all the messages in that inbound queue are stuck (not processed). Depending on the status of the XI processing queues, we can reset a queue's status and trigger processing of the messages.
Manual resend of messages: Use transaction SMQR or SMQ2 to reset the status of the queues. As you can see in the following figure, the queue has been marked with the status SYSFAIL.
image
To be able to initiate processing of the messages stuck in the queue, make sure to set the following IS configuration parameter:
MONITOR QRFC_RESTART_ALLOWED to value 1
image
For automatic qRFC failure recovery, schedule the report RSQIWKEX to run periodically. This report automatically resets the queues.
b) tRFC Errors
As with qRFC errors, one can either manually or automatically initiate processing of messages held in hanging tRFC calls.
Manual resend of messages: Use transaction SM58 and check through the list. If necessary, start the hanging tRFC calls under the Edit menu by choosing Execute LUWs.
For automatic tRFC failure recovery, schedule the report RSARFCEX for periodic execution.
c) Other Errors
All the errors generated and captured in the Integration Engine can be viewed using transaction SXMB_MONI. Messages that were sent asynchronously and failed due to transient system or configuration failures can be manually restarted in SXMB_MONI.
image
But it would be no fun to restart many messages manually. What is required is a way to automatically resend messages that error out. Thankfully there are several ways of doing this in XI.
Option 1
IS_Retry
A batch job (internal to XI) is automatically scheduled to reprocess the entry after 2 minutes.
If the maximum number of retries is reached (10 by default; IS configuration parameter TUNING IS_RETRY_LIMIT), a communication error then causes a SYSFAIL status for the queue.
Option 2
The problem with relying on IS_Retry is that every message with a failure status is retried every 2 minutes until the maximum number of retries is reached. Since there is no control over the retry period, a high retry count could cause excessive load on XI. The other option is a mass restart: schedule the report RSXMB_RESTART_MESSAGES at a predetermined retry period, such as 1 hour. There is a catch here: RSXMB_RESTART_MESSAGES tries to restart a failed message 800 times by default. So if a message failed for genuine reasons, we may want to limit the number of retries. SAP recommends reducing the retry count to 20 restarts. (You can always manually restart a message from the monitor, up to 990 times.)

This value can be maintained in SXMB_ADM -> specific configuration 'DELETION' 'MAX_VERSION' 'BATCH_RETRY'. If you don't see the DELETION category, you must run the report RSXMB_CREATE_CONF_ENTRIES3 to generate the configuration parameter.
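To see what these retry counts mean in elapsed time, the arithmetic can be worked through quickly. This is a sketch under two assumptions: retries are evenly spaced, and each scheduled run of RSXMB_RESTART_MESSAGES counts as one restart attempt per failed message. The hourly schedule is the example period mentioned above; the function name is illustrative.

```python
from datetime import timedelta


def retry_window(retries, interval):
    """Total time spanned by `retries` attempts spaced `interval` apart."""
    return retries * interval


# IS_Retry: 10 retries (default limit), 2 minutes apart.
is_retry = retry_window(10, timedelta(minutes=2))

# RSXMB_RESTART_MESSAGES scheduled hourly: default 800 restarts vs. 20.
batch_default = retry_window(800, timedelta(hours=1))
batch_reduced = retry_window(20, timedelta(hours=1))

print("IS_Retry window:", is_retry)
print("Batch restart, 800 retries:", batch_default)
print("Batch restart, 20 retries:", batch_reduced)
```

Under these assumptions IS_Retry gives up after 20 minutes, an hourly batch restart with the default count keeps retrying a genuinely broken message for over a month, and the reduced count stops within a day.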

Finally, here is the table that describes ways to handle resubmission of errors in the Integration Engine:

Type of Error | Manual Start | Automatic Start
qRFC          | SMQ2         | RSQIWKEX
tRFC          | SM58         | RSARFCEX
Other errors  | SXMB_MONI    | RSXMB_RESTART_MESSAGES
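The table above can also be kept as a small lookup in a runbook script. This is only an illustrative sketch: the dictionary holds exactly the transactions and reports listed in the table, while the function name and the fallback to "other" for unrecognized error types are my own assumptions.

```python
# Manual transaction and automatic report per error type, as in the table.
RECOVERY = {
    "qrfc": {"manual": "SMQ2", "automatic": "RSQIWKEX"},
    "trfc": {"manual": "SM58", "automatic": "RSARFCEX"},
    "other": {"manual": "SXMB_MONI", "automatic": "RSXMB_RESTART_MESSAGES"},
}


def recovery_for(error_type):
    """Look up how to restart a failed message for the given error type.

    Unknown error types fall back to the "other" row (an assumption).
    """
    return RECOVERY.get(error_type.strip().lower(), RECOVERY["other"])
```

For example, `recovery_for("tRFC")["manual"]` gives the transaction to open when restarting by hand.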

II. Errors in Adapter Engine
Until now we have seen how to resubmit or restart messages that failed in the Integration Engine. Once a message makes it from the Integration Engine to the Adapter Engine, the message is flagged as checked in the Integration Engine. The status of the message in the Adapter Engine does not affect the processed state in the Integration Engine. If this message is asynchronous, XI will by default try to restart the message 3 times at intervals of 5 minutes before the status of the message is changed from Waiting to System Error.
image

image

image
As shown in the above figures, a message is initially put into Waiting status, and XI tries 3 times before changing the status of the message to System Error. One can manually resend the failed messages by using the RESEND button in RWB. In scenarios where XI was trying to send the message to an end system that was down for maintenance, you would want XI to resubmit the message automatically without human intervention. It would be nice to be able to tune the retries, like the IS_Retry setting that is available for the Integration Engine.
We can achieve this by changing the retry count used by the Adapter Engine; by default it is set to 3 retries, 5 minutes apart. This count can be changed in Visual Admin -> server -> services -> SAP XI Adapter: XI. Here, change the Retries parameter from 3 to 10 and the retryInterval parameter to around 10 minutes. For these configuration changes to be picked up, restart SAP XI Adapter: XI.
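Whether a given retry setting would ride out a maintenance window can be checked with a one-line comparison. The sketch below assumes the first retry happens one interval after the initial failure and that retries are evenly spaced; the function name is hypothetical and the outage lengths in the usage note are made-up examples.

```python
def covers_outage(retries, interval_minutes, outage_minutes):
    """True if the last retry happens after the outage has ended.

    Assumes evenly spaced retries, the first one `interval_minutes`
    after the initial failure.
    """
    return retries * interval_minutes > outage_minutes
```

With the default of 3 retries at 5 minutes, `covers_outage(3, 5, 60)` is False for a one-hour outage, while the tuned setting of 10 retries at 10 minutes gives `covers_outage(10, 10, 60)` as True.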
Conclusion
Errors in XI are inevitable, but when they occur we should be able to restart or resend the messages in a way that requires minimal human intervention, especially if the errors were due to a system outage or system memory exceptions. In this weblog I have tried to list the most commonly occurring errors and the many ways of restarting these messages.


Summary


1. Routine monitoring

a. Adapter Engine monitoring: Message Monitoring, or server: host/mdt
b. Integration Engine monitoring: SXMB_MONI, SXI_MONITOR
c. Communication channel monitoring in the Runtime Workbench (if you find that any message has failed).
   
2. BPM monitoring:
  Use transaction SXMB_MONI_BPE:
       i) Use the table view and the graphical view of the BPM.
       ii) The table view is helpful for seeing the payload at various levels.
       iii) The graphical flow helps us understand the flow across the different steps. (Select any step, then go to the table view to see the status of messages or variables by clicking the Content tab; we can use the trace option to trace the message.)

3. Outages (no need to stop the Integration Server and Adapter Engine)

    a) Planned outage
       Before the outage:
        i) Stop all sender-specific communication channels for the affected applications.
       ii) For sender ABAP proxies, sender IDocs, and sender HTTP, just inform the sender side to stop sending messages from the sender system.

       After the outage:

        i) Restart all sender-specific communication channels.

   b) Unplanned outage:

       i) Adapter Engine (A.E.) level: messages might fail at this level.
      ii) Integration Engine level:
          a) Check SMQ2, check for system errors, and reprocess those messages once the receiver system is up.