Problem Determination
for WebSphere for z/OS
Problem determination methodology
Problem symptoms and their solutions
Means and tools to support problem determination

Rica Weller, Cleberson Calefi, Per Fremstad, Keith Jabcuga, Suresh Maddukuri, Kiet Nguyen, Robyn Nostalgi, Rajesh Pericherla
ibm.com/redbooks
International Technical Support Organization
Problem Determination for WebSphere for z/OS
August 2006
SG24-6880-02
Note: Before using this information and the product it supports, read the information in Notices on page xv.
Third Edition (August 2006)
This edition applies to Version 6, Release 0, Modification 1 of WebSphere Application Server for z/OS (program number 5655-N01).
Copyright International Business Machines Corporation 2002, 2005, 2006. All rights reserved.
Note to U.S. Government Users Restricted Rights -- Use, duplication or disclosure restricted by GSA ADP Schedule Contract with IBM Corp.
Contents
Figures
Tables
Notices
Trademarks
Preface
How the book is structured
What this book is about
Who this book is for
How to use this book
The team that wrote this redbook
Become a published author
Comments welcome

Part 1. Problem determination methodology

Chapter 1. Problem determination methodology
1.1 What problem determination is
1.2 What problem determination is not
1.3 Problem determination approach
1.3.1 Problem determination flow chart
1.3.2 Problem determination process
1.3.3 Specific considerations for WebSphere for z/OS problem determination
1.3.4 The importance of a test environment
1.4 The skills needed for WebSphere for z/OS
1.4.1 System skills
1.4.2 Skills for deploying and running an application

Chapter 2. Contacting IBM: Information
2.1 Communicating with IBM
2.2 The IBM WebSphere support structure
2.3 Before you contact IBM support
2.3.1 Defining the problem
2.3.2 Determining whether this situation has already been reported
2.3.3 Gathering background information
2.3.4 Determining the business impact
2.4 How IBM Software Support handles your problem
2.4.1 The PMR
2.4.2 Investigating a PMR
2.4.3 How technical questions are handled by IBM
2.5 Exchanging data with IBM by FTP
2.5.1 Copying the job log into a z/OS data set
2.5.2 Compressing the data
2.5.3 Finding specific FTP instructions
2.5.4 Using naming conventions
2.6 IBM contacts

Chapter 3. Information sources
3.1 WebSphere for z/OS support pages
3.1.1 The WebSphere for z/OS home page
3.1.2 WebSphere support page
3.1.3 WebSphere for z/OS V6 product manuals
3.1.4 WebSphere for z/OS V6 Information Center
3.1.5 WebSphere for z/OS IBM services
3.1.6 Recommended reading list: WebSphere Application Server
3.2 Techdocs: White papers, hints, and tips
3.3 Redbooks and draft publications
3.4 Sources of information for developers
3.4.1 WebSphere Developers Domain
3.4.2 The alphaWorks community
3.4.3 Java
3.5 Other helpful Web sites
3.5.1 zSeries support
3.5.2 z/OS home page
3.5.3 LookAt messages
3.5.4 All software products
3.5.5 IBM Software support guide
3.5.6 z/OS Internet library
3.6 Educational information
3.6.1 IBM Global Services
3.6.2 WebSphere for z/OS training and certification
3.6.3 IBM Education Assistant

Part 2. Problem symptoms and their resolutions

Chapter 4. Exceptions and error messages
4.1 What is an exception or error?
4.2 Symptom flow chart: Exceptions and error messages
4.3 Diagnosing an error or exception message

Chapter 5. Abend
5.1 What is an abend?
5.2 Symptom flow chart: Abend
5.3 Diagnosing an abend

Chapter 6. Hang
6.1 What is a hang?
6.2 Symptom flow chart: Hang
6.3 Diagnosing a hang

Chapter 7. Timeout
7.1 What is a timeout?
7.2 Symptom flow chart: Timeout
7.3 Diagnosing a timeout

Chapter 8. Does not stop
8.1 What is the does not stop symptom?
8.2 Symptom flow chart: Does not stop
8.3 Diagnosing the symptom

9.2 Symptom flow chart: Job failed
9.3 Diagnosing the job failed symptom

Chapter 10. No response
10.1 What does no response mean?
10.2 Symptom flow chart: No response
10.3 Diagnosing the no response symptom

Chapter 11. No resource access
11.1 What is no resource access?
11.2 Symptom flow chart: No resource access
11.3 Diagnosing no resource access

Chapter 12. High CPU utilization
12.1 What is high CPU utilization?
12.2 Symptom flow chart: High CPU utilization
12.3 Diagnosing high CPU utilization

Chapter 13. WebSphere for z/OS performance analysis
13.1 Performance terminology
13.1.1 Response time
13.1.2 Throughput
13.1.3 Transaction
13.1.4 Hit rate
13.1.5 Page view rate
13.1.6 Number of clients and think time
13.1.7 Resource
13.2 Managing performance of WebSphere transactions
13.2.1 Managing the number of servant regions
13.2.2 Managing the number of JVM threads
13.2.3 Classifying servant region enclaves (WebSphere transactions)
13.2.4 Classifying servant regions
13.2.5 Classifying controller regions
13.2.6 Special considerations for HTTP requests over multiple servants
13.3 Introduction to performance analysis and management
13.3.1 Setting your performance expectations
13.3.2 What is a performance problem and how do you manage it?
13.3.3 What to do about a performance problem
13.4 Diagnosing performance problems
13.4.1 Understanding the expectations
13.4.2 Quantify: Take a quick snapshot view of the system
13.4.3 Finding the cause
13.4.4 Analyzing a heap or memory problem
13.4.5 Analyzing a response time problem
13.4.6 Analyzing a high CPU usage problem
13.5 Related information sources

Part 3. Problem avoidance and best practices

Chapter 14. Phase 1: Installation, configuration, and migration
14.1 Preparing the Installation
14.2 Installation and configuration
14.3 Migration
14.3.1 Migrating from V5.x to V6.0.x
14.3.2 Migrating from V4.0.1 to V6.0.x
14.3.3 Migrating from V3.5 Standard Edition to Version 6.0.x
14.3.4 Checklist for migration
14.4 Coexistence
14.5 Most common problems
14.6 Related references

Chapter 15. Phase 2: Application deployment
15.1 Tools for the deployment phase
15.1.1 Installing and deploying application files
15.1.2 Logging and tracing
15.2 Problem avoidance checklist
15.2.1 Assembling an application
15.2.2 Deploying an application
15.3 Most common problems
15.4 Related references

Chapter 16. Phase 3: Run applications
16.1 Request process overview
16.2 Model-view-control model for problem determination
16.2.1 Typical problems in the view tier
16.2.2 Typical problems in the control tier
16.2.3 Typical problems in the model tier
16.3 Problem avoidance
16.3.1 Designing, coding, and testing
16.3.2 Change control

Chapter 17. Phase 4: System run time
17.1 The WebSphere for z/OS V6 runtime environment
17.2 Problem categories in the runtime phase
17.3 Understanding your own runtime configuration
17.4 Troubleshooting tips for the runtime environment
17.5 Security issues and problems
17.6 Problem avoidance checklist
17.7 Typical problems

Part 4. Problem Determination Means and Tools

Chapter 18. Commands
18.1 Commands for administering WebSphere for z/OS
18.2 z/OS MODIFY commands
18.2.1 z/OS DISPLAY commands
18.2.2 Basic TRACE commands
18.2.3 Dynamic Java TRACE
18.3 TCP/IP related commands
18.3.1 The netstat command
18.3.2 The nslookup command
18.3.3 The ping command
18.3.4 The tracert command
18.4 USS and OMVS commands
18.4.1 Display file system with df
18.4.2 Display disk space usage with du
18.4.3 Display thread information with ps
18.4.4 Display thread details with DISPLAY
18.4.5 Search string patterns with WASgrep.sh
18.5 Windows FTP command

Chapter 19. Logs for problem determination in WebSphere for z/OS
19.1 Job logs and system log
19.1.1 When to use system log and job logs
19.1.2 How to set up system log and job logs
19.1.3 System log and job log output and their interpretation
19.2 WebSphere error log (BBORBLOG)
19.2.1 When to use BBORBLOG
19.2.2 How to set up BBORBLOG
19.2.3 BBORBLOG output and its interpretation
19.3 First Failure Data Capture
19.3.1 When to use FFDC
19.3.2 How to set up the FFDC tool
19.3.3 FFDC output and its interpretation
19.3.4 Example: Using the FFDC tool for problem determination
19.4 The Java Logging API
19.4.1 What is the Java Logging API and when to use it
19.4.2 Setting up the Java Logging API
19.4.3 Java Logging output and interpretation
19.4.4 Java Logging API example
19.5 IBM HTTP Server logs and trace
19.5.1 Server error log
19.5.2 Server access log
19.5.3 Very verbose trace
19.5.4 HTTP plug-in log

Chapter 20. WebSphere for z/OS traces and dumps
20.1 CTRACE for WebSphere
20.1.1 Setting up and taking a CTRACE
20.1.2 Viewing CTRACE and JRas data through IPCS
20.2 JDBC trace
20.2.1 Setting up the JDBC trace
20.2.2 JDBC trace output and interpretation
20.3 SVC dumps
20.3.1 Capturing an SVC dump
20.3.2 Problems capturing an SVC dump
20.3.3 Formatting an SVC dump using IPCS
20.3.4 Related references
20.4 CEEDUMP
20.5 Java Transaction Dump (TDUMP)
20.6 Javadump
20.7 Heapdump

Chapter 21. Diagnostic tools for WebSphere for z/OS
21.1 Collector tool
21.2 JVM dump and heap analysis tools
21.2.1 Svcdump.jar
21.2.2 HeapRoots
21.2.3 Dumpviewer GUI and jformat
21.3 Memory Dump Diagnostic Tool for Java
21.4 Trace Analyzer for WebSphere Application Server
21.5 Java Garbage Collection Formatter
21.6 dumpNameSpace tool
21.7 Rational Application Developer V6 Debug Perspective
21.7.1 When to use the Rational Application Developer Debug Perspective
21.7.2 Setting up the Rational Application Developer Debug Perspective
21.7.3 Rational Application Developer debugger output and interpretation
21.8 Tivoli Performance Viewer
21.8.1 Setting up Tivoli Performance Viewer
21.8.2 Tivoli Performance Viewer output and its interpretation
21.9 OMEGAMON XE for WebSphere

Chapter 22. Other handy tools
22.1 TCP/IP related tools
22.1.1 TCP/IP checkout program (InetInfo.java)
22.1.2 TCP/IP network packet tracing with Ethereal
22.1.3 TCP/IP for z/OS packet trace
22.2 MVS Extended Information
22.3 Resource Measurement Facility reports
22.3.1 Running the RMF post processor
22.3.2 Analyzing RMF reports
22.3.3 References
22.4 System Management Facility records and browser
22.4.1 Setting up SMF recording
22.4.2 WebSphere for z/OS SMF browser
22.5 Stress test tools
22.5.1 WebSphere Studio Workload Simulator for z/OS and OS/390
22.5.2 Microsoft Web Application Stress Tool
22.6 FTP, Telnet, and editors
22.6.1 TeraTerm Pro
22.6.2 WS_FTP Professional
22.6.3 Directing SYSPRINT output to an HFS file
22.6.4 UltraEdit

Appendix A. Messages and codes
A.1 WebSphere for z/OS message codes
A.1.1 Specific Java component messages
A.1.2 Minor codes
A.1.3 Abends
A.2 System and component message table

Appendix B. Additional material
B.1 Locating the Web material
B.2 Using the Web material

Related publications
IBM Redbooks
Other publications
Online resources
How to get IBM Redbooks
Help from IBM

Index
Figures
1-1 General problem determination flow chart
1-2 Working together
1-3 Deploying applications
2-1 IBM support structure
3-1 IBM WebSphere Application Server for z/OS home page
3-2 IBM WebSphere for z/OS support Web site
3-3 IBM WebSphere Application Server library page
3-4 IBM WebSphere Application Server for z/OS Information Center page
3-5 Messages and codes in the Information Center
3-6 Recent IBM Redbooks and Redpapers
3-7 zSeries support page
3-8 LookAt messages
3-9 IBM Education Assistant WebSphere for z/OS: Problem determination
4-1 Flow chart for symptom: Exception and error
4-2 A Java stack trace with an exception
5-1 Flow chart for symptom: Abend
5-2 Example of IEA995I message with abend code
5-3 Output from IPCS ip st worksheet validate
5-4 IPCS ip summ format output showing RTM2WA SUMMARY
5-5 Browse dump storage using IPCS
5-6 Search for eye catchers in dump storage near PSW address
5-7 Traceback using IPCS data
6-1 Flow chart for symptom: Hang in the application server
7-1 Flow chart for symptom: Timeout
7-2 Session timeout
7-3 Active Jobs in DA panel
7-4 FFDC file information in trace
8-1 Flow chart for symptom: Does not stop
8-2 FFDC file information in trace
8-3 Output from IPCS ip st worksheet validate
8-4 IPCS ip summ format output showing RTM2WA SUMMARY
8-5 Browse dump storage using IPCS
8-6 Search for eye catchers in dump storage near PSW address
8-7 Traceback using ipcs ledata
9-1 Flow chart for symptom: Job failed
10-1 Flow chart for symptom: No response
10-2 Status of installed applications in Administrative Console
10-3 WebSphere for z/OS change log detail levels
11-1 Flow chart for symptom: No resource access
11-2 New JDBC provider for DB2
11-3 Properties for Data Sources
11-4 Changing database access scope
11-5 Test JDBC connection in Administrative Console
11-6 WebSphere for z/OS logging detail levels
12-1 Flow chart for symptom: High CPU utilization
13-1 WebSphere workload definition with Workload Manager
13-2 WLM definitions for servers and transaction classes in CB subsystem
13-3 WLM definition of Service Class WASHI
13-4 WLM definition of the servant regions, STC subsystem
13-5 Performance monitoring: An overview
13-6 Partition view from Partition Data Report
13-7 CP usage, response time, and throughput
13-8 CPU% and response time versus throughput
13-9 CP time per transaction
14-1 Problem areas in Phase 1
16-1 Request/response flow
16-2 Verifying the status of an application server
16-3 Hit Count application Web page
16-4 HTML source code instead of proper application
16-5 JDBC Data source configuration
17-1 WebSphere V6 for z/OS runtime structure (network deployment configuration)
17-2 Stand-alone application server configuration
18-1 USS command df display
18-2 Using DISPLAY OMVS to show thread information
18-3 FTP client delivered with Windows
19-1 FFDC tool architectural overview
19-2 ffdcRun.properties level values
19-3 Index and exception Logs
19-4 Java logging architecture
19-5 Enable log in Diagnostic Trace Service
19-6 WebSphere Change Log Detail Levels
19-7 IBM HTTP Server logs and trace
19-8 Server error log sample
19-9 Server access log sample
19-10 A plug-in trace record and some of the important fields
19-11 Plug-in traces
20-1 CEEDUMP sample
21-1 Memory Dump Diagnostic tool for Java screens
21-2 Trace Analyzer for WebSphere for z/OS
21-3 Enable GC Verbose in Advanced Java virtual machine settings
21-4 Diagram of garbage collection records
21-5 Enable Debugging Service in Administrative Console
21-6 Debug menu from the Debug icon
21-7 Remote Java Application Debug configuration
21-8 The Debug Perspective in Rational Application Developer
21-9 Start Tivoli Performance Viewer in Administrative Console
21-10 Tivoli Performance Viewer
21-11 Configuration panel for thread pool properties
22-1 InetInfo.java program output
22-2 Ethereal data analysis with Follow TCP Stream windows
22-3 IPCS CTRACE display parameters
22-4 TCP/IP network packet trace report
22-5 MXI Primary Option menu
22-6 CPU Activity Report (partial view)
22-7 Partition Data Report (partial view)
22-8 Partition Data report and processing weights (partial view)
22-9 Summary Report
22-10 Workload activity (part 1)
22-11 Workload activity (part 2)
22-12 Workload report for WebSphere server address space (partial)
22-13 Response time distribution (partial view)
22-14 Sample summary report from SMF Browser
22-15 WebSphere Studio Workload Simulator window
22-16 Pop-up window to start recording
22-17 WebSphere Studio Workload Simulator with scripts of captured sessions
22-18 WebSphere Studio Workload Simulator window: Web session elements
22-19 Variable elements through a filter
22-20 Various runtime parameters
22-21 WebSphere Studio Workload Simulator Monitor GUI
22-22 Sample simulation graph
22-23 Microsoft Web Application Stress Tool
22-24 TeraTerm Pro
22-25 Example of WS_FTP Professional
22-26 UltraEdit
Tables
2-1 Problem severity levels
2-2 PMR numbers and what they indicate
5-1 WebSphere related codes
5-2 Abend reason code and explanation
18-1 Useful z/OS DISPLAY commands
18-2 TRACEBASIC/TRACEDETAIL codes
19-1 Parts of server log stream record output
19-2 Log Details Level
19-3 First line in trace sample
20-1 JDBC trace strings
20-2 Useful IPCS commands for formatting an SVC dump
21-1 Parameters and options for the dump utility
21-2 Modules and description to verify in Tivoli Performance Viewer
A-1 WebSphere for z/OS message formats
A-2 WebSphere for z/OS messages overview
A-3 BBOO0222I message components
A-4 WebSphere-related abend codes
A-5 Example abend code and related reason code
A-6 System and component messages
Notices
This information was developed for products and services offered in the U.S.A. IBM may not offer the products, services, or features discussed in this document in other countries. Consult your local IBM representative for information on the products and services currently available in your area. Any reference to an IBM product, program, or service is not intended to state or imply that only that IBM product, program, or service may be used. Any functionally equivalent product, program, or service that does not infringe any IBM intellectual property right may be used instead. However, it is the user's responsibility to evaluate and verify the operation of any non-IBM product, program, or service. IBM may have patents or pending patent applications covering subject matter described in this document. The furnishing of this document does not give you any license to these patents. You can send license inquiries, in writing, to: IBM Director of Licensing, IBM Corporation, North Castle Drive, Armonk, NY 10504-1785 U.S.A. The following paragraph does not apply to the United Kingdom or any other country where such provisions are inconsistent with local law: INTERNATIONAL BUSINESS MACHINES CORPORATION PROVIDES THIS PUBLICATION "AS IS" WITHOUT WARRANTY OF ANY KIND, EITHER EXPRESS OR IMPLIED, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF NON-INFRINGEMENT, MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE. Some states do not allow disclaimer of express or implied warranties in certain transactions, therefore, this statement may not apply to you. This information could include technical inaccuracies or typographical errors. Changes are periodically made to the information herein; these changes will be incorporated in new editions of the publication. IBM may make improvements and/or changes in the product(s) and/or the program(s) described in this publication at any time without notice. 
Any references in this information to non-IBM Web sites are provided for convenience only and do not in any manner serve as an endorsement of those Web sites. The materials at those Web sites are not part of the materials for this IBM product and use of those Web sites is at your own risk. IBM may use or distribute any of the information you supply in any way it believes appropriate without incurring any obligation to you. Information concerning non-IBM products was obtained from the suppliers of those products, their published announcements or other publicly available sources. IBM has not tested those products and cannot confirm the accuracy of performance, compatibility or any other claims related to non-IBM products. Questions on the capabilities of non-IBM products should be addressed to the suppliers of those products. This information contains examples of data and reports used in daily business operations. To illustrate them as completely as possible, the examples include the names of individuals, companies, brands, and products. All of these names are fictitious and any similarity to the names and addresses used by an actual business enterprise is entirely coincidental. COPYRIGHT LICENSE: This information contains sample application programs in source language, which illustrate programming techniques on various operating platforms. You may copy, modify, and distribute these sample programs in any form without payment to IBM, for the purposes of developing, using, marketing or distributing application programs conforming to the application programming interface for the operating platform for which the sample programs are written. These examples have not been thoroughly tested under all conditions. IBM, therefore, cannot guarantee or imply reliability, serviceability, or function of these programs.
Trademarks
The following terms are trademarks of the International Business Machines Corporation in the United States, other countries, or both:
Eserver, Redbooks (logo), alphaWorks, developerWorks, z/Architecture, z/OS, zSeries, AIX, Cloudscape, CICS, DB2 Universal Database, DB2, DFS, Informix, IBM, IBMLink, IMS, Language Environment, Lotus, MQSeries, MVS, OMEGAMON, OS/390, Parallel Sysplex, Rational, Redbooks, RACF, RETAIN, RMF, S/390, System z, Tivoli, WebSphere, 1-2-3
The following terms are trademarks of other companies: EJB, Java, JavaServer, JavaServer Pages, JDBC, JDK, JMX, JSP, JVM, J2EE, Solaris, Sun, Sun Java, SNM, and all Java-based trademarks are trademarks of Sun Microsystems, Inc. in the United States, other countries, or both. Excel, Microsoft, Visual Studio, Windows NT, Windows, Win32, and the Windows logo are trademarks of Microsoft Corporation in the United States, other countries, or both. UNIX is a registered trademark of The Open Group in the United States and other countries. Ethereal is a registered trademark of Ethereal, Inc. Linux is a trademark of Linus Torvalds in the United States, other countries, or both. Other company, product, or service names may be trademarks or service marks of others.
Preface
This IBM Redbook can help clients and IBM employees understand the different aspects of problem determination for IBM WebSphere Application Server Version 6 for z/OS. It is intended to be an additional resource to the Information Center at: http://publib.boulder.ibm.com/infocenter/wasinfo/v6r0/index.jsp
Our goal is to help you become more self-sufficient at diagnosing and resolving your WebSphere for z/OS problems. We have collected actual problem scenarios that were reported to IBM support and present them in a standard format to help you find solutions. Even when these scenarios do not cover your specific problem, we hope that the symptom flow charts, references, and tools in this book help you work through the problem and identify its cause and a possible solution. If you are unable to find a solution, we also provide information to assist you when you communicate with the IBM and WebSphere support teams.
Cleberson Calefi is an IBM WebSphere consultant at Bank of Brazil. For the last three years, Cleberson has worked extensively with the WebSphere Application Server environment, advising clients about problem solving, tuning, and implementation of fail-safe runtime environments. His areas of expertise include J2EE application development and WebSphere Application Server administration for z/OS, z/Linux, and Windows. He holds a degree in Information Systems from the University Alvorada, Brazil.
Per Fremstad is an IBM certified IT specialist from IBM Systems and Technology Group, Norway. He has worked for IBM since 1982 and has extensive experience with zSeries and z/OS. His areas of expertise include the Internet, the WebSphere product family, and Web enabling applications on z/OS. He teaches frequently about WebSphere and Java topics and about zSeries and z/OS at several universities. He holds a Bachelor of Science degree from the University of Oslo, Norway.
Keith Jabcuga is a Software Support Specialist working at the ITSO in Poughkeepsie, New York. He has been on the WebSphere for z/OS support team for four years and his areas of expertise include defect support and application diagnostics. Keith holds a Master of Science degree in Computer Science from the University of Buffalo.
Suresh Maddukuri is an IT Specialist who assists customers in the United States. He worked as an administrator for WebSphere Application Server on z/OS and distributed platforms. His main responsibilities are troubleshooting problems, performance monitoring, and application server tuning. His areas of expertise include IBM WebSphere MQ and WebSphere Business Integration Message Broker. He holds a post-graduate diploma in Computer Applications and a degree in Mechanical Engineering from Nagarjuna University, India.
Kiet Nguyen is an IT Specialist with IBM Global Services/AMS CRM Siebel Development in North Carolina. He has more than 20 years of experience that ranges from MVS systems/application programming to building component software and end-to-end applications on distributed platforms for worldwide customers. He holds a degree in mathematics from Georgetown University in Washington, D.C. His areas of expertise also include J2EE Development and WebSphere Application Server Administration.
Robyn Nostalgi is an IT Software Support Specialist working in the IBM Support Center in Sydney, Australia, and she has been in this role for more than 10 years. She has specialized in supporting customers that run WebSphere Application Server for z/OS. She has also worked on the zSeries Software Support team, providing defect and non-defect support for all software components related to the z/OS operating system.
Rajesh Pericherla is a system tester and lead strategist in the WebSphere for z/OS SVT in the WQCoC organization. He has been working with this group for the last eight years. He is responsible for planning the system tests for the latest WebSphere releases on all supported platforms with the main focus on z/OS. He holds a Master of Science degree in Computer Engineering from Walden University.
Special thanks to Bob St. John of zSeries Performance, IBM Poughkeepsie, for his additional chapter about performance problem analysis. Thanks to all the contributors to the previous IBM Redbooks about WebSphere for z/OS Problem Determination, and the support teams of the IBM International Technical Support Organization, in particular (and in no specific order): Ash Venkatramen, Keith Kopycinski, Tamas Vilaghy, Patrick C. Ryan, Andrew Lam, Youn Chin Mah, Ralph Schipani Jr., Brent Watson, Ron Allan, Dave Clarke, James Bai, Paola Bari, DongJune Choi, Mike Cox, Alberto González Dueñas, John Hutchinson, Wilhelm Michel, Theresa Tai, Egon Terwedow, Ella Buslovitch, Rich Conway, Don Bagwell, Keith Kopycinski, Nancy Trent, Michael Stephen, Hany Salem, Peter Bertolozzi, Melinda Carter, Mike Schwartz, Dave Griffiths, Christopher Vignola, Timothy Spewak, Stephen J Kinder, Mark Dinges, David Follis, Timothy Kaczynski, Teddy J Torres, Scott Kurz, Louis Wilen, Maria Clarke, Edward McCarthy, Kenneth Irwin, Al Schwab, Forsyth Alexander, Don Brennan, Tessa Nguyen, and many more working in z/OS and WebSphere for z/OS Development and Technical Support worldwide.
Comments welcome
Your comments are important to us! We want our Redbooks to be as helpful as possible. Send us your comments about this or other Redbooks in one of the following ways:
- Use the online "Contact us" review redbook form found at: ibm.com/redbooks
- Send your comments in an email to: redbook@us.ibm.com
- Mail your comments to: IBM Corporation, International Technical Support Organization, Dept. HYTD Mail Station P099, 2455 South Road, Poughkeepsie, NY 12601-5400
Part 1. Problem determination methodology
Chapter 1. Problem determination methodology
Figure 1-1 General problem determination flow chart (boxes include: Ask questions; Analyze documentation)
- Have there been any changes to the environment, such as network topology, hardware configuration, or an increase in the number of users?
- Have we made changes to the back-end systems that we are connecting to?
- Have any new applications been deployed, changed, or upgraded?
- Have we run this application, server, or system successfully before?
- When did the symptoms first appear? When the system was under peak load? After backup jobs?
- Can we reproduce the error?

Asking these types of questions can help you eliminate potential causes early in the process. The answers to the questions form part of your symptom data.

3. Gather the documentation.

The type of information that you gather depends on the type of symptoms that you are experiencing, but essentially what you are doing at this stage is collecting evidence of the problem. So, if the symptom is an error message, then you must obtain and examine the log or trace that shows the message. We recommend that you complete the following tasks as part of the gathering process:
- Document your problem determination steps. Keep a log of symptoms, messages, files, tests, results of tests, and conclusions.
- Retrace the steps to recreate the problem and see the results yourself.
- Understand the meaning behind the request that has created or induced the problem. This helps you isolate the problem.
- Use a controlled test environment when possible.
- Use the MustGather documentation for information about what data to collect. MustGather is a term that is used to describe the essential or minimum problem documentation that is required to analyze a problem. See "MustGather" on page 16.

Knowing what data to collect and how to collect it can be difficult. We guide you through this in Part 2, "Problem symptoms and their resolutions" on page 39, where we analyze symptoms in detail with the use of symptom-specific flow charts.

Sometimes your problem is very serious and your expertise in the product area is limited. Then you might choose to go directly to the step of calling IBM support rather than try to analyze your own trace or dump.

4. Analyze the documentation.

The documentation that you obtain depends on the type of symptom and what is enabled in your system. Some output is available by default; other output, like traces, might have to be enabled. How to enable the different output received from an error is covered in Part 4, "Problem Determination Means and Tools" on page 193.

Symptoms such as abend, loop, and incorrect output are often accompanied by messages, or you can find indications in traces and logs. Check the data that you have collected for messages or other indications. Analyze the messages, logs, and traces. Part 2, "Problem symptoms and their resolutions" on page 39 discusses what output is available given a particular symptom and how to analyze it.

5. Determine whether the problem is recorded in product documentation and, if so, what corrective action is recommended.

In this step, check the product documentation, such as the product reference manuals and product Web sites, to determine whether your problem is documented. In the case of an error message, the product documentation might describe the reason for the error and offer possible corrective action.

6. Take corrective action.

The action you take depends on the cause and the recommended solution to your problem. The typical outcome or action that you can take falls into the following categories:
- The product works as designed. In this case, you can accept the design and adjust your system accordingly, or you can request a design change. This is an official request to change the product design that is assessed by technical staff, usually the product developer.
- You find a workaround for your problem. This means that changes must be made to your WebSphere for z/OS system to circumvent the problem. In some cases, this workaround is the solution. In other cases, it is a temporary solution until a permanent fix is found.
- You find a problem scenario or symptom that is described in an authorized program analysis report (APAR). In this case, you apply the fix (program temporary fix, or PTF) associated with the APAR to correct the problem.

If your problem scenario or symptom is not found on the WebSphere for z/OS IBM support page, consider these possibilities:
- This is a new WebSphere Application Server for z/OS problem, which should be reported to IBM so that they can produce a fix.
- It is a user error. This includes configuration, setup, or procedural error. This must be corrected by the user.
- It is an application problem. This should be presented to the application owner or developer to correct.

7. Consult reference information sources.

Information sources come in many forms: a product manual, a Web site that contains links to product fixes, a colleague with specialized skills, an online technical forum, and IBM software support. Refer to Chapter 3, "Information sources" on page 25, where we outline what information sources are available to you and how you can get access to them.

If your symptom is an error message, check the meaning of the message in the product manuals because this might point you to the exact cause of the error and tell you what is required to fix it. If not, you can access IBM support data and search for your symptoms in hopes of finding other, similar problems reported by customers. These problem records can tell you what was done or recommended to fix the problem.

8. Identify the problem and solution.

Using the information sources, you might have identified the problem and found a solution and can now take corrective action. If you have not been able to identify the problem or find a solution, you need to prepare and gather the problem determination documentation.

9. Prepare and send problem documentation.

If, after consulting your information sources, you are unable to determine the exact problem or to identify the cause, then you should forward all problem documentation to IBM support.

10. Contact IBM support.

Refer to Chapter 2, "Contacting IBM: Information" on page 13 for the options that are available when you must contact IBM. We also explain the WebSphere support teams and structure.
[Figure: teams in skill areas such as Networking/TCP/IP and Security/RACF share information, communicate, and cooperate]
You should also ensure that there are sufficient systems programming and application deployment skills and experience, because WebSphere for z/OS utilizes most of the advanced features and functions of the operating system. A list of these functions is available in WebSphere Application Server for z/OS Version 6.0.1: Migrating, coexisting, and interoperating, SA23-2207. You must have systems programming skills in all of these areas. If you try to set up the WebSphere run time without good skills or assistance in these areas, you are likely to experience many frustrating problems and delays. See 3.6, Educational information on page 36 for resources that can help you improve your skills for WebSphere for z/OS problem determination.
- Logger: To set up log streams for Resource Recovery Services (RRS) and the WebSphere error log.
- Parallel Sysplex: To implement multi-system configurations.
- RRS: To implement RRS and support two-phase commit transactions.
- Automatic Restart Manager (ARM): To set up an automation process for stopping and starting the WebSphere runtime environment. Although ARM is deemed optional, it is crucial to have all operational processes automated in an environment with multiple logical partitions (LPARs) and multiple application servers.
- WebSphere: To customize and set up WebSphere administrative servers and application servers and configure WebSphere resources as required by your application.
[Figure: The theory versus the reality of the Design, Build, Assemble, and Deploy phases]
It is unlikely that any one person can possess all these skills. It takes a team of specialists to set up the WebSphere run time and run the server. For specific courses and an organized view of the curricula, see the class catalogs at:
http://www.ibm.com/services/learning/
Chapter 2. Contacting IBM: Information
[Figure: IBM support structure. Front end: front office (domestic) teams and back office experts. Back end: change teams and development teams, usually in labs. Entitlement.]
A problem is usually reported in a Problem Management Record (PMR) through the problem-entry help desk or the front-office teams. The next level is the front-end support personnel, who usually have broader skills with IBM software products and a national language approach. If more in-depth skills are needed, the back end becomes involved (for example, the IBM software laboratories, where the software is developed and necessary code changes are made). Communication between front-end and back-end support works very well because of the worldwide IBM network and the fact that the teams usually know each other well and use all communication vehicles.
WebSphere Application Server related product support
Access APARs, Technotes, and PTFs; register to receive e-mail notifications about technical alerts or new downloads; and use an advanced search feature that searches all IBM knowledge bases, such as redbooks and Information Centers.
MySupport
Register to receive e-mail notification about critical issues, IBM product updates, and items of interest.
Link2000
For IBM eServer zSeries users with installations that have access to Link2000 (previously IBMLink), an interactive online database program, you can:
- Search for an existing authorized program analysis report (APAR) that is similar.
- Search for an available program temporary fix (PTF) for the existing APAR.
- Order the PTF if it is available.
developerWorks WebSphere
This gateway to WebSphere technical information for developers and administrators features:
- Zones and road maps for specific products, in-depth technical articles, tutorials, white papers, and links to downloads, technical previews, steps to getting support for WebSphere Application Server, and plug-ins
- Latest news about WebSphere products and offerings
MustGather
MustGather documents help with problem determination and save time when you are resolving PMRs. These documents, which are located on the product support site, include instructions about what documentation to gather for specific problems. You can find MustGather documents by searching for the word mustgather at the support Web site:
http://www.ibm.com/software/webservers/appserv/zos_os390/support
These are some of the MustGather documents for WebSphere for z/OS that might help you:
- MustGather: Read first for WebSphere Application Server for z/OS
- MustGather: High CPU causing hang or loop running V5 for z/OS
- MustGather: Plug-in regeneration problems for V5.0 and V5.1
- MustGather: Plug-in problems in V5.0 and V5.1 on z/OS
- MustGather: System management for synchronization failures
- MustGather: System management discovery problems
- MustGather: wsadmin problems in V5
- MustGather: Administrative console problems
- MustGather: A hang occurs when running WebSphere Application Server for z/OS
- Mustgather: Security problems with WebSphere Application Server z/OS V5
- MustGather: ABENDEC3 RC=413000x for 4.0, 5.0 and 5.1 for WebSphere Application Server for z/OS
Collector Tool
IBM WebSphere Application Server, Version 6.0.x on AIX, HP-UX, Linux, Sun Solaris, and Microsoft Windows provides a collector tool that you can use for z/OS as well. Run it for all application servers and the deployment manager.
The collector tool gathers extensive information about your WebSphere Application Server environment and packages it in a JAR file that you can send to IBM support to help determine and analyze your problem. Information in the JAR file includes logs, property files, configuration files, operating system and Java data, and the absence or level of each software prerequisite.
The collector program runs to completion despite any errors that it might find, such as missing files or commands, and collects as much data in the JAR file as possible.
The collector tool has two phases. In the first phase, you run the collector tool on your WebSphere Application Server, and it produces the JAR file. In the second phase, the IBM support team analyzes the JAR file that the collector program produced.
For more information about the collector tool and how to run it, search for Gathering information with the Collector tool in the WebSphere Application Server, Version 6.0.x, Information Center at:
http://publib.boulder.ibm.com/infocenter/wasinfo/v6r0/index.jsp
Tip: Entering Collector-summary output into an electronic service request (ESR) eliminates waiting on the phone to provide general information to IBM support level 1.
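Because a JAR file uses the ZIP format, you can check what the collector packaged before you send it to IBM support. The following Python sketch is an illustration only and is not part of the collector tool. The function name summarize_jar and the entry names in the demonstration are hypothetical.

```python
import io
import zipfile
from collections import Counter


def summarize_jar(jar_file):
    """Count the entries in a JAR (ZIP) archive by file extension."""
    counts = Counter()
    with zipfile.ZipFile(jar_file) as jar:
        for name in jar.namelist():
            if name.endswith("/"):
                continue  # skip directory entries
            base = name.rsplit("/", 1)[-1]
            ext = base.rsplit(".", 1)[-1].lower() if "." in base else "(none)"
            counts[ext] += 1
    return dict(counts)


if __name__ == "__main__":
    # Build a tiny in-memory archive with made-up entry names to show the call.
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as jar:
        jar.writestr("logs/server1.log", "sample log line")
        jar.writestr("properties/sample.properties", "key=value")
    buffer.seek(0)
    print(summarize_jar(buffer))
```

To inspect a real archive, pass the path of the JAR file that the collector produced instead of the in-memory buffer.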
You should mention major configurations such as the following, if relevant:
- Monoplex or Sysplex
- Global security
- Clustered or non-clustered application
- Has the problem happened before, or is this an isolated instance?
- What steps led to the failure?
- Can the problem be recreated? If so, what steps are required?
- Have any changes been made to the system (hardware, network, or software)?
- Were any messages or other diagnostic information produced? If yes, what were they (for example, trace record or dump output)?
It is often helpful to have a printout of the message ID numbers for any messages that you received before calling IBM.
The most common data that IBM requests includes:
- System log (SYSLOG): The z/OS system log, which has assorted system error messages and a few WebSphere error messages.
- WebSphere Server job logs: Application server job logs contain most of the configuration settings, stderr and stdout messages, and CEEDUMP and snap dumps. Job logs for deployment manager, node agent, and daemon servers might be required, depending on the nature of the problem.
- WebSphere error log: Target for WebSphere error messages.
- Dump data sets: If a system abend occurred, a JVM transaction dump and a supervisor call (SVC) dump might be captured. In most cases, IBM asks you to compress (terse) them and send them by FTP to a designated FTP site. In situations where you must cancel a servant region to overcome a problem, be sure to request an SVC dump when you cancel.
- Component trace (CTRACE) message log: If more detailed information is required, IBM support asks you to have the component trace writer and debug turned on to display more detailed trace information in the message log.
We describe how to obtain this data in Chapter 19, Logs for problem determination in WebSphere for z/OS on page 213 and Chapter 20, WebSphere for z/OS traces and dumps on page 241. You can also find more information at the WebSphere for z/OS Information Center:
http://publib.boulder.ibm.com/infocenter/wasinfo/v6r0/index.jsp
Define your technical question in specific terms and provide the version and release level of the products in question.
Documentation is incorrect.
You can use the Web problem submission tool to submit an ESR at the following site:
http://www.ibm.com/software/support/probsub.html
You can use IBMLink 2000 to submit your own electronic version of the PMR. You must be a registered IBMLink 2000 user to use this option. IBMLink 2000 is also referred to as servicelink. For more information, go to:
https://www.ibm.com/ibmlink/link2/logon/logonPage.jsp
Regardless of which option you choose, all software support calls for z/OS software products are recorded in the IBM Remote Technical Assistance Information Network (RETAIN) system. This system, which is used worldwide by all of the support teams, is a very effective communication tool for IBM support teams. The advantage of placing an electronic call using ESR or IBMLink 2000 is that you can view the updates in the record and monitor the status of your request.
additional information or the support team might request other problem data or material during the life of the problem. If the problem is very difficult, IBM support might need to gather different data, such as traces, to rule out possible causes and isolate the problem. If your problem is related to configuration, you might have to recreate the problem to obtain the necessary information.
During this investigation process, the resolution team determines which of three categories your issue falls into:
- It is the result of a software defect that has been reported previously. A fix or workaround is provided to circumvent or correct the issue. If none is available and it is determined that one is required, the resolution team works with you to find the best feasible workaround. The resolution team advises you when the defect APAR is closed, assists with the implementation of the fix, and updates your problem record.
- It is the result of an IBM software defect that has not been reported before. The resolution team works with you to create an APAR or Software Problem Report (SPR) to track the resolution of the defect. These APARs and SPRs are routed to the appropriate development teams. Because of the complexities of the environments supported and the development, verification, and testing resources required, defect fixes might require an extended period of time for resolution. For high impact problems, the resolution teams make every effort to develop a workaround that you can use until your APAR or SPR has been resolved.
- It is a problem that is not related to a defect. If the problem is not related to a software defect in supported IBM code, then the resolution team might seek a solution only at the request of the customer under a separate service agreement.
TRSMAIN/Packlib
TRSMAIN (also known as Packlib) is the most common tool for compressing files in the z/OS environment. A big advantage of data sets in tersed (packed) format is that the data set attributes are stored with the data, so the file can easily be uploaded and untersed (unpacked) on another z/OS system without guessing the DCB parameters. You can download TRSMAIN from:
ftp://ftp.software.ibm.com/s390/mvs/tools/packlib/
If you send a data set using TRSMAIN, be sure to provide IBM support with values such as LRECL, RECFM, BLKSZ, and space requirements. After installing TRSMAIN, use the sample job control language (JCL) shown in Example 2-1 on page 22 as a basis for creating your own job. Modify &PACKLIB_PDS, &input_dataset, and &packed_output as required to compress &input_dataset into its compressed format.
Example 2-1 Job to invoke TRSMAIN
//PACKIT   JOB 'ACCOUNTING INFORMATION',NOTIFY=&SYSUID.
//****************************************************
//*                                                  *
//* TRSMAIN with PACK option                         *
//*                                                  *
//****************************************************
//JOBLIB   DD DISP=SHR,DSN=&PACKLIB_PDS
//STEP     EXEC PGM=TRSMAIN,PARM=PACK
//SYSPRINT DD SYSOUT=H
//INFILE   DD DISP=SHR,DSN=&input_dataset
//OUTFILE  DD DISP=(NEW,CATLG),UNIT=SYSDAL,
//            DSN=&packed_output,
//            SPACE=(CYL,(ppp,sss),RLSE)
The JOBLIB DD statement can be eliminated if &PACKLIB_PDS is included in the LNKLST concatenation. Replace &input_dataset in the INFILE DD statement with the name of the data set that is to be compressed, and replace &packed_output in the OUTFILE DD statement with the name of the output data set. The ppp and sss values are the primary and secondary space allocations for the output data set. You can also run TRSMAIN from an ISPF dialog by entering the program name on the command line and completing the fields of the related panels.
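The substitutions described above can also be scripted on a workstation before the job is uploaded. The following Python sketch is one possible approach, not an IBM-supplied tool. It fills in Example 2-1, checks each data set name against the usual z/OS naming rules (qualifiers of one to eight characters, 44 characters in total), and confirms that every statement ends by column 71. The function names and the data set names in the demonstration are hypothetical.

```python
import re

# Example 2-1 with {placeholders} for the values that you must supply.
JCL_TEMPLATE = """\
//PACKIT   JOB 'ACCOUNTING INFORMATION',NOTIFY=&SYSUID.
//JOBLIB   DD DISP=SHR,DSN={packlib_pds}
//STEP     EXEC PGM=TRSMAIN,PARM=PACK
//SYSPRINT DD SYSOUT=H
//INFILE   DD DISP=SHR,DSN={input_dataset}
//OUTFILE  DD DISP=(NEW,CATLG),UNIT=SYSDAL,
//            DSN={packed_output},
//            SPACE=(CYL,({primary},{secondary}),RLSE)
"""

# A qualifier starts with a letter or national character (#, @, $) and
# continues with up to seven letters, digits, national characters, or hyphens.
QUALIFIER = re.compile(r"[A-Z#@$][A-Z0-9#@$-]{0,7}")


def is_valid_dsn(name):
    """Return True if name looks like a valid z/OS data set name."""
    return len(name) <= 44 and all(
        QUALIFIER.fullmatch(q) for q in name.split(".")
    )


def build_pack_job(packlib_pds, input_dataset, packed_output, primary, secondary):
    """Fill in the sample job and verify that no line passes column 71."""
    for name in (packlib_pds, input_dataset, packed_output):
        if not is_valid_dsn(name):
            raise ValueError(f"invalid data set name: {name!r}")
    jcl = JCL_TEMPLATE.format(
        packlib_pds=packlib_pds,
        input_dataset=input_dataset,
        packed_output=packed_output,
        primary=int(primary),
        secondary=int(secondary),
    )
    for line in jcl.splitlines():
        if len(line) > 71:
            raise ValueError(f"statement exceeds column 71: {line}")
    return jcl


if __name__ == "__main__":
    # Hypothetical data set names and space values, for illustration only.
    print(build_pack_job("USER1.PACKLIB", "USER1.SVCDUMP",
                         "USER1.SVCDUMP.TRS", 100, 50))
```

Validating the names before submission catches the most common cause of a JCL error in this job, which is a mistyped or overlong data set name.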
ZIP file
This format is especially relevant for personal computer files. You can also put multiple files in a ZIP archive. The most common programs used for this approach are WinZip and PKZIP. These can be found at:
http://www.winzip.com
http://www.pkzip.com
Pack the files into a ZIP archive that is executable (EXE file) so that the recipient of the file can extract it properly without having WinZip or PKZIP installed.
The following site provides additional information about file upload and download procedures: http://www.ibm.com/de/support/ecurep/mvs.html
PMR12345.CONFIGFILES.TAR
These conventions might be slightly different in various IBM geographies and regions. IBM support personnel can advise you about how and with which naming conventions data should be sent to the FTP server.
Chapter 3.
Information sources
In addition to this IBM Redbook, other documentation, such as books and Web sites, is available for WebSphere for z/OS and supporting components. This chapter describes some of the resources available that the authors have found very helpful for solving problems in the WebSphere for z/OS environment.
Figure 3-1 IBM WebSphere Application Server for z/OS home page
Click the links in the gray navigation bar on the left side of the page for specific information categories such as system requirements, the library (manuals and Information Center), and services.
This page lists the WebSphere product manuals. Select Show from the WebSphere Application Server - z/OS section to link to the Information Center. This action also displays a list of the product manuals for z/OS that are available in PDF format. At the time of publication, the following guides were available:
- Program Directory, GI11-2825
- Migrating, Coexisting, and Interoperating, SA23-2207
- Installing Your Application Serving Environment, GA22-7957
- Administering Applications and Their Environment, GA22-7962
- Setting Up the Application Serving Environment, GA22-7958
- Using the Administrative Clients, SA23-2208
- Securing Applications and Their Environment, SA22-7961
- Developing and Deploying Applications, SA22-7959
- Troubleshooting and Support, GA22-7964
- Tuning Guide, SA22-7963
Attention: There is no longer a Messages and Codes manual. For messages and codes, see the Information Center or Appendix A, Messages and codes on page 311.
Figure 3-4 IBM WebSphere Application Server for z/OS Information Center page
To limit the search scope:
1. Click Search scope at the top of the page.
2. In the window that opens, click New.
3. Define a list name.
4. Select WebSphere for z/OS.
To look up messages and codes, either search for the particular message or code in the Information Center or go to the Contents panel and select Reference > Troubleshooter > Messages (Figure 3-5). Then, select a tab according to the first few letters in your message code. See Appendix A, Messages and codes on page 311 for WebSphere for z/OS messages and their code explanations.
Important: These documents are not z/OS-specific and some of the tools might not be available for all platforms.
3.4.3 Java
At the Java Community Process Web site, you can access many hints and tips for coding J2EE applications:
http://jcp.org/
The Web site includes a reference section with specifications, white papers, and other Java-related information. At the Sun Java technology home page, you can access first-hand information directly from the founder of Java technology:
http://java.sun.com
Sun is still a major contributor to the Java community. The Web site contains technical information, specifications, examples, and references for J2EE. There is also a Java documentation Web site:
http://java.sun.com/j2se/download.html
For information and downloads about J2EE, see:
- J2EE Software Development Kits (SDK)
- J2EE Application Programming Interface (API) documentation
- J2EE platform specification
See the IBM developerWorks and alphaWorks domains for more Java-related information.
Message tables are also provided in Appendix A, Messages and codes on page 311.
You can view the books in HTML format or you can download them in PDF format for easy reference and printing. For System z and zSeries soft copy information, see:
http://www.ibm.com/servers/eserver/zseries/softcopy/
Click Training to browse the following:
- Course catalog
- e-Learning
- Blended learning
- Save money
- On-site training
Figure 3-9 IBM Education Assistant WebSphere for z/OS: Problem determination
Part 2. Problem symptoms and their resolutions
Chapter 4.
Native code errors happen when the base code layer that supports WebSphere components encounters irrecoverable conditions. Native code errors have message IDs that are accompanied by descriptions. By design, the first four letters of a z/OS WebSphere message ID identify the module or submodule that belongs to a component.
Java exceptions happen when errors arise in either the code that makes up WebSphere components or in the application that runs in the WebSphere application server.
Java exceptions follow the package, class, and method naming convention, which corresponds to library, component, and module.
Tip: Inspect your system logs for potential problems or early warnings as a daily routine. This habit makes you more familiar with your WebSphere environment because you see small changes in behavior and warning messages early, which might save you from having to fix big problems later on.
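A daily log review is easier if the message IDs are pulled out of each line first. The following Python sketch is an illustration under stated assumptions: it treats a WebSphere for z/OS message ID as four letters, four digits, and one type letter (for example, BBOO0038E), and it groups the IDs by their four-letter prefix. The function name find_message_ids is hypothetical, and messages from other components that use a different layout, such as IEA995I, are deliberately not matched.

```python
import re
from collections import Counter

# Four letters, four digits, one type letter, for example BBOO0038E.
MESSAGE_ID = re.compile(r"\b([A-Z]{4})(\d{4})([A-Z])\b")


def find_message_ids(log_line):
    """Return (prefix, number, type letter) for each message ID in a line."""
    return [match.groups() for match in MESSAGE_ID.finditer(log_line)]


if __name__ == "__main__":
    log_lines = [
        "16.18.08 STC06476 BBOO0038E Function CTRACE-DEFINE failed with "
        "RC=12, REASON=00001901, EXTENDED REASON=00000000",
        "BBOM0007I CURRENT CB SERVICE LEVEL IS build level cf10515.05",
    ]
    prefixes = Counter()
    for line in log_lines:
        for prefix, number, type_letter in find_message_ids(line):
            prefixes[prefix] += 1
            print(prefix, number, type_letter)
    print(dict(prefixes))
```

Counting by prefix shows at a glance which component is producing most of the messages, which is a reasonable starting point for the analysis steps that follow.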
[Flow chart: message analysis. Recoverable boxes: 3 Java exception? (Yes/No), 8 Get package, 10 Contact development team, 11 Check message, 12 (Yes/No decision), 13 Take corrective action, 14 Search IBM support data, 15, 16 Contact IBM Support]
Each box in the flow chart has a number that refers to the step that describes the analysis in more detail.
If the exception has a function name with a minor code, usually in the C9C2xxxx format, then it is a native code component that is wrapped by a Java class. Often, a Java wrapper class throws an exception on behalf of a native component, which is why you follow the stack trace to the original point of error.
7. Iterate through the stack trace.
When you read the trace (in the log), you see a chain of method calls, from class to class, leading to the one that takes the exception. Stack traces are in last-in, first-out (LIFO) order, so the last method that was entered is always at the top of the stack. Go to that entry. Figure 4-2 is a sample of a stack trace that was received in the log. The token that you should notice is item A, Exception. This keyword indicates that it is a Java exception.
8. Get package name and method.
The entry at the top of the stack always names the last method that was running when the error occurred. The method name, its class, and its package name tell you the owner or provider of the package and are usually descriptive enough to hint at the root cause of the problem. Item A in Figure 4-2 is the object type (kind) of the exception caught. It says what kind of work was being performed when the error occurred. Our example shows a Structured Query Language (SQL) call to the database that took the exception.
A  SQL Exception: Schema 'TRADER' does not exist
   at db2j.ai.j.generateCsSQLException(Unknown Source)
   at db2j.ai.g.wrapInSQLException(Unknown Source)
B  at itso.j2ee.trader.servlet.TraderSuperServlet.handlePerformLogon(TraderSuperServlet.java:427)
   at itso.j2ee.trader.servlet.TraderSuperServlet.performTask(TraderSuperServlet.java:303)
   at itso.j2ee.trader.servlet.TraderSuperServlet.doPost(TraderSuperServlet.java:78)
   at javax.servlet.http.HttpServlet.service(HttpServlet.java(Compiled Code))
   at javax.servlet.http.HttpServlet.service(HttpServlet.java(Compiled Code))
   at com.ibm.ws.webcontainer.servlet.ServletWrapper.handleRequest(ServletWrapper.java(Compiled Code))
   at com.ibm.ws.webcontainer.servlet.CacheServletWrapper.handleRequest(CacheServletWrapper.java(Compiled Code))
   at com.ibm.ws.webcontainer.WebContainer.handleRequest(WebContainer.java(Compiled Code))
   at com.ibm.ws.webcontainer.channel.WCChannelLink.ready(WCChannelLink.java(Compiled Code))
C  at com.ibm.ws.http.channel.inbound.impl.HttpInboundLink.handleDiscrimination(HttpInboundLink.java(Compiled Code))
Item B tells you more about the class library and package where that code came from; it is application code in this example. Item C shows what a WebSphere component Java package looks like. It starts with com.ibm.ws, which represents the WebSphere code libraries. It does not cause the exception in our example but shows how its structure differs from that of the application code packages.
9. Does the package have a format of com.ibm.*?
The package name of a system application (WebSphere product code) starts with com.ibm.ws.* (item C in Figure 4-2). A package name with the format of com.ibm.* indicates an IBM product package. All other package names come from third-party applications that are running in WebSphere for z/OS.
Does your exception method show a package format of com.ibm.*?
- If no, then contact your application developer; see step 10.
- If yes, then it is a WebSphere system exception; go to step 11.
10. Contact development team.
If the package name is not from IBM, then it is from a third-party or your in-house application. In that case, you can contact your development team for support. The roles in the WebSphere environment overlap. By development, we mean your developers, system administrators, and systems programmers. Provide them with the trace (the method name, the package, and the class that threw the exception) so that they can determine the owner of the application that is causing the problem and what application component needs to be fixed.
11. Check messages and codes.
With either the message number, the embedded message number, the minor code, or the exception method and class, search for more information at the WebSphere Information Center:
http://publib.boulder.ibm.com/infocenter/wasinfo/v6r0/index.jsp
- If your (embedded) message number starts with BBOO (component ID), then it is a WebSphere for z/OS product component message. See A.1, WebSphere for z/OS message codes on page 312.
- Embedded messages in a BBOO0222I message are Java component messages. For identification of the specific component, see A.1.1, Specific Java component messages on page 312.
- If you have a minor code error, such as one that starts with C9C2, you are dealing with WebSphere for z/OS product code. See A.1.2, Minor codes on page 314.
- In the case of a WebSphere component (com.ibm.ws.*) exception, you can copy the class, method names, and verbiage from the exception and use them as your search token at the Information Center.
The Information Center has many more messages and codes and the explanations of them, along with hints and tips for solving specific problems. Use it as your first point of reference when you encounter a problem.
For example, assume that you received this message in your syslog:
16.18.08 STC06476 BBOO0038E Function CTRACE-DEFINE failed with RC=12, REASON=00001901, EXTENDED REASON=00000000
The BBOO prefix indicates a WebSphere for z/OS component failure. Search for BBOO0038E and the specific return and reason codes in the Information Center at:
http://publib.boulder.ibm.com/infocenter/wasinfo/v5r0/index.jsp?topic=/com.ibm.websphere.zseries.doc/info/zseries/ae/bboo02.htm
The result is a description of the error (Example 4-1).
Example 4-1 Description of BBOO0038E search
BBOO0038E Function string failed with RC=dstring, REASON=hstring, EXTENDED REASON=hstring.
Explanation: WebSphere for z/OS failed as indicated and that function completed with a decimal return code indicated by RC, a hexadecimal reason code indicated by REASON, and an extended hexadecimal reason code indicated by EXTENDED REASON.
User Response: Consult the function indicated in the OS/390 C/C++ Library Reference, OS/390 MVS Programming: Assembler Services Reference, OS/390 MVS Programming: Authorized Assembler Services Reference, or other appropriate z/OS reference book for a description of this error.
If you search the general IBM support Web site, you can find an entry (Example 4-2) with a similar description.
Example 4-2 Result of searching the general IBM support Web site
How to manage operator message routing in WebSphere for z/OS Version 5
SYSC SERVER= none ./bbortbuf.cpp+513 ... BBOO0038E Function CTRACE-DEFINE failed with RC BossLog ... SERVANT PROCESS THREAD COUNT IS 6. HRDCPYDD BBOO0038E Function CTRACE-DEFINE failed with R BBOJ0011I
This technical document described the problem and the solution for the sample error. The authors were able to use the document to fix the problem. We routed hardcopy and default messages to specific data sets instead of sending them to the console (SYSLOG), which creates a lot of extraneous messages. We had to configure another data set (file) to hold the messages.
12. Have you identified the problem and solution?
After searching the WebSphere Information Center, did you identify the problem? Have you found information that matches your symptom data? Have you found a fix for your problem?
- If yes, then take the corrective action that is described in step 13.
- If no, then you need to do more research at the IBM support Web sites as described in step 14. If your research is not successful, prepare to contact IBM support; see step 15.
13. Take corrective action.
The information you find using the IBM support data might provide the following solutions:
- An existing APAR and PTF fix for your problem is available for you to apply.
- Other reports of your symptoms provide a procedure for fixing the problem.
In such cases, follow the instructions that are provided or apply the information to your specific problem to solve it.
14. Search IBM support data.
Search IBM support Web sites and databases, specifically the WebSphere for z/OS support site at:
http://www-306.ibm.com/software/webservers/appserv/zos_os390/support/
From this site, you can follow several links to other support sites related to WebSphere Application Server, its components, and z/OS.
When you search problem databases for information or fixes related to an error or exception, keep in mind that problems are reported in many formats (sometimes with return codes or reason codes). Therefore, you might have to alter your search keyword to find a match.
For example, you might search for EC3 abend on the WebSphere for z/OS support site and receive a list with a number of documents associated with EC3 abend, including:
PK04379: SERVANT REGION ABENDS WITH EC3, REASON CODE 04060012 WITH SMF AND HIGH VOLUME ENVIRONMENT
Click that particular link to access this Web site:
http://www-1.ibm.com/support/docview.wss?rs=404&context=SS7K4U&dc=DB500&q1=EC3&uid=swg1PK04379&loc=en_US&cs=utf-8&lang=en
This document describes a problem, explains the reason, and recommends applying maintenance, while specifying the service level and APAR number to download. After you follow these recommendations, you restart the application server, and your server runs successfully without issuing another abend. Your problem is solved. Document the problem and the fixes that you have applied in your system change documentation for your specific WebSphere for z/OS environment for later reference.
Any other messages with the YYYYxxxxZ format are usually documented and maintained in IBM documentation. See A.2, System and component message table on page 315 to identify other IBM products (such as z/OS components or subsystems) that might have created your particular error message. Go to the specific product manuals as indicated in the Appendix, or search the IBM Software support Web site at:
http://www-950.ibm.com/search/SupportSearchWeb/SupportSearch?pageCode=SPS
See also Chapter 3, Information sources on page 25, which provides many valuable links and resources for solving problems in WebSphere for z/OS.
After you have exhausted all resources and can find no apparent fix for your problem, proceed with step 15 to prepare to contact IBM.
15. Assemble MustGather documentation.
Prepare the problem documentation, referred to as MustGather, for IBM support. For more information about MustGather, see MustGather on page 16. Read MustGather: Read first for WebSphere Application Server for z/OS for help with assembling the appropriate documentation. You can find this at:
http://www.ibm.com/software/webservers/appserv/zos_os390/support
The minimum information that is necessary is:
- Problem description. Include information that is related to when the problem first started to occur.
- Software version and maintenance (build) level. You find this information in the job log of your application server. Search for build level to find a line similar to this:
  BBOM0007I CURRENT CB SERVICE LEVEL IS build level cf10515.05 release WAS601.ZNATV date 04/15/05 12:55:41.
- Operating system version and maintenance (PUT) level.
- The job log of the abending address space, including both controller and servant region job logs.
- Any dumps or traces that were triggered by the problem.
See also 2.3, Before you contact IBM support on page 15.
Then proceed with step 16 to contact IBM support.
16. Contact IBM support.
To contact IBM support, refer to Chapter 2, Contacting IBM: Information on page 13, for instructions. Provide the information that is outlined in the MustGather documentation step.
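Steps 7 through 9 can be sketched in a few lines of Python. This is a minimal illustration, not an IBM tool: it assumes that the most recent call is printed first, as in Figure 4-2, and it applies the package rule from step 9. The function names top_frame and classify are hypothetical.

```python
import re

# Matches "at package.Class.method(" at the start of a stack trace line.
FRAME = re.compile(r"^\s*at\s+([\w$.]+)\.([\w$<>]+)\(")


def top_frame(stack_trace):
    """Return (package, class, method) of the first 'at' line, or None."""
    for line in stack_trace.splitlines():
        match = FRAME.match(line)
        if match:
            qualified_class, method = match.groups()
            package, _, class_name = qualified_class.rpartition(".")
            return package, class_name, method
    return None


def classify(package):
    """Apply the step 9 rule to a Java package name."""
    if package == "com.ibm.ws" or package.startswith("com.ibm.ws."):
        return "WebSphere product code"
    if package == "com.ibm" or package.startswith("com.ibm."):
        return "IBM product package"
    return "application or third-party code"


if __name__ == "__main__":
    trace = """SQL Exception: Schema 'TRADER' does not exist
    at db2j.ai.j.generateCsSQLException(Unknown Source)
    at itso.j2ee.trader.servlet.TraderSuperServlet.doPost(TraderSuperServlet.java:78)
    at com.ibm.ws.webcontainer.WebContainer.handleRequest(WebContainer.java(Compiled Code))"""
    package, class_name, method = top_frame(trace)
    print(package, class_name, method, "->", classify(package))
```

For the sample trace, the top frame does not belong to a com.ibm.* package, so the rule in step 9 sends you to step 10 rather than step 11.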
Chapter 5.
Abend
This chapter explains what an abend is and provides a flow chart and step-by-step descriptions that can help you analyze the problem and find its cause. We also mention the analysis tools and reference information sources that are related to this symptom.
[Flow chart for the abend symptom. The chart itself is not reproduced; its surviving labels are: Locate svcdump, Dump captured?, and a box ending in "for reoccurrence".]
10.39.40 S0103633 IEA995I SYMPTOM DUMP OUTPUT 598
SYSTEM COMPLETION CODE=0D5 REASON CODE=00000021
TIME=10.39.40 SEQ=23671 CPU=0000 ASID=0077
PSW AT TIME OF ERROR 072C1000 AE86B6C4 ILC4 INTC 21
ACTIVE LOAD MODULE ADDRESS=2E863000 OFFSET=000086C4
NAME=BBOCOMM
DATA AT PSW 2E86B6BE - D1B458F0 5098B218 F0005820
GR 0: 00000018 1: 7F1041B8 2: 30EBCDE0 3: 00FF8C90
   4: 00000101 5: 7EEEA5F0 6: 30EBCDE0 7: 7EEEA5F0
   8: 00000001 9: 0000000C A: 00000000 B: AE86AF80
   C: 2E86C0E8 D: 7F103F28 E: 0000030F F: 0001A80D
END OF SYMPTOM DUMP
Figure 5-2 Example of IEA995I message with abend code
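The fields in Figure 5-2 that matter for the next steps (abend code, reason code, module name, and offset) can be extracted with a few regular expressions. The sketch below is illustrative only: it assumes the message text looks like the figure, and the helper name parse_symptom_dump is hypothetical. It also cross-checks the reported OFFSET by subtracting the load module address from the PSW address, treating the high-order bit of the second PSW word as the addressing-mode bit rather than part of the address.

```python
import re

# Text of the IEA995I symptom dump in Figure 5-2 (register lines omitted).
SYMPTOM_DUMP = """\
IEA995I SYMPTOM DUMP OUTPUT
SYSTEM COMPLETION CODE=0D5 REASON CODE=00000021
TIME=10.39.40 SEQ=23671 CPU=0000 ASID=0077
PSW AT TIME OF ERROR 072C1000 AE86B6C4 ILC4 INTC 21
ACTIVE LOAD MODULE ADDRESS=2E863000 OFFSET=000086C4
NAME=BBOCOMM
"""

def parse_symptom_dump(text):
    """Pull the abend-related fields out of an IEA995I symptom dump."""
    def grab(pattern):
        match = re.search(pattern, text)
        return match.group(1) if match else None

    return {
        "completion_code": grab(r"COMPLETION CODE=(\w+)"),
        "reason_code": grab(r"REASON CODE=(\w+)"),
        "asid": grab(r"ASID=(\w+)"),
        "module": grab(r"NAME=(\S+)"),
        "load_address": grab(r"ADDRESS=([0-9A-F]{8})"),
        "offset": grab(r"OFFSET=([0-9A-F]{8})"),
        "psw_address": grab(
            r"PSW AT TIME OF ERROR\s+[0-9A-F]{8}\s+([0-9A-F]{8})"),
    }

fields = parse_symptom_dump(SYMPTOM_DUMP)

# Mask off the high-order (addressing-mode) bit to get the 31-bit address.
psw_addr = int(fields["psw_address"], 16) & 0x7FFFFFFF
computed_offset = psw_addr - int(fields["load_address"], 16)

print(fields["completion_code"], fields["reason_code"], fields["module"])
print(f"{psw_addr:08X} - {fields['load_address']} = {computed_offset:08X}")
```

For the figure's values, the computed offset agrees with the OFFSET field in the message, which is a quick sanity check that the module name belongs to the failing instruction.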
The abend is also recorded in the SYS1.LOGREC data set and can be extracted into a report using the Environmental Record Editing and Printing (EREP) program. The EREP report provides more information than is contained in the IEA995I message. It is particularly helpful when you have a series of abends, because it assigns a sequence number to each abend, which makes it easier to identify the first one.

For more information about EREP, refer to the z/OS Internet Library at:
http://www-03.ibm.com/servers/eserver/zseries/zos/bkserv/
On the PDF line under z/OS elements and features publications, click your z/OS version. Under Elements and features, click EREP to find links to:
  EREP V3R5 Reference, GC35-0152
  EREP V3R5 User's Guide, GC35-0151

If WebSphere for z/OS issues a user completion code (abend), the way that the abend is recorded in the job log varies. If you cannot find the abend message, try searching for other keywords such as completion, code, or interrupt. Example 5-1 shows an EC3 abend.
Example 5-1 Example of EC3 abend found in job log
BPXP018I THREAD 24CAD40000000023, IN PROCESS 83886126, ENDED WITHOUT BEING UNDUBBED WITH COMPLETION CODE 4FEC3000, AND REASON CODE 04130007.
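The completion code in Example 5-1 (4FEC3000) contains the abend code EC3. The following Python sketch shows one way to split such a value. It assumes the usual z/OS layout of the completion code word: one flag byte, then 12 bits of system completion code, then 12 bits of user completion code. The function name decode_completion_code is hypothetical, and the layout should be confirmed against the z/OS data areas documentation before you rely on it.

```python
def decode_completion_code(word_hex):
    """Split an 8-digit completion code into flags, system code, user code.

    Assumed layout: 1 flag byte, 12-bit system completion code,
    12-bit user completion code.
    """
    word = int(word_hex, 16)
    flags = (word >> 24) & 0xFF
    system = (word >> 12) & 0xFFF
    user = word & 0xFFF
    return flags, system, user

def abend_label(word_hex):
    """Return Sxxx for a system abend or Udddd for a user abend."""
    _, system, user = decode_completion_code(word_hex)
    return f"S{system:03X}" if system else f"U{user:04d}"

print(abend_label("4FEC3000"))  # completion code from Example 5-1
print(abend_label("840C4000"))  # completion code of a 0C4 abend
```

Under this assumption, 4FEC3000 decodes to system completion code EC3 with a user code of zero, which matches the abend named in the example caption.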
An abend or user completion code might have a return code and a reason code associated with it.

2. Extract abend code and module name.
After you have located the abend message, record the abend code and module name. The same abend code can be issued for many components, so the abend code alone is usually not conclusive. From the module name or prefix, identify the component, subsystem, or product. In the case of Java code, you might see a class path name. When the module name that is recorded is not helpful or shows as unknown in the job log, you can find it in the EREP report. If you still cannot determine the module name, verify it using an SVCDUMP. See step 11.

3. Check messages and codes manuals.
An abend code is either a z/OS system completion code or a user completion code:
  z/OS system completion code
    This code is documented in the MVS System Codes manuals. The manual for this case is z/OS V1R6.0 MVS System Codes, SA22-7626-10. You can also consult z/OS MVS Diagnosis: Procedures, GA22-7587, a white paper with flow charts and step-by-step help that you can use to find a problem in the MVS operating system.
  User completion (abend) code
    This code is documented in the specific manuals of the IBM component, subsystem, or product that issues the user completion codes. To determine the component, you can consult a table in A.2, System and component message table on page 315, which lists the message prefix and the issuing component.

The current abend codes specific to WebSphere Application Server for z/OS are CC3, DC3, and EC3 (Table 5-1). The full code descriptions are documented in A.1, WebSphere for z/OS message codes on page 312, and at the WebSphere Application Server for z/OS Information Center:
http://publib.boulder.ibm.com/infocenter/wasinfo/v6r0/index.jsp
Table 5-1 WebSphere related codes
Abend code   Issuer
CC3          daemon processing failure
DC3          controller region processing failure
EC3          servant region processing failure
If a reason code is passed along with these abend codes, you can refer to the WebSphere Application Server for z/OS Information Center to obtain an explanation and determine the course of action. Table 5-2 shows an example of what you can find for abend reason 0001000.
Table 5-2 Abend reason code and explanation
Abend code:        CC3
Abend reason:      0001000
Explanation:       BBORFRR routine was loaded into the wrong address. The routine should be in common.
Suggested action:  The product was built or installed incorrectly. BBORFRR should reside in LPA and not be included in the STEPLIB/JOBLIB of the WebSphere for z/OS daemon address space.
Search for abend (reason) codes. If no explanation is given for the reason code and no indication is found in any information source, report the problem to IBM.

4. Is the information documented and conclusive?
Did you find the information in the messages manuals? Was the information adequate to identify the problem? Do you have enough information to correct the problem? If yes, then take corrective action; see step 5. If no, then search the IBM support pages; see step 6.

5. Take corrective action.
The messages and codes should provide the explanation of the abend code and a hint about how to respond. Processing this information should be sufficient to resolve the problem. You must restart the application server and check the logs for information about successful startup or other problems that you might encounter. If you have another abend, start to analyze it by going back to step 1.

6. Search IBM support data.
Search IBM support Web sites and databases, specifically the WebSphere for z/OS support site at:
http://www-306.ibm.com/software/webservers/appserv/zos_os390/support/
From this site, you can follow several links to other support sites that are related to WebSphere Application Server, its components, and z/OS. See also Chapter 3, Information sources on page 25, which provides many valuable links and resources for solving WebSphere for z/OS problems.
When you search problem databases for information or fixes related to an abend, keep in mind that abends, return codes, and reason codes are reported in many formats and you might have to alter your search keyword to find a match. For example, an EC3 abend might be reported as:
  Abend: ABENDEC3 or SEC3 or EC3 or 4FEC3000
  Return Code: RET4 or RC04 or RC4
  Reason Code: REASON4130004 or RSN04130004

7. Have you identified the problem and solution?
After searching the IBM support data, have you identified the problem? Have you found information that matches your symptom data? Have you found a fix for your problem? If yes, then take corrective action as described in step 8. If no, prepare to contact IBM support or analyze the problem data further; see step 9.

8. Take corrective action.
The information that you have found using the IBM support data might have provided the following solutions:
  An existing APAR and PTF fix for your problem that is available for you to apply.
  Other reports of your symptoms that have provided a procedure for fixing the problem.
In those cases, follow the instructions that are provided or apply the information to your specific problem to solve it.
For example, you might have searched for an EC3 abend at the WebSphere for z/OS support site and received a list with a number of documents associated with EC3 abend, including:
PK04379: SERVANT REGION ABENDS WITH EC3, REASON CODE 04060012 WITH SMF AND HIGH VOLUME ENVIRONMENT
If you click that particular link, you gain access to this Web site:
http://www-1.ibm.com/support/docview.wss?rs=404&context=SS7K4U&dc=DB500&q1=EC3&uid=swg1PK04379&loc=en_US&cs=utf-8&lang=en
This document describes a problem, explains the reason, and recommends maintenance to apply, while specifying the service level and APAR number to download. After you follow these recommendations, restart the application server, and your server should run successfully without issuing another abend.
Be sure to document the problem and the fixes that you have applied in your system change documentation for your specific WebSphere for z/OS environment for later reference.
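Because step 6 notes that the same abend, return code, and reason code are reported in several formats, it can help to generate all of the variants before searching. The sketch below is a hypothetical helper (search_keywords is not part of any IBM tool) that reproduces only the formats listed in step 6; support databases might use others.

```python
def search_keywords(abend, return_code=None, reason=None, full_code=None):
    """Build search-term variants in the formats listed in step 6."""
    terms = [f"ABEND{abend}", f"S{abend}", abend]
    if full_code:
        # Full completion code as it appears in the job log, e.g. 4FEC3000.
        terms.append(full_code)
    if return_code is not None:
        rc = int(return_code)
        terms += [f"RET{rc}", f"RC{rc:02d}", f"RC{rc}"]
    if reason:
        # One variant drops leading zeros, the other keeps all eight digits.
        terms += [f"REASON{reason.lstrip('0')}", f"RSN{reason}"]
    return terms

terms = search_keywords("EC3", return_code=4, reason="04130004",
                        full_code="4FEC3000")
for term in terms:
    print(term)
```

Trying each term in turn, rather than only the form shown in your own job log, raises the chance of matching an existing APAR or technote.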
9. Assemble MustGather documentation for abend.
MustGather documents can assist you with problem determination and save time resolving PMRs. For more information about MustGather, see MustGather on page 16. For an abend, you should provide the following material:
  Problem description
    Include information related to when the problem first started to occur.
  Software version and maintenance (build) level
    Find this information in the job log of your application server. When you search for build level, you obtain a line similar to:
    BBOM0007I CURRENT CB SERVICE LEVEL IS build level cf10515.05 release WAS601.ZNATV date 04/15/05 12:55:41.
  Operating system version and maintenance (PUT) level
  The job log of the abending address space (both controller and servant region job logs)
  The SVCDUMP triggered by the abend

10. Contact IBM support.
If you must contact IBM support, refer to Chapter 2, Contacting IBM: Information on page 13, for instructions about how to do this. Provide the information outlined in the MustGather documentation step.

11. Locate dump.
Usually the name of the SVCDUMP data set is recorded in the syslog. If you are not sure whether a dump was captured or which data set the dump was written to, then search for the word dump in the syslog and locate any messages pertaining to the dump. Sometimes searching for dumpid can help find dumps when the word dump is too generic for a certain sysplex. Searching for dumpid results in messages such as this:
  DUMPID=009 REQUESTED BY JOB (WT3DMGS )
  DUMP TITLE=COMPON=WEBSPHERE Z/OS, COMPID=5655I3500,ISSUER=BBORLEXT,ABEND IN CEEPLPKA/CEEOPCT
If there was a problem with capturing the dump, you see an IEAxxx message, such as:
  IEA911I PARTIAL DUMP ON MVS.O1MP.DMP00056 678
  IEA043I SVC DUMP REACHED MAXSPACE LIMIT - MAXSPACE=xxxx MEG
In such cases, you should fix the dump problem first before you attempt to analyze the dump because crucial information might not be written to the dump.
Also ensure that your WebSphere for z/OS servers have the authority to create and write to the dump data sets.

12. Was a dump captured?
Was there a dump? Were you able to locate the dump? If so, then prepare to analyze the dump as described in step 13. If not, prepare to set a SLIP and contact IBM support. Go to step 14.

13. Analyze the SVCDUMP.
To analyze the SVCDUMP, invoke the z/OS MVS Interactive Problem Control System (IPCS). There are several methods for analyzing an abend using IPCS and data from the SVCDUMP. The following steps describe only one of these methods. For more information about IPCS, see z/OS V1R6.0 MVS IPCS Commands, SA22-7594-05 and z/OS V1R2.0 MVS IPCS User's Guide, SA22-7596-01.
a. Invoke IPCS and verify that you have the correct dump by checking the dump title, date, and time. To display this information, issue this command:
   ip st validate worksheet
   Figure 5-3 shows an example of output from this command.

   MVS Diagnostic Worksheet
   Dump Title: COMPON=WEBSPHERE Z/OS, COMPID=5655I3500,ISSUER=BBORLEXT, ABEND IN BBOORB /UNKNOWN
   CPU Model 2084 Version 00 Serial no. 012345 Address 01
   Date: 02/18/2005 Time: 12:38:22.102475 Local
   Original dump dataset: SYSPRD1.PLXA.SVCD.D050218.H123818.C2.N00011
   Information at time of entry to SVCDUMP:
   HASID 04EC PASID 04EC SASID 04EC PSW 070D1000 9C038948
b. Go to the diagnostic data report section and verify the abend code, reason code, and module name.
c. Locate the Program Status Word (PSW) address of where the abend condition occurred and verify the module name in the summary format report, which can be obtained with:
   ip summ format
d. Scroll to the bottom of the report.
e. Use find previous to locate the RTM2WA SUMMARY and control block data:
   f 'rtm2wa summary' prev
   The RTM2WA SUMMARY shows Recovery Termination Manager (RTM) data. This is the time-of-error information (see Figure 5-4 on page 56). Note the PSW address.
RTM2WA SUMMARY
--------------
+001C  Completion code                              840C4000
+008C  Abending program name/SVRB address           007C2070 00000000
+0094  Abending program addr                        00000000
       GPRs at time of error
         0-3    00000000  215D3F00
         4-7    34326EB0  273C2250
         8-11   A667F7AA  00000000
         12-15  33FE6C50  00000000
+007C
+00DC
+00E8  Return code from recovery routine-00
       Continue with termination-implies percolation
+00E0  Retry Address returned from recovery exit    00000000
+00E4  RB Address for retry                         00000000
+000C  CVT Address                                  00FCB018
+0038  RTCT Address                                 00FB24E0
+00C8  SCB Address                                  007C4AC0
To determine the task control block (TCB) address, scroll up to find the RTM2WA control block data and note the TCBC value. In this example, the PSW is 072C2000 A667FA42. The second word is a 31-bit address. For information about the format of the PSW, refer to z/Architecture Principles of Operation, SA22-7832-03.
f. Locate the address in the dump storage. This is done from the IPCS main menu. In our example, we located 2667FA42 as shown in Figure 5-5.
ASID(X'04EC') ADDRESS(2667FA42.) STORAGE ----------------------------------
Command ===>
2667FA42                A784 000A181B 9856E0D8 0D764700 | xd....q.\Q.... |
2667FA50  FFA95800 D00018B0 A7F40005 18DB58B0 | .z..}...x4...... |
2667FA60  B00012BB A774FF76 5820487C BF1F201C | ....x......@.... |
2667FA70  A784000D A7AA0FF8 9856A0D8 0D764700 | xd..x..8q..Q.... |
g. From the PSW address, try to determine the module name using the eye catchers in the dump (Figure 5-6).
2667E3B0  F2F0F0F5 F0F1F1F2 F2F0F3F1 F5F2F0F1 | 2005011220315201 |
2667E3C0  F0F2F0F0 0010E6F5 F1F0F2F0 F44D8282 | 0200..W610001(bb |
2667E3D0  9696A299 815D0000 00C300C5 00C500F1 | oosra)...C.E.E.1 |
2667E3E0  00005EF0 00000080 90684788 A74AFF80 | ..;0.......hx.. |
2667E3F0  0D805870 50485860 504C4100 00005810 | ....&..-&<...... |
Figure 5-6 Search for eye catchers in dump storage near PSW address
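If you copy hexadecimal storage out of a dump and want to check it for eye catchers outside IPCS, you can decode it yourself. The following Python sketch assumes the storage is EBCDIC and uses Python's cp037 codec, which is an assumption: your system might use a different EBCDIC code page, although digits and lowercase letters decode the same way in the common ones. The two rows are read from Figure 5-6 (addresses 2667E3B0 and 2667E3D0), and the helper name ebcdic_text is hypothetical.

```python
def ebcdic_text(hex_words, codepage="cp037"):
    """Decode dump storage given as hex, showing unprintable bytes as '.'."""
    raw = bytes.fromhex("".join(hex_words.split()))
    decoded = raw.decode(codepage)
    return "".join(ch if ch.isprintable() else "." for ch in decoded)

# Storage at 2667E3B0 and 2667E3D0, as read from Figure 5-6.
row_b0 = ebcdic_text("F2F0F0F5 F0F1F1F2 F2F0F3F1 F5F2F0F1")
row_d0 = ebcdic_text("9696A299 815D0000 00C300C5 00C500F1")

print(row_b0)
print(row_d0)
```

The first row decodes to what looks like a build time stamp, and the second row contains the tail of the module name bboosra that the text identifies.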
The eye catchers are the EBCDIC characters shown to the right of the storage. In this example, you can see the bboosra module name.
h. Often obtaining a module name is sufficient, but when WebSphere for z/OS is involved, it is sometimes necessary to go further and find the related method name. Examine the traceback data. Using the TCB from the RTM2WA information, enter the following command:
   ip verbx ledata 'tcb(007c07f8) nthreads(*)'
   When the output is displayed, locate the traceback information as shown in Figure 5-7.

Traceback:
PU Addr   PU Offset  E Addr    E Offset   Load Mod  Status  Entry
2667F7A0  +000002A2  2667F7A0  +000002A2  SUBPOOL2  Call    SRAggregator::refresh(JNIEnv_*,_jobject*,_jobject*)
266127B0  +00000072  266127B0  +00000072  SUBPOOL2  Call    Java_com_ibm_ws390_orb_ORBEJSBridge_refreshSRAggregator
36422E28  +0000013C  36422E28  +0000013C  SUBPOOL0  Call    com/ibm/ws390/orb/ORBEJSBridge.refreshSRAggregator(ILjava/la
36426388  +00000052  36426388  +00000052  SUBPOOL0  Call    com/ibm/ws390/orb/SRAggregator.getSRObjectElementHT()Ljava/u
36414658  +000000A4  36414658  +000000A4  SUBPOOL0  Call    com/ibm/ws390/management/ServantMBeanInvoker.invokeSpecified
7C200830  +000002BC  7C200830  +000002BC  *PATHNAM  Call    INVFRMMI
7C5DA5B0  +00000030  7C5DA5B0  +00000030  *PATHNAM  Call    c_invokerFromMMI
7CCF1788  +0000026E  7CCF1788  +0000026E  *PATHNAM  Call    mmipSelectInvokeJavaMethod
7CCE4618  +00005C6A  7CCE4618  +00005C6A  *PATHNAM  Call    mmipExecuteJava
7CCFD298  +000002DC  7CCFD298  +000002DC  *PATHNAM  Call    mmijExecuteJavaFromJIT
The information found in the traceback might be sufficient to find the module or method name. When the traceback provided by IPCS does not go far enough, a tool called svcdump.jar can be used. Refer to 21.2, JVM dump and heap analysis tools on page 254, for more details on how to download and run the svcdump.jar tool.
i. With the information obtained from the svcdump.jar tool, such as abend code, module, and method name, determine in which component the abend was taken. You can use this information to debug the module or search IBM support data for related information and a possible fix. After searching IBM support data on the Web, we found the PK06080 APAR to address our problem.
j. If you cannot find a solution, prepare MustGather documentation and contact IBM.
Refer to 20.3, SVC dumps on page 247, for more information about how to use SVCDUMP for problem analysis in WebSphere for z/OS.

14. Set SLIP for SVCDUMP.
Dumps can be suppressed by the dump analysis and elimination (DAE) process. When this is the case, you must set a SLIP to capture a dump when the abend occurs. This is done using the SLIP SET z/OS command.
Example 5-2 on page 58 shows a SLIP that was used to capture a dump for an EC3 abend. It uses a wild card for the reason code so that any of the 0413000x abend reason codes that occur are matched. The ASIDLST is a list of address space IDs (ASIDs) for current, home, primary, secondary, and other address spaces in the dump, should you be in cross memory with them at the time.
Example 5-2 Example for setting a SLIP
SLIP SET,A=SVCD,COMP=EC3,REASON=0413000x,ID=WEC3,MATCHLIM=20,
SDATA=(ALLNUC,CSA,GRSQ,LPA,LSQA,PSA,RGN,SQA,SUM,SWA,TRT),
ASIDLST=(0,H,I,P,S)

An example of a SLIP that is used to capture a dump for a 0C4 abend is:
SLIP SET,A=SVCD,COMP=0C4,ID=ROBS,JOBNAME=(WASROBS),END

Refer to the z/OS MVS System Commands, SA22-7627-11, for a full description and syntax of the SLIP command.

15. Reproduce or wait for reoccurrence.
Sometimes you must have an SVCDUMP to determine the cause of the abend. Therefore, you must set a SLIP and try to reproduce the error. If you cannot reproduce the error, then wait for the problem to reoccur with the SLIP in place. If you are unsure about the most appropriate SLIP, contact IBM support for assistance.
Chapter 6.
Hang
This chapter explains what a hang is. The flow chart and step-by-step descriptions that we provide can help you analyze the problem and find its cause. We also mention the analysis tools and refer to information sources that are related to this symptom.
Figure 6-1 Flow chart for symptom: Hang in the application server
Run a simple test by entering a display command for the WebSphere for z/OS server that you suspect has a hang and wait for a response, for example:
MODIFY <server Name>,DISPLAY
If you do not get a response, it is very likely that the WebSphere for z/OS server is hung.
Check the latest time stamps in the WebSphere for z/OS server logs against the current system time. How much time has passed between system time and last recorded activity?
If you get a response from the server or the last recorded activity is close to the system time, then the problem might be with the HTTP server or the application itself. Check whether the request from the browser has arrived at the HTTP server by reviewing the HTTP server access logs and error logs. For more information, see 19.5, IBM HTTP Server logs and trace on page 232. If the HTTP request has arrived at the server and the HTTP server is responding to requests, then it might be the application that is hung.

2. Check and set hang detection variables.
WebSphere for z/OS V6 has a thread hang detection option and it is enabled by default. To adjust the hang detection policy values or to disable them, go to the Administrative Console and select Servers > Application Servers > server_name. Under Infrastructure, select Administration > Custom Properties. Then, select New. The properties are:
  Name: com.ibm.websphere.threadmonitor.interval
  Name: com.ibm.websphere.threadmonitor.threshold
  Name: com.ibm.websphere.threadmonitor.false.alarm.threshold
For a full explanation of the thread hang detection properties, search for the WSVR0605W message at the WebSphere for z/OS V6 Information Center at:
http://publib.boulder.ibm.com/infocenter/wasinfo/v6r0/index.jsp
If the WebSphere for z/OS server address space is hung, you cannot set or adjust these properties at this time. You must wait until the hang situation is cleared.

3. Analyze output from hang detection variables.
If you have the hang detection variables set, then you can see WSVR0605W messages in your servant region job log output (Example 6-1).
Example 6-1 WSVR0605W message example
Trace: 2005/07/20 15:45:45.013 01 t=6C1AC8 c=UNK key=P2 (13007002)
ThreadId: 00000016
FunctionName: com.ibm.ws.runtime.component.ThreadMonitorImpl
SourceId: com.ibm.ws.runtime.component.ThreadMonitorImpl
Category: WARNING
ExtendedMessage: BBOO0221W: WSVR0605W: Thread "HAManager.thread.pool: 0" (00000030) has been active for 642415 milliseconds and may be hung. There is/are 1 thread(s) in total in the server that may be hung.

Note the thread name and ID. They might help you determine the problem when you are searching the IBM support pages or reporting the problem to IBM.

4. Have you identified the hang?
Using the hang detection properties and the output from the WSVR0605W message, have you been able to identify the hung thread? If yes, then take corrective action as described in step 5. If no, issue some diagnostic commands and prepare for a dump as described in step 6.
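When a job log contains many WSVR0605W warnings, the thread name, thread ID, and active time noted in step 3 can be extracted mechanically. The sketch below is an illustration in Python, not an IBM utility. It assumes the message wording and the Trace time stamp format shown in Example 6-1, and the function name parse_hang_warning is hypothetical. It also subtracts the active time from the time stamp to estimate when the thread started its current work.

```python
import re
from datetime import datetime, timedelta

# Abbreviated copy of the trace entry in Example 6-1.
MESSAGE = (
    'Trace: 2005/07/20 15:45:45.013 01 t=6C1AC8 c=UNK key=P2 (13007002) '
    'ExtendedMessage: BBOO0221W: WSVR0605W: Thread "HAManager.thread.pool: 0" '
    '(00000030) has been active for 642415 milliseconds and may be hung.'
)

HUNG = re.compile(
    r'WSVR0605W: Thread "(?P<name>[^"]+)" \((?P<id>[0-9A-Fa-f]+)\) '
    r'has been active for (?P<ms>\d+) milliseconds'
)
STAMP = re.compile(r'Trace: (\d{4}/\d{2}/\d{2} \d{2}:\d{2}:\d{2}\.\d{3})')

def parse_hang_warning(text):
    """Return thread name, ID, active time, and estimated start time."""
    hung = HUNG.search(text)
    if hung is None:
        return None
    active = timedelta(milliseconds=int(hung.group("ms")))
    stamp = STAMP.search(text)
    logged = (datetime.strptime(stamp.group(1), "%Y/%m/%d %H:%M:%S.%f")
              if stamp else None)
    return {
        "thread": hung.group("name"),
        "id": hung.group("id"),
        "active": active,
        "logged": logged,
        # Estimated time at which the thread began the work it is stuck in.
        "active_since": logged - active if logged else None,
    }

warning = parse_hang_warning(MESSAGE)
print(warning["thread"], warning["id"], warning["active"])
print("active since about", warning["active_since"])
```

Comparing the estimated start time with other log entries from the same moment can point to the request or event that the thread was handling when it stopped making progress.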
5. Take corrective action.
If you were able to identify the hung thread with the information from the variables, then fix the problem, or check with the application programmers or the IBM support pages for information about this specific thread. If the thread problem continues to occur, you must further diagnose the cause by issuing specific commands and preparing for a dump. See step 6.

6. Issue diagnostic commands.
Many factors can cause an application server to hang. Usually a dump is required to diagnose the cause, but first you must determine which address spaces to dump. At a minimum, you should dump all the application server address spaces: the controller region, the servant regions, the daemon, and the control region adjunct if appropriate. Issue this command to determine which other address spaces should be included in your dump:
D GRS,C
This MVS command displays resource contention on the system. Example 6-2 shows the output after the authors issued the command. The BACK1HFS job currently holds an OMVS file system latch. The WEBPR01 job, which is the application server that is currently hung, is waiting for the latch.
Example 6-2 Sample output from D GRS,C command
ISG343I 01.54.41 GRS STATUS 177
NO ENQ RESOURCE CONTENTION EXISTS
LATCH SET NAME: SYS.BPX.A000.FSLIT.FILESYS.LSN
CREATOR JOBNAME: OMVS CREATOR ASID: 000D
LATCH NUMBER: 221
  REQUESTOR  ASID  EXC/SHR    OWN/WAIT
  BACK1HFS   0087  EXCLUSIVE  OWN
  WEBPR01    011A  SHARED     WAIT

This means that, in this example, you should dump OMVS, BACK1HFS, and the WEBPR01 control and servant regions to diagnose the hang problem.
Note: The DISPLAY GRS contention command might have to be routed to all systems if a sysplex is involved.
For more information about global resource serialization (GRS) and other GRS commands that are available to analyze contention, refer to z/OS V1R6.0 MVS Planning: Global Resource Serialization, SA22-7600-03.

7. Capture dump of hung ASID and others.
In a hang situation, it is always advisable to dump the OMVS address space and data space. Capture a dump of the relevant application server address spaces using the MVS DUMP command (Example 6-3).
Example 6-3 MVS DUMP command
Note: The more address spaces that you include in your dump, the larger the dump will be. Be sure that the dump completes successfully because you might encounter a space limitation problem, MAXSPACE. Any problems with the dump are recorded in the syslog. Use the MVS CHNGDUMP command to increase MAXSPACE.

8. Analyze dump.
Use IPCS to analyze the dump. For detailed information about IPCS and working with SVCDUMP, refer to 20.3, SVC dumps on page 247. Analyzing a dump can be done in many ways and the same information can be found by invoking different commands and options:
  IP ST REGS WORKSHEET
    You can use this IPCS command to verify dump title, time, and date.
  IP ANALYZE RESOURCE
    This command results in contention analysis. It shows resources, such as a latch or file system, that are causing contention. Note the type of resource and the TCBs that are holding the resources.
  IP SUMM FORMAT
    If more than one address space is dumped, you must supply the ASID or job name (servant region name). This command shows all of the TCBs for that address space. Use these TCB addresses in the LEDATA commands.
  IP VERBX LEDATA TCB(tcb_addr) NTHREADS(*)
    This command lists the TCB traceback information. This is the module flow for the task. From this, you can determine what modules the ASID could be waiting in and whether it is a valid wait step for that module or method. If it is in application code, then you might need to consult with the application owner.
    Run this command against the TCBs that appeared in the analyze resource output as holding a resource that other TCBs are waiting for. You could also use any TCB that is listed in the summary format output. Example 6-4 shows sample output from the LEDATA command.
Example 6-4 output from ip verbx ledata 'tcb(00AC6B58) nthreads(*) asid(00a8)'
TCB(00AC6B58) NTHREADS(*) ASID(00A8)
Language Environment Product 04 V01 R6.00
To Display Additional Information: IP VERBX LEDATA 'CAA(6A5CF520)DSA(6BB181C0) ALL'
Information for enclave main
Information for thread 1CD56F600000003F
PCB Address: 1C50D080
TCB Address: 00AC6B58
Registers and PSW:
  GPR0..... 00000086
  GPR4..... 6BB181C0
  GPR8..... 9C7A1412
  GPR12.... 6A5CF520
  PSW..... 07851400 80000000 00000000 01372572
Traceback:
DSA Addr  PU Addr   E Addr    E Offset   Load Mod  Entry
6BB181C0  1C7A1408  1C7A1408  -1B42EE96  CELHV003  recv
6BB182A0  665FBB98  665FBB98  +0000015A  *PATHNAM  NET_Recv
6BB18340  665F78F0  665F78F0  +00000292  *PATHNAM  Java_java_net_SocketInputStream_socketRead0
6BB187C0  69D17380  69D17380  +0000011E            java/net/SocketInputStream.socketRead0(Ljava/io/FileDescript
6BB18880  69D1F6D0  69D1F6D0  +00000240            java/net/SocketInputStream.read(.BII)I
.. ..
6BB19240  70069CD8  70069CD8  +00000410            gov/zena/mss/appenv/MSSLoginModule.getURLOutput(Ljava/lang/S
6BB19380  70FF1E40  70FF1E40  +0000132C            gov/zena/mss/appenv/MSSLoginModule.login()Z
6BB19560  70732400  70732400  +000000AA            sun/reflect/GeneratedMethodAccessor24.invoke(Ljava/lang/Obje
6BB19680  6BFC8360  6BFC8360  +00000090            sun/reflect/DelegatingMethodAccessorImpl.invoke(Ljava/lang/r
6BB19780  69D36CD0  69D36CD0  +000001C0            java/lang/reflect/Method.invoke(Ljava/lang/Object;.Ljava/lan
6BB19900  71F64A70  71F64A70  +00000806            javax/security/auth/login/LoginContext.invoke(Ljava/lang/Str
6BB19A80  70913FF8  70913FF8  +00000024            javax/security/auth/login/LoginContext$4.run()Ljava/lang/Obj
6BB19B80  7B600868  7B600868  +000002BC  *PATHNAM  INVFRMMI
6BB1A0A0  7B9DC368  7B9DC368  +00000030  *PATHNAM  c_invokerFromMMI

  IP VERBX GRSTRACE
This command checks for any enqueue contention. Search the report for entries with an asterisk (*). These entries have enqueue contention. With WebSphere dumps, analysis using only IPCS might not give the detail that is required to show the Java method and class that are necessary for debugging. The svcdump.jar tool, which is run on a Windows platform against an SVCDump, can provide this detail. Example 6-5 shows the output generated by the svcdump.jar tool. Information about the use of the svcdump.jar tool can be found in 21.2, JVM dump and heap analysis tools on page 254 and in more detail in WebSphere for z/OS V5 JVM Dump and Heap Analysis Tools, REDP-3950.
Example 6-5 output from svcdump.jar against tcb(00AC6B58)
TCB ac6b58 tid 6af69b60 pthread id 1cd56f600000003f tid type 0x00000000 tid state 0x00000015 tid singled
Dsa       Entry     Offset    Function
--------  --------  --------  --------
6bb181c0  1c7a1408  49e5a8ea  recv
6bb182a0  665fbb98  0000015a  NET_Recv
6bb18340  665f78f0  00000292  Java_java_net_SocketInputStream_socketRead0
6bb187c0  69d17380  0000011e  java/net/SocketInputStream.socketRead0(Ljav
6bb18880  69d1f6d0  00000240  java/net/SocketInputStream.read([BII)I
6bb18a00  728f8d98  0000006e  java/io/BufferedInputStream.fill()V
6bb18b20  6978d9f0  00000096  java/io/BufferedInputStream.read1([BII)I
6bb18c40  69792fb8  0000009e  java/io/BufferedInputStream.read([BII)I
6bb18ee0  7154c6d0  00000104  com/ibm/net/ssl/www2/protocol/http/y.a(Lcom
6bb19000  72d17bd8  00000302  com/ibm/net/ssl/www2/protocol/http/bb.getIn
6bb19240  70069cd8  00000410  gov/zena/mss/appenv/MSSLoginModule.getURLOu
6bb19380  70ff1e40  0000132c  gov/zena/mss/appenv/MSSLoginModule.login()Z
6bb19560  70732400  000000aa  sun/reflect/GeneratedMethodAccessor24.invok
. .
Java stack:
Method                                                 Location
------                                                 --------
java/net/SocketInputStream.socketRead0                 Native Method
java/net/SocketInputStream.read                        SocketInputStream.java(C
java/io/BufferedInputStream.fill                       BufferedInputStream.java
java/io/BufferedInputStream.read1                      BufferedInputStream.java
java/io/BufferedInputStream.read                       BufferedInputStream.java
com/ibm/net/ssl/www2/protocol/http/y.b                 (Compiled Code)
com/ibm/net/ssl/www2/protocol/http/y.a                 (Compiled Code)
com/ibm/net/ssl/www2/protocol/http/bb.getInputStream   (Compiled Code)
java/net/URL.openStream                                URL.java(Inlined Compile
gov/zena/mss/appenv/MSSLoginModule.getURLOutput        MSSLoginModule.java(Comp
gov/zena/mss/appenv/MSSLoginModule.login               MSSLoginModule.java(Comp

Important: In a hang situation, it is also important to check the settings for the protocol_http type variables, especially:
protocol_http_timeout_output_recovery
A SESSION setting cleans up the socket, but no attempt is made to disrupt the running of a dispatched HTTP request in a servant (region). The thread cannot be terminated when a timer for the thread hits.
A SERVANT setting causes the whole address space to go down when the timer hits. This is seen as a timeout EC3 abend.

9. Have you identified the problem?
After analyzing the dump, have you been able to find the reason for the hang? Have you identified the resource that was held causing other work to wait? If yes, then take the corrective action as described in step 10. If no, search the IBM support pages as described in step 11.

10. Take corrective action.
If you were able to identify the reason for the hang, take appropriate steps to fix the problem so that you can run the application server as desired. 11.Search IBM support pages. If you have been unable to identify the cause of the hang, you should search the IBM support pages. Having analyzed the output from the commands that you issued and having reviewed the data in the dump, you now have information that you can use as a basis for such a search. You can start your search at the WebSphere for z/OS support site: http://www-306.ibm.com/software/webservers/appserv/zos_os390/support/
From this site, you can click several links to access other support sites that are related to WebSphere Application Server, its components, and z/OS. Refer to Chapter 3, Information sources on page 25, which provides many valuable links and resources for solving problems in WebSphere for z/OS. 12.Have you identified the problem and solution? After searching the support pages, were you able to find the cause of the problem or find a solution? If no, prepare to contact IBM support as described in step 13. If yes, go to step 10. 13.Assemble MustGather documentation. For more information about MustGather, see MustGather on page 16. Read the document MustGather: Read first for WebSphere Application Server for z/OS for help with assembling the appropriate documentation. For a hang, supply information about: The version of your WebSphere Application Server and build level The version of the operating system and service level (PUT) The description of the problem Include background information such as when the problem started to occur, whether it occurs at certain times, and whether there have been any changes to the system such as new maintenance or new applications. Controller region and servant region job logs Syslog showing diagnostic commands issued and their output SVCDump 14.Contact IBM support. Refer to Chapter 2, Contacting IBM: Information on page 13, for information about and procedures for contacting IBM support.
Chapter 7. Timeout
This chapter explains what a timeout is. The flow chart and step-by-step descriptions in this chapter can help you analyze the problem and find its cause. We also mention the analysis tools and reference information sources that are related to this symptom.
[Figure: flow chart for the timeout symptom. Decision boxes that survive extraction: Server abend?, Timeout in browser?, Server displayed?, Server active? Action boxes that survive extraction: Check syslog, Check script, Adjust timeout, Re-try / relogin, Check server host and port, Go to flowchart: hang, Assemble "MustGather" documentation, Contact IBM support.]
3. Timeout while entering data?
Do you experience a timeout while attempting to enter data in a Web client? If yes, adjust the timeout values as shown in step 4. If no, proceed with step 5.
4. Adjust timeout value for data input.
A timeout can occur when the timeout values of the application server have not been tuned. Consult the application developers and testers to determine how much time was scheduled for data entry. If this information is not available, change the value of the protocol_http_timeout_input variable to zero. Then test extensively to determine how long it takes to enter data, and adjust the protocol_http_timeout_input value accordingly.
To learn how to change the protocol_http_timeout_input value, search for "HTTP Transport timeout variables" or "controlling behavior through timeout values" at the WebSphere for z/OS Information Center:
http://publib.boulder.ibm.com/infocenter/wasinfo/v6r0/index.jsp
There are several articles that provide more information about timeouts and how to set the timeout values that WebSphere for z/OS uses. Also search for "understanding how timers work" for more links and references about timeouts.
Note: For information about setting the Administrative Console session timeout value, search for "setting the session timeout for the administrative console" at the WebSphere for z/OS Information Center.
Proceed with step 6.
5. Adjust timeout value for data processing.
A timeout can occur when the timeout values of the application server have not been tuned. Consult the application developers and testers to determine how much time was scheduled for data processing. If you do not have this information, change the value of the protocol_http_timeout_output variable to 0. Then test extensively to determine how long it takes to process data, and adjust the protocol_http_timeout_output value accordingly.
To learn how to change the protocol_http_timeout_output value, search for "HTTP Transport timeout variables" or "controlling behavior through timeout values" at the WebSphere for z/OS Information Center:
http://publib.boulder.ibm.com/infocenter/wasinfo/v6r0/index.jsp
There are several articles that provide more information about timeouts and how to set the timeout values that WebSphere for z/OS uses. Also search for "understanding how timers work" for more links and references about timeouts.
6. Retry and log in again.
Either your session was idle for longer than expected or you had to tune the timeout values for WebSphere for z/OS. Now try to access the Web client again, call the appropriate URL, and log in if necessary. If you cannot get to the Web site you intended, you cannot log in, or the Web client appears to be frozen, proceed with step 7.
7. Check DA panel in SDSF.
Check the server activity by going to the DA (Active users) panel under SDSF and analyze the list that is produced (Figure 7-3).
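Steps 4 and 5 ask you to measure how long data entry or data processing really takes before you choose a timeout value. The following Python fragment is a minimal sketch of that arithmetic, assuming you have already collected durations (in seconds) from your own tests. The function name suggest_timeout and the 25% safety margin are illustrative assumptions, not WebSphere settings or IBM recommendations.

```python
import math

def suggest_timeout(durations_s, margin=0.25):
    """Return a whole-second timeout: slowest observed duration plus a margin.

    durations_s: measured durations in seconds (hypothetical test data).
    margin: fractional headroom added to the slowest observation
            (0.25 is an assumed value, not an IBM recommendation).
    """
    if not durations_s:
        raise ValueError("need at least one measured duration")
    if margin < 0:
        raise ValueError("margin must not be negative")
    # Round up so the configured value is never below the computed one.
    return math.ceil(max(durations_s) * (1 + margin))

# Made-up measurements of data-entry time in seconds.
measured = [42.0, 57.5, 61.2, 48.9]
print(suggest_timeout(measured))
```

The result is only a starting point for protocol_http_timeout_input or protocol_http_timeout_output; confirm it with further testing as the steps describe.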
8. Is the server up?
Check whether the server name (application servers and Deployment Manager for the Administrative Console) appears in the list (see JOBNAME in Figure 7-3). Are both the controller region and servant region in the list and are they up? If yes, proceed with step 9. If no, go to step 19.
9. Is the server active?
Check whether the server that your Web browser is trying to access is active. You can check the server activity in the DA panel under SDSF. See whether the CPU% number (see Figure 7-3 on page 70) for the specific server is changing, which indicates that your server is active. If it is not active, proceed with step 10. If it is active, check the server host and port as shown in step 11.
10. Go to the flow chart for hang.
If the request was sent from the client side, and the server is up but did not respond, then the server could be hung. At this point, explore the hang symptom. See Chapter 6, "Hang" on page 59, for more information.
11. Check server host and port.
Verify the accuracy of the server host name and port number in the script or client that you are using to access the application server. In the command line of the wsadmin script, the host name follows the -host option, and the port number follows the -port option. If the host name is not specified, the program uses the host name specified in the TCP/IP profile. Also check whether the server listens on the right port by issuing netstat -a on the system that runs the specified application server. Check the output, verifying that the port that you are trying to access is open and that it is the right type. For admin scripting and clients, it must be an IIOP port, listed as protocol_iiop_port in the application server job log.
12. Are the server host and port correct?
Are the host name and port correctly defined, active, and accessed by the client or script? If no, proceed with step 13. If yes, check the logs as described in step 14.
13. Fix host name and port.
According to your findings in previous steps, change the host name and port in the client or script, in the application server definition, or in your TCP/IP configuration to ensure that you can access the right application server. Go to step 6 when you are finished.
14. Check job and server logs.
If your server is active, review the recent server log or job log entries and look for any abnormal activity in the server and any token or keyword pointing to an error or problem. Also check the syslog for messages that indicate a problem in the system environment. Chapter 4, "Exceptions and error messages" on page 41 and Appendix A, "Messages and codes" on page 311 have more details about how to identify messages and exceptions.
If First Failure Data Capture (FFDC) is enabled, messages are written to the specified FFDC files (see Figure 7-4 on page 72). Check them for any messages or exceptions.
Trace: 2005/10/03 10:46:25.304 01 t=6C61C8 c=UNK key=P8 (13007002)
  ThreadId: 0000002f
  FunctionName: initialize
  SourceId: com.ibm.ws.ffdc.IncidentStreamImpl.ServiceLogger
  Category: INFO
  ExtendedMessage: FFDC0009I: FFDC opened incident stream file /P13/WebSphere/V6R1M2A/AppServerB/profiles/default/logs/ffdc/PLXMCLA1_P1BNLA1_serverB2_STC54100_W60ASB2S_05.10.03_14.46.25.0.txt
Figure 7-4 FFDC file information in trace
Note: For more information about FFDC, see 19.3, "First Failure Data Capture" on page 219, and the WebSphere Information Center at:
http://publib.boulder.ibm.com/infocenter/wasinfo/v6r0/index.jsp
15. Is there any message or exception?
Have you found a message or exception in the log that might be worth exploring? If yes, then proceed with step 16. If no, then go to step 17.
16. Go to the flow chart for exceptions and error messages.
If you found a message or exception that is not self-explanatory but might lead you to the root cause of the problem, refer to Chapter 4, "Exceptions and error messages" on page 41.
17. Assemble MustGather documentation.
MustGather documents help with problem determination and save time resolving PMRs. For more information about MustGather, see "MustGather" on page 16. You can find MustGather documents by searching for the word mustgather at:
http://www.ibm.com/software/webservers/appserv/zos_os390/support
Read the document "MustGather: Read first for WebSphere Application Server for z/OS" for help assembling the appropriate documentation. It is available at:
http://www.ibm.com/support/docview.wss?uid=swg21176043
The minimum information that is necessary is:
- Problem description
  Include information related to when the problem first started to occur.
- Software version and maintenance (build) level
  You find the information in the job log of your application server. Search for build level, and you obtain a line similar to this:
  BBOM0007I CURRENT CB SERVICE LEVEL IS build level cf10515.05 release WAS601.ZNATV date 04/15/05 12:55:41.
- Operating system version and maintenance (PUT) level
- The job log of the application server in question (include both controller and servant region job logs)
- Any dumps or traces triggered by the problem
See also 2.3, "Before you contact IBM support" on page 15. Then proceed with step 18 to contact IBM Support.
18. Contact IBM support.
If you need to contact IBM support, refer to Chapter 2, "Contacting IBM: Information" on page 13, for instructions. Provide the information that is outlined in the MustGather documentation step.
19. Check syslog.
If you cannot see the server in the DA panel, you might have to start the server first (if this was not done before) or analyze why the server stopped by checking the syslog for specific messages that indicate a problem. Search for the last log entry related to the server in question and trace it back to where a problem occurred. A server can fail for several reasons.
20. Is there a server abend?
A server abend can be a reason for a request timing out. Did you find an abend message for your server in the syslog? If yes, then proceed with step 21. If not, then check for other error messages or exceptions and proceed with step 15.
21. Go to the flow chart for abend.
The abend code EC3 followed by a 0413xxxx reason code indicates an abend caused by a timeout. To identify the cause of the EC3-0413xxxx timeout abend (by debugging the EC3 dump using the IPCS debugging tool), search for "task with the EC3 abend" at the WebSphere Information Center:
http://publib.boulder.ibm.com/infocenter/wasinfo/v6r0/index.jsp
22. Check logs.
Check all other logs for a message or exception that might be related to the timeout:
- If you were running a script and experienced a timeout, remember that scripts (using Java Command Language or Jython) can run into the same type of problems as the Administrative Console. Scripts generally put the log information into a file. Logs are not generated by default, so scripts have to be modified so that the errors can be written to logs. Search the Information Center for script information if you are not sure how to modify scripts for logging. Rerun the script to generate a log.
- If you were running other types of clients, such as an Object Request Broker (ORB) client, Java Message Service (JMS) client, Remote Method Invocation/Internet Inter-ORB Protocol (RMI/IIOP) client, or any fat client, and the client timed out while waiting for a response from the server or from an external source, check whether the clients generated any logs. If there are no logs, modify the clients to generate logs. Contact your application developers for the modification.
- Check all possible logs of other resources that could cause the timeout. Other resources include networks, databases, and any resources that are not visible in the browser or in the client logs. Timeouts of other resources sometimes show up in the system log or job log. Check the syslog (issue LOG in the SDSF panel). For the job log, go to the server address spaces and scan through the logs for any timeouts.
Instead of looking through all job logs, you can use log streams. Define LOGSTREAMs for all the address spaces, then run the BBORBLOG tool from the Time Sharing Option (TSO) command prompt to receive the job logs. See 19.2, "WebSphere error log (BBORBLOG)" on page 216, for more information about WebSphere logs and tools.
23. Are there any connection timeout exceptions?
If your script or client is waiting for a connection that fails to be established, or for a request that was sent through this connection but not returned, you might receive a connection timeout or reset exception in your log. Example 7-1 shows the connection error with the message number WASX7023E.
Example 7-1 Connection exception
WASX7023E: Error creating "SOAP" connection to host "WEBXXX.POK.IBM.COM"; exception information: com.ibm.websphere.management.exception.ConnectorNotAvailableException: [SOAPException: faultCode=SOAP-ENV:Client; msg=Connection reset; targetException=java.net.SocketException: Connection reset]
You will most likely find another message that informs you of the reason and gives you the location of the specific log to review. In our case, we received message code WASX7213I, which means that this scripting client is not connected to a server process. It also pointed to the log file /WebSphere/V6R0M0A/AppServer1/profiles/default/logs/wsadmin.traceout for additional information.
Do you see a connection timeout in your script/client or system log? If yes, go to step 11. If no, then go to step 15.
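The host and port check in step 11 can also be approximated from the client side. The following Python fragment is a minimal sketch, not a replacement for netstat -a on the server: it only tries to open a TCP connection and reports whether that succeeds. The function name port_is_open, the host name localhost, and port 2809 are placeholder assumptions; substitute the values that your script or client really uses.

```python
import socket

def port_is_open(host, port, timeout_s=5.0):
    """Return True if a TCP connection to host:port succeeds within timeout_s."""
    try:
        # create_connection resolves the host name and attempts the connect.
        with socket.create_connection((host, port), timeout=timeout_s):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False

# Placeholder values; replace with the host and port of your application server.
host, port = "localhost", 2809
state = "accepting connections" if port_is_open(host, port) else "not reachable"
print(f"{host}:{port} is {state}")
```

A successful connection shows only that something is listening on the port. It does not confirm that the listener is the expected application server or that the port is of the right type, so the job log check in step 11 is still needed.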
Chapter 8. Does not stop
[Figure: flow chart for the does not stop symptom. Boxes that survive extraction: Request in dispatch?, Check other resources in dispatch, Go to flowchart: (target not recoverable).]
7. Check job log and server log.
If your server is active, review the recent server log or job log entries and look for any abnormal activities in the server and any token or keyword pointing to an error or problem. Also check the syslog for messages about a problem in the system environment. Chapter 4, "Exceptions and error messages" on page 41 and Appendix A, "Messages and codes" on page 311 have more details about how to identify messages and exceptions.
If FFDC is enabled, messages are written to the specified FFDC files (Figure 8-2). Check them for any messages or exceptions.
Trace: 2005/10/03 10:46:25.304 01 t=6C61C8 c=UNK key=P8 (13007002)
  ThreadId: 0000002f
  FunctionName: initialize
  SourceId: com.ibm.ws.ffdc.IncidentStreamImpl.ServiceLogger
  Category: INFO
  ExtendedMessage: FFDC0009I: FFDC opened incident stream file /P13/WebSphere/V6R1M2A/AppServerB/profiles/default/logs/ffdc/PLXMCLA1_P1BNLA1_serverB2_STC54100_W60ASB2S_05.10.03_14.46.25.0.txt
Figure 8-2 FFDC file information in trace
Note: For more information about FFDC, see 19.3, "First Failure Data Capture" on page 219 and visit the WebSphere Information Center.
8. Is there any message or exception?
Have you found a message or exception in the log that might be worth exploring? If yes, then proceed with step 9. If no, then go to step 10.
9. Go to the flow chart for exceptions and error messages.
If you found a message or exception that is not self-explanatory but might lead you to the root cause of the problem, refer to Chapter 4, "Exceptions and error messages" on page 41.
10. Set dump.
Dumps can be suppressed by the dump analysis and elimination process. When this is the case, set a SLIP to capture a dump by using the SLIP SET z/OS command. Refer to z/OS MVS System Commands, SA22-7627-11, for a full description and syntax of the SLIP command.
11. Retry stop command and analyze dump.
Try the stop command for the application server in question again. You might have solved the problem by resolving the resource in dispatch, or a dump might have been created. Usually the name of the SVCDUMP data set is recorded in the syslog. If you are not sure whether a dump was taken or which data set the dump was written to, then search for the word dump in the syslog and locate any messages pertaining to the dump.
If there is a problem capturing the dump, an IEAxxx type message is issued, such as:
IEA911I PARTIAL DUMP ON MVS.O1MP.DMP00056 678
IEA043I SVC DUMP REACHED MAXSPACE LIMIT - MAXSPACE=xxxx MEG
In that case, fix the dump problem before you attempt to analyze the dump, because crucial information might not be written to the dump. Also ensure that your WebSphere for z/OS servers have the authority to create and write to the dump data sets.
To analyze the SVCDUMP, invoke the z/OS MVS IPCS. There are several methods for analyzing a problem with IPCS and data from the SVCDUMP. We outline one approach here:
a. Invoke IPCS and verify that you have the correct dump by checking the dump title, date, and time. To display this information, issue:
ip st validate worksheet
Figure 8-3 shows an example of output from this command.
MVS Diagnostic Worksheet
Dump Title: COMPON=WEBSPHERE Z/OS, COMPID=5655I3500,ISSUER=BBORLEXT, ABEND IN BBOORB /UNKNOWN
CPU Model 2084 Version 00 Serial no. 012345 Address 01
Date: 02/18/2005 Time: 12:38:22.102475 Local
Original dump dataset: SYSPRD1.PLXA.SVCD.D050218.H123818.C2.N00011
Information at time of entry to SVCDUMP:
HASID 04EC PASID 04EC SASID 04EC PSW 070D1000 9C038948
b. In the same display, scroll down to see the diagnostic data report section and verify the abend code, reason code, and module name.
c. Locate the PSW address of where the abend condition occurred and verify the module name in the summary format report, which is displayed using the following command:
ip summ format
Scroll to the bottom of the report and use find previous to locate the RTM2WA SUMMARY and control block data:
f 'rtm2wa summary' prev
d. This takes you to RTM2WA SUMMARY, which shows RTM data. This is the time-of-error information, as shown in Figure 8-4 on page 80. Note the PSW address.
RTM2WA SUMMARY
--------------
Completion code 840C4000
Abending program name/SVRB address 007C2070 00000000
Abending program addr 00000000
GPRs at time of error (only part of the register columns survives):
215D3F00 273C2250 00000000 00000000
0-3 00000000
4-7 34326EB0
8-11 A667F7AA
12-15 33FE6C50
+007C +00DC
+00E8 Return code from recovery routine-00 Continue with termination-implies percolation
+00E0 Retry Address returned from recovery exit 00000000
+00E4 RB Address for retry 00000000
+000C CVT Address 00FCB018
+0038 RTCT Address 00FB24E0
+00C8 SCB Address 007C4AC0
To determine the TCB address, scroll up a little to find the RTM2WA control block data and note the TCBC value. In our example, we have a PSW of 072C2000 A667FA42. The second word is a 31-bit address. For information about the format of the PSW, refer to z/Architecture Principles of Operation, SA22-7832-03.
e. Now locate the address in the dump storage. This is done from the IPCS main menu. In our example, we located address 2667FA42 (Figure 8-5).
ASID(X'04EC') ADDRESS(2667FA42.) STORAGE ----------------------------------
Command ===>
2667FA42              A784 000A181B 9856E0D8 0D764700 |   xd....q.\Q.... |
2667FA50 FFA95800 D00018B0 A7F40005 18DB58B0          | .z..}...x4...... |
2667FA60 B00012BB A774FF76 5820487C BF1F201C          | ....x......@.... |
2667FA70 A784000D A7AA0FF8 9856A0D8 0D764700          | xd..x..8q..Q.... |
f. From the PSW address, scroll up and try to determine the module name using the eye catchers in the dump such as those in Figure 8-6.
Figure 8-6 Search for eye catchers in dump storage near PSW address
The eye catchers are the ASCII characters found at the right of the storage. In our example, we found the bboosra module name.
g. Obtaining a module name often is sufficient, but when WebSphere for z/OS is involved, it is sometimes necessary to go further and obtain the related method name. Examine the traceback data. Using the TCB from the RTM2WA information, enter:
ip verbx ledata 'tcb(007c07f8) nthreads(*)'
When the output is displayed, locate the traceback information (Figure 8-7).
Traceback:
PU Addr   PU Offset  Entry                                                         E Addr    E Offset   Load Mod  Status
2667F7A0  +000002A2  SRAggregator::refresh(JNIEnv_*,_jobject*,_jobject*)           2667F7A0  +000002A2  SUBPOOL2  Call
266127B0  +00000072  Java_com_ibm_ws390_orb_ORBEJSBridge_refreshSRAggregator       266127B0  +00000072  SUBPOOL2  Call
36422E28  +0000013C  com/ibm/ws390/orb/ORBEJSBridge.refreshSRAggregator(ILjava/la  36422E28  +0000013C  SUBPOOL0  Call
36426388  +00000052  com/ibm/ws390/orb/SRAggregator.getSRObjectElementHT()Ljava/u  36426388  +00000052  SUBPOOL0  Call
36414658  +000000A4  com/ibm/ws390/management/ServantMBeanInvoker.invokeSpecified  36414658  +000000A4  SUBPOOL0  Call
7C200830  +000002BC  INVFRMMI                                                      7C200830  +000002BC  *PATHNAM  Call
7C5DA5B0  +00000030  c_invokerFromMMI                                              7C5DA5B0  +00000030  *PATHNAM  Call
7CCF1788  +0000026E  mmipSelectInvokeJavaMethod                                    7CCF1788  +0000026E  *PATHNAM  Call
7CCE4618  +00005C6A  mmipExecuteJava                                               7CCE4618  +00005C6A  *PATHNAM  Call
7CCFD298  +000002DC  mmijExecuteJavaFromJIT                                        7CCFD298  +000002DC  *PATHNAM  Call
h. The information found in the traceback in Figure 8-7 might be sufficient to find the module or method name. When the traceback provided by IPCS does not go far enough, a tool called svcdump.jar can be used. Refer to 21.2, "JVM dump and heap analysis tools" on page 254 and WebSphere for z/OS V5 JVM Dump and Heap Analysis Tools, REDP-3950, for more details about how to download and run the svcdump.jar tool.
i. With the information that you obtained from the svcdump.jar tool, such as abend code, module, and method name, determine in which component the abend was taken. You can use this information to debug the module or search IBM support data for related information and a possible fix.
j. After searching IBM support data, we found that APAR PK06080 addresses our problem.
k. If you cannot find a solution, prepare MustGather documentation and contact IBM.
Further information about IPCS can be found in:
- z/OS V1R6.0 MVS IPCS Commands, SA22-7594-05
- z/OS V1R2.0 MVS IPCS User's Guide, SA22-7596-01
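The address arithmetic in sub-steps d through g can be checked by hand or with a few lines of code. The following Python fragment is a sketch under two assumptions: the high-order bit of the second PSW word is the addressing-mode bit (as for a 31-bit address) and must be cleared to obtain the storage address, and the failing instruction lies in the first program unit of the traceback. The names AMODE31_MASK and psw_instruction_address are illustrative and are not IPCS functions.

```python
# Clears the high-order (addressing-mode) bit of a 31-bit PSW address word.
AMODE31_MASK = 0x7FFFFFFF

def psw_instruction_address(psw_second_word):
    """Return the storage address held in the second word of the PSW."""
    return psw_second_word & AMODE31_MASK

psw_word = 0xA667FA42   # second word of the PSW in the example (072C2000 A667FA42)
pu_addr = 0x2667F7A0    # PU Addr of the first traceback entry

address = psw_instruction_address(psw_word)
offset = address - pu_addr   # distance of the failing instruction from the entry point

print(f"{address:08X}")   # 2667FA42
print(f"+{offset:08X}")   # +000002A2
```

The computed offset equals the PU Offset shown for SRAggregator::refresh in the traceback, which is a quick consistency check that the PSW address and the traceback entry belong together.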
Also refer to 20.3, "SVC dumps" on page 247, for more information about how to use SVCDUMP for problem analysis in WebSphere for z/OS.
12. Assemble MustGather documentation.
MustGather documents help with problem determination and save time resolving PMRs. For more information about MustGather, see "MustGather" on page 16. Read the document "MustGather: Read first for WebSphere Application Server for z/OS" for help with assembling the appropriate documentation. It is available at:
http://www.ibm.com/support/docview.wss?uid=swg21176043
The minimum information necessary is:
- Problem description
  Include information related to when the problem first started to occur.
- Software version and maintenance (build) level
  You find the information in the job log of your application server. Search for build level and you obtain a line similar to this:
  BBOM0007I CURRENT CB SERVICE LEVEL IS build level cf10515.05 release WAS601.ZNATV date 04/15/05 12:55:41.
- Operating system version and maintenance (PUT) level
- The job log of the application server in question (including both controller and servant region job logs)
- Any dumps or traces triggered by the problem
See also 2.3, "Before you contact IBM support" on page 15. Then proceed with step 13 to contact IBM Support.
13. Contact IBM support.
If you need to contact IBM support, refer to Chapter 2, "Contacting IBM: Information" on page 13, for instructions. Provide the information outlined in the MustGather documentation step.
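When you collect the build level for the MustGather documentation, the BBOM0007I line can be picked out of a saved job log with a small script. The following Python fragment is a minimal sketch that assumes the job log is available as plain text and that the message has the same layout as the single sample line in this chapter; other service levels might format it differently. The names BUILD_LEVEL_RE and find_build_level are illustrative.

```python
import re

# Pattern modeled on the one sample BBOM0007I line shown in this chapter.
BUILD_LEVEL_RE = re.compile(
    r"BBOM0007I\s+CURRENT CB SERVICE LEVEL IS\s+"
    r"build level\s+(?P<build>\S+)\s+"
    r"release\s+(?P<release>\S+)\s+"
    r"date\s+(?P<date>\S+)\s+(?P<time>[\d:]+)"
)

def find_build_level(joblog_text):
    """Return build, release, date, and time from the first BBOM0007I line, or None."""
    match = BUILD_LEVEL_RE.search(joblog_text)
    return match.groupdict() if match else None

sample = ("BBOM0007I CURRENT CB SERVICE LEVEL IS build level cf10515.05 "
          "release WAS601.ZNATV date 04/15/05 12:55:41.")
print(find_build_level(sample))
```

Because the pattern uses \s+ between fields, it also tolerates a message that wraps across two job log lines.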
Chapter 9. Job failed
This chapter explains the job failed symptom. The flow chart and step-by-step descriptions in this chapter can help you analyze the problem and find its cause. We also mention the analysis tools and reference information sources that are related to this symptom.
[Figure: flow chart for the job failed symptom. Boxes that survive extraction: Go to symptom abend, Analyze SVCDump, Go to symptom (target not recoverable), Reproduce or wait for reoccurrence, Dump captured?, Locate SVCDump, Identified problem and solution?]
From this site, you can follow several links to other support sites that are related to WebSphere Application Server, its components, and z/OS. See Chapter 3, "Information sources" on page 25, which provides many valuable links and resources for solving problems in WebSphere for z/OS. When you are searching problem databases for information or fixes that might be related to the job failed symptom, consider that such problems are reported in many formats. You might have to alter your search keyword to find a match.
10. Have you identified the problem and solution?
After you searched the support pages, were you able to find the cause of the problem and a solution? If yes, then take the corrective action by proceeding with step 11. If no, go to step 12.
11. Take corrective action.
The information that you have found using the IBM support data might have provided the following solutions:
- An existing APAR and PTF fix for your problem that is available for you to apply.
- Other reports of your symptoms that include a procedure for fixing the problem.
In such cases, follow the instructions that are provided, or apply the information to your specific problem. If you were able to solve it, document the problem and the fixes you have applied in the system change documentation for your specific WebSphere for z/OS environment for later reference. Otherwise, proceed with step 12.
12. Locate SVCDump.
You might find more information or hints regarding the failing job in the dump. Usually the name of the SVCDUMP data set is recorded in the syslog. If you are not sure whether a dump was captured or which data set the dump was written to, then search for the word dump in the syslog and locate any messages pertaining to the dump.
If there was a problem capturing the dump, an IEAxxx type message is issued, such as:
IEA911I PARTIAL DUMP ON MVS.O1MP.DMP00056 678
IEA043I SVC DUMP REACHED MAXSPACE LIMIT - MAXSPACE=xxxx MEG
In that case, fix the dump problem before you attempt to analyze the dump, because crucial information might not be written to the dump. Also, ensure that your WebSphere for z/OS servers have the authority to create and write to the dump data sets.
13. Was a dump captured?
Was a dump captured? Were you able to locate the dump? If no, then you must set a SLIP. Go to step 14. If you located the dump, then prepare to analyze the dump as described in step 16.
14. Set SLIP for SVCDump.
Dumps can be suppressed by the dump analysis and elimination process. When this is the case, use the z/OS command SLIP SET to set a SLIP that captures a dump when the symptom occurs. Example 9-1 shows a SLIP used to capture a dump for an abend EC3 (a started task failure). It uses a wild card for the reason code so that any of the 0413000* abend reason codes that occur are matched. The ASIDLIST is for current, home, primary, and secondary address spaces and can include other address spaces in the dump if you are in cross memory with them at the time.
Example 9-1 Example for setting a SLIP
Refer to z/OS MVS System Commands, SA22-7627-11, for a full description and syntax of the SLIP command. If you are unsure about the most appropriate SLIP, contact IBM support for assistance.
15. Reproduce or wait for reoccurrence.
With the SLIP set, try to reproduce the error. If you cannot reproduce the error, then wait for the problem to reoccur with the SLIP in place.
16. Analyze the SVCDUMP.
To analyze the SVCDUMP, invoke IPCS. Several methods can be used to analyze this symptom using IPCS and data from the SVCDUMP. We outlined one approach in "Analyze the SVCDUMP" on page 54 in Chapter 5, "Abend" on page 49.
Further information about IPCS can be found in:
- z/OS V1R6.0 MVS IPCS Commands, SA22-7594-05
- z/OS V1R2.0 MVS IPCS User's Guide, SA22-7596-01
See 20.3, "SVC dumps" on page 247, for more information about how to use SVCDUMP for problem analysis in WebSphere for z/OS. If you cannot find a solution, prepare MustGather documentation and contact IBM as described in step 17.
17. Assemble MustGather documentation.
Prepare the MustGather documentation for IBM support. For more information about MustGather, see "MustGather" on page 16. For a failing job or started task, provide the following material:
- Problem description
  Include information related to when the problem first started to occur.
- Software version and maintenance (build) level
  Find this information in the job log of your application server. Search for build level to obtain a line similar to this:
  BBOM0007I CURRENT CB SERVICE LEVEL IS build level cf10515.05 release WAS601.ZNATV date 04/15/05 12:55:41.
- Operating system version and maintenance (PUT) level
- The job log of the abending address space (including both controller and servant region job logs)
- The SVCDUMP triggered by the failing job or the SLIP
18. Contact IBM support.
If you need to contact IBM support, refer to Chapter 2, "Contacting IBM: Information" on page 13, for instructions.
Provide the information outlined in the MustGather documentation step.
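The syslog search described in steps 12 and 13 can be sketched in code if the syslog has been saved as a text file. The following Python fragment is a minimal illustration, not an IBM tool: it separates lines that look like dump capture problems from other lines that mention a dump. The markers are taken from the two sample messages in this chapter, the function name scan_for_dump_messages is illustrative, and the third sample line is invented for the example.

```python
import re

# Markers modeled on the two sample messages in this chapter (assumption:
# a partial dump or a MAXSPACE condition signals a dump capture problem).
PROBLEM_MARKERS = ("PARTIAL DUMP", "IEA043I")

def scan_for_dump_messages(lines):
    """Split lines that mention a dump into (capture problems, other mentions)."""
    problems, mentions = [], []
    for line in lines:
        text = line.strip()
        if any(marker in text for marker in PROBLEM_MARKERS):
            problems.append(text)
        elif re.search(r"dump", text, re.IGNORECASE):
            mentions.append(text)
    return problems, mentions

sample = [
    "IEA911I PARTIAL DUMP ON MVS.O1MP.DMP00056",
    "IEA043I SVC DUMP REACHED MAXSPACE LIMIT - MAXSPACE=xxxx MEG",
    "HYPOTHETICAL LINE: DUMP DATA SET NAME RECORDED HERE",  # invented sample
]
problems, mentions = scan_for_dump_messages(sample)
print(len(problems), len(mentions))
```

If the first list is not empty, resolve the dump capture problem before analyzing the dump, as step 12 advises. If both lists are empty, no dump was probably taken and a SLIP is the next step.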
Chapter 10. No response
This chapter explains what the no response symptom means. The flow chart and step-by-step descriptions in this chapter can help you analyze the problem and find its cause. We also mention the analysis tools and reference information sources related to this symptom.
[Figure: flow chart for the no response symptom. Boxes that survive extraction: Install application, Go to symptom timeout, Start application, Check syslog, Found keyword?, Access to back-end resources?, Go to No resource access, Reproduce or wait for reoccurrence, Analyze SVCDump, Take corrective action, Assemble MustGather, Contact IBM Support.]
Does your application have a green arrow? If no, proceed with step 6. If yes, go to step 7.
6. Start application.
To start the application, go to the Administrative Console and select Applications > Enterprise Applications. Select your application and click the Start button (see Figure 10-2). For more information about the status of applications and starting and stopping them, search for "Start and Stop applications" at the WebSphere for z/OS Information Center:
http://publib.boulder.ibm.com/infocenter/wasinfo/v6r0/index.jsp
7. Check syslog.
Check the syslog; issue LOG in the SDSF panel. Search for the last log entry related to the specific application server and trace it back to where a problem might have occurred. Look for any abnormal activity in the system or server and any token or keyword pointing to an error or problem.
8. Have you found a keyword?
Have you found any abnormal activity in the system or server or any token or keyword pointing to an error or problem in the log that might be worth exploring? If no, then go to step 10. If yes, then proceed with step 9.
9. Analyze or search keyword.
If you found a symptom message or exception that is not self-explanatory but might lead you to the root cause of the problem, refer to the appropriate symptom chapters in this book:
- Chapter 5, "Abend" on page 49
- Chapter 6, "Hang" on page 59
- Chapter 7, "Timeout" on page 67
- Chapter 8, "Does not stop" on page 75
- Chapter 9, "Job failed" on page 83
- Chapter 11, "No resource access" on page 99
Chapter 4, "Exceptions and error messages" on page 41 and Appendix A, "Messages and codes" on page 311 have more details about how to identify messages and exceptions. Then, search for information about the specific message as described in step 23.
10. Check job and server logs.
Review the recent server log or job log entries and look for any abnormal activity in the server and any token or keyword pointing to an error or problem. Instead of looking through all job logs, you can use log streams. Define LOGSTREAMs for all the address spaces, then run the BBORBLOG tool from the TSO command prompt to receive the job logs. See 19.1, "Job logs and system log" on page 214 and 19.2, "WebSphere error log (BBORBLOG)" on page 216 for more information about WebSphere logs and tools.
If FFDC is enabled, messages are written to the specified FFDC files. Check them for any messages or exceptions. See 19.3, "First Failure Data Capture" on page 219, and the WebSphere Information Center at:
http://publib.boulder.ibm.com/infocenter/wasinfo/v6r0/index.jsp
If you use the IBM HTTP Server, analyze the appropriate logs that are described in 19.5, "IBM HTTP Server logs and trace" on page 232.
11. Have you found a keyword?
Have you found any abnormal activity in the system or server or any token or keyword pointing to an error or problem in the log that might be worth exploring? If no, then go to step 12. If yes, then proceed with step 9.
12. Enable Java Logging.
To gather more detailed information about the execution path of a running application so that you can determine the root cause of the problem, use the Java Logging API. To enable this API:
a. In the navigation pane, select Servers > Application Servers > <server_name>.
b. Click Diagnostic Trace Service.
c. Select Enable Log.
d. Select either Memory buffer or file and click Apply.
e. To specify your configurations, go to Troubleshooting and select Logging and tracing.
f. Click Change Log Detail levels.
g. To make a static change to the configuration, click the Configuration tab. To change the configuration dynamically, click the Runtime tab. A list of well-known components, packages, and groups is displayed.
h. Select a component, package, or group to set a logging level. Figure 10-3 on page 94 shows the Administrative Console panel for changing log level details. The list of components, packages, and groups shows all the components that are currently registered to the running server.
93
i. Click Apply, then OK. See 19.4, The Java Logging API on page 227, for more detailed information. After setting the trace, try to recreate the problem, stop the trace (or reduce the level), and analyze the trace that was produced for hints relating to the problem. 13.Analyze traces. Go to the end of the file to see the last event that was recorded in the trace. The last entry always has the last method that was in memory (when an error occurred). The method name, its class, and its package name tell you the owner or provider of the package and are usually descriptive enough to hint at the root cause of the problem. Also look for messages that are correlated to the HTTP header or client request/response problems of your application. Refer to Trace Analyzer for WebSphere Application Server on page 261, or use various other tools that are described in Chapter 21, Diagnostic tools for WebSphere for z/OS on page 253, for the analysis. 14.Have you found a keyword? Have you found any abnormal activity in the trace, or any token or keyword pointing to an error or problem in the log that might be worth exploring? If no, go to step 15. If yes, proceed with step 9. 15.Check for loop. No response from an application can be caused by a loop in the application code. Program loops can also result in a component hanging, usually followed by a timeout error. In some cases, things seem to work fine, but some tasks in the address space keep consuming system resources without producing a result. You can see this when CPU usage is high but nothing seems to justify it. A dump or trace might be necessary because the log information is not sufficient for determining the cause of the loop or the unusual high resource consumption.
The indications of a loop are:
- You receive a repetitive message from a module waiting for work although nothing is being done, such as:
  ExtendedMessage: <component> waiting for next server work
- You receive a repetitive message that a module is active and you are able to follow the executed address ranges, but the only thing changing is the time stamp.
- You receive a repetitive message from a module processing work or requests but the thread ID stays the same.

Notice the ThreadId and FunctionName in Example 10-1. They might stay the same, but the trace header line with the time stamp changes if a loop is occurring. There might be several other messages between the repetitions. The shorter the loop cycle, the more likely you are to recognize the loop.
Example 10-1 Looping thread
Trace: 2005/08/19 21:23:41.232 01 t=7D19C0 c=UNK key=P8 (13007002)
  ThreadId: 0000006d
  FunctionName: com.ibm.etools.validation.validationbuilder
  SourceId: com.ibm.etools.validation.validationbuilder.UserStateRegistry
  ExtendedMessage: closeUser - found UserPrefs: UserPreferences: nodeName:nd6552, serverName: ws6552, userId: waspd2, refreshRate

If you suspect an application loop or hang:
- You can use IPCS to format a trace and analyze it for recurring PSW addresses. Using the system trace, begin at the bottom of the file with the most recent entries. Look for any recurring or repetitive patterns in the system trace entries with the same PSW addresses. Browse these addresses in dump storage using IPCS option 1. Scroll up from the address in storage and locate any eye catchers that identify module names. It might be difficult to determine the module names and relate them to Java methods. Use the TCB to run a traceback that might help identify what code the TCB is running. Use the command:
  ip verbx ledata 'tcb(009C31C8) nthreads(*) asid(00fb)'
  See 20.1.2, Viewing CTRACE and JRas data through IPCS on page 242.
- You can use the com.ibm.jvm.svc.dump.Dump utility to identify:
  - The thread under which a loop is occurring
  - The threads contending for resources or involved in a lockout
  - A thread waiting for some operation that is external to the server
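The loop indication described above (the same ThreadId and FunctionName repeating while only the time stamp changes) can be checked mechanically. The following is a minimal sketch, not a WebSphere tool: the class name TraceLoopDetector, the threshold parameter, and the assumption that each trace entry has already been joined into one string are ours.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper (not part of WebSphere): flags repeated ThreadId/FunctionName pairs. */
final class TraceLoopDetector {
    // Field labels as they appear in Example 10-1.
    private static final Pattern THREAD = Pattern.compile("ThreadId:\\s*(\\S+)");
    private static final Pattern FUNCTION = Pattern.compile("FunctionName:\\s*(\\S+)");

    /**
     * Returns "threadId functionName" keys that occur at least 'threshold' times.
     * Each element of 'entries' is assumed to hold one complete trace entry.
     */
    static List<String> suspects(List<String> entries, int threshold) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String entry : entries) {
            Matcher thread = THREAD.matcher(entry);
            Matcher function = FUNCTION.matcher(entry);
            if (thread.find() && function.find()) {
                // The time stamp is deliberately ignored: it changes on every repetition.
                counts.merge(thread.group(1) + " " + function.group(1), 1, Integer::sum);
            }
        }
        List<String> result = new ArrayList<>();
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            if (e.getValue() >= threshold) {
                result.add(e.getKey());
            }
        }
        return result;
    }
}
```

A pair that is reported here is only a candidate for a loop. A busy but healthy thread can also repeat the same function, so confirm the finding with the dump or system trace analysis described in this step.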
A common mistake that causes a loop is forgetting to increment the counter in a while loop. Example 10-2 is a sample Java code fragment for an infinite loop.
Example 10-2 Java code of an infinite loop
int ind = 0;
int[] temp = {1, 3, 5, 7};
while (ind < temp.length) {
    // other statements with no assignment to the variable "ind"
}

Because ind is never incremented, it never becomes greater than or equal to temp.length, and the loop never terminates. The correct way to code such a condition is shown in Example 10-3 on page 96.
int ind = 0;
int[] temp = {1, 3, 5, 7};
while (ind < temp.length) {
    // Other statements that do not assign anything to the variable "ind"
    ind++; // increment the counter
}

16. Is there a loop in the application code?
Have you found a loop in your application code? If yes, go to step 17. If no, proceed with step 18.

17. Fix application.
If you have found an application loop, take information about the thread and component that is involved in the loop to the application developer to fix it. Redeploy fixed code and restart the application server as described in steps 4 and 5.

18. Do you have access to back-end resources?
If your application accesses other resources, the connection to those resources, or the resources themselves, might experience some problems. Do you access other resources? If yes, go to step 19. If no, then proceed with step 20.

19. Go to No resource access.
If your application is accessing back-end resources, like a database, Java messaging service, or transaction service, check these resources. See Chapter 11, No resource access on page 99, for information about how to analyze this problem.

20. Locate SVCDump.
You might find more information or hints in the dump. Usually the name of the SVCDUMP data set is recorded in the syslog. If you are not sure whether a dump was taken or which data set the dump was written to, then search for the word dump in the syslog and locate any messages pertaining to the dump. If there was a problem taking the dump, an IEAxxx type message is issued such as:
  IEA911I PARTIAL DUMP ON MVS.O1MP.DMP00056 678
  IEA043I SVC DUMP REACHED MAXSPACE LIMIT - MAXSPACE=xxxx MEG
In that case, fix the dump problem before you attempt to analyze the dump because crucial information might not be written to the dump. Also ensure that your WebSphere for z/OS servers have the authority to create and write to the dump data sets.

21. Was a dump captured?
Was a dump captured? Were you able to locate the dump?
If no, then you should set a SLIP. Go to step 22. If you located the dump, then prepare to analyze the dump as described in step 24.

22. Set SLIP for SVCDump.
Dumps can be suppressed by the dump analysis and elimination process. When this is the case, use the SLIP SET command to set a SLIP to capture a dump when the symptom occurs. Example 10-4 on page 97 shows a SLIP used to capture a dump for an abend EC3 (a started task failure). It uses a wild card for the reason code so that any of the 0413000* abend reason codes that occur are allowed. The ASIDLST is for current, home, primary, and secondary address spaces, to include other address spaces in the dump if you are in cross memory with them at the time.
Example 10-4 Example for setting a SLIP
SLIP SET,A=SVCD,COMP=EC3,REASON=0413000x,ID=WEC3,MATCHLIM=20,
SDATA=(ALLNUC,CSA,GRSQ,LPA,LSQA,PSA,RGN,SQA,SUM,SWA,TRT),
ASIDLST=(0,H,I,P,S)

Refer to the z/OS MVS System Commands, SA22-7627-11 for a full description and syntax of the SLIP command. If you are unsure about the most appropriate SLIP, contact IBM support for assistance.

23. Reproduce or wait for reoccurrence.
With the SLIP set, try to reproduce the error. If you cannot reproduce the error, then wait for the problem to reoccur with the SLIP in place.

24. Analyze the SVCDUMP.
To analyze the SVCDUMP, invoke IPCS. Several methods can be used to analyze this symptom using IPCS and data from the SVCDUMP. We outlined one approach in Analyze the SVCDUMP on page 54 in Chapter 5, Abend on page 49.
Further information about IPCS can be found in:
- z/OS V1R6.0 MVS IPCS Commands, SA22-7594-05
- z/OS V1R2.0 MVS IPCS User's Guide, SA22-7596-01
Refer to 20.3, SVC dumps on page 247 for more information about how to use SVCDUMP for problem analysis in WebSphere for z/OS.

25. Search IBM support data.
If you found a keyword, a key phrase, or error message in the dump that indicates the cause of the problem, search the IBM support Web sites and databases for a solution, especially the WebSphere for z/OS support site at:
http://www-306.ibm.com/software/webservers/appserv/zos_os390/support/
From this site, you can click several links to access other support sites that are related to WebSphere Application Server, its components, and z/OS.
See Chapter 3, Information sources on page 25, which provides many valuable links and resources for solving problems in WebSphere for z/OS.
Tip: When searching problem databases for information or fixes, you might have to alter your search keywords to find a match.

26. Identified problem and solution?
Were you able to find the cause of the problem and a solution? If yes, then take the corrective action in step 27. If no, go to step 28.

27. Take corrective action.
If the problem is related to application code, or you suspect that the problem is related to the application and its access to resources, take the problem description, application logs, and traces produced to the application owner or developer. Clarify which resources should be accessed and whether application access complies with J2EE and J2EE Connector Architecture (JCA) standards. Check whether connection properties are defined as intended, and walk through the trace step-by-step to find an indication of what went wrong. Also check the application logs for messages and potential indicators, and analyze them together with the application developer to determine the exact point in the application where things went wrong.
For more information about resource access, JCA, pitfalls, hints and tips, and best practices, see the developerWorks Web site at:
http://www.ibm.com/developerworks
The information that you found using the IBM support data might have provided the following solutions:
- An existing APAR and PTF fix for your problem that is available for you to apply
- Other reports of your symptoms that include a procedure for fixing the problem
In such cases, follow the instructions that are provided or apply the information to your specific problem to solve it.
Document the problem and the fixes that you have applied in your system change documentation for your specific WebSphere for z/OS environment for later reference.

28. Assemble MustGather documentation.
For more information about MustGather, see MustGather on page 16. You can find MustGather documents by searching on the word mustgather on the support Web site:
http://www.ibm.com/software/webservers/appserv/zos_os390/support
Read the document MustGather: Read first for WebSphere Application Server for z/OS for help assembling the appropriate documentation. The minimum information necessary is:
- Problem description
  Include information related to when the problem first started to occur.
- Software version and maintenance (build) level
  You find the information in the job log of your application server. Search for build level to obtain a line similar to this:
  BBOM0007I CURRENT CB SERVICE LEVEL IS build level cf10515.05 release WAS601.ZNATV date 04/15/05 12:55:41.
- Operating system version and maintenance (PUT) level
- The job log of the abending address space (include both controller and servant region job logs)
- Any dumps or traces that are triggered by the problem or produced in the analysis

Tip: If you can, use the diagnostic tools that are mentioned in this chapter to identify the particular component or subcomponent that is responsible for the problem. In most cases, the components are in the application program code rather than product code from IBM. Present the component name together with the class and method name from the trace to your application development team or IBM (in the case of IBM components). This allows them to fix the code more quickly.

29. Contact IBM support.
If you need to contact IBM support, refer to Chapter 2, Contacting IBM: Information on page 13, for instructions. Provide the information outlined in the MustGather documentation step.
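When you collect the software level for several servers, the BBOM0007I line shown in the MustGather step can be picked apart programmatically. This is a minimal sketch under our own assumptions: the class name ServiceLevelParser is hypothetical, and the pattern is derived only from the single sample line in this chapter, so other releases might format the message differently.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper: extracts build level, release, and date from a BBOM0007I line. */
final class ServiceLevelParser {
    // Pattern inferred from the one sample line shown in this chapter.
    private static final Pattern LEVEL = Pattern.compile(
        "BBOM0007I.*build level\\s+(\\S+)\\s+release\\s+(\\S+)\\s+date\\s+"
        + "(\\d{2}/\\d{2}/\\d{2}\\s+\\d{2}:\\d{2}:\\d{2})");

    /** Returns {buildLevel, release, date}, or null if the line does not match. */
    static String[] parse(String jobLogLine) {
        Matcher m = LEVEL.matcher(jobLogLine);
        if (!m.find()) {
            return null;
        }
        return new String[] { m.group(1), m.group(2), m.group(3) };
    }
}
```

The sketch assumes that the message is on a single line. If the job log wraps the message across lines, join the pieces before calling parse.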
Chapter 11. No resource access
This chapter explains the no resource access symptom. The flow chart and step-by-step descriptions in this chapter can help you analyze the problem and find its cause. We also mention the analysis tools and refer to information sources that are related to this symptom.
[Flow chart for analyzing the no resource access symptom. Recoverable box labels: Analyze symptom; Resources defined?; Define resources; Scope correct?; Change scope; Test connection; Identified problem?; Locate SVCDump; Dump captured?; Reproduce or wait for reoccurrence; Analyze SVCDump. The numbered steps are described in the text that follows.]
ICH408I USER(WASPD2 ) GROUP(SYS1 ) NAME(USER1)
  /u/waspd2/.sh_history CL(FSOBJ ) FID(00000000000000004C03000000190000)
  INSUFFICIENT AUTHORITY TO OPEN
  ACCESS INTENT(RW-) ACCESS ALLOWED(GROUP ---)
  EFFECTIVE UID(0000003174) EFFECTIVE GID(0000000000)

3. Analyze exception or error message.
If you found a specific message that indicates the problem cause, pursue it and try to solve it. The authorization failure message in Example 11-1 can be resolved by requesting appropriate authorization from the security administrator so that the application can run with sufficient privileges. If you are not sure how to solve the problem, search for potential solutions on the WebSphere support page; see step 14.
If you found a message or exception that is not self-explanatory, refer to Chapter 4, Exceptions and error messages on page 41.

4. Check resource definition.
Check in the Administrative Console whether your resource is defined. Go to the Resources panel and check whether the list of resources covers the resource access requirements of the application.

5. Are the resources defined?
Did you find all the resources in the list? If no, proceed with step 6. If yes, then go to step 7.

6. Define resources.
You must define the application resources in the Administrative Console. For database access, follow these steps:
a. Go to Login > Resources > JDBC Providers > New. Define the database type, provider type, and implementation type properties as shown in Figure 11-2, which is a sample configuration of a new JDBC provider for DB2. Click Next.
b. Specify and confirm the location of the class path and native library path and click OK.
c. In the next panel, select your new DB2 JDBC Provider and click Data Sources, then New. Figure 11-3 shows a sample of the Data Source properties configuration.
For more detailed information about all available parameters and their use, refer to the Creating and configuring a JDBC provider and data source topic at the WebSphere for z/OS Information Center: http://publib.boulder.ibm.com/infocenter/wasinfo/v6r0/index.jsp Also check data source names in the resources.xml configuration file.
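To cross-check the data source names in resources.xml against the names that the application looks up, you can list every JNDI name that the file defines. The following is a minimal sketch: the class name JndiNameLister is hypothetical, and it assumes that JNDI names are stored in attributes called jndiName, which you should verify against your own resources.xml before relying on the result.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

/** Hypothetical helper: collects the values of all jndiName attributes in an XML document. */
final class JndiNameLister {
    /** 'xml' is the text of the configuration file; the attribute name is an assumption. */
    static List<String> jndiNames(String xml) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document doc = builder.parse(
                new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            NodeList all = doc.getElementsByTagName("*"); // every element, in document order
            List<String> names = new ArrayList<>();
            for (int i = 0; i < all.getLength(); i++) {
                Element element = (Element) all.item(i);
                if (element.hasAttribute("jndiName")) {
                    names.add(element.getAttribute("jndiName"));
                }
            }
            return names;
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse the configuration text", e);
        }
    }
}
```

Compare the returned list with the resource references that the application uses. A name that the application looks up but that is missing from the list points back to step 6.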
For messaging services access, if you want to use JMS, you can find more detailed information about all available parameters and their use by searching for configuring JMS resources at the WebSphere for z/OS Information Center.
For transaction server access, if you want to access the resources of a transaction server or use any other resource adapter, you can find more information about all available parameters and their use by searching for installing J2EE Connector resource adapters at the WebSphere for z/OS Information Center.

7. Is the scope correct?
Go to the Administrative Console and click Resources to check whether your resource scope is configured for Cell, Cluster, Node, or Server. Is it correct? If no, then change it; see step 8. If yes, go to step 9.

8. Change scope.
Change the scope for your resource access. Figure 11-4 shows how to change the scope of JDBC resources. For detailed information about all resources and their use, search for administrative console scope at the WebSphere for z/OS Information Center:
http://publib.boulder.ibm.com/infocenter/wasinfo/v6r0/index.jsp
9. Test connection.
Test the connection to verify that the application can access the data source. If you are using a JDBC provider, you can use the Administrative Console to test whether the connection is available, whether the application code can connect to the database, and whether the resource is currently available. To test the connection:
a. Go to the Administrative Console and select Resources > JDBC Providers > Additional Properties > Data Sources.
b. Select a data source.
c. Click Test Connection.
A message is displayed that reports the success or failure of the test. Figure 11-5 on page 104 shows the panel for testing the connection for JDBC.
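A comparable check can be sketched in application code with the standard JDBC API. This is an illustration of the idea, not the mechanism that the Administrative Console uses: the class name ConnectionTester and the layout of the returned text are our own.

```java
import java.sql.Connection;
import java.sql.SQLException;
import javax.sql.DataSource;

/** Hypothetical helper: reports whether a connection can be obtained from a data source. */
final class ConnectionTester {
    /** Returns "OK", or a text starting with "FAILED:" that carries the SQL details. */
    static String test(DataSource dataSource, int timeoutSeconds) {
        // try-with-resources returns the connection to the pool even if isValid fails
        try (Connection connection = dataSource.getConnection()) {
            return connection.isValid(timeoutSeconds)
                ? "OK"
                : "FAILED: connection is not valid";
        } catch (SQLException e) {
            // SQLState and vendor code are what a database administrator usually asks for
            return "FAILED: SQLState=" + e.getSQLState()
                + " code=" + e.getErrorCode() + " " + e.getMessage();
        }
    }
}
```

In a server, the DataSource would typically come from a JNDI lookup of the data source name that you defined in step 6. The SQLState and error code in a failure text are useful input for the discussion with the DB2 administrator in step 11.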
If you are using JMS or another JCA connector, you must check the connection in a different way because the Administrative Console does not provide a utility to test them. From the symptom you experience and the application logic, you might be able to conclude which specific resource should be accessed. Then check the syslog, job log, and error log for messages that indicate requests for resources and their responses, or that report resource access failures, such as permission denied.

10. JDBC connection test failed?
Did the JDBC connection test fail? If yes, proceed with step 11. If no, go to step 12.

11. Enable JDBC trace and contact the DB2 administrator.
If the JDBC connection test failed, enable the JDBC trace as described in 20.2, JDBC trace on page 244 and at the WebSphere for z/OS Information Center. The JDBC trace output goes to an HFS file that is specified in the JDBC properties file. JDBC trace information shows Java methods, database names, plan names, user names, or connection pools.
Contact the DB2 administrator to discuss what resources you intended to access, what permission is needed, and whether the DB2 logs show any messages indicating resource access problems. Try to solve the problem with the help of the administrator.

12. Contact resource administrators.
If your connection timed out, or you assume that there is a network problem, contact the network administrator. Check the TCP/IP setup, firewalls, and ports. Try to solve the problem with the help of the administrator. Refer to 22.1, TCP/IP related tools on page 276, for various tools that might help, such as TCP/IP network packet tracing with Ethereal.
If your application uses EIS resource adapters:
a. Contact the EIS administrator to verify that the subsystem is up and available because there is no direct way to test the connection to an EIS resource from the WebSphere Administrative Console.
b. Go to the Administrative Console and select Login > Resources > Resource Adapters.
c. Drill down to the resource name link as you would with the JDBC providers. Verify the configuration properties (given by the administrator or developer of the application) such as the spelling of resource names, the class path information for libraries, and security information.
d. IMS and CICS also produce their own traces and logs. These subsystems very likely run in their own LPARs. Contact the administrator to get the traces and logs that are required for further analysis or ask them for help.
e. Check with the security administrator to verify that the user who is configured to access the resource has the required permissions. Specific settings might prevent data access. See WebSphere Application Server for z/OS V5 and J2EE 1.3 Security Handbook, SG24-6086.

13. Change logging level.
Enable WebSphere for z/OS trace. If it is enabled, make sure that you have set it to a level that gives you enough information about connections and resource access attempts to research problems. To set the trace or change the level, follow these steps:
a. In the navigation pane, select Servers > Application Servers.
b. Click the name of the server that you want to set the trace for.
c. Under Troubleshooting, click Logging and tracing.
d. Click Change Log Detail levels.
e. Select a component, package, or group to set a logging level.
f. Click Apply, then OK.
Figure 11-6 shows the panel for setting the component, package, and group to a logging level.
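On the application side, the Java Logging API calls that such a logging level controls might look like the following minimal sketch. The logger name com.example.orders and the class OrderService are hypothetical, and the arithmetic merely stands in for real resource access. How the console level names map to java.util.logging levels is described in 19.4, so treat the levels used here as an assumption.

```java
import java.util.logging.Level;
import java.util.logging.Logger;

/** Hypothetical application class that uses the Java Logging API (java.util.logging). */
final class OrderService {
    // The logger name is an assumption; it is the name you would select as the package or group.
    private static final Logger LOG = Logger.getLogger("com.example.orders");

    static int lookup(int id) {
        LOG.entering("OrderService", "lookup", id);      // recorded at level FINER
        if (LOG.isLoggable(Level.FINE)) {                // avoid building the text when tracing is off
            LOG.fine("looking up order " + id);
        }
        int result = id * 2;                             // placeholder for the real resource access
        LOG.exiting("OrderService", "lookup", result);   // recorded at level FINER
        return result;
    }
}
```

With the logger at its default level, none of these calls produce output. After the level for com.example.orders is lowered to FINER or below, the entry, detail, and exit records show the execution path around the resource access.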
Note: A logging level set to all will produce large traces. Make sure that you have enough space allocated for a large trace and consider the overhead.
See 19.4, The Java Logging API on page 227 and the WebSphere for z/OS Information Center for more information about the Java Logging API.

14. Search IBM support data.
After setting the trace, try to recreate the problem, stop the trace (or reduce the level), and analyze the trace that is produced for hints relating to the problem. If you found an exception or error message that points to a resource access problem, search IBM support Web sites and databases for a solution, especially the WebSphere for z/OS support site at:
http://www-306.ibm.com/software/webservers/appserv/zos_os390/support/
From this site, you can access other support sites that are related to WebSphere Application Server, its components, and z/OS.
Also, Chapter 3, Information sources on page 25 provides many valuable links and resources for solving problems with WebSphere for z/OS.
Tip: When you search problem databases for information or fixes, you might have to alter your search keywords to find a match.

15. Have you identified the problem?
When you searched the trace and support pages, were you able to find the cause of the problem and a solution? If yes, then take the corrective action in step 16. If no, go to step 17.

16. Take corrective action.
If the problem is related to application code, or you suspect a problem that is related to the application and its access to resources, take the problem description, the application logs, and your traces to the application owner or developer. Clarify which resources should be accessed and whether the application access complies with J2EE and JCA standards. Check that connection properties are defined as intended, and walk through the trace step-by-step to find an indication of what went wrong. Also check the application logs for messages and potential indicators and analyze them with the application developer to determine the exact point in the application where things went wrong.
For more information about resource access, JCA, pitfalls, hints and tips, and best practices, see the developerWorks Web site at:
http://www.ibm.com/developerworks
The information that you found using the IBM support data might have provided the following solutions:
- An existing APAR and PTF fix for your problem that is available for you to apply
- Other reports of your symptoms that include a procedure for fixing the problem
In such cases, follow the instructions that are provided or apply the information to your specific problem to solve it.
Document the problem and the fixes that you have applied in your system change documentation for your specific WebSphere for z/OS environment for later reference.

17. Locate SVCDump.
You might find more information or hints about the failing resource access in the dump. Usually the name of the SVCDUMP data set is recorded in the syslog. If you are not sure whether a dump was taken or which data set the dump was written to, then search for the word dump in the syslog and locate any messages pertaining to the dump. If there was a problem taking the dump, an IEAxxx type message is issued. For example:
  IEA911I PARTIAL DUMP ON MVS.O1MP.DMP00056 678
  IEA043I SVC DUMP REACHED MAXSPACE LIMIT - MAXSPACE=xxxx MEG
In that case, fix the dump problem before you attempt to analyze the dump because crucial information might not be written to the dump. Also, ensure that your WebSphere for z/OS servers have the authority to create and write to the dump data sets.
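If you have a copy of the syslog as text, the search for dump-related messages described in the Locate SVCDump step can be sketched as follows. The class name DumpMessageScanner is hypothetical, and the problem check covers only the two sample messages quoted in this step, not the full set of IEA messages.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

/** Hypothetical helper: filters syslog lines that mention a dump. */
final class DumpMessageScanner {
    /** Returns every line that contains the word dump, ignoring case. */
    static List<String> mentionsDump(List<String> syslogLines) {
        List<String> hits = new ArrayList<>();
        for (String line : syslogLines) {
            if (line.toUpperCase(Locale.ROOT).contains("DUMP")) {
                hits.add(line);
            }
        }
        return hits;
    }

    /** True for the two problem indications quoted in this chapter (partial dump, MAXSPACE). */
    static boolean indicatesDumpProblem(String line) {
        return line.contains("IEA043I")
            || (line.contains("IEA911I") && line.contains("PARTIAL DUMP"));
    }
}
```

Lines for which indicatesDumpProblem returns true are the ones to resolve first, because a partial dump might lack the data that the later analysis steps need.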
18. Was a dump captured?
Was a dump captured? Were you able to locate the dump? If no, set a SLIP as described in step 19. If you located the dump, then prepare to analyze the dump as described in step 21.

19. Set SLIP for SVCDump.
Dumps can be suppressed by the dump analysis and elimination process. In this case, set a SLIP using the z/OS SLIP SET command to capture a dump when the symptom occurs. Example 11-2 shows a SLIP that was used to capture a dump for an abend EC3 (a started task failure). It uses a wild card for the reason code so that any of the 0413000* abend reason codes that occur are allowed. The ASIDLST is for current, home, primary, and secondary address spaces, and includes other address spaces in the dump in case you are in cross memory with them at the time.
Example 11-2 Example for setting a SLIP
SLIP SET,A=SVCD,COMP=EC3,REASON=0413000x,ID=WEC3,MATCHLIM=20,
SDATA=(ALLNUC,CSA,GRSQ,LPA,LSQA,PSA,RGN,SQA,SUM,SWA,TRT),
ASIDLST=(0,H,I,P,S)

Refer to z/OS MVS System Commands, SA22-7627-11, for a full description and syntax of the SLIP command. If you are unsure about the most appropriate SLIP, contact IBM support for assistance.

20. Reproduce or wait for reoccurrence.
With the SLIP set, try to reproduce the error. If you cannot reproduce the error, wait for the problem to reoccur with the SLIP in place.

21. Analyze the SVCDUMP.
To analyze the SVCDUMP, invoke IPCS. Several methods can be used to analyze a dump using IPCS. We outlined one approach in Analyze the SVCDUMP on page 54 in Chapter 5, Abend on page 49.
More information about IPCS is in:
- z/OS V1R6.0 MVS IPCS Commands, SA22-7594-05
- z/OS V1R2.0 MVS IPCS User's Guide, SA22-7596-01
Refer to 20.3, SVC dumps on page 247, for more information about how to use SVCDUMP for problem analysis in WebSphere for z/OS.
If you are unable to find a solution, prepare MustGather documentation and contact IBM as described in step 22.

22. Assemble MustGather documentation.
For more information about MustGather, see MustGather on page 16. One of the MustGather documents is MustGather: Read first for WebSphere Application Server for z/OS. Read it for help with assembling the appropriate documentation. The minimum information that is needed is:
- Problem description
  Include information that is related to when the problem first started to occur, whether it occurs only at certain times, and whether there have been any changes to the system such as maintenance or a new application.
- Version of WebSphere Application Server and build level
  Find this information in the job log of your application server. Search for build level to obtain a line that is similar to this:
  BBOM0007I CURRENT CB SERVICE LEVEL IS build level cf10515.05 release WAS601.ZNATV date 04/15/05 12:55:41.
- Operating system version and maintenance (PUT) level
- The job log of the application server in question (including both controller and servant region job logs)
- Any dumps or traces triggered by the problem
- The SVCDUMP triggered by the SLIP
See 2.3, Before you contact IBM support on page 15. Then proceed with step 23 to contact IBM Support.

23. Contact IBM support.
If you need to contact IBM support, refer to Chapter 2, Contacting IBM: Information on page 13, for instructions. Provide the information outlined in the MustGather documentation step.
Chapter 12. High CPU utilization
 Display  Filter  View  Print  Options  Help
------------------------------------------------------------------------------------
SDSF DA WC42     PAG 0  SIO 1347  CPU 32/ 32       LINE 1-20 (485)
COMMAND INPUT ===>                                 SCROLL ===> CSR
PREFIX=*  DEST=(ALL)  OWNER=*  SORT=CPU%/D  SYSNAME=
NP  JOBNAME  StepName ProcStep JobID    Owner    C Pos DP Real Paging    SIO  CPU%
    WS6422S  WS6422S  BBOSR    STC17374 WSCHTSK    IN  FD 4082   0.00 2925.2  9.20
    *MASTER*                   STC25439 +MASTER+   NS  FF 9097   0.00   0.54  9.70
    SYSVIEW  SYSVIEW  AOFAPPL  STC25503 NETVTASK   NS  FE 8868   0.00   1.63  2.96
    WLM      WLM      IEFPROC                      NS  FF 2298   0.00   0.00  2.61
    RMFGAT   RMFGAT   IEFPROC  STC25484 RMFTASK    NS  FE 2278   0.00   0.00  2.61
    WDSCH1BS WDSCH1BS BBOSR    STC17374 WDSCHTSK   IN  FD 4049   0.00 1334.6  2.61
    GRS      GRS                                   NS  FF 6290   0.00   0.00  2.26
    OM2CMS   OM2CMS   CMS      STC25619 OMITASK    NS  FE 2370   0.00   0.00  2.26
    U283614  IKJCEF   QATCCBLP TSU13860 U283614    IN  FB 1314   0.00   0.00  2.09
    JES2     AAIB     IEFPROC                      NS  FE 5141   0.00   0.00  1.74
    NET      NET      NET      STC25479 VTAMTASK   NS  FE 2051   0.00   0.00  3.22
    WDDEMN   WDDEMN   BBODAEMN STC17278 WDTASK     NS  FE  274   0.00   0.00  0.00
[Flow chart for analyzing the high CPU utilization symptom. Recoverable box labels: Capture console; Loop found?; Is it an application module?; Take corrective action. The numbered steps are described in the text that follows.]
DUMP COMM=(Descriptive name for this Webserver dump)
R rn,SDATA=(CSA,SQA,RGN,TRT,GRSQ,LPA,LSQA,SUM,NUC,PSA),CONT
R rn,JOBNAME=(OMVS,controlregionname,servantregionname),CONT
R rn,DSPNAME=('OMVS'.*),END

3. Format the dump system trace.
Using the IPCS utility, format the system trace table in the dump with this command:
  ip systrace time(local)
The system trace table shows the activity of the address space at the TCB level. The most recent events are at the bottom. Example 12-3 shows a sample system trace table. The output has been truncated on the right to fit the page.
Example 12-3 System Trace Table
--------------------------------------------------- SYSTEM TRACE TABLE -----------------------------------------
PR ASID WU-ADDR- IDENT CD/D PSW----- ADDRESS- UNIQUE-1 UNIQUE-2 UNIQUE-3 UNIQUE-4 UNIQUE-5 UNIQUE-6
03 00FB 009E69C0 DSP   470C0400 80FF6EE8 00000000 40000001 CE6C4B58
03 00FB 009E69C0 PR    ... 0 31B0AB86 00FF6EEE
03 00FB 009E69C0 PR    ... 0 0663EAAB 31CBBAB4
03 00FB 009E69C0 PC    ... 8 0663EAAB 01300
03 00FB 009E69C0 PC    ... 0 31B0AB86 0030D
03 00FB 009E69C0 SSRV  128 00000000 CE6C4B58 40000001 00000000 00000000
03 00FB 009C3030 DSP   470C0400 80FF6EE8 00000000 40000001 CE6C3888
03 00FB 009C3030 PR    ... 0 31B0AB86 00FF6EEE
03 00FB 009C3030 PR    ... 0 0663EAAB 31CBBAB4
03 00FB 009C3030 PC    ... 8 0663EAAB 01300
03 00FB 009C3030 PC    ... 0 31B0AB86 0030D
03 00FB 009C3030 SSRV  128 00000000 CE6C3888 40000001 00000000 00000000
03 00FB 0099F828 DSP   470C0400 80FF6EE8 00000000 40000001 CE6BC268
03 00FB 0099F828 PR    ... 0 31B0AB86 00FF6EEE
03 00FB 0099F828 PR    ... 0 0663EAAB 31CBBAB4
03 00FB 0099F828 PC    ... 8 0663EAAB 01300
03 00FB 0099F828 PC    ... 0 31B0AB86 0030D
03 00FB 0099F828 SSRV  128 00000000 CE6BC268 40000001 00000000 00000000

4. Check for loop.
Using the system trace, begin at the bottom of the file with the most recent entries. Look for any recurring or repetitive patterns in the system trace entries with the same PSW addresses. Review these addresses in dump storage with IPCS option 1. Scroll up from the address in storage and locate any eye catchers that identify module names. It might be difficult to determine the module names and relate them to Java methods. With the TCB, you can run a traceback that might help identify what code the TCB is running. Use this command:
  ip verbx ledata 'tcb(009C31C8) nthreads(*) asid(00fb)'
Example 12-4 shows a truncated version of the TCB traceback information and the methods used.
Example 12-4 TCB traceback and methods
TCB(009C31C8) NTHREADS(*) ASID(00FB)
Language Environment Product 04 V01 R6.00
To Display Additional Information: IP VERBX LEDATA 'CAA(5ACA4300)DSA(5ACA65B8) ALL'
Information for enclave main
Information for thread 322D057000000056
PCB Address: 31B0D080 TCB Address: 009C31C8
Registers and PSW:
GPR0..... 00000001 GPR1..... 7F3E3F90 GPR2.....
GPR4..... 00008000 GPR5..... 31B0D4F8 GPR6.....
GPR8..... 7F3E3F90 GPR9..... 00000000 GPR10....
GPR12.... 5ACA4300 GPR13.... 5ACA65B8 GPR14....
PSW..... 478D0400 80000000 00000000 085DC5B8
Traceback:
DSA Addr  PU Addr   PU Offset  Entry                               E Addr    E Offset
5ACA65B8  085DAC10  +000019A8  CEEOPCW                             085DAC10  +000019A8
5ACA6500  08325060  +00000080  pthread_cond_wait                   08325060  +00000080
5ACA6430  7C902068  +00000192  condWait                            7C902068  +00000192
5ACA6370  7C914700  +0000027E  sysMonitorEnter                     7C914700  +0000027E
5ACA62C0  7CD82DC0  +00000108  xmIsThreadInterrupted               7CD82DC0  +00000108
5ACA6208  7CAE7760  +000000F0  JVM_IsInterrupted                   7CAE7760  +000000F0
5ACA6130  34386444  +000000DE  java/lang/Thread.isInterrupted(Z)Z  34386444  +000000DE
5ACA6048  5F250754  +00000086  EDU/oswego/cs/dl/util/concurrent/W  5F250754  +00000086
5ACA5F48  624687BC  +000000CE  org/grnds/foundation/cache/GrndsCa  624687BC  +000000CE
5ACA5D08  6118BB94  +000001DA  org/grnds/foundation/cache/GrndsCa  6118BB94  +000001DA
5ACA5C30  7CD3A610  +00000534  mmipSelectInvokeJavaMethod          7CD3A610  +00000534
5ACA5B60  7CD3A090  +00000AB4  INVOKDMY                            7CD3A090  +00000AB4
5ACA5A90  7CD39550  +00000096  EXECJAVA                            7CD39550  +00000096
5ACA59E0  7CD5A608  +000000EC  mmipExecuteJava                     7CD5A608  +000000EC
7CD5A608 +000000EC xeRunDynamicMethod 7CD609C0 +000004D0 threadRT0 7CADBC08 +000000E0 xmExecuteThread
Chapter 12. High CPU utilization
CEEOPCMM
5. Have you found a loop?
Have you found a loop in your output? If no, proceed with step 6. If yes, go to step 10.

6. Look for TCB with high CPU utilization.
Use the system trace entries, starting from the bottom of the trace with the most recent entries, to look for the TCB using high CPU. Identify which TCBs have the most entries. Determine which TCBs are using more CPU time.

7. Have you found a specific TCB?
Did you find specific TCBs causing high CPU usage? If yes, then proceed with step 8. If no, IBM support has specific utilities and tools that they can run against the system trace to extract TCB and time statistics. Go to step 17 (to prepare for contacting IBM).

8. Find most used method or module in TCB.
In the system trace for the TCB, look at the column IDENT and focus on the DSP and SRB trace entries. A DSP trace entry represents dispatch of a task. An SRB trace entry represents the initial dispatch of a service request. Note the PSW addresses. Browse the PSW addresses of the DSP and SRB entries in dump storage using IPCS option 1. Locate the module eye catchers. Using the data that has been collected, determine which modules show the most activity.

9. Were the modules or a method found?
Were you able to identify specific modules and methods with the most activity (causing higher CPU utilization)? If yes, determine the owner, and proceed with step 10. If no, go to step 17 to prepare for contacting IBM support.

10. Determine module owner and method name.
From the name of the module or method, you should be able to identify the owner of the code. Specific prefixes indicate specific owners. IBM modules have a prefix that identifies them to a component (for example, BBO modules indicate WebSphere for z/OS code that is owned by IBM). These are documented in Chapter 1 in z/OS V1R4.0 MVS Diagnosis: Reference GA22-7588-03.
See A.2, System and component message table on page 315, to identify which other IBM products (such as z/OS components or subsystems) might have created your particular error message.

11. Is a module identified?
Have you identified the module? If yes, proceed with step 12. If no, go to step 14.

12. Is it an application module?
The module owner can help determine why this module uses more CPU than usual. Is the module code from an application or a non-IBM product? If yes, proceed with step 13. If no, go to step 14.
13. Contact application owner.
Contact the owner of the application and identify whether the code has any known problems. Identify documentation that they might require to investigate the problem. For example:
  Traces
  Response and throughput of JSP, servlets, and EJBs (use Tivoli Performance Viewer)
  Time stamps in application
Using the module and method names, consult the owner of the code.
Tip: Even if the module is not from IBM, you might find hints and tips related to other products and applications in context with WebSphere for z/OS when you search the WebSphere Information Center or the other sources outlined in Chapter 3, Information sources on page 25.

14. Search IBM support pages.
If the module is an IBM module or method, or you are not sure who owns the module, search the IBM support pages to determine the owner and check whether the code has known problems. To search the IBM support Web sites and databases, specifically the WebSphere for z/OS support site, go to:
http://www-306.ibm.com/software/webservers/appserv/zos_os390/support/
From this site, you can access other support sites that are related to WebSphere Application Server, its components, and z/OS. You can also consult the specific product manuals or search the IBM Software support Web site at:
http://www-950.ibm.com/search/SupportSearchWeb/SupportSearch?pageCode=SPS
See Chapter 3, Information sources on page 25, which provides many valuable links and resources for solving problems in WebSphere for z/OS.
Tip: When you are searching problem databases for information or fixes that are related to the module or method, consider the format of the search criteria. You might have to alter your search keyword to find a match.
If you have exhausted all resources and no apparent fix is found for your problem, proceed with step 15 to prepare to contact IBM.

15. Have you identified the problem and solution?
After searching the support pages, did you find the cause of the problem and a solution?
If yes, then take the corrective action; see step 16. If no, prepare to contact IBM support; see step 17.

16. Take corrective action.
The information that you found using the IBM support data might provide the following solutions:
  An existing APAR and PTF fix for your problem that is available for you to apply
  Other reports of your symptoms that include a procedure for fixing the problem
In such cases, follow the instructions that are provided or apply the information to your specific problem to solve it. Document the problem and the fixes that you have applied in your system change documentation for your specific WebSphere for z/OS environment for later reference.

17. Assemble MustGather documentation.
Read MustGather: Read first for WebSphere Application Server for z/OS, for help with assembling the appropriate documentation. Although it is for WebSphere for z/OS V5, the document Mustgather: High CPU causing Hang or Loop running V5 for z/OS might also help you collect the right documentation. For more information about MustGather, see MustGather on page 16. The minimum information necessary is:
  Problem description
  Include information that is related to when the problem first started to occur, whether it occurs only at certain times, or what changes have been applied, such as maintenance or a new application.
  Version of WebSphere Application Server and build level
  Find this information in the job log of your application server. Search for build level, to obtain a line similar to this:
  BBOM0007I CURRENT CB SERVICE LEVEL IS build level cf10515.05 release WAS601.ZNATV date 04/15/05 12:55:41.
  Operating system version and maintenance (PUT) level
  The job log of the application server in question (including both controller and servant region job logs)
  Any dumps or traces triggered by the problem
  Information obtained from performance reports
See 2.3, Before you contact IBM support on page 15. Then proceed with step 18 to contact IBM support.

18. Contact IBM support.
If you need to contact IBM support, refer to Chapter 2, Contacting IBM: Information on page 13, for instructions. Provide the information that is outlined in the previous step.
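The counting described in steps 4, 6, and 8 can be sketched in a few lines of Python. This is a minimal sketch, assuming the formatted system trace has been saved as text with one entry per line and whitespace-separated columns in the order shown in the SYSTEM TRACE TABLE header (PR, ASID, WU-ADDR, IDENT, ...). The function name and the parsing rules are illustrative assumptions; this is not an IBM-supplied tool.

```python
from collections import Counter


def summarize_trace(lines):
    """Count trace entries per work unit and dispatch addresses of DSP/SRB entries.

    Assumes whitespace-separated columns: PR ASID WU-ADDR IDENT ...
    (an assumption about how the trace text was saved).
    """
    entries_per_wu = Counter()
    dispatch_addresses = Counter()
    for line in lines:
        fields = line.split()
        if len(fields) < 4:
            continue
        wu_addr, ident = fields[2], fields[3]
        # Skip headers and separators: WU-ADDR must be 8 hex digits.
        if len(wu_addr) != 8:
            continue
        try:
            int(wu_addr, 16)
        except ValueError:
            continue
        entries_per_wu[wu_addr] += 1
        # For DSP and SRB entries, the sixth field sits under the ADDRESS- column.
        if ident in ("DSP", "SRB") and len(fields) >= 6:
            dispatch_addresses[fields[5]] += 1
    return entries_per_wu, dispatch_addresses


sample = [
    "PR ASID WU-ADDR- IDENT CD/D PSW----- ADDRESS-",
    "03 00FB 009E69C0 DSP 470C0400 80FF6EE8 00000000 40000001 CE6C4B58",
    "03 00FB 009E69C0 PC ... 8 0663EAAB 01300",
    "03 00FB 009C3030 DSP 470C0400 80FF6EE8 00000000 40000001 CE6C3888",
]
per_wu, addresses = summarize_trace(sample)
print(per_wu.most_common())     # work units with the most trace entries first
print(addresses.most_common())  # most frequently dispatched addresses first
```

Work units with the most entries are candidates for step 6, and the most frequent dispatch addresses are the ones to browse in dump storage for step 8. Entry counts are only a rough proxy for CPU time.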
Chapter 13.
13.1.2 Throughput
Throughput is a measure of the amount of work that goes through a system in a given time, typically expressed as the number of transactions per second. As with response time, throughput for WebSphere for z/OS is measured based on the work that is completed by the servant region.
13.1.3 Transaction
A strict definition of transaction is a logical unit of work. When one transfers money from one account to another, it must be removed from the first account and then added to the second. The transaction includes both of these processes and they must both complete successfully for the transaction to be considered complete. A mechanism is also required to ensure that if one of the processes fails, the other is either not attempted or is also undone. A Web browser user purchasing a book from a Web site might consider the complete process of selecting the book, entering payment and delivery details, and then finalizing the purchase as a single transaction.
WebSphere considers each incoming request as a transaction. Each of the requests to WebSphere generated by the customer as they go through the process of buying a book is treated as a separate transaction. Our discussion in this chapter uses the term transaction as viewed by z/OS Workload Manager and reported by RMF.
13.1.7 Resource
A resource is any item that can be used in the transaction process. This can be a physical resource (for example, CPU or memory) or a logical resource (for example, JDBC connection or a queue in WLM). When a WebSphere transaction accesses data in DB2 or CICS, it might also be convenient to refer to DB2 or CICS as a resource.
For a transaction to complete, it must be able to access all the resources it requires. For a transaction to perform well, there must be enough of these resources available and they need to be available quickly enough. How much is enough? How quick is quick? There is no firm answer. It depends on your business requirements.
[Figure 13-1 labels: WLM queues; WASHI; WASLO; Work Requests; DEFLT; Administrative Console; WLM Policy]

Administrative Console settings shown in the figure:
wlm_minimumSRCount=2
wlm_maximumSRCount=6
protocol_http_transactionClass=DEFLT
http_transport_class_mapping_file=ITSOTransDefinition.file

ITSOTransDefinition.file:
TransClassMap edgeplex.itso.ibm.com:*    /webap1/myservlet  WASHI
TransClassMap wtsc48oe.itso.ibm.com:*    /webap2/*          WASHI
TransClassMap haplex1.itso.ibm.com:7080  *                  WASHI
TransClassMap *:7070*                    /trade/*           WASDF
TransClassMap *                          /myservlet         WASLO
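To make the mapping file in Figure 13-1 easier to read, the following Python sketch walks a request through the five TransClassMap entries. It is an illustration only: it assumes that entries are checked in file order, that the first entry whose host:port and URI patterns both match wins, and that `*` behaves like a shell-style wildcard. These matching rules are assumptions for the sketch, not a statement of how the server evaluates the file; check the workload classification documentation for the actual behavior. The host `other.example.com` is hypothetical.

```python
from fnmatch import fnmatchcase

# Rules transcribed from ITSOTransDefinition.file in Figure 13-1:
# (host:port pattern, URI pattern, transaction class)
RULES = [
    ("edgeplex.itso.ibm.com:*", "/webap1/myservlet", "WASHI"),
    ("wtsc48oe.itso.ibm.com:*", "/webap2/*", "WASHI"),
    ("haplex1.itso.ibm.com:7080", "*", "WASHI"),
    ("*:7070*", "/trade/*", "WASDF"),
    ("*", "/myservlet", "WASLO"),
]
DEFAULT_CLASS = "DEFLT"  # value of protocol_http_transactionClass in the figure


def classify(host, port, uri, rules=RULES, default=DEFAULT_CLASS):
    """Return the transaction class of the first matching rule (assumed order)."""
    target = f"{host}:{port}"
    for host_pattern, uri_pattern, transaction_class in rules:
        if fnmatchcase(target, host_pattern) and fnmatchcase(uri, uri_pattern):
            return transaction_class
    return default  # no rule matched: fall back to the server default


print(classify("haplex1.itso.ibm.com", 7080, "/any/path"))
print(classify("other.example.com", 7070, "/trade/buy"))
print(classify("other.example.com", 80, "/unmapped"))
```

Under these assumptions, a request that matches no entry falls back to the default transaction class, which is why the default should map to a service class with a sensible goal.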
By default, the minimum number of regions for J2EE servers is one; there is no default maximum. You can override the maximum and minimum number of servant regions that WLM will start with three parameters in the Administrative Console:

  Minimum number of instances: wlm_minimumSRCount
  This parameter is used to start up a basic number of servant regions before the day's work arrives. This can reduce the time that is spent waiting for WLM to determine that more servant regions are needed. To keep work from coming in through the protocol handler before servant regions are ready, use:
  protocol_accept_http_work_after_min_srs=nn

  Maximum number of instances: wlm_maximumSRCount
  This parameter is used to cap the number of address spaces started by WLM if you determine that excessive servant regions might contribute to service degradation (for example, if real storage is limited).

  Multiple instances enabled
  This parameter is used to limit application server to one servant region. Even if minimum and maximum numbers of servant regions are defined as > 1, the ports will not be open. Ensure that you have selected the option to allow more servant regions.
Transactions that are received by the application server controller region are passed to servant regions through a set of WLM queues. The number of queues is determined by the number of service classes that are defined, and one servant region only serves one service class at a given time. To ensure that you do not limit the parallelism of execution under full load, wlm_maximumSRCount should be set so that, at minimum, it is as large as the number of service classes defined. A wlm_maximumSRCount setting that is too low creates a situation where fewer servers are available than WLM queues. The result might be a queue bottleneck under full load conditions, because WLM can be restricted from starting enough servant regions to handle the workload. As a consequence, the system might experience queuing delays in the WLM queues resulting in transactions with elongated response time or timeout errors. For more information about classifying z/OS workload, see the WebSphere Information Center Web site at: http://publib.boulder.ibm.com/infocenter/wasinfo/v6r0/index.jsp
3. Allow for increased concurrency. WebSphere for z/OS does not need threads as placeholders for work because it uses WLM queues. Plan for the number of in-and-ready threads to be 2 to 3 times the number of CPs. Remember that too many threads in a JVM create interference and more frequent garbage collection.
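The two sizing rules above (wlm_maximumSRCount at least as large as the number of service classes, and in-and-ready threads at 2 to 3 times the number of CPs) can be written down as a simple check. This is a minimal sketch; the function name, its parameters, and the wording of the findings are hypothetical and exist only for illustration.

```python
def check_servant_settings(max_sr_count, service_classes, online_cps, ready_threads):
    """Return a list of findings for the two rules of thumb in the text."""
    findings = []
    class_count = len(set(service_classes))
    # One servant region serves one service class at a time, so fewer servant
    # regions than service classes can leave a WLM queue without a server.
    if max_sr_count < class_count:
        findings.append(
            f"wlm_maximumSRCount={max_sr_count} is lower than the "
            f"{class_count} service classes defined"
        )
    # Rule of thumb from the text: 2 to 3 in-and-ready threads per CP.
    low, high = 2 * online_cps, 3 * online_cps
    if not low <= ready_threads <= high:
        findings.append(
            f"{ready_threads} in-and-ready threads is outside {low} to {high} "
            f"for {online_cps} CPs"
        )
    return findings


# Values loosely based on Figure 13-1 (wlm_maximumSRCount=6, three classes).
print(check_servant_settings(6, ["WASHI", "WASDF", "WASLO"], online_cps=2, ready_threads=5))
print(check_servant_settings(2, ["WASHI", "WASDF", "WASLO"], online_cps=2, ready_threads=12))
```

An empty list means that neither rule of thumb is violated; it does not mean that the configuration is optimal for your workload.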
 Subsystem-Type  Xref  Notes  Options  Help
 --------------------------------------------------------------------------
             Modify Rules for the Subsystem Type        Row 1 to 10 of 10
 Command ===> ________________________________________  SCROLL ===> PAGE

 Subsystem Type . : CB          Fold qualifier names?   Y  (Y or N)
 Description  . . . WebSphere App Server

 Action codes:  A=After   B=Before   C=Copy   D=Delete row   M=Move   R=Repeat
                I=Insert rule   IS=Insert Sub-rule
                                                              More ===>
          --------Qualifier--------           -------Class--------
 Action   Type   Name      Start              Service    Report
                                   DEFAULTS:  WASDF      OTHER
 ____  1  CN     FMISRV*   ___                ________   WASE
 ____  1  CN     FMESRV*   ___                ________   WASE
 ____  1  CN     OMESRV*   ___                WASLO      WASE
 ____  1  CN     OMTSRV*   ___                WASLO      WASE
 ____  1  CN     INTSRV*   ___                WASLO      WASE
 ____  1  CN     INESRV*   ___                WASLO      WASE
 ____  1  TN     WASLO     ___                WASLO      WASE
 ____  1  TN     WASDF     ___                WASDF      WASE
 ____  1  TN     WASHI     ___                WASHI      WASE
 ****************************** BOTTOM OF DATA *****************************
 F1=Help  F2=Split  F3=Exit  F4=Return  F7=Up  F8=Down  F9=Swap
 F10=Left  F11=Right  F12=Cancel
Figure 13-2 WLM definitions for servers and transaction classes in CB subsystem
You can assign a default transaction class for the server or server instance in the protocol_http_transactionClass or protocol_https_transactionClass environment variables (see Figure 13-1 on page 121). For more information, search for workload classification file at the WebSphere Information Center:
http://publib.boulder.ibm.com/infocenter/wasinfo/v6r0/index.jsp

You can use the virtual host name, port number, or URI template to map the HTTP request to a transaction class with a filtering file that is specified in the http_transport_class_mapping_file variable (see the mapping definitions at the bottom of Figure 13-1 on page 121).

The authors recommend that you define WebSphere transaction service classes using a percentage response time objective as illustrated in Figure 13-3 on page 124. The response time goal for the Service Class WASHI is: 90% of all transactions finish within 0.2 seconds.
                      Modify a Service Class               Row 1 to 2 of 2
 Command ===> ______________________________________________________________

 Service Class Name . . . . . : WASHI
 Description  . . . . . . . . . LSA510 WAS 200MS RT
 Workload Name  . . . . . . . . WAS        (name or ?)
 Base Resource Group  . . . . . ________   (name or ?)
 Cpu Critical . . . . . . . . . NO         (YES or NO)

 Specify BASE GOAL information.  Action Codes: I=Insert new period,
 E=Edit period, D=Delete period.

         ---Period---   ---------------------Goal---------------------
 Action  #  Duration    Imp.  Description
   __
   __    1              1     90% complete within 00:00:00.200
 ***************************** Bottom of data ******************************
 F1=Help  F2=Split  F3=Exit  F4=Return  F7=Up  F8=Down  F9=Swap
 F10=Menu Bar  F12=Cancel
A response time objective is usually consistent with the business requirement of a Web application. The response time value can be adjusted depending on the type of application. Tip: Avoid multi-period goals, because second and subsequent periods are not aggressively managed. Response time goals are better than velocity goals in a true production environment because velocity goals must be recalibrated with environmental changes, such as CPU and workload. This option automatically generates response time distribution information that is reported through an RMF report (see Response time distribution on page 292). This option is useful when you must troubleshoot response time issues. For other methods for assigning transaction classes to incoming work requests, see Performance Engineering & Tuning for WebSphere V5 and V6 on z/OS, PRS804, at: http://www.ibm.com/support/techdocs
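A percentage response time goal such as the one defined for WASHI (90% complete within 00:00:00.200) can be checked against measured response times with a short calculation. The following Python sketch is illustrative only: the function name is hypothetical, the sample values are invented, and it assumes that you have per-transaction response times in seconds from some measurement source, which RMF itself reports only as a distribution.

```python
def meets_percentile_goal(response_times, percent=90.0, goal_seconds=0.2):
    """Return True if at least `percent` of transactions finished within the goal."""
    if not response_times:
        raise ValueError("no response times supplied")
    within_goal = sum(1 for t in response_times if t <= goal_seconds)
    return 100.0 * within_goal / len(response_times) >= percent


# Hypothetical samples: nine fast transactions and one slow outlier.
samples = [0.05, 0.08, 0.11, 0.09, 0.15, 0.19, 0.07, 0.12, 0.10, 1.50]
print(meets_percentile_goal(samples))          # 9 of 10 within 0.2 s
print(meets_percentile_goal(samples, 95.0))    # a stricter 95% goal
```

The example shows why a percentage goal is more forgiving of occasional outliers than an average response time goal: one slow transaction in ten does not break a 90% goal, although it would dominate the average.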
(Figure 13-4). Again, the authors recommend defining a reporting class to isolate the activity into a specific workload report (for example, WAS2 for INESRV*).

 Subsystem-Type  Xref  Notes  Options  Help
 --------------------------------------------------------------------------
             Modify Rules for the Subsystem Type        Row 1 to 16 of 61
 Command ===> ____________________________________________  SCROLL ===> PAGE

 Subsystem Type . : STC         Fold qualifier names?   Y  (Y or N)
 Description  . . . Use Modify to enter YOUR rules

 Action codes:  A=After   B=Before   C=Copy   D=Delete row   M=Move   R=Repeat
                I=Insert rule   IS=Insert Sub-rule
                                                              More ===>
          --------Qualifier--------           -------Class--------
 Action   Type   Name      Start              Service    Report
                                   DEFAULTS:  SYSSTC     OTHER
 ____  1  TN     HWS710*   ___                IMSCTL     WASI
 ____  1  TN     FMESRVS*  ___                VEL80      WASS
 ____  1  TN     FMISRVS*  ___                VEL80      WASS
 ____  1  TN     INESRVS*  ___                VEL80      WASS
 ____  1  TN     WSESRVS*  ___                VEL80      WASS
 ____  1  TN     OMESRVS*  ___                VEL80      WASS
 ____  1  TN     HAO*      ___                ________   WAS
 ____  1  TN     FMESRV*   ___                VEL85      WAS1
 ____  1  TN     INESRV*   ___                VEL85      WAS2
 ____  1  TN     OMESRV*   ___                VEL85      WAS3
 F1=Help  F2=Split  F3=Exit  F4=Return  F7=Up  F8=Down  F9=Swap
 F10=Left  F11=Right  F12=Cancel
To distribute HTTP requests evenly across servant regions:
1. For the desired servers in the Environment variables in the Administrative Console, specify:
   wlm_stateful_session_placement_on=1
2. Optimize the minimum and maximum number of servant regions.
3. Consider eliminating transaction class mapping and minimize the number of different service classes for these servers.

To determine whether the classification scheme that was implemented is classifying work as expected, use the z/OS operator command to display WLM classification of work requests:
F <server>,DISPLAY,WORK,CLINFO
See 18.2, z/OS MODIFY commands on page 196 for more information about z/OS commands.
Although WebSphere for z/OS is J2EE compliant, WebSphere for z/OS applications do not necessarily behave in the same way as those for WebSphere on distributed platforms because the underlying system functions are used differently.

  Business requirements
  Unless business requirements are backed by data that confirms that they can be met in your environment, they might be little more than a statement of intent.

  Load testing
  Using workload simulation tools, such as WebSphere Studio Workload Simulator (see 22.5, Stress test tools on page 300), you can evaluate how an application will behave in your environment as long as you can recreate a testing environment that matches the projected production environment.

  Capacity planning
  Although there is very little information published on this topic, your IBM representative or authorized Business Partner has access to Technical Support to do pre-sales sizing estimates for your WebSphere for z/OS applications.

Note: Keep in mind that, although it is important to set expectations, some of the methods might lead to unreliable and unrealistic expectations.
A common cause of performance problems is having several address spaces or threads (or tasks) compete for the same resource. This could be a hardware resource or a serially usable software resource. Most problems revolve around unacceptably high response times or resource usage. However, the definition of unacceptably high varies from one installation to another. For z/OS, the peaks and troughs of other workloads in the same system image impact the WebSphere environment and vice versa. The business might have to prioritize other workloads on the image at some point in time, such as year-end batch processing, even though this can be detrimental to the performance of WebSphere for z/OS applications. Note: The tools and techniques in this chapter can help you identify where your resources are being consumed and why and where your application is experiencing delays. They cannot tell you whether such answers are applicable to your situation. Ultimately, this is a business decision.
recommendations in the IBM HTTP Server manual, available for download under the Features and Elements list of your particular z/OS version, at:
http://www-03.ibm.com/servers/eserver/zseries/zos/bkserv/

On Demand Business is a fast-moving world. Consequently, there is a seemingly never-ending supply of maintenance. Keeping current with maintenance is more important than for more traditional workloads and often brings improved performance. Check for WebSphere performance information in APARs. The latest information can be found at:
http://www-1.ibm.com/support/docview.wss?rs=180&uid=swg27006970

Attention: Never attempt to tune an application that runs smoothly and without performance problems. Otherwise, you might limit resources, and that can lead to performance problems. Always tune your applications in test systems first to analyze the impact and consequences. Only tune one variable at a time and document the exact environment, the tuned variable, and the impact that was experienced for later reference.

Fix the problem, if you can.
Generally, a performance problem is the result of the workload not getting the physical or logical resources that are necessary for it to complete in a timely manner, so the solution is to make more resources available to the application. You can do this by:

  Buying more
  If there is no other means for making your application performance meet your expectations, add more resources. At times, this might also be more cost effective than recoding a badly written application. Beware of induced costs.

  Stealing it
  Take it from a less important application. The price that you pay is lower service to the application from which the resources are stolen.

  Using less
  For example, fixing badly written code, revising poorly performing SQL queries, or adding appropriate indexes to improve fetches from databases might involve labor costs, but these costs might still be lower than buying new hardware.

Live with the problem.
If none of the above options is technically or financially possible, you must change your expectations. At least you know why the performance is not meeting your previous expectations. Your users might be disappointed by the answer, but at least you can give them facts and convince them that the situation is understood and under control. Changing the perception might be an important factor in user satisfaction.
environment (Figure 13-5) when you are analyzing potential problem areas for your application performance. The correction that you apply as you try to remove the bottleneck might not produce the improvements that you are looking for, or might just move you from one problem to another further down the line.
[Figure 13-5 labels: zSeries Processor Model; LPAR Definitions; LPAR Processing Weight; WebSphere Application Response Time; Network Sprayer Response Time; SYSPLEX; J2EE Container; Web Container; JCA; JDBC; CICS, IMS, MQ; DIST; DB2; WLM Management; Number of Users, User Think Time, Page Rate, Response Time]
[Figure: LPAR CPU% (vertical axis, 0 to 40) plotted against Time (clock) (horizontal axis, 10 to 60)]
For intervals 35, 40, and 45 it is highly probable that the partition is CPU-constrained to its guaranteed share because of activity in other logical partitions. Although not an anomaly, this is something that you should remember for the rest of the analysis.

Remember that your LPAR CPU share is relative to the sum of the weights of all partitions. As a consequence, your guaranteed share is reevaluated for every change in the logical configuration:
  Every time a logical partition is activated or deactivated
  Every time operations update the processing weights
  Dynamically if your logical partition participates in an LPAR cluster
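Because the guaranteed share is the partition weight divided by the sum of the weights of all active partitions, it changes whenever a partition is activated or deactivated. The following Python sketch shows the arithmetic. The weights and the partition names other than SC48 are hypothetical, and the calculation is a simplification that ignores dedicated processors, capping, and LPAR cluster management.

```python
def guaranteed_shares(weights):
    """Map each active partition to its guaranteed share in percent.

    `weights` maps partition name to processing weight for the partitions
    that are currently active and share the same physical CPs.
    """
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("the sum of the weights must be positive")
    return {name: 100.0 * weight / total for name, weight in weights.items()}


# Hypothetical configuration with three active partitions.
active = {"SC48": 300, "LPARB": 500, "LPARC": 200}
print(guaranteed_shares(active)["SC48"])   # share while all three are active

# Deactivating LPARC raises the share of the remaining partitions.
del active["LPARC"]
print(guaranteed_shares(active)["SC48"])
```

In this example the share of SC48 rises from 30% to 37.5% without any change to its own weight, which is why the share has to be reconsidered after every change in the logical configuration.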
The Partition Data Report alone cannot tell you whether this is good, bad, or normal, but you can use it to determine whether these factors meet your expectations.

CPU queue
In the distributed world, running CPU above 50% is unusual. In zSeries, running CPU at 90% or more is not necessarily an indication of a problem and is, indeed, common. Even 100% is not necessarily a problem; in this case, you should investigate further to evaluate how much queuing it causes. The CPU Activity Report can help you. Check the Queuing Report in the CPU Report. If the queue length substantially exceeds three times the number of CPs online in the configuration, one workload might have a CPU delay problem. Although it might not be a performance problem and might only affect a non-priority batch workload, you should remember it for a later step.

Paging activity
Check the system paging level in the RMF Summary Report. The RMF Summary Report indicates demand paging rate for the whole system. As with CPU, high paging is not necessarily a problem, but high paging might lead to a CPU penalty and response time problems. If system paging is indicated, then check if paging occurs at the servant region level. Check the STORAGE and PAGING sections in the Workload Report for the servant regions. Make sure that you check the address space, not the enclave, because the enclave shows zero values for PAGING and STORAGE.
[Figure 13-7: graphs [1] and [2], each showing CP APPL %, CP % Busy, and RT 90%]
In both examples, the workload CP APPL% grows almost linearly with the transaction rate. This is to be expected when no problem is present. Although visibility might vary with the length of the measurement interval, it is very likely that, in the case of a performance issue, CP usage and throughput do not correlate linearly.

Graph 1 in Figure 13-7 illustrates a situation with no problems. The system behaves as expected in the observed range, even though the amount of CPU resources used might not meet your expectations. The 90th percentile response time remains sub-second until the workload CP usage reaches 90%, where there is an important increase. This is normal.

Graph 2 illustrates a typical throughput problem. The bend of the response time curve appears long before the APPL% CP usage reaches 90%. More investigation is required to determine the cause of the problem:
  Check the WLM definitions. Other workloads that are running on the sysplex and competing for CP resources might take precedence over WebSphere applications. If this is by mistake, change the WLM settings. If this is desired, WLM is enforcing business priorities as defined and it is no longer an issue that has a technical solution.
  Check other resources for constraints. If the workload increases, it might be another z/OS-managed physical resource (I/O or storage) or a logical resource that is a constraint. If it is a logical resource, it might be in the WebSphere infrastructure, in another z/OS component, another subsystem (DB2, CICS, IMS), or in the application itself.

Figure 13-8 on page 134 illustrates another practical example of a WebSphere application workload that is running dedicated in a single z/OS image. It was run on partition SC48 with two online CPs (hence the 200% on the CP% busy axis) on a zSeries model 1C8.
[Figure 13-8: RT avg, RT 90%, CP APPL %, and CP % Busy (vertical axis 0 to 200) plotted at horizontal-axis values 3.47, 13.87, 25.73, 26.42, 29.96, 32.88, 34.02, 36.25, 37.58]
The CPU usage from APPL% plots linearly with the throughput. We used a linear regression with a 0.99 R-square for the example. If the workload APPL% does not plot in a linear fashion, then it is usually an indication of a performance problem.

The response time does not plot linearly. It slowly grows up to a point where it jumps significantly. The bend of the curve indicates the scalability limit of the workload given the current logical and physical configuration. The bend of the response time curve appears just above 32 transactions per second while the APPL% is approximately 110% and the total LPAR CPU% is 140%.

From the graph in Figure 13-8, you can deduce that:
  There is a response time problem above 32 transactions per second.
  It is not related to CPU usage of the WebSphere Application Server.
  It is not related to a CPU queuing problem at the server level because the server has not reached the LPAR guaranteed share.
  A memory problem is not likely.

Another indicator that can prove useful is the average CP usage per transaction. It can be expressed in various units. The authors used the number of milliseconds of CP per transaction. In normal circumstances the average CP millisecond per transaction should be nearly constant across the throughput range, as shown in Figure 13-9 on page 135. A significant variation is usually an indication of a performance problem.
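Both indicators discussed here, the linearity of APPL% against throughput and the CP milliseconds per transaction, are simple to compute from measurement intervals. The following Python sketch is a minimal illustration. The sample data points are hypothetical, and the conversion assumes that an APPL% of 100 corresponds to one CP fully busy, that is, 1000 ms of CP time per elapsed second.

```python
def linear_fit(xs, ys):
    """Least-squares line through (xs, ys); returns slope, intercept, R-square."""
    n = len(xs)
    if n < 2 or n != len(ys):
        raise ValueError("need two or more paired observations")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    if sxx == 0 or syy == 0:
        raise ValueError("observations must vary in both dimensions")
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r_squared = (sxy * sxy) / (sxx * syy)
    return slope, intercept, r_squared


def cp_ms_per_transaction(appl_percent, transactions_per_second):
    """CP milliseconds consumed per transaction (assumes APPL% 100 = one CP)."""
    return appl_percent * 10.0 / transactions_per_second


# Hypothetical intervals: throughput (transactions/s) and workload APPL%.
throughput = [10.0, 20.0, 30.0]
appl = [35.0, 70.0, 105.0]
slope, intercept, r2 = linear_fit(throughput, appl)
print(slope, intercept, r2)

# The point quoted in the text: APPL% of about 110 at 32 transactions/s.
print(cp_ms_per_transaction(110.0, 32.0))
```

An R-square close to 1 and a roughly constant value of CP milliseconds per transaction across intervals are consistent with the healthy behavior described above; a drift in either is a prompt for further investigation, not a diagnosis.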
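[Figure 13-9: average CP milliseconds per transaction across the throughput range; recovered axes: CP % Busy (0 to 200), second vertical axis (0 to 40), horizontal axis (0 to 40)]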
You can quantify the CP per transaction using three methods. The only important point is that the interpretation of the numbers must be kept consistent with the method chosen:
  The workload CP APPL%, that is, only the workload that is reported in the enclave. This workload includes application CP time in WebSphere and in any subsystem (DB2, MQ, IMS, or CICS) called on behalf of the transaction.
  The workload CP APPL% plus the WebSphere server address spaces. The time then reflects variations when additional servers are started or stopped because of the Application Environment or when servers are recycled. It also shows the time that is incurred because of the Garbage Collector.
  The total CP time, that is, the workload CP APPL% plus the WebSphere server address spaces plus the apportioned uncaptured CP time. Although this is the gross value preferred for cost calculations, it might not be the best one to use for performance analysis.

4. Depending on your conclusion about where the performance problem might be, you can use one of the three options to analyze and solve the problem:
  If CPU consumption in the server address space (not the application environment) is higher than expected, you are probably experiencing a memory leak or heap problem. Refer to 13.4.4, Analyzing a heap or memory problem on page 136.
  If response time becomes significantly worse (getting to the bend of the curve) as you apply more load without using available CPU, then you are probably experiencing a delay problem. Refer to 13.4.5, Analyzing a response time problem on page 136.
  If CPU utilization seems higher than expected for the current transaction rate, you are probably experiencing CPU problems. Refer to 13.4.6, Analyzing a high CPU usage problem on page 137.
5. Test hypotheses, one by one, by gathering additional information.
6. Be prepared to repeat this process until the identified problem has been resolved.

For more information about analyzing increased response times of WebSphere for z/OS applications, or if you suspect the delay is in the back-end subsystems, see Monitoring WebSphere Application Performance on z/OS, SG24-6825.
Part 3
[Figure labels: Problem categorization; Phase 1 Setting up the runtime; Phase 2 Deploying an application; Phase 3 Running an application; Phase 4 Runtime system problems]
These phases are:
  Phase 1: Installation, configuration, and migration (see Chapter 14 on page 141)
  Phase 2: Application deployment (see Chapter 15 on page 153)
  Phase 3: Testing and running applications (see Chapter 16 on page 165)
  Phase 4: System environment (production) (see Chapter 17 on page 177)
We give a general overview of the problem areas in the individual phases, explain how to analyze them, and provide valuable hints and tips about how to avoid problems.
Chapter 14.
[Figure labels: TCP/IP; Security; DNS Configuration]
This chapter describes various methods for preventing common problems during the installation and migration processes of WebSphere for z/OS. We give hints and tips for the coexistence of WebSphere for z/OS V6 with previous releases and list common problems and their solutions. We also mention means and tools that can help you solve problems in this phase.
The following checklist can help you with your specific setup:
- The recommended primary size allocation for the HFS is 250 cylinders (3390); the recommended secondary allocation is 100 cylinders (3390).
- If possible, set up your HFS so that the root HFS is shared by all processors and so that the deployment manager configuration is in an HFS configuration on a system-generic mount point.
- Understand the HFSs for the application servers, the nodes, the daemons, and the cells.
- It might be necessary to resize your system dump data sets because of the size of WebSphere address spaces. Where possible, evaluate the use of dynamic dump data sets.
- If you are running in a sysplex, set up your TCP/IP with Sysplex Distributor to make use of dynamic virtual IP addresses (DVIPAs).
- If ARM is enabled, you might want to disable it for the WebSphere Application Server address spaces during installation and customization to avoid unnecessary restarts of the address spaces. After installation and customization are complete, consider re-enabling ARM.
Search for Ensuring problem avoidance at the Information Center for additional information about USS/HFS configurations, System Modification Program Extended (SMP/E) tasks, ISPF dialogs, TCP/IP configurations, and security information.
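To relate the recommended HFS allocations to file system capacity, the cylinder counts can be converted to bytes. The following is a rough sketch, assuming 3390 geometry of 15 tracks per cylinder and 12 blocks of 4 KB per track. These geometry constants and the function names are assumptions made for illustration and do not come from this book, and the usable space in a real HFS is somewhat lower because of file system overhead.

```python
# Assumed 3390 geometry for 4 KB blocks (not stated in this book).
TRACKS_PER_CYLINDER = 15
BLOCKS_PER_TRACK = 12
BLOCK_SIZE_BYTES = 4096


def cylinders_to_bytes(cylinders):
    """Approximate capacity of the given number of 3390 cylinders."""
    return cylinders * TRACKS_PER_CYLINDER * BLOCKS_PER_TRACK * BLOCK_SIZE_BYTES


def allocated_bytes(primary_cyl, secondary_cyl, secondary_extents):
    """Capacity after the primary extent plus N secondary extents."""
    return cylinders_to_bytes(primary_cyl + secondary_cyl * secondary_extents)


# Recommended allocation from the checklist: 250 primary, 100 secondary.
primary_mib = cylinders_to_bytes(250) / (1024 * 1024)
after_two_extents_mib = allocated_bytes(250, 100, 2) / (1024 * 1024)
```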
- Initial customization of WebSphere Application Server for z/OS V6.0.1 requires that an installation be at a minimum Service Level of 6.0.1.2 (PTF UQ04304). Check for the latest maintenance requirements at the following Web site: http://www-306.ibm.com/software/webservers/appserv/zos_os390/support/
- Check the product PSP bucket WASAS601 subset H28W601 to verify that all the suggested maintenance has been applied.
- Make sure that the product code HFSs are mounted in the directories that you chose in the planning session.
- Your installation might limit (control) the specification of REGION=, usually through the JES2 EXIT06 exit or the JES3 IATUX03 exit. If so, relax this restriction for the WebSphere for z/OS JCL procedures.
- Navigating the configuration HFS with a UID of 0 can alter files or their ownership and permission attributes, making them inaccessible to the WebSphere for z/OS runtime servers and administrators. It is better to use the WebSphere for z/OS administrator user ID.
- Always run the installation jobs from the same system where WebSphere for z/OS is being installed. Use the JOBPARM card below the JOB card to avoid running the jobs from different systems. The syntax for the JOBPARM is /*JOBPARM SYSAFF=SXX where SXX is the system name.
- When using DB2 for z/OS, the messaging engine cannot dynamically create the data store tables. You must create these tables manually using the DDL statements produced by the sibDDLgenerator command. You can find instructions at: http://publib.boulder.ibm.com/infocenter/ws60help/index.jsp?topic=/com.ibm.websphere.pmc.zseries.doc\ref\rjm0630_.html
  You can redirect the output of the command to a file so that you can submit it to DB2 for z/OS later. One way to do this is to use SPUFI, but you must first copy the file to an FB80 data set.
  As the final step, you must clear the Create tables box (for every messaging engine that uses DB2 for z/OS) from the data store panel of the Administrative Console.
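Before copying the redirected sibDDLgenerator output to an FB80 data set, it can help to confirm that no line is longer than the record length, because longer lines would otherwise be truncated or split. The following is a minimal sketch that pads each line to a fixed 80-character record and rejects lines that are too long. The function name and the max_text_cols parameter are hypothetical; if your SPUFI defaults read only part of each record, pass a smaller limit.

```python
from typing import List


def to_fixed_records(text: str, lrecl: int = 80,
                     max_text_cols: int = 80) -> List[str]:
    """Pad every line of text to lrecl characters.

    Raises ValueError for any line that would not fit, so that long DDL
    statements can be wrapped by hand before the file is uploaded.
    """
    records = []
    for number, line in enumerate(text.splitlines(), start=1):
        # Drop trailing blanks and replace tabs, which have no meaning
        # in a fixed-length record.
        line = line.rstrip().expandtabs(4)
        if len(line) > max_text_cols:
            raise ValueError(
                f"line {number} is {len(line)} columns, "
                f"limit is {max_text_cols}")
        records.append(line.ljust(lrecl))
    return records
```

For example, calling to_fixed_records on the contents of the generated DDL file either returns records that are all exactly 80 characters long or reports the first line that needs to be wrapped.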
14.3 Migration
When you migrate WebSphere Application Server products, you change the existing environment and applications so that they are compatible with the current product version. To understand the migration process, go to Migrating, coexisting, and interoperating, at the Information Center: http://publib.boulder.ibm.com/infocenter/wasinfo/v6r0/index.jsp Or, see WebSphere Application Server V6 for z/OS, Migrating, coexisting, and interoperating, SA23-2207, which is available from the WebSphere for z/OS library Web site: http://www-306.ibm.com/software/webservers/appserv/was/library
The migration utilities in WebSphere for z/OS 6.0.x support migration from V5.x. Search for Migrating product configurations at the Information Center and use it as a starting point for planning information, customization dialogs, and V5.x to V6.0.x migration explanations for stand-alone application server nodes, deployment managers, and federated nodes. Migration from V5.x to V6.0.1 is the same as that from V5.0 to V5.1 at the highest level. You copy the existing HFS configuration, transform it to V6.0.x, and write it to a new HFS. Prior to migration, the old configuration is renamed so that if, for any reason, the migration fails, users should be able to go back to their previous configuration. See Migrating from WebSphere for z/OS V5.x to V6 - An Example Migration, WP100559, which is available from: http://www-03.ibm.com/support/techdocs/atsmastr.nsf/WebIndex/WP100559
- If you are running other versions of WebSphere or z/OS, watch out for potential location service daemon port collisions or LPA issues that are caused by different versions in one system.
- Review the ports that have been defined to ensure that the WebSphere for z/OS V6 installation does not conflict with existing port definitions for previous WebSphere for z/OS releases. In particular, when you are installing V6 to coexist with V4.0.1 or V5.x, note that the default daemon port definition for all versions (V6, V5, and V4) is the same.
- The authors recommend setting up the V6.x installation with the STEPLIB if V6.x must coexist with any prior releases. Because of naming conflicts, V5 and V6 product code cannot be in LPA at the same time. To support coexistence:
  - Place the V6 SBBOLPA data set in the STEPLIB of the V6 daemon.
  - If a prior level of SBBOLOAD is in LPA, add a V6 STEPLIB for SBBOLOAD.
- BBORTSS5 must be in LPA to make CTRACE work. If V5.0x is already installed and running in the system, then BBORTSS5 must be in LPA already. To check whether BBORTSS5 is in LPA, use D PROG,LPA,MOD=BBORTSS5 from the SDSF, syslog, or console. This does not cause any coexistence issues, because the DLL name is different from V4.0.1 and the module is the same for V5 and V6.
- Only one set of PPT entries can be active at one time for a given program. WebSphere for z/OS V4.0.1 and V5 both use the BBOCTL program as a controller region. If you are running V4.0.1 and V5 on the same system, their BBOCTL programs share the same PPT entry. Prior to V4.0.1, Service Level W401610, the PPT attribute SYST was required for the BBOCTL program; it is not required after that service level. Therefore, including the SYST keyword in the PPT entry for the V5 BBOCTL program causes an informational message (IEF188I PROBLEM PROGRAM ATTRIBUTES ASSIGNED) to appear when you start a V5 server (V5 does not require the SYST attribute). This message does not affect the functionality of the WebSphere address spaces.
  If you do not want this message to appear, and your V4.0.1 Service Level is at least W401610, you can delete SYST from the SCHEDxx member to stop the message from being generated. If you are not at this service level, you must leave SYST in the PPT for BBOCTL to start the V4.0.1 server. IEF188I is issued when a V5 server is started as long as BBOCTL is defined as a system task.
- The authors recommend using the WLM dynamic application environment when you are configuring V6 so that a specific server name can be used by V6 and by a server on V5.
- A return code of 0 from a migration job does not by itself indicate success. Be sure to review the .err and .out logs carefully for diagnostic information.
- A migration cannot be restarted after the process is started. If something fails during the process, you must start again from the beginning.
- Always run the jobs from the same system where the node being migrated is located. Use the JOBPARM card below the JOB card to avoid running the jobs from different systems. The syntax for the JOBPARM is /*JOBPARM SYSAFF=SXX where SXX is the system name.
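Because a zero return code is not enough, a first pass over the .err and .out logs can be automated. The following is a minimal sketch that lists WebSphere-style message identifiers with an error severity suffix. It assumes the identifier shape seen in messages quoted elsewhere in this book (four or five letters, four digits, and a trailing E, as in WSVR0009E). The function name is hypothetical, and such a scan supplements a careful reading of the logs rather than replacing it.

```python
import re
from typing import List, Tuple

# Assumed identifier shape: 4 or 5 letters, 4 digits, severity E.
ERROR_ID = re.compile(r"\b[A-Z]{4,5}\d{4}E\b")


def find_error_ids(log_text: str) -> List[Tuple[int, str]]:
    """Return (line number, message ID) pairs for error-level messages."""
    hits = []
    for number, line in enumerate(log_text.splitlines(), start=1):
        for message_id in ERROR_ID.findall(line):
            hits.append((number, message_id))
    return hits
```

A log with no hits can still describe a failed migration, for example through warnings or Java exceptions, so treat an empty result only as a starting point.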
14.4 Coexistence
WebSphere for z/OS V6 can coexist with any of the prior WebSphere releases on the same LPAR. V6.0.x can coexist with V5.0.x or higher in the same cell with a few known limitations, such as:
- The Deployment Manager must be at the highest release level in the cell that has mixed releases.
- If you have multiple V5.0.x nodes on the same LPAR, they all should be migrated simultaneously because Version 5.0.x nodes cannot coexist with Version 6.0.x nodes on the same LPAR. This restriction does not apply for V5.1.x and V6.0.x nodes.
- The V6.0.x Deployment Manager can manage Version 5.x nodes.
- In a coexistence situation, the V5.x configuration tree cannot be modified until Service Level W6012XX or higher of the V6.0.x driver. At this level, the restriction is lifted but modification can be done to existing nodes and servers.
Important: There must only be one daemon per cell in an LPAR.
The following restrictions apply when multiple releases or versions are in the same LPAR:
- Cells cannot have the same short names.
- Only one version of the code can exist in LPA/LNKLST on the same LPAR; the others must be included in the STEPLIB.
For successful coexistence, ensure that:
- The load modules are in LPA for one system, and the load modules are in STEPLIB for the other system.
- The ports are unique between the two systems.
- The daemon_group_name values are unique between the two systems. A duplicate daemon_group_name is a known cause of the ABEND EC3 with reason code 02060018.
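The uniqueness rules above (cell short names, ports, and daemon_group_name values) lend themselves to a simple pre-installation check. The following is a minimal sketch, assuming that you have collected these values by hand from the customization output of each installation. The Installation class, its field names, and every sample value are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Installation:
    label: str
    cell_short_name: str
    daemon_group_name: str
    # Maps a descriptive name to a port number, e.g. {"daemon": 5655}.
    ports: Dict[str, int] = field(default_factory=dict)


def coexistence_conflicts(a: Installation, b: Installation) -> List[str]:
    """List the coexistence rules that the two installations violate."""
    problems = []
    if a.cell_short_name == b.cell_short_name:
        problems.append(f"cell short name {a.cell_short_name} is used twice")
    if a.daemon_group_name == b.daemon_group_name:
        problems.append(
            f"daemon_group_name {a.daemon_group_name} is used twice")
    # A port must not be defined by both installations.
    shared = set(a.ports.values()) & set(b.ports.values())
    for port in sorted(shared):
        problems.append(f"port {port} is defined by both installations")
    return problems
```

An empty list means that none of these three rules is violated; it says nothing about LPA and STEPLIB placement, which must still be checked separately.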
Problem: Message IGW01513T appears during the SMP/E receive of WebSphere for z/OS and OS/390. This message is produced by the utility IEBCOPY while copying files to TLIB (target library).
Solution: This error is the result of the output record format of the PDSE being forced to Fixed Blocked (FB) when it should be Undefined. This happens if you have a DFSMS ACS routine that controls the allocation of the PDSE data sets and forces the PDSE record format to FB. To resolve the problem, change the DFSMS ACS routine so that PDSEs are created with: RECFM=U
Tools used: SYSLOG and job log from SMP receive.

Problem: After building your WebSphere for z/OS V5 environment, you need to change the WebSphere configuration root directory.
Solution: Specifying the WebSphere configuration root directory is part of the setup process. You then run a series of configuration jobs to build the WebSphere for z/OS V5 environment. After building your V5 environment, there is no simple way to change this root directory. The only option is to rebuild the V5 environment from scratch, specifying the new configuration root directory in the ISPF panels and rerunning all of the configuration batch jobs. If you have made significant changes to the V5 environment, such as defining several clusters, creating various definitions, and installing a number of applications, you must redo all of this work, which can be a lengthy process. The value of the configuration root directory is stored in several places, so it would have to be changed in several files, and some definitions could easily be missed. A procedure that addresses this problem can be found on the WebSphere for z/OS support page. Search for configuration root directory.
Tools used: None.

Problem: You receive this error message in the console:
SECJ4046E: Duplicate login configuration name system.wssecurity.IDAssertion. Will over write.
Duplicate login configuration name system.wssecurity.Signature. Will over write.
Solution: Remove all duplicate entries in wsjaas.conf, and then restart WebSphere for z/OS. If the node is part of a WebSphere cell, you might need to remove duplicate entries in all four locations where wsjaas.conf is stored:
- install_root/bin/wsinstance/propdefaults
- DeploymentManager/bin/wsinstance/propdefaults
- install_root/properties
- DeploymentManager/properties
Tools used: None.

Problem: You do not know how to check the version and history information of your WebSphere Application Server for z/OS V5 environment.
Solution: Look in the SystemOut.log file for the specific installation: for the base, Install_Root/logs/nodeagent/SystemOut.log, and for the Deployment Manager, Install_Root/logs/dmgr/SystemOut.log. Alternatively, you can run the versionInfo command from the /bin directory of the specific installation, for example:
Install_Root/bin>versionInfo.sh
Problem: When WebSphere V5.0.2 for z/OS is running, starting up a new V5.1 application server causes the Control Region to abend with SEC3 in the BBOSSACE module.
Solution: Further review of the configuration setting in the job logs for both V5.0.2 and V5.1.0 shows that the names of the daemon group (cell name) are identical:
daemon_group_name: PDCELL
This is a coexistence issue. Rename daemon_group_name for V5.1.0 to a different name.
Tools used: Job log.

Problem: When you are generating installation jobs, you receive the error message BBOMNINS stating that:
BBOMNINS: BBOIPCSP does not exist
Or:
BBOSCHED does not exist.
The instructions for the creation of a Deployment Manager node referred to two optional members, BBOSCHED and BBOIPCSCP. However, they are not being generated for use. Also, the START command for the node agent in the managed node generates instructions that reference the wrong node name.
Solution: Update the Customization Dialog to generate the missing jobs, and correct the START command. APAR PK07293 is associated with Service Level (Fix Pack) 6.0.2.1 (Build Level cf10533.10) of WebSphere Application Server V6.0.1 for z/OS.
Tools used: FTP tool.

Problem: The release number no longer appears in the job output, and the following message appears in the log:
BBOM0007I CURRENT CB SERVICE LEVEL IS build level cf10515.05
Solution: The cf1 stands for Cumulative Fix 1. You will see a cf2 in the future when a series of WebSphere for z/OS APARs is rolled into Cumulative Fix 2. Therefore, cf10515.05 stands for Cumulative Fix 1 with a service date in the fifteenth week of 2005 (0515); the .05 is the day of the week that the PTF was cut. The .05 is for internal use and it could be any value from .01 to .05. The next set of PTFs could be cf10521.xx because we expect a PTF to be available about once every six to eight weeks. The cf1 does not change with each PTF release. Only a major series of PTFs (a level set across the WebSphere family of products across platforms) would result in the cf1 changing to cf2. So, you can anticipate cf10521.xx or cf10523.xx for the next series of fixes. Although it is more difficult, you still must keep track of WebSphere maintenance levels. See the APAR/PTF table for WebSphere Application Server V6.0.1 for z/OS at:
http://www-306.ibm.com/software/webservers/appserv/zos_os390/support/
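The build level layout described in this solution can be decoded mechanically when you track maintenance levels across many servers. The following is a minimal sketch that follows that description (cf, a cumulative fix number, a two-digit year, a two-digit week, and a two-digit suffix). The function name is hypothetical, and mapping the two-digit year to 20yy is an assumption.

```python
import re

# cf<fix><yy><ww>.<suffix>, for example cf10515.05
BUILD_LEVEL = re.compile(r"cf(\d+)(\d{2})(\d{2})\.(\d{2})")


def parse_build_level(level: str) -> dict:
    """Split a build level such as cf10515.05 into its parts."""
    match = BUILD_LEVEL.fullmatch(level.strip())
    if match is None:
        raise ValueError(f"unrecognized build level: {level!r}")
    fix, year, week, suffix = match.groups()
    return {
        "cumulative_fix": int(fix),
        "year": 2000 + int(year),   # assumption: years are 20yy
        "week": int(week),
        "suffix": suffix,           # internal use; kept as text
    }
```

For example, parse_build_level("cf10515.05") yields cumulative fix 1, year 2005, and week 15, which matches the reading given in the solution.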
Tools used: FTP tool.

Problem: After migrating the deployment manager cell from WebSphere Application Server for z/OS Version 5.0.2 at service level W502030 to Version 6.0.1.2, the Deployment Manager does not start, although the migration job ran successfully. The error message is:
BBOO0220E: WSVR0009E: Error occurred during startup META-INF/ws-server-components.xml com.ibm.ws.runtime.WsServerImpl com.ibm.ws.runtime.WsServerImpl
Solution: If trace is not enabled, enable it first and analyze the trace. In this case, the messages in Example 14-1 were issued and used to analyze and solve the problem.
BBOO0222I: SECJ0240I: Security service initialization completed successfully
BBOO0222I: PROX0000I: z/OS Web Router v6 deployment recognized with configuration root=<null>.
BBOO0221W: SECJ0288E: Error during security initialization.
BBOO0222I: HMGR0206I: The Coordinator is an Active Coordinator for core group DefaultCoreGroup.
BBOO0220E: HMGR0024W: An error was encountered while looking up the IP address for the host name of a core group member. The host name is GENERIC and the server name is SYSPROG\SYSPROG\dmgr. The member will be excluded from the core group. com.ibm.ws.hamanager.coordinator.dcs.CoreStackMembershipManager com.ibm.ws.hamanager.coordinator.dcs.CoreStackMembershipManager
BBOO0220E: HMGR0002E: HA Manager services on this process were not started. This server is not a member of a core group. com.ibm.ws.hamanager.coordinator.dcs.CoreStackMembershipManager com.ibm.ws.hamanager.coordinator.dcs.CoreStackMembershipManager
BBOO0222I: CWRCB0103I: The core group bridge service has stopped.
BBOO0220E: WSVR0009E: Error occurred during startup. META-INF/ws-server-components.xml com.ibm.ws.runtime.WsServerImpl com.ibm.ws.runtime.WsServerImpl
...
Caused by: com.ibm.wsspi.hamanager.HAException: Local Member SYSPROG\SYSPROG\dmgr is not a member of the core group
at com.ibm.ws.hamanager.coordinator.dcs.CoreStackMembershipManager.<init>(CoreStackMembershipManager.java:129)
at com.ibm.ws.hamanager.coordinator.impl.DCSPluginImpl.<init>(DCSPluginImpl.java:207)
... 13 more
com.ibm.ws.runtime.WsServerImpl com.ibm.ws.runtime.WsServerImpl
BossLog: { 0023} 2005/06/19 23:28:22.241 01 SYSTEM=XCSF SERVER=PPT5MGR PID=0X05010055 TID=0X18AA0410 00000000 c=UNK ./bbolsys.cpp+839 ... BBOO0157E JVM EXIT API DRIVEN. JVM EXITING WITH CODE=-1

An error was encountered during the IP address lookup for the host name of a core group member. The host name is GENERIC and the server name is SYSPROG\SYSPROG\dmgr. The member is excluded from the core group. One of the environment variables in the BBOWMG3D migration job is dcsHost=GENERIC. After PTFs UK04303 and UK04304, the second panel of 3. Server customization (in the migration process) has no field for entering the High Availability Manager host name. Therefore, the BBOWDMG3 job is generated with the dcsHost=GENERIC value. Edit the dcsHost variable in the job to state the correct host name and rerun that job. Start the Deployment Manager again.
Tools used: Trace Analyzer for WebSphere Application Server.

Problem: During the configuration of WebSphere Application Server V6 for z/OS and an attempt to access the Network Deployment cell, an error message appears (in BBORBLOG):
SRVE0017W: A WebGroup/Virtual Host to handle Not found. has not been defined.
Solution: This symptom is common when the BBOWWPFD job was not executed correctly. Check whether the return code was zero (RC=0). If there is a TIME option in the JCL for the BBOWWPFD job, the task can exceed the CPU runtime limit. Remove this option and submit the job again.
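To check a customization job such as BBOWWPFD for a TIME option before resubmitting it, you can scan a local copy of the JCL. The following is a minimal sketch that reports TIME= parameters on JCL statements and skips comment statements. The function name is hypothetical, and the scan is purely textual, so it does not interpret continuation rules or procedures that the job calls.

```python
import re
from typing import List, Tuple

# TIME=value or TIME=(minutes,seconds); the lookbehind avoids matching
# longer keywords that merely end in TIME, such as RUNTIME=.
TIME_PARAM = re.compile(r"(?<![A-Z0-9@#$])TIME=(\([^)]*\)|[^,\s]+)")


def find_time_parameters(jcl_text: str) -> List[Tuple[int, str]]:
    """Return (line number, parameter) for each TIME= on a JCL statement."""
    hits = []
    for number, line in enumerate(jcl_text.splitlines(), start=1):
        # Only lines starting with // are JCL statements; //* is a comment.
        if not line.startswith("//") or line.startswith("//*"):
            continue
        match = TIME_PARAM.search(line)
        if match:
            hits.append((number, match.group(0)))
    return hits
```

Each reported line is a candidate for removal or adjustment, as described in the solution above.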
Problem: After migrating an application from a WebSphere for z/OS V5 cluster to version 6, the application started successfully. You were prompted for a password and authorized successfully but the browser returned an HTTP 404 error.
Solution: Several attempts to migrate the application failed. After analyzing the problem systematically using the flow charts from Part 2, Problem symptoms and their resolutions on page 39, we came to the conclusion that it is not a migration or an application server problem. The error was related to using security functionality with IBM Tivoli Access Manager for z/OS. We changed the Tivoli Access Manager configuration and applied the latest maintenance for WebSphere for z/OS to include the latest Java for z/OS updates to fix the problem.
Tools used: Flow charts, Tivoli Access Manager, latest WebSphere for z/OS maintenance release.

Problem: When logging out of the Administrative Console, you often receive an HTTP Error 404:
Error 404 An error occurred while processing request: /ibm/console/ibm/console/logon.jsp
Message: SRVE0200E: Servlet [_ibmjsp.ibm.console._logon]: Could not find required servlet class - _ibmjsp.ibm.console._logon
Solution: This is a known problem and relates to versions before 6.0.2. It is addressed in APAR PK07829, which is shipped with version 6.0.2. Apply the maintenance to fix this problem.
Tools used: APAR PK07829, which is shipped with version 6.0.2.
Chapter 15.
To install and deploy the application files, you can use the wsadmin tool for production environments and unattended operations or command-line tools to start and stop application servers, check server status, add or remove nodes, and complete similar tasks. WebSphere for z/OS also supports a Java programming interface for developing administrative programs. All of the administrative tools that are supplied with the product are written according to the API, which is based on the industry standard Java Management Extensions (JMX) specification.
Investigate these tools with the Java APIs to determine the best ways to administer WebSphere for z/OS and your applications. For information about the Java APIs, see Java Management Extensions (JMX) API documentation at the Information Center, which outlines the following procedure for taking advantage of these tools:
a. Create a custom Java administrative client program using the Java administrative APIs. This topic describes how to develop a Java program that uses the WebSphere Application Server administrative APIs to access the administrative system of WebSphere for z/OS.
b. Extend the WebSphere for z/OS administrative system with custom MBeans. This topic describes how to extend the WebSphere for z/OS administration system by supplying and registering new JMX MBeans in one of the application server processes. In this case, you can use the administrative classes and methods to add newly managed objects to the administrative system.
c. Deploy and manage a custom Java administrative client program for use with multiple J2EE application servers. This topic describes how to connect to a J2EE server and how to manage multiple vendor servers.
d. Manage applications through programming. This topic describes how to use Java MBean programming to install, update, and delete a J2EE application in WebSphere for z/OS.

Java programs that define a J2EE DeploymentManager object in accordance with J2EE Deployment API Specification (JSR-88)
JSR-88 defines a contract between a tool provider and a platform that allows tools from multiple vendors to configure, deploy, and manage applications on any J2EE product platform. The tool provider typically supplies software tools and an integrated development environment (IDE) for developing and assembling J2EE application modules. The J2EE platform provides application management functions that deploy, undeploy, start, stop, and otherwise manage J2EE applications.
WebSphere for z/OS is a J2EE 1.4 specification-compliant platform that implements the JSR-88 APIs. See the Information Center topic Installing J2EE modules with JSR-88 for more information.
- Specify the directory where the application EAR file is to be installed. In a network deployment configuration, by default the application is installed in the APP_INSTALL_ROOT/network_cell_name directory. In a base configuration, it is installed in the APP_INSTALL_ROOT/base_cell_name directory.
- Choose Deploy enterprise beans, if:
  - The EAR file was assembled with an assembly tool such as Rational Application Developer, and the EJBDeploy tool was not run during assembly.
  - The EAR file was not assembled with an assembly tool.
  - The EAR file was assembled using versions of the Application Assembly Tool (AAT) previous to Version 5.
  This option allows the EJBDeploy tool to run during application installation and generates code that is required to run EJB files.
  Note: Choosing this option might cause the installation program to run for several minutes.
- Ensure that the application name is unique in a cell and does not contain characters that are not allowed in object names.
- Select Deploy WebServices if the EAR file has modules that are using Web services and has not previously had the wsdeploy tool run on it. The wsdeploy tool then can run during installation of the application and can generate the code that is required to run applications that use Web services.
An example is:
asadmin set domain.node-agent.node0.property.INSTANCE-SYNC-JVM-OPTIONS=-Xmx32m -Xss2m
The node agent is node0 and the JVM options are -Xmx32m -Xss2m.
For more information about JVM options, see the Web site at:
http://java.sun.com/docs/hotspot/VMOptions.html
Important: Restart the node agent after changing the INSTANCE_SYNC_JVM_OPTIONS property because the node agent is not automatically synchronized when a property is added or changed in its configuration.
Tools used: Administrative console message log, SYSPRINT and job log of the server, and WebSphere for z/OS error log.

Problem: The plug-in of the HTTP Server is unable to recognize the availability of another server because one server is down.
Solution: To avoid this problem, tune the WebSphere HTTP plug-in configuration parameters to fit your environment so that users experience fewer delays and the failover performance of the WebSphere environment improves.
Tools used: HTTP Plug-in config files, error logs.

Problem: When a large application starts, your application server hangs and then shuts down. You get an ABEND SEC3.
Solution: You are experiencing a timeout in the HTTP transport. The recommended way to solve this is to find the reason for the timeout by analyzing the SVC dump. See Analyze the SVCDUMP. on page 54 and SVC dumps on page 247 for more information. You might have to increase or even disable the deployment manager timeout variables to circumvent the problem for a while or to enable the SVC dump.
Tools used: Administrative Console message log, system log, SYSPRINT and job log of the server, and WebSphere for z/OS error log.

Problem: You are unable to install a large EAR file and the following message appears in the WebSphere for z/OS error log:
BBOO0271E HTTP REQUEST EXCEEDED 10485760 BYTE INPUT BUFFER
Solution: Increase the following variables by selecting Environment > Manage WebSphere Variables:
protocol_http_large_data_inbound_buffer = 20485700 (or some other large number)
protocol_http_large_data_response_buffer = 20485700 (or some other large number)
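If you prefer to derive the new buffer value from the size of the EAR file instead of guessing a large number, a small helper can do the arithmetic. The following is a rough sketch, assuming that a reasonable value is the larger of twice the limit reported in the BBOO0271E message and the EAR file size plus some headroom for HTTP request overhead. The function name, the doubling rule, and the 25 percent headroom are assumptions made for illustration, not product guidance.

```python
import math
import re

BUFFER_MESSAGE = re.compile(
    r"BBOO0271E HTTP REQUEST EXCEEDED (\d+) BYTE INPUT BUFFER")


def suggest_buffer_size(message: str, ear_size_bytes: int,
                        headroom: float = 1.25) -> int:
    """Suggest a value for the large-data buffer variables, in bytes."""
    match = BUFFER_MESSAGE.search(message)
    if match is None:
        raise ValueError("not a BBOO0271E buffer message")
    current_limit = int(match.group(1))
    # Assumption: leave room for request overhead on top of the file.
    needed = math.ceil(ear_size_bytes * headroom)
    return max(current_limit * 2, needed)
```

The result is only a starting point; if the message reappears with the larger limit, increase the value again.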
After setting these variables, recycle the DMGR and install the EAR file. If you receive the same message with the larger byte input buffer that is referenced in the error message, increase the number in the variables again.
Tools used: Administrative Console message log, SYSPRINT and job log of the server, and WebSphere for z/OS error log.

Problem: Session data integrity is lost when concurrent access to a session is made in different Web modules.
Solution: This problem can occur when two Web modules are installed on different servers. If this is the case, the applications might share session attributes between Web modules using distributed sessions, but session data integrity is lost. Also, the use of some session management features such as TIME_BASED_WRITES is severely restricted. To eliminate these problems, install the Web modules in an enterprise application on one server so that they can share session attributes, as follows:
i. Start the assembly tool.
ii. In the assembly tool, right-click the application (EAR file) that you want to share and select Open With > Deployment Descriptor Editor.
iii. In the application deployment descriptor editor of the assembly tool, select Shared session context under WebSphere Extensions. Make sure that the class definitions of attributes that are put in the session are available to all Web modules in the enterprise application. The shared session context does not fully meet the requirements of the specifications.
iv. Save the application (EAR) file. In the assembly tool, after you close the application deployment descriptor editor, confirm that you want to save the changes that you made to the application.
Tools used: WebSphere Administrative Console for z/OS, Assembly tool.

Problem: When you deploy an application using wsadmin, the plugin-cfg.xml is not updated.
Solution: Install the application with a target of -cluster. After installation and before the application is saved, use $AdminApp edit to add the additional mapping to the Web server. After the application is saved, plugin-cfg.xml is regenerated.
Tools used: wsadmin.

Problem: Documentation for Rollout Update for cluster deployment is not clear for use in scripting with wsadmin.sh.
Solution: The function for Rollout Update can be found under AdminTask for scripting with wsadmin. Refer to Commands for the AdminTask object in the Information Center at: http://publib.boulder.ibm.com/infocenter/wasinfo/v6r0/index.jsp
The updateAppOnCluster command can be used to synchronize nodes and restart cluster members for an application update that is deployed to a cluster. After an application update, this command can be used to synchronize the nodes without stopping all the cluster members on all the nodes at one time. This command synchronizes one node at a time by stopping the cluster members to which the application is targeted, by performing a node synchronization operation, and by restarting the cluster members. This command might take more time than the default connector timeout period, depending on the number of nodes that the target cluster spans. Be sure to set proper timeout values in the soap.client.props file when a SOAP connector is used and in the sas.client.props file when an RMI connector is used. This command is not supported in local mode.
Tools used: wsadmin, Information Center.

Problem: The pre-compile JSP phase failed during deployment of WebSphere Application Server V6 for z/OS. Example 15-1 shows the trace output.
Example 15-1 Trace output
Trace: 2005/07/19 15:08:35.410 01 t=8CE828 c=UNK key=P8 (0000000A)
Description: Log Boss/390 Error
from filename: ./bborjtr.cpp
at line: 932
error message: Compile complete for /jsp/
Errors compiling jsps in /WebSphere/V6R0M0/DeploymentManager/profiles/default/wstemp/app/ext/applic.war
Return code from jsp-compilation is: 1
Exception in jsp compile: com.ibm.websphere.management.exception.AdminException: ADMA0021E: An error occurred in compiling JavaServer Pages (JSP) files - applic.war (rc=1)
ADMA6012I: Exception in run com.ibm.websphere.management.exception.AdminException: ADMA0021E: An error occurred in compiling JavaServer Pages (JSP) files - applic.war (rc=1)
Exception: com.ibm.websphere.management.exception.AdminException: ADMA0021E: An error occurred in compiling JavaServer Pages (JSP) files - applic.war (rc=1)
com.ibm.websphere.management.exception.AdminException: ADMA0021E: An error occurred in compiling JavaServer Pages (JSP) files - applic.war (rc=1)
at com.ibm.ws.management.application.task.CompileJspTask.compileWar(CompileJspTask.java:152)
at com.ibm.ws.management.application.task.CompileJspTask.performTask(CompileJspTask.java:86)
at com.ibm.ws.management.application.SchedulerImpl.run(SchedulerImpl.java:253)
at java.lang.Thread.run(Thread.java:568)

Solution: Generally, you see this error when the JAR file that is being used by the compiler is not in a readable or complete state. It could be truncated or malformed in some other fashion. Check the disk space. The JAR file might be placed in /tmp/app_1052f9ddb0a/ear. Ensure that there is enough free space for that directory and for the WebSphere for z/OS temp space that the application server is using to compile.

Problem: The following error occurs during synchronization when an EAR file is being deployed in WebSphere Application Server for z/OS V6:
EDC5129I No such file or directory.
The configuration synchronization completed successfully but there is an error message in the trace as shown in Example 15-2.
Example 15-2 EDC5129I error
Trace: 2005/06/29 08:42:59.948 01 t=AC44F8 c=UNK key=P2 (13007002)
ThreadId: 00000229
FunctionName: com.ibm.ws.management.repository.FileRepository
SourceId: com.ibm.ws.management.repository.FileRepository
Category: AUDIT
ExtendedMessage: BBOO0222I: ADMR0016I: User AHCPLEX/ASCR1T modified document cells/CELLDM1T/nodes/NODEB1T/servers/IMWEBPR6/pluginBSYS-V61-cfg.xml.
file:///Was601DB1T/V6R0/AppServer/properties/xsl/server.xsl; Line #489; Column #138; Can not load requested doc: /Was601DB1T/V6R0/AppServer/profiles/default/config/cells/CELLDM1T/clusters/CLUSTER3/sib-engines.xml (EDC5129I No such file or directory. (errno2=0x05620062))
Solution: This message does not indicate any problem in the WebSphere environment. The sib-engines.xml file is missing in /cells/CELLDM1T/clusters/CLUSTERxxx after migration to V6.0. See APAR PK07966 for more details:
http://www-1.ibm.com/support/docview.wss?rs=404&uid=swg1PK07966
After FixPack 6.0.2 is installed, the message does not appear.

Tools used: None.

Problem: When you run the admin script $AdminApp install, the process times out with the error message:
SOAPException: faultCode=SOAP-ENV:Client; msg=Read timed out; targetException=java.net.SocketTimeoutException: Read timed out

Solution: In most cases, this exception occurs because the timeout value is too small. To fix this, increase the timeout value specified by the com.ibm.SOAP.requestTimeout property in the soap.client.props file in the /WebSphere/V6R0M0/AppServer/profiles/default/properties directory. The value that you should choose depends on a number of factors such as the size and the number of the applications that are installed in the server, the speed of your hardware, and the capacity provided for the application. The default value of the com.ibm.SOAP.requestTimeout property is 180 seconds.

Problem: When deploying an application that has database access, you receive this error message:
ADMA8019E: The resources that are assigned to the application are beyond the deployment target scope. Resources are within the deployment target scope if they are defined at the cell, node, server, or application level when the deployment target is a server, or at the cell, cluster, or application level when the deployment target is a cluster. Assign resources that are within the deployment target scope of the application or confirm that these resources assignments are correct as specified.

Solution: Apply the latest maintenance to your WebSphere for z/OS environment. APAR PK08164 solves this problem.

Problem: When deploying an application with DB2 access (in our case: TraderDB2 to test DB2 connection) into WebSphere for z/OS V6.0 (network deployment) a dump was thrown with a Java SQL exception with the message:
application are beyond the deployment target scope

Solution: When enabling trace, we received the information in Example 15-3.
Example 15-3 Trace for DB2 resource access error
Trace: 2005/08/16 11:00:00.000 01 t=AC89C0 c=2.C key=P8 (13007002)
  ThreadId: 0000001a
  FunctionName: com.ibm.ejs.j2c.poolmanager.FreePool
  SourceId: com.ibm.ejs.j2c.poolmanager.FreePool
  Category: SEVERE
  ExtendedMessage: BBOO0220E: J2CA0046E: Method createManagedConnectionWithMCWrapper caught an exception during creation of the Managed Connection for resource jdbc/TraderDB2, throwing ResourceAllocationException. Original exception: com.ibm.ws.exception.WsException: DSRA8100E: Unable to get a PooledConnection from the DataSource. with SQL State : 42505 SQL Code : -922
  at COM.ibm.db2os390.sqlj.jdbc.DB2SQLJConnection.setError(DB2SQLJConne
  at COM.ibm.db2os390.sqlj.jdbc.DB2SQLJConnection.<init>(DB2SQLJConnect
  at com.ibm.db2.jcc.DB2PooledConnection.<init>(DB2PooledConnection.jav
  at com.ibm.db2.jcc.DB2ConnectionPoolDataSource.getPooledConnection(DB
  at com.ibm.db2.jcc.DB2ConnectionPoolDataSource.getPooledConnection(DB
  at com.ibm.ws.rsadapter.DSConfigurationHelper$1.run(DSConfigurationHe
  at com.ibm.ws.security.util.AccessController.doPrivileged(AccessContr
  at com.ibm.ws.rsadapter.DSConfigurationHelper.getPooledConnection(DSC
  at com.ibm.ws.rsadapter.spi.WSRdbDataSource.getPooledConnection(WSRdb
  at com.ibm.ws.rsadapter.spi.WSManagedConnectionFactoryImpl.createMana
  at com.ibm.ejs.j2c.poolmanager.FreePool.createManagedConnectionWithMC
  at com.ibm.ejs.j2c.poolmanager.FreePool.createOrWaitForConnection(Fre
  at com.ibm.ejs.j2c.poolmanager.PoolManager.reserve(PoolManager.java:2
  at com.ibm.ejs.j2c.ConnectionManager.allocateMCWrapper(ConnectionMana
  at com.ibm.ejs.j2c.ConnectionManager.allocateConnection(ConnectionMan
  at com.ibm.ws.rsadapter.jdbc.WSJdbcDataSource.getConnection(WSJdbcDat
  at com.ibm.ws.rsadapter.jdbc.WSJdbcDataSource.getConnection(WSJdbcDat
  at edu.mayo.registration.amts.datamanager.ManageConnections.getConnec
  at edu.mayo.registration.amts.objects.UserRequest.getDbConnection(Use
  at edu.mayo.registration.amts.datamanager.UserMapper.accessRacfGroup(
  at edu.mayo.registration.amts.businessfacade.ManageAMTSApplication.ch
  at edu.mayo.registration.amts.presentation.TeamLogonServlet.performTa
  at edu.mayo.registration.amts.presentation.TeamLogonServlet.doPost(Te
  at javax.servlet.http.HttpServlet.service(HttpServlet.java:763)
  at javax.servlet.http.HttpServlet.service(HttpServlet.java:856)
Caused by: java.sql.SQLException: DB2SQLJConnection error in native method: constructor: CONNECT 00F30085

Analyzing the trace (with Trace Analyzer for WebSphere Application Server and JDBC Trace) and the resource access, we concluded that this condition indicates a security violation. We had to verify the RACF and JDBC data source definitions.
In our case, the resource requester's password could not be verified. The user password specified for the data source jdbc/TraderDB2 was incorrect. We changed the definition and redeployed the application.

Problem: During the process of installing the Trade6 application in a WebSphere cell on z/OS V6, the Trade6 install scripts create all the SIBus resources that are needed to support the JMS part of the application, but at the messaging engine startup, the following error appears:
The messaging engine encountered an exception while starting. Exception: com.ibm.ws.sib.msgstore.PersistenceException: CWSIS1501E: The data source has produced an unexpected exception:java.lang.IllegalStateException: CWSIS1523E: Dynamic allocation of database objects in DB2 for z/OS is not allowed. com.ibm.ws.sib.utils.ras.SibMessage

Solution: When you use DB2 for z/OS, the messaging engine cannot dynamically create the data store tables. This means that you must manually create these tables using the DDL statements produced by the sibDDLgenerator command. You can find instructions at:
http://publib.boulder.ibm.com/infocenter/ws60help/index.jsp?topic=/com.ibm.websphere.pmc.zseries.doc\ref\rjm0630_.html
You can redirect the output of the command to a file so that you can submit it to DB2 for z/OS later. One way to do this is to use SPUFI, but you must first copy the file to an FB80 data set. As the final step, you must clear the Create tables box (for every messaging engine that uses DB2 for z/OS) from the data store panel of the Administrative Console.
Important: You must create database tables manually for every messaging engine that uses DB2 for z/OS.
See also Disabling the Deployment Manager HTTP Timeout, TD101703, available at:
http://www.ibm.com/support/techdocs
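
For the SOAP read timeout described earlier in this chapter, the change to soap.client.props can be scripted. The following Python sketch is only an illustration: the function names are hypothetical, it works on the text of the properties file after you have read it (on z/OS you might have to take the file encoding into account), and the value of 600 seconds in the usage note is an arbitrary example, not a recommendation.

```python
import re

# Property named in this chapter; its default value is 180 seconds.
PROPERTY = "com.ibm.SOAP.requestTimeout"

# Matches an uncommented "com.ibm.SOAP.requestTimeout=<value>" line.
_LINE = re.compile(
    r"^([ \t]*" + re.escape(PROPERTY) + r"[ \t]*=[ \t]*)(.*)$",
    re.MULTILINE,
)


def get_request_timeout(props_text):
    """Return the configured timeout in seconds, or None if it is not set."""
    match = _LINE.search(props_text)
    if match is None:
        return None
    return int(match.group(2).strip())


def set_request_timeout(props_text, seconds):
    """Return a copy of props_text with the timeout set to `seconds`."""
    if _LINE.search(props_text):
        # Keep the "name=" part of the line and replace only the value.
        return _LINE.sub(lambda m: m.group(1) + str(seconds), props_text)
    # Property not present: append it on a new line.
    separator = "" if props_text == "" or props_text.endswith("\n") else "\n"
    return props_text + separator + PROPERTY + "=" + str(seconds) + "\n"
```

A possible use is to read the file, call set_request_timeout(text, 600), and write the result back after keeping a copy of the original file.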
Chapter 16. Phase 3: Run applications
For additional learning and in-depth treatment of WebSphere components and related topics, consult the Information Center at:
http://publib.boulder.ibm.com/infocenter/wasinfo/v6r0/index.jsp?topic=/com.ibm.websphere.base.doc/info/aes/ae/rtrb_plugincomp.html
[Figure 16-1: WebSphere on z/OS, an application server. Components shown: Web server, Web server plug-in, controller region (JVM), WLM queue, Web container (servlets, JSPs), Web Services engine, dynamic cache, security, JMX, name server, data replication, EIS, and database. The numbers in the figure correspond to the request flow steps that follow.]
A typical request flow (refer to Figure 16-1) might be:
1. The browser or client issues a request for a resource from the J2EE application.
2. After the request is cleared and authenticated by the z/OS security component, it is routed to the plug-in. The plug-in forwards the request based on directives in the plugin-cfg.xml file that are matched against the requested resource, the transport protocol (secured or non-secured), and the destination Plug-in. This task is done by the Web server plug-in.
3. The controller validates the request for resource access and puts it in the WLM queue. If there is no pre-allocated servant task running to pick the request from the queue and process it, the controller starts one. This scenario can happen if the number of servant tasks has reached the default maximum.
4. The J2EE application server, also called the servant region, loads the components from the required WebSphere class libraries into the runtime environment and invokes the application.
5. The request is routed to a container and a servlet is loaded to service the request. Common events that take place during the servicing of a request are:
   - If the servlet is not already loaded, it is loaded.
   - If the servlet is packaged with load on initialize, it is loaded when the server is started. If not, it is loaded when the first request hits.
6. Requests for data from the servlet are classified depending on the types and intended activities (read-only, update). Based on this classification, the appropriate EJBs are invoked. EJBs act as internal application data brokers and shield the applications from the mechanics of having to model and format data every time they require them.
7. Physical data is located on data servers. Database software models data to a predefined design and access patterns (hierarchical, relational, sequential). Organizations choose database software that best fits their processing needs. J2EE connectors connect Java applications to data repositories with programming APIs. They do what EJBs do for internal applications that are accessing relational databases: they shield the clients from the mechanics of having to know the attributes of every piece of data needed. The two types of J2EE connectors are:
   a. JDBC for relational databases (inside a WebSphere environment)
   b. JCA (implemented as Resource Adapters) for EIS databases (outside a WebSphere environment)
   For more information about J2EE connectors, visit:
   http://java.sun.com/j2ee/connector/
8. Data is retrieved from the enterprise complex and returned to the EJB that is making the request. This data is processed (by the program code) and sent back to the requestor in HTML or XML format.
9. If the request has been serviced with no errors, it is posted back to the browser or client, and the HTTP return code is set to 200.

Knowing how far a request gets is critical in eliminating components that do not need to be addressed. To reinforce this concept, you take a request and superimpose it onto your J2EE framework. From there, you can identify the layer or tier where the problem area might lie.
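
The elimination idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the tier names are our own condensed labels for the steps in the request flow, and the function suspect_tiers is hypothetical. The sketch assumes that you already know, from logs or traces, the last tier that the request is known to have reached.

```python
from typing import List

# Condensed labels for the tiers in the request flow (our own wording).
TIERS = [
    "browser or client",
    "Web server and plug-in",
    "controller region and WLM queue",
    "servant region (Web container, servlets, JSPs)",
    "EJBs",
    "J2EE connectors (JDBC, JCA)",
    "data server or EIS",
]


def suspect_tiers(last_tier_reached: str) -> List[str]:
    """Return the tiers that still need to be examined.

    Every tier before the last one reached has passed the request on,
    so it can be eliminated; the last tier reached and all tiers behind
    it remain candidates.
    """
    index = TIERS.index(last_tier_reached)
    return TIERS[index:]


def eliminated_tiers(last_tier_reached: str) -> List[str]:
    """Return the tiers that do not need to be addressed."""
    index = TIERS.index(last_tier_reached)
    return TIERS[:index]
```

For example, if the controller job log shows that the request was queued to WLM, the browser and the Web server with its plug-in can be eliminated, and the analysis starts at the controller region.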
The necessary logs for the IBM HTTP Server can be found in the <plugin_install_root>/logs/<web_server_name>/ directory. The files are:
http_plugin.log   Confirms that the HTTP Server started and initialized
error.log         Records errors within the server
access.log        Records inbound and outbound requests
The Web server plug-in software handles communication between a Web server and the application modules in the Web container. It acts as a somewhat intelligent router for HTTP requests based on directives from its configuration file.

Some typical problems in the order of a request/response flow from the diagram in Figure 16-1 on page 166 are:

Problem: You cannot get to an application or an application does not work.

Solution: Follow these steps to analyze and fix the problem:
i. Make sure that you do not have any typographical errors in the URI. Keep in mind that all HTTP 4xx codes are client error codes.
ii. Search for the HTTP error code at the HTTP Server Information Center or consult the W3 Consortium site:
http://www.w3.org/Protocols/rfc2616/rfc2616-sec10.html
iii. Check to see if the Web server has been started. Go to SDSF, select SYSPRINT from the controller job (installation dependent), procstep BBOCTL, and look for this message:
PLGC0057I: Plug-in configuration service is started successfully.
You can also access the top level of the URL from a Web browser. For example:
http://<web_server_name>/
Or, you can ping the server:
ping <web_server_name>
iv. If the Web server does not open a default welcome or error page, check in SDSF to see if the Web server job is running. The name of the job is installation dependent, so verify it with yours. If the Web server is not running, start it by issuing the START <web_server_procname> command, and check the syslog and job log to confirm that it started successfully.
v. If the Web server is running, verify access to the application:
http://<host name>/snoop
vi. If you still have problems, check to see if the application can be accessed directly with the embedded HTTP server in the Web container. Select Application servers → appserver link → HTTP transport. The Web container default ports are listed here, one for non-SSL and one for SSL. Invoke the application again using this URL format:
http://<host name>:port/snoop
vii. If this did not work (you were not able to access the Web container), then the Web server and its plug-in have problems. Web servers and associated Plug-ins are simple and solidly built components. Once they are running, they work. It is rare that a bad code upgrade was released for the plug-in.
Most problems in this area are related to incorrect changes that were made to the plug-in file or files that were corrupted. With that in mind, incremental back-up of your plug-in configuration is recommended. In USS, use the -nostop and -nowait options because it is not necessary to stop the server to back up or restore configuration files:
backupConfig <backup_file> [options]
For example, you can issue:
backupConfig myFile.zip -nostop
The -nostop option does a backup in place; the server does not have to be stopped for backup to be performed.
restoreConfig <backup_file> [options]
For example, you can issue:
restoreConfig -nowait
The -nowait option does a restore in place; the server does not have to be stopped for that restore to be performed.
viii. To verify that the Plug-in is the problem, swap the configuration file that you think is in error with a good backup copy. Use the Administrative Console to apply changes to the Plug-in file. Although it is possible to edit it manually, it is not recommended.
ix. If this works, then you only have to analyze the two plug-in files for differences to determine where the problem lies. When you know what causes the problem, you can fix it.
x. You can also regenerate the plugin-cfg.xml from the Administrative Console if you suspect that the copy in the local server is bad. Select Login → Servers → Web Servers and click Generate plug-in. If you are running in a network deployment environment, this action replaces the plugin-cfg.xml copy at the node server with the master copy stored at the deployment manager node.
Note: The plug-in configuration is an XML file. The directives that usually are changed when servers are remapped and reconfigured are: VirtualHostGroup, ServerCluster, and UriGroup.
xi. If you get past the Web server and the application still does not respond to your requests, there are a few things that you can do to check on the server and its status. Check for the servant job in SDSF (job name is installation dependent, look for Procstep=BBOSR). If it is not running, it does not show. In that case, you must start the application server as follows:
START <appserver_proc_name>,JOBNAME=<server_short_name>,ENV=<cell_short_name>.<node_short_name>.<server_short_name>
For example, we issued:
START WS6552C,JOBNAME=WS6552,ENV=CL6552.ND6552.WS6552
If you are running in a network deployment environment, you can start the application server from the Administrative Console. Select Login → Applications → Enterprise Applications → Select Application (check box) → Start.
xii. If there is a servant job, select its SYSOUT (this log has the error trace turned on by default). Make sure that you do not see any exceptions logged. If the server is running but the application is not responding, usually that means it has run into an Out Of Memory problem. This can be caused by a limited heap size or other resource constraints. Check which processes use most of your memory and whether this behavior is as expected or caused by configuration mistakes.
xiii. You can also use the Administrative Console to verify the status of the application server as shown in Figure 16-2. Go to Administrative Console and select Expand Troubleshooting → Logs and Trace. The green arrow confirms that the application server is running. The startup of the application server automatically starts all the applications that it hosts.
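
The comparison of a suspect plug-in file with a good backup copy (step ix) can be done with any diff tool. As one possible illustration, the following Python sketch uses the standard difflib module. The function name and the default file labels are hypothetical, and the sketch assumes that both files have already been read into strings.

```python
import difflib


def diff_plugin_configs(good_text, suspect_text,
                        good_name="plugin-cfg.xml.backup",
                        suspect_name="plugin-cfg.xml"):
    """Return unified-diff lines between a good copy and a suspect copy.

    An empty list means that the two files are identical, in which case
    the plug-in configuration is probably not the cause of the problem.
    """
    good_lines = good_text.splitlines(keepends=True)
    suspect_lines = suspect_text.splitlines(keepends=True)
    return list(
        difflib.unified_diff(
            good_lines, suspect_lines,
            fromfile=good_name, tofile=suspect_name,
        )
    )
```

Lines that start with a minus sign exist only in the good copy and lines that start with a plus sign exist only in the suspect copy. Differences in the VirtualHostGroup, ServerCluster, and UriGroup directives are usually the first place to look.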
Problem: The static content that the Web server is serving up is incorrectly rendered in the browser.

Solution: The source file is being transferred to z/OS in ASCII from workstations. HFS handles files in EBCDIC and any data at the presentation layer must be in ASCII, which might cause some confusion. With your image, text, or source file available on your workstation, verify that the file has good contents. Use a browser to try to open it. If the file content is good and its association is correct, then it is viewable. If the file is not viewable after it has been transferred, retransfer the file in binary mode.

Tip: Issue the bin command to set the transfer mode to binary before transferring files between your workstation and HFS, and vice versa.

Problem: You are experiencing erratic browser response and an inconsistent and quirky interface.

Solution: Sometimes, older (outdated) browsers also give erratic responses and render contents incorrectly. Check on updates, fixes, and for a list of supported browsers at:
http://www.ibm.com/software/webservers/appserv/doc/latest/prereq.html
Some common problems in the control tier are:

Problem: You must log in and enter passwords from page to page even though you are using the same function in the same record in the same application.

Solution: This is the typical behavior of a session affinity problem. The Web container keeps track of sessions with cookies that carry session IDs that are passed back to the browser. Session data is maintained in the application server memory. The Plug-in configuration file is set up to enable session affinity by default and uses the CloneID parameter for session IDs. The parameter can look like this:
<Server CloneID="80mn5ljkma" ...>
The Plug-in generation process adds the CloneID parameter by default; this is how the Plug-in identifies each application server. When this ID is used, subsequent requests get routed back to the server that generated the session ID. This is the most efficient way of handling session affinity. Other ways to maintain session affinity throughout the servers in a cluster are:
- Persisting session information to a database
- Applying in-memory copying of objects between JVMs (domain replication)

The preferred and recommended way is to keep session objects on the server, the second option. If you are experiencing a loss of session affinity, check to make sure that cookie writing is allowed in your browser and client firewall software. The browser software has different tabs or sub-menus for enabling cookie writing but, in general, it is in the privacy area.

Problem: You suspect a program loop.

Solution: Program loops usually result in a request not responding, or a component hanging, usually followed by a timeout error. In some cases, things seem to work fine, but some tasks in the address space keep consuming system resources without producing a result. You can see this when CPU usage is high but nothing seems to justify this. A dump or trace might be necessary because the log information is not sufficient for determining the cause of the loop or the unusually high resource consumption. Indications of a loop are:
- A repetitive message from a module waiting for work and nothing being done, such as:
  ExtendedMessage: <component> waiting for next server work
- A repetitive message that a module is active and you are able to follow the executed address ranges, but the only thing changing is the time stamp.
- A repetitive message from a module processing work/requests but the thread ID stays the same.

Notice the ThreadId and FunctionName in Example 16-1. They might stay the same, but the trace header line with the time stamp changes if a loop is occurring. There might be several other messages between the repetitions. The shorter the loop cycle, the more likely it is that you can recognize the loop.
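
Spotting such repetitions by eye is tedious in a large trace. The following Python sketch shows one possible way to count them. It is an illustration only: it assumes that each trace record starts with a "Trace:" header line carrying the time stamp and contains ThreadId and FunctionName fields as in Example 16-1, and the function name and the threshold of three repetitions are hypothetical choices.

```python
import re
from collections import defaultdict

# Header of a trace record, for example:
#   Trace: 2005/08/19 21:23:41.232 01 t=7D19C0 c=UNK key=P8 (13007002)
_HEADER = re.compile(r"Trace:\s+(\d{4}/\d{2}/\d{2} \d{2}:\d{2}:\d{2}\.\d+)")
_THREAD = re.compile(r"ThreadId:\s*(\S+)")
_FUNCTION = re.compile(r"FunctionName:\s*(\S+)")


def find_repeating_threads(trace_text, min_repeats=3):
    """Map (ThreadId, FunctionName) to the number of distinct time stamps.

    Only pairs seen at least `min_repeats` times are returned. A high
    count is a hint that a loop might be occurring, not a proof.
    """
    stamps = defaultdict(set)
    # Split the text in front of every "Trace: yyyy/" record header.
    for record in re.split(r"(?=Trace:\s+\d{4}/)", trace_text):
        header = _HEADER.search(record)
        thread = _THREAD.search(record)
        function = _FUNCTION.search(record)
        if header and thread and function:
            key = (thread.group(1), function.group(1))
            stamps[key].add(header.group(1))
    return {key: len(times) for key, times in stamps.items()
            if len(times) >= min_repeats}
```

Any pair that the function reports is a candidate for closer inspection with the tools described after Example 16-1.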
Example 16-1 Looping thread
Trace: 2005/08/19 21:23:41.232 01 t=7D19C0 c=UNK key=P8 (13007002)
  ThreadId: 0000006d
  FunctionName: com.ibm.etools.validation.validationbuilder
  SourceId: com.ibm.etools.validation.validationbuilder.UserStateRegistry
  ExtendedMessage: closeUser - found UserPrefs: UserPreferences: nodeName:nd6552, serverName: ws6552, userId: waspd2, refreshRate

If you suspect an application loop or hang, you can use:
- IPCS to format a trace and analyze it for recurring PSW addresses; see 20.1.2, Viewing CTRACE and JRas data through IPCS on page 242.
- The com.ibm.jvm.svc.dump.Dump utility to identify the thread under which a loop is occurring, the threads contending for resources or involved in a lockout, and a thread waiting for some operation that is external to the server. See Chapter 20, WebSphere for z/OS traces and dumps on page 241.

Before you contact IBM for service, use these tools to identify the particular component or subcomponent that is responsible for the failure. In most cases, the components are the application program code rather than product code from IBM. Present the component name together with the class and method name from the trace to your application development team or IBM (in the case of IBM components). This enables them to fix the code more quickly.

Problem: You typed in a URL, such as <host name>/hitcount, and you expected an application page, such as that shown in Figure 16-3.
Instead, you see a screen of source HTML in the browser window (Figure 16-4).
<HTML>
<HEAD><TITLE>IBM WebSphere Hit Count Demonstration</TITLE></H1>
<SCRIPT TYPE="text/javascript">
function enableLookupButtons() {
  var myButtons = document.getElementsByName("lookup");
  for (i = 0; i < myButtons.length; i++){
    myButtons(i).disabled = false;
  }
}
Solution: Whenever the document root of the WebSphere Application Server is the same as the Web server document root, the Web server JSP source file is shown as plain text. Using the plug-in directives, you can tell the Web server that a request is to be handled by the WebSphere Application Server. If an inbound request does not match any entry in these directives, the control returns to the Web server. In this case, the Web server searches the resource requested in the document root. Because the JSP file is stored in the document root, the Web server displays it as plain text. To avoid the plain text display, move the application server JSP source file outside the Web server document root. When a request comes in with an unknown host header, the plug-in returns control to the Web server and if the Web server cannot find this JSP source file in its document root, it returns an HTTP 404 error message instead.

Problem: The application server can be started from SDSF but does not start from the Administrative Console. The JCL error is:
IEF642I EXCESSIVE PARAMETER LENGTH IN THE PARM FIELD

Solution: In the log, there is a variable appended to the start command:
//STARTING EXEC BB6S001,ENV=DTDCV6.I21A.BB6S001,
// PARMS=-Dwas.status.socket=1927
The status socket being opened by the Administrative Console is actually opened by the node agent. The node agent uses this socket to monitor the progress of the server starting, and reports that progress to the Administrative Console. When you start the server from the MVS command line, it is not being monitored by the node agent, so there is no need to open the status socket (or pass in the was.status.socket). So, in this case, everything appears to be working correctly. The best solution here is to remove the hard-coded environment variable from your server JCL. If you manually added the _BPXK_SETIBMOPT_TRANSPORT variable (for a multistack environment), you should move it out of the JCL. In the Administrative Console, select Environment → Manage WebSphere Variables. For more information, search for TcpMultistack at the WebSphere for z/OS Information Center:
http://publib.boulder.ibm.com/infocenter/wasinfo/v6r0/index.jsp

Problem: The application is deployed in WebSphere for z/OS in a cluster of two or more servers. Each server is on a different WebSphere node. When the application is redeployed, restarting the servers on each node fails. WebSphere for z/OS attempts to restart the controller region before the controller region has finished shutting down.

Solution: Apply the latest PTFs (maintenance) to your WebSphere for z/OS environment. The problem was corrected in version 6.0.2.
The user of a request has insufficient authorization granted for a certain resource. If this is the case, you almost certainly find an authorization failure message (Example 16-2) in the job log. Request appropriate authorization from the security administrator so that the application can run with sufficient privileges.
Example 16-2 Sample failure message
ICH408I USER(WASPD2 ) GROUP(SYS1 ) NAME(USER1)
  /u/waspd2/.sh_history CL(FSOBJ ) FID(00000000000000004C03000000190000)
  INSUFFICIENT AUTHORITY TO OPEN
  ACCESS INTENT(RW-) ACCESS ALLOWED(GROUP ---)
  EFFECTIVE UID(0000003174) EFFECTIVE GID(0000000000)
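
When you request authorization from the security administrator, it helps to pass on the key fields of the message. The following Python sketch pulls them out of an ICH408I message such as the one in Example 16-2. It is a minimal illustration: the function name and the dictionary keys are hypothetical, and the patterns cover only the keywords that appear in this example.

```python
import re


def parse_ich408i(message):
    """Extract the main fields from an ICH408I authorization failure."""
    # Collapse line breaks and repeated blanks so that the message can be
    # parsed whether it is on one line or spread over several lines.
    text = " ".join(message.split())

    def field(pattern):
        match = re.search(pattern, text)
        return match.group(1).strip() if match else None

    return {
        "user": field(r"USER\(([^)]*)\)"),
        "group": field(r"GROUP\(([^)]*)\)"),
        "class": field(r"CL\(([^)]*)\)"),
        "access_intent": field(r"ACCESS INTENT\(([^)]*)\)"),
        "access_allowed": field(r"ACCESS ALLOWED\(([^)]*)\)"),
    }
```

For the message in Example 16-2, the result shows that user WASPD2 asked for RW- access to a resource in class FSOBJ, while the access allowed was GROUP ---.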
The connection to the back-end EIS or data sources has problems. There are two types of connectors to back-end data:
- A JDBC driver that is embedded with WebSphere if you are using DB2
- EIS JCA connectors that are provided by IBM if you are using IMS or CICS

Regardless of which connector you use, they all have traces and logs that can be collected to help diagnose errors. An incorrect configuration usually results in an exception logged in the SYSLOG when the resource is being accessed. A timeout exception occurs when the back-end system server (EIS or DB2) is unavailable. A JDBC trace is useful for diagnosing problems in the DB2 SQL for Java and JDBC (SQLJ/JDBC). The output goes to an HFS file that is specified in the JDBC properties file. JDBC trace information shows Java methods, database names, plan names, user names, or connection pools.

Use the Administrative Console to check the configuration information and test the connection for DB2 JDBC:
a. Go to Login → Resources → JDBC Providers and look for the link with the name of the database. The center pane displays the configured JDBC provider as seen in Figure 16-5.
b. Click the link and verify the name and properties of the database.
c. Test the connection to verify that the application can access the data source.
d. Ensure that you have the right data source name and properties configured in the Administrative Console.
e. Also check the data source names in the resources.xml configuration file.
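
The check of data source names in resources.xml (step e) can be supported by a small script. The following Python sketch is an illustration under one explicit assumption: that each data source entry in resources.xml carries its JNDI name in an attribute called jndiName. The function names are hypothetical, and the sketch expects the content of the file as a string.

```python
import xml.etree.ElementTree as ET


def find_jndi_names(resources_xml):
    """Return the sorted JNDI names found in a resources.xml document.

    Assumption: entries expose their name in a jndiName attribute. The
    sketch scans every element, so it does not depend on the exact
    element names or nesting of the file.
    """
    root = ET.fromstring(resources_xml)
    names = {element.attrib["jndiName"]
             for element in root.iter()
             if "jndiName" in element.attrib}
    return sorted(names)


def has_data_source(resources_xml, jndi_name):
    """Return True if the given JNDI name is defined in the document."""
    return jndi_name in find_jndi_names(resources_xml)
```

A call such as has_data_source(text, "jdbc/TraderDB2") then tells you whether the name that the application looks up is defined at the scope whose resources.xml you are examining.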
IMS and CICS (two EIS products) also produce their own traces and logs. These subsystems very likely run in their own LPARs. Contact the administrator to get the traces and logs that are required for further analysis or ask them for help.

If your application uses EIS resource adapters:
a. Contact the EIS administrator to verify that the subsystem is up and available because there is no direct way to test the connection to an EIS resource from the WebSphere Administrative Console.
b. Go to the Administrative Console and select Login → Resources → Resource Adapters.
c. Drill down to the resource name link as you would with the JDBC providers. Verify the configuration properties (given by the administrator or developer of the application) such as spelling of resource names, class path information for libraries, and security information.
Chapter 17.
[Figure: a cell and node containing the daemon, the deployment manager, a Web server (IHS), and a server made up of a controller region (CR), a CR adjunct (messaging engine), and several servant regions (user applications) with Web containers (servlets, JSPs), EJB containers (EJBs), and Web services. Connections are labeled HTTP, IIOP, and JMS; MQ and DB2 appear as connected subsystems. The z/OS infrastructure layer lists TCP, LOGR, WLM, RRS, USS, LE, Java SDK, and Security (RACF).]
Figure 17-1 WebSphere for z/OS V6 runtime structure (network deployment configuration)
Figure 17-1 shows the network deployment configuration, which was first introduced in WebSphere for z/OS V5. A network deployment configuration consists of:
- A deployment manager for the cell
- A node agent for each node in the cell
- A number of servers in each node
- One daemon server per cell per z/OS image

The network deployment configuration is slightly more advanced in WebSphere for z/OS V6 because it might have additional address spaces. However, it is still very similar to the network deployment configuration in WebSphere for z/OS V5, especially if you are not using messaging. IBM HS in Figure 17-1 stands for IBM HTTP server for z/OS, which is an optional Web server if you are not using the HTTP server that is embedded in the controller region (which is indicated as CR in the figure). In Figure 17-2 on page 180, we also show the Web server for z/OS among the optional z/OS subsystems. This implementation assumes that we have configured and are using the IBM HTTP server with the WebSphere Application Server Plug-in.
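[Figure 17-2: stand-alone application server configuration. A cell containing a node with one server (CR and SR) and a daemon. The optional subsystems include IBM HS with the WebSphere Application Server Plug-in. The z/OS infrastructure layer lists TCP, LOGR, WLM, RRS, USS, LE, Java SDK, and Security (RACF).]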
WebSphere for z/OS V6 also has a base application server configuration. It is called a stand-alone application server configuration because the node contains a single application server: you can no longer add additional servers to the node like you did with WebSphere for z/OS V5. As shown in Figure 17-2, the stand-alone application server configuration is a node and a cell, and it has a daemon server. The rule is one daemon per cell per z/OS image. In WebSphere for z/OS V6, the stand-alone application server configuration supports multiple servant regions (SR in the illustration). This was not supported in WebSphere for z/OS V5, where the base application server configuration could have only one servant region.
going to be accessing one or more of these subsystems. These subsystems can be in the same LPAR as your WebSphere for z/OS V6 run time (see Figure 17-1 on page 179), but this is not required. They can even be on a separate server such as an LPAR on a different zSeries platform. These subsystems (for example, CICS TS, DB2 UDB for z/OS, or IMS) normally provide vital back-end functionality for your user applications. When WebSphere MQ is involved, this subsystem might provide back-end and front-end functionality to your applications. Problems connecting to the back-end subsystems usually create an application timeout or some incorrect output from the application. Problems in the subsystems themselves often impact the response times or the performance of your WebSphere for z/OS V6 user application. In many of these cases, the WebSphere for z/OS V6 system administrator works with subsystem experts to resolve these types of problems.

WebSphere for z/OS environment problems

The WebSphere for z/OS environment basically consists of an administrative part and the application server run time part:
- The administrative part includes several tools like the Administrative Console, the WebSphere administrative scripting program (wsadmin), administrative commands, and possible administrative programs. Refer to the WebSphere for z/OS Information Center for further information and description. In addition to the tools, the deployment managers and the node agents are administrative components. For the administrative part to work without problems, these tools must all be installed correctly and the components must be configured correctly and up and running. For example, in a network deployment configuration, the Administrative Console application, the deployment manager, the node agent, and the configuration repository (HFS or zFS) must all be present and working correctly for your administration to be carried out. Any mismatch between the tools and the components can result in your administration failing to work.
- The application server run time (either a stand-alone configuration or a network deployment configuration) consists of, at minimum, a daemon, a controller region, an optional CRA, and one or more servant regions for each controller region. Inside each servant region is a JVM where the J2EE applications are executed (see Figure 17-1 on page 179 and Figure 17-2 on page 180).

When you are analyzing problems in the WebSphere for z/OS V6 application server run time, it is important to understand the functions of each component. It is also very important to understand how the components interact with each other. Problems arising in the runtime part should be approached in the standard way:
- What is the symptom?
- When does the problem occur?
- Has anything been changed in the runtime part recently?
- Has a new version of the application been deployed recently?
- Look at logs starting with the most likely component.
- Look at the error log, look at SYSOUT, and look at SYSPRINT.
17.3, Understanding your own runtime configuration on page 182, describes this process in more detail. For information about troubleshooting, search for Troubleshooting and support at the WebSphere for z/OS Information Center: http://publib.boulder.ibm.com/infocenter/wasinfo/v6r0/index.jsp
Recent operational activities
Are you running any diagnostic tools or traces?
Have you activated any performance monitors?
Have you restarted any of the z/OS components or subsystems?
Changes in the volume of traffic
Has the number of WebSphere for z/OS V6 transactions increased?
Has the resource utilization increased?
Is there a lack of processor or memory capacity for the WebSphere for z/OS V6 LPAR?
Following such an approach leads you to the most likely cause of the problem in the shortest amount of time. The WebSphere for z/OS V6 runtime environment differs from installation to installation. However, the initial troubleshooting process is almost identical, and the authors recommend the following simple approach when you have identified a symptom:
1. Review the syslog, the job log for the CR and the SR, and the WebSphere for z/OS V6 error logs. Identify error messages.
2. If you can associate an error message directly with your initial symptom, take action as indicated by the error message.
3. If the error message cannot be directly associated with the symptom but requires you to obtain more diagnostic information, follow the indication in the error message.
If you still cannot identify the error messages that are associated with your symptom, we recommend that you use the flow charts for the symptoms that we have identified in Part 2, Problem symptoms and their resolutions on page 39, to find the root cause of the problem and solve it.
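The first step of this approach, identifying error messages in the logs, can be partly automated. The following Python sketch is only an illustration: it assumes that message identifiers have the shape seen in the IDs quoted in this book (three or four uppercase letters, four digits, and a severity suffix), and the function name find_messages and the sample text are hypothetical.

```python
import re

# Assumed message-ID shape, inferred from IDs quoted in this book
# (for example BBOO0221W, WSVR0605W, CHKW3706E, CTG9630E).
# Only severity E (error) and W (warning) are reported.
MESSAGE_ID = re.compile(r"\b([A-Z]{3,4}\d{4})([EW])\b")


def find_messages(log_text):
    """Return (line_number, message_id, severity) for each E or W message."""
    hits = []
    for line_number, line in enumerate(log_text.splitlines(), start=1):
        for match in MESSAGE_ID.finditer(line):
            hits.append((line_number, match.group(1) + match.group(2), match.group(2)))
    return hits


# Hypothetical log excerpt assembled from messages that appear in this book.
sample = """\
BBOO0222I TRAS0017I: The startup trace state is *=all=disabled.
ExtendedMessage: BBOO0221W: WSVR0605W: Thread may be hung.
CHKW3706E: Validation of serverindex.xml failed.
"""

for hit in find_messages(sample):
    print(hit)
```

Informational messages (suffix I) are skipped on purpose so that the list stays short; widen the pattern if you also want to see them.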
WebSphere for z/OS V6. Although many problems are caused by software components other than WebSphere for z/OS V6, many product defects are also found and maintenance for those defects is made available. As a standard procedure, verify that your current runtime environment satisfies all the hardware and software prerequisites for the maintenance to be performed. Also verify that, when you perform maintenance for WebSphere for z/OS V6, you do not introduce any inconsistency between the z/OS load libraries and the HFS.
Change management
In the same context, establish a total change management procedure for your environment. This procedure should outline the proper processes for making any kind of change to your environment. It prevents problems that occur when one group or person in the organization introduces changes to the environment that others are not aware of. With a procedure that eliminates these occurrences, the chances of unexpected problems are reduced.
The test environment
Ideally, the test environment should be configured exactly like the production environment. However, this is sometimes not possible. When you perform tests, the software maintenance level should be identical to that of your production environment. If the test environment and the production environment are kept in sync, you can easily do problem determination in the test environment if a problem occurs in the production environment. This is an advantage because it does not impact your production environment. One of the reasons the authors recommend a testing environment with a configuration identical to the production environment is that it allows you to test performance without affecting your production environment.
Application testing
Testing is the best strategy for preventing problems from occurring in your WebSphere for z/OS V6 production runtime environment.
A detailed testing strategy should be in place for all applications, and it should be followed every time a new version of an application is introduced into the production environment. This procedure must include both functional testing and performance testing. Prior to deploying an application into your WebSphere for z/OS V6 runtime environment, the authors recommend that it be thoroughly tested in the WebSphere test environment of the IDE or in WebSphere V6 in a distributed environment.
Monitoring
A good monitoring strategy can help you identify problems before users experience them. WebSphere for z/OS V6 includes an enhanced Tivoli Performance Viewer that is accessible from the Administrative Console. To use Tivoli Performance Viewer, you enable the Performance Monitoring Infrastructure, log on to the Administrative Console, select your application server, and select the PMI link. Tivoli Performance Viewer can give you clues about possible performance bottlenecks. It is also wise to monitor the WebSphere for z/OS V6 application server logs. The SystemOut and SystemErr logs for each application server should be monitored. Informational messages, warning messages, and error messages are in these logs.
Operational procedures
To avoid or reduce operational issues, make sure that you have created and documented your own start and stop procedures. They must clearly describe the sequence and commands for all components and subsystems. This can serve as a requirement specification for the automation of the operational process. Ideally, your operation should be fully automated to reduce human error.
Large HFS
Ensure that you allocate a large HFS for the WebSphere for z/OS V6 configuration. Leave ample spare space in your file system so that you have room for adding new applications, upgrading to new application servers, upgrading to a new WebSphere for z/OS V6 maintenance level with a new version of the Administrative Console application, and so on.
Network deployment configuration
In your network deployment configuration, ensure that the deployment manager and the node agents are running prior to any application deployment or configuration changes. This helps ensure that the changes are effectively synchronized across all nodes.
Attributes and properties verification
For each server, using the Administrative Console or the job logs, verify the following attributes and properties:
protocol_http_timeout_persistentSession
Use the default values unless you know that your application requires different values.
protocol_http_timeout_input
Use the default values unless you know that your application requires different values.
protocol_http_timeout_output
Use the default values unless you know that your application requires different values. This timeout might also cause abend EC3-0413000n in the case of a long running JSP compilation request.
transaction_maximumTimeout
The default is 300 seconds. Check the value and increase it if necessary.
wlm_dynapplenv_single_server=1
Use this value if you do not want to have multiple servant region instances per application server. Otherwise set it to 0.
wlm_maximumSRCount=1
If you need multiple servant region instances to be started, make sure that this is set to the desired value.
wlm_minimumSRCount=1
This determines how many servant regions to start as a minimum.
ras_trace_defaultTracingLevel=1
For new applications, the recommendation is 1, together with:
ras_trace_outputLocation=BUFFER SYSPRINT
Correct settings
For each server, use the Administrative Console or the servant job log to verify that the following attributes and properties are set correctly:
Verbose garbage collection is enabled. Turning on verbose garbage collection for the application server JVM is not very costly and can help to monitor possible Java memory leaks.
JVM maximum heap size is set. You can set this value to 256 MB: -Xmx256m
JVM minimum heap size is set. Use -Xms256m, or set it to the same value as for -Xmx.
-Xdebug is turned off.
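The JVM checks in this list can be expressed as a small script. The following Python sketch is illustrative only: it assumes that you have copied the generic JVM arguments of a servant into a string, and that verbose garbage collection shows up there as the standard -verbose:gc option. The function name check_jvm_arguments is hypothetical.

```python
import re


def heap_option(tokens, flag):
    """Return the size given with -Xms or -Xmx (for example '256m'), or None."""
    pattern = re.compile(re.escape(flag) + r"(\d+[kKmMgG]?)")
    for token in tokens:
        match = pattern.fullmatch(token)
        if match:
            return match.group(1)
    return None


def check_jvm_arguments(arguments):
    """Return a list of findings; an empty list means all checks passed."""
    tokens = arguments.split()
    findings = []
    if heap_option(tokens, "-Xmx") is None:
        findings.append("maximum heap size (-Xmx) is not set")
    if heap_option(tokens, "-Xms") is None:
        findings.append("minimum heap size (-Xms) is not set")
    if "-Xdebug" in tokens:
        findings.append("-Xdebug is turned on")
    # Assumption: verbose GC appears as -verbose:gc in the argument string.
    if not any(token.startswith("-verbose:gc") for token in tokens):
        findings.append("verbose garbage collection (-verbose:gc) is not enabled")
    return findings


print(check_jvm_arguments("-Xms256m -Xmx256m -verbose:gc"))
print(check_jvm_arguments("-Xmx256m -Xdebug"))
```

The sketch only reports what it finds in the string; it does not read the server configuration itself.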
CHKW3706E: Validation of cells/CELLDM1T/nodes/NODEB1T/serverindex.xml failed with exception java.lang.NullPointerException. cells/CELLDM1T/nodes/NODEB1T/serverindex.xml They only received it once. Another message was: CHKW2130W: The server Network Deployment Server has no configured transaction service. cells/CELLDM1T/nodes/NODEDM1T/servers/dmgr/server.xml Solution: All of these CHKW messages are being issued by the configuration validator. The CHKW2062E messages are usually the result of Java process definitions that have blank fields, specifically the executable arguments field in the process definition for the JVM of a server. These messages communicate nothing except that the field is blank. They are benign. There is a good argument for why these messages are being generated, however. The next message is the CHKW2130W. This is a warning message (note the W suffix on the message code) that appears to check the value of the enable attribute for the TransactionService element in the server.xml. This warning is benign and can safely be ignored. For the last error, it is difficult to determine what is occurring in that CHKW3706E message. However, this is simply an error in the validation of the WebSphere document repository, and does not necessarily represent any runtime error in WebSphere that would affect operation. Problem: When you attempt to use the wsadmin script, you are not able to invoke the script or the error returned by wsadmin does not seem to apply to the command you entered. For example, you receive a WASX7023E, stating that a connection could not be created to host myhost, but you did not specify -host myhost on the command line. Solution: You are either in the wrong session and therefore not authorized to run wsadmin in this particular session, or you might have entered the wrong commands. If you are not able to enter wsadmin command mode, try running wsadmin -c "$Help wsadmin" for help in verifying that you are entering the command correctly. 
If you can get the wsadmin command prompt, enter $Help help to verify that you are using specific commands correctly. Check your syntax and spelling and enter the command again. Also keep in mind that wsadmin.traceout is refreshed (existing log records are deleted) whenever a new wsadmin session is started. If the error returned by wsadmin does not seem to apply to the command you entered, examine the properties files that are used by wsadmin to determine what properties are specified. If you do not know what properties files were loaded, look for the WASX7326I messages in the wsadmin.traceout file; there is one of these messages for each properties file that is loaded. The wsadmin commands are a super-set of JACL, which is a Java-based implementation of the TCL command language. For details about JACL syntax beyond wsadmin commands, refer to the TCL developers site at:
http://www.tcl.tk
For specific details relating to the Java implementation of TCL, refer to the Web site at:
http://www.tcl.tk/software/java
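To see which properties files a wsadmin session loaded, you can filter wsadmin.traceout for the WASX7326I messages mentioned above. The following Python sketch assumes only that each such message sits on one line that contains the message ID. The sample lines, including the message wording and file paths, are invented for illustration.

```python
def loaded_properties_messages(traceout_text):
    """Return every line of wsadmin.traceout that carries a WASX7326I message."""
    return [
        line.strip()
        for line in traceout_text.splitlines()
        if "WASX7326I" in line
    ]


# Hypothetical excerpt; real wsadmin.traceout records look different in detail.
sample_traceout = """\
[date] 0000000a AbstractShell A WASX7326I: properties file one (hypothetical)
[date] 0000000a AbstractShell A WASX7326I: properties file two (hypothetical)
[date] 0000000a AbstractShell E WASX7023E: connection could not be created
"""

for message in loaded_properties_messages(sample_traceout):
    print(message)
```

Because wsadmin.traceout is refreshed at the start of every session, run such a filter before you start the next wsadmin session.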
Problem: After you apply the cumulative fix cf20523.06 to your WebSphere for z/OS V6.0.1 system, you notice that an old problem regarding SMF has resurfaced. Separator values are missing in the SMF 120 subtype 5 field, AMCName, that is used to parse the application, module, and class name. For example, what used to appear as:
ECperfEAR::OrdersJAR.jar::OrderLineEnt
Now appears as:
ECperfEAROrdersJAR.jarOrderLineEnt
The APAR to fix the old issue was PQ85314.
Solution: In WebSphere Application Server V4.0.1 for z/OS and OS/390, the Application-Module-Component (AMC) name was represented in SMF as the string ApplicationName::ModuleName::ComponentName, where each sub-field was separated by ::. However, in V5, the AMC name returned with SMF recording is ApplicationNameModuleNameComponentName, with no separation between the sub-fields. With the fix applied, the Application-Module-Component value that is returned by SMF follows the ApplicationName::ModuleName::ComponentName format and can be tokenized on the :: that separates each sub-field. This value is then compatible with WebSphere Application Server V4.0.1 for z/OS and OS/390 SMF tooling. Apply the latest maintenance. APAR PK07137 is associated with Service Level (Fix Pack) 6.0.2.1 (Build Level cf10533.10) of WebSphere Application Server for z/OS V6.0.1.
Problem: When you are running an application, a Storage Allocation Exception error occurs in the WebSphere for z/OS error log:
WAS Z fullMaterializedLobData give storage error- Websphere Application Server Allocation Error
Solution: The currently available resolution is either to use LOB locators instead of fully materialized data (that is, setting fullyMaterializeLobData=false) or to define the target LOB columns with maximum sizes that the client-side system can allocate storage for.
Problem: Server abends cause the Administrative Console session to become invalid. The following message is issued in the log:
BBOO0232W REQUEST FOR CLASS NAME REMOTEWEBCONTAINER METHOD NAME HTTP REQUEST FROM IP ADDR HAS TIMED OUT
Solution: If the server abends are caused by a timeout, you can change the protocol_http_timeout_output_recovery variable.
Set the variable to protocol_http_timeout_output_recovery=SESSION so that there is no abend when the session timeout happens. This prevents invalidation of the existing session and allows the Administrative Console to be used immediately without waiting for the server to finish initialization. If the timeout occurs because of server performance issues, capture a console dump of the server and determine the cause of the poor performance:
DUMP COMM=(description of problem)
Reply to the dump WTO as follows, where Serverproc is the name of your WebSphere server address space:
JOBNAME=(OMVS,Serverproc),DSPNAME=('OMVS'.SYSZBPX1,'OMVS'.SYSZBPX2),
SDATA=(CSA,GRSQ,LPA,NUC,PSA,RGN,SQA,TRT,SUM)
Use IPCS to diagnose the dump and identify the cause of the poor performance. Chapter 13, WebSphere for z/OS performance analysis on page 117, and Chapter 12, High CPU utilization on page 109, can help you analyze and resolve performance problems.
Problem: When an application in WebSphere for z/OS V6 (network deployment) was running, an application accessing DB2 resources timed out. The server address space did not respond anymore. Solution: We enabled trace and received the messages in Example 17-1.
Example 17-1 Messages after WebSphere for z/OS becomes unresponsive
Trace: 2005/08/15 13:34:14.427 01 t=AC2CF0 c=UNK key=P8 (13007002)
ThreadId: 0000003a
FunctionName: com.ibm.ws.runtime.component.ThreadMonitorImpl
SourceId: com.ibm.ws.runtime.component.ThreadMonitorImpl
Category: WARNING
ExtendedMessage: BBOO0221W: WSVR0605W: Thread "WebSphere:ORB.thread. pool t=00ac6cf0" (00000020) has been active for 700000 milliseconds and may be hung. There is/are 9 thread(s) in total in the server that may
Trace: 2005/08/15 13:34:14.428 01 t=AC2CF0 c=UNK key=P8 (0000000A)
Description: Log Boss/390 Error
from filename: ./bborjtr.cpp
at line: 901
error message: BBOO0221W: WSVR0605W: Thread "WebSphere:ORB.thread. pool t=00ac6cf0" (00000020) has been active for 700000 milliseconds and may be hung. There is/are 9 thread(s) in total in the server that may

When using the Trace Analyzer for WebSphere Application Server, we realized that the application ran out of JDBC connection threads. The application in WebSphere for z/OS could not process the next request because there were no more threads available. The request remained in a wait state. The application server issued a warning message about a potential hang. This message is common when the connection between WebSphere for z/OS and the DB2 database runs into problems, such as authorization failures or thread limitations for new DB2 resource connections. We increased the number of threads available for DB2 connections.
Problem: The connection between z/OS and a Java-based Connector to WebSphere MQ does not start.
Solution: First verify that WebSphere MQ is running, and test whether the listener is running and connected to the right port. The port must match the port in your counterpart definitions for WebSphere MQ for z/OS. Use the following command to verify:
ps -ef | grep runmqlsr
If the name of the queue manager that you suspect is in trouble does not appear in the list, it is not started. Check the port numbers in the display. You must use a different port for every queue manager.
Try to start the listener with the runmqlsr command if the ports are fine. If you see a listener with no port information, this listener is using the default port 1414. In a system where more than one MQ listener is active, this has an impact on the port definitions of the counterpart as well. To start the listener with a different port, you must provide the port number with option -p, such as:
runmqlsr -m QM1 -t tcp -p 1500
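The listener check described here can be sketched in Python. The example below is a rough illustration that assumes you have saved the output of ps -ef | grep runmqlsr as text. It reads the -m and -p options, applies the default port 1414 when no -p is given, and reports ports that more than one listener shares. The function names and the sample process lines are hypothetical.

```python
DEFAULT_PORT = 1414  # port used by a listener started without -p


def option_value(arguments, flag):
    """Return the value that follows a flag such as -m or -p, or None."""
    if flag in arguments:
        position = arguments.index(flag)
        if position + 1 < len(arguments):
            return arguments[position + 1]
    return None


def listener_ports(ps_output):
    """Return (queue_manager, port) for every runmqlsr process line."""
    listeners = []
    for line in ps_output.splitlines():
        tokens = line.split()
        if "grep" in tokens:  # skip the grep process itself
            continue
        names = [token.rsplit("/", 1)[-1] for token in tokens]
        if "runmqlsr" not in names:
            continue
        arguments = tokens[names.index("runmqlsr") + 1:]
        port = option_value(arguments, "-p")
        listeners.append(
            (option_value(arguments, "-m"), int(port) if port else DEFAULT_PORT)
        )
    return listeners


def shared_ports(listeners):
    """Return {port: [queue managers]} for ports used by more than one listener."""
    by_port = {}
    for manager, port in listeners:
        by_port.setdefault(port, []).append(manager)
    return {port: names for port, names in by_port.items() if len(names) > 1}


# Hypothetical ps output; column layout varies between systems.
sample_ps = """\
mqm 101 1 0 10:00:00 ? 0:00 runmqlsr -m QM1 -t tcp -p 1500
mqm 102 1 0 10:00:01 ? 0:00 runmqlsr -m QM2 -t tcp
user 103 99 0 10:05:00 ttyp0 0:00 grep runmqlsr
"""

print(listener_ports(sample_ps))
print(shared_ports(listener_ports(sample_ps)))
```

A queue manager of None in the result means that the listener was started without -m.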
Note: When you are trying to connect from two different Queue Managers to a local queue, you might successfully transmit one or more messages only from the first Queue Manager. If that happens, it means that the local message counter of your local queue differs from the message counter of the second remote Queue Manager on the other side. In this situation, no further messages are accepted until the counters on both sides have been reset or have adjusted to each other.
Problem: You find the following error message:
CTG9630E: IOException occurred in communication with CICS
Solution: This is a configuration error. Review the JNI trace for the root cause. If you find:
RRS register return code 0x300
The CICS Transaction Gateway has not been properly configured to use RRS. Consult your CICS or system administrator about changing the CICS Transaction Gateway configuration for RRS.
EXCI reason code 403
The CICS Transaction Gateway is unable to contact the target CICS server because of an invalid pipe name. Consult your CICS or system administrator about correcting the pipe name in the CICS Transaction Gateway configuration.
Problem: You find the following message:
CTG9631E: Error occurred during interaction with CICS. Error Code=ECI_ERR_NO_CICS minor code: 0 completed
Solution: The CICS Transaction Gateway is unable to contact the target CICS server. If you also find EXCI reason code 203, the target CICS server is not active or has not opened IRC communication. Consult your CICS or system administrator to verify that the CICS server region is up and running and accepting requests.
Problem: You find the following message:
CTG9631E: Error occurred during interaction with CICS. Error Code=ECI_ERR_SECURITY_ERROR
Solution: This is a security problem that is related to either validating a user ID or accessing RACF authorized resources. Review the JNI trace for the root cause. If you find:
EXCI reason code 423
RACF surrogate checking has failed. Contact your RACF administrator to analyze the problem and fix it. Verify that you are using the correct surrogates in CICS.
RACF return code 143
The user ID is unknown or is not defined to RACF, or it does not have an OMVS RACF segment.
Problem: You find the following message:
Return code - 22
Solution: The connection to the gateway was successful; however, the program does not exist. Make sure that you used the right program name and that the program exists.
Problem: There is a hang after a connect to gateway message.
Solution: The port of the remote gateway cannot be found. Check your port definitions.
Problem: You find the following message:
ICO0079E:com.ibm.connector2.ims.ico.IMSTCPIPManagedConnection@3b1bf125.getOutputData (InteractionSpec) error. IMS returned DFS message.
Solution: The first eight characters of the input could not be recognized as a valid transaction, a logical terminal name, or a command. This usually means that the transaction name that you specified in the input request is not recognized by the target IMS system. Make sure that you have defined the correct transaction name.
Problem: You find the following message:
ICO0001E:com.ibm.connector2.ims.ico.IMSTCPIPManagedConnection@d6fd946.processOutputOTMAMsg(byte [], InteractionSpec, Record) error. IMS Connect returned the error: RETCODE=[8], REASONCODE=[SECFNPUI]. Security failure; no password and no user ID.
Solution: The application description specifies the res-auth application but the application did not provide a user ID or a password. Either correct the res-auth configuration or consider changing the authorization method from res-auth to container.
Problem: You find the following message:
ICO0003E:com.ibm.connector2.ims.ico.IMSTCPIPManagedConnection@5072da31.connect() error. Failed to connect to host [p390.poughkeepsie.ibm.com], port [3500]. [java.net.ConnectException: Connection refused: connect]
Solution: The IMS Connect task on the specified host is not accepting the connection request. Verify that the IMS Connect task is active in the target host system and is listening on the specified port.
Problem: You find the following message:
ICO0064E:com.ibm.connector2.ims.ico.IMSLocalOptionManagedConnection@12d18e56.processSubject(javax.security.auth.subject aSubject) error
Solution: IMS Connect was unable to validate the user ID and password that were sent in your request with the external security manager. Either change the user ID and password for the access, provide the correct credentials, or determine if this is correct behavior because only authorized requests should be processed.
Part 4
Chapter 18. Commands
There are various commands that can be useful when you are trying to determine the root cause of problems in WebSphere for z/OS and are preparing to analyze logs, traces, dumps, and diagnostic tool output. This chapter is intended to be a quick reference guide to these commands. Although they are not specific to problem determination for WebSphere for z/OS, we consider them very useful and powerful for performing day-to-day administrative tasks with WebSphere for z/OS and for identifying problems. In the sections that follow, we introduce you to:
A few command-line tools
z/OS commands for WebSphere such as MODIFY, DISPLAY, and TRACE
Some TCP/IP commands
Related USS for z/OS (OMVS) commands such as df, du, ps, pid
The WASgrep.sh command for searching string patterns
The Windows FTP command
The message for your commands is issued to the job log of your WebSphere for z/OS server and to your z/OS system log. As with other z/OS system commands, you can issue them either on a system console or you can use a system command interface, such as SDSF or JES33 Spooler Interface (EJES). The general syntax for the system commands is:
MODIFY <Server_Name>,DISPLAY,<Options>
The syntax is broken down as follows:
Server name   Server name as specified in the JCL
DISPLAY       Fixed keyword
Options       HELP, SERVERS, TRACE, WORK
The MODIFY command can be abbreviated by using the F character, for example:
f bboasr2a,display,trace
Note that the system commands are not case-sensitive.
MODIFY <Server_Name>,DISPLAY returns this information:
STC/server name   Started task name and server name
Status            ACTIVE
System name       SYSID of the system where WebSphere for z/OS is active
Level             Build level of your WebSphere for z/OS server
MODIFY <Server_Name>,DISPLAY
Example 18-2 shows the result of this MODIFY command.
Example 18-2 MODIFY <Server_Name>,DISPLAY
F PDSR01A,DISPLAY
BBOO0173I SERVER PDSR01/PDSR01A ACTIVE ON SC49 AT LEVEL W510004.
BBOO0188I END OF OUTPUT FOR COMMAND DISPLAY
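If you collect these replies from the system log, the four fields of the BBOO0173I message can be picked apart with a regular expression. The following Python sketch is only an illustration that assumes the reply has exactly the layout shown in Example 18-2. The function name parse_display_reply and the dictionary keys are made up for this example.

```python
import re

# Layout assumed from Example 18-2:
# BBOO0173I SERVER <stc/server> <status> ON <system> AT LEVEL <level>.
DISPLAY_REPLY = re.compile(
    r"BBOO0173I SERVER (?P<stc_server>\S+) (?P<status>\S+) "
    r"ON (?P<system>\S+) AT LEVEL (?P<level>\S+?)\.?$"
)


def parse_display_reply(line):
    """Return the fields of a BBOO0173I reply as a dict, or None if no match."""
    match = DISPLAY_REPLY.search(line.strip())
    return match.groupdict() if match else None


reply = "BBOO0173I SERVER PDSR01/PDSR01A ACTIVE ON SC49 AT LEVEL W510004."
print(parse_display_reply(reply))
```

Lines that are not BBOO0173I replies, such as the closing BBOO0188I message, yield None and can simply be skipped.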
MODIFY <Server_Name>,DISPLAY,HELP
Example 18-3 shows the result of this MODIFY command.
Example 18-3 MODIFY <Server_Name>,DISPLAY,HELP
F PDSR01A,DISPLAY,HELP
BBOO0178I THE COMMAND DISPLAY, MAY BE FOLLOWED BY ONE OF THE FOLLOWING 042
KEYWORDS:
BBOO0179I SERVERS - DISPLAY ACTIVE CONTROL PROCESSES
BBOO0179I SERVANTS - DISPLAY SERVANT PROCESSES OWNED BY THIS CONTROL 044
PROCESS
BBOO0179I SESSIONS - DISPLAY INFORMATION ABOUT COMMUNICATIONS SESSIONS
BBOO0179I TRACE - DISPLAY INFORMATION ABOUT TRACE SETTINGS
BBOO0179I JVMHEAP - DISPLAY JVM HEAP STATISTICS
BBOO0179I WORK - DISPLAY WORK ELEMENTS
BBOO0179I ERRLOG - DISPLAY THE LAST 10 ENTRIES IN THE ERROR LOG
BBOO0188I END OF OUTPUT FOR COMMAND DISPLAY,HELP
MODIFY <Server_Name>,DISPLAY,SERVERS
Example 18-4 shows the result of this MODIFY command.
Example 18-4 MODIFY <Server_Name>,DISPLAY,SERVERS
F PDSR01A,DISPLAY,SERVERS
BBOO0182I SERVER             ASID   SYSTEM   LEVEL
BBOO0183I PDCELL /SC49       3F5x   SC49     W510004
BBOO0183I PDAGNTB /PDAGNTB   78x    SC42     W510004
BBOO0183I PDSR01 /PDSR01A    3F6x   SC49     W510004
BBOO0183I PDCELL /SC42       72x    SC42     W510004
BBOO0183I PDDMGR /PDDMGR     3EBx   SC49     W510004
BBOO0183I PDAGNTA /PDAGNTA   3ECx   SC49     W510004
BBOO0188I END OF OUTPUT FOR COMMAND DISPLAY,SERVERS
MODIFY <Server_Name>,DISPLAY,SERVERS returns the following information:
SERVER   STC and server name for all active servers
ASID     ASID for all active servers
SYSTEM   System ID (SYSID) of the system on which the server is active
LEVEL    Build level of each active server
MODIFY <Server_Name>,DISPLAY,TRACE
Example 18-5 shows the result of this MODIFY command.
Example 18-5 MODIFY <Server_Name>,DISPLAY,TRACE
F PDSR01A,DISPLAY,TRACE
BBOO0224I TRACE INFORMATION FOR SERVER PDSR01/PDSR01A/STC05755
BBOO0197I LOCATION = SYSPRINT BUFFER
BBOO0197I AGGREGATE TRACE LEVEL = 1
BBOO0197I EXCEPTION TRACING = RAS(0), Common Utilities(1), COMM(3), 059
ORB(4), OTS(6), Shasta(7), OS/390 Wrappers(9), Daemon(A), Security(E),
Externalization(F), JRAS(J), J2EE(L), Logging(M)
BBOO0197I BASIC TRACING =
BBOO0197I DETAILED TRACING =
BBOO0197I TRACE SPECIFIC = NONE SPECIFIED
BBOO0197I TRACE EXCLUDE SPECIFIC = NONE SPECIFIED
BBOO0225I TRACE INFORMATION FOR SERVER PDSR01/PDSR01A/STC05755 064
COMPLETE
The MODIFY <Server_Name>,DISPLAY,TRACE command returns the following information:
SERVER/STC   Started task name and server name
LOCATION     Target location for trace data
LEVEL        Trace level
OPTIONS      Trace options
MODIFY <Server_Name>,DISPLAY,ERRLOG
You can use this command to display the last 10 messages in the error log even if you are not routing them to a log stream. Example 18-6 on page 199 shows only the last three entries in the log.
Example 18-6 WebSphere for z/OS DISPLAY,ERRLOG command
F DISPLAY,ERRLOG
BBOO0188I END OF OUTPUT FOR COMMAND DISPLAY,WORK,ALL,SRS
F PDSR01A,DISPLAY,ERRLOG
BBOO0266I (STC05755) BossLog: { 0002} 2004/09/21 22:13:44.668 01 138
SYSTEM=SC49 SERVER=PDSR01A PID=0X03080069 TID=0X216AC930 00000000 c=UNK
./bborjtr.cpp+830 ... BBOO0222I TRAS0017I: The startup trace state is *=all=disabled.
BBOO0266I (STC05755) BossLog: { 0003} 2004/09/21 22:13:53.044 01 139
SYSTEM=SC49 SERVER=PDSR01A PID=0X03080069 TID=0X216AC930 00000000 c=UNK
./bborjtr.cpp+830 ... BBOO0222I SECJ0231I: The Security component's FFDC Diagnostic Module com.ibm.ws.security.core.SecurityDM registered successfully: true
BBOO0266I (STC05755) BossLog: { 0004} 2004/09/21 22:13:53.485 01 140
9} 2004/09/21 22:14:51.908 01 145
SYSTEM=SC49 SERVER=PDSR01A PID=0X03080069 TID=0X2173C430 0X00001A c=UNK
./bborjtr.cpp+842 ... BBOJ0087W MDB Workload Classification Support is not enabled
BBOO0188I END OF OUTPUT FOR COMMAND DISPLAY,ERRLOG
DISPLAY WLM,APPLENV=*
To display the WLM application environment names for WebSphere for z/OS V5, use this command as shown in Example 18-7.
Example 18-7 DISPLAY WLM,APPLENV=*
D WLM,APPLENV=*
IWM029I 08.28.43 WLM DISPLAY 768
APPLICATION ENVIRONMENT NAME   STATE       STATE DATA
BBOASR1                        AVAILABLE
BBOASR2                        AVAILABLE
CBINTFRP                       AVAILABLE
CBNAMING                       AVAILABLE
CBSYSMGT                       AVAILABLE
C1INTFRP                       AVAILABLE
C1NAMING                       AVAILABLE
C1OASR1                        AVAILABLE
C1OASR2                        AVAILABLE
C1SYSMGT                       AVAILABLE
DBD7MWLM                       AVAILABLE
DBD8MWLM                       AVAILABLE
DB7AODBA                       AVAILABLE
DB7EUTIL                       AVAILABLE
DB7EWLM                        AVAILABLE
DB7LSQL                        AVAILABLE
DB7LUTIL                       AVAILABLE
DB7LWLM                        AVAILABLE
DB7LWLM2                       QUIESCED
DB7PDBUG                       AVAILABLE
DB7PJAVS                       AVAILABLE
DB7PODBA                       AVAILABLE
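In a long display such as Example 18-7, the application environments that are not AVAILABLE are easy to overlook. The following Python sketch is a simple illustration that assumes you pass it only the data rows of the display, each with the environment name followed by its state. The function name unavailable_environments is hypothetical.

```python
def unavailable_environments(rows):
    """Return (name, state) for every row whose state is not AVAILABLE."""
    problems = []
    for row in rows:
        fields = row.split()
        if len(fields) >= 2 and fields[1] != "AVAILABLE":
            problems.append((fields[0], fields[1]))
    return problems


# A few data rows taken from Example 18-7.
rows = [
    "BBOASR1 AVAILABLE",
    "DB7LWLM AVAILABLE",
    "DB7LWLM2 QUIESCED",
    "DB7PDBUG AVAILABLE",
]

print(unavailable_environments(rows))
```

Header lines and the IWM029I line must be removed before the rows are passed in, because the sketch does not try to recognize them.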
DISPLAY WLM,DYNAPPL=*
To list all the dynamic application environment names (for WebSphere for z/OS V6), use this command as shown in Example 18-8 on page 200.
Example 18-8 DISPLAY WLM,DYNAPPL=*
D WLM,DYNAPPL=*
IWM029I 08.39.26 WLM DISPLAY 854
DYNAMIC APPL. ENVIRON. NAME   STATE       STATE DATA
PDSR01                        AVAILABLE
 ATTRIBUTES: PROC=PDASR SUBSYSTEM TYPE: CB
 SUBSYSTEM NAME: PDSR01A
 NODENAME: PDCELL
PDDMGR                        AVAILABLE
 ATTRIBUTES: PROC=PDASR SUBSYSTEM TYPE: CB
 SUBSYSTEM NAME: PDDMGR
 NODENAME: PDCELL
CLU491                        AVAILABLE
 ATTRIBUTES: PROC=WS5491S SUBSYSTEM TYPE: CB
 SUBSYSTEM NAME: WS491
 NODENAME: CL491
Explanation Trace settings XCF parameters and couple data sets | structures JES2 spool utilization
To turn off all tracing:
F <server_name>,TRACENONE
Other useful trace commands are:
F <control_region_JOBNAME>,TRACESPECIFIC
F <control_region_JOBNAME>,TRACE_EXCLUDE_SPECIFIC
F <control_region_JOBNAME>,TRACETOTRCFILE
F <control_region_JOBNAME>,MDBSTATS
The z/OS MODIFY command does not require the server to be recycled. To turn on Java tracing for specified components such as com.ibm.ws.security, enter:
F <server_name>,TRACEJAVA='com.ibm.ws.security.yyy.*=all=enabled'
To reset to the trace settings in your configuration (such as in was.env), enter:
F <server_name>,TRACEINIT
To turn off all tracing, enter:
F <server_name>,TRACENONE
For more information, search for Dynamic Java Trace at the WebSphere for z/OS Information Center: http://publib.boulder.ibm.com/infocenter/wasinfo/v6r0/index.jsp
State
You should be most interested in the normal session states of LISTEN (a server waiting for work), ESTABLISHED (a communication between client and server is in progress), and SYN_SENT (usually means that an attempt to establish a connection is being blocked by a firewall). For an explanation of the other states, refer to TCP/IP V3.2 for MVS: User's Guide, SC13-7136. Example 18-9 shows the output after we issued the netstat command in our z/OS system.
Example 18-9 The netstat command and its response
>tso netstat
MVS TCP/IP NETSTAT CS V1R6       TCPIP Name: TCPIP       15:29:47
User Id   Conn      Local Socket       Foreign Socket     State
-------   ----      ------------       --------------     -----
FTPMVS1   0000002A  9.12.4.28..21      0.0.0.0..0         Listen
FTPOE1    0000002B  9.12.4.29..21      0.0.0.0..0         Listen
INETD1    00000033  9.12.4.29..512     0.0.0.0..0         Listen
INETD1    00000030  9.12.4.29..23      0.0.0.0..0         Listen
INETD1    00000032  0.0.0.0..513       0.0.0.0..0         Listen
INETD1    00000031  9.12.4.29..514     0.0.0.0..0         Listen
NFSMVS    00000050  0.0.0.0..10001     0.0.0.0..0         Listen
NFSMVS    0000004F  0.0.0.0..10000     0.0.0.0..0         Listen
NFSMVS    00000055  0.0.0.0..2049      0.0.0.0..0         Listen
NFSMVS    00000052  0.0.0.0..10002     0.0.0.0..0         Listen
PMAP      00000028  0.0.0.0..111       0.0.0.0..0         Listen
REXECD    00000025  9.12.4.28..512     0.0.0.0..0         Listen
REXECD    00000026  9.12.4.28..514     0.0.0.0..0         Listen
TCPIP     00000016  127.0.0.1..1024    127.0.0.1..1025    Establsh
TCPIP     000206E2  9.12.4.28..23      9.12.6.132..1438   Establsh
TCPIP     0000001B  0.0.0.0..23        0.0.0.0..0         Listen
TCPIP     0000000C  127.0.0.1..1024    0.0.0.0..0         Listen
TCPIP     00000015  127.0.0.1..1025    127.0.0.1..1024    Establsh
TCPIP     0001492A  9.12.4.28..23      9.12.6.136..2330   Establsh
WASPDCTG  000017E3  0.0.0.0..2006      0.0.0.0..0         Listen
WS551     00000275  0.0.0.0..19080     0.0.0.0..0         Listen
WS551     0000024D  0.0.0.0..10003     0.0.0.0..0         Listen
WS551     00000249  0.0.0.0..18880     0.0.0.0..0         Listen
WS551     00000276  0.0.0.0..19443     0.0.0.0..0         Listen
WS551     0000024B  0.0.0.0..12809     0.0.0.0..0         Listen
WS551D    0000023B  0.0.0.0..15655     0.0.0.0..0         Listen
WS6552    0001385F  0.0.0.0..29080     0.0.0.0..0         Listen
WS6552    00013839  0.0.0.0..10044     0.0.0.0..0         Listen
WS6552    00013837  0.0.0.0..22809     0.0.0.0..0         Listen
WS6552    00013835  0.0.0.0..28880     0.0.0.0..0         Listen
WS6552    00013860  0.0.0.0..29443     0.0.0.0..0         Listen
WS6552D   0001381F  0.0.0.0..25655     0.0.0.0..0         Listen
WS6552S   00014871  9.12.4.28..10054   9.12.4.30..38050   ClosWait
NFSCLNT   00000022  0.0.0.0..1017      *..*               UDP
NFSCLNT   00000021  0.0.0.0..1018      *..*               UDP
NFSCLNT   00000020  0.0.0.0..1019      *..*               UDP
NFSCLNT   0000001F  0.0.0.0..1020      *..*               UDP
NFSCLNT   0000001E  0.0.0.0..1021      *..*               UDP
NFSCLNT   0000001D  0.0.0.0..1022      *..*               UDP
NFSCLNT   0000001C  0.0.0.0..1023      *..*               UDP
NFSMVS    00000054  0.0.0.0..2049      *..*               UDP
NFSMVS    0000004E  0.0.0.0..10001     *..*               UDP
NFSMVS    00000051  0.0.0.0..10002     *..*               UDP
NFSMVS    0000004D  0.0.0.0..10000     *..*               UDP
NFSMVS    00000053  0.0.0.0..10003     *..*               UDP
PMAP      00000027  0.0.0.0..111       *..*               UDP
SYSLOGD4  00000024  0.0.0.0..514       *..*               UDP
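When the netstat display is long, a quick count of connections per state shows at a glance whether anything other than Listen and Establsh is present. The following Python sketch assumes that each data row of Example 18-9 has five whitespace-separated fields (user ID, connection ID, local socket, foreign socket, state) and that the connection ID is eight hexadecimal digits. The function name count_states is hypothetical.

```python
import re
from collections import Counter

CONNECTION_ID = re.compile(r"[0-9A-Fa-f]{8}")


def count_states(netstat_text):
    """Count rows per state; header and separator lines are ignored."""
    counts = Counter()
    for row in netstat_text.splitlines():
        fields = row.split()
        # A data row has exactly five fields and a hexadecimal connection ID.
        if len(fields) == 5 and CONNECTION_ID.fullmatch(fields[1]):
            counts[fields[4]] += 1
    return counts


# A few rows taken from Example 18-9.
sample_netstat = """\
User Id  Conn     Local Socket  Foreign Socket  State
FTPMVS1  0000002A 9.12.4.28..21 0.0.0.0..0 Listen
TCPIP    00000016 127.0.0.1..1024 127.0.0.1..1025 Establsh
WS6552S  00014871 9.12.4.28..10054 9.12.4.30..38050 ClosWait
NFSCLNT  00000022 0.0.0.0..1017 *..* UDP
"""

print(count_states(sample_netstat))
```

A state such as ClosWait that stays in the list over several displays is a reasonable starting point for a closer look at the owning address space.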
Example 18-11 shows sample output from the ping command on a workstation.
Example 18-11 The ping command and its response
C:\Documents and Settings\TOT188>ping wtsc55.itso.ibm.com
Pinging wtsc55.itso.ibm.com [9.12.4.28] with 32 bytes of data:
Reply from 9.12.4.28: bytes=32 time=4ms TTL=63
Reply from 9.12.4.28: bytes=32 time=4ms TTL=63
Reply from 9.12.4.28: bytes=32 time=4ms TTL=63
Reply from 9.12.4.28: bytes=32 time=7ms TTL=63
Ping statistics for 9.12.4.28:
    Packets: Sent = 4, Received = 4, Lost = 0 (0% loss),
Approximate round trip times in milli-seconds:
    Minimum = 4ms, Maximum = 7ms, Average = 4ms
Example 18-12 shows sample output from the tracert command on a workstation.
Example 18-12 The tracert command and its response
C:\Documents and Settings\TOT188>tracert plpsc.pok.ibm.com
Tracing route to plpsc.pok.ibm.com [9.56.214.1] over a maximum of 30 hops:
  1    3 ms    3 ms    3 ms  pok6509r.itso.ibm.com [9.12.6.92]
  2    4 ms    4 ms    4 ms  9.56.1.189
  3    4 ms    4 ms    4 ms  pok-ud-2a-v993.pok.ibm.com [9.56.126.3]
  4    4 ms    4 ms    4 ms  pok-co-a-v808.pok.ibm.com [9.56.2.33]
  5    4 ms    4 ms    4 ms  pok-bd-b-ge0-4.pok.ibm.com [9.56.2.6]
  6    4 ms    4 ms    4 ms  pok-sc-a-v256.pok.ibm.com [9.56.1.6]
  7    7 ms    4 ms    5 ms  pok-sd-5b-ge2-1.pok.ibm.com [9.56.208.4]
  8    8 ms    5 ms    4 ms  plpsc.pok.ibm.com [9.56.214.1]
Trace complete.
WASPD2 @ SC55:/waspdconfig/pdcell>df -k ./*
Mounted on            Filesystem                     Avail/Total    Files       Status
/waspdconfig/pdcell   (OMVS.WAS.PDCELL.CONFIG.HFS)   139200/288000  4294949456  Available
/waspdconfig/rescell  (OMVS.WAS.RESCELL.CONFIG.HFS)  222224/288000  4294960903  Available
Figure 18-1 USS command df display
This display shows that there are two file systems mounted under the /waspdconfig subdirectory:
./pdcell has 139200 K available out of a total of 288000 K (about 52% full).
./rescell has 222224 K available out of a total of 288000 K (23% full).
In this command:
du -k . displays a list that consists of all the files and subdirectories.
sort -r sorts the list in reverse (descending) order.
head -n 20 displays only the top 20 rows of the sorted list.
These three commands are connected with the UNIX pipe (|) function. The result is a list of rows sorted in descending order. Each row consists of two columns: the size in kilobyte blocks and the name of the subdirectory, as shown in Example 18-13.
Example 18-13 The du command output
WASPD2 @ SC55:/waspdconfig/pdcell>du -k | sort -r | head -n 20
119580 .
84644 ./DeploymentManager
43436 ./DeploymentManager/installedApps
43428 ./DeploymentManager/installedApps/pdcell
43276 ./DeploymentManager/installedApps/pdcell/adminconsole.ear
42596 ./DeploymentManager/installedApps/pdcell/adminconsole.ear/adminconsole.war
29576 ./DeploymentManager/installedApps/pdcell/adminconsole.ear/adminconsole.war/WEB-INF
27000 ./AppServerNodeA
16144 ./DeploymentManager/wstemp
14316 ./DeploymentManager/config
13900 ./AppServerNodeA/config
13724 ./DeploymentManager/config/cells
13708 ./DeploymentManager/config/cells/pdcell
13128 ./DeploymentManager/config/cells/pdcell/applications
12392 ./AppServerNodeA/config/backup
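A plain sort -r compares lines as text, so whether the largest directories really come first can depend on how du pads the size column. As a minimal sketch, assuming du -k style rows of "size path", the following Python ranks the rows by numeric size instead. The function name top_entries and the sample text are illustrative only and are not part of any z/OS tool.

```python
def top_entries(du_output, limit=20):
    """Return the largest (size_kb, path) rows from `du -k` style text."""
    rows = []
    for line in du_output.splitlines():
        parts = line.split(None, 1)  # size first, then the rest as the path
        if len(parts) == 2 and parts[0].isdigit():
            rows.append((int(parts[0]), parts[1]))
    # Sort on the numeric size so that 84644 ranks above 9000, which a
    # character-by-character comparison would not guarantee.
    rows.sort(key=lambda row: row[0], reverse=True)
    return rows[:limit]


# Sample rows taken from Example 18-13, deliberately out of order.
sample = """27000 ./AppServerNodeA
119580 .
84644 ./DeploymentManager
"""

for size_kb, path in top_entries(sample, limit=2):
    print(size_kb, path)
```

On the command line, the POSIX sort -n option serves the same purpose.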
CPU%  ASID  ASIDX
0.00    89  0059
0.02    92  005C
0.00    94  005E
0.02    97  0061
0.02   101  0065
0.02  1005  03ED
0.04  1011  03F3
COMMAND INPUT ===> /D OMVS,ASID=65                                SCROLL ==
RESPONSE=SC49
BPXO040I 18.21.20 DISPLAY OMVS 833
OMVS     000E ACTIVE          OMVS=(5A)
USER     JOBNAME  ASID  PID       PPID  STATE   START     CT_SECS
ASSR1    PDSR01AS 0065  84410478  1     HR----  17.53.49  28.11
  LATCHWAITPID= 0 CMD=BBOSR
From OMVS command line:
>ps -p 84410478 -m | wc -l
25
In this sample, you can see how we issued commands based on the result of the previous command:
1. SDSF shows the ASIDX for PDSR01AS as 0065.
2. /D OMVS,ASID=65 shows the PID as 84410478.
3. ps -p 84410478 -m | wc -l gives the number of threads as 25.
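SDSF lists each address space ID in decimal (ASID) and in hexadecimal (ASIDX), and the D OMVS,ASID= command in this sample is given the hexadecimal value. The following Python sketch shows the conversion in both directions. The helper names are hypothetical and exist only for this illustration.

```python
def asidx_to_asid(asidx):
    """Convert the hexadecimal ASIDX column to the decimal ASID value."""
    return int(asidx, 16)


def asid_to_asidx(asid):
    """Convert a decimal ASID to the 4-digit hexadecimal ASIDX form."""
    return format(asid, "04X")


# Values taken from the SDSF display in this sample.
print(asidx_to_asid("0065"))
print(asid_to_asidx(1011))
```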
CPU%  ASID  ASIDX
0.00    89  0059
0.02    92  005C
0.00    94  005E
0.02    97  0061
0.02   101  0065
0.02  1005  03ED
0.04  1011  03F3
COMMAND INPUT ===> /D OMVS,ASID=65                                SCROLL ==
RESPONSE=SC49
BPXO040I 18.21.20 DISPLAY OMVS 833
OMVS     000E ACTIVE          OMVS=(5A)
USER     JOBNAME  ASID  PID       PPID  STATE   START     CT_SECS
ASSR1    PDSR01AS 0065  84410478  1     HR----  17.53.49  28.11
  LATCHWAITPID= 0 CMD=BBOSR
COMMAND INPUT ===> /D OMVS,PID=84410478                           SCROLL ==
RESPONSE=SC49
BPXO040I 18.22.14 DISPLAY OMVS 835
OMVS     000E ACTIVE          OMVS=(5A)
USER     JOBNAME  ASID  PID       PPID  STATE   START     CT_SECS
ASSR1    PDSR01AS 0065  84410478  1     HR----  17.53.49  28.11
  LATCHWAITPID= 0 CMD=BBOSR
THREAD_ID        TCB@     PRI_JOB USERNAME ACC_TIME SC  STATE
2172D91000000000 006F4118                    26.611 PTC YU
2172E62000000001 006E0170                      .001 PTX JY V
2173004000000002 006D04F0                      .001 PTX JY V
21730D5000000003 006D0260                      .003 RED JY V
2173277000000004 006DCE88                      .005 CLO JY V
2173348000000005 006DCCF0                      .085 STE JY V
21734EA000000006 006DCB58                      .001 PTX JY
21735BB000000007 006DC9C0                      .001 PTX JY V
217368C000000008 006DC690                      .001 PTX JY V
217375D000000009 006DC360                      .094 STE JY V
217382E00000000A 006DC1C8                      .001 PTX JY V
21738FF00000000B 006CFE88                      .443 STE JY V
21739D000000000C 006CFCF0                      .004 RED JY V
2173AA100000000D 006CFA60                      .003 RED JY V
21739D000000000C 006CFCF0                      .004 RED JY V
2173AA100000000D 006CFA60                      .003 RED JY V
2173B72000000011 006CF7D0                      .001 PTX JY V
2173C43000000014 006CCE88                      .001 PTX JR V
2173D14000000015 006CF180                      .001 PTX JR V
2173DE5000000016 006C71F0 WLM                  .001 PTX JR V
2177CC2000000017 006C7388 WLM                  .001 PTX JR V
2178830000000018 006C7520 WLM                  .001 PTX JR V
2178B74000000019 006C76B8 WLM                  .001 PTX JR V
2178C4500000001A 006CC0B0 WLM                  .001 PTX JR V
2178D1600000001B 006CC248 WLM                  .001 PTX JR V
Figure callouts: Thread ID (THREAD_ID column), TCB address (TCB@ column), CPU time (ACC_TIME column), WLM threads (rows with WLM in the PRI_JOB column)
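When the thread list is long, it can help to summarize it rather than read it row by row. The following Python sketch parses rows in the layout of the D OMVS,PID= display above and reports the number of threads, the number of WLM threads, and the accumulated CPU time. It assumes that the USERNAME column is blank, as it is in this sample, and that at most the PRI_JOB value appears between the TCB address and ACC_TIME. The function names are hypothetical.

```python
import re

# THREAD_ID, TCB@, optional PRI_JOB, ACC_TIME, SC, STATE
ROW = re.compile(
    r"^([0-9A-F]{16})\s+([0-9A-F]{8})\s+(?:(\S+)\s+)?(\d*\.\d+)\s+(\S+)\s+(.*)$"
)


def parse_thread_row(line):
    """Return one thread row as a dict, or None for non-thread lines."""
    match = ROW.match(line.strip())
    if match is None:
        return None
    thread_id, tcb, pri_job, acc_time, syscall, state = match.groups()
    return {
        "thread_id": thread_id,
        "tcb": tcb,
        "pri_job": pri_job,          # None when the column is blank
        "acc_time": float(acc_time),  # accumulated CPU time in seconds
        "syscall": syscall,
        "state": state,
    }


def summarize(lines):
    """Count threads and WLM threads and total the ACC_TIME column."""
    rows = [row for row in map(parse_thread_row, lines) if row]
    return {
        "threads": len(rows),
        "wlm_threads": sum(1 for row in rows if row["pri_job"] == "WLM"),
        "cpu_seconds": round(sum(row["acc_time"] for row in rows), 3),
    }


# A few rows copied from the display above; the header row is skipped.
sample = [
    "THREAD_ID TCB@ PRI_JOB USERNAME ACC_TIME SC STATE",
    "2172D91000000000 006F4118 26.611 PTC YU",
    "2173348000000005 006DCCF0 .085 STE JY V",
    "2173DE5000000016 006C71F0 WLM .001 PTX JR V",
]
print(summarize(sample))
```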
To use the tool, change (cd) to an appropriate directory. Then, run the shell script with the string pattern that you want to search for:
<script-directory>/WASgrep.sh jmsQCF2
This tool displays the name of each XML file and all the text lines that contain the string pattern. Example 18-16 shows that only the server.xml file contains the was.wlmTimeout search string.
Example 18-16 Search string in server.xml file
cd /waspdconfig/pdcell/AppServerNodeB/config/cells/pdcell/nodes/pdnodeb/servers/pdsr02b/
/u/waspd2/WASgrep.sh was.wlmTimeout
-------------------------------------------
./namestore-cell.xml
-------------------------------------------
./namestore-node.xml
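The same kind of search can be sketched in Python. This is not the WASgrep.sh script. It is an illustrative equivalent that assumes a recursive search below a starting directory, files that end in .xml, and UTF-8 or ASCII file contents (undecodable bytes are replaced rather than treated as errors). The function name search_xml is hypothetical.

```python
import os


def search_xml(root, pattern):
    """Yield (path, line_number, line) for each XML line containing pattern."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in sorted(filenames):
            if not name.endswith(".xml"):
                continue
            path = os.path.join(dirpath, name)
            # errors="replace" keeps the scan going on unexpected encodings.
            with open(path, encoding="utf-8", errors="replace") as handle:
                for number, line in enumerate(handle, start=1):
                    if pattern in line:
                        yield path, number, line.rstrip("\n")
```

For example, iterating over search_xml(".", "was.wlmTimeout") from a server configuration directory would report each matching file, line number, and line.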
After supplying the user ID and password, you can issue PUT and GET commands to transfer files between the two systems.
Chapter 19. Logs for problem determination in WebSphere for z/OS
Use the JOB statement MSGLEVEL parameter to request that the job control statements be printed in the job log output listing. Use MSGLEVEL=(1,1) to receive the maximum amount of information in the following order:
1. JES messages and job statistics
2. All job control statements in the input stream and procedures
3. Messages about job control statements
4. JES and operator messages about the processing of the job: allocation of devices and volumes; execution and termination of job steps and the job; and disposition of data sets
19.1.3 System log and job log output and their interpretation
Each job log output section contains certain types of information:
JESMSGLG: This section contains start-up messages, including a list of environment variable values and server settings, and the service level of WebSphere, for example:
BBOM0007I CURRENT CB SERVICE LEVEL IS build level cf20523.06 release WAS601.ZNATV date 06/07/05 10:24:12.
It also lists the Java service level in a J2EE servant region, for example:
BBOJ0011I JVM Build is J2RE 1.4.2 IBM z/OS Persistent Reusable VM build cm142sr1a-20050209 (JIT enabled: jitc).
JESJCL: This section lists the JCL of the procedure that is running the address space. This is a useful place to look for incorrect STEPLIBs and other JCL-related issues.
JESYSMSG: This section might list more messages and dump information, and provide a list of environment variables and server settings.
CEEDUMP: An exception in the address space might cause this section to be generated. It lists failure information including trace backs (a trace back shows which functions were last called prior to the program failure).
System output (SYSOUT): During normal processing, the SYSOUT should be empty, but there are situations that cause output to be written to this section. If the error log stream cannot connect, the messages set to be written to the error log are written to CERR, which goes to SYSOUT. Trace from the JVM occurs when you set the control_region_jvm_logfile, server_region_jvm_logfile, and server_region_use_java_g environment variables to SYSPRINT.
SYSPRINT: The WebSphere for z/OS trace output can be written to SYSPRINT if the environment variable is ras_trace_outputLocation=SYSPRINT.
Important: When IBM support asks for a trace, always send the entire job output, because each section might have useful information that can help debug the problem.
The job log can contain a variety of information, depending on the environment variable settings. By default, it contains information about the job itself, such as life cycle messages (when it started, when it finished initiating, and so on), the JCL that was used to run the job, data set utilization, and other typical JES messages. There are also WebSphere messages, which start with the BBO prefix.
The messages in the console (what we refer to as SYSLOG messages) are typically related to configuration failures of other products, unrecoverable WebSphere configuration errors, and WebSphere life cycle messages. Messages written explicitly to the job log are more general failure and warning messages. More detailed messages that support these general failures and warnings can be found in the error log.
Other important pieces of useful information that always come out in the job log are the configuration messages. These list the values of the environment variables and the server properties. You also have the option of using various traces and managing their output. For more information about traces, refer to Chapter 4 in WebSphere Application Server for z/OS V6, Troubleshooting and support, GA22-7964-03. Example 19-1 shows a part of SYSLOG, where information generated at start-up for WebSphere for z/OS is captured. In this example, the release, build level, cell name, node name, and procedure name are displayed.
Example 19-1 SYSLOG sample at start-up
+BBOO0239I WEBSPHERE FOR Z/OS SERVANT PROCESS cl6422/nd6422/ws6422 IS STARTING.
+BBOM0007I CURRENT CB SERVICE LEVEL IS build level cf20523.06 release WAS601.ZNATV date 06/07/05 10:24:12.
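When you collect service levels from several servers, it can be convenient to pull the build level and release out of the BBOM0007I message programmatically. The following Python sketch does this for a message in the form shown in Example 19-1. It assumes that the message text sits on a single line (SYSLOG can wrap long messages), and the function name service_level is hypothetical.

```python
import re

SERVICE_LEVEL = re.compile(
    r"BBOM0007I CURRENT CB SERVICE LEVEL IS "
    r"build level (?P<build>\S+) release (?P<release>\S+) "
    r"date (?P<date>\d{2}/\d{2}/\d{2} \d{2}:\d{2}:\d{2})"
)


def service_level(line):
    """Return build, release, and date from a BBOM0007I line, or None."""
    match = SERVICE_LEVEL.search(line)
    return match.groupdict() if match else None


# Message text copied from Example 19-1.
line = ("+BBOM0007I CURRENT CB SERVICE LEVEL IS build level cf20523.06 "
        "release WAS601.ZNATV date 06/07/05 10:24:12.")
print(service_level(line))
```

The date is kept as text because the example does not state whether it is in month/day/year or another order.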
For more information about the messages you can see in the SYSLOG and job log, refer to Appendix A, Messages and codes on page 311, and the WebSphere for z/OS Information Center.
If, however, the server cannot connect to the log stream, the message is instead written to CERR, which puts it in the SYSOUT of the job output. This is indicated by this message:
BBOO0024I ERRORS WILL BE WRITTEN TO CERR FOR JOB <server name>
Important: Even if the server successfully connects to the log stream, there will still be a message saying that errors will be written to CERR. This is because during initialization, before the connection to the log stream is made, errors are written to CERR.
When they are written to CERR, or SYSOUT, messages have a header that looks like that shown in Example 19-3. Notice that it is prefaced with BossLog.
Example 19-3 BBORBLOG sample header
BossLog: { 0001} 2005/08/08 13:06:36.265 01 SYSTEM=SC42 SERVER=<none>
You can view the error log stream output using the BBORBLOG browser. To invoke the browser, go to ISPF option 6 and enter:
ex BBO.SBBOEXEC(BBORBLOG) WAS.SC42.ERROR.LOG
In this example, BBORBLOG resides in BBO.SBBOEXEC and WAS.SC42.ERROR.LOG is the LOG_STREAM_NAME that was configured in the administrative console. Other examples include:
ex BBO.SBBOEXEC(BBORBLOG) WAS.SC42.ERROR.LOG 80
ex BBO.SBBOEXEC(BBORBLOG) WAS.SC42.ERROR.LOG noformat
The browser creates a data set named USERID.LOG_STREAM_NAME (for example, WASPD3.WAS.SC42.ERROR.LOG), which contains the formatted contents of the log stream. When the browser is started, it:
1. Allocates the USERID.LOG_STREAM_NAME data set, overwriting any duplicates
2. Populates the data set with the contents of the log stream
3. Puts the user in browse mode on the data set
Important: Each time BBORBLOG is invoked, a static file is created that overwrites the existing file. To refresh the file, it is necessary to re-issue BBORBLOG. If you want to keep the last log, you must rename it before running the tool again.
You can also invoke the BBORBLOG utility from the OMVS shell or Telnet using the shell script shown in Example 19-4.
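Example 19-4 Shell script to format error log stream from OMVS shell or Telnet
/* REXX */
/* trace r */
/* Arguments: log stream name and format (defaults are set below) */
parse arg logstrm format .
if logstrm = '' then logstrm = "WAS.ERROR.LOG"
if format = '' then format = "80"
/* Build a per-user output file name under /tmp */
qual = userid()
file_name = "/tmp/" || qual || ".errorlog"
/* Start from an empty output file */
"rm " || file_name
"touch " || file_name
call syscalls 'SIGOFF'
/* Allocate the output file to the BBOLOG DD, run BBORBLOG, and free the DD */
call bpxwdyn "alloc fi(bbolog) path('" || file_name || "')"
address LINKMVS "BBORBLOG logstrm format"
call bpxwdyn "free fi(bbolog)"
/* Open the formatted log in the editor */
"vi " || file_name
exit(0)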
The line numbers together with the output listed in Table 19-1 can help you analyze and understand the output of this log.
Table 19-1 Parts of server log stream record output
Line number | Component | Description
1 | 2005/08/08 20:11:04.658 01 | Date, time stamp, 2-digit record version number
1 | SYSTEM=SC42 | System name
1 | SERVER=WS6422 | Server name
2 | ASID=0X0403 | ASID
2 | PID=0X0301014D | PID
2 | TID=0X22172FA0 0X000019 | Thread identifier (TID)
2 | c=UNK | Request correlation information
Lines 1 and 2 help identify when and where the error occurred.
3 | ./bbooboat.cpp+3152 | File name and line
3 | BBOO0011W | Log message number
The message number can provide detailed information about the error.
3, 4, 5, 6 | The function | Log message
It shows you which function was active at the moment of the error, which is useful information for describing the problem to the IBM Support Center.
6 | Error code is C9C2102F | Error code
Sometimes, the error code is more meaningful than the message number.
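The first two lines of a record can be split into named fields mechanically, which is handy when you filter a large formatted log by server, address space, or time. The following Python sketch parses the sample values from Table 19-1. It assumes the KEY=VALUE layout shown in the table, and the function name parse_record_header is hypothetical.

```python
import re
from datetime import datetime

STAMP = re.compile(r"\d{4}/\d{2}/\d{2} \d{2}:\d{2}:\d{2}\.\d{3}")


def parse_record_header(line1, line2):
    """Parse lines 1 and 2 of an error log record into a dict."""
    text = line1.strip() + " " + line2.strip()
    stamp = STAMP.search(text)
    fields = dict(re.findall(r"(\w+)=(\S+)", text))
    return {
        "timestamp": datetime.strptime(stamp.group(0), "%Y/%m/%d %H:%M:%S.%f"),
        "system": fields.get("SYSTEM"),
        "server": fields.get("SERVER"),
        "asid": int(fields["ASID"], 16),  # values carry a 0X prefix
        "pid": int(fields["PID"], 16),
        "tid": fields.get("TID"),
        "correlation": fields.get("c"),
    }


# Sample values from Table 19-1.
header = parse_record_header(
    "2005/08/08 20:11:04.658 01 SYSTEM=SC42 SERVER=WS6422",
    "ASID=0X0403 PID=0X0301014D TID=0X22172FA0 0X000019 c=UNK",
)
print(header["server"], hex(header["asid"]), header["timestamp"].isoformat())
```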
For further details about BBORBLOG see WebSphere Application Server for z/OS V6, Troubleshooting and support, GA22-7964-03, and the WebSphere for z/OS Information Center at: http://publib.boulder.ibm.com/infocenter/wasinfo/v6r0/index.jsp For assistance with error messages, see Appendix A, Messages and codes on page 311.
[Figure: FFDC components. Labels shown: Errors, Events; FFDC High-Performance Filter; Incidents; Diagnostic Engine; Directives DB; Log]
The servant regions of WebSphere for z/OS have been instrumented with calls to the FFDC tool. When an exception occurs, the event is passed through the error filter to the diagnostic engine. If the analysis engine is enabled, it retrieves any information that relates to the event from the symptom database. Each component of WebSphere Application Server for z/OS contains a symptom database for the FFDC tool. This symptom database is located in the <install_root>/properties/logbr/ffdc/adv/ directory installed under all members of the cell, including the deployment manager and all installed nodes. When the analysis engine tool is run against an exception log, the symptom database, ffdcdb.xml, is used to try to provide a solution to the exceptions that are captured by the FFDC tool. The analysis engine uses the name, the source ID, and the probe ID from the exception as the key to the symptom database. If a match exists, the solution is displayed after the analysis engine is run.
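The lookup that the analysis engine performs can be pictured as a dictionary keyed on the three values named above. The following Python sketch is only a model of that idea. It does not read ffdcdb.xml, whose format is not described here, and the function names and the single database entry are invented for the illustration.

```python
def incident_key(exception, source_id, probe_id):
    """Build the lookup key from exception name, source ID, and probe ID."""
    return (exception, source_id, str(probe_id))


def find_solution(symptom_db, exception, source_id, probe_id):
    """Return the stored solution text, or None when there is no match."""
    return symptom_db.get(incident_key(exception, source_id, probe_id))


# Hypothetical symptom database with one invented entry.
symptom_db = {
    incident_key("java.lang.ExampleException", "com.example.Source.method", 42):
        "Example solution text",
}

# An incident without a matching entry yields no solution.
print(find_solution(
    symptom_db,
    "java.lang.IllegalStateException",
    "com.ibm.ws.webcontainer.servlet.ServletManager.doService",
    3891,
))
```

Because all three values form the key, the same exception class raised from a different source or probe is treated as a different symptom.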
If a particular cell member is of interest during a runtime problem determination session, only that particular FFDC capability has to be activated. The only property that needs to be reset in the ffdcRun.properties file is the Level property. For a WebSphere for z/OS V5 installation, the Level property is set to a value of 0 by default. Figure 19-2 shows the possible Level values and their effects.
# Level of processing to perform
# 0 - none
# 1 - monitor exception path
# 2 - dump the call stack, with no advanced processing
# 3 - 2, plus object interspecting the current object
# 4 - 2, plus use DM to process the current object
# 5 - 4, plus process the top part of the call stack with DMs
# 6 - perform advanced processing the entire call stack
Figure 19-2 ffdcRun.properties level values
In this example, the default value is set to 4. The authors recommend that you use the same value as a starting point. However, if you want to set it to another logging level, use the following process:
1. Locate the ffdcRun.properties file for the address space of interest.
2. Open the ffdcRun.properties file for editing in any ASCII-capable editor.
3. Set the Level key of the ffdcRun.properties to the level of your choice (see Figure 19-2).
4. Save the ffdcRun.properties file.
5. Restart the address space.
Important: In z/OS, each address space produces its own set of FFDC files. Therefore, it is important to determine which address space is of interest during the runtime debug process to enable the FFDC tool for that specific address space.
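If you change the level on several address spaces, the edit in step 3 can be scripted. The following Python sketch rewrites the Level key in the text of a properties file. It assumes that the key is written as Level=<n> on its own line and that the valid range is 0 to 6, as listed in Figure 19-2. The function name set_level is hypothetical, and the address space still has to be restarted afterward.

```python
import re

# Matches the key and separator; comment lines start with # and are skipped.
LEVEL_LINE = re.compile(r"^([ \t]*Level[ \t]*[=:][ \t]*)\S*", re.MULTILINE)


def set_level(properties_text, level):
    """Return properties_text with the Level key set to the given value."""
    if level not in range(0, 7):
        raise ValueError("Level must be between 0 and 6")
    new_text, count = LEVEL_LINE.subn(
        lambda match: match.group(1) + str(level), properties_text
    )
    if count == 0:
        # The key was not present, so append it.
        new_text = properties_text.rstrip("\n") + "\nLevel=" + str(level) + "\n"
    return new_text


original = "# Level of processing to perform\nLevel=0\n"
print(set_level(original, 4))
```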
The index file uses a naming convention of <server name>_exception.log and has the following columns:
Index: This column is used to determine the number of rows in the table. A plus sign (+) in front of this value signifies that this value has been updated since the last persistence of the information to the file system.
The number of times that the exception has occurred.
A time stamp of the last occurrence of this exception.
The name of the Java exception class that was captured by the FFDC tool.
The unique identifier for the exception source.
The unique identifier for the probe used in the data capture.
The exception file in Example 19-7 was produced by setting the Level to 4 in the ffdcRun.properties file. It has the stack trace for a java.lang.IllegalStateException and a dump of the this object and its properties. This was not captured in the z/OS system log, so the information captured by FFDC was over and above the normal logging.
Example 19-7 FFDC exception file
------Start of DE processing------ = [04.09.27 14:58:58:371 GMT] , key = java.lang.IllegalStateException com.ibm.ws.webcontainer.servlet.ServletManager.doService 3891
Exception = java.lang.IllegalStateException
Source = com.ibm.ws.webcontainer.servlet.ServletManager.doService
probeid = 3891
Stack Dump = java.lang.IllegalStateException: Context has not been prepared for next connection
at com.ibm.ws.webcontainer.srt.NilSRPConnection.getHeaderNames(SRTConnectionContext.java:482)
at com.ibm.ws.webcontainer.srt.SRTServletRequest.prepareHeader(SRTServletRequest.java(Compiled Code))
at com.ibm.ws.webcontainer.srt.SRTServletRequest.getHeader(SRTServletRequest.java:307)
at . . .
at com.ibm.ws390.orb.ORBEJSBridge.invoke(ORBEJSBridge.java:170)
Dump of callerThis =
Object type = com.ibm.ws.webcontainer.servlet.StrictServletInstance
com.ibm.ws.webcontainer.servlet.StrictServletInstance@4045fee5
Exception = java.lang.IllegalStateException
Source = com.ibm.ws.webcontainer.servlet.ServletManager.doService
probeid = 3891
Dump of callerThis =
Object type = com.ibm.ws.webcontainer.servlet.StrictServletInstance
class$com$ibm$ws$webcontainer$servlet$StrictServletInstance =
serialPersistentFields = {}
serialVersionUID = 3206093459760846163
allPermDomain = null
getPDperm = null
have_extensions = true
_servicingCount = 0
_servletClassname = com.ibm.ws.cache.servlet.ServletWrapper
_servletName = action
_servlet =
class$com$ibm$ws$cache$servlet$ServletWrapper =
serialPersistentFields = this.class$com$ibm$ws$webcontainer$servlet$StrictServletInstance.serialPersistentFields
serialVersionUID = 3206093459760846163
allPermDomain = null
getPDperm = null
have_extensions = true
applicationUnAvailList =
class$java$lang$Object = null
size = 0
elementData = [Ljava.lang.Object;@3b07e96
serialVersionUID = 8683452581122892189
modCount = 0
firstTime = true
wrapsCacheableServlet = false
cacheEntrySet = true
cacheEntry = null
proxied =
definitionsFactory = org.apache.struts.tiles.definition.ReloadableDefinitionsFactory@25807ee1
lStrings = java.util.PropertyResourceBundle@743b3e83
LSTRING_FILE = javax.servlet.http.LocalStrings
HEADER_LASTMOD = Last-Modified
HEADER_IFMODSINCE = If-Modified-Since
METHOD_TRACE = TRACE
METHOD_PUT = PUT
METHOD_POST = POST
METHOD_OPTIONS = OPTIONS
METHOD_GET = GET
METHOD_HEAD = HEAD
METHOD_DELETE = DELETE
config = this._config
tc =
ivLogger = null
ivResourceBundleName = com.ibm.ws.cache.resources.dynacache
ivDumpEnabled = false
defaultMessageFile = com.ibm.ejs.resources.seriousMessages
ivEntryEnabled = false
ivEventEnabled = false
ivDebugEnabled = false
ivName = com.ibm.ws.cache.servlet.ServletWrapper
tc =
ivLogger = null
ivResourceBundleName = com.ibm.ejs.resources.seriousMessages
ivDumpEnabled = false
defaultMessageFile = com.ibm.ejs.resources.seriousMessages
ivEntryEnabled = false
ivEventEnabled = false
ivDebugEnabled = false
ivName = com.ibm.ws.webcontainer.servlet.StrictServletInstance
syncObject = java.lang.Object@40457ee5
servicingCount = 1
_implementsSTM = false
_config =
_servletName = action
_initParams =
hexDigit = [C@523dfeb5
whiteSpaceChars =
specialSaveChars = =: #!
strictKeyValueSeparators = =:
keyValueSeparators = =:
defaults = null
serialVersionUID = 4112578634029874840
class$java$util$Hashtable$Entry = java.lang.Class@68cbe4b
emptyIterator = java.util.Hashtable$EmptyIterator@522cbeb5
emptyEnumerator = java.util.Hashtable$EmptyEnumerator@522d3eb5
ENTRIES = 2
VALUES = 1
KEYS = 0
values = null
entrySet = null
keySet = null
modCount = 11
loadFactor = 0.75
threshold = 17
count = 10
table = [Ljava.util.Hashtable$Entry;@40657ee5
_servletContext = com.ibm.ws.webcontainer.webapp.WebApp@16503e92
_unavailableUntil = -1
_servicingState =
_instance = this._servicingState
_state =
_instance = this._state
PERMANENTLY_UNAVAILABLE_FOR_SERVICE_STATE =
_instance = this.PERMANENTLY_UNAVAILABLE_FOR_SERVICE_STATE
UNAVAILABLE_FOR_SERVICE_STATE =
_instance = this.UNAVAILABLE_FOR_SERVICE_STATE
AVAILABLE_FOR_SERVICE_STATE = this._servicingState
ERROR_STATE =
_instance = this.ERROR_STATE
DESTROYED_STATE =
_instance = this.DESTROYED_STATE
DESTROYING_STATE =
_instance = this.DESTROYING_STATE
STM_SERVICING_STATE =
_instance = this.STM_SERVICING_STATE
SERVICING_STATE = this._state
IDLE_STATE =
_instance = this.IDLE_STATE
INITIALIZING_STATE =
_instance = this.PRE_INITIALIZED_STATE
PRE_INITIALIZED_STATE =
_instance = this.PRE_INITIALIZED_STATE
The information that was dumped from the this object provides extra context information for the stack, including the member variables, calling objects, and so on. This information is of limited value when you are doing a problem determination by inspection. However, it can provide the analysis engine with valuable information when the exception log can be correlated to the symptom database. The real value of this exception log is that it captures the exception stack trace at run time.
The combination of the exception, source ID, and the probe ID form an index key that is used to identify the exception log that has the FFDC exception information. The exception file for each exception uses a naming convention of <server name>_<thread Id>_yy.MM.dd_HH.mm.ss_<unique id>.txt and contains an amount of information that depends on the value of the Level property in ffdcRun.properties. The higher the value of the Level property, the greater the amount of information in the exception file. Refer to 19.3.2, How to set up the FFDC tool on page 220 for an explanation of the information that is produced by each logging level.
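Because the file name encodes the server, thread, and time of capture, you can sort or filter exception files without opening them. The following Python sketch splits a name that follows the convention above, using dmgr_4c8beb2_04.09.27_14.58.58_0.txt, a file name that appears in the investigation later in this section. It splits from the right so that a server name containing underscores is kept whole. The function name is hypothetical.

```python
from datetime import datetime


def parse_exception_file_name(name):
    """Split <server>_<thread>_yy.MM.dd_HH.mm.ss_<unique id>.txt into parts."""
    stem = name[:-4] if name.endswith(".txt") else name
    # The last three underscore-separated parts are date, time, and unique ID.
    server_and_thread, date_part, time_part, unique_id = stem.rsplit("_", 3)
    server, thread_id = server_and_thread.rsplit("_", 1)
    stamp = datetime.strptime(date_part + " " + time_part, "%y.%m.%d %H.%M.%S")
    return {
        "server": server,
        "thread_id": thread_id,
        "timestamp": stamp,
        "unique_id": unique_id,
    }


info = parse_exception_file_name("dmgr_4c8beb2_04.09.27_14.58.58_0.txt")
print(info["server"], info["thread_id"], info["timestamp"].isoformat())
```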
After it is enabled, the FFDC tool produces the index and exception logs that are associated with the address space and persists them to the <install base>/logs/ffdc directory. You can retrieve these log files by using an FTP client from any other environment. For example, the index and exception logs could be retrieved with the ASCII setting for the FTP client on a Microsoft Windows host. Because the index and exception logs are text files, they can be viewed in any ASCII-capable text editor or viewer.
Preliminary investigation
Having retrieved the ASCII FFDC index file and exception logs, we followed several steps to determine that a java.lang.IllegalStateException had occurred:
1. We opened the dmgr_exception.log with WordPad to format the data and performed a visual inspection. It was clear that the IllegalStateException was of interest. We noted that the java.lang.IllegalStateException was originating from multiple classes. This is because the exception was being trapped by the FFDC tool as it traversed the call stack.
2. We began to inspect the exception logs that the FFDC tool had produced and that we had retrieved from the z/OS system. Here is where we noted a weakness between the index file and the naming of the exception logs: It was not clear which exception log contained the java.lang.IllegalStateException without opening and inspecting each file.
3. Once again, we used WordPad on a Windows client to open each of these files in a formatted manner. The set of files had the names shown in Figure 19-3.
Figure 19-3 Index and exception Logs
As you can see, the dmgr_4c8beb2_04.09.27_14.58.58_0.txt file had the first instance of the java.lang.IllegalStateException. Note: The authors edited Figure 19-3 to eliminate all occurrences of the com.ibm.ws.classloader.CompoundClassLoader.loadClass exception. Only the last of these exceptions is shown. The information in the exception log also includes class and state information. This interpretation of the information beyond the stack trace is meant to be used by IBM to debug the problem and has little meaning outside of the IBM support network. Therefore, forward the exception logs to IBM support to obtain further information about the captured exception.
In this case, the analysis engine did not find a solution in the symptom database, as reported by this statement:
Solution: ****** NOT FOUND
The conclusion of the analysis, by inspection, resulted in further investigation by the WebSphere development team. Based on the java.lang.IllegalStateException, the team studied the synchronized code that was responsible for managing the administrative console and determined that a race condition existed for the shared connection resource. The code was then updated so that the appropriate synchronization occurred for the shared resource. It took less than one hour to run the FFDC tool in the customer environment and, in this case, it provided an important clue in resolving the problem.
Figure 19-4 Java logging architecture (labels shown: Application Code, WebSphere, JRas, Logger, Filter, Handler, Filter, Formatter, Output)
Figure 19-5 on page 229 shows the Enable log option in the Diagnostic Trace Service panel, together with the options for enabling logs and trace.
The difference between V6 and older versions of WebSphere for z/OS is that the log level string has its own panel in V6. To set traces and log level details:
1. In the navigation pane, click Servers and then Application Servers.
2. Click the name of the server that you want to work with.
3. Under Troubleshooting, select Logging and tracing.
4. Click Change Log Detail levels.
5. To make a static change to the configuration, click the Configuration tab. A list of well-known components, packages, and groups is displayed.
6. To change the configuration dynamically, click the Runtime tab.
7. Select a component, package, or group to set a logging level.
8. Click Apply.
9. Click OK.
Figure 19-6 on page 230 shows the Administrative Console panel for changing log detail levels. The list of components, packages, and groups shows all the components that are currently registered to the running server.
The default log level is *=info. To modify it, you can type another level or set it using the graphical menu. Table 19-2 describes the log detail levels.
Table 19-2 Log Details Level
Level | Consequence
Off | No events are logged
Fatal | Task cannot continue and component cannot function
Severe | Task cannot continue but component can still function
Warning | Potential error or impending error
Audit | Significant event affecting server state or resources
Info | General information outlining overall task progress
Config | Configuration change or status
Detail | General information detailing subtask progress
Fine | Trace information: General trace
Finer | Trace information: Detailed trace
Finest | Trace information: A more detailed trace (includes all the details that are needed to debug problems)
All | All events are logged, inclusive of custom logs
The syntax of strings that are used in the log detail level is specified by the Java Logging specification. The string is defined by the component or group that you want to trace, followed by an equal sign (=) and the level of detail. For example, to enable the fine trace level for all classes in the com.ibm.ws.classloader package, use:
com.ibm.ws.classloader.*=fine
To enable the most detailed trace level for all components in the EJBContainer group, use:
EJBContainer=finest
Note: Tracing components has an impact on performance. A trace of all components (com.ibm.*=all) causes the system to run very slowly.
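A log detail level string can be checked before it is typed into the console. The following Python sketch splits a string into component and level pairs and rejects level names that are not in Table 19-2. It assumes that several specifications are joined with a colon (for example, *=info:com.ibm.ws.classloader.*=fine), which is not shown in the text above, and the function name parse_log_detail is hypothetical.

```python
# Level names from Table 19-2, compared case-insensitively.
LEVELS = {
    "off", "fatal", "severe", "warning", "audit", "info",
    "config", "detail", "fine", "finer", "finest", "all",
}


def parse_log_detail(spec):
    """Return {component: level} for a string such as '*=info:EJBContainer=finest'."""
    result = {}
    for part in spec.split(":"):  # assumption: colon separates specifications
        part = part.strip()
        if not part:
            continue
        name, separator, level = part.partition("=")
        if not separator or level.lower() not in LEVELS:
            raise ValueError("invalid log detail specification: " + part)
        result[name] = level.lower()
    return result


print(parse_log_detail("*=info:com.ibm.ws.classloader.*=fine"))
```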
You can see the fields listed in Table 19-3 in the first line.
Table 19-3 First line in trace sample
Field | Description
2005/08/03 13:43:44.723 | Date and time of the entry in the trace
01 | Version ID
t=7CB4F8 | TCB
c=UNK | Represents correlation information that consists of internal runtime information (session ID and request ID) that is used to identify trace entries related to a particular client request
key=P8 | Represents which state and key the code is running in. For example, code running in a control process is running in supervisor state, key 2 (s2), and code running in a servant process is in problem state, key 8 (p8)
(0000000A) | Trace point ID that is used to locate the trace in the code and which follows the ccmmmttt structure, where cc is the component ID from include/private/bborras.h, mmm identifies the module in the component in include/private/ras/bboXcrd.h, and ttt is the unique trace point within the source (this value is in hex; the value in the code is decimal)
In the second line, you can see a brief description of the entry recorded in the trace:
Description: Log Boss/390 Error
[Figure: IBM HTTP Server logs. Labels shown: Clients (browsers); IBM HTTP Server; Web container; plugin.log; error log; access log; -vv trace; plugin-cfg.xml; httpd.conf]
The fields (delineated by numbers in the illustration) in the server error log represent the following information:
1. Date and time when the entry of the request was recorded in the server
2. Error message (a description of this message is provided in IBM HTTP Server Planning, Installing, and Using, SC34-4826)
3. The IP address of the client that accessed the server
4. The URL that the client requested
5. The context root and the file requested by the client
Note: If you access the server from a PC client, the IP address in the server error log might not be the same as the IP address of your PC. This depends on the network configuration that you use (proxies, gateways, and so on).
9.12.6.160 - - [30/Sep/2004:11:47:26 +0400] "GET / HTTP/1.1" 403 282
9.12.6.160 - - [30/Sep/2004:11:47:38 +0400] "GET /mytest HTTP/1.1" 404 375
9.12.6.160 - - [30/Sep/2004:11:51:45 +0400] "GET /IBMTools/ HTTP/1.1" 500 310
9.12.6.160 - - [30/Sep/2004:11:51:53 +0400] "GET /IBMTools/ HTTP/1.1" 304 0
9.12.6.160 - - [30/Sep/2004:11:51:53 +0400] "GET /IBMTools/rbhome.gif HTTP/1.1" 304 0
9.12.6.160 - - [30/Sep/2004:11:57:49 +0400] "GET /IBMTools/EBizHitCount HTTP/1.1" 200 1070
9.12.6.160 - - [30/Sep/2004:11:59:41 +0400] "GET /testapp HTTP/1.1" 404 375
9.12.6.160 - - [30/Sep/2004:12:00:02 +0400] "GET /IBMTools HTTP/1.1" 500 308
9.12.6.160 - - [30/Sep/2004:12:00:44 +0400] "GET /myTest/ HTTP/1.1" 404 375
9.12.6.160 - - [30/Sep/2004:12:01:06 +0400] "GET /IBMTools/EBizHitCount HTTP/1.1" 200 1175
9.12.6.160 - - [30/Sep/2004:13:01:43 +0400] "GET /IBMTools/testapp HTTP/1.1" 404
9.12.6.160 - - [30/Sep/2004:13:02:32 +0400] "GET /IBMTool/testapp HTTP/1.1" 404 375
9.12.6.160 - - [30/Sep/2004:13:03:11 +0400] "GET /IBMTools/EBizSuperSnoop HTTP/1.1" 200 14023
9.12.6.160 - - [30/Sep/2004:13:06:49 +0400] "GET /IBMTools/EBizHitCount HTTP/1.1" 404
9.12.6.160 - - [30/Sep/2004:13:06:59 +0400] "GET /IBMTools/EBizSuperSnoop HTTP/1.1" 404 -
The numbered fields represent the following information:
1. The IP address of the client that made the request
2. The date and time of the request
3. The method of the request
4. The file that the client requested
5. The protocol and version
6. The value of the HTTP return code
7. The size of the file (in bytes) being requested
For more information about the server logs, see IBM HTTP Server Planning, Installing, and Using, SC34-4826, especially Chapter 15.
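The seven fields can also be pulled out of the access log programmatically, for example to list only the requests that failed. The following Python sketch is illustrative and is not part of IBM HTTP Server. The regular expression is an assumption based only on the sample lines shown above, and the function names are hypothetical.

```python
import re

# Layout assumed from the sample access-log lines above, not from product documentation.
ACCESS_LINE = re.compile(
    r'(?P<client>\S+) \S+ \S+ '
    r'\[(?P<timestamp>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) (?P<protocol>[^"]+)" '
    r'(?P<status>\d{3})(?: (?P<size>\d+|-))?'
)


def parse_access_line(line):
    """Return the fields of one access-log line as a dict, or None if it does not match."""
    match = ACCESS_LINE.match(line.strip())
    if match is None:
        return None
    fields = match.groupdict()
    fields["status"] = int(fields["status"])
    # The size can be missing or "-" (as in some sample lines); map both to None.
    size = fields["size"]
    fields["size"] = int(size) if size and size.isdigit() else None
    return fields


def failed_requests(lines):
    """Yield (path, status) for every request with a 4xx or 5xx return code."""
    for line in lines:
        fields = parse_access_line(line)
        if fields and fields["status"] >= 400:
            yield fields["path"], fields["status"]


sample = [
    '9.12.6.160 - - [30/Sep/2004:11:47:38 +0400] "GET /mytest HTTP/1.1" 404 375',
    '9.12.6.160 - - [30/Sep/2004:11:57:49 +0400] "GET /IBMTools/EBizHitCount HTTP/1.1" 200 1070',
    '9.12.6.160 - - [30/Sep/2004:13:06:59 +0400] "GET /IBMTools/EBizSuperSnoop HTTP/1.1" 404 -',
]
for path, status in failed_requests(sample):
    print(status, path)
```

Filtering on the return code first is usually the quickest way to find the requests that are worth correlating with the server error log.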
The server is in very verbose mode when it restarts. To stop the trace, you either change the parameter in the procedure and restart the server or issue a MODIFY command.
Alternatively, dynamically modify the server with the following console command:
   /f imwebsrv,appl=-vv
In the command, imwebsrv is the name of your IBM HTTP Server. The following message appears in the console log:
   30Sep04 10:28:08: IMW3518I Second level tracing (-vv) enabled.
To stop the trace, issue this command:
   /f imwebsrv,appl=-nodebug
The following message appears in the console log:
   30Sep04 10:38:38: IMW3508I Debug has been disabled for all modules.
Important: Because of the large amount of data that the -vv trace generates and the impact on performance, the authors recommend that you start the trace dynamically, reproduce the error, and then stop the trace. That way, you have a short -vv trace, which makes it easier to find the section of the log that relates to the problem.
The -vv trace provides more detailed information than the server error log or the access log. For this reason, the trace is most helpful after you determine that the problem occurred inside IBM HTTP Server and you need detailed step-by-step processing information to rectify the problem.
Example 19-12 displays only a portion of the trace, showing a request for the /IBMTools/EBizSuperSnoop file from a browser with the following information:
- The method of the request and the file requested (GET /IBMTools/EBizSuperSnoop)
- The protocol and version (HTTP/1.1)
- The browser and the operating system of the client (Mozilla 4.0; compatible Microsoft Internet Explorer 6.0; Microsoft Windows NT 5.1)
- The host name and port of the server (wtsc49.itso.ibm.com:9508)
Example 19-12 Very verbose trace sample
[21646C48 30/Sep/2004:14:06:12.854728]: 30Sep04 14:06:12: IMW3518I Second level tracing (-vv) enabled.
[21660778 30/Sep/2004:14:06:22.556968]: Read 460 bytes from socket 10.
[21660778 30/Sep/2004:14:06:22.557027]: After AcceptEx nAcceptThds: 75 and nSSLAcceptThds: 0.
[21660778 30/Sep/2004:14:06:22.557046]: server_loop... Accepted socket: 10.
[21660778 30/Sep/2004:14:06:22.557131]: KEEPALIVE... set.
[21660778 30/Sep/2004:14:06:22.557156]: HTSession... starting for socket=10; STHD=21932DD8
[21660778 30/Sep/2004:14:06:22.557187]: Keep-Alive.. Starting HTTPD 1.1 loop.
[21660778 30/Sep/2004:14:06:22.557218]: HTTimer... setting timer off->set (1) on socket 10.
[21660778 30/Sep/2004:14:06:22.557236]: HTTimer... set, old=0, cur=0, new=1
[21660778 30/Sep/2004:14:06:22.557314]: Client sez.. GET /IBMTools/EBizSuperSnoop HTTP/1.1
[21660778 30/Sep/2004:14:06:22.557340]: Protocol version.... 1.1
[21660778 30/Sep/2004:14:06:22.557355]: Persistent Connection has been established
[21660778 30/Sep/2004:14:06:22.557378]: Client sez.. Accept: */*
[21660778 30/Sep/2004:14:06:22.557395]: Accept...... */* (q=1.00,mxb=0.0,mxs=0.0)
[21660778 30/Sep/2004:14:06:22.557437]: Client sez.. Referer: http://wtsc49.itso.ibm.com:9508/IBMTools/
[21660778 30/Sep/2004:14:06:22.557456]: Referer..... http://wtsc49.itso.ibm.com:9508/IBMTools/
[21660778 30/Sep/2004:14:06:22.557476]: Client sez.. Accept-Language: en-us
[21660778 30/Sep/2004:14:06:22.557492]: Language.... en-us (q=1.00)
[21660778 30/Sep/2004:14:06:22.557517]: Client sez.. Accept-Encoding: gzip, deflate
[21660778 30/Sep/2004:14:06:22.557533]: Encoding.... gzip (q=1.00)
[21660778 30/Sep/2004:14:06:22.557552]: Encoding.... deflate (q=1.00)
[21660778 30/Sep/2004:14:06:22.557580]: Client sez.. If-Modified-Since: Thu, 30 Sep 2004 17:03:11 GMT
[21660778 30/Sep/2004:14:06:22.557600]: Format...... Wkd, 00 Mon 0000 00:00:00 GMT
[21660778 30/Sep/2004:14:06:22.557621]: TimeZone.... 04 hours from GMT
[21660778 30/Sep/2004:14:06:22.557638]: Time string. Thu, 30 Sep 2004 17:03:11 GMT; offset = 0 seconds
[21660778 30/Sep/2004:14:06:22.557662]: Parsed...... to 1096563791 seconds, Thu Sep 30 13:03:11 2004
[21660778 30/Sep/2004:14:06:22.557687]: Give only... if modified since (localtime) Thu Sep 30 13:03:11 2004
[21660778 30/Sep/2004:14:06:22.557722]: Client sez.. User-Agent: Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; .NET CLR 1.1.4322)
[21660778 30/Sep/2004:14:06:22.557746]: User-Agent.. Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; .NET CLR 1.1.4322)
[21660778 30/Sep/2004:14:06:22.557771]: Client sez.. Host: wtsc49.itso.ibm.com:9508
[21660778 30/Sep/2004:14:06:22.557789]: Host........ wtsc49.itso.ibm.com
[21660778 30/Sep/2004:14:06:22.557804]: Host Port... 9508
Then, in the plugin-cfg.xml file, specify the LogLevel and Name of the plug-in log file (plugin.log) where all logging output should go, as shown:
   <Log LogLevel="Error" Name="/<...>/plugin.log"/>
Plug-in logging allows logging at many levels of detail to suit various situations. You can specify one of the following levels:
   Trace   All of the steps in the request process are logged in detail.
   Stats   The server selected for each request and other load balancing information that is related to request handling is logged.
   Warn    All warning and error messages that result from abnormal request processing are logged.
   Error   Only error messages that result from abnormal request processing are logged.
The default level of logging is Error.
Note: Specifying LogLevel="Trace" generates a large amount of data that might impact performance. The authors recommend that you specify LogLevel="Error".
The server records one line per request that arrives. Figure 19-10 illustrates the fields in each record line:
1. Process ID
2. Pthread ID
3. IBM Software source code file name
4. Function name
[Wed Sep 22 16:27:59 2004] 01080075 216b31b000000053 - TRACE: ws_common: websphereHandleRequest:Request is: host='wtsc49.itso.ibm.com'; uri='/IBMTools/EBizHitCount'
Figure 19-10 A plug-in trace record and some of the important fields
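To make the four fields concrete, the following Python sketch splits a plug-in log record into its parts. It is an illustration only: the pattern is inferred from the single record shown in Figure 19-10, and the function and group names are hypothetical.

```python
import re

# Pattern inferred from the one sample record in Figure 19-10 (an assumption).
PLUGIN_RECORD = re.compile(
    r'\[(?P<timestamp>[^\]]+)\] '
    r'(?P<process_id>[0-9a-fA-F]+) '
    r'(?P<pthread_id>[0-9a-fA-F]+) - '
    r'(?P<level>[A-Z]+): '
    r'(?P<source_file>\w+): '
    r'(?P<function>\w+): ?'
    r'(?P<message>.*)'
)


def parse_plugin_record(line):
    """Return the fields of one plug-in log record as a dict, or None if it does not match."""
    match = PLUGIN_RECORD.match(line.strip())
    return match.groupdict() if match else None


record = ("[Wed Sep 22 16:27:59 2004] 01080075 216b31b000000053 - TRACE: ws_common: "
          "websphereHandleRequest:Request is: host='wtsc49.itso.ibm.com'; "
          "uri='/IBMTools/EBizHitCount'")
fields = parse_plugin_record(record)
for name in ("process_id", "pthread_id", "source_file", "function"):
    print(name, "=", fields[name])
```

Grouping records by the pthread ID is one way to follow a single request through a busy log, because records from different threads are interleaved.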
Figure 19-11 on page 238 shows the plug-in trace records that resulted from a request to find matches for the virtual host group (VhostGroup) and URI group (UriGroup).
TRACE: ws_common: websphereVhostMatch: Comparing 'wtsc49.itso.ibm.com:9508' to 'wtsc49.itso.ibm.com:9508' in VhostGroup: default_hos
TRACE: ws_common: websphereVhostMatch: Found a match 'wtsc49.itso.ibm.com:9508' to 'wtsc49.itso.ibm.com:9508' in VhostGroup: default
TRACE: ws_common: websphereVhostMatch: Comparing '*:9559' to 'wtsc49.itso.ibm.com:9508' in VhostGroup: default_host
TRACE: ws_common: websphereVhostMatch: Comparing '*:9558' to 'wtsc49.itso.ibm.com:9508' in VhostGroup: default_host
TRACE: ws_common: websphereVhostMatch: Comparing '*:9549' to 'wtsc49.itso.ibm.com:9508' in VhostGroup: default_host
TRACE: ws_common: websphereVhostMatch: Comparing '*:9548' to 'wtsc49.itso.ibm.com:9508' in VhostGroup: default_host
TRACE: ws_common: websphereVhostMatch: Comparing '*:80' to 'wtsc49.itso.ibm.com:9508' in VhostGroup: default_host
TRACE: ws_common: websphereVhostMatch: Comparing 'wtsc49.itso.ibm.com:9519' to 'wtsc49.itso.ibm.com:9508' in VhostGroup: default_hos
TRACE: ws_common: websphereVhostMatch: Comparing 'wtsc49.itso.ibm.com:9518' to 'wtsc49.itso.ibm.com:9508' in VhostGroup: default_hos
TRACE: ws_common: websphereVhostMatch: Comparing '*:9519' to 'wtsc49.itso.ibm.com:9508' in VhostGroup: default_host
TRACE: ws_common: websphereVhostMatch: Comparing '*:9518' to 'wtsc49.itso.ibm.com:9508' in VhostGroup: default_host
TRACE: ws_common: websphereUriMatch: Comparing '/admin' to '/IBMTools/EBizHitCount' in UriGroup: default_host_dmgr_pddmnode_Cluster_
TRACE: ws_common: websphereUriMatch: Comparing '/admin/*' to '/IBMTools/EBizHitCount' in UriGroup: default_host_dmgr_pddmnode_Cluste
TRACE: ws_common: websphereUriMatch: Comparing '/adminservlet' to '/IBMTools/EBizHitCount' in UriGroup: default_host_dmgr_pddmnode_C
TRACE: ws_common: websphereUriMatch: Comparing '/FileTransfer' to '/IBMTools/EBizHitCount' in UriGroup: default_host_dmgr_pddmnode_C
TRACE: ws_common: websphereUriMatch: Comparing '/adminservlet/*' to '/IBMTools/EBizHitCount' in UriGroup: default_host_dmgr_pddmnode
TRACE: ws_common: websphereUriMatch: Comparing '/FileTransfer/*' to '/IBMTools/EBizHitCount' in UriGroup: default_host_dmgr_pddmnode
TRACE: ws_common: websphereUriMatch: Failed to match: /IBMTools/EBizHitCount
The request that was used was: http://wtsc49.itso.ibm.com:9508/IBMTools/EBizHitCount The plug-in first attempts to find a match for the wtsc49.itso.ibm.com virtual host and port 9508 in the defined virtual host group. Then, it compares the /IBMTools/EBizHitCount URI with the defined URI group entries. The last line shows that there was no matching URI definition.
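When the plug-in trace is long, it can help to condense the matching records into a short summary. The following Python sketch does this for the record layout shown in Figure 19-11. It is a reading aid under stated assumptions: the message texts are taken from the sample trace only, and the function name summarize_matching is hypothetical.

```python
import re

# Message formats assumed from the sample trace records in Figure 19-11.
COMPARE = re.compile(r"(websphere(?:Vhost|Uri)Match): Comparing '([^']*)' to '([^']*)'")
FOUND = re.compile(r"(websphere(?:Vhost|Uri)Match): Found a match '([^']*)'")
FAILED = re.compile(r"(websphere(?:Vhost|Uri)Match): Failed to match: (\S+)")


def summarize_matching(lines):
    """Collect the patterns compared, the patterns matched, and the failures per function."""
    summary = {"compared": {}, "matched": {}, "failed": []}
    for line in lines:
        match = COMPARE.search(line)
        if match:
            summary["compared"].setdefault(match.group(1), []).append(match.group(2))
            continue
        match = FOUND.search(line)
        if match:
            summary["matched"].setdefault(match.group(1), []).append(match.group(2))
            continue
        match = FAILED.search(line)
        if match:
            summary["failed"].append((match.group(1), match.group(2)))
    return summary


trace = [
    "TRACE: ws_common: websphereVhostMatch: Comparing 'wtsc49.itso.ibm.com:9508' to "
    "'wtsc49.itso.ibm.com:9508' in VhostGroup: default_hos",
    "TRACE: ws_common: websphereVhostMatch: Found a match 'wtsc49.itso.ibm.com:9508' to "
    "'wtsc49.itso.ibm.com:9508' in VhostGroup: default",
    "TRACE: ws_common: websphereUriMatch: Comparing '/admin/*' to "
    "'/IBMTools/EBizHitCount' in UriGroup: default_host_dmgr_pddmnode_Cluste",
    "TRACE: ws_common: websphereUriMatch: Failed to match: /IBMTools/EBizHitCount",
]
result = summarize_matching(trace)
print("matched:", result["matched"])
print("failed:", result["failed"])
```

A virtual host match followed by a URI failure, as in this sample, points at a missing URI definition rather than at the host or port configuration.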
Chapter 20.
1. From the IPCS Primary Option Menu panel, select option 6 (COMMAND).
2. In the IPCS Subcommand Entry panel:
   a. Issue the SETDEF subcommand to determine the default values for routing displays.
   b. Enter the CTRACE command, with the following required parameters:
      CTRACE COMP(cell_short_name)
      cell_short_name is the value that is specified through the ISPF Customization Dialog to identify the location of server configuration files (eight or fewer characters and all uppercase).
      If you are interested in only JRas data, enter the following command and specify additional parameters as necessary:
      CTRACE COMP(cell_short_name) USEREXIT(JRAS)
      For more details about CTRACE, see z/OS MVS IPCS Commands, SA22-7594.
3. View your application CTRACE data based on the options that you chose for the location of the data.
Example 20-1 shows WebSphere for z/OS CTRACE output.
Example 20-1 WebSphere for z/OS CTRACE output
SY1 OBOAT008 04000002 00:14:57.268258 Dispatch Method
   ASID.... 0039 TCB..... 009E34A0 PSW1.... 078D2400
   SESS.... 00000008 REQI.... 0000006C
   Class Name = JPolicyEmSQLMO
   Method Name = _get_policyNo
   object = 0x260ED1F8
   objectPtr refcount = 3 0x00000003
   objectPtr classname= JPolicyEmSQLMO
An entry contains an undefined ID: 13007002 , hex format will be used.
SY1 N/A 13007002 00:14:57.272682 N/A
   0002009E 34A0078D 24000039 00000008 | ................ |
   0000006C 000C0302 97969389 83A8D596 | ...%....policyNo |
   00120402 6DD1D796 938983A8 C2D6C994 | ...._JPolicyBOIm |
   97930009 0A02C1E4 C4C9E300 2E0B02C2 | pl....AUDIT....B |
   C2D6D1F0 F0F0F240 D7969389 83A84095 | BOJ0002 Policy n |
   A4948285 9940F3F3 6BF3F3F3 409682A3 | umber 33,333 obt |
   81899585 844B4040 40 | ained. |
Trace: 2004/10/12 00:14:57.268 01 t=9E34A0 c=8.6C key=P8 (04000002)
   Description: Dispatch Method
   Class Name: JPolicyEmSQLMO
   Method Name: _get_policyNo
   object: 260ED1F8
   objectPtr refcount: 3
   objectPtr classname: JPolicyEmSQLMO
Trace: 2004/10/12 00:14:57.272 01 t=9E34A0 c=8.6C key=P8 (13007002)
   FunctionName: policyNo
   SourceId: _JPolicyBOImpl
   Category: AUDIT
   ExtendedMessage: BBOJ0002 Policy number 33,333 obtained.
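The hex-format entry in Example 20-1 holds EBCDIC text, which is why the right-hand column shows readable strings such as policyNo. The following Python sketch decodes hex words copied from such an entry. It assumes code page 037, which is available in the Python standard library and agrees with the z/OS code pages on letters and digits. The function name is hypothetical.

```python
def decode_ebcdic_words(hex_words):
    """Decode space-separated hex words as EBCDIC (code page 037 assumed) text."""
    raw = bytes.fromhex(hex_words.replace(" ", ""))
    text = raw.decode("cp037")
    # Show non-printable characters as '.', similar to the dump display.
    return "".join(ch if ch.isprintable() else "." for ch in text)


# Hex words copied from the undefined-ID entry in Example 20-1.
print(decode_ebcdic_words("97969389 83A8D596"))
print(decode_ebcdic_words("6DD1D796 938983A8 C2D6C994"))
print(decode_ebcdic_words("C1E4 C4C9E3"))
```

This can be useful when a trace entry is shown only in hex and you want to search its text for a message number or class name.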
To navigate through the trace data in the Dump Display Reporter panel, use the commands and PF keys listed in z/OS MVS IPCS User's Guide, SA22-7596. For information about viewing CTRACE and JRas data through IPCS, refer to WebSphere Application Server for z/OS V6, Troubleshooting and support, GA22-7964-03.
Also, you can visit the WebSphere for z/OS Information Center at: http://publib.boulder.ibm.com/infocenter/wasinfo/v6r0/index.jsp
If this trace cannot help you and you are connecting to DB2 databases, the DB2 SQLJ/JDBC trace might. The following steps describe the procedure for obtaining this JDBC DB2 trace:
1. Log in to the Administrative Console, expand the Environment item in the menu, and select WebSphere Variables. Select the scope for which you want to enable the trace and click Apply.
2. Click New. Add this variable name and its value:
   DB2SQLJPROPERTIES=/mydb2dir/wsccb_db2sqljjdbc.properties
3. In the properties file called /mydb2dir/wsccb_db2sqljjdbc.properties, set up the variable DB2SQLJ_TRACE_FILENAME to enable the SQLJ/JDBC trace and specify the name of the file to which the trace is written:
   DB2SQLJ_TRACE_FILENAME=/tmp/IVP2_jdbctrace
4. The JDBC trace produces two HFS files:
   - /tmp/IVP2_jdbctrace: This file is in binary format. You must format it using the db2sqljtrace command (as shown in the following step).
   - /tmp/IVP2_jdbctrace.JTRACE: This file contains readable text.
5. To format the binary trace data, use the following db2sqljtrace command in the USS environment:
   db2sqljtrace fmt|flw TRACE_FILENAME > OUTPUT_FILENAME
   The fmt and flw subcommands determine what the output trace contains:
   fmt   A record every time a function is entered or exited before a failure
   flw   The function flow before a failure
   OUTPUT_FILENAME   The name of the file that the formatted trace is written to.
To run db2sqljtrace correctly, ensure that the PATH and LIBPATH variables include the JDBC executable and library directories. You can change them with the following commands:
   export PATH=$PATH:/usr/lpp/db2/db2810/bin
   export LIBPATH=$LIBPATH:/usr/lpp/db2/db2810/lib
Note: The IBM default path is /usr/lpp/db2/db2810/, but you might have another path, depending on your installation.
You can verify that the variables are correct with the following commands:
   echo $PATH
   echo $LIBPATH
For more information, refer to DB2 documentation.
<2004.10.04 20:15:03.164> <Entry> <constructor> <com.ibm.db2.jcc.DB2LogicalConnection> <P=253767:O=0:CT>
<2004.10.04 20:15:03.164> <Exit> <constructor> <com.ibm.db2.jcc.DB2LogicalConnection> <P=253767:O=0:CT>
-- <p#1=com.ibm.db2.jcc.DB2LogicalConnection@4ba6629c[mClosed=false;mConnection=5254a29a]>
Example 20-3 shows a JDBC trace that was formatted with the fmt subcommand.
Example 20-3 JDBC trace formatted with fmt subcommand
Trace Version : DB2 7.1
Driver Build Version : DB2 7.1 UQ85384 JDBC 2.0
Trace Captured at : Mon Oct 4 20:15:02 2004
Trace buffer size : 262144 bytes
Records to keep : LAST
Trace truncated : NO
Trace wrapped : NO
Shared Memory Address : 0x1E5CA568
First empty slot : 7604
Trace Table Address : 0x1E681030
Size of trace : 7592 bytes
Records in trace : 134
1 SQLJ fnc_entry sqlj_JDBC_Driver DB2SQLJ_sqlj_driver_native_init (2.1.7.1)
   pid 0x007fb620; tid 0x007fb620; time 1096935302; tpoint 0
   0000 0000 ....
2 SQLJ fnc_entry sqlj_JDBC_AttachMgr sqlj_Attach_Global_Init (2.1.14.1)
   pid 0x007fb620; tid 0x007fb620; time 1096935302; tpoint 0
   0000 0000 ....
3 SQLJ fnc_data sqlj_JDBC_AttachMgr sqlj_Attach_Global_Init (2.3.14.1)
   pid 0x007fb620; tid 0x007fb620; time 1096935302; tpoint 1
   0000 0001 0000 0004 37ac 75d0 ...........}
4 SQLJ fnc_entry sqlj_Native_Util sqlj_memAlloc (2.1.3.1)
   pid 0x007fb620; tid 0x007fb620; time 1096935302; tpoint 0
   0000 0000 ....
Example 20-4 shows a JDBC trace formatted with the flw subcommand.
Example 20-4 JDBC trace formatted with flw subcommand
Trace Version : DB2 7.1
Driver Build Version : DB2 7.1 PQ56655
Trace Captured at : Wed Sep 18 17:12:00 2002
Trace buffer size : 262144 bytes
Records to keep : LAST
Trace truncated : NO
Trace wrapped : NO
Shared Memory Address : 0x236D3568
First empty slot : 184452
Trace Table Address : 0x2377C030
Size of trace : 184440 bytes
Records in trace : 2298
pid = 0x007f9358;
1 DB2SQLJ_sqlj_driver_native_init fnc_entry ...
2 |sqlj_Attach_Global_Init fnc_entry ...
3 |sqlj_Attach_Global_Init fnc_data ...
4 | |sqlj_memAlloc fnc_entry ...
5 | |sqlj_memAlloc fnc_data ...
6 | |sqlj_memAlloc fnc_retcode 0
Console dump
A console dump is an SVC dump that is captured using the MVS DUMP command and run from the console or SDSF log. It is referred to as a console dump because of how it is triggered. You initiate a console dump when:
- You want an SVC dump of a servant region or a dump of the servant controller region.
- You suspect a particular servant region to be the source of a problem. Dump the controller region and all of its servant regions.
- You detect a hang or high CPU utilization for a particular address space.
A sample PARMLIB member that determines the information to be included in a console dump can be found in SBBOSLIB(BBODMCCB). The sample contains instructions about its installation and use. The standard SDATA expected in an SVC dump is:
   SDATA=(ALLNUC,CSA,GRSQ,LPA,LSQA,PSA,RGN,SQA,SUM,SWA,TRT)
SLIP dump
A SLIP dump is an SVC dump that is triggered by the MVS SLIP command, hence the name. You can use a SLIP dump when there is an error and no SVC dump is being produced, probably because the SVC dump is being suppressed by the Dump Analysis and Elimination facility. You can also use a SLIP dump when you want a dump to be triggered when a certain error message occurs or when IBM support has asked you for one.
An example of a SLIP dump that we used to capture an EC3/04130007 is shown in Example 20-5.
Example 20-5 SLIP dump for capturing EC3/04130007
SLIP SET,A=SVCD,COMP=EC3,REASON=04130007,ID=WEC3,
MATCHLIM=20,ASIDLST=(0,H,I,P,S),
SDATA=(ALLNUC,CSA,GRSQ,LPA,LSQA,PSA,RGN,SQA,SUM,SWA,TRT),END
ip verbx ledata nthreads(*)
ip verbx ledata nthreads(*) asid(aaaa)
ip verbx ledata tcb(tttttttt) nthreads(*)
IPCS commands and their explanations:
ip omvsdata process detail asid(xhhhh)
   This command generates a report for the process that shows the thread status from a USS perspective.
ip analyze resource
   This command generates a report showing resource contention. This is useful in a hang situation.
ip verbx vsmdata summary
   This command generates a report that shows the virtual storage usage for the system. This is useful when the system is experiencing storage problems.
This command formats the available erep detail reports.
This command formats the available master trace, which holds syslog information.
20.4 CEEDUMP
Generally, a CEEDUMP is generated if a region fails or there is an abend (for example, an error occurring in the z/OS Language Environment or Java Runtime Environment). Typically, CEEDUMP can be found in the job logs of the different servers. CEEDUMP can help you identify the failing module in the Traceback section of the dump. Search for Traceback at the top of the CEEDUMP. The result is a sequence of modules as shown in Figure 20-1.
CEE3DMP V1 R3.0: Condition processing resulted in the unhandled condition.   09/18/02 5:19:06 PM   Page: 1
Information for enclave main
Information for thread 23B00F1000000000
Traceback:
DSA Addr  Program Unit  PU Addr  PU Offset  Entry  E Addr  E Offset  Statement  Load Mod  Service  Status
236D4768  06CB6B48 +00000806  __zerros  06CB6B48 +00000806  CEEEV003  Call
236D3C08  CEEHDSP  06E7C2B0 +00002BE6  CEEHDSP  06E7C2B0 +00002BE6  CEEPLPKA  Call
236D37B8  /src/share/java/runtime/jni.c  26091830 +00000528  JNI_CreateJavaVM  26091830 +00000528  4432  *PATHNAM  Exception
236D32C0  1C2FBEE0 +00001270  loadAndInitVM(JavaVM_**,JNIEnv_**,SOMException*)  1C2FBEE0 +00001270  411  BBOLRT  CB30038  Call
236D3210  1C301E88 +000002BE  getJavaEnv(SOMException*)  1C301E88 +000002BE  1679  BBOLRT  CB30036  Call
236D30F8  1C302860 +00000092  buildJavaClass(const char*,SOMException*)  1C302860 +00000092  1921  BBOLRT  CB30036  Call
236D3030  1C30A5F0 +000001A4  __cdecl _NewObject(SOMClassRef*,SOMException*)
The most recently active modules are at the top, followed by the older ones in order. Look for the term Exception in the Status column. Usually, an exception is in one of the last modules in action, so it is likely to be near the top of the Status column. The name of the entry with the exception (JNI_CreateJavaVM in the Entry column) is the most important string in this CEEDUMP, because it is the search argument that you use for researching known problems and their solutions in APARs, PMRs, or on your favorite problem search site on the Internet.
Note: See Chapter 1, Problem determination methodology on page 3, for information about IBM resources and using the exception entry to search problem databases.
A CEEDUMP is a formatted dump and therefore IPCS is not required to read it. Depending on the problem, a CEEDUMP might not have enough formatted information and IBM support might require an SVC dump. For CEEDUMP parameter settings, see z/OS V1R6.0 Language Environment Debugging Guide, GA22-7560.
If you have an SVC dump, you can view the CEEDUMP contents in it by using the IPCS verbexit LEDATA with the CEEDUMP or NTHREADS options. This formats the Language Environment control blocks to help in analysis. To learn more about using IPCS to format and analyze dumps, see z/OS V1R6.0 Language Environment Debugging Guide, GA22-7560.
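Finding the entry with the Exception status can be scripted when a job log contains several CEEDUMPs. The following Python sketch shows the idea. It assumes that each traceback row is on one line with the status as the last token and the entry name between the program unit offset and the entry address, as in the rows of Figure 20-1. Real CEEDUMP output can wrap differently, and the function name is hypothetical.

```python
import re

# Assumed row layout: ... PU Addr +PUOffset Entry EAddr +EOffset ... Status
ROW = re.compile(
    r"\+[0-9A-F]{8}\s+(?P<entry>\S.*?)\s+[0-9A-F]{8}\s+\+[0-9A-F]{8}"
)


def exception_entries(traceback_lines):
    """Return the entry names of all traceback rows whose status is Exception."""
    entries = []
    for line in traceback_lines:
        tokens = line.split()
        if not tokens or tokens[-1] != "Exception":
            continue
        match = ROW.search(line)
        if match:
            entries.append(match.group("entry"))
    return entries


rows = [
    "236D3C08 CEEHDSP 06E7C2B0 +00002BE6 CEEHDSP 06E7C2B0 +00002BE6 CEEPLPKA Call",
    "236D37B8 /src/share/java/runtime/jni.c 26091830 +00000528 JNI_CreateJavaVM "
    "26091830 +00000528 4432 *PATHNAM Exception",
    "236D3210 1C301E88 +000002BE getJavaEnv(SOMException*) 1C301E88 +000002BE "
    "1679 BBOLRT CB30036 Call",
]
print(exception_entries(rows))
```

The returned entry name is the string to use as the search argument in the problem databases.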
The job log also shows the JVM messages in the trace part of the job log as shown in Example 20-7.
Example 20-7 Java OutOfMemoryErrors shown in job log
JVMDG217: Dump Handler is Processing OutOfMemory - Please Wait.
JVMHP002: JVM requesting System Transaction Dump
JVMHP012: System Transaction Dump written to ASSR1.JVM.TDUMP.WS6422S.D050809.T1
JVMDG315: JVM Requesting Heap dump file
JVMDG318: Heap dump file written to /SC42/tmp/HEAPDUMP.20050809.190559.16843095
JVMDG303: JVM Requesting Java core file
JVMDG304: Java core file written to /SC42/tmp/JAVADUMP.20050809.190613.16843095
JVMDG274: Dump Handler has Processed OutOfMemory.
JVMST109: Insufficient space in Javaheap to satisfy allocation request
...Trace: 2005/08/09 19:06:19.729 01 t=7CC148 c=11.1 key=P8 (13007002)
   ThreadId: 00000029
   FunctionName: com.ibm.ws.webcontainer.servlet.ServletWrapper
   SourceId: com.ibm.ws.webcontainer.servlet.ServletWrapper
   Category: SEVERE
   ExtendedMessage: BBOO0220E: SRVE0068E: Could not invoke the service() method on servlet MemLeak. Exception thrown : java.lang.OutOfMemoryError: JVMXE006:OutOfMemoryError, stAllocArray for executeJava failed
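The JVM messages name the data set and files that hold the transaction dump, Heapdump, and Javadump, so a small script can collect those names from a saved job log. The following Python sketch assumes that each message ID and its text are on one line and that the text has the form "<kind> written to <target>", as in Example 20-7. The function name is hypothetical.

```python
import re

# Message layout assumed from Example 20-7: "JVMxxnnn: <kind> written to <target>"
WRITTEN = re.compile(
    r"(?P<msgid>JVM[A-Z]{2}\d{3}): (?P<kind>.+?) written to (?P<target>\S+)"
)


def dump_targets(job_log_lines):
    """Map each kind of dump to the data set or file that it was written to."""
    targets = {}
    for line in job_log_lines:
        match = WRITTEN.search(line)
        if match:
            targets[match.group("kind")] = match.group("target")
    return targets


job_log = [
    "JVMDG217: Dump Handler is Processing OutOfMemory - Please Wait.",
    "JVMHP012: System Transaction Dump written to ASSR1.JVM.TDUMP.WS6422S.D050809.T1",
    "JVMDG318: Heap dump file written to /SC42/tmp/HEAPDUMP.20050809.190559.16843095",
    "JVMDG304: Java core file written to /SC42/tmp/JAVADUMP.20050809.190613.16843095",
]
for kind, target in dump_targets(job_log).items():
    print(kind, "->", target)
```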
In the case of an OutOfMemoryError, looking for the memory leak using the transaction dump in IPCS is not useful. Java tools such as HeapRoots or the Memory Dump Diagnostic for Java are more effective in this case. You can disable the generation of a TDUMP, but IBM does not recommend it. For more information about TDUMP, refer to IBM Developer Kit and Runtime Environment, Java 2 Technology Edition, Version 1.4.2, Diagnostics Guide, SC34-6358-01, for your version of Java on z/OS.
20.6 Javadump
A Javadump produces files with diagnostic information that is related to the JVM and a Java application, captured at a point during execution. For example, the information can be about the operating system, the application environment, threads, native stack, locks, and memory. The exact contents depend on the platform that you are running. By default, a Javadump occurs when the JVM terminates unexpectedly. A Javadump can also be triggered by sending specific signals to the JVM.
Note: Javadump is also known as Javacore. This is not the same as a core file (that is, an operating system feature that can be produced by any program, not just the JVM).
For more information about the Javadump, refer to IBM Developer Kit and Runtime Environment, Java 2 Technology Edition, Version 1.4.2, Diagnostics Guide, SC34-6358-01, for your version of Java on z/OS.
20.7 Heapdump
Heapdump is an IBM JVM facility that generates a dump of all of the live objects that are on the Java heap, that is, those that are used by the Java application. It shows the objects that are using large amounts of memory on the Java heap and what is preventing them from being collected by the Garbage Collector. For more information about the Heapdump refer to IBM Developer Kit and Runtime Environment, Java 2 Technology Edition, Version 1.4.2, Diagnostics Guide, SC34-6358-01, for your version of Java on z/OS.
Chapter 21.
The analysis of heap-related issues, such as OutOfMemoryError, and of other crashes, hangs, or loops in WebSphere for z/OS address spaces can be carried out at a level similar to that possible with earlier deployment environments, such as CICS and IMS. You use relatively low-impact tools, namely unformatted dumps and JVM internal information, which means that you must combine knowledge of the JVM internals with the less invasive diagnostic approaches (such as SDUMP) that are typically used to diagnose problems in z/OS production environments. These tools can be used to diagnose problems that affect your important production workload but that can be recreated only in these high-transaction environments.
These approaches should be driven initially by the systems programming staff, because they have the authority to access the SVC dumps or transaction dumps that are taken during failures and to request console dumps of hung or looping servers. After the unformatted dumps are available, the post-processing and interpretation of the data can be done by either systems programming or development staff, because the tools are Java-based and therefore not tied to z/OS.
21.2.1 Svcdump.jar
The svcdump.jar file allows direct access to the binary SVC dumps or transaction dumps that are created in z/OS without the need for intermediate software such as IPCS. Three packages are shipped in svcdump.jar:
- Dump utility: com.ibm.jvm.svcdump.Dump package
  This formats native and Java stacks for threads in dumped processes that include an instantiated JVM. The dump utility includes a function to print out other useful information (such as the in-core trace buffers maintained by the JVM and the system trace) that mimics or extends the information that can be obtained with IPCS.
- FindRoots utility: com.ibm.jvm.findroots.* package
  This provides multiple ways of formatting the object graphs that are present in the Java-managed heap. This is critical for the sometimes difficult tasks of pinning down object leaks and making sense of heap occupancy.
- Java API
  This can be used to write ad hoc utilities. For example, you could write small programs that report on objects in the Java heap that maintain state data about a business application.
Using svcdump.jar
The svcdump.jar file is available from this Web site (requires IBM registration):
https://www14.software.ibm.com/webapp/iwm/web/preLogin.do?source=diagjava
Note: This tool is in active development, so it would be helpful to IBM if you provide feedback about your experiences with it.
You need three files:
- The svcdump.jar file
- The doc.jar file, which contains documentation for the exposed API
- libsvcdump.so, a DLL that allows the Java code to access an unformatted dump in an MVS data set rather than in the HFS. When you use this DLL, you do not have to provide a large HFS data set to copy the dump to, which means that in z/OS the original dump can be analyzed instead.
Attention: The authors used Version 20041012 of the code. Later versions might offer additional functions or different output.
Copy the three files in binary format to a suitable location in the HFS. In our example, the files are in /u/dclarke. Use the following command to confirm the version of the utility that you are running:
   java -cp svcdump.jar com.ibm.jvm.svcdump.Dump -version
Example 21-1 shows the results that we obtained after we used the command.
Example 21-1 Determining the dump utility version
You are using jar:file:/C:/Documents%20and%20Settings/Administrator/My%20Documents/SVCDumps/svcdump20041007.jar!/com/ibm/jvm/svcdump/Dump.class which was last modified on Tue Oct 12 14:53:29 BST 2004
This uses introspection to identify when this code was last modified.
Option            Description
-hpitrace         Print the HPI trace.
-caa <addr>       Specify the CAA to use when disassembling.
-r<n>             Include saved register <n> in stack trace.
-args             Print first four function arguments.
-verifysubpools   Verify subpools.
-verifyheap       Verify heap.
-printdosed       Print pinned and dosed objects.
-printroots       Print garbage collection roots.
-version          Print the version and exit.
-fullversion      Print the version of the jvm in the dump and exit.
-title            Print title of the dump and exit.
-time             Print time of the dump and exit.
-dis <addr> <n>   Disassemble <n> instructions starting at <addr> (hex).
To use the dump utility with these options, issue: java com.ibm.jvm.svcdump.Dump [options] <filename> Example 21-2 shows a simple shell script that can be used to run the utility.
Example 21-2 Shell script for the dump utility
#!/bin/sh
#TZ=EST5EDT
set -x
# The first argument is the name of the dump to analyze
DUMPNAME=$1
SVCDUMPJARFILE=/u/dclarke/svcdump20041007.jar
SVCDUMPLIBPATH=/u/dclarke
# Format the exception information
java -Xmx348m -Dsvcdump.libpath=$SVCDUMPLIBPATH -Xbootclasspath/p:$SVCDUMPJARFILE \
   -Dsvcdump.default.jvm=0 \
   com.ibm.jvm.svcdump.Dump -exception \
   $DUMPNAME \
   >>$DUMPNAME.svcdump.txt
# Append the HPI trace
java -Xmx348m -Dsvcdump.libpath=$SVCDUMPLIBPATH -Xbootclasspath/p:$SVCDUMPJARFILE \
   -Dsvcdump.default.jvm=0 \
   com.ibm.jvm.svcdump.Dump -hpitrace \
   $DUMPNAME \
   >>$DUMPNAME.svcdump.txt
# Append the system trace
java -Xmx348m -Dsvcdump.libpath=$SVCDUMPLIBPATH -Xbootclasspath/p:$SVCDUMPJARFILE \
   -Dsvcdump.default.jvm=0 \
   com.ibm.jvm.svcdump.Dump -systrace \
   $DUMPNAME \
   >>$DUMPNAME.svcdump.txt
Analysis of the dump can take some time, especially for the first execution. The tool stores some heap information in a small .cache file that makes subsequent executions faster. Use this simple JCL to run the utility in batch:
//STEP1 EXEC PGM=BPXBATCH,REGION=0M,
// PARM='SH /u/dclarke/svcdump.sh ONTOP.GS031.P10316.C724.JVMDMP'
transaction has ended. There is a bug in this code that is causing some other global object to maintain a reference to the XML data object after the end of the transaction. Although the object itself is small, it contains a reference to a non-trivial number of objects that are created by XML parsing. A successful diagnosis of the problem might be as follows:
1. A still reachable XML data object exists after each transaction runs.
2. The remaining reachable data gradually increases over time after each garbage collection cycle.
3. You can observe this from the verbose:gc output (or from the incore verbosegc data).
4. Eventually, the Java heap is exhausted and an OutOfMemoryError is thrown.
5. You obtain a console dump of the server at the time when the heap usage is high.
6. You run the PrintDomTree tool and establish from the reports that there is an unexpectedly high number of these XML data objects.
7. You find the unexpected reference in the reports from the global object.
8. A review of the logic reveals that this reference is not nulled out after the transaction as the design anticipated.
More information about the different FindRoots utilities and the reports they produce is available in WebSphere for z/OS V5 JVM Dump and Heap Analysis Tools, REDP-3950.
21.2.2 HeapRoots
The HeapRoots utility is shipped in the HR204.jar file and arose from the same requirement: being able to map the heap object graphs. It was originally developed for the JVM shipped with IBM AIX. It is now possible to use this code seamlessly with binary SVC or transaction dumps, providing a range of functions in addition to those that are available with the FindRoots utility in svcdump.jar. HeapRoots is available through the alphaWorks Web site at:
http://www.alphaworks.ibm.com/tech/heaproots
To use HeapRoots with .phd files, use the following command:
   java -classpath svcdump.jar;HR204.jar HR.main.Launcher
The semicolon is the class path separator on Windows. In a UNIX shell, such as the z/OS UNIX shell, use a colon instead.
Examples and details about HeapRoots are available in WebSphere for z/OS V5 JVM Dump and Heap Analysis Tools, REDP-3950.
You can invoke Dumpviewer with the following command:
   java -Xmx512m -cp svcdump.jar com.ibm.jvm.dump.format.DvConsole -g
When the GUI initializes, select File from the menu to locate the dump for initialization.
Restriction: To use this tool with z/OS, you must export the DISPLAY environment variable to a valid X Server display on a Win32 or Linux system.
With the Win32 JDK shipped with WebSphere for Windows, the formatter can be invoked with the jformat command, which is found in C:\Program Files\IBM\Java142\bin\jformat. The -g switch starts the GUI:
"C:\Program Files\IBM\Java142\bin\jformat" -J-Xmx512m -g
For more information, see IBM Developer Kit and Runtime Environment, Java 2 Technology Edition, Version 1.3.1, Diagnostics Guide, SC34-6200, (used with WebSphere V5.0), and IBM Developer Kit and Runtime Environment, Java 2 Technology Edition, Version 1.4.1, Diagnostics Guide, SC34-6309, (used with WebSphere V5.1). These manuals and detailed documentation for the garbage collector used by the IBM JVM, can be downloaded from: http://www.ibm.com/developerworks/java/jdk/diagnosis/
Figure 21-1 shows an example of the output that the tool produces.
traceFile is the name of the trace file that you want to open for analysis. This parameter is optional. You can also open files from the GUI by selecting File → Open. Figure 21-2 shows the Trace Analyzer for WebSphere z/OS window.
4. From the Refine menu, you can select Filter and Search. By selecting any entry from the trace pane, you can see its full contents in the bottom console.
5. For help using the program, see the integrated help system by selecting Help → Using Trace Analyzer.
This utility makes it relatively easy to read the diagnostic information, even when you are not very familiar with the component that is being debugged or tested.
You first need to capture the Java garbage collection statistics. To use the Administrative Console to turn on verbose garbage collection, follow these steps:
1. Expand the Servers node in the left-hand menu and select Application Servers.
2. From the list of servers, select the application server for which you want verbose garbage collection.
3. From the Java and Process Management menu, select Process Definition.
4. From the processType list, select the appropriate servant.
5. From the Additional Properties menu, select Java Virtual Machine.
6. Find Verbose garbage collection in the list of General Properties and select it. Figure 21-3 shows the Advanced Java Virtual Machine settings for enabling Verbose mode.
7. Click Apply to apply your changes. Click Save to save your configuration.
8. Run transactions through your server for a specific period. The result is a log file that is similar to that in Example 21-4.
Example 21-4 Java garbage collection trace sample
<AF[1]: Allocation Failure. need 528 bytes, 122081 ms since last AF>
<AF[1]: managing allocation failure, action=1 (0/255012224) (13421696/13421696)>
<GC(1): GC cycle started Mon Aug 1 17:44:10 2005
<GC(1): freed 183448600 bytes, 73% free (196870296/268433920), in 400 ms>
<GC(1): mark: 332 ms, sweep: 68 ms, compact: 0 ms>
<GC(1): refs: soft 0 (age >= 32), weak 275, final 1348, phantom 0>
<AF[1]: completed in 403 ms>

The number in the brackets in AF[x] at the start of a line indicates how many times the memory allocation failed. The number in the parentheses in GC(y) indicates how many times garbage collection has occurred since the servant region started:
<GC(1): freed 183448600 bytes, 73% free (196870296/268433920), in 400 ms>
This line indicates how much free memory is available in the JVM after the GC, 73% in our example. If this number steadily decreases over a period of time, there is probably a problem in the JVM memory heap. Ultimately, the JVM keeps trying to allocate memory and keeps failing because garbage collection cannot reclaim any free memory. This occurs when all objects in the JVM are still held by references that are never released.
9. Using a REXX or AWK script, format the output of the Java garbage collection trace to get a semicolon-delimited condensed file (Example 21-5).
Example 21-5 Semicolon-delimited Java garbage collection trace
afnum ; timeSinceLastAF ; aftime ; afsize ; gcnum ; conGCnumb; timeSinceLastConGC; conGCtime; ConRes ; ConTarget ; ConTraced ; ConFree ;gcstart ; gcfreed ; freespace ; heapsize ; gctime ;threadStopTime ; threadStartTime ; marktime ; sweeptime ; compacttime ; msmin ; msmax ; msavg ; moved ; bytes ; reason ;
1;122081;403;528;1;;;;;;;;17:44:10;183448600;196870296;268433920;400;;;332;68;0;;;;;;
2;102844;514;528;2;;;;;;;;17:45:53;139549048;152714712;268433920;513;;;462;51;0;;;;;;
3;53149;526;5184;3;;;;;;;;17:46:47;134965264;147782608;268433920;526;;;464;62;0;;;;;;
4;431101;525;528;4;;;;;;;;17:53:58;140140296;150365624;268433920;524;;;474;50;0;;;;;;
5;346417;446;528;5;;;;;;;;17:59:45;156671872;164212856;268433920;446;;;392;54;0;;;;;;
6;9495859;525;528;6;;;;;;;;20:38:01;140242632;145579056;268433920;525;;;462;63;0;;;;;;
7;12308;539;528;7;;;;;;;;20:38:14;138183336;140835416;268433920;539;;;482;57;0;;;;;;
10. FTP the output file (in ASCII) to your workstation and import it into a spreadsheet tool such as Microsoft Excel or Lotus 1-2-3.
11. From the spreadsheet, create a diagram. Figure 21-4 shows an example.
[Chart: Garbage Collection time consumption. Bars per garbage collection cycle, plotted by start time, with three series: mark (mark all live objects), sweep (identify objects no longer referenced), and compact (consolidate free space). The plotted values match the gctime, marktime, and sweeptime columns of Example 21-5.]
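The formatting in step 9 can be done with any scripting language. The following is a minimal Python sketch, not the REXX or AWK script that the authors used. It assumes the IBM JVM 1.4 verbose GC line layout shown in Example 21-4, and it emits only a subset of the Example 21-5 columns (the function name condense and the field subset are illustrative choices).

```python
import re

# Field subset written to each semicolon-delimited row (an illustrative choice).
FIELDS = ["afnum", "timeSinceLastAF", "aftime", "afsize", "gcnum", "gcstart",
          "gcfreed", "freespace", "heapsize", "gctime",
          "marktime", "sweeptime", "compacttime"]

AF_START = re.compile(r"<AF\[(\d+)\]: Allocation Failure\. need (\d+) bytes, (\d+) ms since last AF")
GC_START = re.compile(r"<GC\((\d+)\): GC cycle started \w+ \w+ +\d+ (\d\d:\d\d:\d\d) \d{4}")
GC_FREED = re.compile(r"<GC\(\d+\): freed (\d+) bytes, \d+% free \((\d+)/(\d+)\), in (\d+) ms")
GC_PHASES = re.compile(r"<GC\(\d+\): mark: (\d+) ms, sweep: (\d+) ms, compact: (\d+) ms")
AF_DONE = re.compile(r"<AF\[\d+\]: completed in (\d+) ms")


def condense(trace):
    """Return one semicolon-delimited row per completed allocation failure."""
    rows, cur = [], {}
    for line in trace.splitlines():
        if m := AF_START.search(line):
            cur = {"afnum": m[1], "afsize": m[2], "timeSinceLastAF": m[3]}
        elif m := GC_START.search(line):
            cur.update(gcnum=m[1], gcstart=m[2])
        elif m := GC_FREED.search(line):
            cur.update(gcfreed=m[1], freespace=m[2], heapsize=m[3], gctime=m[4])
        elif m := GC_PHASES.search(line):
            cur.update(marktime=m[1], sweeptime=m[2], compacttime=m[3])
        elif m := AF_DONE.search(line):
            cur["aftime"] = m[1]
            rows.append(";".join(cur.get(name, "") for name in FIELDS))
            cur = {}
    return rows


SAMPLE = """\
<AF[1]: Allocation Failure. need 528 bytes, 122081 ms since last AF>
<AF[1]: managing allocation failure, action=1 (0/255012224) (13421696/13421696)>
<GC(1): GC cycle started Mon Aug 1 17:44:10 2005
<GC(1): freed 183448600 bytes, 73% free (196870296/268433920), in 400 ms>
<GC(1): mark: 332 ms, sweep: 68 ms, compact: 0 ms>
<GC(1): refs: soft 0 (age >= 32), weak 275, final 1348, phantom 0>
<AF[1]: completed in 403 ms>
"""

if __name__ == "__main__":
    print(";".join(FIELDS))
    for row in condense(SAMPLE):
        print(row)
```

Dividing freespace by heapsize for each row gives the free percentage after each collection, which is the value to watch for a downward trend.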
For more information, search for Java memory tuning tip at the WebSphere for z/OS Information Center: http://publib.boulder.ibm.com/infocenter/wasinfo/v6r0/index.jsp
==============================================================================
Beginning of Name Space Dump
==============================================================================
 1 (top)
 2 (top)/legacyRoot                      javax.naming.Context
 2    Linked to context: cl6552/persistent
 3 (top)/domain                          javax.naming.Context
 3    Linked to context: cl6552
 4 (top)/persistent                      javax.naming.Context
 5 (top)/persistent/cell                 javax.naming.Context
 5    Linked to context: cl6552
 6 (top)/cellname                        java.lang.String
 7 (top)/cell                            javax.naming.Context
 7    Linked to context: cl6552
 8 (top)/nodes                           javax.naming.Context
 9 (top)/nodes/nd6552                    javax.naming.Context
10 (top)/nodes/nd6552/domain             javax.naming.Context
10    Linked to context: cl6552
11 (top)/nodes/nd6552/servers            javax.naming.Context
12 (top)/nodes/nd6552/servers/ws6552     javax.naming.Context
==============================================================================
End of Name Space Dump
==============================================================================
The local system runs the debugger, and the remote system runs both the debugging engine and your program. The person debugging the program on the workstation interacts with the program as usual (except where breakpoints or step commands introduce delays) and can control the program and observe the internal behavior of the remote program from the local system.
4. Specify the JVM debug port. Port 7777 is the default.
5. Verify the JVM debug arguments. The default settings are:
-Djava.compiler=NONE -Xdebug -Xnoagent -Xrunjdwp:transport=dt_socket,server=y,suspend=n,address=7777
Debug filters do not need to be set initially. You might find them useful as you gain experience with the debugger.
6. Click Apply and Save to apply your modifications and save the configuration.
7. Stop and start the server for debugging.
8. In Rational Application Developer, open the Debug Perspective by selecting Window → Open Perspective → Debug.
9. Open the Debug configurations by clicking the debug icon and choosing Debug from the menu (Figure 21-6).
10. Select Remote Java Application from the list of configurations. Click New to create a new configuration. You should see a panel similar to the one in Figure 21-7.
11. Enter a Name for your configuration. Select the project you want to debug by clicking Browse and choosing it from the list of projects in the work space.
12. Enter the host address of your WebSphere for z/OS application server and enter the JVM debug port that you specified in the administrative console.
13. Click Apply to save your changes.
14. Click Debug. Now, you can start debugging the application on the remote application server. A debug engine daemon is listening for a connection.
To debug Java source code, you set breakpoints in the source code. You can set a breakpoint on a line of code, or set one to be triggered when a certain exception occurs. Add breakpoints in your code by double-clicking the gray area next to the line of code at which you want to break. Right-click the breakpoint and select Breakpoint Properties to specify more detailed properties.
To start a program in debug mode, click the Debug icon, and the Debug Perspective opens. If you have multiple debug configurations, you can choose which one to debug by clicking the arrow next to the Debug icon and selecting it from the menu.
In the Debug Perspective, you use icons to step into a line of code, to step over a line of code, or to run to the end of a method (step return). There are multiple views to aid you in debugging your application, including the Variables, Inspector, Debug, and Outline views.
Figure 21-8 The Debug Perspective in Rational Application Developer
Another Rational Application Developer tool is the XML editor and perspective that can be used to properly create and maintain valid XML files. An XML validator ensures that the XML is in a valid format and can be useful in fixing files that have become incorrectly formatted. The Rational Application Developer can import application client JAR, EJB JAR, EAR, and WAR files, which can be helpful in problem determination. Sometimes, it is important to be able to view the code of the application that you are deploying. These tools can take the place of a decompiler because they can import an archive file into an editable project.
Tivoli Performance Viewer consists of two panels: the Resource Selection panel and the Data Monitoring panel (see Figure 21-10 on page 271). The Resource Selection panel provides a view of resources for which performance data can be displayed. The Data Monitoring panel displays numeric and statistical data for the resources in the Resource Selection panel.
Web and EJB Thread Pools
Database and connection pool size
JVM Memory
Request metrics is a tool (embedded in Tivoli Performance Viewer) that you can use to track individual transactions, recording the processing time in each of the major WebSphere Application Server components.
To enable request metrics from the Administrative Console:
1. Open the Administrative Console.
2. Select Monitoring and Tuning → Request metrics in the console navigation tree.
3. Select Enable in the Request metrics field under the Configuration tab.
4. Specify the components that are instrumented by request metrics.
5. Specify how much data to collect.
6. Enable and disable logging.
7. Enable Application Response Measurement (Application RM) Agent.
8. Specify which Application RM type to use.
9. Specify the name of the Application RM transaction factory implementation class.
10. Isolate performance for specific types of requests.
11. Add and remove request metrics filters.
12. Click Apply and Save.
13. Regenerate the Web server plug-in configuration file so that it recognizes the changes that you made for the request metrics configuration.
Another parameter in Tivoli Performance Viewer is thread pools. With Thread Pools, components of the server can reuse the threads and avoid the creation of new threads at run time. Creating new threads expends time and resources. Figure 21-11 shows a configuration panel for thread pool properties.
The maximum number of threads that you can create is constrained only by the limits of the JVM and the operating system. When a thread pool that is allowed to grow expands beyond the maximum size, the additional threads are not reused. They are discarded from the pool after the work items for which they were created have been processed. When additional threads are created, a message is logged in the SYSOUT file to let you know that you went beyond the maximum size that was set for the thread pool.
Attention: The size of the thread pools constrains the performance, and setting the size of the thread pools too high increases the amount of memory that is needed by the system.
For more detailed information about all available parameters, refer to Monitoring performance with Tivoli Performance Viewer at the WebSphere for z/OS Information Center:
http://publib.boulder.ibm.com/infocenter/wasinfo/v6r0/index.jsp
Chapter 22. Other handy tools
To use this tool, download the InetInfo.java program to your working directory in z/OS USS. Compile the Java program into the class file as follows:
>/usr/lpp/java/J1.4/bin/javac InetInfo.java
When you run the Java class, you get the following results, as shown in Figure 22-1:
1. The function getLocalHost returns an IP address.
2. The function getHostName by address returns a host name.
3. The function getHostAddress by name returns the IP address correctly.
>/Z16RA1/usr/lpp/java/J1.4/bin/javac InetInfo.java
>/Z16RA1/usr/lpp/java/J1.4/bin/java InetInfo
get Local Host IP Address: 9.12.4.28
get Host Name By Address using 9.12.4.28
Host Name: wtsc55.itso.ibm.com
get Host Address By Name using wtsc55.itso.ibm.com
Host Address: 9.12.4.28
If any of the functions fail, the following message is displayed: Unknown Host, result: <returned message> The <returned message> value is the reason for the exception.
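The same three lookups can be illustrated on any workstation. The following Python sketch is not the InetInfo.java program; it is an analogous check that uses the standard socket module, and the function name inet_info and the exact message wording are illustrative assumptions modeled on the output above.

```python
import socket


def inet_info(host=None):
    """Run three name-resolution checks and return one report line per result.

    host defaults to the local host name. A lookup failure stops the
    sequence and is reported in the style of the "Unknown Host" message.
    """
    report = []
    try:
        # 1. Resolve the local (or given) host name to an IP address.
        name = host or socket.gethostname()
        address = socket.gethostbyname(name)
        report.append(f"get Local Host IP Address: {address}")
        # 2. Reverse lookup: IP address back to a host name.
        resolved_name = socket.gethostbyaddr(address)[0]
        report.append(f"get Host Name By Address using {address} Host Name: {resolved_name}")
        # 3. Forward lookup of the resolved name.
        resolved_address = socket.gethostbyname(resolved_name)
        report.append(f"get Host Address By Name using {resolved_name} Host Address: {resolved_address}")
    except OSError as exc:  # socket.gaierror and socket.herror derive from OSError
        report.append(f"Unknown Host, result: {exc}")
    return report


if __name__ == "__main__":
    for line in inet_info():
        print(line)
```

If the address in the third line differs from the address in the first line, the forward and reverse name resolution definitions are probably inconsistent.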
You can copy or download this Java program from the Techdoc Java program to test TCP/IP setup - InetInfo.java, TD100609, available at: http://www.ibm.com/support/techdocs/
Use the statistics to gain many kinds of higher-level views, for example:
Overall summary of transmission
Summary by different endpoints (by hardware, IP, and so on)
Summary by protocol hierarchy
Look into the formatted display of each data packet. Ethereal formats all kinds of protocol headers according to standard specifications.
One of the most powerful tools is a feature called Follow TCP Stream. When looking at a TCP data packet, you can instruct Ethereal to analyze the user data portion of the packet according to a higher-level application protocol specification such as HTTP. The data is formatted according to the HTTP header into separate sections inside other windows. In the data windows, you can display the data in ASCII, EBCDIC, hexadecimal, or C-array format.
Figure 22-2 shows an Ethereal window that is analyzing a TCP data packet. The section display shows the details of all the formatted network headers of the TCP data packet. The Follow TCP Stream window shows the formatted HTTP header and HTML data.
Figure 22-2 Ethereal data analysis with Follow TCP Stream windows (callouts: Follow TCP Stream pop-up windows, HTTP header, HTML data)
For more information about Ethereal, see:
http://www.ethereal.com
The source code and installers for Microsoft Windows, Red Hat Linux, Sun Solaris, IBM AIX, SUSE Linux, and more can be found on this Web site:
http://www.ethereal.com/download.html
For more information about WinPcap, see:
http://www.winpcap.org
2. Start the external writer using the following command:
/trace ct,wtrstart=CTWTRPD
3. Start the TCP/IP packet trace with filtering to pick up only one IP address:
/v tcpip,TCPIP,pkttrace,full,ip=9.12.6.160
4. When the system responds with a prompt, reply as follows:
/r xx,WTR=CTWTRPD,end
5. Use the DISPLAY command to check the external writer status (Example 22-2).
Example 22-2 Display trace status command
/d trace,comp=systcpda,sub=(TCPIP)
RESPONSE=SC49
 IEE843I 15.02.26  TRACE DISPLAY 267
        SYSTEM STATUS INFORMATION
  ST=(ON,0256K,00512K) AS=ON  BR=OFF EX=ON  MT=(ON,024K)
   TRACENAME
   =========
   SYSTCPDA
                      MODE BUFFER HEAD SUBS
                      =====================
                      OFF         HEAD    1
      NO HEAD OPTIONS
   SUBTRACE           MODE BUFFER HEAD SUBS
  ------------------------------------------------------
   TCPIP              ON   0016M
      ASIDS      *NONE*
      JOBNAMES   *NONE*
      OPTIONS    MINIMUM
      WRITER     CTWTRPD
6. Recreate the problem scenario using the Web browser to access a page.
7. Stop the packet trace:
/trace ct,off,comp=systcpda,sub=(TCPIP)
8. Stop the external writer:
/trace ct,wtrstop=CTWTRPD
You use the IPCS utility to format the captured trace data into a user-friendly format. During the format process, you have a choice of three levels of detail: Summary, Short, and Full. Complete the following steps to format the trace data:
1. Access IPCS.
2. Select 2 (ANALYSIS) from the option list.
3. Select 0 (DEFAULT) from the option list and enter the trace data set name to be used as the default source:
Source ==> DSNAME('WAS5PD.SC49.CTRACE')
Press PF3 to return to the previous panel.
4. Select 7 (TRACES) from the option list.
5. Select 1 (CTRACE) from the option list.
6. Select D (DISPLAY) from the option list. Enter the component name, subsystem name, and trace detail level as shown in Figure 22-3. To start formatting, type S on the command line and press Enter.
------------------------ CTRACE DISPLAY PARAMETERS ------------------------
COMMAND ===>

System          ===>              (System name or blank)
Component       ===> SYSTCPDA     (Component name (required))
Subnames        ===> TCPIP

GMT/LOCAL       ===> G            (G or L, GMT is default)
Start time      ===>              (mm/dd/yy,hh:mm:ss.dddddd or
Stop time       ===>               mm/dd/yy,hh.mm.ss.dddddd)
Limit           ===> 0            Exception ===>
Report type     ===> FULL         (SHort, SUmmary, Full, Tally)
User exit       ===>              (Exit program name)
Override source ===>
Options         ===>

To enter/verify required values, type any character
Entry IDs ===>    Jobnames ===>    ASIDs ===>    OPTIONS ===>    SUBS ===>

CTRACE COMP(SYSTCPDA) SUB((TCPIP)) FULL

ENTER = update CTRACE definition. END/PF3 = return to previous panel.
S = start CTRACE. R = reset all fields.
Formatting the trace data with a FULL detail level results in information in the following sections:
Interface device
IP header
TCP header
Message data
Figure 22-4 on page 281 shows a generated report of one captured trace data packet. In the IP header, note the IP addresses (source and destination) and the date and time stamp. In the TCP header section, note the socket ports (source and destination).
4 SC49 PACKET 00000004 18:59:51.958438 Packet Trace
From Interface : OSA2CA0LNK        Device: QDIO Ethernet   Full=448
Tod Clock      : 2004/09/23 18:59:51.958438
Sequence #     : 0                 Flags: Pkt
IpHeader: Version : 4              Header Length: 20
Tos            : 00                QOS: Routine Normal Service
Packet Length  : 448               ID Number: 53CC
Fragment       : DontFragment      Offset: 0
TTL            : 127               Protocol: TCP           CheckSum: 8996 FFFF
Source         : 9.12.6.160
Destination    : 9.12.4.30
TCP
Source Port     :                  Destination Port: 9508 ()
Sequence Number :                  Ack Number: 3021682969
Header Length   :                  Flags: Ack Psh
Window Size     :                  CheckSum: 00CA FFFF     Urgent Data Pointer: 0000

IP Header : 20
000000 450001C0 53CC4000 7F068996 090C06A0 090C041E
Protocol Header : 20
000000 0BF02524 B33C04B3 B41B3919 5018FAF0 00CA0000
Data : 408   Data Length: 408
000000 47455420 2F49424D 546F6F6C 732F4542 |.......(.??%.... GET /IBMTools/EB|
000010 697A4869 74436F75 6E742048 5454502F |.:....?.>.....&. izHitCount HTTP/|
000020 312E310D 0A416363 6570743A 202A2F2A |................ 1.1..Accept: */*|
000030 0D0A5265 66657265 723A2068 7474703A |................ ..Referer: http:|
000040 2F2F7774 73633439 2E697473 6F2E6962 |............?... //wtsc49.itso.ib|
000050 6D2E636F 6D3A3935 30382F49 424D546F |_..?_........(.? m.com:9508/IBMTo|
000060 6F6C732F 0D0A4163 63657074 2D4C616E |?%...........</> ols/..Accept-Lan|
000070 67756167 653A2065 6E2D7573 0D0A4163 |../.....>....... guage: en-us..Ac|
000080 63657074 2D456E63 6F64696E 673A2067 |......>.?..>.... cept-Encoding: g|
000090 7A69702C 20646566 6C617465 0D0A5573 |:.......%/...... zip, deflate..Us|
0000A0 65722D41 67656E74 3A204D6F 7A696C6C |......>...(?:.%% er-Agent: Mozill|
0000B0 612F342E 30202863 6F6D7061 7469626C |/.......?_./...% a/4.0 (compatibl|
0000C0 653B204D 53494520 362E303B 2057696E |...(...........> e; MSIE 6.0; Win|
0000D0 646F7773 204E5420 352E313B 202E4E45 |.?...+........+. dows NT 5.1; .NE|
0000E0 5420434C 5220312E 312E3433 3232290D |...<............ T CLR 1.1.4322).|
0000F0 0A486F73 743A2077 74736334 392E6974 |..?............. .Host: wtsc49.it|
000100 736F2E69 626D2E63 6F6D3A39 3530380D |.?..._..?_...... so.ibm.com:9508.|
000110 0A436F6E 6E656374 696F6E3A 204B6565 |..?>>....?>..... .Connection: Kee|
000120 702D416C 6976650D 0A436F6F 6B69653A |...%......??,... p-Alive..Cookie:|
000130 206D7370 3D616C72 65616479 4F666665 |._.../%../.`|...  msp=alreadyOffe|
000140 7265643B 204A5345 5353494F 4E49443D |..........|+... red; JSESSIONID=|
000150 30303030 654E6E78 5F415854 6774655F |.....+>.^......^ 0000eNnx_AXTgte_|
000160 42304C35 337A6A45 6655513A 42424342 |..<..:.......... B0L53zjEfUQ:BBCB|
000170 31324632 32423642 43443630 30303030 |................ 12F22B6BCD600000|
000180 30314438 30303030 30303032 30393043 |................ 01D800000002090C|
000190 30363430 0D0A0D0A                   |........         0640....        |

Figure callouts: HTTP request; Session cookie
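The formatted fields in the report can be cross-checked against the raw header bytes. The following Python sketch is an illustration only: it decodes the 20-byte IP header and the 20-byte TCP header (the Protocol Header section) from the hexadecimal words in the report, using the standard IPv4 and TCP header layouts. The function names and dictionary keys are illustrative assumptions.

```python
import socket
import struct

# Hexadecimal words copied from the "IP Header" and "Protocol Header" sections.
IP_HEADER_HEX = "450001C0 53CC4000 7F068996 090C06A0 090C041E"
TCP_HEADER_HEX = "0BF02524 B33C04B3 B41B3919 5018FAF0 00CA0000"

TCP_FLAGS = [("Fin", 0x01), ("Syn", 0x02), ("Rst", 0x04),
             ("Psh", 0x08), ("Ack", 0x10), ("Urg", 0x20)]


def decode_ip(hex_words):
    """Decode a 20-byte IPv4 header (no options) given as hex words."""
    raw = bytes.fromhex(hex_words.replace(" ", ""))
    (ver_ihl, tos, length, ident, frag, ttl, proto, checksum,
     src, dst) = struct.unpack("!BBHHHBBH4s4s", raw)
    return {
        "version": ver_ihl >> 4,
        "header_length": (ver_ihl & 0x0F) * 4,
        "total_length": length,
        "id": f"{ident:04X}",
        "dont_fragment": bool(frag & 0x4000),
        "ttl": ttl,
        "protocol": proto,  # 6 is TCP
        "checksum": f"{checksum:04X}",
        "source": socket.inet_ntoa(src),
        "destination": socket.inet_ntoa(dst),
    }


def decode_tcp(hex_words):
    """Decode a 20-byte TCP header (no options) given as hex words."""
    raw = bytes.fromhex(hex_words.replace(" ", ""))
    (sport, dport, seq, ack, offset, flags, window, checksum,
     urgent) = struct.unpack("!HHIIBBHHH", raw)
    return {
        "source_port": sport,
        "dest_port": dport,
        "sequence": seq,
        "ack": ack,
        "header_length": (offset >> 4) * 4,
        "flags": [name for name, bit in TCP_FLAGS if flags & bit],
        "window": window,
        "checksum": f"{checksum:04X}",
        "urgent": urgent,
    }


if __name__ == "__main__":
    print(decode_ip(IP_HEADER_HEX))
    print(decode_tcp(TCP_HEADER_HEX))
```

The decoded addresses, packet length, TTL, destination port, and acknowledgment number agree with the formatted fields, which is a quick way to confirm that a trace entry was read correctly.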
For more information about the TCP/IP for z/OS packet trace, see z/OS V1R5.0 Communication Server: IP Diagnosis Guide, GC31-8782. For more information about the IPCS tool, see OS/390 V2R10.0 MVS Interactive Problem Control System (IPCS) User's Guide, GC28-1756.
MXI can display a wealth of information from your system, including:
APF, LNKLST, and LPA data sets
Active address spaces and ASVT slot usage
Allocated data sets for any address space
Master and user catalogs
Common storage usage by address space or subpool
Orphaned common storage
Cross-memory connections
CPU and LPAR information
Online DASD and tape units
Enqueue requests and contention
HSM request queues
ISPF screen images of any user
LLA module statistics
Memory contents of any address space
Memory delete queue
Real and auxiliary storage usage
SMS classes
SMS storage groups
Subsystems
SVCs and PC routines
Sysplex information
WLM information
Figure 22-5 shows an excerpt of the MXI Primary Option menu.
Most of the displays can be filtered using ISPF-like masking characters, and many display fields have point-and-shoot functionality that drills down to a more detailed display.
For CICS regions called upon by WebSphere transactions.
For DB2.
For other activity not directly related to our WebSphere environment. Because our exercise was meant to illustrate a production environment as opposed to a lab-controlled environment, this was done to isolate started tasks, systems management tasks, TSO users, and so forth that were concurrently active in the sysplex.
//RMFRPT52 JOB (999,POK),'FRANCK',CLASS=A,REGION=4096K,
//         MSGCLASS=T,TIME=90,MSGLEVEL=(1,1),NOTIFY=&SYSUID
//RMFSORT  EXEC PGM=SORT,REGION=0M
//******** SORTIN DATA SETS FOLLOWING HERE *************************
//SORTIN   DD DISP=SHR,
//         DSN=FRANCK.SMF.D06T1700
//SORTOUT  DD DISP=(NEW,PASS),DSN=&&SORTOUT,UNIT=SYSALLDA,
//         SPACE=(CYL,(50,50)),DCB=*.RMFSORT.SORTIN
//SORTWK01 DD DISP=(NEW,DELETE),
//         DSN=&&WK1,UNIT=SYSALLDA,SPACE=(CYL,(50,50))
//SORTWK02 DD DISP=(NEW,DELETE),
//         DSN=&&WK2,UNIT=SYSALLDA,SPACE=(CYL,(50,50))
//SORTWK03 DD DISP=(NEW,DELETE),DSN=&&WK3,
//         UNIT=SYSALLDA,SPACE=(CYL,(50,50))
//SYSPRINT DD SYSOUT=*
//SYSOUT   DD SYSOUT=*
//SYSIN    DD *
  SORT FIELDS=(11,4,CH,A,7,4,CH,A),EQUALS
  MODS E15=(ERBPPSRT,500),E35=(ERBPPSRT,500)
//POST1    EXEC PGM=ERBRMFPP
//MFPINPUT DD DSN=*.RMFSORT.SORTOUT,DISP=(OLD,PASS)
//* REPORTS (CHAN)
//* REPORTS (ENQ)
//* REPORTS (IOQ)
//* REPORTS (PAGING)
//* REPORTS (DEVICE(DASD))
//* REPORTS (OMVS,HFS)
//*
//SYSIN    DD *
  RTOD(0000,2400)
  STOD(0000,2400)
  REPORTS (CPU)
  SUMMARY (INT)
  SYSOUT(T)
//POST2    EXEC PGM=ERBRMFPP
//MFPINPUT DD DSN=*.RMFSORT.SORTOUT,DISP=(OLD,PASS)
//* SYSRPTS (WLMGL(POLICY(FRANCK.LSM301_1)))
//*
//SYSIN    DD *
  RTOD(0000,2400)
  STOD(0000,2400)
  SUMMARY (INT)
  SYSRPTS (WLMGL(RCLASS(WAS*,OTHER,SYS*)))
  SYSRPTS (WLMGL(POLICY,SCPER(WAS*)))
  SYSOUT(T)
Storage
I/O activity
Additional resource reports for channels, paging, and virtual storage, can be further investigated.
SYSTEM ID SC48        RPT VERSION V1R2 RMF

MVS BUSY TIME PERC     95.15   95.13   95.14
CPU SERIAL NUMBER      0B0ECB  1B0ECB

SAMPLES = 301
DISTRIBUTION OF QUEUE LENGTHS (%)
    0     1     2     3     4     5     6   7-8  9-10 11-12 13-14
----- ----- ----- ----- ----- ----- ----- ----- ----- ----- -----
  0.0   6.9   3.3  12.2  49.1  21.5   4.6   1.6   0.0   0.3   0.0
We ran in LPAR mode, and our report showed that partition SC48 was running at 90.16% CPU busy. Although it is common to talk about a CPU being p% busy, this is an abbreviated statement that has no physical reality. At any time, a CP has only two operational states:
Busy (that is, 100 percent busy)
Idle (that is, 0 percent busy)
All CPU percentages in the RMF reports are relative to the RMF measurement time interval. The CPU percentages that are reported express the amount of time that the CPU was busy over the measurement interval. Hence, the correct way to read the report is that the CPU is 100% busy p% of the time.
Also check the IN READY line. This is the dispatching queue, that is, the work that is in the system and ready to be dispatched. In the example in Figure 22-6 on page 285, no immediate CPU contention is visible. Although the CPU is 90% busy (something not unusual in a z/OS environment), the IN READY queue length remains below three times the number of CPs for more than 90% of the time. A longer queue might still not be a problem if there are non-time-critical batch jobs running in the background.
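The queue length check can be expressed as a small calculation. The following Python sketch assumes that the percentages in the CPU activity report map to the queue length buckets 0, 1, 2, 3, 4, 5, 6, 7-8, 9-10, 11-12, and 13-14 in that order, and that the partition has two logical CPs; both the bucket mapping and the function name are assumptions made for illustration.

```python
# (lowest queue length, highest queue length, percentage of samples).
# The mapping of percentages to buckets is an assumption based on the
# column order in the report.
QUEUE_DISTRIBUTION = [
    (0, 0, 0.0), (1, 1, 6.9), (2, 2, 3.3), (3, 3, 12.2),
    (4, 4, 49.1), (5, 5, 21.5), (6, 6, 4.6), (7, 8, 1.6),
    (9, 10, 0.0), (11, 12, 0.3), (13, 14, 0.0),
]


def pct_time_below(distribution, limit):
    """Percentage of samples whose whole bucket lies below the given limit."""
    return round(sum(pct for _low, high, pct in distribution if high < limit), 1)


if __name__ == "__main__":
    logical_cps = 2                      # CPs defined to the partition
    limit = 3 * logical_cps              # rule of thumb: three times the CPs
    print(f"Queue length below {limit}: "
          f"{pct_time_below(QUEUE_DISTRIBUTION, limit)}% of the samples")
```

Under these assumptions, the queue length stays below six for 93.0% of the samples, which is consistent with the statement that it remains below three times the number of CPs for more than 90% of the time.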
SYSTEM ID SC48           DATE 11/17/2002    INTERVAL 05.00.509
RPT VERSION V1R2 RMF     TIME 16.54.59

MVS PARTITION NAME                    A11
IMAGE CAPACITY                        171
NUMBER OF CONFIGURED PARTITIONS        15
NUMBER OF PHYSICAL PROCESSORS          13
                 CP                     7
                 ICF                    6
WAIT COMPLETION                        NO
DISPATCH INTERVAL                 DYNAMIC

-- PARTITION DATA --      -- AVERAGE PROCESSOR UTILIZATION PERCENTAGES --
                 MSU      LOGICAL PROCESSORS   --- PHYSICAL PROCESSORS ---
NAME       S  WGT  DEF    EFFECTIVE   TOTAL    LPAR MGMT  EFFECTIVE  TOTAL
A1         A  180   45         4.65    5.05         0.11       1.33   1.44
A2         A   10   30         4.52    4.91         0.11       1.29   1.40
A3         A  180    0         4.52    4.92         0.11       1.29   1.41
A4         A   10    0         0.78    0.92         0.04       0.22   0.26
A5         A   10   45         4.12    4.52         0.11       1.18   1.29
A6         A   10    0        10.19   10.59         0.12       2.91   3.03
A7         A   10   45         4.52    4.97         0.13       1.29   1.42
A8         A   10    0         3.99    4.38         0.11       1.14   1.25
A9         A   10    0         3.60    3.99         0.11       1.03   1.14
A10        A   10    0         3.31    3.71         0.11       0.95   1.06
A11        A  180    0        89.93   90.16         0.07      25.69  25.76
A12        A   10   50         0.91    1.04         0.04       0.26   0.30
*PHYSICAL*                                          4.19              4.19
                                                   -----      -----  -----
TOTAL                                               5.36      38.58  43.95
In the example report, notice that:
The running partition is A11. It is using 90.16% of its logical CPs, or 25.76% of the server CP capacity.
The zSeries CPs are used only 43.95% of the time.
The Physical Management Time that RMF reports in the *PHYSICAL* line indicates the amount of processor time that is required to manage all active LPARs. The partition that is named *PHYSICAL* does not exist; the line is created by RMF for reporting purposes. The logical partition Dispatch Time Effective that is indicated for each configured partition (Figure 22-8) is the sum of the z/OS captured time and the z/OS uncaptured time. The Partition LPAR Management Time is not a collected value, but is calculated as:
DISPATCH TIME DATA TOTAL - DISPATCH TIME DATA EFFECTIVE
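The relationship between the total, effective, and LPAR management figures can be checked against the physical processor utilization percentages in the Partition Data report. The following Python sketch takes three partitions from that report; the dictionary layout and the function name are illustrative.

```python
# partition: (physical EFFECTIVE %, physical TOTAL %, reported LPAR MGMT %)
PARTITIONS = {
    "A1": (1.33, 1.44, 0.11),
    "A6": (2.91, 3.03, 0.12),
    "A11": (25.69, 25.76, 0.07),
}


def lpar_mgmt(effective, total):
    """LPAR management share is what remains of TOTAL after EFFECTIVE."""
    return round(total - effective, 2)


if __name__ == "__main__":
    for name, (effective, total, reported) in PARTITIONS.items():
        print(f"{name}: calculated {lpar_mgmt(effective, total):.2f}, "
              f"reported {reported:.2f}")
```

The calculated values match the LPAR MGMT column for these partitions, which confirms that the management figure is the total minus the effective figure.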
z/OS V1R3    SYSTEM ID SC48           DATE 11/17/2002
             RPT VERSION V1R2 RMF     TIME 16.54.59

MVS PARTITION NAME                    A11
IMAGE CAPACITY                        171
NUMBER OF CONFIGURED PARTITIONS        15
NUMBER OF PHYSICAL PROCESSORS          13
                 CP                     7
                 ICF                    6
WAIT COMPLETION                        NO
DISPATCH INTERVAL                 DYNAMIC

--------- PARTITION DATA ---------------   -- LOGICAL PARTITION PROCESSOR
                 ----MSU----  -CAPPING--   PROCESSOR-  ----DISPATCH TIME
NAME       S  WGT   DEF  ACT   DEF  WLM%   NUM  TYPE   EFFECTIVE
A1         A  180    45    4   NO    0.0     2  CP     00.00.27.930
A2         A   10    30    4   NO    0.0     2  CP     00.00.27.137
A3         A  180     0    4   NO    0.0     2  CP     00.00.27.192
A4         A   10    30    1   NO    0.0     2  CP     00.00.04.705
A5         A   10    45    4   NO    0.0     2  CP     00.00.24.785
A6         A   10     0    9   NO    0.0     2  CP     00.01.01.222
A7         A   10    45    4   NO    0.0     2  CP     00.00.27.174
A8         A   10     0    4   NO    0.0     2  CP     00.00.23.989
A9         A   10     0    3   NO    0.0     2  CP     00.00.21.648
A10        A   10     0    3   NO    0.0     2  CP     00.00.19.890
A11        A  180     0   77   NO    0.0     2  CP     00.09.00.486
A12        A   10    50    1   NO    0.0     2  CP     00.00.05.489
*PHYSICAL*
                                                       ------------
TOTAL                                                  00.13.31.653

C1         A  DED     0   86         0.0     2  ICF    00.10.00.977
C2         A  DED     0   86         0.0     2  ICF    00.10.00.774
C3         A  DED     0   86         0.0     2  ICF    00.10.00.644
*PHYSICAL*

Figure 22-8 Partition Data report and processing weights (partial view)
Note that the flexibility that logical partitioning brings adds a level of complexity to the performance analysis. Unless the LPAR is capped, the amount of CPU processing power that the partition can use can vary:
The minimum CPU that the logical partition (LP) is entitled to is determined by the processing weights set as part of the partitioning definition:
Min LP CP share = Your LP weight / sum of all LP weights
This occurs when other partitions require their full share of CP resources. In Figure 22-8 on page 287, the sum of all WGT values is 630, while our logical partition (A11) has a processing weight of 180. That means that the guaranteed CP share is 180/630 = 28.57% of the shared CPs.
The maximum CPU that the logical partition can use is fixed by the ratio of the number of CPs that is defined in the partition to the total number of available CPs in the shared pool:
Max LP CP share = number of CPs / sum of shared CPs
This occurs when other partitions do not need their full share of CPU resources. In Figure 22-8 on page 287, there are seven shared CPs available while our LP (A11) has 2 CPs defined. This means that the maximum usable CPU share is 2/7 = 28.57% of the shared CPs.
In this example, we were able to align the minimum and maximum values to simplify our tests, but in a real production environment this might not always be possible or desirable.
Additionally, if the partition is part of an LPAR cluster (the set of LPARs in a single server that belong to the same parallel sysplex), WLM can dynamically adjust the number of logical processors and the weight of an LPAR. This allows the system to distribute the CPU resources in an LPAR cluster to partitions where the CPU demand is high. Because the processing weights can be dynamically adjusted, either by operations personnel or by LPAR cluster management, remember to check their settings before you start a time-consuming workload analysis.
Note: All percentages indicated in the partition data report are relative to the RMF time interval. As such, they accurately show the amount of time that physical CPs were dispatched on behalf of an LPAR. However, these time-based figures do not take into account all processor costs of operating in LPAR mode and do not reflect the resulting processor power that is expressed in the Large System Performance Reference (LSPR) ITRs or MIPS.
The LPAR Capacity Estimator (LPARCE) tool should be run to estimate the impact of the LPAR configuration on the processing power. Consult your IBM support representative to obtain an LPARCE review for your configuration.
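The minimum and maximum share calculations can be reproduced from the weights in the Partition Data report. The following Python sketch is a worked example only; the function names are illustrative, and the weights are those shown in the WGT column for the twelve partitions that share CPs.

```python
# Processing weights (WGT) of the partitions that share the CPs.
WEIGHTS = {
    "A1": 180, "A2": 10, "A3": 180, "A4": 10, "A5": 10, "A6": 10,
    "A7": 10, "A8": 10, "A9": 10, "A10": 10, "A11": 180, "A12": 10,
}
SHARED_CPS = 7  # physical CPs in the shared pool


def min_share_pct(partition, weights):
    """Guaranteed share: own weight divided by the sum of all weights."""
    return round(100 * weights[partition] / sum(weights.values()), 2)


def max_share_pct(logical_cps, shared_cps):
    """Upper bound: logical CPs defined divided by the shared physical CPs."""
    return round(100 * logical_cps / shared_cps, 2)


if __name__ == "__main__":
    print("Sum of weights:", sum(WEIGHTS.values()))
    print("A11 minimum share:", min_share_pct("A11", WEIGHTS), "%")
    print("A11 maximum share:", max_share_pct(2, SHARED_CPS), "%")
```

For partition A11 both values come out at 28.57%, which is the alignment of minimum and maximum that the test configuration was designed to achieve.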
Summary report
This report (Figure 22-9) provides a summary view of the entire system's activity over multiple measurement intervals.
R M F   S U M M A R Y   R E P O R T                                   PAGE 001
z/OS V1R3   SYSTEM ID SC48          START 11/17/2002-16.24.59   INTERVAL 00.04.59
            RPT VERSION V1R2 RMF    END   11/17/2002-17.00.00   CYCLE 1.000 SECONDS
NUMBER OF INTERVALS 7

DATE  TIME     INT   DASD  DASD  JOB JOB TSO TSO ASCH ASCH OMVS OMVS SWAP DEMAND
MM/DD HH.MM.SS MM.SS RESP  RATE  MAX AVE MAX AVE  MAX  AVE  MAX  AVE RATE PAGING
11/17 16.24.59 05.00    2 264.2    0   0   2   2    0    0    6    5 0.00   0.20
11/17 16.30.00 05.00    8  19.8    0   0   2   2    0    0    5    5 0.00   0.07
11/17 16.35.00 04.59    4  69.0    0   0   2   2    0    0    5    5 0.00   0.63
11/17 16.39.59 05.00    4  50.7    0   0   2   2    0    0    5    5 0.00   0.11
11/17 16.45.00 04.59    4  46.6    0   0   2   2    0    0    5    5 0.00   0.02
11/17 16.50.00 04.59    2 277.9    0   0   2   2    0    0    5    5 0.00   0.09
11/17 16.54.59 05.00    2 328.3    0   0   2   2    0    0    5    5 0.00   0.23
When you know your average system statistics, this report is very useful for quickly spotting unusual behavior regarding:
CPU busy
DASD rate, that is, disk I/O activity per second
Swap rate and paging demand
Workload reports
The RMF workload activity report contains information about your workload. The interpretation of the numbers depends on whether you are reporting a workload, a service class, or a reporting class. We strongly recommend using reporting classes.
Enclave report
An enclave report is a workload report for a reporting class that is associated with a WebSphere workload that is running in enclaves. It corresponds to the WLM definitions in the CB subsystem. Figure 22-10 and Figure 22-11 on page 290 show parts of a sample report.
REPORT CLASS=WASE
DESCRIPTION =LSA510 WAS EBUSINESS WORKLOAD

TRANSACTIONS       TRANS.-TIME HHH.MM.SS.TTT    --DASD I/O--
AVG        2.28    ACTUAL               147     SSCHRT    1.5
MPL        2.28    EXECUTION            101     RESP      1.8
ENDED      6777    QUEUED                45     CONN      1.2
END/S     22.59    R/S AFFINITY           0     DISC      0.3
#SWAPS        0    INELIGIBLE             0     Q+PEND    0.3
EXCTD         0    CONVERSION             0     IOSQ      0.0
AVG ENC    2.28    STD DEV              201
REM ENC    0.00
MS ENC     0.00
In these examples:
AVG is the average number of active transactions during the interval.
MPL is the average number of transactions in storage during the measurement interval.
ENDED is the number of transactions that ended during the interval, and END/S is the number of transactions that ended per second. If the reporting class is set up correctly, this is a direct measure of the application throughput as seen by WebSphere.
AVG ENC is the average number of enclaves concurrently active at any time. This information can be useful for sizing storage requirements or system recovery aspects.
The DASD I/O section indicates the profile of the disk activity in your workload. High values for DISC, Q+PEND, or IOSQ might indicate an elongated response time. The SSCHRT field indicates the disk start subchannel rate per second. From this section, you can detect a possible delay caused by I/O activity to the disk subsystem. By comparing this value with the DASD RATE column in the Summary Report, it is possible to quantify to what extent the WebSphere application participates in the I/O activity and possibly determine whether some system tuning actions are required.
TRANS.-TIME contains the transaction time, in HHH.MM.SS.TTT units, as seen by WLM. This is measured from the time that the transaction is put in the Servant Region WLM queue until the time that the transaction is completed:
ACTUAL is the actual amount of time required to complete the work submitted under the service class. This is the total response time.
QUEUED is the average time that the WebSphere transaction was delayed in the WLM queue. The time can increase under full load conditions if the number of servers in MAX_SRS is too low.
STD DEV is the standard deviation of ACTUAL. It is a measure of the variability of the data in the sample. The higher the standard deviation, the more spread out the data looks on a graph (Figure 22-11).
--DASD I/O--     --SERVICE RATES--   PAGE-IN RATES     ----STORAGE----
SSCHRT   1.5     ABSRPTN    38580    SINGLE     0.0    AVG       0.00
RESP     1.8     TRX SERV   38580    BLOCK      0.0    TOTAL     0.00
CONN     1.2     TCB        229.7    SHARED     0.0    CENTRAL   0.00
DISC     0.3     SRB          0.0    HSP        0.0    EXPAND    0.00
Q+PEND   0.3     RCT          0.0    HSP MISS   0.0
IOSQ     0.0     IIT          0.0    EXP SNGL   0.0    SHARED    0.00
                 HST          0.0    EXP BLK    0.0
                 APPL %      76.6    EXP SHR    0.0
Note that the STORAGE field is always 0 for an enclave type report. Because enclaves are not associated with a specific address space, no storage values are reported.
The APPL% field indicates the CPU activity incurred on behalf of all activities that are part of the enclave. It is expressed as a percentage of CP time used over the interval. Note that this represents all the CPU activity across all address spaces that are spanned by the transaction, including DB2 and CICS if the transaction contains JDBC or JCA connectors. No activity (or response time) information is reported by WLM in the CICS assigned service class or report class.
From the above fields, it is possible to calculate the average CP cost per transaction. Multiply the measurement interval length, expressed in milliseconds, by APPL% / 100, and divide the result by the number of transactions that ended over the interval:
CP_millisecPerTran = Interval_length in milliseconds * APPL% / 100 / ENDED
Note that there are now multiple APPL% values to show zAAP activity as well.
Using the RMF fields for the WASE report class in Figure 22-10 on page 289, you can determine the following values for the specific measurement interval:
2.28 transactions were concurrently active, all of them running in enclaves.
A total of 6777 transactions ended, which translates into an average throughput of 22.59 transactions per second.
The average response time was 147 ms, with a standard deviation of 201 ms.
For the measurement interval, APPL% shows that one CP was busy 76.6% of the time to service WASE. Because the measurement interval is 5 minutes, this translates as:
Used CP time = 300 sec x .766 = 229.8 sec
Over the same interval, 6777 transactions were processed. The average CP cost is:
CP_MillisecPerTran = 229.8 x 1000 / 6777 = 33.91 ms
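The arithmetic above can be sketched in a few lines of Python. This is only an illustration of the formula, not part of RMF or WebSphere; the function and parameter names are assumptions chosen for this sketch, and the input values are the ones from the WASE report class sample.

```python
def throughput_per_second(ended: int, interval_seconds: float) -> float:
    """Transactions ended per second (the END/S field)."""
    return ended / interval_seconds


def cp_ms_per_transaction(interval_seconds: float, appl_pct: float, ended: int) -> float:
    """Average CP time consumed per ended transaction, in milliseconds.

    Implements: Interval_length in milliseconds * APPL% / 100 / ENDED
    """
    if ended <= 0:
        raise ValueError("ENDED must be greater than zero")
    interval_ms = interval_seconds * 1000.0
    return interval_ms * appl_pct / 100.0 / ended


# Values from the sample report: 5-minute interval, APPL% 76.6, ENDED 6777
rate = throughput_per_second(6777, 300)
cost = cp_ms_per_transaction(300, 76.6, 6777)
print(f"throughput: {rate:.2f} tran/s, CP cost: {cost:.2f} ms per transaction")
```

With the sample values, the result is roughly 22.6 transactions per second and a little under 34 ms of CP time per transaction. If several APPL% values are reported (for example, for zAAP activity), the same calculation can be repeated for each of them.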
REPORT BY: POLICY=LSA510   REPORT CLASS=WASS
DESCRIPTION =LSA510 WAS SERVER AS ACTIVITY

TRANSACTIONS      --SERVICE RATES--    PAGE-IN RATES     ----STORAGE----
AVG       2.00    ABSRPTN    181961    SINGLE     0.0    AVG      56146.9
MPL       2.00    TRX SERV   181961    BLOCK      0.0    TOTAL     112293
ENDED        0    TCB           8.0    SHARED     0.0    CENTRAL   112293
END/S     0.00    SRB           0.3    HSP        0.0    EXPAND      0.00
#SWAPS       0    RCT           0.0    HSP MISS   0.0
EXCTD        0    IIT           0.0    EXP SNGL   0.0    SHARED   3216.83
AVG ENC   0.00    HST           0.0    EXP BLK    0.0
REM ENC   0.00    APPL %        2.8    EXP SHR    0.0
MS ENC    0.00
Figure 22-12 Workload report for WebSphere server address space (partial)
There are three major differences in the interpretation of the data, because the reported activity is address-space based:
The TRANSACTION AVG indicates the number of Servant Region address spaces that are active over the interval. Using this field, you can monitor the evolution of the number of servers between the MIN_SRS and MAX_SRS settings.
STORAGE values are now provided.
Under normal conditions, the APPL% is typically very low. However, a gradual increase in APPL% might be an indication of excessive garbage collector activity caused by a heap size that is too small, or a memory leak.
Using workload definitions, it is possible to calculate the system uncaptured percentage value. This is the part of CP resources that is used by system-related services on behalf of the workloads but not directly accounted for in the enclave or address space activity:
1. For each member in the sysplex, multiply the CPU_Busy% obtained from the CPU report by the number of CPs available to the z/OS LPAR. This brings the percentage value to a unit consistent with the APPL% reported in the workload report. Then, the sum for all systems participating in the sysplex is:
All_CP_Busy% = Sum of [CPU_Busy% * Number of CPs]
2. From the RMF Workload Activity report, obtain the total CP utilization that has been reported for all workloads. This is indicated by the APPL% value for the policy. The report is obtained when option WLMGL(POLICY) is specified. The APPL% value for the policy represents the percentage of time that any CP in the sysplex configuration was busy processing a workload that was defined in the WLM policy:
ALL_Wkl% = APPL% from RMF Policy report
3. The uncaptured CP value, expressed in percentage of CP activity over the measurement interval, is calculated by subtracting ALL_Wkl% obtained in step 2 from All_CP_Busy% calculated in step 1:
uncaptured_CP% = All_CP_Busy% - ALL_Wkl%
Typically, the uncaptured CP% represents 10% to 20% of the total CP utilization.
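The three steps can be expressed as a short Python sketch. The function name is an assumption made for this illustration, and the input numbers are invented sample values (they do not come from any report in this book); only the formulas follow the steps above.

```python
def uncaptured_cp_pct(systems, policy_appl_pct):
    """Return (All_CP_Busy%, uncaptured_CP%) for a sysplex.

    systems: iterable of (cpu_busy_pct, number_of_cps), one pair per member.
    policy_appl_pct: APPL% for the policy from the RMF Workload Activity report.
    """
    # Step 1: bring each system's CPU busy percentage to "percent of one CP"
    all_cp_busy = sum(busy * cps for busy, cps in systems)
    # Steps 2 and 3: subtract the CP time captured for all workloads
    return all_cp_busy, all_cp_busy - policy_appl_pct


# Hypothetical sysplex: one LPAR with 4 CPs at 60% busy, one with 2 CPs at 45% busy,
# and a policy-level APPL% of 290 (all values invented for illustration)
total, uncaptured = uncaptured_cp_pct([(60.0, 4), (45.0, 2)], 290.0)
share = 100.0 * uncaptured / total
print(f"All_CP_Busy% = {total:.1f}, uncaptured_CP% = {uncaptured:.1f} ({share:.1f}% of total)")
```

In this invented case the uncaptured share is about 12% of the total CP utilization, which falls inside the typical 10% to 20% range mentioned above.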
W O R K L O A D   A C T I V I T Y
z/OS V1R3   SYSPLEX WTSCPLX1   DATE 12/01/2002   INTERVAL   RPT VERSION V1R2 RMF   TIME 17.30.00
POLICY ACTIVATION DATE/TIME 11/26/2002 03.00.58

----------RESPONSE TIME DISTRIBUTION----------
----TIME----       --NUMBER OF TRANSACTIONS--   -------PERCENT-------
HH.MM.SS.TTT       CUM TOTAL     IN BUCKET      CUM TOTAL   IN BUCKET
<  00.00.00.250         3716          3716           36.9        36.9
<= 00.00.00.300         4129           413           41.0         4.1
<= 00.00.00.350         4601           472           45.7         4.7
<= 00.00.00.400         5041           440           50.0         4.4
<= 00.00.00.450         5363           322           53.2         3.2
<= 00.00.00.500         5633           270           55.9         2.7
<= 00.00.00.550         5876           243           58.3         2.4
<= 00.00.00.600         6119           243           60.7         2.4
<= 00.00.00.650         6277           158           62.3         1.6
<= 00.00.00.700         6445           168           64.0         1.7
<= 00.00.00.750         6601           156           65.5         1.5
<= 00.00.01.000         7290           689           72.4         6.8
<= 00.00.02.000         9022          1732           89.5        17.2
>  00.00.02.000        10075          1053          100          10.5
The interpretation of the data requires knowledge of the application workload. If you have a coherent J2EE application, response time distribution is concentrated into one peak, but if the application contains a mix of static HTML pages and J2EE transactions, the response time distribution may show two peaks that reflect the two different types of transactions. From this information, it is also possible to set an achievable percentile response time, a value commonly used in establishing service level agreements.
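As a rough illustration of how a percentile response time can be read from the distribution, the following Python sketch walks the cumulative totals of the sample report and returns the first bucket boundary that covers the requested percentage. The function name and the list layout are assumptions made for this sketch; the numbers are the bucket boundaries (in milliseconds) and cumulative totals from the sample distribution.

```python
import math

# Upper bucket boundaries in milliseconds; the last bucket (> 2 seconds) is open-ended
BOUNDS_MS = [250, 300, 350, 400, 450, 500, 550, 600, 650, 700, 750, 1000, 2000, math.inf]
# Cumulative number of transactions completed within each boundary
CUM_TOTALS = [3716, 4129, 4601, 5041, 5363, 5633, 5876, 6119, 6277, 6445, 6601, 7290, 9022, 10075]


def percentile_bound(percent, bounds=BOUNDS_MS, cum_totals=CUM_TOTALS):
    """Smallest bucket boundary within which at least `percent` of transactions ended."""
    total = cum_totals[-1]
    for bound, cum in zip(bounds, cum_totals):
        if cum * 100 >= percent * total:
            return bound
    return bounds[-1]


for pct in (50, 70, 90):
    print(pct, percentile_bound(pct))
```

For this sample, half of the transactions complete within 400 ms and 70% within 1 second, but a 90th percentile target cannot be stated more precisely than "over 2 seconds", because only 89.5% of the transactions ended within the last bounded bucket. A service level objective such as "70% within 1 second" would therefore be achievable for this workload, whereas "90% within 2 seconds" would not.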
22.3.3 References
For more information about RMF reports, see the following manuals:
z/OS Resource Measurement Facility Report Analysis, SC33-7991
z/OS Resource Measurement Facility User's Guide, SC33-7990
z/OS Resource Measurement Facility Performance Management Guide, SC33-7992
These are available in the Elements and Features list for your specific z/OS version at:
http://www-03.ibm.com/servers/eserver/zseries/zos/bkserv/
See the following Web sites:
http://www.ibm.com/servers/eserver/zseries/zos/rmf/
http://www.ibm.com/servers/eserver/zseries/zos/wlm/
e. Click OK or Apply. f. Save the changes and make sure a file synchronization is performed before you restart the servers.
2. Edit the SMFPRMxx parmlib member and update the SYS or SUBSYS(STC,...) statement to include the type 120 record. Example 22-4 shows a sample SMFPRMxx member that creates interval records every 2 minutes and records the following SMF record types:
  30: Address space
  70 to 79: RMF
  82: Crypto
  88 to 90: System logger, usage, and system data
  101: DB2
  110: CICS
  120: WebSphere
ACTIVE                      /*ACTIVE SMF RECORDING*/
DSNAME(&SYSNAME..MAN1,
       &SYSNAME..MAN2)      /*TWO MAN DATASETS */
LISTDSN                     /* LIST DATA SET STATUS AT IPL*/
NOPROMPT                    /* DON'T PROMPT THE OPERATOR */
INTVAL(02)                  /* SMF GLOBAL RECORDING INTERVAL */
SYNCVAL(00)                 /* GLOBAL SYNC VALUE */
MAXDORM(3000)               /* WRITE AN IDLE BUFFER AFTER 30 MIN*/
STATUS(010000)              /* WRITE SMF STATS AFTER 1 HOUR*/
SID(&SYSNAME(1:4))          /* USE SYSNAME AS SID */
SUBSYS(STC,EXITS(IEFU29,IEFACTRT),INTERVAL(SMF,SYNC),
       TYPE(0,30,70:79,88:90,101,110,120,245))

To avoid collecting more SMF data than you need, review SMFPRMxx to ensure that only the minimum number of records are being collected. Use SMF 92 or 120 only for diagnostics.
SMF 92 records are created each time an HFS file is opened, closed, deleted, and so forth. Almost every Web server request references HFS files, so thousands of SMF 92 records are created. Unless you must have this information, turn off SMF 92 records.
You might find that running SMF 120 records in production is appropriate, because these records provide information that is specific to WebSphere applications, such as response time for J2EE artifacts and bytes transferred. If you do choose to run with SMF 120 records enabled, the authors recommend that you use the server interval SMF records and container interval SMF records rather than the server activity records and container activity records.
3. Use SET SMF=xx to activate the SMFPRMxx member from SYSx.PARMLIB. Use the D SMF,O command to display the parameters in effect. You must issue the SET command before you start WebSphere Application Server. If you issue the command after the application server has started, SMF 120 records will not be collected.
4. For the changes to take effect, restart the application server.
5. Use a tool such as WebSphere Studio Workload Simulator (see 22.5.1, WebSphere Studio Workload Simulator for z/OS and OS/390 on page 300) to simulate an application stress load.
While the transactions are running, switch to SDSF and RMF to observe the transactions.
6. Format the SMF recording output data set for printing to the screen or other output device:
a. Switch the SMF data sets by entering i smf from the MVS console.
b. Run the SMF Dump program (IFASMFDP) to create a sequential data set. A sample is shown in z/OS MVS System Management Facilities (SMF), SA22-7630.
c. You have successfully formatted the output data set when SMFDUMP ends with return code 0.
7. To interpret the output data set, see 22.4.2, WebSphere for z/OS SMF browser.
For an overview of SMF recording, see Chapter 1 of z/OS MVS System Management Facilities, SA22-7630.
After WebSphere performance data is collected, it can be monitored and analyzed with a variety of tools:
Monitor performance with Tivoli Performance Viewer (formerly Resource Analyzer) as described in 21.8, Tivoli Performance Viewer on page 270. This tool is included with WebSphere.
Use third-party vendor tools or write your own applications to exploit the Performance Monitoring Infrastructure (PMI). Search for Developing your own monitoring applications at the WebSphere Information Center.
Use RMF as discussed in 22.3.2, Analyzing RMF reports on page 285. See RMF Workload Activity reports and RMF Monitor III at the WebSphere Information Center.
Refer to WLM Delay Monitoring at the WebSphere Information Center.
//INSMF2   DD DSN=SYS1.SC48.MAN2,DISP=SHR
//SMFDATA  DD DSN=FRANCK.SC48T.SMF,
//            DCB=(RECFM=VBS,LRECL=32760),
//            SPACE=(CYL,(25,50)),
//            UNIT=SYSALLDA,
//            DISP=(NEW,CATLG)
//*
//SYSPRINT DD SYSOUT=*
//SYSIN    DD *
  OUTDD(SMFDATA,TYPE(120))
  INDD(INSMF1,OPTIONS(DUMP))
  INDD(INSMF2,OPTIONS(DUMP))

4. To interpret SMF data from our file named FRANCK.SC48T.SMF and produce a detailed report in the WTSCplexSMFout.txt file, run this command (in TSO OMVS, all on one line):
java -cp WSCSMFperfV510.jar com.ibm.ws390.sm.smfview.Interpreter "FRANCK.SC48T.SMF" 1>WTSCplexSMFout.txt
5. To add the summary report showing the performance data, specify a second parameter (all on one line):
java -cp WSCSMFperfV510.jar com.ibm.ws390.sm.smfview.Interpreter "FRANCK.SC48T.SMF" "./WTSCplexSMFsummary.txt" 1>WTSCplexSMFout.txt
The summary report of the z/OS SMF Browser is saved in the WTSCplexSMFsummary.txt file and is available for browsing or editing through ISPF.
Note: It is implicit in the Java command parameters that your current working directory is the tools directory. If this is not the case, you receive a NoClassDefFoundError on com.ibm.ws390.sm.smfview.Interpreter. Java does not generate a diagnostic when it does not find WSCSMFperfV510.jar in the current directory.
6. Enable SMF recording as described in 22.4.1, Setting up SMF recording on page 293, and The SMF Dump Program in z/OS MVS System Management Facilities (SMF), SA22-7630.
Figure 22-14 on page 297 shows a sample summary report.
The detailed report file lists each activity that occurs during the collection interval for the server, Web container, and J2EE container. The summary report file sample from an application called Trade2A is shown in Example 22-6.
Example 22-6 SMF Browser (1)
WSC SMF 120 Performance Summary2 -Date: Sun Nov 10 13:37:00 EST 2002 , SysID: SC52
SMF   -Record Time     Server   Bean/WebAppName                     Bytes  Bytes  # of El.Time(mSec)
Numbr -Type   hh:mm:ss Instance Method/Servlet                       Sent  Rec'd Calls  Ave.  Max.
1---+----1----+----2----+----3----+----4----+----5----+----6----+----7----+----8----+
30    120.1  13:37:00 FMISRVC                                         579   4191
31    120.5  13:37:00 FMISRVC  Trade_WebApp
                                 dispatch()                                          1     1     1
32    120.7  13:37:00 FMISRVC    /welcome.jsp                                              1
                                 JSP 1.1 Processor                                         1
                               trade Web Application_0
33    120.1  13:37:00 FMISRVC                                         863   4559
34    120.5  13:37:00 FMISRVC  TradeRegistryBean
                                 findByPrimaryKey(trade.Registr                      1     1     1
                                 >ejbLoad                                            1     0     0
                                 >ejbActivate                                        1     0     0
                                 login(java.lang.String)                             1     0     0
                                 >ejbStore                                           1     0     0
                                 >ejbPassivate                                       1     0     0
                               TradeAccountBean
                                 findByPrimaryKey(trade.Account                      1     3     3
                                 >ejbLoad                                            1     0     0
                                 >ejbActivate                                        1     0     0
                                 getBalance()                                        1     0     0
                                 >ejbPassivate                                       1     0     0
                               Trade_WebApp
                                 dispatch()                                          1    16    16
                               TradeSession
                                 create()                                            2     0     0
                                 login(java.lang.String,java.la                      1     1     1
                                 getBalance(java.lang.String)                        1     3     3
35    120.7  13:37:00 FMISRVC    /tradehome.jsp                                            1
                                 TradeAppServlet                                          15
                                 JSP 1.1 Processor                                         1
                               trade Web Application_0
This trace shows the activities in the server, J2EE container, and Web container that are caused by a login transaction in the Trade2 sample at a specific time. It includes invocation of welcome.jsp, tradehome.jsp, and TradeAppServlet by the Web container, and EJB activities such as each method invocation of TradeRegistryBean entity EJB, TradeAccountBean entity EJB, and TradeSession session EJB. Response time for each method call and the number of bytes downstream and upstream served by the server are also collected. The summary report displays statistics, such as average and maximum elapsed time for the server, container, Web container, and J2EE container, for each type of activity during the collection interval. A type of activity can be the same JSP invocation, or the same method call on the same EJB. The following is a sample of a summary report file from an application called elITSO, for an interval of 5 minutes.
Example 22-7 SMF Browser (2)
WSC SMF 120 Performance Summary2 -Date: Mon Nov 18 18:15:03 EST 2002 , SysID: SC50
SMF   -Record Time     Server   Bean/WebAppName                     Bytes  Bytes  # of El.Time(mSec)
Numbr -Type   hh:mm:ss Instance Method/Servlet                       Sent  Rec'd Calls  Ave.  Max.
1---+----1----+----2----+----3----+----4----+----5----+----6----+----7----+----8----+
44    120.3  19:30:01 FMESRVB                                       34004 226469
45    120.6  19:30:01 FMESRVB  ItemEntity
                                 findByPrimaryKey(itemEntityPac                      5  1553  2129
                               WebERWWNO_WebApp
                                 create()                                            3     1     1
                                 driveLoadServlet(java.lang.Str                      1  1345  1345
                                 dispatch()                                          8  3587 19361
                               WarehouseEntity
                                 findByPrimaryKey(warehouseEnti                     17  3428  7107
                               WebERWWJustPC_WebApp
                                 create()                                            4     0     0
                                 driveLoadServlet(java.lang.Str                      2   992  1002
                                 dispatch()                                          8 11524 49825
                               PriceChangeSession
                                 create()                                            5    17    45
                                 priceChangeSession(priceChange                      5  3639  6930
PaySession create() paySession(paySessionPackage.P DeliverySession deliverySession(deliverySessio create() RemoteWebContainer create() driveLoadServlet(java.lang.Str WebERWWD_WebApp create() driveLoadServlet(java.lang.Str dispatch() NewOrderSession create() NewOrderEntity findByWIdAndDId(short,short,bo WebERWWPY1_WebApp create() driveLoadServlet(java.lang.Str dispatch() WebERWWOS_15 WebERWWSL_17 WebERWWPQ_16 WebERWWjmsPRR_25 eRWWPriceChangeHTTPSession_26 WebERWWPC_21 DEController SimpleFileServlet JSP 1.1 Processor /DEAGResults.jsp WebERWWDelivery_20 SimpleFileServlet WebERWWNO_19 SimpleFileServlet /error.jsp JSP 1.1 Processor WebERWWJustPC_14 PAYController /PAYAGResults.jsp SimpleFileServlet /error.jsp JSP 1.1 Processor WebERWWPay_24 57124 147311
15 15 1 2 14 14 4 2 2 2 7 12 9 26
1 1 1 1 8 3 5 5 8 8 18 8 8
For example, we can see that the findByPrimaryKey method on ItemEntity EJB was called five times with an average elapsed time of 1553 ms and maximum elapsed time of 2129 ms. Another example is SimpleFileServlet, which is responsible for serving static pages in the Web application. The report shows the number of SimpleFileServlet calls in each Web application and the average elapsed time in the Web container.
b. In the browser, type the URL of the Web site that is the source of the session data that you want to capture. Click Start in your Capture window to begin recording a script, as shown in Figure 22-16. The capture session starts and the data stream for the Web session is recorded.
c. Click Stop to end the recording. When the capture session ends, WebSphere Studio Workload Simulator prompts you to enter a script name and description (Figure 22-17).
Figure 22-17 WebSphere Studio Workload Simulator with scripts of captured sessions
d. The script shows a list of HTTP interactions (Web session elements). You can edit or change the value of these interactions (see Figure 22-18 on page 302).
Figure 22-18 WebSphere Studio Workload Simulator window: Web session elements
e. Variable elements in the script are revealed through a filter (Figure 22-19).
2. Set various runtime parameters (for example, number of clients, number of times to repeat the script, delay controls, turn dynamic cookies on or off, a time limit for the test, HTTP trace, and Socks support) before executing the script (Figure 22-20). The runtime parameters can be saved in a configuration file for reuse.
3. Run the script and monitor the test. When the script runs, you can monitor the test engine in real time with a Windows GUI (see Figure 22-21 on page 304).
See WebSphere Studio Workload Simulator Users Guide, SC31-6307, and WebSphere Studio Workload Simulator Getting Started, SC31-6383, on the WebSphere Studio Workload Simulator Library page for more information: http://www.ibm.com/software/awdtools/studioworkloadsimulator/library
Figure 22-23 shows a screen capture of the Microsoft Web Application Stress Tool in use.
Microsoft Visual Studio .NET Edition ships with a license for a tool called Application Center Test 1.0 that has similar functionality and that is easy to use. To learn more, visit:
http://msdn.microsoft.com/library/default.asp?url=/library/en-us/act/htm/actml_main.asp
Microsoft product screen shot reprinted with permission from Microsoft Corporation.
You can download the application from: http://www.tucows.com/preview/195282.html Figure 22-24 shows the TeraTerm Pro emulator window.
22.6.4 UltraEdit
UltraEdit is a text editor, hexadecimal editor, HTML editor, and programmer editor. You can download it from the Web site for IDM Computer Solutions, Inc.: http://www.ultraedit.com/index.php
Some of the more popular features of this editor are:
You can edit files remotely using FTP. This is especially useful when working with WebSphere for z/OS. You can easily edit traces, logs, configuration files, and so on that are data sets or HFS files in z/OS.
You can edit or compare files in binary, hexadecimal, ASCII, and so on with:
  Easy management of the search utility (when you look for a string in a log, a window shows all the lines that contain the string)
  User-configurable syntax highlighting specific to the language that is being edited (Java, HTML, XML, C/C++, and so on)
  Column mode and useful macros
Figure 22-26 shows an HTML file in the UltraEdit editor.
Appendix A.
Table A-2 gives an overview of where BBO messages come from and where they appear.
Table A-2 WebSphere for z/OS messages overview
Prefix      Come from                                         Appear on or in
BBOJnnnnt   JVM                                               Operators console, error log, job log
BBOMnnnnt   Runtime environment                               Operators console, error log, job log
BBOOnnnnt   Control process, servant process, daemon,         Operators console, error log, job log
            CORBA. (These are general messages.)
BBOSnnnnt   Security system                                   Operators console, error log, job log
BBOTnnnnt   Transaction service                               Operators console, error log, job log
DYNAnnnnt   Dynamic Fragment Cache                            Job log
To look up specific message codes, follow these steps:
1. In the Information Center navigation panel, click WebSphere Application Server for z/OS V6 to see the table of contents.
2. Select Reference > Troubleshooter > Messages.
3. Choose the tab according to the first few letters in your message code.
You can also search for the specific message or code with the search function at the top of the window.
Table A-3 BBOO0222I message components Msg ACIN ACWA ADFS ADMA ADMB ADMC ADMD ADME ADMF ADMG ADMK ADML ADMN ADMR ADMS ADMU ADNT APPR ASYN BBOJ BBOM BBOO BBOS BBOT BBZW BCDS Component Access Intent Work Area Mgmt File Service Subsystem Application Deployment Mgmt Config Archive Subsystem Mgmt Connector Subsystem Mgmt Process Discovery Mgmt Event Subsystem Mgmt Command Framework Mgmt Connector Subsystem Mgmt Utilities Mgmt Process Launching Tool Activity Service Mgmt Repository Mgmt Subsystem Mgmt Utilities Adaptive Entity Application Profile Asynchronous Beans EJB Container Naming Runtime, Web Security OTS and RRS WBI SF Install Business Context Data Service for Event Infrastructure Binding EJB References Channel Framework Event Infrastructure Validation PME Validation SIB Validation Validation XD Validation Compensation EJB Container Connection Manager CScope Service B Core Group Bridge A Service Integration Bus Msg CWSIY CWSIZ CWSJA CWSJB CWSJC CWSJD CWSJO CWSJQ CWSJR CWSJU CWSJW CWSWS CWUDD CWUDG CWUDM CWUDN CWUDQ CWUDR CWUDS CWUDT Component Y SIBus Mediation Handlers Z SIBus Mediation Framework A Admin B inter-bus messaging engine C SIBus Core SPI D Admin O SDO Repository Component Q MFP MQ interoperability component R SIBus U Jetstream Message Tracing W WLM Classifier S SIBus Web Services Web Services UDDI Deployment & Removal UDDI User Console UDDI Mgmt Interface UDDI Node Manager UDDI Migration UDDI Logging and Tracing UDDI SOAP Interface Msg PMON PMRM PMWC PROC PROX SCHD SECG SECJ SESN SIEG SOAP SRMC SRVE SSLC STFF STUP TCPC TRAS TUNE UDAI UDCF UDDA UDDM UDEJ UDEX UDIN UDLC UDPR UDRS UDSC UDSP UDUC UDUT UDUU UTLS WACS WACT WASX WBIA WHFW Component PMI, Tivoli Performance Viewer Performance Monitoring Request Metrics PME Edition Support Process Mgmt and Spawning Facility Proxy Scheduler WEBUI SecurityCenter Security Session and User Profiles Example SOAP Support Service Reference ManagerTransactions Transactions SSL Channel Staff Support Service Startup Beans 
TCP Channel Trace Facility Perform Auto-Tuning Support UDDI API UDDI Configuration UDDI Data Types UDDI DOM UDDI EJB Interface UDDI Exceptions UDDI Installation UDDI Local API UDDI Persistence UDDI Logging UDDI Security UDDI SOAP Interface UDDI User Console UDDI Utility Tools UDDI UUID Utilities Activity Session Service Activity Service Non WSCP Scripting Support for Business Integration Adapters Handler Framework
BNDE CHFW CHKC CHKP CHKS CHKW CHKX CMPN CNTR CONM CSCP CWRCB CWSIA
UDDI Registry Transaction Manager CWUDU UDDI Utility Tools CWUDV UDDI Value Set Tools CWUDX Web Services JAXR CWWCW W Validation CWWDR R Data Replication Service CWWSG G Web Service Gateway DCSV DCS DSRA DWCT DYNA EAAT ECNS ESOP Resource Adapters Dynamic Workload Mgmt Client Dynacache Placeholder Entity Change Notification Service State Observer Plug-in for Event Infrastructure
Msg CWSIB CWSIC CWSID CWSIE CWSIF CWSIH CWSII CWSIJ CWSIK
Component B SIBus Common C Communications D Admin E SIBus Externals F SIBus MFP H Jetstream MatchSpace I Security J COmmunications K SIBus Return Codes
Msg GWIN HMGR HTPC I18N ILMC INST IVTL J2CA JSAS JSFG JSPG JSSL LTXT MIGR MSGS NMSV OBPL ODCF ORBX PLGC PLGN PLPR PMGR PMI
Component Web Services Gateway HA Manager HTTP Channel Internationalization Service Instance Location Manager Install Installation Verification Tool J2EE Connector Security Association jsf (bean class type) Java Server Pages ORB SSL Extensions Localizable Text Release-to-Release Migration Tooling JMS Server Naming Service ObjectPool On Demand ConFiguration ORB Extensions Plug-in Configuration Generator Transactions Plug-in Processor Persistence Manager PMI
Msg WKSP WKSQ WLTC WMSG WSBB WSCL WSCP WSEC WSGW WSIF WSSC WSSK WSVM WSVR WSWS WTRN WUDU
Component Work Space Workspace Query Utilities Transaction Monitor Messaging Service WsByteBuffer WebSphere Client Non WSCP Scripting Web Services Security Web Services Gateway Web Services Invocation Framework SOAP Channels Web Services Security Kerberos Validation Manager Implementation Server Runtime Web Services
CWSIL L PSB CWSIM M SIBus Mediations SIMediationSession Interface N SIBus Mediations CWSIN Framework O SIBus Migration CWSIO P Jetstream Message CWSIP Processor CWSIQ CWSIR CWSIS CWSIT CWSIU CWSIV CWSIW CWSIX Q MQFap Channel R SIBus Core S MessageStore T TRM U Utilities V SIBus Resource Adapter W SIBus Mediations X SIBus Mediations
Transaction recovery WebUI Deployment Descriptor Utilities WUPD Update Installer WVER Product History Information WWLM WLM Client XMEM XMem Channel
Some of the minor code meanings are described at the WebSphere for z/OS Information Center: http://publib.boulder.ibm.com/infocenter/wasinfo/v6r0/index.jsp Important: Error (minor) codes that are not listed at the WebSphere for z/OS Information Center should be reported directly to IBM Support.
A.1.3 Abends
Table A-4 shows the WebSphere for z/OS related abend codes.
Table A-4 WebSphere-related abend codes
Abend code   Issuer
CC3          Daemon processing failure
DC3          Controller region processing failure
EC3          Servant region processing failure
Some reason codes are also passed along with these abend codes. They are described in detail at the Information Center; search for abend (reason) codes. Table A-5 shows an example with an explanation quoted directly from the Information Center.
Table A-5 Example abend code and related reason code
Abend code: CC3
Abend reason: 000C0009
Explanation: An exception occurred on the main thread of execution, probably during initialization. The address space is abended with this code to cause the space to terminate.
Suggested action: Further information about the exception should be found in the job log for the space and also possibly in the error log.
If no explanation is given in the reason code, and no indication is found in any information source, the problem should be reported to IBM.
DB2 Information Center topic Messages and Codes is available at: http://publib.boulder.ibm.com/infocenter/dzichelp/index.jsp EZA EZB EZD EZY EZZ SNM Communications Server (TCP/IP) pppnnnnt
ppp: Prefix
nnnn: Unique identifier
t: Type, where: A - immediate action, E - eventual action, D - immediate decision, I - information
Example: EZZ0902I
Source: z/OS V1.6 Communications Server: IP Messages: Volume 1-4, GC31-8783/4/5/6
Prefix ICH
Message structure ICHcnnt
c: Identifies the RACF function, where:
  0: SAF initialization
  3: RACROUTE REQUEST=VERIFY macro
  4: RACF processing
  5: RACF initialization
  7: RACF status
  8: RACROUTE REQUEST=AUTH macro
  9: RACROUTE REQUEST=DEFINE macro
IKJ
TSO/E
pppccnnnt
ppp: Prefix
cc: System module prefix (in decimal)
nnn: Message serial number identifying the program that issued the message
t: Type, where:
  A - action; the terminal user must perform the action specified in the message text.
  E - error; processing terminates.
  I - information; no action is required.
Example: IKJ55112E
Source: z/OS V1R6.0 TSO/E Messages, SA22-7786
IRX
pppccnnt
ppp: Prefix
cc: System module prefix (in decimal)
nnn: Message serial number identifying the program that issued the message
t: Type, where:
  E - error; processing terminates.
  I - information; no action is required.
Example: IRX0042I
Source: z/OS V1R6.0 TSO/E Messages, SA22-7786
Prefix IWM
Message structure pppnnnt
ppp: Prefix
nnn: Message serial number
t: Type, where: A - action by operator, D - decision by operator, E - eventual action by operator, I - information for operator/programmer, S - severe error, W - wait for operator action
Example: IWM003I
Source: z/OS V1R6.0 MVS System Messages, Vol 9 (IGF-IWM), SA22-7639
CEE EDC
pppnnnnt
GIM
SMP/E
pppnnnnnt
ppp: Prefix
nnnnn: Message serial number
t: Type, where: I - informational, W - warning, E - error, S - severe, T - terminating
Example: GIM20101S
Source: z/OS V1R1 SMP/E Messages, Codes, and Diagnosis, GA22-7770
UNIX System Services (USS) Debugger USS Shell & Utilities System Logger
pppcnnnn
IXG
ATR
pppnnnt
Prefix IMW
Message structure pppnnnnt
ppp: Prefix
nnnnn: Message serial number
t: Type, where: I - informational message, E - recoverable error, W - warning, S - serious error
Message ID ranges and components:
  IMW0001-IMW2000 - IMWHTTPD
  IMW2000-IMW2500 - Proxy Server
  IMW3501-IMW3700 - CONSOLE
  IMW3701-IMW3999 - HTCounter
  IMW4000-IMW5000 - HTIMAGE
  IMW5001-IMW6000 - HTADM
  IMW6100-IMW6900 - SSL Security
Example: IMW0442E
Source: IBM HTTP Server Planning, Installing, and Using, SC34-4826
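When scanning job logs for the message families listed in this appendix, it can help to split a message ID into its prefix, numeric part, and type letter before looking it up. The following Python sketch is only an illustration under a simplifying assumption: it treats every ID as three or four uppercase letters, followed by digits, followed by one type letter. It does not separate the module prefix (cc) from the serial number, and the function name is invented for this example.

```python
import re

# Assumed generic shape: 3-4 letter prefix, digits, one trailing type letter
MESSAGE_ID = re.compile(r"^([A-Z]{3,4})(\d+)([A-Z])$")


def split_message_id(message_id):
    """Split a message ID such as EZZ0902I into (prefix, number, type)."""
    match = MESSAGE_ID.match(message_id.strip().upper())
    if match is None:
        raise ValueError(f"not a recognized message ID: {message_id!r}")
    return match.groups()


for sample in ("EZZ0902I", "IKJ55112E", "IWM003I", "GIM20101S", "IMW0442E", "BBOO0222I"):
    print(sample, split_message_id(sample))
```

The meaning of the type letter differs between components (for example, S is "severe error" for IWM but "serious error" for IMW), so the letter should be interpreted with the table entry for the matching prefix.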
Appendix B.
Additional material
This appendix refers to additional material that can be downloaded from the Web.
Related publications
The publications listed in this section are considered particularly suitable for a more detailed discussion of the topics covered in this redbook.
IBM Redbooks
For information about ordering these publications, see How to get IBM Redbooks on page 323. Note that some of the documents referenced here might be available in softcopy only.
Monitoring WebSphere Application Performance on z/OS, SG24-6825
Systems Programmer's Guide to Resource Recovery Services (RRS), SG24-6980
WebSphere Application Server for z/OS V5 and J2EE 1.3 Security Handbook, SG24-6086
Installing WebSphere Studio Application Monitor V3.1, SG24-6491
Effective zSeries Performance Monitoring Using Resource Measurement Facility, SG24-6645
IBM Tivoli OMEGAMON XE V3.1.0 Deep Dive on z/OS, SG24-7155
WebSphere for z/OS V5 JVM Dump and Heap Analysis Tools, REDP-3950
Problem Determination Methodology for WebSphere on z/OS, REDP-6001
Problem Symptoms in WebSphere for z/OS and Their Resolution, REDP-6002
Problem Avoidance for WebSphere Application Server for z/OS, REDP-6003
WebSphere for z/OS Problem Determination Means and Tools, REDP-6880
Other publications
These publications are also relevant as further information sources:
WebSphere Application Server for z/OS Version 6.0.2 Program Directory, GI11-2825
Migrating, Coexisting, and Interoperating, SA23-2207
Installing Your Application Serving Environment, GA22-7957
Administering Applications and Their Environment, GA22-7962
Setting Up the Application Serving Environment, GA22-7958
Using the Administrative Clients, SA23-2208
Securing Applications and Their Environment, SA22-7961
Developing and Deploying Applications, SA22-7959
Troubleshooting and Support, GA22-7964
Tuning Guide, SA22-7963
z/Architecture Principles of Operation, SA22-7832-03
EREP V3R5 Reference, GC35-0152
EREP V3R5 Users Guide, GC35-0151
z/OS V1R6.0 MVS Planning: Global Resource Serialization, SA22-7600-03
z/OS V1R6.0 MVS System Codes, SA22-7626-10
z/OS MVS System Commands, SA22-7627-11
z/OS MVS Diagnosis: Procedures, GA22-7587
z/OS V1R6.0 MVS IPCS Commands, SA22-7594-05
z/OS V1R2.0 MVS IPCS User's Guide, SA22-7596-01
Installing your application serving environment, GA22-7957-03
z/OS V1R5 MVS System Commands, GC28-1781
TCP/IP V3.2 for MVS: Users Guide, SC13-7136
WebSphere Application Server for z/OS V6, Troubleshooting and support, GA22-7964-03
z/OS V1R6.0 MVS Setting Up a Sysplex, SA22-7625-10
z/OS V1R6.0 MVS Diagnosis: Tools and Service Aids, GA22-7589
z/OS V1R6.0 MVS System Commands, SA22-7627
Java 2 Technology Edition, Version 1.4.2, Diagnostics Guide, SC34-6358-01
z/OS Resource Measurement Facility Report Analysis, SC33-7991
z/OS Resource Measurement Facility User's Guide, SC33-7990
z/OS Resource Measurement Facility Performance Management Guide, SC33-7992
z/OS Planning: Workload Management, SA22-7602
WebSphere Studio Workload Simulator Users Guide, SC31-6307
WebSphere Studio Workload Simulator Getting Started, SC31-6383
Directing SYSPRINT Output to an HFS File in WebSphere for z/OS, TD101087
How can I put a local copy of the WebSphere Information Center on my workstation, FQ102912
Performance Engineering & Tuning for WebSphere V5 and V6 on z/OS, PRS804
Migrating from WebSphere for z/OS V5.x to V6 - An Example Migration, WP100559
Disabling the Deployment Manager HTTP Timeout, TD101703
Java program to test TCP/IP setup - InetInfo.java, TD100609
z/OS MVS System Management Facilities, SA22-7630
Performance Summary Report for SMF 120 records from WAS V.5 for z/OS, PRS752
Online resources
Web sites with further information sources are mentioned in the relevant chapters.
Index
A
abend 5, 50, 248-249, 315 code 50, 52, 315 system abend 18 ABEND EC3 147 ABEND SCC3 147 ABEND SEC3 149, 158 abnormal end 50 access log 232, 234, 236 log sample 235 AccessLog 234 ad hoc utilities 255 address space 282 active 282 address space buffer 242 agent log 232 alphaWorks 33, 259 APAR 16, 20, 142, 147 Authorized Program Analysis Report 16, 20 APF 282 API documentation 33 APPL% 134, 137, 290 application environment 199 dynamic 142, 146 format trace data 242 Application Center Test 306 ARM 10 ASCII 21, 210, 221, 225, 278, 309 ASID 208-209 ASTK 157 ASVT slot usage 282 Automatic Restart Management 143 AVG ENC 289
B
BBO 215, 312 BBOC_HTTP_TRANSACTION_CLASS 123 BBORBLOG 216-218 binary 21, 255-256, 259, 309 bootstrap 265 BossLog 217 bottlenecks 270, 300 B-PLUS 306 Breakpoint Properties 269 breakpoints 266 buffer address space 242 core trace 255 exceeded 158 size and number 242
C
C/C++ 309 cache access log 232 capacity planning 127 CC3 52, 315 CEEDUMP 18, 214-215, 249-250 parameter 250 view 250 CERR 217 CGI error log 232 Change Log Detail levels 93, 229 checklist migration 145 coexistence 146, 149 problems 147 common storage orphaned 282 usage 282 communication prevent 205 problems 202 program 306 services 202 with IBM 14 Communications Server (TCP/IP) 315 component ID 19 trace 242, 279 compress large files 21 compressing data 21 configuration change root directory 148 error 237 information 281 message 216 problems 143, 147 Configuration tab 93, 229 connection cross memory 282 external 203 ID 203 inbound 203 console dump 247, 255 log 236 control region 214 abend 149 failure code 52, 315 migration 146 controller region 214 CPU 282 information 209 problem 16, 116 utilization 304 CPU Activity Report 286 CPU report 285 crash 255 CTRACE 18, 242 how to set it up 242 setup 146 view with IPCS 242 with IPCS 280 Current Activity 270
D
DAE 248 daemon failure code 52, 315 group 149 job log 18 port collision 146 regions 214 DASD 282 Data Monitoring 270 data set allocated 282 compress 21 format 21, 148 permission 142 tersed 21 utilization 215 data source 10 databases 244 DB2 315 Administrator 10 hints 144 messages 315 DC3 52, 315 Debug 268 debug engine 269 hints and tips 31 levels 235 production environment 8 Debug Perspective 266 debugger remote 254 debugging 266 Debugging Service 267 Defining 15 delay 205 deployment application 145 phase 156 deployment manager 18, 163, 322 configuration 143 version 148 developer information 32 DeveloperWorks 16, 32 df command output 206 DFSMS ACS 148 diagnose lock and heap problems 259 system 4 diagnostic information 261
Diagnostic Trace Service 93, 228 disk full 207 space remaining 206 space usage 206-207 Dispatch Time Data Effective 287 DISPLAY OMVS 209 sample 209 DNS Server 204 doc.jar 255 du command output 207 dump analysis 258 CEEDUMP 249 CTRACE 242 data set 18, 143, 247 display report panel 243 Dump Analysis and Elimination 247-248 Dumpviewer 254, 259-260 information 215 IPCS 242 resize 143 transaction 255 transfer 21 unformatted 255 dump utility 255, 258 parameters and options 256 shell script 257 dumpNameSpace 265 duplicate login configuration name 148 DVIPA 143
E
EAR file 157 EBCDIC 210 EC3 52, 315 edit remotely by FTP 309 edit traces, logs, configuration files 309 editors 306 education 36 EJB references 10 Enable Log 93, 228 enclave 135, 289 enqueue request 282 environment variables 216 error log 10, 18, 146, 232, 236, 314 BBORBLOG 216 problem 215 sample 233 server 233 view 217 message 5, 17, 234, 317 message flow chart 42 native code 41 runtime server 216 state 5 ErrorLog 233 Ethereal 277-278 Example 8-14 DISPLAY WLM,DYNAPPL=* 200 exception FFDC 224 flow chart 42 Java 41 log 220 external writer 279
F
FAQs 31 Fast Response Cache Accelerator 232 FFDC 219-220, 224-226 example 225 exception 224-226 ffdcRun.properties 220-222, 224-225 output and interpretation 221 set up 220 file 93, 228 compare 309 file system display 206 full 207 size 206 FindRoots 255, 258-259 firewall 203 First Failure Data Capture 219, 221 flashes 31 Foreign Socket 203 Formatter 227 FRCA 232 FTP 306 data transfer 225 naming conventions 23 to IBM 21
G
garbage collection 124, 258-260 garbage collection (GC) 300 gather background information 16 Global Performance Management Control 286
H
Handler 227 hang 255, 258 Hardware Management Console (HMC) 286 hardware specifications 8 heap dump 258 HeapRoots tool 254, 259 information 258 occupancy 255 portable dump 258 usage 259 hexadecimal 208-209, 278, 308-309 HFS directory 22 display 206 edit remotely 309 environment 9 large data set 255 mount 144 permission 142, 144 plan 143 shared 142 Hierarchical File Structure (HFS) 142 hints and tips 31-33 hit rate 119 host name 204, 276 HR204.jar 259 HSM request queues 282 HTML 309 editor 306, 308 HTTP 278 error 500 225 Plug-in 237 return code 235 Server 214, 236 logs and traces 232-233 message 318 HTTPD 237 httpd.conf 233-234, 237
I
IBM contact 23 Link 16 Support 15, 19, 226, 242 guide 35 IBM HTTP Server 128, 232 IFASMFDP 295 Incident/Support Case 19 InetInfo.java 276 Information Center 15 installation plan 142 problems 143, 147 interface device 280 Internet helpful pages 33 IP address 205 client 234-235 dynamic virtual 143 filter 279 host 236 local 203, 276 name server 204 IP header 280 IPCS for CEEDUMP 250 for CTRACE 242 format trace 279-280 reference 281 ISPF 282 configure log 217 message 316 reference 143
J
J2EE migration 145 server create 10 Jakarta Commons-Logging 227 Java API 255 edit 309 heap 156, 255, 259 messages 312 stack 259 turn on/off trace dynamically 201 Java garbage collection 262 Java garbage collection formatter 262 JCL 21, 144, 196, 215, 279 JDBC 244 JES EJES 197, 214 JESJCL 215 JESMSGLG 215 JESYSMSG 215 JES2 spool 214 jformat 259 job log 18, 21, 197, 214-216, 235, 249 information 215 JOBLIB 22 JRas 227, 242-243 JVM heap 254 Monitoring Interface 254 Profiling Interface 254 properties 215 JVM debug arguments 267 JVM debug port 267 JVM dump 255-256, 258-259 and heap analysis tools 254
K
Kermit 306
L
Language Environment 250, 317 large EAR file 158 LDAP 10 libsvcdump.so 255 LLA 282 LNKLST 22, 282 Local Socket 203 log 232, 237-238 job log and system log 214-215 LOG_STREAM_NAME 218 LogLevel 238 stream 216-218 Logger 227 logger 10, 216, 317 logging 227 Logging and tracing 93, 229 LookAt messages 34 loop 16, 116, 255, 258 LPA 146-147, 247, 282 LPAR 282 cluster 288 LPARCE Capacity Estimator 288
M
markers 266 Master catalog 282 memory content 282 delete queue 282 leak 258 utilization 304 Memory buffer 93, 228 memory leak 136, 262 message 34 data 280 prefix 311, 315 returned 5 Microsoft Visual Studio .NET 306 Microsoft Web Application Stress Tool 306 migration 146 checklist 145 problems 144, 147 Version 3.5 SE to 5.1 145 Version 4.0.1 to 5.1 145 Version 5.0 to 5.1 144 minor code 314 MODIFY command 196 monitor 270 monitoring 254 Monitoring and Tuning 270 mount-point name 206 MQ Administrator 10 MustGather 47 Mustgather 16 MVS eXtended Information 281 MXI 281-282
N
name server lookup 204 namespace 265 naming conventions 23, 142 native stack 259 netstat 202-203 command 203 sample output 203 network data packet 277 hops 205 packet analyzer 277, 279 capturer 279 topology 8 transport 276
O
object leak 255 OMVS 218 command tools 206 Open Perspective 268 OutOfMemoryError 255, 259
P
packet trace 279 Packlib 21 page elements 304 Page View Rate 119 paging activity 132 Parallel Sysplex 10 partition data report 285 PC routines 282 PD/PSI 4, 8 What PD/PSI is 4 PDF 36, 316 performance 270 analysis 126 configuration guidelines 128 CTRACE 242 expectations 126, 128, 130 FFDC 220 hints and tips 216, 236 HTTP plug-in log 238 monitoring 126 problems 4, 20, 254 report 295, 322 vv trace 235 WebSphere Studio Workload Simulator 304 Performance Viewer 270 perspective 266 Physical Management Time 287 ping 202, 205 PKZip 22 plug-in log 237 trace 238 plugin-cfg.xml 237 PMR 15, 19, 23 Investigating a PMR 19 port conflict 146-147 Ports 265 PrintDomTree 258-259 problem 177 PMR 14-15, 19 Problem Management Record 14, 19 scenario 277 process information 206, 238 processing weights 288 product documentation 20 information 15 support resources 15 production environment 30, 151, 254, 305 profiling 254 programmers editor 308 programming issues 300 proxy 232 ps command sample 208 PTF 16 PThread ID 238
Q
Quick-VAN 306
R
RACF 9, 17, 142, 316 Rational Application Developer 266 Redbooks 31 Redbooks Web site 323 Contact us xxi referer log 232 Release Notes 15 Remote Java Application 268 request header 237 Resolution Team 19-20 resource Recovery Services 317 shared 225 Resource Measurement Facility (RMF) 283 Resource Selection 270 response time 5, 118, 304 distribution 292 expectations 136 objective 124 RETAIN REmote Technical Assistance Information Network 19 REXX 281 RMF CPU information 285 partition data report 286 Post Processor 284 reports 284 RRS 10, 196, 317 runtime state 50 Runtime tab 93, 229
S
SBBOSLIB 247 scenario 15, 225 SCLM 316 SDSF 21, 197, 208, 214 search 210, 309 security administrator 142 servant region 214 server access log 232, 234 error log 232-234, 236 log 235, 237 name 196 properties 216 region 18, 52, 206, 215, 315 trace 237 Server Region address space classification 124 Server Region enclave classification 122 ServerRoot 234 session affinity 237 state 203 severity level 18 SimpleFileServlet 299 skills 8-11, 14, 36 SLIP command 242, 249 SMF 294 Record Interpreter 295 SMP/E messages 317 problems 148 references 143 SMS classes 282 storage group 282 Socket State 203 Software defect 20 Maintenance 20 Problem Report 20 terminal emulator 306 spool space 242 spreadsheet 264 SSCHRT 289 stack information 279 stack trace 222, 224, 226 Started Task 196 state data 255 STD DEV 290 STEPLIB 146-147 STORAGE 291 storage usage 282 stress tests 300 subsystem 282 summary report 285 support before contacting IBM 15 case 19 line 20 pages 26 SVC 282 dump 250, 255, 258-259 in detail 247 interpretation 258 view 259 svcdump.jar 254-255, 259 symptoms 4-5, 8, 15 synchronization 225-226 SYSLOG 17, 214-216, 248 SYSOUT 215, 217, 237 sysplex 282 Sysplex Distributor 143 SYSPRINT 22, 201, 215, 242, 308, 314 to HFS File 306, 308 view 308 system administration 8 log 17, 197, 214-215 output 215 programming skills 9 programming staff 255 trace 255 System.out.println 227
T
tape unit 282 tar command 22 TCB 209 TCP header 280 TCP/IP checkout program 276 commands 10, 202, 281 component trace 279 CTRACE 242 DVIPA 143 messages 315 packet trace 276-277, 279, 281 reference 281 references 143 skills 9-10 stack 279 test 277, 322 tools 202, 281 Techdocs 31, 277, 308 Technotes 16 TEK4010 emulation 306 telnet 306 TeraTerm Pro 306-307 test environment 6 testing 266 text editor 308 Think Time 119 thread analysis 258 display command 208-209 threads servant region 208 throughput 118 timeout 158 Tivoli Performance Viewer 202 Tivoli Performance Viewer 202, 270 token 43 trace 235, 261 back 215, 249 buffer 255 commands 201, 205 component 18 data 279-280 exchange 21 format 279-280 JVM 215 MODIFY 201 request 196 skills 10 start and stop 236, 242, 279 TCP/IP 276 to data set 242 variables 215 Trace Analyzer for WebSphere 261 tracert 202, 205 tracing 227 transaction 118 class 123 diagnose production environment 255 dump 255, 258-259 throughput 254, 304 transfer files 211 Troubleshooting 93, 229 TRSMAIN 21, 23 TSO Command panel 202, 204-205 TSO REXX processing 316 TSO/E 205, 316 tuning 4, 31, 313
U
UltraEdit 306, 308-309 unexpected condition 227 UNIX System Services 9, 22, 206, 317 threads 208 URI 239 URL error log 234 URI matching 237 user catalog 282 USS and OMVS command tools 206, 300 CTRACE 242 messages 317 problems 207 references 143 skills 10
V
Verbose garbage collection 263 verbose GC trace 136 version 148 virtual host match 237, 239 VT100 emulation 306 VT200/300 emulation 306 vv trace 232, 235-237
W
Wait 316-317 warning message 317 WASgrep.sh 206, 210 Web Application Stress tool 305 helpful pages 33 server plug-in 232 traffic 300 WebSphere commands 196 plug-in 237 Proof of Concept for z/OS 30, 151 Studio Workload Simulator 300, 305 Studio Workload Simulator 127 support structure 14 WebSphere Studio Workload Simulator 300, 304 whitepapers 16, 31, 33 WinPcap 278 Winzip 22 WLM 10, 282 commands 199 dynamic 142 information 209 maximum number of instances 121 messages 317 minimum number of instances 121 references 196 static 142 WLM queues 122 workload Simulator for z/OS 300 Workload Activity Report 137, 289 worksheet 143 wrong output 5 WS_FTP 306-308 WSAdmin 10 wsdeploy tool 157 wsjaas.con 148
X
XML 306, 309 parser 259 problems 259 references 259 tools 206, 210, 304 XMODEM 306 XSL 266
Z
z/OS external writer 279 z/OS Internet library 36 z/OS packet trace facility 276 ZIP file 22 ZMODEM 306
Back cover