DevOps is a hot IT buzzword that everyone is talking about these days.  We had SaaS, Cloud, SDE and tons of other catchy titles.  Perhaps I’m just becoming an IT curmudgeon, but it seems to me that all these new ways of thinking are really a recycled old way of thinking with a new buzzword attached.  The “Cloud” is what we old-timers would call a network-attached service bureau.  It all runs in a real data center somewhere.

DevOps is a philosophy for getting software development, quality assurance, and operations personnel to collaborate on delivering a continuous stream of software updates to a business in a rapid manner.  This is pretty much what we called “business as usual” four decades ago when I was working at my first applications programming job at a bank.  The programmers, or systems analysts, would work with the bank’s internal departments to determine their exact requirements for software updates.  The programmers were assigned a project to implement the designed change, and when it was completed and tested to the requesting department’s satisfaction, the change was quickly moved into production through the operations department.  Each change moved at its own natural pace depending on its complexity, but when it was complete it went into production immediately.


Then came the era of Change Control and ITIL (formerly known as the Information Technology Infrastructure Library).  These methodologies pit the development and operations organizations against each other.  The goal of the development group is to provide timely software updates for the business, but the goal of the IT operations group is to provide the most stable and reliable production workload possible.  Development can’t complete its projects until changes reach production status, and yet, in the name of stability, operations wants to accept updates only once per quarter.
 

Now comes DevOps to the rescue.  Its goal is to incentivize personnel in all phases of IT to work together in harmony to move software updates from development, into QA or acceptance testing, and finally into production in an agile, lean and fast manner.  Do you think that is even possible?  Most of the DevOps literature I have read goes to great lengths to describe how to transform your entire IT staff culture into the DevOps way of thinking to achieve a more agile business process.  I’m sure with enough time and money, and the hiring of new recruits, it can be a successful undertaking.

So, what’s the right business or industry in which to use DevOps?  I personally don’t see a place for it in the commercial ISV industry.  We typically distribute software product updates in batches with many enhancements, some small and some large, all wrapped up in a “Release”.  Depending on the ISV, some releases may be years in development.  I do know at least one ISV company that labels incremental updates with a sequential “build number” instead of a release number.  That enables them to roll out small updates to users that require a change without waiting for a full release distribution cycle.  Is this a form of DevOps for ISVs?  While this could work for a small organization, I doubt it is feasible for large products with more than a handful of developers working on them.

That pretty much leaves DevOps to large businesses that employ their own IT staff to create custom applications.  For proper implementation, DevOps requires a QA test group or an operations acceptance test group, which most smaller businesses won’t have.  For these corporation-sized businesses, a DevOps implementation could be very beneficial in terms of making the entire business more agile and more responsive to changing market conditions.  This could be especially true when producing mobile or cloud applications.


There is one thing for certain.  Implementing DevOps will provide a major reason to acquire new software tools for the IT folks.  From IDEs that assist in development and testing, to project planning and tracking, to change control and operations scheduling tools, there will be plenty of opportunity to acquire a whole new set of tools to go along with the new business philosophy.  One (cynical old timer) might wonder if that was the original idea behind DevOps: to boost tool software sales.

Whether or not you implement DevOps, one tool that can dramatically increase the productivity of a z/OS developer is a software debugger.  Our Trap Diagnostic Facility (TDF) product can bring even the most stubborn bugs to light in a fraction of the time they would take to locate using other techniques.  You can find more information about TDF here.

If you are implementing DevOps in a mainframe environment, please leave a comment on this blog post and let us know what industry it is for and what made you decide to go that route.  Or if you have looked into using DevOps and decided not to use it, please leave a comment and let us know what the deciding factors were for you.

 
 
Happy New Year to all our friends and future friends.  May 2015 be a great year for the Mainframe, for z/OS and for Software Development and Testing.
 
 
We’ve heard this on a few occasions.  Actually, I’ve been guilty of it myself.
 
While it’s not impossible, it’s very unlikely that a debugger such as our Trap Diagnostic Facility (TDF) product will create a problem that appears to be an error in the software being debugged.  TDF has one of the most robust recovery systems you’re likely to find in the System z world.  Depending on option settings it may be very verbose in describing an abend in our code, or it may be completely silent, but in either case it uses abend retry routines to insulate the user application from the debugger.  A problem in TDF will not generally show up as a user program error.

It is easy to blame the debugger when a piece of code that normally runs fine encounters errors when run in debug mode.  It seems obvious that the debugger caused a problem that did not exist until an attempt was made to debug some other issue.  One hint to the cause of the problem is typically the environment.  When this notion is put forth, the environment is usually a multi-tasking one rather than a simple single-load-module application.

That being said, you probably have a good idea of where this is going.  But you don’t think YOUR code will be caught in this trap, until it actually occurs.  For me, it was in a multi-tasking system that had been running for years in a good number of production systems across the globe.  In those years no problems were seen during the start-up initialization process or shutdown processing.  However, when it was started in debug mode, problems showed up almost immediately.  We’ve seen the same scenario occur in other ISV software products, so we know this is not an isolated instance.

To be more precise, what we found in these situations were errors in the sequence of processing and interactions between tasks.  This most often occurs during initialization or termination of an environment, but it could happen at any time.  Usually you find one program handling initialization of the environment, creating control blocks and data areas and attaching other tasks.  Everything works fine when run at normal speed, but when a modern debugger is introduced into the mix, one or more tasks can be stopped by the debugger, placing them into a diagnostic pause, while other tasks continue to operate at normal speed.  The problem is caused by attaching a task that uses a data area before the attaching task, or another task, has initialized it.  Usually there is sufficient time for the data area to be initialized before it is required by the other task(s), but if the initializing task is paused it throws the normal timing all out of whack.  The same sort of thing can occur during termination, with one task releasing a resource while it is still being used by other tasks.
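
To make the shape of the problem concrete, here is a minimal HLASM sketch of that kind of timing window.  The subtask and field names are invented for illustration, and standard entry/exit linkage is omitted.

*---------------------------------------------------------------------
* Hypothetical sketch of the timing window described above.  At normal
* speed the worker never notices; pause the parent in a debugger
* between the ATTACH and the MVC and the worker can reference a work
* area that has been cleared but not yet built.
*---------------------------------------------------------------------
         XC    SHAREDWA,SHAREDWA          Clear the shared work area
         ATTACH EP=WORKER,PARAM=(SHAREDWA),ECB=WRKECB
         ST    1,WRKTCB                   Remember the subtask TCB
*   A diagnostic pause here stops the parent while WORKER runs on...
         MVC   SHAREDWA(L'SHAREDWA),MODELWA  ...so this is now too late
*        ...rest of initialization...
WRKECB   DC    F'0'                       Subtask completion ECB
WRKTCB   DS    A                          Subtask TCB address
SHAREDWA DS    CL256                      Shared work area
MODELWA  DC    CL256' '                   Model contents for the area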

This sequence can expose problems that have existed for long periods of time but have never come to light, so your first reaction is to blame the debugger.  Hmmm… Is it really a problem in the target program if it never shows up without the debugger?  Well, yes it is.  Just because the problem has never surfaced before doesn’t mean it’s the debugger’s fault.  The application code has to be corrected to fix it.  The problem may not show up under normal circumstances, but it’s still worth spending the time to fix it.  It really is a bug in the application, and there could be other unknown situations in the future that alter the execution timing and expose the bug.  You just never know!
 
 
This is a question we hear asked fairly often.  And the answer is a qualified yes… yes, you can, as long as you are running release 1.13 or later of z/OS.  And then comes the more important question: …SHOULD I be writing my code to run RMODE 64?  That one is a bit harder to answer.

Perhaps right now is not the time to undertake a project to convert an existing product or application to RMODE 64, but you might want to consider the following information to help you make that decision.  If you ask IBM this question (Disclaimer: the following is my belief only.  I do not speak for IBM or even claim to know their most recent position), they will likely tell you it’s not a good idea at this time.  I believe this is mostly because they don’t yet have their diagnostic and management facilities ready for the job, and most of the very basic tools for interfacing a piece of software to z/OS are lacking.  This is not a criticism of IBM but simply a fact of the evolution of the hardware and the software supporting it.  We’re sure new features will come with time.

Very early in the design of our debugging product (Trap Diagnostic Facility), we decided that since the System z hardware supports it, this was something we wanted to tackle just as soon as the operating system allowed it.  After all, TDF has to support debugging RMODE 64 code, so we need to gain all the experience we can on the matter.  The result is that currently a large percentage of TDF code resides above the 2GB bar when it can.  Today that’s over 1MB of code executing up there!  So, it is definitely possible and it’s a lot of fun, but only you can decide if it’s right for you.  Some of the following may help with your decision.

Most operating system services do not support being called in AMODE 64, let alone support 64-bit addresses as input and output.  As of z/OS 2.1 there are about 42 services that can be called in AMODE 64 but cannot accept 64-bit parameter addresses.  There are very few services, related specifically to 64-bit addressing, that do support 64-bit parameters.  These include services such as IARV64, IARST64 and IARCP64.  All services supporting AMODE 64 are requested by either an SVC or a PC.  There are no branch entry services that can be called in AMODE 64.  In general you should assume that a service cannot be called in AMODE 64 or be passed 64-bit addresses until you check the manuals for its allowed environments and parameters.

One of the biggest problems with making your programs reside in 64-bit storage is actually getting them there and managing their use after you do.  Contents Supervision and other components of z/OS are largely not ready for RMODE 64 programs.  The Binder does not support an RMODE 64 specification, but that really doesn’t matter since “normal” loads can’t be used anyway.  If you write code like we do and always create self-relocating programs, then the easy way to get your code into 64-bit storage is to load it into below-the-bar storage and move it, simply using the MVCL instruction, to the new memory object location which you previously allocated.  Don’t forget to delete the loaded version.  For programs that are not self-relocating, your only option for getting the system to load your code is the LOAD macro with some specific parameters.  To get LOAD to make your code RMODE 64, you must perform a directed load using the DCB parameter along with the ADDR64 or the ADRNAPF64 parameter.

Of course, with either of these options the system will not have any record of your program being loaded into storage, so the management of the code is all up to you.  You must make the address of the program available to its callers and manage any requirements for removing the program from storage or reloading newer versions.  You might also want to provide some assistance in resolving abends related to the code.  Since z/OS has no knowledge of the program, it won’t be much help when problems occur.  We provide a number of tools in TDF to assist us in problem resolution.  Operator commands to display program locations or to resolve an address to a program + displacement come in very handy when identifying problems.
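
As an illustration of the “load it low, move it high” approach for self-relocating code, here is a minimal HLASM sketch.  The module name MYPGM64 and all of the data fields are invented, the error checking a real program needs is omitted, and the bootstrap code shown is assumed to be running as a normal RMODE 31 program below the bar.

*---------------------------------------------------------------------
* Hypothetical sketch: load a self-relocating module below the bar,
* copy it into a memory object above the bar, then delete the copy
* that the system knows about.
*---------------------------------------------------------------------
         LOAD  EP=MYPGM64                Normal load, below the bar
         ST    0,PGMEP                   R0 = entry point address
         ST    1,PGMLEN                  R1 = APF byte + length in DWs
         IARV64 REQUEST=GETSTOR,SEGMENTS=NUMSEGS,ORIGIN=OBJORIG
         SAM64 ,                         Need AMODE 64 to touch target
         LG    2,OBJORIG                 Target: memory object origin
         LLGT  4,PGMEP                   Source: loaded copy, AMODE bit off
         LLGF  5,PGMLEN                  APF byte + doubleword count
         NILH  5,X'00FF'                 Drop the APF byte
         SLLG  5,5,3                     Doublewords -> bytes
         LGR   3,5                       Target length = source length
         MVCL  2,4                       Copy the module above the bar
         SAM31 ,                         Back to AMODE 31 for the DELETE
         DELETE EP=MYPGM64               Discard the below-the-bar copy
*   OBJORIG now addresses your private RMODE 64 copy of MYPGM64; its
*   relocation and its lifetime are entirely your responsibility.
PGMEP    DS    A                         Entry point returned by LOAD
PGMLEN   DS    F                         R1 contents returned by LOAD
NUMSEGS  DC    FD'1'                     One 1MB segment above the bar
OBJORIG  DS    AD                        Memory object origin (64-bit)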

As far as coding an RMODE 64 program goes, your own logic isn’t really much different.  Since you are writing the code, you can design it to create and accept 64-bit addresses and pass them between your programs.  The problems occur when you need to interface with the outside world.  You have to develop specialized techniques to deal with non-64-bit service providers.  It’s not as simple as just calling them in their expected AMODE, because when your code is RMODE 64 you cannot switch out of AMODE 64 and expect the next instruction to complete successfully.

If the called service saves the caller’s AMODE and full 64-bit registers and returns with a mode-switching instruction such as BSM or PR, you can successfully call the service using a mode-saving/switching instruction such as BASSM, while passing only 32-bit addresses as parameters.  However, if the called service uses older-style program linkage, such as returning to the caller via a BR instruction, or it only saves/restores 32-bit registers, you will have to create an interfacing routine that resides below the 2GB bar.  Using your own code, you can BASSM to a 31-bit routine which will take care of calling the desired service and returning to the 64-bit code via an appropriate mode-switching instruction.  The interface routine could also take care of relocating parameters for the service call, if necessary.  One technique we developed for this purpose is to dynamically build a small piece of routing code, placed in a common location within a below-the-bar work area.  Dynamically building the small routine, calling it and letting it call the service before returning to the RMODE 64 code allows us to perform calls such as requesting ISPF services without having to keep special-purpose routines residing below the bar.

One coding problem you still have to watch for is the location of any passed parameters.  In calling services such as ISPF it is common to use literals in the calling code.  Obviously that does not work well when the literal pool resides in 64-bit storage.  You still have to be mindful of parameter locations.  Another issue to be aware of when coding is that the high word of 64-bit registers can easily become non-zero.  When running RMODE 31 code in AMODE 64, the high words of registers generally remain clear (zero) unless you purposely intend otherwise.  However, move that same AMODE 64 code into 64-bit storage and you will find the high word of registers being set by instructions like LA, LARL and even linkage instructions like BRAS.  You will quickly find yourself replacing instructions like L with LLGT or LGF to clear that high word.
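
To make the below-the-bar glue idea more concrete, here is a minimal, non-reentrant HLASM sketch.  GLUE31, OLDSVC31 and the data fields are invented names; the sketch assumes the glue routine and its data have been placed in below-the-bar storage with addressability established, that R1 already points at a below-the-bar parameter list, and that the target service uses old-style BR 14 linkage and preserves only 32-bit registers.  Register save area conventions are omitted.

*---------------------------------------------------------------------
* Hypothetical sketch: RMODE 64 code calling an old-style 31-bit
* service through a small glue routine that lives below the bar.
*---------------------------------------------------------------------
* ----- RMODE 64 caller, running above the bar in AMODE 64 -----
         LLGF  15,GLUEA31                Glue address, AMODE 31 bit on
         BASSM 14,15                     Mode switch + branch, atomically
*        ...execution resumes here in AMODE 64...
*
* ----- Glue routine, resident below the bar, entered in AMODE 31 -----
GLUE31   STG   14,SAVER14                Caller's full 64-bit R14
         L     15,SERVEPA                Old-style 31-bit service
         BASR  14,15                     Call it (it returns via BR 14)
         LG    14,SAVER14                Recover the 64-bit return address
         BSM   0,14                      Back above the bar in AMODE 64
*
* ----- The high-word surprise once your code is above the bar -----
         L     6,A31PTR                  Leaves junk in R6's high word
         LLGT  6,A31PTR                  Same load with the high word clear
*
GLUEA31  DC    A(GLUE31+X'80000000')     31-bit entry, high bit = AMODE 31
SERVEPA  DC    V(OLDSVC31)               Hypothetical 31-bit service
SAVER14  DS    D                         (non-reentrant save area)
A31PTR   DS    A                         Some 31-bit pointer field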

And, to wrap this up, a caution about handling any type of system exit.  Any time you define a piece of code that will be called by the operating system, it must not be RMODE 64.  This includes RTM recovery routines.  Frequently, exits such as ESTAEs are coded in the same source module as the code they protect.  If you intend to make that module RMODE 64, you will have to make other arrangements to leave those exits in RMODE 31 storage.  Often in these circumstances you can define a small stub program which resides below the bar as the exit, and then have that code simply call the real RMODE 64 code.  However, we found it easier to just leave our recovery code in RMODE 31 storage.  Something else to be aware of is that PC routines currently cannot be defined above the bar.  Since the System z architecture does support 64-bit addresses for PCs, we can hope for this support some day from z/OS, but it’s not here yet.  We typically define one of those stub routines as the PC routine and have it BASSM to the real code above the bar.
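
Here is a minimal, non-reentrant HLASM sketch of such a stub.  STUB31 and the field names are invented; the sketch assumes the stub stays in RMODE 31 storage, that RMODE64@ was filled in with the real routine’s above-the-bar entry point when the module was copied up, and that base register addressability is established.

*---------------------------------------------------------------------
* Hypothetical sketch: a small RMODE 31 stub defined to z/OS as the
* exit, forwarding control to the real code above the bar and then
* returning to the system in AMODE 31.
*---------------------------------------------------------------------
STUB31   ST    14,STUBR14                System's 31-bit return address
         LG    15,RMODE64@               Real routine, above the bar
         OILL  15,X'0001'                Bit 63 on = enter in AMODE 64
         BASSM 14,15                     Up to the real RMODE 64 code
         L     14,STUBR14                (real code came back via BSM)
         BR    14                        Return to the operating system
*
STUBR14  DS    F                         (non-reentrant save area)
RMODE64@ DS    AD                        64-bit entry of the real code

For a PC-routine stub, the same forwarding idea would presumably apply, with the stub ending in a PR rather than the BR 14 shown here.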

This post is already past being long, so I’ll end it here.  There are additional benefits to making your code RMODE 64 (think about using a common memory object for your code to share it across all address spaces), and additional drawbacks, but hopefully after considering the presented topics you will be well informed when it comes time to make the decision about your code.  All serious developers need a debugger regardless of where their code resides.  However, when debugging RMODE 64 code there are even fewer system-provided diagnostic facilities available to you.  TDF fully supports the debugging of RMODE 64 code and will make moving your code much easier.

 
 
This post is a continuation of our previous blog post titled Planned Debugging versus On-Demand Debugging.  It will discuss several methods that can be used to accomplish a Dynamic Hook into an address space that is already up and running without having a debugger initialized.  

TDF provides three different techniques you can use to effect a Dynamic Hook in an address space.  We use the term Dynamic Hook to mean the process of initializing a Dispatchable Unit of work (DU) for debugging.  The initialization process starts up the debugger in the address space, adds the DU to the TDF Connect Queue so that it’s available for connection from a User Interface (UI) session, and places the DU into a diagnostic pause state.  It’s Dynamic because the debug session was not planned when the application was started. The term Hook is used because a program instruction in storage is overlaid with a different instruction in order for the debugger to gain control without the original program’s knowledge.   As you will see later, one of our Dynamic options is not a true hook but the other two are.  The TDF hook that is used to gain control is a PC instruction that transfers the DU execution to a PC service of the TDF Server.  

The first Dynamic Hook facility we delivered (see the previous blog post) is known as a Breakpoint and Hook, and it is defined with a BH or BHP command, depending on the desired use.  If the hook is to remain in place after it is triggered, use the persistent BHP command.  Otherwise, the BH command specifies a one-time hook that is removed when the first trigger occurs.  Before the command can be entered, a UI session must be started.  Since the debugger is not initialized in the target address space, the TDF Peek mode can be used on the address space.  The code that is to be hooked can be either local to the address space or it can reside in common storage.  If the code is in common storage, the Peek session can be started using any address space, but if the code is private to the address space you should Peek that address space.

Once the Peek session is established and the instruction to be hooked is located, issue the UI BH or BHP command to define the hook.  In addition to the hook address, the command allows you to optionally specify the job name and/or task name of the DU(s) to be hooked.  The names can be entered directly, or a ? can be used for either the JOB or TASK parameter, in which case you will receive a list of the currently running candidates so you can Select the desired entry.  With the job/task name specifications, the hook is not triggered simply by the first DU that executes the code.  The trigger is not complete until the DU name specifications match the executing DU.  Unmatched DUs do not trigger the hook; they continue execution as normal, just as if the hook PC instruction were not there.  This is made possible by the TDF Pass-through facility (maybe there should be a blog post on that topic some day!).  If a triggered hook is defined as persistent, the hook instruction is left in place where it could trigger debugging for other jobs/tasks.  Otherwise the hook instruction is removed and the original instruction(s) are restored.  Care must be exercised when using persistent hooks to ensure undesired DUs are not trapped.  Stopping a huge number of DUs, or even just the wrong DUs, could put the health of your system in jeopardy.

The second type of Dynamic Hook supported by TDF is slightly different from the Breakpoint and Hook, but it has the same basic operation.  It is actually a combination of a BH and a TDF redirection Identify.  The TDF Identify facility is a way for the developer to define special requirements for a piece of code.  There are several types of Identifies for different purposes, but this one is called the System Code Identify.  This technique is useful for debugging common shared code that resides in page-protected storage.  A private copy of the code is created for the triggering DU, in non-page-protected storage, so that it can be debugged as normal.  Tasks not matching the job/task filter pass through the hook placed in the page-protected code as if the hook were not present, and the hooked DU can be debugged using the copied version of the code.

The final type of Dynamic Hook supported by TDF takes a completely different approach from the previous two.  This one is performed at the specific DU level.  Rather than placing a hook at an instruction address and waiting for a DU to execute the instruction, this technique targets a specific TCB with a z/OS Interrupt Request Block (IRB).  After a Peek session is established with the address space, the TDF UI command D JOBEXTR is issued to display a list of address space control blocks.  The D line command can be entered on any TCB within the address space to place the DU in debug mode.  An IRB is scheduled for the TCB and, when it executes, the TCB is added to the TDF Connect Queue.  After connecting to the DU with a UI session, breakpoints can be set or other debug commands can be used to perform the needed actions on the DU.

…And that’s how you debug a running address space!

 
 
Don’t expect me to tell you, in this blog, every time we screw up.  None of us has the time for all that!  But this one time I will, just because it makes a good story.

In the early days of our debugging product, just after the first release became available, we were busy talking to developers at a number of well-known software companies.  We believed we had a well thought out, excellently designed product that was ready for the street.  After the initial installation and a couple of days of testing at one company, we were discussing their first impressions and their list of desired enhancements (there always seems to be a list).  Then someone asked, “How do we start a debug session with an already running job?”  Our answer was: …What?  Why would you want to do that?  You always plan a debug session and start up your job with the debugger initialized.  Right?  It was immediately clear to us we had missed something in the system design if they really did mean they wanted to do that.  And they did…

This was a case of blindness brought on by living in our development labs for decades with only short peeks at the real world now and then.  We know what everyone needs, and we design our software to provide for that need.  After all, we are always debugging code and this is a debugger product.  But… sometimes the real world has ideas that run contrary to the experience of a few good developers.  We expected everyone to work like we do.  When you need to debug a software problem, you sit and think about the problem and the software flow and you make a plan of attack.  Maybe it’s just in your head and not a written plan, but it is a plan nonetheless.  Then you start up your test job with the debugger initialized, and you go at it.

Well, it seems the real world sometimes has other ideas.  When something goes wrong with an already running application and it starts doing unexpected things, some folks would like to be able to start a dynamic debug session right there on the spot, without any prior planning.  Err, well, that would be a cool thing to do, all right.  Start a debug session in a running application to look at the code path being executed, or storage buffers, or whatever is appropriate for the situation, without having to plan a debug session and try to reproduce the same situation in a controlled environment.

Because TDF has a well-architected modular design, we were able to add a Dynamic Hook facility to the system fairly quickly.  By making good use of the already existing Peek facility, we were able to deliver a very flexible solution to the customer within just a few weeks, in the form of our Breakpoint and Hook technology (BH or BHP commands).

In addition to the usual planned debugging and on-demand debugging, the Breakpoint and Hook technology is also useful for creating a combination of the two techniques.  When part of a complex application is to be debugged but nothing in the application initialization needs to be, a long-running or server-type job can be started up as normal and then, after all setup is complete, a Dynamic Hook can be used to start the debugging session at a desired code point.  This planned Dynamic Hook technique can be a time saver in the right circumstances.

This subject will be continued in our next blog post where we will describe three different techniques that can be used to perform a Dynamic Hook with TDF.
 
 
Do you know what the TRAP2 and TRAP4 instructions do and how they work?  If not, we’ll be happy to tell you why we thought enough of them to name our product after the TRAP Facility.  In two words: fast and flexible.

First, some basics.  The IBM System z architecture includes two special instructions designed to reroute execution to a designated routine.  Rumor has it these instructions came about as a result of the Y2K date problem, so that systems could be written to dynamically gain control on date-oriented instructions for testing and simulation of future dates in advance of the century roll-over.  The TRAP2 and TRAP4 instructions are two and four bytes in length respectively, so they can be planted on top of existing instructions.  The System z Architecture defines the Dispatchable Unit Control Table (DUCT) control block, which contains items related to DU control functions.  The DUCT contains the address of the Trap Control Block, another System z architected structure.  The Trap Control Block contains the information and save areas necessary to handle the TRAP instructions, including the Trap Routine address.

With these control structures properly set up, the execution of a TRAP instruction results in the hardware routing control to the Trap Routine. This processing occurs at the individual DU level; thus some DUs within an address space might be initialized for Trap processing, while others are not. If a TRAP instruction is encountered by a DU that is not set up for Trap processing, a special operation exception is recognized, resulting in an S0D3 z/OS system ABEND.
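
As a rough illustration of how a hook of this kind gets planted, here is a minimal HLASM sketch.  The field names are invented, and everything a real debugger must also handle — building the DUCT and Trap Control Block, storage keys, page protection and serialization across CPUs — is omitted.

*---------------------------------------------------------------------
* Hypothetical sketch: overlay an existing 4-byte instruction with a
* TRAP4 and keep the original bytes so it can be restored later.
*---------------------------------------------------------------------
         LLGT  2,TARGET@                 Address of the 4-byte instruction
         MVC   SAVEDOP,0(2)              Keep the original instruction
         MVC   0(4,2),TRAP4OP            Plant the TRAP4 on top of it
*   When the DU executes through this spot, the hardware saves status
*   in the trap save areas and passes control to the Trap Routine
*   named in the Trap Control Block, with no z/OS involvement at all.
         MVC   0(4,2),SAVEDOP            Later: put the original back
*
TRAP4OP  DC    X'B2FF0000'               TRAP4 instruction, zero operand
SAVEDOP  DS    XL4                       Original instruction bytes
TARGET@  DS    A                         Where the hook goes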

The most important benefit of using the TRAP Facility is its flexibility.  A TRAP instruction can be executed in TCB or SRB mode any time the CPU is in the primary-space (SAC 0) or access-register (SAC 512) addressing mode, which is almost all the time for typical software.  TRAP is not allowed in the real, home-space (SAC 768) or secondary-space (SAC 256) addressing mode.  These modes are typically only used by the control program, or by very special code for a short duration.  This makes the TRAP Facility a very good choice for a code debugger.  Unlike RTM recovery exits (SPIE, ESPIE, STAE, ESTAE, ESTAEX, FRR), the TRAP Facility is not subject to the rules of the control program.  z/OS has a number of rules or restrictions on the environmental conditions that must exist before an RTM exit can be used.  The developer must be aware of the environment and must know which type of RTM exit can be used in that environment before an RTM exit-based debugger can even be invoked, making setup more difficult.  Using TRAP also makes debugging a recovery routine just as easy as debugging any other code.

The last attribute we mentioned for the TRAP Facility was speed.  This is again related to its being a hardware facility rather than a control-program-supported software facility.  The context switch performed by a TRAP is completely handled by the hardware without any involvement of z/OS.  With an RTM exit-based debugger, program interrupts are used to gain control.  When the program interrupt occurs, the operating system has to get involved to handle the first-level interrupt and then to dispatch the proper recovery routine to handle the problem.  All of this uses CPU cycles.  To be honest, increased CPU overhead is not usually a problem when performing interactive debugging because there is so much wait time involved, but it is a concern when performing non-interactive debugging such as using TDF’s High Speed Dynamic Trace facility.

Happy debugging till next time…

 

Welcome!

08/28/2014

 
What, another blog?  About z/OS software debugging?  Right on both counts!  I’m as surprised as you are.  I never thought we would be blogging about what we do, but as it turns out, a blog can be a great, informal way to convey information to an interested group of people who choose to pay attention.  There are a number of mainframe blogs that already exist, created by folks both inside and outside IBM, some more active than others.  My old friend Ray even has one that focuses mostly on software debugging, but we intend to focus exclusively on software debugging techniques and facilities for z/OS and, even more specifically, on topics directly geared toward the development of commercial ISV software.

What we plan to do here is discuss tips, techniques and products for debugging software on IBM System z computers running the z/OS operating system.  But, let’s be honest, we all know what we really want to do is talk about our software debugger product named Trap Diagnostic Facility (TDF).  Perhaps some of the information presented will be generic enough to be applied when using other debugger products but we will, of course, be highlighting tips and techniques that can be used with TDF to debug interesting scenarios or difficult problems.   This should help existing users learn how to use the product more efficiently and hopefully the topics will be interesting enough to hold the attention of users of other debugging products for this platform.  

We definitely encourage the use of comments for each blog post.  We are open to discussion on any interesting topic, really.  If you want to discuss a post, please feel free to use the comment facility.  Comments will be approved before they show up on the web page, just so we can keep some of those Internet crazies at bay, but we expect the approval process to happen quickly any time a comment is submitted.  If you have a topic you would like to see us cover, please feel free to email suggestions to us at info@arneycomputer.com.  Please come back often to check for new discussions, or subscribe to our RSS Feed to receive the content automatically.

We have public pages available on both Facebook and LinkedIn.  Please Like or Follow either or both of those to keep up with our constant state of change.  If you just do Twitter, we are @ArneyComputerSy over there.

Thanks for playing along!