Thursday, May 8, 2014

Top and load average in Linux


It is common practice to check the output of the top command on Linux servers to see the resource utilization.

We usually look at the load average to judge the performance of the system.

The load average line in the top output looks like this:

load average: 2.39, 1.70, 1.81

The same values can be read from /proc/loadavg:
$ cat  /proc/loadavg
2.70 2.45 2.13 1/450 6959

What are these three numbers? How do we analyze them? What values are normal?

The three numbers are the load averages over the last 1, 5 and 15 minutes. In /proc/loadavg, the fourth field shows the currently runnable scheduling entities over the total, and the fifth field is the most recently created PID.

The load average is the average number of processes that are using or waiting to use the CPU. On Linux it also includes processes in uninterruptible sleep, which are typically waiting on disk I/O.

At a basic level, one CPU core at full utilization corresponds to a system load average of 1.

For a quad-core (4 core) machine, a system load average of 4 means that the machine has just enough resources to handle the work it needs to do.

On the same quad-core system, a load average of 8 means that the server is overloaded: it would have needed eight cores instead of four to handle the work.

So when you check the load average, you also need to check the number of CPU cores on the server.

Use the command below to count the processors that Linux sees on the server (hyper-threads are counted as separate processors).

$ grep -c '^processor' /proc/cpuinfo
8

So the rule of thumb is that the load average should stay below the number of CPU cores. For a 4-core server, the load average should stay below 4.
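The rule of thumb above can be sketched as a small shell helper. This is only an illustration: the function names load_per_core and load_status are made up for this example, and the "ok"/"overloaded" labels are an assumption based on the rule described in this post.

```shell
# load_per_core LOAD CORES
# Prints the load average divided by the number of cores, with two decimals.
load_per_core() {
    awk -v load="$1" -v cores="$2" 'BEGIN { printf "%.2f\n", load / cores }'
}

# load_status LOAD CORES
# Prints "overloaded" when the load is higher than the number of cores,
# otherwise "ok" (hypothetical labels, chosen for this sketch).
load_status() {
    awk -v load="$1" -v cores="$2" 'BEGIN {
        if (load / cores > 1) print "overloaded"; else print "ok"
    }'
}
```

On a live server the inputs could be taken from the files shown earlier, for example load_status "$(cut -d' ' -f1 /proc/loadavg)" "$(grep -c '^processor' /proc/cpuinfo)".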


If the system shows a high load average but the CPU system and user utilization is low, it is time to start looking at IO wait.

IO wait shows up in the system load on Linux because processes in uninterruptible sleep are counted in the load average. These processes are not using the CPU; they are waiting for input or output to finish, usually on disk or on a network file system such as NFS, before they can continue.
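As a rough first check, the share of CPU time spent in IO wait can be computed from the aggregate "cpu" line of /proc/stat. The sketch below assumes the usual field order on Linux (user, nice, system, idle, iowait, irq, softirq, steal); the function name iowait_percent is made up for this example. The counters are cumulative since boot, so the result is a long-term average and not the current value.

```shell
# iowait_percent "CPU_LINE"
# CPU_LINE is the first line of /proc/stat, e.g. "cpu  100 0 100 700 100 0 0 0 0 0".
# Field 6 of that line is iowait; fields 2 to 9 add up to the total CPU time.
iowait_percent() {
    echo "$1" | awk '{
        total = 0
        for (i = 2; i <= 9 && i <= NF; i++) total += $i
        if (total > 0) printf "%.1f\n", 100 * $6 / total
    }'
}
```

On a live server it could be called as iowait_percent "$(grep '^cpu ' /proc/stat)".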

I will post more on IO wait and on finding IO issues in the next post.

class: ELFCLASS64 at /usr/lib64/perl5/XSLoader.pm line 70.



In R12, when we try to start the forms server, the error below is encountered.

=============================================
*** Latest formsapp.ear has been deployed ***
=============================================


Program : /apps/EBSPROD/apps/apps_st/appl/fnd/12.0.0/patch/115/bin/txkChkFormsDeployment.pl completed @ Thu May  8 10:43:26 2014

Perl script txkChkFormsDeployment.pl got executed successfully



adformsctl.sh: exiting with status 0

adformsctl.sh: check the logfile /apps/EBSPROD/inst/apps/EBSPROD_dxbhoebzapp2/logs/appl/admin/log/adformsctl.txt for more information ...


.end std out.
Can't load '/usr/lib64/perl5/auto/Sys/Hostname/Hostname.so' for module Sys::Hostname: /usr/lib64/perl5/auto/Sys/Hostname/Hostname.so: wrong ELF class: ELFCLASS64 at /usr/lib64/perl5/XSLoader.pm line 70.
 at /usr/lib64/perl5/Sys/Hostname.pm line 23
*** ALL THE FOLLOWING FILES ARE REQUIRED FOR RESOLVING RUNTIME ERRORS
*** Log File = /apps/EBSPROD/inst/apps/EBSPROD_dxbhoebzapp2/logs/appl/rgf/TXK/txkChkFormsDeployment_Thu_May_8_10_43_25_2014/txkChkFormsDeployment_Thu_May_8_10_43_25_2014.log


FIX
===

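Check the two variables below in the CONTEXT_FILE:
PERL5LIB
ADPERLPRG

The "wrong ELF class: ELFCLASS64" message means that a 32-bit Perl tried to load a 64-bit module from the operating system Perl directories. Both variables should point to the Perl shipped with the applications tech stack (10.1.3), not to the operating system Perl.

Modify them in the CONTEXT_FILE as below.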

From 

  <ADPERLPRG oa_var="s_adperlprg" osd="unix">/usr/bin/perl</ADPERLPRG>

To 
         <ADPERLPRG oa_var="s_adperlprg" osd="unix">/apps/PROD/apps/tech_st/10.1.3/perl/bin/perl</ADPERLPRG>

From 

        <PERL5LIB oa_var="s_perl5lib" osd="LINUX_X86-64">/usr/local/lib64/perl5:/usr/local/share/perl5:/usr/lib64/perl5/vendor_perl:/usr/share/perl5/vendor_perl:/usr/lib64/perl5:/usr/share/perl5:.:/apps/PROD/apps/apps_st/appl/au/12.0.0/perl:/apps/PROD/apps/apps_st/appl/au/12.0.0/perl</PERL5LIB>

To 

         <PERL5LIB oa_var="s_perl5lib" osd="LINUX_X86-64">/apps/PROD/apps/tech_st/10.1.3/perl/lib/5.8.3:/apps/PROD/apps/tech_st/10.1.3/perl/lib/site_perl/5.8.3:/apps/PROD/apps/apps_st/appl/au/12.0.0/perl:/apps/PROD/apps/tech_st/10.1.3/Apache/Apache/mod_perl/lib/site_perl/5.8.3/i686-linux-thread-multi</PERL5LIB>
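Before editing, it can help to print the current values from the context file. The helper below is a minimal sketch: the function name context_value is made up for this example, and it assumes each variable sits on a single line in the form shown above.

```shell
# context_value CONTEXT_FILE OA_VAR
# Prints the text between the opening and closing tag of the element
# whose oa_var attribute equals OA_VAR (for example s_adperlprg).
context_value() {
    sed -n "s/.*oa_var=\"$2\"[^>]*>\([^<]*\)<.*/\1/p" "$1"
}
```

For example, context_value "$CONTEXT_FILE" s_adperlprg and context_value "$CONTEXT_FILE" s_perl5lib show which Perl binary and library path are currently configured.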


Thursday, April 17, 2014

How can we control a concurrent program to Run In Specific RAC/PCP node?



R12 has an interesting feature to achieve this.


In the concurrent program definition window, we can define the node and the instance on which a concurrent program should connect and run. This is done through concurrent program session control.

Navigation: System Administrator Responsibility > Concurrent > Program > Define > Click 'Session Control' 

On this screen you can define the Target Node and Target Instance for that particular concurrent program.

Irrespective of which nodes the managers are running on and which RAC instance they are connected to, this definition makes sure the program runs on the specified RAC/PCP node.

What is Target Node 
------------------------

The target node is the node on which requests for this program will run. When requests for this program are submitted, they run on this node if it is available.

If no specification is made for the target node of a concurrent program, a request for it will be picked up by any manager available to run it. 

If a node specification is made for a concurrent program and the node is up, only available managers running on the specified node will pick up the request. 

What is Target Instance 
----------------------------

The target instance is the database instance in which requests for this program will run. When requests for this program are submitted, they run in this database instance if it is available.

If no specification is made for the target instance of a concurrent program, a request for it will be picked up by the first manager available to run it and will be run in the instance where the manager is already connected.

If an instance specification is made for a concurrent program and the instance is up, it will be picked up by the first manager available to run it and the manager will run the request in the specified instance. 

However, if the target RAC instance is down, the manager will run the request in the instance where it is already connected and log an appropriate message.
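The target instance rules above can be summarized as a small decision function. This is only a model of the behavior described in this post, not Oracle code; the function name choose_instance and the instance names PROD1 and PROD2 are hypothetical.

```shell
# choose_instance TARGET TARGET_UP MANAGER_INSTANCE
# TARGET           target instance of the program, empty if none is defined
# TARGET_UP        "yes" if the target instance is up
# MANAGER_INSTANCE instance the manager is already connected to
choose_instance() {
    target=$1
    target_up=$2
    manager_instance=$3
    if [ -z "$target" ]; then
        # No target instance defined: run where the manager is connected.
        echo "$manager_instance"
    elif [ "$target_up" = "yes" ]; then
        # Target instance defined and up: run in the target instance.
        echo "$target"
    else
        # Target instance is down: fall back to the manager's instance.
        echo "$manager_instance"
    fi
}
```

For example, choose_instance PROD2 yes PROD1 prints PROD2, while choose_instance PROD2 no PROD1 prints PROD1.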

Wednesday, April 16, 2014

All About Oracle Parallel Concurrent Processing (PCP)




1) What is PCP

   - Parallel Concurrent Processing (PCP) is an extension of the Concurrent Processing architecture. 

   - PCP allows concurrent processing activities to be distributed across multiple nodes, maximizing throughput and providing resilience to node failure.

2) How to Configure Parallel Concurrent Processing (PCP)

  Below are the steps to configure PCP in Oracle Applications.

  A) Set Up PCP
  
  - Edit the applications context file via Oracle Applications Manager, and set the value of the variable APPLDCP to ON.

  - Execute AutoConfig by running the following command on all concurrent processing nodes:

      $ $INST_TOP/admin/scripts/adautocfg.sh

  - Source the Applications environment.

  - Check the tnsnames.ora and listener.ora configuration files, located in $INST_TOP/ora/10.1.2/network/admin. Ensure that the required FNDSM and FNDFS entries are present for all other concurrent nodes.

  - Restart the Applications listener processes on each application tier node. 

  - Log on to Oracle E-Business Suite Release 12 using the SYSADMIN account, and choose the System Administrator Responsibility. Navigate to Install > Nodes screen, and ensure that each node in the cluster is registered.

  - Verify that the Internal Monitor for each node is defined properly, with correct primary node specification, and work shift details. For example, Internal Monitor: Host1 must have primary node as host1. Also ensure that the Internal Monitor manager is activated: this can be done from Concurrent > Manager > Administrator. 

  - Set the $APPLCSF environment variable on all the Concurrent Processing nodes to point to a log directory on a shared file system.

  - Set the $APPLPTMP environment variable on all the CP nodes to the value of the UTL_FILE_DIR entry in init.ora on the database nodes. (This value should be pointing to a directory on a shared file system.)

  - Set profile option 'Concurrent: PCP Instance Check' to OFF if database instance-sensitive failover is not required (In case of Non RAC Database). By setting it to 'ON', a concurrent manager will fail over to a secondary Application tier node if the database instance to which it is connected becomes unavailable for some reason.
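The tnsnames.ora check in the steps above can be scripted roughly as follows. This sketch assumes that the aliases start with FNDSM_<node> and FNDFS_<node>; verify the exact alias format in your own tnsnames.ora. The function name missing_fnd_entries and the node names host1 and host2 are made up for this example.

```shell
# missing_fnd_entries TNS_FILE NODE...
# Prints one line for every FNDSM_<node> or FNDFS_<node> alias prefix
# that is not found at the start of a line in TNS_FILE (case-insensitive).
missing_fnd_entries() {
    tns_file=$1
    shift
    for node in "$@"; do
        for prefix in FNDSM FNDFS; do
            if ! grep -qi "^[[:space:]]*${prefix}_${node}" "$tns_file"; then
                echo "${prefix}_${node}"
            fi
        done
    done
}
```

For example, missing_fnd_entries "$INST_TOP/ora/10.1.2/network/admin/tnsnames.ora" host1 host2 prints nothing when all the entries are present. The match is by prefix only, so host1 also matches an alias for host10.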


  B) Set Up Transaction Managers  (Only R12)
  
  If you are already using transaction managers and want them to fail over, perform the steps below.

  - Shut down the application services (servers) on all nodes

  - Shut down all the database instances cleanly in the Oracle RAC environment, using the command:

      SQL> shutdown immediate;
  
  - Edit the $ORACLE_HOME/dbs/<context_name>_ifile.ora and add the following parameters:
        _lm_global_posts=TRUE
        _immediate_commit_propagation=TRUE

  - Start the instances on all database nodes.

  - Start up the application services (servers) on all nodes.

  - Log on to Oracle E-Business Suite Release 12 using the SYSADMIN account, and choose the System Administrator responsibility. Navigate to Profile > System, change the profile option 'Concurrent: TM Transport Type' to 'QUEUE', and verify that the transaction manager works across the Oracle RAC instance.

  - Navigate to Concurrent > Manager > Define screen, and set up the primary and secondary node names for transaction managers.

  - Restart the concurrent managers.

  - If any of the transaction managers are in a deactivated status, activate them from Concurrent > Manager > Administrator.


 C) Set Up Load Balancing on Concurrent Processing Nodes (Only Applicable in case of RAC)

  If you want PCP to use the load balancing capability of RAC, perform the steps below. Connections will be load balanced using the SID_BALANCE value and will connect to all the RAC nodes.
  

  - Edit the applications context file through the Oracle Applications Manager interface, and set the value of Concurrent Manager TWO_TASK (s_cp_twotask) to the load balancing alias (<service_name>_balance).

  - Execute AutoConfig by running $INST_TOP/admin/scripts/adautocfg.sh on all concurrent nodes.


3) Is RAC Mandatory to Implement PCP?

  - No, RAC is not mandatory for PCP. If you have two or more application nodes, you can enable PCP. However, PCP works better in conjunction with RAC to handle all the failover scenarios.


4) How does PCP work with RAC?

 - In a RAC-enabled environment, PCP uses the cp_two_task environment variable to connect to a DB RAC node. It can be set so that one CM node connects to one RAC node, or so that it connects to all the RAC nodes in the cluster.

5) What happens when one of the RAC nodes goes down when PCP is enabled?

 - When Concurrent: PCP Instance Check is set to ON and the cp_two_task value is set to the SID (i.e. one CM node always connects to only one RAC node), if one DB node goes down, PCP identifies the DB failure and shifts all the concurrent managers to the other application node where the database is available.


6) What happens when one of the PCP nodes goes down?

 - IMON identifies the failure and, through FNDSM (Service Manager), initiates the ICM to start on the surviving node (if the ICM was running on the failed node). The ICM will then start all the managers.

7) What are primary and secondary nodes in PCP?

 - Primary and secondary nodes must be defined to distribute load across the servers. If they are not defined, all the managers will start by default on the node where the ICM is running.

8) How does failback happen in PCP?

 - Once the failed node comes back online, IMON detects it and the ICM fails back all the managers defined on that node.

9) What happens to requests running during failover in PCP?

 - It is important to note that RAC and PCP do not support failover of DML that is in progress, and that TAF and FAN are not supported with E-Business Suite.
 - When a request is running and its concurrent manager goes down, the request keeps the status Running/Normal but no longer has an associated process ID. When the ICM starts on the other node, it checks all the Running/Normal requests and verifies their OS process IDs. If it does not find the process ID, it resubmits the request.

 - This behavior is the same even in a non-PCP environment.

 - The Internal Concurrent Manager (ICM) will only restart a request if the following conditions are met:

     1. The ICM got the manager's database lock for the manager that was running the request.
     2. The phase of the request is "running" (phase_code = 'R').
     3. The program for this request is set to "restart on failure".
     4. All of the above requirements have been met AND at least one of the following is true:
         a.  The ICM is just starting up (i.e. it has just spawned on a given node and is going through its initial code before the main loop).
         b.  The node of the concurrent manager for which we got the lock is down.
         c.  The database instance (TWO_TASK) defined for the node of that concurrent manager is down (this is not applicable if one is using some "balance" @ TWO_TASK on that node).
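The restart conditions listed above can be written down as a small decision function. This is only a model of the rules in this post, not the ICM's real code; the function name icm_restart_decision and the yes/no arguments are made up for this example.

```shell
# icm_restart_decision GOT_LOCK PHASE_CODE RESTART_ON_FAILURE ICM_STARTING NODE_DOWN INSTANCE_DOWN
# Every argument except PHASE_CODE is "yes" or "no".
# Prints "restart" when all three base conditions hold and at least one
# of the three trigger conditions holds, otherwise prints "skip".
icm_restart_decision() {
    got_lock=$1
    phase_code=$2
    restart_on_failure=$3
    icm_starting=$4
    node_down=$5
    instance_down=$6
    if [ "$got_lock" = yes ] && [ "$phase_code" = R ] && [ "$restart_on_failure" = yes ]; then
        if [ "$icm_starting" = yes ] || [ "$node_down" = yes ] || [ "$instance_down" = yes ]; then
            echo restart
            return 0
        fi
    fi
    echo skip
}
```

For example, icm_restart_decision yes R yes no yes no prints restart, because the manager's node is down and all three base conditions hold.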



10) How does PCP identify that a node has gone down?

  - There are two types of failures that PCP recognizes.

a.) Is the node pingable?
Issues an operating system ping on the machine name; the result is either timeout or available.

b.) Is the database available?
Queries V$THREAD and V$INSTANCE for a status of open or closed.



 - When either of the two failures above occurs, the managers fail over and fail back as illustrated in the following example.

Primary node = HOST1 - Managers assigned to primary node are ICM (FNDLIBR-cpmgr) , FNDCRM
Secondary node = HOST2 - Manager assigned to secondary node is Standard Manager (FNDLIBR)

When HOST1 becomes unavailable, both ICM and FNDCRM are migrated over to HOST2.
This is viewable from Administer Concurrent Manager form in System Administrator Responsibility.
The $APPLCSF/log/.mgr logfile will also reflect that HOST1 is being added to unavailable list.

On HOST2, after a pmon cycle, FNDICM, FNDCRM, and FNDLIBR are now migrated and running.
(Note: FNDIMON and FNDSM run independently on each concurrent processing node. FNDSM
is not a persistent process, and FNDIMON is a persistent process local to each node)

Once HOST1 becomes available, FNDICM and FNDCRM are migrated back to the original primary 
node for successful failback.



In summary, in a successful failover and failback scenario, all managers should fail over to their secondary node, and once the node or instance becomes available again, all managers should fail back to the primary node.




Sunday, January 5, 2014

Changing absolute path for tar command in Solaris OS



When we create an archive with the tar command on Linux, the leading / is stripped from the file names. So when we restore from the tar file, the files are extracted under the directory we choose, without overwriting the originals.

On Solaris, the tar command stores the absolute path. When we restore, the files go back to the same absolute path as the source.

This is sometimes very dangerous, because it can overwrite the source files, and to restore in another location you need the same file system layout as the source.

To avoid this, use the steps below.

$ tar tvf backup_apps.tar

-rwxr-xr-x 205/100  34260 Jan 18 11:14 2006 /tstdp1/prg/oracle/R12/apps/tech_st/10.1.3/jdk/demo/plugin/jfc/Notepad/Notepad.jar
-rwxr-xr-x 205/100    339 Jan 18 11:14 2006 /tstdp1/prg/oracle/R12/apps/tech_st/10.1.3/jdk/demo/plugin/jfc/Notepad/README.txt
drwxrwx--- 205/100      0 Jan  2 21:17 2007 /tstdp1/prg/oracle/R12/apps/tech_st/10.1.3/jdk/demo/plugin/jfc/TableExample/
-rwxr-xr-x 205/100  22353 Jan 18 11:14 2006 /tstdp1/prg/oracle/R12/apps/tech_st/10.1.3/jdk/demo/plugin/jfc/TableExample/TableExample.jar
drwxrwx--- 205/100      0 Jan  2 21:17 2007 /tstdp1/prg/oracle/R12/apps/tech_st/10.1.3/jdk/demo/plugin/jfc/TableExample/src/
-rwxr-xr-x 205/100   8613 Jan 18 11:14 2006 /tstdp1/prg/oracle/R12/apps/tech_st/10.1.3/jdk/demo/plugin/jfc/TableExample/src/TableExample.java
-rwxr-xr-x 205/100   8790 Jan 18 11:14 2006 /tstdp1/prg/oracle/R12/apps/tech_st/10.1.3/jdk/demo/plugin/jfc/TableExample/src/JDBCAdapter.java
-rwxr-xr-x 205/100   3293 Jan 18 11:14 2006 /tstdp1/prg/oracle/R12/apps/tech_st/10.1.3/jdk/demo/plugin/jfc/TableExample/src/TableExample2.java
-rwxr-xr-x 205/100   5708 Jan 18 11:14 2006 /tstdp1/prg/oracle/R12/apps/tech_st/10.1.3/jdk/demo/plugin/jfc/TableExample/src/TableExample3.java
-rwxr-xr-x 205/100   9339 Jan 18 11:14 2006 /tstdp1/prg/oracle/R12/apps/tech_st/10.1.3/jdk/demo/plugin/jfc/TableExample/src/OldJTable.java


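If you extract using the tar -xvf backup_apps.tar command, it will restore to the original directory structure if it is available, or error out if it is not.


The workaround for this problem is to use the pax command to extract the files on Solaris instead of tar -xvf. The -r option reads the archive, and the -s option rewrites each file name with a substitution expression (here with = as the delimiter).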


pax -r -s '=^/tstdp1/prg/oracle/R12=/usr/R12=' < backup_apps.tar


Now the files are extracted to the /usr/R12 folder.

$ cd /usr/R12

$ ls -ltr
drwxr-xr-x   4 oracle   oinstall       4 Jan  5 15:01 apps
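
The problem can also be avoided when the archive is created, by changing to the parent directory and archiving a relative path, so that no absolute path is stored in the tar file. The helper below is a minimal sketch; the function name make_relative_tar is made up for this example, and the archive name must be given as an absolute path because the function changes directory before calling tar.

```shell
# make_relative_tar ARCHIVE PARENT_DIR NAME
# Creates ARCHIVE containing PARENT_DIR/NAME, stored as the relative path NAME.
# The cd runs in a subshell, so the caller's working directory is unchanged.
make_relative_tar() {
    archive=$1
    parent=$2
    name=$3
    ( cd "$parent" && tar cf "$archive" "$name" )
}
```

For the archive in this post it could be called as make_relative_tar /backup/backup_apps.tar /tstdp1/prg/oracle R12 (the /backup directory is only an example). tar tvf would then list paths starting with R12/ and not with /tstdp1.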