Tuesday, June 7, 2016

Never,ever use LDAP for the RAC install user and sysdba user

Oracle's recommandation is to have a local dba and oinstall user group.
If there are cluster admins and DBAs, then using a cluster user for the cluster installation is best.
Otherwise, it's possible to use one unique user (generally oracle) and one group (generally DBA) for the cluster and database installation.

In any case, always make sure that the groups and users are local users and groups.
check /etc/groups, the groups should be there.

Failing this, it may be possible to install the RAC cluster and the database the first time, but when the time comes to patch the system, and the cluster is restarted, the ASM user may not be able to authenticated and the patch will fail.

I have found myself in this very real situation in pre-prod environment because our SAs did not follow our recommendations and ended up in a 13 hour marathon with a complete re-install of the clusterware.

The good news is, there is no data loss. The ASM disks were still available, and found when the cluster was back online. We had to add the diskgroups back and it was quite a bit of work.

Just use a local user and groups.

How to read an AWR report - How to generate an AWR report with sqlplus?

Generate AWR report with sqlplus:

Connect to sqlplus and run the following command:
sqlplus  @$ORACLE_HOME/rdbms/admin/awrrpt.sql

A few information will be displayed regarding the system.

AWR reports generation works with snapshot ids, so to know that information, we are prompted for a number of days to display them. If there was a problem yesterday, mention 2 days for more snap id:

Specify the number of days of snapshots to choose from
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Entering the number of days (n) will result in the most recent
(n) days of snapshots being listed.  Pressing  without
specifying a number lists all completed snapshots.


Enter value for num_days: 1

Listing the last day's Completed Snapshots

                                                        Snap
Instance     DB Name        Snap Id    Snap Started    Level
------------ ------------ --------- ------------------ -----
dw02         DW02             29150 03 Jun 2016 00:00      1
                              29151 03 Jun 2016 00:30      1
                              29152 03 Jun 2016 01:00      1
                              29153 03 Jun 2016 01:30      1
                              29154 03 Jun 2016 02:00      1
                              29155 03 Jun 2016 02:30      1
                              29156 03 Jun 2016 03:00      1
                              29157 03 Jun 2016 03:30      1
                              29158 03 Jun 2016 04:00      1
                              29159 03 Jun 2016 04:30      1
                              29160 03 Jun 2016 05:00      1
                              29161 03 Jun 2016 05:30      1
                              29162 03 Jun 2016 06:00      1
                              29163 03 Jun 2016 06:30      1
                              29164 03 Jun 2016 07:00      1
                              29165 03 Jun 2016 07:30      1
                              29166 03 Jun 2016 08:00      1
                              29167 03 Jun 2016 08:30      1
                              29168 03 Jun 2016 09:00      1

Specify the Begin and End Snapshot Ids
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Enter value for begin_snap: 29167
Begin Snapshot Id specified: 29167

Enter value for end_snap: 29168
End   Snapshot Id specified: 29168

Finally, the script prompts for the report name. Do mention a full path, otherwise it will get generated in the current working directory which will be $ORACLE_HOME/rdbms/admin. Additionally, snapids become quickly meaningless when looking if there's a need to search for that report again a few days later. I prefer a format with a full date start time and end time.

Specify the Report Name
~~~~~~~~~~~~~~~~~~~~~~~
The default report file name is awrrpt_1_29167_29168.html.  To use this name,
press  to continue, otherwise enter an alternative.

Enter value for report_name: /home/oracle/awrrpt_20160603_0830_0900.html

Following this, the report is generated and we can look at it.

Monday, June 6, 2016

How to read an AWR report

I want to show case how to read an AWR report with real world database issues.
The AWR I will display come from a production environment which I can display because there is no secret information, with the exception of the host which I will blur and SQLs which I will not show.

Because an AWR report is really long and complete, I will break it down into different posts showcasing:

  • How to generate an AWR report?
    • Using sqlplus
    • Using OEM
  • A first glance at AWR report - What to look for?
  • Identify bottlenecks
  • Memory issues
  • IO issues
  • CPU issues
All problems have many facets to them. For instance, if a server appears to be suffering from too much IO, is it because the SAN storage has reached its capacity, or is it because a new workload has caused significant increase in IO? Or is it because the system is CPU bound?


I will show how to use an AWR report to identify these issues.

Wednesday, June 1, 2016

How to pin a plan to a sql_id with dbms_spm

Important: This package is not RAC aware, so running LOAD_PLANS_FROM_CURSOR_CACHE on the wrong node means it won't load. use gv$sql, identify on which node it's running by checking column inst_id and logon to that specific node to complete the procedure below.

This procedure is very useful, because it allows the association of a better plan, maybe obtained using a hint, typically after migration from a version to another, say using alter session optimizer_features_enable='11.2.0.4' to a SQL which is not performing as well as it was before in the new version.

Identify the sql_id and the bad plan

Every SQL commands are assigned a sql_id, this sql_id using a hash function based on the body of the SQL text. This includes formatting and comments.
So a SQL can perform the exact same task, a developer may call it the "same" query, but if it's written with an extra carriage return, it has a different sql_id.

So make sure that the sql_id you are looking for is the one coming from the application.

These views will help with the task:
  • v$sql and gv$sql in RAC.
  • dba_hist_sqltext - If the query can't be found in v$sql
  • dba_hist_sqlstat to have some history. Here's a query using dba_hist_sqlstat

Load the bad plan

declare
l_pls PLS_INTEGER;
begin
l_pls :=
DBMS_SPM.LOAD_PLANS_FROM_CURSOR_CACHE (  sql_id=>'', plan_hash_value =>,fixed =>'NO',enabled=>'NO');
dbms_output.put_line(l_pls);
end;
/
Once loaded, obtain the sql_handle from dba_sql_plan_baselines.
select *
from dba_sql_plan_baselines
order by created desc;

Now associate the good plan (plan_hash_value) with the sql_id/sql_handle for the sql_id obtained above. 

The good plan generally comes from a different source, possibly a developer working on that SQL with hints or alter session statements or simply the plan changed for worse and a better plan exists in the AWR views. For example running a query with explain plan.




declare
l_pls PLS_INTEGER;
begin
l_pls :=
dbms_spm.load_plans_from_cursor_cache(  sql_id=>'',plan_hash_value=> 1509555798, sql_handle =>'');
end;
/