Tuesday, 8 January 2013

Oracle Questions-Part-2[Analytical functions]

Analytical Functions

Q. Some basic analytical functions for interview preparation.

I am trying to describe the use of analytical function in a simple way, here with some example, Few analytical functions are as follows:   
  • Rank
  • Dense_rank
  • Lead and Lag
  • First_Value
  • Listagg
  • Cube
  • Rollup
  • Connect by Prior
Pre-requisit 
  • What is over() clause do? - OVER() is a mandatory key word,it defines the scope of that function, here  'OVER()'  means ALL columns in other hand   'OVER(partition by deptno)'  means it will look into specifically deptno and partition by clause will do group by on deptno column. I will give some example below ,which will clear your doubt about this.
  • What does PARTITION BY clause denote? - As mentioned earlier the PARTITION BY clause is similar to GROUP By clause, it specifies the window of rows that the analytic function should operate on.

Whatever an analytic function does can be done by basic SQL, with help of various joins and sub-queries. But we use oracle analytical function to replace basic SQL for faster performance. Some of us used to avoid analytical functions due to lack of knowledge on this. I am trying to give a platform here to help with this features.


Example:
Simple query 1:

select
       department_id,
       count(*)
from hr.employees
where department_id in (90,60)
group by department_id;

department_id   count(*)
--------------    --------
60 5
90 3

Rewrite query 2:
SELECT  
       EMPLOYEE_ID, 
       department_id,
       COUNT(*) OVER (PARTITION BY department_id ) DEPT_COUNT 
FROM hr.employees 
WHERE department_id  IN (20, 30);













Here in this two query one can write his/her query in any of this two method.If you observe Query1 groups record based on department id and whatever you want to group by it is mandatory to put those columns in select clause as well as group by clause same number of columns.Non-group by columns are notallowed in select statement.

Where in analytical function oracle given us the freedom to use any columns based on functionality it does not depend on any type of clause like GROUP BY. Though analytic functions give aggregate result they do not group the result set. They return the group value multiple times with each record as shown in query2.

In absence of any partition or window_clause inside the OVER( ) portion, the function acts on entire record set returned by the where clause.

Analytic functions are computed after all joins, WHERE clause, GROUP BY and HAVING are computed on the query. The main ORDER BY clause of the query operates after the analytic functions. So analytic functions can only appear in the select list and in the main ORDER BY clause of the query.

So we will see here how OVER() works in below query except any partition clause.

SELECT * 
from hr.employees
WHERE department_id IN (90, 60);

count(*)
-----------
8

Note: It will show you 8 records in result output means has total 8 records for those two department.

SELECT   EMPLOYEE_ID, department_id,  
                 COUNT(*) OVER () DEPT_COUNT 
FROM hr.employees 
WHERE department_id  IN (90, 60);

EMPNO   DEPTNO  DEPT_COUNT 
----------  ----------   ----------
100 90 8
101 90 8
102 90 8
103 60 8
104 60 8
105 60 8
106 60 8
107 60 8

Note: So from the above output we saw that it doesn't group based on department_id  IN (90, 60) as we omitted PARTITION clause here.PARTITION clause is important to group your result based on values.So PARTITION BY is used to break the result set into groups.
----------------------------------------------------------------------------------------------------------------------------------


  • Rank
Q.One very common question in interview is find out nth highest salary department wise.

SELECT
       employee_id,
       department_id,
       salary
from hr.employees
WHERE department_id IN (90, 60);














From this output for Dept 60 it has 5 rows and for Dept 90 it has 3 rows. Form this we are going to show you NTH highest salary query.

select
       employee_id,
       department_id,
       salary,
       rank() over(order by salary desc) As RNK
from hr.employees
WHERE department_id IN (90, 60);


employee_id   department_id   salary    RNK
------------    ------------     -------    --------------
100             90             24000                 1
102             90             17000                 2
101             90             17000                 2
103             60             9000                   4
104             60             6000                   5
105             60             4800                   6
106             60             4800                   6
107             60             4200                   8


In this query we can idetify rank from RNK column and one can easily write down NTH highest query from this.

Below query will show highest salary from each department.
SELECT 
       sal_ord.employee_id Eid,
       sal_ord.department_id DeptCd,
       sal_ord.salary Sal,
       sal_ord.Rnk Rnk
FROM (
              SELECT employee_id,
              department_id,
              salary,
              RANK () OVER (PARTITION BY department_id ORDER BY salary DESC) AS Rnk
FROM hr.employees
WHERE department_id IN (90, 60)
) sal_ord
WHERE sal_ord.Rnk = 1;

Eid    DeptCd     Sal     Rnk
----    ---------   ----   -----
103    60        9000      1
100    90      24000      1
-----------------------------------------------------------------------------------------------------------
  • Dense_rank
Q. Difference between RANK()   and DENSE_RANK().

As per below query we fired the same for RANK() as well.

Rank() example 

select
       employee_id,
       department_id,
       salary,
rank() over(order by salary desc) As RNK
from hr.employees
WHERE department_id IN (90, 60);

employee_id   department_id   salary    RNK
------------    ------------     -------    --------------
100             90             24000                 1
102             90             17000                 2
101             90             17000                 2
103             60             9000                   4
104             60             6000                   5
105             60             4800                   6
106             60             4800                   6
107             60             4200                   8

Dense_rank() example 
select
       employee_id,
       department_id,
       salary,
       dense_rank() over(order by salary desc) As RNK
from hr.employees
WHERE department_id IN (90, 60);

employee_id   department_id   salary    RNK
------------    ------------     -------    --------------
100             90             24000                 1
102             90             17000                 2
101             90             17000                 2
103             60             9000                   3
104             60             6000                   4
105             60             4800                   5
106             60             4800                   5
107             60             4200                   6

Please check highlighted RNK column, one can easily identify the difference.


select
       employee_id,
       department_id,
       salary,
       rank() over(order by salary desc) As RNK,
       Dense_rank() over(order by salary desc) As DNS_RNK
from hr.employees
WHERE department_id IN (90, 60);


employee_id      department_id    salary     RNK      DNS_RNK
------------               ------------          -------      --------      ------------


100            90                        24000            1             1
102            90                        17000            2             2
101            90                        17000            2             2
103            60                        9000              4             3
104            60                        6000              5             4
105            60                        4800              6             5
106            60                        4800              6             5
107            60                        4200              8             6

Note:   The DENSE_RANK function acts like the RANK function except that it assigns consecutive ranks

----------------------------------------------------------------------------------------------------------------------------------

  • All together Rank(),Dense_Rank(),Row_Number
select
             employee_id,
             department_id,
             salary,
             rank() over(
                              partition by department_id order by salary desc) As RNK,
             Dense_rank() over(
                              partition by department_id order by salary desc) As DNS_RNK,
             row_number() over(
                              partition by department_id order by salary desc ) RwNm
from hr.employees
WHERE department_id IN (90, 60);

Output













----------------------------------------------------------------------------------------------------------------------------------


  • Lead and Lag
LEAD has the ability to compute an expression on the next rows.Lag is same as LEAD except LAGS goes into previous row.


SELECT 
       department_id,
       employee_id,
       salary,
       LEAD (salary, 1, 0) OVER (PARTITION BY department_id ORDER BY salary DESC NULLS LAST) NEXT_LOWER_SAL,
       LAG (salary, 1, 0) OVER (PARTITION BY department_id ORDER BY salary DESC NULLS LAST) PREV_HIGHER_SAL
FROM hr.employees
WHERE department_id IN (90, 60)
ORDER BY department_id, salary DESC;





----------------------------------------------------------------------------------------------------------------------------------                                                                                                                

  • First_Value and LAST_VALUE

  1. The FIRST_VALUE analytic function picks the first record from the partition after doing the ORDER BY
  2. The LAST_VALUE function is used in similar context except that it acts on the last record of the partition,ie: immediate next value
SELECT 
       employee_id,
       department_id,
       FIRST_VALUE (hire_date) OVER (PARTITION BY department_id) First_hire,
       LAST_VALUE (hire_date) OVER (PARTITION BY department_id) last_hire
FROM hr.employees
WHERE department_id IN (20, 30)
ORDER BY department_id, First_hire, last_hire;
















----------------------------------------------------------------------------------------------------------------------------------                                                                                                              
  • Listagg
SELECT 
             department_id, 
             LISTAGG(FIRST_NAME,',') WITHIN GROUP (ORDER BY FIRST_NAME) 
FROM hr.employees
GROUP BY department_id;


DEPTNO               EMPLOYEES 
----------                  -------------------------------------------------- 
 10           CLARK,KING,MILLER 20 ADAMS,FORD,JONES,SCOTT,SMITH 
 30           ALLEN,BLAKE,JAMES,MARTIN,TURNER,WARD


Note: We can get the same output with some other function if those are available in your version of DB.
  • wm_concat()
SELECT 
            department_id, 
            wm_concat(FIRST_NAME) AS employees 
FROM hr.employees 
GROUP BY department_id;

DEPTNO         EMPLOYEES 
----------            -------------------------------------------------- 
 10           CLARK,KING,MILLER 20 ADAMS,FORD,JONES,SCOTT,SMITH 
 30           ALLEN,BLAKE,JAMES,MARTIN,TURNER,WARD
-----------------------------------------------------------------------------------------------------------------------------



Q. Difference between ROLLUP and CUBE?

A simple difference will show the difference. Exaple as follows:


  • Rollup


select
          department_id,
          job_id,
          sum(salary)
from hr.employees
where department_id in (10,20)
group by rollup(department_id,job_id)
order by department_id;


Output















  • Cube
Note: Here we will use CUBE insist of rollup to check output differences.



select
          department_id,
          job_id,
          sum(salary)
from hr.employees
where department_id in (10,20)
group by Cube(department_id,job_id)
order by department_id;


Output




















Here one can see extra layer of information are in output. which can easily identify the difference between this two.

----------------------------------------------------------------------------------------------------------------------------------

  • ROW_NUMBER



select  row_number() over(order by sal ),sal from 
(
        select sum(salary) as sal 
        from hr.employees 
        where department_id in (10,20)
        group by cube(department_id,job_id)
order by department_id
);

Output























----------------------------------------------------------------------------------------------------------------------------------

Q.What is windowing_clause?Functionality?
We have discussed about over() and partition clause and how does those helps a query to grouping the output and gave us a control over data output. And now window_clause is same as others, actually it gives another layer or further control over this window with current partition.
It is basically a extension of order_by_clause which we use within
over(partition by ...... order by ......) clause.
Instead of this example it will look like with window clause is
over(partition by ...... order by ......ROWS BETWEEN 0 PRECEDING AND 0 PRECEDING)
It will gives us some more filtration more flexible output.

The windowing_clause has two basic forms.

  • RANGE BETWEEN start_point AND end_point 
  • ROWS BETWEEN start_point AND end_point

Possible values for "start_point" and "end_point" are:



  • UNBOUNDED PRECEDING : Window starts at the first row of the partition.Available for start points.
  • UNBOUNDED FOLLOWING : Window ends at the last row of the partition. Available for end points.
  • CURRENT ROW : Window starts or ends at the current row.
  • value_expr PRECEDING : An physical or logical offset before the current row using a constant or expression that evaluates to a positive numerical value. When used with RANGE, it can also be an interval literal if the order_by_clause uses a DATE column.
  • value_expr FOLLOWING : As above, but an offset after the current row.

















To be continued
Next topics:
  • Connect by Prior









6 comments:

  1. really helpful..keep posting more

    ReplyDelete
  2. Thanks. Good explanation and useful.

    ReplyDelete
  3. This post is very intuitive and very well explained :) thanks

    ReplyDelete

Is this article is useful(Yes/No)