ORACLE分析函数解析-Oracle认证-计算机-珠峰网 www.my8848.com

ORACLE分析函数解析

发布时间：2010/12/30 15:15:15 来源：www.xue.net 编辑：城市总裁吧

　分析函数是oracle 8.1.6中就引入的一个全新的概念,为我们分析数据提供了一种简单高效的处理方式.在分析函数出现以前,我们必须使用自联查询,子查询或者内联视图,甚至复杂的存储过程实现的语句,现在只要一条简单的sql语句就可以实现了,而且在执行效率方面也有相当大的提高.

　　而ORACLE的常见分析函数有三类：

　　1. OVER函数

　　分析函数的语法为:

　　Java代码

　　FUNCTION_NAME(<argument>,<argument>...)

　　OVER

　　(<Partition-Clause><Order-by-Clause><Windowing Clause>)

　　例:

　　sum(sal) over (partition by deptno order by ename) new_alias

　　sum就是函数名

　　(sal)是分析函数的参数,每个函数有0~3个参数,参数可以是表达式,例如:sum(sal+comm)

　　over 是一个关键字,用于标识分析函数,否则查询分析器不能区别sum()聚集函数和sum()分析函数

　　partition by deptno 是可选的分区子句,如果不存在任何分区子句,则全部的结果集可看作一个单一的大区

　　order by ename 是可选的order by 子句,有些函数需要它,有些则不需要.依靠已排序数据的那些函数,如:用于访问结果集中前一行和后一行的LAG和LEAD,必须使用,其它函数,如AVG,则不需要.在使用了任何排序的开窗函数时,该子句是强制性的,它指定了在计算分析函数时一组内的数据是如何排序的.

　　eg:按区域查找上一年度订单总额占区域订单总额20%以上的客户，来看看分析函数的应用。

　　【1】测试环境：

　　Java代码

　　SQL> desc orders_tmp;

　　Name Null? Type

　　----------------------- -------- ----------------

　　CUST_NBR NOT NULL NUMBER(5)

　　REGION_ID NOT NULL NUMBER(5)

　　SALESPERSON_ID NOT NULL NUMBER(5)

　　YEAR NOT NULL NUMBER(4)

　　MONTH NOT NULL NUMBER(2)

　　TOT_ORDERS NOT NULL NUMBER(7)

　　TOT_SALES NOT NULL NUMBER(11,2)

　　【2】测试数据：

　　Java代码

　　SQL> select * from orders_tmp;

　　CUST_NBR REGION_ID SALESPERSON_ID YEAR MONTH TOT_ORDERS TOT_SALES

　　---------- ---------- -------------- ---------- ---------- ---------- ----------

　　11 7 11 2001 7 2 12204

　　4 5 4 2001 10 2 37802

　　7 6 7 2001 2 3 3750

　　10 6 8 2001 1 2 21691

　　10 6 7 2001 2 3 42624

　　15 7 12 2000 5 6 24

　　12 7 9 2000 6 2 50658

　　1 5 2 2000 3 2 44494

　　1 5 1 2000 9 2 74864

　　2 5 4 2000 3 2 35060

　　2 5 4 2000 4 4 6454

　　2 5 1 2000 10 4 35580

　　4 5 4 2000 12 2 39190

　　13 rows selected.

[NextPage]

　【3】测试语句：

　　Java代码

　　SQL> select o.cust_nbr customer,

　　2 o.region_id region,

　　3 sum(o.tot_sales) cust_sales,

　　4 [color=red]sum(sum(o.tot_sales)) over(partition by o.region_id) region_sales

　　[/color] 5 from orders_tmp o

　　6 where o.year = 2001

　　7 [color=red]group by o.region_id, o.cust_nbr;[/color]

　　CUSTOMER REGION CUST_SALES REGION_SALES

　　---------- ---------- ---------- ------------

　　4 5 37802 37802

　　7 6 3750 68065

　　10 6 64315 68065

　　11 7 12204 12204

　　SQL> select o.cust_nbr customer,

　　2 o.region_id region,

　　3 sum(o.tot_sales) cust_sales,

　　4 [color=red]sum(sum(o.tot_sales)) over(partition by o.region_id) region_sales

　　[/color] 5 from orders_tmp o

　　6 where o.year = 2001

　　7 [color=red]group by o.region_id, o.cust_nbr;[/color]

　　CUSTOMER REGION CUST_SALES REGION_SALES

　　---------- ---------- ---------- ------------

　　4 5 37802 37802

　　7 6 3750 68065

　　10 6 64315 68065

　　11 7 12204 12204

　　group by的意图很明显：将数据按区域ID，客户进行分组，那么Over这一部分有什么用呢？假如我们只需要统计每个区域每个客户的订单总额，那么我们只需要group by o.region_id,o.cust_nbr就够了。但我们还想在每一行显示该客户所在区域的订单总额，这一点和前面的不同：需要在前面分组的基础上按区域累加。很显然group by和sum是无法做到这一点的(因为聚集操作的级别不一样，前者是对一个客户，后者是对一批客户)。

　　这就是over函数的作用了！它的作用是告诉SQL引擎：按区域对数据进行分区，然后累积每个区域每个客户的订单总额(sum(sum(o.tot_sales)))。

　　总结：

　　①Over函数指明在那些字段上做分析，其内跟Partition by表示对数据进行分组。注意Partition by可以有多个字段。

　　②Over函数可以和其它聚集函数、分析函数搭配，起到不同的作用。例如这里的SUM，还有诸如Rank，Dense_rank等。

　　2. RANK,DENSE_RANK,ROW_NUMBER函数

　　Rank，Dense_rank，Row_number函数为每条记录产生一个从1开始至N的自然数，N的值可能小于等于记录的总数。这3个函数的唯一区别在于当碰到相同数据时的排名策略。

　　①ROW_NUMBER：

　　Row_number函数返回一个唯一的值，当碰到相同数据时，排名按照记录集中记录的顺序依次递增。

　　②DENSE_RANK：

　　Dense_rank函数返回一个唯一的值，除非当碰到相同数据时，此时所有相同数据的排名都是一样的。

　　③RANK：

　　Rank函数返回一个唯一的值，除非遇到相同的数据时，此时所有相同数据的排名是一样的，同时会在最后一条相同记录和下一条不同记录的排名之间空出排名。

[NextPage]

看下面一个例子：

　　Java代码

　　SQL> select region_id, customer_id, sum(customer_sales) total,

　　2 rank() over(order by sum(customer_sales) desc) rank,

　　3 dense_rank() over(order by sum(customer_sales) desc) dense_rank,

　　4 row_number() over(order by sum(customer_sales) desc) row_number

　　5 from user_order

　　6 group by region_id, customer_id;

　　REGION_ID CUSTOMER_ID TOTAL RANK DENSE_RANK ROW_NUMBER

　　---------- ----------- ---------- ---------- ---------- ----------

　　8 18 1253840 11 11 11

　　5 2 1224992 12 12 12

　　9 23 1224992 12 12 13

　　9 24 1224992 12 12 14

　　10 30 1216858 15 13 15

　　30 rows selected.

　　3. TOP/BOTTOM N,FIRST/LAST,NTILE,NULLS LAST

　　1）NULLS LAST:

　　Java代码

　　SQL> select region_id, customer_id,

　　2 sum(customer_sales) cust_total,

　　3 sum(sum(customer_sales)) over(partition by region_id) reg_total,

　　4 rank() over(partition by region_id

　　order by sum(customer_sales) desc NULLS LAST) rank

　　5 from user_order

　　6 group by region_id, customer_id;

　　REGION_ID CUSTOMER_ID CUST_TOTAL REG_TOTAL RANK

　　---------- ----------- ---------- ---------- ----------

　　10 26 1808949 6238901 1

　　10 27 1322747 6238901 2

　　10 30 1216858 6238901 3

　　10 28 986964 6238901 4

　　10 29 903383 6238901 5

　　10 31 6238901 6

　　NULLS LAST/FIRST告诉Oracle让空值排名最后后第一。注意是NULLS，不是NULL。

　　2).TOP N/BOTTOM N:

　　eg:找出每个区域订单总额排名前3的大客户：

　　Java代码

　　SQL> select *

　　2 from (select region_id,

　　3 customer_id,

　　4 sum(customer_sales) cust_total,

　　5 sum(sum(customer_sales)) over(partition by region_id) reg_total,

　　6 rank() over(partition by region_id

　　order by sum(customer_sales) desc NULLS LAST) rank

　　7 from user_order

　　8 group by region_id, customer_id)

　　9 where rank <= 3;

[NextPage]

　REGION_ID CUSTOMER_ID CUST_TOTAL REG_TOTAL RANK

　　---------- ----------- ---------- ---------- ----------

　　5 4 1878275 5585641 1

　　5 2 1224992 5585641 2

　　5 5 1169926 5585641 3

　　6 6 1788836 6307766 1

　　6 9 1208959 6307766 2

　　6 10 1196748 6307766 3

　　7 14 1929774 6868495 1

　　7 13 1310434 6868495 2

　　7 15 1255591 6868495 3

　　8 17 1944281 6854731 1

　　8 20 1413722 6854731 2

　　8 18 1253840 6854731 3

　　9 25 2232703 6739374 1

　　9 23 1224992 6739374 2

　　9 24 1224992 6739374 2

　　10 26 1808949 6238901 1

　　10 27 1322747 6238901 2

　　10 30 1216858 6238901 3

　　18 rows selected.

　　3). FIRST / LAST:

　　eg:找出订单总额最多、最少的客户

　　Java代码

　　SQL> select min(customer_id)

　　2 keep (dense_rank first order by sum(customer_sales) desc) first,

　　3 min(customer_id)

　　4 keep (dense_rank last order by sum(customer_sales) desc) last

　　5 from user_order

　　6 group by customer_id;

　　FIRST LAST

　　---------- ----------

　　31 1

　　4) NTILE---按层次查询

　　eg:找出订单总额排名前1/5的客户

　　Java代码

　　SQL> select region_id,

　　2 customer_id,

　　3 ntile(5) over(order by sum(customer_sales) desc) til

　　4 from user_order

　　5 group by region_id, customer_id;

　　REGION_ID CUSTOMER_ID TILE

　　---------- ----------- ----------

　　10 31 1

　　9 25 1

　　10 26 1

　　6 6 1

　　8 18 2

　　5 2 2

　　9 23 3

　　6 9 3

　　7 11 3

　　5 3 4

　　6 8 4

　　8 16 4

　　6 7 5

　　10 29 5

　　5 1 5

　　Ntil函数为各个记录在记录集中的排名计算比例，我们看到所有的记录被分成5个等级，那么假如我们只需要前1/5的记录则只需要截取TILE的值为1的记录就可以了。假如我们需要排名前25%的记录(也就是1/4)那么我们只需要设置ntile(4)就可以了。