读书人

ORACLE - Oracle分析函数胪陈【二】

发布时间: 2012-09-13 09:51:53 作者: rapoo

ORACLE -- Oracle分析函数详述【二】

一.分析函数2(rank\dense_rank\row_number)

目录
===============================================
1.使用rownum为记录排名
2.使用分析函数来为记录排名
3.使用分析函数为记录进行分组排名

一、使用rownum为记录排名:

在前面一篇《Oracle开发专题之:分析函数》,我们认识了分析函数的基本应用,现在我们再来考虑下面几个问题:

①对所有客户按订单总额进行排名
②按区域和客户订单总额进行排名
③找出订单总额排名前13位的客户
④找出订单总额最高、最低的客户
⑤找出订单总额排名前25%的客户

按照前面第一篇文章的思路,我们只能做到对各个分组的数据进行统计,如果需要排名的话那么只需要简单地加上rownum不就行了吗?事实情况是否如此想象般简单,我们来实践一下。

【1】测试环境:
SQL> desc user_order;
Name????????????????????????????????????? Null???? Type
----------------------------------------- -------- ----------------------------
REGION_ID????????????????????????????????????????? NUMBER(2)
CUSTOMER_ID????????????????????????????????? NUMBER(2)
CUSTOMER_SALES????????????????????????? NUMBER

【2】测试数据:
SQL> select * from user_order order by customer_sales;

REGION_ID CUSTOMER_ID CUSTOMER_SALES
---------- ----------- --------------
???????? 5?????????? 1????????????? 151162
??????? 10????????? 29???????????? 903383
???????? 6?????????? 7????????????? 971585
??????? 10????????? 28??????????? 986964
???????? 9????????? 21?????????? 1020541
???????? 9????????? 22?????????? 1036146
???????? 8????????? 16?????????? 1068467
???????? 6?????????? 8??????????? 1141638
???????? 5?????????? 3??????????? 1161286
???????? 5?????????? 5??????????? 1169926
???????? 8????????? 19?????????? 1174421
???????? 7????????? 12?????????? 1182275
???????? 7????????? 11?????????? 1190421
???????? 6????????? 10?????????? 1196748
???????? 6?????????? 9??????????? 1208959
??????? 10????????? 30????????? 1216858
???????? 5???????????? 2??????????????? 1224992
?????????? 9???????????? 24????????????? 1224992
?????????? 9???????????? 23????????????? 1224992
?????????? 8????????? 18?????????? 1253840
???????? 7????????? 15?????????? 1255591
???????? 7????????? 13?????????? 1310434
??????? 10????????? 27????????? 1322747
???????? 8????????? 20?????????? 1413722
???????? 6?????????? 6??????????? 1788836
??????? 10????????? 26????????? 1808949
???????? 5?????????? 4??????????? 1878275
???????? 7????????? 14?????????? 1929774
???????? 8????????? 17?????????? 1944281
???????? 9????????? 25?????????? 2232703

30 rows selected.

注意这里有3条记录的订单总额是一样的。假如我们现在需要筛选排名前12位的客户,如果使用rownum会有什么样的后果呢?
SQL> select rownum, t.*
2??? from (select *
3??????????? from user_order
4?????????? order by customer_sales desc) t
5?? where rownum <= 12
6?? order by customer_sales desc;

??? ROWNUM REGION_ID CUSTOMER_ID CUSTOMER_SALES
---------- ---------- ----------- --------------
???????? 1????????? 9???????????????? 25??????? 2232703
???????? 2????????? 8???????????????? 17??????? 1944281
???????? 3????????? 7???????????????? 14??????? 1929774
???????? 4????????? 5?????????????????? 4??????? 1878275
???????? 5???????? 10??????????????? 26??????? 1808949
???????? 6????????? 6?????????????????? 6??????? 1788836
???????? 7????????? 8???????????????? 20??????? 1413722
???????? 8???????? 10??????????????? 27??????? 1322747
???????? 9????????? 7??????????????? 13??????? 1310434
??????? 10????????? 7?????????????? 15??????? 1255591
??????? 11????????? 8?????????????? 18??????? 1253840
????????? 12???????????? 5???????????????????? 2????????? 1224992

12 rows selected.

很明显假如只是简单地按rownum进行排序的话,我们漏掉了另外两条记录(参考上面的结果)。

二、使用分析函数来为记录排名:

针对上面的情况,Oracle从8i开始就提供了3个分析函数:rand,dense_rank,row_number来解决诸如此类的问题,下面我们来看看这3个分析函数的作用以及彼此之间的区别:

Rank,Dense_rank,Row_number函数为每条记录产生一个从1开始至N的自然数,N的值可能小于等于记录的总数。这3个函数的唯一区别在于当碰到相同数据时的排名策略。

①ROW_NUMBER:

Row_number函数返回一个唯一的值,当碰到相同数据时,排名按照记录集中记录的顺序依次递增。

②DENSE_RANK:
Dense_rank函数返回一个唯一的值,除非当碰到相同数据时,此时所有相同数据的排名都是一样的。


③RANK:
Rank函数返回一个唯一的值,除非遇到相同的数据时,此时所有相同数据的排名是一样的,同时会在最后一条相同记录和下一条不同记录的排名之间空出排名。

这样的介绍有点难懂,我们还是通过实例来说明吧,下面的例子演示了3个不同函数在遇到相同数据时不同排名策略:
SQL> select region_id, customer_id, sum(customer_sales) total,
2???????? rank() over(order by sum(customer_sales) desc) rank,
3???????? dense_rank() over(order by sum(customer_sales) desc) dense_rank,
4???????? row_number() over(order by sum(customer_sales) desc) row_number
5??? from user_order
6?? group by region_id, customer_id;

REGION_ID CUSTOMER_ID????? TOTAL?????? RANK DENSE_RANK ROW_NUMBER
---------- ----------- ---------- ---------- ---------- ----------
?????????
???????? 8????????? 18??????????????? 1253840???????? 11???????? 11???????? 11
???????? 5?????????? 2???????????????? 1224992???????? 12???????? 12???????? 12
???????? 9????????? 23??????????????? 1224992???????? 12???????? 12???????? 13
???????? 9????????? 24??????????????? 1224992???????? 12???????? 12???????? 14
??????? 10????????? 30?????????????? 1216858???????? 15?????????? 13??????????? 15


30 rows selected.

请注意上面的绿色高亮部分,这里生动的演示了3种不同的排名策略:

①对于第一条相同的记录,3种函数的排名都是一样的:12

②当出现第二条相同的记录时,Rank和Dense_rank依然给出同样的排名12;而row_number则顺延递增为13,依次类推至第三条相同的记录

③当排名进行到下一条不同的记录时,可以看到Rank函数在12和15之间空出了13,14的排名,因为这2个排名实际上已经被第二、三条相同的记录占了。而Dense_rank则顺序递增。row_number函数也是顺序递增

比较上面3种不同的策略,我们在选择的时候就要根据客户的需求来定夺了:

①假如客户就只需要指定数目的记录,那么采用row_number是最简单的,但有漏掉的记录的危险

②假如客户需要所有达到排名水平的记录,那么采用rank或dense_rank是不错的选择。至于选择哪一种则看客户的需要,选择dense_rank或得到最大的记录

三、使用分析函数为记录进行分组排名:

上面的排名是按订单总额来进行排列的,现在跟进一步:假如是为各个地区的订单总额进行排名呢?这意味着又多了一次分组操作:对记录按地区分组然后进行排名。幸亏Oracle也提供了这样的支持,我们所要做的仅仅是在over函数中order by的前面增加一个分组子句:partition by region_id。
SQL> select region_id, customer_id,
?????????????? sum(customer_sales) total,
2???????? rank() over(partition by region_id
??????????????????????? order by sum(customer_sales) desc) rank,
3???????? dense_rank() over(partition by region_id
??????????????????????? order by sum(customer_sales) desc) dense_rank,
4???????? row_number() over(partition by region_id
??????????????????????? order by sum(customer_sales) desc) row_number

5??? from user_order
6?? group by region_id, customer_id;

REGION_ID CUSTOMER_ID????? TOTAL?????? RANK DENSE_RANK ROW_NUMBER
---------- ----------- ---------- ---------- ---------- ----------
???????? 5?????????? 4??????????????? 1878275????????? 1????????? 1????????? 1
???????? 5?????????? 2??????????????? 1224992????????? 2????????? 2????????? 2
???????? 5?????????? 5??????????????? 1169926????????? 3????????? 3????????? 3
???????? 6?????????? 6??????????????? 1788836????????? 1????????? 1????????? 1
???????? 6?????????? 9??????????????? 1208959????????? 2????????? 2????????? 2
???????? 6????????? 10?????????????? 1196748????????? 3????????? 3????????? 3??????


30 rows selected.

现在我们看到的排名将是基于各个地区的,而非所有区域的了!Partition by 子句在排列函数中的作用是将一个结果集划分成几个部分,这样排列函数就能够应用于这各个子集。

三.分析函数3(top\bottom n、first\last、ntile)

目录
===============================================
1.带空值的排列
2.Top/Bottom N查询
3.First/Last排名查询
4.按层次查询

一、带空值的排列:

在前面《Oracle开发专题之:分析函数2(Rank、Dense_rank、row_number)》一文中,我们已经知道了如何为一批记录进行全排列、分组排列。假如被排列的数据中含有空值呢?
SQL> select region_id, customer_id,
2???????? sum(customer_sales) cust_sales,
3???????? sum(sum(customer_sales)) over(partition by region_id) ran_total,
4???????? rank() over(partition by region_id
5????????????????? order by sum(customer_sales) desc) rank
6??? from user_order
7?? group by region_id, customer_id;

REGION_ID CUSTOMER_ID CUST_SALES RAN_TOTAL?????? RANK
---------- ----------- ---------- ---------- ----------
??????? 10????????? 31??????????????????? 6238901????????? 1
??????? 10????????? 26??? 1808949??? 6238901????????? 2
??????? 10????????? 27??? 1322747??? 6238901????????? 3
??????? 10????????? 30??? 1216858??? 6238901????????? 4
??????? 10????????? 28???? 986964??? 6238901????????? 5
??????? 10????????? 29???? 903383??? 6238901????????? 6

我们看到这里有一条记录的CUST_TOTAL字段值为NULL,但居然排在第一名了!显然这不符合情理。所以我们重新调整完善一下我们的排名策略,看看下面的语句:
SQL> select region_id, customer_id,
2???????? sum(customer_sales) cust_total,
3???????? sum(sum(customer_sales)) over(partition by region_id) reg_total,
4???????? rank() over(partition by region_id
??????????????????????? order by sum(customer_sales) desc NULLS LAST) rank
5??????? from user_order
6?????? group by region_id, customer_id;

REGION_ID CUSTOMER_ID CUST_TOTAL REG_TOTAL?????? RANK
---------- ----------- ---------- ---------- ----------
??????? 10????????? 26??? 1808949???? 6238901?????????? 1
??????? 10????????? 27??? 1322747??? 6238901?????????? 2
??????? 10????????? 30??? 1216858??? 6238901?????????? 3
??????? 10????????? 28???? 986964???? 6238901?????????? 4
??????? 10????????? 29???? 903383???? 6238901?????????? 5
??????? 10????????? 31???? 6238901?????????????????????????? 6

绿色高亮处,NULLS LAST/FIRST告诉Oracle让空值排名最后后第一。

注意是NULLS,不是NULL。

二、Top/Bottom N查询:

在日常的工作生产中,我们经常碰到这样的查询:找出排名前5位的订单客户、找出排名前10位的销售人员等等。现在这个对我们来说已经是很简单的问题了。下面我们用一个实际的例子来演示:

【1】找出所有订单总额排名前3的大客户:
SQL> select *
SQL>?? from (select region_id,
SQL>??????????????? customer_id,
SQL>??????????????? sum(customer_sales) cust_total,
SQL>??????????????? rank() over(order by sum(customer_sales) desc NULLS LAST) rank
SQL>?????????? from user_order
SQL>????????? group by region_id, customer_id)
SQL> where rank <= 3;

REGION_ID CUSTOMER_ID CUST_TOTAL?????? RANK
---------- ----------- ---------- ----------
???????? 9????????? 25??? 2232703????????? 1
???????? 8????????? 17??? 1944281????????? 2
???????? 7????????? 14??? 1929774????????? 3

SQL>

【2】找出每个区域订单总额排名前3的大客户:
SQL> select *
2??? from (select region_id,
3???????????????? customer_id,
4???????????????? sum(customer_sales) cust_total,
5???????????????? sum(sum(customer_sales)) over(partition by region_id) reg_total,
6???????????????? rank() over(partition by region_id
??????????????????????????????? order by sum(customer_sales) desc NULLS LAST) rank
7??????????? from user_order
8?????????? group by region_id, customer_id)
9?? where rank <= 3;

REGION_ID CUSTOMER_ID CUST_TOTAL REG_TOTAL?????? RANK
---------- ----------- ---------- ---------- ----------
???????? 5?????????? 4??? 1878275??? 5585641????????? 1
???????? 5?????????? 2??? 1224992??? 5585641????????? 2
???????? 5?????????? 5??? 1169926??? 5585641????????? 3
???????? 6?????????? 6??? 1788836??? 6307766????????? 1
???????? 6?????????? 9??? 1208959??? 6307766????????? 2
???????? 6????????? 10??? 1196748??? 6307766????????? 3
???????? 7????????? 14??? 1929774??? 6868495????????? 1
???????? 7????????? 13??? 1310434??? 6868495????????? 2
???????? 7????????? 15??? 1255591??? 6868495????????? 3
???????? 8????????? 17??? 1944281??? 6854731????????? 1
???????? 8????????? 20??? 1413722??? 6854731????????? 2
???????? 8????????? 18??? 1253840??? 6854731????????? 3
???????? 9????????? 25??? 2232703??? 6739374????????? 1
???????? 9????????? 23??? 1224992??? 6739374????????? 2
???????? 9????????? 24??? 1224992??? 6739374????????? 2
??????? 10????????? 26??? 1808949??? 6238901????????? 1
??????? 10????????? 27??? 1322747??? 6238901????????? 2
??????? 10????????? 30??? 1216858??? 6238901????????? 3

18 rows selected.

三、First/Last排名查询:

想象一下下面的情形:找出订单总额最多、最少的客户。按照前面我们学到的知识,这个至少需要2个查询。第一个查询按照订单总额降序排列以期拿到第一名,第二个查询按照订单总额升序排列以期拿到最后一名。是不是很烦?因为Rank函数只告诉我们排名的结果,却无法自动替我们从中筛选结果。

幸好Oracle为我们在排列函数之外提供了两个额外的函数:first、last函数,专门用来解决这种问题。还是用实例说话:
SQL> select min(customer_id)
2???????? keep (dense_rank first order by sum(customer_sales) desc) first,
3???????? min(customer_id)
4???????? keep (dense_rank last order by sum(customer_sales) desc) last
5??? from user_order
6?? group by customer_id;

???? FIRST?????? LAST
---------- ----------
??????? 31????????? 1

这里有几个看起来比较疑惑的地方:

①为什么这里要用min函数
②Keep这个东西是干什么的
③fist/last是干什么的
④dense_rank和dense_rank()有什么不同,能换成rank吗?

首先解答一下第一个问题:min函数的作用是用于当存在多个First/Last情况下保证返回唯一的记录。假如我们去掉会有什么样的后果呢?
SQL> select keep (dense_rank first order by sum(customer_sales) desc) first,
2???????????? keep (dense_rank last order by sum(customer_sales) desc) last
3??? from user_order
4?? group by customer_id;
select keep (dense_rank first order by sum(customer_sales) desc) first,
??????????????????????? *
ERROR at line 1:
ORA-00907: missing right parenthesis

接下来看看第2个问题:keep是干什么用的?从上面的结果我们已经知道Oracle对排名的结果只“保留”2条数据,这就是keep的作用。告诉Oracle只保留符合keep条件的记录。

那么什么才是符合条件的记录呢?这就是第3个问题了。dense_rank是告诉Oracle排列的策略,first/last则告诉最终筛选的条件。

第4个问题:如果我们把dense_rank换成rank呢?
SQL> select min(region_id)
2????????? keep(rank first order by sum(customer_sales) desc) first,
3???????? min(region_id)
4????????? keep(rank last order by sum(customer_sales) desc) last
5??? from user_order
6?? group by region_id;
select min(region_id)
*
ERROR at line 1:
ORA-02000: missing DENSE_RANK

四、按层次查询:

现在我们已经见识了如何通过Oracle的分析函数来获取Top/Bottom N,第一个,最后一个记录。有时我们会收到类似下面这样的需求:找出订单总额排名前1/5的客户。

很熟悉是不?我们马上会想到第二点中提到的方法,可是rank函数只为我们做好了排名,并不知道每个排名在总排名中的相对位置,这时候就引入了另外一个分析函数NTile,下面我们就以上面的需求为例来讲解一下:
SQL> select region_id,
2???????? customer_id,
3???????? ntile(5) over(order by sum(customer_sales) desc) til
4??? from user_order
5?? group by region_id, customer_id;

REGION_ID CUSTOMER_ID?????? TILE
---------- ----------- ----------
??????? 10????????? 31????????? 1
???????? 9????????? 25?????????? 1
??????? 10????????? 26????????? 1
???????? 6?????????? 6??????????? 1????????
???????? 8????????? 18?????????? 2
???????? 5?????????? 2??????????? 2
???????? 9????????? 23?????????? 3
???????? 6?????????? 9??????????? 3
???????? 7????????? 11?????????? 3
???????? 5?????????? 3??????????? 4
???????? 6?????????? 8??????????? 4
???????? 8????????? 16?????????? 4
???????? 6?????????? 7??????????? 5
??????? 10????????? 29????????? 5
???????? 5?????????? 1??????????? 5

Ntil函数为各个记录在记录集中的排名计算比例,我们看到所有的记录被分成5个等级,那么假如我们只需要前1/5的记录则只需要截取TILE的值为1的记录就可以了。假如我们需要排名前25%的记录(也就是1/4)那么我们只需要设置ntile(4)就可以了。

?

读书人网 >其他数据库

热点推荐