读书人

Hive内置函数的使用(一)

发布时间: 2013-10-31 12:03:52 作者: rapoo

Hive内置函数的使用(1)

工作中用到了一些,总结一下:

1、 行拆分、合并

1. 分拆explode(ARRAY)

返回值:多行

SELECT explode(myCol) AS myNewCol FROM myTable;

注:1.使用UDTF函数时,select中不可以包含其他表达式;

2.UDTF函数不能嵌套使用;

3.UDTF不支持GROUP BY / CLUSTER BY / DISTRIBUTE BY / SORT BY;

2. 合并去重collect_set(col)

返回值:数组(去重的效果)

Xjt2原始数据:

1 one

2 two

2 two

2 two

3 three

4 four

4 four

1 one one

1 one two

2 twotwo

2 twoone

select id,collect_set(name) from xjt2 group by id;

1 ["one two","oneone","one"]

2 ["twoone","twotwo","two"]

3 ["three"]

4 ["four"]

select collect_set(name) from xjt2;

["three","one","four","two"]

注:当collect_set(col)与其他字段同时在select语句中时,必须使用group by other_fields;

2、 时间函数

1. 获取当前Unix时间戳unix_timestamp()

返回值类型:BIGINT

select unix_timestamp() from xjt1;

1383012276

2. 将日期转时间戳unix_timestamp(string date)

返回值类型:BIGINT,若转换失败,则返回0

select unix_timestamp('2013-01-13 00:00:00') from xjt1;

1358006400

3. 转化指定格式(pattern)日期转时间戳unix_timestamp(string date, string pattern)

返回值类型:BIGINT,若转化失败,则返回0

select unix_timestamp('2013-01-13 00:00:00','yyyyMMdd')from xjt1;

1354291200

4. 将Unix时间戳转日期from_unixtime(BIGINT,’format’)

select from_unix(unix_timestamp(),'yyyyMMdd') from xjt1;

20131029

5. 取日期to_date()、取年year()、取月month()、取天数day()

返回值类型:SRING

select to_date('1990-10-10 00:00:00') from xjt1;

1990-10-10

6. 日期增加函数date_add(string startdate, int days)

返回值类型:STRING

select date_add('2013-10-29',10) from xjt1;

2013-11-08

7. 日期减少函数date_sub(string startdate, intdays)

返回值类型:STRING

select date_sub('2013-10-29',10) from xjt1;

2013-10-19

8. 日期比较函数datediff(string enddate, string startdate)

返回值类型:INT(结束日期减去开始日期,结束日期放在前面)

select datediff('2013-10-29','2013-12-10') from xjt1;

-42

3、 条件判断函数CASE

返回值:T/F

语法:CASE a WHEN b THENc [WHEN d THEN e]* [ELSE f] END

说明:如果 a 等于 b ,那么返回 c ;如果 a 等于 d ,那么返回 e ;否则返回 f

4、 字符串分割函数split(stringstr, string pat)

返回值类型:ARRAY

selectsplit('hello world hello hive',' ') from xjt1;

["hello","world","hello","hive"]


读书人网 >云计算

热点推荐