读书人

经过学生-课程关系表熟悉hive语句

发布时间: 2012-09-08 10:48:07 作者: rapoo

通过学生-课程关系表,熟悉hive语句
1、在hive中创建以下三个表。create table student(Sno int,Sname string,Sex string,Sage int,Sdept string)row format delimited fields terminated by ','stored as textfile;
create table course(Cno int,Cname string) row format delimited fields terminated by ',' stored as textfile;
create table sc(Sno int,Cno int,Grade int)row format delimited fields terminated by ',' stored as textfile;2、load数据到三个表中。load data local inpath '/home/student.txt' overwrite into table student;
load data local inpath '/home/sc.txt' overwrite into table sc;
load data local inpath '/home/course.txt' overwrite into table course;95001,李勇,男,20,CS
95002,刘晨,女,19,IS
95003,王敏,女,22,MA
95004,张立,男,19,IS
95005,刘刚,男,18,MA
95006,孙庆,男,23,CS
95007,易思玲,女,19,MA
95008,李娜,女,18,CS
95009,梦圆圆,女,18,MA
95010,孔小涛,男,19,CS
95011,包小柏,男,18,MA
95012,孙花,女,20,CS
95013,冯伟,男,21,CS
95014,王小丽,女,19,CS
95015,王君,男,18,MA
95016,钱国,男,21,MA
95017,王风娟,女,18,IS
95018,王一,女,19,IS
95019,邢小丽,女,19,IS
95020,赵钱,男,21,IS
95021,周二,男,17,MA
95022,郑明,男,20,MA1,数据库
2,数学
3,信息系统
4,操作系统
5,数据结构
6,数据处理
95001,1,81
95001,2,85
95001,3,88
95001,4,70
95002,2,90
95002,3,80
95002,4,71
95002,5,60
95003,1,82
95003,3,90
95003,5,100
95004,1,80
95004,2,92
95004,4,91
95004,5,70
95005,1,70
95005,2,92
95005,3,99
95005,6,87
95006,1,72
95006,2,62
95006,3,100
95006,4,59
95006,5,60
95006,6,98
95007,3,68
95007,4,91
95007,5,94
95007,6,78
95008,1,98
95008,3,89
95008,6,91
95009,2,81
95009,4,89
95009,6,100
95010,2,98
95010,5,90
95010,6,80
95011,1,81
95011,2,91
95011,3,81
95011,4,86
95012,1,81
95012,3,78
95012,4,85
95012,6,98
95013,1,98
95013,2,58
95013,4,88
95013,5,93
95014,1,91
95014,2,100
95014,4,98
95015,1,91
95015,3,59
95015,4,100
95015,6,95
95016,1,92
95016,2,99
95016,4,82
95017,4,82
95017,5,100
95017,6,58
95018,1,95
95018,2,100
95018,3,67
95018,4,78
95019,1,77
95019,2,90
95019,3,91
95019,4,67
95019,5,87
95020,1,66
95020,2,99
95020,5,93
95021,2,93
95021,5,91
95021,6,99
95022,3,69
95022,4,93
95022,5,82
95022,6,100
3、hive的select

SELECT [ALL | DISTINCT] select_expr, select_expr, ... FROM table_reference [WHERE where_condition] [GROUP BY col_list] [CLUSTER BY col_list | [DISTRIBUTE BY col_list] [SORT BY col_list] ] [LIMIT number]
查询全体学生的学号与姓名
hive> select Sno,Sname from student;Total MapReduce jobs = 1
查询选修了课程的学生姓名
hive> select distinct Sname from student inner join sc on student.Sno=Sc.Sno;Total MapReduce jobs = 2
3、hive的group by 和集合函数在一个query里可以出现多个聚合函数,但是一个query语句里2个聚合函数只能有一个distinct。
hive.map.aggr控制我们怎样去聚合,默认值为false,如果设置为ture后,会在map任务里执行第一级别的聚合,通常这样会有 高的效率,但是需要更大的内存。
查询学生的总人数hive> select count(distinct Sno)count from student;
Total MapReduce jobs = 1
计算1号课程的学生平均成绩
hive> select avg(distinct Grade) from sc where Cno=1;
查询选修1号课程的学生最高分数
  select Grade from sc where Cno=1 sort by Grade desc limit 1;      Total MapReduce jobs = 2      求各个课程号及相应的选课人数 hive> select Cno,count(1) from sc group by Cno;
Total MapReduce jobs = 1经过学生-课程关系表,熟悉hive语句
查询选修了3门以上的课程的学生学号hive> select Sno from (select Sno,count(Cno)CountCno from sc group by Sno)a where a.CountCno>3;
Total MapReduce jobs = 1
hive> select Sno from sc group by Sno having count(Cno)>3;
Total MapReduce jobs = 1
4、hive的Order By/Sort By/Distribute By/Cluster ByOrder By ,在strict 模式下(hive.mapred.mode=strict),order by 语句必须跟着limit语句,但是在非strict下就不是必须的,这样做的理由是必须有一个reduce对最终的结果进行排序,如果最后输出的行数过多,一个reduce需要花费很长的时间。
hive> set hive.mapred.mode=strict;
hive> select Sno from student order by Sno;
FAILED: Error in semantic analysis: 1:33 In strict mode, if ORDER BY is specified, LIMIT must also be specified. Error encountered near token 'Sno'
Sort By,它通常发生在每一个redcue里,“order by” 和“sort by"的区别在于,前者能给保证输出都是有顺序的,而后者如果有多个reduce的时候只是保证了输出的部分有序。set mapred.reduce.tasks=<number>在sort by可以指定,在用sort by的时候,如果没有指定列,它会随机的分配到不同的reduce里去。经过学生-课程关系表,熟悉hive语句
distribute by 按照指定的字段对数据进行划分到不同的输出reduce中 此方法会根据性别划分到不同的reduce中 ,然后按年龄排序并输出到不同的文件中。hive> set mapred.reduce.tasks=2;hive> insert overwrite local directory '/home/hadoop/out' select * from student distribute by Sex sort by Sage;Cluster By 能保证分配给同一个reduce的相同的列邻近,除distribute by除完成的功能外。5、Join
读书人网 >云计算

热点推荐