Troubleshooting: Only SubQuery expressions that are top level conjuncts are allowed

Contents

Problem scenario
Environment
Root cause
Solutions
Result
Summary
PS

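Problem scenario

I was running a fairly complex SQL statement through the Hive component in the Hue console on CDH. The statement uses the in keyword, and one of its not in conditions takes its values from a subquery on another table. Because that subquery condition sits inside an or expression, Hive does not treat it as a top-level condition of the where clause when it parses the statement, and it raises the error in the title. That may sound abstract, so here is the SQL: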

select a.a1,a.a2,count(*) as num
from test1 a 
where a.a3 in ('1','2')  and (
	a.a4 in ('2','3') 
	or 
	a.a4 not in (
		select c1 from test2 c where c2 = '1'
	)
)
group by a.a1,a.a2
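
To reproduce the error, the two tables can be sketched as follows. The column types and the sample rows are assumptions for illustration only; the post does not show the real table definitions or data.

```sql
-- Hypothetical schemas: only the column names come from the queries in this post.
create table if not exists test1 (a1 string, a2 string, a3 string, a4 string);
create table if not exists test2 (c1 string, c2 string);

-- Made-up sample rows (insert ... values needs Hive 0.14 or later).
insert into table test1 values
  ('x', 'y', '1', '2'),   -- kept: a4 is in ('2','3')
  ('x', 'y', '1', '5'),   -- dropped: a4 is returned by the subquery
  ('x', 'z', '2', '7'),   -- kept: a4 is not returned by the subquery
  ('x', 'z', '3', '7');   -- dropped: a3 is not in ('1','2')

insert into table test2 values
  ('5', '1'),
  ('7', '0');
```

With these rows, the intended result of the query above is two groups, (x, y, 1) and (x, z, 1), which gives a small baseline for checking any rewrite.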

Environment

Software	Version
CDH	5.15.1
hive	1.1.0

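Root cause

As the SQL shows, the not in (select ...) condition is combined with another condition through or, and the where clause only sees the parenthesized expression as a whole. Hive 1.1.0 accepts a condition of the form in (select ...) or not in (select ...) only when it is a top-level conjunct, that is, one of the conditions joined directly by and at the outermost level of the where clause. It cannot be nested under or. The following SQL therefore passes validation: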

select a.a1,a.a2,count(*) as num
from test1 a 
where a.a3 in ('1','2')  and a.a4 not in (
	select c1 from test2 c where c2 = '1'
)
group by a.a1,a.a2

Solutions

Once the cause is known, the fix is straightforward. Here are two options:

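Option 1: use union all to split the or condition into two queries and combine their rows. A row that satisfies both branches would otherwise be counted twice, so the second branch excludes the rows already matched by the first, and the aggregation runs once over the combined rows. Sample SQL:

select t.a1,t.a2,count(*) as num
from (
	select a.a1,a.a2
	from test1 a 
	where a.a3 in ('1','2')  and a.a4 in ('2','3') 

	union all 

	select a.a1,a.a2
	from test1 a 
	where a.a3 in ('1','2')  and a.a4 not in ('2','3') and a.a4 not in (
		select c1 from test2 c where c2 = '1'
	)
) t
group by t.a1,t.a2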

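Option 2: use a left join and keep the rows that find no match in the subquery. Sample SQL:

select a.a1,a.a2,count(*) as num
from test1 a left join (select distinct c1 from test2 c where c2 = '1') c
on a.a4 = c.c1
where a.a3 in ('1','2')  and (
	a.a4 in ('2','3') 
	or  
	c.c1 is null
)
group by a.a1,a.a2

The c.c1 is null condition keeps the rows of test1 that have no match in the subquery, and the distinct prevents a matched row from being counted more than once. The same idea is covered in my other blog post, "hive 如何去除两个表相同的部分" (how to remove the rows two tables have in common in Hive).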
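
Before relying on the join rewrite, it may be worth checking two properties of test2 that can make it diverge from the original not in. Duplicate c1 values multiply the matching test1 rows in a join unless the subquery is deduplicated. NULL c1 values, under standard SQL semantics, make not in match nothing, whereas the join simply ignores them. A minimal sketch of both checks, using only the tables from this post:

```sql
-- Duplicate keys: any c1 returned here would match a test1 row more than once.
select c1, count(*) as cnt
from test2
where c2 = '1'
group by c1
having count(*) > 1;

-- NULL keys: a non-zero count means not in and the left join can disagree.
select count(*) as null_keys
from test2
where c2 = '1' and c1 is null;
```

If both checks come back empty or zero, the two forms should agree on this data.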

Result

I went with the second option, since it required the fewest changes to the existing SQL. That resolved the problem.

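Summary

Every problem has a cause. Find the cause, work out the possible fixes from it, and pick the one that suits the situation best. Afterwards, review and write up the problem so the knowledge sticks.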

PS

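If you have read this far and the post helped, please give it a like or bookmark it at the bottom of the page. Comments and discussion are welcome too. Thanks for reading.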

ldx2
