I know there is no direct way to transpose data in Hive. I followed this question: Is there a way to transpose data in Hive?, but since it has no final answer, I could not get all the way there.
Here is my table:

| ID | Code | Proc1 | Proc2 |
| 1  | A    | p     | e     |
| 2  | B    | q     | f     |
| 3  | B    | p     | f     |
| 3  | B    | q     | h     |
| 3  | B    | r     | j     |
| 3  | C    | t     | k     |

Here Proc1 can have any number of values. ID, Code and Proc1 together form a unique key for this table. I want to pivot/transpose this table so that each unique value in Proc1 becomes a new column, and the corresponding value from Proc2 becomes the value in that column for the corresponding row. In essence, I'm trying to get something like:
| ID | Code | p | q | r | t |
| 1  | A    | e |   |   |   |
| 2  | B    |   | f |   |   |
| 3  | B    | f | h | j |   |
| 3  | C    |   |   |   | k |

In the new transformed table, ID and Code together form the primary key. Following the question I mentioned above, I could get this far using the to_map UDAF. (Disclaimer: this may not be a step in the right direction, but I'm mentioning it here in case it is.)
| ID | Code | Map_Aggregation |
| 1  | A    | {p:e}           |
| 2  | B    | {q:f}           |
| 3  | B    | {p:f, q:h, r:j} |
| 3  | C    | {t:k}           |

But I don't know how to get from this step to the pivoted/transposed table I want. Any help on how to proceed would be great! Thanks.
Recommended answer

Here is the solution I ended up using:
    -- Register the Brickhouse jar and its collect UDAF.
    add jar brickhouse-0.7.0-SNAPSHOT.jar;
    CREATE TEMPORARY FUNCTION collect AS 'brickhouse.udf.collect.CollectUDAF';

    -- Outer query: read each known Proc1 value out of the map as its own column.
    select
        id,
        code,
        group_map['p'] as p,
        group_map['q'] as q,
        group_map['r'] as r,
        group_map['t'] as t
    from (
        -- Inner query: build one proc1 -> proc2 map per (id, code) group.
        select id, code, collect(proc1, proc2) as group_map
        from test_sample
        group by id, code
    ) gm;

The to_map-style UDAF used here (registered above as collect) comes from the brickhouse repo: github/klout/brickhouse
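To make the two steps of that query easier to follow, here is a minimal Python sketch of the same logic run on the sample data from the question. It is only an illustration, not Hive code: the function name pivot and the variable names are made up for this example, and it assumes that looking up a missing key in the Hive map yields NULL (shown as None here).

```python
from collections import defaultdict

# Sample rows from the question: (id, code, proc1, proc2).
rows = [
    (1, "A", "p", "e"),
    (2, "B", "q", "f"),
    (3, "B", "p", "f"),
    (3, "B", "q", "h"),
    (3, "B", "r", "j"),
    (3, "C", "t", "k"),
]


def pivot(rows, columns):
    """Group rows by (id, code), then read each requested key from the group's map.

    `pivot` is a hypothetical helper for illustration only.
    """
    # Step 1: mirrors `collect(proc1, proc2) ... group by id, code`,
    # building one proc1 -> proc2 map per (id, code) group.
    group_maps = defaultdict(dict)
    for id_, code, proc1, proc2 in rows:
        group_maps[(id_, code)][proc1] = proc2

    # Step 2: mirrors `group_map['p'] as p, ...`.
    # A key that is absent from a group's map becomes None
    # (assumed to correspond to NULL in Hive).
    return [
        (id_, code) + tuple(group_map.get(col) for col in columns)
        for (id_, code), group_map in sorted(group_maps.items())
    ]


pivoted = pivot(rows, ["p", "q", "r", "t"])
for row in pivoted:
    print(row)
```

As in the query above, the output columns (p, q, r, t) have to be listed explicitly, so a new value appearing in Proc1 would require adding another lookup by hand.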