
How to transpose/pivot data in Hive?

Question

I know there is no direct way to transpose data in Hive. I followed this question: "Is there a way to transpose data in Hive?", but since it has no final answer, I could not get all the way to a solution.

This is my table:

| ID | Code | Proc1 | Proc2 |
| 1  | A    | p     | e     |
| 2  | B    | q     | f     |
| 3  | B    | p     | f     |
| 3  | B    | q     | h     |
| 3  | B    | r     | j     |
| 3  | C    | t     | k     |

Here Proc1 can have any number of values. ID, Code and Proc1 together form a unique key for this table. I want to pivot/transpose this table so that each unique value in Proc1 becomes a new column, and the corresponding value from Proc2 is the value in that column for the corresponding row. In essence, I'm trying to get something like:

| ID | Code | p | q | r | t |
| 1  | A    | e |   |   |   |
| 2  | B    |   | f |   |   |
| 3  | B    | f | h | j |   |
| 3  | C    |   |   |   | k |

In the new transformed table, ID and Code together form the primary key. Following the question I mentioned above, I could get this far using the to_map UDAF. (Disclaimer: this may not be a step in the right direction, but I mention it here in case it is.)

| ID | Code | Map_Aggregation  |
| 1  | A    | {p:e}            |
| 2  | B    | {q:f}            |
| 3  | B    | {p:f, q:h, r:j}  |
| 3  | C    | {t:k}            |

But I don't know how to get from this step to the pivoted/transposed table I want. Any help on how to proceed would be great! Thanks.

Recommended answer

Here is the solution I ended up using:

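-- Register the Brickhouse jar and its collect UDAF.
add jar brickhouse-0.7.0-SNAPSHOT.jar;
CREATE TEMPORARY FUNCTION collect AS 'brickhouse.udf.collect.CollectUDAF';

-- Inner query: build a proc1 -> proc2 map for each (id, code) group.
-- Outer query: read one map key per output column.
select
    id,
    code,
    group_map['p'] as p,
    group_map['q'] as q,
    group_map['r'] as r,
    group_map['t'] as t
from (
    select
        id,
        code,
        collect(proc1, proc2) as group_map
    from test_sample
    group by id, code
) gm;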

The UDAF that builds the map (registered as collect in the query above, filling the role of to_map) comes from the Brickhouse repo: github/klout/brickhouse
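
If adding the Brickhouse jar is not an option, the same pivot can probably be expressed with plain conditional aggregation. The following is only a sketch, not part of the original answer. It assumes the same test_sample table (id, code, proc1, proc2) as the query above, and that the distinct Proc1 values (p, q, r, t) are known in advance, since Hive needs the output columns spelled out in the query.

```sql
-- Sketch: pivot with built-in functions only (no UDAF).
-- Assumption: test_sample(id, code, proc1, proc2) as in the query above,
-- with (id, code, proc1) unique, so each CASE matches at most one row per group.
SELECT
    id,
    code,
    MAX(CASE WHEN proc1 = 'p' THEN proc2 END) AS p,
    MAX(CASE WHEN proc1 = 'q' THEN proc2 END) AS q,
    MAX(CASE WHEN proc1 = 'r' THEN proc2 END) AS r,
    MAX(CASE WHEN proc1 = 't' THEN proc2 END) AS t
FROM test_sample
GROUP BY id, code;
```

Cells with no matching Proc1 value come back as NULL rather than blank. As with the map-based query, a new Proc1 value means adding another column expression by hand.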
