I am using Apache POI to read an Excel document, and so far it has served my purpose. The one place where I am stuck is extracting the value of a cell as HTML.
I have one cell in which the user will enter some text and apply formatting (bullets, numbering, bold, italic, etc.).
So when I read it, the content should come back as HTML and not as the plain string that POI gives me.
I have gone through almost the entire POI API but could not find anything for this. I want to retain the formatting of just one particular column, not the entire workbook. By column I mean the text entered in that column; I want that text as HTML.
I also explored and used Apache Tika. However, as I understand it, Tika can only give me the text, not its formatting.
Could someone please guide me? I am running out of options.
Suppose I wrote "My name is Angel and Demon" in Excel, with "Angel" in bold and "Demon" in italic.
The output I should get in Java is: My name is <b>Angel</b> and <i>Demon</i>
Recommended answer
I pasted this as Unicode text into cell A1 of an Excel file:
<html><p>This is a test. Will this text be <b>bold</b> or <i>italic</i></p></html>
This HTML line produces this in the cell:
This is a test. Will this text be bold or italic (with "bold" shown in bold and "italic" shown in italics)
My code:
import java.io.FileInputStream;
import java.io.IOException;

import org.apache.poi.ss.util.CellReference;
import org.apache.poi.xssf.usermodel.XSSFCell;
import org.apache.poi.xssf.usermodel.XSSFFont;
import org.apache.poi.xssf.usermodel.XSSFRichTextString;
import org.apache.poi.xssf.usermodel.XSSFRow;
import org.apache.poi.xssf.usermodel.XSSFSheet;
import org.apache.poi.xssf.usermodel.XSSFWorkbook;

public class ExcelWithHtml {

    // Cell A1 was filled by pasting:
    // <html><p>This is a test. Will this text be <b>bold</b> or
    // <i>italic</i></p></html>
    public static void main(String[] args) throws IOException {
        new ExcelWithHtml().readFirstCellOfXSSF("/Users/rcacheira/testeHtml.xlsx");
    }

    public void readFirstCellOfXSSF(String filePathName) throws IOException {
        // try-with-resources closes the stream and the workbook even if reading fails
        try (FileInputStream fis = new FileInputStream(filePathName);
             XSSFWorkbook wb = new XSSFWorkbook(fis)) {
            XSSFSheet sheet = wb.getSheetAt(0);
            String cellHtml = getHtmlFormatedCellValueFromSheet(sheet, "A1");
            System.out.println(cellHtml);
        }
    }

    public String getHtmlFormatedCellValueFromSheet(XSSFSheet sheet, String cellName) {
        CellReference cellReference = new CellReference(cellName);
        XSSFRow row = sheet.getRow(cellReference.getRow());
        XSSFCell cell = row.getCell(cellReference.getCol());
        XSSFRichTextString cellText = cell.getRichStringCellValue();
        String text = cellText.getString();

        // A cell without formatting runs holds plain text only
        if (cellText.numFormattingRuns() == 0) {
            return text;
        }

        StringBuilder htmlCode = new StringBuilder();
        for (int i = 0; i < cellText.numFormattingRuns(); i++) {
            int indexStart = cellText.getIndexOfFormattingRun(i);
            int indexEnd = indexStart + cellText.getLengthOfFormattingRun(i);
            // The font is null for a run that carries no formatting of its own
            XSSFFont font = cellText.getFontOfFormattingRun(i);
            htmlCode.append(wrapWithFontTags(text.substring(indexStart, indexEnd), font));
        }
        return htmlCode.toString();
    }

    // Wraps one run in its own <b>/<i> tags, closing them in reverse order
    // so the tags never overlap. The text itself is not HTML-escaped here.
    private String wrapWithFontTags(String runText, XSSFFont font) {
        boolean bold = font != null && font.getBold();
        boolean italic = font != null && font.getItalic();
        StringBuilder html = new StringBuilder();
        if (bold) {
            html.append("<b>");
        }
        if (italic) {
            html.append("<i>");
        }
        html.append(runText);
        if (italic) {
            html.append("</i>");
        }
        if (bold) {
            html.append("</b>");
        }
        return html.toString();
    }
}
My output:
This is a test. Will this text be <b>bold</b> or <i>italic</i>
I think this is what you want. I am only showing you what is possible; I have not followed best coding practices, I just programmed quickly to produce an output.
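The run-to-HTML step itself does not depend on POI. As a minimal sketch, assuming each formatting run of the cell has already been reduced to its text plus a bold flag and an italic flag (the names RunsToHtml, Run, toHtml and escape are hypothetical and are not POI types), wrapping every run in its own tags keeps the tags properly nested, and escaping the text keeps characters such as < from being read as markup:

```java
import java.util.Arrays;
import java.util.List;

class RunsToHtml {

    /** Hypothetical stand-in for one formatting run of a rich-text cell. */
    static final class Run {
        final String text;
        final boolean bold;
        final boolean italic;

        Run(String text, boolean bold, boolean italic) {
            this.text = text;
            this.bold = bold;
            this.italic = italic;
        }
    }

    /** Escapes the characters that are significant in HTML text content. */
    static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    /** Emits each run wrapped in its own tags, closed in reverse order of opening. */
    static String toHtml(List<Run> runs) {
        StringBuilder html = new StringBuilder();
        for (Run run : runs) {
            if (run.bold) {
                html.append("<b>");
            }
            if (run.italic) {
                html.append("<i>");
            }
            html.append(escape(run.text));
            if (run.italic) {
                html.append("</i>");
            }
            if (run.bold) {
                html.append("</b>");
            }
        }
        return html.toString();
    }

    public static void main(String[] args) {
        // The example from the question, split into four runs
        List<Run> runs = Arrays.asList(
                new Run("My name is ", false, false),
                new Run("Angel", true, false),
                new Run(" and ", false, false),
                new Run("Demon", false, true));
        System.out.println(toHtml(runs));
        // prints: My name is <b>Angel</b> and <i>Demon</i>
    }
}
```

With POI, each Run could presumably be filled from getIndexOfFormattingRun, getLengthOfFormattingRun and getFontOfFormattingRun, treating a null font as neither bold nor italic. Bullets and numbering are not font properties, so they would need separate handling that this sketch does not attempt.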