A Look at Spring AI Alibaba's BilibiliDocumentReader

This article takes a look at Spring AI Alibaba's BilibiliDocumentReader.

BilibiliDocumentReader

community/document-readers/spring-ai-alibaba-starter-document-reader-bilibili/src/main/java/com/alibaba/cloud/ai/reader/bilibili/BilibiliDocumentReader.java

public class BilibiliDocumentReader implements DocumentReader {

	private static final Logger logger = LoggerFactory.getLogger(BilibiliDocumentReader.class);

	private static final String API_BASE_URL = "=";

	private final String resourcePath;

	private final ObjectMapper objectMapper;

	private static final int MEMORY_SIZE = 5;

	private static final int BYTE_SIZE = 1024;

	private static final int MAX_MEMORY_SIZE = MEMORY_SIZE * BYTE_SIZE * BYTE_SIZE;

	private static final WebClient WEB_CLIENT = WebClient.builder()
		.defaultHeader(HttpHeaders.ACCEPT, MediaType.APPLICATION_JSON_VALUE)
		.codecs(configurer -> configurer.defaultCodecs().maxInMemorySize(MAX_MEMORY_SIZE))
		.build();

	public BilibiliDocumentReader(String resourcePath) {
		Assert.hasText(resourcePath, "Query string must not be empty");
		this.resourcePath = resourcePath;
		this.objectMapper = new ObjectMapper();
	}

	@Override
	public List<Document> get() {
		List<Document> documents = new ArrayList<>();
		try {
			String bvid = extractBvid(resourcePath);
			String videoInfoResponse = fetchVideoInfo(bvid);
			JsonNode videoData = parseJson(videoInfoResponse).path("data");
			String title = videoData.path("title").asText();
			String description = videoData.path("desc").asText();
			Document infoDoc = new Document("Video information", Map.of("title", title, "description", description));
			documents.add(infoDoc);
			String documentContent = fetchAndProcessSubtitles(videoData, title, description);
			documents.add(new Document(documentContent));
		}
		catch (IllegalArgumentException e) {
			logger.error("Invalid input: {}", e.getMessage());
			documents.add(new Document("Error: Invalid input"));
		}
		catch (IOException e) {
			logger.error("Error parsing JSON: {}", e.getMessage(), e);
			documents.add(new Document("Error parsing JSON: " + e.getMessage()));
		}
		catch (Exception e) {
			logger.error("Unexpected error: {}", e.getMessage(), e);
			documents.add(new Document("Unexpected error: " + e.getMessage()));
		}
		return documents;
	}

	private String extractBvid(String resourcePath) {
		return resourcePath.replaceAll(".*(BV\\w+).*", "$1");
	}

	private String fetchVideoInfo(String bvid) {
		return WEB_CLIENT.get().uri(API_BASE_URL + bvid).retrieve().bodyToMono(String.class).block();
	}

	private JsonNode parseJson(String jsonResponse) throws IOException {
		return objectMapper.readTree(jsonResponse);
	}

	private String fetchAndProcessSubtitles(JsonNode videoData, String title, String description) throws IOException {
		JsonNode subtitleList = videoData.path("subtitle").path("list");
		if (subtitleList.isArray() && subtitleList.size() > 0) {
			String subtitleUrl = subtitleList.get(0).path("subtitle_url").asText();
			String subtitleResponse = WEB_CLIENT.get().uri(subtitleUrl).retrieve().bodyToMono(String.class).block();

			JsonNode subtitleJson = parseJson(subtitleResponse);
			StringBuilder rawTranscript = new StringBuilder();
			subtitleJson.path("body").forEach(node -> rawTranscript.append(node.path("content").asText()).append(" "));

			return String.format("Video Title: %s, Description: %s\nTranscript: %s", title, description,
					rawTranscript.toString().trim());
		}
		else {
			return String.format("No subtitles found for video: %s. Returning an empty transcript.", resourcePath);
		}
	}

}

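BilibiliDocumentReader uses WebClient to call Bilibili's API. It extracts the bvid from the URL, requests the video info endpoint with that bvid, and parses the JSON response to get the title and description. fetchAndProcessSubtitles then requests the first subtitle_url in the subtitle list and joins the subtitle entries into the document content. get() therefore returns two documents: one carrying title and description as metadata, and one carrying the transcript text.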
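To make the bvid step concrete, here is a minimal sketch using only the JDK. The first method repeats the replaceAll expression from extractBvid. The second, findBvid, is a hypothetical stricter variant that is not part of the library. The class name and the sample URL are also made up for illustration.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class BvidExtractionSketch {

	// Same expression as extractBvid in the listing above.
	static String extractWithReplaceAll(String resourcePath) {
		return resourcePath.replaceAll(".*(BV\\w+).*", "$1");
	}

	// Hypothetical variant (not in the library): returns empty when no id is present.
	private static final Pattern BVID = Pattern.compile("BV\\w+");

	static Optional<String> findBvid(String resourcePath) {
		Matcher matcher = BVID.matcher(resourcePath);
		return matcher.find() ? Optional.of(matcher.group()) : Optional.empty();
	}

	public static void main(String[] args) {
		// Made-up example URL, used only to exercise the regex.
		String url = "https://www.bilibili.com/video/BV1xx411c7mD/?t=7";
		System.out.println(extractWithReplaceAll(url));
		System.out.println(extractWithReplaceAll("not-a-video-url"));
		System.out.println(findBvid("not-a-video-url").isPresent());
	}

}
```

One thing the sketch brings out: String.replaceAll returns its input unchanged when the pattern does not match, so with the expression as listed, a string without a BV id is passed on to the API request as-is rather than rejected.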

Example

public class BilibiliDocumentReaderTest {

	private static final Logger logger = LoggerFactory.getLogger(BilibiliDocumentReader.class);

	@Test
	void bilibiliDocumentReaderTest() {
		BilibiliDocumentReader bilibiliDocumentReader = new BilibiliDocumentReader(
				"/?t=7&vd_source=3069f51b168ac07a9e3c4ba94ae26af5");
		List<Document> documents = bilibiliDocumentReader.get();
		logger.info("documents: {}", documents);
	}

}
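Note that get() catches its exceptions and reports failures as ordinary documents instead of throwing, so a caller may want to check the returned text before indexing it. The following is a rough sketch, assuming the caller has already pulled the text out of each returned document. The helper isPlaceholder and the class name are hypothetical; the prefixes are copied from the string literals in the listing above.

```java
import java.util.List;

class ReaderResultCheckSketch {

	// Prefixes taken from the messages built in get() and fetchAndProcessSubtitles.
	private static final List<String> PLACEHOLDER_PREFIXES = List.of("Error: Invalid input",
			"Error parsing JSON: ", "Unexpected error: ", "No subtitles found for video: ");

	// Hypothetical helper: true when the text is one of the reader's placeholder messages.
	static boolean isPlaceholder(String documentText) {
		return PLACEHOLDER_PREFIXES.stream().anyMatch(documentText::startsWith);
	}

	public static void main(String[] args) {
		System.out.println(isPlaceholder("Unexpected error: Connection refused"));
		System.out.println(isPlaceholder("Video Title: Demo, Description: d\nTranscript: hello"));
	}

}
```

Matching on message prefixes is brittle if the library changes its wording, so treat this as a stopgap rather than a contract.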

Summary

spring-ai-alibaba-starter-document-reader-bilibili provides BilibiliDocumentReader for parsing Bilibili video URLs. It makes two requests: one to fetch the title and description, and one to fetch the subtitles.
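The text of the transcript document can be reproduced without any HTTP or JSON dependency. The sketch below mirrors the formatting in fetchAndProcessSubtitles. The List<String> parameter is an assumption of this sketch, standing in for the "content" fields of the subtitle JSON "body" array. The class name and sample values are made up.

```java
import java.util.List;

class TranscriptFormatSketch {

	// Joins subtitle lines with single spaces, trims the result, and applies the
	// same format string as fetchAndProcessSubtitles.
	static String buildContent(String title, String description, List<String> subtitleLines) {
		StringBuilder rawTranscript = new StringBuilder();
		for (String line : subtitleLines) {
			rawTranscript.append(line).append(" ");
		}
		return String.format("Video Title: %s, Description: %s\nTranscript: %s", title, description,
				rawTranscript.toString().trim());
	}

	public static void main(String[] args) {
		// Sample values for illustration only.
		System.out.println(buildContent("Demo", "A sample video", List.of("hello", "world")));
	}

}
```

Because timestamps in the subtitle entries are discarded, the transcript comes out as one flat string, and any chunking for retrieval would have to happen downstream.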

doc

  • java2ai
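This article is part of the Tencent Cloud self-media sync exposure program and is shared from a WeChat official account. Originally published: 2025-04-18. In case of infringement, contact cloudcommunity@tencent for removal.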