How to work around the character limit when parsing documents with Apache Tika
Can anyone help me solve this?
You can do it like this:
Tika tika = new Tika();
tika.setMaxStringLength(10*1024*1024);
However, if you do not use the Tika facade directly, for example:
// The no-argument BodyContentHandler applies the default write limit.
ContentHandler textHandler = new BodyContentHandler();
Metadata metadata = new Metadata();
Parser parser = new AutoDetectParser();
ParseContext ps = new ParseContext();
for (InputStream is : getInputStreams()) {
    parser.parse(is, textHandler, metadata, ps);
    is.close();
    System.out.println("Title: " + metadata.get("title"));
    System.out.println("Author: " + metadata.get("Author"));
}
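you cannot set it, because you never interact with the WriteOutContentHandler yourself. Incidentally, its write limit is set to -1 in the constructor shown below, which means no limit. The handler created by the no-argument BodyContentHandler constructor, however, still ends up limited to 100,000 characters.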
/**
 * The maximum number of characters to write to the character stream.
 * Set to -1 for no limit.
 */
private final int writeLimit;

/**
 * Number of characters written so far.
 */
private int writeCount = 0;

private WriteOutContentHandler(Writer writer, int writeLimit) {
    this.writer = writer;
    this.writeLimit = writeLimit;
}

/**
 * Creates a content handler that writes character events to
 * the given writer.
 *
 * @param writer writer
 */
public WriteOutContentHandler(Writer writer) {
    this(writer, -1);
}
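To make the role of writeLimit and writeCount concrete, here is a minimal stand-alone sketch of how such a limit can behave. It is not Tika's code: the class LimitedWriterSketch, its boolean return value, and the extract helper are all hypothetical names invented for illustration, and it assumes that characters beyond the limit are simply dropped.

```java
import java.io.IOException;
import java.io.StringWriter;
import java.io.Writer;

// Hypothetical stand-in for the write-limit logic; this is NOT Tika's class.
class LimitedWriterSketch {
    private final Writer writer;
    private final int writeLimit;   // -1 means no limit
    private int writeCount = 0;     // number of characters written so far

    LimitedWriterSketch(Writer writer, int writeLimit) {
        this.writer = writer;
        this.writeLimit = writeLimit;
    }

    // Writes the characters, or only as many as still fit under the limit.
    // Returns false when the limit has cut the input short (an assumption of
    // this sketch; the excerpt above does not show how Tika reports it).
    boolean characters(char[] ch, int start, int length) throws IOException {
        if (writeLimit == -1 || writeCount + length <= writeLimit) {
            writer.write(ch, start, length);
            writeCount += length;
            return true;
        }
        int remaining = writeLimit - writeCount;
        writer.write(ch, start, remaining);
        writeCount = writeLimit;
        return false;
    }

    // Convenience helper: feeds one string through a fresh sketch instance.
    static String extract(String text, int writeLimit) {
        StringWriter out = new StringWriter();
        LimitedWriterSketch handler = new LimitedWriterSketch(out, writeLimit);
        try {
            handler.characters(text.toCharArray(), 0, text.length());
        } catch (IOException e) {
            throw new IllegalStateException(e); // not expected with a StringWriter
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(extract("hello world", 5));   // limit of 5 characters
        System.out.println(extract("hello world", -1));  // no limit
    }
}
```

With a limit of 5 the sketch keeps only "hello", while -1 lets the whole string through. The same idea explains why a long document comes back truncated when a finite limit is in effect.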
Solution
You must have overlooked that the content handler has a constructor that takes a write limit:
ContentHandler textHandler = new BodyContentHandler(-1); // int writeLimit; -1 means no limit