Vector scoring plugin in Solr 8.6

How to get the vector scoring plugin working in Solr 8.6

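I am trying to upgrade the VectorScoringPlugin to Solr 8.6. After checking the Lucene 8.x migration notes, I learned that the classes the plugin uses, viz. CustomScoreQuery and CustomScoreProvider, are deprecated, and that FunctionScoreQuery together with DoubleValuesSource must be used instead. I searched a lot but could not find any example of a custom scorer implemented with these classes. I stumbled upon two threads on the java-lucene forum [thread1, thread2] that discuss exactly this problem; the solution mentioned there is to implement a custom DoubleValuesSource class that contains the custom logic. Below is my implementation: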

import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.lucene.analysis.payloads.PayloadHelper;
import org.apache.lucene.index.LeafReaderContext;
import org.apache.lucene.index.PostingsEnum;
import org.apache.lucene.index.Terms;
import org.apache.lucene.index.TermsEnum;
import org.apache.lucene.search.DocIdSetIterator;
import org.apache.lucene.search.DoubleValues;
import org.apache.lucene.search.DoubleValuesSource;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.util.BytesRef;
import org.apache.solr.common.SolrException;

public class CustomDoubleValueSource extends DoubleValuesSource {

    List<Double> vector;
    private String field;
    private boolean cosine;
    double queryVectorNorm = 0;

    public CustomDoubleValueSource(String field,String Vector,boolean cosine) {
        // TODO Auto-generated constructor stub
        super();
        this.field = field;
        this.cosine = cosine;
        this.vector = new ArrayList<Double>();
        String[] vectorArray = Vector.split(",");
        for (int i = 0; i < vectorArray.length; i++) {
            double v = Double.parseDouble(vectorArray[i]);
            vector.add(v);
            if (cosine) {
                queryVectorNorm += Math.pow(v,2.0);
            }
        }
        
        System.out.println("Vector size:"+this.vector.size());
    }

    @Override
    public boolean isCacheable(LeafReaderContext ctx) {
        // TODO Auto-generated method stub
        return false;
    }

    @Override
    public DoubleValues getValues(LeafReaderContext ctx,DoubleValues scores) throws IOException {

        Terms terms = ctx.reader().terms(field);
        TermsEnum te = terms == null ? null : terms.iterator();
        
        System.out.println("Term size:"+terms.size());

        if (vector.size() != terms.size()) {
            throw new SolrException(SolrException.ErrorCode.BAD_REQUEST,"indexed and input vector array must have same length");
        }

        final PostingsEnum pe = te.postings(null);
        // TODO Auto-generated method stub
        return new DoubleValues() {

            @Override
            public double doubleValue() throws IOException {
                // TODO Auto-generated method stub
                float score = 0;
                double docVectorNorm = 0;
                BytesRef text;
                while ((text = te.next()) != null) {
                    String term = text.utf8ToString();
                    float payloadValue = 0f;
                    PostingsEnum postings = te.postings(null,PostingsEnum.ALL);
                    while (postings.nextDoc() != DocIdSetIterator.NO_MORE_DOCS) {
                        int freq = postings.freq();
                        while (freq-- > 0)
                            postings.nextPosition();

                        BytesRef payload = postings.getPayload();
                        payloadValue = PayloadHelper.decodeFloat(payload.bytes,payload.offset);

                        if (cosine)
                            docVectorNorm += Math.pow(payloadValue,2.0);
                    }

                    score = (float) (score + payloadValue * (vector.get(Integer.parseInt(term))));
                }

                if (cosine) {
                    if ((docVectorNorm == 0) || (queryVectorNorm == 0))
                        return 0f;
                    return (float) (score / (Math.sqrt(docVectorNorm) * Math.sqrt(queryVectorNorm)));
                }

                return score;
            }

            @Override
            public boolean advanceExact(int doc) throws IOException {
                // TODO Auto-generated method stub
                if (pe.docID() > doc)
                    return false;
                
                return pe.docID() == doc || pe.advance(doc) == doc;
            }
        };
    }

    @Override
    public boolean needsScores() {
        // TODO Auto-generated method stub
        return true;
    }

    @Override
    public DoubleValuesSource rewrite(IndexSearcher reader) throws IOException {
        // TODO Auto-generated method stub
        return null;
    }

    @Override
    public int hashCode() {
        // TODO Auto-generated method stub
        return 0;
    }

    @Override
    public boolean equals(Object obj) {
        // TODO Auto-generated method stub
        return false;
    }

    @Override
    public String toString() {
        // TODO Auto-generated method stub
        return null;
    }

}
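
For reference, the arithmetic that doubleValue() is meant to perform can be isolated from Lucene entirely. The following is only a minimal sketch under stated assumptions: the class name CosineSketch and its helper methods are hypothetical, and the document vector is passed in as a plain array instead of being decoded from term payloads. It mirrors the dot product, the cosine normalisation and the zero-norm guard from the class above.

```java
import java.util.Arrays;

// Hypothetical, Lucene-free sketch of the scoring arithmetic only.
final class CosineSketch {

    /** Parses a comma-separated vector such as "0.1,0.2,0.3". */
    static double[] parseVector(String csv) {
        return Arrays.stream(csv.split(","))
                .map(String::trim)
                .mapToDouble(Double::parseDouble)
                .toArray();
    }

    /** Dot product; both vectors must have the same length. */
    static double dot(double[] a, double[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException(
                    "indexed and input vector array must have same length");
        }
        double sum = 0;
        for (int i = 0; i < a.length; i++) {
            sum += a[i] * b[i];
        }
        return sum;
    }

    /** Cosine similarity; returns 0 when either vector has zero norm. */
    static double cosine(double[] query, double[] doc) {
        double queryNormSq = dot(query, query);
        double docNormSq = dot(doc, doc);
        if (queryNormSq == 0 || docNormSq == 0) {
            return 0.0;
        }
        return dot(query, doc) / (Math.sqrt(queryNormSq) * Math.sqrt(docNormSq));
    }

    public static void main(String[] args) {
        double[] query = parseVector("1.0, 2.0, 3.0");
        double[] doc = {2.0, 4.0, 6.0};
        System.out.println("dot = " + dot(query, doc));
        System.out.println("cosine = " + cosine(query, doc));
    }
}
```

With cosine set to false the plugin would correspond to dot(query, doc); with cosine set to true it would correspond to cosine(query, doc).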

I am using it from the following custom QParserPlugin class:

import org.apache.lucene.queries.function.FunctionScoreQuery;
import org.apache.lucene.search.Query;
import org.apache.solr.common.SolrException;
import org.apache.solr.common.params.SolrParams;
import org.apache.solr.request.SolrQueryRequest;
import org.apache.solr.schema.FieldType;
import org.apache.solr.search.QParser;
import org.apache.solr.search.QParserPlugin;
import org.apache.solr.search.QueryParsing;
import org.apache.solr.search.SyntaxError;

public class VectorQParserPlugin extends QParserPlugin {
    @Override
    public QParser createParser(String qstr,SolrParams localParams,SolrParams params,SolrQueryRequest req) {
        return new QParser(qstr,localParams,params,req) {
            @Override
            public Query parse() throws SyntaxError {
                String field = localParams.get(QueryParsing.F);
                String vector = localParams.get("vector");
                boolean cosine = localParams.getBool("cosine",true);

                if (field == null) {
                    throw new SolrException(SolrException.ErrorCode.BAD_REQUEST,"'f' not specified");
                }

                if (vector == null) {
                    throw new SolrException(SolrException.ErrorCode.BAD_REQUEST,"vector missing");
                }
                
                System.out.println("FIELD:"+field);

                Query subQuery = subQuery(localParams.get(QueryParsing.V),null).getQuery();

                FieldType ft = req.getCore().getLatestSchema().getFieldType(field);
                
                if(ft != null) {
                    System.out.println("in here");
                    VectorQuery q = new VectorQuery(subQuery);
                    q.setQueryString(localParams.toLocalParamsString()); 
                    query = q;
                }
            
                System.out.println("QUERY:"+query);
                if (query == null) {
                    throw new SolrException(SolrException.ErrorCode.BAD_REQUEST,"Query is null");
                }

                return FunctionScoreQuery.boostByValue(query,new CustomDoubleValueSource(field,vector,cosine));

            }
        };
    }
}

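In addition, I upgraded the custom query implementation to be specific to 8.6, to avoid the "Query does not implement createWeight" error.

Below is the VectorQuery implementation: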

import java.io.IOException;
import org.apache.lucene.index.LeafReaderContext;
import org.apache.lucene.search.ConstantScoreScorer;
import org.apache.lucene.search.ConstantScoreWeight;
import org.apache.lucene.search.DocIdSetIterator;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.ScoreMode;
import org.apache.lucene.search.Scorer;
import org.apache.lucene.search.Weight;
public class VectorQuery extends Query {
    String queryStr = "";
    Query q;
    public VectorQuery(Query subQuery) {
        this.q = subQuery;
    }
    
    public void setQueryString(String queryString){
        this.queryStr = queryString;
    }

    public Weight createWeight(IndexSearcher searcher,ScoreMode needsScores,float boost) throws IOException {
        Weight w;
        if(q == null){
            w =  new ConstantScoreWeight(this,boost) {
                @Override
                public Scorer scorer(LeafReaderContext context) throws IOException {
                    return new ConstantScoreScorer(this,score(),needsScores,DocIdSetIterator.all(context.reader().maxDoc()));
                }

                @Override
                public boolean isCacheable(LeafReaderContext ctx) {
                    // TODO Auto-generated method stub
                    return false;
                }
            };
        }else{
            w = searcher.createWeight(q,boost);
        }
        return w;
    }

    @Override
    public String toString(String field) {
        return queryStr;
    }

    @Override
    public boolean equals(Object other) {
        return sameClassAs(other) &&
                queryStr.equals(other.toString());
    }

    @Override
    public int hashCode() {
        return classHash() ^ queryStr.hashCode();
    }

}

I added print statements to check the execution flow, and the CustomDoubleValueSource class is being called. Below is a screenshot of the log.

[Screenshot: log output]

However, execution never reaches the getValues method. I get the following error:

2020-10-21 16:55:09.578 ERROR (qtp1962826816-19) [   x:example_vector] o.a.s.s.HttpSolrCall null:java.lang.NullPointerException
        at org.apache.lucene.queries.function.FunctionScoreQuery$MultiplicativeBoostValuesSource.getValues(FunctionScoreQuery.java:261)
        at org.apache.lucene.queries.function.FunctionScoreQuery$FunctionScoreWeight.scorer(FunctionScoreQuery.java:224)
        at org.apache.lucene.search.Weight.bulkScorer(Weight.java:181)
        at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:658)
        at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:445)
        at org.apache.solr.search.SolrIndexSearcher.buildAndRunCollectorChain(SolrIndexSearcher.java:208)
        at org.apache.solr.search.SolrIndexSearcher.getDocListNC(SolrIndexSearcher.java:1593)
        at org.apache.solr.search.SolrIndexSearcher.getDocListC(SolrIndexSearcher.java:1410)
        at org.apache.solr.search.SolrIndexSearcher.search(SolrIndexSearcher.java:593)
        at org.apache.solr.handler.component.QueryComponent.doProcessUngroupedSearch(QueryComponent.java:1513)
        at org.apache.solr.handler.component.QueryComponent.process(QueryComponent.java:403)
        at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:331)
        at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:214)
        at org.apache.solr.core.SolrCore.execute(SolrCore.java:2606)
        at org.apache.solr.servlet.HttpSolrCall.execute(HttpSolrCall.java:812)
        at org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:588)
        at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:415)
        at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:345)
        at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1596)
        at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:545)
        at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143)
        at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:590)
        at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:127)
        at org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:235)
        at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:1610)
        at org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:233)
        at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1300)
        at org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:188)
        at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:485)
        at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:1580)
        at org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:186)
        at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1215)
        at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)
        at org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:221)
        at org.eclipse.jetty.server.handler.InetAccessHandler.handle(InetAccessHandler.java:177)
        at org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:146)
        at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:127)
        at org.eclipse.jetty.rewrite.handler.RewriteHandler.handle(RewriteHandler.java:322)
        at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:127)
        at org.eclipse.jetty.server.Server.handle(Server.java:500)
        at org.eclipse.jetty.server.HttpChannel.lambda$handle$1(HttpChannel.java:383)
        at org.eclipse.jetty.server.HttpChannel.dispatch(HttpChannel.java:547)
        at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:375)
        at org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:273)
        at org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:311)
        at org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:103)
        at org.eclipse.jetty.io.ChannelEndPoint$2.run(ChannelEndPoint.java:117)
        at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:806)
        at org.eclipse.jetty.util.thread.QueuedThreadPool$Runner.run(QueuedThreadPool.java:938)
        at java.lang.Thread.run(Thread.java:748)

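I strongly suspect this has to do with the unimplemented methods in CustomDoubleValueSource. I read the Javadocs for DoubleValuesSource, but they are not very descriptive and contain no examples.

Any help that gets me moving forward is appreciated :)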
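
One plausible reading of the trace, offered as an assumption and not as a confirmed diagnosis: the exception is raised inside MultiplicativeBoostValuesSource.getValues, before the custom getValues runs, which would be consistent with the wrapper holding a null delegate because the stubbed rewrite method returns null. The sketch below reproduces that mechanism with hypothetical stand-in types (ValueSource and BoostWrapper are not Lucene classes), so it only illustrates the pattern and says nothing definitive about Lucene's internals.

```java
// Hypothetical stand-ins; none of these types are part of Lucene.
final class RewriteNullSketch {

    /** Stand-in for a value source that can be rewritten before use. */
    interface ValueSource {
        double value(int doc);
        ValueSource rewrite();
    }

    /** Stand-in for a wrapper that rewrites its delegate and then reads from it. */
    static final class BoostWrapper implements ValueSource {
        private final ValueSource boost;

        BoostWrapper(ValueSource boost) {
            this.boost = boost;
        }

        @Override
        public double value(int doc) {
            // Throws NullPointerException if the delegate was rewritten to null.
            return boost.value(doc);
        }

        @Override
        public ValueSource rewrite() {
            return new BoostWrapper(boost.rewrite());
        }
    }

    /** Builds a source whose rewrite() returns either null (the stub) or itself. */
    static ValueSource source(boolean returnsNullOnRewrite) {
        return new ValueSource() {
            @Override
            public double value(int doc) {
                return doc * 0.5;
            }

            @Override
            public ValueSource rewrite() {
                return returnsNullOnRewrite ? null : this;
            }
        };
    }

    /** Wraps, rewrites, then reads a value; reports an NPE instead of propagating it. */
    static String run(ValueSource inner) {
        try {
            ValueSource rewritten = new BoostWrapper(inner).rewrite();
            return "value=" + rewritten.value(4);
        } catch (NullPointerException e) {
            return "NullPointerException";
        }
    }

    public static void main(String[] args) {
        System.out.println("rewrite returns null: " + run(source(true)));
        System.out.println("rewrite returns this: " + run(source(false)));
    }
}
```

If the same mechanism applies here, returning this from rewrite(IndexSearcher) would be the first thing to try. Giving equals, hashCode and toString value-based implementations built from field, vector and cosine may also be worth doing, since the stubs currently return false, 0 and null.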
