How to count the frequency of each string in a query separately

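I want to search for a query in a file named a.java. If my query is "String name", I want to get the frequency of each word of the query separately from the text file. First I need to count the frequency of "String", then of "name", and then add the two together. How can I implement this program in Java?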

public class Tf2 {
Integer k;
int totalword = 0;
int totalfile,containwordfile = 0;
Map<String,Integer> documentToCount = new HashMap<>();
File file = new File("H:/java");
File[] files = file.listFiles();
public void Count(String word) {
   File[] files = file.listFiles();
    Integer count = 0;
    for (File f : files) {
        BufferedReader br = null;
        try {
            br = new BufferedReader(new FileReader(f));
            count = documentToCount.get(word);

            documentToCount.clear();

            String line;
            while ((line = br.readLine()) != null) {
                String term[] = line.trim().replaceAll("[^a-zA-Z0-9 ]"," ").toLowerCase().split(" ");


                for (String terms : term) {
                    totalword++;
                    if (count == null) {
                        count = 0;
                    }
                    if (documentToCount.containsKey(word)) {

                        count = documentToCount.get(word);
                        documentToCount.put(terms,count + 1);
                    } else {
                        documentToCount.put(terms,1);

                    }

                }

            }
          k = documentToCount.get(word);

            if (documentToCount.get(word) != null) {
                containwordfile++;
       
               System.out.println("" + k);

            }

        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}

public static void main(String[] args) throws IOException {
    Tf2 ob = new Tf2();
    String query = "String name";
    ob.Count(query);
}
}

I tried this with a HashMap, but it cannot count the frequency of each query word separately.

Solution

Here is an example that uses Collections.frequency to get the count of a string in a file:

public void Count(String word) {
    File f = new File("/your/path/text.txt");
    List<String> list = new ArrayList<String>();
    if (f.exists() && f.isFile()) {
        // try-with-resources closes the reader even if reading fails
        try (BufferedReader br = new BufferedReader(new FileReader(f))) {
            String line;
            while ((line = br.readLine()) != null) {
                // collect every space-separated token of the line
                String[] arr = line.split(" ");
                for (String str : arr) {
                    list.add(str);
                }
            }
            System.out.println("Frequency = " + Collections.frequency(list, word));
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}

Here is another example using the Java Streams API, which also works for a multi-file search within a directory:

    public class Test {

    public static void main(String[] args) {
        File file = new File("C:/path/to/your/files/");
        String targetWord = "stringtofind";
        long numOccurances = 0;

        if(file.isFile() && file.getName().endsWith(".txt")){

            numOccurances = getLineStreamFromFile(file)
                    .flatMap(str -> Arrays.stream(str.split("\\s")))
                    .filter(str -> str.equals(targetWord))
                    .count();

        } else if(file.isDirectory()) {

            numOccurances = Arrays.stream(file.listFiles(pathname -> pathname.toString().endsWith(".txt")))
                    .flatMap(Test::getLineStreamFromFile)
                    .flatMap(str -> Arrays.stream(str.split("\\s")))
                    .filter(str -> str.equals(targetWord))
                    .count();
        }

        System.out.println(numOccurances);
    }

    public static Stream<String> getLineStreamFromFile(File file){
        try {
            return Files.lines(file.toPath());
        } catch (IOException e) {
            e.printStackTrace();
        }
        return Stream.empty();
    }
  }

In addition, you can split the input string into individual words and loop over them to get the number of occurrences of each word.
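As a rough sketch of that last suggestion, the query can be split on whitespace and each word counted with Collections.frequency. The class name QueryWordCounts, the helper countEach and the sample text are made up for illustration; in the real program the token list would come from the file, as in the first example.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class QueryWordCounts {

    // Counts how often each whitespace-separated query word appears among the tokens.
    // (Hypothetical helper, not part of the original answer.)
    static Map<String, Integer> countEach(List<String> tokens, String query) {
        Map<String, Integer> counts = new LinkedHashMap<>(); // keeps the query order
        for (String q : query.trim().split("\\s+")) {
            counts.put(q, Collections.frequency(tokens, q));
        }
        return counts;
    }

    public static void main(String[] args) {
        // Stand-in for the words read from a file (assumed sample text).
        List<String> tokens = Arrays.asList("String name = other name ;".split("\\s+"));

        Map<String, Integer> counts = countEach(tokens, "String name");
        int total = 0;
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            System.out.println(e.getKey() + ": " + e.getValue());
            total += e.getValue();
        }
        System.out.println("total: " + total);
    }
}
```

With the sample tokens this prints String: 1, name: 2 and total: 3. Collections.frequency compares with equals, so the match is exact and case-sensitive.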

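You are overcomplicating things. If all you need to do is count occurrences, you do not need a hash map or anything like it. All you need to do is go through all the text in the document and count how many times you find the search string.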

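Basically, your workflow is:

1. Instantiate a counter to 0
2. Read the text
3. Iterate through the text, looking for the search string
4. When the search string is found, increment the counter
5. When done iterating through the text, print the result of the counter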

If the text is very long, you can do this line by line, or read it in batches.

Here is a simple example. Suppose I have a file and am looking for the word "dog".

// 1. instantiate counter to 0
int count = 0;

// 2. read text
Path path = ...; // path to my input file
String text = Files.readString(path,StandardCharsets.US_ASCII);

// 3-4. find instances of the string in the text
String searchString = "dog";

int lastIndex = 0;
while (lastIndex != -1) {
  lastIndex = text.indexOf(searchString,lastIndex); // will resolve -1 if the searchString is not found
  if (lastIndex != -1) {
    count++; // increment counter
    lastIndex += searchString.length(); // increment index by length of search term
  }
}

// 5. print result of counter
System.out.println("Found " + count + " instances of " + searchString);
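For a long file, the same loop can run one line at a time instead of on the whole text. The following is only a sketch under two assumptions: a match never spans a line break, and the search string is not empty. The class name LineByLineCounter is made up, and a StringReader stands in for a FileReader on the real file.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;

public class LineByLineCounter {

    // Counts non-overlapping occurrences of searchString, reading one line at a time.
    // Assumption: a match never spans a line break.
    static int countOccurrences(Reader source, String searchString) {
        if (searchString.isEmpty()) {
            throw new IllegalArgumentException("searchString must not be empty");
        }
        int count = 0;
        try (BufferedReader br = new BufferedReader(source)) {
            String line;
            while ((line = br.readLine()) != null) {
                int index = line.indexOf(searchString);
                while (index != -1) {
                    count++;
                    // continue searching after the end of the current match
                    index = line.indexOf(searchString, index + searchString.length());
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return count;
    }

    public static void main(String[] args) {
        // A StringReader stands in for a FileReader on a large file.
        Reader sample = new StringReader("the dog saw a dog\nno match here\ndog");
        System.out.println(countOccurrences(sample, "dog")); // prints 3
    }
}
```

Only one line is held in memory at a time, so the size of the file does not matter much.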

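In your specific example, you would read the contents of the a.java class and then find the number of instances of "String", followed by the number of instances of "name". You can add them up at your leisure. So for each word you search for, you repeat steps 3 and 4, and then sum all the counts at the end.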

Of course, the simplest way to do this is to wrap steps 3 and 4 in a method that returns the count.

int countOccurrences(String searchString,String text) {
  int count = 0;
  int lastIndex = 0;
  while (lastIndex != -1) {
    lastIndex = text.indexOf(searchString,lastIndex);
    if (lastIndex != -1) {
      count++;
      lastIndex += searchString.length();
    }
  }
  return count;
}

// Call:
int nameCount = countOccurrences("name",text);
int stringCount = countOccurrences("String",text);

System.out.println("Counted " + nameCount + " instances of 'name' and " + stringCount + " instances of 'String', for a total of " + (nameCount + stringCount));

(Whether you call toLowerCase() on text depends on whether you need case-sensitive matching.)
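If case should be ignored, one possible approach is to lower-case both the text and the search string before running the same indexOf loop. This is a sketch only. The class name CaseInsensitiveCount and the sample text are invented for the example, and Locale.ROOT is used so the result does not depend on the machine's default locale.

```java
import java.util.Locale;

public class CaseInsensitiveCount {

    // Counts non-overlapping occurrences of searchString in text, ignoring case.
    static int countIgnoreCase(String searchString, String text) {
        if (searchString.isEmpty()) {
            throw new IllegalArgumentException("searchString must not be empty");
        }
        String haystack = text.toLowerCase(Locale.ROOT);
        String needle = searchString.toLowerCase(Locale.ROOT);

        int count = 0;
        int index = haystack.indexOf(needle);
        while (index != -1) {
            count++;
            index = haystack.indexOf(needle, index + needle.length());
        }
        return count;
    }

    public static void main(String[] args) {
        String text = "String name; string other; STRING third;"; // assumed sample text
        System.out.println(countIgnoreCase("String", text)); // prints 3
    }
}
```

A case-sensitive search for "String" in the same sample text would find only the first occurrence.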

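Of course, if you want only 'name' and not 'lastName', then you will need to start considering things like word boundaries (the regular expression boundary matcher \b is useful here). For parsing printed text, you would also need to consider words hyphenated across line endings. But it sounds like your use case is simply counting instances of individual words that happen to be given to you as a space-delimited string.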
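To illustrate the word-boundary idea, here is a small sketch using java.util.regex. The class name WholeWordCount and the sample text are assumptions made for the example. Pattern.quote keeps any regex metacharacters in the search word from being interpreted, and \b behaves as expected here because the word starts and ends with a letter.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class WholeWordCount {

    // Counts occurrences of word that are not part of a longer word.
    static int countWholeWord(String word, String text) {
        Pattern pattern = Pattern.compile("\\b" + Pattern.quote(word) + "\\b");
        Matcher matcher = pattern.matcher(text);
        int count = 0;
        while (matcher.find()) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        String text = "String name = lastname + name;"; // assumed sample text
        // "lastname" is skipped because "name" is preceded by a word character there.
        System.out.println(countWholeWord("name", text)); // prints 2
    }
}
```

A plain indexOf loop over the same text would report 3, because it also counts the "name" inside "lastname".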

If you actually want only instances of String name as a single phrase, then just use the first workflow.

Other helpful answers:

You can use a map with the word as the key and the count as the value:

  public static void main(String[] args) {
    String corpus =
        "Wikipedia is a free online encyclopedia, created and edited by volunteers around the world";
    String query = "edited Wikipedia volunteers";

    Map<String,Integer> word2count = new HashMap<>();
    for (String word : corpus.split(" ")) {
      if (!word2count.containsKey(word))
        word2count.put(word,0);
      word2count.put(word,word2count.get(word) + 1);
    }

    for (String q : query.split(" "))
      System.out.println(q + ": " + word2count.get(q));
  }

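If I have a file containing the line "Wikipedia is a free online encyclopedia, created and edited by volunteers around the world." and want to search for the query "edited Wikipedia volunteers", my program should first count the frequency of "edited" in the text file, then the frequency of "Wikipedia", then the frequency of "volunteers", and finally sum all the frequencies. Can I solve this using a HashMap?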

You can do it as follows:

import java.util.HashMap;
import java.util.Map;

public class Main {
    public static void main(String[] args) {
        // The given string
        String str = "Wikipedia is a free online encyclopedia, created and edited by volunteers around the world.";

        // The query string
        String query = "edited Wikipedia volunteers";

        // Split the given string and the query string on space
        String[] strArr = str.split("\\s+");
        String[] queryArr = query.split("\\s+");

        // Map to hold the frequency of each word of query in the string
        Map<String,Integer> map = new HashMap<>();

        for (String q : queryArr) {
            for (String s : strArr) {
                if (q.equals(s)) {
                    map.put(q,map.getOrDefault(q,0) + 1);
                }
            }
        }

        // Display the map
        System.out.println(map);

        // Get the sum of all frequencies
        int sumFrequencies = map.values().stream().mapToInt(Integer::intValue).sum();

        System.out.println("Sum of frequencies: " + sumFrequencies);
    }
}

Output:

{edited=1, Wikipedia=1, volunteers=1}
Sum of frequencies: 3

Check the documentation of Map#getOrDefault to learn more.

Update

In the original answer, I used the Java Stream API to get the sum of the values. An alternative way is given below:

// Get the sum of all frequencies
int sumFrequencies = 0;
for (int value : map.values()) {
    sumFrequencies += value;
}

Your other question was:

If there are multiple files in a folder, how do I know in which files the query occurs and how many times?

You can create a Map<String,Map<String,Integer>> in which the key is the name of the file and the value (i.e. Map<String,Integer>) is the frequency map for that file. I have already shown the algorithm for creating this frequency map above. All you need to do is loop through the list of files and populate this outer map (Map<String,Map<String,Integer>>).
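A minimal sketch of that nested map could look like the following. The class name PerFileFrequencies, the helpers frequencyMap and countPerFile, and the temporary demo files are all made up for illustration. It also assumes the files are UTF-8 text and that only regular files directly inside the folder should be scanned.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

public class PerFileFrequencies {

    // Builds the frequency map of the query words for the lines of one file.
    static Map<String, Integer> frequencyMap(List<String> lines, String query) {
        Set<String> queryWords = new HashSet<>(Arrays.asList(query.trim().split("\\s+")));
        Map<String, Integer> map = new HashMap<>();
        for (String line : lines) {
            for (String token : line.trim().split("\\s+")) {
                if (queryWords.contains(token)) {
                    map.merge(token, 1, Integer::sum); // add 1, starting from 1 if absent
                }
            }
        }
        return map;
    }

    // Outer key: file name. Value: that file's frequency map.
    static Map<String, Map<String, Integer>> countPerFile(Path dir, String query) {
        Map<String, Map<String, Integer>> result = new TreeMap<>();
        try (DirectoryStream<Path> files = Files.newDirectoryStream(dir)) {
            for (Path file : files) {
                if (Files.isRegularFile(file)) {
                    result.put(file.getFileName().toString(),
                            frequencyMap(Files.readAllLines(file), query));
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return result;
    }

    public static void main(String[] args) throws IOException {
        // Two small demo files in a temporary folder (assumed sample data).
        Path dir = Files.createTempDirectory("freq-demo");
        Files.write(dir.resolve("a.txt"),
                Arrays.asList("Wikipedia is edited by volunteers", "edited daily"));
        Files.write(dir.resolve("b.txt"), Arrays.asList("nothing relevant here"));

        Map<String, Map<String, Integer>> perFile =
                countPerFile(dir, "edited Wikipedia volunteers");
        perFile.forEach((name, freq) -> System.out.println(name + " -> " + freq));

        // A file "contains" the query if its frequency map is not empty.
        long filesWithMatch = perFile.values().stream().filter(m -> !m.isEmpty()).count();
        System.out.println("Files containing at least one query word: " + filesWithMatch);
    }
}
```

The number of files in which the query occurs falls out of the same structure: it is the number of inner maps that are not empty.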
