MySQL: Counting how many times each word occurs in a column

For example, suppose I have data like this in a column:
data
I love book
I love apple
I love book
I hate apple
I hate apple
How can I get a result like this?
I = 5
love = 3
hate = 2
book = 2
apple = 3
Can this be done in MySQL?

Solution

Here is a solution that uses only a query:
SELECT SUM(total_count) AS total, value
FROM (

-- Strip trailing punctuation so that "hello" and "hello!" end up as the same word
SELECT COUNT(*) AS total_count, REPLACE(REPLACE(REPLACE(x.value, '?', ''), '.', ''), '!', '') AS value
FROM (
-- Take the n-th space-separated word of each sentence
SELECT SUBSTRING_INDEX(SUBSTRING_INDEX(t.sentence, ' ', n.n), ' ', -1) value
  FROM table_name t CROSS JOIN
(
   -- Numbers 1..100: the maximum number of words handled per row
   SELECT a.N + b.N * 10 + 1 n
     FROM
    (SELECT 0 AS N UNION ALL SELECT 1 UNION ALL SELECT 2 UNION ALL SELECT 3 UNION ALL SELECT 4 UNION ALL SELECT 5 UNION ALL SELECT 6 UNION ALL SELECT 7 UNION ALL SELECT 8 UNION ALL SELECT 9) a,
    (SELECT 0 AS N UNION ALL SELECT 1 UNION ALL SELECT 2 UNION ALL SELECT 3 UNION ALL SELECT 4 UNION ALL SELECT 5 UNION ALL SELECT 6 UNION ALL SELECT 7 UNION ALL SELECT 8 UNION ALL SELECT 9) b
    ORDER BY n
) n
 -- Word count of a sentence = number of spaces + 1
 WHERE n.n <= 1 + (LENGTH(t.sentence) - LENGTH(REPLACE(t.sentence, ' ', '')))
 ORDER BY value

) AS x
GROUP BY x.value

) AS y
GROUP BY value
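The word-splitting step in the query above rests on two small tricks. The following is a minimal sketch of them on a literal string, not part of the original answer; the sample string and the column aliases are only for illustration.

```sql
-- Trick 1: nested SUBSTRING_INDEX picks the n-th space-separated word.
-- The inner call keeps everything up to the 2nd space ('I love');
-- the outer call then keeps the last word of that prefix.
SELECT SUBSTRING_INDEX(SUBSTRING_INDEX('I love book', ' ', 2), ' ', -1) AS second_word;
-- second_word: love

-- Trick 2: the number of words is the number of spaces plus one,
-- obtained by comparing the length with and without spaces.
SELECT 1 + LENGTH('I love book') - LENGTH(REPLACE('I love book', ' ', '')) AS word_count;
-- word_count: 3
```

Joining each row against the numbers 1..100 and keeping only n <= word_count therefore yields one output row per word.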
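Here is the full working fiddle: http://sqlfiddle.com/#!2/17481a/1

First, we run a query to extract all the words, as @peterm described (follow his instructions if you want to customize the total number of words processed). We then turn that into a subquery and COUNT and GROUP BY the value of each word. Finally, we run one more query on top of that to GROUP BY the words that were not grouped together because of accompanying punctuation, stripping it with REPLACE, e.g. hello = hello!

If you want to perform this kind of text analysis, I would suggest using something like Lucene to get the term count for each term in the documents.

If your table is of any decent size, this query will take a long time to run. It is probably better to keep track of the counts in a separate table and update that table as values are inserted or, if real-time results are not required, to run this query only every so often to refresh the counts table and read the data from there. That way you do not spend minutes getting data out of this complex query.

Here is what I have for you so far. It is a good start. The only thing you need to do is modify it to iterate through the words in each row. You could use a cursor or a subquery.

Create the test table: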
create table tbl(str varchar(100) );
insert into tbl values('data');
insert into tbl values('I love book');
insert into tbl values('I love apple');
insert into tbl values('I love book');
insert into tbl values('I hate apple');
insert into tbl values('I hate apple');
Pull the data from the test table (note that this counts whole rows, not individual words yet):
SELECT DISTINCT str AS Word,COUNT(str) AS Frequency FROM tbl GROUP BY str;
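One possible way to make that query iterate over the words in each row is a recursive common table expression. This is only a sketch, not the original author's method: it assumes MySQL 8.0 or later, the `tbl` table created above, and single spaces as the only separator. The CTE name `words` and its columns `word` and `rest` are made up for the example.

```sql
WITH RECURSIVE words (word, rest) AS (
    -- Anchor: first word of each row, plus whatever follows it
    SELECT SUBSTRING_INDEX(str, ' ', 1),
           SUBSTRING(str, CHAR_LENGTH(SUBSTRING_INDEX(str, ' ', 1)) + 2)
    FROM tbl
    UNION ALL
    -- Recursive step: peel the next word off the remainder
    SELECT SUBSTRING_INDEX(rest, ' ', 1),
           SUBSTRING(rest, CHAR_LENGTH(SUBSTRING_INDEX(rest, ' ', 1)) + 2)
    FROM words
    WHERE rest <> ''
)
SELECT word AS Word, COUNT(*) AS Frequency
FROM words
WHERE word <> ''        -- skip empty tokens caused by repeated spaces
GROUP BY word;
```

With the six sample rows this should report I = 5, love = 3, apple = 3, book = 2, hate = 2 and data = 1, because the value 'data' was inserted as a row of its own.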

Create a user-defined function like this and use it in your query:
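DELIMITER $$

CREATE FUNCTION `getCount`(myStr VARCHAR(1000), myword VARCHAR(100))
    RETURNS INT
    DETERMINISTIC
    BEGIN
    -- Counts non-overlapping occurrences of myword inside myStr.
    -- Note: this matches substrings, not whole words.
    DECLARE cnt INT DEFAULT 0;
    DECLARE result INT DEFAULT 1;

    -- An empty search string would never advance and loop forever
    IF CHAR_LENGTH(myword) = 0 THEN
        RETURN 0;
    END IF;

    WHILE (result > 0) DO
    SET result = INSTR(myStr, myword);
    IF (result > 0) THEN
        SET cnt = cnt + 1;
        -- INSTR returns a character position, so advance by CHAR_LENGTH rather than LENGTH (bytes)
        SET myStr = SUBSTRING(myStr, result + CHAR_LENGTH(myword));
    END IF;
    END WHILE;
    RETURN cnt;

    END$$

DELIMITER ;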
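A possible way to call the function, assuming the `tbl` test table with its `str` column from the earlier answer (that pairing is an assumption; the original post does not show a call):

```sql
-- Sum the per-row occurrence counts of one word over the whole column
SELECT SUM(getCount(str, 'apple')) AS apple_count
FROM tbl;
-- apple_count: 3 for the six sample rows
```

Because the function matches substrings, a short search term such as 'I' would also be counted inside longer words, and it has to be called once per word of interest. It does not produce the full word list on its own.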
Hope that helps.

The split-string procedure is not my work. You can find it here: http://forge.mysql.com/tools/tool.php?id=4
I wrote the rest of the code for you.
drop table if exists mytable;
create table mytable (
id int not null auto_increment primary key,mytext varchar(1000)
) engine = myisam;

insert into mytable (mytext)
values ('I love book,but book sucks!What do you,think   about it? me too'),
('I love apple! it rulez.,No,it sucks a lot!!!'),
('I love book'),
('I hate apple!!! Me too.,!'),
('I hate apple');

drop table if exists mywords;
create table mywords (
id int not null auto_increment primary key,word varchar(50)
) engine = myisam;


delimiter //
drop procedure if exists split_string //
create procedure split_string (
    in input text,in `delimiter` varchar(10) 
) 
sql security invoker
begin
    declare cur_position int default 1 ;
    declare remainder text;
    declare cur_string varchar(1000);
    declare delimiter_length tinyint unsigned;

    drop temporary table if exists SplitValues;
    create temporary table SplitValues (
        value varchar(1000) not null 
    ) engine=myisam;

    set remainder = input;
    set delimiter_length = char_length(delimiter);

    while char_length(remainder) > 0 and cur_position > 0 do
        set cur_position = instr(remainder,`delimiter`);
        if cur_position = 0 then
            set cur_string = remainder;
        else
            set cur_string = left(remainder,cur_position - 1);
        end if;
        if trim(cur_string) != '' then
            insert into SplitValues values (cur_string);
        end if;
        set remainder = substring(remainder,cur_position + delimiter_length);
    end while;

end //
delimiter ;


delimiter // 
drop procedure if exists single_words//
create procedure single_words()
begin
declare finish int default 0;
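declare str varchar(1000);
-- replace the punctuation found in the sample data (, . ? !) with spaces before splitting
declare cur_table cursor for select replace(replace(replace(replace(mytext, ',', ' '), '.', ' '), '?', ' '), '!', ' ') from mytable;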
declare continue handler for not found set finish = 1;
truncate table mywords;
open cur_table;
my_loop:loop
fetch cur_table into str;
if finish = 1 then
leave my_loop;
end if;
call split_string(str, ' ');
insert into mywords (word) select * from SplitValues;
end loop;
close cur_table;
end //
delimiter ;

call single_words();

select word,count(*) as word_count 
from mywords
group by word;

+-------+------------+
| word  | word_count |
+-------+------------+
| a     |          1 |
| about |          1 |
| apple |          3 |
| book  |          3 |
| but   |          1 |
| do    |          1 |
| hate  |          2 |
| I     |          5 |
| it    |          3 |
| lot   |          1 |
| love  |          3 |
| me    |          2 |
| No    |          1 |
| rulez |          1 |
| sucks |          2 |
| think |          1 |
| too   |          2 |
| What  |          1 |
| you   |          1 |
+-------+------------+
19 rows in set (0.00 sec)
The code still has to be improved to handle arbitrary punctuation, but this is the general idea.
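One possible direction for that improvement, sketched under the assumption that MySQL 8.0 or later is available (it provides `REGEXP_REPLACE`), is to replace every run of punctuation with a single space in one call instead of nesting one `replace` per character. This was not part of the original answer.

```sql
-- Assumes MySQL 8.0+ and the mytable table defined above.
-- [[:punct:]] matches punctuation characters such as , . ? !
SELECT REGEXP_REPLACE(mytext, '[[:punct:]]+', ' ') AS cleaned
FROM mytable;
```

The same expression could be used in the cursor's select inside `single_words`. Any doubled spaces it leaves behind are harmless, since `split_string` skips empty tokens.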
