How to convert the topic list from gensim LDA get_document_topics() into a DataFrame

I have done some topic modelling with gensim.models.ldamodel.LdaModel(), and I want to label my data so that I can visualize my findings.

Here is what I have so far:

My current DataFrame has the following columns:

['text', 'date', 'gender', 'tokens', 'topics', 'main_topic']
    

text is plain text data, date has the form yyyy-mm-dd, gender is binary (1 for female), tokens is the preprocessed text, and topics comes from:

df['topics'] = LDA_model.get_document_topics(corpus)

and main_topic is populated as follows, slightly differently from the second answer to this post:

df['main_topic'] = [int(str(sorted(LDA_model[i],reverse=True,key=lambda x: x[1])[0][0]).zfill(3)) for i in corpus]
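The sort/zfill/int chain above keeps only the ID of the topic with the highest probability: zfill(3) pads the ID to three digits and int() immediately strips the padding again. A minimal sketch, assuming LDA_model[doc] returns (topic_id, probability) pairs as in the tables in this post, compares that expression with a plain max() using the same key. Two rows are hard-coded from those tables so the snippet runs without gensim; the function names are illustrative only.

```python
# Sample rows in the format returned by LDA_model[doc]: (topic_id, probability) pairs.
# Values copied from the tables in this post; no gensim model is needed here.
doc_topics = [
    [(0, 0.051341455), (1, 0.21204428), (2, 0.1145254),
     (4, 0.055585753), (11, 0.20260869), (29, 0.25616828)],
    [(1, 0.055395175), (5, 0.0647138), (7, 0.13507782), (9, 0.055264555),
     (13, 0.19258575), (21, 0.05181323), (27, 0.07139948)],
]


def main_topic_original(topics):
    # The expression from the question: sort by probability (descending),
    # take the first topic ID, zero-pad it to 3 digits, convert back to int.
    return int(str(sorted(topics, reverse=True, key=lambda x: x[1])[0][0]).zfill(3))


def main_topic_max(topics):
    # Same result: the topic ID of the pair with the largest probability.
    # Ties resolve identically, because sorted(..., reverse=True) is stable
    # and max() returns the first maximal element.
    return max(topics, key=lambda x: x[1])[0]


for topics in doc_topics:
    print(main_topic_original(topics), main_topic_max(topics))
# prints "29 29" then "13 13"
```

Applied to the question's list comprehension, this would read `[max(LDA_model[doc], key=lambda x: x[1])[0] for doc in corpus]`.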

Finally, the first 10 rows of topics and main_topic look like this (note num_topics=30):

    topics  main_topic
[(0,0.051341455),(1,0.21204428),(2,0.1145254),(4,0.055585753),(11,0.20260869),(29,0.25616828)]   29
[(0,0.052005265),0.21128647),0.08015486),(3,0.11465485),0.4478401)]  29
[(0,0.05355798),0.1394092),0.10734849),0.32699445),0.273105)] 4
[(0,0.053568278),0.22299954),0.22616898),0.0959242),0.2897638)]  29
[(0,0.05404401),0.4482777),0.141311),0.24849494)]  1
[(0,0.054245334),0.18933308),0.14567153),0.11169399),(23,0.05768766),0.35825193)]   29
[(0,0.05449035),0.114870586),0.13284092),0.075592585),0.13247918),(24,0.06598773),0.32016253)]   29
[(0,0.055871632),0.23100668),0.06832383),0.4730603)]   29
[(0,0.057746172),0.057121024),0.07247137),0.26388222),(13,0.07291462),0.34331965)]  29
[(0,0.057841185),0.19891246),0.09586754),0.5344914)]   29

Now, here is what I want:

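I want 30 new columns: "topic 0", "topic 1", "topic 2", ..., "topic 29". For each row, starting with the first, I want to take the values from df['topics'] and store them in the new columns, so that:

row 1, topic 0 = 0.0513414; row 1, topic 1 = 0.21204; row 1, topic 2 = 0.11452; row 1, topic 3 = 0; and so on.

But I don't know how to do this. Can anyone help?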
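Each entry of df['topics'] is a sparse list: get_document_topics() only returns topics whose probability reaches the model's minimum_probability threshold, so any topic ID missing from a row should become 0 in the wide layout. A minimal sketch of that mapping for a single row, using the first row of the table above and a hypothetical helper named `to_dense`:

```python
NUM_TOPICS = 30  # num_topics used when training the model in the question


def to_dense(sparse_topics, num_topics=NUM_TOPICS):
    # Hypothetical helper: turn [(topic_id, prob), ...] into a list of length
    # num_topics, with 0.0 for every topic ID that gensim left out.
    dense = [0.0] * num_topics
    for topic_id, prob in sparse_topics:
        dense[topic_id] = float(prob)
    return dense


# First row of df['topics'] from the table above
row_1 = [(0, 0.051341455), (1, 0.21204428), (2, 0.1145254),
         (4, 0.055585753), (11, 0.20260869), (29, 0.25616828)]

dense = to_dense(row_1)
named = {f"topic {k}": p for k, p in enumerate(dense)}
print(named["topic 0"], named["topic 1"], named["topic 3"], named["topic 29"])
# prints: 0.051341455 0.21204428 0.0 0.25616828
```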

Solution

I figured it out. In case anyone wants to do the same thing:

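import gensim
import pandas as pd

# The model trained earlier; the remaining training arguments are omitted here
LDA_model = gensim.models.ldamodel.LdaModel(corpus, num_topics=30)

df['topics'] = LDA_model.get_document_topics(corpus)

sf = pd.DataFrame(data=df['topics'])
af = pd.DataFrame()

# One empty column per topic ID: "0", "1", ..., "29"
for i in range(30):
    af[str(i)] = []

# Concatenating aligns the empty topic columns with sf's rows;
# the resulting NaNs are replaced by 0
frames = [sf, af]
af = pd.concat(frames).fillna(0)

# Write each (topic_id, probability) pair into its column.
# af.loc[row, col] assigns in place; the chained form af[col].loc[row]
# may silently modify a copy instead of af.
for idx, topics in df['topics'].items():
    for topic_id, prob in topics:
        af.loc[idx, str(topic_id)] = prob

Note that 30 is my num_topics, and my df['topics'] has 6301 rows.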
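The row-by-row .loc writes work, but each one is a separate pandas call. A vectorized alternative, sketched under the assumption that df['topics'] holds plain lists of (topic_id, probability) pairs: turn each list into a dict, let pandas build the wide frame, add any topic column that never appears, fill the gaps with 0, and name the columns "topic 0", "topic 1", ... as asked in the question. The three-document frame below is made-up toy data with num_topics=4 so the snippet runs on its own.

```python
import pandas as pd

num_topics = 4  # 30 in the question; small here to keep the example readable

# Toy stand-in for df: 'topics' holds sparse (topic_id, probability) lists
df = pd.DataFrame({
    "text": ["doc a", "doc b", "doc c"],
    "topics": [
        [(0, 0.5), (2, 0.5)],
        [(1, 0.9)],
        [(0, 0.2), (1, 0.3), (2, 0.5)],
    ],
})

# dict(row) maps topic_id -> probability; topic IDs missing from a row become NaN
wide = pd.DataFrame([dict(row) for row in df["topics"]], index=df.index)

# Put all topic columns in order (topic 3 never appears above, so it is added),
# replace the NaNs with 0.0, and rename 0 -> "topic 0", 1 -> "topic 1", etc.
wide = (
    wide.reindex(columns=range(num_topics))
        .fillna(0.0)
        .rename(columns=lambda k: f"topic {k}")
)

# Attach the new columns to the original frame, aligned on the index
df = df.join(wide)
print(df.drop(columns="topics"))
```

If gensim is at hand, gensim.matutils.corpus2dense(df['topics'], num_terms=num_topics).T should give the same dense values as a NumPy array with one row per document, which can be wrapped in pd.DataFrame in the same way.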

Now the DataFrame af looks like this (limited to 5 rows and 5 columns):

    topics  0   1   2   3
0   [(1,0.055395175),(5,0.0647138),(7,0.13507782),(9,0.055264555),(13,0.19258575),(21,0.05181323),(27,0.07139948)] 0.0 0.05539517477154732 0.0 0.0
1   [(0,0.052290276),(6,0.064590134),0.24019116),(16,0.07827738),0.0994899)]   0.05229027569293976 0.0 0.0 0.0
2   [(6,0.054943837),0.07324204),(10,0.052613333),(12,0.12482096),0.19818054),(29,0.06280263)]    0.0 0.0 0.0 0.0
3   [(4,0.12759669),(8,0.06937062),0.2261674),0.066699274),(24,0.06150386),0.096883684)] 0.0 0.0 0.0 0.0
4   [(2,0.09043305),0.15643781),0.13145259),0.064689845),(17,0.05019963),0.09253424),(28,0.10176642)]   0.0 0.0 0.09043305367231369 0.0
