Extracting the letters after a $ sign with Pandas


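I am trying to extract data containing the $ sign from a spreadsheet.

I have already isolated the data so that I only get the column that contains it, but what I want to do is extract any and all symbols that follow a $ sign.

For example: $AAPL, $LOW, $TSLA and so on from the whole dataset, but I do not want $1000, $600 and the like. I only want the letters; they may be followed by a period or a space, but the characters a-z are all I want to capture.

I have not managed to extract them fully, and my code was getting messy, so I will just provide the code that brings back the data so you can see it for yourself. I am using Jupyter Notebook.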

import mysql.connector
import pandas

googleSheedID = '15fhpxqWDRWkNtEFhi9bQyWUg8pDn4B-R2N18s1xFYTU'
worksheetName = 'Sheet1'
URL = 'https://docs.google.com/spreadsheets/d/{0}/gviz/tq?tqx=out:csv&sheet={1}'.format(
        googleSheedID,worksheetName
)

df = pandas.read_csv(URL)
del df['DATE']
del df['USERNAME']
del df['LINK']
del df['LINK2']
df = df[df["TWEET"].str.contains("RT") == False]  # assign the result; the filter is otherwise discarded

print(df)

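Solution

I am not sure I understood correctly what you want, but the following code returns every token that starts with $ and runs up to the next space, skipping tokens where a digit follows the $.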

import mysql.connector
import pandas

googleSheedID = '15fhpxqWDRWkNtEFhi9bQyWUg8pDn4B-R2N18s1xFYTU'
worksheetName = 'Sheet1'
URL = 'https://docs.google.com/spreadsheets/d/{0}/gviz/tq?tqx=out:csv&sheet={1}'.format(
        googleSheedID,worksheetName
)

df = pandas.read_csv(URL)
del df['DATE']
del df['USERNAME']
del df['LINK']
del df['LINK2']

unique_results = []
for i in range(len(df['TWEET'])):
    if 'RT' in df["TWEET"][i]:
        continue
    else:
        for j in range(len(df['TWEET'][i])-1):
            if df['TWEET'][i][j] == '$':
                if df['TWEET'][i][j+1] in '0123456789':  # a digit follows the '$': skip amounts such as $1000
                        continue
                else:
                    start = j
                    end = len(df['TWEET'][i])  # default when no space or newline follows the token
                    for k in range(start,len(df['TWEET'][i])):
                        if df['TWEET'][i][k] == ' ' or df['TWEET'][i][k:k+1] == '\n':
                            end = k
                            break
                    results = df['TWEET'][i][start:end]
                    if results not in unique_results:
                        unique_results.append(results)
print(unique_results)
                        

Edit: fixed the code.

The output was:

['$GME','$SNDL','$FUBO','$AMC','$LOTZ','$CLOV','$USAS','$AIHS','$PLM','$LODE','$TTNP','$IMTE','','$NAK.','$NAK','$CRBP','$AREC','$NTEC','$NTN','$CBAT','$ZYNE','$HOFV','$GWPH','$KERN','$ZYNE,','$AIM','$WWR','$CARV','$VISL','$SINO','$NAKD','$GRPS','$RSHN','$MARA','$RIOT','$NXTD','$LAC','$BTC','$ITRM','$CHCI','$VERU','$GMGI','$WNBD','$KALV','$EGOC','$Veru','$MRNA','$PVDG','$DROP','$EFOI','$LLIT','$AUVI','$CGIX','$RELI','$TLRY','$ACB','$TRCH','$TRCH.','$TSLA','$cciv','$sndl','$ANCN','$TGC','$tlry','$KXIN','$AMZN','$INFI','$LMND','$COMS','$VXX','$LEDS','$ACY','$RHE','$SINO.','$GPL','$SPCE','$OXY','$CLSN','$FTFT','$FTFT.....','$BIEI','$EDRY','$CLEU','$FSR','$SPY','$NIO','$LI','$XPEV,'$UL','$RGLG','$SOS','$QS','$THCB','$SUNW','$MICT','$BTC.X','$T','$ADOM','$EBON','$CLPS','$HIHO','$ONTX','$WNRS','$SOLO','$Mara,'$Riot,'$SOS,'$GRNQ,'$RCON,'$FTFT,'$BTBT,'$MOGO,'$EQOS,'$CCNC','$CCIV','$tsla','$fsr','$wkhs','$ride','$nio','$NETE','$DPW','$MOSY','$SSNT','$PLTR','$GSAH:','$EQOS','$MTSL','$CMPS','$CHIF','$MU','$HST','$SNAP','$CTXR','$acy','$FUBOTV','$DPBE','$HYLN','$SPOT','$NSAV','$HYLN,'$aabb','$AAL','$BBIG','$ITNS','$CTIB','$AMPG','$ZI','$NUVI','$INTC','$TSM','$AAPL','$MRJT','$RCMT','$IZEA','$BBIG,'$ARKK','$LIAUTO','$MARA:','$SOS:','$XOM','$ET','$BRNW','$SYPR','$LCID','$QCOM','$FIZZ','$TRVG','$SLV','$RAFA','$TGCTengasco,'$BYND','$XTNT','$NBY','$sos','$KMPH','$','$(0.60)','$(0.64)','$BIDU','$rkt','$GTT','$CHUC','$CLF','$INUV','$RKT','$COST','$MDCN','$HCMC','$UWMC','$riot','$OVID','$HZON','$SKT','$FB','$PLUG','$BA','$PYPL','$PSTH.','$NVDA','$AMPG.','$aese.','$spy','$pltr','$MSFT','$AMD','$QQQ','$LTNC','$WKHS','$EYES','$RMO','$GNUS','$gme','$mdmp','$kern','$AEI','$BABA','$YALA','$TWTR','$WISH','$GE','$ORCL','$JUPW','$TMBR','$SSYS','$NKE','$AMPGAmpliTech','$$$','$$','$RGLS','$HOGE','$GEGR','$nclh','$IGAC','$FCEL','$TKAT','$OCG','$YVR','$IPDN.','$IPDN',"$SINO's",'$WIMI','$TKAT.','$BAC','$LZR','$LGHL','$F','$GM','$KODK','$atvk','$ATVK','$AIKI','$DS','$AI','$WTII','$oxy','$DYAI','$DSS',
'$ZKIN','$MFH','$WKEY','$MKGI','$DLPN','$PSWW','$SNOW','$ALYA','$AESE','$CSCW','$CIDM','$HOFV.','$LIVX','$FNKO','$HPR','$BRQS','$GIGM','$APOP','$EA','$CUEN','$TMBR?','$FLNT,'$APPS','$METX','$STG','$WSRC','$AMHC','$VIAC','$MO','$UAVL','$CS','$MDT','$GYST','$CBBT','$ASTC','$AACG','$WAFU.','$WAFU','$CASI','$mmmw','$MVIS','$SNOA','$C','$KR','$EWZ','$VALE','$EWZ.','$CSCO','$PINS','$XSPA','$VPRX','$CEMI','$M','$BMRA','$SPX','$akt','$SURG','$NCLH','$ARSN','$ODT','$SGBX','$CRWD.','$TGRR','$PENN','$BB','$XOP','$XL','$FREQ','$IDRA','$DKNG','$COHN','$ADHC','$ISWH','$LEGO','$OTRA','$NAAC','$HCAR','$PPGH','$SDAC','$PNTM','$OUST','$IO','$HQGE','$HENC','$KYNC','$ATNF','$BNSO','$HDSN','$AABB','$SGH','$BMY','$VERY','$EARS','$ROKU','$PIXY','$APRE','$SFET','$SQ','$EEIQ','$REDU','$CNWT','$NFLX','$RGBPP','$RGBP','$SHOP','$VITL','$RAAS','$CPNG','$JKS','$COMP','$NAFS']
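
The list above still contains near-duplicates such as '$NAK.' and '$NAK' or '$tsla' and '$TSLA', as well as non-ticker entries like '', '$$$' and '$(0.60)'. Since the question asks for letters only, one possible clean-up pass is sketched below. The helper name normalize_tickers is hypothetical, and folding everything to upper case is an assumption about what counts as a duplicate.

```python
import re


def normalize_tickers(tokens):
    """Reduce raw '$...' tokens to unique, upper-cased, letters-only tickers."""
    seen = []
    for token in tokens:
        # Keep only the run of letters directly after the '$'.
        match = re.match(r"\$([A-Za-z]+)", token)
        if match is None:
            continue  # drops '', '$', '$$$', '$(0.60)' and similar entries
        ticker = "$" + match.group(1).upper()
        if ticker not in seen:  # preserve first-seen order
            seen.append(ticker)
    return seen


sample = ["$NAK.", "$NAK", "$tsla", "$TSLA", "", "$(0.60)", "$$$", "$SINO's", "$GSAH:"]
print(normalize_tickers(sample))  # ['$NAK', '$TSLA', '$SINO', '$GSAH']
```

This cannot repair tokens where a ticker ran straight into the following word, such as '$TGCTengasco'; those would need a known list of symbols to split correctly.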

You can use a regular expression.

\$[a-zA-Z]+
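
To see how this pattern behaves on edge cases, here is a small sketch on a made-up sentence. The function name find_tickers and the strict flag are illustrative only. The strict variant appends a word boundary (\b), which is one possible way to reject tokens where digits follow the letters; it is an addition, not part of the pattern above.

```python
import re

LOOSE = re.compile(r"\$[a-zA-Z]+")      # the pattern shown above
STRICT = re.compile(r"\$[a-zA-Z]+\b")   # assumption: also require a word boundary after the letters


def find_tickers(text, strict=False):
    """Return every '$' + letters token found in text."""
    pattern = STRICT if strict else LOOSE
    return pattern.findall(text)


text = "Watching $AAPL and $LOW. Target $1000, maybe $TSLA, $6FOO or $FOO6BAR"
print(find_tickers(text))               # ['$AAPL', '$LOW', '$TSLA', '$FOO']
print(find_tickers(text, strict=True))  # ['$AAPL', '$LOW', '$TSLA']
```

Amounts such as $1000 are skipped by both variants because a letter must come directly after the $. The loose pattern still returns the leading $FOO from $FOO6BAR, while the strict one drops that token entirely.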

After reading df, run the code below.

import re
# Create Empty list for final results
results = []
final_results = []
for row_num in range(len(df['TWEET'])):
    string_to_check = df['TWEET'][row_num]
    # Check for RT at the beginning of the string only.
    # if 'RT' in df["TWEET"][row_num] would have found the "RT" anywhere in the string.
    if re.match(r"^RT",string_to_check):
        continue
    else:
        # Find every '$' that is followed by one or more letters.
        # This finds $FOOBAR but not $600 or $6FOOBAR; for $FOO6BAR it returns the leading $FOO.
        rel_text_l = re.findall(r"\$[a-zA-Z]+",string_to_check)
        # Check for empty list
        if rel_text_l:
            # Add elements of list to another list directly
            results.extend(rel_text_l)

# Making list of the set of list to remove duplicates
final_results = list(set(results))
print(results)
print(final_results)

The result is:

['$GME','$FOOBAR','$FOO','$GME','$GOBLIN','$LTNC']
['$LTNC','$FUBO']

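Note that the duplicate $GME has been removed in final_results.

If you do not need to exclude tweets that start with RT, all of this can be done in a single line of code.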

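# Join the tweets into one string first; str(df['TWEET']) only yields pandas' truncated display text.
direct_result = list(set(re.findall(r"\$[a-zA-Z]+", " ".join(df['TWEET'].astype(str)))))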
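
If the RT filter should be kept, the same idea can also be written with pandas string methods instead of an explicit loop. The following is a minimal sketch, assuming the column is named TWEET as in the question. The function name extract_tickers and the sample tweets are made up for illustration, and upper-casing before de-duplicating is an assumption.

```python
import pandas as pd


def extract_tickers(df, column="TWEET"):
    """Return unique upper-cased '$TICKER' tokens from tweets that do not start with 'RT'."""
    tweets = df[column].dropna()                    # skip empty cells
    tweets = tweets[~tweets.str.startswith("RT")]   # drop retweets
    return (
        tweets.str.findall(r"\$[a-zA-Z]+")  # list of matches per tweet
        .explode()                          # one row per match
        .dropna()                           # tweets without any match become NaN
        .str.upper()
        .drop_duplicates()
        .tolist()
    )


# Made-up sample data standing in for the spreadsheet.
sample_df = pd.DataFrame({"TWEET": [
    "RT @someone: $GME to the moon",
    "Buying $GME and $fubo under $20",
    "$LTNC looks good. $GME again",
    None,
]})
print(extract_tickers(sample_df))  # ['$GME', '$FUBO', '$LTNC']
```

Unlike list(set(...)), drop_duplicates keeps the order in which the tickers first appear.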
