How to convert Unicode escape sequences back to letters when streaming with Tweepy
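The tweets I capture while streaming with Tweepy show special characters as Unicode escape sequences, and I need them as letters. I have found many solutions on this site, but because I am collecting tweets in real time, none of them seems to apply to my case. Can anyone help?
Here is my code: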
from urllib3.exceptions import ProtocolError
from tweepy import Stream
from tweepy.auth import OAuthHandler
from tweepy.streaming import StreamListener
import time

ckey = 'your code here'
csecret = 'your code here'
atoken = 'your code here'
asecret = 'your code here'

class listener(StreamListener):

    def on_data(self, data):
        # data is the raw JSON text of one status, so the slices below
        # still contain JSON escapes such as \u00e7 and \/
        while True:
            try:
                #print (data)
                tweet = data.split(',"text":"')[1].split('","')[0]
                tweet2 = data.split(',"screen_name":"')[1].split('","location')[0]
                print(tweet2, tweet)
                # Append "@screen_name;text" as one line of test.csv
                saveFile = open('test.csv', 'a')
                saveFile.write('@')
                saveFile.write(tweet2)
                saveFile.write(';')
                saveFile.write(tweet)
                saveFile.write('\n')
                saveFile.close()
                return True
            except ProtocolError:
                continue
            except BaseException as e:
                print('Failed on data', str(e))
                break

    def on_error(self, status):
        print(status)

auth = OAuthHandler(ckey, csecret)
auth.set_access_token(atoken, asecret)
twitterStream = Stream(auth, listener())
twitterStream.filter(track=['keyword'])
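The slicing in on_data works on the raw JSON text of each status, and that JSON writes non-ASCII characters as \uXXXX escapes (and "/" as "\/"), so the sliced strings keep those escapes as literal text. As an alternative to slicing, here is a minimal sketch that does the same extraction with the standard json module, which decodes the escapes while parsing. It assumes, as with Tweepy 3.x's StreamListener, that data is the raw JSON string of one status with text and user.screen_name fields; the function name extract_fields and the sample payload are hypothetical.

```python
import json


def extract_fields(data):
    # Hypothetical helper: parse one raw stream message instead of slicing it.
    # json.loads turns \u00f3, \/ and surrogate pairs such as \ud83d\ude20
    # back into real characters.
    status = json.loads(data)
    if "text" not in status:
        # Non-status messages (e.g. delete notices) have no "text" key.
        return None
    return status["user"]["screen_name"], status["text"]


# Shortened, made-up payload in the shape of a streamed status
raw = r'{"text":"S\u00f3 me sobe o Wisney \ud83d\ude20 https:\/\/t.co\/lO4CJJsaaP","user":{"screen_name":"adrianabpadilha"}}'
name, text = extract_fields(raw)
# name == 'adrianabpadilha'
# text == 'Só me sobe o Wisney 😠 https://t.co/lO4CJJsaaP'
```

Inside on_data this would replace the two split lines, e.g. `tweet2, tweet = extract_fields(data)` after checking that the result is not None.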
Here is the output for the keyword "fluminense":
adrianabpadilha Impressionante como mesmo com poucas op\u00e7\u00f5es para o banco o Burro s\u00f3 me sobe o Wisney e o Higor! Pq n\u00e3o levar o Pato\u2026 https:\/\/t.co\/lO4CJJsaaP
Miguel_Aalmeida RT @pulligffc: O Fluminense em dia de jogo olha pra mim e faz isso
TRANQUILINHO3 Time fdpt \ud83d\ude20
LeleoCasttroo @jrmenini @FFvinho Palmeiras e Fluminense ainda tiveram a base como fonte de renda,atl\u00e9tico n\u00e3o revela um jogador\u2026 https:\/\/t.co\/ZF8awS6pDt
SouzaArthur6 @CezarSabia @andreisilvasoar @ndrzej87 @futebol_info C\u00e9zar,existe um tempo certo de testagem,q se d\u00e1 no 5\u00b0 da doe\u2026 https:\/\/t.co\/zmBlBzafdo
Thomasrodrigue_ @renatojr_07 \u00c9 o mesmo exemplo da final da ta\u00e7a rio,a \u00fanica coisa que muda \u00e9 que na final n\u00e3o tinha jogador contam\u2026 https:\/\/t.co\/3Q2nCBw9XS
As you can see, some characters such as "ç" and "õ" appear as "\u00e7" and "\u00f5" respectively.
Thanks!
Solution
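This happens because the text is cut out of the raw JSON payload, in which non-ASCII characters are stored as \uXXXX escape sequences. You can turn them back into characters with the unicode_escape encoding.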
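For example, in Python 3:
s = r'\u00e7'
print(s)
\u00e7  # output
print(s.encode('ascii').decode('unicode_escape'))
ç  # output
(In Python 2, where str has a decode method, the equivalent is print s.decode('unicode-escape').)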
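Applied to the strings sliced out in on_data, the unicode_escape decoding above handles the accented letters, but two other things in the output need extra care: "\/" inside URLs, which unicode_escape does not recognise, and emoji such as \ud83d\ude20, which JSON writes as a surrogate pair and which unicode_escape decodes into two separate code points. A minimal sketch of a helper covering those cases, assuming its input is text sliced from the raw JSON as in the question; the name unescape_tweet is hypothetical.

```python
def unescape_tweet(text):
    # Hypothetical helper for text sliced out of the raw JSON payload.
    # 1. JSON may write "/" as "\/"; unicode_escape does not recognise
    #    that sequence, so restore plain slashes first.
    text = text.replace("\\/", "/")
    # 2. Decode \uXXXX (and \n, \", \\) escapes. Encoding to Latin-1
    #    with backslashreplace turns any characters outside Latin-1
    #    into escapes as well, so they survive the decode unchanged.
    text = text.encode("latin-1", "backslashreplace").decode("unicode_escape")
    # 3. Emoji arrive as surrogate pairs such as \ud83d\ude20, which step 2
    #    leaves as two separate code points; a UTF-16 round trip joins them.
    return text.encode("utf-16", "surrogatepass").decode("utf-16")


sample = r"Time fdpt \ud83d\ude20 https:\/\/t.co\/lO4CJJsaaP"
clean = unescape_tweet(sample)
# clean == 'Time fdpt 😠 https://t.co/lO4CJJsaaP'
```

In the question's on_data it would be applied as `tweet = unescape_tweet(tweet)` before printing and writing. Because open() uses a locale-dependent default encoding, opening test.csv with `encoding='utf-8'` avoids a UnicodeEncodeError for emoji on systems whose default encoding cannot represent them.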