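How to fix a problem reading HTML text from Google Translate
I am having trouble scraping some text from Google Translate when I translate something from Japanese to English. This is the code I am using: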
library(rvest)
library(dplyr)
url_pr2 <- 'https://warp.ndl.go.jp/info:ndljp/pid/11454275/www.mofa.go.jp/mofaj/press/release/17/rls_0430b.html'
webpage2 <- read_html(url_pr2,encoding = 'utf8')
title_data <- html_nodes(webpage2,'h2')
title <- html_text(title_data)
getParam = title
translateFrom = "ja"
translateTo = "en"
search <- gsub(" ","%20",getParam)
URL_title <- paste("https://translate.google.pl/m?hl=",translateFrom,"&sl=","&tl=",translateTo,"&ie=UTF-8&prev=_m&q=",search,sep="")
page <- getURL(URL_title)
web_title <- read_html(URL_title)
text_final <- html_nodes(web_title,'.result-container')
html_text(text_final)
But I get the following text:
[1] "æ Š € è ¡ "Å" å Š ›ã «é – ¢ ã ™ ã‹æ—¥ œ¬å ›½æ” ¿ é – “ã ®å” å®šã ®ç½²å”
If I run the same code but translate something from Spanish or French into English, it works perfectly. Here is the other code:
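url_pr2 <- 'https://www.gob.mx/sre/prensa/la-sre-brinda-asistencia-a-mexicano-detenido-en-letonia?idiom=es'
webpage2 <- read_html(url_pr2)
title_data <- html_nodes(webpage2,'.bottom-buffer')
title <- html_text(title_data)
getParam = title
translateFrom = "es"
translateTo = "en"
search <- gsub(" ","%20",getParam)
URL_title <- paste("https://translate.google.pl/m?hl=",translateFrom,"&sl=","&tl=",translateTo,"&ie=UTF-8&prev=_m&q=",search,sep="")
web_title <- read_html(URL_title)
text_final <- html_nodes(web_title,'.result-container')
html_text(text_final)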
From the previous code I get the following result:
[1] "SRE provides assistance to Mexican detained in Latvia"
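Does anyone know how I can extract the English translation? If I open the Google Translate URL that I generated, I can see the English translation.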
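Solution
You need to properly URL-encode the entire phrase to be translated. Replacing only the spaces with %20 leaves the Japanese characters unencoded, so use URLencode instead of gsub: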
search <- URLencode(getParam)
library(rvest)
library(dplyr)
url_pr2 <- 'https://warp.ndl.go.jp/info:ndljp/pid/11454275/www.mofa.go.jp/mofaj/press/release/17/rls_0430b.html'
webpage2 <- read_html(url_pr2,encoding = 'utf8')
title_data <- html_nodes(webpage2,'h2')
title <- html_text(title_data)
getParam = title
translateFrom = "ja"
translateTo = "en"
search <- URLencode(getParam)
URL_title <- paste("https://translate.google.pl/m?hl=",translateFrom,"&sl=","&tl=",translateTo,"&ie=UTF-8&prev=_m&q=",search,sep="")
page <- read_html(URL_title)
text_final <- html_node(page,'.result-container') %>% html_text()
print(text_final)
"Signing of an agreement between the Government of Japan and the Government of the Islamic Republic of Pakistan on technical cooperation"
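To illustrate why the encoding step matters, here is a minimal sketch comparing the two approaches on a short Japanese string. The sample phrase and the variable names `phrase`, `naive`, and `encoded` are assumptions made for illustration and do not come from the original post. The `gsub` call only touches spaces, while `URLencode` percent-encodes every byte outside the URL-safe set.

```r
# Hypothetical sample phrase (not from the original post), written with \u
# escapes so the script does not depend on the encoding of the source file.
phrase <- "\u65e5\u672c \u653f\u5e9c"

# Approach from the question: only spaces are replaced, so the Japanese
# characters are passed into the URL as raw non-ASCII text.
naive <- gsub(" ", "%20", phrase)

# Approach from the answer: percent-encode every character that is not
# URL-safe. enc2utf8() is an extra precaution (an assumption, not part of
# the original answer) so the bytes being encoded are UTF-8.
encoded <- URLencode(enc2utf8(phrase))

print(naive)    # still contains the Japanese characters
print(encoded)  # plain ASCII made of percent-escapes
```

If the phrase may contain characters such as `&` or `?`, `URLencode(phrase, reserved = TRUE)` also escapes those, which is usually safer for a query-string value.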