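Scraping a forum with Beautiful Soup: how do I scrape a table that repeats <td> many times?
I want to retrieve table data from a forum that requires logging in with a username and password. I have written the code below, but I cannot get any values out of the forum table. Here is my code: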
from bs4 import BeautifulSoup as bs
import requests

URL = 'http://kingmedia.tv'
LOGIN_ROUTE = '/home/'
HEADERS = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/84.0.4147.125 Safari/537.36',
    'origin': URL,
    'referer': URL + LOGIN_ROUTE,
}

# Log in through a session so that cookies are kept for later requests
s = requests.session()
login_payload = {
    'login': "bachoo786",
    'password': "abcde12345",
}
login_req = s.post(URL + LOGIN_ROUTE, headers=HEADERS, data=login_payload)
print(login_req.status_code)
cookies = login_req.cookies

# Fetch the forum page with the same session and look for the table
soup = bs(s.get(URL + '/forumdisplay.php?f=2').text, 'html.parser')
tbody = soup.find('table', id='tborder')
print(tbody)
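A possible reason the lookup above prints None, independent of the login step: in the HTML posted further down, `tborder` appears as a class attribute, not an id. The sketch below shows the difference on a one-row stand-in for the page (the inline `html` string is an assumption made for illustration, not the live forum markup).

```python
from bs4 import BeautifulSoup

# Minimal stand-in for the forum page; only the class attribute matters here.
html = '<table class="tborder" width="100%"><tr><td class="thead">Now Playing</td></tr></table>'
soup = BeautifulSoup(html, "html.parser")

by_id = soup.find("table", id="tborder")         # no element has id="tborder"
by_class = soup.find("table", class_="tborder")  # matches class="tborder"

print(by_id)
print(by_class.td.get_text())
```

If the class-based lookup still returns None against the real page, the response is probably not the logged-in page, and printing part of `s.get(...).text` would help confirm that.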
I also tried Selenium, but could not get the data either. Here is my Selenium code:
from selenium.webdriver.chrome.options import Options
from selenium import webdriver
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.by import By
from bs4 import BeautifulSoup
import re
#from bs4 import BeautifulSoup as bs
import requests

URL = 'http://kingmedia.tv'
LOGIN_ROUTE = '/home/'
# NOTE: this listing was truncated between "(KHTML," and 'password'. The rest of
# HEADERS, the session, and the 'login' field are restored from the first listing
# (an assumption).
HEADERS = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/84.0.4147.125 Safari/537.36',
    'origin': URL,
    'referer': URL + LOGIN_ROUTE,
}
s = requests.session()
login_payload = {
    'login': "bachoo786",
    'password': "abcde123",
}
login_req = s.post(URL + LOGIN_ROUTE, data=login_payload)
print(login_req.status_code)
cookies = login_req.cookies

options = Options()
# Runs Chrome in headless mode.
#options.add_argument("--headless")
# Path of the Chrome driver
chrome_options = Options()
chrome_options.add_argument("--headless")
chrome_options.add_argument('--no-sandbox')
chrome_options.add_argument('--disable-dev-shm-usage')
driver = webdriver.Chrome('/usr/bin/chromedriver', chrome_options=chrome_options)
driver.headless = True
driver.get('http://kingmedia.tv/home/forumdisplay.php?f=2')
WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.CSS_SELECTOR, 'div.sidebar-widget.widget_text>div>table')))
print("Data rendered successfully!!!")

# Get the page source
html = driver.page_source
soup = BeautifulSoup(html, 'html.parser')
#print (soup)
driver.close()

table = soup.find('table', class_='tborder').find_next('table').find_next('class')
for row in table.find_all('tr'):
    name = row.find_all("td")[0].text.strip()
    print(name)
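In the Selenium attempt, the login happens in a `requests` session while the page is loaded by a separate Chrome instance, so the browser may well be seeing the logged-out page. One way to bridge the two, sketched here under assumptions, is to convert the session cookies into the dictionaries that `driver.add_cookie()` accepts. The helper name `selenium_cookies` and the cookie name `bbsessionhash` are hypothetical; the real cookie names depend on the forum software.

```python
import requests


def selenium_cookies(session):
    """Convert a requests session's cookies into dicts for driver.add_cookie()."""
    return [
        {
            "name": c.name,
            "value": c.value,
            "domain": c.domain,
            "path": c.path,
            "secure": c.secure,
        }
        for c in session.cookies
    ]


# Stand-in for a session that has already logged in (hypothetical cookie).
session = requests.Session()
session.cookies.set("bbsessionhash", "example-value", domain="kingmedia.tv", path="/")
cookies = selenium_cookies(session)
print(cookies)

# With a real browser, the cookies would be added after an initial visit to the
# same domain, and the forum page loaded afterwards:
#   driver.get(URL)
#   for cookie in cookies:
#       driver.add_cookie(cookie)
#   driver.get(URL + '/home/forumdisplay.php?f=2')
```

Selenium generally rejects cookies for a domain other than the one currently loaded, which is why the sketch visits the site once before calling `add_cookie`.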
The table data I am trying to extract looks like this:
Update: here are the HTML elements of the table shown in the picture above:
<table class="tborder" cellpadding="6" cellspacing="1" border="0" width="100%" align="center">
<thead>
<tr align="center">
<td class="thead" width="150">Live Screenshot</td>
<td class="thead" width="130" align="left">Channel Number</td>
<td class="thead" width="290">Now Playing</td>
<td class="thead" width="170">Watching Options</td>
</tr>
</thead>
<tbody>
<tr align="left">
<td class="alt1Active" colspan="2" align="left" id="f13">
<table cellpadding="0" cellspacing="0" border="0">
<tbody><tr>
<td class="alt1" width="20">
<input type="image" width="130" height="90" src="http://213.163.74.154/ss2/tv1.jpg" onclick="open_tv1()"></td><td width="20"><center>
<br></center></td><td height="80"><br><br><strong><img src="/images/uk.gif"><img src="/images/hd2.png"> Channel 1</strong> <br><br>
</td>
</tr>
</tbody></table>
</td><td class="alt1" align="Center">
<div class="smallfont" align="left">
<div style="clear:both">
<span class="smallfont"></span>
<font size="2"><br><img class="inlineimg" src="http://www.kingmedia.tv/scripts/status.php?file=tv12.nsv" alt="12" border="0" title="12"> <strong>Back Soon</strong><br>
</font><iframe height="25" width="140" frameborder="0" scrolling="no" seamless="seamless" src="http://213.163.74.154:8080/tv1.xsl"></iframe><iframe height="25" width="80" frameborder="0" scrolling="no" seamless="seamless" src="http://207.244.98.215:8080/tv1.xsl"></iframe>
<br><span class="smallfont"></span></div></div></td>
<td class="alt1" nowrap="nowrap">
<div align="right"><form name="menuformtv1">
<select name="menu1tv1">
<option disabled="">-- Server 1 --</option>
<option value="/buildmtv1s2.php">Watch in Winamp</option>
<option value="/webvlctv1s2.php">Watch in a Web Player</option>
<option value="/directlinktv1s2.php">View Stream URL</option>
<option disabled="">-- Server 2 (Backup) --</option>
<option value="/buildmtv1.php" selected="selected">Watch in Winamp </option>
<option value="/webvlctv1.php">Watch in a Web Player</option>
<option value="/directlinktv1.php">View Stream URL</option>
</select>
<br><br><strong><a href="/home/payments.php">Subscribe to Unlock</a></strong>
</form></div><br>
</td>
<!---->
</tr>
</tbody>
<tbody>
<tr align="left">
<td class="alt1Active" colspan="2" align="left" id="f6">
<table cellpadding="0" cellspacing="0" border="0">
<tbody><tr>
<td class="alt1" width="20">
<input type="image" width="130" height="90" src="http://213.163.74.154/ss2/tv2.jpg" onclick="open_tv2()"></td><td width="20"><center>
<br></center></td><td height="80"><br><br><strong><img src="/images/uk.gif"><img src="/images/hd2.png"> Channel 2</strong> <br><br>
</td>
</tr>
</tbody></table>
</td><td class="alt1" align="Center">
<div class="smallfont" align="left">
<div style="clear:both">
<span class="smallfont"></span>
<font size="2"><br><img class="inlineimg" src="http://www.kingmedia.tv/scripts/status.php?file=tv9.nsv" alt="9" border="0" title="9"> <strong>Back Soon</strong><br>
</font><iframe height="25" width="140" frameborder="0" scrolling="no" seamless="seamless" src="http://213.163.74.154:8080/tv2.xsl"></iframe><iframe height="25" width="80" frameborder="0" scrolling="no" seamless="seamless" src="http://207.244.98.215:8080/tv2.xsl"></iframe>
<br><span class="smallfont"></span></div></div></td>
<td class="alt1" nowrap="nowrap">
<div align="right"><form name="menuformtv2">
<select name="menu1tv2">
<option disabled="">-- Server 1 --</option>
<option value="/buildmtv2s2.php">Watch in Winamp</option>
<option value="/webvlctv2s2.php">Watch in a Web Player</option>
<option value="/directlinktv2s2.php">View Stream URL</option>
<option disabled="">-- Server 2 (Backup) --</option>
<option value="/buildmtv2.php" selected="selected">Watch in Winamp </option>
<option value="/webvlctv2.php">Watch in a Web Player</option>
<option value="/directlinktv2.php">View Stream URL</option>
</select>
<br><br><strong><a href="/home/payments.php">Subscribe to Unlock</a></strong>
</form></div><br>
</td>
<!--<input type="image" src="/images/gs.gif" onClick="open_win2()" /><br>
<br><a href="/home/showthread.php?goto=newpost&t=4"><img src=/images/wtn.gif align=right border=0></a>-->
</tr>
</tbody>
<tbody>
<tr align="left">
<td class="alt1Active" colspan="2" align="left" id="f17">
<table cellpadding="0" cellspacing="0" border="0">
<tbody><tr>
<td class="alt1" width="20">
<input type="image" width="130" height="90" src="http://213.163.74.154/ss2/tv3.jpg" onclick="open_tv3()"></td><td width="20"><center>
<br></center></td><td height="80"><br><br><strong><img src="/images/uk.gif"><img src="/images/hd2.png"> Channel 3</strong> <br><br>
</td>
</tr>
</tbody></table>
</td><td class="alt1" align="Center">
<div class="smallfont" align="left">
<div style="clear:both">
<span class="smallfont"></span>
<font size="2"><br><img class="inlineimg" src="http://www.kingmedia.tv/scripts/status.php?file=tv13.nsv" alt="13" border="0" title="13"> <strong>Back Soon</strong><br>
</font><iframe height="25" width="140" frameborder="0" scrolling="no" seamless="seamless" src="http://213.163.74.154:8080/tv3.xsl"></iframe><iframe height="25" width="80" frameborder="0" scrolling="no" seamless="seamless" src="http://207.244.98.215:8080/tv3.xsl"></iframe>
<br><span class="smallfont"></span></div></div></td>
<td class="alt1" nowrap="nowrap">
<div align="right"><form name="menuformtv3">
<select name="menu1tv3">
<option disabled="">-- Server 1 --</option>
<option value="/buildmtv3s2.php">Watch in Winamp</option>
<option value="/webvlctv3s2.php">Watch in a Web Player</option>
<option value="/directlinktv3s2.php">View Stream URL</option>
<option disabled="">-- Server 2 (Backup) --</option>
<option value="/buildmtv3.php" selected="selected">Watch in Winamp </option>
<option value="/webvlctv3.php">Watch in a Web Player</option>
<option value="/directlinktv3.php">View Stream URL</option>
</select>
<br><br><strong><a href="/home/payments.php">Subscribe to Unlock</a></strong>
</form></div><br>
</td>
<!--<input type="image" src="/images/gs.gif" onClick="open_win3()" /><br>
<br><a href="/home/showthread.php?goto=newpost&t=8"><img src=/images/wtn.gif align=right border=0></a>-->
</tr>
</tbody>
<tbody>
<tr align="left">
<td class="alt1Active" colspan="2" align="left" id="f9">
<table cellpadding="0" cellspacing="0" border="0">
<tbody><tr>
<td class="alt1" width="20">
<input type="image" width="130" height="90" src="http://213.163.74.154/ss2/tv4.jpg" onclick="open_tv4()"></td><td width="20"><center>
<br></center></td><td height="80"><br><br><strong><img src="/images/canada.gif"><img src="/images/hd2.png"> Channel 4</strong> <br><br>
</td>
</tr>
</tbody></table>
</td><td class="alt1" align="Center">
<div class="smallfont" align="left">
<div style="clear:both">
<span class="smallfont"></span>
<font size="2"><br><img class="inlineimg" src="http://www.kingmedia.tv/scripts/status.php?file=tv3.nsv" alt="3" border="0" title="3"> <strong>TSN 2 HD</strong><br>
</font><iframe height="25" width="140" frameborder="0" scrolling="no" seamless="seamless" src="http://213.163.74.154:8080/tv4.xsl"></iframe><iframe height="25" width="80" frameborder="0" scrolling="no" seamless="seamless" src="http://207.244.98.215:8080/tv4.xsl"></iframe>
<br><span class="smallfont"></span></div></div></td>
<td class="alt1" nowrap="nowrap">
<div align="right"><form name="menuformtv4">
<select name="menu1tv4">
<option disabled="">-- Server 1 --</option>
<option value="/buildmtv4s2.php">Watch in Winamp</option>
<option value="/webvlctv4s2.php">Watch in a Web Player</option>
<option value="/directlinktv4s2.php">View Stream URL</option>
<option disabled="">-- Server 2 (Backup) --</option>
<option value="/buildmtv4.php" selected="selected">Watch in Winamp </option>
<option value="/webvlctv4.php">Watch in a Web Player</option>
<option value="/directlinktv4.php">View Stream URL</option>
</select>
<br><br><strong><a href="/home/payments.php">Subscribe to Unlock</a></strong>
</form></div><br>
</td>
<!--<input type="image" src="/images/gs.gif" onClick="open_win4()" /><br>
<br><a href="/home/showthread.php?goto=newpost&t=8"><img src=/images/wtn.gif align=right border=0></a>-->
</tr>
</tbody>
<tbody>
<tr align="left">
<td class="alt1Active" colspan="2" align="left" id="f8">
<table cellpadding="0" cellspacing="0" border="0">
<tbody><tr>
<td class="alt1" width="20">
<input type="image" width="130" height="90" src="http://213.163.74.154/ss2/tv5.jpg" onclick="open_tv5()"></td><td width="20"><center>
<br></center></td><td height="80"><br><br><strong><img src="/images/canada.gif"><img src="/images/hd2.png">Channel 5</strong> <br><br>
</td>
</tr>
</tbody></table>
</td><td class="alt1" align="Center">
<div class="smallfont" align="left">
<div style="clear:both">
<span class="smallfont"></span>
<font size="2"><br><img class="inlineimg" src="images/icons/status.php.png" alt="test" border="0" title="test"> <strong>TSN 3 HD</strong><br>
</font><iframe height="25" width="140" frameborder="0" scrolling="no" seamless="seamless" src="http://213.163.74.154:8080/tv5.xsl"></iframe><iframe height="25" width="80" frameborder="0" scrolling="no" seamless="seamless" src="http://207.244.98.215:8080/tv5.xsl"></iframe>
<br><span class="smallfont"></span></div></div></td>
<td class="alt1" nowrap="nowrap">
<div align="right"><form name="menuformtv5">
<select name="menu1tv5">
<option disabled="">-- Server 1 --</option>
<option value="/buildmtv5s2.php">Watch in Winamp</option>
<option value="/webvlctv5s2.php">Watch in a Web Player</option>
<option value="/directlinktv5s2.php">View Stream URL</option>
<option disabled="">-- Server 2 (Backup) --</option>
<option value="/buildmtv5.php" selected="selected">Watch in Winamp </option>
<option value="/webvlctv5.php">Watch in a Web Player</option>
<option value="/directlinktv5.php">View Stream URL</option>
</select>
<br><br><strong><a href="/home/payments.php">Subscribe to Unlock</a></strong>
</form></div><br>
</td>
<!--<input type="image" src="/images/gs.gif" onClick="open_win5()" /><br>
<br><a href="/home/showthread.php?goto=newpost&t=5"><img src=/images/wtn.gif align=right border=0></a>-->
</tr>
</tbody>
<tbody>
<tr align="left">
<td class="alt1Active" colspan="2" align="left" id="f11">
<table cellpadding="0" cellspacing="0" border="0">
<tbody><tr>
<td class="alt1" width="20">
<input type="image" width="130" height="90" src="http://213.163.74.154/ss2/tv6.jpg" onclick="open_tv6()"></td><td width="20"><center>
<br></center></td><td height="80"><br><br><strong><img src="/images/usa.gif"><img src="/images/hd2.png"> Channel 6</strong> <br><br>
</td>
</tr>
</tbody></table>
</td><td class="alt1" align="Center">
<div class="smallfont" align="left">
<div style="clear:both">
<span class="smallfont"></span>
<font size="2"><br><img class="inlineimg" src="images/icons/status.php.png" alt="test" border="0" title="test"> <strong>ESPN HD</strong><br>
</font><iframe height="25" width="140" frameborder="0" scrolling="no" seamless="seamless" src="http://213.163.74.154:8080/tv6.xsl"></iframe><iframe height="25" width="80" frameborder="0" scrolling="no" seamless="seamless" src="http://207.244.98.215:8080/tv6.xsl"></iframe>
<br><span class="smallfont"></span></div></div></td>
<td class="alt1" nowrap="nowrap">
<div align="right"><form name="menuformtv6">
<select name="menu1tv6">
<option disabled="">-- Server 1 --</option>
<option value="/buildmtv6s2.php">Watch in Winamp</option>
<option value="/webvlctv6s2.php">Watch in a Web Player</option>
<option value="/directlinktv6s2.php">View Stream URL</option>
<option disabled="">-- Server 2 (Backup) --</option>
<option value="/buildmtv6.php" selected="selected">Watch in Winamp </option>
<option value="/webvlctv6.php">Watch in a Web Player</option>
<option value="/directlinktv6.php">View Stream URL</option>
</select>
<br><br><strong><a href="/home/payments.php">Subscribe to Unlock</a></strong>
</form></div><br>
</td>
<!--<input type="image" src="/images/gs.gif" onClick="open_win6()" /><br>
<br><a href="/home/showthread.php?goto=newpost&t=10"><img src=/images/wtn.gif align=right border=0></a>-->
</tr>
</tbody>
<tbody>
<tr align="left">
<td class="alt1Active" colspan="2" align="left" id="f4">
<table cellpadding="0" cellspacing="0" border="0">
<tbody><tr>
<td class="alt1" width="20">
<input type="image" width="130" height="90" src="http://213.163.74.154/ss2/tv7.jpg" onclick="open_tv7()"></td><td width="20"><center>
<br></center></td><td height="80"><br><br><strong><img src="/images/world.gif"><img src="/images/hd2.png"> Channel 7</strong> <br><br>
</td>
</tr>
</tbody></table>
</td><td class="alt1" align="Center">
<div class="smallfont" align="left">
<div style="clear:both">
<span class="smallfont"></span>
<font size="2"><br><img class="inlineimg" src="images/icons/status.php.png" alt="test" border="0" title="test"> <strong>Live: Cricket</strong><br>
</font><iframe height="25" width="140" frameborder="0" scrolling="no" seamless="seamless" src="http://213.163.74.154:8080/tv7.xsl"></iframe><iframe height="25" width="80" frameborder="0" scrolling="no" seamless="seamless" src="http://207.244.98.215:8080/tv7.xsl"></iframe>
<br><span class="smallfont"></span></div></div></td>
<td class="alt1" nowrap="nowrap">
<div align="right"><form name="menuformtv7">
<select name="menu1tv7">
<option disabled="">-- Server 1 --</option>
<option value="/buildmtv7s2.php">Watch in Winamp</option>
<option value="/webvlctv7s2.php">Watch in a Web Player</option>
<option value="/directlinktv7s2.php">View Stream URL</option>
<option disabled="">-- Server 2 (Backup) --</option>
<option value="/buildmtv7.php" selected="selected">Watch in Winamp </option>
<option value="/webvlctv7.php">Watch in a Web Player</option>
<option value="/directlinktv7.php">View Stream URL</option>
</select>
<br><br><strong><a href="/home/payments.php">Subscribe to Unlock</a></strong>
</form></div><br>
</td>
<!--<input type="image" src="/images/world.gif" onClick="open_win7()" /><br>
<br><a href="/home/showthread.php?goto=newpost&t=3"><img src=/images/wtn.gif align=right border=0></a>-->
</tr>
</tbody>
</table>
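Reading the markup above, each channel row seems to start with a `td.alt1Active` cell (holding a nested table with the screenshot and the channel name), followed by a sibling `td` whose `<strong>` holds the "Now Playing" text. A minimal sketch of walking that structure follows; `SAMPLE_HTML` is a trimmed copy of two of the posted rows, so the selectors are assumptions that may need adjusting against the full page.

```python
from bs4 import BeautifulSoup

# Trimmed copy of two rows from the posted table (Channel 4 and Channel 7).
SAMPLE_HTML = """
<table class="tborder">
<thead><tr><td class="thead">Live Screenshot</td><td class="thead">Channel Number</td>
<td class="thead">Now Playing</td><td class="thead">Watching Options</td></tr></thead>
<tbody><tr align="left">
<td class="alt1Active" colspan="2" id="f9">
<table><tbody><tr>
<td class="alt1"><input type="image" src="http://213.163.74.154/ss2/tv4.jpg"></td>
<td height="80"><strong><img src="/images/canada.gif"> Channel 4</strong></td>
</tr></tbody></table>
</td>
<td class="alt1"><div class="smallfont"><font size="2"><strong>TSN 2 HD</strong></font></div></td>
<td class="alt1"><strong><a href="/home/payments.php">Subscribe to Unlock</a></strong></td>
</tr></tbody>
<tbody><tr align="left">
<td class="alt1Active" colspan="2" id="f4">
<table><tbody><tr>
<td class="alt1"><input type="image" src="http://213.163.74.154/ss2/tv7.jpg"></td>
<td height="80"><strong><img src="/images/world.gif"> Channel 7</strong></td>
</tr></tbody></table>
</td>
<td class="alt1"><div class="smallfont"><font size="2"><strong>Live: Cricket</strong></font></div></td>
<td class="alt1"><strong><a href="/home/payments.php">Subscribe to Unlock</a></strong></td>
</tr></tbody>
</table>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

rows = []
for cell in soup.select("table.tborder td.alt1Active"):
    # Channel name and screenshot live in the nested table inside this cell.
    channel = cell.find("strong").get_text(strip=True)
    screenshot = cell.find("input", attrs={"type": "image"})["src"]
    # The "Now Playing" text is in the next <td> of the same outer row.
    now_cell = cell.find_next_sibling("td")
    now_playing = now_cell.find("strong").get_text(strip=True)
    rows.append((channel, now_playing, screenshot))

for row in rows:
    print(row)
```

Anchoring on `td.alt1Active` rather than iterating over every `tr` avoids picking up the rows of the nested tables, which is where a plain `find_all('tr')` loop tends to go wrong with this layout.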
Update: I modified my code, but it returns nothing. Here is my updated code:
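from bs4 import BeautifulSoup
import requests

URL = 'http://kingmedia.tv'
LOGIN_ROUTE = '/home/'
# NOTE: this listing was truncated between "(KHTML," and "data=login_payload)".
# The rest of HEADERS, the session, the login payload, and the start of the
# s.post call are restored from the first listing (an assumption).
HEADERS = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/84.0.4147.125 Safari/537.36',
    'origin': URL,
    'referer': URL + LOGIN_ROUTE,
}
s = requests.session()
login_payload = {
    'login': "bachoo786",
    'password': "abcde12345",
}
login_req = s.post(URL + LOGIN_ROUTE, headers=HEADERS, data=login_payload)
print(login_req.status_code)
cookies = login_req.cookies

r = requests.get(URL + '/forumdisplay.php?f=2')
soup = BeautifulSoup(r.text, 'html.parser')
path = '/home/pi/'
tborders = soup.select('table.tborder')
tborders = [tborder.text for tborder in tborders]
#del tborders[0]
print(tborders)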
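One detail in the updated listing that may explain the empty result: the forum page is fetched with the module-level `requests.get`, which does not share cookies with the session used for logging in. The sketch below illustrates the difference without touching the network by preparing the same request both ways. The cookie name `bbsessionhash` and its value are hypothetical stand-ins for whatever the login actually sets.

```python
import requests

PAGE = "http://kingmedia.tv/forumdisplay.php?f=2"

# Stand-in for a session that has logged in and received a cookie.
s = requests.Session()
s.cookies.set("bbsessionhash", "example-value", domain="kingmedia.tv", path="/")

# Request prepared through the session: its cookie jar is attached.
with_session = s.prepare_request(requests.Request("GET", PAGE))

# Request prepared on its own, as requests.get() would do: no cookies.
without_session = requests.Request("GET", PAGE).prepare()

print(with_session.headers.get("Cookie"))
print(without_session.headers.get("Cookie"))
```

Under that reading, replacing `requests.get(...)` with `s.get(...)` in the listing would send the login cookies along with the forum request.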