Finding the closest table after certain text in HTML using Python

I am using BeautifulSoup to parse HTML data.

The HTML I am parsing looks like this:

<html>
   <head></head>
   <body>
      <table class = "nb">
         <tr>
            <td> <p> ABC </p> </td>
         <tr>
      </table>
      <table border = "1">
       ...
      </table>
      <table class = "nb">
         <tr>
            <td> <p> DEF </p> </td>
         <tr>
      </table>
      <table border = "1">
       ...
      </table>
      <table class = "nb">
         <tr>
            <td> <p> GHI </p> </td>
         <tr>
      </table>
      <table border = "1">
       ...
      </table>
   <body>
</html>

(ABC, DEF, and GHI are the names of the tables that immediately follow them.)

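Here is what I need to do with this HTML.

First, check whether the text "ABC" or "GHI" appears in the HTML.

Second, find the first table that comes after the text "ABC" and after the text "GHI" (something like next_siblings), i.e. the first and third tables with border="1".

(In other words, I need to find a table's name, such as ABC, and then the first table with border="1" that follows it.)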

For the first step, using

findAll(text = "regular expression")

I can solve the problem.

For the second step, I tried next_siblings and similar methods, but could not get it to work.

Thanks.

Solution

Another approach, using the simplified_scrapy library instead of BeautifulSoup:

from simplified_scrapy import SimplifiedDoc
html = '''
<html>
   <head></head>
   <body>
      <table class = "nb">
         <tr>
            <td> <p> ABC </p> </td>
         <tr>
      </table>
      <table border = "1">
       ...
      </table>
      <table class = "nb">
         <tr>
            <td> <p> DEF </p> </td>
         <tr>
      </table>
      <table border = "1">
       ...
      </table>
      <table class = "nb">
         <tr>
            <td> <p> GHI </p> </td>
         <tr>
      </table>
      <table border = "1">
       ...
      </table>
   <body>
</html>
'''
doc = SimplifiedDoc(html)
# First, check whether the text "ABC" appears in a <td> of the html.
nameTable = doc.getElementByReg('ABC',tag='td')
if nameTable:
    nameTable = nameTable.getParent('table')
    # Second, find the first table after the table that contains the text "ABC"
    table = nameTable.getNext('table')  # Using next
    print(table['border'])

    # Or, using index positioning: search from where the name table ends
    table = doc.getElement('table',start=nameTable._end)
    print(table['border'])

Result:

1
1
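
Since the question is about BeautifulSoup, the same two steps can also be sketched with bs4 alone. This is only a minimal sketch under some assumptions: the function name tables_after_names is made up for illustration, the cell contents ("data for ABC" and so on) are placeholders standing in for the elided table bodies, and the sample markup has been tidied so that every tag is closed. It matches the name with find_all(string=...) and then walks forward in document order with find_next, which is not limited to siblings the way next_siblings is.

```python
import re
from bs4 import BeautifulSoup

# Tidied sample markup; the data cells are placeholders, not real content.
html = """
<html>
  <body>
    <table class="nb"><tr><td><p> ABC </p></td></tr></table>
    <table border="1"><tr><td>data for ABC</td></tr></table>
    <table class="nb"><tr><td><p> DEF </p></td></tr></table>
    <table border="1"><tr><td>data for DEF</td></tr></table>
    <table class="nb"><tr><td><p> GHI </p></td></tr></table>
    <table border="1"><tr><td>data for GHI</td></tr></table>
  </body>
</html>
"""


def tables_after_names(markup, names):
    """Map each name found in the markup to the first border="1" table after it."""
    soup = BeautifulSoup(markup, "html.parser")
    # Match text nodes that consist of one of the names, ignoring padding.
    pattern = re.compile(r"^\s*(%s)\s*$" % "|".join(map(re.escape, names)))
    found = {}
    for text_node in soup.find_all(string=pattern):
        # find_next searches everything that follows in document order,
        # so it can leave the name table and reach the data table.
        table = text_node.find_next("table", attrs={"border": "1"})
        if table is not None:
            found[text_node.strip()] = table
    return found


result = tables_after_names(html, ["ABC", "GHI"])
for name, table in result.items():
    print(name, "->", table.get_text(strip=True))
```

A name that does not occur in the page is simply absent from the returned dict, which covers the "check whether the text exists" step. On older bs4 releases the string= argument may need to be spelled text=.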
