
Relational Selection Methods in Beautiful Soup


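Beautiful Soup is a Python library for parsing web pages. It is quick to work with, supports several parsers, and is feature-rich. This tutorial covers in-depth Beautiful Soup usage, including node selectors, CSS selectors, and the method selectors of Beautiful Soup 4, and serves as a foundation for learning web scraping.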

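Learning objectives

  1. Be able to use the relational selection methods

1. Relational selection

When selecting nodes, you sometimes cannot reach the node you want in a single step, for example the second a node in the sample document.

In that case, first select some node, then use it as a base to select its child nodes, parent node, sibling nodes, and so on. The sections below show how to select each of these.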
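As a minimal sketch of the "select a base node, then move from it" idea, the snippet below reaches the second a node starting from the first one. The HTML is a simplified stand-in for the sample document (the id values are assumptions), `html.parser` is used only so that no extra parser needs to be installed, and `find_next_sibling` is a Beautiful Soup method that this tutorial does not otherwise cover.

```python
from bs4 import BeautifulSoup

# Simplified stand-in for the sample document (the id values are assumed)
html = """<p class="story">Once upon a time there were three little sisters; and their names were
<a id="link1"><span>Elsie</span></a>,
<a id="link2">Lacie</a> and
<a id="link3">Tillie</a>;
and they lived at the bottom of a well.</p>"""

soup = BeautifulSoup(html, "html.parser")

# soup.a only ever returns the first <a> in the document
first_link = soup.a
# Use that node as the base and move sideways to the next <a> sibling
second_link = first_link.find_next_sibling("a")

print(second_link)  # <a id="link2">Lacie</a>
```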

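1. Child nodes

  • Syntax: soup.tag.contents

  • Returns: a list

  • Example:

    from bs4 import BeautifulSoup

    # Sample document (the class, href and id values are illustrative)
    html = '''
    <html><head><title>The Dormouse's story</title></head>
    <body>
    <p class="story">Once upon a time there were three little sisters; and their names were
    <a class="sister" href="http://example.com/elsie" id="link1"><span>Elsie</span></a>,
    <a class="sister" href="http://example.com/lacie" id="link2">Lacie</a> and
    <a class="sister" href="http://example.com/tillie" id="link3">Tillie</a>;
    and they lived at the bottom of a well.</p>
    <p class="story">...</p>
    '''
    soup = BeautifulSoup(html, 'lxml')
    # Get the direct children of the p node
    print(soup.p.contents)
    # Output
    # ['Once upon a time there were three little sisters; and their names were\n', <a class="sister" href="http://example.com/elsie" id="link1"><span>Elsie</span></a>, ',\n', <a class="sister" href="http://example.com/lacie" id="link2">Lacie</a>, ' and\n', <a class="sister" href="http://example.com/tillie" id="link3">Tillie</a>, ';\nand they lived at the bottom of a well.']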
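  • Syntax: soup.tag.children

  • Returns: an iterator

  • Example:

    from bs4 import BeautifulSoup

    # Sample document (the class, href and id values are illustrative)
    html = '''
    <html><head><title>The Dormouse's story</title></head>
    <body>
    <p class="story">Once upon a time there were three little sisters; and their names were
    <a class="sister" href="http://example.com/elsie" id="link1"><span>Elsie</span></a>,
    <a class="sister" href="http://example.com/lacie" id="link2">Lacie</a> and
    <a class="sister" href="http://example.com/tillie" id="link3">Tillie</a>;
    and they lived at the bottom of a well.</p>
    <p class="story">...</p>
    '''
    soup = BeautifulSoup(html, 'lxml')
    # Printing .children only shows an iterator object, so loop over it
    print(soup.p.children)
    for i, child in enumerate(soup.p.children):
        print(i, child)
    # Output of the loop (text nodes keep their trailing newline)
    # 0 Once upon a time there were three little sisters; and their names were
    #
    # 1 <a class="sister" href="http://example.com/elsie" id="link1"><span>Elsie</span></a>
    # 2 ,
    #
    # 3 <a class="sister" href="http://example.com/lacie" id="link2">Lacie</a>
    # 4  and
    #
    # 5 <a class="sister" href="http://example.com/tillie" id="link3">Tillie</a>
    # 6 ;
    # and they lived at the bottom of a well.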
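The two attributes above expose the same nodes: contents as a list, children as something you iterate over. The sketch below checks that and then keeps only the tag children, since the text between tags also counts as a child. The HTML is a simplified stand-in for the sample document (the id values are assumptions), and `html.parser` is used only to avoid an extra dependency.

```python
from bs4 import BeautifulSoup
from bs4.element import Tag

# Simplified stand-in for the sample document (the id values are assumed)
html = """<p class="story">Once upon a time there were three little sisters; and their names were
<a id="link1"><span>Elsie</span></a>,
<a id="link2">Lacie</a> and
<a id="link3">Tillie</a>;
and they lived at the bottom of a well.</p>"""

soup = BeautifulSoup(html, "html.parser")
p = soup.p

# .children yields exactly the nodes stored in the .contents list
same_nodes = list(p.children) == p.contents

# Text nodes are children too, so filter on Tag to keep only elements
tag_children = [child for child in p.children if isinstance(child, Tag)]
link_ids = [tag["id"] for tag in tag_children]

print(same_nodes)        # True
print(len(p.contents))   # 7
print(link_ids)          # ['link1', 'link2', 'link3']
```

Indexing (for example `p.contents[1]`) only works on contents, because children has to be consumed by iteration.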

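2. Descendant nodes

The examples above retrieved all direct children of the p node. To get every descendant of the p node, at any depth, use the descendants attribute.

  • Syntax: soup.tag.descendants

  • Returns: a generator

  • Example:

    from bs4 import BeautifulSoup

    # Sample document (the class, href and id values are illustrative)
    html = '''
    <html><head><title>The Dormouse's story</title></head>
    <body>
    <p class="story">Once upon a time there were three little sisters; and their names were
    <a class="sister" href="http://example.com/elsie" id="link1"><span>Elsie</span></a>,
    <a class="sister" href="http://example.com/lacie" id="link2">Lacie</a> and
    <a class="sister" href="http://example.com/tillie" id="link3">Tillie</a>;
    and they lived at the bottom of a well.</p>
    <p class="story">...</p>
    '''
    soup = BeautifulSoup(html, 'lxml')
    # Printing .descendants only shows a generator object, so loop over it
    print(soup.p.descendants)
    for i, child in enumerate(soup.p.descendants):
        print(i, child)
    # Output of the loop (text nodes keep their trailing newline)
    # 0 Once upon a time there were three little sisters; and their names were
    #
    # 1 <a class="sister" href="http://example.com/elsie" id="link1"><span>Elsie</span></a>
    # 2 <span>Elsie</span>
    # 3 Elsie
    # 4 ,
    #
    # 5 <a class="sister" href="http://example.com/lacie" id="link2">Lacie</a>
    # 6 Lacie
    # 7  and
    #
    # 8 <a class="sister" href="http://example.com/tillie" id="link3">Tillie</a>
    # 9 Tillie
    # 10 ;
    # and they lived at the bottom of a well.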
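To make the difference between direct children and descendants concrete, the sketch below counts both and lists the tag names met during the walk. The HTML is a simplified stand-in for the sample document (the id values are assumptions), and `html.parser` is used only to avoid an extra dependency.

```python
from bs4 import BeautifulSoup
from bs4.element import Tag

# Simplified stand-in for the sample document (the id values are assumed)
html = """<p class="story">Once upon a time there were three little sisters; and their names were
<a id="link1"><span>Elsie</span></a>,
<a id="link2">Lacie</a> and
<a id="link3">Tillie</a>;
and they lived at the bottom of a well.</p>"""

soup = BeautifulSoup(html, "html.parser")

direct_children = soup.p.contents
all_descendants = list(soup.p.descendants)

# Descendants are visited depth-first: a tag first, then everything inside it
tag_names = [node.name for node in all_descendants if isinstance(node, Tag)]

print(len(direct_children))   # 7
print(len(all_descendants))   # 11
print(tag_names)              # ['a', 'span', 'a', 'a']
```

The four extra nodes are the span and the three strings nested inside the a tags, which contents does not reach.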

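3. Parent node

So far we have only selected children and descendants. Next, use the parent attribute to get the parent of a node.

  • Syntax: soup.tag.parent

  • Returns: a node

  • Example:

    from bs4 import BeautifulSoup

    # Sample document (the class, href and id values are illustrative)
    html = '''
    <html><head><title>The Dormouse's story</title></head>
    <body>
    <p class="story">Once upon a time there were three little sisters; and their names were
    <a class="sister" href="http://example.com/elsie" id="link1"><span>Elsie</span></a>,
    <a class="sister" href="http://example.com/lacie" id="link2">Lacie</a> and
    <a class="sister" href="http://example.com/tillie" id="link3">Tillie</a>;
    and they lived at the bottom of a well.</p>
    <p class="story">...</p>
    '''
    soup = BeautifulSoup(html, 'lxml')
    # Print the selected a node (the first one in the document)
    print(soup.a)
    # Print the parent of that a node
    print(soup.a.parent)
    # Output
    # The selected a node:
    # <a class="sister" href="http://example.com/elsie" id="link1"><span>Elsie</span></a>
    # Its parent node:
    # <p class="story">Once upon a time there were three little sisters; and their names were
    # <a class="sister" href="http://example.com/elsie" id="link1"><span>Elsie</span></a>,
    # <a class="sister" href="http://example.com/lacie" id="link2">Lacie</a> and
    # <a class="sister" href="http://example.com/tillie" id="link3">Tillie</a>;
    # and they lived at the bottom of a well.</p>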
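Because parent returns an ordinary node, it can be chained to climb one level at a time. The sketch below does that from the nested span and also shows where the climb ends. The HTML is a simplified stand-in for the sample document (the id values are assumptions), and `html.parser` is used only to avoid an extra dependency; with it, a fragment's top-level tag sits directly under the BeautifulSoup object.

```python
from bs4 import BeautifulSoup

# Simplified stand-in for the sample document (the id values are assumed)
html = """<p class="story">Once upon a time there were three little sisters; and their names were
<a id="link1"><span>Elsie</span></a>,
<a id="link2">Lacie</a> and
<a id="link3">Tillie</a>;
and they lived at the bottom of a well.</p>"""

soup = BeautifulSoup(html, "html.parser")
span = soup.span

parent_name = span.parent.name               # the <a> wrapping the span
grandparent_name = span.parent.parent.name   # the <p> wrapping that <a>
top_is_soup = soup.p.parent is soup          # the document object is the top-level parent
soup_parent = soup.parent                    # nothing sits above the document object

print(parent_name)        # a
print(grandparent_name)   # p
print(top_is_soup)        # True
print(soup_parent)        # None
```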

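4. Ancestor nodes

To get all ancestors of a node, use the parents attribute.

  • Syntax: soup.tag.parents

  • Returns: a generator

  • Example:

    from bs4 import BeautifulSoup

    # Sample document (the class, href and id values are illustrative)
    html = '''
    <html><head><title>The Dormouse's story</title></head>
    <body>
    <p class="story">Once upon a time there were three little sisters; and their names were
    <a class="sister" href="http://example.com/elsie" id="link1"><span>Elsie</span></a>,
    <a class="sister" href="http://example.com/lacie" id="link2">Lacie</a> and
    <a class="sister" href="http://example.com/tillie" id="link3">Tillie</a>;
    and they lived at the bottom of a well.</p>
    <p class="story">...</p>
    '''
    soup = BeautifulSoup(html, 'lxml')
    # Printing .parents only shows a generator object
    print(soup.a.parents)
    # Print the type of that object
    print(type(soup.a.parents))
    # Collect all ancestors of the a node, nearest first
    print(list(enumerate(soup.a.parents)))
    # Output (abridged; the real output prints each ancestor's full markup)
    # <generator object ... at 0x...>
    # <class 'generator'>
    # [(0, <p class="story">Once upon a time ... at the bottom of a well.</p>),
    #  (1, <body> ... </body>),
    #  (2, <html><head><title>The Dormouse's story</title></head> ... </html>),
    #  (3, <html><head><title>The Dormouse's story</title></head> ... </html>)]
    # The last item is the BeautifulSoup object itself, which prints like the html tag.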
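Printing every ancestor in full is hard to read, so a common alternative is to print only their names. The sketch below does that for the nested span. The HTML is a shortened, complete document written for this example (an assumption, not the sample above), `html.parser` is used only to avoid an extra dependency, and `find_parent` is a Beautiful Soup method that this tutorial does not otherwise cover.

```python
from bs4 import BeautifulSoup

# Shortened complete document, written for this example (an assumption)
html = """<html><head><title>The Dormouse's story</title></head>
<body><p class="story"><a id="link1"><span>Elsie</span></a></p></body></html>"""

soup = BeautifulSoup(html, "html.parser")

# .parents walks upward from the nearest ancestor to the document object
ancestor_names = [parent.name for parent in soup.span.parents]

# find_parent stops at the first ancestor that matches the given tag name
nearest_p = soup.span.find_parent("p")

print(ancestor_names)       # ['a', 'p', 'body', 'html', '[document]']
print(nearest_p["class"])   # ['story']
```

The final `[document]` entry is the BeautifulSoup object, which matches the last item in the enumerate output of the example above.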

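5. Sibling nodes

The sections above covered how to get child and parent nodes. To get nodes at the same level instead, use the sibling attributes.

  • Get the next node

  • Syntax: soup.tag.next_sibling

  • Returns: a node

  • Example:

    from bs4 import BeautifulSoup

    # Sample document (the class, href and id values are illustrative)
    html = '''
    <html><head><title>The Dormouse's story</title></head>
    <body>
    <p class="story">Once upon a time there were three little sisters; and their names were
    <a class="sister" href="http://example.com/elsie" id="link1"><span>Elsie</span></a>,
    <a class="sister" href="http://example.com/lacie" id="link2">Lacie</a> and
    <a class="sister" href="http://example.com/tillie" id="link3">Tillie</a>;
    and they lived at the bottom of a well.</p>
    <p class="story">...</p>
    '''
    soup = BeautifulSoup(html, 'lxml')
    # Get the node right after the a node (here a text node, not the next tag)
    print(soup.a.next_sibling)
    # Get its type
    print(type(soup.a.next_sibling))
    # Output
    # ,
    #
    # <class 'bs4.element.NavigableString'>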
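  • Get all following nodes

  • Syntax: soup.tag.next_siblings

  • Returns: a generator

  • Example:

    # Printing .next_siblings only shows a generator object
    print(soup.a.next_siblings)
    # Get its type
    print(type(soup.a.next_siblings))
    # Collect every node after the a node
    print(list(enumerate(soup.a.next_siblings)))
    # Output
    # <generator object ... at 0x...>
    # <class 'generator'>
    # [(0, ',\n'),
    #  (1, <a class="sister" href="http://example.com/lacie" id="link2">Lacie</a>),
    #  (2, ' and\n'),
    #  (3, <a class="sister" href="http://example.com/tillie" id="link3">Tillie</a>),
    #  (4, ';\nand they lived at the bottom of a well.')]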
    
  • Get the previous node

  • Syntax: soup.tag.previous_sibling

  • Returns: a node

  • Example:

    # Get the node right before the a node
    print(soup.a.previous_sibling)
    # Get its type
    print(type(soup.a.previous_sibling))
    # Output
    # Once upon a time there were three little sisters; and their names were
    #
    # <class 'bs4.element.NavigableString'>
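  • Get all preceding nodes

  • Syntax: soup.tag.previous_siblings

  • Returns: a generator

  • Example:

    # Printing .previous_siblings only shows a generator object
    print(soup.a.previous_siblings)
    # Get its type
    print(type(soup.a.previous_siblings))
    # Collect every node before the a node
    print(list(enumerate(soup.a.previous_siblings)))
    # Output
    # <generator object ... at 0x...>
    # <class 'generator'>
    # [(0, 'Once upon a time there were three little sisters; and their names were\n')]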
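As the outputs above show, the sibling attributes return text nodes as well as tags, so the next sibling of an a node is often punctuation or whitespace. The sketch below shows two ways to keep only tag siblings. The HTML is a simplified stand-in for the sample document (the id values are assumptions), `html.parser` is used only to avoid an extra dependency, and `find_next_siblings` is a Beautiful Soup method that this tutorial does not otherwise cover.

```python
from bs4 import BeautifulSoup
from bs4.element import Tag

# Simplified stand-in for the sample document (the id values are assumed)
html = """<p class="story">Once upon a time there were three little sisters; and their names were
<a id="link1"><span>Elsie</span></a>,
<a id="link2">Lacie</a> and
<a id="link3">Tillie</a>;
and they lived at the bottom of a well.</p>"""

soup = BeautifulSoup(html, "html.parser")
first_link = soup.a
last_link = soup.find("a", id="link3")

# The immediate sibling is the text between the two tags
next_node = first_link.next_sibling

# Option 1: let Beautiful Soup filter by tag name
following_ids = [tag["id"] for tag in first_link.find_next_siblings("a")]

# Option 2: filter the generator yourself; previous_siblings yields nearest first
preceding_ids = [node["id"] for node in last_link.previous_siblings if isinstance(node, Tag)]

# Past the last sibling there is nothing left, so the attribute is None
after_last = last_link.next_sibling.next_sibling

print(repr(next_node))   # ',\n'
print(following_ids)     # ['link2', 'link3']
print(preceding_ids)     # ['link2', 'link1']
print(after_last)        # None
```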
    

2. Summary

Relational selection with node selectors:

  • Child nodes
    • soup.tag.contents
    • soup.tag.children
  • Descendant nodes
    • soup.tag.descendants
  • Parent node
    • soup.tag.parent
  • Ancestor nodes
    • soup.tag.parents
  • Sibling nodes
    • soup.tag.next_sibling
    • soup.tag.next_siblings
    • soup.tag.previous_sibling
    • soup.tag.previous_siblings
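The attributes in this summary are different views of one tree, so they agree with each other. The sketch below checks a few of those relationships for the second a node. The HTML is a simplified stand-in for the sample document (the id values are assumptions), and `html.parser` is used only to avoid an extra dependency.

```python
from bs4 import BeautifulSoup

# Simplified stand-in for the sample document (the id values are assumed)
html = """<p class="story">Once upon a time there were three little sisters; and their names were
<a id="link1"><span>Elsie</span></a>,
<a id="link2">Lacie</a> and
<a id="link3">Tillie</a>;
and they lived at the bottom of a well.</p>"""

soup = BeautifulSoup(html, "html.parser")
link = soup.find("a", id="link2")
parent = link.parent

# Preceding siblings (reversed into document order) + the node + following siblings
rebuilt = list(link.previous_siblings)[::-1] + [link] + list(link.next_siblings)
matches_contents = len(rebuilt) == len(parent.contents) and all(
    a is b for a, b in zip(rebuilt, parent.contents)
)

# The first ancestor yielded by .parents is the node's .parent
first_ancestor_is_parent = next(iter(link.parents)) is parent

# Stepping forward and then back returns to the same node
round_trip = link.next_sibling.previous_sibling is link

print(matches_contents)           # True
print(first_ancestor_is_parent)   # True
print(round_trip)                 # True
```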