Collecting links with requests-html was so easy that I built a simple image downloader

Update: I also made a version that sorts the saved images into a separate folder for each article.

Collecting the links on a page really is effortless.

from requests_html import HTMLSession

session = HTMLSession()
r = session.get(page_url)
links = r.html.absolute_links  # this one line is all it takes
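
Since absolute_links is a set of absolute URL strings, narrowing it down to image links is plain string handling. Below is a minimal sketch of that filtering step; the URLs are made up for illustration and stand in for what a real page would return, so it runs without any network access.

```python
# Stand-in for r.html.absolute_links (a set of absolute URL strings).
# These example URLs are hypothetical, not taken from a real page.
links = {
    'https://example.com/about',
    'https://example.com/img/cat.jpg',
    'https://example.com/img/dog.gif',
    'https://example.com/style.css',
}

suffix = ('.jpg', '.gif')

# str.endswith accepts a tuple, so one call checks every extension.
# Sets are unordered, so sort the result to get a stable order.
image_links = sorted(link for link in links if link.endswith(suffix))

for link in image_links:
    print(link)
```

Passing a tuple to endswith keeps the check to a single condition even when more extensions are added later.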

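Installing requests-html

Note that the import name uses an underscore (requests_html), while the package name for pip uses a hyphen (requests-html).

$ pip install requests-html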

Image downloader

I wrote this partly as Python practice, so no hard feelings if it doesn't work for you.

#!/usr/bin/env python
# coding: utf-8

from requests_html import HTMLSession
from time import sleep
from pathlib import Path

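# URL of the page to collect image links from
page_url = ''

# Only links ending in one of these extensions are downloaded
suffix = ('.jpg', '.gif')

# Images are saved under ./img; create the directory if it is missing
path = Path('.')
img_path = path / 'img'

if not img_path.is_dir():
    img_path.mkdir(parents=True)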

session = HTMLSession()
r = session.get(page_url)
links = r.html.absolute_links

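for link in links:
    # Only follow links that end in an image extension
    if not link.endswith(suffix):
        continue

    # Fetch the image first, then check that the request succeeded
    img_link = session.get(link)
    if img_link.status_code != 200:
        continue

    # Name the file after the last segment of the final URL
    img_file = img_path / img_link.url.split('/')[-1]

    # Skip files that were already saved
    if not img_file.is_file():
        with open(img_file, 'wb') as f:
            f.write(img_link.content)
        sleep(3)  # pause between downloads to go easy on the server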
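
One thing the script above does not handle is image URLs that carry a query string or an uppercase extension, such as photo.JPG?size=large. The following is a rough sketch of one way to deal with that using only the standard library. The helper names filename_from_url and is_image are hypothetical, introduced here for illustration, and the URL is made up.

```python
from pathlib import Path
from urllib.parse import urlparse, unquote


def filename_from_url(url):
    """Return the last path segment of url, ignoring query and fragment."""
    path = urlparse(url).path
    # Decode percent-escapes such as %20 so the saved name is readable
    return unquote(path.rsplit('/', 1)[-1])


def is_image(url, suffix=('.jpg', '.gif')):
    """Check the extension on the path only, case-insensitively."""
    return filename_from_url(url).lower().endswith(suffix)


# Hypothetical URL for illustration
url = 'https://example.com/img/Photo%201.JPG?size=large'

print(filename_from_url(url))
print(is_image(url))
print(Path('img') / filename_from_url(url))
```

In the downloader, is_image(link) could take the place of link.endswith(suffix), and filename_from_url(img_link.url) the place of img_link.url.split('/')[-1], assuming that behavior is what you want.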