
Commit 27c4509

Merge remote-tracking branch 'origin/main' into close-inactive-contexts

2 parents: 98ba7bf + 809aad1
File tree: 17 files changed, +222 −47 lines

.bumpversion.cfg
Lines changed: 1 addition & 1 deletion

```diff
@@ -1,5 +1,5 @@
 [bumpversion]
-current_version = 0.0.35
+current_version = 0.0.36
 commit = True
 tag = True
 
```

.github/workflows/tests.yml
Lines changed: 15 additions & 0 deletions

```diff
@@ -13,6 +13,8 @@ jobs:
         include:
           - os: macos-latest
             python-version: "3.12"
+          - os: windows-latest
+            python-version: "3.12"
 
     steps:
       - uses: actions/checkout@v4
@@ -33,14 +35,27 @@ jobs:
 
       - name: Upload coverage report (Linux)
         if: runner.os == 'Linux'
+        env:
+          CODECOV_TOKEN: ${{ secrets.CODECOV_TOKEN }}
         run: |
           curl -Os https://uploader.codecov.io/latest/linux/codecov
           chmod +x codecov
           ./codecov
 
       - name: Upload coverage report (macOS)
         if: runner.os == 'macOS'
+        env:
+          CODECOV_TOKEN: ${{ secrets.CODECOV_TOKEN }}
         run: |
           curl -Os https://uploader.codecov.io/latest/macos/codecov
           chmod +x codecov
           ./codecov
+
+      - name: Upload coverage report (Windows)
+        if: runner.os == 'Windows'
+        env:
+          CODECOV_TOKEN: ${{ secrets.CODECOV_TOKEN }}
+        run: |
+          $ProgressPreference = 'SilentlyContinue'
+          Invoke-WebRequest -Uri https://uploader.codecov.io/latest/windows/codecov.exe -Outfile codecov.exe
+          .\codecov.exe
```

README.md
Lines changed: 31 additions & 18 deletions

````diff
@@ -56,10 +56,13 @@ See the [changelog](docs/changelog.md) document.
 
 ## Activation
 
+### Download handler
+
 Replace the default `http` and/or `https` Download Handlers through
 [`DOWNLOAD_HANDLERS`](https://docs.scrapy.org/en/latest/topics/settings.html):
 
 ```python
+# settings.py
 DOWNLOAD_HANDLERS = {
     "http": "scrapy_playwright.handler.ScrapyPlaywrightDownloadHandler",
     "https": "scrapy_playwright.handler.ScrapyPlaywrightDownloadHandler",
@@ -70,12 +73,19 @@ Note that the `ScrapyPlaywrightDownloadHandler` class inherits from the default
 `http/https` handler. Unless explicitly marked (see [Basic usage](#basic-usage)),
 requests will be processed by the regular Scrapy download handler.
 
-Also, be sure to [install the `asyncio`-based Twisted reactor](https://docs.scrapy.org/en/latest/topics/asyncio.html#installing-the-asyncio-reactor):
+
+### Twisted reactor
+
+When running on GNU/Linux or macOS you'll need to
+[install the `asyncio`-based Twisted reactor](https://docs.scrapy.org/en/latest/topics/asyncio.html#installing-the-asyncio-reactor):
 
 ```python
+# settings.py
 TWISTED_REACTOR = "twisted.internet.asyncioreactor.AsyncioSelectorReactor"
 ```
 
+This is not a requirement on Windows (see [Windows support](#windows-support))
+
 
 ## Basic usage
 
@@ -112,6 +122,20 @@ does not match the running Browser. If you prefer the `User-Agent` sent by
 default by the specific browser you're using, set the Scrapy user agent to `None`.
 
 
+## Windows support
+
+Windows support is possible by running Playwright in a `ProactorEventLoop` in a separate thread.
+This is necessary because it's not possible to run Playwright in the same
+asyncio event loop as the Scrapy crawler:
+* Playwright runs the driver in a subprocess. Source:
+  [Playwright repository](https://github.com/microsoft/playwright-python/blob/v1.44.0/playwright/_impl/_transport.py#L120-L130).
+* "On Windows, the default event loop `ProactorEventLoop` supports subprocesses,
+  whereas `SelectorEventLoop` does not". Source:
+  [Python docs](https://docs.python.org/3/library/asyncio-platforms.html#asyncio-windows-subprocess).
+* Twisted's `asyncio` reactor requires the `SelectorEventLoop`. Source:
+  [Twisted repository](https://github.com/twisted/twisted/blob/twisted-24.3.0/src/twisted/internet/asyncioreactor.py#L31)
+
+
 ## Supported [settings](https://docs.scrapy.org/en/latest/topics/settings.html)
 
 ### `PLAYWRIGHT_BROWSER_TYPE`
@@ -870,6 +894,12 @@ Refer to the
 [upstream docs](https://docs.scrapy.org/en/latest/topics/extensions.html#module-scrapy.extensions.memusage)
 for more information about supported settings.
 
+### Windows support
+
+Just like the [upstream Scrapy extension](https://docs.scrapy.org/en/latest/topics/extensions.html#module-scrapy.extensions.memusage), this custom memory extension does not work
+on Windows. This is because the stdlib [`resource`](https://docs.python.org/3/library/resource.html)
+module is not available.
+
 
 ## Examples
 
@@ -931,23 +961,6 @@ See the [examples](examples) directory for more.
 
 ## Known issues
 
-### Lack of native support for Windows
-
-This package does not work natively on Windows. This is because:
-
-* Playwright runs the driver in a subprocess. Source:
-  [Playwright repository](https://github.com/microsoft/playwright-python/blob/v1.28.0/playwright/_impl/_transport.py#L120-L129).
-* "On Windows, the default event loop `ProactorEventLoop` supports subprocesses,
-  whereas `SelectorEventLoop` does not". Source:
-  [Python docs](https://docs.python.org/3/library/asyncio-platforms.html#asyncio-windows-subprocess).
-* Twisted's `asyncio` reactor requires the `SelectorEventLoop`. Source:
-  [Twisted repository](https://github.com/twisted/twisted/blob/twisted-22.4.0/src/twisted/internet/asyncioreactor.py#L31).
-
-Some users have reported having success
-[running under WSL](https://github.com/scrapy-plugins/scrapy-playwright/issues/7#issuecomment-817394494).
-See also [#78](https://github.com/scrapy-plugins/scrapy-playwright/issues/78)
-for information about working in headful mode under WSL.
-
 ### No per-request proxy support
 Specifying a proxy via the `proxy` Request meta key is not supported.
 Refer to the [Proxy support](#proxy-support) section for more information.
````
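
The three bullets in the new "Windows support" section carry the whole argument. As a minimal sketch of that constraint (not part of the commit; the helper function name is illustrative), the following Windows-only snippet shows that a proactor loop can spawn the kind of subprocess Playwright's driver needs, while the selector loop that Twisted's `asyncio` reactor requires cannot:

```python
import asyncio
import platform
import sys

async def spawn_driver_like_playwright() -> int:
    # Playwright launches its driver as a subprocess; this stands in for that.
    proc = await asyncio.create_subprocess_exec(sys.executable, "--version")
    return await proc.wait()

if platform.system() == "Windows":
    # Works: ProactorEventLoop implements subprocess transports on Windows.
    asyncio.set_event_loop_policy(asyncio.WindowsProactorEventLoopPolicy())
    asyncio.run(spawn_driver_like_playwright())

    # Fails: SelectorEventLoop (the loop Twisted's asyncio reactor requires)
    # raises NotImplementedError when asked to spawn a subprocess.
    asyncio.set_event_loop_policy(asyncio.WindowsSelectorEventLoopPolicy())
    try:
        asyncio.run(spawn_driver_like_playwright())
    except NotImplementedError:
        print("SelectorEventLoop cannot spawn subprocesses on Windows")
```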

docs/changelog.md
Lines changed: 10 additions & 0 deletions

```diff
@@ -1,5 +1,15 @@
 # scrapy-playwright changelog
 
+### [v0.0.36](https://github.com/scrapy-plugins/scrapy-playwright/releases/tag/v0.0.36) (2024-06-24)
+
+* Windows support (#276)
+
+
+### [v0.0.35](https://github.com/scrapy-plugins/scrapy-playwright/releases/tag/v0.0.35) (2024-06-01)
+
+* Update exception message check
+
+
 ### [v0.0.34](https://github.com/scrapy-plugins/scrapy-playwright/releases/tag/v0.0.34) (2024-01-01)
 
 * Update dev status classifier to 4 - beta
```

scrapy_playwright/__init__.py
Lines changed: 1 addition & 1 deletion

```diff
@@ -1 +1 @@
-__version__ = "0.0.35"
+__version__ = "0.0.36"
```

scrapy_playwright/_utils.py
Lines changed: 42 additions & 2 deletions

```diff
@@ -1,11 +1,16 @@
+import asyncio
+import concurrent
 import logging
+import platform
+import threading
 from typing import Awaitable, Iterator, Optional, Tuple, Union
 
+import scrapy
 from playwright.async_api import Error, Page, Request, Response
-from scrapy import Spider
 from scrapy.http.headers import Headers
 from scrapy.settings import Settings
 from scrapy.utils.python import to_unicode
+from twisted.internet.defer import Deferred
 from w3lib.encoding import html_body_declared_encoding, http_content_type_encoding
 
 
@@ -54,7 +59,7 @@ def _is_safe_close_error(error: Error) -> bool:
 
 async def _get_page_content(
     page: Page,
-    spider: Spider,
+    spider: scrapy.Spider,
     context_name: str,
     scrapy_request_url: str,
     scrapy_request_method: str,
@@ -97,3 +102,38 @@ async def _get_header_value(
         return await resource.header_value(header_name)
     except Exception:
         return None
+
+
+if platform.system() == "Windows":
+
+    class _WindowsAdapter:
+        """Utility class to redirect coroutines to an asyncio event loop running
+        in a different thread. This allows to use a ProactorEventLoop, which is
+        supported by Playwright on Windows.
+        """
+
+        loop = None
+        thread = None
+
+        @classmethod
+        def get_event_loop(cls) -> asyncio.AbstractEventLoop:
+            if cls.thread is None:
+                if cls.loop is None:
+                    policy = asyncio.WindowsProactorEventLoopPolicy()  # type: ignore
+                    cls.loop = policy.new_event_loop()
+                    asyncio.set_event_loop(cls.loop)
+                if not cls.loop.is_running():
+                    cls.thread = threading.Thread(target=cls.loop.run_forever, daemon=True)
+                    cls.thread.start()
+                    logger.info("Started loop on separate thread: %s", cls.loop)
+            return cls.loop
+
+        @classmethod
+        async def get_result(cls, coro) -> concurrent.futures.Future:
+            return asyncio.run_coroutine_threadsafe(coro=coro, loop=cls.get_event_loop()).result()
+
+    def _deferred_from_coro(coro) -> Deferred:
+        return scrapy.utils.defer.deferred_from_coro(_WindowsAdapter.get_result(coro))
+
+else:
+    _deferred_from_coro = scrapy.utils.defer.deferred_from_coro
```
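
The `_WindowsAdapter` pattern can be hard to follow inside a diff. Here is a standalone sketch of the same idea (illustrative only; `worker_loop` and `compute` are not names from the commit): keep a dedicated event loop running forever in a daemon thread, and submit coroutines to it from outside that thread:

```python
import asyncio
import threading

# A dedicated loop, run forever in a daemon thread (as _WindowsAdapter does
# with a ProactorEventLoop on Windows).
worker_loop = asyncio.new_event_loop()
threading.Thread(target=worker_loop.run_forever, daemon=True).start()

async def compute() -> int:
    await asyncio.sleep(0.1)
    return 42

# run_coroutine_threadsafe schedules the coroutine on worker_loop and returns
# a concurrent.futures.Future; .result() blocks the caller until the
# coroutine finishes on the other thread's loop.
future = asyncio.run_coroutine_threadsafe(compute(), worker_loop)
print(future.result())  # 42
```

In the diff above, `get_result` performs that blocking `.result()` call inside a coroutine of its own, which is what lets `scrapy.utils.defer.deferred_from_coro` turn the whole thing into a `Deferred` on the Twisted side.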

scrapy_playwright/handler.py
Lines changed: 8 additions & 6 deletions

```diff
@@ -1,5 +1,6 @@
 import asyncio
 import logging
+import platform
 from contextlib import suppress
 from dataclasses import dataclass
 from ipaddress import ip_address
@@ -25,20 +26,20 @@
 from scrapy.http.headers import Headers
 from scrapy.responsetypes import responsetypes
 from scrapy.settings import Settings
-from scrapy.utils.defer import deferred_from_coro
 from scrapy.utils.misc import load_object
 from scrapy.utils.reactor import verify_installed_reactor
 from twisted.internet.defer import Deferred, inlineCallbacks
 
 from scrapy_playwright.headers import use_scrapy_headers
 from scrapy_playwright.page import PageMethod
 from scrapy_playwright._utils import (
+    _deferred_from_coro,
     _encode_body,
+    _get_float_setting,
     _get_header_value,
     _get_page_content,
     _is_safe_close_error,
     _maybe_await,
-    _get_float_setting,
 )
 
 
@@ -108,7 +109,8 @@ class ScrapyPlaywrightDownloadHandler(HTTPDownloadHandler):
 
     def __init__(self, crawler: Crawler) -> None:
         super().__init__(settings=crawler.settings, crawler=crawler)
-        verify_installed_reactor("twisted.internet.asyncioreactor.AsyncioSelectorReactor")
+        if platform.system() != "Windows":
+            verify_installed_reactor("twisted.internet.asyncioreactor.AsyncioSelectorReactor")
         crawler.signals.connect(self._engine_started, signals.engine_started)
         self.stats = crawler.stats
 
@@ -141,7 +143,7 @@ def from_crawler(cls: Type[PlaywrightHandler], crawler: Crawler) -> PlaywrightHandler:
 
     def _engine_started(self) -> Deferred:
         """Launch the browser. Use the engine_started signal as it supports returning deferreds."""
-        return deferred_from_coro(self._launch())
+        return _deferred_from_coro(self._launch())
 
     async def _launch(self) -> None:
         """Launch Playwright manager and configured startup context(s)."""
@@ -321,7 +323,7 @@ def _set_max_concurrent_context_count(self):
     def close(self) -> Deferred:
         logger.info("Closing download handler")
         yield super().close()
-        yield deferred_from_coro(self._close())
+        yield _deferred_from_coro(self._close())
 
     async def _close(self) -> None:
         logger.info("Closing %i contexts", len(self.context_wrappers))
@@ -337,7 +339,7 @@
 
     def download_request(self, request: Request, spider: Spider) -> Deferred:
         if request.meta.get("playwright"):
-            return deferred_from_coro(self._download_request(request, spider))
+            return _deferred_from_coro(self._download_request(request, spider))
         return super().download_request(request, spider)
 
     async def _download_request(self, request: Request, spider: Spider) -> Response:
```

tests/__init__.py
Lines changed: 28 additions & 0 deletions

```diff
@@ -1,10 +1,38 @@
+import inspect
+import logging
+import platform
 from contextlib import asynccontextmanager
+from functools import wraps
 
 from scrapy import Request
 from scrapy.http.response.html import HtmlResponse
 from scrapy.utils.test import get_crawler
 
 
+logger = logging.getLogger("scrapy-playwright-tests")
+
+
+if platform.system() == "Windows":
+    from scrapy_playwright._utils import _WindowsAdapter
+
+    def allow_windows(test_method):
+        """Wrap tests with the _WindowsAdapter class on Windows."""
+        if not inspect.iscoroutinefunction(test_method):
+            raise RuntimeError(f"{test_method} must be an async def method")
+
+        @wraps(test_method)
+        async def wrapped(self, *args, **kwargs):
+            logger.debug("Calling _WindowsAdapter.get_result for %r", self)
+            await _WindowsAdapter.get_result(test_method(self, *args, **kwargs))
+
+        return wrapped
+
+else:
+
+    def allow_windows(test_method):
+        return test_method
+
+
 @asynccontextmanager
 async def make_handler(settings_dict: dict):
     """Convenience function to obtain an initialized handler and close it gracefully"""
```

tests/conftest.py
Lines changed: 17 additions & 0 deletions

```diff
@@ -1,3 +1,20 @@
+import platform
+
+import pytest
+
+
+@pytest.hookimpl(tryfirst=True)
+def pytest_configure(config):
+    # https://twistedmatrix.com/trac/ticket/9766
+    # https://github.com/pytest-dev/pytest-twisted/issues/80
+
+    if config.getoption("reactor", "default") == "asyncio" and platform.system() == "Windows":
+        import asyncio
+
+        selector_policy = asyncio.WindowsSelectorEventLoopPolicy()
+        asyncio.set_event_loop_policy(selector_policy)
+
+
 def pytest_sessionstart(session):  # pylint: disable=unused-argument
     """
     Called after the Session object has been created and before performing
```
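
For context, a short sketch of what the hook guards against (an assumption drawn from the linked Twisted ticket, not code from this commit): on Windows, Twisted's asyncio reactor can only be installed on top of a selector event loop, so the policy must be pinned before the reactor is installed.

```python
import asyncio
import platform

if platform.system() == "Windows":
    # The default policy on Windows yields a ProactorEventLoop, which
    # Twisted's AsyncioSelectorReactor rejects; pin the selector policy first.
    asyncio.set_event_loop_policy(asyncio.WindowsSelectorEventLoopPolicy())

from twisted.internet import asyncioreactor

asyncioreactor.install()  # must run before anything else imports the reactor
```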
