Description
I initially reported this as a papermill issue, though I am not sure that is where the problem lies. I am copying it to the SparkMagic community in case an expert here can advise on how to get unblocked. Please feel free to close it if this is not a SparkMagic issue. Thanks in advance.
Describe the bug
Our use case is to run SparkMagic wrapper kernels under Papermill notebook execution.
Most functions work as expected, except the %%sql magic, which gets stuck during execution. SparkMagic works properly when run interactively in JupyterLab; the issue only occurs for the %%sql magic when running under Papermill.
From the attached debugging log, I can see that the %%sql logic was executed and the response was retrieved. The execution state returned to idle at the end. However, the output of the %%sql cell was not updated, and the following cells were not executed.
Papermill printed the following content, which shows that the %%sql cell was executed properly. This content was not rendered into the cell output.
msg_type: display_data
content: {'data': {'text/plain': '<IPython.core.display.HTML object>', 'text/html': '\n<style scoped>\n .dataframe tbody tr th:only-of-type {\n vertical-align: middle;\n }\n\n .dataframe tbody tr th {\n vertical-align: top;\n }\n\n .dataframe thead th {\n text-align: right;\n }\n</style>\n\n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n'}, 'metadata': {}, 'transient': {}}\n
   database  tableName      isTemporary
0  default   movie_reviews  False
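To see where execution stopped without opening the output notebook by hand, the following is a minimal sketch that scans a notebook's JSON for code cells with no execution count or no outputs. The helper names `load_notebook` and `find_stalled_cells` are hypothetical and not part of papermill or SparkMagic. Cells that legitimately produce no output will also be listed, so the result is only a hint.

```python
import json


def load_notebook(path):
    """Read an .ipynb file as plain JSON (no nbformat dependency)."""
    with open(path, encoding="utf-8") as fh:
        return json.load(fh)


def find_stalled_cells(nb):
    """Return a summary of code cells that were not executed or have no outputs.

    Hypothetical helper: it only inspects the standard notebook fields
    `cell_type`, `source`, `execution_count`, and `outputs`.
    """
    report = []
    for index, cell in enumerate(nb.get("cells", [])):
        if cell.get("cell_type") != "code":
            continue
        source = cell.get("source", "")
        if isinstance(source, list):  # source may be a string or a list of lines
            source = "".join(source)
        executed = cell.get("execution_count") is not None
        has_output = bool(cell.get("outputs"))
        if not executed or not has_output:
            lines = source.splitlines()
            report.append(
                {
                    "index": index,
                    "first_line": lines[0] if lines else "",
                    "executed": executed,
                    "has_output": has_output,
                }
            )
    return report
```

For the files attached to this issue, the call would be `find_stalled_cells(load_notebook("output1.ipynb"))`. If the %%sql cell and every code cell after it appear in the result, that matches the behavior described above.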
To Reproduce
conda create --name py310 python=3.10
conda activate py310
pip install sparkmagic
pip install papermill
# install kernelspecs
SITE_PACKAGES_LOC=$(pip show sparkmagic | grep Location | awk '{print $2}')
cd $SITE_PACKAGES_LOC
jupyter-kernelspec install sparkmagic/kernels/sparkkernel --user
jupyter-kernelspec install sparkmagic/kernels/pysparkkernel --user
jupyter-kernelspec install sparkmagic/kernels/sparkrkernel --user
jupyter nbextension enable --py --sys-prefix widgetsnbextension
pip install notebook==6.5.1  # downgrade from 7.0.3 to 6.5.1 due to ModuleNotFoundError: No module named 'notebook.utils'
# Run the papermill job (the notebook is also uploaded)
# Before running this, an EMR cluster and a SparkMagic configuration are required.
# If creating one is not possible or easy, please comment with any testing/verification you need and I can help. You can also check the uploaded papermill debugging log.
papermill pm_sparkmagic_test.ipynb output1.ipynb --kernel pysparkkernel --log-level DEBUG
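Because the job hangs rather than fails, it can help to wrap the same command in a timeout while reproducing. The sketch below uses only the standard library. `run_with_timeout` is a hypothetical helper and the 600-second limit is an arbitrary assumption. The command list mirrors the papermill invocation above.

```python
import subprocess


def run_with_timeout(cmd, timeout_s):
    """Run a command and report whether it finished or hit the timeout.

    Returns ("finished", returncode) or ("timeout", None). On timeout,
    subprocess.run kills the child process before raising.
    """
    try:
        completed = subprocess.run(cmd, timeout=timeout_s)
    except subprocess.TimeoutExpired:
        return "timeout", None
    return "finished", completed.returncode


# Same arguments as the papermill command in the reproduction steps.
PAPERMILL_CMD = [
    "papermill",
    "pm_sparkmagic_test.ipynb",
    "output1.ipynb",
    "--kernel",
    "pysparkkernel",
    "--log-level",
    "DEBUG",
]
```

Calling `run_with_timeout(PAPERMILL_CMD, 600)` should return `("timeout", None)` while the hang is present. Note that killing papermill this way may leave the kernel process or the remote session running, so those may need to be cleaned up separately.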
The following is a list of the packages most likely to be relevant. I have also attached a text file containing all installed packages.
pip list | grep 'papermill\|sparkmagic\|autovizwidget\|hdijupyterutils\|ipykernel\|ipython\|ipywidgets\|mock\|nest-asyncio\|nose\|notebook\|numpy\|pandas\|requests\|requests-kerberos\|tornado\|ansiwrap\|click\|entrypoints\|nbclient\|nbformat\|pyyaml\|requests\|tenacity\|tqdm\|jupyter\|ipython'|sort
ansiwrap 0.8.4
autovizwidget 0.20.5
click 8.1.7
entrypoints 0.4
hdijupyterutils 0.20.5
ipykernel 6.25.2
ipython 8.15.0
ipython-genutils 0.2.0
ipywidgets 8.1.0
jupyter 1.0.0
jupyter_client 8.3.1
jupyter-console 6.6.3
jupyter_core 5.3.1
jupyter-events 0.7.0
jupyterlab 4.0.5
jupyterlab-pygments 0.2.2
jupyterlab_server 2.24.0
jupyterlab-widgets 3.0.8
jupyter-lsp 2.2.0
jupyter_server 2.7.3
jupyter_server_terminals 0.4.4
nbclient 0.8.0
nbformat 5.9.2
nest-asyncio 1.5.5
notebook 6.5.1
notebook_shim 0.2.3
numpy 1.25.2
pandas 1.5.3
papermill 2.4.0
requests 2.31.0
requests-kerberos 0.14.0
sparkmagic 0.20.5
tenacity 8.2.3
tornado 6.3.3
tqdm 4.66.1
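For anyone comparing environments, the same information can be collected from Python without `pip list` and `grep`. This is a minimal sketch using the standard library's `importlib.metadata`. `collect_versions` is a hypothetical helper, and the package names passed to it are a subset of the list above.

```python
from importlib import metadata


def collect_versions(names):
    """Map each distribution name to its installed version, or None if absent."""
    versions = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


# A subset of the packages listed in this issue.
RELEVANT_PACKAGES = [
    "papermill",
    "sparkmagic",
    "nbclient",
    "nbformat",
    "ipykernel",
    "jupyter_client",
    "notebook",
    "ipywidgets",
    "tornado",
]
```

Running `collect_versions(RELEVANT_PACKAGES)` inside the conda environment returns a dictionary that can be pasted into a comment for comparison.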
Expected behavior
The %%sql magic should not hang, and the following cells should proceed to execute.
Screenshots
Output notebook of papermill:

Expected output (from JupyterLab):

Versions:
- SparkMagic (0.20.5)
- Livy (N/A)
- Spark (N/A)
Additional context
log and other files.zip contains:
1. log - papermill debugging log
2. my_test_env_requirements.txt - full list of packages in the conda env
3. pm_sparkmagic_test.ipynb - the notebook executed in JupyterLab, which is also the input of the papermill job
4. output1.ipynb - output notebook from the papermill job