
[BUG] SparkMagic PySpark kernel magic (%%sql) hangs when running with Papermill #833

@edwardps

Description


I initially reported this as a papermill issue (I'm not sure it belongs there). I am copying that issue to the SparkMagic community to see whether any expert can offer advice for unblocking it. Please feel free to close this if it is not a SparkMagic issue. Thanks in advance.

Describe the bug
Our use case is to use the SparkMagic wrapper kernels with Papermill notebook execution.
Most functions work as expected except the %%sql magic, which gets stuck during execution. SparkMagic works properly when run interactively in JupyterLab; the issue only occurs for the %%sql magic when running with Papermill.

From the debugging log (attached), I can see that the %%sql logic was executed and the response was retrieved. The execution state returned to idle at the end, but the output of the %%sql cell was not updated and the following cells were not executed.

The following content was printed by Papermill, showing that the %%sql cell executed properly. This content was not rendered into the cell output.

msg_type: display_data
content: {'data': {'text/plain': '<IPython.core.display.HTML object>', 'text/html': '\n<style scoped>\n .dataframe tbody tr th:only-of-type {\n vertical-align: middle;\n }\n\n .dataframe tbody tr th {\n vertical-align: top;\n }\n\n .dataframe thead th {\n text-align: right;\n }\n</style>\n<table ...>...</table>\n'}, 'metadata': {}, 'transient': {}}

The table in the text/html payload renders as:

   database  tableName      isTemporary
0  default   movie_reviews  False

To Reproduce

conda create --name py310 python=3.10
conda activate py310

pip install sparkmagic
pip install papermill

# install kernelspecs
SITE_PACKAGES_LOC=$(pip show sparkmagic | grep Location | awk '{print $2}')
cd $SITE_PACKAGES_LOC
jupyter-kernelspec install sparkmagic/kernels/sparkkernel --user
jupyter-kernelspec install sparkmagic/kernels/pysparkkernel --user
jupyter-kernelspec install sparkmagic/kernels/sparkrkernel --user

jupyter nbextension enable --py --sys-prefix widgetsnbextension

pip install notebook==6.5.1  # Downgrade from 7.0.3 to 6.5.1 due to ModuleNotFoundError: No module named 'notebook.utils'

# Run the papermill job (the notebook is also uploaded)
# Before running this, an EMR cluster and the sparkmagic configuration are needed.
# If it's not possible/easy to create them, please comment with any testing/verification needed and I can help. You can also check the uploaded papermill debugging log.
papermill pm_sparkmagic_test.ipynb output1.ipynb --kernel pysparkkernel --log-level DEBUG
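While debugging, it may help to give each cell a time limit so that a stuck %%sql cell fails with a timeout error instead of blocking the run indefinitely. This is a minimal sketch, assuming the hang happens while the client is waiting on the kernel (where a per-cell timeout applies); the 600-second value and the helper names papermill_kwargs and run_notebook are hypothetical. The keyword arguments are passed to papermill.execute_notebook, and the roughly equivalent CLI call is `papermill pm_sparkmagic_test.ipynb output1.ipynb --kernel pysparkkernel --log-output --execution-timeout 600`.

```python
from typing import Any, Dict


def papermill_kwargs(kernel_name: str = "pysparkkernel", execution_timeout: int = 600) -> Dict[str, Any]:
    """Keyword arguments for papermill.execute_notebook mirroring the CLI call above.

    execution_timeout (seconds per cell) is an assumed value chosen for debugging.
    """
    return {
        "kernel_name": kernel_name,
        "log_output": True,  # stream cell output into the papermill log
        "execution_timeout": execution_timeout,
    }


def run_notebook(input_path: str, output_path: str, **overrides: Any):
    # Imported lazily so this module can be loaded where papermill is not installed.
    import papermill as pm

    return pm.execute_notebook(input_path, output_path, **papermill_kwargs(**overrides))
```

Usage: `run_notebook("pm_sparkmagic_test.ipynb", "output1.ipynb", execution_timeout=300)`. If the run still hangs past the limit, the hang is probably outside the client's wait on the kernel.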

The following is the list of packages that might be most relevant. I also attached a text file containing all the packages.

pip list | grep 'papermill\|sparkmagic\|autovizwidget\|hdijupyterutils\|ipykernel\|ipython\|ipywidgets\|mock\|nest-asyncio\|nose\|notebook\|numpy\|pandas\|requests\|requests-kerberos\|tornado\|ansiwrap\|click\|entrypoints\|nbclient\|nbformat\|pyyaml\|requests\|tenacity\|tqdm\|jupyter\|ipython'|sort
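The same version check can be done from Python, for example inside the failing environment or from a notebook cell, using the standard-library importlib.metadata. This is a sketch: the RELEVANT list roughly follows the grep pattern above and is an assumption you can adjust; missing packages are reported instead of raising.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Dict, Iterable, Optional

# Roughly the packages matched by the grep above (assumed subset; adjust as needed).
RELEVANT = [
    "papermill", "sparkmagic", "autovizwidget", "hdijupyterutils", "ipykernel",
    "ipython", "ipywidgets", "nbclient", "nbformat", "notebook", "jupyter_client",
    "tornado", "nest-asyncio", "pandas", "requests", "requests-kerberos",
]


def installed_versions(names: Iterable[str]) -> Dict[str, Optional[str]]:
    """Map each distribution name to its installed version, or None if missing."""
    result: Dict[str, Optional[str]] = {}
    for name in names:
        try:
            result[name] = version(name)
        except PackageNotFoundError:
            result[name] = None
    return result


for name, ver in sorted(installed_versions(RELEVANT).items()):
    print(f"{name:<20} {ver or 'not installed'}")
```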
ansiwrap                  0.8.4
autovizwidget             0.20.5
click                     8.1.7
entrypoints               0.4
hdijupyterutils           0.20.5
ipykernel                 6.25.2
ipython                   8.15.0
ipython-genutils          0.2.0
ipywidgets                8.1.0
jupyter                   1.0.0
jupyter_client            8.3.1
jupyter-console           6.6.3
jupyter_core              5.3.1
jupyter-events            0.7.0
jupyterlab                4.0.5
jupyterlab-pygments       0.2.2
jupyterlab_server         2.24.0
jupyterlab-widgets        3.0.8
jupyter-lsp               2.2.0
jupyter_server            2.7.3
jupyter_server_terminals  0.4.4
nbclient                  0.8.0
nbformat                  5.9.2
nest-asyncio              1.5.5
notebook                  6.5.1
notebook_shim             0.2.3
numpy                     1.25.2
pandas                    1.5.3
papermill                 2.4.0
requests                  2.31.0
requests-kerberos         0.14.0
sparkmagic                0.20.5
tenacity                  8.2.3
tornado                   6.3.3
tqdm                      4.66.1

Expected behavior
The %%sql cell should not hang, and the following cells should proceed with execution.

Screenshots
Output notebook from papermill: [screenshot not preserved]

Expected output (from JupyterLab): [screenshot not preserved]

Versions:

  • SparkMagic (0.20.5)
  • Livy (N/A)
  • Spark (N/A)

Additional context
log and other files.zip contains:

1. log - papermill debugging log
2. my_test_env_requirements.txt - full list of packages in the conda env
3. pm_sparkmagic_test.ipynb - the notebook executed in JupyterLab; it is also the input to the papermill job
4. output1.ipynb - output notebook from the papermill job
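To see where the run stopped in an output notebook such as output1.ipynb, the notebook JSON can be inspected directly with the standard library. This sketch lists each %%sql code cell with its output count and the per-cell status papermill records; reading it from metadata.papermill.status is an assumption about papermill's cell metadata, and the demo notebook and helper names are hypothetical.

```python
import json
from typing import Any, Dict, List


def sql_cell_report(nb: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Summarize each %%sql code cell: index, papermill status, output count."""
    report = []
    for index, cell in enumerate(nb.get("cells", [])):
        if cell.get("cell_type") != "code":
            continue
        source = cell.get("source", "")
        if isinstance(source, list):  # nbformat allows source as a list of lines
            source = "".join(source)
        if not source.lstrip().startswith("%%sql"):
            continue
        pm_meta = cell.get("metadata", {}).get("papermill", {})
        report.append({
            "index": index,
            "status": pm_meta.get("status"),  # assumed papermill metadata field
            "outputs": len(cell.get("outputs", [])),
        })
    return report


def load_notebook(path: str) -> Dict[str, Any]:
    with open(path, encoding="utf-8") as f:
        return json.load(f)


# Hypothetical demo notebook shaped like a run that stopped at a %%sql cell.
demo = {
    "cells": [
        {"cell_type": "code", "source": ["%%sql\n", "SHOW TABLES"],
         "metadata": {"papermill": {"status": "running"}}, "outputs": []},
        {"cell_type": "code", "source": "print(1)",
         "metadata": {"papermill": {"status": "pending"}}, "outputs": []},
    ]
}
print(sql_cell_report(demo))  # [{'index': 0, 'status': 'running', 'outputs': 0}]
```

Usage on the attached file: `sql_cell_report(load_notebook("output1.ipynb"))`.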

Metadata


Assignees

No one assigned

    Labels

    kind:bug (An unexpected error or issue with sparkmagic)


    Development

    No branches or pull requests
