Commit 1b09ac4

Modernize and enhance violinplot (#3353)
* Add very basic new violinplot code
* Break apart violin plotting into two loops
* Update saturation
* Take advantage of new common methods for dodging and scale inversion
* Get all density normalizations working (with common norm)
* Get split violins and singular KDEs working
* Other KDE parameters, including bw deprecation
* Add inner points and sticks
* Add inner quartiles and box
* Add group-specific normalization (scale_hue=False)
* Add fill, inner_kwargs, and rename scale/scale_hue
* Allow prefix checks in check_argument helper
* Use a dash for the inner box median by default, but allow more customization
* Use plot_violins in catplot
* Remove vestigial _ViolinPlotter and tests
* Basic positional tests
* Color tests
* Test for inner statistics
* Further testing
* Fix tests
* Update API docs and examples
* Update violinplot docstring
1 parent 3fd4146 commit 1b09ac4

9 files changed: +1108 -1269 lines changed

doc/_docstrings/barplot.ipynb

Lines changed: 9 additions & 1 deletion
Original file line number | Diff line number | Diff line change
@@ -160,7 +160,7 @@
160160
"outputs": [],
161161
"source": [
162162
"ax = sns.barplot(flights, x=\"year\", y=\"passengers\", estimator=\"sum\", errorbar=None)\n",
163-
"ax.bar_label(ax.containers[0], fontsize=10)"
163+
"ax.bar_label(ax.containers[0], fontsize=10);"
164164
]
165165
},
166166
{
@@ -248,6 +248,14 @@
248248
" height=4, aspect=.5,\n",
249249
")"
250250
]
251+
},
252+
{
253+
"cell_type": "code",
254+
"execution_count": null,
255+
"id": "0b6a62b9-eef7-4c85-a1c2-85a58231e6c6",
256+
"metadata": {},
257+
"outputs": [],
258+
"source": []
251259
}
252260
],
253261
"metadata": {

doc/_docstrings/violinplot.ipynb

Lines changed: 159 additions & 32 deletions
Original file line number | Diff line number | Diff line change
@@ -16,11 +16,11 @@
1616
]
1717
},
1818
{
19-
"cell_type": "markdown",
20-
"id": "863c03b1-63e2-4d60-a3a4-4693afab4b5b",
19+
"cell_type": "raw",
20+
"id": "c72b5394-ff5f-42b1-b083-2e42b2ffdf0f",
2121
"metadata": {},
2222
"source": [
23-
"Draw a single horizontal boxplot, assigning the data directly to the coordinate variable:"
23+
"The default violinplot represents a distribution two ways: a patch showing a symmetric kernel density estimate (KDE), and the quartiles / whiskers of a box plot:"
2424
]
2525
},
2626
{
@@ -35,11 +35,11 @@
3535
]
3636
},
3737
{
38-
"cell_type": "markdown",
39-
"id": "aeea380b-405e-4762-8ede-db57f5549ca5",
38+
"cell_type": "raw",
39+
"id": "e7d25589-0dc9-48ce-92f9-ab61ffbf964a",
4040
"metadata": {},
4141
"source": [
42-
"Group by a categorical variable, referencing columns in a dataframe:"
42+
"In a bivariate plot, one of the variables will \"group\" so that multiple violins are drawn:"
4343
]
4444
},
4545
{
@@ -53,11 +53,11 @@
5353
]
5454
},
5555
{
56-
"cell_type": "markdown",
57-
"id": "c9a99aa4-2da0-42fa-879a-0c3b264803f4",
56+
"cell_type": "raw",
57+
"id": "6d588b32-b14b-4b33-bbd9-69b17f8212a6",
5858
"metadata": {},
5959
"source": [
60-
"Draw vertical violins, grouped by two variables:"
60+
"By default, the orientation of the plot is determined by the variable types, preferring to group by a categorical variable:"
6161
]
6262
},
6363
{
@@ -71,11 +71,29 @@
7171
]
7272
},
7373
{
74-
"cell_type": "markdown",
75-
"id": "973e6617-5720-428d-a0ac-447e76aa9fde",
74+
"cell_type": "raw",
75+
"id": "402812f2-c024-4179-9fee-fed92f03deb2",
7676
"metadata": {},
7777
"source": [
78-
"Draw split violins to take up less space:"
78+
"Pass `fill=False` to draw line-art violins:"
79+
]
80+
},
81+
{
82+
"cell_type": "code",
83+
"execution_count": null,
84+
"id": "8e00ce8b-5871-486b-8c55-a4f2e764aa86",
85+
"metadata": {},
86+
"outputs": [],
87+
"source": [
88+
"sns.violinplot(data=df, x=\"class\", y=\"age\", hue=\"alive\", fill=False)"
89+
]
90+
},
91+
{
92+
"cell_type": "raw",
93+
"id": "8350abce-6a40-4e18-9501-7d358192471b",
94+
"metadata": {},
95+
"source": [
96+
"Draw \"split\" violins to take up less space, and only show the data quartiles:"
7997
]
8098
},
8199
{
@@ -85,51 +103,51 @@
85103
"metadata": {},
86104
"outputs": [],
87105
"source": [
88-
"sns.violinplot(data=df, x=\"deck\", y=\"age\", hue=\"alive\", split=True)"
106+
"sns.violinplot(data=df, x=\"class\", y=\"age\", hue=\"alive\", split=True, inner=\"quart\")"
89107
]
90108
},
91109
{
92-
"cell_type": "markdown",
93-
"id": "f291d4a2-41bc-4eb0-813d-7a1ceacc0cb0",
110+
"cell_type": "raw",
111+
"id": "90f4263f-7294-4ad5-bff4-25d7d796cb45",
94112
"metadata": {},
95113
"source": [
96-
"Prevent the density from smoothing beyond the limits of the data:"
114+
"Add a small gap between the dodged violins:"
97115
]
98116
},
99117
{
100118
"cell_type": "code",
101119
"execution_count": null,
102-
"id": "82556de0-3756-426c-a591-9af6ed6c45d4",
120+
"id": "26cb5b89-496d-4893-8914-ca8b6fbf97b7",
103121
"metadata": {},
104122
"outputs": [],
105123
"source": [
106-
"sns.violinplot(data=df, x=\"age\", y=\"alive\", cut=0)"
124+
"sns.violinplot(data=df, x=\"class\", y=\"age\", hue=\"alive\", split=True, gap=.1, inner=\"quart\")"
107125
]
108126
},
109127
{
110-
"cell_type": "markdown",
111-
"id": "6f351f71-1db3-4c5a-948c-9e1dbc550234",
128+
"cell_type": "raw",
129+
"id": "bbea49e0-7b08-4b25-8686-1d5404b71601",
112130
"metadata": {},
113131
"source": [
114-
"Use a narrower bandwidth to reduce the amount of smoothing:"
132+
"Starting in version 0.13.0, it is possible to \"split\" single violins:"
115133
]
116134
},
117135
{
118136
"cell_type": "code",
119137
"execution_count": null,
120-
"id": "8d17e1e3-e0f4-4d2c-ac6e-aec42ed75390",
138+
"id": "ba261531-a280-44e5-b8c0-bcc5a53f60bf",
121139
"metadata": {},
122140
"outputs": [],
123141
"source": [
124-
"sns.violinplot(data=df, x=\"age\", y=\"alive\", bw=.15)"
142+
"sns.violinplot(data=df, x=\"class\", y=\"age\", split=True, inner=\"quart\")"
125143
]
126144
},
127145
{
128-
"cell_type": "markdown",
129-
"id": "c4aaeb60-6c1b-4337-91ce-d6b744a3dd90",
146+
"cell_type": "raw",
147+
"id": "7c4dafa1-2747-4b43-ba4a-4c9b32778086",
130148
"metadata": {},
131149
"source": [
132-
"Represent every observation inside the distribution"
150+
"Represent every observation inside the distribution by setting `inner=\"stick\"` or `inner=\"point\"`:"
133151
]
134152
},
135153
{
@@ -139,15 +157,15 @@
139157
"metadata": {},
140158
"outputs": [],
141159
"source": [
142-
"sns.violinplot(data=df, x=\"age\", y=\"embark_town\", inner=\"stick\")"
160+
"sns.violinplot(data=df, x=\"age\", y=\"deck\", inner=\"point\")"
143161
]
144162
},
145163
{
146-
"cell_type": "markdown",
147-
"id": "01622556-9df8-4af1-b36c-9bc5f6b6099e",
164+
"cell_type": "raw",
165+
"id": "23c13695-cd01-4da8-bc89-2519ae445f9f",
148166
"metadata": {},
149167
"source": [
150-
"Use a different scaling rule for normalizing the density:"
168+
"Normalize the width of each violin to represent the number of observations:"
151169
]
152170
},
153171
{
@@ -157,13 +175,122 @@
157175
"metadata": {},
158176
"outputs": [],
159177
"source": [
160-
"sns.violinplot(data=df, x=\"age\", y=\"embark_town\", scale=\"count\")"
178+
"sns.violinplot(data=df, x=\"age\", y=\"deck\", inner=\"point\", density_norm=\"count\")"
179+
]
180+
},
181+
{
182+
"cell_type": "raw",
183+
"id": "abe650fb-4d26-4bac-97f3-f451a3872cf5",
184+
"metadata": {},
185+
"source": [
186+
"By default, the KDE will smooth past the extremes of the observed data; set `cut=0` to prevent this:"
187+
]
188+
},
189+
{
190+
"cell_type": "code",
191+
"execution_count": null,
192+
"id": "82556de0-3756-426c-a591-9af6ed6c45d4",
193+
"metadata": {},
194+
"outputs": [],
195+
"source": [
196+
"sns.violinplot(data=df, x=\"age\", y=\"alive\", cut=0, inner=\"stick\")"
197+
]
198+
},
199+
{
200+
"cell_type": "raw",
201+
"id": "abfb9e78-d524-4536-90ef-c71834b055f9",
202+
"metadata": {},
203+
"source": [
204+
"The `bw_adjust` parameter controls the amount of smoothing:"
205+
]
206+
},
207+
{
208+
"cell_type": "code",
209+
"execution_count": null,
210+
"id": "8d17e1e3-e0f4-4d2c-ac6e-aec42ed75390",
211+
"metadata": {},
212+
"outputs": [],
213+
"source": [
214+
"sns.violinplot(data=df, x=\"age\", y=\"alive\", bw_adjust=.5, inner=\"stick\")"
215+
]
216+
},
217+
{
218+
"cell_type": "raw",
219+
"id": "407bc513-5b7f-418c-8ffe-ec488836586d",
220+
"metadata": {},
221+
"source": [
222+
"By default, the violins are drawn at fixed positions on a categorical scale, even if the grouping variable is numeric. Starting in version 0.13.0, pass the `native_scale=True` parameter to preserve the original scale on both axes:"
223+
]
224+
},
225+
{
226+
"cell_type": "code",
227+
"execution_count": null,
228+
"id": "e7b6d901-9a97-4716-8d24-1b30145e9c57",
229+
"metadata": {},
230+
"outputs": [],
231+
"source": [
232+
"sns.violinplot(x=df[\"age\"].round(-1) + 5, y=df[\"fare\"], native_scale=True)"
233+
]
234+
},
235+
{
236+
"cell_type": "raw",
237+
"id": "790e3989-0b47-4e77-9bdb-dc757d1e938c",
238+
"metadata": {},
239+
"source": [
240+
"When using a categorical scale, the `formatter` parameter accepts a function that defines categories:"
241+
]
242+
},
243+
{
244+
"cell_type": "code",
245+
"execution_count": null,
246+
"id": "28a769d4-3e23-4b53-a9ef-391d5fc24201",
247+
"metadata": {},
248+
"outputs": [],
249+
"source": [
250+
"decades = lambda x: f\"{int(x)}–{int(x + 10)}\"\n",
251+
"sns.violinplot(x=df[\"age\"].round(-1), y=df[\"fare\"], formatter=decades)"
252+
]
253+
},
254+
{
255+
"cell_type": "raw",
256+
"id": "6f914d73-7a0c-4fbc-8432-40c4f0577857",
257+
"metadata": {},
258+
"source": [
259+
"By default, the \"inner\" representation scales with the `linewidth` and `linecolor` parameters:"
260+
]
261+
},
262+
{
263+
"cell_type": "code",
264+
"execution_count": null,
265+
"id": "18cb2afd-8487-40bd-b3f2-1f83243ffa3c",
266+
"metadata": {},
267+
"outputs": [],
268+
"source": [
269+
"sns.violinplot(data=df, x=\"age\", linewidth=1, linecolor=\"k\")"
270+
]
271+
},
272+
{
273+
"cell_type": "raw",
274+
"id": "ca2ef541-c07f-4853-ba98-ce75855ba262",
275+
"metadata": {},
276+
"source": [
277+
"Use `inner_kws` to pass parameters directly to the inner plotting functions:"
278+
]
279+
},
280+
{
281+
"cell_type": "code",
282+
"execution_count": null,
283+
"id": "934f91bc-2698-4c07-92cf-4e6039c801b2",
284+
"metadata": {},
285+
"outputs": [],
286+
"source": [
287+
"sns.violinplot(data=df, x=\"age\", inner_kws=dict(box_width=15, whis_width=2, color=\".8\"))"
161288
]
162289
},
163290
{
164291
"cell_type": "code",
165292
"execution_count": null,
166-
"id": "fdda9a33-37f3-43fd-b02d-1ff414657a37",
293+
"id": "4aa00d3c-f016-4db8-b6b0-da4e6a327831",
167294
"metadata": {},
168295
"outputs": [],
169296
"source": []
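
The `formatter` cell in the notebook diff above passes a function that turns each grouping value into a label. As a rough, library-free sketch of what that function produces, the snippet below applies the same `decades` logic to a handful of made-up ages. The `ages` list is hypothetical and stands in for the `age` column, and the built-in `round(x, -1)` is used here only to mirror `df["age"].round(-1)`.

```python
# Hypothetical sample values standing in for the `age` column in the notebook.
ages = [22.0, 38.0, 26.0, 34.0, 54.0, 2.0]


def decades(x):
    """Label a rounded age the same way the notebook's lambda does."""
    return f"{int(x)}–{int(x + 10)}"


# Snap each age to the nearest multiple of ten before labelling it.
binned = [round(a, -1) for a in ages]
labels = [decades(b) for b in binned]

print(labels)
```

Because the label is computed from the rounded value, every age that rounds to the same multiple of ten shares one label, which is what lets the function define the categories.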

doc/_tutorial/categorical.ipynb

Lines changed: 3 additions & 3 deletions
Original file line number | Diff line number | Diff line change
@@ -293,15 +293,15 @@
293293
"source": [
294294
"sns.catplot(\n",
295295
" data=tips, x=\"total_bill\", y=\"day\", hue=\"sex\",\n",
296-
" kind=\"violin\", bw=.15, cut=0,\n",
296+
" kind=\"violin\", bw_adjust=.5, cut=0,\n",
297297
")"
298298
]
299299
},
300300
{
301301
"cell_type": "raw",
302302
"metadata": {},
303303
"source": [
304-
"It's also possible to \"split\" the violins when the hue parameter has only two levels, which can allow for a more efficient use of space:"
304+
"It's also possible to \"split\" the violins, which can allow for a more efficient use of space:"
305305
]
306306
},
307307
{
@@ -406,7 +406,7 @@
406406
"metadata": {},
407407
"outputs": [],
408408
"source": [
409-
"sns.catplot(data=titanic, x=\"deck\", kind=\"count\", palette=\"ch:.25\")"
409+
"sns.catplot(data=titanic, x=\"deck\", kind=\"count\")"
410410
]
411411
},
412412
{

doc/whatsnew/index.rst

Lines changed: 7 additions & 0 deletions
Original file line number | Diff line number | Diff line change
@@ -3,6 +3,13 @@
33
What's new in each version
44
==========================
55

6+
v0.13
7+
-----
8+
.. toctree::
9+
:maxdepth: 2
10+
11+
v0.13.0
12+
613
v0.12
714
-----
815
.. toctree::

examples/wide_form_violinplot.py

Lines changed: 1 addition & 1 deletion
Original file line number | Diff line number | Diff line change
@@ -27,7 +27,7 @@
2727
f, ax = plt.subplots(figsize=(11, 6))
2828

2929
# Draw a violinplot with a narrower bandwidth than the default
30-
sns.violinplot(data=corr_df, palette="Set3", bw=.2, cut=1, linewidth=1)
30+
sns.violinplot(data=corr_df, palette="Set3", bw_adjust=.5, cut=1, linewidth=1)
3131

3232
# Finalize the figure
3333
ax.set(ylim=(-.7, 1.05))
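
Several hunks in this commit swap one keyword for another: `scale="count"` becomes `density_norm="count"`, and `bw=...` is replaced by `bw_adjust=...` with a different value. The sketch below is a hypothetical helper, not part of seaborn, that flags those two keywords in a dictionary of call arguments. The `RENAMED_IN_DIFF` table and the `flag_legacy_kwargs` name are assumptions made for illustration, and the table covers only the swaps visible in this diff.

```python
# Keyword swaps visible in this commit's diff (old name -> new name).
# Assumption: illustrative only, not a complete list of API changes.
RENAMED_IN_DIFF = {
    "scale": "density_norm",  # scale="count" became density_norm="count"
    "bw": "bw_adjust",        # bw=.15 became bw_adjust=.5 (the value changed too)
}


def flag_legacy_kwargs(kwargs):
    """Return (old, new) name pairs for legacy keywords found in kwargs."""
    return [(name, RENAMED_IN_DIFF[name]) for name in kwargs if name in RENAMED_IN_DIFF]


# Arguments of the pre-commit call in examples/wide_form_violinplot.py,
# with the DataFrame replaced by a placeholder string.
old_call = dict(data="corr_df", palette="Set3", bw=.2, cut=1, linewidth=1)
flagged = flag_legacy_kwargs(old_call)

print(flagged)
```

The helper only reports names and deliberately leaves values alone, since the diff changes the numbers along with the `bw` keyword (for example `.2` to `.5`), which suggests the two parameters are not interchangeable.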
