---
layout: bootstrap
menu_item: datasets
---
<h3>Publication models</h3>
<p>
There are three major ways in which the CLLD project supports the publication of
cross-linguistic datasets:
</p>
<ol>
<li>
Large databases may be published as standalone CLLD apps under the umbrella of the
clld.org series, much like a book in an edited series.
</li>
<li>
Smaller datasets may be submitted to one of the database journals started by the
CLLD project.
</li>
<li>
Datasets may be hosted independently of the CLLD project, simply reusing the
<span style="font-family: monospace;">clld</span> software.
</li>
</ol>
<h3>Published datasets</h3>
<p>
The following datasets are maintained by the CLLD project, i.e. they fall into
categories 1 and 2 above:
</p>
<table class="table table-condensed table-striped">
<thead>
<tr>
<th>Name</th><th>Description</th><th>Editors</th><th>CLDF dataset on Zenodo</th>
</tr>
</thead>
<tbody>
<tr>
<td><a href="http://wals.info">WALS Online</a></td>
<td>The World Atlas of Language Structures</td>
<td>Matthew Dryer & Martin Haspelmath</td>
<td>
<a href="https://doi.org/10.5281/zenodo.3731125"><img src="https://zenodo.org/badge/DOI/10.5281/zenodo.3731125.svg" alt="DOI"></a>
</td>
</tr>
<tr>
<td><a href="http://wold.clld.org">WOLD</a></td>
<td>The World Loanword Database</td>
<td>Martin Haspelmath & Uri Tadmor</td>
<td>
<a href="https://doi.org/10.5281/zenodo.3537579"><img src="https://zenodo.org/badge/DOI/10.5281/zenodo.3537579.svg" alt="DOI"></a>
</td>
</tr>
<tr>
<td><a href="http://apics-online.info">APiCS Online</a></td>
<td>The Atlas of Pidgin and Creole Language Structures</td>
<td>Susanne Maria Michaelis, Philippe Maurer, Martin Haspelmath, and Magnus Huber</td>
<td>
<a href="https://doi.org/10.5281/zenodo.3823888"><img src="https://zenodo.org/badge/DOI/10.5281/zenodo.3823888.svg" alt="DOI"></a>
</td>
</tr>
<tr>
<td>ValPaL</td>
<td>Valency Patterns Leipzig</td>
<td>Iren Hartmann, Martin Haspelmath & Bradley Taylor</td>
<td> </td>
</tr>
<tr>
<td><a href="http://ewave-atlas.org">eWAVE</a></td>
<td>The Electronic World Atlas of Varieties of English</td>
<td>Bernd Kortmann & Kerstin Lunkenheimer</td>
<td>
<a href="https://doi.org/10.5281/zenodo.3712132"><img src="https://zenodo.org/badge/DOI/10.5281/zenodo.3712132.svg" alt="DOI"></a>
</td>
</tr>
<tr>
<td><a href="http://afbo.info">AfBo</a></td>
<td>A world-wide survey of affix borrowing</td>
<td>Frank Seifart</td>
<td>
<a href="https://doi.org/10.5281/zenodo.3610154"><img src="https://zenodo.org/badge/DOI/10.5281/zenodo.3610154.svg" alt="DOI"></a>
</td>
</tr>
<tr>
<td><a href="http://ids.clld.org">IDS</a></td>
<td>The Intercontinental Dictionary Series</td>
<td>Bernard Comrie & Hans-Jörg Bibiko</td>
<td>
<a href="https://doi.org/10.5281/zenodo.1299512"><img src="https://zenodo.org/badge/DOI/10.5281/zenodo.1299512.svg" alt="DOI"></a>
</td>
</tr>
<tr>
<td><a href="http://asjp.clld.org">ASJP</a></td>
<td>The database of the Automated Similarity Judgement Program</td>
<td>Søren Wichmann et al.</td>
<td>
<a href="https://doi.org/10.5281/zenodo.3843469"><img src="https://zenodo.org/badge/DOI/10.5281/zenodo.3843469.svg" alt="DOI"></a>
</td>
</tr>
<tr>
<td>Numerals</td>
<td>Numerals in the World’s Languages</td>
<td>Eugene Chan</td>
<td> </td>
</tr>
<tr>
<td><a href="http://glottolog.org">Glottolog</a></td>
<td>Catalog of all languages, families and dialects, with comprehensive reference information</td>
<td>Harald Hammarström, Martin Haspelmath, Robert Forkel & Sebastian Bank</td>
<td>
<a href="https://doi.org/10.5281/zenodo.3754594"><img src="https://zenodo.org/badge/DOI/10.5281/zenodo.3754594.svg" alt="DOI"></a>
</td>
</tr>
<tr>
<td><a href="http://sails.clld.org">SAILS Online</a></td>
<td>The South American Indigenous Language Structures Online</td>
<td>Harald Hammarström</td>
<td>
<a href="https://doi.org/10.5281/zenodo.3608862"><img src="https://zenodo.org/badge/DOI/10.5281/zenodo.3608862.svg" alt="DOI"></a>
</td>
</tr>
<tr>
<td><a href="http://phoible.org">PHOIBLE Online</a></td>
<td>The world's largest database of phonological inventories</td>
<td>Steven Moran, Daniel McCloy and Richard Wright</td>
<td>
<a href="https://doi.org/10.5281/zenodo.2677911"><img src="https://zenodo.org/badge/DOI/10.5281/zenodo.2677911.svg" alt="DOI"></a>
</td>
</tr>
<tr>
<td><a href="http://tsammalex.clld.org">Tsammalex</a></td>
<td>A multilingual lexical database on plants and animals</td>
<td>Christfried Naumann & Steven Moran & Guillaume Segerer & Robert Forkel</td>
<td>
</td>
</tr>
<tr>
<td><a href="http://csd.clld.org">CSD</a></td>
<td>The Comparative Siouan Dictionary</td>
<td>Rankin, Robert L. & Carter, Richard T. & Jones, A. Wesley & Koontz, John E. & Rood, David S. & Hartmann, Iren</td>
<td>
</td>
</tr>
<tr>
<td><a href="https://concepticon.clld.org">Concepticon</a></td>
<td>The Concepticon</td>
<td>List, Johann Mattis & Rzymski, Christoph & Greenhill, Simon & Schweikhard, Nathanael & Pianykh, Kristina & Tjuka, Annika & Wu, Mei-Shin & Forkel, Robert</td>
<td>
</td>
</tr>
<tr>
<td><a href="http://dogonlanguages.org/">Dogonlanguages</a></td>
<td>Dogon and Bangime Linguistics</td>
<td>Moran, Steven & Forkel, Robert & Heath, Jeffrey</td>
<td>
</td>
</tr>
<tr>
<td><a href="https://dictionaria.clld.org">Dictionaria</a></td>
<td>Open-access journal publishing dictionaries from all over the world</td>
<td>
Chief editors: Haspelmath, Martin & Stiebels, Barbara;
Managing editor: Hartmann, Iren</td>
<td>
</td>
</tr>
<tr>
<td><a href="https://ldh.clld.org/">LDH</a></td>
<td>The Language Description Heritage library</td>
<td>
Managing editor: Robert Forkel</td>
<td>
<a href="https://zenodo.org/communities/ldh/">Community on Zenodo</a>
</td>
</tr>
<tr>
<td><a href="https://tular.clld.org/">TuLeD</a></td>
<td>Tupían Lexical Database</td>
<td>Fabrício Ferraz Gerardi and Stanislav Reichert</td>
<td></td>
</tr>
</tbody>
</table>
<p>
Among datasets in category 3 above, the following have come to our attention:
</p>
<table class="table table-condensed table-striped">
<thead>
<tr>
<th>Name</th><th>Description</th><th>Editors</th>
</tr>
</thead>
<tbody>
<tr>
<td><a href="http://northeuralex.org/">NorthEuraLex</a></td>
<td>Lexicostatistical Database of Northern Eurasia</td>
<td>Johannes Dellert and Gerhard Jäger</td>
</tr>
<tr>
<td><a href="http://moslex.org/">MosLex</a></td>
<td>Moscow Lexical Database</td>
<td>Alexei Kassian</td>
</tr>
<tr>
<td><a href="https://doreco.huma-num.fr/">DoReCo</a></td>
<td>DoReCo (Language DOcumentation REference COrpus)</td>
<td>Frank Seifart et al.</td>
</tr>
</tbody>
</table>
<h3>Update policy</h3>
<p>
CLLD datasets follow the update model of traditional publications: errata and
additions are collected until a new edition of the dataset is released. We typically
aim for no more than one edition per year.
</p>
<p>
However, since online publications can be updated continuously and we want to take
advantage of this, we distinguish two categories of data:
</p>
<dl>
<dt>Core data</dt>
<dd>
represents the contribution of the dataset to research, i.e. the citable
content, e.g. value assignments in typological databases.
This type of data can only be updated with a new edition, since we want to make
it easy to identify and cite exact versions of a dataset.
</dd>
<dt>Supplemental data</dt>
<dd>
may be added to a dataset to enhance navigation within the set, or to enable
visualization. Examples of this kind of data are geo-coordinates for languages,
bibliographical information for sources, etc. Data in this category may be
updated at any time, although we will still keep track of when and what is changed.
</dd>
</dl>
<h3>Data reuse</h3>
<p>
CLLD data is meant to be easily reusable, and we would love to hear about cases where it has been reused,
be it in research or teaching.
</p>
<ul>
<li><a href="https://github.com/clld/clld/wiki/Tilemill">Using CLLD data with Tilemill</a></li>
<li><a href="http://nbviewer.org/gist/xflr6/9050337/glottolog.ipynb">Exploring Glottolog with Python</a></li>
<li><a href="http://nbviewer.ipython.org/url/clld.org/notebooks/Exploring%20APiCS%20with%20IPython.ipynb">Exploring APiCS with IPython</a> [<a href="/notebooks/Exploring%20APiCS%20with%20IPython.ipynb">Notebook</a>]</li>
</ul>
<!--
<h3>CLLD update policies</h3>
-->