<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="utf-8">
<meta http-equiv="X-UA-Compatible" content="IE=edge">
<meta name="viewport" content="width=device-width, initial-scale=1">
<meta name="description" content="">
<meta name="author" content="">
<link rel="shortcut icon" href="../../assets/ico/favicon.ico">
<title>Info 290T</title>
<!-- Bootstrap core CSS -->
<link href="css/bootstrap.css" rel="stylesheet">
<link href='https://fonts.googleapis.com/css?family=Merriweather' rel='stylesheet' type='text/css'>
<!-- Custom styles for this template -->
<link href="css/offcanvas.css" rel="stylesheet">
<link href="css/ap.css" rel="stylesheet">
<style>
/* Chrome, Safari, Opera */
@-webkit-keyframes rainbow {
0% {
color: rgb(255, 0, 0);
}
10% {
color: rgb(255, 111, 0);
}
20% {
color: rgb(255, 225, 0);
}
30% {
color: rgb(162, 255, 0);
}
40% {
color: rgb(43, 255, 0);
}
50% {
color: rgb(0, 255, 200);
}
60% {
color: rgb(0, 145, 255);
}
70% {
color: rgb(0, 64, 255);
}
80% {
color: rgb(174, 0, 255);
}
90% {
color: rgb(247, 0, 255);
}
100% {
color: rgb(255, 0, 0);
}
}
/* Internet Explorer */
@-ms-keyframes rainbow {
0% {
color: rgb(255, 0, 0);
}
10% {
color: rgb(255, 111, 0);
}
20% {
color: rgb(255, 225, 0);
}
30% {
color: rgb(162, 255, 0);
}
40% {
color: rgb(43, 255, 0);
}
50% {
color: rgb(0, 255, 200);
}
60% {
color: rgb(0, 145, 255);
}
70% {
color: rgb(0, 64, 255);
}
80% {
color: rgb(174, 0, 255);
}
90% {
color: rgb(247, 0, 255);
}
100% {
color: rgb(255, 0, 0);
}
}
/* Standard Syntax */
@keyframes rainbow {
0% {
color: rgb(255, 0, 0);
}
10% {
color: rgb(255, 111, 0);
}
20% {
color: rgb(255, 225, 0);
}
30% {
color: rgb(162, 255, 0);
}
40% {
color: rgb(43, 255, 0);
}
50% {
color: rgb(0, 255, 200);
}
60% {
color: rgb(0, 145, 255);
}
70% {
color: rgb(0, 64, 255);
}
80% {
color: rgb(174, 0, 255);
}
90% {
color: rgb(247, 0, 255);
}
100% {
color: rgb(255, 0, 0);
}
}
.primer {
color: rgb(255, 76, 76);
}
.primer:hover {
/* Chrome, Safari, Opera */
-webkit-animation: rainbow 1s infinite;
/* Internet Explorer */
-ms-animation: rainbow 1s infinite;
/* Standard Syntax */
animation: rainbow 1s infinite;
}
.primer:visited {
color: rgb(255, 76, 76);
}
.project-title {
font-size: 1.05em;
font-weight: bold;
}
.project-author {
font-size: 0.9em;
font-style: italic;
color: rgb(109, 109, 109);
}
.project-link {
font-size: 0.9em;
}
</style>
<!-- Just for debugging purposes. Don't actually copy this line! -->
<!--[if lt IE 9]><script src="../../assets/js/ie8-responsive-file-warning.js"></script><![endif]-->
<!-- HTML5 shim and Respond.js IE8 support of HTML5 elements and media queries -->
<!--[if lt IE 9]>
<script src="https://oss.maxcdn.com/libs/html5shiv/3.7.0/html5shiv.js"></script>
<script src="https://oss.maxcdn.com/libs/respond.js/1.4.2/respond.min.js"></script>
<![endif]-->
</head>
<body>
<script src="https://ajax.googleapis.com/ajax/libs/jquery/1.11.0/jquery.min.js"></script>
<script src="js/bootstrap.js"></script>
<script>
// Once the DOM is ready, load the shared sidebar and enable Bootstrap dropdowns.
$(function () {
$("#sidebar").load("sidebar.html");
$('.dropdown-toggle').dropdown();
});
</script>
<style>
td,
th {
padding: 6px;
}
</style>
<div class="navbar navbar-fixed-top navbar-inverse" role="navigation">
<div class="container">
<div class="navbar-header">
<button type="button" class="navbar-toggle" data-toggle="collapse" data-target=".navbar-collapse">
<span class="sr-only">Toggle navigation</span>
<span class="icon-bar"></span>
<span class="icon-bar"></span>
<span class="icon-bar"></span>
<span class="icon-bar"></span>
<span class="icon-bar"></span>
<span class="icon-bar"></span>
<span class="icon-bar"></span>
</button>
<a class="navbar-brand" href="index.html#">INFO290T</a>
</div>
<div class="collapse navbar-collapse">
<ul class="nav navbar-nav">
<li><a href="index.html#">Home</a></li>
<li><a href="index.html#description">About</a></li>
<li><a href="index.html#schedule">Schedule</a></li>
<li><a href="index.html#grading">Grading</a></li>
<li><a href="index.html#papers">Readings</a></li>
<li><a href="index.html#project">Project</a></li>
<li><a href="index.html#presentations-instructions">Presentations</a></li>
<li><a href="index.html#class-review-instructions">Reviews</a></li>
</ul>
</div>
</div>
</div>
<div class="container">
<div class="row row-offcanvas row-offcanvas-right">
<div class="col-xs-12 col-sm-9">
<p class="pull-right visible-xs">
<button type="button" class="btn btn-primary btn-xs" data-toggle="offcanvas">Other Info</button>
</p>
<div class="jumbotron" style="width:95%">
<h3>Info290T: Human-Centered Data Management</h3>
<!--<div class="row">
<div class="col-md-8" style="width: 100%;">
<p>
<img src="images/happyhumanheads.png" class="img-responsive img-rounded" style="margin-bottom: 15px;width: 50%; ">
This is a research-oriented graduate class on human-centered aspects in data management and analysis across the end-to-end data science/AI lifecycle. The class will entail reading and discussion of classical and modern research papers in this space. As part of this class, students will undertake a research project in this space. Students taking the class should have taken a database or data engineering class, at the level of INFO 258 / DATA 101 / COMPSCI 186, and/or have experience working with database or data engineering tools. </p>
</div>
</div>
-->
<div class="row">
<div class="col-md-4">
<img src="images/happyhumanheads.png" class="img-responsive img-rounded" style="margin-bottom: -25px;">
</div> <!-- /span2 -->
<div class="col-md-8">
<p>
This is a research-oriented class on human-centered aspects in data management and analysis across the
end-to-end data science/AI lifecycle. The class will entail reading and discussion of classical and
modern research papers in this space. As part of this class, students will undertake a research project
in this space. Students taking the class should have taken a database or data engineering class, at the
level of INFO 258 / DATA 101 / COMPSCI 186, and/or have experience working with database or data
engineering tools.
</p>
</div>
</div>
</div>
<a name="description"></a>
<div class="row">
<div class="col-md-12">
<h2> Description </h2>
<p> This class emphasizes the central role of humans in data management. We will collectively explore a
range of research papers in this space, drawn from the leading data management and human-computer
interaction (HCI)/visualization
venues. We will cover a range of technologies
and methodologies.</p>
<p>From a technology standpoint, we will explore
some or all of the following, time permitting: visual analytics systems, visualization recommendation
systems, spreadsheet systems, data cleaning and transformation systems, notebook-centric analysis tools,
SQL query builders, explanation and provenance systems, data discovery systems, gestural interfaces,
approximate query processing systems, predictive materialization systems, speech and natural language
querying systems, text and video analysis systems, and semi-structured data systems. We will also explore
papers that study human perception and behavior as it pertains to data management. The emphasis will be on
a mix of human-centric concerns, interface ideas, and scalable data processing ideas (all with an eye
towards the end user).</p>
<p>This class has multiple goals. First, for those for whom this is a first exposure to research
papers in data management and HCI, we will learn how to read and critically evaluate research papers
from multiple perspectives (more on this later). Second, we will learn about multiple state-of-the-art
techniques: we will explore the design of novel interfaces for data management and the design of
novel scalability techniques that keep humans at the center. Third, we will learn about the process of
designing, validating, and evaluating ideas in this space.
</p>
<p>The class will center around discussion. We will be employing
a role-playing approach to the class. More on this <a href="#presentations-instructions">here</a>.</p>
<p><b>Since this is the first time this class is being offered, and the first time we're trying out a
role-playing activity, everything, including the breakdown of topics and grading, is tentative; please
expect hiccups if you're taking the class, and apologies in advance for any issues!</b> </p>
</div>
</div>
<a name="schedule"></a>
<div class="row">
<div class="col-md-12">
<h2> Schedule (tentative)</h2>
<table border="2" cellpadding="2" style="width:90%">
<!-- <table class="tg" style="undefined;table-layout: fixed; width: 687px"> -->
<colgroup>
<col style="width: 45px">
<col style="width: 250px">
<col style="width: 120px">
<col style="width: 50px">
</colgroup>
<thead>
<tr>
<th>Date</th>
<th>Topic</th>
<th>Materials</th>
<th>Notes</th>
</tr>
</thead>
<tbody>
<tr>
<td>8/24</td>
<td>Introduction</td>
<td><a href="slides/01-course-intro.pdf">Slides</a></td>
<td></td>
</tr>
<tr>
<td>8/29</td>
<td></td>
<td></td>
<td>No Class: VLDB</td>
</tr>
<tr>
<td>8/31</td>
<td>Primers: Reading Papers and Visualization</td>
<td>
<a class="primer" href="slides/02a-primers-reading-papers.pdf">Papers Primer</a>
<br />
<a class="primer" href="slides/02b-primers-visualization.pdf">Visualization Primer</a>
</td>
<td></td>
</tr>
<tr>
<td>9/5</td>
<td>Polaris: A System for Query, Analysis, and Visualization of Multidimensional Relational Databases
</td>
<td>
<a href="slides/03-polaris-author.pdf">Paper Author</a>
<br />
<a href="slides/03-polaris-indt.pdf">Industry Practitioner</a>
<br />
<a href="slides/03-polaris-acdm.pptx">Academic Researcher</a>
<br />
<a href="slides/03-polaris-discussion.pdf">Additional Discussion</a>
</td>
<td></td>
</tr>
<tr>
<td>9/7</td>
<td>Expressive Time-Series Querying by Hand-drawn Visual Sketches</td>
<td><a href="slides/04-qetch-author.pdf">Paper Author & Additional Discussion</a><br /><a
href="slides/04-timeseries-acdm.pptx">Academic Researcher</a></td>
<td></td>
</tr>
<tr>
<td>9/12</td>
<td>SeeDB: Efficient Data-Driven Visualization Recommendations to Support Visual Analytics</td>
<td>
<a href="slides/05-seedb-author.pdf">Paper Author</a>
<br />
<a href="slides/05-seedb-arch.pdf">Archaeologist</a>
<br />
<a href="slides/05-seedb-indt.pdf">Industry Practitioner</a>
<br />
<a href="slides/05-seedb-acdm.pdf">Academic Researcher</a>
</td>
<td></td>
</tr>
<tr>
<td>9/14</td>
<td>Falcon: Balancing Interactive Latency and Resolution Sensitivity for Scalable Linked
Visualizations</td>
<td>
<a class="primer" href="slides/02d-primers-scalability.pdf">Scalability Primer</a>
<br />
<a href="slides/06-falcon-author.pdf">Paper Author</a>
<br />
<a href="slides/06-falcon-acdm.pdf">Academic Researcher</a>
<br />
<a href="slides/06-falcon-indt.pdf">Industry Practitioner</a>
<br />
<a href="slides/06-falcon-peer.pdf">Peer Reviewer</a>
</td>
<td></td>
</tr>
<tr>
<td>9/19</td>
<td>Trust, but Verify: Optimistic Visualizations of Approximate Queries for Exploring Big Data</td>
<td>
<a href="slides/07-trust-author.pdf">Paper Author</a>
<br />
<a href="slides/07-trust-arch.pdf">Archaeologist</a>
<br />
<a href="slides/07-trust-discussion.pdf">Additional Discussion</a>
</td>
<td>Project Proposal Due 19th</td>
</tr>
<tr>
<td>9/21</td>
<td>Benchmarking Spreadsheet Systems</td>
<td>
<a class="primer" href="slides/02c-primers-HCI.pdf">HCI Primer</a>
<br />
<a href="slides/08-benchmark-author.pptx">Paper Author</a>
</td>
<td></td>
</tr>
<tr>
<td>9/26</td>
<td>Sigma Worksheet: Interactive Construction of OLAP Queries</td>
<td>
<a href="slides/09-sigma-author.pptx">Paper Author</a>
<br />
<a href="slides/09-sigma-peer.pdf">Peer Reviewer</a>
<br />
<a href="slides/09-sigma-acdm.pptx">Academic Researcher</a>
</td>
<td></td>
</tr>
<tr>
<td>9/28</td>
<td>Wrangler: interactive visual specification of data transformation scripts</td>
<td>
<a href="slides/10-wrangler-author.pdf">Paper Author</a>
<br />
<a href="slides/10-wrangler-discussion.pdf">Additional Discussion</a>
</td>
<td></td>
</tr>
<tr>
<td>10/3</td>
<td>Profiler: Integrated Statistical Analysis and Visualization for Data Quality Assessment</td>
<td>
<a href="slides/11-profiler-author.pdf">Paper Author</a>
<br />
<a href="slides/11-profiler-indt.pdf">Industry Practitioner</a>
<br />
<a href="slides/11-profiler-discussion.pdf">Additional Discussion</a>
</td>
<td></td>
</tr>
<tr>
<td>10/5</td>
<td>Gestural Query Specification</td>
<td>
<a href="slides/12-gesturedb-author.pdf">Paper Author</a>
<br />
<a href="slides/12-gesturedb-discussion.pdf">Additional Discussion</a>
</td>
<td></td>
</tr>
<tr>
<td>10/10</td>
<td>DataTone: Managing Ambiguity in Natural Language Interfaces for Data Visualization</td>
<td>
<a href="slides/13-datatone-author.pdf">Intermediate Report & Paper Author</a>
<br />
<a href="slides/13-datatone-acdm.pdf">Academic Researcher</a>
<br />
<a href="slides/13-datatone-indt.pdf">Industry Practitioner</a>
<br />
</td>
<td></td>
</tr>
<tr>
<td>10/12</td>
<td>SpeakQL: Towards Speech-driven Multimodal Querying of Structured Data</td>
<td>
<a href="slides/14-speakql-author.pdf">Paper Author</a>
<br />
<a href="slides/14-speakql-arch.pdf">Archaeologist</a>
</td>
<td></td>
</tr>
<tr>
<td>10/17</td>
<td>Interactive Browsing and Navigation in Relational Databases</td>
<td><a href="slides/15-interactivedb-author.pdf">Paper Author</a></td>
<td></td>
</tr>
<tr>
<td>10/19</td>
<td>DataPlay: Interactive Tweaking and Example-driven Correction of Graphical Database Queries</td>
<td>
<a href="slides/16-dataplay-author.pdf">Paper Author</a>
<br />
<a href="slides/16-dataplay-indt.pdf">Industry Practitioner</a>
</td>
<td></td>
</tr>
<tr>
<td>10/24</td>
<td>Lux: Always-on Visualization Recommendations</td>
<td>
<a href="slides/17-lux-discussion.pdf">Discussion</a>
<br />
<a href="slides/17-lux-author.pdf">Paper Author</a>
</td>
<td></td>
</tr>
<tr>
<td>10/26</td>
<td>mage: Fluid Moves Between Code and Graphical Work in Computational Notebooks</td>
<td>
<a href="slides/18-mage-discussion.pdf">Discussion</a>
<br />
<a href="slides/18-mage-author.pdf">Paper Author</a>
<br />
<a href="slides/18-mage-indt.pdf">Industry Practitioner</a>
<br />
<a href="slides/18-mage-arch.pdf">Archaeologist</a>
</td>
<td></td>
</tr>
<tr>
<td>10/31</td>
<td>BlinkDB: Queries with Bounded Errors and Bounded Response Times on Very Large Data</td>
<td>
<a href="slides/19-blinkdb-discussion.pdf">Discussion</a>
<br />
<a href="slides/19-blinkdb-author.pdf">Paper Author</a>
<br />
<a href="slides/19-blinkdb-acdm.pdf">Academic Researcher</a>
</td>
<td></td>
</tr>
<tr>
<td>11/2</td>
<td>AQP++: Connecting Approximate Query Processing With Aggregate Precomputation for Interactive
Analytics</td>
<td><a href="slides/20-aqppp-author.pdf">Paper Author</a></td>
<td>Intermediate Report Due 2nd</td>
</tr>
<tr>
<td>11/7</td>
<td>Distributed and Interactive Cube Exploration</td>
<td></td>
<td></td>
</tr>
<tr>
<td>11/9</td>
<td>Dremel: Interactive Analysis of Web-Scale Datasets</td>
<td>
<a href="slides/22-dremel-discussion.pdf">Discussion</a>
<br />
<a href="slides/22-dremel-author.pdf">Paper Author</a>
<br />
<a href="slides/22-dremel-arch.pptx">Archaeologist</a>
</td>
<td></td>
</tr>
<tr>
<td>11/14</td>
<td>Scorpion: Explaining Away Outliers in Aggregate Queries</td>
<td><a href="slides/23-scorpion-author.pptx">Paper Author</a></td>
<td></td>
</tr>
<tr>
<td>11/16</td>
<td>MacroBase: Prioritizing Attention in Fast Data</td>
<td>
<a href="slides/24-macrobase-author.pdf">Paper Author</a>
<br />
<a href="slides/24-macrobase-indt.pdf">Industry Practitioner</a>
</td>
<td></td>
</tr>
<tr>
<td>11/21</td>
<td>Can Foundation Models Wrangle Your Data?</td>
<td>
<a href="slides/25-foundation-author.pdf">Paper Author</a>
<br />
<a href="slides/25-foundation-peer.pptx">Peer Reviewer</a>
<br />
<a href="slides/25-foundation-acdm.pdf">Academic Researcher</a>
</td>
<td></td>
</tr>
<tr>
<td>11/23</td>
<td>Thanksgiving Holiday</td>
<td></td>
<td></td>
</tr>
<tr>
<td>11/28</td>
<td>Final Project Presentations (8-11) </td>
<td><a href="index.html#projects-listing">Projects Listing</a></td>
<td></td>
</tr>
<tr>
<td>11/30</td>
<td>No Class</td>
<td></td>
<td>Final Project Report Due 12/3</td>
</tr>
</tbody>
</table>
</div>
</div>
<a name="relationship"></a>
<div class="row">
<div class="col-md-12">
<h2> Requirements </h2>
<p> The official class requirements state "Students taking the class should have taken a database or data
engineering class, at the level of INFO 258 / DATA 101 / COMPSCI 186, and/or have experience working with
database or data engineering tools."
</p>
<p>
Given the varied backgrounds of students coming into this class, we are willing to be flexible in this
regard, but we expect that you will know, or be able to pick up, any missing pieces as we go along, at
least to the extent needed to take part in the paper discussions, presentations, and reviews, as
well as the research project.
</p>
<p>
We expect you to have experience with data science tooling (such as dataframe and visualization
libraries, and computational notebooks) and with databases (from both a query language standpoint and a
performance standpoint: query optimization, materialized view maintenance, indexing). If you struggle with
SQL or have not heard of query optimization, I encourage taking a database class first. Experience with
research or reading research papers is not a must. </p>
</div>
</div>
<a name="papers"></a>
<div class="row">
<div class="col-md-12">
<h2> Tentative List of Papers </h2>
<h3> Theme 1: Data Exploration (6) </h3>
<h4> Visual Analytics Systems + Next Generation: Visualization Search and Recommendation </h4>
<ul>
<li> <a href="papers/polaris.pdf">Polaris: A System for Query, Analysis, and Visualization of
Multidimensional Relational Databases</a></li>
<li> <a href="papers/expressive-time-series-querying.pdf">Expressive Time-Series Querying by Hand-drawn
Visual Sketches</a></li>
<li> <a href="papers/see-db.pdf">SeeDB: Efficient Data-Driven Visualization Recommendations to Support
Visual Analytics</a></li>
<li>Optional: <a href="papers/voyager.pdf">Voyager: Exploratory Analysis via Faceted Browsing of
Visualization Recommendations</a></li>
<li>Optional: <a href="papers/show-me.pdf">Show Me: Automatic Presentation for Visual Analysis</a></li>
<li>Optional: <a href="papers/zenvisage.pdf"> Effortless Data Exploration with zenvisage: An Expressive
and Interactive Visual Analytics System</a></li>
<li>Optional: <a href="papers/voyager2.pdf">Voyager 2: Augmenting Visual Analysis with Partial View
Specifications</a></li>
</ul>
<h4> Perceptual Approximation</h4>
<ul>
<li> <a href="papers/falcon.pdf">Falcon: Balancing Interactive Latency and Resolution Sensitivity for
Scalable Linked Visualizations</a></li>
<li> <a href="papers/trust-but-verify.pdf">Trust, but Verify: Optimistic Visualizations of Approximate
Queries for Exploring Big Data</a></li>
<li>Optional: <a href="papers/incvisage.pdf">I’ve Seen “Enough”: Incrementally Improving Visualizations to
Support Rapid Decision Making</a></li>
<li>Optional: <a href="papers/m4.pdf">M4: A Visualization-Oriented Time Series Data Aggregation</a></li>
<li>Optional: <a href="papers/how-progressive-visualizations-affect-exploratory-analysis.pdf">How
Progressive Visualizations Affect Exploratory Analysis</a></li>
<li>Optional: <a href="papers/trust-fisher.pdf">Trust Me, I’m Partially Right: Incremental Visualization
Lets Analysts Explore Large Datasets Faster</a> </li>
<li>Optional: <a href="papers/incremental-fisher.pdf">Incremental, Approximate Database Queries and
Uncertainty for Exploratory Visualization </a></li>
<li>Optional: <a href="papers/immens.pdf">imMens: Real-time Visual Querying of Big Data</a> </li>
</ul>
<h3> Theme 2: Data Manipulation (4) </h3>
<h4> Spreadsheets and Direct Manipulation </h4>
<ul>
<li> <a href="https://people.eecs.berkeley.edu/~adityagp/papers/spreadsheet_bench.pdf">Benchmarking
Spreadsheet Systems</a></li>
<li> <a href="papers/sigma-worksheet.pdf">Sigma Worksheet: Interactive Construction of OLAP Queries</a>
</li>
<li>Optional: <a href="papers/hillview.pdf">Hillview: A trillion-cell spreadsheet for big data</a></li>
<li>Optional: <a href="papers/noah.pdf">NOAH: Interactive Spreadsheet Exploration with Dynamic
Hierarchical Overviews</a></li>
<li>Optional: <a href="papers/expressive-query-construction.pdf">Expressive Query Construction through
Direct Manipulation of Nested Relational Results</a></li>
<li>Optional: <a
href="https://people.eecs.berkeley.edu/~adityagp/papers/data-spread-demo.pdf">DataSpread: Unifying
Databases and Spreadsheets</a></li>
<li>Optional: <a href="papers/spreadsheet-scalability-issues.pdf">Characterizing Scalability Issues in
Spreadsheet Software Using Online Forums</a></li>
</ul>
<h4> Data Cleaning and Transformation </h4>
<ul>
<li> <a href="papers/wrangler.pdf">Wrangler: interactive visual specification of data transformation
scripts</a></li>
<li> <a href="papers/profiler.pdf">Profiler: Integrated Statistical Analysis and Visualization for Data
Quality Assessment</a> </li>
<li>Optional: <a href="papers/flash-extract.pdf">FlashExtract: A Framework for Data Extraction by
Examples</a> </li>
<li>Optional: <a href="papers/potters-wheel.pdf">Potter’s Wheel: An Interactive Data Cleaning System </a>
</li>
<li>Optional: <a href="papers/predictive-interaction.pdf">Predictive Interaction for Data
Transformation</a></li>
<li>Optional: <a href="papers/data-cleaning.pdf">Data Cleaning: Overview and Emerging Challenges</a></li>
</ul>
<h3> Theme 3: Beyond GUIs: Other No-Code Interface Modalities (3) </h3>
<h4> Touch and Gesture </h4>
<ul>
<li> <a href="papers/gesturedb.pdf">Gestural Query Specification</a></li>
<li>Optional: <a href="papers/dbtouch.pdf">dbTouch: Analytics at your Fingertips</a></li>
<li>Optional: <a href="papers/panoramic-data.pdf">PanoramicData: Data Analysis through Pen & Touch</a>
</li>
</ul>
<h4> Natural Language & Speech </h4>
<ul>
<li> <a href="papers/data-tone.pdf">DataTone: Managing Ambiguity in Natural Language Interfaces for Data
Visualization</a></li>
<li> <a href="papers/speak-ql.pdf">SpeakQL: Towards Speech-driven Multimodal Querying of Structured
Data</a></li>
<li>Optional: <a href="papers/voice-olap.pdf">A Holistic Approach for Query Evaluation and Result
Vocalization in Voice-Based OLAP</a></li>
<li>Optional: <a href="papers/shape-search.pdf">ShapeSearch: A Flexible and Efficient System for
Shape-based Exploration of Trendlines</a></li>
<li>Optional: <a href="papers/nl4dv.pdf">NL4DV: Toolkit for Generating Analytic Specs for Data Vis from
Natural Language Queries</a></li>
<li>Optional: <a href="papers/eviza.pdf">Eviza: A Natural Language Interface for Visual Analysis</a></li>
<li>Optional: <a href="papers/nlidb.pdf">Bridging the Semantic Gap with SQL Query Logs in Natural Language
Interfaces to Databases</a></li>
</ul>
<h3> Theme 4: No-Code-meets-Code (4) </h3>
<h4> SQL Query Construction </h4>
<ul>
<li> <a href="papers/interactive-relational-databases.pdf">Interactive Browsing and Navigation in
Relational Databases</a></li>
<li> <a href="papers/dataplay.pdf">DataPlay: Interactive Tweaking and Example-driven Correction of
Graphical Database Queries</a></li>
<li>Optional: <a href="papers/usability.pdf">Making Database Systems Usable</a></li>
<li>Optional: <a href="papers/qbo.pdf"> Query By Output </a></li>
<li>Optional: <a href="papers/snip-suggest.pdf">SnipSuggest: Context-Aware Autocompletion for SQL</a>
</li>
</ul>
<h4> Computational Notebook Tools </h4>
<ul>
<li> <a href="papers/lux.pdf">Lux: Always-on Visualization Recommendations</a></li>
<li> <a href="papers/mage.pdf">mage: Fluid Moves Between Code and Graphical Work in Computational
Notebooks</a></li>
<li>Optional: <a href="papers/b2.pdf">B2: Bridging Code and Interactive Visualization in Computational
Notebooks</a></li>
</ul>
<h4> Low-Code Data Manipulation Libraries </h4>
<ul>
<li> <a href="papers/auto-suggest.pdf">Auto-Suggest: Learning-to-Recommend Data Preparation Steps Using Data Science Notebooks</a></li>
<li>Optional: <a href="papers/scalable-dataframe.pdf">Towards Scalable Dataframe Systems</a></li>
</ul>
<h3> Theme 5: Scalability for Humans (5) </h3>
<h4> Approximate Query Processing </h4>
<ul>
<li> <a href="papers/blinkdb.pdf">BlinkDB: Queries with Bounded Errors and Bounded Response Times on Very
Large Data</a></li>
<li> <a href="papers/aqp-plus-plus.pdf">AQP++: Connecting Approximate Query Processing With Aggregate
Precomputation for Interactive Analytics</a></li>
<li>Optional: <a href="papers/sample+seek.pdf">Sample + Seek: Approximating Aggregates with Distribution
Precision Guarantee </a></li>
<li>Optional: <a href="papers/quickr.pdf">Quickr: Lazily Approximating Complex AdHoc Queries in BigData
Clusters</a></li>
<li>Optional: <a href="papers/verdictdb.pdf">VerdictDB: Universalizing Approximate Query Processing</a>
</li>
<li>Optional: <a href="papers/now.pdf">Scalable Progressive Analytics on Big Data in the Cloud</a></li>
<li>Optional: <a href="papers/msft-approximate-queries.pdf">Experiences with Approximating Queries in
Microsoft’s Production Big-Data Clusters</a></li>
<li>Optional: <a href="papers/approximate-query-processing.pdf">Approximate Query Processing: No Silver
Bullet</a></li>
</ul>
<h4> Materialization, Reuse, Prediction </h4>
<ul>
<li> <a href="papers/distributed-interactive-cube-exploration.pdf">Distributed and Interactive Cube
Exploration</a></li>
<li>Optional: <a href="papers/kyrix.pdf">Kyrix: Interactive Pan/Zoom Visualizations at Scale</a></li>
</ul>
<h4> Parallel Data Processing </h4>
<ul>
<li> <a href="papers/dremel.pdf">Dremel: Interactive Analysis of Web-Scale Datasets</a></li>
<li>Optional: <a href="papers/snowflake-elastic.pdf">The Snowflake Elastic Data Warehouse</a></li>
<li>Optional: <a href="papers/spark_sql.pdf">Spark SQL: Relational Data Processing in Spark</a></li>
<li>Optional: <a href="papers/duckdb.pdf">DuckDB: an Embeddable Analytical Database</a></li>
</ul>
<h4> Surveys and Benchmarks </h4>
<ul>
<li> <a href="papers/data-management-vis-review.pdf">A Structured Review of Data Management Technology for
Interactive Visualization and Analysis</a></li>
<li> <a href="papers/db-benchmark.pdf">Database Benchmarking for Supporting Real-Time Interactive Querying
of Large Data</a></li>
</ul>
<h3> Theme 6: Going Deeper (3) </h3>
<h4> Outliers, Explanations, and Provenance </h4>
<ul>
<li> <a href="papers/scorpion.pdf">Scorpion: Explaining Away Outliers in Aggregate Queries</a></li>
<li> <a href="papers/macrobase.pdf">MacroBase: Prioritizing Attention in Fast Data</a></li>
<li>Optional: <a href="papers/diff.pdf">DIFF: A Relational Interface for Large-Scale Data Explanation</a>
</li>
<li>Optional: <a href="papers/orpheusdb.pdf"> OrpheusDB: Bolt-on Versioning for Relational Databases </a>
</li>
<li>Optional: <a href="papers/safer-notebook.pdf">Fine-Grained Lineage for Safer Notebook Interactions</a>
</li>
<li>Optional: <a href="papers/smoke.pdf">Smoke: Fine-grained Lineage at Interactive Speed</a></li>
</ul>
<h4> Large Language Models for Data Work </h4>
<ul>
<li> <a href="papers/foundation-models.pdf">Can Foundation Models Wrangle Your Data?</a></li>
<li>Optional: <a href="papers/generate-structured-views.pdf">Language Models Enable Simple Systems for
Generating Structured Views of Heterogeneous Data Lakes</a></li>
</ul>
<h4> Collaborative Query Processing and Data Discovery </h4>
<ul>
<li>Optional: <a href="papers/find-related-tables.pdf">Finding Related Tables in Data Lakes for
Interactive Data Science</a></li>
<li>Optional: <a href="papers/fusion-tables.pdf">Google Fusion Tables: Web-Centered Data Management and
Collaboration</a></li>
<li>Optional: <a href="papers/cqms.pdf">The Case for a Collaborative Query Management System</a></li>
</ul>
<h4> Video Analysis </h4>
<ul>
<li>Optional: <a href="papers/rekall.pdf">Rekall: Specifying Video Events using Compositions of
Spatiotemporal Labels</a></li>
<li>Optional: <a href="papers/eva.pdf">EVA: A Symbolic Approach to Accelerating Exploratory Video
Analytics with Materialized Views</a></li>
<li>Optional: <a href="papers/viva.pdf">VIVA: An End-to-End System for Interactive Video Analytics</a>
</li>
</ul>
<h4> Machine Learning Systems </h4>
<ul>
<li>Optional: <a href="papers/madskills.pdf"> MAD Skills: New Analysis Practices for Big Data </a></li>
<li>Optional: <a href="papers/mlbase.pdf">MLbase: A Distributed Machine-learning System</a></li>
<li>Optional: <a href="papers/bismarck.pdf">Towards a Unified Architecture for in-RDBMS Analytics</a></li>
<li>Optional: <a href="papers/graphlab-earlier.pdf">GraphLab: A New Framework For Parallel Machine
Learning</a></li>
</ul>
</div>
</div>
<a name="class-review-instructions"></a>
<div class="row">
<div class="col-md-12">
<h2> Instructions for Submitting Class Reviews </h2>
<p>
For any class where you are not part of the role-playing activity, you may submit a brief review of the
paper.
You must submit class reviews through the <a href="./review.html">review submission page</a>.
</p>
<p> Remember to cover the 5 key questions, all within 500 words: what is the problem, why is it important,
what sets it apart from previous work, what are the key technical ideas, and what are the main areas for
improvement and open issues.
</p>
<p> The class reviews must be submitted by <b>midnight the day before class.</b> No late submissions will be
accepted. These reviews will be lightly graded: we just want to make sure you've read the paper closely
enough to contribute in class. </p>
<p>You need to submit 10 reviews over the semester, for papers where you are not actively playing one of the
presenter roles.</p>
</div>
</div>
<a name="presentations-instructions"></a>
<div class="row">
<div class="col-md-12">
<h2> Instructions for the Role Playing/Presentation Activity </h2>
<p>
This class involves multiple roles, originally adapted from
<a href="https://colinraffel.com/blog/role-playing-seminar.html">this role-playing seminar format</a>.
The goal of these multiple roles is to ensure that the class is not organized in a one-to-many format (one
speaker, many passive listeners), but a
many-to-many format (many speakers, many active listeners).
</p>
<p>
The roles in our class are the following: paper author, peer reviewer, archaeologist, academic researcher,
and industry practitioner. </p>
<p>
The paper author role involves a 15-20 minute presentation in the style of a conference talk. Convey what's
interesting about the paper: what is the domain, what is the problem, what is wrong with prior work, what
is the approach advocated by the paper, how well does it do, etc. This is done either solo or in groups of
two.
</p>
<p>The other four roles are 5 minutes each. Each starts with a one-slide summary.</p>
<ul>
<li> The peer reviewer presents a full review of the paper, as if for a top venue in the corresponding
research area (databases, visualization, HCI, ...): a summary of the key contributions, and
the strong and weak points of the paper.
</li>
<li> The archaeologist reports on one older paper that the current paper builds on, and one newer paper that
the current paper inspired, as a means of situating the paper in the literature. Reporting on more
papers is also allowed! </li>
<li> The academic researcher proposes one or more follow-on projects that build on the ideas of the
paper. </li>
<li> The industry practitioner makes an argument for why they should be paid to implement the
methods in the paper, and discusses any benefits and risks. </li>
</ul>
<p> More details on the breakdown and how to sign up will be announced shortly. </p>
</div>
</div>
<a name="grading"></a>
<div class="row">
<div class="col-md-12">
<h2> Grading Policy </h2>
<ul>
<li> Class Participation: 50% </li>
<ul>
<li> Paper Reviews: 15% </li>
<ul>
<li> Due at midnight the day before class. You need to submit at least 10 reviews over the semester; they
will be lightly graded. You can't "double-dip": papers where you are a presenter won't count towards
your 10 reviews. </li>
</ul>
<li> Class Participation: 10% </li>
<ul>
<li> Any participation is good participation (within reason!). We want to have a good discussion as
part of the presentations. Feel free to chime in with ideas, concerns, questions, discussion points,
etc. </li>
</ul>
<li> Paper Presentation: 25% </li>
<ul>
<li> You will play the paper author role roughly 1-2 times, and other accessory roles up to 4-5 times.
</li>
</ul>
</ul>
<li> Research Project: 50%. A semester-long research project on human-centered data management, done in teams
of 2-3 (solo projects only by exception). </li>
<ul>
<li> Project Proposal: 5% </li>
<li> Intermediate Report: 10% </li>
<li> Final Report + Presentation: 35% </li>
</ul>
</ul>
</div>
</div>
<a name="project"></a>
<div class="row">
<div class="col-md-12">
<h2>Project</h2>
<p> As part of this class, you need to complete a semester-long project. Details will be announced shortly.
We encourage you to look for ideas in your domain of expertise: for instance, if you work in computational
journalism, building a new way to browse and manage large collections of textual archives could be a
perfectly reasonable project. Either way, you must speak to the instructor to verify that the project
fulfils the needs of the class.
</p>
<p> One interesting avenue for projects is to revisit data management problems (including papers in the list
above) with LLMs in the loop. Aspects of interest would include thinking of other interaction modalities
(beyond chat), human verification/validation of system interpretations, and dealing with brittleness and
hallucinations.
</p>
<a name="projects-listing"></a>
<h4>Final Project Slides</h4>
<ul>
<li>
<div class="project-title">SPADE: A System for Prompt Analysis and Delta-Based Evaluation</div>
[<span class="project-link"><a href="final-project-slides/shankar.pdf">Slide</a></span>]
<span class="project-author">Shreya Shankar</span>
</li>
<li>
<div class="project-title">Code Generation for Data Wrangling Tasks</div>
[<span class="project-link"><a href="final-project-slides/bhatia.pdf">Slide</a></span>]
<span class="project-author">Sahil Bhatia</span>