Commit fb28df8

add forgotten file
1 parent aa7a0fe commit fb28df8

2 files changed: +106 -1 lines changed

docs/src/index.md

Lines changed: 1 addition & 1 deletion
@@ -132,7 +132,7 @@ methods** dispatched on model type:
 
 It should be emphasized that LearnAPI is itself agnostic to particular representations of
 data or the particular methods of accessing observations within them. By overloading these
-methods, Each `model` is free to choose its own data interface.
+methods, each `model` is free to choose its own data interface.
 
 See [Optional data Interface](@ref data_interface) for more details.
 

src/data_interface.jl

Lines changed: 105 additions & 0 deletions
@@ -0,0 +1,105 @@
+"""
+    LearnAPI.getobs(model, LearnAPI.fit, I, data...)
+
+Return a subsample of `data` consisting of all observations with indices in `I`. Here
+`data` is data of the form expected in a call like `LearnAPI.fit(model, verbosity,
+data...; metadata...)`.
+
+Always returns a tuple of the same length as `data`.
+
+    LearnAPI.getobs(model, operation, I, data...)
+
+Return a subsample of `data` consisting of all observations with indices in `I`. Here
+`data` is data of the form expected in a call of the specified `operation`, e.g., in a
+call like `LearnAPI.predict(model, fitted_params, data...)`, if `operation =
+LearnAPI.predict`. Possible values for `operation` are: $DOC_OPERATIONS_LIST.
+
+Always returns a tuple of the same length as `data`.
+
+# New model implementations
+
+Implementation is optional. If implemented, `getobs` is ordinarily implemented for each
+signature of `fit` and each operation implemented for `model`.
+
+$(DOC_IMPLEMENTED_METHODS(:getobs))
+
+The subsample returned must be acceptable in place of `data` in a call of the function
+named in the second argument.
+
+## Example implementation
+
+Suppose that `MyClassifier` is a model type for simple supervised classification, with
+`LearnAPI.fit(model::MyClassifier, verbosity, A, y)` and `predict(model::MyClassifier,
+fitted_params, A)` implemented assuming the target `y` is an ordinary abstract vector and
+the features `A` is an abstract matrix with columns as observations. Then the following is
+a valid implementation of `getobs`:
+
+```julia
+LearnAPI.getobs(::MyClassifier, ::typeof(LearnAPI.fit), I, A, y) =
+    (view(A, :, I), view(y, I))
+LearnAPI.getobs(::MyClassifier, ::typeof(LearnAPI.predict), I, A) = (view(A, :, I),)
+```
+
+"""
+function getobs end
+
+"""
+    LearnAPI.reformat(model, LearnAPI.fit, user_data...; metadata...)
+
+Return the model-specific representations `(data, metadata)` of user-supplied `(user_data,
+user_metadata)`, for consumption, after splatting, by `LearnAPI.fit`, `LearnAPI.update!`
+or `LearnAPI.ingest!`.
+
+    LearnAPI.reformat(model, operation, user_data...)
+
+Return the model-specific representation `data` of user-supplied `user_data`, for
+consumption, after splatting, by the specified `operation`, dispatched on `model`. Here
+`operation` is one of: $DOC_OPERATIONS_LIST.
+
+The following sample workflow illustrates the use of both versions of `reformat` above:
+
+```julia
+data, metadata = LearnAPI.reformat(model, LearnAPI.fit, X, y; class_weights=dic)
+fitted_params, state, fit_report = LearnAPI.fit(model, 0, data...; metadata...)
+
+test_data = LearnAPI.reformat(model, LearnAPI.predict, Xtest)
+ŷ, predict_report = LearnAPI.predict(model, fitted_params, test_data...)
+```
+
+# New model implementations
+
+Implementation of `reformat` is optional. The fallback simply slurps the supplied
+data/metadata. You will want to implement `reformat` for each `fit` or operation
+signature implemented for `model`.
+
+$(DOC_IMPLEMENTED_METHODS(:reformat, overloaded=true))
+
+Ideally, any potentially expensive transformation of user-supplied data that is carried
+out during training only once, at the beginning, should occur in `reformat` instead of
+`fit`/`update!`/`ingest!`.
+
+Note that the second form of `reformat`, for operations, should always return a tuple,
+because the output is splatted in calls to the operation (see the sample workflow
+above). Similarly, in the return value `(data, metadata)` for the `fit` variant, `data` is
+always a tuple and `metadata` always a named tuple (or `Base.Pairs` object). If there is
+no metadata, a `NamedTuple()` can be returned in its place.
+
+## Example implementation
+
+Suppose that `MyClassifier` is a model type for simple supervised classification, with
+`LearnAPI.fit(model::MyClassifier, verbosity, A, y; names=...)` and
+`predict(model::MyClassifier, fitted_params, A)` implemented assuming that the target `y`
+is an ordinary vector, the features `A` is a matrix with columns as observations, and
+`names` are the names of the features. Then, supposing users supply features in tabular
+form, but target as expected, we provide the following implementation of `reformat`:
+
+```julia
+using Tables
+function LearnAPI.reformat(::MyClassifier, ::typeof(LearnAPI.fit), X, y)
+    names = Tables.schema(Tables.rows(X)).names
+    return ((Tables.matrix(X)', y), (; names))
+end
+LearnAPI.reformat(::MyClassifier, ::typeof(LearnAPI.predict), X) = (Tables.matrix(X)',)
+```
+"""
+reformat(::Any, ::Any, data...; model_data...) = (data, model_data)
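
The two docstrings in this file describe behaviour that is easy to check in isolation. The sketch below is not part of the commit. It assumes a stand-in module named `MiniLearnAPI` (a hypothetical name) that declares only `fit`, `predict`, `getobs` and the fallback `reformat` copied from the diff, so it runs without LearnAPI or Tables.jl installed. `MyClassifier` is the hypothetical model type from the docstrings, and the `NamedTuple` overload of `reformat` is an illustrative substitute for the Tables.jl-based example, not the package's own method.

```julia
# Stand-in for the real package (hypothetical module name, illustration only).
module MiniLearnAPI
function fit end
function predict end
function getobs end
# Fallback copied from the diff: slurp data and metadata unchanged.
reformat(::Any, ::Any, data...; model_data...) = (data, model_data)
end

struct MyClassifier end   # hypothetical model type, as in the docstrings

# `getobs` as in the docstring example: columns of `A` are observations.
MiniLearnAPI.getobs(::MyClassifier, ::typeof(MiniLearnAPI.fit), I, A, y) =
    (view(A, :, I), view(y, I))
MiniLearnAPI.getobs(::MyClassifier, ::typeof(MiniLearnAPI.predict), I, A) =
    (view(A, :, I),)

# Hypothetical `reformat` overload: features arrive as a NamedTuple of
# equal-length columns and are converted to a features-by-observations matrix.
function MiniLearnAPI.reformat(::MyClassifier, ::typeof(MiniLearnAPI.fit),
                               X::NamedTuple, y)
    names = keys(X)
    A = permutedims(hcat(values(X)...))   # one row per feature
    return ((A, y), (; names))
end

model = MyClassifier()
A = [1 2 3 4; 5 6 7 8]                    # 2 features, 4 observations
y = ["a", "b", "a", "b"]

# Subsample observations 1 and 3; the result is a tuple as long as the data.
Asub, ysub = MiniLearnAPI.getobs(model, MiniLearnAPI.fit, [1, 3], A, y)

# Fallback `reformat`: data comes back as a tuple, metadata as keyword pairs.
data, metadata = MiniLearnAPI.reformat(model, MiniLearnAPI.fit, A, y;
                                       names=(:x1, :x2))

# Overloaded `reformat`: tabular input is converted once, before training.
X = (x1 = [1, 2, 3, 4], x2 = [5, 6, 7, 8])
(A2, y2), meta2 = MiniLearnAPI.reformat(model, MiniLearnAPI.fit, X, y)
```

Because both forms return `data` as a tuple and `metadata` as something that splats into keyword arguments, the results can be passed on as `fit(model, verbosity, data...; metadata...)`, which is the calling pattern the sample workflow in the docstring relies on.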
