Commit f16e086

Update multi-table primer to the new Sklearn multi-table schema specification
1 parent 3441563 commit f16e086

1 file changed: +24 -41 lines changed

doc/multi_table_primer.rst

Lines changed: 24 additions & 41 deletions
@@ -76,40 +76,31 @@ feature object ``X``. Specifically, instead of a `pandas.DataFrame`, ``X`` must
 specifies the dataset schema in the following way::

     X = {
-        "main_table": <name of the main table>,
-        "tables" : {
-            <name of the main table>: (<dataframe of the main table>, <key of the main table>),
-            <name of table 1>: (<dataframe of table 1>, <key of table 1>),
-            <name of table 2>: (<dataframe of table 2>, <key of table 2>),
+        "main_table": (<dataframe of the main table>, <key of the main table>),
+        "additional_data_tables" : {
+            <data path to table 1>: (
+                <dataframe of table 1>, [<key of table 1>], <optional entity flag>
+            ),
+            <data path to table 2>: (
+                <dataframe of table 2>, [<key of table 2>], <optional entity flag>
+            ),
             ...
         }
-        "relations" : [
-            (<name of the main table>, <name of a different table>, <entity flag>),
-            (<name of another table>, <name of yet another table>, <entity flag>),
-            ...
-        ],
     }

 The three fields of this dictionary are:

-- ``main_table``: The name of the main table.
-- ``tables``: A dictionary indexed by the tables' names. Each table is associated to a 2-tuple
-  containing the following fields:
+- ``main_table``: a 2-tuple containing the following fields:
+    - The `pandas.DataFrame` object of the main table.
+    - The key columns' names: A list of strings.
+.
+- ``additional_data_tables``: A dictionary indexed by the data paths to the secondary
+  tables. Each data path is associated to a 2-tuple containing the following fields:

-    - The `pandas.DataFrame` object of the table.
-    - The key columns' names : Either a list of strings or a single string.
-
-- ``relations``: An optional field containing a list of tuples describing the relations between
-  tables. The first two values (Strings) of each tuple correspond to names of both the parent and the child table
-  involved in the relation. A third value (Boolean) can be optionally added to the tuple to indicate if the relation is
-  either ``1:n`` or ``1:1`` (entity). For example, If the tuple ``(table1, table2, True)`` is contained in this
-  field, it means that:
-
-    - ``table1`` and ``table2`` are in a ``1:1`` relationship
-    - The key of ``table1`` is contained in that of ``table2`` (ie. keys are hierarchical)
-
-If the ``relations`` field is not present then Khiops Python assumes that the tables are in a *star*
-schema.
+    - The `pandas.DataFrame` object of the secondary table.
+    - The key columns' names : A list of strings.
+    - optionally, a flag which indicates if the secondary table is in
+      a ``1:1`` relationship to its parent table.

 .. note::

@@ -138,9 +129,8 @@ We build the input ``X`` as follows::
     accidents_df = pd.read_csv(f"{kh.get_samples_dir()}/AccidentsSummary/Accidents.txt", sep="\t")
     vehicles_df = pd.read_csv(f"{kh.get_samples_dir()}/AccidentsSummary/Vehicles.txt", sep="\t")
     X = {
-        "main_table" : "Accident",
-        "tables": {
-            "Accident": (accidents_df.drop("Gravity", axis=1), "AccidentId"),
+        "main_table" : (accidents_df.drop("Gravity", axis=1), ["AccidentId"]),
+        "additional_data_tables": {
             "Vehicle": (vehicles_df, ["AccidentId", "VehicleId"])
         }
     }
@@ -170,19 +160,12 @@ We build the input ``X`` as follows::
     places_df = pd.read_csv(f"{kh.get_samples_dir()}/Accidents/Places.txt", sep="\t")

     X = {
-        "main_table": "Accidents",
-        "tables": {
-            "Accidents": (accidents_df.drop("Gravity", axis=1), "AccidentId"),
+        "main_table": (accidents_df.drop("Gravity", axis=1), ["AccidentId"]),
+        "additional_data_tables": {
             "Vehicles": (vehicles_df, ["AccidentId", "VehicleId"]),
-            "Users": (users_df, ["AccidentId", "VehicleId"]),
-            "Places": (places_df, "AccidentId"),
-
+            "Vehicles/Users": (users_df, ["AccidentId", "VehicleId"]),
+            "Places": (places_df, ["AccidentId"], True),
         },
-        "relations": [
-            ("Accidents", "Vehicles"),
-            ("Vehicles", "Users"),
-            ("Accidents", "Places", True),
-        ],
     }

 Both datasets can be found in the Khiops samples directory.
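
Migration note: the mapping between the two layouts in this diff can be sketched in plain Python. ``to_new_spec`` below is a hypothetical helper written for illustration only; it is not part of Khiops Python. It assumes that the old ``relations`` list forms a tree rooted at the main table, that a missing ``relations`` field means a star schema (as the removed text stated), and that data paths join table names with ``/`` as in the ``"Vehicles/Users"`` entry above. The strings standing in for dataframes are placeholders.

```python
def to_new_spec(old_spec):
    """Convert an old-style spec ("main_table" name, "tables", "relations")
    into the new layout ("main_table" tuple, "additional_data_tables").

    Hypothetical helper for illustration; not a Khiops Python function.
    """
    main_name = old_spec["main_table"]
    tables = old_spec["tables"]

    def as_key_list(key):
        # The new layout documents keys as lists of strings
        return [key] if isinstance(key, str) else list(key)

    # The old layout assumed a star schema when "relations" was absent
    relations = old_spec.get("relations")
    if relations is None:
        relations = [(main_name, name) for name in tables if name != main_name]

    # Map each child table to (parent name, entity flag)
    parent_of = {}
    for relation in relations:
        parent, child = relation[0], relation[1]
        is_entity = relation[2] if len(relation) > 2 else False
        parent_of[child] = (parent, is_entity)

    def data_path(name):
        # Walk up to the main table, e.g. "Users" -> "Vehicles/Users"
        parts = []
        while name != main_name:
            parts.append(name)
            name = parent_of[name][0]
        return "/".join(reversed(parts))

    main_df, main_key = tables[main_name]
    additional = {}
    for name, (df, key) in tables.items():
        if name == main_name:
            continue
        if parent_of[name][1]:
            # 1:1 (entity) relation: keep the optional third field
            additional[data_path(name)] = (df, as_key_list(key), True)
        else:
            additional[data_path(name)] = (df, as_key_list(key))

    return {
        "main_table": (main_df, as_key_list(main_key)),
        "additional_data_tables": additional,
    }


# Placeholder strings stand in for the pandas DataFrames of the example above
old_spec = {
    "main_table": "Accidents",
    "tables": {
        "Accidents": ("accidents_df", "AccidentId"),
        "Vehicles": ("vehicles_df", ["AccidentId", "VehicleId"]),
        "Users": ("users_df", ["AccidentId", "VehicleId"]),
        "Places": ("places_df", "AccidentId"),
    },
    "relations": [
        ("Accidents", "Vehicles"),
        ("Vehicles", "Users"),
        ("Accidents", "Places", True),
    ],
}
new_spec = to_new_spec(old_spec)
```

Applied to the removed snowflake example, the sketch yields the same keys as the added lines: ``"Vehicles"``, ``"Vehicles/Users"`` and ``"Places"``, with ``True`` as the third field of the ``"Places"`` entry. A table that is missing from ``relations`` raises ``KeyError``, and cyclic relations are not handled.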