Description
The bug occurs when processing columns in the input DataFrame that contain a mix of numeric and non-numeric values. The relevant code is:
    for col in df.columns:
        col_data = df[col]
        col_is_numeric = [is_numeric(v) for v in col_data if not pd.isnull(v)]
        if not all(col_is_numeric) and any(col_is_numeric):
            numeric_mask = col_data.apply(is_numeric)
            df[col+'_str'] = df[col].copy()
            df.loc[~numeric_mask, col] = np.nan
            df.loc[numeric_mask, col+'_str'] = np.nan
col loops through every unique variable_name in the inputs file.
col_data contains the values of that variable for each ID and time. If no record exists for a given ID/time, the value is None.
The code works correctly if all non-None values are of the same type (e.g., all floats).
Problem:
When col_data contains mixed types, numeric_mask is not purely boolean: it holds None for missing values, False for strings, and True for numbers. df.loc[numeric_mask] and df.loc[~numeric_mask] then fail, because loc expects a fully boolean mask.
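The failure can be reproduced in isolation. The sketch below is not taken from the project: it builds a small mask by hand with the same three kinds of values described above (True, False, None) and shows that inverting it already raises before loc is even reached.

```python
import pandas as pd

# Hand-built stand-in for numeric_mask: True for a number,
# False for a string, None for a missing record.
numeric_mask = pd.Series([True, False, None], dtype=object)

try:
    inverted = ~numeric_mask  # the inversion used in df.loc[~numeric_mask, col]
    invert_failed = False
except TypeError as exc:
    # None does not support the unary ~ operator
    invert_failed = True
    print(f"inversion failed: {exc}")
```

The mask has object dtype because of the None entry, so the inversion is applied element by element and stops at None.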
Proposed Fix:
Replace the failing lines with:
df.loc[numeric_mask == False, col] = np.nan
df.loc[numeric_mask == True, col+'_str'] = np.nan
This ensures loc always receives a fully boolean mask.
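A minimal sketch of the proposed fix on a toy column follows. The is_numeric below is a hypothetical stand-in written for this example (the project's own helper may behave differently); it returns None for missing values so that the mask has the mixed shape described in the problem. The function name split_mixed_column is also an assumption, used only to keep the example self-contained.

```python
import numpy as np
import pandas as pd


def is_numeric(v):
    # Hypothetical stand-in: None for missing, True for numbers, False otherwise.
    if v is None:
        return None
    return isinstance(v, (int, float)) and not isinstance(v, bool)


def split_mixed_column(df, col):
    """Keep numbers in `col` and move non-numeric values to `col + '_str'`."""
    numeric_mask = df[col].apply(is_numeric)
    df[col + '_str'] = df[col].copy()
    # Elementwise comparison yields a plain boolean mask:
    # None == False and None == True both evaluate to False.
    df.loc[numeric_mask == False, col] = np.nan  # noqa: E712
    df.loc[numeric_mask == True, col + '_str'] = np.nan  # noqa: E712
    return df


df = pd.DataFrame({"x": pd.Series([1.5, "high", None, 2.0], dtype=object)})
df = split_mixed_column(df, "x")
print(df)
```

Rows whose mask entry is None are excluded from both assignments, so missing records stay missing in both columns.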