
steps.py fails on mixed-type columns #13

@alicewanner

Description

The bug occurs when processing columns in the input DataFrame that contain a mix of numeric and non-numeric values. The relevant code is:

for col in df.columns:
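    col_data = df[col]
    # Numeric flags for the non-missing values only
    col_is_numeric = [is_numeric(v) for v in col_data if not pd.isnull(v)]
    # Split only columns that mix numeric and non-numeric values
    if not all(col_is_numeric) and any(col_is_numeric):
        numeric_mask = col_data.apply(is_numeric)
        # Keep numbers in col and move non-numeric values to col+'_str'
        df[col+'_str'] = df[col].copy()
        df.loc[~numeric_mask, col] = np.nan
        df.loc[numeric_mask, col+'_str'] = np.nan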

col iterates over every unique variable_name from the inputs file (each one is a column of df).

col_data holds that variable's value for every ID and time. If no record exists for a given ID/time, the value is None.

The code works correctly when the non-None values are all numeric (e.g., all floats) or all non-numeric, because the if condition is then false and numeric_mask is never built.
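To illustrate when the branch is entered, here is a small sketch. It assumes a hypothetical is_numeric that returns None for missing values, True for numbers and False otherwise (the real helper in steps.py is not shown here), and wraps the loop's condition in a hypothetical needs_split function:

```python
import numbers

import pandas as pd


def is_numeric(v):
    # Hypothetical stand-in for steps.py's is_numeric (body not shown in the
    # issue): None for missing values, True for numbers, False otherwise.
    if v is None:
        return None
    return isinstance(v, numbers.Number)


def needs_split(col_data):
    # Same test as the loop in steps.py: ignore missing values, then split
    # only if the column has both numeric and non-numeric entries.
    flags = [is_numeric(v) for v in col_data if not pd.isnull(v)]
    return not all(flags) and any(flags)


all_numeric = pd.Series([1.0, None, 2.5], dtype=object)
all_strings = pd.Series(["a", None, "b"], dtype=object)
mixed = pd.Series([1.0, "abc", None], dtype=object)

print(needs_split(all_numeric))  # False: branch skipped
print(needs_split(all_strings))  # False: branch skipped
print(needs_split(mixed))        # True: numeric_mask is built
```

Only the mixed column reaches the lines that build and use numeric_mask, which is why uniform columns never trigger the bug.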

Problem:
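When col_data contains mixed types, the if branch runs and numeric_mask = col_data.apply(is_numeric) is not a boolean Series: is_numeric returns True for numbers and False for strings, but None for missing values, so the Series has object dtype. ~numeric_mask cannot invert the None entries, and df.loc rejects a mask that contains None because it expects a fully boolean mask, so neither assignment succeeds.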
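A minimal reproduction sketch, assuming a hypothetical is_numeric that returns None for missing values (the real helper is not shown in this issue). It builds one mixed column, shows that the mask has object dtype, and records which of the two operations fail:

```python
import numbers

import numpy as np
import pandas as pd


def is_numeric(v):
    # Hypothetical stand-in for steps.py's is_numeric (body not shown in the
    # issue): None for missing values, True for numbers, False otherwise.
    if v is None:
        return None
    return isinstance(v, numbers.Number)


# One mixed-type column: a missing value, a number and a string.
df = pd.DataFrame({"x": [None, 1.5, "abc"]}, dtype=object)
numeric_mask = df["x"].apply(is_numeric)
print(numeric_mask.dtype)     # object, not bool
print(numeric_mask.tolist())  # [None, True, False]

errors = {}
try:
    _ = ~numeric_mask  # ~None is undefined, so the inversion itself fails
except (TypeError, ValueError) as exc:
    errors["invert"] = type(exc).__name__
try:
    df.loc[numeric_mask, "x"] = np.nan  # loc rejects a mask containing None
except (TypeError, ValueError) as exc:
    errors["loc"] = type(exc).__name__
print(errors)
```

The exact exception types may differ between pandas versions, so the sketch only records that each operation fails rather than asserting a specific error.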

Proposed Fix:
Replace the failing lines with:

df.loc[numeric_mask == False, col] = np.nan
df.loc[numeric_mask == True, col+'_str'] = np.nan

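Comparing with == False and == True produces a plain boolean Series (a None entry compares as False in both cases), so loc always receives a fully boolean mask. Rows with missing values are left unchanged in both col and col+'_str'.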
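A sketch of the loop with the proposed fix applied. It again assumes a hypothetical is_numeric that returns None for missing values, and uses a toy df (not real input data) with one mixed column x and one all-numeric column y:

```python
import numbers

import numpy as np
import pandas as pd


def is_numeric(v):
    # Hypothetical stand-in for steps.py's is_numeric (body not shown in the
    # issue): None for missing values, True for numbers, False otherwise.
    if v is None:
        return None
    return isinstance(v, numbers.Number)


# Toy input: "x" is mixed-type, "y" is all-numeric and should be left alone.
df = pd.DataFrame(
    {"x": [1.5, "abc", None], "y": [1.0, 2.0, None]}, dtype=object
)

for col in df.columns:
    col_data = df[col]
    col_is_numeric = [is_numeric(v) for v in col_data if not pd.isnull(v)]
    if not all(col_is_numeric) and any(col_is_numeric):
        numeric_mask = col_data.apply(is_numeric)
        df[col + "_str"] = df[col].copy()
        # == yields a plain bool Series; None compares as False on both
        # lines, so rows with missing values are not touched.
        df.loc[numeric_mask == False, col] = np.nan  # noqa: E712
        df.loc[numeric_mask == True, col + "_str"] = np.nan  # noqa: E712

print(df)
```

After the loop, x keeps 1.5 with NaN in place of "abc", x_str keeps only "abc", and the missing row stays missing in both. y has no mixed values, so no y_str column is created.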
