steps.py fails on mixed-type columns #13

The bug occurs when processing columns in the input DataFrame that contain a mix of numeric and non-numeric values. The relevant code is:

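```python
import numpy as np
import pandas as pd

# df and is_numeric() come from the surrounding code in steps.py
for col in df.columns:
    col_data = df[col]
    # Numeric flags for the non-missing values only
    col_is_numeric = [is_numeric(v) for v in col_data if not pd.isnull(v)]
    # Split the column only if it mixes numeric and non-numeric values
    if not all(col_is_numeric) and any(col_is_numeric):
        numeric_mask = col_data.apply(is_numeric)
        df[col + '_str'] = df[col].copy()
        df.loc[~numeric_mask, col] = np.nan           # fails on mixed-type columns
        df.loc[numeric_mask, col + '_str'] = np.nan   # fails on mixed-type columns
```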
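To see what `numeric_mask` looks like, here is a minimal, self-contained reproduction. The real `is_numeric` from `steps.py` is not shown in this issue, so the version below is a hypothetical stand-in that follows the behavior described here (`None` for missing values, `True` for numbers, `False` for strings).

```python
import pandas as pd


def is_numeric(v):
    # Hypothetical stand-in for the steps.py helper:
    # None for missing, True for numbers, False for anything else.
    if v is None:
        return None
    return isinstance(v, (int, float)) and not isinstance(v, bool)


# One mixed-type column: a number, a string, and a missing entry
df = pd.DataFrame({"x": [1.5, "abc", None]})
numeric_mask = df["x"].apply(is_numeric)

print(numeric_mask.dtype)     # object
print(numeric_mask.tolist())  # [True, False, None]
```

Because the results mix booleans with `None`, pandas cannot store them as a `bool` Series and falls back to `object` dtype.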

`col` iterates over all unique `variable_names` from the inputs file.

`col_data` holds that variable's value for every ID and time. If no record exists for a given ID/time, the value is `None`.

The code works correctly if all non-`None` values are of the same type (e.g., all floats), because the splitting branch is then skipped entirely.
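Isolating the branch condition shows why single-type columns never reach the failing lines. `needs_split` is a hypothetical helper that only wraps the condition from `steps.py`, and `is_numeric` is a hypothetical stand-in for the real helper.

```python
import pandas as pd


def is_numeric(v):
    # Hypothetical stand-in: None for missing, True for numbers, False otherwise
    if v is None:
        return None
    return isinstance(v, (int, float)) and not isinstance(v, bool)


def needs_split(col_data):
    # Flags for non-missing values only, as in steps.py
    flags = [is_numeric(v) for v in col_data if not pd.isnull(v)]
    # True only when the column mixes numeric and non-numeric values
    return not all(flags) and any(flags)


print(needs_split(pd.Series([1.0, 2.5, None])))    # False (all numeric)
print(needs_split(pd.Series(["a", "b", None])))    # False (all strings)
print(needs_split(pd.Series([1.5, "abc", None])))  # True (mixed)
```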

Problem:
When `col_data` contains mixed types, the column enters the splitting branch and `numeric_mask` becomes an object-dtype Series that mixes booleans with `None` (`True` for numbers, `False` for strings, `None` for missing values). `df.loc[numeric_mask]` and `df.loc[~numeric_mask]` then fail because `.loc` expects a fully boolean mask.
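A minimal reproduction of the failure, assuming a hypothetical `is_numeric` that returns `None` for missing values as described above. `.loc` rejects the object-dtype mask because it contains `None` alongside booleans.

```python
import numpy as np
import pandas as pd


def is_numeric(v):
    # Hypothetical stand-in: None for missing, True for numbers, False otherwise
    if v is None:
        return None
    return isinstance(v, (int, float)) and not isinstance(v, bool)


df = pd.DataFrame({"x": [1.5, "abc", None]})
numeric_mask = df["x"].apply(is_numeric)  # object dtype: [True, False, None]

error = None
try:
    # Fails: the mask is object dtype and contains None
    df.loc[numeric_mask, "x"] = np.nan
except ValueError as exc:
    error = exc
    print(f"ValueError: {exc}")
```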

Proposed Fix:
Replace the failing lines with:

```python
# Elementwise == yields a pure boolean mask; None matches neither True nor False
df.loc[numeric_mask == False, col] = np.nan
df.loc[numeric_mask == True, col + '_str'] = np.nan
```

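This ensures `.loc` always receives a fully boolean mask: `None` compares unequal to both `True` and `False`, so missing entries are left unchanged in both `col` and `col + '_str'`.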
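Putting the proposed fix back into the loop gives the sketch below. `split_mixed_columns` is a hypothetical wrapper and `is_numeric` a hypothetical stand-in for the helper in `steps.py`; the masking lines match the proposed fix.

```python
import numpy as np
import pandas as pd


def is_numeric(v):
    # Hypothetical stand-in: None for missing, True for numbers, False otherwise
    if v is None:
        return None
    return isinstance(v, (int, float)) and not isinstance(v, bool)


def split_mixed_columns(df):
    """Split each mixed-type column into a numeric column and a '_str' column."""
    for col in list(df.columns):
        col_data = df[col]
        col_is_numeric = [is_numeric(v) for v in col_data if not pd.isnull(v)]
        if not all(col_is_numeric) and any(col_is_numeric):
            numeric_mask = col_data.apply(is_numeric)
            df[col + '_str'] = df[col].copy()
            # == produces bool Series; None rows are False in both masks
            df.loc[numeric_mask == False, col] = np.nan  # noqa: E712
            df.loc[numeric_mask == True, col + '_str'] = np.nan  # noqa: E712
    return df


df = pd.DataFrame({"x": [1.5, "abc", None], "y": [1.0, 2.0, None]})
out = split_mixed_columns(df)
print(out)
```

Here `x` is split into a numeric `x` and a string `x_str`, while the all-float `y` is left alone. `numeric_mask.eq(False)` and `numeric_mask.eq(True)` are equivalent spellings that avoid linter warnings (E712) about comparing to `False`/`True` with `==`.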
