Description
The `prefix` argument to `pandas.read_csv` is deprecated as of pandas 1.4.0 (see https://pandas.pydata.org/docs/reference/api/pandas.read_csv.html). It will be removed in a future release, dunno when.
Per pandas-dev/pandas#43396, the reason seems to be that it conflicted with the `names` and `header` arguments, and didn't add much value. (I think it only took effect if you passed `header=None` and didn't pass `names`.) Instead of `prefix='foo_'` you're now supposed to call `df.columns = [f'foo_{col}' for col in df.columns]` after `read_csv`.
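A minimal sketch of that replacement, assuming a headerless two-column CSV held in memory (the sample data and the `foo_` prefix are illustrative only):

```python
import io

import pandas as pd

# Hypothetical sample data: two rows, no header line.
csv_text = "1,2\n3,4\n"

# With header=None and no names, pandas labels the columns 0, 1, ...
df = pd.read_csv(io.StringIO(csv_text), header=None)

# Rename after reading, in place of the deprecated prefix='foo_'.
df.columns = [f"foo_{col}" for col in df.columns]

column_names = list(df.columns)
```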
But p9-cli doesn't give the user a way to do that. By default the columns are just numbered 0, 1, ..., and `plotnine.aes` works best if the column names are valid Python identifiers. So if you have no header, your options to replace `prefix=col` seem to be:
- Provide `names,=col0 ,=col1 ,=col2 ...` - annoying.
- Instead of `x=col0 y=col1`, use `x='data[0]' y='data[1]'` - undocumented in p9-cli, more verbose, and I'm not sure it works in all the same ways.
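To illustrate why the default labels are awkward here, a small check (a sketch on hypothetical sample data, with `col` as an arbitrary prefix) shows that the automatic labels are integers, which are not valid Python identifiers, while prefixed labels are:

```python
import io

import pandas as pd

# Hypothetical headerless data.
df = pd.read_csv(io.StringIO("1,2\n3,4\n"), header=None)

# Default labels are the integers 0, 1, ...
default_labels = list(df.columns)

# Their string forms ("0", "1") are not valid identifiers.
default_ok = [str(label).isidentifier() for label in default_labels]

# Adding any alphabetic prefix makes them valid identifiers.
prefixed = [f"col{label}" for label in default_labels]
prefixed_ok = [name.isidentifier() for name in prefixed]
```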
If I don't like these, options for p9-cli seem to be:
- Implement `prefix` ourselves, by removing it from the kwargs passed to `read_csv` and then renaming the columns afterwards.
  - Either always rename if `prefix` is passed (different from `read_csv`), or only if `header` and `names` are both `None` (might be helpful if e.g. the header in the file is numeric; unlikely to cause problems?).
- If `header` and `names` are both `None`, automatically rename columns to add a prefix. (Obvious choices are `c`, `col` or `col_`. Following `q` I think I like `c`.)
- Some combo. Perhaps: if `header` and `names` are both `None`, automatically add a prefix. Look at the `prefix` kwarg to choose the prefix, with a sensible default if not given. If they're not both `None`, and there's a `prefix` kwarg, add a prefix anyway. I think I like this best.
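The "combo" option could be sketched roughly as below. This is not p9-cli's actual code: the function name `read_csv_with_prefix` and the constant `DEFAULT_PREFIX` are hypothetical, and it assumes `header` counts as `None` only when the user passes it explicitly (pandas' own default is `"infer"`).

```python
import io

import pandas as pd

DEFAULT_PREFIX = "c"  # assumption: the default suggested above


def read_csv_with_prefix(source, **kwargs):
    """Hypothetical wrapper: handle `prefix` ourselves instead of pandas."""
    # Remove prefix so it never reaches read_csv.
    prefix = kwargs.pop("prefix", None)

    # True only when the user asked for no header and gave no names.
    no_header = (
        kwargs.get("header", "infer") is None
        and kwargs.get("names") is None
    )

    df = pd.read_csv(source, **kwargs)

    # Headerless data always gets a prefix; fall back to the default.
    if no_header and prefix is None:
        prefix = DEFAULT_PREFIX

    # An explicit prefix is applied even when a header or names exist.
    if prefix is not None:
        df.columns = [f"{prefix}{col}" for col in df.columns]
    return df


headerless = read_csv_with_prefix(io.StringIO("1,2\n3,4\n"), header=None)
custom = read_csv_with_prefix(
    io.StringIO("1,2\n3,4\n"), header=None, prefix="foo_"
)
with_header = read_csv_with_prefix(io.StringIO("a,b\n1,2\n"), prefix="p_")
untouched = read_csv_with_prefix(io.StringIO("a,b\n1,2\n"))
```

Applying an explicit prefix even when a header is present differs from the old `read_csv` behaviour, which is the trade-off noted in the first bullet.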
Could also move this prefix arg outside of --csv, which would improve consistency, and possibly also apply to --dataset and (if supported in future) reading from sqlite tables and stuff. I think I'll leave it there at least for now though.