Data Cleaning with OpenRefine
Lesson adapted from Data Carpentry by the OHSU Library.
The current version was tested with OpenRefine 3.7.2 in May 2023.
- This data set is derived from the Portal Project long-term desert ecology data. The data file was downloaded and then modified specifically for use with OpenRefine.
- Taxon names were put back into the file.
- The number of rows was reduced to simplify the reconciliation and URL parsing exercises.
- These modifications were made to illustrate some features of OpenRefine.
- Errors were added to the taxon names (`scientificName` field) to demonstrate OpenRefine's ability to find likely mis-entered data.
- These errors can be found by running clustering algorithms on the `scientificName` column, which surface discrepancies quickly and make it simple to fix every issue found.
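
To give a sense of how clustering can surface mis-entered taxon names, here is a minimal Python sketch of key-collision clustering with a fingerprint key, loosely modeled on the idea behind OpenRefine's fingerprint method. It is an illustration under stated assumptions, not OpenRefine's actual implementation: the function names `fingerprint` and `cluster` are hypothetical, and the sample names are made up for the example rather than taken from the lesson's data file.

```python
import string
import unicodedata
from collections import defaultdict


def fingerprint(value: str) -> str:
    """Build a normalized key so that trivially different spellings collide."""
    # Trim surrounding whitespace and ignore case.
    s = value.strip().lower()
    # Fold accented characters to their closest ASCII form.
    s = unicodedata.normalize("NFKD", s).encode("ascii", "ignore").decode("ascii")
    # Drop ASCII punctuation.
    s = s.translate(str.maketrans("", "", string.punctuation))
    # Split on whitespace, remove duplicate tokens, and sort them.
    return " ".join(sorted(set(s.split())))


def cluster(values):
    """Group values that share a fingerprint; keep groups with 2+ distinct spellings."""
    groups = defaultdict(set)
    for v in values:
        groups[fingerprint(v)].add(v)
    return [sorted(g) for g in groups.values() if len(g) > 1]


# Hypothetical sample values standing in for the scientificName column.
names = [
    "Dipodomys merriami",
    "dipodomys merriami ",
    "Dipodomys  merriami.",
    "Chaetodipus penicillatus",
    "Dipodomys ordii",
]

clusters = cluster(names)
for group in clusters:
    print(group)
```

A fingerprint key only catches differences in case, spacing, punctuation, accents, and word order. It will not group true misspellings such as a dropped letter; for those, OpenRefine also offers nearest-neighbor methods based on string distance.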