Commit 1fd6c49

suggested corrections added
1 parent db4659d commit 1fd6c49

File tree

1 file changed

+45
-51
lines changed


samples/04_gis_analysts_data_scientists/classifying_human_activity_using_tabPFN_classifier.ipynb

Lines changed: 45 additions & 51 deletions
@@ -4,7 +4,7 @@
44
"cell_type": "markdown",
55
"metadata": {},
66
"source": [
7-
"## Leveraging TabPFN for Human Activity Recognition Using Mobile dataset"
7+
"## Leveraging TabPFN for Human Activity Recognition Using Mobile Dataset"
88
]
99
},
1010
{
@@ -16,14 +16,14 @@
1616
"* [Necessary imports](#2)\n",
1717
"* [Connecting to ArcGIS](#3)\n",
1818
"* [Accessing the datasets](#4) \n",
19-
"* [Prepare training data for TabFPN](#5)\n",
20-
" * [Data Preprocessing for tabFPN Classifier Model](#6) \n",
19+
"* [Prepare training data for TabPFN](#5)\n",
20+
" * [Data Preprocessing for TabPFN Classifier Model](#6) \n",
2121
" * [Visualize training data](#9)\n",
2222
"* [Model Training](#10) \n",
23-
" * [Define the tabFPN classifier model ](#11)\n",
23+
" * [Define the TabPFN classifier model ](#11)\n",
2424
" * [Fit the model](#12)\n",
2525
" * [Visualize results in validation set](#13)\n",
26-
"* [Predicting using tabFPN classifier model](#14)\n",
26+
"* [Predicting using TabPFN classifier model](#14)\n",
2727
" * [Predict using the trained model](#15)\n",
2828
"* [Accuracy assessment: Compute Model Metric](#16)\n",
2929
"* [Conclusion](#17)"
@@ -67,8 +67,6 @@
6767
}
6868
],
6969
"source": [
70-
"%%time\n",
71-
"\n",
7270
"%matplotlib inline\n",
7371
"import matplotlib.pyplot as plt\n",
7472
"\n",
@@ -77,7 +75,6 @@
7775
"from sklearn.preprocessing import StandardScaler\n",
7876
"from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score, classification_report\n",
7977
"\n",
80-
"import arcgis\n",
8178
"from arcgis.gis import GIS\n",
8279
"from arcgis.learn import MLModel, prepare_tabulardata"
8380
]
@@ -102,9 +99,9 @@
10299
"cell_type": "markdown",
103100
"metadata": {},
104101
"source": [
105-
"## Accessing the dataset <a class=\"anchor\" id=\"4\"></a>\n",
102+
"## Accessing the datasets <a class=\"anchor\" id=\"4\"></a>\n",
106103
"\n",
107-
"The HAR training dataset consists of 1,020 rows and 561 features, capturing sensor data from mobile devices to classify human activities like walking, running, and sitting. The data includes measurements from accelerometers, gyroscopes, and GPS, providing insights into movement patterns while ensuring that location data remains anonymized for privacy protection. Features such as BodyAcc (body accelerometer), GravityAcc (gravity accelerometer), BodyAccJerk, BodyGyro (body gyroscope), and BodyGyroJerk are used to capture dynamic and rotational movements. Time-domain and frequency-domain features are extracted from these raw signals, helping to distinguish between various activities based on patterns in acceleration, rotation, and speed, making the dataset ideal for activity classification tasks."
104+
"Here we will access the train and test datasets. The Human Activity Recognition (HAR) training dataset consists of 1,020 rows and 561 features, capturing sensor data from mobile devices to classify human activities like walking, running, and sitting. The data includes measurements from accelerometers, gyroscopes, and GPS, providing insights into movement patterns while ensuring that location data remains anonymized for privacy protection. Features such as BodyAcc (body accelerometer), GravityAcc (gravity accelerometer), BodyAccJerk, BodyGyro (body gyroscope), and BodyGyroJerk are used to capture dynamic and rotational movements. Time-domain and frequency-domain features are extracted from these raw signals, helping to distinguish between various activities based on patterns in acceleration, rotation, and speed, making the dataset ideal for activity classification tasks."
108105
]
109106
},
110107
{
@@ -153,7 +150,7 @@
153150
"metadata": {},
154151
"outputs": [],
155152
"source": [
156-
"# Download the train datas and save ing it in local folder\n",
153+
"# Download the train data and save it to a local folder\n",
157154
"data_path = data_table.get_data()"
158155
]
159156
},
@@ -391,7 +388,7 @@
391388
}
392389
],
393390
"source": [
394-
"# Read the donwloaded data\n",
391+
"# Read the downloaded data\n",
395392
"train_har_data = pd.read_csv(data_path)\n",
396393
"train_har_data.head(5)"
397394
]
@@ -420,7 +417,7 @@
420417
"cell_type": "markdown",
421418
"metadata": {},
422419
"source": [
423-
"Next we will access the test dataset, which is a larger dataset containing 6,332 samples. "
420+
"Next, we will access the test dataset, which is significantly larger, containing 6,332 samples."
424421
]
425422
},
426423
{
@@ -469,7 +466,7 @@
469466
"metadata": {},
470467
"outputs": [],
471468
"source": [
472-
"# Download the test data and save it in local folder\n",
469+
"# Download the test data and save it to a local folder\n",
473470
"test_data_path = test_data_table.get_data()"
474471
]
475472
},
@@ -736,7 +733,7 @@
736733
"cell_type": "markdown",
737734
"metadata": {},
738735
"source": [
739-
"## Prepare training data for TabFPN <a class=\"anchor\" id=\"5\"></a>"
736+
"## Prepare training data for TabPFN <a class=\"anchor\" id=\"5\"></a>"
740737
]
741738
},
742739
{
@@ -1354,14 +1351,14 @@
13541351
"cell_type": "markdown",
13551352
"metadata": {},
13561353
"source": [
1357-
"### Data Preprocessing for tabFPN Classifier Model<a class=\"anchor\" id=\"6\"></a>"
1354+
"### Data Preprocessing for TabPFN Classifier Model<a class=\"anchor\" id=\"6\"></a>"
13581355
]
13591356
},
13601357
{
13611358
"cell_type": "markdown",
13621359
"metadata": {},
13631360
"source": [
1364-
"To process the training data for the TabPFN model, we will use Linear Discriminant Analysis (LDA) to reduce the number of features from the original 560 to below the tabFPN model's maximum limit of 100. By applying LDA, we can preserve the most relevant information for classification while reducing the complexity of the input data, making it suitable for the TabPFN model, which requires a compact input format for efficient processing and predictions."
1361+
"To process the training data for the TabPFN model, we will use Linear Discriminant Analysis (LDA) to reduce the number of features from the original 561 to below the TabPFN model's maximum limit of 100. By applying LDA, we can preserve the most relevant information for classification while reducing the complexity of the input data, making it suitable for the TabPFN model, which requires a compact input format for efficient processing and predictions."
13651362
]
13661363
},
13671364
{
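The LDA reduction described in the paragraph above can be sketched as follows. This is a minimal, self-contained example using synthetic data in place of the HAR CSV (the row count, feature count, and activity names are taken from the text; the exact label set is an assumption). With 6 activity classes, LDA yields at most `n_classes - 1 = 5` components, which matches the `LDA1`..`LDA5` columns shown later in the notebook:

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

# Synthetic stand-in for the HAR training table: 1,020 rows, 561 features, 6 activities
rng = np.random.default_rng(0)
X = pd.DataFrame(rng.normal(size=(1020, 561)))
y = pd.Series(rng.integers(0, 6, size=1020)).map(
    {0: 'WALKING', 1: 'RUNNING', 2: 'SITTING', 3: 'STANDING', 4: 'LAYING', 5: 'UPSTAIRS'}
)

# Standardize, then project onto at most n_classes - 1 = 5 discriminant axes
X_scaled = StandardScaler().fit_transform(X)
lda = LinearDiscriminantAnalysis(n_components=5)
X_lda = lda.fit_transform(X_scaled, y)

# Rebuild a compact dataframe in the shape the notebook uses
X_train_lda_df = pd.DataFrame(X_lda, columns=[f'LDA{i+1}' for i in range(5)])
X_train_lda_df['Activity'] = y.values
print(X_train_lda_df.shape)  # (1020, 6)
```

The 561 raw features are thus compressed to 5 columns, well under TabPFN's 100-feature limit, while LDA's supervised projection keeps the axes that best separate the activity classes.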
@@ -1381,7 +1378,7 @@
13811378
}
13821379
],
13831380
"source": [
1384-
"# Data processing to reduce the features to 100 or less as required for tabFPN models\n",
1381+
"# Data processing to reduce the features to 100 or fewer, as required for TabPFN models\n",
13851382
"X = train_har_data.drop(columns=['Activity'])\n",
13861383
"y = train_har_data['Activity']\n",
13871384
"scaler = StandardScaler()\n",
@@ -1395,54 +1392,62 @@
13951392
},
13961393
{
13971394
"cell_type": "code",
1398-
"execution_count": 35,
1395+
"execution_count": 36,
13991396
"metadata": {},
14001397
"outputs": [
14011398
{
14021399
"data": {
14031400
"text/plain": [
1404-
"5"
1401+
"Index(['LDA1', 'LDA2', 'LDA3', 'LDA4', 'LDA5', 'Activity'], dtype='object')"
14051402
]
14061403
},
1407-
"execution_count": 35,
1404+
"execution_count": 36,
14081405
"metadata": {},
14091406
"output_type": "execute_result"
14101407
}
14111408
],
14121409
"source": [
1413-
"# define the explanatory vairables\n",
1414-
"X = list(X_train_lda_df.columns)\n",
1415-
"X =X[:-1]\n",
1416-
"len(X)"
1410+
"# Visualize the final processed training data columns\n",
1411+
"X_train_lda_df.columns"
1412+
]
1413+
},
1414+
{
1415+
"cell_type": "markdown",
1416+
"metadata": {},
1417+
"source": [
1418+
"In the above training dataframe, we will use `Activity` as the target label to be predicted, with the rest of the features serving as the explanatory variables `X`. We define the explanatory variables as follows: "
14171419
]
14181420
},
14191421
{
14201422
"cell_type": "code",
1421-
"execution_count": 36,
1423+
"execution_count": 35,
14221424
"metadata": {},
14231425
"outputs": [
14241426
{
14251427
"data": {
14261428
"text/plain": [
1427-
"Index(['LDA1', 'LDA2', 'LDA3', 'LDA4', 'LDA5', 'Activity'], dtype='object')"
1429+
"5"
14281430
]
14291431
},
1430-
"execution_count": 36,
1432+
"execution_count": 35,
14311433
"metadata": {},
14321434
"output_type": "execute_result"
14331435
}
14341436
],
14351437
"source": [
1436-
"X_train_lda_df.columns"
1438+
"# Define the explanatory variables\n",
1439+
"X = list(X_train_lda_df.columns)\n",
1440+
"X = X[:-1]\n",
1441+
"len(X)"
14371442
]
14381443
},
14391444
{
14401445
"cell_type": "markdown",
14411446
"metadata": {},
14421447
"source": [
1443-
"Once the explanatory variables X are preprocessed this is now used as input for the *prepare_tabulardata* method from the tabular learner in the arcgis.learn. The method takes the feature layer or a spatial dataframe containing the dataset and prepares it for fitting the model. \n",
1448+
"Once the explanatory variables `X` are defined, they are used as input to the `prepare_tabulardata` method from the tabular learner in `arcgis.learn`. The method takes a feature layer or a spatial dataframe containing the dataset and prepares it for fitting the model. \n",
14441449
"\n",
1445-
"The input parameters required for the tool are similar to the ones mentioned previously :"
1450+
"The input parameters required for the tool are used as shown here:"
14461451
]
14471452
},
14481453
{
@@ -1465,7 +1470,7 @@
14651470
"cell_type": "markdown",
14661471
"metadata": {},
14671472
"source": [
1468-
"To get a sense of what the training data looks like, the show_batch() method will randomly pick a few training sample and visualize them. The sample are showing the explanaotyr vairables and the variblss to predict column."
1473+
"To get a sense of what the training data looks like, the `show_batch()` method will randomly pick a few training samples and visualize them. The samples show the explanatory variables and the `Activity` target label to predict."
14691474
]
14701475
},
14711476
{
@@ -1582,9 +1587,9 @@
15821587
"cell_type": "markdown",
15831588
"metadata": {},
15841589
"source": [
1585-
"### Define the tabFPN classifier model <a class=\"anchor\" id=\"11\"></a>\n",
1590+
"### Define the TabPFN classifier model <a class=\"anchor\" id=\"11\"></a>\n",
15861591
"\n",
1587-
"The default, initialization of the tabFPN classifier model object is shown below:"
1592+
"The default initialization of the TabPFN classifier model object is shown below:"
15881593
]
15891594
},
15901595
{
@@ -1593,7 +1598,6 @@
15931598
"metadata": {},
15941599
"outputs": [],
15951600
"source": [
1596-
"from arcgis.learn import MLModel\n",
15971601
"tabpfn_classifier = MLModel(data, 'tabpfn.TabPFNClassifier',device='cpu', N_ensemble_configurations=32)"
15981602
]
15991603
},
@@ -1639,7 +1643,7 @@
16391643
"cell_type": "markdown",
16401644
"metadata": {},
16411645
"source": [
1642-
"We can see the model score is showing excellent result."
1646+
"We can see that the model score shows excellent results."
16431647
]
16441648
},
16451649
{
@@ -1770,7 +1774,7 @@
17701774
"cell_type": "markdown",
17711775
"metadata": {},
17721776
"source": [
1773-
"## Predicting using the tabFPN classifier model <a class=\"anchor\" id=\"14\"></a>\n",
1777+
"## Predicting using the TabPFN classifier model <a class=\"anchor\" id=\"14\"></a>\n",
17741778
"\n",
17751779
"Once the TabPFN classifier is trained on the smaller dataset of 1,020 samples, we can use it to predict the classes of a larger dataset containing 6,332 samples. Given TabPFN’s ability to process data efficiently with a single forward pass, it can handle this larger dataset quickly, classifying each sample based on the patterns learned during training. Since the model is optimized for fast and scalable predictions, it will generate class predictions for all samples. \n",
17761780
"\n",
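When predicting on the larger test set, the key point is that the same fitted scaler and LDA transform from training must be reused before prediction. The sketch below illustrates this with synthetic data and a `LogisticRegression` stand-in for the TabPFN classifier (both are assumptions for illustration, not the notebook's actual model):

```python
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(42)
# Small training set and a larger unseen test set, mirroring the notebook's sizes
X_train = rng.normal(size=(1020, 561))
y_train = rng.integers(0, 6, size=1020)
X_test = rng.normal(size=(6332, 561))

# Fit the preprocessing on the training data only
scaler = StandardScaler().fit(X_train)
lda = LinearDiscriminantAnalysis(n_components=5).fit(scaler.transform(X_train), y_train)

# Train the classifier on the 5 LDA components
clf = LogisticRegression(max_iter=1000).fit(
    lda.transform(scaler.transform(X_train)), y_train
)

# Reuse the *fitted* scaler and LDA on the test set, then predict
preds = clf.predict(lda.transform(scaler.transform(X_test)))
print(preds.shape)  # (6332,)
```

Applying transforms fitted on the training data (rather than refitting them on the test set) keeps the 5-dimensional input space consistent between training and prediction.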
@@ -2042,17 +2046,7 @@
20422046
"source": [
20432047
"### Accuracy assessment <a class=\"anchor\" id=\"16\"></a>\n",
20442048
"\n",
2045-
"Here we weill evaluate the model's performance. This will print out multiple model metrics. we can assess the model quality using its corresponding metrics. These metrics include a combination of multiple evaluation criteria, such as `accuracy`, `precision`, `recall` and `F1-Score`, which collectively measure the model's performance on the validation set."
2046-
]
2047-
},
2048-
{
2049-
"cell_type": "code",
2050-
"execution_count": 47,
2051-
"metadata": {},
2052-
"outputs": [],
2053-
"source": [
2054-
"import pandas as pd\n",
2055-
"from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score, classification_report"
2049+
"Here we will evaluate the model's performance by printing multiple model metrics. We can assess the model quality using a combination of evaluation criteria, such as `accuracy`, `precision`, `recall`, and `F1-Score`, which collectively measure the model's performance on the validation set."
20562050
]
20572051
},
20582052
{
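The metrics listed above can be computed directly with scikit-learn. A minimal sketch with placeholder label arrays (the actual `y_true`/`y_pred` would come from the notebook's validation set):

```python
from sklearn.metrics import (accuracy_score, precision_score, recall_score,
                             f1_score, classification_report)

# Placeholder labels standing in for validation-set ground truth and predictions
y_true = ['WALKING', 'SITTING', 'WALKING', 'STANDING', 'SITTING', 'WALKING']
y_pred = ['WALKING', 'SITTING', 'STANDING', 'STANDING', 'SITTING', 'WALKING']

# Macro averaging treats every activity class equally, regardless of class size
print('Accuracy :', accuracy_score(y_true, y_pred))
print('Precision:', precision_score(y_true, y_pred, average='macro', zero_division=0))
print('Recall   :', recall_score(y_true, y_pred, average='macro', zero_division=0))
print('F1-Score :', f1_score(y_true, y_pred, average='macro', zero_division=0))
print(classification_report(y_true, y_pred, zero_division=0))
```

`classification_report` additionally breaks the precision, recall, and F1 scores down per activity class, which is useful for spotting activities the model confuses.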
@@ -2146,9 +2140,9 @@
21462140
],
21472141
"metadata": {
21482142
"kernelspec": {
2149-
"display_name": "pro3.4_climax_27October2024",
2143+
"display_name": "pro3.4_climaxAug2024",
21502144
"language": "python",
2151-
"name": "pro3.4_climax_27october2024"
2145+
"name": "pro3.4_climaxaug2024"
21522146
},
21532147
"language_info": {
21542148
"codemirror_mode": {
@@ -2160,7 +2154,7 @@
21602154
"name": "python",
21612155
"nbconvert_exporter": "python",
21622156
"pygments_lexer": "ipython3",
2163-
"version": "3.11.9"
2157+
"version": "3.11.8"
21642158
}
21652159
},
21662160
"nbformat": 4,
