|
65 | 65 | "cell_type": "code", |
66 | 66 | "execution_count": null, |
67 | 67 | "metadata": { |
68 | | - "id": "bDGChJBK9ooq" |
| 68 | + "id": "bDGChJBK9ooq", |
| 69 | + "cellView": "form" |
69 | 70 | }, |
70 | 71 | "outputs": [], |
71 | 72 | "source": [ |
|
139 | 140 | " bigquery-public-data.idc_current.dicom_all\n", |
140 | 141 | "```\n", |
141 | 142 | "\n", |
142 | | - "To run this query interactively, copy the query above to the clipboard, paste it into the Editor tab in the [BigQuery SQL workspace](https://console.cloud.google.com/bigquery), and hit the \"Run\" button. Within few moments you should be able to see the list of collections in IDC in the \"Query results\" section of the interface.\n", |
| 143 | + "To run this query interactively, copy the query above to the clipboard, paste it into the query tab in the [BigQuery SQL workspace](https://console.cloud.google.com/bigquery), and hit the \"Run\" button. Within a few moments you should see the list of collections in IDC in the \"Query results\" section of the interface.\n", |
143 | 144 | "\n", |
144 | 145 | "\n", |
145 | 146 | "\n", |
|
167 | 168 | "2. `idc_current` is a _dataset_ within the `bigquery-public-data` project. Think of BigQuery datasets as containers that are used to organize and control access to the tables within the project.\n", |
168 | 169 | "3. `dicom_all` is one of the tables within the `idc_current` dataset. As you spend more time learning about IDC, you will hopefully leverage other tables available in that dataset.\n", |
169 | 170 | "\n", |
170 | | - "If you now look back at the [BigQuery console](https://console.cloud.google.com/bigquery) and expand the list of datasets under the `bigquery-public-data` project, you will see that in addition to the `idc_current` dataset there are also datasets `idc_v12`, `idc_v11`, etc all the way to `idc_v1`. Those datasets correspond to the IDC data release versions, with `idc_current` being an alias for the latest (at the moment, v12) version of IDC data. \n", |
| 171 | + "If you now look back at the [BigQuery console](https://console.cloud.google.com/bigquery) and expand the list of datasets under the `bigquery-public-data` project, you will see that in addition to the `idc_current` dataset there are also datasets `idc_v14`, `idc_v13`, etc., all the way to `idc_v1`. Those datasets correspond to the IDC data release versions, with `idc_current` being an alias for the latest version of IDC data (v14 at the time of writing). \n", |
171 | 172 | "\n", |
172 | 173 | "We will not spend time discussing how IDC versioning works, but it is important to know that \n", |
173 | 174 | "\n", |
174 | 175 | "1. IDC data is versioned;\n", |
175 | | - "2. queries against the `idc_current` dataset are equivalent to the queries against the latest version (currently, `idc_v12`) of IDC data;\n", |
| 176 | + "2. queries against the `idc_current` dataset are equivalent to the queries against the latest version (currently, `idc_v14`) of IDC data;\n", |
176 | 177 | "3. if you want the results of the queries to be persistent, write those against `idc_v*` datasets instead of `idc_current`." |
177 | 178 | ] |
178 | 179 | }, |
|
370 | 371 | " # Use AND operator to combine the filter values for the\n", |
371 | 372 | " # Modality and tcia_tumorLocation to select collections that\n", |
372 | 373 | " # include MR images for Lung cancer locations\n", |
| 374 | + " # Note that SQL uses a single = for comparison, and strings should\n", |
| 375 | + " # be enclosed in \"\"\n", |
373 | 376 | "\"\"\"\n", |
374 | 377 | "\n", |
375 | 378 | "selection_result = bq_client.query(selection_query)\n", |
|
415 | 418 | "# we specified in the beginning of the notebook!\n", |
416 | 419 | "bq_client = bigquery.Client(my_ProjectID)\n", |
417 | 420 | "\n", |
418 | | - "# Execution of this cell will fail unless you wrote the query below!\n", |
419 | 421 | "selection_query = \"\"\"\n", |
420 | 422 | "SELECT \n", |
421 | 423 | " COUNT(DISTINCT(PatientID)) as patient_cnt\n", |
|