diff --git a/R/extractWorks.R b/R/extractWorks.R
index 3aca710..29addee 100644
--- a/R/extractWorks.R
+++ b/R/extractWorks.R
@@ -1,4 +1,4 @@
-#' Extract works associated with a concept in openAlex and store data as a compressed R list object
+#' Extract works associated with a source or concept in openAlex and store data as a compressed R list object
 #'
 #' @param data_style options for how much/how little data to return, see @details
 #' @param mailto email address of user, needed to get in 'polite pool' of API
diff --git a/R/queryConcepts.R b/R/queryConcepts.R
index b68cb66..e63c200 100644
--- a/R/queryConcepts.R
+++ b/R/queryConcepts.R
@@ -8,6 +8,7 @@
 #' @param variables to return in data.table
 #' @description Search spreadsheet of openAlex concept tree (see https://docs.openalex.org/about-the-data/concept). Current google sheet url is: https://docs.google.com/spreadsheets/d/1LBFHjPt4rj_9r0t0TTAlT68NwOtNH8Z21lBMsJDMoZg/edit#gid=1473310811
 #' @return datatable of results
+#' @example man/examples/concept.R
 #' @export
 #' @details NOTE: note that https://api.openalex.org/concepts doesn't seem to tolerate regex at this point, so that needs to be done with the output
 #' @importFrom stringr str_detect
diff --git a/R/querySources.R b/R/querySources.R
index cfc3a32..8ffd2b1 100644
--- a/R/querySources.R
+++ b/R/querySources.R
@@ -5,6 +5,7 @@
 #' @param type which type of source should be included in query, defaults to all
 #' @description Primary use of this function is to get source ID for use in API
 #' @export
+#' @example man/examples/sources.R
 #' @import jsonlite
 #' @import stringr
 #' @import httr
diff --git a/R/queryTitles.R b/R/queryTitles.R
index fc7ca73..62b82ff 100644
--- a/R/queryTitles.R
+++ b/R/queryTitles.R
@@ -15,6 +15,7 @@
 #' @import jsonlite
 #' @import magrittr
 #' @import data.table
+#' @example man/examples/sources.R
 #' @details Note that because extracted records can be pretty large--and are complicated, nested json file--there is an optional "data_style" command that lets the user specify what to return. Currently there are three options: (1) bare_bones returns OpenAlex ID + DOI, basically, results that can be used to look up the work again; (2) citation returns typical citation information, like journal name, author, etc., with a couple bonus items like source.id to link back to openAlex (3) comprehensive returns author institutional affiliations, open access info, funding data, etc.; and (4) [not active] all returns the entire result in original json format.
 #' @export
 #'
diff --git a/README.md b/README.md
index b65d040..2234a77 100644
--- a/README.md
+++ b/README.md
@@ -8,12 +8,26 @@ The full openAlex database is ~300GB and so hosting the entire database is not a
 
 # functions
 
-indexBuild currently does three main tasks: (1) search and identify IDs for venues (e.g., journals) and concept tags in openAlex; (2) query works associated with venues or concepts in openAlex and return a json database; (3) turn json file trees for openAlex works into a row-wise data.table object with a simple subset of metadata. Right now, the first two tasks are split across venues and concepts, e.g., there are separate extractVenues() and extractConcepts() functions. At some point, these can be combined.
+indexBuild currently does three main tasks:
+
+## (1) search and identify IDs for venues (e.g., journals) and concept tags in openAlex
+
+- `queryConcepts()` lets you search for concepts that works are tagged with in OpenAlex. Example use case: find openAlex ID for "public administration" and use that ID to subsequently query for works tagged with this concept.
+- `querySources()` lets you search for journals in openAlex. Use case is to get journal IDs that can then be used to extract journal information or access all works associated with a journal.
+- `queryTitles()` lets you input the name/title of a reference and search for matches in openAlex. Use case is to generate a list of candidate matches that can be indexed for more comprehensive, multivariate search.
+- `lookupJournal()` is a convenience function for matching ISSN IDs to openAlex identifiers. Example use case is linking journal data from SciMago to OpenAlex.
+
+
+## (2) extract data for works associated with a given venue or concept in openAlex and return a json database;
+
+- `extractWorks()` lets you input source or concept id and returns query result containing all works associated with that ID in openAlex.
+
+Currently, `extractWorks()` handles processing internally, applying the `processWork()` function to JSON query results to develop a flat file (data.table) representation of the results. `processWork` has multiple return options, including 'bare_bones' which returns just DOI and openAlex ID (useful for further query), 'citation' which returns basic reference information, and 'comprehensive' which returns extra information like authors' institutional affiliations, available funding data, and open access information. Note that where necessary, `processWork` collapses entries using the ';' separator to store multiple entries (e.g., co-author names and IDs) in a single table entry.
 
 # example
 
 To get information about a journal, you can feed in a journal title:
 ```
-queryVenues(venue_string = 'Journal of Public Administration Research and Theory')
+querySources(venue_string = 'Journal of Public Administration Research and Theory')
 ```
 and get information about a concept, you can feed in a concept.
diff --git a/man/examples/concept.R b/man/examples/concept.R
new file mode 100644
index 0000000..4a05975
--- /dev/null
+++ b/man/examples/concept.R
@@ -0,0 +1,5 @@
+# find concepts in openAlex
+id <- queryConcepts(concept_string = 'public policy')
+print(id)
+
+
diff --git a/man/examples/sources.R b/man/examples/sources.R
new file mode 100644
index 0000000..4662174
--- /dev/null
+++ b/man/examples/sources.R
@@ -0,0 +1,4 @@
+# find journal in openAlex
+id <- querySources(source = 'Journal of Public Administration Research and Theory')
+print(id)
+
diff --git a/man/examples/titles.R b/man/examples/titles.R
new file mode 100644
index 0000000..82ef9a2
--- /dev/null
+++ b/man/examples/titles.R
@@ -0,0 +1,5 @@
+# find titles in openAlex
+id <- queryTitles('Does collaborative governance work?')
+print(id)
+
+
diff --git a/man/extractWorks.Rd b/man/extractWorks.Rd
index 04e2370..95d5a00 100644
--- a/man/extractWorks.Rd
+++ b/man/extractWorks.Rd
@@ -2,7 +2,7 @@
 % Please edit documentation in R/extractWorks.R
 \name{extractWorks}
 \alias{extractWorks}
-\title{Extract works associated with a concept in openAlex and store data as a compressed R list object}
+\title{Extract works associated with a source or concept in openAlex and store data as a compressed R list object}
 \usage{
 extractWorks(
   data_style = c("bare_bones", "citation", "comprehensive", "all"),
diff --git a/man/queryConcepts.Rd b/man/queryConcepts.Rd
index 59dd5bf..55be2a6 100644
--- a/man/queryConcepts.Rd
+++ b/man/queryConcepts.Rd
@@ -36,3 +36,10 @@ Search spreadsheet of openAlex concept tree (see https://docs.openalex.org/about
 \details{
 NOTE: note that https://api.openalex.org/concepts doesn't seem to tolerate regex at this point, so that needs to be done with the output
 }
+\examples{
+# find concepts in openAlex
+id <- queryConcepts(concept_string = 'public policy')
+print(id)
+
+
+}
diff --git a/man/querySources.Rd b/man/querySources.Rd
index bdf55f0..a3bc4b4 100644
--- a/man/querySources.Rd
+++ b/man/querySources.Rd
@@ -16,3 +16,9 @@ querySources(source = NULL, mailto = NULL, type = NULL)
 \description{
 Primary use of this function is to get source ID for use in API
 }
+\examples{
+# find journal in openAlex
+id <- querySources(source = 'Journal of Public Administration Research and Theory')
+print(id)
+
+}
diff --git a/man/queryTitle.Rd b/man/queryTitle.Rd
index 7cc6664..2248c41 100644
--- a/man/queryTitle.Rd
+++ b/man/queryTitle.Rd
@@ -36,4 +36,8 @@ Note that because extracted records can be pretty large--and are complicated, ne
 data(titles)
 #query a single titles
 queryOpenAlex(titles[22])
+# find journal in openAlex
+id <- querySources(source = 'Journal of Public Administration Research and Theory')
+print(id)
+
 }