Title: | An R Client to the 'PatentsView' API |
---|---|
Description: | Provides functions to simplify the 'PatentsView' API (<https://patentsview.org/apis/purpose>) query language, send GET and POST requests to the API's twelve endpoints, and parse the data that comes back. |
Authors: | Christopher Baker [aut, cre] |
Maintainer: | Christopher Baker <[email protected]> |
License: | MIT + file LICENSE |
Version: | 0.3.0 |
Built: | 2024-11-23 15:29:08 UTC |
Source: | https://github.com/mustberuss/patentsview |
This casts the data fields returned by search_pv so that they have their most appropriate data types (e.g., date or numeric).
cast_pv_data(data)
data |
The data returned by search_pv. |
The same type of object that you passed into cast_pv_data.
## Not run:
fields <- c("patent_date", "patent_title", "patent_year")
res <- search_pv(query = "{\"patent_id\":\"5116621\"}", fields = fields)
cast_pv_data(data = res$data)
## End(Not run)
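To make the idea of casting concrete, here is a minimal base R sketch that does not call the package. It assumes, purely for illustration, that every field arrives as a character string; the toy data frame and the chosen target types are assumptions, not the package's actual casting rules.

```r
# Toy stand-in for one row of API output (assumption: all fields are strings).
raw <- data.frame(
  patent_date = "1992-05-26",
  patent_title = "Example title",
  patent_year = "1992",
  stringsAsFactors = FALSE
)

# Cast each column to a more useful type; the title stays a character string.
cast <- transform(
  raw,
  patent_date = as.Date(patent_date),
  patent_year = as.integer(patent_year)
)

# Inspect the resulting column types.
str(cast)
```

After a cast like this, date arithmetic and numeric comparisons work directly on the columns.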
A data frame containing the names of retrievable fields for each of the endpoints. You can find this data on the API's online documentation for each endpoint as well (e.g., the patent endpoint field list table).
fieldsdf
A data frame with the following columns:
The endpoint that this field record is for
The complete name of the field, including the parent group if applicable
The field's input data type
The group the field belongs to
The field name without the parent group structure
This function lists the available PatentSearch API endpoints. (Note that the API was originally known as the PatentsView API.)
get_endpoints()
A character vector with the names of each endpoint.
This function returns a vector of fields that you can retrieve from a given API endpoint (i.e., the fields you can pass to the fields argument in search_pv). You can limit these fields to only cover certain entity group(s) as well (which is recommended, given the large number of possible fields for each endpoint).
get_fields(endpoint, groups = NULL, include_pk = FALSE)
endpoint |
The API endpoint whose field list you want to get. See get_endpoints for a list of the endpoints. |
groups |
A character vector giving the group(s) whose fields you want returned. A value of NULL (the default) returns the fields of all groups. |
include_pk |
Boolean on whether to include the endpoint's primary key, defaults to FALSE. The primary key is needed if you plan on calling unnest_pv_data on the returned data. |
A character vector with field names.
# Get all top level (non-nested) fields for the patent endpoint:
fields <- get_fields(endpoint = "patent", groups = c(""))

# ...Then pass to search_pv:
## Not run:
search_pv(
  query = '{"_gte":{"patent_date":"2007-01-04"}}',
  fields = fields
)
## End(Not run)

# Get all patent and assignee-level fields for the patent endpoint:
fields <- get_fields(endpoint = "patent", groups = c("assignees", ""))

## Not run:
# ...Then pass to search_pv:
search_pv(
  query = '{"_gte":{"patent_date":"2007-01-04"}}',
  fields = fields
)
## End(Not run)

# Get the nested inventors fields and the primary key in order to call unnest_pv_data
# on the returned data. unnest_pv_data would throw an error if the primary key was
# not present in the results.
fields <- get_fields(endpoint = "patent", groups = c("inventors"), include_pk = TRUE)

## Not run:
# ...Then pass to search_pv and unnest the results
results <- search_pv(
  query = '{"_gte":{"patent_date":"2007-01-04"}}',
  fields = fields
)
unnest_pv_data(results$data)
## End(Not run)
This function suggests a value that you could use for the pk argument in unnest_pv_data, based on the endpoint you searched. It will return a potential unique identifier for a given entity (i.e., a given endpoint). For example, it will return "patent_id" when endpoint_or_entity = "patent". It would return the same value if the entity name "patents" was passed via get_ok_pk(names(pv_return$data)) where pv_return was returned from search_pv.
get_ok_pk(endpoint_or_entity)
endpoint_or_entity |
The endpoint or entity name for which you would like to know a potential primary key. |
The name of a primary key (pk) that you could pass to unnest_pv_data.
get_ok_pk(endpoint_or_entity = "inventor") # Returns "inventor_id"
get_ok_pk(endpoint_or_entity = "cpc_group") # Returns "cpc_group_id"
This function strategically pads a patent_id with zeroes to 8 characters, needed for custom paging and possibly when querying by patent_id.
pad_patent_id(patent_id)
patent_id |
The patent_id that needs to be padded. It can be the patent_id for a utility, design, plant or reissue patent. |
## Not run:
padded <- pad_patent_id("RE36479")
padded2 <- pad_patent_id("3930306")
## End(Not run)
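The padding idea can be sketched in base R without calling the package. The helper below, pad_id_sketch, is hypothetical, and it rests on an assumption the text above does not confirm: that the zeroes are inserted between any leading letters (such as "RE") and the digits. The output of pad_patent_id itself may differ.

```r
# Hypothetical helper, not part of the package.
# Assumption: zeroes go between the letter prefix (if any) and the digits.
pad_id_sketch <- function(patent_id, width = 8) {
  # Leading letters, e.g. "RE" for reissue patents; "" for utility patents.
  prefix <- sub("^([A-Za-z]*).*$", "\\1", patent_id)
  # Everything after the prefix.
  digits <- substr(patent_id, nchar(prefix) + 1, nchar(patent_id))
  # As many zeroes as needed to reach the target width (never negative).
  zeros <- strrep("0", pmax(width - nchar(patent_id), 0))
  paste0(prefix, zeros, digits)
}

padded <- pad_id_sketch("RE36479")
padded2 <- pad_id_sketch("3930306")
print(c(padded, padded2))
```

Fixed-width ids sort consistently as strings, which is presumably why padding matters for custom paging.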
A list of functions that make it easy to write PatentsView queries. See the details section below for a list of the 15 functions, as well as the writing queries vignette for further details.
qry_funs
An object of class list of length 15.
1. Comparison operator functions
There are 6 comparison operator functions that work with fields of type integer, float, date, or string:
eq - Equal to
neq - Not equal to
gt - Greater than
gte - Greater than or equal to
lt - Less than
lte - Less than or equal to
There are 2 comparison operator functions that only work with fields of type string:
begins - The string begins with the value string
contains - The string contains the value string
There are 3 comparison operator functions that only work with fields of type fulltext:
text_all - The text contains all the words in the value string
text_any - The text contains any of the words in the value string
text_phrase - The text contains the exact phrase of the value string
2. Array functions
There are 2 array functions:
and - Both members of the array must be true
or - At least one member of the array must be true
3. Negation function
There is 1 negation function:
not - The comparison is not true
4. Convenience function
There is 1 convenience function:
in_range - Builds a <= x <= b query
An object of class pv_query. This is basically just a simple list with a print method attached to it.
qry_funs$eq(patent_date = "2001-01-01")
qry_funs$not(qry_funs$eq(patent_date = "2001-01-01"))
qry_funs$in_range(patent_year = c(2010, 2021))
qry_funs$in_range(patent_date = c("1976-01-01", "1983-02-28"))
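As a rough illustration of what "a <= x <= b" means in the query language, the base R sketch below builds the equivalent JSON string by hand. The helper in_range_sketch is hypothetical, and the operator names "_and" and "_lte" are assumed to follow the same underscore pattern as the "_gte" operator seen in the examples; the JSON that qry_funs$in_range actually produces may be laid out differently.

```r
# Hypothetical helper, not part of the package.
# Combines a lower and an upper bound on one numeric field into a single query.
in_range_sketch <- function(field, lower, upper) {
  sprintf(
    '{"_and":[{"_gte":{"%s":%s}},{"_lte":{"%s":%s}}]}',
    field, lower, field, upper
  )
}

range_query <- in_range_sketch("patent_year", 2010, 2021)
cat(range_query, "\n")
```

This sketch only handles numeric bounds; date bounds would need to be quoted inside the JSON.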
Some of the endpoints now return HATEOAS style links to get more data. E.g., the patent endpoint may return a link such as: "https://search.patentsview.org/api/v1/inventor/fl:th_ln:jefferson-1/"
retrieve_linked_data(
  url,
  encoded_url = FALSE,
  api_key = Sys.getenv("PATENTSVIEW_API_KEY"),
  ...
)
url |
A link that was returned by the API on a previous call, an example in the documentation or a Request URL from the API's Swagger UI page. |
encoded_url |
Boolean indicating whether the url has been URL encoded, defaults to FALSE. Set to TRUE for Request URLs from Swagger UI. |
api_key |
API key, it defaults to Sys.getenv("PATENTSVIEW_API_KEY"). Request a key here. |
... |
Curl options passed along to httr2's |
A list with the following three elements:
A list with one element - a named data frame containing the data returned by the server. Each row in the data frame corresponds to a single value for the primary entity. For example, if you search the assignee endpoint, then the data frame will be on the assignee-level, where each row corresponds to a single assignee. Fields that are not on the assignee-level would be returned in list columns.
Entity counts across all pages of output (not just the page returned to you).
Details of the GET HTTP request that was sent to the server.
## Not run:
retrieve_linked_data(
  "https://search.patentsview.org/api/v1/cpc_group/G01S7:4811/"
)

endpoint_url <- "https://search.patentsview.org/api/v1/patent/"
q_param <- '?q={"_text_any":{"patent_title":"COBOL cotton gin"}}'
s_and_o_params <- '&s=[{"patent_id": "asc" }]&o={"size":50}'
f_param <- '&f=["inventors.inventor_name_last","patent_id","patent_date","patent_title"]'
# (URL broken up to avoid a long line warning in this Rd)

retrieve_linked_data(
  paste0(endpoint_url, q_param, s_and_o_params, f_param)
)

retrieve_linked_data(
  "https://search.patentsview.org/api/v1/patent/?q=%7B%22patent_date%22%3A%221976-01-06%22%7D",
  encoded_url = TRUE
)
## End(Not run)
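The difference between an encoded and an unencoded url can be seen with base R's utils::URLencode and utils::URLdecode, without contacting the API. This is only a sketch of what percent-encoding does to the q parameter; it does not show how retrieve_linked_data handles the url internally.

```r
# The raw JSON query, as it would appear in an unencoded url.
raw_query <- '{"patent_date":"1976-01-06"}'

# Percent-encode every reserved character, as Swagger UI Request URLs do.
encoded_query <- utils::URLencode(raw_query, reserved = TRUE)

# Decoding recovers the original JSON.
decoded_query <- utils::URLdecode(encoded_query)

print(encoded_query)
print(decoded_query)
```

The encoded string matches the q parameter of the last url in the example above, which is why that call sets encoded_url = TRUE.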
This function makes an HTTP request to the PatentsView API for data matching the user's query.
search_pv(
  query,
  fields = NULL,
  endpoint = "patent",
  subent_cnts = FALSE,
  mtchd_subent_only = lifecycle::deprecated(),
  page = lifecycle::deprecated(),
  per_page = lifecycle::deprecated(),
  size = 1000,
  after = NULL,
  all_pages = FALSE,
  sort = NULL,
  method = "GET",
  error_browser = NULL,
  api_key = Sys.getenv("PATENTSVIEW_API_KEY"),
  ...
)
query |
The query that the API will use to filter records.
|
fields |
A character vector of the fields that you want returned to you.
A value of Nested fields can be fully qualified, e.g., "application.filing_date" or the
group name can be used to retrieve all of its nested fields, e.g., "application".
The latter would be similar to passing |
endpoint |
The web service resource you wish to search. Use get_endpoints to list the available endpoints. |
subent_cnts |
This is always FALSE in the new version of the API as the total counts of unique subentities is no longer available. |
mtchd_subent_only |
This is always FALSE in the new version of the API as non-matched subentities will always be returned. |
page |
The new version of the API does not use
|
per_page |
|
size |
The number of records that should be returned per page. This value can be as high as 1,000 (e.g., size = 1000). |
after |
A list of sort key values that defaults to NULL. This exposes the API's paging parameter for users who want to implement their own paging. It cannot be set when all_pages = TRUE. |
all_pages |
Do you want to download all possible pages of output? If
|
sort |
A named character vector where the name indicates the field to sort by and the value indicates the direction of sorting (direction should be either "asc" or "desc"). For example, sort = c("patent_id" = "asc"). |
method |
The HTTP method that you want to use to send the request. Possible values include "GET" or "POST". Use the POST method when your query is very long (say, over 2,000 characters in length). |
error_browser |
|
api_key |
API key, it defaults to Sys.getenv("PATENTSVIEW_API_KEY"). Request a key here. |
... |
Curl options passed along to httr2's |
A list with the following three elements:
A list with one element - a named data frame containing the data returned by the server. Each row in the data frame corresponds to a single value for the primary entity. For example, if you search the assignee endpoint, then the data frame will be on the assignee-level, where each row corresponds to a single assignee. Fields that are not on the assignee-level would be returned in list columns.
Entity counts across all pages of output (not just the page returned to you).
Details of the HTTP request that was sent to the server.
When you set all_pages = TRUE, you will only get a sample request. In other words, you will not be given multiple requests for the multiple calls that were made to the server (one for each page of results).
## Not run:
search_pv(query = '{"_gt":{"patent_year":2010}}')

search_pv(
  query = qry_funs$gt(patent_year = 2010),
  fields = get_fields("patent", c("", "assignees"))
)

search_pv(
  query = qry_funs$gt(patent_year = 2010),
  method = "POST",
  fields = "patent_id",
  sort = c("patent_id" = "asc")
)

search_pv(
  query = qry_funs$eq(inventor_name_last = "Crew"),
  endpoint = "inventor",
  all_pages = TRUE
)

search_pv(
  query = qry_funs$contains(assignee_individual_name_last = "Smith"),
  endpoint = "assignee"
)

search_pv(
  query = qry_funs$contains(inventors.inventor_name_last = "Smith"),
  endpoint = "patent",
  timeout = 40
)

search_pv(
  query = qry_funs$eq(patent_id = "11530080"),
  fields = "application"
)
## End(Not run)
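The advice on the method argument (use POST for very long queries) can be turned into a small rule of thumb. The base R sketch below uses a hypothetical helper, choose_method_sketch, with the 2,000 character figure from the argument description as its default cutoff. The long query is made up for illustration, and its "_or" operator name is an assumption based on the underscore pattern of the operators shown in the examples.

```r
# Hypothetical helper, not part of the package.
# Picks an HTTP method from the length of the query string.
choose_method_sketch <- function(query, max_get_chars = 2000) {
  if (nchar(query) > max_get_chars) "POST" else "GET"
}

short_query <- '{"_gt":{"patent_year":2010}}'

# An artificially long query that lists 101 made-up patent ids.
id_clauses <- sprintf('{"patent_id":"%d"}', 1000000:1000100)
long_query <- paste0('{"_or":[', paste(id_clauses, collapse = ","), "]}")

print(choose_method_sketch(short_query))
print(choose_method_sketch(long_query))
```

The result could then be passed as the method argument of search_pv.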
This function converts a single data frame that has subentity-level list columns in it into multiple data frames, one for each entity/subentity. The multiple data frames can be merged together using the primary key variable specified by the user (see the relational data chapter in "R for Data Science" for an in-depth introduction to joining tabular data).
unnest_pv_data(data, pk = get_ok_pk(names(data)))
data |
The data returned by search_pv. |
pk |
The column/field name that will link the data frames together. This should be the unique identifier for the primary entity. For example, if you used the patent endpoint in your call to search_pv, you could specify pk = "patent_id". |
A list with multiple data frames, one for each entity/subentity. Each data frame will have the pk column in it, so you can link the tables together as needed.
## Not run:
fields <- c("patent_id", "patent_title", "inventors.inventor_city", "inventors.inventor_country")
res <- search_pv(query = '{"_gte":{"patent_year":2015}}', fields = fields)
unnest_pv_data(data = res$data, pk = "patent_id")
## End(Not run)
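To show what unnesting means without calling the API, here is a minimal base R sketch on a toy data frame with one list column. The helper unnest_sketch, the toy data, and the element name "primary" are all hypothetical; the structure and naming of what unnest_pv_data returns may differ.

```r
# Toy patent-level data with a list column of inventor data frames.
patents <- data.frame(
  patent_id = c("1", "2"),
  patent_title = c("A", "B"),
  stringsAsFactors = FALSE
)
patents$inventors <- list(
  data.frame(inventor_city = c("Austin", "Boston"), stringsAsFactors = FALSE),
  data.frame(inventor_city = "Chicago", stringsAsFactors = FALSE)
)

# Hypothetical helper: one data frame per list column, each carrying the pk.
unnest_sketch <- function(df, pk) {
  is_list_col <- vapply(df, is.list, logical(1))
  out <- list()
  for (col in names(df)[is_list_col]) {
    pieces <- Map(function(key, sub) {
      keyed <- sub
      keyed[[pk]] <- rep(key, nrow(sub)) # repeat the key once per sub-row
      keyed[, c(pk, names(sub)), drop = FALSE]
    }, df[[pk]], df[[col]])
    out[[col]] <- do.call(rbind, unname(pieces))
  }
  # The remaining non-list columns form the primary entity table.
  out$primary <- df[, !is_list_col, drop = FALSE]
  out
}

tables <- unnest_sketch(patents, pk = "patent_id")
print(tables$inventors)
```

Because both tables carry patent_id, they can be rejoined with merge(tables$primary, tables$inventors, by = "patent_id").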
This function evaluates whatever code you pass to it in the environment of the qry_funs list. This allows you to cut down on typing when writing your queries. If you want to cut down on typing even more, you can try assigning the qry_funs list into your global environment with: list2env(qry_funs, envir = globalenv()).
with_qfuns(code, envir = parent.frame())
code |
Code to evaluate. See example. |
envir |
Where should R look for objects present in code that are not found in qry_funs. |
The result of code - i.e., your query.
qry_funs$and(
  qry_funs$gte(patent_date = "2007-01-01"),
  qry_funs$text_phrase(patent_abstract = c("computer program")),
  qry_funs$or(
    qry_funs$eq(inventors.inventor_name_last = "Ihaka"),
    qry_funs$eq(inventors.inventor_name_last = "Chris")
  )
)

# ...With it, this becomes:
with_qfuns(
  and(
    gte(patent_date = "2007-01-01"),
    text_phrase(patent_abstract = c("computer program")),
    or(
      eq(inventors.inventor_name_last = "Ihaka"),
      eq(inventors.inventor_name_last = "Chris")
    )
  )
)
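The mechanism described above, evaluating code inside a list of functions, can be sketched in base R with eval and substitute. Everything here is a toy: toy_funs and with_funs_sketch are hypothetical stand-ins, the JSON they emit is simplified, and the package's own implementation may differ.

```r
# Toy stand-ins for two query functions (assumption: simplified JSON output).
toy_funs <- list(
  eq = function(...) {
    x <- list(...)
    sprintf('{"%s":"%s"}', names(x), x[[1]])
  },
  or = function(...) sprintf('{"_or":[%s]}', paste(..., sep = ","))
)

# Evaluate the unevaluated code with toy_funs searched first; anything not
# found there (such as last_name below) is looked up in envir.
with_funs_sketch <- function(code, envir = parent.frame()) {
  eval(substitute(code), toy_funs, envir)
}

last_name <- "Ihaka"
toy_query <- with_funs_sketch(
  or(
    eq(inventors.inventor_name_last = last_name),
    eq(inventors.inventor_name_last = "Chris")
  )
)
cat(toy_query, "\n")
```

The envir argument plays the same role here as in with_qfuns: it tells R where to find objects, like last_name, that are not query functions.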