Title: | An R Client to the 'PatentsView' API |
---|---|
Description: | Provides functions to simplify the 'PatentsView' API (<https://patentsview.org/apis/purpose>) query language, send GET and POST requests to the API's twenty-seven endpoints, and parse the data that comes back. |
Authors: | Christopher Baker [aut, cre], Russ Allen [aut] |
Maintainer: | Christopher Baker <[email protected]> |
License: | MIT + file LICENSE |
Version: | 1.0.0 |
Built: | 2025-03-11 22:32:11 UTC |
Source: | https://github.com/mustberuss/patentsview |
A data frame containing the names of retrievable fields for each of the endpoints.
fieldsdf
A data frame with the following columns:
The endpoint that this field record is for
The complete name of the field, including the parent group if applicable
The field's data type
The group the field belongs to
This function reminds the user what the possible PatentsView API endpoints are.
get_endpoints()
A character vector with the names of each endpoint.
Get a vector of fields that you can retrieve from a given API endpoint (i.e., the fields you can pass to the fields argument in search_pv). You can limit these fields to only cover certain entity group(s) as well (which is recommended, given the large number of possible fields for each endpoint).
get_fields(endpoint, groups = NULL)
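endpoint |
The API endpoint whose field list you want to get. See get_endpoints() for the available endpoints. |
groups |
A character vector giving the group(s) whose fields you want returned. A value of NULL (the default) means the field list is not filtered by group. |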
A character vector with field names.
# Get all top level (non-nested) fields for the patent endpoint:
fields <- get_fields(endpoint = "patent", groups = "patents")

# ...Then pass to search_pv:
## Not run:
search_pv(
  query = '{"_gte":{"patent_date":"2007-01-04"}}',
  fields = fields
)
## End(Not run)

# Get patent and assignee-level fields for the patent endpoint:
fields <- get_fields(endpoint = "patent", groups = c("assignees", "patents"))

## Not run:
# ...Then pass to search_pv:
search_pv(
  query = '{"_gte":{"patent_date":"2007-01-04"}}',
  fields = fields
)
## End(Not run)
This function suggests column(s) that you could use for the pk argument in unnest_pv_data, based on the endpoint you searched. It will return a potential primary key - either a single column or a composite set of columns - for the endpoint.
get_ok_pk(endpoint)
endpoint |
The endpoint which you would like to know a potential primary key for. |
The column names that represent a single row for the given endpoint.
get_ok_pk(endpoint = "inventor")
get_ok_pk(endpoint = "patent/foreign_citation")
Pad a patent_id with zeroes to 8 characters. This is needed only for custom paging that sorts by patent_id.
pad_patent_id(patent_id)
patent_id |
The patent_id to be padded. |
## Not run:
padded <- pad_patent_id("RE36479")
padded2 <- pad_patent_id("3930306")
## End(Not run)
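One plausible reason padding matters when sorting by patent_id is that IDs of different lengths compare character by character rather than numerically. The base R sketch below illustrates this with two numeric IDs. `pad_numeric_id` is a hypothetical stand-in written for this illustration; it only handles purely numeric IDs and is not the package's `pad_patent_id` implementation.

```r
# Hypothetical helper (not part of patentsview): left-pad purely numeric
# patent IDs with zeroes to a fixed width of 8 characters.
pad_numeric_id <- function(id, width = 8) {
  id <- as.character(id)
  paste0(strrep("0", pmax(width - nchar(id), 0)), id)
}

ids <- c("3930306", "10000000")

# Unpadded IDs are compared character by character, so "10000000" sorts first.
unpadded_order <- sort(ids, method = "radix")

# Padded IDs have equal width, so string order matches numeric order.
padded_order <- sort(pad_numeric_id(ids), method = "radix")

print(unpadded_order)
print(padded_order)
```

With equal-width keys, the last value seen on one page can serve as a stable cursor for the next page.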
A list of functions that make it easy to write PatentsView queries. See the details section below for a list of the 14 functions, as well as the writing queries vignette for further details.
qry_funs
An object of class list of length 14.
1. Comparison operator functions
There are 6 comparison operator functions that work with fields of type integer, float, date, or string:
- eq: Equal to
- neq: Not equal to
- gt: Greater than
- gte: Greater than or equal to
- lt: Less than
- lte: Less than or equal to
There are 2 comparison operator functions that only work with fields of type string:
- begins: The string begins with the value string
- contains: The string contains the value string
There are 3 comparison operator functions that only work with fields of type fulltext:
- text_all: The text contains all the words in the value string
- text_any: The text contains any of the words in the value string
- text_phrase: The text contains the exact phrase of the value string
2. Array functions
There are 2 array functions:
- and: All members of the array must be true
- or: At least one member of the array must be true
3. Negation function
There is 1 negation function:
- not: The comparison is not true
An object of class pv_query. This is basically just a simple list with a print method attached to it.
qry_funs$eq(patent_date = "2001-01-01")
qry_funs$not(qry_funs$eq(patent_date = "2001-01-01"))
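To make the link between these helpers and the JSON query strings used elsewhere in this manual more concrete, here is a base R sketch of how a single comparison can be held as a nested list and rendered as JSON. Both `make_cmp` and `to_json` are hypothetical names invented for this illustration; they are not the package's code, and `to_json` only handles one comparison with a string value.

```r
# Hypothetical constructor (not the package's code): nest a field/value pair
# under an operator key such as "_gte", mirroring the JSON query layout.
make_cmp <- function(op, field, value) {
  inner <- stats::setNames(list(value), field)
  stats::setNames(list(inner), paste0("_", op))
}

# Hypothetical renderer: turn one string-valued comparison into a JSON string.
to_json <- function(q) {
  op <- names(q)
  field <- names(q[[1]])
  value <- q[[1]][[1]]
  sprintf('{"%s":{"%s":"%s"}}', op, field, value)
}

q <- make_cmp("gte", "patent_date", "2007-01-04")
str(q)

json <- to_json(q)
print(json)
```

The resulting string has the same shape as the hand-written query passed to search_pv in the get_fields examples.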
Some of the endpoints now return HATEOAS-style links to get more data. For example, the patent endpoint may return a link such as "https://search.patentsview.org/api/v1/inventor/fl:th_ln:jefferson-1/". Use this function to fetch details from those links.
retrieve_linked_data(url, api_key = Sys.getenv("PATENTSVIEW_API_KEY"), ...)
url |
A link that was returned by the API on a previous call. |
api_key |
API key. Defaults to Sys.getenv("PATENTSVIEW_API_KEY"). Keys can be requested from PatentsView. |
... |
Curl options passed along to httr2's req_options(). |
## Not run:
retrieve_linked_data(
  "https://search.patentsview.org/api/v1/cpc_group/G01S7:4811/"
)
## End(Not run)
This makes an HTTP request to the PatentsView API for data matching the user's query.
search_pv(
  query,
  fields = NULL,
  endpoint = "patent",
  subent_cnts = lifecycle::deprecated(),
  mtchd_subent_only = lifecycle::deprecated(),
  page = lifecycle::deprecated(),
  per_page = lifecycle::deprecated(),
  size = 1000,
  after = NULL,
  all_pages = FALSE,
  sort = NULL,
  method = "GET",
  error_browser = lifecycle::deprecated(),
  api_key = Sys.getenv("PATENTSVIEW_API_KEY"),
  ...
)
query |
The query that the API will use to filter records. |
fields |
A character vector of the fields that you want returned to you. A value of NULL (the default) requests the endpoint's default fields.
Note: The primary key columns for a given endpoint will be appended to your list of fields within search_pv.
Note: If you specify all fields in a given group using their fully qualified names, the group name will be substituted in the HTTP request. This helps make HTTP requests shorter. This substitution will not happen when you specify all of the primary-entity fields. |
endpoint |
The web service resource you wish to search. Use get_endpoints() to list the available endpoints. |
subent_cnts |
Deprecated. |
mtchd_subent_only |
Deprecated. |
page |
Deprecated. |
per_page |
Deprecated. |
size |
The number of records that should be returned per page. This value can be as high as 1,000 (e.g., size = 1000). |
after |
A list of sort key values that defaults to NULL. This exposes the API's paging parameter for users who want to implement their own paging. It cannot be set when all_pages = TRUE. |
all_pages |
Do you want to download all possible pages of output? |
sort |
A named character vector where the name indicates the field to sort by and the value indicates the direction of sorting (direction should be either "asc" or "desc"). For example, sort = c("patent_id" = "asc"). |
method |
The HTTP method that you want to use to send the request. Possible values include "GET" or "POST". Use the POST method when your query is very long (say, over 2,000 characters in length). |
error_browser |
Deprecated. |
api_key |
API key. Defaults to Sys.getenv("PATENTSVIEW_API_KEY"). Keys can be requested from PatentsView. |
... |
Curl options passed along to httr2's req_options(). |
A list with the following three elements:
A list with one element - a named data frame containing the data returned by the server. Each row in the data frame corresponds to a single value for the primary entity, as defined by the endpoint's primary key. For example, if you search the assignee endpoint, then the data frame will be on the assignee-level, where each row corresponds to a single assignee (primary key would be assignee_id). Fields that are not on the assignee-level would be returned in list columns.
Entity counts across all pages of output (not just the page returned to you).
Details of the HTTP request that was sent to the server. When you set all_pages = TRUE, you will only get a sample request. In other words, you will not be given multiple requests for the multiple calls that were made to the server (one for each page of results).
## Not run:
search_pv(query = '{"_gt":{"patent_year":2010}}')

search_pv(
  query = qry_funs$gt(patent_year = 2010),
  fields = get_fields("patent", c("patents", "assignees"))
)

search_pv(
  query = qry_funs$gt(patent_year = 2010),
  method = "POST",
  fields = "patent_id",
  sort = c("patent_id" = "asc")
)

search_pv(
  query = qry_funs$eq(inventor_name_last = "Crew"),
  endpoint = "inventor",
  all_pages = TRUE
)

search_pv(
  query = qry_funs$contains(assignee_individual_name_last = "Smith"),
  endpoint = "assignee"
)

search_pv(
  query = qry_funs$contains(inventors.inventor_name_last = "Smith"),
  endpoint = "patent",
  timeout = 40
)

search_pv(
  query = qry_funs$eq(patent_id = "11530080"),
  fields = "application"
)
## End(Not run)
This function converts a single data frame that has subentity-level list columns in it into multiple data frames, one for each entity/subentity. The multiple data frames can be merged together using the primary key variable specified by the user (see the relational data chapter in "R for Data Science" for an in-depth introduction to joining tabular data).
unnest_pv_data(data, pk = lifecycle::deprecated())
data |
The data returned by search_pv, i.e., the data element of its result. |
pk |
Deprecated. |
A list with multiple data frames, one for each entity/subentity. Each data frame will have the pk column in it, so you can link the tables together as needed.
## Not run:
fields <- c(
  "patent_id", "patent_title",
  "inventors.inventor_city", "inventors.inventor_country"
)

res <- search_pv(query = '{"_gte":{"patent_year":2015}}', fields = fields)
unnest_pv_data(data = res$data)
## End(Not run)
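The idea of splitting one nested table into several flat tables that share a key can be sketched in base R without calling the API. Everything below is a toy illustration under stated assumptions: the data values are made up, the column layout is simplified, and the code is not the package's implementation of unnest_pv_data.

```r
# Toy stand-in for one page of results: one row per patent, with the
# inventor-level fields held in a list column (made-up values).
patents <- data.frame(
  patent_id = c("A1", "B2"),
  patent_title = c("Widget", "Gadget"),
  stringsAsFactors = FALSE
)
patents$inventors <- list(
  data.frame(inventor_city = c("Austin", "Boston"), stringsAsFactors = FALSE),
  data.frame(inventor_city = "Chicago", stringsAsFactors = FALSE)
)

# Split into two flat tables that both carry the primary key column.
patent_tbl <- patents[, c("patent_id", "patent_title")]
inventor_tbl <- do.call(rbind, unname(Map(
  function(id, inv) data.frame(patent_id = id, inv, stringsAsFactors = FALSE),
  patents$patent_id, patents$inventors
)))
rownames(inventor_tbl) <- NULL

# The flat tables can be joined back together on the key.
merged <- merge(patent_tbl, inventor_tbl, by = "patent_id")
print(merged)
```

Keeping the key in every table is what makes the later merge possible, which is the same property the returned list of data frames provides.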
This function evaluates whatever code you pass to it in the environment of the qry_funs list. This allows you to cut down on typing when writing your queries. If you want to cut down on typing even more, you can try assigning the qry_funs list into your global environment with: list2env(qry_funs, envir = globalenv()).
with_qfuns(code, envir = parent.frame())
code |
Code to evaluate. See example. |
envir |
Where should R look for objects present in code? |
The result of code - i.e., your query.
# Without with_qfuns, we have to do:
qry_funs$and(
  qry_funs$gte(patent_date = "2007-01-01"),
  qry_funs$text_phrase(patent_abstract = c("computer program")),
  qry_funs$or(
    qry_funs$eq(inventors.inventor_name_last = "Ihaka"),
    qry_funs$eq(inventors.inventor_name_last = "Chris")
  )
)

# ...With it, this becomes:
with_qfuns(
  and(
    gte(patent_date = "2007-01-01"),
    text_phrase(patent_abstract = c("computer program")),
    or(
      eq(inventors.inventor_name_last = "Ihaka"),
      eq(inventors.inventor_name_last = "Chris")
    )
  )
)
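The mechanism of evaluating code "in the environment of a list" can be sketched with base R's eval(), which accepts a list as the evaluation scope and a fallback environment for everything else. The names `toy_funs` and `with_toy_funs` are hypothetical and exist only for this illustration; the toy helpers return plain strings and are not the real qry_funs entries or the package's with_qfuns code.

```r
# Toy stand-ins for two query helpers (not the real qry_funs entries).
toy_funs <- list(
  eq = function(field, value) sprintf('%s == "%s"', field, value),
  and = function(...) paste(c(...), collapse = " AND ")
)

# Hypothetical analogue of with_qfuns(): capture `code` unevaluated, then
# evaluate it with the list's functions in scope, falling back to `envir`
# for any other objects the code refers to.
with_toy_funs <- function(code, envir = parent.frame()) {
  eval(substitute(code), toy_funs, envir)
}

last_name <- "Ihaka"
res <- with_toy_funs(and(eq("name", last_name), eq("year", "2007")))
print(res)
```

Here `and` and `eq` resolve to the list's members, while `last_name` is found in the calling environment, which mirrors the role of the envir argument.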