Package 'patentsview'

Title: An R Client to the 'PatentsView' API
Description: Provides functions to simplify the 'PatentsView' API (<https://patentsview.org/apis/purpose>) query language, send GET and POST requests to the API's twelve endpoints, and parse the data that comes back.
Authors: Christopher Baker [aut, cre]
Maintainer: Christopher Baker <[email protected]>
License: MIT + file LICENSE
Version: 0.3.0
Built: 2024-11-23 15:29:08 UTC
Source: https://github.com/mustberuss/patentsview

Help Index


Cast PatentsView data

Description

This will cast the data fields returned by search_pv so that they have their most appropriate data types (e.g., date or numeric).

Usage

cast_pv_data(data)

Arguments

data

The data returned by search_pv. This is the first element of the three-element result object you got back from search_pv. It should be a list of length 1, with one data frame inside it. See examples.

Value

The same type of object that you passed into cast_pv_data.

Examples

## Not run: 

fields <- c("patent_date", "patent_title", "patent_year")
res <- search_pv(query = "{\"patent_id\":\"5116621\"}", fields = fields)
cast_pv_data(data = res$data)

## End(Not run)
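As a rough illustration of what casting means here, the base R sketch below converts character columns in a toy result to Date and integer types. The toy data, the helper name cast_sketch, and the column-to-type mapping are assumptions made for illustration; this is not the package's internal logic.

```r
# Toy stand-in for res$data: a list of length 1 holding one data frame.
# The values are made up; every column starts out as character.
toy <- list(
  patents = data.frame(
    patent_date = "2001-01-01",
    patent_title = "An example title",
    patent_year = "2001",
    stringsAsFactors = FALSE
  )
)

# Hypothetical helper: cast two known columns and leave the rest untouched.
cast_sketch <- function(data) {
  df <- data[[1]]
  df$patent_date <- as.Date(df$patent_date)
  df$patent_year <- as.integer(df$patent_year)
  data[[1]] <- df
  data
}

casted <- cast_sketch(toy)
str(casted$patents)
```

The sketch returns the same kind of object it was given (a list of length 1 with one data frame inside), which mirrors the Value described above.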

Fields data frame

Description

A data frame containing the names of retrievable fields for each of the endpoints. You can find this data on the API's online documentation for each endpoint as well (e.g., the patent endpoint field list table).

Usage

fieldsdf

Format

A data frame with the following columns:

endpoint

The endpoint that this field record is for

field

The complete name of the field, including the parent group if applicable

data_type

The field's input data type

group

The group the field belongs to

common_name

The field name without the parent group structure
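
To show how a table with these columns can be queried, here is a small base R sketch. It uses a toy data frame with the same column names; the rows are illustrative assumptions and not a copy of the real fieldsdf.

```r
# Toy data frame with the same columns as fieldsdf (rows are illustrative).
toy_fieldsdf <- data.frame(
  endpoint = c("patent", "patent", "patent"),
  field = c("patent_id", "patent_date", "inventors.inventor_city"),
  data_type = c("string", "date", "string"),
  group = c("", "", "inventors"),
  common_name = c("patent_id", "patent_date", "inventor_city"),
  stringsAsFactors = FALSE
)

# Groups available for one endpoint ("" marks top level fields)
unique(toy_fieldsdf[toy_fieldsdf$endpoint == "patent", "group"])

# Fully qualified names of the nested inventors fields
inventor_fields <- toy_fieldsdf$field[toy_fieldsdf$group == "inventors"]
inventor_fields
```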


Get endpoints

Description

This function reminds the user what the possible PatentSearch API endpoints are. (Note that the API was originally known as the PatentsView API.)

Usage

get_endpoints()

Value

A character vector with the names of each endpoint.


Get list of retrievable fields

Description

This function returns a vector of fields that you can retrieve from a given API endpoint (i.e., the fields you can pass to the fields argument in search_pv). You can limit these fields to only cover certain entity group(s) as well (which is recommended, given the large number of possible fields for each endpoint).

Usage

get_fields(endpoint, groups = NULL, include_pk = FALSE)

Arguments

endpoint

The API endpoint whose field list you want to get. See get_endpoints for the list of available endpoints.

groups

A character vector giving the group(s) whose fields you want returned. A value of NULL indicates that you want all of the endpoint's fields (i.e., do not filter the field list based on group membership). See the Nested Fields listed online to see which groups you can specify for a given endpoint (e.g., the patents endpoint table), or use the fieldsdf table (e.g., unique(fieldsdf[fieldsdf$endpoint == "patent", "group"])). An empty string can also be specified to return all top level (non-nested) fields for the endpoint.

include_pk

Boolean indicating whether to include the endpoint's primary key. Defaults to FALSE. The primary key is needed if you plan on calling unnest_pv_data on the results of search_pv.

Value

A character vector with field names.

Examples

# Get all top level (non-nested) fields for the patent endpoint:
fields <- get_fields(endpoint = "patent", groups = c(""))

# ...Then pass to search_pv:
## Not run: 

search_pv(
  query = '{"_gte":{"patent_date":"2007-01-04"}}',
  fields = fields
)

## End(Not run)
# Get all patent and assignee-level fields for the patent endpoint:
fields <- get_fields(endpoint = "patent", groups = c("assignees", ""))

## Not run: 
# ...Then pass to search_pv:
search_pv(
  query = '{"_gte":{"patent_date":"2007-01-04"}}',
  fields = fields
)

## End(Not run)
# Get the nested inventors fields and the primary key in order to call unnest_pv_data
# on the returned data.  unnest_pv_data would throw an error if the primary key was
# not present in the results.
fields <- get_fields(endpoint = "patent", groups = c("inventors"), include_pk = TRUE)

## Not run: 
# ...Then pass to search_pv and unnest the results
results <- search_pv(
  query = '{"_gte":{"patent_date":"2007-01-04"}}',
  fields = fields
)
unnest_pv_data(results$data)

## End(Not run)

Get OK primary key

Description

This function suggests a value that you could use for the pk argument in unnest_pv_data, based on the endpoint you searched. It will return a potential unique identifier for a given entity (i.e., a given endpoint). For example, it will return "patent_id" when endpoint_or_entity = "patent". It would return the same value if the entity name "patents" was passed via get_ok_pk(names(pv_return$data)) where pv_return was returned from search_pv.

Usage

get_ok_pk(endpoint_or_entity)

Arguments

endpoint_or_entity

The endpoint or entity name for which you would like to know a potential primary key.

Value

The name of a primary key (pk) that you could pass to unnest_pv_data.

Examples

get_ok_pk(endpoint_or_entity = "inventor") # Returns "inventor_id"
get_ok_pk(endpoint_or_entity = "cpc_group") # Returns "cpc_group_id"

Pad patent_id

Description

This function strategically pads a patent_id with zeroes to 8 characters, needed for custom paging and possibly when querying by patent_id.

Usage

pad_patent_id(patent_id)

Arguments

patent_id

The patent_id that needs to be padded. It can be the patent_id for a utility, design, plant or reissue patent.

Examples

## Not run: 
  padded <- pad_patent_id("RE36479")

  padded2 <- pad_patent_id("3930306")

## End(Not run)
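The padding idea can be sketched in base R as follows. This is a hypothetical helper (pad_id_sketch), not the package's code, and it rests on an assumption: that the zeroes are inserted between any leading letters (such as "RE") and the digits that follow, so that the result is 8 characters wide.

```r
# Hypothetical helper, not the package's implementation. Assumes zeroes go
# between the leading letters (if any) and the digits.
pad_id_sketch <- function(patent_id, width = 8) {
  prefix <- sub("^([A-Za-z]*).*$", "\\1", patent_id)
  digits <- substring(patent_id, nchar(prefix) + 1)
  n_zeroes <- pmax(width - nchar(patent_id), 0)
  paste0(prefix, strrep("0", n_zeroes), digits)
}

pad_id_sketch("3930306") # "03930306"
pad_id_sketch("RE36479") # "RE036479" under the assumption stated above
```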

List of query functions

Description

A list of functions that make it easy to write PatentsView queries. See the details section below for a list of the 15 functions, as well as the writing queries vignette for further details.

Usage

qry_funs

Format

An object of class list of length 15.

Details

1. Comparison operator functions

There are 6 comparison operator functions that work with fields of type integer, float, date, or string:

  • eq - Equal to

  • neq - Not equal to

  • gt - Greater than

  • gte - Greater than or equal to

  • lt - Less than

  • lte - Less than or equal to

There are 2 comparison operator functions that only work with fields of type string:

  • begins - The string begins with the value string

  • contains - The string contains the value string

There are 3 comparison operator functions that only work with fields of type fulltext:

  • text_all - The text contains all the words in the value string

  • text_any - The text contains any of the words in the value string

  • text_phrase - The text contains the exact phrase of the value string

2. Array functions

There are 2 array functions:

  • and - All members of the array must be true

  • or - At least one member of the array must be true

3. Negation function

There is 1 negation function:

  • not - The comparison is not true

4. Convenience function

There is 1 convenience function:

  • in_range - Builds a <= x <= b query

Value

An object of class pv_query. This is basically just a simple list with a print method attached to it.

Examples

qry_funs$eq(patent_date = "2001-01-01")

qry_funs$not(qry_funs$eq(patent_date = "2001-01-01"))

qry_funs$in_range(patent_year = c(2010, 2021))

qry_funs$in_range(patent_date = c("1976-01-01", "1983-02-28"))
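
For readers who prefer the plain list form of a query, the sketch below builds an inclusive range (a <= x <= b) by hand in base R. The helper range_query_sketch is hypothetical, and the use of "_and", "_gte" and "_lte" as the underlying operator names is an assumption that mirrors the and, gte and lte functions listed above; it is not necessarily the exact structure that in_range produces.

```r
# Hypothetical helper, not part of the package: build the list form of an
# inclusive range query for a single field.
range_query_sketch <- function(field, lower, upper) {
  list(
    "_and" = list(
      list("_gte" = stats::setNames(list(lower), field)),
      list("_lte" = stats::setNames(list(upper), field))
    )
  )
}

q <- range_query_sketch("patent_year", 2010, 2021)
str(q)
```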

Retrieve Linked Data

Description

Some of the endpoints now return HATEOAS style links to get more data. E.g., the patent endpoint may return a link such as: "https://search.patentsview.org/api/v1/inventor/fl:th_ln:jefferson-1/". Pass such a link to this function to retrieve the data it points to.

Usage

retrieve_linked_data(
  url,
  encoded_url = FALSE,
  api_key = Sys.getenv("PATENTSVIEW_API_KEY"),
  ...
)

Arguments

url

A link that was returned by the API on a previous call, an example in the documentation or a Request URL from the API's Swagger UI page.

encoded_url

Boolean indicating whether the url has been URL encoded. Defaults to FALSE. Set to TRUE for Request URLs from Swagger UI.

api_key

API key. Defaults to Sys.getenv("PATENTSVIEW_API_KEY"). Keys can be requested from PatentsView.

...

Curl options passed along to httr2's req_options function.

Value

A list with the following three elements:

data

A list with one element - a named data frame containing the data returned by the server. Each row in the data frame corresponds to a single value for the primary entity. For example, if you search the assignee endpoint, then the data frame will be on the assignee-level, where each row corresponds to a single assignee. Fields that are not on the assignee-level would be returned in list columns.

query_results

Entity counts across all pages of output (not just the page returned to you).

request

Details of the GET HTTP request that was sent to the server.

Examples

## Not run: 

retrieve_linked_data(
  "https://search.patentsview.org/api/v1/cpc_group/G01S7:4811/"
)

endpoint_url <- "https://search.patentsview.org/api/v1/patent/"
q_param <- '?q={"_text_any":{"patent_title":"COBOL cotton gin"}}'
s_and_o_params <- '&s=[{"patent_id": "asc" }]&o={"size":50}'
f_param <- '&f=["inventors.inventor_name_last","patent_id","patent_date","patent_title"]'
# (URL broken up to avoid a long line warning in this Rd)

retrieve_linked_data(
  paste0(endpoint_url, q_param, s_and_o_params, f_param)
)

retrieve_linked_data(
  "https://search.patentsview.org/api/v1/patent/?q=%7B%22patent_date%22%3A%221976-01-06%22%7D",
  encoded_url = TRUE
)

## End(Not run)
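To make the encoded_url distinction concrete, the base R sketch below shows how the query portion of the last example relates to its unencoded form. It only uses utils::URLencode and utils::URLdecode and does not call the API.

```r
# The unencoded query that the encoded example URL above carries
q <- '{"patent_date":"1976-01-06"}'

# reserved = TRUE also encodes reserved characters such as ":"
encoded <- utils::URLencode(q, reserved = TRUE)
encoded

# Decoding recovers the original JSON string
decoded <- utils::URLdecode(encoded)
identical(decoded, q)
```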

Search PatentsView

Description

This function makes an HTTP request to the PatentsView API for data matching the user's query.

Usage

search_pv(
  query,
  fields = NULL,
  endpoint = "patent",
  subent_cnts = FALSE,
  mtchd_subent_only = lifecycle::deprecated(),
  page = lifecycle::deprecated(),
  per_page = lifecycle::deprecated(),
  size = 1000,
  after = NULL,
  all_pages = FALSE,
  sort = NULL,
  method = "GET",
  error_browser = NULL,
  api_key = Sys.getenv("PATENTSVIEW_API_KEY"),
  ...
)

Arguments

query

The query that the API will use to filter records. query can come in any one of the following forms:

  • A character string with valid JSON.
    E.g., '{"_gte":{"patent_date":"2007-01-04"}}'

  • A list which will be converted to JSON by search_pv.
    E.g., list("_gte" = list("patent_date" = "2007-01-04"))

  • An object of class pv_query, which you create by calling one of the functions found in the qry_funs list. See the writing queries vignette for details.
    E.g., qry_funs$gte(patent_date = "2007-01-04")

fields

A character vector of the fields that you want returned to you. A value of NULL indicates to the API that it should return the default fields for that endpoint. Acceptable fields for a given endpoint can be found at the API's online documentation (e.g., check out the field list for the patents endpoint) or by viewing the fieldsdf data frame (View(fieldsdf)). You can also use get_fields to list out the fields available for a given endpoint.

Nested fields can be fully qualified, e.g., "application.filing_date", or the group name can be used to retrieve all of its nested fields, e.g., "application". The latter would be similar to passing get_fields("patent", groups = "application"), except that it's the API that decides what fields to return.

endpoint

The web service resource you wish to search. Use get_endpoints() to list the available endpoints.

subent_cnts

[Deprecated] This is always FALSE in the new version of the API as the total counts of unique subentities is no longer available.

mtchd_subent_only

[Deprecated] This is always FALSE in the new version of the API as non-matched subentities will always be returned.

page

[Deprecated] The new version of the API does not use page as a parameter for paging; it uses after instead.

per_page

[Deprecated] The API now uses size.

size

The number of records that should be returned per page. This value can be as high as 1,000 (e.g., size = 1000).

after

A list of sort key values that defaults to NULL. This exposes the API's paging parameter for users who want to implement their own paging. It cannot be set when all_pages = TRUE, as the R package manipulates it for users automatically. See the API documentation on result set paging.

all_pages

Do you want to download all possible pages of output? If all_pages = TRUE, the value of size is ignored.

sort

A named character vector where the name indicates the field to sort by and the value indicates the direction of sorting (direction should be either "asc" or "desc"). For example, sort = c("patent_id" = "asc") or sort = c("patent_id" = "asc", "patent_date" = "desc"). sort = NULL (the default) means do not sort the results. You must include any fields that you wish to sort by in fields.

method

The HTTP method that you want to use to send the request. Possible values are "GET" and "POST". Use the POST method when your query is very long (say, over 2,000 characters in length).

error_browser

[Deprecated]

api_key

API key. Defaults to Sys.getenv("PATENTSVIEW_API_KEY"). Keys can be requested from PatentsView.

...

Curl options passed along to httr2's req_options when we do GETs or POSTs.
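
As an illustration of the sort argument described above, the base R sketch below turns a named character vector into a JSON-style sort string of the kind seen in the s parameter of the retrieve_linked_data example (s=[{"patent_id": "asc" }]). The helper sort_param_sketch is hypothetical, and the multi-field layout (one object per field) is an assumption rather than the package's confirmed serialization.

```r
# Hypothetical helper: one {"field":"direction"} object per entry.
sort_param_sketch <- function(sort) {
  stopifnot(all(sort %in% c("asc", "desc")))
  pairs <- sprintf('{"%s":"%s"}', names(sort), sort)
  paste0("[", paste(pairs, collapse = ","), "]")
}

s <- sort_param_sketch(c("patent_id" = "asc", "patent_date" = "desc"))
s
```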

Value

A list with the following three elements:

data

A list with one element - a named data frame containing the data returned by the server. Each row in the data frame corresponds to a single value for the primary entity. For example, if you search the assignee endpoint, then the data frame will be on the assignee-level, where each row corresponds to a single assignee. Fields that are not on the assignee-level would be returned in list columns.

query_results

Entity counts across all pages of output (not just the page returned to you).

request

Details of the HTTP request that was sent to the server. When you set all_pages = TRUE, you will only get a sample request. In other words, you will not be given multiple requests for the multiple calls that were made to the server (one for each page of results).

Examples

## Not run: 

search_pv(query = '{"_gt":{"patent_year":2010}}')

search_pv(
  query = qry_funs$gt(patent_year = 2010),
  fields = get_fields("patent", c("", "assignees"))
)

search_pv(
  query = qry_funs$gt(patent_year = 2010),
  method = "POST",
  fields = "patent_id",
  sort = c("patent_id" = "asc")
)

search_pv(
  query = qry_funs$eq(inventor_name_last = "Crew"),
  endpoint = "inventor",
  all_pages = TRUE
)

search_pv(
  query = qry_funs$contains(assignee_individual_name_last = "Smith"),
  endpoint = "assignee"
)

search_pv(
  query = qry_funs$contains(inventors.inventor_name_last = "Smith"),
  endpoint = "patent",
  timeout = 40
)

search_pv(
  query = qry_funs$eq(patent_id = "11530080"),
  fields = "application"
)

## End(Not run)

Unnest PatentsView data

Description

This function converts a single data frame that has subentity-level list columns in it into multiple data frames, one for each entity/subentity. The multiple data frames can be merged together using the primary key variable specified by the user (see the relational data chapter in "R for Data Science" for an in-depth introduction to joining tabular data).

Usage

unnest_pv_data(data, pk = get_ok_pk(names(data)))

Arguments

data

The data returned by search_pv. This is the first element of the three-element result object you got back from search_pv. It should be a list of length 1, with one data frame inside it. See examples.

pk

The column/field name that will link the data frames together. This should be the unique identifier for the primary entity. For example, if you used the patent endpoint in your call to search_pv, you could specify pk = "patent_id". This identifier has to have been included in your fields vector when you called search_pv. You can use get_ok_pk to suggest a potential primary key for your data.

Value

A list with multiple data frames, one for each entity/subentity. Each data frame will have the pk column in it, so you can link the tables together as needed.

Examples

## Not run: 

fields <- c("patent_id", "patent_title", "inventors.inventor_city", "inventors.inventor_country")
res <- search_pv(query = '{"_gte":{"patent_year":2015}}', fields = fields)
unnest_pv_data(data = res$data, pk = "patent_id")

## End(Not run)
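The idea of splitting a list column into its own table and joining it back on the primary key can be sketched in base R without calling the API. The toy data below is made up for illustration, and the code is not the package's implementation.

```r
# Toy stand-in for a patent-level data frame with one list column.
patents <- data.frame(
  patent_id = c("1", "2"),
  patent_title = c("Title A", "Title B"),
  stringsAsFactors = FALSE
)
patents$inventors <- list(
  data.frame(inventor_city = c("Austin", "Boston"), stringsAsFactors = FALSE),
  data.frame(inventor_city = "Chicago", stringsAsFactors = FALSE)
)

# One data frame per entity, each carrying the primary key along
inventors <- do.call(rbind, unname(Map(
  function(id, df) data.frame(patent_id = id, df, stringsAsFactors = FALSE),
  patents$patent_id, patents$inventors
)))
rownames(inventors) <- NULL
patent_level <- patents[, c("patent_id", "patent_title")]

# Join the two tables back together on the primary key
merged <- merge(patent_level, inventors, by = "patent_id")
merged
```

Because both tables keep patent_id, they can be merged in either direction, which is why the pk field has to be present in the results.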

With qry_funs

Description

This function evaluates whatever code you pass to it in the environment of the qry_funs list. This allows you to cut down on typing when writing your queries. If you want to cut down on typing even more, you can try assigning the qry_funs list into your global environment with: list2env(qry_funs, envir = globalenv()).

Usage

with_qfuns(code, envir = parent.frame())

Arguments

code

Code to evaluate. See example.

envir

Where R should look for objects present in code that aren't present in qry_funs.

Value

The result of code - i.e., your query.

Examples

# Without with_qfuns, every function has to be prefixed with qry_funs$:
qry_funs$and(
  qry_funs$gte(patent_date = "2007-01-01"),
  qry_funs$text_phrase(patent_abstract = c("computer program")),
  qry_funs$or(
    qry_funs$eq(inventors.inventor_name_last = "Ihaka"),
    qry_funs$eq(inventors.inventor_name_last = "Chris")
  )
)

# ...With it, this becomes:
with_qfuns(
  and(
    gte(patent_date = "2007-01-01"),
    text_phrase(patent_abstract = c("computer program")),
    or(
      eq(inventors.inventor_name_last = "Ihaka"),
      eq(inventors.inventor_name_last = "Chris")
    )
  )
)
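
The general technique of evaluating code with a list of functions in scope can be sketched in base R. The toy list toy_funs and the helper with_funs_sketch are hypothetical stand-ins, and this is one plausible way to do it rather than a description of how with_qfuns is written.

```r
# Toy list of functions standing in for qry_funs (simplified, hypothetical)
toy_funs <- list(
  eq = function(...) list("_eq" = list(...)),
  gte = function(...) list("_gte" = list(...))
)

# Hypothetical helper: evaluate the unevaluated code with the list's
# functions in scope, falling back to envir for everything else.
with_funs_sketch <- function(code, envir = parent.frame()) {
  eval(substitute(code), toy_funs, envir)
}

cutoff <- "2007-01-01"
q <- with_funs_sketch(gte(patent_date = cutoff))
str(q)
```

Here gte is found in the list, while cutoff is found in the calling environment, which is the role the envir argument plays.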