| Title: | Filter and Query Data Frames in 'shiny' Using an LLM Chat Interface |
|---|---|
| Description: | Adds an LLM-powered chatbot to your 'shiny' app, that can turn your users' natural language questions into 'SQL' queries that run against your data, and return the result as a reactive data frame. Use it to drive reactive calculations, visualizations, downloads, and more. |
| Authors: | Garrick Aden-Buie [aut, cre] (ORCID: <https://orcid.org/0000-0002-7111-0077>), Joe Cheng [aut, ccp], Carson Sievert [aut] (ORCID: <https://orcid.org/0000-0002-4958-2844>), Posit Software, PBC [cph, fnd] |
| Maintainer: | Garrick Aden-Buie <[email protected]> |
| License: | MIT + file LICENSE |
| Version: | 0.3.0.9000 |
| Built: | 2026-06-26 18:34:32 UTC |
| Source: | https://github.com/posit-dev/querychat |
A DataSource implementation that wraps a data frame using DuckDB or SQLite for SQL query execution.
This class creates an in-memory database connection and registers the provided data frame as a table. All SQL queries are executed against this database table. See DBISource for the full description of available methods.
By default, DataFrameSource uses the first available engine from duckdb
(checked first) or RSQLite. You can explicitly set the engine parameter to
choose between "duckdb" or "sqlite", or set the global option
querychat.DataFrameSource.engine to choose the default engine for all
DataFrameSource instances. At least one of these packages must be installed.
DataSource -> DBISource -> DataFrameSource
DataFrameSource$new()Create a new DataFrameSource
DataFrameSource$new(
df,
table_name,
engine = getOption("querychat.DataFrameSource.engine", NULL)
)
dfA data frame.
table_nameName to use for the table in SQL queries. Must be a valid table name (start with letter, contain only letters, numbers, and underscores)
engineDatabase engine to use: "duckdb" or "sqlite". Set the
global option querychat.DataFrameSource.engine to specify the default
engine for all instances. If NULL (default), uses the first available
engine from duckdb or RSQLite (in that order).
A new DataFrameSource object
DataFrameSource$cleanup()Disconnect from the database and shut down the DuckDB instance if used.
DataFrameSource$cleanup()
NULL (invisibly)
DataFrameSource$clone()The objects of this class are cloneable with this method.
DataFrameSource$clone(deep = FALSE)
deepWhether to make a deep clone.
# Create a data frame source (uses first available: duckdb or sqlite) df_source <- DataFrameSource$new(mtcars, "mtcars") # Get database type df_source$get_db_type() # Returns "DuckDB" or "SQLite" # Execute a query result <- df_source$execute_query("SELECT * FROM mtcars WHERE mpg > 25") # Explicitly choose an engine df_sqlite <- DataFrameSource$new(mtcars, "mtcars", engine = "sqlite") # Clean up when done df_source$cleanup() df_sqlite$cleanup()# Create a data frame source (uses first available: duckdb or sqlite) df_source <- DataFrameSource$new(mtcars, "mtcars") # Get database type df_source$get_db_type() # Returns "DuckDB" or "SQLite" # Execute a query result <- df_source$execute_query("SELECT * FROM mtcars WHERE mpg > 25") # Explicitly choose an engine df_sqlite <- DataFrameSource$new(mtcars, "mtcars", engine = "sqlite") # Clean up when done df_source$cleanup() df_sqlite$cleanup()
An abstract R6 class defining the interface that custom QueryChat data sources must implement. This class should not be instantiated directly; instead, use one of its concrete implementations like DataFrameSource or DBISource.
table_nameName of the table to be used in SQL queries
DataSource$get_db_type()Get the database type
DataSource$get_db_type()
A string describing the database type (e.g., "DuckDB", "SQLite")
DataSource$get_schema()Get schema information about the table
DataSource$get_schema(categorical_threshold = 20, table_spec = NULL)
categorical_thresholdMaximum number of unique values for a text column to be considered categorical
A string containing schema information formatted for LLM prompts
DataSource$get_schema_result()DataSource$get_schema_result(categorical_threshold = 20, table_spec = NULL)
DataSource$execute_query()Execute a SQL query and return results
DataSource$execute_query(query)
querySQL query string to execute
A data frame containing query results
DataSource$test_query()Test a SQL query by fetching only one row
DataSource$test_query(query, require_all_columns = FALSE)
querySQL query string to test
require_all_columnsIf TRUE, validates that the result includes
all original table columns (default: FALSE)
A data frame containing one row of results (or empty if no matches)
DataSource$get_data()Get the unfiltered data as a data frame
DataSource$get_data()
A data frame containing all data from the table
DataSource$get_data_description()Get a human-readable data description for the system prompt.
Subclasses may override this to provide metadata-derived descriptions (e.g., pin title/description). The default returns an empty string.
DataSource$get_data_description()
A string, or empty string if no description is available.
DataSource$cleanup()Clean up resources (close connections, etc.)
DataSource$cleanup()
NULL (invisibly)
DataSource$clone()The objects of this class are cloneable with this method.
DataSource$clone(deep = FALSE)
deepWhether to make a deep clone.
MyDataSource <- R6::R6Class( "MyDataSource", inherit = DataSource, public = list( initialize = function(table_name) { self$table_name <- table_name } # Implement abstract methods here... ) )MyDataSource <- R6::R6Class( "MyDataSource", inherit = DataSource, public = list( initialize = function(table_name) { self$table_name <- table_name } # Implement abstract methods here... ) )
A DataSource implementation for DBI database connections (SQLite, PostgreSQL, MySQL, etc.). This class wraps a DBI connection and provides SQL query execution against a single table in the database.
DataSource -> DBISource
DBISource$new()Create a new DBISource
DBISource$new(conn, table_name)
connA DBI connection object
table_nameName of the table in the database. Can be a character
string or a DBI::Id() object for tables in catalogs/schemas
A new DBISource object
DBISource$get_db_type()Get the database type
DBISource$get_db_type()
A string identifying the database type
DBISource$get_schema()Get schema information for the database table
DBISource$get_schema(categorical_threshold = 20, table_spec = NULL)
categorical_thresholdMaximum number of unique values for a text column to be considered categorical (default: 20)
A string describing the schema
DBISource$get_schema_result()DBISource$get_schema_result(categorical_threshold = 20, table_spec = NULL)
DBISource$get_semantic_views_description()Get information about semantic views (if any) for the system prompt.
DBISource$get_semantic_views_description()
A string with semantic view information, or empty string if none
DBISource$execute_query()Execute a SQL query
DBISource$execute_query(query)
querySQL query string. If NULL or empty, returns all data
A data frame with query results
DBISource$test_query()Test a SQL query by fetching only one row
DBISource$test_query(query, require_all_columns = FALSE)
querySQL query string
require_all_columnsIf TRUE, validates that the result includes
all original table columns (default: FALSE)
A data frame with one row of results
DBISource$get_data()Get all data from the table
DBISource$get_data()
A data frame containing all data
DBISource$cleanup()Disconnect from the database
DBISource$cleanup()
NULL (invisibly)
DBISource$clone()The objects of this class are cloneable with this method.
DBISource$clone(deep = FALSE)
deepWhether to make a deep clone.
# Connect to a database con <- DBI::dbConnect(RSQLite::SQLite(), ":memory:") DBI::dbWriteTable(con, "mtcars", mtcars) # Create a DBI source db_source <- DBISource$new(con, "mtcars") # Get database type db_source$get_db_type() # Returns "SQLite" # Execute a query result <- db_source$execute_query("SELECT * FROM mtcars WHERE mpg > 25") # Note: cleanup() will disconnect the connection # If you want to keep the connection open, don't call cleanup() db_source$cleanup()# Connect to a database con <- DBI::dbConnect(RSQLite::SQLite(), ":memory:") DBI::dbWriteTable(con, "mtcars", mtcars) # Create a DBI source db_source <- DBISource$new(con, "mtcars") # Get database type db_source$get_db_type() # Returns "SQLite" # Execute a query result <- db_source$execute_query("SELECT * FROM mtcars WHERE mpg > 25") # Note: cleanup() will disconnect the connection # If you want to keep the connection open, don't call cleanup() db_source$cleanup()
A DataSource implementation that reads data from a
pins board. When the "duckdb" engine is used
and the pin type is one DuckDB can read natively (parquet, CSV, JSON), the
data is loaded directly from the cached pin files into DuckDB without
deserializing into R. For other pin types (e.g. RDS), or when the "sqlite"
engine is used, the data is deserialized via pin_read() and must produce
a data frame (or tibble), which is then registered with the chosen engine
just like DataFrameSource.
When loaded into DuckDB, the connection's external file access is locked down so that LLM-generated SQL cannot reach the filesystem.
If the pin has a title, description, or tags, QueryChat uses them as
the default data_description, which you can override.
PinSource materializes the full dataset into DuckDB. For large parquet
pins where you want lazy query execution, read the pin files yourself and
pass a tbl_sql to querychat() instead:
paths <- pins::pin_download(board, "my_pin")
con <- DBI::dbConnect(duckdb::duckdb())
DBI::dbExecute(
con,
sprintf("CREATE VIEW my_pin AS SELECT * FROM read_parquet('%s')", paths[1])
)
qc <- querychat(dplyr::tbl(con, "my_pin"))
The pin files are still downloaded to a local cache — pin_download()
always fetches them. But rather than loading everything into memory, DuckDB
reads the parquet file lazily through dbplyr.
This approach skips the security lockdown that PinSource applies, so
LLM-generated SQL can access files on the local system.
DataSource -> DBISource -> PinSource
PinSource$new()Create a new PinSource
PinSource$new(
board,
name,
...,
table_name = name,
version = NULL,
engine = getOption("querychat.DataFrameSource.engine", NULL)
)
boardA pins board object (e.g. from pins::board_folder() or
pins::board_connect()).
nameName of the pin to read.
...Not used; included for extensibility.
table_nameName to use for the table in SQL queries. Defaults to the pin name.
versionPin version to read. If NULL (default), reads the
latest version.
engineDatabase engine to use: "duckdb" or "sqlite". Set the
global option querychat.DataFrameSource.engine to specify the default
engine. If NULL (default), uses the first available engine from
duckdb or RSQLite (in that order). Parquet, CSV, and JSON pins are read
most efficiently with the "duckdb" engine; with "sqlite" they are
deserialized via pin_read() instead.
A new PinSource object
PinSource$get_data_description()Get a human-readable description of the pin for use in the system prompt.
PinSource$get_data_description()
A string with the pin title, description, and tags, or an empty string if none are set.
PinSource$clone()The objects of this class are cloneable with this method.
PinSource$clone(deep = FALSE)
deepWhether to make a deep clone.
if (rlang::is_installed(c("pins", "duckdb"))) { # Create a temporary board and pin some data board <- pins::board_temp() pins::pin_write(board, mtcars, "mtcars", type = "parquet") # Create a PinSource ps <- PinSource$new(board, "mtcars") # Query the pinned data ps$execute_query("SELECT * FROM mtcars WHERE mpg > 25") ps$cleanup() }if (rlang::is_installed(c("pins", "duckdb"))) { # Create a temporary board and pin some data board <- pins::board_temp() pins::pin_write(board, mtcars, "mtcars", type = "parquet") # Create a PinSource ps <- PinSource$new(board, "mtcars") # Query the pinned data ps$execute_query("SELECT * FROM mtcars WHERE mpg > 25") ps$cleanup() }
Convenience functions for wrapping QueryChat creation (i.e., querychat())
and app launching (i.e., querychat_app()).
querychat( data_source, table_name = missing_arg(), ..., id = NULL, greeting = NULL, client = NULL, tools = c("filter", "query"), data_description = NULL, categorical_threshold = 20, extra_instructions = NULL, prompt_template = NULL, data_dict = NULL, cleanup = NA ) querychat_app( data_source, table_name = missing_arg(), ..., id = NULL, greeting = NULL, client = NULL, tools = c("filter", "query"), data_description = NULL, categorical_threshold = 20, extra_instructions = NULL, prompt_template = NULL, data_dict = NULL, cleanup = NA, bookmark_store = "url" )querychat( data_source, table_name = missing_arg(), ..., id = NULL, greeting = NULL, client = NULL, tools = c("filter", "query"), data_description = NULL, categorical_threshold = 20, extra_instructions = NULL, prompt_template = NULL, data_dict = NULL, cleanup = NA ) querychat_app( data_source, table_name = missing_arg(), ..., id = NULL, greeting = NULL, client = NULL, tools = c("filter", "query"), data_description = NULL, categorical_threshold = 20, extra_instructions = NULL, prompt_template = NULL, data_dict = NULL, cleanup = NA, bookmark_store = "url" )
data_source |
Either a data.frame or a database connection (e.g., DBI connection). |
table_name |
A string specifying the table name to use in SQL queries. |
... |
Additional arguments (currently unused). |
id |
Optional module ID for the QueryChat instance. |
greeting |
Optional initial message to display to users. |
client |
Optional chat client. |
tools |
Which querychat tools to include in the chat client. |
data_description |
Optional description of the data. |
categorical_threshold |
For text columns, the maximum number of unique values to consider as a categorical variable. Default is 20. |
extra_instructions |
Optional additional instructions for the chat model. |
prompt_template |
Optional path to or string of a custom prompt template. |
data_dict |
Optional data dictionary. A path to a YAML file or a list of paths. |
cleanup |
Whether or not to automatically run |
bookmark_store |
The bookmarking storage method. Default is |
A QueryChat object. See QueryChat for available methods.
Invisibly returns the chat object after the app stops.
# Quick start - chat with mtcars dataset in one line querychat_app(mtcars)# Quick start - chat with mtcars dataset in one line querychat_app(mtcars)
QueryChat is an R6 class built on Shiny, shinychat, and ellmer to enable
interactive querying of data using natural language. It leverages large
language models (LLMs) to translate user questions into SQL queries, execute
them against a data source (data frame or database), and various ways of
accessing/displaying the results.
The QueryChat class takes your data (a data frame or database connection)
as input and provides methods to:
Generate a chat UI for natural language queries (e.g., $app(),
$sidebar())
Initialize server logic that returns session-specific reactive values (via
$server())
Access reactive data, SQL queries, and titles through the returned server
values (use qc_vals$table("name") for multi-table access)
library(querychat)
# Create a QueryChat object
qc <- QueryChat$new(mtcars)
# Quick start: run a complete app
qc$app()
# Or build a custom Shiny app
ui <- page_sidebar(
qc$sidebar(),
verbatimTextOutput("sql"),
dataTableOutput("data")
)
server <- function(input, output, session) {
qc_vals <- qc$server()
output$sql <- renderText(qc_vals$sql())
output$data <- renderDataTable(qc_vals$df())
}
shinyApp(ui, server)
greetingThe greeting message displayed to users.
idID for the QueryChat instance.
id_overrideWhether the ID was explicitly set by the user.
toolsThe allowed tools for the chat client.
greeterThe QueryChatGreeter controlling greeting generation;
access its $tables and $prompt.
system_promptGet the system prompt.
data_sourceRemoved. Use $add_table() and $remove_table() to manage tables.
QueryChat$new()Create a new QueryChat object.
QueryChat$new(
data_source,
table_name = missing_arg(),
...,
id = NULL,
greeting = NULL,
client = NULL,
tools = c("filter", "query"),
data_description = NULL,
categorical_threshold = 20,
extra_instructions = NULL,
prompt_template = NULL,
data_dict = NULL,
cleanup = NA
)
data_sourceEither a data.frame, a database connection (e.g., DBI
connection), or NULL to defer setting the data source until later.
When NULL, the data source must be added via $add_table() or passed
to $server() before calling methods that require data access.
table_nameA string specifying the table name to use in SQL
queries. If data_source is a data.frame, this is the name to refer to
it by in queries (typically the variable name). If not provided, will
be inferred from the variable name for data.frame inputs. For database
connections or NULL data sources, this parameter is required.
...Additional arguments (currently unused).
idOptional module ID for the QueryChat instance. If not provided,
will be auto-generated from table_name. The ID is used to namespace
the Shiny module.
greetingOptional initial message to display to users. Can be a
character string (in Markdown format) or a file path. If not provided,
a greeting will be generated at the start of each conversation using
the LLM, which adds latency and cost. Use $generate_greeting() to
create a greeting to save and reuse.
clientOptional chat client. Can be:
An ellmer::Chat object
A string to pass to ellmer::chat() (e.g., "openai/gpt-4o")
NULL (default): Uses the querychat.client option, the
QUERYCHAT_CLIENT environment variable, or defaults to
ellmer::chat_openai()
toolsWhich querychat tools to include in the chat client, by
default. "filter" includes the tools for filtering and resetting the
dashboard and "query" includes the tool for executing SQL queries.
Use tools = "filter" when you only want the dashboard filtering tools,
or when you want to disable the querying tool entirely to prevent the
LLM from seeing any of the data in your dataset. The legacy name
"update" is still accepted as an alias for "filter".
data_descriptionOptional description of the data in plain text or Markdown. Can be a string or a file path. This provides context to the LLM about what the data represents.
categorical_thresholdFor text columns, the maximum number of unique values to consider as a categorical variable. Default is 20.
extra_instructionsOptional additional instructions for the chat model in plain text or Markdown. Can be a string or a file path.
prompt_templateOptional path to or string of a custom prompt template file. If not provided, the default querychat template will be used. See the package prompts directory for the default template format.
data_dictOptional data dictionary. A path to a YAML file, or a
list of YAML file paths. See read_data_dict() for the expected format.
cleanupWhether or not to automatically run $cleanup() when the
Shiny session/app stops. By default, cleanup only occurs if QueryChat
gets created within a Shiny session. Set to TRUE to always clean up,
or FALSE to never clean up automatically.
A new QueryChat object.
QueryChat$add_table()Add a table to this QueryChat instance.
QueryChat$add_table( data_source, table_name, replace = FALSE, include_in_greeting = FALSE )
data_sourceA data frame, database connection, or DataSource object.
table_nameThe SQL table name for this data source.
replaceWhether to replace an existing table with this name.
Default is FALSE.
include_in_greetingWhether to include this table in the greeting
context. Default is FALSE.
Invisibly returns self for chaining.
QueryChat$add_tables()Add multiple tables from a DBI connection in a single call.
Unlike calling $add_table() repeatedly, this method builds the
system prompt exactly once after all tables have been staged, avoiding
N-1 spurious intermediate rebuilds.
QueryChat$add_tables( conn, tables = NULL, replace = FALSE, include_in_greeting = FALSE )
connA DBI connection. Only DBI connections are supported; pass
individual data frames or other sources via $add_table().
tablesTable names to register. When NULL, all tables returned
by DBI::dbListTables(conn) are used.
replaceWhether to replace existing tables with the same name.
Default is FALSE.
include_in_greetingWhether to include added tables in the greeting
context. TRUE includes all tables; FALSE (default) includes none;
a character vector includes only those named tables (intersected with
the tables being added). Any other type raises an error.
Invisibly returns self for chaining.
QueryChat$remove_table()Remove a table from this QueryChat instance.
QueryChat$remove_table(table_name)
table_nameThe name of the table to remove.
Invisibly returns self for chaining.
QueryChat$table_names()Return the names of all registered tables.
QueryChat$table_names()
QueryChat$client()Create a chat client, complete with registered tools, for the current data source.
QueryChat$client(
tools = NA,
update_dashboard = function(query, title, table) {
},
reset_dashboard = function(table) {
},
visualize = function(data) {
},
session = NULL
)
toolsWhich querychat tools to include in the chat client.
"filter" includes the tools for filtering and resetting the dashboard
and "query" includes the tool for executing SQL queries. By default,
when tools = NA, the values provided at initialization are used.
The legacy name "update" is still accepted as an alias for "filter".
update_dashboardOptional function to call with the query,
title, and table generated by the LLM for the update_dashboard tool.
reset_dashboardOptional function to call when the
reset_dashboard tool is called. Takes a table argument.
visualizeOptional function to call with a list containing
ggsql, title, and widget_id when a visualization succeeds.
sessionA Shiny session object. Required when "visualize" is
in tools and you want interactive chart rendering. When NULL
(the default), visualizations still execute but are not rendered
as Shiny outputs.
QueryChat$console()Launch a console-based chat interface with the data source.
QueryChat$console(new = FALSE, ..., tools = "query")
newWhether to create a new chat client instance or continue the conversation from the last console chat session (the default).
...Additional arguments passed to the $client() method.
toolsWhich querychat tools to include in the chat client. See
$client() for details. Ignored when not creating a new chat client.
By default, only the "query" tool is included, regardless of the
tools set at initialization.
QueryChat$app()Create and run a Shiny gadget for chatting with data
QueryChat$app(..., bookmark_store = "url")
...Arguments passed to $app_obj().
bookmark_storeThe bookmarking storage method. Passed to
shiny::enableBookmarking(). If "url" or "server", the chat state
(including current query) will be bookmarked. Default is "url".
Invisibly returns a list of session-specific values.
QueryChat$app_obj()A streamlined Shiny app for chatting with data
QueryChat$app_obj(..., bookmark_store = "url")
...Additional arguments (currently unused).
bookmark_storeThe bookmarking storage method. Passed to
shiny::enableBookmarking(). Default is "url".
A Shiny app object that can be run with shiny::runApp().
QueryChat$sidebar()Create a sidebar containing the querychat UI.
QueryChat$sidebar( ..., width = 400, height = "100%", fillable = TRUE, id = NULL )
...Additional arguments passed to bslib::sidebar().
widthWidth of the sidebar in pixels. Default is 400.
heightHeight of the sidebar. Default is "100%".
fillableWhether the sidebar should be fillable. Default is
TRUE.
idOptional ID for the QueryChat instance.
A bslib::sidebar() UI component.
QueryChat$ui()Create the UI for the querychat chat interface.
QueryChat$ui(..., id = NULL)
...Additional arguments passed to shinychat::chat_ui().
idOptional ID for the QueryChat instance.
A UI component containing the chat interface.
QueryChat$server()Initialize the querychat server logic.
QueryChat$server( data_source = NULL, client = NULL, enable_bookmarking = FALSE, ..., id = NULL, session = shiny::getDefaultReactiveDomain() )
data_sourceOptional data source for backward compatibility.
If provided, calls $add_table() before initializing server logic.
clientOptional chat client override for this session.
enable_bookmarkingWhether to enable bookmarking. Default is FALSE.
...Ignored.
idOptional module ID override.
sessionThe Shiny session object.
A list containing session-specific reactive values and the chat
client. For single-table usage, includes df, sql, title directly.
For multi-table, use qc_vals$table("name") to get a TableAccessor
with per-table reactive state. Also includes table_names() to list tables.
current_table() returns the name of the most recently queried table,
or NULL before any query.
QueryChat$generate_greeting()Generate a welcome greeting for the chat.
QueryChat$generate_greeting(echo = c("none", "output"))
echoWhether to print the greeting to the console.
The greeting string in Markdown format.
QueryChat$cleanup()Clean up resources associated with the data source.
QueryChat$cleanup()
Invisibly returns NULL. Resources are cleaned up internally.
QueryChat$clone()The objects of this class are cloneable with this method.
QueryChat$clone(deep = FALSE)
deepWhether to make a deep clone.
# Basic usage with a data frame qc <- QueryChat$new(mtcars) ## Not run: app <- qc$app() ## End(Not run) # With a custom greeting greeting <- "Welcome! Ask me about the mtcars dataset." qc <- QueryChat$new(mtcars, greeting = greeting) # With a specific LLM provider qc <- QueryChat$new(mtcars, client = "anthropic/claude-sonnet-4-5") # Generate a greeting for reuse (requires internet/API access) ## Not run: qc <- QueryChat$new(mtcars) greeting <- qc$generate_greeting(echo = "text") # Save greeting for next time writeLines(greeting, "mtcars_greeting.md") ## End(Not run) # Or specify greeting and additional options at initialization qc <- QueryChat$new( mtcars, greeting = "Welcome to the mtcars explorer!", client = "openai/gpt-4o", data_description = "Motor Trend car road tests dataset" ) # Create a QueryChat object from a database connection # 1. Set up the database connection con <- DBI::dbConnect(RSQLite::SQLite(), ":memory:") # 2. (For this demo) Create a table in the database DBI::dbWriteTable(con, "mtcars", mtcars) # 3. Pass the connection and table name to `QueryChat` qc <- QueryChat$new(con, "mtcars")# Basic usage with a data frame qc <- QueryChat$new(mtcars) ## Not run: app <- qc$app() ## End(Not run) # With a custom greeting greeting <- "Welcome! Ask me about the mtcars dataset." qc <- QueryChat$new(mtcars, greeting = greeting) # With a specific LLM provider qc <- QueryChat$new(mtcars, client = "anthropic/claude-sonnet-4-5") # Generate a greeting for reuse (requires internet/API access) ## Not run: qc <- QueryChat$new(mtcars) greeting <- qc$generate_greeting(echo = "text") # Save greeting for next time writeLines(greeting, "mtcars_greeting.md") ## End(Not run) # Or specify greeting and additional options at initialization qc <- QueryChat$new( mtcars, greeting = "Welcome to the mtcars explorer!", client = "openai/gpt-4o", data_description = "Motor Trend car road tests dataset" ) # Create a QueryChat object from a database connection # 1. Set up the database connection con <- DBI::dbConnect(RSQLite::SQLite(), ":memory:") # 2. (For this demo) Create a table in the database DBI::dbWriteTable(con, "mtcars", mtcars) # 3. Pass the connection and table name to `QueryChat` qc <- QueryChat$new(con, "mtcars")
Loads a data dictionary from a YAML file conforming to the
data-dict spec. The dictionary is
returned as a plain list and can be passed directly to QueryChat via the
data_dict argument.
If name is absent from the YAML file, it defaults to the file stem.
read_data_dict(path)read_data_dict(path)
path |
Path to the YAML file. |
A named list with the structure of the YAML file.
A DataSource implementation for lazy SQL tibbles connected to databases via
dbplyr::tbl_sql() or dplyr::sql().
DataSource -> DBISource -> TblSqlSource
table_nameName of the table to be used in SQL queries
TblSqlSource$new()Create a new TblSqlSource
TblSqlSource$new(tbl, table_name = missing_arg())
tblA dbplyr::tbl_sql() (or SQL tibble via dplyr::tbl()).
table_nameName of the table in the database. Can be a character
string, or will be inferred from the tbl argument, if possible.
A new TblSqlSource object
TblSqlSource$get_db_type()Get the database type
TblSqlSource$get_db_type()
A string describing the database type (e.g., "DuckDB", "SQLite")
TblSqlSource$get_schema()Get schema information about the table
TblSqlSource$get_schema(categorical_threshold = 20, table_spec = NULL)
categorical_thresholdMaximum number of unique values for a text column to be considered categorical
A string containing schema information formatted for LLM prompts
TblSqlSource$get_schema_result()TblSqlSource$get_schema_result(categorical_threshold = 20, table_spec = NULL)
TblSqlSource$execute_query()Execute a SQL query and return results
TblSqlSource$execute_query(query)
querySQL query string to execute
A data frame containing query results
TblSqlSource$test_query()Test a SQL query by fetching only one row
TblSqlSource$test_query(query, require_all_columns = FALSE)
querySQL query string to test
require_all_columnsIf TRUE, validates that the result includes
all original table columns (default: FALSE)
A data frame containing one row of results (or empty if no matches)
TblSqlSource$prep_query()Prepare a generic SELECT * FROM ____ query to work with the SQL tibble
TblSqlSource$prep_query(query)
querySQL query as a string
A complete SQL query string
TblSqlSource$get_data()Get the unfiltered data as a SQL tibble
TblSqlSource$get_data()
A dbplyr::tbl_sql() containing the original, unfiltered data
TblSqlSource$cleanup()Clean up resources (close connections, etc.)
TblSqlSource$cleanup()
NULL (invisibly)
TblSqlSource$clone()The objects of this class are cloneable with this method.
TblSqlSource$clone(deep = FALSE)
deepWhether to make a deep clone.
con <- DBI::dbConnect(duckdb::duckdb()) DBI::dbWriteTable(con, "mtcars", mtcars) mtcars_source <- TblSqlSource$new(dplyr::tbl(con, "mtcars")) mtcars_source$get_db_type() # "DuckDB" result <- mtcars_source$execute_query("SELECT * FROM mtcars WHERE cyl > 4") # Note, the result is not the *full* data frame, but a lazy SQL tibble result # You can chain this result into a dplyr pipeline dplyr::count(result, cyl, gear) # Or collect the entire data frame into local memory dplyr::collect(result) # Finally, clean up when done with the database (closes the DB connection) mtcars_source$cleanup()con <- DBI::dbConnect(duckdb::duckdb()) DBI::dbWriteTable(con, "mtcars", mtcars) mtcars_source <- TblSqlSource$new(dplyr::tbl(con, "mtcars")) mtcars_source$get_db_type() # "DuckDB" result <- mtcars_source$execute_query("SELECT * FROM mtcars WHERE cyl > 4") # Note, the result is not the *full* data frame, but a lazy SQL tibble result # You can chain this result into a dplyr pipeline dplyr::count(result, cyl, gear) # Or collect the entire data frame into local memory dplyr::collect(result) # Finally, clean up when done with the database (closes the DB connection) mtcars_source$cleanup()