use std::path::PathBuf;

use tree_sitter::Node;
feat: expand the set of commands that can be safely identified as "trusted" (#1668)
This PR updates `is_known_safe_command()` to account for "safe
operators" to expand the set of commands that can be run without
approval. This concept existed in the TypeScript CLI, and we are
[finally!] porting it to the Rust one:
https://github.com/openai/codex/blob/c9e2def49487585cfe6f8bb7b2be442e8c0b5e1b/codex-cli/src/approvals.ts#L531-L541
The idea is that if we have `EXPR1 SAFE_OP EXPR2` and `EXPR1` and
`EXPR2` are considered safe independently, then `EXPR1 SAFE_OP EXPR2`
should be considered safe. Currently, `SAFE_OP` includes `&&`, `||`,
`;`, and `|`.
In the TypeScript implementation, we relied on
https://www.npmjs.com/package/shell-quote to parse the string of Bash,
as it could provide a "lightweight" parse tree, parsing `'beep || boop >
/byte'` as:
```
[ 'beep', { op: '||' }, 'boop', { op: '>' }, '/byte' ]
```
Though in this PR, we introduce the use of
https://crates.io/crates/tree-sitter-bash for parsing (which
incidentally we were already using in
[`codex-apply-patch`](https://github.com/openai/codex/blob/c9e2def49487585cfe6f8bb7b2be442e8c0b5e1b/codex-rs/apply-patch/Cargo.toml#L18)),
which gives us a richer parse tree. (Incidentally, if you have never
played with tree-sitter, try the
[playground](https://tree-sitter.github.io/tree-sitter/7-playground.html)
and select **Bash** from the dropdown to see how it parses various
expressions.)
As a concrete example, prior to this change, our implementation of
`is_known_safe_command()` could verify things like:
```
["bash", "-lc", "grep -R \"Cargo.toml\" -n"]
```
but not:
```
["bash", "-lc", "grep -R \"Cargo.toml\" -n || true"]
```
With this change, the version with `|| true` is also accepted.
Admittedly, this PR does not expand the safety check to support
subshells, so it would reject, e.g. `bash -lc 'ls || (pwd && echo hi)'`,
but that can be addressed in a subsequent PR.
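
To make the `EXPR1 SAFE_OP EXPR2` rule concrete, here is a minimal, std-only sketch. It is not the implementation in this PR, which walks a tree-sitter-bash parse tree. The names `SAFE_PROGRAMS`, `split_on_safe_operators`, and `is_safe_script` are hypothetical, and the allowlist is invented for illustration. The sketch is deliberately more conservative than the real check: it rejects quotes outright, along with parentheses, redirects, and substitutions.

```rust
/// Hypothetical allowlist, used only for this illustration.
const SAFE_PROGRAMS: &[&str] = &["ls", "pwd", "grep", "true", "echo", "cat"];

/// Splits `script` on the safe operators `&&`, `||`, `;`, and `|`.
/// Returns `None` for anything this naive scanner cannot reason about:
/// quotes, parentheses, braces, redirects, `$`, backticks, backslashes,
/// a lone `&`, or an empty command between operators.
fn split_on_safe_operators(script: &str) -> Option<Vec<Vec<String>>> {
    let mut commands = Vec::new();
    let mut current = String::new();
    let mut chars = script.chars().peekable();
    while let Some(c) = chars.next() {
        match c {
            '(' | ')' | '{' | '}' | '<' | '>' | '$' | '`' | '"' | '\'' | '\\' => return None,
            '&' => {
                // Only `&&` is a safe operator; a lone `&` backgrounds a job.
                if chars.next() != Some('&') {
                    return None;
                }
                commands.push(std::mem::take(&mut current));
            }
            '|' => {
                // Both `|` and `||` are safe operators.
                if chars.peek() == Some(&'|') {
                    chars.next();
                }
                commands.push(std::mem::take(&mut current));
            }
            ';' => commands.push(std::mem::take(&mut current)),
            _ => current.push(c),
        }
    }
    commands.push(current);
    commands
        .into_iter()
        .map(|cmd| {
            let words: Vec<String> = cmd.split_whitespace().map(str::to_owned).collect();
            if words.is_empty() { None } else { Some(words) }
        })
        .collect()
}

/// A script is considered safe only if every command joined by a safe
/// operator is independently safe (here: its program is allowlisted).
fn is_safe_script(script: &str) -> bool {
    match split_on_safe_operators(script) {
        Some(commands) => commands
            .iter()
            .all(|words| SAFE_PROGRAMS.contains(&words[0].as_str())),
        None => false,
    }
}
```

Under these assumptions, `is_safe_script("grep -R Cargo.toml -n || true")` is accepted, while `is_safe_script("ls || (pwd && echo hi)")` is rejected because of the subshell, matching the limitation described above.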
2025-07-24 14:13:30 -07:00
use tree_sitter::Parser;
use tree_sitter::Tree;
use tree_sitter_bash::LANGUAGE as BASH;

use crate::shell_detect::ShellType;
use crate::shell_detect::detect_shell_type;

/// Parse the provided bash source using tree-sitter-bash, returning a Tree on
/// success or None if parsing failed.
pub fn try_parse_shell(shell_lc_arg: &str) -> Option<Tree> {
    let lang = BASH.into();
    let mut parser = Parser::new();
    #[expect(clippy::expect_used)]
    parser.set_language(&lang).expect("load bash grammar");
    let old_tree: Option<&Tree> = None;
    parser.parse(shell_lc_arg, old_tree)
}

/// Parse a script which may contain multiple simple commands joined only by
/// the safe logical/pipe/sequencing operators: `&&`, `||`, `;`, `|`.
///
/// Returns `Some(Vec<command_words>)` if every command is a plain word-only
/// command and the parse tree does not contain disallowed constructs
/// (parentheses, redirections, substitutions, control flow, etc.). Otherwise
/// returns `None`.
pub fn try_parse_word_only_commands_sequence(tree: &Tree, src: &str) -> Option<Vec<Vec<String>>> {
    if tree.root_node().has_error() {
        return None;
    }

    // List of allowed (named) node kinds for a "word only commands sequence".
    // If we encounter a named node that is not in this list we reject.
    const ALLOWED_KINDS: &[&str] = &[
        // top level containers
        "program",
        "list",
        "pipeline",
        // commands & words
        "command",
        "command_name",
        "word",
        "string",
        "string_content",
        "raw_string",
        "number",
        "concatenation",
    ];
    // Allow only safe punctuation / operator tokens; anything else causes reject.
    const ALLOWED_PUNCT_TOKENS: &[&str] = &["&&", "||", ";", "|", "\"", "'"];

    let root = tree.root_node();
    let mut cursor = root.walk();
    let mut stack = vec![root];
    let mut command_nodes = Vec::new();
    while let Some(node) = stack.pop() {
        let kind = node.kind();
        if node.is_named() {
            if !ALLOWED_KINDS.contains(&kind) {
                return None;
            }
            if kind == "command" {
                command_nodes.push(node);
            }
        } else {
            // Reject any punctuation / operator tokens that are not explicitly allowed.
            if kind.chars().any(|c| "&;|".contains(c)) && !ALLOWED_PUNCT_TOKENS.contains(&kind) {
                return None;
            }
            if !(ALLOWED_PUNCT_TOKENS.contains(&kind) || kind.trim().is_empty()) {
                // If it's a quote token or operator it's allowed above; we also allow whitespace tokens.
                // Any other punctuation like parentheses, braces, redirects, backticks, etc are rejected.
                return None;
            }
        }
        for child in node.children(&mut cursor) {
            stack.push(child);
        }
    }

    // Walk uses a stack (LIFO), so re-sort by position to restore source order.
    command_nodes.sort_by_key(Node::start_byte);

    let mut commands = Vec::new();
    for node in command_nodes {
        if let Some(words) = parse_plain_command_from_node(node, src) {
            commands.push(words);
        } else {
            return None;
        }
    }
    Some(commands)
}

/// Returns `(shell, script)` when `command` has the exact shape
/// `[shell, flag, script]`, `flag` is `-lc` or `-c`, and `shell` is detected
/// as Bash, Zsh, or sh. Otherwise returns `None`.
pub fn extract_bash_command(command: &[String]) -> Option<(&str, &str)> {
    let [shell, flag, script] = command else {
        return None;
    };
    if !matches!(flag.as_str(), "-lc" | "-c")
        || !matches!(
            detect_shell_type(&PathBuf::from(shell)),
            Some(ShellType::Zsh) | Some(ShellType::Bash) | Some(ShellType::Sh)
        )
    {
        return None;
    }
    Some((shell, script))
}

/// Returns the sequence of plain commands within a `bash -lc "..."`,
/// `zsh -lc "..."`, or equivalent `-c` invocation of a supported shell (see
/// `extract_bash_command`) when the script only contains word-only commands
/// joined by safe operators.
pub fn parse_shell_lc_plain_commands(command: &[String]) -> Option<Vec<Vec<String>>> {
    let (_, script) = extract_bash_command(command)?;

    let tree = try_parse_shell(script)?;
    try_parse_word_only_commands_sequence(&tree, script)
}

/// Returns the parsed argv for a single shell command in a here-doc style
/// script (`<<`), as long as the script contains exactly one command node.
pub fn parse_shell_lc_single_command_prefix(command: &[String]) -> Option<Vec<String>> {
    let (_, script) = extract_bash_command(command)?;
    let tree = try_parse_shell(script)?;
    let root = tree.root_node();
    if root.has_error() {
        return None;
    }
    if !has_named_descendant_kind(root, "heredoc_redirect") {
        return None;
    }

    let command_node = find_single_command_node(root)?;
    parse_heredoc_command_words(command_node, script)
}

fn parse_plain_command_from_node(cmd: tree_sitter::Node, src: &str) -> Option<Vec<String>> {
    if cmd.kind() != "command" {
        return None;
    }
    let mut words = Vec::new();
    let mut cursor = cmd.walk();
    for child in cmd.named_children(&mut cursor) {
        match child.kind() {
            "command_name" => {
                let word_node = child.named_child(0)?;
                if word_node.kind() != "word" {
                    return None;
                }
                words.push(word_node.utf8_text(src.as_bytes()).ok()?.to_owned());
            }
            "word" | "number" => {
                words.push(child.utf8_text(src.as_bytes()).ok()?.to_owned());
            }
            "string" => {
                let parsed = parse_double_quoted_string(child, src)?;
                words.push(parsed);
            }
            "raw_string" => {
                let parsed = parse_raw_string(child, src)?;
                words.push(parsed);
            }
            "concatenation" => {
                // Handle concatenated arguments like -g"*.py"
                let mut concatenated = String::new();
                let mut concat_cursor = child.walk();
                for part in child.named_children(&mut concat_cursor) {
                    match part.kind() {
                        "word" | "number" => {
                            // `utf8_text` already yields a `&str`, so no intermediate `String` is needed.
                            concatenated.push_str(part.utf8_text(src.as_bytes()).ok()?);
                        }
                        "string" => {
                            let parsed = parse_double_quoted_string(part, src)?;
                            concatenated.push_str(&parsed);
                        }
                        "raw_string" => {
                            let parsed = parse_raw_string(part, src)?;
                            concatenated.push_str(&parsed);
                        }
                        _ => return None,
                    }
                }
                if concatenated.is_empty() {
                    return None;
                }
                words.push(concatenated);
            }
            _ => return None,
        }
    }
    Some(words)
}

fn parse_heredoc_command_words(cmd: Node<'_>, src: &str) -> Option<Vec<String>> {
    if cmd.kind() != "command" {
        return None;
    }

    let mut words = Vec::new();
    let mut cursor = cmd.walk();
    for child in cmd.named_children(&mut cursor) {
        match child.kind() {
            "command_name" => {
                let word_node = child.named_child(0)?;
                if !matches!(word_node.kind(), "word" | "number")
                    || !is_literal_word_or_number(word_node)
                {
                    return None;
                }
                words.push(word_node.utf8_text(src.as_bytes()).ok()?.to_owned());
            }
            "word" | "number" => {
                if !is_literal_word_or_number(child) {
                    return None;
                }
                words.push(child.utf8_text(src.as_bytes()).ok()?.to_owned());
            }
            // Allow shell constructs that attach IO to a single command without
            // changing argv matching semantics for the executable prefix.
            "variable_assignment" | "comment" => {}
            kind if is_allowed_heredoc_attachment_kind(kind) => {}
            _ => return None,
        }
    }

    if words.is_empty() { None } else { Some(words) }
}

fn is_literal_word_or_number(node: Node<'_>) -> bool {
    if !matches!(node.kind(), "word" | "number") {
        return false;
    }
    let mut cursor = node.walk();
    node.named_children(&mut cursor).next().is_none()
}

fn is_allowed_heredoc_attachment_kind(kind: &str) -> bool {
    matches!(
        kind,
        "heredoc_body"
            | "simple_heredoc_body"
            | "heredoc_redirect"
            | "herestring_redirect"
            | "file_redirect"
            | "redirected_statement"
    )
}

fn find_single_command_node(root: Node<'_>) -> Option<Node<'_>> {
    let mut stack = vec![root];
    let mut single_command = None;
    while let Some(node) = stack.pop() {
        if node.kind() == "command" {
            if single_command.is_some() {
                return None;
            }
            single_command = Some(node);
        }

        let mut cursor = node.walk();
        for child in node.named_children(&mut cursor) {
            stack.push(child);
        }
    }
    single_command
}

fn has_named_descendant_kind(node: Node<'_>, kind: &str) -> bool {
    let mut stack = vec![node];
    while let Some(current) = stack.pop() {
        if current.kind() == kind {
            return true;
        }
        let mut cursor = current.walk();
        for child in current.named_children(&mut cursor) {
            stack.push(child);
        }
    }
    false
}

fn parse_double_quoted_string(node: Node, src: &str) -> Option<String> {
    if node.kind() != "string" {
        return None;
    }

    let mut cursor = node.walk();
    for part in node.named_children(&mut cursor) {
        if part.kind() != "string_content" {
            return None;
        }
    }
    let raw = node.utf8_text(src.as_bytes()).ok()?;
    let stripped = raw
        .strip_prefix('"')
        .and_then(|text| text.strip_suffix('"'))?;
    Some(stripped.to_string())
}

fn parse_raw_string(node: Node, src: &str) -> Option<String> {
    if node.kind() != "raw_string" {
        return None;
    }

    let raw_string = node.utf8_text(src.as_bytes()).ok()?;
    let stripped = raw_string
        .strip_prefix('\'')
        .and_then(|s| s.strip_suffix('\''));
    stripped.map(str::to_owned)
}

#[cfg(test)]
mod tests {
    use super::*;
    use pretty_assertions::assert_eq;

    fn parse_seq(src: &str) -> Option<Vec<Vec<String>>> {
        let tree = try_parse_shell(src)?;
        try_parse_word_only_commands_sequence(&tree, src)
    }

    #[test]
    fn accepts_single_simple_command() {
        let cmds = parse_seq("ls -1").unwrap();
        assert_eq!(cmds, vec![vec!["ls".to_string(), "-1".to_string()]]);
    }

    #[test]
    fn accepts_multiple_commands_with_allowed_operators() {
        let src = "ls && pwd; echo 'hi there' | wc -l";
        let cmds = parse_seq(src).unwrap();
        let expected: Vec<Vec<String>> = vec![
            vec!["ls".to_string()],
            vec!["pwd".to_string()],
            vec!["echo".to_string(), "hi there".to_string()],
            vec!["wc".to_string(), "-l".to_string()],
        ];
        assert_eq!(cmds, expected);
    }

    #[test]
    fn extracts_double_and_single_quoted_strings() {
        let cmds = parse_seq("echo \"hello world\"").unwrap();
        assert_eq!(
            cmds,
            vec![vec!["echo".to_string(), "hello world".to_string()]]
        );

        let cmds2 = parse_seq("echo 'hi there'").unwrap();
        assert_eq!(
            cmds2,
            vec![vec!["echo".to_string(), "hi there".to_string()]]
        );
    }

    #[test]
    fn accepts_double_quoted_strings_with_newlines() {
        let cmds = parse_seq("git commit -m \"line1\nline2\"").unwrap();
        assert_eq!(
            cmds,
            vec![vec![
                "git".to_string(),
                "commit".to_string(),
                "-m".to_string(),
                "line1\nline2".to_string(),
            ]]
        );
    }

    #[test]
    fn accepts_mixed_quote_concatenation() {
        assert_eq!(
            parse_seq(r#"echo "/usr"'/'"local"/bin"#).unwrap(),
            vec![vec!["echo".to_string(), "/usr/local/bin".to_string()]]
        );
        assert_eq!(
            parse_seq(r#"echo '/usr'"/"'local'/bin"#).unwrap(),
            vec![vec!["echo".to_string(), "/usr/local/bin".to_string()]]
        );
    }

    #[test]
    fn rejects_double_quoted_strings_with_expansions() {
        assert!(parse_seq(r#"echo "hi ${USER}""#).is_none());
        assert!(parse_seq(r#"echo "$HOME""#).is_none());
    }

    #[test]
    fn accepts_numbers_as_words() {
        let cmds = parse_seq("echo 123 456").unwrap();
        assert_eq!(
            cmds,
            vec![vec![
                "echo".to_string(),
                "123".to_string(),
                "456".to_string()
            ]]
        );
    }

    #[test]
    fn rejects_parentheses_and_subshells() {
        assert!(parse_seq("(ls)").is_none());
        assert!(parse_seq("ls || (pwd && echo hi)").is_none());
    }

    #[test]
    fn rejects_redirections_and_unsupported_operators() {
        assert!(parse_seq("ls > out.txt").is_none());
        assert!(parse_seq("echo hi & echo bye").is_none());
    }

    #[test]
    fn rejects_command_and_process_substitutions_and_expansions() {
        assert!(parse_seq("echo $(pwd)").is_none());
        assert!(parse_seq("echo `pwd`").is_none());
        assert!(parse_seq("echo $HOME").is_none());
        assert!(parse_seq("echo \"hi $USER\"").is_none());
    }

    #[test]
    fn rejects_variable_assignment_prefix() {
        assert!(parse_seq("FOO=bar ls").is_none());
    }

    #[test]
    fn rejects_trailing_operator_parse_error() {
        assert!(parse_seq("ls &&").is_none());
    }

    #[test]
    fn rejects_empty_command_position_with_leading_operator() {
        assert!(parse_seq("&& ls").is_none());
    }

    #[test]
    fn rejects_empty_command_position_with_double_separator() {
        assert!(parse_seq("ls ;; pwd").is_none());
    }

    #[test]
    fn rejects_empty_command_position_with_empty_pipeline_segment() {
        assert!(parse_seq("ls | | wc").is_none());
    }

    #[test]
    fn parse_zsh_lc_plain_commands() {
        let command = vec!["zsh".to_string(), "-lc".to_string(), "ls".to_string()];
        let parsed = parse_shell_lc_plain_commands(&command).unwrap();
        assert_eq!(parsed, vec![vec!["ls".to_string()]]);
    }

    #[test]
    fn accepts_concatenated_flag_and_value() {
        // Test case: -g"*.py" (flag directly concatenated with quoted value)
        let cmds = parse_seq("rg -n \"foo\" -g\"*.py\"").unwrap();
        assert_eq!(
            cmds,
            vec![vec![
                "rg".to_string(),
                "-n".to_string(),
                "foo".to_string(),
                "-g*.py".to_string(),
            ]]
        );
    }

    #[test]
    fn accepts_concatenated_flag_with_single_quotes() {
        let cmds = parse_seq("grep -n 'pattern' -g'*.txt'").unwrap();
        assert_eq!(
            cmds,
            vec![vec![
                "grep".to_string(),
                "-n".to_string(),
                "pattern".to_string(),
                "-g*.txt".to_string(),
            ]]
        );
    }

    #[test]
    fn rejects_concatenation_with_variable_substitution() {
        // Environment variables in concatenated strings should be rejected
        assert!(parse_seq("rg -g\"$VAR\" pattern").is_none());
        assert!(parse_seq("rg -g\"${VAR}\" pattern").is_none());
    }

    #[test]
    fn rejects_concatenation_with_command_substitution() {
        // Command substitution in concatenated strings should be rejected
        assert!(parse_seq("rg -g\"$(pwd)\" pattern").is_none());
        assert!(parse_seq("rg -g\"$(echo '*.py')\" pattern").is_none());
    }

    #[test]
    fn parse_shell_lc_single_command_prefix_supports_heredoc() {
        let command = vec![
            "zsh".to_string(),
            "-lc".to_string(),
            "python3 <<'PY'\nprint('hello')\nPY".to_string(),
        ];
        let parsed = parse_shell_lc_single_command_prefix(&command);
        assert_eq!(parsed, Some(vec!["python3".to_string()]));

        let command_unquoted = vec![
            "zsh".to_string(),
            "-lc".to_string(),
            "python3 << PY\nprint('hello')\nPY".to_string(),
        ];
        let parsed_unquoted = parse_shell_lc_single_command_prefix(&command_unquoted);
        assert_eq!(parsed_unquoted, Some(vec!["python3".to_string()]));
    }

    #[test]
    fn parse_shell_lc_single_command_prefix_rejects_multi_command_scripts() {
        let command = vec![
            "bash".to_string(),
            "-lc".to_string(),
            "python3 <<'PY'\nprint('hello')\nPY\necho done".to_string(),
        ];
        assert_eq!(parse_shell_lc_single_command_prefix(&command), None);
    }

    #[test]
    fn parse_shell_lc_single_command_prefix_rejects_non_heredoc_redirects() {
        let command = vec![
            "bash".to_string(),
            "-lc".to_string(),
            "echo hello > /tmp/out.txt".to_string(),
        ];
        assert_eq!(parse_shell_lc_single_command_prefix(&command), None);
    }

    #[test]
    fn parse_shell_lc_single_command_prefix_accepts_heredoc_with_extra_redirect() {
        let command = vec![
            "bash".to_string(),
            "-lc".to_string(),
            "python3 <<'PY' > /tmp/out.txt\nprint('hello')\nPY".to_string(),
        ];
        assert_eq!(
            parse_shell_lc_single_command_prefix(&command),
            Some(vec!["python3".to_string()])
        );
    }

    #[test]
    fn parse_shell_lc_single_command_prefix_rejects_herestring_with_chaining() {
        // Despite the test name, this script chains a file redirect (`>`) with
        // `&&`; no herestring (`<<<`) is involved.
        let command = vec![
            "bash".to_string(),
            "-lc".to_string(),
            r#"echo hello > /tmp/out.txt && cat /tmp/out.txt"#.to_string(),
        ];
        assert_eq!(parse_shell_lc_single_command_prefix(&command), None);
    }

    #[test]
    fn parse_shell_lc_single_command_prefix_rejects_herestring_with_substitution() {
        let command = vec![
            "bash".to_string(),
            "-lc".to_string(),
            r#"python3 <<< "$(rm -rf /)""#.to_string(),
        ];
        assert_eq!(parse_shell_lc_single_command_prefix(&command), None);
    }

    #[test]
    fn parse_shell_lc_single_command_prefix_rejects_arithmetic_shift_non_heredoc_script() {
        let command = vec![
            "bash".to_string(),
            "-lc".to_string(),
            "echo $((1<<2))".to_string(),
        ];
        assert_eq!(parse_shell_lc_single_command_prefix(&command), None);
    }

    #[test]
    fn parse_shell_lc_single_command_prefix_rejects_heredoc_command_with_word_expansion() {
        let command = vec![
            "bash".to_string(),
            "-lc".to_string(),
            "python3 $((1<<2)) <<'PY'\nprint('hello')\nPY".to_string(),
        ];
        assert_eq!(parse_shell_lc_single_command_prefix(&command), None);
    }
}