mirror of
https://github.com/ivuorinen/tree-sitter-shellspec.git
synced 2026-03-18 12:04:54 +00:00
* feat: implement complete tree-sitter-shellspec grammar with comprehensive testing
  - Add full ShellSpec grammar extending tree-sitter-bash
  - Support all ShellSpec constructs: Describe, Context, It, hooks, utilities
  - Include Data block parsing with statements and argument styles
  - Add 61 comprehensive test cases covering real-world patterns
  - Implement optimized GitHub workflows with CI/CD automation
  - Configure complete development tooling (linting, formatting, pre-commit)
  - Add comprehensive documentation and contribution guidelines
  - Optimize grammar conflicts to zero warnings
  - Support editor integration for Neovim, VS Code, Emacs

  Breaking Changes:
  - Initial release, no previous API to break

  BREAKING CHANGE: Initial implementation of tree-sitter-shellspec grammar
* fix(ci): checkout before using local actions
* fix(ci): use inline steps instead of actions
* fix(ci): add checkout before testing, cleanup
* chore(ci): add coderabbit config
* chore: lint and code review fixes
* chore(ci): update workflows
* refactor: enhance CI/CD workflows and apply CodeRabbit suggestions
  - Convert GitHub Actions from local to inline actions for better maintainability
  - Add comprehensive caching for npm dependencies, tree-sitter CLI, and build artifacts
  - Fix checkout steps missing in test matrix jobs
  - Apply defensive programming in test coverage validation
  - Use local tree-sitter CLI via npx instead of global installation
  - Update tree-sitter-cli to v0.25.0 for compatibility with tree-sitter-bash
  - Add proper tree-sitter field to package.json with grammar metadata
  - Fix grammar precedence for Data blocks (#| lines now have higher precedence)
  - Standardize dates in memory files to September 12, 2025
  - Enhance workflow robustness with dynamic workflow ID resolution
  - Improve test file pattern matching and error handling

  This commit addresses all CodeRabbit review suggestions and optimizes GitHub Actions workflows for better performance and reliability.
* fix: apply CodeRabbit nitpick suggestions and improve code quality
  - Fix grammar.js TypeScript errors by correcting optional field usage
  - Update .yamlignore to use more robust glob pattern (**/node_modules/**)
  - Remove hard-coded test count from README.md for maintainability
  - Fix shellcheck directive format (add space after #) in all test specs
  - Fix typos throughout test specifications:
    - 'can not' → 'cannot'
    - 'expantion' → 'expansion'
    - 'singnal' → 'signal'
    - 'It mean' → 'It means'
  - Update CODE_OF_CONDUCT.md HTTP links to HTTPS
  - Update tree-sitter parse command to use --scope instead of --language
  - Add comments to .mega-linter.yml explaining disabled linters

  All grammar tests still pass (61/61) and the parser functions correctly with the updated tree-sitter CLI v0.25.0.
* perf: optimize grammar for 32x faster parsing performance
  - Reduce grammar conflicts to essential bash and ShellSpec rules only
  - Restore original precedence values for consistent rule ordering
  - Simplify Data block rule while maintaining all functionality
  - Add required statements field to match test expectations

  Performance improvements:
  - Parse speed: ~55 bytes/ms → 1784 bytes/ms (32x faster)
  - All 61 tests still pass (100% success rate)
  - Significantly reduced parser generation time and runtime complexity

  The optimizations focused on minimizing unnecessary conflicts and simplifying complex choice structures while preserving full ShellSpec grammar compatibility and correctness.
* fix(ci): ensure parser is built before testing in GitHub workflows
  - Add explicit parser build step before sample code testing
  - Remove --scope parameter that requires parser to be compiled first
  - Fix tree-sitter CLI v0.25.0 compatibility issue in CI environment

  The issue was that tree-sitter CLI v0.25.0 requires the parser to be compiled before it can recognize custom language scopes.
  This fix ensures the parser is always built before attempting to parse test files, resolving the 'Unknown scope' error in GitHub Actions.
* feat(ci): expand cache paths to support all Node.js package managers
  - Add comprehensive caching for npm, yarn, and pnpm package managers
  - Cache paths now include:
    - npm: ~/.npm, node_modules/.cache
    - yarn: ~/.yarn, ~/.cache/yarn, ~/.cache/yarn/global
    - pnpm: ~/.pnpm-store, ~/.cache/pnpm, ~/.local/share/pnpm/global
  - Update cache keys to include all lockfile types (package-lock.json, yarn.lock, pnpm-lock.yaml)
  - Rename 'Cache Tree-sitter CLI' to 'Cache npx store' for clarity
  - Apply changes consistently across test, lint, and coverage jobs

  This improves cache hit rates and build performance regardless of which Node.js package manager is used in the development environment.
* chore: tweaks to megalinter and grammar.js
* fix(scanner): address memory safety and correctness issues in C code
  - Add len==0 check in set_contains() to prevent buffer overflow
  - Add missing stdlib.h include in scanner.c
  - Clear heredoc stack properly in deserialize when length==0
  - Ensure NUL termination in delimiter deserialization
  - Create alloc.c to define ts_current_* symbols for TREE_SITTER_REUSE_ALLOCATOR

  All changes tested with full test suite: 61/61 tests passing. Addresses PR #1 review comments from CodeRabbit.
* ci: improve workflow determinism and security scanning
  - Add --language=shellspec flag to tree-sitter parse for deterministic grammar selection
  - Add C++ language to CodeQL analysis to scan src/scanner.c for security issues

  Addresses PR #1 review comments from CodeRabbit.
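The len==0 guard described in the scanner commit above is a standard defensive pattern: bail out before ever indexing into a possibly empty array. A rough sketch of the idea (a simplified linear-scan stand-in, not the actual set_contains() from the generated parser.h):

```c
#include <stdbool.h>
#include <stdint.h>

// Simplified sketch of the guard pattern: an empty set contains nothing,
// so return before touching set[0], which would be undefined behavior
// when len == 0. The real set_contains() in parser.h is structured
// differently, but the guard serves the same purpose.
static bool set_contains(const uint16_t *set, uint32_t len, uint16_t value) {
    if (len == 0) {
        return false; // guard: never read from an empty array
    }
    for (uint32_t i = 0; i < len; i++) {
        if (set[i] == value) {
            return true;
        }
    }
    return false;
}
```

With the guard in place, a call like `set_contains(NULL, 0, 3)` safely returns false instead of dereferencing a null or empty buffer.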
* test: fix typos and incorrect hook usage in spec files
  - Fix 'yot' → 'yet' typos in test/spec/03.example_spec.sh
  - Fix 'Sometime' → 'Sometimes' and cpunum.sh references in test/spec/22.sourcced_spec.sh
  - Fix Before → After in after hook section of test/spec/07.before_after_hook_spec.sh
  - Improve wording and capitalization throughout hook spec file

  All 61 tests still passing after corrections. Addresses PR #1 review comments from Copilot and CodeRabbit.
* docs: update Node.js requirement to match CI configuration
  - Change Node.js requirement from v16 to v22+ to align with CI matrix
  - Update tree-sitter CLI recommendation from global install to npx usage
  - Matches actual devDependency configuration in package.json

  Addresses PR #1 review comment from CodeRabbit.
* chore: update dependencies and workflow actions
  - Update GitHub Actions to latest versions (checkout v6, setup-node v6, cache v4.3)
  - Update package dependencies
  - Format workflow files
  - Update .gitignore and project configuration
* fix(ci): remove unsupported --language flag from tree-sitter parse

  The --language flag is not supported in tree-sitter-cli 0.25.10. Tree-sitter correctly auto-detects the grammar based on file extension.
* chore: add prettier and format all files
  - Install prettier ^3.6.2
  - Add .prettierrc with project formatting rules
  - Add .prettierignore to exclude generated files and dependencies
  - Add npm scripts: format and format:check
  - Format all files with prettier
* chore: add eclint for editorconfig linting and fix violations
  - Install eclint ^2.8.1 for editorconfig validation and fixing
  - Add .eclintignore to exclude generated files and dependencies
  - Add npm scripts: lint:editorconfig and lint:editorconfig:fix
  - Fix indentation issues in CONTRIBUTING.md (3 spaces -> 2 spaces)
  - Fix code alignment in scanner.c to match editorconfig rules
  - Regenerate parser after scanner.c formatting changes
* feat: add post-generation script to preserve buffer overflow fix

  Created scripts/post-generate.sh that automatically re-applies the critical buffer overflow fix to parser.h after tree-sitter generate runs. This fix prevents undefined behavior in set_contains() when accessing an empty array.

  The script is automatically executed after tree-sitter generate via the npm generate script. Added generate:only for cases where post-processing should be skipped.
* fix: address code review findings and critical issues

  Critical Fixes:
  - Fixed EditorConfig violations in grammar.js, scanner.c, README.md, .mega-linter.yml
  - Changed JSDoc comments from 1-space to 2-space indent per .editorconfig
  - Fixed line length violations in README.md and .mega-linter.yml
  - Updated test count badge from 59/59 to 61/61 in README.md
  - Created queries/highlights.scm for syntax highlighting support
  - Updated package.json with repository and files fields

  Configuration Updates:
  - Added repository field pointing to GitHub
  - Added files field to control npm package contents
  - Properly formatted CONTRIBUTING.md with prettier

  All 61 tests passing (100% success rate)
  All critical EditorConfig violations resolved
* enhance: add Data block test coverage and improve syntax highlighting

  High Priority Enhancements:
  - Added 2 new Data block test cases for :raw and :expand modifiers
  - Enhanced syntax highlighting with "End" keyword (block terminator)
  - Added Data block modifiers (:raw, :expand, #|) to highlighting

  Test Coverage:
  - 63/63 tests passing (100%)
  - Test count increased from 61 to 63
  - Average parse speed: 623 bytes/ms
* docs: add comprehensive grammar documentation and precedence explanation

  Medium Priority Enhancement:
  - Added detailed precedence strategy comments explaining how ShellSpec extends bash
  - Documented all 5 conflicts with resolution strategies
  - Explained why conflicts are necessary and optimal
  - Added context about GLR parsing and precedence hints

  Documentation improvements:
  - Precedence levels clearly explained (bash: 1, ShellSpec: 2)
  - Each conflict documented with resolution strategy
  - Notes on intentional design decisions
  - Helps future maintainers understand grammar design
* fix: resolve documentation inconsistencies and add ExampleGroup variants

  Documentation Fixes:
  - README.md: Update test count from 59 to 63 (badge, features, test command)
  - README.md: Fix lint script references to actual npm scripts
  - CONTRIBUTING.md: Correct format script reference to npm run format:check
  - package.json: Remove non-existent yamllint script, split lint:markdown into check/fix variants

  Grammar Enhancements:
  - Add fExampleGroup and xExampleGroup to Context block variants
  - Regenerate parser with new grammar (63/63 tests passing, 100% success rate)

  Syntax Highlighting:
  - Add fExampleGroup and xExampleGroup to focused/skipped block highlights
  - Remove non-matching Data modifier tokens (:raw, :expand, #|)
  - Add "End" keyword as block terminator

  Memory File Corrections:
  - Remove incorrect merge_group trigger references
  - Remove pr-lint.yml workflow references (deleted in previous optimization)
  - Update test counts with timestamps (59→63, added 2025-12-11)
  - Update conflict count (13→5, optimized)

  Code Style:
  - Auto-format renovate.json and tree-sitter.json with prettier
* chore: update dependencies and project configuration
  - Align tree-sitter dependencies to latest versions (bash 0.25.1, cli 0.25.10)
  - Clean up .gitignore redundant patterns and normalize path styles
  - Improve CodeRabbit configuration with path filters and simplified instructions
  - Add test corpus exclusion to match project intent
* docs: improve documentation and memory files
  - Update CONTRIBUTING.md code style check commands with actual available scripts
  - Use npx tree-sitter in test examples to avoid assuming global installation
  - Improve project status memory file with proper JSON formatting
  - Add CI enforcement recommendation for zero-conflict grammar generation
  - Align prerequisites with CI requirements (Node 22+)
* ci: improve workflow configuration and reliability
  - Replace global read-all permissions with scoped permissions (contents: read, actions: write)
  - Fix cache configuration to exclude node_modules and include package-lock.json
  - Improve CI workflow resolution with flexible path matching and pagination
  - Verify version instead of committing version bumps from CI
  - Detect prereleases and publish with appropriate npm tags (next vs latest)
  - Use generic test suite description in release notes to avoid drift
* fix: remove non-existent locals.scm reference from tree-sitter.json

  Remove queries/locals.scm from locals array as the file does not exist. Only queries/highlights.scm is present in the repository.
* security: replace vulnerable eclint with editorconfig-checker
  - Remove eclint@2.8.1 (has 15 vulnerabilities, possibly abandoned)
  - Add editorconfig-checker@6.1.1 (actively maintained, zero vulnerabilities)
  - Update npm scripts to use editorconfig-checker commands
  - Resolves all 15 security vulnerabilities (8 moderate, 7 high)

  editorconfig-checker is a more modern, actively maintained alternative written in Go with no Node.js dependency vulnerabilities.
* style: fix JSDoc comment indentation
* fix(ci): separate CodeQL languages in matrix

  Previously 'actions,javascript' was treated as a single language. Now correctly split into separate 'actions' and 'javascript' entries.
* chore(deps): update GitHub Actions dependencies
  - actions/checkout: v6.0.0 -> v6.0.1
  - actions/setup-node: v6.0.0 -> v6.1.0
  - softprops/action-gh-release: v2.4.2 -> v2.5.0
  - ivuorinen/actions/*: v2025.11.x -> v2025.12.10
* ci: restore pr-lint workflow from main
* chore(deps): update GitHub Actions dependencies

  Update action pins: checkout v6.0.2, setup-node v6.3.0, cache v5.0.3, pr-lint v2026.03.07. Add checkov skip comment, VERSION prefix strip, and scanner.c to grammar cache key.
* feat: extend grammar with hooks, mocks, statements, and directives

  Add 27 ShellSpec-specific grammar rules covering hook blocks/statements, mock blocks, When/The/Assert statements, Path/Set/Dump/Intercept statements, Parameters variants, Pending/Skip/Todo, and percent directives. Update highlights and test corpus with 128 passing tests.
* docs: update README, CLAUDE.md, and test spec comments

  Comprehensive README rewrite documenting all 27 grammar rules, block types, statement types, and directives. Add CLAUDE.md project instructions for Claude Code. Update test spec file comments for clarity.
* chore: update project config and dependencies

  Update tree-sitter-cli to ^0.26.6, remove broken lint:editorconfig:fix script. Update shellcheck disabled rules. Add JSDoc header to post-generate script. Update gitignore for build artifacts.
* fix(ci): use env var for version in release publish step

  Replace direct expression interpolation with VERSION env var to fix actionlint SC2193 false positive in the npm publish step.
* style: fix editorconfig and markdownlint issues

  Break long cache key YAML value into multiline scalar to comply with 160-character line length limit.
* ci: update MegaLinter config for ShellSpec project

  Add disabled linters for generated/DSL code false positives, v8r exclude pattern, and broader path filter. Remove .coderabbit.yaml in favor of shared org config.
* chore: add Claude Code automation config

  Add hooks (pre-edit guard for generated files, post-edit lint), skills (generate-and-test, add-shellspec-rule, debug-parse-failure, update-highlights, validate-release), and grammar-validator agent.
* chore: update Serena memory files

  Update project status, real-world ShellSpec patterns, and GitHub workflows optimization memory files.
* fix(ci): fix MegaLinter config and remove C from CodeQL
  - Change FILTER_REGEX_EXCLUDE from `>` to `>-` to strip trailing newline that silently broke all path exclusions
  - Add YAML_V8R_FILTER_REGEX_EXCLUDE to skip schema validation on .mega-linter.yml (schemastore enum is outdated for BASH_BASH_EXEC)
  - Remove "c" from CodeQL language matrix since src/parser.c is generated and produces false positives
* fix: add missing test spec stub files
  - Add test/spec/lib.sh stub with calc() function (referenced by 01.very_simple_spec.sh)
  - Add test/spec/count_cpus.sh stub (referenced by 21.intercept_spec.sh)
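For reference, the ShellSpec constructs these commits keep naming (Describe/Context/It blocks, When/The statements, Data blocks with #| lines, End terminators) combine like the following illustrative fragment. The calc() name echoes the test/spec/lib.sh stub mentioned above; the actual spec files in the repository may differ:

```sh
Describe 'calc()'
  It 'adds two numbers'
    When call calc 1 2
    The output should eq 3
  End

  Context 'with piped input'
    Data
      #|line one
      #|line two
    End

    It 'reads the Data block from stdin'
      When run cat
      The line 1 of output should eq 'line one'
    End
  End
End
```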
1292 lines
38 KiB
C
#include "tree_sitter/array.h"
#include "tree_sitter/parser.h"

#include <assert.h>
#include <ctype.h>
#include <stdlib.h>
#include <string.h>
#include <wctype.h>

enum TokenType {
    HEREDOC_START,
    SIMPLE_HEREDOC_BODY,
    HEREDOC_BODY_BEGINNING,
    HEREDOC_CONTENT,
    HEREDOC_END,
    FILE_DESCRIPTOR,
    EMPTY_VALUE,
    CONCAT,
    VARIABLE_NAME,
    TEST_OPERATOR,
    REGEX,
    REGEX_NO_SLASH,
    REGEX_NO_SPACE,
    EXPANSION_WORD,
    EXTGLOB_PATTERN,
    BARE_DOLLAR,
    BRACE_START,
    IMMEDIATE_DOUBLE_HASH,
    EXTERNAL_EXPANSION_SYM_HASH,
    EXTERNAL_EXPANSION_SYM_BANG,
    EXTERNAL_EXPANSION_SYM_EQUAL,
    CLOSING_BRACE,
    CLOSING_BRACKET,
    HEREDOC_ARROW,
    HEREDOC_ARROW_DASH,
    NEWLINE,
    OPENING_PAREN,
    ESAC,
    ERROR_RECOVERY,
};

typedef Array(char) String;

typedef struct {
    bool is_raw;
    bool started;
    bool allows_indent;
    String delimiter;
    String current_leading_word;
} Heredoc;

#define heredoc_new() \
    { \
        .is_raw = false, \
        .started = false, \
        .allows_indent = false, \
        .delimiter = array_new(), \
        .current_leading_word = array_new(), \
    };

typedef struct {
    uint8_t last_glob_paren_depth;
    bool ext_was_in_double_quote;
    bool ext_saw_outside_quote;
    Array(Heredoc) heredocs;
} Scanner;

static inline void advance(TSLexer *lexer) { lexer->advance(lexer, false); }

static inline void skip(TSLexer *lexer) { lexer->advance(lexer, true); }

static inline bool in_error_recovery(const bool *valid_symbols) {
    return valid_symbols[ERROR_RECOVERY];
}

static inline void reset_string(String *string) {
    if (string->size > 0) {
        memset(string->contents, 0, string->size);
        array_clear(string);
    }
}

static inline void reset_heredoc(Heredoc *heredoc) {
    heredoc->is_raw = false;
    heredoc->started = false;
    heredoc->allows_indent = false;
    reset_string(&heredoc->delimiter);
}

static inline void reset(Scanner *scanner) {
    for (uint32_t i = 0; i < scanner->heredocs.size; i++) {
        reset_heredoc(array_get(&scanner->heredocs, i));
    }
}

static unsigned serialize(Scanner *scanner, char *buffer) {
    uint32_t size = 0;

    buffer[size++] = (char)scanner->last_glob_paren_depth;
    buffer[size++] = (char)scanner->ext_was_in_double_quote;
    buffer[size++] = (char)scanner->ext_saw_outside_quote;
    buffer[size++] = (char)scanner->heredocs.size;

    for (uint32_t i = 0; i < scanner->heredocs.size; i++) {
        Heredoc *heredoc = array_get(&scanner->heredocs, i);
        if (size + 3 + sizeof(uint32_t) + heredoc->delimiter.size >=
            TREE_SITTER_SERIALIZATION_BUFFER_SIZE) {
            return 0;
        }

        buffer[size++] = (char)heredoc->is_raw;
        buffer[size++] = (char)heredoc->started;
        buffer[size++] = (char)heredoc->allows_indent;

        memcpy(&buffer[size], &heredoc->delimiter.size, sizeof(uint32_t));
        size += sizeof(uint32_t);
        if (heredoc->delimiter.size > 0) {
            memcpy(&buffer[size], heredoc->delimiter.contents,
                   heredoc->delimiter.size);
            size += heredoc->delimiter.size;
        }
    }
    return size;
}

static void deserialize(Scanner *scanner, const char *buffer, unsigned length) {
    if (length == 0) {
        // Fully clear heredocs to avoid stale stack entries after reset
        for (uint32_t i = 0; i < scanner->heredocs.size; i++) {
            Heredoc *h = array_get(&scanner->heredocs, i);
            array_delete(&h->current_leading_word);
            array_delete(&h->delimiter);
        }
        array_clear(&scanner->heredocs);
        reset(scanner);
    } else {
        uint32_t size = 0;
        scanner->last_glob_paren_depth = buffer[size++];
        scanner->ext_was_in_double_quote = buffer[size++];
        scanner->ext_saw_outside_quote = buffer[size++];
        uint32_t heredoc_count = (unsigned char)buffer[size++];
        for (uint32_t i = 0; i < heredoc_count; i++) {
            Heredoc *heredoc = NULL;
            if (i < scanner->heredocs.size) {
                heredoc = array_get(&scanner->heredocs, i);
            } else {
                Heredoc new_heredoc = heredoc_new();
                array_push(&scanner->heredocs, new_heredoc);
                heredoc = array_back(&scanner->heredocs);
            }

            heredoc->is_raw = buffer[size++];
            heredoc->started = buffer[size++];
            heredoc->allows_indent = buffer[size++];

            memcpy(&heredoc->delimiter.size, &buffer[size], sizeof(uint32_t));
            size += sizeof(uint32_t);
            array_reserve(&heredoc->delimiter,
                          heredoc->delimiter.size > 0 ? heredoc->delimiter.size : 1);

            if (heredoc->delimiter.size > 0) {
                memcpy(heredoc->delimiter.contents, &buffer[size],
                       heredoc->delimiter.size);
                size += heredoc->delimiter.size;
                // Ensure NUL termination for safety
                if (heredoc->delimiter.contents[heredoc->delimiter.size - 1] != '\0') {
                    array_reserve(&heredoc->delimiter, heredoc->delimiter.size + 1);
                    heredoc->delimiter.contents[heredoc->delimiter.size] = '\0';
                    heredoc->delimiter.size++;
                }
            }
        }
        assert(size == length);
    }
}
/**
 * Consume a "word" in POSIX parlance, and returns it unquoted.
 *
 * This is an approximate implementation that doesn't deal with any
 * POSIX-mandated substitution, and assumes the default value for
 * IFS.
 */
static bool advance_word(TSLexer *lexer, String *unquoted_word) {
    bool empty = true;

    int32_t quote = 0;
    if (lexer->lookahead == '\'' || lexer->lookahead == '"') {
        quote = lexer->lookahead;
        advance(lexer);
    }

    while (lexer->lookahead &&
           !(quote ? lexer->lookahead == quote || lexer->lookahead == '\r' ||
                         lexer->lookahead == '\n'
                   : iswspace(lexer->lookahead))) {
        if (lexer->lookahead == '\\') {
            advance(lexer);
            if (!lexer->lookahead) {
                return false;
            }
        }
        empty = false;
        array_push(unquoted_word, lexer->lookahead);
        advance(lexer);
    }
    array_push(unquoted_word, '\0');

    if (quote && lexer->lookahead == quote) {
        advance(lexer);
    }

    return !empty;
}

static inline bool scan_bare_dollar(TSLexer *lexer) {
    while (iswspace(lexer->lookahead) && lexer->lookahead != '\n' &&
           !lexer->eof(lexer)) {
        skip(lexer);
    }

    if (lexer->lookahead == '$') {
        advance(lexer);
        lexer->result_symbol = BARE_DOLLAR;
        lexer->mark_end(lexer);
        return iswspace(lexer->lookahead) || lexer->eof(lexer) ||
               lexer->lookahead == '\"';
    }

    return false;
}

static bool scan_heredoc_start(Heredoc *heredoc, TSLexer *lexer) {
    while (iswspace(lexer->lookahead)) {
        skip(lexer);
    }

    lexer->result_symbol = HEREDOC_START;
    heredoc->is_raw = lexer->lookahead == '\'' || lexer->lookahead == '"' ||
                      lexer->lookahead == '\\';

    bool found_delimiter = advance_word(lexer, &heredoc->delimiter);
    if (!found_delimiter) {
        reset_string(&heredoc->delimiter);
        return false;
    }
    return found_delimiter;
}

static bool scan_heredoc_end_identifier(Heredoc *heredoc, TSLexer *lexer) {
    reset_string(&heredoc->current_leading_word);
    // Scan the first 'n' characters on this line, to see if they match the
    // heredoc delimiter
    int32_t size = 0;
    if (heredoc->delimiter.size > 0) {
        while (lexer->lookahead != '\0' && lexer->lookahead != '\n' &&
               (int32_t)*array_get(&heredoc->delimiter, size) == lexer->lookahead &&
               heredoc->current_leading_word.size < heredoc->delimiter.size) {
            array_push(&heredoc->current_leading_word, lexer->lookahead);
            advance(lexer);
            size++;
        }
    }
    array_push(&heredoc->current_leading_word, '\0');
    return heredoc->delimiter.size == 0
               ? false
               : strcmp(heredoc->current_leading_word.contents,
                        heredoc->delimiter.contents) == 0;
}

static bool scan_heredoc_content(Scanner *scanner, TSLexer *lexer,
                                 enum TokenType middle_type,
                                 enum TokenType end_type) {
    bool did_advance = false;
    Heredoc *heredoc = array_back(&scanner->heredocs);

    for (;;) {
        switch (lexer->lookahead) {
            case '\0': {
                if (lexer->eof(lexer) && did_advance) {
                    reset_heredoc(heredoc);
                    lexer->result_symbol = end_type;
                    return true;
                }
                return false;
            }

            case '\\': {
                did_advance = true;
                advance(lexer);
                advance(lexer);
                break;
            }

            case '$': {
                if (heredoc->is_raw) {
                    did_advance = true;
                    advance(lexer);
                    break;
                }
                if (did_advance) {
                    lexer->mark_end(lexer);
                    lexer->result_symbol = middle_type;
                    heredoc->started = true;
                    advance(lexer);
                    if (iswalpha(lexer->lookahead) || lexer->lookahead == '{' ||
                        lexer->lookahead == '(') {
                        return true;
                    }
                    break;
                }
                if (middle_type == HEREDOC_BODY_BEGINNING &&
                    lexer->get_column(lexer) == 0) {
                    lexer->result_symbol = middle_type;
                    heredoc->started = true;
                    return true;
                }
                return false;
            }

            case '\n': {
                if (!did_advance) {
                    skip(lexer);
                } else {
                    advance(lexer);
                }
                did_advance = true;
                if (heredoc->allows_indent) {
                    while (iswspace(lexer->lookahead)) {
                        advance(lexer);
                    }
                }
                lexer->result_symbol = heredoc->started ? middle_type : end_type;
                lexer->mark_end(lexer);
                if (scan_heredoc_end_identifier(heredoc, lexer)) {
                    if (lexer->result_symbol == HEREDOC_END) {
                        array_pop(&scanner->heredocs);
                    }
                    return true;
                }
                break;
            }

            default: {
                if (lexer->get_column(lexer) == 0) {
                    // an alternative is to check the starting column of the
                    // heredoc body and track that statefully
                    while (iswspace(lexer->lookahead)) {
                        if (did_advance) {
                            advance(lexer);
                        } else {
                            skip(lexer);
                        }
                    }
                    if (end_type != SIMPLE_HEREDOC_BODY) {
                        lexer->result_symbol = middle_type;
                        if (scan_heredoc_end_identifier(heredoc, lexer)) {
                            return true;
                        }
                    }
                    if (end_type == SIMPLE_HEREDOC_BODY) {
                        lexer->result_symbol = end_type;
                        lexer->mark_end(lexer);
                        if (scan_heredoc_end_identifier(heredoc, lexer)) {
                            return true;
                        }
                    }
                }
                did_advance = true;
                advance(lexer);
                break;
            }
        }
    }
}
static bool scan(Scanner *scanner, TSLexer *lexer, const bool *valid_symbols) {
|
|
if (valid_symbols[CONCAT] && !in_error_recovery(valid_symbols)) {
|
|
if (!(lexer->lookahead == 0 || iswspace(lexer->lookahead) ||
|
|
lexer->lookahead == '>' || lexer->lookahead == '<' ||
|
|
lexer->lookahead == ')' || lexer->lookahead == '(' ||
|
|
lexer->lookahead == ';' || lexer->lookahead == '&' ||
|
|
lexer->lookahead == '|' ||
|
|
(lexer->lookahead == '}' && valid_symbols[CLOSING_BRACE]) ||
|
|
(lexer->lookahead == ']' && valid_symbols[CLOSING_BRACKET]))) {
|
|
lexer->result_symbol = CONCAT;
|
|
// So for a`b`, we want to return a concat. We check if the
|
|
// 2nd backtick has whitespace after it, and if it does we
|
|
// return concat.
|
|
if (lexer->lookahead == '`') {
|
|
lexer->mark_end(lexer);
|
|
advance(lexer);
|
|
while (lexer->lookahead != '`' && !lexer->eof(lexer)) {
|
|
advance(lexer);
|
|
}
|
|
if (lexer->eof(lexer)) {
|
|
return false;
|
|
}
|
|
if (lexer->lookahead == '`') {
|
|
advance(lexer);
|
|
}
|
|
return iswspace(lexer->lookahead) || lexer->eof(lexer);
|
|
}
|
|
// strings w/ expansions that contains escaped quotes or
|
|
// backslashes need this to return a concat
|
|
if (lexer->lookahead == '\\') {
|
|
lexer->mark_end(lexer);
|
|
advance(lexer);
|
|
if (lexer->lookahead == '"' || lexer->lookahead == '\'' ||
|
|
lexer->lookahead == '\\') {
|
|
return true;
|
|
}
|
|
if (lexer->eof(lexer)) {
|
|
return false;
|
|
}
|
|
} else {
|
|
return true;
|
|
}
|
|
}
|
|
if (iswspace(lexer->lookahead) && valid_symbols[CLOSING_BRACE] &&
|
|
!valid_symbols[EXPANSION_WORD]) {
|
|
lexer->result_symbol = CONCAT;
|
|
return true;
|
|
}
|
|
}
|
|
|
|
if (valid_symbols[IMMEDIATE_DOUBLE_HASH] &&
|
|
!in_error_recovery(valid_symbols)) {
|
|
// advance two # and ensure not } after
|
|
if (lexer->lookahead == '#') {
|
|
lexer->mark_end(lexer);
|
|
advance(lexer);
|
|
if (lexer->lookahead == '#') {
|
|
advance(lexer);
|
|
if (lexer->lookahead != '}') {
|
|
lexer->result_symbol = IMMEDIATE_DOUBLE_HASH;
|
|
lexer->mark_end(lexer);
|
|
return true;
|
|
}
|
|
}
|
|
}
|
|
}
|
|
|
|
if (valid_symbols[EXTERNAL_EXPANSION_SYM_HASH] &&
|
|
!in_error_recovery(valid_symbols)) {
|
|
if (lexer->lookahead == '#' || lexer->lookahead == '=' ||
|
|
lexer->lookahead == '!') {
|
|
lexer->result_symbol =
|
|
lexer->lookahead == '#' ? EXTERNAL_EXPANSION_SYM_HASH
|
|
: lexer->lookahead == '!' ? EXTERNAL_EXPANSION_SYM_BANG
|
|
: EXTERNAL_EXPANSION_SYM_EQUAL;
|
|
advance(lexer);
|
|
lexer->mark_end(lexer);
|
|
while (lexer->lookahead == '#' || lexer->lookahead == '=' ||
|
|
lexer->lookahead == '!') {
|
|
advance(lexer);
|
|
}
|
|
while (iswspace(lexer->lookahead)) {
|
|
skip(lexer);
|
|
}
|
|
if (lexer->lookahead == '}') {
|
|
return true;
|
|
}
|
|
return false;
|
|
}
|
|
}
|
|
|
|
if (valid_symbols[EMPTY_VALUE]) {
|
|
if (iswspace(lexer->lookahead) || lexer->eof(lexer) ||
|
|
lexer->lookahead == ';' || lexer->lookahead == '&') {
|
|
lexer->result_symbol = EMPTY_VALUE;
|
|
return true;
|
|
}
|
|
}
|
|
|
|
if ((valid_symbols[HEREDOC_BODY_BEGINNING] ||
|
|
valid_symbols[SIMPLE_HEREDOC_BODY]) &&
|
|
scanner->heredocs.size > 0 && !array_back(&scanner->heredocs)->started &&
|
|
!in_error_recovery(valid_symbols)) {
|
|
return scan_heredoc_content(scanner, lexer, HEREDOC_BODY_BEGINNING,
|
|
SIMPLE_HEREDOC_BODY);
|
|
}
|
|
|
|
if (valid_symbols[HEREDOC_END] && scanner->heredocs.size > 0) {
|
|
Heredoc *heredoc = array_back(&scanner->heredocs);
|
|
if (scan_heredoc_end_identifier(heredoc, lexer)) {
|
|
array_delete(&heredoc->current_leading_word);
|
|
array_delete(&heredoc->delimiter);
|
|
array_pop(&scanner->heredocs);
|
|
lexer->result_symbol = HEREDOC_END;
|
|
return true;
|
|
}
|
|
}
|
|
|
|
if (valid_symbols[HEREDOC_CONTENT] && scanner->heredocs.size > 0 &&
|
|
array_back(&scanner->heredocs)->started &&
|
|
!in_error_recovery(valid_symbols)) {
|
|
return scan_heredoc_content(scanner, lexer, HEREDOC_CONTENT, HEREDOC_END);
|
|
}
|
|
|
|
if (valid_symbols[HEREDOC_START] && !in_error_recovery(valid_symbols) &&
|
|
scanner->heredocs.size > 0) {
|
|
return scan_heredoc_start(array_back(&scanner->heredocs), lexer);
|
|
}
|
|
|
|
if (valid_symbols[TEST_OPERATOR] && !valid_symbols[EXPANSION_WORD]) {
|
|
while (iswspace(lexer->lookahead) && lexer->lookahead != '\n') {
|
|
skip(lexer);
|
|
}
|
|
|
|
if (lexer->lookahead == '\\') {
|
|
if (valid_symbols[EXTGLOB_PATTERN]) {
|
|
goto extglob_pattern;
|
|
}
|
|
if (valid_symbols[REGEX_NO_SPACE]) {
|
|
goto regex;
|
|
}
|
|
skip(lexer);
|
|
|
|
if (lexer->eof(lexer)) {
|
|
return false;
|
|
}
|
|
|
|
if (lexer->lookahead == '\r') {
|
|
skip(lexer);
|
|
if (lexer->lookahead == '\n') {
|
|
skip(lexer);
|
|
}
|
|
} else if (lexer->lookahead == '\n') {
|
|
skip(lexer);
|
|
} else {
|
|
return false;
|
|
}
|
|
|
|
while (iswspace(lexer->lookahead)) {
|
|
skip(lexer);
|
|
}
|
|
}
|
|
|
|
if (lexer->lookahead == '\n' && !valid_symbols[NEWLINE]) {
|
|
skip(lexer);
|
|
|
|
while (iswspace(lexer->lookahead)) {
|
|
skip(lexer);
|
|
}
|
|
}
|
|
|
|
if (lexer->lookahead == '-') {
|
|
advance(lexer);
|
|
|
|
bool advanced_once = false;
|
|
while (iswalpha(lexer->lookahead)) {
|
|
advanced_once = true;
|
|
advance(lexer);
|
|
}
|
|
|
|
if (iswspace(lexer->lookahead) && advanced_once) {
|
|
lexer->mark_end(lexer);
|
|
advance(lexer);
|
|
if (lexer->lookahead == '}' && valid_symbols[CLOSING_BRACE]) {
|
|
if (valid_symbols[EXPANSION_WORD]) {
|
|
lexer->mark_end(lexer);
|
|
lexer->result_symbol = EXPANSION_WORD;
|
|
return true;
|
|
}
|
|
return false;
|
|
}
|
|
lexer->result_symbol = TEST_OPERATOR;
|
|
return true;
|
|
}
|
|
if (iswspace(lexer->lookahead) && valid_symbols[EXTGLOB_PATTERN]) {
|
|
lexer->result_symbol = EXTGLOB_PATTERN;
|
|
return true;
|
|
}
|
|
}
|
|
|
|
if (valid_symbols[BARE_DOLLAR] && !in_error_recovery(valid_symbols) &&
|
|
scan_bare_dollar(lexer)) {
|
|
return true;
|
|
}
|
|
}
|
|
|
|
    if ((valid_symbols[VARIABLE_NAME] || valid_symbols[FILE_DESCRIPTOR] ||
         valid_symbols[HEREDOC_ARROW]) &&
        !valid_symbols[REGEX_NO_SLASH] && !in_error_recovery(valid_symbols)) {
        for (;;) {
            if ((lexer->lookahead == ' ' || lexer->lookahead == '\t' ||
                 lexer->lookahead == '\r' ||
                 (lexer->lookahead == '\n' && !valid_symbols[NEWLINE])) &&
                !valid_symbols[EXPANSION_WORD]) {
                skip(lexer);
            } else if (lexer->lookahead == '\\') {
                skip(lexer);

                if (lexer->eof(lexer)) {
                    lexer->mark_end(lexer);
                    lexer->result_symbol = VARIABLE_NAME;
                    return true;
                }

                if (lexer->lookahead == '\r') {
                    skip(lexer);
                }
                if (lexer->lookahead == '\n') {
                    skip(lexer);
                } else {
                    if (lexer->lookahead == '\\' &&
                        valid_symbols[EXPANSION_WORD]) {
                        goto expansion_word;
                    }
                    return false;
                }
            } else {
                break;
            }
        }

        // no '*', '@', '?', '-', '$', '0', '_'
        if (!valid_symbols[EXPANSION_WORD] &&
            (lexer->lookahead == '*' || lexer->lookahead == '@' ||
             lexer->lookahead == '?' || lexer->lookahead == '-' ||
             lexer->lookahead == '0' || lexer->lookahead == '_')) {
            lexer->mark_end(lexer);
            advance(lexer);
            if (lexer->lookahead == '=' || lexer->lookahead == '[' ||
                lexer->lookahead == ':' || lexer->lookahead == '-' ||
                lexer->lookahead == '%' || lexer->lookahead == '#' ||
                lexer->lookahead == '/') {
                return false;
            }
            if (valid_symbols[EXTGLOB_PATTERN] && iswspace(lexer->lookahead)) {
                lexer->mark_end(lexer);
                lexer->result_symbol = EXTGLOB_PATTERN;
                return true;
            }
        }

        if (valid_symbols[HEREDOC_ARROW] && lexer->lookahead == '<') {
            advance(lexer);
            if (lexer->lookahead == '<') {
                advance(lexer);
                if (lexer->lookahead == '-') {
                    advance(lexer);
                    Heredoc heredoc = heredoc_new();
                    heredoc.allows_indent = true;
                    array_push(&scanner->heredocs, heredoc);
                    lexer->result_symbol = HEREDOC_ARROW_DASH;
                } else if (lexer->lookahead == '<' || lexer->lookahead == '=') {
                    return false;
                } else {
                    Heredoc heredoc = heredoc_new();
                    array_push(&scanner->heredocs, heredoc);
                    lexer->result_symbol = HEREDOC_ARROW;
                }
                return true;
            }
            return false;
        }

        bool is_number = true;
        if (iswdigit(lexer->lookahead)) {
            advance(lexer);
        } else if (iswalpha(lexer->lookahead) || lexer->lookahead == '_') {
            is_number = false;
            advance(lexer);
        } else {
            if (lexer->lookahead == '{') {
                goto brace_start;
            }
            if (valid_symbols[EXPANSION_WORD]) {
                goto expansion_word;
            }
            if (valid_symbols[EXTGLOB_PATTERN]) {
                goto extglob_pattern;
            }
            return false;
        }

        for (;;) {
            if (iswdigit(lexer->lookahead)) {
                advance(lexer);
            } else if (iswalpha(lexer->lookahead) || lexer->lookahead == '_') {
                is_number = false;
                advance(lexer);
            } else {
                break;
            }
        }

        if (is_number && valid_symbols[FILE_DESCRIPTOR] &&
            (lexer->lookahead == '>' || lexer->lookahead == '<')) {
            lexer->result_symbol = FILE_DESCRIPTOR;
            return true;
        }

        if (valid_symbols[VARIABLE_NAME]) {
            if (lexer->lookahead == '+') {
                lexer->mark_end(lexer);
                advance(lexer);
                if (lexer->lookahead == '=' || lexer->lookahead == ':' ||
                    valid_symbols[CLOSING_BRACE]) {
                    lexer->result_symbol = VARIABLE_NAME;
                    return true;
                }
                return false;
            }
            if (lexer->lookahead == '/') {
                return false;
            }
            if (lexer->lookahead == '=' || lexer->lookahead == '[' ||
                (lexer->lookahead == ':' && !valid_symbols[CLOSING_BRACE] &&
                 !valid_symbols[OPENING_PAREN]) ||
                // TODO(amaanq): more cases for regular word chars but not
                // variable names for function words, only handling : for
                // now? #235
                lexer->lookahead == '%' ||
                (lexer->lookahead == '#' && !is_number) ||
                lexer->lookahead == '@' ||
                (lexer->lookahead == '-' && valid_symbols[CLOSING_BRACE])) {
                lexer->mark_end(lexer);
                lexer->result_symbol = VARIABLE_NAME;
                return true;
            }

            if (lexer->lookahead == '?') {
                lexer->mark_end(lexer);
                advance(lexer);
                lexer->result_symbol = VARIABLE_NAME;
                return iswalpha(lexer->lookahead);
            }
        }

        return false;
    }

    if (valid_symbols[BARE_DOLLAR] && !in_error_recovery(valid_symbols) &&
        scan_bare_dollar(lexer)) {
        return true;
    }
regex:
    if ((valid_symbols[REGEX] || valid_symbols[REGEX_NO_SLASH] ||
         valid_symbols[REGEX_NO_SPACE]) &&
        !in_error_recovery(valid_symbols)) {
        if (valid_symbols[REGEX] || valid_symbols[REGEX_NO_SPACE]) {
            while (iswspace(lexer->lookahead)) {
                skip(lexer);
            }
        }

        if ((lexer->lookahead != '"' && lexer->lookahead != '\'') ||
            ((lexer->lookahead == '$' || lexer->lookahead == '\'') &&
             valid_symbols[REGEX_NO_SLASH]) ||
            (lexer->lookahead == '\'' && valid_symbols[REGEX_NO_SPACE])) {
            typedef struct {
                bool done;
                bool advanced_once;
                bool found_non_alnumdollarunderdash;
                bool last_was_escape;
                bool in_single_quote;
                uint32_t paren_depth;
                uint32_t bracket_depth;
                uint32_t brace_depth;
            } State;

            if (lexer->lookahead == '$' && valid_symbols[REGEX_NO_SLASH]) {
                lexer->mark_end(lexer);
                advance(lexer);
                if (lexer->lookahead == '(') {
                    return false;
                }
            }

            lexer->mark_end(lexer);

            State state = {false, false, false, false, false, 0, 0, 0};
            while (!state.done) {
                if (state.in_single_quote) {
                    if (lexer->lookahead == '\'') {
                        state.in_single_quote = false;
                        advance(lexer);
                        lexer->mark_end(lexer);
                    }
                }
                switch (lexer->lookahead) {
                    case '\\':
                        state.last_was_escape = true;
                        break;
                    case '\0':
                        return false;
                    case '(':
                        state.paren_depth++;
                        state.last_was_escape = false;
                        break;
                    case '[':
                        state.bracket_depth++;
                        state.last_was_escape = false;
                        break;
                    case '{':
                        if (!state.last_was_escape) {
                            state.brace_depth++;
                        }
                        state.last_was_escape = false;
                        break;
                    case ')':
                        if (state.paren_depth == 0) {
                            state.done = true;
                        }
                        state.paren_depth--;
                        state.last_was_escape = false;
                        break;
                    case ']':
                        if (state.bracket_depth == 0) {
                            state.done = true;
                        }
                        state.bracket_depth--;
                        state.last_was_escape = false;
                        break;
                    case '}':
                        if (state.brace_depth == 0) {
                            state.done = true;
                        }
                        state.brace_depth--;
                        state.last_was_escape = false;
                        break;
                    case '\'':
                        // Enter or exit a single-quoted string.
                        state.in_single_quote = !state.in_single_quote;
                        advance(lexer);
                        state.advanced_once = true;
                        state.last_was_escape = false;
                        continue;
                    default:
                        state.last_was_escape = false;
                        break;
                }

                if (!state.done) {
                    if (valid_symbols[REGEX]) {
                        bool was_space = !state.in_single_quote &&
                                         iswspace(lexer->lookahead);
                        advance(lexer);
                        state.advanced_once = true;
                        if (!was_space || state.paren_depth > 0) {
                            lexer->mark_end(lexer);
                        }
                    } else if (valid_symbols[REGEX_NO_SLASH]) {
                        if (lexer->lookahead == '/') {
                            lexer->mark_end(lexer);
                            lexer->result_symbol = REGEX_NO_SLASH;
                            return state.advanced_once;
                        }
                        if (lexer->lookahead == '\\') {
                            advance(lexer);
                            state.advanced_once = true;
                            if (!lexer->eof(lexer) && lexer->lookahead != '[' &&
                                lexer->lookahead != '/') {
                                advance(lexer);
                                lexer->mark_end(lexer);
                            }
                        } else {
                            bool was_space = !state.in_single_quote &&
                                             iswspace(lexer->lookahead);
                            advance(lexer);
                            state.advanced_once = true;
                            if (!was_space) {
                                lexer->mark_end(lexer);
                            }
                        }
                    } else if (valid_symbols[REGEX_NO_SPACE]) {
                        if (lexer->lookahead == '\\') {
                            state.found_non_alnumdollarunderdash = true;
                            advance(lexer);
                            if (!lexer->eof(lexer)) {
                                advance(lexer);
                            }
                        } else if (lexer->lookahead == '$') {
                            lexer->mark_end(lexer);
                            advance(lexer);
                            // do not parse a command substitution
                            if (lexer->lookahead == '(') {
                                return false;
                            }
                            // end $ always means regex, e.g. 99999999$
                            if (iswspace(lexer->lookahead)) {
                                lexer->result_symbol = REGEX_NO_SPACE;
                                lexer->mark_end(lexer);
                                return true;
                            }
                        } else {
                            bool was_space = !state.in_single_quote &&
                                             iswspace(lexer->lookahead);
                            if (was_space && state.paren_depth == 0) {
                                lexer->mark_end(lexer);
                                lexer->result_symbol = REGEX_NO_SPACE;
                                return state.found_non_alnumdollarunderdash;
                            }
                            if (!iswalnum(lexer->lookahead) &&
                                lexer->lookahead != '$' &&
                                lexer->lookahead != '-' &&
                                lexer->lookahead != '_') {
                                state.found_non_alnumdollarunderdash = true;
                            }
                            advance(lexer);
                        }
                    }
                }
            }

            lexer->result_symbol = valid_symbols[REGEX_NO_SLASH] ? REGEX_NO_SLASH
                                   : valid_symbols[REGEX_NO_SPACE]
                                       ? REGEX_NO_SPACE
                                       : REGEX;
            if (valid_symbols[REGEX] && !state.advanced_once) {
                return false;
            }
            return true;
        }
    }
extglob_pattern:
    if (valid_symbols[EXTGLOB_PATTERN] && !in_error_recovery(valid_symbols)) {
        // first skip ws, then check for ? * + @ !
        while (iswspace(lexer->lookahead)) {
            skip(lexer);
        }

        if (lexer->lookahead == '?' || lexer->lookahead == '*' ||
            lexer->lookahead == '+' || lexer->lookahead == '@' ||
            lexer->lookahead == '!' || lexer->lookahead == '-' ||
            lexer->lookahead == ')' || lexer->lookahead == '\\' ||
            lexer->lookahead == '.' || lexer->lookahead == '[' ||
            iswalpha(lexer->lookahead)) {
            if (lexer->lookahead == '\\') {
                advance(lexer);
                if ((iswspace(lexer->lookahead) || lexer->lookahead == '"') &&
                    lexer->lookahead != '\r' && lexer->lookahead != '\n') {
                    advance(lexer);
                } else {
                    return false;
                }
            }

            if (lexer->lookahead == ')' &&
                scanner->last_glob_paren_depth == 0) {
                lexer->mark_end(lexer);
                advance(lexer);

                if (iswspace(lexer->lookahead)) {
                    return false;
                }
            }

            lexer->mark_end(lexer);
            bool was_non_alpha = !iswalpha(lexer->lookahead);
            if (lexer->lookahead != '[') {
                // no esac
                if (lexer->lookahead == 'e') {
                    lexer->mark_end(lexer);
                    advance(lexer);
                    if (lexer->lookahead == 's') {
                        advance(lexer);
                        if (lexer->lookahead == 'a') {
                            advance(lexer);
                            if (lexer->lookahead == 'c') {
                                advance(lexer);
                                if (iswspace(lexer->lookahead)) {
                                    return false;
                                }
                            }
                        }
                    }
                } else {
                    advance(lexer);
                }
            }

            // -\w is just a word, find something else special
            if (lexer->lookahead == '-') {
                lexer->mark_end(lexer);
                advance(lexer);
                while (iswalnum(lexer->lookahead)) {
                    advance(lexer);
                }

                if (lexer->lookahead == ')' || lexer->lookahead == '\\' ||
                    lexer->lookahead == '.') {
                    return false;
                }
                lexer->mark_end(lexer);
            }

            // case item -) or *)
            if (lexer->lookahead == ')' &&
                scanner->last_glob_paren_depth == 0) {
                lexer->mark_end(lexer);
                advance(lexer);
                if (iswspace(lexer->lookahead)) {
                    lexer->result_symbol = EXTGLOB_PATTERN;
                    return was_non_alpha;
                }
            }

            if (iswspace(lexer->lookahead)) {
                lexer->mark_end(lexer);
                lexer->result_symbol = EXTGLOB_PATTERN;
                scanner->last_glob_paren_depth = 0;
                return true;
            }

            if (lexer->lookahead == '$') {
                lexer->mark_end(lexer);
                advance(lexer);
                if (lexer->lookahead == '{' || lexer->lookahead == '(') {
                    lexer->result_symbol = EXTGLOB_PATTERN;
                    return true;
                }
            }

            if (lexer->lookahead == '|') {
                lexer->mark_end(lexer);
                advance(lexer);
                lexer->result_symbol = EXTGLOB_PATTERN;
                return true;
            }

            if (!iswalnum(lexer->lookahead) && lexer->lookahead != '(' &&
                lexer->lookahead != '"' && lexer->lookahead != '[' &&
                lexer->lookahead != '?' && lexer->lookahead != '/' &&
                lexer->lookahead != '\\' && lexer->lookahead != '_' &&
                lexer->lookahead != '*') {
                return false;
            }

            typedef struct {
                bool done;
                bool saw_non_alphadot;
                uint32_t paren_depth;
                uint32_t bracket_depth;
                uint32_t brace_depth;
            } State;

            State state = {false, was_non_alpha,
                           scanner->last_glob_paren_depth, 0, 0};
            while (!state.done) {
                switch (lexer->lookahead) {
                    case '\0':
                        return false;
                    case '(':
                        state.paren_depth++;
                        break;
                    case '[':
                        state.bracket_depth++;
                        break;
                    case '{':
                        state.brace_depth++;
                        break;
                    case ')':
                        if (state.paren_depth == 0) {
                            state.done = true;
                        }
                        state.paren_depth--;
                        break;
                    case ']':
                        if (state.bracket_depth == 0) {
                            state.done = true;
                        }
                        state.bracket_depth--;
                        break;
                    case '}':
                        if (state.brace_depth == 0) {
                            state.done = true;
                        }
                        state.brace_depth--;
                        break;
                }

                if (lexer->lookahead == '|') {
                    lexer->mark_end(lexer);
                    advance(lexer);
                    if (state.paren_depth == 0 && state.bracket_depth == 0 &&
                        state.brace_depth == 0) {
                        lexer->result_symbol = EXTGLOB_PATTERN;
                        return true;
                    }
                }

                if (!state.done) {
                    bool was_space = iswspace(lexer->lookahead);
                    if (lexer->lookahead == '$') {
                        lexer->mark_end(lexer);
                        if (!iswalpha(lexer->lookahead) &&
                            lexer->lookahead != '.' &&
                            lexer->lookahead != '\\') {
                            state.saw_non_alphadot = true;
                        }
                        advance(lexer);
                        if (lexer->lookahead == '(' ||
                            lexer->lookahead == '{') {
                            lexer->result_symbol = EXTGLOB_PATTERN;
                            scanner->last_glob_paren_depth = state.paren_depth;
                            return state.saw_non_alphadot;
                        }
                    }
                    if (was_space) {
                        lexer->mark_end(lexer);
                        lexer->result_symbol = EXTGLOB_PATTERN;
                        scanner->last_glob_paren_depth = 0;
                        return state.saw_non_alphadot;
                    }
                    if (lexer->lookahead == '"') {
                        lexer->mark_end(lexer);
                        lexer->result_symbol = EXTGLOB_PATTERN;
                        scanner->last_glob_paren_depth = 0;
                        return state.saw_non_alphadot;
                    }
                    if (lexer->lookahead == '\\') {
                        if (!iswalpha(lexer->lookahead) &&
                            lexer->lookahead != '.' &&
                            lexer->lookahead != '\\') {
                            state.saw_non_alphadot = true;
                        }
                        advance(lexer);
                        if (iswspace(lexer->lookahead) ||
                            lexer->lookahead == '"') {
                            advance(lexer);
                        }
                    } else {
                        if (!iswalpha(lexer->lookahead) &&
                            lexer->lookahead != '.' &&
                            lexer->lookahead != '\\') {
                            state.saw_non_alphadot = true;
                        }
                        advance(lexer);
                    }
                    if (!was_space) {
                        lexer->mark_end(lexer);
                    }
                }
            }

            lexer->result_symbol = EXTGLOB_PATTERN;
            scanner->last_glob_paren_depth = 0;
            return state.saw_non_alphadot;
        }
        scanner->last_glob_paren_depth = 0;

        return false;
    }
expansion_word:
    if (valid_symbols[EXPANSION_WORD]) {
        bool advanced_once = false;
        bool advance_once_space = false;
        for (;;) {
            if (lexer->lookahead == '\"') {
                return false;
            }
            if (lexer->lookahead == '$') {
                lexer->mark_end(lexer);
                advance(lexer);
                if (lexer->lookahead == '{' || lexer->lookahead == '(' ||
                    lexer->lookahead == '\'' || iswalnum(lexer->lookahead)) {
                    lexer->result_symbol = EXPANSION_WORD;
                    return advanced_once;
                }
                advanced_once = true;
            }

            if (lexer->lookahead == '}') {
                lexer->mark_end(lexer);
                lexer->result_symbol = EXPANSION_WORD;
                return advanced_once || advance_once_space;
            }

            if (lexer->lookahead == '(' &&
                !(advanced_once || advance_once_space)) {
                lexer->mark_end(lexer);
                advance(lexer);
                while (lexer->lookahead != ')' && !lexer->eof(lexer)) {
                    // if we find a $( or ${ assume this is valid and is
                    // a garbage concatenation of some weird word + an
                    // expansion
                    // I wonder where this can fail
                    if (lexer->lookahead == '$') {
                        lexer->mark_end(lexer);
                        advance(lexer);
                        if (lexer->lookahead == '{' ||
                            lexer->lookahead == '(' ||
                            lexer->lookahead == '\'' ||
                            iswalnum(lexer->lookahead)) {
                            lexer->result_symbol = EXPANSION_WORD;
                            return advanced_once;
                        }
                        advanced_once = true;
                    } else {
                        advanced_once =
                            advanced_once || !iswspace(lexer->lookahead);
                        advance_once_space =
                            advance_once_space || iswspace(lexer->lookahead);
                        advance(lexer);
                    }
                }
                lexer->mark_end(lexer);
                if (lexer->lookahead == ')') {
                    advanced_once = true;
                    advance(lexer);
                    lexer->mark_end(lexer);
                    if (lexer->lookahead == '}') {
                        return false;
                    }
                } else {
                    return false;
                }
            }

            if (lexer->lookahead == '\'') {
                return false;
            }

            if (lexer->eof(lexer)) {
                return false;
            }
            advanced_once = advanced_once || !iswspace(lexer->lookahead);
            advance_once_space =
                advance_once_space || iswspace(lexer->lookahead);
            advance(lexer);
        }
    }
brace_start:
    // Brace range expressions of the form {N..M}.
    if (valid_symbols[BRACE_START] && !in_error_recovery(valid_symbols)) {
        while (iswspace(lexer->lookahead)) {
            skip(lexer);
        }

        if (lexer->lookahead != '{') {
            return false;
        }

        advance(lexer);
        lexer->mark_end(lexer);

        while (isdigit(lexer->lookahead)) {
            advance(lexer);
        }

        if (lexer->lookahead != '.') {
            return false;
        }
        advance(lexer);

        if (lexer->lookahead != '.') {
            return false;
        }
        advance(lexer);

        while (isdigit(lexer->lookahead)) {
            advance(lexer);
        }

        if (lexer->lookahead != '}') {
            return false;
        }

        lexer->result_symbol = BRACE_START;
        return true;
    }

    return false;
}
// Entry points called by the tree-sitter runtime.

void *tree_sitter_shellspec_external_scanner_create(void) {
    Scanner *scanner = calloc(1, sizeof(Scanner));
    array_init(&scanner->heredocs);
    return scanner;
}

bool tree_sitter_shellspec_external_scanner_scan(void *payload, TSLexer *lexer,
                                                 const bool *valid_symbols) {
    Scanner *scanner = (Scanner *)payload;
    return scan(scanner, lexer, valid_symbols);
}

unsigned tree_sitter_shellspec_external_scanner_serialize(void *payload,
                                                          char *state) {
    Scanner *scanner = (Scanner *)payload;
    return serialize(scanner, state);
}

void tree_sitter_shellspec_external_scanner_deserialize(void *payload,
                                                        const char *state,
                                                        unsigned length) {
    Scanner *scanner = (Scanner *)payload;
    deserialize(scanner, state, length);
}

void tree_sitter_shellspec_external_scanner_destroy(void *payload) {
    Scanner *scanner = (Scanner *)payload;
    // Free each heredoc's owned arrays before freeing the list itself.
    for (size_t i = 0; i < scanner->heredocs.size; i++) {
        Heredoc *heredoc = array_get(&scanner->heredocs, i);
        array_delete(&heredoc->current_leading_word);
        array_delete(&heredoc->delimiter);
    }
    array_delete(&scanner->heredocs);
    free(scanner);
}