# GitHub Actions & Workflows Auditor
A Python CLI tool for analyzing, auditing, and tracking
GitHub Actions workflows and actions.
## Features
- **Comprehensive Scanning**: Discovers workflows (`.github/workflows/*.yml`)
and action manifests (`action.yml`)
- **Action Resolution**: Resolves GitHub action references to specific SHAs
via GitHub API
- **Monorepo Support**: Handles monorepo actions like `owner/repo/path@ref`
- **Policy Validation**: Enforces security and best practice policies
- **Diff Mode**: Compare current state against baselines to track changes
over time
- **Multiple Output Formats**: JSON and Markdown reports
- **Fast & Cached**: Uses `uv` for dependency management and caches
API responses on disk
- **Rich Analysis**: Extracts triggers, permissions, secrets, runners,
containers, services, and more
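To illustrate the `owner/repo/path@ref` form mentioned above, here is a minimal sketch of how such a reference could be split into its parts. `ParsedRef` and `parse_action_ref` are hypothetical names chosen for this example, not the tool's actual API, and local (`./path`) or Docker references are deliberately not handled.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ParsedRef:
    """Hypothetical container; the tool's real model may differ."""

    owner: str
    repo: str
    path: Optional[str]  # sub-directory for monorepo actions, else None
    ref: str


def parse_action_ref(uses: str) -> ParsedRef:
    """Split 'owner/repo[/path]@ref' into its parts."""
    # Split on the last '@' so that only the trailing ref is separated.
    target, sep, ref = uses.rpartition("@")
    if not sep or not ref:
        raise ValueError(f"missing @ref in {uses!r}")
    parts = target.split("/")
    if len(parts) < 2 or not all(parts):
        raise ValueError(f"expected owner/repo in {uses!r}")
    owner, repo, *rest = parts
    # Anything after owner/repo is the path inside the repository.
    return ParsedRef(owner, repo, "/".join(rest) or None, ref)


print(parse_action_ref("actions/checkout@v4"))
print(parse_action_ref("owner/repo/path@ref"))
```

Splitting on the last `@` keeps the sketch simple; a real resolver would also need to distinguish tags, branches, and SHAs.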
## Usage (Recommended)
Run directly with `uvx` without installation:
```bash
# Scan current directory
uvx ghaw-auditor scan
# Scan specific repository
uvx ghaw-auditor scan --repo /path/to/repo
# With GitHub token for better rate limits
GITHUB_TOKEN=ghp_xxx uvx ghaw-auditor scan --repo /path/to/repo
# List unique actions
uvx ghaw-auditor inventory --repo /path/to/repo
# Validate against policy
uvx ghaw-auditor validate --policy policy.yml --enforce
```
> **Note:** `uvx` runs the tool directly without installation.
> For frequent use or CI pipelines, see
> [Installation](#installation-optional) below.
## Installation (Optional)
### Using uv (recommended)
```bash
# Install uv if you don't have it
curl -LsSf https://astral.sh/uv/install.sh | sh
# Clone and install
git clone <repo-url>
cd ghaw-auditor
uv sync
# Install in editable mode
uv pip install -e .
```
### Using pipx
```bash
pipx install .
```
> **When to install:** Install locally if you use the tool frequently,
> need it in CI pipelines, or want faster execution (no download on each run).
## Commands
> **Note:** Examples use `uvx ghaw-auditor`.
> If installed locally, use `ghaw-auditor` directly.
### `scan` - Full Analysis
Analyzes workflows, resolves actions, and generates reports.
```bash
# Basic scan
uvx ghaw-auditor scan --repo .
# Full scan with all options
uvx ghaw-auditor scan \
  --repo . \
  --output .audit \
  --format all \
  --token "$GITHUB_TOKEN" \
  --concurrency 8 \
  --write-baseline
# Offline mode (no API calls)
uvx ghaw-auditor scan --offline --format md
```
**Options:**
- `--repo <path>` - Repository path (default: `.`)
- `--token <str>` - GitHub token (env: `GITHUB_TOKEN`)
- `--output <dir>` - Output directory (default: `.ghaw-auditor`)
- `--format <json|md|all>` - Output format (default: `all`)
- `--cache-dir <dir>` - Cache directory
- `--offline` - Skip API resolution
- `--concurrency <int>` - API concurrency (default: 4)
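- `--write-baseline` - Save a baseline snapshot in the output directory
- `--diff` - Compare the scan against a baseline
- `--baseline <dir>` - Baseline directory to compare against
- `--verbose`, `--quiet` - Logging levels
- `--log-json` - Emit logs as JSON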
### `inventory` - List Actions
Print deduplicated action inventory.
```bash
uvx ghaw-auditor inventory --repo /path/to/repo
# Output:
# Unique Actions: 15
# • actions/checkout@08c6903cd8c0fde910a37f88322edcfb5dd907a8
# • actions/setup-go@44694675825211faa026b3c33043df3e48a5fa00
# ...
```
### `validate` - Policy Validation
Validate workflows against policies.
```bash
# Validate with default policy
uvx ghaw-auditor validate --repo .
# Validate with custom policy
uvx ghaw-auditor validate --policy policy.yml --enforce
```
**Options:**
- `--policy <file>` - Policy file path
- `--enforce` - Exit non-zero on violations
## Diff Mode
Track changes over time by comparing against baselines.
```bash
# Create initial baseline
uvx ghaw-auditor scan --write-baseline --output .audit
# Later, compare against baseline (same output directory)
uvx ghaw-auditor scan --diff --baseline .audit/baseline --output .audit
# Output: .audit/diff/report.diff.md
```
**Baseline contents:**
- `baseline/actions.json` - Action inventory snapshot
- `baseline/workflows.json` - Workflow metadata snapshot
- `baseline/meta.json` - Auditor version, commit SHA, timestamp
**Diff reports show:**
- Added/removed/modified workflows
- Added/removed actions
- Changes to permissions, triggers, concurrency, secrets, etc.
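The added/removed comparison described above comes down to a set difference. The sketch below assumes each inventory is available as a plain list of `owner/repo@sha` strings; the real schema of `actions.json` is not documented here, and `diff_actions` is a hypothetical helper, not part of the tool.

```python
def diff_actions(baseline: list[str], current: list[str]) -> dict[str, list[str]]:
    """Return the actions that appear only in one of the two inventories."""
    old, new = set(baseline), set(current)
    return {
        "added": sorted(new - old),    # present now, absent in the baseline
        "removed": sorted(old - new),  # present in the baseline, gone now
    }


baseline = ["actions/checkout@08c6903cd8c0fde910a37f88322edcfb5dd907a8"]
current = [
    "actions/checkout@08c6903cd8c0fde910a37f88322edcfb5dd907a8",
    "actions/setup-go@44694675825211faa026b3c33043df3e48a5fa00",
]
print(diff_actions(baseline, current))
```

Because the SHA is part of each string, a re-pinned action shows up as one removal plus one addition in this simplified model.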
## Output
The tool generates structured reports in the output directory:
### JSON Files
- **`actions.json`** - Deduplicated action inventory with manifests
- **`workflows.json`** - Complete workflow metadata
- **`violations.json`** - Policy violations
### Markdown Report
**`report.md`** includes:
- Summary (workflow count, action count, violations)
- Analysis (triggers, runners, secrets, permissions)
- Per-workflow details (jobs, actions used, configuration)
- Action inventory with inputs/outputs
- Policy violations
### Example Output
```text
.ghaw-auditor/
├── actions.json
├── workflows.json
├── violations.json
├── report.md
├── baseline/
│   ├── actions.json
│   ├── workflows.json
│   └── meta.json
└── diff/
    ├── actions.diff.json
    ├── workflows.diff.json
    └── report.diff.md
```
## Policy Configuration
Create `policy.yml` to enforce policies:
```yaml
require_pinned_actions: true # Actions must use SHA refs
forbid_branch_refs: true # Forbid branch refs (main, master, etc.)
require_concurrency_on_pr: true # PR workflows must have concurrency
allowed_actions: # Whitelist
- actions/*
- github/*
- docker/*
denied_actions: # Blacklist
- dangerous/action
min_permissions: true # Enforce least-privilege
```
**Policy rules:**
- `require_pinned_actions` - Actions must be pinned to SHA (not tags/branches)
- `forbid_branch_refs` - Forbid branch references (main, master, develop)
- `allowed_actions` - Whitelist of allowed actions (glob patterns)
- `denied_actions` - Blacklist of forbidden actions
- `require_concurrency_on_pr` - PR workflows must set concurrency groups
- `min_permissions` - Enforce least-privilege permissions
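As a rough illustration of how the pinning and allow/deny rules could be evaluated, the sketch below checks a single action against them. It assumes a pinned ref is a full 40-character hexadecimal commit SHA and that glob patterns follow `fnmatch` semantics; `is_pinned` and `check_action` are hypothetical helpers, and the tool's own matching logic may differ.

```python
import re
from fnmatch import fnmatchcase

# Assumption: "pinned" means a full 40-character lowercase hex commit SHA.
SHA_RE = re.compile(r"[0-9a-f]{40}")


def is_pinned(ref: str) -> bool:
    """True when the ref looks like a full commit SHA."""
    return SHA_RE.fullmatch(ref) is not None


def check_action(name: str, ref: str, allowed: list[str], denied: list[str]) -> list[str]:
    """Return human-readable problems for one action reference."""
    problems = []
    if not is_pinned(ref):
        problems.append(f"{name} is not pinned to SHA: {ref}")
    if any(fnmatchcase(name, pattern) for pattern in denied):
        problems.append(f"{name} is in denied_actions")
    elif allowed and not any(fnmatchcase(name, pattern) for pattern in allowed):
        problems.append(f"{name} is not in allowed_actions")
    return problems


print(check_action("actions/checkout", "v4", ["actions/*"], ["dangerous/action"]))
print(check_action("dangerous/action", "a" * 40, ["actions/*"], ["dangerous/action"]))
```

In this sketch the deny list wins over the allow list, and an empty allow list means every action is allowed; both are design choices of the example rather than documented behavior.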
**Enforcement:**
```bash
# Warn on violations
uvx ghaw-auditor validate --policy policy.yml
# Fail CI on violations
uvx ghaw-auditor validate --policy policy.yml --enforce
# Exit code: 0 (pass), 1 (violations), 2 (error)
```
## Extracted Metadata
### Workflows
- Name, path, triggers (push, PR, schedule, etc.)
- Permissions (workflow & job-level)
- Concurrency groups
- Environment variables
- Reusable workflow contracts (inputs, outputs, secrets)
### Jobs
- Runner (`runs-on`)
- Dependencies (`needs`)
- Conditions (`if`)
- Timeouts
- Container & service configurations
- Matrix strategies
- Actions used per job
### Actions
- Type (GitHub, local, Docker)
- Resolved SHAs for GitHub actions
- Input/output definitions
- Runtime (composite, Docker, Node.js)
- Monorepo path support
### Security
- Secrets used (`${{ secrets.* }}`)
- Permissions (contents, packages, issues, etc.)
- Service containers (databases, caches)
- External actions (owner/repo resolution)
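Secret usage of the `${{ secrets.* }}` form can be found with a pair of regular expressions. This is a simplified sketch that scans raw workflow text rather than parsed YAML, so it would miss bare `if:` conditions written without `${{ }}`; `find_secrets` is a hypothetical helper, not the tool's analyzer.

```python
import re

# Match each ${{ ... }} expression, then look for secrets.NAME inside it.
EXPR_RE = re.compile(r"\$\{\{(.*?)\}\}", re.DOTALL)
NAME_RE = re.compile(r"\bsecrets\.([A-Za-z_][A-Za-z0-9_]*)")


def find_secrets(text: str) -> list[str]:
    """Return the sorted, deduplicated secret names referenced in expressions."""
    names: set[str] = set()
    for expression in EXPR_RE.findall(text):
        names.update(NAME_RE.findall(expression))
    return sorted(names)


workflow_text = """
env:
  TOKEN: ${{ secrets.GITHUB_TOKEN }}
  KEY: ${{ secrets.API_KEY || secrets.FALLBACK }}
run: echo secrets.NOT_AN_EXPRESSION
"""
print(find_secrets(workflow_text))
```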
## Architecture
**Layers:**
- `cli` - Typer-based CLI interface
- `scanner` - File discovery
- `parser` - YAML parsing (ruamel.yaml)
- `resolver` - GitHub API integration
- `analyzer` - Pattern extraction
- `policy` - Policy validation
- `renderer` - JSON/Markdown reports
- `differ` - Baseline comparison
- `cache` - Disk-based caching
- `github_client` - HTTP client with retries
**Models (Pydantic):**
- `ActionRef`, `ActionManifest`
- `WorkflowMeta`, `JobMeta`
- `Permissions`, `Strategy`, `Container`, `Service`
- `Policy`, `Baseline`, `DiffEntry`
## Development
```bash
# Install dependencies
uv sync
# Run locally
uv run ghaw-auditor scan --repo .
# Run tests
uv run -m pytest
# Lint
uvx ruff check .
# Format
uvx ruff format .
# Type check
uvx mypy .
# Coverage
uv run -m pytest --cov --cov-report=html
```
## CI Integration
### GitHub Actions
```yaml
- name: Audit GitHub Actions
  run: |
    uvx ghaw-auditor scan --output audit-results
    uvx ghaw-auditor validate --policy policy.yml --enforce
  env:
    GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
- name: Upload Audit Results
  uses: actions/upload-artifact@v4
  with:
    name: audit-results
    path: audit-results/
```
> **Alternative:** For faster CI runs, cache the installation:
> `pip install ghaw-auditor` then use `ghaw-auditor` directly.
### Baseline Tracking
```yaml
- name: Compare Against Baseline
  run: |
    uvx ghaw-auditor scan --diff --baseline .audit/baseline --output .audit
    cat .audit/diff/report.diff.md >> "$GITHUB_STEP_SUMMARY"
```
## Examples
### Analyze a Repository
```bash
uvx ghaw-auditor scan --repo ~/projects/myrepo
```
Output:
```text
Scanning repository...
Found 7 workflows and 2 actions
Parsing workflows...
Found 15 unique action references
Resolving actions...
Analyzing workflows...
Generating reports...
✓ Audit complete! Reports in .ghaw-auditor
```
### Track Changes Over Time
```bash
# Day 1: Create baseline
uvx ghaw-auditor scan --write-baseline
# Day 7: Check for changes
uvx ghaw-auditor scan --diff --baseline .ghaw-auditor/baseline
# View diff
cat .ghaw-auditor/diff/report.diff.md
```
### Validate Security Policies
```bash
# Check for unpinned actions
uvx ghaw-auditor validate --enforce
# Output:
# [ERROR] .github/workflows/ci.yml: Action actions/checkout
# is not pinned to SHA: v4
# Policy enforcement failed: 1 errors
```
### Generate Inventory
```bash
uvx ghaw-auditor inventory --repo . > actions-inventory.txt
```
## Performance
- **Parallel API calls** - Configurable concurrency (default: 4)
- **Disk caching** - API responses cached with TTL
- **Fast parsing** - Efficient YAML parsing with ruamel.yaml
- **Target**: 100+ workflows in < 60 seconds (with warm cache)
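The disk cache with TTL mentioned above can be pictured roughly as follows. This is a minimal sketch under stated assumptions (one JSON file per key, a timestamp stored next to the value); `DiskCache` and `get_or_fetch` are hypothetical names, and the tool's actual cache layer may be built differently.

```python
import hashlib
import json
import time
from pathlib import Path
from typing import Any, Callable


class DiskCache:
    """Tiny JSON-on-disk cache whose entries expire after `ttl` seconds."""

    def __init__(self, directory: Path, ttl: float = 3600.0) -> None:
        self.directory = directory
        self.ttl = ttl
        directory.mkdir(parents=True, exist_ok=True)

    def _path(self, key: str) -> Path:
        # Hash the key so that URLs and other long keys become safe file names.
        digest = hashlib.sha256(key.encode("utf-8")).hexdigest()
        return self.directory / f"{digest}.json"

    def get_or_fetch(self, key: str, fetch: Callable[[], Any]) -> Any:
        """Return a fresh cached value, or call `fetch` and store its result."""
        path = self._path(key)
        if path.exists():
            entry = json.loads(path.read_text(encoding="utf-8"))
            if time.time() - entry["stored_at"] < self.ttl:
                return entry["value"]
        value = fetch()
        entry = {"stored_at": time.time(), "value": value}
        path.write_text(json.dumps(entry), encoding="utf-8")
        return value
```

A caller would wrap each API request in `get_or_fetch`, so repeated scans within the TTL window avoid network calls entirely.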
## Configuration
Optional `auditor.yaml` in repo root:
```yaml
exclude_paths:
  - "**/node_modules/**"
  - "**/vendor/**"
cache:
  dir: ~/.cache/ghaw-auditor
  ttl: 3600 # 1 hour
policies:
  require_pinned_actions: true
  forbid_branch_refs: true
```
## Troubleshooting
### Rate Limiting
```bash
# Set GitHub token for higher rate limits
export GITHUB_TOKEN=ghp_xxx
uvx ghaw-auditor scan
```
### Large Repositories
```bash
# Increase concurrency
uvx ghaw-auditor scan --concurrency 10
# Use offline mode for local analysis
uvx ghaw-auditor scan --offline
```
### Debugging
```bash
# Verbose output
uvx ghaw-auditor scan --verbose
# JSON logging for CI
uvx ghaw-auditor scan --log-json
```
## License
MIT
## Contributing
Contributions welcome! Please ensure:
- Tests pass: `uv run -m pytest`
- Code formatted: `uvx ruff format .`
- Linting clean: `uvx ruff check .`
- Type hints valid: `uvx mypy .`
- Coverage ≥ 85%