lab · pq-share / sast · ~12 minute read

what the scanners catch

pq-share runs Semgrep and CodeQL on every push. The findings are open. This page reports them honestly — what was caught, what was missed, and what the gap teaches about the limits of pattern-based SAST when the application's "weakness" is a deliberate policy choice rather than a syntactic mistake.

§ 01

the setup

Two static analyzers run as GitHub Actions workflows on every push to main and on every pull request:

  • Semgrep, with --config=auto
  • CodeQL, with the security-extended query suites for Python and JavaScript

Both write SARIF (and JSON, in Semgrep's case) and upload the results as workflow artifacts. The reports are public — anyone can download the artifacts from any run, since the repository is public.

Why this matters for pq-share specifically: the application intentionally exposes weak primitives in its picker. Selecting AES-128-GCM or Ed25519 alone or X25519-only is not a bug — it's the demonstration. A naive read of the SAST output would say "look at all these weak ciphers!" The honest read, below, is more nuanced.

snapshot

Numbers below are from the original run on commit f97d027 (May 2026), which is the run that motivated this page. Both CodeQL findings were addressed in a27fb8d and 26d8c20. A later run surfaced 4 new Semgrep findings (all the same false-positive rule applied to the rename migration's hardcoded inputs), since muted by a workflow-level rule exclusion, so the latest run shows 0 findings from both scanners. The walk-through below preserves the original findings for educational value.

  • Semgrep — initial: 2 findings (--config=auto)
  • CodeQL — initial: 2 findings (security-extended)
  • Semgrep — current: 0 (4 FPs muted via workflow-level rule exclusion)
  • CodeQL — current: 0 (both findings addressed)

Four initial findings total across two scanners and roughly four thousand lines of code. By the standards of "auto-config Semgrep on a real codebase," that's quiet. By the standards of "demonstration application that intentionally offers weak cryptography for educational purposes," it's notable for what didn't appear.

§ 02

what they caught

One finding at a time, with the verdict and the reasoning. Three labels are used: tp for true positive (real issue), fp for false positive in this context, and policy for findings that depend on intent (the syntax is fine; the question is whether using this primitive for this purpose is appropriate).

Semgrep · python.sqlalchemy.security.audit.avoid-sqlalchemy-text
avoid-sqlalchemy-text backend/app/db.py:29, 32 false positive
sqlalchemy.text passes the constructed SQL statement to the database mostly unchanged. This means that the usual SQL injection protections are not applied and this function is vulnerable to SQL injection if user input can reach here.

The flagged code is the in-place migration runner:

async def _apply_column_additions(conn) -> None:
    for table, column, ddl in _PENDING_COLUMNS:
        rows = await conn.execute(text(f"PRAGMA table_info({table})"))
        existing = {row[1] for row in rows.fetchall()}
        if column not in existing:
            await conn.execute(text(f"ALTER TABLE {table} ADD COLUMN {column} {ddl}"))

Both the table and column names come from _PENDING_COLUMNS, a hardcoded constant tuple defined ten lines above. There is no path by which user input reaches the f-string. Semgrep's rule is an audit rule: it flags the construct itself without attempting to infer whether a tainted source can actually reach it.

Why the rule is still valuable: if a future contributor refactored _PENDING_COLUMNS to be loaded from a config file, or if the migration grew to support user-named tables, this exact site would become exploitable. The rule is correctly drawing a line at the construct, not the current data flow.
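If that refactor ever happens, one defensive pattern is to validate each name against the shape of a bare SQL identifier before interpolating it into DDL. A hypothetical sketch (not pq-share's code; the helper name is invented):

```python
import re

_IDENT = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")

def safe_ident(name: str) -> str:
    """Reject anything that is not a bare SQL identifier before it is
    interpolated into a DDL f-string. Raises on metacharacters."""
    if not _IDENT.fullmatch(name):
        raise ValueError(f"unsafe SQL identifier: {name!r}")
    return name

# usage sketch:
#   text(f"ALTER TABLE {safe_ident(table)} ADD COLUMN {safe_ident(column)} {ddl}")
assert safe_ident("share_files") == "share_files"
```

This doesn't silence the rule (the f-string construct remains), but it turns the hypothetical future injection into a hard error at the call site.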

CodeQL · py/log-injection
log-injection backend/app/email_send.py:32 true positive — fixed in 26d8c20
This log entry depends on a user-provided value.

The flagged line:

log.info("sent email to=%s subject=%r via %s", to, subject, settings.smtp_host)

Both to (a recipient email address) and subject (currently a server-constructed string, but derived from sender email and file metadata) reach this log call. A malicious recipient address containing a newline could forge a fake log line that looks like another event — a classic log-injection pattern. Severity is low: the affected log channel is structured and the impact is limited to log readability, but the rule is correct that user-controlled data is reaching a structured-log sink without sanitization.
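The forgery is easy to demonstrate with the standard logging module (this is a self-contained toy, not pq-share's actual logger or field names):

```python
import io
import logging

buf = io.StringIO()
logging.basicConfig(stream=buf, level=logging.INFO,
                    format="%(levelname)s %(message)s", force=True)
log = logging.getLogger("email")

# a recipient address containing a newline forges a second log line
evil_to = "victim@example.com\nINFO sent email to=admin@example.com"
log.info("sent email to=%s", evil_to)

# buf now holds two lines; the second reads like a separate INFO event
assert "\nINFO sent email to=admin@example.com" in buf.getvalue()
```

With a line-oriented formatter, nothing distinguishes the forged second line from a genuine event.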

Mitigation: either sanitize newlines from log fields explicitly, switch to a JSON log format that escapes them by construction, or use a logging library that handles this at the formatter level. We'll do one of these in a follow-up commit.
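The first option looks roughly like this — shown for completeness, since the follow-up section below reports that CodeQL's taint analysis did not recognize a helper of this shape as a sanitizer (the helper name here is hypothetical):

```python
def scrub(value: object) -> str:
    """Replace control characters (including CR/LF) with spaces so a
    user-supplied value cannot break out of its log line."""
    return "".join(ch if ch.isprintable() else " " for ch in str(value))

assert scrub("victim@example.com\nINFO forged") == "victim@example.com INFO forged"
```

A JSON formatter gets the same effect by construction, since string values are escaped before they reach the sink.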

CodeQL · js/clear-text-storage-of-sensitive-data
clear-text-storage-of-sensitive-data frontend/app.js:379 false positive — addressed in a27fb8d
This stores sensitive data returned by an access to wrapped_priv_password as clear text.

The flagged line:

sessionStorage.setItem(SS.WRAPPED_BLOB, wrappedB64);

The data flow here is resp.wrapped_priv_password → wrappedB64 → sessionStorage. CodeQL applies a heuristic: variables with names like password, secret, or credential shouldn't end up in storage that isn't explicitly designated for them.

The variable name is misleading. The value is not a password. It is the AES-GCM ciphertext of the user's private-key bundle, encrypted under a key derived from the password via Argon2id. The actual password never leaves the form input; the actual wrapped key never leaves the user's browser. Storing the already-encrypted bundle in sessionStorage is the design — it lets the user reload the tab and re-derive wrapKey with their password rather than having to log in again.
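The property that makes this workable is that the wrap key is deterministically re-derivable from the password plus a stored salt, so only ciphertext ever needs persisting. A stand-in sketch, not pq-share's code: stdlib scrypt in place of Argon2id, and only the derivation step, not the AES-GCM wrap:

```python
import hashlib
import os

def derive_wrap_key(password: str, salt: bytes) -> bytes:
    # stand-in KDF: hashlib.scrypt, since the stdlib has no Argon2id
    return hashlib.scrypt(password.encode(), salt=salt,
                          n=2**14, r=8, p=1, maxmem=2**26, dklen=32)

salt = os.urandom(16)
# the same password and salt always re-derive the same wrap key,
# so the key and the password never need to be stored anywhere
assert derive_wrap_key("hunter2", salt) == derive_wrap_key("hunter2", salt)
assert derive_wrap_key("wrong", salt) != derive_wrap_key("hunter2", salt)
```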

The rule is right to be suspicious; the variable is misleadingly named. Renaming wrapped_priv_password to wrapped_priv_blob silences the heuristic without changing any behavior — and, per the follow-up below, that rename landed in a27fb8d.

follow-up · commits a27fb8d & 26d8c20

Both CodeQL findings are now fixed. a27fb8d renamed wrapped_priv_password (column, API fields, JS callers) to wrapped_priv_blob with an idempotent RENAME COLUMN migration — the clear-text-storage finding cleared on the next CI run.

The first attempt at the log-injection fix routed user input through a control-char sanitizer helper, but CodeQL's taint analysis didn't recognize the helper as a sanitizer and the finding returned at the new line. 26d8c20 took the cleaner approach: drop user-controlled fields from the log line entirely. The database is the authoritative audit trail; the log only needs to confirm SMTP succeeded. After 26d8c20, both CodeQL files report 0 findings.

The Semgrep avoid-sqlalchemy-text findings — 4 in the migration helpers — were initially left as-is so the alarm would re-fire if anyone fed user input into the DDL builders. After the failing workflow started raising real noise on every push, we tried per-line # nosemgrep annotations (both rule-id-suffixed and bare) with a justifying comment block. Neither form was honored by semgrep ci or by semgrep scan — the findings still appeared as "blocking". We landed on --exclude-rule=…avoid-sqlalchemy-text… in the Semgrep workflow command. The rule is now skipped globally rather than per-line; in exchange, a future text(f"…") call elsewhere in the code would slip through. That tradeoff is acceptable here because the rule fires on the construct regardless of data flow, so every hit would have to be triaged manually anyway.

Honest accounting: the chain of attempts was 5f1c09a (rule-id-suffixed nosemgrep, ignored) → e362586 (bare nosemgrep, ignored) → 4b0fd17 (switched to semgrep scan --error, exit-code elevation also failed) → 43bf3a2 (workflow-level exclude — clean). The empirical lesson: inline Semgrep suppression is unreliable when running --config=auto without a Semgrep Cloud token, and the only mechanism that works is dropping the rule at the CLI invocation.

§ 03

what they missed

pq-share's picker offers five options for key exchange and five for digital signatures. A naive observer would expect a SAST scan to flag most of these — they include X25519 alone (no PQ hedge), ECDSA-P256 (broken by Shor's algorithm in polynomial time on a CRQC), AES-128-GCM (Grover-reducible to 64-bit effective security), and so on. Zero of those primitives were flagged by either scanner.

Why? Because the picker uses every primitive correctly. Each call site is a well-formed invocation of a vetted library:

// frontend/crypto.js
export function ed25519Sign(message, priv) { return ed25519.sign(message, priv); }
export async function aesGcmSeal(key, plaintext) {
  const cryptoKey = await crypto.subtle.importKey("raw", key, "AES-GCM", false, ["encrypt"]);
  const iv = crypto.getRandomValues(new Uint8Array(12));
  ...
}

There is no AES.MODE_ECB, no hardcoded key, no reused IV, no missing authentication tag. SAST tools detect syntactic patterns — constructs that are wrong on their face. The weakness pq-share is demonstrating is at a different layer: it's a policy question of whether a given primitive is appropriate for a given threat model. Ed25519 is a perfectly correct EdDSA implementation; it is also forgeable by a sufficiently large quantum computer. Both statements are true. SAST tools can verify the first, not the second.

Caught (syntactic)

  • text(f"...") — possible SQL injection vector
  • User input in log strings — log injection
  • Sensitive-named variable in storage — heuristic match
  • (Would catch: hardcoded keys, reused IVs, ECB mode, deprecated algorithms like MD5, missing auth tags, hardcoded JWT secrets, etc.)

Missed (policy)

  • Picker offering X25519 alone (no PQ hedge)
  • Picker offering secp384r1 alone (no PQ hedge)
  • Picker offering Ed25519 alone
  • Picker offering ECDSA-P384 alone
  • Picker offering AES-128-GCM (Grover concern)
  • Bundle re-encrypted under deterministic wrapKey — no forward secrecy on long-term storage
  • Same recovery code grants full account access — single-factor recovery

Each item in the right column is a real consideration in pq-share's threat model — and each is documented openly in the user guide's threat model section. The right tool to catch them is not a SAST scanner; it's an explicit policy: "this codebase declares what it considers acceptable, and any contributor can read the list." pq-share's CryptoSuite Pydantic schema and PRESET_LOCKS table do exactly that. They enumerate which primitive combinations are permitted under which regulatory claim, and the validator rejects any upload that lies about which preset it satisfies.
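The shape of that enumeration is simple enough to sketch. The preset names and suite combinations below are invented for illustration, not taken from pq-share's actual PRESET_LOCKS table:

```python
# hypothetical allowlist in the spirit of PRESET_LOCKS: each preset
# names the only (key-exchange, signature) pairs it permits
PRESET_LOCKS: dict[str, set[tuple[str, str]]] = {
    "demo":      {("x25519", "ed25519"), ("x25519", "ecdsa-p256")},
    "hybrid-pq": {("x25519+ml-kem-768", "ed25519+ml-dsa-65")},
}

def validate_suite(preset: str, kex: str, sig: str) -> None:
    """Reject any upload that claims a preset its suite doesn't satisfy."""
    allowed = PRESET_LOCKS.get(preset)
    if allowed is None or (kex, sig) not in allowed:
        raise ValueError(f"suite ({kex}, {sig}) not permitted under preset {preset!r}")

validate_suite("hybrid-pq", "x25519+ml-kem-768", "ed25519+ml-dsa-65")  # ok
```

The point is that the policy is data, not pattern matching: any contributor can read the table, and the validator enforces it at the boundary.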

§ 04

what this teaches

Three things, in increasing order of nuance.

1. SAST is high-leverage for syntactic mistakes

Of pq-share's four initial findings, three flagged real patterns: the two text() hits would become bugs under a predictable refactor of the migration code, and the log call benefits from a structural mitigation (sanitization, or dropping the user-controlled fields). The fourth was a heuristic match driven by misleading naming. None of these would have surfaced without a scanner. Run them. Triage the output. Don't ignore findings just because the current call site happens to be safe.

2. SAST cannot reason about appropriateness

The same scanners would, on most production codebases, generate dozens or hundreds of findings about weak crypto patterns — but only when the patterns are syntactic. A correct use of a weak primitive looks identical, on the wire and in the AST, to a correct use of a strong one. The decision lives in the threat model, not the source. Pair SAST with explicit policy enumeration.

3. The educational value of pq-share's findings is the gap

pq-share's whole point is that the choice of cipher suite is the subject of the demonstration. The picker exposes every primitive openly. Off-the-shelf SAST does not flag any of them, because there's nothing wrong with how they're used — only with whether they should be used at all, and that's not a question SAST is built to answer.

If you're shipping a real product, the lesson is to add project-specific rules that encode your policy — Semgrep supports custom YAML rules matched against the abstract syntax tree, and CodeQL has its own query language for the same purpose. Forbid the picker's "AES-128-GCM" string literal in production code paths. Forbid Ed25519 as a sole signature algorithm in any preset that isn't labeled "demo." The scanner becomes a guardrail for human judgment, not a substitute for it.
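As a sketch of what such a rule could look like — the rule id, message, and excluded path here are invented, not taken from pq-share's CI:

```yaml
rules:
  - id: no-aes-128-gcm-in-prod
    languages: [javascript]
    severity: ERROR
    message: AES-128-GCM is demo-only; production presets must use AES-256-GCM.
    pattern: '"AES-128-GCM"'
    paths:
      exclude:
        - frontend/demo/**
```

A rule like this fires on the string literal itself, so it enforces the policy wherever the literal appears, regardless of data flow.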

§ 05

reproduce locally

Both scanners run on every push to the public repo. To run the same scans locally:

Semgrep

pip install semgrep
cd path/to/pq-share
semgrep --config=auto --json-output=semgrep.json --sarif-output=semgrep.sarif

CodeQL

# Install the CodeQL CLI from
# https://docs.github.com/en/code-security/codeql-cli
cd path/to/pq-share
codeql database create --language=python --source-root=. db-py
codeql database analyze --format=sarif-latest --output=python.sarif \
    db-py codeql/python-queries:codeql-suites/python-security-extended.qls

codeql database create --language=javascript --source-root=. db-js
codeql database analyze --format=sarif-latest --output=javascript.sarif \
    db-js codeql/javascript-queries:codeql-suites/javascript-security-extended.qls

From CI artifacts

Or download the latest CI artifacts directly with the GitHub CLI:

gh run list --limit 5 -R harpoonsecurity/pq-share
gh run download <run-id> -R harpoonsecurity/pq-share

That writes four directories — semgrep-results-json/, semgrep-results-sarif/, codeql-results-python/, codeql-results-javascript-typescript/ — each with the corresponding raw output. Artifacts are retained for 90 days per run.
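Once downloaded, counting findings in a SARIF file is a fold over runs[].results[] (per the SARIF 2.1.0 structure). A minimal sketch; the file path in the comment is illustrative:

```python
import json

def sarif_finding_count(doc: dict) -> int:
    """Total results across all runs in a parsed SARIF document."""
    return sum(len(run.get("results", [])) for run in doc.get("runs", []))

# e.g.: sarif_finding_count(json.load(open("semgrep-results-sarif/results.sarif")))
sample = {"runs": [{"results": [{}, {}]}, {"results": [{}]}]}
assert sarif_finding_count(sample) == 3
```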

§ 06

see also