Security for Application Engineers

Input Validation and Injection Classes

Ravinder · 6 min read
Tags: Security, AppSec, Injection, XSS, SQLi, SSRF

Injection vulnerabilities have appeared in every edition of the OWASP Top 10 since the first list was published in 2003. That is not because they are hard to prevent. It is because engineers keep solving the same problem at the wrong layer: sanitizing input when the fix is parameterization, filtering output when the fix is encoding, or blocking URLs when the fix is restricting outbound connections.

Every injection class has a root cause and a correct fix. Understanding the root cause is what lets you apply the fix instinctively instead of reaching for a regex after the fact.

The Root Cause Is Context Confusion

Injection happens when untrusted data is interpreted as code or control syntax in a different context. A SQL query is a string with syntax; inserting unescaped user input into it makes the interpreter see user content as query syntax. The same applies to HTML, shell commands, and URLs.

The correct fix is always to keep the data and the syntax separate — parameterized queries, template escaping, shell argument arrays, and network-level restrictions.

SQL Injection

The vulnerability. String-concatenating user input into a SQL query allows an attacker to alter the query's meaning.

```python
# VULNERABLE: never do this
def get_user(username: str, db):
    query = f"SELECT * FROM users WHERE username = '{username}'"
    return db.execute(query).fetchone()

# Input: ' OR '1'='1
# Resulting query: SELECT * FROM users WHERE username = '' OR '1'='1'
# The WHERE clause is now true for every row, so the query matches all users
# (fetchone() then returns whichever matching row comes first).
```

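The fix: parameterized queries. Depending on the driver, the query and the parameter values either travel to the database separately (server-side binding), or the driver escapes the values itself using the database's quoting rules. Either way, the parameter is never parsed as SQL syntax.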

```python
# SAFE: parameterized. The placeholder style depends on the driver:
# %s for psycopg and the common MySQL drivers, ? for sqlite3.
def get_user(username: str, db):
    return db.execute(
        "SELECT * FROM users WHERE username = %s",
        (username,)
    ).fetchone()

# With SQLAlchemy ORM (also safe: uses bind parameters internally).
# User is your mapped model class.
def get_user_orm(username: str, session):
    return session.query(User).filter(User.username == username).first()
```
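To see the difference end to end, here is a minimal sketch using Python's built-in sqlite3 module and a throwaway in-memory database. The table and the two users are made up for the demo, and sqlite3 uses ? rather than %s as its placeholder.

```python
import sqlite3

# Throwaway in-memory database with two made-up users
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, username TEXT)")
conn.executemany("INSERT INTO users (username) VALUES (?)", [("alice",), ("bob",)])

payload = "' OR '1'='1"

def find_users_concat(username: str):
    # VULNERABLE: the payload becomes part of the SQL text
    return conn.execute(
        f"SELECT username FROM users WHERE username = '{username}'"
    ).fetchall()

def find_users_param(username: str):
    # SAFE: the payload is bound as a plain string value
    return conn.execute(
        "SELECT username FROM users WHERE username = ?", (username,)
    ).fetchall()

print(find_users_concat(payload))  # every row matches
print(find_users_param(payload))   # no user is literally named "' OR '1'='1"
```

The same payload that rewrites the concatenated query is just an odd-looking username to the parameterized one.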

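ORMs do not automatically protect you. Raw text() queries built with f-strings or .format() are still injectable, so watch for session.execute(text(f"...{user_input}...")). Use a named bind parameter instead: session.execute(text("SELECT * FROM users WHERE username = :username"), {"username": username}).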

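Second-order injection. A value can be written to the database safely, with parameters, and still cause injection later, when application code reads it back and concatenates it into another query. Values from your own database are easy to treat as trusted, which is exactly the mistake. The fix is the same: parameterize every query, including those built from database-sourced data.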
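The pattern is easiest to see in a small sketch, again assuming an in-memory sqlite3 database with hypothetical users and audit_log tables. The signup insert is parameterized and stores the attacker's string intact; the bug is the later query that treats the stored value as trusted.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE users (id INTEGER PRIMARY KEY, username TEXT);
    CREATE TABLE audit_log (username TEXT, action TEXT);
    """
)
# Step 1: the signup write is parameterized, so the quote is stored verbatim.
conn.execute("INSERT INTO users (username) VALUES (?)", ("x' OR '1'='1",))
conn.executemany(
    "INSERT INTO audit_log VALUES (?, ?)", [("alice", "login"), ("bob", "login")]
)

def load_username(user_id: int) -> str:
    (name,) = conn.execute(
        "SELECT username FROM users WHERE id = ?", (user_id,)
    ).fetchone()
    return name

def audit_entries_vulnerable(user_id: int):
    name = load_username(user_id)
    # Step 2: the stored value is trusted and concatenated, so it is parsed as SQL
    return conn.execute(
        f"SELECT * FROM audit_log WHERE username = '{name}'"
    ).fetchall()

def audit_entries_safe(user_id: int):
    name = load_username(user_id)
    # Same value, bound as a parameter: it is only ever compared as a string
    return conn.execute(
        "SELECT * FROM audit_log WHERE username = ?", (name,)
    ).fetchall()

print(audit_entries_vulnerable(1))  # leaks every user's audit rows
print(audit_entries_safe(1))        # no rows
```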

Cross-Site Scripting (XSS)

XSS injects attacker-controlled JavaScript into a page viewed by another user. The browser runs it in the victim's security context — reading cookies (when not HttpOnly), exfiltrating tokens, performing actions as the victim.

```mermaid
flowchart LR
    A["Attacker posts a comment containing a script tag"] --> B["Comment stored in DB"]
    B --> C["Victim loads page"]
    C --> D["Browser renders comment as HTML and the script executes"]
    D --> E["Session cookie sent to evil.com"]
```

Stored XSS (above) is the most impactful — one injection hits every user who views the content.

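The fix: context-aware output encoding. Encode for the context where the data is inserted: HTML entity encoding in element text and quoted attribute values, JavaScript string escaping inside scripts, and percent-encoding inside URLs.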

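```python
# Flask enables Jinja2 autoescaping for .html templates by default.
# A bare jinja2.Environment does not: pass autoescape=select_autoescape().
# Never disable autoescape for user-controlled content.
from flask import render_template
from markupsafe import Markup

def render_comment_unsafe(user_comment: str):
    # VULNERABLE: Markup() marks the whole string as already safe,
    # so the interpolated comment is never escaped
    return Markup(f"<p>{user_comment}</p>")

def render_comment_safe(user_comment: str):
    # SAFE: Jinja2 escapes {{ comment }} in comment.html at render time
    return render_template("comment.html", comment=user_comment)
```

```jsx
import DOMPurify from "dompurify";

// React: JSX auto-escapes string interpolation, so this is safe by default
<p>{userComment}</p>

// VULNERABLE: dangerouslySetInnerHTML skips escaping entirely
<p dangerouslySetInnerHTML={{ __html: userComment }} />

// If you must render user-supplied HTML, sanitize it first:
<p dangerouslySetInnerHTML={{ __html: DOMPurify.sanitize(userComment) }} />
```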
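Outside a template engine, Python's standard library covers the three most common output contexts. A minimal sketch; for_html, for_url_param, and for_js_string are illustrative helper names, not a framework API:

```python
import html
import json
from urllib.parse import quote

def for_html(value: str) -> str:
    # Element text or a *quoted* attribute value: escapes & < > " '
    return html.escape(value, quote=True)

def for_url_param(value: str) -> str:
    # One query-string value: percent-encode everything, including / & = ?
    return quote(value, safe="")

def for_js_string(value: str) -> str:
    # A string literal inside <script>: JSON-encode, then escape < > &
    # so a value containing "</script>" cannot close the script block
    return (
        json.dumps(value)
        .replace("<", "\\u003c")
        .replace(">", "\\u003e")
        .replace("&", "\\u0026")
    )

payload = '"><script>alert(1)</script>'
print(for_html(payload))  # &quot;&gt;&lt;script&gt;alert(1)&lt;/script&gt;
print(for_url_param(payload))
print(for_js_string(payload))
```

Each helper is correct only for its own context: HTML-escaped text is still dangerous inside a script block, and html.escape does not protect an unquoted attribute value.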

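Content Security Policy (CSP) is a defense-in-depth layer that restricts which scripts the browser will execute. In the policy below, script-src 'self' only allows scripts loaded from your own origin, so inline script blocks and inline event handlers such as onerror= are refused. Set it, but do not rely on it as the primary control: encoded output is the fix.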

```http
Content-Security-Policy: default-src 'self'; script-src 'self'; object-src 'none';
```
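Where the header is set depends on your stack (reverse proxy, framework, or middleware). As one framework-neutral sketch, a small WSGI middleware can attach the policy above to every response; csp_middleware is a hypothetical name:

```python
CSP = "default-src 'self'; script-src 'self'; object-src 'none'"

def csp_middleware(app):
    """Wrap a WSGI app so every response carries the CSP header."""
    def wrapped(environ, start_response):
        def start_with_csp(status, headers, exc_info=None):
            # Drop any existing CSP header so exactly one policy is sent
            headers = [
                (k, v) for k, v in headers
                if k.lower() != "content-security-policy"
            ]
            headers.append(("Content-Security-Policy", CSP))
            return start_response(status, headers, exc_info)
        return app(environ, start_with_csp)
    return wrapped
```

In Flask, for example, wrap the underlying WSGI callable with app.wsgi_app = csp_middleware(app.wsgi_app).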

Server-Side Request Forgery (SSRF)

SSRF occurs when your application fetches a URL supplied or influenced by a user. The request originates from your server — with access to internal networks, cloud metadata services, and services behind firewalls.

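```python
# VULNERABLE: fetches any URL the user provides
import httpx
from fastapi import APIRouter

router = APIRouter()

@router.get("/preview")
async def preview_url(url: str):
    async with httpx.AsyncClient() as client:
        resp = await client.get(url)
    return resp.text

# Attacker calls: /preview?url=http://169.254.169.254/latest/meta-data/
# On an AWS EC2 instance that still allows IMDSv1, this returns instance
# metadata, and the iam/security-credentials/ path beneath it exposes
# temporary IAM credentials for the instance role.
```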

Mitigations, layered:

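```python
import asyncio
import ipaddress
import socket
from urllib.parse import urlparse

import httpx
from fastapi import APIRouter, HTTPException

router = APIRouter()
ALLOWED_SCHEMES = {"https"}

def is_safe_url(url: str) -> bool:
    parsed = urlparse(url)
    if parsed.scheme not in ALLOWED_SCHEMES or not parsed.hostname:
        return False
    try:
        # getaddrinfo returns every A and AAAA record; gethostbyname would
        # return a single IPv4 address and never see IPv6 targets such as ::1
        infos = socket.getaddrinfo(
            parsed.hostname, parsed.port or 443, proto=socket.IPPROTO_TCP
        )
    except (OSError, ValueError):  # DNS failure, bad port, un-encodable hostname
        return False
    for *_, sockaddr in infos:
        ip = ipaddress.ip_address(sockaddr[0])
        # is_global is False for RFC 1918, loopback, 0.0.0.0/8, link-local
        # (including the 169.254.169.254 metadata address), IPv6 unique-local,
        # and other reserved ranges; reject if ANY resolved address is one
        if not ip.is_global:
            return False
    return True

@router.get("/preview")
async def preview_url(url: str):
    # getaddrinfo blocks, so run the check in a worker thread
    if not await asyncio.to_thread(is_safe_url, url):
        raise HTTPException(status_code=400, detail="URL not allowed")
    # httpx resolves the hostname again here: the gap DNS rebinding exploits
    async with httpx.AsyncClient(
        follow_redirects=False  # httpx's default, kept explicit: a redirect could target an internal IP
    ) as client:
        resp = await client.get(url, timeout=5.0)
    return resp.text
```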

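Network-level control (an egress firewall on the application host that blocks the RFC 1918 ranges and the 169.254.0.0/16 link-local range where cloud metadata services live) is a required complement. Application-level checks can be bypassed with DNS rebinding: the attacker's hostname resolves to a public address when your code checks it, then to an internal address moments later when the HTTP client resolves it again.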
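When the legitimate destinations are known in advance, an exact hostname allowlist is a stricter application-level check than filtering IP ranges, and if the allowlisted names are under your control, an attacker cannot repoint their DNS. It complements, rather than replaces, the egress firewall. A sketch with hypothetical host names:

```python
from urllib.parse import urlsplit

# Hypothetical allowlist: replace with the hosts your feature actually needs
ALLOWED_HOSTS = {"images.example.com", "cdn.example.org"}

def is_allowlisted(url: str) -> bool:
    parts = urlsplit(url)
    if parts.scheme != "https":
        return False
    # Reject embedded credentials: "allowed.host@evil.com" tricks rely on them
    if parts.username is not None or parts.password is not None:
        return False
    try:
        port = parts.port  # raises ValueError for a non-numeric or out-of-range port
    except ValueError:
        return False
    if port not in (None, 443):
        return False
    # .hostname is lowercased; compare exact names, never suffixes or substrings
    return parts.hostname in ALLOWED_HOSTS
```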

Command Injection

Command injection occurs when user input reaches a shell interpreter. On Linux/macOS, metacharacters like ;, |, $(), and && chain additional commands.

```python
import subprocess
from flask import request

# VULNERABLE: shell=True with user input
filename = request.args.get("filename")
output = subprocess.run(f"convert {filename} output.png", shell=True, capture_output=True)

# Input: "image.jpg; rm -rf /tmp"
# Runs: convert image.jpg; rm -rf /tmp
```

The fix: argument arrays, never shell strings.

```python
import pathlib
import subprocess

ALLOWED_EXT = {".jpg", ".jpeg", ".png", ".gif", ".webp"}

def convert_image(filename: str, output_dir: pathlib.Path) -> pathlib.Path:
    # Validate extension before touching the filesystem
    p = pathlib.Path(filename)
    if p.suffix.lower() not in ALLOWED_EXT:
        raise ValueError("File type not allowed")

    out = output_dir / "output.png"
    # Argument list: subprocess does NOT invoke a shell, so ; | $() && are
    # passed literally. The absolute path also stops a name like "-help.jpg"
    # from being read as a command-line option.
    subprocess.run(
        ["convert", str(p.resolve()), str(out)],
        capture_output=True,
        timeout=30,
        check=True,   # raises CalledProcessError on non-zero exit
    )
    # convert writes the result to disk (stdout is empty), so return the path
    return out
```
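A quick way to confirm that an argument list bypasses the shell, without running convert, is to hand the same payload to a child Python process that only prints the arguments it received. A minimal sketch using the standard library:

```python
import subprocess
import sys

payload = "image.jpg; rm -rf /tmp"

# The child process prints exactly the argv elements it was given
result = subprocess.run(
    [sys.executable, "-c", "import sys; print(sys.argv[1:])", payload, "output.png"],
    capture_output=True,
    text=True,
    check=True,
)
print(result.stdout.strip())  # ['image.jpg; rm -rf /tmp', 'output.png']
```

The semicolon arrives as an ordinary character inside one argument; no second command runs.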

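When you cannot avoid a shell call, pass every user-supplied string through shlex.quote(), but prefer the array form. shlex.quote() is designed for POSIX shells only; it is not guaranteed to be safe for Windows cmd.exe or other non-POSIX shells.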
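What quoting buys you can be checked without running a shell: shlex.split applies shell-like (POSIX) quoting rules, so it shows how the quoted command would be tokenized. The build_convert_command helper is hypothetical:

```python
import shlex

def build_convert_command(filename: str) -> str:
    # shlex.quote wraps the value so a POSIX shell treats it as one literal word
    return f"convert {shlex.quote(filename)} output.png"

cmd = build_convert_command("image.jpg; rm -rf /tmp")
print(cmd)  # convert 'image.jpg; rm -rf /tmp' output.png

# Tokenizing it back yields three words, not two commands
print(shlex.split(cmd))
```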

Input Validation as Defense-in-Depth

Parameterization and encoding are the primary controls. Validation is a defense-in-depth layer that catches garbage early and reduces attack surface.

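```python
# Pydantic v2 (pattern= and field_validator are v2 APIs); Python 3.10+ for X | None
from pydantic import BaseModel, HttpUrl, constr, field_validator

class UserProfileUpdate(BaseModel):
    username: constr(min_length=3, max_length=32, pattern=r'^[a-zA-Z0-9_-]+$')
    bio: constr(max_length=500)
    website: HttpUrl | None = None  # HttpUrl already accepts only http and https

    @field_validator("website")
    @classmethod
    def website_scheme(cls, v):
        # Redundant with HttpUrl, but keeps the rule explicit if the type changes
        if v is not None and v.scheme not in ("http", "https"):
            raise ValueError("Only http/https URLs allowed")
        return v
```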

Validation rejects malformed input before it reaches downstream systems. It does not replace parameterized queries or output encoding.
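If you write the same kind of check by hand with Python's re module, there is a subtle gap: with re.match and a ^...$ pattern, $ also matches just before a trailing newline. A stdlib sketch of the username rule using re.fullmatch instead; is_valid_username is an illustrative name:

```python
import re

# Same rule as the username field above: 3-32 letters, digits, _ or -
USERNAME_RE = re.compile(r"[a-zA-Z0-9_-]{3,32}")

def is_valid_username(value: str) -> bool:
    # fullmatch must consume the entire string, trailing newline included
    return USERNAME_RE.fullmatch(value) is not None

# The pitfall: with re.match, "$" also matches just before a final "\n"
print(bool(re.match(r"^[a-zA-Z0-9_-]{3,32}$", "alice\n")))  # True
print(is_valid_username("alice\n"))                           # False
```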

Key Takeaways

  • Injection is context confusion: the fix is always to keep data and syntax separate, not to sanitize the data harder.
  • SQL injection is prevented by parameterized queries — not escaping, not ORMs used carelessly, not input validation alone.
  • XSS is prevented by context-aware output encoding at render time; CSP adds defense-in-depth but does not replace encoding.
  • SSRF requires both application-level URL validation and network-level egress controls — DNS rebinding can bypass application checks alone.
  • Command injection is prevented by passing arguments as arrays to subprocess, never constructing shell command strings with user input.
  • Input validation (schema, type, range) is defense-in-depth, not a primary control — it reduces attack surface but cannot substitute for the structural fixes above.