Reading data is half the job; modifying it is the other half — and where the most damage happens. This lesson covers writing data and the transactional discipline that keeps it consistent.
INSERT
INSERT INTO users (email, name)
VALUES ('alex@example.com', 'Alex');
-- Multiple rows in one statement
INSERT INTO users (email, name) VALUES
('a@example.com', 'A'),
('b@example.com', 'B'),
('c@example.com', 'C');
-- Return generated values (Postgres)
INSERT INTO users (email, name)
VALUES ('alex@example.com', 'Alex')
RETURNING id, created_at;
-- Insert from a query
INSERT INTO archived_users (id, email)
SELECT id, email FROM users WHERE created_at < '2020-01-01';
Upsert (INSERT ... ON CONFLICT)
"Insert if it doesn't exist, otherwise update":
-- Postgres
INSERT INTO users (email, name)
VALUES ('alex@example.com', 'Alex')
ON CONFLICT (email) DO UPDATE
SET name = EXCLUDED.name,
updated_at = NOW();
-- Or do nothing on conflict
INSERT INTO users (email, name)
VALUES ('alex@example.com', 'Alex')
ON CONFLICT (email) DO NOTHING;
-- MySQL equivalent
INSERT INTO users (email, name)
VALUES ('alex@example.com', 'Alex')
ON DUPLICATE KEY UPDATE name = VALUES(name), updated_at = NOW();
Upserts are essential for idempotent imports — re-running the same script never creates duplicates.
UPDATE
UPDATE users
SET name = 'Alex Smith', updated_at = NOW()
WHERE id = 42;
-- Update many rows
UPDATE orders
SET status = 'archived'
WHERE created_at < NOW() - INTERVAL '1 year';
-- Update with values from another table (Postgres)
UPDATE orders
SET customer_name = u.name
FROM users u
WHERE orders.user_id = u.id;
The most dangerous SQL statement is UPDATE without a WHERE — it modifies every row in the table. Standard discipline before running any non-trivial UPDATE:
- Run the corresponding
SELECTwith the sameWHEREfirst; verify the row count. - Wrap in a transaction so you can roll back.
- Use
RETURNING(Postgres) or capture old values in another way for audit.
DELETE
DELETE FROM sessions WHERE expires_at < NOW();
DELETE FROM orders
WHERE id IN (SELECT id FROM orders WHERE status = 'cancelled' LIMIT 1000);
Same hazard as UPDATE: DELETE FROM users; with no WHERE clears the table. Test with SELECT first; wrap in a transaction.
Soft delete
Many production systems never truly delete. Instead they have a deleted_at column and treat any row with a non-null value as gone:
UPDATE users SET deleted_at = NOW() WHERE id = 42;
-- All "live" queries filter it out
SELECT * FROM users WHERE deleted_at IS NULL;
Pros: undo is trivial; you keep referential history; audits work. Cons: every query has to filter, and you accumulate data forever — pair with a periodic hard-delete job.
TRUNCATE vs DELETE
DELETE FROM t; | Removes rows one by one; fires triggers; can be rolled back; slow on large tables |
TRUNCATE TABLE t; | Empties the table instantly; resets identity counters; faster; behaves differently in transactions across engines |
Transactions
A transaction is a group of statements that succeed or fail together. They are the foundation of data integrity:
BEGIN;
UPDATE accounts SET balance = balance - 100 WHERE id = 1;
UPDATE accounts SET balance = balance + 100 WHERE id = 2;
-- Both succeeded — make it permanent
COMMIT;
-- Or, on any error
ROLLBACK;
Without a transaction, a crash between the two updates leaves money missing from one account and not added to the other. With it, either both happen or neither does.
ACID, Briefly
- Atomicity — all-or-nothing
- Consistency — the database moves from one valid state to another (constraints enforced)
- Isolation — concurrent transactions don't see each other's intermediate state
- Durability — committed changes survive crashes
Isolation Levels
Isolation is the only ACID property you tune. Levels (in order from weakest to strongest):
| Level | Prevents |
|---|---|
| Read Uncommitted | Almost nothing — rare in practice |
| Read Committed | Dirty reads. Default in Postgres & SQL Server. |
| Repeatable Read | + Non-repeatable reads. Default in MySQL InnoDB. |
| Serializable | + Phantom reads. Effectively as if transactions ran one-at-a-time. Most expensive. |
Higher isolation prevents more anomalies but reduces concurrency. The default is fine for most apps; use SERIALIZABLE when you need strict guarantees and can tolerate the throughput cost.
Locking and Conflicts
To enforce isolation, the database takes locks. Two common pitfalls:
- Long-running transactions hold locks the whole time, blocking other writers. Keep transactions short.
- Deadlocks: two transactions each hold a lock the other needs. Databases detect this and abort one of them — your code must handle and retry.
Explicit row lock
BEGIN;
SELECT * FROM accounts WHERE id = 1 FOR UPDATE; -- locks the row
UPDATE accounts SET balance = balance - 100 WHERE id = 1;
COMMIT;
FOR UPDATE blocks other writers from touching that row until you commit. Use sparingly — usually the engine's default locking is enough.
Auto-Commit
By default, most clients run each statement in its own transaction. So UPDATE users ...; from a SQL shell auto-commits unless you explicitly BEGIN. ORMs and frameworks usually wrap a request handler in one transaction by default — convenient, but be aware of it.
Batching Big Modifications
Don't DELETE FROM logs WHERE created_at < '2020-01-01' when there are 50 million matching rows — you'll lock the table for hours and bloat the WAL. Batch it:
-- Loop in your app code
DELETE FROM logs
WHERE id IN (
SELECT id FROM logs
WHERE created_at < '2020-01-01'
LIMIT 10000
);
Each batch is a small transaction. Sleep briefly between batches if production traffic is heavy.
The Mental Checklist Before Any Write
- Did I include a
WHERE? - Have I run the equivalent
SELECTand verified the row count? - Am I in a transaction?
- Will this hold long-running locks? Should I batch?
- Is there a backup of this table from the last hour?
Five questions, ten seconds, countless saved careers.