PostgreSQL DBA LAB Practice 1

 PostgreSQL DBA LAB Practice 1


SECTION 1 — PostgreSQL Installation, Setup, Configuration (30 Scenarios)

  1. Install PostgreSQL 16 on Linux using official repos.
  2. Change PostgreSQL data directory from /var/lib/pgsql/16/data/data/pgdata.
  3. Initialize a database cluster using initdb with a custom locale.
  4. Configure logging_collector and rotate logs properly.
  5. Configure PostgreSQL to listen on a custom port (e.g., 5434).
  6. Enable SSL and force SSL connections for specific roles.
  7. Create multiple PostgreSQL instances on the same VM.
  8. Migrate postgresql.conf settings to a new server.
  9. Tune shared_buffers, work_mem, maintenance_work_mem for OLTP.
  10. Validate configuration using pg_controldata.
  11. Diagnose PostgreSQL not starting due to corrupt postmaster.pid.
  12. Fix permission errors caused by wrong PGDATA ownership.
  13. Configure WAL directory on separate storage.
  14. Enable huge_pages and verify they are in use.
  15. Test effects of synchronous_commit ON/OFF.
  16. Configure and test replication slots.
  17. Simulate out-of-disk space in pg_xlog/pg_wal.
  18. Enable pgaudit and generate audited events.
  19. Recover after accidental deletion of configuration files.
  20. Configure statement_timeout globally and per user.
  21. Enable auto_explain and tune slow query logging.
  22. Benchmark using pgbench with custom scripts.
  23. Configure server for high-concurrency workloads.
  24. Diagnose PostgreSQL startup after kernel update.
  25. Tune effective_io_concurrency and test I/O improvements.
  26. Configure PostgreSQL memory settings for 128GB RAM.
  27. Test WAL compression.
  28. Configure custom background workers.
  29. Recover cluster from partial pg_wal archive loss.
  30. Setup patroni or repmgr configuration files.

🟦 SECTION 2 — Roles, Security, Access Control, Encryption (25 Scenarios)

  1. Create role with least privilege access for app users.
  2. Implement row-level security (RLS).
  3. Enforce password expiry for selected roles.
  4. Configure client cert authentication using pg_hba.conf.
  5. Implement SCRAM‑SHA‑256 authentication.
  6. Audit failed logins using pgaudit.
  7. Restrict access to certain tables using privileges + RLS.
  8. Create a login trigger to audit sessions.
  9. Detect brute-force login attempts.
  10. Restrict superuser access safely.
  11. Set per-user connection limits.
  12. Detect users overriding search_path to execute malicious code.
  13. Enforce password complexity policy.
  14. Identify unused roles and clean them safely.
  15. Restrict PUBLIC schema access.
  16. Encrypt client/server traffic using SSL and validate cipher.
  17. Protect PostgreSQL from SQL injection at server level.
  18. Create and test role inheritance rules.
  19. Implement IP-based restrictions at pg_hba.conf.
  20. Audit all SELECTs on sensitive tables.
  21. Enable data at rest encryption using file system tools.
  22. Block cross‑schema access exploitation.
  23. Identify roles with excessive grants.
  24. Detect unauthorized creation of functions.
  25. Audit DROP TABLE attempts.

🟦 SECTION 3 — Backup, Restore, PITR, WAL Archiving (40 Scenarios)

  1. Take a base backup using pg_basebackup.
  2. Restore cluster using PITR to a specific timestamp.
  3. Enable WAL archiving via archive_command.
  4. Restore from WAL archive after disaster.
  5. Perform pg_dump + pg_restore of a 500GB DB.
  6. Restore specific schema instead of full DB.
  7. Take logical backups of only tables with specific patterns.
  8. Restore only selected tables from logical dump.
  9. Validate backup consistency with pg_verifybackup.
  10. Perform pg_receivewal streaming backup.
  11. Rebuild cluster from only WAL and base backup.
  12. Fix corrupted backup manifest.
  13. Simulate archive_command failure and recover.
  14. Perform PITR using transaction ID boundaries.
  15. Backup only data without indexes and rebuild indexes later.
  16. Test restore on lower hardware.
  17. Restore after accidental TRUNCATE.
  18. Perform cross-version upgrade using pg_upgrade.
  19. Validate binary upgrade thoroughly.
  20. Perform rollback after upgrade failure.
  21. Restore cluster with tablespace mapping.
  22. Validate WAL archiving via WAL segment inspection.
  23. Detect WAL archive gaps.
  24. Fix corrupted WAL segment.
  25. Perform a backup and restore with RDS/Aurora.
  26. Backup using Barman/pgBackRest.
  27. Restore using pgBackRest with delta restore.
  28. Simulate WAL corruption on standby and repair.

🟦 SECTION 4 — Replication (Streaming, Logical, Cascading, Slots) — 50 Scenarios

  1. Set up streaming replication (sync or async).
  2. Add standby to replication using replication slots.
  3. Enable physical cascading replication chain (Primary → Standby → Standby).
  4. Fix replication after WAL archive missing.
  5. Fix replication getting stuck at a specific LSN.
  6. Diagnose slow replay on standby.
  7. Fix standby lag during heavy writes.
  8. Promote standby without losing data.
  9. Demote a primary safely.
  10. Fix replication failing due to conflicting xid.
  11. Solve replication restart due to WAL retention misconfig.
  12. Fix “requested WAL segment has already been removed”.
  13. Enable synchronous replication and test data consistency.
  14. Fix sync replication causing writer stalls.
  15. Test effect of synchronous_standby_names.
  16. Misconfigured slot causing disk full — fix.
  17. Enable logical decoding and create replication slot.
  18. Use logical replication to replicate selective tables.
  19. Fix logical replication missing updates.
  20. Handle schema drift in logical replication.
  21. Replicate only DDL changes using pg_logical.
  22. Perform failover with repmgr/patroni.
  23. Repoint apps automatically after failover.
  24. Fix hot standby conflict due to vacuum.
  25. Fix standby terminating due to recovery conflicts.
  26. Diagnose heavy WAL sender CPU usage.
  27. Identify longest-running replication transaction.
  28. Fix WAL segment recycling too fast.
  29. Solve "replication slot inactive" issues.
  30. Repair orphaned replication slots.
  31. Fix inconsistent LSN across replicas.
  32. Rebuild standby without restarting primary.
  33. Diagnose sync replica missing heartbeats.
  34. Fix replication failover loop.
  35. Solve replication post-failover divergence.

🟦 SECTION 5 — Vacuum, Autovacuum, Bloat, Freeze, XID (40 Scenarios)

  1. Diagnose table bloat using pg_stat_all_tables.
  2. Reclaim bloat using VACUUM FULL.
  3. Rebuild index concurrently.
  4. Diagnose autovacuum not triggering.
  5. Fix autovacuum freeze issues.
  6. Identify wraparound risk and fix immediately.
  7. Tune autovacuum for heavy OLTP workloads.
  8. Reduce dead tuples after mass-update.
  9. Diagnose index bloat separately from table bloat.
  10. Fix HOT update not working.
  11. Identify XID exhaustion risk.
  12. Force aggressive freeze on large table.
  13. Fix autovacuum freezing DB causing performance issues.
  14. Identify very slow VACUUM activity.
  15. Solve VACUUM preventing standby replay.
  16. Fix autovacuum workers starvation.
  17. Identify vacuum dead rows across partitions.
  18. Create vacuum cost-based tuning.
  19. Fix anti-wraparound shutdown incidents.
  20. Perform page-level inspection using pageinspect.
  21. Freeze all tuples in multi‑TB table.
  22. Diagnose “could not read block” errors.
  23. Fix vacuum running forever.
  24. Fix VACUUM FULL causing table locks.
  25. Diagnose bloat after failed autovacuum.
  26. Freeze partitioned tables properly.
  27. Solve vacuum interfering with logical decoding.

🟦 SECTION 6 — Performance Tuning, Query Tuning, Planner, Cache (50 Scenarios)

  1. Identify slow queries using pg_stat_statements.
  2. Fix queries doing sequential scan unexpectedly.
  3. Diagnose bad join order chosen by planner.
  4. Fix planner misestimating row counts.
  5. Detect implicit casts hurting index use.
  6. Diagnose hash join memory spills.
  7. Tune work_mem for multi-join queries.
  8. Fix slow CTE queries after PostgreSQL 12 changes.
  9. Optimize queries creating temp tables.
  10. Fix sorts spilling to disk.
  11. Tune parallel workers for large queries.
  12. Force planner to use specific index and compare performance.
  13. Diagnose correlation issues causing poor index usage.
  14. Fix queries generating large heap fetches.
  15. Tune effective_cache_size.
  16. Reduce full table scans.
  17. Fix queries that ignore multicolumn index.
  18. Tune JIT execution for analytic queries.
  19. Diagnose regression after upgrade.
  20. Optimize queries using JSON.
  21. Identify slow system catalogs access.
  22. Tune shared_buffers for large datasets.
  23. Fix CPU-bound workloads.
  24. Fix I/O-bound workloads.
  25. Detect dead tuples affecting query speed.
  26. Fix SELECT count(*) slowness.
  27. Optimize GROUP BY on large tables.
  28. Fix inefficient window functions.
  29. Solve slow OFFSET/LIMIT queries.
  30. Optimize DISTINCT queries.
  31. Fix slow plpgsql functions.
  32. Detect cache thrashing.
  33. Fix JIT compilation overhead.
  34. Reduce nested loop join cost.
  35. Tune huge join queries executed at peak load.
  36. Analyze pg_buffercache for hot pages.
  37. Fix slow stored procedures doing loops.
  38. Reduce catalog contention, pg_class/pg_attribute hotspots.
  39. Fix performance regression after new index addition.
  40. Remove unused indexes hurting write performance.

🟦 SECTION 7 — Indexing, Partitioning, Data Modeling (40 Scenarios)

  1. Create BRIN index for time-series tables.
  2. Fix BRIN index not pruning properly.
  3. Create partial index for selective queries.
  4. Diagnose overlapping partial index issues.
  5. Tune index fillfactor.
  6. Fix index corruption using REINDEX.
  7. Build indexes concurrently on large tables.
  8. Use expression index and test effectiveness.
  9. Diagnose multi-column index misuse.
  10. Identify redundant indexes.
  11. Optimize table partitioning strategy.
  12. Fix partition misrouting.
  13. Validate constraint exclusion behavior.
  14. Optimize partition pruning.
  15. Migrate non-partitioned table to partitioned table online.
  16. Convert partition strategy from RANGE to LIST.
  17. Move large partitions to new tablespace.
  18. Identify hottest partition.
  19. Fix foreign keys slowing inserts.
  20. Optimize composite PK for OLTP.
  21. Diagnose wide tables (100+ columns).
  22. Optimize datatype choices for memory.
  23. Identify oversized rows using pg_column_size.
  24. Normalize / Denormalize data models.
  25. Fix dead rows causing BRIN index bloat.
  26. Validate CHECK constraints with partitioning.
  27. Improve selective WHERE queries via bloom indexes.
  28. Fix primary-key hot spot issues.
  29. Create proper index strategy for multi-tenant schema.
  30. Detect and fix misaligned statistics on partitions.

🟦 SECTION 8 — High Availability, Failover, Clustering (Patroni/Repmgr/etc.) — 35 Scenarios

  1. Configure Patroni cluster with 3 nodes.
  2. Simulate node crash and observe automatic failover.
  3. Fix Patroni failing to elect new leader.
  4. Repair DCS (etcd/consul) corruption.
  5. Fix split-brain detection failure.
  6. Rejoin failed node without data loss.
  7. Debug repmgr switchover problems.
  8. Fix repmgr node repeatedly leaving cluster.
  9. Solve cascading replication inconsistency.
  10. Fix cluster pauses due to WAL shipping delays.
  11. Diagnose “fencing” failures.
  12. Configure synchronous replica priorities.
  13. Fix node promotion loops.
  14. Test manual failover + rejoin.
  15. Fix cluster misrouting read-only traffic.
  16. Recreate cluster after catastrophic failover.
  17. Diagnose bottlenecks inside Patroni watchdog.
  18. Test cluster upgrade with minimal downtime.
  19. Fix failover caused by long-running queries.
  20. Fix DCS network partition issues.
  21. Diagnose replication slot sync issues in cluster.
  22. Fix multi-node recovery race conditions.
  23. Test Geo-distributed HA cluster behavior.

🟦 SECTION 9 — Corruption, File System Issues, WAL Problems (40 Scenarios)

  1. Fix block corruption using pg_checksums.
  2. Identify corrupted heap pages using pageinspect.
  3. Repair index-only corruption.
  4. Fix “invalid page header” errors.
  5. Recover from missing pg_filenode file.
  6. Restore corrupted relation from previous backup.
  7. Fix WAL record with invalid CRC.
  8. Fix broken visibility map.
  9. Diagnose relation truncated unexpectedly.
  10. Fix orphaned toast tables.
  11. Identify cause of unexpected checkpoint.
  12. Fix checkpoint spikes causing massive lag.
  13. Resolve WAL recycling too aggressively.
  14. Diagnose WAL file generation anomalies.
  15. Fix full pg_wal directory crash.
  16. Diagnose missing FPI (full page image).
  17. Fix tablespace corruption.
  18. Solve “fork number not found” errors.
  19. Fix failed relation extension.
  20. Fix read-only mount making PG unstable.
  21. Diagnose filesystem-level corruption affecting PG.
  22. Repair invalid FSM (free space map).
  23. Fix database inconsistent with pg_control.
  24. Repair multi‑file relation corruption.
  25. Diagnose silent data corruption.

🟦 SECTION 10 — Advanced Cloud (RDS/Aurora/GCP/Azure) — 40 Scenarios

  1. Fix RDS IOPS exhaustion.
  2. Diagnose Aurora replica lag.
  3. Fix RDS storage full state.
  4. Tune RDS parameters via parameter groups.
  5. Fix RDS failover cycles.
  6. Diagnose RDS high CPU despite low queries.
  7. Fix RDS connection storms.
  8. Fix Aurora writer routing issues.
  9. Diagnose Aurora GlobalDB replication break.
  10. Restore RDS snapshot in another region.
  11. Migrate from on-prem PG → Aurora.
  12. Fix CPU spikes caused by cloudwatch metrics.
  13. Solve performance drop after scaling instance.
  14. Tune RDS for heavy write workloads.
  15. Fix RDS vacuum freezing the instance.
  16. Fix GCP CloudSQL connection limit issues.
  17. Fix Azure PostgreSQL slow due to IO throttling.
  18. Repair corrupted RDS logical replication slot.
  19. Fix RDS outage caused by parameter misconfig.
  20. Diagnose Aurora write stalls.
  21. Fix RDS slow query caused by network jitter.
  22. Diagnose cross-region read replica lag.
  23. Resolve Aurora storage node corruption.
  24. Test Aurora fast failover behavior.
  25. Fix RDS frozen due to long transaction.
  26. Resolve DB parameter applying error.

🟦 SECTION 11 — Migration, Upgrades, DevOps Automation (35 Scenarios)

  1. Perform pg_upgrade dry-run test.
  2. Fix pg_upgrade failure due to incompatible extension.
  3. Migrate PostgreSQL from 9.6 → 16 with minimal downtime.
  4. Migrate large table via logical replication.
  5. Fix migration failing because of collation mismatch.
  6. Migrate sequences properly.
  7. Synchronize data using pglogical.
  8. Test cross-architecture migration (ARM → x86).
  9. Automate DB deployments using Ansible.
  10. Build CI pipeline to test DB schema changes.
  11. Fix migration errors in CI pipeline.
  12. Automate PITR restore on staging.
  13. Automate pgBackRest incremental backups.
  14. Detect drift between environments.
  15. Fix failed logical replication during migration.
  16. Test blue‑green deployment with PG.
  17. Integrate DB schema changes into application deployments.
  18. Validate compatibility before upgrade.
  19. Fix GIN/GiST index rebuild after upgrade.
  20. Run vacuum freeze before version upgrade.
  21. Migrate on-prem cluster to Kubernetes.

🟦 SECTION 12 — Rare, Edge-Case, Nightmare PostgreSQL Failures (35 Scenarios)

  1. Fix vacuum freezing entire cluster.
  2. Diagnose invisible dead rows due to snapshot issues.
  3. Solve SERIAL column exhaustion.
  4. Fix random postmaster crash due to background worker.
  5. Diagnose multixact ID exhaustion.
  6. Repair toast table mismatch.
  7. Recover from failed drop of partitioned table.
  8. Fix relation not found after crash.
  9. Diagnose missing visibility bits.
  10. Solve duplicate xmin bugs.
  11. Fix stuck transaction due to 2PC failure.
  12. Diagnose WAL replay stuck forever.
  13. Fix out-of-sync pg_class counters.
  14. Identify OID wraparound issues.
  15. Diagnose logical decoding memory leak.
  16. Repair corrupted pg_largeobject table.
  17. Fix checkpoint loops.
  18. Fix temp table causing OOM.
  19. Repair unclean shutdown with corrupted multixact.
  20. Solve planner crash due to incorrect stats.
  21. Fix toggle between HOT and non-HOT updates.
  22. Repair vacuum aborted mid-cycle.
  23. Diagnose performance regression due to JIT.
  24. Fix pg_stat_statements overflow.
  25. Repair replication slot catalog mismatch.

πŸŸ₯ DONE — You now have 500+ PostgreSQL DBA LAB Scenarios


🧨 PART 2 — 250+ PostgreSQL DBA Interview Questions 

Here is a short selection (first 50) to confirm you want the full 250+:

  1. Explain PostgreSQL MVCC at storage level (CTID, xmin, xmax).
  2. Why do long transactions cause vacuum freeze issues?
  3. Explain HOT vs non-HOT updates.
  4. What is the role of visibility map?
  5. How does PostgreSQL handle full table rewrite?
  6. How does planner choose join order?
  7. Why would planner not choose an index scan?
  8. Explain autovacuum tuning in high-write workloads.
  9. What causes PID conflicts in PostgreSQL?
  10. Explain the difference between VACUUM, VACUUM FULL, ANALYZE.
  11. Why is count(*) slow? How do you optimize it?
  12. Explain WAL internal structure (pages, records).
  13. What is checkpoint and why does it cause I/O storms?
  14. Explain streaming replication internals.
  15. How do replication slots prevent WAL removal?
  16. What causes bloated indexes?
  17. Explain JIT compilation.
  18. What is async vs sync replication?
  19. How do you recover from corrupted relation files?
  20. Explain BRIN vs BTREE index decisions.
  21. What is TOAST and how does it work?
  22. What causes table bloat?
  23. What is an advisory lock?
  24. How do you repair corrupted visibility map?
  25. Explain how PostgreSQL handles failover.
  26. Difference between logical and physical replication.
  27. Why do logical replication conflicts happen?
  28. Explain full_page_writes.
  29. Why does PostgreSQL not support clustered indexes like InnoDB?
  30. What is the role of FSM and VM?
  31. What causes deadlocks in PostgreSQL?
  32. Explain how analyze updates planning decisions.
  33. How do you identify slowest queries?
  34. What is parallel query execution?
  35. When does parallelism not get used?
  36. What is a checkpoint buffer write?
  37. Explain synchronous_standby_names behavior.
  38. What is vacuum freeze min_age?
  39. Why does partition pruning fail?
  40. Explain SERIALIZABLE isolation implementation.
  41. What causes “tuple concurrently updated”?
  42. Explain RBM_NORMAL and RBM_ZERO.
  43. Why does PostgreSQL rewrite table on certain ALTER operations?
  44. What is clog (pg_xact)?
  45. How do you detect transaction ID wraparound?
  46. Why does vacuum get canceled on hot standby?
  47. Explain WAL recycling.
  48. Why do GIN/GiST indexes get huge?
  49. What causes frequent checkpoints?
  50. Explain RLS security barrier.

Comments