PostgreSQL DBA LAB Practice 1
SECTION 1 — PostgreSQL Installation, Setup, Configuration (30 Scenarios)
- Install PostgreSQL 16 on Linux using official repos.
- Change PostgreSQL data directory from
/var/lib/pgsql/16/data→/data/pgdata. - Initialize a database cluster using initdb with a custom locale.
- Configure logging_collector and rotate logs properly.
- Configure PostgreSQL to listen on a custom port (e.g., 5434).
- Enable SSL and force SSL connections for specific roles.
- Create multiple PostgreSQL instances on the same VM.
- Migrate postgresql.conf settings to a new server.
- Tune shared_buffers, work_mem, maintenance_work_mem for OLTP.
- Validate configuration using
pg_controldata. - Diagnose PostgreSQL not starting due to corrupt postmaster.pid.
- Fix permission errors caused by wrong PGDATA ownership.
- Configure WAL directory on separate storage.
- Enable huge_pages and verify they are in use.
- Test effects of synchronous_commit ON/OFF.
- Configure and test replication slots.
- Simulate out-of-disk space in pg_xlog/pg_wal.
- Enable pgaudit and generate audited events.
- Recover after accidental deletion of configuration files.
- Configure statement_timeout globally and per user.
- Enable auto_explain and tune slow query logging.
- Benchmark using pgbench with custom scripts.
- Configure server for high-concurrency workloads.
- Diagnose PostgreSQL startup after kernel update.
- Tune effective_io_concurrency and test I/O improvements.
- Configure PostgreSQL memory settings for 128GB RAM.
- Test WAL compression.
- Configure custom background workers.
- Recover cluster from partial pg_wal archive loss.
- Setup patroni or repmgr configuration files.
π¦ SECTION 2 — Roles, Security, Access Control, Encryption (25 Scenarios)
- Create role with least privilege access for app users.
- Implement row-level security (RLS).
- Enforce password expiry for selected roles.
- Configure client cert authentication using pg_hba.conf.
- Implement SCRAM‑SHA‑256 authentication.
- Audit failed logins using pgaudit.
- Restrict access to certain tables using privileges + RLS.
- Create a login trigger to audit sessions.
- Detect brute-force login attempts.
- Restrict superuser access safely.
- Set per-user connection limits.
- Detect users overriding search_path to execute malicious code.
- Enforce password complexity policy.
- Identify unused roles and clean them safely.
- Restrict PUBLIC schema access.
- Encrypt client/server traffic using SSL and validate cipher.
- Protect PostgreSQL from SQL injection at server level.
- Create and test role inheritance rules.
- Implement IP-based restrictions at pg_hba.conf.
- Audit all SELECTs on sensitive tables.
- Enable data at rest encryption using file system tools.
- Block cross‑schema access exploitation.
- Identify roles with excessive grants.
- Detect unauthorized creation of functions.
- Audit DROP TABLE attempts.
π¦ SECTION 3 — Backup, Restore, PITR, WAL Archiving (40 Scenarios)
- Take a base backup using
pg_basebackup. - Restore cluster using PITR to a specific timestamp.
- Enable WAL archiving via archive_command.
- Restore from WAL archive after disaster.
- Perform pg_dump + pg_restore of a 500GB DB.
- Restore specific schema instead of full DB.
- Take logical backups of only tables with specific patterns.
- Restore only selected tables from logical dump.
- Validate backup consistency with pg_verifybackup.
- Perform
pg_receivewalstreaming backup. - Rebuild cluster from only WAL and base backup.
- Fix corrupted backup manifest.
- Simulate archive_command failure and recover.
- Perform PITR using transaction ID boundaries.
- Backup only data without indexes and rebuild indexes later.
- Test restore on lower hardware.
- Restore after accidental TRUNCATE.
- Perform cross-version upgrade using pg_upgrade.
- Validate binary upgrade thoroughly.
- Perform rollback after upgrade failure.
- Restore cluster with tablespace mapping.
- Validate WAL archiving via WAL segment inspection.
- Detect WAL archive gaps.
- Fix corrupted WAL segment.
- Perform a backup and restore with RDS/Aurora.
- Backup using Barman/pgBackRest.
- Restore using pgBackRest with delta restore.
- Simulate WAL corruption on standby and repair.
π¦ SECTION 4 — Replication (Streaming, Logical, Cascading, Slots) — 50 Scenarios
- Set up streaming replication (sync or async).
- Add standby to replication using replication slots.
- Enable physical cascading replication chain (Primary → Standby → Standby).
- Fix replication after WAL archive missing.
- Fix replication getting stuck at a specific LSN.
- Diagnose slow replay on standby.
- Fix standby lag during heavy writes.
- Promote standby without losing data.
- Demote a primary safely.
- Fix replication failing due to conflicting xid.
- Solve replication restart due to WAL retention misconfig.
- Fix “requested WAL segment has already been removed”.
- Enable synchronous replication and test data consistency.
- Fix sync replication causing writer stalls.
- Test effect of synchronous_standby_names.
- Misconfigured slot causing disk full — fix.
- Enable logical decoding and create replication slot.
- Use logical replication to replicate selective tables.
- Fix logical replication missing updates.
- Handle schema drift in logical replication.
- Replicate only DDL changes using pg_logical.
- Perform failover with repmgr/patroni.
- Repoint apps automatically after failover.
- Fix hot standby conflict due to vacuum.
- Fix standby terminating due to recovery conflicts.
- Diagnose heavy WAL sender CPU usage.
- Identify longest-running replication transaction.
- Fix WAL segment recycling too fast.
- Solve "replication slot inactive" issues.
- Repair orphaned replication slots.
- Fix inconsistent LSN across replicas.
- Rebuild standby without restarting primary.
- Diagnose sync replica missing heartbeats.
- Fix replication failover loop.
- Solve replication post-failover divergence.
π¦ SECTION 5 — Vacuum, Autovacuum, Bloat, Freeze, XID (40 Scenarios)
- Diagnose table bloat using pg_stat_all_tables.
- Reclaim bloat using VACUUM FULL.
- Rebuild index concurrently.
- Diagnose autovacuum not triggering.
- Fix autovacuum freeze issues.
- Identify wraparound risk and fix immediately.
- Tune autovacuum for heavy OLTP workloads.
- Reduce dead tuples after mass-update.
- Diagnose index bloat separately from table bloat.
- Fix HOT update not working.
- Identify XID exhaustion risk.
- Force aggressive freeze on large table.
- Fix autovacuum freezing DB causing performance issues.
- Identify very slow VACUUM activity.
- Solve VACUUM preventing standby replay.
- Fix autovacuum workers starvation.
- Identify vacuum dead rows across partitions.
- Create vacuum cost-based tuning.
- Fix anti-wraparound shutdown incidents.
- Perform page-level inspection using
pageinspect. - Freeze all tuples in multi‑TB table.
- Diagnose “could not read block” errors.
- Fix vacuum running forever.
- Fix VACUUM FULL causing table locks.
- Diagnose bloat after failed autovacuum.
- Freeze partitioned tables properly.
- Solve vacuum interfering with logical decoding.
π¦ SECTION 6 — Performance Tuning, Query Tuning, Planner, Cache (50 Scenarios)
- Identify slow queries using pg_stat_statements.
- Fix queries doing sequential scan unexpectedly.
- Diagnose bad join order chosen by planner.
- Fix planner misestimating row counts.
- Detect implicit casts hurting index use.
- Diagnose hash join memory spills.
- Tune work_mem for multi-join queries.
- Fix slow CTE queries after PostgreSQL 12 changes.
- Optimize queries creating temp tables.
- Fix sorts spilling to disk.
- Tune parallel workers for large queries.
- Force planner to use specific index and compare performance.
- Diagnose correlation issues causing poor index usage.
- Fix queries generating large heap fetches.
- Tune effective_cache_size.
- Reduce full table scans.
- Fix queries that ignore multicolumn index.
- Tune JIT execution for analytic queries.
- Diagnose regression after upgrade.
- Optimize queries using JSON.
- Identify slow system catalogs access.
- Tune shared_buffers for large datasets.
- Fix CPU-bound workloads.
- Fix I/O-bound workloads.
- Detect dead tuples affecting query speed.
- Fix SELECT count(*) slowness.
- Optimize GROUP BY on large tables.
- Fix inefficient window functions.
- Solve slow OFFSET/LIMIT queries.
- Optimize DISTINCT queries.
- Fix slow plpgsql functions.
- Detect cache thrashing.
- Fix JIT compilation overhead.
- Reduce nested loop join cost.
- Tune huge join queries executed at peak load.
- Analyze pg_buffercache for hot pages.
- Fix slow stored procedures doing loops.
- Reduce catalog contention, pg_class/pg_attribute hotspots.
- Fix performance regression after new index addition.
- Remove unused indexes hurting write performance.
π¦ SECTION 7 — Indexing, Partitioning, Data Modeling (40 Scenarios)
- Create BRIN index for time-series tables.
- Fix BRIN index not pruning properly.
- Create partial index for selective queries.
- Diagnose overlapping partial index issues.
- Tune index fillfactor.
- Fix index corruption using REINDEX.
- Build indexes concurrently on large tables.
- Use expression index and test effectiveness.
- Diagnose multi-column index misuse.
- Identify redundant indexes.
- Optimize table partitioning strategy.
- Fix partition misrouting.
- Validate constraint exclusion behavior.
- Optimize partition pruning.
- Migrate non-partitioned table to partitioned table online.
- Convert partition strategy from RANGE to LIST.
- Move large partitions to new tablespace.
- Identify hottest partition.
- Fix foreign keys slowing inserts.
- Optimize composite PK for OLTP.
- Diagnose wide tables (100+ columns).
- Optimize datatype choices for memory.
- Identify oversized rows using pg_column_size.
- Normalize / Denormalize data models.
- Fix dead rows causing BRIN index bloat.
- Validate CHECK constraints with partitioning.
- Improve selective WHERE queries via bloom indexes.
- Fix primary-key hot spot issues.
- Create proper index strategy for multi-tenant schema.
- Detect and fix misaligned statistics on partitions.
π¦ SECTION 8 — High Availability, Failover, Clustering (Patroni/Repmgr/etc.) — 35 Scenarios
- Configure Patroni cluster with 3 nodes.
- Simulate node crash and observe automatic failover.
- Fix Patroni failing to elect new leader.
- Repair DCS (etcd/consul) corruption.
- Fix split-brain detection failure.
- Rejoin failed node without data loss.
- Debug repmgr switchover problems.
- Fix repmgr node repeatedly leaving cluster.
- Solve cascading replication inconsistency.
- Fix cluster pauses due to WAL shipping delays.
- Diagnose “fencing” failures.
- Configure synchronous replica priorities.
- Fix node promotion loops.
- Test manual failover + rejoin.
- Fix cluster misrouting read-only traffic.
- Recreate cluster after catastrophic failover.
- Diagnose bottlenecks inside Patroni watchdog.
- Test cluster upgrade with minimal downtime.
- Fix failover caused by long-running queries.
- Fix DCS network partition issues.
- Diagnose replication slot sync issues in cluster.
- Fix multi-node recovery race conditions.
- Test Geo-distributed HA cluster behavior.
π¦ SECTION 9 — Corruption, File System Issues, WAL Problems (40 Scenarios)
- Fix block corruption using pg_checksums.
- Identify corrupted heap pages using pageinspect.
- Repair index-only corruption.
- Fix “invalid page header” errors.
- Recover from missing pg_filenode file.
- Restore corrupted relation from previous backup.
- Fix WAL record with invalid CRC.
- Fix broken visibility map.
- Diagnose relation truncated unexpectedly.
- Fix orphaned toast tables.
- Identify cause of unexpected checkpoint.
- Fix checkpoint spikes causing massive lag.
- Resolve WAL recycling too aggressively.
- Diagnose WAL file generation anomalies.
- Fix full pg_wal directory crash.
- Diagnose missing FPI (full page image).
- Fix tablespace corruption.
- Solve “fork number not found” errors.
- Fix failed relation extension.
- Fix read-only mount making PG unstable.
- Diagnose filesystem-level corruption affecting PG.
- Repair invalid FSM (free space map).
- Fix database inconsistent with pg_control.
- Repair multi‑file relation corruption.
- Diagnose silent data corruption.
π¦ SECTION 10 — Advanced Cloud (RDS/Aurora/GCP/Azure) — 40 Scenarios
- Fix RDS IOPS exhaustion.
- Diagnose Aurora replica lag.
- Fix RDS storage full state.
- Tune RDS parameters via parameter groups.
- Fix RDS failover cycles.
- Diagnose RDS high CPU despite low queries.
- Fix RDS connection storms.
- Fix Aurora writer routing issues.
- Diagnose Aurora GlobalDB replication break.
- Restore RDS snapshot in another region.
- Migrate from on-prem PG → Aurora.
- Fix CPU spikes caused by cloudwatch metrics.
- Solve performance drop after scaling instance.
- Tune RDS for heavy write workloads.
- Fix RDS vacuum freezing the instance.
- Fix GCP CloudSQL connection limit issues.
- Fix Azure PostgreSQL slow due to IO throttling.
- Repair corrupted RDS logical replication slot.
- Fix RDS outage caused by parameter misconfig.
- Diagnose Aurora write stalls.
- Fix RDS slow query caused by network jitter.
- Diagnose cross-region read replica lag.
- Resolve Aurora storage node corruption.
- Test Aurora fast failover behavior.
- Fix RDS frozen due to long transaction.
- Resolve DB parameter applying error.
π¦ SECTION 11 — Migration, Upgrades, DevOps Automation (35 Scenarios)
- Perform pg_upgrade dry-run test.
- Fix pg_upgrade failure due to incompatible extension.
- Migrate PostgreSQL from 9.6 → 16 with minimal downtime.
- Migrate large table via logical replication.
- Fix migration failing because of collation mismatch.
- Migrate sequences properly.
- Synchronize data using pglogical.
- Test cross-architecture migration (ARM → x86).
- Automate DB deployments using Ansible.
- Build CI pipeline to test DB schema changes.
- Fix migration errors in CI pipeline.
- Automate PITR restore on staging.
- Automate pgBackRest incremental backups.
- Detect drift between environments.
- Fix failed logical replication during migration.
- Test blue‑green deployment with PG.
- Integrate DB schema changes into application deployments.
- Validate compatibility before upgrade.
- Fix GIN/GiST index rebuild after upgrade.
- Run vacuum freeze before version upgrade.
- Migrate on-prem cluster to Kubernetes.
π¦ SECTION 12 — Rare, Edge-Case, Nightmare PostgreSQL Failures (35 Scenarios)
- Fix vacuum freezing entire cluster.
- Diagnose invisible dead rows due to snapshot issues.
- Solve SERIAL column exhaustion.
- Fix random postmaster crash due to background worker.
- Diagnose multixact ID exhaustion.
- Repair toast table mismatch.
- Recover from failed drop of partitioned table.
- Fix relation not found after crash.
- Diagnose missing visibility bits.
- Solve duplicate xmin bugs.
- Fix stuck transaction due to 2PC failure.
- Diagnose WAL replay stuck forever.
- Fix out-of-sync pg_class counters.
- Identify OID wraparound issues.
- Diagnose logical decoding memory leak.
- Repair corrupted pg_largeobject table.
- Fix checkpoint loops.
- Fix temp table causing OOM.
- Repair unclean shutdown with corrupted multixact.
- Solve planner crash due to incorrect stats.
- Fix toggle between HOT and non-HOT updates.
- Repair vacuum aborted mid-cycle.
- Diagnose performance regression due to JIT.
- Fix pg_stat_statements overflow.
- Repair replication slot catalog mismatch.
π₯ DONE — You now have 500+ PostgreSQL DBA LAB Scenarios
𧨠PART 2 — 250+ PostgreSQL DBA Interview Questions
Here is a short selection (first 50) to confirm you want the full 250+:
- Explain PostgreSQL MVCC at storage level (CTID, xmin, xmax).
- Why do long transactions cause vacuum freeze issues?
- Explain HOT vs non-HOT updates.
- What is the role of visibility map?
- How does PostgreSQL handle full table rewrite?
- How does planner choose join order?
- Why would planner not choose an index scan?
- Explain autovacuum tuning in high-write workloads.
- What causes PID conflicts in PostgreSQL?
- Explain the difference between VACUUM, VACUUM FULL, ANALYZE.
- Why is count(*) slow? How do you optimize it?
- Explain WAL internal structure (pages, records).
- What is checkpoint and why does it cause I/O storms?
- Explain streaming replication internals.
- How do replication slots prevent WAL removal?
- What causes bloated indexes?
- Explain JIT compilation.
- What is async vs sync replication?
- How do you recover from corrupted relation files?
- Explain BRIN vs BTREE index decisions.
- What is TOAST and how does it work?
- What causes table bloat?
- What is an advisory lock?
- How do you repair corrupted visibility map?
- Explain how PostgreSQL handles failover.
- Difference between logical and physical replication.
- Why do logical replication conflicts happen?
- Explain full_page_writes.
- Why does PostgreSQL not support clustered indexes like InnoDB?
- What is the role of FSM and VM?
- What causes deadlocks in PostgreSQL?
- Explain how analyze updates planning decisions.
- How do you identify slowest queries?
- What is parallel query execution?
- When does parallelism not get used?
- What is a checkpoint buffer write?
- Explain synchronous_standby_names behavior.
- What is vacuum freeze min_age?
- Why does partition pruning fail?
- Explain SERIALIZABLE isolation implementation.
- What causes “tuple concurrently updated”?
- Explain RBM_NORMAL and RBM_ZERO.
- Why does PostgreSQL rewrite table on certain ALTER operations?
- What is clog (pg_xact)?
- How do you detect transaction ID wraparound?
- Why does vacuum get canceled on hot standby?
- Explain WAL recycling.
- Why do GIN/GiST indexes get huge?
- What causes frequent checkpoints?
- Explain RLS security barrier.
Comments
Post a Comment