What a Proxmox POC must actually validate
The first question to ask when an IT leadership team wants to "run a Proxmox POC": what exactly do you want to validate?
The answer is often vague. "We want to see if it works." But that doesn't say much. A Proxmox VM boots in 90 seconds on any developer laptop. It has "worked" since version 3.0.
A serious enterprise POC answers a precise question: can this migration project be executed on this scope, with this team, under these constraints, while maintaining this level of service?
Define validation criteria before you begin
This is the most important rule, and the most commonly bypassed.
If success criteria are not defined before the POC, they will be defined afterwards. Criteria defined afterwards bend to fit the observed results, which strips the POC of any decision value.
Criteria to define upfront:
- Target RTO per VM category (critical, standard, development)
- Acceptable RPO per application profile
- Minimum expected performance (IOPS, latency) on representative workloads
- Maximum migration time per VM (to estimate wave duration)
- Rollback criteria during the POC itself
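One way to make "defined upfront" concrete is to record the criteria as immutable data before the first test runs. The sketch below is only an illustration: the criterion names and every threshold value are hypothetical placeholders, not recommendations, and the `Criterion` and `evaluate` helpers are invented for this example.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)  # frozen: thresholds cannot be edited once results arrive
class Criterion:
    name: str
    threshold: float
    unit: str
    higher_is_better: bool = False

    def passes(self, measured: float) -> bool:
        """Compare a measured value against the threshold fixed upfront."""
        if self.higher_is_better:
            return measured >= self.threshold
        return measured <= self.threshold


# Hypothetical thresholds for illustration only; replace with your own targets.
CRITERIA = (
    Criterion("rto_critical_vm", 15, "min"),
    Criterion("rpo_database", 5, "min"),
    Criterion("iops_representative_workload", 20000, "IOPS", higher_is_better=True),
    Criterion("latency_p99", 5, "ms"),
    Criterion("migration_time_per_vm", 45, "min"),
)


def evaluate(measurements: Dict[str, float]) -> Dict[str, Optional[bool]]:
    """Return pass/fail per criterion; None means the POC never measured it."""
    return {
        c.name: c.passes(measurements[c.name]) if c.name in measurements else None
        for c in CRITERIA
    }
```

A criterion that comes back as `None` is worth as much attention as one that fails: it means the POC ended without measuring something that was declared important at the start.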
Rollback: the first test, not the last
The ability to go back is often tested at the end of a POC, if tested at all. That's a mistake.
Rollback must be the first validated scenario. Before migrating anything to production, you need to be able to answer this question: "if we identify a problem 48 hours after a migration, how do we return to the previous state?"
Rollback tests to include:
- Migrate a test VM to Proxmox
- Simulate an application problem (don't wait for a real problem — simulate one)
- Make the rollback decision: who decides, on what criteria, within what timeframe?
- Execute the rollback: put the VM back on VMware and verify it boots correctly
- Measure the end-to-end duration
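The end-to-end duration of the rollback sequence above includes the waiting time between steps, typically the decision itself. A minimal timing sketch follows, assuming the steps are carried out by an operator or by your own tooling; the `DrillLog` class is a hypothetical helper written for this example and calls no Proxmox or VMware API.

```python
import time
from contextlib import contextmanager


class DrillLog:
    """Records start and end times of each step in a rollback drill."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self.steps = []  # list of (name, start, end) in clock seconds

    @contextmanager
    def step(self, name):
        start = self._clock()
        try:
            yield
        finally:
            # Record the step even if it raised, so failed drills are timed too.
            self.steps.append((name, start, self._clock()))

    def durations(self):
        """Time spent inside each step, keyed by step name."""
        return {name: end - start for name, start, end in self.steps}

    def end_to_end(self):
        """Wall-clock time from the first step's start to the last step's end,
        including idle gaps between steps."""
        if not self.steps:
            return 0.0
        return self.steps[-1][2] - self.steps[0][1]


if __name__ == "__main__":
    log = DrillLog()
    with log.step("execute rollback"):
        pass  # the actual work happens outside this sketch
    print(log.durations(), log.end_to_end())
```

Reporting `end_to_end()` rather than the sum of the step durations keeps the time spent waiting for a decision visible in the result.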
Operational workflows: what documentation doesn't test
The POC must simulate the routine tasks that the operations team will perform in production.
Not the initial deployment tasks — anyone can do those with the documentation. The recurring tasks and interventions under pressure.
Workflows to cover without exception:
- Emergency restart of a VM
- Console connection when network is lost on a VM
- Manual snapshot before a risky operation
- File restore from PBS
- Adding a disk to a production VM
- Updating a Proxmox node after live-migrating its VMs to other nodes
Backup validation: deeper than "the job completed without errors"
An unverified backup isn't a backup. It's a file of hope.
The POC must include complete restore tests — not just verifications of the "backup job shows a green checkmark" type.
PBS validation tests:
- Backup of a representative application VM (ideally with a database)
- Full restore in an isolated environment
- Start the restored VM and verify application consistency (not just "ping responds")
- Measure end-to-end restore time
- Run a PBS verification job on the resulting backup to confirm its integrity
DRP tests: not only on the Proxmox side
A Proxmox POC is incomplete without a full disaster recovery plan (DRP) test.
This test must simulate the loss of a site or cluster, and measure:
- Incident detection time
- Decision time (who decides the failover, on what criteria)
- Technical failover execution time
- Application consistency after failover
- Time to return to nominal state
The "technical" RTO (the time Proxmox takes to execute the failover) is often much shorter than the "real" RTO (the time until users see the service restored). The difference includes detection, decision, and sometimes unanticipated application dependencies.
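The gap between the two RTO figures is simple arithmetic once each phase has been timed. The sketch below assumes the four phases run one after another and leaves out the return to nominal state, which happens after service is restored. The class name, field names, and example values are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class FailoverTimings:
    """Durations in minutes, as measured during the DRP test."""
    detection: float
    decision: float
    technical_failover: float
    application_recovery: float  # dependencies and consistency checks

    @property
    def technical_rto(self) -> float:
        # Only the failover execution itself.
        return self.technical_failover

    @property
    def real_rto(self) -> float:
        # Everything users wait for, assuming the phases are sequential.
        return (self.detection + self.decision
                + self.technical_failover + self.application_recovery)

    def hidden_share(self) -> float:
        """Fraction of the real RTO that the technical RTO does not show."""
        return 1 - self.technical_rto / self.real_rto


# Made-up example values, not measurements.
drill = FailoverTimings(detection=12, decision=25,
                        technical_failover=8, application_recovery=20)
```

With these made-up values the technical RTO is 8 minutes while the real RTO is 65 minutes, so comparing only the first figure against the target would hide most of the outage.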
Performance consistency over time
A 30-minute POC benchmark says little about how production will perform under real load over 8 hours.
What the POC must measure:
- Performance under representative load (not only synthetic benchmarks)
- Performance during a maintenance operation (live migration of VMs off another node, Ceph rebuild)
- Performance after a week of continuous operation under load
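To compare these three situations, the same metric has to be computed the same way for each measurement window. The sketch below uses a nearest-rank percentile on latency samples and flags windows that drift beyond a tolerance relative to a baseline. The 20% tolerance, the choice of p99, and the window labels are assumptions made for illustration; how the samples are collected is out of scope here.

```python
import math


def percentile(samples, pct):
    """Nearest-rank percentile of a non-empty list of numbers."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    rank = math.ceil(pct / 100 * len(ordered))
    return ordered[max(rank, 1) - 1]


def drift_report(windows, baseline_label, pct=99, tolerance=0.20):
    """Compare each window's latency percentile with the baseline window.

    windows: dict mapping a label to a list of latency samples (same unit).
    tolerance: allowed relative degradation (0.20 means +20%), an assumed value.
    """
    base = percentile(windows[baseline_label], pct)
    report = {}
    for label, samples in windows.items():
        value = percentile(samples, pct)
        report[label] = {
            "value": value,
            "ratio": value / base,
            "within_tolerance": value <= base * (1 + tolerance),
        }
    return report
```

A window that falls outside the tolerance is not automatically a failure, but it should be checked against the minimum performance criteria set before the POC started.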
Team maturity: a non-technical but decisive criterion
The POC must also answer a question that sometimes isn't explicitly asked: is the team ready to operate this environment?
Evaluation criteria:
- Time needed to diagnose a Proxmox network problem without external help
- Ability to interpret Ceph logs when an alert fires
- Comfort with the Proxmox interface for common operations
- Understanding of the HA failover model
If these criteria aren't met by the end of the POC, two interpretations are possible: the scope isn't suitable, or training must precede the production deployment. Both are honest answers. What's not an honest answer: ignoring the question and delivering a cluster without a training plan.
What the POC must produce as a deliverable
For the decision-makers who weren't there, a POC without an exit document never happened.
Minimum deliverables:
- Results measured against each validation criterion defined upfront
- Identified gaps and remediation plans
- Measured RTO/RPO vs objectives
- Residual risk points documented
- Go / no-go / conditional go recommendation
This document is what allows IT leadership to make a committed, documented, and defensible decision.
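The recommendation in the deliverables list can be derived mechanically from the findings, which keeps it tied to the criteria defined upfront. The decision rule below is an assumption made for this sketch, not a standard: every criterion met gives "go", a blocking gap with no remediation plan gives "no-go", and anything else gives "conditional go". The `Finding` structure is hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Finding:
    criterion: str
    met: bool
    blocking: bool = True              # does a miss block production rollout?
    remediation: Optional[str] = None  # plan to close the gap, if any


def recommend(findings: List[Finding]) -> str:
    """Apply the assumed rule: go / no-go / conditional go."""
    gaps = [f for f in findings if not f.met]
    if not gaps:
        return "go"
    if any(f.blocking and not f.remediation for f in gaps):
        return "no-go"
    return "conditional go"
```

For example, a missed RTO target with a documented remediation plan yields "conditional go", and the conditions of that go are exactly the remediation plans listed in the document.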