Let's Build a Lab - Part 2
Contract work is occasionally... fun... more so when it involves building labs.
So many words. Even today with my own writing I want a TL;DR, so fuck it, here's a not-boilerplate version of one standardized "Statement of Work" contract that is oriented toward building labs. Timelines are negotiable, scope is negotiable, hourly rate is not often negotiable - I learned that lesson a long time ago.
For those not familiar, the SoW is a consulting agreement between contractor and employer, which defines the scope, timeline, requirements, etc. to "do the needful" in a broader sense of everyone's favorite phrase from the ol' "Follow the Sun" SLA support model from the early pre-Cloud era (sorry cloudbabies, it's from when you were single digits old or not yet born).
This specific SoW has been used in prior endeavors which eventually follow the typical enterprise hardware lifecycle in no specific order [1]: forklifts, retros, rebuilds, expansion, and... words.. so many words.
Labs last a long time, so it's important to build things correctly the first time - not waste time hacking together whatever compromised decision trees were necessary due to last-minute planning (or no planning at all) and little to no experience with building labs. Remember, if you are a specialized engineer who excels in one or two areas, that does not imply domain-specific knowledge of adjacent or entirely different engineering worlds. I'm a systems architect .. I don't build jet engines (though it's probably fun), so I do what I'm good at. In other words, "stay in your lane or learn to love losing - but protect that ego at all costs and never admit fault ever!" (yes, I'm bitter and for exceptionally valid reasons).
So, until the pre-IPO runway runs out, or the post-IPO environment bores people to tears and they jump ship with their ISOs and RSUs and START ALL OVER AGAIN! (I have definitely never not done this exact thing several times never for sure for sure? sure.)
Successful Labs Require People and Hardware
Vaporware labs don't have hardware. Real labs do not revolve around PPTX slide decks, or Slack Canvas, or SharePoint whatever, or ephemeral Confluence pages that change ownership at the click of a button. Real labs have real hardware and real engineers and architects involved at every critical stage to ensure that standards are being used, implemented, tested, and validated.
If you think I'm wrong, go ahead, do it all by slide deck, by half-baked simulators that don't even use SST and parallelization correctly, with lots of Big Discussions for the Big People you want to impress to get ahead instead of making quality hardware.
See how far the Peter Principle gets you - chances are, if you're surrounded by "Yes Men" and "C-Suite but never Technical" boardrooms, you will get far enough to be proven the fraud that you are.
The Peter principle is a concept in management .. which observes that people in a hierarchy tend to rise to "a level of respective incompetence": employees are promoted based on their success in previous jobs until they reach a level at which they are no longer competent, as skills in one job do not necessarily translate to another.
Having risen rapidly, you're up high now, so you can destroy entire companies by making extremely stupid choices "to save money": not spec'ing hot-swap trays for NVMe drives on 1U systems which were never designed for the wattage you're running, and which were not spec'd for the BTU and LFM required to keep the systems from burning out 100G optics.
Just when you think it's all good, um why are there only seven drives in this chassis that can run with eight?
- Oh it's because the 8th drive would draw too much power and overload the non-redundant PSUs. What happens when a system which should have N+1 has only N?
- The PSUs could be in redundant mode but Billy didn't want to authorize the enablement of active/active mode, and instead has demanded that his lackey make the systems run the PSUs in "combined mode" to try to get more power to that 8th drive... but no... not possible.
- So all of the fancy new systems can't have eight drives like the original spec REQUIRED in order to fulfill global fleet machine generation scaling models. As in, you lose 1/8th of your drives on every new system for so many thousands of systems. 1/8th, for those not inclined to fractional conversions, is 12.5%.
- So now all of those systems will be missing ~2-16-64TB of NVMe drives (multiply by number of nodes per rack, per pod, per cage, per site, per region, per global fleet). That's a lot of expected high-speed storage that is now not being deployed.
- Hopefully no contracts were made with customers based on "Total NVMe Raw Storage per <unit/region/etc>" variables, because the missing drives now render that deliverable impossible.
- What can be done? Those contracts have to either be revised and compensated (yeah good f'in luck), or a massive workaround must be implemented to compensate - at additional cost to the company.
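To put rough numbers on the list above, here is a minimal sketch of the power-budget and fleet-shortfall arithmetic. Every figure in it (PSU wattage, base system draw, per-drive draw, node count, drive size) is made up for illustration, and the function names are hypothetical; swap in your own spec-sheet values.

```python
from fractions import Fraction


def usable_psu_watts(psu_watts: float, psu_count: int, redundant: bool) -> float:
    """Power budget the chassis can rely on.

    Redundant (N+1) mode holds one PSU in reserve, so only N supplies count.
    Combined mode counts every supply, but then one PSU failure can take
    the whole box down.
    """
    counted = psu_count - 1 if redundant else psu_count
    return psu_watts * counted


def max_drives(budget_watts: float, base_watts: float, drive_watts: float, bays: int) -> int:
    """How many drives fit in the budget once the rest of the system is fed."""
    headroom = budget_watts - base_watts
    if headroom <= 0:
        return 0
    return min(bays, int(headroom // drive_watts))


def fleet_shortfall_tb(nodes: int, drive_tb: float, specced: int, populated: int) -> float:
    """Raw NVMe capacity the spec promised but the fleet never received."""
    return nodes * (specced - populated) * drive_tb


# Hypothetical example values, not taken from any real system.
budget = usable_psu_watts(psu_watts=800, psu_count=2, redundant=True)
drives = max_drives(budget, base_watts=620, drive_watts=25, bays=8)
lost_fraction = Fraction(8 - drives, 8)
shortfall = fleet_shortfall_tb(nodes=10_000, drive_tb=16, specced=8, populated=drives)

print(f"{drives} of 8 drives fit; {float(lost_fraction):.1%} of raw NVMe is missing")
print(f"fleet shortfall: {shortfall:,.0f} TB")
```

With these invented inputs the chassis tops out at seven drives, which is the same 1/8th hole described above, multiplied across every node in the fleet.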
Why does that matter to building labs?
Because any lab worth its salt would have raised immediate red flags and halted the entire ill-devised scope. Then it would use standardized methods, periodically revised for the better, to:
- provide technical backing evidence
- scaling costs, IOP/s cost, power assessments, baseline to STDDEV ratio, then appropriation wear-cycle equations
- All that industry jargon that managerializes into the ether for all to repeat.
- "keep me honest" you may say, so then.. are you often lying? Ill-informed, unprepared, completely clueless?
- Maybe not, hopefully not, but analytical people will see you that way and eventually the Peter Principle catches up and BAM! You're hit by a <REDACTED> and your entire team is fired, and so you move on with your RSUs, talk yourself up, and try to ruin those who smote you from afar.
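For the "baseline to STDDEV ratio" item in the list above, here is one possible reading, sketched under assumptions: treat it as the sample standard deviation of repeated benchmark runs divided by their mean (the coefficient of variation), and flag any result set whose spread exceeds a threshold. The 5% threshold, the function names, and the sample numbers are all hypothetical.

```python
from statistics import mean, stdev


def variation_ratio(samples: list[float]) -> float:
    """Sample standard deviation divided by the mean of the runs."""
    if len(samples) < 2:
        raise ValueError("need at least two runs to measure spread")
    baseline = mean(samples)
    if baseline == 0:
        raise ValueError("a baseline of zero makes the ratio meaningless")
    return stdev(samples) / baseline


def is_repeatable(samples: list[float], max_ratio: float = 0.05) -> bool:
    """True when run-to-run spread stays within the (assumed) threshold."""
    return variation_ratio(samples) <= max_ratio


# Made-up throughput numbers from five repeated runs of the same test.
steady = [1000.0, 1010.0, 990.0, 1005.0, 995.0]
noisy = [1000.0, 1400.0, 700.0, 1200.0, 650.0]

print(is_repeatable(steady), is_repeatable(noisy))
```

A result set that fails a check like this is worth investigating before it becomes anyone's "baseline" in a slide deck.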
Congrats, Billy, you'll never change - and that is why we test in labs - because things change.
The Contract Statement of Work
Always get the scope in writing with sign-off. Scope changes mid-stream, not awesome.
Statement of Work
Consultant Services Engagement
This Statement of Work ("SOW") is entered into between [Client Legal
Name] ("Client") and [Consultant Legal Name] ("Consultant"), effective
as of [Effective Date], and is governed by the [Master Consulting
Agreement / Professional Services Agreement] dated [Agreement Date] (the
"Agreement"). In the event of any conflict between this SOW and the
Agreement, the Agreement will control.
-----------------------------------------------------------------------
Client                   [Client Legal Name]
Consultant               [Consultant Legal Name]
SOW Effective Date       [Effective Date]
Initial Term             Six (6) months from the SOW Effective Date
Target Level of Effort   Forty (40) hours per week, approximately
                         1,040 hours over the initial term
Primary Stakeholders     VP, OCTO, Eng-Org, App-Team, and Sales-Org
Commercial Terms         Billing rate, invoicing, payment terms, and
                         approved expenses as set forth in the
                         Agreement and/or applicable order form
-----------------------------------------------------------------------
1. Purpose and Objectives
The Consultant will provide technical leadership, architecture,
implementation, documentation, mentoring, and subject matter expertise
to establish standardized HPC benchmarking, workload automation, and
telemetry/analytics capabilities, while also improving cross-functional
execution and the overall developer experience for CORP products and
associated collateral.
- Define, document, and deploy a modular "Industry Expectations
Benchmark Suite" leveraging HPC, SPEC, Top500, AI/ML GPU leaderboards,
and SNIA methodologies to support rigorous adherence and consistent
results using standard performance testing tools.
- Define and deploy a standardized HPC "Workload Automation Pipeline"
utilizing SLURM, Jenkins, Celery, RabbitMQ ("RMQ"), ClickHouse,
Ansible, and related tooling.
- Define and deploy a standardized HPC "Telemetry + Analytics
Aggregation + Data Visualization Processing Pipeline" utilizing
Check_MK, SNMP, Elasticsearch + Kibana, Prometheus, InfluxDB, Grafana,
and related tooling.
- Assess current organizational working patterns across Eng-Org,
App-Team, and Sales-Org and provide a quarterly review to the VP and
OCTO.
- Assess the current CORP collateral surface area, including SDKs,
documentation, libraries, proofs of concept, and reference designs, as
an ongoing initiative to improve the developer experience.
- Author white papers, research documents, and related technical
materials, with a feedback loop to validate and improve internal and
external documentation.
- Provide mentoring for junior through senior application and systems
engineers, plus reasonable availability as an SME across storage,
systems, database, and AI/ML domains.
- Provide ad hoc consultation to support Sales-Org customer scoping,
implementation, and integration needs.
2. Term and Working Model
The engagement will begin on the SOW Effective Date and continue for an
initial six (6) month term. The Consultant will generally allocate forty
(40) hours per week to the engagement, subject to mutually agreed
priorities, holiday schedules, and reasonable scheduling adjustments.
The parties acknowledge that some workstreams in this SOW are ongoing or
iterative in nature. Accordingly, the Client and Consultant will jointly
prioritize and sequence the work so that the highest-value deliverables
are completed within the available time and level of effort.
Services may be provided remotely or through other mutually agreed
working arrangements. Any onsite work, travel, or work materially
outside the target level of effort will require prior written approval.
3. Scope of Services
3.1 Modular "Industry Expectations Benchmark Suite"
The Consultant will define, document, and deploy a modular benchmark
suite intended to measure representative performance, reproducibility,
and adherence to industry-standard methodologies across compute,
storage, and AI/ML use cases.
Illustrative deliverables:
- Benchmark taxonomy and workload catalog covering applicable HPC, SPEC,
Top500-style, AI/ML GPU leaderboard-relevant, and SNIA-aligned
methodologies.
- Benchmark execution standards, parameterization guidance, and runbooks
designed to improve repeatability and result consistency.
- Result templates and reporting conventions for capturing baseline
measurements and comparative findings.
- Initial implementation and/or deployment in designated Client
environment(s), subject to infrastructure readiness and access.
- Validation summary describing observed results, known limitations, and
next-step recommendations.
3.2 Standardized HPC "Workload Automation Pipeline"
The Consultant will define and deploy a standardized HPC workload
automation architecture using SLURM, Jenkins, Celery, RMQ/RabbitMQ,
ClickHouse, Ansible, and related tools as appropriate to the Client
environment.
Illustrative deliverables:
- Reference architecture and component interaction design for job
submission, orchestration, execution, retry handling, and result
capture.
- Deployment-ready configurations, automation artifacts, or initial
implementation in designated environment(s).
- Operational workflows for benchmark or workload scheduling, execution,
data collection, and reporting.
- Runbooks and handoff documentation sufficient to support repeatable
operation and internal adoption.
3.3 Standardized HPC "Telemetry + Analytics Aggregation + Data
Visualization Processing Pipeline"
The Consultant will define and deploy a standardized observability and
analytics processing pipeline spanning infrastructure telemetry,
log/event aggregation, time-series monitoring, analytics storage, and
visualization.
Illustrative deliverables:
- Reference architecture covering Check_MK, SNMP, Elasticsearch +
Kibana, Prometheus, InfluxDB, Grafana, and related integrations as
applicable.
- Recommended data model, retention approach, dashboard structure, and
operating practices for standardized reporting and analysis.
- Initial dashboards, visualizations, and/or deployment artifacts for
designated environment(s).
- Documentation for data sources, ingestion paths, dashboards, and
operational ownership considerations.
3.4 Cross-Functional Collaboration Assessment
The Consultant will assess current organizational working patterns
across Eng-Org, App-Team, and Sales-Org in order to identify friction
points, handoff gaps, role ambiguity, communication issues, and
opportunities to improve execution.
Illustrative deliverables:
- Current-state assessment and concise findings memo.
- Recommendations for cadence, decision ownership, communication flow,
and escalation patterns.
- Quarterly review readout(s) presented to the VP and OCTO during the
engagement term.
3.5 CORP Collateral Surface Area and Developer Experience
The Consultant will assess the current ‘surface area’ of CORP collateral,
including SDKs, documentation, libraries, proofs of concept, and
reference designs, as an iterative stretch initiative with the overall
goal of improving the developer experience.
Illustrative deliverables:
- Inventory and gap analysis of current CORP technical collateral.
- Prioritized recommendations and/or backlog themes to improve
consistency, usability, completeness, and customer/integrator
experience.
- Feedback loop for validating and improving internal and external
documentation and supporting artifacts.
- Periodic progress updates and next-step roadmap recommendations within
the engagement window.
3.6 White Papers, Research Documents, and Documentation Improvement
The Consultant will author white papers, research documents, technical
notes, and similar written materials as prioritized by the Client, and
will help create a feedback loop to validate and improve internal and
external documentation quality.
Illustrative deliverables:
- White papers, technical briefs, research documents, architecture
notes, or similar written outputs as mutually prioritized.
- Document review comments, revision recommendations, and technical
fact-checking input.
- Structured feedback loop to capture lessons learned and improve future
documentation quality.
3.7 Mentoring and Knowledge Transfer
The Consultant will provide mentoring support for junior through senior
application and systems engineers, with an emphasis on practical
technical development and hands-on knowledge transfer.
Illustrative deliverables:
- Office hours, design reviews, pair-working sessions, workshops, or
other mentoring interactions as appropriate.
- Guidance on benchmarking, automation, observability, storage, systems,
database, and AI/ML topics relevant to the engagement.
- Knowledge transfer materials or summary notes where appropriate to
support broader team adoption.
3.8 SME Availability and Sales-Org Consultation
The Consultant will maintain reasonable availability to organizational
staff in applicable subject-matter areas of focus and will provide ad
hoc consultation for Sales-Org customer scoping, implementation, and
integration needs, as capacity permits within the contracted level of
effort.
Illustrative deliverables:
- Advisory support for storage, systems, database, and AI/ML questions
arising during the engagement.
- Participation in internal and customer-facing scoping discussions,
solution reviews, or implementation planning sessions, when requested
by the Client.
- Technical input intended to improve solution fit, implementation
readiness, and integration clarity; provided as advisory support
rather than as a standalone managed service or customer support
commitment.
4. Indicative Delivery Plan
The following plan is illustrative and may be reprioritized by mutual
agreement based on Client needs, environment readiness, and emerging
business priorities.
-----------------------------------------------------------------------
Period       Primary Focus             Illustrative Outputs
-----------  ------------------------  --------------------------------
Month 1      Discovery and Baseline    Stakeholder alignment,
                                       current-state review,
                                       environment access, priority
                                       mapping, benchmark/workload
                                       definition, and architecture
                                       framing.

Months 2-3   Design and Initial        Benchmark suite v1, automation
             Deployment                pipeline design and initial
                                       deployment artifacts, telemetry
                                       pipeline design, and initial
                                       dashboards or data paths.

Months 4-5   Validation and            Validation runs, documentation
             Expansion                 expansion, collaboration
                                       assessment findings, collateral
                                       review, white papers/research
                                       outputs, and mentoring support.

Month 6      Handoff and Executive     Runbooks, recommendations,
             Review                    backlog or roadmap items,
                                       knowledge transfer, and
                                       quarterly/executive review
                                       presentation(s) to the VP and
                                       OCTO.
-----------------------------------------------------------------------
5. Governance and Reporting
- The Client will designate a primary engagement owner or point of
contact responsible for prioritization, coordination, and consolidated
feedback.
- The parties will maintain a regular working cadence, which may include
weekly or biweekly syncs, technical reviews, design reviews, and
status reporting as appropriate.
- The Consultant will provide concise updates on progress, risks,
blockers, and recommended next actions.
- A quarterly review readout will be provided to the VP and OCTO during
the engagement term.
6. Assumptions, Dependencies, and Constraints
- The Client will provide timely access to relevant systems,
environments, credentials, documentation, repositories, data, and
subject-matter stakeholders reasonably required for the services.
- The Client is responsible for providing or approving the hardware,
software, licenses, network access, security approvals, and
change-management windows required to implement or deploy
deliverables.
- Any production deployment, change control, or release activity remains
subject to Client approval processes and operational safeguards.
- Where this SOW references industry-standard benchmarks or
leaderboard-adjacent methodologies, any formal external publication or
third-party submission will remain subject to separate Client approval
and third-party rules.
- The Consultant’s work product may include recommendations, reference
architectures, configurations, code or scripts, runbooks, dashboards,
presentations, reports, and live working sessions; not every
workstream will result in a standalone written deliverable.
- Because certain activities are advisory and ongoing in nature, the
parties will periodically rebalance priorities to remain within the
contracted time and level of effort.
- Travel, onsite presence, after-hours support, and work materially
beyond the initial term or weekly level of effort are excluded unless
separately approved in writing.
7. Out of Scope
- Managed services, long-term production operations ownership, or 24x7
operational support.
- Hardware procurement, datacenter installation, or unrelated
infrastructure build-out not directly tied to the services described
in this SOW.
- Binding commitments to end customers, contractual promises to third
parties, or customer project delivery obligations not expressly
authorized by the Client.
- Ongoing maintenance, enhancement, or support of deliverables after the
end of the engagement term, except as separately agreed in writing.
8. Acceptance and Change Control
Commercial terms, invoicing, payment terms, intellectual property,
confidentiality, warranties, limitation of liability, and any formal
acceptance procedures will be governed by the Agreement.
Where the Agreement does not specify a separate acceptance procedure,
the Client will review deliverables promptly and provide consolidated
feedback within five (5) business days after delivery. Deliverables will
be deemed accepted when they materially conform to the scope described
in this SOW, subject to any documented punch-list items or agreed
revisions.
Any material change to scope, deliverables, timeline, or level of effort
will require written approval by both parties.
9. Signatures
The parties acknowledge and agree to this SOW as of the dates written
below.
[Consultant Legal Name]
By: ________________________________
Name: ______________________________
Title: _______________________________
Date: _______________________________

[Client Legal Name]
By: ________________________________
Name: ______________________________
Title: _______________________________
Date: _______________________________