Let's Build a Lab - Part 2

Contract work is occasionally... fun... more so when it involves building labs.

Decent results but pretty accurate if the bench were a bit more organized with ESD grounds.

So many words. Even today with my own writing I want a TL;DR, so fuck it, here's a not-boilerplate version of one standardized "Statement of Work" contract that is oriented toward building labs. Timelines are negotiable, scope is negotiable, hourly rate is not often negotiable - I learned that lesson a long time ago.

For those not familiar, the SoW is a consulting agreement between contractor and employer that defines the scope, timeline, requirements, etc. to "do the needful", in the broader sense of everyone's favorite phrase from the ol' "Follow the Sun" SLA support model of the early pre-Cloud era (sorry cloudbabies, it's from when you were single digits old or not yet born).

This specific SoW has been used in prior endeavors which eventually follow the typical enterprise hardware lifecycle in no specific order [1]: forklifts, retros, rebuilds, expansion, and... words.. so many words.

Labs last a long time, so it's important to build things correctly the first time - not waste time hacking together whatever compromised decision trees were necessary due to last-minute planning (or no planning at all) and little to no experience with building labs. Remember, if you are a specialized engineer who excels in one or two areas, that does not imply domain-specific knowledge of adjacent or entirely different sides of the engineering world. I'm a systems architect .. I don't build jet engines (though it's probably fun), so I do what I'm good at. In other words, "stay in your lane or learn to love losing - but protect that ego at all costs and never admit fault ever!" (yes, I'm bitter and for exceptionally valid reasons).

So, until the pre-IPO runway runs out, or the post-IPO environment bores people to tears and they jump ship with their ISOs and RSUs and START ALL OVER AGAIN! (I have definitely never not done this exact thing several times never for sure for sure? sure.)


Successful Labs Require People and Hardware

Vaporware labs don't have hardware. Real labs do not revolve around PPTX slide decks, or Slack Canvas, or SharePoint whatever, or ephemeral Confluence pages that change ownership at the click of a button. Real labs have real hardware and real engineers and architects involved at every critical stage to ensure that standards are being used, implemented, tested, and validated.

If you think I'm wrong, go ahead, do it all by slide deck, by half-baked simulators that don't even use SST and parallelization correctly, with lots of Big Discussions for the Big People you want to impress to get ahead instead of making quality hardware.

See how far the Peter Principle gets you - chances are, if you're surrounded by "Yes Men" and "C-Suite but never Technical" boardrooms, you will get far enough to be proven the fraud that you are.

The Peter principle is a concept in management .. which observes that people in a hierarchy tend to rise to "a level of respective incompetence": employees are promoted based on their success in previous jobs until they reach a level at which they are no longer competent, as skills in one job do not necessarily translate to another.

Having risen rapidly, and since you're up high now, you can destroy entire companies by making extremely stupid choices "to save money", like not spec'ing hot-swap trays for NVMe drives on 1U systems which were never designed for the wattage you're running and were never spec'd for the BTU and LFM required to keep the systems from burning out 100G optics.
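To put rough numbers on the BTU and LFM point, here is a minimal sketch of the conversion from node wattage to heat load and required airflow. The node wattage, temperature rise, and duct cross-section are hypothetical values picked for illustration, and the air density and specific heat are assumed sea-level figures; a real thermal spec would come from the chassis vendor and measured inlet/outlet temperatures.

```python
# Rough thermal budget for a single node. All inputs are hypothetical.

WATTS_TO_BTU_PER_HR = 3.412   # 1 W is about 3.412 BTU/hr
M3_PER_S_TO_CFM = 2118.88     # 1 m^3/s is about 2118.88 ft^3/min
AIR_DENSITY = 1.2             # kg/m^3, assumed (sea level, roughly 20 C)
AIR_SPECIFIC_HEAT = 1005.0    # J/(kg*K), assumed


def btu_per_hour(watts: float) -> float:
    """Heat load in BTU/hr for a given electrical draw in watts."""
    return watts * WATTS_TO_BTU_PER_HR


def required_cfm(watts: float, delta_t_c: float) -> float:
    """Airflow (CFM) needed to remove `watts` with an air temperature
    rise of `delta_t_c` degrees C from inlet to outlet."""
    m3_per_s = watts / (AIR_DENSITY * AIR_SPECIFIC_HEAT * delta_t_c)
    return m3_per_s * M3_PER_S_TO_CFM


def required_lfm(cfm: float, duct_area_sqft: float) -> float:
    """Linear feet per minute: volumetric flow divided by cross-section."""
    return cfm / duct_area_sqft


if __name__ == "__main__":
    node_watts = 800.0      # hypothetical 1U node draw
    delta_t_c = 15.0        # hypothetical allowed inlet-to-outlet rise
    duct_area_sqft = 0.2    # hypothetical free cross-section of the chassis

    cfm = required_cfm(node_watts, delta_t_c)
    print(f"Heat load: {btu_per_hour(node_watts):.0f} BTU/hr")
    print(f"Airflow:   {cfm:.1f} CFM, {required_lfm(cfm, duct_area_sqft):.0f} LFM")
```

If the chassis fans cannot deliver that airflow at the allowed temperature rise, the wattage does not belong in that box, regardless of what the slide deck says.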

Just when you think it's all good, um why are there only seven drives in this chassis that can run with eight?

  • Oh it's because the 8th drive would draw too much power and overload the non-redundant PSUs. What happens when a system which should have N+1 has only N?
  • The PSUs could be in redundant mode but Billy didn't want to authorize the enablement of active/active mode, and instead has demanded that his lackey make the systems run the PSUs in "combined mode" to try to get more power to that 8th drive... but no... not possible.
  • So all of the fancy new systems can't have eight drives like the original spec REQUIRED in order to fulfill global fleet machine generation scaling models. As in, you lose 1/8th of your drives on every new system, across so many thousands of systems. 1/8th, for those not inclined to fractional conversions, is 12.5%.
  • So now all of those systems will be missing ~2-16-64TB of NVMe drives (multiply by number of nodes per rack, per pod, per cage, per site, per region, per global fleet). That's a lot of expected high-speed storage that is now not being deployed.
  • Hopefully no contracts were made to customers based on "Total NVMe Raw Storage per <unit/region/etc>" variables, which now renders the deliverable impossible.
  • What can be done? Those contracts have to either be revised and compensated (yeah good f'in luck), or a massive workaround must be implemented to compensate - at additional cost to the company.
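The arithmetic in the bullets above can be sketched in a few lines. Every number here (PSU wattage, base system draw, per-drive draw, drive capacity, node count) is made up for illustration and is not taken from any real system; the class and function names are hypothetical as well.

```python
from dataclasses import dataclass


@dataclass
class PsuConfig:
    psu_count: int
    psu_watts: float

    def usable_watts(self, redundant: bool) -> float:
        """N+1 redundancy holds one PSU in reserve; combined mode uses all."""
        active = self.psu_count - 1 if redundant else self.psu_count
        return active * self.psu_watts


def max_drives(budget_watts: float, base_watts: float,
               drive_watts: float, bays: int) -> int:
    """How many drives fit in the power left over after the base system."""
    headroom = budget_watts - base_watts
    if headroom <= 0:
        return 0
    return min(bays, int(headroom // drive_watts))


def fleet_shortfall_tb(spec_drives: int, actual_drives: int,
                       drive_tb: float, nodes: int) -> float:
    """Raw capacity promised by the spec but never deployed."""
    return (spec_drives - actual_drives) * drive_tb * nodes


if __name__ == "__main__":
    psus = PsuConfig(psu_count=2, psu_watts=500.0)     # hypothetical
    base_watts, drive_watts, bays = 810.0, 25.0, 8     # hypothetical

    for redundant in (False, True):
        budget = psus.usable_watts(redundant)
        fit = max_drives(budget, base_watts, drive_watts, bays)
        mode = "N+1 redundant" if redundant else "combined"
        print(f"{mode}: {budget:.0f} W budget, {fit} of {bays} drives")

    # One missing 8 TB drive per node across a hypothetical 10,000 nodes.
    missing = fleet_shortfall_tb(8, 7, drive_tb=8.0, nodes=10_000)
    print(f"Fleet shortfall: {missing:,.0f} TB raw ({1 / 8:.1%} of spec)")
```

Running this kind of check against the real PSU and drive datasheets before the purchase order goes out is exactly the sort of thing a lab exists to do.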

Why does that matter to building labs?

Because any lab worth its salt would have raised immediate red flags and halted the entire ill-devised scope. Then they would use standardized methods, periodically revised for betterment, to back that up:

  • technical backing evidence
  • scaling costs, IOP/s cost, power assessments, baseline to STDDEV ratio, then appropriation wear-cycle equations
  • All that industry jargon that managerializes into the ether for all to repeat.
  • "keep me honest" you may say, so then.. are you often lying? Ill-informed, unprepared, completely clueless?
  • Maybe not, hopefully not, but analytical people will see you that way and eventually the Peter Principle catches up and BAM! You're hit by a <REDACTED> and your entire team is fired, and so you move on with your RSUs, talk yourself up, and try to ruin those who smote you from afar.
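As one reading of the "baseline to STDDEV ratio" item above, here is a minimal sketch that summarizes repeated benchmark runs and flags a result set as repeatable only when the spread is small relative to the mean. Treating the ratio as mean divided by sample standard deviation is an assumption, as are the 5% threshold, the function names, and the sample numbers.

```python
import statistics


def summarize_runs(samples: list[float]) -> dict[str, float]:
    """Mean, sample standard deviation, and their ratios for repeated runs."""
    mean = statistics.fmean(samples)
    stdev = statistics.stdev(samples)  # needs at least two samples
    return {
        "mean": mean,
        "stdev": stdev,
        "cv": stdev / mean,  # coefficient of variation
        "mean_to_stdev": mean / stdev if stdev else float("inf"),
    }


def is_repeatable(samples: list[float], max_cv: float = 0.05) -> bool:
    """True when run-to-run spread is within max_cv of the mean."""
    return summarize_runs(samples)["cv"] <= max_cv


if __name__ == "__main__":
    # Hypothetical throughput results (arbitrary units) from five runs each.
    tight = [100.0, 102.0, 98.0, 101.0, 99.0]
    noisy = [100.0, 150.0, 50.0, 120.0, 80.0]

    for name, runs in (("tight", tight), ("noisy", noisy)):
        stats = summarize_runs(runs)
        print(f"{name}: mean={stats['mean']:.1f} stdev={stats['stdev']:.2f} "
              f"cv={stats['cv']:.3f} repeatable={is_repeatable(runs)}")
```

A baseline that cannot pass a check like this is not a baseline yet, and it should not be quoted to anyone outside the lab.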

Congrats, Billy, you'll never change - and that is why we test in labs - because things change.


The Contract Statement of Work

Always get the scope in writing with sign-off. Scope changes mid-stream, not awesome.

Statement of Work

Consultant Services Engagement

This Statement of Work ("SOW") is entered into between [Client Legal
Name] ("Client") and [Consultant Legal Name] ("Consultant"), effective
as of [Effective Date], and is governed by the [Master Consulting
Agreement / Professional Services Agreement] dated [Agreement Date] (the
"Agreement"). In the event of any conflict between this SOW and the
Agreement, the Agreement will control.

  -----------------------------------------------------------------------
  Client                              [Client Legal Name]
  ----------------------------------- -----------------------------------
  Consultant                          [Consultant Legal Name]

  SOW Effective Date                  [Effective Date]

  Initial Term                        Six (6) months from the SOW
                                      Effective Date

  Target Level of Effort              Forty (40) hours per week,
                                      approximately 1,040 hours over the
                                      initial term

  Primary Stakeholders                VP, OCTO, Eng-Org, App-Team, and
                                      Sales-Org

  Commercial Terms                    Billing rate, invoicing, payment
                                      terms, and approved expenses as set
                                      forth in the Agreement and/or
                                      applicable order form
  -----------------------------------------------------------------------

1. Purpose and Objectives

The Consultant will provide technical leadership, architecture,
implementation, documentation, mentoring, and subject matter expertise
to establish standardized HPC benchmarking, workload automation, and
telemetry/analytics capabilities, while also improving cross-functional
execution and the overall developer experience for CORP products and
associated collateral.

- Define, document, and deploy a modular "Industry Expectations
  Benchmark Suite" leveraging HPC, SPEC, Top500, AI/ML GPU leaderboards,
  and SNIA methodologies to support rigorous adherence and consistent
  results using standard performance testing tools.

- Define and deploy a standardized HPC "Workload Automation Pipeline"
  utilizing SLURM, Jenkins, Celery, RabbitMQ ("RMQ"), ClickHouse,
  Ansible, and related tooling.

- Define and deploy a standardized HPC "Telemetry + Analytics
  Aggregation + Data Visualization Processing Pipeline" utilizing
  Check_MK, SNMP, Elasticsearch + Kibana, Prometheus, InfluxDB, Grafana,
  and related tooling.

- Assess current organizational working patterns across Eng-Org,
  App-Team, and Sales-Org and provide a quarterly review to the VP and
  OCTO.

- Assess the current CORP collateral surface area, including SDKs,
  documentation, libraries, proofs of concept, and reference designs, as
  an ongoing initiative to improve the developer experience.

- Author white papers, research documents, and related technical
  materials, with a feedback loop to validate and improve internal and
  external documentation.

- Provide mentoring for junior through senior application and systems
  engineers, plus reasonable availability as an SME across storage,
  systems, database, and AI/ML domains.

- Provide ad hoc consultation to support Sales-Org customer scoping,
  implementation, and integration needs.

2. Term and Working Model

The engagement will begin on the SOW Effective Date and continue for an
initial six (6) month term. The Consultant will generally allocate forty
(40) hours per week to the engagement, subject to mutually agreed
priorities, holiday schedules, and reasonable scheduling adjustments.

The parties acknowledge that some workstreams in this SOW are ongoing or
iterative in nature. Accordingly, the Client and Consultant will jointly
prioritize and sequence the work so that the highest-value deliverables
are completed within the available time and level of effort.

Services may be provided remotely or through other mutually agreed
working arrangements. Any onsite work, travel, or work materially
outside the target level of effort will require prior written approval.

3. Scope of Services

3.1 Modular "Industry Expectations Benchmark Suite"

The Consultant will define, document, and deploy a modular benchmark
suite intended to measure representative performance, reproducibility,
and adherence to industry-standard methodologies across compute,
storage, and AI/ML use cases.

Illustrative deliverables:

- Benchmark taxonomy and workload catalog covering applicable HPC, SPEC,
  Top500-style, AI/ML GPU leaderboard-relevant, and SNIA-aligned
  methodologies.

- Benchmark execution standards, parameterization guidance, and runbooks
  designed to improve repeatability and result consistency.

- Result templates and reporting conventions for capturing baseline
  measurements and comparative findings.

- Initial implementation and/or deployment in designated Client
  environment(s), subject to infrastructure readiness and access.

- Validation summary describing observed results, known limitations, and
  next-step recommendations.

3.2 Standardized HPC "Workload Automation Pipeline"

The Consultant will define and deploy a standardized HPC workload
automation architecture using SLURM, Jenkins, Celery, RMQ/RabbitMQ,
ClickHouse, Ansible, and related tools as appropriate to the Client
environment.

Illustrative deliverables:

- Reference architecture and component interaction design for job
  submission, orchestration, execution, retry handling, and result
  capture.

- Deployment-ready configurations, automation artifacts, or initial
  implementation in designated environment(s).

- Operational workflows for benchmark or workload scheduling, execution,
  data collection, and reporting.

- Runbooks and handoff documentation sufficient to support repeatable
  operation and internal adoption.

3.3 Standardized HPC "Telemetry + Analytics Aggregation + Data
Visualization Processing Pipeline"

The Consultant will define and deploy a standardized observability and
analytics processing pipeline spanning infrastructure telemetry,
log/event aggregation, time-series monitoring, analytics storage, and
visualization.

Illustrative deliverables:

- Reference architecture covering Check_MK, SNMP, Elasticsearch +
  Kibana, Prometheus, InfluxDB, Grafana, and related integrations as
  applicable.

- Recommended data model, retention approach, dashboard structure, and
  operating practices for standardized reporting and analysis.

- Initial dashboards, visualizations, and/or deployment artifacts for
  designated environment(s).

- Documentation for data sources, ingestion paths, dashboards, and
  operational ownership considerations.

3.4 Cross-Functional Collaboration Assessment

The Consultant will assess current organizational working patterns
across Eng-Org, App-Team, and Sales-Org in order to identify friction
points, handoff gaps, role ambiguity, communication issues, and
opportunities to improve execution.

Illustrative deliverables:

- Current-state assessment and concise findings memo.

- Recommendations for cadence, decision ownership, communication flow,
  and escalation patterns.

- Quarterly review readout(s) presented to the VP and OCTO during the
  engagement term.

3.5 CORP Collateral Surface Area and Developer Experience

The Consultant will assess the current ‘surface area’ of CORP collateral,
including SDKs, documentation, libraries, proofs of concept, and
reference designs, as an iterative stretch initiative with the overall
goal of improving the developer experience.

Illustrative deliverables:

- Inventory and gap analysis of current CORP technical collateral.

- Prioritized recommendations and/or backlog themes to improve
  consistency, usability, completeness, and customer/integrator
  experience.

- Feedback loop for validating and improving internal and external
  documentation and supporting artifacts.

- Periodic progress updates and next-step roadmap recommendations within
  the engagement window.

3.6 White Papers, Research Documents, and Documentation Improvement

The Consultant will author white papers, research documents, technical
notes, and similar written materials as prioritized by the Client, and
will help create a feedback loop to validate and improve internal and
external documentation quality.

Illustrative deliverables:

- White papers, technical briefs, research documents, architecture
  notes, or similar written outputs as mutually prioritized.

- Document review comments, revision recommendations, and technical
  fact-checking input.

- Structured feedback loop to capture lessons learned and improve future
  documentation quality.

3.7 Mentoring and Knowledge Transfer

The Consultant will provide mentoring support for junior through senior
application and systems engineers, with an emphasis on practical
technical development and hands-on knowledge transfer.

Illustrative deliverables:

- Office hours, design reviews, pair-working sessions, workshops, or
  other mentoring interactions as appropriate.

- Guidance on benchmarking, automation, observability, storage, systems,
  database, and AI/ML topics relevant to the engagement.

- Knowledge transfer materials or summary notes where appropriate to
  support broader team adoption.

3.8 SME Availability and Sales-Org Consultation

The Consultant will maintain reasonable availability to organizational
staff in applicable subject-matter areas of focus and will provide ad
hoc consultation for Sales-Org customer scoping, implementation, and
integration needs, as capacity permits within the contracted level of
effort.

Illustrative deliverables:

- Advisory support for storage, systems, database, and AI/ML questions
  arising during the engagement.

- Participation in internal and customer-facing scoping discussions,
  solution reviews, or implementation planning sessions, when requested
  by the Client.

- Technical input intended to improve solution fit, implementation
  readiness, and integration clarity; provided as advisory support
  rather than as a standalone managed service or customer support
  commitment.

4. Indicative Delivery Plan

The following plan is illustrative and may be reprioritized by mutual
agreement based on Client needs, environment readiness, and emerging
business priorities.

  -----------------------------------------------------------------------
          Period          Primary Focus           Illustrative Outputs
  ----------------------- ----------------------- -----------------------
          Month 1         Discovery and Baseline  Stakeholder alignment,
                                                  current-state review,
                                                  environment access,
                                                  priority mapping,
                                                  benchmark/workload
                                                  definition, and
                                                  architecture framing.

        Months 2-3        Design and Initial      Benchmark suite v1,
                          Deployment              automation pipeline
                                                  design and initial
                                                  deployment artifacts,
                                                  telemetry pipeline
                                                  design, and initial
                                                  dashboards or data
                                                  paths.

        Months 4-5        Validation and          Validation runs,
                          Expansion               documentation
                                                  expansion,
                                                  collaboration
                                                  assessment findings,
                                                  collateral review,
                                                  white papers/research
                                                  outputs, and mentoring
                                                  support.

          Month 6         Handoff and Executive   Runbooks,
                          Review                  recommendations,
                                                  backlog or roadmap
                                                  items, knowledge
                                                  transfer, and
                                                  quarterly/executive
                                                  review presentation(s)
                                                  to the VP and OCTO.
  -----------------------------------------------------------------------

5. Governance and Reporting

- The Client will designate a primary engagement owner or point of
  contact responsible for prioritization, coordination, and consolidated
  feedback.

- The parties will maintain a regular working cadence, which may include
  weekly or biweekly syncs, technical reviews, design reviews, and
  status reporting as appropriate.

- The Consultant will provide concise updates on progress, risks,
  blockers, and recommended next actions.

- A quarterly review readout will be provided to the VP and OCTO during
  the engagement term.

6. Assumptions, Dependencies, and Constraints

- The Client will provide timely access to relevant systems,
  environments, credentials, documentation, repositories, data, and
  subject-matter stakeholders reasonably required for the services.

- The Client is responsible for providing or approving the hardware,
  software, licenses, network access, security approvals, and
  change-management windows required to implement or deploy
  deliverables.

- Any production deployment, change control, or release activity remains
  subject to Client approval processes and operational safeguards.

- Where this SOW references industry-standard benchmarks or
  leaderboard-adjacent methodologies, any formal external publication or
  third-party submission will remain subject to separate Client approval
  and third-party rules.

- The Consultant’s work product may include recommendations, reference
  architectures, configurations, code or scripts, runbooks, dashboards,
  presentations, reports, and live working sessions; not every
  workstream will result in a standalone written deliverable.

- Because certain activities are advisory and ongoing in nature, the
  parties will periodically rebalance priorities to remain within the
  contracted time and level of effort.

- Travel, onsite presence, after-hours support, and work materially
  beyond the initial term or weekly level of effort are excluded unless
  separately approved in writing.

7. Out of Scope

- Managed services, long-term production operations ownership, or 24x7
  operational support.

- Hardware procurement, datacenter installation, or unrelated
  infrastructure build-out not directly tied to the services described
  in this SOW.

- Binding commitments to end customers, contractual promises to third
  parties, or customer project delivery obligations not expressly
  authorized by the Client.

- Ongoing maintenance, enhancement, or support of deliverables after the
  end of the engagement term, except as separately agreed in writing.

8. Acceptance and Change Control

Commercial terms, invoicing, payment terms, intellectual property,
confidentiality, warranties, limitation of liability, and any formal
acceptance procedures will be governed by the Agreement.

Where the Agreement does not specify a separate acceptance procedure,
the Client will review deliverables promptly and provide consolidated
feedback within five (5) business days after delivery. Deliverables will
be deemed accepted when they materially conform to the scope described
in this SOW, subject to any documented punch-list items or agreed
revisions.

Any material change to scope, deliverables, timeline, or level of effort
will require written approval by both parties.

9. Signatures

The parties acknowledge and agree to this SOW as of the dates written
below.

[Consultant Legal Name]
By: ________________________________
Name: ______________________________
Title: _______________________________
Date: _______________________________

[Client Legal Name]
By: ________________________________
Name: ______________________________
Title: _______________________________
Date: _______________________________