Job Description
Job Title: Hardware Qualification Engineer
Job Location: San Francisco Bay Area / Santa Clara, CA / NYC, NY / Clifton, NJ (onsite from day one)
Job Duration: 12+ months
Minimum experience required: 5+ years
Type of Hire: Contract
JOB DETAILS:
Team Focus Areas:
Automation, Python
Linux commands
System- or setup-level understanding
Interface knowledge (PCIe, DDR, SPI, I2C)
Firmware knowledge
Required Skills:
In-depth understanding of hardware designs and subsystems (e.g., BMC, PCIe, CPU, GPU).
Proven experience qualifying hardware designs for production release (SKU qualification).
Experience with component-level testing across various subsystems (Component Qual).
Strong Linux systems experience, including:
Troubleshooting networking interfaces
Developing and applying configuration management
Enforcing security best practices
System monitoring and debugging
Experience with firmware testing and deployment (Firmware Qual).
Strong Python scripting and automation skills.
Skilled in working with one or more orchestration frameworks.
Strong analytical and communication skills.
Strong documentation skills, including writing test plans and internal documentation for engineering and operations teams.
Proven collaboration with cross-functional and geographically distributed teams.
Bonus Skills:
Hands-on experience with High Performance Computing (HPC) clusters.
Hardware expertise in NVIDIA or AMD platforms.
Experience with large-scale automated testing.
Experience using NCCL or similar collective communication libraries.
Experience developing and maintaining test-execution automation frameworks for hands-free qualification.
Responsibilities:
Provide onsite support for hardware qualification efforts in NYC3 and SFO2.
Qualify new server SKUs across Compute, GPU Hypervisor, Storage, and Infrastructure hardware.
Perform hardware validation against design targets (functional and performance).
Perform hardware reconfiguration to support testing needs (e.g., system changes, component swaps).
Troubleshoot and integrate hardware with platform operational tools (onboarding).
Execute hardware validation and qualification tests.
Conduct firmware, BIOS, and kernel upgrade testing.
Run automated test cases, analyze logs, and monitor performance and system health.