AMD · Taiwan, Jingmao 1st Rd. Nangang Dist. Taipei City 115, TW · 3 days ago
WHAT YOU DO AT AMD CHANGES EVERYTHING
At AMD, our mission is to build great products that accelerate next-generation computing experiences—from AI and data centers, to PCs, gaming and embedded systems. Grounded in a culture of innovation and collaboration, we believe real progress comes from bold ideas, human ingenuity and a shared passion to create something extraordinary. When you join AMD, you’ll discover the real differentiator is our culture. We push the limits of innovation to solve the world’s most important challenges—striving for execution excellence, while being direct, humble, collaborative, and inclusive of diverse perspectives. Join us as we shape the future of AI and beyond. Together, we advance your career.
Key Responsibilities
Customer Engagement & Technical Debug Support
Serve as the primary technical interface for customers on GPU server bring‑up, stability, and debug issues.
Support customers during system integration, validation, and production ramp, acting as the first line of escalation.
Pre‑Sales & Post‑Sales Support
Support POC, EVT/DVT/PVT, and early customer deployments from a system debug perspective.
Review customer system architecture and provide debug readiness, risk assessment, and best‑practice guidance.
Server Bring‑Up & Issue Debugging - EPYC
Diagnose and resolve server‑level issues including boot failures, OS bring‑up, GPU/NIC detection, PCIe issues, and system hangs.
Perform HW/SW co‑debug across BIOS/UEFI, BMC, firmware, drivers, OS, and GPU stacks.
Analyze logs, dumps, and traces (BIOS, BMC, OS, GPU, NIC) to isolate root causes.
Work closely with ODMs, component vendors, and internal engineering teams to drive issue closure.
GPU & Platform Debug - Instinct/Pensando
Debug GPU server issues related to power, thermals, PCIe, interconnects, and multi‑GPU configurations.
Validate GPU functionality under stress, burn‑in, and long‑run stability conditions.
Support RMA analysis and failure reproduction when required.
Performance Validation & Stability
Assist with system‑level performance validation and identify platform bottlenecks.
Support customer concerns related to system stability, reliability, and scalability in multi‑GPU servers.
Documentation & Knowledge Sharing
Create debug guides, checklists, and best‑practice documents for server bring‑up and issue triage.
Provide technical training to customers and internal teams on server debug methodology and tools.
Qualifications
Bachelor’s or Master’s degree in a related field.
5+ years of experience in server platform debug, GPU systems, or data center hardware support.
Strong understanding of x86 server architecture, GPU platforms, PCIe, memory, power, and thermals.
Hands‑on experience with Linux OS, system logs, firmware, and driver‑level debugging.
Experience working with ODMs/OEMs and cross‑functional engineering teams.
Strong communication skills for customer‑facing debug and escalation management.
Preferred Skills
Experience debugging GPU servers or AI/HPC platforms in customer environments.
Familiarity with BIOS/UEFI, BMC (OpenBMC), firmware update flows, and server validation stages.
Understanding of networking (NICs, RDMA, Ethernet/InfiniBand) in GPU servers.
Ability to work independently, manage multiple customer issues, and drive problems to closure.
Benefits offered are described in AMD benefits at a glance.
AMD does not accept unsolicited resumes from headhunters, recruitment agencies, or fee-based recruitment services. AMD and its subsidiaries are equal opportunity, inclusive employers and will consider all applicants without regard to age, ancestry, color, marital status, medical condition, mental or physical disability, national origin, race, religion, political and/or third-party affiliation, sex, pregnancy, sexual orientation, gender identity, military or veteran status, or any other characteristic protected by law. We encourage applications from all qualified candidates and will accommodate applicants’ needs under the respective laws throughout all stages of the recruitment and selection process.
AMD may use Artificial Intelligence to help screen, assess or select applicants for this position. AMD’s “Responsible AI Policy” is available here.
This posting is for an existing vacancy.
Headquarters
Taiwan, Jingmao 1st Rd. Nangang Dist. Taipei City 115
Work Location
Hybrid
Job Category
Engineering
Application Deadline
Not specified
Job Type
Full-time
Experience Level
Manager-level
Application Method
Apply via JobSpring
Salary
Not specified