Team Insight: The Data Center Infrastructure Services (DCIS) team sits within TikTok's global technology structure and supports the company's fast growth by building and operating hyper-scale datacenters, managing the lifecycle of the server fleet, providing cloud solutions, and developing infrastructure services, ensuring they are scalable and reliable.
Role Insight:
We are seeking an experienced Datacenter Operations Engineer to apply technical expertise in a dynamic, fast-paced environment. This role requires strong knowledge of server hardware and mechanical/electrical infrastructure in large-scale datacenters. You will be responsible for diagnosing and resolving server issues, escalating when necessary, and collaborating closely with remote teams. In addition, you will work within the server rack lifecycle process to support the buildout of computing and storage environments. Candidates should have hands-on experience in at least one of the following areas: Networking, Scripting, or Hardware Repair. Success in this role requires excellent communication skills, the ability to work both independently and in a team, and the adaptability to thrive in a rapidly changing environment.
- Manage and respond to requests within the ticketing system, prioritizing by business impact and urgency to ensure timely resolution of escalations
- Perform hardware replacements, component swaps, power cycles, BIOS / RAID checks, OS reimages, and CLI log reviews
- Troubleshoot independently using internal tools (e.g., SAOS, DCIM), escalating complex issues with detailed context when required
- Support rack-and-stack activities, cabling, power-up procedures, and physical installation of new infrastructure
- Contribute to datacenter initiatives such as capacity expansions, infrastructure retrofits, and localized upgrade projects
- Lead small deployment efforts, assist in project management tasks, track milestones, and coordinate cross-functional communication
- Maintain accurate inventory, asset management, and operational documentation (including SOP reviews and updates)
- Participate in recovery efforts for high-impact incidents and perform root cause analysis using logs, error codes, and device history
- Identify recurring issues, capture and escalate data on repeat offenders, and flag vendor/OEM-related concerns
- Apply change management practices and act as a change initiator within the change management (CM) process
- Provide input on testing and deployment of new tools and processes
- Develop and deliver technical training modules; mentor junior team members through 1:1s and small-group coaching
- Serve as the first layer of escalation for incident management and maintain team communications in the absence of leadership
- Build strong working relationships with internal teams, datacenter technicians, and external partners
Minimum Qualifications:
- An Associate's degree in an IT or facility-related field (Computer Science, Information Systems, Electrical, Mechanical, etc.) or equivalent professional/training experience
- 2+ years of direct experience with data center hardware or facilities
- Approved certification OR acquired equivalent skills comparable to EPI CDCP or CNET CDCTP
Preferred Qualifications:
- Industry certifications (A+, Net+, Server+, Linux+)
- Experience in large-scale datacenters
- Basic working knowledge of mechanical and electrical infrastructure
- Experience operating heavy-load movement equipment such as pallet jacks and server lifts