Description:
The Principal Support Engineer is a senior, hands-on technical expert responsible for driving the rapid resolution of complex customer issues. This role blends deep technical troubleshooting, advanced log and API analysis, Datadog monitoring expertise, and leadership of high-severity escalations. The engineer will collaborate tightly with Engineering, Product, and external vendors to eliminate root causes, improve MTTR, and elevate the overall customer experience. This is a high-visibility individual contributor role requiring strong ownership, urgency, and the ability to translate complex technical problems into clear, actionable paths forward for both customers and internal stakeholders.
Key Responsibilities:
- Perform hands-on technical troubleshooting using Datadog (logs, traces, dashboards), API tools (Postman/cURL), and distributed log tracing.
- Lead high-severity and strategic customer escalations, providing authoritative technical direction and timely communication.
- Drive vendor ticket escalations, ensuring SLA adherence and proactive follow-ups with Microsoft, Adobe, AWS, Cisco, and others.
- Collaborate with Engineering to deliver root-cause fixes, submit detailed technical findings, and validate permanent resolutions.
- Partner with Product to identify platform gaps, recurring customer pain points, and areas for workflow or UX improvement.
- Analyze MTTR performance, SLA trends, and operational bottlenecks; publish weekly metrics and insights.
- Develop SOPs, escalation workflows, and troubleshooting guides that improve global support operations.
- Identify automation opportunities and collaborate with internal teams to enhance Zendesk workflows and self-service deflection.
Required Skills & Qualifications:
- Bachelor’s degree or diploma in engineering (e.g., electrical, mechanical, or computer science/engineering) or equivalent.
- Minimum of 10 years in Technical Support Engineering, Site Reliability Engineering (SRE), or related roles.
- Strong experience with APIs, Python scripting, and application performance monitoring tools to automate workflows and optimize system reliability.
- Skilled at analyzing and interpreting performance metrics to identify bottlenecks, troubleshoot issues, and improve overall application efficiency.
- Strong analytical and leadership skills, with the ability to translate business requirements into clear technical direction, collaborate effectively with engineers and product managers, and write concise, well-structured incident summaries.
- Expertise with Datadog or a similar observability platform (log search, traces, monitors, dashboards) is an asset.
- Experience in diagnosing distributed systems, integrations, and SaaS platform behavior.
- Proven ability to interface with strategic enterprise customers and communicate complex technical issues clearly.
- Hands-on experience with vendor escalation processes and SLA governance.
- Strong working knowledge of incident management and technical support KPIs.
- Familiarity with Zendesk, Jira, or similar ticketing platforms.