Description:
As a member of the Low Power AI Solutions team, you will play a critical role in enabling efficient deployment of AI models on Qualcomm's low-power AI accelerators. This position focuses on developing and optimizing the machine learning runtime framework for inference workloads on embedded edge devices. You will implement performance-critical runtime components, apply advanced optimization techniques, and add runtime support for popular ML architectures that are well suited to Qualcomm's low-power AI accelerators. Your work will directly impact the runtime efficiency, latency, and power consumption of AI applications running on Qualcomm hardware.
Key Responsibilities
- Design and implement core components of the ML runtime framework for inference on embedded systems.
- Collaborate with compiler, hardware, and model teams to co-design efficient execution paths for AI workloads.
- Develop and maintain C/C++ code for runtime kernels and system-level integration.
- Develop tools for performance profiling and for debugging quantized-model accuracy.
- Analyze and improve runtime behavior using profiling tools and hardware counters.
- Support deployment of models from popular ML frameworks (e.g., ONNX, TensorFlow, PyTorch) onto Qualcomm's inference stack.
Required Skills & Experience
- Strong hands-on experience in performance optimization for embedded or low-power systems.
- Proficiency in C/C++ programming, with a focus on system-level and runtime development.
- Solid understanding of embedded system design, including memory hierarchy and hardware-software interaction.
- Experience with Linux/Android development environments and toolchains.
- Familiarity with computer architecture, especially for AI accelerators or DSPs.
- Basic knowledge of machine learning concepts and model structures.
Preferred Qualifications
- Master’s degree in Computer Science, Engineering, or related field.
- 2+ years of experience with ML frameworks (e.g., TensorFlow, PyTorch, ONNX).
- 2+ years of experience in embedded system development and optimization for ML inference.
- 2+ years of experience with C/C++ in performance-critical environments.
- Experience with low-level OS interactions (Linux, Android, QNX).
- Familiarity with quantization, graph optimization, and model deployment pipelines.
- Experience working in cross-functional teams and large matrixed organizations.