Role and Responsibilities
- Optimize deep learning models for deployment using PyTorch, ONNX, TensorRT, and other relevant frameworks.
- Develop and implement techniques for model quantization and compression to reduce memory footprint and increase inference speed.
- Develop and implement techniques for model obfuscation and secure deployments.
- Collaborate with AI researchers and developers to integrate advanced performance optimization techniques into our production systems.
- Analyze and improve existing model architectures for better efficiency and performance.
- Interface with the production engineering team for assistance with on-prem deployments.
About You
- Bachelor's or Master's degree in Computer Science, Electrical Engineering, or related field
- Experience implementing modern deep learning architectures (transformers, CNNs, etc.)
- Experience compiling model inference code for deployment
- Strong software development skills
- Strong familiarity with machine learning and deep learning frameworks such as PyTorch, ONNX, and TensorRT
- 2+ years of industry experience preparing ML models for production