Company Detail

d-Matrix

Job Openings

  • A leading generative AI company in Toronto is looking for a Software Compiler Architect specializing in MLIR/LLVM for cloud inference. This role involves architecting a scalable compiler framework to optimize large-scale AI models and collaborating with cross-functional teams to ensure efficient deployment. Candidates should have substantial experience in compiler design and strong knowledge of AI frameworks. The position offers a hybrid work environment, requiring onsite presence 3-5 days per week.

  • ML Compiler Architect, Senior Principal  

    - Toronto

    Overview: At d-Matrix, we are focused on unleashing the potential of generative AI to power the transformation of technology. We are at the forefront of software and hardware innovation, pushing the boundaries of what is possible. Our culture is one of respect and collaboration. We value humility and believe in direct communication. Our team is inclusive, and our differing perspectives allow for better solutions. We are seeking individuals who are passionate about tackling challenges and driven by execution. Ready to come find your playground? Together, we can help shape the endless possibilities of AI.

    Location: Hybrid, working onsite at our Toronto, Ontario, Canada headquarters 3-5 days per week.

    Role: Software Compiler Architect – MLIR/LLVM for Cloud Inference

    What You Will Do: As a hands-on Front-End Software Compiler Architect focused on cloud-based AI inference, you will drive the design and implementation of a scalable MLIR-based compiler framework optimized for deploying large-scale NLP and transformer models in cloud environments. You will architect the end-to-end software pipeline that translates high-level AI models into efficient, low-latency executables on a distributed, multi-chiplet hardware platform featuring heterogeneous compute elements such as in-memory tensor processors, vector engines, and hierarchical memory.

    Your compiler designs will enable dynamic partitioning, scheduling, and deployment of inference workloads across cloud-scale infrastructure, supporting both statically compiled and runtime-optimized execution paths. Beyond developing the compiler itself, you will focus on strategies that minimize inference latency, maximize throughput, and make efficient use of compute and memory resources in data center environments.

    You will collaborate cross‑functionally with systems architects, ML framework teams, runtime developers, performance engineers, and cloud orchestration groups to ensure seamless integration and optimized inference delivery at scale.

    Key Responsibilities:

    Architect the MLIR-based compiler for cloud inference workloads, focusing on efficient mapping of large-scale AI models (e.g., LLMs and Transformers, ingested via Torch-MLIR) onto distributed compute and memory hierarchies.

    Lead the development of compiler passes for model partitioning, operator fusion, tensor layout optimization, memory tiling, and latency-aware scheduling (an illustrative fusion sketch follows this list).

    Design support for hybrid offline/online compilation and deployment flows with runtime‑aware mapping, allowing for adaptive resource utilization and load balancing in cloud scenarios.

    Define compiler abstractions that interoperate efficiently with runtime systems, orchestration layers, and cloud deployment frameworks.

    Drive scalability, reproducibility, and performance through well‑designed IR transformations and distributed execution strategies.

    Mentor and guide a team of compiler engineers to deliver high‑performance inference‑optimized software stacks.
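
    To make the flavor of these passes concrete, here is a minimal, hypothetical Python sketch of an operator-fusion rewrite over a toy graph IR. It is illustrative only: d-Matrix's actual passes would be written against MLIR dialects, and every name here (Op, fuse_matmul_add) is invented for the example.

    ```python
    from dataclasses import dataclass, field

    # Toy graph IR for illustration only: each op lists the ids of the ops
    # producing its inputs. A real MLIR pass would pattern-match dialect ops.
    @dataclass
    class Op:
        id: int
        kind: str                      # e.g. "matmul", "add", "const"
        inputs: list = field(default_factory=list)

    def fuse_matmul_add(ops):
        """Rewrite matmul -> add chains into one fused 'matmul_add' op,
        the shape of the operator-fusion pass named above."""
        by_id = {op.id: op for op in ops}
        uses = {}
        for op in ops:
            for i in op.inputs:
                uses[i] = uses.get(i, 0) + 1

        out, dead = [], set()
        for op in ops:
            prod = by_id.get(op.inputs[0]) if op.kind == "add" and op.inputs else None
            # Only fuse when the matmul's sole consumer is this add.
            if prod and prod.kind == "matmul" and uses.get(prod.id) == 1:
                out.append(Op(op.id, "matmul_add", prod.inputs + op.inputs[1:]))
                dead.add(prod.id)
            else:
                out.append(op)
        return [op for op in out if op.id not in dead]

    graph = [Op(0, "const"), Op(1, "const"), Op(2, "matmul", [0, 1]),
             Op(3, "const"), Op(4, "add", [2, 3])]
    print([op.kind for op in fuse_matmul_add(graph)])
    # -> ['const', 'const', 'const', 'matmul_add']
    ```

    A production pass would also verify legality (dtypes, broadcast shapes) before rewriting; the single-use check above is the simplest version of that safeguard.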

    What You Will Bring:

    BS with 15+ years, MS with 12+ years, or PhD with 10+ years in Computer Science or Electrical Engineering, including 12+ years of experience in front-end compiler and systems software development with a focus on ML inference.

    Deep experience in designing or leading compiler efforts using MLIR, LLVM, Torch‑MLIR, or similar frameworks.

    Strong understanding of model optimization for inference: quantization, fusion, tensor layout transformation, memory hierarchy utilization, and scheduling (a toy quantization sketch follows this list).

    Expertise in deploying ML models to heterogeneous compute environments, with specific attention to latency, throughput, and resource scaling in cloud systems.

    Proven track record working with AI frameworks (e.g., PyTorch, TensorFlow), ONNX, and hardware backends.

    Experience with cloud infrastructure, including resource provisioning, distributed execution, and profiling tools.
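
    As a toy illustration of one item above, the following NumPy sketch performs symmetric per-tensor int8 post-training quantization. This is a generic textbook scheme, not d-Matrix's quantization flow; the function names and the 127-level range are assumptions of the example.

    ```python
    import numpy as np

    def quantize_int8(x):
        """Symmetric per-tensor int8 quantization: x ~= q * scale."""
        amax = float(np.abs(x).max())
        scale = amax / 127.0 if amax > 0 else 1.0
        q = np.clip(np.round(x / scale), -127, 127).astype(np.int8)
        return q, scale

    def dequantize(q, scale):
        return q.astype(np.float32) * scale

    x = np.random.randn(4, 4).astype(np.float32)
    q, scale = quantize_int8(x)
    # Round-trip error is bounded by roughly scale / 2 per element.
    print("max abs error:", np.abs(x - dequantize(q, scale)).max())
    ```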

    Preferred Qualifications:

    Experience targeting inference accelerators (AI ASICs, FPGAs, GPUs) in cloud‑scale deployments.

    Knowledge of cloud deployment orchestration (e.g., Kubernetes, containerized AI workloads).

    Strong leadership skills with experience mentoring teams and collaborating with large‑scale software and hardware organizations.

    Excellent written and verbal communication; capable of presenting complex compiler architectures and trade‑offs to both technical and executive stakeholders.

    This role is a cornerstone of our cloud AI software strategy. You'll shape the way inference workloads are deployed, optimized, and scaled across data center infrastructure.

    Equal Opportunity Employment Policy

    d-Matrix is proud to be an equal opportunity workplace and affirmative action employer. We’re committed to fostering an inclusive environment where everyone feels welcomed and empowered to do their best work. We hire the best talent for our teams, regardless of race, religion, color, age, disability, sex, gender identity, sexual orientation, ancestry, genetic information, marital status, national origin, political affiliation, or veteran status. Our focus is on hiring teammates with humble expertise, kindness, dedication and a willingness to embrace challenges and learn together every day.

    d-Matrix does not accept resumes or candidate submissions from external agencies. We appreciate the interest and effort of recruitment firms, but we kindly request that individuals interested in opportunities with d-Matrix apply directly through our official channels. This approach allows us to streamline our hiring processes and maintain a consistent and fair evaluation of all applicants. Thank you for your understanding and cooperation.

  • Software Engineering Intern - Kernels  

    - Toronto

    At d-Matrix, we are focused on unleashing the potential of generative AI to power the transformation of technology. We are at the forefront of software and hardware innovation, pushing the boundaries of what is possible. Our culture is one of respect and collaboration. We value humility and believe in direct communication. Our team is inclusive, and our differing perspectives allow for better solutions. We are seeking individuals who are passionate about tackling challenges and driven by execution. Ready to come find your playground? Together, we can help shape the endless possibilities of AI.

    Job Title: Software Engineering Intern - Kernels

    Location: Toronto, Canada

    Program Duration: 12 weeks, June 1st - August 21st or June 22nd - September 11th

    Project Overview: As a Software Engineering Intern on our Kernels team, you will play a key role in developing the high-performance kernels essential for accelerating machine learning models. Your responsibilities will span developing reference implementations for accuracy verification, defining unit tests for implemented operators, performance tuning, scalability analysis across varied problem sizes, and packaging and shipping the final implementations. You will also collect performance metrics and identify bottlenecks to improve core functionality.
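
    As a hedged illustration of the reference-plus-test pattern described above, the Python sketch below pairs a NumPy reference for a softmax operator with a unit test that checks a kernel against it across several problem sizes. The operator choice, tolerances, and names are assumptions of the example, not details of d-Matrix's kernel stack.

    ```python
    import numpy as np

    def softmax_reference(x, axis=-1):
        """Numerically stable NumPy reference used as the accuracy oracle."""
        shifted = x - x.max(axis=axis, keepdims=True)
        e = np.exp(shifted)
        return e / e.sum(axis=axis, keepdims=True)

    def softmax_kernel(x):
        """Stand-in for an optimized kernel (e.g., hand-tuned ISA code);
        it reuses the reference here so the test stays self-contained."""
        return softmax_reference(x)

    def test_softmax_matches_reference():
        rng = np.random.default_rng(0)
        for shape in [(8,), (4, 16), (2, 3, 32)]:   # vary problem sizes
            x = rng.standard_normal(shape).astype(np.float32)
            np.testing.assert_allclose(
                softmax_kernel(x), softmax_reference(x), rtol=1e-5, atol=1e-6
            )

    test_softmax_matches_reference()
    print("kernel matches reference on all tested shapes")
    ```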

    What You Will Do

    Implement high-performance kernels in low-level languages (Assembly/ISA experience a plus)

    Develop, test, and tune kernels for machine learning models and performance

    Create and automate reference implementations and unit tests

    Analyze scalability and performance, collect metrics, and troubleshoot bottlenecks

    Package and share implementations with partner teams

    Required Skills

    Ability to implement high-performance kernels in low-level languages; Assembly/ISA coding experience is advantageous

    Proficiency in Python and/or C++

    Solid background in Machine Learning model architecture (e.g., LLMs, CNNs)

    Experience with ML frameworks such as PyTorch and ML packages like NumPy

    General understanding of computer architecture (CPU, GPU, custom ASICs, etc.)

    Currently enrolled in a graduate program (Master's or Ph.D.) in a relevant discipline

    Preferred Qualifications

    Previous internship or project experience related to high performance computing or ML kernel development

    Familiarity with additional ML frameworks (TensorFlow, etc.)

    Interest in hardware-software co-design

    Equal Opportunity Employment Policy

    We are proud to be an equal opportunity workplace and affirmative action employer. We’re committed to fostering an inclusive environment where everyone feels welcomed and empowered to do their best work. We hire the best talent for our teams, regardless of race, religion, color, age, disability, sex, gender identity, sexual orientation, ancestry, genetic information, marital status, national origin, political affiliation, or veteran status. Our focus is on hiring teammates with humble expertise, kindness, dedication and a willingness to embrace challenges and learn together every day.

    We do not accept resumes or candidate submissions from external agencies. We appreciate the interest and effort of recruitment firms, but we kindly request that individuals interested in opportunities with d-Matrix apply directly through our official channels. This approach allows us to streamline our hiring processes and maintain a consistent and fair evaluation of all applicants. Thank you for your understanding and cooperation.

  • High-Performance ML Kernel Intern  

    - Toronto

    A leading technology company in Toronto is seeking a Software Engineering Intern for a 12-week program focused on developing high-performance kernels for machine learning models. Candidates should be enrolled in a relevant graduate program and possess skills in Python, C++, and machine learning model architecture. This role offers hands-on experience and the chance to work in an inclusive environment that promotes collaboration and innovation.
