
Job Title


Azure Cloud Engineer AI


Company : Themesoft Inc.


Location : Toronto, Ontario


Created : 2025-11-06


Job Type : Full Time


Job Description

Themesoft Inc. is a global IT solutions provider and a Woman-Owned Minority Business Enterprise headquartered in Dallas, TX. With a strong presence across the US, Canada, India, Singapore, and Brazil, we specialize in digital transformation, consulting, and workforce solutions across diverse industries.

We are currently looking for a tech-savvy and results-driven professional for one of our leading clients. If you're passionate about technology and looking to grow in a dynamic, fast-paced environment, this could be the perfect fit for you!

Role : Azure Cloud Engineer AI

Location : Toronto, Canada - Hybrid (3 days in office)

Duration : 6+ months

Cloud Engineer - AI Infrastructure

Role Overview

As a Cloud Engineer, you will be responsible for implementing and maintaining scalable, secure, and high-performance cloud infrastructure to support AI/ML workloads. You'll work closely with platform, application, and data teams to ensure reliable operations and efficient delivery of AI services.

Key Responsibilities

Infrastructure & Platform Operations
- Deploy and manage cloud-native infrastructure for AI/ML workloads (GPU/CPU clusters, autoscaling, spot instances).
- Configure and maintain networking components (Azure VNet, Private Link, peering, HA/DR setups).
- Operate storage and database systems including Azure Data Lake Storage, relational databases, and vector databases (FAISS, Milvus, Pinecone).
- Implement IAM policies, secrets management (Key Vault), and encryption standards.

Observability & Reliability
- Set up monitoring for latency, throughput, GPU utilization, and cost metrics.
- Integrate logging and tracing tools (OpenTelemetry) and maintain SLOs/SLIs for infrastructure services.
- Support incident response and root cause analysis using SRE principles.

CI/CD & Infrastructure Automation
- Build and maintain CI/CD pipelines using GitHub Actions or Azure DevOps.
- Implement GitOps workflows for infrastructure-as-code using Terraform or Bicep.
- Create reusable IaC modules and templates for consistent deployments.

FinOps & Cost Optimization
- Monitor and optimize GPU usage, caching strategies, and inference performance.
- Support cost governance and reporting for AI infrastructure.

Application Enablement
- Provide infrastructure support for APIs, microservices, and event-driven architectures.
- Enable model serving runtimes (TensorRT-LLM, vLLM, Triton/KServe).
- Support RAG pipelines including embeddings, chunking, and retrieval systems.

Security & Compliance
- Apply defense-in-depth strategies: IAM least privilege, private networking, image signing.
- Ensure compliance with data residency, encryption, and audit requirements.

Qualifications
- Bachelor's degree in Computer Science, Engineering, or a related field.
- 3-5 years of experience in cloud infrastructure (Azure preferred).
- Hands-on experience with Kubernetes, Terraform/Bicep, and cloud networking.
- Familiarity with AI/ML infrastructure components and model serving.
- Proficiency in Python for automation; Go or TypeScript is a plus.

Tech Stack
- Cloud & Infra: Azure (AKS, Functions, Event Hubs, Key Vault), Terraform/Bicep, GitHub Actions
- AI Infra: Kubernetes, KServe/Triton, vLLM, TensorRT-LLM
- Ops: Prometheus, Grafana, OpenTelemetry, ArgoCD
- Data: Feature stores (Feast), vector DBs (FAISS, Milvus), relational DBs
- App Layer: APIs, microservices, frontend/backend integration

Success Metrics
- Reliability: SLOs met, uptime maintained
- Security: No critical vulnerabilities, audit-ready infrastructure
- Cost Efficiency: Optimized GPU and infra spend
- Velocity: Fast and reliable deployments
- Collaboration: Effective cross-team support and documentation

Regards,
Parthasarathy K
Lead Recruiter
Work: 972-474-8787 Ext: 306