Job Description

We are looking for a Freelance Computer Vision Engineer to join a team of rockstar developers. The candidate should have at least 10 years of experience. There are multiple openings. If you're looking for a freelance/part-time opportunity (along with your day job) and a chance to work with the top 0.1% of developers in the industry, this one is for you! You will report to IITians/BITS grads with 10+ years of development experience and work with F500 companies (our customers).

Company Background

We are a multinational software company that is growing at a fast pace. We have offices in Florida and New Delhi. Our clientele spreads across the US, Australia, and APAC. To give you a sense of our growth rate, we've added 70+ employees in the last 6 weeks alone and expect another 125+ by the end of Q4 2025.

Key Responsibilities

- Build, train, fine-tune, and evaluate multimodal ML models across Text+Vision, Text+Audio, or Vision+Audio combinations.
- Develop end-to-end multimodal pipelines, integrating preprocessing, model training, optimization, and deployment.
- Work extensively on computer vision tasks including image processing, object detection, and segmentation.
- Experiment with and implement modern open-source multimodal architectures (CLIP, BLIP, LLaVA, VLM-based frameworks).
- Own the design and execution of large-scale training runs on multimodal datasets.
- Collaborate with product and engineering teams to integrate models into production-grade systems.
- Conduct research-style explorations to push advancements in generative AI and multimodal learning.
- Maintain high coding standards and write scalable, efficient, and well-documented Python/ML code.

Must-Have Skills

- 10+ years of experience in Machine Learning / Deep Learning, with at least 2 years in multimodal ML, CV, or audio ML.
- Strong understanding of multimodal ML workflows, including dataset curation, tokenization, alignment, and cross-attention mechanisms.
- Hands-on expertise in Vision ML: image processing, object detection, segmentation, embeddings.
- Practical experience in model fine-tuning, prompt tuning, and training multimodal architectures.
- Proficiency in PyTorch or TensorFlow for deep learning model development.
- Solid command of Python, data structures, ML pipelines, and optimization techniques.
- Familiarity with multimodal benchmarks, datasets, and open-source models (CLIP, BLIP, LLaVA, VLMs).
- Experience with GPU-based training, distributed training, or LLM/vision foundation models.

What we need

- ~35 hours of work per week.
- 100% remote from our side.
- You will be paid out every month.
- Minimum 10 years of experience.
- Please apply only if you currently have a 100% remote job.
- If you do well, this will continue for a long time.

Job Title

Company : Leading MNC

Location : Ludhiana, Punjab

Created : 2025-12-19

Job Type : Full Time