Job Description
Key Responsibilities:
- Architect and deploy an on-prem AI/ML environment, including GPU clusters and high-performance computing resources.
- Collaborate with infrastructure teams to test and optimize networking, storage, and compute resources for AI workloads.
- Implement scalable storage solutions (e.g., distributed file systems, object storage) for efficient data handling.
- Ensure system reliability, security, and performance through best practices in Linux system administration and resource scheduling.
- Configure AI model training and inference environments, leveraging containerization (Docker, Kubernetes) and MLOps pipelines.
- Design and implement MLOps processes to support efficient model training, validation, deployment, and monitoring.
- Configure and set up an ML environment on Oracle Cloud from scratch, ensuring a scalable and production-ready infrastructure.
- Collaborate with cross-functional teams to understand data requirements and integrate AI/ML solutions into existing enterprise systems.
- Work with developers to integrate AI model outputs into business intelligence tools such as Power BI and Oracle Analytics.
Mandatory Qualifications:
- Master’s or Ph.D. in Computer Science, Data Science, Machine Learning, or a related field (a Master’s degree is the minimum).
- Certifications in Oracle Data Science Platform preferred.
- Hybrid onsite work in Downtown Brooklyn, NY.
- 3+ years of experience in AI/ML engineering with a focus on infrastructure, MLOps, and cloud AI deployment.
- Experience configuring and setting up ML platforms on-premises or in Oracle Cloud from scratch.
- Strong expertise in Linux-based AI/ML environments, including performance optimization, package management, and shell scripting.
- Experience with HPC environments, GPU clusters (H100, A100, or similar), and distributed AI workloads.
- Strong programming skills in Python and experience with AI/ML frameworks such as TensorFlow, PyTorch, or similar.
- Hands-on experience with MLOps, including model training, validation, deployment, and monitoring.
- Experience integrating AI/ML models into business intelligence tools (Power BI, Oracle Analytics) or APIs.
- Experience with high-speed networking, storage solutions, and AI/ML system performance tuning.
Please note: all candidates must have a Master’s degree and 3+ years of experience.