
Machine Learning Engineer

Gravity IT Resources
Location: Nashville, TN, USA
Published: 6/14/2022
Engineering
Full Time

Job Description


Employment Type: Full-Time

Location: Nashville, TN (hybrid)


About the Role

We’re hiring a Machine Learning Engineer to design and deploy AI systems end-to-end, from data preparation and evaluation to model fine-tuning, inference, and agentic workflows. You’ll work closely with product and engineering teams to deliver reliable, cost-effective, and scalable LLM-powered solutions on AWS.


What You’ll Do

  • End-to-End GenAI Solutions: Scope problems, choose the right approach (prompt engineering, fine-tuning, agents), implement, evaluate, and deploy.
  • Data & SQL: Write efficient SQL for analytics and data prep; manage schemas and pipelines for model training and inference.
  • Model Training & Fine-Tuning: Run supervised fine-tuning (PEFT/LoRA/QLoRA), optimize prompts, and manage experiment tracking/evaluation.
  • Agentic Systems: Build agent workflows with tool use, memory, and safety/guardrails.
  • Inference & Deployment: Package services with Docker, optimize latency and cost (batching, caching, quantization), and deploy on AWS (ECS, EKS, SageMaker, Lambda), using GPU-accelerated instances where needed.
  • MLOps & Observability: Set up CI/CD for models/prompts; maintain offline/online evaluation pipelines, monitoring, and rollback strategies.
  • Security & Compliance: Implement data governance, PHI/PII protections, and guardrails against prompt injection and unsafe outputs.
  • Cross-Functional Collaboration: Work with product managers and engineers to align GenAI capabilities with product goals; clearly document and communicate trade-offs.
  • Production Readiness: Lead conversations around scaling, monitoring, and maintaining GenAI systems in production environments.


Minimum Qualifications

  • 5+ years of Software/ML engineering experience, including 2+ years building and deploying GenAI/LLM systems.
  • MS/PhD in Computer Science, Data Science, or equivalent experience.
  • Strong SQL and Python skills with solid software engineering fundamentals.
  • Experience with agent frameworks (LangGraph, AutoGen, CrewAI) and tool-driven agents.
  • Hands-on experience with deep learning (PyTorch or TensorFlow) and LLM fine-tuning (SFT, and PEFT methods such as LoRA/QLoRA).
  • Production experience with Docker and AWS (ECS, EKS, SageMaker, Lambda, or GPU services).
  • Experience building scalable data and model pipelines for training and deployment.
  • Familiarity with prompt engineering, evaluation frameworks (LLM-as-judge, metrics), and offline test harnesses.
  • Understanding of security & compliance for sensitive data (e.g., PHI/PII).
  • Excellent problem-solving, communication, and documentation skills.


Preferred Qualifications

  • Experience with inference optimization: quantization (bitsandbytes, GPTQ/AWQ), batching, caching, or vLLM.
  • Background in healthcare, including HIPAA compliance or medical data handling.
  • Experience with experiment tracking (MLflow, W&B), CI/CD for ML, and monitoring tools (Prometheus, Grafana).
  • Familiarity with major LLM APIs (e.g., OpenAI, Anthropic) and open-source models (e.g., Llama, Mistral).


Tech Stack

  • Languages: Python, SQL
  • DL/LLM: PyTorch, TensorFlow, Hugging Face, PEFT/TRL, vLLM
  • Data: Snowflake, Postgres
  • Cloud: AWS (ECS, EKS, SageMaker, Lambda)
  • MLOps: Docker, CI/CD, MLflow or W&B