Lead AI Architect
Job Summary:
We are hiring a Senior Distinguished Engineer for AI Systems at Capital One to help establish our company's AI foundations. In this role, you will collaborate with our team of engineers and researchers to design and implement robust, secure infrastructure, large-scale distributed training clusters, and advanced AI research and development initiatives. Our ideal candidate has a strong background in cloud environments, machine learning, and distributed computing, combined with an unwavering commitment to driving innovation in the financial industry.
Job Duties and Responsibilities:
- Build a resilient system for managing large-scale training jobs, using containers and checkpointing libraries.
- Implement advanced infrastructure features so that our cloud environment can run large machine learning models seamlessly.
- Use the public cloud to build a 1,000-node training cluster, and optimize it by reconfiguring the storage and networking stack for the best possible performance.
- Evaluate technology stacks for AI software systems and their impact on performance by comparing the processing speed and accuracy of AI algorithms across different technologies.
- Leverage large language models (LLMs) and foundation models (FMs) to develop high-quality applications with exceptional user experiences.
- Use version control tools to track and manage changes to the code, data, and model artifacts used to build and deploy foundation models.
Qualifications and Experience:
- A bachelor's degree in Computer Science, Computer Engineering, or a related technical discipline.
- At least 7 years of experience designing and building distributed computing systems, HPC systems, and large-scale ML models.
- At least 5 years of experience developing AI and ML algorithms in Python or C/C++.
- At least 3 years of experience across the complete machine learning development lifecycle, working with AI and ML frameworks and the public cloud.
Preferred Qualifications:
- A Master's or PhD in Engineering, Computer Science, or a related technical discipline, or equivalent practical experience focused on contemporary AI approaches.
- Skilled in designing complex distributed platforms on public clouds such as AWS, Azure, or GCP.
- Experience architecting complex cloud systems, with a thorough understanding of the factors that make a cloud environment secure, high-performing, and cost-effective.
- Extensive experience handling massive models throughout the MLOps process, from initial exploration to production serving.
- Experience building GPU clusters in the public cloud, with an emphasis on seamlessly integrated storage and networking.
- Adept at using the complete technology stack for distributed training of large-scale models, including machine learning compilers, distributed training frameworks, and development platforms such as PyTorch, TensorFlow, and Lightning.
- Hands-on experience with the AI technology stack, including prompt engineering, guardrails, vector databases and knowledge bases, LLM hosting, and fine-tuning, applied to ensure the accuracy and effectiveness of AI-based tools.
- Research publications at leading peer-reviewed conferences, or recognized achievements in neural networks, distributed training, or SysML.
Benefits of the Position:
- Performance-based incentive compensation, including cash bonuses and long-term incentives.
- A holistic approach to employee well-being, with comprehensive benefits that support physical health, financial stability, and more.
- Numerous opportunities for career progression at Capital One, along with skill-building programs.
- A welcoming, supportive work environment that encourages personal and professional growth.
About the Company:
Capital One is strongly committed to developing AI systems that meet the highest standards of trustworthiness and reliability and that keep humans in the loop, driving a revolutionary shift in banking. We are deepening our expertise in AI and machine learning technologies to give customers real-time, automated, and intelligent banking experiences. The capabilities we gain from machine learning platforms and public cloud infrastructure will let us harness AI's potential and its impact on the financial industry. By joining our team, you can help reimagine how we serve customers and businesses and contribute to the future of banking.