Work Background
Systems Engineer
American Family Insurance
Dec. 2023, United States
Responsible for the management and administration of a large HPC production environment, including provisioning and configuration management for GPU servers, interconnect fabrics, and network infrastructure. Administer the job scheduling environment. Troubleshoot and remediate system issues in cooperation with stakeholders. Drafted operational documentation (runbooks) and training materials. Responsible for the maintenance and administration of a large virtual and cloud environment and of large GPFS implementations.
Infrastructure Manager
Toyota North America
Jun. 2021 - Oct. 2023, United States
Managed a large HPC/GPU infrastructure for the vehicle development team in a large data center environment. Configured interfaces between testing servers, vehicle ECUs, and head units. Wrote test cases and implemented the ELK stack for collection and analysis of test data. Developed and curated a knowledge management system for research staff. Acted as scrum master in an Agile environment and worked with project management staff to track project goals. Owned the Docker and Kubernetes implementation. Administered and maintained the VMware ESX production environment. Provided third-level support using ServiceNow and Jira. Responsible for all infrastructure elements, including but not limited to shared storage, interconnect fabrics, backup/disaster recovery, network infrastructure, communications and messaging infrastructure, VPN, web servers (Apache, lighttpd), and user account management.
Laboratory Manager
IBM
Aug. 2018 - Jan. 2020, United States
Primary responsibilities included maintaining availability for multiple large UNIX and Linux HPC/GPU clusters. Substantially reduced labor costs after building new infrastructure for bare-metal systems provisioning and deployment. Handled orchestration and configuration management using xCAT, Ansible, and IBM Spectrum LSF. Led testing efforts to certify AI and deep learning configurations with NVIDIA GPU accelerators for multiple frameworks, including TensorFlow, Caffe, and PyTorch. Administered a knowledge management system to capture and curate large data sets for process improvement. Implemented Kubernetes for container orchestration. Provided second- and third-level help desk support for technical issues using Jira. Supported and maintained licensing infrastructure for all operating systems and applications. Held top-line responsibility for all laboratory infrastructure, including hardware/software, network infrastructure (subnets, VLANs, DHCP, and DNS), backup/disaster recovery, inventory, asset management, product lifecycle, and roadmap input.
Principal Systems Engineer
Hewlett Packard
Mar. 2003 - Jan. 2017, United States
Customer-facing technical role for the deployment and integration of large HPC/GPU clusters for enterprise clients. Delivered tailored systems documentation and training materials to customer staff covering system administration best practices, user support, and troubleshooting of converged infrastructure. Developed, integrated, and supported system automation tools using Perl and Bash scripts. Delivered pre-sales consulting services and training for government, education, and medical customers seeking to leverage HP blade server infrastructure. Assisted clients with core service deployment and integration using Microsoft Active Directory, Novell Directory Services, DNS, DHCP, TCP/IP, NFS/NIS, and X.500. Worked closely with OEM and ISV partners to build an advanced proof-of-concept dual-boot Microsoft Windows 2008/CentOS high-performance cluster. Provided training and best-practices documentation for configuration management and system monitoring using Ansible, Nagios, Cacti, Chef, and xCAT. Built and maintained a virtual development/proof-of-concept environment using VMware. Delivered customer-facing consulting services for systems integration in heterogeneous environments containing Linux (RHEL, CentOS, Debian, SUSE, Ubuntu), Microsoft Windows Server, Exchange, and SQL Server. Responsibilities included 10GbE/InfiniBand network fabrics, storage integration, performance tuning, benchmarking, and systems documentation. Maintained HP trainer certification with a focus on converged infrastructure, HPC, and enterprise messaging/collaboration solutions. Delivered systems administration training for Linux (RHEL, CentOS, Debian) on HP ProLiant blade server environments. Provided engineering services for HPC environments using Platform LSF, Torque/Maui, Moab, Bright, and HP CMU. Worked extensively with web hosting customers to optimize and consolidate diverse hosting infrastructures (cPanel, WordPress, SAS models).
Worked with storage partners (EMC, DDN, NetApp, and Hitachi) to integrate clustered file system support (Lustre, GPFS, SFS) into HPC environments. Delivered consulting services and technical training to customer staff on business continuity solutions, backup/restore methodologies, and storage management. Presented at industry trade shows and HP-sponsored events on HP blade server infrastructure and high-performance compute clustering.