System Engineer III - Cyber Infrastructure HPC
Full Time
Richardson, TX 75080
Job description
Job Summary
This position is responsible for provisioning, deploying, administering, monitoring, maintaining, troubleshooting, upgrading, and patching University high-performance computing (HPC) resources and related research services. The SE-III will also be responsible for documenting processes, procedures, system configurations, and services, and for recording configuration information in our configuration management systems. The SE-III will interact with research PIs on a regular basis and assist with computational cyberinfrastructure needs. Able to consolidate design specifications, both independently and collaboratively, and to manage and develop stable, best-practice, environment-specific, enterprise-class solutions. Able to manage projects and move them forward. Possesses a comprehensive understanding of HPC solutions, architecture, and lifecycles. Mentors junior engineers. Ability to multitask at high volume and with high attention to detail. Ability to produce effective, timely results equally well alone and in team environments of all sizes. Able to develop and implement proactive, automated solutions and processes. Able to quickly prioritize and efficiently complete tasks of significantly varied scope, scale, and technical requirements. Acts as a role model in demonstrating integrity and ethical behavior when working with confidential and university information. Understands and adapts with agility to different work styles, conflict resolution techniques, and office etiquette, and demonstrates the ability to interact with employees and stakeholders in a positive, productive manner pitched at the appropriate technical level. Self-motivated and stays abreast of applicable new technologies and technical methodologies to advance productivity and career path.
Minimum Education and Experience
Bachelor’s degree with four (4) years of related experience OR Associate degree with six (6) years of related experience OR High School diploma or equivalent with eight (8) years of related experience.
Preferred Education and Experience
Master’s degree in Computer Science or equivalent with four years of experience in corresponding research services, support efforts, products, and technologies. Current knowledge of best practices in systems deployment and maintenance. Troubleshooting methodology and awareness of industry standards. Moderate to deep understanding of applicable product, platform, and service roadmaps. Great problem-solving skills. Great communication skills. Very strong technical documentation, diagramming, and organizational skills. Level 3 support experience (on a scale of 1 to 3, with 3 being a senior specialist). Extensive system administration and networking skills. Experience with at least two high-performance cluster operating systems such as OpenHPC. Experience with large-scale, high-performance parallel file storage systems. Experience in supporting and operating 1 Gbps to 100 Gbps Ethernet and 100 Gbps to 200 Gbps InfiniBand HPC network interconnects. Experience with open-source and commercial research-related software, including Python, R, MATLAB, Julia, and Ansys, and with Intel, NVIDIA CUDA, and GCC compilers. Experience with related DevOps tools such as GitHub, GitLab, and Ansible, and with package management tools for building RPM and/or DEB packages. Ability to train, be trained, and interact with other team members in a fast-paced, modern environment. Familiarity with OpenHPC. Familiarity with SLURM. Familiarity with Lmod or environment modules. Ability to package scientific software into RPMs and integrate them with Lmod so users can `module load <software>`. Familiarity with containers (Docker, Podman, Apptainer). Familiarity with Open OnDemand. Familiarity with Singularity HPC.
Essential Duties and Responsibilities
Expected areas of expertise and duties will include current proficiency in the following:
- Professional, efficient HPC system administration, user support, and training related to high-performance computing in a research environment.
- Assists in the development and implementation of policies, rules, and operating procedures for Research Computing and Cyberinfrastructure to meet assurance models such as NIST 800-53 and NIST 800-171 under which research is conducted.
- Must work in small and medium-sized team environments, so communication skills and professionalism are of great importance.
- A comprehensive understanding of HPC services, Linux, and supporting physical and virtual hosting platform configurations.
- Advanced installation, configuration, updating, networking, performance monitoring, and troubleshooting of HPC systems.
- Ability to develop, troubleshoot, modify, catalog, document, and update scripts.
- Excellent interpersonal and communication skills at all levels.
- Familiarity with OpenHPC
- Familiarity with SLURM
- Familiarity with Lmod or environment modules
- Ability to package scientific software into RPMs and integrate them with Lmod so users can `module load <software>`
- Familiarity with containers (Docker, Podman, Apptainer)
- Familiarity with Open OnDemand
- Familiarity with Singularity HPC
- Perform other duties as assigned.
Physical Activities
Sitting for extended periods of time. Dexterity of hands and fingers to operate a computer keyboard, mouse, and power tools, and to handle other computer components. Lifting and transporting of moderately heavy objects, such as servers, switches, computers, and peripherals.
Working Conditions
On-call availability for quickly responding to and resolving system emergencies, during both regular hours and emergency off-hours. Emergency on-call rotation availability for 24×7×365 coverage. The position is currently eligible for remote work, subject to further discussion and agreement.
Additional Information
Visa sponsorship is not available.
FOR TEXAS RESIDENTS – A hybrid remote work environment is available, and a UT Dallas Remote Work Agreement is required upon employment. Use of personal computers and other standard office equipment may be required. Must be located within the State of Texas (or within the DFW area) and be able to be on campus with 24 hours' notice.
Important Message
1) All employees serve as a representative of the University and are expected to display respect, civility, professional courtesy, consideration of others and discretion in all interactions with members of the UT Dallas community and the general public.
2) The University of Texas at Dallas is committed to providing an educational, living, and working environment that is welcoming, respectful, and inclusive of all members of the university community. UT Dallas does not discriminate on the basis of race, color, religion, sex (including pregnancy), sexual orientation, gender identity, gender expression, age, national origin, disability, genetic information, or veteran status in its services, programs, activities, employment, and education, including in admission and enrollment. EOE, including disability/veterans. The University is committed to providing access, equal opportunity, and reasonable accommodation for individuals with disabilities. To request reasonable accommodation in the employment application and interview process, contact the ADA Coordinator. For inquiries regarding nondiscrimination policies, contact the Title IX Coordinator.