
Storage Systems Engineer

University of California - San Francisco
120,000 - 220,000
United States, California, San Francisco
654 Minnesota Street
May 21, 2026

JOB SUMMARY

This position is primarily responsible for the architecture, implementation, and lifecycle management of Facility for Advanced Computing (FAC) storage and systems, including support for large storage environments, NSF-funded infrastructure, and OS Nexus-aligned data platforms. The role ensures seamless integration between storage systems and the CoreHPC compute cluster, enabling performant, reliable, and scalable data access for AI, data science, and computational research workloads.

The Storage Systems Engineer will:

  • Work with the lead to continue supporting the design and evolution of storage architecture across on-prem and hybrid environments, including VAST, parallel filesystems, and enterprise storage platforms
  • Develop and maintain data movement strategies and tooling (e.g., rsync, rclone, Globus, SMB workflows) to support large-scale data ingestion, migration, and lifecycle management
  • Ensure tight integration between storage and HPC compute systems, optimizing throughput, latency, and reliability for distributed workloads
  • Support and scale storage systems backing major institutional initiatives (FAC storage, OS Nexus integration)
  • Collaborate closely with DevOps, networking, and security teams to deliver cohesive research infrastructure solutions
  • Design and implement monitoring, performance tuning, and capacity planning strategies for storage and data systems
  • Troubleshoot complex issues across storage, networking, and compute boundaries
  • Participate in system upgrades, migrations, and expansion efforts with minimal disruption to researchers
  • Provide guidance to researchers on data organization, transfer strategies, and performance optimization
  • Evaluate and recommend emerging storage technologies and architectures

This role may lead storage-focused projects and contribute to cross-functional initiatives that improve the scalability, usability, and reliability of UCSF's research computing ecosystem.

Department Overview

Academic Research Systems (ARS) serves the needs of the UCSF research community by providing an integrated repository of HIPAA-compliant clinical and life sciences data and a centralized, secure, professionally managed infrastructure for the storage and management of research data. ARS empowers medical scientific investigations by offering secure computing environments; data capture, management, and analysis tools; and support services that meet researchers' needs.

The Research Infrastructure team of Academic Research Systems (ARS) focuses on large-scale research platform support and on high-performance computational and storage services that let UCSF researchers address complex computational, AI, and data science problems.

DUTIES & ESSENTIAL JOB FUNCTIONS


Storage Architecture & Infrastructure (25% of time; essential function: Yes)

  • Design, deploy, and operate large-scale storage systems, including VAST and parallel filesystems.
  • Define standards for performance, redundancy, and scalability.
  • Lead the evolution of institutional storage platforms, including FAC storage environments.

Data Movement & Migration (15% of time; essential function: Yes)

  • Architect and execute large-scale data migrations.
  • Develop and maintain data movement workflows using tools such as rsync, rclone, and Globus.
  • Optimize data transfer processes across storage and compute environments.

HPC Integration & Performance (15% of time; essential function: Yes)

  • Integrate storage systems with the CoreHPC compute cluster.
  • Optimize I/O performance for AI, machine learning, and HPC workloads.
  • Support efficient data access patterns for distributed and scheduled workloads.

Operations & Reliability (30% of time; essential function: Yes)

  • Implement monitoring, alerting, and capacity planning for storage systems.
  • Troubleshoot issues across storage, network, and compute infrastructure.
  • Perform system maintenance, patching, and lifecycle management.

Researcher Enablement (10% of time; essential function: No)

  • Advise researchers on data workflows and storage best practices.
  • Support onboarding of projects with large-scale data requirements.

Collaboration & Strategy (5% of time; essential function: No)

  • Collaborate with DevOps, networking, and security teams.
  • Evaluate and recommend new storage technologies and architectures.

Total: 100%


REQUIRED QUALIFICATIONS

  • Bachelor's degree in a related area such as computer science or engineering, and 6+ years of experience with storage infrastructure support and management, or 10+ years of related experience with large-scale storage systems
  • Demonstrated testing and test planning skills. Demonstrated ability to create automated testing.
  • Knowledge of HPC job scheduler system design and operation, such as SLURM or PBS
  • Demonstrated skill (5+ years) deploying, managing, and troubleshooting Warewulf (or similar) InfiniBand-based clusters
  • Ability to write clear, concise technical documentation, including runbooks that define complex technical processes
  • Strong knowledge of high-performance parallel filesystems and storage such as GPFS, Lustre, VAST, DDN, etc.
  • Understanding of system performance monitoring and actions that can be taken to improve or correct performance.
  • Demonstrated advanced knowledge, skills and abilities associated with system problem identification and resolution. Experience with design, configuration, operation, repair, and tuning of technology systems.
  • Ability to elicit and communicate technical and non-technical information in a clear and concise manner.
  • Self-motivated and works independently and as part of a team. Demonstrates problem-solving skills. Able to learn effectively and meet deadlines.
  • Advanced experience writing and editing the most complex scripts used to perform system maintenance and administration.
  • Advanced knowledge of computer security best practices and policies, including demonstrated experience securing research cyberinfrastructure systems to meet NIST 800-171 / 800-223, HIPAA, or IS-3 requirements

PREFERRED QUALIFICATIONS

  • Expert knowledge of HPC systems infrastructure design
  • Knowledge of the design, development and application of technology and systems to meet business needs.
  • General knowledge of other areas of IT. Thorough understanding of and experience with systems-related issues and actions that can be taken to improve or correct performance.
  • Demonstrated skills associated with adapting equipment and technology to serve user needs. Demonstrated comprehensive understanding of how system management actions affect other systems, system users and dependent / related functions.

About UCSF
The University of California, San Francisco (UCSF) is a leading university dedicated to promoting health worldwide through advanced biomedical research, graduate-level education in the life sciences and health professions, and excellence in patient care. It is the only campus in the 10-campus UC system dedicated exclusively to the health sciences. We bring together the world's leading experts in nearly every area of health. We are home to five Nobel laureates who have advanced the understanding of cancer, neurodegenerative diseases, aging and stem cells.
Pride Values
UCSF is a diverse community made of people with many skills and talents. We seek candidates whose work experience or community service has prepared them to contribute to our commitment to professionalism, respect, integrity, diversity and excellence - also known as our PRIDE values.
In addition to our PRIDE values, UCSF is committed to equity - both in how we deliver care as well as our workforce. We are committed to building a broadly diverse community, nurturing a culture that is welcoming and supportive, and engaging diverse ideas for the provision of culturally competent education, discovery, and patient care. Additional information about UCSF is available here.
Join us to find a rewarding career contributing to improving healthcare worldwide.
Equal Employment Opportunity
The University of California is an Equal Opportunity Employer. All qualified applicants will receive consideration for employment without regard to race, color, religion, sex, sexual orientation, gender identity, national origin, disability, age, protected veteran status, or other protected status under state or federal law.

Salary Information


The final salary and offer components are subject to additional approvals based on UC policy.


Your placement within the salary range is dependent on a number of factors including your work experience and internal equity within this position classification at UCSF. For positions that are represented by a labor union, placement within the salary range will be guided by the rules in the collective bargaining agreement.


To learn more about the benefits of working at UCSF, including total compensation, please visit: https://ucnet.universityofcalifornia.edu/compensation-and-benefits/index.html
