In Brief
Rebuilding the Backbone of Oregon’s Student Data
Behind every education funding decision is a complex data system most people never see. Oregon’s Statewide Longitudinal Data System (SLDS) relied on an aging ETL tool and outdated Java libraries, making critical student data access slow, costly, and difficult to maintain.
The Oregon Longitudinal Data Collaborative (OLDC) needed an efficient way to manage and process statewide student enrollment and program data housed in the SLDS. Resource Data implemented a secure, VNet-injected Azure Databricks workspace and rebuilt the ETL processes as PySpark notebooks. The new system cut the time required to update, validate, and deliver refreshed data and reports from weeks to hours and reduced annual ETL costs by over 80%.
Key Takeaways
Cutting over 80% of annual ETL Costs with Databricks Automation
-
A Unified, Structured View of Student Data Across Oregon
A VNet-injected Databricks workspace replaced manual, fragmented processing with a single, automated platform. Two hundred data tables across four state agencies are now unified into a single view of student enrollment and programs for easier reporting and decision-making.
-
Scheduled, Reliable Data Processing Replaced Manual Coordination
Databricks Workflows automates ETL execution across agencies, removing manual handoffs and coordination. Student data can now be refreshed, validated, and delivered on a predictable schedule in hours instead of weeks.
-
Code-Based ETL Significantly Reduced Maintenance Effort
Code-based PySpark notebooks replaced outdated Informatica workflows, making the ETL easier to update, test, and adapt as reporting and policy needs change.
-
Ongoing Operational Costs Reduced by Over 80%
Moving off legacy Informatica tooling to the Databricks workspace model reduced annual ETL costs by over 80%.
-
Improved Student Longitudinal Analysis Through Streamlined Identity Matching
Databricks standardizes and automates the data preparation steps feeding OLDC’s identity matching process, reducing delays and making longitudinal analysis of more than 18,000 student records more reliable.
Our Client
Oregon Longitudinal Data Collaborative (OLDC)
The Oregon Higher Education Coordinating Commission (OHECC) is a state agency that coordinates higher education and workforce training in Oregon. Its mission is to improve access, equity, and student success, so education programs meet the needs of Oregon’s workforce.
Within OHECC, the Oregon Longitudinal Data Collaborative (OLDC) serves as an inter-agency research office that links education and workforce data across state partners like OHECC, public colleges and universities, apprenticeship programs, and workforce agencies.
OLDC manages the Statewide Longitudinal Data System (SLDS) and produces annual Career and Technical Education (CTE) outcomes reports that track whether graduates are employed or enrolled in postsecondary education. OHECC and the Oregon Department of Education use this data to evaluate programs, allocate funding, and support statewide policy development.
Challenges
Aging ETL Tools Slowed Data Processes and Delayed Critical Updates
The Statewide Longitudinal Data System supports Oregon’s long-term analysis of student pathways from education into employment. The system links approximately 156 GB of student enrollment data across 200 data tables for over 18,000 Career and Technical Education (CTE) student records across Oregon institutions. This data underpins Oregon’s research, reporting, and education funding decisions.
Over time, the Informatica-based ETL supporting the SLDS became a bottleneck for day-to-day data processing. Aging graphical workflows and fragmented pipelines were slow, difficult to update, and required manual data movement between systems. Statewide enrollment updates took weeks or months, making it hard for OLDC to refresh data for research or respond to changing policy and funding-related reporting needs.
The outdated, Java-based Informatica components were costly and unpredictable, and they made it difficult to align student data with the corresponding workforce policy programs. Without modernization, OLDC faced growing data volumes, high vendor costs, and the risk of losing its ability to deliver timely insights to education and workforce policymakers.
The Solution
Rebuilding Trust in Statewide Student Data with Cloud-Based Databricks Architecture
Resource Data modernized and unified OLDC’s data operations by implementing a VNet-injected Azure Databricks workspace. The Databricks environment is a cloud-based analytics platform deployed within Oregon’s existing Azure cloud network. The solution focuses on secure access, automation, and a maintainable workflow to support long-term reporting across 200 cross-agency data tables.
The new architecture eliminated manual file transfers by allowing Databricks to connect directly to OLDC’s data sources inside the State’s network. Within this environment, our team rebuilt OLDC’s statewide ETL processes using PySpark notebooks. These notebooks handle data ingestion, validation, and transformation in each ETL cycle, so the resulting data is automatically ready for downstream identity matching and reporting.
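The validation step these notebooks perform can be illustrated with a minimal sketch. It is written in plain Python so it runs anywhere; in a notebook the same named rules would more likely be expressed as PySpark column filters. The rule names, field names, and sample rows are hypothetical, since the source does not describe OLDC's actual data quality rules.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Rule:
    """A named data quality rule; check returns True when a row passes."""
    name: str
    check: Callable[[dict], bool]


# Hypothetical rules for illustration only; not OLDC's real rules.
RULES = [
    Rule("student_id_present", lambda r: bool(r.get("student_id"))),
    Rule(
        "enrollment_year_in_range",
        lambda r: isinstance(r.get("enrollment_year"), int)
        and 2000 <= r["enrollment_year"] <= 2100,
    ),
]


def validate(rows, rules):
    """Split rows into valid rows and an error log of failed rule names."""
    valid, errors = [], []
    for index, row in enumerate(rows):
        failed = [rule.name for rule in rules if not rule.check(row)]
        if failed:
            errors.append({"row": index, "failed_rules": failed})
        else:
            valid.append(row)
    return valid, errors


# Made-up sample records: one clean row and two that break a rule each.
sample = [
    {"student_id": "A1", "enrollment_year": 2021},
    {"student_id": "", "enrollment_year": 2021},
    {"student_id": "B2", "enrollment_year": 1890},
]
valid_rows, error_log = validate(sample, RULES)
```

Keeping each rule as a named entry means the error log records which rule a row failed, which is the kind of traceability that makes code-based ETL easier to debug than graphical workflows.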
Within the Databricks workspace, workflows automate the full ETL process, including handoffs to and retrieval of results from the identity-matching system. Scheduling, dependencies, and monitoring are centrally managed, replacing ad hoc execution. Each stage of the ETL is tracked in a single system, making issues faster to identify and resolve. What once took weeks of coordination now completes in hours with a consistent, analysis-ready view of OLDC’s data in the SLDS.
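As a rough illustration of how scheduling and dependencies can be managed in one place, the sketch below models a job definition as a Python dictionary and derives the order in which its tasks would run. The field names are intended to mirror the Databricks Jobs API job settings (`tasks`, `task_key`, `depends_on`, `notebook_task`, `schedule`), but the job name, notebook paths, task names, and nightly schedule are all assumptions; the source does not describe OLDC's actual job configuration.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical job definition; names, paths, and schedule are illustrative.
job = {
    "name": "slds-etl-refresh",
    "schedule": {
        "quartz_cron_expression": "0 0 2 * * ?",  # assumed: 2:00 AM daily
        "timezone_id": "America/Los_Angeles",
        "pause_status": "UNPAUSED",
    },
    "tasks": [
        {"task_key": "ingest",
         "notebook_task": {"notebook_path": "/ETL/ingest"}},
        {"task_key": "validate",
         "depends_on": [{"task_key": "ingest"}],
         "notebook_task": {"notebook_path": "/ETL/validate"}},
        {"task_key": "transform",
         "depends_on": [{"task_key": "validate"}],
         "notebook_task": {"notebook_path": "/ETL/transform"}},
        {"task_key": "send_to_identity_matching",
         "depends_on": [{"task_key": "transform"}],
         "notebook_task": {"notebook_path": "/ETL/send_to_matching"}},
        {"task_key": "load_matched_results",
         "depends_on": [{"task_key": "send_to_identity_matching"}],
         "notebook_task": {"notebook_path": "/ETL/load_matched"}},
    ],
}


def run_order(job_definition):
    """Return task keys in an order that respects every depends_on entry."""
    graph = {
        task["task_key"]: {d["task_key"] for d in task.get("depends_on", [])}
        for task in job_definition["tasks"]
    }
    return list(TopologicalSorter(graph).static_order())


order = run_order(job)
```

Declaring dependencies as data, rather than relying on people to trigger each stage, is what lets the platform sequence the handoff to identity matching and the retrieval of results without manual coordination.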
Features
From Manual Pipelines and Updates to a Secure, Automated, Quick Data Platform
-
VNet-Injected Databricks Workspace for Secure, Direct Data Access
The Databricks workspace is deployed inside Oregon’s existing Azure network, so it connects directly to OLDC’s data sources within the State’s network. This removes manual file transfers and keeps student data inside a secured environment.
-
PySpark-Based ETL Enforces Data Quality and Simplifies Maintenance
Code-based PySpark notebooks enforce agency-specific data quality rules, log errors, and document transformations. This improves data consistency, simplifies debugging, and makes ETL logic easier to maintain as requirements change.
-
Databricks Workflows Eliminate Manual Coordination and Speed Data Delivery
Databricks Workflows automate ETL sequencing and scheduling across state agencies. This removes manual handoffs and delivers refreshed data in hours instead of weeks. Databricks keeps a history of data changes, while Workflows show clear run status and failure points for each run.
-
A Governed ETL with Privacy, Compliance, and Security
ETL development and changes were reviewed and approved in coordination with OLDC’s data governance committee and data partners. This ensured data quality rules, transformations, and reporting outputs met privacy, security, and compliance requirements before reaching researchers.
Results
Reliable, Cloud Data Processes Transform Statewide Student Pathways
The Databricks workspace reduced data processing cycles from weeks to hours. Workflows that once required manual coordination now run automatically, with clear visibility into each step. Annual ETL operational costs dropped by over 80%.
The new SLDS architecture delivers unified, analysis-ready student data faster through a scheduled pipeline. Improved data processing enables researchers and policymakers to thoroughly evaluate student and workforce programs and make confident funding and policy decisions. This modernization changed how quickly OHECC can answer policy questions and adapt to new education and workforce policies.

What's Next
Evolving the SLDS to Meet Oregon’s Growing Data and Policy Demands
With Databricks established as a reliable cloud-computing and automation platform, OLDC is positioned to evolve and update SLDS components as needed. Resource Data continues to support OLDC as it scales its data capabilities to serve Oregon’s long-term research and student policy needs.