

Neha Arora, Mobility AI Team Lead, and Yechen Li, Software Engineer, Google Research
We establish a positive correlation between hard-braking events (HBEs) collected through Android Auto and road segment crash rates. Road segments with higher HBE frequencies have significantly higher crash risk, which suggests that HBEs can serve as leading indicators for road safety assessment.
Traffic safety analysis has traditionally relied on police-reported crash statistics, which are the established benchmark because they correspond directly to fatalities, injuries, and property damage. As a foundation for predictive models, however, historical crash data has considerable limitations: it is inherently a lagging indicator. Crashes are also statistically rare on arterial and local roads, so it can take years to accumulate enough data for a valid safety profile of a specific road segment. This sparsity, compounded by inconsistent reporting standards across jurisdictions, complicates the development of robust risk prediction models. Proactive safety assessment requires leading indicators: proxies for crash risk that correlate with safety outcomes but occur far more frequently than crashes.
In "From Lagging to Leading: Validating Hard Braking Events as High-Density Indicators of Segment Crash Risk", we evaluate hard-braking events (HBEs) as a scalable surrogate for crash risk. An HBE is an instance in which a vehicle's forward deceleration exceeds a predefined threshold (-3 m/s²), which we interpret as an evasive maneuver. Because HBEs come from connected vehicle data, they support network-wide analysis, unlike proximity-based surrogates such as time-to-collision, which often depend on fixed sensor infrastructure. By combining public crash data from Virginia and California with anonymized, aggregated HBE data from the Android Auto platform, we found a statistically significant positive correlation between HBE frequency and crash rates across all severity levels.
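The threshold definition above can be illustrated with a minimal sketch. It assumes a vehicle trace is available as time-ordered `(timestamp_s, speed_mps)` samples and estimates acceleration by finite differences; the function name, the sample format, and the rule that consecutive below-threshold intervals count as one event are illustrative assumptions, not the method used to produce the Android Auto data.

```python
from typing import List, Tuple

# Threshold quoted in the text: forward acceleration below -3 m/s^2.
HBE_THRESHOLD_MPS2 = -3.0


def detect_hard_braking(
    samples: List[Tuple[float, float]],
    threshold: float = HBE_THRESHOLD_MPS2,
) -> List[float]:
    """Return the timestamps at which hard-braking events begin.

    `samples` is a hypothetical trace of (timestamp_s, speed_mps) pairs,
    sorted by time. Acceleration is estimated per interval as dv / dt.
    Consecutive intervals below the threshold are merged into one event
    (an assumption made here for illustration).
    """
    events: List[float] = []
    in_event = False
    for (t0, v0), (t1, v1) in zip(samples, samples[1:]):
        dt = t1 - t0
        if dt <= 0:
            continue  # skip duplicate or out-of-order samples
        accel = (v1 - v0) / dt
        if accel < threshold:
            if not in_event:
                events.append(t1)  # first interval of a new event
            in_event = True
        else:
            in_event = False
    return events


# Toy trace: two separate braking episodes.
trace = [(0.0, 20.0), (1.0, 19.0), (2.0, 15.0), (3.0, 11.0),
         (4.0, 10.5), (5.0, 6.5), (6.0, 6.5)]
hbe_times = detect_hard_braking(trace)
```

In practice the raw signal would likely need smoothing before differencing, since noisy speed samples can produce spurious threshold crossings.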
Data density
To validate this metric, we analyzed ten years of public crash data alongside aggregated HBE measurements. The main advantage of HBEs is signal density: across road segments in California and Virginia, the number of segments with observed HBEs was 18 times greater than the number with reported crashes. Crash data is sparse, and on some local roads it can take years to record a single event, whereas HBEs provide a continuous, high-volume stream of data that fills gaps in the safety picture.

HBEs are observed on 18x more road segments compared to reported crashes.
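As a small illustration of how such a coverage ratio could be computed, the sketch below counts segments with at least one observed HBE and divides by the number of segments with at least one reported crash. The segment IDs and counts are made-up toy values, and the dictionary layout is an assumption for illustration.

```python
from typing import Dict


def coverage_ratio(hbe_counts: Dict[str, int], crash_counts: Dict[str, int]) -> float:
    """Ratio of segments with >= 1 HBE to segments with >= 1 reported crash."""
    hbe_segments = {seg for seg, n in hbe_counts.items() if n > 0}
    crash_segments = {seg for seg, n in crash_counts.items() if n > 0}
    if not crash_segments:
        raise ValueError("no segments with reported crashes")
    return len(hbe_segments) / len(crash_segments)


# Toy data only: three segments have HBEs, one has a reported crash.
toy_hbe = {"seg_a": 5, "seg_b": 1, "seg_c": 0, "seg_d": 12}
toy_crash = {"seg_a": 1, "seg_b": 0}
ratio = coverage_ratio(toy_hbe, toy_crash)
```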
Statistical validation
The central question was whether a high frequency of HBEs corresponds to a higher crash rate. We used negative binomial (NB) regression models, a standard approach in the Highway Safety Manual (HSM), to account for the overdispersion (variance greater than the mean) typical of crash data.
Our models controlled for several confounding factors:
- Exposure: traffic volume and segment length.
- Infrastructure: road type (local, arterial, highway), gradient, and cumulative turning radii.
- Dynamics: presence of ramps and changes in the number of lanes.
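For readers unfamiliar with NB regression, a generic form of such a model is sketched below. This is a common textbook (NB2) parameterization written under assumed notation, not the exact specification used in the paper: $Y_i$ is the crash count on segment $i$, $E_i$ an exposure term (for example, traffic volume times segment length), $\mathrm{HBE}_i$ the segment's HBE rate, $\mathbf{x}_i$ the remaining covariates from the list above, and $\alpha$ the dispersion parameter.

$$
\begin{aligned}
Y_i &\sim \mathrm{NB}(\mu_i, \alpha), \qquad \mathrm{Var}(Y_i) = \mu_i + \alpha \mu_i^2, \\
\log \mu_i &= \log E_i + \beta_0 + \beta_{H}\,\mathrm{HBE}_i + \mathbf{x}_i^{\top} \boldsymbol{\gamma}.
\end{aligned}
$$

In this form, $\alpha > 0$ lets the variance exceed the mean (as $\alpha \to 0$ the model reduces to Poisson), and $\exp(\beta_H)$ is the multiplicative change in expected crashes per unit increase in HBE rate with the other terms held fixed. Exposure could alternatively enter as log-transformed covariates with their own coefficients rather than as a fixed offset.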
The results showed a statistically significant association between HBE rates and crash rates in both states. Road segments with higher HBE frequencies consistently had higher crash rates, and this relationship held across road types, from local and arterial roads to controlled-access highways.

Crash rate vs. HBE rate for different types of roads in California and Virginia.
The regression analysis also quantified the effect of specific infrastructure elements. For example, the presence of an on-ramp on a road segment was positively associated with crash risk in both states, likely because of the weaving maneuvers required when traffic merges.
Case study: High-risk merge identification
To illustrate the practical utility of this metric, we examined a freeway merge segment in California that connects Highway 101 and Highway 880. Historical data shows that this segment has an HBE rate approximately 70 times higher than that of the average California freeway segment, and it averaged one crash every six weeks over the past decade.

Freeway merge segment in the California Bay Area with one crash every six weeks and an HBE rate 70x higher than average.
When we analyzed the connected vehicle data for this location, we found that it ranked in the top 1% of all road segments nationwide for HBE frequency. The HBE signal flagged this high-risk outlier without relying on the decade of crash reports it took to confirm the danger statistically. This agreement supports HBEs as a reliable proxy that can identify hazardous locations even where long-term crash history is unavailable.
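The two quantities used in this case study, a segment's HBE rate relative to the average and its percentile rank, can be sketched as follows. The function name, the rate dictionary, and the toy values are illustrative assumptions; the percentile here is simply the share of segments with a strictly lower rate.

```python
from statistics import mean
from typing import Dict, Tuple


def rate_ratio_and_percentile(rates: Dict[str, float], segment_id: str) -> Tuple[float, float]:
    """Return (rate / mean rate, percentile rank) for one segment.

    `rates` maps hypothetical segment IDs to HBE rates in any consistent
    unit. The percentile rank is the percentage of segments whose rate is
    strictly lower than the target segment's rate.
    """
    target = rates[segment_id]
    ratio = target / mean(rates.values())
    below = sum(1 for r in rates.values() if r < target)
    percentile = 100.0 * below / len(rates)
    return ratio, percentile


# Toy network: 99 ordinary segments and one outlier merge segment.
toy_rates = {f"seg_{i}": 1.0 for i in range(99)}
toy_rates["merge"] = 100.0
ratio, percentile = rate_ratio_and_percentile(toy_rates, "merge")
is_top_1_percent = percentile >= 99.0
```

Note that a strong outlier pulls the network mean upward, so a ratio to the median may be a more stable comparison on small networks.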
Real-world application
Validating HBEs as a proxy for crash risk turns a raw sensor metric into a practical safety tool for road management. It allows connected vehicle data to be used for network-wide traffic safety assessment with finer spatial and temporal granularity. These findings support the use of HBEs to assess road segment risk; they do not support conclusions about the risk of driving behavior independent of location.
The Mobility AI team at Google Research is working with Google Maps Platform to integrate these HBE datasets into the Roads Management Insights offering. With these high-density signals, transportation agencies can access aggregated, anonymized data that is more current and covers more of the road network than conventional crash statistics. This makes it possible to identify high-risk locations using leading indicators rather than relying only on sparse, lagging crash records.
Future work
Although this study confirms that HBEs are a robust leading indicator of crash risk, there is room to refine the signal further. We are investigating ways to spatially cluster homogeneous road segments to increase data density. Overcoming the remaining sparsity would support a shift from risk identification to targeted engineering interventions, in which high-density data informs specific infrastructure improvements ranging from signal timing and signage to the geometric redesign of hazardous merge lanes.
Acknowledgements
This work was a collaboration between researchers from Google and Virginia Tech. We thank our co-authors Shantanu Shahane, Shoshana Vasserman, Carolina Osorio, Yi-fan Chen, Ivan Kuznetsov, Kristin White, Justyna Swiatkowska, and Feng Guo. We also thank Aurora Cheung, Andrew Stober, Reymund Dumlao, and Nick Kan for their contributions in translating this research into real-world applications.