Forecasting Suspicious Account Activity at Large-Scale Online Service Providers
In the face of large-scale automated social engineering attacks on large online services, fast detection and remediation of compromised accounts are crucial to limit the spread of the attack and to mitigate the overall damage to users, companies, and the public at large. We advocate a fully automated approach based on machine learning: we develop an early warning system that harnesses account activity traces to predict which accounts are likely to be compromised in the future. We demonstrate the feasibility and applicability of the system through an experiment at a large-scale online service provider using four months of real-world production data encompassing hundreds of millions of users. We show that, even when limiting ourselves to login data only (in order to derive features at low computational cost) and to a basic model selection approach, our classifier can be tuned to achieve good classification precision when used for forecasting. Our system correctly identifies, up to one month in advance, the accounts later flagged as suspicious, with precision, recall, and false positive rates that indicate the mechanism is likely to prove valuable in operational settings to support additional layers of defense.