BSides Boston 2017 - Web Scraping for Fun and Profit
The participant will learn more about the kinds of data that are being posted to the internet, specifically on information-sharing sites regularly visited by malicious actors, and how this data can be retrieved, parsed, stored, and turned into actionable intelligence.
Pastebin.com and other public ‘paste’ sites are rich sources of sensitive information. Hackers will often post their stolen ‘loot’ to websites like these for public consumption. These sources of information go largely unmonitored.
Pastebin is keenly aware of this fact, and offers users the ability to create a list of alert keywords. When one of the keywords is found in a public paste, an email is sent to the user. Pastebin staff will also remove pastes that are found to contain personally identifiable information. However, we have shown that a well-designed scraper can capture this information before it is removed by the Pastebin team. This data has included:
- Suite of stolen NSA tools published to Pastebin
- NASA and other government sector breaches published to Pastebin
- Daily onslaught of compromised website credentials, Netflix, proxies, and occasionally, credit card data and even SSNs
This session will begin with an introduction and an agenda of what we will cover. From there, we will cite industry compliance standards and best practices that surround the safeguarding of data. We will then survey many of the OSINT channels on which malicious individuals share data, and discuss the methods used to retrieve the data appearing on these web sources. Next, we will move into how our scraper works, what kinds of data it looks for, and how we are alerted about potential findings. We also plan to speak on the process we used to integrate our scraper with our Slack platform. Finally, we will cover statistics and examples of the sensitive data we have come across during our time scraping, and we will close with best practices and takeaways that can be implemented right away to keep our personal information, and the data our organizations hold, safe and secure.
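To make the scraper and alerting steps concrete, here is a minimal Python sketch of the general idea. It is not the speakers' scraper: the pattern set, the function names (`scan_paste`, `build_slack_payload`, `post_to_slack`), and the message wording are all assumptions made for illustration. The only external behavior relied on is that Slack incoming webhooks accept a JSON body with a `text` field.

```python
import json
import re
import urllib.request

# Hypothetical patterns for illustration only; a real deployment would
# tune these heavily to reduce false positives.
PATTERNS = {
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "credential_pair": re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+:\S+"),
    "card_number": re.compile(r"\b(?:\d[ -]?){15}\d\b"),
}


def scan_paste(text, keywords=()):
    """Return a dict mapping each finding type to its number of hits."""
    findings = {}
    for name, pattern in PATTERNS.items():
        hits = pattern.findall(text)
        if hits:
            findings[name] = len(hits)
    # Case-insensitive substring match for user-supplied alert keywords.
    lowered = text.lower()
    for kw in keywords:
        count = lowered.count(kw.lower())
        if count:
            findings["keyword:" + kw] = count
    return findings


def build_slack_payload(paste_id, findings):
    """Build the JSON body for a Slack incoming webhook (assumed format)."""
    summary = ", ".join(f"{k} x{v}" for k, v in sorted(findings.items()))
    return {"text": f"Possible sensitive paste {paste_id}: {summary}"}


def post_to_slack(webhook_url, payload):
    """POST the payload to a Slack incoming webhook URL supplied by the caller."""
    req = urllib.request.Request(
        webhook_url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req, timeout=10) as resp:
        return resp.status


sample = (
    "alice@example.com:hunter2\n"
    "bob@example.org:pass123\n"
    "SSN 123-45-6789\n"
)
findings = scan_paste(sample, keywords=("example.org",))
payload = build_slack_payload("abc123", findings)
print(payload["text"])
```

The sketch only scans text that has already been retrieved; how pastes are fetched, and whether a given site's terms permit it, is left out. Sending the alert would be a single call to `post_to_slack` with a webhook URL configured in your own Slack workspace.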
Event Speakers:
Nick DeLena
Senior IT Audit Manager
Nick is one of our Senior Managers here at OCD Tech. He works with internal senior management to scope and budget engagements, and provides oversight and training to existing staff.
Nick has 6 years of audit and advisory experience, and 12 years in various IT operations, analyst, and compliance positions. He also holds an MBA from Brown University and several certifications, including:
- Certified Information Systems Auditor (CISA), ISACA
- Certified in Risk and Information Systems Control (CRISC), ISACA
- Information Technology Infrastructure Library Practitioner (ITIL)
- Security+ (CompTIA)
- Apple Certified Associate (ACA)
- Member, InfraGard, a partnership between the private sector and the FBI
Scott Goodwin is one of our experienced IT Security Analysts here at OCD Tech. He is currently working on several research projects related to information security and penetration testing.
He graduated with a Bachelor of Science in Physics from the University of Massachusetts – Boston and previously worked in IT vulnerability assessments for the automobile dealership industry. He holds several certifications, including:
- Information Systems Audit and Control Association, Inc. (ISACA) member
- CompTIA Security+ certification
- CompTIA IT Fundamentals certification
- ISACA CSX CyberSecurity Fundamentals certification
- Microsoft Technology Associate (MTA) – Security certification