30 Minute Skills: Web Scraping 102

The New England First Amendment Coalition recently launched a monthly educational series featuring short, practical lessons on journalism and the First Amendment.

The goal of the program, called “30 Minute Skills,” is to give reporters and other citizens knowledge they can use immediately in newsgathering, data collection, storytelling and other areas of journalism and First Amendment law.

The lessons will be provided in a 30-minute format to accommodate the demanding schedules faced by many working in New England newsrooms. The program is free and open to the public. Registration for each lesson is required.

Additional Upcoming 30 Minute Skills Classes
Sept. 21 | How to Practice Self-Care as a Journalist | Register
Oct. 21 | FERPA and Public Records | Register

Web Scraping 102

August 20 | 12 p.m. ET

This is the second of two introductory lessons on collecting online data through web scraping. By attending this class, you will learn:

• Additional ways to scrape websites for information.
• How to use free online tools for web scraping.
• How to scrape data from PDF documents for use in spreadsheets.

Requirements for Class

• View Web Scraping 101
• Download Tabula (free software)
• Create an IFTTT.com account
• “Opioid Overdoses” PDF File
• “Massachusetts Racial Disparity” PDF File

About Your Instructor

MAGGIE MULVIHILL | Maggie Mulvihill is a veteran investigative and data journalist, journalism educator, news entrepreneur and impassioned defender of our right to know. Her data journalism students have been honored with 10 regional or national journalism awards since 2011 and have been named finalists for the prestigious Livingston Award for Young Journalists. An attorney, Mulvihill is co-founder of the New England Center for Investigative Reporting. In 2014, she founded Boston University’s summer workshops, Data + Narrative: Data-Driven Storytelling. She serves on the Steering Committee of the Reporters Committee for Freedom of the Press and on the board of the New England First Amendment Coalition. She was a 2004-2005 fellow at the Nieman Foundation for Journalism at Harvard University and a 2014-2016 member of the Federal Freedom of Information Act Advisory Committee. (Photo Credit: Molly Hamill)

Recent 30 Minute Skills

Web Scraping 101 | This is the first of two introductory lessons taught by NEFAC’s Maggie Mulvihill about collecting online data through web scraping. By viewing this class, you will learn: (1) how web scraping can be helpful to data collection (2) where to find free tools to use for web scraping and (3) how to begin scraping various websites for information and data.

HIPAA and Public Records | The Health Insurance Portability and Accountability Act, or HIPAA, is one of the most misunderstood laws. Government officials often incorrectly cite the 25-year-old federal statute. By completing this lesson, you will learn (1) the history and purpose of HIPAA (2) under what circumstances and to what types of entities HIPAA applies and (3) how to best respond to public record denials that are incorrectly attributed to HIPAA.

Data Visualization 101 | While there are encouraging signs that COVID-19 in New England is becoming less severe, telling stories using data will continue to be an important skill for journalists covering any beat. By viewing this lesson, you will learn (1) how to find and obtain reliable health data (2) how to create simple data visualizations using free online tools and (3) how to identify story ideas based on vaccination data.

Protecting Women Journalists | With newsrooms often lacking effective support systems, women journalists are regularly belittled, have their professionalism questioned and endure mistreatment strictly based on their gender. By viewing this lesson, you will learn: (1) the types of threats currently facing women journalists (2) how to protect yourself from online trolling and (3) how to stay safe at protests and large demonstrations.

How to Respond to a Subpoena | For an increasing number of New England newsrooms without regular access to attorneys, subpoenas can be intimidating and can infringe on the rights of local journalists. Taught by attorney Matthew Byrne of Gravel & Shea in Burlington, Vt., this lesson teaches you: (1) the history of subpoenas and their legal authority (2) arguments and strategies that can be made in response to a subpoena and (3) how to advocate for yourself and your newsroom in court.

Data Cleaning 102 | This lesson is taught by NEFAC’s Maggie Mulvihill. It is the second of two introductory lessons on cleaning datasets obtained online or through public records requests. By completing this lesson, you will: (1) advance your data cleaning skills with OpenRefine (2) learn how to import dirty data from websites and increase memory in OpenRefine (3) build your facet and clustering skills and (4) learn how to split and merge data.

Data Cleaning 101 | This is the first of two introductory lessons on cleaning datasets obtained online or through public records requests. It is taught by NEFAC’s Maggie Mulvihill, a professor at Boston University. By completing this lesson, you will: (1) understand what data cleaning is and why it’s necessary (2) learn about the free tools available to help clean data and (3) begin building your data cleaning skills.

NEFAC was formed in 2006 to advance and protect the Five Freedoms of the First Amendment, including the principle of the public’s right to know. We’re a broad-based organization of people who believe in the power of an informed democratic society. Our members include lawyers, journalists, historians, academics and private citizens.

Our coalition is funded through contributions made by those who value the First Amendment and who strive to keep government accountable. Please make a donation here.

Leadership Circle donors for 2021 include Hearst Connecticut Media Group, The Boston Globe, Paul and Ann Sagan, and the Robertson Foundation. Major Supporters include Boston University, WBUR-Boston and the Academy of New England Journalists.