Commit d373a96

Merge pull request DhanushNehru#289 from Charul00/update-readme
Added Web Scraper Script
2 parents 5738550 + d9ecdb1 commit d373a96

3 files changed (+40, -0 lines)

README.md

Lines changed: 2 additions & 0 deletions
@@ -122,6 +122,8 @@ More information on contributing and the general code of conduct for discussion
 | Weather GUI | [Weather GUI](https://github.com/DhanushNehru/Python-Scripts/tree/master/Weather%20GUI) | Displays information on the weather. |
 | Website Blocker | [Website Blocker](https://github.com/DhanushNehru/Python-Scripts/tree/master/Website%20Blocker) | Downloads the website and loads it on your homepage in your local IP. |
 | Website Cloner | [Website Cloner](https://github.com/DhanushNehru/Python-Scripts/tree/master/Website%20Cloner) | Clones any website and opens the site in your local IP. |
+| Web Scraper | [Web Scraper](https://github.com/Charul00/Python-Scripts/tree/main/Web%20Scraper) | A Python script that scrapes blog titles from Python.org and saves them to a file. |
+
 | Weight Converter | [Weight Converter](https://github.com/WatashiwaSid/Python-Scripts/tree/master/Weight%20Converter) | Simple GUI script to convert weight in different measurement units. |
 | Wikipedia Data Extractor | [Wikipedia Data Extractor](https://github.com/DhanushNehru/Python-Scripts/tree/master/Wikipedia%20Data%20Extractor) | A simple Wikipedia data extractor script to get output in your IDE. |
 | Word to PDF | [Word to PDF](https://github.com/DhanushNehru/Python-Scripts/tree/master/Word%20to%20PDF%20converter) | A Python script to convert an MS Word file to a PDF file. |

Web Scraper/README.md

Lines changed: 8 additions & 0 deletions
@@ -0,0 +1,8 @@
+In this script, we use the `requests` library to send a GET request to the Python.org blogs page. We then use the `BeautifulSoup` library to parse the HTML content of the page.
+
+We find all the blog titles on the page by searching for `h2` elements with the class `blog-title`. We then print each title found and save them to a file named `blog_titles.txt`.
+
+To run this script, first install the required libraries:
+
+```bash
+pip install requests beautifulsoup4
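The README describes the lookup as "`h2` elements with the class `blog-title`". The sketch below exercises that lookup against a small inline HTML string so it runs without network access. The sample markup is made up for illustration and is not python.org's real page structure. It also shows the equivalent CSS-selector form, `soup.select("h2.blog-title")`, and that `class_` matches an element when `blog-title` is any one of its classes.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup; the live python.org page may be structured differently.
SAMPLE_HTML = """
<html><body>
  <h2 class="blog-title">First post</h2>
  <h2 class="blog-title featured">  Second post </h2>
  <h2 class="sidebar">Not a blog title</h2>
  <h3 class="blog-title">Wrong tag</h3>
</body></html>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# find_all with class_ matches when "blog-title" is one of the element's classes,
# so the multi-class <h2> is included while the <h3> and the "sidebar" <h2> are not.
titles = soup.find_all("h2", class_="blog-title")
texts = [t.get_text(strip=True) for t in titles]

# The CSS selector form (backed by soupsieve, a beautifulsoup4 dependency)
# selects the same elements.
css_texts = [t.get_text(strip=True) for t in soup.select("h2.blog-title")]

print(texts)      # ['First post', 'Second post']
print(css_texts)  # ['First post', 'Second post']
```

`get_text(strip=True)` trims the surrounding whitespace, which is why `"  Second post "` comes back clean.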

Web Scraper/Web_Scraper.py

Lines changed: 30 additions & 0 deletions
@@ -0,0 +1,30 @@
+import requests
+from bs4 import BeautifulSoup
+
+# URL to scrape data from
+URL = "https://www.python.org/blogs/"
+
+# Send a GET request to the URL
+response = requests.get(URL)
+
+# Parse the webpage content using BeautifulSoup
+soup = BeautifulSoup(response.content, "html.parser")
+
+# Find all the blog titles on the page
+titles = soup.find_all('h2', class_='blog-title')
+
+# Print each title found
+print("Python.org Blog Titles:\n")
+for i, title in enumerate(titles, start=1):
+    print(f"{i}. {title.get_text(strip=True)}")
+
+# Save the titles to a file
+with open("blog_titles.txt", "w") as file:
+    for title in titles:
+        file.write(title.get_text(strip=True) + "\n")
+
+print("\nBlog titles saved to 'blog_titles.txt'.")
+
+
+
+
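The committed script covers the happy path, but it sets no request timeout, does not check the HTTP status, silently writes an empty file when the selector matches nothing, and relies on the platform's default encoding when saving. A minimal reworked sketch follows, assuming the same URL and `h2.blog-title` selector. The function names `fetch_html`, `extract_titles`, and `save_titles` are chosen here for illustration, and the 10-second timeout is an arbitrary value; neither comes from the original commit.

```python
from __future__ import annotations  # allows `bytes | str` hints on older Pythons

import requests
from bs4 import BeautifulSoup

URL = "https://www.python.org/blogs/"


def fetch_html(url: str, timeout: float = 10.0) -> bytes:
    """Download the page, failing loudly on network errors or a non-2xx status."""
    response = requests.get(url, timeout=timeout)  # timeout prevents hanging forever
    response.raise_for_status()  # raises requests.HTTPError for 4xx/5xx responses
    return response.content


def extract_titles(html: bytes | str) -> list[str]:
    """Return the stripped text of every <h2 class="blog-title"> element."""
    soup = BeautifulSoup(html, "html.parser")
    return [h2.get_text(strip=True) for h2 in soup.find_all("h2", class_="blog-title")]


def save_titles(titles: list[str], path: str = "blog_titles.txt") -> None:
    """Write one title per line, as UTF-8 so non-ASCII titles work on every OS."""
    with open(path, "w", encoding="utf-8") as file:
        for title in titles:
            file.write(title + "\n")


def main() -> None:
    try:
        titles = extract_titles(fetch_html(URL))
    except requests.RequestException as exc:  # also covers HTTPError and timeouts
        print(f"Could not fetch {URL}: {exc}")
        return

    if not titles:
        # The selector is an assumption about the page layout and may stop matching.
        print("No blog titles found; the page markup may have changed.")
        return

    print("Python.org Blog Titles:\n")
    for i, title in enumerate(titles, start=1):
        print(f"{i}. {title}")

    save_titles(titles)
    print("\nBlog titles saved to 'blog_titles.txt'.")


if __name__ == "__main__":
    main()
```

Keeping `extract_titles` free of network I/O means it can be checked against saved HTML, and the "no titles found" branch surfaces a stale selector instead of producing an empty output file.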
