
Commit 43f99e5

Kunal614 authored and cclauss committed
Python program that surfs 3 sites at a time (#1389)
* Python program that surfs 3 sites at a time; takes its input at run time, e.g. python3 project1.py (man)
* Update project1.py
* noqa: F401 and reformat with black
* Rename project1.py to web_programming/crawl_google_results.py
* Add beautifulsoup4 to requirements.txt
* Add fake_useragent to requirements.txt
* Update crawl_google_results.py
* headers={"UserAgent": UserAgent().random}
* html.parser, not lxml
* link, not links
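The script builds its Google search URL by joining the command-line words onto the query string. A minimal sketch of that step — the query words here are stand-ins for sys.argv[1:], and quote_plus is an assumption not in the original script, which appends the words unencoded:

```python
from urllib.parse import quote_plus

# Stand-in for sys.argv[1:] — the words typed after the script name.
args = ["python", "tutorials"]

# The commit's script joins the words with spaces and appends them raw;
# quote_plus (an addition here, not in the original) percent-encodes them
# so spaces and special characters are safe in the URL.
query = " ".join(args)
url = "https://www.google.com/search?q=" + quote_plus(query)
print(url)  # https://www.google.com/search?q=python+tutorials
```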
1 parent 5ef5f67 commit 43f99e5

File tree

2 files changed: +22 −0 lines changed

requirements.txt (+2)

@@ -1,4 +1,6 @@
+beautifulsoup4
 black
+fake_useragent
 flake8
 matplotlib
 mypy
web_programming/crawl_google_results.py (+20)

@@ -0,0 +1,20 @@
+import sys
+import webbrowser
+
+from bs4 import BeautifulSoup
+from fake_useragent import UserAgent
+import requests
+
+print("Googling.....")
+url = "https://www.google.com/search?q=" + " ".join(sys.argv[1:])
+res = requests.get(url, headers={"UserAgent": UserAgent().random})
+# res.raise_for_status()
+with open("project1a.html", "wb") as out_file:  # only for knowing the class
+    for data in res.iter_content(10000):
+        out_file.write(data)
+soup = BeautifulSoup(res.text, "html.parser")
+links = list(soup.select(".eZt8xd"))[:5]
+
+print(len(links))
+for link in links:
+    webbrowser.open(f"http://google.com{link.get('href')}")
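The core of the script is selecting the first five result anchors by their CSS class (".eZt8xd") and reading each one's href. A stdlib-only sketch of that link-extraction step, using html.parser directly instead of BeautifulSoup so it runs offline — the one-line page here is a made-up stand-in for Google's response:

```python
from html.parser import HTMLParser


class LinkExtractor(HTMLParser):
    """Collect href values of <a> tags that carry a given CSS class."""

    def __init__(self, css_class):
        super().__init__()
        self.css_class = css_class
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        # "class" may hold several space-separated names; match any of them.
        if tag == "a" and self.css_class in attrs.get("class", "").split():
            self.hrefs.append(attrs.get("href"))


# Tiny stand-in for the HTML Google returns; the class name matches the script.
page = '<a class="eZt8xd" href="/url?q=a">A</a><a class="other" href="/x">X</a>'
parser = LinkExtractor("eZt8xd")
parser.feed(page)
print(parser.hrefs[:5])  # ['/url?q=a']
```

BeautifulSoup's soup.select(".eZt8xd") does the same class matching declaratively; the commit switched its backend to "html.parser" so no extra parser (lxml) needs installing.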
