Commit 70b9f64

author
codebasics
committed
regex
1 parent 6e41dfd commit 70b9f64

4 files changed, +618 -0 lines changed
@@ -0,0 +1,154 @@
1+
{
2+
"cells": [
3+
{
4+
"cell_type": "code",
5+
"execution_count": 1,
6+
"metadata": {},
7+
"outputs": [],
8+
"source": [
9+
"import re"
10+
]
11+
},
12+
{
13+
"cell_type": "markdown",
14+
"metadata": {},
15+
"source": [
16+
"**1. Extract all Twitter handles from the following text. A Twitter handle is the single word that appears after https://twitter.com/. It contains only alphanumeric characters and the underscore, i.e. A-Z, a-z, 0 to 9 and _**"
17+
]
18+
},
19+
{
20+
"cell_type": "code",
21+
"execution_count": 5,
22+
"metadata": {
23+
"scrolled": true
24+
},
25+
"outputs": [
26+
{
27+
"data": {
28+
"text/plain": [
29+
"['elonmusk', 'teslarati', 'dummy_tesla', 'dummy_2_tesla']"
30+
]
31+
},
32+
"execution_count": 5,
33+
"metadata": {},
34+
"output_type": "execute_result"
35+
}
36+
],
37+
"source": [
38+
"text = '''\n",
39+
"Follow our leader Elon musk on twitter here: https://twitter.com/elonmusk, more information \n",
40+
"on Tesla's products can be found at https://www.tesla.com/. Also here are leading influencers \n",
41+
"for tesla related news,\n",
42+
"https://twitter.com/teslarati\n",
43+
"https://twitter.com/dummy_tesla\n",
44+
"https://twitter.com/dummy_2_tesla\n",
45+
"'''\n",
46+
"pattern = r'https://twitter\\.com/([a-zA-Z0-9_]+)'\n",
47+
"\n",
48+
"re.findall(pattern, text)"
49+
]
50+
},
51+
{
52+
"cell_type": "markdown",
53+
"metadata": {},
54+
"source": [
55+
"**2. Extract the concentration risk types. Each one is the text that appears after \"Concentration of Risk:\". In the example below, your regex should extract these two strings:**\n",
56+
"\n",
57+
"(1) Credit Risk\n",
58+
"\n",
59+
"(2) Supply Risk"
60+
]
61+
},
62+
{
63+
"cell_type": "code",
64+
"execution_count": 6,
65+
"metadata": {},
66+
"outputs": [
67+
{
68+
"data": {
69+
"text/plain": [
70+
"['Credit Risk', 'Supply Risk']"
71+
]
72+
},
73+
"execution_count": 6,
74+
"metadata": {},
75+
"output_type": "execute_result"
76+
}
77+
],
78+
"source": [
79+
"text = '''\n",
80+
"Concentration of Risk: Credit Risk\n",
81+
"Financial instruments that potentially subject us to a concentration of credit risk consist of cash, cash equivalents, marketable securities,\n",
82+
"restricted cash, accounts receivable, convertible note hedges, and interest rate swaps. Our cash balances are primarily invested in money market funds\n",
83+
"or on deposit at high credit quality financial institutions in the U.S. These deposits are typically in excess of insured limits. As of September 30, 2021\n",
84+
"and December 31, 2020, no entity represented 10% or more of our total accounts receivable balance. The risk of concentration for our convertible note\n",
85+
"hedges and interest rate swaps is mitigated by transacting with several highly-rated multinational banks.\n",
86+
"Concentration of Risk: Supply Risk\n",
87+
"We are dependent on our suppliers, including single source suppliers, and the inability of these suppliers to deliver necessary components of our\n",
88+
"products in a timely manner at prices, quality levels and volumes acceptable to us, or our inability to efficiently manage these components from these\n",
89+
"suppliers, could have a material adverse effect on our business, prospects, financial condition and operating results.\n",
90+
"'''\n",
91+
"pattern = 'Concentration of Risk: ([^\\n]*)'\n",
92+
"\n",
93+
"re.findall(pattern, text)"
94+
]
95+
},
96+
{
97+
"cell_type": "markdown",
98+
"metadata": {},
99+
"source": [
100+
"**3. Companies in Europe report their financial numbers on a semi-annual basis, so you can have a document like this. To extract both the quarterly and the semi-annual periods, you can use a regex as shown below**\n",
101+
"\n",
102+
"Hint: use a non-capturing group (?:...) to group the Q/S alternatives without capturing them separately, so that re.findall returns the whole period"
103+
]
104+
},
105+
{
106+
"cell_type": "code",
107+
"execution_count": 2,
108+
"metadata": {},
109+
"outputs": [
110+
{
111+
"data": {
112+
"text/plain": [
113+
"['2021 Q1', '2021 S1']"
114+
]
115+
},
116+
"execution_count": 2,
117+
"metadata": {},
118+
"output_type": "execute_result"
119+
}
120+
],
121+
"source": [
122+
"text = '''\n",
123+
"Tesla's gross cost of operating lease vehicles in FY2021 Q1 was $4.85 billion.\n",
124+
"BMW's gross cost of operating vehicles in FY2021 S1 was $8 billion.\n",
125+
"'''\n",
126+
"\n",
127+
"pattern = r'FY(\\d{4} (?:Q[1-4]|S[1-2]))'\n",
128+
"matches = re.findall(pattern, text)\n",
129+
"matches"
130+
]
131+
}
132+
],
133+
"metadata": {
134+
"kernelspec": {
135+
"display_name": "Python 3",
136+
"language": "python",
137+
"name": "python3"
138+
},
139+
"language_info": {
140+
"codemirror_mode": {
141+
"name": "ipython",
142+
"version": 3
143+
},
144+
"file_extension": ".py",
145+
"mimetype": "text/x-python",
146+
"name": "python",
147+
"nbconvert_exporter": "python",
148+
"pygments_lexer": "ipython3",
149+
"version": "3.8.5"
150+
}
151+
},
152+
"nbformat": 4,
153+
"nbformat_minor": 4
154+
}
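The hint in exercise 3 turns on how `re.findall` treats groups. As a minimal sketch (the variable names `non_capturing` and `capturing` are illustrative, not part of the commit), the snippet below runs the solution's pattern next to a variant that uses a plain capturing group for the alternation, on the same two sentences as the notebook:

```python
import re

text = """
Tesla's gross cost of operating lease vehicles in FY2021 Q1 was $4.85 billion.
BMW's gross cost of operating vehicles in FY2021 S1 was $8 billion.
"""

# Non-capturing group: the Q/S alternation is grouped but not captured,
# so the pattern has one group and findall returns a list of strings.
non_capturing = re.findall(r"FY(\d{4} (?:Q[1-4]|S[1-2]))", text)

# Capturing group: the pattern now has two groups, so findall returns
# one tuple per match with one item per group.
capturing = re.findall(r"FY(\d{4} (Q[1-4]|S[1-2]))", text)

print(non_capturing)  # ['2021 Q1', '2021 S1']
print(capturing)      # [('2021 Q1', 'Q1'), ('2021 S1', 'S1')]
```

With the capturing variant, each result would have to be unpacked to get the period, which is why the non-capturing form is the more convenient one here.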
@@ -0,0 +1,135 @@
1+
{
2+
"cells": [
3+
{
4+
"cell_type": "markdown",
5+
"metadata": {},
6+
"source": [
7+
"<h1 align='center'>Python Regular Expression Tutorial Exercise</h1>"
8+
]
9+
},
10+
{
11+
"cell_type": "code",
12+
"execution_count": 3,
13+
"metadata": {},
14+
"outputs": [],
15+
"source": [
16+
"import re"
17+
]
18+
},
19+
{
20+
"cell_type": "markdown",
21+
"metadata": {},
22+
"source": [
23+
"**1. Extract all Twitter handles from the following text. A Twitter handle is the single word that appears after https://twitter.com/. It contains only alphanumeric characters and the underscore, i.e. A-Z, a-z, 0 to 9 and _**"
24+
]
25+
},
26+
{
27+
"cell_type": "code",
28+
"execution_count": null,
29+
"metadata": {
30+
"scrolled": true
31+
},
32+
"outputs": [],
33+
"source": [
34+
"text = '''\n",
35+
"Follow our leader Elon musk on twitter here: https://twitter.com/elonmusk, more information \n",
36+
"on Tesla's products can be found at https://www.tesla.com/. Also here are leading influencers \n",
37+
"for tesla related news,\n",
38+
"https://twitter.com/teslarati\n",
39+
"https://twitter.com/dummy_tesla\n",
40+
"https://twitter.com/dummy_2_tesla\n",
41+
"'''\n",
42+
"pattern = '' # todo: type your regex here\n",
43+
"\n",
44+
"re.findall(pattern, text)"
45+
]
46+
},
47+
{
48+
"cell_type": "markdown",
49+
"metadata": {},
50+
"source": [
51+
"**2. Extract the concentration risk types. Each one is the text that appears after \"Concentration of Risk:\". In the example below, your regex should extract these two strings:**\n",
52+
"\n",
53+
"(1) Credit Risk\n",
54+
"\n",
55+
"(2) Supply Risk"
56+
]
57+
},
58+
{
59+
"cell_type": "code",
60+
"execution_count": null,
61+
"metadata": {},
62+
"outputs": [],
63+
"source": [
64+
"text = '''\n",
65+
"Concentration of Risk: Credit Risk\n",
66+
"Financial instruments that potentially subject us to a concentration of credit risk consist of cash, cash equivalents, marketable securities,\n",
67+
"restricted cash, accounts receivable, convertible note hedges, and interest rate swaps. Our cash balances are primarily invested in money market funds\n",
68+
"or on deposit at high credit quality financial institutions in the U.S. These deposits are typically in excess of insured limits. As of September 30, 2021\n",
69+
"and December 31, 2020, no entity represented 10% or more of our total accounts receivable balance. The risk of concentration for our convertible note\n",
70+
"hedges and interest rate swaps is mitigated by transacting with several highly-rated multinational banks.\n",
71+
"Concentration of Risk: Supply Risk\n",
72+
"We are dependent on our suppliers, including single source suppliers, and the inability of these suppliers to deliver necessary components of our\n",
73+
"products in a timely manner at prices, quality levels and volumes acceptable to us, or our inability to efficiently manage these components from these\n",
74+
"suppliers, could have a material adverse effect on our business, prospects, financial condition and operating results.\n",
75+
"'''\n",
76+
"pattern = '' # todo: type your regex here\n",
77+
"\n",
78+
"re.findall(pattern, text)"
79+
]
80+
},
81+
{
82+
"cell_type": "markdown",
83+
"metadata": {},
84+
"source": [
85+
"**3. Companies in Europe report their financial numbers on a semi-annual basis, so you can have a document like this. Write a regex that extracts both the quarterly and the semi-annual periods**\n",
86+
"\n",
87+
"Hint: use a non-capturing group (?:...) to group the Q/S alternatives without capturing them separately, so that re.findall returns the whole period"
88+
]
89+
},
90+
{
91+
"cell_type": "code",
92+
"execution_count": null,
93+
"metadata": {},
94+
"outputs": [],
95+
"source": [
96+
"text = '''\n",
97+
"Tesla's gross cost of operating lease vehicles in FY2021 Q1 was $4.85 billion.\n",
98+
"BMW's gross cost of operating vehicles in FY2021 S1 was $8 billion.\n",
99+
"'''\n",
100+
"\n",
101+
"pattern = '' # todo: type your regex here\n",
102+
"matches = re.findall(pattern, text)\n",
103+
"matches"
104+
]
105+
},
106+
{
107+
"cell_type": "markdown",
108+
"metadata": {},
109+
"source": [
110+
"__[Solution](http://ndtv.com)__"
111+
]
112+
}
113+
],
114+
"metadata": {
115+
"kernelspec": {
116+
"display_name": "Python 3",
117+
"language": "python",
118+
"name": "python3"
119+
},
120+
"language_info": {
121+
"codemirror_mode": {
122+
"name": "ipython",
123+
"version": 3
124+
},
125+
"file_extension": ".py",
126+
"mimetype": "text/x-python",
127+
"name": "python",
128+
"nbconvert_exporter": "python",
129+
"pygments_lexer": "ipython3",
130+
"version": "3.8.5"
131+
}
132+
},
133+
"nbformat": 4,
134+
"nbformat_minor": 4
135+
}
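One possible way to check answers to exercises 1 and 2 in the exercise notebook is sketched below. The `check` helper and the shortened sample texts are assumptions made for illustration and are not part of the commit. The patterns shown are alternatives to the ones in the solution notebook, not the author's own.

```python
import re


def check(pattern, text, expected):
    """Hypothetical helper: True when re.findall(pattern, text) equals expected."""
    return re.findall(pattern, text) == expected


# Shortened stand-ins for the notebook texts (assumed, not the full originals).
handles_text = "Follow https://twitter.com/elonmusk, and\nhttps://twitter.com/dummy_2_tesla\n"
risk_text = (
    "Concentration of Risk: Credit Risk\n"
    "Details about credit risk.\n"
    "Concentration of Risk: Supply Risk\n"
    "Details about supply risk.\n"
)

# (?a) restricts \w to ASCII, which makes it equivalent to [a-zA-Z0-9_].
handles_ok = check(r"(?a)https://twitter\.com/(\w+)", handles_text,
                   ["elonmusk", "dummy_2_tesla"])

# "." does not match a newline by default, so (.*) stops at the end of the line.
risks_ok = check(r"Concentration of Risk: (.*)", risk_text,
                 ["Credit Risk", "Supply Risk"])

print(handles_ok, risks_ok)  # True True
```

Both patterns are written as raw strings so that backslash sequences such as `\.` and `\w` reach the regex engine unchanged.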
