
Commit dad6139

Update README.md with some content
1 parent 1f2aee4 commit dad6139

File tree

1 file changed

+232
-31
lines changed

README.md

+232-31
@@ -1,11 +1,11 @@
1-
# Learn Python for Data Analysis
1+
<h1 align="center"> Learn Python for Data Analysis </h1>
22

33
I am building this repository for study purposes. I am on the journey to become a Data Analyst, and I want to share what I have learned along the way.
44

55
## Prerequisites
66
1. Python 3.x version
7-
2. Git is installed and you know basics of [git commands.](git/git-basic-commands.md)
8-
3. Have access and know [how to use terminal/command line.](cms/cms-basic-commands.md)
7+
2. Git is installed and you know the basics of [git commands](https://git-scm.com/book/en/v2/Git-Basics-Getting-a-Git-Repository).
8+
3. You have access to a terminal/command line and know [how to use it](https://learn.microsoft.com/en-us/windows-server/administration/windows-commands/windows-commands).
99

1010
## About the Content
1111

@@ -17,7 +17,7 @@ Note 2: At the time of viewing this, I may still be in the process of developing
1717

1818
### Python Introduction Level
1919

20-
1. [Why you should learn programming?](intro/why-should-you-learn-programming.md)
20+
1. Why should you learn programming?
2121
2. Why Python?
2222
3. Installing Python and setting up the environment.
2323
4. Baby steps with Python.
@@ -54,63 +54,264 @@ Note 2: At the time of viewing this, I may still be in the process of developing
5454
9. Transforming and Encoding Variables.
5555
10. Math and Statistics for Descriptive Analysis.
5656
11. Web Scraping and Data Analysis.
57-
12. Attribute Enginneering.
58-
13. Data Prepossing.
57+
12. Attribute Engineering.
58+
13. Data Preprocessing.
5959
14. Data Visualization Design.
6060

6161
### Libraries
6262

6363
Note: The purpose of this section is not to cover all Python libraries - this is not even possible. The main goal of this repository section is to delve deeper into libraries that I may be using during my studies and popular ones that I may use in the future.
6464

65-
1. [Pandas](libraries/pandas.md)
65+
#### 1. Pandas
6666

67-
Pandas is an open source library, providing high-performance, easy-to-use high level data structures and many data analysis tools for the Python programming language.
67+
About:
6868

69-
2. Matplotlib
69+
- Pandas is a Python library used for data analysis and manipulation. It provides high-level data structures for efficiently storing large datasets, along with tools for working with them. [Official documentation.](https://pandas.pydata.org/docs/index.html)
7070

71-
Matplotlib is a comprehensive library for creating static, animated, and interactive visualizations in Python. Matplotlib makes easy things easy and hard things possible.
71+
Simple example using Pandas to create a DataFrame and perform some basic operations:
7272

73-
3. NumPy
73+
```python
7474

75-
NumPy is the fundamental package for scientific computing with Python. Brings the computational power of languages like C and Fortran to Python. Forms the basis of powerful machine learning libraries Lies at the core of a rich ecosystem of data science libraries. NumPy’s accelerated processing of large arrays allows researchers to visualize datasets far larger than native Python could handle. NumPy's API is the starting point when libraries are written to exploit innovative hardware, create specialized array types, or add capabilities beyond what NumPy provides.
75+
import pandas as pd
7676

77-
4. Seaborn
77+
# Creating a DataFrame from a dictionary
7878

79-
Seaborn is a Python data visualization library based on matplotlib. It provides a high-level interface for drawing attractive and informative statistical graphics.
79+
data = {'Name': ['John', 'Jane', 'Jim', 'Joan'],
80+
'Age': [29, 31, 27, 35],
81+
'Country': ['USA', 'UK', 'Canada', 'Australia']}
82+
df = pd.DataFrame(data)
8083

81-
5. Scikit-Learn
84+
# Display the DataFrame
85+
print(df)
8286

83-
Simple and efficient tools for predictive data analysis. Accessible to everybody, and reusable in various contexts. Built on NumPy, SciPy, and matplotlib.
87+
# Selecting rows based on condition
88+
df = df[df['Age'] > 30]
8489

85-
6. TensorFlow
90+
# Sorting the DataFrame
91+
df = df.sort_values(by='Age', ascending=False)
8692

87-
TensorFlow is an open source software library for high performance numerical computation. Its flexible architecture allows easy deployment of computation across a variety of platforms (CPUs, GPUs, TPUs), and from desktops to clusters of servers to mobile and edge devices. TensorFlow is an open-source and popular deep learning library developed by Google.
93+
# Display the DataFrame after sorting
94+
print(df)
8895

89-
Deep learning is a subfield of machine learning, a set of algorithms, and is inspired by the structure and function of the human brain.
96+
```
9097

91-
7. BeautifulSoup
98+
Code Anatomy:
9299

93-
Beautiful Soup is a library that makes it easy to scrape information from web pages. It sits atop an HTML or XML parser, providing Pythonic idioms for iterating, searching, and modifying the parse tree.
100+
- `import pandas as pd`: Imports the Pandas library under the alias "pd".
101+
- `data = {'Name': ['John', 'Jane', 'Jim', 'Joan'],'Age': [29, 31, 27, 35], 'Country': ['USA', 'UK', 'Canada', 'Australia']}`: Creates a dictionary of data with three columns - Name, Age, and Country.
102+
- `df = pd.DataFrame(data)`: Creates a Pandas DataFrame from the dictionary.
103+
- `print(df)`: Prints the DataFrame.
104+
- `df = df[df['Age'] > 30]`: Selects rows from the DataFrame where the age is greater than 30.
105+
- `df = df.sort_values(by='Age', ascending=False)`: Sorts the DataFrame in descending order of age.
106+
- `print(df)`: Prints the DataFrame after sorting.
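
The steps above overwrite `df` at each stage, so the original rows are lost after filtering. As a minimal sketch (not part of the original README), the same filter and sort can be chained into a new variable so that `df` stays intact. The name `over_30` and the `reset_index` call are assumptions added for illustration.

```python
import pandas as pd

data = {
    "Name": ["John", "Jane", "Jim", "Joan"],
    "Age": [29, 31, 27, 35],
    "Country": ["USA", "UK", "Canada", "Australia"],
}
df = pd.DataFrame(data)

# Filter and sort in one chained expression; df itself is left unchanged.
# `over_30` is a hypothetical name used only for this sketch.
over_30 = (
    df[df["Age"] > 30]
    .sort_values(by="Age", ascending=False)
    .reset_index(drop=True)  # renumber the rows 0..n-1 after filtering
)

print(over_30)
print(len(df))  # the original DataFrame still has all four rows
```

Keeping the intermediate result under its own name makes it easier to compare the filtered view against the full data later in a notebook or script.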
94107

95-
8. Selenium
108+
#### 2. Matplotlib
96109

97-
For testers across the world, Selenium is the first choice for executing automated tests. Selenium is an open source automation testing tool that supports a number of scripting languages like Python, C#, Java, Perl, Ruby, JavaScript, etc. depending on the application to be tested, one can choose the script accordingly
110+
About:
98111

99-
9. StatsModels
112+
- Matplotlib is a Python library used for data visualization. It helps you create various types of static, animated, and interactive visualizations. [Official documentation.](https://matplotlib.org/)
100113

101-
Statsmodels is a Python module that provides classes and functions for the estimation of many different statistical models, as well as for conducting statistical tests, and statistical data exploration. An extensive list of result statistics are available for each estimator. The results are tested against existing statistical packages to ensure that they are correct.
114+
Example using Matplotlib to create a line plot:
116+
```python
117+
118+
import matplotlib.pyplot as plt
102119

103-
10. SciPy
120+
# Data to plot
121+
x = [1, 2, 3, 4, 5]
122+
y = [2, 4, 6, 8, 10]
104123

105-
SciPy is a free and open-source Python library used for scientific computing and technical computing. It is a collection of mathematical algorithms and convenience functions built on the NumPy extension of Python
124+
# Creating the plot
125+
plt.plot(x, y)
106126

107-
11. PyTorch
127+
# Adding labels to the plot
128+
plt.xlabel('X axis')
129+
plt.ylabel('Y axis')
108130

109-
An open source machine learning framework that accelerates the path from research prototyping to production deployment.
131+
# Adding title to the plot
132+
plt.title('Line Plot Example')
110133

111-
12. Keras
134+
# Showing the plot
135+
plt.show()
136+
```
112137

113-
Keras is an API designed for human beings, not machines. Keras follows best practices for reducing cognitive load: it offers consistent & simple APIs, it minimizes the number of user actions required for common use cases, and it provides clear & actionable error messages. It also has extensive documentation and developer guides.
138+
Code Anatomy:
139+
140+
- `import matplotlib.pyplot as plt`: Imports Matplotlib's pyplot module under the alias "plt" for ease of use.
141+
- `x = [1, 2, 3, 4, 5]`: Creates a list of values for the x-axis.
142+
- `y = [2, 4, 6, 8, 10]`: Creates a list of values for the y-axis.
143+
- `plt.plot(x, y)`: Creates a line plot using the x and y values.
144+
- `plt.xlabel('X axis')`: Adds a label to the x-axis.
145+
- `plt.ylabel('Y axis')`: Adds a label to the y-axis.
146+
- `plt.title('Line Plot Example')`: Adds a title to the plot.
147+
- `plt.show()`: Displays the plot.
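
`plt.show()` needs a display, which is not always available (for example on a remote server). As a hedged sketch, the same line plot can be written to an image file instead, using Matplotlib's object-oriented interface. The `Agg` backend choice and the file name `line_plot.png` are assumptions made for this illustration, not something the README prescribes.

```python
from pathlib import Path

import matplotlib

matplotlib.use("Agg")  # non-interactive backend, so no window is required
import matplotlib.pyplot as plt

# Same data as the example above
x = [1, 2, 3, 4, 5]
y = [2, 4, 6, 8, 10]

# Create a figure and a single set of axes, then draw on the axes
fig, ax = plt.subplots()
ax.plot(x, y)
ax.set_xlabel("X axis")
ax.set_ylabel("Y axis")
ax.set_title("Line Plot Example")

# Save the figure to disk instead of showing it (hypothetical file name)
output_path = Path("line_plot.png")
fig.savefig(output_path)
plt.close(fig)  # release the figure's memory
```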
148+
149+
#### 3. NumPy
150+
151+
About:
152+
153+
- NumPy is a Python library used for numerical computing and data manipulation. It provides a high-performance multidimensional array object and tools for working with these arrays. [Official documentation.](https://numpy.org/)
154+
155+
Example with NumPy to create arrays:
156+
157+
```python
158+
159+
import numpy as np
160+
161+
# Creating a 1D array
162+
a = np.array([1, 2, 3, 4, 5])
163+
164+
# Creating a 2D array
165+
b = np.array([[1, 2, 3], [4, 5, 6]])
166+
167+
# Finding the shape of an array
168+
print(a.shape)
169+
print(b.shape)
170+
171+
# Reshaping an array
172+
b = b.reshape(3, 2)
173+
174+
# Finding the shape of the reshaped array
175+
print(b.shape)
176+
177+
# Finding the sum of all elements in an array
178+
print(a.sum())
179+
180+
# Finding the mean of all elements in an array
181+
print(a.mean())
182+
183+
```
184+
185+
Code Anatomy:
186+
187+
- `import numpy as np`: Imports the NumPy library and renames it as "np" for ease of use.
188+
- `a = np.array([1, 2, 3, 4, 5])`: Creates a one-dimensional NumPy array with values [1, 2, 3, 4, 5].
189+
- `b = np.array([[1, 2, 3], [4, 5, 6]])`: Creates a two-dimensional NumPy array with values [[1, 2, 3], [4, 5, 6]].
190+
- `print(a.shape)`: Prints the shape of the a array, which is (5,).
191+
- `print(b.shape)`: Prints the shape of the b array, which is (2, 3).
192+
- `b = b.reshape(3, 2)`: Reshapes the b array from shape (2, 3) to shape (3, 2).
193+
- `print(b.shape)`: Prints the shape of the reshaped b array, which is (3, 2).
194+
- `print(a.sum())`: Prints the sum of all elements in the a array, which is 15.
195+
- `print(a.mean())`: Prints the mean of all elements in the a array, which is 3.0.
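
To make the reshape step more concrete, here is a small sketch (an addition, not part of the original README) showing that `reshape` keeps the elements in the same row-major order, and that `sum` can also work along one axis at a time. The name `reshaped` is hypothetical.

```python
import numpy as np

b = np.array([[1, 2, 3], [4, 5, 6]])

# Reshaping keeps the elements in the same (row-major) order
reshaped = b.reshape(3, 2)
print(reshaped)
# [[1 2]
#  [3 4]
#  [5 6]]

# axis=0 sums down each column, axis=1 sums across each row
print(b.sum(axis=0))  # [5 7 9]
print(b.sum(axis=1))  # [ 6 15]

# The mean of all six elements
print(b.mean())  # 3.5
```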
196+
197+
#### 4. Seaborn
198+
199+
About:
200+
201+
- Seaborn is a Python library used for statistical data visualization. It provides a high-level interface for creating beautiful and informative statistical graphics. Seaborn is built on top of Matplotlib and provides a simpler, more convenient way to create plots. [Official documentation.](https://seaborn.pydata.org/)
202+
203+
Example using Seaborn to create a bar plot:
204+
205+
```python
206+
207+
import seaborn as sns
208+
import matplotlib.pyplot as plt
209+
210+
# Creating a sample data set
211+
data = {"A": [1, 2, 3, 4, 5], "B": [2, 4, 6, 8, 10]}
212+
213+
# Creating a bar plot using Seaborn
214+
sns.barplot(data=data, x="A", y="B")
215+
216+
# Adding labels to the plot
217+
plt.xlabel('X axis')
218+
plt.ylabel('Y axis')
219+
220+
# Adding title to the plot
221+
plt.title('Bar Plot Example')
222+
223+
# Showing the plot
224+
plt.show()
225+
226+
```
227+
228+
Code Anatomy:
229+
230+
- `import seaborn as sns`: Imports the Seaborn library and renames it as "sns" for ease of use.
231+
- `import matplotlib.pyplot as plt`: Imports Matplotlib's pyplot module under the alias "plt" for ease of use.
232+
- `data = {"A": [1, 2, 3, 4, 5], "B": [2, 4, 6, 8, 10]}`: Creates a sample data set with two columns "A" and "B".
233+
- `sns.barplot(data=data, x="A", y="B")`: Creates a bar plot using the Seaborn library. The data argument specifies the data set to use, and the x and y arguments specify the columns to use for the x-axis and y-axis, respectively.
234+
- `plt.xlabel('X axis')`: Adds a label to the x-axis.
235+
- `plt.ylabel('Y axis')`: Adds a label to the y-axis.
236+
- `plt.title('Bar Plot Example')`: Adds a title to the plot.
237+
- `plt.show()`: Displays the plot.
238+
239+
#### 5. BeautifulSoup
240+
241+
About:
242+
243+
- BeautifulSoup is a Python library used for web scraping and for parsing HTML and XML documents. It provides a simple way to extract data from these documents and navigate their tree-like structure.
244+
245+
Example using BeautifulSoup to scrape the title of the top news articles from the BBC News website:
246+
247+
```python
248+
249+
import requests
250+
from bs4 import BeautifulSoup
251+
252+
# Sending a GET request to the webpage
253+
url = 'https://www.bbc.com/news'
254+
response = requests.get(url)
255+
256+
# Checking if the request was successful
257+
if response.status_code == 200:
258+
    # Parsing the HTML content of the page
259+
    soup = BeautifulSoup(response.text, 'html.parser')
260+
261+
    # Finding all the article tags
262+
    article_tags = soup.find_all('h3', {'class': 'gs-c-promo-heading__title'})
263+
264+
    # Extracting the text from each article tag
265+
    articles = [article_tag.text.strip() for article_tag in article_tags]
266+
267+
    # Printing the titles of the articles
268+
    print(articles)
269+
else:
270+
    print('Failed to retrieve the page')
271+
272+
```
273+
274+
Code Anatomy:
275+
276+
- `import requests`: Imports the requests library, which is used to send HTTP requests.
277+
- `from bs4 import BeautifulSoup`: Imports the BeautifulSoup class from the bs4 (Beautiful Soup) package.
278+
- `url = 'https://www.bbc.com/news'`: Specifies the URL of the BBC News homepage.
279+
- `response = requests.get(url)`: Sends a GET request to the specified URL and stores the response in the "response" variable.
280+
- `if response.status_code == 200:`: Checks if the request was successful by checking the status code of the response. A status code of 200 means that the request was successful.
281+
- `soup = BeautifulSoup(response.text, 'html.parser')`: Parses the HTML content of the page and creates a Beautiful Soup object. The response.text argument specifies the HTML content to parse, and the 'html.parser' argument specifies the HTML parser to use.
282+
- `article_tags = soup.find_all('h3', {'class': 'gs-c-promo-heading__title'})`: Searches for all the "h3" tags with the class gs-c-promo-heading__title, which correspond to the titles of the articles.
283+
- `articles = [article_tag.text.strip() for article_tag in article_tags]`: Extracts the text from each article tag and creates a list of article titles. The strip() method is used to remove any leading or trailing whitespace from the text.
284+
- `print(articles)`: Prints the list of article titles.
285+
- `else:`: Executed if the request was not successful.
286+
- `print('Failed to retrieve the page')`: Prints an error message.
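
The class name used above depends on the current markup of the BBC site, which may change over time. As a hedged sketch, the same `find_all` and text-extraction steps can be tried offline on an inline HTML string. The HTML snippet and the `headline` class below are made up for illustration and do not come from any real page.

```python
from bs4 import BeautifulSoup

# Hypothetical HTML used only for this sketch (not real BBC markup)
html = """
<html><body>
  <h3 class="headline"> First story </h3>
  <h3 class="headline">Second story</h3>
  <h3 class="other">Not a headline</h3>
</body></html>
"""

# Parse the string with Python's built-in HTML parser
soup = BeautifulSoup(html, "html.parser")

# Find only the <h3> tags that carry the "headline" class
tags = soup.find_all("h3", class_="headline")

# get_text(strip=True) returns the tag text without surrounding whitespace
titles = [tag.get_text(strip=True) for tag in tags]
print(titles)
```

Testing the parsing logic on a fixed string like this separates it from network problems, so a failed request and a changed page layout are easier to tell apart.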
287+
288+
#### 6. Scikit-Learn
289+
290+
Under development.
291+
292+
#### 7. TensorFlow
293+
294+
Under development.
295+
296+
#### 8. Selenium
297+
298+
Under development.
299+
300+
#### 9. StatsModels
301+
302+
Under development.
303+
304+
#### 10. SciPy
305+
306+
Under development.
307+
308+
#### 11. PyTorch
309+
310+
Under development.
311+
312+
#### 12. Keras
313+
314+
Under development.
114315

115316

116317
## Contact me 🔗 👇
