Commit e017ee2 (1 parent: a98328c)

update readme and readibility of the code

File tree: 5 files changed, +29 −40 lines

- README.md (+2 −29)
- scrapegraphai/utils/cleanup_code.py (+3)
- scrapegraphai/utils/cleanup_html.py (+1 −1)
- scrapegraphai/utils/code_error_analysis.py (+12 −5)
- scrapegraphai/utils/code_error_correction.py (+11 −5)

README.md (+2 −29)

````diff
@@ -113,7 +113,6 @@ The output will be a dictionary like the following:
     "contact_email": "contact@scrapegraphai.com"
 }
 ```
-
 There are other pipelines that can be used to extract information from multiple pages, generate Python scripts, or even generate audio files.
 
 | Pipeline Name | Description |
@@ -125,6 +124,8 @@ There are other pipelines that can be used to extract information from multiple
 | SmartScraperMultiGraph | Multi-page scraper that extracts information from multiple pages given a single prompt and a list of sources. |
 | ScriptCreatorMultiGraph | Multi-page scraper that generates a Python script for extracting information from multiple pages and sources. |
 
+For each of these graphs there is the multi version. It allows to make calls of the LLM in parallel.
+
 It is possible to use different LLM through APIs, such as **OpenAI**, **Groq**, **Azure** and **Gemini**, or local models using **Ollama**.
 
 Remember to have [Ollama](https://ollama.com/) installed and download the models using the **ollama pull** command, if you want to use local models.
@@ -167,34 +168,6 @@ Please see the [contributing guidelines](https://github.com/VinciGit00/Scrapegra
 [![My Skills](https://skillicons.dev/icons?i=linkedin)](https://www.linkedin.com/company/scrapegraphai/)
 [![My Skills](https://skillicons.dev/icons?i=twitter)](https://twitter.com/scrapegraphai)
 
-## 🗺️ Roadmap
-
-We are working on the following features! If you are interested in collaborating right-click on the feature and open in a new tab to file a PR. If you have doubts and wanna discuss them with us, just contact us on [discord](https://discord.gg/uJN7TYcpNa) or open a [Discussion](https://github.com/VinciGit00/Scrapegraph-ai/discussions) here on Github!
-
-```mermaid
-%%{init: {'theme': 'base', 'themeVariables': { 'primaryColor': '#5C4B9B', 'edgeLabelBackground':'#ffffff', 'tertiaryColor': '#ffffff', 'primaryBorderColor': '#5C4B9B', 'fontFamily': 'Arial', 'fontSize': '16px', 'textColor': '#5C4B9B' }}}%%
-graph LR
-A[DeepSearch Graph] --> F[Use Existing Chromium Instances]
-F --> B[Page Caching]
-B --> C[Screenshot Scraping]
-C --> D[Handle Dynamic Content]
-D --> E[New Webdrivers]
-
-style A fill:#ffffff,stroke:#5C4B9B,stroke-width:2px,rx:10,ry:10
-style F fill:#ffffff,stroke:#5C4B9B,stroke-width:2px,rx:10,ry:10
-style B fill:#ffffff,stroke:#5C4B9B,stroke-width:2px,rx:10,ry:10
-style C fill:#ffffff,stroke:#5C4B9B,stroke-width:2px,rx:10,ry:10
-style D fill:#ffffff,stroke:#5C4B9B,stroke-width:2px,rx:10,ry:10
-style E fill:#ffffff,stroke:#5C4B9B,stroke-width:2px,rx:10,ry:10
-
-click A href "https://github.com/VinciGit00/Scrapegraph-ai/issues/260" "Open DeepSearch Graph Issue"
-click F href "https://github.com/VinciGit00/Scrapegraph-ai/issues/329" "Open Chromium Instances Issue"
-click B href "https://github.com/VinciGit00/Scrapegraph-ai/issues/197" "Open Page Caching Issue"
-click C href "https://github.com/VinciGit00/Scrapegraph-ai/issues/197" "Open Screenshot Scraping Issue"
-click D href "https://github.com/VinciGit00/Scrapegraph-ai/issues/279" "Open Handle Dynamic Content Issue"
-click E href "https://github.com/VinciGit00/Scrapegraph-ai/issues/171" "Open New Webdrivers Issue"
-```
-
 ## 📈 Telemetry
 We collect anonymous usage metrics to enhance our package's quality and user experience. The data helps us prioritize improvements and ensure compatibility. If you wish to opt-out, set the environment variable SCRAPEGRAPHAI_TELEMETRY_ENABLED=false. For more information, please refer to the documentation [here](https://scrapegraph-ai.readthedocs.io/en/latest/scrapers/telemetry.html).
````
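The added README sentence says the "multi" graph variants fan a single prompt out over several sources by making LLM calls in parallel. A minimal sketch of that fan-out idea in plain asyncio — `fake_llm_call` and the example URLs are stand-ins for illustration, not ScrapeGraphAI APIs:

```python
import asyncio

async def fake_llm_call(source: str) -> str:
    # Stand-in for a real LLM request; sleep simulates network latency.
    await asyncio.sleep(0.01)
    return f"summary of {source}"

async def scrape_many(sources: list[str]) -> list[str]:
    # Fan one prompt out over many sources concurrently; gather
    # preserves the input order in its result list.
    return await asyncio.gather(*(fake_llm_call(s) for s in sources))

results = asyncio.run(scrape_many(["https://a.example", "https://b.example"]))
print(results)  # → ['summary of https://a.example', 'summary of https://b.example']
```

Because the calls overlap instead of running back to back, total latency is roughly that of the slowest single call rather than the sum.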

scrapegraphai/utils/cleanup_code.py (+3)

````diff
@@ -4,6 +4,9 @@
 import re
 
 def extract_code(code: str) -> str:
+    """
+    Module for extracting code
+    """
     pattern = r'```(?:python)?\n(.*?)```'
 
     match = re.search(pattern, code, re.DOTALL)
````
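The regex in the hunk above pulls the body out of a Markdown code fence, with or without a `python` language tag. A self-contained sketch of that extraction; the no-match fallback of returning the input unchanged is my assumption, since the diff does not show the function's return path:

```python
import re

def extract_code(code: str) -> str:
    # Capture the body of a fenced block; re.DOTALL lets `.` span newlines,
    # and the lazy `(.*?)` stops at the first closing fence.
    pattern = r'```(?:python)?\n(.*?)```'
    match = re.search(pattern, code, re.DOTALL)
    # Assumed fallback: pass the text through untouched if no fence is found.
    return match.group(1) if match else code

reply = "Here you go:\n```python\nprint('hi')\n```\nDone."
print(extract_code(reply))  # → print('hi')
```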

scrapegraphai/utils/cleanup_html.py (+1 −1)

````diff
@@ -101,7 +101,7 @@ def reduce_html(html, reduction):
         for attr in list(tag.attrs):
             if attr not in attrs_to_keep:
                 del tag[attr]
-
+
     if reduction == 1:
         return minify_html(str(soup))
````

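The hunk above only normalizes a whitespace line, but the surrounding loop is worth noting: `reduce_html` walks each tag and deletes every attribute not on an allow-list, iterating over a `list(...)` snapshot so deleting keys during iteration is safe. A minimal sketch of that loop; the `Tag` class is a tiny stand-in for a BeautifulSoup tag, and the allow-list contents are assumptions for illustration:

```python
class Tag:
    """Tiny stand-in for a BeautifulSoup tag: just a dict of attributes."""
    def __init__(self, attrs):
        self.attrs = dict(attrs)
    def __delitem__(self, key):
        del self.attrs[key]

attrs_to_keep = {"class", "id", "href"}  # assumed allow-list

tag = Tag({"href": "/docs", "style": "color:red", "onclick": "x()"})
# Same shape as reduce_html: iterate over a snapshot of the keys so
# deleting entries does not invalidate the iteration.
for attr in list(tag.attrs):
    if attr not in attrs_to_keep:
        del tag[attr]
print(tag.attrs)  # → {'href': '/docs'}
```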
scrapegraphai/utils/code_error_analysis.py (+12 −5)

````diff
@@ -2,24 +2,27 @@
 This module contains the functions that are used to generate the prompts for the code error analysis.
 """
 from typing import Any, Dict
+import json
 from langchain.prompts import PromptTemplate
 from langchain_core.output_parsers import StrOutputParser
-import json
 from ..prompts import (
     TEMPLATE_SYNTAX_ANALYSIS, TEMPLATE_EXECUTION_ANALYSIS,
     TEMPLATE_VALIDATION_ANALYSIS, TEMPLATE_SEMANTIC_ANALYSIS
 )
 
 def syntax_focused_analysis(state: dict, llm_model) -> str:
-    prompt = PromptTemplate(template=TEMPLATE_SYNTAX_ANALYSIS, input_variables=["generated_code", "errors"])
+    prompt = PromptTemplate(template=TEMPLATE_SYNTAX_ANALYSIS,
+                            input_variables=["generated_code", "errors"])
     chain = prompt | llm_model | StrOutputParser()
     return chain.invoke({
         "generated_code": state["generated_code"],
         "errors": state["errors"]["syntax"]
     })
 
 def execution_focused_analysis(state: dict, llm_model) -> str:
-    prompt = PromptTemplate(template=TEMPLATE_EXECUTION_ANALYSIS, input_variables=["generated_code", "errors", "html_code", "html_analysis"])
+    prompt = PromptTemplate(template=TEMPLATE_EXECUTION_ANALYSIS,
+                            input_variables=["generated_code", "errors",
+                                             "html_code", "html_analysis"])
     chain = prompt | llm_model | StrOutputParser()
     return chain.invoke({
         "generated_code": state["generated_code"],
@@ -29,7 +32,9 @@ def execution_focused_analysis(state: dict, llm_model) -> str:
     })
 
 def validation_focused_analysis(state: dict, llm_model) -> str:
-    prompt = PromptTemplate(template=TEMPLATE_VALIDATION_ANALYSIS, input_variables=["generated_code", "errors", "json_schema", "execution_result"])
+    prompt = PromptTemplate(template=TEMPLATE_VALIDATION_ANALYSIS,
+                            input_variables=["generated_code", "errors",
+                                             "json_schema", "execution_result"])
     chain = prompt | llm_model | StrOutputParser()
     return chain.invoke({
         "generated_code": state["generated_code"],
@@ -39,7 +44,9 @@ def validation_focused_analysis(state: dict, llm_model) -> str:
     })
 
 def semantic_focused_analysis(state: dict, comparison_result: Dict[str, Any], llm_model) -> str:
-    prompt = PromptTemplate(template=TEMPLATE_SEMANTIC_ANALYSIS, input_variables=["generated_code", "differences", "explanation"])
+    prompt = PromptTemplate(template=TEMPLATE_SEMANTIC_ANALYSIS,
+                            input_variables=["generated_code",
+                                             "differences", "explanation"])
     chain = prompt | llm_model | StrOutputParser()
     return chain.invoke({
         "generated_code": state["generated_code"],
````

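Every function in this diff follows the same LCEL-style shape: build a `PromptTemplate`, pipe it through the model and a `StrOutputParser` with `|`, then `.invoke()` with a dict keyed by the `input_variables`. A dependency-free sketch of that pipe pattern; `Prompt`, `Chain`, and `fake_llm` are illustrative stand-ins, not LangChain APIs:

```python
class Prompt:
    """Stand-in for PromptTemplate: fills named slots in a template string."""
    def __init__(self, template, input_variables):
        self.template = template
        self.input_variables = input_variables
    def __or__(self, nxt):
        # The | operator builds a chain, mimicking the LCEL pipe syntax.
        return Chain([self, nxt])
    def run(self, inputs):
        return self.template.format(**{k: inputs[k] for k in self.input_variables})

class Chain:
    def __init__(self, steps):
        self.steps = steps
    def __or__(self, nxt):
        return Chain(self.steps + [nxt])
    def invoke(self, inputs):
        # Feed each step's output into the next, like chain.invoke() above.
        out = inputs
        for step in self.steps:
            out = step.run(out) if hasattr(step, "run") else step(out)
        return out

TEMPLATE = "Analyze these syntax errors:\n{errors}\nin:\n{generated_code}"
fake_llm = lambda text: f"LLM says: {text.splitlines()[0]}"   # echoes first line
strip_parser = lambda text: text.strip()                      # StrOutputParser stand-in

chain = Prompt(TEMPLATE, ["generated_code", "errors"]) | fake_llm | strip_parser
print(chain.invoke({"generated_code": "print(", "errors": "unexpected EOF"}))
```

The design point is that the prompt, model, and parser stay independently swappable: each of the four analysis functions only changes the template and the variable names, never the chain shape.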
scrapegraphai/utils/code_error_correction.py (+11 −5)

````diff
@@ -10,32 +10,38 @@
 )
 
 def syntax_focused_code_generation(state: dict, analysis: str, llm_model) -> str:
-    prompt = PromptTemplate(template=TEMPLATE_SYNTAX_CODE_GENERATION, input_variables=["analysis", "generated_code"])
+    prompt = PromptTemplate(template=TEMPLATE_SYNTAX_CODE_GENERATION,
+                            input_variables=["analysis", "generated_code"])
     chain = prompt | llm_model | StrOutputParser()
     return chain.invoke({
         "analysis": analysis,
         "generated_code": state["generated_code"]
     })
 
 def execution_focused_code_generation(state: dict, analysis: str, llm_model) -> str:
-    prompt = PromptTemplate(template=TEMPLATE_EXECUTION_CODE_GENERATION, input_variables=["analysis", "generated_code"])
+    prompt = PromptTemplate(template=TEMPLATE_EXECUTION_CODE_GENERATION,
+                            input_variables=["analysis", "generated_code"])
     chain = prompt | llm_model | StrOutputParser()
     return chain.invoke({
         "analysis": analysis,
         "generated_code": state["generated_code"]
     })
 
 def validation_focused_code_generation(state: dict, analysis: str, llm_model) -> str:
-    prompt = PromptTemplate(template=TEMPLATE_VALIDATION_CODE_GENERATION, input_variables=["analysis", "generated_code", "json_schema"])
+    prompt = PromptTemplate(template=TEMPLATE_VALIDATION_CODE_GENERATION,
+                            input_variables=["analysis", "generated_code",
+                                             "json_schema"])
     chain = prompt | llm_model | StrOutputParser()
     return chain.invoke({
         "analysis": analysis,
         "generated_code": state["generated_code"],
         "json_schema": state["json_schema"]
     })
-
+
 def semantic_focused_code_generation(state: dict, analysis: str, llm_model) -> str:
-    prompt = PromptTemplate(template=TEMPLATE_SEMANTIC_CODE_GENERATION, input_variables=["analysis", "generated_code", "generated_result", "reference_result"])
+    prompt = PromptTemplate(template=TEMPLATE_SEMANTIC_CODE_GENERATION,
+                            input_variables=["analysis", "generated_code",
+                                             "generated_result", "reference_result"])
     chain = prompt | llm_model | StrOutputParser()
     return chain.invoke({
         "analysis": analysis,
````

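These `*_code_generation` helpers are the second half of an analyze-then-fix loop: each one consumes the text produced by the matching `*_analysis` function in `code_error_analysis.py` and asks the LLM for corrected code. A toy sketch of that two-step pairing with the LLM calls stubbed out; the stub names and the hard-coded fix are illustrative only:

```python
def focused_analysis(state: dict) -> str:
    # Stub for an *_analysis helper: an LLM would describe what is wrong.
    return f"missing closing parenthesis in: {state['generated_code']}"

def focused_code_generation(state: dict, analysis: str) -> str:
    # Stub for a *_code_generation helper: an LLM would emit fixed code
    # conditioned on the analysis text.
    if "missing closing parenthesis" in analysis:
        return state["generated_code"] + ")"
    return state["generated_code"]

state = {"generated_code": "print('hi'"}
analysis = focused_analysis(state)
state["generated_code"] = focused_code_generation(state, analysis)
print(state["generated_code"])  # → print('hi')
```

Splitting diagnosis and repair into separate prompts keeps each prompt focused, at the cost of an extra LLM round-trip per correction attempt.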