
Commit 11ae717 ("add new doc"), 1 parent ec9ef2b

File tree: 3 files changed (+71 −1 lines)


docs/source/introduction/overview.rst (+40 −1)
@@ -22,6 +22,45 @@ This flexibility ensures that scrapers remain functional even when website layou
 We support many LLMs including **GPT, Gemini, Groq, Azure, Hugging Face** etc.
 as well as local models which can run on your machine using **Ollama**.

+AI Models and Token Limits
+==========================
+
+ScrapeGraphAI supports a wide range of AI models from various providers. Each model has a specific token limit, which is important to consider when designing your scraping pipelines. Here is an overview of the supported models and their token limits:
+
+OpenAI Models
+-------------
+- GPT-3.5 Turbo (16,385 tokens)
+- GPT-4 (8,192 tokens)
+- GPT-4 Turbo Preview (128,000 tokens)
+
+Azure OpenAI Models
+-------------------
+- GPT-3.5 Turbo (16,385 tokens)
+- GPT-4 (8,192 tokens)
+- GPT-4 Turbo Preview (128,000 tokens)
+
+Google AI Models
+----------------
+- Gemini Pro (128,000 tokens)
+- Gemini 1.5 Pro (128,000 tokens)
+
+Anthropic Models
+----------------
+- Claude Instant (100,000 tokens)
+- Claude 2 (200,000 tokens)
+- Claude 3 (200,000 tokens)
+
+Mistral AI Models
+-----------------
+- Mistral Large (128,000 tokens)
+- Open Mistral 7B (32,000 tokens)
+- Open Mixtral 8x7B (32,000 tokens)
+
+For a complete list of supported models and their token limits, please refer to the API documentation.
+
+Understanding token limits is crucial for optimizing your scraping tasks. Larger token limits allow for processing more text in a single API call, which can be beneficial for scraping lengthy web pages or documents.
+
 Library Diagram
 ===============

@@ -95,4 +134,4 @@ Sponsors
 .. image:: ../../assets/transparent_stat.png
    :width: 15%
    :alt: Stat Proxies
-   :target: https://dashboard.statproxies.com/?refferal=scrapegraph
+   :target: https://dashboard.statproxies.com/?refferal=scrapegraph
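The token limits listed in the overview above can be put to work when sizing a scraping job. A minimal sketch, not part of the library: the per-model limits are copied from the overview, and the 4-characters-per-token estimate is a rough heuristic (a real tokenizer such as the model provider's own should be used for exact counts):

```python
# Illustrative sketch: decide whether a scraped page fits a model's
# context window before sending it to the LLM. Token limits below are
# taken from the overview section; the chars/4 estimate is a rough
# heuristic, not a real tokenizer.

TOKEN_LIMITS = {
    "gpt-3.5-turbo": 16_385,
    "gpt-4": 8_192,
    "gpt-4-turbo-preview": 128_000,
}

def estimate_tokens(text: str) -> int:
    """Very rough estimate: ~4 characters per token for English text."""
    return max(1, len(text) // 4)

def fits_in_context(text: str, model: str, reserved_for_output: int = 1_000) -> bool:
    """True if the text plus an output budget fits the model's window."""
    return estimate_tokens(text) + reserved_for_output <= TOKEN_LIMITS[model]

page = "word " * 10_000  # ~50,000 characters of scraped text (~12,500 tokens)
print(fits_in_context(page, "gpt-4"))                # False: 8,192-token window
print(fits_in_context(page, "gpt-4-turbo-preview"))  # True: 128,000-token window
```

Reserving part of the window for the model's output matters in practice: a page that technically fits the window leaves no room for the completion if the input alone consumes the full limit.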

docs/source/modules/modules.rst (+3)

@@ -5,3 +5,6 @@ scrapegraphai
    :maxdepth: 4

    scrapegraphai
+
+   scrapegraphai.helpers.models_tokens
+
@@ -0,0 +1,28 @@
+scrapegraphai.helpers.models_tokens module
+==========================================
+
+.. automodule:: scrapegraphai.helpers.models_tokens
+   :members:
+   :undoc-members:
+   :show-inheritance:
+
+This module contains a comprehensive dictionary of AI models and their corresponding token limits. The ``models_tokens`` dictionary is organized by provider (e.g., OpenAI, Azure OpenAI, Google AI, etc.) and includes various models with their maximum token counts.
+
+Example usage:
+
+.. code-block:: python
+
+   from scrapegraphai.helpers.models_tokens import models_tokens
+
+   # Get the token limit for GPT-4
+   gpt4_limit = models_tokens['openai']['gpt-4']
+   print(f"GPT-4 token limit: {gpt4_limit}")
+
+   # Check the token limit for a specific model
+   model_name = "gpt-3.5-turbo"
+   if model_name in models_tokens['openai']:
+       print(f"{model_name} token limit: {models_tokens['openai'][model_name]}")
+   else:
+       print(f"{model_name} not found in the models list")
+
+This information is crucial for users to understand the capabilities and limitations of different AI models when designing their scraping pipelines.
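Since ``models_tokens`` maps provider → model → token limit, it can also be queried programmatically, for instance to pick the model with the largest context window for a provider. A minimal sketch: the stand-in dictionary below mirrors the documented shape with values from the overview, where in real use you would import ``models_tokens`` from ``scrapegraphai.helpers.models_tokens`` instead:

```python
# Illustrative sketch: models_tokens maps provider -> model -> max tokens.
# The stand-in dictionary below mirrors that documented structure; the
# model names and limits are taken from the overview section.

models_tokens = {
    "openai": {
        "gpt-3.5-turbo": 16_385,
        "gpt-4": 8_192,
        "gpt-4-turbo-preview": 128_000,
    },
    "anthropic": {
        "claude-instant": 100_000,
        "claude-2": 200_000,
    },
}

def largest_context_model(provider: str) -> tuple[str, int]:
    """Return the (model, token_limit) pair with the biggest window."""
    models = models_tokens[provider]
    best = max(models, key=models.get)
    return best, models[best]

print(largest_context_model("openai"))     # ('gpt-4-turbo-preview', 128000)
print(largest_context_model("anthropic"))  # ('claude-2', 200000)
```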
