
Commit 11ae717 ("add new doc"), 1 parent ec9ef2b

File tree: 3 files changed (+71 −1 lines)


docs/source/introduction/overview.rst (+40 −1)
@@ -22,6 +22,45 @@ This flexibility ensures that scrapers remain functional even when website layou
 We support many LLMs including **GPT, Gemini, Groq, Azure, Hugging Face** etc.
 as well as local models which can run on your machine using **Ollama**.

+AI Models and Token Limits
+==========================
+
+ScrapeGraphAI supports a wide range of AI models from various providers. Each model has a specific token limit, which is important to consider when designing your scraping pipelines. Here is an overview of the supported models and their token limits:
+
+OpenAI Models
+-------------
+- GPT-3.5 Turbo (16,385 tokens)
+- GPT-4 (8,192 tokens)
+- GPT-4 Turbo Preview (128,000 tokens)
+
+Azure OpenAI Models
+-------------------
+- GPT-3.5 Turbo (16,385 tokens)
+- GPT-4 (8,192 tokens)
+- GPT-4 Turbo Preview (128,000 tokens)
+
+Google AI Models
+----------------
+- Gemini Pro (128,000 tokens)
+- Gemini 1.5 Pro (128,000 tokens)
+
+Anthropic Models
+----------------
+- Claude Instant (100,000 tokens)
+- Claude 2 (200,000 tokens)
+- Claude 3 (200,000 tokens)
+
+Mistral AI Models
+-----------------
+- Mistral Large (128,000 tokens)
+- Open Mistral 7B (32,000 tokens)
+- Open Mixtral 8x7B (32,000 tokens)
+
+For a complete list of supported models and their token limits, please refer to the API documentation.
+
+Understanding token limits is crucial for optimizing your scraping tasks. Larger token limits allow for processing more text in a single API call, which can be beneficial for scraping lengthy web pages or documents.
+
 Library Diagram
 ===============

@@ -95,4 +134,4 @@ Sponsors
 .. image:: ../../assets/transparent_stat.png
    :width: 15%
    :alt: Stat Proxies
-   :target: https://dashboard.statproxies.com/?refferal=scrapegraph
+   :target: https://dashboard.statproxies.com/?refferal=scrapegraph
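The token limits listed in the overview above can be put to work when sizing a scraping job. A minimal sketch, not part of the library: the per-model limits are copied from the overview, and the 4-characters-per-token estimate is a rough heuristic (a real tokenizer such as the model provider's own should be used for exact counts):

```python
# Illustrative sketch: decide whether a scraped page fits a model's
# context window before sending it to the LLM. Token limits below are
# taken from the overview section; the chars/4 estimate is a rough
# heuristic, not a real tokenizer.

TOKEN_LIMITS = {
    "gpt-3.5-turbo": 16_385,
    "gpt-4": 8_192,
    "gpt-4-turbo-preview": 128_000,
}

def estimate_tokens(text: str) -> int:
    """Very rough estimate: ~4 characters per token for English text."""
    return max(1, len(text) // 4)

def fits_in_context(text: str, model: str, reserved_for_output: int = 1_000) -> bool:
    """True if the text plus an output budget fits the model's window."""
    return estimate_tokens(text) + reserved_for_output <= TOKEN_LIMITS[model]

page = "word " * 10_000  # ~50,000 characters of scraped text (~12,500 tokens)
print(fits_in_context(page, "gpt-4"))                # False: 8,192-token window
print(fits_in_context(page, "gpt-4-turbo-preview"))  # True: 128,000-token window
```

Reserving part of the window for the model's output matters in practice: a page that technically fits the window leaves no room for the completion if the input alone consumes the full limit.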

docs/source/modules/modules.rst (+3)

@@ -5,3 +5,6 @@ scrapegraphai
    :maxdepth: 4

    scrapegraphai
+
+   scrapegraphai.helpers.models_tokens
+
@@ -0,0 +1,28 @@
+scrapegraphai.helpers.models_tokens module
+==========================================
+
+.. automodule:: scrapegraphai.helpers.models_tokens
+   :members:
+   :undoc-members:
+   :show-inheritance:
+
+This module contains a comprehensive dictionary of AI models and their corresponding token limits. The ``models_tokens`` dictionary is organized by provider (e.g., OpenAI, Azure OpenAI, Google AI, etc.) and includes various models with their maximum token counts.
+
+Example usage:
+
+.. code-block:: python
+
+   from scrapegraphai.helpers.models_tokens import models_tokens
+
+   # Get the token limit for GPT-4
+   gpt4_limit = models_tokens['openai']['gpt-4']
+   print(f"GPT-4 token limit: {gpt4_limit}")
+
+   # Check the token limit for a specific model
+   model_name = "gpt-3.5-turbo"
+   if model_name in models_tokens['openai']:
+       print(f"{model_name} token limit: {models_tokens['openai'][model_name]}")
+   else:
+       print(f"{model_name} not found in the models list")
+
+This information is crucial for users to understand the capabilities and limitations of different AI models when designing their scraping pipelines.
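Since ``models_tokens`` maps provider → model → token limit, it can also be queried programmatically, for instance to pick the model with the largest context window for a provider. A minimal sketch: the stand-in dictionary below mirrors the documented shape with values from the overview, where in real use you would import ``models_tokens`` from ``scrapegraphai.helpers.models_tokens`` instead:

```python
# Illustrative sketch: models_tokens maps provider -> model -> max tokens.
# The stand-in dictionary below mirrors that documented structure; the
# model names and limits are taken from the overview section.

models_tokens = {
    "openai": {
        "gpt-3.5-turbo": 16_385,
        "gpt-4": 8_192,
        "gpt-4-turbo-preview": 128_000,
    },
    "anthropic": {
        "claude-instant": 100_000,
        "claude-2": 200_000,
    },
}

def largest_context_model(provider: str) -> tuple[str, int]:
    """Return the (model, token_limit) pair with the biggest window."""
    models = models_tokens[provider]
    best = max(models, key=models.get)
    return best, models[best]

print(largest_context_model("openai"))     # ('gpt-4-turbo-preview', 128000)
print(largest_context_model("anthropic"))  # ('claude-2', 200000)
```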
