CSV Data Agent with DeepSeek-R1

Overview

This project demonstrates how to interact with CSV data using a DeepSeek-R1-Distill-Qwen-32B. It integrates a streaming LLM from Hugging Face to process natural language queries on a pandas DataFrame. The project leverages langchain, pandas, and huggingface_hub to enable seamless data interaction.

Features

Uses a Hugging Face-hosted LLM (deepseek-ai/DeepSeek-R1-Distill-Qwen-32B) for processing queries.
Parses structured output from the LLM.
Integrates langchain to create a pandas DataFrame agent.
Allows querying CSV data using natural language prompts.
Supports structured parsing for responses.

Installation

Clone this repository:

git clone https://github.com/DataByteSun/CSV-Data-Agent-with-DeepSeek-R1.git

Create and activate a virtual environment (optional but recommended):

python -m venv venv
source venv/bin/activate   # On Windows use: venv\Scripts\activate

Install dependencies:

pip install -r requirements.txt

Set up your environment variables:

Create a .env file in the project directory and add your Hugging Face API key(Get Free key from Huggung Face):
```
HUGGINGFACE_API_KEY=your_huggingface_api_key
```

Load environment variables:

from dotenv import load_dotenv
load_dotenv()

Usage

Load and preprocess your CSV data:

import pandas as pd
df = pd.read_csv("your_data.csv")

Instantiate the LLM and create an agent:

from huggingface_hub import InferenceClient

client = InferenceClient(api_key="your_huggingface_api_key")
agent = create_pandas_dataframe_agent(llm=client, df=df, verbose=True)

Query the agent using natural language:

result = agent.invoke("how many rows are there?")

Expected Output (Screenshots)

Running agent.invoke("how many rows are there?") may yield an output like:
With a Prompt Engineering Question: How may patients were hospitalized during Mar 2021 in Alaska use column hospitalizedCumulative?

⚠️ Warning: This code includes experimental components that may pose risks. Ensure thorough testing in a sandboxed environment to avoid potential vulnerabilities or data loss.

Customization

Modify model_name in the Hugging Face API request to experiment with different LLMs.

Adjust parameters such as temperature, max_tokens, and top_p for fine-tuning responses.

Contributions

Feel free to contribute by opening an issue or submitting a pull request.

Contact

For questions or feedback, reach out to surajpawar.in@gmail.com.

Name		Name	Last commit message	Last commit date
Latest commit History 8 Commits
.gitignore		.gitignore
CSV_Agent.ipynb		CSV_Agent.ipynb
README.md		README.md
all-states-history.csv		all-states-history.csv
requirements.txt		requirements.txt

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

CSV Data Agent with DeepSeek-R1

Overview

Features

Installation

Usage

Expected Output (Screenshots)

Customization

Contributions

Contact

About

Releases

Packages

Languages

DataByteSun/CSV-Data-Agent-with-DeepSeek-R1

Folders and files

Latest commit

History

Repository files navigation

CSV Data Agent with DeepSeek-R1

Overview

Features

Installation

Usage

Expected Output (Screenshots)

Customization

Contributions

Contact

About

Topics

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages