phasellm.agents#

Agents to help with workflows.

Module Contents#

Classes#

Agent

Abstract class for an agent.

CodeExecutionAgent

Creates a new CodeExecutionAgent.

ExecCommands

Typed namedtuple holding the commands run in the sandboxed container.

SandboxedCodeExecutionAgent

Creates a new SandboxedCodeExecutionAgent.

EmailSenderAgent

Create an EmailSenderAgent.

NewsSummaryAgent

Create a NewsSummaryAgent.

WebpageAgent

Create a WebpageAgent.

WebSearchResult

This dataclass represents a single search result.

WebSearchAgent

Create a WebSearchAgent.

RSSAgent

Create an RSSAgent.

Functions#

stdout_io([stdout])

Redirects stdout so that the output of executed Python code (or any other arbitrary code) can be captured for the LLM.

class phasellm.agents.Agent(name: str = '')#

Bases: abc.ABC

Abstract class for an agent.

Parameters:

name – The name of the agent.

__repr__()#

Return repr(self).
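Example of subclassing (a minimal sketch; this assumes Agent defines no abstract methods beyond the constructor, and the run() method is purely illustrative, not part of the API):
>>> from phasellm.agents import Agent
>>> class EchoAgent(Agent):
...     def run(self, text: str) -> str:
...         return text
>>> agent = EchoAgent(name='Echo')
>>> agent.run('Hello')
'Hello'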

phasellm.agents.stdout_io(stdout=None)#

Redirects stdout so that the output of executed Python code (or any other arbitrary code) can be captured for the LLM.

Parameters:

stdout – The stdout to use.
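Example (a minimal sketch, assuming stdout_io is a context manager yielding a StringIO-like buffer; the exact interface may differ):
>>> from phasellm.agents import stdout_io
>>> with stdout_io() as s:
...     print('captured')
>>> s.getvalue()
'captured\n'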

class phasellm.agents.CodeExecutionAgent(name: str = '')#

Bases: Agent

Creates a new CodeExecutionAgent.

This agent is NOT sandboxed and should only be used for trusted code.

Parameters:

name – The name of the agent.

__repr__()#

Return repr(self).

static execute_code(code: str, globals=None, locals=None) str#

Executes arbitrary Python code and captures the output (or error!) of the execution as a string.

Parameters:
  • code – Python code to execute.

  • globals – Python globals to use during execution.

  • locals – Python locals to use during execution.

Returns:

The logs from the code execution.
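Example (a minimal usage sketch; the output shown is illustrative):
>>> from phasellm.agents import CodeExecutionAgent
>>> agent = CodeExecutionAgent(name='calc')
>>> logs = agent.execute_code('print(1 + 1)')
>>> print(logs)
2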

class phasellm.agents.ExecCommands#

Bases: NamedTuple

A typed namedtuple holding the shell commands prepared for the sandboxed container: the command that installs the requirements and the command that runs the Python code (see _prep_commands()).
requirements: str#

python: str#
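For illustration, ExecCommands behaves like any typed namedtuple (the command strings below are made up, not the exact commands the agent generates):
>>> from phasellm.agents import ExecCommands
>>> cmds = ExecCommands(requirements='pip install -r requirements.txt', python='python sandbox_code.py')
>>> cmds.requirements
'pip install -r requirements.txt'
>>> cmds.python
'python sandbox_code.py'
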
class phasellm.agents.SandboxedCodeExecutionAgent(name: str = '', docker_image: str = 'python:3', scratch_dir: pathlib.Path | str = None, module_package_mappings: Dict[str, str] = None)#

Bases: Agent

Creates a new SandboxedCodeExecutionAgent.

This agent executes arbitrary code in a sandboxed environment. We use Docker for this, so you'll need Docker installed and running to use this agent.

Examples

>>> from typing import Generator
>>> from phasellm.agents import SandboxedCodeExecutionAgent
Managing the docker client yourself:
>>> agent = SandboxedCodeExecutionAgent()
>>> logs = agent.execute_code('print("Hello World!")')
>>> for log in logs:
...     print(log)
Hello World!
>>> agent.close()
Using the context manager:
>>> with SandboxedCodeExecutionAgent() as agent:
...     logs: Generator = agent.execute_code('print("Hello World!")')
...     for log in logs:
...         print(log)
Hello World!
Code with custom packages is possible! Note that the package must exist in the module_package_mappings dictionary:
>>> module_package_mappings = {
...     "numpy": "numpy"
... }
>>> with SandboxedCodeExecutionAgent(module_package_mappings=module_package_mappings) as agent:
...     logs = agent.execute_code('import numpy as np; print(np.__version__)')
...     for log in logs:
...         print(log)
1.24.3
Disable log streaming (waits for code to finish executing before returning logs):
>>> with SandboxedCodeExecutionAgent() as agent:
...     logs = agent.execute_code('print("Hello World!")', stream=False)
...     print(logs)
Hello World!
Custom docker image:
>>> with SandboxedCodeExecutionAgent(docker_image='python:3.7') as agent:
...     logs = agent.execute_code('print("Hello World!")', stream=False)
...     print(logs)
Hello World!
Custom scratch directory:
>>> with SandboxedCodeExecutionAgent(scratch_dir='my_dir') as agent:
...     pass
Stop the container after each call to agent.execute_code():
>>> with SandboxedCodeExecutionAgent() as agent:
...     logs = agent.execute_code('print("Hello 1")', auto_stop_container=True)
...     assert agent._container is None
...     logs = agent.execute_code('print("Hello 2")', auto_stop_container=True)
...     assert agent._container is None
Parameters:
  • name – Name of the agent.

  • docker_image – Docker image to use for the sandboxed environment.

  • scratch_dir – Scratch directory to use for copying files (bind mounting) to the sandboxed environment.

  • module_package_mappings – Dictionary of module to package mappings. This is used to determine which packages are allowed to be installed in the sandboxed environment.

CODE_FILENAME = 'sandbox_code.py'#

__repr__()#

Return repr(self).

__enter__()#

Runs when entering the context manager.

Returns:

The SandboxedCodeExecutionAgent instance.

__exit__(exc_type, exc_val, exc_tb)#

Runs when exiting the context manager.

Parameters:
  • exc_type – The exception type.

  • exc_val – The exception value.

  • exc_tb – The exception traceback.

_ping_client() None#

Pings the docker client to make sure it’s running.

Raises:

docker.errors.APIError – If the server returns an error.

_create_scratch_dir() None#

Creates the scratch directory if it doesn’t exist.

_write_code_file(code: str) None#

Writes the code to a file in the scratch directory.

Parameters:

code – The code string to write to the file.

Returns:

None.

_write_requirements_file(packages: List[str]) None#

Writes a requirements.txt file to the scratch directory.

Parameters:

packages – List of packages to write to the requirements.txt file.

Returns:

None.

_modules_to_packages(code: str) List[str]#

Scans the code for modules and maps them to a package. If no package is specified in the mapping whitelist, then the package is ignored.

Parameters:

code – The code to scan for modules.

Returns:

A list of packages to install in the sandboxed environment.
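For intuition, here is a simplified sketch of the technique (not the agent's actual implementation): scan the code's import statements and keep only the packages present in the whitelist mapping:
>>> import ast
>>> def modules_to_packages(code, mappings):
...     modules = set()
...     for node in ast.walk(ast.parse(code)):
...         if isinstance(node, ast.Import):
...             modules.update(alias.name.split('.')[0] for alias in node.names)
...         elif isinstance(node, ast.ImportFrom) and node.module:
...             modules.add(node.module.split('.')[0])
...     return [mappings[m] for m in modules if mappings.get(m)]
>>> modules_to_packages('import numpy as np', {'numpy': 'numpy'})
['numpy']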

_prep_commands(packages: List[str]) ExecCommands#

Prepares the commands to be run in the docker container.

Parameters:

packages – List of packages to install in the docker container.

Returns:

A tuple containing the requirements command and the python command in the form (requirements_command, python_command).

static _handle_exec_errors(output: str, exit_code: int, code: str) None#

Handles errors that occur during code execution.

Parameters:
  • output – The output of the code execution.

  • exit_code – The exit code of the code execution.

  • code – The code that was executed.

Returns:

None.

_execute(code: str, auto_stop_container: bool) Generator#

Starts the container, installs packages defined in the code (if they are provided in the module_package_mappings), and executes the provided code inside the container.

Parameters:
  • code – The code string to execute.

  • auto_stop_container – Whether to automatically stop the container after execution.

Returns:

A Generator that yields the stdout and stderr of the code execution.

close() None#

Stops all containers and closes client sessions. This should be called when you’re done using the agent.

This method automatically runs when exiting the context manager. If you do not use a context manager, you should call this method manually.

Returns:

None.

start_container() None#

Starts the docker container.

Returns:

None.

stop_container() None#

Stops the docker container and removes it, if it exists.

Returns:

None.

execute_code(code: str, stream: bool = True, auto_stop_container: bool = False) str | Generator#

Executes the provided code inside a sandboxed container.

Parameters:
  • code – The code string to execute.

  • stream – Whether to stream the output of the code execution.

  • auto_stop_container – Whether to automatically stop the container after the code execution.

Returns:

A string output of the whole code execution stdout and stderr if stream is False, otherwise a Generator that yields the stdout and stderr of the code execution.

class phasellm.agents.EmailSenderAgent(sender_name: str, smtp: str, sender_address: str, password: str, port: int, name: str = '')#

Bases: Agent

Create an EmailSenderAgent.

Sends emails via an SMTP server.

Parameters:
  • sender_name – Name of the sender (e.g., “Wojciech”)

  • smtp – The smtp server (e.g., smtp.gmail.com)

  • sender_address – The sender’s email address

  • password – The password for the email account

  • port – The port used by the SMTP server

  • name – The name of the agent (optional)

__repr__()#

Return repr(self).

sendPlainEmail(recipient_email: str, subject: str, content: str) None#

DEPRECATED: see send_plain_email

Parameters:
  • recipient_email – The person receiving the email

  • subject – Email subject

  • content – The plain text content of the email

send_plain_email(recipient_email: str, subject: str, content: str) None#

Sends an email encoded as plain text.

Parameters:
  • recipient_email – The person receiving the email

  • subject – Email subject

  • content – The plain text content of the email
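Example (a minimal sketch with placeholder credentials; port 587 is a common SMTP submission port, but use whatever your provider requires):
>>> from phasellm.agents import EmailSenderAgent
>>> agent = EmailSenderAgent(
...     sender_name='Wojciech',
...     smtp='smtp.gmail.com',
...     sender_address='sender@example.com',
...     password='YOUR_PASSWORD',
...     port=587
... )
>>> agent.send_plain_email(
...     recipient_email='recipient@example.com',
...     subject='Hello',
...     content='Hello from the EmailSenderAgent.'
... )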

class phasellm.agents.NewsSummaryAgent(apikey: str = None, name: str = '')#

Bases: Agent

Create a NewsSummaryAgent.

Takes a query, calls the API, and gets news articles.

Parameters:
  • apikey – The API key for newsapi.org

  • name – The name of the agent (optional)

__repr__()#

Return repr(self).

getQuery(query: str, days_back: int = 1, include_descriptions: bool = True, max_articles: int = 25) str#

DEPRECATED: see get_query

Parameters:
  • query – What keyword to look for in news articles

  • days_back – How far back we go with the query

  • include_descriptions – Will include article descriptions as well as titles; otherwise only titles

  • max_articles – How many articles to include in the summary

Returns:

A news summary string

get_query(query: str, days_back: int = 1, include_descriptions: bool = True, max_articles: int = 25) str#

Gets all articles for a query going back the given number of days. Returns a string with all the information so that an LLM can summarize it. Note that requesting too many articles will likely cause prompt-length issues.

Parameters:
  • query – What keyword to look for in news articles

  • days_back – How far back we go with the query

  • include_descriptions – Will include article descriptions as well as titles; otherwise only titles

  • max_articles – How many articles to include in the summary

Returns:

A news summary string
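Example (a minimal sketch with a placeholder API key):
>>> from phasellm.agents import NewsSummaryAgent
>>> agent = NewsSummaryAgent(apikey='YOUR_NEWSAPI_KEY')
>>> news = agent.get_query('artificial intelligence', days_back=2, max_articles=10)
The returned string can be passed directly to an LLM for summarization.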

class phasellm.agents.WebpageAgent(name: str = '')#

Bases: Agent

Create a WebpageAgent.

This agent helps you scrape webpages.

Examples

>>> from phasellm.agents import WebpageAgent
Use default parameters:
>>> agent = WebpageAgent()
>>> text = agent.scrape('https://10millionsteps.com/ai-inflection')
Keep html tags:
>>> agent = WebpageAgent()
>>> text = agent.scrape('https://10millionsteps.com/ai-inflection', text_only=False, body_only=False)
Keep html tags, but only return body content:
>>> agent = WebpageAgent()
>>> text = agent.scrape('https://10millionsteps.com/ai-inflection', text_only=False, body_only=True)
Use a headless browser to enable scraping of dynamic content:
>>> agent = WebpageAgent()
>>> text = agent.scrape('https://10millionsteps.com/ai-inflection', text_only=False, body_only=True,
...                     use_browser=True)
Pass custom headers:
>>> agent = WebpageAgent()
>>> headers = {'Example': 'header'}
>>> text = agent.scrape('https://10millionsteps.com/ai-inflection', headers=headers)
Wait for a selector to load (useful for dynamic content, only works when use_browser=True):
>>> agent = WebpageAgent()
>>> text = agent.scrape('https://10millionsteps.com/ai-inflection', use_browser=True,
...                     wait_for_selector='#dynamic')
Parameters:

name – The name of the agent (optional)

__repr__()#

Return repr(self).

static _validate_url(url: str) None#

This method validates that a url can be used by the agent.

static _handle_errors(res: requests.Response) None#

This method handles errors that occur during a request.

Parameters:

res – The response from the request.

static _parse_html(html: str, text_only: bool = True, body_only: bool = False) str#

This method parses the given html string.

Parameters:
  • html – The html to parse.

  • text_only – If True, only the text of the webpage is returned. If False, the entire HTML is returned.

  • body_only – If True, only the body of the webpage is returned. If False, the entire HTML is returned.

Returns:

The string containing the webpage text or html.

static _prep_headers(headers: Dict = None) Dict#

This method prepares the headers for a request. It fills in missing headers with default values. It also adds a fake user agent to reduce the likelihood of being blocked.

Parameters:

headers – The headers to use for the request.

Returns:

The headers to use for the request.

_scrape_html(url: str, headers: Dict = None) str#

This method scrapes a webpage and returns a string containing the html of the webpage.

Parameters:
  • url – The URL of the webpage to scrape.

  • headers – A dictionary of headers to use for the request.

Returns:

A string containing the html of the webpage.

static _scrape_html_and_js(url: str, headers: Dict, wait_for_selector: str = None) str#

This method scrapes a webpage and returns a string containing the html of the webpage. It uses a headless browser to render the webpage and execute javascript.

Parameters:
  • url – The URL of the webpage to scrape.

  • headers – A dictionary of headers to use for the request.

  • wait_for_selector – The selector to wait for before returning the HTML. Useful for when you know something should be on the page, but it is not there yet since it needs to be rendered by javascript.

Returns:

A string containing the html of the webpage.

scrape(url: str, headers: Dict = None, use_browser: bool = False, wait_for_selector: str = None, text_only: bool = True, body_only: bool = True) str#

This method scrapes a webpage and returns a string containing the html or text of the webpage.

Parameters:
  • url – The URL of the webpage to scrape.

  • headers – A dictionary of headers to use for the request.

  • use_browser – If True, the webpage is rendered using a headless browser, allowing javascript to run and hydrate the page. If False, the webpage is scraped as-is.

  • wait_for_selector – The selector to wait for before returning the HTML. Useful for when you know something should be on the page, but it is not there yet since it needs to be rendered by javascript. Only used when use_browser is True.

  • text_only – If True, only the text of the webpage is returned. If False, the entire HTML is returned.

  • body_only – If True, only the body of the webpage is returned. If False, the entire HTML is returned.

Returns:

A string containing the text or HTML of the webpage.

class phasellm.agents.WebSearchResult#

This dataclass represents a single search result.

title: str#

url: str#

description: str#

content: str#

class phasellm.agents.WebSearchAgent(name: str = '', api_key: str = None, rate_limit: float = 1, text_only: bool = True, body_only: bool = True, use_browser: bool = False, wait_for_selector: str = None)#

Bases: Agent

Create a WebSearchAgent.

This agent helps you search the web using a web search API. Currently, the agent supports Google and Brave.

Examples

>>> from phasellm.agents import WebSearchAgent
Search with Google:
>>> agent = WebSearchAgent(
...     name='Google Search Agent',
...     api_key='YOUR_API_KEY'
... )
>>> results = agent.search_google(
...     query='test',
...     custom_search_engine_id='YOUR_CUSTOM_SEARCH_ENGINE_ID'
... )
Search with Brave:
>>> agent = WebSearchAgent(
...     name='Brave Search Agent',
...     api_key='YOUR_API_KEY'
... )
>>> results = agent.search_brave(query='test')
Iterate over the results:
>>> for result in results:
...     print(result.title)
...     print(result.url)
...     print(result.description)
...     print(result.content)
Parameters:
  • name – The name of the agent (optional).

  • api_key – The API key to use for the search engine.

  • rate_limit – The number of seconds to wait between requests for webpage content.

  • text_only – If True, only the text of the webpage is returned. If False, the entire HTML is returned.

  • body_only – If True, only the body of the webpage is returned. If False, the entire HTML is returned.

  • use_browser – If True, the webpage is rendered using a headless browser, allowing javascript to run and hydrate the page. If False, the webpage is scraped as-is.

  • wait_for_selector – The selector to wait for before returning the HTML. Useful for when you know something should be on the page, but it is not there yet since it needs to be rendered by javascript. Only used if use_browser is True.

__repr__()#

Return repr(self).

static _prepare_url(base_url: str, params: Dict) str#

This method prepares a URL for a request.

Parameters:
  • base_url – The base url.

  • params – A dictionary of parameters to use for the request.

Returns:

The prepared URL.

static _handle_errors(res: requests.Response) None#

This method handles errors that occur during a request.

Parameters:

res – The response from the request.

_send_request(base_url: str, headers: Dict = None, params: Dict = None) Dict#

This method sends a request to a URL.

Parameters:
  • base_url – The base URL to send the request to.

  • headers – A dictionary of headers to use for the request.

  • params – A dictionary of parameters to use for the request.

Returns:

The response from the request.

search_brave(query: str, **kwargs) List[WebSearchResult]#

This method performs a web search using Brave.

Get an API key here (credit card required): https://api.search.brave.com/register

Parameters:
  • query – The query to search for.

  • **kwargs – Additional parameters to pass to the API.

Returns:

A list of WebSearchResult objects.

search_google(query: str, custom_search_engine_id: str = None, **kwargs) List[WebSearchResult]#

This method performs a web search using Google.

Get an API key here: https://developers.google.com/custom-search/v1/overview

You must create a custom search engine and pass its ID. To create or view custom search engines, visit: https://programmablesearchengine.google.com/u/1/controlpanel/all

Parameters:
  • query – The search query.

  • custom_search_engine_id – The ID of the custom search engine to use.

  • **kwargs – Any additional keyword arguments to pass to the API.

Returns:

A list of WebSearchResult objects.

class phasellm.agents.RSSAgent(name: str = '', url: str = None, **kwargs)#

Bases: Agent

Create an RSSAgent.

This agent helps you read data from RSS feeds.

Parameters:
  • name – The name of the agent.

  • url – The URL of the RSS feed.

  • **kwargs – Any additional keyword arguments to pass to feedparser.parse(). You may need to pass a user agent header or other headers for some RSS feeds. See https://feedparser.readthedocs.io/en/latest/http.html.

Examples

Read an RSS feed once, passing a user agent header:
>>> from phasellm.agents import RSSAgent
>>> agent = RSSAgent(url='https://arxiv.org/rss/cs', agent="it's me!")
>>> data = agent.read()
Poll the arXiv CS RSS feed every 60 seconds:
>>> from phasellm.agents import RSSAgent
>>> agent = RSSAgent(url='https://arxiv.org/rss/cs')
>>> with agent.poll(interval=60) as poller:
...     for data in poller():
...         print(data)
Poll the arXiv CS RSS feed every 60 seconds and stop after 5 minutes:
>>> import time
>>> from threading import Thread
>>> from typing import Callable, Dict, Generator, List
>>> from phasellm.agents import RSSAgent
>>> agent = RSSAgent(url='https://arxiv.org/rss/cs')
>>> def poll_helper(p: Callable[[], Generator[List[Dict], None, None]]):
...     for data in p():
...         print(data)
>>> with agent.poll(interval=60) as poller:
...     t = Thread(target=poll_helper, kwargs={'p': poller})
...     t.start()
...     time.sleep(300)
>>> t.join()
Poll and print the data and polling time after each update is received:
>>> from phasellm.agents import RSSAgent
>>> agent = RSSAgent(url='https://arxiv.org/rss/cs')
>>> with agent.poll(interval=60) as poller:
...     for data in poller():
...         print(f'data: {data}')
...         print(f'polling time: {agent.poll_time}')

property poll_time: datetime.timedelta#

This property returns the amount of time the agent has been polling.

Returns:

A timedelta object.

__repr__()#

Return repr(self).

static _yield_data(queue: queue.Queue) Generator[List[Dict], None, None]#

This method is responsible for yielding data from the queue. It stops generating when it receives None.

Parameters:

queue – The queue to yield data from.

Returns:

A generator that yields data from the queue.

_poll_thread(queue: queue.Queue, interval: int = 60) None#

This method is responsible for polling the RSS feed and putting new data in the queue.

Parameters:
  • queue – The queue to put data in.

  • interval – The number of seconds to wait between polls.

read() List[Dict]#

This method reads data from an RSS feed.

Returns:

A list of dictionaries containing the data from the RSS feed.

poll(interval: int = 60) Generator[Callable[[], Generator[List[str], None, None]], None, None]#

This method polls an RSS feed for new data.

Parameters:

interval – The number of seconds to wait between polls.

Returns:

A context manager that yields a poller callable; calling the poller returns a generator that yields lists of dictionaries containing new data from the RSS feed.