👋
I may be slow to respond.

Organizations

@levitation-opensource @biological-alignment-benchmarks

Pinned

  1. biological-alignment-benchmarks/biological-alignment-gridagents-benchmarks Public

    Safety challenges for RL and LLM agents' ability to learn and properly apply biologically and economically aligned utility functions. The benchmarks are implemented in a gridworld-based environment…

    Python, 8 stars, 5 forks

  2. biological-alignment-benchmarks/ai-safety-gridworlds Public

    Forked from google-deepmind/ai-safety-gridworlds

    An extended, multi-agent, multi-objective (MaMoRL / MoMaRL) framework for building gridworld environments, based on DeepMind's AI Safety Gridworlds. This is a suite of reinforcement learning environmen…

    Python, 12 stars, 1 fork

  3. biological-alignment-benchmarks/bioblue Public

    Systematic runaway-optimiser-like LLM failure modes on biologically and economically aligned AI safety benchmarks for LLMs with a simplified, navigation-free observation format. The benchmark themes …

    Python, 4 stars, 3 forks

  4. biological-alignment-benchmarks/milgram-for-llms Public

    Four main takeaways: (1) LLMs are subject to pressure: they comply despite expressing distress; (2) LLMs are vulnerable to gradual boundary/value violations; (3) when LLMs refuse, they may ignore t…

    Python, 2 stars, 1 fork

  5. biological-alignment-benchmarks/zoo_to_gym_multiagent_adapter Public

    Enables you to convert a PettingZoo environment to a Gym environment while supporting multiple agents (MARL). Gym's default setup doesn't easily support multi-agent environments, but this wrapper r…

    Python, 2 stars, 1 fork

  6. levitation-opensource/Manipulative-Expression-Recognition Public

    MER is software that identifies and highlights manipulative communication in text from human conversations and AI-generated responses. MER benchmarks language models for manipulative expressions,…

    HTML, 14 stars, 3 forks