Roland Pihlakas levitation

An independent AI safety researcher since 2006. Multi-objective AI software architect. MSc equivalent in psychology, thesis on modeling of natural intelligence.

64 followers · 323 following

Simplify / Macrotec LLC
Estonia
https://bit.ly/cv_rp_ea_2018
https://orcid.org/0009-0006-4882-4166
in/levitation
https://lesswrong.com/users/roland-pihlakas
https://scholar.google.com/citations?user=abS1QbIAAAAJ
https://threelaws.net

Pinned

biological-alignment-benchmarks/biological-alignment-gridagents-benchmarks (Public)

Safety challenges for RL and LLM agents' ability to learn and properly apply biologically and economically aligned utility functions. The benchmarks are implemented in a gridworld-based environment…

Python · 8 stars · 5 forks

biological-alignment-benchmarks/ai-safety-gridworlds (Public)

Forked from google-deepmind/ai-safety-gridworlds

Extended framework for building multi-agent and multi-objective (MaMoRL / MoMaRL) gridworld environments, based on DeepMind's AI Safety Gridworlds. This is a suite of reinforcement learning environmen…

Python · 12 stars · 1 fork

biological-alignment-benchmarks/bioblue (Public)

Systematic runaway-optimiser-like LLM failure modes on biologically and economically aligned AI safety benchmarks for LLMs, with a simplified, navigation-free observation format. The benchmark themes …

Python · 4 stars · 3 forks

biological-alignment-benchmarks/milgram-for-llms (Public)

Four main takeaways: (1) LLMs are susceptible to pressure: they comply despite expressing distress; (2) LLMs are vulnerable to gradual boundary/value violations; (3) when LLMs refuse, they may ignore t…

Python · 2 stars · 1 fork

biological-alignment-benchmarks/zoo_to_gym_multiagent_adapter (Public)

Enables you to convert a PettingZoo environment to a Gym environment while supporting multiple agents (MARL). Gym's default setup doesn't easily support multi-agent environments, but this wrapper r…

Python · 2 stars · 1 fork

levitation-opensource/Manipulative-Expression-Recognition (Public)

MER is software that identifies and highlights manipulative communication in text from human conversations and AI-generated responses. MER benchmarks language models for manipulative expressions,…

HTML · 14 stars · 3 forks