Internet Archive Interact

An interactive command-line tool for managing Internet Archive repositories.

Use this script to list files, upload files, download files, delete files, move files, and create new repositories with detailed metadata input.

Features

  • Interactive Menu: Choose options to list files, upload files, download files, delete or move files, or create a new repository.
  • Basic GUI Mode: Includes a login page for S3 keys, repository file list selection for download, and local file list selection for upload.
  • Single Launcher: One executable supports both modes (CLI by default, GUI via --gui).
  • Flexible Repository Input: Accepts a plain identifier (for example my_item) or Archive URLs such as /details/, /download/, and /metadata/.
  • Test Mode & Permanent Mode: Run in simulation (Test Mode, where no changes are made) or execute actual changes (Permanent Mode).
  • Metadata Support: Input metadata including title, description, creator, date, language, license URL, collection, subject tags, and test item status.
  • Collection Options: Supports collections such as community, opensource, texts, movies, audio, image, etree, folksoundomy, games, and software.
  • Progress Bars: Uses tqdm to display file upload progress.
  • S3 Authentication: Uses S3 access keys (set as environment variables) for secure communication with the Internet Archive.

Prerequisites

Before you begin, ensure you have:

  • A Linux environment (Ubuntu, Debian, etc.)
  • Python 3 installed
  • Tkinter support (python3-tk on Linux if your distro does not include it by default)
  • S3 Access Keys from Internet Archive
  • An active internet connection
  • The Internet Archive Python tool (the internetarchive package, installed below)

Installation

1. Preparing Your System

Update Your System:

sudo apt update && sudo apt upgrade -y

Install Python 3 and pip:

sudo apt install -y python3 python3-pip

Install the Internet Archive CLI via pipx (install pipx first if it is not already present):

pipx install internetarchive

2. Setting Up the Script

Download the script into your home or Internet Archive folder:

  1. Open your text editor and create a file named ia-interact.py.
  2. Paste the full script code into the file, then save and exit.

(Optional) Make the Script Executable:

chmod +x ia-interact.py

3. Installing Python Libraries

Install Required Libraries:

pip3 install requests tqdm

Verify the Library Installation:

pip3 show requests tqdm

4. Configuring Environment Variables

Obtain Your S3 Access Keys:

Retrieve your S3 keys from your Internet Archive account page.

Set Up Environment Variables:

Edit your shell configuration file (e.g., ~/.bashrc or ~/.zshrc) and add:

export S3_ACCESS_KEY="your-access-key"
export S3_SECRET_KEY="your-secret-key"

Replace "your-access-key" and "your-secret-key" with your actual keys.

Reload the Configuration:

source ~/.bashrc

Test the Environment Variables:

echo $S3_ACCESS_KEY
echo $S3_SECRET_KEY
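
The same check can be done from Python before any request is made. The following is a minimal sketch, assuming the script reads S3_ACCESS_KEY and S3_SECRET_KEY with os.getenv as noted elsewhere in this README; the helper name load_s3_keys is hypothetical and not taken from the script.

```python
import os


def load_s3_keys(env=os.environ):
    """Return (access_key, secret_key), or raise if either variable is unset or empty.

    `load_s3_keys` is a hypothetical helper name used for illustration only.
    """
    access = env.get("S3_ACCESS_KEY")
    secret = env.get("S3_SECRET_KEY")
    missing = [
        name
        for name, value in (("S3_ACCESS_KEY", access), ("S3_SECRET_KEY", secret))
        if not value
    ]
    if missing:
        raise RuntimeError("Missing environment variable(s): " + ", ".join(missing))
    return access, secret
```

Failing early with the name of the missing variable is usually easier to diagnose than an authentication error returned by the server later on.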

5. Build Local Portable Binaries

Install build dependencies:

pip3 install -r requirements-build.txt

Build a single-file Linux executable:

pyinstaller --clean --onefile --name ia-interact ia-interact.py

Release icon asset used by packaged binaries:

assets/icons/internet-archive.png

The binary will be output to:

dist/ia-interact

On Windows, the file will be:

dist/ia-interact.exe

Run CLI mode from the packaged binary:

./dist/ia-interact --cli

Run GUI mode from the same packaged binary:

./dist/ia-interact --gui

Build a Linux .AppImage (x86_64):

chmod +x scripts/build-appimage.sh
./scripts/build-appimage.sh

The x86_64 AppImage will be output to:

release/ia-interact-linux-x86_64.AppImage

Build a Linux .AppImage (arm64):

APPIMAGE_ARCH=aarch64 ./scripts/build-appimage.sh

The arm64 AppImage will be output to:

release/ia-interact-linux-aarch64.AppImage

6. GitHub Actions: Portable Releases

This repository includes:

.github/workflows/build-binaries.yml

The workflow builds portable release artifacts for:

  • Linux x86_64: ia-interact-linux-x86_64.AppImage
  • Linux arm64: ia-interact-linux-aarch64.AppImage
  • Windows x86_64: ia-interact-windows-x86_64.exe
  • Windows arm64: ia-interact-windows-arm64.exe
  • macOS universal (arm64 + x86_64): ia-interact-macos-universal.app.zip

These release targets use Internet Archive icon assets from assets/icons/.

Each run uploads these as workflow artifacts. When you push a tag matching v* (for example v1.0.0), the workflow also publishes these files to the GitHub Release for that tag.

7. Windows Python Mode Notes (from issue #1)

Issue reference: https://github.com/harrypm/IA-Interact/issues/1

The issue confirms the script can be run directly on Windows with Python.

Recommended flow:

  1. Install Python on Windows and enable Add Python to PATH during install.

  2. Open cmd in the folder containing ia-interact.py (File Explorer address bar -> type cmd).

  3. Install dependencies:

    python -m pip install requests tqdm
    
  4. Run the tool:

    python -m ia-interact
    

    Alternative:

    python ia-interact.py
    
  5. Configure S3 keys before use (recommended: environment variables rather than hardcoding keys in the script).

    • The issue notes also mention replacing the os.getenv("S3_ACCESS_KEY") and os.getenv("S3_SECRET_KEY") calls in the script with the key values themselves.
    • If you do that for troubleshooting, keep it local-only and do not commit secrets.

Large upload note from the issue:

  • In the upload flow, choosing new folder is more reliable for large uploads.
  • Using existing folder with ./ may return a 400 error.

8. GUI Mode (Login + File Selection)

Run the GUI:

python3 ia-interact-gui.py

Or use the unified launcher:

python3 ia-interact.py --gui

GUI flow:

  1. Enter S3 access and secret keys on the login page.
  2. Enter your repository URL or identifier (for example archive.org/details/<id>, archive.org/download/<id>/..., archive.org/metadata/<id>, or just <id>).
  3. Select repository files (left list) to download.
  4. Add/select local files (right list) to upload.
  5. Set target upload directory and run upload/download actions.
    • Upload can start directly from a valid repository field value.
    • Download requires a loaded repository file list.

Usage

Running the Script

To execute the script, run:

python3 ia-interact.py

To launch the GUI version, run either of:

python3 ia-interact.py --gui
python3 ia-interact-gui.py

To force CLI mode explicitly, run:

python3 ia-interact.py --cli
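
One way a single launcher could choose between the two modes is sketched below. It assumes only what this README states (the --gui and --cli flags, with CLI as the default); the function name parse_mode and the use of argparse are illustrative assumptions, not a description of the actual script.

```python
import argparse


def parse_mode(argv=None):
    """Return "gui" or "cli" from the command line (CLI when no flag is given).

    `parse_mode` is a hypothetical name; the real launcher may parse flags differently.
    """
    parser = argparse.ArgumentParser(prog="ia-interact")
    group = parser.add_mutually_exclusive_group()  # --gui and --cli cannot be combined
    group.add_argument("--gui", action="store_true", help="launch the GUI")
    group.add_argument("--cli", action="store_true", help="force the interactive CLI")
    args = parser.parse_args(argv)
    return "gui" if args.gui else "cli"
```

For example, parse_mode(["--gui"]) selects the GUI, while parse_mode([]) falls back to the CLI.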

To interact with a repo, you can use either the item identifier directly or a full Archive URL:

xxxxxxxxxx
https://archive.org/details/xxxxxxxxxx
https://archive.org/download/xxxxxxxxxx/some/file.ext
https://archive.org/metadata/xxxxxxxxxx

From AppImage output, run CLI mode:

./release/ia-interact-linux-x86_64.AppImage --cli

From the same AppImage, run GUI mode:

./release/ia-interact-linux-x86_64.AppImage --gui

When opened/clicked from a desktop environment, the packaged app defaults to GUI mode.

Script Options

When the script runs, it displays an interactive menu with the following options:

  • List Files: Display the contents of an existing repository.
  • Upload Files: Add files to a repository.
  • Download Files: Download one file or all files from a repository to a local folder.
  • Delete/Move Files: Manage files within a repository.
  • Create a New Repository: Upload an entire folder and configure repository metadata.

During repository creation, you will be prompted to:

  • Input metadata (title, description, creator, date, language, license URL).
  • Select a collection from the provided list.
  • Enter subject tags (e.g., music, history).
  • Specify whether the repository is a test item (note: test items are automatically deleted after 30 days).
  • Choose between Test Mode (simulate actions without an actual upload) and Permanent Mode (execute actual uploads).

Troubleshooting

  • Missing Libraries:
    If you encounter errors about missing libraries, run:

    pip3 install requests tqdm

  • Environment Variables Not Set:
    Ensure your environment variables are defined in your shell configuration file and reload it:

    source ~/.bashrc

  • API or Network Issues:
    Verify that your S3 keys are correct and that your internet connection is stable.

  • Logging:
    To help with debugging, you can redirect output to a log file:

    python3 ia-interact.py > script_output.log 2>&1

Full Breakdown and Feature Notes

Overview

"IA Interact" is an interactive command-line tool designed to manage repositories on the Internet Archive. It supports operations such as:

  • Uploading Files: Upload individual files or entire folders to an Internet Archive repository.
  • Listing Repository Contents: Retrieve and display the contents of a repository using the metadata API.
  • Deleting Files: Remove specified files from a repository.
  • Moving Files: Change a file’s location within a repository by copying it and then deleting the original.
  • Downloading Files: Download a single file or all files from a repository to a local path.
  • Creating a New Repository: Upload a folder as a new repository and submit metadata.
  • User Interaction: Offers an interactive menu with a help option, test mode (simulation) vs. permanent mode, and filtering to avoid showing files from ".thumbs" directories.

This script uses the Internet Archive’s S3-compatible interface and Metadata API, and it includes robust file upload logic (with chunking, progress bars, and retry strategies).


Detailed Breakdown by Function

1. get_repo_identifier(repo_link)

  • Purpose:
    Extracts the repository identifier from either a full Internet Archive URL or a plain identifier.
  • How It Works:
    Normalizes and parses Archive URLs (/details/, /download/, /metadata/) and returns the item identifier.
    If a plain identifier is provided, it is accepted directly. If input is invalid, it alerts the user and returns None.
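
The parsing described above can be sketched as follows. This is an illustration under assumptions rather than the script's actual code: the name extract_identifier is hypothetical, and the rule "anything without a slash is a plain identifier" is a simplification chosen for the example.

```python
from urllib.parse import urlparse

# Path prefixes that are followed by the item identifier in Archive URLs.
ARCHIVE_PREFIXES = ("details", "download", "metadata")


def extract_identifier(repo_link):
    """Return the item identifier, or None if the input cannot be interpreted."""
    text = repo_link.strip()
    if not text:
        return None
    if "/" not in text:
        return text  # plain identifier such as "my_item"
    # Full URLs go through urlparse; scheme-less input is treated as a bare path.
    path = urlparse(text).path if "://" in text else text
    segments = [segment for segment in path.split("/") if segment]
    # The identifier is the segment right after details/, download/ or metadata/.
    for index, segment in enumerate(segments[:-1]):
        if segment in ARCHIVE_PREFIXES:
            return segments[index + 1]
    return None
```

With this sketch, a /download/ URL that points at a nested file still yields the item identifier, because only the segment directly after the prefix is returned.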

2. upload_file_with_progress(identifier, file_path, directory)

  • Purpose:
    Uploads a file to a specified directory within a repository.
  • Key Features:
    • Chunking: Reads the file in 2MB chunks.
    • Progress Tracking: Uses the tqdm library to display a real-time progress bar.
    • Retry Logic: Implements a retry strategy (5 retries) using an HTTPAdapter.
    • S3 Authentication: Reads S3 keys from environment variables and includes them in the request headers.
  • Note:
    The function sends HTTP PUT requests to the URL https://s3.us.archive.org/{identifier}/{directory}/{filename} to perform the upload.
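
The chunked reading and the target URL can be illustrated with the standard library alone. The sketch below is not the script's code: build_upload_url, iter_chunks, and the on_progress callback are hypothetical names, and URL-quoting the object key is an assumption. The on_progress hook receives the size of each chunk, which is the kind of increment a tqdm progress bar's update method expects.

```python
from urllib.parse import quote

CHUNK_SIZE = 2 * 1024 * 1024  # 2 MB, the chunk size described above


def build_upload_url(identifier, directory, filename):
    """Build the PUT target in the {identifier}/{directory}/{filename} form."""
    key = "/".join(part for part in (directory.strip("/"), filename) if part)
    return f"https://s3.us.archive.org/{identifier}/{quote(key)}"


def iter_chunks(file_path, chunk_size=CHUNK_SIZE, on_progress=None):
    """Yield the file in fixed-size chunks, reporting each chunk's byte count."""
    with open(file_path, "rb") as handle:
        while True:
            chunk = handle.read(chunk_size)
            if not chunk:
                break  # end of file
            if on_progress is not None:
                on_progress(len(chunk))
            yield chunk
```

Reading in chunks keeps memory use bounded regardless of file size, which matters for the large uploads this tool is meant to handle.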

3. list_repository_files(identifier)

  • Purpose:
    Retrieves and lists the files in a repository.
  • Key Features:
    • Metadata API: Sends a GET request to https://archive.org/metadata/{identifier} to fetch repository metadata (in JSON).
    • Filtering: Excludes any files that reside in directories with names ending in ".thumbs" (i.e. if any component of the path ends with ".thumbs").
    • Display: Prints out a numbered list of the filtered file names.
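
The filtering rule can be sketched on a sample of the metadata JSON. The Metadata API returns a "files" list whose entries carry a "name" field; the function names visible_files and format_listing are hypothetical, and the sample data is made up for illustration.

```python
def visible_files(metadata):
    """Return file names, skipping paths where any component ends with ".thumbs"."""
    names = [entry["name"] for entry in metadata.get("files", [])]
    return [
        name
        for name in names
        if not any(part.endswith(".thumbs") for part in name.split("/"))
    ]


def format_listing(names):
    """Produce the numbered lines shown to the user, starting at 1."""
    return [f"{index}. {name}" for index, name in enumerate(names, start=1)]


# Made-up sample shaped like a Metadata API response.
sample = {
    "files": [
        {"name": "talk.mp4"},
        {"name": "talk.thumbs/talk_000001.jpg"},
        {"name": "docs/readme.txt"},
    ]
}
listing = format_listing(visible_files(sample))
```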

4. delete_file(identifier, file_path)

  • Purpose:
    Deletes a file from the repository.
  • Key Features:
    • HTTP DELETE: Sends a DELETE request to the S3 endpoint https://s3.us.archive.org/{identifier}/{file_path}.
    • S3 Authentication: Utilizes S3 credentials stored in environment variables.
    • Feedback: Notifies the user whether the file deletion succeeded (checks for HTTP 200 or 204).
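
A minimal sketch of such a request, built with the standard library and not sent anywhere, is shown below. The function names are hypothetical. The "LOW access:secret" authorization value is the scheme documented for the Internet Archive S3 API; whether the script builds its header exactly this way is an assumption.

```python
import urllib.request
from urllib.parse import quote


def build_delete_request(identifier, file_path, access_key, secret_key):
    """Prepare (but do not send) a DELETE request for one file in an item."""
    url = f"https://s3.us.archive.org/{identifier}/{quote(file_path)}"
    headers = {"Authorization": f"LOW {access_key}:{secret_key}"}
    return urllib.request.Request(url, method="DELETE", headers=headers)


def deletion_succeeded(status_code):
    """Mirror the success check described above: HTTP 200 or 204."""
    return status_code in (200, 204)
```

Keeping request construction separate from sending makes the URL and headers easy to inspect in Test Mode without touching the repository.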

5. move_file(identifier, file_name, source_dir, target_dir)

  • Purpose:
    Moves a file from one location in the repository to another.
  • Key Features:
    • Copy-Delete Approach:
      1. Copy: Uses an HTTP PUT request with the x-amz-copy-source header to copy the file to the target directory.
      2. Delete: If the copy is successful, deletes the original file.
    • S3 Authentication: Requires S3 keys from environment variables.
    • Error Handling: Provides error messages if the copy or delete fails.
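
The copy-then-delete sequence can be sketched with the HTTP call injected as a parameter, so the ordering logic is visible without any network access. Everything here is illustrative: the send callable, the auth_header parameter, and the exact form of the x-amz-copy-source value ("/identifier/key") are assumptions rather than details confirmed by the script.

```python
from urllib.parse import quote


def _key(directory, file_name):
    """Join a directory and file name into an object key without stray slashes."""
    return "/".join(part for part in (directory.strip("/"), file_name) if part)


def move_file(identifier, file_name, source_dir, target_dir, auth_header, send):
    """Copy the file to target_dir, then delete the original only if the copy worked.

    `send(method, url, headers)` is a hypothetical callable returning an HTTP status.
    """
    base = f"https://s3.us.archive.org/{identifier}/"
    source_key = _key(source_dir, file_name)
    target_key = _key(target_dir, file_name)
    copy_headers = {
        "Authorization": auth_header,
        "x-amz-copy-source": f"/{identifier}/{quote(source_key)}",
    }
    if send("PUT", base + quote(target_key), copy_headers) not in (200, 204):
        return False  # copy failed, so the original is left untouched
    delete_status = send("DELETE", base + quote(source_key), {"Authorization": auth_header})
    return delete_status in (200, 204)
```

Deleting only after a successful copy means a failed move never loses the file, at the cost of briefly holding two copies.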

6. create_rules_file(folder_path)

  • Purpose:
    Ensures that a local folder contains a _rules.conf file.
  • Key Features:
    • Default Rules: If _rules.conf does not exist, it is created with the default content CAT.ALL.
    • Usage: This file can help control file visibility during repository uploads.
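
A minimal sketch of this behaviour, assuming the file is plain text and that a trailing newline is acceptable (neither is confirmed by this README):

```python
from pathlib import Path


def create_rules_file(folder_path, default_rules="CAT.ALL"):
    """Create _rules.conf with the default content if it does not exist yet.

    Returns True when the file was created and False when one was already present.
    """
    rules_path = Path(folder_path) / "_rules.conf"
    if rules_path.exists():
        return False  # never overwrite rules the user already wrote
    rules_path.write_text(default_rules + "\n", encoding="utf-8")
    return True
```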

7. prompt_metadata()

  • Purpose:
    Collects metadata from the user required to create a new repository.
  • Collected Metadata Includes:
    • Basic Information: Title, description, creator, date, language, license URL.
    • Collection: The user selects one from a predefined list (e.g., community, opensource, texts, movies, audio, image, etree, folksoundomy, games, software).
    • Subject Tags: A comma-separated list, such as "music, history".
    • Test Item Flag: A flag to indicate if the repository is a test item (if "yes" is entered, it sends "true"; if "no", the field is omitted).
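
How the collected answers might be assembled into a metadata dictionary is sketched below. The key names follow common Internet Archive field names (for example licenseurl and subject) as an assumption, and the "test_item" key is purely hypothetical, since this README does not name the field the script sends.

```python
COLLECTIONS = (
    "community", "opensource", "texts", "movies", "audio",
    "image", "etree", "folksoundomy", "games", "software",
)


def build_metadata(title, description, creator, date, language,
                   license_url, collection, subjects, is_test_item):
    """Assemble the metadata dict from raw prompt answers (all strings)."""
    if collection not in COLLECTIONS:
        raise ValueError(f"Unknown collection: {collection}")
    metadata = {
        "title": title,
        "description": description,
        "creator": creator,
        "date": date,
        "language": language,
        "licenseurl": license_url,
        "collection": collection,
        # "music, history" becomes ["music", "history"]; empty tags are dropped.
        "subject": [tag.strip() for tag in subjects.split(",") if tag.strip()],
    }
    if is_test_item.strip().lower() == "yes":
        metadata["test_item"] = "true"  # hypothetical key; omitted entirely for "no"
    return metadata
```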

8. initialize_repository(folder_path, identifier, metadata, mode)

  • Purpose:
    Uploads all files from a specified folder as a new repository.
  • Key Features:
    • Mode Selection:
      • Test Mode: Simulates the upload process without transferring any files.
      • Permanent Mode: Uploads each file via HTTP PUT requests.
    • Recursive Upload: Iterates through all files in the folder (using os.walk).
    • Metadata Submission: After file uploads, sends repository metadata via a POST request.
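
The difference between the two modes can be sketched with the upload step injected as a callable. This is an illustration under assumptions: plan_uploads, the "test" and "permanent" mode strings, and the upload(identifier, local_path, remote_key) signature are hypothetical, and metadata submission is left out.

```python
import os


def plan_uploads(folder_path):
    """Return (local_path, remote_key) pairs for every file under folder_path."""
    pairs = []
    for root, _dirs, files in os.walk(folder_path):
        for name in files:
            local = os.path.join(root, name)
            # Remote keys use forward slashes regardless of the local OS.
            remote = os.path.relpath(local, folder_path).replace(os.sep, "/")
            pairs.append((local, remote))
    return sorted(pairs, key=lambda pair: pair[1])


def initialize_repository(folder_path, identifier, mode, upload):
    """Upload every file in permanent mode; only report the plan in test mode."""
    uploaded = []
    for local, remote in plan_uploads(folder_path):
        if mode == "test":
            print(f"[TEST] would upload {local} -> {identifier}/{remote}")
        else:
            upload(identifier, local, remote)
            uploaded.append(remote)
    return uploaded
```

Because both modes walk the same plan, a Test Mode run shows exactly the files a Permanent Mode run would transfer.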

9. print_help()

  • Purpose:
    Displays a help message describing each menu option and its usage.
  • Features:
    Provides detailed instructions for each operation and references the official Internet Archive CLI documentation for further details.

10. main()

  • Purpose:
    Serves as the entry point of the script with an interactive menu.
  • Key Features:
    • Main Menu: Displays options for uploading, listing, deleting, moving, and downloading files, creating a repository, or viewing help.
    • Conditional Prompts:
      • For Existing Repositories (options 1–5): Prompts for the repository URL after the option selection.
      • For Folder-based Repository Creation (option 6): Gathers folder path, mode, and metadata.
    • Action Dispatch: Calls the corresponding function based on the user’s selection.
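
The conditional prompting and dispatch can be sketched as below. The README only states that options 1–5 operate on an existing repository and option 6 creates one; the specific number-to-action mapping, the "7" help option, and the ask and handlers parameters are hypothetical choices made for this example.

```python
# Hypothetical numbering: the README does not say which number maps to which action.
REPO_OPTIONS = {"1": "upload", "2": "list", "3": "delete", "4": "move", "5": "download"}


def handle_choice(choice, ask, handlers):
    """Prompt for the input an option needs, then call the matching handler.

    `ask(prompt)` stands in for input(); `handlers` maps action names to callables.
    """
    choice = choice.strip()
    if choice in REPO_OPTIONS:
        # Existing-repository actions ask for the repository after the selection.
        repo = ask("Repository URL or identifier: ")
        return handlers[REPO_OPTIONS[choice]](repo)
    if choice == "6":
        folder = ask("Folder to upload as a new repository: ")
        return handlers["create"](folder)
    if choice == "7":
        return handlers["help"]()
    return None  # unrecognised option; the caller can show the menu again
```

Passing input as the ask argument gives the interactive behaviour, while a stub makes the dispatch logic straightforward to exercise in isolation.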

Creator Notes

This script was built using Microsoft Copilot and 5 hours of Harry Munday's lifespan. Enjoy.

About

Python wrapper for the Internet Archive's insufferable CLI tool
