Skip to content

MDEV-39092 Copy Aria data and logs as part of backup#4971

Draft
mariadb-andrzejjarzabek wants to merge 4 commits into
MariaDB:MDEV-14992from
mariadb-andrzejjarzabek:MDEV-39092
Draft

MDEV-39092 Copy Aria data and logs as part of backup#4971
mariadb-andrzejjarzabek wants to merge 4 commits into
MariaDB:MDEV-14992from
mariadb-andrzejjarzabek:MDEV-39092

Conversation

@mariadb-andrzejjarzabek
Copy link
Copy Markdown

This is an initial simple implementation which copies all the Aria files in the "end" phase of the backup. Nothing protects the copy from concurrent DDL or DML. Copying only works on MacOS (intended for refactoring to use common file copy method across engines and SQL layer).

@CLAassistant
Copy link
Copy Markdown

CLAassistant commented Apr 22, 2026

CLA assistant check
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you sign our Contributor License Agreement before we can accept your contribution.
You have signed the CLA already but the status is still pending? Let us recheck it.

Comment thread include/my_backup.h Outdated
namespace backup
{

using target_dir_t= IF_WIN(const char*,int);
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This violates MDEV-25861. We should not use the reserved POSIX _t suffix in any new type declarations.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we have a suggested naming convention for types/typedefs? Target_dir? If just target_dir, do we have a convention for naming a variable of that type representing the target directory (which in current code is called "target_dir")?

Comment thread include/my_backup.h Outdated

#pragma once

#include <cstdint>
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why would we include this header here? For uintptr_t perhaps? But we also use IF_WIN() and off_t without any declaration.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, for uintptr_t. IF_WIN is included by my_global.h from the source file - I notice it's common to not include specifically header dependencies if they're included by my_global.h. For off_t - agreed, I'll add sys/types.h.

Comment thread include/my_backup.h Outdated
Comment on lines +25 to +38
using target_dir_t= IF_WIN(const char*,int);

inline void* to_void_ptr(target_dir_t tgt) noexcept
{
return IF_WIN(const_cast<char*>, reinterpret_cast<void*>)(tgt);
}

inline target_dir_t to_target_dir(void* ptr) noexcept
{
return IF_WIN(static_cast<const char*>(ptr),
int(reinterpret_cast<uintptr_t>(ptr)));
}

#ifndef _WIN32
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As far as I understand, this header is not at all usable on Windows, because Microsoft Windows does not provide primitives for copying data between file descriptors, other than TransmitFile(), which we would put into use in MDEV-38362.

I don’t see the value of including this header on Microsoft Windows (until we make it cover file streaming) or defining such conversion functions. On Windows, we will need a different way of copying files, based on file names. I don’t think we should try to shoehorn it into a common interface.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

First, the header defines target_dir_t (to be renamed), which de-duplicates all the IF_WIN(const char*,int). to_void_ptr and to_target_dir are used in one header only and could be defined there, but are linked to the definition of the target dir type, so my thinking is best to keep them together (but I'm not adamant on this, I can move them to sql_backup.cc).

But also more generally this header is meant for common code for BACKUP SERVER which is shared between SQLlayer and plugins. In the current implementation both engine-agnostic files like db.opt and *.frm and Aria specific files are copied by the Aria plugin, but that will change. At that point it will be useful to have a function which copies a file in its entirety from data directory to backup target, which works on all platforms. The intention is for this header to be where functions like that are declared. Similarly the code which traverses the data directory and its subdirectories may at that point have to be extracted to be re-used between SQL and plugin layers.

Comment thread mysys/CMakeLists.txt Outdated
file_logger.c my_dlerror.c crc32/crc32c.cc
my_timezone.cc my_compr_int.cc my_thread_name.cc
my_virtual_mem.c)
my_virtual_mem.c my_backup.cc)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The functions are about copying files, not backup. The file name is misleading.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's intended for "implementation of common functions needed by BACKUP SERVER" (or definitions of functions/statics declared in my_backup.h - incidentally it contains just the copy function now but other shared functions are likely to end up in there. As an example a cross -platform (including Windows) function to "ensure a subdirectory exists in target directory (create if needed)" will probably end up in there, and that using the specific way that BACUP SERVER code defines a "target directory" (which may not be shared by non-backup related code which defines the concept of a "target directory" for its own purposes).

But when we consider misleading names, one existing problem in this codebase is that the term "backup" (and therefore e.g. source file names containing that word" is used in several different meanings: supporting backup via external tools (BACKUP STAGE etc.), copying files to the cloud using the S3 storage engine, and now BACKUP SERVER. This leads to less-than-fortunate situations where sql sourc directory has unrelated backup.* and sql_backup.* files, Aria plugin gets ma_backup.h and ma_backup.cc in addition to existing but unrelated ma_backup.c which defines functions in include/aria_backup.h etc. I thought about using inserver_backup as naming convention for everything specifically related to BACKUP SERVER, it seemed a bit long, but maybe that's not a problem? Or maybe have a shortened version of that (ins_backup?)

Comment thread mysys/my_backup.cc Outdated
Comment on lines +170 to +176
# endif
DBUG_ASSERT(ret <= 0);
return int(ret);
#endif
}
#endif
} No newline at end of file
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The preprocessor directives are indented inconsistently, and the last line is missing a terminator.

Comment thread sql/handler.h Outdated
Comment on lines +1908 to +1909
int (*backup_start)(THD *thd, IF_WIN(const char*,int) target);
int (*backup_start)(THD *thd, backup::target_dir_t target);
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The name change is misleading. In MDEV-38362, this interface is subject to revision. Then it’s thinkable that the int argument would be the handle of an output a stream rather than of a target directory.

Comment thread sql/sql_backup.cc Outdated
Comment on lines +22 to +24
#include "my_backup.h"

using namespace backup;
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we define the file-copying API in C? Most of mysys is in C.

Comment thread storage/maria/ma_backup.cc Outdated
Comment on lines +299 to +300
/* TODO: .frm failes are nto Aria-specific, they are copied here as a stop-gap */
const std::vector<std::string> Aria_backup::data_exts {".MAD"s, ".MAI", ".frm"s};
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is this not an array of const char*?

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The reason is because it's useful to know the length of the suffix when checking if a string ends with it. Obviously we may either use strlen every time or pre-calculate it but this seemed like a reasonable solution between performance and legibility. Note that in any mainstream implementation of the standard library this particular case will not do dynamic allocation here because these suffixes fall under short string optimization. But it does make sense to make it a static member, which I will do.

Comment thread storage/maria/ma_backup.cc Outdated
Comment on lines +229 to +234
int copy_file(const std::string &path) const noexcept
{
std::string src_path= std::string(maria_data_root) + "/" + path;
#ifndef _WIN32
int ret_val = 0;
int src_fd = open(src_path.c_str(), O_RDONLY);
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Outside Windows, could we please retain a handle to open(maria_data_root, O_DIRECTORY) and invoke openat(src_dir, path, O_RDONLY) to open the file? That would reduce the amount of memory heap operations.

I’d avoid std::string whenever possible. We already have too many issues around heap memory fragmentation. Even on Windows we could use NtCreateFile() to simulate openat(2). I will give it a try, because it would allow us to pass target directory handles rather than names across all platforms.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will a string allocation which is done once per backup per file and then very quickly freed noticeably impact heap fragmentation or performance in general?

Comment on lines +108 to +119
int perform_backup() noexcept
{
if (scan_datadir())
return 1;
if (copy_databases())
return 1;
if (copy_control_file())
return 1;
if (copy_logs())
return 1;
return 0;
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is fundamentally incompatible with the handlerton::backup_step API. Even if you are for now copying everything in handlerton::backup_end, the internal API should be kept as compatible with handlerton::backup_step as possible: copying one file at a time. At least the copy_databases() step must be refactored so that the higher-level API is invoking something that copies one file at a time. Possibly, the copying of log files should be interleaved with that, like innodb_backup_step is doing.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a stop-gap to get something working. Eventual final implementation will require re-working the hadlerton API and Sql_cmd_backup to take into account that different subsets of files may be copied under diffefent levels of metadata lock, and also "start" and potentially even "end" may need to be split into different phases with different levels of metadata lock. Given that the stage API is written that way to support multi-threaded copy, which isn't implemented at this time, I propose to merge the change as proposed (I will fix some of the other problems you mentioned) and iteratively improve from there.

This is an initial simple implementation which copies all the Aria files
in the "end" phase of the backup. Nothing protects the copy from
concurrent DDL or DML. Copying only works on MacOS (intended for
refactoring to use common file copy method across engines and SQL
layer).
Copy non-Aria-specific files *.frm and db.opt as part of Aria backup.
Add MTR test. Copy MyISAM files in Aria plugin so that the MTR works.
Perform the end step under maximum level of backup MDL to safely copy
non-transactional Aria and MyISAM files.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Development

Successfully merging this pull request may close these issues.

4 participants