MDEV-39092 Copy Aria data and logs as part of backup#4971
MDEV-39092 Copy Aria data and logs as part of backup#4971mariadb-andrzejjarzabek wants to merge 4 commits into
Conversation
|
|
4ef94be to
c143ff2
Compare
| namespace backup | ||
| { | ||
|
|
||
| using target_dir_t= IF_WIN(const char*,int); |
There was a problem hiding this comment.
This violates MDEV-25861. We should not use the reserved POSIX _t suffix in any new type declarations.
There was a problem hiding this comment.
Do we have a suggested naming convention for types/typedefs? Target_dir? If just target_dir, do we have a convention for naming a variable of that type representing the target directory (which in current code is called "target_dir")?
|
|
||
| #pragma once | ||
|
|
||
| #include <cstdint> |
There was a problem hiding this comment.
Why would we include this header here? For uintptr_t perhaps? But we also use IF_WIN() and off_t without any declaration.
There was a problem hiding this comment.
Yes, for uintptr_t. IF_WIN is included by my_global.h from the source file - I notice it's common to not include specifically header dependencies if they're included by my_global.h. For off_t - agreed, I'll add sys/types.h.
| using target_dir_t= IF_WIN(const char*,int); | ||
|
|
||
| inline void* to_void_ptr(target_dir_t tgt) noexcept | ||
| { | ||
| return IF_WIN(const_cast<char*>, reinterpret_cast<void*>)(tgt); | ||
| } | ||
|
|
||
| inline target_dir_t to_target_dir(void* ptr) noexcept | ||
| { | ||
| return IF_WIN(static_cast<const char*>(ptr), | ||
| int(reinterpret_cast<uintptr_t>(ptr))); | ||
| } | ||
|
|
||
| #ifndef _WIN32 |
There was a problem hiding this comment.
As far as I understand, this header is not at all usable on Windows, because Microsoft Windows does not provide primitives for copying data between file descriptors, other than TransmitFile(), which we would put into use in MDEV-38362.
I don’t see the value of including this header on Microsoft Windows (until we make it cover file streaming) or defining such conversion functions. On Windows, we will need a different way of copying files, based on file names. I don’t think we should try to shoehorn it into a common interface.
There was a problem hiding this comment.
First, the header defines target_dir_t (to be renamed), which de-duplicates all the IF_WIN(const char*,int). to_void_ptr and to_target_dir are used in one header only and could be defined there, but are linked to the definition of the target dir type, so my thinking is best to keep them together (but I'm not adamant on this, I can move them to sql_backup.cc).
But also more generally this header is meant for common code for BACKUP SERVER which is shared between SQLlayer and plugins. In the current implementation both engine-agnostic files like db.opt and *.frm and Aria specific files are copied by the Aria plugin, but that will change. At that point it will be useful to have a function which copies a file in its entirety from data directory to backup target, which works on all platforms. The intention is for this header to be where functions like that are declared. Similarly the code which traverses the data directory and its subdirectories may at that point have to be extracted to be re-used between SQL and plugin layers.
| file_logger.c my_dlerror.c crc32/crc32c.cc | ||
| my_timezone.cc my_compr_int.cc my_thread_name.cc | ||
| my_virtual_mem.c) | ||
| my_virtual_mem.c my_backup.cc) |
There was a problem hiding this comment.
The functions are about copying files, not backup. The file name is misleading.
There was a problem hiding this comment.
It's intended for "implementation of common functions needed by BACKUP SERVER" (or definitions of functions/statics declared in my_backup.h - incidentally it contains just the copy function now but other shared functions are likely to end up in there. As an example a cross -platform (including Windows) function to "ensure a subdirectory exists in target directory (create if needed)" will probably end up in there, and that using the specific way that BACUP SERVER code defines a "target directory" (which may not be shared by non-backup related code which defines the concept of a "target directory" for its own purposes).
But when we consider misleading names, one existing problem in this codebase is that the term "backup" (and therefore e.g. source file names containing that word" is used in several different meanings: supporting backup via external tools (BACKUP STAGE etc.), copying files to the cloud using the S3 storage engine, and now BACKUP SERVER. This leads to less-than-fortunate situations where sql sourc directory has unrelated backup.* and sql_backup.* files, Aria plugin gets ma_backup.h and ma_backup.cc in addition to existing but unrelated ma_backup.c which defines functions in include/aria_backup.h etc. I thought about using inserver_backup as naming convention for everything specifically related to BACKUP SERVER, it seemed a bit long, but maybe that's not a problem? Or maybe have a shortened version of that (ins_backup?)
| # endif | ||
| DBUG_ASSERT(ret <= 0); | ||
| return int(ret); | ||
| #endif | ||
| } | ||
| #endif | ||
| } No newline at end of file |
There was a problem hiding this comment.
The preprocessor directives are indented inconsistently, and the last line is missing a terminator.
| int (*backup_start)(THD *thd, IF_WIN(const char*,int) target); | ||
| int (*backup_start)(THD *thd, backup::target_dir_t target); |
There was a problem hiding this comment.
The name change is misleading. In MDEV-38362, this interface is subject to revision. Then it’s thinkable that the int argument would be the handle of an output a stream rather than of a target directory.
| #include "my_backup.h" | ||
|
|
||
| using namespace backup; |
There was a problem hiding this comment.
Can we define the file-copying API in C? Most of mysys is in C.
| /* TODO: .frm failes are nto Aria-specific, they are copied here as a stop-gap */ | ||
| const std::vector<std::string> Aria_backup::data_exts {".MAD"s, ".MAI", ".frm"s}; |
There was a problem hiding this comment.
Why is this not an array of const char*?
There was a problem hiding this comment.
The reason is because it's useful to know the length of the suffix when checking if a string ends with it. Obviously we may either use strlen every time or pre-calculate it but this seemed like a reasonable solution between performance and legibility. Note that in any mainstream implementation of the standard library this particular case will not do dynamic allocation here because these suffixes fall under short string optimization. But it does make sense to make it a static member, which I will do.
| int copy_file(const std::string &path) const noexcept | ||
| { | ||
| std::string src_path= std::string(maria_data_root) + "/" + path; | ||
| #ifndef _WIN32 | ||
| int ret_val = 0; | ||
| int src_fd = open(src_path.c_str(), O_RDONLY); |
There was a problem hiding this comment.
Outside Windows, could we please retain a handle to open(maria_data_root, O_DIRECTORY) and invoke openat(src_dir, path, O_RDONLY) to open the file? That would reduce the amount of memory heap operations.
I’d avoid std::string whenever possible. We already have too many issues around heap memory fragmentation. Even on Windows we could use NtCreateFile() to simulate openat(2). I will give it a try, because it would allow us to pass target directory handles rather than names across all platforms.
There was a problem hiding this comment.
Will a string allocation which is done once per backup per file and then very quickly freed noticeably impact heap fragmentation or performance in general?
| int perform_backup() noexcept | ||
| { | ||
| if (scan_datadir()) | ||
| return 1; | ||
| if (copy_databases()) | ||
| return 1; | ||
| if (copy_control_file()) | ||
| return 1; | ||
| if (copy_logs()) | ||
| return 1; | ||
| return 0; | ||
| } |
There was a problem hiding this comment.
This is fundamentally incompatible with the handlerton::backup_step API. Even if you are for now copying everything in handlerton::backup_end, the internal API should be kept as compatible with handlerton::backup_step as possible: copying one file at a time. At least the copy_databases() step must be refactored so that the higher-level API is invoking something that copies one file at a time. Possibly, the copying of log files should be interleaved with that, like innodb_backup_step is doing.
There was a problem hiding this comment.
This is a stop-gap to get something working. Eventual final implementation will require re-working the hadlerton API and Sql_cmd_backup to take into account that different subsets of files may be copied under diffefent levels of metadata lock, and also "start" and potentially even "end" may need to be split into different phases with different levels of metadata lock. Given that the stage API is written that way to support multi-threaded copy, which isn't implemented at this time, I propose to merge the change as proposed (I will fix some of the other problems you mentioned) and iteratively improve from there.
efbc62b to
c77278b
Compare
e89625e to
06fd556
Compare
4d6a19c to
d86a6a1
Compare
d86a6a1 to
c6f5119
Compare
This is an initial simple implementation which copies all the Aria files in the "end" phase of the backup. Nothing protects the copy from concurrent DDL or DML. Copying only works on MacOS (intended for refactoring to use common file copy method across engines and SQL layer).
Copy non-Aria-specific files *.frm and db.opt as part of Aria backup.
Add MTR test. Copy MyISAM files in Aria plugin so that the MTR works. Perform the end step under maximum level of backup MDL to safely copy non-transactional Aria and MyISAM files.
c6f5119 to
9ebb23e
Compare
This is an initial simple implementation which copies all the Aria files in the "end" phase of the backup. Nothing protects the copy from concurrent DDL or DML. Copying only works on MacOS (intended for refactoring to use common file copy method across engines and SQL layer).