update gold standard for err and load #814

Open
atupone wants to merge 1 commit into TACC:main from atupone:main

atupone commented Apr 19, 2026

Here (Gentoo) some tests do not pass because the err file is different.

This commit fixes it here.

rtmclay commented Apr 22, 2026
Member

The gold files do not change for different versions of the O.S. If there are differences, they are handled by the filter. Many of the differences you are seeing arise because the warning/error messages changed between Lmod 8.7+ and Lmod 9.2+.

If there is an issue with the tests, you will have to upgrade to the latest version of Lmod 9.2+ and rerun the tests.

atupone commented Apr 22, 2026
Author

I did that while running 9.2.

rtmclay commented May 15, 2026
Member

Please see the discussion in Issue #821.

If you have time, please rerun your tests with Lmod 9.2.1.

atupone commented May 18, 2026
Author

The help test case is fixed in 9.2.1; I only have to add

--- a/rt/help/help.tdesc 2026-05-18 16:03:41.653576306 +0200
+++ b/rt/help/help.tdesc 2026-05-18 16:06:38.378582894 +0200
@@ -45,6 +45,8 @@
 cat _stderr.[0-9][0-9][0-9] > _stderr.orig
 cleanUp _stderr.orig err.txt
 
+sed -i '/libsandbox.so/d' err.txt
+
 rm -f results.csv
 wrapperDiff --csv results.csv $(testDir)/out.txt out.txt
 wrapperDiff --csv results.csv $(testDir)/err.txt err.txt

because our sandbox injects libsandbox into LD_PRELOAD.
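
For readers unfamiliar with the sed expression in that patch, here is a minimal sketch of what it does to an err.txt. The file contents below are made up for illustration (the real test output is not shown in this thread), and the sketch writes to a second file instead of using -i so that it behaves the same with any sed:

```shell
#!/bin/sh
# Hypothetical err.txt: the middle line stands in for whatever the
# sandbox adds; only the string "libsandbox.so" matters here.
workdir=$(mktemp -d)
cat > "$workdir/err.txt" <<'EOF'
step 1
some line mentioning /usr/lib64/libsandbox.so
step 2
EOF

# Same expression as in the patch: delete every line containing
# "libsandbox.so" (the dot matches any character, which is harmless here).
sed '/libsandbox.so/d' "$workdir/err.txt" > "$workdir/err.filtered.txt"

filtered=$(cat "$workdir/err.filtered.txt")
printf '%s\n' "$filtered"
rm -rf "$workdir"
```

Only the lines that mention libsandbox.so are dropped; every other line reaches the comparison with the gold file unchanged.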

For the load test I still have a problem:

$ diff -u _err.left _err.right 
--- _err.left	2026-05-18 16:15:06.726850310 +0200
+++ _err.right	2026-05-18 16:15:06.727850299 +0200
@@ -125,20 +125,16 @@
 step 18
 lua ProjectDIR/src/lmod.in.lua shell --regression_testing load boost/1.33.0
 ===========================
-Lmod has detected the following error: The following module(s) are unknown: "boost/1.33.0"
-Please check the spelling or version number. Also try "module spider ..."
-It is also possible your cache file is out-of-date; it may help to try:
-  $ module --ignore_cache load "boost/1.33.0"
-Also make sure that all modulefiles written in TCL start with the string #%Module
+Lmod has detected the following error: These module(s) or extension(s) exist but cannot be loaded as requested: "boost/1.33.0"
+   Try: "module spider boost/1.33.0" to see how to load the module(s).
+   The requested module(s) require a toolchain that is incompatible with the currently loaded environment.
 ===========================
 step 19
 lua ProjectDIR/src/lmod.in.lua shell --regression_testing load boost/1.57.0
 ===========================
-Lmod has detected the following error: The following module(s) are unknown: "boost/1.57.0"
-Please check the spelling or version number. Also try "module spider ..."
-It is also possible your cache file is out-of-date; it may help to try:
-  $ module --ignore_cache load "boost/1.57.0"
-Also make sure that all modulefiles written in TCL start with the string #%Module
+Lmod has detected the following error: These module(s) or extension(s) exist but cannot be loaded as requested: "boost/1.57.0"
+   Try: "module spider boost/1.57.0" to see how to load the module(s).
+   The requested module(s) require a toolchain that is incompatible with the currently loaded environment.
 ===========================
 step 20
 lua ProjectDIR/src/lmod.in.lua shell --regression_testing load boost
@@ -213,11 +209,9 @@
 step 30
 lua ProjectDIR/src/lmod.in.lua shell --regression_testing load DoesNotExist
 ===========================
-Lmod has detected the following error: The following module(s) are unknown: "DoesNotExist"
-Please check the spelling or version number. Also try "module spider ..."
-It is also possible your cache file is out-of-date; it may help to try:
-  $ module --ignore_cache load "DoesNotExist"
-Also make sure that all modulefiles written in TCL start with the string #%Module
+Lmod has detected the following error: These module(s) or extension(s) exist but cannot be loaded as requested: "DoesNotExist"
+   Try: "module spider DoesNotExist" to see how to load the module(s).
+   The requested module(s) require a toolchain that is incompatible with the currently loaded environment.
 ===========================
 step 31
 lua ProjectDIR/src/lmod.in.lua shell --regression_testing list
@@ -334,29 +328,23 @@
 step 39
 lua ProjectDIR/src/lmod.in.lua shell --regression_testing load bad_symlink
 ===========================
-Lmod has detected the following error: The following module(s) are unknown: "bad_symlink"
-Please check the spelling or version number. Also try "module spider ..."
-It is also possible your cache file is out-of-date; it may help to try:
-  $ module --ignore_cache load "bad_symlink"
-Also make sure that all modulefiles written in TCL start with the string #%Module
+Lmod has detected the following error: These module(s) or extension(s) exist but cannot be loaded as requested: "bad_symlink"
+   Try: "module spider bad_symlink" to see how to load the module(s).
+   The requested module(s) require a toolchain that is incompatible with the currently loaded environment.
 ===========================
 step 40
 lua ProjectDIR/src/lmod.in.lua shell --regression_testing load empty
 ===========================
-Lmod has detected the following error: The following module(s) are unknown: "empty"
-Please check the spelling or version number. Also try "module spider ..."
-It is also possible your cache file is out-of-date; it may help to try:
-  $ module --ignore_cache load "empty"
-Also make sure that all modulefiles written in TCL start with the string #%Module
+Lmod has detected the following error: These module(s) or extension(s) exist but cannot be loaded as requested: "empty"
+   Try: "module spider empty" to see how to load the module(s).
+   The requested module(s) require a toolchain that is incompatible with the currently loaded environment.
 ===========================
 step 41
 lua ProjectDIR/src/lmod.in.lua shell --regression_testing load version_only
 ===========================
-Lmod has detected the following error: The following module(s) are unknown: "version_only"
-Please check the spelling or version number. Also try "module spider ..."
-It is also possible your cache file is out-of-date; it may help to try:
-  $ module --ignore_cache load "version_only"
-Also make sure that all modulefiles written in TCL start with the string #%Module
+Lmod has detected the following error: These module(s) or extension(s) exist but cannot be loaded as requested: "version_only"
+   Try: "module spider version_only" to see how to load the module(s).
+   The requested module(s) require a toolchain that is incompatible with the currently loaded environment.
 ===========================
 step 42

rtmclay commented May 18, 2026
Member

After looking at your new diff, it is clear that Lmod 9.2.1 is generating an error called "e_FAILED_LOAD" and you are generating an error called "e_FAILED_LOAD_2". They are different errors because Lmod is not following the same path on your machine and mine. It is not that the gold file err.txt has to change; it is that the different path through the code has to be resolved.
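
A quick way to see which of the two messages an err.txt contains is to count them with grep. This is only a sketch: the sample lines are copied from the diff earlier in this thread, and the assumption that the "are unknown" text belongs to e_FAILED_LOAD while the "exist but cannot be loaded" text belongs to e_FAILED_LOAD_2 is inferred from that diff, not checked against the Lmod source.

```shell
#!/bin/sh
# Sample file built from message lines quoted in the diff above.
sample=$(mktemp)
cat > "$sample" <<'EOF'
Lmod has detected the following error: The following module(s) are unknown: "DoesNotExist"
Lmod has detected the following error: These module(s) or extension(s) exist but cannot be loaded as requested: "boost/1.33.0"
Lmod has detected the following error: These module(s) or extension(s) exist but cannot be loaded as requested: "empty"
EOF

count_matching() {
    # -c prints the number of matching lines; -F takes the pattern literally.
    grep -c -F "$1" "$2"
}

unknown_count=$(count_matching 'The following module(s) are unknown' "$sample")
cannot_load_count=$(count_matching 'exist but cannot be loaded as requested' "$sample")

echo "unknown-module messages:        $unknown_count"
echo "exists-but-cannot-load messages: $cannot_load_count"
rm -f "$sample"
```

Running the same two counts on the real err.txt from each machine would show at a glance which code path each run took.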

I have uploaded Lmod 9.2.2 which fixes the issue with the help test. It now checks that ld_preload has a value and does not care what the value is.

Resolving this issue requires that I be able to reproduce it. There are a couple of ways to handle this:

  • Ignore it. You do not need to run the tests. The only issue here is that unknown files get a different message about the module not being found.
  • Modify the apptainer container setup found in "Test failures when installing Lmod 9.2 with Gentoo Prefix" (#821) to reproduce your issue. This would be best, as I would be able to test and iterate to find why it is taking different paths locally.
  • Help me with the debugging, using the latest version of Lmod.

To help me debug, you would have to do the following steps:

  1. Modify line 53 in rt/load.tdesc from runLmod load boost/1.33.0 to runLmod -D load boost/1.33.0
  2. Run the commands: cd rt/load; rm -rf t1; tm .; tar czf t1.load.tgz t1
  3. Attach t1.load.tgz here

This may take a few iterations of you downloading new versions of Lmod and redoing steps 1-3.
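
Since steps 1-3 may be repeated several times, step 1 could be scripted. A minimal sketch follows; add_debug_flag is a hypothetical helper name, it matches the text of the step rather than line 53 (so every line with that exact text is changed), and the demonstration runs on a throwaway file whose two lines are stand-ins, not the real test description:

```shell
#!/bin/sh
# Hypothetical helper: add -D to each "runLmod load boost/1.33.0" step
# in the given file.
add_debug_flag() {
    tdesc=$1
    tmp=$(mktemp)
    sed 's|runLmod load boost/1.33.0|runLmod -D load boost/1.33.0|' "$tdesc" > "$tmp" &&
        cat "$tmp" > "$tdesc"   # overwrite in place, keeping the file's permissions
    rm -f "$tmp"
}

# Demonstration on a throwaway file.
demo=$(mktemp)
printf '%s\n' 'runLmod load boost/1.33.0' 'runLmod load boost/1.57.0' > "$demo"
add_debug_flag "$demo"
patched=$(cat "$demo")
printf '%s\n' "$patched"
rm -f "$demo"

# On a real checkout, steps 2 and 3 would then be the commands quoted above:
#   cd rt/load; rm -rf t1; tm .; tar czf t1.load.tgz t1
```

Only the boost/1.33.0 step gains the -D flag; the other steps are left as they are.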

bedroge commented May 19, 2026

Strange, for me the 9.2.1 version from gentoo/gentoo#46278 worked fine with Gentoo Prefix and passed all the tests, while for the one that you actually merged (gentoo/gentoo@346253a, with additional patches) the load test does fail again.

atupone commented May 19, 2026
Author

> Strange, for me the 9.2.1 version from gentoo/gentoo#46278 worked fine with Gentoo Prefix and passed all the tests, while for the one that you actually merged (gentoo/gentoo@346253a, with additional patches) the load test does fail again.

I guess some of the dependencies have different versions. I work with the stable tree, except for the things I am working on.

atupone commented May 19, 2026
Author

t1.zip
Zipping a tar.gz is silly, but tgz is not accepted.
lmod-9.2.2.txt
I am also attaching the ebuild (the build instructions) that we use. Not all of it is my own work.

rtmclay commented May 21, 2026
Member

The tests did not run correctly:

===========================
step 1
lua ProjectDIR/src/lmod.in.lua shell --regression_testing --version
===========================
lua: /usr/share/lua/5.1/posix/init.lua:23: module 'posix.glob' not found:
    no field package.preload['posix.glob']
    no file '/opt/hermes/lib/posix/glob.lua'
    no file '/opt/hermes/tools/posix/glob.lua'
    no file './posix/glob.lua'
    no file '/usr/share/lua/5.1/posix/glob.lua'
    no file '/usr/share/lua/5.1/posix/glob/init.lua'
    no file '/usr/lib64/lua/5.1/posix/glob.lua'
    no file '/usr/lib64/lua/5.1/posix/glob/init.lua'
    no file '/usr/lib64/lua/5.4/posix/glob.so'
    no file '/usr/lib64/lua/5.4/loadall.so'
    no file './posix/glob.so'
    no file '/usr/lib64/lua/5.4/posix.so'
    no file '/usr/lib64/lua/5.4/loadall.so'
    no file './posix.so'
stack traceback:
    [C]: in function 'require'
    /usr/share/lua/5.1/posix/init.lua:23: in main chunk
    [C]: in function 'require'
    ...e/sys-cluster/lmod-9.2.2/work/Lmod-9.2.2/src/lmod.in.lua:61: in main chunk
    [C]: in ?

Somehow your test setup did not find luaposix. Can you add it? Is there something I can do?

atupone commented May 22, 2026
Author

I now have luaposix installed.
The situation has not changed, but I was able to pass the test when I set lua to 5.1 globally.
When I set it to 5.3 globally, that test fails, even though I pass lua 5.1 at configure time.

atupone commented May 22, 2026
Author

Maybe there is a PATH cleanup during the test? The directory /var/tmp/portage/sys-cluster/lmod-9.2.2/temp/lua5.1/bin is put at the front of PATH so that the selected lua is chosen.

rtmclay commented May 22, 2026
Member

I am not sure what you are saying. One point I want to check: the version of Lua and the version luaposix was built for must match. If you have Lua 5.1 installed and then install luaposix with Lua 5.1, you can't use Lua 5.3 with the luaposix built with Lua 5.1.
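
One way to check that match is to ask each interpreter to load the posix module. The sketch below is an assumption-laden illustration: has_luaposix is a hypothetical helper, and the interpreter names are the ones mentioned in this thread, which may not exist on other systems.

```shell
#!/bin/sh
# Hypothetical helper: returns 0 if the given interpreter can load luaposix,
# 2 if the interpreter is not on PATH, and non-zero otherwise.
has_luaposix() {
    interp=$1
    command -v "$interp" >/dev/null 2>&1 || return 2
    "$interp" -e 'require("posix")' >/dev/null 2>&1
}

for interp in lua lua5.1 lua5.3 lua5.4; do
    if has_luaposix "$interp"; then
        echo "$interp: luaposix loads"
    else
        echo "$interp: luaposix does not load (or the interpreter is missing)"
    fi
done
```

If luaposix loads under one version but not under the plain lua name, the tests may be picking up an interpreter that luaposix was not built for.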

What do you mean when you say that the test passes with Lua 5.1? Does cd rt/load; tm . pass or something else?

atupone commented May 22, 2026
Author

In Gentoo we can have more than one version of lua installed. On my system I have:
$ eselect lua list
[1] lua5.1 *
[2] lua5.3
[3] lua5.4
[4] luajit-2.1.1731601260
The * marks the selected one.

luaposix is built with lua5.1:
[ebuild R ] dev-lua/luaposix-36.3::gentoo USE="-doc" LUA_TARGETS="lua5-1 -lua5-3 -lua5-4 -luajit" 0 KiB
lmod is also built with lua5.1 here:
[ebuild UD~] sys-cluster/lmod-8.6.14-r1::gentoo [9.2::gentoo] USE="auto-swap cache -duplicate-paths -test*" LUA_SINGLE_TARGET="lua5-1 -lua5-3" 0 KiB
When I select lua5.1 globally:
lrwxrwxrwx 1 root root 6 May 22 13:28 /usr/bin/lua -> lua5.1
the tests work.
If I select lua5.3 globally:
$ ls -l /usr/bin/lua
lrwxrwxrwx 1 root root 6 May 22 14:57 /usr/bin/lua -> lua5.3
they do not.
The build and the tests run inside an environment that selects lua5.1 as the lua executable without changing the system one. So the correct value is passed to configure, and a directory where lua points to lua5.1 is also added to PATH.
It seems to me (I'm not a lua guy) that this particular test is escaping this setting.
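
To illustrate how a test could "escape" the setting, here is a small sketch with two stand-in lua executables. Everything in it (directory layout, version strings) is made up; it only shows the general PATH mechanism and does not claim that this is what the load test actually does.

```shell
#!/bin/sh
# Two fake "lua" commands standing in for the system lua and the build lua.
demo=$(mktemp -d)
mkdir -p "$demo/system/bin" "$demo/build/bin"
printf '#!/bin/sh\necho "Lua 5.3"\n' > "$demo/system/bin/lua"
printf '#!/bin/sh\necho "Lua 5.1"\n' > "$demo/build/bin/lua"
chmod +x "$demo/system/bin/lua" "$demo/build/bin/lua"

# "System" PATH: the globally selected lua (5.3 in the failing case).
system_path="$demo/system/bin:$PATH"

# Build environment: the lua5.1 directory is placed in front of PATH.
with_prefix=$(PATH="$demo/build/bin:$system_path"; export PATH; lua)

# If some step rebuilds PATH without that directory, the system lua wins again.
after_reset=$(PATH="$system_path"; export PATH; lua)

echo "with build dir in front: $with_prefix"
echo "after PATH is reset:     $after_reset"
rm -rf "$demo"
```

If the real test rewrites PATH in a similar way, a bare lua call inside it would resolve to the globally selected interpreter rather than the one chosen at configure time.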

rtmclay commented May 26, 2026
Member

I would like to go back to the original issue. The load test does change $PATH, so, while unlikely, this might have caused a different version of lua to be run.

To get around this possible problem, I have changed the modulefiles so that any path that the modules add starts with "/unknown". I have also modified load.tdesc to generate debug info when run. This means that the test cannot pass. However, it will produce debugging info that I can use to figure out what the issue might be.

Please run your regular install and test with the IS814-gentoo branch of Lmod. Then please make a zip file of the rt/load/t1 directory tree and attach it here.

atupone commented May 26, 2026
Author

t1-lua5.1.zip
t1-lua5.3.zip
I have added two zips: one for when lua is a link to lua5.1 and one for when lua is a link to lua5.3. It is more than you asked for, but I think it can help.

I can also add the results for when lmod is built with lua5.3 and the global lua is 5.1 or 5.3. Just ask if you need them.
