Closed Bug 785102 Opened 12 years ago Closed 12 years ago

Two different libxul.so used while populating startupcache

Categories

(Firefox Build System :: General, defect)

x86_64
Linux
defect
Not set
normal

Tracking

(Not tracked)

RESOLVED FIXED
mozilla17

People

(Reporter: espindola, Assigned: espindola)

References

Details

Attachments

(2 files, 3 obsolete files)

xpcshell links with libxul.so and the script that populates the startupcache dlopens libxul.so. The problem is that with elfhack they are not the same libxul.so.
This happens during "make package". We use bin/run-mozilla.sh which causes ld.so to load the libxul.so in dist/bin. The script then dlopens dist/firefox/libxul.so.
Using dist/firefox/run-mozilla.sh fixes the problem by making both ld.so and dlopen use the libxul.so in dist/firefox.

The dlopen is from osfile_shared_allthreads.jsm:

     let libxul = ctypes.open(OS.Constants.Path.libxul)
A quick summary from IRC:

We can fix this by
* Using dist/firefox/run-mozilla.sh instead of dist/bin/run-mozilla.sh
* Passing just a filename in osfile_shared_allthreads.jsm
* Fixing the path computation in osfile_shared_allthreads.jsm

Looks like openbsd requires a full path to dlopen.
Comment on attachment 654664 [details] [diff] [review]
patch to use dist/firefox/run-mozilla.sh

I'm pretty sure this will break windows builds, because they don't have run-mozilla.sh at all.
Attachment #654664 - Flags: review?(mh+mozilla) → review-
(In reply to Mike Hommey [:glandium] from comment #5)
> Comment on attachment 654664 [details] [diff] [review]
> patch to use dist/firefox/run-mozilla.sh
> 
> I'm pretty sure this will break windows builds, because they don't have
> run-mozilla.sh at all.

On the other hand, if we can solve the Windows problem, using run-mozilla.sh would make startupcache much less error-prone (see e.g. bug 784368).

However, for the current bug, I can add a hack in osfile_shared_allthreads.jsm to not open libxul if we are currently building the startup cache.

Of course, it might be interesting if there is a good way to detect that we are building the startup cache. For the moment, the only technique I know is to check whether |NS_GetSpecialDirectory("ProfD", ...)| success.
(In reply to David Rajchenbach Teller [:Yoric] from comment #6)
> (In reply to Mike Hommey [:glandium] from comment #5)
> > Comment on attachment 654664 [details] [diff] [review]
> > patch to use dist/firefox/run-mozilla.sh
> > 
> > I'm pretty sure this will break windows builds, because they don't have
> > run-mozilla.sh at all.
> 
> On the other hand, if we can solve the Windows problem, using run-mozilla.sh
> would make startupcache much less error-prone (see e.g. bug 784368).

Looks like this is the same bug.

> However, for the current bug, I can add a hack in
> osfile_shared_allthreads.jsm to not open libxul if we are currently building
> the startup cache.

Was there a problem, besides openbsd, loading "libxul.so" or "XUL" or "xul.dll", without a full path?
(In reply to Mike Hommey [:glandium] from comment #7)
> > However, for the current bug, I can add a hack in
> > osfile_shared_allthreads.jsm to not open libxul if we are currently building
> > the startup cache.
> 
> Was there a problem, besides openbsd, loading "libxul.so" or "XUL" or
> "xul.dll", without a full path?

IIRC, MacOS X. I can double-check if necessary.
Comment on attachment 654713 [details] [diff] [review]
Handle windows too

Review of attachment 654713 [details] [diff] [review]:
-----------------------------------------------------------------

::: toolkit/mozapps/installer/packager.mk
@@ +461,5 @@
>  PRECOMPILE_GRE=$$PWD
>  endif
>  
> +ifneq (WINNT,$(OS_ARCH))
> +RUN_FROM_PWD = "$$PWD/run-mozilla.sh"

Sorry to throw the ball again, but i just realized two things:
- OS2 would also be broken by this (yes, OS2 ; I know some IBM people were still building Firefox on it)
- applications built against a xulrunner sdk would likely be broken too.

So, you should safeguard for OS_ARCH==OS2 and LIBXUL_SDK, and still fallback to $(_ABS_RUN_TEST_PROGRAM)
Attachment #654713 - Flags: review?(mh+mozilla) → review-
Attached patch and os 2Splinter Review
Attachment #654713 - Attachment is obsolete: true
Attachment #654722 - Flags: review?
Attachment #654722 - Flags: review? → review?(mh+mozilla)
Comment on attachment 654722 [details] [diff] [review]
and os 2

Review of attachment 654722 [details] [diff] [review]:
-----------------------------------------------------------------

::: toolkit/mozapps/installer/packager.mk
@@ +469,5 @@
> +ifeq (OS2,$(OS_ARCH))
> +# FIXME: this is untested. Is OS/2 using the correct XUL?
> +RUN_TEST_PROGRAM = $(topsrcdir)/build/os2/test_os2.cmd "$(LIBXUL_DIST)"
> +else
> +ifneq (WINNT,$(OS_ARCH))

if you ifneq(,$(filter WINNT OS2,$(OS_ARCH)), you shouldn't need the OS2 part, which, by the way, sets RUN_TEST_PROGRAM instead of RUN_FROM_PWD.
Attachment #654722 - Flags: review?(mh+mozilla) → review+
Note that bug 775588 (just landed) may have fixed the issue.
https://hg.mozilla.org/mozilla-central/rev/7f81af866697
Status: ASSIGNED → RESOLVED
Closed: 12 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla17
RUN_FROM_PWD is not set in the LIBXUL_SDK case :(
We should really start moving away from using run-mozilla.sh, the only remaining use seems to be to set library paths, and we also don't really need to ship it on at least some, but probably all platforms (e.g. bug 715089).
Unfortunately this broke Thunderbird and SeaMonkey as they don't ship run-mozilla.sh on mac, and therefore that doesn't get written into the app directory where the command is run from.

As all run-mozilla.sh is really giving us is the library path definitions, I just checked in a change to fix that, and the case of RUN_FROM_PWD not being set in the libxul_sdk with r=glandium over irc:

https://hg.mozilla.org/mozilla-central/rev/b3c4235d1300
As promised, here is a hack that could resolve the issue by just not loading libxul.so.
How can I reproduce said issue? I can't seem to find it on TryServer.
Attachment #655002 - Flags: review?(mh+mozilla)
(In reply to David Rajchenbach Teller [:Yoric] from comment #18)
> Created attachment 655002 [details] [diff] [review]
> Hack: just do not execute osfile_shared_allthreads.jsm while building the
> startup cache,
> 
> As promised, here is a hack that could resolve the issue by just not loading
> libxul.so.
> How can I reproduce said issue? I can't seem to find it on TryServer.

You can just do a try run with clang.

https://hg.mozilla.org/try/rev/b0fa449f6855

should do it. It includes my patch, you can just replace that part with yours.
(In reply to Mark Banner (:standard8) from comment #17)
> Unfortunately this broke Thunderbird and SeaMonkey as they don't ship
> run-mozilla.sh on mac, and therefore that doesn't get written into the app
> directory where the command is run from.
> 
> As all run-mozilla.sh is really giving us is the library path definitions, I
> just checked in a change to fix that, and the case of RUN_FROM_PWD not being
> set in the libxul_sdk with r=glandium over irc:
> 
> https://hg.mozilla.org/mozilla-central/rev/b3c4235d1300

Unfortunately this broke linux64 debug with a segfault, so I backed out just the linux part of that changeset:

https://hg.mozilla.org/mozilla-central/rev/85634e93f08d
> Unfortunately this broke linux64 debug with a segfault, so I backed out just
> the linux part of that changeset:
> 
> https://hg.mozilla.org/mozilla-central/rev/85634e93f08d

Is that correct? I am afraid OS X will still be loading XUL twice, just not manifesting the same problem as linux.
Comment on attachment 655002 [details] [diff] [review]
Hack: just do not execute osfile_shared_allthreads.jsm while building the startup cache,

Not worth doing that way. Either we work around the issue on the xpcshell launching side, or you find a way to actually load the right library, which, on at least some platforms, is just a matter of loading the file name instead of a full path.
Attachment #655002 - Flags: review?(mh+mozilla) → review-
It seems to me that the "right" solution would be for the cache builder to not execute scripts when it only needs to parse them. This bug, as well as the issue that prevented bug 775588 from landing are just side-effects of the cache builder executing code that is not meant to be executed in raw xpcshell.

In the meantime, rewriting some code to make loading of libxul lazy.
Here is a second version of the workaround. This time, we ensure that libxul is loaded lazily, which should be sufficient for this bug. Still rather fragile, of course.
Attachment #655002 - Attachment is obsolete: true
Attachment #655091 - Flags: review?(mh+mozilla)
Comment on attachment 655091 [details] [diff] [review]
Semi-hack: Loading libxul lazily from osfile_shared_allthreads.jsm

Review of attachment 655091 [details] [diff] [review]:
-----------------------------------------------------------------

This should probably go in a separate bug, now that this one has been addressed at the packaging level.

::: toolkit/components/osfile/osfile_shared_allthreads.jsm
@@ +33,5 @@
>  
> +     // Define a lazy getter for a property
> +     let defineLazyGetter = function(object, name, getter) {
> +       Object.defineProperty(object, name, {
> +         configurable: true,

What is this variable for?

Note the implementation in js/xpconnect/loader/XPCOMUtils.jsm is way simpler. I'd rather you used that one.

@@ +884,5 @@
> +         return exports.OS.Shared.libxul.declare("osfile_ns_free",
> +           ctypes.default_abi,
> +           /*return*/ Types.void_t.implementation,
> +           /*ptr*/ Types.voidptr_t.implementation);
> +       });

You're changing how NS_Free is defined, and don't define wstrdup, but I guess they were not intended to be exported?
Attachment #655091 - Flags: review?(mh+mozilla) → review+
> This should probably go in a separate bug, now that this one has been addressed at the packaging level.

Re-filed as bug 785828. However, comment 16, comment 17 and comment 19 gave me the impression that the current patch is not satisfactory.

> > +     // Define a lazy getter for a property
> > +     let defineLazyGetter = function(object, name, getter) {
> > +       Object.defineProperty(object, name, {
> > +         configurable: true,
> 
> What is this variable for?

Without |configurable|, I cannot replace the getter with something else.
 
> Note the implementation in js/xpconnect/loader/XPCOMUtils.jsm is way
> simpler. I'd rather you used that one.

It uses deprecated methods that do not work in strict mode, so no can do.

> @@ +884,5 @@
> > +         return exports.OS.Shared.libxul.declare("osfile_ns_free",
> > +           ctypes.default_abi,
> > +           /*return*/ Types.void_t.implementation,
> > +           /*ptr*/ Types.voidptr_t.implementation);
> > +       });
> 
> You're changing how NS_Free is defined, and don't define wstrdup, but I
> guess they were not intended to be exported?

Indeed, they are not really meant to be exported, at least not yet.
Depends on: 785748
Product: Core → Firefox Build System
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: