Background
I had recently been gifted a Lenovo ThinkPad X220. This specific laptop was chosen due to its solid performance on FreeBSD and its popularity amongst FreeBSD developers. In setting it up, I decided that I wanted to ditch BIOS and go pure UEFI. This triggered a small journey. =)
Rough Beginning
The first problem I ran into was with the install media/head snapshot that I downloaded. Excellent, it begins already! I ran smack dab into the problem described in PR 224825 ("Screen corruption booting 20171227 snapshot"). Immediately upon being executed by loader, the kernel would draw to only a subset of the screen, seemingly at a lower resolution than the display's and heavily distorted. The end result is that the console is effectively unusable, which is especially bad for install media. The documented workaround is to do one of two things:
- Break into a loader prompt and manually set a GOP mode prior to booting
- Add `gop set X` to loader.rc.local
#1 requires you to remember to do this on every boot, and #2 is not a sustainable long-term solution for me due to lualoader, which I'll write about some other day. #2 is also not something you can, in good conscience, bake into a release image. I note here that if I weren't a FreeBSD developer myself then I would've likely punted the install media off of my flash drive in a heartbeat and chosen something else... this wouldn't have been a very good first impression. =(
The fix for this started with r331321. Further investigation revealed that boot1.efi was choosing a console mode that put the system into a higher resolution, but GOP would seemingly not reflect this change. We rely on GOP or UGA to tell us the size of the current framebuffer so that we can pass it on to the kernel, so when GOP reports 640x480 after your screen resolution has been set to 1024x768... well, that gives you the above-mentioned problem.
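To see why a stale mode report wrecks the picture, here is a minimal model in plain C (no UEFI headers) of how pixel offsets depend on the geometry handed to the kernel. The struct and function names are hypothetical, and the model assumes a linear framebuffer at 4 bytes per pixel whose scanline stride equals its width, which real hardware does not guarantee.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical framebuffer geometry as passed from loader to kernel.
 * Assumes 4 bytes per pixel and a stride equal to the width. */
struct fb_geom {
	uint32_t width;   /* visible pixels per scanline */
	uint32_t height;  /* visible scanlines */
};

/* Byte offset of pixel (x, y) computed from the given geometry. */
static size_t
fb_offset(const struct fb_geom *g, uint32_t x, uint32_t y)
{
	return (((size_t)y * g->width + x) * 4);
}

/* Number of bytes the kernel believes it may draw into. */
static size_t
fb_size(const struct fb_geom *g)
{
	return ((size_t)g->width * g->height * 4);
}

/* Row of the real display that a given byte offset actually lands on. */
static uint32_t
fb_actual_row(const struct fb_geom *real, size_t offset)
{
	return ((uint32_t)(offset / 4 / real->width));
}
```

With GOP still claiming 640x480 while the panel runs at 1024x768, what the kernel thinks is row 100 lands on real row 62, and the kernel only ever touches 1,228,800 of the 3,145,728 bytes on screen. That is consistent with the squashed, distorted output described above.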
r331321 addresses this by deferring any mode selection until loader.conf(5) has been read, but before we draw anything of use. It adds an efi-autoresizecons loader command that does the dirty work, which the Forth or Lua loader scripts then either do or do not invoke. This may not be the cleanest way to do it, but we decidedly *do* want mode selection to happen after loader.conf(5) has been read so that users can limit their console resolution if they'd like (see: efi_max_resolution). The last of the documentation changes for this work landed in head as of r331470; any revision after that should be OK.
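The policy of "pick the best available mode, but never exceed the user's limit" can be sketched as follows. This is not the loader's actual efi-autoresizecons code: the simplified mode struct, the function names, and the "largest area wins" rule are assumptions for illustration, and only a plain `WxH` limit is parsed here (see loader.conf(5) for the forms efi_max_resolution really accepts).

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical, simplified view of one GOP mode. */
struct video_mode {
	unsigned int width;
	unsigned int height;
};

/* Parse a "WxH" limit such as "1024x768". Returns 0 on success, -1 otherwise. */
static int
parse_max_resolution(const char *s, unsigned int *w, unsigned int *h)
{
	char trailing;

	/* Exactly two numbers separated by 'x', with nothing after them. */
	if (sscanf(s, "%ux%u%c", w, h, &trailing) != 2)
		return (-1);
	if (*w == 0 || *h == 0)
		return (-1);
	return (0);
}

/* Index of the largest-area mode fitting within max_w x max_h, or -1. */
static int
select_mode(const struct video_mode *modes, size_t nmodes,
    unsigned int max_w, unsigned int max_h)
{
	int best = -1;
	unsigned long best_area = 0;

	for (size_t i = 0; i < nmodes; i++) {
		unsigned long area;

		if (modes[i].width > max_w || modes[i].height > max_h)
			continue;
		area = (unsigned long)modes[i].width * modes[i].height;
		if (best == -1 || area > best_area) {
			best = (int)i;
			best_area = area;
		}
	}
	return (best);
}
```

Given modes of 640x480, 1024x768, 1366x768, and 800x600 and a limit of `1024x768`, this picks the 1024x768 mode even though a wider one exists. Doing the selection only after loader.conf(5) is read is exactly what lets a limit like this take effect.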
More Rough Patches
Next up was something I hadn't actually stumbled upon naturally. I had been helping imp test some of his loader patches on ZFS systems (GELI and non-GELI). I figured out why it was failing with ZFS in general, and things were going well on that front. His patches had actually helped a $work Lenovo that would fail to boot with the TPM enabled, but I think that's a story for another day. Anywho, his long-term goal is to use UEFI variables more effectively, so I decided to take a look at efivar(8) and efibootmgr(8). This took us through some more rough waters! Read on.
Using UEFI Runtime Services
FreeBSD has had support for runtime services since kib introduced them in r306097. Shout out to kib, he's awesome. Anyway, EFI runtime services are exposed either by compiling your kernel with `options EFIRT` or by loading efirt.ko. This exposes a /dev/efi node that efivar(8) and efibootmgr(8) interact with via ioctl(2). With this X220 and the $work Lenovo mentioned earlier, it quickly became apparent that we had two general runtime service problems:
- On the $work Lenovo, efirt loads fine but any use of efivar(8)/efibootmgr(8) results in an immediate kernel panic.
- On the X220, efirt panics the kernel immediately upon load.
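A userland tool can cheaply check whether runtime services have been exposed before attempting any ioctl(2). The sketch below only tests for the existence of the /dev/efi node described above; it deliberately makes no ioctl calls, since the request codes and structures live in FreeBSD's own headers and are not reproduced here. The function name is hypothetical.

```c
#include <stdbool.h>
#include <unistd.h>

#define EFI_DEV_PATH "/dev/efi"

/*
 * Return true if the device node exists. Existence is checked with
 * access(F_OK) rather than open() because the node is typically
 * restricted to root, and an unprivileged open would fail even when
 * efirt is loaded.
 */
static bool
efirt_node_present(const char *path)
{
	return (access(path, F_OK) == 0);
}
```

A tool would call `efirt_node_present(EFI_DEV_PATH)` and, if it returns false, suggest loading efirt.ko instead of failing later with a less helpful error. Of course, on the X220 as described above, simply loading the module was enough to panic.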
Tracking Things Down
This part involved a lot of luck, so please don't read it as though I had any idea what I was doing.
Problem #1
This was the first problem I ran into. Naturally, I asked kib about it. He wasn't able to pin down the cause, but the information I was able to give him showed that the UEFI implementation was trying to jump into boot services memory. Since we were in the kernel proper, well after ExitBootServices, this was really bizarre, and we almost wrote the whole thing off as a firmware bug. I tried other operating systems to see whether any of them had found a way around this on this laptop, but my attempts all failed in one of three ways:
- OS did not have a full live distro that I could play with (i.e. full userland),
- OS did not boot, or
- OS would boot via UEFI but somehow still not grant access to UEFI runtime services
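One way to confirm a "jumping into boot services memory" diagnosis is to look up the faulting address in the UEFI memory map. The sketch below uses the memory type values from the UEFI specification, but the descriptor struct is a simplified, hypothetical stand-in for EFI_MEMORY_DESCRIPTOR (real maps are walked using the firmware-reported descriptor size, not `sizeof`), and the function names are made up for illustration.

```c
#include <stddef.h>
#include <stdint.h>

#define EFI_PAGE_SIZE 4096ULL

/* Memory type values as defined by the UEFI specification. */
enum efi_mem_type {
	EfiReservedMemoryType = 0,
	EfiLoaderCode = 1,
	EfiLoaderData = 2,
	EfiBootServicesCode = 3,
	EfiBootServicesData = 4,
	EfiRuntimeServicesCode = 5,
	EfiRuntimeServicesData = 6,
	EfiConventionalMemory = 7,
};

/* Simplified, hypothetical stand-in for EFI_MEMORY_DESCRIPTOR. */
struct mem_desc {
	uint32_t type;
	uint64_t phys_start;
	uint64_t num_pages;
};

/* Return the descriptor covering addr, or NULL if none does. */
static const struct mem_desc *
find_desc(const struct mem_desc *map, size_t n, uint64_t addr)
{
	for (size_t i = 0; i < n; i++) {
		uint64_t end = map[i].phys_start + map[i].num_pages * EFI_PAGE_SIZE;

		if (addr >= map[i].phys_start && addr < end)
			return (&map[i]);
	}
	return (NULL);
}

/*
 * True if addr lies in boot services memory. The OS may reclaim that
 * memory after ExitBootServices, so runtime code should never touch it.
 */
static int
addr_in_boot_services(const struct mem_desc *map, size_t n, uint64_t addr)
{
	const struct mem_desc *d = find_desc(map, n, addr);

	return (d != NULL && (d->type == EfiBootServicesCode ||
	    d->type == EfiBootServicesData));
}
```

A hit from `addr_in_boot_services` for an address the firmware tried to execute is what makes the behavior look like a firmware bug: the runtime code is reaching for something that, by this point, is no longer supposed to belong to the firmware.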
I was at a loss. I reviewed all kinds of UEFI documentation but nothing hinted at what could cause this, so I entered "let's just keep panicking" mode. Eventually, I stumbled across SetVirtualAddressMap.
For a little bit of background, there are effectively three scenarios in which runtime services may be invoked:
- In a bootloader, prior to ExitBootServices
- After ExitBootServices
- After ExitBootServices, in virtual addressing mode
I already knew that #1 works: I can use UEFI vars in the loader, but this isn't particularly helpful, since I want to be able to inspect them from userland. #2 is the mode in which kib implemented all of this; see r306097 for specifics. #3 was the scenario I had no information about. It didn't seem highly likely to make any difference, but I asked kib how best to try it out.
His answer resulted in r330868. Surprisingly, SetVirtualAddressMap actually solved the problem I was experiencing: efivar(8) and efibootmgr(8) both worked perfectly fine following this commit. I later found out from andrew@ that a similar problem exists in U-Boot, albeit in a different form. The likely explanation for the behavior I noted is that the variable-related calls actually have two versions: one for use during boot services, and one for use after. The Lenovo firmware likely uses SetVirtualAddressMap to effectively switch to the post-boot-services version.
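SetVirtualAddressMap takes the memory map with a VirtualStart filled in for every region carrying the EFI_MEMORY_RUNTIME attribute, and the firmware accepts it only once, after ExitBootServices. The sketch below prepares such a map using an identity (1:1) mapping, where each runtime region's virtual address equals its physical address. That choice, the simplified descriptor struct, and the function name are assumptions for illustration; the firmware call itself is omitted because it needs the real EFI_RUNTIME_SERVICES table.

```c
#include <stddef.h>
#include <stdint.h>

/* Attribute bit from the UEFI specification: region is needed at runtime. */
#define EFI_MEMORY_RUNTIME 0x8000000000000000ULL

/* Simplified, hypothetical stand-in for EFI_MEMORY_DESCRIPTOR. */
struct vamap_desc {
	uint32_t type;
	uint64_t phys_start;
	uint64_t virt_start;
	uint64_t num_pages;
	uint64_t attribute;
};

/*
 * Give every runtime region an identity virtual address and return how
 * many regions were updated. Only runtime regions need a VirtualStart,
 * so everything else is left untouched.
 */
static size_t
prepare_identity_vamap(struct vamap_desc *map, size_t n)
{
	size_t updated = 0;

	for (size_t i = 0; i < n; i++) {
		if ((map[i].attribute & EFI_MEMORY_RUNTIME) == 0)
			continue;
		map[i].virt_start = map[i].phys_start;
		updated++;
	}
	return (updated);
}
```

An identity map is the least invasive way to enter virtual mode: pointers the firmware already holds stay valid, so any behavior change comes from the firmware's own switch to its runtime code paths, which fits the two-versions explanation above.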
Problem #2
This one was a little bit less fun to work out. Basically, it turned out (after some Angry Printf(TM) sprinkled about) that the panic happened while trying to fetch the current time via efirtc. The backtrace was misleading due to some inlining, so this wasn't immediately obvious. The fix for this is r330843. It turns out that the X220's firmware doesn't understand that the capabilities pointer passed to GetTime is optional, and attempts to dereference it. We were previously passing in NULL because, well, it's optional by the spec! Unfortunately, that doesn't work out, so we have to pass something. =)
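The shape of the bug and of the fix can be modeled in plain C. Below, `fake_get_time` stands in for a firmware GetTime that, like the X220's, insists on a non-NULL capabilities argument even though the spec marks it optional; the caller avoids the problem by always passing a valid, caller-owned structure. The structs are trimmed-down, hypothetical versions of EFI_TIME and EFI_TIME_CAPABILITIES, the time values are placeholders, and none of this is FreeBSD's actual efirtc code.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Trimmed-down, hypothetical versions of EFI_TIME / EFI_TIME_CAPABILITIES. */
struct efi_time {
	uint16_t year;
	uint8_t month, day, hour, minute, second;
};

struct efi_time_caps {
	uint32_t resolution;
	uint32_t accuracy;
	bool sets_to_zero;
};

/*
 * Model of the buggy firmware: it always writes through caps. Real
 * firmware would simply fault on NULL; the model returns -1 instead so
 * the failure can be observed safely.
 */
static int
fake_get_time(struct efi_time *tm, struct efi_time_caps *caps)
{
	if (tm == NULL || caps == NULL)
		return (-1);
	/* Placeholder values; a real implementation reads the RTC. */
	tm->year = 2000;
	tm->month = 1;
	tm->day = 1;
	tm->hour = tm->minute = tm->second = 0;
	caps->resolution = 1;
	caps->accuracy = 0;
	caps->sets_to_zero = false;
	return (0);
}

/* Caller-side workaround: never pass NULL for the optional argument. */
static int
read_rtc(struct efi_time *tm)
{
	struct efi_time_caps caps;	/* scratch space; contents unused */

	return (fake_get_time(tm, &caps));
}
```

Passing a throwaway structure costs a few bytes of stack and is always valid per the spec, so it works on both conforming and non-conforming firmware.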
Call for Testing
After addressing the above, I issued a call for testing on -current@ and echoed that e-mail on Twitter for a little more exposure. So far, this call for testing has been pretty successful. I was immediately made aware that I'm not good at testing things (this actually wasn't news to me =)), and another fairly important bug was sussed out: efirt.ko could not be preloaded by loader. Doing so would result in a panic in fpu_kern_enter. It turned out that this was due to some SYSINIT ordering issues, fixed in r331365.
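Why preloading mattered can be illustrated with a toy model of ordered initialization. FreeBSD sorts SYSINIT entries by subsystem and then by order. A module loaded with kldload runs its initializers after boot has finished, but a preloaded module's initializers are, roughly speaking, merged into the boot-time sequence, where an entry sorted too early can run before something it depends on (here, FPU setup). Everything below, including the subsystem numbers and the `fpu_init`/`efirt_init` names, is hypothetical and is not the actual r331365 change.

```c
#include <stdbool.h>
#include <stdlib.h>

/* Toy initializer entry, loosely modeled on SYSINIT's (subsystem, order). */
struct init_entry {
	const char *name;
	unsigned int subsystem;
	unsigned int order;
	void (*fn)(void);
};

static bool fpu_ready;	/* set by the toy FPU initializer */
static bool efirt_ok;	/* set if the toy efirt init found the FPU ready */

static void fpu_init(void) { fpu_ready = true; }
/* Stand-in for efirt setup, which needs the FPU (cf. fpu_kern_enter). */
static void efirt_init(void) { efirt_ok = fpu_ready; }

/* Sort by subsystem first, then by order within a subsystem. */
static int
cmp_entry(const void *a, const void *b)
{
	const struct init_entry *x = a, *y = b;

	if (x->subsystem != y->subsystem)
		return (x->subsystem < y->subsystem ? -1 : 1);
	if (x->order != y->order)
		return (x->order < y->order ? -1 : 1);
	return (0);
}

/* Sort the entries, then run them in that order. */
static void
run_inits(struct init_entry *e, size_t n)
{
	qsort(e, n, sizeof(*e), cmp_entry);
	for (size_t i = 0; i < n; i++)
		e[i].fn();
}

/* Reset toy state between experiments. */
static void
reset_state(void)
{
	fpu_ready = false;
	efirt_ok = false;
}
```

With efirt registered at a lower subsystem number than the FPU, the toy boot runs it too early and it fails; registering it at a later subsystem makes the preloaded case behave like the kldload case, which is the general shape of an ordering fix.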
At this point, we seem to be in a pretty stable state. Continued testing on head and reports of failures would be greatly appreciated. New head snapshots including all of the above fixes should come out around Thursday/Friday, making for some good weekend testing! As of now, I think we're on target to have these MFC'd to stable/11 on April 4th, barring any major disasters of course. =)