Upgrade Time...

Back in about 2015 or so, I bought two of these:

It's an HP ProLiant MicroServer, Generation 8. From memory, they cost me about AU$360 apiece, which seemed pretty cheap at the time for a “proper” server, capable of using up to 32GB of ECC RAM and running a 4-disk multi-terabyte storage array, constantly and reliably. (For the record, you can now get Generation 10 equivalents, which I don't own personally; but having installed one for the local church Parish Office, I can attest that they are excellent, too.)

One of the reasons I got the servers so cheap was that I bought the base model: they each shipped with a 2-core Celeron G1610T processor, for example. Since the servers just sat there running ZFS and providing bulk storage to the other PCs around the house, that level of CPU was more than sufficient.

But never one to let things run unchanged for long, I have recently been thinking that it would be quite nice to upgrade both servers with rather more capable Xeon CPUs. You can buy suitable models on eBay for quite reasonable sums, it turns out, and I therefore rapidly acquired a Xeon E3-1260L (for around £45) and a slightly more powerful Xeon E3-1230 V2 (for £60). Both are socket LGA 1155 processors, so they slot in perfectly where the Celerons used to sit!

Both new CPUs are 4-core with hyperthreading, so 8-thread parts (as compared to the Celeron's 2 cores and 2 threads). Moreover, the Celerons only ran at 2.3GHz with no turbo; the E3-1230 V2 runs at 3.3GHz with a 3.7GHz turbo, and the E3-1260L runs at 2.4GHz with a 3.3GHz turbo. So, four times the threads and, in turbo mode, clock speeds somewhere between 43% and 61% higher than the Celeron's: what's not to like?!
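As a quick sanity check on those clock figures, the comparison with the Celeron's fixed 2.3GHz can be worked out in a couple of lines of shell. This is only a sketch: pct_increase is a name invented here, and it leans on awk for the floating-point arithmetic.

```shell
#!/bin/sh
# pct_increase OLD NEW: print how much bigger NEW is than OLD, as a whole percent.
# (Hypothetical helper, not part of any tool mentioned in this post.)
pct_increase() {
    awk -v old="$1" -v new="$2" 'BEGIN { printf "%.0f\n", (new - old) / old * 100 }'
}

# Celeron G1610T at 2.3GHz versus each Xeon's turbo clock
echo "E3-1260L turbo:   +$(pct_increase 2.3 3.3)%"
echo "E3-1230 V2 turbo: +$(pct_increase 2.3 3.7)%"
```

With the figures quoted above, that prints +43% for the E3-1260L and +61% for the E3-1230 V2.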

Well: there is one problem. The original Celerons were 35W parts, and the passive heatsink the HP Microservers ship with is rated for 35W, so the two are a nice match. But the E3-1260L is a 45W part and the E3-1230 V2 runs at a whopping 69W. More watts equals more heat, especially with a heatsink rated way below what each of the new processors can output. Space is incredibly tight within the server chassis, though, so there is no way to buy a thunking great cooling fan and stick it onto the original heatsink: there just isn't enough clearance, as this photo shows:

It doesn't help, either, that the original heatsink is attached by an unusual rectangular arrangement of screws; your standard, square cooling fans and alternative passive heatsinks are not going to fit.

As it turns out, 45W is close enough to 35W that when I checked the CPU temperatures under load, they were not outrageous. I forgot to take a detailed note but, from memory, I was peaking at around 58°C after 30 minutes of 100% CPU utilisation on all 8 threads, which compared relatively favourably to the original Celeron's approximately 46°C. Clearly the extra wattage of the new CPU results in a hefty temperature rise of around 12°C, but the result is nowhere near the toasty temperature levels that would cause real concern. So the 45W CPU can run as it is, with no further consideration.

But the E3-1230 V2 was another matter. After its 30 minutes of 100% utilisation, that CPU was sitting at a rather scary 86°C, which is a degree or so above the point where it's automatically throttled back to lower, slower speeds (ambient temperature in my study, where I tested these things, was always around 22°C, by the way). It is true that I don't ever expect these servers to be continually stressed like that in day-to-day operation; nevertheless, such a high operating temperature under load seemed to me a problem I'd like to avoid if at all possible.
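The post doesn't record which tool produced these readings, so what follows is only a sketch of one way to take them on Linux. It assumes the kernel exposes temperatures in millidegrees Celsius under /sys/class/thermal (on some machines the CPU sensor appears under /sys/class/hwmon instead), and hottest_zone_c is a name made up for the purpose.

```shell
#!/bin/sh
# hottest_zone_c [SYSFS_DIR]: print the highest thermal-zone reading in °C.
# Hypothetical helper; SYSFS_DIR defaults to the usual Linux location.
hottest_zone_c() {
    root="${1:-/sys/class/thermal}"
    # Each thermal_zone*/temp file holds one integer in millidegrees Celsius.
    cat "$root"/thermal_zone*/temp 2>/dev/null |
        sort -n |
        tail -n 1 |
        awk '{ printf "%.1f\n", $1 / 1000 }'
}
```

Calling hottest_zone_c once a minute during a 30-minute load test (from a while loop with a sleep 60 in it, say) would give a simple temperature log to compare before and after any cooling changes.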

So I visited Amazon and bought two of these things:

It's a tiny 12 volt fan (and by tiny I mean it's only about as big as the first joint of your thumb, so really very petite). It is fairly quiet in operation and pushes a reasonable amount of air around. I glued both of them to the passive heatsink, like so:

…and wired them into the existing tangle of molex cables, making sure that they blew air towards the back of the case, where the giant built-in fan is waiting to expel it all into the great outdoors! After another 30 minutes at 100% utilisation, I was recording temperatures of around 80°C, which wasn't a spectacular improvement, but the 6°C drop meant the CPU wasn't being throttled any more, and it was a temperature I could (just about) live with.

But whilst the HP BIOS on this server is configured to run the main system fan in 'optimal mode' by default (which means it just purrs quietly in the background!), there is an option to bump that up to 'Increased' or 'Maximum Cooling'. I therefore decided to go all-in and switched on 'Maximum Cooling' mode and re-ran my temperature tests. The gentle purr of optimal cooling gave way to a server-room-like roar in maximum mode, but I ended up with temperatures of around 71°C, which is an excellent result. Presumably, without the little fans mounted on the CPU, I might have been able to achieve 77°C or so anyway, just by bumping up the built-in fan capabilities, but the extra 6°C from the purchased micro-fans is appreciated anyway.

Happily, as this screenshot shows, with 100% CPU utilisation on all four cores (and all 8 threads), not only did the CPU temperatures top out at around the 70°C mark, but the CPU frequency also held at an un-throttled 3.5GHz:

Incidentally, the new roar of the built-in fan won't be a problem for me once these servers are returned to their normal hiding place: in the loft, where ambient temperatures are usually significantly cooler than those found in my study, which can only help matters further!

My two old servers are now both more than capable of running as my ZFS storage servers and as virtual machine hosts, increasing their utility to me significantly. For a relatively small outlay on new CPUs and an enjoyable day or two of tinkering with thermal paste application and micro-fan gluing, I've extended the useful life of these workhorses with minimal effort. Recommended!

2019/05/17 16:59 · dizwell

30 OK

Fedora released version 30 of their distro the other day. Since I was already running version 29, I thought I'd give the in-place distro upgrade mechanism a workout:

sudo dnf upgrade --refresh
sudo dnf install dnf-plugin-system-upgrade
sudo dnf system-upgrade download --releasever=30
sudo dnf system-upgrade reboot

For the most part, it was plain sailing and no trouble at all, except that the third command produced an error, indicating that something called “mscore” couldn't be upgraded and, therefore, that nothing at all would be upgraded.

That's a bit sad, since “mscore” is actually MuseScore, a music notation program I use quite a lot. Permanently losing it would not be acceptable, but a temporary loss of it can be coped with. Thus a simple 'sudo dnf remove musescore' was issued, and that allowed a subsequent attempt to run the third command above without drama.

The actual upgrade process takes place after the reboot command; some 4700+ packages needed to be upgraded, which took a fair amount of time. However, after no more than (I would guess!) 15 minutes, the PC was back in working order and sporting the new version:

$ cat /etc/redhat-release
Fedora release 30 (Thirty)

It is, of course, still MuseScore-less, but that will hopefully be rectified soon enough.

I keep looking around for an alternative distro I can live with and like rather more than Fedora, but not much is coming up on my distro-radar right now. In a virtual machine, I am trying to learn to love Manjaro OpenBox, but it's definitely not love at first sight! Meanwhile, a somewhat fresh Fedora does all I could ask for (apart from running MuseScore!)

Update: One other minor problem arising: there's no version 30 repository for VirtualBox as yet, so all attempts to do a 'dnf update' generate the error:

Fedora 30 - x86_64 - VirtualBox                      2.2 kB/s | 6.9 kB     00:03    
Failed to synchronize cache for repo 'virtualbox'
Ignoring repositories: virtualbox

The problem is found in /etc/yum.repos.d/virtualbox.repo, which contains the line:

baseurl=http://download.virtualbox.org/virtualbox/rpm/fedora/$releasever/$basearch

Of course, that “$releasever” variable gets turned into “30” after you upgrade to Fedora 30, and no such sub-directory exists on the Oracle/VirtualBox end of things as yet. So, for now, the workaround is to change that line to read:

baseurl=http://download.virtualbox.org/virtualbox/rpm/fedora/29/$basearch

That is, you're basically hard-coding 'use version 29's software' for now, and that makes the earlier error message disappear. Eventually, one assumes, there will be a version 30 directory at VirtualBox, and when that happens, it will be fine to change back to '$releasever'. Meantime, the hard-coded '29' will ensure your VirtualBox installation remains current and properly patched regardless.
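The edit described above can also be made with sed rather than by hand. This is a minimal sketch, assuming the repo file contains the baseurl line exactly as quoted; pin_repo_release is a name invented here, not something VirtualBox or Fedora ships.

```shell
#!/bin/sh
# pin_repo_release REPO_FILE RELEASE: replace the literal $releasever in the
# VirtualBox baseurl with a fixed release number, keeping a .bak copy.
# (Hypothetical helper; must be run as root for files under /etc.)
pin_repo_release() {
    repo_file="$1"
    release="$2"
    # \$ inside the single quotes makes sed match a literal dollar sign.
    sed -i.bak 's|/fedora/\$releasever/|/fedora/'"$release"'/|' "$repo_file"
}
```

Running pin_repo_release /etc/yum.repos.d/virtualbox.repo 29 as root would make the change, and copying the .bak file back over the original undoes it once a version 30 directory appears.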

Update #2: It's now May 5th, so a couple of days after writing the main article: MuseScore is back and working fine. It's in the main repositories and no addition of slightly-suspect 'copr' repositories is required to get it working. Nice work, developers!

2019/05/02 12:52 · dizwell

Candid Camera

Our house is festooned with old surveillance cameras that the previous owners thought desirable, but which we thought looked, frankly, utterly paranoid and bonkers, largely because there were half a dozen of them! So we've never used the pre-existing surveillance infrastructure. After a spate of local break-ins, however, we thought it might be a good idea to have some video surveillance after all, just not in quite the loony way our predecessors considered suitable!

In the shower one day (too much information, I realise!), I had the idea that the drawer-full of old smartphones we possess but never use might be put to good use with a bit of ffmpeg or similar Linux cleverness. What started as an idle thought in the shower soon became a bit of a project, however, as smartphones are not ideal webcams and ffmpeg isn't easy to get to grips with!

In the end, I put a simple two-camera system together (with a freshly-bought, second-hand 2012 i3 PC) for around £120 and the results are pleasing, though they will still require a bit of tweaking in the days and weeks to come, I suspect.

There was then a bit of a scare when a neighbour suggested I might be in breach of the GDPR (new European privacy regulations), but a bit of research there made it clear that the GDPR is not an obstacle to running a domestic CCTV system, provided you follow a couple of simple rules/guidelines.

Anyway: I wrote the entire thing up as a bit of an article, which can be found here.

2019/04/04 14:51 · dizwell

Fixed!

Just a short note to say that several days have now elapsed and Fedora 29 hasn't crashed once in all that time: the fix mentioned last time (remove Nouveau and replace with proprietary Nvidia graphics drivers) has genuinely resolved the intermittent lock-up problem I was having.

I decided to stick with Fedora, rather than re-install Debian, as I suspected I might!

2019/04/02 09:28 · dizwell

Nouveau Bother

After my fourth abrupt crash on my newly-installed Debian system, I'd had enough. CPU, disk and graphics card temperatures were all fine and Memtest didn't record any problems with my 96GB RAM, so the PC is physically as fine as I can check it to be. So: it clearly must be the operating system!

I therefore swiftly wiped Debian Testing and installed Fedora 29 (KDE Spin). It looks a lot better than most Fedoras I remember, and everything application-wise installed fine too, with minimal fuss. Then the neighbour popped round for a short chat and once she'd gone, I returned to whatever I'd been up to when she first arrived… and discovered my PC had completely locked up again! Maybe it wasn't the operating system after all!

Again, physical health checks were fine, so in desperation, I did what I should have done the first time… and read the contents of /var/log/messages. Therein, I found this:

Mar 25 16:49:10 britten kscreenlocker_greet[25948]: Connecting to deprecated signal QDBusConnectionInterface::serviceOwnerChanged(QString,QString,QString)
Mar 25 16:54:12 britten kernel: nouveau 0000:02:00.0: fifo: SCHED_ERROR 0a [CTXSW_TIMEOUT]
Mar 25 16:54:12 britten kernel: nouveau 0000:02:00.0: fifo: runlist 0: scheduled for recovery
Mar 25 16:54:12 britten kernel: nouveau 0000:02:00.0: fifo: channel 2: killed
Mar 25 16:54:12 britten kernel: nouveau 0000:02:00.0: fifo: engine 7: scheduled for recovery
Mar 25 16:54:12 britten kernel: nouveau 0000:02:00.0: fifo: engine 0: scheduled for recovery
Mar 25 16:54:12 britten kernel: nouveau 0000:02:00.0: Xorg[1307]: channel 2 killed!
Mar 25 16:54:43 britten kernel: perf: interrupt took too long (3929 > 3925), lowering kernel.perf_event_max_sample_rate to 50000
Mar 25 16:55:01 britten cupsd[1256]: REQUEST localhost - - "POST / HTTP/1.1" 200 182 Renew-Subscription successful-ok
Mar 25 17:25:58 britten kernel: Linux version 5.0.3-200.fc29.x86_64 ([email protected]) (gcc version 8.3.1 20190223 (Red Hat 8.3.1-2) (GCC)) #1 SMP Tue Mar 19 15:07:58 UTC 2019
Mar 25 17:25:58 britten kernel: Command line: BOOT_IMAGE=/vmlinuz-5.0.3-200.fc29.x86_64 root=UUID=5b2c6cd0-adfb-427b-af31-864ce0dd2d0c ro resume=UUID=1b31f287-5672-4a4f-8e28-a7a9ec39f3a6 rhgb quiet LANG=en_GB.UTF-8
Mar 25 17:25:58 britten kernel: x86/fpu: Supporting XSAVE feature 0x001: 'x87 floating point registers'
Mar 25 17:25:58 britten kernel: x86/fpu: Supporting XSAVE feature 0x002: 'SSE registers'
Mar 25 17:25:58 britten kernel: x86/fpu: Supporting XSAVE feature 0x004: 'AVX registers'

You can see from the time jump that the lock-up happened around 4:55pm and I had to wield the power switch at 5:25pm. The interesting lines are those that precede the 4:55pm dramas: they are all, in various ways, reporting that the Nouveau graphics driver is doing something peculiar, including 'killing off' channel 2 (whatever that means).
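Rather than scrolling through the whole log by eye, the tell-tale lines can be pulled out with grep. This is just a sketch, assuming the messages look like the excerpt above; nouveau_errors is a made-up name, and the pattern only covers the SCHED_ERROR and 'killed' wording seen there.

```shell
#!/bin/sh
# nouveau_errors [LOGFILE]: print kernel log lines where the nouveau driver
# reports a scheduler error or a killed channel. (Hypothetical helper name.)
# LOGFILE defaults to /var/log/messages, which usually needs root to read.
nouveau_errors() {
    log="${1:-/var/log/messages}"
    grep -E 'kernel: nouveau .*(SCHED_ERROR|killed)' "$log"
}
```

Against the excerpt above, this picks out the SCHED_ERROR line and the two 'killed' lines, each with its timestamp, which is enough to see whether a lock-up coincided with a Nouveau complaint.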

This would explain why, although my machine appears to lock up, it will quite happily continue playing music in the background if any were playing beforehand: the error messages suggest that the PC itself has not locked up at all, but that everything graphical (which, thanks to Xorg, also means keyboard-related) is dead beyond repair.

So: my suspicions were raised that all was not well with Nouveau and its attempts to interact with my Quadro K4000 graphics card. A swift bit of Googling later, and my suspicions seemed confirmed. Others had reported sporadic 'nouveau channel X killed' messages in the past, too: this report seemed to describe my situation exactly, for example.

I therefore followed this invaluable guide to installing the “proper” NVIDIA drivers (and taking Nouveau off the system entirely). So far, so good (though the intermittent nature of the lock-ups means I can't yet guarantee that the problem has genuinely been fixed).
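One way to confirm which kernel driver actually ended up bound to the graphics card is to look at the output of 'lspci -k', which prints a 'Kernel driver in use' line under each device. The following is a sketch rather than anything taken from the guide: gpu_driver_in_use is a name invented here, and it assumes the card is listed as a 'VGA compatible controller' or '3D controller'.

```shell
#!/bin/sh
# gpu_driver_in_use: read `lspci -k` output on stdin and print the kernel
# driver bound to each graphics device. (Hypothetical helper name.)
gpu_driver_in_use() {
    awk '
        /VGA compatible controller|3D controller/ { gpu = 1; next }  # start of a GPU entry
        /^[0-9a-f]/                               { gpu = 0 }        # start of any other device
        gpu && /Kernel driver in use:/            { print $NF }      # driver name is the last field
    '
}
```

Used as 'lspci -k | gpu_driver_in_use', it would be expected to report nvidia rather than nouveau once the proprietary driver has taken over.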

I can't say I've ever had Nouveau problems on this PC before… but, come to think of it, I used to install Nvidia's proprietary drivers routinely because I was using a dual-monitor setup that Nouveau had difficulty configuring properly. These days, I'm single-monitor, and thus didn't think the proprietary drivers were needed… but it seems that sometimes, they are, no matter how many monitors you're using!

It seems I owe Debian Testing an apology, therefore! It wasn't really the distro at fault, just the Nouveau drivers (which every distro uses anyway, so all of them would presumably have faced the same issue, sooner or later).

By way of a happy side-effect, I had also noticed that when I ran my Stellarium astronomy program, its version of the daylight sky had a weird pink glow about it, like this one (reported by another user a couple of years ago, presumably not for the same reasons):

After the change to proprietary Nvidia graphics drivers, everything looks normal once again:

So, there are a few 'morals-of-the-story' here. First, check your logs before you go doing drastic things like changing operating systems! Second, if one of your applications is showing signs of 'graphical distress', something probably isn't all good with your graphics subsystem. Third, don't blame an entire distro for a single driver's mishap! Fourth, Nouveau is good, but it's not perfect, so don't imagine it can never produce strange results (though this is the first time in many years it has for me and my multitudinous and varied computers).

And there's just one 'outstanding issue' here, too: do I revert to Debian Testing (which I was enjoying before I unfairly lost my temper with it!)? Or do I stick with Fedora 29 because it's now installed and seems to be working fine?

I suspect I'll stick with Fedora for a while, just because I've done too many installations of late! But watch this space…

2019/03/25 19:07 · dizwell


wiki/blog.txt · Last modified: 2018/12/12 14:06 by dizwell