Just got off the phone w/a Sun rep who called up to see how I was doing, did I need any coasters, etc. I took the opportunity to put a bug in his ear about Solaris.
If Oracle removes the entitlement to run Solaris on non-Sun hardware, then what the hell do I have to play with? I've got a bunch of Sun hardware, but only one machine running Solaris -- and that's in production, holding home directories on ZFS; I'm not playing with that.
OpenSolaris folks are asking for answers and not getting any. And saying "Go run OpenSolaris" ignores the problem of figuring out what's in Solaris proper, what's going to be there RSN, and what's two or more releases out.
If Solaris disappears, then I'm not going to figure out how it's better; that's just how:
all work.
I like Solaris for precisely two things: ZFS and DTrace. Solaris has more, I know, but those are the things that matter to me. In all other respects, for me and my situation, Linux or the BSDs are good enough or better. And oh: FreeBSD has DTrace; DragonFly BSD has HAMMER; Linux has *@#%$)%! packaging.
No good ending for this, so we'll just call it quits.
Valerie Aurora always makes for interesting reading. This entry is no exception:
If you spend all day with your co-workers, socialize only with your co-workers, and then come home and eat dinner with -- you guessed it -- your co-worker, you might go several years without hearing the words, "Run Solaris on my desktop? Are you f-ing kidding me?"
Schwartz's "the financial crisis did it" explanation for Sun's demise is a symptom of an inbred company culture in which employees at all levels voluntarily isolated themselves from the larger Silicon Valley culture. Tech journalists write incessantly about the exchange of expertise and best practice between companies as a major driver of the Bay area's success. But you have to actually talk to your competition to do that -- over a beer, or maybe a pillow.
Reminder to myself: Got a file called .nfs.*? Here's what's going on:
# These files are created by NFS clients when an open file is
# removed. To preserve some semblance of Unix semantics the client
# renames the file to a unique name so that the file appears to have
# been removed from the directory, but is still usable by the process
# that has the file open.
That quote is from /usr/lib/fs/nfs/nfsfind, a shell script on Solaris 10 that's run once a week from root's crontab. Some references:
I ran into a couple problems compiling NUT on Solaris 10 today. They were pretty much due to bad setup on my part, but they did take a while to track down. For the record:
libtool: link: only absolute run-paths are allowed: This turned out to be an obscure way of saying "You don't have libsnmp installed". Solution: configure --without-snmp.
false cru: The full error was:
libtool: link: false cru .libs/libparseconf.a .libs/parseconf.o
gmake[1]: *** [libparseconf.la] Error 1
This turned out to be a consequence of not having /usr/ccs/bin in my $PATH.
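For next time, the second problem is easy to head off before running configure. A minimal sketch, assuming a POSIX shell; path_prepend is a made-up helper name, and my guess (not confirmed anywhere above) is that "false cru" appears because configure substitutes false when it can't find ar.

```shell
# Hypothetical helper: put a directory at the front of PATH unless it is already listed.
path_prepend() {
    case ":$PATH:" in
        *":$1:"*) ;;                # already there, leave PATH alone
        *) PATH="$1:$PATH" ;;
    esac
}

# Usage on Solaris 10, before ./configure:
#   path_prepend /usr/ccs/bin
#   command -v ar || echo "still no ar in PATH"
```

Calling it twice is harmless, so it can sit in a profile without PATH growing on every login.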
Install logwatch on Solaris fileserver.
Notice that logwatch emails are not coming in.
Log in and run logwatch by hand.
Inspect mail log and notice lack of any entries.
Notice that Postfix is in maintenance mode; start it up.
Notice continued lack of emails.
Notice that Postfix is running, which confused svcadm when told to start up Postfix. It fails to do so and fails to log this.
killall postfix, svcadm enable postfix.
man svcadm; svcadm clear postfix; svcadm enable postfix.
Run logwatch by hand; notice emailed report to "root@localhost.localdomain", which gets bounced by Postfix on the mail server because it's a non-existent host.
Resist temptation to go down that rabbit hole just now, and stick to the problem at hand.
Edit /opt/csw/etc/log.d/logwatch.conf and set MailTo to proper address.
Re-run logwatch and note that reports are still going to root@localhost.
After much swearing, notice that actually, logwatch is set to look in /opt/csw/etc/log.d/conf/logwatch.conf for configuration.
Edit that file, re-run logwatch.
Notice errors from Postfix: "postdrop[13848]: [ID 947731 mail.warning] warning: mail_queue_enter: create file maildrop/908447.13848: Permission denied".
Run "postfix set-permissions". Test mail; still failing.
Check permissions on another system and set by hand.
Re-run logwatch. Still no email. Re-run with debug=high and get email.
Wonder idly about futility of self-aware log watching system that can't report on its own heisenbug-induced failure, crappy packaging practices, inability to check end-to-end email connectivity, other career options.
(Update) Realize that the emails show up if "Detail" is set to Medium or High; Low, the default, makes the report silent.
(Update) Uninstall the package and reinstall, only to find that the symlink to conf/logwatch.conf is set up at installation, and that this is probably a case of $EDITOR breaking the symlink. Apply head to desk.
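The last update is the sort of thing a one-line check would have caught much earlier. A minimal sketch, assuming a POSIX shell; conf_kind is a made-up helper name, and the path in the usage line is the one from the steps above.

```shell
# Hypothetical helper: report whether a config path is a symlink or a regular file.
conf_kind() {
    if [ -L "$1" ]; then
        echo "symlink"
    elif [ -f "$1" ]; then
        echo "regular file"
    else
        echo "missing"
    fi
}

# Usage: conf_kind /opt/csw/etc/log.d/logwatch.conf
```

If that prints "regular file" where the package installed a symlink, something (an editor that writes a new file and renames it into place, say) has replaced the link, and edits are no longer reaching the file the program actually reads.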
I knew I didn't like Vaios very much, but I had no idea they were so awful -- to the point of requiring hacking on your goddamn BIOS to enable VMX.
The flash demo for Dell's ML6000 tape library boasts that it's "completely self-aware". Not sure I want SkyNet running my backups…
O'Reilly has an upcoming webcast on -- deep breath -- "Advanced Twitter for Business". (At least they didn't call it a webinar.) When I told my wife about this, she said "So...you and O'Reilly break up yet?"
Obviously not, because I've just ordered Backup and Recovery and Linux Clusters with Oscar, Rocks, OpenMosix and MPI. I had purchased B&R at my last job, but this is for me.
And did I mention the dream I had a while back about a Sun laptop that looked like an X4200 server folded in half? In the dream it ran nearly perfectly, except when you tried to go to a web page with flash; then it would crash, and a movie of Matt Stone would play, apologizing on behalf of Jonathan Schwartz and everyone else at Sun.
I'm playing with the CVS version of Emacs after reading about some of the new features in what will become Emacs 23. It's nice, but the daemon mode isn't quite multi-tty — you can run Emacs server, detached from any TTY, but if you try connecting to it with multiple emacsclient instances, the first one is where all the TTY action goes. Not sure what I'm missing.
Heads up for those of you using Blastwave and CUPS: after upgrading to the latest stable version, printing stopped working for me (and a few users :-). I eventually tracked it down to the movement of two files: suddenly
/opt/csw/lib/cups/filter/pstopxl
/opt/csw/lib/cups/filter/pstoraster
were moved to
/opt/csw/lib/cups/pstopxl
/opt/csw/lib/cups/pstoraster
resulting in many error messages like "Unsupported format text/plain!" and "Hint: is ESP ghostscript installed?". Moving them both back into place and restarting CUPS fixed things just fine.
According to Bacula (yay Bacula!) both files were in the right directory as of last night, and Blastwave's file list for Ghostscript shows the new location for these two files. A bug has been filed.
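A quick way to see whether an upgrade has done this again is to look for the two filters in both places. A minimal sketch, assuming a POSIX shell; check_cups_filters is a made-up helper name, and the directory layout is just the one described above.

```shell
# Hypothetical helper: for each filter name, say whether it sits in filter/ or one level up.
check_cups_filters() {
    base=$1; shift
    for name in "$@"; do
        if [ -e "$base/filter/$name" ]; then
            echo "$name: ok"
        elif [ -e "$base/$name" ]; then
            echo "$name: misplaced in $base"
        else
            echo "$name: missing"
        fi
    done
}

# Usage: check_cups_filters /opt/csw/lib/cups pstopxl pstoraster
```

It only reports; moving the files back and restarting CUPS is still a manual step.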
At last: I'm finally coming to the end of working with the verdammnt web registration forms. We're going from our awful hack of a glued-together mess of Mambo and custom PHP, to something that'll mainly be Drupal with no custom code. Allegedly it's six weeks 'til launch date; the registration forms in use right now will limp along 'til they're no longer needed (end of the summer).
The registration form I'm working on now is not complicated in the absolute sense, but it's the most complicated one we've got. Last year I was afraid to touch the (old, legacy, ugly) code, and mostly just changed dates. This year I thought "fuck it" and rewrote nearly all of it, using the tools and skills I'd picked up in the meantime. (I'm still not a great programmer, understand, but I have improved some over last year.)
After a full day banging my head against it, I'm finally coming to the point where I'm pretty confident that the code will do what it's supposed to. And that's a relief. Therefore, in the stylee du Chromatic, I give thanks to:
In other news...just downloaded the second dev preview of Indiana, which I'd managed to not hear about at all (the preview releases, that is). I love university bandwidth; 640MB in about 1 minute. Sweet. I'll give it a try at home and see how it feels.
I've just finished reading the summaries of LISA '07 in the latest issue of ;login:. I feel…incredibly left out. I'm starting to think this profession might not be such a simple thing, you know, man? Sir? The presentations on autonomic computing have left me feeling a bit like a buggy whip maker with his nose to the grindstone.
And yes, it's a way off, and yes, small shops and generalists will probably be around for a while to come. But I'm not sure how much I want to keep being at a small shop. Which means learning the big stuff. Which, natch, is hard to do when you're trying to figure out how to properly test registration forms. Sigh.
But: I just stuck my head out a door at work and saw a chickadee. It chirped for a while, sitting on a tree near our building, then flew off. On a rare sunny day in Vancouver in February, after a week of messed-up sleep and feeling like I've been spinning my wheels, this is nice.
At work, our mail server is an aging E220R. While underpowered for all it does, it has behaved well, more or less, until recently.
A couple of months ago it power cycled itself for no apparent reason. This weekend, it did the same thing. This is exactly the same behaviour I saw from another E220R at $other_university, and in that case it got progressively worse. Another sysadmin here says he's seen the same behaviour with two in his care. I'm preparing for the worst.
Part of that has meant preparing to move its functionality to another machine; this has been an excellent chance to delve into the bowels of our mail and list system. I've been steadily improving (read: creating) this for some time now, but this points out some bits I hadn't. So that's good.
Plan C is a loaner E280R from the other sysadmin (op cit.). I ran into trouble getting it working, though. First, I couldn't get a serial console working. (Getting a serial port working always seems to be a pain for me, no matter what the machine.) It has two of the old DB-25 ports; no problem, since I had a splitter and had got that working on the E220R. Except that it didn't work: no matter which port I hooked it up to, I couldn't see any output. I tried flipping the key around to diagnostic mode, but I still didn't see anything. (The manual said that you should be able to force output to ttyA by power-cycling the machine and hitting the power button twice when the amber service LED started blinking…but I never saw the blinking.)
This was especially weird to me because I had been able to get output from the RSC card using the same setup: OpenBSD laptop -> usb serial adapter -> DB-9 to RJ-45 adapter -> Cat 5 cable -> RJ-45 on RSC card. (The only difference was that, with the DB-25 port, the Cat5 cable had fit into the back of the DB-25 splitter.) But I couldn't log into the RSC card, and a quick Google turned up no easy way of resetting its password. (Putting it into the other E280 I have, which runs our database and website, was not an option.)
Out of desperation I finally hooked up the Cat5 to the DB-25 splitter on one side, and the console server on the other…and that worked. Damned if I know what was going on.
But then I had another problem: when it booted, I kept seeing line after line of I2C reset error; after a while, it would power-cycle itself and the pattern would start again. I remembered that op cit. had slotted the second CPU for me, so what the hell: I reseated it, and that did the trick.
Next up is detaching $failing_machine's second hard drive from the mirror and seeing if I can get it to boot in the 280. Let's hope.
In other news, LinuxFest Northwest is calling for papers. Were that not right around the due date of Project U-14, I might try submitting something and see what happens. Oh well...next beer in Jerusalem!
And there's the laptop battery...shoulda charged it at work.
One of the things about pkgsrc is that it's very sensitive to paths and which compiler you use. (And fair enough; the whole process of bootstrapping a working set of tools for eight hundred thousand different OS' is ridiculous enough that it's a wonder it works at all. But I digress.)
Case in point: Solaris 10 machine today, installing pkgsrc on it for the first time. I successfully compiled gcc34, added GCC_REQD=3.4 to mk.conf, and then went to compile kile. During compiling of MesaLib, one of its 3.2x10^6 dependencies, I got this error during the final linking phase:
/opt/pkg/bin/libtool: ar: not found
Naturally it was there in my path, so WTF?
I eventually came across a message to the pkgsrc user's list which suggested rebuilding libtool-base. This made a certain amount of sense to me, as I'd built that package using the bootstrap (ie, not-installed-from-pkgsrc) version of gcc to compile it; it was before I figured out the GCC_REQD directive. So I ran:
$ pkg_delete libtool
$ cd /opt/pkgsrc/devel/libtool
$ bmake clean && bmake install
$ cd /opt/pkgsrc/graphics/MesaLib
$ bmake clean && bmake install
and everything was right again.
My lack of experience with LDAP in general, and Sun's (iPlanet|Directory Server( Enterprise Edition)?) in particular, has proven to be a bit of a handicap of late.
Case in point: when I upgraded $big_machine to Solaris 10 at the end of August, I also upgraded its LDAP server from iPlanet 5.1 to DSEE 6 (same software, different name). At the time I had two problems: I was unable to get replication to $big_server (we have a multi-master configuration; not supposed to work with 5.1, but it does/did for us) working over SSL, and replication from $big_server to other machines did not work. There were a lot of things going wrong at that point, so I set up replication in the clear from $little_machine, another LDAP server on the LAN, and left it 'til I had more time. It wasn't ideal, but it would do.
The last two Saturdays I've been trying to figure out why replication wasn't working. I concentrated on getting replication to it working over SSL. This was tough, because the logs didn't tell me much:
Server failed to flush BER data back to client
I swear, this turned up more Googlejuice today than it did a few weeks ago, because this time it turned up the ever-excellent Brandon Hutchinson again. This time he had a truly great set of instructions on installing DSEE6. That led me to this blog entry, very helpful, giving information about the different sorts of databases you can stick your SSL certs into. (Must learn more about SSL/OpenSSL…)
However, in the end it turned out to be a simple and moderately embarrassing mistake: it's not enough, with DS6, to say dsadm add-cert and be done with it; you actually have to specify the certificate to use. As Brandon points out, you have to edit dse.ldif in order to do so (though I had to stop the server, edit the file and start it up again, rather than just edit and restart, in order to get it to work).
The other thing — replication from $big_server elsewhere — is still not working. I suspect this is my fault; in an attempt to get things working, I decided that the thing to try would be initializing $big_server from $little_server, then the other way around. This did not change things, and now $little_server is unable to push its changes elsewhere. I've since been told this is a mistake on my part; arghh.
Unfortunately, there were other things I screwed up in the original install of DS6 on $big_server -- embarrassing and rather pointless to record for Google right now -- and I strongly suspect that I'm going to have to reinstall or reinitialize $big_server just to get things into a reasonably coherent state. Fortunately, there aren't that many changes that ever happen on it, so there shouldn't be many to lose or redo if it's wiped.
And thus my Saturday.
One of the problems I've been working on since the upgrade to Solaris 10 has been the slowness of the SunRay terminals. There are a few different problems here, but one of 'em is that after typing in your password and hitting Enter, it takes about a minute to get the JDS "Loading your desktop…" icons up.
I scratched my head over this one for a long time 'til I saw this:
ptree 10533
906 /usr/dt/bin/dtlogin -daemon -udpPort 0
  10445 /usr/dt/bin/dtlogin -daemon -udpPort 0
    10533 /bin/ksh /usr/dt/config/Xstartup
      10551 /bin/ksh -p /opt/SUNWut/lib/utdmsession -c 4
        10585 /bin/ksh -p /etc/opt/SUNWut/basedir/lib/utscrevent -c 4 -z utdmsession
          10587 ksh -c echo 'CREATE_SESSION 4 # utdmsession' >/dev/tcp/127.0.0.1/7013
which just sat there and sat there for, oh, about a minute. So I run netcat on port 7013, log out and log in again, and boom! quick as anything.
/etc/services says:
utscreventd 7013/tcp # SUNWut SRCOM event deamon
which we're not running; something to do with smart cards. So why does it hang so long? Because for some reason, the host isn't sending back an RST packet (I presume; can't listen to find out) to kill the connection, like it does on $other_server.
So now I'm trying to figure out why that is. It's not the firewall; they're identical. I've tried looking at ndd /dev/tcp \? but I don't see anything obvious there. My google-fu doesn't appear to be up to the task either. I may have to cheat and go visit a fellow sysadmin to find out.
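In the meantime, a way to compare the two servers without sitting through a login each time. This is a rough sketch that uses the same /dev/tcp trick utscrevent does, so it assumes a shell that emulates /dev/tcp (bash or ksh93); probe_port is a made-up helper name.

```shell
# Hypothetical helper: try a TCP connect the same way utscrevent does, via /dev/tcp.
# Needs a shell that emulates /dev/tcp (bash, ksh93); elsewhere it always reports "closed".
probe_port() {
    # The subshell owns fd 3, so the connection is dropped as soon as it exits.
    if ( exec 3<>"/dev/tcp/$1/$2" ) 2>/dev/null; then
        echo "open"
    else
        echo "closed"
    fi
}

# Usage: time probe_port 127.0.0.1 7013
```

An immediate "closed" suggests the RST came back; a "closed" that takes a minute to show up suggests the connection attempt is being left to time out, which is the behaviour described above.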
And what do I see on Ben's blog but the new version of Solaris out — 8/07, not two weeks after this fiasco. Craptastic!
Some fun Emacs stuff:
I had a meeting with my boss at work last week (before a nice four-day weekend…the split schedule I've got means that sort of thing happens very rarely. But I digress) to set my priorities now that the upgrade has more or less been finished (lingering issues aside; see ahead).
One of the big things is getting Zimbra set up. This will be nice; we do not have a calendar for the office right now, and this is getting to be a pain. My boss is open to the idea of something that's not Outlook/Exchange, and that's good.
The other thing is getting a bunch more Windows machines in. This is a small shop, so "a bunch" means another 15 or 20 -- which'll double the number we have. I'm not entirely happy about that, but because this is a longer-term project I've been given time to do this right. And to me, "right" means "using open-source tools whenever possible to manage Windows". Thus, I'll be getting the time to set up Unattended and wpkg, and possibly even digging up Windflower and seeing if it's worth continuing. I'm actually kind of excited about this.
It's a little strange having a manager take this much of a hand in setting priorities; I've worked in a series of small shops and, up 'til now, have been left more or less on my own nearly the whole time. It does feel good to get a bit of direction, though. I mean, I know what needs to be done and I'm doing it, but I've always felt a bit lost trying to decide what's most important for everyone once past the finger-in-the-dike stage.
Now to go try and get Multi-TTY working on this laptop…
Ack: Just realized I never described the lingering problems with Solaris 10. Fairly simple to describe: LDAP lookups take 'way longer than they should (ls -l /home/ can take 5 seconds per line sometimes), and JDS on the SunRays is slower in parts than it should be (click on the logout button, wait 60 seconds, message pops up saying "Are you sure you want to log out?"). I'm hopeful I can track those down without too much effort…
Saturday I upgraded the big machine at work to Solaris 10 11/06. This did not go well.
First off, I ended up installing onto a disk that held home directories. The install was a manual one, and I'd carefully noted in advance the disk I'd be installing to: the second internal hard drive, the one I'd tried doing the luactivate on a couple weeks ago.
Only the disk targets/names/whatever changed, and so c1t0d1 (say) was now one of the home partitions mounted from the external StorEdge array. Fuck. There were backups: I'd taken a backup before starting the install. Unfortunately, they were taken 3 hours before the install started, and during that time the machine had been up and running. The install started at 8am, so I'm hopeful there wasn't too much lost between 5am and 8am. But don't think I'm trying to minimize that mistake.
Second, I'd also managed to bork the disklabel for the original Solaris 9 install. I dug up the original disklabel somewhere — it wasn't in the documentation we've got, and I should have put it in there a long time ago — and restored everything to the way it was. It hadn't been formatted, so everything was okay.
Third, when it came up only one of the three external drives from the StorEdge was present, and I could not figure out where the others had gone. (It took me a while to figure this out; when first I realized my first mistake, I thought I'd installed over all the home directories. That was an awful moment.)
It took a lot of Googling to figure out what I should have already known about Solaris in general, and what should have been documented about this machine in particular: that /kernel/drv/sd.conf had been modified to add additional entries for LUNs that otherwise Solaris wouldn't have looked for.
(Many thanks to Brandon Hutchinson, whose entry on this very subject saved my butt. I wrote him a grateful email, and I wish him the best.)
(Incidentally, a reconfiguration reboot on a V480 takes between 10 and 25 minutes. It's not a fast process. Also not a fast process is installing Solaris patches; I spent at least two hours on this all told, not counting reconfiguration reboots.)
I restored the one home directory (having recreated it in ZFS…one bright spot in all that) and mounted the others. All this got me, at 6pm, where I should have been at noon.
I was there 'til 11:30pm on Saturday fixing things up to the point where it was more or less ready for SSH-based logins. Then I took a cab home. Then I came in yesterday at 10am and got almost everything else working: SunRays (oh, the new desktop is beautiful), printing, software, and I can't even remember what all at this point.
I took lots of notes and did everything from within screen with logging turned on. (Bonus points for next time: set the prompt to show the time, so I can tell what order I did things in.) I'll be going over all of it to do things better next time.
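For those bonus points, something like this would do. It assumes bash is the shell running inside screen, which isn't stated above; other shells spell their prompt escapes differently.

```shell
# Assumes bash: \d is the date (e.g. "Sat Aug 04") and \t the time as HH:MM:SS,
# both re-evaluated every time the prompt is drawn.
PS1='[\d \t] \u@\h:\w\$ '
```

Every command in the screen log then carries the moment its prompt appeared, which is enough to put things back in order afterwards.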
Here's some stuff I already know:
Backups. It's said you never know how much you need 'em 'til you need 'em. True 'nuff.
DOCUMENTATION. I spent a good part of yesterday getting information on every disk while waiting for other software to install. I should have done this long, long ago.
(Incidentally, on that front I owe Blastwave an apology: right on the goddamn HOWTO page there's a section on automation. My mistake. But I still don't like the fact that the remove option (-r) is undocumented, and presumably undocumented because of the warning it prints that it's not very smart and shouldn't be used.)
Know what you're dealing with. The home partition I erased was bigger than the disk I expected to install on, but I wasn't sure of its size.
Stop if you're not sure. I should have stopped at the last point.
Be paranoid. Usually I am, but it would have been good to disconnect every superfluous drive rather than go through all this hell.
Sometimes it really amazes me that I get paid to do this work because it's so much fun. And sometimes I'm amazed because I figure I shouldn't be allowed to touch computers with a ten-foot pole.
I'm feeling pretty damned humble this morning. With luck that feeling will stay.
I've complained about Blastwave before, but this is just terrible.
Trying to install VLC on a Solaris 10 machine using Blastwave. Says that CSWcommon is out of date, so please run pkg-get -u. As this always includes thousands of prompts that look like this:
The following package is currently installed:
CSWoldapclient openldap_client - OpenLDAP client executables (oldapclient)
(sparc) 2.3.31,REV=2007.01.07
Do you want to remove this package? [y,n,?,q] y
## Removing installed package instance <CSWoldapclient>
## Verifying package <CSWoldapclient> dependencies in global zone
WARNING:
The <CSWoldap> package depends on the package currently
being removed.
Dependency checking failed.
Do you want to continue with the removal of this package [y,n,?,q]
...I look around for a way to automate this. And surprise, there is, and I've missed it the whole time. My bad. So: pkg-get -f upgrade it is, then.
It runs for 45 minutes and stops with an error about CSWcommon:
Current administration requires that a unique instance of the
<CSWcommon> package be created. However, the maximum number of
instances of the package which may be supported at one time on the
same system has already been met.
Hm, sez I. That's strange, but maybe that's what it's like for package managers that suck. pkg-get -r common and pkg-get -i common, and I'm ready for the upgrade again.
Somehow in the process I managed to remove the pkg_get package, which (surprise) contains the pkg-get command. Fortunately I have a backup copy around and use that to install pkg_get. Life continues.
And it's not for another 15 minutes after that that I notice that the package manager is going in loops. It keeps going over the same packages again and again, giving the same error about unique instances each time. A quick search turns up this link, which tells me I'm a fool for believing the help offered by pkg-get:
$ pkg-get -h
pkg-get, by Philip Brown , phil@bolthole.com
(Internal SCCS code revision 3.6)
Originally from http://www.bolthole.com/solaris/pkg-get.html
pkg-get is used to install free software packages
pkg-get
Need one of 'install', 'upgrade', 'available','compare'
'-i|install' installs a package
'-u|upgrade' upgrades already installed packages if possible
'-a|available' lists the available packages in the catalog
'-c|compare' shows installed package versions vs available
'-l|list' shows installed packages by software name only
Optional modifiers:
'-d|download' just download the package, not install
'-D|describe' describe available packages, or search for one
'-U|updatecatalog' updates download site inventory
'-S|sync' Makes update mode sync to version on mirror site
'-f' dont ask any questions: force default pkgadd behaviour
Normally used with an override admin file
See /var/pkg-get/admin-fullauto
'-s ftp://site/dir' temporarily override site to get from
and that the correct way to do what I want is to run:
true | sudo pkg-get upgrade
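Why piping from true helps: presumably each prompt hits end-of-file on stdin and falls back to its default answer instead of waiting for me. That's my reading, not something I've checked in the pkg-get source. A toy illustration, assuming a POSIX shell; ask is a made-up stand-in for an interactive prompt.

```shell
# Toy stand-in for an interactive prompt; not how pkg-get or pkgadd is actually written.
ask() {
    printf '%s [y,n,?,q] ' "$1" >&2
    if read -r answer; then
        echo "$answer"      # someone (or something) typed a reply
    else
        echo "default"      # stdin hit end-of-file: fall back to the default
    fi
}

# true | ask "Do you want to remove this package?"   # takes the default branch
# yes  | ask "Do you want to remove this package?"   # answers "y"
```

The difference from the yes approach is that true never answers anything; it just closes the pipe, so whatever the tool considers its default is what gets used.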
I admit that I neither knew nor sought to find out what "default pkgadd behaviour" would be, so that's my fault. I admit that I was the one who borked things by removing the pkg-get command. I admit that I did not think to record all of this with script, so at the moment I'm going on scribbled notes and memory. This is not a bug report, which is what I really should be writing. These are all things I did wrong or badly.
But isn't this what apt has fixed? On its worst day, I've never had to set up yes to be the drinking bird that would let me get stuff done. And -- when all was done, and I got to go back to installing VLC -- I've never had it depend on gcc.
Arghh. Arghh arghh arghh.
The upgrade to Solaris 10 did not work. The main problem was that logging in at the console (even as root!) simply would not work: I'd get logged right back out again each time, with no error message or anything. WTF?
I managed to go into single-user mode, provide the root password (see? they do trust me) and get access that way. But I still couldn't figure out what was going wrong. Eventually I came across this entry in the logs:
svc.startd[7]: [ID 694882 daemon.notice] instance svc:/system/console-login:default exited with status 16
And /var/svc/log/system-console-login:default.log said:
[ Aug 4 14:23:48 Executing start method ("/lib/svc/method/console-login") ]
[ Aug 4 14:24:05 Stopping because all processes in service exited. ]
Eventually I had to give up and revert back to Solaris 9. That part worked well, at least.
I've no idea what went wrong at this point, but since I haven't come across this before with other Solaris 10 installs I'm starting to wonder if it's a product of luupgrade attempting to merge the machine's current settings with Sol10. Between that suspicion and the increase in disk space needed to run luupgrade (not sure why, but for example /usr needed a couple extra GB of space in order to complete luupgrade; I presume something's being added or kept around, but there's no explanation I can find for this), I'm starting to think that just going with a clean install of Sol10 is the way to go.
Arghh. Live Upgrade was supposed to just work.
I'm running Solaris Live Upgrade at work to upgrade our main server from Solaris 9 to Solaris 10, and one thing I haven't seen mentioned in all the things I've read about it is how long it takes.
Right now, for example, I'm running luactivate to activate the new boot environment. It's been running for half an hour now, with no indication about how long it's going to take. If I'd known it would take this long, I'd have scheduled it for earlier this morning. And yeah, it would've been obvious if I'd thought about it...
Shet my mouth, it just finished after 38 minutes. For the record, this is on a V480/ 16GB of RAM, and call it 50GB total of disks to be synced.
Which is very timely, as I'm trying to track down why nscd door access is taking so long: http://au.sun.com/news/onsun/2002-11/tech_tips.html
While trying to figure out how to get a colour printer to print colour (HP: why the hell would you turn off colour in the PPD for your colour printer? Huh?), I came across this very cool post from Martin Paul, the guy who wrote pca, the best damn Sun patching tool I've come across.
Turns out you can take a new version of printer firmware for your HP printer and print the damned thing to your printer to update it. In particular, he mentions the 79.00FE problem that has plagued me for a while; I'll have to give it a try.
Oh, and the PPD thing -- for the record, there's a new HP 4700dn in town. I'm adding it to Solaris 10, which once you figure out how to do it is relatively simple:
# -v sets the device; /dev/null because the real destination is given with -o dest below
lpadmin -p NewPrinter -v /dev/null -m netstandard_foomatic
lpadmin -p NewPrinter -I PostScript -n /path/to/ppdfile
lpadmin -p NewPrinter -D "HP 666 in Room 212"
lpadmin -p NewPrinter -o dest=newprinter:9100 -o protocol=tcp -o timeout=5
cd /etc/lp/fd
# Register each filter under its own name, taken from the .fd file
for i in *fd ; do name=`basename $i .fd` ; lpfilter -f $name -F $i ; done
accept NewPrinter
enable NewPrinter
Simple, that is, if HP haven't gone and stuck a stanza like this into the PPD on the CD:
*%
*% Print Color as Gray
*% Chose NOT to use Adobe's *ColorModel keyword because color on or off is simpler
*%
*OpenUI *ColorModel/Print Color as Gray: Boolean
*OrderDependency: 20 AnySetup *ColorModel
*DefaultColorModel: CMYK
*ColorModel CMYK/Off: "<</ProcessColorModel /DeviceCMYK>> setpagedevice"
*ColorModel Gray/On: "<</ProcessColorModel /DeviceGray>> setpagedevice"
*?ColorModel: "
  save
    currentpagedevice /ProcessColorModel get
    /DeviceGray eq {(True)}{(False)}ifelse = flush
  restore
"
*End
*CloseUI: *ColorModel
Took a while to track that down. Yes, I could've used one of the other PPDs on the machine — pretty generic colour Postscript, really — but then they didn't know about the duplexer. And I have to admit this makes it easy to set up a b&w-only queue.
From Bruce Schneier's newsletter comes this blog entry suggesting that there simply aren't that many serious spammers. Interesting data.
Managed to get the Perl/PHP parser extended so that it would see
nested PHP arrays and translate them to the proper hash/array
references in Perl. It was good to do that, but then other problems
arise — like the fact that, as the parser stands right now, it simply
stops parsing if it finds something it doesn't understand. This could
be something like a comment in a nested array, or something like if ($debug == 1) { $foo = "bar"; } else { … }.
Again, I'm concluding that this would all be much, much easier if it was in a database…just have PHP and Perl suck out the data and do what they want. Either that, or just start writing everything in Perl…
Update: Also, this is not what I expect to see at the top of Planet Solaris — though maybe this should've prepared me. Rockwood's coworker's post is worth reading too.
Update2: Just for completeness, I'll mention that Ben's updates and comments are also worth reading. That's it from the Obvious Dep't.
I think I'm going to have to end my experiment with Nexenta.
I've been running it for a couple months now on my desktop machine, and for the most part it does everything I'd want it to do. Sound doesn't work (built-in Intel chipset, 945 I think), but I haven't really looked into it too hard; the screen resolution keeps changing back to 1400x1200 for me in IceWM, but again I haven't really looked into it too hard. Firefox runs fine, xterms work, Emacs is there, and since it's a 2.8GHz P4 (cf. the 500MHz P3 I was running before), it's all ver' fast.
But when I started using it, I had visions of helping get it released; there are 90 bugs to knock down, and I could help with that. I can — I did (a little) — but with a 9-month old kid to help take care of, my time is et up pretty damn quick. A couple of hours on the weekend is the sum of my spare time right now, and that's for everything.
Why do I mention that? Because OpenSolaris needs a lot of learning, and Nexenta/GNU/Solaris needs a lot of work to get a beta release out the door. I thought I'd learn dtrace; I thought I'd knock down a half-dozen bugs while a growing community joined in.
That turns out not to be the case. And it's a damned shame, and I'm not helping matters any by giving up. I love the idea of Solaris + Debian. I'd like to see it up and running and grabbing people's attention and all the rest, but it's just not happening right now.
And so, in a few minutes I think I'm going to install Debian on here. It has what I want and lots more. There are plenty of people involved in the project, covering for my slacking. I'll be running testing, 'cos I've been doing it so long that it seems foolish to stop now :-). When I finally get around to upgrading the server that prompted this digression, I'll make it stable. I'll probably replace SuSE at work with stable, too.
And now for teh big finish...
$ pkg-get foo
Can't install foo; need newer version of libbar.
$ pkg-get libbar
No such package.
$ pkg-get -a |grep bar
Barlib The libraries of bar.
pkg-get -u foo does work. Supported? Who knows? (Edit: corrected name of Blastwave's package manager. So much for the moral high ground…:-)
Last month, my work got a new H.323 video conferencing unit, and today we had our first real test: a lecture given at SFU that was streamed to us. For the most part, it went really well; there were no big screw-ups and everything went as planned. During the second half of the conference, though, the audio was intermittently choppy. I'm not certain, but I think that a local user's Internet radio stream may have caused the problems.
If that's the case — and it would surprise me, since I'd assumed we
had a pretty damned fast connection to the Internet — then I'll need
to start adding traffic shaping to our firewall. Working on the
firewall is something I've been putting off for a while, since it's a
bit obscure…lovely pf firewall, littered through with quick
rules. But there's a good tool for pf unit testing I've been meaning
to try out since I heard about it at LISA. Probably won't be as
big a help with the traffic shaping stuff, but at least I'll be
reasonably sure I'm not screwing anything else up.
And now I'm wondering just how hard it would be to come up with (handwave) something that would combine automatic form generation, web-based testing code and summary code. We have these multiple conferences that need registration pages; while some of the information is the same (name, email address) some is different (one conference has a banquet, another wants to know if you're going to be attending all three days). Putting all this in a database and using something like Formitable to generate the form seems perfect.
Since I'm already using Perl's WWW::Mechanize and Test::More to test the pages, it'd be nice to have it look at the stuff used to generate the form and use that to test the page. (That's not the clearest way I could put that, but if I don't write this down now I'll never write it down.) And if I could add something that'd automatically generate summary pages for conference organizers, it'd be even better; stuff like email and address is always easy, but being aware of special questions would be nice too. (Though maybe not necessary…how hard is it to generate summary pages?)
Trouble is, this is a lot of deep thinking that I've never really had to do before. I suspect this sort of thing is a good programmer's bread and butter, but I've never been a programmer (good or otherwise). The more I think about this, the more I can't decide whether this is really hard, possible but too much effort to be worth it, or already done by something I haven't come across yet.
The little things I can handle, though. This crash looks like
it's happening because of a mixup between rand(3) and random(3). In
Linux, both have a maximum of RAND_MAX, but in Solaris the latter has
a maximum of 2^31 - 1, far above RAND_MAX there. This wreaks havoc
with the let's-shuffle-the-playlist routine in XMMS, and we end up
with a crash. Once I figure out how to program in C, it shouldn't be
too hard to get it fixed. :-)
Now that pkgsrc is working on the main Solaris box at work, I've been
trying to compile Kile, the lovely KDE-based LaTeX editor. KDE,
of course, brings in lots of other stuff; in other situations there
are probably ways of keeping things out (this is where Portage's USE
flags are nice), but as far as I could tell there was no way of
keeping, say, libogg out of the KDE build.
But the annoying part is when the build of KDE stuff failed because
OpenEXR, ILM's open-source high dynamic range graphics file
format, failed to compile. And why? Because hypotf isn't defined,
along with a bunch of similarly-named functions (atanf, cosf, sinf
and so on). I tried throwing -lm into LDFLAGS, but that did nothing.
Some digging around in include files on a couple different machines turned up the problem: these functions were added in Solaris 10, and thus are not present on Solaris 9. I haven't been able to find any mention of this problem yet, at least for OpenEXR and/or pkgsrc; I'm hoping that there will be some other way of making this work.
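The digging can be boiled down to something like the following, run on each machine in turn. It's a rough sketch, assuming a POSIX shell and that a name showing up anywhere in a header is a good enough sign the function is declared; the function name declared_in is made up.

```shell
# declared_in HEADER_DIR NAME...: for each NAME, say whether any .h file
# under HEADER_DIR mentions it. This is a plain substring match, so
# "sinf" will also match "asinf"; good enough for a first look.
declared_in() {
    dir=$1
    shift
    for fn in "$@"; do
        hits=$(find "$dir" -name '*.h' -exec grep -l "$fn" {} \; 2>/dev/null)
        if [ -n "$hits" ]; then
            echo "$fn: found"
        else
            echo "$fn: missing"
        fi
    done
}

# Usage, on each box you want to compare:
#   declared_in /usr/include hypotf atanf cosf sinf
```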
Solved a ghostscript problem at work yesterday; not a big deal in itself, but I'd always had this impression that GS crashes were dark, nasty, impenetrable things that I could not possibly understand. I mean, c'mahn, look at this error:
$ ps2pdf report06w5060.ps
Error: /invalidfont in findfont
Operand stack:
Fi 87 --nostringval-- 55 45 --nostringval-- 65 74
74 111 74 83 46 65 65 83 83 83 83 120 46 2
--nostringval-- 4
6 83 83 46 74 83 74 83 83 12 --nostringval-- 92
83 101 1 --nostringval-- 101 120 1 --nostringval-- 138
4 --nostringval-- 120 120 101 101 120 111 101 101 19
--nostringval-- 55 42 1 --nostringval-- 83 2
--nostringval-- 55 35 --nostringval-- 83 83 2 --nostringval-- --nostringval-- 45 166.044
Times-Italic Font Times-Italic 496086 Times-Italic
--nostringval-- Times-Italic NimbusRomNo9L-ReguItal (NimbusRomNo9L-ReguItal)
NimbusRomNo9L-ReguItal (NimbusRomNo9L-ReguItal)
NimbusRomNo9L-ReguItal
Execution stack:
%interp_exit .runexec2 --nostringval-- --nostringval--
--nostringval-- 2 %stopped_push --nostringval--
--nostringval-- --nostringval-- false 1 %stopped_push 1 3 %oparray_pop 1 3 %oparray_pop
1 3 %oparray_pop 1 3 %oparray_pop .runexec2
--nostringval-- --nostringval-- --nostringval-- 2 %stopped_push --nostringval--
--nostringval-- 74 4 %oparray_pop 75 4 %oparray_pop
--nostringval-- --nostringval-- --nostringval-- --nostringval-- --nostringval-- false 1
%stopped_push 78 5 %oparray_pop --nostringval--
--nostringval-- --nostringval-- 5 -1 1 --nostringval-- %for_neg_int_continue
--nostringval-- --nostringval--
Dictionary stack:
--dict:1046/1123(ro)(G)-- --dict:0/20(G)-- --dict:75/200(L)--
--dict:103/300(L)-- --dict:17/17(ro)(G)--
--dict:1046/1123(ro)(G)--
Current allocation mode is local
Last OS error: 2
Current file position is 95763
AFPL Ghostscript 8.00: Unrecoverable error, exit code 1
Then, in desperation, I JFGI and found the problem: for some
reason, the fonts had disappeared. This is an old install with lots of
overlapping installs of everything, so it's hard to tell why it
might've happened. However, it should just be a matter of either
getting rid of the old install (rm /opt/bin/gs* (and yes, I know
that's bogus)) or setting GS_FONTPATH and GS_LIB appropriately. (Or
figuring out why they got borked…hm.)
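Setting the two variables for a single run might look like this. It's a sketch only: GS_FONTPATH and GS_LIB are the real Ghostscript environment variables, but both directories below are placeholders and the wrapper name gs_with_fonts is made up.

```shell
# gs_with_fonts CMD ARGS...: run CMD with Ghostscript pointed at its
# fonts and library files. Values already set in the environment win;
# the fallback directories are hypothetical, so substitute wherever
# your install really keeps them.
gs_with_fonts() {
    GS_FONTPATH=${GS_FONTPATH:-/opt/share/ghostscript/fonts} \
    GS_LIB=${GS_LIB:-/opt/share/ghostscript/lib} \
    "$@"
}

# Usage:
#   gs_with_fonts ps2pdf report06w5060.ps
```

Wrapping one command rather than exporting globally keeps the experiment away from whichever other gs installs are lurking on the box.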
OTOH, on the same machine I've got The Case Of The Missing Java:
$ java
There was an error trying to initialize the HPI library.
Please check your installation, HotSpot does not work correctly
when installed in the JDK 1.2 Solaris Production Release, or
with any JDK 1.1.x release.
Could not create the Java virtual machine.
instead of (same version of Solaris, too):
$ java
Usage: java [-options] class [args...] (to execute a class)
or java -jar [-options] jarfile [args...] (to execute a jar file)
which kind of worries me since it's, like, Solaris and all, and java really should be working. Sigh.
There is nothing worse than a problem that goes away once you restart
the program. Case in point: ls -l /home ran atrociously slow
(slowly?) on a Solaris 10 machine at work today. It's running Sun's DS
5.2 (or whatever they're calling it these days).
I've come across this problem before when I was trying to figure out how to get the thing to bind to itself by default as an LDAP client, rather than to one of the remote servers that're meant to be backups.
This time, though, that simply wasn't the problem: no traffic was
going to the other machines at all. All I saw was looooooooooong
lookup times for simple passwd stuff. Error logs showed
nothing. Access logs swore blind that access times were on the order
of zero nanoseconds. Truss showed it kept mmap()ing things; dtrace
showed a whole lotta reads. I couldn't figure out more than that
(which, natch, is my fault, not the tools).
In the end and out of desperation I restarted the server…which did the trick but left me frustrated that I'm no closer to figuring out what's going on with the damn thing.
Thank you to our sponsors for the title.
Good news: I'm going to LISA! I convinced my employers to heavily subsidize my trip. I've booked a double room at the hotel; I'll be posting to the roomshare mailing list shortly, but feel free to comment or email if you wanna split the cost.
Bad news: I somehow borked X on my desktop at work yesterday. The symptoms are quite strange, and mostly involve not being able to click on a window and have focus move there. It's IceWM, and I haven't changed focus model, and the symptoms persisted over multiple restarts of KDM (ctrl-alt-backspace). I looked for open files, running processes and even removed .gconf* and .gnome* on principle; nothing. The only thing that was different was running, for the first time, the new(ish - 1.5.0.2) version of Firefox after downloading it from the Mozilla site. The machine is running SuSE 10, and for various reasons I can't update it right now. In the end, I got desperate enough to try a reboot, and of course that fixed it...which is NO FUCKING WAY to solve problems, dammit.
(Interesting how this pokes holes in my manly command-line-only stance; yes, I was able to get some work done by going to the console, but frankly I've become very very used to managing terminals and a browser with IceWM and it's hard to switch back. Damn.)
Weird news: A while back I came across a problem with a Solaris 10 machine: lpq just hung, and eventually timed out with an error (that I haven't written down, so I suck). Eventually figured out it was trying to contact the lpd service on the machine's main interface (handwave goes here about BSD-compatibility printing commands), which should've been run by inetd. Okay, but inetd is now taken care of by inetadm and svcs, not /etc/inetd.conf anymore. And while the command is called in.lpd, the service is actually called svc:/application/print/rfc1179. Which is in maintenance mode, so I start it up, only it doesn't start and I cannot figure out why: no log files I can see (the scattering of log files in a default Solaris install is really driving me nuts), no reason given, nothing. I ask another sysadmin who admits he's stumped by it but just for fun tries putting in an entry in /etc/inetd.conf and then running inetconv, the way you're not supposed to have to do except for weird legacy stuff that hasn't been moved to svcs yet. And damnitall, it works. Again, no idea why.
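For next time, the usual SMF places to look can be wrapped up like so. This is a sketch, not a diagnosis of what went wrong here: svcs -xv, inetadm -l and svcadm clear are standard Solaris 10 commands, but the two function names are made up, and whether the output names a useful log file for an inetd-run service is an assumption.

```shell
# explain_service FMRI: ask SMF why a service isn't running.
explain_service() {
    fmri=$1
    svcs -xv "$fmri"      # explanation of the state, usually with a log file path
    inetadm -l "$fmri"    # inetd-side properties, if inetd manages the service
}

# retry_service FMRI: clear maintenance state and show the result.
retry_service() {
    svcadm clear "$1"     # leave maintenance; SMF tries the service again
    svcs "$1"             # one-line state afterwards
}

# Usage:
#   explain_service svc:/application/print/rfc1179
#   retry_service svc:/application/print/rfc1179
```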
And that is it for now. I am tired beyond belief, having moved up my annual snifter of port from Xmas to go out with coworkers last night. I stopped drinking at 7pm and I'm still tired today. Pathetic. Arlo would be so disappointed in me.
Trying to NFS mount something (Solaris client and server) and
getting error 7 (RPC: Authentication error)? Check /etc/nsswitch.conf
on the server and make sure that things like auth_attr, netgroups and
so on are not set to use (say) ldap [NOTFOUND=return]. Doubly so if
you've just run ldapclient -v init the night before and forgot that,
surprise! it changes nsswitch.conf.
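A quick way to spot the offending lines, sketched as a shell function. The name ldap_return_entries is made up, and it only looks for the literal ldap [NOTFOUND=return] pattern mentioned above, so other troublesome combinations will slip past it.

```shell
# ldap_return_entries [FILE]: print the uncommented lines of an
# nsswitch.conf (default /etc/nsswitch.conf) that contain
# "ldap [NOTFOUND=return]". Prints nothing if there are none.
ldap_return_entries() {
    grep -v '^[[:space:]]*#' "${1:-/etc/nsswitch.conf}" |
        grep -i 'ldap[[:space:]]*\[NOTFOUND=return\]'
}

# Usage, on the NFS server:
#   ldap_return_entries
```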
Starting up OpenOffice on Solaris 9/Gnome only to see gibberish in
the title bar rather than the file name? Log out, and in the
whatever-DM login screen go to Language and select C - POSIX. Log
back in. Works like a charm!
These Top Tips brought to you by the number Pi and the beverage Beer.
So Pouxie, my new OpenSolaris box, started displaying the same let's-shut-down-randomly-'cos-it's-Friday problems it previously did -- guess it's not the case after all. No problem, 'cos I happen to have a spare mobo and CPU that I've been itching to try out.
As it happens, it's got an onboard Intel ethernet interface which is
detected just fine (iprb0, thank you) by Belenix/OpenSolaris, but
fails to be brought up properly during boot. The problem is that while
the interface is assigned an IPv4 address, it's not actually up,
which means that adding the route fails, and
/lib/svc/method/net-physical (which surprised me by being a simple
shell script) declares failure. (I think it's just the route command
that fails, but I should check this out.)
No idea why this happens on iprb0 and not nfo0, but what the
hell. Looking around the script shows that it does do ifconfig plumb
up on IPv6 interfaces -- but when I tried touching
/etc/hostname6.iprb0 and running the script again (yeah, I know,
probably a horrible thing that makes Bill Joy cry) it created a
duplicate iprb0 interface with only an IPv6 interface. It was up,
the IPv4 version was still down, and the IPv4 route command failed.
In the end I just edited the script to make it run ifconfig plumb up
like it does with IPv6, and it seemed to do the trick just fine. I'm
currently trying to see if there's a similar bug already filed on
OpenSolaris.org; looks like I have a lot of slogging.
In other news, I thought I'd be posting this using BlogFS, but
I'm running into library problems. First, I had to change import
xmlrpc to import xmlrpclib. No biggie, even I can do that, but now
I'm getting this when I try to create the directory that would mount
the blog:
# mkdir foo:bar@saintaardvarkthecarpeted.com/blog/xmlrpc.php
mkdir: cannot create directory `./foo:bar@saintaardvarkthecarpeted.com/blog/xmlrpc.php': No such file or directory
Not sure what's going on.
In preparation for my new job, I've installed OpenSolaris on Pouxie, my wife's old desktop machine (a nice 2GHz Athlon). I've used Belenix, a live CD that includes a driver for Pouxie's onboard NForce ethernet interface.
So far I'm having a lot of fun. It took me three hours (spread over four days...damn this commute) to get a static IP address assigned to the thing, and then to get DNS working. But after a reinstall (a newer version of Belenix had come out that included the Sun packaging tools, which should let me use Blastwave to grab Emacs...a good first project, I think), I had it up and running in just a few minutes. Progress!
For those playing the home game, here's what I had to do:
modinfo | grep nfo: yep, the module has been loaded.
ifconfig -a | grep nfo0: Not there.
dladm show-link: But it is here.
echo "192.168.23.40 pouxie-2" >> /etc/inet/hosts
echo "pouxie-2" > /etc/hostname.nfo0 ; echo "netmask 255.255.255.0" >> /etc/hostname.nfo0
echo "192.168.23.254" > /etc/defaultrouter
reboot -- -r: to get Solaris to find the new interface (?)
ifconfig -a: Now it shows up configured.
svcadm disable svc:/network/inetmenu: Otherwise, it interferes with the change to nsswitch.conf I'm going to do up ahead.
svcadm enable svc:/network/dns/client: I long to know what this actually turns on.
cp /etc/nsswitch.dns /etc/nsswitch.conf
echo "nameserver 192.168.23.254" >> /etc/resolv.conf
ping www.saintaardvarkthecarpeted.com: It's alive!
Happy birthday, OpenSolaris!