Median of 2 sorted arrays

May 4th, 2008

A nice young man wrote about working on an algorithm problem after being inspired to think thusly by his new copy of The Algorithm Design Manual. Coincidentally, I recently received my copy of the book as well. I was less inspired than Mr. Torreborre, probably because I’m very lazy. I thank him for presenting his interview question, because that got me to thinking yesterday morning. How would I find the median element of the merger of two sorted lists without actually doing a (linear) merge?

I’m now typing in what’s been going through my head, mostly while cooking breakfast for my kids. I contemplated trying to explain it to them, but I decided against it. Over the course of the explanation they would certainly concoct a variety of theories about why I was trying to punish them in such a strange and tedious way. I haven’t gone to search for a result in the book or on the web; maybe that’ll be obvious after you read this. I’ll check later.

First, I’ll say the median element of a sorted list afirst … alast is afloor((last-first+1)/2). In other words, it’s the element right in the middle (rounding down - arbitrarily - if the list has an even number of elements). So for this problem, I’ve got two lists, a and b, and I want to know what element would be smack in the middle of the list resulting from their merger.

I went around in confused circles for a while before hitting upon (what I think to be) a good way to view the problem. I know what the size of the merged list would be - it’s the sum of the sizes of a and b. Thus I know exactly where the median value will be in the result. However, all I know about the result list is that it’s a muddle of values from the two original lists. For any index i in the merged list, all I can say is that the value will be from either a or b.

Now, there are a few cheap things I can do to examine my lists. Looking at an element at a given position is cheap - well, at least, I’m going to declare it to be cheap. So, for example, I can look at the median of either source list, or at the first element, or the last. Another thing I can do, less cheaply, is to find where a number would go in one of the source lists. That’s an O(log2 n) operation (with n being the list length).

If I think about a couple of ordered lists of numbers (with no known bounds on the values in the lists), it’s clear that one list may have values larger than any value in the other, or smaller than any value in the other. If I take at the smaller of the two last values in my source lists (that is, alast and blast), and then find where it would go in the other list, I now know something really interesting: I know exactly what values occupy the positions at the end of the merged list! That is, if alast is smaller than blast, and I find that it would 100 numbers down from the last element of b, then I know that the last 100 elements of the merged array have to be the last 100 elements of b. Of course I can make the same discovery at the bottom ends of the lists.

In the degenerate case, one list might contain nothing but values completely beyond the end of the other list. In that case, I can immediately find the median of the combined lists because the merge is easy. If it’s not, then with 2 log n operations I can snip off some of both lists. Now, some portion of the low and high ends of my hypothetical merged lists are no longer muddled - I know that those portions contain values from one of the two lists. In fact, I can now see what my goal is: I need to get to the point where the median slot - the position smack in the middle of the hypothetical merged list - is not muddled.

So now I have two sublists of the original lists, which represent the “muddled middle” of the hypothetical merged result. Hmm. I’m not liking this approach, because I’m not lopping off sufficiently big chunks of the problem. Well, they might be big, but I’m not forcing any bigness; I’m just whittling the ends of the lists down, and the rapidity with which the lists get smaller is entirely dependent on the values in the lists. I need to get a little more radical. (I do still like checking to see whether one list lies completely beyond the other, however, at least for now. It’s cheap to do.)

Another thing I can do cheaply is look at the median of one of my lists, and then I can see where it’d go in the other list. If I do that, then I will definitely lop off half of one list. I’ll still have a muddled result, except now I know a little bit more: I now have four lists, not two, and by looking at the combined sizes I’ll know that two of those can be forgotten. Now that’s looking good, because on every iteration I’m throwing away half of one list and at least some of the other. I’ll always pick the larger list to be the one I cut in half of course. Eventually, I’ll get to the point where one of the two lists lies completely beyond the other, and then I’m home free.

Coding up something like this makes me feel uncomfortable. I fear “off by one” errors the way some people fear spiders. Sitting here now I can conjure up a vision of working through the day on this stupid thing. I know that I really should try. One thing that’s clear is that it’d be silly to do it in a fun language like Erlang, because it all depends on it being O(1) to look at values in the lists. (Well I guess I could merge two Erlang binaries, treating them as arrays of integers, but compounding my off-by-one fears with the need to essentially code up my own array indexing routines really freaks me out.) I’ll try not to be lazy and do this in Java or something boring like that, at least.

Later…
Well I just checked this blog by a smart person and he does this a totally ‘nuther way. I don’t have enough glucose left in my blood right now to figure out which is better, but I bet his is because he doesn’t have to do any searches. He has a weird definition of the median though in the case of lists with an even number of elements. I’ve always thought that the median value has to actually be in the data set, because otherwise there can be an arbitrary number of median values. I’m no statistician however. It probably makes no difference at all for this problem.

Later still…
I’ve been putting together a model cannon kit whose instructions are basically “glue it together”. My eyes are sore. Anyway, to the extent that I can get a lame approximation to the stuff I wrote above working, I think it’s still interesting. In the case of a significant disparity in list sizes, mine converges really quickly on the median because it chews up the shorter list very quickly. When the lists are pretty dense and about the same size, it takes about log n iterations. I’m not sure how to figure the bounding function - maybe it’s log n but maybe not (I suspect the later but I’m dumb).

Quote your angle brackets

April 29th, 2008

Here’s a neat fact: both IE and Firefox will find a </script> tag in the middle of a Javascript string constant. Now that’s probably not too surprising; the browser sort-of has to do that, or else a missing quote in a script block would eat the whole page. This came up in my universe however when I was doing some testing on a page I suspected wasn’t quoting stuff properly.

Any web-oriented templating mechanism (or, more generally, anything spewing out HTML programmatically) will have to worry about what to do with external data that needs to be dropped into the HTML. When it’s going into HTML code, like this:


  <label>Name:</label> ${whatever.name}

then you’ve got to translate angle brackets and ampersands to HTML entities “&lt;”, “&gt;”, and “&amp;”. However, when you’re dropping stuff into portions of the HTML document that are actually Javascript blocks (for example, when populating a data structure), you don’t do that. Instead, you have to massage the value so that it works inside a Javascript string constant (well, that’s what you do if you want to put it in a string constant, at least).

Our library has an “escapeJS” routine that does that sort of quoting. What it worried about (up until about 30 minutes ago) was quote characters, backslashes, and characters outside the old 7-bit printable ASCII range. Of note, it did not worry about less-than characters (left angle bracket, that is). I stumbled over this because I was getting a weird complaint from Firefox:


  whatever = "${mylib:escapeJS(whatever)}";

The error seemed strange because it was about the string being unterminated. When I brought up the source, however, the whole thing was there (including an embedded “</script>” in the string - remember, I was doing XSS testing). “Durrr,” I thought to myself. “What’s Firefox doing?” Embarrassingly it took me some time to get it. I thought that maybe it was a Firefox 3 thing, but no. When I saw that IE did it too, the feeble 20-watt light bulb went on.

I updated the “escapeJS” routine so that it treats less-than characters the same as control characters, encoding them with the Javascript “\u” notation. Probably everybody else in the world is less dumb than me, but nevertheless I figured I’d write this up.

PS: Ha ha I just realized that this is another “spring bug.”

Spring Bugs

April 14th, 2008

Yesterday was very pleasant, so I got out the Sigma macro lens and the “ring flash” and started looking for bugs. The first thing I found was a small spider on some big rosemary plants. He seemed to be repairing a web.


As usual I have no idea what sort of spider that is. The bright orange mark is what caught my eye. I think I’m going to send the photo in to What’s That Bug to see if they know. (Well now I know: it’s a Leucauge venusta, an “orchard spider”. This morning the web is looking very nice and neat.)

The Monarch Butterfly thing this year is weird. They’ve been all over the place, and in particular they found the newly-emerging milkweed in our back yard.


This one is getting started on a whole new milkweed plant, since he and his friends ate the leaves completely off the other plant. They and the black swallowtails are pigs.

Green beetle in a rose:


Bee on some marjoram:


Roadtrip Day 4: A Hike

March 30th, 2008

We got up moderately early, packed the kids up with breakfast junk from the hotel buffet, and headed out to the park. It was in the low 50’s but it didn’t feel cold; we’re at 3000+ feet up here, and that really seems to affect the perception of air temperature. I had running shorts and a running t-shirt, anticipating (correctly) that I’d be running back from a finished-up crew to the van.

We opted for the “Juniper Cliffside Trail.” The trails on the parks department map are not very clearly described or marked. In this case, we started on one side of the river (the Prairie Dog Fork of the Red), and found we needed to ford the washed-over road crossing to continue. Wet feet. The trail is a multi-use foot-bike-horse trail, and a few squads of bikers - and one of runners - passed us along the way.

The lower reaches of the canyon are “Quartermaster” formation Permian sediments. The white bands are gypsum, in the form of selenite or alabaster (and something else I can’t remember). Up close the white bands look like masses of brilliant thin crystals all lined up vertically next to each other. The gypsum precipitated out over layers of sediment during dry periods of the late Permian, when the world climate was going bonkers due to the conglomeration of continents into Pangea. As the soil dried, got somewhat re-wetted, and dried again - a lot - the gypsum sort-of filtered out of the sediments and was left as those distinctive white stripes.

Above the Quartermaster layers are two groups of late Triassic sediments, the Tecovas group and the Trujillo group. Those consist of random shales, conglomerates, and sandstones in a weird variety of colors. In particular, a pale purple and a greenish-yellow layer can be seen all over the place. On the trail (which is almost completely in Quartermaster territory), occasional arroyos have washed down the grey-green, yellow-green, and purple-grey sands and pebbles to mix with the red and white Permian sediments. It’s really cool to see. The water erosion on the cliffsides give a thoroughly Georgia O’Keefe look.

I can’t resist cheap compositions with gnarly dead trees. The place was easy pickings for that sort of stuff today.


The kids were pretty good about the whole thing. When they’d exhausted their energy, I left the camera bag with Elaine and headed back down the road. About two or three hundred yards out, I heard a couple of panicked-sounding screams of “Mike! Mike!”. I stopped to verify, and then realized I had no choice but to run back. An annoying minivan showed up to mask any further sound as I ran back up the park road as fast as I could (not fast). When the fam came finally into view, they were just strolling along. Pat saw me and came back to tell me that Elaine wondered whether I had the car keys. I expressed dissatisfaction with that motive for a distress call in a way that apparently made Elaine a little upset. I felt bad about that after I found out the effects. It turned out we hadn’t gone that far, because even with my slow running pace it took no time at all to get back to the van, though it had taken two hours to get where we got via the trail.

I sat around the pool watching Allie and Pat “swim” in the pool while Elaine and Christopher returned to the museum. Christopher came back in a couple hours to construct a “replica” of a historic firearm he’d found at the museum. It’s good that he’s satisfied with the most vague topological similarity between his realizations and the actual originals.

We went back to the park in the late afternoon one more time, just to have a look. It’s a really nice park, and with the museum as icing I think it was a nice trip. It’s really far, true, but Canyon is a surprisingly nice little town (compared especially to Lubbock, which had us wondering whether things would keep getting worse as we went north). We ate at a Thai-Laotian (yes, really) place last night, and at the little “Something Different” place today we had a good lunch and a flawless free wireless connection.

Roadtrip Day 3: Cool Museum, Make-Believe Hikes

March 29th, 2008

Local temperature was 34 degrees this morning, and the plain outside the hotel was loosely fogged. Our day started relatively late, due to this enormous multi-room suite and its black-out curtains. We headed out to the Panhandle-Plains Historical Society Museum to make it in before the crowds. We were successful.

This museum is large. It’s billed, in fact, as the largest history museum in Texas. It’s reminiscent of the randomness of the Witte Museum in San Antonio or the Texas Memorial Museum on the UT campus (in the latter case, its former randomness - it’s been rationalized in the recent past). The collections include paleontology, geology, Plains Indian anthropology, petroleum production, sea shells, stuffed birds, stuffed plains animals, Indian art, Western art, firearms, cars, old gas pumps, WWII memorabilia, “pioneer” living, local ranching history, and the Texas Revolution. If you’re ever in Canyon TX, it’d be ridiculous not to visit.

After a diffuse lunch period we headed back to the canyon. With our fresh new annual pass we breezed in and drove down to the canyon floor. We had no explicit goals. The day had cleared up and gotten sunny and warm.

The lower reaches of the canyon are walled and floored with orange-red Permian sediments, highlighted by white bands of gypsum. Above that are multi-colored Triassic sandstones and shales. The top is capped by relatively recent “caprock” limestone and caliche. The scrubby vegetation was in various stages of dormancy, but the stark twiggy look was dramatic and “classic.”

The dark twisty trees in this image:

reminded me of the witches from Macbeth. What would Shakespeare have thought of landscapes like this? The harsh light beaming down on our baby hikes over forty or fifty yards of wild terrain wouldn’t have done for those witches, but the terror thorns, the twisted, scraggling shrubs, and the abrasive polychromed crags would possibly have seemed too fantastic to believe. I have to wonder what it’s like on moonlit nights.



The Triassic strata include (in the Trujillo group) layers of “banded” sandstone. Up close it has a shiny gray look, but from a distance it looks dull gray to blue-gray to an almost malachite green. These rocks are pieces that have tumbled down from an original altitude a hundred feet or more up the slope.

The plan is that we’ll head back in the morning for a real hike. We’ll see how that goes.

Roadtrip Day 2: Gliders and a Canyon

March 28th, 2008

After the long day on Thursday, we went to bed pretty early. Everybody woke up at about 5:30. By 6:00 nobody could maintain the charade of trying to sleep, so Elaine got the kids out the door for the dee-luxe breakfast buffet at the Days Inn. The breakfast area smelled strongly of petrochemicals, as it was primarily inhabited by oilfield service workers. Several semi-toxic packaged sweet rolls and bowls of cereal later we were ready to head out. The sun wasn’t yet up.

The Caprock Escarpment comes into view a few miles south of Post, TX. After Post, highway 84 rises up onto the plateau that is the High Plains of North America. The top-level rock strata is the Ogallala Formation, a fairly recent deposit of more-or-less rocky sediments formed from the broad downwash from the rising Rocky Mountains, stretching south to north across much of the country. The topmost layer is a hard quasi-limestone caliche that’s what was originally called the “caprock”, as it sits atop the typical bluffs that appear up and down the edge of the High Plains.

An interesting thing about this day is that it began about 40 degrees colder than it was Thursday afternoon. Thus upon reaching Lubbock we stopped at a Starbucks (with idiotic for-pay wireless) and then sought out a WalMart so that we could pick up more clothes for the kids. Lubbock is - partisans, please pardon me - a horrid place. For some reason we attracted beggars consistently. After a trip to a cold, windblown Prairie Dog Town (pop. 1)

we headed out, hoping to figure out a way to avoid the place altogether on the way back. However just outside the pale we notice a sign for a museum at the Lubbock airport. We spun around to go there after seeing a pretty good-condition C-47 parked in front. The museum is the “Silent Wings” museum, and it preserves the story of USAAF pilots who flew the various transport gliders employed in WW2 for airborne assaults. Inside the museum was, among other very well-done exhibits, a complete CG-4A WACO glider.

We soon found that the glider was set up such that we could climb inside and look around - somewhat astounding, given the fact that the things required delicate care when new.


The museum was a great find overall. We headed out and got to Canyon at lunchtime. After some pizza for the kids we drove out to the park. The van was low on gas so we didn’t want to drive down into the low part of the park, but we stopped at the point you can take this picture:

Palo Duro Canyon looks to be a pretty spectacular place. It’s a hard thing to capture with photography. I hope to go back over the next couple days, and I want to try and capture small stuff that evokes the visual impression of the big stuff. We’ll see.

On the way out of the parking lot we saw this tricked-out extreme Land Cruiser, apparently ferried here from Switzerland.

Roadtrip Day 1: Fossils, Bombers, Windmills

March 27th, 2008

San Saba is a ways away from places that are a ways away from places whose locations you know. Specifically, it’s about 20 miles west of Lometa. It’s a nice town, the “Pecan Capital of the World.” There’s a park there around a millpond. I saw no mill, but there was a bridge.

West of San Saba to Richland Springs, then north across Wilbarger Creek, then west about 12 miles, there’s a roadcut through some Pennsylvanian sediments. We stopped and looked around. Within about a minute Allie had found a slab of fossilized stuff about 8 inches square.


Little chunks of fossilized crinoids were all over the place, sometimes separately and sometimes in little concretions like that. About 3 vehicles passed us during the half hour we rooted around.

We headed out and onwards to Abilene, which we skirted on its southwest side. We stopped after joining I20 at an exit labeled “Tye”. The Conoco truck stop sported a gift shop filled with weird dolls. After gassing and washing up, and after the inevitable purchase of some Silly Putty, we walked out to the sound of something in the sky. There flew a B1B, back to its home at Dyess AFB. Three or four more flew in one after the other, while a C-130 flew over, a couple thousand feet higher, in the opposite direction. It was pretty cool.

There are a lot of windmills - big ones - on hills southwest of Abiliene and out basically all over the place along highway 84 northwest out of Sweetwater. They looked a lot bigger than the ones I remember from northern California, but maybe I’ve shrunk since then.

Wasp

March 21st, 2008

wasp photo

I hooked up my homemade “ring flash” Amazon box the other day and took some pictures of emerging flowers. I wasn’t happy with any of them in the on-camera preview, but I finally got around to uploading them. This wasp one looked terribly blurred and over-exposed on the LCD, but it looks fine to me now.

The pear blossoms in which this wasp was cavorting have since dropped off the tree; it only takes a couple days. It’s not our tree; it’s in the neighbor’s yard. It produces a tremendous number of pears. They don’t taste like anything at all when they’re ripe off the tree (July), but my brother-in-law told us to put them in the refrigerator for a couple weeks. After such treatment the pears get a lot sweeter.

This wasp was a little thing; the blossoms are at most an inch across when they’re open.

I got an OK picture of a mutabilis rose blossom too. The background is black because the ring flash doesn’t cast enough light to expose stuff not really close to the lens.

rose photo

“We Are Not Shed People”

March 1st, 2008

Backyard sheds are an important part of suburban American life. With a shed comes obligation, however, and some are not up to the demands of shed ownership. In particular, I am not a worthy shed owner.

Shed

Our shed was, as sheds go, a pretty nice shed. It had been added to the home (we think) by its original owner, perhaps at the time the house was finished by the builder. The outside walls were finished with similar wood siding to that on the house; the inside was left without real walls. We used it to store a shedful of stuff that we didn’t want, didn’t know what else to do with.

Shed

One thing our shed did for the local ecosystem was provide a home for small rodents. This I discovered surprisingly recently. As a grossly unqualified shed owner, I hadn’t been in the shed for at least two years. A large rose bush with gigantic, murderous thorns had grown completely over the shed door. Before that the last thing I’d done was replace the shed doorknob with a new one. Since that time, mice or rats, or both working as a team, had gnawed a classic cartoon-like access portal at the bottom of the bush-hidden side of the shed. Our dog - a “rat terrier” as unworthy of her title as we are of “shed people” - had been showing a lot of interest in those bushes and that area, but had never come anywhere close to actually getting a rat. One afternoon, as I walked out with Gypsy, she and I both heard a brief rustle, and I turned to see a rat on a low holly branch. Gypsy saw it too, and sprang to the attack by cleverly running the opposite direction towards the area where she really suspected the rats to be.

No Shed

At that point I wondered, “Where is that rat going?” Only then did I think to check whether there was another way into the shed besides the inaccessible well-locked door. The next day we cut the rose bush back and checked inside. There amidst the unwanted detritus steaming away in the shed was what must have been some of the most valuable prime rat real estate in the area. We didn’t see any rats at that point, and the dog couldn’t find any either, but it was clear that they’d made much better use of the shed than we ever had.

Rat Check

About $800 later, we’re shedless. It was somewhat embarrassing taking the pictures. There I was, an affluent yuppie unable to maintain a shed in his own back yard, taking pictures of guys forced to wear breathing masks to protect themselves from the rodent filth I’d allowed to accumulate. It made me feel contemptible, but as I was depriving myself of a shed I contented myself that that was appropriate punishment. I don’t deserve a shed.

We’ll put flowerpots on the slab, or something. Flowers and herbs I can take care of, usually. The rose bush will be a lot happier anyway. I don’t know where the rats will go. They were of course seriously traumatized, and to some extent I feel kind-of bad about that too.

The Gaff

February 23rd, 2008

I’m writing this from the main dining hall of The Gaff, the #1 cyber pub in Port Aransas, TX. A well-known local just walked in to a shout of acclaim. The owner then told us about the Thursday pirate raid pub crawls they do from here.

I don’t have much to say - or, actually, I could probably stay in this place and write a few bad novels over the course of a season. I did however feel compelled to blog from here.

The pizzas just came, as did Allie’s pirate-sized meatball sandwich, so I sort-of have to wrap this up.

Later…

The Gaff is everything you could want from a run-down beach hangout. The menu has like four things on it. The weirdly efficient bartendress puts the pizzas together and they were approx. of BB’s quality. Pyramid Wheat on tap. Belt sanders used in periodic belt sander races decorate the grimy windows. There’s an enormous bigscreen tv that doesn’t seem to work properly. They have free wireless, and this paean is good evidence of what a fine idea that is.

We walked in, and for some reason the kids immediately started exploring all 1000 square feet of the place, with many declamations of “Classic!” as the piratical denizens paused (briefly) from their beer-cigarette cocktails.

After sitting down, Pat loudly announced, “This is just like Las Vegas!”