Posted 3 months, 3 weeks ago at 10:05 pm. 0 comments
I wanted to give WF2 a decent shot in Java, to see how it compares to the funky OCaml / Scala / Ruby versions that other people were making. It runs pretty well, about 15 minutes, and isn’t that complex. Okay, I did make it a single class file just to be simpler, but OO purists won’t like it. It’s a hack, a test, just to see how well it will run.
UseConcMarkSweepGC is a good thing. UseParallelGC sounds like it would be good with lots of cores, but it kept killing the VM about 70% through.
I went with a gigantomous heap. I don’t really need it all, but as the queues get bigger, it’s nice to have. It probably would run fine with much, much less.
Java6 does much better than Java5. I’m using a locally-installed version since the /usr version wasn’t working. You need to unpack the 32bit sparc9 binaries first, then unpack the 64bit binaries on top of that. Have to use -d64 to get a heap bigger than 3G.
The -server JIT flag didn’t make any difference in processing time for me.
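A quick way to check that flags like these actually took effect is to ask the running JVM. The sketch below is not part of MTNIOStats.java; the class name `JvmSettings` is made up for illustration. It only uses the standard `java.lang.management` API to report the maximum heap, the active collectors, and the arguments the JVM was started with.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not from the original program.
public class JvmSettings {

    // Maximum heap the JVM will try to use, in megabytes.
    static long maxHeapMb() {
        return Runtime.getRuntime().maxMemory() / (1024L * 1024L);
    }

    // Names of the garbage collectors this JVM is running with.
    static List<String> collectorNames() {
        List<String> names = new ArrayList<String>();
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            names.add(gc.getName());
        }
        return names;
    }

    public static void main(String[] args) {
        System.out.println("max heap (MB): " + maxHeapMb());
        System.out.println("collectors:    " + collectorNames());
        System.out.println("JVM arguments: "
                + ManagementFactory.getRuntimeMXBean().getInputArguments());
    }
}
```

Run it with the same flags as the real job and compare: the collector names change with the GC flag, and the reported heap should jump once -d64 and a bigger heap setting are in place.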
Interestingly, NIO block read size didn’t matter that much from 32k up to 4M, and neither did the number of threads, whether 25, 30, 35, 40, 50, 60, or even 90. Wacky! Not too much overhead in terms of context switching…
My app design has a single thread doing all the IO, simple reads, and then it hands the ByteBuffers to a separate blocking queue for each worker thread, to avoid any lock overhead. I think that’s probably irrelevant now, and that locking would be nanoseconds, so maybe I’ll redesign it, but it works fine.
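Here is a minimal sketch of that layout: one reader thread, one blocking queue per worker. It is not the code from MTNIOStats.java. The class name `ReaderFanOut`, the round-robin hand-off, the queue depth of 16, and the newline-counting "work" are all assumptions made to keep the example small, and it uses newer Java syntax (lambdas, `Path`) than Java6 had.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// Hypothetical sketch of the one-reader, queue-per-worker layout.
public class ReaderFanOut {

    // Empty buffer used as an end-of-input marker for each worker.
    private static final ByteBuffer DONE = ByteBuffer.allocate(0);

    // Counts '\n' bytes in a file. The calling thread does all the reads;
    // each of the `workers` threads drains only its own queue.
    static long countNewlines(Path file, int workers, int blockSize) {
        List<BlockingQueue<ByteBuffer>> queues = new ArrayList<>();
        long[] partial = new long[workers];
        Thread[] threads = new Thread[workers];

        for (int i = 0; i < workers; i++) {
            BlockingQueue<ByteBuffer> q = new ArrayBlockingQueue<>(16);
            queues.add(q);
            final int id = i;
            threads[i] = new Thread(() -> {
                try {
                    long n = 0;
                    for (ByteBuffer b = q.take(); b != DONE; b = q.take()) {
                        while (b.hasRemaining()) {
                            if (b.get() == '\n') n++;
                        }
                    }
                    partial[id] = n;
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
            threads[i].setDaemon(true); // don't keep the JVM alive if the reader fails
            threads[i].start();
        }

        try (FileChannel in = FileChannel.open(file, StandardOpenOption.READ)) {
            int next = 0;
            while (true) {
                ByteBuffer buf = ByteBuffer.allocate(blockSize);
                if (in.read(buf) < 0) break;    // end of file
                buf.flip();
                queues.get(next).put(buf);      // blocks if that worker is behind
                next = (next + 1) % workers;
            }
            for (BlockingQueue<ByteBuffer> q : queues) q.put(DONE);
            for (Thread t : threads) t.join();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }

        // Single-threaded "reduce": add up the per-worker counts.
        long total = 0;
        for (long p : partial) total += p;
        return total;
    }
}
```

Counting newlines works here because every byte can be handled on its own. A real log parser would also have to deal with lines that straddle two blocks, which this sketch ignores.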
My biggest problem is that my worker threads are for the most part waiting on IO to get more data to process. And the reduce phase at the end is not very long, 64 seconds, and it’s actually single threaded for now because shrinking 64 seconds to 64/5 seconds isn’t going to drop me from 15 minutes to 7 minutes.
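To put numbers on that, here is the arithmetic as a tiny sketch. The 15 minutes and 64 seconds come from this post; the class and method names are made up, and the 5x speedup is just the 64/5 case mentioned above.

```java
// Hypothetical helper for the back-of-the-envelope math.
public class ReduceSpeedup {

    // Total run time if only the reduce phase gets `factor` times faster.
    static double totalAfterSpeedup(double totalSec, double reduceSec, double factor) {
        return (totalSec - reduceSec) + reduceSec / factor;
    }

    public static void main(String[] args) {
        double total = 15 * 60;   // about 900 seconds overall
        double reduce = 64;       // single-threaded reduce phase
        double after = totalAfterSpeedup(total, reduce, 5);
        System.out.println("before: " + total + " s, after: " + after
                + " s, saved: " + (total - after) + " s");
    }
}
```

That works out to roughly 849 seconds instead of 900, so a bit under a minute saved. The IO wait is where the time is.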
And okay, my results are off by .01% or something, but I haven’t re-run since I updated my parser to handle spaces in URLs. Close enough for me, not close enough for some other WF2ers. It all depends on your business domain what your accuracy needs to be.
MTNIOStats.java - file name capitalization may be munged - thanks WordPress!
Posted 10 months, 4 weeks ago at 8:44 pm. 0 comments
It saddens me to say it, but after a year of fiddling with Apache Derby, working around its quirks, and making a custom statement cache so that it’s not so frickin’ slow, it’s finally time to call it quits and move on to something else.
Data corruption… Over the past year, I have been using Derby as an embedded database inside of Fubario. I’ve been running it on Windows2000, WindowsXP, and OSX 10.4. Across these machines, some of them laptops and some desktops that are on all the time, I’ve had varied data corruption scenarios occur, even under relatively little load.
I will admit, I haven’t done a good job keeping records of each kind of data corruption, probably because of foolish optimism that when a big point release of Derby came out it would have solved the problems with the prior version. But alas, every release has had various things go wrong. I’ve had entire databases corrupt so that Derby would refuse to start up, or even an interesting “poison table” corruption where everything was fine until I issued a query against a particular table that was corrupt, at which point Derby went out to lunch.
And when I say little load, I mean pretty much the databases were idling 24×7, with maybe a few thousand rows in them. And to think that I switched to Derby specifically to get more reliability than I had been having with hsqldb or its shiny new cousin h2…
So where am I now? I’ve built a custom flat-file store for Fubario for storing encrypted backup files. It’s pretty simple, really, with a few key characteristics of secret sauce that make it perfect for Fubario, and it’s really tiny in terms of amount of code, because it doesn’t have to do very much at all. I’m also using h2 again for storing maybe tens of properties or something like that, minuscule really. I probably could use properties in a flat file, but I’m just reusing the sql-driven classes that had been talking to Derby.
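For what it’s worth, the properties-in-a-flat-file route would look roughly like this with the stock `java.util.Properties` class. This is only a sketch of that alternative, not what Fubario does; the class name `FlatFileSettings` is made up, and the file location and keys are up to the caller.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Properties;

// Hypothetical helper, not part of Fubario.
public class FlatFileSettings {

    // Writes all key/value pairs to a plain text file.
    static void save(Path file, Properties props) {
        try (OutputStream out = Files.newOutputStream(file)) {
            props.store(out, "settings");
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Reads the key/value pairs back from that file.
    static Properties load(Path file) {
        Properties props = new Properties();
        try (InputStream in = Files.newInputStream(file)) {
            props.load(in);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return props;
    }
}
```

For tens of values that is about all there is to it, with no database startup or shutdown involved.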
I’m done with the 20 second startup and shutdown times.
I’m done with these random and varied and all peculiar data corruption scenarios.
The electronic catalogs for my nearby public libraries are pretty basic. They are the same boring Horizon Information Portal websites that provide the basics but sifting through the results is clumsy. And searching another library at the same time? Forget it.
Enter WorldCat. Searching a library is, dare I say it, fun again! Everyone can add reviews to library items à la Amazon, it automatically shows you nearby libraries that have the item using guesstimated location based on your IP, and they make money by providing links to Amazon if you’re too lazy to go get the book from a library yourself. Very well done… I’m jealous.
I still wish I could find my library search engine that I made back at UIUC in 1994. It was the capstone project for, of all things, CS411 Database Systems. The professor was dating/engaged/married to Eric Bina who was the co-creator of Mosaic with Marc Andreessen. And so apparently she liked the idea of making her databases class build a web application. I was fine with it, really, because Mosaic was pretty cool, although I found myself always keeping 2 windows open and dragging links from one to the other so I wouldn’t lose my place while I wandered about. Kind of a kludgy coping mechanism before the era of tabbed browsing.
I came up with the idea for my team of 3 to build a web interface to the UIUC library system, which was a mainframe beast, maybe 3270 but I’m not sure. Luckily one of the guys on my team had built the TCPSETUP program for Doom, so he was good at making socket programs and so he built a library for talking to the mainframe. I built out the perl-based webapp which took care of sessions, form-processing, calling the library to talk to the mainframe, rendering the search results, etc. The other guy, um, didn’t do anything.
Even though I didn’t have a name for it (usability) I worked hard to make the webapp easy, even fun, to use. You typed in a search by author or title and would get back a list of results all nicely lined up with information about each match. Click on a book to see more details, and best of all, get a form where you could put in your school ID and the library would send the book to your dorm room! For real!
Every team presented their final projects during the last week of class. Other teams had built basically useless apps, all were clunky, didn’t make sense, whatever. We presented ours to applause, actually, but apparently our professor was not impressed. “It looks too easy”, she said. That was the goddamn point! The back-end work to screenscrape multiple greenscreens for every page view was nutso stuff, and yes, the front end looked smooth & clean. She gave us a B. And that was when I knew that all those people who had told me that “grad school was when you start doing real work, no more theory crap like undergrad classes” were out of their minds. I had worked my ass off to build something real, and got a frakkin’ B for my troubles.
Postscript - a few weeks after this class ended, the webmaster for UIUC heard about my library webapp and asked if she could install it on the library website. It became the official library web search engine running at UIUC for the next several years. Yeah, that sounds like B-level work to me… Adios, grad school!
Posted 1 year, 2 months ago at 7:33 pm. 0 comments
Tivo has 3 speeds of fast-forward. One arrow is about 2x, and when you drop back to real time, it starts playing right at that spot. But with two arrow and three arrow levels of fast-forward, when you click play to drop back to normal playback speed, Tivo rewinds a bit, on the premise that by the time your brain processes the scene that you want to see, you’ll have skipped past it a bit, so Tivo guesses how far back to rewind and starts playing from there.
In theory, this is a great feature, because you can relax a bit more while fast-forwarding. Instead of being on the edge of your seat and trying to “predict” when what you want to see will show up, you get to see a slight “preview” of what you are fast-forwarding past, and then you say “aha! that’s what I want”, and then Tivo skips back a bit so you don’t miss anything. But that, alas, is the Tivo Spoiler.
For those of us who occasionally watch reality TV shows that have various dramatic moments, it’s awesome to fast-forward through all the gobbledygook and inane chatter the hosts spew out to waste time and ratchet up the tension. The only problem is that as you try to drop out of fast-forward to see the Big Reveal or whatnot, you usually tend to see just a frame or two of the contestants just before your Tivo rewinds a few seconds so you can watch the envelope opening or whatnot, but your brain has already processed who won and who lost, so it’s kind of a letdown.
*sigh*
I’m not sure how to get around this, really, aside from being more vigilant during fast-forwarding and also subjecting myself to more inane reality-show chatter…
Posted 1 year, 3 months ago at 12:27 pm. 0 comments
I’ve got a public beta of Fubario up now. It’s a peer-to-peer backup program where you store your backups on your friends’ computers. Easy to set up, leave it running forever, never think about it.
Things still to do - backing up in-use files and firewall traversal…
Posted 1 year, 3 months ago at 11:30 am. 1 comment
A week or so ago, I read Tim Bray’s post about X-Me on Facebook being a virus, and I had to respond, being one of the “little guys” that has low usage numbers. A friendly Google tech recruiter read my post, found this dusty blog, wandered over to CLG where I work, and emailed me to ask if I had any interest in working for Google.
At first glance, seems kind of cool. But I’m not moving to California, as I’ve got all my family and extended family in the Chicago area. The recruiter kindly mentions that they have a Chicago office. Nice! Let me google them up…
*stomach lurches*
It appears that the Chicago office for Google is essentially an ad sales office (fine) with 3 developers who work in a corner. The developers are (mostly) the people behind subversion, which is one of my least favorite technologies that exists right now. I am struggling to understand why so many people are in love with it, apache/jakarta is switching their projects to it, and yet CVS works just fine, barring some warts. Eclipse covers up most of those warts, and since I breathe Eclipse 24×7, I’m fine with it.
But subversion… Well, I am one of the few non-believers, apparently. It was forced on me at CLG (long story), and turned out to be slooooooooow, taking 5 minutes to sync our projects, vs maybe a minute or less with CVS. It must take extra time to not do keyword substitution (j/k) which okay, I don’t like that CVS does, but whatever.
The big one for me is that subversion crapped out on me. Our repo server died, and I had a backup on another box, but apparently one of the files had some kind of strangeness in it, such that svn couldn’t read it. Um, okay, well that’ll just break one file, right? Nope. Entire repo, dead. WTF? You gotta be kidding me. An error in a single file kills my entire 6-year commit history??? And the error was caused by svn’s file handling, as far as I can tell, as I googled for it and a few other people had the same problem. When they asked the developers about it and showed them the corrupt file(s), the developers’ answer was, “that can’t happen” (paraphrasing). Um, except it did. To him. And me too. To be fair, the developers went on to try and fix the problem(s). But problems that are this varied, complex, and almost impossible to reproduce on demand are unlikely to be completely resolved. The bottom line is, the svn guys have made their own database thang for storage, and it sometimes blows up, taking your entire repo down with it. Not cool.
Don’t tell me I should have had more backups going further back in time - that’s a cop out. I had backups, but they were useless because one file in the middle was busted. Unacceptable.
No subversion for you! Er, me. I’m sticking with good ol’ CVS, which sucks a little, but it’s a devil I know, with warts I know, and if I back up the files for it, then I know goddamn well that I’ve got a reasonably good backup of my repository even if there’s a twiddled bit in there somewhere due to disk failure or whatever.
So I was compelled to mention my anti-svn feelings to the recruiter, in as friendly and upbeat a way as I possibly could, and I asked about working on other projects or remotely for another office, and haven’t heard back. So methinks, no interview for me. Could have been an interesting experience, but seeing as I hate goofy interview questions/riddles about manhole covers, numbers of gas stations, and any of those moving Mt Fujiisms, it might be for the best.
Posted 1 year, 3 months ago at 12:29 pm. 0 comments
For the (entire world’s population - my direct family) who don’t know about GIVM.com, it is a very cool (IMHO), Web 1.0-ish shared gift list manager. It was born out of the frustration of trying to coordinate xmas and birthday gift giving amongst family members, especially as the family(ies) grew from people such as myself getting married.
So, the GiftList was born, originally running at bratton.com/giftlist, and it has been running for six years now, with about zero problems. It’s also been running with about zero growth. *sigh* Granted, I haven’t really tried to market it at all, because of the hope that it would virally take off.
The big problem is that the viral analogy doesn’t work for gift giving, because most gift circles are just that - circles. They have very few branch points to virally spread the word. Okay, yes, in a perfect world, somehow overlapping families would spread the word, but it hasn’t happened much at all.
One of the most amazing things about GIVM is that I managed to snag givm.com a while back. Crazy! I love the name, and when I saw it as available, I was in shock for a while before I bought it.
So, last month I heard about Facebook opening up their “platform” API thingamdooey, and I spent a few long nights porting GIVM to run inside of Facebook. One big bonus was that I back-ported some of the CSS to givm.com, so now whether you use givm.com or the GiftList app inside of Facebook, they both look pretty good.
Overall the dev experience was fairly straightforward, but I lucked out a lot that my schema design matched up very cleanly with Facebook’s. And I rock, of course.
After a week or so of anxious waiting to be approved into the Application Directory, I was in! There are a few other gift list type apps available on Facebook, but none of them have the same usability that mine has. I’ve had six freakin’ years to make sure that grandmas and grandpas can use this thing! I know where most gifts come from, baby, and you’ve got to play to your audience.
Facebook? Grandparents? Hm, not quite a match-up there yet. And that of course is the cart before the horse problem I’ve got, which is that Facebook is supposed to grow from 20ish million users to 50ish million users by the end of the year. Okay, a lot of those will be older people, not just the current FB highschoolers and college folk, but it probably will be a slow growth curve for the gray set. Which means my GiftList is probably not going to explode with high user numbers on Facebook.
After 3 weeks, I’m up to about 430 users. Lots more than givm.com, but dust in the wind compared to silly trifles like X-Me…
Posted 1 year, 6 months ago at 2:45 pm. 0 comments
Okay, so it’s not that big of a leap, I guess, but I stumbled across Tony Buzan talking about the miracle that is mind mapping (not), and as I watched, I couldn’t help but think that it’s just a well marketed and not so interesting variation on an outline. Do curvy lines help with information? Um, that’d be a no. Go ask Tufte and he’ll give you a whack upside the head (or a failing grade) if you have anything in your information design that is not contributing to the comprehension. If the thickness and curviness of the lines is conveying information, then great! But if it’s just because “the brain likes curves”, then stop kidding yourself - it’s chartjunk.
I often use MS Word to do outlining. I’ve got a simple generic template for MS Word that I use, and you can use tab and shift-tab to move sections in and out. Well, most of the time. Sometimes things get mucked up if you hit tab in the middle of a line. Try highlighting an entire line or three and hitting tab, or having your cursor at the beginning of a line.
Anyways, I tend to brainstorm project ideas, requirements, designs, anything, in outline form. And then everything is numbered, easy to move around, copy, paste, edit, delete. And now, I feel so proud that I’ve been doing mind maps all along! I don’t need any new-fangled online mindmap tools! MS Word, for me thanks! Or Google Docs does just fine, too, actually, if you don’t want to spend any buckage.
Posted 1 year, 10 months ago at 5:50 pm. 8 comments
So. I wanted to set up a cvs server under OSX. I looked around and found a few people with docs about how to set up xinetd or set up launchd, blah blah blah, but I didn’t want to deal with that. Yeah, I know OSX is unix and isn’t that great, but I want something like the old days where to install a web server, you just launched the MacHTTP app and you were done. No config files, just simple apps. Want a mail server? Fire up EIMS. Ah, the good old days…
So, as it turns out, it apparently _is_ simple to get cvs working under OSX. Here are the steps (from memory, so I might have missed something, but damn it’s simple):
1. Insert OSX Install Disc 1 and launch the Xcode Tools / XcodeTools.mpkg installer, Continue, Continue, Agree, Continue, Customize, deselect everything except for the top item, “Developer Tools Software”. Finish the install.
2. Fire up Terminal and make a cvs-repository directory in your home directory (or wherever, but that’s what I used): “mkdir cvs-repository”
3. Add required files to the cvs repository with “cvs -d /Users/erich/cvs-repository init”
4. Open your System Preferences, Sharing, Services - make sure Remote Login is enabled.
5. There is no step 5.
That’s it. I fired up Eclipse, made a test project, connected using the IP address (not by name, for simplicity) of the OSX box, connection type of “extssh”, repository path of “/Users/erich/cvs-repository”, and it worked. That was it.
Secure cvs, no pserver bullshit.
Almost back to the old days. Way better than trying to set this up on Winders. But I can dream of a CVS.app someday, right? Are you on that?