pages tagged advogato

karatsuba multiplication in sbcl

2017-04-02T20:48:17Z

Possible alternative title: I’m on a train!

In particular, I’m on the train heading to the European Lisp Symposium, and for the first time since December I don’t have a criticially urgent piece of teaching to construct. (For the last term, I’ve been under the cosh of attempting to teach Algorithms & Data Structures to a class, having never learnt Algorithms & Data Structures formally, let along properly, myself).

I have been giving the students some structure to help them in their learning by constructing multiple-choice quizzes. “But multiple-choice quizzes are easy!”, I hear you cry! Well, they might be in general, but these quizzes were designed to probe some understanding, and to help students recognize what they did not know; of the ten quizzes I ran this term, several had a period where the modal mark in the quiz was zero. (The students were allowed take the quizzes more than once; the idea behind that being that they can learn from their mistakes and improve their score; the score is correlated to a mark in some minor way to act as a tiny carrot-bite of motivation; this means I have to write lots of questions so that multiple attempts aren’t merely an exercise in memory or screenshot navigation).

The last time I was on a train, a few weeks ago, I was travelling to and from Warwick to sing Haydn’s Nelson Mass (“Missa in angustiis”; troubled times, indeed), and had to write a quiz on numbers. I’d already decided that I would show the students the clever Karatsuba trick for big integer multiplication, and I wanted to write some questions to see if they’d understood it, or at least could apply some of the results of it.

Standard multiplication as learnt in school is, fairly clearly, an Ω(d²) algorithm. My children learn to multiply using the “grid method”, where: each digit value of the number is written out along the edges of a table; the cells of the table are the products of the digit values; and the result is found by adding the cells together. Something like:

       400     20      7
300 120000   6000   2100
 90  36000   1800    630
  3   1200     60     21

for 427×393 = 167811. Similar diagrammatic ways of multiplying (like [link]) duplicate this table structure, and traditional long multiplication or even the online multiplication trick where you can basically do everything in your head all multiply each digit of one of the multiplicands with each digit of the other.

But wait! This is an Algorithms & Data Structures class, so there must be some recursive way of decomposing the problem into smaller problems; divide-and-conquer is classic fodder for Computer Scientists. So, write a×b as (a_hi×2^k+a_lo)×(b_hi×2^k+b_lo), multiply out the brackets, and hi and lo and behold we have a_hi×b_hi×2^2k+(a_hi×b_lo+a_lo×b_hi)×2^k+a_lo×b_lo, and we’ve turned our big multiplication into four multiplications of half the size, with some additional simpler work to combine the results, and big-O dear! that’s still quadratic in the number of digits to multiply. Surely there is a better way?

Yes there is. Karatsuba multiplication is a better (asymptotically at least) divide-and-conquer algorithm. It gets its efficiency from a clever observation[1]: that middle term in the expansion is expensive, and in fact we can compute it more cheaply. We have to calculate c_hi=a_hi×b_hi and c_lo=a_lo×b_lo, there’s no getting around that, but to get the cross term we can compute (a_hi+a_lo)×(b_hi+b_lo) and subtract off c_hi and c_lo: and that’s then one multiply for the result of two. With that trick, Karatsuba multiplication lets us turn our big multiplication into three multiplications of half the size, and that eventaully boils down to an algorithm with complexity Θ(d^1.58) or thereabouts. Hooray!

Some of the questions I was writing for the quiz were for the students to compute the complexity of variants of Karatsuba’s trick: generalize the trick to cross-terms when the numbers are divided into thirds rather than halves, or quarters, and so on. You can multiply numbers by doing six multiplies of one-third the size, or ten multiplies of one-quarter the size, or... wait a minute! Those generalizations of Karatsuba’s trick are worse, not better! That was completely counter to my intuition that a generalization of Karatsuba’s trick should be asymptotically better, and that there was probably some sense in which the limit of doing divide-bigly-and-conquer-muchly would turn into an equivalent of FFT-based multiplication with Θ(d×log(d)) complexity. But this generalization was pulling me back towards Θ(d²)! What was I doing wrong?

Well what I was doing wrong was applying the wrong generalization. I don’t feel too much shame; it appears that Karatsuba did the same. If you’re Toom or Cook, you probably see straight away that the right generalization is not to be clever about how to calculate cross terms, but to be clever about how to multiply polynomials: treat the divided numbers as polynomials in 2^k, and use the fact that you need one more value than the polynomial’s degree to determine all its coefficients. This gets you a product in five multiplications of one-third the size, or seven multiplications of one-quarter, and this is much better and fit with my intuition as to what the result should be. (I decided that the students could do without being explicitly taught about all this).

Meanwhile, here I am on my train journey of relative freedom, and I thought it might be interesting to see whether and where there was any benefit to implement Karatsuba multiplication in SBCL. (This isn’t a pedagogy blog post, or an Algorithms & Data Structures blog post, after all; I’m on my way to a Lisp conference!). I had a go, and have a half-baked implementation: enough to give an idea. It only works on positive bignums, and bails if the numbers aren’t of similar enough sizes; on the other hand, it is substantially consier than it needs to be, and there’s probably still some room for micro-optimization. The results?

The slopes on the built-in and Karatsuba mulitply (according to the linear model fit) are 1.85±0.04 and 1.52±0.1 respectively, so even million-(binary)-digit bignums aren’t clearly in the asymptotic régimes for the two multiplication methods: but at that size (and even at substantially smaller sizes, though not quite yet at Juho’s 1000 bits) my simple Karatsuba implementation is clearly faster than the built-in multiply. I should mention that at least part of the reason that I have even heard of Karatsuba multiplication is Raymond Toy’s implementation of it in CMUCL.

Does anyone out there use SBCL for million-digit multiplications?

going to els2017

2017-04-02T13:24:33Z

I’m going to the European Lisp Symposium this year! In fact, I have to catch a train in two and a half hours, so I should start thinking about packing.

I don’t have a paper to present, or indeed any agenda beyond catching up with old and new friends and having a little bit of space to think about what might be fun to try. If you’re there and you want to make suggestions, or just tell me what you’re up to: I’d love to hear. (And if you’re not there: bad luck, and see you another time!)

not going to els2016

2016-05-07T20:30:37Z

I’m not going to the European Lisp Symposium this year.

It’s a shame, because this is the first one I’ve missed; even in the height of the confusion of having two jobs, I managed to make it to Hamburg and Zadar. But organizing ELS2015 took a lot out of me, and it feels like it’s been relentless ever since; while it would be lovely to spend two days in Krakow to recharge my batteries and just listen to the good stuff that is going on, I can’t quite spare the time or manage the complexity.

Some of the recent complexity: following one of those “two jobs” link might give a slightly surprising result. Yes, Teclo Networks AG was acquired by Sandvine, Inc. This involved some fairly intricate and delicate negotiations, sucking up time and energy; some minor residual issues aside, I think things are done and dusted, and it’s as positive an outcome for all as could be expected.

There have also been a number of sadder outcomes recently; others have written about David MacKay’s recent death; I had the privilege to be in his lecture course while he was writing Information Theory, Inference, and Learning Algorithms, and I can trace the influence of both the course material and the lecturing style on my thought and practice. I (along with many others) admire his book about energy and humanity; it is beautifully clear, and starts from the facts and argues from those. “Please don’t get me wrong: I’m not trying to be pro-nuclear. I’m just pro-arithmetic.” – a rallying cry for advocates of rationality. I will also remember David cheefully agreeing to play the viola for the Jesus College Music Society when some preposterous number of independent viola parts were needed (my fallible memory says “Brandenburg 3”). David’s last interview is available to view; iPlayer-enabled listeners can hear Alan Blackwell’s (computer scientist and double-bassist) tribute on BBC Radio 4’s Last Word.

So with regret, I’m not travelling to Krakow this year; I will do my best to make the 10th European Lisp Symposium (how could I miss a nice round-numbered edition?), and in the meantime I’ll raise a glass of Croatian Maraschino, courtesy of my time in Zadar, to the success of ELS 2016.

lots of jobs in computing at goldsmiths

2015-06-01T09:27:02Z

There’s an awkward truth that perhaps isn’t as well known as it ought to be about academic careers: an oversupply of qualified, motivated candidates chasing an extremely limited supply of academic jobs. Simplifying somewhat, the problem is: if one tenured professor (or “lecturer” as we call them in the UK) is primarily responsible for fifteen PhDs over their career, then fourteen of those newly-minted doctors will not get permanent jobs in academia.

That wouldn’t be a problem if the career aspirations of people studying for doctorates were in line with the statistics – if about one in ten to one in twenty PhD candidates wanted a job in academia, then there would be minimal disappointment. However, that isn’t the case; many more doctoral students have the ambition and indeed the belief to go on and have an academic career: and when belief meets reality, sometimes things break. Even when they don’t, the oversupply of idealistic, qualified and motivated candidates leads to distortions, such as a large number of underpaid sessional teaching staff, assisting in the delivery of courses to ever larger cohorts of students (see also). The sector hasn’t sunk as low as the “unpaid internship” seen in other oversupplied careers (games, journalism, fashion) – though it has come close, and there are some zero-hour contract horror stories out there, as well as the nigh-on-exploitative short-term postdocs that are also part of the pyramid.

All this is a somewhat depressing way to set the scene for our way of redressing the balance: Goldsmiths Computing is hiring to fill a number of positions. Some of the positions are traditional lecturer jobs – fixed-term and permanent – and while they’re good openings, and I look forward to meeting candidates and working with whoever is successful, they’re not what’s most interesting here. We have also allocated funds for a number of post-doctoral teaching and research fellowships: three year posts where, in exchange for helping out with our teaching, the fellows will be able to pursue their own research agenda, working in collaboration with (but not under the direction of) established members of staff. I think this is a hugely positive move, and a real opportunity for anyone interesting in the particular kinds of areas of Computing that we have strengths in at Goldsmiths: Games and Graphics, Music and Art Computing, Data and Social Computing, Human-Computer Interaction and AI, Robotics and Cognition. (And if applicants were to want to work with me on projects in Music Informatics or even involving some programming language work, so much the better!)

The complete list of positions we’re hoping to fill (apply by searching for the “Computing” Department in this search form) is:

Lecturer in Computational Art – 0.5FTE, 3 year fixed-term
Lecturer in Computer Science – full-time, 3 year fixed-term
Lecturer in Computer Science – 0.5FTE, 3 year fixed-term
Lecturer in Games and Graphics – full-time, open-ended
Lecturer in Games Art – 0.5FTE, open-ended
Lecturer in Physical Computing – full-time, open-ended
Post-doctoral Teaching and Research Fellow – full-time, 3 year fixed-term

The deadline for applications for most of these posts is Monday 8th June, so get applying!

els2015 it happened

2015-04-23T10:47:10Z

Oh boy.

It turns out that organizing a conference is a lot of work. Who’d have thought? And it’s a lot of work even after accounting for the benefits of an institutional Conference Services division, who managed things that only crossed my mind very late: signage, extra supplies for college catering outlets – the kinds of things that are almost unnoticeable if they’re present, but whose absence would cause real problems. Thanks to Julian Padget, who ran the programme, and Didier Verna, who handled backend-financials and the website; but even after all that there were still a good number of things I didn’t manage to delegate – visa invitation letters, requests for sponsorship, printing proceedings, attempting to find a last-minute solution for recording talks after being reminded of it on the Internet somewhere... I’m sure there is more (e.g. overly-restrictive campus WiFi, blocking outbound ssh and TLS-enabled IMAP) but it’s beginning to fade into a bit of a blur. (An enormous “thank you” to Richard Lewis for stepping in to handle recording the talks as best he could at very short notice).

And the badges! People said nice things about the badges on twitter, but... I used largely the same code for the ILC held in Cambridge in 2007, and the comment passed back to me then was that while the badges were clearly going to become collectors’ items, they failed in the primary purpose of a badge at a technical conference, which is to give to the introvert with poor facial recognition some kind of clue who they are talking to: the font size for the name was too small. Inevitably, I got round to doing the badges at the last minute, and between finding the code to generate PDFs of badges (I’d lost my local copy, but the Internet had one), finding a supplier for double-sided sheets of 10 85x54mm business cards, and fighting with the office printer (which insisted it had run out of toner) the thought of modifying the code beyond the strictly necessary didn’t cross my mind. Since I asked for feedback in the closing session, it was completely fair for a couple of delegates to say that the badges could have been better in this respect, so in partial mitigation I offer a slightly cleaned-up and adjusted version of the badge code with the same basic design but larger names: here you go (sample output). (Another obvious improvement suggested to me at dinner on Tuesday: print a list of delegate names and affiliations and pin it up on a wall somewhere).

My experience of the conference is likely to be atypical – being the responsible adult, I did have to stay awake at all times, and do some of the necessary behind-the-scenes stuff while the event was going on. But I did get to participate; I listened to most of most of the talks, with particular highlights for me being Breanndán Ó Nualláin’s talk about a DSL for graph algorithms, Martin Cracauer’s dense and technical discussion of conservative garbage collection, and the demo session on Tuesday afternoon: three distinct demos in three different areas, each both well-delivered and with exciting content. Those highlights were merely the stand-out moments for me; the rest of the programme was pretty good, too, and it looked like there were some good conversations happening in the breaks, over lunch, and at the banquet on Monday evening. We ended up with 90 registrations all told, with people travelling in from 18 other countries; the delegate with the shortest distance to travel lived 500m from Goldsmiths; the furthest came from 9500km away.

The proceedings are now available for free download from the conference website; some speakers have begun putting up their talk materials, and in the next few weeks we’ll try to collect as much of that as we can, along with getting release permissions from the speakers to edit and publish the video recordings. At some point there will be a financial reckoning, too; Goldsmiths has delivered a number of services on trust, while ELSAA has collected registration fees in order to pay for those services – one of my next actions is to figure out the bureaucracy to enable these two organizations to talk to each other. Of course, Goldsmiths charges in pounds, while ELSAA collected fees in euros, and there’s also the small matter of cross-border sales tax to wrap my head around... it’s exciting being a currency speculator!

In summary, things went well – at least judging by the things people said to my face. I’m not quite going to say “A+ would organize again”, because it is a lot of work – but organizing it once is fine, and a price worth paying to help sustain and to contribute to the communication between multiple different Lisp communities. It would be nice to do some Lisp programming myself some day: some of the stuff that you can do with it is apparently quite neat!

els2015 is nearly here

2015-03-20T17:32:13Z

This year, I have had the dubious pleasure of being the Local Organizer for the European Lisp Symposium 2015, which is now exactly one month away; in 31 days, hordes of people will be descending on South East London to listen to presentations, give lightning talks and engage in general discussions about all things Lisp – the programme isn’t quite finalized, but expect Racket, Clojure, elisp and Common Lisp to be represented, as well as more... minority interests, such as C++.

Registration is open! In fact, for the next nine days (until 29th March) the registration fee is set at the absurdly low price of €120 (€60 for students) for two days of talks, tutorials, demos, coffee, pastries, biscuits, convivial discussion and a conference dinner. I look forward to welcoming old and new friends alike to Goldsmiths.

tmus research programmer position

2015-03-16T11:47:14Z

The AHRC-funded research project that I am a part of, Transforming Musicology, is recruiting a developer for a short-term contract, primarily to work with me on database systems for multimedia (primarily audio) content. The goal for that primary part of the contract is to take some existing work on audio feature extraction and probabilistic nearest-neighbour search indexing, and to develop a means for specialist users (e.g. musicologists, librarians, archivists, musicians) to access the functionality without needing to be experts in the database domain. This of course will involve thinking about user interfaces, but also about distributed computation, separation of data and query location, and so on.

The funding is for six months of programmer time. I would have no objection to someone working for six months in a concentrated block of time; I would also have no objection to stretching the funding out over a longer period of calendar time: it might well provide more time for reflection and a better outcome in the end. I would expect the development activities to be exploratory as well as derived from a specification; to span between the systems and the interface layer; to be somewhat polyglot (we have a C++ library, bindings in Python, Common Lisp and Haskell, and prototype Javascript and Emacs front-ends – no applicant is required to be fluent in all of these!)

There are some potentially fun opportunities during the course of the contract, not least working with the rest of the Transforming Musicology team. The post is based at Goldsmiths, and locally we have some people working on Systems for Early Music, on Musicology and Social Networking, and on Musical Memory; the Goldsmiths contribution is part of a wider effort, with partners based at Oxford working on Wagnerian Leitmotif and on a Semantic Infrastructure, at Queen Mary working on mid-level representations of music, and in Lancaster coordinating multiple smaller projects. As well as these opportunities for collaboration, there are a number of events coming up: firstly, the team would hope to have a significant presence at the conference of the International Society for Music Information Retrieval, which will be held in Málaga in October. We don’t yet know what we will be submitting there (let alone what will be accepted!) but there should be an opportunity to travel there, all being well. In July we’ll be participating in the Digital Humanities at Oxford Summer School, leading a week-long programme of workshops and lectures on digital methods for musicology. Also, in November of 2014, we also participated in the AHRC’s “Being Human” festival, with a crazy effort to monitor participants’ physiological responses to Wagner opera; there’s every possibility that we will be invited to contribute to a similar event this year. And there are other, similar projects in London with whom we have friendly relations, not least the Digital Music Lab project at City University and the EU-funded PRAISE project at Goldsmiths.

Sound interesting? Want to apply? There’s a more formal job advert on the project site, while the “person specification” and other application-related materials is with Goldsmiths HR. The closing date for applications is the 27th March; we’d hope to interview shortly after that, and to have the developer working with us from some time in May or June. Please apply!

ref2014 data update

2015-03-10T13:36:30Z

Does anyone still care about REF2014? Apart from agonizing about what it will mean in the new assignments of quality-related funding for institutions, obviously.

Among the various surprising things I have had to do this term (as in “surprise! You have to do this”) was to participate in the Winter graduation ceremony: reading out the names of graduands. It was fun; the tiniest bit stressful, because I hadn’t ever managed to attend a ceremony – but since everyone was there to celebrate, most of the pressure was off; I think I managed to doff my hat at the required times and not to trip over my gown while processing. Part of the ceremony is a valedictory speech from the Warden (vice-Chancellor equivalent), and it was perhaps inevitable that part of that was a section congratulating our soon-to-be alumni on belonging to an institution with high-quality research, or possibly “research intensity”.

That reminded me to take another look at the contextual data published by the Higher Education Statistics Authority; part of the “research intensity” calculation involves an estimate of the number of staff who were eligible to participate in the REF. It is only an estimate, not for any fundamental reason but because the employment data and the research submissions were collected by two different agencies; the data quality is not great, resulting probably in about equal measure from database schema mismatches (working out which REF2014 “Unit of Assessment” a given member of departmental staff belongs to) and human error. The good news is that at least the human error can be corrected later; there are now four Universities who have submitted corrected employment numbers to HESA, subtracting off a large number of research assistants (fixed-term contract researchers) from their list of REF-eligible staff – which naturally tends to bump up their measured research intensity.

New corrections mean new spreadsheets, new slightly-different data layouts, and new ingestion code; I suffer so you don’t have to. I’ve made a new version of my ref2014 R package containing the REF2014 results and contextual data; I’ve also put the source of the package up on github in case anyone wants to criticize (or learn from, I guess) my packaging.

a year in review

2014-12-31T22:27:37Z

A brief retrospective, partly brought to you by grep:

CATS credits earnt: 30 (15 + 7.5 + 7.5) at level 7
crosswords solved: >=40
words blogged: 75k
words blogged, excluding crosswords: 50k
SBCL releases made: 12 (latest today!)
functioning ARM boards sitting on my desk: 3 (number doing anything actually useful, beyond SBCL builds: 0 so far, working on it)
emacs packages worked on: 2 (iplayer squeeze)
public engagement events co-organized: 1
ontological inconsistencies resolved: 1
R packages made: 1 (University PR departments offended: not enough)

Blogging’s been a broad success; slightly tailed off of late, what with

Crawl Dungeon Sprint wins: 2 (The Pits, Thunderdome: both GrFi)

so there’s an obvious New Year’s Resolution right there.

Happy New Year!

ref2014 data in R package form

2014-12-28T22:55:18Z

As threatened, one last post about REF2014. Actually this post is not so much about REF2014 any more; it’s mostly about R. I’ve finally found the time to package up the REF2014 data as a convenient R package, along with accompanying documentation and a “vignette”. Some quick observations about the process:

it’s not desperately easy to find authoritative documentation on best practices, or even appropriate first steps. There’s authoritative documentation on the format and contents of an R package, but that documentation suffers from describing what not to do without describing what to do instead (example); there’s Hadley Wickham’s suggestions for best practices, but not quite enough of those suggestions were applicable to this case (data-only, converted from upstream formats). Putting it all together took more brain cycles than I care to admit.
Sweave is a bit clunky. It does the job but I can’t help but compare it to org-mode and markdown (R has knitr for markdown vignettes, but I couldn’t get rmarkdown working in the budgeted time, so fell back to Sweave).
it’s quite nice to be forced to document all the columns of a data frame. I say “forced”; it’s not that strong, but having R CMD check tell me about all the bad things I’ve done is a decent motivator.

I’m not sure what the licence of the REF2014 data is (or even if that is a question that makes sense). It appears I’m not the only one who is unsure; the Web and Internet Science crowd at Southampton have put up a RDF version of the REF2014 data, and Christopher Gutteridge doesn’t know about licensing either. Meanwhile, in the general belief that having this dataset more available is likely to be positive, all things considered, have at it (and tell me where the packaging has gone wrong...).