Tuesday, August 19, 2003
SCO's Math Is Off, Or Maybe It's Their Ethics
Dick Gingras, a programmer and Groklaw reader, took the time to check some of SCO's math. SCO is talking about millions of lines of code. Dick has figured out the numbers for SMP/RCU/NUMA code in Linux, and even if you put them all together in one heap, they don't add up to millions of lines of code. Here is Dick Gingras' work, and thank you for it.
"Just finished spending about eight hours compiling info on the lines of SMP/RCU/NUMA code contained in the Linux kernel (see below).
"I'm not a member of the Linux kernel community, but I've been programming for upwards of 35 years, the first 12 of which I worked on operating systems and compilers, so I have sufficient background to do a credible job analyzing the code base.
"Because I had to eyeball each file that possibly contained some of the disputed code, I thought I might as well include the name(s) of author(s) and the last copyright year. So without further ado, here's the data:
"Lines of code (LOC) in Linux SMP, RCU and NUMA.
"The total LOC for all of SMP/RCU/NUMA is 5,124. To provide perspective, the total LOC for all of the Linux kernel is approximately 5.2 million, including the code for all twenty architectures that Linux will run on plus all the drivers for the myriad supported peripherals.
"The results here were obtained by searching the kernel tree for:
"1. a filename that contains the string smp/rcu/numa, or
"2. a source file that contains #ifdef for SMP/RCU/NUMA.
"Each resulting file was then manually examined and the lines pertaining to SMP/RCU/NUMA were counted.
"All line counts include comments and blank lines.
"Only files used as part of the Intel i386 architecture are included because that's the only platform on which SCO's OpenServer and UnixWare run. Most of the code for SMP and NUMA is completely different for other architectures, including the Intel IA64 (Itanium).
"Not counted: source files that contain trivial code, i.e.,
". includes of header files (.h)
". variable definitions
". macro definitions
". calls to external subroutines defined in one of the principal modules, for instance, drivers for peripheral hardware
"Names of authors and the last copyright date are noted if copyright statements or authorship were given. If an author indicated his company, that is so noted. Where a source file was worked on by many programmers, only the principal authors are listed."
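The search step described in the letter can be sketched in a few lines of Python. This is only an illustration, not the script the author used (he does not say what tools he used): the regular expressions, the `CONFIG_SMP` macro in the sample, and the tiny made-up source tree are all assumptions. It picks out the files a reviewer would then have to open and count by hand.

```python
import os
import re
import tempfile

# Assumption: the letter gives no exact patterns; these are illustrative only.
NAME_RE = re.compile(r"smp|rcu|numa", re.IGNORECASE)
IFDEF_RE = re.compile(r"^\s*#\s*if(?:n?def)?\b.*(?:SMP|RCU|NUMA)", re.MULTILINE)


def candidate_files(root):
    """Return relative paths whose name or #if/#ifdef lines mention SMP/RCU/NUMA."""
    hits = []
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            path = os.path.join(dirpath, name)
            with open(path, encoding="utf-8", errors="replace") as fh:
                text = fh.read()
            if NAME_RE.search(name) or IFDEF_RE.search(text):
                hits.append(os.path.relpath(path, root))
    return sorted(hits)


def demo():
    """Build a tiny hypothetical tree and list the files flagged for manual review."""
    with tempfile.TemporaryDirectory() as root:
        samples = {
            "smpboot.c": "int boot;\n",                              # matched by filename
            "sched.c": "#ifdef CONFIG_SMP\nint balance;\n#endif\n",  # matched by #ifdef
            "ext2.c": "int inode;\n",                                # not matched
        }
        for name, body in samples.items():
            with open(os.path.join(root, name), "w", encoding="utf-8") as fh:
                fh.write(body)
        return candidate_files(root)


print(demo())
```

The sketch only narrows down the candidates. The per-file counts in the tables came from reading each file and counting the relevant lines manually, which no pattern match reproduces.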
Linux Kernel 2.6.0-test3 (latest as of 8/17/03)
Symmetric MultiProcessing (SMP) Code:
592 arch/i386/kernel/smp.c 1995 Alan Cox, Red Hat; 2000 Ingo Molnar, Red Hat
1186 arch/i386/kernel/smpboot.c 1995 Alan Cox, Red Hat; 2000 Ingo Molnar, Red Hat
295 kernel/module.c 2002 Rusty Russell, IBM
528 kernel/sched.c 2002 Linus Torvalds; Ingo Molnar
60 kernel/timer.c 1992 Linus Torvalds; Ingo Molnar, Red Hat; David S Miller; Alexey Kuznetsov
5 kernel/exit.c 1992 Linus Torvalds
35 kernel/posix-timers.c 2002 George Anzinger, MontaVista Software; Richard Henderson
22 mm/swap.c 1994 Linus Torvalds
60 mm/slab.c 1997 Mark Hemment; 2002 Manfred Spraul
3367=Total SMP Code
Read-Copy Update (RCU) Code: (actually part of SMP code)
267 kernel/rcupdate.c 2001 Dipankar Sarma, IBM
135 include/linux/rcupdate.h 2001 Dipankar Sarma, IBM
402=Total RCU Code
Non-Uniform Memory Architecture (NUMA) Code:
164 kernel/sched.c (see under SMP)
58 arch/i386/kernel/mpparse.c 1995 Alan Cox, Red Hat
25 arch/i386/kernel/smpboot.c 1995 Alan Cox, Red Hat
106 arch/i386/kernel/numaq.c 2002 Patricia Gaughen,
429 arch/i386/mm/discontig.c 2002 Patricia Gaughen,
129 arch/i386/pci/numa.c no copyright statement
19 arch/i386/mach-default/topology.c 2003 Patrick Mochel, OSDL; Paul Dorwin, IBM; Matthew Dobson, IBM
186 drivers/acpi/numa.c 2002 Takayoshi Kochi, NEC
23 mm/page_alloc.c 1999 Kanoj Sarcar, SGI
~50 mm/slab.c 2002 Manfred Spraul
166 include/asm-i386/numaq.h 2002 Patricia Gaughen,
1355=Total NUMA Code
Dick Gingras, August 19, 2003
I asked another programmer to repeat the work, and he reports that the work is good in his opinion, with minor number differences, but not of any significance to the main point. Gingras chose to use the 2.6 kernel, because it presumably has the most high-end code.
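For anyone who wants to redo the addition, here is a minimal sketch in Python. It uses only the three stated subtotals and the approximate 5.2 million figure from the letter; the function and variable names are made up for the illustration.

```python
# Subtotals as stated in the tables above (lines of code).
SUBTOTALS = {"SMP": 3367, "RCU": 402, "NUMA": 1355}
KERNEL_LOC = 5_200_000  # approximate whole-kernel size quoted in the letter


def total_and_share(subtotals, kernel_loc):
    """Return the combined count and its percentage of the whole kernel."""
    total = sum(subtotals.values())
    return total, 100.0 * total / kernel_loc


total, share = total_and_share(SUBTOTALS, KERNEL_LOC)
print(f"{total} lines, about {share:.2f}% of the kernel")
```

On those figures the three technologies together come to roughly a tenth of one percent of the kernel.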
Then I got another email, from another coder who has been doing some math homework too. When he also found that the code can't add up to millions of lines, he came up with a theory:
"I think SCO is including everything that _uses_ the 3 disputed technologies and not just allegedly copied SYSV code. I grepped for files that use the 3 technologies (using a rough method) and counted their lines.
$ grep -irlE '_smp|smp_' . | xargs cat | wc -l
1120087 (sco claims 750k)
$ grep -irlE '_rcu|rcu_' . | xargs cat | wc -l
79138 (sco claims 110k)
$ grep -irlE '_numa|numa_' . | xargs cat | wc -l
41809 (sco claims 55k)
"The figures don't exactly match but they're in the right ballpark. I think this is similar to the method SCO has been using to discover 'derivative works'. They think anything that links against their allegedly copied SYSV code is a derivative work of SYSV. For example, the ext2 filesystem code uses spinlock code from the SMP core. I think SCO is claiming that ext2 is 'copied' from SYSV because of those spinlocks.
"I hope I've got it wrong because if this is what SCO is doing then they're engaged in an IP land-grab. They're using their allegedly copied SMP and NUMA and RCU code to steal millions of lines of code from thousands of Linux copyright holders. The hypocrisy of SCO claiming they're protecting IP rights for the 'little guy' while trampling over the IP rights of Linux copyright holders... it makes me sick to the stomach."
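To see why the reader's grep pipeline produces such large numbers, here is a rough Python equivalent, offered as a sketch only. `grep -l` lists every file with at least one match, and `xargs cat | wc -l` then counts every line of those files, so a single mention pulls in the whole file. The two files and the `smp_hypothetical()` call below are invented for the demonstration.

```python
import os
import re
import tempfile


def whole_file_line_count(root, pattern):
    """Sum the line counts of every file under root with at least one match."""
    regex = re.compile(pattern, re.IGNORECASE)  # like grep -i -E
    total = 0
    for dirpath, _dirs, files in os.walk(root):  # like grep -r
        for name in files:
            path = os.path.join(dirpath, name)
            with open(path, encoding="utf-8", errors="replace") as fh:
                lines = fh.readlines()
            if any(regex.search(line) for line in lines):  # like grep -l
                total += len(lines)  # the whole file counts, not just the hits
    return total


def demo():
    """One hypothetical 100-line file with a single match, one file with none."""
    with tempfile.TemporaryDirectory() as root:
        with open(os.path.join(root, "fs.c"), "w", encoding="utf-8") as fh:
            fh.write("int x;\n" * 99 + "smp_hypothetical();\n")
        with open(os.path.join(root, "other.c"), "w", encoding="utf-8") as fh:
            fh.write("int y;\n" * 40)
        return whole_file_line_count(root, r"_smp|smp_")


print(demo())
```

In the demo a single matching line is enough for all 100 lines of the first file to be counted, while the second file contributes nothing. That is the difference between counting code that uses a technology and counting the code that implements it.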
Today America, Tomorrow the World!
SCO plans an Anschluss of GNU/Linux in Europe, in addition to the US land grab. That is the conclusion I reach from reading today's press release:
"The SCO Group Announces Appointment of Gregory Blepp
"Tuesday August 19, 8:03 am ET
"Former VP of International Business at SuSE Joins SCO As VP of SCOsource in Europe
"LINDON, Utah, Aug. 19 /PRNewswire-FirstCall/ -- The SCO Group, Inc. (Nasdaq: SCOX - News), the owner of the UNIX® operating system, today announced the appointment of Gregory Blepp as vice president of SCOsource. Blepp will report to Chris Sontag, the senior vice president and general manager of SCOsource, the division of SCO tasked with protecting and licensing the company's UNIX intellectual property.
"Blepp, a former VP of International Business at SuSE, brings to SCO a wealth of experience in marketing and business management from time at Network Associates and Computer Associates. Blepp's appointment is taking place at SCOForum in Las Vegas this week where he is being introduced to SCO partners and resellers.
"'We're pleased to have Gregory Blepp join SCO to assist in our efforts around SCOsource in Europe,' said Chris Sontag, senior vice president and GM, SCOsource. 'We look forward to using Blepp's talents and expertise in assisting the company to properly license SCO's valuable UNIX intellectual property.'"
I like the knighthood being bestowed at SCOsource, with its catchy theme, Mission: Trained to Sell. I am sure his mommy is very proud of him for this great career move. And it does seem just right for a marketer to be the number two guy at SCOsource. I don't think that's the fit we'd normally have thought of, so it just goes to show how creative those wacky guys are in Utah. Bet Blepp will love Utah. Why, shucks, I even like his name for his assignment. I don't think Dickens could have invented anything better, and there's no denying that our Alice in Wonderland metaphor is starting to feel a little too nice and too sweet for what is now going on, and a Dickensian shift seems in order. Bleak House comes to mind. So all in all, Blepp seems the right man for the job. Here's hoping he's out of a job so fast that he never even shows up as a blip on the radar.
UPDATE: Lots of analysis of this is available now from Bruce Perens, LWN, and Slashdot. Thank you, Groklaw readers. You are, once again, amazing. I knew you'd come through, but I'm amazed at the speed.
There is a comment on Yahoo! Finance from a poster whose identity is unknown to me. Normally, that would mean I wouldn't mention it on Groklaw, as I have scrupulously avoided anything I couldn't evaluate and verify. But this is something I'd be delinquent not to report, I think, now that it's on the internet anyway and public. So I will just say that I am putting it up with a request that those who can appropriately evaluate it do so. Here is the message in full:
"SCO's 'proof'. A joke.
by: d1rkinator 08/19/03 09:10 am
Msg: 29448 of 29609
"The code SCO finds offending:
"Its location in Linux:
"And its heritage:
"Ok, SCO: This was easy. Now, show us the other many examples."
This is definitely one of those days I wish I were a programmer. Feedback, those of you who are?
They "Show" the Code
Well, those slides went to SCOForum, even if Boies didn't. Here's the mainstream account:
"Sontag then showed, in a series of slides, Linux code that he claimed has been literally copied from Unix. He said numerous comments, unusual spellings and typographical errors had also been copied directly into Linux.
"Much of the Unix code in the slides was obscured, because the company wants to keep its intellectual property under wraps, but SCO is allowing people who want to see a more extensive side-by-side comparison during the conference to do so if they sign a nondisclosure agreement."
For Your Eyes Only, I guess. Somebody at the conference is posting to Yahoo! Finance as Korbomite, and his account of what he saw isn't exactly the same:
"McBride showed a number of what I would have thought were classroom exercises in a first-year C programming class. One side was marked as 'Linux,' one as UNIX®. The code seemed to be basic iterative programming and set-up code, as you would see in any text or on a test. Primarily, it seemed to be initializations of variables and set-up of stacks and heaps. At no time was there any explanation of or provision to provenance of either code example. No one brought up the general availability of the Linux source tree and the time it has been available vs. the date of SCO's filing. No one questioned the sheer AMOUNT of claimed code vs. the total in the kernel/module space. There WERE striking similarities in the examples, but there were also differences."
I see CNET reporting that the crowd burst into applause at one point. Korbomite says there were "far fewer" than 1,000 people there. I haven't seen that pointed out anywhere else. And when they applauded, it was about OpenServer's features, which include ... um... Samba, a GPLd product. Bit of a disconnect there.
ITWorld reports that they are now claiming a million lines of code, derivative code, not direct copying. eWeek says the same thing, adding that Sontag says "it's highly unlikely the matter could be resolved by removing that code". No, you won't let us remove it. Then the case would be over. Talking derivative code instead of copying means nothing can be fixed until we go to trial and hear their "rocket scientists" testify to how they used "spectral analysis" to find common code. Rocket scientists found the allegedly infringing code. Anything sound fishy to you about that?
Here's the CRN description:
"While it was difficult to ascertain the exact code being shown on screen, attorneys pointed to exact copying of some code from Unix to Linux and claimed that IBM improperly donated almost a million lines of Unix System V code to the Linux 2.4x and Linux 2.5x kernel that infringe on its Unix System V contract with SCO -- and SCO's intellectual property.
"SCO claimed that much of the core code of Linux including Non-Uniform Memory Access, the Read Copy Update for high-end database scalability, Journaling File System, XFS, Schedulers, Linux PPC 32 and 64-bit support and enterprise volume management is covered by SCO's Unix System V contracts and copyrights.
"For example, 110,000 lines of Unix System V code for read copy update, 55,000 lines of NUMA code and more than 750,000 lines of symmetric multi-processing code from Unix System V has made its way into Linux, attorneys and SCO executives claimed."
Again, McBride compared GNU/Linux users to pirates:
"'We're fighting for a right in the industry to make a living selling software,' McBride said. 'The whole notion that software should be free is something SCO doesn't stand for. We have drawn the line. We're supposed to be excited about that and we're not. . . . '
"'Globally, it's not just about Red Hat and IBM. There are a lot of issues around IP with music, and in Hollywood. We are in the software industry having these issues and this can have a significant impact going forward. The evidence we have is strong.'"
So, you've been warned, coders. SCO won't stand for it if you let people use your code for free.
And this tidbit: It seems HP is their new best friend, to hear McBride tell it:
"But SCO's McBride said that there are two companies he has no intention of going after: Hewlett-Packard Co. and Sun Microsystems Inc. 'We have no problems with Sun and HP with regards to infringement as both have honored the conditions of their Unix license contracts and operated within these,' he said."
Seems they are planning on rolling out a 64-bit UNIX for Itanium 2 one of these days:
"SCO Group Inc is proving its commitment to the future of Unix on Intel Corp processors by announcing plans for new versions of its OpenServer and UnixWare flavors and a new 64-bit version for Itanium 2.
"The Lindon, Utah-based company has been through multiple projects to develop a 64-bit version of Unix for Intel in the past, most recently Project Monterey with IBM Corp, which led in part to the current legal battle between the two companies.
"SCO said it will be careful not to infringe on any information gained through those projects in the development of SCO Unix 9, the new 64-bit variant due for release in 2005."
What are the odds of this staying out of the case? My favorite quotation? One SCO exec said, "Under the microscope we're in, I'm sure we'll do the right thing." That's as opposed to when they are not being watched closely, I presume. Here, it says they have formed a partnership with Open Systems, Inc., a company that does accounting software for Windows, UNIX, and Linux. Here's what Open Systems, Inc.'s VP of Marketing Mil Miketic has to say about what they will be working on together with SCO:
"'SCO is a strategic partner for Open Systems,' said Miketic. 'Both of our organizations share a common target market, and we can leverage our combined channel strengths to take advantage of the rising popularity of Linux applications in creating accounting solutions for small to medium-sized businesses.'"
So, maybe the actual explanation is that SCO wishes to grab Linux for itself, claim ownership, get the GPL invalidated, get paid royalties, and then...profit! And this is about honoring IP? What about the owners of the GPL code? Planning on at least sharing the loot with the guys who actually wrote the code and didn't turn over their copyrights to you, you SCO pirates? Of course, I could be hallucinating, I suppose.
A bit more explanation from Moglen in the Register, on the GPL.
James Bond on a Mission to Sell
I do wonder sometimes what will become of my synapses after I spend so many months trying to figure out the way McBride and his band of merry men think. Translating SCOSpeak can't be neuronally healthy.
But while I still am in command of my senses, I couldn't help but notice something curious in the news coverage of SCOForum today. Could SCO's Chris Sontag be telling fibs? Heavens to Betsy, surely that can't be it. Maybe he's so new, he doesn't know his company's history? Or maybe James has to tell a few fibs when he's on a "Mission: Trained to Sell". Here's part of what Sontag said today:
"Turning to derivative works that have found their way into Linux, Sontag said these include NUMA (non uniform memory access), Read Copyright Update (RCU), Journal File System and schedulers. 'A number of entities have violated their contracts and contributed inappropriate code to Linux. That's how Linux has advanced so quickly and found its way into the enterprise so soon,' Sontag said.
"'We have an improbable Linux development process. The current 2.5 kernel contains features and functionality that took years and years to be developed in Unix. With Linux we've seen it develop from a baby to a race car driver in three or four years,' he said."
I believe we can assume that when he said "Read Copyright Update", it was a Freudian slip, their business strategy du jour -- those rascals just never stop thinking business, so it's bound to spill over into their speech -- and what he meant to say was Read Copy Update. Or possibly another mainstream journalist has had his brain snatched and turned into cabbage, and now the poor thing just writes whatever SCO speaks, without questioning it. But, hey, what does precision matter on such a vital mission?
Let's take a trip back in time, to a simpler, gentler Caldera just about a year ago. Computerwire ran a story on June 13, 2002, and if you compare what was said today with what they wrote, your brain might explode trying to make them match up. Both stories can't be true, because they seem to be mutually exclusive. For those of you with a sub to Computerwire, here's where you can get the whole article, "Caldera Backs Away From 64-Bit Open Unix". For the rest, here are relevant snips:
"Caldera International Inc has maintained its commitment to the Unix operating systems it acquired from Santa Cruz Operation Inc, despite admitting that it currently has no plans to port Open Unix to Intel Corp's 64-bit Itanium processor.
"With development of the company's Linux distribution more or less handed over to SuSE Linux AG and the UnitedLinux project, Caldera's research and development dollars are now focused on its Open Unix and OpenServer Unix flavors and the Volution management products, but while both Unix variants continue to be developed by the company, neither are likely to be available for 64-bit processors.
"As the legacy Unix variant, OpenServer was never likely to be ported to Itanium, but sizable investment has gone in to projects to develop a 64-bit version of Open Unix, both with IBM on the Monterey project and through SCO's Gemini project that created UnixWare 7, the predecessor to the current Open Unix 8. Feedback from Intel and customers, however, has led Caldera to the conclusion that there is enough life in the 32-bit market.
"'The feedback from Intel and our customers is that 64-bit addressing today just isn't a priority, and the 32-bit processors are just getting better and better,' said Caldera's VP EMEA, Chris Flynn. '32-bit is good enough for most people's processing requirements.' That appears to suggest that Open Unix and OpenServer's lifespan will last only as long as 32-bit processors continue to sell, but Flynn maintained that the operating systems will remain available as long as customers want them.
"'There's plenty of mileage in 32-bit Unix,' he said. 'Until our customers tell us that they don't want Unix and they don't want 32-bit Intel any more, which I don't see happening, then nothing's going to change. 32-bit is just great for customers over the next few years, but we do have choices, and we could move forward with our 64-bit projects.'
"One of those choices will be 64-bit Linux, which is being developed through the IA-64 Linux Project, and will be available from Caldera. Flynn believes that by the time users are looking to purchase 64-bit servers and operating systems in volume, Linux will have gained the robustness and scalability it requires to compete with Unix in the enterprise market.
"Another option Caldera has on the shelf is IBM's AIX 5L, which was developed from the Monterey project between IBM and SCO. In 2001, Caldera offered a preview of the AIX 5L operating system for Itanium to developers, and it remains a possibility that Caldera will offer IBM's Unix for 64-bit users should there be the demand."
If you look at this chart, you'll see that System V is later called OpenUnix, in case you like visuals. I worry my brain is starting to look like that chart, or worse, from trying to parse out all the SCOstories. Now, I'm not a programmer, so it's certainly possible my brain is just missing something -- heaven only knows I laughed myself silly today reading all about the anti-GPL strategy our worthy opponents have concocted -- but I simply can't harmonize what was said today with this article from June of 2002. Then again, I'm not Trained to Sell, and so far, my brain hasn't been turned into cabbage, so maybe that explains how come I notice disparities and think they matter.