Why XML?

Airlines began computerizing their reservations systems in the 1960's, with American Airlines' SABRE system being the first. Other airlines soon realized that this was a competitive advantage and created their own systems, and by the 1970's, every airline was computerized. Of course, what this means is that the travel industry is based on 1960's technology.

In the 1970's, if you wanted to ask United's Apollo reservation system for a list of published fares between Raleigh / Durham, North Carolina, and Seattle, Washington, you'd say something like this:

$DRDUSEA
and Apollo would respond with something like this:
FARES LAST UPDATED 23JAN 11:31 AM
>$DRDUSEA
         RDU-SEA DEPART 23JAN
ADULT FARES
CMP/NXC/FFY/GOV/MIL/MLC/SEN/SPL/VUS/YTH FARES ALSO EXIST
     U.S. PASSENGER FACILITY CHARGES / SURCHARGES MAY APPLY
     TAXES AND FEES MAY VARY DEPENDING ON THE BOOKED ITINERARY
        USD     FARE         MIN/   XL  TVL DATES    TKT DATES
    CX  FARE    BASIS    AP    MAX  FE  FIRST/LAST   FIRST/LAST
  1 CO  198.00R TLE7SN   07| SU/30  ||     -/20JUNC     -/-
  2 NW  198.00R KO7M1N6  07| 01/--  ||     -/31MARC     -/-
  3 DL  198.00R U14M1O49 14| 01/--  ||     -/06JUNC     -/-
  4 UA  198.00R TE7ONQ5  07| SU/--  ||     -/31MARC     -/-
)><
In point of fact, it would have looked different for a number of reasons, principally that it's now 30 years later, but you get the idea. The important thing is that this was the state of the art, and if you were talking to Sabre, Amadeus, or Worldspan, things would work pretty much the same way.

The data formats are pretty terse because, back in the 1960's, bytes were expensive. Note, for instance, that Apollo tells you that the last effective date for that first fare on Continental is June 20, but what that means is the next occurrence of June 20; in other words, the meaning of a date depends on today's date.

It's human readable, if the human knows a bit about airlines and how they work. I'd be willing to bet that most people who've travelled by air could pick out some meaning; pretty much anybody could tell you that the cheapest fare is $198.00. If you know what you're looking for, it's not bad to parse, except for the complete lack of metadata, the lack of any self-describing features (such as field delimiters), and the fact that the layout of this response could change at any time, without warning.

To get around that problem, many in the industry started building proprietary structured data formats. United's Apollo and British Airways' Galileo systems, now united under common ownership, standardized on a format built from fixed-length fields. So in the 1990's, you could make the same request that we just made above using the following request:
PQQ010001FRQ00CHRGFILDIN01142
4080000183K000000010065GFQH000F0002000000002NNNNNNNNNNNNYNNN
NNNNNNNN                0118GFFD000F00020000RDU  SEA  NNNNNN
NNNNNNN NN
              NNNNNNNN
To which Apollo or Galileo would respond:
PRR01000101Y00DOT011425080021921K00000001000052
GFRH000F00030000NNNYYNNNNNNNNNNNNNNNNNNNNNNNNNNN0067GFMM000F
00010000000000000000090FARES LAST UPDATED 24OCT  6:42PM0099G
FMM000F00010000000000000000090         RDU-SEA DEPART 23JAN
                                  0099GFMM000F00010000000000
000000090ADULT FARES
             0099GFMM000F00010000000000000000090CMP/NXC/FFY/
GOV/MIL/MLC/SEN/VUS/YTH FARES ALSO EXIST            0099GFMM
000F00010000000000000000090     U.S. PASSENGER FACILITY CHAR
GES / SURCHARGES MAY APPLY     0099GFMM000F00010000000000000
000090     TAXES AND FEES MAY VARY DEPENDING ON THE BOOKED I
TINERARY  0097GFMM000F00010000000000000000090        USD
 FARE         MIN/   XL  TVL DATES    TKT DATES0098GFMM000F0
0010000000000000000090    CX  FARE    BASIS    AP    MAX  FE
  FIRST/LAST   FIRST/LAST0091GFTD000F00030001NYNNNNNNNNNNNNN
N    DL   172.00RUL21M1SN21|01-- ||    0404C        09860091
GFTD000F00030002NYNNNNNNNNNNNNNN    CO   172.00RTO211BSN21|0
1-- ||    0404C        00060091GFTD000F00030003NYNNNNNNNNNNN
NNN    NW   172.00RK21PRNR 21|01-- ||    0404C        020900
91GFTD000F00030004NYNNNNNNNNNNNNNN    AA   172.00RLHE21D1N21
|01-- ||    0404C        0002
I've truncated the response to the first 4 fares. This isn't nearly as readable; even an expert would struggle to decode it. You might pick out RDU, SEA, and the two-letter airline codes, and in some responses you might even pick up on the inventory listing (that's the part that looks like 01Y 009B 009M...). But it's pretty difficult to read and parse visually.

Mechanically, the story isn't much better. While there is metadata for this, the format is pretty unforgiving and unhelpful because it isn't self-describing; you have to know a priori how long each field is before you can parse it. You do get a bit of help, though. The Data Record block (that's the part that starts with DOT) gives you some hints. A block that starts DOT011001060203294, for instance, tells you that this data record is called 1001, is version 6.2, and is 3294 bytes long. The mainframe could send you 1001 version 6.3, but any fields that weren't in version 6.2 would go at the end of the record, so clients parse as far as they know how and discard the remainder of the 3294 bytes. The mainframe can't send you version 7, because version 7 isn't constrained by the "end of block" rule and your request was version 6.2, so the mainframe can't expect you to know about version 7.
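
To make the versioning rule concrete, here's a minimal Python sketch. It assumes the header layout implied by the DOT011001060203294 example (a DOT01 prefix, a 4-digit record name, 2-digit major and minor versions, and a 5-digit length), which is an inferred reading of that one string rather than a documented layout. The field table for the fare row is entirely hypothetical, as are all of the function names; real field widths would come from the record's documentation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RecordHeader:
    record_id: str  # e.g. "1001"
    major: int      # e.g. 6
    minor: int      # e.g. 2
    length: int     # record length in bytes, e.g. 3294


def parse_header(text: str) -> RecordHeader:
    """Parse a header such as 'DOT011001060203294'.

    Assumed layout (inferred from a single example, not from a spec):
    'DOT01' + 4-digit id + 2-digit major + 2-digit minor + 5-digit length.
    """
    if len(text) < 18 or not text.startswith("DOT01"):
        raise ValueError("not a data record header")
    return RecordHeader(
        record_id=text[5:9],
        major=int(text[9:11]),
        minor=int(text[11:13]),
        length=int(text[13:18]),
    )


# Hypothetical layout that a client written against version 6.2 knows about:
# (field name, width in characters).
KNOWN_FIELDS = [("carrier", 2), ("fare", 9), ("fare_basis", 8)]


def parse_known_fields(payload: str, layout=KNOWN_FIELDS) -> dict:
    """Slice out the fields we know about; anything after them is ignored."""
    fields, pos = {}, 0
    for name, width in layout:
        fields[name] = payload[pos:pos + width].strip()
        pos += width
    return fields


def client_can_parse(header: RecordHeader, client_major: int) -> bool:
    """Newer minor versions only append fields, so only the major must match."""
    return header.major == client_major


header = parse_header("DOT011001060203294")
print(header)
print(client_can_parse(header, client_major=6))
# A 6.3 record parsed by a 6.2 client: the trailing fields are simply dropped.
print(parse_known_fields("DL   172.00UL21M1SN<fields added in 6.3>"))
```

The point of the sketch is how much the client has to carry around: every field name and width lives in the client's own table, and nothing in the data itself tells you where one field ends and the next begins.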

In any case, this is better, but really not good enough. The fact is that it takes a lot of code that knows all the characteristics of each individual field. There's EDIFACT, which has delimited fields and a hierarchical structure, but EDIFACT turns out to be incredibly complex to compose messages in, and beastly to debug. And then there's XML. What if we took the metadata we have about our proprietary structures, and wrote the last parser anybody will ever write? What if we named each piece of data and wrapped it up in a tag?

<FareQuoteTariffDisplay_8_0>
  <FareDisplayMods>
    <QueryHeader>
      <UniqueKey>0000</UniqueKey>
      <LangNum>00</LangNum>
      <Action>002</Action>
      <RetCRTOutput>N</RetCRTOutput>
      <NoMsg>N</NoMsg>
      <NoTrunc>N</NoTrunc>
      <IMInd>N</IMInd>
      <FIPlus>N</FIPlus>
      <PEInd>N</PEInd>
      <HostUse16>N</HostUse16>
      <NBInd>N</NBInd>
      <ActionOnlyInd>N</ActionOnlyInd>
      <TranslatePeriod>N</TranslatePeriod>
      <GFYInd>N</GFYInd>
      <IntFrame1>N</IntFrame1>
      <SmartParsed>Y</SmartParsed>
      <PDCodes>N</PDCodes>
      <BkDtOverride>N</BkDtOverride>
      <HostUse25>N</HostUse25>
      <DefCurrency>N</DefCurrency>
      <PFPWInd>N</PFPWInd>
      <HostUse28>N</HostUse28>
      <HostUse29>N</HostUse29>
      <HostUse30>N</HostUse30>
      <HostUse31>N</HostUse31>
      <DefCurrencyLocInd>N</DefCurrencyLocInd>
      <HostUse33>N</HostUse33>
    </QueryHeader>
    <TravConstraints>
      <UniqueKey>0000</UniqueKey>
      <StartPt>DEN</StartPt>
      <EndPt>ORD</EndPt>
      <OW>N</OW>
      <RT>N</RT>
      <LongDispInd>N</LongDispInd>
      <ValidatingDispInd>N</ValidatingDispInd>
      <NUCInd>N</NUCInd>
      <RetDataInd>N</RetDataInd>
      <RulesInd>N</RulesInd>
      <BaseFares>N</BaseFares>
      <ConxPts>N</ConxPts>
      <IncDomTax>N</IncDomTax>
      <ConvAP>N</ConvAP>
      <FQSFareType>N</FQSFareType>
      <HalfRT>N</HalfRT>
      <CalShopReq />
      <Spare1>NN</Spare1>
      <StartDt><![CDATA[        ]]></StartDt>
      <AirV1 />
      <AirV2 />
      <AirV3 />
      <GlobDir />
      <ConxPt1 />
      <ConxPt2 />
      <EndDt><![CDATA[        ]]></EndDt>
      <TkDt><![CDATA[        ]]></TkDt>
      <FareType />
      <Currency />
      <Pt />
      <SellCurrency />
      <JointFares>N</JointFares>
      <RndWorld>N</RndWorld>
      <CircTrip>N</CircTrip>
      <Spare2>NNNNN</Spare2>
    </TravConstraints>
  </FareDisplayMods>
</FareQuoteTariffDisplay_8_0>
You'll get a considerable amount of data back. What we have is sort of readable, it's self-describing, and we can use off-the-shelf tools to parse it:
<FareQuoteTariffDisplay_8_0>
<FareInfo>
  <RespHeader>
    <UniqueKey>0000</UniqueKey>
    <CRTOutput>N</CRTOutput>
    <ErrMsg>N</ErrMsg>
    <AgntAlert>N</AgntAlert>
    <SmartParsedData>Y</SmartParsedData>
    <Spares1>YNNN</Spares1>
    <FQSOnlyItin>N</FQSOnlyItin>
    <Spares2>N</Spares2>
    <IFQLastF0>N</IFQLastF0>
    <IFQLastFQ>N</IFQLastFQ>
    <IFQLastD>N</IFQLastD>
    <IFQLastB>N</IFQLastB>
    <IFQLastV>N</IFQLastV>
    <Spare3>N</Spare3>
    <AppInd1>N</AppInd1>
    <AppInd2>N</AppInd2>
    <AppInd3>N</AppInd3>
    <AppInd4>N</AppInd4>
    <AppInd5>N</AppInd5>
    <AppInd6>N</AppInd6>
    <AppInd7>N</AppInd7>
    <AppInd8>N</AppInd8>
    <AppInd9>N</AppInd9>
    <AppInd10>N</AppInd10>
    <AppInd11>N</AppInd11>
    <AppInd12>N</AppInd12>
    <AppInd13>N</AppInd13>
    <AppInd14>N</AppInd14>
    <AppInd15>N</AppInd15>
    <AppInd16>N</AppInd16>
  </RespHeader>
  <InfoMsg>
    <UniqueKey>0000</UniqueKey>
    <QuoteNum>0</QuoteNum>
    <MsgNum>0</MsgNum>
    <AppNum>0</AppNum>
    <MsgType>9</MsgType>
    <Lang>0</Lang>
    <Text><![CDATA[FARES LAST UPDATED 24OCT  6:42PM]]></Text>
  </InfoMsg>
  <Tariff>
    <UniqueKey>1</UniqueKey>
    <Type1>N</Type1>
    <Type2>Y</Type2>
    <Type3>N</Type3>
    <Type4>N</Type4>
    <HasCitiesLine>N</HasCitiesLine>
    <PermittedDisc>N</PermittedDisc>
    <HasFreeForm>N</HasFreeForm>
    <HasPF>N</HasPF>
    <Spare1>NNNNNNNN</Spare1>
    <PIC />
    <Type2Qual>
      <SpclCondInd />
      <AirV>DL</AirV>
      <Fare>172.00</Fare>
      <RTInd>R</RTInd>
      <FIC>UL21M1SN</FIC>
      <AP>21</AP>
      <APEndItem>Š</APEndItem>
      <MinStay>01</MinStay>
      <MaxStay>--</MaxStay>
      <DirInd />
      <Pens>ŠŠ</Pens>
      <FirstTravDt />
      <LastTravDt>0404</LastTravDt>
      <FootnoteType>C</FootnoteType>
      <FirstTkDt />
      <LastTkDt />
      <RteInfo>986</RteInfo>
    </Type2Qual>
  </Tariff>
  <Tariff>
    <UniqueKey>2</UniqueKey>
    <Type1>N</Type1>
    <Type2>Y</Type2>
    <Type3>N</Type3>
    <Type4>N</Type4>
    <HasCitiesLine>N</HasCitiesLine>
    <PermittedDisc>N</PermittedDisc>
    <HasFreeForm>N</HasFreeForm>
    <HasPF>N</HasPF>
    <Spare1>NNNNNNNN</Spare1>
    <PIC />
    <Type2Qual>
      <SpclCondInd />
      <AirV>CO</AirV>
      <Fare>172.00</Fare>
      <RTInd>R</RTInd>
      <FIC>TO211BSN</FIC>
      <AP>21</AP>
      <APEndItem>Š</APEndItem>
      <MinStay>01</MinStay>
      <MaxStay>--</MaxStay>
      <DirInd />
      <Pens>ŠŠ</Pens>
      <FirstTravDt />
      <LastTravDt>0404</LastTravDt>
      <FootnoteType>C</FootnoteType>
      <FirstTkDt />
      <LastTkDt />
      <RteInfo>6</RteInfo>
    </Type2Qual>
  </Tariff>
  <Tariff>
    <UniqueKey>3</UniqueKey>
    <Type1>N</Type1>
    <Type2>Y</Type2>
    <Type3>N</Type3>
    <Type4>N</Type4>
    <HasCitiesLine>N</HasCitiesLine>
    <PermittedDisc>N</PermittedDisc>
    <HasFreeForm>N</HasFreeForm>
    <HasPF>N</HasPF>
    <Spare1>NNNNNNNN</Spare1>
    <PIC />
    <Type2Qual>
      <SpclCondInd />
      <AirV>NW</AirV>
      <Fare>172.00</Fare>
      <RTInd>R</RTInd>
      <FIC>K21PRNR</FIC>
      <AP>21</AP>
      <APEndItem>Š</APEndItem>
      <MinStay>01</MinStay>
      <MaxStay>--</MaxStay>
      <DirInd />
      <Pens>ŠŠ</Pens>
      <FirstTravDt />
      <LastTravDt>0404</LastTravDt>
      <FootnoteType>C</FootnoteType>
      <FirstTkDt />
      <LastTkDt />
      <RteInfo>209</RteInfo>
    </Type2Qual>
  </Tariff>
  <Tariff>
    <UniqueKey>4</UniqueKey>
    <Type1>N</Type1>
    <Type2>Y</Type2>
    <Type3>N</Type3>
    <Type4>N</Type4>
    <HasCitiesLine>N</HasCitiesLine>
    <PermittedDisc>N</PermittedDisc>
    <HasFreeForm>N</HasFreeForm>
    <HasPF>N</HasPF>
    <Spare1>NNNNNNNN</Spare1>
    <PIC />
    <Type2Qual>
      <SpclCondInd />
      <AirV>AA</AirV>
      <Fare>172.00</Fare>
      <RTInd>R</RTInd>
      <FIC>LHE21D1N</FIC>
      <AP>21</AP>
      <APEndItem>Š</APEndItem>
      <MinStay>01</MinStay>
      <MaxStay>--</MaxStay>
      <DirInd />
      <Pens>ŠŠ</Pens>
      <FirstTravDt />
      <LastTravDt>0404</LastTravDt>
      <FootnoteType>C</FootnoteType>
      <FirstTkDt />
      <LastTkDt />
      <RteInfo>2</RteInfo>
    </Type2Qual>
  </Tariff>
</FareInfo>
</FareQuoteTariffDisplay_8_0>
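
As a rough illustration of the off-the-shelf point, here's a minimal sketch that uses Python's standard xml.etree.ElementTree module to pull the carrier, fare, and fare basis out of a response shaped like the one above. The sample is trimmed to two fares and a handful of elements, and list_fares is a name made up for this example; it isn't part of any Galileo or Apollo API.

```python
import xml.etree.ElementTree as ET
from decimal import Decimal

# Trimmed copy of the response above: two fares, only the elements we read.
SAMPLE = """<FareQuoteTariffDisplay_8_0>
<FareInfo>
  <Tariff>
    <UniqueKey>1</UniqueKey>
    <Type2Qual>
      <AirV>DL</AirV>
      <Fare>172.00</Fare>
      <FIC>UL21M1SN</FIC>
    </Type2Qual>
  </Tariff>
  <Tariff>
    <UniqueKey>2</UniqueKey>
    <Type2Qual>
      <AirV>CO</AirV>
      <Fare>172.00</Fare>
      <FIC>TO211BSN</FIC>
    </Type2Qual>
  </Tariff>
</FareInfo>
</FareQuoteTariffDisplay_8_0>"""


def list_fares(xml_text: str) -> list:
    """Return (carrier, fare, fare basis) for each Tariff in the response."""
    root = ET.fromstring(xml_text)
    fares = []
    # Fields are looked up by name, so no field widths or offsets are needed.
    for qual in root.iterfind("./FareInfo/Tariff/Type2Qual"):
        carrier = qual.findtext("AirV")
        # Decimal keeps the fare exact; this assumes <Fare> is always present.
        fare = Decimal(qual.findtext("Fare"))
        basis = qual.findtext("FIC")
        fares.append((carrier, fare, basis))
    return fares


for carrier, fare, basis in list_fares(SAMPLE):
    print(carrier, fare, basis)
```

Compare this with the fixed-length version: there's no table of field widths here, and elements the code doesn't ask for are simply ignored, wherever they happen to appear.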