Citation Style Language

Revision as of 22:19, 4 May 2010 by 119.82.183.48 (Talk) (CSL record format)

The Citation Style Language (CSL) is an XML-based stylesheet language for formatting citations and bibliographies. It is used in reference management software such as Zotero, Mendeley, CiteProc and Pandoc. CSL was initiated by Bruce D’Arcus in the XBib project. The latest specification of the language, CSL 1.0, was published in March 2010.

The idea behind CSL

Citation output is generated using CSL in a way similar to XSLT processing. If you know BibTeX, you can compare CSL with the BibTeX style file language BAFLL (BibTeX Anonymous Forth-Like Language). The basic idea is to separate bibliographic data from the citation style definition, so that nicely formatted citations in various styles can be generated from a single body of data.

                           CSL-Style
                               |
                               v
 Bibliographic record -> CSL-Processor -> Citation
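
The separation of data and style can be illustrated with a toy sketch in Python. This is not a CSL processor: real CSL styles are XML files, and the two functions below (author_date_style and numeric_style, both hypothetical names) only stand in for two different styles applied to the same record.

```python
# Toy illustration only: real CSL styles are XML files interpreted by a CSL
# processor. The two functions below merely stand in for two "styles".

record = {
    "author": [{"family": "Bennett", "given": "Frank G."}],
    "title": 'Getting Property Right: "Informal" Mortgages in the Japanese Courts',
    "issued": {"date-parts": [[2009, 8]]},
    "type": "article-journal",
}


def issued_year(rec):
    """Return the year of the start date in the 'issued' field."""
    return rec["issued"]["date-parts"][0][0]


def author_date_style(rec):
    """Hypothetical author-date rendering of a single record."""
    author = rec["author"][0]
    return "{}, {} ({}). {}.".format(
        author["family"], author["given"], issued_year(rec), rec["title"])


def numeric_style(rec, number):
    """Hypothetical numbered rendering of the same record."""
    author = rec["author"][0]
    return '[{}] {} {}, "{}", {}.'.format(
        number, author["given"], author["family"], rec["title"], issued_year(rec))


print(author_date_style(record))
print(numeric_style(record, 1))
```

The record is written once; only the function that renders it changes.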

CSL processors have been written in a variety of programming languages. The most complete implementation of CSL 1.0 at present is the JavaScript implementation, citeproc-js, which runs in Firefox and other Gecko-based browsers, Google Chrome, Safari, IE6 and above, and in Rhino and SpiderMonkey/TraceMonkey for server-side deployments.

Getting started

If you use Zotero or Mendeley, you already use CSL under the hood. If you want to dig your hands into code, have a look at citeproc-js:

 hg clone http://bitbucket.org/fbennett/citeproc-js

A formatted version of the processor manual is available online, and a demo that runs the processor in a browser is also available. The citeproc-js source archive contains a large suite of test cases, and the test framework offers a lightweight platform for exploring the behavior of the processor.

Bibliographic record format

Of course you cannot throw just any bibliographic record format into a CSL processor; you must use the field names defined in the CSL 1.0 specification. Fields are of three types: plain text, date fields, and name fields. The latter two have an internal structure as described here. As a guide to the field assignments for particular types of content, the CSL mappings used in the Zotero reference manager are described here.

CSL record format

Derived from the CSL 1.0 specification and the citeproc-js documentation, a CSL record can be defined as follows, in incomplete Backus-Naur Form with additional description:

A record is a JSON object with unique keys of four kinds (STD, NAME, DATE, and TYPE):

(1) RECORD := '{' ( STD ':' STD_VAL | NAME ':' NAME_VAL | DATE ':' DATE_VAL | TYPE )* '}' (plus comma as separator)

A STD is a standard variable name as listed at http://citationstyles.org/downloads/specification.html#standard-variables.

(2) STD := '"abstract"' | '"annote"' | '"archive"' | ...

A NAME is a name variable name as listed at http://citationstyles.org/downloads/specification.html#name-variables.

(3) NAME := '"author"' | '"editor"' | ...

A DATE is a date variable name as listed at http://citationstyles.org/downloads/specification.html#date-variables.

(4) DATE := '"accessed"' | '"container"' | ...

A STD_VAL is a simple JSON string:

(5) STD_VAL := JSON_STRING (see JSON standard)

A TYPE contains a value from the types listed at http://citationstyles.org/downloads/specification.html#appendix-ii-types

(6) TYPE := '"type"' ':' ( '"article"' | '"book"' | ... )

A NAME_VAL is a non-empty JSON array of JSON objects with NAME_PART keys and simple JSON string values:

(7) NAME_VAL := '[' ( '{' ( NAME_PART ':' JSON_STRING | STATIC_ORDERING )+ '}' )+ ']' (plus comma as separator)

A NAME_PART is one of the following keys:

(8) NAME_PART := '"family"' | '"given"' | '"suffix"' | '"non-dropping-particle"' | '"dropping-particle"'

In addition you can add STATIC_ORDERING as part of the NAME_VAL to flag that a name is always displayed with the family name first ("non-Byzantine" names):

(9) STATIC_ORDERING := '"static-ordering"' ':' ANY_TRUE_JSON_VALUE (TODO: what is ANY_TRUE_JSON_VALUE?)
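
Rules (7) to (9) can be turned into a small check. The following Python sketch is only one reading of the grammar above; the function name is_valid_name_val is made up for this page, and the value of "static-ordering" is deliberately left unchecked because rule (9) does not yet define it.

```python
NAME_PARTS = {
    "family", "given", "suffix", "non-dropping-particle", "dropping-particle",
}


def is_valid_name_val(value):
    """Check a value against rules (7) to (9) as written above."""
    if not isinstance(value, list) or not value:
        return False  # must be a non-empty array
    for name in value:
        if not isinstance(name, dict) or not name:
            return False  # each entry must be a non-empty object
        for key, part in name.items():
            if key == "static-ordering":
                continue  # flag value is not checked (open TODO in rule 9)
            if key not in NAME_PARTS or not isinstance(part, str):
                return False  # unknown key or non-string value
    return True


print(is_valid_name_val([{"family": "Bennett", "given": "Frank G."}]))
```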

A DATE_VAL is a JSON object which contains at least a DATE_PARTS element and optionally a SEASON_VAL element:

(10) DATE_VAL := '{' '"date-parts"' ':' DATE_PARTS ( ',' '"season"' ':' SEASON_VAL )? '}'

A DATE_PARTS is a nested JSON array containing a start date and an optional end date, each of which consists of a year, an optional month and an optional day, in that order if present.

(11a) DATE_PARTS := '[' SINGLE_DATE ( ',' SINGLE_DATE )? ']'
(11b) SINGLE_DATE  := '[' YEAR ( ',' MONTH ( ',' DAY )? )? ']'
(11c) YEAR  := JSON_STRING | JSON_INTEGER (a string must contain an integer; the number must not be zero)
(11d) MONTH  := JSON_STRING | JSON_INTEGER (1 to 12)
(11e) DAY  := JSON_STRING | JSON_INTEGER (1 to 31)
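
The date rules can be checked in the same spirit. This Python sketch follows rules (11a) to (11e) literally; the names _as_int and is_valid_date_parts are made up for this page, and the check does not test whether a day actually exists in the given month.

```python
def _as_int(value):
    """Return value as an int if it is an integer or an integer string, else None."""
    if isinstance(value, bool):
        return None  # bool is a subclass of int in Python; reject it explicitly
    if isinstance(value, int):
        return value
    if isinstance(value, str):
        try:
            return int(value)
        except ValueError:
            return None
    return None


def is_valid_date_parts(value):
    """Check a value against rules (11a) to (11e) as written above."""
    if not isinstance(value, list) or not 1 <= len(value) <= 2:
        return False  # a start date and an optional end date
    for date in value:
        if not isinstance(date, list) or not 1 <= len(date) <= 3:
            return False  # year, optional month, optional day
        parts = [_as_int(part) for part in date]
        if None in parts:
            return False  # every part must be an integer or integer string
        if parts[0] == 0:
            return False  # the year must not be zero
        if len(parts) >= 2 and not 1 <= parts[1] <= 12:
            return False
        if len(parts) == 3 and not 1 <= parts[2] <= 31:
            return False
    return True


print(is_valid_date_parts([[2009, 8]]))
```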

A SEASON_VAL should be one of 1 to 4 or a fixed JSON string:

(12) SEASON_VAL := '"1"' | '"2"' | '"3"' | '"4"' | JSON_STRING

The dirty-tricks fields of citeproc-js are not valid CSL. Please clean your input data before feeding it to a CSL processor if you want to get sane citations.

Other record formats

If you want to use some other format (BibTeX, RIS, MARC, MODS, Bibliographic Ontology, etc.), the path looks like this:

 Record in your format -> some miracle occurs -> record in CSL format -> CSL-Processor -> Citation

Please replace "some miracle occurs" with the conversion service of your choice, for instance Zotero or the kind of software hacks that libraries tend to use. There is nothing wrong with specific bibliographic formats, but creating citations is not their purpose (counterexamples: BibTeX and RIS).
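
To give an idea of what such a conversion involves, here is a deliberately naive Python sketch that maps a few BibTeX-style fields to CSL fields. The input dictionary layout, the function names split_author and bibtex_to_csl, and the field mapping are assumptions made for illustration; a real converter has to handle many more fields, entry types and name forms.

```python
def split_author(name):
    """Split 'Family, Given' or 'Given Family' into CSL name parts (naive)."""
    if "," in name:
        family, given = [part.strip() for part in name.split(",", 1)]
    else:
        given, _, family = name.strip().rpartition(" ")
    parts = {"family": family}
    if given:
        parts["given"] = given
    return parts


def bibtex_to_csl(entry):
    """Map a few BibTeX-style fields (hypothetical input dict) to CSL fields."""
    # Crude type guess: anything with a journal is treated as a journal article.
    record = {"type": "article-journal" if entry.get("journal") else "book"}
    if "author" in entry:
        record["author"] = [split_author(a) for a in entry["author"].split(" and ")]
    simple = {"title": "title", "journal": "container-title",
              "volume": "volume", "pages": "page"}
    for source, target in simple.items():
        if source in entry:
            record[target] = entry[source]
    if "year" in entry:
        record["issued"] = {"date-parts": [[int(entry["year"])]]}
    return record


print(bibtex_to_csl({"author": "Bennett, Frank G.", "year": "2009",
                     "journal": "Pacific Rim Law & Policy Journal"}))
```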

Embedding CSL records in Twitter annotations

On the Code4lib mailing list, embedding bibliographic data in Twitter annotations has been discussed. If these annotations contain CSL records, a client application could display a bibliographic reference in the citation style of your choice, so the formatting task is delegated to the client.

A Twitter annotation is a JSON object of up to 512 bytes (more may be allowed later).

The CSL input format is also JSON, but you need to specify a root element and how to deal with multiple references. This is how an annotation could look:

  
{ "cslrecords" : {
    "ITEM-2" : {
        "author": [ {
            "family": "Bennett",
            "given": "Frank G.",
            "suffix": "Jr.",
            "static-ordering": false
        } ],
        "title": "Getting Property Right: \"Informal\" Mortgages in the Japanese Courts",
        "container-title": "Pacific Rim Law & Policy Journal",
        "volume": "18",
        "page": "463-509",
        "issued": { "date-parts": [ [2009, 8] ] },
        "type": "article-journal"
    }
} }
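
Because of the 512-byte limit mentioned above, it is worth measuring an annotation before sending it. The following Python sketch encodes the annotation as compact JSON and counts the UTF-8 bytes; the function name fits_annotation is made up, and how Twitter itself counts the bytes is an assumption here.

```python
import json

ANNOTATION_LIMIT = 512  # bytes, as stated above


def fits_annotation(annotation, limit=ANNOTATION_LIMIT):
    """Return (fits, size) for the compact UTF-8 JSON encoding of annotation."""
    # Compact separators drop the whitespace that pretty-printing adds.
    compact = json.dumps(annotation, separators=(",", ":"), ensure_ascii=False)
    size = len(compact.encode("utf-8"))
    return size <= limit, size


example = {"cslrecords": {"ITEM-2": {
    "title": "Getting Property Right",
    "type": "article-journal",
}}}
fits, size = fits_annotation(example)
print(fits, size)
```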

You could also wrap each record so that non-CSL data can easily be added alongside it:

{ "bibrecords" : {
    "ITEM-2" : {
        "csl" : {
            "author": [ {
                "family": "Bennett",
                "given": "Frank G.",
                "suffix": "Jr.",
                "static-ordering": false
            } ],
            "title": "Getting Property Right: \"Informal\" Mortgages in the Japanese Courts",
            "container-title": "Pacific Rim Law & Policy Journal",
            "volume": "18",
            "page": "463-509",
            "issued": { "date-parts": [ [2009, 8] ] },
            "type": "article-journal"
        },
        "identifier": [
            "urn:issn:1066-8632",
            "http://ssrn.com/abstract=1541102",
            "bibkey:18561d99b88967f176f0e4ab63d230c0e"
        ]
    }
} }
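
A client that wants to format the references has to pull the CSL records out of whichever wrapper is used. A minimal Python sketch, assuming the two root keys "cslrecords" and "bibrecords" proposed above (neither is part of CSL itself) and a made-up function name csl_records:

```python
def csl_records(annotation):
    """Return {item_id: csl_record} from either layout proposed above."""
    if "cslrecords" in annotation:
        # Plain layout: the values already are CSL records.
        return dict(annotation["cslrecords"])
    # Wrapped layout: each item carries its CSL record under the "csl" key.
    wrapped = annotation.get("bibrecords", {})
    return {item_id: item["csl"]
            for item_id, item in wrapped.items() if "csl" in item}


wrapped_example = {"bibrecords": {"ITEM-2": {
    "csl": {"title": "Getting Property Right", "type": "article-journal"},
    "identifier": ["urn:issn:1066-8632"],
}}}
print(csl_records(wrapped_example))
```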

Alternatives

  • http://www.refbase.net/ is open source and contains import filters and citation styles to create citations from bibliographic data

This page is licensed under CC-BY-SA and can therefore be reused on other pages, such as Wikipedia, as you like.