# character-parser

Parse JavaScript one character at a time to find snippets embedded in templates. This is not a validator; it is designed to let you robustly extract sections of JavaScript delimited by brackets.

[![Build Status](https://img.shields.io/travis/ForbesLindesay/character-parser/master.svg)](https://travis-ci.org/ForbesLindesay/character-parser)

## Installation

    npm install character-parser

## Usage

### Parsing

Work out how the bracket depth changes:

```js
var assert = require('assert');
var parse = require('character-parser').parse;

var state = parse('foo(arg1, arg2, {\n  foo: [a, b\n');
assert.deepEqual(state.stack, [')', '}', ']']);

parse('    c, d]\n  })', state);
assert.deepEqual(state.stack, []);
```
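To make the stack semantics concrete, here is a minimal sketch of the bookkeeping. It is not the library's implementation: `trackBrackets` and `CLOSERS` are hypothetical names, and the sketch ignores strings, comments, and regular expressions, which the real `parse` handles.

```js
// Hypothetical helper for illustration only; it is NOT part of character-parser.
// Maps each opening bracket to the closing bracket expected later.
var CLOSERS = {'(': ')', '{': '}', '[': ']'};

function trackBrackets(src, stack) {
  // Copy the incoming stack so the caller's array is not mutated.
  stack = stack ? stack.slice() : [];
  for (var i = 0; i < src.length; i++) {
    var ch = src[i];
    if (CLOSERS.hasOwnProperty(ch)) {
      // Opening bracket: remember which closer is now expected.
      stack.push(CLOSERS[ch]);
    } else if (ch === ')' || ch === '}' || ch === ']') {
      // Closing bracket: it must match the innermost expected closer.
      if (stack[stack.length - 1] !== ch) {
        throw new Error('Mismatched bracket "' + ch + '" at index ' + i);
      }
      stack.pop();
    }
  }
  return stack;
}

var first = trackBrackets('foo(arg1, arg2, {\n  foo: [a, b\n');
// first is [')', '}', ']']
var second = trackBrackets('    c, d]\n  })', first);
// second is []
```

As in the example above, the stack holds the closing brackets that are still expected, and passing the result of one call into the next lets a string be processed in pieces.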

### Custom Delimited Expressions

Find code up to a custom delimiter:

```js
var assert = require('assert');
var parser = require('character-parser');

// EJS-style
var section = parser.parseUntil('foo.bar("%>").baz%> bing bong', '%>');
assert(section.start === 0);
assert(section.end === 17); // exclusive end of string
assert(section.src === 'foo.bar("%>").baz');

var section = parser.parseUntil('<%foo.bar("%>").baz%> bing bong', '%>', {start: 2});
assert(section.start === 2);
assert(section.end === 19); // exclusive end of string
assert(section.src === 'foo.bar("%>").baz');

// Jade-style
var section = parser.parseUntil('#[p= [1, 2][i]]', ']', {start: 2});
assert(section.start === 2);
assert(section.end === 14); // exclusive end of string
assert(section.src === 'p= [1, 2][i]');

// Dumb parsing
// Stop at the first delimiter encountered, whether or not it is nested.
// This is the character-parser@1 default behavior.
var section = parser.parseUntil('#[p= [1, 2][i]]', ']', {start: 2, ignoreNesting: true});
assert(section.start === 2);
assert(section.end === 10); // exclusive end of string
assert(section.src === 'p= [1, 2');
```

Delimiters are ignored if they are inside strings or comments.
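The idea of skipping delimiters inside strings can be sketched as follows. This is a simplified illustration under stated assumptions, not character-parser's algorithm: `findOutsideStrings` is a hypothetical helper that understands only single- and double-quoted strings with backslash escapes, and it ignores comments, regular expressions, template strings, and nesting.

```js
// Hypothetical helper for illustration only; it is NOT character-parser's algorithm.
function findOutsideStrings(src, delimiter, start) {
  start = start || 0;
  var quote = null; // quote character of the string we are inside, if any
  for (var i = start; i < src.length; i++) {
    var ch = src[i];
    if (quote) {
      if (ch === '\\') {
        i++; // skip the escaped character
      } else if (ch === quote) {
        quote = null; // string closed
      }
    } else if (ch === '"' || ch === "'") {
      quote = ch; // string opened
    } else if (src.slice(i, i + delimiter.length) === delimiter) {
      // Delimiter found outside any string; `end` is exclusive.
      return {start: start, end: i, src: src.substring(start, i)};
    }
  }
  throw new Error('Delimiter "' + delimiter + '" not found outside a string');
}

var found = findOutsideStrings('foo.bar("%>").baz%> bing bong', '%>');
// found.src is 'foo.bar("%>").baz' and found.end is 17
```

The `%>` inside the double-quoted string is skipped because the scanner is in "inside a string" mode when it reaches it.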

## API

All methods may throw an exception on a syntax error. The exception has an additional `code` property that identifies the kind of error and always starts with `CHARACTER_PARSER:`.

### parse(src, state = defaultState(), options = {start: 0, end: src.length})

Parses `src` from index `start` up to index `end` and returns the state after parsing.

If you want to parse one string in multiple sections, keep passing the resulting state to the next `parse` call.

Returns a `State` object.

### parseUntil(src, delimiter, options = {start: 0, ignoreLineComment: false, ignoreNesting: false})

Parses the source until the first occurrence of `delimiter` that is not inside a string or a comment.

If `ignoreLineComment` is `true`, a delimiter that occurs inside a line comment still counts.

If `ignoreNesting` is `true`, parsing stops at the first occurrence of the delimiter bracket, regardless of whether that bracket closes a nested bracket or not. See the example above.

It returns an object with the structure:

```js
{
  start: 0, // index of first character of string
  end: 13,  // index of first character after the end of string
  src: 'source string'
}
```
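Because `end` is exclusive, it can be used directly to resume scanning after a section. The following is a minimal sketch of pulling every delimited section out of a template. `extractSections`, `open`, and `close` are hypothetical names, and `parser` is assumed to be the object returned by `require('character-parser')`.

```js
// extractSections is a hypothetical helper for this sketch.
// `parser` is expected to be the object returned by require('character-parser').
function extractSections(parser, template, open, close) {
  var sections = [];
  var index = template.indexOf(open);
  while (index !== -1) {
    // Start parsing just after the opening marker.
    var section = parser.parseUntil(template, close, {start: index + open.length});
    sections.push(section.src);
    // `end` is exclusive, so the closing marker starts at section.end.
    index = template.indexOf(open, section.end + close.length);
  }
  return sections;
}
```

A call might look like `extractSections(require('character-parser'), template, '<%', '%>')`. Since `parseUntil` may throw, an unterminated section should be expected to surface as an exception rather than as a partial result.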

### parseChar(character, state = defaultState())

Parses a single character and returns the state. See the "State" section for the structure of the returned state object. N.B. `character` must be a single character, not a multi-character string.
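A sketch of feeding a string through `parseChar` one character at a time, which can be useful when the caller wants to inspect the state between characters. `parseByChar` is a hypothetical helper, and `parser` is assumed to be the object returned by `require('character-parser')`.

```js
// parseByChar is a hypothetical helper for this sketch.
// `parser` is expected to be the object returned by require('character-parser').
function parseByChar(parser, src, state) {
  state = state || parser.defaultState();
  for (var i = 0; i < src.length; i++) {
    // Pass exactly one character per call, carrying the state forward.
    state = parser.parseChar(src[i], state);
  }
  return state;
}
```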

### defaultState()

Get a default starting state.

### isPunctuator(character)

Returns `true` if `character` represents punctuation in JavaScript.

### isKeyword(name)

Returns `true` if `name` is a keyword in JavaScript.

### TOKEN_TYPES & BRACKETS

Objects whose values can be a frame in the `stack` property of a State (documented below).

## State

A state is an object with the following structure:

```js
{
  stack: [],          // stack of detected brackets; the outermost is [0]
  regexpStart: false, // true if a slash is just encountered and a REGEXP state has just been added to the stack

  escaped: false,     // true if in a string and the last character was an escape character
  hasDollar: false,   // true if in a template string and the last character was a dollar sign

  src: '',            // the concatenated source string
  history: '',        // reversed `src`
  lastChar: ''        // last parsed character
}
```

The `stack` property can contain any of the following:

- Any of the property values of `characterParser.TOKEN_TYPES`
- Any of the property values of `characterParser.BRACKETS` (the end bracket, not the starting bracket)

It also has the following useful methods:

- `.current()` returns the innermost bracket (i.e. the last stack frame).
- `.isString()` returns `true` if the current location is inside a string.
- `.isComment()` returns `true` if the current location is inside a comment.
- `.isNesting([opts])` returns `true` if the current location is not at the top level, i.e. if the stack is not empty. If `opts.ignoreLineComment` is `true`, line comments are not counted as a level, so for `// a` it will still return false.
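The documented behavior of `.current()` and `.isNesting()` can be sketched over a plain array. This is an illustration of the semantics described above, not the library's source: the functions take the stack as an argument instead of living on a State object, and the value of `LINE_COMMENT` is an assumed placeholder for `characterParser.TOKEN_TYPES.LINE_COMMENT`.

```js
// Hypothetical stand-ins for illustration; the real methods live on State objects.
var LINE_COMMENT = '//'; // assumed placeholder for TOKEN_TYPES.LINE_COMMENT

function current(stack) {
  // The innermost frame is the last element of the stack.
  return stack[stack.length - 1];
}

function isNesting(stack, opts) {
  if (opts && opts.ignoreLineComment &&
      stack.length === 1 && stack[0] === LINE_COMMENT) {
    // Only a line comment is open, and line comments are being ignored.
    return false;
  }
  return stack.length > 0;
}
```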

### Errors

All errors thrown by character-parser have a `code` property that identifies what sort of error was thrown. Errors thrown from `parse` and `parseUntil` also have an `index` property.
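One way to use these properties is to separate parser errors from unrelated ones. The sketch below assumes only what is stated above: a `code` starting with `CHARACTER_PARSER:` and an `index` on errors from `parseUntil`. `tryParseUntil` is a hypothetical wrapper, and `parser` is assumed to be the object returned by `require('character-parser')`.

```js
// tryParseUntil is a hypothetical wrapper for this sketch.
// `parser` is expected to be the object returned by require('character-parser').
function tryParseUntil(parser, src, delimiter, options) {
  try {
    return {ok: true, section: parser.parseUntil(src, delimiter, options)};
  } catch (err) {
    var code = err && typeof err.code === 'string' ? err.code : '';
    if (code.indexOf('CHARACTER_PARSER:') !== 0) {
      throw err; // not a character-parser error, so let it propagate
    }
    // Report the error code and the position where parsing failed.
    return {ok: false, code: code, index: err.index};
  }
}
```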

## Transition from v1

In character-parser@2, we have changed the APIs quite a bit. These are some notes that will help you transition to the new version.

### State Object Changes

Instead of tracking a depth for each kind of bracket, the state now keeps a stack. Some properties were also removed; each line below pairs a removed v1 property (left) with its v2 equivalent (right):

```js
state.lineComment   state.current() === parser.TOKEN_TYPES.LINE_COMMENT
state.blockComment  state.current() === parser.TOKEN_TYPES.BLOCK_COMMENT
state.singleQuote   state.current() === parser.TOKEN_TYPES.SINGLE_QUOTE
state.doubleQuote   state.current() === parser.TOKEN_TYPES.DOUBLE_QUOTE
state.regexp        state.current() === parser.TOKEN_TYPES.REGEXP
```
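If existing code reads the removed properties in many places, a small shim can ease the transition. This is a sketch built only from the mapping above: `legacyFlags` is a hypothetical helper, `state` is assumed to be a v2 State object, and `TOKEN_TYPES` is assumed to be `parser.TOKEN_TYPES`.

```js
// legacyFlags is a hypothetical helper; it is NOT part of character-parser.
// It recomputes the removed v1 boolean properties from a v2 state.
function legacyFlags(state, TOKEN_TYPES) {
  var current = state.current();
  return {
    lineComment: current === TOKEN_TYPES.LINE_COMMENT,
    blockComment: current === TOKEN_TYPES.BLOCK_COMMENT,
    singleQuote: current === TOKEN_TYPES.SINGLE_QUOTE,
    doubleQuote: current === TOKEN_TYPES.DOUBLE_QUOTE,
    regexp: current === TOKEN_TYPES.REGEXP
  };
}
```

The returned object is a snapshot, so it needs to be recomputed after each parse step.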

### `parseMax`

This function has been removed because its usefulness was questioned. `parseUntil` should be a better choice for your task.

### `parseUntil`

The default behavior when the delimiter is a bracket has been changed so that nesting is taken into account to determine if the end is reached.

To preserve the original behavior, pass `ignoreNesting: true` as an option.

To see the difference between the new and old behaviors, see the "Usage" section earlier.

### `parseMaxBracket`

This function has been merged into `parseUntil`. You can directly rename the function call without any repercussions.

## License

MIT