class StringScanner
Class StringScanner
supports processing a stored string as a stream; this code creates a new StringScanner
object with string 'foobarbaz'
:
  require 'strscan'
  scanner = StringScanner.new('foobarbaz')
About the Examples
All examples here assume that StringScanner
has been required:
require 'strscan'
Some examples here assume that these constants are defined:
  MULTILINE_TEXT = <<~EOT
    Go placidly amid the noise and haste,
    and remember what peace there may be in silence.
  EOT
  HIRAGANA_TEXT = 'こんにちは'
  ENGLISH_TEXT = 'Hello'
Some examples here assume that certain helper methods are defined:
- put_situation(scanner): Displays the values of the scanner’s methods pos, charpos, rest, and rest_size.
- put_match_values(scanner): Displays the scanner’s match values.
- match_values_cleared?(scanner): Returns whether the scanner’s match values are cleared.
See examples [here].
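For readers without access to the linked definitions, two of the helpers could be sketched as follows. The method names come from the text above, but the bodies are assumptions made for illustration, not the definitions used to produce the output shown on this page. In particular, this match_values_cleared? checks only a subset of the match values and ignores named_captures.

```ruby
require 'strscan'

# Hypothetical helper: prints the scanner's position-related values.
def put_situation(scanner)
  puts '# Situation:'
  puts "#   pos: #{scanner.pos}"
  puts "#   charpos: #{scanner.charpos}"
  puts "#   rest: #{scanner.rest.inspect}"
  puts "#   rest_size: #{scanner.rest_size}"
end

# Hypothetical helper: true when the basic match values and the
# captures array are all in their cleared state.
def match_values_cleared?(scanner)
  scanner.matched? == false &&
    scanner.matched.nil? &&
    scanner.matched_size.nil? &&
    scanner.pre_match.nil? &&
    scanner.post_match.nil? &&
    scanner.captures.nil?
end

scanner = StringScanner.new('foobarbaz')
put_situation(scanner)
match_values_cleared?(scanner) # => true
scanner.scan(/foo/)
match_values_cleared?(scanner) # => false
```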
The StringScanner Object
This code creates a StringScanner
object (we’ll call it simply a scanner), and shows some of its basic properties:
  scanner = StringScanner.new('foobarbaz')
  scanner.string # => "foobarbaz"
  put_situation(scanner)
  # Situation:
  #   pos: 0
  #   charpos: 0
  #   rest: "foobarbaz"
  #   rest_size: 9
The scanner has:
- A stored string, which is:
  - Initially set by StringScanner.new(string) to the given string ('foobarbaz' in the example above).
  - Modifiable by methods string=(new_string) and concat(more_string).
  - Returned by method string.
  More at Stored String below.
- A position; a zero-based index into the bytes of the stored string (not into its characters):
  - Initially set by StringScanner.new to 0.
  - Returned by method pos.
  - Modifiable explicitly by methods reset, terminate, and pos=(new_pos).
  - Modifiable implicitly (various traversing methods, among others).
  More at Byte Position below.
- A target substring, which is a trailing substring of the stored string; it extends from the current position to the end of the stored string:
  - Initially set by StringScanner.new(string) to the given string ('foobarbaz' in the example above).
  - Returned by method rest.
  - Modified by any modification to either the stored string or the position.
  Most importantly: the searching and traversing methods operate on the target substring, which may be (and often is) less than the entire stored string.
  More at Target Substring below.
Stored String
The stored string is the string stored in the StringScanner
object.
Each of these methods sets, modifies, or returns the stored string:
Method | Effect |
---|---|
::new(string) | Creates a new scanner for the given string. |
string=(new_string) | Replaces the existing stored string. |
concat(more_string) | Appends a string to the existing stored string. |
string | Returns the stored string. |
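The following sketch exercises the methods in the table together. It assumes only the behavior described on this page: concat leaves the position alone, while string= resets it to zero. The call to dup is a precaution added here so that concat never modifies a frozen literal.

```ruby
require 'strscan'

scanner = StringScanner.new('foo'.dup)
scanner.scan(/fo/)      # => "fo"
scanner.concat('bar')   # appends; the position is unchanged
scanner.string          # => "foobar"
scanner.pos             # => 2
scanner.rest            # => "obar"

scanner.string = 'baz'  # replaces; the position goes back to zero
scanner.string          # => "baz"
scanner.pos             # => 0
scanner.rest            # => "baz"
```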
Positions
A StringScanner
object maintains a zero-based byte position and a zero-based character position.
Each of these methods explicitly sets positions:
Method | Effect |
---|---|
reset | Sets both positions to zero (beginning of stored string). |
terminate | Sets both positions to the end of the stored string. |
pos=(new_byte_position) | Sets byte position; adjusts character position. |
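A short sketch of the three methods in the table, using a string of five 3-byte characters so that the byte position and the character position differ. The string literal is assumed to be UTF-8 encoded (Ruby's default source encoding).

```ruby
require 'strscan'

scanner = StringScanner.new('こんにちは') # 15 bytes, 5 characters

scanner.terminate
[scanner.pos, scanner.charpos] # => [15, 5]

scanner.reset
[scanner.pos, scanner.charpos] # => [0, 0]

scanner.pos = 6                # byte offset of the third character
[scanner.pos, scanner.charpos] # => [6, 2]
```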
Byte Position (Position)
The byte position (or simply position) is a zero-based index into the bytes in the scanner’s stored string; for a new StringScanner
object, the byte position is zero.
When the byte position is:
- Zero (at the beginning), the target substring is the entire stored string.
- Equal to the size of the stored string (at the end), the target substring is the empty string ''.
To get or set the byte position:
- pos: returns the byte position.
- pos=(new_pos): sets the byte position.
Many methods use the byte position as the basis for finding matches; many others set, increment, or decrement the byte position:
  scanner = StringScanner.new('foobar')
  scanner.pos # => 0
  scanner.scan(/foo/) # => "foo" # Match found.
  scanner.pos # => 3 # Byte position incremented.
  scanner.scan(/foo/) # => nil # Match not found.
  scanner.pos # => 3 # Byte position not changed.
Some methods implicitly modify the byte position; see:
The values of these methods are derived directly from the values of pos and string:
- charpos: the character position.
- rest: the target substring.
- rest_size: the size in bytes of the target substring (rest.bytesize).
Character Position
The character position is a zero-based index into the characters in the stored string; for a new StringScanner
object, the character position is zero.
Method charpos
returns the character position; its value may not be reset explicitly.
Some methods change (increment or reset) the character position; see:
Example (string includes multi-byte characters):
  scanner = StringScanner.new(ENGLISH_TEXT) # Five 1-byte characters.
  scanner.concat(HIRAGANA_TEXT) # Five 3-byte characters.
  scanner.string # => "Helloこんにちは" # Twenty bytes in all.
  put_situation(scanner)
  # Situation:
  #   pos: 0
  #   charpos: 0
  #   rest: "Helloこんにちは"
  #   rest_size: 20
  scanner.scan(/Hello/) # => "Hello" # Five 1-byte characters.
  put_situation(scanner)
  # Situation:
  #   pos: 5
  #   charpos: 5
  #   rest: "こんにちは"
  #   rest_size: 15
  scanner.getch # => "こ" # One 3-byte character.
  put_situation(scanner)
  # Situation:
  #   pos: 8
  #   charpos: 6
  #   rest: "んにちは"
  #   rest_size: 12
Target Substring
The target substring is the part of the stored string that extends from the current byte position to the end of the stored string; it is always either:
- The entire stored string (byte position is zero).
- A trailing substring of the stored string (byte position positive).
The target substring is returned by method rest, and its size is returned by method rest_size.
Examples:
  scanner = StringScanner.new('foobarbaz')
  put_situation(scanner)
  # Situation:
  #   pos: 0
  #   charpos: 0
  #   rest: "foobarbaz"
  #   rest_size: 9
  scanner.pos = 3
  put_situation(scanner)
  # Situation:
  #   pos: 3
  #   charpos: 3
  #   rest: "barbaz"
  #   rest_size: 6
  scanner.pos = 9
  put_situation(scanner)
  # Situation:
  #   pos: 9
  #   charpos: 9
  #   rest: ""
  #   rest_size: 0
Setting the Target Substring
The target substring is set whenever:
- The stored string is set (position reset to zero; target substring set to stored string).
- The byte position is set (target substring adjusted accordingly).
Querying the Target Substring
This table summarizes (details and examples at the links):
Method | Returns |
---|---|
rest | Target substring. |
rest_size | Size (bytes) of target substring. |
Searching the Target Substring
A search method examines the target substring, but does not advance the positions or (by implication) shorten the target substring.
This table summarizes (details and examples at the links):
Method | Returns | Sets Match Values? |
---|---|---|
check(pattern) | Matched leading substring or nil. | Yes. |
check_until(pattern) | Matched substring (anywhere) or nil. | Yes. |
exist?(pattern) | Position delta to end of matched substring (anywhere) or nil. | Yes. |
match?(pattern) | Size of matched leading substring or nil. | Yes. |
peek(size) | Leading substring of given length (bytes). | No. |
peek_byte | Integer leading byte or nil. | No. |
rest | Target substring (from byte position to end). | No. |
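As an illustration of the table above, this sketch calls several search methods in a row and then confirms that the position has not moved. It uses only Regexp patterns and the return values described on this page.

```ruby
require 'strscan'

scanner = StringScanner.new('foobarbaz')
scanner.pos = 3

scanner.check(/bar/)       # => "bar"    (match at the beginning)
scanner.check_until(/baz/) # => "barbaz" (from the position through the end of the match)
scanner.match?(/bar/)      # => 3        (size of the match)
scanner.exist?(/baz/)      # => 6        (distance to the end of the match)
scanner.peek(2)            # => "ba"

scanner.pos                # => 3        (unchanged by all of the above)
scanner.rest               # => "barbaz"
```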
Traversing the Target Substring
A traversal method examines the target substring, and, if successful:
- Advances the positions.
- Shortens the target substring.
This table summarizes (details and examples at links):
Method | Returns | Sets Match Values? |
---|---|---|
get_byte | Leading byte or nil. | No. |
getch | Leading character or nil. | No. |
scan(pattern) | Matched leading substring or nil. | Yes. |
scan_byte | Integer leading byte or nil. | No. |
scan_until(pattern) | Matched substring (anywhere) or nil. | Yes. |
skip(pattern) | Matched leading substring size or nil. | Yes. |
skip_until(pattern) | Position delta to end-of-matched-substring or nil. | Yes. |
unscan | self. | No. |
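By contrast with the search methods, each successful traversal moves the position. The sketch below walks one string with several of the methods from the table. It relies on unscan restoring the position held before the most recent successful match, which is an assumption not spelled out in this section.

```ruby
require 'strscan'

scanner = StringScanner.new('foobarbaz')

scanner.scan(/foo/)       # => "foo"
scanner.pos               # => 3
scanner.skip(/bar/)       # => 3 (size of the match)
scanner.pos               # => 6
scanner.unscan            # goes back to where the skip started
scanner.pos               # => 3
scanner.skip_until(/bar/) # => 3 (position delta)
scanner.pos               # => 6
scanner.scan_until(/z/)   # => "baz"
scanner.pos               # => 9
scanner.eos?              # => true
scanner.getch             # => nil (nothing left to traverse)
```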
Querying the Scanner
Each of these methods queries the scanner object without modifying it (details and examples at links):
Method | Returns |
---|---|
beginning_of_line? | true or false. |
charpos | Character position. |
eos? | true or false. |
fixed_anchor? | true or false. |
inspect | String representation of self. |
pos | Byte position. |
rest | Target substring. |
rest_size | Size of target substring. |
string | Stored string. |
Matching
StringScanner
implements pattern matching via Ruby class Regexp, and its matching behaviors are the same as Ruby’s except for the fixed-anchor property.
Matcher Methods
Each matcher method takes a single argument pattern
, and attempts to find a matching substring in the target substring.
Method | Pattern Type | Matches Target Substring | Success Return | May Update Positions? |
---|---|---|---|---|
check | Regexp or String. | At beginning. | Matched substring. | No. |
check_until | Regexp. | Anywhere. | Substring. | No. |
match? | Regexp or String. | At beginning. | Match size. | No. |
exist? | Regexp. | Anywhere. | Position delta. | No. |
scan | Regexp or String. | At beginning. | Matched substring. | Yes. |
scan_until | Regexp. | Anywhere. | Substring. | Yes. |
skip | Regexp or String. | At beginning. | Match size. | Yes. |
skip_until | Regexp. | Anywhere. | Position delta. | Yes. |
Which matcher you choose will depend on:
- Where you want to find a match:
  - Only at the beginning of the target substring: check, match?, scan, skip.
  - Anywhere in the target substring: check_until, exist?, scan_until, skip_until.
- Whether you want to:
  - Traverse, by advancing the positions: scan, scan_until, skip, skip_until.
  - Keep the positions unchanged: check, check_until, exist?, match?.
- What you want for the return value:
  - The matched substring: check, check_until, scan, scan_until.
  - The position delta: exist?, skip_until.
  - The match size: match?, skip.
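To make the choice among matchers concrete, here is a minimal tokenizer sketch. The method name tokenize, the token types, and the tiny grammar are all hypothetical and were invented for this illustration. It uses skip where the matched text is not needed (whitespace) and scan where it is (tokens), with both matching only at the beginning of the target substring.

```ruby
require 'strscan'

# Hypothetical tokenizer: splits input such as "x = 42" into typed tokens.
def tokenize(source)
  scanner = StringScanner.new(source)
  tokens = []
  until scanner.eos?
    if scanner.skip(/\s+/)
      # Whitespace: advance past it; the matched text is not needed.
      next
    elsif (text = scanner.scan(/\d+/))
      tokens << [:int, text]
    elsif (text = scanner.scan(/[a-z_]\w*/i))
      tokens << [:ident, text]
    elsif (text = scanner.scan(/[=+\-*\/]/))
      tokens << [:op, text]
    else
      # No matcher succeeded, so the position has not moved; report it.
      raise ArgumentError,
            "unexpected input at byte #{scanner.pos}: #{scanner.peek(10).inspect}"
    end
  end
  tokens
end

tokenize('x = 42 + y1')
# => [[:ident, "x"], [:op, "="], [:int, "42"], [:op, "+"], [:ident, "y1"]]
```

Because a failed scan or skip leaves the position unchanged, the branches can be tried in sequence without any explicit backtracking.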
Match Values
The match values in a StringScanner
object generally contain the results of the most recent attempted match.
Each match value may be thought of as:
- Clear: Initially, or after an unsuccessful match attempt: usually, false, nil, or {}.
- Set: After a successful match attempt: true, string, array, or hash.
Each of these methods clears match values:
Each of these methods attempts a match based on a pattern, and either sets match values (if successful) or clears them (if not);
Basic Match Values
Basic match values are those not related to captures.
Each of these methods returns a basic match value:
Method | Return After Match | Return After No Match |
---|---|---|
matched? | true. | false. |
matched_size | Size of matched substring. | nil. |
matched | Matched substring. | nil. |
pre_match | Substring preceding matched substring. | nil. |
post_match | Substring following matched substring. | nil. |
See examples below.
Captured Match Values
Captured match values are those related to captures.
Each of these methods returns a captured match value:
Method | Return After Match | Return After No Match |
---|---|---|
size | Count of captured substrings. | nil. |
[](n) | nth captured substring. | nil. |
captures | Array of all captured substrings. | nil. |
values_at(*n) | Array of specified captured substrings. | nil. |
named_captures | Hash of named captures. | {}. |
See examples below.
Match Values Examples
Successful basic match attempt (no captures):
  scanner = StringScanner.new('foobarbaz')
  scanner.exist?(/bar/)
  put_match_values(scanner)
  # Basic match values:
  #   matched?: true
  #   matched_size: 3
  #   pre_match: "foo"
  #   matched : "bar"
  #   post_match: "baz"
  # Captured match values:
  #   size: 1
  #   captures: []
  #   named_captures: {}
  #   values_at: ["bar", nil]
  #   []:
  #     [0]: "bar"
  #     [1]: nil
Failed basic match attempt (no captures):
  scanner = StringScanner.new('foobarbaz')
  scanner.exist?(/nope/)
  match_values_cleared?(scanner) # => true
Successful unnamed capture match attempt:
  scanner = StringScanner.new('foobarbazbatbam')
  scanner.exist?(/(foo)bar(baz)bat(bam)/)
  put_match_values(scanner)
  # Basic match values:
  #   matched?: true
  #   matched_size: 15
  #   pre_match: ""
  #   matched : "foobarbazbatbam"
  #   post_match: ""
  # Captured match values:
  #   size: 4
  #   captures: ["foo", "baz", "bam"]
  #   named_captures: {}
  #   values_at: ["foobarbazbatbam", "foo", "baz", "bam", nil]
  #   []:
  #     [0]: "foobarbazbatbam"
  #     [1]: "foo"
  #     [2]: "baz"
  #     [3]: "bam"
  #     [4]: nil
Successful named capture match attempt; same as unnamed above, except for named_captures
:
  scanner = StringScanner.new('foobarbazbatbam')
  scanner.exist?(/(?<x>foo)bar(?<y>baz)bat(?<z>bam)/)
  scanner.named_captures # => {"x"=>"foo", "y"=>"baz", "z"=>"bam"}
Failed unnamed capture match attempt:
  scanner = StringScanner.new('somestring')
  scanner.exist?(/(foo)bar(baz)bat(bam)/)
  match_values_cleared?(scanner) # => true
Failed named capture match attempt; same as unnamed above, except for named_captures
:
  scanner = StringScanner.new('somestring')
  scanner.exist?(/(?<x>foo)bar(?<y>baz)bat(?<z>bam)/)
  match_values_cleared?(scanner) # => false
  scanner.named_captures # => {"x"=>nil, "y"=>nil, "z"=>nil}
Fixed-Anchor Property
Pattern matching in StringScanner is the same as in Ruby, except for its fixed-anchor property, which determines the meaning of '\A':
- false (the default): '\A' matches at the current byte position.
    scanner = StringScanner.new('foobar')
    scanner.scan(/\A./) # => "f"
    scanner.scan(/\A./) # => "o"
    scanner.scan(/\A./) # => "o"
    scanner.scan(/\A./) # => "b"
- true: '\A' matches only at the beginning of the stored string; it never matches unless the byte position is zero:
    scanner = StringScanner.new('foobar', fixed_anchor: true)
    scanner.scan(/\A./) # => "f"
    scanner.scan(/\A./) # => nil
    scanner.reset
    scanner.scan(/\A./) # => "f"
The fixed-anchor property is set when the StringScanner
object is created, and may not be modified (see StringScanner.new
); method fixed_anchor?
returns the setting.
Public Class Methods
Returns a new StringScanner
object whose stored string is the given string
; sets the fixed-anchor property:
  scanner = StringScanner.new('foobarbaz')
  scanner.string # => "foobarbaz"
  scanner.fixed_anchor? # => false
  put_situation(scanner)
  # Situation:
  #   pos: 0
  #   charpos: 0
  #   rest: "foobarbaz"
  #   rest_size: 9
static VALUE strscan_initialize(int argc, VALUE *argv, VALUE self) { struct strscanner *p; VALUE str, options; p = check_strscan(self); rb_scan_args(argc, argv, "11", &str, &options); options = rb_check_hash_type(options); if (!NIL_P(options)) { VALUE fixed_anchor; ID keyword_ids[1]; keyword_ids[0] = rb_intern("fixed_anchor"); rb_get_kwargs(options, keyword_ids, 0, 1, &fixed_anchor); if (fixed_anchor == Qundef) { p->fixed_anchor_p = false; } else { p->fixed_anchor_p = RTEST(fixed_anchor); } } else { p->fixed_anchor_p = false; } StringValue(str); p->str = str; return self; }
Public Instance Methods
Returns a captured substring or nil
; see Captured Match Values.
When there are captures:
  scanner = StringScanner.new('Fri Dec 12 1975 14:39')
  scanner.scan(/(?<wday>\w+) (?<month>\w+) (?<day>\d+) /)
- specifier zero: returns the entire matched substring:
    scanner[0] # => "Fri Dec 12 "
    scanner.pre_match # => ""
    scanner.post_match # => "1975 14:39"
- specifier positive integer: returns the nth capture, or nil if out of range:
    scanner[1] # => "Fri"
    scanner[2] # => "Dec"
    scanner[3] # => "12"
    scanner[4] # => nil
- specifier negative integer: counts backward from the last subgroup:
    scanner[-1] # => "12"
    scanner[-4] # => "Fri Dec 12 "
    scanner[-5] # => nil
- specifier symbol or string: returns the named subgroup, or nil if no such:
    scanner[:wday] # => "Fri"
    scanner['wday'] # => "Fri"
    scanner[:month] # => "Dec"
    scanner[:day] # => "12"
    scanner[:nope] # => nil
When there are no captures, only [0]
returns non-nil
:
  scanner = StringScanner.new('foobarbaz')
  scanner.exist?(/bar/)
  scanner[0] # => "bar"
  scanner[1] # => nil
For a failed match, even [0]
returns nil
:
  scanner.scan(/nope/) # => nil
  scanner[0] # => nil
  scanner[1] # => nil
static VALUE strscan_aref(VALUE self, VALUE idx) { const char *name; struct strscanner *p; long i; GET_SCANNER(self, p); if (! MATCHED_P(p)) return Qnil; switch (TYPE(idx)) { case T_SYMBOL: idx = rb_sym2str(idx); /* fall through */ case T_STRING: if (!RTEST(p->regex)) return Qnil; RSTRING_GETMEM(idx, name, i); i = name_to_backref_number(&(p->regs), p->regex, name, name + i, rb_enc_get(idx)); break; default: i = NUM2LONG(idx); } if (i < 0) i += p->regs.num_regs; if (i < 0) return Qnil; if (i >= p->regs.num_regs) return Qnil; if (p->regs.beg[i] == -1) return Qnil; return extract_range(p, adjust_register_position(p, p->regs.beg[i]), adjust_register_position(p, p->regs.end[i])); }
Returns whether the position is at the beginning of a line; that is, at the beginning of the stored string or immediately after a newline:
  scanner = StringScanner.new(MULTILINE_TEXT)
  scanner.string # => "Go placidly amid the noise and haste,\nand remember what peace there may be in silence.\n"
  scanner.pos # => 0
  scanner.beginning_of_line? # => true
  scanner.scan_until(/,/) # => "Go placidly amid the noise and haste,"
  scanner.beginning_of_line? # => false
  scanner.scan(/\n/) # => "\n"
  scanner.beginning_of_line? # => true
  scanner.terminate
  scanner.beginning_of_line? # => true
  scanner.concat('x')
  scanner.terminate
  scanner.beginning_of_line? # => false
StringScanner#bol? is an alias for StringScanner#beginning_of_line?.
static VALUE strscan_bol_p(VALUE self) { struct strscanner *p; GET_SCANNER(self, p); if (CURPTR(p) > S_PEND(p)) return Qnil; if (p->curr == 0) return Qtrue; return (*(CURPTR(p) - 1) == '\n') ? Qtrue : Qfalse; }
Returns the array of captured match values at indexes (1..)
if the most recent match attempt succeeded, or nil
otherwise:
  scanner = StringScanner.new('Fri Dec 12 1975 14:39')
  scanner.captures # => nil
  scanner.exist?(/(?<wday>\w+) (?<month>\w+) (?<day>\d+) /)
  scanner.captures # => ["Fri", "Dec", "12"]
  scanner.values_at(*0..4) # => ["Fri Dec 12 ", "Fri", "Dec", "12", nil]
  scanner.exist?(/Fri/)
  scanner.captures # => []
  scanner.scan(/nope/)
  scanner.captures # => nil
static VALUE strscan_captures(VALUE self) { struct strscanner *p; int i, num_regs; VALUE new_ary; GET_SCANNER(self, p); if (! MATCHED_P(p)) return Qnil; num_regs = p->regs.num_regs; new_ary = rb_ary_new2(num_regs); for (i = 1; i < num_regs; i++) { VALUE str; if (p->regs.beg[i] == -1) str = Qnil; else str = extract_range(p, adjust_register_position(p, p->regs.beg[i]), adjust_register_position(p, p->regs.end[i])); rb_ary_push(new_ary, str); } return new_ary; }
call-seq: charpos -> character_position
Returns the character position (initially zero), which may be different from the byte position given by method pos
:
  scanner = StringScanner.new(HIRAGANA_TEXT)
  scanner.string # => "こんにちは"
  scanner.getch # => "こ" # 3-byte character.
  scanner.getch # => "ん" # 3-byte character.
  put_situation(scanner)
  # Situation:
  #   pos: 6
  #   charpos: 2
  #   rest: "にちは"
  #   rest_size: 9
static VALUE strscan_get_charpos(VALUE self) { struct strscanner *p; GET_SCANNER(self, p); return LONG2NUM(rb_enc_strlen(S_PBEG(p), CURPTR(p), rb_enc_get(p->str))); }
Attempts to match the given pattern
at the beginning of the target substring; does not modify the positions.
If the match succeeds:
- Returns the matched substring.
- Sets all match values.
  scanner = StringScanner.new('foobarbaz')
  scanner.pos = 3
  scanner.check('bar') # => "bar"
  put_match_values(scanner)
  # Basic match values:
  #   matched?: true
  #   matched_size: 3
  #   pre_match: "foo"
  #   matched : "bar"
  #   post_match: "baz"
  # Captured match values:
  #   size: 1
  #   captures: []
  #   named_captures: {}
  #   values_at: ["bar", nil]
  #   []:
  #     [0]: "bar"
  #     [1]: nil
  put_situation(scanner)
  # Situation:
  #   pos: 3
  #   charpos: 3
  #   rest: "barbaz"
  #   rest_size: 6
If the match fails:
- Returns nil.
- Clears all match values.
  scanner.check(/nope/) # => nil
  match_values_cleared?(scanner) # => true
static VALUE strscan_check(VALUE self, VALUE re) { return strscan_do_scan(self, re, 0, 1, 1); }
Attempts to match the given pattern
anywhere (at any position) in the target substring; does not modify the positions.
If the match succeeds:
- Sets all match values.
- Returns the substring that extends from the current position to the end of the matched substring.
  scanner = StringScanner.new('foobarbazbatbam')
  scanner.pos = 6
  scanner.check_until(/bat/) # => "bazbat"
  put_match_values(scanner)
  # Basic match values:
  #   matched?: true
  #   matched_size: 3
  #   pre_match: "foobarbaz"
  #   matched : "bat"
  #   post_match: "bam"
  # Captured match values:
  #   size: 1
  #   captures: []
  #   named_captures: {}
  #   values_at: ["bat", nil]
  #   []:
  #     [0]: "bat"
  #     [1]: nil
  put_situation(scanner)
  # Situation:
  #   pos: 6
  #   charpos: 6
  #   rest: "bazbatbam"
  #   rest_size: 9
If the match fails:
- Clears all match values.
- Returns nil.
  scanner.check_until(/nope/) # => nil
  match_values_cleared?(scanner) # => true
static VALUE strscan_check_until(VALUE self, VALUE re) { return strscan_do_scan(self, re, 0, 1, 0); }
- Appends the given more_string to the stored string.
- Returns self.
- Does not affect the positions or match values.
  scanner = StringScanner.new('foo')
  scanner.string # => "foo"
  scanner.terminate
  scanner.concat('barbaz') # => #<StringScanner 3/9 "foo" @ "barba...">
  scanner.string # => "foobarbaz"
  put_situation(scanner)
  # Situation:
  #   pos: 3
  #   charpos: 3
  #   rest: "barbaz"
  #   rest_size: 6
static VALUE strscan_concat(VALUE self, VALUE str) { struct strscanner *p; GET_SCANNER(self, p); StringValue(str); rb_str_append(p->str, str); return self; }
Returns whether the position is at the end of the stored string:
  scanner = StringScanner.new('foobarbaz')
  scanner.eos? # => false
  scanner.pos = 3
  scanner.eos? # => false
  scanner.terminate
  scanner.eos? # => true
static VALUE strscan_eos_p(VALUE self) { struct strscanner *p; GET_SCANNER(self, p); return EOS_P(p) ? Qtrue : Qfalse; }
Attempts to match the given pattern anywhere (at any position) in the target substring; does not modify the positions.
If the match succeeds:
- Returns a byte offset: the distance in bytes between the current position and the end of the matched substring.
- Sets all match values.
  scanner = StringScanner.new('foobarbazbatbam')
  scanner.pos = 6
  scanner.exist?(/bat/) # => 6
  put_match_values(scanner)
  # Basic match values:
  #   matched?: true
  #   matched_size: 3
  #   pre_match: "foobarbaz"
  #   matched : "bat"
  #   post_match: "bam"
  # Captured match values:
  #   size: 1
  #   captures: []
  #   named_captures: {}
  #   values_at: ["bat", nil]
  #   []:
  #     [0]: "bat"
  #     [1]: nil
  put_situation(scanner)
  # Situation:
  #   pos: 6
  #   charpos: 6
  #   rest: "bazbatbam"
  #   rest_size: 9
If the match fails:
- Returns nil.
- Clears all match values.
  scanner.exist?(/nope/) # => nil
  match_values_cleared?(scanner) # => true
static VALUE strscan_exist_p(VALUE self, VALUE re) { return strscan_do_scan(self, re, 0, 0, 0); }
Returns whether the fixed-anchor property is set.
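A brief illustration, assuming a Ruby version whose StringScanner.new accepts the fixed_anchor: keyword shown in the Fixed-Anchor Property section:

```ruby
require 'strscan'

StringScanner.new('foobar').fixed_anchor?                     # => false
StringScanner.new('foobar', fixed_anchor: true).fixed_anchor? # => true
```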
static VALUE strscan_fixed_anchor_p(VALUE self) { struct strscanner *p; p = check_strscan(self); return p->fixed_anchor_p ? Qtrue : Qfalse; }
call-seq: get_byte
-> byte_as_character or nil
Returns the next byte, if available:
- If the position is not at the end of the stored string:
  - Returns the next byte.
  - Increments the byte position.
  - Adjusts the character position.
    scanner = StringScanner.new(HIRAGANA_TEXT)
    # => #<StringScanner 0/15 @ "\xE3\x81\x93\xE3\x82...">
    scanner.string # => "こんにちは"
    [scanner.get_byte, scanner.pos, scanner.charpos] # => ["\xE3", 1, 1]
    [scanner.get_byte, scanner.pos, scanner.charpos] # => ["\x81", 2, 2]
    [scanner.get_byte, scanner.pos, scanner.charpos] # => ["\x93", 3, 1]
    [scanner.get_byte, scanner.pos, scanner.charpos] # => ["\xE3", 4, 2]
    [scanner.get_byte, scanner.pos, scanner.charpos] # => ["\x82", 5, 3]
    [scanner.get_byte, scanner.pos, scanner.charpos] # => ["\x93", 6, 2]
- Otherwise, returns nil, and does not change the positions.
    scanner.terminate
    [scanner.get_byte, scanner.pos, scanner.charpos] # => [nil, 15, 5]
static VALUE strscan_get_byte(VALUE self) { struct strscanner *p; GET_SCANNER(self, p); CLEAR_MATCH_STATUS(p); if (EOS_P(p)) return Qnil; p->prev = p->curr; p->curr++; MATCHED(p); adjust_registers_to_matched(p); return extract_range(p, adjust_register_position(p, p->regs.beg[0]), adjust_register_position(p, p->regs.end[0])); }
call-seq: getch -> character or nil
Returns the next (possibly multibyte) character, if available:
- If the position is at the beginning of a character:
  - Returns the character.
  - Increments the character position by 1.
  - Increments the byte position by the size (in bytes) of the character.
    scanner = StringScanner.new(HIRAGANA_TEXT)
    scanner.string # => "こんにちは"
    [scanner.getch, scanner.pos, scanner.charpos] # => ["こ", 3, 1]
    [scanner.getch, scanner.pos, scanner.charpos] # => ["ん", 6, 2]
    [scanner.getch, scanner.pos, scanner.charpos] # => ["に", 9, 3]
    [scanner.getch, scanner.pos, scanner.charpos] # => ["ち", 12, 4]
    [scanner.getch, scanner.pos, scanner.charpos] # => ["は", 15, 5]
    [scanner.getch, scanner.pos, scanner.charpos] # => [nil, 15, 5]
- If the position is within a multi-byte character (that is, not at its beginning), behaves like get_byte (returns a 1-byte character):
    scanner.pos = 1
    [scanner.getch, scanner.pos, scanner.charpos] # => ["\x81", 2, 2]
    [scanner.getch, scanner.pos, scanner.charpos] # => ["\x93", 3, 1]
    [scanner.getch, scanner.pos, scanner.charpos] # => ["ん", 6, 2]
- If the position is at the end of the stored string, returns nil and does not modify the positions:
    scanner.terminate
    [scanner.getch, scanner.pos, scanner.charpos] # => [nil, 15, 5]
static VALUE strscan_getch(VALUE self) { struct strscanner *p; long len; GET_SCANNER(self, p); CLEAR_MATCH_STATUS(p); if (EOS_P(p)) return Qnil; len = rb_enc_mbclen(CURPTR(p), S_PEND(p), rb_enc_get(p->str)); len = minl(len, S_RESTLEN(p)); p->prev = p->curr; p->curr += len; MATCHED(p); adjust_registers_to_matched(p); return extract_range(p, adjust_register_position(p, p->regs.beg[0]), adjust_register_position(p, p->regs.end[0])); }
Returns a string representation of self
that may show:
1. The current position.
2. The size (in bytes) of the stored string.
3. The substring preceding the current position.
4. The substring following the current position (which is also the target substring).
  scanner = StringScanner.new("Fri Dec 12 1975 14:39")
  scanner.pos = 11
  scanner.inspect # => "#<StringScanner 11/21 \"...c 12 \" @ \"1975 ...\">"
If at beginning-of-string, item 3 above (preceding substring) is omitted:
  scanner.reset
  scanner.inspect # => "#<StringScanner 0/21 @ \"Fri D...\">"
If at end-of-string, all items above are omitted:
  scanner.terminate
  scanner.inspect # => "#<StringScanner fin>"
static VALUE strscan_inspect(VALUE self) { struct strscanner *p; VALUE a, b; p = check_strscan(self); if (NIL_P(p->str)) { a = rb_sprintf("#<%"PRIsVALUE" (uninitialized)>", rb_obj_class(self)); return a; } if (EOS_P(p)) { a = rb_sprintf("#<%"PRIsVALUE" fin>", rb_obj_class(self)); return a; } if (p->curr == 0) { b = inspect2(p); a = rb_sprintf("#<%"PRIsVALUE" %ld/%ld @ %"PRIsVALUE">", rb_obj_class(self), p->curr, S_LEN(p), b); return a; } a = inspect1(p); b = inspect2(p); a = rb_sprintf("#<%"PRIsVALUE" %ld/%ld %"PRIsVALUE" @ %"PRIsVALUE">", rb_obj_class(self), p->curr, S_LEN(p), a, b); return a; }
Attempts to match the given pattern
at the beginning of the target substring; does not modify the positions.
If the match succeeds:
- Sets match values.
- Returns the size in bytes of the matched substring.
  scanner = StringScanner.new('foobarbaz')
  scanner.pos = 3
  scanner.match?(/bar/) # => 3
  put_match_values(scanner)
  # Basic match values:
  #   matched?: true
  #   matched_size: 3
  #   pre_match: "foo"
  #   matched : "bar"
  #   post_match: "baz"
  # Captured match values:
  #   size: 1
  #   captures: []
  #   named_captures: {}
  #   values_at: ["bar", nil]
  #   []:
  #     [0]: "bar"
  #     [1]: nil
  put_situation(scanner)
  # Situation:
  #   pos: 3
  #   charpos: 3
  #   rest: "barbaz"
  #   rest_size: 6
If the match fails:
- Clears match values.
- Returns nil.
- Does not increment positions.
  scanner.match?(/nope/) # => nil
  match_values_cleared?(scanner) # => true
static VALUE strscan_match_p(VALUE self, VALUE re) { return strscan_do_scan(self, re, 0, 0, 1); }
Returns the matched substring from the most recent match attempt if it was successful, or nil otherwise; see Basic Match Values:
  scanner = StringScanner.new('foobarbaz')
  scanner.matched # => nil
  scanner.pos = 3
  scanner.match?(/bar/) # => 3
  scanner.matched # => "bar"
  scanner.match?(/nope/) # => nil
  scanner.matched # => nil
static VALUE strscan_matched(VALUE self) { struct strscanner *p; GET_SCANNER(self, p); if (! MATCHED_P(p)) return Qnil; return extract_range(p, adjust_register_position(p, p->regs.beg[0]), adjust_register_position(p, p->regs.end[0])); }
Returns true if the most recent match attempt was successful, false otherwise; see Basic Match Values:
  scanner = StringScanner.new('foobarbaz')
  scanner.matched? # => false
  scanner.pos = 3
  scanner.exist?(/baz/) # => 6
  scanner.matched? # => true
  scanner.exist?(/nope/) # => nil
  scanner.matched? # => false
static VALUE strscan_matched_p(VALUE self) { struct strscanner *p; GET_SCANNER(self, p); return MATCHED_P(p) ? Qtrue : Qfalse; }
Returns the size (in bytes) of the matched substring from the most recent match attempt if it was successful, or nil otherwise; see Basic Match Values:
  scanner = StringScanner.new('foobarbaz')
  scanner.matched_size # => nil
  scanner.pos = 3
  scanner.exist?(/baz/) # => 6
  scanner.matched_size # => 3
  scanner.exist?(/nope/) # => nil
  scanner.matched_size # => nil
static VALUE strscan_matched_size(VALUE self) { struct strscanner *p; GET_SCANNER(self, p); if (! MATCHED_P(p)) return Qnil; return LONG2NUM(p->regs.end[0] - p->regs.beg[0]); }
Returns a hash of the named captures from the most recent match attempt, with capture names as keys and captured substrings as values; the values are nil if the match failed, and the hash is empty if the pattern has no named captures or no match has been attempted; see Captured Match Values:
  scanner = StringScanner.new('Fri Dec 12 1975 14:39')
  scanner.named_captures # => {}
  pattern = /(?<wday>\w+) (?<month>\w+) (?<day>\d+) /
  scanner.match?(pattern)
  scanner.named_captures # => {"wday"=>"Fri", "month"=>"Dec", "day"=>"12"}
  scanner.string = 'nope'
  scanner.match?(pattern)
  scanner.named_captures # => {"wday"=>nil, "month"=>nil, "day"=>nil}
  scanner.match?(/nosuch/)
  scanner.named_captures # => {}
static VALUE strscan_named_captures(VALUE self) { struct strscanner *p; named_captures_data data; GET_SCANNER(self, p); data.self = self; data.captures = rb_hash_new(); if (!RB_NIL_P(p->regex)) { onig_foreach_name(RREGEXP_PTR(p->regex), named_captures_iter, &data); } return data.captures; }
Returns the substring string[pos, length]
; does not update match values or positions:
  scanner = StringScanner.new('foobarbaz')
  scanner.pos = 3
  scanner.peek(3) # => "bar"
  scanner.terminate
  scanner.peek(3) # => ""
static VALUE strscan_peek(VALUE self, VALUE vlen) { struct strscanner *p; long len; GET_SCANNER(self, p); len = NUM2LONG(vlen); if (EOS_P(p)) return str_new(p, "", 0); len = minl(len, S_RESTLEN(p)); return extract_beg_len(p, p->curr, len); }
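The source above clips the requested length to the size of the target substring, and the length is counted in bytes rather than characters. A small sketch of both consequences, assuming a UTF-8 encoded string literal:

```ruby
require 'strscan'

scanner = StringScanner.new('foobarbaz')
scanner.pos = 6
scanner.peek(2)  # => "ba"
scanner.peek(10) # => "baz" (clipped to what remains)
scanner.pos      # => 6     (peek never moves the position)

multibyte = StringScanner.new('こんにちは')
multibyte.peek(3)       # => "こ"  (one whole 3-byte character)
multibyte.peek(1).bytes # => [227] (only the first byte of that character)
```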
Peeks at the current byte and returns it as an integer, or returns nil if the position is at the end of the stored string.
  s = StringScanner.new('ab')
  s.peek_byte # => 97
static VALUE strscan_peek_byte(VALUE self) { struct strscanner *p; GET_SCANNER(self, p); if (EOS_P(p)) return Qnil; return INT2FIX((unsigned char)*CURPTR(p)); }
call-seq: pos -> byte_position
Returns the integer byte position, which may be different from the character position:
  scanner = StringScanner.new(HIRAGANA_TEXT)
  scanner.string # => "こんにちは"
  scanner.pos # => 0
  scanner.getch # => "こ" # 3-byte character.
  scanner.charpos # => 1
  scanner.pos # => 3
static VALUE strscan_get_pos(VALUE self) { struct strscanner *p; GET_SCANNER(self, p); return INT2FIX(p->curr); }
call-seq: pos = n -> n pointer = n -> n
Sets the byte position and the character position; returns n
.
Does not affect match values.
For non-negative n
, sets the position to n
:
  scanner = StringScanner.new(HIRAGANA_TEXT)
  scanner.string  # => "こんにちは"
  scanner.pos = 3 # => 3
  scanner.rest    # => "んにちは"
  scanner.charpos # => 1
For negative n
, counts from the end of the stored string:
  scanner.pos = -9 # => -9
  scanner.pos      # => 6
  scanner.rest     # => "にちは"
  scanner.charpos  # => 2
static VALUE strscan_set_pos(VALUE self, VALUE v) { struct strscanner *p; long i; GET_SCANNER(self, p); i = NUM2INT(v); if (i < 0) i += S_LEN(p); if (i < 0) rb_raise(rb_eRangeError, "index out of range"); if (i > S_LEN(p)) rb_raise(rb_eRangeError, "index out of range"); p->curr = i; return LONG2NUM(i); }
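The range checks in the C source above suggest three cases worth seeing side by side: a negative position, a position exactly at the end of the string, and positions that are out of range. This is a minimal sketch with illustrative variable names.

```ruby
require 'strscan'

scanner = StringScanner.new('foobarbaz')   # 9 bytes.

scanner.pos = -3                 # Counts back from the end: 9 - 3.
negative_pos = scanner.pos       # => 6
negative_rest = scanner.rest     # => "baz"

scanner.pos = 9                  # The end of the string is a valid position.
at_end = scanner.eos?            # => true

# Positions past the end, or before the beginning, raise RangeError.
out_of_range = [10, -10].map do |n|
  begin
    scanner.pos = n
    :ok
  rescue RangeError
    :range_error
  end
end
out_of_range                     # => [:range_error, :range_error]
final_pos = scanner.pos          # => 9 (a failed assignment leaves it unchanged)
```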
Returns the substring that follows the matched substring from the most recent match attempt if it was successful, or nil
otherwise; see Basic Match Values:
  scanner = StringScanner.new('foobarbaz')
  scanner.post_match     # => nil
  scanner.pos = 3
  scanner.match?(/bar/)  # => 3
  scanner.post_match     # => "baz"
  scanner.match?(/nope/) # => nil
  scanner.post_match     # => nil
static VALUE strscan_post_match(VALUE self) { struct strscanner *p; GET_SCANNER(self, p); if (! MATCHED_P(p)) return Qnil; return extract_range(p, adjust_register_position(p, p->regs.end[0]), S_LEN(p)); }
Returns the substring that precedes the matched substring from the most recent match attempt if it was successful, or nil
otherwise; see Basic Match Values:
  scanner = StringScanner.new('foobarbaz')
  scanner.pre_match      # => nil
  scanner.pos = 3
  scanner.exist?(/baz/)  # => 6
  scanner.pre_match      # => "foobar" # Substring of entire string, not just target string.
  scanner.exist?(/nope/) # => nil
  scanner.pre_match      # => nil
static VALUE strscan_pre_match(VALUE self) { struct strscanner *p; GET_SCANNER(self, p); if (! MATCHED_P(p)) return Qnil; return extract_range(p, 0, adjust_register_position(p, p->regs.beg[0])); }
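Because pre_match and post_match are taken from the entire stored string rather than from the target substring, the three basic match values should reassemble the stored string after any successful match. A small sketch of that relationship, with illustrative variable names:

```ruby
require 'strscan'

scanner = StringScanner.new('foobarbaz')
scanner.pos = 1
returned = scanner.scan_until(/bar/)   # => "oobar"

parts = [scanner.pre_match, scanner.matched, scanner.post_match]
parts                                  # => ["foo", "bar", "baz"]
whole = parts.join == scanner.string   # => true
```

Note that pre_match begins at byte 0 even though the scan started at byte 1.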
Sets both byte position and character position to zero, and clears match values; returns self
:
  scanner = StringScanner.new('foobarbaz')
  scanner.exist?(/bar/)          # => 6
  scanner.reset                  # => #<StringScanner 0/9 @ "fooba...">
  put_situation(scanner)
  # Situation:
  #   pos:       0
  #   charpos:   0
  #   rest:      "foobarbaz"
  #   rest_size: 9
  # => nil
  match_values_cleared?(scanner) # => true
static VALUE strscan_reset(VALUE self) { struct strscanner *p; GET_SCANNER(self, p); p->curr = 0; CLEAR_MATCH_STATUS(p); return self; }
Returns the ‘rest’ of the stored string (all after the current position), which is the target substring:
  scanner = StringScanner.new('foobarbaz')
  scanner.rest # => "foobarbaz"
  scanner.pos = 3
  scanner.rest # => "barbaz"
  scanner.terminate
  scanner.rest # => ""
static VALUE strscan_rest(VALUE self) { struct strscanner *p; GET_SCANNER(self, p); if (EOS_P(p)) { return str_new(p, "", 0); } return extract_range(p, p->curr, S_LEN(p)); }
Returns the size (in bytes) of the rest
of the stored string:
  scanner = StringScanner.new('foobarbaz')
  scanner.rest      # => "foobarbaz"
  scanner.rest_size # => 9
  scanner.pos = 3
  scanner.rest      # => "barbaz"
  scanner.rest_size # => 6
  scanner.terminate
  scanner.rest      # => ""
  scanner.rest_size # => 0
static VALUE strscan_rest_size(VALUE self) { struct strscanner *p; long i; GET_SCANNER(self, p); if (EOS_P(p)) { return INT2FIX(0); } i = S_RESTLEN(p); return INT2FIX(i); }
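Since rest_size is measured in bytes, it can differ from the character count of rest when the stored string contains multibyte characters. A brief sketch, using the same hiragana text as the other examples:

```ruby
require 'strscan'

scanner = StringScanner.new('こんにちは')
scanner.pos = 6                  # Past two 3-byte characters.
rest = scanner.rest              # => "にちは"
byte_count = scanner.rest_size   # => 9
char_count = rest.size           # => 3
same_as_bytesize = (byte_count == rest.bytesize)   # => true
```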
call-seq: scan(pattern) -> substring or nil
Attempts to match the given pattern
at the beginning of the target substring.
If the match succeeds:
-
Returns the matched substring.
-
Increments the byte position by
substring.bytesize
, and may increment the character position. -
Sets match values.
  scanner = StringScanner.new(HIRAGANA_TEXT)
  scanner.string     # => "こんにちは"
  scanner.pos = 6
  scanner.scan(/に/) # => "に"
  put_match_values(scanner)
  # Basic match values:
  #   matched?:       true
  #   matched_size:   3
  #   pre_match:      "こん"
  #   matched  :      "に"
  #   post_match:     "ちは"
  # Captured match values:
  #   size:           1
  #   captures:       []
  #   named_captures: {}
  #   values_at:      ["に", nil]
  #   []:
  #     [0]:          "に"
  #     [1]:          nil
  put_situation(scanner)
  # Situation:
  #   pos:       9
  #   charpos:   3
  #   rest:      "ちは"
  #   rest_size: 6
If the match fails:
-
Returns
nil
. -
Does not increment byte and character positions.
-
Clears match values.
  scanner.scan(/nope/)           # => nil
  match_values_cleared?(scanner) # => true
static VALUE strscan_scan(VALUE self, VALUE re) { return strscan_do_scan(self, re, 1, 1, 1); }
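A common use of scan is a small tokenizer that tries one pattern after another at the current position. The following is a sketch only: the tokenize method, its token names, and the tiny arithmetic grammar are hypothetical and not part of StringScanner.

```ruby
require 'strscan'

# Hypothetical helper: splits simple arithmetic text into [type, text] pairs.
def tokenize(source)
  scanner = StringScanner.new(source)
  tokens = []
  until scanner.eos?
    if scanner.skip(/\s+/)
      next                                   # Whitespace carries no token.
    elsif (text = scanner.scan(/\d+/))
      tokens << [:int, text]
    elsif (text = scanner.scan(/[-+*\/]/))
      tokens << [:op, text]
    else
      raise ArgumentError, "unexpected input at byte #{scanner.pos}"
    end
  end
  tokens
end

tokens = tokenize('3 + 42 * 7')
tokens # => [[:int, "3"], [:op, "+"], [:int, "42"], [:op, "*"], [:int, "7"]]
```

Each failed scan returns nil and leaves the position alone, which is what lets the next branch try a different pattern at the same place.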
Scans one byte and returns it as an integer, or returns nil at end-of-string; on success, advances the byte position by one. This method is not multibyte character sensitive. See also: getch.
static VALUE strscan_scan_byte(VALUE self) { struct strscanner *p; VALUE byte; GET_SCANNER(self, p); CLEAR_MATCH_STATUS(p); if (EOS_P(p)) return Qnil; byte = INT2FIX((unsigned char)*CURPTR(p)); p->prev = p->curr; p->curr++; MATCHED(p); adjust_registers_to_matched(p); return byte; }
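To illustrate byte-wise scanning, the sketch below reads a single 3-byte character one byte at a time. The next_byte helper is hypothetical; it assumes that some installed versions of strscan may not provide scan_byte and falls back to get_byte in that case.

```ruby
require 'strscan'

# Hypothetical helper: returns the next byte as an Integer, or nil at the end.
# Prefers scan_byte and falls back to get_byte where scan_byte is unavailable.
def next_byte(scanner)
  if scanner.respond_to?(:scan_byte)
    scanner.scan_byte
  else
    scanner.get_byte&.getbyte(0)
  end
end

scanner = StringScanner.new('こ')   # One 3-byte UTF-8 character.
bytes = []
while (byte = next_byte(scanner))
  bytes << byte
end
bytes                 # => [227, 129, 147]
final_pos = scanner.pos   # => 3
```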
call-seq: scan_until(pattern) -> substring or nil
Attempts to match the given pattern
anywhere (at any position) in the target substring.
If the match attempt succeeds:
-
Sets match values.
-
Sets the byte position to the end of the matched substring; may adjust the character position.
-
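Returns the substring that extends from the prior position through the end of the matched substring (which may be longer than the matched substring itself).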
  scanner = StringScanner.new(HIRAGANA_TEXT)
  scanner.string           # => "こんにちは"
  scanner.pos = 6
  scanner.scan_until(/ち/) # => "にち"
  put_match_values(scanner)
  # Basic match values:
  #   matched?:       true
  #   matched_size:   3
  #   pre_match:      "こんに"
  #   matched  :      "ち"
  #   post_match:     "は"
  # Captured match values:
  #   size:           1
  #   captures:       []
  #   named_captures: {}
  #   values_at:      ["ち", nil]
  #   []:
  #     [0]:          "ち"
  #     [1]:          nil
  put_situation(scanner)
  # Situation:
  #   pos:       12
  #   charpos:   4
  #   rest:      "は"
  #   rest_size: 3
If the match attempt fails:
-
Clears match values.
-
Returns
nil
. -
Does not update positions.
  scanner.scan_until(/nope/)     # => nil
  match_values_cleared?(scanner) # => true
static VALUE strscan_scan_until(VALUE self, VALUE re) { return strscan_do_scan(self, re, 1, 1, 0); }
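Because scan_until returns everything from the prior position through the end of the match, the delimiter itself is part of the returned text. The sketch below splits a comma-separated string and uses matched_size to drop the delimiter bytes; the data and variable names are illustrative.

```ruby
require 'strscan'

scanner = StringScanner.new('alpha,beta,gamma')
fields = []
while (chunk = scanner.scan_until(/,/))
  # chunk is e.g. "alpha,"; remove the bytes that the pattern matched.
  fields << chunk.byteslice(0, chunk.bytesize - scanner.matched_size)
end
fields << scanner.rest           # Whatever follows the last delimiter.
fields                           # => ["alpha", "beta", "gamma"]
```

The final failed scan_until leaves the position unchanged, so rest still holds the last field.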
Returns the count of captures if the most recent match attempt succeeded, nil
otherwise; see Captures Match Values:
  scanner = StringScanner.new('Fri Dec 12 1975 14:39')
  scanner.size                        # => nil
  pattern = /(?<wday>\w+) (?<month>\w+) (?<day>\d+) /
  scanner.match?(pattern)
  scanner.values_at(*0..scanner.size) # => ["Fri Dec 12 ", "Fri", "Dec", "12", nil]
  scanner.size                        # => 4
  scanner.match?(/nope/)              # => nil
  scanner.size                        # => nil
static VALUE strscan_size(VALUE self) { struct strscanner *p; GET_SCANNER(self, p); if (! MATCHED_P(p)) return Qnil; return INT2FIX(p->regs.num_regs); }
call-seq: skip(pattern) -> match_size or nil
Attempts to match the given pattern
at the beginning of the target substring.
If the match succeeds:
-
Increments the byte position by substring.bytesize, and may increment the character position.
-
Sets match values.
-
Returns the size (bytes) of the matched substring.
  scanner = StringScanner.new(HIRAGANA_TEXT)
  scanner.string                 # => "こんにちは"
  scanner.pos = 6
  scanner.skip(/に/)             # => 3
  put_match_values(scanner)
  # Basic match values:
  #   matched?:       true
  #   matched_size:   3
  #   pre_match:      "こん"
  #   matched  :      "に"
  #   post_match:     "ちは"
  # Captured match values:
  #   size:           1
  #   captures:       []
  #   named_captures: {}
  #   values_at:      ["に", nil]
  #   []:
  #     [0]:          "に"
  #     [1]:          nil
  put_situation(scanner)
  # Situation:
  #   pos:       9
  #   charpos:   3
  #   rest:      "ちは"
  #   rest_size: 6
  scanner.skip(/nope/)           # => nil
  match_values_cleared?(scanner) # => true
static VALUE strscan_skip(VALUE self, VALUE re) { return strscan_do_scan(self, re, 1, 0, 1); }
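Two details of the return value of skip are easy to overlook: it counts bytes rather than characters, and a pattern that matches the empty string succeeds with 0, which is truthy in Ruby. A minimal sketch with illustrative variable names:

```ruby
require 'strscan'

scanner = StringScanner.new('こんにちは')
skipped = scanner.skip(/こん/)    # => 6 (bytes, not characters)
chars = scanner.charpos          # => 2

# A pattern that can match nothing still succeeds, returning 0.
empty = scanner.skip(/\s*/)      # => 0
# A pattern that requires at least one character fails, returning nil.
failed = scanner.skip(/\s+/)     # => nil
```

In a conditional, it may therefore be worth comparing the result against nil or zero explicitly when the pattern can match the empty string.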
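call-seq: skip_until(pattern) -> matched_substring_size or nil
Attempts to match the given pattern
anywhere (at any position) in the target substring.
If the match attempt succeeds:
-
Sets match values.
-
Sets the byte position to the end of the matched substring; may adjust the character position.
-
Returns the number of bytes advanced, counted from the prior position to the end of the matched substring; this can exceed matched_size.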
  scanner = StringScanner.new(HIRAGANA_TEXT)
  scanner.string           # => "こんにちは"
  scanner.pos = 6
  scanner.skip_until(/ち/) # => 6
  put_match_values(scanner)
  # Basic match values:
  #   matched?:       true
  #   matched_size:   3
  #   pre_match:      "こんに"
  #   matched  :      "ち"
  #   post_match:     "は"
  # Captured match values:
  #   size:           1
  #   captures:       []
  #   named_captures: {}
  #   values_at:      ["ち", nil]
  #   []:
  #     [0]:          "ち"
  #     [1]:          nil
  put_situation(scanner)
  # Situation:
  #   pos:       12
  #   charpos:   4
  #   rest:      "は"
  #   rest_size: 3
If the match attempt fails:
-
Clears match values.
-
Returns
nil
.
  scanner.skip_until(/nope/)     # => nil
  match_values_cleared?(scanner) # => true
static VALUE strscan_skip_until(VALUE self, VALUE re) { return strscan_do_scan(self, re, 1, 0, 0); }
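The value returned by skip_until and the value of matched_size can differ, because the former also counts the bytes skipped before the match. A short sketch with ASCII text, where the arithmetic is easy to check by eye:

```ruby
require 'strscan'

scanner = StringScanner.new('foobarbaz')
advanced = scanner.skip_until(/bar/)   # => 6 ("foo" skipped, then "bar" matched)
matched_bytes = scanner.matched_size   # => 3
new_pos = scanner.pos                  # => 6
```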
Returns the stored string:
  scanner = StringScanner.new('foobar')
  scanner.string # => "foobar"
  scanner.concat('baz')
  scanner.string # => "foobarbaz"
static VALUE strscan_get_string(VALUE self) { struct strscanner *p; GET_SCANNER(self, p); return p->str; }
Replaces the stored string with the given other_string
:
-
Sets both positions to zero.
-
Clears match values.
-
Returns
other_string
.
  scanner = StringScanner.new('foobar')
  scanner.scan(/foo/)
  put_situation(scanner)
  # Situation:
  #   pos:       3
  #   charpos:   3
  #   rest:      "bar"
  #   rest_size: 3
  match_values_cleared?(scanner) # => false
  scanner.string = 'baz'         # => "baz"
  put_situation(scanner)
  # Situation:
  #   pos:       0
  #   charpos:   0
  #   rest:      "baz"
  #   rest_size: 3
  match_values_cleared?(scanner) # => true
static VALUE strscan_set_string(VALUE self, VALUE str) { struct strscanner *p = check_strscan(self); StringValue(str); p->str = str; p->curr = 0; CLEAR_MATCH_STATUS(p); return str; }
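Since string= resets the position and clears match values, one scanner can be reused across several inputs. The following sketch counts runs of digits per line; the sample lines and variable names are made up for illustration.

```ruby
require 'strscan'

scanner = StringScanner.new('')
lines = ['a1 b22 c333', 'no digits', '7']

digit_runs = lines.map do |line|
  scanner.string = line          # Position returns to 0; match values are cleared.
  count = 0
  count += 1 while scanner.skip_until(/\d+/)
  count
end
digit_runs                       # => [3, 0, 1]
```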
call-seq: terminate -> self
Sets the scanner to end-of-string; returns self
:
-
Sets both positions to the end of the stored string.
-
Clears match values.
  scanner = StringScanner.new(HIRAGANA_TEXT)
  scanner.string                 # => "こんにちは"
  scanner.scan_until(/に/)
  put_situation(scanner)
  # Situation:
  #   pos:       9
  #   charpos:   3
  #   rest:      "ちは"
  #   rest_size: 6
  match_values_cleared?(scanner) # => false
  scanner.terminate              # => #<StringScanner fin>
  put_situation(scanner)
  # Situation:
  #   pos:       15
  #   charpos:   5
  #   rest:      ""
  #   rest_size: 0
  match_values_cleared?(scanner) # => true
static VALUE strscan_terminate(VALUE self) { struct strscanner *p; GET_SCANNER(self, p); p->curr = S_LEN(p); CLEAR_MATCH_STATUS(p); return self; }
Sets the position to its value before the most recent successful match attempt, clears match values, and returns self:
  scanner = StringScanner.new('foobarbaz')
  scanner.scan(/foo/)
  put_situation(scanner)
  # Situation:
  #   pos:       3
  #   charpos:   3
  #   rest:      "barbaz"
  #   rest_size: 6
  scanner.unscan # => #<StringScanner 0/9 @ "fooba...">
  put_situation(scanner)
  # Situation:
  #   pos:       0
  #   charpos:   0
  #   rest:      "foobarbaz"
  #   rest_size: 9
Raises an exception if match values are clear:
  scanner.scan(/nope/)           # => nil
  match_values_cleared?(scanner) # => true
  scanner.unscan                 # Raises StringScanner::Error.
static VALUE strscan_unscan(VALUE self) { struct strscanner *p; GET_SCANNER(self, p); if (! MATCHED_P(p)) rb_raise(ScanError, "unscan failed: previous match record not exist"); p->curr = p->prev; CLEAR_MATCH_STATUS(p); return self; }
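The C source above restores only one saved position and then clears the match status, which suggests that unscan undoes a single step and cannot be called twice in a row. A minimal sketch of that behavior, with illustrative variable names:

```ruby
require 'strscan'

scanner = StringScanner.new('foobarbaz')
scanner.scan(/foo/)              # pos: 0 -> 3
scanner.scan(/bar/)              # pos: 3 -> 6
scanner.unscan                   # Undoes only the most recent match.
pos_after_unscan = scanner.pos   # => 3
still_matched = scanner.matched? # => false

# With match values cleared, a second unscan raises.
second_unscan =
  begin
    scanner.unscan
    :ok
  rescue StringScanner::Error
    :error
  end
second_unscan                    # => :error
```

Code that needs to back up further would have to save pos itself and assign it back later.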
Returns an array of captured substrings, or nil
if the most recent match attempt failed or no match has been attempted.
For each specifier
, the returned substring is [specifier]
; see []
.
  scanner = StringScanner.new('Fri Dec 12 1975 14:39')
  pattern = /(?<wday>\w+) (?<month>\w+) (?<day>\d+) /
  scanner.match?(pattern)
  scanner.values_at(*0..3)               # => ["Fri Dec 12 ", "Fri", "Dec", "12"]
  scanner.values_at(*%i[wday month day]) # => ["Fri", "Dec", "12"]
static VALUE strscan_values_at(int argc, VALUE *argv, VALUE self) { struct strscanner *p; long i; VALUE new_ary; GET_SCANNER(self, p); if (! MATCHED_P(p)) return Qnil; new_ary = rb_ary_new2(argc); for (i = 0; i<argc; i++) { rb_ary_push(new_ary, strscan_aref(self, argv[i])); } return new_ary; }
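Since each specifier is passed through [], integer indexes (including negative ones), symbols, and strings can be mixed in a single call. The sketch below also shows the nil result before any match; the variable names are illustrative.

```ruby
require 'strscan'

scanner = StringScanner.new('Fri Dec 12 1975 14:39')
before = scanner.values_at(0)    # => nil (no match attempted yet)

scanner.match?(/(?<wday>\w+) (?<month>\w+) (?<day>\d+) /)
mixed = scanner.values_at(0, :month, 'day', -1)
mixed                            # => ["Fri Dec 12 ", "Dec", "12", "12"]
```

Here -1 counts back from the last capture group, so it selects the same substring as 'day'.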