class StringScanner

Class StringScanner supports processing a stored string as a stream; this code creates a new StringScanner object with string 'foobarbaz':

require 'strscan'
scanner = StringScanner.new('foobarbaz')

About the Examples

All examples here assume that StringScanner has been required:

require 'strscan'

Some examples here assume that these constants are defined:

MULTILINE_TEXT = <<~EOT
Go placidly amid the noise and haste,
and remember what peace there may be in silence.
EOT

HIRAGANA_TEXT = 'こんにちは'

ENGLISH_TEXT = 'Hello'

Some examples here assume that certain helper methods are defined:

  • put_situation(scanner): Displays the values of the scanner’s methods pos, charpos, rest, and rest_size.

  • put_match_values(scanner): Displays the scanner’s match values.

  • match_values_cleared?(scanner): Returns whether the scanner’s match values are cleared.

See examples [here].

The StringScanner Object

This code creates a StringScanner object (we’ll call it simply a scanner), and shows some of its basic properties:

scanner = StringScanner.new('foobarbaz')
scanner.string # => "foobarbaz"
put_situation(scanner)
# Situation:
#   pos:       0
#   charpos:   0
#   rest:      "foobarbaz"
#   rest_size: 9

The scanner has:

  • A stored string, which is:

    More at Stored String below.

  • A position; a zero-based index into the bytes of the stored string (not into its characters):

    More at Byte Position below.

  • A target substring, which is a trailing substring of the stored string; it extends from the current position to the end of the stored string:

    • Initially set by StringScanner.new(string) to the given string ('foobarbaz' in the example above).

    • Returned by method rest.

    • Modified by any modification to either the stored string or the position.

    Most importantly: the searching and traversing methods operate on the target substring, which may be (and often is) less than the entire stored string.

    More at Target Substring below.

Stored String

The stored string is the string stored in the StringScanner object.

Each of these methods sets, modifies, or returns the stored string:

Method Effect
::new(string) Creates a new scanner for the given string.
string=(new_string) Replaces the existing stored string.
concat(more_string) Appends a string to the existing stored string.
string Returns the stored string.

Positions

A StringScanner object maintains a zero-based byte position and a zero-based character position.

Each of these methods explicitly sets positions:

Method Effect
reset Sets both positions to zero (begining of stored string).
terminate Sets both positions to the end of the stored string.
pos=(new_byte_position) Sets byte position; adjusts character position.

Byte Position (Position)

The byte position (or simply position) is a zero-based index into the bytes in the scanner’s stored string; for a new StringScanner object, the byte position is zero.

When the byte position is:

  • Zero (at the beginning), the target substring is the entire stored string.

  • Equal to the size of the stored string (at the end), the target substring is the empty string ''.

To get or set the byte position:

Many methods use the byte position as the basis for finding matches; many others set, increment, or decrement the byte position:

scanner = StringScanner.new('foobar')
scanner.pos # => 0
scanner.scan(/foo/) # => "foo" # Match found.
scanner.pos         # => 3     # Byte position incremented.
scanner.scan(/foo/) # => nil   # Match not found.
scanner.pos # => 3             # Byte position not changed.

Some methods implicitly modify the byte position; see:

The values of these methods are derived directly from the values of pos and string:

Character Position

The character position is a zero-based index into the characters in the stored string; for a new StringScanner object, the character position is zero.

Method charpos returns the character position; its value may not be reset explicitly.

Some methods change (increment or reset) the character position; see:

Example (string includes multi-byte characters):

scanner = StringScanner.new(ENGLISH_TEXT) # Five 1-byte characters.
scanner.concat(HIRAGANA_TEXT)             # Five 3-byte characters
scanner.string # => "Helloこんにちは"       # Twenty bytes in all.
put_situation(scanner)
# Situation:
#   pos:       0
#   charpos:   0
#   rest:      "Helloこんにちは"
#   rest_size: 20
scanner.scan(/Hello/) # => "Hello" # Five 1-byte characters.
put_situation(scanner)
# Situation:
#   pos:       5
#   charpos:   5
#   rest:      "こんにちは"
#   rest_size: 15
scanner.getch         # => "こ"    # One 3-byte character.
put_situation(scanner)
# Situation:
#   pos:       8
#   charpos:   6
#   rest:      "んにちは"
#   rest_size: 12

Target Substring

The target substring is the the part of the stored string that extends from the current byte position to the end of the stored string; it is always either:

  • The entire stored string (byte position is zero).

  • A trailing substring of the stored string (byte position positive).

The target substring is returned by method rest, and its size is returned by method rest_size.

Examples:

scanner = StringScanner.new('foobarbaz')
put_situation(scanner)
# Situation:
#   pos:       0
#   charpos:   0
#   rest:      "foobarbaz"
#   rest_size: 9
scanner.pos = 3
put_situation(scanner)
# Situation:
#   pos:       3
#   charpos:   3
#   rest:      "barbaz"
#   rest_size: 6
scanner.pos = 9
put_situation(scanner)
# Situation:
#   pos:       9
#   charpos:   9
#   rest:      ""
#   rest_size: 0

Setting the Target Substring

The target substring is set whenever:

  • The stored string is set (position reset to zero; target substring set to stored string).

  • The byte position is set (target substring adjusted accordingly).

Querying the Target Substring

This table summarizes (details and examples at the links):

Method Returns
rest Target substring.
rest_size Size (bytes) of target substring.

Searching the Target Substring

A search method examines the target substring, but does not advance the positions or (by implication) shorten the target substring.

This table summarizes (details and examples at the links):

Method Returns Sets Match Values?
check(pattern) Matched leading substring or nil. Yes.
check_until(pattern) Matched substring (anywhere) or nil. Yes.
exist?(pattern) Matched substring (anywhere) end index. Yes.
match?(pattern) Size of matched leading substring or nil. Yes.
peek(size) Leading substring of given length (bytes). No.
peek_byte Integer leading byte or nil. No.
rest Target substring (from byte position to end). No.

Traversing the Target Substring

A traversal method examines the target substring, and, if successful:

  • Advances the positions.

  • Shortens the target substring.

This table summarizes (details and examples at links):

Method Returns Sets Match Values?
get_byte Leading byte or nil. No.
getch Leading character or nil. No.
scan(pattern) Matched leading substring or nil. Yes.
scan_byte Integer leading byte or nil. No.
scan_until(pattern) Matched substring (anywhere) or nil. Yes.
skip(pattern) Matched leading substring size or nil. Yes.
skip_until(pattern) Position delta to end-of-matched-substring or nil. Yes.
unscan self. No.

Querying the Scanner

Each of these methods queries the scanner object without modifying it (details and examples at links)

Method Returns
beginning_of_line? true or false.
charpos Character position.
eos? true or false.
fixed_anchor? true or false.
inspect String representation of self.
pos Byte position.
rest Target substring.
rest_size Size of target substring.
string Stored string.

Matching

StringScanner implements pattern matching via Ruby class Regexp, and its matching behaviors are the same as Ruby’s except for the fixed-anchor property.

Matcher Methods

Each matcher method takes a single argument pattern, and attempts to find a matching substring in the target substring.

Method Pattern Type Matches Target Substring Success Return May Update Positions?
check Regexp or String. At beginning. Matched substring. No.
check_until Regexp. Anywhere. Substring. No.
match? Regexp or String. At beginning. Updated position. No.
exist? Regexp. Anywhere. Updated position. No.
scan Regexp or String. At beginning. Matched substring. Yes.
scan_until Regexp. Anywhere. Substring. Yes.
skip Regexp or String. At beginning. Match size. Yes.
skip_until Regexp. Anywhere. Position delta. Yes.


Which matcher you choose will depend on:

Match Values

The match values in a StringScanner object generally contain the results of the most recent attempted match.

Each match value may be thought of as:

  • Clear: Initially, or after an unsuccessful match attempt: usually, false, nil, or {}.

  • Set: After a successful match attempt: true, string, array, or hash.

Each of these methods clears match values:

Each of these methods attempts a match based on a pattern, and either sets match values (if successful) or clears them (if not);

Basic Match Values

Basic match values are those not related to captures.

Each of these methods returns a basic match value:

Method Return After Match Return After No Match
matched? true. false.
matched_size Size of matched substring. nil.
matched Matched substring. nil.
pre_match Substring preceding matched substring. nil.
post_match Substring following matched substring. nil.


See examples below.

Captured Match Values

Captured match values are those related to captures.

Each of these methods returns a captured match value:

Method Return After Match Return After No Match
size Count of captured substrings. nil.
[](n) nth captured substring. nil.
captures Array of all captured substrings. nil.
values_at(*n) Array of specified captured substrings. nil.
named_captures Hash of named captures. {}.


See examples below.

Match Values Examples

Successful basic match attempt (no captures):

scanner = StringScanner.new('foobarbaz')
scanner.exist?(/bar/)
put_match_values(scanner)
# Basic match values:
#   matched?:       true
#   matched_size:   3
#   pre_match:      "foo"
#   matched  :      "bar"
#   post_match:     "baz"
# Captured match values:
#   size:           1
#   captures:       []
#   named_captures: {}
#   values_at:      ["bar", nil]
#   []:
#     [0]:          "bar"
#     [1]:          nil

Failed basic match attempt (no captures);

scanner = StringScanner.new('foobarbaz')
scanner.exist?(/nope/)
match_values_cleared?(scanner) # => true

Successful unnamed capture match attempt:

scanner = StringScanner.new('foobarbazbatbam')
scanner.exist?(/(foo)bar(baz)bat(bam)/)
put_match_values(scanner)
# Basic match values:
#   matched?:       true
#   matched_size:   15
#   pre_match:      ""
#   matched  :      "foobarbazbatbam"
#   post_match:     ""
# Captured match values:
#   size:           4
#   captures:       ["foo", "baz", "bam"]
#   named_captures: {}
#   values_at:      ["foobarbazbatbam", "foo", "baz", "bam", nil]
#   []:
#     [0]:          "foobarbazbatbam"
#     [1]:          "foo"
#     [2]:          "baz"
#     [3]:          "bam"
#     [4]:          nil

Successful named capture match attempt; same as unnamed above, except for named_captures:

scanner = StringScanner.new('foobarbazbatbam')
scanner.exist?(/(?<x>foo)bar(?<y>baz)bat(?<z>bam)/)
scanner.named_captures # => {"x"=>"foo", "y"=>"baz", "z"=>"bam"}

Failed unnamed capture match attempt:

scanner = StringScanner.new('somestring')
scanner.exist?(/(foo)bar(baz)bat(bam)/)
match_values_cleared?(scanner) # => true

Failed named capture match attempt; same as unnamed above, except for named_captures:

scanner = StringScanner.new('somestring')
scanner.exist?(/(?<x>foo)bar(?<y>baz)bat(?<z>bam)/)
match_values_cleared?(scanner) # => false
scanner.named_captures # => {"x"=>nil, "y"=>nil, "z"=>nil}

Fixed-Anchor Property

Pattern matching in StringScanner is the same as in Ruby’s, except for its fixed-anchor property, which determines the meaning of '\A':

  • false (the default): matches the current byte position.

    scanner = StringScanner.new('foobar')
    scanner.scan(/\A./) # => "f"
    scanner.scan(/\A./) # => "o"
    scanner.scan(/\A./) # => "o"
    scanner.scan(/\A./) # => "b"
    
  • true: matches the beginning of the target substring; never matches unless the byte position is zero:

    scanner = StringScanner.new('foobar', fixed_anchor: true)
    scanner.scan(/\A./) # => "f"
    scanner.scan(/\A./) # => nil
    scanner.reset
    scanner.scan(/\A./) # => "f"
    

The fixed-anchor property is set when the StringScanner object is created, and may not be modified (see StringScanner.new); method fixed_anchor? returns the setting.

Public Class Methods

Returns a new StringScanner object whose stored string is the given string; sets the fixed-anchor property:

scanner = StringScanner.new('foobarbaz')
scanner.string        # => "foobarbaz"
scanner.fixed_anchor? # => false
put_situation(scanner)
# Situation:
#   pos:       0
#   charpos:   0
#   rest:      "foobarbaz"
#   rest_size: 9
static VALUE
strscan_initialize(int argc, VALUE *argv, VALUE self)
{
    struct strscanner *p;
    VALUE str, options;

    p = check_strscan(self);
    rb_scan_args(argc, argv, "11", &str, &options);
    options = rb_check_hash_type(options);
    if (!NIL_P(options)) {
        VALUE fixed_anchor;
        ID keyword_ids[1];
        keyword_ids[0] = rb_intern("fixed_anchor");
        rb_get_kwargs(options, keyword_ids, 0, 1, &fixed_anchor);
        if (fixed_anchor == Qundef) {
            p->fixed_anchor_p = false;
        }
        else {
            p->fixed_anchor_p = RTEST(fixed_anchor);
        }
    }
    else {
        p->fixed_anchor_p = false;
    }
    StringValue(str);
    p->str = str;

    return self;
}

Public Instance Methods

Alias for: concat

Returns a captured substring or nil; see Captured Match Values.

When there are captures:

scanner = StringScanner.new('Fri Dec 12 1975 14:39')
scanner.scan(/(?<wday>\w+) (?<month>\w+) (?<day>\d+) /)
  • specifier zero: returns the entire matched substring:

    scanner[0]         # => "Fri Dec 12 "
    scanner.pre_match  # => ""
    scanner.post_match # => "1975 14:39"
    
  • specifier positive integer. returns the nth capture, or nil if out of range:

    scanner[1] # => "Fri"
    scanner[2] # => "Dec"
    scanner[3] # => "12"
    scanner[4] # => nil
    
  • specifier negative integer. counts backward from the last subgroup:

    scanner[-1] # => "12"
    scanner[-4] # => "Fri Dec 12 "
    scanner[-5] # => nil
    
  • specifier symbol or string. returns the named subgroup, or nil if no such:

    scanner[:wday]  # => "Fri"
    scanner['wday'] # => "Fri"
    scanner[:month] # => "Dec"
    scanner[:day]   # => "12"
    scanner[:nope]  # => nil
    

When there are no captures, only [0] returns non-nil:

scanner = StringScanner.new('foobarbaz')
scanner.exist?(/bar/)
scanner[0] # => "bar"
scanner[1] # => nil

For a failed match, even [0] returns nil:

scanner.scan(/nope/) # => nil
scanner[0]           # => nil
scanner[1]           # => nil
static VALUE
strscan_aref(VALUE self, VALUE idx)
{
    const char *name;
    struct strscanner *p;
    long i;

    GET_SCANNER(self, p);
    if (! MATCHED_P(p))        return Qnil;

    switch (TYPE(idx)) {
        case T_SYMBOL:
            idx = rb_sym2str(idx);
            /* fall through */
        case T_STRING:
            if (!RTEST(p->regex)) return Qnil;
            RSTRING_GETMEM(idx, name, i);
            i = name_to_backref_number(&(p->regs), p->regex, name, name + i, rb_enc_get(idx));
            break;
        default:
            i = NUM2LONG(idx);
    }

    if (i < 0)
        i += p->regs.num_regs;
    if (i < 0)                 return Qnil;
    if (i >= p->regs.num_regs) return Qnil;
    if (p->regs.beg[i] == -1)  return Qnil;

    return extract_range(p,
                         adjust_register_position(p, p->regs.beg[i]),
                         adjust_register_position(p, p->regs.end[i]));
}

Returns whether the position is at the beginning of a line; that is, at the beginning of the stored string or immediately after a newline:

scanner = StringScanner.new(MULTILINE_TEXT)
scanner.string
# => "Go placidly amid the noise and haste,\nand remember what peace there may be in silence.\n"
scanner.pos                # => 0
scanner.beginning_of_line? # => true

scanner.scan_until(/,/)    # => "Go placidly amid the noise and haste,"
scanner.beginning_of_line? # => false

scanner.scan(/\n/)         # => "\n"
scanner.beginning_of_line? # => true

scanner.terminate
scanner.beginning_of_line? # => true

scanner.concat('x')
scanner.terminate
scanner.beginning_of_line? # => false

StringScanner#bol? is an alias for StringScanner#beginning_of_line?.

static VALUE
strscan_bol_p(VALUE self)
{
    struct strscanner *p;

    GET_SCANNER(self, p);
    if (CURPTR(p) > S_PEND(p)) return Qnil;
    if (p->curr == 0) return Qtrue;
    return (*(CURPTR(p) - 1) == '\n') ? Qtrue : Qfalse;
}

Returns the array of captured match values at indexes (1..) if the most recent match attempt succeeded, or nil otherwise:

scanner = StringScanner.new('Fri Dec 12 1975 14:39')
scanner.captures         # => nil

scanner.exist?(/(?<wday>\w+) (?<month>\w+) (?<day>\d+) /)
scanner.captures         # => ["Fri", "Dec", "12"]
scanner.values_at(*0..4) # => ["Fri Dec 12 ", "Fri", "Dec", "12", nil]

scanner.exist?(/Fri/)
scanner.captures         # => []

scanner.scan(/nope/)
scanner.captures         # => nil
static VALUE
strscan_captures(VALUE self)
{
    struct strscanner *p;
    int   i, num_regs;
    VALUE new_ary;

    GET_SCANNER(self, p);
    if (! MATCHED_P(p))        return Qnil;

    num_regs = p->regs.num_regs;
    new_ary  = rb_ary_new2(num_regs);

    for (i = 1; i < num_regs; i++) {
        VALUE str;
        if (p->regs.beg[i] == -1)
            str = Qnil;
        else
            str = extract_range(p,
                                adjust_register_position(p, p->regs.beg[i]),
                                adjust_register_position(p, p->regs.end[i]));
        rb_ary_push(new_ary, str);
    }

    return new_ary;
}

call-seq: charpos -> character_position

Returns the character position (initially zero), which may be different from the byte position given by method pos:

scanner = StringScanner.new(HIRAGANA_TEXT)
scanner.string # => "こんにちは"
scanner.getch  # => "こ" # 3-byte character.
scanner.getch  # => "ん" # 3-byte character.
put_situation(scanner)
# Situation:
#   pos:       6
#   charpos:   2
#   rest:      "にちは"
#   rest_size: 9
static VALUE
strscan_get_charpos(VALUE self)
{
    struct strscanner *p;

    GET_SCANNER(self, p);

    return LONG2NUM(rb_enc_strlen(S_PBEG(p), CURPTR(p), rb_enc_get(p->str)));
}

Attempts to match the given pattern at the beginning of the target substring; does not modify the positions.

If the match succeeds:

scanner = StringScanner.new('foobarbaz')
scanner.pos = 3
scanner.check('bar') # => "bar"
put_match_values(scanner)
# Basic match values:
#   matched?:       true
#   matched_size:   3
#   pre_match:      "foo"
#   matched  :      "bar"
#   post_match:     "baz"
# Captured match values:
#   size:           1
#   captures:       []
#   named_captures: {}
#   values_at:      ["bar", nil]
#   []:
#     [0]:          "bar"
#     [1]:          nil
# => 0..1
put_situation(scanner)
# Situation:
#   pos:       3
#   charpos:   3
#   rest:      "barbaz"
#   rest_size: 6

If the match fails:

scanner.check(/nope/)          # => nil
match_values_cleared?(scanner) # => true
static VALUE
strscan_check(VALUE self, VALUE re)
{
    return strscan_do_scan(self, re, 0, 1, 1);
}

Attempts to match the given pattern anywhere (at any position) in the target substring; does not modify the positions.

If the match succeeds:

  • Sets all match values.

  • Returns the matched substring, which extends from the current position to the end of the matched substring.

scanner = StringScanner.new('foobarbazbatbam')
scanner.pos = 6
scanner.check_until(/bat/) # => "bazbat"
put_match_values(scanner)
# Basic match values:
#   matched?:       true
#   matched_size:   3
#   pre_match:      "foobarbaz"
#   matched  :      "bat"
#   post_match:     "bam"
# Captured match values:
#   size:           1
#   captures:       []
#   named_captures: {}
#   values_at:      ["bat", nil]
#   []:
#     [0]:          "bat"
#     [1]:          nil
put_situation(scanner)
# Situation:
#   pos:       6
#   charpos:   6
#   rest:      "bazbatbam"
#   rest_size: 9

If the match fails:

scanner.check_until(/nope/)    # => nil
match_values_cleared?(scanner) # => true
static VALUE
strscan_check_until(VALUE self, VALUE re)
{
    return strscan_do_scan(self, re, 0, 1, 0);
}
scanner = StringScanner.new('foo')
scanner.string           # => "foo"
scanner.terminate
scanner.concat('barbaz') # => #<StringScanner 3/9 "foo" @ "barba...">
scanner.string           # => "foobarbaz"
put_situation(scanner)
# Situation:
#   pos:       3
#   charpos:   3
#   rest:      "barbaz"
#   rest_size: 6
static VALUE
strscan_concat(VALUE self, VALUE str)
{
    struct strscanner *p;

    GET_SCANNER(self, p);
    StringValue(str);
    rb_str_append(p->str, str);
    return self;
}
Also aliased as: <<

Returns whether the position is at the end of the stored string:

scanner = StringScanner.new('foobarbaz')
scanner.eos? # => false
pos = 3
scanner.eos? # => false
scanner.terminate
scanner.eos? # => true
static VALUE
strscan_eos_p(VALUE self)
{
    struct strscanner *p;

    GET_SCANNER(self, p);
    return EOS_P(p) ? Qtrue : Qfalse;
}

Attempts to match the given pattern anywhere (at any position) n the target substring; does not modify the positions.

If the match succeeds:

  • Returns a byte offset: the distance in bytes between the current position and the end of the matched substring.

  • Sets all match values.

scanner = StringScanner.new('foobarbazbatbam')
scanner.pos = 6
scanner.exist?(/bat/) # => 6
put_match_values(scanner)
# Basic match values:
#   matched?:       true
#   matched_size:   3
#   pre_match:      "foobarbaz"
#   matched  :      "bat"
#   post_match:     "bam"
# Captured match values:
#   size:           1
#   captures:       []
#   named_captures: {}
#   values_at:      ["bat", nil]
#   []:
#     [0]:          "bat"
#     [1]:          nil
put_situation(scanner)
# Situation:
#   pos:       6
#   charpos:   6
#   rest:      "bazbatbam"
#   rest_size: 9

If the match fails:

scanner.exist?(/nope/)         # => nil
match_values_cleared?(scanner) # => true
static VALUE
strscan_exist_p(VALUE self, VALUE re)
{
    return strscan_do_scan(self, re, 0, 0, 0);
}

Returns whether the fixed-anchor property is set.

static VALUE
strscan_fixed_anchor_p(VALUE self)
{
    struct strscanner *p;
    p = check_strscan(self);
    return p->fixed_anchor_p ? Qtrue : Qfalse;
}

call-seq: get_byte -> byte_as_character or nil

Returns the next byte, if available:

  • If the position is not at the end of the stored string:

    scanner = StringScanner.new(HIRAGANA_TEXT)
    # => #<StringScanner 0/15 @ "\xE3\x81\x93\xE3\x82...">
    scanner.string                                   # => "こんにちは"
    [scanner.get_byte, scanner.pos, scanner.charpos] # => ["\xE3", 1, 1]
    [scanner.get_byte, scanner.pos, scanner.charpos] # => ["\x81", 2, 2]
    [scanner.get_byte, scanner.pos, scanner.charpos] # => ["\x93", 3, 1]
    [scanner.get_byte, scanner.pos, scanner.charpos] # => ["\xE3", 4, 2]
    [scanner.get_byte, scanner.pos, scanner.charpos] # => ["\x82", 5, 3]
    [scanner.get_byte, scanner.pos, scanner.charpos] # => ["\x93", 6, 2]
    
  • Otherwise, returns nil, and does not change the positions.

    scanner.terminate
    [scanner.get_byte, scanner.pos, scanner.charpos] # => [nil, 15, 5]
    
static VALUE
strscan_get_byte(VALUE self)
{
    struct strscanner *p;

    GET_SCANNER(self, p);
    CLEAR_MATCH_STATUS(p);
    if (EOS_P(p))
        return Qnil;

    p->prev = p->curr;
    p->curr++;
    MATCHED(p);
    adjust_registers_to_matched(p);
    return extract_range(p,
                         adjust_register_position(p, p->regs.beg[0]),
                         adjust_register_position(p, p->regs.end[0]));
}

call-seq: getch -> character or nil

Returns the next (possibly multibyte) character, if available:

  • If the position is at the beginning of a character:

    scanner = StringScanner.new(HIRAGANA_TEXT)
    scanner.string                                # => "こんにちは"
    [scanner.getch, scanner.pos, scanner.charpos] # => ["こ", 3, 1]
    [scanner.getch, scanner.pos, scanner.charpos] # => ["ん", 6, 2]
    [scanner.getch, scanner.pos, scanner.charpos] # => ["に", 9, 3]
    [scanner.getch, scanner.pos, scanner.charpos] # => ["ち", 12, 4]
    [scanner.getch, scanner.pos, scanner.charpos] # => ["は", 15, 5]
    [scanner.getch, scanner.pos, scanner.charpos] # => [nil, 15, 5]
    
  • If the position is within a multi-byte character (that is, not at its beginning), behaves like get_byte (returns a 1-byte character):

    scanner.pos = 1
    [scanner.getch, scanner.pos, scanner.charpos] # => ["\x81", 2, 2]
    [scanner.getch, scanner.pos, scanner.charpos] # => ["\x93", 3, 1]
    [scanner.getch, scanner.pos, scanner.charpos] # => ["ん", 6, 2]
    
  • If the position is at the end of the stored string, returns nil and does not modify the positions:

    scanner.terminate
    [scanner.getch, scanner.pos, scanner.charpos] # => [nil, 15, 5]
    
static VALUE
strscan_getch(VALUE self)
{
    struct strscanner *p;
    long len;

    GET_SCANNER(self, p);
    CLEAR_MATCH_STATUS(p);
    if (EOS_P(p))
        return Qnil;

    len = rb_enc_mbclen(CURPTR(p), S_PEND(p), rb_enc_get(p->str));
    len = minl(len, S_RESTLEN(p));
    p->prev = p->curr;
    p->curr += len;
    MATCHED(p);
    adjust_registers_to_matched(p);
    return extract_range(p,
                         adjust_register_position(p, p->regs.beg[0]),
                         adjust_register_position(p, p->regs.end[0]));
}

Returns a string representation of self that may show:

  1. The current position.

  2. The size (in bytes) of the stored string.

  3. The substring preceding the current position.

  4. The substring following the current position (which is also the target substring).

scanner = StringScanner.new("Fri Dec 12 1975 14:39")
scanner.pos = 11
scanner.inspect # => "#<StringScanner 11/21 \"...c 12 \" @ \"1975 ...\">"

If at beginning-of-string, item 4 above (following substring) is omitted:

scanner.reset
scanner.inspect # => "#<StringScanner 0/21 @ \"Fri D...\">"

If at end-of-string, all items above are omitted:

scanner.terminate
scanner.inspect # => "#<StringScanner fin>"
static VALUE
strscan_inspect(VALUE self)
{
    struct strscanner *p;
    VALUE a, b;

    p = check_strscan(self);
    if (NIL_P(p->str)) {
        a = rb_sprintf("#<%"PRIsVALUE" (uninitialized)>", rb_obj_class(self));
        return a;
    }
    if (EOS_P(p)) {
        a = rb_sprintf("#<%"PRIsVALUE" fin>", rb_obj_class(self));
        return a;
    }
    if (p->curr == 0) {
        b = inspect2(p);
        a = rb_sprintf("#<%"PRIsVALUE" %ld/%ld @ %"PRIsVALUE">",
                       rb_obj_class(self),
                       p->curr, S_LEN(p),
                       b);
        return a;
    }
    a = inspect1(p);
    b = inspect2(p);
    a = rb_sprintf("#<%"PRIsVALUE" %ld/%ld %"PRIsVALUE" @ %"PRIsVALUE">",
                   rb_obj_class(self),
                   p->curr, S_LEN(p),
                   a, b);
    return a;
}

Attempts to match the given pattern at the beginning of the target substring; does not modify the positions.

If the match succeeds:

  • Sets match values.

  • Returns the size in bytes of the matched substring.

scanner = StringScanner.new('foobarbaz')
scanner.pos = 3
scanner.match?(/bar/) => 3
put_match_values(scanner)
# Basic match values:
#   matched?:       true
#   matched_size:   3
#   pre_match:      "foo"
#   matched  :      "bar"
#   post_match:     "baz"
# Captured match values:
#   size:           1
#   captures:       []
#   named_captures: {}
#   values_at:      ["bar", nil]
#   []:
#     [0]:          "bar"
#     [1]:          nil
put_situation(scanner)
# Situation:
#   pos:       3
#   charpos:   3
#   rest:      "barbaz"
#   rest_size: 6

If the match fails:

  • Clears match values.

  • Returns nil.

  • Does not increment positions.

scanner.match?(/nope/)         # => nil
match_values_cleared?(scanner) # => true
static VALUE
strscan_match_p(VALUE self, VALUE re)
{
    return strscan_do_scan(self, re, 0, 0, 1);
}

Returns the matched substring from the most recent match attempt if it was successful, or nil otherwise; see Basic Matched Values:

scanner = StringScanner.new('foobarbaz')
scanner.matched        # => nil
scanner.pos = 3
scanner.match?(/bar/)  # => 3
scanner.matched        # => "bar"
scanner.match?(/nope/) # => nil
scanner.matched        # => nil
static VALUE
strscan_matched(VALUE self)
{
    struct strscanner *p;

    GET_SCANNER(self, p);
    if (! MATCHED_P(p)) return Qnil;
    return extract_range(p,
                         adjust_register_position(p, p->regs.beg[0]),
                         adjust_register_position(p, p->regs.end[0]));
}

Returns true of the most recent match attempt was successful, false otherwise; see Basic Matched Values:

scanner = StringScanner.new('foobarbaz')
scanner.matched?       # => false
scanner.pos = 3
scanner.exist?(/baz/)  # => 6
scanner.matched?       # => true
scanner.exist?(/nope/) # => nil
scanner.matched?       # => false
static VALUE
strscan_matched_p(VALUE self)
{
    struct strscanner *p;

    GET_SCANNER(self, p);
    return MATCHED_P(p) ? Qtrue : Qfalse;
}

Returns the size (in bytes) of the matched substring from the most recent match match attempt if it was successful, or nil otherwise; see Basic Matched Values:

scanner = StringScanner.new('foobarbaz')
scanner.matched_size   # => nil

pos = 3
scanner.exist?(/baz/)  # => 9
scanner.matched_size   # => 3

scanner.exist?(/nope/) # => nil
scanner.matched_size   # => nil
static VALUE
strscan_matched_size(VALUE self)
{
    struct strscanner *p;

    GET_SCANNER(self, p);
    if (! MATCHED_P(p)) return Qnil;
    return LONG2NUM(p->regs.end[0] - p->regs.beg[0]);
}

Returns the array of captured match values at indexes (1..) if the most recent match attempt succeeded, or nil otherwise; see Captured Match Values:

scanner = StringScanner.new('Fri Dec 12 1975 14:39')
scanner.named_captures # => {}

pattern = /(?<wday>\w+) (?<month>\w+) (?<day>\d+) /
scanner.match?(pattern)
scanner.named_captures # => {"wday"=>"Fri", "month"=>"Dec", "day"=>"12"}

scanner.string = 'nope'
scanner.match?(pattern)
scanner.named_captures # => {"wday"=>nil, "month"=>nil, "day"=>nil}

scanner.match?(/nosuch/)
scanner.named_captures # => {}
static VALUE
strscan_named_captures(VALUE self)
{
    struct strscanner *p;
    named_captures_data data;
    GET_SCANNER(self, p);
    data.self = self;
    data.captures = rb_hash_new();
    if (!RB_NIL_P(p->regex)) {
        onig_foreach_name(RREGEXP_PTR(p->regex), named_captures_iter, &data);
    }

    return data.captures;
}

Returns the substring string[pos, length]; does not update match values or positions:

scanner = StringScanner.new('foobarbaz')
scanner.pos = 3
scanner.peek(3)   # => "bar"
scanner.terminate
scanner.peek(3)   # => ""
static VALUE
strscan_peek(VALUE self, VALUE vlen)
{
    struct strscanner *p;
    long len;

    GET_SCANNER(self, p);

    len = NUM2LONG(vlen);
    if (EOS_P(p))
        return str_new(p, "", 0);

    len = minl(len, S_RESTLEN(p));
    return extract_beg_len(p, p->curr, len);
}

Peeks at the current byte and returns it as an integer.

s = StringScanner.new('ab')
s.peek_byte         # => 97
static VALUE
strscan_peek_byte(VALUE self)
{
    struct strscanner *p;

    GET_SCANNER(self, p);
    if (EOS_P(p))
        return Qnil;

    return INT2FIX((unsigned char)*CURPTR(p));
}

call-seq: pos -> byte_position

Returns the integer byte position, which may be different from the character position:

scanner = StringScanner.new(HIRAGANA_TEXT)
scanner.string  # => "こんにちは"
scanner.pos     # => 0
scanner.getch   # => "こ" # 3-byte character.
scanner.charpos # => 1
scanner.pos     # => 3
Alias for: pos

call-seq: pos = n -> n pointer = n -> n

Sets the byte position and the character position; returns n.

Does not affect match values.

For non-negative n, sets the position to n:

scanner = StringScanner.new(HIRAGANA_TEXT)
scanner.string  # => "こんにちは"
scanner.pos = 3 # => 3
scanner.rest    # => "んにちは"
scanner.charpos # => 1

For negative n, counts from the end of the stored string:

scanner.pos = -9 # => -9
scanner.pos      # => 6
scanner.rest     # => "にちは"
scanner.charpos  # => 2
Alias for: pos=

call-seq: pos -> byte_position

Returns the integer byte position, which may be different from the character position:

scanner = StringScanner.new(HIRAGANA_TEXT)
scanner.string  # => "こんにちは"
scanner.pos     # => 0
scanner.getch   # => "こ" # 3-byte character.
scanner.charpos # => 1
scanner.pos     # => 3
static VALUE
strscan_get_pos(VALUE self)
{
    struct strscanner *p;

    GET_SCANNER(self, p);
    return INT2FIX(p->curr);
}
Also aliased as: pointer

call-seq: pos = n -> n pointer = n -> n

Sets the byte position and the character position; returns n.

Does not affect match values.

For non-negative n, sets the position to n:

scanner = StringScanner.new(HIRAGANA_TEXT)
scanner.string  # => "こんにちは"
scanner.pos = 3 # => 3
scanner.rest    # => "んにちは"
scanner.charpos # => 1

For negative n, counts from the end of the stored string:

scanner.pos = -9 # => -9
scanner.pos      # => 6
scanner.rest     # => "にちは"
scanner.charpos  # => 2
static VALUE
strscan_set_pos(VALUE self, VALUE v)
{
    struct strscanner *p;
    long i;

    GET_SCANNER(self, p);
    i = NUM2INT(v);
    if (i < 0) i += S_LEN(p);
    if (i < 0) rb_raise(rb_eRangeError, "index out of range");
    if (i > S_LEN(p)) rb_raise(rb_eRangeError, "index out of range");
    p->curr = i;
    return LONG2NUM(i);
}
Also aliased as: pointer=

Returns the substring that follows the matched substring from the most recent match attempt if it was successful, or nil otherwise; see Basic Match Values:

scanner = StringScanner.new('foobarbaz')
scanner.post_match     # => nil

scanner.pos = 3
scanner.match?(/bar/)  # => 3
scanner.post_match     # => "baz"

scanner.match?(/nope/) # => nil
scanner.post_match     # => nil
static VALUE
strscan_post_match(VALUE self)
{
    struct strscanner *p;

    GET_SCANNER(self, p);
    if (! MATCHED_P(p)) return Qnil;
    return extract_range(p,
                         adjust_register_position(p, p->regs.end[0]),
                         S_LEN(p));
}

Returns the substring that precedes the matched substring from the most recent match attempt if it was successful, or nil otherwise; see Basic Match Values:

scanner = StringScanner.new('foobarbaz')
scanner.pre_match      # => nil

scanner.pos = 3
scanner.exist?(/baz/)  # => 6
scanner.pre_match      # => "foobar" # Substring of entire string, not just target string.

scanner.exist?(/nope/) # => nil
scanner.pre_match      # => nil
static VALUE
strscan_pre_match(VALUE self)
{
    struct strscanner *p;

    GET_SCANNER(self, p);
    if (! MATCHED_P(p)) return Qnil;
    return extract_range(p,
                         0,
                         adjust_register_position(p, p->regs.beg[0]));
}

Sets both byte position and character position to zero, and clears match values; returns self:

scanner = StringScanner.new('foobarbaz')
scanner.exist?(/bar/)          # => 6
scanner.reset                  # => #<StringScanner 0/9 @ "fooba...">
put_situation(scanner)
# Situation:
#   pos:       0
#   charpos:   0
#   rest:      "foobarbaz"
#   rest_size: 9
# => nil
match_values_cleared?(scanner) # => true
static VALUE
strscan_reset(VALUE self)
{
    struct strscanner *p;

    GET_SCANNER(self, p);
    p->curr = 0;
    CLEAR_MATCH_STATUS(p);
    return self;
}

Returns the ‘rest’ of the stored string (all after the current position), which is the target substring:

scanner = StringScanner.new('foobarbaz')
scanner.rest # => "foobarbaz"
scanner.pos = 3
scanner.rest # => "barbaz"
scanner.terminate
scanner.rest # => ""
static VALUE
strscan_rest(VALUE self)
{
    struct strscanner *p;

    GET_SCANNER(self, p);
    if (EOS_P(p)) {
        return str_new(p, "", 0);
    }
    return extract_range(p, p->curr, S_LEN(p));
}

Returns the size (in bytes) of the rest of the stored string:

scanner = StringScanner.new('foobarbaz')
scanner.rest      # => "foobarbaz"
scanner.rest_size # => 9
scanner.pos = 3
scanner.rest      # => "barbaz"
scanner.rest_size # => 6
scanner.terminate
scanner.rest      # => ""
scanner.rest_size # => 0
static VALUE
strscan_rest_size(VALUE self)
{
    struct strscanner *p;
    long i;

    GET_SCANNER(self, p);
    if (EOS_P(p)) {
        return INT2FIX(0);
    }
    i = S_RESTLEN(p);
    return INT2FIX(i);
}

call-seq: scan(pattern) -> substring or nil

Attempts to match the given pattern at the beginning of the target substring.

If the match succeeds:

scanner = StringScanner.new(HIRAGANA_TEXT)
scanner.string     # => "こんにちは"
scanner.pos = 6
scanner.scan(/に/) # => "に"
put_match_values(scanner)
# Basic match values:
#   matched?:       true
#   matched_size:   3
#   pre_match:      "こん"
#   matched  :      "に"
#   post_match:     "ちは"
# Captured match values:
#   size:           1
#   captures:       []
#   named_captures: {}
#   values_at:      ["に", nil]
#   []:
#     [0]:          "に"
#     [1]:          nil
put_situation(scanner)
# Situation:
#   pos:       9
#   charpos:   3
#   rest:      "ちは"
#   rest_size: 6

If the match fails:

  • Returns nil.

  • Does not increment byte and character positions.

  • Clears match values.

scanner.scan(/nope/)           # => nil
match_values_cleared?(scanner) # => true
static VALUE
strscan_scan(VALUE self, VALUE re)
{
    return strscan_do_scan(self, re, 1, 1, 1);
}

Scans one byte and returns it as an integer. This method is not multibyte character sensitive. See also: getch.

static VALUE
strscan_scan_byte(VALUE self)
{
    struct strscanner *p;
    VALUE byte;

    GET_SCANNER(self, p);
    CLEAR_MATCH_STATUS(p);
    if (EOS_P(p))
        return Qnil;

    byte = INT2FIX((unsigned char)*CURPTR(p));
    p->prev = p->curr;
    p->curr++;
    MATCHED(p);
    adjust_registers_to_matched(p);
    return byte;
}

call-seq: scan_until(pattern) -> substring or nil

Attempts to match the given pattern anywhere (at any position) in the target substring.

If the match attempt succeeds:

scanner = StringScanner.new(HIRAGANA_TEXT)
scanner.string           # => "こんにちは"
scanner.pos = 6
scanner.scan_until(/ち/) # => "にち"
put_match_values(scanner)
# Basic match values:
#   matched?:       true
#   matched_size:   3
#   pre_match:      "こんに"
#   matched  :      "ち"
#   post_match:     "は"
# Captured match values:
#   size:           1
#   captures:       []
#   named_captures: {}
#   values_at:      ["ち", nil]
#   []:
#     [0]:          "ち"
#     [1]:          nil
put_situation(scanner)
# Situation:
#   pos:       12
#   charpos:   4
#   rest:      "は"
#   rest_size: 3

If the match attempt fails:

  • Clears match data.

  • Returns nil.

  • Does not update positions.

scanner.scan_until(/nope/)     # => nil
match_values_cleared?(scanner) # => true
static VALUE
strscan_scan_until(VALUE self, VALUE re)
{
    return strscan_do_scan(self, re, 1, 1, 0);
}

Returns the count of captures if the most recent match attempt succeeded, nil otherwise; see Captures Match Values:

scanner = StringScanner.new('Fri Dec 12 1975 14:39')
scanner.size                        # => nil

pattern = /(?<wday>\w+) (?<month>\w+) (?<day>\d+) /
scanner.match?(pattern)
scanner.values_at(*0..scanner.size) # => ["Fri Dec 12 ", "Fri", "Dec", "12", nil]
scanner.size                        # => 4

scanner.match?(/nope/)              # => nil
scanner.size                        # => nil
static VALUE
strscan_size(VALUE self)
{
    struct strscanner *p;

    GET_SCANNER(self, p);
    if (! MATCHED_P(p))        return Qnil;
    return INT2FIX(p->regs.num_regs);
}

call-seq: skip(pattern) match_size or nil

Attempts to match the given pattern at the beginning of the target substring;

If the match succeeds:

scanner = StringScanner.new(HIRAGANA_TEXT)
scanner.string                  # => "こんにちは"
scanner.pos = 6
scanner.skip(/に/)              # => 3
put_match_values(scanner)
# Basic match values:
#   matched?:       true
#   matched_size:   3
#   pre_match:      "こん"
#   matched  :      "に"
#   post_match:     "ちは"
# Captured match values:
#   size:           1
#   captures:       []
#   named_captures: {}
#   values_at:      ["に", nil]
#   []:
#     [0]:          "に"
#     [1]:          nil
put_situation(scanner)
# Situation:
#   pos:       9
#   charpos:   3
#   rest:      "ちは"
#   rest_size: 6

scanner.skip(/nope/)            # => nil
match_values_cleared?(scanner)  # => true
static VALUE
strscan_skip(VALUE self, VALUE re)
{
    return strscan_do_scan(self, re, 1, 0, 1);
}

call-seq: skip_until(pattern) -> matched_substring_size or nil

Attempts to match the given pattern anywhere (at any position) in the target substring; does not modify the positions.

If the match attempt succeeds:

  • Sets match values.

  • Returns the size of the matched substring.

scanner = StringScanner.new(HIRAGANA_TEXT)
scanner.string           # => "こんにちは"
scanner.pos = 6
scanner.skip_until(/ち/) # => 6
put_match_values(scanner)
# Basic match values:
#   matched?:       true
#   matched_size:   3
#   pre_match:      "こんに"
#   matched  :      "ち"
#   post_match:     "は"
# Captured match values:
#   size:           1
#   captures:       []
#   named_captures: {}
#   values_at:      ["ち", nil]
#   []:
#     [0]:          "ち"
#     [1]:          nil
put_situation(scanner)
# Situation:
#   pos:       12
#   charpos:   4
#   rest:      "は"
#   rest_size: 3

If the match attempt fails:

  • Clears match values.

  • Returns nil.

scanner.skip_until(/nope/)     # => nil
match_values_cleared?(scanner) # => true
static VALUE
strscan_skip_until(VALUE self, VALUE re)
{
    return strscan_do_scan(self, re, 1, 0, 0);
}

Returns the stored string:

scanner = StringScanner.new('foobar')
scanner.string # => "foobar"
scanner.concat('baz')
scanner.string # => "foobarbaz"
static VALUE
strscan_get_string(VALUE self)
{
    struct strscanner *p;

    GET_SCANNER(self, p);
    return p->str;
}

Replaces the stored string with the given other_string:

scanner = StringScanner.new('foobar')
scanner.scan(/foo/)
put_situation(scanner)
# Situation:
#   pos:       3
#   charpos:   3
#   rest:      "bar"
#   rest_size: 3
match_values_cleared?(scanner) # => false

scanner.string = 'baz'         # => "baz"
put_situation(scanner)
# Situation:
#   pos:       0
#   charpos:   0
#   rest:      "baz"
#   rest_size: 3
match_values_cleared?(scanner) # => true
static VALUE
strscan_set_string(VALUE self, VALUE str)
{
    struct strscanner *p = check_strscan(self);

    StringValue(str);
    p->str = str;
    p->curr = 0;
    CLEAR_MATCH_STATUS(p);
    return str;
}

call-seq: terminate -> self

Sets the scanner to end-of-string; returns self:

scanner = StringScanner.new(HIRAGANA_TEXT)
scanner.string                 # => "こんにちは"
scanner.scan_until(/に/)
put_situation(scanner)
# Situation:
#   pos:       9
#   charpos:   3
#   rest:      "ちは"
#   rest_size: 6
match_values_cleared?(scanner) # => false

scanner.terminate              # => #<StringScanner fin>
put_situation(scanner)
# Situation:
#   pos:       15
#   charpos:   5
#   rest:      ""
#   rest_size: 0
match_values_cleared?(scanner) # => true
static VALUE
strscan_terminate(VALUE self)
{
    struct strscanner *p;

    GET_SCANNER(self, p);
    p->curr = S_LEN(p);
    CLEAR_MATCH_STATUS(p);
    return self;
}

Sets the position to its value previous to the recent successful match attempt:

scanner = StringScanner.new('foobarbaz')
scanner.scan(/foo/)
put_situation(scanner)
# Situation:
#   pos:       3
#   charpos:   3
#   rest:      "barbaz"
#   rest_size: 6
scanner.unscan
# => #<StringScanner 0/9 @ "fooba...">
put_situation(scanner)
# Situation:
#   pos:       0
#   charpos:   0
#   rest:      "foobarbaz"
#   rest_size: 9

Raises an exception if match values are clear:

scanner.scan(/nope/)           # => nil
match_values_cleared?(scanner) # => true
scanner.unscan                 # Raises StringScanner::Error.
static VALUE
strscan_unscan(VALUE self)
{
    struct strscanner *p;

    GET_SCANNER(self, p);
    if (! MATCHED_P(p))
        rb_raise(ScanError, "unscan failed: previous match record not exist");
    p->curr = p->prev;
    CLEAR_MATCH_STATUS(p);
    return self;
}

Returns an array of captured substrings, or nil of none.

For each specifier, the returned substring is [specifier]; see [].

scanner = StringScanner.new('Fri Dec 12 1975 14:39')
pattern = /(?<wday>\w+) (?<month>\w+) (?<day>\d+) /
scanner.match?(pattern)
scanner.values_at(*0..3)               # => ["Fri Dec 12 ", "Fri", "Dec", "12"]
scanner.values_at(*%i[wday month day]) # => ["Fri", "Dec", "12"]
static VALUE
strscan_values_at(int argc, VALUE *argv, VALUE self)
{
    struct strscanner *p;
    long i;
    VALUE new_ary;

    GET_SCANNER(self, p);
    if (! MATCHED_P(p))        return Qnil;

    new_ary = rb_ary_new2(argc);
    for (i = 0; i<argc; i++) {
        rb_ary_push(new_ary, strscan_aref(self, argv[i]));
    }

    return new_ary;
}