On this page

Formula reference

Formulas in DataSuite 2 are powered by Math.js with additional string and utility functions. This page covers what’s available — from simple arithmetic to advanced expressions.

Quick start

For step-by-step scoring examples with real questionnaires, see the questionnaire scoring guide.

A formula is evaluated once per row — think of it as writing a calculation that runs for each participant (row) in your data, using that participant’s values. Reference your selected input variables as v1, v2, v3, etc.:

(v1 + v2) / 2

Use the range syntax v1:v7 with aggregate functions:

sum(v1:v7)

That’s enough for most use cases. Read on for the full reference.

Variables

Input variables

When you select variables in the input panel, they are numbered in the order you selected them. For example, if you select Age, Score, and Grade — they become v1, v2, and v3 respectively. The reference badges next to each variable name in the input panel show the assigned numbers.

Reference Meaning
v1, v2, v3, … value of the 1st, 2nd, 3rd, … input variable in the current row
c1, c2, c3, … full column array of the 1st, 2nd, 3rd, … input variable (all rows)
v1:v7 expands to all variables from v1 through v7 — use inside functions like sum(v1:v7)

Missing values are treated as 0 when accessed via v1, v2, etc.

Special variables

Variable Meaning
i current row number (1-based)
v current value of the variable being processed (see below)
c full column array of the variable being processed (see below)

v and c are available when a rule has multiple input variables and the output is set to replace original values. In that case, the formula runs once per variable per row — v is the current cell value and c is the full column. This is useful for applying the same transformation to many variables at once, e.g. v * 100 to convert all selected variables to percentages.

Values and syntax

Numbers

Numbers are written as-is: 42, 3.14, -7. Scientific notation is supported: 1.5e3 (= 1500).

Strings

Strings are enclosed in double quotes: "hello". Use + to concatenate: "ID_" + v1.

Booleans

true and false — returned by comparisons and logical operators, and used in conditionals.

Arrays

Arrays are written with square brackets:

[1, 2, 3]
[v1, v2, v3]
["low", "medium", "high"]

Arrays are important because some functions expect them. For example, min and max can take either separate arguments or an array:

min(v1, v2, v3)
min([v1, v2, v3])

Both forms produce the same result.

The range syntax v1:v7 expands into an array automatically, so sum(v1:v7) becomes sum([v1, v2, v3, v4, v5, v6, v7]) internally and works as expected.

Indexing

Access individual elements of an array using square brackets. Indices are 1-based (not 0-based like in most programming languages):

c1[1]       // first value in column 1
c1[i]       // current row's value in column 1

Operators

Arithmetic

Operator Meaning Example
+ addition (or string concatenation) v1 + v2
- subtraction v1 - 10
* multiplication v1 * 100
/ division v1 / v2
% modulo (remainder) v1 % 2
^ exponentiation v1 ^ 2

The + operator also concatenates strings: "ID_" + v1 produces "ID_42".

Comparison

Operator Meaning
== equal to
!= not equal to
< less than
> greater than
<= less than or equal
>= greater than or equal

These return true or false. Comparisons work with both numbers and strings.

Logical

Operator Meaning Example
and logical AND v1 > 0 and v2 > 0
or logical OR v1 == 0 or v2 == 0
not logical NOT not (v1 == 0)

Conditional (ternary)

condition ? valueIfTrue : valueIfFalse

Example — assign a label based on a score:

v1 >= 70 ? "pass" : "fail"

Nest them for multiple conditions:

v1 >= 90 ? "A" : (v1 >= 70 ? "B" : "C")

Math functions

These are the most commonly used built-in Math.js functions. For the complete list, see the Math.js documentation.

Basic

Function Description Example
abs(x) absolute value abs(v1 - v2)
round(x) round to nearest integer round(v1)
round(x, n) round to n decimal places round(v1, 2)
ceil(x) round up ceil(v1)
floor(x) round down floor(v1)
sign(x) -1, 0, or 1 sign(v1)
sqrt(x) square root sqrt(v1)
cbrt(x) cube root cbrt(v1)
pow(x, n) power (same as x ^ n) pow(v1, 3)
exp(x) e raised to the power x exp(v1)
log(x) natural logarithm log(v1)
log10(x) base-10 logarithm log10(v1)
log2(x) base-2 logarithm log2(v1)

Aggregate

These work with multiple values or arrays (e.g. via the v1:v7 range syntax):

Function Description Example
sum(...) sum of values sum(v1:v5)
mean(...) arithmetic mean mean(v1:v5)
median(...) median median(v1:v5)
min(...) minimum value min(v1, v2, v3)
max(...) maximum value max(v1:v5)
std(...) standard deviation std(v1:v5)
variance(...) variance variance(v1:v5)

Trigonometric

sin(x), cos(x), tan(x), asin(x), acos(x), atan(x) — all in radians.

Custom aggregate functions

Function Description Example
countif(array, value) count occurrences of value in array countif(c1, "yes")
sumR(array, k) sum of (k - x) for each x in array sumR(c1, 100)
coalesce(a, b) returns a if it has a value, otherwise b coalesce(v1, 0)

String functions

All string functions work on text values. Most take a string as the first argument.

Many string functions also support dot syntax — calling the function as a method on the value. Both forms are equivalent:

contains(v1, "test")    // function syntax
v1.contains("test")     // dot syntax — same result

Use whichever reads more naturally. Dot syntax can be especially convenient for chaining: v1.trim().toLowerCase().

Case conversion

Function Description Example
toLowerCase(s) convert to lowercase toLowerCase(v1)
toUpperCase(s) convert to uppercase toUpperCase(v1)
capitalize(s) uppercase first character capitalize(v1)

Inspection

Function Description Returns
length(s) number of characters number
isBlank(s) true if empty or whitespace only boolean
isNotBlank(s) true if contains non-whitespace boolean
isString(x) true if value is a string boolean
contains(s, search) true if s contains search boolean
startsWith(s, prefix) true if s starts with prefix boolean
endsWith(s, suffix) true if s ends with suffix boolean
indexOf(s, search) position of first match (-1 if not found) number
lastIndexOf(s, search) position of last match (-1 if not found) number

Extraction

Function Description Example
substring(s, start) from start to end substring(v1, 3)
substring(s, start, end) from start to end (exclusive) substring(v1, 0, 5)
slice(s, start) like substring, but supports negative indices slice(v1, -3)
slice(s, start, end) extract a portion with negative support slice(v1, 1, -1)
charAt(s, index) character at position charAt(v1, 0)
left(s, n) first n characters left(v1, 3)
right(s, n) last n characters right(v1, 4)

Modification

Function Description Example
trim(s) remove whitespace from both ends trim(v1)
trimStart(s) remove leading whitespace trimStart(v1)
trimEnd(s) remove trailing whitespace trimEnd(v1)
replace(s, search, repl) replace first occurrence replace(v1, "old", "new")
replace(s, pattern, repl, flags) regex replace replace(v1, "\\d+", "N", "g")
replaceAll(s, search, repl) replace all occurrences replaceAll(v1, " ", "_")
reverse(s) reverse the string reverse(v1)
repeat(s, n) repeat n times repeat("*", v2)
padStart(s, len, pad) pad from the left padStart(v1, 5, "0")
padEnd(s, len, pad) pad from the right padEnd(v1, 10, ".")

Splitting and joining

Function Description Example
split(s, separator) split into an array split(v1, ",")
join(array, separator) join array into a string join(split(v1, ","), ";")

Pattern matching

Function Description Example
test(s, pattern) true if regex matches test(v1, "^[A-Z]")
match(s, pattern) returns the match (or null) match(v1, "\\d+")
test(s, pattern, flags) regex with flags test(v1, "hello", "i")

Formatting

format(template, arg0, arg1, ...)

Replaces {0}, {1}, etc. with the corresponding arguments:

format("Subject {0}, Group {1}", v1, v2)

Date functions

Dates in DataSuite 2 are stored as text in ISO 8601 format (YYYY-MM-DD, e.g. "2024-01-15"). The functions below parse those strings and let you compute calendar-aware differences. Invalid date strings produce NaN, which propagates through arithmetic so bad rows surface downstream rather than silently miscompute.

Function Description Returns
today() current date as an ISO string string (e.g. "2026-04-24")
dateDiff(d1, d2, unit) calendar-aware difference between two dates number
dateAdd(d, n, unit) shift a date by n units; negative n subtracts string (ISO date) or "" on bad input

Supported unit values for dateDiff and dateAdd:

  • "day" — exact integer days.
  • "week" — exact days / 7 for dateDiff; n × 7 days for dateAdd.
  • "month" — for dateDiff, completed calendar months plus a fractional remainder of the current month, where the fraction respects the actual length of that month (28/29/30/31). For dateAdd, n is rounded to whole months; the day-of-month is clamped to the destination month’s length (so Jan 31 + 1 month → Feb 28/29).
  • "year" — for dateDiff, completed years plus a fractional remainder against the actual year length (365 or 366). For dateAdd, n is rounded to whole years; February 29 of a leap year shifts to February 28 in non-leap target years.

Calendar-aware vs average-length: dateDiff(d1, d2, "month") does not divide by an average month length (such as 30.4375 days). Instead, it counts whole calendar months and treats the leftover days as a fraction of the month they fall in, so anniversary dates always come out exact: dateDiff("2024-01-15", "2025-01-15", "year") is exactly 1.0. The same logic anchors February 29 to February 28 in non-leap years.

Examples:

dateDiff("2024-01-15", "2024-02-15", "month")    // 1.0 — exactly one calendar month
dateDiff("2024-01-15", "2024-03-01", "month")    // ~1.517 — 1 month + 15 days into 29-day Feb
dateDiff("2020-02-29", "2021-02-28", "year")     // 1.0 — Feb 29 anchors to Feb 28
dateDiff("2024-01-15", today(), "day")           // days since enrollment, until now

dateAdd("2024-01-15", 60, "month")               // "2029-01-15" — synthetic 5-year cutoff
dateAdd("2024-01-31", 1, "month")                // "2024-02-29" — clamped to month end
dateAdd("2024-03-01", -30, "day")                // "2024-01-31" — negative n subtracts

Practical example: time-to-event for survival analysis

A survival dataset typically has an enrollment date and an event date that may be blank when the participant is censored (still under observation). Three transformation rules turn that into the columns the time-to-event analysis module needs:

cutoff = isBlank(event_date) ? today() : event_date
follow_up = dateDiff(consent_date, cutoff, "month")
event_occurred = isNotBlank(event_date)

The first rule picks the event date when present, otherwise treats the participant as still under observation today. The second computes follow-up time in months from enrollment to that cutoff. The third produces the binary event indicator — the role-picker in the analysis module reads false/true as censored/event automatically.

For days or years instead of months, swap the third argument: dateDiff(consent_date, cutoff, "day") or ... "year". If your dataset has a separate last-contact date for participants still alive at the end of follow-up, use that instead of today():

cutoff = isBlank(event_date) ? last_contact_date : event_date

For studies with a fixed administrative cutoff (e.g. 5-year follow-up window), synthesize the cutoff with dateAdd:

admin_cutoff = dateAdd(consent_date, 60, "month")
cutoff = isBlank(event_date) ? admin_cutoff : event_date

Multi-variable formulas

Use @Name = expression to create multiple output variables in a single rule:

@Total = v1 + v2 + v3
@Average = @Total / 3
@ZScore = (@Total - mean(c1 + c2 + c3)) / std(c1 + c2 + c3)

Each @Name creates a new variable (or overwrites an existing one with the same name). Later declarations can reference earlier ones with the @ prefix. The output options panel is disabled for multi-variable formulas since output targets are explicit.

Intermediate variables

Formulas can span multiple lines. Use assignments to break complex calculations into readable steps:

total = v1 + v2 + v3
total / 3

The last line is the result that gets stored. Earlier lines define intermediate variables for use in subsequent expressions. This is equivalent to writing (v1 + v2 + v3) / 3 but easier to read for complex formulas.

Note: intermediate assignments (with =) are different from multi-variable formulas (with @Name =). Intermediate variables exist only within the formula and are not saved to the dataset.

The formula editor

The formula editor provides several features to help you write formulas:

  • Syntax highlighting — numbers, strings, operators, and function calls are color-coded
  • Bracket matching — matching parentheses are highlighted when the cursor is next to one
  • Autocomplete — start typing (2+ characters) and a suggestion list appears with variable references, functions, and constants. Press Ctrl+Space to open the list manually. Variable suggestions show the display name alongside the reference (e.g. v1 Age)
  • Error checking — syntax errors are underlined in the editor with a description in the gutter. Errors are detected as you type
  • Search — press Ctrl+F to find text, Ctrl+H to find and replace

Tips

  • Missing values become 0 in formulas. If you need different behavior, use coalesce(v1, someDefault) to substitute a specific fallback, or a conditional like v1 == 0 ? NaN : v1 to flag them.
  • v1 vs c1v1 gives you one value (the current row), c1 gives you the entire column as an array. Use c1 for column-wide calculations: v1 - mean(c1) centers each value around the column mean.
  • Errors in formulas produce a missing value for that row — the rest of the data is unaffected. Check the data preview to spot any unexpected blanks.
  • String concatenation uses +: v1 + " " + v2 joins two text values with a space.
  • Parentheses control evaluation order as expected: (v1 + v2) * v3.
  • Constants pi, e, Infinity, and NaN are available.
  • Comments can be added with //: v1 + v2 // total score. Anything after // on the same line is ignored.