On this page
- Quick start
- Variables
- Input variables
- Special variables
- Values and syntax
- Numbers
- Strings
- Booleans
- Arrays
- Indexing
- Operators
- Arithmetic
- Comparison
- Logical
- Conditional (ternary)
- Math functions
- Basic
- Aggregate
- Trigonometric
- Custom aggregate functions
- String functions
- Case conversion
- Inspection
- Extraction
- Modification
- Splitting and joining
- Pattern matching
- Formatting
- Date functions
- Practical example: time-to-event for survival analysis
- Multi-variable formulas
- Intermediate variables
- The formula editor
- Tips
Formula reference
Formulas in DataSuite 2 are powered by Math.js with additional string and utility functions. This page covers what’s available — from simple arithmetic to advanced expressions.
Quick start
For step-by-step scoring examples with real questionnaires, see the questionnaire scoring guide.
A formula is evaluated once per row — think of it as writing a calculation that runs for each participant (row) in your data, using that participant’s values. Reference your selected input variables as v1, v2, v3, etc.:
(v1 + v2) / 2
Use the range syntax v1:v7 with aggregate functions:
sum(v1:v7)
That’s enough for most use cases. Read on for the full reference.
Variables
Input variables
When you select variables in the input panel, they are numbered in the order you selected them. For example, if you select Age, Score, and Grade — they become v1, v2, and v3 respectively. The reference badges next to each variable name in the input panel show the assigned numbers.
| Reference | Meaning |
|---|---|
v1, v2, v3, … |
value of the 1st, 2nd, 3rd, … input variable in the current row |
c1, c2, c3, … |
full column array of the 1st, 2nd, 3rd, … input variable (all rows) |
v1:v7 |
expands to all variables from v1 through v7 — use inside functions like sum(v1:v7) |
Missing values are treated as 0 when accessed via v1, v2, etc.
Special variables
| Variable | Meaning |
|---|---|
i |
current row number (1-based) |
v |
current value of the variable being processed (see below) |
c |
full column array of the variable being processed (see below) |
v and c are available when a rule has multiple input variables and the output is set to replace original values. In that case, the formula runs once per variable per row — v is the current cell value and c is the full column. This is useful for applying the same transformation to many variables at once, e.g. v * 100 to convert all selected variables to percentages.
Values and syntax
Numbers
Numbers are written as-is: 42, 3.14, -7. Scientific notation is supported: 1.5e3 (= 1500).
Strings
Strings are enclosed in double quotes: "hello". Use + to concatenate: "ID_" + v1.
Booleans
true and false — returned by comparisons and logical operators, and used in conditionals.
Arrays
Arrays are written with square brackets:
[1, 2, 3]
[v1, v2, v3]
["low", "medium", "high"]
Arrays are important because some functions expect them. For example, min and max can take either separate arguments or an array:
min(v1, v2, v3)
min([v1, v2, v3])
Both forms produce the same result.
The range syntax v1:v7 expands into an array automatically, so sum(v1:v7) becomes sum([v1, v2, v3, v4, v5, v6, v7]) internally and works as expected.
Indexing
Access individual elements of an array using square brackets. Indices are 1-based (not 0-based like in most programming languages):
c1[1] // first value in column 1
c1[i] // current row's value in column 1
Operators
Arithmetic
| Operator | Meaning | Example |
|---|---|---|
+ |
addition (or string concatenation) | v1 + v2 |
- |
subtraction | v1 - 10 |
* |
multiplication | v1 * 100 |
/ |
division | v1 / v2 |
% |
modulo (remainder) | v1 % 2 |
^ |
exponentiation | v1 ^ 2 |
The + operator also concatenates strings: "ID_" + v1 produces "ID_42".
Comparison
| Operator | Meaning |
|---|---|
== |
equal to |
!= |
not equal to |
< |
less than |
> |
greater than |
<= |
less than or equal |
>= |
greater than or equal |
These return true or false. Comparisons work with both numbers and strings.
Logical
| Operator | Meaning | Example |
|---|---|---|
and |
logical AND | v1 > 0 and v2 > 0 |
or |
logical OR | v1 == 0 or v2 == 0 |
not |
logical NOT | not (v1 == 0) |
Conditional (ternary)
condition ? valueIfTrue : valueIfFalse
Example — assign a label based on a score:
v1 >= 70 ? "pass" : "fail"
Nest them for multiple conditions:
v1 >= 90 ? "A" : (v1 >= 70 ? "B" : "C")
Math functions
These are the most commonly used built-in Math.js functions. For the complete list, see the Math.js documentation.
Basic
| Function | Description | Example |
|---|---|---|
abs(x) |
absolute value | abs(v1 - v2) |
round(x) |
round to nearest integer | round(v1) |
round(x, n) |
round to n decimal places |
round(v1, 2) |
ceil(x) |
round up | ceil(v1) |
floor(x) |
round down | floor(v1) |
sign(x) |
-1, 0, or 1 | sign(v1) |
sqrt(x) |
square root | sqrt(v1) |
cbrt(x) |
cube root | cbrt(v1) |
pow(x, n) |
power (same as x ^ n) |
pow(v1, 3) |
exp(x) |
e raised to the power x | exp(v1) |
log(x) |
natural logarithm | log(v1) |
log10(x) |
base-10 logarithm | log10(v1) |
log2(x) |
base-2 logarithm | log2(v1) |
Aggregate
These work with multiple values or arrays (e.g. via the v1:v7 range syntax):
| Function | Description | Example |
|---|---|---|
sum(...) |
sum of values | sum(v1:v5) |
mean(...) |
arithmetic mean | mean(v1:v5) |
median(...) |
median | median(v1:v5) |
min(...) |
minimum value | min(v1, v2, v3) |
max(...) |
maximum value | max(v1:v5) |
std(...) |
standard deviation | std(v1:v5) |
variance(...) |
variance | variance(v1:v5) |
Trigonometric
sin(x), cos(x), tan(x), asin(x), acos(x), atan(x) — all in radians.
Custom aggregate functions
| Function | Description | Example |
|---|---|---|
countif(array, value) |
count occurrences of value in array |
countif(c1, "yes") |
sumR(array, k) |
sum of (k - x) for each x in array |
sumR(c1, 100) |
coalesce(a, b) |
returns a if it has a value, otherwise b |
coalesce(v1, 0) |
String functions
All string functions work on text values. Most take a string as the first argument.
Many string functions also support dot syntax — calling the function as a method on the value. Both forms are equivalent:
contains(v1, "test") // function syntax
v1.contains("test") // dot syntax — same result
Use whichever reads more naturally. Dot syntax can be especially convenient for chaining: v1.trim().toLowerCase().
Case conversion
| Function | Description | Example |
|---|---|---|
toLowerCase(s) |
convert to lowercase | toLowerCase(v1) |
toUpperCase(s) |
convert to uppercase | toUpperCase(v1) |
capitalize(s) |
uppercase first character | capitalize(v1) |
Inspection
| Function | Description | Returns |
|---|---|---|
length(s) |
number of characters | number |
isBlank(s) |
true if empty or whitespace only | boolean |
isNotBlank(s) |
true if contains non-whitespace | boolean |
isString(x) |
true if value is a string | boolean |
contains(s, search) |
true if s contains search |
boolean |
startsWith(s, prefix) |
true if s starts with prefix |
boolean |
endsWith(s, suffix) |
true if s ends with suffix |
boolean |
indexOf(s, search) |
position of first match (-1 if not found) | number |
lastIndexOf(s, search) |
position of last match (-1 if not found) | number |
Extraction
| Function | Description | Example |
|---|---|---|
substring(s, start) |
from start to end |
substring(v1, 3) |
substring(s, start, end) |
from start to end (exclusive) |
substring(v1, 0, 5) |
slice(s, start) |
like substring, but supports negative indices |
slice(v1, -3) |
slice(s, start, end) |
extract a portion with negative support | slice(v1, 1, -1) |
charAt(s, index) |
character at position | charAt(v1, 0) |
left(s, n) |
first n characters |
left(v1, 3) |
right(s, n) |
last n characters |
right(v1, 4) |
Modification
| Function | Description | Example |
|---|---|---|
trim(s) |
remove whitespace from both ends | trim(v1) |
trimStart(s) |
remove leading whitespace | trimStart(v1) |
trimEnd(s) |
remove trailing whitespace | trimEnd(v1) |
replace(s, search, repl) |
replace first occurrence | replace(v1, "old", "new") |
replace(s, pattern, repl, flags) |
regex replace | replace(v1, "\\d+", "N", "g") |
replaceAll(s, search, repl) |
replace all occurrences | replaceAll(v1, " ", "_") |
reverse(s) |
reverse the string | reverse(v1) |
repeat(s, n) |
repeat n times |
repeat("*", v2) |
padStart(s, len, pad) |
pad from the left | padStart(v1, 5, "0") |
padEnd(s, len, pad) |
pad from the right | padEnd(v1, 10, ".") |
Splitting and joining
| Function | Description | Example |
|---|---|---|
split(s, separator) |
split into an array | split(v1, ",") |
join(array, separator) |
join array into a string | join(split(v1, ","), ";") |
Pattern matching
| Function | Description | Example |
|---|---|---|
test(s, pattern) |
true if regex matches | test(v1, "^[A-Z]") |
match(s, pattern) |
returns the match (or null) | match(v1, "\\d+") |
test(s, pattern, flags) |
regex with flags | test(v1, "hello", "i") |
Formatting
format(template, arg0, arg1, ...)
Replaces {0}, {1}, etc. with the corresponding arguments:
format("Subject {0}, Group {1}", v1, v2)
Date functions
Dates in DataSuite 2 are stored as text in ISO 8601 format (YYYY-MM-DD, e.g. "2024-01-15"). The functions below parse those strings and let you compute calendar-aware differences. Invalid date strings produce NaN, which propagates through arithmetic so bad rows surface downstream rather than silently miscompute.
| Function | Description | Returns |
|---|---|---|
today() |
current date as an ISO string | string (e.g. "2026-04-24") |
dateDiff(d1, d2, unit) |
calendar-aware difference between two dates | number |
dateAdd(d, n, unit) |
shift a date by n units; negative n subtracts |
string (ISO date) or "" on bad input |
Supported unit values for dateDiff and dateAdd:
"day"— exact integer days."week"— exactdays / 7fordateDiff;n × 7days fordateAdd."month"— fordateDiff, completed calendar months plus a fractional remainder of the current month, where the fraction respects the actual length of that month (28/29/30/31). FordateAdd,nis rounded to whole months; the day-of-month is clamped to the destination month’s length (soJan 31 + 1 month → Feb 28/29)."year"— fordateDiff, completed years plus a fractional remainder against the actual year length (365 or 366). FordateAdd,nis rounded to whole years; February 29 of a leap year shifts to February 28 in non-leap target years.
Calendar-aware vs average-length:
dateDiff(d1, d2, "month")does not divide by an average month length (such as 30.4375 days). Instead, it counts whole calendar months and treats the leftover days as a fraction of the month they fall in, so anniversary dates always come out exact:dateDiff("2024-01-15", "2025-01-15", "year")is exactly1.0. The same logic anchors February 29 to February 28 in non-leap years.
Examples:
dateDiff("2024-01-15", "2024-02-15", "month") // 1.0 — exactly one calendar month
dateDiff("2024-01-15", "2024-03-01", "month") // ~1.517 — 1 month + 15 days into 29-day Feb
dateDiff("2020-02-29", "2021-02-28", "year") // 1.0 — Feb 29 anchors to Feb 28
dateDiff("2024-01-15", today(), "day") // days since enrollment, until now
dateAdd("2024-01-15", 60, "month") // "2029-01-15" — synthetic 5-year cutoff
dateAdd("2024-01-31", 1, "month") // "2024-02-29" — clamped to month end
dateAdd("2024-03-01", -30, "day") // "2024-01-31" — negative n subtracts
Practical example: time-to-event for survival analysis
A survival dataset typically has an enrollment date and an event date that may be blank when the participant is censored (still under observation). Three transformation rules turn that into the columns the time-to-event analysis module needs:
cutoff = isBlank(event_date) ? today() : event_date
follow_up = dateDiff(consent_date, cutoff, "month")
event_occurred = isNotBlank(event_date)
The first rule picks the event date when present, otherwise treats the participant as still under observation today. The second computes follow-up time in months from enrollment to that cutoff. The third produces the binary event indicator — the role-picker in the analysis module reads false/true as censored/event automatically.
For days or years instead of months, swap the third argument: dateDiff(consent_date, cutoff, "day") or ... "year". If your dataset has a separate last-contact date for participants still alive at the end of follow-up, use that instead of today():
cutoff = isBlank(event_date) ? last_contact_date : event_date
For studies with a fixed administrative cutoff (e.g. 5-year follow-up window), synthesize the cutoff with dateAdd:
admin_cutoff = dateAdd(consent_date, 60, "month")
cutoff = isBlank(event_date) ? admin_cutoff : event_date
Multi-variable formulas
Use @Name = expression to create multiple output variables in a single rule:
@Total = v1 + v2 + v3
@Average = @Total / 3
@ZScore = (@Total - mean(c1 + c2 + c3)) / std(c1 + c2 + c3)
Each @Name creates a new variable (or overwrites an existing one with the same name). Later declarations can reference earlier ones with the @ prefix. The output options panel is disabled for multi-variable formulas since output targets are explicit.
Intermediate variables
Formulas can span multiple lines. Use assignments to break complex calculations into readable steps:
total = v1 + v2 + v3
total / 3
The last line is the result that gets stored. Earlier lines define intermediate variables for use in subsequent expressions. This is equivalent to writing (v1 + v2 + v3) / 3 but easier to read for complex formulas.
Note: intermediate assignments (with
=) are different from multi-variable formulas (with@Name =). Intermediate variables exist only within the formula and are not saved to the dataset.
The formula editor
The formula editor provides several features to help you write formulas:
- Syntax highlighting — numbers, strings, operators, and function calls are color-coded
- Bracket matching — matching parentheses are highlighted when the cursor is next to one
- Autocomplete — start typing (2+ characters) and a suggestion list appears with variable references, functions, and constants. Press Ctrl+Space to open the list manually. Variable suggestions show the display name alongside the reference (e.g.
v1 Age) - Error checking — syntax errors are underlined in the editor with a description in the gutter. Errors are detected as you type
- Search — press Ctrl+F to find text, Ctrl+H to find and replace
Tips
- Missing values become
0in formulas. If you need different behavior, usecoalesce(v1, someDefault)to substitute a specific fallback, or a conditional likev1 == 0 ? NaN : v1to flag them. v1vsc1—v1gives you one value (the current row),c1gives you the entire column as an array. Usec1for column-wide calculations:v1 - mean(c1)centers each value around the column mean.- Errors in formulas produce a missing value for that row — the rest of the data is unaffected. Check the data preview to spot any unexpected blanks.
- String concatenation uses
+:v1 + " " + v2joins two text values with a space. - Parentheses control evaluation order as expected:
(v1 + v2) * v3. - Constants
pi,e,Infinity, andNaNare available. - Comments can be added with
//:v1 + v2 // total score. Anything after//on the same line is ignored.