csvsee.utils

Shared utility functions for the csvsee library.

exception csvsee.utils.NoMatch

Exception raised when no column name matches a given expression.

class csvsee.utils.ProgressBar(end, prefix='', fill='=', units='secs', width=40)

An ASCII command-line progress bar with percentage.

Adapted from Corey Goldberg’s version: http://code.google.com/p/corey-projects/source/browse/trunk/python2/progress_bar.py

update(current)

Set the current progress.
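Below is a minimal sketch of how a bar with this constructor signature could be drawn. The class name ProgressBarSketch and the exact output layout (prefix, bracketed bar, percentage, current value, units) are assumptions for illustration. They are not csvsee's actual format.

```python
import sys


class ProgressBarSketch:
    """Minimal ASCII progress bar; a hypothetical stand-in, not csvsee's code."""

    def __init__(self, end, prefix='', fill='=', units='secs', width=40):
        self.end = end        # value that counts as 100% complete
        self.prefix = prefix  # text printed before the bar
        self.fill = fill      # character used for the completed portion
        self.units = units    # label printed after the current value (assumed)
        self.width = width    # number of characters between the brackets
        self.current = 0

    def update(self, current):
        """Set the current progress."""
        self.current = current

    def __str__(self):
        # Clamp to [0, 1] so values past `end` do not overflow the bar.
        fraction = min(max(self.current / self.end, 0.0), 1.0) if self.end else 1.0
        filled = int(fraction * self.width)  # round down to whole characters
        bar = self.fill * filled + ' ' * (self.width - filled)
        return f"{self.prefix}[{bar}] {int(fraction * 100)}% {self.current} {self.units}"


if __name__ == "__main__":
    bar = ProgressBarSketch(5, prefix='Working ', width=20)
    for step in range(6):
        bar.update(step)
        # A carriage return redraws the bar in place on most terminals.
        sys.stdout.write('\r' + str(bar))
        sys.stdout.flush()
    print()
```

Keeping update() separate from rendering mirrors the documented "Set the current progress" behavior. The caller decides when and where to draw the bar.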

csvsee.utils.boring_columns(csvfile)

Return a list of column names in csvfile that are “boring”; that is, the data in them is always the same.
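Here is a minimal sketch of the idea. It assumes the input is an open text file with a header row, although the real function's csvfile argument may be a filename instead. boring_columns_sketch is a hypothetical name. It remembers each column's first value and discards the column as soon as a later row differs.

```python
import csv
import io


def boring_columns_sketch(csvfile):
    """Return names of columns whose value never changes (hypothetical sketch).

    Assumes `csvfile` is an open text file (or other iterable of lines) with a
    header row; csvsee's own function may accept a filename instead.
    """
    reader = csv.DictReader(csvfile)
    first_row = next(reader, None)
    if first_row is None:
        return []  # no data rows, so nothing to compare (assumed behavior)
    # Every column starts as a candidate, remembered with its first value.
    candidates = dict(first_row)
    for row in reader:
        # Discard any column whose value differs from its first value.
        for name in list(candidates):
            if row.get(name) != candidates[name]:
                del candidates[name]
        if not candidates:
            break  # nothing left to check
    # Report the surviving columns in header order.
    return [name for name in reader.fieldnames if name in candidates]


if __name__ == "__main__":
    sample = io.StringIO("host,cpu,region\nweb1,12,eu\nweb1,80,eu\nweb1,45,eu\n")
    print(boring_columns_sketch(sample))  # ['host', 'region']
```

Stopping early once no candidates remain avoids reading the rest of a large file when every column turns out to vary.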

csvsee.utils.column_names(csv_file)

Return a list of column names in the given .csv file.
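Reading the header with the standard csv module keeps quoted names that contain commas intact. Splitting the first line on commas would break them. This is a hypothetical sketch (column_names_sketch), and it assumes an open file rather than a path.

```python
import csv
import io


def column_names_sketch(csv_file):
    """Return the header row of a CSV file as a list (hypothetical sketch).

    Assumes `csv_file` is an open text file; the real function may take a path.
    """
    # csv.reader yields one list per row; the first row holds the column names.
    # An empty file yields no rows, so fall back to an empty list.
    return next(csv.reader(csv_file), [])


if __name__ == "__main__":
    print(column_names_sketch(io.StringIO('time,"cpu, %",mem\n1,20,30\n')))
```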

csvsee.utils.filter_csv(csv_infile, csv_outfile, columns, match='regexp', action='include')

Filter csv_infile and write output to csv_outfile.

columns
A list of regular expressions or exact column names.
match
'regexp' to treat each value in columns as a regular expression, or 'exact' to match exact literal column names.
action
'include' to keep the specified columns, or 'exclude' to keep all columns except the specified columns.

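The sketch below shows one way the match and action options could combine. It is an illustration under stated assumptions, not csvsee's code. The hypothetical filter_csv_sketch takes open file objects rather than filenames. Regular expressions are anchored at the start of the column name with re.match, which is consistent with the documented matching_fields examples.

```python
import csv
import io
import re


def filter_csv_sketch(csv_infile, csv_outfile, columns, match='regexp', action='include'):
    """Copy selected columns from one CSV file object to another (hypothetical sketch)."""
    reader = csv.DictReader(csv_infile)
    fieldnames = reader.fieldnames or []

    # Work out which existing columns the given names or expressions select.
    if match == 'regexp':
        # Assumption: expressions match from the start of the name, like re.match.
        selected = {f for f in fieldnames if any(re.match(expr, f) for expr in columns)}
    elif match == 'exact':
        selected = set(columns) & set(fieldnames)
    else:
        raise ValueError(f"match must be 'regexp' or 'exact', not {match!r}")

    # Keep the original column order in either mode.
    if action == 'include':
        keep = [f for f in fieldnames if f in selected]
    elif action == 'exclude':
        keep = [f for f in fieldnames if f not in selected]
    else:
        raise ValueError(f"action must be 'include' or 'exclude', not {action!r}")

    # extrasaction='ignore' silently drops the columns that were not kept.
    writer = csv.DictWriter(csv_outfile, fieldnames=keep,
                            extrasaction='ignore', lineterminator='\n')
    writer.writeheader()
    for row in reader:
        writer.writerow(row)


if __name__ == "__main__":
    out = io.StringIO()
    filter_csv_sketch(io.StringIO("time,cpu_a,cpu_b,mem\n1,10,20,30\n"), out, ['cpu.*'])
    print(out.getvalue())
```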
csvsee.utils.float_or_0(value)

Try to convert value to a floating-point number. If conversion fails, return 0.

Examples:

>>> float_or_0(5)
5.0

>>> float_or_0('5')
5.0

>>> float_or_0('five')
0

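A try/except around float() is one straightforward way to get the behavior shown in the examples. In this hypothetical float_or_0_sketch, catching TypeError as well as ValueError is an assumption. It makes None and other non-numeric objects yield 0 too.

```python
def float_or_0_sketch(value):
    """Return float(value), or 0 if the conversion fails (hypothetical sketch)."""
    try:
        return float(value)
    except (TypeError, ValueError):
        # ValueError covers strings such as 'five'; TypeError covers None and
        # other objects that float() cannot handle (assumed to map to 0 as well).
        return 0
```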
csvsee.utils.grep_files(filenames, matches, dateformat='guess', resolution=60, show_progress=True)

Search all the given files for matching text, and return a list of (timestamp, counts) pairs, one for each interval of resolution seconds. Here timestamp is a datetime marking the interval, and counts is a dictionary of {match: count} giving the number of times each match was found during that interval.
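The bucketing idea can be sketched as follows. This is a simplification under several assumptions. The hypothetical grep_lines_sketch takes lines rather than filenames. It expects each timestamp at the start of the line in a fixed-width strptime format, and it counts matching lines rather than individual occurrences. It does not model the 'guess' date detection or the progress bar.

```python
import re
from collections import defaultdict
from datetime import datetime, timedelta


def grep_lines_sketch(lines, matches, dateformat='%Y-%m-%d %H:%M:%S', resolution=60):
    """Count lines matching each regex per `resolution`-second interval (sketch)."""
    compiled = [(match, re.compile(match)) for match in matches]
    # Width of a formatted timestamp, used to slice it off the front of a line.
    width = len(datetime(2000, 1, 1).strftime(dateformat))
    epoch = datetime(1970, 1, 1)
    rows = defaultdict(lambda: dict.fromkeys(matches, 0))
    for line in lines:
        try:
            stamp = datetime.strptime(line[:width], dateformat)
        except ValueError:
            continue  # no parseable timestamp on this line
        # Round down to the start of the interval containing this timestamp.
        seconds = int((stamp - epoch).total_seconds())
        bucket = epoch + timedelta(seconds=seconds - seconds % resolution)
        counts = rows[bucket]  # creates an all-zero entry for a new interval
        for match, expr in compiled:
            if expr.search(line):
                counts[match] += 1
    # Chronological list of (timestamp, counts) pairs.
    return sorted(rows.items())


if __name__ == "__main__":
    log = ["2024-01-01 10:00:05 ERROR disk full", "2024-01-01 10:01:10 WARN slow"]
    for timestamp, counts in grep_lines_sketch(log, ['ERROR', 'WARN']):
        print(timestamp, counts)
```

Pre-filling every interval with zeros for all matches gives each returned dictionary the same keys. That makes the result easy to plot as one series per match.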

csvsee.utils.line_count(filename)

Return the total number of lines in the given file.
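A minimal sketch, with line_count_sketch as a hypothetical name. Iterating over the file in binary mode avoids decoding the text just to count newlines.

```python
import os
import tempfile


def line_count_sketch(filename):
    """Return the number of lines in a file (hypothetical sketch)."""
    # Binary iteration splits on b'\n'; a final line without a trailing
    # newline still counts as one line.
    with open(filename, 'rb') as infile:
        return sum(1 for _ in infile)


if __name__ == "__main__":
    path = os.path.join(tempfile.mkdtemp(), 'sample.csv')
    with open(path, 'w') as outfile:
        outfile.write("a,b\n1,2\n3,4")
    print(line_count_sketch(path))  # 3
```

For a .csv file this counts physical lines. A quoted field that contains a newline spans two lines, so the result can exceed the number of CSV rows.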

csvsee.utils.matching_fields(expr, fields)

Return all fields that match a regular expression expr, or raise a NoMatch exception if no matches are found.

Examples:

>>> matching_fields('a.*', ['apple', 'banana', 'avocado'])
['apple', 'avocado']

>>> matching_fields('a.*', ['peach', 'grape', 'kiwi'])
Traceback (most recent call last):
NoMatch: No matching column found for 'a.*'

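The examples are consistent with start-anchored matching. re.match('a.*', 'banana') fails because 'banana' starts with 'b', even though re.search would find 'anana'. Here is a minimal sketch under that assumption. NoMatch is a local stand-in class, and matching_fields_sketch is a hypothetical name.

```python
import re


class NoMatch(Exception):
    """Local stand-in for csvsee.utils.NoMatch."""


def matching_fields_sketch(expr, fields):
    """Return fields that match expr from their first character (hypothetical sketch)."""
    # re.match only succeeds at the beginning of the string.
    found = [field for field in fields if re.match(expr, field)]
    if not found:
        raise NoMatch(f"No matching column found for {expr!r}")
    return found
```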
csvsee.utils.matching_xy_fields(x_expr, y_exprs, fieldnames, verbose=False)

Match x_expr and y_exprs to all available column names in fieldnames, and return the matched x_column and y_columns.

Example:

>>> matching_xy_fields('x.*', ['y[12]', 'y[ab]'],
...     ['xxx', 'y1', 'y2', 'y3', 'ya', 'yb', 'yc'])
('xxx', ['y1', 'y2', 'ya', 'yb'])

If x_expr is empty, the first column name is used:

>>> matching_xy_fields('', ['y[12]', 'y[ab]'],
...     ['xxx', 'y1', 'y2', 'y3', 'ya', 'yb', 'yc'])
('xxx', ['y1', 'y2', 'ya', 'yb'])

If no match is found for any expression in y_exprs, a NoMatch exception is raised:

>>> matching_xy_fields('', ['y[12]', 'y[jk]'],
...     ['xxx', 'y1', 'y2', 'y3', 'ya', 'yb', 'yc'])
Traceback (most recent call last):
NoMatch: No matching column found for 'y[jk]'

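The sketch below shows how the documented rules could fit together, built on a start-anchored helper like the one sketched for matching_fields. Three details are assumptions: taking the first match for x_expr, dropping duplicate y columns, and the format of the verbose message. _matching_fields and matching_xy_fields_sketch are hypothetical names.

```python
import re


class NoMatch(Exception):
    """Local stand-in for csvsee.utils.NoMatch."""


def _matching_fields(expr, fields):
    """Hypothetical helper: start-anchored regex match over column names."""
    found = [field for field in fields if re.match(expr, field)]
    if not found:
        raise NoMatch(f"No matching column found for {expr!r}")
    return found


def matching_xy_fields_sketch(x_expr, y_exprs, fieldnames, verbose=False):
    """Return (x_column, y_columns) matched against fieldnames (sketch)."""
    # An empty x_expr falls back to the first column name.
    x_column = _matching_fields(x_expr, fieldnames)[0] if x_expr else fieldnames[0]
    y_columns = []
    for expr in y_exprs:
        # Every y expression must match something; NoMatch propagates otherwise.
        for name in _matching_fields(expr, fieldnames):
            if name not in y_columns:  # assumption: duplicates are dropped
                y_columns.append(name)
    if verbose:
        print(f"X: {x_column}; Y: {', '.join(y_columns)}")
    return (x_column, y_columns)
```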
csvsee.utils.read_xy_values(reader, x_column, y_columns, date_format='', gmt_offset=0, zero_time=False)

Read values from a csv.DictReader, and return (x_values, y_values), where x_values is a list of values found in x_column, and y_values is a dictionary of {y_column: [values]} for each column in y_columns.

Arguments:

x_column
Name of the column you want to use as the X axis.
y_columns
Names of columns you want to plot on the Y axis.
date_format
If given, treat values in x_column as timestamps with the given format string.
gmt_offset
Add this many hours to every timestamp. Only useful with date_format.
zero_time
If True, adjust timestamps so the earliest one starts at 00:00 (midnight). Only useful with date_format.

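The sketch below shows one reading of these arguments, using csv.DictReader and datetime. Several details are assumptions:

- Non-numeric Y cells become 0, in the spirit of float_or_0.
- X values stay strings when date_format is empty.
- zero_time shifts every timestamp by the same amount, so the earliest one lands on midnight of its own day.

read_xy_values_sketch is a hypothetical name.

```python
import csv
import io
from datetime import datetime, timedelta


def read_xy_values_sketch(reader, x_column, y_columns, date_format='', gmt_offset=0, zero_time=False):
    """Return (x_values, {y_column: [values]}) from a csv.DictReader (sketch)."""
    x_values = []
    y_values = {column: [] for column in y_columns}
    for row in reader:
        x = row[x_column]
        if date_format:
            # Parse the timestamp, then add gmt_offset hours to it.
            x = datetime.strptime(x, date_format) + timedelta(hours=gmt_offset)
        x_values.append(x)
        for column in y_columns:
            # Assumption: cells that are not numbers are recorded as 0.
            try:
                y_values[column].append(float(row[column]))
            except (TypeError, ValueError):
                y_values[column].append(0)
    if date_format and zero_time and x_values:
        # Shift all timestamps equally so the earliest falls on 00:00.
        earliest = min(x_values)
        shift = earliest - datetime.combine(earliest.date(), datetime.min.time())
        x_values = [x - shift for x in x_values]
    return (x_values, y_values)


if __name__ == "__main__":
    data = "time,cpu\n2024-01-01 10:30:00,5\n2024-01-01 10:31:00,7\n"
    print(read_xy_values_sketch(csv.DictReader(io.StringIO(data)), 'time', ['cpu'],
                                date_format='%Y-%m-%d %H:%M:%S', zero_time=True))
```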
csvsee.utils.strip_prefix(strings)

Strip a common prefix from a sequence of strings. Return (prefix, [stripped]), where prefix is the common prefix with leading and trailing whitespace removed, and [stripped] is a list of the strings with the prefix removed.

Examples:

>>> strip_prefix(['first', 'fourth', 'fifth'])
('f', ['irst', 'ourth', 'ifth'])

>>> strip_prefix(['spam and eggs', 'spam and potatoes', 'spam and spam'])
('spam and', ['eggs', 'potatoes', 'spam'])

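The standard library's os.path.commonprefix compares strings character by character, so it can find the shared prefix of arbitrary strings, not only paths. This minimal sketch (strip_prefix_sketch, a hypothetical name) reproduces the examples above. It assumes that only the reported prefix is trimmed of whitespace. The remainders are sliced at the raw prefix length.

```python
import os.path


def strip_prefix_sketch(strings):
    """Return (prefix, stripped) for a sequence of strings (hypothetical sketch)."""
    strings = list(strings)  # accept any iterable; it is traversed twice below
    # Longest character-by-character prefix shared by every string.
    raw_prefix = os.path.commonprefix(strings)
    # Remove the untrimmed prefix so a trailing space does not stay on the rest.
    stripped = [s[len(raw_prefix):] for s in strings]
    return (raw_prefix.strip(), stripped)
```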
csvsee.utils.top_by(func, count, y_columns, y_values, drop=0)

Apply func to each column, and return the names of the top count columns.

Arguments:

func
A function that takes a list of values and returns a single value. max, min, and average are good examples.
count
How many of the “top” columns to keep.
y_columns
A list of candidate column names. All of these must exist as keys in y_values.
y_values
Dictionary of {column: values} for each y-column. Must have data for each column in y_columns (any extra column data will be ignored).
drop
How many top columns to skip before returning the next count top columns.

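Here is one plausible reading of these arguments, as a sketch: rank the candidate columns by func(values) from highest to lowest, skip the first drop, and keep the next count. Descending order, keeping ties in y_columns order, and returning names in ranked order are assumptions. top_by_sketch is a hypothetical name.

```python
def top_by_sketch(func, count, y_columns, y_values, drop=0):
    """Return up to count column names ranked by func, after skipping drop (sketch)."""
    # Largest summary value first; sorted() is stable, so ties keep y_columns order.
    ranked = sorted(y_columns, key=lambda column: func(y_values[column]), reverse=True)
    # Columns in y_values but not in y_columns are never looked at.
    return ranked[drop:drop + count]
```

Passing min ranks columns by their smallest value, still highest first. Under this sketch, finding the columns with the lowest values would need a different func (for example, one that negates the minimum).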
csvsee.utils.top_by_average(count, y_columns, y_values, drop=0)

Determine the top count columns based on the average of values in y_values, and return the filtered y_columns names.

csvsee.utils.top_by_peak(count, y_columns, y_values, drop=0)

Determine the top count columns based on the peak value in y_values, and return the filtered y_columns names.
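Under the same assumptions as a top_by-style ranking, both helpers reduce to one ranking with a different summary function: the arithmetic mean for top_by_average and max for top_by_peak. The names below (_average, _top_by, and the *_sketch functions) are hypothetical. Treating an empty column's average as 0.0 is also an assumption.

```python
def _average(values):
    """Arithmetic mean; 0.0 for an empty list (assumed convention)."""
    return sum(values) / len(values) if values else 0.0


def _top_by(func, count, y_columns, y_values, drop=0):
    """Rank columns by func(values), highest first, then skip and slice."""
    ranked = sorted(y_columns, key=lambda column: func(y_values[column]), reverse=True)
    return ranked[drop:drop + count]


def top_by_average_sketch(count, y_columns, y_values, drop=0):
    """Top count columns by mean value (hypothetical sketch)."""
    return _top_by(_average, count, y_columns, y_values, drop)


def top_by_peak_sketch(count, y_columns, y_values, drop=0):
    """Top count columns by maximum value (hypothetical sketch)."""
    # Note: max() raises ValueError on an empty column.
    return _top_by(max, count, y_columns, y_values, drop)
```

The two rankings can disagree. A column with one large spike wins on peak but may rank lower on average than a column that stays consistently high.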
