-
Notifications
You must be signed in to change notification settings - Fork 264
Description
I open this thread related to what I've posted #245
Because initially the problem was on import/update of a excel file, and in my case on export.
See progress here: https://github.com/anykeyh/rubyXL
The problem
- Exporting an excel of around ~100k cells take 25 minutes on sidekiq. Gathering the information is good (few seconds), but rendering using RubyXL is not, and take 99% of the time.
After analyzing the situation, the main problem comes from the reusability of the stylesheets, font and border styles.
All comes from this method in ooxml_object.rb:
def ==(other)
other.is_a?(self.class) &&
obtain_class_variable(:@@ooxml_attributes).all? { |k, v| self.send(v[:accessor]) == other.send(v[:accessor]) } &&
obtain_class_variable(:@@ooxml_child_nodes).all? { |k, v| self.send(v[:accessor]) == other.send(v[:accessor]) }
endThis check if a style/font/border etc... already exists and return the ID of it for linking in the excel file.
The main problem is this computation is long, as all fields are tested, then all subnodes and all fields of subnodes are tested also...
Theses tests are present in convenience_methods.rb, for example:
def register_new_font(new_font, old_xf)
new_xf = old_xf.dup
# v--- HERE for fonts
new_xf.font_id = fonts.find_index { |x| x == new_font } # Reuse existing font, if it exists
new_xf.font_id ||= fonts.size # If this font has never existed before, add it to collection.
fonts[new_xf.font_id] = new_font
new_xf.apply_font = true
new_xf
endAnd performance keep going down as more style exists in your workbook.
Here is an example of excel which take ~8 seconds to generate, and which should not. Ruby-prof give good insight of the problem:
require 'rubyXL'
require 'benchmark'
begin
require 'ruby-prof'
rescue LoadError
puts "This example case is using ruby-prof gem..."
exit
end
def generate_my_excel
@workbook = RubyXL::Workbook.new
@workbook.worksheets.clear
worksheet = @workbook.add_worksheet("TEST")
x = 0
y = 0
bm = Benchmark.realtime do
200.times do |x|
worksheet.add_cell(x, y, "#{x}")
worksheet.sheet_data[x][y].change_border(:top, 'thin')
worksheet.sheet_data[x][y].change_border(:right, 'thin')
worksheet.sheet_data[x][y].change_border(:left, 'thin')
worksheet.sheet_data[x][y].change_border(:bottom, 'medium')
worksheet.sheet_data[x][y].change_font_bold true
x+=1
end
end
puts "Time for header: #{bm}ms"
x = 0
y = 1
50.times do |row|
200.times do |cols|
x+=1
value = rand*100
worksheet.add_cell(x, y, value)
if value > 0.8
worksheet.sheet_data[x][y].change_border(:top, 'thin')
worksheet[x][y].change_fill('00ff00')
elsif value > 0.5
worksheet.sheet_data[x][y].change_border(:top, 'thin')
worksheet[x][y].change_fill('ffff00')
else
worksheet.sheet_data[x][y].change_border(:top, 'thin')
worksheet[x][y].change_fill('ff0000')
end
end
y+=1
x = 0
end
end
RubyProf.start
puts "Generation 200x50 excel with style..."
generate_my_excel
result = RubyProf.stop
printer = RubyProf::GraphPrinter.new(result)
printer.print(STDOUT)
output = "write_complex_table.xlsx"
@workbook.write(output)My idea
I want to implement a test by hash, and an auto hash update on modification of the fields and child nodes.
Eg.
def some_accessor= x
@hash ^= @some_accessor.hash #Remove the X-OR hash merge of the current version of the object
@some_accessor = x #Setter
@hash ^= x.hash # Merge the hash of the object
end
# ...
def hash
@hash
end
# ...
def ==(other)
other.is_a?(self.class) && self.hash == other.hash
endSo far I've implemented a naive method which is working more or less, with some bugs on some styles.
The performance are astonishing (30 sec against 20 minutes... lol), but not yet ready for production.
The problem of hash collision also is not taken in consideration. Hash are 64 bit long and chance of collision on a small set like stylesheets and fonts is less than 10^-15. In short, you can win billions of time in the lottery before it happens.
However, I still need to figure out if the hash function is good enough and spread uniformly. For example, an object with only two nil value will have a hash equals to zero (the two nil.hash canceling each others in the xor operator...)
The second problem is when a field/subnode mutate, it should update the hash of the parent, which is not the case now, leading to issues on finding the styles and wrong styles rendering.
If you have any better idea than this hashing method, I would be pleased to hear from you !